Re: Non-text database theory
Date: Sun, 07 Sep 2008 13:12:57 +1000
Message-ID: <87myikbqdi.fsf_at_lion.rapttech.com.au>
Rune Allnor <allnor_at_tele.ntnu.no> writes:
> On 6 Sep, 03:53, Tim X <t..._at_nospam.dev.null> wrote:
> ...
>> In most cases however, you do have meta information about the data. This
>> could be things like the date and time the data was obtained, the
>> location, interesting characteristics, data size etc. This is the data I would
>> store in the database together with information about where the file is
>> stored in the filesystem. The database could be responsible for
>> generating unique filenames, which is very useful if you have lots of
>> them as you don't have to think about it and you can use names that are
>> less user friendly, such as just sequencial numbers etc. The DB might
>> even manage a special filesystem hierarchy, grouping files into
>> directories based on certain meta data attributes.
>
> Your description matches what I want, I am not sufficiently familiar
> with the terminology to realize that what I was asking for was not
> a database as such.
>
> This must have been done thousands of times already. I don't want to
> invent wheels, so is there a description around on how to do these
> things? One question which immediately comes to mind is how to
> protect
> the logged files from being tampered with.
>
The answer depends on the OS you're on. For example, if we are talking about Linux or another member of the *nix family, I would probably handle this by creating a specific user and group for the application. You can then control access via the normal OS access controls, such as adding users to the group, using umask to ensure file/directory permissions are set appropriately, etc. Windows and other platforms have similar functionality, but I'm not familiar enough with Windows to give a detailed description.
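Just as a rough sketch of the *nix approach (POSIX-only, and the directory/file names here are made up for illustration): set a restrictive umask before creating the data area, then strip the write bits from logged files so casual tampering fails.

```python
import os
import stat
import tempfile

# Hypothetical illustration: keep "other" users out of the data
# directory, and make logged files read-only once written.
base = tempfile.mkdtemp()
data_dir = os.path.join(base, "logged_files")

old_umask = os.umask(0o027)          # new files: no access for "other"
try:
    os.mkdir(data_dir)               # becomes rwxr-x--- under this umask
    path = os.path.join(data_dir, "sample.dat")
    with open(path, "wb") as f:
        f.write(b"measurement data")
    # Remove write permission so the logged file can't be quietly edited.
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP)   # r--r-----
finally:
    os.umask(old_umask)              # restore the previous umask

mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))
```

In a real setup you would also chown the directory to the dedicated application user/group, which needs root and so isn't shown here.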
An important consideration when working out how to lay everything out is backup and restore. If you have lots of data in lots of files, you will want to make sure they are set out in a way that makes adding new data straightforward and that also makes backups easy. How you approach this depends on how much the data changes, the total amount of data and what backup facilities you have available.
The design of the database to manage the meta data will depend on what meta data you have. However, this is really just the application of good database design principles.
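To make the earlier point concrete — the database generating unique filenames and grouping files by a meta data attribute — here is a minimal sketch using SQLite. The table and column names are invented for illustration; your meta data will dictate the real schema.

```python
import sqlite3

# Hypothetical schema: the database stores the meta data and owns the
# file naming, deriving the on-disk name from the sequential row id.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE data_file (
        id          INTEGER PRIMARY KEY,   -- source of unique names
        acquired_at TEXT    NOT NULL,      -- date/time data was obtained
        location    TEXT    NOT NULL,
        size_bytes  INTEGER NOT NULL,
        rel_path    TEXT    UNIQUE         -- filled in after insert
    )
""")

def register_file(acquired_at, location, size_bytes):
    """Insert meta data and derive the file's path from the row id."""
    cur = conn.execute(
        "INSERT INTO data_file (acquired_at, location, size_bytes) "
        "VALUES (?, ?, ?)",
        (acquired_at, location, size_bytes))
    file_id = cur.lastrowid
    # Group files into directories by a meta data attribute (location);
    # the unqualified name is just the zero-padded sequential id.
    rel_path = f"{location}/{file_id:08d}.dat"
    conn.execute("UPDATE data_file SET rel_path = ? WHERE id = ?",
                 (rel_path, file_id))
    return rel_path

p1 = register_file("2008-09-06T12:00:00", "site-a", 1024)
p2 = register_file("2008-09-06T13:00:00", "site-a", 2048)
print(p1, p2)   # site-a/00000001.dat site-a/00000002.dat
```

The application would then create the actual file at `rel_path` under the protected data directory; the names never need to be user friendly because users find files through the meta data, not the filesystem.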
If your database supports constraints, such as foreign key constraints, check constraints, not null constraints etc., then use them. Some argue these are bad because they restrict your ability to make changes in the future. I think this is rubbish and a sign of a lack of real analysis and design.

Use the datatypes that best match your 'natural' data and how it is to be used. Be wary of data that uses the word 'number' in its name; it may not be a number. For example, I can't count the number of systems I've worked on where the original design used a number field for something like a staff number or reference number. These sorts of numbers are often best represented by character types because it's not unusual for them to have leading 0s, which are significant in the sense that they are part of the data. However, if you define the data as a number type, you generally can't have leading zeros. Number types should only be required when you plan to use the values in numeric/math operations.

Try to use the data size that best fits your data model. I often see poor database design where a column has been defined to be as large as possible. Again, this is often done in the misguided belief that it adds flexibility. However, doing so also means that bad data can get into the system. For example, if you know all values in column A should never be larger than 10 characters, then define it to be that large. When something tries to insert a larger value, you know that either the value is bogus or there has been some change in your domain and you now need to increase the size of that field. The point is, you are alerted either to the fact that something is doing something it shouldn't or to an error in your underlying data model.

An important point with databases is that the old maxim of GIGO (Garbage In, Garbage Out) is fundamental. Any database related application is only as good as the quality of the data it manages.
No matter how flash, useful or sophisticated your application, if the data is unreliable, the application is unreliable.
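The datatype and sizing points above can be sketched in a few lines of SQLite (the `staff` table here is just an example): store the staff number as TEXT so leading zeros survive, and let a CHECK constraint reject oversized values at insert time.

```python
import sqlite3

# Illustrative only: staff_no is character data with a size limit
# enforced by the database, not by application code.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE staff (
        staff_no TEXT NOT NULL CHECK (length(staff_no) <= 10),
        name     TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO staff VALUES ('0042317', 'R. Allnor')")

row = conn.execute("SELECT staff_no FROM staff").fetchone()
print(row[0])     # the leading zero is preserved: '0042317'

rejected = False
try:
    # 11 characters: violates the CHECK constraint.
    conn.execute("INSERT INTO staff VALUES ('12345678901', 'too long')")
except sqlite3.IntegrityError:
    rejected = True   # the database alerted us to bogus data
print("rejected:", rejected)
```

Had `staff_no` been declared INTEGER, '0042317' would have silently become 42317 — exactly the kind of quiet data corruption the constraints are there to prevent.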
While analysis and modeling are important, it's also important to actually get something up and running. I'm a big believer in doing prototypes. No matter how much analysis, planning and design you do, there will always be things you discover or realise during the implementation that just were not obvious in the planning/design stage. Just trying to do the implementation teaches you a lot about your problem domain that won't be obvious from reading or thinking alone. Identify the core functionality you want to address. Keep it simple and avoid the temptation to add additional functionality (note it down for later, but move on). Keep it really simple and try to solve your key problem first. Add bells and whistles later when you're more comfortable with the problem domain and have a better understanding of it. Try to get something out as quickly as possible and, if others are going to use it, get them to start playing with it and get feedback.
HTH Tim
-- tcross (at) rapttech dot com dot au
Received on Sun Sep 07 2008 - 05:12:57 CEST