Re: Non-text database theory

From: Tim X <timx_at_nospam.dev.null>
Date: Sat, 06 Sep 2008 11:53:25 +1000
Message-ID: <87skseavl6.fsf_at_lion.rapttech.com.au>


Rune Allnor <allnor_at_tele.ntnu.no> writes:

> Hi all.
>
> This might be off topic for this group; if so please direct me to a
> more appropriate group.
>
> I have 20 years of programming experience (hobby / personal scale)
> and am getting my feet wet with databases for the first time. The
> project at hand needs a database to handle large amounts of data.
> The data are measured by sonar and amounts to the hundreds of GB, so
> one would prefer to save the data on some binary format to save time
> on the text <-> binary conversions.
>
> The textbooks I have found on database theory solely deal with text
> data, i.e. data that are stored as tables in text files, which I
> suppose is OK for educational purposes.
>
> 1) Where can I find material on 'real-life' databases which deal with
> the storage and handling of binary data?
> 2) Are there database implementations which are better suited for my
> application than others? I would like to keep the application
> platform independent, and use C++ as my programming language.
>

Many databases have the concept of a 'blob' (binary large object), which you could use. However, in most cases it isn't going to gain you much.

The data storage and retrieval aspects of a database are only part of the benefits of a DBMS. The real power comes from the ability to retrieve sets of data based on various criteria or attributes. However, with binary data, there is often little in the way of attributes that can be easily identified in the data itself. After all, it's just sequences of 1s and 0s. In fact, storing binary data in the database can actually complicate things because, more often than not, you will use other stand-alone applications to process the data. If it's in the database, you will now need to create some interface between the database and the application that processes the data. This could be as easy as having the database dump the data into a disk file that the application can then read, but then what has the database actually given you?

In most cases, however, you do have meta information about the data. This could be things like the date and time the data was obtained, the location, interesting characteristics, data size etc. This is the data I would store in the database, together with information about where the file is stored in the filesystem. The database could be responsible for generating unique filenames, which is very useful if you have lots of files: you don't have to think about naming, and you can use names that are less user friendly, such as plain sequential numbers. The DB might even manage a special filesystem hierarchy, grouping files into directories based on certain meta data attributes.
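
To make that concrete, here is a rough sketch of the idea, assuming SQLite and written in Python only to keep it short (the same calls exist in the SQLite C API, which is usable from C++). The table name `recordings`, its columns, the `store_recording` helper and the location/date directory layout are all made up for illustration; pick whatever attributes fit your sonar data.

```python
import sqlite3
from pathlib import Path

# Hypothetical schema: the table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS recordings (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    recorded_at TEXT NOT NULL,      -- ISO 8601 timestamp
    location    TEXT NOT NULL,
    size_bytes  INTEGER NOT NULL,
    path        TEXT UNIQUE         -- file location relative to the data root
)
"""


def store_recording(conn, root, recorded_at, location, payload):
    """Write payload to a database-assigned filename and record its metadata."""
    cur = conn.execute(
        "INSERT INTO recordings (recorded_at, location, size_bytes) "
        "VALUES (?, ?, ?)",
        (recorded_at, location, len(payload)),
    )
    rec_id = cur.lastrowid
    # Assumed layout: <location>/<YYYY-MM-DD>/<sequential id>.bin
    rel = Path(location) / recorded_at[:10] / f"{rec_id:08d}.bin"
    target = Path(root) / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    conn.execute(
        "UPDATE recordings SET path = ? WHERE id = ?",
        (rel.as_posix(), rec_id),
    )
    conn.commit()
    return rel.as_posix()
```

The binary data never enters the database; only the row id, the attributes and the relative path do, so the filename can be an opaque sequence number.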

This would give you the best of both worlds. You can obtain lists of data files from the database that represent data meeting certain characteristics, e.g. all data from a particular location, date or time, and at the same time you can use other data processing applications on the data directly at the filesystem level, without the additional DBMS layer (assuming the processing doesn't change meta information stored in the database).
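
As an illustration of the lookup side, something along these lines would do. Again this is only a sketch, assuming SQLite accessed from Python for brevity; the `recordings` table, its columns and the `find_files` helper are hypothetical names, not anything standard.

```python
import sqlite3
from pathlib import Path

# Hypothetical metadata table; the names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS recordings (
    id          INTEGER PRIMARY KEY,
    recorded_at TEXT NOT NULL,   -- ISO 8601 timestamp
    location    TEXT NOT NULL,
    path        TEXT NOT NULL    -- file location relative to the data root
)
"""


def find_files(conn, root, location, day):
    """Return paths of files recorded at `location` on `day` (YYYY-MM-DD)."""
    rows = conn.execute(
        "SELECT path FROM recordings "
        "WHERE location = ? AND recorded_at LIKE ? "
        "ORDER BY recorded_at",
        (location, day + "%"),
    )
    # The database only supplies the names; the data stays on disk.
    return [Path(root) / path for (path,) in rows]
```

The returned paths can be handed straight to whatever stand-alone tool processes the raw files, so the DBMS is only involved in selecting them.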

The other advantage of this approach is that you won't need one of the larger commercial databases, such as Oracle or DB2. In fact, you could probably use something like SQLite, MySQL or even Berkeley DB hashes.

HTH

Tim

-- 
tcross (at) rapttech dot com dot au
Received on Sat Sep 06 2008 - 03:53:25 CEST
