Re: Non-text database theory

From: Volker Hetzer <>
Date: Thu, 11 Sep 2008 21:01:20 +0200
Message-ID: <gabpua$n2o$>

Rune Allnor wrote:
> Hi all.
> This might be off topic for this group; if so please direct me to a more
> appropriate group.
> I have 20 years of programming experience (hobby / personal scale) and
> am getting my feet wet with databases for the first time. The project at
> hand needs a database to handle large amounts of data. The data are
> measured by sonar and amounts to the hundreds of GB, so one would
> prefer to save the data on some binary format to save time on the
> text <-> binary conversions.
Sounds like modeling isn't the big thing in your application. (Might play a role though.)
Offhand I can think of three standard applications that store massive amounts of binary data, with a bit of meta stuff around it:
- pornographic sites have to serve huge amounts of imagery and videos. You might look down your nose at it but in terms of design and technology they are state of the art in private enterprises.
- radio telescopes process and filter even greater amounts of data, much like your sonar data but orders of magnitude more.
- the storage and processing facilities of film studios are geared to storage, retrieval and shifting of data around to various processing facilities.

> The textbooks I have found on database theory solely deal with text
> data, i.e. data that are stored as tables in text files, which I suppose
> is OK for educational purposes.
> 1) Where can I find material on 'real-life' databases which deal with the
> storage and handling of binary data?
As others here have already said, most databases can store blobs either internally or (transparently) in a file system. You still access them through the database, but this lets you, for instance, keep the meta data locally and all the binary stuff on a network drive on a file server or a large storage area network.
It gets more "real life" if you read the database-specific documentation. This here, for instance, is for Oracle 11g:

> 2) Are there database implementations which are better suited for my
> application than others? I would like to keep the application platform
> independent, and use C++ as my programming language.
I'm not sure about database independence; I don't think BLOB access has been standardized. But this would be just a couple of classes with some database-dependent innards and a generic interface. Normally BLOB access means that you either read from a stream or fetch the data in packets, and every database that supports BLOBs offers this functionality in one shape or another, which you can easily repackage for generic access.
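To make the "couple of classes" idea concrete, here is a minimal sketch in C++. All names (BlobReader, MemoryBlobReader, read_all) are made up for illustration and belong to no real database API. The in-memory backend only stands in for the database-dependent part; a real one would wrap whatever piecewise-read or stream call the vendor's client library provides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical generic interface: application code sees only this class.
class BlobReader {
public:
    virtual ~BlobReader() = default;
    // Copies up to max_bytes into out and returns the number of bytes
    // copied; a return value of 0 signals the end of the BLOB.
    virtual std::size_t read(std::uint8_t* out, std::size_t max_bytes) = 0;
};

// Stand-in backend that serves a BLOB from memory. The database-dependent
// innards of a real backend would replace the body of read().
class MemoryBlobReader : public BlobReader {
public:
    explicit MemoryBlobReader(std::vector<std::uint8_t> data)
        : data_(std::move(data)) {}

    std::size_t read(std::uint8_t* out, std::size_t max_bytes) override {
        const std::size_t n = std::min(max_bytes, data_.size() - pos_);
        std::copy_n(data_.begin() + static_cast<std::ptrdiff_t>(pos_), n, out);
        pos_ += n;
        return n;
    }

private:
    std::vector<std::uint8_t> data_;
    std::size_t pos_ = 0;
};

// Database-independent helper: drains any reader in fixed-size packets.
std::vector<std::uint8_t> read_all(BlobReader& reader, std::size_t packet_size) {
    assert(packet_size > 0);
    std::vector<std::uint8_t> result;
    std::vector<std::uint8_t> packet(packet_size);
    while (true) {
        const std::size_t n = reader.read(packet.data(), packet.size());
        if (n == 0) break;  // end of BLOB
        result.insert(result.end(), packet.begin(),
                      packet.begin() + static_cast<std::ptrdiff_t>(n));
    }
    return result;
}
```

Switching databases then means writing one more subclass of BlobReader; read_all and the rest of the application stay untouched.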
As for C++ and platform independence, not sure about that. Most databases offer
- the generic interface (ODBC, OLEDB, ADO.NET) for the platform,
- the database specific (OC(C)I, libmysqlclient) but platform independent interface or
- Java connectivity as platform and database independent but language specific interface.
You decide.

As for also storing the meta data in a hierarchical structure, I think it's worth investigating the meshed approach of the entity relationship model. The tree is a special case of it, but you'll soon find that an ERM allows you to model your data more precisely and gives you more powerful retrieval possibilities.
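A small C++ sketch of the difference, with invented entities (Survey, Sensor, Recording) that are only assumptions about what sonar meta data might look like. In a tree each recording hangs under exactly one parent. Here it refers to two entities at once, so it can be retrieved along either relationship.

```cpp
#include <string>
#include <vector>

// Hypothetical entities; the fields are chosen for illustration only.
struct Survey { int id; std::string area; };
struct Sensor { int id; std::string model; };

// A recording is related to a survey AND to a sensor. A strict tree
// would force one of the two to be the sole parent.
struct Recording {
    int id;
    int survey_id;
    int sensor_id;
    std::string blob_path;  // where the binary payload is kept
};

struct Catalog {
    std::vector<Survey> surveys;
    std::vector<Sensor> sensors;
    std::vector<Recording> recordings;
};

// Retrieval along the survey relationship.
std::vector<int> recordings_for_survey(const Catalog& c, int survey_id) {
    std::vector<int> ids;
    for (const Recording& r : c.recordings)
        if (r.survey_id == survey_id) ids.push_back(r.id);
    return ids;
}

// Retrieval along the sensor relationship, cutting across all surveys.
std::vector<int> recordings_for_sensor(const Catalog& c, int sensor_id) {
    std::vector<int> ids;
    for (const Recording& r : c.recordings)
        if (r.sensor_id == sensor_id) ids.push_back(r.id);
    return ids;
}
```

In a real database these would be tables with foreign keys and the two functions would be plain queries; the structs only mirror that shape.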

So, without knowing the slightest thing about the technical environment your solution has to operate in, nothing about the kind of queries that are run, nothing about required performance or reliability and not much about security, I'd recommend some kind of database that allows you to separate blob storage and meta data storage.
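Stripped down to the bare idea, the separation could look like the following C++ sketch. PingMeta, store_ping and load_ping are hypothetical names, and a plain file stands in for the blob store. The meta data record is the part that would go into the database; the samples are written as raw bytes, so no text <-> binary conversion takes place.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical meta data row: this is what would live in the database.
struct PingMeta {
    int id;
    std::string blob_path;     // payload location (local disk, file server, SAN)
    std::size_t sample_count;  // number of float samples in the payload
};

// Writes the samples as raw bytes and returns the matching meta data record.
// Note: a raw float dump is not portable across byte orders or float formats.
PingMeta store_ping(int id, const std::string& path,
                    const std::vector<float>& samples) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) throw std::runtime_error("cannot open " + path);
    out.write(reinterpret_cast<const char*>(samples.data()),
              static_cast<std::streamsize>(samples.size() * sizeof(float)));
    out.flush();
    if (!out) throw std::runtime_error("write failed: " + path);
    return PingMeta{id, path, samples.size()};
}

// Reads the payload back, using only what the meta data record says.
std::vector<float> load_ping(const PingMeta& meta) {
    std::ifstream in(meta.blob_path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + meta.blob_path);
    std::vector<float> samples(meta.sample_count);
    const auto expected =
        static_cast<std::streamsize>(samples.size() * sizeof(float));
    in.read(reinterpret_cast<char*>(samples.data()), expected);
    if (in.gcount() != expected)
        throw std::runtime_error("short read: " + meta.blob_path);
    return samples;
}
```

Since the platform should stay open, the payload format would need a fixed byte order and sample type written down somewhere, ideally in the meta data itself.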

For tamper-proofing the whole thing, securing the database and file server would be a start.
Everything else (database accounts, roles, grants, audit trails, encryption and so on) depends greatly on your application and its users. (How many? How often do they change? Etc.)

Lots of Greetings!

For email replies, please substitute the obvious.
Received on Thu Sep 11 2008 - 21:01:20 CEST
