Re: Flat Query

From: Anne & Lynn Wheeler <lynn_at_garlic.com>
Date: Fri, 14 Oct 2005 10:27:56 -0600
Message-ID: <m3d5m8oygj.fsf_at_lhwlinux.garlic.com>


"Mark D Powell" <Mark.Powell_at_eds.com> writes:
> The term 'flat file' has been in use for more than 20 years. It was
> generally applied to files accessed in a sequential manner as opposed
> to being accessed via direct access or ISAM, Indexed Sequential Access
> Method, files. Over time, especially in the non-mainframe world, the
> term was usually used as a synonym for a text file.

or no-index and/or no (physical) structure ... other than a sequential set of bits (modulo sequential record boundaries).

however, there was some work on things like flat files having an implied structure ... like sorted records ... where entries were found by doing binary searches (taking advantage of the implied sorted-record structure and a filesystem structure that would support reading random records from a file ... in much the same way that disks allow reading random records ... as opposed to tape, which is strictly sequential).
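
a minimal sketch of that kind of binary search over a sorted flat file, using seek() to simulate random record reads ... the fixed 16-byte record length, null padding, and names here are hypothetical, just for illustration:

```python
import io

REC_LEN = 16  # hypothetical fixed record length

def read_record(f, i):
    # "random record read": with fixed-length records, record i starts
    # at a computable byte offset, much like a disk block address
    f.seek(i * REC_LEN)
    return f.read(REC_LEN)

def binary_search(f, nrecs, key):
    # classic binary search exploiting the implied sorted-record structure
    lo, hi = 0, nrecs - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        rec = read_record(f, mid)
        if rec.startswith(key):
            return mid
        if rec < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # not found
```

the same idea works against a real file descriptor; io.BytesIO just stands in for a flat file on disk.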

some resources were side-tracked from the system/r activity (the original relational/sql implementation)
http://www.garlic.com/~lynn/subtopic.html#systemr

for an internal, online phone book. this was done as a large (sorted record) flat file (over 25 years ago). there was work done comparing binary search to radix search ... i.e. rather than treating records as consisting of totally random bits for a binary search ... pro-rate the search probes based on the letter sequence of the search argument. initially this assumed a uniform letter frequency distribution; it was further refined by supplying the phone book search program with the actual first-letter frequency distribution of names in the phone book.

binary search requires an avg. number of probes on the order of log2 of the number of records in the file ... i.e. 64k records requires 16 probes. letter frequency radix search reduced that to closer to five probes.
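
a rough sketch of the pro-rated probe idea under the uniform first-letter assumption ... the refined version would substitute measured first-letter frequencies, and a real implementation would re-interpolate against the records actually probed; the record layout, names, and frac() helper are all hypothetical:

```python
import io

REC_LEN = 16  # hypothetical fixed record length

def read_record(f, i):
    f.seek(i * REC_LEN)
    return f.read(REC_LEN)

def frac(key):
    # uniform first-letter assumption: map 'a'..'z' onto [0, 1)
    # (the refined version used the actual first-letter frequency
    # distribution of names in the file instead)
    return max(0.0, min(1.0, (key[0] - ord('a')) / 26.0))

def radix_search(f, nrecs, key):
    lo, hi = 0, nrecs - 1
    probes = 0
    while lo <= hi:
        # pro-rate the probe position instead of always splitting in half;
        # worst case degrades toward linear, but the typical probe count
        # drops well below log2(nrecs)
        pos = lo + int(frac(key) * (hi - lo))
        probes += 1
        rec = read_record(f, pos)
        if rec.startswith(key):
            return pos, probes
        if rec < key:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1, probes
```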

translation to a unix filesystem was done assuming an avg. record size. mainframe filesystems supported the concept of records ... and API semantics that allowed reading random records (both fixed-length records ... which is the simpler case ... as well as variable-length records ... which is a little more difficult). the unix filesystem API basically allows reading random bytes from a file. record structure is a characteristic of implicit in-band data in the file (i.e. null terminated) as opposed to explicit out-of-band information directly supported by the filesystem. as a result, the letter frequency radix search had to perform a little magic simulating random record reads on top of a (unix) API that provided only simple random character reads.
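
the contrast can be sketched like this: with fixed-length records the boundary is out-of-band (a known length), so a random record read is one seek plus one read; with in-band delimiters, reaching record i means scanning every byte in front of it (the helper names and the newline delimiter here are illustrative):

```python
import io

def read_fixed_record(f, i, rec_len):
    # out-of-band record structure: boundary is the known length,
    # so record i is a single seek + read
    f.seek(i * rec_len)
    return f.read(rec_len)

def read_delimited_record(f, i):
    # in-band record structure: the boundary lives in the data itself,
    # so there is no way to seek directly to record i without an
    # auxiliary index ... every earlier byte has to be scanned
    f.seek(0)
    for _ in range(i):
        if not f.readline():
            return b""  # ran off the end of the file
    return f.readline().rstrip(b"\n")
```

the "little magic" in the phone book search amounted to layering something like read_fixed_record (with an assumed avg. record size) on top of the byte-oriented API.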

one might also consider that this helped contribute to lots of databases on unix platforms being implemented on raw disk ... instead of thru the filesystem ... since raw disk at least provided record-oriented API semantics (which got quite obfuscated in a unix filesystem environment).

-- 
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
Received on Fri Oct 14 2005 - 18:27:56 CEST
