Re: newbie question
Date: 2000/06/07
Message-ID: <393DECE1.8157C615_at_elbanet.co.at>
Hi!
Olivier Deme wrote:
>
> Hello,
>
> I am a complete beginner with database programming and I need some help.
>
> I have written a program that parses source code and extracts symbols into a
> database.
>
> An entry consists of:
>
> - The key = A symbol (a string of characters)
> - The data = Symbol type + Filename + Lineno (data is stored also as a string)
>
> I am using the Berkeley DB freeware for managing my database.
>
> The problem is that, for a big input (source code > 10000 lines), my database
> becomes very, very big. Adding entries to the database quickly becomes
> very slow because of the database size.
>
> I believe the problem comes from the way I am storing the entries in the
> database.
>
> In the same database, I probably have thousands of entries with the same key
> but with slight variations of data.
>
> I believe I should use cross-reference tables in order to decrease my database
> size.
> Unfortunately, I don't know how to proceed.
It seems that you're looking for a design like this:
TABLE Symbols
  Key   INTEGER    (some kind of Autonumber/Identity/...)
  Name  CHAR (..)
  PRIMARY KEY (Key)
UNIQUE INDEX Symbol_Name ON Symbols (Name);

TABLE Data
  Key   INTEGER
  File  CHAR (..)
  Line  INTEGER
  possibly PRIMARY KEY (Key, File, Line)
When you find a symbol in your file, look up whether it is already in the
Symbols table. If it is, you have its key; otherwise, insert it and get the
newly assigned key. Then insert a row into the Data table.
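As a rough sketch of the two tables and the "look up, else insert" step, here is a version using Python's built-in sqlite3 module. SQLite is only a stand-in chosen for illustration; it is not the Berkeley DB you are using, and the function names (create_schema, symbol_key, record_occurrence) are hypothetical. CHAR (..) becomes TEXT because SQLite does not need a length. In SQLite, a column declared INTEGER PRIMARY KEY gets its value assigned automatically on insert, which plays the role of the Autonumber/Identity column.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Symbols: one row per distinct name. INTEGER PRIMARY KEY makes SQLite
    # assign the key automatically; UNIQUE on Name creates an index on it.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Symbols ("
        " Key INTEGER PRIMARY KEY,"
        " Name TEXT NOT NULL UNIQUE)"
    )
    # Data: one row per occurrence, referring to the symbol by its integer key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Data ("
        " Key INTEGER NOT NULL REFERENCES Symbols(Key),"
        " File TEXT NOT NULL,"
        " Line INTEGER NOT NULL,"
        " PRIMARY KEY (Key, File, Line))"
    )


def symbol_key(conn: sqlite3.Connection, name: str) -> int:
    """Return the key for `name`, inserting it into Symbols if it is new."""
    row = conn.execute("SELECT Key FROM Symbols WHERE Name = ?", (name,)).fetchone()
    if row is not None:
        return row[0]  # already known: reuse its key
    cur = conn.execute("INSERT INTO Symbols (Name) VALUES (?)", (name,))
    return cur.lastrowid  # key SQLite assigned to the new row


def record_occurrence(conn: sqlite3.Connection, name: str, file: str, line: int) -> None:
    """Store one occurrence of a symbol as an integer key plus file and line."""
    key = symbol_key(conn, name)
    # OR IGNORE skips exact duplicates of (Key, File, Line).
    conn.execute(
        "INSERT OR IGNORE INTO Data (Key, File, Line) VALUES (?, ?, ?)",
        (key, file, line),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    record_occurrence(conn, "main", "a.c", 10)
    record_occurrence(conn, "main", "b.c", 3)
    conn.commit()
    for row in conn.execute(
        "SELECT Name, File, Line FROM Data JOIN Symbols USING (Key) ORDER BY File"
    ):
        print(row)
```

The lookup followed by an insert assumes a single writer, which fits a parser that fills the database in one pass. Committing once after a whole file, rather than after every row, usually makes bulk loading much faster.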
Provided your symbol names are significantly longer than an integer, this
should reduce size, because each name is stored once in Symbols instead of
once per occurrence. It should also increase speed, since integer
comparisons are faster than string comparisons.
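To make the size argument concrete, here is a back-of-the-envelope count of only the bytes spent on symbol names in each layout. The occurrence list is made up for illustration, and a fixed 4-byte integer key is an assumption. Real storage engines add their own per-row and index overhead, which this ignores. File and Line are stored the same way in both layouts, so they are left out.

```python
# Hypothetical occurrences: (symbol name, file, line). Illustrative data only.
occurrences = [
    ("parse_expression_statement", "parser.c", 120),
    ("parse_expression_statement", "parser.c", 348),
    ("parse_expression_statement", "main.c", 57),
    ("lookup_symbol_table_entry", "symtab.c", 12),
    ("lookup_symbol_table_entry", "parser.c", 401),
]

KEY_BYTES = 4  # assumption: an integer key takes 4 bytes


def flat_name_bytes(occs):
    # One table: every occurrence repeats the full symbol name.
    return sum(len(name) for name, _, _ in occs)


def normalized_name_bytes(occs):
    # Two tables: each distinct name is stored once together with its key,
    # and each occurrence in Data stores only the integer key.
    names = {name for name, _, _ in occs}
    return sum(len(n) + KEY_BYTES for n in names) + KEY_BYTES * len(occs)


print(flat_name_bytes(occurrences), normalized_name_bytes(occurrences))
```

Here the flat layout spends 128 bytes on names and the two-table layout 79. The saving grows with the number of repeats per name and with name length. For names shorter than the key, the two-table layout can actually be larger, which is the point of the "provided" above.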
hth,
Heinz
Received on Wed Jun 07 2000 - 00:00:00 CEST