Re: newbie question

From: Heinz Huber <Heinz.Huber_at_elbanet.co.at>
Date: 2000/06/07
Message-ID: <393DECE1.8157C615_at_elbanet.co.at>#1/1


Hi!

Olivier Deme wrote:
>
> Hello,
>
> I am a complete beginner with database programming and I need some help.
>
> I have written a program that parses source code and extracts symbols in a
> database.
>
> An entry consists of:
>
> - The key = A symbol (a string of characters)
> - The data = Symbol type + Filename + Lineno (the data is also stored as a string)
>
> I am using the Berkeley DB freeware for managing my database.
>
> The problem is that, for a big input (source code > 10000 lines), my database
> becomes very large. Adding entries quickly becomes slow because of the
> database size.
>
> I believe the problem comes from the way I am storing the entries in the
> database.
>
> In the same database, I probably have thousands of entries with the same key
> but with slight variations of data.
>
> I believe I should use cross reference tables in order to decrease my database
> size.
> Unfortunately, I don't know how to proceed.

It seems that you're looking for a design like this:

TABLE Symbols

    Key      INTEGER (some kind of Autonumber/Identity/...)
    Name     CHAR (..)

    PRIMARY KEY (Key)
    UNIQUE INDEX Symbol_Name ON Symbols (Name)

TABLE Data

    Key      INTEGER (refers to Symbols.Key)
    File     CHAR (..)
    Line     INTEGER

    possibly PRIMARY KEY (Key, File, Line)

When you find a symbol in your file, look up whether it is already in the Symbols table. If it is, you have its key; otherwise, insert it and get the new key. Then insert a row into the Data table.

Provided your symbols are significantly larger than an integer, this should reduce the database size, since each symbol string is stored only once. It should also increase speed, since integer lookups are generally faster than character ones.
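
To make the lookup-then-insert step concrete, here is a rough sketch in Python using the standard sqlite3 module. This is only an illustration under assumptions: SQLite stands in for whatever database you end up using, TEXT replaces the unspecified CHAR lengths, and the function names (open_db, symbol_key, add_occurrence) are made up for the example. The symbol type from your original entry is left out, as in the tables above.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the two tables if they are missing."""
    conn = sqlite3.connect(path)
    # "Key" is quoted because KEY is also an SQL keyword.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS Symbols (
            "Key" INTEGER PRIMARY KEY,   -- assigned automatically by SQLite
            Name  TEXT NOT NULL UNIQUE   -- each symbol string stored once
        );
        CREATE TABLE IF NOT EXISTS Data (
            "Key" INTEGER NOT NULL REFERENCES Symbols ("Key"),
            File  TEXT NOT NULL,
            Line  INTEGER NOT NULL,
            PRIMARY KEY ("Key", File, Line)
        );
    """)
    return conn


def symbol_key(conn, name):
    """Return the key for a symbol, inserting the symbol if it is new."""
    row = conn.execute(
        'SELECT "Key" FROM Symbols WHERE Name = ?', (name,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute("INSERT INTO Symbols (Name) VALUES (?)", (name,))
    return cur.lastrowid


def add_occurrence(conn, name, filename, lineno):
    """Record one occurrence of a symbol; exact duplicates are ignored."""
    key = symbol_key(conn, name)
    conn.execute(
        'INSERT OR IGNORE INTO Data ("Key", File, Line) VALUES (?, ?, ?)',
        (key, filename, lineno),
    )
    return key
```

Your parser would call add_occurrence(conn, symbol, filename, lineno) for every symbol it finds and commit once at the end (conn.commit()). Committing after every single insert is usually much slower.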

hth,
Heinz

Received on Wed Jun 07 2000 - 00:00:00 CEST
