Re: Data model for websites
From: Kenneth Downs <firstinit.lastname_at_lastnameplusfam.net>
Date: Sun, 17 Oct 2004 09:19:55 -0400
Message-ID: <shrtkc.mkd.ln_at_mercury.downsfam.net>
Fernando Rodríguez wrote:
> Hi,
>
> I'm writing a webcrawler to analyse the structure of websites. What would
> be the best data model to represent the net of interlinked pages in a
> relational database? O:-)
>
> I would need to quickly retrieve the links from a given page as well as
> the links to a given page.
>
> Thanks
I have never thought about this before, but at its core it seems you would have a simple cross-reference of pages, something like:
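create table links (
    page_from varchar(100) not null,  -- URL of the page containing the link
    page_to   varchar(100) not null,  -- URL the link points to
    primary key (page_from, page_to)  -- one row per distinct link
);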
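As you crawl the site, you insert one row per link. To get the links from a given page, filter on page_from; to get the links to a given page, filter on page_to. An index on each column should keep both lookups quick.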
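For what it's worth, here is a rough sketch of both lookups, written in Python with the standard sqlite3 module only so it can be run as-is. The index name links_page_to, the helper names links_from and links_to, and the sample URLs are all made up for illustration; any SQL database would do the same job.

```python
import sqlite3

# In-memory SQLite database, used here only so the sketch runs as-is.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (
        page_from VARCHAR(100) NOT NULL,
        page_to   VARCHAR(100) NOT NULL,
        PRIMARY KEY (page_from, page_to)
    );
    -- Assumed extra index so lookups by page_to do not scan the table.
    CREATE INDEX links_page_to ON links (page_to);
""")

# Hypothetical crawl results as (page_from, page_to) pairs.
sample_links = [
    ("/index.html", "/about.html"),
    ("/index.html", "/contact.html"),
    ("/about.html", "/contact.html"),
]
conn.executemany("INSERT INTO links VALUES (?, ?)", sample_links)


def links_from(page):
    """Return the pages that `page` links to (outgoing links)."""
    rows = conn.execute(
        "SELECT page_to FROM links WHERE page_from = ? ORDER BY page_to",
        (page,),
    )
    return [row[0] for row in rows]


def links_to(page):
    """Return the pages that link to `page` (incoming links)."""
    rows = conn.execute(
        "SELECT page_from FROM links WHERE page_to = ? ORDER BY page_from",
        (page,),
    )
    return [row[0] for row in rows]
```

The primary key covers lookups by page_from, since that is its leading column; the second index is there for the page_to direction.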
I used varchar(100) to allow for long, complete URLs, though it may be better to shorten them.
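One way to shorten them, sketched below under my own assumptions, is to store each URL once in a separate table and have the links table refer to pages by a small integer key. The pages table, the column names page_id, from_id and to_id, and the helpers page_id, add_link and links_to are hypothetical names chosen for this example, again using Python's sqlite3 module so it can be run directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each URL is stored once and identified by an integer key.
    CREATE TABLE pages (
        page_id INTEGER PRIMARY KEY,
        url     TEXT NOT NULL UNIQUE
    );
    -- Links refer to pages by key instead of repeating the full URL.
    CREATE TABLE links (
        from_id INTEGER NOT NULL REFERENCES pages (page_id),
        to_id   INTEGER NOT NULL REFERENCES pages (page_id),
        PRIMARY KEY (from_id, to_id)
    );
    CREATE INDEX links_to_id ON links (to_id);
""")


def page_id(url):
    """Return the key for `url`, creating the pages row if it is new."""
    conn.execute("INSERT OR IGNORE INTO pages (url) VALUES (?)", (url,))
    row = conn.execute(
        "SELECT page_id FROM pages WHERE url = ?", (url,)
    ).fetchone()
    return row[0]


def add_link(from_url, to_url):
    """Record one link; a link seen again on a later crawl is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO links VALUES (?, ?)",
        (page_id(from_url), page_id(to_url)),
    )


def links_to(url):
    """Return the URLs of pages that link to `url`."""
    rows = conn.execute(
        """
        SELECT src.url
        FROM links
        JOIN pages AS src ON src.page_id = links.from_id
        JOIN pages AS dst ON dst.page_id = links.to_id
        WHERE dst.url = ?
        ORDER BY src.url
        """,
        (url,),
    )
    return [row[0] for row in rows]


# Hypothetical crawl results; the second call repeats the first link.
add_link("http://example.com/a", "http://example.com/b")
add_link("http://example.com/a", "http://example.com/b")
add_link("http://example.com/c", "http://example.com/b")
```

The trade-off is an extra join on every lookup in exchange for narrower link rows and no fixed limit on URL length.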
--
Kenneth Downs
Use first initial plus last name at last name plus literal "fam.net" to email me
Received on Sun Oct 17 2004 - 15:19:55 CEST