Re: Data model for websites

From: Kenneth Downs <firstinit.lastname_at_lastnameplusfam.net>
Date: Sun, 17 Oct 2004 09:19:55 -0400
Message-ID: <shrtkc.mkd.ln_at_mercury.downsfam.net>


Fernando Rodríguez wrote:

> Hi,
>
> I'm writing a webcrawler to analyse the structure of websites. What would
> be the best data model to represent the net of interlinked pages in a
> relational database? O:-)
>
> I would need to quickly retrieve the links from a given page as well as
> the links to a given page.
>
> Thanks

I have never thought about this before, but at its core it seems you'd have a simple cross-reference of pages, something like:

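-- one row per hyperlink found during the crawl
create table links (
  page_from varchar(100),  -- URL of the page that contains the link
  page_to varchar(100)     -- URL of the page the link points to
);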

As you crawl the site, you create one entry per link found. Filtering on page_from gives the links out of a given page, and filtering on page_to gives the links into it.
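
To make the two lookups concrete, here is a minimal sketch using Python's built-in sqlite3 module with the same links table. The example.com URLs, the helper names links_from and links_to, and the two indexes are assumptions added for illustration; they are not part of the original suggestion. The indexes are there because the question asked for quick retrieval in both directions.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table links (page_from varchar(100), page_to varchar(100))"
)

# Assumption: one index per column, so both lookups avoid a full table scan.
conn.execute("create index links_from_idx on links (page_from)")
conn.execute("create index links_to_idx on links (page_to)")

# Hypothetical crawl results: (page containing the link, page linked to).
crawled = [
    ("http://example.com/", "http://example.com/a"),
    ("http://example.com/", "http://example.com/b"),
    ("http://example.com/a", "http://example.com/b"),
]
conn.executemany(
    "insert into links (page_from, page_to) values (?, ?)", crawled
)


def links_from(conn, url):
    """Return the pages that `url` links to."""
    rows = conn.execute(
        "select page_to from links where page_from = ? order by page_to",
        (url,),
    )
    return [row[0] for row in rows]


def links_to(conn, url):
    """Return the pages that link to `url`."""
    rows = conn.execute(
        "select page_from from links where page_to = ? order by page_from",
        (url,),
    )
    return [row[0] for row in rows]


outgoing = links_from(conn, "http://example.com/")
incoming = links_to(conn, "http://example.com/b")
```

The same two select statements should work largely unchanged on other relational databases; only the connection code is specific to SQLite.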

I used varchar(100) to allow for long, complete URLs, though it may be better to shorten them.
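
One possible reading of "shorten" is to store each URL once and refer to it by a small integer key. The sketch below assumes that interpretation; the pages table, the page_id column, and the helpers page_id and add_link are hypothetical names, and it again uses Python's sqlite3 module only to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each distinct URL is stored once and gets a compact integer key.
create table pages (
  page_id integer primary key,
  url     text not null unique
);
-- The cross-reference now holds two integers per link instead of two URLs.
create table links (
  page_from integer not null references pages (page_id),
  page_to   integer not null references pages (page_id),
  primary key (page_from, page_to)
);
-- The primary key covers lookups by page_from; this covers page_to.
create index links_to_idx on links (page_to);
""")


def page_id(conn, url):
    """Return the key for `url`, inserting the page if it is new."""
    conn.execute("insert or ignore into pages (url) values (?)", (url,))
    row = conn.execute(
        "select page_id from pages where url = ?", (url,)
    ).fetchone()
    return row[0]


def add_link(conn, url_from, url_to):
    """Record a link; a repeated (from, to) pair is silently ignored."""
    conn.execute(
        "insert or ignore into links (page_from, page_to) values (?, ?)",
        (page_id(conn, url_from), page_id(conn, url_to)),
    )


add_link(conn, "http://example.com/", "http://example.com/a")
add_link(conn, "http://example.com/", "http://example.com/a")  # duplicate
add_link(conn, "http://example.com/a", "http://example.com/")

page_count = conn.execute("select count(*) from pages").fetchone()[0]
link_count = conn.execute("select count(*) from links").fetchone()[0]
```

The trade-off is an extra join whenever the URL text is needed, in exchange for smaller rows and indexes in the links table.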

-- 
Kenneth Downs
Use first initial plus last name at last name plus literal "fam.net" to
email me
Received on Sun Oct 17 2004 - 15:19:55 CEST
