Re: Internet search engines and databases
Date: 2000/04/15
Message-ID: <#vRiDBrp$GA.233_at_cpmsnbbsa03>#1/1
> On Fri, 7 Apr 2000 21:11:44 -0700, "KenNorth" wrote:
> >Even if you got every Web site in the world to expose its metadata as XML,
> >you'd still have a problem with semantics. Is my <ID> the same as your <ID>?
Mark Preston wrote:
> With respect, Ken, I disagree for the following reasons.
> It matters not a bit - you will be using their DTD or Schema for their
> data, not yours. It only matters if you want to actually import their
> data into your database
If you are building a search engine that is going to search across thousands of Web sites, your focus is not importing data but scanning it to build indexes. The indexes are there for performance, and they are pointless if every query still has to access each site's DTDs or schemas.
The program that indexes your site has to scan the data, including DTDs or schemas. It should have some semantic understanding and recognize similarities, so the engine can return hits even when there is no precise match. For example, assume Acme made Transporters, but Megacorp came along and bought Acme.
Site A may contain this information:
<WarningID> 2300-01 </WarningID>
<Manufacturer> Acme </Manufacturer>
<Product> Transporter </Product>
<Warning> This model requires regular maintenance. MTBF is 5000 hours when
installed in starships powered by dilithium crystals. Service it before each
voyage and perform regular maintenance checks. </Warning>
Site B may contain this information:
<WarningID> 2300-01 </WarningID>
<Manufacturer> Megacorp </Manufacturer>
<Product> Transporter </Product>
<Hazard> This model requires regular maintenance. MTBF is 5000 hours when
installed in starships powered by dilithium crystals. Service it before each
voyage and perform regular maintenance checks. </Hazard>
The indexing process should recognize that these two records are the same, even though the manufacturer name and the <Warning>/<Hazard> element name differ, and even if site A's data and schemas are a version behind site B's.
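
To make the idea concrete, here is a minimal sketch of what such an indexing step might look like, in Python. It is only an illustration under stated assumptions: the synonym tables (TAG_SYNONYMS, MANUFACTURER_ALIASES), the wrapping <Record> element, and the function names are all hypothetical, and the tables are hand-written here. A real indexer would have to derive or curate that knowledge, which is exactly the hard part.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-curated knowledge the indexer is assumed to have.
TAG_SYNONYMS = {"Hazard": "Warning"}          # element names treated as equivalent
MANUFACTURER_ALIASES = {"Acme": "Megacorp"}   # Megacorp bought Acme

TEXT = (
    "This model requires regular maintenance. MTBF is 5000 hours when\n"
    "installed in starships powered by dilithium crystals. Service it before each\n"
    "voyage and perform regular maintenance checks."
)

SITE_A = (
    "<WarningID> 2300-01 </WarningID>\n"
    "<Manufacturer> Acme </Manufacturer>\n"
    "<Product> Transporter </Product>\n"
    f"<Warning> {TEXT} </Warning>"
)
SITE_B = (
    "<WarningID> 2300-01 </WarningID>\n"
    "<Manufacturer> Megacorp </Manufacturer>\n"
    "<Product> Transporter </Product>\n"
    f"<Hazard> {TEXT} </Hazard>"
)


def normalize(xml_fragment):
    """Map one site's record onto canonical element names and values."""
    # The fragments have no root element, so wrap them in an assumed <Record>.
    root = ET.fromstring(f"<Record>{xml_fragment}</Record>")
    record = {}
    for child in root:
        tag = TAG_SYNONYMS.get(child.tag, child.tag)
        # Collapse whitespace and line breaks so layout differences do not matter.
        value = " ".join((child.text or "").split())
        if tag == "Manufacturer":
            value = MANUFACTURER_ALIASES.get(value, value)
        record[tag] = value
    return record


def build_index(sites):
    """Group site names under the canonical form of their record."""
    index = {}
    for name, fragment in sites.items():
        key = tuple(sorted(normalize(fragment).items()))
        index.setdefault(key, []).append(name)
    return index


index = build_index({"Site A": SITE_A, "Site B": SITE_B})
for sites in index.values():
    print(sites)  # both sites end up under a single index entry
```

The normalization happens once, at indexing time, so a query against the index never has to consult either site's DTD or schema.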
- Ken North ======================
http://ourworld.compuserve.com/homepages/Ken_North
See you at AD2000 (www.applicationdevelopment.com)
XML DevCon 2000 (www.xmldevcon2000.com)