near duplicates in short text fields
Date: Fri, 15 Aug 2008 20:08:04 +0200
Message-ID: <g84gm4$dep$02$2_at_news.t-online.com>
Hi,
Can anybody tell me how to find near duplicates among a large number (20 million) of short text labels?
Is there a database tool that does exactly this?
Here are some examples:
not near:
Rugby Polo - black/white - S; (Angebot von Kabelmeister)
Rugby Shirt Striped - aqua/white - S; (Angebot von Kabelmeister)
near:
Rugby Shirt Striped - aqua/white - S; (Angebot von Kabelmeister)
Shirt Striped - aqua/white - S; (Angebot von)
near:
301 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT BLAU in L (eBay Shop jeanspoint74)
482 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT SCHWARZ in L (eBay Shop jeanspoint74)
near:
482 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT SCHWARZ in L (eBay Shop jeanspoint74)
482 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT WEISS in M (eBay Shop jeanspoint74)
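One common way to frame this question is to treat each label as a set of word tokens and call two labels near duplicates when their Jaccard similarity (shared distinct tokens divided by all distinct tokens) reaches some threshold. The Python sketch below assumes that definition, a hypothetical threshold of 0.7 and hypothetical helper names (tokens, jaccard, near_duplicates); neither the post nor any particular database tool prescribes them. To avoid comparing every label with every other, it uses prefix filtering, an exact technique from the similarity-join literature: each label's rarest few tokens go into an inverted index, only labels sharing one of those tokens become candidates, and every candidate is then checked exactly.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokens(label: str) -> frozenset[str]:
    """Lower-cased word tokens; separators such as '-', '/', ';' and '()' are dropped."""
    return frozenset(re.findall(r"\w+", label.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """|a & b| / |a | b|: the share of distinct tokens two labels have in common."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def near_duplicates(labels: list[str], threshold: float = 0.7) -> list[tuple[int, int, float]]:
    """Return (j, i, similarity) with j < i for every pair whose token Jaccard >= threshold.

    Prefix filtering: if J(x, y) >= t, the sets share at least
    ceil(t * max(|x|, |y|)) tokens. With every set sorted in one global
    order, the first |x| - ceil(t * |x|) + 1 tokens of x and the matching
    prefix of y must then have a token in common. Only those prefixes are
    indexed, rarest tokens first, so candidate lists stay short and no
    qualifying pair is missed.
    """
    sets = [tokens(label) for label in labels]
    freq = Counter(tok for s in sets for tok in s)
    # One global order for all labels: rarest token first, ties alphabetical.
    ordered = [sorted(s, key=lambda tok: (freq[tok], tok)) for s in sets]

    # token -> indices of earlier labels that have it in their prefix
    index: defaultdict[str, list[int]] = defaultdict(list)
    results: list[tuple[int, int, float]] = []
    for i, toks in enumerate(ordered):
        if not toks:
            continue
        # Small epsilon in case threshold * n comes out a hair above an integer.
        prefix_len = len(toks) - math.ceil(threshold * len(toks) - 1e-9) + 1
        candidates: set[int] = set()
        for tok in toks[:prefix_len]:
            candidates.update(index[tok])
            index[tok].append(i)
        # Verify each candidate exactly; the index only prunes pairs that cannot qualify.
        for j in sorted(candidates):
            sim = jaccard(sets[i], sets[j])
            if sim >= threshold:
                results.append((j, i, sim))
    return results


# The six distinct labels from the examples above.
EXAMPLES = [
    "Rugby Polo - black/white - S; (Angebot von Kabelmeister)",
    "Rugby Shirt Striped - aqua/white - S; (Angebot von Kabelmeister)",
    "Shirt Striped - aqua/white - S; (Angebot von)",
    "301 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT BLAU in L (eBay Shop jeanspoint74)",
    "482 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT SCHWARZ in L (eBay Shop jeanspoint74)",
    "482 LA RUGBY SEXY DISCO PARTY POLO T-SHIRT WEISS in M (eBay Shop jeanspoint74)",
]

if __name__ == "__main__":
    for j, i, sim in near_duplicates(EXAMPLES):
        print(f"{sim:.2f}  {EXAMPLES[j]}\n      {EXAMPLES[i]}")
```

On these six labels the sketch reports the three pairs marked "near" (similarities 7/9 ≈ 0.78 and 13/17 ≈ 0.76) and leaves out the "not near" pair (6/11 ≈ 0.55). The BLAU/L and WEISS/M labels score 12/18 ≈ 0.67 and so fall just under 0.7, which shows that the threshold is a judgment call for this kind of data. At 20 million labels the in-memory index would likely have to be sharded or kept in a database table holding the same token-to-label mapping; MinHash with locality-sensitive hashing is a common approximate alternative.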
Thanks
merkury