
Algorithm or ideas wanted for creative text parsing

From: rjamya <rjamya_at_gmail.com>
Date: Mon, 10 Apr 2006 12:51:54 -0400
Message-ID: <9177895d0604100951m4d4cc74dy4cabf0148bbf8b5@mail.gmail.com>


Basically, I am looking to isolate just the (distinct) domain name from the fully qualified domain names that you'd normally see in web surfing.

I am working on a couple of techniques, but it gets complicated since TLDs differ in format and there is only so much you can do with substr().

Sample data:

a836.v8519e.c8519.g.vm.akamaistream.net
a705.l1923962123.c19239.n.lm.akamaistream.net

db.c7.bf.a0.top.list.ru
a1657.l1923962104.c19239.n.lm.akamaistream.net
a1181.v21080b.c21080.g.vm.akamaistream.net
dl1.games.vip.scd.yahoo.com
lcp.mud.us.music.yahoo.com
www.celhs.osceola.k12.fl.us

www.celhs.osceola.k12.fl.us
www.celhs.osceola.k12.fl.us
w.s0.gc.sj.ipixmedia.com
w.s0.gc.sj.ipixmedia.com
v.s0.gc.sj.ipixmedia.com

us.1.p6.webhosting.yahoo.com
p1.music.vip.sc5.yahoo.com
lib1.store.vip.sc5.yahoo.com
www.twingroves.district96.k12.il.us
www.twingroves.district96.k12.il.us
www.the-simpsons.hpg.ig.com.br
www.schools.pinellas.k12.fl.us
www.rails4days.pwp.blueyonder.co.uk
www.rails4days.pwp.blueyonder.co.uk
www.garrp.dhr.state.ga.us
www.celhs.osceola.k12.fl.us
www.williamrobertson.pwp.blueyonder.co.uk
www.williamrobertson.pwp.blueyonder.co.uk
lcp.mud.us.music.yahoo.com
c.s0.gc.sj.ipixmedia.com
c.s0.gc.sj.ipixmedia.com

ax.phobos.apple.com
ax.phobos.apple.com
0982660.1206.feed.yellowpagecity.com
0982660.1207.feed.yellowpagecity.com

And by some magic the output should be:

akamaistream.net
apple.com
yahoo.com
fl.us
ipixmedia.com
il.us
ig.com.br
blueyonder.co.uk
ga.us
yellowpagecity.com
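
To make the rule implied by that output concrete, here is a rough sketch in Python (for illustration only, not SQL). It keeps the last two labels of each name, or the last three when a two-letter country TLD is preceded by a generic second-level label such as "co" or "com". The names base_domain, distinct_base_domains and GENERIC_SECOND_LEVEL are hypothetical, and the label set is an assumed, incomplete list rather than an authoritative one.

```python
# Assumption: second-level labels that behave like part of the suffix
# under a two-letter country TLD (e.g. co.uk, com.br). Incomplete list.
GENERIC_SECOND_LEVEL = {"co", "com", "net", "org", "gov", "edu", "ac"}


def base_domain(fqdn):
    """Return the trailing part of fqdn that identifies the domain."""
    labels = fqdn.strip().lower().rstrip(".").split(".")
    if len(labels) <= 2:
        return ".".join(labels)
    tld, second = labels[-1], labels[-2]
    # Keep three labels for names like blueyonder.co.uk, otherwise two.
    keep = 3 if len(tld) == 2 and second in GENERIC_SECOND_LEVEL else 2
    return ".".join(labels[-keep:])


def distinct_base_domains(fqdns):
    """Return the sorted distinct base domains, skipping blank lines."""
    return sorted({base_domain(name) for name in fqdns if name.strip()})
```

Under these assumptions, www.celhs.osceola.k12.fl.us maps to fl.us and www.the-simpsons.hpg.ig.com.br maps to ig.com.br, matching the list above. The hard part is the lookup set: any real solution needs a maintained list of multi-label suffixes.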

Any ideas or thoughts? I'd prefer to do this in SQL if possible; otherwise, PL/SQL. The data is already in a 10.1.0.4 database.

Thanks in advance
Raj



Got RAC?
--

http://www.freelists.org/webpage/oracle-l Received on Mon Apr 10 2006 - 11:51:54 CDT
