
Re: Algorithm or ideas wanted for creative text parsing

From: Richard Ji <richard.c.ji_at_gmail.com>
Date: Mon, 10 Apr 2006 13:59:13 -0400
Message-ID: <b4d52f20604101059g26f7d7eej893b7fa876090901@mail.gmail.com>


Raj,

.tv is 2 characters, yet it's not used like a typical ccTLD, so the length test alone isn't enough. You need to get a list of valid ISO two-letter country codes.

Richard Ji

On 4/10/06, rjamya <rjamya_at_gmail.com> wrote:
> Thanks SF and all
>
> maybe here is what I can do ...
>
> 1. if the domain is numeric, take it as it is
> 2. if the TLD (i.e. the last piece) is 3 or more characters, take the
> last 2 pieces
> (this will cover com,org,edu,name,info,museum etc)
> 3. if the last piece is 2 characters (most likely a ccTLD), take last 3 pieces
> (i.e. il, br, ca, uk etc)
>
> hmmm ... looks promising, am I missing anything?
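
The three quoted rules, together with the suggestion to check two-letter endings against a known list, could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `base_domain`, the set `SAMPLE_CC_TLDS` (a tiny hypothetical subset built from the endings mentioned in the thread, not a real ISO list), and the reading of "numeric" as a dotted IPv4 address are not from the original messages. Falling back to the last 2 pieces for a two-letter ending that is not in the list is likewise an assumption.

```python
import re

# Hypothetical, illustrative subset taken from the endings named in the thread.
# A real implementation would load a complete, maintained list instead.
SAMPLE_CC_TLDS = {"il", "br", "ca", "uk"}

# Assumption: "numeric domain" means a dotted IPv4 address such as 10.0.0.1.
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def base_domain(host, cc_tlds=SAMPLE_CC_TLDS):
    """Reduce a host name to its base domain using the quoted heuristic."""
    host = host.strip().lower().rstrip(".")

    # Rule 1: a numeric (IP) host is taken as it is.
    if IPV4_RE.match(host):
        return host

    parts = host.split(".")
    tld = parts[-1]

    # Rule 3: a two-letter ending found in the country-code list keeps 3 pieces.
    # Rule 2 (and the assumed fallback for unlisted two-letter endings such as
    # "tv"): keep the last 2 pieces.
    keep = 3 if len(tld) == 2 and tld in cc_tlds else 2

    # Slicing is safe even when the host has fewer pieces than `keep`.
    return ".".join(parts[-keep:])


if __name__ == "__main__":
    for name in ("www.example.com", "www.example.co.uk",
                 "news.example.tv", "192.168.1.10"):
        print(name, "->", base_domain(name))
```

One limitation worth noting: for a country-code ending where names are registered directly at the second level, this heuristic keeps one piece too many, e.g. `www.example.ca` is returned unchanged.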

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Apr 10 2006 - 12:59:13 CDT
