Re: Need to use a rare field separator in *.dat files

From: <fitzjarrell_at_cox.net>
Date: 10 Mar 2007 13:41:49 -0800
Message-ID: <1173562909.647466.218880@c51g2000cwc.googlegroups.com>

On Mar 10, 3:28 pm, "Ramon F Herrera" <r..._at_conexus.net> wrote:
> On Mar 10, 2:50 pm, "fitzjarr..._at_cox.net" <fitzjarr..._at_cox.net> wrote:
>
> > On Mar 10, 2:01 pm, "Ramon F Herrera" <r..._at_conexus.net> wrote:
>
> > > All my *.dat files are created programmatically and therefore I can
> > > insert any ASCII character as a field separator. I have used
> > > characters such as ':' or '|' to mark the border between the fields,
> > > but needless to say, the data may contain those characters.
>
> > > What I would like to do is use some rare (but visible) high-ASCII
> > > character that will essentially never be present in the raw
> > > datafiles.
>
> > > Is that the right approach? (In some cases I cannot switch to
> > > fixed-length records.)
>
> > > How do I define such a character in the *.ctl files?
>
> > > This is what I have now:
>
> > > FIELDS TERMINATED BY "|"
>
> > > TIA,
>
> > > -Ramon F Herrera
>
> > Using the extended ASCII table found here:
>
> >http://www.cdrummond.qc.ca/cegep/informat/Professeurs/Alain/files/asc...
>
> > you might consider using a hex value of A9 or greater as your
> > separator (presuming, of course, you can generate this character in
> > your .dat file), as characters at that end of the extended ASCII range
> > are not likely to be present in plain English text (the accented
> > letters from C0 upward are, however, common in other languages). A
> > simple
>
> > FIELDS TERMINATED BY X'A9'
>
> > in your .ctl file (as an example) should solve your problem, again
> > provided you can generate that character with your .dat file
> > generator.
>
> > Years ago I wrote a routine in a Pro*C program to scan the input file
> > and 'register' all characters in the file, after which I would select
> > the first ASCII character in my array having no occurrences as my
> > field separator. It worked well for years and took practically no
> > time at all (given the routine was written in C); however, choosing
> > some uncommon character that isn't present on most computer keyboards
> > is probably the direction to take.
>
> > David Fitzjarrell
>
> I guess you should take into account what OS you are using. In Unix,
> characters from hex 80 to 9f are shown like this: <88>, while in
> Windows there are some interesting choices which will definitely stand
> out to the eye, such as hex 83 (it looks like an integral symbol), hex
> 97 (decimal 151) or hex ac.
>
> Stay away from hex 80 (the Euro symbol)!
>
> -Ramon
>
> 80 128: €
> 81 129:
> 82 130: ‚
> 83 131: ƒ
> 84 132: „
> 85 133: …
> 86 134: †
> 87 135: ‡
> 88 136: ˆ
> 89 137: ‰
> 8a 138: Š
> 8b 139: ‹
> 8c 140: Œ
> 8d 141:
> 8e 142: Ž
> 8f 143:
> 90 144:
> 91 145: ‘
> 92 146: ’
> 93 147: “
> 94 148: ”
> 95 149: •
> 96 150: -
> 97 151: -
> 98 152: ˜
> 99 153: ™
> 9a 154: š
> 9b 155: ›
> 9c 156: œ
> 9d 157:
> 9e 158: ž
> 9f 159: Ÿ
> a0 160:
> a1 161: ¡
> a2 162: ¢
> a3 163: £
> a4 164: ¤
> a5 165: ¥
> a6 166: ¦
> a7 167: §
> a8 168: ¨
> a9 169: ©
> aa 170: ª
> ab 171: «
> ac 172: ¬
> ad 173: 
> ae 174: ®
> af 175: ¯
> b0 176: °
> b1 177: ±
> b2 178: ²
> b3 179: ³
> b4 180: ´
> b5 181: µ
> b6 182: ¶
> b7 183: ·
> b8 184: ¸
> b9 185: ¹
> ba 186: º
> bb 187: »
> bc 188: ¼
> bd 189: ½
> be 190: ¾
> bf 191: ¿
> c0 192: À
> c1 193: Á
> c2 194: Â
> c3 195: Ã
> c4 196: Ä
> c5 197: Å
> c6 198: Æ
> c7 199: Ç
> c8 200: È
> c9 201: É
> ca 202: Ê
> cb 203: Ë
> cc 204: Ì
> cd 205: Í
> ce 206: Î
> cf 207: Ï
> d0 208: Ð
> d1 209: Ñ
> d2 210: Ò
> d3 211: Ó
> d4 212: Ô
> d5 213: Õ
> d6 214: Ö
> d7 215: ×
> d8 216: Ø
> d9 217: Ù
> da 218: Ú
> db 219: Û
> dc 220: Ü
> dd 221: Ý
> de 222: Þ
> df 223: ß
> e0 224: à
> e1 225: á
> e2 226: â
> e3 227: ã
> e4 228: ä
> e5 229: å
> e6 230: æ
> e7 231: ç
> e8 232: è
> e9 233: é
> ea 234: ê
> eb 235: ë
> ec 236: ì
> ed 237: í
> ee 238: î
> ef 239: ï
> f0 240: ð
> f1 241: ñ
> f2 242: ò
> f3 243: ó
> f4 244: ô
> f5 245: õ
> f6 246: ö
> f7 247: ÷
> f8 248: ø
> f9 249: ù
> fa 250: ú
> fb 251: û
> fc 252: ü
> fd 253: ý
> fe 254: þ
> ff 255: ÿ

I've never seen what you report for Unix servers:

> In Unix, characters from hex 80 to 9f are shown like this: <88>

SQL*Plus always reports such characters in the manner you've displayed them, regardless of the operating system. What is the value of TERM in your session? That setting has a lot to do with how characters are rendered in your Unix session.

David Fitzjarrell
Received on Sat Mar 10 2007 - 15:41:49 CST