Re: Need to use a rare field separator in *.dat files

From: Ramon F Herrera <ramon_at_conexus.net>
Date: 10 Mar 2007 14:56:03 -0800
Message-ID: <1173567363.375222.188280@s48g2000cws.googlegroups.com>

On Mar 10, 3:41 pm, "fitzjarr..._at_cox.net" <fitzjarr..._at_cox.net> wrote:
> On Mar 10, 3:28 pm, "Ramon F Herrera" <r..._at_conexus.net> wrote:
>
> > On Mar 10, 2:50 pm, "fitzjarr..._at_cox.net" <fitzjarr..._at_cox.net> wrote:
>
> > > On Mar 10, 2:01 pm, "Ramon F Herrera" <r..._at_conexus.net> wrote:
>
> > > > All my *.dat files are created programmatically and therefore I can
> > > > insert any ASCII character as a field separator. I have used
> > > > characters such as ':' or '|' to mark the border between the fields,
> > > > but needless to say, the data may contain those characters.
>
> > > > What I would like to do is to use some rare (but visible) high ASCII
> > > > value which will be essentially impossible to be present in the raw
> > > > datafiles.
>
> > > > Is that the right approach (in some cases I cannot switch to fixed
> > > > length records)?
>
> > > > How do I define such character in the *.ctl files?
>
> > > > This is what I have now:
>
> > > > FIELDS TERMINATED BY "|"
>
> > > > TIA,
>
> > > > -Ramon F Herrera
>
> > > Using the extended ASCII table found here:
>
> > > http://www.cdrummond.qc.ca/cegep/informat/Professeurs/Alain/files/asc...
>
> > > you might consider using a hex value of A9 or greater as your
> > > separator (presuming, of course, you can generate this character in
> > > your .dat file) as characters at that end of the extended ASCII range
> > > are not likely to be present in any text you might be processing. A
> > > simple
>
> > > FIELDS TERMINATED BY X'A9'
>
> > > in your .ctl file (as an example) should solve your problem, again
> > > provided you can generate that character with your .dat file
> > > generator.
>
> > > Years ago I wrote a routine in a Pro*C program to scan the input file
> > > and 'register' all characters in the file, after which I would select
> > > the first ASCII character in my array having no occurrences as my
> > > field separator. It worked well for years and took practically no
> > > time at all (given the routine was written in C); however, choosing some
> > > uncommon character which isn't present on most computer keyboards is
> > > probably the direction to take.
>
> > > David Fitzjarrell
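
For readers who want to try the "register every character, then pick an unused one" idea described above, here is a minimal sketch in Python rather than Pro*C. It is not David's original routine. The function names (pick_separator, control_clause), the sample data, and the choice to start the search at hex A1 (skipping A0, the non-breaking space, because it is not visible) are all assumptions made for illustration.

```python
def pick_separator(data: bytes, candidates=range(0xA1, 0x100)):
    """Return the first candidate byte value that never occurs in data.

    Returns None when every candidate is already used in the data.
    The default candidate range (hex A1..FF) is an assumption, chosen
    to favor visible high characters.
    """
    used = set(data)  # "register" every byte value present in the input
    for value in candidates:
        if value not in used:
            return value
    return None


def control_clause(value: int) -> str:
    """Format the separator as a hex literal for the .ctl file."""
    return f"FIELDS TERMINATED BY X'{value:02X}'"


# Hypothetical sample data containing '|', ':' and hex A9 (copyright sign).
sample = b"Smith|R&D\xa9|10:30\nJones|Ops|11:45\n"

sep = pick_separator(sample)
if sep is not None:
    print(control_clause(sep))
```

In practice the bytes would come from the real file, for example open("mydata.dat", "rb").read() (a hypothetical file name), and the .dat generator would have to write the same byte value between fields.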
>
> > I guess you should take into account what OS you are using. In Unix,
> > characters from hex 80 to 9f are shown like this: <88>, while in
> > Windows there are some interesting choices which will definitely stand
> > out to the eye, such as hex 83 (it looks like an integral symbol), hex
> > 97 (decimal 151) or hex ac.
>
> > Stay away from hex 80 (the Euro symbol)!
>
> > -Ramon
>
> > 80 128: €
> > 81 129:
> > 82 130: ,
> > 83 131: ƒ
> > 84 132: ,,
> > 85 133: ...
> > 86 134: †
> > 87 135: ‡
> > 88 136: ˆ
> > 89 137: ‰
> > 8a 138: Š
> > 8b 139: ‹
> > 8c 140: Œ
> > 8d 141:
> > 8e 142: Ž
> > 8f 143:
> > 90 144:
> > 91 145: '
> > 92 146: '
> > 93 147: "
> > 94 148: "
> > 95 149: ·
> > 96 150: -
> > 97 151: -
> > 98 152: ˜
> > 99 153: ™
> > 9a 154: š
> > 9b 155: ›
> > 9c 156: œ
> > 9d 157:
> > 9e 158: ž
> > 9f 159: Ÿ
> > a0 160:
> > a1 161: ¡
> > a2 162: ¢
> > a3 163: £
> > a4 164: ¤
> > a5 165: ¥
> > a6 166: ¦
> > a7 167: §
> > a8 168: ¨
> > a9 169: ©
> > aa 170: ª
> > ab 171: «
> > ac 172: ¬
> > ad 173: 
> > ae 174: ®
> > af 175: ¯
> > b0 176: °
> > b1 177: ±
> > b2 178: ²
> > b3 179: ³
> > b4 180: ´
> > b5 181: µ
> > b6 182: ¶
> > b7 183: ·
> > b8 184: ¸
> > b9 185: ¹
> > ba 186: º
> > bb 187: »
> > bc 188: ¼
> > bd 189: ½
> > be 190: ¾
> > bf 191: ¿
> > c0 192: À
> > c1 193: Á
> > c2 194: Â
> > c3 195: Ã
> > c4 196: Ä
> > c5 197: Å
> > c6 198: Æ
> > c7 199: Ç
> > c8 200: È
> > c9 201: É
> > ca 202: Ê
> > cb 203: Ë
> > cc 204: Ì
> > cd 205: Í
> > ce 206: Î
> > cf 207: Ï
> > d0 208: Ð
> > d1 209: Ñ
> > d2 210: Ò
> > d3 211: Ó
> > d4 212: Ô
> > d5 213: Õ
> > d6 214: Ö
> > d7 215: ×
> > d8 216: Ø
> > d9 217: Ù
> > da 218: Ú
> > db 219: Û
> > dc 220: Ü
> > dd 221: Ý
> > de 222: Þ
> > df 223: ß
> > e0 224: à
> > e1 225: á
> > e2 226: â
> > e3 227: ã
> > e4 228: ä
> > e5 229: å
> > e6 230: æ
> > e7 231: ç
> > e8 232: è
> > e9 233: é
> > ea 234: ê
> > eb 235: ë
> > ec 236: ì
> > ed 237: í
> > ee 238: î
> > ef 239: ï
> > f0 240: ð
> > f1 241: ñ
> > f2 242: ò
> > f3 243: ó
> > f4 244: ô
> > f5 245: õ
> > f6 246: ö
> > f7 247: ÷
> > f8 248: ø
> > f9 249: ù
> > fa 250: ú
> > fb 251: û
> > fc 252: ü
> > fd 253: ý
> > fe 254: þ
> > ff 255: ÿ
>
> I've never seen what you report for Unix servers:
>
> > In Unix, characters from hex 80 to 9f are shown like this: <88>
>
> SQL*Plus always reports such characters in the manner you've displayed
> them, regardless of the operating system. What is the value for TERM
> in your session? This has much to do with how you see the Unix world.
>
> David Fitzjarrell

My TERM is vt100 with 8 bits. I only see <88> in the vi editor. The 'cat' and 'less' commands show nothing.
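
One plausible explanation for the difference between the two views is the character set each tool assumes. The sketch below rests on two assumptions: that the Windows display uses code page 1252, and that the Unix side treats the bytes as ISO-8859-1, where hex 80 to 9f are non-printing control codes. The render function is hypothetical and only imitates the <88> notation; it is not how vi actually decides what to show.

```python
def render(value: int, encoding: str) -> str:
    """Show one byte as its glyph if printable in the given encoding, else as <hex>."""
    try:
        ch = bytes([value]).decode(encoding)
    except UnicodeDecodeError:
        # The byte has no mapping in this code page (e.g. hex 81 in cp1252).
        return f"<{value:02x}>"
    return ch if ch.isprintable() else f"<{value:02x}>"


# Map a few of the byte values discussed in this thread to
# (ISO-8859-1 view, code page 1252 view).
comparison = {
    value: (render(value, "latin-1"), render(value, "cp1252"))
    for value in (0x80, 0x83, 0x88, 0x97, 0xAC)
}
```

Under these assumptions, hex 88 comes out as <88> in the ISO-8859-1 view but as a circumflex-like glyph under code page 1252, while hex ac is the same visible character in both. That would favor a separator at hex a1 or above if the files are inspected on both systems.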

-Ramon

Received on Sat Mar 10 2007 - 16:56:03 CST