Re: Dealing with UTF8 and loading data with SQL*Loader

From: Mark J. Bobak <mark_at_bobak.net>
Date: Mon, 12 Jun 2017 18:06:38 -0400
Message-ID: <CAFQ5ACLMCgPUPAKagnpZBYcixyA0-JxKzFLf2Tm4_7+e_Qu=fg_at_mail.gmail.com>



Thanks to everyone who replied. I went with Stephan's suggestion. But, not being much of a perlmonger, I did it with awk. I used sed to remove the 3 BOM bytes from the beginning of the file, then used awk to convert from fixed field length to quoted CSV. Once I had the CSV format, the loading was no problem.

-Mark

On Mon, Jun 12, 2017 at 4:05 PM, Jure Bratina <jure.bratina_at_gmail.com> wrote:

> Hi Mark,
>
> To add to what others already wrote, according to Doc ID 2010180.1:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *To avoid the problem, specify for all columns the corresponding character
> length instead of using the POSITION parameter, e.g.:LOAD DATAAPPENDINTO
> TABLE temp_tph_dataTRAILING NULLCOLS(COLUMN1 varchar2(20)
> ,COLUMN2 varchar2(20))If the above solution does not work, then
> you can use the CHARACTERSET parameter for SQL*Loader.LOAD DATAcharacterset
> UTF8 length semantics charinfile 'file_name'APPENDINTO TABLE
> employeesTRAILING NULLCOLS( column_1 char(3), column_2
> char(14), column_3 char(3))*
>
>
> Or alternatively using external tables and fixed field length data:
> http://docs.oracle.com/database/121/SUTIL/GUID-A077C3A6-B1AE-4D8B-98E6-
> E1C1AE34B63F.htm#SUTIL1380
>
> CREATE TABLE countries_ext2 (
> country_code VARCHAR2(5),
> country_name VARCHAR2(50),
> country_language VARCHAR2(50)
> )
> ORGANIZATION EXTERNAL (
> TYPE ORACLE_LOADER
> DEFAULT DIRECTORY tmp_loader
> ACCESS PARAMETERS (
> RECORDS DELIMITED BY NEWLINE
> *STRING SIZES ARE IN CHARACTERS*
> fields (
> country_code char(3),
> country_name char(14),
> country_language char(3)
> )
> )
> LOCATION ('contries2.txt')
> )
> PARALLEL 1
> REJECT LIMIT UNLIMITED;
>
> Regards,
> Jure Bratina
>
>
> On Mon, Jun 12, 2017 at 9:27 PM, Stefan Knecht <knecht.stefan_at_gmail.com>
> wrote:
>
>> This kind of sounds like you can't use fixed positions with a file like
>> this.
>>
>> Have you tried pre-processing it with e.g. perl, and "convert" it to so
>> that columns are separated by a specific character instead of relying on
>> the positions? A perl one-or-two-liner should be able to do that? Perl may
>> not suffer from the same "oddity" that you're seeing with sql*loader.
>>
>> Just an idea
>>
>> Stefan
>>
>>
>>
>> On Tue, Jun 13, 2017 at 1:09 AM, Mark J. Bobak <mark_at_bobak.net> wrote:
>>
>>> Hi All,
>>>
>>> I'm running Oracle 12.1.0.2 Standard Edition
>>>
>>> We have files in UTF8 that we are trying to load in SQL*Loader. I've
>>> got 'CHARACTERSET' in the control file set to 'UTF8'. The database is
>>> created with AL32UTF8.
>>>
>>> The file contains only standard English and some Spanish characters.
>>> The data is loading and characters are being interpreted correctly.
>>>
>>> My problem is with the fact that the file contains fixed field length
>>> data, and the control file is using 'POSITION(x:y)' parameters, to define
>>> the fields. For lines that only contain standard, single-byte characters,
>>> everything works fine. However, the POSITION parameter uses byte semantics
>>> (regardless of the value of the 'LENGTH SEMANTICS' parameter). So, if a
>>> particular line has a two-byte character in it,every field following the
>>> field with the two-byte character is off by one. And, of course, if there
>>> are multiple two-byte characters in a particular line, that line will be
>>> off by n, where n is the number of two-byte characters in that particular
>>> line.
>>>
>>> I'm no SQL*Loader expert.....am I missing something obvious here? Does
>>> anyone have any idea how to solve this?
>>>
>>> Thanks,
>>>
>>> -Mark
>>>
>>
>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jun 13 2017 - 00:06:38 CEST

Original text of this message