Home » Developer & Programmer » Precompilers, OCI & OCCI » Data Loading problem of French Input file (Oracle 10g, SUN OS 5.6 , Korn Shell , Pro*C)
Data Loading problem of French Input file [message #563699] Thu, 16 August 2012 09:33 Go to next message
pradeepsanthosh
Messages: 7
Registered: August 2012
Junior Member
Hello All,

I am trying to load a input flat file (french data in it) into the database using pro*c (not using sql loader because of some validations ). i am reading the line by line and populating it into the structure and then process it.

The input file is encoded in WE8ISO8895P1 , I want the records to be populated into a table.
so i did set NLS_LANG=French_France.WE8ISO8895P1 and ran the pro*c program . i used character host variable in the insert query , i used data from the earlier read structure and set these character host variable.


The problem i am facing is , when i am printing the values before insert i could see correct data.

For example the printed data of variable is "pas de donné " and strlen is 14. the target table field is of varchar type
( name varchar(20 char) )

but after i insert i could see only the truncated value in the database i.e "pas de donn" in the table length is 11.

kindly request you all to provide me valuable suggestions , please let me know if you need further details.
Thanks in advance !!

Regards,
Pradeep
Re: Data Loading problem of French Input file [message #563701 is a reply to message #563699] Thu, 16 August 2012 09:42 Go to previous messageGo to next message
dws1
Messages: 14
Registered: July 2012
Junior Member
What is the database characterset?
Have you checked all static buffers within the Pro*C code are large enough for the data? (Just printing the data out won't confirm this; you need to look at the code).
Re: Data Loading problem of French Input file [message #563705 is a reply to message #563701] Thu, 16 August 2012 09:45 Go to previous messageGo to next message
Michel Cadot
Messages: 59082
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
I agree, this seems to be a buffer truncation or a not correct length not a problem with the database (by the way, what is its character set?).
Post your code, this may also help to debug it.

Regards
Michel
Re: Data Loading problem of French Input file [message #563716 is a reply to message #563705] Thu, 16 August 2012 10:55 Go to previous messageGo to next message
pradeepsanthosh
Messages: 7
Registered: August 2012
Junior Member
Thanks for the quick response !!

please find the details below.


1) connecting to oracle

EXEC SQL WHENEVER SQLERROR GOTO connect_error;
EXEC SQL CONNECT :username IDENTIFIED BY :password USING :ssid;


2) structure is

struct ref_ident
{
char nom_commercial [40];
char adr_no_voie [4];
char tel_entite [14];
char fax_entite [14];
char no_tva [14];
char no_siret [14];
char code_ape [4];

};


host variables are declared
commercial_name char(41);
tel_number char(15);

2) read records into the structure in loop one by one into structure ref_ident and then insert into a table

while(!feof(fp_read))
{


pt_struct = &ref_ident;
taille_struct = sizeof(struct _ref_ident);
fread(pt_struct,taille_struct,1,fp_read);


-- after reading into the structure some structure values are used to insert record into a table

memset(commercial_name,'\0',strlen(commercial_name));
memset(tel_number,'\0',strlen(tel_number));

strncpy(commercial_name,ref_ident.nom_commercial,40);
strncpy(tel_number,ref_ident.tel_entite,14);

-- inserting into table

printf("NAME[%s] LEN[%d] TELE[%s] LEN[%s]",commercia_name,strlen(commercial_name),tel_number,strlen(tel_number));

EXEC SQL
INSERT INTO CUSTOMER (NAME,TELEPHONE_NUM)
VALUES (:commercial_name,:tel_number);

-- Commiting in the end

EXEC SQL
COMMIT;


fgetc(fp_read)
}


make file
----------------------------

# Suffix rules
##############

.SUFFIXES: .pc


# definition
############

CODE=ANSI_C
SHELL=/usr/bin/ksh
LIB=${VERS}/lib
BIN=${VERS}/bin
SRC=${VERS}/src/proc
MAKE=${VERS}/make
LIBHOME=${ORACLE_HOME}/lib32
PROC=${ORACLE_HOME}/bin/proc
PROFLAGS=include=${LIB} line=yes code=${CODE}
CC=/logiciel/SUN_Studio_11/SUNWspro/bin/cc
LDFLAGS=-L${LIBHOME}
LIB806=${LIBHOME}/libclntsh.so -lc -lm
PROLDLIBS= ${LDFLAGS} ${LIB806}
OBJ=${MAKE}/load_detail.o
EXE=${BIN}/load_detail.exe

RM = rm -f



# regles
########

${EXE} : ${OBJ}
${CC} ${LDFLAGS} ${OBJ} -o ${EXE} ${PROLDLIBS} -L${LIB} -llogito -L${LIB} -lmodulecrm

${MAKE}/load_detail.o : ${SRC}/load_detail.c
${CC} -c ${SRC}/load_detail.c -I/usr/include -I${LIB}

${SRC}/load_detail.c : ${SRC}/load_detail.pc
${PROC} ${PROFLAGS} iname=${SRC}/load_detail.pc SQLCHECK=SEMANTICS USERID=$username/$passwd@$sid

clean :
$(RM) $(MAKE)/load_detail.o \
$(RM) $(SRC)/load_detail.lis \
$(RM) $(SRC)/load_detail.c


---------------------


I compiled it , there were no warnings . my default server profile language $NLS_LANG was us en_us utf8.
so i changed it to French
export NLS_LANG='French_France.WE8ISO8895P1'
export LANG ='fr_FR.ISO8859-1'

then executed the binary

it got completed successfully all the records were loaded but those who had french character in it were loaded with truncated values.

I connected to database from the same putty session using sqlplus

and also checked the NLS settings were of french only

SELECT * FROM nls_session_parameters;


select name,telephone_num from customer;

-- gave me truncated values for records which had french special character
Re: Data Loading problem of French Input file [message #563722 is a reply to message #563716] Thu, 16 August 2012 11:03 Go to previous messageGo to next message
pradeepsanthosh
Messages: 7
Registered: August 2012
Junior Member
About the database characterset , i guess it is UTF-8 but i will confirm you once i sign in tomorrow @ offc Wink .
i was totally freaked out today by this issue but i am glad now that i have some support from you all.
happy to join you guys !!!

I tried studying about nchar and nvarchar too . but since it was too late @ offc , i came back home disappointed Sad .

Thanks again for your kind support !!!
Re: Data Loading problem of French Input file [message #563731 is a reply to message #563722] Thu, 16 August 2012 12:22 Go to previous messageGo to next message
Michel Cadot
Messages: 59082
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
The problem might be there:
memset(commercial_name,'\0',strlen(commercial_name));
memset(tel_number,'\0',strlen(tel_number));

As the variables are not set strlen could return anything.
It should be:
memset(commercial_name,'\0',41*sizeof(char));
memset(tel_number,'\0',15*sizeof(char));


As you have this statement before insert:
printf("NAME[%s] LEN[%d] TELE[%s] LEN[%s]",commercia_name,strlen(commercial_name),tel_number,strlen(tel_number));

Please post the result of the execution.
Note that you used "commercia_name" when the variable name is "commercial_name" so this code is not the real one or it did not compile.

Regards
Michel

[Updated on: Thu, 16 August 2012 12:23]

Report message to a moderator

Re: Data Loading problem of French Input file [message #563736 is a reply to message #563731] Thu, 16 August 2012 12:41 Go to previous messageGo to next message
pradeepsanthosh
Messages: 7
Registered: August 2012
Junior Member
I will try this solution tomorrow when i am back to work and update you on the same.
Yeah i just recapped the program which i coded earlier in office and scribed it here and that printf was a typo error Confused .


Re: Data Loading problem of French Input file [message #563786 is a reply to message #563736] Fri, 17 August 2012 02:25 Go to previous messageGo to next message
dws1
Messages: 14
Registered: July 2012
Junior Member
Are your ref_ident strings long enough? Remember that C stores strings with a terminating NULL, so "buffer overflow" is actually *16* characters long: 'b'(1), 'u'(2), 'f'(3),...,'o'(14), 'w'(15), '\0'(16), and would need to be stored in at least char[16]. char[15] or less would give you undefined behaviour, which could well be the cause of this issue.

>>For example the printed data of variable is "pas de donné " and strlen is 14. the target table field is of varchar type
>>( name varchar(20 char) )

Which variable? Anyway, that implies this is NOT ref_ident::nom_commercial because we know that's varchar2(40) or similar, and putting that string into any of the others would cause a buffer overflow.
Re: Data Loading problem of French Input file [message #563791 is a reply to message #563786] Fri, 17 August 2012 02:37 Go to previous messageGo to next message
Michel Cadot
Messages: 59082
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Quote:
Are your ref_ident strings long enough? Remember that C stores strings with a terminating NULL,


I think OP is aware of this if you carefully read the code:

structure is
...
char nom_commercial [40];

host variables are declared
commercial_name char(41);

Regards
Michel
Re: Data Loading problem of French Input file [message #563795 is a reply to message #563791] Fri, 17 August 2012 03:07 Go to previous messageGo to next message
dws1
Messages: 14
Registered: July 2012
Junior Member
Which is why I asked which variable.
Re: Data Loading problem of French Input file [message #563796 is a reply to message #563795] Fri, 17 August 2012 03:10 Go to previous messageGo to next message
Michel Cadot
Messages: 59082
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Only 2 are used in the code and both of them take care of this.
Anyway, we have to wait for OP to go back to office. Smile

Regards
Michel
Re: Data Loading problem of French Input file [message #563808 is a reply to message #563731] Fri, 17 August 2012 04:56 Go to previous messageGo to next message
pradeepsanthosh
Messages: 7
Registered: August 2012
Junior Member
Hey Guys,

I tried your solution , but still it still did the truncation.

NAME[Non Renseigné ] LEN[40] TELE[+33471770407 ] LEN[14]

[ Note there are 27 spaces after the é character, which is not getting displayed thru this post reply ]

I did some more findings on this issue earlier this morning, please find the below points.

I wanted to verify whether the PRO*C connection to oracle session is correctly using my NLS_LANG set in unix.

so in the code i just added a code to fetch the NLS_LANGUAGE

SELECT value into :nls_lang FROM nls_session_parameters where parameter='NLS_LANGUAGE';
and the output of the nls_lang variable was "AMERICAN" Sad

But after the program ended in the same putty session, i connected to oracle by sqlplus and executed the same query.
SELECT value FROM nls_session_parameters where parameter='NLS_LANGUAGE';
it gave me "FRENCH"

I guess that PRO*C connection is using the default and not the session Language set by me ??!!!!

/logiciel/oracle --> more nls_ora
client_14:american_america.utf8:/logiciel/oracle/tnsadmin

Regards,
Pradeep
Re: Data Loading problem of French Input file [message #563858 is a reply to message #563808] Fri, 17 August 2012 10:14 Go to previous messageGo to next message
Michel Cadot
Messages: 59082
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Quote:
[ Note there are 27 spaces after the é character, which is not getting displayed thru this post reply ]

Please read How to use [code] tags and make your code easier to read.

Quote:
SELECT value into :nls_lang FROM nls_session_parameters where parameter='NLS_LANGUAGE';
and the output of the nls_lang variable was "AMERICAN"


The language is not important (I'm French and I always use American language but for dates and I can enter French specific charcaters), the root point is the character set.

Quote:
american_america.utf8


This is correct ONLY if you actually use UTF8 in your client program.
Are you sure your program handles UTF8 character set? It is common in Java, rare in C (C programs usually use AL16UTTF16 for Unicode).

Regards
Michel

[Updated on: Fri, 17 August 2012 10:15]

Report message to a moderator

Re: Data Loading problem of French Input file [message #563861 is a reply to message #563858] Fri, 17 August 2012 10:40 Go to previous messageGo to next message
pradeepsanthosh
Messages: 7
Registered: August 2012
Junior Member
Thanks Michel !

Please help me understand i am little confused in understanding the issue.

1) I receive a file which is encoded in FRENCH ISO8859-1 format.
2) I write a client program in pro*c and when executing it i set the NLS_LANG as the same encode format FRENCH ISO8859-1 format

Please correct me if i am wrong My assumption is that the client program will read the file correctly and insert the records into database. Database default characterset will be overriden by the NLS_LANG set by us in the host.

But this is not happening i could see that records are correctly split by fread and structure values are set correct. but after inserting the iso8859-1 characters are not in database instead they are removed.



I also did one experiment , i converted the input file to UTF-8 format using

iconv -f ISO8859-1 -t UTF-8 my_file > new_file

i then executed the binary with the new file as input and this time i didnt change my host $NLS_LANG to french instead by default it is set to American_America.UTF8

The result was fread had issue in correctly setting the structure values , i assume this is due to fread reads by bytes and it doesnot support multibyte character , i.e for a single ISO-8859-1 special character UTF-8 file had 2 bytes.


Regards,
Pradeep

[Updated on: Fri, 17 August 2012 10:44]

Report message to a moderator

Re: Data Loading problem of French Input file [message #563869 is a reply to message #563861] Fri, 17 August 2012 11:23 Go to previous messageGo to next message
Michel Cadot
Messages: 59082
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Quote:
1) I receive a file which is encoded in FRENCH ISO8859-1 format.

So it is not UTF8.

Quote:
2) I write a client program in pro*c and when executing it i set the NLS_LANG as the same encode format FRENCH ISO8859-1 format


But it seems your program overides it from what you said:
Quote:
so in the code i just added a code to fetch the NLS_LANGUAGE

SELECT value into :nls_lang FROM nls_session_parameters where parameter='NLS_LANGUAGE';
and the output of the nls_lang variable was "AMERICAN"
.

Quote:
Database default characterset will be overriden by the NLS_LANG set by us in the host.


This is not exactly that, Oracle converts from your character set (then one you declare in NLS_LANG) to the database character set to store the data.

Quote:
The result was fread


Sorry, I don't know what this word means.


I think the solution is to set WE8ISO8859P1 in your configuration file instead of UTF8.

Regards
Michel

Re: Data Loading problem of French Input file [message #564012 is a reply to message #563869] Mon, 20 August 2012 04:01 Go to previous messageGo to next message
dws1
Messages: 14
Registered: July 2012
Junior Member
There are two simple rules for solving virtually every NLS issue:
(1) Set the database characterset to a characterset that represents characters you want to store in the database.
(2) Set NLS_LANG to represent the characterset your application uses. This environment variable tells Oracle what characterset your program is using.

Oracle will then correctly translate characters from your application for storage in the DB, assuming the characters are represented by both charactersets, and if they aren't, you'll get an 0xBF.

Do those and virtually every NLS issue disappears, in my experience. Assume either doesn't need to hold for any reason, however reasonable it sounds, and you WILL get NLS issues.

[Updated on: Mon, 20 August 2012 04:03]

Report message to a moderator

Re: Data Loading problem of French Input file [message #564013 is a reply to message #564012] Mon, 20 August 2012 04:28 Go to previous messageGo to next message
Michel Cadot
Messages: 59082
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
I concur.

Regards
Michel
icon14.gif  Re: Data Loading problem of French Input file [message #564136 is a reply to message #564013] Tue, 21 August 2012 09:58 Go to previous message
pradeepsanthosh
Messages: 7
Registered: August 2012
Junior Member
Thanks Friends !!!

Finally found the problem. Though i have set the NLS_LANG to French_France.WE8ISO8895P1 in the unix session and executed the korn shell, which in turn executes the binary. In the shell they are loading a global config file in which they are setting NLS_LANG to american_america.utf8 . so because of this NLS_LANG is overridden and resulting wrong client characterset.

So i exported the NLS_LANG in the korn shell itself before executing the binary and it worked like a charm. I could now see all the french characters loaded into the database.


(2) Set NLS_LANG to represent the characterset your application uses. This environment variable tells Oracle what characterset your program is using.



Thanks again for your kind support !!!

Regards,
Pradeep
Previous Topic: does proc precompile gtk libraries?
Next Topic: sqladt is initialized twice.
Goto Forum:
  


Current Time: Tue Sep 16 08:57:35 CDT 2014

Total time taken to generate the page: 0.09932 seconds