Home » SQL & PL/SQL » SQL & PL/SQL » Reading text from binary data in a blob
Reading text from binary data in a blob [message #683871] Sat, 27 February 2021 10:31 Go to next message
appender
Messages: 1
Registered: February 2021
Junior Member
We are storing all files as blobs. There is a need to take the text content from several .doc files stored in this way, and insert all of this text into another blob. The problem is getting the text out of the blob. The following code works for .txt files, but not for .doc files (it only returns gibberish for .doc files). I assume that that is because the encoding of .doc files shows up as binary, whereas for .txt files it is UTF8 etc. I've tried all kinds of conversions but haven't been able to find a solution.

Is it perhaps possible to avoid this conversion to a clob altogether, and only work with blobs? I've tried appending blobs to one another, but probably due to some other data in the blob this doesn't work, and only the content of the first appended file is visible in the final result.

declare
  v_file_ids numberlist := numberlist();
  v_blob blob;
  v_total clob;
  v_clob clob;
  v_nvarchar nvarchar2(32767);
  v_start pls_integer;
  v_buffer pls_integer;
begin
  dbms_lob.createtemporary(v_total, true);

  v_file_ids.extend();
  v_file_ids(v_file_ids.last) := --random ID;
 
  for i in 1..v_file_ids.count loop
    dbms_lob.createtemporary(v_clob, true); 
    
    v_start := 1;
    v_buffer := 32767;
    
    --gets blob of the file
    GetFileContent(v_file_ids(i), v_blob);
    
    --retrieves the text content from the file
    for i in 1..ceil(dbms_lob.getlength(v_blob)/v_buffer)
    loop
     v_nvarchar := utl_raw.cast_to_nvarchar2(dbms_lob.substr(v_blob, v_buffer, v_start));

     dbms_lob.writeappend(v_clob, length(v_nvarchar), v_nvarchar);
     
     v_start := v_start + v_buffer;
    end loop; 
    
    --appends the retrieved text to the main clob
    dbms_lob.append(v_total, v_clob);
  end loop;
end;
Re: Reading text from binary data in a blob [message #683872 is a reply to message #683871] Sat, 27 February 2021 11:17 Go to previous messageGo to next message
Michel Cadot
Messages: 68624
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Welcome to the forum.
Please read the OraFAQ Forum Guide and How to use [code] tags and make your code easier to read.
Also always post your Oracle version, with 4 decimals (query v$version), as often solution depends on it.

Quote:
The following code works for .txt files, but not for .doc files (it only returns gibberish for .doc files
So it seems that your procedure GetFileContent reads files as text and not binary.

Anyway, Oracle already does everything for you: have a look at DBMS_LOB.LOADBLOBFROMFILE procedure.
Here's a simple example:
DECLARE
  tmp_lob  BLOB;
  file     BFILE;
  src_off  PLS_INTEGER := 1;
  dst_off  PLS_INTEGER := 1;
BEGIN
  DBMS_LOB.CREATETEMPORARY (tmp_lob, false, dur=>dbms_lob.call);
  file := BFILENAME('<FILEDIR>','<FILENAME>');
  DBMS_LOB.OPEN (file);
  DBMS_LOB.LOADBLOBFROMFILE (tmp_lob, file, dbms_lob.lobmaxsize, src_off, dst_off);
  -- <... do what you want with the lob ...> --
END;
/

[Updated on: Sat, 27 February 2021 11:18]

Report message to a moderator

Re: Reading text from binary data in a blob [message #683887 is a reply to message #683871] Tue, 02 March 2021 11:29 Go to previous message
Michel Cadot
Messages: 68624
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Any feedback?

Previous Topic: Group result set in a count of 5 records.
Next Topic: Oracle: sql update on field containing sql itself
Goto Forum:
  


Current Time: Thu Mar 28 08:09:01 CDT 2024