How to read xml files from the OS, parse them, and load the data into Oracle

Want to read xml files from the OS, parse them, and load the data into Oracle, all from PL/SQL (and a small Java stored procedure 🙂 )? Well allrightythen!

* This will all be done from within the database.

* The OS directories will be referenced only by Oracle directory name, not actual directory name.

Disclaimer: This code is not all mine. I’ve collected snippets from several different places, and wrote a bunch of my own as well. The goal was to take it all and put it into one post so that you can go from start to finish in one place.

We’ll be doing the following in this post

1) List the source directory
2) Loop through the file names
3) Parse the XML files
4) Insert the data (not the XML file) into a table
5) Move the file to a destination directory

Source directory: /tmp/xml_inbound
Destination directory: /tmp/xml_archive

Example XML file


<?xml version="1.0" encoding="UTF-8"?>
<Data>
   <Header>
      <ID>Batch 1</ID>
      <AnimalCount>2</AnimalCount>
      <BreedCount>6</BreedCount>
   </Header>
   <Animals>
      <Animal>
         <ID>1</ID>
         <Type>Dog</Type>
         <BreedList>
               <Breed>Bullmastiff</Breed>
               <Breed>Dalmation</Breed>
               <Breed>American Bulldog</Breed>
         </BreedList>
      </Animal>
      <Animal>
         <ID>2</ID>
         <Type>Cat</Type>
         <BreedList>
            <Breed>Tabby</Breed>
            <Breed>Manx</Breed>
            <Breed>Savannah</Breed>
         </BreedList>
      </Animal>
   </Animals>
</Data>

Let’s do a little OS level setup, as the “oracle” user


# mkdir /tmp/xml_inbound
# mkdir /tmp/xml_archive

Copy the above XML into a file called /tmp/xml_inbound/<anything>.xml 

And database level – Note that I’m granting privileges to “ADAM”.
That might not be you..


sqlplus "/ as sysdba"

11:08:35 DEMO (SYS) > Create directory xml_inbound as '/tmp/xml_inbound';
11:08:43 DEMO (SYS) > Create directory xml_archive as '/tmp/xml_archive';
11:08:51 DEMO (SYS) > Grant read,write on directory xml_inbound to ADAM;
11:08:58 DEMO (SYS) > Grant read,write on directory xml_archive to ADAM;
--
-- This is for the Java stored procedure we'll create below. 
-- It's only used for reading the directory contents.
--
11:09:03 DEMO (SYS) > Exec dbms_java.grant_permission( 'ADAM','SYS:java.io.FilePermission','/tmp/xml_inbound', 'read');

First, we need to be able to get a directory listing from the OS. We’ll use Java for this. One procedure will populate the DIR_LIST global temporary table with the file names. The table will be cleared when we do a commit. The other two procedures are for bringing the data in as a variable, and renaming (moving) the OS file.

As your non-SYS database user from now on.. For me it’s “ADAM”


Create global temporary table DIR_LIST
(
   filename varchar2(255)
) on commit delete rows;

The below code will create the following procedures


dir_list                Get a listing of files from a directory
get_xml_file_from_os    Load an XML file into a variable of XMLType
move_os_file            Used to move files from inbound to archive

Create or replace
and compile java source named "DirList"
as
import java.io.*;
import java.sql.*;

public class DirList
{
   public static void getList(String oracle_directory)
                   throws SQLException
   {
      String os_directory;
      #sql {Select directory_path into :os_directory from all_directories where directory_name = upper(:oracle_directory)};

      File path = new File( os_directory );
      String[] list = path.list();
      String element;

      for(int i = 0; i < list.length; i++)
      {
         element = list[i];
         #sql { INSERT INTO DIR_LIST (FILENAME) VALUES (:element) };
      }
   }

}
/

Create or replace
procedure get_dir_list( oracle_directory in varchar2 )
   as language java
   name 'DirList.getList( java.lang.String )';
/

Create or replace procedure move_os_file
(
   source_directory     in varchar2,
   source_file          in varchar2,
   target_directory     in varchar2,
   target_file          in varchar2
)

as

   srcdir               varchar2(255) := upper(source_directory);
   tgtdir               varchar2(255) := upper(target_directory);

begin

   --
   -- NOTE: If you're getting the all-too-familiar
   -- ORA-29292: file rename operation failed
   -- and you're SURE that your directory names are correct,
   -- and you're SURE that your privileges are correct, both at the 
   -- OS level, and within the database, there's one last thing that
   -- can get you. I learned the hard way that this command will NOT
   -- work successfully renaming a file from one filesystem to another,
   -- at least when those filesystems are NFS mounted. That is all. 
   --
   utl_file.frename(srcdir,source_file,tgtdir,target_file,TRUE);

end move_os_file;
/

Create or replace procedure get_xml_file_from_os 
(
   oracle_directory     in varchar2, 
   filename             in varchar2 ,
   o_xml                out xmltype
)

as

    dest_clob           clob;
    src_clob            bfile;
    dst_offset          number := 1 ;
    src_offset          number := 1 ;
    lang_ctx            number := DBMS_LOB.DEFAULT_LANG_CTX;
    warning             number;
    os_directory        varchar2(255);
    ora_directory       varchar2(255);

begin

   ora_directory := upper(oracle_directory);

   Select directory_path 
      into os_directory 
   from all_directories
   where directory_name = ora_directory;

   dbms_lob.createtemporary(dest_clob,true);

   dbms_output.put_line('OS directory: '||os_directory);
   dbms_output.put_line('Oracle directory: '||ora_directory);
   dbms_output.put_line('File: '||filename);
   src_clob := bfilename(ora_directory, filename);

   dbms_output.put_line('Opening lob'); 
   dbms_lob.open(src_clob,DBMS_LOB.LOB_READONLY);

   dbms_output.put_line('Loading lob');
   dbms_lob.LoadCLOBFromFile(
      dest_lob     => dest_clob,
      src_bfile    => src_clob,
      amount       => DBMS_LOB.GETLENGTH(src_clob),
      dest_offset  => dst_offset,
      src_offset   => src_offset,
      bfile_csid   => DBMS_LOB.DEFAULT_CSID,
      lang_context => lang_ctx,
      warning      => warning
   );

   dbms_lob.close(src_clob);

   o_xml := xmltype(dest_clob);

end get_xml_file_from_os;
/

Now that we can read files from the OS, we need to parse them and insert the data. Let’s create a couple of tables to support this.


Create table ANIMALS
(
   animal_id   number,
   type        varchar2(20)
);

Create table BREEDS
(
   animal_id    number,
   breed        varchar2(255)
);

And finally, a stored procedure to pull it all together. Here’s where we’ll do the XML parsing and inserting into the database.


Create or replace procedure process_xml

as

   v_XML                                XMLType;
   v_rowcount                           number;

   --
   -- Header variables
   --
   v_BatchID                            varchar2(20);
   v_AnimalCount                        number;
   v_BreedCount                         number;

   --
   -- Animal variables
   --
   v_AnimalID                           ANIMALS.ANIMAL_ID%TYPE;
   v_AnimalType                         ANIMALS.TYPE%TYPE;

   --
   -- Breed variables
   --
   v_BreedID                            BREEDS.ANIMAL_ID%TYPE;
   v_Breed                              BREEDS.BREED%TYPE;

Begin

   dbms_output.put_line('Getting directory listing');
   --
   -- The records in the DIR_LIST table will be removed on commit!
   --
   get_dir_list('XML_INBOUND');
   for i in (select filename from dir_list where upper(filename) like '%.XML')
   loop
      v_rowcount := 0;
      dbms_output.put_line('Filename: '||i.filename);
      get_xml_file_from_os('XML_INBOUND',i.filename,v_XML);
      dbms_output.put_line('File loaded');

      --
      -- Extract the Header
      --
      dbms_output.put_line('Getting Batch ID');
      v_BatchID         := v_XML.extract('/Data/Header/ID/text()').getStringVal();
      dbms_output.put_line('Getting Animal Count');
      v_AnimalCount     := v_XML.extract('/Data/Header/AnimalCount/text()').getNumberVal();
      v_BreedCount      := v_XML.extract('/Data/Header/BreedCount/text()').getNumberVal();
      dbms_output.put_line('Batch ID: '||v_BatchID);
      dbms_output.put_line('Animal Count: '||v_AnimalCount);
      dbms_output.put_line('Breed Count: '||v_BreedCount);

      --
      -- Loop through the animals
      --
      v_AnimalCount := 1;
      While v_XML.existsNode('/Data/Animals/Animal['||v_AnimalCount||']') = 1

      --
      -- Loop through Animals
      --
      loop
         v_AnimalID     :=  v_XML.extract('/Data/Animals/Animal['||v_AnimalCount||']/ID/text()').getNumberVal();
         v_AnimalType   :=  v_XML.extract('/Data/Animals/Animal['||v_AnimalCount||']/Type/text()').getStringVal();
         dbms_output.put_line('AnimalID: '||v_AnimalID);
         dbms_output.put_line('AnimalType: '||v_AnimalType);

         Insert into ANIMALS values (v_AnimalID,v_AnimalType);

         --
         -- Loop through breeds
         --
         v_BreedCount := 1;
         While v_XML.existsNode('/Data/Animals/Animal['||v_AnimalCount||']/BreedList/Breed['||v_BreedCount||']') = 1
         loop
            v_Breed     :=  v_XML.extract('/Data/Animals/Animal['||v_AnimalCount||']/BreedList/Breed['||v_BreedCount||']/text()').getStringVal();
            dbms_output.put_line('Breed: '||v_Breed);

            Insert into BREEDS values (v_AnimalID,v_Breed);

            v_BreedCount := v_BreedCount + 1;
   
         end loop;

         v_BreedCount := 1;
         v_AnimalCount := v_AnimalCount + 1;

      end loop;

      v_AnimalCount := 1;

      dbms_output.put_line('Moving file '||i.filename);
      move_os_file('XML_INBOUND',i.filename,'XML_ARCHIVE',i.filename);

   end loop;

   Commit;

End process_xml;
/

And that’s it! Now, let’s execute it and see what happens 🙂
Note: My xml file is called test1.xml


10:46:41 DEMO (ADAM) > Exec process_xml;
Getting directory listing
Filename: test1.xml
OS directory: /tmp/xml_inbound
Oracle directory: XML_INBOUND
File: test1.xml
Opening lob
Loading lob
File loaded
Getting Batch ID
Getting Animal Count
Batch ID: Batch 1
Animal Count: 2
Breed Count: 6
AnimalID: 1
AnimalType: Dog
Breed: Bullmastiff
Breed: Dalmation
Breed: American Bulldog
AnimalID: 2
AnimalType: Cat
Breed: Tabby
Breed: Manx
Breed: Savannah
Moving file test1.xml

Is the data there?


10:49:03 RMAN (ADAM) > Select count(*) from animals;
         2

10:49:11 RMAN (ADAM) > Select count(*) from breeds;
         6

Yep!

And if you look at /tmp/xml_archive..


[oracle@mastiff /home/oracle] [RMAN]$ ls /tmp/xml_archive
test1.xml

The file has been moved! That’s all folks!

11 thoughts on “How to read xml files from the OS, parse them, and load the data into Oracle

  1. Hi Adam and Anilton,

    I’m facing the same issue, so I’d like know if you got the procedure “dirlist” in PLSQL. Please provide me if you got.

    Thanks a lot.

    Cid

  2. Hi Adam and all
    I tried above example as it is,but getting following error

    *
    ERROR at line 1:
    ORA-29532: Java call terminated by uncaught Java exception: java.lang.NullPointerException
    ORA-06512: at “BAR12.GET_DIR_LIST”, line 1
    ORA-06512: at “BAR12.PROCESS_XML”, line 33
    ORA-06512: at line 1

    how could i get it?

    • java.lang.NullPointerException.
      Its returning nothing that’s why you get this error. make sure the path you specified is the right one

      • Is there any way to load multiple xml files?

        EXEC GET_DIR_LIST(‘/opt/oracle/xmlStore/clinicalTrials’);

        COMMIT;

        INSERT INTO CTGOV.CLINICAL_TRIALS (XML)
        SELECT XMLTYPE(BFILENAME(‘folder_where_xmls_are’,FILENAME), NLS_CHARSET_ID(‘UTF-8’))
        FROM CTGOV.NEW_RECORDS_TO_ADD_VIEW;

  3. Hi Adam and all
    I tried above example as it is,but getting following error

    ORA-29283: invalid file operation
    ORA-06512: at “SYS.UTL_FILE”, line 348
    ORA-06512: at “SYS.UTL_FILE”, line 1290
    ORA-06512: at “SYNTAX.MOVE_OS_FILE”, line 26
    ORA-06512: at “SYNTAX.PROCESS_XML”, line 93
    ORA-06512: at line 1

    how could i get it?

  4. Hi Adam and all
    I tried above example FOR MY FILE ,but getting following error

    ORA-31011: XML parsing failed
    ORA-19202: Error occurred in XML processing
    LPX-00283: document encoding is UTF-8-based but default input encoding is not
    Error at line 1
    ORA-06512: at “SYS.XMLTYPE”, line 272
    ORA-06512: at “SYNTAX.GET_XML_FILE_FROM_OS”, line 53
    ORA-06512: at “SYNTAX.PROCESS_XML_TEST”, line 37
    ORA-06512: at line 1

    how could i get it?

    Thanx in Advance
    Regard
    Farid

Leave a Reply

Your email address will not be published.