Using SQL Loader#

A large number of .CTL files have been provided by High Line to be used by the SQL Loader utility for loading comma delimited files into the interface tables of the target database. These files are located in the CONVERSION\CTL_FILES directory of the shared, mapped directory on the database server (this should be P:\P2K\CONVERSION\CTL_FILES if the default settings are used from the installation manual).

There is one .CTL file for each conversion interface file defined within the P2K schema of the database. Each of these files has the same basic format, with the name of the input file, the target table in the database and the list of available columns changing from file to file. The following example is a portion of the PERSONALS.CTL file, which is used for loading data into the INF_CV_PERSONALS table:

LOAD DATA
INFILE 'PERSONALS.CSV'
TRUNCATE
INTO TABLE INF_CV_PERSONALS
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
  PERSON_CODE           POSITION (*)
, USER_NAME             POSITION (*)
, GOVERNMENT_CODE       POSITION (*)
, FIRST_NAME            POSITION (*)
, LAST_NAME             POSITION (*)
, BIRTH_DATE            POSITION (*)
, GENDER                POSITION (*)
, LANGUAGE_CODE         POSITION (*)
, ETHNIC                POSITION (*)
, MIDDLE_NAME           POSITION (*)
, SALUTATION            POSITION (*)
, RANK                  POSITION (*)
, BIRTHPLACE            POSITION (*)
, CHALLENGED            POSITION (*)
, VETERAN_STATUS        POSITION (*)
, EFFECTIVE             POSITION (*) "NVL(RTRIM(:EFFECTIVE),'01-JAN-0001')"
, EXPIRY                POSITION (*) "NVL(RTRIM(:EXPIRY),'31-DEC-3999')"
, CHANGE_CODE           POSITION (*)
, GEO_CODE              POSITION (*)
...

The settings within this file will cause SQL Loader to read the comma delimited data from the PERSONALS.CSV file into the INF_CV_PERSONALS table within the database. If there are any records in this interface table when SQL Loader runs, the contents of the table will be truncated (deleted) before the new records are loaded. Also, if any of the records being loaded do not have enough columns to fill the records in the table, the columns not provided in the input file will be left as NULL.
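
The TRUNCATE keyword on the third line is what causes the existing contents to be deleted. If the existing records should be preserved instead, standard SQL Loader syntax allows this keyword to be replaced with APPEND, which adds the new records to the end of the table; a minimal sketch of the change:

LOAD DATA
INFILE 'PERSONALS.CSV'
APPEND
INTO TABLE INF_CV_PERSONALS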

The data from the comma delimited file will be loaded into the Oracle table using the column order defined in the .CTL file. In this example, the first column in the input file will be loaded into the PERSON_CODE column of the interface table, the second column will be loaded into the USER_NAME column and so on through the entire input record. If the input file does not match this format exactly, the order of these columns can be changed to match the actual input file format.

If the column being loaded into the interface table needs to have its data manipulated during the load, a modification token can be added to the column line, as in the EFFECTIVE and EXPIRY columns listed above. The column read from the input file will be modified by the SQL functions before being inserted into the interface table. The functions used here will set the column to a default value if a value has not been provided in the input record. These tokens can use any SQL function available within the database, including custom stored functions, as long as the entire token does not exceed 256 characters (258 characters including the beginning and ending quotation marks).
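
As another illustration, a token could trim trailing whitespace and force a code column to upper case. The line below is hypothetical, but follows the same pattern as the EFFECTIVE and EXPIRY columns above:

, GENDER                POSITION (*) "UPPER(RTRIM(:GENDER))"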

Since the location of the .CSV file has not been specified in the INFILE parameter, this file must be located in the current working directory from which SQL Loader is run. SQL Loader is called from within the Conversion application, which sets the current working directory to the same location as the Start In directory defined in the Conversion shortcut.
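
If the data files are kept in a fixed location instead, the INFILE parameter can contain a full path; the directory shown here is only an illustration:

INFILE 'P:\P2K\CONVERSION\DATA\PERSONALS.CSV'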

To call SQL Loader from within the Conversion application, select the Load item in the menu and enter the name of the control file that is to be processed. This actually runs the SQL Loader utility in the background and waits for it to complete. You will only see a dialog box appear when the SQL Loader utility completes, successfully or not. The dialog box will simply indicate that the load process completed; the SQL Loader log file will have to be checked to see if anything was actually loaded into the INF table.

The log file will be created in the working directory defined by the Start In parameter of the conversion shortcut. The name of the log file will be the same as the .CTL file passed to SQL Loader with a .LOG extension.
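
SQL Loader can also be run manually from a command prompt. A sketch, run from the same working directory so the INFILE and log files resolve correctly, and assuming the p2k/p2k@PROD connection string used later in this chapter:

sqlldr p2k/p2k@PROD control=PERSONALS.CTL log=PERSONALS.LOG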

A portion of the PERSONALS.LOG file is shown below as an example:

SQL*Loader: Release 9.2.0.6.0 - Production on Wed Jun 29 11:41:23 2005

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Control File:   personals.ctl
Data File:      personals.csv
  Bad File:     personals.bad
  Discard File:  none specified
 
 (Allow all discards)

Number to load: ALL
Number to skip: 0
Errors allowed: 50
Bind array:     64 rows, maximum of 256000 bytes
Continuation:    none specified
Path used:      Conventional

Table INF_CV_PERSONALS, loaded from every logical record.
Insert option in effect for this table: TRUNCATE
TRAILING NULLCOLS option in effect

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
PERSON_CODE                         FIRST     *   ,  O(") CHARACTER            
LAST_NAME                            NEXT     *   ,  O(") CHARACTER            

...

Record 103: Rejected - Error on table INF_CV_PERSONALS, column PHONE_UNLISTED.
ORA-01401: inserted value too large for column

Record 1201: Rejected - Error on table INF_CV_PERSONALS, column BIRTH_DATE.
ORA-01843: not a valid month


Table INF_CV_PERSONALS:
  1892 Rows successfully loaded.
  2 Rows not loaded due to data errors.
  0 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.


Space allocated for bind array:                 250002 bytes(17 rows)
Read   buffer bytes: 1048576

Total logical records skipped:          0
Total logical records read:          1894
Total logical records rejected:         2
Total logical records discarded:        0

Run began on Wed Jun 29 11:41:23 2005
Run ended on Wed Jun 29 11:41:24 2005

Elapsed time was:     00:00:00.98
CPU time was:         00:00:00.28

This SQL Loader log file has been edited to fit this page. The actual log will contain a complete list of all the columns defined in the .CTL file and their SQL modification tokens. However, the important portions of the log file are included for this discussion.

The first few lines after the SQL Loader banner list the files used when the .CTL file was processed. The first is the name of the actual .CTL file, followed by the data file, the bad file and the discard file (if specified). The bad file will only be generated if records cannot be loaded into the database table from the comma delimited file. Typically, these records will not load if the data type of one of the fields does not match the table definition, or if the data is too large to fit into the column.

In the middle of the log example, there are two records that did not get loaded into the table. The first record had data for one column that was too large. This error shows the record number from the input file and the name of the column that generated the error. The second rejected record shows similar information for another error; however, this error was caused by an invalid date format. Both of these records will be copied to the bad file listed in the first section of the log.

The end of the log file also shows the total number of records read and loaded into the database table. If all the records have not been loaded, the input file may need to be regenerated by the extract logic of your legacy system and the load processed again. Alternatively, additional modification tokens can be added to the .CTL file to compensate for these errors and the file then reloaded.
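
For example, the ORA-01843 error shown above could be compensated for with a date-conversion token. This sketch assumes the legacy extract supplies birth dates in YYYYMMDD format, following the same pattern used for HIRE_DATE later in this chapter:

, BIRTH_DATE            POSITION (*) "TO_CHAR(TO_DATE(RTRIM(:BIRTH_DATE),'YYYYMMDD'),'DD-Mon-YYYY')"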

Adding Translation Lexicons to Control Files#

In situations where a large number of values need to be translated from the legacy system to the application, it may be necessary to build a translation lexicon for the SQL Loader control file to make use of. This is the same process that has already been done for several fields in the conversion process, such as CHANGE_CODE.

A translation lexicon can be built easily within Excel and then added to the current list of translations in the system. The following is an example of adding a translation lexicon into the system for the CHANGE_CODE. The first step is to build a comma separated values (CSV) file that lists all the CHANGE_CODE values that will be extracted from the current system and their equivalent values in the application. The following is a section of one such file:

TRAN_DCR,.old,.new,P2K Description
TRAN_DCR,01,EMPLOY CAREER,01 EMPLOYMENT - CAREER                           
TRAN_DCR,02,EMPLOY TERM,02 EMPLOYMENT - TERM                             
TRAN_DCR,04,EMPLOY TO CAREER,04 CONVERSION TO CAREER (EMPLOYMENT)             
TRAN_DCR,05,REHIRE,05 REHIRE                                        
TRAN_DCR,06,REACT ASSIGNMT,06 REACTIVATE ASSIGNMENT                         
TRAN_DCR,08,TERMINATION,08 TERMINATION OF EMPLOYMENT                     
TRAN_DCR,15,CAREER FT TO PT,15 CAREER FULL-TIME TO CAREER PART-TIME          
TRAN_DCR,16,CAREER PT TO FT,16 CAREER PART-TIME TO CAREER FULL-TIME          

The first column is the name of the lexicon that will be created in the IMLN screen to hold the translation lexicon values. The second column contains the values that will be extracted from the legacy system, and the third column contains the value that it will be translated to. The fourth column is optional, and can contain the meaning of the value or its original value from the legacy system.

Once the comma delimited file has been created, use a text editor to append this list of translations to the current list in the TRANSLATE.DAT file. The order of the records in the TRANSLATE.DAT file does not matter functionally, but keeping them in alphabetical order makes entries easier to read and locate.

The next step is to add the name of the lexicon to the CV_TRAN_LOAD.SQL script so that the entry will be created the next time the script is run. Below is a section of the SQL script that inserts the lexicon names into the database. Add another entry to the end of the script to create the new lexicon name.

-- MODIFICATION HISTORY
-- ----------------------------------------------------
-- 20011015-IB-created to easily delete and rebuild translation tables.
--             loads the P2K_SMGLXID function
--             inserts the TRAN_% Lexicon Names
--             loads in the Lexicon_Values

@P2K_SVGLI.sql

delete from p2k_am_lexicon_names
where  LEXICON_NAME like 'TRAN_%'
and    LEXICON_TYPE = 'UC'
/
commit;
insert into p2k_am_lexicon_names
       (LEXICON_NAME, LEXICON_TYPE, VALUE_LENGTH)
values ('TRAN_PPC', 'UC', 5)
/
insert into p2k_am_lexicon_names
       (LEXICON_NAME, LEXICON_TYPE, VALUE_LENGTH)
values ('TRAN_DES', 'UC', 30)
/
.
.
.
insert into p2k_am_lexicon_names
       (LEXICON_NAME, LEXICON_TYPE, VALUE_LENGTH)
values ('TRAN_DTR', 'UC', 30)
/
insert into p2k_am_lexicon_names
       (LEXICON_NAME, LEXICON_TYPE, VALUE_LENGTH)
values ('TRAN_DCR', 'UC', 30)
/
commit;

host sqlldr p2k/p2k@PROD Translate.ctl

In the script listed above, you will see that the last INSERT statement adds the TRAN_DCR entry to the lexicon names. The entry is defined as a User Created (UC) lexicon with a maximum value length of thirty characters. Sixteen characters is typically enough for most translations, but the length can be increased to as much as thirty characters. Once the changes have been made to both the TRANSLATE.DAT file and the CV_TRAN_LOAD.SQL script, run SQL Plus, log in as P2K and execute the CV_TRAN_LOAD script. This will delete and recreate all the translation lexicons in the target database.
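
A sketch of that step from a command prompt, assuming the same PROD connection string used at the bottom of the script:

sqlplus p2k/p2k@PROD @CV_TRAN_LOAD.SQL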

The last step is to alter the SQL Loader control file to make use of the new translation lexicon. The example listed below is a modified version of the ASSIGNMENTS.CTL file, showing the translation of the CHANGE_CODE column as it is being loaded into the INF_CV_ASSIGNMENTS table:

LOAD DATA
INFILE 'assignments.dat'
TRUNCATE
INTO TABLE INF_CV_ASSIGNMENTS
FIELDS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
  ENTITY_CODE           POSITION(*)
, PERSON_CODE           POSITION(*)
, EMPLOYMENT_TYPE       POSITION(*)
, HIRE_DATE             POSITION(*) "TO_CHAR(TO_DATE(RTRIM(:HIRE_DATE),'YYYYMMDD'),'DD-Mon-YYYY')"
, ASSIGNMENT_CODE       POSITION(*)
, ASSIGNMENT_TYPE       POSITION(*)
, ASSIGNMENT_STATUS     POSITION(*)
.
.
.
, CHANGE_CODE           POSITION(*) "p2k_smglx('TRAN_DCR', rtrim(ltrim(:change_code)))"

As you can see from this example, the translation is actually performed by a stored function created by High Line called P2K_SMGLX. This function takes two parameters and returns a string containing the translation. The first parameter is the name of the lexicon that was created to hold the translation list; the second is the field that contains the coded value that needs to be translated. In this example, the field that contains the coded value is the same field that the translation is being performed on, but this is not a requirement. It is possible to use any column, or set of columns, in the record for translation.
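
For instance, a translation could be keyed on a combination of two input fields by concatenating them inside the function call. The lexicon name and column names below are hypothetical:

, STATUS_CODE           POSITION(*) "p2k_smglx('TRAN_XYZ', rtrim(:department_code) || rtrim(:status_code))"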