Using SQL Loader#
A large number of .CTL files have been provided by High Line to be used by the SQL Loader utility for loading comma delimited files into the interface tables of the target database. These files are located in the CONVERSION\CTL_FILES directory of the shared, mapped directory on the database server (this should be P:\P2K\CONVERSION\CTL_FILES if the default settings are used from the installation manual).
There is one .CTL file for each conversion interface file defined within the P2K schema of the database. Each of these files has the same basic format with the name of the input file, the target table in the database and the list of columns available changing for each file.
The LOAD DATA statement is required at the beginning of the control file.
INFILE specifies that the data is found in an external file and not in the control file.
TRUNCATE indicates that the contents of the table will first be purged before loading of the data. If this line did not exist, the interface table can be appended to.
The INTO TABLE statement is required to identify the table to be loaded (INF_CV_PERSONALS) into. By default, SQL*Loader requires the table to be empty before it inserts any records.
FIELDS TERMINATED BY specifies that each field is terminated by a comma and may also be enclosed by quotation marks. Data types for all fields default to CHAR.
TRAILING NULLCOLS allows you to terminate the record before the end of all the fields. Any fields not specified will be set to NULL
Specifies the names of each column to be loaded and the order the must be coded. SQL code can also be added to the columns to translate the data as it’s loaded into the interface table
The following example is a portion of the PERSONALS.CTL file and is used for loading data into the INF_CV_PERSONALS:
LOAD DATA INFILE 'PERSONALS.CSV' TRUNCATE INTO TABLE INF_CV_PERSONALS FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' TRAILING NULLCOLS ( PERSON_CODE POSITION (*) , USER_NAME POSITION (*) , GOVERNMENT_CODE POSITION (*) , FIRST_NAME POSITION (*) , LAST_NAME POSITION (*) , BIRTH_DATE POSITION (*) , GENDER POSITION (*) , LANGUAGE_CODE POSITION (*) , ETHNIC POSITION (*) , MIDDLE_NAME POSITION (*) , SALUTATION POSITION (*) , RANK POSITION (*) , BIRTHPLACE POSITION (*) , CHALLENGED POSITION (*) , VETERAN_STATUS POSITION (*) , EFFECTIVE POSITION (*)"NVL(RTRIM(:EFFECTIVE),'01-JAN-0001') " , EXPIRY POSITION (*)"NVL(RTRIM(:EXPIRY),'31-Dec-3999') " , CHANGE_CODE POSITION (*) , GEO_CODE POSITION (*) ...
The settings within this file will cause SQL Loader to read the comma delimited data from the PERSONALS.CSV file into the INF_CV_PERSONALS tables within the database. If there are any records in this interface table when SQL Loader runs, the contents of the table will be truncated (deleted) prior to the new records being loaded. Also, if any of the records being loaded do not have enough columns to fill the records in the table, the columns not provided in the input file will be left as NULL.
The data from the comma delimited file will be loaded into the Oracle table using the column order defined in the .CTL file. In this example, the first column in the input file will be loaded into the PERSON_CODE column of the interface table, the second column will be loaded into the USER_NAME column and so on through the entire input record. If the input file does not match this format, exactly, the order of these columns can be changed to match the actual input file format.
If the column being loaded into the interface table needs to have the data manipulated during the load, a modification token can be added to the column line, as in the EFFECTIVE and EXPIRY columns listed above. The column read from the input file will be modified by the SQL functions, before being inserted into the interface table. The functions used here will set the column to a default value if a value has not been provided in the input record. These tokens can use any SQL function available within the database, including custom stored functions, as long as the entire token does not exceed 256 characters, 258 characters including the beginning and ending quotation marks.
Since the location of the .CSV file has not been specified in the INFILE parameter, this file must be located in the current working directory that SQL Loader is running from. SQL Loader is called from within the Conversion application and will have the current working directory set to the same location as the Start In directory defined in the Conversion shortcut.
To call SQL Loader from within the Conversion application select the Load item in the menu and enter the name of the control file that it is to process. This actually runs the SQL Loader utility in the background and waits for it to complete. You will only see a dialog box appear when the SQL Loaders utility completes, successfully of not. The dialog box will simply indicated that the load process completed, but the SQL Loader log file will have to be checked to see if anything was actually loaded into the INF table.
The example shown below is an example of the PERSONALS.DAT file. It contains only the first few fields and a limited number of records. A complete data file would require all of the mandatory fields, as well as providing commas for all the fields that are to be set to a value of NULL.
1234,MASTER,333-22-4444,John,Doe,01-Apr-1960,Male,…,01-Jan-0001,31-Dec-3999,Initial Record,… 1235,GUEST,123-45-6789,Jane,Doe,29-Feb-1964,Female,…,01-Jan-0001,31-Dec-3999,Initial Record,… . . . 54321,,987-65-4321,John,Smith,25-Dec-1966,Male,…,01-Jan-0001,31-Dec-3999,Initial Record,…
As you can see from the example, each record is fully contained on one line within the data file and terminated by a carriage return. Since all fields are treated as character fields by SQL*Loader, we are not required to enclose the text within quotes (‘’). However, this can be done if the tool you are using to create the data files does require them.
The log file will be created in the working directory defined by the Start In parameter of the conversion shortcut. The name of the log file will be the same as the .CTL file passed to SQL Loader with a .LOG extension.
A portion of the PERSONALS.LOG file is show below, as an example:
SQL*Loader: Release 9.2.0.6.0 - Production on Wed Jun 29 11:41:23 2005 Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved. Control File: personals.ctl Data File: personals.csv Bad File: personals.bad Discard File: none specified (Allow all discards) Number to load: ALL Number to skip: 0 Errors allowed: 50 Bind array: 64 rows, maximum of 256000 bytes Continuation: none specified Path used: Conventional Table INF_CV_PERSONALS, loaded from every logical record. Insert option in effect for this table: TRUNCATE TRAILING NULLCOLS option in effect Column Name Position Len Term Encl Datatype ------------------------------ ---------- ----- ---- ---- --------------------- PERSON_CODE FIRST * , O(") CHARACTER LAST_NAME NEXT * , O(") CHARACTER ... Record 103: Rejected - Error on table INF_CV_PERSONALS, column PHONE_UNLISTED. ORA-01401: inserted value too large for column Record 1201: Rejected - Error on table INF_CV_PERSONALS, column BIRTH_DATE. ORA-01843: not a valid month Table INF_CV_PERSONALS: 1892 Rows successfully loaded. 2 Rows not loaded due to data errors. 0 Rows not loaded because all WHEN clauses were failed. 0 Rows not loaded because all fields were null. Space allocated for bind array: 250002 bytes(17 rows) Read buffer bytes: 1048576 Total logical records skipped: 0 Total logical records read: 1894 Total logical records rejected: 6 Total logical records discarded: 0 Run began on Wed Jun 29 11:41:23 2005 Run ended on Wed Jun 29 11:41:24 2005 Elapsed time was: 00:00:00.98 CPU time was: 00:00:00.28
This SQL loader log file has been edited to fit this page. The actual log will have a complete list of all the columns defined in the .CTL file and their SQL modification tokens. However, the important portions of the log file are included for this discussion.
The first few lines after the SQL Loader banner lists the files used when the .CTL file was processed. The first is the name of the actual .CTL file, followed by the log file, the bad file and the discard file (if specified). The bad file will only be generated if records cannot be loaded into the database table from the comma delimited file. Typically, these records will not load if the data type for one of the fields does not match the table definition, or if the data is too large to fit into the column.
In the middle of the log example, there are two records that did not get loaded into the table. The first record had data for one column that was too large. This error shows the record number from the input file and the column name that generated the error. The second record generated shows similar information for another error, however this error was caused by an invalid date format. Both of these records will be copied to the bad file listed in the first section of the log.
The end of the log file also shows the total number of records read and loaded into the database table. If all the records have not been loaded, the input file may need to be regenerated by the extract logic of your legacy system and the load processed again. Alternately, additional modification tokens can be added to the .CTL file to compensate for these errors and the then reloaded.
Viewing / Altering the INF tables#
Once the data has been loaded into the interface tables we can use SQL*Plus to view and/or alter the individual rows, or a group of rows. The following script is an example of how to count the number of rows that exist in the PERSONALS interface table:SELECT COUNT (*) FROM INF_CV_PERSONALS; COUNT (*) --------- 0This information can be used to verify that all the records in the comma-delimited file were loaded successfully into the interface table. The total number of records within the interface table should be the same number of records in the flat file. If the numbers do not match, the SQL*Loader program would have rejected some of the records.
SQL*Loader Rejects#
Records are rejected by SQL*Loader when the input format is invalid. For example, if the second enclosure delimiter is missing, or if a delimited field exceeds its maximum length, SQL*Loader rejects the record. Rejected records are placed in the bad file, if one has been specified within the control file. If you chose to capture rejected records is a “Bad file” you must specify the name of the file within the SQL*Loader control file. The following line will capture the records into a file called BADFILE.RJT.BADFILE BADFILE.RJT
You may have to qualify the location of the bad file by using a line similar to the following:
BADFILE C:\HIGHLINE\P2K\CONVERSION\REJECT\BADFILE.RJT
Oracle Rejects#
After a record is accepted for processing by SQL*Loader, a row is sent to Oracle for insertion. If Oracle determines that the row is valid, then the row is inserted into the database. If not, the record is rejected, and SQL*Loader puts it in the bad file. The row may be rejected, for example, because a key is not unique, because a required field is NULL, or because the field contains invalid data for the Oracle data type.If the data can be evaluated according to the WHEN-clause criteria (even with unbalanced delimiters) then it is either inserted or rejected.
If a record is rejected on insert, then no part of that record is inserted into any table. For example, if data in a record is to be inserted into multiple tables, and most of the inserts succeed, but one insert fails; then all the inserts from that record are rolled back. The record is then written to the bad file, where it can be corrected and reloaded. Previous inserts from records without errors are not affected.
SQL*Loader Discards#
As SQL*Loader executes, it may create a file called the discard file. This file is created only when it is needed, and only if you have specified that a discard file should be enabled. The discard file contains records that were filtered out of the load because they did not match any of the record-selection criteria specified in the control file.The discard file therefore contains records that were not inserted into any table, up to a specifiable maximum. If a record's data is written to any table, it is not written to the discard file.
The discard file is written in the same format as the data file. The discard data can be loaded with the existing control file, after any necessary editing or correcting. These discarded records can, optionally, be placed in a reject file. However, the name and location of the reject file must be added to the control file prior to running SQL*Loader. The following line would need to exist after the INFILE setting in the control file:
DISCARDFILE 'BADPERS.RJT'
DISCARDMAX 999
This identifies the reject file as being located in the current work directory and named BADPERS.RJT. If the number of rejected records exceeds 999, the SQL*Loader will abort the load process for the remaining records in the data file.