Oracle Warehouse Builder Automated ETL Processing

Warehouse Builder is a powerful but somewhat under-the-radar tool. Like many
of Oracle's non-RDBMS products, it did not get off to the best possible start
with regard to usability, freedom from bugs, ease of installation,
documentation... well, you get the idea. In later versions, however, Warehouse
Builder has evolved into an extremely feature-rich and highly functional
application that allows users to do some pretty amazing things without having
to get wrapped up in all the gory details.

This article assumes that you
have already installed OWB and are trying to figure out how to build an
automated ETL process. What are the step-by-step details to get there?

Context

Warehouse Builder,
often referred to as OWB, can automate the loading of flat files into a
database. Many DBAs are intimately familiar with using a combination of
SQL*Loader and shell scripts, plus a cron job running here and there. OWB does
this (and much more) by abstracting the process behind a wizard-driven
graphical interface with many point-and-click features. Through its Design
Center and Control Center interfaces, a user can design and deploy the ETL
process (here we will focus only on the loading part, that is, what it takes
to get the contents of a delimited flat file loaded into a table, with no
changes to the data along the way). And the deployment is not limited to the
server you are working on at the time. OWB allows you to design a process on
one server and then deploy the steps to another server, or to more than one
server if desired.

What is the concept behind this operation? Listing the steps to achieve it
helps to provide a frame of reference.

1.
Identify the source file,
including the location and nature of the data within.

2.
Create an external table whose
definition serves as a data dictionary container for the file.

3.
Identify and, if necessary,
create the "real" table in the database.

4.
Make it all happen on a
scheduled basis or all at once.

OWB's approach to this
process is to use the metadata of these objects and link them together through
a mapping and a process flow. The process flow, in fact, can be created as a
visual artifact, i.e., OWB will produce a workflow diagram, and interestingly
enough, that is exactly what OWB uses behind the scenes: Oracle Workflow.

Think of every piece of this
process as an object: the file, the location of the file, the external
table, the real table, the mapping from the flat file to the external table,
the job that runs it all at the end, and so on. Everything is an object, and
objects can be linked together via hierarchies and dependencies. Each type of
object lives in a module. Since this tool is Java-based, the object-oriented
design makes sense. Generally speaking, every object is an instance, or
instantiation, of a class.

Scenario

A typical ETL scenario might
involve downloading a flat file on a recurring basis (and you will also be
able to run just the mapping chunk by itself). If you break the overall task
down into a multi-step process involving different parts of the project tree,
it is easy to understand. As a point of reference, we will work our way down
the project tree hierarchy, with one exception. Within a project, to start,
you will need to have a module created under the Oracle node.

As a tip, remember that
almost all categories involve the same two steps: create a module and import
metadata. Also, the examples are not always based on the same table (tables
related to customers and countries were used).

An expanded project tree is
shown below.

The areas of interest in the
project tree, in the order in which we want to build the ETL process, are:

  • Files

  • External Tables

  • Tables

  • Mappings

Once the mapping step is
completed, you then scroll down to Process Flows and Schedules.

Create a new module under
Files and identify the location of the file.

Complete the Create Module step
and launch the metadata import wizard. This is where you tell OWB about the
flat file's content, which in turn launches the Flat File Sample Wizard.
Don't forget to add a date mask for date fields. After the file has been
identified and sampled, you are ready to create the metadata for an external
table. Step 1 of the Flat File Sample Wizard is shown below.

When identifying the field
delimiter, you can manually enter something other than what the dropdown shows
(for example, a pipe). Once the definition of the external table is complete,
you can deploy the external table and create it in the target schema.
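For reference, what OWB deploys at this point is ordinary external table DDL. The sketch below shows the general shape of such a statement; the directory, file, table, and column names are hypothetical, not what OWB would generate for your project.

```sql
-- A sketch of the kind of DDL OWB deploys for an external table;
-- all names here are hypothetical.
CREATE TABLE customers_ext (
  customer_id    NUMBER,
  customer_name  VARCHAR2(100),
  signup_date    DATE
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY src_files_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY '|'   -- e.g., a manually entered pipe delimiter
    MISSING FIELD VALUES ARE NULL
    ( customer_id,
      customer_name,
      signup_date CHAR(10) DATE_FORMAT DATE MASK "YYYY-MM-DD"  -- the date mask
    )
  )
  LOCATION ('customers.dat')
);
```

Selecting from this table reads the flat file directly; no data is stored in the database.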

In the next step, we have to
create a table. If the table needs to be created from scratch, open the Data
Object Editor and design it. Make sure the column definitions match those in
the corresponding external table. As a tip, deploy the external table first,
and then in SQL*Plus, create the table via CTAS from the external table (just
the table definition, no data).
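The CTAS tip boils down to one statement in SQL*Plus; the `WHERE 1 = 2` predicate copies the structure but no rows (table names here are hypothetical):

```sql
-- Create the "real" table from the external table's definition only;
-- the always-false predicate ensures no data is copied.
CREATE TABLE customers AS
  SELECT * FROM customers_ext WHERE 1 = 2;
```

This guarantees the column definitions match without retyping them in the editor.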

The next step is to map the
contents of the external table into the real table. Create a new mapping and
map the columns, as the image below suggests.

Deploy the mapping and, if
successful, at this point you can manually run (start) the load of data from
the external table into the actual table. This can be verified by entering
the Control Center and watching the jobs being executed.
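Conceptually, a simple column-to-column mapping like this amounts to a set-based insert from the external table into the target. A hedged sketch, with hypothetical table and column names:

```sql
-- What a straight load from external to real table boils down to:
INSERT INTO customers (customer_id, customer_name, signup_date)
  SELECT customer_id, customer_name, signup_date
  FROM   customers_ext;
COMMIT;
```

OWB generates considerably more elaborate PL/SQL than this (auditing, error handling, and so on), but the data movement is the same.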

To put this into an
automated workflow, a new process flow module is what it takes. Create the
process flow module, package, and process flow, and launch the Process Editor.
Add in the mapping and end states, with a result like this:

Once the process flow diagram
is complete, create a new schedule. The schedule will be generic, i.e., it
is not tied to anything. You have to go back to the mapping and associate the
schedule with it. Once scheduled, loading the staging table from a flat file
is automated.
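Under the covers, an associated schedule is realized as a database scheduler job. As a rough, hedged sketch of the equivalent in plain DBMS_SCHEDULER terms (the job name, mapping package, and timing are hypothetical, and the real generated job differs in detail):

```sql
-- A hand-rolled equivalent of a nightly OWB schedule:
-- run the generated mapping package's entry point at 2:00 AM daily.
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'LOAD_CUSTOMERS_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN customers_map.main; END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2',
    enabled         => TRUE);
END;
/
```

The point of OWB, of course, is that you get this without writing it by hand.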

In conclusion

Although this was a somewhat
high-level overview of what it takes to automate a load, it covered each
and every project item you need to address to make it work. For a one-off
file, and even for the same file or set of files day after day, it may be
easier to script this workflow and schedule it via a cron job. On the
other hand, once this process is configured, it can be deployed from a
development environment to QA or production. And don't let the "warehouse"
part of this tool's name lead you to believe that it is only for data
warehouses. If you need to map, design, and schedule an ETL load into another
type of database, Warehouse Builder will work great there, too.



See all articles by columnist
Steve Callan
