Do you wish for an ETL framework that is highly customizable,
light-weight and suits perfectly with all of your data loading needs? We
too! Let's build one together...
As an example of an ETL framework, let's consider this. "Harry" needs to load 2 tables everyday from one source system to some other target system. For this purpose, Harry has created 2 SQL jobs, each of which reads data from source through some "SELECT" statements and write the data in the target database using some "INSERT" statements. But in order to run these jobs, Harry needs couple of more information - e.g.
Such a setup is called "ETL Framework". And we are trying to build the perfect one here.
What is an ETL framework?
ETL Or "Extraction, Transformation and Loading" is the prevalent technological paradigm for data integration. While ETL in itself is not a tool, ETL processes can be implemented though varied tools and programming methods. This includes, but not limited to, tools like Informatica PowerCentre, DataStage, BusinessObjects Data Services (BODS), SQL Server Integration Services (SSIS), AbInitio etc. and programming methods like PL/SQL (Oracle), T-SQL (Microsoft), UNIX shell scripting etc. Most of these tools and programming methodologies use a generic setup that controls, monitors, executes and Logs the data flow through out the ETL process. This generic 'setup' is often referred as 'ETL framework'As an example of an ETL framework, let's consider this. "Harry" needs to load 2 tables everyday from one source system to some other target system. For this purpose, Harry has created 2 SQL jobs, each of which reads data from source through some "SELECT" statements and write the data in the target database using some "INSERT" statements. But in order to run these jobs, Harry needs couple of more information - e.g.
- when is a good time to execute these jobs? Can he schedule these jobs to run automatically everyday?
- Where is the source system located? (Connection information)
- What will happen if one of the jobs fail while loading the data? Will Harry get an alert message? Can he simply rerun the jobs after fixing the issue of the failure?
- How will Harry know if at all any data is retrieved or loaded to the target?
Such a setup is called "ETL Framework". And we are trying to build the perfect one here.
Critical Features of an ETL framework
In a very broad sense, here are a few of the features that we feel critical in any ETL framework- Support for Change Data Capture Or Delta Loading Or Incremental Loading
- Metadata logging
- Handling of multiple source formats
- Restartability support
- Notification support
- Highly configurable / customizable
Good-to-have features of ETL Framework
These are some good-to-have features for the framework- Inbuilt data reconciliation
- Customizable log format
- Dry-load enabling
- Multiple notification formats
0 comments:
Post a Comment