Saturday, June 15, 2013

Informatica Tutorial

Informatica is a widely used ETL tool for extracting data from source systems and loading it into the target after applying the required transformations. In the following section, we will explain the usage of Informatica in a data warehouse environment with an example. We are not going into the details of data warehouse design here; this tutorial simply provides an overview of how Informatica can be used as an ETL tool.

Note: The exchanges and companies described here are for illustrative purposes only.

Bombay Stock Exchange (BSE) and National Stock Exchange (NSE) are two major stock exchanges in India on which the shares of ABC Corporation and XYZ Private Limited are traded Monday through Friday, except on holidays. Assume that a software company, “KLXY Limited”, has taken up the project to integrate the data between the two exchanges, BSE and NSE.

In order to complete this task of integrating the raw data received from NSE and BSE, KLXY Limited allots responsibilities to data modelers, DBAs, and ETL developers. Many IT professionals may be involved during the entire ETL process, but we are highlighting the roles of these three for easy understanding and better clarity.
  • Data Modelers analyze the data from the two sources (Record Layout 1 and Record Layout 2), design data models, and then generate scripts to create the necessary tables and the corresponding records (a hypothetical example follows this list).
  • DBAs create the databases and tables based on the scripts generated by the data modelers.
  • ETL developers map the extracted data from source systems and load it to target systems after applying the required transformations.
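For illustration only, one of the scripts produced by the data modelers for the integrated target might look like the sketch below. Every table and column name here is an assumption; the actual record layouts (Record Layout 1 and Record Layout 2) would drive the real design.

-- Hypothetical target table for the integrated BSE/NSE daily trade data.
-- All names and data types are illustrative assumptions, not a real layout.
CREATE TABLE daily_trades (
    trade_date     DATE          NOT NULL,
    exchange_code  VARCHAR2(10)  NOT NULL,   -- 'BSE' or 'NSE'
    company_name   VARCHAR2(100) NOT NULL,   -- e.g. ABC Corporation, XYZ Private Limited
    open_price     NUMBER(12,2),
    close_price    NUMBER(12,2),
    traded_volume  NUMBER(15),
    CONSTRAINT pk_daily_trades PRIMARY KEY (trade_date, exchange_code, company_name)
);

The ETL developers would then map each exchange's feed into a common structure like this, applying the required transformations along the way.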

Informatica Upgrade Process:

The upgrade can be divided into the following stages:
  1. Upgrading the domain and server files: run the Informatica server installer and select the upgrade option. The domain upgrade wizard installs the server files and configures the domain. If the domain has multiple nodes, you must run the upgrade on all nodes.
The following table describes the actions that the installer performs when you upgrade Informatica:

  • Installs Informatica: installs the Informatica directories and files into the new installation directory.
  • Copies the infa_shared directory: copies the contents of the infa_shared directory from the existing installation directory into the new installation directory.
  • Copies the mm_files directory: copies the contents of the mm_files directory from its default location in the existing installation directory into the new installation directory.
  • Upgrades the domain: upgrades the domain to run version 9.0.1 application services. The upgrade retains the user and administrator accounts in the domain.
  • Starts Informatica Services: starts Informatica Services on the node.
  2. Upgrading the application services: after you upgrade the domain and server files, log in to the Administrator tool and upgrade the application services. The service upgrade wizard provides a list of all application services that must be upgraded and upgrades them in the order required by the dependent objects.
  3. Upgrading the Informatica client: run the Informatica client installer and select the upgrade option.
Pre-Upgrade Tasks
Before you upgrade the domain and server files, complete the following tasks:
1. Review the prerequisites.
2. Verify the file descriptor settings.
3. Verify the configuration of the environment variables used by the installer.
4. Clear the configuration of environment variables that pertain to previous installations of Informatica.
5. Prepare the domain.
6. Prepare the PowerCenter repository.
7. Prepare the PowerCenter Profiling warehouse.
8. Prepare for an upgrade from PowerCenter 8.6.1:
  • Export Reference Table Manager Data.
  • Prepare Metadata Manager.
  • Prepare the Data Analyzer repository.
9. Shut down the domain.
Upgrading the Domain and Server in Graphical Mode:
You can upgrade the Informatica domain and server files in graphical mode on Windows or UNIX.
1. Verify that your environment meets the minimum system requirements and complete the pre-upgrade tasks.
2. Log in to the machine with the same user account that you used to install the previous version.
3. Close all other applications.
4. To begin the upgrade on Windows, run install.bat from the root directory.
To begin the upgrade on UNIX, use a shell command line to run install.sh from the root directory, and then select the option for graphical mode installation.
5. In the Installation Type window, select Upgrade to Informatica 9.0.1 and click Next.
  • The Upgrade Pre-Requisites window displays the upgrade system requirements. Verify that all requirements are met before you continue the upgrade.
6. Click Next.
7. In the Upgrade Directory window, enter the following directories:
  • Directory of the Informatica product to upgrade: the directory that contains the previous version of PowerCenter that you want to upgrade.
  • Directory for Informatica 9.0.1: the directory in which to install Informatica 9.0.1. Enter the absolute path for the installation directory. The directory cannot be the same as the directory that contains the previous version of PowerCenter. The directory names in the path must not contain spaces or the following special characters: @|* $ # ! % ( ) { } [ ] , ; '
On Windows, the installation directory must be on the current machine.
8. Click Next.
The upgrade wizard displays a warning to shut down the Informatica domain before you continue the upgrade.
9. Click OK.
10. In the Pre-Installation Summary window, review the upgrade information, and click Install to continue.
The upgrade wizard installs the Informatica server files to the Informatica 9.0.1 installation directory.
11. In the Domain Configuration Upgrade window, the upgrade wizard displays the database and user account information for the domain configuration repository to be upgraded.
  • Database type: the database for the domain configuration repository.
  • Database user ID: the database user account for the domain configuration repository.
  • User password: the password for the database user account.
  • Tablespace: displayed for IBM DB2 only. The name of the tablespace for the upgraded domain configuration repository tables. If the database of the domain configuration repository that you are upgrading does not use a 32 K tablespace, this property is blank; enter the name of a tablespace with a page size of 32 K. In a single-partition database, if you do not specify a tablespace name, the installer writes the upgraded tables to the default tablespace, which must be 32 K. In a multi-partition database, you must specify a 32 K tablespace (see the example below).
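For IBM DB2 only, and purely as an illustration (the buffer pool and tablespace names and sizes below are assumptions; consult the DB2 documentation for your environment), a 32 K tablespace can be created along these lines:

-- Illustrative DB2 SQL: create a 32 K buffer pool and a 32 K tablespace
-- for the upgraded domain configuration repository tables (names are assumed).
CREATE BUFFERPOOL bp32k IMMEDIATE SIZE 1000 PAGESIZE 32K;
CREATE TABLESPACE ts32k PAGESIZE 32K MANAGED BY AUTOMATIC STORAGE BUFFERPOOL bp32k;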
The upgrade wizard displays the database connection string for the domain configuration repository based on how the connection string of the previous version was created at installation:
  • If the previous version used a JDBC URL at installation, the upgrade wizard displays the JDBC connection properties, including the database address and service name.
  • If the previous version used a custom JDBC connection string at installation, the upgrade wizard displays the custom connection string.
  • Optionally, you can specify additional JDBC parameters to include in the connection string. To provide additional JDBC parameters, select JDBC parameters and enter a valid JDBC parameter string.
12. Click Test Connection to verify that you can connect to the database, and then click OK to continue.
13. Click Next.
On the Port Configuration Upgrade window, the upgrade wizard displays the default port numbers assigned to the domain and node components.
14. You can specify new port numbers or use the default port numbers.
The following table describes the ports that you can specify:
  • Service Manager port: the port number used by the Service Manager on the node. Client applications and the Informatica command line programs use this port to communicate with the services in the domain.
  • Informatica Administrator port: the port number used by the Administrator tool. Available if you upgrade a gateway node.
  • Informatica Administrator shutdown port: the port number used by the Administrator tool to listen for shutdown commands. Available if you upgrade a gateway node.
15. Click Next.
On Windows, the upgrade wizard creates a service to start Informatica. By default, the service runs under the same user account as the account used for installation. You can run the Windows service under a different user account.
16. Select whether to run the Windows service under a different user account.
The following table describes the properties that you set:
  • Run Informatica under a different user account: indicates whether to run the Windows service under a different user account.
  • User name: the user account with which to run the Informatica Windows service, in the format DomainName\UserAccount. This user account must have the Act as operating system permission.
  • Password: the password for the user account with which to run the Informatica Windows service.
17. Click Next.
The Post-Upgrade Summary window indicates whether the upgrade completed successfully.
18. Click Done.
Upgrade the Application Services:
1. Configure Informatica Environment Variables
You can configure the INFA_JAVA_OPTS, INFA_DOMAINS_FILE, and INFA_HOME environment variables to store memory, domain, and location settings.
(i) INFA_JAVA_OPTS
For example, to configure 1 GB of system memory for the Informatica daemon on UNIX in a C shell:
setenv INFA_JAVA_OPTS "-Xmx1024m"
(ii) INFA_DOMAINS_FILE
Set the value of the INFA_DOMAINS_FILE variable to the path and file name of the domains.infa file. If you configure the INFA_DOMAINS_FILE variable, you can run infacmd and pmcmd from a directory other than /server/bin.
Configure the INFA_DOMAINS_FILE variable on the machine where you install the Informatica services. On Windows, configure INFA_DOMAINS_FILE as a system variable.
(iii) INFA_HOME
If you use a soft link for any of the Informatica directories on UNIX, configure INFA_HOME so that any Informatica application or service can locate the other Informatica components it needs to run. Set INFA_HOME to the location of the Informatica installation directory.
2. Configure Locale Environment Variables
Use the following command to verify that the value of the locale environment variable is compatible with the language settings for the machine and the type of code page you want to use for the repository:
locale -a
Locale for Oracle Database Clients
For example, if the value is american_america.UTF8, set the variable in a C shell with the following command:
setenv NLS_LANG american_america.UTF8
Service Upgrade:
Use the service upgrade wizard to upgrade services.
1. In the Informatica Administrator header area click Manage > Upgrade.
2. Select the objects to upgrade.
3. Click Next.
4. If dependency errors exist, the Dependency Errors dialog box appears. Review the dependency errors and click OK. Then, resolve the dependency errors and click Next.
5. Enter the repository login information. Optionally, choose to use the same login information for all repositories.
6. Click Next.
The service upgrade wizard upgrades each service and displays the status and processing details.
7. When the upgrade completes, the Summary section displays the list of services and their upgrade status.
Click each service to view the upgrade details in the Service Details section.
8. Optionally, click Save Report to save the upgrade details to a file.
If you choose not to save the report, you can click Save Previous Report the next time you launch the Service upgrade wizard.
9. Click Close.
10. Restart upgraded services.
After you upgrade the PowerCenter Repository Service, you must restart the service and its dependent services.
Informatica Client Upgrade
1. Close all applications.
2. Run install.bat from the root directory.
The Upgrade Pre-Requisites window displays the system requirements. Verify that all installation requirements are met before you continue the installation.
3. Click Next.
On the Select Component window, select the Informatica client you want to upgrade.
You can upgrade the following Informatica client applications:
  • Informatica Developer
  • PowerCenter Client
If both Informatica Developer and PowerCenter Client are installed on the machine, you can upgrade the tools in the same process.
4. On the Upgrade Directory window, enter the following directories:
  • Directory of the Informatica client to upgrade: the directory that contains the previous version of the Informatica client tool that you want to upgrade.
  • Directory for Informatica 9.0.1 client tools: the directory in which to install the Informatica 9.0.1 client tools. Enter the absolute path for the installation directory. The installation directory must be on the current machine. The directory names in the path must not contain spaces or the following special characters: @|* $ # ! % ( ) { } [ ] , ; '
5. Click Next.
6. On the Pre-Installation Summary window, review the installation information, and click Install.
The installer copies the Informatica client files to the installation directory.
The Post-installation Summary window indicates whether the upgrade completed successfully.
7. Click Done.
Post Upgrade Tasks:
Informatica Domain
  • Configure LDAP Connectivity.
  • Update the Log Events Directory.
  • Update ODBC Data Sources.
  • Update Statistics for the Domain Configuration Repository (see the example after this list).
  • View Log Events from the Previous Informatica Version.
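As an illustration only, if the domain configuration repository sits in an Oracle schema (the schema name INFA_DOM below is an assumption), its statistics could be refreshed with DBMS_STATS:

-- Illustrative only: gather optimizer statistics for the (assumed)
-- domain configuration repository schema INFA_DOM on Oracle.
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'INFA_DOM');
END;
/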
Metadata Manager Service
  • Reload Metadata Manager Resources
  • Update the Metadata Manager Properties File
  • Reference Table Manager
For a detailed study of the version upgrade, refer to the upgrade documentation from Informatica Corporation.

Wednesday, May 15, 2013

Design Tip #155 Going Agile? Start with the Bus Matrix

Many organizations are embracing agile development techniques for their DW/BI implementations. While we strongly concur with agile’s focus on business collaboration to deliver value via incremental initiatives, we’ve also witnessed agile’s “dark side.” Some teams get myopically focused on a narrowly-defined set of business requirements. They extract a limited amount of source data to develop a point solution in a vacuum. The resultant standalone solution can’t be leveraged by other groups and/or integrated with other analytics. The agile deliverable may have been built quickly, so it’s deemed a success. But when organizations lift their heads several years down the agile road, they often discover a non-architected hodgepodge of stovepipe data marts. The agile approach promises to reduce cost (and risk), but some organizations end up spending more on redundant, isolated efforts, coupled with the ongoing cost of fragmented decision-making based on inconsistent data.

It’s no surprise that a common criticism of the agile approaches for DW/BI development is the lack of planning and architecture, coupled with ongoing governance challenges. We believe the enterprise data warehouse bus matrix (described in our article “The Matrix: Revisited”) is a powerful tool to address these shortcomings. The bus matrix provides a master plan for agile development, plus it identifies the reusable common descriptive dimensions that provide both data consistency and reduced time-to-market delivery in the long run.

With the right mix of business and IT stakeholders in a room, along with a skilled facilitator, the bus matrix can be produced in relatively short order (measured in days, not weeks). Drafting the bus matrix depends on a solid understanding of the business’s needs. Collaboration is critical to identifying the business’s core processes. It’s a matter of getting the team members to visualize the key measurement events needed for analyses. Involving business representatives and subject matter experts will ensure the team isn’t paralyzed by this task. You’ll likely discover that multiple business areas or departments are interested in the same fundamental business processes. As the business is brainstorming the list of measurement events, IT representatives are bringing a dose of reality about the available operational source data and any known limitations.

Once the matrix has been drafted, the team can then adopt agile development techniques to bring it to life. Business and IT management need to identify the single business process matrix row that’s both a high priority for the business and highly feasible from a technical perspective. Focusing on just one matrix row minimizes the risk of signing up for an overly ambitious implementation. Most implementation risk comes from biting off too much ETL system design and development; focusing on a single business process, typically captured by a single operational source system, reduces this risk. Incremental development can produce the descriptive dimensions associated with the selected matrix row until sufficient functionality is available, and then the dimensional model is released to the business community, as we describe in Design Tip #135: Conformed Dimensions as the Foundation for Agile Data Warehousing.

White Paper: Evolving Role of the Enterprise Data Warehouse in the Era of Big Data Analytics

The enterprise data warehouse (EDW) community has entered a new realm of meeting new and growing business requirements in the era of big data. Common challenges include:

  1.     extreme integration
  2.     semi-structured and unstructured data sources
  3.     petabytes of behavioral and image data accessed through MapReduce/Hadoop
  4.     massively parallel relational databases
  5.     structural considerations for the EDW to support predictive and other advanced analytics.

 These pressing needs raise more than a few urgent questions, such as:
  1.     How do you handle the explosion and diversity of data sources from conventional and non-conventional sources?
  2.     What new and existing technologies are needed to deepen the understanding of business through big data analytics?
  3.     What technological requirements are needed to deploy big data projects?
  4.     What potential organizational and cultural impacts should be considered?

This white paper provides detailed guidance for designing and administering the necessary deployment processes to meet these requirements. Ralph Kimball fills a gap in the industry, where there has been little specific guidance on how the EDW needs to respond to the big data analytics challenge and what design elements are needed to support these new requirements.

Sunday, October 7, 2012

Writing a Proper Data Warehousing Request for Proposal!

1. General Information for Vendors
  • Project vision
  • Project timelines
  • Nature of contract
  • Scope
 Reasoning: Give an overview of the project and the format of the RFP process.
2. System Background
Description of the Technical Environment
a) Description of the client and server specifications at the Ministry.
b) Breakdown of user types for the data warehouse:
User types (examples only):
  • Novice User: uses “canned” reports and queries. Number of users: x
  • Power User: frequent user; edits reports, queries directly, and builds cubes for multi-dimensional analysis. Number of users: x
c) Software tools for the data warehouse:
  • ETL Tools
  • Report and Query Tools
  • Data Analysis
  • Data Visualization Tools
 Reasoning: Give an overview of the expected installation. Also specify points of particular interest.
3. System Requirements
  • List of General Requirements
→ Give an overview of the data warehouse environment.
 Reasoning: An overview of the rationale for the data warehouse implementation.
  • Mandatory Requirements
a) State whether the software is client/server based or web-based. Describe what functions are available via the Internet browser. Describe what plug-ins (if any) are necessary for the client machines.
b) State that the vendor's system must communicate with the Ministry's database and applications. Ask the vendor how they propose to do this.
c)      The vendor must be committed to its products remaining compatible with future versions of Oracle. Ask the vendor to offer documents as evidence of commitment to this environment.
d)      The vendor’s system must enable the overall optimization of the Ministry’s environment:
i) Ask how the vendor's system minimizes user response time while maximizing the use of system resources.
ii) Ask the vendor to explain the system's architecture for data selection, processing, and formatting.
iii) Ask for evidence that network traffic is minimized and that the need to process large data sets on the client is also minimized.
 Reasoning: Based on user interviews and technical review, these are the must-have features for the data warehouse installation.
  • Strongly Desired Features
Examples:
→ The vendor’s system must allow users to schedule reports.
→ Does your system permit users to schedule at a certain time and date? Daily? Weekly? Based on an event? Describe how scheduling is accomplished and the options for scheduling reports.
 Reasoning: Based on the requirements analysis, the items listed here are additional features that would be good to have but are not mandatory.
  • Desired Features
Examples:
→ Does your system possess a redundant query control?
This feature returns a saved version of a report whenever an identical report has been executed earlier – the report, output format, and raw data all remain constant in the interim. Describe how this works in your system (a conceptual sketch follows this subsection).
   What are the limits in terms of how far back reports are archived?
 Reasoning: Following the rationale of “Strongly Desired Features” above, this section lists a lower-priority set of requirements.
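Purely to illustrate the idea (this is not any particular vendor's implementation), a redundant query control amounts to looking up a cache of earlier executions before re-running a report; the table and column names below are hypothetical:

-- Conceptual sketch only: return a cached report if an identical execution
-- already exists. report_cache and its columns are hypothetical.
SELECT cached_output
FROM   report_cache
WHERE  report_id      = :report_id
AND    parameter_hash = :parameter_hash
AND    output_format  = :output_format
AND    data_version   = :current_data_version;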
  • Product Profile
    - Product name
    - Product description
    - Current release level
    - Date the current release level was available
    - Projected date for the next release candidate
    - Current product install base
    - Industry verticals of installation
    - Number of installations
    - Breakdown of installations/users
 Reasoning: Gauge the vendor's product specifications and maturity.
  • Consulting
    - Describe the level of involvement the consultants will have at each stage of the project.
    - What is the experience of the consultants?
    - Location of the consultants.
    - Please forward resumes of the consultants.
 Reasoning: Evaluate the experience and competency of the vendor's team. Also enquire about the consultants' geographic location; this will help evaluate the likely response rate.
  • Business Partners
    - Ask the vendor to state which products are fully integrated with their solution.
    - Describe the interconnectivity and interface requirements.
Reasoning: Enquire what other products can integrate well with the solution, e.g. Crystal Reports, Cognos. Also, the vendor will state the ease with which integration can occur.
  • Full features of the business intelligence solution and products
    - Ask what makes the product stand out.
Reasoning: This will give the vendor a chance to distinguish their products. Further business intelligence capabilities, such as OLAP and data mining, can be evaluated.
  • Application areas
    - Describe the areas the solution supports, e.g. budgeting, financial planning, etc.
Reasoning: Evaluate which areas the data warehousing solution can serve. The “Mandatory Requirements” section above will inform the vendor of the areas that are necessary for the Ministry. This section gives further information on the vendor's solution and possible areas that the Ministry can use in the future.
  • Web-based and standalone clients
    - Query whether the solution supports both web-based and standalone clients. Ask which functionality is not supported in both versions (i.e. what works in a standalone client but not the web-based one, and vice versa).
Reasoning: The architecture is of fundamental importance. This section enquires whether the vendor's solution will match the Ministry's requirements.
  • Pricing and Licensing Model
    - How much does the product cost, including initial training, support, and consulting?
Reasoning: Gauge the cost of the solution and other indirect costs.
  • System Installation and Tool Administration
Examples:
    - Please describe the platform and system requirements of your product.
    - For web-based clients, are there specific requirements on the type and version of browser that may be used (e.g. Netscape vs. Internet Explorer)?
    - Please describe the overall architecture of your product (e.g. client/server, two-tier/three-tier architecture, etc.), including how your product functions in a geographically distributed environment.
    - Please describe the steps involved in installing your product.
    - Please list the administrative functions that are included in your product (e.g. defining roles, backup/recovery, etc.).
    - How do tool administrators detect when your product requires additional hardware or requires human intervention (e.g. to restart a process)?
    - How often are major/minor software upgrades available? How would we be notified of a new software release?
    - Please describe how standalone (i.e. non-web-based) clients are distributed to end-user desktops. How are software upgrades distributed to end-users?
    - Are there ways to make the upgrade process transparent to the end-users?
Reasoning: The above questions and statements will give further information on the vendor's solution regarding the installation and administration of the product.
  • Security
    - How are end-users and administrators authenticated to your product?
    - How do end-users and administrators gain access to reports, data sets, etc. (e.g. via roles)?
    - Please fully describe how reports and data are protected from unauthorized users.
    - Please describe how row-level access to the data is attained (illustrated conceptually below).
    - What level of encryption is used on passwords, data, reports, etc.?
    - How does the existence of a firewall change the security architecture?
    - How does the web-based client differ in terms of security from the standalone client?
Reasoning: The answers to the security questions must also satisfy the security criteria set by regulators or industry standards.
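By way of illustration only (vendors implement this differently), row-level access is often attained with a secured view that filters rows by the authenticated user; the tables below are hypothetical:

-- Conceptual sketch: restrict rows to the regions mapped to the connected user.
-- SALES and USER_REGION_MAP are hypothetical tables.
CREATE VIEW sales_secure AS
SELECT s.*
FROM   sales s
JOIN   user_region_map m ON m.region_id = s.region_id
WHERE  m.user_name = USER;   -- USER returns the connected database account (Oracle)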
  • Performance and Scalability
    - Does the product support a substantial increase in data size and in the frequency and complexity of end-user queries?
    - How is performance measured within your product?
Reasoning: Evaluation of how the product can scale to future expectations.


Replicating Transactions Between Microsoft SQL Server and Oracle Database Using Oracle GoldenGate

Most Oracle technology professionals who are interested in data replication are familiar with Oracle Streams. Until 2009, Streams was the recommended and most popular Oracle technology for data distribution.
In July 2009, Oracle acquired GoldenGate, a provider of database replication software. The company is now encouraging its customers to use Oracle GoldenGate (which is part of the Oracle Fusion Middleware family) for their data replication needs in new applications. Oracle's statement of direction regarding Oracle Streams says that the product “will continue to be supported, but will not be actively enhanced.”
In this article we will build a simple transaction replication example using Oracle GoldenGate, in order to get acquainted with this new technology.

Oracle GoldenGate Architecture

GoldenGate v11 enables transaction level replication among heterogeneous platforms. It supports Oracle Database, IBM DB2, Microsoft SQL Server, MySQL, Teradata, and many other platforms. (It also supports access through a generic ODBC driver.)
The most important components that we need to be familiar with are the Extract and Replicat processes. The Extract process runs on the source system and captures the data changes. The Replicat process runs on the target machine and is responsible for applying the changes to the target database.
[Figure: oracle-sqlserver-goldengate-f1]
There are two common configurations for the Extract process. The so-called “initial load” is used for populating the target database with an exact copy of the source data (i.e., Extract fetches all data from the source database and typically runs only once). Then the “change synchronization” can take place. In the “change synchronization” configuration, the Extract constantly monitors the source database and captures all changes on the fly.
In this demonstration we will set up Microsoft SQL Server 2008 as the source database, configure and perform an initial load, and then start an Extract process in change synchronization mode. In order to show that this replication is truly heterogeneous, we will run SQL Server on Windows XP and Oracle Database 11g Release 2 on Oracle Linux 5. As a prerequisite, I will assume that you already have a clean installation of SQL Server 2008 on the Windows box and Oracle Database on the Linux machine.
We will start building the demonstration scenario by installing GoldenGate. Let's start with the Windows box.

GoldenGate for SQL Server Installation on Windows XP

First you need a copy of Oracle GoldenGate v11 for SQL Server. You can download it from http://edelivery.oracle.com (Oracle Fusion Middleware → Microsoft Windows x32 → Oracle GoldenGate for Non Oracle Database v11). The serial number of the media pack that you need is V22241-01.

Extract the downloaded archive in a location where you want to have the Oracle GoldenGate installation (in this example – C:\GG). Then open a command prompt, go to the directory, and launch GGSCI (the GoldenGate command interface):
C:\GG>ggsci

Oracle GoldenGate Command Interpreter for ODBC
Version 11.1.1.0.0 Build 078
Windows (optimized), Microsoft SQL Server on Jul 28 2010 18:55:52

Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.

GGSCI (MSSQL) 1>
Next execute the command CREATE SUBDIRS to create the Oracle GoldenGate working directories.
GGSCI (MSSQL) 1> CREATE SUBDIRS

Creating subdirectories under current directory C:\GG

Parameter files                C:\GG\dirprm: created
Report files                   C:\GG\dirrpt: created
Checkpoint files               C:\GG\dirchk: created
Process status files           C:\GG\dirpcs: created
SQL script files               C:\GG\dirsql: created
Database definitions files     C:\GG\dirdef: created
Extract data files             C:\GG\dirdat: created
Temporary files                C:\GG\dirtmp: created
Veridata files                 C:\GG\dirver: created
Veridata Lock files            C:\GG\dirver\lock: created
Veridata Out-Of-Sync files     C:\GG\dirver\oos: created
Veridata Out-Of-Sync XML files C:\GG\dirver\oosxml: created
Veridata Parameter files       C:\GG\dirver\params: created
Veridata Report files          C:\GG\dirver\report: created
Veridata Status files          C:\GG\dirver\status: created
Veridata Trace files           C:\GG\dirver\trace: created
Stdout files                   C:\GG\dirout: created


GGSCI (MSSQL) 2> EXIT

C:\GG>
According to the official documentation, Oracle GoldenGate supports up to 300 concurrent Extract and Replicat processes per instance. There is, however, a single process that is responsible for controlling the other processes; it's called the Manager process. Although you can run this process manually, it is good practice to install it as a service; otherwise it will stop when the user who started it logs off.
To add the Manager process as a Windows service execute the INSTALL ADDSERVICE command within the GoldenGate installation directory.
C:\GG>INSTALL ADDSERVICE

Service 'GGSMGR' created.

Install program terminated normally.

C:\GG>
This pretty much completes the Windows installation. Let's move on to the Linux machine.

GoldenGate for Oracle Installation on Oracle Linux 5

Installing Oracle GoldenGate on Linux is not much different from the installation that you did on Windows XP. You will need to download the media pack of GoldenGate for Oracle on Linux (V22228-01). Create an installation directory and unzip the archive there. In this example, I use the /u01/app/oracle/gg directory, as our ORACLE_BASE is pointing to /u01/app/oracle. After this is done you have to set the PATH and LD_LIBRARY_PATH environment variables like this:
[oracle@oradb ~]$ export PATH=$PATH:$ORACLE_BASE/gg
[oracle@oradb ~]$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$ORACLE_BASE/gg
Let's start GGSCI and execute CREATE SUBDIRS.
[oracle@oradb ggs]$ cd $ORACLE_BASE/gg
[oracle@oradb gg]$ ./ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.0.0 Build 078
Linux, x86, 32bit (optimized), Oracle 11 on Jul 28 2010 13:22:25

Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.


GGSCI (oradb) 1> CREATE SUBDIRS

Creating subdirectories under current directory /u01/app/oracle/gg

Parameter files                /u01/app/oracle/gg/dirprm: created
Report files                   /u01/app/oracle/gg/dirrpt: created
Checkpoint files               /u01/app/oracle/gg/dirchk: created
Process status files           /u01/app/oracle/gg/dirpcs: created
SQL script files               /u01/app/oracle/gg/dirsql: created
Database definitions files     /u01/app/oracle/gg/dirdef: created
Extract data files             /u01/app/oracle/gg/dirdat: created
Temporary files                /u01/app/oracle/gg/dirtmp: created
Veridata files                 /u01/app/oracle/gg/dirver: created
Veridata Lock files            /u01/app/oracle/gg/dirver/lock: created
Veridata Out-Of-Sync files     /u01/app/oracle/gg/dirver/oos: created
Veridata Out-Of-Sync XML files /u01/app/oracle/gg/dirver/oosxml: created
Veridata Parameter files       /u01/app/oracle/gg/dirver/params: created
Veridata Report files          /u01/app/oracle/gg/dirver/report: created
Veridata Status files          /u01/app/oracle/gg/dirver/status: created
Veridata Trace files           /u01/app/oracle/gg/dirver/trace: created
Stdout files                   /u01/app/oracle/gg/dirout: created


GGSCI (oradb) 2> EXIT
[oracle@oradb gg]$
Installation on the Linux machine is now completed.

Preparing the Source Database

The next step is to create a new database in SQL Server and populate it with some sample data. The name of the database will be EMP. You can create it by launching SQL Server Management Studio, right-clicking on Databases, and selecting New Database.

[Figure: oracle-sqlserver-goldengate-f3]
 
Type EMP in the database name field and click OK, leaving all other options by default.
Let's add a new database schema (HRSCHEMA), a table (EMP) and a few test records in the newly created database. This will be accomplished by running the following SQL:
set ansi_nulls on
go

set quoted_identifier on
go

create schema hrschema
go

create table [hrschema].[emp] (
     [id] [smallint] not null,
     [first_name] varchar(50) not null,
     [last_name] varchar(50) not null,
constraint [emp_pk] primary key clustered (
     [id] asc
) with (pad_index = off, statistics_norecompute=off, ignore_dup_key=off, allow_row_locks=on, allow_page_locks=on) on [primary]
) on [primary]

go

-- TEST DATA
INSERT INTO [hrschema].[emp] ([id], [first_name], [last_name]) VALUES (1,'Dave','Mustaine')
INSERT INTO [hrschema].[emp] ([id], [first_name], [last_name]) VALUES (2,'Chris','Broderick')
INSERT INTO [hrschema].[emp] ([id], [first_name], [last_name]) VALUES (3,'David','Ellefson')
INSERT INTO [hrschema].[emp] ([id], [first_name], [last_name]) VALUES (4,'Shawn','Drover')
GO
First create a new query (by right-clicking on the database name and selecting New Query). Then paste in the SQL text above and hit F5 to execute it.
[Figure: oracle-sqlserver-goldengate-f4]
Now, in order for Oracle GoldenGate to be able to access the EMP database, you have to create an ODBC data source for it. Let's go to Control Panel -> Administrative Tools -> Data Sources (ODBC) and add a new System DSN. Select SQL Server as the database driver and name the data source HR. You point the source to the local SQL Server (MSSQL) and fill in the login credentials. The data source summary should be similar to this:
[Figure: oracle-sqlserver-goldengate-f5]
Now it's time to enable Oracle GoldenGate to acquire the transaction information for the EMP table from the transaction logs. Again you will be using GGSCI:
C:\GG>ggsci.exe

Oracle GoldenGate Command Interpreter for ODBC
Version 11.1.1.0.0 Build 078
Windows (optimized), Microsoft SQL Server on Jul 28 2010 18:55:52

Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.


GGSCI (MSSQL) 1> DBLOGIN SOURCEDB HR
Successfully logged into database.

GGSCI (MSSQL) 2> ADD TRANDATA HRSCHEMA.EMP

Logging of supplemental log data is enabled for table hrschema.emp

GGSCI (MSSQL) 3>
Because the data types in Oracle and SQL Server are different, you have to establish a data type conversion. GoldenGate provides a dedicated tool called DEFGEN that generates data definitions; these are referenced by Oracle GoldenGate processes when source and target tables have dissimilar definitions. Before running DEFGEN you have to create a parameter file for it, specifying which tables the tool should inspect and where to place the type definitions file after the tables are inspected. You can create such a parameter file using the EDIT PARAMS command within GGSCI.
GGSCI (MSSQL) 3> EDIT PARAMS DEFGEN

GGSCI (MSSQL) 4>
This creates an empty parameter file named DEFGEN.PRM and located in the DIRPRM folder of your GoldenGate installation. Put the following contents inside the file:
defsfile c:\gg\dirdef\emp.def
sourcedb hr
table hrschema.emp;
The parameters are pretty self-explanatory. We want DEFGEN to inspect the EMP table inside HRSCHEMA and to place a definitions file named EMP.DEF in the DIRDEF sub-directory. Let's invoke DEFGEN and examine its output.
C:\GG>defgen paramfile c:\gg\dirprm\defgen.prm

***********************************************************************
         Oracle GoldenGate Table Definition Generator for ODBC
                     Version 11.1.1.0.0 Build 078
   Windows (optimized), Microsoft SQL Server on Jul 28 2010 19:16:56

Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.


                    Starting at 2011-04-08 14:41:06
***********************************************************************

Operating System Version:
Microsoft Windows XP Professional, on x86
Version 5.1 (Build 2600: Service Pack 3)

Process id: 2948

***********************************************************************
**            Running with the following parameters                  **
***********************************************************************
defsfile c:\gg\dirdef\emp.def
sourcedb hr
table hrschema.emp;
Retrieving definition for HRSCHEMA.EMP

Definitions generated for 1 tables in c:\gg\dirdef\emp.def

C:\GG>
If you check the contents of EMP.DEF, it will look something like this:
*
* Definitions created/modified  2011-07-07 10:27
*
*  Field descriptions for each column entry:
*
*     1    Name
*     2    Data Type
*     3    External Length
*     4    Fetch Offset
*     5    Scale
*     6    Level
*     7    Null
*     8    Bump if Odd
*     9    Internal Length
*    10    Binary Length
*    11    Table Length
*    12    Most Significant DT
*    13    Least Significant DT
*    14    High Precision
*    15    Low Precision
*    16    Elementary Item
*    17    Occurs
*    18    Key Column
*    19    Sub Data Type
*
*
Definition for table HRSCHEMA.EMP
Record length: 121
Syskey: 0
Columns: 3
id          134     23        0  0  0 1 0      8      8      8 0 0 0 0 1    0 1 0
first_name   64     50       11  0  0 1 0     50     50      0 0 0 0 0 1    0 0 0
last_name    64     50       66  0  0 1 0     50     50      0 0 0 0 0 1    0 0 0
End of definition
It basically lists all tables and columns and describes the native database types using more general definitions.
Now you have to copy the EMP.DEF file to the target machine, as it must be available to the Replicat process. The Replicat will have to do another conversion: it will map the more general types back to database-specific types (but this time the types will correspond to the ones used by the target database). For copying the file you can use FTP/SFTP or SCP transfer. (Personally I am using a free FTP/SFTP/SCP client called WinSCP to copy EMP.DEF from the Windows box to the /u01/app/oracle/gg/dirdef folder on the Linux machine.)
[Figure: oracle-sqlserver-goldengate-f6]

Preparing the Target Database

After the source preparations are finalized it's time to move to the target machine. Let's create a schema (GG_USER) and a table where the Replicat process can apply the transactions coming from the source.
[oracle@oradb ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.1.0 Production on Fri Apr 8 14:11:49 2011

Copyright (c) 1982, 2009, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> create user gg_user identified by welcome1;

User created.

SQL> grant connect, resource,select any dictionary to gg_user;

Grant succeeded.

SQL>
The EMP table should reside in GG_USER's schema:
SQL> create table gg_user.emp (id number not null, first_name varchar2(50), last_name varchar2(50));

Table created.

SQL>
Keep in mind that if the Replicat process applies data to tables residing in other schemas, GG_USER will need additional privileges (such as SELECT ANY TABLE, LOCK ANY TABLE, etc.). A detailed list of the required privileges is available in the official documentation.
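For illustration only (the authoritative privilege list is in the GoldenGate documentation for your version), such additional grants might look like this:

-- Illustrative only: extra privileges GG_USER may need when Replicat applies
-- changes to tables outside its own schema. Verify against the official docs.
GRANT SELECT ANY TABLE, INSERT ANY TABLE, UPDATE ANY TABLE,
      DELETE ANY TABLE, LOCK ANY TABLE TO gg_user;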
Setting Up the Extract & Replicat for Initial Data Load
Let's start by setting up the Extract process on the source machine. Name the process INEXT (for INitial EXTract). Next create a parameters file in the same manner as the parameter file that you created for the DEFGEN utility. The filename will be INEXT.PRM.
C:\GG>ggsci.exe

Oracle GoldenGate Command Interpreter for ODBC
Version 11.1.1.0.0 Build 078
Windows (optimized), Microsoft SQL Server on Jul 28 2010 18:55:52

Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.

GGSCI (MSSQL) 1> EDIT PARAMS INEXT
Paste the following contents to INEXT.PRM:
SOURCEISTABLE
SOURCEDB HR
RMTHOST ORADB, MGRPORT 7809
RMTFILE /u01/app/oracle/gg/dirdat/ex
TABLE hrschema.emp;
The SOURCEISTABLE parameter instructs the Extract process to get the data directly from the table instead of the transaction logs. This is the behavior that we want in order to do a full extraction. SOURCEDB points to the database that contains the data. RMTHOST and MGRPORT specify the remote machine and Manager's port. RMTFILE specifies the file to which the extracted data will be written.
That's all the configuration you need for the initial data extraction. Let's move to the Linux machine and configure the initial data loading.
You have to deal with the Manager process first: Start GGSCI and create a parameter file called MGR.PRM.
[oracle@oradb gg]$ ./ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.0.0 Build 078
Linux, x86, 32bit (optimized), Oracle 11 on Jul 28 2010 13:22:25

Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.


GGSCI (oradb) 1> EDIT PARAM MGR
There is only one line that you have to put in MGR.PRM:
PORT 7809
After saving the file execute the START MANAGER command within GGSCI and see if the manager starts correctly.
GGSCI (oradb) 2> START MANAGER

Manager started.


GGSCI (oradb) 3>
Next you have to set the parameters for the Replicat process. So create a new parameters file and name it INLOAD (for INitial LOADing).
GGSCI (oradb) 3> EDIT PARAMS INLOAD
Put the following contents inside INLOAD.PRM:
SPECIALRUN
END RUNTIME
USERID gg_user, PASSWORD welcome1
EXTFILE /u01/app/oracle/gg/dirdat/ex
SOURCEDEFS /u01/app/oracle/gg/dirdef/emp.def
MAP hrschema.emp, TARGET gg_user.emp;
The SPECIALRUN parameter defines an initial-loading process (it is a one-time loading that doesn't use checkpoints). The next line of the file instructs the Replicat process to terminate after the loading is finished.

Next you provide the database user and password, the extract file, and the table definition. The final parameter, MAP, instructs the Replicat to remap the table HRSCHEMA.EMP to GG_USER.EMP.

Running the Initial Extract and Loading

The databases and processes are finally configured. Now you can start the initial loading and see the data replication in action.
First you have to run the Extract process; it will fetch all data residing in the SQL Server EMP table and write it to the RMTFILE (/u01/app/oracle/gg/dirdat/ex) on the Linux host.
Start the Extract by running the EXTRACT command and providing parameters and log file as command line arguments.
C:\GG>extract paramfile dirprm\inext.prm reportfile dirrpt\inext.rpt

***********************************************************************
                  Oracle GoldenGate Capture for ODBC
                     Version 11.1.1.0.0 Build 078
   Windows (optimized), Microsoft SQL Server on Jul 28 2010 19:22:00

Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.


                    Starting at 2011-04-08 15:57:48
***********************************************************************

Operating System Version:
Microsoft Windows XP Professional, on x86
Version 5.1 (Build 2600: Service Pack 3)

Process id: 556

Description:

***********************************************************************
**            Running with the following parameters                  **
***********************************************************************

2011-04-08 15:57:48  INFO    OGG-01017  Wildcard resolution set to IMMEDIATE bec
ause SOURCEISTABLE is used.

Using the following key columns for source table HRSCHEMA.EMP: id.

CACHEMGR virtual memory values (may have been adjusted)
CACHEBUFFERSIZE:                         64K
CACHESIZE:                                1G
CACHEBUFFERSIZE (soft max):               4M
CACHEPAGEOUTSIZE (normal):                4M
PROCESS VM AVAIL FROM OS (min):        1.85G
CACHESIZEMAX (strict force to disk):   1.62G

Database Version:
Microsoft SQL Server
Version 10.00.1600
ODBC Version 03.52.0000

Driver Information:
SQLSRV32.DLL
Version 03.85.1132
ODBC Version 03.52

Database Language and Character Set:

Warning: Unable to determine the application and database codepage settings.
Please refer to user manual for more information.


2011-04-08 15:57:49  INFO    OGG-01478  Output file /u01/app/oracle/gg/dirdat/ex
 is using format RELEASE 10.4/11.1.

2011-04-08 15:57:55  INFO    OGG-01226  Socket buffer size set to 27985 (flush s
ize 27985).

Processing table HRSCHEMA.EMP

***********************************************************************
*                   ** Run Time Statistics **                         *
***********************************************************************

Report at 2011-04-08 15:57:55 (activity since 2011-04-08 15:57:49)

Output to /u01/app/oracle/gg/dirdat/ex:

From Table HRSCHEMA.EMP:
       #                   inserts:         4
       #                   updates:         0
       #                   deletes:         0
       #                  discards:         0


C:\GG>
The run time statistics show that 4 rows were successfully extracted. Let's move to the Linux machine and start the Replicat.

To apply the extracted data to the target database, run the replicat command and provide the prepared parameters file. Here is an excerpt from the replicat run:
[oracle@oradb gg]$ ./replicat paramfile dirprm/inload.prm

***********************************************************************
                 Oracle GoldenGate Delivery for Oracle
                     Version 11.1.1.0.0 Build 078
   Linux, x86, 32bit (optimized), Oracle 11 on Jul 28 2010 15:42:30

Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.


                    Starting at 2011-04-11 12:52:52
***********************************************************************

Operating System Version:
Linux
Version #1 SMP Mon Mar 29 20:06:41 EDT 2010, Release 2.6.18-194.el5
Node: oradb
Machine: i686
                         soft limit   hard limit
Address Space Size   :    unlimited    unlimited
Heap Size            :    unlimited    unlimited
File Size            :    unlimited    unlimited
CPU Time             :    unlimited    unlimited

Process id: 23383

Description:

***********************************************************************
**            Running with the following parameters                  **
***********************************************************************
SPECIALRUN
END RUNTIME
USERID gg_user, PASSWORD ********
EXTFILE /u01/app/oracle/gg/dirdat/ex
SOURCEDEFS /u01/app/oracle/gg/dirdef/emp.def
MAP hrschema.emp, TARGET gg_user.emp;

CACHEMGR virtual memory values (may have been adjusted)
CACHEBUFFERSIZE:                         64K
CACHESIZE:                              512M
CACHEBUFFERSIZE (soft max):               4M
CACHEPAGEOUTSIZE (normal):                4M
PROCESS VM AVAIL FROM OS (min):           1G
CACHESIZEMAX (strict force to disk):    881M

Database Version:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE 11.2.0.1.0 Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

...

Reading /u01/app/oracle/gg/dirdat/ex, current RBA 1210, 4 records

Report at 2011-04-11 12:53:15 (activity since 2011-04-11 12:53:14)

From Table HRSCHEMA.EMP to GG_USER.EMP:
       #                   inserts:         4
       #                   updates:         0
       #                   deletes:         0
       #                  discards:         0


Last log location read:
     FILE:      /u01/app/oracle/gg/dirdat/ex
     RBA:       1210
     TIMESTAMP: 2011-04-08 16:57:55.433993
     EOF:       NO
     READERR:   400

...

[oracle@oradb gg]$
You can log in to the Oracle Database as GG_USER and check the contents of the EMP table.
SQL> select id, first_name from emp;

     ID FIRST_NAME
---------- --------------------------------------------------
      1 Dave
      2 Chris
      3 David
      4 Shawn

SQL>
The EMP table now contains a copy of all records that were originally inserted at the SQL Server.

Live Data Capture Configuration

With the Oracle database now holding an exact copy of the SQL Server EMP table, it is time to create a live capture configuration. We will set up the Extract and Replicat processes to run all the time and continuously transmit and apply changes to the EMP table.
In order to implement the new configuration you will have to create new parameter files for extracting and replicating. First, however, you have to perform two additional steps on SQL Server: confirm that the database is set to the Full recovery model and then take a full backup of the EMP database. Failure to take a full backup will prevent the Extract process from capturing live data changes.
You can easily check if the EMP database is in Full Recovery by right-clicking on it, selecting Properties, and inspecting the value of Recovery model.
[Figure: oracle-sqlserver-goldengate-f7]
Taking a full backup is done in a few clicks as well. Right-click on the EMP database, select Tasks and then Back Up. This brings up the backup database dialog. We confirm that the Backup type is set to Full and then click OK.
[Figure: oracle-sqlserver-goldengate-f8]
If everything goes well, in a couple of seconds we should see a notification that the operation was successful.
[Figure: oracle-sqlserver-goldengate-f9]
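If you prefer a query window to the GUI, the same two steps can also be done with T-SQL; the backup path below is an assumption, so adjust it to your environment:

-- Illustrative T-SQL equivalent of the GUI steps above: switch the EMP
-- database to the full recovery model and take a full backup.
ALTER DATABASE EMP SET RECOVERY FULL;
BACKUP DATABASE EMP TO DISK = 'C:\Backups\EMP_full.bak' WITH INIT;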
Time to set up the processes. We will start by configuring a Manager process on the Windows machine. We skipped this step in the initial loading phase, but in the new configuration that you are building the Extract process must run all the time. This requires an active Manager process that will perform resource management functions. You will follow the same steps as with the Linux box configuration.
GGSCI (MSSQL) 1> EDIT PARAM MGR

GGSCI (MSSQL) 2>
Put a single line in MGR.PRM to set the port of the Manager instance.
PORT 7809
Then we start the Manager.
GGSCI (MSSQL) 2> START MANAGER

Starting Manager as service ('GGSMGR')...
Service started.


GGSCI (MSSQL) 3>
Let's create a new extract group for mining the transaction logs and name it MSEXT. Then set a destination where the data changes should be written (/u01/app/oracle/gg/dirdat/ms).
GGSCI (MSSQL) 3> ADD EXTRACT MSEXT, TRANLOG, BEGIN NOW
EXTRACT added.

GGSCI (MSSQL) 4> ADD RMTTRAIL /u01/app/oracle/gg/dirdat/ms, EXTRACT MSEXT
RMTTRAIL added.
You will also need a new parameters file.
GGSCI (MSSQL) 5> EDIT PARAMS MSEXT


GGSCI (MSSQL) 6>
Type the following lines in it:
EXTRACT MSEXT
SOURCEDB HR
TRANLOGOPTIONS MANAGESECONDARYTRUNCATIONPOINT
RMTHOST ORADB, MGRPORT 7809
RMTTRAIL /u01/app/oracle/gg/dirdat/ms
TABLE HRSCHEMA.EMP;
The difference here is that we are omitting the SOURCEISTABLE parameter and introducing a new one: TRANLOGOPTIONS MANAGESECONDARYTRUNCATIONPOINT. This option tells the Extract process to routinely check and delete the CDC capture job, resulting in better performance and less occupied space for captured data.
This is all you need on the source machine. Let's move on and configure the replication at the target.
On the Linux box you have to start by creating a checkpoint table. Checkpoints are used to store the current read/write positions of the Extract and Replicat processes. They prevent loss of data and ensure that the processes can recover from faults (for example, if the network between the source and target machines goes down for a moment). Create a table that holds checkpoint information by issuing the ADD CHECKPOINTTABLE command at the target.
GGSCI (oradb) 1> DBLOGIN USERID gg_user, PASSWORD welcome1
Successfully logged into database.

GGSCI (oradb) 2> ADD CHECKPOINTTABLE gg_user.chkpt

Successfully created checkpoint table GG_USER.CHKPT.

GGSCI (oradb) 3>
Let's add a Replicat group and setup its parameters.
GGSCI (oradb) 3> ADD REPLICAT MSREP, EXTTRAIL /u01/app/oracle/gg/dirdat/ms, CHECKPOINTTABLE gg_user.chkpt
REPLICAT added.


GGSCI (oradb) 4> EDIT PARAMS MSREP

GGSCI (oradb) 5>
As a final step put the following lines in MSREP.PRM.
REPLICAT MSREP
SOURCEDEFS /u01/app/oracle/gg/dirdef/emp.def
USERID gg_user, PASSWORD welcome1
MAP hrschema.emp, TARGET gg_user.emp;
The configuration is now completed. Let's start the Extract and Replicat and do some testing.

Starting and Testing Online Transaction Replication

To start the Extract process, use GGSCI and execute the START EXTRACT command. 
GGSCI (MSSQL) 1> START EXTRACT MSEXT

Sending START request to MANAGER ('GGSMGR') ...
EXTRACT MSEXT starting


GGSCI (MSSQL) 2>
On the Linux machine, use the START REPLICAT command.
GGSCI (oradb) 1> START REPLICAT MSREP

Sending START request to MANAGER ...
REPLICAT MSREP starting


GGSCI (oradb) 2>
Let's login as GG_USER and see the contents of the EMP table.
SQL> select id, first_name from emp;

     ID FIRST_NAME
---------- --------------------------------------------------
      1 Dave
      2 Chris
      3 David
      4 Shawn

SQL>
Nothing new here. The data hasn't changed since the last time we checked. Let's go back to the SQL Server machine and run the following query, adding one additional row to the EMP table at the source.
BEGIN TRAN
INSERT INTO [hrschema].[emp] ([id], [first_name], [last_name]) VALUES (9,'Gar','Samuelson')
COMMIT TRAN
[Figure: oracle-sqlserver-goldengate-f10]
Let's go back to the Oracle Database and see if anything changed there.
SQL> select id, first_name from emp;

     ID FIRST_NAME
---------- --------------------------------------------------
      1 Dave
      2 Chris
      3 David
      4 Shawn
      9 Samuelson

SQL>
Congratulations! The data is getting replicated in a sub-second interval, reflecting every single transaction.

Conclusion

In this article we performed a very basic demonstration of some of the Oracle GoldenGate features. You should be aware that there are many different topologies and usage scenarios. For instance, you can configure GoldenGate to perform bidirectional replication (where two different databases simultaneously replicate changes to each other). There are also broadcast (where a single database replicates to multiple targets) and consolidation (many databases replicate to a central  database) configurations. One can use GoldenGate to implement query offloading (separating reporting from production, but avoiding the time gap of the traditional data warehouses). It is also a powerful solution for implementing zero downtime upgrades and database migrations.
