Saturday, June 15, 2013

Informatica Tutorial

Informatica is a widely used ETL tool for extracting source data and loading it into a target after applying the required transformations. The following section explains how Informatica is used in a data warehouse environment, with an example. It does not go into the details of data warehouse design; this tutorial simply gives an overview of how Informatica can be used as an ETL tool.

Note: The exchanges and companies described here are for illustrative purposes only.

Bombay Stock Exchange (BSE) and National Stock Exchange (NSE) are two major stock exchanges in India, on which the shares of ABC Corporation and XYZ Private Limited are traded Monday through Friday, except holidays. Assume that a software company, “KLXY Limited”, has taken on the project of integrating the data between the two exchanges, BSE and NSE.

To integrate the raw data received from NSE and BSE, KLXY Limited allots responsibilities to data modelers, DBAs, and ETL developers. Many IT professionals may be involved in the ETL process, but we highlight only these three roles for clarity.
  • Data modelers analyze the data from these two sources (Record Layout 1 & Record Layout 2), design data models, and then generate scripts to create the necessary tables and the corresponding records.
  • DBAs create the databases and tables based on the scripts generated by the data modelers.
  • ETL developers map the extracted data from source systems and load it to target systems after applying the required transformations.
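As a rough illustration of the ETL developer's role, the sketch below maps records from two differently shaped sources into one target layout. The field names, sample values, and both source layouts are hypothetical stand-ins for Record Layout 1 and Record Layout 2, which this post does not show. In Informatica the mapping is built graphically rather than in Python.

```python
from datetime import date

# Hypothetical source records; the real layouts are not shown in this post.
BSE_ROWS = [{"scrip": "ABC", "trade_dt": "2013-06-14", "close": "101.50"}]
NSE_ROWS = [{"symbol": "ABC", "date": "14/06/2013", "close_price": "101.75"}]


def transform_bse(row):
    """Map a BSE-style record (ISO date) to the common target layout."""
    year, month, day = (int(part) for part in row["trade_dt"].split("-"))
    return {
        "exchange": "BSE",
        "company": row["scrip"],
        "trade_date": date(year, month, day),
        "close_price": float(row["close"]),
    }


def transform_nse(row):
    """Map an NSE-style record (day/month/year date) to the target layout."""
    day, month, year = (int(part) for part in row["date"].split("/"))
    return {
        "exchange": "NSE",
        "company": row["symbol"],
        "trade_date": date(year, month, day),
        "close_price": float(row["close_price"]),
    }


def run_etl(bse_rows, nse_rows):
    """Extract from both sources, transform, and load into one sorted list."""
    target = [transform_bse(row) for row in bse_rows]
    target += [transform_nse(row) for row in nse_rows]
    return sorted(
        target,
        key=lambda row: (row["trade_date"], row["company"], row["exchange"]),
    )
```

Calling `run_etl(BSE_ROWS, NSE_ROWS)` returns one list in which both exchanges share the same column names and date type, which is the point of the transformation step.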

Informatica Upgrade Process:

The upgrade can be divided into the following stages:
  1. Upgrading the domain and server files: Run the Informatica server installer and select the upgrade option. The domain upgrade wizard installs the server files and configures the domain. If the domain has multiple nodes, you must upgrade on all the nodes.
The following table describes the actions that the installer performs when you upgrade Informatica:

Task | Description
1. Installs Informatica. | Installs Informatica directories and files into the new directory.
2. Copies infa_shared directory. | Copies the contents of the infa_shared directory from the existing installation directory into the new installation directory.
3. Copies mm_files directory. | Copies the contents of the mm_files directory from the default location in the existing installation directory into the new installation directory.
4. Upgrades the domain. | Upgrades the domain to run version 9.0.1 application services. The upgrade retains the user and administrator accounts in the domain.
5. Starts Informatica Services. | Starts Informatica Services on the node.
  2. Upgrading the application services: After you upgrade the domain and server files, log in to the Administrator tool and upgrade the application services. The service upgrade wizard provides a list of all application services that must be upgraded. It upgrades the services in the order required by the dependent objects.
  3. Upgrading the Informatica client: To upgrade the Informatica client, run the Informatica client installer and select the upgrade option.
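Upgrading services "in the order required by the dependent objects" amounts to a dependency ordering. The sketch below shows the idea with Python's standard `graphlib` module (Python 3.9 or later). The dependency map is illustrative only and is not taken from the wizard; how the wizard itself computes the order is not described in this post.

```python
from graphlib import TopologicalSorter

# Illustrative map: each service lists the services it depends on.
DEPENDS_ON = {
    "PowerCenter Repository Service": set(),
    "PowerCenter Integration Service": {"PowerCenter Repository Service"},
    "Metadata Manager Service": {
        "PowerCenter Repository Service",
        "PowerCenter Integration Service",
    },
}


def upgrade_order(depends_on):
    """Return the services so that each one comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())
```

With this map, `upgrade_order(DEPENDS_ON)` places the repository service first and the service that depends on both of the others last.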
Pre-Upgrade Tasks
Before you upgrade the domain and server files, complete the following tasks:
1. Review the prerequisites.
2. Verify the file descriptor settings.
3. Verify the configuration of the environment variables used by the installer.
4. Clear the configuration of environment variables that pertain to previous installations of Informatica.
5. Prepare the domain.
6. Prepare the PowerCenter repository.
7. Prepare the PowerCenter Profiling warehouse.
8. Prepare for upgrade from PowerCenter 8.6.1:
  • Export Reference Table Manager Data.
  • Prepare Metadata Manager.
  • Prepare the Data Analyzer repository.
9. Shut down the domain.
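Two of the pre-upgrade tasks above (file descriptor settings and leftover environment variables) lend themselves to a quick script. The following is a minimal sketch, not an Informatica utility. The list of variable names is an assumption drawn from the variables named later in this post, and the required descriptor count is a parameter you must take from the official prerequisites.

```python
# Assumption: these are the Informatica variables named elsewhere in this post.
INFA_VARIABLES = ("INFA_HOME", "INFA_DOMAINS_FILE", "INFA_JAVA_OPTS")


def leftover_variables(environ, names=INFA_VARIABLES):
    """Return the listed variables that are still set to a non-empty value."""
    return {name: environ[name] for name in names if environ.get(name)}


def descriptor_limit_ok(soft_limit, required):
    """True if the soft limit meets the minimum; None means unlimited."""
    return soft_limit is None or soft_limit >= required


def current_descriptor_limit():
    """Read the soft limit on open files (UNIX only); None if unlimited."""
    import resource  # standard library, not available on Windows

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft
```

On a UNIX machine you could pass `os.environ` to `leftover_variables` and the result of `current_descriptor_limit()` to `descriptor_limit_ok`, together with the minimum stated in the prerequisites.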
Upgrading the Domain and Server in Graphical Mode:
You can upgrade the Informatica domain and server files in graphical mode on Windows or UNIX.
1. Verify that your environment meets the minimum system requirements and complete the pre-upgrade tasks.
2. Log in to the machine with the same user account that you used to install the previous version.
3. Close all other applications.
4. To begin the upgrade on Windows, run install.bat from the root directory.
To begin the upgrade on UNIX, use a shell command line to run install.sh from the root directory, and then select the option for graphical mode installation.
5. In the Installation Type window, select Upgrade to Informatica 9.0.1 and click Next.
  • The Upgrade Pre-Requisites window displays the upgrade system requirements. Verify that all requirements are met before you continue the upgrade.
6. Click Next.
7. In the Upgrade Directory window, enter the following directories:
Directory | Description
Directory of the Informatica product to upgrade | Directory that contains the previous version of PowerCenter that you want to upgrade.
Directory for Informatica 9.0.1 | Directory in which to install Informatica 9.0.1.
Enter the absolute path for the installation directory. The directory cannot be the same as the directory that contains the previous version of PowerCenter. The directory names in the path must not contain spaces or the following special characters: @|* $ # ! % ( ) { } [ ] , ; '
On Windows, the installation directory must be on the current machine.
8. Click Next.
The upgrade wizard displays a warning to shut down the Informatica domain before you continue the upgrade.
9. Click OK.
10. In the Pre-Installation Summary window, review the upgrade information, and click Install to continue.
The upgrade wizard installs the Informatica server files to the Informatica 9.0.1 installation directory.
11. In the Domain Configuration Upgrade window, the upgrade wizard displays the database and user account information for the domain configuration repository to be upgraded.
Property | Description
Database type | Database for the domain configuration repository.
Database user ID | Database user account for the domain configuration repository.
User password | Password for the database user account.
Tablespace | Displayed for IBM DB2 only. Name of the tablespace for the upgraded domain configuration repository tables. If the database of the domain configuration repository that you are upgrading does not use a 32 K tablespace, this property is blank. Enter the name of a tablespace with a page size of 32 K. In a single-partition database, if you do not specify a tablespace name, the installer writes the upgraded tables in the default tablespace. The default tablespace must be 32 K. In a multi-partition database, you must specify a 32 K tablespace.
The upgrade wizard displays the database connection string for the domain configuration repository based on how the connection string of the previous version was created at installation:
  • If the previous version used a JDBC URL at installation, the upgrade wizard displays the JDBC connection properties, including the database address and service name.
  • If the previous version used a custom JDBC connection string at installation, the upgrade wizard displays the custom connection string.
  • Optionally, you can specify additional JDBC parameters to include in the connection string. To provide additional JDBC parameters, select JDBC parameters and enter a valid JDBC parameter string.
12. Click Test Connection to verify that you can connect to the database, and then click OK to continue.
13. Click Next.
On the Port Configuration Upgrade window, the upgrade wizard displays the default port numbers assigned to the domain and node components.
14. You can specify new port numbers or use the default port numbers.
The following table describes the ports that you can specify:
Port | Description
Service Manager port | Port number used by the Service Manager in the node. Client applications and the Informatica command line programs use this port to communicate with the services in the domain.
Informatica Administrator port | Port number used by the Administrator tool. Available if you upgrade a gateway node.
Informatica Administrator shutdown port | Port number used by the Administrator tool to listen for shutdown commands. Available if you upgrade a gateway node.
15. Click Next.
On Windows, the upgrade wizard creates a service to start Informatica. By default, the service runs under the same user account as the account used for installation. You can run the Windows service under a different user account.
16. Select whether to run the Windows service under a different user account.
The following table describes the properties that you set:
Property | Description
Run Informatica under a different user account | Indicates whether to run the Windows service under a different user account.
User name | User account with which to run the Informatica Windows service. Use the following format: DomainName\UserAccount. This user account must have the Act as operating system permission.
Password | Password for the user account with which to run the Informatica Windows service.
17. Click Next.
The Post-Upgrade Summary window indicates whether the upgrade completed successfully.
18. Click Done.
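If you plan to enter your own port numbers in the Port Configuration Upgrade window, it can help to sanity-check them beforehand. The sketch below is an assumption-laden helper, not part of the installer: it checks that the chosen ports are in range and distinct, and it offers a simple bind test to see whether a port is already in use on the local machine. The port numbers in the usage note are placeholders, not Informatica defaults.

```python
import socket


def validate_ports(ports):
    """Return a list of problems found in a {name: port} mapping."""
    problems = []
    seen = {}
    for name, port in ports.items():
        if not 1 <= port <= 65535:
            problems.append(f"{name}: {port} is outside 1-65535")
        elif port in seen:
            problems.append(f"{name}: {port} is already assigned to {seen[port]}")
        else:
            seen[port] = name
    return problems


def port_is_free(port, host="127.0.0.1"):
    """Try to bind the port; a failed bind means something else is using it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True
```

For example, `validate_ports({"Service Manager": 7006, "Administrator": 7007})` returns an empty list, while assigning the same number to two entries reports the clash.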
Upgrade the Application Services:
1. Configure Informatica Environment Variables
You can configure the INFA_JAVA_OPTS, INFA_DOMAINS_FILE, and INFA_HOME environment variables to store memory, domain, and location settings.
(i) INFA_JAVA_OPTS
Use INFA_JAVA_OPTS to store the memory setting. For example, to configure 1 GB of system memory for the Informatica daemon on UNIX in a C shell, use the following command:
setenv INFA_JAVA_OPTS "-Xmx1024m"
(ii) INFA_DOMAINS_FILE
Set the value of the INFA_DOMAINS_FILE variable to the path and file name of the domains.infa file. If you configure the INFA_DOMAINS_FILE variable, you can run infacmd and pmcmd from a directory other than /server/bin.
Configure the INFA_DOMAINS_FILE variable on the machine where you install the Informatica services. On Windows, configure INFA_DOMAINS_FILE as a system variable.
(iii) INFA_HOME
If you use a softlink in UNIX for any of the Informatica directories, set INFA_HOME to the location of the Informatica installation directory so that any Informatica application or service can locate the other Informatica components it needs to run.
2. Configure Locale Environment Variables
Use the following command to verify that the value for the locale environment variable is compatible with the language settings for the machine and the type of code page you want to use for the repository:
locale -a
Locale for Oracle Database Clients
If the NLS_LANG value is american_america.UTF8, set the variable in a C shell with the following command:
setenv NLS_LANG american_america.UTF8
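The commands above use C shell syntax (`setenv`). If your account uses a Bourne-style shell such as sh, ksh, or bash, the equivalent is `export NAME="value"`. The small helper below is only a sketch that prints the right form for either shell; it is not an Informatica tool, and it deliberately refuses values that would need careful manual quoting.

```python
# Characters that need shell-specific quoting; this sketch does not handle them.
UNSAFE = set('"$`\\!')


def env_command(name, value, shell="csh"):
    """Build the command that sets one environment variable."""
    if UNSAFE & set(value):
        raise ValueError("value contains characters that need manual quoting")
    if shell == "csh":
        return f'setenv {name} "{value}"'
    if shell == "sh":
        return f'export {name}="{value}"'
    raise ValueError(f"unsupported shell: {shell}")
```

Calling `env_command("NLS_LANG", "american_america.UTF8", shell="sh")` gives the Bourne-style form of the last command shown above.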
Service Upgrade:
Use the service upgrade wizard to upgrade services.
1. In the Informatica Administrator header area click Manage > Upgrade.
2. Select the objects to upgrade.
3. Click Next.
4. If dependency errors exist, the Dependency Errors dialog box appears. Review the dependency errors and click OK. Then, resolve the dependency errors and click Next.
5. Enter the repository login information. Optionally, choose to use the same login information for all repositories.
6. Click Next.
The service upgrade wizard upgrades each service and displays the status and processing details.
7. When the upgrade completes, the Summary section displays the list of services and their upgrade status.
Click each service to view the upgrade details in the Service Details section.
8. Optionally, click Save Report to save the upgrade details to a file.
If you choose not to save the report, you can click Save Previous Report the next time you launch the service upgrade wizard.
9. Click Close.
10. Restart upgraded services.
After you upgrade the PowerCenter Repository Service, you must restart the service and its dependent services.
Informatica Client Upgrade
1. Close all applications.
2. Run install.bat from the root directory.
The Upgrade Pre-Requisites window displays the system requirements. Verify that all installation requirements are met before you continue the installation.
3. Click Next.
On the Select Component window, select the Informatica client you want to upgrade.
You can upgrade the following Informatica client applications:
  • Informatica Developer
  • PowerCenter Client
If both Informatica Developer and PowerCenter Client are installed on the machine, you can upgrade the tools in the same process.
4. On the Upgrade Directory window, enter the following directories:
Directory | Description
Directory of the Informatica client to upgrade | Directory that contains the previous version of the Informatica client tool that you want to upgrade.
Directory for Informatica 9.0.1 client tools | Directory in which to install the Informatica 9.0.1 client tools.
Enter the absolute path for the installation directory. The installation directory must be on the current machine. The directory names in the path must not contain spaces or the following special characters: @|* $ # ! % ( ) { } [ ] , ; '
5. Click Next.
6. On the Pre-Installation Summary window, review the installation information, and click Install.
The installer copies the Informatica client files to the installation directory.
The Post-installation Summary window indicates whether the upgrade completed successfully.
7. Click Done.
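The rules for the new installation directory in the Upgrade Directory windows (absolute path, different from the old directory, no spaces or listed special characters) can be checked before you start the installer. The function below is a minimal sketch of those three rules only. It does not check that the directory is on the current machine, and the example paths in the usage note are hypothetical.

```python
import ntpath
import posixpath

# Space plus the special characters this post lists as not allowed.
FORBIDDEN = set(" @|*$#!%(){}[],;'")


def check_install_dir(new_dir, old_dir, windows=False):
    """Return a list of rule violations for the new installation directory."""
    paths = ntpath if windows else posixpath
    problems = []
    if not paths.isabs(new_dir):
        problems.append("path is not absolute")
    new_norm = paths.normcase(paths.normpath(new_dir))
    old_norm = paths.normcase(paths.normpath(old_dir))
    if new_norm == old_norm:
        problems.append("same as the previous installation directory")
    bad = sorted(FORBIDDEN & set(new_dir))
    if bad:
        problems.append("forbidden characters: " + " ".join(repr(c) for c in bad))
    return problems
```

For example, `check_install_dir("/opt/informatica/9.0.1", "/opt/informatica/8.6.1")` returns an empty list, while a relative path or a path containing a space is reported.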
Post Upgrade Tasks:
Informatica Domain
  • Configure LDAP Connectivity.
  • Update the Log Events Directory.
  • Update ODBC Data Sources.
  • Update Statistics for the Domain Configuration Repository.
  • View Log Events from the Previous Informatica Version.
Metadata Manager Service
  • Reload Metadata Manager Resources
  • Update the Metadata Manager Properties File
  • Reference Table Manager
For a detailed treatment of the version upgrade, refer to the upgrade documentation from Informatica Corporation.

Wednesday, May 15, 2013

Design Tip #155 Going Agile? Start with the Bus Matrix

Many organizations are embracing agile development techniques for their DW/BI implementations. While we strongly concur with agile’s focus on business collaboration to deliver value via incremental initiatives, we’ve also witnessed agile’s “dark side.” Some teams get myopically focused on a narrowly-defined set of business requirements. They extract a limited amount of source data to develop a point solution in a vacuum. The resultant standalone solution can’t be leveraged by other groups and/or integrated with other analytics. The agile deliverable may have been built quickly, so it’s deemed a success. But when organizations lift their heads several years down the agile road, they often discover a non-architected hodgepodge of stovepipe data marts. The agile approach promises to reduce cost (and risk), but some organizations end up spending more on redundant, isolated efforts, coupled with the ongoing cost of fragmented decision-making based on inconsistent data.

It’s no surprise that a common criticism of the agile approaches for DW/BI development is the lack of planning and architecture, coupled with ongoing governance challenges. We believe the enterprise data warehouse bus matrix (described in our article “The Matrix: Revisited”) is a powerful tool to address these shortcomings. The bus matrix provides a master plan for agile development, plus it identifies the reusable common descriptive dimensions that provide both data consistency and reduced time-to-market delivery in the long run.

With the right mix of business and IT stakeholders in a room, along with a skilled facilitator, the bus matrix can be produced in relatively short order (measured in days, not weeks). Drafting the bus matrix depends on a solid understanding of the business’s needs. Collaboration is critical to identifying the business’s core processes. It’s a matter of getting the team members to visualize the key measurement events needed for analyses. Involving business representatives and subject matter experts will ensure the team isn’t paralyzed by this task. You’ll likely discover that multiple business areas or departments are interested in the same fundamental business processes. As the business is brainstorming the list of measurement events, IT representatives are bringing a dose of reality about the available operational source data and any known limitations.

Once the matrix has been drafted, the team can then adopt agile development techniques to bring it to life. Business and IT management need to identify the single business process matrix row that’s both a high priority for the business, and highly feasible from a technical perspective. Focusing on just one matrix row minimizes the risk of signing up for an overly ambitious implementation. Most implementation risk comes from biting off too much ETL system design and development; focusing on a single business process, typically captured by a single operational source system, reduces this risk. Incremental development can produce the descriptive dimensions associated with the selected matrix row until sufficient functionality is available and then the dimensional model is released to the business community, as we describe in Design Tip #135: Conformed Dimensions as the Foundation for Agile Data Warehousing.
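To make the bus matrix idea concrete, here is a small sketch that stores a matrix as a dictionary, lists the dimensions shared by more than one business process, and picks a first row to implement. The business processes, dimensions, and 1 to 5 scores are all made up for illustration, and taking the lower of the two scores is just one assumed way to express "both high priority and highly feasible"; the article itself does not prescribe a scoring rule.

```python
# Hypothetical bus matrix: business process -> (dimensions, priority, feasibility).
# Scores run from 1 (low) to 5 (high) and are invented for this example.
BUS_MATRIX = {
    "Orders": ({"Date", "Customer", "Product"}, 5, 4),
    "Shipments": ({"Date", "Customer", "Product", "Warehouse"}, 4, 2),
    "Returns": ({"Date", "Customer", "Product"}, 2, 5),
}


def conformed_dimensions(matrix):
    """Return the dimensions used by more than one business process."""
    counts = {}
    for dimensions, _priority, _feasibility in matrix.values():
        for dimension in dimensions:
            counts[dimension] = counts.get(dimension, 0) + 1
    return sorted(name for name, count in counts.items() if count > 1)


def first_row(matrix):
    """Pick the row that scores well on both priority and feasibility."""
    # The lower of the two scores rewards rows that are strong on both counts.
    return max(matrix, key=lambda row: min(matrix[row][1], matrix[row][2]))
```

With this sample data, the shared dimensions are Customer, Date, and Product, and `first_row(BUS_MATRIX)` selects Orders because it is the only row that scores well on both measures.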

White Paper: Evolving Role of the Enterprise Data Warehouse in the Era of Big Data Analytics

The enterprise data warehouse (EDW) community has entered a new realm of meeting new and growing business requirements in the era of big data. Common challenges include:

  1. extreme integration
  2. semi- and un-structured data sources
  3. petabytes of behavioral and image data accessed through MapReduce/Hadoop
  4. massively parallel relational database
  5. structural considerations for the EDW to support predictive and other advanced analytics.

These pressing needs raise more than a few urgent questions, such as:
  1. How do you handle the explosion and diversity of data sources from conventional and non-conventional sources?
  2. What new and existing technologies are needed to deepen the understanding of business through big data analytics?
  3. What technological requirements are needed to deploy big data projects?
  4. What potential organizational and cultural impacts should be considered?

This white paper provides detailed guidance for designing and administering the necessary deployment processes to meet these requirements. Ralph Kimball fills the hole where there is a lack of specific guidance in the industry as to how the EDW needs to respond to the big data analytics challenge, and what design elements are needed to support these new requirements.