SAP HANA: High-Performance Analytic Appliance (HANA) is
an in-memory database from SAP for storing data and analyzing large
volumes of non-aggregated transactional data in real time. Its very high
performance makes it well suited to decision support and predictive analysis.
The In-Memory Computing Engine uses cache-conscious data structures and
algorithms that take advantage of both hardware innovations and SAP
software technology innovations. It supports real-time OLTP and OLAP in
one appliance, i.e. an end-to-end solution from transaction processing to
high-performance analytics. SAP HANA can also be used as a secondary
database to accelerate analytics on existing applications.
Hardware Innovations - Leading to HANA
In the real world there are many kinds of data sources, e.g. unstructured data, operational data stores, data marts, data warehouses and online analytical stores. Analyzing or mining this Big Data in real time runs into hurdles such as latency, high cost and complexity.
Disk I/O was the performance bottleneck in the past, whereas in-memory computing was always much faster. Earlier, however, the cost of in-memory computing was prohibitive for any large-scale implementation. Now, with multi-core CPUs and high-capacity RAM, the entire database can be hosted in memory. As a result, the CPU now waits for data to be loaded from main memory into the CPU cache, and that is today's performance bottleneck.
This is a total paradigm shift: Tape is Dead, Disk is Tape, Main Memory is Disk and CPU Cache is Main Memory. HANA is optimized to exploit the parallel processing capabilities of modern multi-core CPU architectures, so SAP applications can benefit from current hardware technologies.
Memory Overview - Where we stand
Let us take a quick look at multi-core CPU caches, main memory (RAM) and the traditional hard disk with respect to response time. Each entry lists the level, its characteristics, approximate access time and typical size:
- L1 cache | primary, within each core; SRAM; fastest | ~1 ns | 64 KB
- L2 cache | intermediate, within each core; SRAM; slower | ~5 ns | 256 KB
- L3 cache | shared across all cores; SRAM; slowest cache level | ~20 ns | 8 MB
- Main memory | DRAM | ~100 ns | TBs
- Hard disk | persistent, mechanical | > 1,000,000 ns | TBs
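To put these latencies into perspective, the small Python sketch below expresses each level relative to an L1 cache hit. It simply reuses the approximate figures from the list (rough orders of magnitude, not measurements of any particular machine); the disk value is the lower bound given above.

```python
# Approximate access times in nanoseconds, copied from the list above.
LATENCY_NS = {
    "L1 cache": 1,
    "L2 cache": 5,
    "L3 cache": 20,
    "Main memory": 100,
    "Hard disk": 1_000_000,  # lower bound: "> 1,000,000 ns"
}

def relative_to_l1(latencies):
    """Return how many times slower each level is than an L1 cache hit."""
    base = latencies["L1 cache"]
    return {level: ns / base for level, ns in latencies.items()}

for level, factor in relative_to_l1(LATENCY_NS).items():
    print(f"{level:<12} ~{factor:>12,.0f}x an L1 hit")

# How many main-memory accesses fit into a single disk access.
print("Disk / RAM:", LATENCY_NS["Hard disk"] // LATENCY_NS["Main memory"])
```

Even with these rough numbers, a disk access costs at least as much as 10,000 main-memory accesses, while a main-memory access is still about 100 times slower than an L1 hit. That gap is why, once the data lives in RAM, cache-conscious design becomes the next thing to optimize.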
HANA Hardware Requirement
HANA can be installed on certified hardware from SAP hardware partners such as Hewlett Packard, IBM, Fujitsu, Cisco Systems and Dell.
At the time of writing, SUSE Linux Enterprise Server (SLES) 11 SP1 for x86-64 is the operating system supported by SAP HANA.
A typical configuration is four Intel Xeon E7-4870 processors (40 cores
in total) with 512 GB of RAM. For efficient data replication, SAP
recommends a dedicated 10 Gbit/s network connection between the SAP
HANA landscape and the source system.
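The 10 Gbit/s recommendation is easier to appreciate with a back-of-the-envelope calculation. The Python sketch below estimates an idealized transfer time for a bulk load. The 512 GB volume simply reuses the RAM size above as an example, and the `efficiency` parameter is a hypothetical knob for protocol and replication overhead, not an SAP sizing parameter.

```python
def transfer_seconds(data_gb: float, link_gbit_per_s: float = 10.0,
                     efficiency: float = 1.0) -> float:
    """Idealized time in seconds to move data_gb gigabytes over a link.

    efficiency (hypothetical, 0 < efficiency <= 1) scales the usable line
    rate; 1.0 assumes the full nominal bandwidth is achieved.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    gigabits = data_gb * 8  # decimal units: 1 GB = 8 Gbit
    return gigabits / (link_gbit_per_s * efficiency)

print(transfer_seconds(512))       # 512 GB at a full 10 Gbit/s
print(transfer_seconds(512, 1.0))  # the same volume over a 1 Gbit/s link
```

Under these idealized assumptions, 512 GB takes about 410 seconds (roughly 7 minutes) at 10 Gbit/s, versus about 68 minutes at 1 Gbit/s; real throughput will be lower.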
HANA Database Features
Important database features of HANA include OLTP & OLAP capabilities, Extreme Performance, In-Memory, Massively Parallel Processing, Hybrid Database, Column Store, Row Store, Complex Event Processing, Calculation Engine, Compression, Virtual Views, Partitioning and No Aggregates. The HANA in-memory architecture includes the In-Memory Computing Engine and the In-Memory Computing Studio for modeling and administration. Each of these properties needs a detailed explanation, followed by the SAP HANA architecture.
Basic Concepts behind the HANA Database
Extreme Hardware Innovations:
Main memory is no longer a limited
resource: modern servers can have 2 TB of system memory, which allows
complete databases to be held in RAM. Current systems have up to 64
cores, and 128 cores will soon be available. With an increasing number
of cores, CPUs can process more data per unit of time. This shifts the
performance bottleneck from disk I/O to the data transfer between main
memory and CPU cache.
In-Memory Database:
HANA fully leverages hardware
innovations such as multi-core CPUs and the availability of high-capacity
RAM. The basic concept is to hold the entire database in fast-access main
memory close to the CPU, for faster execution and to avoid disk I/O. Disk
storage is still required for permanent persistence, since main memory is
volatile. SAP HANA holds the bulk of its data in memory for maximum
performance, but still uses persistent storage to provide a fallback in
case of failure. Data and log are automatically saved to disk at regular
savepoints, and the log is also saved to disk after each COMMIT of a
database transaction. Savepoint writes happen asynchronously as a
background task. Generally, on system start-up HANA loads the tables
into memory.
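The interplay of in-memory data, the commit log and savepoints can be modeled in a few lines. The Python sketch below is a deliberately simplified toy, not HANA's persistence layer: the class name and methods are hypothetical, "disk" is just two attributes, and savepoints run synchronously so the behavior is easy to follow. It shows why committed work survives a crash (savepoint plus log replay) while uncommitted work does not.

```python
import copy

class ToyInMemoryStore:
    """Hypothetical toy: volatile data plus a persisted log and savepoint."""

    def __init__(self):
        self.data = {}            # in-memory (volatile) state
        self.pending = []         # changes of the open transaction
        self.disk_log = []        # redo entries persisted at COMMIT
        self.disk_savepoint = {}  # last persisted snapshot of the data

    def write(self, key, value):
        self.pending.append((key, value))

    def commit(self):
        # Persist the log entries first, then apply them in memory.
        self.disk_log.extend(self.pending)
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()

    def savepoint(self):
        # Snapshot committed data; log entries up to here are now redundant.
        self.disk_savepoint = copy.deepcopy(self.data)
        self.disk_log.clear()

    @classmethod
    def recover(cls, crashed):
        """Rebuild memory after a crash: load the savepoint, replay the log."""
        store = cls()
        store.disk_savepoint = copy.deepcopy(crashed.disk_savepoint)
        store.disk_log = list(crashed.disk_log)
        store.data = copy.deepcopy(store.disk_savepoint)
        for key, value in store.disk_log:
            store.data[key] = value
        return store

db = ToyInMemoryStore()
db.write("a", 1)
db.commit()
db.savepoint()
db.write("b", 2)
db.commit()
db.write("c", 3)  # never committed, so it is lost in the crash
restored = ToyInMemoryStore.recover(db)
print(restored.data)  # {'a': 1, 'b': 2}
```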
Massively Parallel Processing:
With the availability of multi-core CPUs,
higher execution speeds can be achieved. Multiple CPUs call for new
parallel algorithms in databases in order to fully utilize the
available computing resources. HANA's column-based storage makes it
easy to execute operations in parallel using multiple processor cores.
In a column store, data is already vertically partitioned. This means
that operations on different columns can easily be processed in
parallel. If multiple columns need to be searched or aggregated, each of
these operations can be assigned to a different processor core. In
addition, operations on one column can be parallelized by partitioning
the column into multiple sections that can be processed by different
processor cores. With the SAP HANA database, queries can be executed
rapidly and in parallel.
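Both forms of parallelism described above (one worker per column, and one column split into sections) can be sketched in Python with `concurrent.futures`. The table and helper names are hypothetical. Note that in CPython the global interpreter lock prevents pure-Python arithmetic from running truly in parallel on threads, so this shows how the work is decomposed, not the speedup that native multi-core execution in HANA would give.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical column-store table: each column is its own list.
table = {
    "quantity": [3, 1, 4, 1, 5, 9, 2, 6],
    "price":    [10, 20, 30, 40, 50, 60, 70, 80],
}

def column_sum(values):
    return sum(values)

def split(values, parts):
    """Partition one column into roughly equal contiguous sections."""
    size = -(-len(values) // parts)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # 1) Different columns are aggregated by different workers.
    per_column = dict(zip(table, pool.map(column_sum, table.values())))
    # 2) One column is split into sections; partial sums are combined.
    partials = list(pool.map(column_sum, split(table["price"], 4)))
    price_total = sum(partials)

print(per_column)   # {'quantity': 31, 'price': 360}
print(price_total)  # 360
```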
Hybrid Data Store:
Common databases store tabular data
row-wise, i.e. all data for a record is stored adjacently in memory.
Row store tables are linked lists of memory pages.
Conceptually, a database table is a two-dimensional data structure with
cells organized in rows and columns. Computer memory, however, is
organized as a linear structure. To store a table in linear memory, two
options exist:
- A row-oriented storage stores a table as a sequence of records, each of which contains the fields of one row.
- A column-oriented storage stores all the values of a column in contiguous memory locations.
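The two options can be made concrete by laying out a small table in a single linear buffer and computing where each cell lands. In this hypothetical Python sketch, cell (r, c) sits at position `r * n_cols + c` in the row layout and at `c * n_rows + r` in the column layout. Python lists hold references, so this illustrates the address arithmetic rather than real cache behavior.

```python
# A small 3 x 3 table (hypothetical sample data): id, name, amount.
rows = [
    (1, "Alice", 300),
    (2, "Bob",   150),
    (3, "Carol", 420),
]
n_rows, n_cols = len(rows), len(rows[0])

# Row-oriented: complete records one after another.
row_store = [value for row in rows for value in row]

# Column-oriented: all ids, then all names, then all amounts.
col_store = [row[c] for c in range(n_cols) for row in rows]

def row_index(r, c):
    return r * n_cols + c  # position of cell (r, c) in row_store

def col_index(r, c):
    return c * n_rows + r  # position of cell (r, c) in col_store

# Positions touched when scanning column 2 ("amount") for every row:
print([row_index(r, 2) for r in range(n_rows)])  # [2, 5, 8]  strided
print([col_index(r, 2) for r in range(n_rows)])  # [6, 7, 8]  contiguous
start = col_index(0, 2)
print(sum(col_store[start:start + n_rows]))      # 870
```

A single-column aggregate reads one contiguous slice in the column layout but jumps by `n_cols` positions in the row layout; with wide tables that stride spreads the scan over many cache lines.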
Using a column store avoids scanning
unnecessary columns when searching or aggregating the values of a single
column, because those values are stored in contiguous memory locations.
Such an operation has high spatial locality and can be executed
efficiently in the CPU cache. With row-oriented storage, the same
operation would be much slower, because the data of one column is
distributed across memory and the CPU is slowed down by cache misses.
The column store is optimized for high read performance and efficient
data compression. This combination of classical and innovative
technologies for data storage and access allows developers to choose
the best technology for their application and, where necessary, to use
both in parallel.
OLTP and OLAP Database:
HANA is a hybrid database with both a
read-optimized column store, ideally suited to OLAP, and a
write-optimized row store, best suited to OLTP workloads. Both stores are
in-memory. Using column stores in OLTP applications requires a balanced
approach to insertion and indexing of column data to minimize cache
misses. The SAP HANA database allows the developer to specify whether a
table is stored column-wise or row-wise. It is also possible to alter an
existing table from columnar to row-based and vice versa.
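In SAP HANA SQL the store type is chosen in the DDL, e.g. `CREATE COLUMN TABLE` or `CREATE ROW TABLE`, and an existing table can be converted with `ALTER TABLE ... ALTER TYPE ROW` or `ALTER TYPE COLUMN`. The Python sketch below only builds these statement strings; the helper names are hypothetical, the syntax reflects our reading of the HANA SQL reference (verify it against your release), and identifiers are not quoted or escaped, so it is not safe for untrusted input. The resulting strings could then be run through a database driver such as SAP's `hdbcli`.

```python
STORES = ("ROW", "COLUMN")

def create_table_sql(name, columns, store="COLUMN"):
    """Build a hypothetical CREATE statement choosing the HANA store type."""
    store = store.upper()
    if store not in STORES:
        raise ValueError("store must be 'ROW' or 'COLUMN'")
    cols = ", ".join(f"{col} {sql_type}" for col, sql_type in columns)
    return f"CREATE {store} TABLE {name} ({cols})"

def alter_store_sql(name, store):
    """Build a hypothetical ALTER statement converting the store type."""
    store = store.upper()
    if store not in STORES:
        raise ValueError("store must be 'ROW' or 'COLUMN'")
    return f"ALTER TABLE {name} ALTER TYPE {store}"

print(create_table_sql("SALES", [("ID", "INTEGER"), ("AMOUNT", "DECIMAL(15,2)")]))
# CREATE COLUMN TABLE SALES (ID INTEGER, AMOUNT DECIMAL(15,2))
print(alter_store_sql("SALES", "row"))
# ALTER TABLE SALES ALTER TYPE ROW
```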
Higher Data Compression:
The goal of keeping all relevant data in
main memory can be achieved at lower cost if data compression is used.
Columnar data storage allows highly efficient compression. If a column
is sorted, equal values normally sit next to each other in memory. In
this case compression methods such as run-length encoding, cluster
coding or dictionary coding can be used. In column stores a compression
factor of 10 can typically be achieved compared to traditional
row-oriented storage systems.
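Dictionary coding and run-length encoding combine naturally on a sorted column. The Python sketch below, using hypothetical sample data, first replaces each value by a small integer ID into a sorted dictionary, then collapses runs of equal IDs into (value, count) pairs. It is a textbook illustration of the two techniques, not HANA's internal format, and it does not show cluster coding. The achievable ratio depends entirely on the data; the factor of 10 quoted above is the article's figure, not something this toy demonstrates.

```python
from itertools import groupby

def dictionary_encode(column):
    """Replace each value by an integer ID into a sorted dictionary."""
    dictionary = sorted(set(column))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[v] for v in column]

def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(v, len(list(run))) for v, run in groupby(values)]

def run_length_decode(pairs):
    return [v for v, count in pairs for _ in range(count)]

# Hypothetical sorted 'country' column with many repeats (12 values).
country = sorted(["DE"] * 4 + ["FR"] * 3 + ["US"] * 5)

dictionary, encoded = dictionary_encode(country)
rle = run_length_encode(encoded)

print(dictionary)  # ['DE', 'FR', 'US']
print(rle)         # [(0, 4), (1, 3), (2, 5)]
# Round trip: decoding restores the original column exactly.
assert [dictionary[i] for i in run_length_decode(rle)] == country
```

Twelve strings shrink to three dictionary entries plus three (ID, count) pairs; the longer the runs in a sorted column, the better this works.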