In this article we will discuss about the architecture overview of the
In-Memory Computing Engine of SAP HANA. The SAP HANA database is
developed in C++ and runs on SUSE Linux Enterpise Server. SAP HANA
database consists of multiple servers and the most important component
is the
Index Server. SAP HANA database consists of Index Server, Name Server, Statistics Server, Preprocessor Server and XS Engine.
- Index Server contains the actual data and the engines for processing the data. It also coordinates and uses all the other servers.
- Name Server holds information about the SAP HANA
databse topology. This is used in a distributed system with instances of
HANA database on different
hosts. The name server knows where the components are running and which
data is located on which server.
- Statistics Server collects information about Status,
Performance and Resource Consumption from all the other server
components. From the SAP HANA Studio we can access the Statistics Server
to get status of various alert monitors.
- Preprocessor Server is used for Analysing Text Data and extracting the information on which the text search capabilities are based .
- XS Engine is an optional component. Using XS Engine clients can connect to SAP HANA database to fetch data via HTTP.
Now let us check the architecture components of SAP HANA Index Server.
SAP HANA Index Server Architecture:
- Connection and Session Management component is responsible
for creating and managing sessions and connections for the database
clients. Once a session is established, clients can communicate with the
SAP HANA database using SQL statements. For each session a set of
parameters are maintained like, auto-commit, current transaction isolation level etc. Users are Authenticated
either by the SAP HANA database itself (login with user and password)
or authentication can be delegated to an external authentication
providers such as an LDAP directory.
- The client requests are analyzed and executed by the set of components summarized as Request Processing And Execution Control.
The Request Parser analyses the client request and dispatches it to the
responsible component. The Execution Layer acts as the controller that
invokes the different engines and routes intermediate results to the
next execution step.
For example, Transaction Control statements are forwarded to the
Transaction Manager. Data Definition statements are dispatched to the
Metadata Manager and Object invocations are forwarded to Object Store.
Data Manipulation statements are forwarded to the Optimizer which
creates an Optimized Execution Plan that is subsequently forwarded to
the execution layer.
- The SQL Parser checks the syntax and semantics of the
client SQL statements and generates the Logical Execution Plan. Standard
SQL statements are processed directly by DB engine.
- The SAP HANA database has its own scripting language named SQLScript
that is designed to enable optimizations and parallelization. SQLScript
is a collection of extensions to SQL. SQLScript is based on side effect
free functions that operate on tables using SQL queries for set
processing. The motivation for SQLScript is to offload data-intensive
application logic into the database.
- Multidimensional Expressions (MDX) is a language for querying and manipulating the multidimensional data stored in OLAP cubes.
- The SAP HANA database also contains a component called the Planning Engine
that allows financial planning applications to execute basic planning
operations in the database layer. One such basic operation is to create a
new version of a dataset as a copy of an existing one while applying
filters and transformations. For example: Planning data for a new year
is created as a copy of the data from the previous year. This requires
filtering by year and updating the time dimension. Another example for a
planning operation is the disaggregation operation that distributes
target values from higher to lower aggregation levels based on a
distribution function.
- The SAP HANA database also has built-in support for
domain-specific models (such as for financial planning) and it offers
scripting capabilities that allow application-specific calculations to
run inside the database.
The SAP HANA database features such as SQLScript and Planning
operations are implemented using a common infrastructure called the Calc engine.
The SQLScript, MDX, Planning Model and Domain-Specific models are
converted into Calculation Models. The Calc Engine creates Logical
Execution Plan for Calculation Models. The Calculation Engine
will break up a model, for example some SQL Script, into operations that
can be processed in parallel. The engine also executes the user defined
functions.
- In HANA database, each SQL statement is processed in the
context of a transaction. New sessions are implicitly assigned to a new
transaction. The Transaction Manager coordinates database
transactions, controls transactional isolation and keeps track of
running and closed transactions. When a transaction is committed or
rolled back, the transaction manager informs the involved engines about
this event so they can execute necessary actions. The transaction
manager also cooperates with the persistence layer to achieve atomic and
durable transactions.
- Metadata can be accessed via the Metadata Manager. The
SAP HANA database metadata comprises of a variety of objects, such as
definitions of relational tables, columns, views, and indexes,
definitions of SQLScript functions and object store metadata. Metadata
of all these types is stored in one common catalog for all SAP HANA
database stores (in-memory row store, in-memory column store, object
store, disk-based). Metadata is stored in tables in row store. The SAP
HANA database features such as transaction support, multi-version
concurrency control, are also used for metadata management. In
distributed database systems central metadata is shared across servers.
How metadata is actually stored and shared is hidden from the components
that use the metadata manager.
- The Authorization Manager is invoked by other SAP HANA
database components to check whether the user has the required
privileges to execute the requested operations. SAP HANA allows granting
of privileges to users or roles. A privilege grants the right to
perform a specified operation (such as create, update, select, execute,
and so on) on a specified object (for example a table, view, SQLScript
function, and so on).
The SAP HANA database supports Analytic Privileges that represent
filters or hierarchy drilldown limitations for analytic queries.
Analytic privileges grant access to values with a certain combination of
dimension attributes. This is used to restrict access to a cube with
some values of the dimensional attributes.
- Database Optimizer gets the Logical Execution Plan from the SQL Parser or the Calc Engine as input and generates the optimised Physical Execution Plan based on the database Statistics. The database optimizer which will determine the best plan for accessing row or column stores.
- Database Executor basically executes the Physical
Execution Plan to access the row and column stores and also process all
the intermediate results.
- The Row Store is the SAP HANA database row-based
in-memory relational data engine. Optimized for high performance of
write operation, Interfaced from calculation / execution layer.
Optimised Write and Read operation is possible due to Storage separation
i.e. Transactional Version Memory & Persisted Segment.
- Transactional Version Memory contains temporary versions
i.e. Recent versions of changed records. This is required for
Multi-Version Concurrency Control (MVCC). Write Operations mainly go
into Transactional Version Memory. INSERT statement also writes to the
Persisted Segment.
- Persisted Segment contains data that may be seen by
any ongoing active transactions. Data that has been committed before
any active transaction was started.
- Version Memory Consoliation moves the recent
version of changed records from Transaction Version Memory to Persisted
Segment based on Commit ID. It also clears outdated record versions from
Transactional Version Memory. It can be considered as garbage collector
for MVCC.
- Segments contain the actual data (content of
row-store tables) in pages. Row store tables are linked list of memory
pages. Pages are grouped in segments. Typical Page size is 16 KB.
- Page Manager is responsible for Memory allocation. It also keeps track of free/used pages.
- The Column Store is the SAP HANA database column-based in-memory relational data engine. Parts of it originate from TREX
(Text Retrieval and Extraction) i.e SAP NetWeaver Search and
Classification. For the SAP HANA database this proven technology was
further developed into a full relational column-based data store.
Efficient data compression and optimized for high performance of read
operation, Interfaced from calculation / execution layer. Optimised Read
and Write operation is possible due to Storage separation i.e. Main
& Delta.
- Main Storage contains the compressed data in memory for fast read.
- Delta Storage is meant for fast write operation. The update is performed by inserting a new entry into the delta storage.
- Delta Merge is an asynchronous process to move
changes in delta storage into the compressed and read optimized main
storage. Even during the merge operation the columnar table will be
still available for read and write operations. To fulfil this
requirement, a second delta and main storage are used internally.
- During Read Operation data is always read from both
main & delta storages and result set is merged. Engine uses multi
version concurrency control (MVCC) to ensure consistent read operations.
- As row tables and columnar tables can be combined in one
SQL statement, the corresponding engines must be able to consume
intermediate results created by each other. A main difference between
the two engines is the way they process data: Row store operators
process data in a row-at-a-time fashion using iterators. Column store
operations require that the entire column is available in contiguous
memory locations. To exchange intermediate results, row store can
provide results to column store materialized as complete rows in memory
while column store can expose results using the iterator interface
needed by row store.
- The Persistence Layer is responsible for durability and
atomicity of transactions. It ensures that the database is restored to
the most recent committed state after a restart and that transactions
are either completely executed or completely undone. To achieve this
goal in an efficient way the per-sistence layer uses a combination of
write-ahead logs, shadow paging and savepoints. The persistence layer
offers interfaces for writing and reading data. It also contains SAP
HANA 's logger that manages the transaction log. Log entries can be
written implicitly by the persistence layer when data is written via the
persistence interface or explicitly by using a log interface.
Distributed System and High Availability
The SAP HANA Appliance software supports High Availability. SAP HANA
scales systems beyond one server and can remove the possibility of
single point of failure. So a typical Distributed Scale out Cluster
Landscape will have many server instances in a cluster. Therefore Large
tables can also be distributed across multiple servers. Again Queries
can also be executed across servers. SAP HANA Distributed System also
ensures transaction safety.
Features
- N Active Servers or Worker hosts in the cluster.
- M Standby Server(s) in the cluster.
- Shared file system for all Servers. Serveral instances of SAP HANA share the same metadata.
- Each Server hosts an Index Server & Name Server.
- Only one Active Server hosts the Statistics Server.
- During startup one server gets elected as Active Master.
- The Active Master assigns a volume to each starting Index Server or no volume in case of cold Standby Servers.
- Upto 3 Master Name Servers can be defined or configured.
- Maximum of 16 nodes is supported in High Availability configurations.
Name Server Configured Role |
Name Server Actual Role |
Index Server Configured Role |
Index Server Actual Role |
Master 1 |
Master |
Worker |
Master |
Master 2 |
Slave |
Worker |
Slave |
Master 3 |
Slave |
Worker |
Slave |
Slave |
Slave |
Standby |
Standby |
Failover
- High Availability enables the failover of a node within one
distributed SAP HANA appliance. Failover uses a cold Standby node and
gets triggered automatically. So when a Active Server X fails, Standby
Server N+1 reads indexes from the shared storage and connects to logical
connection of failed server X.
- If the SAP HANA system detects a failover situation, the work
of the services on the failed server is reassigned to the services
running on the standby host. The failed volume and all the included
tables are reassigned and loaded into memory in accordance with the
failover strategy defined for the system. This reassignment can be
performed without moving any data, because all the persistency of the
servers is stored on a shared disk. Data and logs are stored on shared
storage, where every server has access to the same disks.
- The Master Name Server detects an Index Server failure and
executes the failover. During the failover the Master Name Server
assigns the volume of the failed Index Server to the cold Standby
Server. In case of a Master Name Server failure, another of the
remaining Name Servers will become Active Master.
- Before a failover is performed, the system waits for a few
seconds to determine whether the service can be restarted. Standby node
can take over the role of a failing master or failing slave node
0 comments:
Post a Comment