Monday, July 2, 2012

Data Warehouse Appliance Basics




A data warehouse appliance consists of:
o  An integrated set of servers, storage, operating system(s), DBMS and software, pre-installed and pre-optimized for data warehousing
o  Specific recommended hardware configurations
o  Capacity offerings ranging from terabytes to petabytes

Appliance Technology
o  Most DW appliance vendors use Massively Parallel Processing (MPP) architectures
o  MPP architectures consist of independent processors or servers executing in parallel.
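
To make the MPP idea concrete, here is a minimal Python sketch; it is not any vendor's implementation. It assumes a hypothetical sales table of (region, amount) rows, hash-partitions the rows across a few simulated nodes, lets each node aggregate only the rows it owns, and then merges the small partial results. The names partition_rows, local_sum and mpp_sum_by_region are made up for illustration, and threads stand in for independent servers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib


def partition_rows(rows, node_count):
    """Assign each row to one node by hashing its key (hypothetical scheme)."""
    partitions = [[] for _ in range(node_count)]
    for region, amount in rows:
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        node = zlib.crc32(region.encode("utf-8")) % node_count
        partitions[node].append((region, amount))
    return partitions


def local_sum(partition):
    """Work done by one node: aggregate only the rows it owns."""
    totals = defaultdict(int)
    for region, amount in partition:
        totals[region] += amount
    return dict(totals)


def mpp_sum_by_region(rows, node_count=4):
    """Scatter rows to nodes, aggregate in parallel, then merge the partials."""
    partitions = partition_rows(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum, partitions))
    merged = defaultdict(int)
    for partial in partials:
        for region, total in partial.items():
            merged[region] += total
    return dict(merged)


# Made-up sample data for illustration.
rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
totals = mpp_sum_by_region(rows)
print(totals)
```

Because every row for a given region lands on the same node, each node can finish its part without consulting the others; only the small per-node totals travel to the merge step.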

Benefits
o  Parallel Performance
o  Reduced Administration
o  Built-in High Availability
o  Scalability [Adding servers increases performance as well as capacity]
o  Service [One number to call]
o  Reduction in costs



Traditional Multiprocessing Architectures
§  SMP (Symmetric Multiprocessing)
§  MPP (Massively Parallel Processing)

SMP (Symmetric Multiprocessing)
o  SMP systems consist of several processors, each with its own cache, sharing a common main memory.
o  Load is balanced across the processors.
o  SMP systems struggle to move the large amounts of data required by data warehousing and business intelligence applications.

MPP (Massively Parallel Processing)
o  MPP systems consist of very large numbers of processors.
o  Each processor has its own memory, backplane and storage, and runs its own operating system.
o  The shared-nothing approach of pure MPP systems allows nearly linear scalability.
o  High availability: when one node fails, another can take over.
o  The main objective is to gain the performance and scalability advantages of MPP while reducing costs and administration time.
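
The "nearly linear" claim can be illustrated with Amdahl's law. This is a standard textbook model that the post itself does not cite, so treat it as an outside aid. The sketch assumes a hypothetical query in which a fraction parallel_fraction of the work spreads evenly across nodes while the rest (for example, final coordination) stays serial; the 0.99 value is an assumption chosen only for illustration.

```python
def amdahl_speedup(node_count, parallel_fraction):
    """Speedup over a single node when only part of the work parallelizes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / node_count)


# Assumed value for illustration only: 99% of the query work is parallel.
for nodes in (1, 2, 4, 8, 16):
    print(nodes, round(amdahl_speedup(nodes, 0.99), 2))
```

With no shared resources the serial fraction stays small, so the speedup tracks the node count closely; any shared component adds to the serial part and pulls the curve away from linear.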


Both SMP and MPP have a major drawback:
They require massive data movement.


Multiprocessing Variations
·     Large Scale SMP
·     MPP on Clustered SMP


Large Scale SMP





-  Larger SMP systems with additional processors and shared memory are available and deliver much higher computing power.

-  As processors take turns accessing massive amounts of data in memory, the memory bus becomes a bottleneck, resulting in poor performance.
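
The memory-bus bottleneck can be pictured with a deliberately simple model: total scan throughput grows with the processor count until it hits the bandwidth of the shared bus, then stops growing. The function name and all bandwidth figures below are hypothetical and not measurements of any real system.

```python
def smp_scan_throughput(cpu_count, per_cpu_gb_per_s, bus_gb_per_s):
    """Aggregate scan rate (GB/s) when all CPUs share one memory bus."""
    if cpu_count < 0 or per_cpu_gb_per_s < 0 or bus_gb_per_s < 0:
        raise ValueError("inputs must be non-negative")
    demand = cpu_count * per_cpu_gb_per_s
    # The shared bus caps what the processors can pull from memory together.
    return min(demand, bus_gb_per_s)


# Hypothetical figures: each CPU can consume 2 GB/s, the bus delivers 16 GB/s.
for cpus in (4, 8, 16, 32):
    print(cpus, smp_scan_throughput(cpus, 2.0, 16.0))
```

In this model, processors added beyond the saturation point (8 in the example) contribute no extra throughput, which is the behavior the bullet above describes.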


MPP on Clustered SMP
(Used by Teradata and the IBM DB2 Integrated Cluster Environment (ICE))



-  Small SMP clusters operate in parallel.

-  The clusters share a storage area network and management structure.

-  The resource sharing built into this approach imposes a bottleneck that limits performance and scalability.




-  Three architectures are used today by well-known data warehouse solutions:

o  Shared nothing (neither data nor disk is shared) MPP
o  Separate data, shared storage MPP
o  Shared data and storage MPP

-  All three are based on a hybrid combination of MPP on SMP clusters, but differ in how data and storage resources are shared between MPP nodes.
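
One way to keep the three variants straight is to record, for each, whether data and storage are shared between MPP nodes. The sketch below only encodes the list above; the class and field names (MppArchitecture, shares_data, shares_storage) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MppArchitecture:
    """One of the three MPP variants, described by what its nodes share."""
    name: str
    shares_data: bool
    shares_storage: bool


ARCHITECTURES = (
    MppArchitecture("Shared nothing", shares_data=False, shares_storage=False),
    MppArchitecture("Separate data, shared storage", shares_data=False, shares_storage=True),
    MppArchitecture("Shared data and storage", shares_data=True, shares_storage=True),
)


def shared_resources(arch):
    """List the resources that MPP nodes share under this architecture."""
    resources = []
    if arch.shares_data:
        resources.append("data")
    if arch.shares_storage:
        resources.append("storage")
    return resources


for arch in ARCHITECTURES:
    print(f"{arch.name}: shares {shared_resources(arch) or 'nothing'}")
```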


-  Major weakness: this hybrid approach requires significant data movement from disks to processors for BI queries.
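
A back-of-the-envelope calculation shows why data movement matters: the time spent just moving data is the volume moved divided by the bandwidth of the disk-to-processor path. The function and every number below are hypothetical, chosen only to show the arithmetic.

```python
def transfer_seconds(table_gb, moved_fraction, link_gb_per_s):
    """Seconds needed to move part of a table from disk to the processors."""
    if link_gb_per_s <= 0:
        raise ValueError("link_gb_per_s must be positive")
    if not 0.0 <= moved_fraction <= 1.0:
        raise ValueError("moved_fraction must be between 0 and 1")
    return table_gb * moved_fraction / link_gb_per_s


# Hypothetical figures: a 1,000 GB table and a 2 GB/s disk-to-processor path.
whole_table = transfer_seconds(1000, 1.0, 2.0)
one_percent = transfer_seconds(1000, 0.01, 2.0)
print(whole_table, one_percent)
```

Under these assumed figures, a query that has to pull the whole table across the path spends minutes on transfer alone, while one that needs only a small fraction of the rows spends seconds.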




Separate data, shared storage:
(Used by IBM in the DB2 database management system)

-  Drawback: significant data movement from disks to processors



Shared data and storage:

-  Multiple processors operating in parallel share data residing on a common storage system.


