Here is a brief summary of what I got from the paper.
1970s - RDBMS emerges, e.g. System R
1980s - major DB vendors adopt a "one size fits all" strategy to push RDBMS into the mainstream market
1990s - data warehouse: consolidate multiple operational DBs into a data warehouse for business intelligence
Use scenario: differs from OLTP; OLTP is often optimized for updates, while a warehouse often
- loads data from the operational DBs periodically, and
- serves complex ad hoc queries, e.g. historical trends, correlations between data from different operational DBs
Common data schema: fact and dimension tables (star schema)
Index: prefer bitmap indexes (good when data has low cardinality or is not frequently updated) over B-trees
Entering the 2000s, special-purpose DB engines emerge.
*StreamDB, motivated by fast-arriving data streams in monitoring applications
DB Model: outbound processing for RDBMS (process-after-store); inbound processing for StreamDB (process before (optional) store)
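To make the process-before-store idea concrete, here is a minimal sketch of a time-window aggregate computed while events arrive. It is only an illustration, not how any particular stream engine works: the function name `tumbling_window_avg`, the `(timestamp, value)` event shape, and the assumption that events arrive in timestamp order are all mine.

```python
def tumbling_window_avg(events, window_seconds):
    """Yield (window_start, average) for fixed-size, non-overlapping windows.

    Events are (timestamp, value) pairs, assumed to arrive in timestamp order.
    Each event is folded into a running total as it arrives; raw events are
    never stored, only the per-window result is emitted.
    """
    current_start = None
    total = 0.0
    count = 0
    for timestamp, value in events:
        start = (timestamp // window_seconds) * window_seconds
        if current_start is not None and start != current_start:
            # The previous window is complete: emit it and reset the state.
            yield current_start, total / count
            total, count = 0.0, 0
        current_start = start
        total += value
        count += 1
    if count:
        # Flush the last, possibly partial, window.
        yield current_start, total / count


# Hypothetical sensor readings: (seconds since start, measured value).
readings = [(0, 10.0), (3, 20.0), (5, 30.0), (9, 50.0), (12, 40.0)]
window_averages = list(tumbling_window_avg(readings, window_seconds=5))
print(window_averages)  # [(0, 15.0), (5, 40.0), (10, 40.0)]
```

A process-after-store system would first insert every reading into a table and only then run an aggregate query over it; here storing anything at all is optional.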
Three reasons that existing DBMSs cannot deal with data streams:
- RDBMSs cannot be optimized for inbound processing, because triggers were added to the existing design as an afterthought.
- They lack low-level primitives such as time windows.
- RDBMSs separate DB processing from application logic using a client/server architecture, while a stream DB needs seamless integration between the two.
*Column-store DB (for extremely large data warehouses)
Data is stored by column, not by row; optimized for read-intensive applications, while row-store DBs are good for write-intensive applications.
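The row-versus-column trade-off can be sketched with plain Python lists. This is a toy in-memory layout under my own assumptions (the `rows` data and the helpers `to_columns` and `append_row` are made up for illustration); real column stores add compression and on-disk layout that are not shown here.

```python
# Row layout: each record keeps all of its attributes together.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 80.0},
    {"order_id": 3, "region": "east", "amount": 50.0},
]


def to_columns(row_store):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in row_store:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def append_row(column_store, row):
    """Adding one record touches every column list (the write-side cost)."""
    for name, value in row.items():
        column_store[name].append(value)


columns = to_columns(rows)

# Row layout: summing one attribute still walks every full record.
total_from_rows = sum(row["amount"] for row in rows)

# Column layout: the same query reads only the "amount" column.
total_from_columns = sum(columns["amount"])

append_row(columns, {"order_id": 4, "region": "west", "amount": 30.0})
print(total_from_rows, total_from_columns, len(columns["amount"]))
```

The aggregate needs only one column in the column layout, while inserting a single record has to touch every column, which is the read-intensive versus write-intensive split described above.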
*DB for Search Engine, represented by Google Bigtable
Use scenario: processing inbound stream data (from crawlers) and ad hoc lookups on the existing index; writes are append-only; reads are sequential.
Requirement: fast response and high availability (through replication and fast recovery)
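The access pattern in the use scenario above (append-only writes, sequential reads, lookups through an index) can be sketched as follows. This is a toy, in-memory illustration under my own assumptions and is not Bigtable's actual design; the class `AppendOnlyStore` and its methods are hypothetical names, and replication and recovery are left out entirely.

```python
class AppendOnlyStore:
    """Toy store: records are only appended, never updated in place."""

    def __init__(self):
        self._log = []    # every record ever written, in arrival order
        self._index = {}  # key -> offset of the most recent record for that key

    def append(self, key, value):
        """Write by appending; an older record for the same key stays in the log."""
        self._index[key] = len(self._log)
        self._log.append((key, value))

    def lookup(self, key):
        """Ad hoc lookup through the index; returns the latest value or None."""
        offset = self._index.get(key)
        return None if offset is None else self._log[offset][1]

    def scan(self):
        """Sequential read over all records in the order they were written."""
        yield from self._log


# Hypothetical crawler output: (URL, page snapshot).
store = AppendOnlyStore()
store.append("a.example", "snapshot-1")
store.append("b.example", "snapshot-1")
store.append("a.example", "snapshot-2")

latest = store.lookup("a.example")
history = list(store.scan())
print(latest, len(history))
```

Because nothing is overwritten, a newer crawl of the same page simply lands at the end of the log, and the index is what keeps lookups from having to scan it.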
*XML DB - whether it is needed is still under debate.
