Tuesday, February 10, 2009

PNUTS: Yahoo!’s Hosted Data Serving Platform

In this VLDB paper, folks at Yahoo! describe their distributed database, PNUTS, which offers richer database features than Google Bigtable while still achieving low latency at massive scale.

Here are several important design decisions that differentiate their system from others:

1. Consistency model: They choose something in between two extremes: the general serializability guaranteed by an RDBMS, and the eventual consistency delivered by many modern distributed databases, such as Amazon Dynamo. The model, called per-record timeline consistency, guarantees that all replicas of a given record apply all updates to the record in the same order.

2. The use of a pub/sub system both for replicating updates between regions and as the redo log of the database.

3. A flexible mapping of tablets to storage units to support failover and load balancing.
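
To make per-record timeline consistency more concrete, here is a minimal Python sketch. It assumes each record has a single master that stamps writes with increasing sequence numbers, and that replicas buffer anything that arrives early. The names RecordMaster, Replica, stamp and deliver are made up for illustration and are not PNUTS APIs.

```python
from collections import defaultdict


class RecordMaster:
    """Hypothetical per-record master: stamps each write with a sequence number."""

    def __init__(self):
        self._last_seq = defaultdict(int)  # key -> last sequence number issued

    def stamp(self, key, value):
        self._last_seq[key] += 1
        return (key, self._last_seq[key], value)


class Replica:
    """Hypothetical replica: applies updates to each record strictly in sequence order."""

    def __init__(self):
        self.data = {}                     # key -> (seq, value) last applied
        self._pending = defaultdict(dict)  # key -> {seq: value} that arrived early

    def deliver(self, update):
        key, seq, value = update
        applied = self.data.get(key, (0, None))[0]
        if seq <= applied:
            return  # duplicate or already-applied update, ignore it
        self._pending[key][seq] = value
        # Apply every buffered update that is next in this record's timeline.
        while applied + 1 in self._pending[key]:
            applied += 1
            self.data[key] = (applied, self._pending[key].pop(applied))

    def read(self, key):
        return self.data.get(key)


master = RecordMaster()
u1 = master.stamp("alice", "status=away")
u2 = master.stamp("alice", "status=online")

near, far = Replica(), Replica()
near.deliver(u1)
near.deliver(u2)
far.deliver(u2)  # arrives out of order, so it is buffered
far.deliver(u1)  # now both updates are applied, in timeline order

assert near.read("alice") == far.read("alice")
```

A replica may lag behind the master, but it never applies two updates to the same record in a different order than another replica does. Nothing is promised across different records.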

I am also interested in some of the applications PNUTS targets:
1. User database: frequent reads and writes; a user must see his own changes, but it's fine if other users read stale data.
2. Social applications: lots of small updates. The ability to sustain high write rates is key, but dissemination of the writes is not essential. A typical relationship table can be maintained with the primary key as (friend1, friend2).
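
As a rough illustration of the (friend1, friend2) key, here is a small in-memory Python sketch. It assumes the table is kept ordered by primary key, so that all friends of one user sit next to each other and can be read with a single prefix scan. FriendTable, put and friends_of are hypothetical names, not part of PNUTS.

```python
import bisect


class FriendTable:
    """Hypothetical ordered table keyed by the composite key (friend1, friend2)."""

    def __init__(self):
        self._keys = []  # sorted list of (friend1, friend2) tuples
        self._rows = {}  # (friend1, friend2) -> dict of attributes

    def put(self, friend1, friend2, **attrs):
        key = (friend1, friend2)
        if key not in self._rows:
            bisect.insort(self._keys, key)  # keep keys in primary-key order
        self._rows[key] = attrs  # each relationship is one small record write

    def friends_of(self, user):
        # (user,) sorts before every (user, x), so this finds the first row for user.
        start = bisect.bisect_left(self._keys, (user,))
        friends = []
        for friend1, friend2 in self._keys[start:]:
            if friend1 != user:
                break
            friends.append(friend2)
        return friends


table = FriendTable()
table.put("alice", "carol", since=2008)
table.put("bob", "alice", since=2007)
table.put("alice", "bob", since=2007)

assert table.friends_of("alice") == ["bob", "carol"]
```

Each new friendship is a single small insert, which fits the high-write-rate workload described above.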