2 years ago, mid-March
A standard roadmap for scaling websites
Posted by pbirnie under technology
Two articles on scaling MySpace and eBay:
http://www.addsimplicity.com.nyud.net:8080/downloads/eBaySDForum2006-11-29.pdf
http://glinden.blogspot.com/2006/12/talk-on-ebay-architecture.html “The parallels with Amazon are remarkable. Like Amazon, eBay started with a two-tiered architecture. Like Amazon, they split the website into a cluster in the late 1990's, followed soon after by partitioning the databases.”
This "Inside MySpace" article gives a simple roadmap of the steps MySpace.com took to scale its website. I have summarized this roadmap as a table and diagram below.
Since the roadmap seems to be standard, any website lucky enough to have this problem can simply define its own scalability roadmap and see whether it can skip some steps along the way, i.e. go straight to an architecture that consists of:
- a system partitioned by user ID
- physical middle-tier servers that wrap the database
- a middle tier that supports caching using something like memcached
- a user login database (containing the hashed password and the database server the user's account is stored on)
- lots of simple frontend servers, with user sessions implemented using cookies and the contents of the URL, i.e. no server-side session database
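The login database and cookie-based sessions from the list above can be sketched in Python. This is a minimal illustration under my own assumptions, not how MySpace implemented it: the in-memory `LoginDatabase` class, the PBKDF2 password hashing, the `user_id|db_server|signature` cookie format and `COOKIE_SECRET` are all hypothetical. The point is that once the cookie is signed, any frontend can recover the user's ID and database server without consulting a session database.

```python
import hashlib
import hmac
import os

# Hypothetical secret shared by all frontends; a real deployment would load it from config.
COOKIE_SECRET = b"example-secret-shared-by-all-frontends"


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 from the standard library; the iteration count is an illustrative choice.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class LoginDatabase:
    """Hypothetical central login database: user ID -> (salt, password hash, database server)."""

    def __init__(self) -> None:
        self._rows = {}

    def add_user(self, user_id: int, password: str, db_server: str) -> None:
        salt = os.urandom(16)
        self._rows[user_id] = (salt, hash_password(password, salt), db_server)

    def authenticate(self, user_id: int, password: str):
        """Return the server holding the user's account if the password matches, else None."""
        row = self._rows.get(user_id)
        if row is None:
            return None
        salt, stored_hash, db_server = row
        if hmac.compare_digest(stored_hash, hash_password(password, salt)):
            return db_server
        return None


def _sign(payload: str) -> str:
    return hmac.new(COOKIE_SECRET, payload.encode(), hashlib.sha256).hexdigest()


def make_session_cookie(user_id: int, db_server: str) -> str:
    # The cookie carries the session state itself, signed so a client cannot alter it.
    payload = f"{user_id}|{db_server}"
    return f"{payload}|{_sign(payload)}"


def read_session_cookie(cookie: str):
    """Any frontend can validate the cookie on its own: no session database lookup."""
    try:
        user_id, db_server, sig = cookie.split("|")
    except ValueError:
        return None  # wrong number of fields
    expected = _sign(f"{user_id}|{db_server}")
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None  # tampered or forged cookie
    return int(user_id), db_server


if __name__ == "__main__":
    login_db = LoginDatabase()
    login_db.add_user(42, "hunter2", "userdb-03")  # hypothetical server name
    server = login_db.authenticate(42, "hunter2")
    cookie = make_session_cookie(42, server)
    print(read_session_cookie(cookie))
```

Signing only prevents tampering; a production cookie would also need an expiry time and should be sent over HTTPS.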
| Step | Scaling change |
|---|---|
| 1 | Simple start: 2 frontends, 1 database |
| 2 | Add frontends |
| 3 | Add database read slaves (1 master for writes) |
| 4 | Vertical partitioning of the database (1 database per feature) |
| 5 | Continue vertical partitioning of the database (1 database per feature) |
| 6 | Add a SAN |
| 7 | Scale-up vs. scale-out decision (bigger machines vs. lots of small machines): choose small machines |
| 8 | Split the database by user: 1 database per 1 million accounts, plus one central login server (UDB) |
| 9 | Language change (ColdFusion to ASP.NET) |
| 10 | SAN bottleneck: move user accounts from disk to disk to distribute disk load |
| 11 | Change to a 3PAR SAN, which acts like a multidisk RAID; similar to the Google File System (GFS) |
| 12 | Add a middle-tier cache |
| 13 | Move to 64-bit, with 64 GB of RAM per machine |
| 14 | Replicate the 3PAR SAN across 3 colos |
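Several steps in the table fit together in the middle tier: read slaves behind one master (step 3), one database per million accounts (step 8) and a middle-tier cache (step 12). The Python sketch below is a hypothetical illustration, not MySpace's code: the server names, the `read_db`/`write_db` callables standing in for a database driver, and the plain dict standing in for memcached are all assumptions.

```python
import itertools
from typing import Callable, Dict, List

ACCOUNTS_PER_DATABASE = 1_000_000  # step 8: one user database per million accounts


class UserPartition:
    """One user database: a master that takes writes plus read slaves (step 3)."""

    def __init__(self, master: str, read_slaves: List[str]) -> None:
        self.master = master
        # Round-robin over the read slaves; fall back to the master if there are none.
        self._readers = itertools.cycle(read_slaves or [master])

    def server_for(self, write: bool) -> str:
        return self.master if write else next(self._readers)


class MiddleTier:
    """Hypothetical middle tier: wraps the partitioned databases and caches reads (step 12)."""

    def __init__(self, partitions: List[UserPartition],
                 read_db: Callable[[str, int], dict],
                 write_db: Callable[[str, int, dict], None]) -> None:
        self.partitions = partitions
        self.read_db = read_db    # hypothetical driver call: (server, user_id) -> row
        self.write_db = write_db  # hypothetical driver call: (server, user_id, row)
        self.cache: Dict[int, dict] = {}  # in-process stand-in for memcached

    def _partition(self, user_id: int) -> UserPartition:
        # Range partitioning: user IDs 0..999,999 live on partition 0, and so on.
        return self.partitions[user_id // ACCOUNTS_PER_DATABASE]

    def get_profile(self, user_id: int) -> dict:
        # Cache-aside: try the cache first, fall back to a read slave on a miss.
        cached = self.cache.get(user_id)
        if cached is not None:
            return cached
        server = self._partition(user_id).server_for(write=False)
        profile = self.read_db(server, user_id)
        self.cache[user_id] = profile
        return profile

    def update_profile(self, user_id: int, profile: dict) -> None:
        # Writes always go to the partition's master; then drop the cached copy.
        server = self._partition(user_id).server_for(write=True)
        self.write_db(server, user_id, profile)
        self.cache.pop(user_id, None)
```

Because reads go to slaves, a read right after a write can still return stale data until replication catches up; invalidating the cache does not hide that lag.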

March 21st, 2008 at 2:46 pm
Wow, great post. Thanks for digging this up and sharing. The patterns and strategies for scalable “enterprise” solutions are not as widely known as would benefit the industry; hopefully these kinds of standard approaches will be published more broadly.