3 years ago, mid-August
text search - scaling in terms of volume and traffic and index upgrades
Posted by pbirnie under technology
the only things you can do with data are:
* make more copies of it (e.g. master-slave replication)
* process it to make more data / summaries (e.g. monthly or weekly rollup tables)
* create different indexes that allow rapid access (e.g. an index on columns a, b, c, or an inverted index on all the words in a document for free text search)
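to make the last bullet concrete, here is a minimal sketch of an inverted index in Python. the three tiny documents and the function name `build_inverted_index` are made up for illustration; a real search engine would also handle tokenising, stemming and ranking.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical sample documents, not taken from any real corpus.
docs = {
    "A": "the dog barked",
    "B": "a cat and a dog",
    "C": "the cat slept",
}

index = build_inverted_index(docs)
dog_hits = sorted(index["dog"])  # ids of the documents containing "dog"
```

looking up a word is then a single dictionary access instead of a scan over every document.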
so for document search - let's say you have 3 distinct documents A, B, C
and let's assume that these 3 documents sit amongst 300 million documents
and given the size of the document data you are storing, you can only fit 100 million documents per machine (so that the index fits in memory and the search time stays short)
so you need 3 index machines (also labelled A, B, C here), each holding 100 million documents, behind a frontend (FE)
FE
A
B
C
the frontend federates - i.e. it asks each of the A, B, C machines for hits on the word "dog", then merges their answers and works out which are the best results to present to the user (based on rank).
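the federation step can be sketched roughly like this. everything here is an assumption for illustration: each machine is faked as a dict from word to a list of (score, doc id) hits, and `search_shard` / `federated_search` are hypothetical names. a real frontend would query the machines over the network and in parallel.

```python
import heapq


def search_shard(shard, word):
    """Return the (score, doc_id) hits one machine has for a word."""
    return shard.get(word, [])


def federated_search(shards, word, k=3):
    """Ask every machine for hits, then keep the k best by score."""
    all_hits = []
    for shard in shards:
        all_hits.extend(search_shard(shard, word))
    return heapq.nlargest(k, all_hits)


# Hypothetical per-machine indexes with made-up scores.
machine_a = {"dog": [(0.9, "doc1"), (0.2, "doc2")]}
machine_b = {"dog": [(0.7, "doc3")]}
machine_c = {"cat": [(0.8, "doc4")]}

top = federated_search([machine_a, machine_b, machine_c], "dog", k=2)
```

note that the merge only works if scores from different machines are comparable, which is something the ranking scheme has to guarantee.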
if you want to add another 100 million documents - you can just add another row - so you have
A
B
C
D
ie. you can grow with data volume - by adding rows
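the row arithmetic is just a ceiling division. a small sketch, assuming the 100 million documents per machine figure from above and (my assumption, not stated in the post) that documents fill rows in arrival order, so new documents land on the newest row:

```python
import string

DOCS_PER_MACHINE = 100_000_000  # capacity figure used in this post


def rows_needed(total_docs, per_machine=DOCS_PER_MACHINE):
    """Smallest number of rows that can hold total_docs (ceiling division)."""
    return (total_docs + per_machine - 1) // per_machine


def row_for_doc(seq_no, per_machine=DOCS_PER_MACHINE):
    """Assumed placement rule: the n-th document (0-based) goes to row n // capacity."""
    return string.ascii_uppercase[seq_no // per_machine]
```

with this rule, growing from 300 million to 400 million documents does not move any existing document: the extra 100 million all go to the new row D.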
but if you want to handle more traffic (i.e. keep response latency low as query volume grows) - you can add columns, each one a full copy of every row, and put round robin load balancing on the front.
i.e.
AA AA
BB BB
CC CC
DD DD
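round robin over columns can be sketched as below. the class name `RoundRobinFrontend` and the machine names like "A1" are hypothetical; the point is only that each query goes to one whole column, and successive queries alternate between columns.

```python
import itertools


class RoundRobinFrontend:
    """Hand each incoming query to the next column in turn."""

    def __init__(self, columns):
        self.columns = columns
        self._cycle = itertools.cycle(columns)

    def route(self, query):
        """Return the column (list of machines) that should serve this query."""
        return next(self._cycle)


# Two columns, each a full copy of rows A to D (names are made up).
columns = [
    ["A1", "B1", "C1", "D1"],
    ["A2", "B2", "C2", "D2"],
]
frontend = RoundRobinFrontend(columns)
picks = [frontend.route("dog")[0] for _ in range(4)]
```

each column sees roughly half the queries, so every machine has less work queued up per query.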
now let's say you want to run active-active in 2 colos (2 columns in a west coast colo and 2 columns in an east coast colo). you could upgrade the index on one column while directing traffic to the other column in the same colo - or, if you are using Amazon EC2, you could simply rent a new column of servers, build the search index on them, and discard the old column when you are ready.
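a rough sketch of the first option, upgrading one column at a time while the other keeps serving. the `Colo` class, the column names and the integer index versions are all assumptions for illustration; the actual index rebuild is reduced to a single assignment.

```python
class Colo:
    """Tracks which columns are serving traffic and their index version."""

    def __init__(self, columns, version):
        self.version = {name: version for name in columns}
        self.serving = set(columns)

    def upgrade_column(self, name, new_version):
        """Drain one column, rebuild its index, then put it back in rotation.

        Returns how many columns were still serving during the rebuild.
        """
        if self.serving == {name}:
            raise RuntimeError("refusing to drain the last serving column")
        self.serving.discard(name)        # stop sending traffic here
        still_serving = len(self.serving)
        self.version[name] = new_version  # stand-in for the index rebuild
        self.serving.add(name)            # back into rotation
        return still_serving


def rolling_upgrade(colo, new_version):
    """Upgrade every column in turn; return the lowest serving count seen."""
    return min(colo.upgrade_column(name, new_version)
               for name in sorted(colo.version))


west = Colo(["west-1", "west-2"], version=1)
lowest = rolling_upgrade(west, new_version=2)
```

with 2 columns per colo, one column is always serving during the upgrade, at the cost of running on half the capacity while it lasts.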
