the only things you can do with data are:

* make more copies of it (e.g. master-slave replication)
* process the data to make more data or summaries (e.g. monthly or weekly rollup tables)
* create different indexes that allow rapid access (e.g. an index on columns a, b, c, or a reverse (inverted) index on all the words in a document for free-text search)
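
The reverse index for free text search, more commonly called an inverted index, maps each word to the documents that contain it. A minimal sketch, assuming documents are plain strings keyed by id and that splitting on whitespace is good enough tokenisation (the function name and sample texts are made up for illustration):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

# Hypothetical sample documents.
docs = {
    "A": "the dog barked",
    "B": "a cat slept",
    "C": "dog and cat played",
}
index = build_inverted_index(docs)
print(sorted(index["dog"]))  # ['A', 'C']
```

A lookup is then a single dictionary access instead of a scan over every document.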

so for document search - let's say you have 3 distinct documents A, B, C
and let's assume that these 3 documents sit amongst 300 million documents
and, given the size of the document data you are storing, you can only fit 100 million documents per machine (so that the index fits in memory and the search time stays short)

so you have 3 machines (A, B, C) behind a frontend (FE)

FE

A
B
C

and you have a frontend that federates - i.e. it asks each of the A, B, C machines for hits on the word "dog" and then works out which are the best results to present to the user (based on rank).
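
The federating frontend can be sketched as a scatter-gather: ask every machine for its top hits, then merge by rank. This is only an illustration, assuming each machine returns `(score, doc_id)` pairs and that scores are comparable across machines; the `Shard` class, the scores, and the document ids are hypothetical.

```python
import heapq

class Shard:
    """One machine holding part of the index (hypothetical in-memory stand-in)."""
    def __init__(self, name, scores_by_word):
        self.name = name
        self.scores_by_word = scores_by_word  # word -> {doc_id: score}

    def search(self, word, k):
        # Return this shard's k best hits as (score, doc_id) pairs.
        hits = self.scores_by_word.get(word, {})
        return heapq.nlargest(k, ((score, doc_id) for doc_id, score in hits.items()))

def federated_search(shards, word, k=3):
    """Ask every shard for its top k, then keep the overall top k by score."""
    all_hits = []
    for shard in shards:
        all_hits.extend(shard.search(word, k))
    return [doc_id for score, doc_id in heapq.nlargest(k, all_hits)]

shards = [
    Shard("A", {"dog": {"a1": 0.9, "a2": 0.2}}),
    Shard("B", {"dog": {"b1": 0.7}}),
    Shard("C", {"cat": {"c1": 0.8}}),
]
print(federated_search(shards, "dog", k=2))  # ['a1', 'b1']
```

In a real deployment the loop over shards would be done in parallel, so the response time is set by the slowest machine rather than the sum of all of them.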

if you want to add another 100 million documents - you can just add another row - so you have

A
B
C
D

i.e. you can grow with data volume by adding rows
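
The row count is just a ceiling division of total documents by per-machine capacity. A small sketch using the 100 million figure from above (the function name is made up for illustration):

```python
DOCS_PER_MACHINE = 100_000_000  # capacity assumed in the example above

def rows_needed(total_docs, docs_per_machine=DOCS_PER_MACHINE):
    """Smallest number of rows that can hold total_docs (integer ceiling division)."""
    return (total_docs + docs_per_machine - 1) // docs_per_machine

print(rows_needed(300_000_000))  # 3
print(rows_needed(400_000_000))  # 4
```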

but if you want to improve performance (i.e. reduce response latency under load, or handle more queries per second) - you can add columns, where each column is a full copy of all the rows, and put round-robin load balancing on the front.
i.e.

AA AA
BB BB
CC CC
DD DD
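
Round-robin over columns can be sketched as below, assuming each column is a complete set of rows so any column can answer any query. The class name and the machine labels like `A1` are hypothetical.

```python
import itertools

class RoundRobinBalancer:
    """Hands out columns in turn; each column is a full copy of rows A..D."""
    def __init__(self, columns):
        self._cycle = itertools.cycle(columns)

    def next_column(self):
        return next(self._cycle)

columns = [
    ["A1", "B1", "C1", "D1"],  # column 1
    ["A2", "B2", "C2", "D2"],  # column 2
]
balancer = RoundRobinBalancer(columns)
# Record which copy of row A each of four successive queries would hit.
picked = [balancer.next_column()[0] for _ in range(4)]
print(picked)  # ['A1', 'A2', 'A1', 'A2']
```

Each query is still federated across every row, but only within the one column it was routed to.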

now let's say you want to run active-active in 2 colos (2 columns in a west coast colo and 2 columns in an east coast colo) - you could upgrade the index on one column while directing the traffic to the other column in the same colo - or, if you are using Amazon EC2, you could simply rent a new column of servers, create the search index on them, and discard the old column when you are ready
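
The "rent a new column, build the index, discard the old one" approach might look like this in outline. It is a sketch only: `ColumnPool`, `replace_column`, and the column names are hypothetical, and the real work of provisioning servers and building the index is reduced to a callback.

```python
class ColumnPool:
    """Tracks which columns are currently taking traffic (hypothetical helper)."""
    def __init__(self, columns):
        self.active = list(columns)

    def replace_column(self, old, build_new):
        # Build the new column first; the old one keeps serving meanwhile.
        new = build_new()
        # Put the new column into rotation before retiring the old one,
        # so capacity never drops during the swap.
        self.active.append(new)
        self.active.remove(old)
        return new

pool = ColumnPool(["col-v1-a", "col-v1-b"])
pool.replace_column("col-v1-a", build_new=lambda: "col-v2-a")
print(pool.active)  # ['col-v1-b', 'col-v2-a']
```

Upgrading in place instead would remove the column from `active` first, rebuild it, then add it back, at the cost of running on one column fewer during the upgrade.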