All posts by Tomasz Lelek

Splitting data that does not fit on one machine using data partitioning

The following article is an excerpt from the book Software Mistakes and Tradeoffs. In real-world big data applications, the amount of data we need to store and process can often reach hundreds of terabytes or even petabytes. Storing that much data on a single physical node is not feasible, so we need a way to split it across N data nodes.
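To make "split it across N data nodes" concrete, here is a minimal sketch of one common approach, hash partitioning. It is not taken from the book. It assumes every record has a string key, and it uses MD5 only as a stable, evenly spreading hash; the function names `partition_for` and `split_across_nodes` are hypothetical. Each key is hashed and the result taken modulo N, so the same key always lands on the same node and keys spread roughly evenly.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, num_nodes: int) -> int:
    """Map a record key to a node index in [0, num_nodes).

    Hypothetical helper: uses MD5 as a stable hash (Python's built-in
    hash() is randomized per process for strings, so it is unsuitable).
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer,
    # then reduce it modulo the number of nodes.
    return int.from_bytes(digest[:8], "big") % num_nodes


def split_across_nodes(keys, num_nodes: int) -> dict:
    """Group keys by the node index they are assigned to (hypothetical helper)."""
    nodes = defaultdict(list)
    for key in keys:
        nodes[partition_for(key, num_nodes)].append(key)
    return dict(nodes)


if __name__ == "__main__":
    # Assumed example data: 10,000 user keys split across 4 nodes.
    keys = [f"user-{i}" for i in range(10_000)]
    assignment = split_across_nodes(keys, num_nodes=4)
    for node, node_keys in sorted(assignment.items()):
        print(f"node {node}: {len(node_keys)} keys")
```

A design caveat of this simple hash-modulo scheme is that changing N remaps most keys to different nodes, which is why distributed databases often use consistent hashing instead.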

Tomasz Lelek

Tomasz is a former Allegro engineer. He currently works at DataStax, building products around Cassandra, one of the world’s favourite distributed databases. He is the author of “Software Mistakes and Tradeoffs: Making good programming decisions”.