10 Aug 2021
Splitting data that does not fit on one machine using data partitioning
The following article is an excerpt from the book Software Mistakes and Trade-offs. In real-world big data applications, the amount of data that we need to store and process is often measured in hundreds of terabytes or even petabytes. It is not feasible to store that much data on one physical node, so we need a way to split it across N data nodes.
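To make the idea of splitting data across N nodes concrete, here is a minimal sketch of one common approach, hash-based partitioning. It is only an illustration under stated assumptions and not necessarily the scheme the book develops: the function names `node_for_key` and `partition` are hypothetical, records are assumed to be identified by a string key, and nodes are assumed to be numbered 0 to N-1.

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a record key to a node index in [0, num_nodes) (hypothetical helper)."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    # Use a stable digest instead of the built-in hash(), which is salted
    # per process for strings and would give different results across runs.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(keys, num_nodes: int):
    """Group keys into num_nodes buckets, one bucket per data node."""
    buckets = [[] for _ in range(num_nodes)]
    for key in keys:
        buckets[node_for_key(key, num_nodes)].append(key)
    return buckets


if __name__ == "__main__":
    user_ids = [f"user-{i}" for i in range(10)]
    for node, bucket in enumerate(partition(user_ids, 3)):
        print(node, bucket)
```

Every key lands on exactly one node, and the same key always maps to the same node as long as N stays fixed. The trade-off with this simple modulo scheme is that changing N remaps most keys, which is one reason real systems often choose other partitioning strategies.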