blog.allegro.tech

Jun 28 2021

One task — two solutions: Apache Spark or Apache Beam?

Some time ago, our team faced the task of moving an existing Apache Spark job from an on-premise Hadoop cluster to a public cloud. While working on the transition, we came across another way to process data: Apache Beam. We were curious whether this tool offered any advantages over the more traditional Apache Spark, and we wanted to find the answer relatively quickly and with minimal effort. Hence, we built two projects that process the same data, one with each technology. Below you can get to know the architecture of the jobs written in Apache Spark and Apache Beam.

Yevgeniya Li
