Articles tagged with
bigdata
19 Dec 2024
Many companies face the challenge of efficiently processing large datasets for analytics.
Using an operational database for such purposes can lead to performance issues or, in extreme cases, system failures.
This highlights the need to transfer data from operational databases to data warehouses.
This approach allows heavy analytical queries to run without overburdening transactional systems, and it supports shorter data retention periods in production databases.
10 Aug 2021
The following article is an excerpt from the book Software Mistakes and Trade-offs.
In real-world big data applications, the amount of data that we need to store and process can often be measured in hundreds of terabytes or even petabytes. It is not feasible to store such an amount of data on a single physical node. We need a way to split that data across N data nodes.
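One common way to split data across N nodes is hash-based partitioning: a stable hash of each record's key determines which node owns it. Below is a minimal sketch of that idea; the helper names (`node_for_key`, `partition`) are illustrative and not taken from the book.

```python
# Hash-based partitioning: map each record's key to one of N data nodes.
# Illustrative sketch; function names are hypothetical.
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Return the index of the node responsible for this key."""
    # A stable hash (MD5 here) keeps the assignment consistent across
    # processes and restarts, unlike Python's randomized built-in hash().
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes


def partition(records, num_nodes: int):
    """Group (key, value) records into num_nodes buckets by key."""
    buckets = [[] for _ in range(num_nodes)]
    for key, value in records:
        buckets[node_for_key(key, num_nodes)].append((key, value))
    return buckets


records = [("user-1", 10), ("user-2", 20), ("user-1", 5)]
buckets = partition(records, num_nodes=4)
# All records sharing a key land in the same bucket, so per-key
# processing can happen locally on one node.
```

Because the hash is deterministic, records with the same key always end up on the same node, which is what makes per-key aggregations possible without cross-node shuffles.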
28 Jun 2021
Some time ago, our team faced the task of moving an existing Apache Spark job from an on-premise Hadoop cluster to a public cloud.
While working on the transition we came across another way of processing data: Apache Beam. We were curious whether this tool offered
any advantages over traditional Apache Spark, and we wanted to find the answer relatively quickly and with minimal effort. Hence, we built two projects that
process the same data using these technologies. Below you can get to know the architecture of the jobs written in Apache Spark and Apache Beam.
07 Dec 2020
Marketing is a very important department in every company. In the case of Allegro,
marketing is especially challenging because there are so many products to promote.
In this post we tell the story of a platform we built for marketing
purposes.
24 Jun 2015
At Allegro we use many open-source tools that support our work.
Sometimes we cannot find the tool we need, and that is
the perfect moment to fill the gap ourselves and
share the result with the community. We are proud to announce
Camus Compressor — a tool
that merges files created by Camus
on HDFS and saves
them in a compressed format.