Articles tagged with
big data
20 Jun 2024
In this article we present methods for efficiently optimizing physical resources and fine-tuning the configuration of a Google Cloud Platform (GCP)
Dataflow pipeline in order to reduce costs.
The optimization is presented as a real-life scenario, carried out in stages.
04 Nov 2020
We are excited to announce that we have just released BigFlow 1.0 as open source.
It’s a Python framework for big data processing on the Google Cloud Platform.
07 Jan 2020
When designing the architecture of a system, one always needs to think about what can go wrong and
what kind of failures can occur in the system. This kind of problem analysis is especially hard in distributed systems.
Failure is inevitable and the best we can do is to prepare for it.
22 Oct 2018
Two years ago at Allegro we had a very typical Big Data technology stack. The architecture was based
on a Hadoop cluster, which we queried with plain Hive,
Spark jobs and Jupyter notebooks. Over the last two
years we have transformed it into a more efficient and easier-to-use
OLAP platform.
29 Jun 2017
I bet you found this article after googling some of the issues you encounter when working with a Hadoop cluster.
You probably deal with Hive queries for exploratory data analysis that take far too long to process. Moreover, you
cannot adopt Spark in your organization for every use case, because writing jobs requires quite strong
programming skills. Clogged YARN queues might be your nightmare, and waiting for a container to launch when you run
even a small query drives you mad. Before we deployed Presto, a fast SQL engine from Facebook, our
analysts struggled with these problems on a regular basis.
26 Jan 2017
Caching is a well-known technique used to increase application performance and decrease overall system load.
Usually small or medium data sets, which are read often and changed rarely, are considered good candidates for
caching. In this article we focus on determining the optimal cache size using big data techniques.
17 Dec 2014
Big Data Spain is an annual conference on Big Data and related topics held in the
suburbs of Madrid. This year’s edition, the third, has been the biggest so far; it attracted more than 500 guests
and various speakers, including Big Data celebrities like Paco Nathan of Databricks. During the two days of the conference,
guests could attend many keynotes, talks and workshops and learn about various products, services and specific
use cases, in both English and Spanish. Allegro was represented by two employees with a presentation on Hadoop pitfalls
and gotchas.
05 Nov 2014
This year’s edition of Strata Hadoop World, held in New York, was humongous: 16
workshops, over 20 keynotes, over 130 talks and, most importantly, over 5000 attendees! This massive crowd wouldn’t fit in
the Hilton hotel where the previous edition was held, which is why the organizers had to move the conference to the Javits Convention
Center, an enormous building in which Big Data believers occupied just one sector. The fact that the European edition of
Hadoop Summit went through exactly the same transition (its third edition is going to be held in a bigger venue in
Brussels) gives pleasant assurance that Big Data technologies are still a hot topic and that the Big Data community is growing
at a steady pace.