Sep 10 2024

Automating Code Migrations at Scale

At Allegro, we continuously improve our development processes to maintain high code quality and efficiency standards. One of the significant challenges we encounter is managing code migrations at scale, especially with breaking changes in our internal libraries or workflows. Manual code migration is a severe burden, with over 2000 services (and their repositories). We need to introduce some kind of code migration management.




Aug 5 2024

Migrating Selenium to Playwright in Java - evolution, not revolution

Are you, as a test automation engineer, tired of Selenium’s flakiness? Are you seeking a better tool to automate your end-to-end tests? Have you heard of Playwright? Perhaps you’ve encountered opinions that it is only worth using within a Node.js environment. I have. And as a tester, I decided to verify if this is true. If you’re interested in the results, I encourage you to read the following article.




Jul 1 2024

INP — what is the new Core Web Vitals metric and how do we work with it at Allegro.

Site performance is very important, first of all, from the perspective of users, who expect a good experience when visiting the site. The user should not wait too long for the page to load. We all know how annoying it can be when we want to press an element and it jumps to another place on the page or when we click on a button and then nothing happens for a very long time. The state of a site’s performance in these aspects is measured by Web Vitals performance metrics and most importantly by a set of three major Core Web Vitals metrics (LCP — Largest Contentful Paint, CLS — Cumulative Layout Shift, INP — Interaction to Next Paint). They are responsible for measuring the 3 things: loading time, visual stability and interactivity. These metrics are also important for the websites themselves because, in addition to the user experience, they are also taken into account in terms of the website’s positioning in search engines (SEO), which is crucial for most websites on the Internet, Allegro included.




Jun 4 2024

REST service client: design, testing, monitoring

The purpose of this article is to present how to design, test, and monitor a REST service client. The article includes a repository with clients written in Kotlin using various technologies such as WebClient, RestClient, Ktor Client, Retrofit. It demonstrates how to send and retrieve data from an external service, add a cache layer, and parse the received response into domain objects.


May 16 2024

Unveiling bottlenecks of couchbase sub-documents operations

This story shows our journey in addressing a platform stability issue related to autoscaling, which, paradoxically, added some additional overhead instead of reducing the load. A pivotal part of this narrative is how we used Couchbase — a distributed NoSQL database. If you find yourself intrigued by another enigmatic story involving Couchbase, don’t miss my blog post on tuning expired doc settings.


Apr 12 2024

Ten Years and Counting: My Affair with Microservices

In early 2024, I hit ten years at Allegro, which also happens to be how long I’ve been working with microservices. This timespan also roughly corresponds to how long the company as a whole has been using them, so I think it’s a good time to outline the story of project Rubicon: a very ambitious gamble which completely changed how we work and what our software is like. The idea probably seemed rather extreme at the time, yet I am certain that without this change, Allegro would not be where it is today, or perhaps would not be there at all.


Mar 6 2024

Unlocking Kafka's Potential: Tackling Tail Latency with eBPF

At Allegro, we use Kafka as a backbone for asynchronous communication between microservices. With up to 300k messages published and 1M messages consumed every second, it is a key part of our infrastructure. A few months ago, in our main Kafka cluster, we noticed the following discrepancy: while median response times for produce requests were in single-digit milliseconds, the tail latency was much worse. Namely, the p99 latency was up to 1 second, and the p999 latency was up to 3 seconds. This was unacceptable for a new project that we were about to start, so we decided to look into this issue. In this blog post, we would like to describe our journey — how we used Kafka protocol sniffing and eBPF to identify and remove the performance bottleneck.



Feb 12 2024

Don’t bother: it is only a little expired

This story shows how we strive to fix issues reported by our customers regarding inconsistent listing views on our e-commerce platform. We will use a top-down manner to guide you through our story. At the beginning, we highlight the challenges faced by our customers, followed by presenting basic information on how views are personalized on our web application. We then delve deeper into our internal architecture, aiming to clarify how it supports High Availability (HA) by using two data centers. Finally, we advertise a little Couchbase, distributed NoSQL database, and explain why it is an excellent storage solution for such an architecture.




Dec 14 2023

Clever, surprised and gray-haired

This article is a form of a public postmortem in which we would like to share our bumpy way of revealing the cause of a mysterious performance problem. Besides unveiling part of our technical stack based on open-source solutions, we also show how some false assumptions made such a bug triage process much harder. Besides all NOT TO DOs, you can find some exciting information about performance hunting and reproducing performance issues on a small scale. As a perk, we prepared a repository where you can reproduce the problem and make yourself familiar with tools that allowed us to confirm the cause. The last part (lessons learned) is the most valuable if you prefer to learn from the mistakes of others.


Nov 27 2023

How does B-tree make your queries fast?

B-tree is a structure that helps to search through great amounts of data. It was invented over 40 years ago, yet it is still employed by the majority of modern databases. Although there are newer index structures, like LSM trees, B-tree is unbeaten when handling most of the database queries.


Oct 30 2023

Beyond the Code - An Engineer’s Battle Against Knowledge Loss

The idea for this article arose during a meeting where we learned that our supervisor would be leaving the company to pursue new opportunities. In response, a colleague lamented that what we would miss most is the knowledge departing with the leader. Unfortunately, that’s how it goes. Not only do we lose a colleague, but we also lose valuable knowledge and experience. However, this isn’t a story about my supervisor; it’s a story about all those individuals who are experts in their fields, who understand the paths to success and paths that lead to catastrophic failures. When they leave, they take with them knowledge that you won’t find in any book, note, or Jira ticket. And this leads to a fundamental question: What can be done to avoid this “black hole” of knowledge? How can we ensure it doesn’t vanish along with them? That’s what this article is all about.


Sep 14 2023

Online MongoDB migration

MongoDB is the most popular database used at Allegro. We have hundreds of MongoDB databases running on our on—premise servers. In 2022 we decided that we need to migrate all our MongoDB databases from existing shared clusters to new MongoDB clusters hosted on Kubernetes pods with separated resources. To perform the migration of all databases we needed a tool for transfering all the data and keeping consistency between old and new databases. That’s how mongo-migration-stream project was born.




May 31 2023

Debugging hangs - piecing together why nothing happens

As a part of a broader initiative of refreshing Allegro platform, we are upgrading our internal libraries to Spring Boot 3.0 and Java 17. The task is daunting and filled with challenges, however overall progress is steady and thanks to the modular nature of our code it should end in finite time. Everyone who has performed such an upgrade knows that you need to expect the unexpected and at the end of the day prepare for lots of debugging. No amount of migration guide would prepare you for what’s coming in the field. In the words of Donald Rumsfeld there are unknown unknowns and we need to be equipped with the tools to uncover these unknowns and patch them up. In this blog post I’d like to walk you through a process that should show where the application hangs, although there seems to be nothing wrong with it. I will also show that you don’t always know what code you have – problem known as dependecy hell, place we got quite cosy in during this upgrade.


Apr 18 2023

Trust no one, not even your training data! Machine learning from noisy data

Label noise is ever-present in machine learning practice. Allegro datasets are no exception. We compared 7 methods for training classifiers robust to label noise. All of them improved the model’s performance on noisy datasets. Some of the methods decreased the model’s performance in the absence of label noise.

Showing 1 of 10 Pages Next