Jul 1 2024

INP — what is the new Core Web Vitals metric and how do we work with it at Allegro.

Site performance is very important, first of all, from the perspective of users, who expect a good experience when visiting the site. The user should not wait too long for the page to load. We all know how annoying it can be when we want to press an element and it jumps to another place on the page or when we click on a button and then nothing happens for a very long time. The state of a site’s performance in these aspects is measured by Web Vitals performance metrics and most importantly by a set of three major Core Web Vitals metrics (LCP — Largest Contentful Paint, CLS — Cumulative Layout Shift, INP — Interaction to Next Paint). They are responsible for measuring the 3 things: loading time, visual stability and interactivity. These metrics are also important for the websites themselves because, in addition to the user experience, they are also taken into account in terms of the website’s positioning in search engines (SEO), which is crucial for most websites on the Internet, Allegro included.

Jun 4 2024

REST service client: design, testing, monitoring

The purpose of this article is to present how to design, test, and monitor a REST service client. The article includes a repository with clients written in Kotlin using various technologies such as WebClient, RestClient, Ktor Client, Retrofit. It demonstrates how to send and retrieve data from an external service, add a cache layer, and parse the received response into domain objects.

May 16 2024

Unveiling bottlenecks of couchbase sub-documents operations

This story shows our journey in addressing a platform stability issue related to autoscaling, which, paradoxically, added some additional overhead instead of reducing the load. A pivotal part of this narrative is how we used Couchbase — a distributed NoSQL database. If you find yourself intrigued by another enigmatic story involving Couchbase, don’t miss my blog post on tuning expired doc settings.

Apr 12 2024

Ten Years and Counting: My Affair with Microservices

In early 2024, I hit ten years at Allegro, which also happens to be how long I’ve been working with microservices. This timespan also roughly corresponds to how long the company as a whole has been using them, so I think it’s a good time to outline the story of project Rubicon: a very ambitious gamble which completely changed how we work and what our software is like. The idea probably seemed rather extreme at the time, yet I am certain that without this change, Allegro would not be where it is today, or perhaps would not be there at all.

Mar 6 2024

Unlocking Kafka's Potential: Tackling Tail Latency with eBPF

At Allegro, we use Kafka as a backbone for asynchronous communication between microservices. With up to 300k messages published and 1M messages consumed every second, it is a key part of our infrastructure. A few months ago, in our main Kafka cluster, we noticed the following discrepancy: while median response times for produce requests were in single-digit milliseconds, the tail latency was much worse. Namely, the p99 latency was up to 1 second, and the p999 latency was up to 3 seconds. This was unacceptable for a new project that we were about to start, so we decided to look into this issue. In this blog post, we would like to describe our journey — how we used Kafka protocol sniffing and eBPF to identify and remove the performance bottleneck.

Feb 12 2024

Don’t bother: it is only a little expired

This story shows how we strive to fix issues reported by our customers regarding inconsistent listing views on our e-commerce platform. We will use a top-down manner to guide you through our story. At the beginning, we highlight the challenges faced by our customers, followed by presenting basic information on how views are personalized on our web application. We then delve deeper into our internal architecture, aiming to clarify how it supports High Availability (HA) by using two data centers. Finally, we advertise a little Couchbase, distributed NoSQL database, and explain why it is an excellent storage solution for such an architecture.

Dec 14 2023

Clever, surprised and gray-haired

This article is a form of a public postmortem in which we would like to share our bumpy way of revealing the cause of a mysterious performance problem. Besides unveiling part of our technical stack based on open-source solutions, we also show how some false assumptions made such a bug triage process much harder. Besides all NOT TO DOs, you can find some exciting information about performance hunting and reproducing performance issues on a small scale. As a perk, we prepared a repository where you can reproduce the problem and make yourself familiar with tools that allowed us to confirm the cause. The last part (lessons learned) is the most valuable if you prefer to learn from the mistakes of others.

Nov 27 2023

How does B-tree make your queries fast?

B-tree is a structure that helps to search through great amounts of data. It was invented over 40 years ago, yet it is still employed by the majority of modern databases. Although there are newer index structures, like LSM trees, B-tree is unbeaten when handling most of the database queries.

Oct 30 2023

Beyond the Code - An Engineer’s Battle Against Knowledge Loss

The idea for this article arose during a meeting where we learned that our supervisor would be leaving the company to pursue new opportunities. In response, a colleague lamented that what we would miss most is the knowledge departing with the leader. Unfortunately, that’s how it goes. Not only do we lose a colleague, but we also lose valuable knowledge and experience. However, this isn’t a story about my supervisor; it’s a story about all those individuals who are experts in their fields, who understand the paths to success and paths that lead to catastrophic failures. When they leave, they take with them knowledge that you won’t find in any book, note, or Jira ticket. And this leads to a fundamental question: What can be done to avoid this “black hole” of knowledge? How can we ensure it doesn’t vanish along with them? That’s what this article is all about.

Sep 14 2023

Online MongoDB migration

MongoDB is the most popular database used at Allegro. We have hundreds of MongoDB databases running on our on—premise servers. In 2022 we decided that we need to migrate all our MongoDB databases from existing shared clusters to new MongoDB clusters hosted on Kubernetes pods with separated resources. To perform the migration of all databases we needed a tool for transfering all the data and keeping consistency between old and new databases. That’s how mongo-migration-stream project was born.

May 31 2023

Debugging hangs - piecing together why nothing happens

As a part of a broader initiative of refreshing Allegro platform, we are upgrading our internal libraries to Spring Boot 3.0 and Java 17. The task is daunting and filled with challenges, however overall progress is steady and thanks to the modular nature of our code it should end in finite time. Everyone who has performed such an upgrade knows that you need to expect the unexpected and at the end of the day prepare for lots of debugging. No amount of migration guide would prepare you for what’s coming in the field. In the words of Donald Rumsfeld there are unknown unknowns and we need to be equipped with the tools to uncover these unknowns and patch them up. In this blog post I’d like to walk you through a process that should show where the application hangs, although there seems to be nothing wrong with it. I will also show that you don’t always know what code you have – problem known as dependecy hell, place we got quite cosy in during this upgrade.

Apr 18 2023

Trust no one, not even your training data! Machine learning from noisy data

Label noise is ever-present in machine learning practice. Allegro datasets are no exception. We compared 7 methods for training classifiers robust to label noise. All of them improved the model’s performance on noisy datasets. Some of the methods decreased the model’s performance in the absence of label noise.

Apr 5 2023

Dynamic Workload Balancing in Hermes

Hermes is a distributed publish-subscribe message broker that we use at Allegro to facilitate asynchronous communication between our microservices. As our usage of Hermes has grown over time, we faced a challenge in effectively distributing the load it handles to optimize resource utilization. In this blog post, we will present the implementation of a dynamic workload balancing algorithm that we developed to address this challenge. We will describe the approach we took, the lessons we learned along the way, and the results we achieved.

Mar 21 2023

How neuroscience can help you as a software engineer - motivation

Many of us, software engineers, have experienced those days when nothing really sparks joy in coding, debugging, preparing spikes or refining tasks for the next sprints. Obviously, we would like to have as few of such days as possible and go on with our work effectively. A solution to this definitely is not tormenting our brains with guilt and forced labour. There are other ways, and I would like to invite you to explore them with me and learn a little about our nervous systems in the process. We’ll find out where the motivation comes from on a biological and psychological level. We’ll also take a look at the changes you can introduce into your day to take advantage of certain mechanisms working on a neural level and boost your motivation and productivity.

Feb 13 2023

Onion Architecture

Software Architecture is an elusive thing which, if neglected, can lead to a hard-to-develop and maintain codebase, and in more drastic circumstances to the failure of a product. This article discusses one of the backend application architecture styles which proved to be successful in providing a good foundation for building and maintaining an application in the long run: Onion Architecture.

Showing 1 of 9 Pages Next