Engineering culture of Allegro & Allegro Pay: Pragmatic Engineer Score
One tech blog/newsletter has been gaining traction and popularity for a couple of years now: Pragmatic Engineer.
The purpose of this article is to present how to design, test, and monitor a REST service client. The article includes a repository with clients written in Kotlin using various technologies such as WebClient, RestClient, Ktor Client, and Retrofit. It demonstrates how to send data to and retrieve data from an external service, add a cache layer, and parse the received response into domain objects.
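As a taste of what the repository covers, below is a minimal sketch of one such client built with Ktor Client; the endpoint URL and the Offer data class are made-up placeholders, and a production client would also add timeouts, retries, and a cache layer.

```kotlin
import io.ktor.client.*
import io.ktor.client.call.*
import io.ktor.client.engine.cio.*
import io.ktor.client.plugins.contentnegotiation.*
import io.ktor.client.request.*
import io.ktor.serialization.kotlinx.json.*
import kotlinx.serialization.Serializable

// Hypothetical domain object the JSON response is parsed into.
@Serializable
data class Offer(val id: String, val title: String, val price: Double)

suspend fun fetchOffer(id: String): Offer {
    val client = HttpClient(CIO) {
        install(ContentNegotiation) { json() } // parse JSON into domain objects
    }
    // Placeholder URL; real clients would also configure timeouts, retries and caching.
    return client.use { it.get("https://example.com/offers/$id").body() }
}
```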
This story shows our journey in addressing a platform stability issue related to autoscaling, which, paradoxically, added some additional overhead instead of reducing the load. A pivotal part of this narrative is how we used Couchbase — a distributed NoSQL database. If you find yourself intrigued by another enigmatic story involving Couchbase, don’t miss my blog post on tuning expired doc settings.
In early 2024, I hit ten years at Allegro, which also happens to be how long I’ve been working with microservices. This timespan also roughly corresponds to how long the company as a whole has been using them, so I think it’s a good time to outline the story of project Rubicon: a very ambitious gamble which completely changed how we work and what our software is like. The idea probably seemed rather extreme at the time, yet I am certain that without this change, Allegro would not be where it is today, or perhaps would not be there at all.
At Allegro, we use Kafka as a backbone for asynchronous communication between microservices. With up to 300k messages published and 1M messages consumed every second, it is a key part of our infrastructure. A few months ago, in our main Kafka cluster, we noticed the following discrepancy: while median response times for produce requests were in single-digit milliseconds, the tail latency was much worse. Namely, the p99 latency was up to 1 second, and the p999 latency was up to 3 seconds. This was unacceptable for a new project that we were about to start, so we decided to look into this issue. In this blog post, we would like to describe our journey — how we used Kafka protocol sniffing and eBPF to identify and remove the performance bottleneck.
Have you ever thought about ways of reducing repetitive, monotonous tasks? Maybe you would like to try to automate your own tasks? I will show you what technology we use at Allegro, what processes we have automated, and how to do it on your own.
This story shows how we strive to fix issues reported by our customers regarding inconsistent listing views on our e-commerce platform. We will guide you through the story in a top-down manner. At the beginning, we highlight the challenges faced by our customers, followed by basic information on how views are personalized in our web application. We then delve deeper into our internal architecture, aiming to clarify how it supports High Availability (HA) by using two data centers. Finally, we say a few words in praise of Couchbase, a distributed NoSQL database, and explain why it is an excellent storage solution for such an architecture.
Ready to turn web accessibility from a headache into a breeze? Join us as we demystify WCAG, explore its latest 2.2 version, and gaze into the future of digital inclusivity. Get ready for a journey that’s as enlightening as it is entertaining!
Icons are an integral part of most modern UIs. What is the best way to embed icons nowadays?
This article is a form of a public postmortem in which we would like to share our bumpy way of revealing the cause of a mysterious performance problem. Besides unveiling part of our technical stack based on open-source solutions, we also show how some false assumptions made the bug triage process much harder. Besides all the NOT-TO-DOs, you will find some exciting information about performance hunting and reproducing performance issues on a small scale. As a perk, we have prepared a repository where you can reproduce the problem and familiarize yourself with the tools that allowed us to confirm the cause. The last part (lessons learned) is the most valuable if you prefer to learn from the mistakes of others.
A B-tree is a data structure that helps search through large amounts of data. It was invented over 40 years ago, yet it is still employed by the majority of modern databases. Although newer index structures exist, such as LSM trees, the B-tree remains unbeaten at handling most database queries.
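To make the structure more concrete, here is a simplified, in-memory sketch of searching a B-tree in Kotlin; real database B-trees keep many keys per node in fixed-size disk pages, which this toy version glosses over.

```kotlin
// Toy B-tree node: keys are kept sorted, children[i] holds keys smaller than keys[i],
// and the last child holds keys greater than all keys in this node.
class BTreeNode(
    val keys: List<Int>,
    val children: List<BTreeNode> = emptyList() // empty for leaf nodes
) {
    fun contains(key: Int): Boolean {
        // Find the first key >= searched key within this node.
        val i = keys.indexOfFirst { it >= key }
        return when {
            i != -1 && keys[i] == key -> true        // found in this node
            children.isEmpty() -> false              // leaf reached, key absent
            i == -1 -> children.last().contains(key) // descend into the rightmost child
            else -> children[i].contains(key)        // descend into the matching subtree
        }
    }
}
```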
The idea for this article arose during a meeting where we learned that our supervisor would be leaving the company to pursue new opportunities. In response, a colleague lamented that what we would miss most is the knowledge departing with the leader. Unfortunately, that’s how it goes. Not only do we lose a colleague, but we also lose valuable knowledge and experience. However, this isn’t a story about my supervisor; it’s a story about all those individuals who are experts in their fields, who understand the paths to success and paths that lead to catastrophic failures. When they leave, they take with them knowledge that you won’t find in any book, note, or Jira ticket. And this leads to a fundamental question: What can be done to avoid this “black hole” of knowledge? How can we ensure it doesn’t vanish along with them? That’s what this article is all about.
MongoDB is the most popular database used at Allegro. We have hundreds of MongoDB databases running on our on-premise servers. In 2022 we decided that we needed to migrate all our MongoDB databases from existing shared clusters to new MongoDB clusters hosted on Kubernetes pods with separate resources. To perform the migration of all databases we needed a tool for transferring all the data and keeping consistency between the old and new databases. That's how the mongo-migration-stream project was born.
After six years as a Team Leader, I went back to hands-on engineering work, and I’m very happy about taking this step. While it may appear surprising at first, it was a well-thought-out decision, and actually I’ve already performed such a maneuver once before.
In the era of ubiquitous cloud services and an increasingly growing PaaS and serverless-oriented approach, performance and resources seem to be becoming less and less important. After all, we can scale horizontally and vertically at any time, without worrying about potential performance challenges that the business may introduce.
As part of a broader initiative to refresh the Allegro platform, we are upgrading our internal libraries to Spring Boot 3.0 and Java 17. The task is daunting and filled with challenges; however, overall progress is steady, and thanks to the modular nature of our code it should end in finite time. Everyone who has performed such an upgrade knows that you need to expect the unexpected and, at the end of the day, prepare for lots of debugging. No migration guide can prepare you for what's coming in the field. In the words of Donald Rumsfeld, there are unknown unknowns, and we need to be equipped with the tools to uncover these unknowns and patch them up. In this blog post I'd like to walk you through a process that shows where the application hangs, even though there seems to be nothing wrong with it. I will also show that you don't always know what code you have: a problem known as dependency hell, a place we got quite cosy in during this upgrade.
Label noise is ever-present in machine learning practice. Allegro datasets are no exception. We compared 7 methods for training classifiers robust to label noise. All of them improved the model’s performance on noisy datasets. Some of the methods decreased the model’s performance in the absence of label noise.
Hermes is a distributed publish-subscribe message broker that we use at Allegro to facilitate asynchronous communication between our microservices. As our usage of Hermes has grown over time, we faced a challenge in effectively distributing the load it handles to optimize resource utilization. In this blog post, we will present the implementation of a dynamic workload balancing algorithm that we developed to address this challenge. We will describe the approach we took, the lessons we learned along the way, and the results we achieved.
Many of us, software engineers, have experienced those days when nothing really sparks joy in coding, debugging, preparing spikes or refining tasks for the next sprints. Obviously, we would like to have as few such days as possible and get on with our work effectively. The solution is definitely not to torment our brains with guilt and forced labour. There are other ways, and I would like to invite you to explore them with me and learn a little about our nervous systems in the process. We'll find out where motivation comes from on a biological and psychological level. We'll also take a look at the changes you can introduce into your day to take advantage of certain mechanisms working on a neural level and boost your motivation and productivity.
Hardware is always hard. The amount of operations, maintenance and planning that goes into supporting a data center is a daunting challenge for any enterprise. Though often unseen, without hardware there is no software.
Software Architecture is an elusive thing which, if neglected, can lead to a codebase that is hard to develop and maintain, and in more drastic circumstances to the failure of a product. This article discusses one of the backend application architecture styles which proved to be successful in providing a good foundation for building and maintaining an application in the long run: Onion Architecture.
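As a rough illustration of the style (not the exact layout described in the article), the sketch below shows the key rule of Onion Architecture in Kotlin: dependencies point inwards, so the domain and application layers define ports that the outer infrastructure ring implements; all names here are made up for illustration.

```kotlin
// Domain core: pure business types, no framework or infrastructure imports.
data class Order(val id: String, val totalCents: Long)

// Port defined by the inner layers; the outer layer depends on it, not the other way round.
interface OrderRepository {
    fun find(id: String): Order?
}

// Application service orchestrating a use case, still unaware of any database or web framework.
class GetOrderPrice(private val orders: OrderRepository) {
    fun totalFor(id: String): Long =
        orders.find(id)?.totalCents ?: error("Order $id not found")
}

// Infrastructure (outermost ring): an adapter implementing the port, here backed by an in-memory map.
class InMemoryOrderRepository(private val storage: Map<String, Order>) : OrderRepository {
    override fun find(id: String): Order? = storage[id]
}
```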
Sometimes great results in code performance come with a small amount of work. We’d like to tell you a story about how we changed the Allegro mobile homepage and reduced usage of Allegro service infrastructure with only a few lines of code.
Let’s look at what transactions in MongoDB are and how they differ from SQL transactions.
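For a flavour of the difference, here is a minimal sketch of a multi-document transaction using the synchronous MongoDB Java driver from Kotlin; the database, collection, and field names are purely illustrative, and transactions require a replica set or sharded cluster.

```kotlin
import com.mongodb.client.MongoClients
import com.mongodb.client.model.Filters
import com.mongodb.client.model.Updates

// Illustrative money transfer: both updates commit or abort together,
// unlike two independent single-document updates.
fun transferFunds(uri: String, from: String, to: String, amount: Long) {
    MongoClients.create(uri).use { client ->
        val accounts = client.getDatabase("bank").getCollection("accounts")
        client.startSession().use { session ->
            session.withTransaction {
                accounts.updateOne(session, Filters.eq("owner", from), Updates.inc("balance", -amount))
                accounts.updateOne(session, Filters.eq("owner", to), Updates.inc("balance", amount))
            }
        }
    }
}
```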
Building a complex web platform can be a real challenge, especially when parts of it are delivered by independent teams. Picking out the correct architecture is crucial, but maintaining it can be even more challenging. Frontend microservices, aka microfrontends, is an architecture that gives a lot of flexibility, but can cause performance issues in the future, if not managed well. This article presents an approach to the microfrontends architecture to keep the frontend technology stack efficient based on the complexity of user interface.
What would you say if we stored 1 000 records in a database, and the database claimed that there were only 998 of them? Or, if we created a database storing sets of values and in some cases the database would claim that some element was in that set, while in fact it was not? It definitely must be a bug, right? It turns out such behavior is not necessarily an error, as long as we use a database that implements probabilistic algorithms and data structures. In this post we will learn about two probability-based techniques, perform some experiments and consider when it is worth using a database that lies to us a bit.
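As a teaser of the probabilistic approach, below is a toy Bloom filter in Kotlin: it can claim that an element is in the set while in fact it is not (a false positive), but it never misses an element that was actually added; the sizes and hash functions are illustrative only.

```kotlin
import java.util.BitSet

// Toy Bloom filter: may report false positives, never false negatives.
class BloomFilter(private val bits: Int = 1024, private val hashes: Int = 3) {
    private val bitSet = BitSet(bits)

    // Derive several bit positions from two cheap hash functions (double hashing).
    private fun positions(value: String): IntArray {
        val h1 = value.hashCode()
        val h2 = value.reversed().hashCode()
        return IntArray(hashes) { i -> Math.floorMod(h1 + i * h2, bits) }
    }

    fun add(value: String) = positions(value).forEach { bitSet.set(it) }

    fun mightContain(value: String): Boolean = positions(value).all { bitSet.get(it) }
}

fun main() {
    val filter = BloomFilter()
    filter.add("allegro")
    println(filter.mightContain("allegro")) // true
    println(filter.mightContain("ebay"))    // most likely false, but could be a false positive
}
```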