Dec 20 2024
In this article, we want to share our journey of searching for optimizations in one of Allegro’s main microservices: opbox-web. You’ll read about the issues we had to deal with and how we managed to overcome them — together with a few surprises along the way and even one golden rule broken.
Dec 19 2024
Many companies face the challenge of efficiently processing large datasets for analytics.
Using an operational database for such purposes can lead to performance issues or, in extreme cases, system failures.
This highlights the need to transfer data from operational databases to data warehouses.
This approach allows heavy analytical queries without overburdening transactional systems and supports shorter retention periods in production databases.
Dec 11 2024
When we think about the Circuit Breaker pattern, we instantly associate it with the HTTP client. Just add an annotation or a wrapper and carry on coding.
In this article, I will encourage you to use this pattern to solve business problems.
Based on a live example from Allegro, I will show you how to use the CircuitBreaker implementation from the Resilience4j library for cases other than HTTP calls.
Nov 20 2024
As part of Allegro Hacktoberfest celebrations, Andamio Task Force (the team responsible for Andamio, a set of common libraries used by most JVM projects at Allegro) posted the
following message on our social platform…
Oct 7 2024
Did you know that in October this year, the DRY principle will celebrate its 25th anniversary?
It was proposed by Andrew Hunt and David Thomas in their book The Pragmatic Programmer in 1999. A 25th birthday is quite a good reason to celebrate, isn’t it?
At least, it’s a good opportunity to bring this principle back into the spotlight and to discuss how to use it properly.
Sep 10 2024
At Allegro, we continuously improve our development processes to maintain high
code quality and efficiency standards. One of the significant challenges we
encounter is managing code migrations at scale, especially with breaking changes
in our internal libraries or workflows. With over 2000 services (and their
repositories), manual code migration is a severe burden, so we need to introduce
some form of code migration management.
Sep 4 2024
In one of our core services, the execution of a single unit test took approximately 30 seconds,
while a single integration test ranged between 65 and 70 seconds.
Running the entire test suite took circa 6 minutes.
Aug 26 2024
Hi, I am Magda and I will tell you a story about coming back to work after a break of 21 months and 2 days. Everything here is a subjective account of my experience.
Aug 5 2024
Are you, as a test automation engineer, tired of Selenium’s flakiness? Are you seeking a better tool to automate your end-to-end tests? Have you heard of
Playwright? Perhaps you’ve encountered opinions that it is only worth using within a Node.js environment. I have. And as a tester, I decided to verify if
this is true. If you’re interested in the results, I encourage you to read the following article.
Jul 26 2024
This article is a case study of how we improved stability in our critical application.
It’s mostly a technical analysis of what happens in a fresh Java-based instance,
how the JIT compiler toyed with us at application start, and how we learned to control it.
Jul 16 2024
If you have experience with Event Storming and have ever found yourself wishing there was a way to document the insights gathered during a session,
or wanting to communicate the process to other team members, then I have a solution for you. This idea can be expressed in a famous saying:
One picture is worth more than a thousand words.
Jul 1 2024
Site performance is very important, first of all, from the perspective of users, who expect a good experience when visiting the site.
The user should not wait too long for the page to load. We all know how annoying it can be when we want to press an element
and it jumps to another place on the page or when we click on a button and then nothing happens for a very long time. The state of a
site’s performance in these aspects is measured by Web Vitals performance metrics and most importantly by a set of three major
Core Web Vitals metrics (LCP — Largest Contentful Paint, CLS — Cumulative Layout Shift, INP — Interaction to Next Paint). They are
responsible for measuring three things: loading time, visual stability and interactivity. These metrics are also important for the
websites themselves because, in addition to the user experience, they are also taken into account in terms of the website’s positioning
in search engines (SEO), which is crucial for most websites on the Internet, Allegro included.
Jun 20 2024
In this article we’ll present methods for efficiently optimizing physical resources and fine-tuning the configuration of a Google Cloud Platform (GCP)
Dataflow pipeline in order to achieve cost reductions.
The optimization is presented as a real-life scenario carried out in stages.
Jun 11 2024
One tech blog/newsletter has been gaining traction and popularity for a couple of years now: Pragmatic Engineer.
Jun 4 2024
The purpose of this article is to present how to design, test, and monitor a REST service client.
The article includes a repository with clients written in Kotlin using various technologies such as WebClient, RestClient, Ktor Client, and Retrofit.
It demonstrates how to send and retrieve data from an external service, add a cache layer, and parse the received response into domain objects.
May 16 2024
This story shows our journey in addressing a platform stability issue related to autoscaling, which, paradoxically, added overhead instead
of reducing the load. A pivotal part of this narrative is how we used Couchbase — a distributed NoSQL database. If you find
yourself intrigued by another enigmatic story involving Couchbase, don’t miss my
blog post on tuning expired doc settings.
Apr 12 2024
In early 2024, I hit ten years at Allegro, which also happens to be how long I’ve been working with microservices.
This timespan also roughly corresponds to how long the company as a whole has been using them, so I think it’s a good time to outline the story of project
Rubicon: a very ambitious gamble which completely changed how we work and what our software is like. The idea probably seemed rather extreme at the time, yet I
am certain that without this change, Allegro would not be where it is today, or perhaps would not be there at all.
Mar 6 2024
At Allegro, we use Kafka as a backbone for asynchronous communication between microservices. With up to
300k messages published and 1M messages consumed every second, it is a key part of our infrastructure. A few months ago, in our main Kafka cluster, we noticed
the following discrepancy: while median response times for produce requests
were in single-digit milliseconds, the tail latency was much worse. Namely, the
p99 latency was up to 1 second, and the p999 latency was up to 3 seconds. This was unacceptable for a new project that we were about to start, so we
decided to look into this issue. In this blog post, we would like to describe our journey — how we used Kafka protocol sniffing and eBPF to identify and remove
the performance bottleneck.
Feb 20 2024
Have you ever thought about ways of reducing repetitive, monotonous tasks? Maybe you would like to try to automate your own tasks? I will show you what
technology we use at Allegro, what processes we have automated, and how to do it on your own.
Feb 12 2024
This story shows how we strive to fix issues reported by our customers regarding inconsistent listing views on our e-commerce platform.
We will take a top-down approach to guide you through our story. At the beginning, we highlight the challenges faced by our customers, followed by presenting
basic information on how views are personalized on our web application. We then delve deeper into our internal architecture, aiming to clarify how
it supports High Availability (HA) by using two data centers. Finally, we give a short plug for Couchbase,
a distributed NoSQL database, and explain why it is an excellent storage solution for such an architecture.
Jan 24 2024
Ready to turn web accessibility from a headache into a breeze? Join us as we demystify WCAG, explore its latest 2.2 version, and gaze into the future of digital
inclusivity. Get ready for a journey that’s as enlightening as it is entertaining!
Jan 10 2024
Icons are an integral part of most modern UIs. What is the best way to embed icons nowadays?
Dec 14 2023
This article is a form of public postmortem in which we would like to share our bumpy road to uncovering the cause of a mysterious performance problem.
Besides unveiling part of our technical stack based on open-source solutions, we also show how some false assumptions made the bug triage process much
harder.
Beyond the list of things not to do, you can find some exciting information about performance hunting and reproducing performance issues on a small scale.
As a perk, we prepared a repository where you can reproduce the problem and make yourself familiar with tools
that allowed us to confirm the cause.
The last part (lessons learned) is the most valuable if you prefer to learn from the mistakes of others.
Nov 27 2023
The B-tree is a structure that helps to search through large amounts of data.
It was invented over 40 years ago, yet it is still employed by the majority of modern databases.
Although there are newer index structures, like LSM trees,
the B-tree remains unbeaten when handling most database queries.
Oct 30 2023
The idea for this article arose during a meeting where we learned that our supervisor would be leaving the company to pursue new opportunities. In response, a
colleague lamented that what we would miss most is the knowledge departing with the leader. Unfortunately, that’s how it goes. Not only do we lose a colleague,
but we also lose valuable knowledge and experience. However, this isn’t a story about my supervisor; it’s a story about all those individuals who are experts in
their fields, who understand the paths to success and paths that lead to catastrophic failures. When they leave, they take with them knowledge that you won’t
find in any book, note, or Jira ticket. And this leads to a fundamental question: What can be done to avoid this “black hole” of knowledge? How can we ensure
it doesn’t vanish along with them? That’s what this article is all about.