All posts by Michał Knasiecki

Probabilistic Data Structures and Algorithms in NoSQL databases

What would you say if we stored 1 000 records in a database, and the database claimed that there were only 998 of them? Or, if we created a database storing sets of values and in some cases the database would claim that some element was in that set, while in fact it was not? It definitely must be a bug, right? It turns out such behavior is not necessarily an error, as long as we use a database that implements probabilistic algorithms and data structures. In this post we will learn about two probability-based techniques, perform some experiments and consider when it is worth using a database that lies to us a bit.

GC, hands off my data!

Each of us has probably experienced a time in our career when we wanted to get rid of the Garbage Collector from our application because it was running too long, too often, and perhaps even led to temporary system freezes. What if we could still benefit from the GC, but in special cases, also be able to store data beyond its control? We could still take advantage of its convenience and, at the same time, be able to easily get rid of long GC pauses. It turns out that it is possible. In this article, we will look at whether and when it is worth storing data beyond the reach of the Garbage Collector’s greedy hands.

Evaluating performance of time series collections

A new version of MongoDB, 5.0, has been recently launched. The list of changes included one that I found particularly interesting: the time series collections. It is a method of effective storing and processing of time-ordered value series. In this article we will verify whether the processing of time series is really as fast as promised by the authors.

Comparing MongoDB composite indexes

One of the key elements ensuring efficient operation of the services we work on every day at Allegro is fast responses from the database. We spend a lot of time to properly model the data so that storing and querying take as little time as possible. You can read more about why good schema design is important in one of my earlier posts. It’s also equally important to make sure that all queries are covered with indexes of the correct type whenever possible. Indexes are used to quickly search the database and under certain conditions even allow results to be returned directly from the index, without the need to access the data itself. However, indexes are not all the same and it’s important to learn more about their different types in order to make the right choices later on.

Impact of data model on MongoDB database size

So I was tuning one of our services in order to speed up some MongoDB queries. Incidentally, my attention was caught by the size of one of the collections that contains archived objects and therefore is rarely used. Unfortunately I wasn’t able to reduce the size of the documents stored there, but I started to wonder: is it possible to store the same data in a more compact way? Mongo stores JSON that allows many different ways of expressing similar data, so there seems to be room for improvements.

Michał Knasiecki

Java/Kotlin/Scala software developer working at Allegro in automatic frauds prevention team.