What would you say if we stored 1 000 records in a database, and the database claimed that there were only 998 of them? Or, if we created a database storing sets of values and in some cases the database would claim that some element was in that set, while in fact it was not? It definitely must be a bug, right? It turns out such behavior is not necessarily an error, as long as we use a database that implements probabilistic algorithms and data structures. In this post we will learn about two probability-based techniques, perform some experiments and consider when it is worth using a database that lies to us a bit.
Each of us has probably experienced a time in our career when we wanted to get rid of the Garbage Collector from our application because it was running too long, too often, and perhaps even led to temporary system freezes. What if we could still benefit from the GC, but in special cases, also be able to store data beyond its control? We could still take advantage of its convenience and, at the same time, be able to easily get rid of long GC pauses. It turns out that it is possible. In this article, we will look at whether and when it is worth storing data beyond the reach of the Garbage Collector’s greedy hands.
A new version of MongoDB, 5.0, has been recently launched. The list of changes included one that I found particularly interesting: the time series collections. It is a method of effective storing and processing of time-ordered value series. In this article we will verify whether the processing of time series is really as fast as promised by the authors.
One of the key elements ensuring efficient operation of the services we work on every day at
Allegro is fast responses from the database.
We spend a lot of time to properly model the data so that storing and querying take as little time as possible.
You can read more about why good schema design is important in one of my earlier
posts.
It’s also equally important to make sure that all queries are covered with indexes of the correct type whenever
possible. Indexes are used to quickly search the database and under certain conditions even allow results to be
returned directly from the index, without the need to access the data itself. However, indexes are not
all the same and it’s important to learn more about their different types in order to make the right choices later on.
So I was tuning one of our services in order to speed up some MongoDB queries. Incidentally, my attention was caught by
the size of one of the collections that contains archived objects and therefore is rarely used. Unfortunately I wasn’t
able to reduce the size of the documents stored there, but I started to wonder: is it possible to store the same data
in a more compact way? Mongo stores JSON that allows many different ways of expressing similar data, so there seems
to be room for improvements.
Michał Knasiecki
Java/Kotlin/Scala software developer working at Allegro in automatic frauds prevention team.