Performance and Availability pitfalls in the Microservices Architecture — CQK Top 10
High performance and availability are always hard to achieve, and microservice architecture is no exception. In the microservices world, every user request sent to a frontend application triggers a cascade of remote calls. The frontend application calls API facades, API facades call backend services, and backend services communicate with databases and with even more backend services. Statistics work against you. Latencies add up, and the success probabilities of the individual calls multiply, so the chance that the whole flow succeeds drops with every hop. The more services are engaged in a flow, the more threats there are to performance and availability.
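The arithmetic behind this can be sketched in a few lines. This is only an illustration: it assumes that the calls are made sequentially, that every call must succeed, and that failures are independent. The function names and all the numbers are hypothetical, not measurements from any real system.

```python
from math import prod


def chain_availability(availabilities):
    """Probability that every call in the flow succeeds.

    Assumes independent failures and that the flow needs all calls.
    """
    return prod(availabilities)


def chain_latency_ms(latencies_ms):
    """Total latency of calls made one after another."""
    return sum(latencies_ms)


# Hypothetical flow: ten services, each available 99.9% of the time.
hops = [0.999] * 10
print(f"availability of the flow: {chain_availability(hops):.4f}")

# Hypothetical latencies of three sequential calls, in milliseconds.
print(f"latency of the flow: {chain_latency_ms([20, 35, 50])} ms")
```

Under these assumptions, ten services at 99.9% each give a flow that is available only about 99% of the time, which is roughly ten times more downtime than any single service has on its own.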
Since we have been doing microservices at scale for a while, we can share our experience with common availability pitfalls and performance bottlenecks.
Our Top 10 is meant as a self-check list. There are ten topics, and each one is covered in a separate article. In each article, you will find a few questions. You can go through these questions and ask yourself whether your system is safe or vulnerable in that area. Of course, some questions may not be relevant to you, so just skip them.
Each question is briefly explained and we give you some hints about possible improvements.
For now, we are publishing the first three articles. The rest will be published soon, as we need to translate our original texts from Polish to English.
CQK Top 10
- Thread pools
- JVM Garbage Collection
- Caching
- Application Metrics
- Communication with Dependencies
- HTTP Client
- Performance and Availability
- HTTP and REST API
- Databases
- Anesthesia
What’s CQK after all?
CQK stands for Code Quality Keepers. We are a group of engineers at Allegro who are passionate about code quality. Besides our normal day-to-day jobs (most of us are team leaders or senior developers), we help dev teams improve the quality of their code. All teams work on their projects with pull requests and code reviews on a daily basis. CQK is just an opportunity to spread ideas and knowledge between teams.
Inside CQK, we discuss code quality a lot. How can you define truly good code? Can you measure its goodness? Surprisingly, it’s really hard to come up with a better definition of code quality than the zen-style saying:
Whatever works for you, is good.
What’s worse, things change. Two years ago we focused on the common understanding of code quality: readability, maintainability, TDD, DDD, and so on. But when you look at a microservice, maybe its resilience is more important than its code aesthetics?
Personally, I really like the definition proposed by Robert L. Glass in his great book — Facts and Fallacies of Software Engineering:
Quality in the software field is about a collection of seven attributes that a quality software product should have: portability, reliability, efficiency, usability, testability, understandability, and modifiability.
This definition is very tricky because it doesn’t define absolute weights (priorities) for all of these -ilities. Simply put, it depends on your project. When you write a computer game, maybe portability is the key. But when you write a microservice, I bet that you will care a lot about reliability.
Sometimes, after a long discussion, we reach an agreement. CQK Top 10 is in fact our latest definition of code quality in the microservices world.