Oct 15 2020

Sureness, or how many tests do we need?

Imagine that you were charged with finishing a task started by your colleague. He or she implemented it just before leaving for a vacation. Now it is your job to finish and release it.

Example: To make this example more practical, let’s say that the task is to count reactions that are stored in pageReactionsRepository. This is your colleague solution implemented in Kotlin:

suspend fun getReactionsForPage(pageKey: String, userUuid: String?): PageReactionsResponse =
    pageReactionsRepository.getPageReactions(pageKey)
        .let { reactions ->
            PageReactionsResponse(
                pageKey = pageKey,
                reactionsCount = emptyReactionsMap + reactions.groupingBy { it.reaction }.eachCount(),
                userChoice = userUuid?.let { reactions.find { it.userUuid == userUuid } }?.reaction
            )
        }

Is this solution correct? Think about it for a minute. After analysing above code you might conclude that this solution seems correct, but are you sure about it? An even better question would be: what is your degree of sureness that this solution is correct? There is one thought experiment that help us establish that: How much money would you bet on this function being correct?

Releasing is always a bet #

Think of a release as a bet. You need to put money on correctness, and if you are wrong, you lose it all. It won’t be a fair bet, though. If you are right, you win nothing. If you are wrong, you need to pay. A third alternative is to spend more of your precious time and investigate.

So now, would you bet 10 cents on this solution? Yeah, why not. Would you bet 10 dollars? Some readers probably would, but I wouldn’t. From my observations, most experienced developers wouldn’t either as we’ve seen enough good looking code with a surprising outcome (we sometimes collect such snippets as puzzlers). Would you bet 1000 dollars on this solution working 100% correctly? I think no sane person would.

This game design is no coincidence. It simulates you from your employer’s perspective. Assuming there are no other verification mechanisms, this is how it would work — on the one hand, you can spend more time (which is valuable); on the other, a bug in production can be costly. Sure, in most cases, there are some testers on the way. But depending too heavily on testers is dangerous, and the fact that we missed a bug is already a risk. We will talk about it later.

The stake is different in different products and functionalities #

Most mistakes can be fixed. The highest costs are:

Users’ discontent that can negatively influence their behavior and your company image.
Users’ inability to use your product, thus reducing your income

This cost is much smaller for small companies or start-ups. It is also much smaller for internal products (for our employees). This is why small companies often care less about proper testing. Big products known publicly are under much higher pressure. At Allegro some mistakes might cost the company thousands or even millions of dollars. This is why we take testing seriously.

We build sureness via different forms of checking #

So let’s assume this is a 100 dollars bet. What would you do to make sure it works fine? This is where we start the process of checking.

Note: Checking and testing are not the same, but those two terms are often confused. People often call testing what should be called checking (https://www.infoq.com/news/2009/12/testing-or-checking/). I will not play language nazi here, and I stick with the term “software testing” as a subcategory of checking.

The first intuitive step is to check it out manually. The steps are:

Use this functionality and see how it behaves.
If it seems correct, think about possible problems and check them out. So we try to emulate a situation that might not be handled well, and we see what happens. If anything catches our attention, we stop and try to fix it. If not, we feel more confident that our functionality works fine.

This is what many programmers do, but it works well only in the short term. The problem is that it does not scale well.

We can still lose the bet in the future — we need automatic testing #

When I was still a student, a friend of mine, who worked in a big corporation, complained that other people write some components, and they do not leave any tests. “This is extremely immature,” he said, “Later, I need to touch it to introduce some change, and I have no idea if I broke something or not.” This system couldn’t be tested manually. It was (and is) too big, and I am not sure if there is a single person who understands it all. Even if it could, it would take a preposterous amount of time for each small change.

This is the answer to the question why hiring many manual testers is not a good long-term solution (even though it is practiced by many companies in Poland). It does not scale well. The bigger your system, the more manual testers you need to maintain to just check regression.

We need automatic testing — testing we can run again and again and have it check all main functionalities.

It is not free as it needs to be written, and it needs to be maintained (when we change logic in our system, we need to update tests as well). However, in the end, if your project is big and changes over time, it will surely pay off.

Lack of tests leads to anxious developers and terrible code #

Having a properly tested code is very important for us, developers. Without it, we feel anxious when operating on it. For a good reason: a small change might break who knows what. This has further consequences. When developers are worried about touching the code, and the project is too big for them to comprehend and test it all manually, they will make minimal changes possible. They will not refactor the legacy code. As a consequence:

Components look worse and worse over time.
We will soon have outdated system design, not reflecting how our system works now.
Code will be less and less understandable by developers.

Testing takes time and skills #

On the other hand, writing tests has costs too:

Proper testing takes a lot of time.
Testing requires adjustments in architecture, which is both positive (it is often cleaner) and negative (it is often more complicated).
Testing well is hard and needs to be learned. It is easy to write tests that test little and constrain our code a lot. It is hard to do the other way around.

It is also said, that if we first think about tests before implementation, we design both tests and components differently. Probably in a better way, but who knows for sure.

Testing is not easy, but it is worth to learn it. It takes time, but it will pay off later. We need to learn to write proper tests in an appropriate amount, so they serve us well.

We need to stay in touch with tests #

Another trick teams do when developers do not want to write tests is hiring a tester to write automatic tests. It works… to a certain degree. He might be very helpful by writing end-to-end tests for you, but they are not enough, as we will see later. We should also not entirely give up on testing ourselves.

As creators, we need to know our product and how to use it well. If we don’t, we lose contact with it. This should never happen for many reasons (decision-making, system design, operating on the project). Manual testing is one of the best ways to stay in touch with the system.

Do not send unchecked functionality to a tester #

One terrible mistake I see some developers doing is sending unchecked code to testers without even running it. It is nearly sure it will be back with some mistakes. The time developer saved on not checking will be wasted on getting back to the task and starting it again. He also wasted another person’s time and increased the risk of a problem getting released to production. A tester should be the second person to check something out. Not the only one. This should never happen.

We need to unit test components ourselves #

A tester can write end-to-end tests for the system. Having it well tested, we can have some degree of sureness our system behaves correctly. It does not mean that its components work correctly, though.

A few times, I had a funny situation where mistakes in 2 components canceled each other out, and the system as a whole worked fine. After fixing one of them, unexpected behavior revealed itself. Another common problem is that component is not implemented correctly, but the way we use it does not reveal it. However, if we use this component somewhere else, we might have a problem. This causes a cascade of problems. Refactorization then might be much harder.

Another thing is that when we test components, we know better what extreme situations might break them.

Finally, unit tests are generally much faster than other kinds of tests, and they are an important tool we should use when we develop a solution to check out our code’s correctness.

Ending #

(for summary read headlines)

I hope you have some idea of how many tests do you need. The most important question you should ask yourself is: “How much money would I bet on this solution working correctly.” This is your amount of sureness that this test works now.

You should also ask yourself, “How much money would I bet on this solution not breaking in the future.” This is your degree of sureness that it is properly automated tested.

We use tests to answer essential questions for now and for the future. We trade time for answers. So automated tests should cover everything essential for us to know. Do not depend on “code looks fine.” The code might change. We need enough automated tests to make changes in our code confidently and to be able to verify that it still works fine quickly. Writing useful tests is an art of making decisions by keeping in mind your employer’s best interest.

tests