In this post I present a story of a bug that hit us recently. Everything was caused by unexpected (although documented) behavior of Go built-in function append. This bug has lived silently for nearly a year in allegro/marathon-consul. Ensure you run the latest version.

The missing service

Dude, where is my service

At Allegro we build our infrastructure on the top of Mesos and Marathon. For discovery service we use Consul. Services registration is done by allegro/marathon-consul, a simple tool written in Go to register services started by Marathon in Consul.

One day we deployed a service. It was neither a new service nor a big release. It was just a regular deployment. But there was a problem. The service had two ports, each registered in Consul. After deployment, both ports had the same tags although they were configured differently. This might not sound like a serious issue but it was. The service was unavailable because its clients couldn’t find it due to invalid tags on a port responsible for handling clients’ requests.

Marathon-Consul hasn’t been touched for some time, so it was very unlikely that it was responsible for malformed registration. Application configuration in Marathon looked good. There were some global service tags on application level and additional tags on each port. Why Marathon-Consul messed this up?

We checked what had changed in the new deployment and the only differences were the service version and an additional service tag that was added. Why adding a new tag results in such a weird behavior? We deleted this tag and the service has registered correctly. We added it back and tags were filled wrong. We added a test to reproduce it and contributed a fix.

The bug

The bug lied in the following code:

commonTags := labelsToTags(app.Labels)
var intents []RegistrationIntent
for _, d := range definitions {
        intents = append(intents, RegistrationIntent{
                Name: app.labelsToName(d.Labels, nameSeparator),
                Port: task.Ports[d.Index],
                Tags: append(commonTags, labelsToTags(d.Labels)...), // ◀ Wrong tags here
        })
}

func labelsToTags(labels map[string]string) []string {
    tags := []string{}
    for key, value := range labels {
        if value == "tag" {
            tags = append(tags, key)
        }
    }
    return tags
}

The bug is not easy to hit and probably that’s why it wasn’t covered in tests and nobody reported it before. To reproduce it, an application must have at least two ports with different tags on each. When commonTags size is power of two it worked but in other case — it didn’t. It’s a rare case a service has multiple ports (80% of our applications have only one port) and even rarer when ports have additional tags (8% of our ports have tags) and only one has multiple tagged ports.

The bug can be distilled to the example below. Let’s unroll the loop to just two iterations and use ints instead of structures. Then rename commonTags to x and fill it with some values. Finally, use y and z instead of intents[0] and intents[1]. What’s the output of the following code?

package main

import (
    "fmt"
)

func a() {
    x := []int{}
    x = append(x, 0)
    x = append(x, 1)  // commonTags := labelsToTags(app.Labels)
    y := append(x, 2) // Tags: append(commonTags, labelsToTags(d.Labels)...)
    z := append(x, 3) // Tags: append(commonTags, labelsToTags(d.Labels)...)
    fmt.Println(y, z)
}

func b() {
    x := []int{}
    x = append(x, 0)
    x = append(x, 1)
    x = append(x, 2)  // commonTags := labelsToTags(app.Labels)
    y := append(x, 3) // Tags: append(commonTags, labelsToTags(d.Labels)...)
    z := append(x, 4) // Tags: append(commonTags, labelsToTags(d.Labels)...)
    fmt.Println(y, z)
}

func main() {
    a()
    b()
}

First guess could be

[0, 1, 2] [0, 1, 3]
[0, 1, 2, 3] [0, 1, 2, 4]

but in fact it results in

[0, 1, 2] [0, 1, 3]
[0, 1, 2, 4] [0, 1, 2, 4]

Function a() works as expected but behavior of b() is not what we were expecting.

Slices

Slices

To understand this not obvious behavior we need some background on how slices works and what happens when we call append.

Slice is a triple of pointer to first element, length and capacity (length ≤ capacity). Memory is a continuous block of data but slice uses only length of capacity.

slice_1.svg

According to documentation of append:

The append built-in function appends elements to the end of a slice. If it has sufficient capacity, the destination is resliced to accommodate the new elements. If it does not, a new underlying array will be allocated. Append returns the updated slice. It is therefore necessary to store the result of append, often in the variable holding the slice itself:

append allocates a new slice if new elements do not fit into the current slice, but when they fit they will be added at the end. append always returns a new slice but (as the slice is a triple of address, length and capacity) the new slice could have the same address and capacity and differs only on the length.

How slices grow?

One does not simply append to a slice

Above paragraph doesn’t answer why code works like this. To understand it, we need to go deeper in Go code. Let’s take a look at growslice function of Go runtime. It’s called by append when a slice doesn’t have enough capacity for all appended elements.

// growslice handles slice growth during append.
// It is passed the slice element type, the old slice, and the desired new minimum capacity,
// and it returns a new slice with at least that capacity, with the old data
// copied into it.
// The new slice's length is set to the old slice's length,
// NOT to the new requested capacity.
// This is for codegen convenience. The old slice's length is used immediately
// to calculate where to write new values during an append.
// TODO: When the old backend is gone, reconsider this decision.
// The SSA backend might prefer the new length or to return only ptr/cap and save stack space.
func growslice(et *_type, old slice, cap int) slice

when slice needs to grow, it doubles its size. In fact there is more logic to handle growing heuristics, but in our case it grows just like this.

Connect the dots

Let’s go through b() step by step.

x := []int{}
x = append(x, 0)
x = append(x, 1)

Create a slice with 2 elements. 1.svg

x = append(x, 2)

Append one element. x is too small so it needs to grow. It doubles its capacity. 2.svg

y := append(x, 3)

Append one element. Slice has free space at the end so 3 is stored there. 3.svg

z := append(x, 4)

Append one element. Slice has free space at the end so 4 is stored there and overwrites 3 stored before. 4.svg

All 3 slices: x, y and z point to the same memory block.

Why it’s working in a()? The Answer is really simple. There is a slice of capacity two and when we append one element it’s copied to a new space. That’s why we end up with x, y and z pointing to different memory blocks.

5.svg

TL;DR

Be careful when using append. Don’t append to slices you want to keep unchanged. If you want to work on a copy of a slice data, you must explicitly copy it into a new slice.

What if I told you