If something looks like a bug in the language runtime, the standard library, or the operating system, I tend to approach it with caution: it's usually a bug in my code and I'm just not seeing it.
But sometimes it's not me. Sometimes it really is the compiler, and you spend a solid week debugging a Go program until you find out that cross-compiling from OSX to Linux triggers a stdlib bug that makes the whole application hang in IOWait loops given enough concurrency.
Obviously the whole thing was really frustrating because:
- The bug only happened on production servers (obviously - anything else would not be fun).
- It could only be reproduced on a large dataset of 300 million items (so every test also took quite a while).
- I had to test whether it works without concurrency (which took 2 days, and yes, it did).
But the important finding from this exercise was that you can print the full stacktrace of all running Goroutines as well as their status for a running/hanging program!
You just have to send the `SIGABRT` signal to the process with `kill -ABRT <pid>`!
This is similar to what you see when a `panic` occurs, and it was massively helpful in hunting down this bug. Kudos to the Go team for that.
An example for this:
    package main

    func main() {
        for {}
    }
The program will obviously hang in a busy loop, but if you send it the signal with `kill -ABRT`, you'll get something similar to this printed to `stderr`:
    SIGABRT: abort
    PC=0x1056d70 m=0 sigcode=0

    goroutine 1 [running]:
    main.main()
        /Users/tigraine/projects/test/main.go:4 fp=0xc00003c788 sp=0xc00003c780 pc=0x1056d70
    runtime.main()
        /usr/local/Cellar/go/1.14.1/libexec/src/runtime/proc.go:203 +0x212 fp=0xc00003c7e0 sp=0xc00003c788 pc=0x102b3f2
    runtime.goexit()
        /usr/local/Cellar/go/1.14.1/libexec/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc00003c7e8 sp=0xc00003c7e0 pc=0x10528f1
    ...
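
Keep in mind that `SIGABRT` terminates the process once the dump is printed. If killing the process is not an option, a similar goroutine listing can be produced from inside the program. The following is only a minimal sketch of that idea, using the standard `runtime.Stack` function. The `allStacks` helper and the blocked worker goroutine are made up for illustration and are not part of the debugging session described above.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strings"
	"time"
)

// allStacks returns the stack traces of every goroutine in the process,
// in the same text format the runtime prints on a panic.
// (Hypothetical helper, named for this example only.)
func allStacks() string {
	buf := make([]byte, 1<<16)
	for {
		// The second argument (true) asks for all goroutines,
		// not just the calling one.
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return string(buf[:n])
		}
		// The dump filled the buffer and may be truncated: retry with more room.
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	// Illustrative worker that blocks forever on a channel receive.
	block := make(chan struct{})
	go func() { <-block }()

	// Give the worker a moment to reach the blocking receive.
	time.Sleep(100 * time.Millisecond)

	dump := allStacks()
	fmt.Fprintln(os.Stderr, dump)

	// Each goroutine header carries its status, e.g. "[chan receive]".
	fmt.Println("worker blocked on channel:", strings.Contains(dump, "chan receive"))
}
```

The dump uses the same `goroutine N [status]:` headers as the output above, so a goroutine stuck waiting on a channel shows up with a status such as `[chan receive]` instead of `[running]`.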