If something looks like a bug in the language runtime, the standard library, or the operating system, I tend to approach it with caution: it's usually a bug in my code and I'm just not seeing it.
But sometimes it's not me. Sometimes it's really the compiler, and you spend a solid week debugging a Go program until you find out that cross-compiling from OSX to Linux triggers a stdlib bug: given enough concurrency, the whole application just hangs in IOWait loops.
Obviously the whole thing was really frustrating because:
- The bug only happened on production servers (obviously, anything else would not be fun).
- It could only be reproduced on a large dataset of 300 million items (so every test also took quite a while).
- I had to test whether it works without concurrency (which took 2 days, and yes, it did).
But the important finding from this exercise was that you can print the full stack trace of all running goroutines, as well as their status, for a running or hanging program!
You just have to send the SIGABRT signal to the process, for example with kill -ABRT <pid>!
This is similar to what you see when a panic occurs and was massively helpful in hunting down this bug. Kudos to the Go team for that.
Here is an example:
package main

func main() {
	for {}
}
The program will obviously hang in a busy loop, but if you send it SIGABRT with kill -ABRT <pid>, you'll get something similar to this printed to stderr:
SIGABRT: abort
PC=0x1056d70 m=0 sigcode=0

goroutine 1 [running]:
main.main()
	/Users/tigraine/projects/test/main.go:4 fp=0xc00003c788 sp=0xc00003c780 pc=0x1056d70
runtime.main()
	/usr/local/Cellar/go/1.14.1/libexec/src/runtime/proc.go:203 +0x212 fp=0xc00003c7e0 sp=0xc00003c788 pc=0x102b3f2
runtime.goexit()
	/usr/local/Cellar/go/1.14.1/libexec/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc00003c7e8 sp=0xc00003c7e0 pc=0x10528f1
...
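Sending SIGABRT terminates the process, which is not always what you want on a production server. As a rough sketch of a related approach (not what I used to hunt down the bug above), a similar dump can be produced from inside the program with runtime.Stack from the standard library. The helper name allGoroutineStacks and the blocked demo goroutine are made up for illustration, and the output format differs slightly from the signal dump (no fp/sp/pc columns).

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// allGoroutineStacks returns the stack traces of every goroutine in the
// current process. The function name is hypothetical; runtime.Stack is
// the standard library call doing the actual work.
func allGoroutineStacks() string {
	buf := make([]byte, 64<<10)
	for {
		// The second argument asks for all goroutines, not just the caller.
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return string(buf[:n])
		}
		// The buffer was filled completely, so the dump may be truncated.
		// Retry with twice the space.
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	// Start a goroutine that blocks forever, so the dump has more to show
	// than just the main goroutine.
	block := make(chan struct{})
	go func() { <-block }()
	runtime.Gosched() // give the new goroutine a chance to start

	// Print the dump to stderr, like the runtime does on SIGABRT.
	fmt.Fprint(os.Stderr, allGoroutineStacks())
}
```

In a real service you would probably not call this from main directly, but trigger it on demand, for example from a handler registered via signal.Notify in os/signal for a signal of your choice, so the process keeps running after the dump.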

