Note: I did a little more research and it turns out that the gc runtime creates one OS thread only, and then adds threads as a way to avoid I/O locks. On the other hand the gccgo runtime maps goroutines and p_threads on a 1 to 1 basis (as of now).
I have been lazily following the Go language project for a while now, when it
was suddenly discussed in this post of “Appunti Digitali” (in italian), that
pointed at this post
on Dalke Scientific.
It really got me that S.Python manages to beat Go so
badly, so I decided to run my own tests on my trusty old IBM X-41 (Pentium-M 1.5
GHz).