I’ve implemented parameter interpolation in go-sql-driver/mysql. You can enable it by adding “interpolateParams=true” option to dsn.
Why this feature is important is described in here and here. I don’t say lot about it here.
When writing low level library like MySQL driver, you can’t assume user’s performance requirements. Especially, avoiding memory allocations is important. Go’s GC is not so fast. And everyone want memory usage of their server is stable.
So I’ve reduced allocations in the interpolation as possible. This post describes how I did it and coding tips to avoid allocations.
1. Write it correct.
Since it was new feature, I wrote it quickly like writing Python code. (Go can be used like Python or C!).
I’ve made pull request with “[RFC]” in title. We’ve discussed about feature and rough design first.
2. Write benchmark program.
Before starting tuning, I’ve prepared benchmark program to confirm performance difference.
Writing benchmark program in Go is easy: Write function like BenchmarkXxxx(b *testing.B)
in xxx_test.go
. To see allocations, call b.ReportAllocs()
in it.
Here is benchmark function I’ve wrote:
func BenchmarkInterpolation(b *testing.B) {
mc := &mysqlConn{
cfg: &config{
interpolateParams: true,
loc: time.UTC,
},
maxPacketAllowed: maxPacketSize,
maxWriteSize: maxPacketSize - 1,
}
args := []driver.Value{
int64(42424242),
float64(math.Pi),
false,
time.Unix(1423411542, 807015000),
[]byte("bytes containing special chars ' \" \a \x00"),
"string containing special chars ' \" \a \x00",
}
q := "SELECT ?, ?, ?, ?, ?, ?"
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, err := mc.interpolateParams(q, args)
if err != nil {
b.Fatal(err)
}
}
}
To run it:
$ go test -bench=BenchmarkInterpolation
PASS
BenchmarkInterpolation 300000 3887 ns/op 1144 B/op 15 allocs/op
ok github.com/go-sql-driver/mysql 2.386s
3. Identify allocations
When setting allocfreetrace=1
to environment variable GODEBUG
, you can see stacktrace where allocation occures (reference).
But using it while running go test -bench=BenchmarkInterpolation
makes huge log including large noise. To reduce log, build mysql.test
with go test -c
and run it.
$ go test -c
$ GODEBUG=allocfreetrace=1 ./mysql.test -test.run=none -test.bench=BenchmarkInter -test.benchtime=10ms 2>trace.log
PASS
BenchmarkInterpolation 5000 4095 ns/op 1144 B/op 15 allocs/op
-test.run=none
prevents running tests before benchmark. -test.benchtime=10ms
reduces execution time and log size.
Now I have trace.log
. Open it with vim and search interpolateParams
. Cut unnecessary stacktrace before it. I can see stacktrace like this:
tracealloc(0xc2080100e0, 0x70, string)
goroutine 5 [running]:
runtime.mallocgc(0x70, 0x22e4e0, 0x0, 0x2)
/usr/local/Cellar/go/1.4.1/libexec/src/runtime/malloc.go:327 +0x32a fp=0xc20802ea60 sp=0xc20802e9b0
runtime.newarray(0x22e4e0, 0x7, 0x15c5e)
/usr/local/Cellar/go/1.4.1/libexec/src/runtime/malloc.go:365 +0xc1 fp=0xc20802ea98 sp=0xc20802ea60
runtime.makeslice(0x2229c0, 0x7, 0x7, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.4.1/libexec/src/runtime/slice.go:32 +0x15c fp=0xc20802eae0 sp=0xc20802ea98
strings.genSplit(0x30c190, 0x17, 0x2e1f10, 0x1, 0x0, 0x7, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.4.1/libexec/src/strings/strings.go:287 +0x14d fp=0xc20802eb60 sp=0xc20802eae0
strings.Split(0x30c190, 0x17, 0x2e1f10, 0x1, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.4.1/libexec/src/strings/strings.go:325 +0x76 fp=0xc20802ebb0 sp=0xc20802eb60
github.com/go-sql-driver/mysql.(*mysqlConn).interpolateParams(0xc20802eed0, 0x30c190, 0x17, 0xc20802ee70, 0x6, 0x6, 0x0, 0x0, 0x0, 0x0)
/Users/inada-n/go1.4/src/github.com/go-sql-driver/mysql/connection.go:180 +0x86 fp=0xc20802ed38 sp=0xc20802ebb0
github.com/go-sql-driver/mysql.BenchmarkInterpolation(0xc20806a400)
/Users/inada-n/go1.4/src/github.com/go-sql-driver/mysql/benchmark_test.go:240 +0x437 fp=0xc20802ef58 sp=0xc20802ed38
testing.(*B).runN(0xc20806a400, 0x1)
/usr/local/Cellar/go/1.4.1/libexec/src/testing/benchmark.go:124 +0x95 fp=0xc20802ef68 sp=0xc20802ef58
testing.(*B).launch(0xc20806a400)
/usr/local/Cellar/go/1.4.1/libexec/src/testing/benchmark.go:199 +0x78 fp=0xc20802efd8 sp=0xc20802ef68
runtime.goexit()
/usr/local/Cellar/go/1.4.1/libexec/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc20802efe0 sp=0xc20802efd8
created by testing.(*B).run
/usr/local/Cellar/go/1.4.1/libexec/src/testing/benchmark.go:179 +0x3e
...
This stacktrace shows strings.Split()
called from interpolateParams()
cause allocation. You can see also filename and line number (connection.go:180
).
Here is the code:
func (mc *mysqlConn) interpolateParams(query string, args []driver.Value) (string, error) {
chunks := strings.Split(query, "?")
if len(chunks) != len(args)+1 {
return "", driver.ErrSkip
}
4. Tuning up
OK. It’s time to start optimize. Let’s record current benchmark result. I use it later.
$ go test -bench=BenchmarkInterpolation | tee old.txt
PASS
BenchmarkInterpolation 500000 3942 ns/op 1144 B/op 15 allocs/op
ok github.com/go-sql-driver/mysql 3.219s
4.1. Avoid concatenate strings, Use []byte and append
My first version is very Pythonic. When interpolating db.Exec("SELECT SLEEP(?)", 42)
, my code did like this:
parts := []string{"SELECT SLEEP(", escape(42), ")"}
query := strings.Join(parts, "")
Since Go’s string is immutable, making temporary string cause allocations. So I replaced stirngs.Join()
with []byte
and append
. I can use strconv.AppendInt()
and strconv.AppendFloat()
to avoid temporal string when formatting int64 and float64.
Now my code looks like:
buf := make([]byte, 0, estimatedSize)
argPos := 0
for _, c := range []byte(query) {
if c != '?' {
buf = append(buf, c)
continue
}
arg := args[argPos]
argPos++
if arg == nil {
buf = append(buf, []byte("NULL")...)
continue
}
switch v := arg.(type) {
case int64:
buf = strconv.AppendInt(buf, v, 10)
...
4.2. Use benchcmp
I’ve made first optimization. Let’s measure it.
benchcmp
is famous tool to compare benchmark result. You can include it’s output in commit log. It makes your commit looks cool.
$ go get -u golang.org/x/tools/cmd/benchcmp
$ go test -bench=BenchmarkInterpolation > new.txt
$ benchcmp old.txt new.txt
benchmark old ns/op new ns/op delta
BenchmarkInterpolation 3942 2573 -34.73%
benchmark old allocs new allocs delta
BenchmarkInterpolation 15 6 -60.00%
benchmark old bytes new bytes delta
BenchmarkInterpolation 1144 560 -51.05%
You can see number of allocation is reduced from 15 to 6. Execution time and memory consumption are also reduced.
4.3. Avoid Time.Format()
time.Time
doesn’t provides methods like strconv.AppendInt()
.
To avoid making temporary string, I’ve wrote it myself.
Before:
case time.Time:
if v.IsZero() {
buf = append(buf, []byte("'0000-00-00'")...)
} else {
fmt := "'2006-01-02 15:04:05.999999'"
if v.Nanosecond() == 0 {
fmt = "'2006-01-02 15:04:05'"
}
s := v.In(mc.cfg.loc).Format(fmt)
buf = append(buf, []byte(s)...)
}
After (including other improvements):
case time.Time:
if v.IsZero() {
buf = append(buf, "'0000-00-00'"...)
} else {
v := v.In(mc.cfg.loc)
v = v.Add(time.Nanosecond * 500) // To round under microsecond
year := v.Year()
year100 := year / 100
year1 := year % 100
month := v.Month()
day := v.Day()
hour := v.Hour()
minute := v.Minute()
second := v.Second()
micro := v.Nanosecond() / 1000
buf = append(buf, []byte{
'\'',
digits10[year100], digits01[year100],
digits10[year1], digits01[year1],
'-',
digits10[month], digits01[month],
'-',
digits10[day], digits01[day],
' ',
digits10[hour], digits01[hour],
':',
digits10[minute], digits01[minute],
':',
digits10[second], digits01[second],
}...)
if micro != 0 {
micro10000 := micro / 10000
micro100 := micro / 100 % 100
micro1 := micro % 100
buf = append(buf, []byte{
'.',
digits10[micro10000], digits01[micro10000],
digits10[micro100], digits01[micro100],
digits10[micro1], digits01[micro1],
}...)
}
buf = append(buf, '\'')
}
It reduces two allocations:
benchmark old allocs new allocs delta
BenchmarkInterpolation 6 4 -33.33%
4.4. Avoid range
when iterating string
Usually, range
is used for iterating slice. But for _, c := range s
(where s
is string
) produces rune
, not byte
.
My first code used for i, c := range([]byte(s)) {
. But Go 1.4 make new slice and copy s
to it. (Go 1.5 optimize it out).
So I’ve used C-like for loop:
@@ -210,8 +210,8 @@ func (mc *mysqlConn) interpolateParams(query string, args []driver.Value) (strin
buf := make([]byte, 0, estimated)
argPos := 0
- // Go 1.5 will optimize range([]byte(string)) to skip allocation.
- for _, c := range []byte(query) {
+ for i := 0; i < len(query); i++ {
+ c := query[i]
if c != '?' {
It reduces one allocation.
benchmark old allocs new allocs delta
BenchmarkInterpolation 4 3 -25.00%
4.5. append
string
to []byte
directly
Concatenating slices can be written like buf = append(buf, token...)
. Basically, buf
and token
should have same type. But there is one exception: when buf
is []byte
, token
can be string
.
@@ -210,17 +210,19 @@ func (mc *mysqlConn) interpolateParams(query string, args []driver.Value) (strin
argPos := 0
for i := 0; i < len(query); i++ {
- c := query[i]
- if c != '?' {
- buf = append(buf, c)
- continue
+ q := strings.IndexByte(query[i:], '?')
+ if q == -1 {
+ buf = append(buf, query[i:]...)
+ break
}
+ buf = append(buf, query[i:i+q]...)
+ i += q
arg := args[argPos]
argPos++
if arg == nil {
- buf = append(buf, []byte("NULL")...)
+ buf = append(buf, "NULL"...)
continue
}
...
I couldn’t reduce allocation from this change. But I could improve readability a lot.
4.6. Make two same function: for []byte and for string
MySQL requires same escaping for binary and string.
Sadly, since Go doesn’t have “read only slice”, converting between string
and []byte
cause allocation and copy. Current compiler can optimize it only for very simple cases.
So I’ve made mostly same two functions for []byte
and string
.
Generally speaking, duplicated code is evil. But TEXT
and BLOB
column may store large data. I want to avoid allocating and copying large data. So I did evil.
It reduces one more allocation.
benchmark old ns/op new ns/op delta
BenchmarkInterpolation 2463 2118 -14.01%
benchmark old allocs new allocs delta
BenchmarkInterpolation 3 2 -33.33%
benchmark old bytes new bytes delta
BenchmarkInterpolation 496 448 -9.68%
4.7. Use send/receive buffer for scratchpad
Last two allocation is: (1) Make scratchpad buffer for interpolate, and (2) convert it to string.
I can remove them by create MySQL packet directly. But it may breaks separation of packet layer and higher layer. So I didn’t it for now.
Instead, I borrowed send/receive buffer and use it as scratchpad buffer. It reduces one allocation.
benchmark old ns/op new ns/op delta
BenchmarkInterpolation 1900 1403 -26.16%
benchmark old allocs new allocs delta
BenchmarkInterpolation 2 1 -50.00%
benchmark old bytes new bytes delta
BenchmarkInterpolation 448 160 -64.29%
Conclusion
Go’s compiler is not intelligent compared with gcc or clang. You should use some tips to reduce allocation.
But Go’s runtime is great. I can measure and find allocation quickly. I think it is more important than magical optimization.