Go/JSON serialization woes
Recently I’ve been dabbling a lot with APIs and ran into a bunch of edge cases during JSON serialization that left me puzzled. Trying to make sense of this, I wrote a bunch of test cases to get a better feel for how and why Go JSON de/serialization works the way it does.
I generally like JSON because it is readable and nice to write. However, it doesn’t really map one-to-one onto Go types, and when it does, it doesn’t necessarily map the way I’d like it to, which of course leads to non-obvious bugs. To make matters worse, there are a bunch of JSON libraries out there, each with its own preferred way of dealing with various edge cases, but I’ll only take a look at Go’s standard library (encoding/json) in this post.
To quote the JSON RFC:
JSON can represent four primitive types (strings, numbers, booleans, and null)
I tend to avoid null in API design. There are usually better ways for a REST API to describe what’s up with data than falling back to null.
Go’s Unmarshal happily works with null, though.
Here’s what happens if I try to deserialize some basic types:
var si int
var sb bool
var sst string
var sm map[string]string
var siarr []int
// unmarshal `null` then marshal and print to stdout
fmt.Printf("int : %5v | %5s\n", si, marshal(si))
fmt.Printf("bool : %5v | %5s\n", sb, marshal(sb))
fmt.Printf("string : %5v | %5s\n", sst, marshal(sst))
fmt.Printf("map[string]string: %5v | %5s\n", sm, marshal(sm))
fmt.Printf("[]int : %v | %5s\n", siarr, marshal(siarr))
This will print the following:
int: 0 | 0
bool: false | false
string: | ""
map[string]string: map[] | null
[]int: [] | null
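The snippet above leaves out the unmarshalling step and the marshal helper. A minimal runnable sketch of the whole round trip could look like the following. The name roundTrip is a hypothetical helper of mine, not something from the original samples, and the variables start with non-zero values so the effect of null is visible.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTrip is a hypothetical helper: it unmarshals the JSON literal null
// into dst (which must be a non-nil pointer) and marshals the result back.
func roundTrip(dst any) string {
	if err := json.Unmarshal([]byte(`null`), dst); err != nil {
		return "error: " + err.Error()
	}
	out, err := json.Marshal(dst)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	// Non-zero starting values make the effect of null visible.
	si := 42
	sst := "hello"
	sm := map[string]string{"a": "b"}
	siarr := []int{1, 2, 3}

	fmt.Println("int:   ", roundTrip(&si))    // 42: null leaves an int untouched
	fmt.Println("string:", roundTrip(&sst))   // "hello": same for strings
	fmt.Println("map:   ", roundTrip(&sm))    // null: the map was reset to nil
	fmt.Println("slice: ", roundTrip(&siarr)) // null: the slice was reset to nil
	fmt.Println(sm == nil, siarr == nil)      // true true
}
```

So null is a no-op for ints, bools and strings, while maps and slices are reset to nil even if they held data before.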
Unmarshal seemingly doesn’t do much, but a subsequent Marshal packs each value back into a different JSON representation, with the map and the array being the exceptions that come back as null. The reason is that unmarshalling null sets maps, slices and pointers to nil (and has no effect on the other types), and a nil map or slice marshals to null. If I manually initialize the array and map (and don’t deserialize from JSON):
sm := map[string]string{}
siarr := []int{}
I get back the expected empty JSON values:
map[string]string: map[] | {}
[]int: [] | []
It’s not hard to see how this can quickly devolve into a wild bug hunt. There are of course remedies.
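One possible remedy, sketched here under my own assumptions, is to normalize nil slices and maps to empty ones right before marshalling. The helpers orEmptySlice, orEmptyMap and encode are hypothetical names introduced only for this illustration, and the sketch relies on generics (Go 1.18 or newer).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// orEmptySlice returns s, or an empty non-nil slice when s is nil.
func orEmptySlice[T any](s []T) []T {
	if s == nil {
		return []T{}
	}
	return s
}

// orEmptyMap returns m, or an empty non-nil map when m is nil.
func orEmptyMap[K comparable, V any](m map[K]V) map[K]V {
	if m == nil {
		return map[K]V{}
	}
	return m
}

// encode marshals v and returns the JSON as a string.
func encode(v any) string {
	out, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	var sm map[string]string // nil map
	var siarr []int          // nil slice

	fmt.Println(encode(sm), encode(siarr))                           // null null
	fmt.Println(encode(orEmptyMap(sm)), encode(orEmptySlice(siarr))) // {} []
}
```

This keeps the types pointer-free, at the cost of having to remember the normalization step everywhere a value is serialized.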
Making everything a pointer with omitempty allows for (possibly) fewer headaches. Pointers allow for a third state, with null being represented by a nil pointer.
Let’s check how this would work on a sample struct with the following definition:
type sampleCustomer struct {
	ID         *int               `json:"id,omitempty"`
	Email      *string            `json:"email,omitempty"`
	Archived   *bool              `json:"archived,omitempty"`
	Traits     *[]int             `json:"traits,omitempty"`
	Properties *map[string]string `json:"properties,omitempty"`
}
Now let’s un/marshal a few JSON payloads:
var c0 sampleCustomer
err := json.Unmarshal([]byte(`{"id":12345,"email":"[email protected]","archived":true,"traits":[5,4,3,2,1],"properties":{"a":"b"}}`), &c0)
var c1 sampleCustomer
err = json.Unmarshal([]byte(`{}`), &c1)
var c2 sampleCustomer
err = json.Unmarshal([]byte(`{"traits":[], "properties":{}}`), &c2)
fmt.Printf(`{"id":12345,"email":"[email protected]","archived":true,"traits":[5,4,3,2,1],"properties":{"a":"b"}} -> %s` + "\n", marshal(c0))
fmt.Printf(`{} -> %s` + "\n", marshal(c1))
fmt.Printf(`{"traits":[], "properties":{}} -> %s` + "\n", marshal(c2))
Running the above will output:
{"id":12345,"email":"[email protected]","archived":true,"traits":[5,4,3,2,1],"properties":{"a":"b"}} -> {"id":12345,"email":"[email protected]","archived":true,"traits":[5,4,3,2,1],"properties":{"a":"b"}}
{} -> {}
{"traits":[], "properties":{}} -> {"traits":[],"properties":{}}
Missing fields become nil when unmarshalled. Empty arrays and empty maps are nicely preserved.
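To see what the pointers buy, it helps to compare them with plain values under omitempty. The sketch below uses two hypothetical structs, byValue and byPointer, that are not part of the original samples: with plain values a legitimate 0 or false is dropped, while a pointer to 0 or false survives.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// byValue uses plain values, so omitempty drops zero values such as 0 and false.
type byValue struct {
	ID       int  `json:"id,omitempty"`
	Archived bool `json:"archived,omitempty"`
}

// byPointer uses pointers, so omitempty only drops nil pointers.
type byPointer struct {
	ID       *int  `json:"id,omitempty"`
	Archived *bool `json:"archived,omitempty"`
}

// encode marshals v and returns the JSON as a string.
func encode(v any) string {
	out, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	zero, no := 0, false

	fmt.Println(encode(byValue{ID: 0, Archived: false}))       // {}
	fmt.Println(encode(byPointer{ID: &zero, Archived: &no}))   // {"id":0,"archived":false}
	fmt.Println(encode(byPointer{}))                           // {}
}
```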
Unfortunately some edge cases remain:
var c3 sampleCustomer
err = json.Unmarshal([]byte(`{"traits":null, "properties":null}`), &c3)
fmt.Printf(`{"traits":null, "properties":null} -> %s` + "\n", marshal(c3))
What’s to be expected here? As a dev, I like predictable two-way mappings that come back to the starting point. In this case unmarshalling null leaves the pointers nil, so omitempty drops the fields when marshalling rather than preserving the nulls.
{"traits":null, "properties":null} -> {}
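If telling a missing key apart from an explicit null matters, one approach I’m aware of is a small wrapper type with its own UnmarshalJSON. The decoder calls UnmarshalJSON for null values too, but never for absent keys. The names optionalInts, customer and describe below are hypothetical and only serve this sketch; it covers the decoding side only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// optionalInts is a hypothetical wrapper that remembers how the key appeared.
type optionalInts struct {
	Set   bool  // the key was present in the payload
	Null  bool  // the key was present and its value was null
	Value []int // the decoded value, if any
}

// UnmarshalJSON is only invoked when the key exists, including for null.
func (o *optionalInts) UnmarshalJSON(data []byte) error {
	o.Set = true
	if string(data) == "null" {
		o.Null = true
		return nil
	}
	return json.Unmarshal(data, &o.Value)
}

type customer struct {
	Traits optionalInts `json:"traits"`
}

// describe reports which of the three states the traits key was in.
func describe(payload string) string {
	var c customer
	if err := json.Unmarshal([]byte(payload), &c); err != nil {
		return "error: " + err.Error()
	}
	switch {
	case !c.Traits.Set:
		return "missing"
	case c.Traits.Null:
		return "null"
	default:
		return fmt.Sprintf("value %v", c.Traits.Value)
	}
}

func main() {
	fmt.Println(describe(`{}`))                // missing
	fmt.Println(describe(`{"traits":null}`))   // null
	fmt.Println(describe(`{"traits":[]}`))     // value []
	fmt.Println(describe(`{"traits":[1,2]}`))  // value [1 2]
}
```

Getting the null back out on the way to JSON would additionally need a matching MarshalJSON, which this sketch leaves out.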
Regardless of the approach used there are pros and cons. I’m not personally a fan of pointers and avoid them if I can, as that reduces the amount of nil checking and Go panic attacks, but there are plenty of situations when they are the better choice.
Let’s take a break from nulls for the moment. JSON objects also have a bunch of interesting quirks.
The JSON specification doesn’t say much about object ordering. Go’s default implementation likes to reorder things upon marshalling.
The following two strings, when unmarshalled to sampleCustomer:
c1str := `{"id":12345,"email":"[email protected]","archived":true,"traits":[5,4,3,2,1],"properties":{"a":"b"}}`
c2str := `{"properties":{"a":"b"},"traits":[5,4,3,2,1],"id":12345,"email":"[email protected]","archived":true}`
output the same JSON when marshalled:
{"id":12345,"email":"[email protected]","archived":true,"traits":[5,4,3,2,1],"properties":{"a":"b"}}
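Internally the default serializer doesn’t keep the incoming order: struct fields are written in the order they are declared in the struct, and map keys are sorted. This is in a way nice but also something to keep in mind when comparing serialized output to a raw payload.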
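A short sketch of the ordering rules as I understand them for encoding/json: struct fields come out in declaration order and map keys come out sorted. Since byte-for-byte comparison is fragile either way, the sketch also shows one way to compare two payloads by content instead. The names point, encode and sameJSON are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// point declares Y before X on purpose to show declaration order wins.
type point struct {
	Y int `json:"y"`
	X int `json:"x"`
}

// encode marshals v and returns the JSON as a string.
func encode(v any) string {
	out, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

// sameJSON decodes both payloads into generic values and compares those,
// so key order and whitespace no longer matter.
func sameJSON(a, b string) (bool, error) {
	var va, vb any
	if err := json.Unmarshal([]byte(a), &va); err != nil {
		return false, err
	}
	if err := json.Unmarshal([]byte(b), &vb); err != nil {
		return false, err
	}
	return reflect.DeepEqual(va, vb), nil
}

func main() {
	fmt.Println(encode(point{Y: 2, X: 1}))               // {"y":2,"x":1}
	fmt.Println(encode(map[string]int{"y": 2, "x": 1})) // {"x":1,"y":2}

	same, err := sameJSON(`{"id":1,"traits":[5,4]}`, `{ "traits":[5,4], "id":1 }`)
	fmt.Println(same, err) // true <nil>
}
```

Array order still matters to sameJSON, which matches how JSON arrays are defined.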
Duplicate keys in objects aren’t specifically forbidden. This means the following string will deserialize:
c1str := `{"id":12345,"email":"[email protected]","id":67891,"email":"[email protected]"}`
and serialize back to:
{"id":67891,"email":"[email protected]"}
According to the specification, behavior in such cases is parser dependent. By default Unmarshal just picks the last value.
I’d prefer to see an error for such a case because that’s very likely what it is but library defaults are what they are.
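For anyone who does want that error, a rough sketch is to walk the payload with json.Decoder tokens before unmarshalling and reject repeated keys. The functions checkDuplicateKeys and walk are hypothetical helpers written for this post, not part of the standard library, and the extra pass costs a second read of the payload.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// checkDuplicateKeys returns an error if any object in payload repeats a key.
func checkDuplicateKeys(payload string) error {
	return walk(json.NewDecoder(strings.NewReader(payload)))
}

// walk consumes exactly one JSON value from dec, recursing into containers.
func walk(dec *json.Decoder) error {
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	delim, ok := tok.(json.Delim)
	if !ok {
		return nil // scalar value: string, number, bool or null
	}
	switch delim {
	case '{':
		seen := map[string]bool{}
		for dec.More() {
			keyTok, err := dec.Token()
			if err != nil {
				return err
			}
			key, ok := keyTok.(string)
			if !ok {
				return fmt.Errorf("unexpected object key %v", keyTok)
			}
			if seen[key] {
				return fmt.Errorf("duplicate key %q", key)
			}
			seen[key] = true
			if err := walk(dec); err != nil {
				return err
			}
		}
	case '[':
		for dec.More() {
			if err := walk(dec); err != nil {
				return err
			}
		}
	}
	_, err = dec.Token() // consume the closing } or ]
	return err
}

func main() {
	fmt.Println(checkDuplicateKeys(`{"id":12345,"id":67891}`))    // duplicate key "id"
	fmt.Println(checkDuplicateKeys(`{"id":12345,"traits":[1]}`))  // <nil>
}
```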
There are more cases and more combinations, such as deserializing into interface{} or deserializing arrays with elements of different types, but the ones I wrote about are the ones I found to be common. JSON/REST APIs are ubiquitous and understanding what the preferred JSON library does is essential to making well-behaved API endpoints.
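As a small taste of the interface{} case mentioned above, here is a sketch of what a mixed-type array turns into when decoded into generic values. The function names describe and kinds are hypothetical. The main thing to notice is that every JSON number becomes a float64.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// describe names the Go type that encoding/json picked for a generic value.
func describe(v any) string {
	switch x := v.(type) {
	case nil:
		return "null"
	case bool:
		return "bool"
	case float64:
		return "float64"
	case string:
		return "string"
	case []any:
		return fmt.Sprintf("array of %d", len(x))
	case map[string]any:
		return fmt.Sprintf("object with %d keys", len(x))
	default:
		return fmt.Sprintf("unexpected %T", x)
	}
}

// kinds decodes a JSON array into []any and describes each element.
func kinds(payload string) ([]string, error) {
	var items []any
	if err := json.Unmarshal([]byte(payload), &items); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(items))
	for _, item := range items {
		out = append(out, describe(item))
	}
	return out, nil
}

func main() {
	got, err := kinds(`[1, "two", true, null, [3, 4], {"k": "v"}]`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, k := range got {
		fmt.Println(k)
	}
}
```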
Full code samples are available on my GitHub