Recently I’ve been dabbling a lot with APIs and ran into a bunch of edge cases during JSON serialization that left me puzzled. To make sense of them I wrote a set of test cases that show how and why Go JSON de/serialization behaves the way it does.

I generally like JSON because it is readable and nice to write. However, it doesn’t really map one-to-one onto Go types, and when it does, it doesn’t necessarily map the way I’d like it to, which of course leads to non-obvious bugs. To make matters worse, there are a bunch of JSON libraries out there, each with its own preferred way of dealing with various edge cases, but in this post I’ll only look at the Go standard library.

To quote the JSON RFC: “JSON can represent four primitive types (strings, numbers, booleans, and null)”. I tend to avoid null in API design. There are usually better ways for a REST API to describe what’s up with the data than falling back to null. Go’s Unmarshal happily works with null though. Here’s what happens if I try to deserialize `null` into some basic types:

var si int 
var sb bool
var sst string
var sm map[string]string
var siarr []int
// unmarshal `null` then marshal and print to stdout
fmt.Printf("int              : %5v | %5s\n", si, marshal(si))
fmt.Printf("bool             : %5v | %5s\n", sb, marshal(sb))
fmt.Printf("string           : %5v | %5s\n", sst, marshal(sst))
fmt.Printf("map[string]string: %5v | %5s\n", sm, marshal(sm))
fmt.Printf("[]int            : %v  | %5s\n", siarr, marshal(siarr))

This will print the following:

int              :     0 |     0
bool             : false | false
string           :       |    ""
map[string]string: map[] |  null
[]int            : []  |  null

Unmarshal seemingly doesn’t do much, but the subsequent Marshal packs each value back into a different JSON representation, with the map and the slice being exceptions that come back as null. The reason is that unmarshalling null sets a pointer, slice, or map to nil, and has no effect on the other types, which simply keep their zero values. A nil slice or map marshals to null. If I manually initialize the slice and map (and don’t deserialize from JSON):

sm := map[string]string{}
siarr := []int{} // an initialized, empty slice; `var siarr []int` would still be nil

I get back the expected empty JSON values:

map[string]string: map[]    {}
[]int:                []    []
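
To make the nil versus empty distinction easy to reproduce, here is a minimal self-contained sketch. The `mustMarshal` helper is my own stand-in for the `marshal` helper used in the snippets above (whose definition isn’t shown here), so treat its name and behavior as assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mustMarshal is a hypothetical helper: it marshals v and panics on error.
func mustMarshal(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	var i int
	var m map[string]string
	var s []int

	// Unmarshal the JSON literal null into each destination.
	for _, dst := range []interface{}{&i, &m, &s} {
		if err := json.Unmarshal([]byte(`null`), dst); err != nil {
			panic(err)
		}
	}

	// The int keeps its zero value; the map and slice are nil.
	fmt.Println(mustMarshal(i), mustMarshal(m), mustMarshal(s))
	fmt.Println(m == nil, s == nil)

	// Explicitly initialized (empty, non-nil) values marshal differently.
	fmt.Println(mustMarshal(map[string]string{}), mustMarshal([]int{}))
}
```

The point of the sketch is only that the JSON output depends on whether the slice or map is nil, not on whether it has elements.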

It’s not hard to see how this can quickly devolve into a wild bug hunt. There are of course remedies. Making everything a pointer with omitempty allows for (possibly) fewer headaches. Pointers allow for a third state, with null being represented by a nil pointer. Let’s check how this would work on a sample struct with the following definition:

type sampleCustomer struct {
	ID         *int               `json:"id,omitempty"`
	Email      *string            `json:"email,omitempty"`
	Archived   *bool              `json:"archived,omitempty"`
	Traits     *[]int             `json:"traits,omitempty"`
	Properties *map[string]string `json:"properties,omitempty"`
}

Now let’s un/marshal a few JSON payloads:

var c0 sampleCustomer
err := json.Unmarshal([]byte(`{"id":12345,"email":"[email protected]","archived":true,"traits":[5,4,3,2,1],"properties":{"a":"b"}}`), &c0)

var c1 sampleCustomer
err = json.Unmarshal([]byte(`{}`), &c1)

var c2 sampleCustomer
err = json.Unmarshal([]byte(`{"traits":[], "properties":{}}`), &c2)

fmt.Printf(`{"id":12345,"email":"[email protected]","archived":true,"traits":[5,4,3,2,1],"properties":{"a":"b"}} -> %s` + "\n", marshal(c0))
fmt.Printf(`{}                                 -> %s` + "\n", marshal(c1))
fmt.Printf(`{"traits":[], "properties":{}}     -> %s` + "\n", marshal(c2))

Running the above will output:

{"id":12345,"email":"[email protected]","archived":true,"traits":[5,4,3,2,1],"properties":{"a":"b"}} -> {"id":12345,"email":"[email protected]","archived":true,"traits":[5,4,3,2,1],"properties":{"a":"b"}}
{}                                 -> {}
{"traits":[], "properties":{}}     -> {"traits":[],"properties":{}}

Missing fields become nil when unmarshalled. Empty arrays and empty maps are nicely preserved. Unfortunately some edge cases remain:

var c3 sampleCustomer
err = json.Unmarshal([]byte(`{"traits":null, "properties":null}`), &c3)
fmt.Printf(`{"traits":null, "properties":null} -> %s` + "\n", marshal(c3))

What’s to be expected here? As a dev, I like predictable two-way mappings that come back to the starting point. In this case, marshalling omits the nulls rather than preserving them, because a null unmarshals to a nil pointer and omitempty drops nil pointers:

{"traits":null, "properties":null} -> {}
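
One way to get the nulls back, sketched below, is to drop omitempty from the pointer fields. The `nullableCustomer` type and the `roundTrip` helper are hypothetical names introduced only for this illustration. The trade-off is that a missing field now also marshals as null, so the two cases become indistinguishable in the output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nullableCustomer is a hypothetical variant of sampleCustomer
// whose pointer fields are NOT tagged with omitempty.
type nullableCustomer struct {
	Traits     *[]int             `json:"traits"`
	Properties *map[string]string `json:"properties"`
}

// roundTrip unmarshals payload into a nullableCustomer and marshals it back.
func roundTrip(payload string) (string, error) {
	var c nullableCustomer
	if err := json.Unmarshal([]byte(payload), &c); err != nil {
		return "", err
	}
	out, err := json.Marshal(c)
	return string(out), err
}

func main() {
	payloads := []string{
		`{"traits":null, "properties":null}`,
		`{}`,
		`{"traits":[], "properties":{}}`,
	}
	for _, p := range payloads {
		out, err := roundTrip(p)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-34s -> %s\n", p, out)
	}
}
```

With this variant both the first and the second payload should come back as `{"traits":null,"properties":null}`, which is exactly the ambiguity that omitempty was hiding.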

Regardless of the approach used, there are pros and cons. I’m personally not a fan of pointers and avoid them if I can, as that reduces the amount of nil checking and Go panic attacks, but there are plenty of situations where they are the better choice.

Let’s take a break from nulls for a moment. JSON objects have a few interesting quirks of their own. The JSON specification defines an object as an unordered collection of name/value pairs, so key order carries no meaning. Go’s default implementation may reorder keys when marshalling. The following two strings, when unmarshalled to sampleCustomer:

c1str := `{"id":12345,"email":"[email protected]","archived":true,"traits":[5,4,3,2,1],"properties":{"a":"b"}}`
c2str := `{"properties":{"a":"b"},"traits":[5,4,3,2,1],"id":12345,"email":"[email protected]","archived":true}`

output the same JSON when marshalled:

{"id":12345,"email":"[email protected]","archived":true,"traits":[5,4,3,2,1],"properties":{"a":"b"}}

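Internally, the default serializer writes struct fields in the order they are declared in the struct, regardless of their order in the input (map keys, by contrast, are sorted). This is nice in a way, but it is something to keep in mind when comparing serialized output to a raw payload.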
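
A small sketch of the two ordering rules follows. The `pair` type and the `remarshal` helper are hypothetical and exist only for this example; the struct deliberately declares `b` before `a` so that declaration order and alphabetical order differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pair is a hypothetical struct that declares B before A on purpose.
type pair struct {
	B int `json:"b"`
	A int `json:"a"`
}

// remarshal decodes payload into dst (a pointer) and encodes dst again.
func remarshal(payload string, dst interface{}) (string, error) {
	if err := json.Unmarshal([]byte(payload), dst); err != nil {
		return "", err
	}
	out, err := json.Marshal(dst)
	return string(out), err
}

func main() {
	var p pair
	var m map[string]interface{}

	// Struct: output follows field declaration order (b, then a).
	s, err := remarshal(`{"a":2,"b":1}`, &p)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)

	// Map: output keys are sorted (a, then b).
	s, err = remarshal(`{"b":1,"a":2}`, &m)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Either way the output order is independent of the input order, so byte-for-byte comparison against the raw payload is unreliable.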

Duplicate keys in objects aren’t specifically forbidden. This means the following string will deserialize:

c1str := `{"id":12345,"email":"[email protected]","id":67891,"email":"[email protected]"}`

and serialize back to:

{"id":67891,"email":"[email protected]"}

According to the specification, behavior in such cases is parser dependent. By default, Unmarshal just keeps the last value. I’d prefer to see an error in such a case, because that’s very likely what it is, but library defaults are what they are.
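
If an error is what you want, one possible approach is to walk the payload with the streaming `json.Decoder` before unmarshalling it. The sketch below is only an illustration under stated assumptions: `checkDuplicateKeys` is a hypothetical helper, it inspects the top-level object only, and nested objects would need a recursive version.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// checkDuplicateKeys (hypothetical helper) returns an error if the
// top-level JSON object in data repeats a key. Nested objects are skipped.
func checkDuplicateKeys(data []byte) error {
	dec := json.NewDecoder(bytes.NewReader(data))

	tok, err := dec.Token()
	if err != nil {
		return err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return fmt.Errorf("expected a JSON object")
	}

	seen := map[string]bool{}
	for dec.More() {
		tok, err := dec.Token() // next object key
		if err != nil {
			return err
		}
		key, ok := tok.(string)
		if !ok {
			return fmt.Errorf("unexpected token %v", tok)
		}
		if seen[key] {
			return fmt.Errorf("duplicate key %q", key)
		}
		seen[key] = true

		// Consume the value without interpreting it.
		var skip json.RawMessage
		if err := dec.Decode(&skip); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fmt.Println(checkDuplicateKeys([]byte(`{"id":12345,"id":67891}`)))
	fmt.Println(checkDuplicateKeys([]byte(`{"id":12345,"archived":true}`)))
}
```

This costs a second pass over the payload, so it is probably only worth it on endpoints where silently accepting the last value would be a real problem.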

There are more cases and more combinations, such as deserializing into interface{} or deserializing arrays with elements of different types, but the ones I wrote about are the ones I found to be common. JSON/REST APIs are ubiquitous, and understanding what your preferred JSON library does is essential to building well-behaved API endpoints.

Full code samples are available on my GitHub