Chopping the data: Intro to cursor paging

Projects generally go through a series of phases. Greenfield projects move fast trading sound engineering practices for speed of delivery. The ones that make it before funding runs out, run into a series of technical limitations that stem from rushed early decisions. One practice that I’ve seen universally is to load all data server side and leave frontend to deal with resulting mess. Initially this always works due to lack of data. Scaling up eventually exposes a hot mess of weird frontend hacks coupled with large memory footprint and bad performance on API side.

The alternative to loading the DB is called paging and in this post I want to show a variant of paging using cursors in Golang on top of Postgres DB.

For starters we’re going to need some data to query. I’ve come up with a simple schema that will contain a list of computer models:

CREATE TABLE public.computer (
  id SERIAL,
  name VARCHAR(255),
  CONSTRAINT computer_pk PRIMARY KEY (id)
);

Let’s insert some data into computer table (source: Wikipedia):

INSERT INTO public.computer (name) VALUES
  ('Commodore 64'),
  ('Altair 8800'),
  ('Apple I and also Replica 1'),
  ('Applix 1616') --,
  -- ...
  ('TEC-1'),
  ('Wave Mate Bullet');

In order to page through the table I’m going to make use of id column and use SQL LIMIT to limit amount of data per page. But first I need a struct to represent the Computer model in Go code:

type Computer struct {
  ID   int
  Name string
}

For pgxscan library this is enough to later derive database field names id and name.

Now that data and models are ready it’s time to write the core DB paging function:

type computer struct {
  db *pgx.Conn
}

func (c *computer) GetPaged(ctx context.Context, 
                            lastID int, 
                            limit int) ([]paging.Computer, error) {
  computers := []paging.Computer{}
  err := pgxscan.Select(ctx, c.db, &computers, 
    "SELECT id, name FROM computer 
     WHERE id > $1 
     ORDER BY id 
     LIMIT $2", lastID, limit)
    return computers, err
}

GetPaged will first be called with lastID set to 0 meaning it will return first limit count of computer models. Query sorts computers by id which guarantees an order so returned data is consistent. To get to the next page take the last computer id of previous page and call GetPaged again.

How do we know we’re at the last page? len() of returned computers will contain less than limit elements. In case last page is full, execution will stop on an empty page thereafter.

Here’s a bit of sample code that iterates over computer table in increments of 10:

func PrintAllComputers(ctx context.Context) {
  comp := sql.Computer()

  lastID := 0
  page := 0
  limit := 10
  for {
    pagedComputers, err := comp.GetPaged(ctx, lastID, limit)
    if err != nil {
      log.Info().Err(err).Msg("failed getting computers")
    }

    log.Info().Msgf("Paged computers - page: %d", page)
    for _, comp := range pagedComputers {
      log.Info().Msg(comp.String())
    }

    if len(pagedComputers) < limit {
      break
    }
    lastID = pagedComputers[len(pagedComputers)-1].ID
    page++
  }
}

As mentioned lastID := 0 sets up initial condition relying on serial Postgres column starting with values greater than 0). On each iteration I print out page info using my log library du jour zerolog.

Note the termination condition where code checks if this is the last page and sets next cursor value to last computer id. SQL query filters by id greater than lastID so there’s no need to think about off-by-ones:

    if len(pagedComputers) < limit {
      break
    }
    lastID = pagedComputers[len(pagedComputers)-1].ID

Executing the sample code with page size 10 yields the following output:

{"level":"info","time":1663100905,"message":"Paged computers - page: 0"}
{"level":"info","time":1663100905,"message":"id:   1 | name: Commodore 64"}
# ...some data omitted ...
{"level":"info","time":1663100905,"message":"id:  10 | name: Ferguson Big Board"}
{"level":"info","time":1663100905,"message":"Paged computers - page: 1"}
{"level":"info","time":1663100905,"message":"id:  11 | name: Galaksija"}
# ...some data omitted ...
{"level":"info","time":1663100905,"message":"id:  20 | name: Micro-Professor MPF-I"}
{"level":"info","time":1663100905,"message":"Paged computers - page: 2"}
{"level":"info","time":1663100905,"message":"id:  21 | name: Nascom 1 and Nascom 2"}
{"level":"info","time":1663100905,"message":"id:  22 | name: Newbear 77-68"}
{"level":"info","time":1663100905,"message":"id:  23 | name: Processor Technology SOL 20"}
{"level":"info","time":1663100905,"message":"id:  24 | name: PSI Comp 80 (computer)"}
{"level":"info","time":1663100905,"message":"id:  25 | name: SCELBI"}
{"level":"info","time":1663100905,"message":"id:  26 | name: Sinclair ZX80 kit"}
{"level":"info","time":1663100905,"message":"id:  27 | name: Tangerine MICROTAN 65"}
{"level":"info","time":1663100905,"message":"id:  28 | name: TEC-1"}
{"level":"info","time":1663100905,"message":"id:  29 | name: Wave Mate Bullet"}

Cursor paging is straightforward to implement and has a number of nice characteristics:

  • limits memory usage on the API side
  • queries execute efficiently on large data sets
  • data to load on frontend side resulting in lower latency and page loading times for end users
  • makes for good use of DB indices on primary keys (in our case id)

The downsides are that it does require a bit more thought about API design and frontend-backend interaction to make for a seamless dev experience. It’s also not always usable due to not having a natural cursor to use. The alternative is of course classic paging using LIMIT x OFFSET y.

Cursor paging is definitely worth the trouble and I hope to see it popup more frequently in emerging codebases.

For a full code sample check my GitHub.

Recent posts: