The four replica identities in PostgreSQL: a practical example

Logical replication is a PostgreSQL feature that produces information about database writes and stores it in a write-ahead log, enabling a consumer to subscribe to these changes and consume them in real-time. A replica identity determines what information is published when an UPDATE or DELETE occurs on a row. It has four different options that can be configured: DEFAULT, FULL, NOTHING, and USING INDEX. The behaviour of these four replica identity options are detailed in the PostgreSQL documentation:

This form changes the information which is written to the write-ahead log to identify rows which are updated or deleted. This option has no effect except when logical replication is in use. DEFAULT (the default for non-system tables) records the old values of the columns of the primary key, if any. USING INDEX records the old values of the columns covered by the named index, which must be unique, not partial, not deferrable, and include only columns marked NOT NULLFULL records the old values of all columns in the row. NOTHING records no information about the old row. (This is the default for system tables.) In all cases, no old values are logged unless at least one of the columns that would be logged differs between the old and new versions of the row.

This is a complete description of the different behaviours for the four replica identity options, but to better illustrate the behaviour, let’s look at a practical example below.

Table definition and database write

The table we’re writing to for this example, users, is defined like so1:

CREATE TABLE users (
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  address VARCHAR (255) NOT NULL,
  age INTEGER NOT NULL,
  PRIMARY KEY (first_name, last_name)
);

Our initial state of this table starts with a single row – a user with a first_name of John, last_name of Smith, address of 123 Fake Street, and age of 30.

We’ll examine the different behaviours for our different replica identities with the following database write, which the user’s first name from John to James:

UPDATE users SET first_name = 'James' WHERE first_name = 'John' AND last_name = 'Smith';

With that context set, let’s look at what is actually written into our write-ahead log for each different replica identity.

DEFAULT replica identity

For the DEFAULT replica identity, we see the following information written to the write-ahead log:

table public.users:
  UPDATE:
    old-key:
      first_name[character varying]:'John'
      last_name[character varying]:'Smith'
    new-tuple:
      first_name[character varying]:'James'
      last_name[character varying]:'Smith'
      address[character varying]:'123 Fake Street'
      age[integer]:30

The DEFAULT replica identity records the old values of the complete primary key, even though we only wrote to one column of our primary key.

FULL replica identity

We receive the old values for every column in the database when we specify FULL:

table public.users:
  UPDATE:
    old-key:
      first_name[character varying]:'John'
      last_name[character varying]:'Smith'
      address[character varying]:'123 Fake Street'
      age[integer]:30
    new-tuple:
      first_name[character varying]:'James'
      last_name[character varying]:'Smith'
      address[character varying]:'123 Fake Street'
      age[integer]:30

NOTHING replica identity

NOTHING only provides us with the new values (ie. provides no old information):

table public.users:
  UPDATE:
    first_name[character varying]:'James'
    last_name[character varying]:'Smith'
    address[character varying]:'123 Fake Street'
    age[integer]:30

USING INDEX replica identity

The behaviour of the USING INDEX replica identity depends on what columns our index is defined on. For example, if we used the index users_age (ie. an index on the age column), we’d see the following information:

table public.users:
  UPDATE:
    first_name[character varying]:'James'
    last_name[character varying]:'Smith'
    address[character varying]:'123 Fake Street'
    age[integer]:30

We didn’t update age, so this behaviour is identical to the NOTHING identity in this case. However, what if if we used the index users_first_name (ie. an index on the first_name column) instead?

table public.users:
  UPDATE:
    old-key:
      first_name[character varying]:'John'
    new-tuple:
      first_name[character varying]:'James'
      last_name[character varying]:'Smith'
      address[character varying]:'123 Fake Street'
      age[integer]:30

We see the previous information on first_name because users_first_name indexes on this column. If we defined an index called users_first_name_last_name on the first_name and last_name column, the behaviour would be identical to a DEFAULT replica identity:

table public.users:
  UPDATE:
    old-key:
      first_name[character varying]:'John'
      last_name[character varying]:'Smith'
    new-tuple:
      first_name[character varying]:'James'
      last_name[character varying]:'Smith'
      address[character varying]:'123 Fake Street'
      age[integer]:30

Conclusion

To summarize, replica identity lets us choose which columns we want to publish the old values for when we do an UPDATE or DELETE operation:

  • DEFAULT specifies the columns specifying the primary key
  • FULL specifies all columns
  • NONE specifies no columns
  • USING INDEX specifies columns that the index is defined on

However, we always receive complete information about the new value of the row, regardless of replica identity.

Start your journey towards writing better software, and watch this space for new content.

1: Using a first name and last name a primary key isn’t a great idea for anything but a toy data model – there are real life examples of multiple people with the same first and last name! However, it lets us define a (semi-believable) primary key that’s a composite of two columns, which is important when describing the DEFAULT replica identity.