Irmin: September 2020 updateby the Irmin team on Sep 8th, 2020
This post will survey the latest design decisions and performance improvements
irmin-pack, the Irmin storage backend used by
Tezos. Tezos is an open-source blockchain technology,
written in OCaml, which uses many libraries from the MirageOS ecosystem. For
more context on the design of
irmin-pack and how it is optimised for the Tezos
use-case, you can check out our previous blog post.
This post showcases the improvements to
irmin-pack since its initial
deployment on Tezos:
- Faster read-only store instances
- Improved automatic flushing
- Staging generic serialisation operations
- More control over
- Clearing stores
Faster read-only store instances
The Tezos use-case of Irmin requires both read-only and read-write
store handles, with multiple readers and a single writer all accessing the same
Irmin store concurrently. These store handles are held by different processes
(with disjoint memory spaces) so the instances must use files on disk to
synchronise, ensuring that the readers never miss updates from the writer. The
writer instance automatically flushes its internal buffers to disk at regular
intervals, allowing the readers to regularly pick up
Until recently, each time a reader looked for a value – be it a commit, a node,
or a blob – it first checked if the writer had flushed new contents to disk. This
ensured that the readers always see the latest changes from the writer. However,
if the writer isn't actively modifying the regions being read, the readers make
one unnecessary system call per
find. The higher the rate of reads, the more
time is lost to these synchronisation points. This is particularly problematic
in two use-cases:
Taking snapshots of the store. Tezos supports exporting portable snapshots of the store data. Since this operation only reads historic data in the store (traversing backwards from a given block hash), it's never necessary to synchronise with the writer.
Bulk writes. It's sometimes necessary for the writer to dump lots of new data to disk at once (for instance, when adding a commit to the history). In these cases, any readers will repeatedly synchronise with the disk even though they don't need to do so until the bulk operation is complete. More on this in the coming months!
To better support these use-cases, we dropped the requirement for readers to
maintain strict consistency with the writer instance. Instead, readers can call
sync function only when they need to see the latest concurrent
updates from the writer instance.
In our benchmarks, there is a clear speed-up for
find operations from readers:
[RO] Find in random order with implicit syncs Total time: 67.276527 Operations per second: 148640.253086 Mbytes per second: 6.378948 Read amplification in syscalls: 3.919739 Read amplification in bytes: 63.943734 [RO] Find in random order with only one call to sync Total time: 40.817458 Operations per second: 244993.208543 Mbytes per second: 10.513968 Read amplification in syscalls: 0.919588 Read amplification in bytes: 63.258072
Not only it is faster, we can see also that fewer system calls are used in the
Read amplification in syscalls column. The benchmarks consists of reading
10,000,000 entries of 45 bytes each.
Relevant PRs: irmin #1008, index #175, index #198 and index #203.
Better flushing for the read-write instance
Irmin-pack uses an index to speed up
pack file is used to store pairs of
(key, value) and an
records the address in pack where a
key is stored. A read-write instance has
to write both the
index and the
pack file, for a read-only instance to find
a value. Moreover, the order in which the data is flushed to disk for the two
files is important: the address for the pair
(key, value) cannot be written
before the pair itself. Otherwise the read-only instance can read an address for
a non existing
(key, value) pair. But both
index have internal
buffers that accumulate data, in order to reduce the number of system calls, and
both decide arbitrarily when to flush those buffers to disk.
We introduce a
flush_callback argument in
index, which registers a callback
for whenever the index decides to flush.
irmin-pack uses this callback to flush
its pack file, resolving the issue of the dangling address.
Relevant PRs: index #189, index #216, irmin #1051.
Faster serialisation for
Irmin uses a library of generic operations: functions that take a runtime representation of a type and derive some operation on that type. These are used in many places to automatically derive encoders and decoders for our types, which are then used to move data to and from disk. For instance:
val decode : 'a t -> string -> 'a (** [decode t] is the binary decoder of values represented by [t]. *) (** Read an integer from a binary-encoded file. *) let int_of_file ~path = open_in_bin path |> input_line |> decode Irmin.Type.int32
decode takes a representation of the type
int32 and uses
this to select the right binary decoder. Unfortunately, we pay the cost of this
runtime specialisation every time we call
int_of_file. If we're invoking
the decoder for a particular type very often – such as when serialising store
values – it's more efficient to specialise
(** Specialised binary decoder for integers. *) let decode_int32 = decode Irmin.Type.int32 let int_of_file_fast ~path = open_in_bin path |> input_line |> decode_int32 contents
The question then becomes: how can we change
decode to encourage it to be
used in this more-efficient way? We can add a type wrapper – called
to prevent the user from passing two arguments to
decode at once:
module Staged : sig type +'a t val stage : 'a -> 'a t val unstage : 'a t -> 'a end val decode : 'a t -> (string -> 'a) Staged.t (** [decode t] needs to be explicitly unstaged before being used. *)
By forcing the user to add a
Staged.unstage type coercion when using this
function, we're encouraging them to hoist such operations out of their
(** The slow implementation no longer type-checks: *) let int_of_file ~path = open_in_bin path |> input_line |> decode Irmin.Type.int32 (* Error: This expression has type (string -> 'a) Staged.t * but an expression was expected of type string -> 'a *) (* Instead, we know to pull [Staged.t] values out of hot-loops: *) let decode_int32 = Staged.unstage (decode Irmin.Type.int32) let int_of_file_fast ~path = open_in_bin path |> input_line |> decode_int32 contents
We made similar changes to the performance-critical generic functions in
Irmin.Type, and observed significant performance improvements.
We also added benchmarks for serialising various types.
Relevant PRs: irmin #1030 and irmin #1028.
There are other interesting factors at play, such as altering
increase the efficiency of the specialised decoders; we leave this for a future
More control over
index regularly does a maintenance operation, called
merge, to ensure fast
look-ups while having a small memory imprint. This operation is concurrent with
the most of other functions, it is however not concurrent with itself: a second
merge needs to wait for a previous one to finish. When writing big chunks of
data very often,
merge operations become blocking. To help measuring and
detecting a blocking
merge, we added in the
index API calls to check whether
a merge is ongoing, and to time it.
We mentioned that
merge is concurrent with most of the other function in
index. One notable exception was
close, which had to wait for any ongoing
merge to finish, before closing the index. Now
close interrupts an ongoing
merge, but still leaves the index in a clean state.
Relevant PRs: index #185, irmin #1049 and index #215.
Another feature we recently added is the possibility to
clear the store. It is
implemented by removing the old files on disk and opening fresh ones. However
irmin-pack, the read-only instance has to detect that a clear occurred. To
do this, we add a
generation in the header of the files used by an
irmin-pack store, which is increased by the clear operation. A generation
change signals to the read-only instance that it needs to close the file and
open it again, to be able to read the latest values.
As the header of the files on disk changed with the addition of the clear
irmin-pack stores created previous to this change are no longer
supported. We added a migration function for stores created with the previous
version (version 1) to the new version (version 2) of the store. You can call
this migration function as follows:
let open_store () = Store.Repo.v config in Lwt.catch open_store (function | Irmin_pack.Unsupported_version `V1 -> Logs.app (fun l -> l "migrating store to version 2") ; Store.migrate config ; Logs.app (fun l -> l "migration ended, opening store") ; open_store () | exn -> Lwt.fail exn)
Relevant PRs: index #211, irmin #1047, irmin #1070 and irmin #1071.
We hope you've enjoyed this discussion of our recent work. Stay tuned for our next Tezos / MirageOS development update! Thanks to our commercial customers, users and open-source contributors for making this work possible.