Internet-Draft ERIS May 2022
Renberg [Page]
E. Renberg

Encoding for Robust Immutable Storage (ERIS)


This document describes the Encoding for Robust Immutable Storage (ERIS). ERIS is an encoding of arbitrary content into a set of uniformly sized, encrypted and content-addressed blocks as well as a short identifier that can be encoded as an URN. The content can be reassembled from the blocks only with this identifier. The encoding is defined independent of any storage and transport layer or any specific application. We illustrate how ERIS can be used as to build robust and decentralized applications.

WARNING: This specification is not yet stable or released. We are working towards a stable 1.0.0 release. Join the mailing list for updates.

Table of Contents

1. Introduction

Unavailability of content on computer networks is a major cause for reduced reliability of networked services [Polleres20].

Availability can be increased by caching content on multiple peers. However most content on the Internet is identified by its location. Caching location-addressed content is complicated as the content receives a new location when cached.

An alternative to identifying content by its location is to identify content by its content itself. This is called content-addressing. The hash of some content is computed and used as an unique identifier for the content.

Caching content-addressed content and making it available redundantly is much easier as the content is completely decoupled from any physical location. Integrity of content is automatically ensured with content-addressing (when using a cryptographic hash) as the identifier of the content can be recomputed to check that the content matches the requested identifier.

However, naive content-addressing has certain drawbacks:

ERIS addresses these issues by splitting content into small uniformly sized and encrypted blocks. These blocks can be reassembled to the original content only with access to a short read capability, which can be encoded as an URN.

Encodings similar to ERIS are already widely-used in applications and protocols such as GNUNet (see Section 1.3), BitTorrent, Freenet [Freenet] and others. However, they all use slightly different encodings that are tied to the respective protocols and applications. ERIS defines an encoding independent of any specific protocol or application and decouples content from transport and storage layers. ERIS may be seen as a modest step towards Information-Centric Networking [RFC7927].

1.1. Objectives

The objectives of ERIS are:

Content encoded with ERIS can be easily replicated and cached to increase the availability of content.
Data integrity
Integrity of content is verified while decoding content from blocks.
Intermediary Peer Deniability
Intermediary peers, who are storing and transporting encoded blocks without access to a read capability, can claim that decrypting encoded content is infeasible for them.
Censorship Resistance
An adversary who does not have access to a read capability of some encoded content can not selectively block access to the content without blocking access to all content.
Deterministic Identifiers
The read capability can be used as a deterministic identifier of the encoded content.
URN reference
ERIS encoded content can be referenced with a single URN (the encoded read capability).
Storage efficiency
ERIS can be used to encode small content (< 1 kibibyte) as well as large content (> many gibibyte) with reasonable storage overhead.
The encoding should be as simple as possible in order to allow correct implementation on various platforms and in various languages.

Confidentiality is not an objective of ERIS and ERIS SHOULD NOT be used to ensure that content is kept secret from an adversary.

See Section 4 for security considerations.

1.2. Scope

ERIS describes how arbitrary content (sequence of bytes) can be encoded into a set of uniformly sized blocks and an identifier with which the content can be decoded from the set of blocks.

ERIS does not prescribe how the blocks should be stored or transported over network. The only requirement is that a block can be referenced and accessed (if available) by the hash value of the contents of the block. In Section 3.1 we show how existing technology can be used to store and transport blocks.

There is also no support for grouping content or mutating content. In Section 3.3 we describe how such functionality can be implemented on top of ERIS.

ERIS is an attempt to find a minimal common basis on which higher functionality, such as mutability, can be built.

1.3. Previous Work

ERIS is inspired and based on the encoding used in the file-sharing application of GNUNet - Encoding for Censorship-Resistant Sharing (ECRS) [ECRS].

ERIS differs from ECRS in following points:

Cryptographic primitives
ECRS itself does not specify any cryptographic primitives. The GNUNet implementation uses the SHA-512 hash and AES cipher. ERIS uses the Blake2b-256 cryptographic hash [RFC7693] and the ChaCha20 stream cipher [RFC8439]. This improves performance, storage efficiency (as hash references are smaller) and allows a convergence secret to be used (via Blake2b keyed hashing; see Section 2.3).
Block size
ECRS uses a fixed block size of 32 KiB. This can be very inefficient when encoding many small pieces of content. ERIS allows a block size of 1 KiB or 32 KiB, allowing efficient encoding of small and large content (see Section 2.2).
ECRS does not specify an URN for referring to encoded content (this is specified as part of the GNUNet file-sharing application). ERIS specifies an URN for encoded content regardless of encoding application or storage and transport layer (see Section 2.7).
ECRS defines two mechanisms for grouping and discovering encoded content (SBlock and KBlock). ERIS does not specify any such mechanisms (see Section 3.3).

Other related projects include Tahoe-LAFS, Freenet and Datashards. The reader is referred to the ECRS paper [ECRS] for an in-depth explanation and comparison of related projects.

1.4. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

We use binary prefixes for multiples of bytes, i.e: 1024 bytes is 1 kibibyte (KiB), 1024 kibibytes is 1 mebibyte (MiB) and 1024 mebibytes is 1 gigibytes (GiB).

2. Specification of ERIS

2.1. Cryptographic Primitives

The cryptographic primitives used by ERIS are a cryptographic hash function, a symmetric key cipher and a padding algorithm. The hash function and cipher are readily available in open-source libraries such as libsodium or Monocypher. The padding algorithm can be implemented with reasonable effort.

2.1.1. Cryptographic Hash Function

We use Blake2b [RFC7693] with output size of 256 bit (32 byte). The keying feature is used and we refer to the key used for keying Blake2b as the hashing key. The hashing key always has a size of 256 bit (32 byte) (see Section 2.3).

The functions provided are Blake2b-256-Keyed(INPUT, HASHING-KEY) for keyed hashing and Blake2b-256(INPUT) for unkeyed hashing.

2.1.2. Symmetric Key Cipher

We use the ChaCha20 (IETF variant) [RFC8439] stream cypher. The provided function is ChaCha20(INPUT, KEY, NONCE), where INPUT is an arbitrary length byte sequence, KEY is the 256 bit encryption key and NONCE is the 96 bit nonce. The output is the encrypted byte sequence.

The 32 bit initial counter is set to null.

Decryption is done with the same function where INPUT is the encrypted byte sequence.

2.1.3. Padding Algorithm

We use a byte padding scheme to ensure that input content size is a multiple of a block size. The procedures Pad and Unpad, as described below, provide the necessary functionality

The padding scheme used is the same as the one implemented in the libsodium cryptographic library. Pad

The procedure Pad(input, block-size) given input of length n adds a mandatory byte valued 0x80 (hexadecimal) to input followed by m < block-size bytes valued 0x00 such that n + m + 1 is the smallest multiple of block-size.

Pad(input, block-size):
    n := Length(input)

    # the final modulo block-size ensures that if (n + 1) modulo block-size = 0 no unnecessary zeroes are added.
    m := (block-size - (n + 1 modulo block-size)) modulo block-size

    # add mandatory 0x80 followed by m bytes of zeroe (0x00)
    padded := input ++ [ 0x80 ] ++ ZeroBytes(m)

    assert (Length(padded) modulo block-size = 0)

    return padded
Figure 1: Pad procedure Unpad

The procedure Unpad(input, block-size) starts reading bytes from the end of input until a 0x80 is read and then returns bytes of input before the 0x80. The procedure throws an error if a value other than 0x00 is read before reading 0x80, if no 0x80 is read after reading block-size bytes from the end of input or if length of input is less than block-size.

Unpad(input, block-size):

    # ensure that input is at least as large as block-size
    if Length(input) < block-size:
        throw "invalid padding"

    n := Length(input)

    for i in Range(0, block-size):

        # read the ith byte from the end of input
        byte = input[n - i]

        if byte = 0x80:
            # special marker is reached, return everyting before it
            return input[0:(n-i)]
        else if byte = 0x00:
            # continue with next byte from right
           # padding must be 0x00 or 0x80
           throw "invalid padding"

    # no 0x80 has been read after reading block-size bytes from the right of input
    throw "invalid padding"
Figure 2: Unpad procedure

Implementations MUST check that padding is valid when unpadding. This is verified in the negative test vectors (see Section 5.2).

2.2. Block Size

ERIS uses two block sizes: 1KiB (1024 bytes) and 32KiB (32768 bytes). The block size must be specified when encoding content.

Both block sizes can be used to encode content of arbitrary size. The block size of 1KiB is an optimization towards smaller content.

The block size is encoded in the read capability and the decoding process is capable of handling both block sizes.

Implementations MUST support encoding and decoding content with both block sizes (1KiB and 32KiB).

2.2.1. Recommendation on Block Size Choice

Applications are RECOMMENDED to use a block size of 1KiB for content smaller than 16KiB and a block size of 32KiB for larger content.

When using block size 32KiB to encode content smaller than 1KiB, the content will be encoded in a 32KiB block. This is a storage overhead of over 3100%. When encoding very many pieces of small content (e.g. short messages or cartographic nodes) this overhead might not be acceptable. On the other hand, using block size 1KiB to encode large content is also not efficient, as the content is split into many small 1KiB blocks and must be reassembled using internal nodes (see Section 2.4.3). When encoding larger content it is more efficient to use a block size of 32KiB. Using 16KiB as a breaking point is reasonable for most applications.

Note that the best choice of block size may depend on other factors such as number of round-trips to the storage layer. Content larger than 1KiB encoded with block size 1KiB will always be encoded in multiple levels, requiring multiple calls to a storage layer. For certain applications it might be better to minimize the number of calls to the storage layer at the cost of higher storage overhead.

In other applications the size of the content to be encoded might not be known when encoding starts and block size must be chosen (see Section 2.4.4). In such cases applications should use appropriate heuristics.

2.3. Convergence Secret

Using the hash of the content as key is called convergent encryption.

Because the hash of the content is deterministically computed from the content, the key will be the same when the same content is encoded twice. This results in de-duplication of content, as well as deterministic identifiers. Both are useful properties for certain applications.

However, convergent encryption suffers from two known attacks that allow adversaries to either confirm the presence of some encoded content or even learn the content when parts are predictable (see Section 4.6 for details). A solution to both attacks is to use a convergence secret.

ERIS allows a 32 byte convergence secret to be specified when encoding some content. Using different convergence secrets to encode the same content will result in different blocks and different read capabilities. This prevents deterministic identifiers and de-duplication, but allows a slightly stronger form of censorship resistance (see Section 4.5).

A group using a shared convergence secret can benefit from the advantages of convergent encryption (de-duplication and deterministic identifiers) while being safe against certain attacks from adversaries that do not know the convergence secret.

The convergence secret only needs to be provided during encoding. The content can be decoded from blocks without access to the convergence secret (see Section 2.5).

If no convergence secret is specified a null convergence secret MUST be used (32 bytes of zeroes).

The convergence secret is implemented as the keying feature of the Blake2 cryptographic hash [RFC7693].

2.4. Encoding

Inputs to the encoding process are:

An arbitrary length byte sequence of content to be encoded.
A 256 bit (32 byte) byte sequence (see Section 2.3).
The block size used for encoding in bytes can be either 1024 (1KiB) or 32768 (32KiB) (see Section 2.2).

The output of the encoding process is a set of uniformly sized blocks and a read capability.

The encoding process constructs a tree of uniformly sized nodes. Leaf nodes at the bottom of the tree (level 0) correspond to the input content. Internal nodes (not leaf nodes) consist of references to nodes at lower levels .

The maximum number of references to nodes of a lower level in a node is called the arity of the tree and depends on the block size. For block size 1KiB the arity of the tree is 16, for block size 32KiB the arity is 512.

A simplified view of a tree constructed by encoding some content that is split into four leaf nodes is given in Figure 3. For illustration purposes the tree is of arity 2 (instead of 16 or 512).

read capability Level 2 Level 1 Level 0 root node node node leaf node leaf node leaf node leaf node
Figure 3: A tree of content that is split into four leaf nodes

Nodes are unencrypted elements of the tree . The encoding process output blocks which are encrypted nodes .

As nodes are always stored as encrypted blocks, it is necessary to have the reference to the encrypted block as well as the encryption key to decrypt the node from the block. A reference to a node is a pair consisting of a reference to an encrypted block and the key to decrypt the block. We call such a pair a reference-key pair. The reference is the unkeyed Blake2b hash of the encrypted block (32 bytes) and the key is the ChaCha20 key to decrypt the block (32 bytes). The concatenation of a reference-key pair is 64 bytes long (512 bits).

Internal nodes of the encoding tree (i.e. not leaf nodes) are concatenations of reference-key pairs (see Section 2.4.3).

The block size, the level of the root node and the reference-key pair to the root node (root reference and root key) are the necessary pieces of information required to decode content (see Section 2.5). The tuple consisting of block size, level, root reference and key is called the read capability.

The steps of the encoding process are:

  1. Split content into leaf nodes (see Section 2.4.1).
  2. Encrypt leaf nodes into blocks and reference-key pairs (see Section 2.4.2).
  3. Recursively construct tree by making nodes from reference-key pairs to nodes at lower level (see Section 2.4.3) and encrypting nodes to blocks and reference-key pairs (see Section 2.4.2) until there is a single root node.
  4. Construct the read capability from the tree level, used block size and reference-key pair to the root node.

A pseudocode implementation of the encoding process:

ERIS-Encode(content, convergence-secret, block-size):
    # initialize empty set of blocks to be output
    blocks := {}

    # split content into leaf nodes
    leaf-nodes := Split-Content(content, block-size)

    # initialize a list of reference-key pairs
    reference-key-pairs := []

    # initialize level to 0
    level := 0

    # encrypt leaf nodes to blocks and reference-key pairs
    for leaf-node in leaf-nodes:
        block, reference, key := Encrypt-Node(leaf-node, level, convergence-secret)

        # add block to set of blocks to output
        blocks := blocks ∪ { block }

        # add reference-key pair to list of reference-key pairs
        reference-key-pairs := reference-key-pairs ++ [ (reference, key) ]

    # we can be sure that there is at least a single reference-key pair
    assert Length(reference-key-pairs) >= 1

    # Construct higher levels until there is a single reference-key pair
    while Length(reference-key-pairs) > 1:

        # increment level
        level := level + 1

        # construct list of nodes at current level
        nodes := Construct-Internal-Nodes(reference-key-pairs, level, convergence-secret)

        # clear the reference-key pairs
        reference-key-pairs := []

        # encrypt nodes to blocks and reference-key pairs
        for node in nodes:
            block, reference, key := Encrypt-Node(node, level, convergence-secret)

            # add block to set of blocks to output
            blocks := blocks ∪ { block }

            # add reference-key pair to list of reference-key pairs
            reference-key-pairs := reference-key-pairs ++ [ (reference, key) ]

    # after constructing tree there is a single reference key pair
    assert Length(reference-key-pairs) = 1

    # get the root reference-key pair (pointing to the root node)
    root-reference, root-key := reference-key-pairs[0]

    # construct the read capability
    read-capability := block-size, level, root-reference, root-key

    return read-capability, blocks
Figure 4: ERIS-Encode procedure

The sub-processes Split-Content, Encrypt-Leaf-Node, Encrypt-Internal-Node and Construct-Internal-Nodes are explained in the following sections.

2.4.1. Split Content into Leaf Nodes

The input content is split into uniformly sized blocks of size block-size. In order that this is always possible, the content is first padded.

0x80 0x00 ... 0x00 content padding leaf node 0 leaf node 1 leaf node 2
Figure 5: Content padded and split into three leaf nodes

A pseudo code implementation:

Split-Content(content, block-size):
    # pad content
    padded := Pad(content, block-size)

    # length of padded content is a multiple of block size
    assert Length(padded) mod block-size = 0

    # split padded content
    leaf-nodes := split padded into pieces of size exactly block-size

    return leaf-nodes
Figure 6: Split-Content procedure

Note that when the length of the content is a multiple of the block size, then an entire leaf node will be created with padding. This is necessary as we do not encode the length of the content but require mandatory padding. Figure 7 illustrates such a case.

0x80 0x00 ... 0x00 content padding leaf node 0 leaf node 1 leaf node 2
Figure 7: Last leaf node is entirely padding

2.4.2. Encrypt Node

Unencrypted nodes (leaf nodes or internal nodes) are encrypted into blocks. We use two slightly different procedures to encrypt leaf nodes and internal nodes. Leaf nodes use the convergence secret whereas internal nodes do not.

The Encrypt-Leaf-Node procedure is used to encrypt leaf nodes (level 0). It takes an unencrypted leaf node and the convergence secret and returns the encrypted block as well as a reference-key pair to the block:

Encrypt-Leaf-Node(node, convergence-secret):
    # use the keyed Blake2b hash to compute the encryption key
    key := Blake2b-256-Keyed(node, convergence-secret)

    # nonce is 12 bytes of 0
    nonce := [0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0]

    # encrypt node to block
    block := ChaCha20(node, key, nonce)

    # compute reference to encrypted block using unkeyed Blake2b
    reference := Blake2b-256(block)

    return block, reference, key
Figure 8: Encrypt-Leaf-Node procedure

The Encrypt-Internal-Node procedure is used to encrypt internal nodes (level 1 and above). It takes an unencrypted node and the level of the node as input and returns the encrypted block as well as a reference-key pair to the block:

Encrypt-Internal-Node(node, level):
    # level is always larger than 0
    assert level > 0

    # use the unkeyed Blake2b hash to compute the encryption key
    key := Blake2b-256(node)

    # first 11 bytes of nonce are 0, last byte is the level of the node
    nonce := [0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; level]

    # encrypt node to block
    block := ChaCha20(node, key, nonce)

    # compute reference to encrypted block using unkeyed Blake2b
    reference := Blake2b-256(block)

    return block, reference, key
Figure 9: Encrypt-Internal-Node procedure

For both Encrypt-Leaf-Node and Encrypt-Internal-Node the ChaCha20 nonce is set to 11 bytes of null (zero) followed by a single byte set to the level of the block (for Encrypt-Leaf-Node the level is 0). By doing this we have implicitly encoded the level of the block in the encrypted block. This prevents certain vulnerabilities where the level of a block is confused.

Note that we reuse the same nonce for all blocks at a certain level. This is safe as we always use different keys for different input content.

The convergence secret is only used to compute the encryption key of leaf nodes. This is a minor performance optimization and allows the read capability to be strongly verified in certain cases (see Section 2.5, Paragraph 4).

Note that the convergence secret is not used to compute the reference to the encrypted block. This allows blocks to be dereferenced and verified without knowing the convergence secret.

2.4.3. Construct Internal Nodes

Internal nodes of the tree (not leaf nodes) are constructed by concatenating at most arity reference-key pairs to nodes of a the next lower level.

If there are less than arity number of references-key pairs to collect in a node, then the node is filled up to block-size with zeroes. This ensures that nodes (and blocks) always have the same size (block-size).

The procedure Construct-Internal-Nodes takes as input a non-empty list of reference-key pairs and the block size and returns a list of nodes. A pseudocode implementation is provided:

Construct-Internal-Nodes(reference-key-pairs, block-size):
    # input is always non-empty
    assert Length(reference-key-pairs) > 0

    # compute arity
    arity := block-size / 64

    # initialize empty list of nodes to return
    nodes := []

    while Length(reference-key-pairs) > 0:
        # take at most arity reference-key pairs from the left of reference-key-pairs
        node-reference-key-pairs, rest :=
            Take-At-Most-N-From-Left(reference-key-pairs, arity)

        # concatenate all reference-key pairs to a node
        node := Concat(node-reference-key-pairs)

        # make sure node has size block-size by filling up with zeroes
        if Length(node) < block-size:
            node := Fill-with-zeroes(node, block-size)

        # add node to list of nodes to return
        nodes := nodes ++ [ node ]

        # set reference-key-pairs to rest
        reference-key-pairs := rest

    return nodes
Figure 10: Construct-Internal-Nodes procedure

Note that the method for filling up internal nodes with zeroes to make sure they have size block-size is different than the padding algorithm used on content (see Section 2.4.1). We do not need the special marker byte (0x80; see Section 2.1.3) as structures in internal nodes (reference-key pairs) have fixed sizes. When decoding we process reference-key pairs until we reach a reference-key pair that is all zeroes (see Section 2.5, Paragraph 3).

An example node with three reference-key pairs and filled with zeroes is illustrated in Figure 11.

ref key ref key ref key 0x00 ... 0x00 block size
Figure 11: Internal node filled with zeroes

If in a tree of encoded content the number of nodes at any level is not a multiple of the arity, then there will be internal nodes that are filled with zeroes. Because internal nodes are always filled with reference-key pairs to the left, trees are always left-aligned. Figure 12 illustrates such a left-aligned tree.

Level 3 Level 2 Level 1 Level 0 root node node node node node node leaf node leaf node leaf node leaf node leaf node
Figure 12: A tree with arity two and five leaf nodes

2.4.4. Streaming

Note that the algorithm above requires the entire content to be kept in memory during encoding. Implementors are RECOMMENDED to implement a streaming encoding process that can handle content larger than memory. This can be implemented by immediately emitting encrypted blocks and eagerly constructing nodes when enough reference-key pairs are available.

For an example, see the reference Guile implementation.

2.5. Decoding

Given a read capability and access to blocks via a block storage the content can be decoded as shown in the procedure ERIS-Decode:

ERIS-Decode-Recurse(block-size, level, reference, key):

    # dereference the node
    node := Dereference-Node(reference, key, level, block-size)

    if level = 0:
        # node is a leaf node
        return node
       # node is an internal node

       # initialize an output
       output := []

       # iterate over all the reference-key-pairs in the node
       for reference-key-pair in node:
           # decode sub-tree and append to output
           output := output ++ Eris-Decode-Recurse(level - 1, reference-key-pair)

       return output

ERIS-Decode(block-size, level, root-reference, root-key):

    # verify integrity of read capability key if level is larger than 0
    if level > 0:
        Verify-Key(level, root-reference, root-key)

    padded := ERIS-Decode-Recurse(block-size, level, root-reference, root-key)

    return Unpad(padded, block-size)
Figure 13: ERIS-Decode procedure

When iterating over reference-key pairs in nodes we read 64 bytes from the node. If the 64 bytes are non-zero they are split into a reference-key pair and iterated over. If 64 bytes of zeroes are encountered the rest of the node MUST be checked to be all zeroes (see Section 2.4.3, Paragraph 5). If rest of node is all zeros there are no more reference-key pairs in the node and iteration ends. If a non-zero byte is encountered after reading 64 bytes of zeroes then an error must be thrown indicating an invalid internal node.

Implementations MUST verify the key appearing in the read capability if level of encoded content is larger than 0.

The procedures Dereference-Node and Verify-Key are provided in the next section.

2.5.1. Dereference Nodes

A node can be dereferenced given a reference-key pair by first retrieving the block for the reference and decrypting the block using the key.

The block storage is represented by some procedure Block-Storage-Get that takes as input a reference and outputs the corresponding block, such that Blake2b-256(block) = reference.

The procedure Dereference-Node returns a node given inputs a reference-key pair and the level of the node to dereference. A pseudocode implementation is provided:

Dereference-Node(reference-key-pair, level, block-size):
    # unpack reference-key pair
    reference, key := reference-key-pair

    # get block
    block := Block-Storage-Get(reference)

    # ensure that block has correct size
    if not Length(block) = block-size:
        throw "block has invalid size"

    # ensure that block is valid for reference
    if not Blake2b-256(block) = reference:
        throw "block is invalid"

    # first 11 bytes of nonce are 0, last byte is the level of the node
    nonce := [0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; level]

    node := ChaCha20(block, key, nonce)

    return node
Figure 14: Dereference-Node procedure

Implementations MUST ensure that blocks have expected size and are valid (i.e. that Blake2b-256(block) = reference).

2.5.2. Verify Key

The key to an internal node is computed using unkeyed Blake2b without the convergence secret (see Section 2.4.2). This allows the key itself to be verified by recomputing the Blake2b hash of the node.

Verify-Key(level, reference, key):
    # ensure that reference-key pair is to an internal node
    assert level > 0

    # dereference the node
    node := Dereference-Node(reference, key, level)

    # verify integrity of key
    if not Blake2b-256(node) = key:
        throw "key in read capability is invalid"
Figure 15: Verify-Read-Capability procedure

Note that this check only works if the level of the encoded content is larger than one. If the level is 0 then the reference-key pair points to a leaf node for which the key can not be verified as it is computed by using the convergence secret (see Section 2.4.2).

2.5.3. Streaming and Random Access

The provided decoding algorithm is provided for illustration purposes. Implementors are RECOMMENDED to implement a streaming decoding procedure that is capable of decoding content larger than memory.

A decoder can also be implemented to allow random access to content without dereferencing the entire tree.

2.6. Binary Encoding of Read Capability

The read capability consisting of the block-size, level of root reference-key pair as well as the root reference-key pair form the necessary pieces of information required to decode content.

We specify an binary encoding of the read capability to 66 bytes:

Table 1: Binary Encoding of Read Capability
Byte offset Content Length (in bytes)
0 block size (0x0a for block size 1KiB and 0x0f for block size 32KiB) 1
1 level of root reference-key pair as unsigned integer 1
2 root reference 32
34 root key 32

2.6.1. CBOR Tag

The CBOR tag 276 is assigned for a ERIS binary read capability (see Section 8.1). This allows efficient references to ERIS encoded content from CBOR [RFC8949].

2.6.2. GNU Name System

The GNU Name System [LSD0001] is a decentralized and censorship-resistant name system that can be used to resolve memorable names to secure identifiers. A GNU Name System record type for ERIS read capabilities is defined (see Section 9.1) allowing any ERIS encoded content to be associated with memorable names.

2.7. URN

A read capability can be encoded as an URN [RFC8141] using the namespace identifier erisx3 and the unpadded Base32 [RFC4648] encoding of the read capability as namespace specific string, i.e:


For example the ERIS URN of the UTF-8 encoded string "Hello world!" (with block size 1KiB and null convergence secret):


The URN namespace identifier erisx3 is used for this experimental version of the encoding. Once finalized the namespace identifier eris will be used (e.g. urn:eris:BIAD77QDJMFAKZYH2DXBUZYAP3MXZ3DJZVFYQ5DFWC6T65WSFCU5S2IT4YZGJ7AC4SYQMP2DM2ANS2ZTCP3DJJIRV733CRAAHOSWIYZM3M).

2.7.1. Block URN

Blocks are referenced by their Blake2b-256 hash (see Section 2.4, Paragraph 9). In certain circumstances it is useful to encode such a reference as an URN. Applications are RECOMMENDED to use the URN namespace identifier blake2b and the unpadded Base32 [RFC4648] encoding of the reference as namespace specific string, i.e:


For example the URN encoding of a reference to a block might look like this:


3. Applications

Traditionally encoding schemes similar to ERIS are used in peer-to-peer file-sharing applications (e.g. BitTorrent, GNUNet file-sharing, FreeNet). Given a performant block tranport layer, ERIS can be used for efficient file-sharing. However, we hope to motivate usage for a much wider scope of applications.

As part of the openEngiadina project we are using ERIS to encode small bits of information that constitute "local knowledge" (e.g. geographic information, social and cultural events, etc.) along with the social interactions that created and curated this information (using the ActivityStreams vocabulary). ERIS allows such information to be securely cached on multiple peers to increase the robustness of the system. The fact that ERIS encoded content can be referenced by an URN allows it to be embeded into existing data structures and protocols. In particular, ERIS encoded content can be referenced from RDF and RDF can be made content-addressable with ERIS (see ERIS and RDF).

ERIS can be used to share content in small communal and friend-to-friend networks. For this the ability to use convergence secrets is very useful for increased security (see Section 4.6). This use-case has been further research as part of the DREAM project.

ERIS can also be used to create larger content-delivery networks. In particular, we are working towards making software sources and pre-built packages for Guix more robustly available and peer-to-peer shareable.

ERIS identifiers have also been embedded in ELF binaries to reference shared libraries (see Sigil OS).

3.1. Storage and Transport Layers

ERIS is defined independent of any storage and transport layer for blocks. The only requirement is that blocks can be accessed by their reference - the hash of the block content.

Possible storage layers include:

  • in-memory hash-map
  • key-value store and other databases
  • files on a file system

Transport mechanisms include:

  • HTTP: A simple HTTP endpoint can be used to de-reference blocks (see ERIS over HTTP).
  • Sneakernet: Blocks can be transported on a physical medium such as a USB stick.

More interesting transport and storage layers use the fact that blocks are content-addressed. For example the peer-to-peer network IPFS can be used to store and transport blocks. The major advantages over using IPFS directly are that blocks are encrypted and not readable to IPFS peers without the read capability and that identifier of blocks and encoded content are not tied to the IPFS network. Applications can transparently use IPFS or any other storage/transport mechanism.

Other protocols we are investigating for usage as ERIS transport layer include Named Data Networking, BitTorrent and GNUNet.

We are also researching transport protocols based on CoAP [RFC7252].

3.2. Authenticity of Content

While decoding ERIS encoded content the integrity of the content is verified. Content can not be tampered with without changing the identifier (read capability) of the content. To prove authenticity of encoded content it is sufficient to sign the read capability.

We have presented a concrete proposal on how this might be done using a RDF vocabulary and the Ed25519 cryptographic signature scheme RDF-Signify.

3.3. Mutability and Namespaces

Encoded content is immutable in the sense that changing the encoded content results in a new identifier. Existing references to the old content need to be updated. This is a property that allows robust availabilty of content.

Nevertheless, there are applications where one wants to reference mutable content. Examples include user profiles or dynamic collections of content. Making small changes to a user profile or adding a piece of content to a collection should preserve the identifiers.

There are many ways of implementing such mutability or namespaces. ERIS does not specify any particular mechanism. Possible mechanisms include:

  • Centralized servers that returns a mutable list of reference to (immutable) content. This is how most HTTP services work.
  • Append-only logs where changes are securely appended with cryptographic signatures. The state is computed from the log of changes. This is how peer-to-peer systems such as hypercore or Secure ScuttleButt work. See also ERIS Canons for appendable logs optimized for usage with ERIS.
  • Petname system: A system where a dynamic local name can be mapped to a reference. Sophisticated systems that allow delegation of naming authority include the GNU Name System [LSD0001] (see Section 2.6.2).
  • Commutative Replicated Data Types (CRDTs) are distributed data structures similar to append-only logs with the advantage that the state of a mutable container can diverge and converge to consistent state eventually. Such structures seem especially suitable when control over a mutable container is shared by multiple parties. For an example see Distributed Mutable Containers.

We believe that the best suited mechanism for handling mutability depends on concrete applications and use-cases. A key value of ERIS is that it is agnostic of such mechanisms and can be used from any of them.

4. Security Considerations

In this section we discuss security considerations when using ERIS as an abstract encoding as well as when used in conjunction with storage and transport layers (see Section 3.1).

We use terms for communication security as defined in RFC 3552 [RFC3552] (e.g. CONFIDENTIALITY or PEER ENTITY AUTHENTICATION).

4.1. Threat Model

We consider a setting with following entities:

Wants to publish some content.
A group of entities that should be able to decode the content published by the publisher. They receive the ERIS read capability from the publisher over a channel that provides CONFIDENTIALITY, DATA INTEGRITY and PEER ENTITY AUTHENTICATION.
Intermediary Peers
A group of entities that assist in making the content available to the audience by storing and transporting blocks of the encoded content. There are no communication security requirements for the communication between the intermediary peers and publisher or audience. Note that the publisher as well as members of the audience can act as intermediary peers.
An adversary that wishes to prevent the audience from being able to decode some specific content. The censor does not have access to the read capability of the encoded content but may inspect, modify or drop communication between the intermediary peers and the audience. The censor does not have access and can not control the internal state of the publisher, audience or intermediary peers. The censor can impersonate a malicious intermediary peer.

See also the ECRS paper [ECRS] and the theoretical treatment of censorship resistance by Perng et al. [Perng2005].

4.2. Availability

The publisher can make the blocks of the encoded content available by replicating them to intermediary peers and audience directly over various different storage and transport layers (see Section 3.1).

4.3. Data Integrity

Members of the audience can verify the integrity of the content while decoding by verifying the Blake2b hash of a block (see Section 2.5). Even when a malicious intermediary peer is distributing invalid blocks, this will be detected by the internal decoding process run by the audience.

Intermediary peers can pro-actively detect invalid blocks by checking the Blake2b hash of the block.

Note that the ERIS read capability can not be used to verify integrity of content when the content is given directly and not decoded from blocks. When the content is given in as a sequence of bytes, the only way to compute the read capability is to encode the content. However, the read capability to verify might have been computed using a convergence secret (see Section 2.3) that is not known, making it impossible to verify that the content corresponds to the read capability.

4.4. Intermediary Peer Deniability

Intermediary peers do not need to have access to the read capability in order to store and transport blocks of encoded content. As the blocks only contain encrypted data, intermediary peers can plausibly deny being able to decode the content.

Note that in certain situations an active attack that can reveal parts of the encoded content is possible (see Section 4.6).

4.5. Censorship Resistance

We define censorship resistance as the inability of a censor who does not have access to the read capability of some content to prevent members of the audience from decoding the specific content without preventing access from all content (i.e. DENIAL OF SERVICE).

This holds as a censor that does not have access to the read capability of some content can not decide if a given block is required to decode the specific content or is a block from the encoding of some other content.

Note that a censor can prevent the audience from decoding any content by dropping all communication to intermediary peers - the censor can perform a DENIAL OF SERVICE attack. Practically a DENIAL OF SERVICE attack can be made difficult by replicating blocks and using various different storage and transport layers (see Section 4.2).

Our definition of censorship resistance is slightly stronger than that of ECRS [ECRS]. In ECRS the censor may not know the exact content, whereas in ERIS the censor may not know the read capability. If the censor does know the exact content that should be censored than we can use a fresh convergence secret to create a read capability that the censor does not know (see Section 2.3).

4.6. Known Attacks on Convergent Encryption

Convergent encryption allows de-duplication of content and deterministic identifiers. However, it also suffers from two known attacks [Zooko2008]:

The Confirmation Of A File Attack
An adversary who knows the read capability of some ERIS encoded content can enumerate all blocks that are required to decode the content. The adversary can now confirm that a member of the audience is accessing some content by observing which blocks are being accessed.
The Learn-The-Remaining-Information Attack
When encoding content where an adversary can predict parts of the content, the remaining information may be learned by brute forcing the remainder. For example, this might be a configuration file where the adversary knows the entire content except for a single field containing a password. The adversary can then brute force the password and confirm that a configuration with a certain password has been accessed. Note that the brute-forcing effort can be reused to efficiently learn many secrets, similar to how rainbow tables are used to crack password hashes.

ERIS is vulnerable to both when using a convergent secret that is known to the adversary. A defense against both attacks is to use a convergence secret that is not known by the adversary (see Section 2.3). Using a different convergence secret causes the same content to be encoded into different blocks and identifiers.

De-duplication and deterministic identifiers are both properties that may be important to certain applications and users. Users should be aware of the known attacks and must decide depending on application and context on whether mitigations are necessary.

4.7. Observing Block Access

A passive adversary that only observes communication between audience and intermediary peers might be able to learn information about encoded content from the pattern in which blocks are accessed by members of the audience. For example an adversary might be able to infer that certain blocks are part of the encoding of a video with some resolution by observing that blocks are fetched with predictable intervals and in a fixed order.

A passive adversary may also log block transfers between audience and intermediary peers for extended periods of time. If a read capability is leaked to the adversary later, they will know who has accessed blocks for decoding content with the leaked read capability in the past.

Storage and transport layers SHOULD use encryption to prevent passive network attackers from being able to observe such patterns.

Members of the audience MAY use obfuscation tactics when getting blocks from storage and transport layers to prevent malicious intermediary peers from being able to observe such patterns.

5. Test Vectors

Three types of test vectors are provided that can be used to ensure correct implementation of the encoding: positive (see Section 5.1), negative (see Section 5.2) and large content (see Section 5.3).

Implementations MUST satisfy the positive and negative test vectors.

Implementations are RECOMMENDED to also satisfy the large content test vectors. However this requires implementing a streaming encoder (see Section 2.4.4) which might not be necessary or desireable for certain applications (e.g. constrained environments where size of content is always very small).

The positive and negative test vectors are provided as machine-readable JSON files in the archive eris-test-vectors-v0.4.0.tar.gz.

Following JSON field appear in both positive and negative test vectors:

Numeric identifier of the test vector.
Specifies the type of test. Values are either positive or negative.
Version of ERIS specification
Short human readable name.
Human readable description of the test.
JSON map containing the components of the read capability. This is not used in tests but is here as a help for developers.
The ERIS URN of the content.
A JSON map of blocks required to decode the content given the URN. Key and field are encoded as Base32 strings.

Further fields are used in the positive test vectors.

5.1. Positive

The positive test vectors can be used to ensure that implementations correctly encode content to a given read capability and can decode the same content from given blocks.

Following additional fields are used in the JSON files for positive test vectors:

The binary content to be encoded as Base32 (unpadded) string.
The convergence secret to be used as Base32 string.
Block size that should be used for encoding in bytes (either 1024 or 32768).

The test vector eris-test-vector-positive-00.json is shown as example:

  "id": 0,
  "type": "encode-decode-success",
  "spec-version": "0.4.0-dev",
  "name": "short string (block size 1KiB)",
  "description": "Encode the UTF-8 encoding of the string \"Hello world!\" with block-size 1KiB and null convergence-secret.",
  "content": "JBSWY3DPEB3W64TMMQQQ",
  "block-size": 1024,
  "read-capability": {
    "block-size": 1024,
    "level": 0,
  "blocks": {
Figure 16: Positive test vector eris-test-vector-positive-00.json

Imlementations MUST for all positive test vectors:

  1. Verify that the content encodes to the specified URN given the convergent secret and block size.
  2. Verify that the content decoded from the given URN and blocks is equal to the specified content.

5.2. Negative

Negative test vectors are provided to help implementations ensure that they correctly verify content while decoding.

For all negative test vectors, implementations should attempt to decode content given the URN and blocks. The test passes if decoding fails. The reason for failure is described in the description field. Implementations MUST pass all negative test vectors.

The test vector eris-test-vector-negative-13.json is shown as example:

  "id": 13,
  "type": "negative",
  "spec-version": "0.4.0-dev",
  "name": "no blocks",
  "description": "Content can not be decoded because of missing blocks.",
  "read-capability": {
    "block-size": 1024,
    "level": 0,
  "blocks": {}
Figure 17: Negative test vector eris-test-vector-negative-13.json

5.3. Large Content

In order to verify implementations that encode content by streaming (see Section 2.4.4) URNs of large contents that are generated in a specified way are provided:

Table 2: Large content test vectors
Test name Content size Block size Level of root reference URN

Content is the ChaCha20 stream using a null nonce and the key which is the Blake2b hash of the UTF-8 encoded test name (e.g. key := Blake2b-256("100MiB (block size 1KiB)")). The ChaCha20 stream can be computed by encoding a null byte sequence (e.g. chacha20-stream := ChaCha20(null_byte_stream, KEY)).

6. Implementations

A list of known implementations that satisfy the test vectors is maintained at

7. Contact

A mailing list for general discussion on ERIS is available at ~pukkamustard/ Ephemeral discussions take place in the #eris channel on the Libera IRC network. See also the project page for more information.

Please feel free to direct any questions or comments regarding the specification to the mailing list. You are also invited to share your implementations and use-cases.

Urgent and sensitive security issues may be addressed directly to the ERIS maintainers.

8. IANA Considerations

8.1. CBOR Tags Registry

This specification requires the assignment of a CBOR tag for a binary ERIS read capability. The tag is added to the CBOR Tags Registry as defined in RFC 8949 [RFC8949].

Table 3: CBOR Tag Registration for ERIS binary read capability
Tag Data Item Semantics
276 byte string ERIS binary read capability (see Section 2.6)

9. GANA Considerations

9.1. GNU Name System record types registry

GANA [GANA] is requested to add an entry into the "GNU Name System record types" registry as follows:

Table 4: GNU Name System record types registry entry for ERIS read capability
Number Name Comment References
65557 ERIS_READ_CAPABILITY Encoding for Robust Immutable Storage (ERIS) binary read capability

10. Acknowledgments

Initial development of ERIS was done as part of the openEngiadina project and was supported by the NLnet Foundation trough the NGI0 Discovery Fund. Further development is being supported by the NLnet Foundation trough NGI Assure.

11. References

11.1. Normative References

GNUnet e.V., "GNUnet Assigned Numbers Authority (GANA)", , <>.
Schanzenbach, M., Grothoff, C., and B. Fix, "The GNU Name System", , <>.
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, , <>.
Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on Security Considerations", BCP 72, RFC 3552, DOI 10.17487/RFC3552, , <>.
Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, DOI 10.17487/RFC4648, , <>.
Saarinen, M-J., Ed. and J-P. Aumasson, "The BLAKE2 Cryptographic Hash and Message Authentication Code (MAC)", RFC 7693, DOI 10.17487/RFC7693, , <>.
Saint-Andre, P. and J. Klensin, "Uniform Resource Names (URNs)", RFC 8141, DOI 10.17487/RFC8141, , <>.
Nir, Y. and A. Langley, "ChaCha20 and Poly1305 for IETF Protocols", RFC 8439, DOI 10.17487/RFC8439, , <>.
Bormann, C. and P. Hoffman, "Concise Binary Object Representation (CBOR)", STD 94, RFC 8949, DOI 10.17487/RFC8949, , <>.

11.2. Informative References

Grothoff, C., Grothoff, K., Horozov, T., and J.T. Lindgren, "An Encoding for Censorship-Resistant Sharing", , <>.
Clarke, I., Sandberg, O., Wiley, B., and T. Hong, "Freenet: A Distributed Anonymous Information Storage and Retrieval System", Designing Privacy Enhancing Technologies pp. 46-66, DOI 10.1007/3-540-44702-4_4, , <>.
Perng, G., Reiter, M., and C. Wang, "Censorship Resistance Revisited", Information Hiding pp. 62-76, DOI 10.1007/11558859_6, , <>.
Polleres, A., Kamdar, M., Fernández, J., Tudorache, T., and M. Musen, "A more decentralized vision for Linked Data", Semantic Web Vol. 11, pp. 101-113, DOI 10.3233/SW-190380, , <>.
Shelby, Z., Hartke, K., and C. Bormann, "The Constrained Application Protocol (CoAP)", RFC 7252, DOI 10.17487/RFC7252, , <>.
Kutscher, D., Ed., Eum, S., Pentikousis, K., Psaras, I., Corujo, D., Saucez, D., Schmidt, T., and M. Waehlisch, "Information-Centric Networking (ICN) Research Challenges", RFC 7927, DOI 10.17487/RFC7927, , <>.
Wilcox-O'Hearn, Z., "Drew Perttula and Attacks on Convergent Encryption", , <>.


The most recent version of the specification is published at

0.4.0 (13. May 2022)

Permanent link


  • Encode the level of a block in the ChaCha20 nonce used to encrypt a block as a mitigation against attacks based on block level confusion
  • Use unkeyed Blake2b to compute encryption key of internal nodes. This allows the key in the read capability to be verified if tree level is larger than 0.
  • Add DATA INTEGRITY to requirement on channel between publisher and audience in threat model
  • Update URN scheme to erisx3
  • Syntax of pseudocode uses less ALL CAPS
  • Clarify usage of lists and sets in pseudocode
  • Clarify usage of terms nodes and blocks
  • Restructure description of encoding and decoding algorithms
  • Reformat to IETF xml2rfc (see RFC7991)


  • More figures to illustrate encoding and non-regular cases
  • Recommendation for encoding references to blocks as URNs
  • Negative test vectors
  • Pseudocode for padding algorithms

0.3.0 (11. January 2022)

Permanent link


  • CBOR Tag for ERIS binary read capability
  • GANA GNU Name System record types entry
  • Security considerations


  • Off-by-one errors in specification of PAD and UNPAD
  • Simplify pseudocode implementation of Split-Content by reading from padded content


  • Encoding of block size in binary read capability: Use 0x0a for block size 1KiB (instead of 0x00) and 0x0f for block size 32KiB (instead of 0x01)
  • Remove confidentiality from objectives and add intermediary peer deniability and censorship resistance
  • Test vectors are provided in a tar archive

0.2.0 (7. December 2020)

Major update of encoding that removes the verification capability - ability to verify integrity of content without reading content.

Permanent link

0.1.0 (11. June 2020)

Initial version.

Permanent link



Author's Address

Endo Renberg