Encoding for Robust Immutable Storage (ERIS)

This document describes the Encoding for Robust Immutable Storage (ERIS). ERIS is an encoding of arbitrary content into a set of uniformly sized, encrypted and content-addressed blocks as well as a short identifier (a URN). The content can be reassembled from the blocks only with this identifier. The encoding is defined independently of any storage and transport layer or any specific application. We illustrate how ERIS can be used to build robust and decentralized applications.

Warning

This specification is not yet stable or released. We are working towards a stable 1.0.0 release. Join the mailing list for updates.

1. Introduction

Unavailability of content on computer networks is a major cause of reduced reliability of networked services [Polleres2020].

Availability can be increased by caching content on multiple peers. However, most content on the Internet is identified by its location. Caching location-addressed content is complicated, as every cached copy receives a new location.

An alternative to identifying content by its location is to identify content by the content itself. This is called content-addressing: the hash of some content is computed and used as a unique identifier for the content.

Caching content-addressed content and making it available redundantly is much easier, as the content is completely decoupled from any physical location. When a cryptographic hash is used, content-addressing also ensures integrity: anyone can recompute the identifier of received content and check that it matches the requested identifier.

However, naive content-addressing has certain drawbacks:

Large content: Content is stored as a single large chunk of data. In order to optimize storage and network operations it is better to split content into smaller, uniformly sized blocks and reassemble the blocks when needed.
Unencrypted: Content is readable by all peers involved in transporting, caching and storing it.

ERIS addresses these issues by splitting content into small, uniformly sized and encrypted blocks. These blocks can be reassembled into the original content only with access to a short read capability, which can be encoded as a URN.

Encodings similar to ERIS are already widely used in applications and protocols such as GNUNet (see Section 1.3), BitTorrent [BEP52], Freenet [Freenet] and others. However, they all use slightly different encodings that are tied to the respective protocols and applications. ERIS defines an encoding independent of any specific protocol or application and decouples content from transport and storage layers. ERIS may be seen as a modest step towards Information-Centric Networking [RFC7927].

1.1. Objectives

The objectives of ERIS are:

Availability: Content encoded with ERIS can be easily replicated and cached to increase the availability of content.
Data integrity: Integrity of content is verified while decoding content from blocks.
Intermediary Peer Deniability: Intermediary peers, who are storing and transporting encoded blocks without access to a read capability, can claim that decrypting encoded content is infeasible for them.
Censorship Resistance: An adversary who does not have access to a read capability of some encoded content can not selectively block access to the content without blocking access to all content.
Deterministic Identifiers: The read capability can be used as a deterministic identifier of the encoded content.
URN reference: ERIS encoded content can be referenced with a single URN (the encoded read capability).
Storage efficiency: ERIS can be used to encode small content (< 1 kibibyte) as well as large content (many gibibytes) with reasonable storage overhead.
Simplicity: The encoding should be as simple as possible in order to allow correct implementation on various platforms and in various languages.

Confidentiality is not an objective of ERIS, and ERIS MUST NOT be relied upon to keep content secret from an adversary.

See Section 4 for security considerations.

1.2. Scope

ERIS describes how arbitrary content (sequence of bytes) can be encoded into a set of uniformly sized blocks and an identifier with which the content can be decoded from the set of blocks.

ERIS does not prescribe how the blocks should be stored or transported over a network. The only requirement is that a block can be referenced and accessed (if available) by the hash value of the contents of the block. In Section 3.1 we show how existing technology can be used to store and transport blocks.

There is also no support for grouping content or mutating content. In Section 3.3 we describe how such functionality can be implemented on top of ERIS.

ERIS is an attempt to find a minimal common basis on which higher functionality, such as mutability, can be built.

1.3. Previous work

ERIS is inspired by and based on the encoding used in the file-sharing application of GNUNet - Encoding for Censorship-Resistant Sharing (ECRS) [ECRS].

ERIS differs from ECRS in the following points:

Cryptographic primitives: ECRS itself does not specify any cryptographic primitives. The GNUNet implementation uses the SHA-512 hash and AES cipher. ERIS uses the Blake2b-256 cryptographic hash [RFC7693] and the ChaCha20 stream cipher [RFC8439]. This improves performance, storage efficiency (as hash references are smaller) and allows a convergence secret to be used (via Blake2b keyed hashing; see Section 2.3).
Block size: ECRS uses a fixed block size of 32 KiB. This can be very inefficient when encoding many small pieces of content. ERIS allows a block size of 1 KiB or 32 KiB, allowing efficient encoding of small and large content (see Section 2.2).
URN: ECRS does not specify a URN for referring to encoded content (this is specified as part of the GNUNet file-sharing application). ERIS specifies a URN for encoded content regardless of encoding application or storage and transport layer (see Section 2.7).
Namespaces: ECRS defines two mechanisms for grouping and discovering encoded content (SBlock and KBlock). ERIS does not specify any such mechanisms (see Section 3.3).

Other related projects include Tahoe-LAFS, Freenet and Datashards. The reader is referred to the ECRS paper [ECRS] for an in-depth explanation and comparison of related projects.

1.4. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

We use binary prefixes for multiples of bytes, i.e. 1024 bytes is 1 kibibyte (KiB), 1024 kibibytes is 1 mebibyte (MiB) and 1024 mebibytes is 1 gibibyte (GiB).

2. Specification of ERIS

2.1. Cryptographic Primitives

The cryptographic primitives used by ERIS are a cryptographic hash function, a symmetric key cipher and a padding algorithm. The hash function and cipher are readily available in open-source libraries such as libsodium or Monocypher. The padding algorithm can be implemented with reasonable effort.

2.1.1. Cryptographic Hash Function

We use Blake2b [RFC7693] with an output size of 256 bit (32 byte). The keying feature of Blake2b is used; we refer to the key used for keying Blake2b as the hashing key. The hashing key always has a size of 256 bit (32 byte) (see Section 2.3).

This provides the functions Blake2b-256(INPUT, HASHING-KEY) for keyed hashing and Blake2b-256(INPUT) for unkeyed hashing.
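Both functions map directly onto Python's standard `hashlib.blake2b`, which supports a configurable digest size and an optional key. The following is a minimal sketch; the helper name `blake2b_256` is hypothetical and not part of ERIS.

```python
import hashlib


def blake2b_256(data: bytes, hashing_key: bytes = b"") -> bytes:
    """Blake2b with a 256-bit (32-byte) output.

    An empty hashing_key gives unkeyed hashing, Blake2b-256(INPUT);
    a 32-byte key gives keyed hashing, Blake2b-256(INPUT, HASHING-KEY).
    """
    return hashlib.blake2b(data, digest_size=32, key=hashing_key).digest()
```

Note that keyed hashing with the null key (32 zero bytes) is not the same as unkeyed hashing, since Blake2b mixes the key length and key block into the computation. This matters because the null convergence secret is still used as a key.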

2.1.2. Symmetric Key Cipher

We use the ChaCha20 (IETF variant) [RFC8439] stream cipher. It provides ChaCha20(INPUT, KEY), where INPUT is an arbitrary-length byte sequence and KEY is the 256 bit encryption key. The output is the encrypted byte sequence.

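The 32 bit initial counter as well as the 96 bit nonce are set to null (sequences of 0x00 bytes). Using the null nonce is safe because every key is derived from the block it encrypts (see Section 2.4.2): a key is only ever reused to encrypt the same block, which always yields the same ciphertext.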

Decryption is done with the same function where INPUT is the encrypted byte sequence.
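To make the fixed parameters concrete, here is a pure-Python sketch of ChaCha20 (IETF variant) with the null nonce and an initial counter of 0. It is written from the RFC 8439 block function only because the Python standard library has no ChaCha20; the function name `chacha20` is hypothetical, and a vetted library such as libsodium should be preferred in practice.

```python
import struct


def chacha20(data: bytes, key: bytes) -> bytes:
    """ChaCha20 (RFC 8439) with a 96-bit null nonce and initial block counter 0.

    Encryption and decryption are the same operation (XOR with the keystream).
    """
    if len(key) != 32:
        raise ValueError("ChaCha20 key must be 32 bytes")

    def rotl(v: int, c: int) -> int:
        return ((v << c) & 0xFFFFFFFF) | (v >> (32 - c))

    def quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
        s[a] = (s[a] + s[b]) & 0xFFFFFFFF; s[d] = rotl(s[d] ^ s[a], 16)
        s[c] = (s[c] + s[d]) & 0xFFFFFFFF; s[b] = rotl(s[b] ^ s[c], 12)
        s[a] = (s[a] + s[b]) & 0xFFFFFFFF; s[d] = rotl(s[d] ^ s[a], 8)
        s[c] = (s[c] + s[d]) & 0xFFFFFFFF; s[b] = rotl(s[b] ^ s[c], 7)

    key_words = list(struct.unpack("<8I", key))
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), 64)):
        # "expand 32-byte k" constants, key, block counter, null nonce (3 words)
        state = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574] + key_words + [counter, 0, 0, 0]
        w = state[:]
        for _ in range(10):  # 10 double rounds (column + diagonal) = 20 rounds
            quarter_round(w, 0, 4, 8, 12); quarter_round(w, 1, 5, 9, 13)
            quarter_round(w, 2, 6, 10, 14); quarter_round(w, 3, 7, 11, 15)
            quarter_round(w, 0, 5, 10, 15); quarter_round(w, 1, 6, 11, 12)
            quarter_round(w, 2, 7, 8, 13); quarter_round(w, 3, 4, 9, 14)
        keystream = struct.pack("<16I", *((x + y) & 0xFFFFFFFF for x, y in zip(w, state)))
        out.extend(b ^ k for b, k in zip(data[offset:offset + 64], keystream))
    return bytes(out)
```

Applying `chacha20` twice with the same key returns the original input, which is why the same function serves for decryption.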

2.1.3. Padding Algorithm

We use a byte padding scheme to ensure that the size of the input content is a multiple of the block size. It provides the following functions:

PAD(INPUT,BLOCK-SIZE): For INPUT of size n adds a mandatory byte valued 0x80 (hexadecimal) to INPUT followed by m < BLOCK-SIZE bytes valued 0x00 such that n + m + 1 is a multiple of BLOCK-SIZE.
UNPAD(INPUT,BLOCK-SIZE): Starts reading bytes from the end of INPUT until a 0x80 is read and then returns bytes of INPUT before the 0x80. Throws an error if a value other than 0x00 is read before reading 0x80 or if no 0x80 is read after reading BLOCK-SIZE bytes from the end.

This is the padding algorithm implemented in libsodium^[1].
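A short sketch of PAD and UNPAD as described above, assuming the smallest m that satisfies the constraint (which m < BLOCK-SIZE forces). The names `pad` and `unpad` are illustrative only.

```python
def pad(data: bytes, block_size: int) -> bytes:
    """Append the mandatory 0x80 byte, then the fewest 0x00 bytes reaching a multiple of block_size."""
    padded = data + b"\x80"
    return padded + bytes(-len(padded) % block_size)


def unpad(data: bytes, block_size: int) -> bytes:
    """Remove padding, reading at most block_size bytes from the end."""
    stop = max(len(data) - block_size, 0)
    for i in range(len(data) - 1, stop - 1, -1):
        if data[i] == 0x80:
            return data[:i]
        if data[i] != 0x00:
            raise ValueError("invalid padding: unexpected non-zero byte")
    raise ValueError("invalid padding: no 0x80 marker found")
```

Because the 0x80 byte is mandatory, content whose length is already a multiple of the block size gains an extra full block of padding (1024 bytes become 2048 with block size 1KiB).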

2.2. Block Size

ERIS uses two block sizes: 1KiB (1024 bytes) and 32KiB (32768 bytes). The block size must be specified when encoding content.

Both block sizes can be used to encode content of arbitrary size. The block size of 1KiB is an optimization towards smaller content.

The block size is encoded in the read capability and the decoding process is capable of handling both block sizes.

Implementations MUST support encoding and decoding content with both block sizes (1KiB and 32KiB).

2.2.1. Recommendation on Block Size Choice

Applications are RECOMMENDED to use a block size of 1KiB for content smaller than 16KiB and a block size of 32KiB for larger content.

When using block size 32KiB to encode content smaller than 1KiB, the content will be encoded in a 32KiB block. This is a storage overhead of over 3100%. When encoding very many pieces of small content (e.g. short messages or cartographic nodes) this overhead might not be acceptable. On the other hand, using block size 1KiB to encode large content is also not efficient, as the content is split into many small 1KiB blocks and must be reassembled using internal nodes (see Section 2.4.3). When encoding larger content it is more efficient to use a block size of 32KiB. Using 16KiB as a breaking point is reasonable for most applications.

Note that the best choice of block size may depend on other factors such as the number of round-trips to the storage layer. Content larger than 1KiB encoded with block size 1KiB will always be encoded in multiple levels, requiring multiple calls to a storage layer. For certain applications it might be better to minimize the number of calls to the storage layer at the cost of higher storage overhead.

In other applications the size of the content to be encoded might not be known when encoding starts, yet the block size must be chosen up front (see Section 2.4.4). In such cases applications should use appropriate heuristics.

2.3. Convergence Secret

Using the hash of the content as key is called convergent encryption.

Because the hash of the content is deterministically computed from the content, the key will be the same when the same content is encoded twice. This results in de-duplication of content, as well as deterministic identifiers. Both are useful properties for certain applications.

However, convergent encryption suffers from two known attacks that allow adversaries to either confirm the presence of some encoded content or even learn the content when parts are predictable (see Section 4.6 for details). A solution to both attacks is to use a convergence secret.

ERIS allows a 32 byte convergence secret to be specified when encoding some content. Using different convergence secrets to encode the same content will result in different blocks and different read capabilities. This prevents deterministic identifiers and de-duplication, but allows a slightly stronger form of censorship resistance (see Section 4.5).

A group using a shared convergence secret can benefit from the advantages of convergent encryption (de-duplication and deterministic identifiers) while being safe against certain attacks from adversaries that do not know the convergence secret.

The convergence secret only needs to be provided during encoding. The content can be decoded without access to the convergence secret (see Section 2.5).

If no convergence secret is specified a null convergence secret MUST be used (32 bytes of zeroes).

The convergence secret is implemented with the keying feature of the Blake2b cryptographic hash [RFC7693].

2.4. Encoding

Inputs to the encoding process are:

CONTENT: An arbitrary-length byte sequence of content to be encoded.
CONVERGENCE-SECRET: A 256 bit (32 byte) byte sequence (see Section 2.3).
BLOCK-SIZE: The block size used for encoding, in bytes: either 1024 (1KiB) or 32768 (32KiB) (see Section 2.2).

Content is encoded by first splitting into uniformly sized blocks, encrypting the blocks and computing references to the blocks. If there are multiple references to blocks they are collected in nodes that have the same size as content blocks. The nodes are encrypted and references to the nodes are computed. This process is repeated until there is a single root reference.

References to nodes and blocks of content consist of a reference to an encrypted block and a key to decrypt the block - a reference-key pair. The process of encrypting a block and computing a reference-key pair is explained in Section 2.4.2.

The encoding process constructs a tree of reference-key pairs that reference nodes that hold references to nodes of a lower level or to content.

The number of reference-key pairs collected into a node is called the arity of the tree and depends on the block size. For block size 1KiB the arity of the tree is 16, for block size 32KiB the arity is 512.
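The arity follows from the 64-byte size of a reference-key pair. The sketch below, using hypothetical helper names, computes the arity and the level of the root reference-key pair for a given content length, assuming the padding of Section 2.1.3 and the node collection of Section 2.4.3.

```python
def arity(block_size: int) -> int:
    # each reference-key pair is 64 bytes (32-byte reference + 32-byte key)
    return block_size // 64


def tree_level(content_length: int, block_size: int) -> int:
    """Level of the root reference-key pair for content of the given length."""
    # padding always appends at least one byte: floor(n / block_size) + 1 leaf blocks
    nodes = content_length // block_size + 1
    level = 0
    while nodes > 1:
        nodes = -(-nodes // arity(block_size))  # ceiling division: node count one level up
        level += 1
    return level
```

For example, 16KiB of content encoded with block size 1KiB pads to 17 leaf blocks and needs a root at level 2, while the same content with block size 32KiB fits in a single level-0 block.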

The encoding of content that is split into eight blocks is depicted in Figure 1. For illustration purposes the tree has arity 2 (instead of 16 or 512).

Figure 1. Encoding of content as tree. Solid edges are concatenations of reference-key pairs as described in Section 2.4.3. Dotted edges are encryption and computation of reference-key pairs as described in Section 2.4.2.

The block size, the level of the root reference and the root reference-key pair are the necessary pieces of information required to decode content. The tuple consisting of block size, level, root reference and key is called the read capability.

The encrypted blocks and the read capability are the outputs of the encoding process.

A pseudo-code implementation of the encoding process is provided in the following. Note that the pseudo-code implementation is naive and given for illustration purposes only. It is RECOMMENDED that implementations use a streaming encoding process (as described in Section 2.4.4) which allows encoding of content larger than the available memory.

ERIS-Encode(CONTENT, CONVERGENCE-SECRET, BLOCK-SIZE):
    // initialize empty list of blocks to be output
    BLOCKS := []

    // initialize level to 0
    LEVEL := 0

    // split the input content into uniformly sized blocks and encode
    LEVEL-0-BLOCKS, RK-PAIRS := Split-Content(CONTENT, CONVERGENCE-SECRET, BLOCK-SIZE)

    // add blocks from level 0 to blocks to be output
    BLOCKS := BLOCKS ++ LEVEL-0-BLOCKS

    // loop until there is a single root reference
    WHILE Length(RK-PAIRS) > 1:
        LEVEL-BLOCKS, RK-PAIRS := Collect-RK-Pairs(RK-PAIRS, CONVERGENCE-SECRET, BLOCK-SIZE)

        // add blocks to blocks to be output and increase the level counter
        BLOCKS := BLOCKS ++ LEVEL-BLOCKS
        LEVEL := LEVEL + 1

    // extract the root reference-key pair
    ROOT-RK-PAIR := RK-PAIRS[0]
    ROOT-REFERENCE, ROOT-KEY := ROOT-RK-PAIR

    // return blocks and read-capability
    RETURN BLOCKS, BLOCK-SIZE, LEVEL, ROOT-REFERENCE, ROOT-KEY

The sub-processes Split-Content and Collect-RK-Pairs are explained in the following sections.

2.4.1. Splitting Input Content into Blocks

Input content is first padded so that it can be split into blocks of size exactly block size. The content blocks are encrypted and the encrypted blocks as well as reference-key pairs are returned.

A pseudo-code implementation:

Split-Content(CONTENT, CONVERGENCE-SECRET, BLOCK-SIZE):
    // initialize list of blocks and reference-key pairs to output
    BLOCKS := []
    RK-PAIRS := []

    // pad content
    PADDED := PAD(CONTENT, BLOCK-SIZE)

    // read blocks of size BLOCK-SIZE from PADDED
    WHILE CONTENT-BLOCK, LAST? := READ(PADDED, BLOCK-SIZE):

        // because of padding CONTENT-BLOCK always has length BLOCK-SIZE
        Assert(Length(CONTENT-BLOCK) == BLOCK-SIZE)

        // encrypt the content block
        ENCRYPTED-BLOCK, RK-PAIR := Encrypt-Block(CONTENT-BLOCK, CONVERGENCE-SECRET)

        BLOCKS := BLOCKS ++ [ENCRYPTED-BLOCK]
        RK-PAIRS := RK-PAIRS ++ [RK-PAIR]

    RETURN BLOCKS, RK-PAIRS

2.4.2. Encrypt Block and Compute Reference-Key Pair

A reference-key pair is a pair consisting of a reference to an encrypted block and the key to decrypt the block. Reference and key are both 32 bytes long. The concatenation of a reference-key pair is 64 bytes long (512 bits).

The Encrypt-Block function encrypts a block and returns the encrypted block along with the reference-key pair:

Encrypt-Block(INPUT, CONVERGENCE-SECRET):
    KEY := Blake2b-256(INPUT, CONVERGENCE-SECRET)
    ENCRYPTED-BLOCK := ChaCha20(INPUT, KEY)
    REFERENCE := Blake2b-256(ENCRYPTED-BLOCK)
    RK-PAIR := (REFERENCE, KEY)
    RETURN ENCRYPTED-BLOCK, RK-PAIR

The convergence secret MUST NOT be used to compute the reference to the encrypted block.

2.4.3. Collect Reference-Key Pairs in Nodes

Reference-key pairs are collected into nodes of size block size by concatenating reference-key pairs. The node is encrypted, and a reference-key pair to the node is computed. This results in a sequence of reference-key pairs that refer to nodes containing reference-key pairs at a lower level - a tree.

If there are fewer than arity reference-key pairs to collect in a node, the node is filled with the missing number of null reference-key pairs (64 bytes of zeros each). The size of a node is therefore always equal to the block size (implemented with the FILL-WITH-NULL-RK-PAIRS function).

A pseudo-code implementation of Collect-RK-Pairs:

Collect-RK-Pairs(INPUT-RK-PAIRS, CONVERGENCE-SECRET, BLOCK-SIZE):
    // number of reference-key pairs in a node
    ARITY := BLOCK-SIZE / 64

    // initialize blocks and reference-key pairs to output
    BLOCKS := []
    OUTPUT-RK-PAIRS := []

    // take ARITY reference-key pairs from INPUT-RK-PAIRS at a time
    WHILE RK-PAIRS-FOR-NODE := TAKE(INPUT-RK-PAIRS, ARITY):
        // make sure there are exactly ARITY reference-key pairs in node
        RK-PAIRS-FOR-NODE := FILL-WITH-NULL-RK-PAIRS(RK-PAIRS-FOR-NODE, ARITY)

        // concat reference-key pairs to node
        NODE := CONCAT(RK-PAIRS-FOR-NODE)

        // encrypt node and compute reference-key pair
        BLOCK, RK-TO-NODE := Encrypt-Block(NODE, CONVERGENCE-SECRET)

        // add node to output
        BLOCKS := BLOCKS ++ [BLOCK]
        OUTPUT-RK-PAIRS := OUTPUT-RK-PAIRS ++ [RK-TO-NODE]

    RETURN BLOCKS, OUTPUT-RK-PAIRS
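Putting Sections 2.1 to 2.4.3 together, the following Python sketch is a naive, non-streaming encoder that mirrors ERIS-Encode, Split-Content, Encrypt-Block and Collect-RK-Pairs. It is not a reference implementation: all function names are hypothetical, ChaCha20 is implemented in pure Python only because the standard library lacks it, and every block is kept in memory.

```python
import hashlib
import struct


def blake2b_256(data: bytes, key: bytes = b"") -> bytes:
    return hashlib.blake2b(data, digest_size=32, key=key).digest()


def chacha20(data: bytes, key: bytes) -> bytes:
    # ChaCha20 (RFC 8439) with a null 96-bit nonce and initial counter 0
    def rotl(v, c):
        return ((v << c) & 0xFFFFFFFF) | (v >> (32 - c))

    def qr(s, a, b, c, d):
        s[a] = (s[a] + s[b]) & 0xFFFFFFFF; s[d] = rotl(s[d] ^ s[a], 16)
        s[c] = (s[c] + s[d]) & 0xFFFFFFFF; s[b] = rotl(s[b] ^ s[c], 12)
        s[a] = (s[a] + s[b]) & 0xFFFFFFFF; s[d] = rotl(s[d] ^ s[a], 8)
        s[c] = (s[c] + s[d]) & 0xFFFFFFFF; s[b] = rotl(s[b] ^ s[c], 7)

    key_words = list(struct.unpack("<8I", key))
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), 64)):
        state = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574] + key_words + [counter, 0, 0, 0]
        w = state[:]
        for _ in range(10):  # 10 double rounds (column + diagonal)
            qr(w, 0, 4, 8, 12); qr(w, 1, 5, 9, 13); qr(w, 2, 6, 10, 14); qr(w, 3, 7, 11, 15)
            qr(w, 0, 5, 10, 15); qr(w, 1, 6, 11, 12); qr(w, 2, 7, 8, 13); qr(w, 3, 4, 9, 14)
        keystream = struct.pack("<16I", *((x + y) & 0xFFFFFFFF for x, y in zip(w, state)))
        out.extend(b ^ k for b, k in zip(data[offset:offset + 64], keystream))
    return bytes(out)


def pad(content: bytes, block_size: int) -> bytes:
    padded = content + b"\x80"
    return padded + bytes(-len(padded) % block_size)


def encrypt_block(block: bytes, convergence_secret: bytes):
    key = blake2b_256(block, convergence_secret)   # convergent key, keyed by the secret
    encrypted = chacha20(block, key)
    reference = blake2b_256(encrypted)             # never keyed with the convergence secret
    return encrypted, (reference, key)


def eris_encode(content: bytes, convergence_secret: bytes = bytes(32), block_size: int = 1024):
    """Return (blocks, (block_size, level, root_reference, root_key))."""
    if block_size not in (1024, 32768) or len(convergence_secret) != 32:
        raise ValueError("unsupported block size or convergence secret length")
    arity = block_size // 64
    padded = pad(content, block_size)

    # level 0: encrypt each content block
    blocks, rk_pairs = [], []
    for offset in range(0, len(padded), block_size):
        encrypted, rk_pair = encrypt_block(padded[offset:offset + block_size], convergence_secret)
        blocks.append(encrypted)
        rk_pairs.append(rk_pair)

    # higher levels: collect `arity` reference-key pairs per node until one root remains
    level = 0
    while len(rk_pairs) > 1:
        next_level = []
        for i in range(0, len(rk_pairs), arity):
            node = b"".join(ref + key for ref, key in rk_pairs[i:i + arity])
            node += bytes(block_size - len(node))  # fill with null reference-key pairs
            encrypted, rk_pair = encrypt_block(node, convergence_secret)
            blocks.append(encrypted)
            next_level.append(rk_pair)
        rk_pairs = next_level
        level += 1

    root_reference, root_key = rk_pairs[0]
    return blocks, (block_size, level, root_reference, root_key)
```

Encoding the same content twice with the same convergence secret yields the same read capability, while a different secret yields a different one, as described in Section 2.3.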

2.4.4. Streaming

The encoding process can be implemented to encode a stream of content while immediately outputting encrypted blocks when ready and eagerly collecting reference-key pairs to nodes. This allows the encoding of larger-than-memory content.

For an example, see the reference Guile implementation.

2.5. Decoding

Given an ERIS read capability and access to blocks via block storage, the content can be decoded:

ERIS-Decode-Recurse(LEVEL, REFERENCE, KEY):
    IF LEVEL == 0:
        ENCRYPTED-CONTENT-BLOCK := Block-Storage-Get(REFERENCE)
        RETURN ChaCha20(ENCRYPTED-CONTENT-BLOCK, KEY)
    ELSE:
        ENCRYPTED-NODE := Block-Storage-Get(REFERENCE)
        NODE := ChaCha20(ENCRYPTED-NODE, KEY)
        OUTPUT := []
        // read reference-key pairs until the node is exhausted or a null pair is reached
        WHILE SUB-REFERENCE, SUB-KEY := Read-RK-Pair-From-Node(NODE):
            OUTPUT := OUTPUT ++ [ERIS-Decode-Recurse(LEVEL - 1, SUB-REFERENCE, SUB-KEY)]
        RETURN CONCAT(OUTPUT)

ERIS-Decode(BLOCK-SIZE, LEVEL, ROOT-REFERENCE, ROOT-KEY):
    PADDED := ERIS-Decode-Recurse(LEVEL, ROOT-REFERENCE, ROOT-KEY)
    RETURN UNPAD(PADDED, BLOCK-SIZE)

Where the block-storage can be accessed as follows:

Block-Storage-Get(REFERENCE): Returns a block such that Blake2b-256(Block-Storage-Get(REFERENCE)) == REFERENCE or throws an error.

A streaming decoding procedure can be implemented where the content is output block-wise and does not need to be kept in memory for unpadding. For an example, see the reference Guile implementation.
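A non-streaming Python sketch of ERIS-Decode follows. It assumes block storage is given as a callable `get_block(reference) -> bytes` (for example a dictionary lookup), which is a hypothetical interface, and it enforces the Block-Storage-Get contract by checking each block's Blake2b-256 hash. Reading the pairs in a node stops at the first null reference-key pair, which by Section 2.4.3 can only be padding at the end of a node. ChaCha20 is reimplemented in pure Python only because the standard library lacks it.

```python
import hashlib
import struct


def chacha20(data: bytes, key: bytes) -> bytes:
    # ChaCha20 (RFC 8439) with a null 96-bit nonce and initial counter 0
    def rotl(v, c):
        return ((v << c) & 0xFFFFFFFF) | (v >> (32 - c))

    def qr(s, a, b, c, d):
        s[a] = (s[a] + s[b]) & 0xFFFFFFFF; s[d] = rotl(s[d] ^ s[a], 16)
        s[c] = (s[c] + s[d]) & 0xFFFFFFFF; s[b] = rotl(s[b] ^ s[c], 12)
        s[a] = (s[a] + s[b]) & 0xFFFFFFFF; s[d] = rotl(s[d] ^ s[a], 8)
        s[c] = (s[c] + s[d]) & 0xFFFFFFFF; s[b] = rotl(s[b] ^ s[c], 7)

    key_words = list(struct.unpack("<8I", key))
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), 64)):
        state = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574] + key_words + [counter, 0, 0, 0]
        w = state[:]
        for _ in range(10):  # 10 double rounds (column + diagonal)
            qr(w, 0, 4, 8, 12); qr(w, 1, 5, 9, 13); qr(w, 2, 6, 10, 14); qr(w, 3, 7, 11, 15)
            qr(w, 0, 5, 10, 15); qr(w, 1, 6, 11, 12); qr(w, 2, 7, 8, 13); qr(w, 3, 4, 9, 14)
        keystream = struct.pack("<16I", *((x + y) & 0xFFFFFFFF for x, y in zip(w, state)))
        out.extend(b ^ k for b, k in zip(data[offset:offset + 64], keystream))
    return bytes(out)


def unpad(padded: bytes, block_size: int) -> bytes:
    # scan at most block_size bytes from the end for the 0x80 marker
    stop = max(len(padded) - block_size, 0)
    for i in range(len(padded) - 1, stop - 1, -1):
        if padded[i] == 0x80:
            return padded[:i]
        if padded[i] != 0x00:
            raise ValueError("invalid padding")
    raise ValueError("padding marker not found")


def eris_decode(get_block, block_size: int, level: int, root_reference: bytes, root_key: bytes) -> bytes:
    null_half = bytes(32)

    def fetch(reference: bytes) -> bytes:
        # Block-Storage-Get contract: the block must hash to its reference
        block = get_block(reference)
        if hashlib.blake2b(block, digest_size=32).digest() != reference:
            raise ValueError("block does not match its reference")
        return block

    def recurse(level: int, reference: bytes, key: bytes) -> bytes:
        plain = chacha20(fetch(reference), key)
        if level == 0:
            return plain
        parts = []
        for offset in range(0, len(plain), 64):
            sub_reference, sub_key = plain[offset:offset + 32], plain[offset + 32:offset + 64]
            if sub_reference == null_half and sub_key == null_half:
                break  # null pairs only fill the end of a node
            parts.append(recurse(level - 1, sub_reference, sub_key))
        return b"".join(parts)

    return unpad(recurse(level, root_reference, root_key), block_size)
```

A tampered or substituted block is rejected by `fetch` before decryption, which is how decoding verifies data integrity (see Section 4.3).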

2.5.1. Random Access

A decoder that allows random access to the encoded content can be implemented by decoding selected sub-trees.
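Because Collect-RK-Pairs fills nodes strictly left to right, the position of leaf block i in the tree is given by the digits of i in base arity. The sketch below, with hypothetical function names, computes which leaf blocks cover a byte range of the padded content and which child to follow at each node from the root, so that only the sub-trees on those paths need to be fetched and decrypted.

```python
def leaf_range(offset: int, length: int, block_size: int) -> range:
    """Indices of the leaf (level 0) blocks covering bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    return range(offset // block_size, (offset + length - 1) // block_size + 1)


def leaf_path(leaf_index: int, level: int, arity: int) -> list:
    """Child positions to follow from the root node down to a leaf, root first."""
    if not 0 <= leaf_index < arity ** level:
        raise ValueError("leaf index not addressable from a root at this level")
    path = []
    for _ in range(level):
        path.append(leaf_index % arity)  # position within the parent node
        leaf_index //= arity
    return path[::-1]
```

For block size 1KiB (arity 16) and a root at level 2, leaf block 17 is reached via child 1 of the root and then child 1 of that node. Offsets here refer to the padded content; the last leaf may contain padding that a decoder must strip.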

2.6. Binary Encoding of Read Capability

The read capability, consisting of the block size, the level of the root reference-key pair and the root reference-key pair itself, contains all the information required to decode content.

We specify a binary encoding of the read capability that is 66 bytes long:

Byte offset	Content	Length (in bytes)
0	block size (`0x0a` for block size 1KiB and `0x0f` for block size 32KiB)	1
1	level of root reference-key pair as unsigned integer	1
2	root reference	32
34	root key	32

The initial field (block size) also encodes the ERIS version. Future versions of ERIS MUST use different codes to encode block sizes.
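The layout in the table above can be packed and parsed with the Python standard library. The following is a minimal sketch with hypothetical function names; it rejects unknown block-size codes, which in this version also identify the ERIS version.

```python
import struct

# block size -> code byte (0x0a == 10 == log2(1024), 0x0f == 15 == log2(32768))
BLOCK_SIZE_CODES = {1024: 0x0A, 32768: 0x0F}
CODE_BLOCK_SIZES = {code: size for size, code in BLOCK_SIZE_CODES.items()}


def encode_read_capability(block_size: int, level: int, root_reference: bytes, root_key: bytes) -> bytes:
    if block_size not in BLOCK_SIZE_CODES:
        raise ValueError("unsupported block size")
    if not 0 <= level <= 255 or len(root_reference) != 32 or len(root_key) != 32:
        raise ValueError("invalid level, reference or key")
    return struct.pack("BB", BLOCK_SIZE_CODES[block_size], level) + root_reference + root_key


def decode_read_capability(data: bytes):
    """Return (block_size, level, root_reference, root_key)."""
    if len(data) != 66:
        raise ValueError("a binary read capability is 66 bytes long")
    if data[0] not in CODE_BLOCK_SIZES:
        raise ValueError("unknown block size code (unsupported ERIS version?)")
    return CODE_BLOCK_SIZES[data[0]], data[1], data[2:34], data[34:66]
```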

Note that using a single byte to encode the level limits the size of content that can be encoded with ERIS. However, the size of the largest encodable content is approximately 1e300 TiB with block size 1KiB (and considerably more with block size 32KiB), which seems sufficient for any conceivable practical application (including an index of all atoms in the universe).
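The figure above can be checked with a short calculation, assuming the root is at most at level 255 (the largest value of one unsigned byte) and that each level multiplies the number of leaf blocks by the arity (16 for 1KiB, 512 for 32KiB):

```latex
\begin{aligned}
\text{1KiB blocks:}\quad & 16^{255} \cdot 2^{10}\,\text{B} = 2^{1030}\,\text{B} = 2^{990}\,\text{TiB} \approx 10^{298}\,\text{TiB} \\
\text{32KiB blocks:}\quad & 512^{255} \cdot 2^{15}\,\text{B} = 2^{2310}\,\text{B} = 2^{2270}\,\text{TiB} \approx 10^{683}\,\text{TiB}
\end{aligned}
```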

2.6.1. CBOR Tag

The CBOR tag 276 is assigned for an ERIS binary read capability (see Section 8.1). This allows efficient references to ERIS encoded content from CBOR.

2.6.2. GNU Name System

The GNU Name System [LSD0001] is a decentralized and censorship-resistant name system that can be used to resolve memorable names to secure identifiers. A GNU Name System record type for ERIS read capabilities is defined (see Section 9.1) allowing any ERIS encoded content to be associated with memorable names.

2.7. URN

A read capability can be encoded as a URN: urn:erisx2:BASE32-READ-CAPABILITY, where BASE32-READ-CAPABILITY is the unpadded Base32 [RFC4648] encoding of the binary read capability.

For example the ERIS URN of the UTF-8 encoded string "Hello world!" (with block size 1KiB and null convergence secret):

urn:erisx2:BIAD77QDJMFAKZYH2DXBUZYAP3MXZ3DJZVFYQ5DFWC6T65WSFCU5S2IT4YZGJ7AC4SYQMP2DM2ANS2ZTCP3DJJIRV733CRAAHOSWIYZM3M

Note: The URN namespace `erisx2` is used for this experimental version of the encoding. Once the encoding is finalized, the namespace `eris` will be used (e.g. `urn:eris:BIAD77QDJMFAKZYH2DXBUZYAP3MXZ3DJZVFYQ5DFWC6T65WSFCU5S2IT4YZGJ7AC4SYQMP2DM2ANS2ZTCP3DJJIRV733CRAAHOSWIYZM3M`).
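Python's `base64` module implements RFC 4648 Base32 but always emits and expects `=` padding, so a URN codec has to strip and restore it. This is a minimal sketch with hypothetical function names, assuming the experimental `erisx2` namespace.

```python
import base64

URN_PREFIX = "urn:erisx2:"


def read_capability_to_urn(read_capability: bytes) -> str:
    # unpadded RFC 4648 Base32: drop the trailing "=" characters
    return URN_PREFIX + base64.b32encode(read_capability).decode("ascii").rstrip("=")


def urn_to_read_capability(urn: str) -> bytes:
    if not urn.startswith(URN_PREFIX):
        raise ValueError("not an erisx2 URN")
    encoded = urn[len(URN_PREFIX):]
    # restore the padding that base64.b32decode requires
    return base64.b32decode(encoded + "=" * (-len(encoded) % 8))
```

Decoding the example URN above yields 66 bytes whose first byte is 0x0a (block size 1KiB) and whose second byte is 0 (the root is a single level-0 block), and re-encoding them reproduces the URN.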

3. Applications

Traditionally, encoding schemes similar to ERIS are used for peer-to-peer file sharing. We hope to motivate usage for a much wider scope of applications.

As part of the openEngiadina project we are using ERIS to encode small bits of information that constitute "local knowledge" (e.g. geographic information, social and cultural events, etc.) along with the social interactions that created and curated this information (using the ActivityStreams vocabulary [ActivityStreams]). ERIS allows such information to be securely cached on multiple peers to increase the robustness of the system.

ERIS encoded content can be used from existing web technology and RDF, as the content can be referenced by a URN. At the same time, more decentralized networks can be used (this is the subject of further research as part of the DREAM project).

Other possible applications include package managers such as Guix to increase availability of software sources and built packages or decentralized and offline-first mapping applications.

3.1. Storage and Transport Layers

ERIS is defined independently of any storage and transport layer for blocks. The only requirement is that blocks can be accessed by their reference - the hash of the block content.

Possible storage layers include:

in-memory hash-map
key-value store
files on a file system
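A minimal sketch of the simplest of these options, an in-memory hash map keyed by block reference. The class and method names are hypothetical. Retrieval checks that the block hashes to the requested reference, which is the only requirement ERIS places on a storage layer; intermediary peers can run the same check to detect invalid blocks.

```python
import hashlib


class InMemoryBlockStore:
    """Hypothetical content-addressed block store backed by a Python dict."""

    def __init__(self):
        self._blocks = {}

    @staticmethod
    def reference(block: bytes) -> bytes:
        # reference = unkeyed Blake2b-256 of the (encrypted) block
        return hashlib.blake2b(block, digest_size=32).digest()

    def put(self, block: bytes) -> bytes:
        ref = self.reference(block)
        self._blocks[ref] = block  # identical blocks are stored once
        return ref

    def get(self, reference: bytes) -> bytes:
        block = self._blocks.get(reference)
        if block is None:
            raise KeyError("block not available")
        if self.reference(block) != reference:
            raise ValueError("stored block does not match its reference")
        return block
```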

Transport mechanisms include:

HTTP: A simple HTTP endpoint can be used to dereference blocks.
Sneakernet: Blocks can be transported on a physical medium such as a USB stick.

More interesting transport and storage layers use the fact that blocks are content-addressed. For example the peer-to-peer network IPFS can be used to store and transport blocks. The major advantages over using IPFS directly include:

Blocks are encrypted and not readable by IPFS peers without the read capability.
Identifiers of blocks and encoded content are not tied to the IPFS network. Applications can transparently use IPFS or any other storage/transport mechanism.

It also seems possible to use the Named Data Networking infrastructure and forwarding daemons (initial support for using Blake2b as hash function is present in the ndn-cxx library).

3.2. Authenticity of Content

While decoding ERIS encoded content the integrity of the content is verified. Content cannot be tampered with without changing the identifier (read capability) of the content. To prove authenticity of encoded content it is sufficient to cryptographically sign the read capability.

We have presented a concrete proposal on how this might be done using an RDF vocabulary and the Ed25519 cryptographic signature scheme [RDF-Signify].

3.3. Mutability and Namespaces

Encoded content is immutable in the sense that changing the encoded content results in a new identifier. Existing references to the old content need to be updated. This is a property that makes caching efficient and allows ERIS to be used for robust systems.

Nevertheless, there are applications where one wants to reference mutable content. Examples include user profiles or dynamic collections of content. Making small changes to a user profile or adding a piece of content to a collection should preserve the identifiers.

There are many ways of implementing such mutability or "namespaces". ERIS does not specify any particular mechanism. Possible mechanisms include:

Centralized servers that return a mutable list of references to (immutable) content. This is how most HTTP services work.
Append-only logs where changes are securely appended with cryptographic signatures. The state is computed from the log of changes. This is how peer-to-peer systems such as hypercore or Secure ScuttleButt work.
Petname system: A system where a dynamic local name can be mapped to a reference. Sophisticated systems that allow delegation of naming authority include the GNU Name System.
Commutative Replicated Data Types (CRDTs) are distributed data structures similar to append-only logs, with the advantage that replicas of a mutable container can diverge and eventually converge to a consistent state. Such structures seem especially suitable when control over a mutable container is shared by multiple parties. For an example see Distributed Mutable Containers [DMC].

We believe that the best suited mechanism for handling mutability depends on concrete applications and use-cases. A key value of ERIS is that it is agnostic of such mechanisms and can be used from any of them.

4. Security Considerations

In this section we discuss security considerations when using ERIS as an abstract encoding as well as when used in conjunction with storage and transport layers (see Section 3.1).

We use terms for communication security as defined in RFC 3552 [RFC3552] (e.g. CONFIDENTIALITY or PEER ENTITY AUTHENTICATION).

4.1. Threat Model

We consider a setting with the following entities:

Publisher: Wants to publish some content.
Audience: A group of entities that should be able to decode the content published by the publisher. They receive the ERIS read capability from the publisher over a channel that provides CONFIDENTIALITY and PEER ENTITY AUTHENTICATION.
Intermediary Peers: A group of entities that assist in making the content available to the audience by storing and transporting blocks of the encoded content. There are no communication security requirements for the communication between the intermediary peers and publisher or audience. Note that the publisher as well as members of the audience can act as intermediary peers.
Censor: An adversary that wishes to prevent the audience from being able to decode some specific content. The censor does not have access to the read capability of the encoded content but may inspect, modify or drop communication between the intermediary peers and the audience. The censor can neither access nor control the internal state of the publisher, audience or intermediary peers. The censor can impersonate a malicious intermediary peer.

See also the ECRS paper [ECRS] and the theoretical treatment of censorship resistance by Perng et al. [Perng2005].

4.2. Availability

The publisher can make the blocks of the encoded content available by replicating them to intermediary peers and audience directly over various different storage and transport layers (see Section 3.1).

4.3. Data Integrity

Members of the audience can verify the integrity of the content while decoding by verifying the Blake2b hash of a block (see Section 2.5). Even when a malicious intermediary peer is distributing invalid blocks, this will be detected by the internal decoding process run by the audience.

Intermediary peers can pro-actively detect invalid blocks by checking the Blake2b hash of the block.

Note that the ERIS read capability cannot be used to verify the integrity of content when the content is given directly and not decoded from blocks. When the content is given as a sequence of bytes, the only way to compute the read capability is to encode the content. However, the read capability to verify might have been computed using a convergence secret (see Section 2.3) that is not known, making it impossible to verify that the content corresponds to the read capability.

4.4. Intermediary Peer Deniability

Intermediary peers do not need to have access to the read capability in order to store and transport blocks of encoded content. As the blocks only contain encrypted data, intermediary peers can plausibly deny being able to decode the content.

Note that in certain situations an active attack that can reveal parts of the encoded content is possible (see Section 4.6).

4.5. Censorship Resistance

We define censorship resistance as the inability of a censor who does not have access to the read capability of some content to prevent members of the audience from decoding the specific content without preventing access to all content (i.e. DENIAL OF SERVICE).

This holds as a censor that does not have access to the read capability of some content can not decide if a given block is required to decode the specific content or is a block from the encoding of some other content.

Note that a censor can prevent the audience from decoding any content by dropping all communication to intermediary peers - the censor can perform a DENIAL OF SERVICE attack. Practically a DENIAL OF SERVICE attack can be made difficult by replicating blocks and using various different storage and transport layers (see Section 4.2).

Our definition of censorship resistance is slightly stronger than that of ECRS [ECRS]. In ECRS the censor may not know the exact content, whereas in ERIS the censor may not know the read capability. If the censor does know the exact content that should be censored, then a fresh convergence secret can be used to create a read capability that the censor does not know (see Section 2.3).

4.6. Known Attacks on Convergent Encryption

Convergent encryption allows de-duplication of content and deterministic identifiers. However, it also suffers from two known attacks [Zooko2008]:

The Confirmation Of A File Attack: An adversary who knows the read capability of some ERIS encoded content can enumerate all blocks that are required to decode the content. The adversary can then confirm that a member of the audience is accessing the content by observing which blocks are being accessed.
The Learn-The-Remaining-Information Attack: When encoding content where an adversary can predict parts of the content, the remaining information may be learnt by brute forcing the remainder. For example, this might be a configuration file where the adversary knows the entire content except for a single field containing a password. The adversary can then brute force the password and confirm that a configuration with a certain password has been accessed. Note that the brute-forcing effort can be reused to efficiently learn many secrets, similar to how rainbow tables are used to crack password hashes.

ERIS is vulnerable to both attacks when using a convergence secret that is known to the adversary. A defense against both attacks is to use a convergence secret that is not known to the adversary. Using a different convergence secret causes the same content to be encoded into different blocks and identifiers.
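The effect of this defense can be illustrated with a simplified, hypothetical identifier function (a Blake2b-256 hash keyed with the convergence secret, standing in for the full encoding). Identical content under the same secret yields identical identifiers, while a fresh random secret (32 bytes, the length of the convergence secret in the test vectors) yields identifiers the adversary cannot precompute.

```python
import hashlib
import secrets


def toy_identifier(content: bytes, secret: bytes) -> bytes:
    # Hypothetical stand-in: deterministic in (content, convergence secret).
    return hashlib.blake2b(content, digest_size=32, key=secret).digest()


content = b"user=alice\npin=4711\n"
null_secret = bytes(32)                    # publicly known convergence secret
private_secret = secrets.token_bytes(32)   # fresh secret kept from the adversary

# Same content, same secret: identical identifiers. This enables de-duplication
# but is also what both attacks exploit.
same = toy_identifier(content, null_secret) == toy_identifier(content, null_secret)

# A secret unknown to the adversary yields a different identifier, so guesses
# computed with the null secret no longer match.
different = toy_identifier(content, private_secret) != toy_identifier(content, null_secret)
```

The cost shows up in the same example: content encoded under `private_secret` no longer de-duplicates with copies encoded under the null secret.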

De-duplication and deterministic identifiers are properties that may be important to certain applications and users. Users should be aware of the known attacks and must decide, depending on application and context, whether mitigations are necessary.

4.7. Observing Block Access

A passive adversary that only observes communication between audience and intermediary peers might be able to learn information about encoded content from the pattern in which members of the audience access blocks. For example, an adversary might be able to infer that certain blocks are part of the encoding of a video with some resolution by observing that blocks are fetched at predictable intervals and in a fixed order.

Storage and transport layers SHOULD use encryption to prevent passive network attackers from being able to observe such patterns.

Members of the audience MAY use obfuscation tactics when getting blocks from storage and transport layers to prevent malicious intermediary peers from being able to observe such patterns.
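One possible obfuscation tactic, not prescribed by this specification, is to request the needed blocks in random order, mixed with decoy blocks from unrelated content. The sketch below assumes a hypothetical `decoy_pool` of references the client can fetch and then discard; it does not hide request timing.

```python
import random


def obfuscated_request_order(needed: list, decoy_pool: list, decoys: int, rng=None) -> list:
    """Mix the references a client needs with decoy references and shuffle them.

    An intermediary peer then sees neither the exact set of blocks required
    nor the order in which the content is decoded. Fetched decoys are discarded.
    """
    rng = rng or random.SystemRandom()
    needed_set = set(needed)
    candidates = [r for r in decoy_pool if r not in needed_set]
    chosen = rng.sample(candidates, min(decoys, len(candidates)))
    requests = list(needed) + chosen
    rng.shuffle(requests)
    return requests
```

Decoys increase bandwidth, so the number of decoys is a trade-off between privacy and cost.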

5. Test Vectors

5.1. Machine Readable

The test vectors are provided as machine-readable JSON files in the archive eris-test-vectors-v0.3.0.tar.gz.

Implementations of the ERIS encoding MUST be able to satisfy the test vectors as described below.

For example, the test vector eris-test-vector-00.json:

{
  "id": 0,
  "spec-version": "0.3.0",
  "name": "short string (block size 1KiB)",
  "description": "Encode the UTF-8 encoding of the string \"Hello world!\" with block-size 1KiB and null convergence-secret.",
  "content": "JBSWY3DPEB3W64TMMQQQ",
  "convergence-secret": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA",
  "block-size": 1024,
  "read-capability": {
    "block-size": 1024,
    "level": 0,
    "root-reference": "H77AGSYKAVTQPUHODJTQA7WZPTWGTTKLRB2GLMF5H53NEKFJ3FUQ",
    "root-key": "CPTDEZH4ALSLCBR7INTIBWLLGMJ7MNFFCGX7PMKEAA52KZDDFTNQ"
  },
  "urn": "urn:erisx2:BIAD77QDJMFAKZYH2DXBUZYAP3MXZ3DJZVFYQ5DFWC6T65WSFCU5S2IT4YZGJ7AC4SYQMP2DM2ANS2ZTCP3DJJIRV733CRAAHOSWIYZM3M",
  "blocks": {
    "H77AGSYKAVTQPUHODJTQA7WZPTWGTTKLRB2GLMF5H53NEKFJ3FUQ": "EWZKXK73236ETFGMMFORFLMNIPE5V3S3WDVFECUPI47RFJBA5ZMBHH6HMOZCNFQKOTADCPJMPHTZJNEW4VOKHBSNABYVIZWQDV5GQPECUBDAULOPR2S7ITYQSGGVPPEWJVEZNIUKUFR4XE7GQPUDY3FPFSCUYIISZX6PWLLPNPI5V3RKWQGN2L6LLE5G7TZ5FVPAYUHOES4LGHRKSXYCNQF6IR5HLKX2C2EPVKSU2T6XOSAF5VHUZ2GTQS7BLT3VYP5BYI2WR4GJEYDWLY26TK6ZQ2DYZZBIYSVUIY557FE6QOV3L5X5HCAQEWPYCUKUADOOSMNU7EEONPRMBJU4XLQ66AOOVRQ66OJLHANVLNFDXXPLH6KDVCJBVQWWWI7PA6OGKGPU7ZZPZT2DIBOAUGWM6DVZWWX3DA3GHWS3VY6RQMLAKDXHZRQ6VDVMLMFSULJYHACC7G57CZ2SG7XB24XT3SLJG56PO3Z7YJJYEVP44F44YCZ5YS4NRZKWS4OTFMXGNF25G3GIGSV5NEVVTSO6J5EKEXWTX74X27HYI4UZ45YF675423AWYUVTPVLUWOJMGANQRDWYOPFE5QH6JUINCH5NYZUHYPZP6WHC4IVOLYFDAUNOWLRVR37BLT5E44VVJ6XDQZAS2XT6G2XM3RJUUQEYD2RRFBWGPNSOJ2RUPE654GKHRDCKUX2MZ6D43LKI2DKCF7QEPYWJWJH6EI74NQNOLCHEAUFEXH5ZXUXO6JJ5PKOXGL4RBOGCP2X2RYXOJOCT55BAGCRQHID2TRO7NPZWGQNMSSWHOAXY6JFCVFXXGR4JM62HHXZTKODD7NYXO7EUS3GMY2NDQFENM3XKNAI5MFNLL7ERMPSIXHAJ44ASIDZS7RPZ542SLH7XONZ6PMCPI4V66ALJJTTXMJAEU35YPH3UD7UHBCM4OI3SDGTUL3TQQWMDIFBNECJN7FNAWRXTWCXM6CIILVYAITWSEDIDEMLBKR5KIGE5SQTW2ITIIA725SNZO3PJMQCAPJI4H3QXVPKG4OZIOTENU2VW3W3PNAYVE65YJBQGPY6M6LRQYGPYYSEFTRPW3YXGGC2ICFROUD7FXCFXVD6OWA4B6LDFDX4LPF4H7525BVRBNW2ZLMXZUXCFZSZOSSP7VKBCWIDJ72XSR43YFKTL5TADVXDF3RN2HHAGKXWOXINMJJLRE4K72H54IOROFS4FD5QYXWSJWH4ENYC4PAOJ6JELRFYC6RMXP73VR745WY4ZOFQTRQ5ZEA2C3M7JTQUVKV26XGVVHBYA7NEMRPZNVRXHCKYN3CGJSICBUFGMHSSDBTRIF3BCPVMLRBU25DFGGM4LEEL4KTIAJITYY5XPR4XDRD55PEDVOUL342IXCNEBTTPPLMPV6EJYUFJS42R4XLDOT7NOFPLTZUBLWSLL7IVZNPNI6DZ4CR7YEQP72DDUWDJJTKACT35JLFPDW3M2VUOJF3CUWN6FYN5YJJSXYMXSVDZDVIAJYF2HOPQEHLMRF3MJAXMTLMCOIARLFZKAGRSW6PWQZ7ZJLCQAPSJTPNDA2SLUA3UHH34NWEPTAVWOBDPNTMT27TK5P4VKLE2YEJHKWE6SJA3V7A3UPQS24SWDJ2BPOV7JG23ZVIA"
  }
}

The fields of JSON test vectors are:

id: Numeric identifier of the test vector.
spec-version: Version of the ERIS specification.
name: Short human-readable name.
description: Human-readable description of the test.
content: The binary content to be encoded, as an unpadded Base32 string.
convergence-secret: The convergence secret to be used, as an unpadded Base32 string.
block-size: Block size in bytes to be used for encoding (either 1024 or 32768).
read-capability: JSON map containing the components of the read capability. It is not used in the tests but is provided as an aid for developers.
urn: The ERIS URN of the content.
blocks: JSON map of the blocks required to decode the content given the URN. Keys (block references) and values (blocks) are encoded as Base32 strings.
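The helpers below sketch how a Python test harness might read these fields. The Base32 strings are unpadded, so the padding expected by the standard library's `base64.b32decode` is restored first. `parse_urn` assumes the 66-byte binary read capability of Section 2.6 (block-size byte, level byte, 32-byte root reference, 32-byte root key) prefixed with `urn:erisx2:`; all function names are hypothetical.

```python
import base64

URN_PREFIX = "urn:erisx2:"


def b32decode_unpadded(text: str) -> bytes:
    """Decode an unpadded RFC 4648 Base32 string by restoring the '=' padding."""
    return base64.b32decode(text + "=" * (-len(text) % 8))


def parse_urn(urn: str) -> dict:
    """Split an ERIS URN into the components of the binary read capability.

    Assumed layout (66 bytes): block-size byte (0x0a for 1KiB, 0x0f for 32KiB),
    level byte, 32-byte root reference, 32-byte root key.
    """
    if not urn.startswith(URN_PREFIX):
        raise ValueError("not an ERIS URN")
    cap = b32decode_unpadded(urn[len(URN_PREFIX):])
    if len(cap) != 66 or cap[0] not in (0x0A, 0x0F):
        raise ValueError("malformed binary read capability")
    return {
        "block-size": 1 << cap[0],  # 2**10 = 1024, 2**15 = 32768
        "level": cap[1],
        "root-reference": cap[2:34],
        "root-key": cap[34:66],
    }


def decode_test_vector(vector: dict) -> dict:
    """Return a copy of a parsed JSON test vector with Base32 fields as bytes."""
    decoded = dict(vector)
    decoded["content"] = b32decode_unpadded(vector["content"])
    decoded["convergence-secret"] = b32decode_unpadded(vector["convergence-secret"])
    decoded["blocks"] = {
        b32decode_unpadded(ref): b32decode_unpadded(block)
        for ref, block in vector["blocks"].items()
    }
    return decoded
```

Given an open file, `decode_test_vector(json.load(f))` yields byte values ready to pass to an implementation. Applied to the URN of eris-test-vector-00.json, `parse_urn` recovers the values listed under `read-capability`.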

Implementations MUST verify that the content encodes to the URN, given the specified block size and convergence secret, and that the content can be decoded given the URN and the blocks.
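A harness for these two requirements might look like the following sketch. `encode` and `decode` are hypothetical hooks into the implementation under test (the encoding and decoding procedures themselves are specified earlier in this document). The vector is assumed to have its Base32 fields already decoded to bytes, except for the URN.

```python
from typing import Callable, Dict


def check_test_vector(
    vector: dict,
    encode: Callable[[bytes, bytes, int], str],
    decode: Callable[[str, Callable[[bytes], bytes]], bytes],
) -> None:
    """Check both MUST requirements for one test vector; raise on failure.

    `encode(content, convergence_secret, block_size)` returns a URN and
    `decode(urn, get_block)` returns content; both are hypothetical hooks.
    """
    name = f"test vector {vector['id']}"

    # Content + block size + convergence secret must encode to the given URN.
    urn = encode(vector["content"], vector["convergence-secret"], vector["block-size"])
    if urn != vector["urn"]:
        raise AssertionError(f"{name}: expected {vector['urn']}, got {urn}")

    # The URN must decode to the content using only the blocks in the vector;
    # a missing block surfaces as a KeyError from the lookup.
    blocks: Dict[bytes, bytes] = vector["blocks"]
    content = decode(vector["urn"], blocks.__getitem__)
    if content != vector["content"]:
        raise AssertionError(f"{name}: decoded content does not match")
```

Restricting `decode` to the vector's own block map ensures that an implementation cannot pass by reading blocks from elsewhere.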

5.2. Large content

To verify implementations that encode content in a streaming fashion (see Section 2.4.4), the URNs of large contents generated in a specified way are provided:

Test name	Content size	Block size	URN	Level of root reference
100MiB (block size 1KiB)	100MiB	1KiB	urn:erisx2:BICXPZNDNXFLO4IOMF6VIV2ZETGUJEUU7GN4AHPWNKEN6KJMCNP6YNUMVW2SCGZUJ4L3FHIXVECRZQ3QSBOTYPGXHN2WRBMB27NXDTAP24	5
1GiB (block size 32KiB)	1GiB	32KiB	urn:erisx2:B4BFG37LU5BM5N3LXNPNMGAOQPZ5QTJAV22XEMX3EMSAMTP7EWOSD2I7AGEEQCTEKDQX7WCKGM6KQ5ALY5XJC4LMOYQPB2ZAFTBNDB6FAA	2
256GiB (block size 32KiB)	256GiB	32KiB	urn:erisx2:B4BZHI55XJYINGLXWKJKZHBIXN6RSNDU233CY3ELFSTQNSVITBSVXGVGBKBCS4P4M5VSAUOZSMVAEC2VDFQTI5SEYVX4DN53FTJENWX4KU	3

The content is the ChaCha20 keystream with a null (all-zero) nonce and a key that is the Blake2b-256 hash of the UTF-8 encoded test name (e.g. KEY := Blake2b-256("100MiB (block size 1KiB)")). The keystream can be computed by encrypting a sequence of null bytes of the required length (e.g. CHACHA20_STREAM := ChaCha20(NULL, KEY)).
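The sketch below generates this content in pure Python, using the ChaCha20 block function of RFC 8439 with a 12-byte all-zero nonce. The initial block counter of 0 is an assumption, and the function names are hypothetical. Encrypting null bytes simply returns the keystream, so the keystream is generated directly. Pure Python is far too slow for the 1GiB and 256GiB cases; a real test would use an optimised ChaCha20 library.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def _rotl32(v: int, c: int) -> int:
    return ((v << c) & MASK) | (v >> (32 - c))


def _quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    s[a] = (s[a] + s[b]) & MASK
    s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK
    s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = _rotl32(s[b] ^ s[c], 7)


def chacha20_block(key: bytes, counter: int, nonce: bytes) -> bytes:
    """One 64-byte ChaCha20 keystream block (RFC 8439, Section 2.3)."""
    constants = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]  # "expand 32-byte k"
    state = constants + list(struct.unpack("<8I", key)) + [counter] + list(struct.unpack("<3I", nonce))
    working = state[:]
    for _ in range(10):  # 20 rounds = 10 double rounds
        _quarter_round(working, 0, 4, 8, 12)   # column rounds
        _quarter_round(working, 1, 5, 9, 13)
        _quarter_round(working, 2, 6, 10, 14)
        _quarter_round(working, 3, 7, 11, 15)
        _quarter_round(working, 0, 5, 10, 15)  # diagonal rounds
        _quarter_round(working, 1, 6, 11, 12)
        _quarter_round(working, 2, 7, 8, 13)
        _quarter_round(working, 3, 4, 9, 14)
    return struct.pack("<16I", *((w + s) & MASK for w, s in zip(working, state)))


def large_content(test_name: str, size: int):
    """Yield the first `size` bytes of the test content in chunks of up to 64 bytes."""
    key = hashlib.blake2b(test_name.encode("utf-8"), digest_size=32).digest()
    nonce = bytes(12)  # null nonce
    counter = 0        # assumption: the keystream starts at block counter 0
    remaining = size
    while remaining > 0:
        chunk = chacha20_block(key, counter, nonce)[:remaining]
        yield chunk
        remaining -= len(chunk)
        counter += 1
```

Feeding the chunks of `large_content("100MiB (block size 1KiB)", 100 * 1024 * 1024)` to a streaming encoder with 1KiB blocks should then reproduce the first URN in the table above.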

6. Implementations

A list of known implementations that satisfy the test vectors:

Name	Programming language	License	Notes	Homepage
`guile-eris`	Guile	GPL-3.0-or-later	Reference implementation	https://inqlab.net/git/eris.git/
`elixir-eris`	Elixir	GPL-3.0-or-later	Used in the CPub ActivityPub server	https://inqlab.net/git/elixir-eris.git/
`eris`	Go	BSD-3-Clause		https://github.com/cjslep/eris
`eris`	Nim	ISC		https://git.sr.ht/~ehmry/eris
`js-eris`	JavaScript	LGPL-3.0-or-later		https://inqlab.net/git/js-eris.git/

Further implementations are under development in Wisp, Python and OCaml.

7. Mailing List

A mailing list for general discussion on ERIS is available at ~pukkamustard/eris@lists.sr.ht.

Please feel free to direct any questions or comments regarding the specification to the mailing list. You are also invited to share your implementations and use-cases.

8. IANA Considerations

8.1. CBOR Tags Registry

This specification requires the assignment of a CBOR tag for a binary ERIS read capability. The tag is added to the CBOR Tags Registry as defined in RFC 8949 [RFC8949].

Tag	Data Item	Semantics
276	byte string	ERIS binary read capability (see Section 2.6)
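For illustration, the sketch below hand-encodes a binary read capability with this tag, without a CBOR library. It assumes the 66-byte binary read capability of Section 2.6; the function name is hypothetical.

```python
def cbor_tagged_read_capability(cap: bytes) -> bytes:
    """Wrap a 66-byte binary read capability in CBOR tag 276 (RFC 8949).

    A tag number above 255 is encoded as major type 6 with a 2-byte argument
    (0xd9 0x01 0x14); a byte string of 24..255 bytes as major type 2 with a
    1-byte length (0x58 followed by the length, here 0x42 = 66).
    """
    if len(cap) != 66:
        raise ValueError("binary read capability must be 66 bytes")
    tag = bytes([0xD9]) + (276).to_bytes(2, "big")
    header = bytes([0x58, len(cap)])
    return tag + header + cap
```

The result is always 71 bytes long and starts with the hex prefix `d9011458 42`.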

9. GANA Considerations

9.1. GNU Name System record types registry

GANA [GANA] is requested to add an entry into the "GNU Name System record types" registry as follows:

Number	Name	Comment	References
65557	ERIS_READ_CAPABILITY	Encoding for Robust Immutable Storage (ERIS) binary read capability	http://purl.org/eris

10. Acknowledgments

Thanks to Cory Slep, Arne Babenhauserheide, Serge Wroclawski, Christine Lemmer Webber, Christian Grothoff, Natacha, Hellekin, Nemael, TG, Devan, Emery, Arie, Allen, Joe and many others for the discussions, suggestions and support.

Development of ERIS has been supported by the NLnet Foundation through the NGI0 Discovery Fund.

Changelog

The most recent version of the specification is published at http://purl.org/eris.

v0.3.0 (11. January 2022)

Added

CBOR Tag for ERIS binary read capability
GANA GNU Name System record types entry
Security considerations

Fixed

Off-by-one errors in specification of PAD and UNPAD
Simplify pseudocode implementation of Split-Content by reading from padded content

Changed

Encoding of block size in binary read capability: Use 0x0a for block size 1KiB (instead of 0x00) and 0x0f for block size 32KiB (instead of 0x01)
Remove confidentiality from objectives and add intermediary peer deniability and censorship resistance
Test vectors are provided in a tar archive

v0.2.0 (7. December 2020)

Major update of the encoding that removes the verification capability (the ability to verify the integrity of content without reading it).

v0.1.0 (11. June 2020)

Initial version.

Copyright

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

References

Normative References

[GANA] GNUnet e.V., GNUnet Assigned Numbers Authority (GANA), 2020.
[LSD0001] M. Schanzenbach, C. Grothoff, B. Fix, The GNU Name System, 2021.
[RFC2119] S. Bradner, Key words for use in RFCs to Indicate Requirement Levels, 1997.
[RFC3552] E. Rescorla and B. Korver, Guidelines for Writing RFC Text on Security Considerations, 2003.
[RFC4648] S. Josefsson, The Base16, Base32, and Base64 Data Encodings, 2006.
[RFC8949] C. Bormann and P. Hoffman, Concise Binary Object Representation (CBOR), 2020.
[RFC7693] M-J. Saarinen and J-P. Aumasson, The BLAKE2 Cryptographic Hash and Message Authentication Code (MAC), 2015.
[RFC8439] Y. Nir and A. Langley, ChaCha20 and Poly1305 for IETF Protocols, 2018.
[RFC8141] P. Saint-Andre and J. Klensin, Uniform Resource Names (URNs), 2017.

Informative References

[ActivityStreams] Snell and Prodromou, Activity Streams 2.0, 2017.
[BEP52] The BitTorrent Protocol Specification v2, 2017.
[DMC] pukkamustard, Distributed Mutable Containers, 2020.
[ECRS] Grothoff, et al., An encoding for censorship-resistant sharing, 2003.
[Freenet] Clarke, et al., Freenet: A distributed anonymous information storage and retrieval system, 2001.
[Perng2005] Perng, et al., Censorship Resistance Revisited, 2005.
[Polleres2020] Polleres, et al., A more decentralized vision for Linked Data, 2020.
[RDF-Signify] pukkamustard, RDF Signify, 2020.
[RFC7927] Kutscher, et al., Information-Centric Networking (ICN) Research Challenges, 2016.
[Zooko2008] Zooko Wilcox-O’Hearn, Drew Perttula and Attacks on Convergent Encryption, 2008.

1. This padding algorithm is apparently also specified in ISO/IEC 7816-4. However, that specification is not openly available.