This document describes the Encoding for Robust Immutable Storage (ERIS). ERIS is an encoding of arbitrary content into a set of uniformly sized, encrypted and content-addressed blocks as well as a short identifier that can be encoded as a URN. The content can be reassembled from the blocks only with this identifier. The encoding is defined independently of any storage and transport layer or any specific application. We illustrate how ERIS can be used to build robust and decentralized applications.¶
Unavailability of content on computer networks is a major cause for reduced reliability of networked services [Polleres20].¶
Availability can be increased by caching content on multiple peers. However, most content on the Internet is identified by its location. Caching location-addressed content is complicated, as the content receives a new location when cached.¶
An alternative to identifying content by its location is to identify content by the content itself. This is called content-addressing. The hash of the content is computed and used as a unique identifier for the content.¶
Caching content-addressed content and making it available redundantly is much easier as the content is completely decoupled from any physical location. Integrity of content is automatically ensured with content-addressing (when using a cryptographic hash) as the identifier of the content can be recomputed to check that the content matches the requested identifier.¶
However, naive content-addressing has certain drawbacks:¶
ERIS addresses these issues by splitting content into small, uniformly sized and encrypted blocks. These blocks can be reassembled into the original content only with access to a short read capability, which can be encoded as a URN.¶
Encodings similar to ERIS are already widely used in applications and protocols such as GNUNet (see Section 1.3), BitTorrent, Freenet [Freenet], Gnutella, Direct Connect [THEX] and others. However, they all use slightly different encodings that are tied to the respective protocols and applications. ERIS defines an encoding independent of any specific protocol or application and decouples content from transport and storage layers. ERIS may be seen as a modest step towards Information-Centric Networking [RFC7927].¶
The objectives of ERIS are:¶
Confidentiality is not an objective of ERIS and ERIS SHOULD NOT be used to ensure that content is kept secret from an adversary.¶
ERIS describes how arbitrary content (sequence of bytes) can be encoded into a set of uniformly sized blocks and an identifier with which the content can be decoded from the set of blocks.¶
ERIS does not prescribe how the blocks should be stored or transported over a network. The only requirement is that a block can be referenced and accessed (if available) by the hash value of the contents of the block. In Section 3.1 we show how existing technology can be used to store and transport blocks.¶
There is also no support for grouping content or mutating content. In Section 3.3 we describe how such functionality can be implemented on top of ERIS.¶
ERIS is an attempt to find a minimal common basis on which higher-level functionality, such as mutability, can be built.¶
ERIS is inspired by and based on the encoding used in the file-sharing application of GNUNet, the Encoding for Censorship-Resistant Sharing (ECRS) [ECRS].¶
ERIS differs from ECRS in the following points:¶
Other related projects include Tahoe-LAFS, Freenet and Datashards. The reader is referred to the ECRS paper [ECRS] for an in-depth explanation and comparison of related projects.¶
This specification is versioned according to Semantic Versioning 2.0.0.¶
The most recent version of the specification is published at http://purl.org/eris.¶
The patch version may be incremented for editorial fixes. The minor version may be incremented for new backwards-compatible functionality. The major version will be incremented when backwards-incompatible changes are necessary (e.g. an update of cryptographic primitives).¶
See also Appendix "Changelog".¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].¶
We use binary prefixes for multiples of bytes, i.e.: 1024 bytes is 1 kibibyte (KiB), 1024 kibibytes is 1 mebibyte (MiB) and 1024 mebibytes is 1 gibibyte (GiB).¶
The cryptographic primitives used by ERIS are a cryptographic hash function, a symmetric key cipher and a padding algorithm. The hash function and cipher are readily available in open-source libraries such as libsodium or Monocypher. The padding algorithm can be implemented with reasonable effort.¶
We use Blake2b [RFC7693] with output size of 256 bit (32 byte). The keying feature is used and we refer to the key used for keying Blake2b as the hashing key. The hashing key always has a size of 256 bit (32 byte) (see Section 2.3).¶
The functions provided are `Blake2b-256-Keyed(INPUT, HASHING-KEY)` for keyed hashing and `Blake2b-256(INPUT)` for unkeyed hashing.¶
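For illustration, both hash functions map directly onto Python's standard `hashlib.blake2b`, which accepts a `digest_size` and a `key` of up to 64 bytes. The helper names `blake2b_256` and `blake2b_256_keyed` are hypothetical and simply mirror the notation above.

```python
import hashlib


def blake2b_256(data: bytes) -> bytes:
    """Unkeyed Blake2b with a 32 byte digest (Blake2b-256)."""
    return hashlib.blake2b(data, digest_size=32).digest()


def blake2b_256_keyed(data: bytes, hashing_key: bytes) -> bytes:
    """Keyed Blake2b-256; ERIS always uses a 32 byte hashing key."""
    if len(hashing_key) != 32:
        raise ValueError("hashing key must be 32 bytes")
    return hashlib.blake2b(data, digest_size=32, key=hashing_key).digest()
```

Keying with 32 zero bytes (the null convergence secret) is not equivalent to unkeyed hashing: Blake2b mixes the key in as an extra block, so the two digests differ.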
We use the ChaCha20 (IETF variant) [RFC8439] stream cipher. The provided function is `ChaCha20(INPUT, KEY, NONCE)`, where `INPUT` is an arbitrary-length byte sequence, `KEY` is the 256 bit encryption key and `NONCE` is the 96 bit nonce. The output is the encrypted byte sequence.¶
The 32 bit initial block counter is set to zero.¶
Decryption is done with the same function, where `INPUT` is the encrypted byte sequence.¶
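A straightforward pure-Python transcription of the RFC 8439 block function, with the initial counter fixed at zero as described above, shows what `ChaCha20(INPUT, KEY, NONCE)` computes. It is an illustrative sketch only (slow and not constant-time); real implementations should use a library such as libsodium or Monocypher. The name `chacha20` is hypothetical.

```python
import struct

_M = 0xFFFFFFFF
_QR = ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),   # column rounds
       (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14))   # diagonal rounds


def _rotl(v: int, n: int) -> int:
    return ((v << n) & _M) | (v >> (32 - n))


def _block(key: bytes, counter: int, nonce: bytes) -> bytes:
    """One 64 byte keystream block (RFC 8439, Section 2.3)."""
    state = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574,       # "expand 32-byte k"
             *struct.unpack("<8I", key), counter & _M, *struct.unpack("<3I", nonce)]
    x = state[:]
    for _ in range(10):                                             # 10 double rounds
        for a, b, c, d in _QR:
            x[a] = (x[a] + x[b]) & _M; x[d] = _rotl(x[d] ^ x[a], 16)
            x[c] = (x[c] + x[d]) & _M; x[b] = _rotl(x[b] ^ x[c], 12)
            x[a] = (x[a] + x[b]) & _M; x[d] = _rotl(x[d] ^ x[a], 8)
            x[c] = (x[c] + x[d]) & _M; x[b] = _rotl(x[b] ^ x[c], 7)
    return struct.pack("<16I", *((xi + si) & _M for xi, si in zip(x, state)))


def chacha20(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """Encrypt or decrypt `data` with the initial block counter set to zero."""
    if len(key) != 32 or len(nonce) != 12:
        raise ValueError("ChaCha20 needs a 32 byte key and a 12 byte nonce")
    out = bytearray()
    for i in range(0, len(data), 64):
        keystream = _block(key, i // 64, nonce)
        out += bytes(p ^ k for p, k in zip(data[i:i + 64], keystream))
    return bytes(out)
```

For a node at level `l`, ERIS would pass `nonce = bytes([l]) + bytes(11)`. Because the cipher XORs a keystream, applying the same call to the ciphertext recovers the plaintext.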
We use a byte padding scheme to ensure that the input content size is a multiple of the block size. The procedures `Pad` and `Unpad`, as described below, provide the necessary functionality.¶
The padding scheme used is the same as the one implemented in the libsodium cryptographic library.¶
The procedure `Pad(input, block-size)`, given `input` of length `n`, adds a mandatory byte valued `0x80` (hexadecimal) to `input` followed by `m < block-size` bytes valued `0x00` such that `n + m + 1` is the smallest multiple of `block-size`.¶
The procedure `Unpad(input, block-size)` starts reading bytes from the end of `input` until a `0x80` is read and then returns the bytes of `input` before the `0x80`. The procedure throws an error if a value other than `0x00` is read before reading `0x80`, if no `0x80` is read after reading `block-size` bytes from the end of `input`, or if the length of `input` is less than `block-size`.¶
Implementations MUST check that padding is valid when unpadding. This is verified in the negative test vectors (see Section 2.8.6.2).¶
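The following Python sketch transcribes `Pad` and `Unpad` as described above, including the three error conditions. It follows the described scheme rather than reproducing libsodium's code, and the lowercase names `pad` and `unpad` are hypothetical.

```python
def pad(content: bytes, block_size: int) -> bytes:
    """Append 0x80 and m zero bytes so the length is the smallest multiple of block_size."""
    m = block_size - 1 - (len(content) % block_size)
    return content + b"\x80" + bytes(m)


def unpad(padded: bytes, block_size: int) -> bytes:
    """Strip the padding, raising ValueError on any invalid padding."""
    if len(padded) < block_size:
        raise ValueError("input is shorter than block_size")
    # Scan at most block_size bytes backwards from the end of the input.
    for i in range(len(padded) - 1, len(padded) - 1 - block_size, -1):
        if padded[i] == 0x80:
            return padded[:i]
        if padded[i] != 0x00:
            raise ValueError("non-zero byte before the 0x80 marker")
    raise ValueError("no 0x80 marker within the last block_size bytes")
```

Content whose length is already a multiple of the block size still gains a full block of padding, for example `pad(b"abcdefgh", 8)` is 16 bytes long.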
ERIS uses two block sizes: 1KiB (1024 bytes) and 32KiB (32768 bytes). The block size must be specified when encoding content.¶
Both block sizes can be used to encode content of arbitrary size. The block size of 1KiB is an optimization towards smaller content.¶
The block size is encoded in the read capability and the decoding process is capable of handling both block sizes.¶
Implementations MUST support encoding and decoding content with both block sizes (1KiB and 32KiB).¶
Applications are RECOMMENDED to use a block size of 1KiB for content smaller than 16KiB and a block size of 32KiB for larger content.¶
When using block size 32KiB to encode content smaller than 1KiB, the content will be encoded in a 32KiB block. This is a storage overhead of over 3100%. When encoding very many pieces of small content (e.g. short messages or cartographic nodes) this overhead might not be acceptable. On the other hand, using block size 1KiB to encode large content is also not efficient, as the content is split into many small 1KiB blocks and must be reassembled using internal nodes (see Section 2.4.3). When encoding larger content it is more efficient to use a block size of 32KiB. Using 16KiB as a breaking point is reasonable for most applications.¶
Note that the best choice of block size may depend on other factors such as number of round-trips to the storage layer. Content larger than 1KiB encoded with block size 1KiB will always be encoded in multiple levels, requiring multiple calls to a storage layer. For certain applications it might be better to minimize the number of calls to the storage layer at the cost of higher storage overhead.¶
In other applications the size of the content to be encoded might not be known when encoding starts, which is when the block size must be chosen (see Section 2.8.2). In such cases applications should use appropriate heuristics.¶
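The arithmetic behind this trade-off can be made concrete with a small sketch, assuming the tree construction of Section 2.4 (mandatory padding and `block-size / 64` reference-key pairs per internal node). `choose_block_size` and `tree_shape` are hypothetical helpers; the root levels listed for the large content test vectors (Section 2.8.6.3) serve as a sanity check.

```python
from typing import Tuple


def choose_block_size(content_length: int) -> int:
    """RECOMMENDED heuristic: 1KiB below 16KiB of content, 32KiB otherwise."""
    return 1024 if content_length < 16 * 1024 else 32 * 1024


def tree_shape(content_length: int, block_size: int) -> Tuple[int, int]:
    """Return (level of the root, total number of blocks) for content of this length."""
    arity = block_size // 64                    # reference-key pairs per internal node
    count = content_length // block_size + 1    # leaves; padding adds at least one byte
    level, total = 0, count
    while count > 1:
        count = -(-count // arity)              # ceiling division: nodes one level up
        level += 1
        total += count
    return level, total
```

For example, 16KiB of content with block size 1KiB yields 17 leaves, two internal nodes at level 1 and a root at level 2, i.e. 20 blocks (20KiB), while block size 32KiB stores it in a single 32KiB block.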
Using the hash of the content as key is called convergent encryption.¶
Because the hash of the content is deterministically computed from the content, the key will be the same when the same content is encoded twice. This results in de-duplication of content, as well as deterministic identifiers. Both are useful properties for certain applications.¶
However, convergent encryption suffers from two known attacks that allow adversaries to either confirm the presence of some encoded content or even learn the content when parts are predictable (see Section 4.6 for details). A solution to both attacks is to use a convergence secret.¶
ERIS requires a 32 byte convergence secret to be specified when encoding some content. Using different convergence secrets to encode the same content will result in different blocks and different read capabilities. This prevents deterministic identifiers and de-duplication, but allows a slightly stronger form of censorship resistance (see Section 4.5).¶
The convergence secret only needs to be provided during encoding. The content can be decoded from blocks without access to the convergence secret (see Section 2.5).¶
Users who do not want to be vulnerable to the known attacks against convergent encryption and are ready to give up deterministic identifiers and de-duplication SHOULD use a random (cryptographically secure) and unique convergence secret for every encoded content. Note that even when using random and unique convergence secrets, ERIS SHOULD NOT be used to ensure confidentiality.¶
A group using a shared convergence secret can benefit from the advantages of convergent encryption (de-duplication and deterministic identifiers) while being safe against certain attacks from adversaries that do not know the convergence secret.¶
If the known attacks against convergent encryption are well understood and the advantages of deterministic identifiers and de-duplication outweigh the risks, then the null convergence secret (32 bytes of zeroes) MAY be used.¶
The convergence secret is implemented using the keying feature of the Blake2b cryptographic hash [RFC7693].¶
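The strategies above differ only in which 32 byte value is passed as the convergence secret. A minimal sketch, assuming Python's `secrets` module as the cryptographically secure random source; `NULL_CONVERGENCE_SECRET` and `fresh_convergence_secret` are hypothetical names.

```python
import secrets

# Null convergence secret: keeps deterministic identifiers and de-duplication,
# but offers no protection against the known attacks (Section 4.6).
NULL_CONVERGENCE_SECRET = bytes(32)


def fresh_convergence_secret() -> bytes:
    """A random 32 byte secret. Generate a new one per content for maximum
    protection, or generate one once and share it within a group so that
    group members keep de-duplication among themselves."""
    return secrets.token_bytes(32)
```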
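Inputs to the encoding process are the `content` to encode (an arbitrary sequence of bytes), the 32 byte `convergence-secret` (see Section 2.3) and the `block-size` (1KiB or 32KiB).¶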
The output of the encoding process is a set of uniformly sized blocks and a read capability.¶
The encoding process constructs a tree of uniformly sized nodes. Leaf nodes at the bottom of the tree (level 0) correspond to the input content. Internal nodes (not leaf nodes) consist of references to nodes at lower levels.¶
The maximum number of references to nodes of a lower level in a node is called the arity of the tree and depends on the block size: it is the number of 64 byte reference-key pairs that fit into one node. For block size 1KiB the arity of the tree is 16, for block size 32KiB the arity is 512.¶
A simplified view of a tree constructed by encoding some content that is split into four leaf nodes is given in Figure 3. For illustration purposes the tree is of arity 2 (instead of 16 or 512).¶
Nodes are unencrypted elements of the tree. The encoding process outputs blocks, which are encrypted nodes.¶
As nodes are always stored as encrypted blocks, it is necessary to have the reference to the encrypted block as well as the encryption key to decrypt the node from the block. A reference to a node is a pair consisting of a reference to an encrypted block and the key to decrypt the block. We call such a pair a reference-key pair. The reference is the unkeyed Blake2b hash of the encrypted block (32 bytes) and the key is the ChaCha20 key to decrypt the block (32 bytes). The concatenation of a reference-key pair is 64 bytes long (512 bits).¶
Internal nodes of the encoding tree (i.e. not leaf nodes) are concatenations of reference-key pairs (see Section 2.4.3).¶
The block size, the level of the root node and the reference-key pair to the root node (root reference and root key) are the necessary pieces of information required to decode content (see Section 2.5). The tuple consisting of block size, level, root reference and key is called the read capability.¶
The steps of the encoding process are:¶
A pseudocode implementation of the encoding process:¶
The sub-processes `Split-Content`, `Encrypt-Leaf-Node`, `Encrypt-Internal-Node` and `Construct-Internal-Nodes` are explained in the following sections.¶
Note that the pseudocode implementation is provided for illustration purposes. Implementers are RECOMMENDED to implement a more efficient streaming encoder (see Section 2.8.2).¶
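As a concrete companion to the encoding process described in this section, here is a compact, unoptimized, in-memory Python sketch written from the prose of Sections 2.4.1 to 2.4.3. It bundles its own pure-Python ChaCha20 and uses `hashlib.blake2b`; the names `eris_encode`, `encrypt_node`, `h` and `to_urn` are hypothetical. It has not been checked against the official test vectors (see Section 2.8.6.1), which remain the authority.

```python
import base64
import hashlib
import struct

_M = 0xFFFFFFFF
_QR = ((0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),   # column rounds
       (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14))   # diagonal rounds


def _rotl(v, n):
    return ((v << n) & _M) | (v >> (32 - n))


def chacha20(data, key, nonce):
    """RFC 8439 ChaCha20 with initial counter 0; slow, for illustration only."""
    out = bytearray()
    for i in range(0, len(data), 64):
        s = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574,
             *struct.unpack("<8I", key), i // 64, *struct.unpack("<3I", nonce)]
        x = s[:]
        for _ in range(10):
            for a, b, c, d in _QR:
                x[a] = (x[a] + x[b]) & _M; x[d] = _rotl(x[d] ^ x[a], 16)
                x[c] = (x[c] + x[d]) & _M; x[b] = _rotl(x[b] ^ x[c], 12)
                x[a] = (x[a] + x[b]) & _M; x[d] = _rotl(x[d] ^ x[a], 8)
                x[c] = (x[c] + x[d]) & _M; x[b] = _rotl(x[b] ^ x[c], 7)
        keystream = struct.pack("<16I", *((p + q) & _M for p, q in zip(x, s)))
        out += bytes(p ^ k for p, k in zip(data[i:i + 64], keystream))
    return bytes(out)


def h(data, key=b""):
    """Blake2b-256; keyed when a non-empty hashing key is given."""
    return hashlib.blake2b(data, digest_size=32, key=key).digest()


def encrypt_node(node, key, level, blocks):
    """Encrypt with the level nonce, store the block, return the reference-key pair."""
    block = chacha20(node, key, bytes([level]) + bytes(11))
    reference = h(block)
    blocks[reference] = block
    return reference + key


def eris_encode(content, convergence_secret, block_size):
    """Return (blocks as {reference: block}, 66 byte binary read capability)."""
    blocks = {}
    # Split-Content: mandatory padding, then cut into leaf nodes.
    padded = content + b"\x80" + bytes(block_size - 1 - len(content) % block_size)
    leaves = (padded[i:i + block_size] for i in range(0, len(padded), block_size))
    # Encrypt-Leaf-Node: key is Blake2b-256 keyed with the convergence secret.
    pairs = [encrypt_node(leaf, h(leaf, convergence_secret), 0, blocks) for leaf in leaves]
    # Construct-Internal-Nodes level by level until one reference-key pair remains.
    arity, level = block_size // 64, 0
    while len(pairs) > 1:
        level += 1
        nodes = [b"".join(pairs[i:i + arity]).ljust(block_size, b"\x00")
                 for i in range(0, len(pairs), arity)]
        # Encrypt-Internal-Node: key is the unkeyed Blake2b-256 of the node.
        pairs = [encrypt_node(node, h(node), level, blocks) for node in nodes]
    size_byte = {1024: 0x0A, 32768: 0x0F}[block_size]
    return blocks, bytes([size_byte, level]) + pairs[0]


def to_urn(read_capability):
    return "urn:eris:" + base64.b32encode(read_capability).decode("ascii").rstrip("=")
```

Storing blocks in a dictionary keyed by reference makes de-duplication visible: encoding 20000 zero bytes with block size 1KiB produces 20 leaves but only two distinct leaf blocks, so the whole tree (root at level 2) is stored as five blocks.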
The input content is split into uniformly sized leaf nodes of size `block-size`. To ensure that this is always possible, the content is first padded (see Section 2.1.3).¶
A pseudo code implementation:¶
Note that when the length of the content is a multiple of the block size, then an entire leaf node will be created with padding. This is necessary as we do not encode the length of the content but require mandatory padding. Figure 7 illustrates such a case.¶
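A Python sketch of `Split-Content`, with the padding of Section 2.1.3 inlined; the name `split_content` is hypothetical.

```python
from typing import List


def split_content(content: bytes, block_size: int) -> List[bytes]:
    """Pad (0x80 then zeroes) and cut into leaf nodes of exactly block_size bytes."""
    padded = content + b"\x80" + bytes(block_size - 1 - len(content) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]
```

Content of exactly 1024 bytes split with block size 1KiB therefore yields two leaves, the second consisting only of padding.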
Unencrypted nodes (leaf nodes or internal nodes) are encrypted into blocks. We use two slightly different procedures to encrypt leaf nodes and internal nodes. Leaf nodes use the convergence secret whereas internal nodes do not.¶
The `Encrypt-Leaf-Node` procedure is used to encrypt leaf nodes (level 0). It takes an unencrypted leaf node and the convergence secret and returns the encrypted block as well as a reference-key pair to the block:¶
The `Encrypt-Internal-Node` procedure is used to encrypt internal nodes (level 1 and above). It takes an unencrypted node and the level of the node as input and returns the encrypted block as well as a reference-key pair to the block:¶
For both `Encrypt-Leaf-Node` and `Encrypt-Internal-Node` the ChaCha20 nonce is set to a single byte containing the level of the block followed by 11 bytes of null (zero) (for `Encrypt-Leaf-Node` the level is 0). By doing this we have implicitly encoded the level of the block in the encrypted block. This prevents certain vulnerabilities where the level of a block is confused.¶
Note that we reuse the same nonce for all blocks at a certain level. This is safe as we always use different keys for different input content.¶
The convergence secret is only used to compute the encryption key of leaf nodes. This is a minor performance optimization and allows the read capability to be strongly verified in certain cases (see Section 2.5, Paragraph 4).¶
Note that the convergence secret is not used to compute the reference to the encrypted block. This allows blocks to be dereferenced and verified without knowing the convergence secret.¶
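The key, nonce and reference derivations described above fit in a few lines of Python. This sketch covers only those computations (encryption then applies `ChaCha20(node, key, nonce)`); all function names are hypothetical.

```python
import hashlib


def nonce_for_level(level: int) -> bytes:
    """ChaCha20 nonce: one byte holding the level, followed by 11 zero bytes."""
    if not 0 <= level <= 255:
        raise ValueError("level must fit into a single byte")
    return bytes([level]) + bytes(11)


def leaf_node_key(node: bytes, convergence_secret: bytes) -> bytes:
    """Leaf nodes (level 0): Blake2b-256 keyed with the convergence secret."""
    return hashlib.blake2b(node, digest_size=32, key=convergence_secret).digest()


def internal_node_key(node: bytes) -> bytes:
    """Internal nodes: unkeyed Blake2b-256, which a decoder can recompute."""
    return hashlib.blake2b(node, digest_size=32).digest()


def block_reference(block: bytes) -> bytes:
    """References hash the encrypted block, never involving the convergence secret."""
    return hashlib.blake2b(block, digest_size=32).digest()
```

Since `block_reference` never uses the convergence secret, any intermediary can verify a stored block by recomputing this hash.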
Internal nodes of the tree (not leaf nodes) are constructed by concatenating at most arity reference-key pairs to nodes of the next lower level.¶
If there are fewer than arity reference-key pairs to collect in a node, then the node is filled up to `block-size` with zeroes. This ensures that nodes (and blocks) always have the same size (`block-size`).¶
The procedure `Construct-Internal-Nodes` takes as input a non-empty list of reference-key pairs and the block size and returns a list of nodes. A pseudocode implementation is provided:¶
Note that the method for filling up internal nodes with zeroes to make sure they have size `block-size` is different from the padding algorithm used on content (see Section 2.4.1). We do not need the special marker byte (`0x80`; see Section 2.1.3), as structures in internal nodes (reference-key pairs) have fixed sizes. When decoding, we process reference-key pairs until we reach a reference-key pair that is all zeroes (see Section 2.5, Paragraph 3).¶
An example node with three reference-key pairs and filled with zeroes is illustrated in Figure 11.¶
If in a tree of encoded content the number of nodes at any level is not a multiple of the arity, then there will be internal nodes that are filled with zeroes. Because internal nodes are always filled with reference-key pairs to the left, trees are always left-aligned. Figure 12 illustrates such a left-aligned tree.¶
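A Python sketch of `Construct-Internal-Nodes`, assuming reference-key pairs are passed as 64 byte strings (reference followed by key); the name `construct_internal_nodes` is hypothetical.

```python
from typing import List


def construct_internal_nodes(pairs: List[bytes], block_size: int) -> List[bytes]:
    """Group reference-key pairs into nodes of at most arity pairs, zero-filled on the right."""
    if not pairs or any(len(p) != 64 for p in pairs):
        raise ValueError("expected a non-empty list of 64 byte reference-key pairs")
    arity = block_size // 64
    return [b"".join(pairs[i:i + arity]).ljust(block_size, b"\x00")
            for i in range(0, len(pairs), arity)]
```

Filling from the left is what makes the tree left-aligned: with 17 pairs and arity 16, the second node holds a single pair followed by 960 zero bytes.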
Given a read capability and access to blocks via a block storage, the content can be decoded as shown in the procedure `ERIS-Decode`:¶
When iterating over reference-key pairs in nodes we read 64 bytes from the node. If the 64 bytes are non-zero, they are split into a reference-key pair and iterated over. If 64 bytes of zeroes are encountered, the rest of the node MUST be checked to be all zeroes (see Section 2.4.3, Paragraph 5). If the rest of the node is all zeroes, there are no more reference-key pairs in the node and iteration ends. If a non-zero byte is encountered after reading 64 bytes of zeroes, then an error must be thrown indicating an invalid internal node.¶
Implementations MUST verify the key appearing in the read capability if the level of the encoded content is larger than 0.¶
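A Python sketch of this iteration as a generator, assuming the node has already been decrypted; `iter_reference_key_pairs` is a hypothetical name.

```python
from typing import Iterator, Tuple


def iter_reference_key_pairs(node: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (reference, key) pairs from an internal node, stopping at the zero fill."""
    for offset in range(0, len(node), 64):
        chunk = node[offset:offset + 64]
        if chunk == bytes(64):
            # All-zero pair: the remainder of the node must be zero as well.
            if any(node[offset:]):
                raise ValueError("invalid internal node: non-zero byte after zero fill")
            return
        yield chunk[:32], chunk[32:]
```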
The procedures `Dereference-Node` and `Verify-Key` are provided in the next section.¶
Note that the pseudocode implementation is provided for illustration purposes. Implementers are RECOMMENDED to implement a more efficient streaming or random-access decoder (see Section 2.8.3 and Section 2.8.4).¶
A node can be dereferenced given a reference-key pair by first retrieving the block for the reference and decrypting the block using the key.¶
The block storage is represented by some procedure `Block-Storage-Get` that takes as input a reference and outputs the corresponding block, such that `Blake2b-256(block) = reference`.¶
The procedure `Dereference-Node` returns a node given as inputs a reference-key pair and the level of the node to dereference. A pseudocode implementation is provided:¶
Implementations MUST ensure that blocks have the expected size and are valid (i.e. that `Blake2b-256(block) = reference`).¶
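A Python sketch of these block checks, assuming a plain dictionary stands in for `Block-Storage-Get`; `get_verified_block` is a hypothetical name, and decrypting the returned block with ChaCha20 and the level nonce (Section 2.4.2) is left out.

```python
import hashlib


def get_verified_block(storage: dict, reference: bytes, block_size: int) -> bytes:
    """Fetch a block and check its size and that Blake2b-256(block) == reference."""
    block = storage.get(reference)             # stand-in for Block-Storage-Get
    if block is None:
        raise KeyError("block not available")
    if len(block) != block_size:
        raise ValueError("block has unexpected size")
    if hashlib.blake2b(block, digest_size=32).digest() != reference:
        raise ValueError("block does not match its reference")
    return block
```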
The key to an internal node is computed using unkeyed Blake2b without the convergence secret (see Section 2.4.2). This allows the key itself to be verified by recomputing the Blake2b hash of the node.¶
Note that this check only works if the level of the encoded content is larger than zero. If the level is 0, then the reference-key pair points to a leaf node for which the key can not be verified, as it is computed using the convergence secret (see Section 2.4.2).¶
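A sketch of `Verify-Key` in Python, assuming the caller has already dereferenced and decrypted the root node; the signature `verify_key(node, key, level)` is hypothetical.

```python
import hashlib


def verify_key(node: bytes, key: bytes, level: int) -> None:
    """For internal nodes (level > 0), the key must equal the unkeyed Blake2b-256 of the node."""
    if level == 0:
        return  # leaf keys depend on the (unknown) convergence secret: nothing to check
    if hashlib.blake2b(node, digest_size=32).digest() != key:
        raise ValueError("read capability key does not match the decrypted node")
```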
The read capability consisting of the block-size, level of root reference-key pair as well as the root reference-key pair form the necessary pieces of information required to decode content.¶
We specify a binary encoding of the read capability to 66 bytes:¶
Byte offset | Content | Length (in bytes) |
---|---|---|
0 | block size (0x0a for block size 1KiB and 0x0f for block size 32KiB) | 1 |
1 | level of root reference-key pair as unsigned integer | 1 |
2 | root reference | 32 |
34 | root key | 32 |
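A sketch of this binary layout using Python's `struct` module; the function names are hypothetical. Note that `0x0a` and `0x0f` are 10 and 15, the base-2 logarithms of 1024 and 32768.

```python
import struct
from typing import Tuple

_BLOCK_SIZE_BYTE = {1024: 0x0A, 32768: 0x0F}
_BYTE_BLOCK_SIZE = {v: k for k, v in _BLOCK_SIZE_BYTE.items()}


def encode_read_capability(block_size: int, level: int,
                           root_reference: bytes, root_key: bytes) -> bytes:
    """Pack the four fields in the byte layout of the table above (66 bytes)."""
    if len(root_reference) != 32 or len(root_key) != 32:
        raise ValueError("root reference and root key must be 32 bytes each")
    return struct.pack("BB", _BLOCK_SIZE_BYTE[block_size], level) + root_reference + root_key


def decode_read_capability(cap: bytes) -> Tuple[int, int, bytes, bytes]:
    """Inverse of encode_read_capability; rejects wrong lengths and unknown block sizes."""
    if len(cap) != 66 or cap[0] not in _BYTE_BLOCK_SIZE:
        raise ValueError("not an ERIS binary read capability")
    return _BYTE_BLOCK_SIZE[cap[0]], cap[1], cap[2:34], cap[34:66]
```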
The CBOR tag `276` is assigned for an ERIS binary read capability (see Section 6.1). This allows efficient references to ERIS encoded content from CBOR [RFC8949].¶
The GNU Name System [LSD0001] is a decentralized and censorship-resistant name system that can be used to resolve memorable names to secure identifiers. A GNU Name System record type for ERIS read capabilities is defined (see Section 7.1) allowing any ERIS encoded content to be associated with memorable names.¶
A read capability can be encoded as a URN [RFC8141] using the namespace identifier `eris` and the unpadded Base32 [RFC4648] encoding of the read capability as namespace specific string, i.e.:¶
urn:eris:BASE32-READ-CAPABILITY
For example, the ERIS URN of the UTF-8 encoded string "Hello world!" (with block size 1KiB and null convergence secret) is:¶
urn:eris:BIAD77QDJMFAKZYH2DXBUZYAP3MXZ3DJZVFYQ5DFWC6T65WSFCU5S2IT4YZGJ7AC4SYQMP2DM2ANS2ZTCP3DJJIRV733CRAAHOSWIYZM3M
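Python's `base64` module emits RFC 4648 Base32 with `=` padding, so a URN encoder strips it and a decoder has to restore it. A minimal sketch with hypothetical function names:

```python
import base64

PREFIX = "urn:eris:"


def read_capability_to_urn(cap: bytes) -> str:
    """Unpadded RFC 4648 Base32 of the 66 byte binary read capability."""
    return PREFIX + base64.b32encode(cap).decode("ascii").rstrip("=")


def urn_to_read_capability(urn: str) -> bytes:
    """Inverse: restore the '=' padding that Python's decoder expects."""
    if not urn.startswith(PREFIX):
        raise ValueError("not an ERIS URN")
    b32 = urn[len(PREFIX):]
    cap = base64.b32decode(b32 + "=" * (-len(b32) % 8))
    if len(cap) != 66:
        raise ValueError("read capability must be 66 bytes")
    return cap
```

Decoding the "Hello world!" example URN gives 66 bytes starting with `0x0a 0x00`, i.e. block size 1KiB and a root at level 0.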
Blocks are referenced by their Blake2b-256 hash (see Section 2.4, Paragraph 9). In certain circumstances it is useful to encode such a reference as a URN. Applications are RECOMMENDED to use the URN namespace identifier `blake2b` and the unpadded Base32 [RFC4648] encoding of the reference as namespace specific string, i.e.:¶
urn:blake2b:BASE32-REF
For example, the URN encoding of a reference to a block might look like this:¶
urn:blake2b:H77AGSYKAVTQPUHODJTQA7WZPTWGTTKLRB2GLMF5H53NEKFJ3FUQ
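The same Base32 handling applies to block references. A minimal sketch with hypothetical function names:

```python
import base64
import hashlib

PREFIX = "urn:blake2b:"


def block_reference_urn(block: bytes) -> str:
    """urn:blake2b: followed by unpadded Base32 of Blake2b-256(block)."""
    reference = hashlib.blake2b(block, digest_size=32).digest()
    return PREFIX + base64.b32encode(reference).decode("ascii").rstrip("=")


def reference_from_urn(urn: str) -> bytes:
    """Recover the 32 byte reference from a urn:blake2b: URN."""
    if not urn.startswith(PREFIX):
        raise ValueError("not a blake2b URN")
    b32 = urn[len(PREFIX):]
    reference = base64.b32decode(b32 + "=" * (-len(b32) % 8))
    if len(reference) != 32:
        raise ValueError("reference must be 32 bytes")
    return reference
```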
This section contains notes on implementing ERIS.¶
For further inspiration there are implementations available in programming languages such as Guile Scheme, Nim, OCaml, Smalltalk, Python or Rust. See also the project website for an updated list of existing implementations.¶
Implementations SHOULD provide an interface that describes which version of the specification is implemented.¶
The pseudocode algorithms for encoding presented in Section 2.4 assume that the entire content is available in memory when encoding. This is not always practical or feasible. For efficiency purposes and in order to handle content that is larger than memory, it is necessary to encode content in smaller pieces and eagerly emit partial results while maintaining a small amount of state. We call this streaming encoding.¶
ERIS allows streaming encoding and it is RECOMMENDED to implement such an encoding procedure.¶
A streaming encoder can be implemented by maintaining, at each level, a list of reference-key pairs that have not yet been collected into a node. As soon as enough reference-key pairs are available, they are combined into a node, encrypted to a block and emitted. Such an encoding procedure needs to maintain an amount of state proportional to the level of the tree. The level of the tree is bounded by the logarithm of the content size (with the arity of the tree as base).¶
For an example, see the reference Guile implementation.¶
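A Python sketch of such a streaming encoder, assuming the leaf and internal encryption steps are passed in as callbacks. The callback signatures, `StreamingEncoder` and `insecure_stand_in_encryptor` are hypothetical; plugging in the ChaCha20 and Blake2b based `Encrypt-Leaf-Node` and `Encrypt-Internal-Node` of Section 2.4.2 would make it a real encoder, while the stand-in performs no encryption and only exercises the tree bookkeeping.

```python
import hashlib
from typing import Callable, List, Tuple

# Hypothetical callback signatures; each returns (block, reference, key).
EncryptLeaf = Callable[[bytes], Tuple[bytes, bytes, bytes]]
EncryptInternal = Callable[[bytes, int], Tuple[bytes, bytes, bytes]]


class StreamingEncoder:
    """Keeps, per level, the reference-key pairs not yet collected into a node."""

    def __init__(self, block_size: int, encrypt_leaf: EncryptLeaf,
                 encrypt_internal: EncryptInternal):
        self.block_size, self.arity = block_size, block_size // 64
        self.encrypt_leaf, self.encrypt_internal = encrypt_leaf, encrypt_internal
        self.buffer = b""                      # content not yet forming a full leaf
        self.pending: List[List[bytes]] = []   # pending[l]: pairs at level l
        self.out: List[bytes] = []             # blocks ready to be emitted

    def _push(self, level: int, pair: bytes) -> None:
        while len(self.pending) <= level:
            self.pending.append([])
        self.pending[level].append(pair)
        if len(self.pending[level]) == self.arity:   # node is full: emit it right away
            self._collect(level)

    def _collect(self, level: int) -> None:
        node = b"".join(self.pending[level]).ljust(self.block_size, b"\x00")
        self.pending[level] = []
        block, ref, key = self.encrypt_internal(node, level + 1)
        self.out.append(block)
        self._push(level + 1, ref + key)

    def _leaf(self, node: bytes) -> None:
        block, ref, key = self.encrypt_leaf(node)
        self.out.append(block)
        self._push(0, ref + key)

    def _drain(self) -> List[bytes]:
        out, self.out = self.out, []
        return out

    def update(self, data: bytes) -> List[bytes]:
        """Feed more content and return the blocks that can already be emitted."""
        self.buffer += data
        while len(self.buffer) >= self.block_size:
            self._leaf(self.buffer[:self.block_size])
            self.buffer = self.buffer[self.block_size:]
        return self._drain()

    def finish(self) -> Tuple[List[bytes], int, bytes]:
        """Pad the rest, close partial nodes bottom-up; return (blocks, level, root pair)."""
        self._leaf(self.buffer + b"\x80" + bytes(self.block_size - 1 - len(self.buffer)))
        self.buffer = b""
        level = 0
        while True:
            higher = any(self.pending[level + 1:])
            if not higher and len(self.pending[level]) == 1:
                return self._drain(), level, self.pending[level][0]
            if self.pending[level]:
                self._collect(level)
            level += 1


def insecure_stand_in_encryptor(node: bytes, level: int = 0):
    """TESTING ONLY: no encryption at all; reference = Blake2b-256(node), zero key."""
    return node, hashlib.blake2b(node, digest_size=32).digest(), bytes(32)
```

Full nodes are emitted as soon as `arity` pairs accumulate; `finish` then walks upwards, closing partially filled nodes until a single reference-key pair remains, which yields the same left-aligned tree as the in-memory procedure. For 16KiB of content with block size 1KiB, 17 of the 20 blocks are already emitted by `update` before `finish` is called.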
A streaming decoder can be implemented by eagerly emitting decoded leaf nodes instead of collecting them into a final output as done in `ERIS-Decode` (see Figure 13). The amount of state that needs to be maintained is proportional to the level of the tree the content is encoded to.¶
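A sketch of a streaming decoder as a Python generator, assuming a `dereference(reference, key, level)` callback that fetches, verifies and decrypts a node of `block-size` bytes (hypothetical; compare `Dereference-Node` in Section 2.5). It holds back one leaf, because only the last leaf carries padding.

```python
from typing import Callable, Iterator


def stream_decode(dereference: Callable[[bytes, bytes, int], bytes],
                  reference: bytes, key: bytes, level: int,
                  block_size: int) -> Iterator[bytes]:
    """Yield content leaf by leaf; memory use is one path through the tree plus one leaf."""

    def walk(ref: bytes, k: bytes, lvl: int) -> Iterator[bytes]:
        node = dereference(ref, k, lvl)          # fetch, verify and decrypt
        if lvl == 0:
            yield node
            return
        for off in range(0, block_size, 64):
            pair = node[off:off + 64]
            if pair == bytes(64):                # end of pairs: the rest must be zero
                if any(node[off:]):
                    raise ValueError("invalid internal node")
                return
            yield from walk(pair[:32], pair[32:], lvl - 1)

    held = None
    for leaf in walk(reference, key, level):
        if held is not None:
            yield held                           # not the last leaf: emit unchanged
        held = leaf
    if held is None:
        raise ValueError("no leaf nodes")
    # The last (block_size long) leaf must end in 0x80 followed only by zeroes.
    stripped = held.rstrip(b"\x00")
    if not stripped.endswith(b"\x80"):
        raise ValueError("invalid padding")
    yield stripped[:-1]
```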
It is possible to efficiently decode a fixed number of bytes from some given offset without decoding the entire content or de-referencing all blocks that the content is encoded to. This can be done by selectively descending down the tree of the encoded content to the leaf nodes that hold the requested bytes. A decoder that is capable of decoding small pieces of some encoded content efficiently and furthermore can seek to given positions for further reading is called a random-access decoder.¶
Applications of random-access decoders include seeking in encoded media files, reading individual files from an encoded file-system (see ERIS-FS) or identifying file-types based on magic bytes at fixed positions.¶
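Implementers should consider implementing a decoder on which the following procedures are defined:¶
`Position()`: returns the current position of the decoder.¶
`Seek(p)`: seeks to position `p`.¶
`Read(n)`: reads `n` bytes of the encoded content from the current position.¶
`Length()`: returns the length of the encoded content.¶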
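The core of a random-access decoder is mapping a byte position to a path through the tree. Because trees are left-aligned (Section 2.4.3) and every leaf holds `block-size` bytes, this is pure arithmetic. A minimal sketch with a hypothetical `locate` helper:

```python
from typing import List, Tuple


def locate(position: int, block_size: int, level: int) -> Tuple[List[int], int]:
    """Return the child index to follow at each internal level (root first)
    and the byte offset inside the leaf node holding `position`."""
    arity = block_size // 64
    leaf_index, offset = divmod(position, block_size)
    path = []
    for lvl in range(level, 0, -1):
        # A child of a node at level lvl spans arity ** (lvl - 1) leaf nodes.
        path.append((leaf_index // arity ** (lvl - 1)) % arity)
    return path, offset
```

With this, `Seek(p)` followed by a `Read(n)` that stays within one leaf only dereferences the `level + 1` blocks on the path from the root, and those blocks are natural candidates for caching.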
Implementing a random-access decoder seems to be the most challenging aspect of implementing ERIS. Implementers are encouraged to use extensive testing (e.g. property-based testing). Please don't hesitate to get in touch with the maintainers if you are implementing a random-access decoder (see Section 5). We are very curious to see your implementation!¶
The Guile Scheme and OCaml implementations have a random-access decoder.¶
The procedure `Block-Storage-Get` as used in Section 2.5 may fetch blocks from various transport and storage layers (see Section 3.1). Implementers of ERIS libraries SHOULD provide abstractions or interfaces that allow users to use their own custom transport and storage layers.¶
Implementations SHOULD also consider caching block dereferences or allowing users to define suitable caching mechanisms. This is especially relevant for random-access decoders.¶
Three types of test vectors are provided that can be used to ensure correct implementation of the encoding: positive (see Section 2.8.6.1), negative (see Section 2.8.6.2) and large content (see Section 2.8.6.3).¶
Implementations MUST satisfy the positive and negative test vectors.¶
Implementations are RECOMMENDED to also satisfy the large content test vectors. However this requires implementing a streaming encoder (see Section 2.8.2) which might not be necessary or desirable for certain applications (e.g. constrained environments where size of content is always very small).¶
The positive and negative test vectors are provided as machine-readable JSON files in the archive eris-test-vectors-v1.0.0.tar.gz.¶
The following JSON fields appear in both positive and negative test vectors: `id`, `type` (either `positive` or `negative`), `spec-version`, `name`, `description`, `read-capability`, `urn` and `blocks`.¶
Further fields are used in the positive test vectors.¶
The positive test vectors can be used to ensure that implementations correctly encode content to a given read capability and can decode the same content from given blocks.¶
The following additional fields are used in the JSON files for positive test vectors: `content`, `convergence-secret` and `block-size`.¶
The test vector `eris-test-vector-positive-00.json` is shown as example:¶
Implementations MUST for all positive test vectors:¶
Negative test vectors are provided to help implementations ensure that they correctly verify content while decoding.¶
For all negative test vectors, implementations should attempt to decode content given the URN and blocks. The test passes if decoding fails. The reason for failure is described in the `description` field. Implementations MUST pass all negative test vectors.¶
The test vector `eris-test-vector-negative-13.json` is shown as example:¶
In order to verify implementations that encode content by streaming (see Section 2.8.2), URNs of large content generated in a specified way are provided:¶
Test name | Content size | Block size | Level of root reference | URN |
---|---|---|---|---|
100MiB (block size 1KiB) | 100MiB | 1KiB | 5 | urn:eris:BIC6F5EKY2PMXS2VNOKPD3AJGKTQBD3EXSCSLZIENXAXBM7PCTH2TCMF5OKJWAN36N4DFO6JPFZBR3MS7ECOGDYDERIJJ4N5KAQSZS67YY |
1GiB (block size 32KiB) | 1GiB | 32KiB | 2 | urn:eris:B4BL4DKSEOPGMYS2CU2OFNYCH4BGQT774GXKGURLFO5FDXAQQPJGJ35AZR3PEK6CVCV74FVTAXHRSWLUUNYYA46ZPOPDOV2M5NVLBETWVI |
256GiB (block size 32KiB) | 256GiB | 32KiB | 3 | urn:eris:B4B5DNZVGU4QDCN7TAYWQZE5IJ6ESAOESEVYB5PPWFWHE252OY4X5XXJMNL4JMMFMO5LNITC7OGCLU4IOSZ7G6SA5F2VTZG2GZ5UCYFD5E |
Content is the ChaCha20 stream using a null nonce and a key that is the Blake2b-256 hash of the UTF-8 encoded test name (e.g. `key := Blake2b-256("100MiB (block size 1KiB)")`). The ChaCha20 stream can be computed by encrypting a null byte sequence (e.g. `chacha20-stream := ChaCha20(null_byte_stream, KEY, NONCE)`, where `NONCE` is the null nonce).¶
Traditionally, encoding schemes similar to ERIS are used in peer-to-peer file-sharing applications (e.g. BitTorrent, GNUNet file-sharing, Freenet). Given a performant block transport layer, ERIS can be used for efficient file-sharing. However, we hope to motivate usage for a much wider scope of applications.¶
As part of the openEngiadina project we are using ERIS to encode small bits of information that constitute "local knowledge" (e.g. geographic information, social and cultural events, etc.) along with the social interactions that created and curated this information (using the ActivityStreams vocabulary). ERIS allows such information to be securely cached on multiple peers to increase the robustness of the system. The fact that ERIS encoded content can be referenced by a URN allows it to be embedded into existing data structures and protocols. In particular, ERIS encoded content can be referenced from RDF and RDF can be made content-addressable with ERIS (see ERIS and RDF).¶
ERIS can be used to share content in small communal and friend-to-friend networks. For this, the ability to use convergence secrets is very useful for increased security (see Section 4.6). This use-case has been further researched as part of the DREAM project.¶
ERIS can also be used to create larger content-delivery networks. In particular, we are working towards making software sources and pre-built packages for Guix more robustly available and peer-to-peer shareable.¶
ERIS identifiers have also been embedded in ELF binaries to reference shared libraries (see Sigil OS).¶
ERIS is defined independent of any storage and transport layer for blocks. The only requirement is that blocks can be accessed by their reference - the hash of the block content.¶
Possible storage layers include:¶
Transport mechanisms include:¶
More interesting transport and storage layers use the fact that blocks are content-addressed. For example, the peer-to-peer network IPFS can be used to store and transport blocks. The major advantages over using IPFS directly are that blocks are encrypted and not readable by IPFS peers without the read capability, and that identifiers of blocks and encoded content are not tied to the IPFS network. Applications can transparently use IPFS or any other storage/transport mechanism.¶
Other protocols we are investigating for usage as ERIS transport layer include Named Data Networking, BitTorrent and GNUNet.¶
We are also researching transport protocols based on CoAP [RFC7252].¶
While decoding ERIS encoded content the integrity of the content is verified. Content can not be tampered with without changing the identifier (read capability) of the content. To prove authenticity of encoded content it is sufficient to sign the read capability.¶
We have presented a concrete proposal on how this might be done using an RDF vocabulary and the Ed25519 cryptographic signature scheme (RDF-Signify).¶
Encoded content is immutable in the sense that changing the encoded content results in a new identifier. Existing references to the old content need to be updated. This is a property that allows robust availability of content.¶
Nevertheless, there are applications where one wants to reference mutable content. Examples include user profiles or dynamic collections of content. Making small changes to a user profile or adding a piece of content to a collection should preserve the identifiers.¶
There are many ways of implementing such mutability or namespaces. ERIS does not specify any particular mechanism. Possible mechanisms include:¶
We believe that the best suited mechanism for handling mutability depends on concrete applications and use-cases. A key value of ERIS is that it is agnostic of such mechanisms and can be used with any of them.¶
In this section we discuss security considerations when using ERIS as an abstract encoding as well as when used in conjunction with storage and transport layers (see Section 3.1).¶
We use terms for communication security as defined in RFC 3552 [RFC3552] (e.g. CONFIDENTIALITY or PEER ENTITY AUTHENTICATION).¶
We consider a setting with the following entities:¶
See also the ECRS paper [ECRS] and the theoretical treatment of censorship resistance by Perng et al. [Perng2005].¶
The publisher can make the blocks of the encoded content available by replicating them to intermediary peers and directly to the audience over various different storage and transport layers (see Section 3.1).¶
Members of the audience can verify the integrity of the content while decoding by verifying the Blake2b hash of a block (see Section 2.5). Even when a malicious intermediary peer is distributing invalid blocks, this will be detected by the internal decoding process run by the audience.¶
Intermediary peers can pro-actively detect invalid blocks by checking the Blake2b hash of the block.¶
Note that the ERIS read capability can not be used to verify integrity of content when the content is given directly and not decoded from blocks. When the content is given as a sequence of bytes, the only way to compute the read capability is to encode the content. However, the read capability to verify might have been computed using a convergence secret (see Section 2.3) that is not known, making it impossible to verify that the content corresponds to the read capability.¶
Intermediary peers do not need to have access to the read capability in order to store and transport blocks of encoded content. As the blocks only contain encrypted data, intermediary peers can plausibly deny being able to decode the content.¶
Note that in certain situations an active attack that can reveal parts of the encoded content is possible (see Section 4.6).¶
We define censorship resistance as the inability of a censor who does not have access to the read capability of some content to prevent members of the audience from decoding that specific content without preventing access to all content (i.e. a DENIAL OF SERVICE).¶
This holds because a censor that does not have access to the read capability of some content cannot decide whether a given block is required to decode that specific content or belongs to the encoding of some other content.¶
Note that a censor can prevent the audience from decoding any content by dropping all communication with intermediary peers, i.e. by performing a DENIAL OF SERVICE attack. In practice, such an attack can be made more difficult by replicating blocks and using various storage and transport layers (see Section 4.2).¶
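As an illustration of how replication helps, the sketch below tries several block sources in turn and returns the first block whose Blake2b-256 hash matches the requested reference, so that a blocked source, or one serving invalid blocks, does not prevent decoding. The `BlockSource` interface and `fetch_block` function are hypothetical; ERIS does not define how blocks are fetched, and the 256-bit reference size is an assumption.

```python
import hashlib
from typing import Callable, Iterable, Optional

# Hypothetical interface: a source returns the block stored under a
# reference, returns None if it does not hold it, or raises OSError if it
# is unreachable (e.g. blocked). It could wrap a peer, a cache or a mirror.
BlockSource = Callable[[bytes], Optional[bytes]]


def fetch_block(reference: bytes, sources: Iterable[BlockSource]) -> bytes:
    """Return the first block from any source that matches the reference."""
    for source in sources:
        try:
            block = source(reference)
        except OSError:
            # Skip unreachable sources and try the next replica.
            continue
        if block is None:
            continue
        # Discard blocks from faulty or malicious sources.
        if hashlib.blake2b(block, digest_size=32).digest() == reference:
            return block
    raise LookupError("block not available from any source")
```

The more independent sources (peers, transports, storage layers) a decoder can query, the more of them a censor has to block at once.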
Our definition of censorship resistance is slightly stronger than that of ECRS [ECRS]. In ECRS the censor is assumed not to know the exact content, whereas in ERIS the censor is assumed not to know the read capability. If the censor does know the exact content that should be censored, then a fresh convergence secret can be used to create a read capability that the censor does not know (see Section 2.3).¶
Convergent encryption allows de-duplication of content and deterministic identifiers. However, it also suffers from two known attacks [Zooko2008]:¶
ERIS is vulnerable to both attacks when a convergence secret known to the adversary is used. A defense against both attacks is to use a convergence secret that is not known to the adversary (see Section 2.3). Using a different convergence secret causes the same content to be encoded into different blocks and identifiers.¶
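The two attacks can be illustrated with a simplified Python sketch. It assumes that the encryption key of a block is the Blake2b-256 hash of the block content keyed with the convergence secret, and it reduces the encoding to this single key derivation: since the encrypted block and its reference are deterministic functions of that key and the content, recomputing the key stands in for recomputing the identifiers. The function names and the all-zero `PUBLIC_SECRET` are hypothetical.

```python
import hashlib
import secrets
from typing import Iterable, Optional

# Hypothetical stand-in for a convergence secret that is publicly known
# (and therefore also known to the adversary): 32 zero bytes.
PUBLIC_SECRET = bytes(32)


def convergent_key(content: bytes, convergence_secret: bytes) -> bytes:
    """Derive a deterministic key from the content and a convergence secret.

    Assumption: keyed Blake2b-256 with the convergence secret as the key.
    """
    return hashlib.blake2b(content, key=convergence_secret, digest_size=32).digest()


def confirm_file(guess: bytes, observed_key: bytes, secret: bytes) -> bool:
    """Confirmation of a file: check whether a guessed content was encoded."""
    return convergent_key(guess, secret) == observed_key


def learn_remaining(template: str, candidates: Iterable[object],
                    observed_key: bytes, secret: bytes) -> Optional[object]:
    """Learn the remaining information: enumerate the unknown part of a
    mostly known document until the derived key matches."""
    for candidate in candidates:
        if confirm_file(template.format(candidate).encode(), observed_key, secret):
            return candidate
    return None


if __name__ == "__main__":
    document = b"salary: 52000"
    observed = convergent_key(document, PUBLIC_SECRET)
    print(learn_remaining("salary: {}", range(50000, 60000, 1000),
                          observed, PUBLIC_SECRET))
    # A fresh convergence secret unknown to the adversary defeats both attacks.
    hidden = convergent_key(document, secrets.token_bytes(32))
    print(learn_remaining("salary: {}", range(50000, 60000, 1000),
                          hidden, PUBLIC_SECRET))
```

Run as a script, it recovers `52000` when the public secret is used, while with the fresh secret none of the guesses match.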
De-duplication and deterministic identifiers are both properties that may be important to certain applications and users. Users should be aware of the known attacks and must decide, depending on application and context, whether mitigations are necessary.¶
A passive adversary that only observes communication between audience and intermediary peers might be able to learn information about encoded content from the pattern in which members of the audience access blocks. For example, an adversary might infer that certain blocks are part of the encoding of a video of a certain resolution by observing that blocks are fetched at predictable intervals and in a fixed order.¶
A passive adversary may also log block transfers between audience and intermediary peers over extended periods of time. If a read capability is later leaked to the adversary, they can determine who accessed the blocks needed to decode the corresponding content in the past.¶
Storage and transport layers SHOULD use encryption to prevent passive network attackers from being able to observe such patterns.¶
Members of the audience MAY use obfuscation tactics when getting blocks from storage and transport layers to prevent malicious intermediary peers from being able to observe such patterns.¶
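One possible obfuscation tactic, sketched below, is to request blocks in a random order and to mix in requests for unrelated decoy blocks. ERIS does not prescribe any such tactic; the function `obfuscated_request_order` and the source of decoy references (for example, references of blocks seen earlier) are hypothetical. Because the references of lower-level blocks only become known after decoding the node above them, shuffling can only reorder references that are already known.

```python
import random
from typing import List, Optional, Sequence


def obfuscated_request_order(
    needed: Sequence[bytes],
    decoy_pool: Sequence[bytes],
    decoys_per_block: int = 1,
    rng: Optional[random.Random] = None,
) -> List[bytes]:
    """Return block references to request, shuffled and mixed with decoys.

    `needed` are references the decoder currently knows it needs;
    `decoy_pool` are references of unrelated blocks (hypothetical source).
    Blocks fetched for decoy references are simply discarded.
    """
    if rng is None:
        # SystemRandom draws from the OS, so the order is not predictable.
        rng = random.SystemRandom()
    count = min(len(decoy_pool), decoys_per_block * len(needed))
    # Sample decoys without replacement, then shuffle everything together.
    requests = list(needed) + rng.sample(list(decoy_pool), count)
    rng.shuffle(requests)
    return requests
```

This reduces, but does not remove, what access patterns reveal: the number of requests and their timing remain observable, and decoys cost additional bandwidth.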
A mailing list for general discussion on ERIS is available at ~pukkamustard/eris@lists.sr.ht. Ephemeral discussions take place in the #eris channel on the Libera IRC network. See also the project page for more information.¶
Please feel free to direct any questions or comments regarding the specification to the mailing list. You are also invited to share your implementations and use-cases.¶
Urgent and sensitive security issues may be addressed directly to the ERIS maintainers.¶
GANA [GANA] is requested to add an entry into the "GNU Name System record types" registry as follows:¶
Number | Name | Comment | References |
---|---|---|---|
65557 | ERIS_READ_CAPABILITY | Encoding for Robust Immutable Storage (ERIS) binary read capability | http://purl.org/eris |
Initial development of ERIS was done as part of the openEngiadina project and was supported by the NLnet Foundation through the NGI0 Discovery Fund. Further development is being supported by the NLnet Foundation through NGI Assure.¶
Our friend and fellow developer rustra is imprisoned as a victim of political repression in Belarus. Read his last words in court and an interview with him. Consider donating to the Anarchist Black Cross Belarus. Support victims of repression and resist any form of repression and oppression. Resistance is not futile.¶
erisx3
Major update of the encoding that removes the verification capability (the ability to verify the integrity of content without reading it).¶
Initial version.¶