ERIS-FS: A format for encoding file systems

Introduction

ERIS defines how a sequence of bytes can be encoded and made available robustly. Many types of content can be serialized to a sequence of bytes that can then be encoded with ERIS. One type of content that deserves special attention is a collection of files organized in a tree: a file system. ERIS-FS is a format for encoding file systems that is optimized for use with the ERIS encoding. In particular, files that appear multiple times within a file system, or in multiple file systems, can be de-duplicated using ERIS-FS. This is important for applications such as software package management (e.g. Guix or Nix), where two versions of a package might share many files.

A file system encoded with ERIS-FS can be decoded to a directory containing the tree of file system objects or be directly mounted.

ERIS-FS is similar to formats such as SquashFS, Tar or EROFS.

File System Objects

Operating systems support many kinds of file system objects, each carrying a lot of metadata. ERIS-FS encodes only the minimal information required for simple file sharing and software packages: regular files, executable files and symbolic links, each identified by its path.

This is directly inspired by the file system objects handled by Nix and Guix (see Section 5.2.1 of Eelco Dolstra's thesis).

Deterministic Encoding

ERIS-FS specifies a strict ordering of files, so the same tree of file system objects is always encoded to the same image. Metadata structures are encoded using deterministically encoded CBOR as defined in section 4.2.1 of RFC 8949.
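As a rough illustration of what such an ordering looks like, the sketch below sorts paths by the bytewise lexicographic order of their encoded form, which is the map key ordering rule of section 4.2.1 of RFC 8949. It assumes that paths are arrays of text strings and that map keys are the only place where ordering matters; the helper names (cbor_head, encode_path, sort_paths) are hypothetical and not part of this specification.

```python
def cbor_head(major: int, arg: int) -> bytes:
    """Encode a CBOR head in its shortest form (arguments up to 4 bytes)."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for info, size in ((24, 1), (25, 2), (26, 4)):
        if arg < 1 << (8 * size):
            return bytes([(major << 5) | info]) + arg.to_bytes(size, "big")
    raise ValueError("argument too large for this sketch")


def encode_path(path) -> bytes:
    """Encode a path as a CBOR array (major type 4) of text strings (major type 3)."""
    out = cbor_head(4, len(path))
    for segment in path:
        raw = segment.encode("utf-8")
        out += cbor_head(3, len(raw)) + raw
    return out


def sort_paths(paths):
    """Order paths by the bytewise lexicographic order of their encodings."""
    return sorted(paths, key=encode_path)


paths = [("lib", "libfoo.so"), ("bin", "foo"), ("README",)]
ordered = sort_paths(paths)
# ordered == [("README",), ("bin", "foo"), ("lib", "libfoo.so")]
```

Note that a path with fewer segments sorts before a longer one, because the array head is the first byte compared.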

Limitations

ERIS-FS Index

The ERIS-FS Index is a CBOR structure that describes the layout of the file system. The content of each file is referenced with an ERIS read capability.

The ERIS-FS Index is described using CDDL (RFC 8610):


eris-fs-index = #6.1701996916([
    version: 1,
    { * path => entry }
  ])

path = [ * tstr ]

entry = file-entry / executable-entry / symlink-entry

file-entry = [
  type: 0,
  content: read-capability
]

executable-entry = [
  type: 1,
  content: read-capability
]

read-capability = #6.276(bstr)

symlink-entry = [
  type = 2,
  target: tstr
]

The index is tagged with the CBOR tag 1701996916 according to RFC 9277.
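To make the structure concrete, the following sketch builds a small index and encodes it with a minimal hand-written CBOR encoder. It is an illustration under assumptions rather than a reference implementation: the Tag type and the head, encode, file_entry and symlink_entry helpers are hypothetical, only the CBOR types needed by the index are supported, and the read capability is a run of dummy bytes rather than a real ERIS read capability.

```python
from collections import namedtuple

Tag = namedtuple("Tag", ["number", "value"])  # hypothetical helper type

ERIS_FS_TAG = 1701996916
READ_CAPABILITY_TAG = 276


def head(major: int, arg: int) -> bytes:
    """Encode a CBOR head using the shortest possible argument form."""
    if arg < 24:
        return bytes([(major << 5) | arg])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < 1 << (8 * size):
            return bytes([(major << 5) | info]) + arg.to_bytes(size, "big")
    raise ValueError("argument too large")


def encode(item) -> bytes:
    """Encode the subset of CBOR used by the index."""
    if isinstance(item, Tag):  # checked first: a namedtuple is also a tuple
        return head(6, item.number) + encode(item.value)
    if isinstance(item, int) and item >= 0:
        return head(0, item)
    if isinstance(item, bytes):
        return head(2, len(item)) + item
    if isinstance(item, str):
        raw = item.encode("utf-8")
        return head(3, len(raw)) + raw
    if isinstance(item, (list, tuple)):
        return head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        # Map entries are sorted by the bytes of their encoded keys.
        pairs = sorted((encode(k), encode(v)) for k, v in item.items())
        return head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"unsupported type: {type(item).__name__}")


def file_entry(capability: bytes, executable: bool = False):
    return [1 if executable else 0, Tag(READ_CAPABILITY_TAG, capability)]


def symlink_entry(target: str):
    return [2, target]


placeholder = bytes(66)  # dummy bytes, NOT a real ERIS read capability

index = Tag(ERIS_FS_TAG, [1, {
    ("bin", "hello"): file_entry(placeholder, executable=True),
    ("share", "doc", "README"): file_entry(placeholder),
    ("bin", "hi"): symlink_entry("hello"),
}])
encoded = encode(index)
```

Paths are written as tuples here only because Python dictionary keys must be hashable; they are encoded as CBOR arrays.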

Block size

Files should be encoded using the block size recommended in the ERIS specification (see ERIS specification section 2.2.1). That is, files smaller than 16 KiB should use a block size of 1 KiB and all other files should use a block size of 32 KiB.

The index should be encoded using the same recommendation: a block size of 1 KiB if the index is smaller than 16 KiB and a block size of 32 KiB otherwise.
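The recommendation can be expressed as a small helper. This is a minimal sketch; the function name recommended_block_size is hypothetical, and the same function would apply to both file content and the encoded index.

```python
KIB = 1024


def recommended_block_size(content_length: int) -> int:
    """Return the recommended ERIS block size in bytes for the given content length."""
    if content_length < 0:
        raise ValueError("content length must not be negative")
    # Content smaller than 16 KiB uses 1 KiB blocks, everything else 32 KiB blocks.
    return 1 * KIB if content_length < 16 * KIB else 32 * KIB
```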

MIME type

The ERIS-FS MIME type is application/x-eris-fs+cbor.

This can be used to identify ERIS-FS file systems in applications or protocols.

Implementation Notes

Detecting ERIS-FS

An ERIS-FS file system can be detected by checking if the first five bytes are:

0xDA65726974

See also RFC 9277.
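The five bytes follow directly from the tag number: 0xDA is the initial byte of a CBOR tag with a 4-byte argument, and the remaining four bytes are the tag number 1701996916 in big-endian order. A minimal sketch of the check, assuming the decoded content is available as bytes (the names ERIS_FS_MAGIC and is_eris_fs are hypothetical):

```python
ERIS_FS_TAG = 1701996916

# Major type 6 (tag) with a 4-byte argument has the initial byte 0xDA.
ERIS_FS_MAGIC = b"\xda" + ERIS_FS_TAG.to_bytes(4, "big")


def is_eris_fs(data: bytes) -> bool:
    """Return True if the data starts with the ERIS-FS tag bytes."""
    return data[:5] == ERIS_FS_MAGIC
```

A positive result only means that the content claims to be an ERIS-FS index; the rest of the structure still has to be validated when decoding.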

Parallelization

Creating and decoding ERIS-FS file systems can be parallelized by encoding/decoding files in parallel.
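Because each file is encoded independently of the others, the work can be distributed over a pool of workers. The sketch below shows the general shape using a thread pool. The encode_file function is only a stand-in that returns a hash of the content; a real implementation would call an ERIS encoder and return a read capability.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def encode_file(content: bytes) -> bytes:
    """Stand-in for ERIS encoding: returns a hash, not a real read capability."""
    return hashlib.blake2b(content, digest_size=32).digest()


def encode_files(files: dict) -> dict:
    """Encode every file concurrently and return a mapping from path to result."""
    paths = list(files)
    with ThreadPoolExecutor() as pool:
        # map() yields results in input order, so paths and results line up.
        results = pool.map(encode_file, (files[p] for p in paths))
        return dict(zip(paths, results))


files = {("bin", "hello"): b"#!/bin/sh\necho hello\n", ("README",): b"Hello"}
capabilities = encode_files(files)
```

The order in which files finish encoding does not affect the resulting index, since entries are keyed by path. For CPU-bound encoders a process pool may be a better fit than threads.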

Symbolic Links

ERIS-FS can hold symbolic links that point outside of the encoded file system. This seems to be necessary for properly supporting Nix/Guix substitutes.

Symbolic links can be dangerous and implementations SHOULD issue a warning when decoding symbolic links that point outside of the encoded file system.
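One possible way to detect such links is a purely lexical check of the target against the position of the link in the tree. The sketch below assumes POSIX-style targets and treats every absolute target as pointing outside the file system; the function names are hypothetical. It does not follow other symbolic links inside the tree, so it is an approximation.

```python
import posixpath
import warnings


def symlink_escapes(link_path, target: str) -> bool:
    """Return True if a symlink at link_path with this target leaves the tree."""
    if posixpath.isabs(target):
        return True
    # Resolve the target relative to the directory that contains the link.
    parent = posixpath.join(*link_path[:-1]) if len(link_path) > 1 else ""
    resolved = posixpath.normpath(posixpath.join(parent, target))
    return resolved == ".." or resolved.startswith("../")


def warn_if_escaping(link_path, target: str) -> None:
    """Issue a warning for symlinks that point outside the encoded file system."""
    if symlink_escapes(link_path, target):
        warnings.warn(
            f"symlink {'/'.join(link_path)} -> {target} points outside the file system"
        )
```

For example, a link at bin/hi with target hello stays inside the tree, while the same link with target ../../etc/passwd does not.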

Deprecated Version

Version 0

Version 0 of the ERIS-FS format encoded all files into one continuous sequence of bytes, arranged so that file boundaries were aligned to ERIS block boundaries. This version of the encoding is deprecated and SHOULD NOT be used.

The specification of the old version is available in the EER Git repository.

Implementations

IANA Considerations

CBOR Tags Registry

This specification requires the assignment of a CBOR tag: tag 1701996916, which identifies an ERIS-FS Index.

The tag is added to the CBOR Tags Registry as defined in RFC 8949.