filepack is a command-line file hashing and verification utility written in
Rust.
It is an alternative to .sfv files and tools like shasum. Files are hashed
using BLAKE3, a fast, cryptographic
hash function.
A manifest named filepack.json containing the hashes of files in a directory
can be created with:
```sh
filepack create path/to/directory
```
Which will write the manifest to path/to/directory/filepack.json.
Files can later be verified with:
```sh
filepack verify path/to/directory
```
This checks the files against the manifest, protecting against accidental or malicious corruption, as long as the manifest has not been tampered with.
If you run filepack a lot, you might want to alias fp=filepack.
filepack is currently unstable: the interface and file format may change at
any time. Additionally, the code has not been extensively reviewed and should
be considered experimental.
filepack is written in Rust and can be built
from source and installed from a checked-out copy of this repo with:
```sh
cargo install --path .
```
Or from crates.io with:
```sh
cargo install filepack
```
See rustup.rs for installation instructions for Rust.
Pre-built binaries for Linux, macOS, and Windows can be found on the releases page.
You can use the following command on Linux, macOS, or Windows to download the
latest release. Just replace DEST with the directory where you'd like to put
filepack:
```sh
curl --proto '=https' --tlsv1.2 -sSf https://filepack.com/install.sh | bash -s -- --to DEST
```
For example, to install filepack to ~/bin:
```sh
# create ~/bin
mkdir -p ~/bin

# download and extract filepack to ~/bin/filepack
curl --proto '=https' --tlsv1.2 -sSf https://filepack.com/install.sh | bash -s -- --to ~/bin

# add `~/bin` to the paths that your shell searches for executables
# this line should be added to your shell's initialization file,
# e.g. `~/.bashrc` or `~/.zshrc`
export PATH="$PATH:$HOME/bin"

# filepack should now be executable
filepack --help
```
Note that install.sh may fail on GitHub Actions or in other environments
where many machines share IP addresses. install.sh calls GitHub APIs in order
to determine the latest version of filepack to install, and those API calls
are rate-limited on a per-IP basis. To make install.sh more reliable in such
circumstances, pass a specific tag to install with --tag.
Filepack supports a number of subcommands, including filepack create to
create a manifest, and filepack verify to verify a manifest.
See filepack help for supported subcommands and filepack help SUBCOMMAND
for information about a particular subcommand.
Create a manifest.
Recommended lints can be enabled with:
```sh
filepack create --deny distribution
```
Verify the contents of a directory against a manifest.
To verify the contents of DIR against DIR/filepack.json:
```sh
filepack verify DIR
```
If the current directory contains filepack.json, DIR can be omitted:
```sh
filepack verify
```
filepack verify takes an optional --print flag, which prints the manifest
to standard output if verification succeeds. This can be used in a pipeline to
ensure that the manifest has been verified before proceeding:
```sh
filepack verify --print | jq
```
Filepack stores local configuration in the filepack subdirectory of the
platform data directory.
The location of the platform data directory is platform-dependent, and
$XDG_DATA_DIR is respected on all platforms.
| Platform | Value |
|---|---|
| All | $XDG_DATA_DIR |
| Linux | $HOME/.local/share |
| macOS | $HOME/Library/Application Support |
| Windows | {FOLDERID_LocalAppData} |
Filepack stores keys in the keychain subdirectory of the filepack data
directory. So if the platform data directory is ~/.local/share, the
keychain directory is ~/.local/share/filepack/keychain.
The location of the filepack data directory used by a command can be overridden
with the --data-dir option.
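The lookup order described above can be sketched in Rust as follows. This is a minimal illustration, not filepack's code: the function names are hypothetical, platforms other than Windows and macOS are assumed to use the Linux default, and reading %LOCALAPPDATA% as a stand-in for {FOLDERID_LocalAppData} is an assumption.
```rust
use std::env;
use std::path::{Path, PathBuf};

/// Resolve the filepack data directory (hypothetical helper, not filepack's API).
fn filepack_data_dir(data_dir_flag: Option<PathBuf>) -> Option<PathBuf> {
    // An explicit --data-dir replaces the filepack data directory outright.
    if let Some(dir) = data_dir_flag {
        return Some(dir);
    }
    // $XDG_DATA_DIR, if set and non-empty, is used on every platform.
    let platform_dir = match env::var_os("XDG_DATA_DIR") {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => platform_default()?,
    };
    Some(platform_dir.join("filepack"))
}

/// Platform defaults from the table above.
/// Assumption: %LOCALAPPDATA% stands in for {FOLDERID_LocalAppData}.
fn platform_default() -> Option<PathBuf> {
    if cfg!(target_os = "windows") {
        env::var_os("LOCALAPPDATA").map(PathBuf::from)
    } else {
        let home = PathBuf::from(env::var_os("HOME")?);
        if cfg!(target_os = "macos") {
            Some(home.join("Library/Application Support"))
        } else {
            // Assumption: every other platform uses the Linux default.
            Some(home.join(".local/share"))
        }
    }
}

/// Keys live in the keychain subdirectory of the filepack data directory.
fn keychain_dir(data_dir: &Path) -> PathBuf {
    data_dir.join("keychain")
}
```
With data directory ~/.local/share/filepack, keychain_dir yields ~/.local/share/filepack/keychain, matching the example above.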
filepack manifests are conventionally named filepack.json and are placed
alongside the files they reference.
Manifests are UTF-8-encoded JSON.
Manifests contain an object with two mandatory keys, files and signatures.
The value of the mandatory files key is an object mapping path components to
directory entries. Directory entries may be subdirectories or files. Files are
objects with keys hash, the hex-encoded BLAKE3 hash of the file, and size,
the length of the file in bytes.
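The nested files object can be modeled as a recursive tree. The sketch below is a hypothetical in-memory representation that assumes the JSON has already been parsed into entries; Entry and flatten are illustrative names, not filepack's types, and the sketch does not address how a parser tells file objects from directory objects.
```rust
use std::collections::BTreeMap;

/// A directory entry in the manifest's `files` object (hypothetical type).
#[derive(Debug)]
enum Entry {
    /// A subdirectory: path component -> entry.
    Directory(BTreeMap<String, Entry>),
    /// A file: hex-encoded BLAKE3 hash and length in bytes.
    File { hash: String, size: u64 },
}

/// Flatten a directory object into (slash-separated path, hash, size) triples.
fn flatten(dir: &BTreeMap<String, Entry>, prefix: &str, out: &mut Vec<(String, String, u64)>) {
    for (name, entry) in dir {
        let path = if prefix.is_empty() {
            name.clone()
        } else {
            format!("{prefix}/{name}")
        };
        match entry {
            Entry::Directory(children) => flatten(children, &path, out),
            Entry::File { hash, size } => out.push((path, hash.clone(), *size)),
        }
    }
}
```
Applied to the example manifest below, this would yield README.md and src/main.rs with their sizes.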
As a consequence of the manifest being UTF-8, all path components must be valid Unicode.
Path components may not be . or .., contain the path separators / or \,
contain NUL, be longer than 255 bytes, or begin with a Windows drive prefix,
such as C:.
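The component rules above translate directly into a Rust check. This is a sketch: the function name is hypothetical, and the drive-prefix test (an ASCII letter followed by a colon) is an assumed reading of "such as C:", so filepack's actual check may differ. Taking a &str guarantees valid Unicode.
```rust
/// Check a single path component against the manifest rules (hypothetical helper).
fn check_component(component: &str) -> Result<(), &'static str> {
    if component == "." || component == ".." {
        return Err("component may not be `.` or `..`");
    }
    if component.contains('/') || component.contains('\\') {
        return Err("component may not contain a path separator");
    }
    if component.contains('\0') {
        return Err("component may not contain NUL");
    }
    // The limit is in bytes, not characters.
    if component.len() > 255 {
        return Err("component may not be longer than 255 bytes");
    }
    // Assumption: a drive prefix is an ASCII letter followed by ':'.
    let bytes = component.as_bytes();
    if bytes.len() >= 2 && bytes[0].is_ascii_alphabetic() && bytes[1] == b':' {
        return Err("component may not begin with a Windows drive prefix");
    }
    Ok(())
}
```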
The value of the mandatory signatures key is an array of signatures.
Signatures are bech32m-encoded strings that include an Ed25519 public key, the package
fingerprint the signature is made over, an optional timestamp, and the
signature itself.
Public keys are Curve25519 points and signatures are Ed25519 signatures made
over the root of a Merkle tree which commits to the content of files via the
package fingerprint.
A manifest over a directory containing the files README.md and src/main.rs,
signed by the public key
public1a67dndhhmae7p6fsfnj0z37zf78cde6mwqgtms0y87h8ldlvvflyqcxnd63:
```json
{
  "files": {
    "README.md": {
      "hash": "fc253b84551ce6b00e820a826ac18054dc7f63a318ce62f3175315f5c467a62a",
      "size": 11883
    },
    "src": {
      "main.rs": {
        "hash": "1fa48b95ed335369d45b91af8138bdccd1413364bcdbfa6e9034e8a2cfd6e17f",
        "size": 33
      }
    }
  },
  "signatures": ["…"]
}
```
The signature is elided for brevity. Signatures are bech32m-encoded strings containing both a public key and an Ed25519 signature.
Public keys, private keys, signatures, and package fingerprints are all
bech32m-encoded
strings beginning with public1…, private1…, signature1…, and package1…
respectively.
BLAKE3 file hashes are 64-character lowercase hexadecimal.
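A minimal sketch of validating these strings, assuming filepack applies the standard BIP 350 bech32m checksum. It only checks the checksum, the human-readable prefix, and the hash format; it does not decode the payload, does not enforce the 90-character limit of the original bech32 specification, and does not accept uppercase strings. encode exists only so the checksum can be exercised, and all function names are hypothetical.
```rust
/// bech32 character set (BIP 173), also used by bech32m.
const CHARSET: &str = "qpzry9x8gf2tvdw0s3jn54khce6mua7l";
/// Checksum constant that distinguishes bech32m from bech32 (BIP 350).
const BECH32M_CONST: u32 = 0x2bc8_30a3;
const GEN: [u32; 5] = [0x3b6a_57b2, 0x2650_8e6d, 0x1ea1_19fa, 0x3d42_33dd, 0x2a14_62b3];

/// BCH checksum polynomial over 5-bit values.
fn polymod(values: &[u8]) -> u32 {
    let mut chk: u32 = 1;
    for &v in values {
        let top = chk >> 25;
        chk = ((chk & 0x01ff_ffff) << 5) ^ u32::from(v);
        for (i, &g) in GEN.iter().enumerate() {
            if ((top >> i) & 1) == 1 {
                chk ^= g;
            }
        }
    }
    chk
}

/// Expand the human-readable part (e.g. "public") for checksumming.
fn hrp_expand(hrp: &str) -> Vec<u8> {
    let mut v: Vec<u8> = hrp.bytes().map(|b| b >> 5).collect();
    v.push(0);
    v.extend(hrp.bytes().map(|b| b & 31));
    v
}

/// Build a string from 5-bit data values (each must be < 32). Test helper only.
fn encode(hrp: &str, data: &[u8]) -> String {
    let mut values = hrp_expand(hrp);
    values.extend_from_slice(data);
    values.extend_from_slice(&[0; 6]);
    let pm = polymod(&values) ^ BECH32M_CONST;
    let checksum = (0..6u32).map(|i| ((pm >> (5 * (5 - i))) & 31) as u8);
    let mut s = format!("{hrp}1");
    for d in data.iter().copied().chain(checksum) {
        s.push(CHARSET.as_bytes()[usize::from(d)] as char);
    }
    s
}

/// Return the human-readable prefix if the bech32m checksum is valid.
fn bech32m_hrp(s: &str) -> Option<&str> {
    let sep = s.rfind('1')?;
    let (hrp, data) = (&s[..sep], &s[sep + 1..]);
    if hrp.is_empty() || data.len() < 6 {
        return None;
    }
    let mut values = hrp_expand(hrp);
    for c in data.chars() {
        values.push(CHARSET.find(c)? as u8);
    }
    (polymod(&values) == BECH32M_CONST).then_some(hrp)
}

/// True for the four filepack prefixes described above.
fn is_filepack_string(s: &str) -> bool {
    matches!(bech32m_hrp(s), Some("public" | "private" | "signature" | "package"))
}

/// BLAKE3 file hashes: exactly 64 lowercase hexadecimal characters.
fn is_blake3_hex(s: &str) -> bool {
    s.len() == 64 && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}
```
Because bech32m is an error-detecting code, changing any single character of a valid string makes bech32m_hrp return None.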
Filepack packages may contain a file named metadata.yaml describing the
package and its content.
filepack create loads metadata.yaml if present and checks for validity and
unknown fields.
filepack verify also loads metadata.yaml if present and checks for
validity. Unknown fields, however, are not an error, so that future versions of
filepack may define new metadata fields in a backwards-compatible fashion.
Filepack metadata is intended to be a broadly useful, machine- and human-readable description of the contents of a package, covering personal, distribution, and archival use cases.
Metadata follows a fixed schema and is not user-extensible. Future versions of
filepack may define new metadata fields, causing verification errors if those
fields are present and invalid according to the new schema.
Please feel free to open an issue with ideas for new metadata fields.
Fields are given as NAME: TYPE.
Mandatory fields:
- title: component: The content's human-readable title.

Optional fields:

- artwork: component.png: The filename of a PNG file containing artwork for the content, for example, cover art for an album or key art for a movie.
- creator: component: The person or group who created the content.
- date: date: The date the content was created or released.
- description: markdown: A description of the content.
- homepage: url: Primary URL for the content. Should be the official homepage of the content, if any, and not, for example, a Wikipedia or media database link.
- language: language: The primary language of the content.
- package: object: The package metadata.
- readme: component.md: The filename of the content readme.
Fields of the optional package object, which describes the package itself, as opposed to its content:

- creator: component: The person or group who created the package.
- creator-tag: tag: The tag of the person or group who created the package.
- date: date: The date the package was created.
- description: markdown: A description of the package.
- homepage: url: Primary URL for the package.
- nfo: component.nfo: The filename of the package nfo file.
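The difference between how filepack create and filepack verify treat unknown fields can be sketched as a key check with a strict flag. This is a hypothetical illustration that works on already-parsed key names rather than YAML, and it checks only field names, not their types.
```rust
/// Top-level metadata fields from the schema above.
const METADATA_FIELDS: &[&str] = &[
    "title", "artwork", "creator", "date", "description",
    "homepage", "language", "package", "readme",
];
/// Fields of the nested package object.
const PACKAGE_FIELDS: &[&str] = &["creator", "creator-tag", "date", "description", "homepage", "nfo"];

/// `strict = true` models `filepack create`, which rejects unknown fields;
/// `strict = false` models `filepack verify`, which tolerates them.
fn check_keys(keys: &[&str], package_keys: &[&str], strict: bool) -> Result<(), String> {
    // title is mandatory in both modes.
    if !keys.contains(&"title") {
        return Err("missing mandatory field `title`".to_string());
    }
    if strict {
        for key in keys {
            if !METADATA_FIELDS.contains(key) {
                return Err(format!("unknown field `{key}`"));
            }
        }
        for key in package_keys {
            if !PACKAGE_FIELDS.contains(key) {
                return Err(format!("unknown field `package.{key}`"));
            }
        }
    }
    Ok(())
}
```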
Types:
- component: A string with the same restrictions as path components in the manifest files object, allowing them to be used as Unix filesystem paths. Note that Windows imposes additional restrictions which are not enforced, so components may not be valid paths on Windows.
- component.EXTENSION: A component that must end with .EXTENSION.
- date: A string containing a date in one of several formats: a year only, when the month, day, and time are unknown; a date only, when the time is unknown; or a date and time with a mandatory time zone.
- language: A string containing an ISO 639-1 two-character language code. See filepack languages for valid language codes.
- markdown: A string containing CommonMark markdown.
- tag: A string containing a tag, commonly an abbreviation of a release group name. Must match the regular expression [0-9A-Z]+(\.[0-9A-Z]+)*.
- url: A string containing a URL.
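Some of the simpler types above can be checked without a regex engine. This hand-written sketch matches the tag pattern [0-9A-Z]+(\.[0-9A-Z]+)* one dot-separated segment at a time. The helper names are hypothetical, has_extension does not also apply the component rules, and looks_like_language_code only checks the shape of a code (lowercase is an assumption), not membership in the list printed by filepack languages.
```rust
/// Tag: one or more non-empty segments of [0-9A-Z], separated by single dots.
fn is_tag(s: &str) -> bool {
    s.split('.').all(|segment| {
        !segment.is_empty() && segment.bytes().all(|b| b.is_ascii_digit() || b.is_ascii_uppercase())
    })
}

/// component.EXTENSION: the component must end with ".EXTENSION".
fn has_extension(component: &str, extension: &str) -> bool {
    component.ends_with(&format!(".{extension}"))
}

/// Shape check only: two lowercase ASCII letters, e.g. "en" or "la".
fn looks_like_language_code(s: &str) -> bool {
    s.len() == 2 && s.bytes().all(|b| b.is_ascii_lowercase())
}
```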
Example dates:
```text
1970
1970-01-01
1970-01-01T00:00:00Z
1970-01-01 00:00:00Z
1970-01-01T00:00:00+00:00
1970-01-01 00:00:00 +00:00
```
An example metadata.yaml:
```yaml
title: Tobin's Spirit Guide
creator: John Horace Tobin
artwork: cover.png
date: 1929
description: A compilation of supernatural occurrences, entities, and facts.
homepage: https://tobin-society.org/spirit-guide
language: en
readme: README.md
package:
  creator: Egon Spengler
  creator-tag: ES
  date: 1984-07-08 19:32:00 -04:00
  description: >
    First edition on loan from NYPL Main Branch research stacks. Captured via
    Microtek MS-300A flatbed scanner.
  homepage: https://ghost-busters.net/~egon
  nfo: tobins.nfo
```
The homepage URLs are of course anachronistic, as the World Wide Web was
created in 1989, some years after Egon first packaged Tobin's Spirit Guide.
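The date formats can be told apart by shape alone. This hypothetical sketch accepts exactly the forms shown in the example dates, plus the package date in the example above; it does not check that months, days, or times are in range, and filepack's real parser may accept more or fewer forms.
```rust
#[derive(Debug, PartialEq)]
enum DateKind {
    Year,
    Date,
    DateTime,
}

/// True if `s` has the same length as `shape` and matches it byte for byte,
/// where `d` in `shape` stands for any ASCII digit.
fn matches_shape(s: &str, shape: &str) -> bool {
    s.len() == shape.len()
        && s.bytes().zip(shape.bytes()).all(|(c, p)| if p == b'd' { c.is_ascii_digit() } else { c == p })
}

fn classify_date(s: &str) -> Option<DateKind> {
    if matches_shape(s, "dddd") {
        return Some(DateKind::Year);
    }
    if matches_shape(s, "dddd-dd-dd") {
        return Some(DateKind::Date);
    }
    // Date and time occupy the first 19 bytes; a time zone must follow.
    if s.len() < 19 || !s.is_char_boundary(19) {
        return None;
    }
    let (date_time, zone) = s.split_at(19);
    let date_time_ok = matches_shape(date_time, "dddd-dd-ddTdd:dd:dd")
        || matches_shape(date_time, "dddd-dd-dd dd:dd:dd");
    // Mandatory zone: `Z`, or a numeric offset optionally preceded by a space.
    let zone_ok = zone == "Z"
        || ["+dd:dd", "-dd:dd", " +dd:dd", " -dd:dd"].iter().any(|shape| matches_shape(zone, shape));
    (date_time_ok && zone_ok).then_some(DateKind::DateTime)
}
```
A date and time without a zone, such as 1970-01-01T00:00:00, is rejected, reflecting the mandatory time zone.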
filepack create supports optional lints that can be enabled by group:
```sh
filepack create --deny distribution
```
The distribution lint group checks for issues which can cause problems if the
package is intended for distribution, such as non-portable paths that are
illegal on Windows, paths which would conflict on case-insensitive file
systems, and inclusion of junk files such as .DS_Store.
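A few of these distribution checks can be sketched over a list of slash-separated package paths. This is a hypothetical subset, not filepack's lint implementation: it flags .DS_Store files, names containing characters Windows forbids in file names, and paths that collide after Unicode lowercasing, which only approximates how case-insensitive file systems compare names. Windows reserved device names such as CON are not checked.
```rust
use std::collections::HashMap;

/// Return human-readable problems for paths that are risky to distribute.
fn distribution_lints(paths: &[&str]) -> Vec<String> {
    let mut problems = Vec::new();
    // Lowercased path -> first original path seen with that spelling.
    let mut seen: HashMap<String, &str> = HashMap::new();
    for &path in paths {
        let name = path.rsplit('/').next().unwrap_or(path);
        if name == ".DS_Store" {
            problems.push(format!("junk file: {path}"));
        }
        // Characters that are illegal in Windows file names.
        if path.chars().any(|c| "<>:\"|?*".contains(c)) {
            problems.push(format!("non-portable name: {path}"));
        }
        if let Some(other) = seen.insert(path.to_lowercase(), path) {
            problems.push(format!("case conflict: {other} and {path}"));
        }
    }
    problems
}
```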
Lint group names and the lints they cover can be printed with:
```sh
filepack lints
```
filepack supports the generation of
Curve25519 public/private keypairs,
and the creation and verification of
EdDSA signatures over manifests.
Keypairs are generated with:
```sh
filepack keygen
```
Which creates master.public and master.private files in the keychain
subdirectory of the filepack data directory.
Generated public keys can be printed with:
```sh
filepack key
```
Signatures are created with:
```sh
filepack sign
```
Which signs the manifest in the current directory with your master key and adds
the signature to the manifest's signatures array. Signatures are made over a
fingerprint hash, recursively calculated from the contents of the manifest.
Signatures embedded in a manifest are verified whenever a manifest is verified. The presence of a signature by a particular public key can be asserted with:
```sh
filepack verify --key PUBLIC_KEY
```
Which will fail if a valid signature for PUBLIC_KEY over the manifest
contents is not present.
Filepack signatures are made over the package fingerprint, which is the root of a Merkle tree of the files and directories contained in the manifest.
Fingerprints are BLAKE3 hashes, constructed such that it is impossible to produce objects which are different, either in type or content, but which have the same fingerprint.
Fingerprints may be used as a globally unique identifier. If two packages have the same fingerprint, they have the same content.
For details on how fingerprints are calculated, see DESIGN.md.
The [DIRECTORY] argument in examples can be omitted, in which case it
defaults to the current directory.
To create a filepack manifest:
```sh
filepack create [DIRECTORY]
```
This creates filepack.json containing the hashes and sizes of all files in
DIRECTORY and its subdirectories.
To enable linting, use the --deny flag with a lint or lint group:
```sh
filepack create --deny <LINT> [DIRECTORY]
```
To view all lints:
```sh
filepack lints
```
Please feel free to open issues requesting new lints.
To create a package with metadata describing the package, create a file named
metadata.yaml in the package directory, for example:
```yaml
title: The Necronomicon
creator: Abdul Alhazred
description: >
  The Old Ones, their history, and the rites by which they may be summoned.
language: la
```
Then, to create the package:
```sh
filepack create [DIRECTORY]
```
Which will validate metadata if present.
See metadata for the full schema.
To sign a package when creating it, first create a new public and private key pair:
```sh
filepack keygen
```
This creates a new public key, master.public, with corresponding private key,
master.private, in the filepack keychain directory.
The filepack info command will print the path to the filepack data directory,
keychain directory, and any public keys in the keychain directory.
Your public key can be printed with:
```sh
filepack key
```
To sign a new package:
```sh
filepack create --sign [DIRECTORY]
```
To add a signature to an existing package:
```sh
filepack sign
```
To print the fingerprint of a package:
```sh
filepack fingerprint [DIRECTORY]
```
Manifests contain only hashes and not file content, and so can be published anywhere, including places where technical or legal limitations would preclude publication of the content itself.
Package content can be verified with:
```sh
filepack verify [DIRECTORY]
```
Any extra files, missing files, or modified files will produce an error.
This verifies files against hashes stored in the manifest.
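The extra, missing, and modified checks amount to comparing two maps from path to hash. This is a minimal sketch that assumes the caller has already hashed the files on disk and excluded the manifest itself; Problem and compare are hypothetical names, not filepack's API.
```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum Problem {
    Missing(String),
    Modified(String),
    Extra(String),
}

/// Compare manifest entries (`expected`) with hashes of files found on disk (`actual`).
fn compare(expected: &BTreeMap<String, String>, actual: &BTreeMap<String, String>) -> Vec<Problem> {
    let mut problems = Vec::new();
    for (path, hash) in expected {
        match actual.get(path) {
            None => problems.push(Problem::Missing(path.clone())),
            Some(found) if found != hash => problems.push(Problem::Modified(path.clone())),
            Some(_) => {}
        }
    }
    // Anything on disk that the manifest doesn't mention is extra.
    for path in actual.keys() {
        if !expected.contains_key(path) {
            problems.push(Problem::Extra(path.clone()));
        }
    }
    problems
}
```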
Running filepack verify on its own can detect accidental corruption, but not
intentional modification, since an attacker could modify package content, and
then modify the manifest to make it match the modified content and thus pass
verification.
However, if the manifest was obtained from a trusted source, verification will catch any modifications, intentional or otherwise, even if the package content was obtained from an untrusted source.
To detect intentional modification, you can either verify a package against a known-good fingerprint:
```sh
filepack verify --fingerprint <FINGERPRINT> [DIRECTORY]
```
This verifies both the contents of the package and that the manifest has the expected fingerprint. The former protects against accidental corruption, and the latter against intentional modification.
Packages can also be verified by checking for a signature by a known public key. This also protects against intentional modification, as long as the private key corresponding to the public key is protected, and only known to you, or someone you trust.
To verify that a package has a signature from a particular public key:
```sh
filepack verify --key <PUBLIC_KEY> [DIRECTORY]
```
The --key option can be repeated to require signatures from multiple public
keys.
Create a filepack manifest with:
```sh
filepack create <PACKAGE>
```
This will create <PACKAGE>/filepack.json.
To later verify the package against the manifest:
```sh
filepack verify <PACKAGE>
```
Because the manifest contains cryptographic hashes, accidental corruption of
the files or manifest will always be detected by filepack verify.
This is not the case with intentional, malicious corruption, since an attacker could modify the files and replace the manifest hashes with the hashes of the modified files.
To detect intentional modification, you must therefore ensure that the manifest itself has not been tampered with.
This can be accomplished in a number of ways: by saving the manifest to a secure location, by saving the package fingerprint, or by signing the package.
To save the manifest in a secure location, use the --manifest option to save
the manifest somewhere other than the package:
```sh
filepack create <PACKAGE> --manifest <MANIFEST>
```
Then, verify the package against the saved manifest:
```sh
filepack verify <PACKAGE> --manifest <MANIFEST>
```
Because the manifest was protected, any modification to the package will be detected. This has the advantage that not only will modifications be detected, but you can also determine which files were modified.
Create the manifest in the package root with:
```sh
filepack create <PACKAGE>
```
Print the package fingerprint:
```sh
filepack fingerprint <PACKAGE>
```
Save the fingerprint in a secure location.
Then, verify the package against the saved fingerprint:
```sh
filepack verify <PACKAGE> --fingerprint <FINGERPRINT>
```
Because the fingerprint was protected, any modification to the package will be detected. This has the advantage that you only have to save a small text string, but the disadvantage that while any modifications will be detected, you will not be able to determine which files have changed.
Create the manifest in the package root and sign it with your master key:
```sh
filepack create <PACKAGE> --sign
```
Then, verify the package and its signature:
```sh
filepack verify <PACKAGE> --key master
```
Any modification to the package or manifest will invalidate the signature, which will be detected. This has the advantage of not needing to save the manifest or fingerprint of packages you want to verify. However, you will need to generate and secure your private key.
To check the authenticity of a package created by someone else, get their public key and verify that the package contains a signature by that key:
```sh
filepack verify <PACKAGE> --key <KEY>
```
filepack serves the same purpose as programs like shasum, which hash files
and output a text file containing file hashes and paths, which can later be
used with the same program to verify that the files have not changed.
They output hashes and paths one per line, separated by whitespace, and mainly differ in which hash function they use.
Some examples, and the hash functions they use:
| binary | hash function |
|---|---|
| b2sum | BLAKE2 |
| b3sum | BLAKE3 |
| cksfv | CRC-32 |
| hashdeep | various |
| hashdir | various |
| sha3sum | SHA-3 |
| shasum | SHA-1 and SHA-2 |
CRC-32 is not a cryptographic hash function and cannot be used to detect intentional modifications. Similarly, SHA-1 was thought to be a cryptographic hash function, but is now known to be insecure.
filepack and b3sum both use BLAKE3, a fast, general-purpose cryptographic
hash function.
filepack can also create and verify signatures. Other signing and
verification utilities include:
| binary | about |
|---|---|
| gpg | general-purpose, OpenPGP implementation |
| ssh-keygen | general-purpose, shipped with OpenSSH |
| minisign | general-purpose |
| signify | general-purpose |
| SignTool | Windows code signing |
| codesign | macOS code signing |
| jarsigner | JDK code signing |
If you package content for distribution, Filepack offers a number of benefits
over simple file verification with .sfv files.
- Filepack can detect both accidental corruption and intentional modification, whereas .sfv files can only detect accidental corruption.
- Because filepack.json manifests contain file sizes, Filepack can tell the user not just whether a file has been modified, but also whether it is empty, truncated, or too long.
- Filepack packages have a fingerprint, a short text string beginning with package1…, which is guaranteed to be globally unique, allowing packages to be identified and referenced by fingerprint alone.
- Fingerprints can be used to verify that a filepack.json manifest itself has not been tampered with, proving the authenticity of a package regardless of its source.
- Packages can be signed, allowing users to verify the authenticity of any package from a packager by public key, a short string beginning with public1….
- Filepack can warn you if package filenames might cause issues on other operating systems or file systems.
- Filepack packages are Merkle trees, both of files and within files. A user with a manifest, package fingerprint, or trusted public key can incrementally stream and verify files, or access and verify file content at random. Additionally, errors can be detected and recovered from by transmitting only those parts of the file which are corrupted.
- Packages may contain machine-readable metadata following a filepack-defined schema, allowing packages to be searched, indexed, and exposed through rich, featureful interfaces.
Metadata is the difference between a good user experience and a bad user experience.
On Netflix, movies are presented in an attractive fashion with artwork, titles, actors, and other information. All content is playable on whatever device you happen to be using, streaming starts instantly, search is useful, and a recommendation engine surfaces movies you might like. It is usable by anyone, of any age and with any degree of technical proficiency.
On BitTorrent, movies are presented inscrutably in text listings with weirdly formatted titles, spotty or missing information, and no artwork. Content may or may not be playable on your device, streaming is impossible, search is unstructured, and recommendations are unavailable. It is only usable by a small and decreasing percentage of the population, and asking someone over to "BitTorrent and chill" will be met by blinking incomprehension, followed by derision as you fiddle with your torrent client like a digital steam engine operator, desperately trying to find seeds for the movie you want to watch.
The difference between the two systems is not due to a difference in underlying data: every piece of content one could possibly imagine is available on BitTorrent. It is a difference in the availability of metadata.
Netflix has a database of standardized, machine-readable metadata, supporting nearly every feature of the system. BitTorrent has a disparate collection of folders of files, with inconsistent and incomplete metadata, and the user experience and features of the system are a direct consequence.
As polished services driven by metadata have become more popular, the user experience that systems based on folders of files can offer has become ever more unacceptable and alien, regardless of cost or underlying availability of content.
Filepack seeks to rectify this by standardizing machine-readable metadata that can be included with folder-of-files content.
Filepack metadata is stored in a single file, metadata.yaml, in the root of a
filepack package, and contains information related to what the package
contains, which files in the package contain the package content and their file
formats, and who created the package. This metadata can serve as a base for the
creation of rich local and distributed applications and services, with user
experiences that compete with and exceed those of closed centralized
alternatives.
Ideas for future features include:
- Deterministic multi-part RAR archive packing and unpacking
- .nfo file generation from templates and metadata
- Additional lints for complex naming conventions
- Content-type specific metadata
- Packages of packages
- Chain packages where each package points to the previous package by the same packager
- Semantic signatures for package revocation, invalidation, or replacement
A filepack manifest contains all information needed to verify the contents of a
directory. The files key of the manifest is a directory object mapping
filenames to directory entries, which may themselves be directories, or files,
in which case they contain the hash of the file contents, as well as the length
of the file.
The length of the file is not strictly necessary for verification, but is included so that truncated, empty, and overlong files can be identified, which may help in understanding verification failures.
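When a hash does not match, the recorded size lets a verifier say more about what went wrong. The sketch below is hypothetical. A shorter file is only consistent with truncation, since sizes alone cannot show that the remaining bytes are a prefix of the original.
```rust
#[derive(Debug, PartialEq)]
enum SizeMismatch {
    Empty,
    Truncated,
    Overlong,
}

/// Explain a hash mismatch using the size recorded in the manifest.
/// `None` means the size matches, so the content changed in place.
fn explain_size(expected: u64, actual: u64) -> Option<SizeMismatch> {
    if actual == expected {
        None
    } else if actual == 0 {
        Some(SizeMismatch::Empty)
    } else if actual < expected {
        Some(SizeMismatch::Truncated)
    } else {
        Some(SizeMismatch::Overlong)
    }
}
```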
The contents of files are hashed with BLAKE3 using the official Rust implementation. BLAKE3 was chosen both for its speed, and for the fact that it utilizes a Merkle tree construction. A Merkle tree allows for verified file streaming and subrange inclusion proofs, which both seem useful in the context of file hashing and verification.
Filepack allows for the creation of Ed25519 signatures over the contents of a manifest, which thus commit to the contents of the directory covered by the manifest. Signatures are made not over the serialized manifest, but over a message containing a "fingerprint" hash, a Merkle tree hash created from the contents of the manifest. This keeps signatures independent of the manifest format, avoids issues with canonicalization of the manifest JSON, avoids hash loops due to the inclusion of signatures in the manifest itself, and allows proving the inclusion of files covered by a signature using a Merkle receipt.
Although only package fingerprints are exposed externally, several types of fingerprints are used internally, namely directory, entry, file, and message fingerprints.
Fingerprints are constructed to be unique, both between and within types, meaning that it is impossible to produce two values with different types or contents but the same fingerprint.
Fingerprints are BLAKE3 hashes. To guarantee that fingerprints are unique between types, the hasher is first initialized with a length-prefixed string unique to each type.
After the prefix, the value is hashed as a sequence of TLV fields.
Fields are hashed in order, but may be skipped, in the case of optional fields, or repeated, in the case of fields containing multiple values.
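To illustrate the prefix-then-TLV idea, the sketch below builds the byte stream that would be fed to the hasher. Everything about the concrete encoding is an assumption: one-byte tags, little-endian u64 lengths, and the type names "file" and "directory" are hypothetical and not taken from filepack. The final BLAKE3 step is omitted because it needs the blake3 crate. The point is that each field carries its own tag and length, so skipped or repeated fields cannot be confused, and different type prefixes keep otherwise identical values apart.
```rust
/// Build the (hypothetical) byte stream hashed for one fingerprint:
/// a length-prefixed type string, then one tag-length-value record per field.
fn fingerprint_input(type_name: &str, fields: &[(u8, &[u8])]) -> Vec<u8> {
    let mut out = Vec::new();
    // Domain separation: a distinct, length-prefixed string per fingerprint type.
    out.extend_from_slice(&(type_name.len() as u64).to_le_bytes());
    out.extend_from_slice(type_name.as_bytes());
    // Fields in order; optional fields are simply absent,
    // and repeated fields appear once per value.
    for &(tag, value) in fields {
        out.push(tag);
        out.extend_from_slice(&(value.len() as u64).to_le_bytes());
        out.extend_from_slice(value);
    }
    out
}
```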
Currently, no fingerprint test vectors exist, and the best documentation is the code itself.
In particular, see: