Standardized Shamir Secret Sharing Scheme for BIP-39 Mnemonics

Simple Summary

A standardized algorithm for applying Shamir’s Secret Sharing Scheme to BIP-39 mnemonics.

Abstract

A standardized approach to splitting a BIP-39 mnemonic into N BIP-39 mnemonics, called shares, so that T shares are required to recover the original mnemonic and no information about the original mnemonic, other than its size, is leaked with less than T shares.

Motivation

We’d like to make it easier for less-technical users to store keys securely.

Currently, many users use BIP-39 mnemonics to store entropy values underlying their keys. These mnemonics are a single point of failure. If lost, the user may never regain access to the assets locked by the keys. If stolen, a malicious actor can steal the assets.

Shamir’s Secret Sharing Scheme addresses this concern directly. It creates “shares” of the secret, such that a subset can be used to recover the secret, but only if a minimum threshold of shares is reached. Without the minimum, no information about the original secret is leaked.

One concern with Shamir’s Secret Sharing Scheme is there is no canonical, standard implementation. This puts recovery at risk, as tooling may change over time.

Here, we propose a standardized implementation of Shamir’s Secret Sharing Scheme applied specifically to BIP-39 mnemonics, so users can easily create shares of their mnemonic, destroy the original, store the shares appropriately, and confidently recover the original mnemonic at a later date.

Specification

Shamir’s Secret Sharing Scheme is a cryptographic method to split a secret into N unique parts, where any T of them are required to reconstruct the secret.

First, a polynomial f of degree T − 1 is constructed. Then, each share is a point on the polynomial’s curve: an integer x, and its corresponding y point f(x).

With any set of T shares (or points), the initial polynomial can be recovered using polynomial interpolation.

When constructing the initial polynomial, the secret is stored as the coefficient of x⁰ and the rest of the coefficients are randomly generated.

BIP-39 Mnemonics

BIP-39 is a common standard for storing entropy as a list of words. It is easier to work with for human interactions than raw binary or hexadecimal representations of entropy.

BIP-39 mnemonics encode two pieces of data: the original entropy and a checksum of that entropy. The checksum allows the mnemonic to be validated, ensuring that the user entered it correctly.

Generating the Mnemonic

The mnemonic must encode entropy in a multiple of 32 bits. With more entropy security is improved but the sentence length increases. We refer to the initial entropy length as ENT. The allowed size of ENT is 128-256 bits.

First, an initial entropy of ENT bits is generated. A checksum is generated by taking the first ENT / 32 bits of its SHA256 hash. This checksum is appended to the end of the initial entropy. Next, these concatenated bits are split into groups of 11 bits, each encoding a number from 0-2047, serving as an index into a word list. Finally, we convert these numbers into words and use the joined words as a mnemonic sentence.

The following table describes the relation between the initial entropy length (ENT), the checksum length (CS), and the length of the generated mnemonic sentence (MS) in words.

CS = ENT / 32
MS = (ENT + CS) / 11

|  ENT  | CS | ENT+CS |  MS  |
+-------+----+--------+------+
|  128  |  4 |   132  |  12  |
|  160  |  5 |   165  |  15  |
|  192  |  6 |   198  |  18  |
|  224  |  7 |   231  |  21  |
|  256  |  8 |   264  |  24  |

Recovering the Entropy

The initial entropy can be recovered by reversing the process above. The mnemonic is converted to bits, where each word is converted to 11 bits representing its index in the word list. The entropy portion is defined in the table above, based on the size of the mnemonic.

Word List

This specification only supports the BIP-39 English word list, but this may be expanded in the future.

See word list.

Applying Shamir’s Scheme to BIP-39 Mnemonics

To ensure that the shares are valid BIP-39 mnemonics, we:

Convert the target BIP-39 mnemonic to its underlying entropy
Apply Shamir’s Scheme to the entropy
Convert each resulting share’s y value to a BIP-39 mnemonic

By converting to entropy before applying Shamir’s Scheme, we omit the checksum from the initial secret, allowing us to calculate a new checksum for each share when converting the share y values to mnemonics, ensuring that they are valid according to BIP-39.

When applying Shamir’s Scheme to the entropy, we apply it separately to each byte of the entropy and GF(256) is used as the underlying finite field. Bytes are interpreted as elements of GF(256) using polynomial representation with operations modulo the Rijndael irreducible polynomial x⁸ + x⁴ + x³ + x + 1, following AES.

A share represents a point on the curve described by the underlying polynomial used to split the secret. It includes two pieces of data:

An ID: the x value of the share
A BIP-39 mnemonic: the y value of the share represented by a mnemonic

Creating Shares

Inputs: BIP-39 mnemonic, number of shares (N), threshold (T)

Output: N Shares, each share including an ID, { x

0 < x < 256 }, and a BIP-39 mnemonic of the same length as the input one

Check the following conditions:
- 1 < T <= N < 256
- The mnemonic is valid according to BIP-39
Recover the underlying entropy of the mnemonic as a vector of bytes
Define values:
- Let E be the byte-vector representation of the mnemonic’s entropy
- Let n be the length of E
- Let coeff₁, … , coeff_{T - 1} be byte-vectors belonging to GF(256)ⁿ generated randomly, independently with uniform distribution from a source suitable for generating cryptographic keys
Evaluate the polynomial for each share
- For each x from 1 to N, evaluate the polynomial f(x) = E + coeff₁x¹ + … + coeff_{T - 1}x^{T - 1}, where x is the share ID and f(x) is the share value (as a vector of bytes)
Using f(x) as the underlying entropy, generate a mnemonic for each share
Return the ID and mnemonic for each share

Recovering the Mnemonic

To recover the original mnemonic, we interpolate a polynomial f from the given set of shares (or points on the polynomial) and evaluate f(0).

Polynomial Interpolation

Given a set of m points (x_i, y_i), 1 ≤ i ≤ m, such that no two x_i values equal, there exists a polynomial that assumes the value y_i at each point x_i. The polynomial of lowest degree that satisfies these conditions is uniquely determined and can be obtained using the Lagrange interpolation formula given below.

Since Shamir’s Secret Sharing Scheme is applied separately to each of the n bytes of the shared mnemonic’s entropy, we work with y_i as a vector of n values, where y_i[k] = f_k(x_i), 1 ≤ k ≤ n, and f_k is the polynomial in the k-th instance of the scheme.

Interpolate(x, {(x_i, y_i), 1 ≤ i ≤ m})

Input: the desired index x, a set of index/value-vector pairs {(x_i, y_i), 1 ≤ i ≤ m} &subseteq; GF(256) × GF(256)ⁿ

Output: the value-vector (f₁(x), … , f_n(x))

$f_k(x) = \sum_{i=1}^m y_i[k] \prod_{\underset{j \neq i}{j=1}}^m \frac{x - x_j}{x_i - x_j}$

Recover the Mnemonic

Input: A set of m Shares

Output: The original mnemonic

Recover the underlying entropy of each share’s mnemonic as a vector of bytes
Calculate E = Interpolate(0, [(x₁, y₁),…,(x_m, y_m)]), where x is the share ID and y is the byte-vector of the share’s mnemonic’s entropy
Using E as the underlying entropy, generate a mnemonic and return it

Rationale

Choice of Field

The field GF(256) was chosen, because the field arithmetic is easy to implement in any programming language and many implementations are already available since it is used in the AES cipher. Although using GF(256) requires that we convert the mnemonic to its underlying entropy as a byte-vector, this is also easy to implement and many implementations of it exist in a variety of programming languages.

GF(2048) was also considered. Using GF(2048), we could have applied Shamir’s Scheme directly to the mnemonic, using the word indexes as the values. This would have allowed us to avoid converting the mnemonic to its underlying entropy. But, the resulting shares would not have been valid BIP-39 mnemonics - the checksum portion would not be a valid checksum of the entropy. And, working around this would add considerable complexity.

Another option was GF(2ⁿ) where n is the size of the entropy in bits. We’d still convert the mnemonic to entropy, but then apply Shamir’s Scheme over the entire entropy rather than on a vector of values. The downside of this approach is we’d need a different field for each mnemonic strength along with an associated irreducible polynomial. Additionally, this would require working with very large numbers that can be cumbersome to work with in some languages.

The shares produced by the specification include an ID, in addition to the BIP-39 mnemonic.

Other options could have encoded the share ID into the mnemonic, simplifying storage - only the mnemonic would need to be stored.

One possibility would be to store the ID instead of the checksum in the mnemonic. The downside of this approach is that the shares would not be valid BIP-39 mnemonics because the “checksum” section of the mnemonic would not match the “entropy” section. Shares with valid BIP-39 mnemonics are useful because they are indistinguishable from any other. And users could store the ID in a variety of ways that obscure it.

Validation on Recovery

We decided not to include a validation mechanism on recovering the original mnemonic. This leaks less information to a potential attacker. There is no indication they’ve gotten the requisite number of shares until they’ve obtained T + 1 shares.

We could provide recovery validation by replacing one of the random coefficients with a checksum of the original mnemonic. Then, when recovering the original mnemonic and the polynomial, we could validate that the checksum coefficient is the valid checksum of recovered mnemonic.

Test Cases

Coming soon.

All implementations must be able to:

Split and recover each mnemonic with the given numShares and threshold.
Recover the mnemonic from the given knownShares.

Security Considerations

The shares produced by the specification include an ID in addition to the BIP-39 mnemonic. This raises two security concerns:

Users must keep this ID in order to recover the original mnemonic. If the ID is lost, or separated from the share mnemonic, it may not be possible to recover the original. (Brute force recovery may or may not be possible depending on how much is known about the number of shares and threshold)

The additional data may hint to an attacker of the existence of other keys and the scheme under which they are stored. Therefore, the ID should be stored in a way that obscures its use.