Localized Messaging with Signal-to-Text | Ethereum Improvement Proposals

Simple Summary

A method of converting machine codes to human-readable text in any language and phrasing.

Abstract

An on-chain system for providing user feedback by converting machine-efficient codes into human-readable strings in any language or phrasing. The system does not impose a list of languages, but rather lets users create, share, and use the localized text of their choice.

Motivation

There are many cases where an end user needs feedback or instruction from a smart contract. Directly exposing numeric codes does not make for good UX or DX. If Ethereum is to be a truly global system usable by experts and lay persons alike, systems to provide feedback on what happened during a transaction are needed in as many languages as possible.

Returning a hard-coded string (typically in English) only serves a small segment of the global population. This standard proposes a method to allow users to create, register, share, and use a decentralized collection of translations, enabling richer messaging that is more culturally and linguistically diverse.

There are several machine-efficient ways of representing intent, status, state transition, and other semantic signals, including booleans, enums, and ERC-1066 codes. Providing human-readable messages for these signals improves the developer experience by returning information that is easier to consume and carries more context (e.g. in a revert message). The end-user experience is improved by providing text that can be propagated up to the UI.

Specification

Contract Architecture

The system uses two types of contract: LocalizationPreferences and Localization.

The LocalizationPreferences contract functions as a proxy for tx.origin.

                                                                   +--------------+
                                                                   |              |
                                                          +------> | Localization |
                                                          |        |              |
                                                          |        +--------------+
                                                          |
                                                          |
+-----------+          +-------------------------+        |        +--------------+
|           |          |                         | <------+        |              |
| Requestor | <------> | LocalizationPreferences | <-------------> | Localization |
|           |          |                         | <------+        |              |
+-----------+          +-------------------------+        |        +--------------+
                                                          |
                                                          |
                                                          |        +--------------+
                                                          |        |              |
                                                          +------> | Localization |
                                                                   |              |
                                                                   +--------------+

`Localization`

A contract that holds a simple mapping of codes to their text representations.

interface Localization {
  function textFor(bytes32 _code) external view returns (string _text);
}

`textFor`

Fetches the localized text representation.

function textFor(bytes32 _code) external view returns (string _text);

`LocalizationPreferences`

A proxy contract that allows users to set their preferred Localization. Text lookup is delegated to the user’s preferred contract.

A fallback Localization with all keys filled MUST be available. If the user-specified Localization has not explicitly set a localization (i.e. textFor returns ""), the LocalizationPreferences MUST redelegate to the fallback Localization.

interface LocalizationPreferences {
  function set(Localization _localization) external returns (bool);
  function textFor(bytes32 _code) external view returns (bool _wasFound, string _text);
}

`set`

Registers a user’s preferred Localization. The registering user SHOULD be considered tx.origin.

function set(Localization _localization) external returns (bool);

`textFor`

Retrieve text for a code found at the user’s preferred Localization contract.

The first return value (bool _wasFound) indicates whether the text came from the user’s preferred Localization or from the fallback. If the fallback was used, _wasFound MUST be set to false; otherwise it is true.

function textFor(bytes32 _code) external view returns (bool _wasFound, string _text);
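The delegation and fallback rule can be modelled off chain. The following is a minimal sketch, not a reference implementation: the class names `InMemoryLocalization` and `InMemoryPreferences` are hypothetical stand-ins for deployed contracts, and the `user` argument stands in for tx.origin.

```javascript
// Hypothetical in-memory stand-in for a Localization contract.
class InMemoryLocalization {
  constructor(entries) {
    this.dictionary = new Map(Object.entries(entries));
  }

  // Mirrors textFor(bytes32): codes that were never set yield "".
  textFor(code) {
    return this.dictionary.get(code) ?? "";
  }
}

// Hypothetical in-memory stand-in for a LocalizationPreferences contract.
class InMemoryPreferences {
  constructor(fallback) {
    this.fallback = fallback;   // Localization with all keys filled
    this.registry = new Map();  // user -> preferred Localization
  }

  set(user, localization) {
    this.registry.set(user, localization);
    return true;
  }

  // Returns [wasFound, text]. Users without a preference are routed
  // to the fallback Localization, as in the Implementation section.
  textFor(user, code) {
    const preferred = this.registry.get(user) ?? this.fallback;
    const text = preferred.textFor(code);
    if (text !== "") return [true, text];
    return [false, this.fallback.textFor(code)];
  }
}

const english = new InMemoryLocalization({ "0x01": "Success", "0x02": "Not allowed" });
const german = new InMemoryLocalization({ "0x01": "Erfolg" });
const prefs = new InMemoryPreferences(english);
prefs.set("alice", german);

console.log(prefs.textFor("alice", "0x01")); // preferred text is available
console.log(prefs.textFor("alice", "0x02")); // falls back to the default text
```

A caller can use the first element of the result to decide whether to tell the user that the preferred localization was incomplete.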

String Format

All strings MUST be encoded as UTF-8.

"Špeĉiäl chârãçtérs are permitted"
"As are non-Latin characters: アルミ缶の上にあるみかん。"
"Emoji are legal: 🙈🙉🙊🎉"
"Feel free to be creative: (ﾉ◕ヮ◕)ﾉ*:･ﾟ✧"
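Because strings are UTF-8, the number of bytes stored can exceed the number of visible characters. The sketch below uses the standard `TextEncoder` and `TextDecoder` globals; the helper names `utf8ByteLength` and `roundTrips` are hypothetical and only illustrate how a client might check text before submitting it to a Localization.

```javascript
const encoder = new TextEncoder(); // TextEncoder always produces UTF-8
const decoder = new TextDecoder("utf-8", { fatal: true }); // throws on malformed bytes

// Number of bytes the text occupies once encoded as UTF-8.
function utf8ByteLength(text) {
  return encoder.encode(text).length;
}

// True when encoding and then decoding returns the original text.
function roundTrips(text) {
  return decoder.decode(encoder.encode(text)) === text;
}

console.log(utf8ByteLength("abc")); // 3 (ASCII: one byte each)
console.log(utf8ByteLength("ß"));   // 2
console.log(utf8ByteLength("缶"));  // 3
console.log(utf8ByteLength("🙈"));  // 4
console.log(roundTrips("Emoji are legal: 🙈🙉🙊🎉")); // true
```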

Templates

Template strings are allowed, and MUST follow the ANSI C printf conventions.

"Satoshi's true identity is %s"

Text with 2 or more arguments SHOULD use the POSIX parameter field extension.

"Knock knock. Who's there? %1$s. %1$s who? %2$s!"
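To illustrate how a consumer might interpolate such templates off chain, here is a minimal sketch of a formatter. `formatTemplate` is a hypothetical helper, not part of this standard, and it handles only `%s`, `%d`, their numbered forms (`%1$s`, `%2$d`), and `%%`; a production client would more likely rely on an existing printf library.

```javascript
// Hypothetical helper supporting %s, %d, %N$s, %N$d and %% only.
function formatTemplate(template, ...args) {
  let next = 0; // index of the next sequential (un-numbered) argument
  return template.replace(/%(?:(\d+)\$)?([sd%])/g, (match, position, type) => {
    if (type === "%") return "%"; // literal percent sign
    // Numbered fields are 1-based; un-numbered fields consume arguments in order.
    const index = position ? Number(position) - 1 : next++;
    if (index < 0 || index >= args.length) {
      throw new RangeError("No argument supplied for " + match);
    }
    const value = args[index];
    return type === "d" ? String(Math.trunc(Number(value))) : String(value);
  });
}

console.log(formatTemplate("Satoshi's true identity is %s", "unknown"));
console.log(formatTemplate("Knock knock. Who's there? %1$s. %1$s who? %2$s!", "Boo", "Don't cry"));
```

Numbered fields may repeat or skip arguments, which is what allows a translation to reorder or omit values.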

Rationale

`bytes32` Keys

bytes32 is very efficient since it is the EVM’s base word size. Given the enormous number of elements (card(A) > 1.1579 × 10⁷⁷), it can embed nearly any practical signal, enum, or state. In cases where an application’s key is longer than bytes32, hashing that long key can map that value into the correct width.

Designs that use datatypes with smaller widths than bytes32 (such as bytes1 in ERC-1066) can be directly embedded into the larger width. This is a trivial one-to-one mapping of the smaller set into the larger one.
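Both mappings can be sketched in a few lines. The helper names `embedByte` and `hashKey` are hypothetical. Two assumptions are made here: the one-byte code is left-aligned and zero-padded on the right (the way Solidity widens bytes1 to bytes32), although any fixed one-to-one embedding would do, and SHA-256 from Node's built-in crypto module stands in for whichever 32-byte hash the application actually uses (on chain this would typically be keccak256).

```javascript
const { createHash } = require("node:crypto");

// Embed a one-byte code into a 32-byte key (left-aligned, zero-padded).
function embedByte(code) {
  if (!Number.isInteger(code) || code < 0 || code > 0xff) {
    throw new RangeError("code must be an integer between 0 and 255");
  }
  const key = Buffer.alloc(32); // 32 zero bytes
  key[0] = code;
  return "0x" + key.toString("hex");
}

// Map a key of any length onto 32 bytes by hashing it.
// SHA-256 is a stand-in for the application's chosen hash.
function hashKey(longKey) {
  return "0x" + createHash("sha256").update(longKey, "utf8").digest("hex");
}

console.log(embedByte(0x01));
console.log(hashKey("my.app/errors/insufficient-balance/v2"));
```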

Local vs Globals and Singletons

This spec has opted not to force a single global registry, and instead allows any contract and use case to deploy its own system. This allows for more flexibility, and does not restrict the community from opting to use singleton LocalizationPreferences contracts for common use cases, share Localizations between different proxies, delegate translations between Localizations, and so on.

There are many practical uses of agreed-upon singletons. For instance, translating codes that aim to be fairly universal and integrated directly into the broader ecosystem (wallets, frameworks, debuggers, and the like) will want to have a single LocalizationPreferences.

Rather than dispersing several LocalizationPreferences for different use cases and codes, one could imagine a global “registry of registries”. While this approach allows for a unified lookup of all translations in all use cases, it is antithetical to the spirit of decentralization and freedom. Such a system also increases the lookup complexity, places an onus on getting the code right the first time (or adds the overhead of an upgradable contract), and needs to account for use case conflicts with a “unified” or centralized numbering system. Further, lookups should be lightweight (especially in cases like looking up revert text).

For these reasons, this spec chooses the more decentralized, lightweight, free approach, at the cost of on-chain discoverability. A registry could still be compiled, but would be difficult to enforce, and is out of scope of this spec.

Off Chain Storage

A very viable alternative is to store text off chain, with a pointer to the translations on-chain, and emit or return a bytes32 code for another party to do the lookup. However, it is difficult to guarantee that off-chain resources will be available, and this approach requires coordination from some other system, such as a web server, to do the code-to-text matching. It is also not compatible with revert messages.

ASCII vs UTF-8 vs UTF-16

UTF-8 is the most widely used encoding at the time of writing. It contains a direct embedding of ASCII, while providing characters for most natural languages, emoji, and special characters.

Please see the UTF-8 Everywhere Manifesto for more information.

When No Text is Found

Returning a blank string to the requestor fully defeats the purpose of a localization system. The two options for handling missing text are:

- A generic “text not found” message in the preferred language
- The actual message, in a different language

Generic Option

This design opted not to use generic fallback text. It does not provide any useful information to the user other than to potentially contact the Localization maintainer (if one even exists and updating is even possible).

Fallback Option

The design outlined in this proposal is to provide text in a commonly used language (e.g. English or Mandarin). First, this is the language that will be routed to if the user has yet to set a preference. Second, there is a good chance that a user may have some proficiency with the language, or at least be able to use an automated translation service.

Knowing that the text fell back via textFor’s first return value is much simpler than attempting language detection after the fact. This information is useful for certain UI cases, for example where there may be a desire to explain why localization fell back.

Decentralized Text Crowdsourcing

In order for Ethereum to gain mass adoption, users must be able to interact with it in the language, phrasing, and level of detail that they are most comfortable with. Rather than imposing a fixed set of translations as in a traditional, centralized application, this EIP provides a way for anyone to create, curate, and use translations. This empowers the crowd to supply culturally and linguistically diverse messaging, leading to broader and more distributed access to information.

`printf`-style Format Strings

C-style printf templates have been the de facto standard for some time. They have wide compatibility across most languages (either in standard or third-party libraries). This makes it much easier for the consuming program to interpolate strings with low developer overhead.

Parameter Fields

The POSIX parameter field extension is important since languages do not share a common word order. Parameter fields enable the reuse and rearrangement of arguments in different localizations.

printf("%1$s is an element with the atomic number %2$d!", "Mercury", 80);
// => "Mercury is an element with the atomic number 80!"

Simplified Localizations

Localization text does not require use of all parameters, and may simply ignore values. This can be useful for not exposing more technical information to users that would otherwise find it confusing.

#!/usr/bin/env ruby

sprintf("%1$s é um elemento", "Mercurio", 80)
# => "Mercurio é um elemento"

#!/usr/bin/env clojure

(format "Element #%2$s" "Mercury" 80)
;; => Element #80

Interpolation Strategy

Please note that it is highly advisable to return the template string as is, with arguments as multiple return values or fields in an event, leaving the actual interpolation to be done off chain.

event AtomMessage(bytes32 templateCode, bytes32 atomCode, uint256 atomicNumber);

#!/usr/bin/env node

var printf = require('printf');

const { returnValues: { templateCode, atomCode, atomicNumber } } = eventResponse;

const template = await AppText.textFor(templateCode);
// => "%1$s ist ein Element mit der Ordnungszahl %2$d!"

const atomName = await PeriodicTableText.textFor(atomCode);
// => "Merkur"

printf(template, atomName, Number(atomicNumber));
// => "Merkur ist ein Element mit der Ordnungszahl 80!"

Unspecified Behaviour

This spec does not specify:

- Public or private access to the default Localization
- Who may set text
  - Deployer
  - onlyOwner
  - Anyone
  - Whitelisted users
  - and so on
- When text is set
  - constructor
  - Any time
  - Write to empty slots, but not overwrite existing text
  - and so on

These are intentionally left open. There are many cases for each of these, and restricting any is fully beyond the scope of this proposal.
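As an illustration of just one of these options, the "write to empty slots, but not overwrite existing text" policy could look like the following off-chain model. `WriteOnceLocalization` is a hypothetical name, and this sketch is not a recommendation of that policy over the others.

```javascript
// Hypothetical model of a Localization that only writes to empty slots.
class WriteOnceLocalization {
  constructor() {
    this.dictionary = new Map();
  }

  // Returns true when the text was stored, false when the slot was taken.
  set(code, text) {
    if (text === "") throw new Error("text must not be empty");
    if (this.dictionary.has(code)) return false; // never overwrite
    this.dictionary.set(code, text);
    return true;
  }

  // Codes that were never set yield "", matching the fallback rule.
  textFor(code) {
    return this.dictionary.get(code) ?? "";
  }
}

const texts = new WriteOnceLocalization();
console.log(texts.set("0x01", "Erfolg"));   // first write is stored
console.log(texts.set("0x01", "Geschafft")); // second write is rejected
console.log(texts.textFor("0x01"));
```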

Implementation

pragma solidity ^0.4.25;

contract Localization {
  mapping(bytes32 => string) private dictionary_;

  constructor() public {}

  // Currently overwrites anything
  function set(bytes32 _code, string _message) external {
    dictionary_[_code] = _message;
  }

  function textFor(bytes32 _code) external view returns (string _message) {
    return dictionary_[_code];
  }
}

contract LocalizationPreferences {
  mapping(address => Localization) private registry_;
  Localization public defaultLocalization;

  bytes32 private empty_ = keccak256(abi.encodePacked(""));

  constructor(Localization _defaultLocalization) public {
    defaultLocalization = _defaultLocalization;
  }

  function set(Localization _localization) external returns (bool) {
    registry_[tx.origin] = _localization;
    return true;
  }

  // Named textFor to match the LocalizationPreferences interface
  function textFor(bytes32 _code) external view returns (bool, string) {
    return textFor(_code, tx.origin);
  }

  // Primarily for testing
  function textFor(bytes32 _code, address _who) public view returns (bool, string) {
    string memory text = getLocalizationFor(_who).textFor(_code);

    if (keccak256(abi.encodePacked(text)) != empty_) {
      return (true, text);
    } else {
      return (false, defaultLocalization.textFor(_code));
    }
  }

  function getLocalizationFor(address _who) internal view returns (Localization) {
    if (Localization(registry_[_who]) == Localization(0)) {
      return Localization(defaultLocalization);
    } else {
      return Localization(registry_[_who]);
    }
  }
}
