Localized Messaging with Signal-to-Text
Simple Summary
A method of converting machine codes to human-readable text in any language and phrasing.
Abstract
An on-chain system for providing user feedback by converting machine-efficient codes into human-readable strings in any language or phrasing. The system does not impose a list of languages, but rather lets users create, share, and use the localized text of their choice.
Motivation
There are many cases where an end user needs feedback or instruction from a smart contract. Directly exposing numeric codes does not make for good UX or DX. If Ethereum is to be a truly global system usable by experts and lay persons alike, systems to provide feedback on what happened during a transaction are needed in as many languages as possible.
Returning a hard-coded string (typically in English) only serves a small segment of the global population. This standard proposes a method to allow users to create, register, share, and use a decentralized collection of translations, enabling richer messaging that is more culturally and linguistically diverse.
There are several machine-efficient ways of representing intent, status, state transition, and other semantic signals, including booleans, enums, and ERC-1066 codes. By providing human-readable messages for these signals, the developer experience is enhanced by returning easier-to-consume information with more context (e.g. revert messages). The end-user experience is enhanced by providing text that can be propagated up to the UI.
Specification
Contract Architecture
There are two types of contract: LocalizationPreferences and Localizations.
The LocalizationPreferences contract functions as a proxy for tx.origin.
                                                                   +--------------+
                                                                   |              |
                                                          +------> | Localization |
                                                          |        |              |
                                                          |        +--------------+
                                                          |
                                                          |
+-----------+          +-------------------------+        |        +--------------+
|           |          |                         | <------+        |              |
| Requestor | <------> | LocalizationPreferences | <-------------> | Localization |
|           |          |                         | <------+        |              |
+-----------+          +-------------------------+        |        +--------------+
                                                          |
                                                          |
                                                          |        +--------------+
                                                          |        |              |
                                                          +------> | Localization |
                                                                   |              |
                                                                   +--------------+
Localization
A contract that holds a simple mapping of codes to their text representations.
interface Localization {
function textFor(bytes32 _code) external view returns (string _text);
}
textFor
Fetches the localized text representation.
function textFor(bytes32 _code) external view returns (string _text);
LocalizationPreferences
A proxy contract that allows users to set their preferred Localization. Text lookup is delegated to the user’s preferred contract.
A fallback Localization with all keys filled MUST be available. If the user-specified Localization has not explicitly set a localization (i.e. textFor returns ""), the LocalizationPreferences MUST redelegate to the fallback Localization.
interface LocalizationPreferences {
function set(Localization _localization) external returns (bool);
function textFor(bytes32 _code) external view returns (bool _wasFound, string _text);
}
set
Registers a user’s preferred Localization. The registering user SHOULD be considered tx.origin.
function set(Localization _localization) external returns (bool);
textFor
Retrieve text for a code found at the user’s preferred Localization contract.
The first return value (bool _wasFound) indicates whether the text is available from that Localization or whether a fallback was used. If the fallback was used in this context, textFor’s first return value MUST be set to false, and is true otherwise.
function textFor(bytes32 _code) external view returns (bool _wasFound, string _text);
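To make the delegation and fallback rules concrete, here is a minimal off-chain JavaScript model of the lookup. It is a sketch under stated assumptions, not a contract implementation. JavaScript Maps stand in for the on-chain Localization contracts, a plain string stands in for tx.origin, and short hex strings such as "0x01" stand in for bytes32 codes. Like the Solidity implementation later in this document, it treats a user with no registered preference as using the fallback directly, so those lookups report wasFound as true.

```javascript
// In-memory stand-in for a Localization contract: code -> text.
class Localization {
  constructor(entries = {}) {
    this.dictionary = new Map(Object.entries(entries));
  }

  // Mirrors Solidity's default value for an unset mapping entry: "".
  textFor(code) {
    return this.dictionary.get(code) ?? "";
  }
}

// In-memory stand-in for LocalizationPreferences.
class LocalizationPreferences {
  constructor(fallback) {
    this.fallback = fallback;  // the fallback MUST have all keys filled
    this.registry = new Map(); // user address -> preferred Localization
  }

  // `origin` plays the role of tx.origin in this model.
  set(origin, localization) {
    this.registry.set(origin, localization);
    return true;
  }

  // Returns [wasFound, text]; wasFound is false when the fallback had to be used.
  textFor(origin, code) {
    // No registered preference: the fallback is used directly, as in the
    // reference implementation, and that counts as a successful lookup.
    const preferred = this.registry.get(origin) ?? this.fallback;
    const text = preferred.textFor(code);
    if (text !== "") return [true, text];
    // The preferred Localization has no entry: redelegate to the fallback.
    return [false, this.fallback.textFor(code)];
  }
}

const english = new Localization({ "0x00": "Failure", "0x01": "Success" });
const german = new Localization({ "0x01": "Erfolg" });
const prefs = new LocalizationPreferences(english);
prefs.set("0xAlice", german);

console.log(prefs.textFor("0xAlice", "0x01")); // [ true, 'Erfolg' ]
console.log(prefs.textFor("0xAlice", "0x00")); // [ false, 'Failure' ]
console.log(prefs.textFor("0xBob", "0x00"));   // [ true, 'Failure' ]
```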
String Format
All strings MUST be encoded as UTF-8.
"Špeĉiäl chârãçtérs are permitted"
"As are non-Latin characters: アルミ缶の上にあるみかん。"
"Emoji are legal: 🙈🙉🙊🎉"
"Feel free to be creative: (ノ◕ヮ◕)ノ*:・゚✧"
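The strings are stored as UTF-8 bytes, so their on-chain size is the UTF-8 byte length rather than the number of visible characters. The sketch below checks this off chain with Node's Buffer.byteLength. The helper name utf8Length is hypothetical. The sample characters come from the examples above and are written as escapes so the exact code points are unambiguous.

```javascript
// UTF-8 byte length: what a Solidity `string` actually stores.
function utf8Length(s) {
  return Buffer.byteLength(s, "utf8");
}

const samples = [
  "\u0160",    // Š  (Latin Extended-A)
  "\u30A2",    // ア (Katakana)
  "\u{1F648}", // 🙈 (emoji outside the Basic Multilingual Plane)
];

for (const s of samples) {
  // s.length counts UTF-16 code units, which is neither bytes nor visible characters.
  console.log(`${s}: ${s.length} UTF-16 unit(s), ${utf8Length(s)} UTF-8 byte(s)`);
}
```

The emoji shows the mismatch most clearly. JavaScript reports a length of 2 (a UTF-16 surrogate pair), but it occupies 4 bytes in UTF-8.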
Templates
Template strings are allowed, and MUST follow the ANSI C printf conventions.
"Satoshi's true identity is %s"
Text with 2 or more arguments SHOULD use the POSIX parameter field extension.
"Knock knock. Who's there? %1$s. %1$s who? %2$s!"
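The SHOULD above can be checked mechanically. The sketch below is a hypothetical helper named usesPositionalFields. It scans a template for printf conversion specifiers and reports whether a template with two or more of them uses the POSIX %n$ form throughout. It recognises only a simplified subset of the printf grammar (optional n$, flags, width, precision, and a single conversion letter), so treat it as an approximation rather than a full parser.

```javascript
// Simplified printf conversion: %[n$][flags][width][.precision]conversion
const SPEC = /%(?:(\d+)\$)?[-+ #0]*\d*(?:\.\d+)?([a-zA-Z%])/g;

// True if the template follows the SHOULD: when it has two or more
// conversions, every one of them uses the POSIX "%n$" parameter field.
function usesPositionalFields(template) {
  const conversions = [...template.matchAll(SPEC)]
    .filter((m) => m[2] !== "%");          // "%%" is a literal percent sign
  if (conversions.length < 2) return true; // the rule only applies to 2+ arguments
  return conversions.every((m) => m[1] !== undefined);
}

console.log(usesPositionalFields("Satoshi's true identity is %s"));                   // true
console.log(usesPositionalFields("Knock knock. Who's there? %1$s. %1$s who? %2$s!")); // true
console.log(usesPositionalFields("%s is element number %d"));                         // false
```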
Rationale
bytes32 Keys
bytes32 is very efficient since it is the EVM’s base word size. Given the enormous number of elements (card(A) = 2^256 ≈ 1.1579 × 10^77), it can embed nearly any practical signal, enum, or state. In cases where an application’s key is longer than bytes32, hashing that long key (for example with keccak256) maps it into the correct width.
Designs that use datatypes with smaller widths than bytes32 (such as bytes1 in ERC-1066) can be directly embedded into the larger width. This is a trivial one-to-one mapping of the smaller set into the larger one.
Local vs Globals and Singletons
This spec has opted not to force a single global registry, and instead allows any contract and use case to deploy its own system. This allows for more flexibility, and does not prevent the community from opting to use singleton LocalizationPreferences contracts for common use cases, sharing Localizations between different proxies, delegating translations between Localizations, and so on.
There are many practical uses of agreed-upon singletons. For instance, codes that aim to be fairly universal and integrated directly into the broader ecosystem (wallets, frameworks, debuggers, and the like) will want a single LocalizationPreferences for their translations.
Rather than dispersing several LocalizationPreferences for different use cases and codes, one could imagine a global “registry of registries”. While this approach allows for unified lookup of all translations in all use cases, it is antithetical to the spirit of decentralization and freedom. Such a system also increases lookup complexity, places an onus on getting the code right the first time (or adds the overhead of an upgradable contract), and needs to account for use case conflicts with a “unified” or centralized numbering system. Further, lookups should be lightweight (especially in cases like looking up revert text).
For these reasons, this spec chooses the more decentralized, lightweight, free approach, at the cost of on-chain discoverability. A registry could still be compiled, but would be difficult to enforce, and is out of the scope of this spec.
Off Chain Storage
A very viable alternative is to store text off chain, with a pointer to the translations on chain, and emit or return a bytes32 code for another party to do the lookup. However, it is difficult to guarantee that off-chain resources will be available, and this approach requires coordination from some other system, such as a web server, to do the code-to-text matching. It is also not compatible with revert messages.
ASCII vs UTF-8 vs UTF-16
UTF-8 is the most widely used encoding at time of writing. It contains a direct embedding of ASCII, while providing characters for most natural languages, emoji, and special characters.
Please see the UTF-8 Everywhere Manifesto for more information.
When No Text is Found
Returning a blank string to the requestor fully defeats the purpose of a localization system. The two options for handling missing text are:
- A generic “text not found” message in the preferred language
- The actual message, in a different language
Generic Option
This design opted not to use generic fallback text. It does not provide any useful information to the user other than to potentially contact the Localization maintainer (if one even exists and updating is even possible).
Fallback Option
The design outlined in this proposal is to provide text in a commonly used language (e.g. English or Mandarin). First, this is the language that will be routed to if the user has yet to set a preference. Second, there is a good chance that a user may have some proficiency with the language, or at least be able to use an automated translation service.
Knowing that the text fell back via textFor’s first return value (a boolean) is much simpler than attempting language detection after the fact. This information is useful for certain UI cases, for example where there may be a desire to explain why localization fell back.
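As an illustration of that UI case, a front end can use the flag directly to decide whether to show a note. The helper renderMessage below is a hypothetical sketch. It assumes the [wasFound, text] pair has already been read from LocalizationPreferences.textFor by whatever client library the application uses, and the note wording is only an example.

```javascript
// Turn a LocalizationPreferences.textFor result into something a UI can display.
// `result` is assumed to be the pair [wasFound, text] returned by the contract.
function renderMessage([wasFound, text]) {
  return {
    text,
    // Only explain the language switch when the fallback was actually used.
    note: wasFound ? null : "No translation is available yet; showing the default language.",
  };
}

console.log(renderMessage([true, "Erfolg"]));
console.log(renderMessage([false, "Failure"]));
```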
Decentralized Text Crowdsourcing
In order for Ethereum to gain mass adoption, users must be able to interact with it in the language, phrasing, and level of detail that they are most comfortable with. Rather than imposing a fixed set of translations as in a traditional, centralized application, this EIP provides a way for anyone to create, curate, and use translations. This empowers the crowd to supply culturally and linguistically diverse messaging, leading to broader and more distributed access to information.
printf-style Format Strings
C-style printf templates have been the de facto standard for some time. They have wide compatibility across most languages (either in standard or third-party libraries). This makes it much easier for the consuming program to interpolate strings with low developer overhead.
Parameter Fields
The POSIX parameter field extension is important since languages do not share a common word order. Parameter fields enable the reuse and rearrangement of arguments in different localizations.
printf("%1$s is an element with the atomic number %2$d!", "Mercury", 80);
// => "Mercury is an element with the atomic number 80!"
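Node.js has no built-in printf with POSIX parameter fields. The sketch below therefore implements a deliberately small subset (%s, %d, %% and the positional forms %n$s and %n$d; no flags, width, or precision) to show how different templates can rearrange one set of arguments. The function name formatPositional is hypothetical. Production code would more likely rely on an existing printf library.

```javascript
// Minimal printf subset: %s, %d, %%, plus POSIX positional forms %n$s and %n$d.
// Flags, width and precision are intentionally unsupported in this sketch.
function formatPositional(template, ...args) {
  let next = 0; // argument index for non-positional conversions
  return template.replace(/%(?:(\d+)\$)?([sd%])/g, (match, position, conversion) => {
    if (conversion === "%") return "%";
    const index = position !== undefined ? Number(position) - 1 : next++;
    if (index < 0 || index >= args.length) {
      throw new RangeError(`no argument for ${match}`);
    }
    const value = args[index];
    return conversion === "d" ? String(Math.trunc(Number(value))) : String(value);
  });
}

const args = ["Mercury", 80];
console.log(formatPositional("%1$s is an element with the atomic number %2$d!", ...args));
// => Mercury is an element with the atomic number 80!
console.log(formatPositional("Element %2$d is %1$s.", ...args));
// => Element 80 is Mercury.
console.log(formatPositional("Element #%2$s", ...args));
// => Element #80
```

The last call also shows that unused arguments are simply ignored, which is the behaviour the Simplified Localizations case relies on.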
Simplified Localizations
Localization text does not require use of all parameters, and may simply ignore values. This can be useful for not exposing more technical information to users that would otherwise find it confusing.
#!/usr/bin/env ruby
sprintf("%1$s é um elemento", "Mercurio", 80)
# => "Mercurio é um elemento"
#!/usr/bin/env clojure
(format "Element #%2$s" "Mercury" 80)
;; => Element #80
Interpolation Strategy
Please note that it is highly advisable to return the template string as is, with arguments as multiple return values or fields in an event, leaving the actual interpolation to be done off chain.
event AtomMessage(
  bytes32 templateCode,
  bytes32 atomCode,
  uint256 atomicNumber
);
#!/usr/bin/env node
const printf = require('printf');

// eventResponse is a decoded AtomMessage event; AppText and PeriodicTableText
// are handles to Localization contracts.
async function render(eventResponse) {
  const { returnValues: { templateCode, atomCode, atomicNumber } } = eventResponse;
  const template = await AppText.textFor(templateCode);
  // => "%1$s ist ein Element mit der Ordnungszahl %2$d!"
  const atomName = await PeriodicTableText.textFor(atomCode);
  // => "Merkur"
  return printf(template, atomName, atomicNumber);
  // => "Merkur ist ein Element mit der Ordnungszahl 80!"
}
Unspecified Behaviour
This spec does not specify:
- Public or private access to the default Localization
- Who may set text
  - Deployer
  - onlyOwner
  - Anyone
  - Whitelisted users
  - and so on
- When text is set
  - constructor
  - Any time
  - Write to empty slots, but not overwrite existing text
  - and so on
These are intentionally left open. There are many cases for each of these, and restricting any is fully beyond the scope of this proposal.
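As one illustration of the open "when text is set" choice, the sketch below models the "write to empty slots, but not overwrite existing text" policy off chain. WriteOnceLocalization is a hypothetical name. An on-chain version would enforce the same check in Solidity, for example by requiring that the existing entry is empty before writing. This is only one of the permitted policies. It also rejects empty text, because "" is how the spec signals that a code has not been set.

```javascript
// Off-chain model of a Localization whose entries can be written only once.
class WriteOnceLocalization {
  constructor() {
    this.dictionary = new Map();
  }

  // Writes text for an empty code; refuses to overwrite existing text.
  set(code, text) {
    if (text === "") throw new Error("empty text would read as 'not set'");
    if (this.dictionary.has(code)) {
      throw new Error(`text for ${code} is already set`);
    }
    this.dictionary.set(code, text);
  }

  // Returns "" for unset codes, like the Solidity mapping default.
  textFor(code) {
    return this.dictionary.get(code) ?? "";
  }
}

const loc = new WriteOnceLocalization();
loc.set("0x01", "Success");
try {
  loc.set("0x01", "Done");
} catch (err) {
  console.log(err.message); // text for 0x01 is already set
}
console.log(loc.textFor("0x01")); // Success
```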
Implementation
pragma solidity ^0.4.25;

contract Localization {
  mapping(bytes32 => string) private dictionary_;

  constructor() public {}

  // Currently overwrites anything; no access control (see Unspecified Behaviour)
  function set(bytes32 _code, string _message) external {
    dictionary_[_code] = _message;
  }

  // Returns "" for codes that have not been set
  function textFor(bytes32 _code) external view returns (string _message) {
    return dictionary_[_code];
  }
}

contract LocalizationPreferences {
  mapping(address => Localization) private registry_;
  Localization public defaultLocalization;

  bytes32 private empty_ = keccak256(abi.encodePacked(""));

  constructor(Localization _defaultLocalization) public {
    defaultLocalization = _defaultLocalization;
  }

  // Registers the preferred Localization of tx.origin
  function set(Localization _localization) external returns (bool) {
    registry_[tx.origin] = _localization;
    return true;
  }

  function textFor(bytes32 _code) external view returns (bool, string) {
    return textFor(_code, tx.origin);
  }

  // Primarily for testing
  function textFor(bytes32 _code, address _who) public view returns (bool, string) {
    string memory text = getLocalizationFor(_who).textFor(_code);

    if (keccak256(abi.encodePacked(text)) != empty_) {
      return (true, text);
    } else {
      // No text at the preferred Localization: redelegate to the fallback
      return (false, defaultLocalization.textFor(_code));
    }
  }

  // Falls back to the default when _who has not registered a preference
  function getLocalizationFor(address _who) internal view returns (Localization) {
    if (address(registry_[_who]) == address(0)) {
      return defaultLocalization;
    } else {
      return registry_[_who];
    }
  }
}
Copyright
Copyright and related rights waived via CC0.