diff --git a/abstract.md b/abstract.md index 82ba3d3..23743c7 100644 --- a/abstract.md +++ b/abstract.md @@ -1 +1 @@ -The Ethereum Ecosystem allows for the verifiable execution of programs in a trusted manner thanks to smart contracts. One of the most common contracts is the implementation of a token---the digital representation of an asset---and most implementations follow the ERC20 Token Standard. The ERC20 Token Standard is however loosely defined and flawed. Various implementations which follow the ERC20 standard, have unclear and unexpected behaviours. Developers also tend to combat the limitation of ERC20 by supplementing their token implementation with non-standard and more complex features which often leads to severe bugs. In this thesis, we analyse the shortcomings and flaws of ERC20 and submit two new standard proposals: ERC777 and ERC820 aimed at not only solving the drawbacks of ERC20 but at offering new and exciting new features as well. The goal is to provide a better alternative to ERC20 which improves the safety, security and ease of use of tokens for the end user, as well as facilitate the work of blockchain developers by providing a modular reference implementation of a token. Developers can readily use this implementation to create their tokens without relying on incorrect, unsafe and non-standard features and implementations. Furthermore, we discuss the time spent interacting with and gathering feedback from, the community which is critical to have our proposals officially accepted and adopted by the Ethereum community. Subsequently, we provide an analysis of competing proposals which take a different approach with the same aim of addressing some of the issues of ERC20. 
Finally, we elaborate on the future steps of ERC777, such as further research on some of its new features, the formal verification of the reference implementation and even external tools which can assist with the design of more efficient implementations of both ERC777 tokens and other programs in general. +The Ethereum Ecosystem allows for the verifiable execution of programs in a trusted manner thanks to smart contracts. One of the most common contracts is the implementation of a token---the digital representation of an asset---and the vast majority of implementations follow the ERC20 Token Standard. However, the ERC20 Token Standard is loosely defined and flawed. Various implementations which follow the ERC20 standard have unclear and unexpected behaviours. Developers also tend to combat the limitations of ERC20 by supplementing their token implementation with non-standard and more complex features which often leads to severe bugs. In this thesis, we analyse the shortcomings and flaws of ERC20 and submit two new standard proposals, ERC777 and ERC820, aimed at not only solving the drawbacks of ERC20 but also offering new and exciting features. The goal is to provide a better alternative to ERC20 which improves the safety, security and ease of use of tokens for the end user, and which facilitates the work of blockchain developers by providing a modular and reusable reference implementation of a token. Developers can readily use this implementation to create their own tokens without relying on incorrect, unsafe, and non-standard features or implementations. Furthermore, we discuss the time spent interacting with and gathering feedback from the community, which is critical to having our proposals officially accepted and adopted by the Ethereum community. Subsequently, we provide an analysis of competing proposals which take a different approach with the same aim of addressing some of the issues of ERC20. 
Finally, we elaborate on the future steps of ERC777, such as further research on some of its new features, the formal verification of the reference implementation and even external tools which can assist with the design of more efficient implementations of both ERC777 tokens and other programs in general. diff --git a/acknowledgements.md b/acknowledgements.md index 3d2410e..ba41b10 100644 --- a/acknowledgements.md +++ b/acknowledgements.md @@ -4,8 +4,8 @@ Secondly, I would like to thanks Professor Cesare Pautasso, for his kindness and In the third place, I would also like to thank, Samantha Rosso who supported me on a personal level and more importantly promptly and efficiently designed the final version of the logo for ERC777 based on our doodles and according to our (fastidious) prerequisites. I am thankful she agreed to waive licensing rights after the fact to allow us to put the logo in the public domain. -Adjacently, I would like to thank anyone who helped and is currently helping ERC777 and ERC820 advance and improve through feedback on the GitHub issues or privately, through discussion online and in person, and by donating their time. This include but is not limited to many members of Giveth, Aragon, Truelevel, Web3 and attendees of EthCC in Paris in March 2018 as well as many people on the Github issues, including but not limited to Micah Zoltu, Nick Johnson, Alex Van de Sande, Jim McDonald, Dave Appleton, Chris Drake and Fabian Vogelsteller. +Adjacently, I would like to thank anyone who helped and is currently helping ERC777 and ERC820 advance and improve through feedback on the GitHub issues or privately, through discussion online and in person, and by donating their time. 
This includes but is not limited to many members of Giveth, Aragon, TrueLevel, Web3 Foundation and attendees of EthCC in Paris in March 2018 as well as many people on the GitHub issues, including but not limited to Micah Zoltu, Nick Johnson, Alex Van de Sande, Jim McDonald, Dave Appleton, Chris Drake and Fabian Vogelsteller. -Subsequently, I would like to thank Bity and in particular its CEO Alexis Roussel and CTO Alejandro Avilés (OmeGak) for allowing to work on my thesis during company time as well as to provide me with the necessary contacts to assist me during my thesis. +Subsequently, I would like to thank Bity and in particular the CEO Alexis Roussel and the CTO Alejandro Avilés (OmeGak) for allowing me to work on my thesis during company time as well as for providing me with the necessary contacts to assist me during my thesis. Finally, I would like to express a big thanks to Thomas Shababi who agreed to be my co-advisor and devoted a significant amount of his time to follow me and assist me over the course of an entire year both with technical and human aspects of my work. diff --git a/chapters/01-introduction.md b/chapters/01-introduction.md index c239d5c..97da186 100644 --- a/chapters/01-introduction.md +++ b/chapters/01-introduction.md @@ -2,23 +2,23 @@ ## Motivation -Ethereum is a new blockchain inspired by Bitcoin, with the design goal of abstracting away transaction complexity and allowing for easy programmatic interaction through the use of a Virtual Machine and relying upon the state of this Virtual Machine rather than dealing with transaction outputs; transactions merely modify the state. +Ethereum is a new blockchain inspired by Bitcoin, with the design goal of abstracting away transaction complexity and allowing for easy programmatic interaction through the use of a virtual machine and relying upon the state of this virtual machine rather than dealing with transaction outputs; transactions merely modify the state. 
-This idea of a global computer allows one to write a program, hereinafter a Smart Contract, which interacts with the EVM and inherits the safety properties of the Ethereum system (and also its limitations). Essentially it is a very low power/capacity computing platform with interesting safety properties (such as operations and state data being essentially immutable once a transaction is included in the blockchain [with sufficient confirmations as its probabilistic after all]. This is ideally suited to small minimalistic programs governing essential data, such as a ledger of transactions. +This idea of a global computer allows one to write a program, hereinafter a Smart Contract, which interacts with the EVM and inherits the safety properties of the Ethereum system---as well as its limitations. Fundamentally it is a very low power and low capacity computing platform with interesting safety properties---such as operations and state data being essentially immutable once a transaction is included in the blockchain (with sufficient confirmations as it's probabilistic after all). This is ideally suited to small minimalistic programs governing essential data, such as a ledger of transactions. -One such example of smart contracts is the ERC20 token standard (there are varying smart contract implementations). This is likely the most widely deployed type of smart contract on Ethereum. One issue is the design of ERC20. The way to transfer tokens to an externally owned address or a contract address differs and transferring tokens to a contract assuming it is a regular address can result in losing those tokens forever. This consequence limits the way smart contracts can interact with ERC20 tokens and adds complexity to the \gls{ux}. +One such example of smart contracts is the ERC20 token standard (there are varying smart contract implementations). This is likely the most widely deployed type of smart contract on Ethereum. One issue is the design of ERC20. 
The way to transfer tokens to an externally owned address or a contract address differs and transferring tokens to a contract, while assuming it is a regular address, can result in losing those tokens forever. In consequence, this limits the way smart contracts can interact with ERC20 tokens and adds complexity to the \gls{ux}. The new ERC777 token standard solves these problems and offers new powerful features which facilitate new exciting use cases for tokens. ## Objective Of The Thesis -Our objective is to identify and describe the current issues and shortcomings of Ethereum's ERC20 token standard in order to create the more advanced token standard, ERC777 which not only solves the drawbacks of ERC20 but provide new powerful features which facilitate new exciting use cases for tokens. The goals include better safety for token holders, improved usability, enhanced and more complex interactions between parties when creating exchanging and destroying tokens, and last but not least, a wide adoption by the Ethereum community. +Our objective is to identify and describe the current issues and shortcomings of Ethereum's ERC20 token standard in order to create a more advanced token standard, ERC777 which not only solves the drawbacks of ERC20 but provides new powerful features which facilitate new and exciting use cases for tokens. The goals include better safety for token holders, improved usability, enhanced and more complex interactions between parties when creating, exchanging, and destroying tokens, and last but not least, wide adoption by the Ethereum community. A part of this thesis' objective is to provide, as well, a reference implementation of the ERC777 advanced token standard which is not only used as an example but provides a modular structure such that token designers can build their own token on top of the reference implementation, thusly avoiding common programming mistakes. 
## Challenges -Writing a standard requires the ability to define specifications which have to be versatile and adaptable within strict confines. Therefore, on the one hand, the standard needs to be generic enough to be adopted and used by a large number of people. On the other hand, its definition needs to be precise and explicit enough to avoid any ambiguities, conflicting conditions and undefined scenarios which are a recipe for disaster. The language of the standard must also be clear but succinct and easily understandable by non-native and non-proficient speakers. +Writing a standard requires the ability to define specifications which have to be versatile and adaptable within strict confines. Therefore, on the one hand, the standard needs to be generic enough to be adopted and used by a large number of people. On the other hand, its definition needs to be precise and explicit enough to avoid any ambiguities, conflicting conditions or undefined scenarios which can be a recipe for disaster. The language of the standard must also be clear but succinct and easily understandable by non-native and non-proficient English speakers. Finally, the core goal of the standard is to be accepted and used by as many members of the Ethereum community as possible. We decided the best approach to tackle this challenge is to build the standard with the community as much as possible by asking for their thoughts and feedback and incorporate it in the standard as much as possible, as with the `tokensToSend` hook. @@ -30,18 +30,16 @@ Chapter \ref{ethereum-a-decentralised-computing-platform} begins by introducing Chapter \ref{tokens-and-standardisation}, provides a generic definition of a token with respect to the Ethereum ecosystem, and how they are traditionally implemented. We then argue about how standardisation can empower users to more easily and safely use tokens and how the use of a standard-compliant token can help its adoption. 
We continue by describing the process by which standards in Ethereum are catalogued and how anyone is able to submit a new standard proposal---such as the ones described later in this thesis. By the end of chapter \ref{tokens-and-standardisation} we provide a table and a genealogical tree comparing the Ethereum standards and standard proposals related to tokens. -Chapter \ref{erc20-token-standard} describe ERC20, the current token standard. It provides a description of the standard itself and the mechanisms used to transfer tokens. Next, we provided a details analysis of the strengths and weaknesses of the ERC20 token standard, together with the description of one critical flaw in the standard allowing an attacker to transfer more tokens than intended from a victim's account. +Chapter \ref{erc20-token-standard} describes ERC20, the current token standard. It provides a description of the standard itself and the mechanisms used to transfer tokens. Next, we provide a detailed analysis of the strengths and weaknesses of the ERC20 token standard, together with the description of one critical flaw in the standard allowing an attacker to transfer more tokens than intended from a victim's account. Chapter \ref{erc777-a-new-advanced-token-standard-for-ethereum-tokens} introduces the new and advanced token standard proposal which was developed as part of this thesis: ERC777. We begin by defining operators and hooks, two new concepts brought by ERC777 and followed by describing the sending, minting and burning mechanisms specified in the standard. Next, we explain the other relevant aspects of ERC777 such as the addition---with respect to ERC20---of the `data` and `operatorData` parameters when moving tokens, the view functions required by ERC777, and the approach taken to deal with decimals. Subsequently, we discuss the compatibility between ERC20 and ERC777. 
Afterwards, we elaborate on the interaction with the community and the public as well as the elaboration of the ERC777 logo which is also part of the standard. Finally, we comment on the reference implementation, a full implementation of an ERC777-compliant token and how the implementation is designed in a way to promote reusability and assist blockchain developers to create ERC777-compliant tokens with ease. -Chapter \ref{erc820-pseudo-introspection-registry-contract} covers the ERC820 pseudo introspection registry contract, a second standard proposal describing a registry we had to submit to enable some of the core features of ERC777, namely hooks and preventing accidental locking of tokens. We initially describe the two previous attempts to solve this registry problem and how both attempts revealed ill-suited for ERC777. +Chapter \ref{erc820-pseudo-introspection-registry-contract} covers the ERC820 pseudo introspection registry contract, a second standard proposal describing a registry we had to submit to enable some of the core features of ERC777, namely hooks and preventing accidental locking of tokens. We initially describe the two previous attempts to solve this registry problem and how both attempts proved ill-suited for ERC777. Next, we go over the decision and reasons to provide the registry as a separate standard proposal---independently of ERC777---rather than bundling everything in a single standard. We cover the functions of the registry and the compatibility with ERC165 including the caching of ERC165 interfaces within the registry. Finally, we describe the lesser-known and somewhat unusual keyless deployment method which allows the registry to have a single address across all chains. We also explain how we obtained a vanity address starting with `0x820`. 
-In chapter \ref{competing-token-standards}, we provide a comparison between the previously described ERC777 standard proposal and two of the most popular token-related proposal: the "ERC223 token standard"\citep{erc223} and the "ERC827 Token Standard (ERC20 Extension)"\citep{erc827}. We go over the different approaches---with respect to ERC777---taken by these standards to solve the issues of ERC20, and the issues with those approaches including the known vulnerabilities they contain. Next, we describe the ERC820 registry itself and go over the decision and reasons to provide the registry as a separate standard proposal---independently of ERC777---rather than bundling everything in a single standard. We cover the functions of the registry and the compatibility with ERC165 including the caching of ERC165 interfaces within the registry. In the end, we describe the lesser-known and somewhat unusual keyless deployment method which allows the registry to have a single address across all chains. Besides we explain as well how we achieved to have a vanity address starting with `0x820`. - -Chapter \ref{competing-token-standards} analyses ERC223 and ERC827, two popular alternatives to ERC777 and their approach to solve some of the token-related issues we encountered and described in chapter \ref{erc777-a-new-advanced-token-standard-for-ethereum-tokens}. This analysis also includes the current issues from which both proposals suffer, including a flaw which resulted in the fraudulent minting of eleven million tokens on a vulnerable token contract. +In chapter \ref{competing-token-standards}, we analyse---besides ERC777---two of the most popular token-related proposals: the "ERC223 token standard"\citep{erc223} and the "ERC827 Token Standard (ERC20 Extension)"\citep{erc827}. 
We go over the different approaches---with respect to ERC777---taken by these standards to solve the issues of ERC20, and the issues with those approaches including the known vulnerabilities they contain. This analysis also includes the current issues from which both proposals suffer, including a flaw which resulted in the fraudulent minting of eleven million tokens on a vulnerable token contract. Chapter \ref{the-state-of-tooling-in-the-ethereum-ecosystem} goes over the current state of tools in the Ethereum ecosystem, our experiences with those tools, the effect their quality and maturity had on the rest of the work outlined in this thesis and the contributions we brought to some of the tools during the development of this thesis. We finish by identifying the need for a gas profiler---which is recognised by members of the Ethereum community as one of the tools which are missing---and we elaborate the importance and uses for such a tool. -Chapter \ref{future-research-and-work} covers the work which remains to be done until the ERC777 is formally accepted and widely adopted by the community. This includes the formal verification of the reference implementation which is already ongoing by an independent third party, the need to research and develop generic hooks and operators for ERC777, the community work to promote the standard and any form of assistance we provide to blockchain developers working on ERC77 tokens, related tools and \glspl{dapp}. +Chapter \ref{future-research-and-work} covers the work which remains to be done until ERC777 is formally accepted and widely adopted by the community. This includes the formal verification of the reference implementation which is already ongoing by an independent third party, the need to research and develop generic hooks and operators for ERC777, the community work to promote the standard and any form of assistance we provide to blockchain developers working on ERC777 tokens, related tools, and \glspl{dapp}. 
Finally, in chapter \ref{conclusion} we conclude the work of this thesis by synthesising the work outlined in the thesis and how ERC777 can improve the Ethereum ecosystem and solve some of its current issues. diff --git a/chapters/02-ethereum.md b/chapters/02-ethereum.md index 9266a3d..629c122 100644 --- a/chapters/02-ethereum.md +++ b/chapters/02-ethereum.md @@ -1,14 +1,14 @@ # Ethereum, A Decentralised Computing Platform -The Ethereum network is a decentralised computing platform. As described in its the white paper, Ethereum "[...] is essentially the ultimate abstract foundational layer: a blockchain with a built-in Turing-complete programming language, allowing anyone to write smart contracts and decentralised applications where they can create their own arbitrary rules for ownership, transaction formats and state transition functions" \citep{buterin2013whitepaper}. This differentiates Ethereum from Bitcoin which is a trustless peer-to-peer version of electronic cash and lacks a Turing-complete language. +The Ethereum network is a decentralised computing platform. As described in its white paper, Ethereum "[...] is essentially the ultimate abstract foundational layer: a blockchain with a built-in Turing-complete programming language, allowing anyone to write smart contracts and decentralised applications where they can create their own arbitrary rules for ownership, transaction formats and state transition functions" \citep{buterin2013whitepaper}. This differentiates Ethereum from Bitcoin which is a trustless peer-to-peer version of electronic cash and lacks a Turing-complete language. ## The Ether Currency And Gas -The Ethereum still includes its own built-in currency named ether akin to Bitcoin. It "[...] serves the dual purpose of providing a primary liquidity layer to allow for efficient exchange between various types of digital assets and, more importantly, of providing a mechanism for paying transaction fees" \citep{buterin2013whitepaper}. 
The currency comes with different denominations defined. The smallest denomination is a wei---named after the computer scientist and inventor of b-money, Wei Dai. An ether is defined as 10^18^ wei. In other words, a wei represents 0.000000000000000001 ethers. The wei denomination is used for technical discussions and internal representation of the data. Most tools, libraries and smart contracts use wei, and the values are only converted to ether or some other denomination for the end-user. +The Ethereum platform still includes its own built-in currency named ether akin to Bitcoin. It "[...] serves the dual purpose of providing a primary liquidity layer to allow for efficient exchange between various types of digital assets and, more importantly, of providing a mechanism for paying transaction fees" \citep{buterin2013whitepaper}. The currency comes with different denominations defined. The smallest denomination is a wei---named after the computer scientist and inventor of b-money, Wei Dai. An ether is defined as 10^18^ wei. In other words, a wei represents 0.000000000000000001 ether. The wei denomination is used for technical discussions and internal representation of the data. Most tools, libraries and smart contracts use wei, and the values are only converted to ether or some other denomination for the end-user. ### Computing Fees -The fees are part of the incentive mechanism as in Bitcoin. The main difference is the way the fees are expressed and computed. In Bitcoin, the fees are fixed and set as the difference between the input value and the output value. Because transactions on the Ethereum network execute code of a Turing-complete language, the fee is defined differently "[...] to prevent accidental or hostile infinite loops or other computational wastage in code" \citep{buterin2013whitepaper}. A transaction defines two fields `STARTGAS` and `GASPRICE`. 
The `STARTGAS`---also referred to as just `gas` or `gasLimit`---is the maximum amount of gas the transaction may use. The `GASPRICE` is the fee the sender will pay per unit of gas consumed. Essentially, the fees are a limitation on the Turing-completeness. While the language is Turing-complete, the execution of the program is limited in its number of steps. In essence, fees are not only a part of the incentive mechanism but are also an anti-spam measure as every extra transaction is a burden on everyone in the network, and it would be effectively free to grief the network if there were no fees. +The fees are part of the incentive mechanism as in Bitcoin. The main difference is the way the fees are expressed and computed. In Bitcoin, the fees are fixed and set as the difference between the input value and the output value. Because transactions on the Ethereum network execute code of a Turing-complete language, the fee is defined differently "[...] to prevent accidental or hostile infinite loops or other computational wastage in code" \citep{buterin2013whitepaper}. A transaction defines two fields `STARTGAS` and `GASPRICE`. The `STARTGAS`---also referred to as just `gas` or `gasLimit`---is the maximum amount of gas the transaction may use. The `GASPRICE` is the fee in wei the sender will pay per unit of gas consumed. Essentially, the fees are a limitation on the Turing-completeness. While the language is Turing-complete, the execution of the program is limited in its number of steps. In essence, fees are not only a part of the incentive mechanism but are also an anti-spam measure as every extra transaction is a burden on everyone in the network, and it would be effectively free to grief the network if there were no fees. A computational step costs roughly one unit of gas. 
This is not exact as some steps "cost higher amounts of gas because they are more computationally expensive, or increase the amount of data that must be stored as part of the state" \citep{buterin2013whitepaper}. A cost of five units of gas per byte is also applied to all transactions. @@ -27,19 +27,19 @@ There are two types of accounts on the Ethereum network, externally owned accoun ## Transactions And Messages -Ethereum makes a distinction between a transaction and a message. A transaction is a signed data packet only emitted from a regular account. This packet contains the address of a recipient, a signature to identify the sender, the amount of ether sent from the sender to the recipient a data field---which is optional and thus may be empty---and both the gas price and the gas limit---whose meanings are explained in section \ref{computing-fees}. +Ethereum makes a distinction between a transaction and a message. A transaction is a signed data packet only emitted from a regular account. This packet contains the address of a recipient, a signature to identify the sender, the amount of ether sent from the sender to the recipient, a data field---which is optional and thus may be empty---and both the gas price and the gas limit---whose meanings are explained in section \ref{computing-fees}. -A message is defined as a "virtual objects that are never serialized and exist only in the Ethereum execution environment" \citep{buterin2013whitepaper}. A message contains the sender and recipient, the amount of ether transfer with the message from the sender to the recipient, an optional potentially empty data field, and a gas limit. +Messages are defined as "virtual objects that are never serialized and exist only in the Ethereum execution environment" \citep{buterin2013whitepaper}. A message contains the sender and recipient, the amount of ether transferred with the message from the sender to the recipient, an optional---potentially empty---data field, and a gas limit. 
Transactions and messages are very similar. The difference is that a transaction comes from a regular account only and a message comes from a contract. A transaction can call a function of a contract which in turn can create a message and call another function, either on itself or another contract, using the `CALL` and `DELEGATECALL` opcodes. The gas used for messages comes from the transaction which triggered the call. ## The Ethereum Virtual Machine -Ethereum is a decentralised computing platform. In other words alongside a blockchain, Ethereum provides a Turing-complete language and the \gls{evm}, a virtual machine able to interpret and execute code. This code "is written in a low-level, stack-based bytecode language, referred to as "Ethereum virtual machine code" or "EVM code"\citep{buterin2013whitepaper}. This bytecode is represented by a series of bytes. The execution of code consists of first setting an instruction pointer at the beginning of the bytecode sequence, next process the operation at the current location of the point, and last increment the instruction pointer to the next byte. Those steps repeated forever until either the end of the bytecode sequence is reached, an error is raised, or a `STOP` or `RETURN` instruction is executed. +Ethereum is a decentralised computing platform. In other words, alongside a blockchain, Ethereum provides a Turing-complete language and the \gls{evm}, a virtual machine able to interpret and execute code. This code "is written in a low-level, stack-based bytecode language, referred to as 'Ethereum virtual machine code' or 'EVM code'" \citep{buterin2013whitepaper}. This bytecode is represented by a series of bytes. The execution of code consists of first setting an instruction pointer at the beginning of the bytecode sequence, then processing the operation at the current location of the pointer, and lastly incrementing the instruction pointer to the next byte. 
Those steps repeat until either the end of the bytecode sequence is reached, an error is raised, or a `STOP` or `RETURN` instruction is executed. -The operations can perform computations and interact with data. There are three kinds of mediums to store data. First, there is a stack. This a commonly known abstract data type in computer science. Data can be added by using a push operation which adds the data on top of the stack. Mutually, the data can then be removed with a pop operation which removes and returns the data from the top of the stack. Mainly, the stack is known as a \gls{lifo} data structure meaning the last value pushed (added) is the first value popped (taken). Secondly, there is a memory, which is an ever-expandable array of bytes. Those kinds of storage are both non-persistent storage. Within the context of Ethereum, this translates to this data only being available within the call or transaction and not being permanently stored on the blockchain. The third and last kind of storage is commonly referred to as "storage" is a permanent key/value store intended for long-term storage. +The operations can perform computations and interact with data. There are three kinds of media to store data. First, there is a stack. This is a commonly known abstract data type in computer science. Data can be added by using a push operation which adds the data on top of the stack. Conversely, the data can then be removed with a pop operation which removes and returns the data from the top of the stack. The stack is known as a \gls{lifo} data structure, meaning the last value pushed (added) is the first value popped (taken). Secondly, there is a memory, which is an ever-expandable array of bytes. Both of these kinds of storage are non-persistent. Within the context of Ethereum, this translates to this data only being available within the call or transaction and not being permanently stored on the blockchain. 
The third and last kind of storage, commonly referred to as "storage", is a permanent key/value store intended for long-term storage. -In addition to those types of storage, the code may access the block header data, and the incoming transaction's sender address, value, and data fields. +In addition to these types of storage, the code may access the block header data, and the incoming transaction's sender address, value, and data fields. ## Solidity @@ -53,14 +53,14 @@ All of the contract code written for this thesis is written in Solidity and take ### State Variables -State variables are variables whose values are permanently stored with the contract, i.e. the state variables are located in the storage. The state variables are part of the state of the contract and transaction---which have to pay gas---can modify the state of the contract by executing code which modifies those state variables. +State variables are variables whose values are permanently stored with the contract, i.e. the state variables are located in the storage. The state variables are part of the state of the contract and transactions---which have to pay gas---can modify the state of the contract by executing code which modifies those state variables. ### Function Modifiers Function modifiers are specific functions associated with the regular functions of a contract. The modifiers are called before the actual function and thus have the ability to change the behaviour of the function. They are very popular to provide access-control to functions which use should be limited according to specific conditions. 
\begin{minipage}{\linewidth}\centering -\lstinputlisting[caption={[OpenZepplin's implementation of the \texttt{onlyOwner} modifier]OpenZepplin's implementation of the \texttt{onlyOwner} modifier which restrict the access to the owner of the contract.},label=lst:OZOnlyOwner,language=Solidity]{lst/onlyOwner.sol} +\lstinputlisting[caption={[OpenZepplin's implementation of the \texttt{onlyOwner} modifier]OpenZepplin's implementation of the \texttt{onlyOwner} modifier which restricts the access to the owner of the contract.},label=lst:OZOnlyOwner,language=Solidity]{lst/onlyOwner.sol} \end{minipage} The listing \ref{lst:OZOnlyOwner} shows the implementation of a modifier which uses `require` to revert the transaction if the condition is not met and the strange `_;` syntax which is replaced with the bytecode of the function the modifier is associated with during the call of the actual function. @@ -84,6 +84,10 @@ View functions in Solidity are defined as functions which do not modify the stat Note that the solidity compiler will automatically generate getter functions for public state variables. These are view functions with the same names as theses variables returning the value of the state variables. For example in the listing \ref{lst:owner}, the Solidity compiler will generate a getter named `owner()` for the public state variable `owner`. +\begin{minipage}{\linewidth}\centering +\lstinputlisting[caption={[Centralised administrator contract]Centralised administrator contract, example from the Ethereum Foundation website.},label=lst:owner,language=Solidity]{lst/admin.sol} +\end{minipage} + ### The `require` Instruction The Solidity instruction `require` reverts the transaction if its parameter is false and continues the execution if the parameter is true. Most commonly, a condition is evaluated and passed as a parameter to `require`. 
If the condition is false, `require` will call the `REVERT` \gls{evm} opcode which stops the execution of the transaction without consuming all of the gas and reverts the state changes. @@ -104,14 +108,14 @@ Note that the attributes---including `msg.sender` and `msg.value`---can change f ### Fallback Function -Every contract is allowed to have at most one unnamed function which is referred to as the "fallback function". This fallback function is called if the transaction contains no data---which contains the id of the function to call---or if the id provided in the data does not match any function of the contract. +Every contract is allowed to have at most one unnamed function which is referred to as the "fallback function". This fallback function is called if the transaction contains no data---which contains the identifier of the function to call---or if the identifier provided in the data does not match any function of the contract. The fallback function is also limited to only 2300 gas for its execution, which is known as the gas stipend. ## Visualising Transactions And Interactions -There is no standard notation---specific to Ethereum---to visualise the interaction between different entities or to illustrate a transaction between multiple parties. Despite, there exists more generic notations such as the \acrfull{uml} which is well known by virtually every software engineer and includes sequence diagrams to depict the interactions between various entities over time. +There is no standard notation---specific to Ethereum---to visualise the interaction between different entities or to illustrate a transaction between multiple parties. Despite this, there exist more generic notations such as the \acrfull{uml} which is well known by virtually every software engineer and includes sequence diagrams to depict the interactions between various entities over time. 
In this thesis, we will use a customised version of \gls{uml} sequence diagrams to illustrate transactions and calls between addresses---both regular accounts and contracts---on the Ethereum Network. This modified version of sequence diagrams includes colouring of the messages exchanged and activation boxes to indicate the type of communication taking place. Specifically, off-chain communications are painted green, Ethereum transactions for which the sender must pay gas and which are asynchronous in nature, are coloured in red and finally calls, either as part of a transaction or on their own are represented in blue. @@ -121,11 +125,7 @@ Finally, to help with clarity, some of the parameters of functions may be omitte As an example let us consider the example code for a centralised administrator from the Ethereum website \citep{ethowner}---shown in the listing \ref{lst:owner}. -\begin{minipage}{\linewidth}\centering -\lstinputlisting[caption={[Centralised administrator contract]Centralised administrator contract, example from the Ethereum Foundation website.},label=lst:owner,language=Solidity]{lst/admin.sol} -\end{minipage} - -The figure \ref{fig:uml} illustrates a modified \gls{uml} diagram between two regular accounts---Alice and Bob---and the contract Carlos implementing the centralised administrator---whose code is written in the listing \ref{lst:owner}. In the depicted scenario, Alice is the current owner of Carlos. She begins by making a transaction which calls `transferOwnership` on Carlos which first verifies if Alice is the current owner thanks to the `onlyOwner` modifier and then update the state of Carlos to set Bob as the new owner of the contract. +Figure \ref{fig:uml} illustrates a modified \gls{uml} diagram between two regular accounts---Alice and Bob---and the contract Carlos implementing the centralised administrator---whose code is written in listing \ref{lst:owner}. In the depicted scenario, Alice is the current owner of Carlos. 
She begins by making a transaction which calls `transferOwnership` on Carlos which first verifies if Alice is the current owner thanks to the `onlyOwner` modifier and then updates the state of Carlos to set Bob as the new owner of the contract. \input{fig/umlexample} diff --git a/chapters/03-tokens.md b/chapters/03-tokens.md index d6a579c..264b683 100644 --- a/chapters/03-tokens.md +++ b/chapters/03-tokens.md @@ -16,9 +16,9 @@ Dai Stablecoin\footnotemark, Gnosis\footnotemark or Augur (Reputation)\footnotem Tokens are the result of certain types of smart contracts which maintain a ledger on top of the Ethereum blockchain with the goal of acting like a "coin". Internally this smart contract holds a mapping from addresses to balances. The balances are expressed with unsigned integers. This design choice is similar to ethers which themselves internally are expressed as wei. It also comes from the fact that the Solidity language does not fully support floating point numbers. The smart contract then exposes functions to let users acquire tokens---known as minting---destroy tokens---known as burning---and most importantly to let token holders transfer their tokens. From a business perspective, a token is the possibility for a company to issue shares, securities or any form of accounting unit; even their own currency which the company has control over. Many companies offer services which can be purchased only using their tokens. Based on this economic principle comes the neologism: Initial Coin Offering or ICO. An ICO is a process where a company will sell a limited quantity of their tokens for a fixed price before their product is finalised. For a startup, this is a means to raise funds on their own without having to go through the vetting process traditionally required by venture capitalists and banks. An ICO is usually done through a smart contract which will trade tokens for ethers at specific times and for a certain price. 
This allows the startup to raise some capital and the investors to potentially gain a profit by buying tokens at a discount. There is, of course, the risk that the startup fails and the tokens become worthless. -## Standardization +## Standardisation -With many startups creating tokens to make initial coin offerings, building \glspl{dapp} and providing various services both on-chain and off-chain to use these tokens; the need for a standardised way to interact with said tokens arose rapidly. A standard for tokens allows wallets---holding a user's private key---to easily let the user interact with both their ether and an extensive collection of their tokens easily. It allows any smart contract---whether it is a wallet or a \gls{dapp}---to effortlessly receive, hold and send tokens. Smart contract are immutable which makes them notoriously hard to update. Typically, any update is done by replacing an existing smart contract with a new one at a different address with a copy of the data. Any off-chain infrastructure must then point to the address of this new contract. Updating a smart contract to handle a different way of interacting with a new and specific token would be an impossible task. Having a standard which defines an interface to interact with tokens allows \glspl{dapp} and wallet to instantly be compatible with any existing and future token which complies with the standard. +With many startups creating tokens to make initial coin offerings, building \glspl{dapp} and providing various services both on-chain and off-chain to use these tokens; the need for a standardised way to interact with said tokens arose rapidly. A standard for tokens allows wallets---holding a user's private key---to easily let the user interact with both their ether and an extensive collection of their tokens easily. It allows any smart contract---whether it is a wallet or a \gls{dapp}---to effortlessly receive, hold and send tokens. 
Smart contracts are immutable which makes them notoriously hard to update. Typically, any update is done by replacing an existing smart contract with a new one at a different address with a copy of the data. Any off-chain infrastructure must then point to the address of this new contract. Updating a smart contract to handle a different way of interacting with a new and specific token would be an impossible task. Having a standard which defines an interface to interact with tokens allows \glspl{dapp} and wallets to instantly be compatible with any existing and future token which complies with the standard. ## Ethereum Improvement Proposals And Ethereum Request For Comments @@ -28,11 +28,11 @@ Blockchain projects in general, including Ethereum, are ecosystems which tend to ## Ethereum Token Standards -Currently, there is only one approved token standard, ERC20 described in section \ref{erc20-token-standard}. There are however many standard proposals which build on ERC20, either by suggesting modifications to ERC20 or adding new features to it. There is a couple of standard proposals---including ERC777---which define entirely new token standards. The table \ref{tbl:standards} defines the various proposals. +Currently, there is only one approved token standard, ERC20 described in section \ref{erc20-token-standard}. There are however many standard proposals which build on ERC20, either by suggesting modifications to ERC20 or adding new features to it. There are a couple of standard proposals---including ERC777---which define entirely new token standards. The table \ref{tbl:standards} defines the various proposals. \input{fig/genealogical_tree} -Virtually every proposal finds its roots in the ERC20 standard. Most proposals are extensions of ERC20 and try to either resolve one of its shortcomings or limitations or to add a new feature. Furthermore many proposals are somewhat "stale". 
Specifically, they have been created some time ago and either do not have any recent comment or have not been updated by their authors in quite some time. The process to submit an \gls{eip} has changed multiple times. First, an issue or pull request had to be submitted to the \glspl{eip} repository, and only once the proposal was published as a document in the repository, the standard would be accepted. This changed recently where drafts could be merged and updated automatically by their authors' thanks to an automatic merging bot. Standard proposals today are merged as drafts and available on the \citepalias[website of the][]{eipssite}. This change in mechanisms is an easy way to detect stale proposals. The figure \ref{fig:genealogicaltree} shows the genealogical tree of the token-related standards. Green nodes are accepted standards, red nodes are rejected or withdrawn standards, and blue nodes are draft standards. Nodes with dashed borders are standard proposals which have not merged a document---therefore are not available on the \citepalias[website of the][]{eipssite}---and can be considered as stale or still at a very early stage. +Virtually every proposal finds its roots in the ERC20 standard. Most proposals are extensions of ERC20 and try to either resolve one of its shortcomings or limitations or to add a new feature. Furthermore many proposals are somewhat "stale". Specifically, they have been created some time ago and either do not have any recent comment or have not been updated by their authors in quite some time. The process to submit an \gls{eip} has changed multiple times. First, an issue or pull request had to be submitted to the \glspl{eip} repository, and only once the proposal was published as a document in the repository, the standard would be accepted. This changed recently where drafts could be merged and updated automatically by their authors' thanks to an automatic merging bot. 
Standard proposals today are merged as drafts and available on the \citepalias[website of the][]{eipssite}. This new process makes it easier to detect stale proposals. The figure \ref{fig:genealogicaltree} shows the genealogical tree of the token-related standards. Green nodes are accepted standards, red nodes are rejected or withdrawn standards, and blue nodes are draft standards. Nodes with dashed borders are standard proposals which have not merged a document---therefore are not available on the \citepalias[website of the][]{eipssite}---and can be considered as stale or still at a very early stage. \input{fig/standards_table} diff --git a/chapters/04_token_standards.md b/chapters/04_token_standards.md index 3db4e1d..9f7d67d 100644 --- a/chapters/04_token_standards.md +++ b/chapters/04_token_standards.md @@ -2,9 +2,9 @@ ## The First Token Standard -The ERC20 standard was created on November 19th 2015 as listed on the EIPs website under the ERC track \cite[see][ERC track]{eipssite}. A standard for tokens must define a specific interface and expected behaviours when interacted with by regular accounts and contracts. This allows wallets, \glspl{dapp} and services to interact with any token easily. It defines a simple interface which lets anyone transfer his or her tokens to other address, check a balance, the total supply of tokens and such. Specifically it defines nine functions a token must implement: `name`, `symbol`, `decimals`, `totalSupply`, `balanceOf`, `transfer`, `transferFrom`, `approve`, `allowance` as well as two events which must be fired in particular cases: `Transfer` and `Approval`. +The ERC20 standard was created on November 19th 2015 as listed on the EIPs website under the ERC track \cite[see][ERC track]{eipssite}. A standard for tokens must define a specific interface and expected behaviours when interacted with by regular accounts and contracts. This allows wallets, \glspl{dapp} and services to interact with any token easily. 
It defines a simple interface which lets anyone transfer his or her tokens to other addresses, check a balance, the total supply of tokens and such. Specifically it defines nine functions a token must implement: `name`, `symbol`, `decimals`, `totalSupply`, `balanceOf`, `transfer`, `transferFrom`, `approve`, `allowance` as well as two events which must be fired in particular cases: `Transfer` and `Approval`. -The `name` and `symbol` are optional functions which fairly basic and easy to understand. They return the name and the symbol or abbreviation of the token. Considering the Aragon token as an example, the `name` function returns the string `Aragon Network Token` and the `symbol` functions returns `ANT`. Another somewhat harder to understand optional function is `decimals`. This function returns the number of decimals used by the token and thus defines what transformation should be applied to any amount of tokens before being displayed to the user or communicated to the token contract. As previously explained, the balances and amounts of tokens handled by the token contracts are (256 bits) unsigned integers. Therefore the smallest fractional monetary unit is one. For some---or many---tokens, it makes more sense to allow smaller fractions. The `decimals` function returns the number of decimals to apply to any amount passed to or returned by the token contract. Most tokens follow Ether---which uses eighteen decimals---and use eighteen decimals as well. Another decimals value used is zero. A token with zero decimals can make sense when a token represent an entity which is not divisible---such as a physical entity. Altogether those functions are optional and purely cosmetic. The most important function being `decimals` as any misuse will show an incorrect representation of tokens and thus of value. +The `name` and `symbol` are optional functions which are fairly basic and easy to understand. They return the name and the symbol or abbreviation of the token. 
Considering the Aragon token as an example, the `name` function returns the string `Aragon Network Token` and the `symbol` function returns `ANT`. Another optional function, somewhat harder to understand, is `decimals`. This function returns the number of decimals used by the token and thus defines what transformation should be applied to any amount of tokens before being displayed to the user or communicated to the token contract. As previously explained, the balances and amounts of tokens handled by the token contracts are 256-bit unsigned integers. Therefore the smallest fractional monetary unit is one. For some---or many---tokens, it makes more sense to allow smaller fractions. The `decimals` function returns the number of decimals to apply to any amount passed to or returned by the token contract. Most tokens follow Ether---which uses eighteen decimals---and use eighteen decimals as well. Another decimals value used is zero. A token with zero decimals can make sense when a token represents an entity which is not divisible---such as a physical entity. Altogether these functions are optional and purely cosmetic. The most important of them is `decimals`, as any misuse will show an incorrect representation of tokens and thus of value. The `totalSupply` and `balanceOf` are also `view` functions. Simply put, they do not modify the state of the token contract, but only return data from it. This behaviour is similar to what one can expect from getter functions in object-oriented programming. @@ -14,7 +14,7 @@ The `balanceOf` function takes an address as a parameter and returns the number ## Transferring ERC20 Tokens -The `transfer` and `transferFrom` functions are used to move tokens across addresses. The `transfer` function takes two parameters, first the address of the recipient and secondly the number of tokens to transfer. 
When executed, the balance of the address which called the function is debited, and the balance of the address specified as the first parameter is credited the number of token specified as the second parameter. Of course, before updating any balance, some checks are performed to ensure the debtor has enough funds. +The `transfer` and `transferFrom` functions are used to move tokens across addresses. The `transfer` function takes two parameters, first the address of the recipient and secondly the number of tokens to transfer. When executed, the balance of the address which called the function is debited, and the balance of the address specified as the first parameter is credited the number of tokens specified as the second parameter. Of course, before updating any balance, some checks are performed to ensure the debtor has enough funds. \input{fig/transfer.tex} @@ -22,7 +22,7 @@ As seen on figure \ref{fig:erc20transfer} when performing a transfer, the spende \pagebreak -Examples of the implementation details to update the balances are shown in listings \ref{lst:OZTransfer} and \ref{lst:TronixTransfer}. +Examples of implementations to update the balances are shown in listings \ref{lst:OZTransfer} and \ref{lst:TronixTransfer}. \begin{minipage}{\linewidth}\centering \lstinputlisting[caption={OpenZepplin's implementation of ERC20's transfer function.},label=lst:OZTransfer,language=Solidity]{lst/oztransfer.sol} @@ -32,7 +32,7 @@ The implementation of the `transfer` function in the listing \ref{lst:OZTransfer The first check ensures that the token holder---here referred to as the sender---does not try to send a number of tokens higher than its balance. The variable `msg.sender` is a special value in Solidity which holds the address of the sender of the message for the current call. In other words, `msg.sender` is the address which called the `transfer` function. -The second checks ensure that the recipient---defined in the parameter `_to`---is not the \gls{0x}. 
The notation `address(0)` is a cast of the number literal zero to a 20 bits address. The \gls{0x} is a special address. Sending tokens to the \gls{0x} is assimilated to burning the tokens. Ideally, the balance of the \gls{0x} should not be updated in this case. This is not always the case, and the \gls{0x} holds tokens such as Tronix. A quick look at their implementation shown in listing \ref{lst:TronixTransfer} of the transfer function shows there is no check to ensure the recipient is not the \gls{0x}. Note that the `validAddress` modifier only verifies the `msg.sender` or in other words, the spender, not the recipient. +The second check ensures that the recipient---defined in the parameter `_to`---is not the \gls{0x}. The notation `address(0)` is a cast of the number literal zero to a 20-byte address. The \gls{0x} is a special address. Sending tokens to the \gls{0x} is akin to burning the tokens. Ideally, the balance of the \gls{0x} should not be updated in this case. This is not always the case, and the \gls{0x} holds tokens such as Tronix. A quick look at their implementation of the transfer function, shown in listing \ref{lst:TronixTransfer}, shows there is no check to ensure the recipient is not the \gls{0x}. Note that the `validAddress` modifier only verifies the `msg.sender` or in other words, the spender, not the recipient. The `transferFrom` function is the second function available to transfer tokens between addresses. Its use is depicted in figure \ref{fig:erc20transferFrom}. It takes three parameters: the debtor address, the creditor address and the number of tokens to transfer. @@ -44,7 +44,7 @@ The reason for the existence of this second function to transfer tokens is for c \input{fig/transferFrom.tex} -Consider an ERC20 token, a regular user Alice and a contract Carlos. Alice wishes to send five tokens to Carlos to purchase a service offered by Carlos. 
If she uses the `transfer` function, the contract will never be made aware of the five tokens it received and will not activate the service for Alice. Instead, Alice can call `approve` to allow Carlos to transfer five of Alice's tokens. Anyone can then call `allowance` to check that Alice did allow Carlos to transfer the five tokens from Alice's balance. Alice can then call a public function of Carlos or notify off-chain the maintainers of the Carlos contract such that they can call the function. This function of Carlos can call the `transferFrom` function of the token contract to receive the five tokens from Alice. +Consider an ERC20 token, a regular user Alice and a contract Carlos. Alice wishes to send five tokens to Carlos to purchase a service offered by Carlos. If she uses the `transfer` function, the contract will never be made aware of the five tokens it received and will not activate the service for Alice. Instead, Alice can call `approve` to allow Carlos to transfer five of Alice's tokens. Anyone can then call `allowance` to check that Alice did allow Carlos to transfer the five tokens from Alice's balance. Alice can then call a public function of Carlos or notify the maintainers of the Carlos contract off-chain such that they can call the function. This function of Carlos can call the `transferFrom` function of the token contract to receive the five tokens from Alice. \pagebreak @@ -54,25 +54,25 @@ The internals of the `transferFrom` function is similar to those of the `transfe \lstinputlisting[caption={OpenZepplin's implementation of ERC20's transferFrom function.},label=lst:OZTransferFrom,language=Solidity]{lst/oztransferfrom.sol} \end{minipage} -Of course, the allowed amount is updated as well for a successful transfer. The listing \ref{lst:OZTransferFrom} shows OpenZepplin's implementation of the function, which performs the allowance check on line 16 and the update of the allowance on line 21. 
The balances update is similar to the transfer function from listing \ref{lst:OZTransfer}, except that the parameter `_from` is used instead of `msg.sender` as the debtor. +Of course, the allowed amount is updated for a successful transfer. The listing \ref{lst:OZTransferFrom} shows OpenZepplin's implementation of the function, which performs the allowance check on line 16 and the update of the allowance on line 21. The balance update is similar to that of the transfer function from listing \ref{lst:OZTransfer}, except that the parameter `_from` is used instead of `msg.sender` as the debtor. ## Strengths And Weaknesses Of ERC20 Overall the ERC20 token standard was kept simple in its design. Hence the standard results in simple token contracts. This is one of the upsides of the standard. Token contracts can be kept short and simple which makes them easy and cheap to audit. This is especially important as an insecure contract may result in funds being stolen or lost from the contract and good smart contract auditors are expensive and often unavailable. -The attack described in chapter ref{erc827} and illustrated in figure \ref{fig:customcallattack} is a perfect evidence of the issues that arise when using a more complex token standard. In this specific instance, the complexity of the design contributed to a flaw not being detected in a token contract which leads to an attacker fraudulently issuing eleven million tokens. +The attack described in chapter \ref{erc827} and illustrated in figure \ref{fig:customcallattack} is perfect evidence of the issues that arise when using a more complex token standard. In this specific instance, the complexity of the design contributed to a flaw not being detected in a token contract, which led to an attacker fraudulently issuing eleven million tokens. At the other end of the spectrum, however, this translates to a higher burden on the user, applications and wallets interacting with the tokens. 
### Locked Tokens -One of the most significant issues is that the sender must make a distinction between a regular account and a contract recipient when transferring tokens. There are no issues if the recipient is a regular account, `transfer` just works. Alternatively, calling `approve` with the correct amount and let the recipient call `transferFrom` is also acceptable. The \gls{ux} in this latter case is somewhat suboptimal as it requires off-chain communication, two transactions, and the recipient has to pay the gas for the second transaction. Nonetheless, the intended goal is achieved, and the transfer from the spender to the recipient is executed. +One of the most significant issues is that the sender must make a distinction between a regular account and a contract recipient when transferring tokens. There are no issues if the recipient is a regular account, `transfer` just works. Alternatively, calling `approve` with the correct amount and letting the recipient call `transferFrom` is also acceptable. The \gls{ux} in this latter case is somewhat suboptimal as it requires off-chain communication, two transactions, and the recipient has to pay the gas for the second transaction. Nonetheless, the intended goal is achieved, and the transfer from the spender to the recipient is executed. The same cannot be said if the recipient is a contract account. When using `transfer` to send tokens to a contract, the spender initiates the transfer and only communicates with the token contract the recipient is never notified---as previously shown in figure \ref{fig:erc20transfer}. The result is that while the token balance of the receiving contract is increased, that contract may never be able to use and spend the tokens it received---this situation is commonly referred to as "locked tokens". A simple proof is the Tronix contract whose `transfer` function was discussed before. 
A rapid look at the token balance of the Tronix contract---deployed at itself shows a balance of 5'504'504.3514 TRX as of August 8^th^ 2018. With an exchange rate of \$0.0272, this represents a value of just a little under 150,000 US dollars. By analysing the code, one can see there are no functions which would allow the contract to spend those tokens. There are of course many more similar examples of such scenarios where people sent tokens either to the token contract, or some other contract by mistake and the amounts add up quickly. ### Approval Race Condition -By abusing the \gls{abi} of ERC20, an attacker can trick its victim into approving more tokens for the attacker to spend than intended. This attack was revealed on November 29^th^ 2018. Primarily, it takes advantage of two of ERC20's functions: `approve` and `transferFrom`. Because this is an issue with the logic in the standard, all ERC20-compliant implementations are affected. This attack works as follow, as described in the original paper \citep{erc20approveattack}: +By abusing the \gls{abi} of ERC20, an attacker can trick its victim into approving more tokens for the attacker to spend than intended. This attack was revealed on November 29^th^ 2016. Primarily, it takes advantage of two of ERC20's functions: `approve` and `transferFrom`. Because this is an issue with the logic in the standard, all ERC20-compliant implementations are affected. This attack works as follows, as described in the original paper \citep{erc20approveattack}: 1. Alice allows Bob to transfer $N$ of Alice's tokens ($N>0$) by calling the `approve` function on the token smart contract, passing Bob's address and $N$ as function arguments. 2. 
After some time, Alice decides to change from $N$ to $M$ ($M>0$) the number of Alice's tokens Bob is allowed to transfer, so she calls the `approve` function again, this time passing Bob's address and $M$ as function arguments @@ -86,11 +86,11 @@ So, Alice's attempt to change Bob's allowance from $N$ to $M$ ($N>0$ and $M>0$) The figure \ref{fig:erc20approveattack} shows both cases of the race condition where the attack either succeeds or fails to front-run its victim. Note that in this scenario, $M < N$ which is why when the front-run fails the `transferFrom` call of Eve for $N$ tokens fails. In the case where $M > N$, the first `transferFrom` call for $N$ would succeed, and the allowance would be decreased to $M - N$ and the second `transferFrom` call of Eve for $M$ would fail. In this situation, Eve does manage to transfer some tokens, but the attack has still failed as she manages to transfer only $N$ tokens---not $N + M$ tokens which are outside her "intended approval". -### Absence Of Burning +### Crude Minting And Absence Of Burning The ERC20 standard defines the behaviour for minting new tokens. Namely, "[a] token contract which creates new tokens SHOULD trigger a Transfer event with the `_from` address set to `0x0` when tokens are created" \citep{erc20}. Unfortunately, the standard does not go further, nothing is specified regarding the balance of `0x0` or the `totalSupply` for example. -Furthermore, the standard does not contain any specification about burning tokens. Sending to the `0x0` address is commonly assumed to represent burning---not to be confused with voluntary locking where the tokens are sent to some other address made of a repeating and non-random looking pattern such as `0x1111111111111111111111111111111111111111`. While "sending to `0x`" is a perfectly reasonable abstraction to represent a burn of tokens, it needs to be clearly defined. Should the balance of `0x0` be incremented? 
If yes, should the total supply remain the same or should it ignore the balance of `0x0` and be decreased? Should a `Transfer` event with the `to` address set to `0x0` be emitted? Can the tokens be burned using either a `transfer` call or `transferFrom` call with `to` set to `0x0`, or should a specific function be used to burn the tokens? +Furthermore, the standard does not contain any specification about burning tokens. Sending to the `0x0` address is commonly assumed to represent burning---not to be confused with voluntary locking where the tokens are sent to some other address made of a repeating and non-random looking pattern such as `0x1111111111111111111111111111111111111111`. While "sending to `0x0`" is a perfectly reasonable abstraction to represent a burn of tokens, it needs to be clearly defined. Should the balance of `0x0` be incremented? If yes, should the total supply remain the same or should it ignore the balance of `0x0` and be decreased? Should a `Transfer` event with the `to` address set to `0x0` be emitted? Can the tokens be burned using either a `transfer` call or `transferFrom` call with `to` set to `0x0`, or should a specific function be used to burn the tokens? Out of all the questions above, most tokens tend to emit `Transfer` events with the `to` address set to `0x0`. The remaining questions are solved differently for various tokens. Multiple mutually exclusive solutions may be acceptable. However, in some cases, some solutions may be preferable over others. As an example, most of the smart contracts are written in Solidity where an uninitialised variable of type `address` has a value of zero (`0x0`). On the off-chance that the value passed as the `to` parameter to a `transfer` call is uninitialised, then if the token contract allows burning via `transfer`, this will result in an unintentional burn of the tokens. 
In such a scenario, it may be preferable to revert the transaction and expose a specific function to (explicitly) burn tokens instead. diff --git a/chapters/05-erc777.md b/chapters/05-erc777.md index 3595c99..f910f7b 100644 --- a/chapters/05-erc777.md +++ b/chapters/05-erc777.md @@ -4,7 +4,7 @@ ERC777 is a new advanced token standard for Ethereum tokens. It is the result of The standard describes three central mechanisms: sending tokens, minting tokens and burning tokens. Those mechanisms are performed by a specific role---an operator---which is also defined in the standard. These mechanisms take advantage of hooks---specific functions which are called to notify and control the debit or credit of tokens. Lastly, ERC777 includes extra constraints for backwards compatibility with ERC20. -Creating a new standard requires careful consideration. Many aspects had to be considered such as security, usability, compatibility with the existing ecosystem and backward compatibility with existing ERC20 infrastructures. All things considered, ERC777 brings many enhancements including data associated with transactions, operators, hooks and backwards-compatibility with ERC20 which address the previously mentioned considerations. +Creating a new standard requires careful consideration. Many aspects had to be considered such as security, usability, compatibility with the existing ecosystem and backward compatibility with existing ERC20 infrastructures. All things considered, ERC777 brings many enhancements including data associated with transactions, operators, hooks and backwards-compatibility with ERC20 which address the previously mentioned considerations in section \ref{strengths-and-weaknesses-of-erc20}. ## Operators @@ -12,13 +12,13 @@ An operator is a specific role which must be defined first, in order to correctl > An operator is an address which is allowed to send and burn tokens on behalf of another address \citep{erc777}.
-On top of this core definition, constraints are defined and applied to all operators. First, every address is always an operator for itself. This right is not revocable. Second, any address--regular account or contract---is allowed to authorise and later revoke other addresses as their operators. Therefore some accounts may have their token funds managed by another party. Ideally, operators are intended to be contracts whose code may be audited. As a result, users can authorise a contract as their operator without the fear of the operator withdrawing all their tokens. Evidently, this implies users have previously verified the code of the operator, and they have convinced themselves that the operator code does not include vulnerabilities and is not able to withdraw all the funds. Examples of such operator contracts include payment or cheque processors, \glspl{dex}, subscription managers and automatic payment systems. +On top of this core definition, constraints are defined and applied to all operators. First, every address is always an operator for itself. This right is not revocable. Second, any address---regular account or contract---is allowed to authorise and later revoke other addresses as their operators. Therefore some accounts may have their token funds managed by another party. Ideally, operators are intended to be contracts whose code may be audited. As a result, users can authorise a contract as their operator without the fear of the operator withdrawing all their tokens. Evidently, this implies users have previously verified the code of the operator, and they have convinced themselves that the operator code does not include vulnerabilities and is not able to withdraw all the funds. Examples of such operator contracts include payment or cheque processors, \glspl{dex}, subscription managers and automatic payment systems. 
There are also exciting scenarios which leverage hooks to authorise regular accounts as operators whilst only letting them spend tokens according to specific rules. ### Default Operators -All addresses are automatically and irrevocably operators for themselves---and may explicitly authorise any other address(es) as operator(s). Additionally, any token contract may define a set of operators at creation/deployment time which are implicitly authorised for all token holders. This feature allows token designers to offer additional features specific to their token---with a modular design---to let their users move their funds more seamlessly/in a more integrated fashion. It is worth noting that a token contract which enables default operators would implicitly require that these operators are included in any review of the token contract. Taking inspiration from the examples of operators mentioned at the end of section \ref{operators}, if a token is used as a form of payment for subscription, the company behind the service may be interested in not only creating the token but an operator as well to directly and regularly levy the subscription fee. Since the use---and therefore the value---of the token is based on this subscription service, it is logical to authorise the subscription operator by default. Default operators can be revoked by the token holder, and a token contract must not be able to change the list of default operators after the contract is created. +All addresses are automatically and irrevocably operators for themselves---and may explicitly authorise any other address(es) as operator(s). Additionally, any token contract may define a set of operators at creation time which are implicitly authorised for all token holders. This feature allows token designers to offer additional features specific to their token---with a modular design---to let their users move their funds more seamlessly, in a more integrated fashion. 
It is worth noting that a token contract which enables default operators would implicitly require that these operators are included in any review or audit of the token contract. Taking inspiration from the examples of operators mentioned at the end of section \ref{operators}, if a token is used as a form of payment for subscription, the company behind the service may be interested in not only creating the token but creating an operator as well to directly and regularly levy the subscription fee. Since the use---and therefore the value---of the token is based on this subscription service, it is logical to authorise the subscription operator by default. Default operators can be revoked by the token holder, and a token contract must not be able to change the list of default operators after the contract is created. ### Authorising And Revoking Operators @@ -46,9 +46,9 @@ Sending tokens to a regular account will never result in locked tokens, providin One essential aspect is where those hooks are located. One approach is to have those hook functions located at the recipient, but this has two significant drawbacks. First, the recipient must then be a contract to implement the hooks---hence regular accounts could not use hooks. Secondly, existing contracts do not implement the hooks and could not receive ERC777 tokens. -The approach used in ERC777 is to use a registry to lookup the address of the contract implementing the hook for a given recipient. This approach has many advantages over the previously mentioned one. Primarily, all addresses, even regular accounts, can use the registry to register a contract implementing the hook on their behalf. Second, this means that existing contracts can also register hooks via a proxy contract which implements the hook on their behalf. Essentially this means that an account or an already deployed contract can just deploy a new contract to implement these hooks on their behalf. 
+The approach used in ERC777 is to use a registry to lookup the address of the contract implementing the hook for a given recipient. This approach has many advantages over the previously mentioned one. Primarily, all addresses, even regular accounts, can use the registry to register a contract implementing the hook on their behalf. Second, this means that existing contracts can also register hooks via a proxy contract which implements the hook on their behalf. Essentially this means that an account or an already deployed contract can just deploy a new contract to implement these hooks on their behalf. This is vital for \glspl{multisig} which can hold large amounts of ether and tokens and may not want to move all the funds to a new wallet. -ERC777 relies upon this registry which had to be created since there was no suitable registry existing as explained in chapter \ref{erc820-pseudo-introspection-registry-contract}. The registry was created to be used in ERC777, but it is not itself part of the ERC777 standard. Instead, the registry is specified in a separate standard, ERC820: A Pseudo-introspection Registry Contract \citep{erc820}, outlined in chapter \ref{erc820-pseudo-introspection-registry-contract}. ERC777 then relies upon ERC820. The advantage of dissociating the token standard from the registry is that first it can be used by other standards and secondly it offers a good separation of concerns. Any developer wishing to work with ERC777---whether it is to implement a token or any kind of \gls{dapp}---will need to thoroughly understand ERC777 in order to deploy code which is compliant. In comparison, the ERC820 registry should already be deployed, and the developer only needs to understand how to interact with it properly. +ERC777 relies upon this registry which had to be created since there was no suitable registry existing as explained in chapter \ref{erc820-pseudo-introspection-registry-contract}. 
The registry was created to be used by ERC777, but it is not itself part of the ERC777 standard. Instead, the registry is specified in a separate standard, ERC820: A Pseudo-introspection Registry Contract \citep{erc820}, outlined in chapter \ref{erc820-pseudo-introspection-registry-contract}. ERC777 then relies upon ERC820. The advantage of dissociating the token standard from the registry is that first it can be used by other standards and secondly it offers a good separation of concerns. Any developer wishing to work with ERC777---whether it is to implement a token or any kind of \gls{dapp}---will need to thoroughly understand ERC777 in order to deploy code which is compliant. In comparison, the ERC820 registry should already be deployed, and the developer only needs to understand how to interact with it properly. ## Sending Tokens @@ -83,14 +83,14 @@ So far the scenarios focused on the `tokensReceived` hook which is the only requ ## Minting Tokens -Minting is the technical term referring to the creation of new tokens---it originates from the minting of metal coins. The creation of tokens in Ethereum is particular to the asset represented by the token and involves various mechanisms accordingly. Some tokens have a fixed amount of tokens minted at creation time---often referred to as initial supply---which is given to the user(s) controlling the contract. Other tokens have an issuance model which mint tokens according to signed messages provided by a trusted third party. +Minting is the technical term referring to the creation of new tokens---it originates from the minting of metal coins. The creation of tokens in Ethereum is particular to the asset represented by the token and involves various mechanisms accordingly. Some tokens have a fixed amount of tokens minted at creation time---often referred to as pre-mining---where the tokens are given to the user(s) controlling the contract. 
Other tokens have an issuance model which mints tokens according to signed messages provided by a trusted third party. -The figures \ref{lst:alismint} and \ref{lst:statusmint1}, \ref{lst:statusmint2}, \ref{lst:statusmint3} illustrate two widely different minting process. The Alis token (figure \ref{lst:alismint}) uses a minting process inspired by OpenZepplin's Crowdsale logic \footnote{\href{https://github.com/OpenZeppelin/openzeppelin-solidity/blob/master/contracts/crowdsale/Crowdsale.sol\#L83}{github.com/OpenZeppelin/openzeppelin-solidity/contracts/crowdsale/Crowdsale.sol\#L83}}. This is a very simple logic where tokens are issued to the pro-rata of ether sent to the contract. After passing some checks (on lines 2 to 5), line 15 computes the amount of tokens to mint based on the amount of ether sent (in wei). On line 18 the contract updates the amount of ether received. Finally on line 20, the tokens are minted for the beneficiary. +The listings \ref{lst:alismint} and \ref{lst:statusmint1}, \ref{lst:statusmint2}, \ref{lst:statusmint3} illustrate two widely different minting processes. The Alis token (listing \ref{lst:alismint}) uses a minting process inspired by OpenZeppelin's Crowdsale logic \footnote{\href{https://github.com/OpenZeppelin/openzeppelin-solidity/blob/master/contracts/crowdsale/Crowdsale.sol\#L83}{github.com/OpenZeppelin/openzeppelin-solidity/contracts/crowdsale/Crowdsale.sol\#L83}}. This is a very simple logic where tokens are issued pro rata to the ether sent to the contract. After passing some checks (on lines 2 to 5), line 15 computes the amount of tokens to mint based on the amount of ether sent (in wei). On line 18 the contract updates the amount of ether received. Finally, on line 20, the tokens are minted for the beneficiary.
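The pro-rata computation described above can also be pictured outside Solidity. The following Python sketch is a simplified model under stated assumptions, not the Alis or OpenZeppelin code: the class name, the `rate` parameter (token units minted per wei) and the exact validation checks are hypothetical, chosen only to mirror the steps named in the paragraph (check the inputs, compute the amount, record the ether received, mint for the beneficiary).

```python
class ProRataCrowdsale:
    """Toy model of a crowdsale which mints tokens pro rata to the ether received."""

    def __init__(self, rate):
        # Hypothetical parameter: token units minted per wei received.
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.rate = rate
        self.wei_raised = 0      # total ether received so far, in wei
        self.total_supply = 0    # total token units minted so far
        self.balances = {}       # beneficiary address -> token units

    def buy_tokens(self, beneficiary, wei_amount):
        # Basic checks, standing in for the validation done before minting.
        if not beneficiary:
            raise ValueError("beneficiary must be set")
        if wei_amount <= 0:
            raise ValueError("wei_amount must be positive")

        # Amount of tokens to mint, proportional to the ether sent.
        tokens = wei_amount * self.rate

        # Record the ether received, then mint for the beneficiary.
        self.wei_raised += wei_amount
        self.total_supply += tokens
        self.balances[beneficiary] = self.balances.get(beneficiary, 0) + tokens
        return tokens
```

In this model, doubling the ether sent doubles the tokens minted, which is all that "pro rata" means here; a real crowdsale would typically add further conditions (opening times, caps, ownership checks) which are left out.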
\begin{minipage}{\linewidth}\centering \lstinputlisting[caption={Minting process for the Alis Token Crowdsale},label=lst:alismint,language=Solidity]{lst/alismint.sol} \end{minipage} -The status token uses a much more complex minting process, displayed in figures \ref{lst:statusmint1}, \ref{lst:statusmint2} and \ref{lst:statusmint3}. The user must call the `proxyPayment` function (figure \ref{lst:statusmint1}, line 1) which detects if the buyer has a guaranteed amount of tokens to purchase. If not the purchase continues with the `buyNormal` function (figure \ref{lst:statusmint1}, line 6 and figure \ref{lst:statusmint2}, line 1). Next, the purchase process has an anti-spam policy which gets updated on line 15 (figure \ref{lst:statusmint2}), and the maximum amount of ether a person can invest is computed based on a dynamic ceiling (figure \ref{lst:statusmint1}, line 18). Finally the purchase is processed (figure \ref{lst:statusmint2}, line 28 and figure \ref{lst:statusmint3}, line 2) and the new tokens are minted for the buyer on line 8 (figure \ref{lst:statusmint3}). +The status token uses a much more complex minting process, displayed in listings \ref{lst:statusmint1}, \ref{lst:statusmint2} and \ref{lst:statusmint3}. The user must call the `proxyPayment` function (listing \ref{lst:statusmint1}, line 1) which detects if the buyer has a guaranteed amount of tokens to purchase. If not the purchase continues with the `buyNormal` function (listing \ref{lst:statusmint1}, line 6 and listing \ref{lst:statusmint2}, line 1). Next, the purchase process has an anti-spam policy which gets updated on line 15 (listing \ref{lst:statusmint2}), and the maximum amount of ether a person can invest is computed based on a dynamic ceiling (listing \ref{lst:statusmint2}, line 18). Finally the purchase is processed (listing \ref{lst:statusmint2}, line 28 and listing \ref{lst:statusmint3}, line 2) and the new tokens are minted for the buyer on line 8 (listing \ref{lst:statusmint3}). 
\begin{minipage}{\linewidth}\centering \lstinputlisting[firstline=1,lastline=10,caption={Minting process for the Status token (\texttt{proxyPayment} function).},label=lst:statusmint1,language=Solidity]{lst/statusmint.sol} @@ -107,12 +107,12 @@ The status token uses a much more complex minting process, displayed in figures On the one hand, because of these varying issuance models, it is hard to provide a standardised process which creates tokens. Therefore, this is intentionally left out of ERC777. On the other hand, ERC777 does define a set of rules which must be respected when minting new tokens. These rules include: 1. The total supply must be updated to reflect the mint. -2. The tokens must be minted for an account whose balance must be increased +2. The tokens must be minted for an account whose balance must be increased. 3. A `Minted` event must be fired. 4. The `tokensReceived` hook must be called if present. 5. If the recipient is a contract which does not have a `tokensReceived` hook, the minting process must revert. -The rationale for enforcing minting originates from reading various ERC20 token contracts and see the differences in implementation for each token. From the recipients' point of view, minting and sending tokens is similar. Therefore it is critical to have a well-defined and predictable process when an account receive tokens whether they come from a send or minting. +The rationale for enforcing minting rules originates from reading various ERC20 token contracts and seeing the differences in implementation for each token. From the recipients' point of view, minting and sending tokens is similar. Therefore it is critical to have a well-defined and predictable process when an account receives tokens whether they come from a send or mint. The main difference with respect to send is, with minting, the `from` address is the \gls{0x} which indicates the tokens are newly created. The notion of an operator is also slightly different for minting. 
As mentioned in chapter \ref{operators}, an operator is an address which can spend the tokens of some account (either through sending or burning). This notion does not apply to minting as no one previously owns the minted tokens. ERC777 does not enforce any constraint on which address can mint tokens. It is up to each token to define conditions in order to restrain the minting process such that it matches the desired issuance model. For example, minting can be entirely restricted, allowed only for some addresses, or allowed only in certain quantities, at certain times or if some other condition is met such as providing a signed message. These various issuance models are the reason why there is no explicit function for minting as part of the standard. @@ -120,7 +120,7 @@ The main difference with respect to send is, with minting, the `from` address is Burning tokens, similarly to minting, can be specific to which asset a token represents. Some token contracts may wish never to allow burning of tokens, others may only allow some addresses to burn tokens, and some may allow anyone to burn tokens if specific conditions are met. Lastly, token contracts may want or need to take specific actions when tokens are burned, e.g., a token may represent a redeemable asset where the token is burned in order to redeem the asset. -Because burning involves a loss of tokens for users, similar to a send, it is essential to define a well-known and predictable behaviour as well. Furthermore, standard burning functions---similar to send---are defined to allow wallets and \glspl{dapp} to let their use burn their token easily. Moreover, if a token contract wishes not to burn any tokens, it can do so explicitly by reverting in the burn functions. +Because burning involves a loss of tokens for users, similar to a send, it is essential to define a well-known and predictable behaviour as well.
Furthermore, standard burning functions---similar to send---are defined to allow wallets and \glspl{dapp} to let their users burn their token easily. Moreover, if a token contract wishes not to burn any tokens, it can do so explicitly by reverting in the burn functions. Similarly to minting, burning applies rules identical to send, but in this instance on the token holder or spender. I.e. equivalently to a regular send, an operator must be authorised to burn the tokens, and the `tokensToSend` hook of the token holder must be called, the only difference compared to a send is that the recipient---the `to` parameter---of the hook is set to the \gls{0x} when burning. Note that when burning the actual balance of the \gls{0x} must not be increased. As a side note, this constraint coupled with the constraint that sending to the \gls{0x} is forbidden, implies that it is impossible for the \gls{0x} to ever hold any ERC777 token. @@ -144,11 +144,11 @@ Specifically the `name`, `symbol`, `totalSupply` and `balanceOf` functions are k The ERC20 `decimal` function is conspicuously absent from the view functions listed above. As previously explained, a variable `decimals` value is problematic. For this reason, the `decimals` has been set at a fixed value of $18$. This renders the `decimals` function pointless. The standard only enforces the implementation of the `decimals` function when implementing an ERC20 backwards-compatible token. In this case, the `decimals` function must both be implemented and return $18$. The choice has been made to make the `decimal` function mandatory in this case, even though ERC20 considers the function optional. The rationale behind this decision comes from the lack of an explicit value defined in the ERC20 standard when the `decimals` function is not defined. 
Furthermore, requiring people to check whether a token is both ERC20 and ERC777 compatible---and then deduct from the ERC777 standard that the number of decimals is $18$---is both unreliable and terrible \gls{ux}. Besides, this would add an opaque constraint when implementing both standards. -The `decimals` function nonetheless showcases the need to control the partition of a token. In ERC777, a different approach is taken---based on community feedback. As explained, the number of decimals is set to $18$, but the token contract can define a `granularity`. The granularity is the smallest part of the token that's not divisible. Besides, the granularity must be set at creation time and is immutable throughout the lifetime of the token contract. Every mint, burn, and send must be a multiple of the granularity. The recommended granularity is $1$---meaning the token is fully partitionable up to eighteen decimals---unless the token has a good reason not to be fully partitionable. There are such cases, where for example a token represents a gram of precious metal in some vault. If depositing and redeeming metal for tokens is precise to the gram, then it should not be possible to send fractions of a token and the granularity must be set to $10^18$. Other examples include tokens pegged on fiat currencies such as the US dollar or the Swiss franc. The smallest denominations are, for the dollar 1 cent or 0.01 dollar and for the Swiss franc 5 cents or 0.05 francs---despite the fractional monetary unit being 0.01 franc---therefore the granularity should be $10^16$ and $5\cdot10^16$ respectively. The example of the Swiss franc showcases as well the greater flexibility of specifying a granularity instead of a decimals which does not allow to set a value such as $0.05$ as the smallest denomination but only $0.01$ or $0.1$. +The `decimals` function nonetheless showcases the need to control the partition of a token. 
In ERC777, a different approach is taken---based on community feedback. As explained, the number of decimals is set to $18$, but the token contract can define a `granularity`. The granularity is the smallest part of the token that's not divisible. Besides, the granularity must be set at creation time and is immutable throughout the lifetime of the token contract. Every mint, burn, and send must be a multiple of the granularity. The recommended granularity is $1$---meaning the token is fully partitionable up to eighteen decimals---unless the token has a good reason not to be fully partitionable. There are such cases, where for example a token represents a gram of precious metal in some vault. If depositing and redeeming metal for tokens is precise to the gram, then it should not be possible to send fractions of a token and the granularity must be set to $10^{18}$. Other examples include tokens pegged on fiat currencies such as the US dollar or the Swiss franc. The smallest denominations are, for the dollar 1 cent or 0.01 dollar and for the Swiss franc 5 cents or 0.05 francs---despite the fractional monetary unit being 0.01 franc---therefore the granularity should be $10^{16}$ and $5\cdot10^{16}$ respectively. The example of the Swiss franc showcases as well the greater flexibility of specifying a granularity instead of a decimals which does not allow to set a value such as $0.05$ as the smallest denomination but only $0.01$ or $0.1$. ## Compatibility -One key aspect for the ERC777 standard is to maintain backward compatibility with the older ERC20 standard. Decentralised blockchains, in general, are notoriously hard and slow to update. A well-known example is \gls{segwit}, where the regular signalling failed and a new signalling mechanism known as \gls{uasf} was created to force nodes to update to \gls{segwit}. It then took months before activating the related-code for BIP148 and enable nodes to update to \gls{segwit} \citep{uasfco, bip144, bip148}. 
The same applies to the Ethereum ecosystem, where many wallets, \glspl{dex} and other \glspl{dapp} support ERC20 today but will not support ERC777 for years to come, if ever. Hence ERC777 tokens will not be supported on existing platforms immediately, creating a problem for people wishing to sell and trade their ERC777 tokens. Having a token able to behave at first like an ERC20 token on those platforms alongside the newer ERC777 behaviour is a major social and economic advantage. The ERC777 standard also allows some forward-compatibility. Namely, the format of `data` and `operatorData` has been left free for future standards to define the specific formats they need. The ERC820 registry (see section \ref{erc820-pseudo-introspection-registry-contract}) can also be used by a token contract to declare interfaces of future standards which it implements.
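The registry lookup on which ERC777 relies, and which a token contract could use to declare interfaces of future standards, can be pictured as a mapping from an (address, interface) pair to an implementer. The Python sketch below is an illustrative model only, not the ERC820 contract: the method names echo the registry's `setInterfaceImplementer` and `getInterfaceImplementer` functions, but the plain-string interface labels (the actual registry keys interfaces by a hash of their name), the `caller` argument standing in for the transaction sender, and the omission of the manager role are simplifying assumptions.

```python
ZERO_ADDRESS = "0x" + "00" * 20


class RegistryModel:
    """Toy model of a pseudo-introspection registry."""

    def __init__(self):
        # (address, interface label) -> address of the implementing contract
        self._implementers = {}

    def set_interface_implementer(self, caller, addr, interface, implementer):
        # Assumption: only the address itself may register on its own behalf
        # (the manager role of the real registry is left out of this model).
        if caller != addr:
            raise PermissionError("caller may not register for this address")
        self._implementers[(addr, interface)] = implementer

    def get_interface_implementer(self, addr, interface):
        # An unregistered pair resolves to the zero address.
        return self._implementers.get((addr, interface), ZERO_ADDRESS)
```

With such a mapping, a regular account or an already deployed contract does not need to contain the hook itself: it registers the address of a separate contract, and anyone looking up the pair finds that contract instead.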
@@ -163,13 +163,13 @@ Fundamentally, ERC777 allows for a token to be implemented as both an ERC20 toke This behaviour is achieved by enforcing that for any transfer of tokens (using either ERC20 or ERC777), both a `Sent` event and a `Transfer` event must be emitted. Correspondingly for minting and burning, alongside the ERC777 `Minted` and `Burned` events, an ERC20 `Transfer` event with `from` and the `to` field set to the \gls{0x} respectively. This is effectively a stricter constraint than ERC20 which only recommends---but does not require---a `Transfer` event with the `from` field set to the \gls{0x} and does not specify the concept of burning. The reason for this stricter constraint is to maintain consistency across the standards and to provide the same data regardless of which standard is used. -It should be noted that defining ERC20-related constraints in ERC777 does not conflict with ERC20. Adding the constraints to ERC20 directly is problematic as it would make existing tokens non-compliant, although it is not an issue of the constraint is expressed in ERC777 and they only apply to ERC20--ERC777 hybrid tokens and none of them exist to this date. ERC20 was intentionally defined more loosely to ensure that it would make some existing tokens retroactively compliant. With the new process for \glspl{eip}, we have the opportunity with ERC777 to clearly state that the standard is still in a draft phase and should not be used. This, of course, does not prevent people from trying to implement the standard, however breaking changes may still happen at this stage, and it is up to the token designer to make sure their implementation is compliant with the final version of ERC777 once it is finalised. Hence we do not have to worry about having to weaken the standard to support some existing and poorly-implemented token. 
Efforts will need to be put into ensuring the first developers correctly implement the standard, and we have already personally and privately contacted the chief technical officers or founders of some startup to inform them that their current implementation is not compliant with the latest version of ERC777. +It should be noted that defining ERC20-related constraints in ERC777 does not conflict with ERC20. Adding the constraints to ERC20 directly is problematic as it would make existing tokens non-compliant, although it is not an issue if the constraint is expressed in ERC777 and they only apply to ERC20--ERC777 hybrid tokens and none of them exist to this date. ERC20 was intentionally defined more loosely to ensure that it would make some existing tokens retroactively compliant. With the new process for \glspl{eip}, we have the opportunity with ERC777 to clearly state that the standard is still in a draft phase and should not be used. This, of course, does not prevent people from trying to implement the standard, however breaking changes may still happen at this stage, and it is up to the token designer to make sure their implementation is compliant with the final version of ERC777 once it is finalised. Hence we do not have to worry about having to weaken the standard to support some existing and poorly-implemented token. Efforts will need to be put into ensuring the first developers correctly implement the standard, and we have already personally and privately contacted the chief technical officers or founders of some startups to inform them that their current implementation is not compliant with the latest version of ERC777. ## Community And Public Reception An important factor towards the finalisation and the success of this standard is how well the community receives it. It was crucial to remain open and listen to the views, suggestions and feedback from the community. 
Most of the feedback has been provided publicly via comments on the ERC777 issue \citep{erc777issue}. Some feedback was also given privately via email, instant messages or in person--mostly when meeting other developers at conferences and events. -When reading any comments, instead of going away with a fixed mindset and standardising our own view, we adapted the standard to accommodate for the feedback of the community. Naturally, such effort requires some filtering as not every comment can result in a change of the standard. Some of the messages were inaccurate or wrong due to a misunderstanding of the standard or lack of knowledge regarding the Ethereum ecosystem. In such a situation, it was essential for us not to ignore those comments but to reply and try to explain or clarify the topics which misinterpreted. Doing so gave us the opportunity to understand where the inaccuracies came from and clarify the standard to provide an explicit and clear message for all future readers. Some of the readers or developers who will use the standard may not be native or even proficient English speakers, and it is paramount to make the text plain enough to be understood by all and accessible to anyone. +When reading any comments, instead of going away with a fixed mindset and standardising our own view, we adapted the standard to accommodate the feedback of the community. Naturally, such effort requires some filtering as not every comment can result in a change of the standard. Some of the messages were inaccurate or wrong due to a misunderstanding of the standard or lack of knowledge regarding the Ethereum ecosystem. In such situations, it was essential for us not to ignore those comments but to reply and try to explain or clarify the topics which were misinterpreted. Doing so gave us the opportunity to understand where the inaccuracies came from and clarify the standard to provide an explicit and clear message for all future readers. 
Some of the readers or developers who will use the standard may not be native or even proficient English speakers, and it is paramount to make the text plain enough to be understood by all and accessible to anyone. Some of the comments have provided valuable information which resulted in changes to the standard. An example includes how `decimals` and `granularity` is handled. Initially, the `decimals` function was part of ERC777 and similar to ERC20. Today the function has been removed from the standard, the number of decimals is fixed to $18$, and the concept of granularity and the `granularity` has been defined. @@ -244,7 +244,7 @@ The logo is designed to be simple such that it can easily scale both down to sma \fbox{\includegraphics[width=\textwidth]{ERC-777-logo-black-192px}} \caption{\centering black variant (\texttt{\#000000})} \end{subfigure} - \caption[ERC777 Logo in all color variants]{The logo in all its color variations, inspired by the colors form the Ethereum Visual Identity 1.0.0 guidelines \citep{ethvizguidelines}.} + \caption[ERC777 Logo in all colour variants]{The logo in all its colour variations, inspired by the colours from the Ethereum Visual Identity 1.0.0 guidelines \citep{ethvizguidelines}.} \label{fig:erc777logo} \end{figure} @@ -253,7 +253,7 @@ Ultimately the ERC777 logo is a blank slate which can be derived as a logo for a \begin{figure}[h] \centering \includegraphics[width=.6\textwidth]{erc777-logo-example} - \caption{Modified version of the log used in an article about ERC777 in Russian \citep{cryptofox}.} + \caption{Modified version of the logo used in an article about ERC777 in Russian \citep{cryptofox}.} \label{fig:erc777logouse} \end{figure} @@ -269,10 +269,12 @@ A blatant example of this behaviour is ERC20's `batchTransfer` security flaw \ci While the flaw in itself is idiotic and could have easily been avoided, the worst part is that not a single but over a dozen contracts have been found with this vulnerability as those contracts 
have been found to be essentially copy paste of the original one. Some people upon finding out about the flaw started to speak up about a vulnerability in the ERC20 standard without fully understanding that the vulnerability lied outside of the standard. This is an excellent real-life example which shows how many people lack the skills and understanding. -With the ERC777 reference implementation, we want to do more than provide some code which people copy paste and tweak. This has many issues, including improperly copy-pasting the code, considering an old (and potentially vulnerable version); copying from already copied and modified versions from other sources. We expect similar situations to happen with ERC777 as we have already witnessed absurd claims related to ERC777 including ERC777 will replace web cookies \citep{cookies} or operators is artificial intelligence on the blockchain: +With the ERC777 reference implementation, we want to do more than just provide some code which people copy paste and tweak. This has many issues, including improperly copy-pasting the code, considering an old (and potentially vulnerable version); copying from already copied and modified versions from other sources. We expect similar situations to happen with ERC777 as we have already witnessed absurd claims related to ERC777 including ERC777 will replace web cookies \citep{cookies} and surprising statements such as operators are similar to artificial intelligence on the blockchain: > Through the implementation of a program called an "operator" which works like a basic AI system that considers conditions and manages some decision-making for an account, a token holder will have the option of using a robot-like function. The operator can manage an account executing transactions and payments according to the needs of the user and takes care of all transactions for him/her while keeping maximum level of security. 
\flushright \citep{callwhitepaper} +Those statements illustrate the effort needed to not only clearly explain all aspects of the standard but provide easily understandable and reusable code as well. + Since Solidity supports inheritance, we decided to structure the reference implementation in separate contracts, including a base implementation of ERC777 which anyone can use. On top, we provide a second base implementation which adds support for ERC20 backwards-compatibility. This lets developers easily choose from an ERC20 backwards-compatible version or not. Finally, at the very top, we provide a reference implementation and inherit all of the base code needed to create the token. All we have to provide for the reference implementation is the custom behaviour it needs such as the minting process or overriding the default burn functions to limit the access to burn tokens. The figure \ref{fig:erc777uml} shows a \gls{uml} class diagram of the structure of the various contracts in the ERC777 reference implementation, as well as the connection with the ERC820 Registry and a sample `ExternalERC777Implementation` which uses the base ERC20-compatible token from the ERC777 reference implementation. Note that in this instance, the classes are smart contracts. Moreover, the functions with the wave underline represent `view` functions which do not modify the state of the contract. \input{fig/erc777_uml} diff --git a/chapters/06-erc820.md b/chapters/06-erc820.md index e31c255..d682db2 100644 --- a/chapters/06-erc820.md +++ b/chapters/06-erc820.md @@ -8,13 +8,13 @@ Besides, the token contract itself must register its address as implementing the ERC165 was created on January 23^rd^ 2018 and finalised on February 21^st^ the same year. It is a short and straightforward specification which allows interacting with a contract directly to detect if the contract implements a specific function. 
While this standard could be used for ERC777 to detect if a recipient contract implements the `tokensReceived` hook, it is very limited in that only contracts and not regular accounts can use the hook and it does not allow contracts to delegate the implementation of the hook to a proxy contract. -This standard has significant drawbacks which as it is, would automatically make ERC777 incompatible with all existing contracts, including \gls{multisig} which can hold large sums of ether and tokens and whose migration to a new contract is both is a sensible subject both from a security and a safety point of view if people are not careful. Hence it was decided a better alternative should be used. +This standard has significant drawbacks which as it is, would automatically make ERC777 incompatible with all existing contracts, including \glspl{multisig} which can hold large sums of ether and tokens and whose migration to a new contract is a sensible subject both from a security and a safety point of view if people are not careful. Hence it was decided a better alternative should be used. ## Second Attempt, The ERC672: ReverseENS Pseudo-Introspection, or standard interface detection ERC 672: ReverseENS Pseudo-Introspection, or standard interface detection \citep{erc672} was the second attempt at creating a better solution which could fulfil the primary motivation behind ERC777: Designing a system---such as a registry---that given a contract recipient, the token contract would be able to find the address of some contract---the recipient or other---which implements a function with the logic to notify the recipient contract such that the tokens are not locked. -This second attempt relied on \gls{ens} and implementing a reverse \gls{ens} lookup through a registry contract. , however, we came to realise this attempt may be overly complicated unsuitable for security reasons. 
Indeed, this solution relies on \gls{ens}, and interactions with \gls{ens} complicate the task of resolving the interface. Furthermore, \gls{ens} is still controlled by a multi-signature contract and theoretically with enough of the keys the system could be corrupted. +This second attempt relied on \gls{ens} and implementing a reverse \gls{ens} lookup through a registry contract. However, we came to realise this attempt may be overly complicated and unsuitable for security reasons. Indeed, this solution relies on \gls{ens}, and interactions with \gls{ens} complicate the task of resolving the interface. Furthermore, \gls{ens} is still controlled by a multi-signature contract and theoretically with enough of the keys the system could be corrupted. ## Final Attempt, The Need For The ERC820 Registry @@ -28,7 +28,7 @@ This solution offers to solve the issues of the attempts by ERC165 and ERC672. N The ERC777 standard relies on the ERC820 registry to work as intended. Without the registry, it is not possible to move tokens in a compliant way. A fair proposal would be only to submit a single standard containing the specification for tokens and the registry. However while developers will have to implement token contracts, no developer is expected to implement the registry, thus moving the registry in its own standard is a good separation of concerns. At most, they may use the provided raw transaction and broadcast it on the chain they use if the registry is not already deployed. -Furthermore, the ERC820 registry may be used independently of ERC777. Other standard or \glspl{dapp} may use it lookup implementers of specific interfaces they need. Splitting the standard in this way gives us the opportunity to make available some of the more generic work needed for ERC777 for other tasks which are not ERC777---or even tokens---specific. +Furthermore, the ERC820 registry may be used independently of ERC777. 
Other standards or \glspl{dapp} may use it to look up implementers of the specific interfaces they need. Splitting the standard in this way gives us the opportunity to make some of the more generic work needed for ERC777 available for other tasks which are not ERC777---or even token---specific. The ERC820 registry is developed within the scope of this thesis. While the implementation of the registry is part of the standard itself, the implementation of the registry is done separately and includes in addition to the registry contract, client contracts and test cases \citep{erc820impl}. @@ -38,7 +38,7 @@ Furthermore, the ERC820 registry is compatible with ERC165 and can act as a cach ### Caching ERC165 Interfaces -Caching concerning ERC165 is rather simple. Since the code of a contract is immutable, once a contract is deployed with a given interface, it cannot easily change its interface over time. For most contracts, the interface changes only when the contract is created and when the contract is destroyed. A few specific contracts, may enable and disable some of their functions dynamically---through a call from a specific address for example---and thus those contracts may wish to indicate that one of their interfaces is not enabled or not (i.e. that the contract implements some interface or not). In those cases, the cache needs to be manually updated, as there is no automatic cache invalidation or cache update process. This is a limitation as there is no easy or standard way to invalidate or update the cache automatically. In almost every, the interface of a contract is not dynamic and will not change over the lifespan of the contract. Ultimately, it is the responsibility of the contract changing its interface to notify the registry. Furthermore, it goes towards the explicit choice to keep the registry simple and keep the gas consumption low. The section \ref{updateerc165cache} describes the function needed to update the cache. 
+Caching concerning ERC165 is rather simple. Since the code of a contract is immutable, once a contract is deployed with a given interface, it cannot easily change its interface over time. For most contracts, the interface changes only when the contract is created and when the contract is destroyed. A few specific contracts may enable and disable some of their functions dynamically---through a call from a specific address for example---and thus those contracts may wish to indicate whether one of their interfaces is enabled (i.e. whether the contract implements some interface or not). In those cases, the cache needs to be manually updated: there is no easy or standard way to invalidate or update the cache automatically, which is a limitation. In almost every case, the interface of a contract is not dynamic and will not change over the lifespan of the contract. Ultimately, it is the responsibility of the contract changing its interface to notify the registry. This is in line with the explicit choice to keep the registry simple and the gas consumption low. Section \ref{updateerc165cache} describes the function needed to update the cache. ## Registry Interface @@ -70,11 +70,11 @@ If the interface is a full thirty-two bytes long, then the function will return The `setInterfaceImplementer` function is used to set the address of the contract implementing the given interface for the given address. For obvious security reasons, not every address is allowed to set an interface implementation for a given address. Only the manager of an address is allowed to set the implementation of an interface for the given address. By default, every address is its own manager, but each address can set another address as its manager using the `setManager` function described in section \ref{setmanager}. 
-The figure \ref{fig:erc820SelfRegister} illustrate the basic use case of Alice, a regular account, deploying a contract named Carols which sets itself as being its own implementation of the `ERC777TokensRecipient` interface---i.e. the interface for the `tokensReceived` hook required by ERC777 for contracts to receive tokens. Notice in this example, as well as all the following ones, the call to `getManager` which is used internally to check if the `setInterfaceImplementer` call originates from the actual manager (of Carlos in this instance). +The figure \ref{fig:erc820SelfRegister} illustrates the basic use case of Alice, a regular account, deploying a contract named Carlos which sets itself as being its own implementation of the `ERC777TokensRecipient` interface---i.e. the interface for the `tokensReceived` hook required by ERC777 for contracts to receive tokens. Notice in this example, as well as all the following ones, the call to `getManager` which is used internally to check if the `setInterfaceImplementer` call originates from the actual manager (of Carlos in this instance). \input{fig/erc820SelfRegister.tex} -Furthermore to avoid addresses settings random contracts as interface implementers for themselves, if the address for which to set the implementation and the address of the implementer differ, then the ERC820 registry requires for the implementer to implement the `ERC820ImplementerInterface` interface which consists of a single function: `canImplementInterfaceForAddress` detailed in section \ref{the-canimplementinterfaceforaddress-function-and-the-accept-magic-return-value} below. +Furthermore, to avoid addresses setting random contracts as implementers for themselves, contracts which implement an interface for other addresses must implement the `ERC820ImplementerInterface` interface as well. 
This interface includes a single function: `canImplementInterfaceForAddress`, detailed in section \ref{the-canimplementinterfaceforaddress-function-and-the-accept-magic-return-value} below, to indicate which interface they are willing to provide to which address. #### The `canImplementInterfaceForAddress` Function And The "Accept Magic" Return Value @@ -140,19 +140,19 @@ It is paramount for the ERC820 registry not to be controlled by anyone. If any a There is nice---and somewhat unknown---feature of Ethereum which we can take advantage of to achieve this goal: keyless deployment using a single-use Ethereum address for which no one has the key. This method is also referred to as "Nick's method" as an acknowledgement to Nick Johnson who suggested this method for ERC820. -In order to understand how this method works, one must first comprehend how a transaction is signed in Ethereum and how the address of the sender---which is not explicitly part of the transaction---is recovered. In Ethereum, the transactions are signed using \gls{ecdsa}. To send a verified transaction, one must generate a message and sign it using their private key. This signed message is the authorisation to spend a specific amount of ethers from the account. Precisely, this signed message is made up of the following components forming an Ethereum transaction: the `to` value (i.e. the recipient), the `value` (i.e. the amount of wei to spend), the `gas` (i.e. the gas limit or the maximum amount of gas the transaction is allowed to spend), the `gasPrice`, (i.e the price of each unit of gas in wei), a nonce and the `data` field. The signing number returns an Ethereum signature composed of three numbers, commonly referred to as `r`, `s`, `v`. The numbers $r$ and $s$ are defined by the \gls{ecdsa} algorithm and define the coordinate on the curve---extremely roughly $r$ is the x-value and $s$ is the y-value of the coordinate. 
+In order to understand how this method works, one must first comprehend how a transaction is signed in Ethereum and how the address of the sender---which is not explicitly part of the transaction---is recovered. In Ethereum, the transactions are signed using \gls{ecdsa}. To send a verified transaction, one must generate a message and sign it using their private key. This signed message is the authorisation to spend a specific amount of ethers from the account. Precisely, this signed message is made up of the following components forming an Ethereum transaction: the `to` value (i.e. the recipient), the `value` (i.e. the amount of wei to spend), the `gas` (i.e. the gas limit or the maximum amount of gas the transaction is allowed to spend), the `gasPrice`, (i.e the price of each unit of gas in wei), a nonce and the `data` field. The signing process returns an Ethereum signature composed of three numbers, commonly referred to as `r`, `s`, `v`. The numbers $r$ and $s$ are defined by the \gls{ecdsa} algorithm. The value $v$ is defined in the Ethereum Yellow Paper as $v\in[27, 28]$, more precisely: > It is assumed that $v$ is the 'recovery identifier'. The recovery identifier is a 1 byte value specifying the parity and finiteness of the coordinates of the curve point for which $r$ is the x-value; this value is in the range of $[27, 30]$. However, we declare the upper two possibilities, representing infinite values, invalid. The value $27$ represents an even $y$ value and $28$ represents an odd $y$ value \citep[][Appendix F]{yellowpaper}. -Ethereum defines a function knows as `ecrecover` which given the message hash and the three numbers `r`, `s` and `v` is able to recover the public key and thus the address of the spender which signed the transaction. Because only the corresponding private key could generate valid values for `r`, `s` and `v`, it results in the correct public key and therefore the correct address. 
+Ethereum defines a function known as `ecrecover` which given the message hash and the three numbers `r`, `s` and `v` is able to recover the public key and thus the address of the spender which signed the transaction. Because only the corresponding private key could generate valid values for `r`, `s` and `v`, it results in the correct public key and therefore the correct address. Single-use addresses come from the answer to a simple question: What if someone generates a valid transaction such as a signed message to send ether to a specific address and then use some random values for `r`, `s` and `v` which are hardcoded and not derived from some private key? Now the hash of the message and the `r`, `s` and `v` values can be passed to `ecrecover` to obtain the origin address for this transaction. Moreover, the transaction can be broadcasted on the Ethereum network, and if the origin address has the funds they will be transferred! Thus we have just achieved a transfer of ethers from an address for which we do not know the private key. Before being thrown into a widespread panic that funds are insecure and may be spent by anyone able to craft a transaction, it is imperative to note that this method does not provide any control to select the origin address for the transaction. The origin address is derived using \gls{ecdsa} which is cryptographically secure and generating a transaction this way---without knowing the private key---for a specific origin address would require to brute-force multiple values for `r` and `s` until values which derive to the desired address are found. (`v` is defined as $v\in[27, 28]$. Hence it is trivial to cover this key space.) This is equivalent to brute-forcing the private key and then using it to generate the correct `r` and `s` values, and brute-forcing the private key is today computationally infeasible. -Nonetheless, this process of generating transaction is useful for single-use addresses. 
Essentially it is computationally infeasible and probabilistically improbable that a second transaction for the same address can be generated. However, we manage to generate a single transaction for this address and if we send enough ether to this address (including ether to pay for the gas) before broadcasting our transaction, once the transaction is broadcast the ether from that address will be spent and credited to the address we set as recipient in the transaction. +Nonetheless, this process of generating transactions is useful for single-use addresses. Essentially it is computationally infeasible and probabilistically improbable that a second transaction for the same address can be generated. However, we manage to generate a single transaction for this address and if we send enough ether to this address (including ether to pay for the gas) before broadcasting our transaction, once the transaction is broadcast the ether from that address will be spent and credited to the address we set as recipient in the transaction. \input{fig/nicksmethod} @@ -162,14 +162,15 @@ Next this tree of transactions is shared off-chain with the owners of the \gls{m In the case of the ERC820 registry we do not need to send ether or tokens to multiple addresses of course but the same technique may be adapted to generate a single transaction to deploy the contract for which the private key controlling the address is not known---in other words, a keyless deployment using "Nick's method". The second advantage of this technique is that the address of a contract is deterministic. It is computed using the address from which the transaction originated and the nonce of the transaction. Specifically, the address is the `keccak256` hash of the owner's address and the nonce, encoded using \gls{rlp} with the first twelve bytes truncated. 
This means that the address of the contract is known in advance and the address will be the same across all chains, thus solving the issue of looking up the address of the registry. -To build this transaction we all we need is to set the correct values in our message. Since this is a contract deployment the `to` address must be the \gls{0x}, and the `value` should be zero as we do not want to send ether to the contract and the nonce should be zero as well since this is the first (and only) transaction from the given address. The `data` is the compiled bytecode needed to deploy the contract, and all that remains is the gas and gas price. The gas consumption can easily be computed using the `eth_estimateGas` call since we know the code which will be executed as part of the transaction. The gas price is a bit more tricky. If set too low the transaction may never be picked up by miners and sit in the memory pool until it is evicted. Setting the gas price too high and the deployment will be very costly. At this point since the gas price is part of the signed message, adjusting the gas price will modify the message and result in a new hash, thus changing the origin address of the transaction and by extension the address of the contract. The \gls{eip}1014 propose the creation of a `CREATE2` opcode expressly to handle this case. The `CREATE2` opcode can only consider the origin address, the actual initialisation code and some salt value \citep{eip1014}. Sadly it is not yet available at this time. +To build this transaction all we need is to set the correct values in our message. Since this is a contract deployment, the `to` address must be the \gls{0x}. The `value` should be zero, as we do not want to send ether to the contract, and the nonce should be zero as well, since this is the first (and only) transaction from the given address. The `data` is the compiled bytecode needed to deploy the contract, and all that remains is the gas and gas price. 
The gas consumption can easily be computed using the `eth_estimateGas` call since we know the code which will be executed as part of the transaction. The gas price is a bit more tricky. If set too low, the transaction may never be picked up by miners and may sit in the memory pool until it is evicted. If set too high, the deployment will be very costly. Since the gas price is part of the signed message, adjusting the gas price later will modify the message and result in a new hash, thus changing the origin address of the transaction and by extension the address of the contract. \gls{eip}1014 proposes the creation of a `CREATE2` opcode expressly to handle this case. The `CREATE2` opcode only considers the origin address, the actual initialisation code and some salt value \citep{eip1014}. Sadly it is not available at this time. -Lastly, all we have left is to set the `r`, `s` and `v` values. The value for `v` is trivially set to $27$. The value for `r` is set to `0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798` and most importantly, `s` must be set to the value -`0x0aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa`. This is a predictable value, to convince everyone that no one holds the private key for the address derived from the transaction. +Lastly, all we have left is to set the `r`, `s` and `v` values. The value for `v` is trivially set to $27$. The values for `r` and `s` are set to +`0x8208208208208208208208208208208208208208208208208208208208208200` and +`0x0820820820820820820820820820820820820820820820820820820820820820` respectively. Those are predictable values, to convince everyone that no one holds the private key for the address derived from the transaction. We are now all set, and with the above values we can generate the contract deployment transaction and derive its sender, then we must send enough ether (0.08 ether) to the address, and then broadcast the transaction. 
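The deterministic contract address mentioned above (the hash of the \gls{rlp}-encoded sender address and nonce, with the first twelve bytes dropped) can be illustrated with a minimal sketch. The function names below are hypothetical and only cover the short \gls{rlp} cases needed here. The sketch stops at the hash input: Keccak-256 is not part of the Python standard library (`hashlib.sha3_256` is the standardised SHA-3 and uses a different padding), so the final hashing step is assumed to be done with an external Keccak-256 implementation.

```python
def rlp_encode_bytes(data: bytes) -> bytes:
    """RLP-encode a byte string of at most 55 bytes (enough for this sketch)."""
    if len(data) == 1 and data[0] < 0x80:
        return data  # a single small byte is its own encoding
    if len(data) > 55:
        raise ValueError("this sketch only handles short byte strings")
    return bytes([0x80 + len(data)]) + data


def rlp_encode_list(items: list) -> bytes:
    """RLP-encode a list of byte strings whose total payload is at most 55 bytes."""
    payload = b"".join(rlp_encode_bytes(item) for item in items)
    if len(payload) > 55:
        raise ValueError("this sketch only handles short lists")
    return bytes([0xC0 + len(payload)]) + payload


def contract_address_preimage(sender_hex: str, nonce: int) -> bytes:
    """Return RLP([sender, nonce]), the input hashed to derive the contract address."""
    raw = sender_hex[2:] if sender_hex.startswith("0x") else sender_hex
    sender = bytes.fromhex(raw)
    if len(sender) != 20:
        raise ValueError("an Ethereum address is 20 bytes long")
    # Integers are encoded as big-endian bytes without leading zeros; 0 becomes b"".
    nonce_bytes = nonce.to_bytes((nonce.bit_length() + 7) // 8, "big")
    return rlp_encode_list([sender, nonce_bytes])


# Sender address quoted in the text, with nonce 0 (first and only transaction).
preimage = contract_address_preimage("0x2681AFA843b492f3d7851afCeca7385a3D13fCE0", 0)
# Layout: 0xd6 (list prefix), 0x94 (20-byte string prefix), the address, 0x80 (nonce 0).
print(preimage.hex())
# The contract address would be the last 20 bytes of keccak256(preimage).
```

Because the preimage depends only on the sender and the nonce, any change to the deployment transaction that changes the recovered sender also changes the resulting contract address.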
-The actual sender address is `0xC3AdeE9B2E23837DF6259A984Af7a437dE4E2ab6` and the deployment addresses for the registry contract is `0x820d0Bc4d0AD9E0E7dc19BD8cF9C566FC86054ce` respectively\footnote{ERC820 is not yet approved and changes to the contract will result in a different address, please read the standard \citep{erc820} once approved to know the correct address.} which starts with `0x820`. This is known as a vanity address, and it is not fully random. The process used to generate such an address is described in the next section (\ref{vanity-address}). +The actual sender address is `0x2681AFA843b492f3d7851afCeca7385a3D13fCE0` and the deployment address of the registry contract is `0x820c4597Fc3E4193282576750Ea4fcfe34DdF0a7`\footnote{ERC820 is not yet approved and changes to the contract will result in a different address; please read the standard \citep{erc820} once approved to know the correct address.}, which starts with `0x820`. This is known as a vanity address, and it is not fully random. The process used to generate such an address is described in the next section (\ref{vanity-address}). ### Vanity Address @@ -181,7 +182,7 @@ There is another issue with those vanity generators, of course, "Vanitygen" is f Indeed, we know that if we change any of the fields of the deployment transaction, then we change the hash of the signed message and if we change the message hash, we change the origin address returned by `ecrecover`. If we change the origin address, we also change the address of the deployed contract which is computed in a deterministic fashion from the origin address and the nonce. The only question which remains is which field can be safely changed in the transaction. 
The `to` must be the \gls{0x}, the `nonce` must be $0$, setting a `value` other than $0$ is literally the equivalent of burning ether and the gas price and gas limit are set specifically to make sure the transaction does not consume too little gas and that it will be expensive enough to be considered. The only remaining value which may be changed is the `data` which contains the initialisation code for the registry contract. -This initialisation code is automatically generated by a compile and should not be modified. Nevertheless, the initialisation code contains a copy of the bytecode of the registry and while we do not want to modify the actual code of the registry, there is one fact which can help us: bytecode compiled using `solc`, the Solidity compiler includes in the bytecode, the hash of the metadata for the compiled contract as return by the standard output of `solc` \citepalias[see][Encoding of the Metadata Hash in the Bytecode]{soldoc}. The reason for this choice is to be able to link the metadata to the specific instance of the contract. +This initialisation code is automatically generated by a compiler and should not be modified. Nevertheless, the initialisation code contains a copy of the bytecode of the registry and while we do not want to modify the actual code of the registry, there is one fact which can help us: bytecode compiled using `solc`---the Solidity compiler---includes in the bytecode the hash of the metadata for the compiled contract as returned by the standard output of `solc` \citepalias[see][Encoding of the Metadata Hash in the Bytecode]{soldoc}. The reason for this choice is to be able to link the metadata to the specific instance of the contract. Among other fields, this metadata can contain the original source code of the contract. So this is the crucial point; we can modify a random comment at the beginning of the source file. 
This will modify the content field of the metadata which will result in a different hash for the metadata which will result in a slightly different contract bytecode which will result in a slightly different deployment code and thus a slightly different data field of the transaction and finally in a different hash for the transaction or message. Hence we have managed to change the message hash thus changing the sender address and by extension the contract address. diff --git a/chapters/07-competing-token-standards.md b/chapters/07-competing-token-standards.md index 6fca491..a9688ec 100644 --- a/chapters/07-competing-token-standards.md +++ b/chapters/07-competing-token-standards.md @@ -2,11 +2,11 @@ The \glspl{eip} repository is open to everyone, and anyone is free to suggest any \gls{eip}. Many people correctly identified the drawbacks of ERC20 as explained in section \ref{strengths-and-weaknesses-of-erc20} and many amendments to ERC20 have been proposed. Those amendments are problematic as they change the established standard; migrating to a newer and improved token standard is a better solution---which is the goal behind ERC777. Moreover, ERC777 is not the only or even the first new token standard to be proposed to replace ERC20. It is also not the last: as ERC777 gained popularity, a few related standards and other token standards started to appear on the \glspl{eip} repository \citep{eipsrepo}. -In this chapter, we will explore three of the main tokens standard proposals competing with ERC777. The first one is ERC223 which predates ERC777 and looked very promising and gained some community support as it was for a time the only real alternative to ERC20. The second one came after ERC777 as indicated by its number: ERC8275. In the same way, ER223 tries to be an answer to the drawbacks of ERC20, ERC777 and ERC827 try to be an answer to the drawbacks of ERC20 and the issues from ERC223.
+In this chapter, we will explore two of the main token standard proposals competing with ERC777. The first one is ERC223 which predates ERC777 and looked very promising and gained some community support as it was for a time the only real alternative to ERC20. The second one came after ERC777 as indicated by its number: ERC827. In the same way that ERC223 tries to be an answer to the drawbacks of ERC20, ERC777 and ERC827 try to be answers to the drawbacks of ERC20 and the issues from ERC223. ## ERC223 -ERC223 was submitted on March 5^th^ 2017, by a developer knows as Dexaran \citep{erc223}. It has one clear goal in mind: to address the issue of accidentally locking token in ERC20 (see section \ref{locked-tokens}). +ERC223 was submitted on March 5^th^ 2017, by a developer known as Dexaran \citep{erc223}. It has one clear goal in mind: to address the issue of accidentally locking tokens in ERC20 (see section \ref{locked-tokens}). The solution suggested by this proposal is to define a `tokenFallback` function similar to the default fallback function \citepalias[see][Fallback Function]{soldoc}. This function takes as parameters the address of the spender (`from`), the amount of tokens transferred and a `data` field. Any contract wishing to receive tokens must implement this function. @@ -24,17 +24,16 @@ The proposal also has some inaccurate claims such as backward compatibility with > Now ERC23 is 100% backwards compatible with ERC20 and will work with every old contract designed to work with ERC20 tokens. \flushright (Dexaran, comment on ERC223) -Specifically, both standards define an identical `transfer` function as part of their interface. Therefore, some contract capable of calling the ERC20 `transfer` function will be capable of calling the identically named `transfer` function on an ERC223 token contract. Nevertheless, this does not imply compatibility between the two standards.
The behaviour of the `transfer` function changes widely from one standard to the next, and this change of behaviour may break things. Potentially a contract could handle transferring ERC20 tokens by first checking if the recipient is a contract or not and call `transfer` or `approve` accordingly. If this contract is given an ERC2223 token, it may try to call the `approve` function on the ERC223 token which does not implement the function, and the transaction will fail. +Specifically, both standards define an identically named `transfer` function as part of their interface. Therefore, some contract capable of calling the ERC20 `transfer` function will be capable of calling the identically named `transfer` function on an ERC223 token contract. Nevertheless, this does not imply compatibility between the two standards. The behaviour of the `transfer` function changes widely from one standard to the next, and this change of behaviour may break things. Potentially a contract could handle transferring ERC20 tokens by first checking if the recipient is a contract or not and calling `transfer` or `approve` accordingly. If this contract is given an ERC223 token, it may try to call the `approve` function on the ERC223 token which does not implement the function, and the transaction will fail. > ERC777 has been built to solve some of the shortcomings of ERC223. Please have a look at it: > > \url{http://eips.ethereum.org/EIPS/eip-777} \flushright (chencho777, comment on ERC223) -Finally, the developer behind the standard appears to be more focus on solving the issue of locked tokens despite the concerns mentioned above and raised by the community. Ultimately there was a feeling that an agreement would be hard to reach, community members became more and more doubtful regarding the viability of ERC223 and the standard started to become more and more stagnant, with the last comments suggesting to look at ERC777 instead.
+Finally, the developer behind the standard appears to be more focused on solving the issue of locked tokens despite the concerns mentioned above and raised by the community. Ultimately there was a feeling that an agreement would be hard to reach, community members became more and more doubtful regarding the viability of ERC223 and the standard started to become more and more stagnant, with the last comments suggesting to look at ERC777 instead. > \@MicahZoltu is 100% correct. This discussion did not lead to a consensus, so don't expect this standard to be followed. [...] \flushright (Griff Green, comment on ERC223) - ## ERC827 ERC827 is another proposal to fix ERC20\citep{erc827}. Unlike ERC777 which takes a more independent approach which is entirely dissociated from ERC20 and where both standards can be implemented side-by-side, the ERC827 proposal tries to build a second standard on top of ERC20. @@ -45,7 +44,7 @@ This approach is simple and does provide full backwards-compatibility with the E Secondly, passing both the name and the data (i.e. the parameters) of the function to call in the `transferAndCall`, `approveAndCall` and `transferFromAndCall` functions implies that there is no guaranteed way to communicate directly to that function the actual amount of tokens being transferred. Some token contract may for example automatically levy a transfer fee in tokens, or the token may represent some currency with demurrage and part of the amount is burned when transferring. In other words, to know the actual amount transferred, a recipient should keep track of its balance internally, call the `balanceOf` function and from there it can compute the amount received and update the internal balance. This is both tedious and expensive in gas to do. 
Moreover, there is always the risk of the state of the internal balance diverging from the balance in the contract, for example calling the ERC20 `transfer` function will increase the balance in the token contract but not in the recipient contract. -Finally, the contract suffers from a significant security flaw. Essentially the three functions added by ERC827 allow anyone to perform arbitrary call from the token contract which is a security risk \citep{consensysrecommendations} and in this context the same security flaw as the implementation of ERC223 with custom fallback---which is mostly the same mechanism of allowing spenders to execute custom calls via the token contract. +Finally, the contract suffers from a significant security flaw. Essentially the three functions added by ERC827 allow anyone to perform arbitrary calls from the token contract which is a security risk \citep{consensysrecommendations} and in this context the same security flaw as the implementation of ERC223 with custom fallback---which is mostly the same mechanism of allowing spenders to execute custom calls via the token contract. In greater details, the flaw was exploited live in the ATN token \citep{atnreport} \citep{secbit2018lacking}, an instance of the ERC223 implementation containing the flawed custom fallback. The attack comes from the unsafe assumption that a spender will pass a function to call on the recipient such that the recipient can react to the delivery of tokens. Albeit this may be the intended use, it cannot be enforced, and the spender is free to specify any function that the token contract will then call. For the ATN token contract\footnote{\href{https://etherscan.io/address/0x461733c17b0755ca5649b6db08b3e213fcf22546}{Deployed at 0x461733c17b0755ca5649b6db08b3e213fcf22546}}, the attacker decided to transfer zero tokens (a transfer of `0` token is considered valid) to the token contract itself. 
Therefore the token contract was also the recipient contract, and it will call any function on itself. This is an interesting scenario as access control in Ethereum is often enforced by looking at the address from which the call originated (`msg.sender` in Solidity). Often some functions are only executed if they are called by the owner of the contract (the address which deployed the contract in the first place) or the contract itself. The `ds-auth` library applies this principle exactly---as shown in listing \ref{lst:dsauth}---and it was taken advantage of by the attacker. diff --git a/chapters/08-tools.md b/chapters/08-tools.md index b23adb5..3491c82 100644 --- a/chapters/08-tools.md +++ b/chapters/08-tools.md @@ -1,6 +1,6 @@ # The State Of Tooling In The Ethereum Ecosystem -The Ethereum ecosystem is still very new as a result the specific tools and libraries required are also either in their infancy or lacking. The existing tools and libraries are often still in alpha, beta or zero prefixed versions---Solidity itself is only at version `0.4`. This means they are often unstable, and with changing interfaces. The Ethereum in some respect tries to build a newer and more decentralised web, this is why the main library is called \gls{web3}, and the most mature version of it is written in JavaScript. A lot of the tooling is written using \gls{node}. Reminiscent of the JavaScript ecosystem, the Ethereum ecosystem moves fast, even faster than JavaScript's, and the language syntax, tools and libraries are constantly changing. +The Ethereum ecosystem is still very new and as a result the specific tools and libraries required are also either in their infancy or lacking. The existing tools and libraries are often still in alpha, beta or zero prefixed versions---Solidity itself is only at version `0.4`. This means they are often unstable, and with changing interfaces. 
Ethereum in some respect tries to build a newer and more decentralised web, this is why the main library is called \gls{web3}, and the most mature version of it is written in JavaScript. A lot of the tooling is written using \gls{node}. Reminiscent of the JavaScript ecosystem, the Ethereum ecosystem moves fast, even faster than JavaScript's, and the language syntax, tools and libraries are constantly changing. ## Compilation @@ -10,6 +10,8 @@ A lot of these wrappers add features such as partial recompilation by only recom In comparison for the ERC820 registry, Giveth's `solcpiler` is used as it provides us with a greater control over the compilation process which is a critical aspect as it is paramount to have reproducible builds such that people can compile the source code on their own and obtain the same bytecode in order to convince themselves that the deployed bytecode matches the source file. +\pagebreak + > Tools such as drawbridge which provide deterministic builds are critical for wallets and similar applications to ensure verifiable security. \flushright (Daniel Ternyak, CEO of grant.io, former CTO of MyEtherWallet & MyCrypto) @@ -50,6 +52,8 @@ Implementing a profiler capable of performing a static analysis of the code to e Second, unlike a dynamic approach which is trivially capable of returning the gas consumption as a single number given the parameters, a static tool may not be able to do so. For example if the code contains an iteration over an array whose length is not known at compile time, then the gas consumption will be expressed as a formula like $X + n \cdot Y$ where $X$ is the gas used by the code outside the iterations, $n$ represents the number of iterations and $Y$ is the gas used by a single iteration. Note that the values of $X$ and $Y$ are computed by the tool, but the value of $n$ is not known in advance; an actual example (in gas) could be $29000 + n \cdot 3700$. +\pagebreak + > We need tools such as a static gas profiler.
It is a project I would be happy to support. \flushright (Daniel Ternyak, CEO of grant.io, former CTO of MyEtherWallet & MyCrypto) diff --git a/chapters/09-future-research.md b/chapters/09-future-research.md index 405cdd0..8789d38 100644 --- a/chapters/09-future-research.md +++ b/chapters/09-future-research.md @@ -8,7 +8,7 @@ The reference implementation described in section \ref{reference-implementation} ## Generic Operators And Hooks For ERC777 End-Users -ERC777 does more than solving some of the shortcomings of ERC200 and provides novel features such as operators, hooks and the data field. Those features bring new possibilities and novel approaches to tackle problems related to token. +ERC777 does more than solving some of the shortcomings of ERC20 and provides novel features such as operators, hooks and the data field. Those features bring new possibilities and novel approaches to tackle problems related to token. Generic operators and hooks are an exciting concept which aims to deploy in a trustless fashion---for example using the same keyless deployment method as the ERC820 registry---operator contracts and hooks which may be used by any address. These generic hooks and operators allow less technically-inclined users to use the advanced features of ERC777 without having an in-depth technical knowledge of Ethereum required for example to deploy a contract. @@ -16,11 +16,11 @@ Efforts must be spent researching how to adequately provide generic operators an ## Promotion Of The ERC777 Standard -The ERC777 standard is lucky to have broad community support and acceptance already. We already see many people looking it creating their own ERC777 tokens. A simple look at the number of download of the ERC777 reference implementation via \gls{npm} which is over 230 or the over 50 stars on its Github repository. We can see the interest is picking up, but there is still a long way to go. +The ERC777 standard is lucky to have broad community support and acceptance already. 
We already see many people looking into creating their own ERC777 tokens, as shown by the number of downloads of the ERC777 reference implementation via \gls{npm}, which is over 250, and the almost 60 stars on its GitHub repository. We can see the interest is picking up, but there is still a long way to go. \input{fig/ethcc} -Meeting the community and providing talks such as the one at EthCC in Paris (see figure \ref{fig:ethcc}), back in March 2018 \citep{ethcc} are also important. We hope to have the opportunity to talk about ERC777 at future Ethereum events including the Web3 summit in Berlin, the Ethereum Magicians' Council of Prague and Devcon4 also in Prague. +Meeting the community and providing talks such as the one at EthCC in Paris (see figure \ref{fig:ethcc}), back in March 2018 \citep{ethcc} are also important. We hope to have the opportunity to talk about ERC777 at future Ethereum events including the Web3 summit in Berlin. Moreover we try to contribute to other projects which help the adoption of the standard, such as writing a paragraph\footnote{\href{https://github.com/ethereumbook/ethereumbook/pull/611}{First Mastering Ethereum pull request about ERC777: github.com/ethereumbook/ethereumbook/pull/611}}\footnote{\href{https://github.com/ethereumbook/ethereumbook/pull/612}{Second Mastering Ethereum pull request about ERC777: github.com/ethereumbook/ethereumbook/pull/612}} for the upcoming book "Mastering Ethereum" \citep{antonopoulos2018mastering} which was well received as shown in figure \ref{fig:masteringethcomment}.
diff --git a/chapters/11-appendices.md b/chapters/11-appendices.md index d24b733..1c3776d 100644 --- a/chapters/11-appendices.md +++ b/chapters/11-appendices.md @@ -1,4 +1,6 @@ \appendix +\appendixpage +\addappheadtotoc # The `devdoc` And `userdoc` Raw \acrshort{json} Data Of The ERC820 Registry diff --git a/fig/erc777_uml.tex b/fig/erc777_uml.tex index 4f8dbb7..2eae271 100644 --- a/fig/erc777_uml.tex +++ b/fig/erc777_uml.tex @@ -109,6 +109,6 @@ \umlVHVinherit{ExternalERC777Implementation}{ERC777ERC20BaseToken} \end{tikzpicture} } -\caption[\acrshort{uml} class diagram of the ERC777 Reference Implementation]{\acrshort{uml} class diagram of the ERC777 Reference Implementation, with the ERC820 dependency and an extran implementation using the \texttt{ERC777ERC20BaseToken}.} +\caption[\acrshort{uml} class diagram of the ERC777 Reference Implementation]{\acrshort{uml} class diagram of the ERC777 Reference Implementation, with the ERC820 dependency and an extra implementation using the \texttt{ERC777ERC20BaseToken}.} \label{fig:erc777uml} \end{figure} diff --git a/fig/erc820DelegateRegister.tex b/fig/erc820DelegateRegister.tex index 8bf746c..3b67eb9 100644 --- a/fig/erc820DelegateRegister.tex +++ b/fig/erc820DelegateRegister.tex @@ -30,6 +30,6 @@ \node at (3.6,-11.5) {Call}; \end{tikzpicture} } -\caption{Example of a regular account, Alice, first deplying a contract Carlos which sets Alice as its manager. Secondly, Alice set Carlos both as its own implementation of \texttt{ERC777TokensRecipient} and as hers.} +\caption{Example of a regular account, Alice, first deploying a contract Carlos which sets Alice as its manager. 
Secondly, Alice sets Carlos both as its own implementation of \texttt{ERC777TokensRecipient} and as hers.} \label{fig:erc820DelegateRegister} \end{figure} diff --git a/fig/umlexample.tex b/fig/umlexample.tex index 93d70c0..8e5d483 100644 --- a/fig/umlexample.tex +++ b/fig/umlexample.tex @@ -26,6 +26,6 @@ \filldraw[draw=OliveGreen,fill=OliveGreen!20] (5,-8) circle (.2); \node at (6,-8) {Off-chain}; \end{tikzpicture} -\caption{Alice transfer the ownership of Carlos to Bob, then attempts to transfer to ownership again but fails as she is not the owner anymore.} +\caption{Alice transfers the ownership of Carlos to Bob, then attempts to transfer the ownership again but fails as she is not the owner anymore.} \label{fig:uml} \end{figure} diff --git a/img/genealogical_tree.dot b/img/genealogical_tree.dot index 8e049c1..dfb590a 100644 --- a/img/genealogical_tree.dot +++ b/img/genealogical_tree.dot @@ -1,8 +1,10 @@ digraph G { rankdir=TB; - node [fontname="helvetica", fontsize=40 ]; + bgcolor="transparent"; + edge [color="black", penwidth=4.0;]; + node [fontname="helvetica", fontsize=60, fontcolor="black", penwidth=4.0;]; subgraph timeline { - node [ shape=plaintext ]; + node [ shape=plaintext, fontcolor="black" ]; "Nov. 2015" -> "..." -> "Nov.
2016" -> @@ -41,7 +43,7 @@ digraph G { ERC724 [ label="ERC724", URL="https://github.com/ethereum/EIPs/issues/724", style="filled,bold,rounded,dashed" ]; ERC732 [ label="ERC732", URL="https://github.com/ethereum/EIPs/pull/732", style="filled,bold,rounded,dashed" ]; ERC777 [ label="ERC777", URL="https://eips.ethereum.org/EIPS/eip-777", shape=box]; - ERC827 [ label="ERC827", URL="https://github.com/ethereum/EIPs/issues/827", style="filled,bold,rounded,dashed", shape=box ]; + ERC827 [ label="ERC827", URL="https://github.com/ethereum/EIPs/issues/827", style="filled,bold,rounded,dashed" ]; ERC995 [ label="ERC995", URL="https://github.com/ethereum/EIPs/issues/995", style="filled,bold,rounded,dashed" ]; ERC1003 [ label="ERC1003", URL="https://github.com/ethereum/EIPs/issues/1003", style="filled,bold,rounded,dashed" ]; ERC1111 [ label="ERC1111", URL="https://github.com/ethereum/EIPs/issues/1111", style="filled,bold,rounded,dashed" ]; @@ -82,7 +84,7 @@ digraph G { } subgraph cluster_legend { - labelloc="t" + labelloc="t"; label="Legend"; rank=max; // margin="0.01" diff --git a/img/genealogical_tree.pdf b/img/genealogical_tree.pdf deleted file mode 100644 index c477b70..0000000 Binary files a/img/genealogical_tree.pdf and /dev/null differ diff --git a/img/genealogical_tree.png b/img/genealogical_tree.png new file mode 100644 index 0000000..ce571c8 Binary files /dev/null and b/img/genealogical_tree.png differ diff --git a/metadata.yml b/metadata.yml index 8f9c66e..9cbf829 100644 --- a/metadata.yml +++ b/metadata.yml @@ -13,8 +13,8 @@ coadvisor: name: Thomas Shababi date: year: 2018 - month: June - day: 20 + month: September + day: 10 place: Neuchâtel university: Università della Svizzera Italiana, Switzerland faculty: Faculty of Informatics diff --git a/template/template.tex b/template/template.tex index f88305e..083abb7 100644 --- a/template/template.tex +++ b/template/template.tex @@ -10,6 +10,7 @@ \usepackage{tabu} \usepackage{fnpct} \usepackage[normalem]{ulem} 
+\usepackage{appendix} % \newfontfamily\cyrillicfont[Script = Cyrillic]{LinLibertine} \newfontfamily\cyrillicfont[ BoldFont = LinLibertine_RB.otf, @@ -425,6 +426,8 @@ $if(natbib)$ $if(bibliography)$ +\cleardoublepage +\phantomsection $if(biblio-title)$ $if(book-class)$ \renewcommand\bibname{$biblio-title$} @@ -432,6 +435,7 @@ \renewcommand\refname{$biblio-title$} $endif$ $endif$ +\addcontentsline{toc}{chapter}{$biblio-title$} \bibliography{references} $endif$ $endif$