crypto.bi – ELI5 Cryptography, cryptocurrency and programming

What is Base58 encoding? Why create yet another encoding scheme?

Base58 is a binary-to-text encoding introduced by Satoshi Nakamoto. It first appeared in the earliest Bitcoin source code tree. Satoshi felt that a new encoding was necessary for Bitcoin’s addresses, since existing ones, like Base64, would cause confusion when addresses are written down or typed by hand. In essence, Base58 was created to improve human usability and not for technical reasons; it’s part of Bitcoin’s friendly user interface.

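Base58 removes characters that could seem ambiguous when written on paper: zero (0) and capital O, capital I and lowercase l. It also drops the two punctuation characters of Base64, + and /, so that an address contains only letters and digits. Essentially, the Base58 alphabet is the ubiquitous Base64 alphabet minus these six characters, which could confuse readers when an address is later typed back into a wallet.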

Character encoding works by giving each numerical value stored in computer memory a visual representation through some character. For example, we could map the value 100 to the capital letter A and give each subsequent letter the next value. Character encodings usually preserve the ordering of their readable values: if A were 100, then B would likely be 101 and so on, but this is not mandatory. Note that an encoding is not encryption. Anyone who knows the character map can decode it, and only a mapping that is deliberately kept secret would be considered cryptographic in nature.
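To make the idea concrete, here is a small sketch using ASCII, a real character encoding. Note that ASCII assigns 65 to the capital letter A, not the hypothetical 100 used in the paragraph above; the variable names are only for illustration.

```python
# ASCII maps each character to a number and keeps alphabetical order.
code_a = ord("A")   # 65
code_b = ord("B")   # 66
letter = chr(67)    # "C"

# Encoding a short string yields the numbers actually stored in memory.
codes = list("ABC".encode("ascii"))

print(code_a, code_b, letter)
print(codes)  # [65, 66, 67]
```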

Base58 does not work like a regular character encoding, but instead has its own rules and its own character map. Rather than mapping one byte to one character, it treats the whole input as a single large number and writes that number in base 58. For example, while our alphabet has A to Z and our digits run from 0 to 9, Base58 uses 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz for Bitcoin addresses. This is the full Base58 alphabet, and Base58-encoded Bitcoin addresses will never contain a character outside this sequence. Note the absence of the uppercase O but the presence of the lowercase o: since o is smaller, it does not get confused with 0 (zero). Likewise, the uppercase I and the lowercase l are both absent.
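As a quick check, the alphabet can be rebuilt by starting from all digits and letters and dropping the look-alike characters. This is only a sketch; the names AMBIGUOUS and BASE58_ALPHABET are made up for this example.

```python
import string

# Start from digits, uppercase and lowercase letters (62 characters),
# then drop the four look-alike characters: 0, O, I and l.
AMBIGUOUS = set("0OIl")
candidates = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE58_ALPHABET = "".join(c for c in candidates if c not in AMBIGUOUS)

print(BASE58_ALPHABET)
print(len(BASE58_ALPHABET))  # 58
```

The result matches the alphabet quoted above, character for character.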

Amazingly, any binary data can be encoded with Base58! Since the output of most cryptographic routines is binary data, which is usually unreadable by humans, an encoding scheme is needed to translate the zeroes and ones into something we can write down or communicate to others. The values we see being communicated, including Bitcoin addresses, transaction hashes and Merkle tree root hashes, are all really binary data that have been encoded with Base58 or a similar system. Addresses use Base58, while hashes are usually displayed in hexadecimal.
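Here is a minimal sketch of how arbitrary bytes could be turned into Base58 text and back. It follows the commonly described approach: read the bytes as one big number, divide repeatedly by 58, and write each leading zero byte as a leading 1. The function names b58encode and b58decode are our own choice for this example and are not taken from the Bitcoin source.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data: bytes) -> str:
    # Interpret the bytes as one big-endian integer.
    num = int.from_bytes(data, "big")
    digits = []
    while num > 0:
        num, rem = divmod(num, 58)
        digits.append(ALPHABET[rem])
    # Each leading zero byte is carried over as a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(digits))

def b58decode(text: str) -> bytes:
    num = 0
    for ch in text:
        # ALPHABET.index raises ValueError for characters outside Base58.
        num = num * 58 + ALPHABET.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    return b"\x00" * pad + body

print(b58encode(b"a"))           # 2g
print(b58decode("2g"))           # b'a'
print(b58encode(b"\x00\x00a"))   # 112g
```

The byte for the letter a has the value 97, which is 1 × 58 + 39, so it becomes the characters at positions 1 and 39 of the alphabet: 2 and g.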

Part of Bitcoin’s success can surely be attributed to Base58! Being able to write down and print Bitcoin addresses has been key to its usability. Bitcoin private keys can also be encoded using Base58, although different applications may encode the secret data differently. Since the private key is never meant to be shared or transmitted, its encoding matters less than that of addresses, which are shared in public. The most common convention for private keys is the Base58-based Wallet Import Format (WIF).

Bitcoin establishes different prefixes for different kinds of values. For example, the character 1 is the first character in the Base58 alphabet. As you probably know, computers start counting from zero, so the character 1 actually stands for the value zero in Base58. Bitcoin addresses have different prefixes according to their type, because a version byte is placed in front of the data before it is encoded. Specifically, prefix 1 means a regular Bitcoin address (public key hash), prefix 3 means a script hash (loosely analogous to an Ethereum smart contract address), prefix 4 was proposed for compact addresses, M or N are used by Namecoin, 5 designates a private key, m and n designate Testnet addresses and 2 designates a Testnet script address.
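The prefixes come from a version byte that Bitcoin puts in front of the data, together with a 4-byte checksum at the end, before encoding (a scheme known as Base58Check). The sketch below illustrates this with all-zero placeholder payloads rather than real keys or hashes; the helper names b58encode and base58check are our own, and the version bytes shown are the commonly documented ones.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data: bytes) -> str:
    num = int.from_bytes(data, "big")
    digits = []
    while num > 0:
        num, rem = divmod(num, 58)
        digits.append(ALPHABET[rem])
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(digits))

def base58check(version: int, payload: bytes) -> str:
    # Version byte first, then the payload, then a 4-byte checksum
    # taken from a double SHA-256 of the two.
    body = bytes([version]) + payload
    checksum = hashlib.sha256(hashlib.sha256(body).digest()).digest()[:4]
    return b58encode(body + checksum)

# Placeholder payloads (all zeroes), NOT real hashes or keys.
dummy_hash = bytes(20)   # stands in for a 20-byte public key or script hash
dummy_key = bytes(32)    # stands in for a 32-byte private key

examples = {
    "mainnet public key hash": base58check(0x00, dummy_hash),
    "mainnet script hash": base58check(0x05, dummy_hash),
    "testnet public key hash": base58check(0x6F, dummy_hash),
    "testnet script hash": base58check(0xC4, dummy_hash),
    "private key": base58check(0x80, dummy_key),
}

for kind, text in examples.items():
    print(f"{kind}: first character {text[0]}")
```

Because the version byte sits at the front of the number being encoded, it decides the first character no matter what the rest of the payload contains.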

We hope this brief intro to Base58 has given you a better idea of how Bitcoin addresses are encoded. Satoshi Nakamoto devised this encoding scheme to make addresses easy for humans to read and write, a trait which has been key to Bitcoin’s enormous success.

