uint256.cpp – Commented Bitcoin Core source code

on September 30, 2019

If you’ve developed systems in C or C++ before, then you’re probably familiar with the stdint.h (cstdint for C++) typedefs. Since there’s a lot of variation between platforms, stdint.h standardizes integer type names in an intuitive way. A uint8_t is an 8-bit unsigned integer and a uint64_t is a 64-bit unsigned integer.

What you won’t find in stdint.h, though, are 256-bit integer types! Not in the foreseeable future at least.

So Bitcoin Core defines its own 256 bit data type so 256 bit hashes can be stored in a single object. Of course most end-user computers only handle up to 64 bits at a time in their CPU registers, so to work with 256 bits we need a bit of built-in algebra on the data type itself.

So let’s have a look at how the Bitcoin Core source works with these very large integers.

uint256.h

The first thing we find in uint256.h is the declaration of a template base class for what the Bitcoin Core comment calls “fixed size blobs”:

template<unsigned int BITS>
class base_blob

If you’re not familiar with C++ then this bit of code may warrant some explanation. C++ templates are resolved at compile time. We’re used to seeing typename parameters in template declarations, but this time an unsigned int value called BITS is passed instead. What happens, then, when a template takes a value instead of a type?

In this case BITS is replaced by the specified integer value everywhere it shows up, at compile time! This means that declaring a 256 bit blob or a 128 bit blob does not incur any runtime overhead: a separate class is instantiated for each value.
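To make the idea concrete, here is a minimal sketch of a template that takes a value instead of a type. The name BlobSketch is made up for illustration and is not part of Bitcoin Core; it only mimics the shape of base_blob.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for base_blob: BITS is a value, not a type.
template <unsigned int BITS>
struct BlobSketch {
    std::uint8_t data[BITS / 8];  // array size is fixed at compile time
};

// Each distinct value of BITS yields a distinct, unrelated type.
static_assert(!std::is_same<BlobSketch<160>, BlobSketch<256>>::value,
              "160 and 256 bit blobs are different types");

// The sizes are known to the compiler, so nothing is computed at runtime.
static_assert(sizeof(BlobSketch<160>) == 20, "160 bits are 20 bytes");
static_assert(sizeof(BlobSketch<256>) == 32, "256 bits are 32 bytes");
```

Because the checks are static_assert's, a mistake here would stop the build rather than fail at runtime.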

Next we have the resource managed by base_blob:

    static constexpr int WIDTH = BITS / 8;
    uint8_t data[WIDTH];

The constexpr specifier tells the compiler that WIDTH is more than just a constant. It’s a constant that can be determined at compile time because BITS is substituted for a plain number by the templating system.

In the case where BITS = 256, the expression BITS / 8 will become 32 at compile time, so we’ll end up with static constexpr int WIDTH = 32; for a 256 bit class.

The next line, uint8_t data[WIDTH], will be translated as uint8_t data[32] for 256 bits.
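The substitution can be checked by the compiler itself. The sketch below uses a made-up WidthSketch struct (not Bitcoin Core code) that copies only the two members discussed above. It also shows something worth keeping in mind: BITS / 8 is an integer division, so a BITS value that is not a multiple of 8 would silently be rounded down.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct holding just the two members discussed above.
template <unsigned int BITS>
struct WidthSketch {
    static constexpr int WIDTH = BITS / 8;
    std::uint8_t data[WIDTH];
};

// WIDTH is usable wherever a compile-time constant is required.
static_assert(WidthSketch<256>::WIDTH == 32, "256 / 8 == 32");
static_assert(WidthSketch<160>::WIDTH == 20, "160 / 8 == 20");

// Integer division truncates: 12 bits would give a 1 byte array.
static_assert(WidthSketch<12>::WIDTH == 1, "12 / 8 == 1");
static_assert(sizeof(WidthSketch<256>::data) == 32, "data is 32 bytes");
```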

The default constructor simply zeroes out WIDTH bytes in the data array (the count comes from the sizeof operator, which knows this width because it was all resolved at compile time):

    base_blob()
    {
        memset(data, 0, sizeof(data));
    }

This guarantees that whatever blob is allocated, it’s all set to zero before it’s used.

Another constructor takes the blob in the form of a vector of bytes:

explicit base_blob(const std::vector<unsigned char>& vch);

We’ll see how this works when we look at the implementation details.

An inline IsNull utility function is then defined:

    bool IsNull() const
    {
        for (int i = 0; i < WIDTH; i++)
            if (data[i] != 0)
                return false;
        return true;
    }

IsNull simply walks the blob bytes looking for non-zero values. If one is found, it short-circuits and returns false. Otherwise, if all bytes are zero, it’s a null blob.

SetNull zeroes out the blob, copying zero to every byte in the blob:

    void SetNull()
    {
        memset(data, 0, sizeof(data));
    }
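Here is a small sketch of how the constructor, IsNull and SetNull play together. NullSketch is a hypothetical, non-templated copy of the relevant members fixed at 32 bytes, and NullRoundTrip is a helper invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 256 bit blob with only the members discussed so far.
struct NullSketch {
    static constexpr int WIDTH = 32;
    std::uint8_t data[WIDTH];

    NullSketch() { std::memset(data, 0, sizeof(data)); }

    bool IsNull() const {
        for (int i = 0; i < WIDTH; i++)
            if (data[i] != 0)
                return false;  // first non-zero byte ends the search
        return true;
    }

    void SetNull() { std::memset(data, 0, sizeof(data)); }
};

// Walks a blob through null -> non-null -> null.
inline bool NullRoundTrip() {
    NullSketch b;
    bool startsNull = b.IsNull();   // fresh blobs are all zeroes
    b.data[31] = 0x01;              // one non-zero byte anywhere is enough
    bool becameNonNull = !b.IsNull();
    b.SetNull();
    return startsNull && becameNonNull && b.IsNull();
}
```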

Compare does a byte-wise comparison of two blobs using the C function memcmp, which walks both buffers from byte 0 and stops at the first pair of bytes that differ. If the left byte is smaller it yields a negative value, if the right one is smaller it yields a positive value, and if all bytes are the same it yields zero.
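The body of Compare isn’t shown here, so the following is only a sketch of the memcmp behaviour it relies on, using a made-up CompareBytes helper on 4 byte arrays. Note that the order is decided by the first differing byte starting at index 0, which is not necessarily the numeric order of the same bytes read as a little-endian integer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: negative, zero or positive, exactly like memcmp.
inline int CompareBytes(const std::uint8_t (&a)[4], const std::uint8_t (&b)[4]) {
    return std::memcmp(a, b, sizeof(a));
}

// The first differing byte decides, later bytes are never looked at.
inline bool FirstDifferenceDecides() {
    const std::uint8_t a[4] = {0x01, 0xFF, 0xFF, 0xFF};
    const std::uint8_t b[4] = {0x02, 0x00, 0x00, 0x00};
    // a sorts before b because 0x01 < 0x02 at index 0, even though
    // a read as a little-endian number (0xFFFFFF01) is the larger one.
    return CompareBytes(a, b) < 0 &&
           CompareBytes(b, a) > 0 &&
           CompareBytes(a, a) == 0;
}
```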

The next three functions use Compare to derive the most common operators:

    friend inline bool operator==(const base_blob& a, const base_blob& b) { return a.Compare(b) == 0; }
    friend inline bool operator!=(const base_blob& a, const base_blob& b) { return a.Compare(b) != 0; }
    friend inline bool operator<(const base_blob& a, const base_blob& b) { return a.Compare(b) < 0; }

Four string <-> blob functions are now declared. We’ll take a look at these in the implementation.

    std::string GetHex() const;
    void SetHex(const char* psz);
    void SetHex(const std::string& str);
    std::string ToString() const;

Here we have C++ container-like functions that should look familiar to any C++ programmer:

    unsigned char* begin()
    {
        return &data[0];
    }

    unsigned char* end()
    {
        return &data[WIDTH];
    }

    const unsigned char* begin() const
    {
        return &data[0];
    }

    const unsigned char* end() const
    {
        return &data[WIDTH];
    }

    unsigned int size() const
    {
        return sizeof(data);
    }

begin() returns a pointer to the first byte of data. end() returns a pointer to one byte past the end of data (This is important! end() points beyond the bounds of the array, so it must never be dereferenced!). The const versions do the same for read-only blobs. Then we have size(), which simply returns the same value as WIDTH, the size of data in bytes.
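Having begin() and end() is what lets a blob plug into the standard algorithms and range-based for loops. The sketch below uses a made-up RangeSketch struct with the same accessors; it is an illustration, not the Bitcoin Core class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 32 byte blob exposing the same container-like accessors.
struct RangeSketch {
    static constexpr int WIDTH = 32;
    std::uint8_t data[WIDTH];

    RangeSketch() { std::memset(data, 0, sizeof(data)); }

    std::uint8_t* begin() { return data; }
    std::uint8_t* end() { return data + WIDTH; }  // one past the last byte
    const std::uint8_t* begin() const { return data; }
    const std::uint8_t* end() const { return data + WIDTH; }
    unsigned int size() const { return sizeof(data); }
};

inline bool RangeDemo() {
    RangeSketch b;
    // Standard algorithms only need the [begin, end) pair.
    std::fill(b.begin(), b.end(), static_cast<std::uint8_t>(0x01));

    // A range-based for loop calls begin() and end() behind the scenes.
    unsigned int sum = 0;
    for (std::uint8_t byte : b)
        sum += byte;

    // end() is only ever compared or subtracted, never dereferenced.
    return sum == 32 &&
           static_cast<unsigned int>(b.end() - b.begin()) == b.size();
}
```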

Now we have an interesting function called GetUint64(int pos), which returns a 64 bit unsigned integer slice of the 256 bit blob. Note that pos counts 64 bit words, not bytes: the slice starts pos * 8 bytes from the beginning of data.

    uint64_t GetUint64(int pos) const
    {
        const uint8_t* ptr = data + pos * 8;
        return ((uint64_t)ptr[0]) | \
               ((uint64_t)ptr[1]) << 8 | \
               ((uint64_t)ptr[2]) << 16 | \
               ((uint64_t)ptr[3]) << 24 | \
               ((uint64_t)ptr[4]) << 32 | \
               ((uint64_t)ptr[5]) << 40 | \
               ((uint64_t)ptr[6]) << 48 | \
               ((uint64_t)ptr[7]) << 56;
    }

First it sets ptr to an offset of pos * 8 bytes from the beginning of data. ptr now becomes our reference point, from which the 8 following bytes are read. Each byte is cast to a 64 bit integer, shifted left into its own position (byte 0 stays in bits 0 to 7, byte 1 goes to bits 8 to 15, and so on up to byte 7 in bits 56 to 63) and the results are OR'ed together. Since no two shifted bytes overlap, the OR simply assembles them into one value.

In other words, the 8 bytes are interpreted as a little-endian 64 bit integer: the byte at the lowest address becomes the least significant byte of the returned value. Because this is done with shifts rather than a raw memory copy, it works the same way on any CPU, whatever its native byte order.
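A worked example helps here. The sketch below re-implements the same shift-and-OR logic as a loop in a standalone function (ReadLE64, GetUint64Sketch and ExampleWord are names invented for this illustration) and fills a 32 byte buffer with the values 0x00 to 0x1F so the result is easy to read off.

```cpp
#include <cassert>
#include <cstdint>

// Assembles 8 bytes into a 64 bit value, lowest address = lowest byte.
inline std::uint64_t ReadLE64(const std::uint8_t* ptr) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= static_cast<std::uint64_t>(ptr[i]) << (8 * i);
    return v;
}

// Hypothetical free-function version of GetUint64: pos counts 64 bit words.
inline std::uint64_t GetUint64Sketch(const std::uint8_t (&data)[32], int pos) {
    return ReadLE64(data + pos * 8);
}

// Fills a buffer with 0x00, 0x01, ... 0x1F and reads word number pos.
inline std::uint64_t ExampleWord(int pos) {
    std::uint8_t data[32] = {};
    for (int i = 0; i < 32; i++)
        data[i] = static_cast<std::uint8_t>(i);
    return GetUint64Sketch(data, pos);
}
```

With that buffer, ExampleWord(0) is 0x0706050403020100 and ExampleWord(1) is 0x0F0E0D0C0B0A0908: each byte appears exactly once, with the first byte of the slice ending up on the right.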

Next we have 2 utility I/O functions to read and write the blob onto a Stream:

    template<typename Stream>
    void Serialize(Stream& s) const
    {
        s.write((char*)data, sizeof(data));
    }

    template<typename Stream>
    void Unserialize(Stream& s)
    {
        s.read((char*)data, sizeof(data));
    }

These two functions write and read the data in the same order it’s found in the buffer: byte 0 first, then byte 1 and so on. Any numerical transformations from string to numbers and vice versa must be performed by the implementing class itself.
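Since Stream is a template parameter, anything with suitable write and read members will do. The sketch below pairs a made-up in-memory VectorStream with a made-up IoSketch blob to show a round trip; Bitcoin Core’s real stream classes are more elaborate and are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical in-memory stream offering just write() and read().
class VectorStream {
    std::vector<char> buf_;
    std::size_t readPos_ = 0;

public:
    void write(const char* p, std::size_t n) { buf_.insert(buf_.end(), p, p + n); }

    void read(char* p, std::size_t n) {
        assert(readPos_ + n <= buf_.size());  // sketch only: no error handling
        std::memcpy(p, buf_.data() + readPos_, n);
        readPos_ += n;
    }

    std::size_t size() const { return buf_.size(); }
};

// Hypothetical blob with the two serialization members discussed above.
struct IoSketch {
    std::uint8_t data[32] = {};

    template <typename Stream>
    void Serialize(Stream& s) const {
        s.write(reinterpret_cast<const char*>(data), sizeof(data));
    }

    template <typename Stream>
    void Unserialize(Stream& s) {
        s.read(reinterpret_cast<char*>(data), sizeof(data));
    }
};

inline bool SerializeRoundTrip() {
    IoSketch in;
    in.data[0] = 0xAB;
    in.data[31] = 0xCD;

    VectorStream s;
    in.Serialize(s);   // 32 raw bytes, byte 0 first

    IoSketch out;
    out.Unserialize(s);
    return s.size() == 32 && std::memcmp(in.data, out.data, 32) == 0;
}
```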

Then we have a class derived from the 160 bit instantiation of our blob template:

class uint160 : public base_blob<160>

160 bit blobs will be heavily used in the RIPEMD-160 code sections.

And a class derived from the 256 bit instantiation:

class uint256 : public base_blob<256>

Finally, there are two overloads of the inline function uint256S, which convert a C-style const char* and a std::string to a uint256.

/* uint256 from const char *.
 * This is a separate function because the constructor uint256(const char*) can result
 * in dangerously catching uint256(0).
 */
inline uint256 uint256S(const char *str)
{
    uint256 rv;
    rv.SetHex(str);
    return rv;
}
/* uint256 from std::string.
 * This is a separate function because the constructor uint256(const std::string &str) can result
 * in dangerously catching uint256(0) via std::string(const char*).
 */
inline uint256 uint256S(const std::string& str)
{
    uint256 rv;
    rv.SetHex(str);
    return rv;
}
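The source comments about “dangerously catching uint256(0)” are terse. One plausible reading, and it is only my reading, is that in C++ the literal 0 is also a null pointer constant, so a constructor taking const char* would happily accept 0 and receive a null pointer instead of “the number zero”. The sketch below shows that conversion with a made-up FromCString type.

```cpp
#include <cassert>

// Hypothetical type with the kind of constructor the comment warns about.
struct FromCString {
    bool gotNull;
    FromCString(const char* psz) : gotNull(psz == nullptr) {}
};

// The literal 0 silently converts to a null const char*.
inline bool ZeroLiteralBecomesNullPointer() {
    FromCString f(0);  // compiles, but psz is a null pointer here
    return f.gotNull;
}
```

A parser like SetHex would then read through that null pointer, which is why a separately named function is the safer choice.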

uint256.cpp

uint256 is a curious case where the header file is actually more complex than the source code! We’ve covered a lot of ground in our header discussion, now let’s see what’s left in the Bitcoin Core implementation of 256 bit unsigned integers.

First we have a base_blob<BITS> constructor:

template <unsigned int BITS>
base_blob<BITS>::base_blob(const std::vector<unsigned char>& vch)
{
    assert(vch.size() == sizeof(data));
    memcpy(data, vch.data(), sizeof(data));
}

First it checks that the vector has the same size as the data array. Then it copies the vector contents into the data member variable.

GetHex() returns the hexadecimal std::string representation of data:

template <unsigned int BITS>
std::string base_blob<BITS>::GetHex() const
{
    return HexStr(std::reverse_iterator<const uint8_t*>(data + sizeof(data)), std::reverse_iterator<const uint8_t*>(data));
}

GetHex() works by pointing the initial iterator to one item past the end of data and the end iterator to the beginning of data. HexStr() then iterates backwards, byte by byte while generating the return std::string.
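To see the reverse iteration at work, here is a sketch on a 4 byte array. HexStr itself lives elsewhere in Bitcoin Core, so ToHexSketch below is a made-up stand-in that only assumes HexStr turns each byte into two lowercase hex characters.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>

// Hypothetical stand-in for HexStr: two hex characters per byte.
template <typename It>
std::string ToHexSketch(It first, It last) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (; first != last; ++first) {
        std::uint8_t byte = *first;
        out.push_back(digits[byte >> 4]);    // high nibble first
        out.push_back(digits[byte & 0x0F]);  // then low nibble
    }
    return out;
}

// Same iterator trick as GetHex, on a small array.
inline std::string GetHexSketch(const std::uint8_t (&data)[4]) {
    using RevIt = std::reverse_iterator<const std::uint8_t*>;
    return ToHexSketch(RevIt(data + sizeof(data)), RevIt(data));
}

inline std::string GetHexExample() {
    const std::uint8_t data[4] = {0x78, 0x56, 0x34, 0x12};
    return GetHexSketch(data);
}
```

GetHexExample() returns "12345678": the byte stored last in memory is printed first, which is how we humans expect to read a number.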

SetHex(const char *psz) takes a C-style string as input and interprets it as a hexadecimal number, setting data accordingly.

I’ll comment within the code snippet for practicality. As always, my comments begin with > while original source comments begin with // or /*

template <unsigned int BITS>
void base_blob<BITS>::SetHex(const char* psz)
{
> Set data to all null bytes (zeroes)

    memset(data, 0, sizeof(data));

    // skip leading spaces
    while (IsSpace(*psz))
        psz++;

    // skip 0x
    if (psz[0] == '0' && ToLower(psz[1]) == 'x')
        psz += 2;

    // hex string to uint
    size_t digits = 0;

> HexDigit is a utility function imported from util/strencodings.cpp
> It looks hex digits up in table p_util_hexdigit. Returns -1 when
> invalid code is passed to it.
    while (::HexDigit(psz[digits]) != -1)
        digits++;

> p1 will now point to the beginning of data. Note it's a uchar data type (1 byte).
    unsigned char* p1 = (unsigned char*)data;

> pend will point one byte past the last byte of data.
> We discussed WIDTH at length earlier on.
    unsigned char* pend = p1 + WIDTH;

> Now we iterate backwards from last digit (higher mem addr) to first digit
> (lower mem addr). Remember : us humans write numbers in big endian form,
> Intel-based computers write them in little-endian!
> So while we still have digits and the end pointer is higher than the 
> beginning of data, repeat...
    while (digits > 0 && p1 < pend) {
> ...store psz[digits - 1] in p1
        *p1 = ::HexDigit(psz[--digits]);
>       if we still have digits...         
        if (digits > 0) {
>           ... shift the next digit left by 4 bits (multiply by 16, the
>               hexadecimal base) and OR it into the value in p1. The low 4
>               bits of the shifted digit are zero, so here OR acts as addition
            *p1 |= ((unsigned char)::HexDigit(psz[--digits]) << 4);
>           increment the p1 pointer to set the next digit
            p1++;
        }
    }
}
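A worked example with an odd number of digits shows how the two nibbles are packed. The sketch below is a standalone rewrite of the same loop on a 4 byte buffer. HexDigitSketch and SetHexSketch are made-up names, the digit lookup is done with plain comparisons rather than Bitcoin Core’s table, and only spaces are skipped as leading whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for ::HexDigit: value of a hex digit, or -1.
inline int HexDigitSketch(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Same algorithm as SetHex, on a 4 byte buffer for readability.
inline void SetHexSketch(std::uint8_t (&data)[4], const char* psz) {
    std::memset(data, 0, sizeof(data));
    while (*psz == ' ')  // simplification: only plain spaces are skipped
        psz++;
    if (psz[0] == '0' && (psz[1] == 'x' || psz[1] == 'X'))
        psz += 2;

    std::size_t digits = 0;
    while (HexDigitSketch(psz[digits]) != -1)
        digits++;

    std::uint8_t* p1 = data;
    std::uint8_t* pend = p1 + sizeof(data);
    while (digits > 0 && p1 < pend) {
        // Rightmost unread digit goes into the low nibble...
        *p1 = static_cast<std::uint8_t>(HexDigitSketch(psz[--digits]));
        if (digits > 0) {
            // ...and the one to its left into the high nibble.
            *p1 |= static_cast<std::uint8_t>(HexDigitSketch(psz[--digits]) << 4);
            p1++;
        }
    }
}

inline bool SetHexExample() {
    std::uint8_t data[4];
    SetHexSketch(data, "  0x12345");
    // "45" -> data[0], "23" -> data[1], lone "1" -> low nibble of data[2]
    return data[0] == 0x45 && data[1] == 0x23 &&
           data[2] == 0x01 && data[3] == 0x00;
}
```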

Now we have a version of SetHex for C++ std::strings which simply calls the c_str() method to return a pointer to a C-style string. Then it calls the C string version we just discussed:

template <unsigned int BITS>
void base_blob<BITS>::SetHex(const std::string& str)
{
    SetHex(str.c_str());
}

A ToString() method is provided. It just calls GetHex().

template <unsigned int BITS>
std::string base_blob<BITS>::ToString() const
{
    return (GetHex());
}

Finally we have a bit of a C++ hack that forces the compiler to emit the 160 and 256 bit template code into libraries explicitly. Let’s take a look at the code and then I’ll comment.

// Explicit instantiations for base_blob<160>
template base_blob<160>::base_blob(const std::vector<unsigned char>&);
template std::string base_blob<160>::GetHex() const;
template std::string base_blob<160>::ToString() const;
template void base_blob<160>::SetHex(const char*);
template void base_blob<160>::SetHex(const std::string&);

// Explicit instantiations for base_blob<256>
template base_blob<256>::base_blob(const std::vector<unsigned char>&);
template std::string base_blob<256>::GetHex() const;
template std::string base_blob<256>::ToString() const;
template void base_blob<256>::SetHex(const char*);
template void base_blob<256>::SetHex(const std::string&);

As you can see, no variable is declared and no function body is written here.

These lines are explicit instantiation definitions. They tell the compiler to generate, in this source file, the code of each listed member function for the 160 and 256 bit versions of base_blob. That guarantees the 160 and 256 bit code ends up in the object file, and therefore in the library built from it!

This matters because these member functions are only declared in uint256.h while their bodies live in uint256.cpp, so other source files that include the header cannot instantiate them on their own. Since the idea is also to extract libraries from the Bitcoin Core source so that other software can link to them, if we compiled this file without any actual use of the 160 and 256 bit blobs, the code would simply be dropped, as template members that are never instantiated are not emitted into object files.
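The pattern is easy to reproduce in miniature. In the sketch below, DeclSketch and ByteCount are made-up names, and the three parts that would normally be split between a header and a source file are shown together in one listing.

```cpp
#include <cassert>

// --- what would live in the header: declaration only ---
template <unsigned int BITS>
class DeclSketch {
public:
    unsigned int ByteCount() const;  // no body visible to includers
};

// --- what would live in the .cpp file: the body ---
template <unsigned int BITS>
unsigned int DeclSketch<BITS>::ByteCount() const {
    return BITS / 8;
}

// Explicit instantiation definitions: emit code for exactly these two
// sizes into this object file, whether or not anything here uses them.
template unsigned int DeclSketch<160>::ByteCount() const;
template unsigned int DeclSketch<256>::ByteCount() const;
```

With the real header and source split, a caller in another file using DeclSketch<256> would link fine, while DeclSketch<512> would compile but fail at link time because no code for it was ever emitted.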
