Why the “fast” Ethereum sync is so slow

Why is the geth sync slow? We keep hearing this question again and again from ETH node owners.

Users have long complained about the notoriously slow geth sync behind the official Ethereum wallet. Some even question how decentralized a cryptocurrency can be when it requires several hundred gigabytes of hard drive space and may take weeks to fully sync with the rest of the network.

While geth syncs, full Ethereum node admins may end up staring at the same sync screen for a very long time.

There are countless valid arguments in the “Ethereum sync is too slow” camp – and the Ethereum core devs did take note of the issue.

They’ve since created a light client mode and deprecated the old “--fast” switch, adopting a new algorithm selected with the --syncmode “fast” command line switch instead.

But these changes did not solve the problem. Full Ethereum nodes are still notoriously hard to sync. Why does this happen? How can you mitigate the effects? In this article we discuss a few known issues, why they happen and how you can try to circumvent them.

Known Issues

There are currently a few known issues which make Ethereum syncing very slow:

  1. Clients have difficulty finding peers. Ethereum is P2P, like BitTorrent. When there are few peers with slow connections, the transfer rates can be very low and the transfer may even halt.
  2. Ethereum does not show a progress meter per block or structure transfer; it only increases the counter when one structure/block transfer is done. Users then report “having the same block on screen for days”. That is because the download for the next block is taking a very long time and there’s no progress indication for the incomplete transfer.
  3. Spam on the blockchain. There have been many attacks on Ethereum (as with every other popular cryptocurrency).
  4. Unlike Bitcoin and its 1 MB block size, Ethereum has no fixed block size limit; blocks are bounded only by the gas limit. Some blocks are gigantic. (Note the dissonance with the “bigger block” movement in the Bitcoin community?)
  5. Nodes behind NAT. Full nodes need to serve the same bandwidth as they request from the network. When this doesn’t happen we get what the BitTorrent community calls “leeches”.
  6. The “fast” algorithm requires the entire chain structure to be downloaded, per block, before processing the block.

Let’s take a quick look at each of these factors individually.

Finding Peers

There’s very little we can do about this issue. It’s an inherent difficulty in all P2P networks. Other fully decentralized P2P systems have also been notoriously slow when there weren’t enough peers on the network.

Decentralized anonymous systems like Freenet (which still exists) are also very slow, because they depend on nodes talking to each other to exchange information on a large scale. Many nodes run on slow and limited internet connections. When you connect to a slow node, it slows the entire sync operation down with it, as the next blocks require the previous ones to be done before continuing.

There is nothing you can do to avoid connecting to slow nodes. There is no concept of “quality of service” in Ethereum, so there’s no way to avoid being held up by a slow node when you stumble onto one.
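While you cannot choose your peers, you can at least see how many of them your node currently has by asking it over JSON-RPC. The sketch below is a minimal illustration, assuming your geth node exposes its JSON-RPC interface. The helper names are hypothetical; only the standard net_peerCount method is used. It builds the request and decodes the hex-encoded reply without contacting any node, and the sample reply is a made-up value.

```python
import json


def peer_count_request(request_id=1):
    """Build the JSON-RPC payload for the standard net_peerCount method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "net_peerCount",
        "params": [],
        "id": request_id,
    })


def parse_peer_count(response_text):
    """Decode the hex-encoded quantity returned by net_peerCount."""
    reply = json.loads(response_text)
    return int(reply["result"], 16)


# Example reply shape (made-up value, not real node output).
sample_reply = json.dumps({"jsonrpc": "2.0", "id": 1, "result": hex(25)})
print(parse_peer_count(sample_reply))
```

A count that stays very low for a long time is a hint that the node is struggling to find peers, which is the situation described above.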

Spam

Spam is a problem in every Internet related crowd-driven technology. Be it a Telegram chat, IRC, forums or even cryptocurrencies, spam is everywhere. But every spam operation must be profitable, otherwise there’d be no incentive for it.

What’s the incentive for cryptocurrency blockchain spamming? The answer is usually to promote a rival cryptocurrency.

We’ve seen spam attacks against Bitcoin, Ethereum and against every other mineable and top 100 ranked currency. The spam attacks usually consist of submitting millions of tiny transactions which clog up the network and fill the mempool of unconfirmed transactions. There may be other more sophisticated attacks, but the simplest one is usually what slows the whole network down.

When this kind of attack is perpetrated by actors with high bandwidth and lots of cash and resources, it is almost impossible to deflect. But even so, Bitcoin and Ethereum have held up against most of the bombardments received, especially throughout 2017 when big money began to fear the spread of cryptocurrencies.

Block Size

If there’s one good argument against increasing block size in the Bitcoin blockchain, then Ethereum must be it. Ethereum blocks have no fixed byte size; they are limited only by the block gas limit. Gas is a measure of the computational resources spent in processing Ethereum transactions and contracts. A bigger and more complex contract requires more gas than a small and simple one. Gas is the computation currency of the “world computer” composed of all Ethereum nodes. When a contract is very large, it requires large amounts of gas, which is paid for in ether. The gas limit caps the number of contract calls and transactions per block, and therefore the physical size of Ethereum blocks.
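To make the relationship between the gas limit and block contents concrete, here is a rough sketch. It relies on one known figure, the 21,000 gas charged for a plain ether transfer; the block gas limits used are purely illustrative assumptions, since the real limit changes over time.

```python
SIMPLE_TRANSFER_GAS = 21_000  # gas charged for a plain ether transfer


def max_simple_transfers(block_gas_limit):
    """Upper bound on plain transfers that fit within one block's gas limit."""
    return block_gas_limit // SIMPLE_TRANSFER_GAS


# Illustrative gas limits only; the real limit is not a fixed constant.
for limit in (4_000_000, 8_000_000):
    print(limit, max_simple_transfers(limit))
```

Doubling the assumed gas limit roughly doubles how many transactions can be packed into a block, which is why the gas limit, and not a byte count, is what bounds block size.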

NAT

As we all know, the IPv4 address space is exhausted. Therefore, an immense part of the Internet sits behind NAT routers. NAT is a way to multiplex N private IPs into one or a few public facing IPs. This is how most internet providers are able to offer access to household computers. The provider itself has a small range of IPs it acquires from a backbone operator, and these IPs are distributed to its customers. One small internet provider IP can serve thousands of non-routable private IPs.

The issue with NAT is that it must limit what kind of traffic can reach the ISP customers’ computers. Imagine if there were no filtering: everyone’s files and insecure configurations would be open to the world. This has, in fact, already been exploited by hackers. For example, printers around the world once started spitting out funny drawings because hackers had found thousands of open and shared printers on unfiltered networks!

So NAT is there to help us. But there’s a catch.

NAT is terrible for P2P networks. P2P networks should be as symmetric as possible, meaning that when a node downloads at 2 MB/s, it should also upload at 2 MB/s. When this does not happen, the uploaders quickly run out of bandwidth and the P2P network slows down.

Most Ethereum full nodes sit behind heavily filtered NAT firewalls and there is very little the core devs can do about this.

The Fast Algorithm

Finally, there’s the new fast syncing algorithm.

We won’t go into the details of how this works, but what you must know is that this algorithm requires the full chain structure to be downloaded before each block can be committed to the local blockchain database.

As we’ve mentioned, some blocks are gigantic, and downloading their whole chain structure can take days. Until that chain structure finishes downloading, the block count sits frozen. Users often report this as a bug or as “frozen wallets”, but if you look at the underlying geth logs you’ll see that it is downloading the chain structure for the block in the background.
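If you would rather not read the logs, the standard eth_syncing JSON-RPC method reports the same background progress. The sketch below is a minimal illustration that only interprets an already-fetched result. The field names startingBlock, currentBlock and highestBlock are standard; pulledStates and knownStates are extra fields that, as an assumption here, your geth version may or may not report, so the code treats them as optional. The function name and the sample values are hypothetical.

```python
def sync_progress(syncing):
    """Summarize an eth_syncing result; returns None when the node is not syncing."""
    if syncing is False:
        return None

    current = int(syncing["currentBlock"], 16)
    highest = int(syncing["highestBlock"], 16)
    progress = {
        "blocks_done": current,
        "blocks_total": highest,
        "block_fraction": current / highest if highest else 0.0,
    }

    # Optional geth-specific counters (assumed; absent on some versions).
    if "pulledStates" in syncing and "knownStates" in syncing:
        progress["states_pulled"] = int(syncing["pulledStates"], 16)
        progress["states_known"] = int(syncing["knownStates"], 16)

    return progress


# Made-up sample values, not real node output.
sample = {
    "startingBlock": hex(0),
    "currentBlock": hex(4_000_000),
    "highestBlock": hex(5_000_000),
    "pulledStates": hex(1_500_000),
    "knownStates": hex(2_000_000),
}
print(sync_progress(sample))
```

When the block numbers stand still but the state counters keep moving between two calls, the node is still working on the chain structure and is not frozen.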

This, combined with the NAT-filtered network and the difficulty of finding peers, can make the syncing very slow.

Solutions?

So, what can you do about these known issues? While there is no definitive recipe, here are a few tips to help you speed up your full Ethereum node sync.

  • Realize that the nature of P2P is non-deterministic. That is, if you restart the geth client, you might get a totally different set of peers which may be a lot faster (or slower) than the peers you had before. This may seem unintuitive, but simply restarting geth does sort things out sometimes.
  • You can temporarily rent an Amazon AWS instance or other VPS and sync the blockchain over their gigabit-speed network, then download the chain to your PC. This requires some technical knowledge, but it does get you out of the NAT blockade, and downloading the chain from Amazon servers will be faster than the P2P system. If you do this, make sure to create a new and empty wallet on the AWS instance; do not send your personal wallet there. You can delete that temporary wallet later without risk.
  • Increase the --cache command line parameter to 1024 or some larger value. The default cache size is tiny and some users report considerable speed increases using this trick.
  • Use the --nat none geth command line switch. This tells geth not to assume that incoming connections can reach it freely.
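The command line tips above can be combined into a single invocation. The sketch below only assembles the argument list from the switches named in this article (--syncmode, --cache and --nat); the function name and defaults are hypothetical, and you should check that your geth version still accepts these switches before relying on them.

```python
def geth_sync_command(cache_mb=1024, syncmode="fast", nat="none"):
    """Assemble a geth argument list from the switches discussed above."""
    return [
        "geth",
        "--syncmode", syncmode,
        "--cache", str(cache_mb),
        "--nat", nat,
    ]


command = geth_sync_command()
print(" ".join(command))  # geth --syncmode fast --cache 1024 --nat none
```

The resulting list can be passed to subprocess.run to actually start the node; keeping it as a list avoids shell quoting problems.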

Keep in mind that you can use Ethereum without having to download the entire blockchain to your computer. Online wallets such as MyEtherWallet offer full functionality without requiring a full node. This comes at the expense of some trust: you must trust that the service is connected to the official Ethereum network, otherwise it could temporarily route your transactions elsewhere. No such case has been reported for MyEtherWallet, but you never know when a network may be compromised, so keep this detail in mind.

We hope this clears up some questions about the slow Ethereum sync!


