crypto.bi

ELI5: Why is the “fast” Ethereum sync so slow?

Why is the geth sync so slow?

I keep hearing this question again and again from Ethereum node operators. Why is it still so difficult to fully sync an Ethereum full node?

Slow Geth

Users have been complaining forever now about the notoriously slow geth sync on the official Ethereum wallet.

Some even question the level of scaling capabilities of a cryptocurrency which requires several hundred gigabytes of hard drive space and which may take weeks to fully sync with the rest of the network.

While geth syncs, full Ethereum node admins may end up seeing a screen, such as the one below, for a very long time:

Fast Sync Optimization

There are countless valid arguments in the “Ethereum sync is too slow” camp – and the Ethereum core devs did take note of the issue.

They’ve since created a light client mode and deprecated the old “--fast” switch, adopting a new algorithm selected with the --syncmode "fast" command line switch instead.

But these changes did not solve the problem. Full Ethereum nodes are still notoriously hard to sync.

Why does this happen?

How can you mitigate the effects? In this article we discuss a few known issues, why they happen and how you can try to circumvent them.

Known Issues

There are currently a few known issues which make Ethereum syncing very slow:

  1. Clients have difficulty finding peers. Ethereum is P2P, like BitTorrent. When there are few peers, or the peers have slow connections, transfer rates can be very low and the transfer may even halt.
  2. Ethereum does not show a progress meter per block or structure transfer; it only increases the counter when one structure/block transfer is done. Users then report “having the same block on screen for days”. That is because the download for the next block is taking a very long time and there’s no progress indication for the incomplete transfer.
  3. Spam on the blockchain. There have been many attacks on Ethereum (as with every other popular cryptocurrency).
  4. Unlike Bitcoin and its 1 MB block size, Ethereum has no fixed block size in bytes. Some blocks are gigantic. (Note the dissonance with the “bigger block” movement in the Bitcoin community?)
  5. Nodes behind NAT. Full nodes need to serve the same bandwidth as they request from the network. When this doesn’t happen we get what the Torrent community calls “leeches”.
  6. The “fast” algorithm requires the entire chain structure to be downloaded, per block, before processing the block.

Let’s take a quick look at each of these factors.

Finding Peers

There’s very little we can do about this issue. Discovering initial peers is an inherent difficulty in all P2P networks. There have been other fully decentralized P2P systems and most were also notoriously slow when unable to bootstrap initial connections to enough peers on the network.

Decentralized anonymous systems like Freenet (which still exists BTW) are also very slow, because they depend on nodes talking directly to each other to exchange information on a large scale. There isn’t one fast centralized node aiding the network in a truly P2P system.

Many nodes also run on slow and limited internet connections. When you connect to a slow node, it will drag the entire sync operation down with it until your node detects a lagging connection and tries a different peer.

There is nothing you can do to avoid connecting to a slow node the first time. When you find a very slow or incorrectly running node you can blacklist it or quarantine it in your local peer list. But it’s impossible to know it’s a slow node before connecting to it at least once.

Spam

Spam is a problem in every Internet-related, crowd-driven technology. Be it a Telegram chat, IRC, forums or even cryptocurrencies, spam is everywhere. But every spam operation must be profitable, otherwise there’d be no incentive for it.

What’s the incentive for cryptocurrency blockchain spamming? The answer is usually to promote a rival cryptocurrency.

We’ve seen spam attacks against Bitcoin, Ethereum and against every other mineable and top 100 ranked currency. The spam attacks usually consist of submitting millions of tiny transactions which clog up the network and fill the mempool of unconfirmed transactions. There may be other more sophisticated attacks, but the simplest one is usually what slows the whole network down.

When this kind of attack is perpetrated by actors with high bandwidth and lots of cash and resources, it is almost impossible to deflect. But even so, Bitcoin and Ethereum have held up against most of the bombardments received, especially throughout 2017 when big money began to fear the spread of cryptocurrencies.

Block Size

If there’s one good argument against increasing block size in the Bitcoin blockchain, then Ethereum must be it.

Ethereum blocks have no fixed size in bytes. They are limited only by the block GAS limit.

GAS is a measure of computational resources spent in processing Ethereum transactions (including smart contract interaction).

A bigger and more complex contract requires more GAS than a small and simple one.

GAS is the computation currency in the “world computer” composed of all Ethereum nodes.

When a contract is very large and complex, it will require large amounts of GAS, which is itself paid for in ether (ETH).

The GAS limit caps the number of contract calls and transactions contained in a block. Therefore GAS indirectly limits the physical size of Ethereum blocks.
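
To make the GAS arithmetic concrete, here is a minimal sketch of how a block GAS limit bounds what fits in a block. The 21,000 GAS cost of a plain ETH transfer is a protocol constant. The 10,000,000 block GAS limit and the 500,000 GAS contract call are only illustrative assumptions (the real limit changes over time), and the function name is made up for this example.

```python
# Gas consumed by a plain ETH transfer with no contract call or data.
SIMPLE_TRANSFER_GAS = 21_000


def max_transactions_per_block(block_gas_limit: int,
                               gas_per_tx: int = SIMPLE_TRANSFER_GAS) -> int:
    """Return how many transactions of a given gas cost fit under the block gas limit."""
    if block_gas_limit < 0 or gas_per_tx <= 0:
        raise ValueError("block_gas_limit must be >= 0 and gas_per_tx must be > 0")
    return block_gas_limit // gas_per_tx


if __name__ == "__main__":
    assumed_limit = 10_000_000  # assumption for illustration, not a live network value
    # Plain transfers that fit in one block under the assumed limit.
    print(max_transactions_per_block(assumed_limit))
    # A heavier, hypothetical contract call costing 500,000 gas each.
    print(max_transactions_per_block(assumed_limit, 500_000))
```

The heavier each transaction is, the fewer of them fit, which is how the GAS limit ends up capping block size without any byte limit being defined.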

NAT

As we all know, the IPv4 address space has been exhausted.

Therefore, an immense part of the Internet sits behind NAT routers. NAT is a way to multiplex N private IPs into one or a few public facing IPs.

This is how most internet providers are able to offer access to household computers. The provider itself has a small range of IPs it acquires from a backbone operator, and these IPs are distributed to its customers. One small internet provider IP can serve thousands of non-routable private IPs.

The issue with NAT is that it must limit what kind of traffic can reach the ISP customers’ computers.

Imagine if there were no filtering: everyone’s files and insecure configurations would be open to the world. This has, in fact, already been exploited by hackers. For example, printers worldwide once started spitting out funny drawings because hackers had found thousands of open and shared printers on unfiltered networks!

So NAT is there to help us. But there’s a catch.

NAT is terrible for P2P networks.

P2P networks should be as symmetric as possible, meaning that when a node downloads at 2 Megabytes/s, it should also upload at 2 MB/s. When this does not happen, the uploaders quickly run out of bandwidth and the P2P network slows down.

Most Ethereum full nodes sit behind heavily filtered NAT firewalls and there is very little the core devs can do about this.
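
The bandwidth imbalance can be illustrated with a deliberately simplified model. It assumes that every node wants to download, that only nodes reachable from outside (not blocked by NAT filtering) can serve data, and that their upload capacity is shared evenly. None of these assumptions come from geth itself, and the function and parameter names are hypothetical.

```python
def average_download_rate(total_nodes: int, reachable_nodes: int,
                          upload_mb_per_s: float) -> float:
    """Average download rate per node (MB/s) when only reachable nodes upload.

    Toy model: the total upload capacity of the reachable nodes is divided
    evenly among all nodes that want to download.
    """
    if total_nodes <= 0 or not 0 <= reachable_nodes <= total_nodes:
        raise ValueError("need total_nodes > 0 and 0 <= reachable_nodes <= total_nodes")
    total_upload = reachable_nodes * upload_mb_per_s
    return total_upload / total_nodes


if __name__ == "__main__":
    # Fully symmetric network: every node uploads at 2 MB/s.
    print(average_download_rate(1000, 1000, 2.0))
    # Only 10% of the nodes accept incoming connections.
    print(average_download_rate(1000, 100, 2.0))
```

In this model the average download rate falls in direct proportion to the share of nodes that can actually serve data, which is the “leech” effect described above.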

The Fast Algorithm

Finally, there’s the new fast syncing algorithm.

We won’t go into the details of how this works, but what you must know is that this algorithm requires the full chain structure to be downloaded before each block can be committed to the local blockchain database.

As we’ve mentioned, some blocks are gigantic, and downloading the whole chain structure for them can take days. Until this chain structure finishes downloading, the block count sits frozen. Users often report this as a bug or as “frozen wallets”, but if you look at the underlying geth logs you’ll see that it is downloading the chain structure for the block in the background.

This, combined with the NAT-filtered network and the difficulty of finding peers, can make syncing very slow.
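
Besides reading the logs, one way to see progress while the block counter looks frozen is to ask the node itself. The sketch below summarizes the result of the standard eth_syncing JSON-RPC call, which returns false when the node is not syncing, or an object with hex-encoded startingBlock, currentBlock and highestBlock fields. The pulledStates and knownStates fields are an assumption here: geth versions with fast sync have reported them, but they may be absent, so the code treats them as optional. The sample values are invented for illustration.

```python
from typing import Optional, Union


def summarize_sync(result: Union[bool, dict]) -> Optional[dict]:
    """Summarize an eth_syncing result; return None when the node is not syncing."""
    if not isinstance(result, dict):
        return None

    def to_int(key: str) -> Optional[int]:
        # Fields are hex strings such as "0x10"; missing fields become None.
        value = result.get(key)
        return int(value, 16) if value is not None else None

    current = to_int("currentBlock")
    highest = to_int("highestBlock")
    return {
        "current_block": current,
        "highest_block": highest,
        "blocks_remaining": highest - current,
        # Optional fields: state entries downloaded so far vs. known so far.
        "pulled_states": to_int("pulledStates"),
        "known_states": to_int("knownStates"),
    }


if __name__ == "__main__":
    sample = {  # invented sample response, not real network data
        "startingBlock": "0x0",
        "currentBlock": "0x10",
        "highestBlock": "0x20",
        "pulledStates": "0x64",
        "knownStates": "0xc8",
    }
    print(summarize_sync(sample))
```

If the state counters keep growing while the block numbers stand still, the node is still working in the background rather than frozen.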

Solutions?

So, what can you do about these known issues? While there is no definitive recipe, here are a few tips to help you speed up your full Ethereum node sync.

  • Realize that the nature of P2P is non-deterministic. That is, if you restart the geth client, you might get a totally different set of peers which may be a lot faster (or slower) than the peers you had before. This may seem unintuitive, but simply restarting geth does get things sorted out sometimes.
  • You can temporarily rent an Amazon AWS instance or other VPS and sync the blockchain over its gigabit-speed network, then download the chain to your PC. This requires some technical knowledge, but it does get you out of the NAT blockade, and downloading the chain from Amazon servers will be faster than the P2P system. If you do this, make sure to create a new and empty wallet on the AWS instance; do not send your personal wallet there. You can later delete that temporary wallet with no risk involved.
  • Increase the --cache command line parameter to 1024 or some larger value. On older geth releases the default cache size is tiny, and some users report considerable speed increases using this trick.
  • Use the --nat none geth command line switch. This will let geth know that it’s not supposed to assume freely incoming connections.
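
Put together, the command line tips above might look like the sketch below. It only builds and prints a geth command line and does not launch anything. The --syncmode, --cache and --nat flags are the ones discussed in this article. The helper name and the default values are assumptions, and newer geth releases may not accept the “fast” sync mode, so check geth --help for your version.

```python
import shlex


def build_geth_command(cache_mb: int = 1024, sync_mode: str = "fast",
                       nat: str = "none") -> list:
    """Build a geth argument list from the tuning tips (nothing is executed)."""
    if cache_mb <= 0:
        raise ValueError("cache_mb must be positive")
    return [
        "geth",
        "--syncmode", sync_mode,
        "--cache", str(cache_mb),  # cache size in megabytes
        "--nat", nat,              # "none" disables NAT port mapping
    ]


if __name__ == "__main__":
    # Prints: geth --syncmode fast --cache 2048 --nat none
    print(shlex.join(build_geth_command(cache_mb=2048)))
```

Keeping the flags in one place makes it easy to retry with a different cache size after a restart.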

Keep in mind that you can use Ethereum without having to download the entire blockchain to your computer.

Online wallets such as MyEtherWallet offer full functionality without requiring a full node.

The tradeoff of using a thin wallet is some loss of trustlessness: you must trust that the node it connects to holds the official Ethereum network blockchain.

Otherwise this trusted node could temporarily route your transactions elsewhere or even broadcast invalid transactions.

No such case has been reported for MyEtherWallet that I know of.

We never know when a network may be compromised, so keep in mind that the only secure and legitimately decentralized way to use any cryptocurrency, including Ethereum, is to run a full node that is 100% in sync.

We hope this clears up some questions about the slow Ethereum sync!

Ethereum Full Node Requirements

In theory, you don’t need very powerful hardware to run an Ethereum node. Since it’s mostly a network and IO-bound process, you’ll need good Internet connectivity and fast hard drives with at least a couple of terabytes of free space.

Hard disk: At least 2 TB. The Ethereum blockchain is huge and growing fast.

Memory: 1 GB is enough.

CPU: Any modern CPU with at least 32 bits is OK. Even a Raspberry Pi can run geth.

Internet: Ethereum requires around 400 kb/s average in/out transfer rate 24×7. If you use your connection for other work, you need at least 10 MB/s for a functional setup. Internet speed is critical for mining.
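
For anyone on a metered connection, the average rate above translates into a sizable monthly volume. The small calculation below assumes that “400 kb/s” means 400 kilobits per second in each direction and uses a 30-day month. If the figure was meant as kilobytes, multiply the result by eight. The function name is made up for this example.

```python
def monthly_transfer_gb(rate_kbit_per_s: float, days: int = 30) -> float:
    """Convert a sustained rate in kilobits/s to gigabytes transferred over `days` days."""
    bytes_per_second = rate_kbit_per_s * 1000 / 8
    seconds = days * 24 * 60 * 60
    return bytes_per_second * seconds / 1e9


if __name__ == "__main__":
    # Roughly 129.6 GB per direction over 30 days at a sustained 400 kilobits/s.
    print(round(monthly_transfer_gb(400), 1))
```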

Alternative Wallets

If you’re in a hurry and must send/receive ETH urgently, you can try alternative Ethereum wallets and node implementations.

For instance, OpenEthereum (formerly Parity Ethereum) has been gaining popularity recently.

You may also run web-based light wallets which do not require the blockchain to be downloaded at all. MyEtherWallet is a popular choice in this category, but there are others.

About the Author
Published by Crypto Bill - Bill is a writer, geek, crypto-curious polyheurist, a dog's best friend and coffee addict. Information security expert, encryption software with interests in P2P networking, decentralized applications (dApps), smart contracts and crypto based payment solutions.