What is a Directed Acyclic Graph (DAG)? Why should I know about this?

What is a Directed Acyclic Graph (DAG)? Why should I know about this?

A Directed Acyclic Graph, or DAG, is a graph where no “loops” exist and where you can only flow one way throughbnlout its paths. Essentially a DAG represents a process where each step cannot point backwards. Each step only moves forward towards the end and may not reconnect to a previous step, which would form a loop.

DAG’s are useful because processes modeled this way can usually be parallelized. A DAG with 8 paths can be processed by sending the first step of each branch to a different processor or even remote computer if a cluster is available.

This data structure is used by Apache Spark, for example, in order to model highly parallelized big data jobs. Spark starts by organizing all the necessary tasks in a big data job into a DAG of tasks. Each step in the DAG must be performed in order to move forward. When there’s a failure, this step is re-attempted and only after it succeeds can the process move forward. The job never moves backwards in the DAG.

Since there are no loops, the entire process can be planned ahead of time. Loops make some data impossible to predict. If a step would come from the future, how would you know what data will come from the future? It’s impossible. This is why DAG’s are relevant in job scheduling and parallel computing: you don’t ever have to plan for future data to arrive back. The entire process only moves forward and doesn’t reinject results back to earlier stages.

Here’s a Wikipedia-sourced illustration for a DAG:

Several paths exist in this DAG but it can be seen that the process never points backwards and one step does not ever depend on future results. In this process it is possible to reach the end straight from the first step, or the process can flow through c or a longer process may be needed, flowing through b, d and then on to e.

Note that some processes may have loops inside them as sub-processes. For instance, in a big data job while the main process may be moving forward only, some of its substeps could be processing data in a loop, therefore the components of a DAG need not be DAG’s themselves. The following illustration, also from Wikipedia, shows a DAG (yellowish) where its sub-processes can be seen to form loops:

As can be seen, the main process only moves forward or sideways, but internally each big node may contain cyclic flows.

For example, an Ethereum smart contract could be represented inside one of the yellow nodes on the above DAG. The smart contract language, Solidity, is Turing-complete and allows loops and cycles to happen. But the blockchain in which the smart contract itself is stored only moves forward, never backwards. A blockchain may fork, but the fork will also only move forward and a block cannot rejoin the previous blockchain at a previous state.

Despite this, blockchains are not DAG’s because forked chains are entirely new blockchains and do not configure a graph – blockchains are “a straight line”, never a network-like structure. Though a chronological depiction of a fork may show a blockchain splitting in two and then continuing on like a graph, those kinds of illustrations only serve to better explain how forks happen. In reality two new blockchains are formed during a fork and they become independent from each other, not a single structure. Forks involve making a complete copy of the previous blockchain right up to the point where the fork happened. Remember from previous articles we mentioned that blocks only contain pointers to one, and only one, previous block or a pointer to nowhere in the case of the genesis block. This data structure only allows for a linked list, not a graph, because each block cannot point to more than one previous block.

But DAG’s may be the next evolution of the blockchain.

There are really hot topics in blockchain research right now and one the top priorities in this field is finding ways to make different blockchains interact securely while maintaining decentralization and logical decoupling between the two chains. That is, if Bitcoin and Ethereum were to have a sidechain that could process transactions in either crypto, it should be designed in such a way that both ETH and BTC continue to work normally and are backwards compatible with all their previous transactions, while now allowing sidechain transactions to happen. DAGs are seen as one possible solution to this technical challenge, as it would allow for the chain to accommodate parallel chains that could later rejoin the main chain or not, exactly like a DAG does.

Sidechains are a very interesting concept, where a main blockchain has parallel chains, some of which could even store transactions made in different cryptocurrencies. This system, of course, isn’t simple to implement. Different cryptocurrencies use different hashing algorithms, different block formats, various consensus strategies and so on. Therefore “mining” a transaction that involves two different cryptos can be quite the technical challenge.

Sample implementations: IoT chain is an implementation of the DAG concept. IOTA is another very popular crypto that uses DAGs.


DAG illustration: David Eppstein – Own work

on June 7, 2018