A Directed Acyclic Graph, or DAG, is a graph where no “loops” exist.
You can only traverse a DAG in the forward direction, even if you take different paths in the graph.
Essentially, a DAG represents processes where each step can only move forward and never forms any repetition loops.
If you model a workflow using a Directed Acyclic Graph, it forces you to focus on getting to the end of the job.
A given step in a DAG may not connect back to a previous step. This way, no cyclic dependencies are formed where one step of the process could get in a deadlock state.
If a process requires repetition, then in order to turn it into a DAG, the system must unroll the repetition loop, turning it into a long string of sequential steps.
By modeling processes this way, data flow becomes simpler and more predictable.
The only tradeoff is the execution plan potentially becoming bigger and more complex.
Processes modeled into a DAG structure can be more easily parallelized.
For example, on a workstation with 8 CPU’s (or 8 independent cores), a DAG with 8 paths can be modeled by dividing work into 8 and then sending the first step of each branch to a different processor.
The same idea works for large clusters. If there are 32 nodes available in a cluster, a job can be modeled into a 32 path DAG where each path is mapped to a different node and then reduced back into a single node (map/reduce strategy).
In either case, the parallel paths move forward and then rejoin the main process path at a future point.
This is especially useful in associative functions that can be trivially performed in parallel.
For example, adding a large amount of numbers can be performed in any order and will yield the same result.
You can parallelize the addition of numbers among several processors, and then rejoin the main addition by sending each result to a common point where all subtotals are added together to get the grand total.
For example, the DAG data structure is used by Apache Spark in order to model highly complex parallel big data jobs.
Spark starts by organizing all the necessary tasks in a big data job into a DAG composed of smaller steps.
Each step in the DAG must be completed before moving forward.
When there’s a failure, the step is re-attempted until it succeeds, and only then it moves forward. The job never moves backwards in the DAG.
Since there are no loops, the entire process can be easily planned ahead of time.
Loops make some processes impossible to predict. If necessary data for a step would come from the “future” (one iteration ahead in a loop), it’d be impossible to optimize that step ahead of time.
This is why DAG’s are relevant in job scheduling and parallel computing theory: you don’t ever have to plan for future data to arrive backward.
Here’s a Wikipedia-sourced illustration for a DAG:
Several paths exist in this DAG but it can be seen that the process never points backwards. One step does not ever depend on future results.
In this process, it is possible to either reach the end node (e) straight from the first step, or the process can flow through c or a longer process may be needed, flowing through b, d and then on to e.
Note that some steps may have loops inside them as sub-processes.
For instance, in a big data job, while the main process may be moving forward only, some of its sub-steps could be processing data in a loop. The individual components of a DAG need not be DAG’s themselves.
The following illustration, also from Wikipedia, shows a DAG (yellowish) where its sub-processes can be seen to form loops:
As can be seen, the main process only moves forward or sideways, but internally each big node may contain cyclic flows.
An Ethereum smart contract could be represented inside one of the yellow nodes on the above DAG illustration.
The Ethereum smart contract language, Solidity, is Turing-complete and allows loops and cycles to happen.
But the blockchain in which the smart contract itself is contained is modeled in a data structure that has no loops.
A blockchain may fork, but the fork will also only move forward. A future block cannot rejoin the blockchain at a previous state to form a loop.
Blockchain logic only flows forward in time, while the smart contracts contained in it may perform loops or even infinite computations.
DAG’s may help model the next generations of blockchain platforms.
There are really hot topics in blockchain research right now and one the top priorities in this field is finding ways to make different blockchains interact securely with each other, while maintaining decentralization and logical decoupling between the two chains.
Does this ring a bell? You guessed it: DAG’s are perfect data structures to model such a system.
For example, if Bitcoin and Ethereum were to behave like sidechains in a system that could process transactions in either crypto, it should be designed in such a way that both ETH and BTC continue to work normally (flowing forward), while now allowing sidechain interactions to happen (forward pointing parallel paths like we saw on illustration a->b->c->d->e above).
DAGs are perfect for modeling such a system!
Sidechains are a very interesting concept, where a main blockchain has parallel chains, some of which could even store transactions made in different cryptocurrencies.
This system, of course, isn’t simple to implement.
Different cryptocurrencies use different hashing algorithms, block formats, various consensus strategies and so on.
Therefore “mining” a transaction that involves two ore more different cryptos can be quite the technical challenge.
I hope this basic introduction to Directed Acyclic Graphs has helped you understand the concept and how it can integrate into cryptocurrency development.
Big data systems already leverage DAG’s to model complex workflows in a predictable way.
Sidechains are one possible application of DAG’s in cryptocurrency development, but you’ll probably find this powerful data structure being used in several other applications.
IoT chain is an implementation of the DAG concept.
IOTA is another very popular crypto that also uses DAGs.