What is blockchain and how can you analyze data in a blockchain? This article will discuss various forms of blockchain analytics from a tactical or heuristic perspective. I’ll explain how SAS® technologies can provide advanced analytics for operational, value/asset and regulatory viewpoints in the diverse world of open source blockchain technologies.

Blockchain landscape

Let’s start with a few basic viewpoints to lay the groundwork for our discussion.

Blockchain definition

A simple way to picture a blockchain is as a linked list of linked lists. As clients generate transactions, a consensus process collects them into a list (a block) and appends that block to a data store that is itself a linked list of immutable blocks. The security and integrity of the blockchain are enforced through built-in protocols and cryptographic algorithms.
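To make the "linked list of linked lists" picture concrete, here is a minimal Python sketch. It is a toy model, not how Bitcoin or any production blockchain stores data: the Block class, append_block and is_valid are hypothetical names, and SHA-256 over a JSON dump simply stands in for a real block-hashing scheme. Each block holds an inner list of transactions and points to its predecessor by hash.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Block:
    index: int
    prev_hash: str  # link to the previous block (the outer list)
    transactions: list = field(default_factory=list)  # the inner list

    def hash(self) -> str:
        # Hash the block contents, including the link to the previous block.
        payload = json.dumps(
            {
                "index": self.index,
                "prev_hash": self.prev_hash,
                "transactions": self.transactions,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Append a new block that points at the hash of the current last block."""
    prev_hash = chain[-1].hash() if chain else "0" * 64
    block = Block(index=len(chain), prev_hash=prev_hash,
                  transactions=list(transactions))
    chain.append(block)
    return block


def is_valid(chain):
    """Every block must still point at the actual hash of its predecessor."""
    return all(chain[i].prev_hash == chain[i - 1].hash()
               for i in range(1, len(chain)))


chain = []
append_block(chain, [{"from": "alice", "to": "bob", "amount": 5}])
append_block(chain, [{"from": "bob", "to": "carol", "amount": 2}])
print(is_valid(chain))  # True

# Tampering with an earlier block breaks the link stored in the next block.
chain[0].transactions[0]["amount"] = 500
print(is_valid(chain))  # False
```

The second check fails because the altered block no longer hashes to the value recorded in its successor, which is the basic idea behind the immutability described above.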

Blockchains are growing in popularity because they offer a way to conduct transactions without the need for a trusted third party. Transferring money, tracking goods and sharing legal documents are common uses of blockchain technologies.

Types of blockchains

For the purpose of this discussion, blockchains will be viewed as either public or permissioned/private:

  • Public blockchains like Bitcoin are primarily found in the cryptocurrency world and offer anonymous or pseudonymous identity.
  • Permissioned blockchains for the most part are implemented behind company firewalls, are enterprise-ready and typically have known identity. Many proof-of-concept projects use permissioned blockchains. Examples include R3 Corda, Chain, BigchainDB and Hyperledger, but there are many others.

Blockchain structure

Structure defines the operational components of a blockchain and mainly centers on a blockchain’s data store. With the profusion of open source blockchain implementations, there are almost as many types of data structures as there are blockchains. Many blockchain data stores are derivatives of other blockchain technologies. For example, Litecoin, Zcash and Prova are based on various implementations of Bitcoin. Permissioned blockchains lean toward key/value or document data stores such as LevelDB, RocksDB and MongoDB.

Accessing the blockchain

From our discussion so far we can derive two categories of data for all blockchains.

  1. The first category is data at rest, or data that already exists in a blockchain’s immutable data store. In the case of Bitcoin all transactions from the beginning of Bitcoin are stored in its blockchain. There are many ways to access the immutable data store of a blockchain. For example, Python scripts and Base SAS have been used to export the entire Bitcoin blockchain into SAS data sets, offering a wide range of both regulatory and operational analytics. Transactions of interest may be considered for anti-money laundering (AML), know your customer (KYC) or fraud detection.
  2. The second and most interesting category is data in movement. This moves the collection point of data, in event form, into the processes of a blockchain. By adding event generation at various points in the client, miner/consensus and protocol processes of a blockchain, it is possible to provide stream-based, real-time analytics of any blockchain activity or blockchain content. This approach may also be more helpful in the case of fully encrypted blockchains.
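The second category, event generation inside blockchain processes, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: EventBus is an in-memory stand-in for a real streaming engine's publish/subscribe interface, and submit_transaction and commit_block are hypothetical hooks, not functions from any blockchain client or from SAS Event Stream Processing.

```python
import time
from collections import defaultdict


class EventBus:
    """In-memory stand-in for a streaming engine's publish/subscribe API."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, **fields):
        # Stamp each event so downstream analytics can measure timing.
        event = {"topic": topic, "ts": time.time(), **fields}
        for handler in self._subscribers[topic]:
            handler(event)
        return event


def submit_transaction(bus, tx_id, amount):
    """Hypothetical client-side hook: emit an event when a transaction is created."""
    bus.publish("tx_created", tx_id=tx_id, amount=amount)


def commit_block(bus, block_id, tx_ids):
    """Hypothetical miner/consensus hook: emit an event when a block is committed."""
    bus.publish("block_committed", block_id=block_id,
                tx_count=len(tx_ids), tx_ids=list(tx_ids))


bus = EventBus()
seen = []  # a trivial "analytics" subscriber that just records events
bus.subscribe("tx_created", seen.append)
bus.subscribe("block_committed", seen.append)

submit_transaction(bus, "tx-1", 0.5)
submit_transaction(bus, "tx-2", 1.25)
commit_block(bus, "block-1", ["tx-1", "tx-2"])
print([e["topic"] for e in seen])
```

The point of the design is that the analytics side only ever sees events, so it does not need to read, or even be able to decrypt, the blockchain's data store.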

Analyzing blockchain data on the fly

To demonstrate the power of capturing data in movement, better described as a streaming approach, we developed a blockchain simulator using SAS Event Stream Processing. The simulator generates client requests and feeds them to miner processes, which are controlled by a consensus process. Both the simulator and consensus processes use the pub/sub APIs connected to the SAS Event Stream Processing model for managing blockchain updates.

Here is a workflow view of the implemented SAS Event Stream Processing model:

Figure: A blockchain miner/consensus process using SAS Event Stream Processing.

Operational blockchain analytics

The first streaming analytics produced using this method were operational in nature and included transactions per second, block updates per second, and total transaction times from creation to block update.
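As a rough illustration of these three metrics, the following Python sketch computes them in batch from a list of timestamped events. It is not the SAS Event Stream Processing model, which would compute such figures continuously over windows; the event shape (topic, ts, tx_id, tx_ids) and the function name operational_metrics are assumptions made for the example, and the sample timestamps are invented.

```python
def operational_metrics(events):
    """Compute throughput and latency figures from timestamped events."""
    created = {}    # tx_id -> creation timestamp
    latencies = []  # seconds from transaction creation to block update
    blocks = 0

    for event in events:
        if event["topic"] == "tx_created":
            created[event["tx_id"]] = event["ts"]
        elif event["topic"] == "block_committed":
            blocks += 1
            for tx_id in event["tx_ids"]:
                if tx_id in created:
                    latencies.append(event["ts"] - created[tx_id])

    timestamps = [event["ts"] for event in events]
    window = max(timestamps) - min(timestamps) if timestamps else 0.0
    if window <= 0:
        raise ValueError("need events spanning a non-zero time window")

    return {
        "tx_per_second": len(created) / window,
        "blocks_per_second": blocks / window,
        "mean_tx_latency": sum(latencies) / len(latencies) if latencies else None,
    }


# Invented sample: four transactions and two block updates over two seconds.
sample_events = [
    {"topic": "tx_created", "ts": 0.0, "tx_id": "tx-1"},
    {"topic": "tx_created", "ts": 0.5, "tx_id": "tx-2"},
    {"topic": "block_committed", "ts": 1.0, "tx_ids": ["tx-1", "tx-2"]},
    {"topic": "tx_created", "ts": 1.0, "tx_id": "tx-3"},
    {"topic": "tx_created", "ts": 1.5, "tx_id": "tx-4"},
    {"topic": "block_committed", "ts": 2.0, "tx_ids": ["tx-3", "tx-4"]},
]

print(operational_metrics(sample_events))
# {'tx_per_second': 2.0, 'blocks_per_second': 1.0, 'mean_tx_latency': 0.75}
```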

Adding a configuration window to the model provided a method to start, stop, pause and mute miners and dynamically change the blockchain update rate. Future enhancements will add deep learning at the miner and consensus process levels to automatically manage blockchain metrics such as block size and elapsed time. The SAS Event Stream Processing engine easily handled 30 miners with blockchain updates every 850 milliseconds. That level of performance makes it well suited to analyzing IoT projects.

What about analyzing data in a real, open source blockchain such as R3 Corda, Hyperledger or Chain? Once the processes for any blockchain are modified to generate the desired events, a SAS Event Stream Processing model similar to this simulator, minus the consensus and configuration windows, could be applied.

As blockchain technologies mature and IoT use cases become the bellwether for blockchain implementations, the need for higher speed block updates, processes and communications will trend toward stream-based composition. The demand for stream-based blockchain analytics technology, such as SAS Event Stream Processing, will prove instrumental to the overall success of blockchains.

Regulatory requirements and blockchain investigations

Public blockchains in the cryptocurrency space are under significant pressure to address topics such as AML, KYC and fraud. With the advent of initial coin offerings and the surging market value of cryptocurrencies, regulatory pressures are increasing all over the world.

SAS Visual Investigator addresses these concerns by supporting a variety of intelligence analysis and investigation management needs. It can reveal suspicious activity while performing fraud, security and compliance investigations. One of its key features is the ability to import various forms of data, then define relationships and user interfaces specific to the imported data.

For example, what if your money laundering investigation included someone with a known Bitcoin address? As a blockchain-based exercise, we created an investigation case using SAS Visual Investigator. Using blockchain.info APIs and Python scripting, we extracted all the transactions for the Bitcoin address, then followed their input transactions back three levels. Using the transaction date, we pulled the Bitcoin price for that date from another web API to get the dollar value at the time the transaction was created.
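The extraction logic can be sketched as follows. This is not the script used for the case: it reads "three levels of input transactions" as following the funding addresses back three hops, it uses a simplified, hypothetical transaction shape rather than the actual blockchain.info response format, and the ledger, addresses and prices are invented. The fetch_txs callable is where a real API call would go.

```python
from datetime import datetime, timezone

SATOSHI_PER_BTC = 100_000_000  # Bitcoin amounts are commonly expressed in satoshi


def trace_inputs(address, fetch_txs, max_depth=3):
    """Follow the inputs that funded an address, up to max_depth levels back."""
    rows, visited = [], {address}
    frontier = [address]
    for level in range(1, max_depth + 1):
        next_frontier = []
        for addr in frontier:
            for tx in fetch_txs(addr):
                # Only follow transactions that paid into this address.
                if not any(out["addr"] == addr for out in tx["outputs"]):
                    continue
                for inp in tx["inputs"]:
                    rows.append({
                        "level": level,
                        "tx_hash": tx["hash"],
                        "from_addr": inp["addr"],
                        "to_addr": addr,
                        "time": tx["time"],
                        "btc": inp["value"] / SATOSHI_PER_BTC,
                    })
                    if inp["addr"] not in visited:
                        visited.add(inp["addr"])
                        next_frontier.append(inp["addr"])
        frontier = next_frontier
    return rows


def add_usd_value(rows, daily_price_usd):
    """Attach the dollar value using the price on the transaction date (UTC)."""
    for row in rows:
        day = datetime.fromtimestamp(row["time"], tz=timezone.utc).strftime("%Y-%m-%d")
        price = daily_price_usd.get(day)
        row["date"] = day
        row["usd"] = None if price is None else round(row["btc"] * price, 2)
    return rows


# Invented sample data standing in for the web API responses.
day0 = int(datetime(2017, 7, 13, 12, tzinfo=timezone.utc).timestamp())
day1 = int(datetime(2017, 7, 14, 12, tzinfo=timezone.utc).timestamp())
t1 = {"hash": "t1", "time": day1,
      "inputs": [{"addr": "B", "value": 150_000_000}],
      "outputs": [{"addr": "A", "value": 150_000_000}]}
t2 = {"hash": "t2", "time": day0,
      "inputs": [{"addr": "C", "value": 200_000_000}],
      "outputs": [{"addr": "B", "value": 200_000_000}]}
ledger = {"A": [t1], "B": [t1, t2], "C": [t2]}
prices = {"2017-07-13": 2400.0, "2017-07-14": 2000.0}  # made-up prices

rows = add_usd_value(trace_inputs("A", lambda a: ledger.get(a, [])), prices)
for row in rows:
    print(row["level"], row["from_addr"], "->", row["to_addr"], row["btc"], row["usd"])
```

Each resulting row is one edge (sender, receiver, amount, date, dollar value), which is a convenient shape for building a network diagram once the data is imported into an investigation tool.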

Interestingly, that first extract included a relayed-by IP address. Using an IP location finder, we identified a latitude and longitude based on the given IP address. The data was aggregated, combined and imported into SAS Visual Investigator. By simply dragging and dropping the data into our case, we were able to show a network diagram of transactions and users, as well as a geographical map of the activity.

Extracting and aggregating Bitcoin input transactions was the challenging part of this case. Due to the anonymity of Bitcoin addresses, other than the known address, only patterns, amounts and possibly the location information added value to the investigation. But with a little work, it’s possible to access and include a variety of data from various blockchain technologies using SAS Visual Investigator.


Future of blockchain analytics

Blockchain-based technologies will continue to expand into many industries and areas. The secure, decentralized essence of blockchain will make it a popular technology option for any system where security is important. From managing smart contracts to validating money transfers, expect to see many common uses of the technology.

As blockchain use increases, more organizations will need to access and analyze the data, even as it grows in complexity and volume. Moving forward, there may also be the need to offer analytics across multiple blockchain variants. I’m excited to work for a company that has anticipated the interest in blockchain technology and is already applying advanced analytics techniques in this evolving space.