Some Facets of Information Theoretical Graph Analytics

F. Oggier (School of Physical and Mathematical Sciences, NTU, Singapore)

CSCIT 2019, CUHK

Centralities and Entropic Centralities

Random Graphs and Power Law

Information Theoretic Clustering

A Case Study: Bitcoin Forensics

A Case Study: Bitcoin Forensics

The Bitcoin network is a peer-to-peer payment network, in which users send and receive the Bitcoin cryptocurrency.

Payments are grouped by transactions, and transactions are recorded into a public distributed database (ledger) called the blockchain.

An actual Bitcoin transaction comprises metadata (including the transaction hash, a unique ID), inputs and outputs.

Every input specifies a previous transaction hash, and the index of the previous transaction's output (outputs are labelled from 0) that is being spent.

Every output contains information that permits to confirm the recipient address, and the transaction amount. The sum of all the output amounts has to be equal to or less than (if a transaction fee is taken) the sum of all the input amounts.

trans ID transactions
1 Inputs: $\varnothing$
1 Outputs: 25.0 $\rightarrow$ addr$_A$
2 Inputs: 1[0]
2 Outputs: 17.0 $\rightarrow$ addr$_B$, 8.0 $\rightarrow$ addr$_A$
3 Inputs: 2[0]
3 Outputs: 8.0 $\rightarrow$ addr$_C$, 9.0 $\rightarrow$ addr$_B$
4 Inputs: 2[1]
4 Outputs: 4.0 $\rightarrow$ addr$_D$, 2.0 $\rightarrow$ addr$_B$, 2.0 $\rightarrow$ addr$_A$
5 Inputs: 3[1], 4[1]
5 Outputs: 11.0 $\rightarrow$ addr$_B$
6 Inputs: 3[0], 5[0]
6 Outputs: 3.0 $\rightarrow$ addr$_D$, 7.0 $\rightarrow$ addr$_C$, 9.0 $\rightarrow$ addr$_B$

For example, the inputs of transaction 5 are 3[1] and 4[1], referring to the second output of transaction 3 and 4. The same address is used (fund consolidation).

9232663ba4db65c3f37626f278e878fd7b5b57429829a9304833f20bccac3031,Tue Feb 10 12:35:46 SGT 2015,1,0000000000000000000000000000000000000000000000000000000000000000, 4294967295,1, 1HyepGCdXGpiR7b9RRuaVPAdmL7x6znqQr,P2PK,BTC 25.02251307 2cac7eaf12763a284d6419dcca16708b7c2d78d76d9ae7f43c752b7baf4dd918,Tue Feb 10 12:35:46 SGT 2015,10,f75b455b4df94bcdcd54985b4aeea9948d2a1dca0f63c65b85ad2d941050c5cf,248, 9abf594926b6f4500a3028ba89d71b0f534ae22c764c1e54361c6340323e0ea9,0, fbddcc10f4330f9d00be1c6252331335d39e68b16ad8988413e34ef143efafa8,1, 82d329e50b19a43869c21bc526e289d58538c0300365792388556e002070ac34,1, d9bb5e06631f4265e645bd41c6cd2c53cb334419652ad3d37159fb22d3b3deb2,0, 656a8da0f6370c38535167ffc50d2782c74b29c6b7ea6e4b039b955cc2aaf9eb,0, 90148e3a102f2eb7eacf5f3764a2d98debdf701e61b7721974ff862520f248fe,0, 0e1f771879876f087b9ac20e8e92435669c6ab0d313888236d57b99ae8de35c6,0, cadacf0524f32f3126ec41370817332c452c8d58af4dd1dfa1dc06787934c0e8,28, 7d64911e2d1321b635be26f0a191d4c0f00183b4f682487b1a28544ab6e230fb,0,1, 1A4eBqEHZBm5CzCg4b14QAiL7swYsRFr4y,P2PK,BTC 341.79767906

Bitcoin Forensics: the Ashley Madison extortion incident that followed the data breach of Ashley Madison website.

Perpetrator(s) asked for 1.05 Bitcoin from Ashley Madison website data breach victims, so that their information was not brought to the attention of close friends and family.

Looking at transactions of 1.05 Bitcoin for the period around 23 August 2015 when the extortion campaign was carried out, the author of the blog does-blackmailing-pay-signs-in -bitcoin-blockchain-of-responses-to-ashley-madison-extortion-emails identified a list of 67 Bitcoin wallet addresses.

Start with these 67 addresses: 61 of the addresses are connected to each other in an undirected variant of the address graph, while the other six are totally isolated.

Look at the distribution of pair-wise distances among the 61 addresses.

Consider graphs of radius 6 around each of the 61 addresses, to identify subsets of nodes within the specified distance: 39 of the nodes were each isolated from all other nodes in the group by a distance of at least 6.

Only one subset comprising 22 of the original 61 nodes, which were identified to be close enough to each other.

Consider subgraphs (of radii 4-5) centered around the node closest on an average to the aforementioned group of 61 nodes, namely 1G52wBtL51GwkUdyJNYvMpiXtqaGkTLrMv. They contain the 22 suspected nodes.

  • a (very) small sample $G_1=(V_1,E_1)$ with $|V_1|=4571$ and $|E_1|=4762$ (radius 4)

  • a small sample $G_2=(V_2,E_2)$ with $|V_2|=82663$, $|E_2|=119190$ (radius 5), within the period of Aug 13 to Sep 6 2017.

Fitting a power law for $G_1$: on the left, the in-degree distribution with $\alpha_{in}=2.38$, $x_{min}^{in}=1$, and on the right, the out-degree distribution $\alpha_{out}=2.08$, $x_{min}^{out}=1$.

Fitting a power law for $G_2$: on the left, the in-degree distribution with $\alpha_{in}=1.47$, $x_{min}^{in}=1$, and on the right, the out-degree distribution $\alpha_{out}=1.493$, $x_{min}^{out}=1$}

Average path length profile for $G_2$ according to the theoretical analysis.

in-address $k^{in}$ out-address $k^{out}$ length $l_{ij}$ actual average
1EuRpZCEoyRKWcEpv6jD42N2CxJWrrETcD 2 15hWpb3m5VXdn9KVsS4rSMnrQQJLhXVyN4 16 2 3.084 5.403
1HZErPx5XPrp8mH98rKRjyazQLYN7j34Lk 2 15hWpb3m5VXdn9KVsS4rSMnrQQJLhXVyN4 16 2 3.084 5.403
17CcixA4fyEmscXPsZRshecXzJ52eoKNNY 2 15hWpb3m5VXdn9KVsS4rSMnrQQJLhXVyN4 16 2 3.084 5.403
1PUt7bm1ZU6BRT85yDJxd651Fgneveuzs3 2 15hWpb3m5VXdn9KVsS4rSMnrQQJLhXVyN4 16 2 3.084 5.403
14gkHySfCnoV4Fy4nCxoLUgavMP3DMXKsS 2 15hWpb3m5VXdn9KVsS4rSMnrQQJLhXVyN4 16 2 3.084 5.403
137ENC6fNM97tawPMAGgyn9BKLEmkTx7Xd 2 1J97MX1kCHMgePTBwB97KaHyjSYnkTyt7V 14 2 3.111 5.602
12bbs3MZAsd9mseXKTA93PtHVd2nSkvjDy 2 15WJMyfHeiD3rT8nvvrnyQs2THA3J3wxnF 11 2 3.161 5.328
1Hmjw6JcNDxCQuVo7pkUmRmJzRJe4yLze9 2 15WJMyfHeiD3rT8nvvrnyQs2THA3J3wxnF 11 2 3.161 5.328
14YeaK34GE68S6YASdtHR1iThj8c8hBQ1C 2 1JujBBkRGAEm7JvdnCDfGw939cEtbuuWa2 9 2 3.203 5.490
1MAwvr21q95UBu6y4WhRM47K5RBijQ164G 2 1JujBBkRGAEm7JvdnCDfGw939cEtbuuWa2 9 2 3.203 5.346
1Kbj3tzAbtHgJHt6h6G5PuvWYRZkWXAfhH 2 1JujBBkRGAEm7JvdnCDfGw939cEtbuuWa2 9 2 3.203 5.346
1KFc1tekeQtZg48pKbCEU2j7J14N2XFTNW 2 1JujBBkRGAEm7JvdnCDfGw939cEtbuuWa2 9 2 3.203 5.346
13Mj4nmV127symEJ6GTTb4kdTWyQ884QoN 2 1JujBBkRGAEm7JvdnCDfGw939cEtbuuWa2 9 2 3.203 5.346
1ETmDWjyto3gGegqch41PCeYJiVx9pmsuv 2 1JujBBkRGAEm7JvdnCDfGw939cEtbuuWa2 9 2 3.203 5.346
1ErwF3T6QCdZ75PBXZdyxXJPN2k7bBV9fd 2 1HZErPx5XPrp8mH98rKRjyazQLYN7j34Lk 2 2 3.514 5.566
1G52wBtL51GwkUdyJNYvMpiXtqaGkTLrMv 2 17CcixA4fyEmscXPsZRshecXzJ52eoKNNY 2 2 3.514 5.566
18euqRRpC2Zp9i9dwrT7Qp3M8jfbu9TUn6 2 14gkHySfCnoV4Fy4nCxoLUgavMP3DMXKsS 2 2 3.514 5.566
1Q3AvrmZ8cykHGQ7kCkSZaopnWiHPzc1Qw 2 14gkHySfCnoV4Fy4nCxoLUgavMP3DMXKsS 2 2 3.514 5.566
1Nr2y7XD8c27tMQF5XrQJdfUWjCZ3zY5uy 2 1PUt7bm1ZU6BRT85yDJxd651Fgneveuzs3 2 2 3.514 5.566
1BkUNiTfJRTTG1iycaWGZavmLnLVHorUUo 2 14YeaK34GE68S6YASdtHR1iThj8c8hBQ1C 2 2 3.514 5.566
in-address $k^{in}$ out-address $k^{out}$ length $l$ $\tau(l)$ $l_{ij}$
1ErwF3T6QCdZ75PBXZdyxXJPN2k7bBV9fd 2 15WJMyfHeiD3rT8nvvrnyQs2THA3J3wxnF 11 4 1.333 3.161
1ErwF3T6QCdZ75PBXZdyxXJPN2k7bBV9fd 2 1JujBBkRGAEm7JvdnCDfGw939cEtbuuWa2 9 4 0.8 3.203
1JBygewRQuHuh4qJnpQtb1qCfwNJ2zNrg5 2 15WJMyfHeiD3rT8nvvrnyQs2THA3J3wxnF 11 4 2.0 3.161
1JBygewRQuHuh4qJnpQtb1qCfwNJ2zNrg5 2 1JujBBkRGAEm7JvdnCDfGw939cEtbuuWa2 9 4 0.8 3.203
1G52wBtL51GwkUdyJNYvMpiXtqaGkTLrMv 2 15WJMyfHeiD3rT8nvvrnyQs2THA3J3wxnF 11 4 1.333 3.161
1G52wBtL51GwkUdyJNYvMpiXtqaGkTLrMv 2 1JujBBkRGAEm7JvdnCDfGw939cEtbuuWa2 9 4 0.8 3.203
18euqRRpC2Zp9i9dwrT7Qp3M8jfbu9TUn6 2 15WJMyfHeiD3rT8nvvrnyQs2THA3J3wxnF 11 4 1.333 3.161
18euqRRpC2Zp9i9dwrT7Qp3M8jfbu9TUn6 2 1JujBBkRGAEm7JvdnCDfGw939cEtbuuWa2 9 4 0.8 3.203
1Q3AvrmZ8cykHGQ7kCkSZaopnWiHPzc1Qw 2 15WJMyfHeiD3rT8nvvrnyQs2THA3J3wxnF 11 4 1.333 3.161
1Q3AvrmZ8cykHGQ7kCkSZaopnWiHPzc1Qw 2 1JujBBkRGAEm7JvdnCDfGw939cEtbuuWa2 9 4 0.8 3.203
1Nr2y7XD8c27tMQF5XrQJdfUWjCZ3zY5uy 2 15WJMyfHeiD3rT8nvvrnyQs2THA3J3wxnF 11 4 1.333 3.161
1Nr2y7XD8c27tMQF5XrQJdfUWjCZ3zY5uy 2 1JujBBkRGAEm7JvdnCDfGw939cEtbuuWa2 9 4 0.8 3.203
1BkUNiTfJRTTG1iycaWGZavmLnLVHorUUo 2 15WJMyfHeiD3rT8nvvrnyQs2THA3J3wxnF 11 4 1.333 3.161

$\tau(l)=\tfrac{l}{{\mbox{no. of shortest paths}}}$ where $l$ is the length of the shortest path between two nodes of given degrees.

Comparing both tables, we identify wallet addresses 1Nr2y7XD8c27tMQF5XrQJdfUWjCZ3zY5uy and 1JujBBkRGAEm7JvdnCDfGw939cEtbuuWa2.

Trace the money flow confined and circulating among certain addresses, indicating potential coordination among them. Two approaches: an algorithm for path confluences and clustering.

Path confluence among pairs of nodes from the shortlist of 22 nodes: 102 paths of confluence were found.

This identifies 23 further addresses that the original suspect addresses seem to be liaising with. This thus yields a new group of 45 suspect addresses.

Not in itself a definitive proof of culpability: it is only indicative.

Directed clustering: flow-based

BIVA Demo.

Phetsouvanh, Oggier, Datta, EGRET: Extortion Graph Exploration Techniques in the Bitcoin Network

Oggier, Phetsouvanh, Datta, BiVA: Bitcoin Network Visualization & Analysis