Paper: Anti-Money Laundering in Bitcoin: Experimenting with GCN Networks for Financial Forensics

Preface

Last Updated : 2022/04/21

Paper title: Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics

Reference: Mark Weber, Giacomo Domeniconi, Jie Chen, Daniel Karl I. Weidele, Claudio Bellei, Tom Robinson, and Charles E. Leiserson. 2019. Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics. In Proceedings of ACM Conference (KDD ’19 Workshop on Anomaly Detection in Finance). ACM, New York, NY, USA, 7 pages.

Keywords: Graph Convolutional Networks, Anomaly Detection, Financial Forensics, Cryptocurrency, Anti-Money Laundering, Visualization

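Background

I came across this paper after studying the GCN model, while looking for anomaly detection scenarios that suit it: applying GCNs to anti-money laundering on Bitcoin transactions. Although the results show that a plain GCN underperforms Random Forest on this dataset, the paper remains optimistic about graph-based methods and prompts some thoughts on how GCN could be improved. The authors are from MIT, IBM, and Elliptic (a blockchain analytics company). My notes follow.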

Financial background

Anti-money laundering (AML) regulations play a critical role in safeguarding financial systems, but bear high costs for institutions and drive financial exclusion for those on the socioeconomic and international margins. The advent of cryptocurrency has introduced an intriguing paradox: pseudonymity allows criminals to hide in plain sight, but open data gives more power to investigators and enables the crowdsourcing of forensic analysis.

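This part mainly covers why AML work matters and the opportunities and challenges that the cryptocurrency era brings to it. These notes focus on the technical content, so I will not go into detail here.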

Elliptic Dataset

See the previous post, which summarizes the dataset.

Task

AML analytics is an anomaly detection challenge of accurately classifying a small number of illicit transactions in massive, ever-growing data sets.

We want to reduce false positive rates without increasing false negative rates, i.e. include more innocent people without allowing more criminals.

The main task is to handle this kind of dataset: very large, with heavily imbalanced labels, and growing in real time.
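To make the false positive / false negative trade-off concrete, here is a minimal sketch that computes the rates for the illicit class on a toy label vector. The function name `illicit_metrics` and the toy labels are illustrative assumptions, not taken from the paper.

```python
def illicit_metrics(y_true, y_pred, positive=1):
    """Confusion counts and rates, treating `positive` (illicit) as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }


# Toy example (made up): 10 transactions, only 2 of them illicit (label 1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
m = illicit_metrics(y_true, y_pred)
print(m)
```

On imbalanced data like this, accuracy can look acceptable while half of the illicit transactions are missed, which is why the illicit-class rates are worth tracking separately.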

Methods

Benchmark methods:

Benchmark ML methods use the first 94 features in supervised learning for binary classification. Such techniques include Logistic Regression (LR), Multilayer Perceptron (MLP), and Random Forest (RF).

LR: for explainability

MLP: each input neuron takes in a data feature and the output is a softmax with a probability vector for each class.

RF: for accuracy

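This section picks traditional ML methods for comparison, such as LR, which has an edge in explainability, and RF, a model commonly used on imbalanced datasets.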
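The MLP description above (one input neuron per feature, softmax output over the classes) can be sketched as a single forward pass in NumPy. This is only an illustration under assumptions: the weights are random rather than trained, ReLU is assumed for the hidden layer, and the hidden size of 50 follows the experiment settings listed in the Experiments section. All names are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def mlp_forward(X, W1, b1, W2, b2):
    """One hidden layer followed by a softmax over the classes."""
    hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU is an assumption here
    return softmax(hidden @ W2 + b2)


rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 94, 50, 2  # licit vs. illicit

# Random stand-ins for 8 transactions and for untrained weights.
X = rng.normal(size=(8, n_features))
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

probs = mlp_forward(X, W1, b1, W2, b2)
print(probs.shape)  # one probability vector per transaction
```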

Graph-based ML methods:

Original GCN

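We consider Graph Convolutional Networks (GCNs). A GCN consists of multiple layers of graph convolution, which is similar to a perceptron but additionally uses a neighborhood aggregation step motivated by spectral convolution.

Consider the Bitcoin transaction graph from the Elliptic Data Set as $G = (N, E)$, where $N$ is the set of node transactions and $E$ is the set of edges representing the flow of BTC. The $l$-th layer of the GCN takes the adjacency matrix $A$ and the node embedding matrix $H^{(l)}$ as input, and uses a weight matrix $W^{(l)}$ to update the node embedding matrix to $H^{(l+1)}$ as output. Mathematically, we write

$$H^{(l+1)} = \sigma\left(\hat{A} H^{(l)} W^{(l)}\right)$$

where $\hat{A}$ is a normalization of $A$ defined as:

$$\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}, \quad \tilde{A} = A + I, \quad \tilde{D} = \mathrm{diag}\left(\sum_j \tilde{A}_{ij}\right)$$

and $\sigma$ is the activation function, typically ReLU; the initial embedding is the node feature matrix, $H^{(0)} = X$; the output layer uses softmax.

A 2-layer GCN, which is often used, can be written as:

$$H^{(2)} = \mathrm{softmax}\left(\hat{A} \cdot \mathrm{ReLU}\left(\hat{A} X W^{(0)}\right) \cdot W^{(1)}\right)$$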
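The 2-layer GCN forward pass can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes the usual normalization with self-loops, uses a tiny undirected toy graph, random untrained weights, and hypothetical function names.

```python
import numpy as np


def normalize_adjacency(A):
    """Add self-loops, then scale symmetrically by the inverse square root of the degrees."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # degrees are >= 1 thanks to self-loops
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def gcn_two_layer(A_hat, X, W0, W1):
    """Hidden layer with ReLU, output layer with softmax."""
    H1 = np.maximum(0.0, A_hat @ X @ W0)  # aggregate neighbors, then transform
    return softmax(A_hat @ H1 @ W1)


# Toy path graph 0 - 1 - 2 - 3 (made up; stands in for a transaction graph).
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))               # 5 toy features per node
W0 = rng.normal(scale=0.5, size=(5, 8))   # hidden embedding size 8 (arbitrary)
W1 = rng.normal(scale=0.5, size=(8, 2))   # 2 classes: licit / illicit

A_hat = normalize_adjacency(A)
out = gcn_two_layer(A_hat, X, W0, W1)
print(out)  # one class-probability row per node
```

Each multiplication by the normalized adjacency mixes a node's features with those of its direct neighbors, so two layers let information travel two hops.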

Skip-GCN

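A “skip” variant, which we find practically useful, inserts a skip connection between the intermediate embedding $H^{(1)}$ and the input node features $X$:

$$H^{(2)} = \mathrm{softmax}\left(\hat{A} \cdot \mathrm{ReLU}\left(\hat{A} X W^{(0)}\right) \cdot W^{(1)} + X \tilde{W}^{(1)}\right)$$

where $\tilde{W}^{(1)}$ is a weight matrix for the skip connection.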
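A minimal NumPy sketch of the skip variant, assuming the skip term is a linear map of the raw node features added before the output softmax. The toy graph, the random weights, and the names (`skip_gcn_two_layer`, `W_skip`) are illustrative assumptions.

```python
import numpy as np


def normalize_adjacency(A):
    """Self-loops plus symmetric degree normalization."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def skip_gcn_two_layer(A_hat, X, W0, W1, W_skip):
    """2-layer GCN whose output logits also receive X @ W_skip (the skip connection)."""
    H1 = np.maximum(0.0, A_hat @ X @ W0)
    return softmax(A_hat @ H1 @ W1 + X @ W_skip)


# Toy triangle-plus-tail graph (made up).
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 5))
W0 = rng.normal(scale=0.5, size=(5, 8))
W1 = rng.normal(scale=0.5, size=(8, 2))
W_skip = rng.normal(scale=0.5, size=(5, 2))  # maps raw features straight to class logits

A_hat = normalize_adjacency(A)
skipped = skip_gcn_two_layer(A_hat, X, W0, W1, W_skip)
# With a zero skip matrix the model reduces to the plain 2-layer GCN.
plain = skip_gcn_two_layer(A_hat, X, W0, W1, np.zeros_like(W_skip))
print(skipped - plain)
```

The skip term lets a node's own features reach the output without being averaged with its neighbors, so when `W_skip` is the only nonzero path the model behaves like logistic regression on the raw features.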

EvolveGCN

Financial data are inherently temporal as transactions are time stamped. A prediction model will be more useful if it is designed in a manner to capture the dynamism. This way, a model trained on a given time period may better generalize to subsequent time steps. The better the model captures system dynamics, which are also evolving, the longer horizon it can forecast into.

Aldo Pareja, Giacomo Domeniconi, Jie Chen, Tengfei Ma, Toyotaro Suzumura, Hiroki Kanezashi, Tim Kaler, and Charles E. Leiserson. 2019. EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs. Preprint arXiv:1902.10191.

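This section introduces the graph-based ML methods used in the experiments, all from the GCN family: the original GCN, the slightly modified Skip-GCN, and EvolveGCN, which can handle datasets that change over time.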

Experiments

We performed a 70:30 temporal split of training and test data, respectively. That is, the first 34 time steps are used for training the model and the last 15 for test. The split is made along time steps to mimic how the model would be used in practice.
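The temporal split can be sketched as below: every transaction with time step 1 to 34 goes to training and time steps 35 to 49 go to test, with no shuffling across time. The array of time steps is synthetic and the function name is hypothetical.

```python
import numpy as np


def temporal_split(time_steps, last_train_step=34):
    """Return (train_idx, test_idx) so that no test sample precedes a training sample in time."""
    time_steps = np.asarray(time_steps)
    train_idx = np.flatnonzero(time_steps <= last_train_step)
    test_idx = np.flatnonzero(time_steps > last_train_step)
    return train_idx, test_idx


# Synthetic stand-in: 1000 transactions with a time step in 1..49 (34 + 15 steps).
rng = np.random.default_rng(0)
time_steps = rng.integers(1, 50, size=1000)  # upper bound is exclusive

train_idx, test_idx = temporal_split(time_steps)
print(len(train_idx), len(test_idx))
```

Splitting by time rather than at random keeps later transactions out of training, which is closer to deployment, where a model only ever sees the past.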

MLP: one hidden layer of 50 neurons, 200 epochs, Adam optimizer, a learning rate of 0.001.

GCN: 2-layer, 1000 epochs, Adam optimizer, learning rate of 0.001, node embedding size of 100, weighted cross-entropy loss (to handle class imbalance) with a 0.3/0.7 weight ratio for the licit and illicit classes.
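A minimal sketch of a class-weighted cross-entropy with weights 0.3 (licit) and 0.7 (illicit). The normalization by the sum of the sample weights is an assumption (it mirrors a common framework convention); the source does not state which normalization was used. Names and toy values are illustrative.

```python
import numpy as np


def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean of the per-sample negative log-likelihoods."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    w = np.asarray(class_weights, dtype=float)[labels]   # weight of each sample's true class
    picked = probs[np.arange(len(labels)), labels]       # predicted probability of the true class
    nll = -np.log(np.clip(picked, 1e-12, 1.0))
    return float((w * nll).sum() / w.sum())              # assumed normalization


class_weights = [0.3, 0.7]  # index 0 = licit, index 1 = illicit

# Two toy predictions that both lean "licit"; the second one is actually illicit.
probs = np.array([[0.9, 0.1],
                  [0.9, 0.1]])
labels = np.array([0, 1])

weighted = weighted_cross_entropy(probs, labels, class_weights)
unweighted = weighted_cross_entropy(probs, labels, [1.0, 1.0])
print(weighted, unweighted)
```

Because the missed illicit sample carries the larger weight, the weighted loss comes out higher than the unweighted one, which pushes training to care more about the minority class.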

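Focusing on the metric that matters most for this task, the table above leads to the following conclusions:

Using the aggregated features improves model performance, which is an indirect way of exploiting the graph structure.
Comparing models alone, Random Forest is the best, and the RF that uses both the aggregated features and the node embeddings obtained from GCN preprocessing is the best in the whole table.
Obtaining node embeddings from a GCN first improves model performance, i.e. the GCN is used as a preprocessing tool.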

The table above shows that EvolveGCN, which exploits temporal information, outperforms Skip-GCN.

One interesting aspect of this data set is the sudden closure of a dark market occurring during the time span of the data (at time step 43). As seen in Figure 2, this event causes all methods to perform poorly after the shutdown.

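The figure above shows that when the market changes abruptly, as with the dark market shutdown, the performance of every model drops sharply, so robustness to sudden events still has plenty of room for improvement.

Summary

This paper presents an experiment with GCN models for anti-money laundering anomaly detection on Bitcoin transactions. Its conclusion is that the best main model is still the traditional Random Forest, while GCN can be considered as a preprocessing tool for obtaining node embeddings. The authors see great promise in graph-based methods, and further integrating RF with GCN may be a direction for future work.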

Roy's Blog