MLG笔记:(九)Embeeding Entire Graphs

Introduction

不同於上一篇談到的將節點進行embedding，現在我們關注於如何將一個graph或者一個subgraph進行embedding。

Goal: want to embed a subgraph or an entire graph . Graph Graph embedding:

對應的應用： * 在醫藥上判斷某個分子(原子形成的graph)是否有毒 * Graph anomaly detection

這種方式即使看起來非常直接簡單卻也可以得到良好的效果。

第二種方法在將引入一種Virtual Node(VN)來表示graph(subgraph)，然後再進行諸如node2vec等standard node embedding tech。

具體流程是：

States in anonymous walk(AW) correspond to the index of the first time we visited the node in a random walk.

根據節點被第幾個訪問作為index來標註walk，如下圖：

我們先來看看AW長度與其對應可能AW種類之間的關係(如長度為3時可能對應111,112,121,123,122五種情況)：

可見這種正常大約呈現指數增長。

那如何將這種方式應用在graph呢，接下來：

我們如何計算上邊第3步的不同AW種類的probability分佈，即我們要進行幾次的walk呢？這邊提供一個公式:

其中代表能容忍的最大錯誤率，代表最大機率，代表AW長度對應的種類數。

上邊我們用sample得到的不同種類AW出現次數除以總次數得到機率，也許我們可以考慮其他方法：try to learn embedding of anonymous walk

具體為：

Learn a graph embedding together with all the anonymous walk embeddings : ,where is the number of sampled anonymous walks.

那麼如何對AW進行embed？如下：

經過以上步驟，我們已經獲得了graph embedding vector ，我們可以用這個做一些預測任務，諸如圖分類等。

因為有些graph中的節點具有cluster的特性，可以相應應用Hierarchical embedding的形式，如下圖示意：

Clustering/community detection:Cluster points 我們在graph上計算embedding得到，接著在上應用現有的clustering算法。
Node classification: Predict label of node based on
Link prediction: Predict edge based on
Graph classification:Graph embedding via aggregating node embeddings or anonymous random walks. Predict label based on graph embedding .