Goal: We want features that characterize the structure of an entire graph

Background: Kernel Methods

Idea: Design kernels instead of feature vectors.

What is kernel:

TODO:kernel詳情


Graph Kernels

Graph Kernel: Measure similarity between two graphs

Goal: Design graph feature vector

Key idea: Bag-of-words(BoW) for a graph

假如簡單的將nodes當做words那麼如上圖將很容易造成對不同的圖提取出相同的feature vector。所以嘗試用node degree來提取feature vector,如下圖:

Both Graphlet Kernel and Weisfeiler-Lehman(WL) Kernel use Bag-of-* representation of graph, where * is more sophisticated than node degrees.

  1. Graphlet Kernel

    key idea: Count the number of different graphlets in a graph

    注意:這邊的graphlet和node-level feature中提到的有些許不同。

    Given graph , and a graphlet list , define the graphlet count vector(GCV) as # for

    Compute graphlet kernel:

    Given two graphs, and , graphlet kernel is computed as

    計算graphlet kernel時可以用的GCV的轉置矩陣dot product上的GCV得到。但當兩個graph的size不同時結果會發生偏差,所以使用以下方法normalize:

    Limitations: Counting graphlets is expensive.

    Graphlet kernel的局限性在於對graphlet的計數所花的計算成本非常高。

  2. Weisfeiler-lehman Kernel

Goal: design an efficient graph feature descriptor

Idea: use neighborhood structure to iteratively enrich node vocabulary.

Algorithm to achieve this: color refinement

Color Refinement:

e.g.

將節點以其鄰居節點情況進行合併

應用hash table:

再進行合併:

再進行hash:

以(hash-合併)步驟持續迭代,而後WL kernel統計不同顏色的數量形成顏色向量(11 12 13似乎有勘誤):

視覺化演示

接著計算兩個向量的inner product即得到WL Kernel value:

WLkernel在計算上是非常高效的,其複雜度與節點數量呈線性關係。


參考

  • http://web.stanford.edu/class/cs224w/
  • https://github.com/holgerdell/color-refinement