前言

因為做林軒田老師作業的原因接觸到了由同系Machine Learning Group團隊開發的LIBLINEAR套件。該套件提供了對超大數據量下之各種線性分類器的支持,包括:

  • L2-regularized classifiers
  • L2-loss linear SVM, L1-loss linear SVM, and logistic regression (LR)
  • L1-regularized classifiers (after version 1.4)
  • L2-loss linear SVM and logistic regression (LR)
  • L2-regularized support vector regression (after version 1.9)
  • L2-loss linear SVR and L1-loss linear SVR.
  • L2-regularized one-class support vector machines (after version 2.40)

MORE INFO


作者環境

1
2
3
LIBLINEAR Version: 2.43
Python Version: Anaconda with pyhton 3.8
IDE: Pycharm

python下套件安裝

在V2.43後安裝已經非常方便,直接使用pip安裝即可。

1
pip install -U liblinear-official

當然,LIBLINEAR也支持不同的平台和語言,更多的安裝方式可以在上邊的MORE INFO中找到。


Data格式

LIBLINEAR對資料格式有特別的要求,在應用套件之前應該先將原本的data set處理為目標格式。其格式形式為:Label feature_index_1:feature_value_1 feature_index_2:feature_value_2 ... feature_index_N:feature_value_N 例如(套件自帶範例heart_scale):

1
2
3
4
5
6
7
8
9
10
+1 1:0.708333 2:1 3:1 4:-0.320755 5:-0.105023 6:-1 7:1 8:-0.419847 9:-1 10:-0.225806 12:1 13:-1 
-1 1:0.583333 2:-1 3:0.333333 4:-0.603774 5:1 6:-1 7:1 8:0.358779 9:-1 10:-0.483871 12:-1 13:1
+1 1:0.166667 2:1 3:-0.333333 4:-0.433962 5:-0.383562 6:-1 7:-1 8:0.0687023 9:-1 10:-0.903226 11:-1 12:-1 13:1
-1 1:0.458333 2:1 3:1 4:-0.358491 5:-0.374429 6:-1 7:-1 8:-0.480916 9:1 10:-0.935484 12:-0.333333 13:1
-1 1:0.875 2:-1 3:-0.333333 4:-0.509434 5:-0.347032 6:-1 7:1 8:-0.236641 9:1 10:-0.935484 11:-1 12:-0.333333 13:-1
-1 1:0.5 2:1 3:1 4:-0.509434 5:-0.767123 6:-1 7:-1 8:0.0534351 9:-1 10:-0.870968 11:-1 12:-1 13:1
+1 1:0.125 2:1 3:0.333333 4:-0.320755 5:-0.406393 6:1 7:1 8:0.0839695 9:1 10:-0.806452 12:-0.333333 13:0.5
+1 1:0.25 2:1 3:1 4:-0.698113 5:-0.484018 6:-1 7:1 8:0.0839695 9:1 10:-0.612903 12:-0.333333 13:1
+1 1:0.291667 2:1 3:1 4:-0.132075 5:-0.237443 6:-1 7:1 8:0.51145 9:-1 10:-0.612903 12:0.333333 13:1
+1 1:0.416667 2:-1 3:1 4:0.0566038 5:0.283105 6:-1 7:1 8:0.267176 9:-1 10:0.290323 12:1 13:1


Python中使用

常用函數:

  • svm_read_problem

  • train

  • predict

函數參數詳情:

例子:

1
2
3
4
5
6
7
8
9
10
11
from liblinear.liblinearutil import * # Updated in V2.43

# 獲取訓練數據,從本目錄下讀取訓練檔和測試檔到y_tr(label),x_tr(feature info)
y_tr, x_tr = svm_read_problem('.\\f_traindata')
y_te, x_te = svm_read_problem('.\\f_testdata')

# 設計訓練參數和訓練模型(以L2-regularized logistic regression為例)
model1 = train(y_tr,x_tr,'-s 0 -c 5000 -e 0.000001')

# 使用模型進行預測
predict(y_te,x_te,model1)

訓練參數與實際應用例子

關於函數參數的選擇應參考實際用途,涉及數學的參數應比對下邊document中定義之模型數學式比對計算。如L2-regularized LR在document中的數學定義為:

即:

應對比自己應用如:

計算C值作為參數傳入train函數中。


參考