Preface
These notes come from Professor Hsuan-Tien Lin's CSIE5043 Machine Learning course (Fall 2021) at National Taiwan University and serve as a study record.
What is Machine Learning
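skill: improve some performance measure (e.g. prediction accuracy)
A skill here means the ability to improve a performance measure, for example raising prediction accuracy.
Whereas humans learn skills from observation, machines learn skills from data.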
Why use ML
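Take tree recognition as an example. With traditional programming, we run into the problem that the concept of a "tree" is hard to define with explicit rules, whereas building an ML-based tree recognition system is comparatively much simpler. This means ML is a viable alternative when facing complex problems.
Some Use Scenarios: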
when humans cannot program the system manually
e.g. navigating on Mars
when humans cannot 'define the solution' easily
e.g. speech/visual recognition
when rapid decisions are needed that humans cannot make
e.g. high-frequency trading
when the system needs to be user-oriented on a massive scale
e.g. consumer-targeted marketing
Key Essence of ML
- there exists some 'underlying pattern' to be learned, so the 'performance measure' can be improved
- but there is no programmable (easy) definition
- somehow there is data about the pattern, so ML has some 'inputs' to learn from
Application of ML
Disease diagnosis:
4G communications:
Defective PCB detection:
Components of ML
Formalize the Learning Problem:
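ML: use data to compute hypothesis $g$ that approximates target $f$
In other words, ML uses the dataset to select, from a set of candidate functions (the hypothesis set), the one closest to the ideal target function.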
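The selection step described above can be made concrete with a toy sketch. Everything here is an assumption for illustration: the four data points, the family of threshold functions used as the hypothesis set, and the rule of picking the candidate with the fewest training mistakes. It is not the course's own example.

```python
from typing import Callable, List, Tuple

# Hypothetical toy data: (feature x, label y), produced by some unknown target f.
data: List[Tuple[float, int]] = [(1.0, -1), (2.0, -1), (3.0, 1), (4.0, 1)]


def make_threshold_hypothesis(theta: float) -> Callable[[float], int]:
    """Return the hypothesis h(x) = +1 if x > theta, else -1."""
    return lambda x: 1 if x > theta else -1


# Hypothetical finite hypothesis set: one threshold function per candidate theta.
thetas = [0.5, 1.5, 2.5, 3.5]
hypothesis_set = {theta: make_threshold_hypothesis(theta) for theta in thetas}


def training_errors(h: Callable[[float], int],
                    samples: List[Tuple[float, int]]) -> int:
    """Count how many samples the hypothesis h labels incorrectly."""
    return sum(1 for x, y in samples if h(x) != y)


# The "learning algorithm" in this sketch: choose the hypothesis g with the
# fewest mistakes on the data, hoping it approximates the target f.
best_theta = min(hypothesis_set,
                 key=lambda t: training_errors(hypothesis_set[t], data))
g = hypothesis_set[best_theta]
```

Real learning algorithms usually search much larger (often infinite) hypothesis sets, so exhaustive comparison like this only works for tiny examples.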
ML and other fields
ML and Data Mining
ML: use data to compute hypothesis $g$ that approximates target $f$
DM: use (huge) data to find property that is interesting.
In practice, ML and DM are often hard to tell apart.
ML and Statistics
ML: use data to compute hypothesis $g$ that approximates target $f$
Statistics: use data to make inference about an unknown process.
In practice, statistics provides many useful tools for ML.
Perceptron Hypothesis Set
Perceptron is a simple hypothesis set.
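Each customer feature (age, salary, and so on) is given its own weight. When the weighted sum of the features exceeds some threshold, the credit card is approved; otherwise it is rejected.
The function h(x) in the hypothesis set can be simplified further (by absorbing the threshold into the weight vector):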
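A minimal sketch of this simplification, assuming the usual trick of prepending a constant feature x_0 = 1 and setting w_0 = -threshold, so that sign(w·x - threshold) becomes a single inner product. The function names, the sample applicant, and the convention that sign(0) counts as -1 are illustrative assumptions.

```python
from typing import Sequence


def sign(s: float) -> int:
    # Assumed convention for this sketch: a score of exactly 0 is treated as -1 (reject).
    return 1 if s > 0 else -1


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def h_with_threshold(x: Sequence[float], w: Sequence[float], threshold: float) -> int:
    """Original form: approve (+1) when the weighted sum exceeds the threshold."""
    return sign(dot(w, x) - threshold)


def h_absorbed(x: Sequence[float], w: Sequence[float], threshold: float) -> int:
    """Simplified form: the threshold becomes weight w_0 on a constant feature x_0 = 1."""
    x_tilde = [1.0, *x]          # prepend x_0 = 1
    w_tilde = [-threshold, *w]   # prepend w_0 = -threshold
    return sign(dot(w_tilde, x_tilde))


# Hypothetical applicant: [age in years, salary in units of 10k].
applicant = [30.0, 5.0]
weights = [0.1, 0.5]
threshold = 5.0

decision_original = h_with_threshold(applicant, weights, threshold)
decision_absorbed = h_absorbed(applicant, weights, threshold)
```

Both functions compute the same decision; the absorbed form is convenient because a learning algorithm then only has to update one vector.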
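Different weight vectors $w$ give different hypotheses $h$.
Perceptrons in $\mathbb{R}^2$
As the vector $w$ changes, the line separating the positive region from the negative region changes with it.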
How to select a $g$ close to the target function $f$
Iteration process:
Omitted
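Since the iteration steps are not written out here, the following is a rough sketch of the standard perceptron learning algorithm (PLA) as commonly taught: start from the zero vector, find a misclassified example, and correct with w ← w + y·x. The toy samples, the `max_updates` safeguard, and the sign(0) = -1 convention are assumptions of this sketch, not details taken from the notes.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (x with x_0 = 1 prepended, label y in {-1, +1})


def sign(s: float) -> int:
    # Assumed convention: a score of exactly 0 is classified as -1.
    return 1 if s > 0 else -1


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def pla(samples: Sequence[Sample], max_updates: int = 10_000) -> Tuple[List[float], int]:
    """Run PLA from w = 0; return the final weights and the number of corrections."""
    w = [0.0] * len(samples[0][0])
    updates = 0
    while updates < max_updates:
        # Find the first example that the current w misclassifies, if any.
        mistake = next(((x, y) for x, y in samples if sign(dot(w, x)) != y), None)
        if mistake is None:
            return w, updates  # no mistakes left: w separates the data
        x, y = mistake
        w = [wi + y * xi for wi, xi in zip(w, x)]  # correction: w <- w + y * x
        updates += 1
    raise RuntimeError("no separating w found; the data may not be linearly separable")


# Hypothetical linearly separable toy data, each x written as (1, x1, x2).
samples: List[Sample] = [
    ([1.0, 2.0, 2.0], 1),
    ([1.0, 3.0, 1.0], 1),
    ([1.0, 0.0, 0.0], -1),
    ([1.0, -1.0, 1.0], -1),
]

w, updates = pla(samples)
```

The `max_updates` cap is only a guard for data that is not linearly separable, where the plain algorithm would loop forever.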
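Proof of PLA feasibility (convergence):
Assume the data are linearly separable, so some $w_f$ satisfies $y_n w_f^T x_n > 0$ for every $n$. Each correction uses a misclassified example $(x_n, y_n)$ and updates $w_{t+1} = w_t + y_n x_n$, which gives Equation 1:
$$
w_f^T w_{t+1} = w_f^T w_t + y_n w_f^T x_n \ge w_f^T w_t + \min_n y_n w_f^T x_n
$$
Because a correction happens only on a mistake,
$$
y_n w_t^T x_n \le 0
$$
Substituting this into the expansion of $\|w_{t+1}\|^2$ gives Equation 2:
$$
\|w_{t+1}\|^2 = \|w_t\|^2 + 2 y_n w_t^T x_n + \|x_n\|^2 \le \|w_t\|^2 + \max_n \|x_n\|^2
$$
Summing Equation 1 and Equation 2 over $T$ corrections (the intermediate terms telescope) and taking $w_0 = 0$ gives $w_f^T w_T \ge T \min_n y_n w_f^T x_n$ and $\|w_T\|^2 \le T \max_n \|x_n\|^2$, so
$$
1 \ge \frac{w_f^T w_T}{\|w_f\| \, \|w_T\|} \ge \sqrt{T} \, \frac{\rho}{R}, \qquad R^2 = \max_n \|x_n\|^2, \quad \rho = \min_n y_n \frac{w_f^T x_n}{\|w_f\|}
$$
That is, in the linearly separable case, the number of mistake corrections $T$ that PLA performs is bounded above by $R^2 / \rho^2$.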
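As a numerical sanity check, the sketch below compares the number of PLA corrections on a toy dataset with the standard bound R²/ρ², where R² is the largest squared norm of any x_n and ρ is the smallest margin y_n·(w_f·x_n)/‖w_f‖ for a separating vector w_f. The dataset and the particular w_f are hypothetical choices made for this example, and PLA is assumed to start from w = 0.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (x with x_0 = 1 prepended, label y in {-1, +1})


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def count_pla_corrections(samples: Sequence[Sample], max_updates: int = 10_000) -> int:
    """Run PLA from w = 0 and return how many corrections it made."""
    w = [0.0] * len(samples[0][0])
    for updates in range(max_updates):
        # A mistake means the predicted sign (0 counted as -1) differs from y.
        mistake = next(((x, y) for x, y in samples
                        if (1 if dot(w, x) > 0 else -1) != y), None)
        if mistake is None:
            return updates
        x, y = mistake
        w = [wi + y * xi for wi, xi in zip(w, x)]
    raise RuntimeError("PLA did not halt within max_updates")


def convergence_bound(samples: Sequence[Sample], w_f: Sequence[float]) -> float:
    """Return R^2 / rho^2 for a vector w_f that separates the samples."""
    rho = min(y * dot(w_f, x) for x, y in samples) / math.sqrt(dot(w_f, w_f))
    if rho <= 0:
        raise ValueError("w_f does not separate the samples")
    r_squared = max(dot(x, x) for x, _ in samples)
    return r_squared / rho ** 2


# Hypothetical separable data and one (not necessarily optimal) separating vector.
samples: List[Sample] = [
    ([1.0, 2.0, 2.0], 1),
    ([1.0, 3.0, 1.0], 1),
    ([1.0, 0.0, 0.0], -1),
    ([1.0, -1.0, 1.0], -1),
]
w_f = [-1.5, 1.0, 1.0]

corrections = count_pla_corrections(samples)
bound = convergence_bound(samples, w_f)
```

In practice w_f is unknown, so the bound cannot be computed ahead of time; it only guarantees that PLA halts on linearly separable data.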