XGBoost

Developer	The XGBoost Contributors
Initial release	March 27, 2014
Stable release	2.1.4^[1] / February 7, 2025
Repository	github.com/dmlc/xgboost
Written in	C++
Operating system	Linux, macOS, Windows
Type	Machine learning
License	Apache License 2.0
Website	xgboost.ai

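XGBoost^[2] is an open-source software library that provides a regularizing gradient boosting framework for C++, Java, Python^[3], R^[4], Julia^[5], Perl^[6], and Scala. It runs on Linux, Windows^[7], and macOS^[8]. According to the project description, it aims to provide a "Scalable, Portable and Distributed Gradient Boosting (GBM, GBRT, GBDT) Library". Besides running on a single machine, it also runs on the distributed processing frameworks Apache Hadoop, Apache Spark, Apache Flink, and Dask^[9]^[10].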

It has gained popularity and attention as the algorithm of choice for many winning teams in machine learning competitions^[11].

LightGBM and CatBoost are other algorithms that are likewise based on gradient boosting.

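XGBoost started as a research project by Tianqi Chen, a member of the Distributed (Deep) Machine Learning Community (DMLC) group^[12]. Initially it was a terminal application that could be configured with a libsvm configuration file. It became widely known in machine learning competition circles after it was used in the winning solution of the Higgs Machine Learning Challenge. Python and R packages were built soon afterwards, followed by package implementations for Java, Scala, Julia, Perl, and other languages. This brought XGBoost to more developers and made it popular in the Kaggle community, where it has been used in many competitions^[11].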

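It was soon integrated with many other packages, which made it easier to use within their respective communities: with scikit-learn for Python users and with the caret package for R users. Using the abstracted Rabit^[13] and XGBoost4J, it can also be integrated into data flow frameworks such as Apache Spark, Apache Hadoop, and Apache Flink^[14]. XGBoost is also available on OpenCL for FPGAs^[15]. An efficient, scalable implementation of XGBoost was published by Tianqi Chen and Carlos Guestrin^[16].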

Features

XGBoost has the following features, which distinguish it from other gradient boosting algorithms^[17]^[18]^[19]:

Clever penalization of trees
A proportional shrinking of leaf nodes
Newton Boosting
Extra randomization parameter
Implementation on single, distributed systems and out-of-core computation
Automatic feature selection
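
The tree penalization in the list above can be illustrated with a small sketch. It assumes the regularized objective described in the Chen and Guestrin paper, with an L2 penalty `lam` on leaf values and a fixed cost `gamma` per additional leaf. The function names are hypothetical and are not part of the XGBoost API. Here `G` and `H` stand for the sums of first and second derivatives of the loss over the samples in a leaf.

```python
def leaf_weight(G, H, lam=1.0):
    """Optimal value of a leaf under an L2 penalty lam (assumed objective).

    With lam = 0 this is the plain Newton step -G/H; a larger lam shrinks
    the leaf value toward zero.
    """
    return -G / (H + lam)


def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    """Reduction in the penalized objective from splitting one leaf in two.

    gamma is the assumed cost of adding a leaf; a split is only worthwhile
    when the returned gain is positive.
    """
    def score(G, H):
        return G * G / (H + lam)

    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma
```

Under these assumptions, raising `lam` shrinks every leaf value, and raising `gamma` makes marginal splits unprofitable, which is one way to read the first two items in the list.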

Algorithm

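XGBoost works as Newton-Raphson in function space. Ordinary gradient boosting works as gradient descent in function space; XGBoost instead uses a second-order Taylor approximation of the loss function, which provides the connection to the Newton-Raphson method.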
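
The connection can be made explicit with a short derivation. This is a sketch that assumes $L$ is twice differentiable and that the second derivative is positive. Write ${\hat {f}}_{(m-1)}$ for the current model, $\phi$ for the update to be fitted, and ${\hat {g}}_{m}(x_{i})$, ${\hat {h}}_{m}(x_{i})$ for the first and second derivatives of $L(y_{i},f(x_{i}))$ with respect to $f(x_{i})$, evaluated at ${\hat {f}}_{(m-1)}$. The second-order Taylor expansion of the loss around the current model is

$L(y_{i},{\hat {f}}_{(m-1)}(x_{i})+\phi (x_{i}))\approx L(y_{i},{\hat {f}}_{(m-1)}(x_{i}))+{\hat {g}}_{m}(x_{i})\phi (x_{i})+{\frac {1}{2}}{\hat {h}}_{m}(x_{i})\phi (x_{i})^{2}.$

Completing the square in $\phi (x_{i})$ gives

${\hat {g}}_{m}(x_{i})\phi (x_{i})+{\frac {1}{2}}{\hat {h}}_{m}(x_{i})\phi (x_{i})^{2}={\frac {1}{2}}{\hat {h}}_{m}(x_{i})\left[-{\frac {{\hat {g}}_{m}(x_{i})}{{\hat {h}}_{m}(x_{i})}}-\phi (x_{i})\right]^{2}-{\frac {{\hat {g}}_{m}(x_{i})^{2}}{2{\hat {h}}_{m}(x_{i})}}.$

The last term does not depend on $\phi$. Minimizing the approximate loss is therefore a weighted least-squares fit to the targets $-{\hat {g}}_{m}(x_{i})/{\hat {h}}_{m}(x_{i})$ with weights ${\hat {h}}_{m}(x_{i})$, and each target is the Newton-Raphson step for sample $i$.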

A generic unregularized XGBoost algorithm is as follows.

Input: training set $\{(x_{i},y_{i})\}_{i=1}^{N}$ , a differentiable loss function $L(y,F(x))$ , a number of weak learners $M$ and a learning rate $\alpha$ .

Algorithm:

Initialize model with a constant value:
${\hat {f}}_{(0)}(x)={\underset {\theta }{\arg \min }}\sum _{i=1}^{N}L(y_{i},\theta ).$
For m = 1 to M:
1. Compute the 'gradients' and 'hessians':
  ${\hat {g}}_{m}(x_{i})=\left[{\frac {\partial L(y_{i},f(x_{i}))}{\partial f(x_{i})}}\right]_{f(x)={\hat {f}}_{(m-1)}(x)}.$
  
  ${\hat {h}}_{m}(x_{i})=\left[{\frac {\partial ^{2}L(y_{i},f(x_{i}))}{\partial f(x_{i})^{2}}}\right]_{f(x)={\hat {f}}_{(m-1)}(x)}.$
2. Fit a base learner (or weak learner, e.g. tree) using the training set $\left\{\left(x_{i},-{\frac {{\hat {g}}_{m}(x_{i})}{{\hat {h}}_{m}(x_{i})}}\right)\right\}_{i=1}^{N}$ by solving the optimization problem below:
  ${\hat {\phi }}_{m}={\underset {\phi \in \mathbf {\Phi } }{\arg \min }}\sum _{i=1}^{N}{\frac {1}{2}}{\hat {h}}_{m}(x_{i})\left[-{\frac {{\hat {g}}_{m}(x_{i})}{{\hat {h}}_{m}(x_{i})}}-\phi (x_{i})\right]^{2}.$
  
  ${\hat {f}}_{m}(x)=\alpha {\hat {\phi }}_{m}(x).$
3. Update the model:
  ${\hat {f}}_{(m)}(x)={\hat {f}}_{(m-1)}(x)+{\hat {f}}_{m}(x).$
Output ${\hat {f}}(x)={\hat {f}}_{(M)}(x)={\hat {f}}_{(0)}(x)+\sum _{m=1}^{M}{\hat {f}}_{m}(x).$
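
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the XGBoost implementation: the base learner is a one-split tree (stump) on a single numeric feature, the initial constant is found by Newton iterations on $\theta$, and the Hessian is assumed to be positive. All function names (`squared_error`, `logistic`, `fit_stump`, `newton_boost`) are hypothetical.

```python
import math


def squared_error(y, f):
    """(gradient, hessian) of L = (y - f)^2 / 2 with respect to f."""
    return f - y, 1.0


def logistic(y, f):
    """(gradient, hessian) of the log loss for y in {0, 1} and raw score f."""
    p = 1.0 / (1.0 + math.exp(-f))
    return p - y, p * (1.0 - p)


def fit_stump(xs, g, h):
    """Step 2: minimise sum_i h_i/2 * (-g_i/h_i - phi(x_i))^2 over stumps.

    For a fixed leaf the best value is -sum(g)/sum(h), and the objective is,
    up to a constant and a factor 1/2, -(sum(g))^2 / sum(h).
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    G, H = sum(g), sum(h)
    best = (-(G * G) / H, None, -G / H, -G / H)  # start from "no split"
    GL = HL = 0.0
    for k in range(len(order) - 1):
        i, j = order[k], order[k + 1]
        GL += g[i]
        HL += h[i]
        if xs[i] == xs[j]:
            continue  # cannot place a threshold between equal values
        GR, HR = G - GL, H - HL
        score = -(GL * GL) / HL - (GR * GR) / HR
        if score < best[0]:
            best = (score, (xs[i] + xs[j]) / 2, -GL / HL, -GR / HR)
    _, thr, left, right = best
    if thr is None:
        return lambda x: left
    return lambda x: left if x < thr else right


def newton_boost(xs, ys, grad_hess, M=20, alpha=0.3):
    # Initialise with the constant that minimises the total loss.
    theta = 0.0
    for _ in range(25):
        gh = [grad_hess(y, theta) for y in ys]
        theta -= sum(a for a, _ in gh) / sum(b for _, b in gh)
    preds = [theta] * len(xs)
    learners = []
    for _ in range(M):
        # Step 1: gradients and hessians at the current model.
        gh = [grad_hess(y, f) for y, f in zip(ys, preds)]
        g = [a for a, _ in gh]
        h = [b for _, b in gh]
        # Step 2: fit the base learner; step 3: update with learning rate alpha.
        phi = fit_stump(xs, g, h)
        learners.append(phi)
        preds = [f + alpha * phi(x) for f, x in zip(preds, xs)]
    return lambda x: theta + alpha * sum(phi(x) for phi in learners)


# Toy usage: a step function on one feature.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = newton_boost(xs, ys, squared_error)
```

With the squared-error loss the Hessian is constant, so each round reduces to ordinary residual fitting. With `logistic` the Hessian varies per sample, and the weighting in step 2 is what distinguishes the Newton update from a plain gradient step.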
