Since DSS 3.0, XGBoost is natively integrated into DSS visual machine learning, so you can train XGBoost models without writing any code or using a custom model.
You'll find more information about how to use XGBoost in visual machine learning in the reference documentation.
The rest of this Howto covers how to use XGBoost manually, as a custom model. This can still be useful if you want to further customize XGBoost settings.
XGBoost is a gradient boosted tree algorithm:
- Gradient, because it relies on an optimization method called "gradient descent": a way to find a local minimum of a function by repeatedly stepping in the direction of steepest descent.
- Boosting, because it combines many weak learners, each one trained to correct the errors of the previous ones, into a single strong learner.
- Tree, because the weak learners are decision trees.
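To make these three ideas concrete, here is a minimal sketch of gradient boosting in plain Python. It is not how XGBoost is implemented: it assumes a squared-error loss, a single numeric feature, and one-split trees ("stumps") as weak learners, and all function names are made up for the illustration.

```python
# Toy gradient boosting for regression (illustration only, not XGBoost itself).
# Assumptions: squared-error loss, one numeric feature, one-split trees.

def fit_stump(xs, targets):
    """Fit a one-split tree: return (threshold, left_value, right_value)."""
    best = None
    # Try every distinct value except the largest, so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_boosted(xs, ys, n_rounds, learning_rate):
    """Boosting: add stumps one at a time, each fitted to the current errors."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Gradient: for squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Take a small step in the descent direction given by the new tree.
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps

def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def mean_squared_error(model, xs, ys):
    return sum((y - predict_boosted(model, x)) ** 2
               for x, y in zip(xs, ys)) / len(xs)

# Toy data: y = x^2 on a small grid.
xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]

single = fit_boosted(xs, ys, n_rounds=1, learning_rate=1.0)    # one weak learner
boosted = fit_boosted(xs, ys, n_rounds=100, learning_rate=0.1)  # many weak learners

single_mse = mean_squared_error(single, xs, ys)
boosted_mse = mean_squared_error(boosted, xs, ys)
print("single stump MSE:", single_mse)
print("boosted stumps MSE:", boosted_mse)
```

On this toy data the boosted ensemble should fit the curve much more closely than a single stump, which is the whole point of boosting. XGBoost follows the same principle with deeper trees, regularization and a far more efficient implementation.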
Since DSS 3.0, XGBoost is installed natively with DSS. You only need to follow this part if you are running DSS 2.X.
In DSS 2.X, you have to download XGBoost from GitHub, compile it, and install the Python module into the Python environment of DSS.
In your server console:
To check that everything works, try to import xgboost in a Jupyter notebook.
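For instance, a notebook cell along these lines reports whether the module can be imported. This is only a sketch: the variable name xgboost_available is made up for the example, and the version lookup falls back to "unknown" in case your build does not expose a version attribute.

```python
# Check whether xgboost can be imported by the Python running this notebook.
try:
    import xgboost
    xgboost_available = True
except ImportError:
    xgboost_available = False

if xgboost_available:
    # Fall back to "unknown" if this build does not expose __version__.
    print("xgboost version: %s" % getattr(xgboost, "__version__", "unknown"))
else:
    print("xgboost is not installed in this Python environment")
```

If the import fails, the module was most likely installed into a different Python than the one DSS uses.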
XGBoost in a custom model
Take a dataset and, in your analysis, go to Models / Settings / Algorithms and scroll down to Custom Python Model. Click on Code Sample and select XGBoost to insert the model in the text area:
Let's compare it to the scikit-learn Gradient Boosting model, both with default parameters:
Same R2 score, but XGBoost trained in 20 seconds versus 5 minutes for the scikit-learn GBT!
You can now deploy it like any other model in DSS!
Check out this blog post by Matt if you want to know more about how to optimise parameters and use more advanced features like early stopping.