4/19/2018 - 2:47 PM


XGBoost is an implementation of the Gradient Boosted Decision Trees algorithm. (scikit-learn has another version of this algorithm, but XGBoost has some technical advantages.)

The n_estimators parameter specifies how many times to go through the modeling cycle described above.

In general, a small learning rate (and large number of estimators) will yield more accurate XGBoost models, though it will also take the model longer to train since it does more iterations through the cycle.

The early_stopping_rounds argument offers a way to automatically find the ideal value. Early stopping causes the model to stop iterating once the validation score has stopped improving for that many consecutive rounds, even if we haven't hit the hard cap set by n_estimators. It's smart to set a high value for n_estimators and then use early_stopping_rounds to find the optimal time to stop iterating.

Use early stopping to find a good value for n_estimators, then re-fit the model on all of your training data with n_estimators fixed at that value.

from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error

my_model = XGBRegressor(n_estimators=1000, learning_rate=0.05)
my_model.fit(train_X, train_y, early_stopping_rounds=5,
             eval_set=[(test_X, test_y)], verbose=False)

# make predictions
predictions = my_model.predict(test_X)

# print mae
print("Mean Absolute Error : " + str(mean_absolute_error(test_y, predictions)))