masdeseiscaracteres
8/30/2017 - 6:49 AM

scikit-learn pipelines

A sklearn.pipeline.Pipeline object chains multiple estimators into a single estimator. Every step except possibly the last must be a transformer (i.e. must implement fit and transform). The last step may be of any type (transformer, classifier, regressor, etc.), and it determines which methods the pipeline itself exposes.

sklearn.pipeline.Pipeline method descriptions*:

fit(X[,y]) = [fit_transform(X[,y])...] + [fit(X[,y])]
fit_predict(X[,y]) = [fit_transform(X[,y])...] + fit_predict(X[,y])
fit_transform(X[,y]) = [fit_transform(X[,y])...] + fit_transform(X[,y])
predict(X) = [transform(X)...] + predict(X)

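* in http://docopt.org/ notation ([] marks optional parts, ... marks repetition). The first addend is applied by each intermediate step in order; the second addend refers to the last estimator.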
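
The fit and predict rules above can be checked by hand. The following is a minimal sketch, assuming a two-step pipeline (StandardScaler followed by LinearRegression) and a fixed random seed chosen only for illustration; it compares the pipeline's predictions with the same steps run manually.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)  # fixed seed, illustrative only
X = rng.randn(10, 3)
y = rng.randn(10)

# Pipeline version: fit() runs fit_transform on the scaler, then fit on the regressor
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('est', LinearRegression()),
])
pipe.fit(X, y)
pipe_pred = pipe.predict(X)

# Manual version of the same two rules
scaler = StandardScaler()
est = LinearRegression()
Xt = scaler.fit_transform(X, y)   # intermediate step: fit_transform
est.fit(Xt, y)                    # last step: fit
manual_pred = est.predict(scaler.transform(X))  # predict = transform + predict

print(np.allclose(pipe_pred, manual_pred))
```

Both paths should agree up to floating-point error, since the pipeline only delegates to the steps in order.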

Example:

import numpy as np
import sklearn.preprocessing
import sklearn.linear_model
from sklearn.pipeline import Pipeline

p = Pipeline([
    ('scaler', sklearn.preprocessing.StandardScaler()),
    ('est', sklearn.linear_model.LinearRegression(fit_intercept=False))
])

X = np.random.randn(10, 3)
y = np.random.randn(10)

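p.fit(X, y)            # scales X, then fits the regression on the scaled data
y_pred = p.predict(X)  # scales X with the fitted scaler, then predicts
# p.fit_transform(X, y) raises AttributeError here: LinearRegression is not a transformer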
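
For contrast, fit_transform and transform are available when the last step is itself a transformer. A minimal sketch, assuming PCA as the final step (PCA and the seed are illustrative choices, not part of the example above):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)  # fixed seed, illustrative only
X = rng.randn(10, 3)

# Every step, including the last one, implements transform
t = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=2)),
])

Xt = t.fit_transform(X)   # fit_transform on the scaler, then fit_transform on PCA
print(Xt.shape)           # (10, 2)
```

After fitting, t.transform(X) applies the already fitted steps to new data without refitting them.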