Study notes
Before training a model, the data must be preprocessed (prepared) to make it easier for the algorithm to fit.
The two most common preparation techniques are listed below, with a small code sketch after the list:
- Scaling numeric features - bring all values into the same range, e.g.
A | B | C |
---|---|---|
3 | 480 | 65 |
will become:
A | B | C |
---|---|---|
0.3 | 0.48 | 0.65 |
- Encoding categorical variables - a category such as Size (S, M, L)
will become:
Size: 0, 1, 2, or better (one-hot encoding):
Size_S | Size_M | Size_L |
---|---|---|
1 | 0 | 0 |
0 | 1 | 0 |
0 | 0 | 1 |
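A minimal sketch of both techniques on made-up toy data, using scikit-learn's MinMaxScaler and OneHotEncoder (the exact scaled values depend on the scaler chosen, so they will not match the table above):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
# numeric features on very different ranges -> rescale each column into [0, 1]
numeric = np.array([[3.0, 480.0, 65.0],
                    [7.0, 120.0, 30.0],
                    [1.0, 900.0, 90.0]])
print(MinMaxScaler().fit_transform(numeric))
# categorical feature Size -> one column per category (one-hot encoding)
sizes = np.array([['S'], ['M'], ['L']])
print(OneHotEncoder().fit_transform(sizes).toarray())
# columns are ordered alphabetically: Size_L, Size_M, Size_S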
The preprocessing steps and the algorithm are then packed together into a pipeline.
# Will be used:
# sklearn
#   compose
#     ColumnTransformer
#   pipeline
#     Pipeline
#   impute
#     SimpleImputer
#   preprocessing
#     StandardScaler
#     OneHotEncoder
#   linear_model
#     LinearRegression
#   ensemble
#     GradientBoostingRegressor
#     RandomForestRegressor
# Train the model
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
import numpy as np
# preprocessing for numeric columns (scale)
numeric_features = [1, 2]
numeric_transformer = Pipeline(
    steps=[
        ('scaler', StandardScaler())
    ]
)
# preprocessing for categorical columns (one-hot encode)
categorical_features = [3, 4]
categorical_transformer = Pipeline(
    steps=[
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ]
)
# combine both preprocessing steps defined above
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features),
    ]
)
# add both the preprocessing steps and the algorithm into the same pipeline
pipeline = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('regressor', GradientBoostingRegressor())
    ]
)
# Train the model (X_train and y_train are assumed to be prepared beforehand)
model = pipeline.fit(X_train, y_train)
print(model)
Pipeline(steps=[('preprocessor',
ColumnTransformer(transformers=[('num',
Pipeline(steps=[('scaler',
StandardScaler())]),
[1, 2]),
('cat',
Pipeline(steps=[('onehot',
OneHotEncoder(handle_unknown='ignore'))]),
[3, 4])])),
('regressor', GradientBoostingRegressor())])
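Because the preprocessing lives inside the pipeline, the fitted model can be evaluated directly on raw test data. A hedged sketch, assuming X_test and y_test exist with the same column layout as the training set:

from sklearn.metrics import mean_squared_error, r2_score
predictions = model.predict(X_test)   # preprocessing is applied automatically
print('MSE:', mean_squared_error(y_test, predictions))
print('R2:', r2_score(y_test, predictions))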
Now it is easy to swap in another algorithm:
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('regressor', RandomForestRegressor())])
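Note that swapping the estimator gives a new, unfitted pipeline, so it has to be trained again before use (a sketch, reusing the same assumed training data):

rf_model = pipeline.fit(X_train, y_train)   # the new pipeline must be refitted
print(rf_model)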
# Save the trained pipeline to disk:
import joblib
joblib.dump(model, 'my_pipelined_job.pkl')
# Load the saved pipeline back into a model
loaded_model = joblib.load('my_pipelined_job.pkl')
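The loaded pipeline carries its fitted preprocessing with it, so it can predict on new raw rows directly; X_new below is an assumed dataset with the same column layout as X_train:

new_predictions = loaded_model.predict(X_new)   # no separate preprocessing step needed
print(new_predictions)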
References:
- Create machine learning models - Training | Microsoft Learn
- 1. Supervised learning — scikit-learn 1.2.1 documentation