ktl014
8/11/2017 - 6:17 AM

When there are missing values in your feature columns, we use imputation to substitute each missing value with the median of the values that are present. When we have categorical data (e.g. "S", "C", "Q"), we need to substitute each category with a unique integer so it can be handled numerically in Python.
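
As a minimal sketch of the median imputation described above, assuming a small made-up DataFrame with a numeric "Age" column (the toy data and the names df and median_age are illustrative, not taken from the training set):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; in practice this would be a column of train
df = pd.DataFrame({"Age": [22.0, np.nan, 38.0, 26.0, np.nan]})

# median() skips missing values, so it uses only the present ones
median_age = df["Age"].median()

# Replace every missing value with that median
df["Age"] = df["Age"].fillna(median_age)

print(df["Age"])
```

The median is usually preferred over the mean here because a few extreme values do not pull it as far.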

# Convert the male and female groups to integer form
# (.loc avoids chained assignment, which may silently fail to modify train)
train.loc[train["Sex"] == "male", "Sex"] = 0
train.loc[train["Sex"] == "female", "Sex"] = 1

# Impute the Embarked variable
train["Embarked"] = train["Embarked"].fillna("S")

# Convert the Embarked classes to integer form
# (.loc avoids chained assignment, which may silently fail to modify train)
train.loc[train["Embarked"] == "S", "Embarked"] = 0
train.loc[train["Embarked"] == "C", "Embarked"] = 1
train.loc[train["Embarked"] == "Q", "Embarked"] = 2

# Print the Sex and Embarked columns
print(train["Sex"])
print(train["Embarked"])
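
The same encoding can also be sketched with Series.map and a dictionary, which avoids writing one assignment per category. This is an alternative under stated assumptions, not the approach used above: the toy DataFrame df and the names sex_codes and embarked_codes are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the train DataFrame
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", np.nan, "Q", "C"],
})

# Illustrative mappings that mirror the integer codes used above
sex_codes = {"male": 0, "female": 1}
embarked_codes = {"S": 0, "C": 1, "Q": 2}

# Impute Embarked first, then map each category to its integer
df["Sex"] = df["Sex"].map(sex_codes)
df["Embarked"] = df["Embarked"].fillna("S").map(embarked_codes)

print(df)
```

Note that map turns any value missing from the dictionary into NaN, so it is worth checking for unexpected categories before relying on it.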