Setting column values with boolean tests in pandas. Useful for comparing columns and computing normalized counts. From: https://campus.datacamp.com/courses/kaggle-python-tutorial-on-machine-learning/getting-started-with-python?ex=5
You can test this by creating a new column with a categorical variable Child. Child will take the value 1 in cases where age is less than 18, and a value of 0 in cases where age is greater than or equal to 18.
To add this new variable you need to do two things (i) create a new column, and (ii) provide the values for each observation (i.e., row) based on the age of the passenger.
Adding a new column with Pandas in Python is easy and can be done via the following syntax:
your_data["new_var"] = 0
This code creates a new column titled new_var in the your_data DataFrame, with the value 0 for every observation.
To set the values based on the age of the passenger, you use a boolean test to select rows. With the .loc indexer you pick a subset of rows and a column, and assign a value to that column for those observations only. For example,
train.loc[train["Fare"] > 10, "new_var"] = 1
gives a value of 1 to the variable new_var for the subset of passengers whose fare is greater than 10. The chained form train["new_var"][train["Fare"] > 10] = 1 that appears in older tutorials can trigger chained-assignment warnings in recent pandas versions and is not guaranteed to modify train, so .loc is the safer choice. Remember that new_var keeps the value 0 for all other passengers, including those with a missing fare.
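The two steps above can be tried on a small made-up table. This is a minimal sketch, not the tutorial's own code: the DataFrame `toy` and its fare values are hypothetical stand-ins for `train`, and the row with a missing fare is there to show that it keeps the default value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Titanic `train` DataFrame.
toy = pd.DataFrame({"Fare": [7.25, 71.28, np.nan, 10.0, 26.55]})

# Step (i): create the column with a default value for every row.
toy["new_var"] = 0

# Step (ii): overwrite only the rows selected by the boolean mask.
# A comparison against NaN evaluates to False, so the missing fare is not selected.
mask = toy["Fare"] > 10
toy.loc[mask, "new_var"] = 1

print(toy)
```

Note that the fare of exactly 10.0 also keeps the value 0, because the test is strictly greater than 10.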
A new column called Child in the train data frame has been created for you that takes the value NaN for all observations.
# Create the column Child and assign to 'NaN'
train["Child"] = float('NaN')
# Assign 1 to passengers under 18, 0 to those 18 or older. Print the new column.
# .loc avoids chained assignment; passengers with a missing Age stay NaN.
train.loc[train["Age"] < 18, "Child"] = 1
train.loc[train["Age"] >= 18, "Child"] = 0
print(train["Child"])
# Print normalized survival rates for passengers under 18
print(train["Survived"][train["Child"] == 1].value_counts(normalize=True))
# Print normalized survival rates for passengers 18 or older
print(train["Survived"][train["Child"] == 0].value_counts(normalize=True))
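Output (the rows of the printed Child column are omitted; only its final summary line is shown, followed by the survival rates for passengers under 18 and then for passengers 18 or older):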
Name: Child, dtype: float64
1 0.539823
0 0.460177
Name: Survived, dtype: float64
0 0.618968
1 0.381032
Name: Survived, dtype: float64
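As a cross-check on the normalized counts, the survival rate per group can also be computed in a single step. The sketch below is an alternative to the tutorial's approach, not part of it, and it assumes Survived is coded as 0/1 so that its mean equals the fraction of survivors. The DataFrame `toy` and its values are hypothetical, not the Titanic data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one passenger has a missing Age.
toy = pd.DataFrame({
    "Age": [4, 15, 30, 42, np.nan, 17, 60, 25],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 0],
})

# Build the Child column the same way as above: NaN first, then 1 or 0.
toy["Child"] = float("nan")
toy.loc[toy["Age"] < 18, "Child"] = 1
toy.loc[toy["Age"] >= 18, "Child"] = 0

# Mean of a 0/1 column per group is the survival rate of that group.
# groupby drops rows whose key (Child) is NaN by default.
rates = toy.groupby("Child")["Survived"].mean()
print(rates)
```

The passenger with a missing age falls into neither group, which matches the behavior of the two filtered value_counts calls above.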