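import pandas

# Load the Titanic dataset (assumes titanic.csv is in the working directory).
df = pandas.read_csv("titanic.csv")
print(df.tail())  # show the last five rows

# Bin "age" into four labeled groups; each interval includes its right edge by default.
df["age_group"] = pandas.cut(x=df["age"], bins=[0, 5, 18, 60, 100], labels=["toddler", "young", "adult", "senior"])
print(df["age_group"])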
Here, we create a new column, "age_group", in the DataFrame. The pandas.cut() function discretizes the numerical values in the "age" column, and the resulting categories are stored in the "age_group" column.
The x=df["age"] argument specifies the column whose values are discretized. The bins parameter lists the bin edges, so the five edges here define four intervals. The labels parameter assigns one label to each interval, in order.
By default, cut() includes the right edge of each interval and excludes the left one (right=True). So, ages above 0 and up to 5 are labeled toddler, ages above 5 and up to 18 are labeled young, ages above 18 and up to 60 are labeled adult, and ages above 60 and up to 100 are labeled senior. A missing age, or an age outside the range of the bins, becomes NaN in the new column.
The output of the above program will be:
     survived  pclass     sex   age  ...  deck  embark_town  alive  alone
886         0       2    male  27.0  ...   NaN  Southampton     no   True
887         1       1  female  19.0  ...     B  Southampton    yes   True
888         0       3  female   NaN  ...   NaN  Southampton     no  False
889         1       1    male  26.0  ...     C    Cherbourg    yes   True
890         0       3    male  32.0  ...   NaN   Queenstown     no   True

[5 rows x 15 columns]
0      adult
1      adult
2      adult
3      adult
4      adult
      ...
886    adult
887    adult
888      NaN
889    adult
890    adult
Name: age_group, Length: 891, dtype: category
Categories (4, object): ['toddler' < 'young' < 'adult' < 'senior']
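To see how the bin edges behave, here is a minimal sketch that runs the same bins and labels on a small hand-made Series instead of titanic.csv. The ages are hypothetical values picked to sit on or near the edges, and the variable names (ages, right_closed, left_closed, comparison) are only for illustration. It compares the default right=True with right=False, which is a documented parameter of pandas.cut().

```python
import pandas as pd

# Hypothetical ages chosen to sit on and around the bin edges
# (these values are NOT taken from titanic.csv).
ages = pd.Series([0, 0.42, 5, 5.5, 18, 60, 61, None], name="age")

bins = [0, 5, 18, 60, 100]
labels = ["toddler", "young", "adult", "senior"]

# Default right=True: each bin is (left, right], so an age of exactly 0
# falls outside every bin and becomes NaN, while 5 is still a toddler.
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False: each bin is [left, right), so 0 is a toddler
# and 5 already counts as young.
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

comparison = pd.DataFrame(
    {"age": ages, "right=True": right_closed, "right=False": left_closed}
)
print(comparison)

# Count the rows in each group, keeping unbinned (NaN) rows visible.
print(right_closed.value_counts(dropna=False))
```

If ages of exactly 0 should count as toddlers while keeping right-inclusive bins, passing include_lowest=True to pandas.cut() is one option. Which edge convention fits best depends on how the age groups are defined for your analysis.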







































