(Tutorial) Pandas Add Column – DataCamp


You are never stuck with just the data you are given. Instead, you can add new columns to a DataFrame. This has many names, such as transforming, mutating, and feature engineering.

You can create new columns from scratch, but it is also common to derive them from other columns, for example, by adding columns together or by changing their units.
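The difference between a column created from scratch and a derived column can be sketched as follows. The DataFrame and its column names (length_cm, width_cm, source, label) are invented for illustration and are not part of the original tutorial.

```python
import pandas as pd

# Hypothetical two-row frame, invented for illustration only.
df = pd.DataFrame({"length_cm": [120, 250], "width_cm": [80, 100]})

# From scratch: a scalar is broadcast to every row.
df["source"] = "survey"

# From scratch: a list needs exactly one value per row.
df["label"] = ["a", "b"]

# Derived: change units, or combine existing columns element-wise.
df["length_m"] = df["length_cm"] / 100
df["perimeter_cm"] = 2 * (df["length_cm"] + df["width_cm"])

print(df)
```

Assigning a list whose length does not match the number of rows raises a ValueError, so derived columns are usually the safer route when the values depend on existing data.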

Deriving a Column

Using a dog dataset, let's say you want to add a new column to your DataFrame that has each dog's height in meters instead of centimeters.

On the left-hand side of the equals sign, you use square brackets with the name of the new column you want to create, in this case, height_m. On the right-hand side, you have the calculation.

dogs["height_m"] = dogs["height_cm"] / 100
print(dogs)
            name        breed    color   height_cm   weight_kg   date_of_birth   height_m
0      Bella     Labrador   Brown          56          24      2013-07-01       0.56
1    Charlie       Poodle   Black          43          24      2016-09-16       0.43
2       Lucy    Chow Chow   Brown          46          24      2014-08-25       0.46
3     Cooper    Schnauzer    Grey          49          17      2016-09-16       0.49
4        Max     Labrador   Black          59          29      2016-09-16       0.59
5     Stella    Chihuahua     Tan          18           2      2016-09-16       0.18
6     Bernie  St. Bernard   White          77          74      2016-09-16       0.77

Notice that both the existing and the derived column are in the DataFrame you modified.
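The tutorial does not show how the dogs DataFrame is loaded. To run the snippet locally, one option is to rebuild part of it by hand from the printed table. This is a minimal sketch that assumes only the name, breed, height_cm, and weight_kg columns are needed.

```python
import pandas as pd

# Rebuilt from the printed table; the tutorial's real data source is not shown.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "breed": ["Labrador", "Poodle", "Chow Chow", "Schnauzer",
              "Labrador", "Chihuahua", "St. Bernard"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [24, 24, 24, 17, 29, 2, 74],
})

# Derive height in meters; the original height_cm column is kept.
dogs["height_m"] = dogs["height_cm"] / 100

print(dogs)
```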

Adding a Column

In this example, you'll calculate doggy mass index and add it as a column to your DataFrame. BMI stands for body mass index, which is calculated as weight in kilograms divided by the square of height in meters.

dogs["bmi"] = dogs["weight_kg"] / dogs["height_m"] ** 2
print(dogs.head())
            name        breed    color   height_cm   weight_kg   date_of_birth   height_m          bmi
0      Bella     Labrador   Brown          56          24      2013-07-01       0.56    76.530612
1    Charlie       Poodle   Black          43          24      2016-09-16       0.43   129.799892
2       Lucy    Chow Chow   Brown          46          24      2014-08-25       0.46   113.421550
3     Cooper    Schnauzer    Grey          49          17      2016-09-16       0.49    70.803832
4        Max     Labrador   Black          59          29      2016-09-16       0.59    83.309394

Again, the new column is on the left-hand side of the equals sign, but this time, the calculation involves two columns.
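Bracket assignment modifies the DataFrame in place. As an alternative that the tutorial itself does not use, pandas also provides DataFrame.assign, which returns a new DataFrame and leaves the original untouched. The sketch below uses three rows taken from the printed table.

```python
import pandas as pd

# Three rows copied from the printed table above.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Stella"],
    "height_cm": [56, 43, 18],
    "weight_kg": [24, 24, 2],
})

# assign returns a new DataFrame. A callable receives the frame built so far,
# so bmi can use the height_m column created one argument earlier.
dogs_bmi = dogs.assign(
    height_m=lambda d: d["height_cm"] / 100,
    bmi=lambda d: d["weight_kg"] / d["height_m"] ** 2,
)

print(dogs_bmi)
print(list(dogs.columns))  # the original frame has no height_m or bmi column
```

Which style to prefer is a design choice: bracket assignment is shorter, while assign fits better into method chains and avoids changing a DataFrame that other code may still rely on.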

Adding a Column With Multiple Manipulations

The real power of pandas comes in when you combine all the skills that you have learned so far. Let's figure out the names of skinny, tall dogs.

First, to define the skinny dogs, you take the subset of dogs that have a BMI of less than 100. Next, you sort the result in descending order of height to get the tallest skinny dog at the top.

Finally, you keep only the columns you are interested in.

bmi_lt_100 = dogs[dogs["bmi"] < 100]
bmi_lt_100_height = bmi_lt_100.sort_values("height_cm", ascending=False)
bmi_lt_100_height[["name", "height_cm", "bmi"]]
            name      height_cm           bmi
4        Max             59     83.309394
0      Bella             56     76.530612
3     Cooper             49     70.803832
5     Stella             18     61.728395

Here, you can see that Max is the tallest dog with a BMI of under 100.
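The three steps (filter, sort, select) can also be written as one chained expression without intermediate variables. This is a sketch that rebuilds the needed columns from the printed table; the variable name skinny_tall is made up for illustration.

```python
import pandas as pd

# Columns rebuilt from the printed table; only the ones needed here.
dogs = pd.DataFrame({
    "name": ["Bella", "Charlie", "Lucy", "Cooper", "Max", "Stella", "Bernie"],
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [24, 24, 24, 17, 29, 2, 74],
})
dogs["height_m"] = dogs["height_cm"] / 100
dogs["bmi"] = dogs["weight_kg"] / dogs["height_m"] ** 2

# Filter rows, sort by height (tallest first), then keep three columns.
skinny_tall = (
    dogs[dogs["bmi"] < 100]
    .sort_values("height_cm", ascending=False)
    [["name", "height_cm", "bmi"]]
)

print(skinny_tall)
```

The intermediate variables in the original version are easier to inspect while exploring, whereas the chained form keeps the namespace clean once the steps are settled.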

Interactive Example

In the example below, you add a new column to the DataFrame homelessness, named total, containing the sum of the individuals and family_members columns. Then, you add another column to homelessness, named p_individuals, containing the proportion of homeless people in each state who are individuals. Finally, you print the homelessness DataFrame.

# Add total col as sum of individuals and family_members
homelessness["total"] = homelessness["individuals"] + homelessness["family_members"]

# Add p_individuals col as proportion of individuals
homelessness["p_individuals"] = homelessness["individuals"] / homelessness["total"]

# See the result
print(homelessness)

When we run the above code, it produces the following result:

              region                 state  individuals  family_members  state_pop     total  p_individuals
0   East South Central               Alabama       2570.0           864.0    4887681    3434.0          0.748
1              Pacific                Alaska       1434.0           582.0     735139    2016.0          0.711
2             Mountain               Arizona       7259.0          2606.0    7158024    9865.0          0.736
3   West South Central              Arkansas       2280.0           432.0    3009733    2712.0          0.841
4              Pacific            California     109008.0         20964.0   39461588  129972.0          0.8
...
48      South Atlantic         West Virginia       1021.0           222.0    1804291    1243.0          0.821
49  East North Central             Wisconsin       2740.0          2167.0    5807406    4907.0          0.558
50            Mountain               Wyoming        434.0           205.0     577601     639.0          0.679
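The homelessness DataFrame is preloaded in the interactive exercise, so the code above will not run on its own. A minimal self-contained sketch, using three rows copied from the output table and rounding the proportion to three decimals to match the display, might look like this:

```python
import pandas as pd

# Three rows copied from the output table above; the full dataset is not shown.
homelessness = pd.DataFrame({
    "state": ["Alabama", "Alaska", "Wyoming"],
    "individuals": [2570.0, 1434.0, 434.0],
    "family_members": [864.0, 582.0, 205.0],
})

# Total homeless count per state.
homelessness["total"] = homelessness["individuals"] + homelessness["family_members"]

# Share of the total who are individuals, rounded for display (an assumption
# about how the printed table was produced).
homelessness["p_individuals"] = (
    homelessness["individuals"] / homelessness["total"]
).round(3)

print(homelessness)
```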

Try it for yourself.

To learn more about adding new columns in pandas, please see this video from our course Data Manipulation with pandas.

This content is taken from DataCamp’s Data Manipulation with pandas course by Maggie Matsui and Richie Cotton.
