To drop a column from a pandas DataFrame, the easiest way is to use the pandas drop() function.
df.drop(columns=["Column1"])  # drop "Column1" using the columns parameter
df.drop(["Column1"], axis=1)  # drop "Column1" using the axis parameter
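Both calls return a new DataFrame, and both raise a KeyError if a label you pass is not a column of the DataFrame. Here is a minimal sketch of that behavior and of the “errors” parameter. It uses a hypothetical DataFrame with placeholder columns Column1 and Column2, plus a deliberately missing Column3.

```python
import pandas as pd

# Hypothetical DataFrame; "Column1" and "Column2" are placeholder names
df = pd.DataFrame({"Column1": [1, 2, 3], "Column2": ["a", "b", "c"]})

# Default errors="raise": asking for a label that is not in df raises KeyError
raised = False
try:
    df.drop(columns=["Column1", "Column3"])  # "Column3" does not exist
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# errors="ignore" drops the labels that exist and skips the missing ones
result = df.drop(columns=["Column1", "Column3"], errors="ignore")
print(list(result.columns))  # ['Column2']
```

Because drop() returns a new DataFrame, df itself still has both columns after these calls.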
When working with data, it is useful to be able to add or delete columns easily. Deleting columns lets you focus on the variables that matter, and removing columns you do not need reduces memory usage and can speed up later processing steps.
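As a rough illustration of the memory point, DataFrame.memory_usage(deep=True) reports how many bytes each column uses, so you can compare a DataFrame before and after a drop. This sketch assumes a hypothetical DataFrame with one numeric column to keep and one text column to remove. Exact byte counts depend on your pandas version and platform, so only the comparison is shown as a result.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame: a numeric column we keep and a text column we don't need
n = 10_000
df = pd.DataFrame({
    "value": np.arange(n, dtype="float64"),
    "notes": ["some free-text comment"] * n,
})

# deep=True also counts the memory held by the string values
before = df.memory_usage(deep=True).sum()
after = df.drop(columns=["notes"]).memory_usage(deep=True).sum()

print(before, after)
print(after < before)  # True
```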
When working with pandas, we can easily drop rows and columns with the pandas drop() function.
import pandas as pd
df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [130.54, 160.20, 209.45, 150.35, 117.73, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42],
                   'Age': [43, 23, 71, 49, 52, 37]})
print(df)
# Output:
    Name  Weight  Height  Age
0    Jim  130.54   50.10   43
1  Sally  160.20   68.94   23
2    Bob  209.45   71.42   71
3    Sue  150.35   48.56   49
4   Jill  117.73   59.37   52
5  Larry  187.52   63.42   37
To drop columns with drop(), either pass the column labels to the “columns” parameter, or pass them as the first argument together with “axis=1”. Both calls return the same result:
print(df.drop(columns=["Height","Age"]))
print(df.drop(["Height","Age"], axis=1))
# Output:
    Name  Weight
0    Jim  130.54
1  Sally  160.20
2    Bob  209.45
3    Sue  150.35
4   Jill  117.73
5  Larry  187.52
    Name  Weight
0    Jim  130.54
1  Sally  160.20
2    Bob  209.45
3    Sue  150.35
4   Jill  117.73
5  Larry  187.52
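The same function also removes rows. Here is a short sketch that assumes a trimmed copy of the example DataFrame. Rows are dropped by their index labels, either with the “index” parameter or with the default axis=0.

```python
import pandas as pd

# Trimmed copy of the example DataFrame
df = pd.DataFrame({"Name": ["Jim", "Sally", "Bob"], "Age": [43, 23, 71]})

# Drop the rows labeled 0 and 2 (two equivalent spellings)
by_index = df.drop(index=[0, 2])
by_axis = df.drop([0, 2])  # axis=0 (rows) is the default

print(by_index)
print(by_index.equals(by_axis))  # True
```

Here the labels 0 and 2 happen to match the row positions because the DataFrame has the default integer index. After filtering or sorting, labels and positions can differ, and drop() always uses labels.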
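Like many other pandas functions, drop() accepts an “inplace” parameter. By default (inplace=False), drop() returns a new DataFrame with the rows or columns removed and leaves the original unchanged. With inplace=True, the drop is performed on the original DataFrame directly, and the function returns None.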
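This sketch uses a small hypothetical DataFrame to contrast the two modes. The default call returns a new object, while inplace=True changes df itself and returns None.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jim", "Sally"], "Age": [43, 23]})

# Default (inplace=False): a new DataFrame is returned, df is untouched
smaller = df.drop(columns=["Age"])
print(list(df.columns))       # ['Name', 'Age']

# inplace=True: df itself is modified and the method returns None
returned = df.drop(columns=["Age"], inplace=True)

print(list(smaller.columns))  # ['Name']
print(list(df.columns))       # ['Name']
print(returned)               # None
```

A common mistake is writing df = df.drop(columns=["Age"], inplace=True), which replaces df with None. Reassigning the result without inplace (df = df.drop(columns=["Age"])) gives the same end state and keeps method chaining possible.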
Dropping Columns with the dropna() pandas Function
When working with data, we often need to deal with missing values. One way to handle missing data is to drop it from the dataset, and pandas provides a very useful function for this: dropna(), which can delete the rows or columns that contain NaN values.
Let’s say we have the following DataFrame of summarized data:
    animal_type  gender         type  variable  level  count    sum   mean        std   min    25%   50%    75%    max
 0          cat  female      numeric       age    N/A    5.0   18.0   3.60   1.516575   2.0   3.00   3.0   4.00    6.0
 1          cat    male      numeric       age    N/A    2.0    3.0   1.50   0.707107   1.0   1.25   1.5   1.75    2.0
 2          dog  female      numeric       age    N/A    2.0    8.0   4.00   0.000000   4.0   4.00   4.0   4.00    4.0
 3          dog    male      numeric       age    N/A    4.0   15.0   3.75   1.892969   1.0   3.25   4.5   5.00    5.0
 4          cat  female      numeric    weight    N/A    5.0  270.0  54.00  32.093613  10.0  40.00  50.0  80.00   90.0
 5          cat    male      numeric    weight    N/A    2.0  110.0  55.00  63.639610  10.0  32.50  55.0  77.50  100.0
 6          dog  female      numeric    weight    N/A    2.0  100.0  50.00  42.426407  20.0  35.00  50.0  65.00   80.0
 7          dog    male      numeric    weight    N/A    4.0  180.0  45.00  23.804761  20.0  27.50  45.0  62.50   70.0
 8          cat  female  categorical     state     FL    2.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
 9          cat  female  categorical     state     NY    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
10          cat  female  categorical     state     TX    2.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
11          cat    male  categorical     state     CA    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
12          cat    male  categorical     state     TX    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
13          dog  female  categorical     state     FL    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
14          dog  female  categorical     state     TX    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
15          dog    male  categorical     state     CA    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
16          dog    male  categorical     state     FL    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
17          dog    male  categorical     state     NY    2.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
18          cat  female  categorical   trained    yes    5.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
19          cat    male  categorical   trained     no    2.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
20          dog  female  categorical   trained     no    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
21          dog  female  categorical   trained    yes    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
22          dog    male  categorical   trained     no    4.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
In this DataFrame, many values are NaN. The sum, mean, std, min, 25%, 50%, 75%, and max columns are empty for every categorical row.
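Before dropping anything, it can help to see exactly which columns contain missing values. Calling isna().sum() counts the missing values in each column. This sketch uses a hypothetical three-row subset shaped like the table above. Note that the “N/A” entries in the level column are ordinary strings, so pandas does not treat them as missing, and that is why dropna() keeps the level column.

```python
import numpy as np
import pandas as pd

# Hypothetical subset shaped like the summary table
summary = pd.DataFrame({
    "variable": ["age", "weight", "state"],
    "level": ["N/A", "N/A", "FL"],  # "N/A" is a plain string here, not a missing value
    "count": [5.0, 5.0, 2.0],
    "mean": [3.60, 54.00, np.nan],
})

# Number of missing values in each column
print(summary.isna().sum())
```

If such a table were loaded with pandas.read_csv, “N/A” would be converted to NaN by default, and the level column would then be dropped as well.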
To remove every column that contains missing values, pass ‘axis=1’ to dropna(). By default, dropna() uses how="any", so a column is dropped if even one of its values is NaN. The following code removes all such columns from our DataFrame:
df.dropna(axis=1)
    animal_type  gender         type  variable  level  count
 0          cat  female      numeric       age    N/A    5.0
 1          cat    male      numeric       age    N/A    2.0
 2          dog  female      numeric       age    N/A    2.0
 3          dog    male      numeric       age    N/A    4.0
 4          cat  female      numeric    weight    N/A    5.0
 5          cat    male      numeric    weight    N/A    2.0
 6          dog  female      numeric    weight    N/A    2.0
 7          dog    male      numeric    weight    N/A    4.0
 8          cat  female  categorical     state     FL    2.0
 9          cat  female  categorical     state     NY    1.0
10          cat  female  categorical     state     TX    2.0
11          cat    male  categorical     state     CA    1.0
12          cat    male  categorical     state     TX    1.0
13          dog  female  categorical     state     FL    1.0
14          dog  female  categorical     state     TX    1.0
15          dog    male  categorical     state     CA    1.0
16          dog    male  categorical     state     FL    1.0
17          dog    male  categorical     state     NY    2.0
18          cat  female  categorical   trained    yes    5.0
19          cat    male  categorical   trained     no    2.0
20          dog  female  categorical   trained     no    1.0
21          dog  female  categorical   trained    yes    1.0
22          dog    male  categorical   trained     no    4.0
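dropna() also accepts “how” and “thresh”, which control how strict it is when dropping columns. Here is a sketch on a small hypothetical DataFrame. The column names partial and empty are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "empty" is entirely NaN, "partial" has one NaN out of three
df = pd.DataFrame({
    "animal_type": ["cat", "dog", "cat"],
    "partial": [1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan],
})

# Default how="any": drop a column if it has at least one NaN
print(list(df.dropna(axis=1).columns))             # ['animal_type']

# how="all": drop a column only if every value is NaN
print(list(df.dropna(axis=1, how="all").columns))  # ['animal_type', 'partial']

# thresh=2: keep a column only if it has at least 2 non-missing values
print(list(df.dropna(axis=1, thresh=2).columns))   # ['animal_type', 'partial']
```

On the summary table above, how="all" would keep every column, because sum through max all have values in the numeric rows.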
Hopefully this article has helped you understand how to delete columns from your pandas DataFrames using the pandas drop() and dropna() functions in Python.