To drop rows, or columns, from a pandas DataFrame, the easiest way is to use the pandas drop() function.
df.drop(1) #drop the row with index 1
When working with data, it can be useful to add or delete elements from your dataset easily. By deleting elements from your data, you are able to focus more on the elements that matter. In addition, removing unnecessary rows and columns can make data processing much faster and more efficient.
When working with pandas, we can easily drop rows and columns with the pandas drop() function.
df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
'Weight': [130.54, 160.20, 209.45, 150.35, 117.73, 187.52],
'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42],
'Age': [43,23,71,49,52,37] })
print(df)
# Output:
Name Weight Height Age
0 Jim 130.54 50.10 43
1 Sally 160.20 68.94 23
2 Bob 209.45 71.42 71
3 Sue 150.35 48.56 49
4 Jill 117.73 59.37 52
5 Larry 187.52 63.42 37
If we want to drop the rows with index 1 and 3, we can do so easily in the following way:
print(df.drop([1,3]))
# Output:
Name Weight Height Age
0 Jim 130.54 50.10 43
2 Bob 209.45 71.42 71
4 Jill 117.73 59.37 52
5 Larry 187.52 63.42 37
We can also drop columns from our DataFrame with the drop() function. To drop columns from a DataFrame, you can use the parameter “columns”, or pass the parameter “axis=1” to the drop() function.
print(df.drop(columns=["Height","Age"]))
print(df.drop(["Height","Age"], axis=1))
# Output:
Name Weight
0 Jim 130.54
1 Sally 160.20
2 Bob 209.45
3 Sue 150.35
4 Jill 117.73
5 Larry 187.52
Name Weight
0 Jim 130.54
1 Sally 160.20
2 Bob 209.45
3 Sue 150.35
4 Jill 117.73
5 Larry 187.52
Like many other pandas functions, you can pass the “inplace” parameter to perform the drop inplace and return a new DataFrame with the dropped rows or columns.
Dropping Rows and Columns with the dropna() pandas Function
When working with data, many time we need to deal with missing values in our datasets. One way to deal with missing data is to drop them from our dataset, and the pandas package has a very useful function for dropping rows with duplicates and dropping rows with NaN values.
If you want to drop rows or columns with missing values, we can use the pandas dropna() function.
Let’s say I have the following DataFrame of summarized data:
animal_type gender type variable level count sum mean std min 25% 50% 75% max
0 cat female numeric age N/A 5.0 18.0 3.60 1.516575 2.0 3.00 3.0 4.00 6.0
1 cat male numeric age N/A 2.0 3.0 1.50 0.707107 1.0 1.25 1.5 1.75 2.0
2 dog female numeric age N/A 2.0 8.0 4.00 0.000000 4.0 4.00 4.0 4.00 4.0
3 dog male numeric age N/A 4.0 15.0 3.75 1.892969 1.0 3.25 4.5 5.00 5.0
4 cat female numeric weight N/A 5.0 270.0 54.00 32.093613 10.0 40.00 50.0 80.00 90.0
5 cat male numeric weight N/A 2.0 110.0 55.00 63.639610 10.0 32.50 55.0 77.50 100.0
6 dog female numeric weight N/A 2.0 100.0 50.00 42.426407 20.0 35.00 50.0 65.00 80.0
7 dog male numeric weight N/A 4.0 180.0 45.00 23.804761 20.0 27.50 45.0 62.50 70.0
8 cat female categorical state FL 2.0 NaN NaN NaN NaN NaN NaN NaN NaN
9 cat female categorical state NY 1.0 NaN NaN NaN NaN NaN NaN NaN NaN
10 cat female categorical state TX 2.0 NaN NaN NaN NaN NaN NaN NaN NaN
11 cat male categorical state CA 1.0 NaN NaN NaN NaN NaN NaN NaN NaN
12 cat male categorical state TX 1.0 NaN NaN NaN NaN NaN NaN NaN NaN
13 dog female categorical state FL 1.0 NaN NaN NaN NaN NaN NaN NaN NaN
14 dog female categorical state TX 1.0 NaN NaN NaN NaN NaN NaN NaN NaN
15 dog male categorical state CA 1.0 NaN NaN NaN NaN NaN NaN NaN NaN
16 dog male categorical state FL 1.0 NaN NaN NaN NaN NaN NaN NaN NaN
17 dog male categorical state NY 2.0 NaN NaN NaN NaN NaN NaN NaN NaN
18 cat female categorical trained yes 5.0 NaN NaN NaN NaN NaN NaN NaN NaN
19 cat male categorical trained no 2.0 NaN NaN NaN NaN NaN NaN NaN NaN
20 dog female categorical trained no 1.0 NaN NaN NaN NaN NaN NaN NaN NaN
21 dog female categorical trained yes 1.0 NaN NaN NaN NaN NaN NaN NaN NaN
22 dog male categorical trained no 4.0 NaN NaN NaN NaN NaN NaN NaN NaN
In this DataFrame, we have a lot of NaN values.
To drop rows or columns with missing values, we can use the pandas dropna() function.
Let’s say that we want to drop all of the rows which contain NaN values. The following code will remove all rows with NaN values from our DataFrame.
df.dropna()
#output:
animal_type gender type variable level count mean sum std min 25% 50% 75% max
0 cat female numeric age N/A 5.0 3.60 18.0 1.516575 2.0 3.00 3.0 4.00 6.0
1 cat male numeric age N/A 2.0 1.50 3.0 0.707107 1.0 1.25 1.5 1.75 2.0
2 dog female numeric age N/A 2.0 4.00 8.0 0.000000 4.0 4.00 4.0 4.00 4.0
3 dog male numeric age N/A 4.0 3.75 15.0 1.892969 1.0 3.25 4.5 5.00 5.0
4 cat female numeric weight N/A 5.0 54.00 270.0 32.093613 10.0 40.00 50.0 80.00 90.0
5 cat male numeric weight N/A 2.0 55.00 110.0 63.639610 10.0 32.50 55.0 77.50 100.0
6 dog female numeric weight N/A 2.0 50.00 100.0 42.426407 20.0 35.00 50.0 65.00 80.0
7 dog male numeric weight N/A 4.0 45.00 180.0 23.804761 20.0 27.50 45.0 62.50 70.0
If we want to drop all of the columns which contain NaN values, we can pass ‘axis=1’ to dropna().
df.dropna(axis=1)
animal_type gender type variable level count
0 cat female numeric age N/A 5.0
1 cat male numeric age N/A 2.0
2 dog female numeric age N/A 2.0
3 dog male numeric age N/A 4.0
4 cat female numeric weight N/A 5.0
5 cat male numeric weight N/A 2.0
6 dog female numeric weight N/A 2.0
7 dog male numeric weight N/A 4.0
8 cat female categorical state FL 2.0
9 cat female categorical state NY 1.0
10 cat female categorical state TX 2.0
11 cat male categorical state CA 1.0
12 cat male categorical state TX 1.0
13 dog female categorical state FL 1.0
14 dog female categorical state TX 1.0
15 dog male categorical state CA 1.0
16 dog male categorical state FL 1.0
17 dog male categorical state NY 2.0
18 cat female categorical trained yes 5.0
19 cat male categorical trained no 2.0
20 dog female categorical trained no 1.0
21 dog female categorical trained yes 1.0
22 dog male categorical trained no 4.0
Dropping Rows and Columns with the drop_duplicates() pandas Function
With Python, we can find and remove duplicate rows in data very easily using the pandas package and the pandas drop_duplicates() function.
Let’s say we have the following DataFrame:
df = pd.DataFrame({'Name': ['Jim','Jim','Jim','Sally','Bob','Sue','Sue','Larry'],
'Weight':['100','100','200','100','200','150','150','200']})
# Output:
Name Weight
0 Jim 100
1 Jim 100
2 Jim 200
3 Sally 100
4 Bob 200
5 Sue 150
6 Sue 150
7 Larry 200
First, let’s find the duplicate rows in this DataFrame. We can do this easily using the pandas duplicated() function. The duplicated() function returns a Series with boolean values denoting where we have duplicate rows. By default, it marks all duplicates as True except the first occurrence.
print(df.duplicated())
# Output:
0 False
1 True
2 False
3 False
4 False
5 False
6 True
7 False
dtype: bool
We see above that we have 2 duplicate rows. If we want to remove these duplicate rows, we can use the pandas drop_duplicates() function like in the following Python code:
print(df.drop_duplicates())
# Output:
Name Weight
0 Jim 100
2 Jim 200
3 Sally 100
4 Bob 200
5 Sue 150
7 Larry 200
Hopefully this article has been beneficial for you to understand how to remove rows and columns from your pandas DataFrames in Python.