To drop rows from a pandas DataFrame, the easiest way is to use the pandas drop() function.

df.drop(1) #drop the row with index 1

When working with data, it can be useful to add or delete elements from your dataset easily. By deleting elements from your data, you are able to focus more on the elements that matter. In addition, removing unnecessary rows and columns can make data processing much faster and more efficient.

When working with pandas, we can easily drop rows with the pandas drop() function.

Let’s say we have the following pandas DataFrame.

import pandas as pd 

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [130.54, 160.20, 209.45, 150.35, 117.73, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42],
                   'Age': [43,23,71,49,52,37] })

print(df)
# Output: 
    Name  Weight  Height  Age
0    Jim  130.54   50.10   43
1  Sally  160.20   68.94   23
2    Bob  209.45   71.42   71
3    Sue  150.35   48.56   49
4   Jill  117.73   59.37   52
5  Larry  187.52   63.42   37

If we want to drop the rows with index 1 and 3, we can do so easily in the following way:

print(df.drop([1,3]))

# Output: 
    Name  Weight  Height  Age
0    Jim  130.54   50.10   43
2    Bob  209.45   71.42   71
4   Jill  117.73   59.37   52
5  Larry  187.52   63.42   37

Like many other pandas DataFrame functions, you can pass the “inplace” parameter to perform the drop inplace and return a new DataFrame with the deleted rows.

How to Drop First Row of pandas DataFrame

If you want to drop the first row of a pandas DataFrame, all you need to do is pass ‘0’ to drop().

Below shows you how to drop the first row of a pandas DataFrame in Python.

import pandas as pd 

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [130.54, 160.20, 209.45, 150.35, 117.73, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42],
                   'Age': [43,23,71,49,52,37] })

print(df)
print(df.drop(0))

# Output: 
    Name  Weight  Height  Age
0    Jim  130.54   50.10   43
1  Sally  160.20   68.94   23
2    Bob  209.45   71.42   71
3    Sue  150.35   48.56   49
4   Jill  117.73   59.37   52
5  Larry  187.52   63.42   37

    Name  Weight  Height  Age
1  Sally  160.20   68.94   23
2    Bob  209.45   71.42   71
3    Sue  150.35   48.56   49
4   Jill  117.73   59.37   52
5  Larry  187.52   63.42   37

If you want to drop the first n rows of a pandas DataFrame, all you need to do is pass ‘df.index[:n]’ to drop(), where n is the number of rows you want to drop.

Below shows you how to drop the first n rows of a pandas DataFrame in Python.

import pandas as pd 

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [130.54, 160.20, 209.45, 150.35, 117.73, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42],
                   'Age': [43,23,71,49,52,37] })

print(df)
print(df.drop(df.index[:3]))

# Output: 
    Name  Weight  Height  Age
0    Jim  130.54   50.10   43
1  Sally  160.20   68.94   23
2    Bob  209.45   71.42   71
3    Sue  150.35   48.56   49
4   Jill  117.73   59.37   52
5  Larry  187.52   63.42   37

    Name  Weight  Height  Age
3    Sue  150.35   48.56   49
4   Jill  117.73   59.37   52
5  Larry  187.52   63.42   37

How to Drop Last Row of pandas DataFrame

If you want to drop the last row of a pandas DataFrame, all you need to do is pass ‘df.index[-1:]’ to drop().

Below shows you how to drop the last row of a pandas DataFrame in Python.

import pandas as pd 

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [130.54, 160.20, 209.45, 150.35, 117.73, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42],
                   'Age': [43,23,71,49,52,37] })

print(df)
print(df.drop(-1))

# Output: 
    Name  Weight  Height  Age
0    Jim  130.54   50.10   43
1  Sally  160.20   68.94   23
2    Bob  209.45   71.42   71
3    Sue  150.35   48.56   49
4   Jill  117.73   59.37   52
5  Larry  187.52   63.42   37

    Name  Weight  Height  Age
1  Sally  160.20   68.94   23
2    Bob  209.45   71.42   71
3    Sue  150.35   48.56   49
4   Jill  117.73   59.37   52
5  Larry  187.52   63.42   37

If you want to drop the last n rows of a pandas DataFrame, all you need to do is pass ‘df.index[-n:]’ to drop(), where n is the number of rows you want to drop.

Below shows you how to drop the last n rows of a pandas DataFrame in Python.

import pandas as pd 

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [130.54, 160.20, 209.45, 150.35, 117.73, 187.52],
                   'Height': [50.10, 68.94, 71.42, 48.56, 59.37, 63.42],
                   'Age': [43,23,71,49,52,37] })

print(df)
print(df.drop(df.index[-3:]))

# Output: 
    Name  Weight  Height  Age
0    Jim  130.54   50.10   43
1  Sally  160.20   68.94   23
2    Bob  209.45   71.42   71
3    Sue  150.35   48.56   49
4   Jill  117.73   59.37   52
5  Larry  187.52   63.42   37

    Name  Weight  Height  Age
0    Jim  130.54   50.10   43
1  Sally  160.20   68.94   23
2    Bob  209.45   71.42   71

Dropping Rows with the dropna() pandas Function

When working with data, many time we need to deal with missing values in our datasets. One way to deal with missing data is to drop them from our dataset, and the pandas package has a very useful function for dropping rows with duplicates and dropping rows with NaN values.

If you want to delete rows with missing values, we can use the pandas dropna() function.

Let’s say I have the following DataFrame of summarized data:

   animal_type  gender         type variable level  count    sum   mean        std   min    25%   50%    75%    max
0          cat  female      numeric      age   N/A    5.0   18.0   3.60   1.516575   2.0   3.00   3.0   4.00    6.0
1          cat    male      numeric      age   N/A    2.0    3.0   1.50   0.707107   1.0   1.25   1.5   1.75    2.0
2          dog  female      numeric      age   N/A    2.0    8.0   4.00   0.000000   4.0   4.00   4.0   4.00    4.0
3          dog    male      numeric      age   N/A    4.0   15.0   3.75   1.892969   1.0   3.25   4.5   5.00    5.0
4          cat  female      numeric   weight   N/A    5.0  270.0  54.00  32.093613  10.0  40.00  50.0  80.00   90.0
5          cat    male      numeric   weight   N/A    2.0  110.0  55.00  63.639610  10.0  32.50  55.0  77.50  100.0
6          dog  female      numeric   weight   N/A    2.0  100.0  50.00  42.426407  20.0  35.00  50.0  65.00   80.0
7          dog    male      numeric   weight   N/A    4.0  180.0  45.00  23.804761  20.0  27.50  45.0  62.50   70.0
8          cat  female  categorical    state    FL    2.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
9          cat  female  categorical    state    NY    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
10         cat  female  categorical    state    TX    2.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
11         cat    male  categorical    state    CA    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
12         cat    male  categorical    state    TX    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
13         dog  female  categorical    state    FL    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
14         dog  female  categorical    state    TX    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
15         dog    male  categorical    state    CA    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
16         dog    male  categorical    state    FL    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
17         dog    male  categorical    state    NY    2.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
18         cat  female  categorical  trained   yes    5.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
19         cat    male  categorical  trained    no    2.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
20         dog  female  categorical  trained    no    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
21         dog  female  categorical  trained   yes    1.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN    NaN
22         dog    male  categorical  trained    no    4.0    NaN    NaN        NaN   NaN    NaN   NaN    NaN   NaN

In this DataFrame, we have a lot of NaN values.

To drop rows with missing values, we can use the pandas dropna() function.

Let’s say that we want to delete all of the rows which contain NaN values. The following code will remove all rows with NaN values from our DataFrame.

df.dropna()

#output:
   animal_type  gender         type variable level  count   mean    sum        std   min    25%   50%    75%    max
0          cat  female      numeric      age   N/A    5.0   3.60   18.0   1.516575   2.0   3.00   3.0   4.00    6.0
1          cat    male      numeric      age   N/A    2.0   1.50    3.0   0.707107   1.0   1.25   1.5   1.75    2.0
2          dog  female      numeric      age   N/A    2.0   4.00    8.0   0.000000   4.0   4.00   4.0   4.00    4.0
3          dog    male      numeric      age   N/A    4.0   3.75   15.0   1.892969   1.0   3.25   4.5   5.00    5.0
4          cat  female      numeric   weight   N/A    5.0  54.00  270.0  32.093613  10.0  40.00  50.0  80.00   90.0
5          cat    male      numeric   weight   N/A    2.0  55.00  110.0  63.639610  10.0  32.50  55.0  77.50  100.0
6          dog  female      numeric   weight   N/A    2.0  50.00  100.0  42.426407  20.0  35.00  50.0  65.00   80.0
7          dog    male      numeric   weight   N/A    4.0  45.00  180.0  23.804761  20.0  27.50  45.0  62.50   70.0

Dropping Rows with the drop_duplicates() pandas Function

With Python, we can find and remove duplicate rows in data very easily using the pandas package and the pandas drop_duplicates() function.

Let’s say we have the following DataFrame:

df = pd.DataFrame({'Name': ['Jim','Jim','Jim','Sally','Bob','Sue','Sue','Larry'],
                   'Weight':['100','100','200','100','200','150','150','200']})

# Output:
    Name Weight
0    Jim    100
1    Jim    100
2    Jim    200
3  Sally    100
4    Bob    200
5    Sue    150
6    Sue    150
7  Larry    200

First, let’s find the duplicate rows in this DataFrame. We can do this easily using the pandas duplicated() function. The duplicated() function returns a Series with boolean values denoting where we have duplicate rows. By default, it marks all duplicates as True except the first occurrence.

print(df.duplicated())

# Output:
0    False
1     True
2    False
3    False
4    False
5    False
6     True
7    False
dtype: bool

We see above that we have 2 duplicate rows. If we want to remove these duplicate rows, we can use the pandas drop_duplicates() function like in the following Python code:

print(df.drop_duplicates())

# Output:
    Name Weight
0    Jim    100
2    Jim    200
3  Sally    100
4    Bob    200
5    Sue    150
7  Larry    200

Hopefully this article has been beneficial for you to understand how to delete rows from your pandas DataFrames in Python.

Categorized in:

Python,

Last Update: March 20, 2024