To check whether the strings in a pandas Series start with certain characters, you can use the pandas startswith() function, which is available through the .str accessor.

df["Name"].str.startswith("M")  # Returns a boolean Series showing which rows have a name starting with M

When working with data, the ability to get, search for or filter information from your data is valuable.

With the pandas package, there are many powerful functions which allow you to perform different operations.

One such operation is checking if a string starts with certain characters.

The pandas startswith() function allows you to check, for each string in a Series, whether it starts with certain characters.

Let’s say we have the following DataFrame.

import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})

print(df)

# Output: 
    Name  Weight
0    Jim  160.20
1  Sally  123.81
2    Bob  209.45
3    Sue  150.35
4   Jill  102.43
5  Larry  187.52

Let’s see how many names start with the letter ‘J’ in our DataFrame with startswith().

According to the documentation, you pass a character sequence to startswith(), and it returns a boolean Series indicating which records start with that character sequence.

import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})

print(df["Name"].str.startswith('J'))

#Output:
0     True
1    False
2    False
3    False
4     True
5    False
Name: Name, dtype: bool

As we can see, we have two names which start with ‘J’.
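
If you want the count itself rather than reading it off the printed Series, one option is to sum the boolean values. The snippet below is a small sketch of that idea; the variable name j_count is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})

# True counts as 1 and False as 0, so sum() gives the number of matches
j_count = int(df["Name"].str.startswith('J').sum())

print(j_count)
# Output: 2
```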

Filtering a DataFrame with pandas startswith() Function

As shown above, startswith() returns a series of boolean values. We can use these boolean values to filter the original DataFrame.

To filter a DataFrame, pass the returned boolean Series inside square brackets, as in df[boolean_series]. Only the rows where the value is True are kept.

Let’s use the result from above to filter our DataFrame and just get the records which have names starting with ‘J’.

import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})

boolean_series = df["Name"].str.startswith('J')

filtered_df = df[boolean_series]

print(filtered_df)

#Output:
   Name  Weight
0   Jim  160.20
4  Jill  102.43
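
The same filter can also be written in one step, and the mask can be inverted with the ~ operator to keep the rows that do not match. The sketch below assumes the same DataFrame as above; the names j_names and other_names are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})

# Build the boolean mask and filter in a single expression
j_names = df[df["Name"].str.startswith('J')]

# ~ inverts the mask, keeping the names that do NOT start with 'J'
other_names = df[~df["Name"].str.startswith('J')]

print(j_names["Name"].tolist())
# Output: ['Jim', 'Jill']

print(other_names["Name"].tolist())
# Output: ['Sally', 'Bob', 'Sue', 'Larry']
```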

Handling NaN with the pandas startswith() Function

If the column you are looking at has NaN values, then by default, startswith() will return NaN for those values.

If you want to change this, then you can use the second parameter to change the behavior and return a different value for NaN.

Let’s say we have a DataFrame similar to the one above, but now with some NaN values.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['Jim', 'Sally', np.nan, 'Sue', 'Jill', np.nan],
                   'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})

print(df)

#Output:
    Name  Weight
0    Jim  160.20
1  Sally  123.81
2    NaN  209.45
3    Sue  150.35
4   Jill  102.43
5    NaN  187.52

If we want to find all of the records that start with ‘S’, we can use startswith() and pass ‘S’.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['Jim', 'Sally', np.nan, 'Sue', 'Jill', np.nan],
                   'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})

print(df["Name"].str.startswith('S'))

#Output:
0    False
1     True
2      NaN
3     True
4    False
5      NaN
Name: Name, dtype: object

As you can see, we now have NaN values in our boolean series. If you try to use this series to filter the original DataFrame, you will get a ValueError, because a mask that contains NaN is not a valid boolean mask.

To make sure you don’t get this error, you can pass a value to the second parameter ‘na’.

If you want to filter a DataFrame, the best value to pass here is the boolean False, because rows with NaN names are then treated as non-matches and are dropped from the filtered result as well.

The example below shows how to use startswith() with na=False so that the rows with NaN names are excluded when filtering.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['Jim', 'Sally', np.nan, 'Sue', 'Jill', np.nan],
                   'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})

boolean_series = df["Name"].str.startswith('S', na=False)

filtered_df = df[boolean_series]

print(filtered_df)

#Output:
    Name  Weight
1  Sally  123.81
3    Sue  150.35
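
If you would rather keep the rows with missing names, passing na=True should have the opposite effect: NaN entries are treated as matches. The sketch below assumes the same data as above; the name keep_missing is only illustrative.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['Jim', 'Sally', np.nan, 'Sue', 'Jill', np.nan],
                   'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})

# na=True fills the missing positions with True, so those rows survive the filter
keep_missing = df[df["Name"].str.startswith('S', na=True)]

print(keep_missing.index.tolist())
# Output: [1, 2, 3, 5]
```

Whether False or True is the better fill value depends on whether a missing name should count as a match in your analysis.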

Hopefully this article has been useful for you to learn how to use the pandas startswith() function.

Last Update: March 11, 2024