To check whether the strings in a pandas column start with certain characters, you can use the pandas startswith() function, which is called through the .str accessor.
df["Name"].str.startswith("M") #Return boolean series with values showing which rows have a name starting with M
When working with data, the ability to get, search for or filter information from your data is valuable.
With the pandas package, there are many powerful functions which allow you to perform different operations.
One such operation is checking if a string starts with certain characters.
The pandas startswith() function allows you to check whether each string value in a column starts with certain characters.
Let’s say we have the following DataFrame.
import pandas as pd
df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})
print(df)
# Output:
Name Weight
0 Jim 160.20
1 Sally 123.81
2 Bob 209.45
3 Sue 150.35
4 Jill 102.43
5 Larry 187.52
Let’s see how many names start with the letter ‘J’ in our DataFrame with startswith().
According to the documentation, you pass a character sequence to startswith(), and it returns a boolean series indicating which records start with that character sequence.
import pandas as pd
df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})
print(df["Name"].str.startswith('J'))
#Output:
0 True
1 False
2 False
3 False
4 True
5 False
Name: Name, dtype: bool
As we can see, we have two names which start with ‘J’.
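Since the result is a boolean series, the count can also be computed instead of read off the printed output. The following is a minimal sketch, assuming the same DataFrame as above; the variable names starts_with_j and count_j are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})

# Boolean series: True where the name starts with 'J'
starts_with_j = df["Name"].str.startswith('J')

# True counts as 1 and False as 0, so sum() gives the number of matches
count_j = int(starts_with_j.sum())
print(count_j)

#Output:
# 2
```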
Filtering a DataFrame with pandas startswith() Function
As shown above, startswith() returns a series of boolean values. To filter the original DataFrame, pass this boolean series to the DataFrame inside square brackets.
Let’s use the result from above to filter our DataFrame and just get the records which have names starting with ‘J’.
import pandas as pd
df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})
boolean_series = df["Name"].str.startswith('J')
filtered_df = df[boolean_series]
print(filtered_df)
#Output:
Name Weight
0 Jim 160.20
4 Jill 102.43
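The intermediate variable is optional. As a rough sketch under the same assumptions as the example above, the filter can be written in one step, and the ~ operator inverts the boolean series to keep the rows that do not match. The names j_df and not_j_df are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry'],
                   'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})

# Filter in one step: keep rows whose name starts with 'J'
j_df = df[df["Name"].str.startswith('J')]

# Invert the boolean series with ~ to keep rows whose name does NOT start with 'J'
not_j_df = df[~df["Name"].str.startswith('J')]

print(j_df["Name"].tolist())
print(not_j_df["Name"].tolist())

#Output:
# ['Jim', 'Jill']
# ['Sally', 'Bob', 'Sue', 'Larry']
```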
Handling NaN with the pandas startswith() Function
If the column you are looking at has NaN values, then by default, startswith() returns NaN for those values.
If you want to change this, you can use the second parameter, na, to return a different value for NaN.
Let’s say we have a DataFrame similar to the one above, but now with some NaN values.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Jim', 'Sally', np.nan, 'Sue', 'Jill', np.nan],
'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})
print(df)
#Output:
Name Weight
0 Jim 160.20
1 Sally 123.81
2 NaN 209.45
3 Sue 150.35
4 Jill 102.43
5 NaN 187.52
If we want to find all of the records that start with ‘S’, we can use startswith() and pass ‘S’.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Jim', 'Sally', np.nan, 'Sue', 'Jill', np.nan],
'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})
print(df["Name"].str.startswith('S'))
#Output:
0 False
1 True
2 NaN
3 True
4 False
5 NaN
Name: Name, dtype: object
As you can see, we now have NaN values in our boolean series. If you try to pass this back to the DataFrame and filter the original DataFrame, you will get a ValueError.
To avoid this error, you can pass a value to the second parameter, na.
If you want to filter a DataFrame, the best value to pass here is False, because the rows with NaN names are then dropped from the result as well.
The example below shows how you can use startswith() with na=False to drop the NaN values from a DataFrame by filtering.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Jim', 'Sally', np.nan, 'Sue', 'Jill', np.nan],
'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})
boolean_series = df["Name"].str.startswith('S', na=False)
filtered_df = df[boolean_series]
print(filtered_df)
#Output:
Name Weight
1 Sally 123.81
3 Sue 150.35
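The na parameter accepts True as well. As a hedged sketch, assuming you would rather keep the rows with missing names (for example, to review them later) than drop them, passing na=True marks the NaN entries as matches. The names mask_keep_missing and kept_df are only illustrative.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['Jim', 'Sally', np.nan, 'Sue', 'Jill', np.nan],
                   'Weight': [160.20, 123.81, 209.45, 150.35, 102.43, 187.52]})

# na=True treats missing names as matches, so those rows survive the filter
mask_keep_missing = df["Name"].str.startswith('S', na=True)
kept_df = df[mask_keep_missing]

# Row labels of the names starting with 'S' plus the rows with missing names
print(kept_df.index.tolist())

#Output:
# [1, 2, 3, 5]
```

Whether False or True is the better choice depends on how missing names should be treated in your analysis.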
Hopefully this article has been useful for you to learn how to use the pandas startswith() function.