To split a column by delimiter when using pandas in Python, you can use the pandas str.split() function.
import pandas as pd
df = pd.DataFrame({"name":["Bob Smith","Penny Johnson","Lorenzo Diaz","Juan Perez","Maria Rizzo"]})
df[["first_name","last_name"]] = df["name"].str.split(" ",expand=True)
print(df)
#Output:
            name first_name last_name
0      Bob Smith        Bob     Smith
1  Penny Johnson      Penny   Johnson
2   Lorenzo Diaz    Lorenzo      Diaz
3     Juan Perez       Juan     Perez
4    Maria Rizzo      Maria     Rizzo
When working with data, being able to parse existing values and create new pieces of information from them is very valuable.
One such case is splitting a column in a pandas DataFrame by a delimiter and creating new columns from the pieces.
As shown above, the tool for this is the pandas str.split() function.
The str.split() function takes three main parameters. The first, pat, is the delimiter to split on; the second, n, is the maximum number of splits to perform; and the third, expand, controls whether the split pieces are expanded into separate columns.
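To make the three parameters concrete, here is a minimal sketch that passes each one by keyword. The semicolon-delimited data and the column names are made up for illustration and are not part of the example used in the rest of this article.

```python
import pandas as pd

# Hypothetical data: "last;first" strings separated by a semicolon
df = pd.DataFrame({"name": ["Smith;Bob", "Johnson;Penny", "Diaz;Lorenzo"]})

# pat is the delimiter, n caps the number of splits,
# and expand=True returns the pieces as a DataFrame
parts = df["name"].str.split(pat=";", n=1, expand=True)

# Give the resulting columns (0 and 1 by default) readable names
parts.columns = ["last_name", "first_name"]
print(parts)
```

Passing the arguments by keyword keeps the call readable, which can help when you come back to the code later.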
Let’s say we have the following pandas DataFrame.
import pandas as pd
df = pd.DataFrame({"name":["Bob Smith","Penny Johnson","Lorenzo Diaz","Juan Perez","Maria Rizzo"]})
print(df)
#Output:
            name
0      Bob Smith
1  Penny Johnson
2   Lorenzo Diaz
3     Juan Perez
4    Maria Rizzo
In the column “name”, you can see we have the first name and last name of some people separated by a space.
If you split this column by space, you can extract the pieces and create new columns called “first_name” and “last_name” with str.split().
The example below shows how to split a column by delimiter with the pandas str.split() function and create new columns in Python.
import pandas as pd
df = pd.DataFrame({"name":["Bob Smith","Penny Johnson","Lorenzo Diaz","Juan Perez","Maria Rizzo"]})
df[["first_name","last_name"]] = df["name"].str.split(" ",expand=True)
print(df)
#Output:
            name first_name last_name
0      Bob Smith        Bob     Smith
1  Penny Johnson      Penny   Johnson
2   Lorenzo Diaz    Lorenzo      Diaz
3     Juan Perez       Juan     Perez
4    Maria Rizzo      Maria     Rizzo
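One caveat: the assignment above assumes every name splits into exactly two pieces. The sketch below uses a made-up single-word name (not part of the original data) to show what happens when a row has fewer pieces. With expand=True, the missing piece comes back as a missing value.

```python
import pandas as pd

# Hypothetical data: the second row has no last name
df = pd.DataFrame({"name": ["Bob Smith", "Cher"]})

# Rows with fewer pieces are padded with missing values
df[["first_name", "last_name"]] = df["name"].str.split(" ", expand=True)

# isna() flags the rows where no last name was found
print(df["last_name"].isna())
```

If some rows had more than two pieces, the split would produce more than two columns and would no longer fit a two-column assignment, so it is worth checking your data first.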
Splitting Column by Delimiter with str.split() in pandas
There are a few different ways you can use str.split() to split a column in pandas.
If you just want each row to hold a list of the split values, similar to what Python’s built-in str.split() returns, you can pass False to expand. This is also the default, so leaving expand out gives the same result.
Below is an example of using str.split() without expanding the result.
import pandas as pd
df = pd.DataFrame({"name":["Bob Smith","Penny Johnson","Lorenzo Diaz","Juan Perez","Maria Rizzo"]})
df["split"] = df["name"].str.split(" ",expand=False)
print(df)
#Output:
            name             split
0      Bob Smith      [Bob, Smith]
1  Penny Johnson  [Penny, Johnson]
2   Lorenzo Diaz   [Lorenzo, Diaz]
3     Juan Perez     [Juan, Perez]
4    Maria Rizzo    [Maria, Rizzo]
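Once a column holds lists, you can still reach into them with the str accessor. The following is a small sketch of that idea; the column names “first_name” and “n_parts” are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Bob Smith", "Penny Johnson", "Lorenzo Diaz"]})

# Without expand, each row holds a list of the split values
df["split"] = df["name"].str.split(" ")

# .str[0] picks the first element of each list
df["first_name"] = df["split"].str[0]

# .str.len() counts how many pieces each row was split into
df["n_parts"] = df["split"].str.len()
print(df)
```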
Another case is when you want to limit the number of splits instead of splitting on every delimiter. For example, let’s say you only wanted to separate the first name from the rest of the name.
In this case, you could pass 1 to the second parameter, n. Each row is then split only at the first delimiter, so the first value lands in one column and the remainder of the string lands in another.
import pandas as pd
df = pd.DataFrame({"name":["Bob Anthony Smith","Penny Frida Johnson","Lorenzo Carlos Diaz","Juan Pablo Perez","Maria Jane Rizzo"]})
print(df["name"].str.split(" ",n=1,expand=True))
#Output:
         0              1
0      Bob  Anthony Smith
1    Penny  Frida Johnson
2  Lorenzo    Carlos Diaz
3     Juan    Pablo Perez
4    Maria     Jane Rizzo
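To keep just the first name as a column, you can select column 0 of the expanded result. As a complement, pandas also has str.rsplit(), which splits starting from the right, so the same idea can pull out the last name. This is a sketch using a shortened version of the data above.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Bob Anthony Smith", "Penny Frida Johnson", "Lorenzo Carlos Diaz"]})

# Split once from the left and keep the first piece
df["first_name"] = df["name"].str.split(" ", n=1, expand=True)[0]

# Split once from the right and keep the last piece
df["last_name"] = df["name"].str.rsplit(" ", n=1, expand=True)[1]
print(df)
```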
How to Concatenate Strings and Create New Column with pandas
If you want to go the other way and concatenate strings in pandas and create columns, you can use the pandas str.cat() function.
Let’s say you have the following DataFrame with some first and last names.
import pandas as pd
df = pd.DataFrame({"first_name":["Bob","Penny","Lorenzo","Juan","Maria"], "last_name":["Smith","Johnson","Diaz","Perez","Rizzo"]})
print(df)
#Output:
  first_name last_name
0        Bob     Smith
1      Penny   Johnson
2    Lorenzo      Diaz
3       Juan     Perez
4      Maria     Rizzo
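To create a new column called “name” with the first and last name concatenated, call str.cat() on the first column, pass the second column as the argument, and set sep to the separator you want between them.
The example below shows how to concatenate strings in pandas using str.cat().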
import pandas as pd
df = pd.DataFrame({"first_name":["Bob","Penny","Lorenzo","Juan","Maria"], "last_name":["Smith","Johnson","Diaz","Perez","Rizzo"]})
df["name"] = df['first_name'].str.cat(df['last_name'], sep=' ')
print(df)
#Output:
  first_name last_name           name
0        Bob     Smith      Bob Smith
1      Penny   Johnson  Penny Johnson
2    Lorenzo      Diaz   Lorenzo Diaz
3       Juan     Perez     Juan Perez
4      Maria     Rizzo    Maria Rizzo
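Since str.cat() is the reverse of str.split(), you can check one against the other. The sketch below splits a column and joins it back, and also shows the plain + operator as an alternative way to concatenate string columns. The variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Bob Smith", "Penny Johnson", "Lorenzo Diaz"]})

# Split into two columns, labeled 0 and 1
parts = df["name"].str.split(" ", expand=True)

# Join the pieces back together with str.cat()
rebuilt = parts[0].str.cat(parts[1], sep=" ")

# The + operator on string columns gives the same values
rebuilt_plus = parts[0] + " " + parts[1]

# Compare the rebuilt values with the original column
print(rebuilt.tolist() == df["name"].tolist())
```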
Hopefully this article has been useful for you to learn how to split a column by delimiter in a pandas DataFrame in Python.