To split a column by delimiter when using pandas in Python, you can use the pandas str.split() function.

import pandas as pd 

df = pd.DataFrame({"name":["Bob Smith","Penny Johnson","Lorenzo Diaz","Juan Perez","Maria Rizzo"]})

df[["first_name","last_name"]] = df["name"].str.split(" ",expand=True)

print(df)

#Output:
            name first_name last_name
0      Bob Smith        Bob     Smith
1  Penny Johnson      Penny   Johnson
2   Lorenzo Diaz    Lorenzo      Diaz
3     Juan Perez       Juan     Perez
4    Maria Rizzo      Maria     Rizzo

When working with data, being able to parse values and derive new pieces of information from them is very valuable.

One common case is splitting a column in a pandas DataFrame by a delimiter to create new columns.

You can do this with the pandas str.split() function, which is available on string columns through the .str accessor.

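The str.split() function takes three main parameters. The first, pat, is the delimiter to split on. The second, n, is the maximum number of splits to perform. The third, expand, controls whether the pieces are expanded into separate columns. Recent pandas versions also accept a regex parameter that controls whether the delimiter is treated as a regular expression.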
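
As a quick illustration of how these parameters interact, here is a minimal sketch on a small made-up Series. The sample values and variable names are assumptions chosen only for this example.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the parameters
s = pd.Series(["a-b-c", "d-e-f"])

# pat: the delimiter; with the default expand=False each row becomes a list
as_lists = s.str.split("-")

# n: the maximum number of splits per row (n=1 splits only at the first "-")
one_split = s.str.split("-", n=1)

# expand=True: spread the pieces across the columns of a new DataFrame
as_columns = s.str.split("-", expand=True)

print(as_lists.tolist())    # [['a', 'b', 'c'], ['d', 'e', 'f']]
print(one_split.tolist())   # [['a', 'b-c'], ['d', 'e-f']]
print(as_columns.shape)     # (2, 3)
```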

Let’s say we have the following pandas DataFrame.

import pandas as pd 

df = pd.DataFrame({"name":["Bob Smith","Penny Johnson","Lorenzo Diaz","Juan Perez","Maria Rizzo"]})

print(df)

#Output:
            name
0      Bob Smith
1  Penny Johnson
2   Lorenzo Diaz
3     Juan Perez
4    Maria Rizzo

In the column “name”, you can see we have the first name and last name of some people, separated by a space.

If you want to split this column on the space, you can use str.split() to extract the pieces and create new columns called “first_name” and “last_name”.

The example below shows how to split a column by delimiter with the pandas str.split() function and create new columns in Python.

import pandas as pd 

df = pd.DataFrame({"name":["Bob Smith","Penny Johnson","Lorenzo Diaz","Juan Perez","Maria Rizzo"]})

df[["first_name","last_name"]] = df["name"].str.split(" ",expand=True)

print(df)

#Output:
            name first_name last_name
0      Bob Smith        Bob     Smith
1  Penny Johnson      Penny   Johnson
2   Lorenzo Diaz    Lorenzo      Diaz
3     Juan Perez       Juan     Perez
4    Maria Rizzo      Maria     Rizzo

Splitting Column by Delimiter with str.split() in pandas

There are a few different ways you can use str.split() to split a column in pandas.

If you just want to split a column by a delimiter, as Python's built-in str.split() does, and create a column whose values are lists of the pieces, you can pass False to expand. This is also the default when expand is not given.

Below is an example of using str.split() without expanding the result.

import pandas as pd 

df = pd.DataFrame({"name":["Bob Smith","Penny Johnson","Lorenzo Diaz","Juan Perez","Maria Rizzo"]})

df["split"] = df["name"].str.split(" ",expand=False)

print(df)

#Output:
            name             split
0      Bob Smith      [Bob, Smith]
1  Penny Johnson  [Penny, Johnson]
2   Lorenzo Diaz   [Lorenzo, Diaz]
3     Juan Perez     [Juan, Perez]
4    Maria Rizzo    [Maria, Rizzo]
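
Once a column holds lists, you may still want individual pieces out of it. One way to do that, sketched below, is to index into each list through the .str accessor. The column names "first_name" and "last_name" and the variable parts are just illustrative choices here.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Bob Smith", "Penny Johnson", "Lorenzo Diaz"]})

# expand defaults to False, so each value in parts is a list of pieces
parts = df["name"].str.split(" ")

# .str[i] indexes into each row's list
df["first_name"] = parts.str[0]    # first piece of every list
df["last_name"] = parts.str[-1]    # last piece of every list

print(df)
```

This can be handy when rows have different numbers of pieces, since you only pull out the positions you care about.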

Another case is when you only want to perform some of the splits instead of all of them. For example, let’s say you only wanted to get the first name from names that also include a middle name.

In this case, you can pass 1 to the second parameter, n. Each row is then split only at the first delimiter, so you get the first piece in one column and the rest of the string in another.

import pandas as pd 

df = pd.DataFrame({"name":["Bob Anthony Smith","Penny Frida Johnson","Lorenzo Carlos Diaz","Juan Pablo Perez","Maria Jane Rizzo"]})

print(df["name"].str.split(" ",n=1,expand=True))

#Output:
         0              1
0      Bob  Anthony Smith
1    Penny  Frida Johnson
2  Lorenzo    Carlos Diaz
3     Juan    Pablo Perez
4    Maria     Jane Rizzo
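
The limited split can also be assigned straight to new columns. The sketch below does that, and additionally uses str.rsplit(), which splits starting from the right, to peel off the last name instead. The column names "rest" and "given_names" are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Bob Anthony Smith", "Penny Frida Johnson", "Lorenzo Carlos Diaz"]})

# Split once from the left: first name, then everything else
df[["first_name", "rest"]] = df["name"].str.split(" ", n=1, expand=True)

# Split once from the right: everything before the last space, then the last name
df[["given_names", "last_name"]] = df["name"].str.rsplit(" ", n=1, expand=True)

print(df[["first_name", "rest", "given_names", "last_name"]])
```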

How to Concatenate Strings and Create New Column with pandas

If you want to go the other way and concatenate string columns in pandas to create a new column, you can use the pandas str.cat() function.

Let’s say you have the following DataFrame with some first and last names.

import pandas as pd 

df = pd.DataFrame({"first_name":["Bob","Penny","Lorenzo","Juan","Maria"], "last_name":["Smith","Johnson","Diaz","Perez","Rizzo"]})

print(df)

#Output:
  first_name last_name
0        Bob     Smith
1      Penny   Johnson
2    Lorenzo      Diaz
3       Juan     Perez
4      Maria     Rizzo

To create a new column called “name” with the first and last name concatenated, you can call str.cat() on the first column, pass it the second column, and set sep to the separator you want between them.

The example below shows how to concatenate strings in pandas using str.cat().

import pandas as pd 

df = pd.DataFrame({"first_name":["Bob","Penny","Lorenzo","Juan","Maria"], "last_name":["Smith","Johnson","Diaz","Perez","Rizzo"]})

df["name"] = df['first_name'].str.cat(df['last_name'], sep=' ')

print(df)

#Output:
  first_name last_name           name
0        Bob     Smith      Bob Smith
1      Penny   Johnson  Penny Johnson
2    Lorenzo      Diaz   Lorenzo Diaz
3       Juan     Perez     Juan Perez
4      Maria     Rizzo    Maria Rizzo
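
One thing to watch for with str.cat() is missing values. The sketch below uses a made-up DataFrame with some missing names to show the default behavior and the na_rep parameter, which substitutes a string for missing values before concatenating. The variable names default_name and filled_name are illustrative only.

```python
import pandas as pd

# Hypothetical data with missing first or last names
df = pd.DataFrame({
    "first_name": ["Bob", "Penny", None],
    "last_name": ["Smith", None, "Diaz"],
})

# By default, a missing value on either side makes the result missing
default_name = df["first_name"].str.cat(df["last_name"], sep=" ")

# With na_rep, missing values are replaced by the given string first
filled_name = df["first_name"].str.cat(df["last_name"], sep=" ", na_rep="")

# Strip the leftover separator where one side was empty
filled_name = filled_name.str.strip()

print(default_name.isna().tolist())  # [False, True, True]
print(filled_name.tolist())          # ['Bob Smith', 'Penny', 'Diaz']
```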

Hopefully this article has been useful for you to learn how to split a column by delimiter in a pandas DataFrame in Python.

Last Update: March 11, 2024