To read a CSV file from an AWS S3 bucket using Python and pandas, you can use the boto3 package to access the bucket. Once you have a client, call its get_object() method to retrieve the file by its key. Finally, wrap the object's body in an io.BytesIO() buffer and pass that buffer to the pandas read_csv() function.

import pandas as pd
import io
import boto3

s3c = boto3.client(
    "s3",
    region_name="us-east-2",
    aws_access_key_id="YOUR AWS_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR AWS_SECRET_ACCESS_KEY",
)
obj = s3c.get_object(Bucket="YOUR-BUCKET", Key="FILENAME")
df = pd.read_csv(io.BytesIO(obj["Body"].read()))

When working with data from different sources and file types, being able to read each of them easily into your program is useful.

One such case is if you have data in an AWS S3 bucket and you want to read it into your Python program.

You can use the boto3 package, which allows you to create, configure, and manage AWS services.

With boto3, you can access the data in an AWS S3 bucket.

To start, you need to connect to AWS. This is done by creating a client with the boto3 client() function, passing your access key and secret access key to authenticate. (Note that boto3 can also pick up credentials from environment variables or the ~/.aws/credentials file, so hard-coding keys in source code is best avoided outside of quick experiments.)

Next, we want to get the object in question. To get the file, call the client's get_object() method and pass the bucket name and file name to the Bucket and Key parameters, respectively.

Now that we have the file, we can read it.

The get_object() method returns a dictionary with a few different pieces of information, but what we care about is the "Body" entry of the response.

The pandas read_csv() function can read from a file path or a buffer. Therefore, to read the CSV file from the AWS S3 bucket, one solution is to read the "Body" of the object into bytes, wrap those bytes in an io.BytesIO() buffer, and pass the buffer to read_csv().
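Setting the S3 call aside, the bytes-to-DataFrame step can be checked locally. The sketch below uses a made-up byte string to stand in for what obj["Body"].read() would return:

```python
import io

import pandas as pd

# Simulate the raw bytes that obj["Body"].read() would return from S3.
csv_bytes = b"name,score\nalice,1\nbob,2\n"

# Wrap the bytes in a file-like buffer and let read_csv parse it.
df = pd.read_csv(io.BytesIO(csv_bytes))

print(df.shape)          # rows and columns parsed from the buffer
print(list(df.columns))
```

This confirms that read_csv() only needs something file-like to read from; it does not care whether the bytes came from disk or from S3.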

The complete code for reading a CSV file from an AWS S3 bucket is shown below.

import pandas as pd
import io
import boto3

s3c = boto3.client(
    "s3",
    region_name="us-east-2",
    aws_access_key_id="YOUR AWS_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR AWS_SECRET_ACCESS_KEY",
)
obj = s3c.get_object(Bucket="YOUR-BUCKET", Key="FILENAME")
df = pd.read_csv(io.BytesIO(obj["Body"].read()))

How to Read Excel Files and Pickle Files from AWS S3 Buckets in Python

If you want to read Excel files or pickle files from an AWS S3 bucket, you can follow the same code structure as above.

read_excel() and read_pickle() both allow you to pass a buffer, and so you can use io.BytesIO() to create the buffer.
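The buffer round trip that read_pickle() relies on can also be sketched locally, without touching S3. The DataFrame below is a made-up example:

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

# Serialize the DataFrame into an in-memory buffer, standing in for
# the bytes you would download from (or upload to) S3.
buf = io.BytesIO()
df.to_pickle(buf)
buf.seek(0)  # rewind so read_pickle starts at the beginning

restored = pd.read_pickle(buf)
```

Because pickle is a binary format, io.BytesIO() (rather than io.StringIO()) is the right buffer type here.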

Below is an example of how you could read an Excel file from an AWS S3 bucket using Python and pandas.

import pandas as pd
import io
import boto3

s3c = boto3.client(
    "s3",
    region_name="us-east-2",
    aws_access_key_id="YOUR AWS_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR AWS_SECRET_ACCESS_KEY",
)
obj = s3c.get_object(Bucket="YOUR-BUCKET", Key="FILENAME")
# Reading .xlsx files requires an engine such as openpyxl to be installed.
df = pd.read_excel(io.BytesIO(obj["Body"].read()))

For reading a pickle file from an AWS S3 bucket, the code has the same structure.

import pandas as pd
import io
import boto3

s3c = boto3.client(
    "s3",
    region_name="us-east-2",
    aws_access_key_id="YOUR AWS_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR AWS_SECRET_ACCESS_KEY",
)
obj = s3c.get_object(Bucket="YOUR-BUCKET", Key="FILENAME")
df = pd.read_pickle(io.BytesIO(obj["Body"].read()))

How to Write CSV File to an AWS S3 Bucket Using Python

If you want to write a CSV file to an AWS S3 bucket, you can do something similar to what we have done above, but this time using the boto3 put_object() method.

To write a CSV file to an AWS S3 Bucket using Python and pandas, you can use the boto3 package to access the S3 bucket.

After creating the client, create an in-memory file buffer with io.BytesIO(). Then write the DataFrame into the buffer with the pandas to_csv() function.

Finally, you can use the put_object() method to send the CSV data to a specified file location in the AWS S3 bucket.

import pandas as pd
import io
import boto3

s3c = boto3.client(
    "s3",
    region_name="us-east-2",
    aws_access_key_id="YOUR AWS_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR AWS_SECRET_ACCESS_KEY",
)

# df is the DataFrame you want to upload
csv_buffer = io.BytesIO()
df.to_csv(csv_buffer)
s3c.put_object(Body=csv_buffer.getvalue(), Bucket="YOUR-BUCKET", Key="FILENAME")
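The to_csv()-into-a-buffer step can be verified locally before involving S3. The DataFrame contents below are made up for illustration:

```python
import io

import pandas as pd

df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp": [3, 21]})

# Write the CSV into an in-memory buffer instead of a local file;
# getvalue() then yields the exact bytes payload that put_object()
# would upload as the object body.
csv_buffer = io.BytesIO()
df.to_csv(csv_buffer, index=False)

payload = csv_buffer.getvalue()
print(payload.decode("utf-8"))
```

Note that writing to a binary buffer with to_csv() requires pandas 1.2 or later; on older versions you could write to an io.StringIO() and encode the result yourself.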

Hopefully this article has been useful for you to learn how to read and write CSV, Excel, and pickle files in an AWS S3 bucket using Python and the pandas module.


Last Update: March 1, 2024