To write a pickle file to an AWS S3 bucket using Python and pandas, you can use the boto3 package to access the S3 bucket. After connecting to S3, create an in-memory file buffer with io.BytesIO(). Then write the DataFrame to the buffer with the pandas to_pickle() function. Finally, use the put_object() method to upload the pickled bytes to a specified key in the S3 bucket.
import pandas as pd
import io
import boto3

s3c = boto3.client(
    "s3",
    region_name="us-east-2",
    aws_access_key_id="YOUR_AWS_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_AWS_SECRET_ACCESS_KEY",
)

# df is the DataFrame you want to upload
pickle_buffer = io.BytesIO()
df.to_pickle(pickle_buffer)

s3c.put_object(Body=pickle_buffer.getvalue(), Bucket="YOUR-BUCKET", Key="FILENAME")
When working with different datasets and file types, it is useful to be able to read, write and move between them easily.
One such case is if you have data locally and you want to write it to an AWS S3 bucket using Python.
You can use the boto3 package which allows you to create, configure and manage AWS services.
With boto3, you can write data to an AWS S3 bucket.
To start, you need to connect to AWS. This is done with the boto3 client() function, passing your access key and secret access key to authenticate.
Next, we need to prepare the data we want to write to the AWS S3 bucket. The client method used to upload the pandas DataFrame is put_object().
The "Body" parameter accepts bytes (among other types), so you should build an in-memory file buffer and pass its contents to this parameter.
To create a file buffer, use io.BytesIO(). Then write the DataFrame to the buffer with the pandas to_pickle() function.
Finally, call put_object() to send the pickled bytes to a specified key in the AWS S3 bucket.
Below is the complete code for writing a pickle file to an AWS S3 bucket.
import pandas as pd
import io
import boto3

s3c = boto3.client(
    "s3",
    region_name="us-east-2",
    aws_access_key_id="YOUR_AWS_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_AWS_SECRET_ACCESS_KEY",
)

# df is the DataFrame you want to upload
pickle_buffer = io.BytesIO()
df.to_pickle(pickle_buffer)

s3c.put_object(Body=pickle_buffer.getvalue(), Bucket="YOUR-BUCKET", Key="FILENAME")
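You can check the buffer pattern locally without touching S3. The sketch below (with a sample DataFrame standing in for your data) pickles into a BytesIO buffer, takes the bytes that put_object() would receive, and reads them back to confirm the round trip:

```python
import io
import pandas as pd

# Sample DataFrame standing in for your data
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Serialize into an in-memory buffer, as in the upload code above
pickle_buffer = io.BytesIO()
df.to_pickle(pickle_buffer)

# getvalue() returns the raw bytes that put_object() would send
raw_bytes = pickle_buffer.getvalue()

# Reading the bytes back reproduces the original DataFrame
df_roundtrip = pd.read_pickle(io.BytesIO(raw_bytes))
print(df.equals(df_roundtrip))  # True
```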
How to Write Excel Files and CSV Files to AWS S3 Buckets in Python
If you want to write Excel files or CSV files to an AWS S3 bucket, you can follow the same code structure as above.
to_excel() and to_csv() both allow you to pass a buffer. Note that to_excel() writes binary data, so use io.BytesIO(), while to_csv() writes text, so io.StringIO() is the better fit.
Below shows an example of how you could write an Excel file to an AWS S3 bucket using Python and pandas.
import pandas as pd
import io
import boto3

s3c = boto3.client(
    "s3",
    region_name="us-east-2",
    aws_access_key_id="YOUR_AWS_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_AWS_SECRET_ACCESS_KEY",
)

# to_excel() writes binary data, so a BytesIO buffer is used
excel_buffer = io.BytesIO()
df.to_excel(excel_buffer)

s3c.put_object(Body=excel_buffer.getvalue(), Bucket="YOUR-BUCKET", Key="FILENAME")
For writing a CSV file to an AWS S3 bucket, the code has the same structure.
import pandas as pd
import io
import boto3

s3c = boto3.client(
    "s3",
    region_name="us-east-2",
    aws_access_key_id="YOUR_AWS_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_AWS_SECRET_ACCESS_KEY",
)

# to_csv() writes text, so a StringIO buffer is used here
csv_buffer = io.StringIO()
df.to_csv(csv_buffer)

s3c.put_object(Body=csv_buffer.getvalue(), Bucket="YOUR-BUCKET", Key="FILENAME")
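The CSV buffer step can also be verified locally. This minimal sketch (with a sample DataFrame) writes into a StringIO buffer and inspects the text that put_object() would upload:

```python
import io
import pandas as pd

# Sample DataFrame standing in for your data
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# to_csv() produces text, so StringIO is the natural buffer type;
# index=False omits the row index from the output
csv_buffer = io.StringIO()
df.to_csv(csv_buffer, index=False)

# getvalue() returns the CSV text that would be uploaded
csv_text = csv_buffer.getvalue()
print(csv_text)  # a,b / 1,3 / 2,4
```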
How to Read Pickle File from an AWS S3 Bucket Using Python
If you want to read a pickle file from an AWS S3 bucket using Python and pandas, you can do something similar to what we did above, but this time using the boto3 get_object() method.
After connecting to the S3 bucket, use get_object() to fetch the file by its key.
Finally, wrap the returned bytes in io.BytesIO() and pass the buffer to the pandas read_pickle() function.
import pandas as pd
import io
import boto3

s3c = boto3.client(
    "s3",
    region_name="us-east-2",
    aws_access_key_id="YOUR_AWS_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_AWS_SECRET_ACCESS_KEY",
)

# Fetch the object and unpickle its bytes into a DataFrame
obj = s3c.get_object(Bucket="YOUR-BUCKET", Key="FILENAME")
df = pd.read_pickle(io.BytesIO(obj["Body"].read()))
Hopefully this article has been useful for you to learn how to write pickle, Excel, and CSV files to an AWS S3 bucket, and how to read a pickle file back, using Python and the pandas module.