Deploy Python Code with Streamlit


Like most projects, the workflow starts with data in CSV format; inside the app, however, I designed the program to read and write DataFrames in pickle format. A major reason to use pickled DataFrames is that the serialized file retains metadata: if you set dtypes to string, integer, or category, those dtypes are retained every time you read in the data. With CSV files, on the other hand, you have to re-process the data back into a suitable DataFrame every time you load it.
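
To make the dtype point concrete, here is a minimal sketch that round-trips a small DataFrame through both formats. The sample data, column names, and temporary file names are made up for illustration and are not taken from the original app.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "product_id": ["001", "002", "003"],   # zero-padded IDs stored as strings
    "region": ["east", "west", "east"],
    "units": [10, 20, 30],
})
df["region"] = df["region"].astype("category")

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "sales.csv")
    pkl_path = os.path.join(tmp, "sales.pkl")

    # Write the same DataFrame in both formats.
    df.to_csv(csv_path, index=False)
    df.to_pickle(pkl_path)

    # Read both back with default settings.
    from_csv = pd.read_csv(csv_path)
    from_pkl = pd.read_pickle(pkl_path)

print("original dtypes:\n", df.dtypes, sep="")
print("after CSV round trip:\n", from_csv.dtypes, sep="")
print("after pickle round trip:\n", from_pkl.dtypes, sep="")
```

After the round trip, the CSV copy has lost the category dtype and has turned the zero-padded product_id strings into integers, while the pickled copy has the same dtypes as the original.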

A simple workflow that starts with a CSV file. Instead of always writing to and reading from subsequent CSV files, try to pickle and read from pickled files instead to retain data states. From the author, Justin Chae.

Although pickle files are pretty sweet, during the last few steps of deployment with Streamlit, I ran into a brick wall of an error with the pickle protocol.

The unsupported pickle protocol error. Solved by changing protocol from 5 to 2.

Have this error?

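Apparently, the Pandas to_pickle() method defaults to the highest pickle protocol available in the interpreter that writes the file, which is version 5 on Python 3.8 and later. Protocol 5 is not universally supported: Python 3.7, which the server in the traceback below is running, can only read protocols up to 4. As a result, although a standard Pandas pickled DataFrame may work in testing on your local machine, deployment to a server is another story.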

ValueError: unsupported pickle protocol: 5
Traceback:
File "/usr/local/lib/python3.7/site-packages/streamlit/script_runner.py", line 332, in _run_script
exec(code, module.__dict__)
File "/app/app_courts/main.py", line 56, in <module>
run_app()
File "/app/app_courts/main.py", line 52, in run_app
, classify=False
File "/app/app_courts/do_data/getter.py", line 124, in to_df
df = pd.read_pickle(path)
File "/home/appuser/.local/lib/python3.7/site-packages/pandas/io/pickle.py", line 182, in read_pickle
return pickle.load(f)
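
The paths in the traceback show a Python 3.7 interpreter on the server. As a rough diagnostic, the standard library reports the highest protocol an interpreter can handle, and a pickle stream written with protocol 2 or higher declares its version in its second byte. The helper name pickle_protocol below is hypothetical, not part of Pandas or Streamlit, and a compressed file would need to be decompressed before its bytes are inspected this way.

```python
import pickle
import sys
from typing import Optional


def pickle_protocol(data: bytes) -> Optional[int]:
    """Return the protocol declared by a pickle byte stream, or None.

    Hypothetical helper for illustration. Streams written with protocol 2
    or higher begin with the PROTO opcode (0x80) followed by one version
    byte. Protocols 0 and 1 carry no such header, so None is returned.
    """
    if len(data) >= 2 and data[0] == 0x80:
        return data[1]
    return None


# What this interpreter can read and what it writes by default.
print("Python version:", sys.version_info[:2])
print("Highest supported protocol:", pickle.HIGHEST_PROTOCOL)
print("Default protocol for pickle.dumps:", pickle.DEFAULT_PROTOCOL)

# Inspect streams written with an explicit protocol.
old_style = pickle.dumps({"a": 1}, protocol=2)
newest = pickle.dumps({"a": 1}, protocol=pickle.HIGHEST_PROTOCOL)
print("Declared protocols:", pickle_protocol(old_style), pickle_protocol(newest))
```

Running the first three print statements on both the local machine and the server is a quick way to see whether the two environments disagree about the highest supported protocol.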

Solution to Pickle Protocol 5 Error

When faced with an error, I sometimes go with an alternate solution, i.e., something that works just as well to produce the same result. However, I had little choice with the data file because of various constraints. While searching for an answer, I discovered that the solution is fairly simple: change the protocol argument of the Pandas to_pickle() method from its default to version 2. When Pandas pickle is combined with BZ2 compression, the result is a super small, super convenient, and very compatible data file.

# to avoid pickle protocol error
# change params from 5 to 2
path = 'data/product_sales.bz2'
df.to_pickle(path, protocol=2)
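
Here is a fuller version of that snippet, as a sketch under stated assumptions: the DataFrame contents and the temporary directory are stand-ins for the real data/product_sales.bz2 file. It relies on Pandas inferring BZ2 compression from the .bz2 extension, then checks the declared protocol and reads the file back.

```python
import bz2
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the real sales data.
df = pd.DataFrame({"units": [10, 20, 30], "price": [1.5, 2.0, 3.25]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "product_sales.bz2")

    # protocol=2 keeps the file readable by older Python versions;
    # compression is inferred from the ".bz2" extension.
    df.to_pickle(path, protocol=2)

    # Decompress and read the two-byte header: PROTO opcode + version.
    with bz2.open(path, "rb") as f:
        header = f.read(2)

    # read_pickle also infers the compression from the extension.
    restored = pd.read_pickle(path)

print("declared protocol:", header[1])
print("round trip OK:", restored.equals(df))
```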
