Deploy Python Code with Streamlit
Like most projects, the workflow starts with data in CSV format; however, internal to the app, I designed the program to read and write DataFrames in pickle format. A major reason to use pickled DataFrames is that the serialized file retains the DataFrame's metadata: if you set dtypes to strings, integers, or category, those dtypes are retained every time you read in the data. With CSV files, on the other hand, the dtypes are inferred again on every read, so you have to re-apply them to get back to a suitable DataFrame.
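To make the dtype point concrete, here is a minimal sketch that round-trips a small DataFrame through both formats. The column names and values are made up for illustration and are not from the app's real data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, not the app's real dataset.
df = pd.DataFrame({
    "product": pd.Series(["apple", "pear", "apple"], dtype="category"),
    "zip_code": ["02134", "10001", "94105"],  # strings with a leading zero
    "units": pd.Series([3, 5, 2], dtype="int32"),
})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "sample.pkl")
    csv_path = os.path.join(tmp, "sample.csv")

    # Pickle round trip: dtypes come back exactly as they were set.
    df.to_pickle(pkl_path)
    from_pickle = pd.read_pickle(pkl_path)

    # CSV round trip: dtypes are inferred again from the text.
    df.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

print(from_pickle.dtypes)
print(from_csv.dtypes)
```

In the CSV copy, the category column is no longer categorical and the zip codes are parsed as integers, which drops the leading zero. You would have to pass dtype arguments to read_csv() or convert the columns after loading to get the original DataFrame back.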
Although pickle files are pretty sweet, during the last few steps of deployment with Streamlit, I ran into a brick wall of an error with the pickle protocol.
Have this error?
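The Pandas to_pickle() method defaults to pickle.HIGHEST_PROTOCOL, which is protocol 5 on Python 3.8 and later. Older interpreters, such as the Python 3.7 runtime that appears in the traceback paths, only support protocols up to 4 and cannot read a protocol 5 file. As a result, although a standard Pandas pickled DataFrame may work in testing on your local machine, deployment to a server running an older Python is another story.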
ValueError: unsupported pickle protocol: 5
Traceback:
File "/usr/local/lib/python3.7/site-packages/streamlit/script_runner.py", line 332, in _run_script
    exec(code, module.__dict__)
File "/app/app_courts/main.py", line 56, in <module>
    run_app()
File "/app/app_courts/main.py", line 52, in run_app
    , classify=False
File "/app/app_courts/do_data/getter.py", line 124, in to_df
    df = pd.read_pickle(path)
File "/home/appuser/.local/lib/python3.7/site-packages/pandas/io/pickle.py", line 182, in read_pickle
    return pickle.load(f)
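One way to see the mismatch before deploying is to compare what your interpreter can handle with what a pickle actually declares. The sketch below is an illustration rather than part of the app: pickle_protocol() is a hypothetical helper that relies on the fact that pickles written with protocol 2 or higher start with the PROTO opcode (byte 0x80) followed by the protocol number.

```python
import pickle
import sys
from typing import Optional

# The highest protocol this interpreter can read and write.
print(sys.version_info[:2], pickle.HIGHEST_PROTOCOL)


def pickle_protocol(data: bytes) -> Optional[int]:
    """Return the protocol declared in a pickle's header, if there is one.

    Hypothetical helper: protocols 2 and up begin with the PROTO opcode
    (0x80) followed by a single byte holding the protocol number.
    Protocols 0 and 1 have no such header, so None is returned for them.
    """
    if len(data) >= 2 and data[0] == 0x80:
        return data[1]
    return None


payload = {"rows": [1, 2, 3]}
declared = pickle_protocol(pickle.dumps(payload, protocol=2))
print(declared)
```

Running the first print statement on both your local machine and the server shows whether the two environments agree on the highest protocol they support.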
Solution to Pickle Protocol 5 Error
When faced with an error, I sometimes go with an alternate solution, i.e., something that works just as well to produce the same result. However, I had little choice with the data file because of various constraints. While searching for an answer, I discovered that the solution is fairly simple: change the Pandas to_pickle() protocol from the default to version 2. When Pandas pickle is combined with BZ2 compression, the result is a super small, super convenient, and very compatible data file.
# to avoid pickle protocol error
# change params from 5 to 2
path = 'data/product_sales.bz2'
df.to_pickle(path, protocol=2)
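As a quick check, the following sketch writes a small DataFrame with protocol=2 and a .bz2 extension, then confirms both the declared protocol and the round trip. The sample data and the temporary path are assumptions for illustration. Pandas infers the compression from the file extension by default, which is why no compression argument is passed.

```python
import bz2
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the real sales DataFrame.
df = pd.DataFrame({
    "product": pd.Series(["apple", "pear"], dtype="category"),
    "units": [3, 5],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "product_sales.bz2")

    # Compression is inferred from the .bz2 extension.
    df.to_pickle(path, protocol=2)

    # Decompress and read the two header bytes: PROTO opcode + protocol number.
    with bz2.open(path, "rb") as f:
        header = f.read(2)

    restored = pd.read_pickle(path)

print(header[1])            # protocol number declared in the file
print(restored.equals(df))  # whether the round trip preserved the data
```

Lowering the protocol only addresses the "unsupported pickle protocol" error. The Pandas versions on the two machines still need to be compatible enough to rebuild the DataFrame.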