Sunday, January 8, 2023

How to install camelot-py on your Windows machine

Camelot is a Python library that can help you extract tables from PDFs. PyPDF2 does not extract tables nicely, and tabula-py depends on Java.

Just like with any other Python library, the installation starts, naively, with a simple:
pip install camelot-py

Installation will download and install dependency libraries too, but once you run your sample code you will receive the following error message:
ModuleNotFoundError: No module named 'cv2'

Ouch! Looks like not all dependency libraries have been installed. Yep. The issue has been reported already and a workaround is suggested. With a sigh of relief, we proceed with:
pip install opencv-python

Let's try to run the sample code again. Ooops, a new error message (that means we are moving forward, after all!):
PyPDF2.errors.DeprecationError: PdfFileReader is deprecated and was removed in PyPDF2 3.0.0. Use PdfReader instead.

Yep. That issue has been reported too, and we have a workaround:
pip install "PyPDF2<3.0"
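
To confirm the pin actually took effect, a small check like the one below can help. It is only a sketch: the helper names is_below_major and pypdf2_is_pre_3 are mine, and it assumes Python 3.8 or newer so that importlib.metadata is available.

```python
from importlib import metadata


def is_below_major(version_string, major):
    """Return True if the leading (major) number of version_string is below `major`."""
    return int(version_string.split(".")[0]) < major


def pypdf2_is_pre_3():
    """Return True/False for the installed PyPDF2, or None if it is not installed."""
    try:
        installed = metadata.version("PyPDF2")
    except metadata.PackageNotFoundError:
        return None
    return is_below_major(installed, 3)


print("PyPDF2 older than 3.0:", pypdf2_is_pre_3())
```

If this prints False, the environment still has PyPDF2 3.x and the DeprecationError above is likely to come back.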

So let's go back and run our sample code again. Once more we have made progress and reached a new error message! This time it is:
OSError: Ghostscript is not installed. You can install it using the instructions here: https://camelot-py.readthedocs.io/en/master/user/install-deps.html

We follow the suggested URL and install Ghostscript. After trying to run the sample code once again, the following error message pops up:
ModuleNotFoundError: No module named 'ghostscript'
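
Two different things share the name here: the Ghostscript program we have just installed, and the ghostscript Python package, which is a wrapper around it. The error above is about the second one. For the first one, a rough check is to look for the executable on PATH. This is only a sketch: the executable names below are my assumption about the usual Windows console builds, and the Windows installer does not necessarily add Ghostscript to PATH, so a None result does not prove that it is missing.

```python
import shutil

# Assumed executable names: the console builds of Ghostscript on Windows
# are usually gswin64c / gswin32c; "gs" is the usual name elsewhere.
DEFAULT_NAMES = ("gswin64c", "gswin32c", "gs")


def find_ghostscript(names=DEFAULT_NAMES):
    """Return the full path of the first Ghostscript executable on PATH, else None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


print("Ghostscript executable:", find_ghostscript())
```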

Let's install the ghostscript Python library with the following command:
pip install ghostscript
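
Rather than discovering missing modules one failed run at a time, it is possible to check them all up front. The sketch below uses only the module names that appeared in the error messages above, plus camelot itself; missing_modules is a helper name of my own. It only looks the modules up without importing them, so it would not catch a problem like the PyPDF2 deprecation error.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Import names taken from the error messages in this post, plus camelot itself.
required = ["camelot", "cv2", "PyPDF2", "ghostscript"]
print("Missing modules:", missing_modules(required))
```

An empty list means every module was found in the current environment.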

Believe it or not, I just ran the sample code and, for now, it looks like that was it for the installation!
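
The sample code itself is not the point of this post, but for completeness, a minimal smoke test could look roughly like this. It is a sketch, not the exact code I ran: extract_first_table is a made-up helper name and sample.pdf is a placeholder for any PDF that contains a table.

```python
def extract_first_table(pdf_path, pages="1"):
    """Read tables from pdf_path with camelot and return the first one as a DataFrame."""
    # Imported here so that a missing dependency fails at call time,
    # not when the function is defined.
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=pages)
    if len(tables) == 0:
        return None
    return tables[0].df


# Usage (sample.pdf is a placeholder file name):
# df = extract_first_table("sample.pdf")
# print(df)
```

If any of the dependencies above is still missing, the call is where the corresponding error shows up.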

P.S. Note to self: the last version of camelot-py that works with Python 2.7 is 0.7.3.
