Python Forum
Extract data from PDF page to Excel
#1
Hi everyone, I am very new to coding and want to write some Python code to extract data from a PDF file and transfer it into an Excel sheet. This would make filtering and analysis easier, as the reports can be up to 100 pages long and are received monthly. Every page except the first follows the same format (the first page can be ignored). The image below highlights the sections of the page I’d like to extract into individual columns. In some cases, Recommendation is left blank or no image is provided. I understand I could use while loops in some places here, but I have no idea how to structure the code or which other functions to use.



In terms of functionality, I was thinking it would open a macro-enabled template and run a macro that lets me select the appropriate PDF file and extracts the data from it.



Also, it’d be awesome to make the image show on mouseover of the cell using comments, if anyone has a suggestion on how to do that.



Survey Date:

Type:

Area:

Priority: (coloured number at the top right corner; can be N, 0, 1, 2, or 3)

Machine:

Assembly:

Detail:

Recommendation:

Image:

I wonder if I can send a sample image through PM, as I can't currently attach one to this thread.

Thanks!
#2
There are many modules that aid in PDF data extraction.
Because PDF is something of a chameleon when it comes to internal contents, it's often a bear to extract intelligible data from one. Sometimes you luck out (usually when the data is presented in table format), and sometimes conversion is just impossible (if the data is a very poor image of a text document, for example).
At any rate, I've had some success with:

camelot-py (which wraps around pdfminer): https://pypi.org/project/camelot-py/

pdfminer.six: https://github.com/pdfminer/pdfminer.six

There are a ton of others; if you don't have success with the above, look here: https://pypi.org/search/?q=PDF&o=
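
As a rough starting point, here is a minimal sketch of the text side of the job using pdfminer.six. It rests on assumptions I can't check without a sample page: that the labels from post #1 come out of the PDF as plain "Label:" text at the start of a line, and that each value runs until the next label. Priority is left out because it is a coloured number rather than a labelled field, and the image itself is not text, so the Image column will normally be empty. The function names (parse_page, pdf_to_rows, write_csv) are made up for this example, and a CSV file stands in for the Excel sheet since Excel opens it directly.

```python
import csv
import re

# Labels copied from the list in post #1 (assumed to appear verbatim in the text).
FIELDS = ["Survey Date", "Type", "Area", "Machine", "Assembly",
          "Detail", "Recommendation", "Image"]

# Matches "Label:" at the start of a line, for any label in FIELDS.
_LABEL_RE = re.compile(
    r"^\s*(" + "|".join(re.escape(f) for f in FIELDS) + r")\s*:[ \t]*",
    re.MULTILINE,
)


def parse_page(text):
    """Return one dict per page; fields that are missing or blank become ''."""
    record = {field: "" for field in FIELDS}
    matches = list(_LABEL_RE.finditer(text))
    for i, match in enumerate(matches):
        # A value runs from the end of its label to the start of the next label.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        value = text[match.end():end]
        record[match.group(1)] = " ".join(value.split())  # collapse line breaks
    return record


def pdf_to_rows(pdf_path):
    """Extract text with pdfminer.six and parse every page except the first."""
    from pdfminer.high_level import extract_text  # pip install pdfminer.six

    # pdfminer's plain-text output puts a form feed at the end of each page.
    pages = extract_text(pdf_path).split("\f")
    return [parse_page(page) for page in pages[1:] if page.strip()]


def write_csv(rows, csv_path):
    """Write the parsed rows to a CSV file with one column per field."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Usage would be something like write_csv(pdf_to_rows("report.pdf"), "report.csv"), where both file names are placeholders. If the extracted text does not come out in "Label: value" order (which happens with some PDF layouts), the regular expression will need adjusting, or camelot-py may be a better fit if the page is really a table.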