Python Forum
extracting data from a user-completed fillable pdf
#1
Hello!

I'm hoping that someone here would perform a task for me that is over my head. Of course, I will pay for your service. Here's what I need: I have a 7-page client questionnaire in a fillable pdf. Once a client completes the form, I need to be able to extract only the data (text, dates, amounts) in "user-completed fields" from the pdf and place that data into specific cells in an existing Excel file that I maintain. Not that it is pertinent here, but I have linked cells in the Excel file to various MS Word templates that I use. I have been re-typing the data in the questionnaire into the Excel file, but it would be beneficial and more efficient if that part was automated.

Is there anyone out there who would be willing to make this work for me for compensation? I understand that Python is capable of doing this, but again, it is over my head.

Thanks.
#2
Post an example pdf!

I'm sure many here can do what you wish.

How much is a bitcoin worth nowadays? :D

However, if the user fills out the form as a PHP-driven web page instead of a PDF, you already have the information as PHP variables, and it would be child's play to save that data wherever you like!

You will always need to collect names, I presume. Then you have $surname and $givennames as variables. I would save them to a database first. An example of what you get when the user clicks send:

$surname = $_POST['surname'];
$givennames = $_POST['givennames'];
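
For anyone who would rather stay in Python than PHP, the "save it to a database first" step might look roughly like the sketch below. It is only an illustration: the `clients` table, the `save_submission` helper, and the assumption that the submitted form arrives as a plain dict are all made up here and are not part of the original suggestion.

```python
import sqlite3


def save_submission(conn, form):
    """Store one submitted form (a dict of field name -> text) in SQLite."""
    # Hypothetical table layout; adjust the columns to the real questionnaire.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clients ("
        "id INTEGER PRIMARY KEY, surname TEXT, givennames TEXT)"
    )
    # The ? placeholders keep user input out of the SQL text itself.
    cursor = conn.execute(
        "INSERT INTO clients (surname, givennames) VALUES (?, ?)",
        (form.get("surname", ""), form.get("givennames", "")),
    )
    conn.commit()
    return cursor.lastrowid


if __name__ == "__main__":
    # ":memory:" is a throwaway database; pass a file path to keep the data.
    connection = sqlite3.connect(":memory:")
    row_id = save_submission(connection, {"surname": "Doe", "givennames": "Jane"})
    rows = connection.execute("SELECT surname, givennames FROM clients").fetchall()
    print(row_id, rows)
```

Once the answers sit in a table, copying them into the Excel file is a separate and much simpler step than pulling them out of a PDF.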
#3
I'm using openpyxl to manipulate xlsx documents that are created from a template xlsx document. It also keeps the formatting.

After filling out the workbook, you could export it as a PDF from Excel/LibreOffice Calc by hand, or let Python call LibreOffice on the command line to do the conversion.


I made this document with LibreOffice Calc, but saved it as Excel/365 (.xlsx):
Attachment: Template.zip (Size: 5.76 KB)

This simple code just fills some numbers into the table:
import subprocess
from pathlib import Path

import openpyxl

TEMPLATE = Path.home().joinpath("Desktop", "Template.xlsx")
RESULT = TEMPLATE.with_name("Filled_Form.xlsx")

workbook = openpyxl.open(TEMPLATE)
table = workbook.worksheets[0]
AREA = "A2:C23"
area = table[AREA]

# writing something to the cells
for index, (cell_a, cell_b, cell_c) in enumerate(area, start=10):
    cell_a.value = index
    cell_b.value = index ** 2
    cell_c.value = index ** 3

#for cell_a, cell_b, cell_c in area:
#    print(cell_a.value, cell_b.value, cell_c.value)


# saving to the file defined in RESULT
workbook.save(RESULT)

# converting with libre calc to the same directory, where the RESULT is placed
# the file you want to convert should be the last argument
# subprocess.run() blocks until the process is done
# raw string, so the backslashes in the Windows path are not read as escape sequences
subprocess.run([r"C:\Program Files\LibreOffice\program\scalc.exe", "--convert-to", "pdf", "--outdir", str(RESULT.parent), str(RESULT)])

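# removing the intermediate xlsx file; the converted pdf stays
RESULT.unlink()
# single quotes inside the f-string, so it also parses on Python versions before 3.12
print(f"{RESULT} has been removed and {RESULT.with_suffix('.pdf')} was written.")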
print("Done")
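
The hard-coded path to the LibreOffice executable only fits one particular Windows install. A possible way to make that part more portable is sketched below. The helper names `find_office` and `build_convert_command` are invented for this example, and using `soffice` with `--headless` instead of `scalc.exe` is my assumption, not what the script above does.

```python
import shutil
from pathlib import Path

# Assumed default install location on Windows; check it on your own machine.
WINDOWS_DEFAULT = Path(r"C:\Program Files\LibreOffice\program\soffice.exe")


def find_office(candidates=("soffice", "libreoffice"), fallback=WINDOWS_DEFAULT):
    """Return the path of a LibreOffice executable, or None if none is found."""
    for name in candidates:
        found = shutil.which(name)  # searches the directories listed in PATH
        if found:
            return found
    return str(fallback) if Path(fallback).exists() else None


def build_convert_command(executable, source, target_format="pdf"):
    """Build the argument list for converting `source` next to itself."""
    source = Path(source)
    return [
        str(executable),
        "--headless",            # run without opening a window
        "--convert-to", target_format,
        "--outdir", str(source.parent),
        str(source),             # the file to convert goes last
    ]


if __name__ == "__main__":
    office = find_office()
    if office is None:
        print("LibreOffice was not found; adjust WINDOWS_DEFAULT or PATH.")
    else:
        print(build_convert_command(office, Path.home() / "Desktop" / "Filled_Form.xlsx"))
```

The list returned by `build_convert_command` can be passed straight to `subprocess.run()`, as in the script above.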
I have enough work of my own, so I would never want to build templates for other people.

PS: Extracting and manipulating data from PDF files is an absolute horror. PDFs were made for printing, not for editing.
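
That said, a fillable PDF is the friendlier case, because the answers are stored as named form fields. Going back to the original question, a rough sketch of reading those fields and writing them into fixed cells could look like this. It assumes the third-party packages pypdf and openpyxl are installed. The field names and cell addresses in `FIELD_TO_CELL` and the file names are placeholders I made up; the real field names have to be read from the actual questionnaire.

```python
# Hypothetical mapping: PDF form field name -> cell in the Excel sheet.
FIELD_TO_CELL = {"client_name": "B2", "date_of_birth": "B3", "amount": "D7"}


def read_form_values(pdf_path):
    """Return {field name: value} for the form fields of a fillable PDF."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    fields = PdfReader(pdf_path).get_fields() or {}
    return {name: field.value for name, field in fields.items()}


def plan_cell_updates(form_values, field_to_cell):
    """Pair each mapped field with its target cell, skipping empty answers."""
    updates = {}
    for field_name, cell in field_to_cell.items():
        value = form_values.get(field_name)
        if value not in (None, ""):
            updates[cell] = str(value)
    return updates


def write_cells(xlsx_path, updates, sheet_name=None):
    """Write {cell: value} into an existing workbook and save it in place."""
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    workbook = load_workbook(xlsx_path)
    sheet = workbook[sheet_name] if sheet_name else workbook.active
    for cell, value in updates.items():
        sheet[cell] = value
    workbook.save(xlsx_path)


if __name__ == "__main__":
    # Placeholder file names; both files must already exist.
    values = read_form_values("questionnaire.pdf")
    write_cells("clients.xlsx", plan_cell_updates(values, FIELD_TO_CELL))
```

Everything is written as text here, so dates and amounts would still need converting if Excel should treat them as real dates and numbers. Printing the keys of `read_form_values(...)` once is a quick way to find out what the fields in the questionnaire are actually called.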