Split pdf in pypdf based upon file regex

standenman · Feb-01-2023, 03:38 PM (last modified)

I am trying to split a PDF of medical records based on the date of treatment. In this PDF, the text "Visit Date: ##/##/####" marks the beginning of one or more pages of notes for that date. I want to split the PDF into separate PDFs, one for each treatment date. The code below runs, and the terminal output is a series of lines that either say "You Failed" or look like this:

[0, IndirectObject(612, 0, 2464980264080)]
unknown widths :

As far as I can tell, no PDF files are created. What am I doing wrong?

import re
import pypdf

# Open the PDF file
pdf_file = pypdf.PdfReader(open("Documents/VisitDate.pdf", "rb"))

# Define the regex pattern
pattern = re.compile("Visit Date: ^[0-9]{1,2}\\/[0-9]{1,2}\\/[0-9]{4}$")

# Loop through each page of the PDF
for i in range(len(pdf_file.pages)):
  page = pdf_file.pages[i]
  text = page.extract_text()

  # Check if the regex value is in the page text
  if pattern.search(text):
    # If the regex value is found, create a new PDF file
    output_pdf = pypdf.PdfFileWriter()
    output_pdf.addPage(page)
    with open("output_{}.pdf".format(i), "wb") as output_file:
      output_pdf.write(output_file)
  else: print ("You Failed")
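Even once the pattern matches, the loop as posted writes one file per matching page, so pages that continue a visit but have no "Visit Date:" line of their own would be dropped. Below is a minimal sketch of grouping pages into visits. It assumes that a page containing a "Visit Date:" header starts a new visit, that pages without one belong to the previous visit, and that each page has at most one header. `VISIT_RE` and `group_pages_by_visit` are hypothetical names, not part of pypdf.

```python
import re

# Hypothetical pattern: keeps the "Visit Date:" prefix, drops the ^/$ anchors
# and captures the date itself (assumes M/D/YYYY or MM/DD/YYYY).
VISIT_RE = re.compile(r"Visit Date:\s*(\d{1,2}/\d{1,2}/\d{4})")


def group_pages_by_visit(page_texts):
    """Group page indices into (date, [page indices]) runs.

    A page whose text contains a "Visit Date:" header starts a new run;
    later pages without a header are appended to the current run.
    Pages before the first header are skipped.
    """
    groups = []
    for i, text in enumerate(page_texts):
        match = VISIT_RE.search(text or "")
        if match:
            # Only the first header found on a page is used.
            groups.append((match.group(1), [i]))
        elif groups:
            # Continuation page: attach it to the most recent visit.
            groups[-1][1].append(i)
    return groups
```

With pypdf, the input would be the extracted text of each page, for example `group_pages_by_visit([page.extract_text() for page in reader.pages])`, where `reader` is a `pypdf.PdfReader`.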

SpongeB0B · Feb-03-2023, 12:01 PM

Are you sure that your regex is correct?

When I tried it, it didn't match a date like 12/12/1984.

Maybe you want to try with one less "\":

^[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4}$
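A small check with a made-up page text may help show where the posted pattern goes wrong. The doubled backslash is probably not the main problem: `"\\/"` in a normal string and `r"\/"` in a raw string both reach the regex engine as `\/`, which simply matches `/`. The blocker is the anchors: `^` matches only at the start of the string (or of a line with `re.MULTILINE`), so it can never match right after "Visit Date: ". The anchored pattern from this reply works on a string that is only a date, not on a whole page of text. Keeping the prefix and dropping the anchors is one option; the variable names here are illustrative.

```python
import re

# Sample page text, made up for illustration.
page_text = "Patient: Jane Doe\nVisit Date: 12/12/1984\nNotes..."

# Pattern from the question: '^' and '$' sit after "Visit Date: ", so they
# can never match there ('^' without re.MULTILINE means start of string).
original = re.compile("Visit Date: ^[0-9]{1,2}\\/[0-9]{1,2}\\/[0-9]{4}$")

# Pattern from the reply: matches only a string that is just a date.
bare_date = re.compile(r"^[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4}$")

# Prefix kept, anchors dropped, date captured: finds the date inside page text.
prefixed = re.compile(r"Visit Date:\s*([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})")

print(original.search(page_text))                  # None
print(bare_date.search(page_text))                 # None
print(bare_date.search("12/12/1984") is not None)  # True
print(prefixed.search(page_text).group(1))         # 12/12/1984
```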

Cheers
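Separately from the regex, the writer calls in the posted loop are worth checking: `PdfFileWriter` and `addPage` are the older PyPDF2-style names, and pypdf uses `PdfWriter` and `add_page` instead (depending on the version, the old names either raise a deprecation error or no longer exist). Also, a date such as 12/12/1984 contains slashes, which act as path separators, so it cannot be used directly in a file name. The sketch below assumes US month/day/year dates and that the page indices for each visit are already known; `visit_filename` and `write_visit_pdfs` are hypothetical helpers, not pypdf API.

```python
from pathlib import Path


def visit_filename(date, prefix="visit"):
    """Turn an 'MM/DD/YYYY' date into a slash-free file name.

    Assumes month/day/year order, e.g. '12/12/1984' -> 'visit_1984-12-12.pdf'.
    """
    month, day, year = date.split("/")
    return f"{prefix}_{year}-{int(month):02d}-{int(day):02d}.pdf"


def write_visit_pdfs(src_path, groups, out_dir="."):
    """Write one PDF per (date, [page indices]) group and return the paths."""
    # Imported here so visit_filename works even without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for date, page_indices in groups:
        writer = PdfWriter()  # current pypdf name for the writer class
        for i in page_indices:
            writer.add_page(reader.pages[i])
        path = out / visit_filename(date)
        with open(path, "wb") as fh:
            writer.write(fh)
        written.append(path)
    return written
```

For example, `write_visit_pdfs("Documents/VisitDate.pdf", [("01/05/2022", [0, 1]), ("2/14/2022", [2])], out_dir="split")` would write two files; the page indices there are made up. If the same date can start two separate visits, the second file would overwrite the first, so a counter in the name may be needed.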
