Apr-16-2018, 07:09 PM
Hey,
I'm pretty new to Python and still learning every day. I'm stuck on what I want to accomplish with my Python script.
What I'm trying to do is OCR images in batch, save that data into data.txt, and after that rename the images using the OCR data.
For example, I have an image named 'dog-mask.jpg', and after OCR has run on it I would like to rename the file to something like 'no one cared who I was until I put on the mask.jpg'.
![[Image: No-One-Cared-Who-I-Was-Until-I-Put-On-The-Mask.jpg]](https://lolpics.com/wp-content/uploads/2018/04/No-One-Cared-Who-I-Was-Until-I-Put-On-The-Mask.jpg)
The OCR part seems to work fine, but I have no idea how to set the new image file names using the data from data.txt.
Could anyone help me out, please? I would really appreciate it, if it is not too much trouble.
Below is the code of my OCR script:
import pytesseract
import os
from PIL import Image
import re
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe' # path of tesseract
path = 'C:\\users\\kevin\\downloads\\downloads' # path of image folder
# function to convert image to text and return type: string
def ocr(file_to_ocr):
    im = Image.open(os.path.join(path, file_to_ocr))
    txt = pytesseract.image_to_string(im)
    return txt
file_list = os.listdir(path) # file names in list (not sorted)
directory = os.path.join(path) # path for storing the text file
# function to sort the file names in order of numerical value present in it
def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    '''
    alist.sort(key=natural_keys) sorts in human order
    http://nedbatchelder.com/blog/200712/human_sorting.html
    (See Toothy's implementation in the comments)
    '''
    # raw string so that \d is not treated as an invalid string escape
    return [atoi(c) for c in re.split(r'(\d+)', text)]
file_list.sort(key=natural_keys) # file names in list (sorted)
# for every file in the folder
for file in file_list:
    # selecting image file type
    if file.endswith(".jpg"):
        txt = ocr(file)  # calling the ocr function
        # appending the text into the file
        with open(directory + "\\" + 'data' + ".txt", 'a+') as f:
            f.write("\n")
            f.write(file)
            f.write("\n")
            f.write('-----------------------------------------')
            f.write("\n")
            f.write('!!!Start!!!')
            f.write("\n")
            f.write(str(txt))
            f.write("\n")
            f.write('!!!End!!!')
            f.write("\n")
            f.write('-----------------------------------------')
            f.write("\n")
print("Image Conversion completed")
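
For the renaming step, here is a rough sketch of one possible approach, not a tested solution for this exact folder. It assumes the rename can use the OCR text directly while it is still in memory, rather than reading it back from data.txt. The helper names `text_to_filename` and `rename_with_text`, the 100-character cap and the "untitled" fallback are all made up for illustration. The set of stripped characters is the one Windows rejects in file names.

```python
import os
import re


def text_to_filename(text, extension=".jpg", max_length=100):
    """Turn OCR output into a file name (hypothetical helper)."""
    # Collapse line breaks and runs of whitespace into single spaces.
    name = " ".join(text.split())
    # Drop characters that Windows does not allow in file names.
    name = re.sub(r'[<>:"/\\|?*]', "", name)
    # Cap the length (assumed limit) and trim trailing spaces and dots.
    name = name[:max_length].strip(" .")
    # Fall back to a placeholder if OCR found no usable text.
    return (name or "untitled") + extension


def rename_with_text(folder, old_name, text):
    """Rename folder/old_name after the OCR text and return the new name."""
    extension = os.path.splitext(old_name)[1]
    new_name = text_to_filename(text, extension)
    if new_name == old_name:
        return old_name  # already has the desired name
    base, ext = os.path.splitext(new_name)
    counter = 1
    # Avoid overwriting another image that produced the same text.
    while os.path.exists(os.path.join(folder, new_name)):
        new_name = f"{base} ({counter}){ext}"
        counter += 1
    os.rename(os.path.join(folder, old_name), os.path.join(folder, new_name))
    return new_name
```

Inside the loop above, this could be called as `rename_with_text(path, file, txt)` right after the text has been written to data.txt. Since `file_list` is built before the loop starts, renaming files while looping should not disturb the iteration.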
