Sep-01-2018, 02:43 PM
(This post was last modified: Sep-01-2018, 02:48 PM by Gribouillis.)
I'm looping over every image in my directory; my goal is to extract the text and then inspect the text attributes. The text extraction works, but when I try to read the text attributes with the PyTessBaseAPI() API, the attributes are not recognized for some of my images, and the Python shell prints "=============================== RESTART: Shell ===============================".
Here is the code:
for i, cnt in enumerate(contours):
    x, y, w, h = cv2.boundingRect(cnt)
    x = x - 3
    y = y - 3
    # skip boxes that would start outside the image
    if x < 0 or y < 0:
        continue
    cropped = image_file[y: y+h+padding, x: x+w+padding]
    # make the image bigger so the text is recognized better
    cropped = cv2.resize(cropped, (0, 0), fx=4.0, fy=4.0)
    # convert NumPy array to PIL image
    im = Image.fromarray(cropped.astype('uint8'), 'RGB')
    im = im.filter(ImageFilter.SHARPEN)
    text = image_to_string(im)
    #print(text)
    if text != "":
        #print("OCR Output : " + image_to_string(im))
        cv2.imwrite("img_text/cropped"+str(i)+".png", cropped)
        path = os.path.abspath("img_text/cropped"+str(i)+".png")
        with PyTessBaseAPI() as api:
            #img = Image.open(path, mode='r')
            data = readimage(path)  # renamed from `bytes` to avoid shadowing the builtin
            img = Image.open(io.BytesIO(data))
            api.SetImage(img)
            api.Recognize()  # required to get a result from the next line
            iterator = api.GetIterator()
            #print(iterator.WordFontAttributes())
            attrs = iterator.WordFontAttributes()  # renamed from `dict` to avoid shadowing the builtin
            #print(attrs['font_name'])

Does anybody know what I'm doing wrong here? Thanks
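
One thing that might be worth trying is a defensive wrapper around the attribute lookup. This is a minimal sketch, not a confirmed fix: it assumes (unverified) that the shell restart happens when WordFontAttributes() is called on an image where the API recognized no usable word. The helper name safe_word_font_attributes is hypothetical; it only relies on Recognize(), GetUTF8Text(), GetIterator() and WordFontAttributes() being available on the objects passed in, as they are in the code above.

```python
def safe_word_font_attributes(api):
    """Return the font attributes of the first word, or None.

    `api` is assumed to behave like the PyTessBaseAPI object in the loop
    above, with SetImage() already called on it.
    """
    api.Recognize()

    # Assumption: if this API instance found no text, skip the iterator
    # entirely instead of asking it for font attributes.
    text = api.GetUTF8Text()
    if not text or not text.strip():
        return None

    iterator = api.GetIterator()
    if iterator is None:
        return None

    # May itself be None when no font information is available.
    return iterator.WordFontAttributes()
```

Inside the `with PyTessBaseAPI() as api:` block, the last lines would then become `attrs = safe_word_font_attributes(api)` followed by a check such as `if attrs is not None:` before reading `attrs['font_name']`. If the shell still restarts, the crash is happening somewhere other than the empty-result case this sketch guards against.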
