Aug-02-2021, 08:55 PM
The idea is to automate downloads from o2tvseries, a website for film shows. Part of the process is automating the submission of a Captcha form. The mechanize library fills in and submits the form. For the Captcha input, Selenium's Webdriver takes a screenshot of the page so that the Captcha picture can be cropped out and saved locally, and Pytesseract then reads the text from that picture for submission (about 75% accuracy).
The issue now is that after submission I get a page with the response `Error: Captcha Does Not Match!`, even when the Captcha matches (it often does). A video file is supposed to start playing on the page after submission, which would mean everything is OK. I don't know if it's a cookie/session issue or a Webdriver issue. I'm having a hard time getting the right Captcha input successfully submitted. Here's the code:
from io import BytesIO
from urllib.request import Request, urlopen
import http.cookiejar as cookielib

from bs4 import BeautifulSoup
from PIL import Image, ImageFilter, ImageEnhance
from selenium import webdriver
import mechanize
import pytesseract
url = "https://o2tvseries.com/Kevin-Can-Fuck-Himself/Season-01/Episode-06/index.html"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req)
soupy = BeautifulSoup(web_byte, "html.parser")
links = soupy.find_all('a')
newDLLink = links[14].get('href')
secondReq = Request(newDLLink, headers={'User-Agent': 'Mozilla/5.0'})
secondWeb_byte = urlopen(secondReq)
captchaSoupy = BeautifulSoup(secondWeb_byte, "html.parser")
captchaSeries = captchaSoupy.find_all('img')
newCaptchaSeries = captchaSeries[0].get('src')
# Webdriver: open the Captcha page in Chrome and take a full-page screenshot
driver = webdriver.Chrome(executable_path=r"C:\Program Files\ChromeDriver for Selenium\chromedriver.exe")
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
driver.get(newDLLink)
element = driver.find_element_by_xpath("//img[@alt='CAPTCHA Code']")
location = element.location
size = element.size
png = driver.get_screenshot_as_png()
driver.quit()
# Pillow: crop the screenshot down to the Captcha image and clean it up for OCR
im = Image.open(BytesIO(png))
left = location['x']
top = location['y']
right = location['x'] + size['width']
bottom = location['y'] + size['height']
im = im.crop((left, top, right, bottom))
im = ImageEnhance.Sharpness(im)
im = im.enhance(0.0)
im = im.filter(ImageFilter.MinFilter(3))
im.save('study_img.png')
# Pytesseract: read the Captcha text and keep the first 5 characters
image_to_string = pytesseract.image_to_string(im)
image_to_string = image_to_string[0:5]
# Mechanize: open the Captcha page again with its own cookie jar and submit the form
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36')]
br.open(newDLLink)
br.select_form(nr=0)
br['captchainput'] = image_to_string
response = br.submit()
# This returns `Error: Captcha Does Not Match!`
read_response = response.read()
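
If this turns out to be a cookie/session problem, one thing that might be worth testing is carrying the cookies from the Webdriver session over to mechanize, so that both requests belong to the same session. Below is a minimal sketch of that idea, not a confirmed fix. The helper name `selenium_cookies_to_jar` is hypothetical, and it assumes the input is a list of cookie dicts in the shape returned by Selenium's `driver.get_cookies()` (keys such as `name`, `value`, `domain`, `path`, `secure`, `expiry`).

```python
from http.cookiejar import Cookie, CookieJar


def selenium_cookies_to_jar(selenium_cookies):
    """Hypothetical helper: copy Selenium-style cookie dicts into a CookieJar.

    Assumes each dict has at least "name" and "value"; the other keys
    fall back to defaults when they are missing.
    """
    jar = CookieJar()
    for c in selenium_cookies:
        domain = c.get("domain", "")
        expiry = c.get("expiry")  # absent for session cookies
        jar.set_cookie(Cookie(
            version=0,
            name=c["name"],
            value=c["value"],
            port=None,
            port_specified=False,
            domain=domain,
            domain_specified=bool(domain),
            domain_initial_dot=domain.startswith("."),
            path=c.get("path", "/"),
            path_specified=True,
            secure=c.get("secure", False),
            expires=expiry,
            discard=expiry is None,  # session cookies are discarded on exit
            comment=None,
            comment_url=None,
            rest={},
        ))
    return jar
```

In the script above, the cookies would have to be read with `driver.get_cookies()` before `driver.quit()` is called, and the resulting jar could then be passed to `br.set_cookiejar(...)` in place of the empty `LWPCookieJar`. Even with shared cookies, reopening the page with `br.open(newDLLink)` may cause the site to generate a new Captcha, so this alone may not be enough.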
