Nov-27-2019, 01:49 PM
Hi,
I found that two approaches to web scraping return completely different output.
What I mean is that when I use the code below:
import requests
from bs4 import BeautifulSoup

# Browser-like request headers sent along with the GET request
headers = {
'accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'accept-encoding':'gzip, deflate, sdch, br',
'accept-language':'en-GB,en;q=0.8,en-US;q=0.6,ml;q=0.4',
'cache-control':'max-age=0',
'upgrade-insecure-requests':'1',
'user-agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
}
# url holds the address of the page to scrape (not shown in this post)
response = requests.get(url, headers=headers)
parser = response.content
soup = BeautifulSoup(parser, "html.parser")
print(soup)
I get the full HTML of the website back. But if I use a pattern like this:
r = requests.get(page)
content = r.text
soup = BeautifulSoup(content, 'html.parser')
print(soup)
the request is redirected from the provided URL to a captcha page, and I get the HTML of the captcha page instead.
Someone wrote the first approach some time ago, and I am wondering how to follow exactly this path. Why does my approach return a captcha while the first approach avoids it?
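The visible difference between the two snippets is the headers dictionary, so one way to investigate is to compare what each call actually sends. The sketch below builds both requests without contacting any server and prints the User-Agent of each. The helper name `headers_sent` and the `https://example.com/` URL are placeholders made up for illustration, and the headers are shortened from the first snippet. That the site serves a captcha because of the default User-Agent is an assumption, not something confirmed here.

```python
import requests

# Shortened copy of the browser-like headers from the first snippet.
browser_headers = {
    "accept-language": "en-GB,en;q=0.8,en-US;q=0.6,ml;q=0.4",
    "user-agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
    ),
}


def headers_sent(url, headers=None):
    """Return the headers requests would send for a GET, without sending it.

    Hypothetical helper for illustration; it only prepares the request.
    """
    session = requests.Session()
    request = requests.Request("GET", url, headers=headers)
    prepared = session.prepare_request(request)
    # prepared.headers is case-insensitive, so "User-Agent" and
    # "user-agent" refer to the same entry.
    return prepared.headers


url = "https://example.com/"  # placeholder, not the site from the post

plain = headers_sent(url)                    # like requests.get(page)
custom = headers_sent(url, browser_headers)  # like requests.get(url, headers=headers)

print("without headers:", plain["User-Agent"])
print("with headers:   ", custom["User-Agent"])
```

Without custom headers, requests identifies itself with a User-Agent that starts with `python-requests/`, while the first snippet presents itself as Chrome. If the site filters on that, passing the same `headers` dictionary to `requests.get(page, headers=headers)` in the second snippet would be the first thing to try.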
