Apr-24-2021, 12:32 AM
I have an xpath expression that I know works. Using the URL:
https://www.yellowpages.com/houston-tx/m...1657186981
and XPath:
//div[@class='sales-info']/H1[1]
Should return this:
Spector Ivan
My code is posted below. Can anyone please explain why it doesn't work here?
It works using Scrapy, but I cannot multi-thread in Scrapy, so I'm looking for an alternative.
Thanks.
# proxy checker: https://stackoverflow.com/questions/765305/proxy-check-in-python
import concurrent.futures
import time
import urllib.request

import pandas as pd
import requests
from bs4 import BeautifulSoup
from lxml import html
url = 'https://www.yellowpages.com/houston-tx/mip/spector-ivan-11449879?lid=1001657186981'
proxy_handler = urllib.request.ProxyHandler({'http': '149.19.32.99:8082'})
opener = urllib.request.build_opener(proxy_handler)
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
pg=urllib.request.urlopen(url)
soup = BeautifulSoup(pg,'lxml')
tree = html.fromstring(soup.prettify())
testdata = tree.xpath("//div[@class='sales-info']/H1[1]")
print('XPath data: ', testdata)
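
One likely cause, offered as a sketch rather than a confirmed diagnosis: lxml's HTML parser stores element names in lowercase, and XPath name tests are case-sensitive, so a step written as H1 finds nothing while h1 matches. The SAMPLE markup below is a hypothetical stand-in for the real page (nothing is fetched here), and it assumes the name sits in an h1 directly inside the sales-info div, as the XPath in the post implies.

```python
from lxml import html

# Hypothetical stand-in for the page markup; the real page is not fetched here.
SAMPLE = """
<html><body>
  <div class="sales-info"><H1>Spector Ivan</H1></div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# The HTML parser lowercases element names, and XPath name tests are
# case-sensitive, so the uppercase H1 step matches nothing.
upper = tree.xpath("//div[@class='sales-info']/H1[1]")

# The same expression with a lowercase h1 finds the element.
lower = tree.xpath("//div[@class='sales-info']/h1[1]")

# xpath() returns Element objects, so read the text explicitly.
name = lower[0].text_content().strip() if lower else None

print("uppercase H1:", upper)
print("lowercase h1:", name)
```

If that is the cause, two smaller points follow for the code in the post. Printing the result of tree.xpath() shows Element objects rather than text, so text_content() (or a trailing /text() step) is needed to get the name. And soup.prettify() inserts extra whitespace around the text, so either strip the value or pass the response body straight to html.fromstring().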
