Jan-16-2020, 09:11 PM
Hi, I am new to web scraping and have just managed to write my first working script. However, it only extracts data from the first page of results, and I have not been able to apply the solutions I found online successfully. I would be very glad if someone could help me write a complete script that extracts data from all pages. Below is my current working script:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.merchantcircle.com/search?q=self-storage&qn='
#opens connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
#page parser
page_soup = soup(page_html, "html.parser")
businesses = page_soup.findAll("div",{"class":"hInfo vcard"})
filename = "storage.csv"
f = open(filename, "w")
#headers = "brand, product_name, price, shipping\n"
headers = "biz_name, biz_address, biz_phone_num\n"
f.write(headers)
for business in businesses:
    #grabs business name
    biz_name = business.h2.a.text.strip()
    #grabs business address
    address = business.find("a",{"class":"directions"})
    biz_address = address.text.strip()
    #grabs phone number
    phone_num = business.find("a",{"class":"phone digits tel"})
    biz_phone_num = phone_num.text.strip()
    print("biz_name: " + biz_name)
    print("biz_address: " + biz_address)
    print("biz_phone_num: " + biz_phone_num)
    #commas inside the address are replaced so they do not break the CSV columns
    f.write(biz_name + "," + biz_address.replace(",", "|") + "," + biz_phone_num + "\n")
f.close()
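
For reference, here is a rough sketch of the kind of loop I think is needed to cover all pages. It is only an outline under assumptions, not something confirmed against the site: the query parameter name "p" is a guess (the real name should be taken from the site's "next page" link), and page_url, scrape_all_pages, fetch and parse are hypothetical names made up for this illustration. The idea is to request page 1, 2, 3, ... and stop as soon as a page returns no businesses or repeats the previous page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page, page_param="p"):
    """Return base_url with the page number set in its query string.

    ASSUMPTION: the parameter name "p" is a guess, not taken from the site.
    """
    parts = urlsplit(base_url)
    # Keep existing parameters (including blank ones such as "qn="),
    # dropping any earlier value of the page parameter.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != page_param
    ]
    query.append((page_param, str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def scrape_all_pages(base_url, fetch, parse, page_param="p", max_pages=50):
    """Collect rows from page 1 onward.

    fetch(url) returns the page HTML; parse(html) returns a list of rows.
    Stops when a page has no rows, repeats the previous page, or
    max_pages is reached (a safety cap against endless loops).
    """
    rows = []
    previous = None
    for page in range(1, max_pages + 1):
        page_rows = parse(fetch(page_url(base_url, page, page_param)))
        if not page_rows or page_rows == previous:
            break
        rows.extend(page_rows)
        previous = page_rows
    return rows
```

With the script above, fetch would wrap the uReq(...).read() call and parse would wrap the soup(...) and findAll(...) part, returning one (biz_name, biz_address, biz_phone_num) tuple per business; the rows that come back can then be written to storage.csv in a single loop.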
