Hi Expert,
I have fetched data from HTML using the code below:
import requests
from bs4 import BeautifulSoup
from multiprocessing import Pool


def get_soup(url):
    # Download the page and parse it with Python's built-in HTML parser.
    response = requests.get(url)
    html = response.content
    return BeautifulSoup(html, "html.parser")

And I have fetched the category URLs with:

def get_category_urls(url):
    soup = get_soup(url)
    cat_urls = []
    try:
        categories = soup.find('div', attrs={'id': 'menu_oc'})
        if categories is not None:
            for c in categories.find_all('a'):
                # .get() returns None for an <a> without href; c['href'] would raise KeyError.
                if c.get('href') is not None:
                    cat_urls.append(c['href'])
    except Exception as exc:
        print("error::" + url + str(exc))
    finally:
        return cat_urls

Now I am trying to fetch the product URLs with the code below:

def get_product_urls(url):
    soup = get_soup(url)
    prod_urls = []
    try:
        if soup.find('div', attrs={'class': 'pagination'}):
            # Take the text after "of " in the pagination summary as the page count.
            pages = soup.find('div', attrs={'class': 'page'}).text.split("of ", 1)[1].replace(' (1 Pages)', '')
            if pages is not None:
                for page in range(1, int(pages) + 1):
                    soup_with_page = get_soup(url + "&page={}".format(page))
                    product_urls_soup = soup_with_page.find('div', attrs={'id': 'carousel-featured-0'})
                    if product_urls_soup is not None:
                        for row in product_urls_soup.find_all('a'):
                            if row.get('href') is not None:
                                prod_urls.append(row['href'])
    except Exception as exc:
        # Log the URL being processed; concatenating the prod_urls list to a str raises TypeError.
        print("error:: " + url + ": " + str(exc))
    finally:
        return prod_urls


if __name__ == '__main__':
    # category_urls is the list returned by get_category_urls() above.
    with Pool(2) as p:
        product_urls = p.map(get_product_urls, category_urls)
    # Drop empty results, then flatten and de-duplicate.
    product_urls = list(filter(None, product_urls))
    product_urls_flat = list(set([y for x in product_urls for y in x]))

I am getting product_urls_soup as None here. What am I doing wrong? Please find the sample HTML data below:

html data
How do I handle pagination here, since some categories have pagination and some do not?
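
A side note on the page-count line in get_product_urls: replace(' (1 Pages)', '') only strips the suffix when there is exactly one page. The sample HTML is not shown here, so as an assumption, suppose the summary text looks like "Showing 1 to 15 of 30 (2 Pages)". In that case int() would receive "30 (2 Pages)" and raise ValueError. A minimal sketch of a more tolerant parser follows; the name parse_page_count is hypothetical and it uses only the standard re module:

```python
import re


def parse_page_count(results_text):
    """Return N from text containing '(N Pages)', or None if it is absent."""
    # Assumed format: "... of 30 (2 Pages)"; adjust the pattern to the real markup.
    match = re.search(r"\((\d+)\s+Pages?\)", results_text or "")
    return int(match.group(1)) if match else None


print(parse_page_count("Showing 1 to 15 of 30 (2 Pages)"))
print(parse_page_count("no pagination summary"))
```

Returning None instead of raising makes it easy to treat a category without a pagination summary as a single page.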
Finally, I found the issue.
I was not checking for pagination in every category, and that is what caused the problem.
I have now solved it by adding a check for pagination.
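
For anyone hitting the same problem, here is a rough sketch of what such a check could look like. It is not necessarily the exact fix used above. The helper name build_page_urls and the example URL are hypothetical; the "&page=N" suffix is the one the original loop appends. The idea is that a category without pagination still yields one URL to scrape (the category page itself), so the product loop runs for every category:

```python
def build_page_urls(category_url, page_count):
    """List every URL that should be scraped for one category."""
    if page_count is None or page_count < 2:
        # No pagination block, or a single page: scrape the category page itself.
        return [category_url]
    # Paginated category: one URL per page, using the same "&page=N" suffix.
    return ["{}&page={}".format(category_url, page)
            for page in range(1, page_count + 1)]


# Hypothetical category URL, for illustration only.
category = "https://example.com/category?path=20"
print(build_page_urls(category, None))
print(build_page_urls(category, 3))
```

Inside get_product_urls, the loop over range(1, int(pages) + 1) could then become a loop over build_page_urls(url, page_count), where page_count is None whenever the pagination div is missing.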
