Jun-29-2021, 10:22 PM
I am trying to scrape all the events from https://www.onthisday.com/events/february/5. I am getting all the events from the first page. How can I get the other events from the second page and merge them into one list?
Right now I tried to catch the next page link and parse it, but it didn't work; I am still getting the results from the first page.
Here is my code:
from typing import List

import requests as _requests
import bs4 as _bs4


def _generate_url(month: str, day: int) -> str:
    url = f'https://www.onthisday.com/events/{month}/{day}'
    return url


def _get_page(url: str) -> _bs4.BeautifulSoup:
    _page = _requests.get(url)
    soup = _bs4.BeautifulSoup(_page.content, 'html.parser')
    return soup


def events_of_the_day(month: str, day: int) -> List[str]:
    """
    Return the events of a given day
    """
    url = _generate_url(month, day)
    page = _get_page(url)
    next_link = page.select_one("a.pag__next")
    raw_events = [event.text for event in page.select("li.event")]
    if next_link:
        next_url = 'https://www.onthisday.com/events'+next_link['href']
        page_next = _get_page(next_url)
        for eve in page_next.select("li.event"):
            print(eve.text)
    #print(raw_events)

events_of_the_day("february", 5)

Note: Some pages contain a next page and some don't, so I am looking to handle both situations.
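
One possible cause, not verified against the live site: if the href of a.pag__next is already a root-relative path (for example something like /events/february/5?p=2, which is an assumption here), then prepending 'https://www.onthisday.com/events' doubles the /events segment and the request may fall back to the first page. Below is a minimal sketch that resolves the link with urllib.parse.urljoin instead and keeps following next links until there are none, so pages with and without a next page are both handled. The names fetch_soup, collect_events and max_pages are hypothetical and not part of the original code; the selectors are the ones from the post.

```python
from typing import Callable, List, Optional
from urllib.parse import urljoin


def fetch_soup(url: str):
    """Download and parse one page (same requests/bs4 calls as in the post)."""
    import requests  # imported lazily so the loop below can be tried offline
    import bs4
    response = requests.get(url)
    return bs4.BeautifulSoup(response.content, "html.parser")


def collect_events(start_url: str,
                   get_page: Callable = fetch_soup,
                   max_pages: int = 20) -> List[str]:
    """Follow "next" links from start_url and merge all events into one list."""
    events: List[str] = []
    seen = set()  # guards against a "next" link that points back to a visited page
    url: Optional[str] = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = get_page(url)
        events.extend(item.text.strip() for item in page.select("li.event"))
        next_link = page.select_one("a.pag__next")
        href = next_link.get("href") if next_link else None
        # urljoin resolves root-relative, relative and absolute hrefs
        # against the current URL, so no prefix has to be guessed.
        url = urljoin(url, href) if href else None
    return events
```

Assuming the selectors match the site's markup, it could be called as events = collect_events("https://www.onthisday.com/events/february/5"). Printing next_link['href'] once is a quick way to confirm what shape the link actually has.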
