Jul-01-2024, 07:40 AM
(This post was last modified: Aug-04-2025, 09:04 AM by JamesWilson.)
Hello everyone. I'm having trouble with a web scraper and can't work out why it's failing. I parse each page with BeautifulSoup and then use an lxml XPath query to extract the next-page URL, but that step doesn't seem to work. What could I be doing wrong?
import csv
import re
import time
from urllib.parse import urljoin

import html5lib  # parser backend used by BeautifulSoup below
import requests
from bs4 import BeautifulSoup
from lxml import etree

start = time.time()
print('Starting Program')
base = "https://pokiesman1.net/"
url = "https://pokiesman1.net/real-money-pokies/"
while True:
    request = requests.get(urljoin(base, url))  # Fetch the current page
    soup = BeautifulSoup(request.content, 'html5lib')  # Parse the response body
    dom = etree.HTML(str(soup))  # Build an lxml tree for XPath queries
    url = dom.xpath('//a[@class="next-page-link"]/@href')  # Look up the next-page link
    url2 = urljoin(base, url)
    urltest2 = soup.find_all("span", class_="game-title")  # Collect the game-title spans
    print('Test First URL', url2, ' Test number 2 ', urltest2)
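
One likely cause, offered as a sketch rather than a confirmed diagnosis: lxml's xpath() returns a list of matches, not a single string, so the loop above ends up passing a list to urljoin, which expects a string. The helper below is hypothetical (next_page_url is not part of any library), and the example path "/real-money-pokies/page/2/" is an assumption made for illustration. It takes the first match from the list and returns None when the page has no next-page link.

```python
from urllib.parse import urljoin


def next_page_url(base, hrefs):
    """Return the absolute next-page URL, or None when no link was found.

    `hrefs` is assumed to be the list that dom.xpath('.../@href') returns.
    """
    if not hrefs:
        # No next-page link on this page, so the caller should stop looping.
        return None
    # Use the first match and resolve it against the base URL.
    return urljoin(base, str(hrefs[0]))
```

In the loop, the result of dom.xpath(...) could be passed straight to this helper, with a break when it returns None, so that the while True loop ends on the last page instead of running forever.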
