Sep-18-2021, 10:25 AM
Hi all,
I'm not an expert coding in python and have used it a few times to write things like the following example code which is meant to extract some information via scrapping from Yahoo finance.
The code is:
Hence - my interpretation of the issue - the code is pulling back a different result using urllib3 than I get from the browser, and I'm a little baffled as to why. I assume the web server is noticing it is being called from Python somehow based on the values in the request, and then returning a different response.
Does anyone know why this is occurring and how I can work around it? Is it actually caused for the reason I said above? I am actually very interested in this intellectually as well as resolving the issue in the script. I find it very odd that a server script would respond in different ways purposefully like this (if that is indeed the cause).
TIA!
Steve.
I'm not an expert coding in python and have used it a few times to write things like the following example code which is meant to extract some information via scrapping from Yahoo finance.
The code is:
from bs4 import BeautifulSoup
import requests as req
import re
import urllib3
import sys
# Parse input command line parameters.
if (len(sys.argv) > 1):
sStockTicker = sys.argv[1]
else:
# TODO: Change this to uncomment the exit when debug is finished
print('Invalid usage. A stock ticker must be passed as parameter.')
sStockTicker='FMG.AX'
#sys.exit(2)
url='https://finance.yahoo.com/quote/' + sStockTicker
req = urllib3.PoolManager()
res = req.request("GET", url)
soup3 = BeautifulSoup(res.data,'lxml')
print (soup3.find(id="quote-header-info").contents[2].contents[0].contents[0].contents[0].text)Now, depending on what stock I run this for (the parameter), the code will generate data the has the same content as the web page that I would see via accessing the same URL in a web browser, such as Chrome/Firefox, and then extract the stock price. It does not always do this though. If I run the script for "IBM", it will work fine. If I run it for "FMG.AX" it will work fine. However, if I run it for "IOZ.AX" it will fail. If I paste the same URL into the web browser, it will load perfectly fine and show the expected results.Hence - my interpretation of the issue - the code is pulling back a different result using urllib3 than I get from the browser, and I'm a little baffled as to why. I assume the web server is noticing it is being called from Python somehow based on the values in the request, and then returning a different response.
Does anyone know why this is occurring and how I can work around it? Is it actually caused for the reason I said above? I am actually very interested in this intellectually as well as resolving the issue in the script. I find it very odd that a server script would respond in different ways purposefully like this (if that is indeed the cause).
TIA!
Steve.
