Nov-15-2019, 05:38 AM
Hi,
I'd like to scrape text content from a website, but my code doesn't work and I don't know how to fix it. Any ideas would be much appreciated.
Here are the details:
Website: https://voteview.com/rollcall/RH1030237
What to scrape: every Congressman's "name", "state", and "vote", from the last section of the page: Votes (Sort by Party, State, Vote, Ideology, Vote Probability)
Here is my code:
import requests
import bs4
from bs4 import BeautifulSoup
def getHTMLText(url):
    try:
        r = requests.get(url)
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""
def fillPList(plist, html):
    soup = BeautifulSoup(html, "html.parser")
    for li in soup.find('ul'):
        if isinstance(li, bs4.element.Tag):
            spans = li('span')
            plist.append([spans[0].string, spans[1].string, spans[2].string])
def printPList(plist, num):
    print("{:^10}\t{:^10}\t{:^10}".format("name", "state", "vote"))
    for i in range(num):
        p = plist[i]
        print("{:^10}\t{:^10}\t{:^10}".format(p[0], p[1], p[2]))
def main():
    pinfo = []
    url = 'https://voteview.com/rollcall/RH1030237'
    html = getHTMLText(url)
    fillPList(pinfo, html)
    printPList(pinfo, 435)
    with open(r'D:\KKKKKK\103_hr1876.csv', 'a', encoding='utf-8') as f:
        f.write("{},{},{}\n".format("name", "state", "vote"))
main()

Here is the error I got:

Traceback (most recent call last):
  File "D:/AA_Software/Pycharm/PycharmProjects/untitled/voteview.py", line 30, in <module>
    main()
  File "D:/AA_Software/Pycharm/PycharmProjects/untitled/voteview.py", line 26, in main
    fillPList(pinfo, html)
  File "D:/AA_Software/Pycharm/PycharmProjects/untitled/voteview.py", line 16, in fillPList
    plist.append([spans[0].string, spans[1].string, spans[2].string])
IndexError: list index out of range

I have looked into this error. From what I read, it means the list is empty, so an error is raised when you try to use its elements. But the content does appear in the website's source code.

This is my first post here, so sorry if it is confusing, and please tell me how to improve it. Many thanks!
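
For illustration, here is a minimal sketch of one way this IndexError can arise and be guarded against. It assumes the first `<ul>` on the page is not the vote list (for example, a navigation menu), so its `<li>` items contain fewer than three `<span>` tags. The HTML in `SAMPLE_HTML`, the `nav` and `votes` ids, and the helper name `collect_rows` are all made up for this example and are not the actual voteview.com markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <ul> is a menu, the second holds the rows.
SAMPLE_HTML = """
<ul id="nav"><li><span>Home</span></li></ul>
<ul id="votes">
  <li><span>SMITH</span><span>TX</span><span>Yea</span></li>
  <li><span>JONES</span><span>CA</span><span>Nay</span></li>
</ul>
"""

def collect_rows(html):
    """Return [name, state, vote] for every <li> that has at least 3 spans."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # find_all('li') visits every list item on the page,
    # whereas find('ul') only returns the first <ul>.
    for li in soup.find_all("li"):
        spans = li("span")  # shorthand for li.find_all("span")
        if len(spans) < 3:
            # Indexing spans[1] or spans[2] here would raise IndexError.
            continue
        rows.append([spans[0].string, spans[1].string, spans[2].string])
    return rows

if __name__ == "__main__":
    for row in collect_rows(SAMPLE_HTML):
        print(row)
```

With this sample, the menu item is skipped and only the two three-span rows are collected. Whether the real page is structured this way is an assumption that would need checking against the HTML actually returned by `requests.get`.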
