Oct-19-2017, 07:36 AM
I'm new to programming and am having trouble scraping with BS4.
I'm the webmaster of a popular website (I can't share it here, but it uses the Disqus comments platform).
I want to scrape the vote count and the message of the top comments within a set range (e.g. comments with 20-200 upvotes).
I noticed that:
- The vote count should be easy to scrape, since the upvote count appears in the class of the `<a>` tag, for example "count-116"
- The problem is that the `<a>` tag isn't linked to the message text in any way I can see
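Because the count sits inside the class name, one option is to pull the number out with a regular expression and compare it to the range, rather than looping over count-20, count-21 and so on. This is a minimal sketch, not tested against Disqus itself: the helper names `vote_count` and `in_range` and the sample class lists are illustrative. It relies on BeautifulSoup returning a multi-valued `class` attribute as a list of strings, which is what the functions expect.

```python
import re

# Matches a single class name of the form "count-<digits>", e.g. "count-116"
COUNT_RE = re.compile(r'^count-(\d+)$')

def vote_count(classes):
    """Return the integer vote count from a list of class names, or None."""
    for name in classes:
        match = COUNT_RE.match(name)
        if match:
            return int(match.group(1))
    return None

def in_range(classes, low=20, high=200):
    """True if the class list carries a vote count between low and high (inclusive)."""
    count = vote_count(classes)
    return count is not None and low <= count <= high

# Hypothetical class lists, shaped like BeautifulSoup's tag['class']
print(vote_count(['vote-up', 'count-116']))  # 116
print(in_range(['vote-up', 'count-116']))    # True
print(in_range(['vote-up', 'count-5']))      # False
```

With this, a single pass over all matching tags replaces the incrementing loop, and the range check becomes one comparison.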
I've been playing around with some code on an example site, but so far with no success:
```python
from bs4 import BeautifulSoup
import urllib.request
import re

url = 'https://disqus.com/home/discussion/channel-discussdisqus/disqus_leaderboard_what_are_the_best_sports_websites/'
scrape = urllib.request.urlopen(url).read()
# soup = BeautifulSoup(scrape, 'lxml')
soup = BeautifulSoup(scrape, 'html.parser')

# Find <a> tags whose class contains "count-116"
# (the count is stored in the class attribute, so search class_, not src)
for elem in soup.find_all('a', class_=re.compile('count-116')):
    print(elem['class'])
```

^ This was my attempt to scrape the `<a>` element that contains 'count-116'. I was going to run it in a while loop with an incrementing number (count-20, count-21, count-22, ...), but sadly it doesn't work.
Can anyone help me understand the proper way?
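One common way to connect a vote link to its message is to climb from the `<a>` tag to the element that wraps the whole comment, then search inside that wrapper for the message. This is a minimal sketch under two assumptions. First, the downloaded HTML actually contains the comments; Disqus typically loads them with JavaScript, so `urllib` may receive a page without them. Second, the class names `post`, `vote-up` and `post-message` used here are hypothetical, so check the real names in your browser's inspector. `find_all`, `find_parent`, `find` and `get_text` are standard BeautifulSoup methods; `top_comments` and `SAMPLE_HTML` are illustrative names.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page; the class names
# "post", "vote-up" and "post-message" are assumptions, not confirmed Disqus markup
SAMPLE_HTML = """
<div class="post">
  <a class="vote-up count-116" href="#">116</a>
  <div class="post-message"><p>Great list!</p></div>
</div>
<div class="post">
  <a class="vote-up count-3" href="#">3</a>
  <div class="post-message"><p>Meh.</p></div>
</div>
"""

COUNT_RE = re.compile(r'^count-(\d+)$')

def top_comments(html, low=20, high=200):
    """Return (votes, message) pairs for comments whose vote count is in [low, high]."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    # A regex passed as class_ is tested against each individual class name
    for link in soup.find_all('a', class_=COUNT_RE):
        votes = next(int(m.group(1)) for m in map(COUNT_RE.match, link['class']) if m)
        if not low <= votes <= high:
            continue
        # Walk up to the element that wraps the whole comment...
        post = link.find_parent('div', class_='post')
        # ...then look down inside it for the message text
        message = post.find('div', class_='post-message') if post else None
        if message is not None:
            results.append((votes, message.get_text(strip=True)))
    return results

for votes, text in top_comments(SAMPLE_HTML):
    print(votes, text)  # 116 Great list!
```

The vote link and the message are siblings inside the same comment container, so going up one level and back down is what links them, even though neither tag refers to the other directly.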

![[Image: cZOQiN22SeCAk3ONOdtA2Q.png]](https://image.prntscr.com/image/cZOQiN22SeCAk3ONOdtA2Q.png)
![[Image: xv7IbtAoT5m06efURxvhFw.png]](https://image.prntscr.com/image/xv7IbtAoT5m06efURxvhFw.png)