Hi all, for my chess club I'm trying to automate collecting timeout percentages.
It's hidden in this HTML: <aside>7.69%</aside>
<ul class="stats-list no-border">
  <li> Winning Streak <aside>19</aside> </li>
  <li> Time per Move <aside>14 hours 15 minutes</aside> </li>
  <li> Timeouts <span class="stats-list-info" tip="Last 3 Months" tip-popup-delay="0"><i class="icon-circle-question" ></i></span> <aside>7.69%</aside> </li>
  <li> Glicko RD <aside> 73 </aside> </li>
  <li> Top Opponent <aside>N/A</aside> </li>
</ul>
</div>
<div class="col-md-6">
  <div class="chart-box live">
    <span class="ui-select-search-container">
      <ui-select class="chess-select" ng-model="model.selectedOpponent" on-select="selectOpponent($item)" ng-cloak>
        <ui-select-match placeholder="vs. All Opponents" allow-clear="true"> [[ $select.selected.id ]] </ui-select-match>
        <ui-select-choices repeat="opponent in UI.opponents" refresh="findOpponents($select.search)" refresh-delay="0"> [[ opponent.id ]] </ui-select-choices>
      </ui-select>
    </span>
It's not always a number with decimals, but when it is, I can only collect the last two decimals, which is a problem.
I need the leading digits, or rather the complete number, and it also has to work when the value is 0%, 10% or 100% instead of something like 24.76%.
The code I have is here:
import sys
import fileinput
import requests
from bs4 import BeautifulSoup
import pandas as dataset
import string
import re
from decimal import *
static_profile_url= REMOVED DUE TO ANTISPAM MEASURES
namen = []
timeouts = []

# Search between string patterns and return the value as a string.
# This extracts the timeout percentage, without the %, from the HTML.
def find_between(s, first, last):
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        timeout = re.compile(r'(\d+)$').search(s[start:end]).group(1)
        #timeout = (s.split(first))[1].split(last)[0]
        print(timeout)
        return timeout
    except ValueError:
        return "error parsing"

def retrieve_timeouts(speler_stats_url):
    try:
        r = requests.get(speler_stats_url)
        soup = BeautifulSoup(r.text, 'lxml')
        # stats = stat_soup.findAll(class_='stats-list no-border')
        stats = soup.findAll('ul', class_='stats-list no-border')
        timeout_percentage = find_between(str(stats), '<aside>', '%</aside>')
        print(timeout_percentage)
        return int(timeout_percentage)
    except ValueError:
        return "error parsing"

print('processing, please wait... this may take a long time!')
fnamen = open('namen.txt', 'r')
tnamen = fnamen.read().splitlines()
for naam in tnamen:
    print(naam)
    namen.append(naam)
    timeouts.append(retrieve_timeouts(static_profile_url + str(naam)))
    print(retrieve_timeouts(static_profile_url + str(naam)))

spelersdata = { 'naam': namen, 'timeout': timeouts }
ds = dataset.DataFrame(spelersdata)
f = open('timouts.csv', 'w')
f.writelines(ds.to_csv())
f.close()

I don't know why it's not working. I'm not used to coding in Python, let alone building scrapers, so my code is made up of a lot of copy pasta...
Could someone please help me out with this one or point me in the right direction?
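
One possible direction, as a minimal sketch using only the standard library's `re` module. The pattern `(\d+)$` in `find_between` matches only the final run of digits in the string, which would explain why `7.69` comes back as `69`. The helper name `parse_timeout_percentage` is hypothetical and not part of the script above. The sketch also assumes that the `Timeouts` label always appears before its own `<aside>` in the page, which may not hold if the site changes its markup.

```python
import re

# Grab the contents of the first <aside> that follows the "Timeouts" label.
# Assumption: the label text "Timeouts" precedes its <aside> in the HTML.
ASIDE_AFTER_TIMEOUTS = re.compile(r'Timeouts.*?<aside>(.*?)</aside>', re.DOTALL)

# An integer or decimal number followed by a percent sign, e.g. 0%, 10%, 7.69%.
PERCENTAGE = re.compile(r'\s*(\d+(?:\.\d+)?)\s*%\s*')


def parse_timeout_percentage(html):
    """Return the timeout percentage as a float, or None if it is not found.

    Hypothetical helper for illustration; not part of the original script.
    """
    aside = ASIDE_AFTER_TIMEOUTS.search(html)
    if aside is None:
        return None
    number = PERCENTAGE.fullmatch(aside.group(1))
    if number is None:
        # The <aside> holds something other than a percentage, e.g. "N/A".
        return None
    return float(number.group(1))


# Why the original pattern drops the leading digits:
print(re.search(r'(\d+)$', '7.69').group(1))             # prints 69
print(re.search(r'(\d+(?:\.\d+)?)$', '7.69').group(1))   # prints 7.69

sample = (
    '<li> Winning Streak <aside>19</aside> </li> '
    '<li> Timeouts <span class="stats-list-info"></span> '
    '<aside>7.69%</aside> </li>'
)
print(parse_timeout_percentage(sample))                   # prints 7.69
```

Returning a float (or `None`) instead of calling `int()` on the result keeps values such as `7.69` intact while still handling `0%`, `10%` and `100%`. In `retrieve_timeouts`, the helper could be given `str(stats)` in place of the `find_between` call, under the same assumptions.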
