May-31-2018, 09:04 AM
Hi there,
I have a small problem with my code. I have never coded in Python before, but I need to for my research project.
I want to crawl a website for a set of data: the project is to gather data from the site and put it into Excel.
Here is my code so far:
import requests
from bs4 import BeautifulSoup


# Crawler for icobench.com: visits the links (sub-pages) of all ICOs that have ended
# as of now and prints their titles.
def ended_ico_spider(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://icobench.com/icos?&filterBonus=&filterBounty=&filterMvp=&filterTeam=&filterExpert=&" \
              "filterSort=&filterCategory=all&filterRating=any&filterStatus=ended&filterPublished=&" \
              "filterCountry=any&filterRegistration=0&filterExcludeArea=none&filterPlatform=any&filterCurrency=any&" \
              "filterTrading=any&s=&filterStartAfter=&filterEndBefore=0&page=" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "lxml")
        # Every ICO in the result list is an <a class="name"> link to its sub-page.
        for link in soup.findAll('a', {'class': 'name'}):
            href = "https://icobench.com/" + link.get('href')
            title = link.string
            print(title)
            get_single_ico_rating(href)
            get_single_ico_fixed_data(href)
            get_single_ico_financial_token_info(href)
            get_single_ico_financial_investment_info(href)
            # get_single_ico_whitepaper(href)
        page += 1

# Fetch the individual data blocks of each sub-page. Fields are named after the HTML code.
def get_single_ico_rating(single_item_url):
    source_code = requests.get(single_item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "lxml")
    # Data from the rating field
    for data in soup.findAll('div', {'class': ['rate color1', 'rate color2', 'rate color3', 'rate color4',
                                               'rate color5', 'col_4 col_3']}):
        print(data.text)

def get_single_ico_fixed_data(single_item_url):
    source_code = requests.get(single_item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "lxml")
    for fixed_data in soup.findAll('div', {'class': 'col_2'}):
        print(fixed_data.text)

def get_single_ico_financial_token_info(single_item_url):
    source_code = requests.get(single_item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "lxml")
    for financial_token_info in soup.findAll('div', {'class': 'box_left'}):
        print(financial_token_info.text)

def get_single_ico_financial_investment_info(single_item_url):
    source_code = requests.get(single_item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "lxml")
    for investment_info in soup.findAll('div', {'class': 'box_right'}):
        print(investment_info.text)

# Here I want to find out whether a whitepaper exists on the given sub-page. If one
# exists, a value X can be returned, otherwise a value Y.
# def get_single_ico_whitepaper(href):
#     source_code = requests.get(href)
#     plain_text = source_code.text
#     soup = BeautifulSoup(plain_text, "lxml")
#     for whitepaper_link in soup.findAll('div', {'class': 'onclick'}):
#         print(whitepaper_link.text)

ended_ico_spider(1)

Some parts are still missing, and I would be glad for any help you could offer. Here are the points I couldn't solve even though I searched the web for hours (I guess I'm just a noob in Python):
1. I need to find out whether each ICO sub-page has a whitepaper or not. Since it's an onclick field, I don't know how to search for it and check whether a whitepaper is there.
2. Exporting the data to a CSV file (for Excel): the printed output looks messy at the moment. Some parts come out in rows, others in columns, and so on. As you might guess, I need a clean table with each ICO on a separate row and the different elements in separate columns, so that I can use R or some other program for the statistics.
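For point 1, here is a minimal sketch of one possible check. It assumes the whitepaper tab is present in the downloaded HTML as an element with an `onclick` attribute, and that either the attribute value or the visible text mentions "whitepaper". Neither assumption has been verified against the real site. The names `OnclickCollector`, `has_whitepaper`, `PAGE_WITH` and `PAGE_WITHOUT` are hypothetical, and the HTML snippets are made up. It uses only the standard library, so it runs without extra installs.

```python
from html.parser import HTMLParser


class OnclickCollector(HTMLParser):
    """Collect (onclick value, following text) for every tag with an onclick attribute."""

    def __init__(self):
        super().__init__()
        self.entries = []            # one [onclick_value, text] pair per matching tag
        self._awaiting_text = False  # True until text for the latest match is seen

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "onclick" in attributes:
            self.entries.append([attributes["onclick"] or "", ""])
            self._awaiting_text = True

    def handle_data(self, data):
        # Attach the first non-empty text after an onclick tag to that tag.
        if self._awaiting_text and data.strip():
            self.entries[-1][1] = data.strip()
            self._awaiting_text = False


def has_whitepaper(html):
    """Return True if any onclick element mentions a whitepaper (assumed marker)."""
    collector = OnclickCollector()
    collector.feed(html)
    collector.close()
    for onclick_value, text in collector.entries:
        # Lowercase and drop whitespace so "White paper" and "whitepaper" both match.
        combined = "".join((onclick_value + " " + text).lower().split())
        if "whitepaper" in combined:
            return True
    return False


# Made-up HTML snippets for illustration, not copied from the real site.
PAGE_WITH = '<div><a onclick="openTab(\'whitepaper\')">White paper</a></div>'
PAGE_WITHOUT = '<div><a onclick="openTab(\'team\')">Team</a></div>'
```

In the crawler this could be called as `has_whitepaper(requests.get(href).text)`. With BeautifulSoup, `soup.find_all(attrs={"onclick": True})` selects the same kind of elements. If the tab is only created by JavaScript after the page loads, it will not be in the HTML that `requests` downloads, and this check cannot see it.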
I would be glad for any help!
Thanks a lot in advance for any support.
aStudent (in urgent need of help)
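
For point 2, the usual pattern is to have each function return its values instead of printing them, collect one dictionary per ICO, and write all of them with the standard `csv` module. Below is a minimal sketch under that assumption. The column names in `FIELDNAMES`, the helper names, and the sample rows are placeholders invented for illustration, not real data from the site.

```python
import csv
import io

# Hypothetical column names; replace them with the fields actually scraped.
FIELDNAMES = ["title", "rating", "whitepaper"]


def clean(text):
    """Collapse newlines and repeated spaces so a value fits into one CSV cell."""
    return " ".join(text.split())


def write_rows(rows, out_file, fieldnames):
    """Write one CSV line per ICO dictionary; missing fields become empty cells."""
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({key: clean(str(row.get(key, ""))) for key in fieldnames})


def rows_to_csv_text(rows, fieldnames):
    """Return the CSV as a string, which is handy for checking the layout."""
    buffer = io.StringIO()
    write_rows(rows, buffer, fieldnames)
    return buffer.getvalue()


# Made-up sample rows standing in for scraped values.
SAMPLE_ROWS = [
    {"title": "Example\n  ICO A", "rating": "3.4", "whitepaper": "yes"},
    {"title": "Example ICO B", "rating": "2.1"},
]
```

To produce a file that Excel or R can open, the same function can be given a real file: `with open("icos.csv", "w", newline="", encoding="utf-8") as f: write_rows(rows, f, FIELDNAMES)`. Opening the file with `newline=""` is what the `csv` documentation recommends, so that no extra blank lines appear on Windows.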
