Sep-09-2017, 10:59 PM
(This post was last modified: Sep-10-2017, 01:13 PM by Prince_Bhatia.)
I am trying to scrape the images and a table from a Wikipedia page and write them to a CSV file, but I am not sure how to combine the two and write the data to the CSV.
Below is my code:
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Kevin_Bacon"
html = urlopen(url)
soup = BeautifulSoup(html, "html.parser")

newfile = "Newlyout.csv"
f = open(newfile, "w")
Headers = "Year, Association, Category, Nominated, Results, Imagelink\n"
f.write(Headers)

soup1 = soup.find_all("img")
for i in soup1:
    Image = i['src']
    #ddprint(Image['src'])

soup3 = soup.find("table", {"class":"wikitable sortable"})
for tag in soup3.find_all("tr"):
    cell = tag.find_all("td")
    if len(cell) == 5:
        Year = cell[0].find(text=True)
        Association = cell[2].find(text=True)
        Category = cell[3].find(text=True)
        Nominated = cell[4].find(text=True)
        Results = cell[4].find(text=True)
        f.write("{}".format(Year)+ ",{}".format(Association)+ ",{}".format(Category) + ",{}".format(Nominated) + ",{}".format(Results)+ ",{}".format(Image)+"\n")
f.close()

I have it working up to this point, but the output repeats data, and the images are a problem because there are multiple images that need to go into one single cell. All I need is the table, with all the images on that page written against it.
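
One possible way to combine the two is sketched below. Two things in the code above look like likely causes of the symptoms: the image loop reassigns Image on every pass, so only the last src survives, and Nominated and Results both read cell[4], so the same value is written twice. The sketch collects every src into a list, joins the list into one cell, and lets the csv module handle quoting so that commas inside a cell do not shift columns. The names parse_page, write_csv, main and the " | " separator are made up for illustration, and the sketch assumes that the five td cells of a row follow the header order, which may not hold for rows that use rowspan.

```python
import csv
from urllib.request import urlopen


def parse_page(html):
    """Return (table_rows, image_links) extracted from the page HTML."""
    # Imported here so write_csv below can be reused without BeautifulSoup.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    # Keep every image URL instead of only the last one seen.
    image_links = [img["src"] for img in soup.find_all("img") if img.get("src")]

    table_rows = []
    table = soup.find("table", {"class": "wikitable sortable"})
    if table is not None:
        for tr in table.find_all("tr"):
            cells = tr.find_all("td")
            if len(cells) == 5:
                # Assumption: the five cells are in header order (index 0 to 4).
                table_rows.append([td.get_text(strip=True) for td in cells])
    return table_rows, image_links


def write_csv(out, table_rows, image_links, sep=" | "):
    """Write one CSV line per table row, with all image links in the last column."""
    writer = csv.writer(out)  # quotes any field that contains a comma
    writer.writerow(["Year", "Association", "Category",
                     "Nominated", "Results", "Imagelink"])
    joined = sep.join(image_links)
    for row in table_rows:
        writer.writerow(row + [joined])


def main():
    url = "https://en.wikipedia.org/wiki/Kevin_Bacon"
    html = urlopen(url).read()
    rows, links = parse_page(html)
    # newline="" is what the csv module expects for files it writes to.
    with open("Newlyout.csv", "w", newline="", encoding="utf-8") as f:
        write_csv(f, rows, links)
```

Calling main() fetches the page and writes Newlyout.csv. If one image per line is preferred over one long cell, the same two lists can be written as separate rows instead of joining the links.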
