Feb-11-2022, 09:19 AM
I'm gathering card data for a game I like. Some cards don't have certain text and others do. The script searches for it, and when it comes across a card with missing data it just breaks and doesn't continue.
I need to ignore a few sections: not all cards have weakness, flavor text, evolving, etc.
There are about 230 cards to scrape. The code below stops after 6, as the 7th card on the page doesn't have "flavor" text. If I comment out "flavor", it scrapes about 126 cards, as the card it stops on doesn't have "att" or "weak" etc.
So I need to tell the script: if you come across something missing, just ignore it and move on. But I don't know how to do this.
Here is my code:
from bs4 import BeautifulSoup
import requests, openpyxl
excel = openpyxl.Workbook()
print(excel.sheetnames)
sheet = excel.active
sheet.title = "Pokemon Cards"
print(excel.sheetnames)
sheet.append(['title', 'slug', 'sku', 'category_id', 'price', 'discount_rate', 'vat_rate', 'stock', 'description', 'image_url', 'external_link'])
try:
    source = requests.get('https://pkmncards.com/set/chilling-reign/?sort=date&ord=auto&display=full')
    source.raise_for_status()
    soup = BeautifulSoup(source.text, 'html.parser')
    cards = soup.find_all(class_="entry")
    for card in cards:
        #title = card.find(class_="card-title")
        title = card.find('h2').span.text
        details = card.find(class_='card-tabs').text
        image_url = card.find(class_='card-image-area').a
        price = card.find(class_='m').span.text
        name = card.find(class_='name-hp-color').text
        att = card.find(class_="tab").find(class_="text").text
        evol = card.find(class_='type-evolves-is').text
        weak = card.find(class_='weak-resist-retreat')
        ill = card.find(class_='illus minor-text').text
        release = card.find(class_='release-meta minor-text').text
        stan = card.find(class_='mark-formats minor-text').text
        flavor = card.find(class_='flavor minor-text').text
        slug = ""
        sku = ""
        category_id = "50"
        discount_rate = ""
        vat_rate = ""
        stock = "4"
        external_link = ""
        #description1 = "<b>Card Name</b> " + name + " <br> " + evol + " <br> " + att + " <br> " + weak + " <br> " + ill + " <br> " + release + " <br> "+ stan + " <br> " + " <br><br> " + "All Prices are subject to change please message me for more details."
        #description = description1.replace("Pokémon", "Pokemon").replace("×", "x").replace(" → ", " > ").replace("⇢", ">").replace("↘", ">").replace(" · ", " - ").replace(" › ", " > ").replace("’", "'")
        print(title)
        #print(title, description, image_url.get('href'), price)
        #sheet.append([title, slug, sku, category_id, price, discount_rate, vat_rate, stock, description, image_url.get('href'), external_link])
except Exception as e:
    print(e)
#excel.save('chill_all4.xlsx')
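
One possible way to get the "ignore it and move on" behaviour: `find()` returns `None` when nothing matches, and reading `.text` from `None` raises an `AttributeError`. Since the `try`/`except` wraps the whole loop, that first error ends the loop instead of skipping one field. Below is a minimal sketch, assuming each `card` is a BeautifulSoup tag from `soup.find_all(class_="entry")`. The helper names `safe_text` and `card_fields` are made up for illustration and are not part of bs4.

```python
def safe_text(node, default=""):
    """Return node.text, or `default` when find() returned None."""
    if node is None:
        return default
    return node.text


def card_fields(card):
    """Collect the optional fields of one card; anything missing becomes ''.

    Hypothetical helper: `card` is assumed to be a bs4 Tag for one entry.
    """
    # Chained lookups need a guard at each step, because the outer
    # find() can return None as well.
    tab = card.find(class_="tab")
    att_node = tab.find(class_="text") if tab is not None else None

    return {
        "name": safe_text(card.find(class_="name-hp-color")),
        "att": safe_text(att_node),
        "evol": safe_text(card.find(class_="type-evolves-is")),
        "weak": safe_text(card.find(class_="weak-resist-retreat")),
        "ill": safe_text(card.find(class_="illus minor-text")),
        "release": safe_text(card.find(class_="release-meta minor-text")),
        "stan": safe_text(card.find(class_="mark-formats minor-text")),
        "flavor": safe_text(card.find(class_="flavor minor-text")),
    }
```

Inside the loop, `fields = card_fields(card)` could replace the separate `.text` lines, and `fields["flavor"]` would simply be an empty string for cards without flavor text. The same kind of guard would be needed for `card.find('h2').span.text` and `card.find(class_='m').span.text` if those can be missing too.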
