Using Python to search through a list of urls

jeremy · (This post was last modified: Dec-17-2019, 04:09 AM by Larz60+.)

I want to be able to extract data from multiple pages. The pages are in the following format:

https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=1&sort_order=price_asc

https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=2&sort_order=price_asc

https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=3&sort_order=price_asc

In these links, the only thing that changes in the URL is the number following page=

So far I have created code that exports the results into a CSV file. However, it only works for one URL:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=1&sort_order=price_asc'

# opening up connection, grabbing the page
uClient =  uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parser
page_soup = soup(page_html, "html.parser")


# grabs each property
listings = page_soup.findAll("div",{"class":"tmp-search-card-list-view__card-content"})

filename = "trademe.csv"
f = open(filename, "w")

headers = "title, price, area\n"

f.write(headers)

for listing in listings:

	title_listing = listing.findAll("div", {"class":"tmp-search-card-list-view__title"})
	price_listing = listing.findAll("div", {"class":"tmp-search-card-list-view__price"})
	area_listing = listing.findAll("div", {"class":"tmp-search-card-list-view__subtitle"})
	title = title_listing[0].text.strip()
	price = price_listing[0].text.strip()
	area = area_listing[0].text.strip()

	print("title: " + title)
	print("price: " + price)
	print("area: " + area)

	f.write(title.replace(",", "^") + "," + price.replace(",", "") + "," + area.replace(",", "^") + "\n")

f.close()

How would I get this working so that it goes through all of the page URLs?

I could create a text file with the possible links, but I'm still not sure what to do to get this to work.

I'm new to Python.

**Larz60+** · Dec-17-2019, 04:27 AM

You can do this with a generator.
Note that I have given start and end default values, so the defaults are used when it is called without arguments, as in the first call below:

def get_next_url(start_page_no=1, end_page_no=5):
    # yield one search-results URL per page number, start to end inclusive
    for pgno in range(start_page_no, end_page_no+1):
        yield ("https://www.trademe.co.nz/browse/"
               "categoryattributesearchresults.aspx"
               "?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&"
               "rsqid=d4360a620e944164b321dc2498f327b9-002"
               "&nofilters=1&originalsidebar=1&key=1227701521"
               f"&page={pgno}&sort_order=price_asc")

def main():
    # called, using default:
    print('\nUsing default:')
    for url in get_next_url():
        print(f"\nurl: {url}")

    # called providing start and end values
    start_page = 7
    end_page = 10
    print("\n\nProviding start and end pages")
    for url in get_next_url(start_page, end_page):
        print(f"\nurl: {url}")

if __name__ == '__main__':
    main()

Results of running the above:

Using default:

url: https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=1&sort_order=price_asc

url: https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=2&sort_order=price_asc

url: https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=3&sort_order=price_asc

url: https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=4&sort_order=price_asc

url: https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=5&sort_order=price_asc


Providing start and end pages

url: https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=7&sort_order=price_asc

url: https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=8&sort_order=price_asc

url: https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=9&sort_order=price_asc

url: https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx?cid=5748&search=1&134=9&135=2&rptpath=350-5748-&rsqid=d4360a620e944164b321dc2498f327b9-002&nofilters=1&originalsidebar=1&key=1227701521&page=10&sort_order=price_asc
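If you would rather not hard-code the long URL, here is a rough sketch that takes any one of the original links and swaps only the page= value, using the standard library's urllib.parse. The function name with_page is just something I made up for illustration; it is not part of the code above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url, page_no):
    """Return url with its existing page= query parameter set to page_no.

    Hypothetical helper: if the URL has no page= parameter, it is
    returned unchanged (nothing is added).
    """
    parts = urlsplit(url)
    # Keep every parameter in its original order; replace only "page".
    params = [
        (key, str(page_no) if key == "page" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(params)))


first_page = (
    "https://www.trademe.co.nz/browse/categoryattributesearchresults.aspx"
    "?cid=5748&search=1&134=9&135=2&rptpath=350-5748-"
    "&rsqid=d4360a620e944164b321dc2498f327b9-002"
    "&nofilters=1&originalsidebar=1&key=1227701521"
    "&page=1&sort_order=price_asc"
)

# Build the URLs for pages 1 to 3 from the single starting link.
page_urls = [with_page(first_page, n) for n in range(1, 4)]
for url in page_urls:
    print(url)
```

This is handy if the search parameters change later, since only the starting link needs replacing.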

jeremy · Dec-17-2019, 07:07 PM

Thanks so much for your help, Larz! I'm still not sure how to incorporate my code into this. How do I get my scraper to go through all of the pages and give results in Excel such as:

title | price | area
Project Sleepout/extra room for relocation | $28,750 | Saleyards Road^ Kauri^ Whangarei
Leisurebuilt Cabin | $56,000 | Waipu Cove^ Whangarei
Large Two Bedroom Open Plan Relocatable | $74,000 | Kamo^ Whangarei
3 Bedroom Family Home - 1 Piece Move | $82,000 | Otangarei^ Whangarei
Solid 3 Bedroom 1 Piece Move | $82,000 | Kamo^ Whangarei
Three bedroom house for relocation | $85,000 | Saleyards Road^ Kauri^ Whangarei
BUILD YOUR EASY CARE RURAL HOME HERE! | $95,000 | Titoki^ Whangarei
Vendors Say Present All Offers | Price by negotiation | Whangarei Heads^ Whangarei
Relocatable homes - New | $105,900 | 34 Lakeside Park Road^ Ruakaka^ Whangarei
Lovely Character Villa Relocated To Your Site | $110,000 | Waipu^ Whangarei
Lovely Character Villa | $110,000 | Kamo^ Whangarei
Solid 3 Bedroom Bungalow | $135,000 | Maungakaramea^ Whangarei
Create Your Dream Right Here! | Price by negotiation | Raumanga^ Whangarei
Let your imagination run wild ... | $149,000 | Maunu^ Whangarei
WHANGAREI'S CHEAPEST SECTION | $150,000 | Raumanga^ Whangarei
Price reduced - now only $159^000 | $159,000 | 1/891 Cove Road^ Waipu^ Whangarei
Your holiday awaits! | $167,000 | 3/891 Cove Road^ Waipu^ Whangarei
Best Value Around? | $169,000 | Kamo^ Whangarei
Look^ the price is not a mistake!! | $169,000 | Kamo^ Whangarei
The perfect beach getaway and investment | $179,500 | 5/891 Cove Road^ Waipu^ Whangarei
Elevated with 180 degree views | $185,000 | Kamo^ Whangarei
Prime Sections - Kotata Heights Morningside | Enquiries over $190000 | Morningside^ Whangarei
Dream it - Build It - Live Here | Price by negotiation | Horahora^ Whangarei
Price Reduced - Pebble Beach Boulevard Section | $190,000 | Kamo^ Whangarei

My current file does this, but only for one page.

**Larz60+** · Dec-17-2019, 09:05 PM

You need to put it all in a loop:

for url in get_next_url(start_page, end_page):
    # code for each page goes here
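To make that more concrete, here is a rough sketch of how the scraping code from the first post could sit inside that loop. It assumes the div class names from the original script still match the site's HTML, and that BeautifulSoup is installed (pip install beautifulsoup4). The names scrape_page and scrape_to_csv are my own, chosen for illustration. The file is opened once and the header is written once, before the loop, so every page's rows land in the same CSV.

```python
import csv
from urllib.request import urlopen


def get_next_url(start_page_no=1, end_page_no=5):
    """Yield one search-results URL per page number (same idea as above)."""
    for pgno in range(start_page_no, end_page_no + 1):
        yield ("https://www.trademe.co.nz/browse/"
               "categoryattributesearchresults.aspx"
               "?cid=5748&search=1&134=9&135=2&rptpath=350-5748-"
               "&rsqid=d4360a620e944164b321dc2498f327b9-002"
               "&nofilters=1&originalsidebar=1&key=1227701521"
               f"&page={pgno}&sort_order=price_asc")


def scrape_page(url):
    """Download one results page; return a list of (title, price, area)."""
    # bs4 is third-party; it is imported here so the URL and CSV helpers
    # can be used on their own without it.
    from bs4 import BeautifulSoup

    with urlopen(url) as response:
        page_soup = BeautifulSoup(response.read(), "html.parser")

    # Class names copied from the original script (assumed still valid).
    prefix = "tmp-search-card-list-view__"
    rows = []
    for card in page_soup.find_all("div", class_=prefix + "card-content"):
        fields = []
        for name in ("title", "price", "subtitle"):
            tag = card.find("div", class_=prefix + name)
            # A missing element becomes an empty cell instead of an error.
            fields.append(tag.text.strip() if tag else "")
        rows.append(tuple(fields))
    return rows


def scrape_to_csv(filename, urls, scrape=scrape_page):
    """Write the rows from every URL into one CSV; return the row count."""
    count = 0
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "price", "area"])  # header, written once
        for url in urls:
            for row in scrape(url):  # the per-page code runs here
                writer.writerow(row)
                count += 1
    return count


# Example call (makes real network requests):
# scrape_to_csv("trademe.csv", get_next_url(1, 3))
```

Because csv.writer quotes any field that contains a comma, the replace(",", "^") workaround should not be needed here.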

Malt · (This post was last modified: Dec-18-2019, 11:54 AM by Malt.)

You are getting the parsed values inside the for loop, right? Write them to the CSV file in that same loop.

For example:

with open('data.csv', 'w') as f:
    for url in get_next_url(start_page, end_page):
        print(f"\nurl: {url}")
        # one entry per line
        f.write(url + "\n")

If you need specific headers, you can use the csv module to write the header row, and then parse each page to get the details you are looking for. Each detail needs to go under its respective header in the CSV file.
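As a small illustration of the csv module with named headers, here is a sketch using csv.DictWriter. The write_listings function and the sample row are only for demonstration (the row is taken from the results posted earlier, with the comma restored in the area). It writes to an in-memory buffer so it can be run without touching the site; an open file object works the same way.

```python
import csv
import io

FIELDNAMES = ["title", "price", "area"]


def write_listings(out, listings):
    """Write a header row, then one row per listing dict, to file-like out."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(listings)


sample = [
    {
        "title": "Leisurebuilt Cabin",
        "price": "$56,000",
        "area": "Waipu Cove, Whangarei",
    },
]

buffer = io.StringIO()
write_listings(buffer, sample)
print(buffer.getvalue())
```

Fields that contain commas are quoted automatically, so they stay in their own column when the file is opened in Excel. When writing to a real file, open it with newline="" as the csv documentation recommends.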
