Code scrape more than one time information

Clnprof · Aug-25-2019, 12:57 PM

I'm a beginner in Python and web scraping. My goal was to scrape 30 reviews from a TripAdvisor restaurant page. But when I open the file I have 301 reviews; the 30 reviews are repeated more than five times. Could you tell me what is wrong? What am I missing? This is my code:

import requests
from bs4 import BeautifulSoup as bs

# w is a csv.writer created earlier in the script (not shown in this post)
with requests.Session() as s:
    for offset in range(10, 40):
        url = f'https://www.tripadvisor.fr/Restaurant_Review-g187147-d947475-Reviews-or{offset}-Le_Bouclard-Paris_Ile_de_France.html'
        r = s.get(url)
        soup = bs(r.content, 'lxml')
        reviews = soup.select('.reviewSelector')
        ids = [review.get('data-reviewid') for review in reviews]
        # ask for the expanded text of the reviews found on this page
        r = s.post(
            'https://www.tripadvisor.fr/OverlayWidgetAjax?Mode=EXPANDED_HOTEL_REVIEWS_RESP&metaReferer=',
            data={'reviews': ','.join(ids), 'contextChoice': 'DETAIL'},
            headers={'referer': r.url},
        )

        soup = bs(r.content, 'lxml')
        # restaurant name and category are read only when offset is 0
        if not offset:
            inf_rest_name = soup.select_one('.heading').text.replace("\n", "").strip()
            rest_eclf = soup.select_one('.header_links a').text.strip()

        # one CSV row per review
        for review in soup.select('.reviewSelector'):
            name_client = review.select_one('.info_text > div:first-child').text.strip()
            date_rev_cl = review.select_one('.ratingDate')['title'].strip()
            titre_rev_cl = review.select_one('.noQuotes').text.strip()
            opinion_cl = review.select_one('.partial_entry').text.replace("\n", "").strip()
            row = [inf_rest_name, rest_eclf, name_client, date_rev_cl, titre_rev_cl, opinion_cl]
            w.writerow(row)

I tried renaming the variable review to opinion_cl, because I thought that was the error, but I still get the same 301 reviews. I would appreciate your help.

***stranac*** · Aug-26-2019, 05:49 AM

Your loop runs 30 times, once for each number from 10 to 39 (the end value 40 is not included).

Every number 10-19 gets redirected to 10, 20-29 get redirected to 20, and 30-39 get redirected to 30.
This means you scrape each of those pages 10 times, getting 10 duplicates of each review.

Maybe you meant for your loop to be for offset in range(10, 40, 10): instead?
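
To make the duplication visible, here is a small sketch that needs no network access. The helper effective_offset is hypothetical: it only models the redirect described above (each offset rounded down to a multiple of 10) and is not part of any TripAdvisor API.

```python
from collections import Counter


def effective_offset(offset, page_size=10):
    # Hypothetical model of the redirect described above:
    # every offset is rounded down to the nearest multiple of page_size.
    return offset // page_size * page_size


# Original loop: 30 requests, but only 3 distinct pages are ever reached.
original = Counter(effective_offset(o) for o in range(10, 40))

# Suggested loop: a step of 10 visits each page exactly once.
fixed = list(range(10, 40, 10))

print(original)  # each of the pages 10, 20, 30 is fetched 10 times
print(fixed)     # [10, 20, 30]
```

With the step added, each page is requested once, so each review should be written once.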

Clnprof · Aug-26-2019, 06:52 AM

Thank you so much! It works perfectly. So does that mean that if I want to scrape reviews 220 to 890, I have to write "for offset in range(220,890,220)"? Is that right?

***stranac*** · Aug-26-2019, 07:40 AM

No, the third argument to range() is the step, which you want to be 10 (every tenth number).
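
A quick way to check this is to print what each call produces. This is only a sketch using the numbers from the question; the variable names are made up for illustration.

```python
# Step of 220: jumps over most pages.
wrong = list(range(220, 890, 220))

# Step of 10: one offset per page of 10 reviews.
right = list(range(220, 890, 10))

print(wrong)                    # [220, 440, 660, 880]
print(right[:3], right[-1])     # [220, 230, 240] 880
print(len(right))               # 67 pages

# The stop value is exclusive, so offset 890 itself is not visited.
# Use a larger stop such as 900 if that page is wanted too.
with_last = list(range(220, 900, 10))
print(with_last[-1])            # 890
```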

Clnprof · Aug-26-2019, 08:06 AM

Great! Thank you again!

Clnprof · Aug-26-2019, 09:26 AM

I have another question. I need another page that has at least 1000 reviewers. I ran the code at 10:40. Now it doesn't show any scraped information, and when I tried to run the code again it seems to be blocked. It doesn't respond. Is that normal? What can I do to unblock the code and get the information faster?
