Scraper only scrapes the first review page #5

Open
jimmy10023 opened this issue Jan 16, 2021 · 5 comments


@jimmy10023

Hi, first of all thank you for the code!

However, I am having a problem: when scraping multiple pages of reviews for the same product, only the first page actually gets scraped. The other pages are requested too and show up in the data, but the reviews extracted from them are the ones from the first page.

Does anyone know how to fix this?

Thank you!

@hellochang

Hi! I have the same issue. Have you figured out how to fix it yet?

@karthikmagesan

Hi, I am also facing the same issue. Please let me know if you figured out how to fix it.

@cyanobrian

cyanobrian commented Sep 26, 2021

I believe that when you place the URLs in a TXT file, each line is read together with its trailing newline character (\n), which could mess with the URL. I found that if I strip the last character off each URL as it is read, or put the URLs in a Python list instead, it works fine for me.
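
For anyone who wants to try this, here is a minimal sketch of the stripping step. The names `clean_urls` and `read_urls` are hypothetical and not taken from the project's code. It uses `str.strip()` rather than cutting off the last character, so it also copes with a last line that has no newline and with Windows line endings.

```python
def clean_urls(lines):
    """Return the URLs in `lines` with surrounding whitespace removed.

    Blank lines are skipped. `lines` can be any iterable of strings,
    including an open text file.
    """
    urls = []
    for line in lines:
        url = line.strip()  # removes the trailing "\n" (and "\r" if present)
        if url:
            urls.append(url)
    return urls


def read_urls(path):
    """Read one URL per line from the text file at `path` (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return clean_urls(handle)
```

Calling `read_urls` on your TXT file and passing the returned list to the scraper should behave the same as hard-coding the URLs in a Python list.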

@jmccaffrey

I wanted to start with just an ASIN, get to the first page of reviews, and then keep going to the next page.
You basically have to pull the URL from next_page and loop on that.
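
Roughly, the loop could look like the sketch below. It assumes a callable `scrape_page(url)` that returns a dict with a `reviews` list and a `next_page` link (or nothing on the last page). That shape, and the names `scrape_all_pages` and `max_pages`, are assumptions for illustration, so adapt them to whatever your scrape function actually returns.

```python
from urllib.parse import urljoin


def scrape_all_pages(start_url, scrape_page, max_pages=50):
    """Collect reviews by following next_page links, starting at start_url.

    scrape_page is an assumed callable: url -> dict with the keys
    "reviews" (list) and "next_page" (str or None).
    """
    reviews = []
    seen = set()  # guards against a next_page link that points back to a visited page
    url = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        data = scrape_page(url) or {}
        reviews.extend(data.get("reviews") or [])
        next_page = data.get("next_page")
        # next_page may be a relative link, so resolve it against the current URL
        url = urljoin(url, next_page) if next_page else None
    return reviews
```

The `seen` set and `max_pages` cap are there so the loop stops even if the site keeps returning the same link.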

@ms-shashank

The thing is that Amazon restricts scraping, so when you make too many requests in quick succession this happens: the scraper gets only the first page and keeps repeating it. I would suggest using requests.Session. It might work for you, and it worked for me.
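
A minimal sketch of what that could look like, assuming the `requests` library. The helper names, the delay, and the User-Agent handling are illustrative choices rather than the project's code, and I can't promise this gets past Amazon's throttling in every case.

```python
import time


def make_session(user_agent):
    """Create a requests.Session that sends the given User-Agent header."""
    import requests  # third-party: pip install requests

    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_pages(session, urls, delay_seconds=2.0):
    """Fetch each URL with the same session, pausing between requests.

    Reusing one session keeps cookies and the connection across requests.
    Returns the response bodies as a list of strings.
    """
    pages = []
    for index, url in enumerate(urls):
        if index:
            time.sleep(delay_seconds)  # space the requests out
        response = session.get(url, timeout=30)
        response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
        pages.append(response.text)
    return pages
```

You would create the session once with `make_session(...)`, using your own browser's User-Agent string, and pass it to `fetch_pages` along with the review page URLs.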
