Use curl-impersonate to work around 403 errors when scraping #1263
thomas-333 started this conversation in Ideas
Replies: 2 comments
-
I made a few attempts to improve this a while ago, but it wasn't very conclusive. I just did a quick and dirty test with curl-impersonate and it seems to be working pretty well. I'll have a closer look after Christmas. Thanks for your suggestion.
-
Hi. I see from recent commits that curl-impersonate has been incorporated into v1.6.0; however, as far as I can tell it's not working. Is the code operational, or is it just there for testing purposes at the moment?
-
On a number of sites I try to scrape, I get a 403 error due to website protection. I note this has already been raised previously. Can I suggest that curl-impersonate be used in the Docker image as a workaround? This may not work for all sites, but it does help with some of the sites I want to scrape.
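For illustration, one way to get curl-impersonate into an existing Docker image is a multi-stage build that copies the impersonating binary and one of its browser wrapper scripts out of the upstream image. This is only a sketch: the image tag, wrapper script name (`curl_chrome116`), and install paths are assumptions based on the lwthiker/curl-impersonate project, not this project's actual Dockerfile.

```dockerfile
# Hypothetical sketch (tag, wrapper name, and paths are assumptions).
# Stage 1: pull the prebuilt Chrome-impersonating curl from upstream.
FROM lwthiker/curl-impersonate:0.6-chrome AS impersonate

# Stage 2: the scraper's own base image.
FROM debian:bookworm-slim
# Copy the impersonating binary and its Chrome wrapper script.
COPY --from=impersonate /usr/local/bin/curl-impersonate-chrome /usr/local/bin/
COPY --from=impersonate /usr/local/bin/curl_chrome116 /usr/local/bin/
# The scraper could then call the wrapper instead of plain curl, e.g.:
#   curl_chrome116 -sL "https://example.com/protected-page"
```

The wrapper script sets the TLS fingerprint and HTTP headers to match a real Chrome build, which is what gets past fingerprint-based 403 blocks that plain curl triggers.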