Use curl-impersonate to work around 403 errors when scraping #1263
thomas-333 started this conversation in Ideas
Replies: 2 comments
-
I made a few attempts to improve this a while ago, but it wasn't very conclusive. I just did a quick and dirty test with curl-impersonate and it seems to be working pretty well. I'll have a closer look after Christmas. Thanks for your suggestion.
-
Hi. I see from recent commits that curl-impersonate has been incorporated into v1.6.0; however, as far as I can tell it's not working. Is the code operational, or is it just there for testing purposes at the moment?
-
On a number of sites I try to scrape, I get a 403 error due to website protection. I note this has already been raised previously. Can I suggest that curl-impersonate be used in the Docker image as a workaround? This may not work for all sites, but it does help with some of the sites I want to scrape.
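For illustration, one way to get curl-impersonate into an existing Docker image is a multi-stage build that copies the impersonating binary and one of its browser wrapper scripts out of the upstream image. This is only a sketch: the image tag, wrapper script name (`curl_chrome116`), and install paths are assumptions based on the lwthiker/curl-impersonate project, not this project's actual Dockerfile.

```dockerfile
# Hypothetical sketch (tag, wrapper name, and paths are assumptions).
# Stage 1: pull the prebuilt Chrome-impersonating curl from upstream.
FROM lwthiker/curl-impersonate:0.6-chrome AS impersonate

# Stage 2: the scraper's own base image.
FROM debian:bookworm-slim
# Copy the impersonating binary and its Chrome wrapper script.
COPY --from=impersonate /usr/local/bin/curl-impersonate-chrome /usr/local/bin/
COPY --from=impersonate /usr/local/bin/curl_chrome116 /usr/local/bin/
# The scraper could then call the wrapper instead of plain curl, e.g.:
#   curl_chrome116 -sL "https://example.com/protected-page"
```

The wrapper script sets the TLS fingerprint and HTTP headers to match a real Chrome build, which is what gets past fingerprint-based 403 blocks that plain curl triggers.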