Proxy Errors #101

Open
jj2018jj opened this issue Dec 13, 2024 · 3 comments

Comments

@jj2018jj

jj2018jj commented Dec 13, 2024

If I run a scrape without the `-proxies` flag, it works as expected.

Also, if I use curl with the proxy URL to fetch the Google Maps URL, it works as expected.

However, when I add the `-proxies` flag, I see the errors below in the output and the scrape does not work.

Everything is installed directly on Ubuntu 24.04 (not using Docker).

```
{"level":"info","component":"scrapemate","time":"2024-12-13T17:55:43.067587274Z","message":"starting scrapemate"}
{"level":"info","component":"scrapemate","numOfJobsCompleted":0,"numOfJobsFailed":0,"lastActivityAt":"0001-01-01T00:00:00Z","speed":"0.00 jobs/min","time":"2024-12-13T17:56:43.068652477Z","message":"scrapemate stats"}
{"level":"info","component":"scrapemate","error":"inactivity timeout: 0001-01-01T00:00:00Z","time":"2024-12-13T17:56:43.068682503Z","message":"exiting because of inactivity"}
{"level":"info","component":"scrapemate","job":"Job{ID: 71d978af-8a91-45ee-9d56-dd630099dc31, Method: GET, URL: https://www.google.com/maps/search/test, UrlParams: map[hl:en]}","error":"context canceled","status":"failed","duration":60899.47342,"time":"2024-12-13T17:56:43.967269068Z","message":"job finished"}
{"level":"error","component":"scrapemate","error":"context canceled","time":"2024-12-13T17:56:43.967304912Z","message":"error while processing job"}
{"level":"info","component":"scrapemate","time":"2024-12-13T17:56:43.967340977Z","message":"scrapemate exited"}
```

Any tips on what I could be doing wrong?

Update 1: I tested a second proxy service. I also tried setting the proxy with `export http_proxy="proxy url here"` and `export https_proxy="proxy url here"` without the `-proxies` flag, but it still didn't work.

Update 2: I tested a third proxy service using SOCKS5 with the `-proxies` flag and got this error:

```
{"level":"error","component":"scrapemate","error":"playwright: Browser does not support socks5 proxy authentication","time":"2024-12-14T22:00:49.630242703Z","message":"error while processing job"}
```
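The error says the browser rejects SOCKS5 proxies that carry credentials, so a SOCKS5 URL with `user:pass@` in it cannot work as-is. A minimal sketch, assuming the proxies are given as URLs: the hypothetical `needsSOCKS5Auth` helper flags that combination before a run, using only `net/url` (which lowercases the scheme when parsing).

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// needsSOCKS5Auth (hypothetical helper) reports whether proxyRaw is a SOCKS5
// URL that includes credentials, the combination the browser refuses.
func needsSOCKS5Auth(proxyRaw string) (bool, error) {
	u, err := url.Parse(proxyRaw)
	if err != nil {
		return false, err
	}
	return u.Scheme == "socks5" && u.User != nil, nil
}

func main() {
	// Pass each proxy URL as a separate argument; URLs are not echoed back
	// so that passwords do not end up in the terminal history.
	for i, raw := range os.Args[1:] {
		bad, err := needsSOCKS5Auth(raw)
		switch {
		case err != nil:
			fmt.Printf("proxy #%d: cannot parse: %v\n", i+1, err)
		case bad:
			fmt.Printf("proxy #%d: SOCKS5 with credentials, the browser will reject it\n", i+1)
		default:
			fmt.Printf("proxy #%d: ok\n", i+1)
		}
	}
}
```

If the provider also offers an HTTP endpoint with the same credentials, that may be worth trying instead; an unauthenticated local SOCKS5 proxy is the setup the maintainer tests with further down.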

@gosom
Owner

gosom commented Dec 15, 2024

I just tested the functionality using a SOCKS5 proxy:

```shell
ssh -D 1080 -q -C -N myserver
```

and then ran:

```shell
go run main.go -input example-queries.txt -results demo.csv -proxies 'socks5://127.0.0.1:1080'
```

I also tried it from the web interface.

The traffic went through the proxy.

The difference is that I used a SOCKS5 proxy without authentication.

I don't have access to a proxy with authentication at the moment.

When this feature was implemented, I tested an HTTP proxy with authentication and it worked.

@gosom
Owner

gosom commented Dec 16, 2024

@jj2018jj this is confirmed. I believe something may have changed in Playwright.

I will investigate and get back to you.

@EdwinUK
Copy link

EdwinUK commented Jan 9, 2025

@gosom Any update on this?

I'm also getting this error when trying to use HTTPS proxies:

```
{"level":"info","component":"scrapemate","time":"2025-01-09T08:41:10.0880056Z","message":"starting scrapemate"}
{"level":"info","component":"scrapemate","numOfJobsCompleted":0,"numOfJobsFailed":0,"lastActivityAt":"0001-01-01T00:00:00Z","speed":"0.00 jobs/min","time":"2025-01-09T08:42:10.0895125Z","message":"scrapemate stats"}
{"level":"info","component":"scrapemate","error":"inactivity timeout: 0001-01-01T00:00:00Z","time":"2025-01-09T08:42:10.0895125Z","message":"exiting because of inactivity"}
{"level":"info","component":"scrapemate","job":"Job{ID: e8d7abd1-4a27-4453-99f8-a3a640ff5daa, Method: GET, URL: https://www.google.com/maps/search/roofing+in+Worcester+Park/@0,0,15z, UrlParams: map[hl:en]}","error":"context canceled","status":"failed","duration":62254.0603,"time":"2025-01-09T08:42:12.343207Z","message":"job finished"}
{"level":"error","component":"scrapemate","error":"context canceled","time":"2025-01-09T08:42:12.343207Z","message":"error while processing job"}
{"level":"info","component":"scrapemate","time":"2025-01-09T08:42:12.343207Z","message":"scrapemate exited"}
```

3 participants