Replies: 3 comments 4 replies
-
It might be something in the query logic. The scraper claims work with a query that updates the status of jobs in the `gmaps_jobs` table from `new` to `queued`. It selects a single job, ordering by priority and creation time, and locks the job row with `FOR UPDATE SKIP LOCKED`, so only one scraper processes each job and other workers skip the locked row instead of blocking on it.
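Roughly this shape, I believe (a sketch of the usual `SKIP LOCKED` claim; the table and column names are a guess at the schema):

```go
package queue

// claimOneJob sketches the single-row claim described above. Table and
// column names (gmaps_jobs, id, status, priority, created_at) are
// assumptions, not necessarily the project's actual schema.
const claimOneJob = `
	UPDATE gmaps_jobs
	SET status = 'queued'
	WHERE id = (
		SELECT id FROM gmaps_jobs
		WHERE status = 'new'
		ORDER BY priority, created_at
		LIMIT 1
		FOR UPDATE SKIP LOCKED -- skip rows another worker has already locked
	)
	RETURNING id`
```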
I would start by allowing workers to fetch multiple jobs at once. Then workers can process multiple jobs in parallel, which reduces the wait time between db queries. You could also consider pre-fetching and assigning the jobs to each node and writing the results back to a central db, but that might be more complicated. Also, the default connection limit in PostgreSQL is 100. I saw it immediately go to 200, so I set it to 300. Like:
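A sketch of that change in `postgresql.conf` (reconstructed; the relevant setting is `max_connections`):

```
# postgresql.conf
max_connections = 300   # PostgreSQL's default is 100
```

Raising the cap is a stopgap, though; batching the fetches, as suggested above, is the more durable fix.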
-
I need to check a bit and I will come back to you. @devdovdav the connection count goes up because each worker (the `-c` option) uses the connection pool to fetch ONE job. It makes sense to have a separate goroutine fetch N jobs in a batch for all workers to share; see the sketch below. Will get back to you once I find and correct the issue.
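Something like this, perhaps (a sketch, not the actual project code; the `gmaps_jobs` schema, the `dispatch` name, and the channel of job IDs are all assumptions):

```go
package queue

import (
	"context"
	"database/sql"
	"time"
)

// dispatch claims up to batchSize jobs per query and feeds them to the
// worker pool over a channel, so workers stop issuing one claim query
// (and holding one pool connection) each.
func dispatch(ctx context.Context, db *sql.DB, jobs chan<- int64, batchSize int) error {
	defer close(jobs)
	const claimBatch = `
		UPDATE gmaps_jobs
		SET status = 'queued'
		WHERE id IN (
			SELECT id FROM gmaps_jobs
			WHERE status = 'new'
			ORDER BY priority, created_at
			LIMIT $1
			FOR UPDATE SKIP LOCKED
		)
		RETURNING id`
	for ctx.Err() == nil {
		rows, err := db.QueryContext(ctx, claimBatch, batchSize)
		if err != nil {
			return err
		}
		claimed := 0
		for rows.Next() {
			var id int64
			if err := rows.Scan(&id); err != nil {
				rows.Close()
				return err
			}
			jobs <- id // hand the job to an idle worker
			claimed++
		}
		rows.Close()
		if err := rows.Err(); err != nil {
			return err
		}
		if claimed == 0 {
			time.Sleep(time.Second) // queue empty; back off before polling again
		}
	}
	return ctx.Err()
}
```

Workers then just `range` over the `jobs` channel, so each batch costs one query and one connection instead of one per worker.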
-
@Wentil @devdovdav in the latest version I have made the following changes: (1) we now expect 1 db connection per worker, but the pool is capped at 10 connections overall; see the sketch below. (2) @Wentil if you try it out, can you let me know how it works? Thank you.
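In `database/sql` terms the cap presumably looks something like this (a sketch; the driver and the function name are assumptions, and the project may configure its pool differently):

```go
package queue

import (
	"database/sql"

	_ "github.com/lib/pq" // Postgres driver; the project may use a different one
)

// openPool opens a connection pool capped at 10, matching the limit
// described above.
func openPool(dsn string) (*sql.DB, error) {
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		return nil, err
	}
	db.SetMaxOpenConns(10) // hard cap: at most 10 open connections in total
	db.SetMaxIdleConns(10) // keep them warm instead of reconnecting
	return db, nil
}
```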
-
I set up a central Postgres server and ten (10) scraping servers, each of which connects to the Postgres server and rotates its outgoing requests randomly through that server's assigned /26 CIDR pool of public IPs (ten such IP pools in total). While the IP side of it works well enough, it seems like only one scraping server at a time is being serviced by the Postgres server on the internal network when it hands out new assignments, and the other scraping servers have to queue up and wait for their next tasks -- which brings the jobs/minute below 2 in some cases... pretty abysmal, I'm sure you'll agree.
Is there a way to configure the central Postgres server to better handle I/O from/to the ten scraping servers, so each one can hit 100+ scrapes per minute?