    Repositories list

    • Page Object pattern for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      28119104 Updated Dec 27, 2024
    • Python parser for human-readable dates
      Python
      BSD 3-Clause "New" or "Revised" License
      4662.6k28851 Updated Dec 27, 2024
    • Scrapy entrypoint for Scrapinghub job runner
      Python
      BSD 3-Clause "New" or "Revised" License
      162581 Updated Dec 27, 2024
    • Python
      BSD 3-Clause "New" or "Revised" License
      151322 Updated Dec 24, 2024
    • A client interface for Scrapinghub's API
      Python
      BSD 3-Clause "New" or "Revised" License
      63203232 Updated Dec 16, 2024
    • spidermon

      Public
      Scrapy Extension for monitoring spiders execution.
      Python
      BSD 3-Clause "New" or "Revised" License
      99535406 Updated Dec 10, 2024
    • Software stack with latest Scrapy and updated deps
      Dockerfile
      BSD 3-Clause "New" or "Revised" License
      206320 Updated Dec 2, 2024
    • More flexible and featured Frontera scheduler for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      53521 Updated Nov 29, 2024
    • Python Social Auth - Application - Django
      Python
      BSD 3-Clause "New" or "Revised" License
      381201 Updated Nov 18, 2024
    • extruct

      Public
      Extract embedded metadata from HTML markup
      Python
      BSD 3-Clause "New" or "Revised" License
      1138663814 Updated Nov 8, 2024
    • Formasaurus tells you the type of an HTML form and its fields using machine learning
      HTML
      48700 Updated Nov 7, 2024
    • Extract price amount and currency symbol from a raw text string
      Python
      BSD 3-Clause "New" or "Revised" License
      50316179 Updated Nov 6, 2024
    • Parse numbers written in natural language
      Python
      BSD 3-Clause "New" or "Revised" License
      23109126 Updated Oct 23, 2024
    • web-poet

      Public
      Web scraping Page Objects core library
      Python
      BSD 3-Clause "New" or "Revised" License
      15951613 Updated Oct 16, 2024
    • andi

      Public
      Library for annotation-based dependency injection
      Python
      BSD 3-Clause "New" or "Revised" License
      52331 Updated Oct 16, 2024
    • A python binding for crfsuite
      Python
      MIT License
      221771453 Updated Oct 1, 2024
    • streamparse lets you run Python code against real-time streams of data. Integrates with Apache Storm.
      Python
      Apache License 2.0
      218201 Updated Sep 20, 2024
    • splash

      Public
      Lightweight, scriptable browser as a service with an HTTP API
      Python
      BSD 3-Clause "New" or "Revised" License
      5124.1k37726 Updated Aug 2, 2024
    • A Postgres-backed ContentsManager implementation for IPython
      Python
      Apache License 2.0
      85201 Updated Jul 18, 2024
    • Crawl Frontier HCF backend
      Python
      BSD 3-Clause "New" or "Revised" License
      5721 Updated Jul 17, 2024
    • shublang

      Public
      Pluggable DSL that uses pipes to perform a series of linear transformations to extract data
      Python
      BSD 3-Clause "New" or "Revised" License
      815236 Updated Jul 9, 2024
    • An opinionated fork of the Drone CI system
      Go
      Other
      383005 Updated Jul 7, 2024
    • varanus

      Public
      A command line spider monitoring tool
      Python
      7822 Updated Jul 6, 2024
    • scrapyrt

      Public
      HTTP API for Scrapy spiders
      Python
      BSD 3-Clause "New" or "Revised" License
      161842246 Updated Jun 28, 2024
    • portia

      Public
      Visual scraping for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      1.4k9.3k11119 Updated Jun 26, 2024
    • scikit-learn inspired API for CRFsuite
      Python
      215200 Updated Jun 18, 2024
    • Python
      MIT License
      2403 Updated Jun 17, 2024
    • autologin

      Public
      A project to attempt to automatically login to a website given a single seed
      Python
      Apache License 2.0
      431102 Updated Jun 17, 2024
    • Python wrapper for the Intercom API.
      Python
      Other
      144101 Updated Jun 17, 2024
    • luigi

      Public
      Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
      Python
      Apache License 2.0
      2.4k401 Updated Jun 7, 2024