title | meta-description | meta-keywords | banner-image | banner-heading | about-us-heading | about-us-description | feature-1-image | feature-1-heading | feature-2-image | feature-2-heading | feature-3-image | feature-3-heading | webcrawling-image | webcrawling-heading | webcrawling-description | search-image | search-heading | search-description | big_data-image | big_data-heading | big_data-description | about-bottom-image | about-bottom-heading | about-bottom-description | references-heading | references-image | cta-text | cta-link | layout |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DigitalPebble |
DigitalPebble Ltd is a consultancy specialising in web crawling, natural language processing, search and machine learning. Our expertise is based on open source solutions such as Apache Nutch, StormCrawler, OpenSearch, Elasticsearch and Apache Solr. |
DigitalPebble |
/images/banner.png |
Unique challenges need bespoke solutions |
Our expertise covers every stage of a document’s life cycle, from web-wide crawling and collection, through content analysis, filtering and categorisation, to indexing. |
*DigitalPebble* can help your organisation by advising on *best practice*, identifying suitable
resources, and designing and implementing scalable solutions. We can help you deploy and monitor your project on your premises or in the [cloud](https://aws.amazon.com/){:target='_blank'}. |
/images/open-source-leader.svg |
Open source leader |
/images/range-of-expertise.svg |
Range of expertise |
/images/proven-track-record.svg |
Proven track record |
/images/web-crawling-new.png |
Web Crawling |
We are the authors and maintainers of [StormCrawler](http://stormcrawler.net/){:target="_blank"}, one of the leading open-source solutions for web crawling. Used by numerous companies all over the world, it is both *scalable and highly configurable*.
We can help you customise [StormCrawler](http://stormcrawler.net/){:target='_blank'} and run it on your premises or in the cloud. Alternatively, DigitalPebble can run it on your behalf. |
/images/search-result.png |
Search |
We have extensive experience with leading search tools such as [Elasticsearch](https://www.elastic.co/elasticsearch/){:target='_blank'}, [OpenSearch](https://opensearch.org/){:target='_blank'} and [Apache Solr](https://solr.apache.org/){:target='_blank'}.
Whether you want to index and search text or any other type of document, we can help you design a *search solution* that fits with the rest of your architecture.
Some of our clients have billions of documents indexed, and with our solid background in *Natural Language Processing* and *Machine Learning*, there is a lot we can do to enrich your documents. |
/images/big-data-new.png |
Big Data |
Large-scale data processing, whether streaming or batch, can be done with platforms such as [Apache Flink](https://flink.apache.org/){:target="_blank"} or [Apache Storm](https://storm.apache.org/){:target="_blank"}.
We have built some of our [open source](https://github.com/digitalpebble){:target='_blank'} solutions on these platforms and have extensive experience of using them for our clients.
Combined with our know-how and *expertise* in cloud computing, we are confident we can help you deliver your project, no matter how much data you have. |
/images/julien-nioche.png |
Julien Nioche - Director |
Having studied Russian language and culture in Paris and taught French in a school in Kyiv, Ukraine, Julien went on to graduate in Text Engineering and Natural Language Processing. He moved to the UK to work as a researcher at the University of Sheffield in 2005 and founded DigitalPebble in 2008.
Julien has been involved in several open source projects, mainly at the [Apache Software Foundation](https://apache.org/){:target='_blank'}, and was the PMC chair for [Apache Nutch](https://nutch.apache.org/){:target='_blank'}. He is a member of the Apache Software Foundation.
Julien runs *workshops* on web crawling, speaks at [conferences](https://www.youtube.com/playlist?list=PLiqxzwp5B4ZmK1VDjSsPajYxsnFWEvWQa){:target='_blank'} and reviews technical books. He has over 20 years' experience with the Java programming language. |
References |
/images/polecat.svg |
GET IN TOUCH |
contact@digitalpebble.com |
homepage |