index.md

---
title: DigitalPebble
meta-description: DigitalPebble Ltd is a consultancy specialising in web crawling, natural language processing, search and machine learning. Our expertise is based on open source solutions such as Apache Nutch, StormCrawler, OpenSearch, Elasticsearch and Apache Solr.
meta-keywords: DigitalPebble
banner-image: /images/banner.png
banner-heading: Unique challenges need bespoke solutions
about-us-heading: Our unique expertise covers all aspects of a document’s life cycle, from web-wide crawling and collection, content analysis, filtering and categorisation to indexing.
about-us-description: >-
  *DigitalPebble* can help your organisation by advising on *best practice* and identifying suitable resources, designing scalable solutions as well as implementing them. We can help you deploy and monitor your project on your premises or in the [cloud](https://aws.amazon.com/){:target='_blank'}.
feature-1-image: /images/open-source-leader.svg
feature-1-heading: Open source leader
feature-2-image: /images/range-of-expertise.svg
feature-2-heading: Range of expertise
feature-3-image: /images/proven-track-record.svg
feature-3-heading: Proven track record
webcrawling-image: /images/web-crawling-new.png
webcrawling-heading: Web Crawling
webcrawling-description: >-
  We are the authors and maintainers of [StormCrawler](http://stormcrawler.net/){:target="_blank"}, one of the leading open-source solutions for web crawling. Used by numerous companies all over the world, it is both *scalable and highly configurable*. We can help you customise [StormCrawler](http://stormcrawler.net/){:target='_blank'} and run it on your premises or in the cloud, or, alternatively, DigitalPebble can run it on your behalf.
search-image: /images/search-result.png
search-heading: Search
search-description: >-
  We have extensive experience with leading search tools such as [Elasticsearch](https://www.elastic.co/elasticsearch/){:target='_blank'}, [OpenSearch](https://opensearch.org/){:target='_blank'} and [Apache Solr](https://solr.apache.org/){:target='_blank'}. Whether you want to index and search text or any other type of document, we can help you design a *search solution* that fits the rest of your architecture. Some of our clients have billions of documents indexed, and with our solid background in *Natural Language Processing* and *Machine Learning*, there is a lot we can do to enrich your documents.
big_data-image: /images/big-data-new.png
big_data-heading: Big Data
big_data-description: >-
  Large-scale data processing, whether streaming or batch, can be done with platforms such as [Apache Flink](https://flink.apache.org/){:target="_blank"} or [Apache Storm](https://storm.apache.org/){:target="_blank"}. In fact, we have built some of our [open source](https://github.com/digitalpebble){:target='_blank'} solutions on these platforms and have extensive experience using them for our clients. Combined with our know-how and *expertise* in cloud computing, we are confident we can help you deliver your project, no matter how much data you have.
about-bottom-image: /images/julien-nioche.png
about-bottom-heading: Julien Nioche - Director
about-bottom-description: >-
  Having studied Russian language and culture in Paris and taught French in a school in Kyiv, Ukraine, Julien went on to graduate in Text Engineering and Natural Language Processing. He moved to the UK to work as a researcher at the University of Sheffield in 2005 and founded DigitalPebble in 2008. Julien has been involved in several open source projects, mainly at the [Apache Software Foundation](https://apache.org/){:target='_blank'}, and was the PMC chair for [Apache Nutch](https://nutch.apache.org/){:target='_blank'}. He is a member of the Apache Software Foundation. Julien runs *workshops* on web crawling, speaks at [conferences](https://www.youtube.com/playlist?list=PLiqxzwp5B4ZmK1VDjSsPajYxsnFWEvWQa){:target='_blank'} and reviews technical books. He has over 20 years of experience in the Java programming language.
references-heading: References
references-image: /images/polecat.svg
cta-text: GET IN TOUCH
cta-link: contact@digitalpebble.com
layout: homepage
---