-
-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apache Solr for distributed indexing #27
Comments
Consider using Apache Solr (+ Apache Lucene + Apache Zookeeper), it's widely use and its easier to do full-text search even in docs like xml and PDFs. Since we're already using Apache Kafka and most likely Apache Hadoop and Hive, we still must use Apache Zookeeper for sharded clusters management anyway. |
Now reconsidering using Elasticsearch. Here is the full explanation: In addition, if we chose Solr anyways, we would have to manage all EC2 instances along with AutoScaling groups, Service Discovery registry and Load Balancing. Hence, we know this will eventually come with a price and we still consider an Apache Solr migration in a future for certain services that could take an advantage of the trade-offs Apache Solr has to offer (like fine-grained search on media/blob service's files). |
We should use the CQRS pattern whenever robust querying is needed from now on. Why? In the other hand, for the query side, we could use a fine-grained indexation/querying database system like the ones we've been discussing before and get the benefits of full text search in NRT (Near Real Time). Thus, CQRS gives us a way to model complex business logic, avoiding the use of simple CRUD operations. Here is a diagram that explains this pattern too with a single data source. This could be used whenever a complex domain modeling is needed (avoiding simple CRUD operations). Now, this diagram shows us the way we could be using this pattern in order to segregate our data sources into particular data sets that can be scaled independently. This could be used whenever complex querying is needed like Media and Author services. Finally, there is another way to handle our data using CQRS. That way is using Event Sourcing (ES), which aggregates data instead of replacing it like conventional ways. It's events are immutable as in real life events are. This is highly used when data is constantly changing and you must keep track of it for different reasons. Moreover, ES brings us the possibility of easy rollback because we can trace the previous state of our aggregate using the event store, so we could rebuild aggregates. By the time, we are still turning down the need of full Event Sourcing (ES) since we already have a way to successfully handle distributed transactions (SAGA). However, we know we still must tune our transaction system even more to be completely resilient and simple (using Orchestration instead Choreography, tuning the Event Server from Alexandria Core to use distributed tracing, metrics, circuit breakers, retry policies and logging by default using the chain of responsibility pattern for example). Although we take advantage of an Event-Driven ecosystem to communicate to other services asynchronously, we are still evaluating the need of a dedicated event store, mostly for rollbacks/compensating operations. More information can be found here. |
Is your feature request related to a problem? Please describe.
Yes, right now distributed data indexation is impossible and we still need to implement complex query building patterns in order to satisfy our indexation standards.
Describe the solution you'd like
Currently, we use Redis as cache and PostgreSQL as main RDBMS, yet for full-text searching we need something more powerful and easy to implement. Using Apache Solr as distributed indexing database for the job is one of the best choices we have, even more if we are already using Apache Zookeeper (required by Solr) and Apache Kafka.
Describe alternatives you've considered
Since we might use EKS (e.g. Elasticsearch) for other inquiries, we should consider using the ElasticStack.
Additional context
No
The text was updated successfully, but these errors were encountered: