4.0.0
Hello Aleph community! We’re excited to announce Aleph 4.0.0, a release focused on powerful new features, performance improvements, and expanded options for investigation sharing and user metrics. In addition, this release includes a few other small enhancements, bug fixes and dependency upgrades.
🚀 Bigger Changes 🚀
- RabbitMQ based task queueing backend
- Configurable AlephWorker Stages
- Priority Buckets for Processing
- System Status Page Enhancements
- Updated Prometheus Metrics
- Documentation Restructure and Enhancements
- Improved Error Handling in Elasticsearch Upgrades
As always, we’d love to hear your feedback to keep improving. Feel free to reach out and share your thoughts!
What's Changed
Features
RabbitMQ
4.0.0 introduces a change to the way background tasks are scheduled. Previously Aleph used a Redis-based task queue, which was well designed but showed its limitations with large payloads and carried a risk of data loss. RabbitMQ queues are persisted to disk, and the flexibility in the way messages are queued, routed and fetched allows for certain optimizations that Aleph benefits from because its task loads vary widely.
Migration notes from Redis to RabbitMQ
Due to the significant changes in terms of task status persistence, switching between Aleph versions with RabbitMQ and Redis-based task queues requires some manual steps in order to ensure data consistency.
Perform the following steps every time you are either upgrading to a version with the RabbitMQ task queue or rolling back to the Redis-based task queue:
- Let all pending jobs run to completion (check the status page).
- Put Aleph into maintenance mode.
- Stop all workers (worker, ingest-file processes).
- (optional) Save the current state of Redis using the BGSAVE command in case you want to roll back.
- Clear Redis (by issuing FLUSHDB from redis-cli from the redis container). If you get the error message "Unknown command FLUSHDB" then this command is disabled and you can resort to this shell invocation: echo 'KEYS *' | redis-cli | grep -v '^aleph:' | sed 's/^/DEL /' | redis-cli
- (optional, if previous versions had conflicting RabbitMQ queue settings) Delete existing queues using rabbitmqctl delete queue {ingest,pruneentity,updateentity,exportxref,analyze,flushmapping,reingest,exportsearch,index,xref,reindex,loadmapping}. NOTE: queues are named after the stages found in ALEPH_WORKER_STAGES.
- Perform the upgrade or rollback to the desired version of Aleph.
- Ensure that all expected processes have started correctly.
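For readers who want to preview what the fallback shell invocation in the "Clear Redis" step would delete before running it, here is a minimal Python sketch of the same filtering logic. It only builds the DEL commands from a list of key names and does not talk to Redis. The function name del_commands and the sample key names are hypothetical and chosen for illustration; like the sed step, the sketch does not quote keys that contain whitespace.

```python
from typing import Iterable, List

# Prefix excluded by the grep -v '^aleph:' step of the shell invocation.
KEEP_PREFIX = "aleph:"


def del_commands(keys: Iterable[str], keep_prefix: str = KEEP_PREFIX) -> List[str]:
    """Return one DEL command per key that does not start with keep_prefix.

    Mirrors `grep -v '^aleph:' | sed 's/^/DEL /'`: keys starting with the
    prefix are dropped from the list, every other key gets a DEL command.
    """
    return [f"DEL {key}" for key in keys if not key.startswith(keep_prefix)]


if __name__ == "__main__":
    # Hypothetical key names, for illustration only.
    sample_keys = ["aleph:example", "other:example:1", "session:abc"]
    for command in del_commands(sample_keys):
        print(command)
```

Running the filter over the real output of KEYS * first and reviewing the resulting commands is one way to check that nothing unexpected is removed.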
Related changes:
- Dynamically set AlephWorker stages through env vars by @catileptic in #3748
- Completely remove network diagram embeds feature by @tillprochaska in #3751
- Feature: Priority buckets by @stchris in #3784
- Separate index worker from other stages in aleph-worker by @stchris in #3817
Prometheus metrics
We have extended the Prometheus metrics exposed by Aleph to provide more information about active users and the data in your Aleph instance. For example, you can now query for the number of active users within the past 30 days or the number of investigations related to a particular language. For details about the available metrics please refer to the metrics reference in the technical documentation.
- New and updated Prometheus metrics by @tillprochaska in #3844
- Update Prometheus metrics reference by @tillprochaska in #3845
- Fix active users metric by @tillprochaska in #3852
- Fix edge cases in custom metrics by @tillprochaska in #3861
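As a rough illustration of how such metrics can be inspected outside of a Prometheus server, the sketch below parses text in the Prometheus exposition format and looks up a sample by name and labels. The metric name example_active_users and its timeframe label are hypothetical placeholders, not names taken from the Aleph metrics reference; substitute the names documented there. The parser is deliberately minimal and skips comment lines and anything it cannot match.

```python
import re
from typing import Dict, List, Optional, Tuple

Sample = Tuple[str, Dict[str, str], float]

# Matches `name{label="value",...} number` or `name number`.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> List[Sample]:
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this minimal parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def find_value(samples: List[Sample], name: str, **labels: str) -> Optional[float]:
    """Return the first sample value matching the name and all given labels."""
    for sample_name, sample_labels, value in samples:
        if sample_name == name and all(
            sample_labels.get(key) == wanted for key, wanted in labels.items()
        ):
            return value
    return None


if __name__ == "__main__":
    # Hypothetical metrics payload, for illustration only.
    payload = (
        "# TYPE example_active_users gauge\n"
        'example_active_users{timeframe="30d"} 42\n'
        'example_active_users{timeframe="7d"} 17\n'
    )
    print(find_value(parse_samples(payload), "example_active_users", timeframe="30d"))
```

In a real deployment the same question would more commonly be answered with a query against Prometheus itself; the sketch is only meant to show the shape of the data.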
Sharing investigations
Due to the sensitive nature of dataset access we have made some changes to the way datasets are shared, no longer allowing email addresses to autocomplete. This means one needs to know the exact email address of another user if they want to share an investigation.
- Feature: Allow sharing of investigations by @tillprochaska in #3865
- Remove sharing options from create investigation screen by @stchris in #3862
- Multiple small UX enhancements related to investigation sharing/user suggestion component by @tillprochaska in #3868
Other new features
- Display start and last updated timestamp on system status page by @tillprochaska in #3788
- Display an error message for blocked users by @tillprochaska in #3560
- aleph CLI command to downgrade the postgres DB by @stchris in #3858
Bug fixes and other changes
- Use default language when Accept-Language header is '*' by @stchris in #3724
- Exit op_index early in Aleph Worker by @catileptic in #3781
- Automatically post releases to Discourse by @stchris in #3728
- Fix phone numbers used in tests by @tillprochaska in #3847
- Remove bookmarks migration by @tillprochaska in #3752
- Fix docker compose command by @tillprochaska in #3843
- Improve date formats on status page by @tillprochaska in #3841
- Aleph upgrade will throw an exception if any ES call returns a status code > 399 by @catileptic in #3859
- Fix ES index upgrades when using index aliases by @tillprochaska in #3863
Documentation updates
- Docs: Restructure tech docs by @tillprochaska in #3569
- Misc documentation enhancements by @tillprochaska in #3819
- Docs: added a small amount of text for people looking to get started but are not aware of gitflow by @Rosencrantz in #3707
- Docs: link to download raw docker compose files instead of HTML by @vsessink in #3778
- Docs: Improved order of commands for first time setup by @vsessink in #3779
- Docs: Document how to set up MinIO in a development environment by @tillprochaska in #3857
- Docs: Document how to download files using alephclient by @tillprochaska in #3848
- Update form links in docs by @tillprochaska in #3850
- Update form links on about page etc. by @tillprochaska in #3851
Dependency updates
- Bugfix/downgrade authlib in 3.16.0 by @stchris in #3574
- Bump gunicorn[eventlet] from 21.2.0 to 22.0.0 by @dependabot in #3689
- Bump flask and authlib (as required to run flask 3+) by @stchris in #3732
- Bump react-pdf from 5.7.2 to 7.7.3 in /ui by @dependabot in #3726
Full Changelog: 3.17.0...4.0.0