OTW’s Neo4j database was facing significant performance bottlenecks due to an ever-growing dataset. The organisation required a solution that would archive data older than 45 days without compromising the database's integrity. This archival solution had to offload older data efficiently into MongoDB, freeing up space in Neo4j and optimising performance. Additionally, an automatic data purge mechanism was needed in Neo4j to enforce a rolling 45-day retention policy.
To address the challenges, a custom automated archival solution was developed using Python. This solution dynamically fetched configurations from an S3 bucket, securely retrieved credentials from Vault, and enabled batch processing to archive large volumes of data from Neo4j to MongoDB. The solution also implemented an automatic purge of data from Neo4j post-archival to ensure compliance with the rolling retention policy.
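In outline, the configuration and credential steps might look like the following Python sketch, using boto3 for S3 and hvac for Vault. The bucket name, object key, and secret path shown here are illustrative assumptions rather than the production values.

```python
import json
import os

import boto3  # AWS SDK: pulls the runtime configuration from S3
import hvac   # HashiCorp Vault client: reads database credentials


def load_config(bucket: str, key: str) -> dict:
    """Fetch and parse the JSON configuration object stored in S3."""
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)


def load_credentials(vault_addr: str, secret_path: str) -> dict:
    """Read database credentials from Vault's KV v2 secrets engine."""
    client = hvac.Client(url=vault_addr, token=os.environ["VAULT_TOKEN"])
    secret = client.secrets.kv.v2.read_secret_version(path=secret_path)
    return secret["data"]["data"]


# Hypothetical bucket, key, and secret path, for illustration only.
config = load_config("otw-archival-config", "archival/config.json")
creds = load_credentials("https://vault.example.com", "archival/databases")
```

Resolving both configuration and credentials at runtime means retention windows or connection details can change without rebuilding or redeploying the job.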
Key features of the solution:

- Dynamic configuration: automated retrieval of configurations ensures that the solution adapts to changing needs without manual intervention.
- Secure credential management: sensitive credentials required for accessing the databases are stored and retrieved using Vault, ensuring data security.
- Batch processing: enables efficient handling of large datasets, making the archival process scalable (sketched, together with the purge step, after this list).
- Automatic purge: ensures that only the most recent 45 days of data are retained in Neo4j, preventing unnecessary storage consumption.
- Error handling and logging: comprehensive error handling and logging provide insights and allow for quick resolution of issues.
- CI/CD integration: pipelines streamline deployment, ensuring rapid and reliable updates.
- Scheduled execution: Kubernetes CronJobs enable the automated, periodic execution of archival runs, ensuring regular data management.
- Configurability: provides the flexibility to adapt to different storage and retention needs.
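As an illustration of how the batch-processing and purge features fit together, the following sketch archives nodes older than 45 days into MongoDB and then deletes exactly the nodes it archived. The Event label, createdAt property, batch size, and connection details are assumptions made for this example; the production Cypher queries were supplied by the Neo4j team.

```python
import logging
from datetime import datetime, timedelta, timezone

from neo4j import GraphDatabase
from pymongo import MongoClient

log = logging.getLogger("archival")

BATCH_SIZE = 1000  # assumed; tuned per environment in practice
CUTOFF = datetime.now(timezone.utc) - timedelta(days=45)

# Illustrative Cypher only; the production queries came from the Neo4j team.
FETCH_BATCH = """
MATCH (e:Event) WHERE e.createdAt < $cutoff
RETURN e LIMIT $batch
"""
DELETE_BY_ID = """
MATCH (e) WHERE elementId(e) IN $ids
DETACH DELETE e
"""


def archive_and_purge(neo4j_uri: str, neo4j_auth: tuple, mongo_uri: str) -> None:
    driver = GraphDatabase.driver(neo4j_uri, auth=neo4j_auth)
    archive = MongoClient(mongo_uri)["archive"]["events"]  # hypothetical collection
    with driver.session() as session:
        while True:
            result = session.run(FETCH_BATCH, cutoff=CUTOFF, batch=BATCH_SIZE)
            docs, ids = [], []
            for record in result:
                node = record["e"]
                # Convert Neo4j temporal values to native datetimes so
                # pymongo can encode them as BSON.
                docs.append({k: v.to_native() if hasattr(v, "to_native") else v
                             for k, v in node.items()})
                ids.append(node.element_id)
            if not docs:
                break  # nothing older than 45 days remains
            archive.insert_many(docs)           # archive first...
            session.run(DELETE_BY_ID, ids=ids)  # ...then purge those exact nodes
            log.info("archived and purged %d nodes", len(ids))
    driver.close()
```

Archiving before deleting keeps each batch safe to re-run: if a run fails midway, the worst case is a duplicate document in MongoDB rather than data missing from both stores.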
The technology stack underpinning the solution:

- Python: the primary language used for scripting and automation.
- Neo4j and MongoDB: Neo4j served as the source database, while MongoDB was used as the archival destination.
- Amazon S3: the solution dynamically retrieved its configurations from an S3 bucket.
- Vault: provided a secure mechanism for credential management.
- Kubernetes: used for orchestrating the deployment across production and non-production namespaces.
- GitLab CI and Harbour: GitLab CI was employed for continuous integration and deployment, with Docker images stored in Harbour for version control.
The archival solution is hosted within Kubernetes clusters, ensuring scalability and resilience. OTW maintains separate namespaces for production and non-production environments, minimising interference and enabling thorough testing.
This project required close collaboration with the Neo4j team, who provided essential Cypher queries and valuable insights on efficient data handling. Their expertise in optimising large datasets helped implement the logic for batch processing and ensured the archival process did not impact Neo4j’s performance.
The automated archival solution at OTW has successfully streamlined the management of large datasets. It has significantly improved Neo4j's performance by ensuring older data is efficiently offloaded into MongoDB while maintaining data integrity and ease of access. This solution not only addresses immediate storage challenges but also provides a flexible framework for future scalability and evolving data management needs.