Reporting on Ingested Data in MyTardis

[Image: mt-e-k.png]

Recently, the MyTardis team at Monash University developed a mechanism to populate a summary of ingested data into the Elasticsearch search engine for further visualisation and reporting with the Kibana service.

The resulting Kibana dashboard exposes a number of metrics and allows the team to monitor incoming data at a top level, as well as drill down into individual statistics per facility, instrument, uploader or allocated storage.
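For a sense of what such a drill-down involves, a per-instrument breakdown reduces to a simple Elasticsearch aggregation over the summary index. The sketch below uses the official Python client; the index name (mytardis-summary) and the field names (instrument, size) are assumptions for illustration, not the project's actual mapping.

```python
# Illustrative aggregation: total ingested bytes and file count per instrument.
# Index and field names are placeholders, not the real MyTardis mapping.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="mytardis-summary",
    size=0,  # no individual hits needed, only the aggregation buckets
    aggs={
        "per_instrument": {
            "terms": {"field": "instrument.keyword", "size": 200},
            "aggs": {"total_bytes": {"sum": {"field": "size"}}},
        }
    },
)

for bucket in resp["aggregations"]["per_instrument"]["buckets"]:
    print(f'{bucket["key"]}: {bucket["doc_count"]} files, '
          f'{bucket["total_bytes"]["value"] / 1e12:.2f} TB')
```

Kibana builds equivalent queries itself from the dashboard configuration; the point is that every panel is just an aggregation over lightweight summary documents rather than a query against MyTardis.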

To date, the Monash University installation of MyTardis has ingested more than 17 million data files with a total size of 1.3 PB, arranged in 70,000 datasets acquired during 7,600 experiments and collected from 117 instruments.

While the reporting tool is lightning fast, it does not affect MyTardis performance at all. The initial data population and its daily updates are handled by a custom script, which connects directly to the MyTardis database, fetches the minimum required information and passes it in bulk to Elasticsearch for indexing and storage. The Kibana service interacts with Elasticsearch independently of the MyTardis installation.
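As an illustration of this approach (not the project's actual code), the sketch below shows how such a loader might look: it queries the database directly with psycopg2, builds minimal summary documents, and indexes them in bulk with the official Elasticsearch Python client. The table names, column names and the mytardis-summary index are placeholders assumed for the example.

```python
# Sketch of a direct-to-Elasticsearch summary loader (illustrative only).
# Table, column and index names below are placeholders, not the real schema.
import psycopg2
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

ES_INDEX = "mytardis-summary"  # hypothetical index name


def fetch_summaries(conn):
    """Yield the minimum fields needed for reporting, straight from the DB."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT df.id, df.size, df.created_time,
                   ds.id AS dataset_id, inst.name AS instrument
            FROM datafile df                              -- placeholder names
            JOIN dataset ds      ON ds.id = df.dataset_id
            JOIN instrument inst ON inst.id = ds.instrument_id
            """
        )
        for file_id, size, created, dataset_id, instrument in cur:
            yield {
                "_index": ES_INDEX,
                "_id": file_id,
                "_source": {
                    "size": size,
                    "created": created.isoformat(),
                    "dataset_id": dataset_id,
                    "instrument": instrument,
                },
            }


def main():
    conn = psycopg2.connect("dbname=mytardis user=reporting")  # read-only creds
    es = Elasticsearch("http://localhost:9200")
    # helpers.bulk batches the documents, keeping the load on both the
    # database and Elasticsearch low; MyTardis itself is never touched.
    bulk(es, fetch_summaries(conn))
    conn.close()


if __name__ == "__main__":
    main()
```

Run daily (for example from cron or a Kubernetes CronJob), a script of this shape only re-reads a handful of columns and re-indexes the changed documents, which is why the reporting stack stays decoupled from MyTardis.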

The code and an example Kubernetes deployment are released as open source at https://github.com/mytardis/es-reporting