Metadata Dumps

How to automatically create metadata dumps

ndjson Metadata Dump

An ndjson metadata dump can be provided at /resources/dumps/oer_data.ndjson. To activate it, set search_index_features_create_dump: true. By default, the dump is created after the weekly cleanup and reindex of the metadata index.

The dumps are accessible as described here.

Additional metadata dumps

You can also provide additional metadata dumps in other formats. To do so, provide a zip archive that contains everything needed to transform the ndjson dump into your format. The extracted artifact must contain a shell script transform.sh that performs the transformation and takes the parameters <INPUT-FILE> <OUTPUT-FILE>, where INPUT-FILE is the ndjson file and OUTPUT-FILE is the dump file your process creates.
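As an illustration of the transform.sh contract described above, here is a minimal sketch. The target format (a single JSON array) and the function name ndjson_to_json_array are assumptions chosen for the example, not part of the project; a real transformation would replace the function body with its own logic.

```shell
#!/bin/sh
# transform.sh <INPUT-FILE> <OUTPUT-FILE>
# Hypothetical example transformation: wraps the records of the ndjson dump
# into one JSON array. Only the two-parameter interface comes from the docs.
set -eu

ndjson_to_json_array() {
  # $1 = ndjson input file, $2 = output file to create
  # Blank lines are skipped; every other line is copied as one array element.
  awk 'NF { printf "%s%s", (n++ ? ",\n" : "[\n"), $0 }
       END { print (n ? "\n]" : "[]") }' "$1" > "$2"
}

# Run the transformation only when both parameters are given.
if [ "$#" -eq 2 ]; then
  ndjson_to_json_array "$1" "$2"
elif [ "$#" -ne 0 ]; then
  echo "usage: transform.sh <INPUT-FILE> <OUTPUT-FILE>" >&2
  exit 1
fi
```

A local trial run could look like sh transform.sh oer_data.ndjson oer_data.json, using a previously downloaded copy of the ndjson dump as input.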

To install a transformation via Ansible, add an entry with the following fields to search_index_dump_transformations:

  • name - (required) unique name of your transformation process
  • artifact_url - (required) the URL from which your zip archive is downloaded during installation
  • output_file_extension - (required) the file extension of your dump
  • schedule_weekday, schedule_hour, schedule_minute - (optional) schedule for creating your dump. By default, it is created after the weekly ndjson dump creation

Example:

search_index_dump_transformations:
  - name: oersi-marc
    artifact_url: https://gitlab.com/oersi/oersi-marc/-/jobs/artifacts/main/download?job=deploy
    output_file_extension: "mrc.xml"