Data API
Public Data API
You can query the data from OERSI on-the-fly and use it directly in your application, as well as download the data in bulk for further use.
The metadata of the OER on oersi.org is stored in an Elasticsearch index named oer_data. The Elasticsearch API is open and available to everyone (read-only). Search queries should always address the oer_data index; otherwise, conflicts with other indexes may occur.
For information on using the Elasticsearch API, see
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
Data endpoint: https://oersi.org/resources/api/search/
Note: the endpoint https://oersi.org/resources/api-internal/search/ is still available, but deprecated and will be removed in the future. Please use https://oersi.org/resources/api/search/ instead.
Search
Build your Elasticsearch query, submit it to the OERSI data endpoint, and use the results directly.
Example on-the-fly search
Search for “Klimawandel”
curl -X 'POST' 'https://oersi.org/resources/api/search/oer_data/_search?pretty' \
-H 'Content-Type: application/json' -H 'accept: application/json' \
-d '{"size":20,"from":0,"query": { "multi_match": { "query": "Klimawandel", "fields": ["name", "description", "keywords"]}},"sort": [{"id":"asc"}]}'
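The same request can also be sent from application code. The following is a minimal Python sketch using only the standard library. The helper names `build_search_body` and `search` are illustrative and not part of OERSI, and the code assumes the response follows the usual Elasticsearch search layout (`hits.hits[]._source`).

```python
import json
import urllib.request

# Data endpoint and index name as documented above.
SEARCH_URL = "https://oersi.org/resources/api/search/oer_data/_search"


def build_search_body(term, size=20, offset=0):
    """Build the same multi_match query as the curl example (hypothetical helper)."""
    return {
        "size": size,
        "from": offset,
        "query": {
            "multi_match": {
                "query": term,
                "fields": ["name", "description", "keywords"],
            }
        },
        "sort": [{"id": "asc"}],
    }


def search(term, size=20, offset=0):
    """POST the query to the OERSI endpoint and return the parsed JSON response."""
    request = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(build_search_body(term, size, offset)).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `search("Klimawandel")` should return the parsed response as a dictionary. Assuming the standard layout, the matching records are then found under `result["hits"]["hits"]`, each with its metadata in `_source`.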
Bulk Download
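The most stable way to download data in bulk is Elasticsearch’s point in time (PIT) API, which lets you page through a consistent view of the index even while it is being updated. See https://www.elastic.co/guide/en/elasticsearch/reference/current/point-in-time-api.html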
Example bulk download
1. Create PIT
curl -X 'POST' 'https://oersi.org/resources/api/search/oer_data/_pit?keep_alive=1m&pretty' -H 'accept: application/json'
=> Remember the returned id for the following steps (referred to below as <YOUR_PIT_ID>)
2. Search in batches of 1000 results, here using the provider twillo as an example
2.1 Search for the first 1000 hits
curl -X 'POST' 'https://oersi.org/resources/api/search/_search?pretty' -H 'Content-Type: application/json' -H 'accept: application/json' -d '{"size":1000,"query": {"match": {"mainEntityOfPage.provider.name": "twillo"}},"pit": {"id": "<YOUR_PIT_ID>", "keep_alive": "1m"}, "sort": [{"id":"asc"}], "track_total_hits": true}'
2.2 Determine <YOUR_LAST_SORT_RESULT>: the complete “sort” array of the last hit in the response
...
"sort" : [
"https://oer.identifier",
35297
]
}
]
}
}
2.3 Repeat the search, passing the last sort result as search_after, until no more hits are returned
curl -X 'POST' 'https://oersi.org/resources/api/search/_search?pretty' -H 'Content-Type: application/json' -H 'accept: application/json' -d '{"size":1000,"query": {"match": {"mainEntityOfPage.provider.name": "twillo"}},"pit": {"id": "<YOUR_PIT_ID>", "keep_alive": "1m"}, "sort": [{"id":"asc"}], "track_total_hits": true, "search_after": <YOUR_LAST_SORT_RESULT>}'
3. Delete PIT
curl -X DELETE https://oersi.org/resources/api/search/_pit -H 'Content-Type: application/json' -d '{"id":"<YOUR_PIT_ID>"}'
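Steps 1 to 3 can be combined into a single loop. The following Python sketch is one possible way to do this with the standard library, not an official OERSI client. The names `http_json` and `bulk_download` and the injectable `send` parameter are assumptions made for illustration. The sketch also assumes that each search response may carry an updated `pit_id`, as described in the Elasticsearch PIT documentation, and uses the most recent one.

```python
import json
import urllib.request

BASE_URL = "https://oersi.org/resources/api/search"


def http_json(method, url, body=None):
    """Send a JSON request with the standard library and return the parsed reply."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def bulk_download(query, page_size=1000, keep_alive="1m", send=http_json):
    """Yield every hit matching `query`, paging with PIT and search_after."""
    # Step 1: create the PIT on the oer_data index.
    pit_id = send("POST", f"{BASE_URL}/oer_data/_pit?keep_alive={keep_alive}")["id"]
    try:
        search_after = None
        while True:
            # Steps 2.1 and 2.3: the PIT already identifies the index,
            # so the search URL contains no index name.
            body = {
                "size": page_size,
                "query": query,
                "pit": {"id": pit_id, "keep_alive": keep_alive},
                "sort": [{"id": "asc"}],
                "track_total_hits": True,
            }
            if search_after is not None:
                body["search_after"] = search_after
            page = send("POST", f"{BASE_URL}/_search", body)
            hits = page["hits"]["hits"]
            if not hits:
                break
            yield from hits
            # Step 2.2: the "sort" array of the last hit is the next cursor.
            search_after = hits[-1]["sort"]
            # Assumption: the response may contain a newer PIT id; use the latest.
            pit_id = page.get("pit_id", pit_id)
    finally:
        # Step 3: delete the PIT, also on errors or when the generator is closed early.
        send("DELETE", f"{BASE_URL}/_pit", {"id": pit_id})
```

For the twillo example above, the generator could be consumed with `for hit in bulk_download({"match": {"mainEntityOfPage.provider.name": "twillo"}}): ...`. Passing a different `send` function makes it possible to try the paging logic without network access.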
Metadata dump
OERSI provides two metadata dumps which you can download:
- JSON (NDJSON): https://oersi.org/resources/dumps/oer_data.ndjson
- MARCXML: https://oersi.org/resources/dumps/oer_data.mrc.xml
The dumps are updated once a week after the weekly cleanup.
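The JSON dump can be processed record by record without loading the whole file into memory. The following Python sketch assumes that each non-empty line of the file holds one JSON object, as the .ndjson extension suggests. The function names `parse_ndjson` and `stream_dump` are illustrative, and the structure of the individual records is not specified here.

```python
import json
import urllib.request

DUMP_URL = "https://oersi.org/resources/dumps/oer_data.ndjson"


def parse_ndjson(lines):
    """Yield one parsed record per non-empty line (accepts bytes or str lines)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def stream_dump(url=DUMP_URL):
    """Stream the dump over HTTP and yield records one at a time."""
    with urllib.request.urlopen(url, timeout=60) as response:
        # The HTTP response object can be iterated line by line.
        yield from parse_ndjson(response)
```

For example, `sum(1 for _ in stream_dump())` would count the records in the dump. `parse_ndjson` also works on an already downloaded file opened with `open(path, encoding="utf-8")`.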