Search API

How to query OERSI metadata

OERSI provides an open API based on JSON indexed in Elasticsearch.

Metadata profile

The OERSI-internal metadata profile largely matches the General Metadata Profile for Educational Resources (Allgemeines Metadatenprofil für Bildungsressourcen, AMB).

API Usage Policy

When using the OERSI API, please send a meaningful, recurring string as a User-Agent in the HTTP request header. This allows us to identify usage patterns in the statistical analysis of the API and improve our services from the insights gained. It also allows us to contact you, if needed.

The generic format of the User-Agent string is <client name> (<contact information, for example email or service address>) <library/framework name>/<version>. Parts that are not applicable can be omitted.

If you use an automated client that accesses the API directly, do not copy a browser’s User-Agent for your bot, and do not use generic agents such as “curl” or “python-requests/x”.

Examples

import requests

url = 'https://oersi.org/api/...'
headers = {'User-Agent': 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)'}
response = requests.get(url, headers=headers)

curl --user-agent "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)" https://oersi.org/api/...
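
The generic User-Agent format described above can also be assembled programmatically. The following is a minimal sketch, not part of the OERSI documentation: the helper name build_user_agent and the library value "python-requests/2.31" are hypothetical and only illustrate how optional parts can be left out.

```python
def build_user_agent(client_name, contact=None, library=None):
    """Assemble '<client name> (<contact information>) <library>/<version>'.

    Hypothetical helper: parts that are None or empty are omitted,
    mirroring the rule that non-applicable parts can be left out.
    """
    parts = []
    if client_name:
        parts.append(client_name)
    if contact:
        parts.append(f"({contact})")
    if library:
        parts.append(library)
    return " ".join(parts)


# Illustrative values only; replace them with your own client details.
headers = {
    "User-Agent": build_user_agent(
        "MyBot/1.0",
        contact="https://example.org/mybot/; mybot@example.org",
        library="python-requests/2.31",
    )
}
```

The resulting headers dictionary can be passed to requests.get in the same way as in the Python example above.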

Endpoint Overview

You can query the data from the search index on-the-fly and use it directly in your application, as well as download the data in bulk for further use via PIT.

The metadata of OERSI's resources is stored in the Elasticsearch index at

https://oersi.org/api/search/oer_data/

For information on using the Elasticsearch API, see the official Elasticsearch documentation.

Search a fixed data set with PIT (point in time)

The most stable way to query multiple result sets is Elasticsearch's PIT (point in time). See https://www.elastic.co/guide/en/elasticsearch/reference/current/point-in-time-api.html

The idea is to open a view of the data at a fixed point in time and then query this data in chunks using the same point in time. Data changes between searches have no effect on the PIT search. This is useful for bulk downloads in which a process fetches the data continuously, without a break between requests. For other use cases, use _search directly without PIT.

API description

_search

Build your Elasticsearch query, submit it to the OERSI metadata search endpoint, and use the results directly.

POST /api/search/oer_data/_search

Request body

(required, application/json) Elasticsearch search request body including query. See the Elasticsearch documentation on the search API.

Responses

http code | content-type | response
200 | application/json | metadata search result

Example cURL

curl -X 'POST' 'https://oersi.org/api/search/oer_data/_search?pretty' \
  -H 'Content-Type: application/json' -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
  -d '{"size":20,"from":0,"query": { "multi_match": { "query": "Klimawandel", "fields": ["name", "description", "keywords"]}},"sort": [{"id":"asc"}]}'
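
The same request can be sent from Python. The sketch below is an illustration under stated assumptions: the function names build_search_body and search are hypothetical, and the session argument is expected to behave like a requests.Session (it is passed in rather than created here, so the block itself needs no network access).

```python
SEARCH_URL = "https://oersi.org/api/search/oer_data/_search"
HEADERS = {
    "User-Agent": "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)",
    "Accept": "application/json",
}


def build_search_body(term, size=20, offset=0):
    """Build the same multi_match query as the cURL example."""
    return {
        "size": size,
        "from": offset,
        "query": {
            "multi_match": {
                "query": term,
                "fields": ["name", "description", "keywords"],
            }
        },
        "sort": [{"id": "asc"}],
    }


def search(session, term, size=20, offset=0):
    """POST the query; `session` is assumed to be a requests.Session."""
    response = session.post(
        SEARCH_URL,
        json=build_search_body(term, size, offset),  # sets Content-Type: application/json
        headers=HEADERS,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```

With the requests library installed, search(requests.Session(), "Klimawandel") would return the parsed JSON result. In a standard Elasticsearch response the matching documents are found under hits.hits.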

_pit (point in time)

Create and delete an Elasticsearch PIT (point in time).

POST /api/search/oer_data/_pit

Parameters

name | type | data type | description
keep_alive (query) | required | string | Time to live of the point in time. The value (e.g. 1m, see Time units) does not need to be long enough to process all data; it only needs to be long enough to cover the time until the next request.

Responses

http code | content-type | response
200 | application/json | the created point in time

Example cURL

curl -X 'POST' 'https://oersi.org/api/search/oer_data/_pit?keep_alive=1m&pretty' \
  -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)'

DELETE /api/search/_pit

Request body

(required, application/json)

name | type | data type | description
id (field) | required | string | the id of the PIT to delete

Responses

http code | content-type | response
200 | application/json | PIT delete result

Example cURL

curl -X DELETE https://oersi.org/api/search/_pit \
  -H 'Content-Type: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
  -d '{"id":"<YOUR_PIT_ID>"}'
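
Creating and deleting a PIT from Python could look like the sketch below. The function names create_pit and delete_pit are hypothetical, the session argument is assumed to behave like a requests.Session, and the code assumes that the create response carries the identifier under "id", as a standard Elasticsearch PIT response does.

```python
API_BASE = "https://oersi.org/api/search"
HEADERS = {
    "User-Agent": "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)",
    "Accept": "application/json",
}


def create_pit(session, keep_alive="1m"):
    """Open a point in time on the metadata index and return its id."""
    response = session.post(
        f"{API_BASE}/oer_data/_pit",
        params={"keep_alive": keep_alive},
        headers=HEADERS,
        timeout=30,
    )
    response.raise_for_status()
    # Assumption: the id is returned under the "id" key.
    return response.json()["id"]


def delete_pit(session, pit_id):
    """Delete the point in time once the search is finished."""
    response = session.delete(
        f"{API_BASE}/_pit",
        json={"id": pit_id},
        headers=HEADERS,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```

Wrapping the work between create_pit and delete_pit in try/finally is one way to make sure the PIT is released even if a later request fails.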

_search with point in time

Use a point in time to query the data in chunks.

Combine it with an Elasticsearch query as described above for _search.

Notes:

  • You do not need to specify the metadata index name in the parameters, as the point in time is already bound to the metadata index.
  • Create a PIT first and use the PIT id in the search request.
  • Delete the PIT after you are done with the search.
  • Repeat the ongoing search until no more hits are found.

POST /api/search/_search

Request body

(required, application/json) Elasticsearch search request body including query. See the Elasticsearch documentation on the search API.

Required fields:

name | type | data type | description
pit (field) | required | object | the information about the PIT
pit.id (field) | required | string | the id of the PIT
pit.keep_alive (field) | required | string | How long the point in time should be kept alive after this request (e.g. 1m, see Time units).
sort (field) | required | array | the sorting of the results (e.g. [{"id":"asc"}], see Sort search results)
search_after (field) | not required for the initial search, but required for ongoing searches | array | the starting point for ongoing searches: the “sort” entry of the last hit in the previous result list

Responses

http code | content-type | response
200 | application/json | metadata search result

The response contains the hits, and each hit carries a “sort” entry. The “sort” entry of the last hit is needed for the next search. The end of a response looks like this:

        "sort" : [
          "https://resource.identifier",
          35297
        ]
      }
    ]
  }
}
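
To feed this value into the next request, the “sort” entry of the last hit has to be read from the parsed response. A minimal sketch, assuming the standard Elasticsearch layout in which hits are listed under hits.hits; the helper name last_sort_entry is hypothetical:

```python
def last_sort_entry(result):
    """Return the "sort" entry of the last hit, or None if there are no hits."""
    hits = result.get("hits", {}).get("hits", [])
    if not hits:
        return None
    return hits[-1].get("sort")


# Shortened, made-up response used only to illustrate the structure.
example = {
    "hits": {
        "hits": [
            {"_id": "1", "sort": ["https://resource.identifier", 35297]},
        ]
    }
}
next_search_after = last_sort_entry(example)
```

A return value of None signals that the result list was empty and the ongoing search can stop.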

Example cURL

Initial search:

curl -X 'POST' 'https://oersi.org/api/search/_search?pretty' \
  -H 'Content-Type: application/json' -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
  -d '{"size":1000,"query": {"match": {"mainEntityOfPage.provider.name": "twillo"}},"pit": {"id": "<YOUR_PIT_ID>", "keep_alive": "1m"}, "sort": [{"id":"asc"}], "track_total_hits": true}'

Ongoing search:

curl -X 'POST' 'https://oersi.org/api/search/_search?pretty' \
  -H 'Content-Type: application/json' -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
  -d '{"size":1000,"query": {"match": {"mainEntityOfPage.provider.name": "twillo"}},"pit": {"id": "<YOUR_PIT_ID>", "keep_alive": "1m"}, "sort": [{"id":"asc"}], "track_total_hits": true, "search_after": <YOUR_LAST_SORT_RESULT>}'
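
The two requests above can be combined into a loop that repeats the ongoing search until no more hits are found. The generator below is a sketch under stated assumptions: iter_pit_hits is a hypothetical name, session is expected to behave like a requests.Session, the PIT has already been created, and the response is assumed to follow the standard Elasticsearch layout (hits under hits.hits, optionally an updated pit_id).

```python
PIT_SEARCH_URL = "https://oersi.org/api/search/_search"
HEADERS = {
    "User-Agent": "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)",
    "Accept": "application/json",
}


def iter_pit_hits(session, pit_id, query, page_size=1000, keep_alive="1m"):
    """Yield all hits for `query`, one chunk per request, using search_after."""
    search_after = None
    while True:
        body = {
            "size": page_size,
            "query": query,
            "pit": {"id": pit_id, "keep_alive": keep_alive},
            "sort": [{"id": "asc"}],
            "track_total_hits": True,
        }
        # The initial search has no search_after; ongoing searches need it.
        if search_after is not None:
            body["search_after"] = search_after
        response = session.post(PIT_SEARCH_URL, json=body, headers=HEADERS, timeout=60)
        response.raise_for_status()
        result = response.json()
        hits = result["hits"]["hits"]
        if not hits:
            break  # no more hits: the download is complete
        yield from hits
        # Continue after the "sort" entry of the last hit of this chunk.
        search_after = hits[-1]["sort"]
        # Assumption: if the response reports a newer PIT id, reuse it.
        pit_id = result.get("pit_id", pit_id)
```

For the query from the cURL example, the call would be iter_pit_hits(session, pit_id, {"match": {"mainEntityOfPage.provider.name": "twillo"}}). The PIT still has to be deleted by the caller once the generator is exhausted.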