Search API
OERSI provides an open API to its metadata, which is stored as JSON documents in an Elasticsearch index.
Metadata profile
The OERSI-internal metadata profile largely matches the General Metadata Profile for Educational Resources (Allgemeines Metadatenprofil für Bildungsressourcen, AMB).
API Usage Policy
When using the OERSI API, please send a meaningful, recurring string as the User-Agent in the HTTP request header. This allows us to identify usage patterns in the statistical analysis of the API and to improve our services based on the insights gained. It also allows us to contact you if needed.
The generic format of the User-Agent is `<client name> (<contact information, for example an email or service address>) <library/framework name>/<version>`. Parts that are not applicable can be omitted.
If you use an automated client that accesses the API directly, do not copy a browser's user agent for your bot, and do not use generic user agents such as “curl” or “python-requests/x”.
Examples
```python
import requests

url = 'https://oersi.org/api/...'
headers = {'User-Agent': 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)'}
response = requests.get(url, headers=headers)
```

```shell
curl --user-agent "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)" https://oersi.org/api/...
```
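The User-Agent format described above can also be assembled in code. The following minimal Python sketch is only an illustration: `build_user_agent` and its parameters are hypothetical names, not part of any OERSI library, and parts passed as `None` are simply omitted, as the policy allows.

```python
def build_user_agent(client_name, contact=None, library=None, library_version=None):
    """Build '<client name> (<contact information>) <library/framework name>/<version>'.

    Hypothetical helper: any part given as None is left out of the string.
    """
    parts = [client_name]
    if contact:
        parts.append(f"({contact})")
    if library:
        # Append "/<version>" only when a version is known
        parts.append(f"{library}/{library_version}" if library_version else library)
    return " ".join(parts)


# Example values only; use your own client name and contact information:
# build_user_agent("MyBot/1.0", "https://example.org/mybot/; mybot@example.org", "requests", "2.31.0")
# → "MyBot/1.0 (https://example.org/mybot/; mybot@example.org) requests/2.31.0"
```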
Endpoint Overview
You can query the search index on the fly and use the results directly in your application, or download the data in bulk via a PIT (point in time) for further use.
The metadata of the OERSI resources is stored in the Elasticsearch index at
https://oersi.org/api/search/oer_data/
For information on using the Elasticsearch API, see
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html
Search a fixed data set with PIT (point in time)
The most stable way to query data across multiple result sets is Elasticsearch's point in time (PIT). See https://www.elastic.co/guide/en/elasticsearch/reference/current/point-in-time-api.html
The idea is to open a view of the data at a fixed point in time and then query this data in chunks using the same point in time. Data changes between searches have no effect on a PIT search. This is useful for bulk downloads, in which a process downloads the data continuously without breaks between requests. For other use cases, use _search directly without a PIT.
API description
_search
Build an Elasticsearch query, submit it to the OERSI metadata search endpoint, and use the results directly.
POST
/api/search/oer_data/_search
Request body
(required, application/json) Elasticsearch search request body including the query. See the Elasticsearch search API and Query DSL documentation linked above.
Responses
http code | content-type | response |
---|---|---|
200 | application/json | metadata search result |
Example cURL
```shell
curl -X 'POST' 'https://oersi.org/api/search/oer_data/_search?pretty' \
-H 'Content-Type: application/json' -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
-d '{"size":20,"from":0,"query": { "multi_match": { "query": "Klimawandel", "fields": ["name", "description", "keywords"]}},"sort": [{"id":"asc"}]}'
```
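The same request can be sent from Python. This sketch assumes only the standard library (`urllib.request`, `json`); the helper names `build_search_body` and `search` are hypothetical, and the User-Agent value is the placeholder from the examples at the top of this page, to be replaced with your own.

```python
import json
import urllib.request

SEARCH_URL = "https://oersi.org/api/search/oer_data/_search"
# Placeholder: replace with your own client name and contact information
USER_AGENT = "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)"


def build_search_body(text, size=20, offset=0):
    """Full-text query over name, description and keywords, as in the cURL example."""
    return {
        "size": size,
        "from": offset,
        "query": {"multi_match": {"query": text, "fields": ["name", "description", "keywords"]}},
        "sort": [{"id": "asc"}],
    }


def search(body):
    """POST an Elasticsearch request body to the OERSI _search endpoint; return parsed JSON."""
    request = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json", "User-Agent": USER_AGENT},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (performs a network request):
#   result = search(build_search_body("Klimawandel"))
#   for hit in result["hits"]["hits"]:
#       print(hit["_source"].get("name"))
```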
_pit (point in time)
Create and delete an Elasticsearch point in time (PIT).
POST
/api/search/oer_data/_pit
Parameters
name | type | data type | description |
---|---|---|---|
keep_alive | (query) required | string | The time to live of the point in time (e.g. 1m, see Elasticsearch Time units). It does not need to be long enough to process all data, only long enough to reach the next request. |
Responses
http code | content-type | response |
---|---|---|
200 | application/json | the created point in time |
Example cURL
```shell
curl -X 'POST' 'https://oersi.org/api/search/oer_data/_pit?keep_alive=1m&pretty' \
-H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)'
```
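A minimal Python sketch for opening a PIT with the standard library. `pit_url` and `open_pit` are hypothetical helper names, and the sketch assumes the response carries the new PIT under `id`, as Elasticsearch's open point in time API returns it.

```python
import json
import urllib.parse
import urllib.request

PIT_URL = "https://oersi.org/api/search/oer_data/_pit"
# Placeholder: replace with your own client name and contact information
USER_AGENT = "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)"


def pit_url(keep_alive="1m"):
    """URL for opening a PIT; keep_alive only has to cover the time until the next request."""
    return PIT_URL + "?" + urllib.parse.urlencode({"keep_alive": keep_alive})


def open_pit(keep_alive="1m"):
    """Create a point in time on the OERSI index and return its id (network request)."""
    request = urllib.request.Request(
        pit_url(keep_alive),
        headers={"Accept": "application/json", "User-Agent": USER_AGENT},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Assumption: the response looks like {"id": "..."}
        return json.load(response)["id"]
```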
DELETE
/api/search/_pit
Request body
(required, application/json)
name | type | data type | description |
---|---|---|---|
id | (field) required | string | the ID of the PIT to delete |
Responses
http code | content-type | response |
---|---|---|
200 | application/json | PIT delete result |
Example cURL
```shell
curl -X DELETE https://oersi.org/api/search/_pit \
-H 'Content-Type: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
-d '{"id":"<YOUR_PIT_ID>"}'
```
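Closing a PIT from Python could look like the following sketch. `build_delete_request` and `close_pit` are hypothetical helper names; the request is built separately so it can be inspected without contacting the server.

```python
import json
import urllib.request

DELETE_PIT_URL = "https://oersi.org/api/search/_pit"
# Placeholder: replace with your own client name and contact information
USER_AGENT = "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)"


def build_delete_request(pit_id):
    """Prepare the DELETE request whose JSON body names the PIT to close."""
    return urllib.request.Request(
        DELETE_PIT_URL,
        data=json.dumps({"id": pit_id}).encode("utf-8"),
        headers={"Content-Type": "application/json", "User-Agent": USER_AGENT},
        method="DELETE",
    )


def close_pit(pit_id):
    """Delete the point in time and return the delete result (network request)."""
    with urllib.request.urlopen(build_delete_request(pit_id)) as response:
        return json.load(response)
```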
_search with point in time
Use a point in time to query the data in chunks. Combine it with an Elasticsearch query as described above for _search.
Notes:
- You do not need to specify the metadata index name in the request path, as the point in time is already bound to the metadata index.
- Create a PIT first and use the PIT ID in the search request.
- Repeat the search, passing search_after, until no more hits are returned.
- Delete the PIT when you are done with the search.
POST
/api/search/_search
Request body
(required, application/json) Elasticsearch search request body including the query. See the Elasticsearch search API and Query DSL documentation linked above.
Required fields:
name | type | data type | description |
---|---|---|---|
pit | (field) required | object | information about the PIT |
pit.id | (field) required | string | the ID of the PIT |
pit.keep_alive | (field) required | string | the time to live of the PIT until the next request (e.g. 1m, see Elasticsearch Time units) |
sort | (field) required | array | the sort order of the results (e.g. [{"id":"asc"}], see Elasticsearch Sort search results) |
search_after | (field) not required for the initial search, required for subsequent searches | array | the starting point for subsequent searches: the “sort” entry of the last hit in the previous result list |
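The fields above can be combined into a request body with a small helper. This is a sketch; `build_pit_search_body` is a hypothetical name, and the sort order `[{"id": "asc"}]` is the one used in the examples on this page.

```python
def build_pit_search_body(query, pit_id, keep_alive="1m", size=1000, search_after=None):
    """Request body for POST /api/search/_search using a point in time.

    No index name is needed: the PIT is already bound to the OERSI index.
    search_after is left out of the first request and set to the "sort"
    values of the last hit on every following request.
    """
    body = {
        "size": size,
        "query": query,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
        "sort": [{"id": "asc"}],
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body
```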
Responses
http code | content-type | response |
---|---|---|
200 | application/json | metadata search result |
The response contains the hits, each with a “sort” entry. The “sort” entry of the last hit is needed for the next search (as search_after). It looks like this:
```json
{
  ...
  "hits": {
    ...
    "hits": [
      ...
      {
        ...
        "sort": [
          "https://resource.identifier",
          35297
        ]
      }
    ]
  }
}
```
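Reading that value from a parsed response takes only a few lines. `next_search_after` is a hypothetical helper; it returns `None` when the result list is empty, which signals that all data has been retrieved.

```python
def next_search_after(result):
    """Return the "sort" values of the last hit, or None if the page has no hits."""
    hits = result.get("hits", {}).get("hits", [])
    if not hits:
        return None  # no more hits: the download is complete
    return hits[-1]["sort"]
```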
Example cURL
Initial search:

```shell
curl -X 'POST' 'https://oersi.org/api/search/_search?pretty' \
-H 'Content-Type: application/json' -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
-d '{"size":1000,"query": {"match": {"mainEntityOfPage.provider.name": "twillo"}},"pit": {"id": "<YOUR_PIT_ID>", "keep_alive": "1m"}, "sort": [{"id":"asc"}], "track_total_hits": true}'
```

Subsequent searches, with search_after set to the “sort” entry of the last hit of the previous response:

```shell
curl -X 'POST' 'https://oersi.org/api/search/_search?pretty' \
-H 'Content-Type: application/json' -H 'accept: application/json' --user-agent 'MyBot/1.0 (https://example.org/mybot/; mybot@example.org)' \
-d '{"size":1000,"query": {"match": {"mainEntityOfPage.provider.name": "twillo"}},"pit": {"id": "<YOUR_PIT_ID>", "keep_alive": "1m"}, "sort": [{"id":"asc"}], "track_total_hits": true, "search_after": <YOUR_LAST_SORT_RESULT>}'
```
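Putting the steps together, a bulk download opens a PIT, pages through the results with search_after until no hits are returned, and deletes the PIT at the end. The following Python sketch uses only the standard library; `http_json` and `download_all` are hypothetical names. The `send` parameter is a design choice that lets the loop be exercised with a stub instead of the live API. The sketch also assumes, following Elasticsearch's documented behavior, that a search response may contain an updated `pit_id`, which should be used for the next request.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://oersi.org/api/search"
# Placeholder: replace with your own client name and contact information
USER_AGENT = "MyBot/1.0 (https://example.org/mybot/; mybot@example.org)"


def http_json(method, path, body=None):
    """Send a request to the OERSI search API and return the parsed JSON response."""
    headers = {"Accept": "application/json", "User-Agent": USER_AGENT}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    request = urllib.request.Request(BASE + path, data=data, headers=headers, method=method)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def download_all(query, send=http_json, size=1000, keep_alive="1m"):
    """Yield every hit matching query, page by page, from one fixed point in time."""
    pit_id = send("POST", "/oer_data/_pit?" + urllib.parse.urlencode({"keep_alive": keep_alive}))["id"]
    search_after = None
    try:
        while True:
            body = {
                "size": size,
                "query": query,
                "pit": {"id": pit_id, "keep_alive": keep_alive},
                "sort": [{"id": "asc"}],
            }
            if search_after is not None:
                body["search_after"] = search_after
            result = send("POST", "/_search", body)
            # The PIT id may change between searches; keep the most recent one
            pit_id = result.get("pit_id", pit_id)
            hits = result["hits"]["hits"]
            if not hits:
                break
            yield from hits
            search_after = hits[-1]["sort"]
    finally:
        # Release the PIT when the loop ends, fails, or the generator is closed early
        send("DELETE", "/_pit", {"id": pit_id})
```

For example, `for hit in download_all({"match": {"mainEntityOfPage.provider.name": "twillo"}}): print(hit["_source"])` would iterate over all matching twillo records from a single consistent snapshot.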