This blog post is a short cheat sheet on how to save Elasticsearch query results with paging in a bash script automation. The automation downloads all query results into JSON files as long as the paging returns results in the hits of the returned JSON. The files are numbered 01.json to XX.json.
Here are the steps of the bash script automation:
- Define environment variables for how to access your Elasticsearch server and index.
- Define your Elasticsearch query.
- Invoke a curl command to get the first result.
- If the result contains a scroll id, invoke another curl command to download the next page of the full result.
- Repeat the download step as long as the response still contains a scroll id and hits (a trimmed example response is shown after this list).
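For orientation, the first response of such a scroll search contains the two fields the automation relies on: _scroll_id and hits.hits. A trimmed response looks roughly like this (all values are placeholders, and the exact shape can vary slightly between Elasticsearch versions):
{
  "_scroll_id": "A_LONG_SCROLL_ID_TOKEN",
  "hits": {
    "total": { "value": 12345, "relation": "eq" },
    "hits": [
      { "_index": "YOUR_INDEX", "_id": "1", "_source": { "prod_Id": "YOUR_FIRST_PRODUCT_ID", "status": "YOUR_STATUS" } }
    ]
  }
}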
The following code is an example template for the .env file used for the automation. Note that ELASTIC_SEARCH_URL should end with a trailing slash, because the script appends the index name and the API paths directly to it.
export ELASTIC_SEARCH_URL=YOUR_ELASTIC_SERVER
export ELASTIC_SEARCH_INDEX=YOUR_INDEX
export ELASTIC_SEARCH_USER=YOUR_USER
export ELASTIC_SEARCH_PASSWORD=YOUR_PASSWORD
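Before running the automation, a quick sanity check of these variables can save time. The following sketch is not part of the automation; it simply assumes the URL ends with a trailing slash and uses the standard _cluster/health endpoint:
# Optional sanity check: verify the URL and credentials before starting the download.
source ./.env
curl -u "$ELASTIC_SEARCH_USER:$ELASTIC_SEARCH_PASSWORD" "${ELASTIC_SEARCH_URL}_cluster/health?pretty"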
The following code is an example bash automation with the following sections:
- Query data from an index and save the first result.
- Identify scroll_id/hits and save the data in the next result.
- Loop pages until there is no scroll id or there are no hits available.
#!/bin/bash
source ./.env
export PAGE="01.json"
echo "***************************"
echo "1. Query data from an index and save the first result."
curl -X POST \
  -u "$ELASTIC_SEARCH_USER:$ELASTIC_SEARCH_PASSWORD" \
  "$ELASTIC_SEARCH_URL$ELASTIC_SEARCH_INDEX/_search?scroll=50m" \
  -H "Content-Type: application/json" \
  -d '{
    "size": 1000,
    "query": {
      "bool": {
        "must": {
          "match_all": {}
        },
        "should": [
          { "term": { "prod_Id": "YOUR_FIRST_PRODUCT_ID" } },
          { "term": { "prod_Id": "YOUR_SECOND_PRODUCT_ID" } }
        ],
        "filter": [
          { "term": { "vers_Id": "YOUR_VERSION" } },
          { "term": { "status": "YOUR_STATUS" } },
          { "term": { "language": "english" } }
        ]
      }
    }
  }' | jq '.' > "$PAGE"
echo "***************************"
echo "2. Identify scroll_id/hits and save the data in the next result."
SCROLL_ID=$(cat "$PAGE" | jq -c '._scroll_id // empty')
HITS=$(cat "$PAGE" | jq -c '.hits.hits[]?')
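# Note: '// empty' turns a missing field into an empty string (instead of the literal "null"),
# so the [[ -z ... ]] checks work as intended. jq -c keeps the surrounding JSON quotes on the
# scroll id, which lets $SCROLL_ID be embedded directly into the scroll request bodies below.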
i=2
export PAGE=02.json
echo "***************************"
echo "3. Loop pages until there is no scroll id or there are no hits available."
if [[ -z ${SCROLL_ID} ]] || [[ -z ${HITS} ]]; then
  echo "EXIT script: No 'scroll_id' or 'hits' are given."
  exit 1
fi
while :
do
  echo "Download page $PAGE"
  curl -X POST \
    -u "$ELASTIC_SEARCH_USER:$ELASTIC_SEARCH_PASSWORD" \
    "${ELASTIC_SEARCH_URL}_search/scroll" \
    -H "Content-Type: application/json" \
    -d "{ \"scroll\" : \"50m\", \"scroll_id\" : $SCROLL_ID }" | jq '.' > "$PAGE"
  ((i=i+1))
  SCROLL_ID=$(cat "$PAGE" | jq -c '._scroll_id // empty')
  HITS=$(cat "$PAGE" | jq -c '.hits.hits[]?')
  #echo "--- hits for page ${i} - BEGIN---"
  #echo "${HITS}"
  #echo "--- hits for page ${i} - END---"
  #echo "--- scroll_id for page ${i} - BEGIN---"
  #echo "${SCROLL_ID}"
  #echo "--- scroll_id for page ${i} - END---"
  if [[ -z ${SCROLL_ID} ]] || [[ -z ${HITS} ]]; then
    rm "$PAGE"
    echo "------END-----"
    break
  else
    echo "--------------"
  fi
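  # Zero-pad single-digit page numbers so the downloaded files sort in download order.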
  if ((i<10)); then
    export PAGE="0${i}.json"
  else
    export PAGE="${i}.json"
  fi
done
((i=i-1))
echo "Result: ${i} pages were downloaded."
I hope this was useful to you, and let’s see what’s next?
Greetings,
Thomas
#elasticsearch, #bashscripting, #cheatsheet, #development, #automation
