Re-Index ElasticSearch

There are several situations in which you may need to recreate the current ElasticSearch index. This article covers one of the most common causes: a change in field data-type mapping.

In a traditional relational database, schema definitions can be altered at runtime using ALTER TABLE commands. ElasticSearch works differently: no schema needs to be defined up front. Instead, it dynamically creates a data-type mapping for each field in the current index as data arrives and is persisted, and the type of an existing field cannot be changed once it is set. Typically, we use time-based indices, so a new index is created every month. In the case of the MDC, for example, December 2021 performance data has an index named 'rdf_performance_2021-12', January 2022 starts a new index named 'rdf_performance_2022-01', and so on.
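The naming convention can be sketched as a small shell helper. This is only an illustration of the 'prefix_YYYY-MM' pattern described above; the function name 'monthly_index' is hypothetical and not part of the MDC.

```shell
# Build the monthly index name for a given prefix and month (YYYY-MM).
# With no month argument, the current month reported by `date` is used.
monthly_index() {
  prefix=$1
  month=${2:-$(date +%Y-%m)}
  printf '%s_%s\n' "$prefix" "$month"
}
```

For example, 'monthly_index rdf_performance 2021-12' prints 'rdf_performance_2021-12'.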

If a release changes the data type of a field after the first of the month, the current month's index still holds the old mapping for that field. This can lead to errors in the MDC Tracker log file (~/logs/spring-boot/tracker.log) such as:

org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [parameters.winlink1000OduAirSesState.currentValue] of type [float] in document with id 'HCKGgn0BEcj44NiWqUh6'. Preview of field's value: 'scanning']

This error implies that the existing data mapping for the current ElasticSearch index was expecting data of type 'float' for 'parameters.winlink1000OduAirSesState.currentValue', but instead received the String value 'scanning'.
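To find every affected field before building the new mapping, the log can be summarized. The sketch below assumes the log lines follow the 'failed to parse field [...] of type [...]' wording shown in the error above; the function name 'mapping_failures' is hypothetical.

```shell
# Summarize mapping failures in a tracker log: one line per field and
# expected type, prefixed with the number of occurrences.
mapping_failures() {
  grep -o 'failed to parse field \[[^]]*] of type \[[^]]*]' "$1" | sort | uniq -c
}
```

A typical call would be 'mapping_failures ~/logs/spring-boot/tracker.log'.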

To resolve this situation, the basic process is to look up the existing indices, create a new index with manually specified data-type mapping for the affected fields, copy the existing index data to the new index, and then delete the old index. The commands can be executed in a terminal connected to the 'esk' server instance, or wherever ElasticSearch is running for the affected environment.

To begin, log into the server running ElasticSearch via SSH in a terminal window (e.g. newtowerqaesk1.err or the like). Then:

1. Retrieve existing indices: RUN:
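Listing indices is normally done through the '_cat/indices' API, whose 'v' parameter prints the column header seen in the output below. A minimal sketch, assuming ElasticSearch listens on localhost:9200 as in the later steps; the 'ES_URL' variable and the 'list_indices' function name are illustrative.

```shell
# Base URL of the ElasticSearch instance (assumed; override via the environment).
ES_URL=${ES_URL:-http://localhost:9200}

# List all indices with a header row (health, status, index, uuid, ...).
list_indices() {
  curl -s "$ES_URL/_cat/indices?v"
}
```

Running 'list_indices' on the server prints one row per index.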

GET:

health status index                     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   rdf_performance_2021-12   FrgUteO8SwWYAjr_bBzeTA   1   1        638            0        1mb            1mb
yellow open   rdf_performance_2021-11   voMR0stFQtKPQjbHgANC1Q   1   1        264            0    422.2kb        422.2kb
yellow open   rdf_performance_2021-10   2-mHh39YQ5KcKProClWfew   1   1        368            0    296.8kb        296.8kb
yellow open   rdf_performance_2021-09   tzSCZQ4oT7i5QwTxlgrqGw   1   1        588            0      1.3mb          1.3mb
yellow open   rdf_performance_2021-08   REg32Gq7SgKTVb8yf1WIRQ   1   1       1206            0      1.1mb          1.1mb
yellow open   rdf_performance_2021-07   4woX_ouOQBiXDn9lTwlBwg   1   1       1427            0    869.1kb        869.1kb
yellow open   rdf_performance_2021-06   U_crwMvPQdWVOjDWkWSv9Q   1   1       1790            0      1.6mb          1.6mb
yellow open   rdf_performance_2021-05   z1-fgYvcRvmQy49CKE6dYQ   1   1       3450            0      9.1mb          9.1mb
yellow open   rdf_performance_2020-12   3RhZur4bSGStlHCaJ2dTRQ   1   1        141            0    585.8kb        585.8kb
yellow open   rdf_configuration_2021-12 uOZyT7emTiKf2b5mzi1A1A   1   1          9            8     64.7kb         64.7kb
yellow open   rdf_configuration_2021-09 XOkql2N3SZOGwwnxN7Meqw   1   1        164           36    389.9kb        389.9kb
yellow open   rdf_configuration_2021-08 O4QHGSQsSkOY_ZGjMUq3xw   1   1         28            5       82kb           82kb
yellow open   rdf_configuration_2021-05 Yi1SfMwvRAi_CvSvWfOWtA   1   1        842           69    553.4kb        553.4kb
yellow open   rdf_configuration_2020-12 AS5lie_AQimpwOdEvr53gA   1   1          8           60    129.5kb        129.5kb
green  open   .tasks                    p9CUjGXYTbuR7R_frdGniA   1   0          1            0      6.3kb          6.3kb
green  open   .kibana_2                 aU5XyQEdT52d-mIx1zUapA   1   0          3            0     60.8kb         60.8kb
green  open   .kibana_1                 4Y2825ANSFKddgNQeKsINg   1   0          1            0        4kb            4kb

2. Create a new index for the data. In this example, we want to update mapping for: 'parameters.winlink1000OduAirSesState.currentValue' and 'parameters.winlink1000OduAirSesState.newValue'. RUN:

curl -X PUT \
  http://localhost:9200/rdf_performance_2021-12_v2 \
  -H 'Content-Type: application/json' \
  -d '{
  "mappings": {
      "properties": {
          "parameters.winlink1000OduAirSesState.currentValue": {
              "type": "text"
          },
          "parameters.winlink1000OduAirSesState.newValue": {
              "type": "text"
          }
      }
  }
}'

GET:

{"acknowledged":true,"shards_acknowledged":true,"index":"rdf_performance_2021-12_v2"}
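When several fields are affected, the request body from step 2 can be generated rather than typed by hand. The sketch below assumes every listed field should be mapped to 'text', as in the example above; the function name 'text_mapping_body' is hypothetical.

```shell
# Print a create-index request body that maps every field name given as an
# argument to the "text" type. Field names are inserted as-is, so they must
# not contain double quotes or backslashes.
text_mapping_body() {
  printf '{"mappings":{"properties":{'
  sep=''
  for field in "$@"; do
    printf '%s"%s":{"type":"text"}' "$sep" "$field"
    sep=','
  done
  printf '}}}\n'
}
```

The output can be passed to the curl command from step 2 with -d "$(text_mapping_body field1 field2)".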

3. Transfer the data into the new index. This step is critical: do not proceed if it fails. If it fails for any reason, start over with a new index name as the destination. RUN:

curl -X POST \
  http://localhost:9200/_reindex \
  -H 'Content-Type: application/json' \
  -d '{
  "source": {
      "index": "rdf_performance_2021-12"
  },
  "dest": {
      "index": "rdf_performance_2021-12_v2"
  }
}'

GET:

{"took":927,"timed_out":false,"total":638,"updated":0,"created":638,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}
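Because the later steps depend on this transfer, it can help to check the response mechanically. The sketch below reads a reindex response on standard input and succeeds only if the 'failures' list is empty and 'created' plus 'updated' equals 'total'. It assumes the compact single-line JSON format shown above; the function name 'reindex_ok' is hypothetical.

```shell
# Read a _reindex response from stdin. Return 0 only when the failures list
# is empty and every source document was created or updated in the target.
reindex_ok() {
  response=$(cat)
  total=$(printf '%s' "$response" | sed -n 's/.*"total":\([0-9][0-9]*\).*/\1/p')
  created=$(printf '%s' "$response" | sed -n 's/.*"created":\([0-9][0-9]*\).*/\1/p')
  updated=$(printf '%s' "$response" | sed -n 's/.*"updated":\([0-9][0-9]*\).*/\1/p')
  case $response in
    *'"failures":[]'*) ;;
    *) return 1 ;;
  esac
  # Missing counters mean the response was not in the expected format.
  [ -n "$total" ] && [ -n "$created" ] && [ -n "$updated" ] || return 1
  [ "$total" -eq $((created + updated)) ]
}
```

One way to use it is to save the curl output from step 3 to a file and run 'reindex_ok < response.json && echo "safe to continue"'.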

4. Verify that the new index returns data. This query does not always succeed, and it is safe to continue if it fails, provided the transfer in step 3 reported no failures. RUN:

curl -X GET \
  'http://localhost:9200/rdf_performance_2021-12_v2/_search?scroll=10m&size=50' \
  -H 'Content-Type: application/json' \
  -d '{
  "query" : {
      "match_all" : {}
  }
}'
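When the search above is unreliable, comparing document counts is another way to confirm the copy. The sketch below is an alternative check, not part of the original procedure. It assumes the '_count' API on localhost:9200 and its compact JSON response; 'ES_URL' and 'doc_count' are illustrative names. Counts can lag briefly until the new index refreshes.

```shell
# Base URL of the ElasticSearch instance (assumed; override via the environment).
ES_URL=${ES_URL:-http://localhost:9200}

# Print the number of documents in the given index, using the _count API.
doc_count() {
  curl -s "$ES_URL/$1/_count" | sed -n 's/.*"count":\([0-9][0-9]*\).*/\1/p'
}

# Example comparison of the old and new index:
#   [ "$(doc_count rdf_performance_2021-12)" = "$(doc_count rdf_performance_2021-12_v2)" ] \
#     && echo "counts match"
```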

5. Delete the old index, but only if the transfer in step 3 succeeded. If the transfer failed, do not proceed; start over. RUN:
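Deleting an index is normally a DELETE request against the index name, which returns the acknowledgement shown below on success. A minimal sketch, assuming the same localhost:9200 address as the other steps; 'ES_URL' and 'delete_index' are illustrative names.

```shell
# Base URL of the ElasticSearch instance (assumed; override via the environment).
ES_URL=${ES_URL:-http://localhost:9200}

# Delete the index named by the first argument. This cannot be undone.
delete_index() {
  curl -s -X DELETE "$ES_URL/$1"
}
```

For this example the call would be 'delete_index rdf_performance_2021-12'.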

GET:

{"acknowledged":true}

6. Add an alias with the old index name to the new index, in case anything still looks up the old name. This should not happen, but it is a safe precaution. RUN:

curl -X POST \
  http://localhost:9200/_aliases \
  -H 'Content-Type: application/json' \
  -d '{
  "actions": [
      {
          "add": {
              "index": "rdf_performance_2021-12_v2",
              "alias": "rdf_performance_2021-12"
          }
      }
  ]
}'

GET:

{"acknowledged":true}

7. ALL DONE!

development/elasticsearch/elasticsearch.1639637790.txt.gz · Last modified: 2021/12/16 06:56 by slawrence