yehosef
4/14/2014 - 5:20 PM

Elasticsearch configuration for high sustainable bulk feed


Tested on a single node: MacBook Pro, 16 GB RAM, 1 TB SSD, OS X Mavericks

ES 1.1.0 with Java 8, G1 GC, 12 GB heap

/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -Xms12g -Xmx12g -Djava.awt.headless=true -XX:+UseG1GC -Delasticsearch -Des.foreground=yes -Des.path.home=/Users/es/elasticsearch-1.1.0 -cp :/Users/es/elasticsearch-1.1.0/lib/elasticsearch-1.1.0.jar:/Users/es/elasticsearch-1.1.0/lib/*:/Users/es/elasticsearch-1.1.0/lib/sigar/* org.elasticsearch.bootstrap.Elasticsearch

Node

  • bloom filters not loaded (index.codec.bloom.load: false)

  • concurrent merge scheduler

  • max 4 threads for merge, also for optimize API

  • max 4 segments per tier

  • max 1gb segment size

  • 1/3 of heap for index buffer

  • for SSD, disable store throttling

  • adjust merge and bulk thread pools

      index:
        codec:
          bloom:
            load: false
        merge:
          scheduler:
            type: concurrent
            max_thread_count: 4
          policy:
            type: tiered
            max_merged_segment: 1gb
            segments_per_tier: 4
            max_merge_at_once: 4
            max_merge_at_once_explicit: 4
      indices:
        memory:
          index_buffer_size: 33%
        store:
          throttle:
            type: none
      threadpool:
        merge:
          type: fixed
          size: 4
          queue_size: 32
        bulk:
          type: fixed
          size: 8
          queue_size: 32
    

Index

  • 1 shard

  • 0 replicas

  • refresh disabled (refresh_interval: -1)

      index.number_of_shards: 1
      index.number_of_replicas: 0
      index.refresh_interval: -1
    

Mapping

  • String fields: norms are disabled and only doc IDs are indexed (no term frequencies or positions), which the nature of the input data allows

      "mappings" : {
        "_default_" : {
          "dynamic_templates" : [
            {
              "string_template" : {
                "match_mapping_type" : "string",
                "path_match" : "*",
                "mapping" : {
                  "type" : "string",
                  "norms" : { "enabled" : false },
                  "index_options" : "docs"
                }
              }
            }
          ]
        }
      }
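
The index settings and the mapping above can be sent together in one create-index request. A minimal sketch of such a request body follows, assuming the nested `settings` / `mappings` layout of the create index API. The class name `BulkIndexBody` and the method `createIndexBody` are made up for illustration, and the code only builds the JSON string; it sends nothing.

```java
/** Illustrative helper (hypothetical name): builds a create-index body from the settings and mapping in this post. */
final class BulkIndexBody {

    /** Returns the JSON body: 1 shard, 0 replicas, refresh disabled, plus the string template. */
    static String createIndexBody() {
        return "{\n"
            + "  \"settings\": {\n"
            + "    \"index\": {\n"
            + "      \"number_of_shards\": 1,\n"
            + "      \"number_of_replicas\": 0,\n"
            + "      \"refresh_interval\": \"-1\"\n"
            + "    }\n"
            + "  },\n"
            + "  \"mappings\": {\n"
            + "    \"_default_\": {\n"
            + "      \"dynamic_templates\": [\n"
            + "        {\n"
            + "          \"string_template\": {\n"
            + "            \"match_mapping_type\": \"string\",\n"
            + "            \"path_match\": \"*\",\n"
            + "            \"mapping\": {\n"
            + "              \"type\": \"string\",\n"
            + "              \"norms\": { \"enabled\": false },\n"
            + "              \"index_options\": \"docs\"\n"
            + "            }\n"
            + "          }\n"
            + "        }\n"
            + "      ]\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        // Print the body so it can be piped into an HTTP client of your choice.
        System.out.println(createIndexBody());
    }
}
```

The printed body would typically be sent with an HTTP PUT to the index URL (for example a hypothetical `http://localhost:9200/myindex`) before the bulk feed starts.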
    

Bulk

  • Java API, single TransportClient instance
  • BulkProcessor
  • 3000 docs per bulk request (~2 MB)
  • max 4 concurrent bulk requests
  • no flush interval, no size-based flush
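
To show how these bulk parameters interact, here is a stdlib-only sketch of count-based batching with a cap on in-flight requests. It is not the Elasticsearch API and not the code used for this test: `BulkBatcher` and its `sender` callback are hypothetical stand-ins for what BulkProcessor does internally. A batch is handed off every 3000 docs, and the producer blocks once 4 batches are in flight. There is no timer and no byte threshold, matching "no flush interval, no flush volume".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical illustration: flush by document count, bounded number of concurrent batches. */
final class BulkBatcher implements AutoCloseable {
    private final int bulkActions;                 // flush threshold, e.g. 3000 docs
    private final Semaphore inFlight;              // caps concurrent batches, e.g. 4
    private final ExecutorService executor;
    private final Consumer<List<String>> sender;   // placeholder for "send one bulk request"
    private List<String> buffer = new ArrayList<>();

    BulkBatcher(int bulkActions, int maxConcurrent, Consumer<List<String>> sender) {
        this.bulkActions = bulkActions;
        this.inFlight = new Semaphore(maxConcurrent);
        this.executor = Executors.newFixedThreadPool(maxConcurrent);
        this.sender = sender;
    }

    /** Buffers one document and hands off a batch once the threshold is reached. */
    synchronized void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= bulkActions) {
            flush();
        }
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        final List<String> batch = buffer;
        buffer = new ArrayList<>();
        // Blocks the producer while maxConcurrent batches are still being sent.
        inFlight.acquireUninterruptibly();
        executor.execute(() -> {
            try {
                sender.accept(batch);
            } finally {
                inFlight.release();
            }
        });
    }

    /** Sends the remaining partial batch and waits for in-flight batches to finish. */
    @Override
    public synchronized void close() {
        flush();
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        AtomicInteger batches = new AtomicInteger();
        BulkBatcher batcher = new BulkBatcher(3000, 4, batch -> batches.incrementAndGet());
        for (int i = 0; i < 7000; i++) {
            batcher.add("{\"n\":" + i + "}");
        }
        batcher.close();
        System.out.println("batches sent: " + batches.get());
    }
}
```

With the real Java client, the corresponding knobs on the BulkProcessor builder are `setBulkActions(3000)` and `setConcurrentRequests(4)`, with the flush interval left unset. As far as I recall, the size-based flush defaults to 5 MB in the 1.x client and has to be disabled explicitly; check this against the client version in use.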