4/14/2014 - 5:20 PM

Elasticsearch configuration for high sustainable bulk feed


Tested on a single node: MacBook Pro, 16 GB RAM, 1 TB SSD, OS X Mavericks

ES 1.1.0 with Java 8, G1 GC, 12 GB heap

/Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -Xms12g -Xmx12g -Djava.awt.headless=true -XX:+UseG1GC -Delasticsearch -Des.foreground=yes -Des.path.home=/Users/es/elasticsearch-1.1.0 -cp :/Users/es/elasticsearch-1.1.0/lib/elasticsearch-1.1.0.jar:/Users/es/elasticsearch-1.1.0/lib/*:/Users/es/elasticsearch-1.1.0/lib/sigar/* org.elasticsearch.bootstrap.Elasticsearch

Node

no bloom filter cache
concurrent merge scheduler
max 4 merge threads, also for the optimize API
max 4 segments per tier
max 1 GB merged segment size
1/3 of heap for the index buffer (about 4 GB of the 12 GB heap)
store throttling disabled, because the disk is an SSD

adjust merge and bulk thread pools

  index:
     codec:
       bloom:
         load: false
     merge:
       scheduler:
         type: concurrent
         max_thread_count: 4
       policy:
         type: tiered
         max_merged_segment: 1gb
         segments_per_tier: 4
         max_merge_at_once: 4
         max_merge_at_once_explicit: 4
  indices:
     memory:
       index_buffer_size: 33%
     store:
       throttle:
         type: none
  threadpool:
    merge:
      type: fixed
      size: 4
      queue_size: 32
    bulk:
      type: fixed
      size: 8
      queue_size: 32

Index

1 shard
0 replicas

refresh disabled (refresh_interval: -1)

  index.number_of_shards: 1
  index.number_of_replicas: 0
  index.refresh_interval: -1

Mapping

Mapping for string fields: norms and term frequencies can be disabled because of the nature of the input data

  "mappings" : {
    "_default_" : {
      "dynamic_templates" : [
        {
          "string_template" : {
            "match_mapping_type" : "string",
            "path_match" : "*",
            "mapping" : {
              "type" : "string",
              "norms" : { "enabled" : false },
              "index_options" : "docs"
            }
          }
        }
      ]
    }
  }
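
The index settings and the mapping fragment above are two parts of one create-index request. A minimal sketch of how they might be combined and sent over the REST API, using only the JDK; the class name `CreateFeedIndex`, the index name `test` and the address `localhost:9200` are assumptions for illustration, not part of the original setup:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the create-index body from the settings and
// mapping in this note and PUTs it to a node.
final class CreateFeedIndex {

    // Index settings plus the _default_ dynamic template as one JSON body.
    static String requestBody() {
        return String.join("\n",
            "{",
            "  \"settings\": {",
            "    \"index.number_of_shards\": 1,",
            "    \"index.number_of_replicas\": 0,",
            "    \"index.refresh_interval\": \"-1\"",
            "  },",
            "  \"mappings\": {",
            "    \"_default_\": {",
            "      \"dynamic_templates\": [ {",
            "        \"string_template\": {",
            "          \"match_mapping_type\": \"string\",",
            "          \"path_match\": \"*\",",
            "          \"mapping\": {",
            "            \"type\": \"string\",",
            "            \"norms\": { \"enabled\": false },",
            "            \"index_options\": \"docs\"",
            "          }",
            "        }",
            "      } ]",
            "    }",
            "  }",
            "}");
    }

    // Sends the body with HTTP PUT and returns the HTTP status code.
    static int put(String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

Assuming a node is listening on the default HTTP port, `CreateFeedIndex.put("http://localhost:9200/test", CreateFeedIndex.requestBody())` would create the index before the feed starts.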

Bulk

Java API, single TransportClient instance
BulkProcessor
bulk size 3000 docs (~2 MB)
max 4 concurrent bulk requests
no time-based flush interval, no size-based flush
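
The batching policy listed above (flush only on document count, bounded concurrency, no timer) can be illustrated with a small JDK-only stand-in. This is not the Elasticsearch `BulkProcessor` and it sends nothing to a cluster; `BulkBatcher` and its `sender` callback are hypothetical names. With the real `BulkProcessor` builder the corresponding knobs would be `setBulkActions` and `setConcurrentRequests`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.function.Consumer;

// Hypothetical stand-in for the batching policy; NOT the Elasticsearch client.
final class BulkBatcher {
    private final int batchSize;                  // e.g. 3000 docs per bulk
    private final int maxConcurrent;              // e.g. 4 bulks in flight
    private final Semaphore inFlight;
    private final ExecutorService pool;
    private final Consumer<List<String>> sender;  // hypothetical "send one bulk" callback
    private List<String> buffer = new ArrayList<>();

    BulkBatcher(int batchSize, int maxConcurrent, Consumer<List<String>> sender) {
        this.batchSize = batchSize;
        this.maxConcurrent = maxConcurrent;
        this.inFlight = new Semaphore(maxConcurrent);
        this.pool = Executors.newFixedThreadPool(maxConcurrent);
        this.sender = sender;
    }

    // Buffers one document; a bulk is sent only when the count threshold is
    // reached (no time-based and no byte-size-based flush).
    synchronized void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        final List<String> batch = buffer;
        buffer = new ArrayList<>();
        inFlight.acquireUninterruptibly();        // blocks the feeder when all slots are busy
        pool.execute(() -> {
            try {
                sender.accept(batch);
            } finally {
                inFlight.release();
            }
        });
    }

    // Sends the remaining partial batch and waits for all in-flight bulks.
    synchronized void close() {
        flush();
        inFlight.acquireUninterruptibly(maxConcurrent);
        inFlight.release(maxConcurrent);
        pool.shutdown();
    }
}
```

Blocking the feeder when all slots are busy is what keeps the feed sustainable: the producer cannot queue up more bulks than the node is able to absorb.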
