Akagi201
2/17/2016 - 10:17 AM

[Elastic stack]

ELK Books

Kibana devops

  • Cleanup: DELETE /filebeat-*
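A minimal sketch of the cleanup call from a script, assuming a node at http://localhost:9200 (an assumption). It only builds the request and does not send it, because a wildcard DELETE removes every matching index. Depending on version and the `action.destructive_requires_name` setting, the cluster may reject wildcard deletes.

```python
import urllib.request


def build_delete_request(base_url: str, index_pattern: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for an index pattern like filebeat-*."""
    url = f"{base_url.rstrip('/')}/{index_pattern}"
    return urllib.request.Request(url, method="DELETE")


# Assumed local node address, for illustration only.
req = build_delete_request("http://localhost:9200", "filebeat-*")
print(req.get_method(), req.full_url)
# Passing req to urllib.request.urlopen(req) would actually perform the delete.
```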

Elastic

Elasticsearch cluster -> indices -> types -> documents -> fields

Index

Search

  • query string search: simple but limited; the whole query goes in the URL, handy for ad hoc command-line use.
  • query DSL: a rich, flexible query language sent as a JSON request body.
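The same search written both ways, as a sketch. The index `megacorp` and field `last_name` are hypothetical names; the code only builds the URL and the request body.

```python
import json
from urllib.parse import urlencode


def query_string_url(index: str, field: str, value: str) -> str:
    """Lite search: the query lives in the ?q= URL parameter."""
    return f"/{index}/_search?" + urlencode({"q": f"{field}:{value}"})


def query_dsl_body(field: str, value: str) -> str:
    """Query DSL: the equivalent search as a JSON body for GET /<index>/_search."""
    return json.dumps({"query": {"match": {field: value}}})


print("GET", query_string_url("megacorp", "last_name", "Smith"))
print("GET /megacorp/_search", query_dsl_body("last_name", "Smith"))
```

The query-string form is convenient to type, but anything beyond a few terms gets hard to read and escape; the DSL body scales to nested boolean queries, filters and aggregations.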

Cluster

Scale

  • vertical scale/scaling up: bigger servers.
  • horizontal scale/scaling out: more servers.

Shards

  • A shard is a low-level worker unit that holds just a slice of all the data in the index.
  • A shard is a single instance of Lucene, and is a complete search engine in its own right.
  • Our documents are stored and indexed in shards, but our applications don’t talk to them directly. Instead, they talk to an index.
  • A shard can be either a primary shard or a replica shard. Each document in your index belongs to a single primary shard, so the number of primary shards that you have determines the maximum amount of data that your index can hold.
  • A primary shard can technically contain up to Integer.MAX_VALUE - 128 (2,147,483,519) documents.
  • A replica shard is just a copy of a primary shard.
  • The number of primary shards in an index is fixed at the time that an index is created, but the number of replica shards can be changed at any time.
  • By default, indices are assigned five primary shards (in Elasticsearch versions before 7.0; from 7.0 on the default is one).
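Why the primary shard count is fixed: a document's shard is chosen as shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document's _id. A minimal sketch, assuming `zlib.crc32` as a stand-in for the real hash (Elasticsearch uses Murmur3); only the modulo structure matters here.

```python
import zlib


def route(doc_id: str, num_primary_shards: int) -> int:
    """Pick a primary shard: hash(_routing) % number_of_primary_shards.

    crc32 is a stand-in for Elasticsearch's Murmur3 hash.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


ids = [f"doc-{i}" for i in range(1000)]
with_5 = {d: route(d, 5) for d in ids}
with_6 = {d: route(d, 6) for d in ids}
moved = sum(with_5[d] != with_6[d] for d in ids)
print(f"{moved} of {len(ids)} documents would route to a different shard")
```

If the shard count could change, most existing documents would hash to a different shard than the one holding them, so lookups by _id would miss.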

Document

  • document == object
  • In Elasticsearch, the term document has a specific meaning. It refers to the top-level, or root object that is serialized into JSON and stored in Elasticsearch under a unique ID.
  • Documents live in an index.

Document Metadata

  • _index: Where the document lives.
  • _type: The class of object that the document represents.
  • _id: The unique identifier for the document.
  • A document’s _index, _type, and _id uniquely identify the document.
  • Autogenerated IDs are 20-character, URL-safe, Base64-encoded GUID strings.
  • These GUIDs are generated from a modified FlakeID scheme which allows multiple nodes to be generating unique IDs in parallel with essentially zero chance of collision.
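A sketch of a FlakeID-style generator that yields 20-character URL-safe IDs, plus how _index, _type and _id combine into a document path. The byte layout (6-byte millisecond timestamp, 3-byte sequence, 6-byte node id) is an assumption loosely modeled on the FlakeID idea, not Elasticsearch's exact implementation; random bytes stand in for a node's MAC address. 15 bytes encode to exactly 20 Base64 characters with no padding.

```python
import base64
import itertools
import os
import time

# Assumption: random bytes stand in for a per-node identifier such as a MAC address.
_NODE_ID = os.urandom(6)
# Per-process sequence counter, starting at a random 24-bit value.
_SEQUENCE = itertools.count(int.from_bytes(os.urandom(3), "big"))


def generate_id() -> str:
    """Return a 20-char URL-safe Base64 ID built from time, sequence and node id."""
    timestamp_ms = int(time.time() * 1000) & 0xFFFFFFFFFFFF  # keep 48 bits
    seq = next(_SEQUENCE) & 0xFFFFFF                          # keep 24 bits
    raw = timestamp_ms.to_bytes(6, "big") + seq.to_bytes(3, "big") + _NODE_ID
    return base64.urlsafe_b64encode(raw).decode("ascii")     # 15 bytes -> 20 chars


def document_path(index: str, doc_type: str, doc_id: str) -> str:
    """_index, _type and _id together identify a document: /{index}/{type}/{id}."""
    return f"/{index}/{doc_type}/{doc_id}"


# Hypothetical index and type names.
print(document_path("website", "blog", generate_id()))
```

The timestamp keeps IDs roughly time-ordered, the sequence separates IDs generated in the same millisecond, and the node id separates nodes generating in parallel.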

Searching

  • Every field in a document is indexed and can be queried.

Search API

  • lite query-string
  • query DSL

Inverted index

  • Elasticsearch uses a structure called an inverted index, which is designed to allow very fast full-text searches.
  • An inverted index consists of a list of all the unique words that appear in any document, and for each word, a list of the documents in which it appears.
  • Split each full-text field into individual words, called terms or tokens, build a sorted list of unique terms, then record which documents each term appears in.
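A toy inverted index following the steps above: split, collect unique terms in sorted order, and list the documents for each term. It deliberately uses a naive whitespace split with no normalization, so "Quick" and "quick" end up as different terms, which is the problem analysis solves.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each unique term to the sorted list of document ids containing it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():  # naive whitespace split, no normalization
            postings[term].add(doc_id)
    # Sorted term list, each with a sorted posting list.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}
index = build_inverted_index(docs)
for term in ("brown", "quick", "Quick"):
    print(term, index.get(term, []))
```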

Analysis and Analyzer

  • Analysis is the process of tokenizing text into terms and normalizing those terms (for example, lowercasing).
  • An analyzer is just a wrapper of: Character filters, Tokenizer, Token filters.
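A sketch of the three-stage analyzer pipeline: character filters rewrite the raw string, the tokenizer splits it into tokens, and token filters transform the token list. The component names echo Elasticsearch's built-in html_strip, lowercase and stop, but these are simplified stand-ins, and the stopword list is hypothetical.

```python
import re
from typing import Callable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]

STOPWORDS = {"the", "a", "an", "and", "of"}  # hypothetical stopword list


def html_strip(text: str) -> str:
    """Character filter: replace HTML tags with spaces."""
    return re.sub(r"<[^>]+>", " ", text)


def word_tokenizer(text: str) -> List[str]:
    """Tokenizer: split into runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def stop(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOPWORDS]


def make_analyzer(char_filters: List[CharFilter], tokenizer: Tokenizer,
                  token_filters: List[TokenFilter]) -> Callable[[str], List[str]]:
    """Chain char filters -> tokenizer -> token filters, in that order."""
    def analyze(text: str) -> List[str]:
        for char_filter in char_filters:
            text = char_filter(text)
        tokens = tokenizer(text)
        for token_filter in token_filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


analyzer = make_analyzer([html_strip], word_tokenizer, [lowercase, stop])
print(analyzer("<p>The QUICK brown fox jumped over the lazy dog</p>"))
```

Order matters: lowercase runs before stop, so "The" is caught by the lowercase stopword list.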

full-text field vs exact-value field

  • Fields of type string are, by default, considered to contain full text. (Elasticsearch 5.0 replaced string with text for full-text fields and keyword for exact values.)
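A sketch of the difference: an exact-value field matches only the whole, unchanged value, while a full-text field is analyzed and matches on shared terms. The tiny analyzer (lowercase, split on non-word characters) is an assumption; with no stemming, "fox" and "foxes" stay different terms.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Simplified analyzer: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def exact_match(field_value: str, query: str) -> bool:
    """Exact value: the stored value must equal the query byte for byte."""
    return field_value == query


def full_text_match(field_value: str, query: str) -> bool:
    """Full text: match if the analyzed field and query share any term."""
    return bool(set(analyze(field_value)) & set(analyze(query)))


print(exact_match("2014-09-15", "2014-09-15"))                 # True
print(exact_match("Foo", "foo"))                                # False
print(full_text_match("The quick brown fox", "QUICK foxes"))   # True, via "quick"
```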

Mapping

  • Each document in an index has a type. Every type has its own mapping, or schema definition.
  • A mapping defines the fields within a type, the datatype for each field, and how the field should be handled by Elasticsearch.
  • A mapping is also used to configure metadata associated with the type.
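A sketch of a type mapping as the JSON body one might send when creating an index, in the pre-5.0 style these notes describe (the string type, and a type name under "mappings"). The type `tweet` and its fields are hypothetical, and `unmapped_fields` is a hypothetical helper, not an Elasticsearch API. By default Elasticsearch maps unknown fields dynamically, so the helper only reports them.

```python
import json
from typing import Any, Dict, List

# Hypothetical pre-5.0 mapping: one type ("tweet") with four fields.
TWEET_MAPPING: Dict[str, Any] = {
    "mappings": {
        "tweet": {
            "properties": {
                "tweet":   {"type": "string", "analyzer": "english"},
                "date":    {"type": "date"},
                "name":    {"type": "string"},
                "user_id": {"type": "long"},
            }
        }
    }
}


def unmapped_fields(mapping: Dict[str, Any], doc_type: str, doc: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: fields in doc that the type's mapping does not declare."""
    properties = mapping["mappings"][doc_type]["properties"]
    return sorted(set(doc) - set(properties))


print(json.dumps(TWEET_MAPPING, indent=2))
print(unmapped_fields(TWEET_MAPPING, "tweet", {"name": "John", "retweets": 3}))
```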