Elasticsearch cluster -> indices -> types -> documents -> fields
- Query-string search: simple but limited; handy for ad hoc queries from the command line.
- Query DSL: a rich, flexible query language.
- vertical scale/scaling up: bigger servers.
- horizontal scale/scaling out: more servers.
- A shard is a low-level worker unit that holds just a slice of all the data in the index.
- A shard is a single instance of Lucene, and is a complete search engine in its own right.
- Our documents are stored and indexed in shards, but our applications don’t talk to them directly. Instead, they talk to an index.
- A shard can be either a primary shard or a replica shard. Each document in your index belongs to a single primary shard, so the number of primary shards that you have determines the maximum amount of data that your index can hold.
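- A primary shard can technically contain up to Integer.MAX_VALUE - 128 documents, but the practical limit depends on your use case: hardware, document size and complexity, how documents are indexed and queried, and expected response times.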
- A replica shard is just a copy of a primary shard.
- The number of primary shards in an index is fixed at the time that an index is created, but the number of replica shards can be changed at any time.
- By default, indices are assigned five primary shards (in Elasticsearch versions before 7.0; since 7.0 the default is one).
- Document == object: a document is the JSON-serialized form of an object in your application.
- In Elasticsearch, the term document has a specific meaning. It refers to the top-level, or root object that is serialized into JSON and stored in Elasticsearch under a unique ID.
- Documents live in an index. Every document carries three required metadata elements:
  - _index: Where the document lives.
  - _type: The class of object that the document represents.
  - _id: The unique identifier for the document.
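The shard notes and the `_id` field meet in document routing: Elasticsearch picks a document's primary shard as `shard = hash(routing) % number_of_primary_shards`, where `routing` defaults to the document's `_id`. This is why the primary shard count is fixed once an index exists. The sketch below assumes `zlib.crc32` as a stand-in hash (Elasticsearch itself uses Murmur3), so its shard numbers will not match a real cluster; `pick_primary_shard` is a hypothetical helper that only illustrates the modulo idea.

```python
import zlib


def pick_primary_shard(routing: str, number_of_primary_shards: int) -> int:
    """Return the primary shard for a routing value (hypothetical helper).

    Assumption: zlib.crc32 stands in for Elasticsearch's Murmur3 hash,
    so results differ from a real cluster; only the modulo step matters.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(10)]

# With 5 primaries, every _id maps deterministically to one of shards 0..4.
placement_5 = {doc_id: pick_primary_shard(doc_id, 5) for doc_id in doc_ids}

# Recomputing with 6 primaries generally sends some IDs to a different shard,
# so a later lookup by _id could search the wrong shard.
placement_6 = {doc_id: pick_primary_shard(doc_id, 6) for doc_id in doc_ids}
moved = [d for d in doc_ids if placement_5[d] != placement_6[d]]

print(placement_5)
print(f"{len(moved)} of {len(doc_ids)} documents would map to a different shard")
```

Replica counts do not enter the formula, which is consistent with replicas being adjustable at any time.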
- A document’s _index, _type, and _id uniquely identify the document.
- Autogenerated IDs are 20-character-long, URL-safe, Base64-encoded GUID strings.
- These GUIDs are generated from a modified FlakeID scheme, which lets multiple nodes generate unique IDs in parallel with essentially zero chance of collision.
- By default, every field in a document is indexed and can be queried.
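- Search lite: a query-string search that passes all parameters in the URL.
- Full-body search: the query DSL, sent as a JSON request body.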
- Elasticsearch uses a structure called an inverted index, which is designed to allow very fast full-text searches.
- An inverted index consists of a list of all the unique words that appear in any document, and for each word, a list of the documents in which it appears.
- To build it, each full-text field is split into separate words, called terms or tokens; a sorted list of all unique terms is created, and each term records which documents it appears in.
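The two search forms noted above differ only in where the query travels. The sketch below merely builds the request pieces and sends nothing; it assumes a local node at `localhost:9200` and borrows the `megacorp` index and `employee` type from the Definitive Guide's examples, so those names are assumptions, not part of these notes.

```python
import json
from urllib.parse import urlencode

# Assumed local node plus the book's example index/type; nothing is sent.
BASE = "http://localhost:9200/megacorp/employee/_search"

# Search lite: the whole query is URL-encoded into the query string.
lite_url = f"{BASE}?{urlencode({'q': 'last_name:Smith'})}"

# Query DSL: the same search expressed as a JSON request body.
dsl_body = {"query": {"match": {"last_name": "Smith"}}}

print("GET", lite_url)
print("GET", BASE)
print(json.dumps(dsl_body, indent=2))
```

The lite form is easy to type in a terminal but awkward to read once queries grow; the DSL body stays structured as conditions are added.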
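The inverted-index description above can be modeled in a few lines. This is a toy sketch, assuming a naive tokenizer that lowercases text and splits on anything that is not a letter or digit; Lucene's real inverted index also stores term frequencies and positions in compressed form.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive analysis: lowercase, then split on runs of non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Sorted list of unique terms, each with the sorted IDs of documents containing it.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search(index: dict[str, list[int]], query: str) -> list[int]:
    # Return documents containing at least one query term (an OR match).
    hits: set[int] = set()
    for term in tokenize(query):
        hits.update(index.get(term, []))
    return sorted(hits)


docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}
index = build_inverted_index(docs)
for term, doc_ids in index.items():
    print(f"{term:<8} {doc_ids}")
print(search(index, "quick brown"))  # [1, 2]
```

Because this toy analysis only lowercases, `fox` and `foxes` remain separate terms; closing that gap is the job of analysis.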
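Analysis and analyzers
- Analysis is the process of tokenization and normalization: breaking text into individual terms, then normalizing those terms (e.g., lowercasing) so they are easier to search.
- An analyzer is a wrapper around three functions, applied in this order: character filters, a tokenizer, and token filters.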
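The analyzer stages listed above can be sketched as a small pipeline. Everything here is a hypothetical toy, not one of Elasticsearch's built-in analyzers: a character filter that rewrites `&` as ` and `, a tokenizer that keeps runs of ASCII letters and digits, and two token filters (lowercase, then a tiny stop-word list).

```python
import re
from typing import Callable

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], list[str]]
TokenFilter = Callable[[list[str]], list[str]]

STOP_WORDS = {"a", "an", "and", "the"}  # hypothetical, deliberately tiny list


def ampersand_char_filter(text: str) -> str:
    # Character filter: rewrite the raw string before it is tokenized.
    return text.replace("&", " and ")


def word_tokenizer(text: str) -> list[str]:
    # Tokenizer: emit each run of ASCII letters/digits as one token.
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase_filter(tokens: list[str]) -> list[str]:
    # Token filter: normalize case so "The" and "the" become the same term.
    return [t.lower() for t in tokens]


def stop_filter(tokens: list[str]) -> list[str]:
    # Token filter: drop very common words.
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text: str, char_filters: list[CharFilter], tokenizer: Tokenizer,
            token_filters: list[TokenFilter]) -> list[str]:
    # Stages run in a fixed order: char filters -> tokenizer -> token filters.
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze("Tom & Jerry: The Movie", [ampersand_char_filter],
              word_tokenizer, [lowercase_filter, stop_filter]))
# ['tom', 'jerry', 'movie']
```

Token filter order matters: running `stop_filter` before `lowercase_filter` would leave `the` in the output, because the stop list holds only lowercase words.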
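Full-text fields vs. exact-value fields
- Exact values (e.g., a date or a user ID) either match a query exactly or not at all; full text (e.g., the body of a tweet) is analyzed and matched by relevance.
- Fields of type string are, by default, considered to contain full text. (Elasticsearch 5.0 replaced string with text for full text and keyword for exact values.)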
Mapping
- Each document in an index has a type. Every type has its own mapping, or schema definition.
- A mapping defines the fields within a type, the datatype for each field, and how the field should be handled by Elasticsearch.
- A mapping is also used to configure metadata associated with the type.
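The mapping notes and the full-text vs. exact-value distinction can be tied together with a toy matcher. The mapping dict is shaped like an Elasticsearch 2.x mapping, where a string field is analyzed unless marked `"index": "not_analyzed"`; the `tweet` type, its fields, and the helper functions are hypothetical, and the matching logic is a simplified model rather than how Lucene executes queries.

```python
import re

# Hypothetical "tweet" type, shaped like an Elasticsearch 2.x mapping:
# string fields are full text unless marked "not_analyzed" (exact value).
TWEET_MAPPING = {
    "properties": {
        "tweet": {"type": "string"},
        "user_id": {"type": "string", "index": "not_analyzed"},
    }
}


def is_full_text(mapping: dict, field: str) -> bool:
    spec = mapping["properties"][field]
    return spec["type"] == "string" and spec.get("index") != "not_analyzed"


def analyze(text: str) -> set[str]:
    # Toy analyzer: lowercase and keep runs of letters/digits as terms.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(mapping: dict, doc: dict, field: str, query: str) -> bool:
    value = doc[field]
    if is_full_text(mapping, field):
        # Full text: match if the analyzed query shares a term with the analyzed value.
        return bool(analyze(query) & analyze(value))
    # Exact value: the stored value must equal the query verbatim.
    return value == query


doc = {"tweet": "Elasticsearch is Fun", "user_id": "Kim-1"}
print(matches(TWEET_MAPPING, doc, "tweet", "fun"))      # True
print(matches(TWEET_MAPPING, doc, "user_id", "kim-1"))  # False
print(matches(TWEET_MAPPING, doc, "user_id", "Kim-1"))  # True
```

The field's mapping, not the query, decides whether a value is analyzed, which is why the same lowercase query matches one field and misses the other.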