In Elasticsearch, all data in every field is indexed by default. That is, every field has a dedicated inverted index for fast retrieval.
Search
query string search: simple but limited; handy for ad hoc queries from the command line.
query DSL: a rich, flexible query language, sent as a JSON request body.
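To make the contrast concrete, here is a minimal Python sketch that expresses roughly the same search both ways. The index name `megacorp`, the field `last_name`, and the value `Smith` are assumptions chosen for illustration, and nothing is sent to a cluster; the functions only build the request pieces.

```python
import json
from urllib.parse import urlencode

# Hypothetical index name, used only for illustration.
INDEX = "megacorp"

def query_string_search(field, value):
    """Build the request path for a query string search: the whole query lives in the q= parameter."""
    return f"/{INDEX}/_search?" + urlencode({"q": f"{field}:{value}"})

def query_dsl_search(field, value):
    """Build the JSON request body for a comparable query DSL search (a match query)."""
    return json.dumps({"query": {"match": {field: value}}})

print(query_string_search("last_name", "Smith"))
print(query_dsl_search("last_name", "Smith"))
```

The query string form fits on one line but has to be URL-encoded, while the DSL form is more verbose and leaves room to nest filters and combine clauses.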
Cluster
Scale
vertical scale/scaling up: bigger servers.
horizontal scale/scaling out: more servers.
Shards
A shard is a low-level worker unit that holds just a slice of all the data in the index.
A shard is a single instance of Lucene, and is a complete search engine in its own right.
Our documents are stored and indexed in shards, but our applications don’t talk to them directly. Instead, they talk to an index.
A shard can be either a primary shard or a replica shard. Each document in your index belongs to a single primary shard, so the number of primary shards that you have determines the maximum amount of data that your index can hold.
A primary shard can technically contain up to Integer.MAX_VALUE - 128 documents (2,147,483,519).
A replica shard is just a copy of a primary shard.
The number of primary shards in an index is fixed at the time that an index is created, but the number of replica shards can be changed at any time.
By default, indices are assigned five primary shards in Elasticsearch versions before 7.0; from 7.0 onward the default is one.
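The shard arithmetic above can be sketched in a few lines of Python. `number_of_shards` and `number_of_replicas` are the index settings that control the two counts; the values 3 and 1 are arbitrary assumptions, and the helper names are hypothetical.

```python
# Java's Integer.MAX_VALUE is 2**31 - 1.
INTEGER_MAX_VALUE = 2**31 - 1
MAX_DOCS_PER_SHARD = INTEGER_MAX_VALUE - 128

def index_settings(primary_shards, replicas):
    """Request body for creating an index with explicit shard counts."""
    return {
        "settings": {
            "number_of_shards": primary_shards,   # fixed once the index is created
            "number_of_replicas": replicas,       # can be changed later
        }
    }

def total_shards(primary_shards, replicas):
    """Each primary shard has `replicas` copies, so total = primaries * (1 + replicas)."""
    return primary_shards * (1 + replicas)

def max_documents(primary_shards):
    """Theoretical document ceiling for the index: only primaries add capacity."""
    return primary_shards * MAX_DOCS_PER_SHARD

print(index_settings(3, 1))
print(total_shards(3, 1))
print(max_documents(3))
```

Replicas add redundancy and read throughput but not capacity, which is why only the primary count appears in `max_documents`.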
Document
document == object
In Elasticsearch, the term document has a specific meaning. It refers to the top-level, or root object that is serialized into JSON and stored in Elasticsearch under a unique ID.
Documents live in an index.
Document Metadata
_index: Where the document lives.
_type: The class of object that the document represents.
_id: The unique identifier for the document.
A document’s _index, _type, and _id uniquely identify the document.
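As a small illustration of that uniqueness, the three metadata values can be modeled as one composite key. This is only a sketch: the class name `DocumentAddress` and the sample values are hypothetical, and the `/{index}/{type}/{id}` path follows the typed URL layout that these notes assume.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DocumentAddress:
    """Composite key made of the three metadata elements."""
    index: str     # _index: where the document lives
    doc_type: str  # _type: the class of object the document represents
    doc_id: str    # _id: the unique identifier for the document

    def path(self) -> str:
        """Request path under the typed URL layout (an assumption of this sketch)."""
        return f"/{self.index}/{self.doc_type}/{self.doc_id}"

a = DocumentAddress("website", "blog", "123")
b = DocumentAddress("website", "blog", "123")
c = DocumentAddress("website", "comment", "123")

print(a.path())
print(a == b)  # same index, type, and id: the same document
print(a == c)  # same id but a different type: a different document
```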
Autogenerated IDs are 20 characters long, URL-safe, Base64-encoded GUID strings.
These GUIDs are generated from a modified FlakeID scheme, which allows multiple nodes to generate unique IDs in parallel with essentially zero chance of collision.
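The shape of such an ID is easy to reproduce: 20 Base64 characters carry 20 × 6 = 120 bits, which is 15 bytes. The sketch below encodes 15 random bytes with the URL-safe Base64 alphabet. It is not the FlakeID scheme itself (random bytes stand in for the real timestamp and node components), and the function name `sketch_id` is hypothetical.

```python
import base64
import os

def sketch_id():
    """Return a 20-character URL-safe Base64 string built from 15 random bytes.

    Random bytes are a stand-in here; the real scheme derives the bytes
    differently. 15 bytes is a multiple of 3, so no '=' padding is emitted.
    """
    raw = os.urandom(15)  # 15 bytes = 120 bits
    return base64.urlsafe_b64encode(raw).decode("ascii")

print(sketch_id())
```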
Searching
Every field in a document is indexed and can be queried.
Search API
lite query-string
query DSL
Inverted index
Elasticsearch uses a structure called an inverted index, which is designed to allow very fast full-text searches.
An inverted index consists of a list of all the unique words that appear in any document, and for each word, a list of the documents in which it appears.
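The content of each field to be full-text searched is split into separate words, called terms or tokens. A sorted list of all the unique terms is then built, and for each term the documents in which it appears are recorded.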
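The structure described above can be sketched in plain Python. This is a toy model under simple assumptions (whitespace splitting and lowercasing only), not how Lucene stores its index; the function names and sample documents are made up for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each unique term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive tokenization and normalization
            postings[term].add(doc_id)
    # Sort the terms, and sort each posting list, to mirror the sorted term list.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}

def search(index, term):
    """Look up one term; a missing term matches no documents."""
    return index.get(term.lower(), [])

docs = {
    1: "The quick brown fox",
    2: "Quick brown foxes leap",
    3: "The lazy dog",
}
index = build_inverted_index(docs)

for term, doc_ids in index.items():
    print(term, doc_ids)
print(search(index, "Brown"))
```

Note that `fox` and `foxes` end up as separate terms here, which is the kind of mismatch that the analysis step is meant to smooth over.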
Analysis and Analyzer
Analysis is the process of tokenization followed by normalization.
An analyzer is just a wrapper around three functions applied in order: character filters, a tokenizer, and token filters.
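The three stages can be sketched as a small pipeline. This is a toy analyzer under stated assumptions, not any of the built-in Elasticsearch analyzers: the stopword list, the regular expressions, and all function names are made up for illustration.

```python
import re

# Tiny illustrative stopword list (an assumption, not a real analyzer's list).
STOPWORDS = {"a", "an", "the", "and", "of"}

def char_filter(text):
    """Character filter: tidy the raw string before tokenization."""
    text = re.sub(r"<[^>]+>", " ", text)  # strip HTML-like tags
    return text.replace("&", " and ")     # spell out ampersands

def tokenizer(text):
    """Tokenizer: split the string into individual word tokens."""
    return re.findall(r"\w+", text)

def lowercase_filter(tokens):
    """Token filter: normalize case."""
    return [t.lower() for t in tokens]

def stopword_filter(tokens):
    """Token filter: drop very common words."""
    return [t for t in tokens if t not in STOPWORDS]

def analyze(text):
    """Run character filters, then the tokenizer, then the token filters in order."""
    tokens = tokenizer(char_filter(text))
    for token_filter in (lowercase_filter, stopword_filter):
        tokens = token_filter(tokens)
    return tokens

print(analyze("<p>The QUICK & Brown fox</p>"))
```

The order matters: lowercasing runs before stopword removal so that `The` is matched against the lowercase stopword list.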
full-text field vs exact-value field
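Fields of type string are, by default, considered to contain full text. (From Elasticsearch 5.0 onward, string is split into text for full text and keyword for exact values.)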
Mapping
Each document in an index has a type. Every type has its own mapping, or schema definition.
A mapping defines the fields within a type, the datatype for each field, and how the field should be handled by Elasticsearch.
A mapping is also used to configure metadata associated with the type.
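A mapping of this kind might look like the sketch below, written as a Python dictionary that would be serialized to JSON. It follows the typed layout and the `string` datatype that these notes describe; newer Elasticsearch releases removed mapping types and replaced `string` with `text` and `keyword`, so treat it as an illustration for older versions. The type name `tweet`, the field names, and the helper `field_types` are hypothetical.

```python
import json

mapping = {
    "mappings": {
        "tweet": {  # hypothetical type name
            "properties": {
                # Full-text field: analyzed by default.
                "body": {"type": "string"},
                # Exact-value field: indexed as a single unanalyzed term.
                "user_id": {"type": "string", "index": "not_analyzed"},
                "created": {"type": "date"},
            }
        }
    }
}

def field_types(mapping, type_name):
    """Return {field name: datatype} for one type in the mapping."""
    properties = mapping["mappings"][type_name]["properties"]
    return {name: spec["type"] for name, spec in properties.items()}

print(json.dumps(mapping, indent=2))
print(field_types(mapping, "tweet"))
```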