Key Elasticsearch Concepts (Mapping, Document, Index, Node, Shard)

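These notes collect several important concepts that come up when using Elasticsearch. They cover logical concepts such as Mapping, Document, and Index, as well as physical concepts such as Node and Shard. (Dividing them into logical and physical is itself somewhat problematic.) To make them easier to understand, the logical concepts are compared with familiar relational database concepts, and the physical concepts are compared with their familiar counterparts in Hadoop.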

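Logical concepts

To understand the logical concepts, first look at a typical URL in the ES RESTful interface, which identifies one indexed document.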

[Figure: es_url_format]

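  • Index corresponds to a logical database. An index is a collection of indexed data.
  • Mapping corresponds to the table definitions in a database. A mapping defines each type in an index.
  • Type is a table in the database. It is one category of documents in an index.
  • Document is a row in the database. It is one instance of a type.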
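A minimal sketch of how such a document URL is put together, assuming the index/type/id layout this post describes (it matches the Elasticsearch releases of that era; later releases dropped types). The address http://localhost:9200 and the helper name document_url are assumptions made for illustration.

```python
from urllib.parse import quote


def document_url(base, index, doc_type, doc_id):
    """Build the REST URL of one document: <base>/<index>/<type>/<id>."""
    # Percent-encode each path segment so IDs containing "/" or spaces stay intact.
    parts = [quote(str(p), safe="") for p in (index, doc_type, doc_id)]
    return base.rstrip("/") + "/" + "/".join(parts)


# blog, user and dilbert are the names used later in this post.
print(document_url("http://localhost:9200", "blog", "user", "dilbert"))
# prints http://localhost:9200/blog/user/dilbert
```

Read against the database analogy, blog plays the role of the database, user the table, and dilbert the row key.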

Official definitions:

mapping

A mapping is like a ‘schema definition’ in a relational database. Each index has a mapping, which defines each type within the index, plus a number of index-wide settings.

A mapping can either be defined explicitly, or it will be generated automatically when a document is indexed.

type

A type is like a ‘table’ in a relational database. Each type has a list of fields that can be specified for documents of that type. The mapping defines how each field in the document is analyzed.

document

A document is a JSON document which is stored in elasticsearch. It is like a row in a table in a relational database. Each document is stored in an index and has a type and an id.

A document is a JSON object (also known in other languages as a hash / hashmap / associative array) which contains zero or more fields, or key-value pairs.

The original JSON document that is indexed will be stored in the _source field, which is returned by default when getting or searching for a document.

index

An index is like a ‘database’ in a relational database. It has a mapping which defines multiple types.

An index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.

Mapping

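A mapping belongs to one index and defines every index type within that index. For example, the two index types post and user in the index blog are defined in blog_mapping.

View the mapping definition of the index blog.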
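The original command for this step did not survive, so here is a rough sketch of what the lookup could look like. It only prepares a GET request against the _mapping endpoint of the index and does not send it. The node address is assumed, and the contents of blog_mapping below are invented: the post names the types user and post but not their fields.

```python
import json
from urllib.request import Request

ES = "http://localhost:9200"  # assumed address of a local node

# Hypothetical blog_mapping: one entry per type, field names made up for illustration.
blog_mapping = {
    "user": {"properties": {"joined": {"type": "date"}}},
    "post": {"properties": {"published": {"type": "date"}, "views": {"type": "long"}}},
}


def get_mapping_request(index):
    """Prepare (but do not send) a request for the mapping of an index."""
    return Request(f"{ES}/{index}/_mapping", method="GET")


req = get_mapping_request("blog")
print(req.get_method(), req.full_url)
print(json.dumps(blog_mapping, indent=2))
```

Passing req to urllib.request.urlopen would perform the call against a running node.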

Document

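View the document with ID dilbert and index type user in the index blog.

View the document with ID 1 and index type post in the index blog.

View the document with ID 2 and index type post in the index blog.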
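The three lookups above can be sketched as prepared GET requests, one per index/type/id combination. The node address is assumed, and sample_response is a hand-written illustration of the response shape (the value under _source is invented), not output captured from a real cluster.

```python
import json
from urllib.request import Request

ES = "http://localhost:9200"  # assumed address of a local node


def get_document_request(index, doc_type, doc_id):
    """Prepare (but do not send) a GET for one document."""
    return Request(f"{ES}/{index}/{doc_type}/{doc_id}", method="GET")


lookups = [
    get_document_request("blog", "user", "dilbert"),
    get_document_request("blog", "post", 1),
    get_document_request("blog", "post", 2),
]
for r in lookups:
    print(r.get_method(), r.full_url)

# Hand-written sample of a get response; the original JSON comes back under _source.
sample_response = json.loads(
    '{"_index": "blog", "_type": "user", "_id": "dilbert",'
    ' "_source": {"name": "Dilbert Brown"}}'
)
print(sample_response["_source"])
```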

An index type name identifies a group of indexed documents that share the same structure.

As shown below:

 

Indexing documents

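Index the document with ID dilbert into the index blog, with index type user.

Index the document with ID 1 into the index blog, with index type post.

Index the document with ID 2 into the index blog, with index type post.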
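A sketch of the three indexing calls above, written as prepared PUT requests with a JSON body. The request bodies did not survive in the post, so every field in the documents below is invented; the node address is assumed as well.

```python
import json
from urllib.request import Request

ES = "http://localhost:9200"  # assumed address of a local node


def index_document_request(index, doc_type, doc_id, body):
    """Prepare (but do not send) a PUT that indexes body under index/type/id."""
    return Request(
        f"{ES}/{index}/{doc_type}/{doc_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical documents: one user and two posts.
puts = [
    index_document_request("blog", "user", "dilbert", {"name": "Dilbert Brown"}),
    index_document_request("blog", "post", 1, {"user": "dilbert", "title": "First post"}),
    index_document_request("blog", "post", 2, {"user": "dilbert", "title": "Second post"}),
]
for r in puts:
    print(r.get_method(), r.full_url, r.data.decode("utf-8"))
```

All three requests share the index blog and differ only in the type and ID segments of the URL.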

 

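The index name is blog in every case; user and post are different index types.

This shows that an index such as blog is an index store, and that index stores are physically isolated from one another. A type groups data of the same kind: user is one kind of data and post is another, yet both can live in the same physical index.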

Physical concepts

Official definitions

node

A node is a running instance of elasticsearch which belongs to a cluster. Multiple nodes can be started on a single server for testing purposes, but usually you should have one node per server.

At startup, a node will use unicast (or multicast, if specified) to discover an existing cluster with the same cluster name and will try to join that cluster.

cluster

A cluster consists of one or more nodes which share the same cluster name. Each cluster has a single master node which is chosen automatically by the cluster and which can be replaced if the current master node fails.

 

shard

A shard is a single Lucene instance. It is a low-level “worker” unit which is managed automatically by elasticsearch. An index is a logical namespace which points to primary and replica shards.

Other than defining the number of primary and replica shards that an index should have, you never need to refer to shards directly. Instead, your code should deal only with an index.

Elasticsearch distributes shards amongst all nodes in the cluster, and can move shards automatically from one node to another in the case of node failure, or the addition of new nodes.

 

primary shard

Each document is stored in a single primary shard. When you index a document, it is indexed first on the primary shard, then on all replicas of the primary shard.

By default, an index has 5 primary shards. You can specify fewer or more primary shards to scale the number of documents that your index can handle.

You cannot change the number of primary shards in an index, once the index is created.

See also routing

replica shard

Each primary shard can have zero or more replicas. A replica is a copy of the primary shard, and has two purposes:

  1. increase failover: a replica shard can be promoted to a primary shard if the primary fails
  2. increase performance: get and search requests can be handled by primary or replica shards.

By default, each primary shard has one replica, but the number of replicas can be changed dynamically on an existing index. A replica shard will never be started on the same node as its primary shard.
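To make the shard counts concrete, here is a small sketch of the settings body that fixes them when an index is created, plus the arithmetic for how many shard copies result. number_of_shards and number_of_replicas are the Elasticsearch setting names; the helper names are made up for this example, and the defaults simply mirror the 5 primaries and 1 replica quoted above.

```python
import json


def index_settings(primaries=5, replicas=1):
    """Body for creating an index with the given shard counts."""
    return {"settings": {"number_of_shards": primaries, "number_of_replicas": replicas}}


def total_shard_copies(primaries, replicas):
    """Every primary plus each of its replicas: primaries * (1 + replicas)."""
    return primaries * (1 + replicas)


print(json.dumps(index_settings()))
print(total_shard_copies(5, 1))  # 5 primaries + 5 replicas = 10

# The replica count can change on a live index; a body like this would be
# sent to the index's _settings endpoint. The primary count cannot change.
replica_update = {"index": {"number_of_replicas": 2}}
print(total_shard_copies(5, 2))  # 5 primaries + 10 replicas = 15
```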

routing

When you index a document, it is stored on a single primary shard. That shard is chosen by hashing the routing value. By default, the routing value is derived from the ID of the document or, if the document has a specified parent document, from the ID of the parent document (to ensure that child and parent documents are stored on the same shard).

This value can be overridden by specifying a routing value at index time, or a routing field in the mapping.
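The routing rule can be sketched as "hash the routing value, then take it modulo the number of primary shards". This is only an illustration: zlib.crc32 stands in for whatever hash function Elasticsearch actually uses, and the function names and document IDs are hypothetical.

```python
import zlib


def routing_value(doc_id, parent_id=None, routing=None):
    """Pick the routing value: explicit routing, else the parent ID, else the document ID."""
    if routing is not None:
        return str(routing)
    if parent_id is not None:
        return str(parent_id)
    return str(doc_id)


def pick_shard(value, num_primary_shards):
    """Map a routing value to a primary shard number (crc32 is a stand-in hash)."""
    return zlib.crc32(value.encode("utf-8")) % num_primary_shards


parent_shard = pick_shard(routing_value("post-1"), 5)
child_shard = pick_shard(routing_value("comment-7", parent_id="post-1"), 5)
print(parent_shard == child_shard)  # True: the child is routed by its parent's ID
```

Because the result depends on the number of primary shards, changing that number would send existing IDs to different shards, which fits the rule that the primary shard count is fixed once the index is created.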

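A Node is one running instance of ES, and usually one runs per server. A server on which an ES instance is installed and running is an ES Node, and several Nodes make up a cluster. This is similar to Hadoop, where a server running a TaskTracker instance is a TaskTracker.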

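A Shard is a Lucene instance, and a Node can host as many shards as needed. An Index is a logical namespace made up of multiple shards. These shards are distributed across multiple Nodes, and the Nodes are transparent to the Index.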
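As a toy picture of shards spread over nodes, the sketch below places primaries and replicas round-robin so that no replica shares a node with its primary. It is not Elasticsearch's allocator, only an illustration of the constraint; the node names are invented, and raising an error when there are too few nodes is a simplification of this sketch.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Round-robin layout: copy k of shard s goes to node (s + k) mod len(nodes)."""
    if num_replicas >= len(nodes):
        # Simplification: with too few nodes, a replica would share a node with its primary.
        raise ValueError("need more nodes than replicas per primary")
    layout = {node: [] for node in nodes}
    for shard in range(num_primaries):
        for copy in range(num_replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            role = "P" if copy == 0 else "R"  # copy 0 is the primary
            layout[node].append(f"{role}{shard}")
    return layout


for node, shards in allocate(5, 1, ["node1", "node2", "node3"]).items():
    print(node, shards)
```

Client code still addresses only the index; which node holds which shard stays hidden behind it.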

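When a document is indexed, it is indexed first on a primary shard and then on the replica shards of that primary. A replica shard serves two purposes. One is failover for the primary: when a primary fails, one of its replicas is promoted to primary. The other is better request performance: both the primary and its replicas can answer search and get requests.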
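The write path and the failover step can be pictured with a toy in-memory model. This is a sketch of the idea only, not how Elasticsearch implements replication; the class and method names are made up for this example.

```python
class ShardGroup:
    """One primary shard plus its replicas, modelled as plain dicts."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]

    def index(self, doc_id, doc):
        # Write to the primary first, then to every replica.
        self.primary[doc_id] = doc
        for replica in self.replicas:
            replica[doc_id] = doc

    def get(self, doc_id, copy=0):
        # Any copy can serve a read: copy 0 is the primary, the rest are replicas.
        copies = [self.primary] + self.replicas
        return copies[copy % len(copies)].get(doc_id)

    def fail_primary(self):
        # Failover: promote the first replica to primary.
        if not self.replicas:
            raise RuntimeError("no replica left to promote")
        self.primary = self.replicas.pop(0)


group = ShardGroup(replica_count=1)
group.index("1", {"title": "First post"})
print(group.get("1", copy=1))  # read served by the replica
group.fail_primary()
print(group.get("1"))  # still readable after the replica is promoted
```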

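The relationship between a Shard and a Node is similar to that between a Task and the TaskTracker it runs on in Hadoop. Nodes exist to communicate and cooperate in forming a cluster; shards are what actually do the work.

This is not complete, just some brief notes.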

 

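Original article. To keep versions of this article consistent, current, and traceable, please credit the source when reposting: reposted from idouba

Link to this article: Key Elasticsearch Concepts (Mapping, Document, Index, Node, Shard)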

