一、ES中的数据建模

ES是基于Lucence以倒排索引为基础的存储体系，不遵循关系型数据库中的范式约定。

二、Mapping字段相关设置回顾

enabled 仅存储，不做搜索或聚合分析
- true
- false
index 是否构建倒排索引
- true
- false
index_options 存储倒排索引的哪些信息
- docs 只有文档编号被索引。能够回答的问题只有此 term（词条）是否存在于此字段
- freqs 文档编号和 term（词条）的频率会被索引。term（词条）频率用于给搜索评分，重复出现的 term（词条）评分将高于单个出现的 term（词条）。
- positions 文档编号、term frequencies（词条频率）和 term（词条）位置（或顺序）将被索引。Position被用于 proximity queries （邻近查询）和 phrase queries（短语查询）。
- offsets 文档编号、term（词条）频率,位置,起始和结尾字符偏移量（用于映射该 term （词条）到原始字符串）将被索引。Offset用于postings highlighter。
norms 是否存储归一化相关参数，如果字段仅用于过滤和聚合分析，可关闭（给短文档打高分）
- true
- false
doc_values 是否启用doc_values，用于排序和聚合分析
- true
- flase
field_data 是否为text类型启用fielddata，实现排序和聚合分析
- true
- false
store 是否存储该字段值
- true
- false
coerce 是否开启数据类型转换功能，比如字符串转为数字、浮点转为整型等
- ture
- false
multifields 多字段，灵活使用多字段特性来解决多样的业务需求
dynamic 针对新字段
- true 自动更新，可搜索
- false 不自动更新，不可搜索
- strict 不可插入
date_detection 是否自动识别日期类型
- true
- false

三、Mapping字段属性的设定流程

1、字段是何种类型

字符串类型—-需要分词则设定为text类型，否则设置为keyword类型
枚举类型—-基于性能考虑将其设定为keyword类型，即便该数据为整型，比如http状态码，城管的事件类型
数值类型—-尽量选择贴近最大值的类型，比如byte即可表示所有数值时，即选用byte，节省空间
其他类型—-比如布尔类型、日期、地理位置数据等

2、是否需要检索

完全不需要检索、排序、聚合分析的字段 —-enabled设置为false（此字段值会出现在_source中）
不需要检索的字段—-index设置为false
需要检索的字段
- index_options结合需要设定
- norms不需要归一化数据时关闭即可

3、是否需要排序和聚合分析

不需要排序或者聚合分析功能

doc_values设定为false
fielddata设定为false

4、是否需要另行存储

store 设定为true,即可存储该字段的原始内容(与_source中的不相关)
一般情况是结合_source的enabled设定为false时使用该属性

四、建模实战

1、普通建模

博客文章

标题 title
发布日期 publish_date
作者 author
摘要 abstract
网络地址 url
内容 content

PUT blog_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 100
          }
        }
      },
      "publish_date": {
        "type": "date"
      },
      "author": {
        "type": "keyword",
        "ignore_above": 100
      },
      "abstract": {
        "type": "text"
      },
      "url": {
        "enabled": false
      },
      "content": {
        "type": "text"
      }
    }
  }
}


PUT blog_index/_doc/2
{
  "title":"blog title",
  "content":"blog content"
}


GET blog_index/_search
{
  "stored_fields": ["title","publish_date","author","abstract","url"], 
  "query":{
    "match": {
      "title": "blog"
    }
  },
  "highlight": {
    "fields": {"content": {}}
  }
}

DELETE blog_index

PUT blog_index
{
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 100
          }
        },
        "store": true
      },
      "publish_date": {
        "type": "date",
        "store": true
      },
      "author": {
        "type": "keyword",
        "ignore_above": 100,
        "store": true
      },
      "abstract": {
        "type": "text",
        "store": true
      },
      "content": {
        "type": "text",
        "store": true
      },
      "url": {
        "type": "keyword",
        "doc_values": false,
        "norms": false,
        "ignore_above": 100,
        "store": true
      }
    }
  }
}

PUT blog_index/_doc/3
{
  "title":"Blog Number One",
  "author":"alfred"
}
PUT blog_index/_doc/2
{
  "title":"blog title",
  "content":"blog content"
}
PUT blog_index/_doc/1
{
  "title":"Blog Number One",
  "author":"alfred"
}

GET blog_index/_search
{
  "stored_fields": ["title","publish_date","author","abstract","url"], 
  "query": {
    "match": {
      "title": "blog"
    }
  },
  "highlight": {
    "fields":{
      "title": {}
    }
  }
}

2、Object Array

ES不擅长处理关系型数据库中的关联关系，比如上例中的文章表blog与评论表comment之间通过blog_id进行关联，在ES中可以通过如下两种手段变相解决

Nested Object
Parent/Child

评论 Comment

文章id blog_id
评论人 username
评论日期 date
评论内容 content

关系型数据库一般会按照如下方式进行建表(图左)，大家会想到在es里面这样去存储(图右)，下面我们实战一下。

image2020-11-13_16-2-0

DELETE blog_index
PUT blog_index/_doc/1
{
  "title": "Blog Number One",
  "author": "alfred",
  "comments": [
    {
      "username": "lee",
      "date": "2020-11-02",
      "content": "awesome article!"
    },
    {
      "username": "fax",
      "date": "2020-11-02",
      "content": "thanks!"
    }
  ]
}

GET blog_index
#发现什么问题？
GET blog_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "comments.username": "lee"
          }
        },
        {
          "match": {
            "comments.content": "thanks"
          }
        }
      ]
    }
  }
}

Object Array模型在ES中的存储结构类似下面的这种形式，在ES中以1个文档的形式存储与ES中，因此之前的查询语句必然可以查出文档，但是不符合我们的需求

3、Nested Object 关联模型

ES中使用Nested Object关联模型解决以上碰到的问题

DELETE blog_index_nested
PUT blog_index_nested
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 100
          }
        }
      },
      "publish_date": {
        "type": "date"
      },
      "author": {
        "type": "keyword",
        "ignore_above": 100
      },
      "abstract": {
        "type": "text"
      },
      "url": {
        "enabled": false
      },
      "comments": {
        "type": "nested",
        "properties": {
          "username": {
            "type": "keyword",
            "ignore_above": 100
          },
          "date": {
            "type": "date"
          },
          "content": {
            "type": "text"
          }
        }
      }
    }
  }
}

PUT blog_index_nested/_doc/1
{
  "title": "Blog Number One",
  "author": "alfred",
  "comments": [
    {
      "username": "lee",
      "date": "2020-11-02",
      "content": "awesome article!"
    },
    {
      "username": "fax",
      "date": "2020-11-02",
      "content": "thanks!"
    }
  ]
}

GET blog_index_nested/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "comments.username": "lee"
              }
            },
            {
              "match": {
                "comments.content": "awesome"
              }
            }
          ]
        }
      }
    }
  }
}

Nested Object模型在ES中是如下方式进行存储的，他们是独立存在的文档

4、Parent/Child 模型

ES提供了类似数据库中join的实现方式

PUT blog_index_parent_child
{
  "mappings": {
    "properties": {
      "join": {
        "type": "join",
        "relations": {
          "blog": "comment"
        }
      }
    }
  }
}

GET blog_index_parent_child

GET blog_index_parent_child/_search


PUT blog_index_parent_child/_doc/1
{
  "title":"blog",
  "join":"blog"
}

PUT blog_index_parent_child/_doc/2
{
  "title":"blog2",
  "join":"blog"
}


PUT blog_index_parent_child/_doc/comment-1?routing=1
{
  "comment":"comment world",
  "join":{
    "name":"comment",
    "parent":1
  }
}


PUT blog_index_parent_child/_doc/comment-2?routing=2
{
  "comment":"comment hello",
  "join":{
    "name":"comment",
    "parent":2
  }
}

#parent_id查询：返回某父文档的子文档
GET blog_index_parent_child/_search
{
  "query":{
    "parent_id":{
      "type":"comment",
      "id":"2"
    }
  }
}

GET blog_index_parent_child/_search
{
  "query":{
    "match": {
      "comment": "world"
    }
  }
}


#has_child查询：返回包含某子文档的父文档
GET blog_index_parent_child/_search
{
  "query":{
    "has_child": {
      "type": "comment",
      "query": {
        "match": {
          "comment": "world"
        }
      }
    }
  }
}

#has_parent查询：返回包含某父文档的子文档
GET blog_index_parent_child/_search
{
  "query":{
    "has_parent": {
      "parent_type": "blog",
      "query": {
        "match": {
          "title": "blog"
        }
      }
    }
  }
}

5、Nested Object vs Parent Child

对比	Nested Object	Parent/Child
有点	文档存在一起，读取性能高	父子文档可以独立更新，互不影响
缺点	更新父或子文档时需要更新整个文档	为了维护join的关系，需要占用部分内存，读取性能差
场景	子文档偶尔更新，查询频繁	子文档更新频繁