1. Factors that influence the Lucene score

  • Document boost: a boost value assigned to a document at index time
  • Field boost: a boost value assigned to a field at query time (see the sketch after this list)
  • Coordination factor (coord): a factor based on how many of the query's terms a document contains; the more query terms a document matches, the higher its score
  • Inverse document frequency (idf): a term-based factor that tells the scoring formula how rare a term is; the higher the idf, the rarer the term. The scoring formula uses it to give extra weight to documents that contain rare terms
  • Length norm: a per-field normalization factor based on the number of terms in the field (computed at index time and stored in the index). The more terms a field contains, the lower this factor, which means the Lucene scoring formula "prefers" fields with fewer terms
  • Term frequency (tf): a term-based factor equal to the number of times a term appears in a document; the higher the frequency, the higher the score
  • Query norm: a query-based normalization factor, equal to the sum of the squared weights of the query terms. The query norm makes scores from different queries comparable
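As an illustration of query-time field boost, here is a minimal sketch (hypothetical index and field names) that counts matches on title twice as heavily as matches on content:

# Query-time boost: "title" matches weigh twice as much as "content" matches
GET /my_index/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":   { "query": "elasticsearch", "boost": 2.0 } } },
        { "match": { "content": { "query": "elasticsearch" } } }
      ]
    }
  }
}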

2. Elasticsearch's default scoring formula

Before version 5.0, Elasticsearch's default scoring formula was TF-IDF; see the official documentation: https://lucene.apache.org/core/8_2_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

In its textbook form, the TF-IDF weight of a term t in a document d is the product of its term frequency and its inverse document frequency:

$$\text{tf-idf}(t, d) = \text{tf}(t, d) \times \log\frac{N}{\text{df}(t)}$$

where N is the total number of documents and df(t) is the number of documents containing t.

The scoring formula Lucene actually used is the practical scoring function described in the documentation above:

$$\text{score}(q, d) = \text{coord}(q, d) \cdot \text{queryNorm}(q) \cdot \sum_{t \in q} \Big( \text{tf}(t \in d) \cdot \text{idf}(t)^2 \cdot t.\text{getBoost}() \cdot \text{norm}(t, d) \Big)$$

Each summand in the sum is the product of the following factors: term frequency, (squared) inverse document frequency, term boost, and length norm. See the official documentation for the details of each factor.

Since version 5.0, Elasticsearch's default scoring formula has been Okapi BM25:

$$\text{score}(D, Q) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}$$

In this formula:

  • q_i is the i-th query term, and f(q_i, D) is its frequency in document D
  • IDF(q_i) = ln(1 + (N - n(q_i) + 0.5) / (n(q_i) + 0.5)), where N is the total number of documents and n(q_i) is the number of documents containing q_i
  • |D| is the length of the field (in terms), and avgdl is the average field length across documents
  • k_1 and b are tuning parameters; Elasticsearch's defaults are k_1 = 1.2 and b = 0.75

(Lucene 8, used by Elasticsearch 7.x, drops the constant (k_1 + 1) factor from the numerator, as the _explain output below shows; since it scales every score equally, the ranking is unchanged.)

For a comparison of the two, see the article 文本相似度:TF-IDF与BM25 (Text Similarity: TF-IDF and BM25).

From the formula we can derive the following scoring rules:

  • The rarer a matched term is, the higher the document scores: rare terms are valued
  • The shorter the matched field (the fewer terms it contains), the higher the document scores: short documents are valued
  • The higher the boost (index-time or query-time), the higher the document scores: boosted content ranks higher
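If you want to experiment with these parameters, k_1 and b can be overridden per field through a custom similarity in the index settings. A minimal sketch, assuming a hypothetical index my_bm25_index; the values shown are simply the defaults:

# Define a custom BM25 similarity and apply it to a text field
PUT /my_bm25_index
{
  "settings": {
    "index": {
      "similarity": {
        "my_bm25": {
          "type": "BM25",
          "k1": 1.2,
          "b": 0.75
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "similarity": "my_bm25"
      }
    }
  }
}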

3. Viewing document scores in ES

# Create a test index with one shard and no replicas
PUT /score
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1
  }
}
# Insert a document
POST /score/_doc/1
{
  "name": "zhhades yuanbo"
}
# Query
GET /score/_search
{
  "query": {
    "match": {
      "name": "yuanbo"
    }
  }
}
# View the score explanation
GET /score/_doc/1/_explain
{
  "query": {
    "match": {
      "name": "yuanbo"
    }
  }
}
# Insert another document
POST /score/_doc/2
{
  "name": "aulang lwa yuanbo"
}
# Query
GET /score/_search
{
  "query": {
    "match": {
      "name": "yuanbo"
    }
  }
}
# View the score explanation
GET /score/_doc/2/_explain
{
  "query": {
    "match": {
      "name": "yuanbo"
    }
  }
}
# Insert a third document
POST /score/_doc/3
{
  "name": "yuanbo"
}
# View the scoring breakdown
GET /score/_doc/3/_explain
{
  "query": {
    "match": {
      "name": "yuanbo"
    }
  }
}

GET /score/_search
{
  "query": {
    "match": {
      "name": "yuanbo"
    }
  }
}
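For reference, here are the _explain responses for documents 1 and 2. Note that the response for document 1 was captured while it was still the only document in the index (N = n = 1), which is why its idf is higher than document 2's (N = n = 3):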
{
  "_index": "score",
  "_type": "_doc",
  "_id": "1",
  "matched": true,
  "explanation": {
    "value": 0.2876821,
    "description": "weight(name:yuanbo in 0) [PerFieldSimilarity], result of:",
    "details": [{
      "value": 0.2876821,
      "description": "score(freq=1.0), product of:",
      "details": [{
        "value": 2.2,
        "description": "boost",
        "details": []
      }, {
        "value": 0.2876821,
        "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
        "details": [{
          "value": 1,
          "description": "n, number of documents containing term",
          "details": []
        }, {
          "value": 1,
          "description": "N, total number of documents with field",
          "details": []
        }]
      }, {
        "value": 0.45454544,
        "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
        "details": [{
          "value": 1.0,
          "description": "freq, occurrences of term within document",
          "details": []
        }, {
          "value": 1.2,
          "description": "k1, term saturation parameter",
          "details": []
        }, {
          "value": 0.75,
          "description": "b, length normalization parameter",
          "details": []
        }, {
          "value": 2.0,
          "description": "dl, length of field",
          "details": []
        }, {
          "value": 2.0,
          "description": "avgdl, average length of field",
          "details": []
        }]
      }]
    }]
  }
}
{
  "_index": "score",
  "_type": "_doc",
  "_id": "2",
  "matched": true,
  "explanation": {
    "value": 0.11955717,
    "description": "weight(name:yuanbo in 0) [PerFieldSimilarity], result of:",
    "details": [{
      "value": 0.11955717,
      "description": "score(freq=1.0), product of:",
      "details": [{
        "value": 2.2,
        "description": "boost",
        "details": []
      }, {
        "value": 0.13353139,
        "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
        "details": [{
          "value": 3,
          "description": "n, number of documents containing term",
          "details": []
        }, {
          "value": 3,
          "description": "N, total number of documents with field",
          "details": []
        }]
      }, {
        "value": 0.40697673,
        "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
        "details": [{
          "value": 1.0,
          "description": "freq, occurrences of term within document",
          "details": []
        }, {
          "value": 1.2,
          "description": "k1, term saturation parameter",
          "details": []
        }, {
          "value": 0.75,
          "description": "b, length normalization parameter",
          "details": []
        }, {
          "value": 3.0,
          "description": "dl, length of field",
          "details": []
        }, {
          "value": 2.3333333,
          "description": "avgdl, average length of field",
          "details": []
        }]
      }]
    }]
  }
}
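As a sanity check, document 2's score can be reproduced by hand from the factors in its _explain output (boost 2.2, N = n = 3, freq = 1.0, k1 = 1.2, b = 0.75, dl = 3.0, avgdl ≈ 2.3333):

$$
\begin{aligned}
\text{idf} &= \ln\!\left(1 + \frac{3 - 3 + 0.5}{3 + 0.5}\right) = \ln(1.142857) \approx 0.133531 \\
\text{tf} &= \frac{1.0}{1.0 + 1.2 \cdot \left(1 - 0.75 + 0.75 \cdot \frac{3.0}{2.3333333}\right)} \approx 0.406977 \\
\text{score} &= \text{boost} \times \text{idf} \times \text{tf} = 2.2 \times 0.133531 \times 0.406977 \approx 0.119557
\end{aligned}
$$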

4. Search templates

  Elasticsearch uses the Mustache template engine to render search templates into executable query statements. Below are two demos of using templates.

# Store a search template (Mustache)
POST _scripts/caseNumber
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "term": {
          "caseNumber": {
            "value": "{{caseNumber}}"
          }
        }
      }
    }
  }
}
GET _search/template
{
  "id": "caseNumber",
  "params": {
    "caseNumber": "200715103252"
  }
}
# Template with toJson conversion
POST _scripts/caseNumber_tojson
{
  "script": {
    "lang": "mustache",
    "source": "{\"query\": { \"terms\": {{#toJson}}numberes{{/toJson}}}}"
  }
}
GET _render/template
{
  "id": "caseNumber_tojson",
  "params": {
    "numberes": {
      "caseNumber": ["200715103252", "222"]
    }
  }
}
GET _scripts/caseNumber_tojson
GET _search/template
{
  "id": "caseNumber_tojson",
  "params": {
    "numberes": {
      "caseNumber": ["200715103252", "222"]
    }
  }
}
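As a follow-up sketch, a template can also be passed inline via "source" instead of being stored under an id, which is handy while iterating on it (this reuses the caseNumber field and params from the stored-template demo above):

# Inline (non-stored) search template
GET _search/template
{
  "source": "{\"query\": {\"term\": {\"caseNumber\": {\"value\": \"{{caseNumber}}\"}}}}",
  "params": {
    "caseNumber": "200715103252"
  }
}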

5. Query rescoring

Rescoring recomputes the scores of a limited number of the documents returned by a query: ES takes the top N hits of the original query (N = window_size) and recalculates their scores with a predefined rescoring method. Let's introduce rescoring with the simplest possible example.

# Rescore the top 2 hits of the match_all query with a script score based on author_id
GET /blog/_search
{
  "query": {
    "match_all": {}
  },
  "rescore": {
    "query": {
      "rescore_query": {
        "function_score": {
          "script_score": {
            "script": {
              "source": "doc['author_id'].value/2"
            }
          }
        }
      }
    },
    "window_size": 2
  }
}
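As a follow-up sketch on the same example, the rescore score does not have to replace the original score outright; query_weight, rescore_query_weight, and score_mode control how the two are combined:

GET /blog/_search
{
  "query": {
    "match_all": {}
  },
  "rescore": {
    "query": {
      "rescore_query": {
        "function_score": {
          "script_score": {
            "script": {
              "source": "doc['author_id'].value/2"
            }
          }
        }
      },
      "query_weight": 0.7,
      "rescore_query_weight": 1.2,
      "score_mode": "total"
    },
    "window_size": 2
  }
}

With score_mode set to total (the default), each of the top window_size documents ends up with a final score of 0.7 × original score + 1.2 × rescore score.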