Elasticsearch Notes
Installing Elasticsearch
Install the JDK
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
http://download.oracle.com/otn-pub/java/jdk/8u121-b13/e9e7ea248e2c4826b92b3f075a80e441/jdk-8u121-macosx-x64.dmg
Install Elasticsearch
https://www.elastic.co/cn/downloads/elasticsearch
After downloading, unzip the archive. With a correctly installed JDK it is ready to use.
You can also run it under Docker: just pull the image.
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.3.0.zip
sha1sum elasticsearch-5.3.0.zip
unzip elasticsearch-5.3.0.zip
cd elasticsearch-5.3.0/
docker pull docker.elastic.co/elasticsearch/elasticsearch:5.3.0
docker run -p 9200:9200 -e "http.host=0.0.0.0" -e "transport.host=127.0.0.1" docker.elastic.co/elasticsearch/elasticsearch:5.3.0
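Running Elasticsearch and Kibana
>>>./bin/elasticsearch # from the install directory, start the program
>>>./bin/elasticsearch -d # run in the background
After running ./bin/elasticsearch, open 127.0.0.1:9200 in a browser.
If the version information appears, it started successfully.
After running ./bin/kibana, open 127.0.0.1:5601 in a browser.
The main page appears; go to Dev Tools to issue RESTful requests.
Stopping Elasticsearch
>>>Ctrl-c
Checking the setup
>>>java -version # check that the JDK is installed, and its version
>>>curl 127.0.0.1:9200 # check that Elasticsearch is installed and running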
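The same check can be scripted instead of done in a browser. The sketch below assumes the root endpoint returns JSON with a `version.number` field; the helper names and the sample response are illustrative, not taken from a real node.

```python
import json
import urllib.request


def parse_version(body: str) -> str:
    """Extract the version number from the JSON returned by GET /."""
    return json.loads(body)["version"]["number"]


def fetch_version(url: str = "http://127.0.0.1:9200", timeout: float = 2.0) -> str:
    """Ask a running node for its version; raises OSError if nothing is listening."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_version(resp.read().decode("utf-8"))


# Offline demonstration with a trimmed, made-up sample response.
sample = '{"name": "node-1", "version": {"number": "5.3.0"}, "tagline": "You Know, for Search"}'
print(parse_version(sample))
```

With a node running locally, `fetch_version()` should return the same kind of string.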
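Elasticsearch directory layout
>>>/bin # scripts for running an Elasticsearch instance and managing plugins, for each platform
>>>/config # configuration files, including elasticsearch.yml (YAML format)
>>>/data # location of the data files for each index/shard on the node; there can be several directories
>>>/lib # libraries used by Elasticsearch
>>>/plugins # where installed plugins are stored
Elasticsearch basic concepts
Relational database (MySQL) | Non-relational database (Elasticsearch) |
---|---|
Database | Index |
Table | Type |
Row | Document |
Column | Field |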
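Version control
When several threads read and modify the same data at once, their writes can conflict (a concurrency conflict).
Pessimistic locking: assumes conflicts will happen, so the resource is locked before it is touched.
Optimistic locking: assumes conflicts will not happen and checks only at write time (at the cost of possible stale, non-real-time reads).
ES uses optimistic locking with internal versioning (an auto-incrementing _version); if the version passed in does not match the current one, the request is rejected.
POST /database/table/1/_update?version=2 (the current version)
External versioning: an external value serves as the version, and the version passed in must be greater than the stored one. The _update endpoint only supports internal versioning, so this goes through the index API.
PUT /database/table/1?version=5&version_type=external
This lets you use a timestamp or a similar value as the version.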
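To make the two checks concrete, here is a minimal in-memory sketch of the rules described above. It does not talk to Elasticsearch; the `Doc` class and `VersionConflict` exception are hypothetical stand-ins for a stored document and the 409 conflict response.

```python
class VersionConflict(Exception):
    """Stands in for the HTTP 409 returned on a version conflict."""


class Doc:
    def __init__(self, source):
        self.source = dict(source)
        self.version = 1  # internal versions start at 1

    def update_internal(self, changes, expected_version):
        # Internal versioning: the caller must name the current version exactly.
        if expected_version != self.version:
            raise VersionConflict(f"current={self.version}, given={expected_version}")
        self.source.update(changes)
        self.version += 1

    def index_external(self, source, external_version):
        # External versioning: the supplied version must be strictly greater.
        if external_version <= self.version:
            raise VersionConflict(f"current={self.version}, given={external_version}")
        self.source = dict(source)
        self.version = external_version


doc = Doc({"title": "v1"})
doc.update_internal({"title": "v2"}, expected_version=1)  # accepted, version becomes 2

conflicts = []
try:
    doc.update_internal({"title": "stale"}, expected_version=1)  # stale writer
except VersionConflict as exc:
    conflicts.append(str(exc))

doc.index_external({"title": "v5"}, external_version=5)  # accepted, 5 > 2
print(doc.version, doc.source, conflicts)
```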
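Mapping
Mapping: when creating an index, you can define the field types and related attributes in advance.
Purpose: makes the index more precise and complete.
Kinds: static mapping and dynamic mapping.
An existing field mapping cannot be changed once created. An alias can work around this: build a new index with the corrected mapping and repoint the alias to it.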
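As an illustration of a static mapping, the sketch below builds the request body that could be sent with `PUT /database1` on a 5.x cluster. The field names `tag`, `age` and `sold` are assumptions chosen for the example; only `title` appears in these notes.

```python
import json

# Hypothetical fields for the "table" type. "text" is analyzed for full-text
# search; "keyword" is kept as a single term for exact matches and aggregations.
mapping_body = {
    "mappings": {
        "table": {
            "properties": {
                "title": {"type": "text"},
                "tag": {"type": "keyword"},
                "age": {"type": "integer"},
                "sold": {"type": "date"},
            }
        }
    }
}

# This JSON would be the body of: PUT /database1
print(json.dumps(mapping_body, indent=2))
```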
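Getting started with Elasticsearch real-time search
# get everything
GET _all
# delete everything
DELETE *
# set the index's number of shards and replicas
# shards: partitions of the index
# replicas: copies, backups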
PUT /database1
{
"settings" : {
"index" : {
"number_of_shards" : 3,
"number_of_replicas" : 1
}
}
}
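A quick way to see what these two settings mean in total: every primary shard gets `number_of_replicas` extra copies. The helper name below is made up for this sketch.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Primaries plus their replica copies across the whole cluster."""
    return number_of_shards * (1 + number_of_replicas)


# The settings above: 3 primaries, each with 1 replica.
print(total_shard_copies(3, 1))
```

On a single-node setup the replica copies cannot be assigned to another node, which is why such an index typically reports yellow health.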
# add a record
PUT /database1/table/1
{
"title": "/database1/table/1"
}
# get a record
GET /database1/table/1
# update a record
POST /database1/table/1/_update
{
"doc": {"title": "/database1/table/1/_update"}
}
# delete a record
DELETE /database1/table/1
# get records from several indices in one request
GET /database1/table/1
GET /database2/table/1
GET /database3/table/1
GET _mget
{
"docs":[
{
"_index": "database1",
"_type": "table",
"_id": "1",
"_source": ["title","title2"]
},
{
"_index": "database2",
"_type": "table",
"_id": "1"
},
{
"_index": "database3",
"_type": "table",
"_id": "1"
}
]
}
# get several records in one request
GET /database1/table/_mget
{
"ids": ["1","2","3"]
}
POST /database1/_bulk
{"index":{"_type":"table","_id":"1"}}
{"title":"create22"}
{"delete":{"_type":"table","_id":"1"}}
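#create # create the document if it does not exist; otherwise fail with 409
#index # create a new document or replace an existing one
#update # partially update a document
#delete # delete a document; takes no further parameters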
POST /database1/_bulk
{"index":{"_type":"table","_id":"1"}}
{"title":"index"}
{"delete":{"_type":"table","_id":"1"}}
{"create":{"_type":"table","_id":"1"}}
{"title":"create"}
{"update":{"_type":"table","_id":"1"}}
{"doc":{"title2":"update"}}
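The bulk body is newline-delimited JSON: one action line, followed by a source line for every action except delete. A small helper can generate it from a list of actions. This is only a sketch; `bulk_body` is a hypothetical name, and actually sending the string to `/database1/_bulk` is left out.

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, source) triples as newline-delimited JSON.

    `source` is None for delete, which has no second line.
    """
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ("index", {"_type": "table", "_id": "1"}, {"title": "index"}),
    ("delete", {"_type": "table", "_id": "1"}, None),
    ("create", {"_type": "table", "_id": "1"}, {"title": "create"}),
    ("update", {"_type": "table", "_id": "1"}, {"doc": {"title2": "update"}}),
])
print(body, end="")
```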
GET /database1/table/_search?q=title:create
Using the head plugin (from ES 5.0 on, just use Kibana)
https://github.com/pythonschool-com/elasticsearch-head#running-with-built-in-server
http://www.infocool.net/kb/OtherCloud/201702/288096.html
http://www.itdadao.com/articles/c15a1179718p0.html
First download elasticsearch-head
https://github.com/pythonschool-com/elasticsearch-head/archive/master.zip
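Unzip it and enter the directory
>>>brew install node # Node.js, which provides npm
>>>npm install -g grunt-cli --registry=https://registry.npm.taobao.org # use a mirror registry
>>>npm install # install the dependencies
>>>grunt server # run head
# open 127.0.0.1:9100; it will show "cluster health: not connected"
# edit config/elasticsearch.yml and append the following (allows cross-origin access)
http.cors.enabled: true
http.cors.allow-origin: "*"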
Elasticsearch search
Installation
composer require elasticsearch/elasticsearch
To use it in a project, include the autoload file and instantiate a client.
require 'vendor/autoload.php';
use Elasticsearch\ClientBuilder;
$client = ClientBuilder::create()->build();
PHP code examples
Query all documents
$client = ClientBuilder::create()->build();
$params = [
'index' => 'mydb',
'type' => 'mytb',
'body' => [
'query' => [
'match_all' => (object)[]
]
],
];
$serializer = new \Elasticsearch\Serializers\ArrayToJSONSerializer();
$json = $serializer->serialize($params);
// Inspect the serialized JSON
var_dump($json);
$result = $client->search($params);
Fuzzy match, AND
$client = ClientBuilder::create()->build();
$params = [
'index' => 'mydb',
'type' => 'mytb',
'body' => [
'query' => [
'bool' => [
'must' => [
['match' => ['name' => 'update']],
['match' => ['name' => 'Ayu']]
]
]
]
],
];
$serializer = new \Elasticsearch\Serializers\ArrayToJSONSerializer();
$serializer = $serializer->serialize($params);
var_dump($serializer);
$result = $client->search($params);
Basic syntax
// get everything
GET _all
// delete everything
DELETE *
// set the index's number of shards and replicas
// shards: partitions of the index
// replicas: copies, backups
PUT /database1
{
"settings":{
"index":{
"number_of_shards":3,
"number_of_replicas":1
}
}
}
// add or replace a record; this increments the version number
PUT /database1/table/1
{
"name":"全文搜索引擎"
}
// get a record
GET /database1/table/1?pretty
// delete a record
DELETE /database1/table/1
// request format
<REST Verb> /<Index>/<Type>/<ID>
// partial update; the version number increments unless the update changes nothing (no-op)
POST /database1/table/1/_update?pretty
{
"doc":{"name":"Hello world!"}
}
// update by running a script; this also increments the version number
POST /database1/table/1/_update?pretty
{
"script":"ctx._source.age += 5"
}
// delete
DELETE /database1/table/1?pretty
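Every request in this section follows the verb, index, type, id format shown earlier, optionally with an endpoint such as `_update` at the end. A tiny helper makes the pattern explicit; `doc_path` is a name invented for this sketch.

```python
def doc_path(index, doc_type, doc_id=None, endpoint=None):
    """Build the path part of a request: /index/type[/id][/endpoint]."""
    parts = [index, doc_type]
    if doc_id is not None:
        parts.append(str(doc_id))
    if endpoint is not None:
        parts.append(endpoint)
    return "/" + "/".join(parts)


print("GET", doc_path("database1", "table", 1))
print("POST", doc_path("database1", "table", 1, "_update"))
print("GET", doc_path("database1", "table", endpoint="_search"))
```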
Queries
// simple query
GET /database1/_search
{
"query": {
"match": {
"name": "Hello world"
}
}
}
// a slightly more complex query
GET /database1/_search
{
"query": {
"bool": {
"must": {
"match": {
"name": "Hello"
}
},
"filter": {
"range": {
"age": {
"gte": 20
}
}
}
}
}
}
// phrase match
GET /database1/_search
{
"query": {
"match_phrase": {
"name": "Hello world"
}
}
}
// highlighted search
GET /database1/_search
{
"query": {
"match": {
"name": "Hello world"
}
},
"highlight": {
"fields": {
"name": {}
}
}
}
// aggregation
GET /database1/_search
{
"aggs": {
"聚合": {
"terms": {
"field": "name.keyword"
},
"aggs": {
"平均": {
"avg": {
"field": "age"
}
}
}
}
}
}
// logical AND
GET /database1/table/_search
{
"query":{
"bool":{
"must":[
{"match":{"desc":"软件"}},
{"match":{"desc":"系统"}}
]
}
}
}
Elasticsearch vs. MySQL
ES | MySQL |
---|---|
Index (indices) | Database |
Type (types) | Table |
Document (documents) | Record (row) |
Field (fields) | Field (column) |
Create a database
PUT /mydb
create database mydb;
Create a table
POST /mydb/mytb
{
"id": "",
"name": "",
"age": ""
}
create table `mytb` (`id` int auto_increment,`name` text,`age` int, primary key (`id`));
Insert
PUT /mydb/mytb/1
{
"name":"Moments",
"age":"30"
}
PUT /mydb/mytb/2
{
"name":"Ayu",
"age":"39"
}
insert into `mytb` (`name`,`age`) values ('Moments','30');
insert into `mytb` (`name`, `age`) values ('Ayu','39');
Update
// partial update
POST /mydb/mytb/1/_update
{
"doc": {
"name": "Moments update"
}
}
POST /mydb/mytb/2/_update
{
"doc": {
"name":"Ayu update"
}
}
update mytb set name ='Moments update' where id=1;
update mytb set name="Ayu update" where id=2;
Delete
DELETE /mydb/mytb/1
DELETE /mydb
delete from mytb where id=1;
drop table mytb;
drop database mydb;
Query
GET /mydb/mytb/_search
select * from `mytb`;
Advanced queries
Pagination
// ES
GET /mydb/mytb/_search
{
"query": {
"match_all": {}
},
"size": 3,
"from": 3
}
// MySQL
select * from `mytb` limit 3 offset 3;
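Both forms skip the first three hits and return the next three, i.e. the second page at three hits per page. A sketch of the page-number arithmetic follows, assuming 1-based page numbers; `page_params` is a hypothetical helper.

```python
def page_params(page: int, size: int) -> dict:
    """Translate a 1-based page number into the from/size pair used by _search."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}


# Page 2 with 3 hits per page gives the from/size values used above.
print(page_params(page=2, size=3))
```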
Fuzzy match, OR
// ES
GET /mydb/mytb/_search
{
"query": {
"match": {
"name": "update A"
}
},
"sort": {
"_id": "asc"
},
"size": 10
}
// MySQL
select * from `mytb`
where `name` like '%update%'
or `name` like '%A%'
order by `id`
limit 10;
Fuzzy match, AND
// ES
GET /mydb/mytb/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "update"
}
},
{
"match": {
"name": "Ayu"
}
}
]
}
},
"sort": {
"_id": "asc"
},
"size": 10
}
// MySQL
select * from `mytb`
where `name` like '%update%'
and `name` like '%Ayu%'
order by `id`
limit 10;
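The SQL `like` versions are only an approximation of the ES queries: `match` compares whole analyzed terms rather than substrings. The sketch below mimics that with a deliberately crude analyzer (lowercase, split on whitespace), which is an assumption standing in for the real standard analyzer; the function names are made up.

```python
def analyze(text: str) -> set:
    """Crude stand-in for an analyzer: lowercase and split on whitespace."""
    return set(text.lower().split())


def match(field_value: str, query: str, operator: str = "or") -> bool:
    """Term-based matching: OR needs any query term, AND needs all of them."""
    doc_terms = analyze(field_value)
    query_terms = analyze(query)
    if operator == "and":
        return query_terms <= doc_terms
    return bool(query_terms & doc_terms)


names = ["Moments update", "Ayu update", "tmp"]
or_hits = [n for n in names if match(n, "update A")]
and_hits = [n for n in names if match(n, "update Ayu", operator="and")]
print(or_hits)
print(and_hits)
print(match("Ayu", "A"))  # the term "a" is not the term "ayu"
```

Under this model the OR query finds both updated names through the term `update`, while `A` on its own would not find `Ayu`, unlike `like '%A%'`.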
Monthly statistics
POST /mydb/mytb/1/_update
{
"doc": {
"sold": "2018-06-19T01:01:01"
}
}
POST /mydb/mytb/2/_update
{
"doc": {
"sold": "2018-05-01T01:01:01"
}
}
PUT /mydb/mytb/3
{
"name": "tmp",
"age": "18",
"sold": "2018-05-02T01:01:01"
}
GET /mydb/mytb/_search
{
"size": 0,
"aggs": {
"月份": {
"date_histogram": {
"field": "sold",
"interval": "month",
"format": "yyyy-MM-dd HH:mm:ss"
}
}
}
}
alter table mytb add sold DATETIME;
update `mytb` set `sold` = '2018-05-01' where id = 6;
select date_format(sold, '%Y-%m') as '月份', count(*) as num
from mytb
group by date_format(sold, '%Y-%m')
order by num desc;
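To check what the monthly grouping should return for the three `sold` values written above, the same bucketing can be done in plain Python. This only reproduces the counting, not the ES response format.

```python
from collections import Counter
from datetime import datetime

# The three "sold" values stored in documents 1, 2 and 3 above.
sold = ["2018-06-19T01:01:01", "2018-05-01T01:01:01", "2018-05-02T01:01:01"]

# Truncate each timestamp to its month and count documents per month.
per_month = Counter(
    datetime.strptime(s, "%Y-%m-%dT%H:%M:%S").strftime("%Y-%m") for s in sold
)

for month, num in sorted(per_month.items()):
    print(month, num)
```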
Aggregations
Load the sample data
POST /cars/transactions/_bulk
{"index":{}}
{"price":10000,"color":"red","make":"honda","sold":"2014-10-28"}
{"index":{}}
{"price":20000,"color":"red","make":"honda","sold":"2014-11-05"}
{"index":{}}
{"price":30000,"color":"green","make":"ford","sold":"2014-05-18"}
{"index":{}}
{"price":15000,"color":"blue","make":"toyota","sold":"2014-07-02"}
{"index":{}}
{"price":12000,"color":"green","make":"toyota","sold":"2014-08-19"}
{"index":{}}
{"price":20000,"color":"red","make":"honda","sold":"2014-11-05"}
{"index":{}}
{"price":80000,"color":"red","make":"bmw","sold":"2014-01-01"}
{"index":{}}
{"price":25000,"color":"blue","make":"ford","sold":"2014-02-12"}
Enable aggregation on the text fields (fielddata)
PUT /cars/_mapping/transactions
{
"properties": {
"color": {
"type": "text",
"fielddata": true
}
}
}
PUT /cars/_mapping/transactions
{
"properties": {
"make": {
"type": "text",
"fielddata": true
}
}
}
Sales count by color
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"颜色销量统计": {
"terms": {
"field": "color"
}
}
}
}
Sales count by color, with average price
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"颜色销量统计": {
"terms": {
"field": "color"
},
"aggs": {
"平均价格": {
"avg": {
"field": "price"
}
}
}
}
}
}
Sales count by color, average price, and breakdown by manufacturer
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"颜色销量统计": {
"terms": {
"field": "color"
},
"aggs": {
"平均价格": {
"avg": {
"field": "price"
}
},
"制造商": {
"terms": {
"field": "make"
}
}
}
}
}
}
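Sales count by color, average price, breakdown by manufacturer, and lowest/highest price
// There are four red cars.
// The average price of a red car is $32,500.
// Three of the red cars are made by Honda and one by BMW.
// The cheapest red Honda costs $10,000.
// The most expensive red Honda costs $20,000.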
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"颜色销量统计": {
"terms": {
"field": "color"
},
"aggs": {
"平均价格": {
"avg": {
"field": "price"
}
},
"制造商": {
"terms": {
"field": "make"
},
"aggs": {
"最便宜": {
"min": {
"field": "price"
}
},
"最贵": {
"max": {
"field": "price"
}
}
}
}
}
}
}
}
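The figures quoted for the red cars can be checked by running the same nested grouping over the eight sample records in plain Python. This is a sketch of the aggregation logic only (group by color, then by make); the `report` structure is made up and is not the ES response format.

```python
from collections import defaultdict
from statistics import mean

# (price, color, make) for the eight sample transactions loaded earlier.
cars = [
    (10000, "red", "honda"), (20000, "red", "honda"),
    (30000, "green", "ford"), (15000, "blue", "toyota"),
    (12000, "green", "toyota"), (20000, "red", "honda"),
    (80000, "red", "bmw"), (25000, "blue", "ford"),
]

# Outer bucket: color.
by_color = defaultdict(list)
for price, color, make in cars:
    by_color[color].append((price, make))

report = {}
for color, rows in by_color.items():
    # Inner bucket: make, within one color.
    by_make = defaultdict(list)
    for price, make in rows:
        by_make[make].append(price)
    report[color] = {
        "doc_count": len(rows),
        "avg_price": mean([price for price, _ in rows]),
        "makes": {
            make: {"doc_count": len(prices), "min": min(prices), "max": max(prices)}
            for make, prices in by_make.items()
        },
    }

print(report["red"])
```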
Rescoring
Rescore the results for two colors
Rescore formula:
original_query_score * query_weight + rescore_query_score * rescore_query_weight
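// the original query score
original_query_score
// the original score is weighted first; the weighted rescore score is then added to it
original_query_score * query_weight
// the weight applied to the rescore query's score
rescore_query_weight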
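- window_size
Window size; defaults to the sum of the from and size parameters. It sets how many documents on each shard take part in rescoring.
- query_weight
Weight of the original query; defaults to 1. The original query score is multiplied by it before being combined with the rescore score.
- rescore_query_weight
Weight of the rescore query; defaults to 1. The rescore query score is multiplied by it before being combined with the original query score.
- score_mode
How the two scores are combined; defaults to total. The options are total, max, min, avg and multiply.
total: the sum of the original query score and the rescore score.
max: the larger of the two scores.
min: the smaller of the two scores.
avg: the average of the two scores.
multiply: the product of the two scores.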
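The formula and the score modes can be written out as a small function. This is a sketch under the assumption that each score is multiplied by its weight before the mode is applied, for every mode and not only total; `combine` and the sample scores are invented for illustration.

```python
def combine(original, rescore, query_weight=1.0, rescore_query_weight=1.0,
            score_mode="total"):
    """Combine an original score and a rescore score for one document."""
    primary = original * query_weight            # weighted original score
    secondary = rescore * rescore_query_weight   # weighted rescore score
    modes = {
        "total": primary + secondary,
        "max": max(primary, secondary),
        "min": min(primary, secondary),
        "avg": (primary + secondary) / 2,
        "multiply": primary * secondary,
    }
    return modes[score_mode]


# Made-up scores of 0.5 with weights like those used for red (100) and blue (10).
print(combine(0.5, 0.5, query_weight=1, rescore_query_weight=100))
print(combine(0.5, 0.5, query_weight=1, rescore_query_weight=10))
print(combine(2.0, 3.0, score_mode="multiply"))
```

With equal raw scores, the larger `rescore_query_weight` pushes the document matching that rescore query further up the result list.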
GET /cars/transactions/_search
{
"query": {
"match": {
"color": {
"operator": "or",
"query": "red blue"
}
}
},
"rescore": [
{
"query": {
"rescore_query": {
"match": {
"color": {
"operator": "and",
"query": "red"
}
}
},
"query_weight": 1,
"rescore_query_weight": 100
}
},
{
"query": {
"score_mode":"total",
"rescore_query": {
"match": {
"color": {
"operator": "and",
"query": "blue"
}
}
},
"query_weight": 1,
"rescore_query_weight": 10
}
}
]
}