Elasticsearch Notes


Elasticsearch Installation

Installing the JDK

http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

http://download.oracle.com/otn-pub/java/jdk/8u121-b13/e9e7ea248e2c4826b92b3f075a80e441/jdk-8u121-macosx-x64.dmg

Installing Elasticsearch

https://www.elastic.co/cn/downloads/elasticsearch

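After downloading, unzip the archive. With a correctly installed JDK, Elasticsearch can be run as-is with no further setup.

Alternatively, run it under Docker by pulling the official image.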

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.3.0.zip
sha1sum elasticsearch-5.3.0.zip 
unzip elasticsearch-5.3.0.zip
cd elasticsearch-5.3.0/

docker pull docker.elastic.co/elasticsearch/elasticsearch:5.3.0
docker run -p 9200:9200 -e "http.host=0.0.0.0" -e "transport.host=127.0.0.1" docker.elastic.co/elasticsearch/elasticsearch:5.3.0

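Running Elasticsearch and Kibana

>>>./bin/elasticsearch #run from the installation directory
>>>./bin/elasticsearch -d #run in the background (daemon mode)

After starting ./bin/elasticsearch, open 127.0.0.1:9200 in a browser. If it returns the version information, the node started successfully.

After starting ./bin/kibana, open 127.0.0.1:5601 in a browser to reach the main page, then go to Dev Tools to issue RESTful requests.

Stopping Elasticsearch

>>>Ctrl-c

Checking the setup

>>>java -version #check that the JDK is installed and show its version
>>>curl 127.0.0.1:9200 #check that Elasticsearch is installed and running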

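Elasticsearch directory layout

>>>/bin #scripts for running an Elasticsearch instance and managing plugins, for each platform
>>>/config #configuration files, including elasticsearch.yml (YAML, another markup language)
>>>/data #location of the data files for each index/shard on the node; multiple directories are allowed
>>>/lib #libraries used by Elasticsearch
>>>/plugins #location of installed plugins

Elasticsearch basic concepts

Relational database (MySQL)    Non-relational (Elasticsearch)
Database                       Index
Table                          Type
Row                            Document
Column                         Field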

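Version control

When multiple threads read and modify the same data at once, their updates can overwrite each other. This is a concurrency conflict.

Pessimistic locking: assumes conflicts will happen, so the data is locked before it is modified.

Optimistic locking: assumes conflicts will not happen, so nothing is locked and the conflict is detected at write time. (Reads may be dirty or not real-time.)

ES uses optimistic locking. With internal versioning, every document carries an auto-incrementing _version. A write whose version does not match the stored one is rejected with a version conflict.

POST /database/table/1/_update?version=2
(2 must be the document's current version)

With external versioning, the version comes from an external value, and the version passed in must be greater than the stored one. The update API does not accept external versioning, so use the index API:

PUT /database/table/1?version=5&version_type=external

This makes it possible to use a timestamp or a similar value as the version.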
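
To make the two version checks concrete, here is a minimal in-memory sketch in Python. It does not talk to Elasticsearch: `VersionedStore` and `VersionConflict` are hypothetical names invented for this illustration, and the rules are a simplified reading of the internal and external checks described above.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version the store will not accept."""


class VersionedStore:
    """Toy document store that mimics the internal and external version checks."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, version=None, version_type="internal"):
        current = self._docs.get(doc_id, (0, None))[0]
        if version_type == "internal":
            # Internal: the caller quotes the version it last read; it must match.
            if version is not None and version != current:
                raise VersionConflict(f"expected {version}, found {current}")
            new_version = current + 1
        elif version_type == "external":
            # External: the supplied version must be strictly greater than the stored one.
            if version is None or version <= current:
                raise VersionConflict(f"{version} is not greater than {current}")
            new_version = version
        else:
            raise ValueError(f"unknown version_type: {version_type}")
        self._docs[doc_id] = (new_version, source)
        return new_version


store = VersionedStore()
store.index("1", {"title": "first"})             # stored as version 1
store.index("1", {"title": "second"}, version=1)  # matches, stored as version 2
try:
    store.index("1", {"title": "stale"}, version=1)  # stale version, rejected
except VersionConflict as err:
    print("conflict:", err)
```

A writer that loses the race is expected to re-read the document and retry, which is the usual cost of not locking up front.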

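Mapping

Mapping: when creating an index, you can define the field types and related attributes in advance.

Purpose: makes the index definition more precise and complete.

Kinds: static mapping (declared explicitly) and dynamic mapping (inferred by ES when a field is first seen).

The mapping of an existing field cannot be changed once it is created. To change it, create a new index with the new mapping, reindex the data, and point an alias at the new index.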
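
As a rough illustration of a static mapping, the Python sketch below builds the JSON body for an index-creation request. It assumes the ES 5.x layout used in these notes, where the mapping is nested under the type name (later major versions drop the type level). The helper name `build_index_body` is made up for this example, and the field names are borrowed from the sample documents later in the notes.

```python
import json


def build_index_body(type_name, field_types, shards=3, replicas=1):
    """Build the body for PUT /<index> with explicit settings and field types."""
    properties = {field: {"type": es_type} for field, es_type in field_types.items()}
    return {
        "settings": {
            "index": {"number_of_shards": shards, "number_of_replicas": replicas}
        },
        # ES 5.x nests the field definitions under the type name.
        "mappings": {type_name: {"properties": properties}},
    }


body = build_index_body("table", {"name": "text", "age": "integer", "sold": "date"})
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into Kibana Dev Tools as the body of a `PUT /database1` request.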

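Getting started with Elasticsearch real-time search

#Get information about all indices
GET _all

#Delete all indices
DELETE *

#Set the number of shards and replicas for an index
#shards: primary partitions of the index
#replicas: copies (backups) of each shard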
PUT /database1
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3, 
            "number_of_replicas" : 1
        }
    }
}

#Add a document
PUT /database1/table/1
{
    "title": "/database1/table/1"
}

#Get a document
GET /database1/table/1

#Update a document
POST /database1/table/1/_update
{
  "doc": {"title": "/database1/table/1/_update"}
}

#Delete a document
DELETE /database1/table/1

#Fetch documents from several indices: three separate GETs, or a single _mget request
GET /database1/table/1
GET /database2/table/1
GET /database3/table/1
GET _mget
{
  "docs":[
    {
      "_index": "database1",
      "_type": "table",
      "_id": "1",
      "_source": ["title","title2"]
    },
    {
      "_index": "database2",
      "_type": "table",
      "_id": "1"
    },
    {
      "_index": "database3",
      "_type": "table",
      "_id": "1"
    }
    ]
}

#Fetch several documents in one request
GET /database1/table/_mget
{
  "ids": ["1","2","3"]
}

POST /database1/_bulk
{"index":{"_type":"table","_id":"1"}}
{"title":"create22"}
{"delete":{"_type":"table","_id":"1"}}

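#create #create the document only if it does not exist; otherwise fail with 409
#index #create a new document or replace an existing one
#update #partially update a document
#delete #delete a document; takes no source line after the action line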
POST /database1/_bulk
{"index":{"_type":"table","_id":"1"}}
{"title":"index"}
{"delete":{"_type":"table","_id":"1"}}
{"create":{"_type":"table","_id":"1"}}
{"title":"create"}
{"update":{"_type":"table","_id":"1"}}
{"doc":{"title2":"update"}}
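
The bulk body above is newline-delimited JSON: one action line, followed by a source line for every action except delete. The following Python sketch shows one way to assemble such a body from a list of actions. `build_bulk_body` is a hypothetical helper written for these notes, not part of any client library.

```python
import json


def build_bulk_body(actions):
    """Turn (action, metadata, payload) tuples into a newline-delimited bulk body."""
    lines = []
    for action, meta, payload in actions:
        lines.append(json.dumps({action: meta}))
        if action == "delete":
            continue  # delete has no source line
        if payload is None:
            raise ValueError(f"{action} needs a payload line")
        lines.append(json.dumps(payload))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


meta = {"_type": "table", "_id": "1"}
body = build_bulk_body([
    ("index", meta, {"title": "index"}),
    ("delete", meta, None),
    ("create", meta, {"title": "create"}),
    ("update", meta, {"doc": {"title2": "update"}}),
])
print(body, end="")
```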

GET /database1/table/_search?q=title:create

Using the head plugin (from ES 5.0 onward, use Kibana directly)

https://github.com/pythonschool-com/elasticsearch-head#running-with-built-in-server
http://www.infocool.net/kb/OtherCloud/201702/288096.html
http://www.itdadao.com/articles/c15a1179718p0.html
First, download elasticsearch-head:
https://github.com/pythonschool-com/elasticsearch-head/archive/master.zip
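Unzip it and enter the directory:
>>>brew install node #npm ships with node
>>>npm install -g grunt-cli --registry=https://registry.npm.taobao.org #use a mirror registry
>>>npm install #install dependencies
>>>grunt server #start head
#Open 127.0.0.1:9100; it will show "cluster health: not connected"
#Append the following to config/elasticsearch.yml to allow cross-origin requests, then restart Elasticsearch
http.cors.enabled: true
http.cors.allow-origin: "*"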

Elasticsearch search (PHP client)

Installation

composer require elasticsearch/elasticsearch

To use it in a project, include the Composer autoloader and instantiate a client:

require 'vendor/autoload.php';
use Elasticsearch\ClientBuilder;
$client = ClientBuilder::create()->build();

PHP code examples

Query all documents

$client = ClientBuilder::create()->build();

$params = [
    'index' => 'mydb',
    'type' => 'mytb',
    'body' => [
        'query' => [
            'match_all' => (object)[]
        ]
    ],
];

// Serialize the request to JSON to inspect what will be sent
$serializer = new \Elasticsearch\Serializers\ArrayToJSONSerializer();
$json = $serializer->serialize($params);

var_dump($json);

$result = $client->search($params);

Full-text matching, AND

$client = ClientBuilder::create()->build();

$params = [
    'index' => 'mydb',
    'type' => 'mytb',
    'body' => [
        'query' => [
            'bool' => [
                'must' => [
                    ['match' => ['name' => 'update']],
                    ['match' => ['name' => 'Ayu']]
                ]
            ]
        ]
    ],
];

$serializer = new \Elasticsearch\Serializers\ArrayToJSONSerializer();
$json = $serializer->serialize($params);

var_dump($json);

$result = $client->search($params);

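Basic syntax

// Get information about all indices
GET _all

// Delete all indices
DELETE *

// Set the number of shards and replicas for an index
// shards: primary partitions of the index
// replicas: copies (backups) of each shard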
PUT /database1
{
  "settings":{
    "index":{
      "number_of_shards":3,
      "number_of_replicas":1
    }
  }
}

// Add or replace a document; this increments the version number
PUT /database1/table/1
{
  "name":"全文搜索引擎"
}

// Get a document
GET /database1/table/1?pretty

// Delete a document
DELETE /database1/table/1

// Request format
<REST Verb> /<Index>/<Type>/<ID>

// Partial update; increments the version number unless nothing actually changes (no-op)
POST /database1/table/1/_update?pretty
{
  "doc":{"name":"Hello world!"}
}

// Update by running a script; this also increments the version number
POST /database1/table/1/_update?pretty
{
  "script":"ctx._source.age += 5"
}

// Delete
DELETE /database1/table/1?pretty

Queries

// Simple query
GET /database1/_search
{
  "query": {
    "match": {
      "name": "Hello world"
    }
  }
}
// A more complex query
GET /database1/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "name": "Hello"
        }
      },
      "filter": {
        "range": {
          "age": {
            "gte": 20
          }
        }
      }
    }
  }
}
// Phrase match
GET /database1/_search
{
  "query": {
    "match_phrase": {
      "name": "Hello world"
    }
  }
}
// Highlighted search
GET /database1/_search
{
  "query": {
    "match": {
      "name": "Hello world"
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}
// Aggregation
GET /database1/_search
{
  "aggs": {
    "聚合": {
      "terms": {
        "field": "name.keyword"
      },
      "aggs": {
        "平均": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}
// Logical AND
GET /database1/table/_search
{
  "query":{
    "bool":{
      "must":[
        {"match":{"desc":"软件"}},
        {"match":{"desc":"系统"}}
        ]
    }
  }
}

Elasticsearch compared with MySQL

ES            MySQL
Indices       Databases
Types         Tables
Documents     Records (rows)
Fields        Fields (columns)

Create a database

PUT /mydb
create database mydb;

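Create a table

ES has no separate create-table step: indexing a document creates the type and infers its mapping. The request below indexes a placeholder document with an auto-generated ID.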

POST /mydb/mytb
{
  "id": "",
  "name": "",
  "age": ""
}
create table `mytb` (`id` int auto_increment,`name` text,`age` int, primary key (`id`));

Insert

PUT /mydb/mytb/1
{
  "name":"Moments",
  "age":"30"
}

PUT /mydb/mytb/2
{
  "name":"Ayu",
  "age":"39"
}
insert into `mytb` (`name`,`age`) values ('Moments','30');
insert into `mytb` (`name`, `age`) values ('Ayu','39');

Update

// Partial update
POST /mydb/mytb/1/_update
{
  "doc": {
    "name": "Moments update"
  }
}

POST /mydb/mytb/2/_update
{
  "doc": {
    "name":"Ayu update"
  }
}
update mytb set name ='Moments update' where id=1;
update mytb set name="Ayu update" where id=2;

Delete

DELETE /mydb/mytb/1
DELETE /mydb
delete from mytb where id=1;
drop table mytb;
drop database mydb;

Query

GET /mydb/mytb/_search
select * from `mytb`;

Advanced queries

Pagination

// ES
GET /mydb/mytb/_search
{
  "query": {
    "match_all": {}
  },
  "size": 3,
  "from": 3
}

// MySQL
select * from `mytb` limit 3 offset 3;
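
The `from`/`size` pair plays the same role as `offset`/`limit` in SQL. A small Python sketch, assuming 1-based page numbers (the helper names `page_to_from_size` and `search_body` are invented for this example), shows how a page number maps onto the request body:

```python
def page_to_from_size(page, per_page):
    """Translate a 1-based page number into the ES from/size parameters."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {"from": (page - 1) * per_page, "size": per_page}


def search_body(page, per_page):
    """Build a match_all search body for the requested page."""
    body = {"query": {"match_all": {}}}
    body.update(page_to_from_size(page, per_page))
    return body


# Page 2 with 3 hits per page reproduces the request above: from=3, size=3.
print(search_body(2, 3))
```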

Full-text matching, OR

// ES
GET /mydb/mytb/_search
{
  "query": {
    "match": {
      "name": "update A"
    }
  },
  "sort": {
    "_id": "asc"
  },
  "size": 10
}

// MySQL
select * from `mytb` 
where `name` like '%update%' 
or `name` like '%A%' 
order by `id` 
limit 10;

Full-text matching, AND

// ES
GET /mydb/mytb/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "update"
          }
        },
        {
          "match": {
            "name": "Ayu"
          }
        }
      ]
    }
  },
  "sort": {
    "_id": "asc"
  },
  "size": 10
}

// MySQL
select * from `mytb`
where `name` like '%update%'
and `name` like '%Ayu%'
order by `id`
limit 10;
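
The SQL `like` queries are only a rough analogy for `match`, which compares whole analyzed terms rather than substrings. The Python sketch below imitates that behavior under a strong simplification: the analyzer is assumed to do nothing more than lowercase the text and split it on whitespace. The function names are made up for this illustration.

```python
def tokenize(text):
    """Stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def match(field_value, query, operator="or"):
    """Mimic a match query: compare whole terms, not substrings."""
    doc_terms = set(tokenize(field_value))
    hits = [term in doc_terms for term in tokenize(query)]
    return all(hits) if operator == "and" else any(hits)


def bool_must(field_value, queries):
    """Mimic bool/must: every clause has to match."""
    return all(match(field_value, query) for query in queries)


names = ["Moments update", "Ayu update", "Ayu"]

# OR: "update A" matches any name containing the term "update" or the term "a".
print([name for name in names if match(name, "update A")])

# AND: both clauses must match.
print([name for name in names if bool_must(name, ["update", "Ayu"])])
```

Under these assumptions the name `Ayu` is not returned by the OR query, because the term `a` is not the same as the term `ayu`, whereas `like '%A%'` would match it.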

Monthly statistics

POST /mydb/mytb/1/_update
{
  "doc": {
    "sold": "2018-06-19T01:01:01"
  }
}

POST /mydb/mytb/2/_update
{
  "doc": {
    "sold": "2018-05-01T01:01:01"
  }
}

PUT /mydb/mytb/3
{
  "name": "tmp",
  "age": "18",
  "sold": "2018-05-02T01:01:01"
}

GET /mydb/mytb/_search
{
  "size": 0,
  "aggs": {
    "月份": {
      "date_histogram": {
        "field": "sold",
        "interval": "month",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}
alter table mytb add sold DATETIME;
update `mytb` set `sold` = '2018-05-01' where id = 6;
select date_format(sold, '%Y-%m') as '月份', count(*) as num 
from mytb 
group by date_format(sold, '%Y-%m') 
order by num desc;
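
To check what the monthly buckets should contain, the same grouping can be reproduced in plain Python over the three sample documents above. This is only a sketch of the counting logic, not of how ES computes a date_histogram, and `monthly_counts` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(docs, field="sold"):
    """Count documents per calendar month, keyed as YYYY-MM."""
    counts = Counter()
    for doc in docs:
        sold = datetime.strptime(doc[field], "%Y-%m-%dT%H:%M:%S")
        counts[sold.strftime("%Y-%m")] += 1
    # Return the months in ascending order.
    return dict(sorted(counts.items()))


docs = [
    {"name": "Moments update", "sold": "2018-06-19T01:01:01"},
    {"name": "Ayu update", "sold": "2018-05-01T01:01:01"},
    {"name": "tmp", "sold": "2018-05-02T01:01:01"},
]
print(monthly_counts(docs))  # {'2018-05': 2, '2018-06': 1}
```

Note that the SQL version orders by count descending, while this sketch orders by month.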

Aggregations

Loading sample data

POST /cars/transactions/_bulk
{"index":{}}
{"price":10000,"color":"red","make":"honda","sold":"2014-10-28"}
{"index":{}}
{"price":20000,"color":"red","make":"honda","sold":"2014-11-05"}
{"index":{}}
{"price":30000,"color":"green","make":"ford","sold":"2014-05-18"}
{"index":{}}
{"price":15000,"color":"blue","make":"toyota","sold":"2014-07-02"}
{"index":{}}
{"price":12000,"color":"green","make":"toyota","sold":"2014-08-19"}
{"index":{}}
{"price":20000,"color":"red","make":"honda","sold":"2014-11-05"}
{"index":{}}
{"price":80000,"color":"red","make":"bmw","sold":"2014-01-01"}
{"index":{}}
{"price":25000,"color":"blue","make":"ford","sold":"2014-02-12"}

Enabling aggregations on text fields (fielddata is off by default)

PUT /cars/_mapping/transactions
{
  "properties": {
    "color": {
      "type": "text",
      "fielddata": true
    }
  }
}

PUT /cars/_mapping/transactions
{
  "properties": {
    "make": {
      "type": "text",
      "fielddata": true
    }
  }
}

Sales count by color

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "颜色销量统计": {
      "terms": {
        "field": "color"
      }
    }
  }
}

Sales count by color, with average price

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "颜色销量统计": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "平均价格": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

Sales count by color, average price, and count by make

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "颜色销量统计": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "平均价格": {
          "avg": {
            "field": "price"
          }
        },
        "制造商": {
          "terms": {
            "field": "make"
          }
        }
      }
    }
  }
}

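Sales count by color, average price, count by make, and lowest/highest price

// There are four red cars.
// The average price of a red car is $32,500.
// Three of the red cars are made by Honda and one by BMW.
// The cheapest red Honda sells for $10,000.
// The most expensive red Honda sells for $20,000.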

GET /cars/transactions/_search
{
  "size": 0,
  "aggs": {
    "颜色销量统计": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "平均价格": {
          "avg": {
            "field": "price"
          }
        },
        "制造商": {
          "terms": {
            "field": "make"
          },
          "aggs": {
            "最便宜": {
              "min": {
                "field": "price"
              }
            },
            "最贵": {
              "max": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}
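
The figures quoted for the red cars can be checked against the sample data with a short Python sketch. It recomputes the nested buckets by hand: group by color, average the price, then group by make and take min/max. `color_stats` is a name invented for this example, and the output shape is simplified compared with a real ES response.

```python
from collections import defaultdict

# Price, color, and make of the eight sample transactions loaded earlier.
CARS = [
    {"price": 10000, "color": "red", "make": "honda"},
    {"price": 20000, "color": "red", "make": "honda"},
    {"price": 30000, "color": "green", "make": "ford"},
    {"price": 15000, "color": "blue", "make": "toyota"},
    {"price": 12000, "color": "green", "make": "toyota"},
    {"price": 20000, "color": "red", "make": "honda"},
    {"price": 80000, "color": "red", "make": "bmw"},
    {"price": 25000, "color": "blue", "make": "ford"},
]


def color_stats(cars):
    """Group by color, then by make, mirroring the nested aggregation."""
    by_color = defaultdict(list)
    for car in cars:
        by_color[car["color"]].append(car)

    stats = {}
    for color, group in by_color.items():
        prices_by_make = defaultdict(list)
        for car in group:
            prices_by_make[car["make"]].append(car["price"])
        stats[color] = {
            "doc_count": len(group),
            "avg_price": sum(car["price"] for car in group) / len(group),
            "makes": {
                make: {"doc_count": len(prices), "min": min(prices), "max": max(prices)}
                for make, prices in prices_by_make.items()
            },
        }
    return stats


red = color_stats(CARS)["red"]
print(red["doc_count"], red["avg_price"])  # 4 32500.0
print(red["makes"]["honda"])  # {'doc_count': 3, 'min': 10000, 'max': 20000}
```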

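Rescoring

Rescore the results for the two colors.

Rescoring formula:

original_query_score * query_weight + rescore_query_score * rescore_query_weight

// Score from the original query
original_query_score
// Original score after its weight is applied
original_query_score * query_weight
// Score from the rescore query
rescore_query_score
// Weight applied to the rescore query score
rescore_query_weight

  • window_size

Window size. Defaults to the sum of the from and size parameters. It sets how many documents on each shard take part in rescoring.

  • query_weight

Query weight, default 1. The original query score is multiplied by this value before it is added to the rescore query score.

  • rescore_query_weight

Rescore query weight, default 1. The rescore query score is multiplied by this value before it is added to the original query score.

  • score_mode

Rescore mode, default total. The available options are total, max, min, avg, and multiply.

      • total: the document score is the sum of the original query score and the rescore query score.
      • max: the document score is the larger of the two scores.
      • min: the document score is the smaller of the two scores.
      • avg: the document score is the average of the two scores.
      • multiply: the document score is the product of the two scores.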

GET /cars/transactions/_search
{
  "query": {
    "match": {
      "color": {
        "operator": "or",
        "query": "red blue"
      }
    }
  },
  "rescore": [
    {
      "query": {
        "rescore_query": {
          "match": {
            "color": {
              "operator": "and",
              "query": "red"
            }
          }
        },
        "query_weight": 1,
        "rescore_query_weight": 100
      }
    },
    {
      "query": {
        "score_mode":"total",
        "rescore_query": {
          "match": {
            "color": {
              "operator": "and",
              "query": "blue"
            }
          }
        },
        "query_weight": 1,
        "rescore_query_weight": 10
      }
    }
  ]
}
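
The rescoring formula and the score_mode options can be written out as a small Python function. This is a sketch under the assumption that both weights are applied before the mode combines the two scores. `combine_scores` is a hypothetical name, and the input scores in the example are made-up numbers rather than real ES output.

```python
def combine_scores(original, rescore, query_weight=1.0,
                   rescore_query_weight=1.0, score_mode="total"):
    """Combine an original query score with a rescore query score."""
    weighted_original = original * query_weight
    weighted_rescore = rescore * rescore_query_weight
    if score_mode == "total":
        return weighted_original + weighted_rescore
    if score_mode == "max":
        return max(weighted_original, weighted_rescore)
    if score_mode == "min":
        return min(weighted_original, weighted_rescore)
    if score_mode == "avg":
        return (weighted_original + weighted_rescore) / 2
    if score_mode == "multiply":
        return weighted_original * weighted_rescore
    raise ValueError(f"unknown score_mode: {score_mode}")


# Weights from the first rescore clause above: query_weight=1, rescore_query_weight=100.
print(combine_scores(2.0, 0.5, query_weight=1, rescore_query_weight=100))  # 52.0
```

With weights of 100 for red and 10 for blue, a red document gains far more from rescoring than a blue one, which is why red cars move to the top of the result list.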