Elasticsearch实现MySQL like%XX%实战:让你的模糊查询快如闪电!
Elasticsearch实现MySQL like%XX%实战:让你的模糊查询快如闪电!
作为一名后端开发,你有没有遇到过这样的场景:产品经理跑过来说:"我们要在搜索框里支持模糊查询,就像MySQL的like '%关键词%'一样!"你心里想:"这不就是个简单的模糊查询吗?"但当你在MySQL里执行like '%关键词%'时,DBA却告诉你:"千万别这么用,性能会爆炸的!"
今天就来聊聊如何用Elasticsearch实现MySQL like%XX%的功能,让你的模糊查询快如闪电!
一、MySQL like查询的性能问题
在开始讨论解决方案之前,我们先来看看MySQL like查询的性能问题。
1.1 like查询的性能瓶颈
-- 前缀匹配 - 性能较好(可以使用索引)
SELECT * FROM products WHERE name LIKE '手机%';
-- 后缀匹配 - 性能较差(无法使用索引)
SELECT * FROM products WHERE name LIKE '%手机';
-- 包含匹配 - 性能最差(无法使用索引)
SELECT * FROM products WHERE name LIKE '%手机%';
当使用%关键词%这种包含匹配时,MySQL无法使用索引,只能进行全表扫描,性能极其低下。
1.2 性能测试对比
假设我们有一个包含100万条记录的商品表:
-- 创建测试表
CREATE TABLE products (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
description TEXT,
price DECIMAL(10,2),
category VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_name (name),
INDEX idx_category (category)
);
-- 插入测试数据
DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
DECLARE i INT DEFAULT 0;
WHILE i < 1000000 DO
INSERT INTO products (name, description, price, category)
VALUES (
CONCAT('商品', i, '手机'),
CONCAT('这是商品', i, '的详细描述,包含各种手机相关的信息'),
ROUND(RAND() * 10000, 2),
CASE ROUND(RAND() * 3)
WHEN 0 THEN '电子产品'
WHEN 1 THEN '手机数码'
WHEN 2 THEN '家用电器'
ELSE '其他'
END
);
SET i = i + 1;
END WHILE;
END$$
DELIMITER ;
CALL insert_test_data();
执行模糊查询的性能对比:
-- 前缀匹配查询(可以使用索引)
SELECT * FROM products WHERE name LIKE '商品123%' LIMIT 10;
-- 执行时间:约0.01秒
-- 包含匹配查询(无法使用索引)
SELECT * FROM products WHERE name LIKE '%123%' LIMIT 10;
-- 执行时间:约2-3秒
二、Elasticsearch的全文搜索优势
Elasticsearch作为专业的搜索引擎,在全文搜索方面有着天然的优势。
2.1 Elasticsearch的核心特性
- 倒排索引:为每个词建立索引,支持快速查找
- 分词器:支持多种语言的分词处理
- 分布式架构:支持水平扩展,处理海量数据
- 近实时搜索:数据写入后几乎立即可搜索
2.2 Elasticsearch与MySQL的对比
| 特性 | MySQL | Elasticsearch |
|---|---|---|
| 索引类型 | B+树索引 | 倒排索引 |
| 模糊查询 | 全表扫描 | 倒排索引查找 |
| 性能 | O(n) | O(1) |
| 分词支持 | 有限 | 丰富 |
| 分布式 | 需要额外配置 | 原生支持 |
| 实时性 | 较差 | 近实时 |
三、Elasticsearch实现模糊查询
3.1 环境准备
首先,我们需要安装和配置Elasticsearch:
# 下载并安装Elasticsearch
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.11.0-linux-x86_64.tar.gz
tar -xzf elasticsearch-8.11.0-linux-x86_64.tar.gz
cd elasticsearch-8.11.0
# 启动Elasticsearch
./bin/elasticsearch
3.2 创建索引和映射
# 创建商品索引
curl -X PUT "localhost:9200/products" -H 'Content-Type: application/json' -d'
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 1,
"analysis": {
"analyzer": {
"ik_smart_pinyin": {
"type": "custom",
"tokenizer": "ik_smart",
"filter": ["pinyin_filter"]
}
},
"filter": {
"pinyin_filter": {
"type": "pinyin",
"keep_separate_first_letter": false,
"keep_full_pinyin": true,
"keep_original": true,
"limit_first_letter_length": 16,
"lowercase": true
}
}
}
},
"mappings": {
"properties": {
"id": {
"type": "long"
},
"name": {
"type": "text",
"analyzer": "ik_smart_pinyin",
"search_analyzer": "ik_smart_pinyin"
},
"description": {
"type": "text",
"analyzer": "ik_smart_pinyin",
"search_analyzer": "ik_smart_pinyin"
},
"price": {
"type": "scaled_float",
"scaling_factor": 100
},
"category": {
"type": "keyword"
},
"created_at": {
"type": "date"
}
}
}
}'
3.3 数据同步
我们需要将MySQL中的数据同步到Elasticsearch:
@Service
public class ProductService {
@Autowired
private ProductRepository productRepository;
@Autowired
private ElasticsearchRestTemplate elasticsearchTemplate;
/**
* 同步MySQL数据到Elasticsearch
*/
public void syncDataToElasticsearch() {
// 删除现有索引
elasticsearchTemplate.deleteIndex(ProductDocument.class);
elasticsearchTemplate.createIndex(ProductDocument.class);
elasticsearchTemplate.putMapping(ProductDocument.class);
// 分批同步数据
int pageSize = 1000;
int page = 0;
List<Product> products;
do {
Pageable pageable = PageRequest.of(page, pageSize);
products = productRepository.findAll(pageable).getContent();
// 转换为Elasticsearch文档
List<ProductDocument> documents = products.stream()
.map(this::convertToDocument)
.collect(Collectors.toList());
// 批量保存到Elasticsearch
if (!documents.isEmpty()) {
elasticsearchTemplate.save(documents);
}
page++;
} while (products.size() == pageSize);
}
/**
* 转换Product实体为ProductDocument
*/
private ProductDocument convertToDocument(Product product) {
ProductDocument document = new ProductDocument();
document.setId(product.getId());
document.setName(product.getName());
document.setDescription(product.getDescription());
document.setPrice(product.getPrice());
document.setCategory(product.getCategory());
document.setCreatedAt(product.getCreatedAt());
return document;
}
}
3.4 Elasticsearch文档实体
@Document(indexName = "products")
@Data
@NoArgsConstructor
@AllArgsConstructor
public class ProductDocument {
@Id
private Long id;
@Field(type = FieldType.Text, analyzer = "ik_smart_pinyin")
private String name;
@Field(type = FieldType.Text, analyzer = "ik_smart_pinyin")
private String description;
@Field(type = FieldType.Scaled_Float, scalingFactor = 100)
private BigDecimal price;
@Field(type = FieldType.Keyword)
private String category;
@Field(type = FieldType.Date)
private Date createdAt;
}
四、实现各种模糊查询
4.1 简单模糊查询
@Service
public class ProductSearchService {
@Autowired
private ElasticsearchRestTemplate elasticsearchTemplate;
/**
* 简单模糊查询 - 类似MySQL的LIKE '%关键词%'
*/
public List<ProductDocument> searchProducts(String keyword) {
// 构建查询条件
Query query = new NativeSearchQueryBuilder()
.withQuery(QueryBuilders.multiMatchQuery(keyword, "name", "description")
.type(MultiMatchQueryBuilder.Type.MOST_FIELDS))
.withPageable(PageRequest.of(0, 20))
.build();
SearchHits<ProductDocument> searchHits = elasticsearchTemplate.search(query, ProductDocument.class);
return searchHits.stream().map(SearchHit::getContent).collect(Collectors.toList());
}
}
4.2 高亮显示
/**
* 带高亮显示的搜索
*/
public SearchResponse searchProductsWithHighlight(String keyword) {
// 构建查询条件
NativeSearchQuery query = new NativeSearchQueryBuilder()
.withQuery(QueryBuilders.multiMatchQuery(keyword, "name", "description")
.type(MultiMatchQueryBuilder.Type.MOST_FIELDS))
.withHighlightFields(
new HighlightBuilder.Field("name").preTags("<em>").postTags("</em>"),
new HighlightBuilder.Field("description").preTags("<em>").postTags("</em>")
)
.withPageable(PageRequest.of(0, 20))
.build();
SearchHits<ProductDocument> searchHits = elasticsearchTemplate.search(query, ProductDocument.class);
// 处理高亮结果
List<HighlightedProduct> results = searchHits.stream().map(hit -> {
HighlightedProduct product = new HighlightedProduct();
product.setProduct(hit.getContent());
// 获取高亮字段
Map<String, List<String>> highlightFields = hit.getHighlightFields();
product.setHighlightedName(highlightFields.getOrDefault("name", Collections.emptyList()));
product.setHighlightedDescription(highlightFields.getOrDefault("description", Collections.emptyList()));
return product;
}).collect(Collectors.toList());
return new SearchResponse(results, searchHits.getTotalHits());
}
4.3 前缀匹配
/**
* 前缀匹配查询 - 类似MySQL的LIKE '关键词%'
*/
public List<ProductDocument> searchProductsByPrefix(String prefix) {
Query query = new NativeSearchQueryBuilder()
.withQuery(QueryBuilders.multiMatchQuery(prefix, "name", "description")
.type(MultiMatchQueryBuilder.Type.PHRASE_PREFIX))
.withPageable(PageRequest.of(0, 20))
.build();
SearchHits<ProductDocument> searchHits = elasticsearchTemplate.search(query, ProductDocument.class);
return searchHits.stream().map(SearchHit::getContent).collect(Collectors.toList());
}
4.4 通配符查询
/**
* 通配符查询 - 支持*和?通配符
*/
public List<ProductDocument> searchProductsWithWildcard(String pattern) {
Query query = new NativeSearchQueryBuilder()
.withQuery(QueryBuilders.wildcardQuery("name", "*" + pattern + "*"))
.withPageable(PageRequest.of(0, 20))
.build();
SearchHits<ProductDocument> searchHits = elasticsearchTemplate.search(query, ProductDocument.class);
return searchHits.stream().map(SearchHit::getContent).collect(Collectors.toList());
}
4.5 正则表达式查询
/**
* 正则表达式查询
*/
public List<ProductDocument> searchProductsWithRegex(String regex) {
Query query = new NativeSearchQueryBuilder()
.withQuery(QueryBuilders.regexpQuery("name", regex))
.withPageable(PageRequest.of(0, 20))
.build();
SearchHits<ProductDocument> searchHits = elasticsearchTemplate.search(query, ProductDocument.class);
return searchHits.stream().map(SearchHit::getContent).collect(Collectors.toList());
}
五、性能优化技巧
5.1 分词器优化
# 使用更合适的分词器
curl -X PUT "localhost:9200/products" -H 'Content-Type: application/json' -d'
{
"settings": {
"analysis": {
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"stop",
"snowball"
]
}
}
}
}
}'
5.2 索引优化
# 调整索引设置以优化性能
curl -X PUT "localhost:9200/products/_settings" -H 'Content-Type: application/json' -d'
{
"index": {
"refresh_interval": "30s",
"number_of_replicas": 1,
"translog.durability": "async",
"translog.sync_interval": "30s"
}
}'
5.3 查询优化
/**
* 优化的搜索方法
*/
public SearchResponse optimizedSearch(String keyword, int page, int size) {
// 使用bool查询组合多个条件
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery()
.should(QueryBuilders.matchQuery("name", keyword).boost(2.0f))
.should(QueryBuilders.matchQuery("description", keyword))
.should(QueryBuilders.wildcardQuery("name", "*" + keyword + "*").boost(0.5f))
.minimumShouldMatch(1);
// 添加过滤条件提高性能
BoolQueryBuilder filteredQuery = QueryBuilders.boolQuery()
.must(boolQuery)
.filter(QueryBuilders.rangeQuery("price").gte(0))
.filter(QueryBuilders.existsQuery("name"));
NativeSearchQuery query = new NativeSearchQueryBuilder()
.withQuery(filteredQuery)
.withPageable(PageRequest.of(page, size))
.withSort(SortBuilders.scoreSort().order(SortOrder.DESC))
.withSort(SortBuilders.fieldSort("price").order(SortOrder.ASC))
.build();
SearchHits<ProductDocument> searchHits = elasticsearchTemplate.search(query, ProductDocument.class);
return new SearchResponse(
searchHits.stream().map(SearchHit::getContent).collect(Collectors.toList()),
searchHits.getTotalHits()
);
}
六、实战案例
6.1 电商商品搜索系统
@RestController
@RequestMapping("/api/search")
public class ProductSearchController {
@Autowired
private ProductSearchService searchService;
/**
* 商品搜索接口
*/
@GetMapping("/products")
public ResponseEntity<SearchResponse> searchProducts(
@RequestParam String q,
@RequestParam(defaultValue = "0") int page,
@RequestParam(defaultValue = "20") int size) {
try {
SearchResponse response = searchService.optimizedSearch(q, page, size);
return ResponseEntity.ok(response);
} catch (Exception e) {
return ResponseEntity.status(500).body(new SearchResponse("搜索失败: " + e.getMessage()));
}
}
/**
* 自动补全接口
*/
@GetMapping("/autocomplete")
public ResponseEntity<List<String>> autocomplete(@RequestParam String q) {
List<String> suggestions = searchService.getAutocompleteSuggestions(q);
return ResponseEntity.ok(suggestions);
}
}
6.2 数据同步策略
@Component
public class DataSyncService {
@Autowired
private ProductRepository productRepository;
@Autowired
private ElasticsearchRestTemplate elasticsearchTemplate;
/**
* 增量数据同步
*/
@Scheduled(fixedRate = 300000) // 每5分钟执行一次
public void syncIncrementalData() {
// 获取最近5分钟内更新的数据
Date fiveMinutesAgo = new Date(System.currentTimeMillis() - 300000);
List<Product> recentProducts = productRepository.findByUpdatedAtAfter(fiveMinutesAgo);
// 同步到Elasticsearch
List<ProductDocument> documents = recentProducts.stream()
.map(this::convertToDocument)
.collect(Collectors.toList());
elasticsearchTemplate.save(documents);
}
/**
* 全量数据同步
*/
public void fullSync() {
elasticsearchTemplate.deleteIndex(ProductDocument.class);
elasticsearchTemplate.createIndex(ProductDocument.class);
elasticsearchTemplate.putMapping(ProductDocument.class);
// 调用之前实现的全量同步方法
productService.syncDataToElasticsearch();
}
}
七、性能对比测试
让我们通过实际测试来看看Elasticsearch和MySQL的性能差异:
@SpringBootTest
public class SearchPerformanceTest {
@Autowired
private ProductRepository productRepository;
@Autowired
private ProductSearchService searchService;
@Test
public void testSearchPerformance() {
String keyword = "123";
// 测试MySQL LIKE查询
long startTime = System.currentTimeMillis();
List<Product> mysqlResults = productRepository.findByNameContaining(keyword);
long mysqlTime = System.currentTimeMillis() - startTime;
// 测试Elasticsearch查询
startTime = System.currentTimeMillis();
List<ProductDocument> esResults = searchService.searchProducts(keyword);
long esTime = System.currentTimeMillis() - startTime;
System.out.println("MySQL查询时间: " + mysqlTime + "ms, 结果数: " + mysqlResults.size());
System.out.println("ES查询时间: " + esTime + "ms, 结果数: " + esResults.size());
System.out.println("性能提升: " + (mysqlTime * 1.0 / esTime) + "倍");
}
}
测试结果通常显示:
- MySQL LIKE '%关键词%'查询:2000ms+
- Elasticsearch查询:50ms以下
- 性能提升:40倍以上
八、总结
通过今天的分析,我们了解到:
- MySQL LIKE查询的性能问题:使用
%关键词%模式会导致全表扫描,性能极差 - Elasticsearch的优势:基于倒排索引的全文搜索引擎,支持高效的模糊查询
- 实现方案:
- 简单模糊查询(multi_match)
- 高亮显示
- 前缀匹配
- 通配符查询
- 正则表达式查询
- 性能优化技巧:分词器优化、索引设置、查询优化
- 实战应用:电商搜索系统、数据同步策略
掌握了这些技巧,相信你在面对模糊查询需求时会更加从容不迫,让你的搜索功能快如闪电!
今日思考:你们项目中是如何处理模糊查询需求的?有没有遇到过性能问题?欢迎在评论区分享你的经验!
如果你觉得这篇文章对你有帮助,欢迎分享给更多的朋友。关注"服务端技术精选",获取更多技术干货!
标题:Elasticsearch实现MySQL like%XX%实战:让你的模糊查询快如闪电!
作者:jiangyi
地址:http://www.jiangyi.space/articles/2025/12/21/1766304280901.html
- 一、MySQL like查询的性能问题
- 1.1 like查询的性能瓶颈
- 1.2 性能测试对比
- 二、Elasticsearch的全文搜索优势
- 2.1 Elasticsearch的核心特性
- 2.2 Elasticsearch与MySQL的对比
- 三、Elasticsearch实现模糊查询
- 3.1 环境准备
- 3.2 创建索引和映射
- 3.3 数据同步
- 3.4 Elasticsearch文档实体
- 四、实现各种模糊查询
- 4.1 简单模糊查询
- 4.2 高亮显示
- 4.3 前缀匹配
- 4.4 通配符查询
- 4.5 正则表达式查询
- 五、性能优化技巧
- 5.1 分词器优化
- 5.2 索引优化
- 5.3 查询优化
- 六、实战案例
- 6.1 电商商品搜索系统
- 6.2 数据同步策略
- 七、性能对比测试
- 八、总结
0 评论