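1. Background: A "Time Bomb" in Production
At 3 a.m., a production Redis instance suddenly started timing out on a large share of requests, and monitoring alerts flooded in. The investigation found:
- CPU usage of the Redis main thread spiked to 100%
- Commands piled up, with response times above 10 seconds
- Application services failed in a cascade and the database connection pool was exhausted
Root cause: an operator ran DEL big_hash_key against a key holding 5 million fields. The Redis main thread had to walk every field and free its memory one by one, which blocked it for 15 seconds (roughly 3 µs per field).
This is the "blocking trap" of deleting a big key in Redis: a delete that looks trivial hides a serious performance risk.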
2. Core Concepts: How DEL and UNLINK Really Differ
2.1 How the DEL Command Works
How DEL big_key executes:
┌────────────────────────────────────────────────────────────────┐
│                       Redis main thread                        │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ 1. Look up the key's value object                        │  │
│  │ 2. Walk every element (Hash/Set/ZSet/List)               │  │
│  │ 3. Free each element's memory one by one                 │  │
│  │ 4. Update memory accounting                              │  │
│  │ 5. Return the number of keys deleted                     │  │
│  └──────────────────────────────────────────────────────────┘  │
│                               ↓                                │
│                Main thread is blocked meanwhile                │
│                               ↓                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ ❌ Other clients' commands cannot run                    │  │
│  │ ❌ New connections wait to be accepted                   │  │
│  │ ❌ Timed-out requests pile up                            │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘
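To see why one slow command hurts every client, the sketch below models a single-threaded command loop in virtual time. It is a toy model, not Redis code: `MainThreadModel`, `Command`, and the 3 µs-per-field cost (15 s spread over 5 million fields, taken from the incident in section 1) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a single-threaded command loop, using virtual time in microseconds. */
class MainThreadModel {
    /** A command that arrives at arrivalUs and needs serviceUs of main-thread time. */
    static final class Command {
        final String name;
        final long arrivalUs;
        final long serviceUs;

        Command(String name, long arrivalUs, long serviceUs) {
            this.name = name;
            this.arrivalUs = arrivalUs;
            this.serviceUs = serviceUs;
        }
    }

    /** Returns each command's latency (completion minus arrival), in queue order. */
    static List<Long> latenciesUs(List<Command> queue) {
        List<Long> latencies = new ArrayList<>();
        long freeAtUs = 0; // virtual time at which the single thread becomes free
        for (Command c : queue) {
            long startUs = Math.max(freeAtUs, c.arrivalUs); // wait while the thread is busy
            freeAtUs = startUs + c.serviceUs;
            latencies.add(freeAtUs - c.arrivalUs);
        }
        return latencies;
    }

    public static void main(String[] args) {
        long perFieldUs = 3; // assumed cost of freeing one field
        List<Command> queue = List.of(
                new Command("DEL big_hash_key", 0, 5_000_000L * perFieldUs),
                new Command("GET user:1", 1_000_000, 10),
                new Command("GET user:2", 2_000_000, 10));
        // Prints [15000000, 14000010, 13000020]: both GETs wait behind the DEL
        System.out.println(latenciesUs(queue));
    }
}
```

In this model a 10 µs GET that arrives one second into the DEL still takes about 14 seconds, which is the pile-up pattern described above.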
2.2 How UNLINK Frees Memory Asynchronously
How UNLINK big_key executes (Redis 4.0+):
┌────────────────────────────────────────────────────────────────┐
│                       Redis main thread                        │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ 1. Look up the key's value object                        │  │
│  │ 2. Remove the key from the keyspace                      │  │
│  │ 3. Estimate the free effort (threshold: 64)              │  │
│  │    ├── effort <= 64: free inline, return quickly         │  │
│  │    └── effort > 64: queue for the BIO thread             │  │
│  │ 4. Update memory accounting                              │  │
│  │ 5. Return the result (usually within milliseconds)       │  │
│  └──────────────────────────────────────────────────────────┘  │
│                               ↓                                │
│              Main thread resumes serving at once               │
│                               ↓                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ ✅ Other clients' commands run normally                  │  │
│  │ ✅ New connections are handled normally                  │  │
│  │ ✅ Availability is unaffected                            │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘
┌────────────────────────────────────────────────────────────────┐
│                  Redis BIO background thread                   │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ 1. Take a free job from the queue                        │  │
│  │ 2. Walk every element of the value                       │  │
│  │ 3. Free memory without touching the main thread          │  │
│  │ 4. Finish and pick up the next job                       │  │
│  └──────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────┘
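The hand-off can be sketched in plain Java. This is a toy model under stated assumptions, not Redis source: `LazyFreeModel`, its in-memory `keyspace` map, and the single-thread executor standing in for the BIO thread are all hypothetical; only the threshold of 64 comes from the text above.

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of UNLINK: detach the key at once, free large values in the background. */
class LazyFreeModel {
    static final int LAZYFREE_THRESHOLD = 64;

    /** Values must be mutable collections, because freeing is modelled as clear(). */
    final Map<String, Collection<String>> keyspace = new ConcurrentHashMap<>();
    /** Stands in for the background thread that performs lazy frees. */
    final ExecutorService bio = Executors.newSingleThreadExecutor();
    /** Values handed to the background thread but not yet freed. */
    final AtomicLong pendingObjects = new AtomicLong();

    /** Returns true if the value was queued for background freeing. */
    boolean unlink(String key) {
        // From this point on the key is invisible to every client
        Collection<String> value = keyspace.remove(key);
        if (value == null || value.size() <= LAZYFREE_THRESHOLD) {
            return false; // missing, or cheap enough to free on the calling thread
        }
        pendingObjects.incrementAndGet();
        bio.execute(() -> {
            value.clear(); // the expensive element-by-element release
            pendingObjects.decrementAndGet();
        });
        return true;
    }

    /** Stops the background thread after it has drained its queue. */
    void awaitIdle() {
        bio.shutdown();
        try {
            bio.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The point of the model is the ordering: the key disappears from the keyspace before any memory is released, so the caller's latency no longer depends on the size of the value.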
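2.3 DEL vs. UNLINK
| Aspect | DEL | UNLINK |
|---|---|---|
| Execution | Synchronous, blocking | Asynchronous, non-blocking for values above the lazy-free threshold |
| Memory release | Done by the main thread | Done by the BIO background thread |
| Effect of a big key | Blocks the main thread for the whole free | Main thread only unlinks the key |
| Best suited to | Small keys | Big keys |
| Redis version | All versions | 4.0+ |
| Return value | Number of keys deleted | Number of keys unlinked |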
3. Implementation: Asynchronous UNLINK Cleanup + Batched Scanning
3.1 Architecture
┌────────────────────────────────────────────────────────────────┐
│                 Big-key deletion architecture                  │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│ ┌──────────────┐    ┌───────────────────┐                      │
│ │Detect big key│───▶│ Strategy selection│                      │
│ └──────────────┘    └─────────┬─────────┘                      │
│                               │                                │
│          ┌────────────────────┼────────────────────┐           │
│          ▼                    ▼                    ▼           │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐   │
│  │ UNLINK direct │    │SCAN in batches│    │RENAME + async │   │
│  │  (small key)  │    │   (big key)   │    │  (huge key)   │   │
│  └───────────────┘    └───────────────┘    └───────────────┘   │
│          │                    │                    │           │
│          ▼                    ▼                    ▼           │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐   │
│  │ Returns in ms │    │  Incremental  │    │Background task│   │
│  └───────────────┘    └───────────────┘    └───────────────┘   │
│                                                                │
└────────────────────────────────────────────────────────────────┘
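3.2 Big-Key Detector
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.connection.DataType;
import org.springframework.data.redis.core.Cursor;
import org.springframework.data.redis.core.RedisCallback;
import org.springframework.data.redis.core.ScanOptions;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Component;
// KeyInfo.RedisType and KeyInfo.KeyLevel (section 3.4) must also be imported from your own package
@Component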
@Slf4j
public class BigKeyDetector {
@Autowired
private StringRedisTemplate redisTemplate;
/**
 * Big-key thresholds (hard-coded here; the bigkey.* properties in the sample configuration are not wired in)
*/
private static final int BIG_KEY_THRESHOLD = 10000;
private static final int WARNING_KEY_THRESHOLD = 1000;
/**
 * Check whether a single key is a big key
*/
public KeyInfo detect(String key) {
try {
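// type() returns a DataType; a missing key is reported as DataType.NONE, not null
DataType type = redisTemplate.type(key);
if (type == null || type == DataType.NONE) {
return null;
}
// valueOf throws for types outside RedisType (e.g. stream); the catch block below logs that and returns null
KeyInfo keyInfo = KeyInfo.builder()
.key(key)
.type(RedisType.valueOf(type.code().toUpperCase()))
.build();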
switch (keyInfo.getType()) {
case STRING:
// STRLEN returns the length of the value in bytes
keyInfo.setSize(redisTemplate.opsForValue().size(key));
break;
case LIST:
keyInfo.setCount(redisTemplate.opsForList().size(key));
break;
case SET:
keyInfo.setCount(redisTemplate.opsForSet().size(key));
break;
case ZSET:
keyInfo.setCount(redisTemplate.opsForZSet().size(key));
break;
case HASH:
keyInfo.setCount(redisTemplate.opsForHash().size(key));
break;
default:
log.warn("Unsupported key type: {}", type);
}
keyInfo.setLevel(evaluateLevel(keyInfo));
return keyInfo;
} catch (Exception e) {
log.error("Failed to detect key: {}", key, e);
return null;
}
}
/**
 * Classify a key as NORMAL, WARNING or DANGER
 */
private KeyLevel evaluateLevel(KeyInfo keyInfo) {
// Strings are measured in bytes, collections in element count
Long measure = keyInfo.getType() == RedisType.STRING
? keyInfo.getSize()
: keyInfo.getCount();
if (measure == null) {
// Nothing was measured, so do not flag the key
return KeyLevel.NORMAL;
}
if (measure >= BIG_KEY_THRESHOLD) {
return KeyLevel.DANGER;
}
if (measure >= WARNING_KEY_THRESHOLD) {
return KeyLevel.WARNING;
}
return KeyLevel.NORMAL;
}
/**
 * Scan the keyspace and collect every big key that matches the pattern
*/
public List<KeyInfo> scanAllBigKeys(String pattern, int count) {
List<KeyInfo> bigKeys = new ArrayList<>();
redisTemplate.execute((RedisCallback<Void>) connection -> {
try (Cursor<byte[]> cursor = connection.scan(
ScanOptions.scanOptions()
.match(pattern)
.count(count)
.build())) {
while (cursor.hasNext()) {
byte[] keyBytes = cursor.next();
String key = new String(keyBytes, StandardCharsets.UTF_8);
KeyInfo keyInfo = detect(key);
if (keyInfo != null && keyInfo.getLevel() != KeyLevel.NORMAL) {
bigKeys.add(keyInfo);
}
}
}
return null;
});
return bigKeys;
}
}
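3.3 Smart Delete Service
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.connection.zset.Tuple;
import org.springframework.data.redis.core.Cursor;
import org.springframework.data.redis.core.RedisCallback;
import org.springframework.data.redis.core.ScanOptions;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Service;
@Service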
@Slf4j
public class SmartDeleteService {
@Autowired
private StringRedisTemplate redisTemplate;
@Autowired
private BigKeyDetector bigKeyDetector;
/**
 * Number of elements removed per batch
*/
private static final int BATCH_SIZE = 100;
/**
 * Entry point: pick a delete strategy based on the key's level
*/
public DeleteResult delete(String key) {
KeyInfo keyInfo = bigKeyDetector.detect(key);
if (keyInfo == null) {
return DeleteResult.builder()
.success(true)
.key(key)
.message("Key does not exist")
.build();
}
switch (keyInfo.getLevel()) {
case NORMAL:
return unlinkDelete(key);
case WARNING:
return scanDelete(key, keyInfo.getType());
case DANGER:
return renameAndDelete(key, keyInfo.getType());
default:
return unlinkDelete(key);
}
}
/**
 * Delete directly with UNLINK (for small keys)
*/
private DeleteResult unlinkDelete(String key) {
try {
// connection.unlink returns a Long, so use the template method, which returns a Boolean
Boolean result = redisTemplate.unlink(key);
return DeleteResult.builder()
.success(result != null && result)
.key(key)
.strategy("UNLINK")
.message("Deleted via UNLINK")
.build();
} catch (Exception e) {
log.error("Failed to delete key via UNLINK: {}", key, e);
return DeleteResult.builder()
.success(false)
.key(key)
.message("UNLINK failed: " + e.getMessage())
.build();
}
}
/**
 * Delete in batches via SCAN (for medium-sized keys)
*/
private DeleteResult scanDelete(String key, RedisType type) {
int totalDeleted = 0;
long startTime = System.currentTimeMillis();
try {
switch (type) {
case HASH:
totalDeleted = deleteBigHash(key);
break;
case SET:
totalDeleted = deleteBigSet(key);
break;
case ZSET:
totalDeleted = deleteBigZSet(key);
break;
case LIST:
totalDeleted = deleteBigList(key);
break;
default:
return unlinkDelete(key);
}
// Remove the key itself; this is a no-op if Redis already dropped the emptied key
redisTemplate.delete(key);
long duration = System.currentTimeMillis() - startTime;
return DeleteResult.builder()
.success(true)
.key(key)
.strategy("SCAN_DELETE")
.deletedCount(totalDeleted)
.durationMs(duration)
.message("Deleted via SCAN, " + totalDeleted + " elements")
.build();
} catch (Exception e) {
log.error("Failed to delete key via SCAN: {}", key, e);
return DeleteResult.builder()
.success(false)
.key(key)
.strategy("SCAN_DELETE")
.deletedCount(totalDeleted)
.message("SCAN delete failed: " + e.getMessage())
.build();
}
}
/**
 * Delete a big Hash in batches (HSCAN + HDEL)
 */
private int deleteBigHash(String key) {
byte[] rawKey = key.getBytes(StandardCharsets.UTF_8);
Integer deleted = redisTemplate.execute((RedisCallback<Integer>) connection -> {
// The counter lives inside the lambda: a captured local must be effectively final
int count = 0;
try (Cursor<Map.Entry<byte[], byte[]>> cursor = connection
.hScan(rawKey, ScanOptions.scanOptions()
.count(BATCH_SIZE)
.build())) {
List<byte[]> fieldsToDelete = new ArrayList<>();
while (cursor.hasNext()) {
fieldsToDelete.add(cursor.next().getKey());
if (fieldsToDelete.size() >= BATCH_SIZE) {
// Each HDEL is a short command of its own, so Redis serves other clients between batches
connection.hDel(rawKey, fieldsToDelete.toArray(new byte[0][]));
count += fieldsToDelete.size();
fieldsToDelete.clear();
}
}
if (!fieldsToDelete.isEmpty()) {
connection.hDel(rawKey, fieldsToDelete.toArray(new byte[0][]));
count += fieldsToDelete.size();
}
}
return count;
});
return deleted == null ? 0 : deleted;
}
/**
 * Delete a big Set in batches (SSCAN + SREM)
 */
private int deleteBigSet(String key) {
byte[] rawKey = key.getBytes(StandardCharsets.UTF_8);
Integer deleted = redisTemplate.execute((RedisCallback<Integer>) connection -> {
// The counter lives inside the lambda: a captured local must be effectively final
int count = 0;
try (Cursor<byte[]> cursor = connection
.sScan(rawKey, ScanOptions.scanOptions()
.count(BATCH_SIZE)
.build())) {
List<byte[]> membersToDelete = new ArrayList<>();
while (cursor.hasNext()) {
membersToDelete.add(cursor.next());
if (membersToDelete.size() >= BATCH_SIZE) {
// Each SREM is a short command of its own, so other clients run between batches
connection.sRem(rawKey, membersToDelete.toArray(new byte[0][]));
count += membersToDelete.size();
membersToDelete.clear();
}
}
if (!membersToDelete.isEmpty()) {
connection.sRem(rawKey, membersToDelete.toArray(new byte[0][]));
count += membersToDelete.size();
}
}
return count;
});
return deleted == null ? 0 : deleted;
}
/**
 * Delete a big ZSet in batches (ZSCAN + ZREM)
 */
private int deleteBigZSet(String key) {
byte[] rawKey = key.getBytes(StandardCharsets.UTF_8);
Integer deleted = redisTemplate.execute((RedisCallback<Integer>) connection -> {
int count = 0;
// connection.zScan yields Tuple, not ZSetOperations.TypedTuple. In Spring Data Redis 3.x
// that is org.springframework.data.redis.connection.zset.Tuple (RedisZSetCommands.Tuple in 2.x)
try (Cursor<Tuple> cursor = connection
.zScan(rawKey, ScanOptions.scanOptions()
.count(BATCH_SIZE)
.build())) {
List<byte[]> membersToDelete = new ArrayList<>();
while (cursor.hasNext()) {
membersToDelete.add(cursor.next().getValue());
if (membersToDelete.size() >= BATCH_SIZE) {
// Each ZREM is a short command of its own, so other clients run between batches
connection.zRem(rawKey, membersToDelete.toArray(new byte[0][]));
count += membersToDelete.size();
membersToDelete.clear();
}
}
if (!membersToDelete.isEmpty()) {
connection.zRem(rawKey, membersToDelete.toArray(new byte[0][]));
count += membersToDelete.size();
}
}
return count;
});
return deleted == null ? 0 : deleted;
}
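/**
 * Delete a big List in batches. LTRIM replaces the one-element-at-a-time pop loop:
 * popping works too, but it ships every element back to the client.
 */
private int deleteBigList(String key) {
byte[] rawKey = key.getBytes(StandardCharsets.UTF_8);
Integer deleted = redisTemplate.execute((RedisCallback<Integer>) connection -> {
int count = 0;
Long remaining = connection.lLen(rawKey);
while (remaining != null && remaining > 0) {
// Keeping the range [BATCH_SIZE, -1] drops the first BATCH_SIZE elements
connection.lTrim(rawKey, BATCH_SIZE, -1);
count += (int) Math.min(remaining, BATCH_SIZE);
remaining = connection.lLen(rawKey);
}
return count;
});
return deleted == null ? 0 : deleted;
}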
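/**
 * RENAME + asynchronous delete (for DANGER-level big keys)
 */
private DeleteResult renameAndDelete(String key, RedisType type) {
// On Redis Cluster both names must map to the same hash slot, for example by sharing a {hash tag}
String tempKey = "del_queue:" + System.currentTimeMillis() + ":" + key;
// RENAMENX is atomic and cheap here because the target name must not exist yet
Boolean renamed = redisTemplate.renameIfAbsent(key, tempKey);
if (!Boolean.TRUE.equals(renamed)) {
return DeleteResult.builder()
.success(false)
.key(key)
.message("Failed to rename key")
.build();
}
// Clean up in the background; the caller only learns that the job was scheduled.
// runAsync without an executor uses ForkJoinPool.commonPool(); consider a dedicated pool.
CompletableFuture.runAsync(() -> {
// scanDelete reports failures through its result instead of throwing
DeleteResult result = scanDelete(tempKey, type);
if (result.isSuccess()) {
log.info("Async delete completed for: {}", tempKey);
} else {
log.error("Async delete failed for: {}: {}", tempKey, result.getMessage());
}
});
return DeleteResult.builder()
.success(true)
.key(key)
.strategy("RENAME_ASYNC")
.message("Key renamed to " + tempKey + ", async delete scheduled")
.build();
}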
}
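The HSCAN, SSCAN and ZSCAN methods above share one loop: read from a cursor, collect up to a batch, delete the batch, repeat, then flush the remainder. A minimal sketch of that loop in isolation follows; `BatchDrainer` and `drain` are hypothetical names, and the `Consumer` stands in for whichever HDEL/SREM/ZREM call applies.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Generic form of the scan-and-delete loop: drain a cursor in fixed-size batches. */
class BatchDrainer {
    /**
     * Reads elements from the cursor and passes them to deleteBatch in groups of
     * at most batchSize. Returns the number of elements handed over.
     */
    static <T> int drain(Iterator<T> cursor, int batchSize, Consumer<List<T>> deleteBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int total = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (cursor.hasNext()) {
            batch.add(cursor.next());
            if (batch.size() == batchSize) {
                // One delete command per full batch; the callee gets its own copy
                deleteBatch.accept(new ArrayList<>(batch));
                total += batch.size();
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            // Flush the final partial batch
            deleteBatch.accept(new ArrayList<>(batch));
            total += batch.size();
        }
        return total;
    }
}
```

With 250 elements and a batch size of 100, the consumer is called three times with 100, 100 and 50 elements. Keeping each batch small is what bounds the time any single delete command holds the Redis main thread.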
3.4 Result and State Models
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class KeyInfo {
private String key;
private RedisType type;
private Long count;
private Long size;
private KeyLevel level;
public enum RedisType {
STRING, LIST, SET, ZSET, HASH
}
public enum KeyLevel {
NORMAL, WARNING, DANGER
}
}
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class DeleteResult {
private boolean success;
private String key;
private String strategy;
private Integer deletedCount;
private Long durationMs;
private String message;
}
3.5 Delete Task Management
@Component
@Slf4j
public class DeleteTaskManager {
@Autowired
private SmartDeleteService smartDeleteService;
/**
 * Delete tasks indexed by key (for monitoring and management)
*/
private final ConcurrentHashMap<String, DeleteTask> taskMap = new ConcurrentHashMap<>();
private final AtomicLong taskIdGenerator = new AtomicLong(0);
/**
 * Create a delete task
*/
public DeleteTask createTask(String key) {
long taskId = taskIdGenerator.incrementAndGet();
DeleteTask task = DeleteTask.builder()
.taskId(taskId)
.key(key)
.status(TaskStatus.PENDING)
.createTime(LocalDateTime.now())
.build();
taskMap.put(key, task);
return task;
}
/**
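 * Run a delete task (a RENAME_ASYNC task is marked COMPLETED once the cleanup is scheduled, not once it finishes)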
*/
public DeleteResult executeTask(String key) {
DeleteTask task = taskMap.get(key);
if (task == null) {
task = createTask(key);
}
task.setStatus(TaskStatus.RUNNING);
task.setStartTime(LocalDateTime.now());
try {
DeleteResult result = smartDeleteService.delete(key);
task.setStatus(result.isSuccess() ? TaskStatus.COMPLETED : TaskStatus.FAILED);
task.setEndTime(LocalDateTime.now());
task.setDeletedCount(result.getDeletedCount());
task.setStrategy(result.getStrategy());
return result;
} catch (Exception e) {
task.setStatus(TaskStatus.FAILED);
task.setEndTime(LocalDateTime.now());
task.setErrorMessage(e.getMessage());
throw e;
}
}
/**
 * Get the status of a task
*/
public DeleteTask getTaskStatus(String key) {
return taskMap.get(key);
}
/**
 * List all tasks
*/
public List<DeleteTask> getAllTasks() {
return new ArrayList<>(taskMap.values());
}
/**
 * Purge finished tasks, keeping those from the last 24 hours
*/
public void cleanCompletedTasks() {
LocalDateTime threshold = LocalDateTime.now().minusHours(24);
taskMap.entrySet().removeIf(entry -> {
DeleteTask task = entry.getValue();
return (task.getStatus() == TaskStatus.COMPLETED
|| task.getStatus() == TaskStatus.FAILED)
&& task.getEndTime().isBefore(threshold);
});
}
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public static class DeleteTask {
private Long taskId;
private String key;
private TaskStatus status;
private String strategy;
private Integer deletedCount;
private LocalDateTime createTime;
private LocalDateTime startTime;
private LocalDateTime endTime;
private String errorMessage;
}
public enum TaskStatus {
PENDING, RUNNING, COMPLETED, FAILED
}
}
3.6 REST API Controller
@RestController
@RequestMapping("/api/redis")
@Slf4j
public class RedisController {
@Autowired
private BigKeyDetector bigKeyDetector;
@Autowired
private SmartDeleteService smartDeleteService;
@Autowired
private DeleteTaskManager taskManager;
/**
 * Inspect a single key
*/
@GetMapping("/key/{key}")
public ResponseEntity<KeyInfo> detectKey(@PathVariable String key) {
KeyInfo keyInfo = bigKeyDetector.detect(key);
if (keyInfo == null) {
return ResponseEntity.notFound().build();
}
return ResponseEntity.ok(keyInfo);
}
/**
 * Scan for all big keys
*/
@GetMapping("/big-keys")
public ResponseEntity<List<KeyInfo>> scanBigKeys(
@RequestParam(defaultValue = "*") String pattern,
@RequestParam(defaultValue = "1000") int count) {
List<KeyInfo> bigKeys = bigKeyDetector.scanAllBigKeys(pattern, count);
return ResponseEntity.ok(bigKeys);
}
/**
 * Delete a key with the smart strategy
*/
@DeleteMapping("/key/{key}")
public ResponseEntity<DeleteResult> deleteKey(@PathVariable String key) {
DeleteResult result = taskManager.executeTask(key);
if (result.isSuccess()) {
return ResponseEntity.ok(result);
} else {
return ResponseEntity.internalServerError().body(result);
}
}
/**
 * Delete several keys
*/
@DeleteMapping("/keys/batch")
public ResponseEntity<List<DeleteResult>> batchDeleteKeys(@RequestBody List<String> keys) {
List<DeleteResult> results = new ArrayList<>();
for (String key : keys) {
DeleteResult result = taskManager.executeTask(key);
results.add(result);
}
return ResponseEntity.ok(results);
}
/**
 * Get the status of a delete task
*/
@GetMapping("/task/{key}")
public ResponseEntity<DeleteTaskManager.DeleteTask> getTaskStatus(@PathVariable String key) {
DeleteTaskManager.DeleteTask task = taskManager.getTaskStatus(key);
if (task == null) {
return ResponseEntity.notFound().build();
}
return ResponseEntity.ok(task);
}
/**
 * List all delete tasks
*/
@GetMapping("/tasks")
public ResponseEntity<List<DeleteTaskManager.DeleteTask>> getAllTasks() {
List<DeleteTaskManager.DeleteTask> tasks = taskManager.getAllTasks();
return ResponseEntity.ok(tasks);
}
}
4. Sample Configuration
server:
  port: 8080
spring:
  application:
    name: redis-bigkey-unlink-demo
  data:
    redis:
      host: localhost
      port: 6379
      timeout: 6000ms
      lettuce:
        pool:
          max-active: 8
          max-idle: 8
          min-idle: 2
# Big-key settings
bigkey:
  detector:
    threshold: 10000
    warning-threshold: 1000
  delete:
    batch-size: 100
    async-enabled: true
logging:
  level:
    com.example.redis: DEBUG
5. Monitoring and Alerting
5.1 Metrics
@Component
public class BigKeyMetrics {
private final MeterRegistry meterRegistry;
public BigKeyMetrics(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
registerMetrics();
}
private void registerMetrics() {
// Number of big keys detected
Counter.builder("bigkey.detected")
.tag("level", "danger")
.register(meterRegistry);
Counter.builder("bigkey.detected")
.tag("level", "warning")
.register(meterRegistry);
// Number of delete operations, by strategy
Counter.builder("bigkey.deleted")
.tag("strategy", "unlink")
.register(meterRegistry);
Counter.builder("bigkey.deleted")
.tag("strategy", "scan")
.register(meterRegistry);
Counter.builder("bigkey.deleted")
.tag("strategy", "rename_async")
.register(meterRegistry);
// Delete duration
Timer.builder("bigkey.delete.duration")
.register(meterRegistry);
// Number of elements removed per delete
DistributionSummary.builder("bigkey.delete.elements")
.register(meterRegistry);
}
}
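5.2 Prometheus Alert Rules
groups:
  - name: bigkey_alerts
    rules:
      - alert: BigKeyDetected
        # The counter only grows, so alert on recent increases rather than on the raw total
        expr: increase(bigkey_detected_total{level="danger"}[5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Danger-level big key detected"
          description: "A key exceeds the big-key threshold; handle it promptly"
      - alert: BigKeyDeleteSlow
        # A Micrometer Timer is exported as _count, _sum and _max series
        expr: bigkey_delete_duration_seconds_max > 10
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Big-key deletion is slow"
          description: "A big-key delete took longer than 10 seconds"
      - alert: BigKeyDeleteFailed
        # Assumes the timer records every attempt and the counter only successes
        expr: sum(rate(bigkey_delete_duration_seconds_count[5m])) - sum(rate(bigkey_deleted_total[5m])) > 0
        for: 1m
        labels:
          severity: error
        annotations:
          summary: "Big-key deletion failed"
          description: "One or more big-key delete operations failed"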
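6. Best Practices
6.1 Choosing a Delete Strategy
| Key type | Element count | Recommended strategy | Notes |
|---|---|---|---|
| Any | < 1000 | UNLINK | Delete directly; the performance impact is negligible |
| Hash/Set/ZSet | 1000-10000 | SCAN + batched delete | Incremental deletion in small batches avoids blocking |
| Hash/Set/ZSet | > 10000 | RENAME + async delete | Atomic rename, then background cleanup |
| List | > 10000 | LPOP/RPOP or LTRIM loop | Incremental removal from one end of the list |
| String | > 1MB | UNLINK | A String is a single allocation, so UNLINK frees it inline just as DEL would; either is quick |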
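The table can be read as a small decision function. The sketch below is one way to encode it; `DeleteStrategyGuide` and its enums are hypothetical names. Two points are assumptions rather than something the table states: boundary values follow the `>=` comparisons used in `evaluateLevel`, and Lists between 1000 and 10000 elements (a range the table leaves open) are routed to the pop/trim loop, as `SmartDeleteService` does.

```java
/** One possible encoding of the strategy table as a pure function. */
class DeleteStrategyGuide {
    enum Type { STRING, LIST, SET, ZSET, HASH }

    enum Strategy { UNLINK, SCAN_BATCH, RENAME_ASYNC, LIST_POP_LOOP }

    static final long WARNING_THRESHOLD = 1_000;
    static final long DANGER_THRESHOLD = 10_000;

    /** elementCount is ignored for STRING, which has no elements to iterate. */
    static Strategy choose(Type type, long elementCount) {
        if (type == Type.STRING || elementCount < WARNING_THRESHOLD) {
            return Strategy.UNLINK;
        }
        if (type == Type.LIST) {
            // Assumption: every List at or above the warning threshold is drained incrementally
            return Strategy.LIST_POP_LOOP;
        }
        return elementCount >= DANGER_THRESHOLD
                ? Strategy.RENAME_ASYNC
                : Strategy.SCAN_BATCH;
    }
}
```

For example, `choose(Type.SET, 5_000)` yields `SCAN_BATCH` and `choose(Type.ZSET, 10_000)` yields `RENAME_ASYNC`.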
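6.2 Preventing Big Keys
- Key design: avoid storing large amounts of data under a single key
- Data sharding: split large collections across several keys
- Regular cleanup: set expiration policies so stale data is removed automatically
- Monitoring and alerting: alert on big-key detection so problems surface early
- Deletion process: delete through automated tooling rather than by hand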
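Data sharding can be as simple as routing each field of a would-be big Hash to one of N smaller Hashes. A minimal sketch follows; `HashSharding`, `shardKey`, the `baseKey:bucket` naming and the use of CRC32 are illustrative assumptions, not something the article prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Routes each field of a logical hash to one of several smaller physical hashes. */
class HashSharding {
    /** Returns the physical key for a field, e.g. "user:profile:7" when shards = 16. */
    static String shardKey(String baseKey, String field, int shards) {
        if (shards <= 0) {
            throw new IllegalArgumentException("shards must be positive");
        }
        CRC32 crc = new CRC32();
        crc.update(field.getBytes(StandardCharsets.UTF_8));
        long bucket = crc.getValue() % shards; // getValue() is never negative
        return baseKey + ":" + bucket;
    }
}
```

A writer would then call `HSET shardKey(base, field, n) field value` instead of `HSET base field value`. The shard count has to stay fixed, or existing fields need to be migrated, because changing it changes where each field lives.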
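6.3 Caveats
- Redis version: the UNLINK command requires Redis 4.0+
- BIO thread limit: Redis dedicates a single background thread to lazy freeing, so unlinking many big keys in a burst can build up a backlog
- Memory release timing: UNLINK only hands the free job to the background thread; the memory is reclaimed once that job has run
- Monitoring the background thread: check the lazyfree_pending_objects field in the output of INFO memory
- Rename risk: the RENAME approach requires that the target key does not already exist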
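Redis reports the lazy-free backlog as `lazyfree_pending_objects` in the `INFO memory` section. A monitoring job has to pull that number out of the raw INFO text; the sketch below shows one way to do it. `InfoParser` and `field` are hypothetical helper names, and the sample text in the usage note is made up for illustration.

```java
/** Extracts numeric fields from the raw text returned by the INFO command. */
class InfoParser {
    /** Returns the value of "name:<number>" in the INFO output, or -1 if the field is absent. */
    static long field(String info, String name) {
        String prefix = name + ":";
        // INFO separates lines with CRLF; accept bare LF as well
        for (String line : info.split("\r?\n")) {
            if (line.startsWith(prefix)) {
                return Long.parseLong(line.substring(prefix.length()).trim());
            }
        }
        return -1;
    }
}
```

Given the text `"# Memory\r\nused_memory:1024\r\nlazyfree_pending_objects:42\r\n"`, `InfoParser.field(text, "lazyfree_pending_objects")` returns 42. A value that keeps growing suggests the background thread is falling behind the rate of UNLINK calls.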
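Discussion
Have you run into performance problems caused by deleting big keys in production? How did you solve them? Share your experience in the comments.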
