数据库死锁又双叒叕发生了?这5个排查技巧让你3分钟搞定!

数据库死锁又双叒叕发生了?这5个排查技巧让你3分钟搞定!

大家好,今天来聊个让无数后端开发瑟瑟发抖的问题——数据库死锁

想象一下这个场景:周五下午快下班了,你正准备提交代码,突然收到线上告警:"订单系统出现大量超时,疑似数据库死锁!" 你的内心是不是瞬间就崩溃了?更要命的是,用户投诉电话已经打爆了客服热线...

别慌!作为一个和死锁斗争了多年的老后端,今天就给你一套"死锁排查5连招",让你下次遇到这种情况能淡定地说:"小意思,马上解决!"

一、死锁的4种"死法",你中招的是哪种?

首先搞清楚什么是死锁,以及死锁都是怎么发生的,知己知彼才能百战不殆。

1. 经典死锁 - 两个事务互相等待

症状:两个或多个事务互相持有对方需要的资源,形成环形等待:

-- 事务A
BEGIN;
UPDATE order SET status = 1 WHERE id = 100;  -- 锁住order表的id=100记录
UPDATE user SET balance = balance - 100 WHERE id = 50;  -- 等待user表的id=50记录

-- 事务B(同时执行)
BEGIN;
UPDATE user SET balance = balance + 100 WHERE id = 50;  -- 锁住user表的id=50记录
UPDATE order SET amount = 100 WHERE id = 100;  -- 等待order表的id=100记录

结果:两个事务互相等待,形成死锁,数据库会自动杀掉其中一个事务。

2. 索引死锁 - 不同索引顺序导致的死锁

症状:使用不同的索引顺序访问相同的数据:

-- 事务A:按主键索引访问
SELECT * FROM product WHERE id IN (1, 2, 3) FOR UPDATE;

-- 事务B:按普通索引访问  
SELECT * FROM product WHERE category_id = 10 FOR UPDATE;  -- 可能包含id=1,2,3的数据

常见场景

  • 批量更新时使用不同的排序
  • 复合索引的使用顺序不当
  • 唯一键约束冲突

3. 间隙锁死锁 - 范围查询引起的死锁

症状:范围查询时的间隙锁冲突:

-- 事务A
SELECT * FROM order WHERE create_time BETWEEN '2023-01-01' AND '2023-01-02' FOR UPDATE;

-- 事务B
INSERT INTO order (id, create_time, amount) VALUES (1001, '2023-01-01 12:00:00', 100);

常见场景

  • 分页查询加锁
  • 范围查询后插入数据
  • 唯一索引的间隙锁

4. 外键死锁 - 外键约束导致的死锁

症状:外键约束检查时的死锁:

-- 表结构
CREATE TABLE order_item (
  id INT PRIMARY KEY,
  order_id INT,
  FOREIGN KEY (order_id) REFERENCES orders(id)
);

-- 事务A
DELETE FROM orders WHERE id = 100;  -- 需要检查order_item表

-- 事务B
INSERT INTO order_item (order_id) VALUES (100);  -- 需要检查orders表

二、5个排查技巧:从现象到根因定位

第1步:快速确认是否发生死锁

首先要确认系统是否真的发生了死锁:

-- MySQL查看当前死锁信息
SHOW ENGINE INNODB STATUS;

-- 查看死锁历史(MySQL 8.0+)
SELECT * FROM performance_schema.data_lock_waits;

-- 查看当前锁等待情况
SELECT * FROM information_schema.INNODB_TRX;
SELECT * FROM information_schema.INNODB_LOCKS;
SELECT * FROM information_schema.INNODB_LOCK_WAITS;

关键信息解读

# 死锁检测结果示例
------------------------
LATEST DETECTED DEADLOCK
------------------------
2023-12-01 14:30:15 0x7f8b8c000700
*** (1) TRANSACTION:
TRANSACTION 421394138, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 123, OS thread handle 140241391580928, query id 1234567 localhost root updating
UPDATE order SET status = 1 WHERE id = 100

*** (2) TRANSACTION:
TRANSACTION 421394139, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1  
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 124, OS thread handle 140241391326464, query id 1234568 localhost root updating
UPDATE user SET balance = balance - 100 WHERE id = 50

*** WE ROLL BACK TRANSACTION (2)

第2步:分析死锁日志,定位具体SQL

从死锁日志中提取关键信息:

// Java代码中开启死锁监控
@Component
public class DeadlockMonitor {
    
    private static final Logger logger = LoggerFactory.getLogger(DeadlockMonitor.class);
    
    @Autowired
    private DataSource dataSource;
    
    @Scheduled(fixedRate = 30000) // 每30秒检查一次
    public void checkDeadlock() {
        try (Connection conn = dataSource.getConnection();
             Statement stmt = conn.createStatement()) {
            
            ResultSet rs = stmt.executeQuery("SHOW ENGINE INNODB STATUS");
            if (rs.next()) {
                String status = rs.getString("Status");
                if (status.contains("LATEST DETECTED DEADLOCK")) {
                    // 提取死锁信息
                    String deadlockInfo = extractDeadlockInfo(status);
                    logger.error("检测到数据库死锁: {}", deadlockInfo);
                    
                    // 发送告警
                    sendDeadlockAlert(deadlockInfo);
                }
            }
        } catch (SQLException e) {
            logger.error("检查死锁状态失败", e);
        }
    }
    
    private String extractDeadlockInfo(String status) {
        // 解析死锁信息,提取关键SQL语句和表信息
        Pattern pattern = Pattern.compile("LATEST DETECTED DEADLOCK.*?(?=\\-{10,}|$)", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(status);
        return matcher.find() ? matcher.group(0) : "";
    }
}

第3步:代码层面排查死锁原因

检查应用代码中可能导致死锁的模式:

// 典型的死锁代码模式1:锁顺序不一致
public class BadDeadlockExample {
    
    // 方法A:先锁order后锁user
    @Transactional
    public void transferMoney1(Long orderId, Long userId, BigDecimal amount) {
        // 先更新订单
        Order order = orderMapper.selectById(orderId);
        order.setAmount(order.getAmount().add(amount));
        orderMapper.updateById(order);  // 锁住order表记录
        
        // 再更新用户余额
        User user = userMapper.selectById(userId);
        user.setBalance(user.getBalance().subtract(amount));
        userMapper.updateById(user);    // 锁住user表记录
    }
    
    // 方法B:先锁user后锁order  
    @Transactional
    public void transferMoney2(Long userId, Long orderId, BigDecimal amount) {
        // 先更新用户余额
        User user = userMapper.selectById(userId);
        user.setBalance(user.getBalance().add(amount));
        userMapper.updateById(user);    // 锁住user表记录
        
        // 再更新订单
        Order order = orderMapper.selectById(orderId);
        order.setAmount(order.getAmount().subtract(amount));
        orderMapper.updateById(order);  // 锁住order表记录
    }
}

// 正确的做法:保证锁顺序一致
public class GoodDeadlockExample {
    
    @Transactional
    public void transferMoney(Long orderId, Long userId, BigDecimal amount) {
        // 统一按ID大小顺序获取锁
        if (orderId.compareTo(userId) < 0) {
            lockOrderThenUser(orderId, userId, amount);
        } else {
            lockUserThenOrder(userId, orderId, amount);
        }
    }
    
    private void lockOrderThenUser(Long orderId, Long userId, BigDecimal amount) {
        Order order = orderMapper.selectById(orderId);
        order.setAmount(order.getAmount().add(amount));
        orderMapper.updateById(order);
        
        User user = userMapper.selectById(userId);
        user.setBalance(user.getBalance().subtract(amount));
        userMapper.updateById(user);
    }
    
    private void lockUserThenOrder(Long userId, Long orderId, BigDecimal amount) {
        User user = userMapper.selectById(userId);
        user.setBalance(user.getBalance().add(amount));
        userMapper.updateById(user);
        
        Order order = orderMapper.selectById(orderId);
        order.setAmount(order.getAmount().subtract(amount));
        orderMapper.updateById(order);
    }
}

第4步:数据库层面深入分析

使用数据库工具深入分析锁信息:

-- 查看当前正在执行的事务
SELECT 
    trx_id,
    trx_state,
    trx_started,
    trx_query,
    trx_tables_locked,
    trx_rows_locked,
    trx_rows_modified
FROM information_schema.INNODB_TRX
ORDER BY trx_started;

-- 查看锁等待关系
SELECT 
    waiting_trx_id,
    waiting_pid,
    waiting_query,
    blocking_trx_id,
    blocking_pid,
    blocking_query
FROM sys.innodb_lock_waits;

-- 查看表锁信息
SELECT 
    object_schema,
    object_name,
    lock_type,
    lock_mode,
    lock_status,
    thread_id
FROM performance_schema.data_locks
WHERE object_schema = 'your_database_name';

实战技巧

// 自定义死锁分析工具
@Service
public class DeadlockAnalyzer {
    
    public DeadlockAnalysisResult analyzeDeadlock() {
        DeadlockAnalysisResult result = new DeadlockAnalysisResult();
        
        // 1. 获取当前事务信息
        List<TransactionInfo> transactions = getCurrentTransactions();
        result.setTransactions(transactions);
        
        // 2. 获取锁等待信息
        List<LockWaitInfo> lockWaits = getLockWaits();
        result.setLockWaits(lockWaits);
        
        // 3. 分析死锁模式
        DeadlockPattern pattern = identifyDeadlockPattern(transactions, lockWaits);
        result.setPattern(pattern);
        
        // 4. 生成解决建议
        List<String> suggestions = generateSuggestions(pattern);
        result.setSuggestions(suggestions);
        
        return result;
    }
    
    private DeadlockPattern identifyDeadlockPattern(
            List<TransactionInfo> transactions, 
            List<LockWaitInfo> lockWaits) {
        
        // 分析死锁模式:
        // 1. 锁顺序死锁
        // 2. 索引死锁  
        // 3. 间隙锁死锁
        // 4. 外键死锁
        
        return DeadlockPattern.LOCK_ORDER_DEADLOCK;
    }
}

第5步:制定解决方案并验证

根据分析结果制定具体的解决方案:

// 死锁解决方案工具类
public class DeadlockSolutionUtils {
    
    // 方案1:统一锁顺序
    public static void acquireLocksInOrder(List<Long> resourceIds, 
                                         Function<Long, Void> lockFunction) {
        // 按ID排序,保证锁顺序一致
        resourceIds.stream()
                   .sorted()
                   .forEach(lockFunction);
    }
    
    // 方案2:重试机制
    @Retryable(value = {DeadlockLoserDataAccessException.class}, 
               maxAttempts = 3, 
               backoff = @Backoff(delay = 100))
    public void executeWithDeadlockRetry(Runnable operation) {
        operation.run();
    }
    
    // 方案3:锁超时设置
    public void executeWithLockTimeout(Runnable operation, int timeoutSeconds) {
        try (Connection conn = dataSource.getConnection()) {
            // 设置锁等待超时时间
            conn.createStatement().execute(
                "SET SESSION innodb_lock_wait_timeout = " + timeoutSeconds);
            operation.run();
        } catch (SQLException e) {
            throw new RuntimeException("执行失败", e);
        }
    }
    
    // 方案4:降低隔离级别(谨慎使用)
    @Transactional(isolation = Isolation.READ_COMMITTED)
    public void executeWithLowerIsolation(Runnable operation) {
        operation.run();
    }
}

三、实战案例:3个真实的死锁踩坑经历

案例1:电商平台订单支付死锁

背景:双11期间,订单支付接口出现大量死锁,导致用户无法完成支付。

死锁现象

-- 死锁SQL1:更新订单状态
UPDATE orders SET status = 'PAID', paid_time = NOW() WHERE id = 12345;

-- 死锁SQL2:更新用户余额  
UPDATE users SET balance = balance - 299.00 WHERE id = 67890;

排查过程

  1. 现象确认:支付接口响应时间从100ms飙升到30秒,大量超时
  2. 死锁日志分析:发现两个事务互相等待订单表和用户表的锁
  3. 代码审查:发现支付流程中,不同的支付方式获取锁的顺序不同

根因定位

// 问题代码:不同支付方式锁顺序不一致
public class PaymentService {
    
    // 余额支付:先锁用户,再锁订单
    public void payByBalance(Long orderId, Long userId) {
        User user = userMapper.selectByIdForUpdate(userId);    // 先锁用户
        Order order = orderMapper.selectByIdForUpdate(orderId); // 再锁订单
        // 业务逻辑...
    }
    
    // 积分支付:先锁订单,再锁用户
    public void payByPoints(Long orderId, Long userId) {
        Order order = orderMapper.selectByIdForUpdate(orderId); // 先锁订单  
        User user = userMapper.selectByIdForUpdate(userId);     // 再锁用户
        // 业务逻辑...
    }
}

解决方案

// 修复:统一锁顺序
public class PaymentService {
    
    public void payByBalance(Long orderId, Long userId) {
        doPayment(orderId, userId, PaymentType.BALANCE);
    }
    
    public void payByPoints(Long orderId, Long userId) {
        doPayment(orderId, userId, PaymentType.POINTS);
    }
    
    private void doPayment(Long orderId, Long userId, PaymentType type) {
        // 统一按ID大小顺序获取锁
        if (orderId.compareTo(userId) < 0) {
            Order order = orderMapper.selectByIdForUpdate(orderId);
            User user = userMapper.selectByIdForUpdate(userId);
            processPayment(order, user, type);
        } else {
            User user = userMapper.selectByIdForUpdate(userId);
            Order order = orderMapper.selectByIdForUpdate(orderId);
            processPayment(order, user, type);
        }
    }
}

效果:支付死锁率从5%降到0.01%,支付成功率提升至99.9%。

案例2:库存扣减批量死锁

背景:秒杀活动中,批量扣减库存时频繁出现死锁。

根因定位

// 问题代码:批量更新时顺序随机
public void batchReduceStock(List<Long> productIds, Map<Long, Integer> quantities) {
    for (Long productId : productIds) {  // List顺序可能随机
        Product product = productMapper.selectByIdForUpdate(productId);
        product.setStock(product.getStock() - quantities.get(productId));
        productMapper.updateById(product);
    }
}

解决方案

// 修复:排序后批量处理
public void batchReduceStock(List<Long> productIds, Map<Long, Integer> quantities) {
    // 按productId排序,保证锁顺序一致
    List<Long> sortedIds = productIds.stream()
                                   .sorted()
                                   .collect(Collectors.toList());
    
    for (Long productId : sortedIds) {
        Product product = productMapper.selectByIdForUpdate(productId);
        product.setStock(product.getStock() - quantities.get(productId));
        productMapper.updateById(product);
    }
}

案例3:外键约束死锁

背景:删除订单时,外键约束检查导致死锁。

解决方案

// 方案1:调整删除顺序
@Transactional
public void deleteOrder(Long orderId) {
    // 先删除子表数据
    orderItemMapper.deleteByOrderId(orderId);
    // 再删除主表数据
    orderMapper.deleteById(orderId);
}

// 方案2:使用逻辑删除
@Transactional  
public void deleteOrder(Long orderId) {
    // 标记删除,避免外键约束检查
    orderMapper.markAsDeleted(orderId);
    orderItemMapper.markAsDeletedByOrderId(orderId);
}

四、预防死锁的6个黄金法则

1. 统一资源访问顺序

// 最佳实践:资源排序工具类
public class ResourceLockUtils {
    
    public static <T extends Comparable<T>> void executeInOrder(
            List<T> resources, Function<T, Void> operation) {
        resources.stream()
                .sorted()
                .forEach(operation);
    }
    
    // 多表更新时的标准模式
    public void updateMultipleTables(Long id1, Long id2) {
        List<Long> ids = Arrays.asList(id1, id2);
        executeInOrder(ids, this::updateRecord);
    }
}

2. 缩短事务时间

// 避免长事务
@Transactional
public void badLongTransaction() {
    // 不要在事务中做这些:
    // 1. 调用外部接口
    // 2. 复杂计算  
    // 3. 文件IO操作
    // 4. 等待用户输入
}

// 正确做法:拆分事务
public void goodShortTransaction() {
    // 先做准备工作(事务外)
    PrepareResult result = prepareData();
    
    // 快速执行数据库操作
    executeInTransaction(result);
}

3. 合理设置锁超时

-- MySQL设置锁等待超时
SET SESSION innodb_lock_wait_timeout = 5;  -- 5秒超时

-- 应用中设置查询超时
@Query(value = "SELECT * FROM order WHERE id = ?", timeout = 3)
Order findByIdWithTimeout(Long id);

4. 使用合适的隔离级别

// 根据业务需求选择隔离级别
@Transactional(isolation = Isolation.READ_COMMITTED)  // 降低死锁概率
public void queryOperation() {
    // 只读操作,不需要REPEATABLE_READ
}

@Transactional(isolation = Isolation.REPEATABLE_READ) // 保证数据一致性
public void criticalOperation() {
    // 关键业务操作,需要更高隔离级别
}

5. 死锁重试机制

// Spring重试注解
@Retryable(
    value = {DeadlockLoserDataAccessException.class},
    maxAttempts = 3,
    backoff = @Backoff(delay = 100, multiplier = 2)
)
public void operationWithRetry() {
    // 可能发生死锁的操作
}

// 手动重试实现
public void executeWithRetry(Runnable operation, int maxRetries) {
    int attempts = 0;
    while (attempts < maxRetries) {
        try {
            operation.run();
            return; // 成功执行,退出
        } catch (DeadlockLoserDataAccessException e) {
            attempts++;
            if (attempts >= maxRetries) {
                throw e; // 重试次数用完,抛出异常
            }
            try {
                Thread.sleep(100 * attempts); // 退避等待
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(ie);
            }
        }
    }
}

6. 监控和告警

// 死锁监控组件
@Component
public class DeadlockMonitoringService {
    
    private final MeterRegistry meterRegistry;
    private final AlertService alertService;
    
    @EventListener
    public void handleDeadlock(DeadlockEvent event) {
        // 记录死锁指标
        meterRegistry.counter("database.deadlock", 
                            "table", event.getTableName(),
                            "type", event.getDeadlockType())
                    .increment();
        
        // 死锁率超过阈值时告警
        double deadlockRate = calculateDeadlockRate();
        if (deadlockRate > 0.01) {  // 1%
            alertService.sendAlert("数据库死锁率过高: " + deadlockRate);
        }
    }
    
    @Scheduled(fixedRate = 60000) // 每分钟检查
    public void checkDeadlockMetrics() {
        // 定期检查死锁指标
    }
}

五、死锁排查工具箱

1. 数据库监控查询

-- MySQL死锁检测查询合集
-- 1. 查看死锁日志
SHOW ENGINE INNODB STATUS\G

-- 2. 当前事务信息
SELECT 
    trx_id,
    trx_state,
    trx_started,
    trx_isolation_level,
    trx_query
FROM information_schema.INNODB_TRX;

-- 3. 锁等待信息
SELECT 
    r.trx_id waiting_trx_id,
    r.trx_mysql_thread_id waiting_thread,
    r.trx_query waiting_query,
    b.trx_id blocking_trx_id,
    b.trx_mysql_thread_id blocking_thread,
    b.trx_query blocking_query
FROM information_schema.innodb_lock_waits w
INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;

-- 4. 查看表锁状态
SHOW OPEN TABLES WHERE In_use > 0;

2. Java应用监控代码

// 死锁检测和分析工具
@Service
public class DeadlockDetectionService {
    
    public DeadlockReport generateDeadlockReport() {
        DeadlockReport report = new DeadlockReport();
        
        // 1. 获取死锁历史
        report.setDeadlockHistory(getDeadlockHistory());
        
        // 2. 分析高频死锁表
        report.setHighFrequencyTables(analyzeHighFrequencyTables());
        
        // 3. 识别死锁模式
        report.setDeadlockPatterns(identifyDeadlockPatterns());
        
        // 4. 生成优化建议
        report.setOptimizationSuggestions(generateOptimizationSuggestions());
        
        return report;
    }
}

3. 死锁日志解析工具

// 死锁日志解析器
public class DeadlockLogParser {
    
    public DeadlockInfo parseDeadlockLog(String deadlockLog) {
        DeadlockInfo info = new DeadlockInfo();
        
        // 解析事务信息
        List<TransactionInfo> transactions = parseTransactions(deadlockLog);
        info.setTransactions(transactions);
        
        // 解析锁信息
        List<LockInfo> locks = parseLocks(deadlockLog);
        info.setLocks(locks);
        
        // 解析回滚信息
        String rolledBackTransaction = parseRolledBackTransaction(deadlockLog);
        info.setRolledBackTransaction(rolledBackTransaction);
        
        return info;
    }
    
    private List<TransactionInfo> parseTransactions(String log) {
        // 使用正则表达式解析事务信息
        Pattern pattern = Pattern.compile(
            "\\*\\*\\* \\((\\d+)\\) TRANSACTION:.*?TRANSACTION (\\d+).*?query id (\\d+).*?\\n(.+?)(?=\\*\\*\\*|$)", 
            Pattern.DOTALL);
        Matcher matcher = pattern.matcher(log);
        
        List<TransactionInfo> transactions = new ArrayList<>();
        while (matcher.find()) {
            TransactionInfo txn = new TransactionInfo();
            txn.setTransactionNumber(Integer.parseInt(matcher.group(1)));
            txn.setTransactionId(Long.parseLong(matcher.group(2)));
            txn.setQueryId(Long.parseLong(matcher.group(3)));
            txn.setQuery(matcher.group(4).trim());
            transactions.add(txn);
        }
        
        return transactions;
    }
}

六、总结

数据库死锁问题,说到底就是4句话:

  1. 预防为主:统一锁顺序,缩短事务时间
  2. 快速定位:善用工具,分析死锁日志
  3. 对症下药:针对不同死锁类型采用不同解决方案
  4. 监控到位:建立死锁监控,及时发现问题

记住老司机的口诀:"一检测二分析三修复四预防",下次遇到数据库死锁不慌!

最后提醒一句:死锁虽然可怕,但并不是无解的。关键是要理解死锁的本质,掌握正确的排查方法,建立完善的预防机制。希望这篇文章能帮助大家在面对死锁时更加从容!


关注我,不迷路,持续分享后端技术干货!
点赞、评论、转发,是我创作的最大动力!
公众号:服务端技术精选

声明:本文原创,转载请注明出处。


标题:数据库死锁又双叒叕发生了?这5个排查技巧让你3分钟搞定!
作者:jiangyi
地址:http://www.jiangyi.space/articles/2025/12/21/1766304291123.html

    0 评论
avatar