SpringBoot + Connection Pool Timeout Troubleshooting: HikariCP Acquisition Timing Out? Auto-Dump Thread Stacks to Locate It

I. The Pain of Connection Acquisition Timeouts

Last week, a friend who works on a payment system complained that their interfaces kept timing out during peak hours, and the logs were full of "Connection is not available, request timed out after 30000ms" errors.

"We've already raised the pool maximum to 50," he said, "but it still isn't enough. Every peak period something breaks."

I took a look at their system, and the problem was indeed serious:

  • Peak TPS of roughly 2000
  • Database connection pool capped at 50
  • Average interface latency of 150 ms
  • With each connection held about 150 ms per use, 50 connections can theoretically serve around 333 requests per second, far short of the 2000 TPS peak
  • Each request hits the database 3-5 times
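The capacity gap in the list above is simple arithmetic. A minimal back-of-the-envelope check (the numbers are the ones from the case; treat them as illustrative):

```java
public class PoolCapacity {
    public static void main(String[] args) {
        int maxPoolSize = 50;   // pool maximum
        int avgHoldMs = 150;    // average time one request holds a connection

        // each connection can serve 1000 / 150 ≈ 6.7 requests per second,
        // so the whole pool tops out at maxPoolSize * 1000 / avgHoldMs
        int capacity = maxPoolSize * 1000 / avgHoldMs;

        System.out.println("theoretical capacity: " + capacity + " req/s");
        System.out.println("peak demand: 2000 req/s");
    }
}
```

Even before counting the 3-5 database round trips per request, the pool can cover barely a sixth of the peak load.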

Worse, they had no idea where the connections were going. Were slow SQL queries hogging them, or was there a connection leak?

II. Limitations of Traditional Troubleshooting

1. Manually Reading Logs

Scan the logs for slow endpoints and long-running SQL.

Problems with this approach:

  • Lag: the problem has already happened, so you can only analyze after the fact
  • Incomplete information: you know an endpoint was slow, not what the whole system was doing at the time
  • No root cause: you see symptoms, not causes
  • Labor-intensive: log analysis takes a lot of time

2. Manually Dumping Threads

Use jstack or Arthas to dump thread stacks by hand and look for threads waiting on connections.

Problems with this approach:

  • Bad timing: the problem is transient, and a manual dump may miss the critical moment
  • Performance impact: the dump operation itself consumes system resources
  • Fragmented information: it takes several dumps to reconstruct the full picture
  • High skill bar: interpreting thread dumps requires experience

3. Tuning Pool Parameters

Mitigate the problem by raising the maximum pool size, shortening timeouts, and so on.

Problems with this approach:

  • Treats symptoms, not causes: it postpones the problem without solving it
  • Wastes resources: too many connections put extra load on the database
  • Adds complexity: there are many knobs, and tuning them well takes real experience
  • May create new problems: a bad parameter change can break something else

III. The Solution: Automatic Thread Dumps on HikariCP Acquisition Timeout

Today I want to share a solution that has been validated in production: an automatic thread-dump mechanism triggered when HikariCP connection acquisition times out.

The core ideas:

  1. Automatic monitoring: watch the pool's state in real time and catch anomalies early
  2. Automatic dumps: when connection acquisition times out, dump the thread stacks automatically
  3. Built-in analysis: analyze the dump automatically to locate the cause of the blockage
  4. Multi-channel alerting: send alerts through several channels so the team is notified promptly

IV. The Solution in Detail

1. Core Principle

HikariCP hands out connections roughly as follows:

Thread requests a connection
    ↓
Is there an idle connection in the pool?
    ↓
Yes → return it immediately
No  → wait for one to be freed (bounded by the configured connection-timeout)
    ↓
Wait times out  → throw an exception
Wait succeeds   → return the connection
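The waiting step behaves like a bounded acquire on a shared resource. As a rough stdlib analogy (a Semaphore standing in for the pool; HikariCP's actual implementation uses its ConcurrentBag, not a semaphore):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class AcquireTimeoutDemo {
    public static void main(String[] args) throws InterruptedException {
        Semaphore pool = new Semaphore(1);   // a "pool" with a single connection
        pool.acquire();                      // another request already holds it

        // like HikariCP, wait at most connection-timeout for a free connection
        boolean acquired = pool.tryAcquire(100, TimeUnit.MILLISECONDS);
        if (!acquired) {
            // at this point HikariCP throws SQLTransientConnectionException
            System.out.println("timed out waiting for a connection");
        }
    }
}
```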

When acquisition times out, we want to know:

  • The pool's current state (active connections, idle connections, waiting threads)
  • Which threads are holding connections
  • Which threads are waiting for connections
  • What SQL the held connections are executing

2. SpringBoot Implementation

(1) Monitor configuration class

@Configuration
@ConfigurationProperties(prefix = "hikari-cp-monitor")
@Data
public class HikariCPMonitorConfig {
    private boolean enabled = true;
    private int dumpThresholdMs = 30000;
    private int dumpIntervalMs = 60000;
    private int maxDumpCount = 10;
    private String dumpPath = "/tmp/hikari-dump";
    private AlertConfig alert = new AlertConfig();

    @Data
    public static class AlertConfig {
        private boolean enabled = true;
        private String email;
        private String webhook;
    }
}

(2) Pool monitoring service

@Service
@Slf4j
public class HikariCPMonitorService {
    private static final Logger auditLog = LoggerFactory.getLogger("HIKARI_AUDIT");

    @Autowired
    private HikariDataSource hikariDataSource;

    @Autowired
    private HikariCPMonitorConfig config;

    @Autowired
    private ThreadDumpService threadDumpService;

    @Autowired
    private AlertService alertService;

    private final AtomicInteger dumpCount = new AtomicInteger(0);
    private volatile long lastDumpTime = 0;

    public void checkConnectionPool() {
        if (!config.isEnabled()) {
            return;
        }

        HikariPool pool = getHikariPool();
        if (pool == null) {
            return;
        }

        // HikariPool itself implements HikariPoolMXBean; there is no getMBean() method
        HikariPoolMXBean poolBean = pool;
        int activeConnections = poolBean.getActiveConnections();
        int idleConnections = poolBean.getIdleConnections();
        int totalConnections = poolBean.getTotalConnections();
        int threadsAwaitingConnection = poolBean.getThreadsAwaitingConnection();

        long waitTime = System.currentTimeMillis() - lastDumpTime;

        if (threadsAwaitingConnection > 0 && waitTime >= config.getDumpIntervalMs()) {
            if (dumpCount.get() < config.getMaxDumpCount()) {
                performThreadDump(pool, poolBean, activeConnections, idleConnections, totalConnections, threadsAwaitingConnection);
            }
        }

        logConnectionPoolStatus(activeConnections, idleConnections, totalConnections, threadsAwaitingConnection);
    }

    private void performThreadDump(HikariPool pool, HikariPoolMXBean poolBean,
                                   int activeConnections, int idleConnections,
                                   int totalConnections, int threadsAwaitingConnection) {
        lastDumpTime = System.currentTimeMillis();
        int currentDumpCount = dumpCount.incrementAndGet();

        String dumpFile = String.format("%s/hikari-dump-%d-%d.txt",
                                        config.getDumpPath(),
                                        System.currentTimeMillis(),
                                        currentDumpCount);

        try {
            String threadDump = threadDumpService.generateThreadDump();

            StringBuilder report = new StringBuilder();
            report.append("=== HikariCP Connection Pool Timeout Report ===\n");
            report.append(String.format("Timestamp: %s\n", LocalDateTime.now()));
            report.append(String.format("Dump Count: %d\n", currentDumpCount));
            report.append(String.format("Active Connections: %d\n", activeConnections));
            report.append(String.format("Idle Connections: %d\n", idleConnections));
            report.append(String.format("Total Connections: %d\n", totalConnections));
            report.append(String.format("Threads Awaiting Connection: %d\n", threadsAwaitingConnection));
            report.append("\n=== Connection Pool Config ===\n");
            // pool-size limits and timeouts live on HikariConfig (which HikariDataSource
            // extends), not on the pool MXBean
            report.append(String.format("Maximum Pool Size: %d\n", hikariDataSource.getMaximumPoolSize()));
            report.append(String.format("Minimum Idle: %d\n", hikariDataSource.getMinimumIdle()));
            report.append(String.format("Connection Timeout: %dms\n", hikariDataSource.getConnectionTimeout()));
            report.append("\n=== Thread Dump ===\n");
            report.append(threadDump);

            Files.createDirectories(Paths.get(config.getDumpPath())); // make sure the dump directory exists
            Files.write(Paths.get(dumpFile), report.toString().getBytes());
            auditLog.warn("Connection pool dump saved to: {}", dumpFile);

            sendAlert(dumpFile, activeConnections, idleConnections, totalConnections, threadsAwaitingConnection);

        } catch (Exception e) {
            log.error("Failed to perform thread dump", e);
        }
    }

    private void logConnectionPoolStatus(int active, int idle, int total, int awaiting) {
        if (log.isDebugEnabled()) {
            log.debug("HikariCP Status - Active: {}, Idle: {}, Total: {}, Awaiting: {}",
                     active, idle, total, awaiting);
        }

        auditLog.info("HikariCP - Active: {}, Idle: {}, Total: {}, Awaiting: {}",
                     active, idle, total, awaiting);
    }

    private void sendAlert(String dumpFile, int active, int idle, int total, int awaiting) {
        if (!config.getAlert().isEnabled()) {
            return;
        }

        String subject = "HikariCP Connection Pool Timeout Alert";
        String content = String.format(
            "Connection pool timeout detected!\n\n" +
            "Active Connections: %d\n" +
            "Idle Connections: %d\n" +
            "Total Connections: %d\n" +
            "Threads Awaiting: %d\n\n" +
            "Dump file: %s",
            active, idle, total, awaiting, dumpFile
        );

        alertService.sendAlert(subject, content);
    }

    private HikariPool getHikariPool() {
        try {
            // the "pool" field is private, so reflection is needed here; if only the
            // MXBean is required, hikariDataSource.getHikariPoolMXBean() avoids this
            Field poolField = HikariDataSource.class.getDeclaredField("pool");
            poolField.setAccessible(true);
            return (HikariPool) poolField.get(hikariDataSource);
        } catch (Exception e) {
            log.error("Failed to get HikariPool", e);
            return null;
        }
    }
}

(3) Thread dump service

@Service
@Slf4j
public class ThreadDumpService {

    public String generateThreadDump() {
        StringBuilder sb = new StringBuilder();

        Map<Thread, StackTraceElement[]> allThreads = Thread.getAllStackTraces();

        sb.append(String.format("Total threads: %d\n\n", allThreads.size()));

        for (Map.Entry<Thread, StackTraceElement[]> entry : allThreads.entrySet()) {
            Thread thread = entry.getKey();
            StackTraceElement[] stackTrace = entry.getValue();

            sb.append(String.format("Thread: %s (ID: %d, State: %s)\n",
                                  thread.getName(),
                                  thread.getId(),
                                  thread.getState()));

            if (isDatabaseThread(thread, stackTrace)) {
                sb.append("  [DATABASE THREAD]\n");
            }

            for (StackTraceElement element : stackTrace) {
                sb.append(String.format("    at %s.%s(%s:%d)\n",
                                       element.getClassName(),
                                       element.getMethodName(),
                                       element.getFileName(),
                                       element.getLineNumber()));
            }
            sb.append("\n");
        }

        return sb.toString();
    }

    public String analyzeThreadDump(String threadDump) {
        StringBuilder analysis = new StringBuilder();
        analysis.append("=== Thread Dump Analysis ===\n\n");

        int dbThreadCount = 0;
        int waitingThreads = 0;
        int blockedThreads = 0;

        String[] lines = threadDump.split("\n");
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (line.contains("[DATABASE THREAD]")) {
                dbThreadCount++;
            }
            if (line.contains("WAITING") || line.contains("TIMED_WAITING")) {
                waitingThreads++;
            }
            if (line.contains("BLOCKED")) {
                blockedThreads++;
            }
        }

        analysis.append(String.format("Database Threads: %d\n", dbThreadCount));
        analysis.append(String.format("Waiting Threads: %d\n", waitingThreads));
        analysis.append(String.format("Blocked Threads: %d\n\n", blockedThreads));

        analysis.append("=== Slow SQL Analysis ===\n");
        analysis.append(analyzeSlowSQL(threadDump));

        return analysis.toString();
    }

    private boolean isDatabaseThread(Thread thread, StackTraceElement[] stackTrace) {
        String threadName = thread.getName().toLowerCase();
        if (threadName.contains("hikari") || threadName.contains("mysql") || threadName.contains("mariadb")) {
            return true;
        }

        for (StackTraceElement element : stackTrace) {
            String className = element.getClassName().toLowerCase();
            if (className.contains("com.zaxxer.hikari") ||
                className.contains("com.mysql") ||
                className.contains("org.hibernate")) {
                return true;
            }
        }
        return false;
    }

    private String analyzeSlowSQL(String threadDump) {
        StringBuilder result = new StringBuilder();

        // Note: raw stack traces rarely contain SQL text, so this only catches SQL
        // that happens to appear in thread names or toString() output. For reliable
        // SQL capture, pair this with a datasource proxy (e.g. p6spy) or slow-query logs.
        Pattern sqlPattern = Pattern.compile("(SELECT|INSERT|UPDATE|DELETE).*?;",
                                           Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher matcher = sqlPattern.matcher(threadDump);

        int count = 0;
        while (matcher.find() && count < 10) {
            result.append(String.format("%d. %s\n\n", count + 1, matcher.group()));
            count++;
        }

        if (count == 0) {
            result.append("No obvious SQL statements found in thread dump.\n");
        }

        return result.toString();
    }
}

(4) Alert service

@Service
@Slf4j
public class AlertService {

    @Autowired
    private HikariCPMonitorConfig config;

    @Value("${spring.mail.username:}")
    private String fromEmail;

    @Autowired
    private JavaMailSender mailSender;

    public void sendAlert(String subject, String content) {
        if (!config.getAlert().isEnabled()) {
            return;
        }

        if (config.getAlert().getEmail() != null) {
            sendEmail(subject, content);
        }

        if (config.getAlert().getWebhook() != null) {
            sendWebhook(subject, content);
        }
    }

    private void sendEmail(String subject, String content) {
        try {
            SimpleMailMessage message = new SimpleMailMessage();
            message.setFrom(fromEmail);
            message.setTo(config.getAlert().getEmail());
            message.setSubject("[HikariCP Alert] " + subject);
            message.setText(content);
            mailSender.send(message);
            log.info("Alert email sent successfully");
        } catch (Exception e) {
            log.error("Failed to send alert email", e);
        }
    }

    private void sendWebhook(String subject, String content) {
        // Placeholder: POST subject/content to config.getAlert().getWebhook()
        // with an HTTP client (e.g. RestTemplate or WebClient).
        log.info("Webhook alert [{}]: {}", subject, content);
    }
}

(5) Scheduled monitoring task

@Component
@Slf4j // provides the "log" field used in the catch block below
public class HikariCPMonitorScheduler {

    @Autowired
    private HikariCPMonitorService monitorService;

    // requires @EnableScheduling on a configuration class to take effect
    @Scheduled(fixedRate = 5000)
    public void monitor() {
        try {
            monitorService.checkConnectionPool();
        } catch (Exception e) {
            log.error("Error during connection pool monitoring", e);
        }
    }
}

(6) Pool health-check controller

@RestController
@RequestMapping("/api/hikari")
@Slf4j
public class HikariController {

    @Autowired
    private HikariDataSource hikariDataSource;

    @Autowired
    private ThreadDumpService threadDumpService;

    @GetMapping("/status")
    public ResponseEntity<Map<String, Object>> getStatus() {
        try {
            // HikariDataSource exposes the pool's MXBean directly; no reflection needed
            HikariPoolMXBean poolBean = hikariDataSource.getHikariPoolMXBean();

            Map<String, Object> status = new HashMap<>();
            status.put("activeConnections", poolBean.getActiveConnections());
            status.put("idleConnections", poolBean.getIdleConnections());
            status.put("totalConnections", poolBean.getTotalConnections());
            status.put("threadsAwaitingConnection", poolBean.getThreadsAwaitingConnection());
            // pool-size limits come from HikariConfig, not the MXBean
            status.put("maximumPoolSize", hikariDataSource.getMaximumPoolSize());
            status.put("minimumIdle", hikariDataSource.getMinimumIdle());
            status.put("connectionTimeout", hikariDataSource.getConnectionTimeout());

            return ResponseEntity.ok(status);
        } catch (Exception e) {
            log.error("Failed to get HikariCP status", e);
            return ResponseEntity.status(500).body(Map.of("error", e.getMessage()));
        }
    }

    @GetMapping("/dump")
    public ResponseEntity<Map<String, Object>> triggerDump() {
        try {
            String dump = threadDumpService.generateThreadDump();
            String analysis = threadDumpService.analyzeThreadDump(dump);

            Map<String, Object> result = new HashMap<>();
            result.put("threadDump", dump);
            result.put("analysis", analysis);

            return ResponseEntity.ok(result);
        } catch (Exception e) {
            log.error("Failed to trigger thread dump", e);
            return ResponseEntity.status(500).body(Map.of("error", e.getMessage()));
        }
    }

    @GetMapping("/health")
    public ResponseEntity<Map<String, Object>> healthCheck() {
        try {
            HikariPoolMXBean poolBean = getPoolMXBean();

            Map<String, Object> health = new HashMap<>();
            health.put("status", "UP");
            health.put("activeConnections", poolBean.getActiveConnections());
            health.put("idleConnections", poolBean.getIdleConnections());
            health.put("totalConnections", poolBean.getTotalConnections());

            if (poolBean.getThreadsAwaitingConnection() > 0) {
                health.put("status", "DEGRADED");
                health.put("warning", "There are threads awaiting connection");
            }

            return ResponseEntity.ok(health);
        } catch (Exception e) {
            return ResponseEntity.status(500).body(Map.of(
                "status", "DOWN",
                "error", e.getMessage()
            ));
        }
    }

    private HikariPoolMXBean getPoolMXBean() {
        // HikariDataSource exposes the pool's MXBean directly; no reflection needed
        return hikariDataSource.getHikariPoolMXBean();
    }
}

3. Configuration

application.yml

spring:
  datasource:
    url: jdbc:mysql://localhost:3306/demo?useUnicode=true&characterEncoding=utf8
    username: root
    password: 123456
    driver-class-name: com.mysql.cj.jdbc.Driver
    hikari:
      maximum-pool-size: 20
      minimum-idle: 5
      connection-timeout: 30000
      idle-timeout: 600000
      max-lifetime: 1800000
      # unnecessary for JDBC4-compliant drivers such as MySQL Connector/J;
      # HikariCP's default Connection.isValid() check is faster
      connection-test-query: SELECT 1

hikari-cp-monitor:
  enabled: true
  dump-threshold-ms: 30000
  dump-interval-ms: 60000
  max-dump-count: 10
  dump-path: /tmp/hikari-dump
  alert:
    enabled: true
    email: admin@example.com
    webhook: https://webhook.example.com

Key HikariCP parameters

  • maximum-pool-size: maximum number of connections (default 10). Size it to what the database can actually handle concurrently.
  • minimum-idle: minimum number of idle connections (default: equal to maximum-pool-size). Usually best left equal to maximum-pool-size.
  • connection-timeout: how long getConnection() waits for a connection (default 30000 ms). Avoid overly long values, which let requests pile up.
  • idle-timeout: how long an idle connection may sit before being retired (default 600000 ms). Tune to the workload.
  • max-lifetime: maximum lifetime of a connection (default 1800000 ms). Keep it shorter than the database server's own timeout.
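One more parameter worth enabling alongside the dump mechanism: HikariCP ships its own leak detector, which logs a warning with the borrowing thread's stack trace when a connection is held too long. A minimal sketch (the 20-second threshold is illustrative; pick one comfortably above your slowest legitimate transaction):

```yaml
spring:
  datasource:
    hikari:
      # warn, with the borrowing thread's stack trace, if a connection
      # is held for more than 20 s without being closed
      leak-detection-threshold: 20000
```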

V. Analysis in Practice

1. Typical Problem Scenarios

Scenario 1: Slow SQL Hogging Connections

-- a query with no usable index
SELECT * FROM orders WHERE user_name = '张三' AND status = 1;

Execution time: 30 seconds
Connection held for: 30 seconds

A handful of concurrent requests like this will each tie up a connection for 30 seconds. Against a 20-connection pool, ordinary traffic quickly exhausts the remaining connections, and new requests start queueing and timing out.
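To see how fast this escalates, here is an illustrative calculation (the arrival rate and hold time are assumptions, not measurements from the case):

```java
public class ExhaustionDemo {
    public static void main(String[] args) {
        int poolSize = 20;        // connections in the pool
        int arrivalPerSec = 50;   // hypothetical rate of requests hitting the slow query
        int holdSeconds = 30;     // each request holds its connection this long

        // connections in use grow at roughly the arrival rate until none are left
        int millisToExhaust = 1000 * poolSize / arrivalPerSec;

        System.out.println("pool exhausted after ~" + millisToExhaust
                + " ms; each connection then stays busy for " + holdSeconds + " s");
    }
}
```

Under these assumptions the pool is empty well within the first second, and stays empty for the full 30 seconds the slow queries run.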

Dump analysis:

Thread: http-nio-8080-exec-3 (ID: 32, State: RUNNABLE)
  [DATABASE THREAD]
    at com.mysql.cj.jdbc.ClientPreparedStatement.executeQuery(...)
    at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(...)
    at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(...)

Scenario 2: Connection Leak

@DataSource("write")
public void writeData() {
    Connection conn = null;
    PreparedStatement ps = null;
    try {
        conn = dataSource.getConnection();
        ps = conn.prepareStatement(sql);
        // business logic
    } catch (Exception e) {
        // exception handling
    }
    // the connection is never closed!
}
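The standard fix is try-with-resources, which returns the connection to the pool even when an exception is thrown. A runnable miniature with a stand-in resource (FakeConnection is illustrative, not a real JDBC class):

```java
public class TryWithResourcesDemo {
    // stands in for java.sql.Connection, whose close() returns it to the pool
    static class FakeConnection implements AutoCloseable {
        @Override
        public void close() {
            System.out.println("connection returned to pool");
        }
    }

    public static void main(String[] args) {
        try (FakeConnection conn = new FakeConnection()) {
            throw new IllegalStateException("business error");
        } catch (IllegalStateException e) {
            // close() has already run by the time we get here
            System.out.println("handled: " + e.getMessage());
        }
    }
}
```

With real JDBC the shape is the same: `try (Connection conn = dataSource.getConnection(); PreparedStatement ps = conn.prepareStatement(sql)) { ... }`.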

Dump analysis:

Thread: http-nio-8080-exec-10 (ID: 45, State: TIMED_WAITING)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.SynchronousQueue.poll(...)
    at com.zaxxer.hikari.util.ConcurrentBag.borrow(...)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(...)

2. Optimization Strategies

Strategy 1: SQL Optimization

-- a single composite index covers both predicates; with it in place, a separate
-- index on user_name alone is redundant (leftmost-prefix rule)
ALTER TABLE orders ADD INDEX idx_user_status (user_name, status);

Strategy 2: Pool Parameter Tuning

spring:
  datasource:
    hikari:
      maximum-pool-size: 50
      minimum-idle: 10
      connection-timeout: 10000
      idle-timeout: 300000

Strategy 3: Asynchronous Offloading

Move long-running work off the request path so it does not hold a pooled connection:

@Async // requires @EnableAsync on a configuration class
public CompletableFuture<Void> processOrderAsync(Long orderId) {
    // long-running work goes here
    return CompletableFuture.completedFuture(null);
}

VI. Best Practices

1. Monitoring Configuration

  • Sensible thresholds: set the dump threshold to match your workload
  • Rate-limited dumps: avoid generating too many dump files in a short window
  • Regular cleanup: schedule a job to delete expired dump files
  • Broader metrics: monitor the pool's live metrics in addition to dumps

2. Alerting Strategy

  • Tiered alerts: distinguish warnings from emergencies
  • Multiple channels: configure several notification paths so alerts get through
  • Alert deduplication: merge alerts for the same issue to avoid alert storms
  • On-call rotation: establish an on-call schedule so alerts are handled promptly

3. Troubleshooting Workflow

  1. Check the alert: when one arrives, start with the dump file it references
  2. Read the stacks: focus on threads in WAITING and BLOCKED states
  3. Locate the SQL: find the statements holding connections
  4. Identify the cause: slow SQL, or a connection leak?
  5. Plan the fix: optimize the SQL or repair the code
  6. Verify: after release, watch to confirm the problem is gone

4. Prevention

  • Code review: scrutinize database-access code in particular
  • SQL review: vet SQL performance before release
  • Load testing: validate the pool configuration under load before going live
  • Routine inspection: check pool health on a regular schedule

VII. Summary and Outlook

Summary

  1. Automated: monitoring and dumping require no manual intervention
  2. Timely: the scene is recorded the moment the problem occurs, ready for analysis
  3. Complete: full thread-stack information makes problems traceable
  4. Multi-dimensional: supports several analysis angles, such as SQL analysis and thread analysis
  5. Extensible: easy to add further metrics and alert channels

Future Directions

  1. Smarter analysis: apply AI to analyze stacks and pinpoint root causes automatically
  2. Visualization: a web UI that shows pool state at a glance
  3. Auto-tuning: suggest pool parameter changes based on monitoring data
  4. Distributed support: monitor pools across a distributed deployment

Technical Value

  1. Faster diagnosis: automatic dumps locate the root cause of pool problems quickly
  2. Less downtime: problems get resolved faster, reducing business impact
  3. Lower maintenance cost: automated monitoring reduces manual effort
  4. Higher reliability: issues are found and fixed promptly, improving system reliability

VIII. Closing Thoughts

Connection pool problems are among the trickiest in Java backend work because they are transient and hard to reproduce. With an automatic dump triggered by HikariCP acquisition timeouts, the scene is captured the moment the problem occurs, which greatly speeds up troubleshooting.

This is no silver bullet, of course. It records the problem scene, but real root-cause analysis still requires understanding the business scenario and the code. At minimum, though, you no longer have to scramble for a thread dump after the fact, or worry that the evidence vanished before you could capture it.

I hope this article gives you some ideas for tackling connection pool problems in your own projects.

If you have experience with, or questions about, this approach, leave a comment below!


服务端技术精选: sharing hands-on backend development experience, making technology easier to put into practice.

If you found this article useful, likes and shares are appreciated!


Title: SpringBoot + Connection Pool Timeout Troubleshooting: HikariCP Acquisition Timing Out? Auto-Dump Thread Stacks to Locate It
Author: jiangyi
URL: http://www.jiangyi.space/articles/2026/04/27/1777080141786.html
WeChat account: 服务端技术精选