Spring Boot + Connection Pool Timeout Troubleshooting: HikariCP Connection Acquisition Timing Out? Auto-Dump the Thread Stacks to Locate the Cause
I. The Pain of Connection Acquisition Timeouts
Last week, a friend who works on a payment system was venting: during peak hours their system kept timing out at the interface level, and the logs were full of "Connection is not available, request timed out after 30000ms" errors.
"We've already raised the pool maximum to 50," he said, "but it still isn't enough. Every peak period the problem comes back."
I took a look at their system, and the numbers were indeed alarming:
- Peak TPS of roughly 2,000
- Database connection pool capped at 50 connections
- Average interface latency of 150 ms
- In theory, 50 connections can serve about 333 requests per second (50 / 0.15 s), while the peak needs 2,000 TPS
- Each request touches the database 3 to 5 times
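Little's law makes the capacity gap concrete: the number of connections a workload needs is roughly its arrival rate times the average time each request holds a connection. A quick back-of-the-envelope check with the numbers above (the `PoolSizing` class is an illustrative helper, not part of the system):

```java
public class PoolSizing {

    // Little's law: connections in use ≈ request rate × average hold time.
    public static int requiredConnections(int tps, double avgHoldSeconds) {
        return (int) Math.ceil(tps * avgHoldSeconds);
    }

    // Inverse view: how many requests per second a pool of a given size supports.
    public static int supportedTps(int poolSize, double avgHoldSeconds) {
        return (int) Math.floor(poolSize / avgHoldSeconds);
    }

    public static void main(String[] args) {
        // 2000 TPS, connection held ~150 ms per request
        System.out.println(requiredConnections(2000, 0.15)); // 300
        // 50 connections at 150 ms per request
        System.out.println(supportedTps(50, 0.15));          // 333
    }
}
```

Assuming a connection is held for most of the 150 ms request, 2,000 TPS needs on the order of 300 concurrent connections. A pool of 50 cannot keep up no matter how the timeouts are tuned; the fix has to shorten hold time, not just grow the pool.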
What's worse, they had no idea where the connections were going. Were slow SQL queries hogging them, or was there a connection leak?
II. Where Traditional Troubleshooting Falls Short
1. Manually reading logs
Scan the logs for slow interfaces and long-running SQL.
Problems with this approach:
- Lagging: the problem has already happened; you can only analyze it after the fact
- Incomplete: you learn that a given interface was slow, but not the overall system state at the time
- No root cause: you see symptoms, not causes
- Labor-intensive: log analysis eats a lot of time
2. Manually dumping threads
Use jstack or Arthas to dump thread stacks by hand and analyze which threads are waiting for connections.
Problems with this approach:
- Hard to time: the problem is transient, and a manual dump can miss the critical moment
- Performance impact: the dump itself consumes system resources
- Fragmented information: multiple dumps are needed to reconstruct the full picture
- High skill bar: reading thread dumps takes experience
3. Tuning connection pool parameters
Mitigate the problem by raising the maximum pool size, shortening timeouts, and so on.
Problems with this approach:
- Treats symptoms, not causes: it only postpones the problem
- Wastes resources: too many connections burden the database
- Adds complexity: there are many parameters, and tuning them well takes real experience
- May create new problems: a bad parameter change can trigger other issues
III. The Solution: Automatic Thread Dumps on HikariCP Acquisition Timeout
Today I want to share a solution that has been validated in production: an automatic thread-dump mechanism triggered by HikariCP connection acquisition timeouts.
The core ideas:
- Automatic monitoring: watch the pool state in real time and detect anomalies early
- Automatic dump: when connection acquisition times out, dump the thread stacks automatically
- Smart analysis: analyze the dump automatically to locate what is blocking
- Multi-channel alerting: notify through several channels so the right people hear about it in time
IV. The Solution in Detail
1. Core principle
HikariCP acquires a connection roughly as follows:
Thread requests a connection
↓
Does the pool have an idle connection?
↓
Yes → return it immediately
No → wait for one to be freed (bounded by the configured maximum wait)
↓
Wait times out → throw an exception
Wait succeeds → return the connection
When acquisition times out, we want to know:
- The current pool state (active count, idle count, number of waiting threads)
- Which threads are holding connections
- Which threads are waiting for connections
- What SQL the held connections are executing
2. Spring Boot implementation
(1) The HikariCP monitoring configuration class
@Configuration
@ConfigurationProperties(prefix = "hikari-cp-monitor")
@Data
public class HikariCPMonitorConfig {
    private boolean enabled = true;
    private int dumpThresholdMs = 30000;       // how long a wait must last before a dump is warranted
    private int dumpIntervalMs = 60000;        // minimum interval between two consecutive dumps
    private int maxDumpCount = 10;             // cap on dump files per process lifetime
    private String dumpPath = "/tmp/hikari-dump";
    private AlertConfig alert = new AlertConfig();

    @Data
    public static class AlertConfig {
        private boolean enabled = true;
        private String email;
        private String webhook;
    }
}
(2) The pool monitoring service
@Service
@Slf4j
public class HikariCPMonitorService {

    private static final Logger auditLog = LoggerFactory.getLogger("HIKARI_AUDIT");

    @Autowired
    private HikariDataSource hikariDataSource;

    @Autowired
    private HikariCPMonitorConfig config;

    @Autowired
    private ThreadDumpService threadDumpService;

    @Autowired
    private AlertService alertService;

    private final AtomicInteger dumpCount = new AtomicInteger(0);
    private volatile long lastDumpTime = 0;

    public void checkConnectionPool() {
        if (!config.isEnabled()) {
            return;
        }
        // HikariDataSource exposes the pool's runtime stats through a public
        // MXBean; no reflection into the private "pool" field is needed.
        HikariPoolMXBean poolBean = hikariDataSource.getHikariPoolMXBean();
        if (poolBean == null) { // pool not initialized yet
            return;
        }
        int activeConnections = poolBean.getActiveConnections();
        int idleConnections = poolBean.getIdleConnections();
        int totalConnections = poolBean.getTotalConnections();
        int threadsAwaitingConnection = poolBean.getThreadsAwaitingConnection();

        long sinceLastDump = System.currentTimeMillis() - lastDumpTime;
        if (threadsAwaitingConnection > 0
                && sinceLastDump >= config.getDumpIntervalMs()
                && dumpCount.get() < config.getMaxDumpCount()) {
            performThreadDump(activeConnections, idleConnections,
                    totalConnections, threadsAwaitingConnection);
        }
        logConnectionPoolStatus(activeConnections, idleConnections,
                totalConnections, threadsAwaitingConnection);
    }

    private void performThreadDump(int activeConnections, int idleConnections,
                                   int totalConnections, int threadsAwaitingConnection) {
        lastDumpTime = System.currentTimeMillis();
        int currentDumpCount = dumpCount.incrementAndGet();
        String dumpFile = String.format("%s/hikari-dump-%d-%d.txt",
                config.getDumpPath(), System.currentTimeMillis(), currentDumpCount);
        try {
            String threadDump = threadDumpService.generateThreadDump();
            StringBuilder report = new StringBuilder();
            report.append("=== HikariCP Connection Pool Timeout Report ===\n");
            report.append(String.format("Timestamp: %s\n", LocalDateTime.now()));
            report.append(String.format("Dump Count: %d\n", currentDumpCount));
            report.append(String.format("Active Connections: %d\n", activeConnections));
            report.append(String.format("Idle Connections: %d\n", idleConnections));
            report.append(String.format("Total Connections: %d\n", totalConnections));
            report.append(String.format("Threads Awaiting Connection: %d\n", threadsAwaitingConnection));
            report.append("\n=== Connection Pool Config ===\n");
            // pool-size and timeout settings live on HikariConfigMXBean,
            // which HikariDataSource itself implements
            report.append(String.format("Maximum Pool Size: %d\n", hikariDataSource.getMaximumPoolSize()));
            report.append(String.format("Minimum Idle: %d\n", hikariDataSource.getMinimumIdle()));
            report.append(String.format("Connection Timeout: %dms\n", hikariDataSource.getConnectionTimeout()));
            report.append("\n=== Thread Dump ===\n");
            report.append(threadDump);
            Files.createDirectories(Paths.get(config.getDumpPath())); // make sure the dump directory exists
            Files.write(Paths.get(dumpFile), report.toString().getBytes(StandardCharsets.UTF_8));
            auditLog.warn("Connection pool dump saved to: {}", dumpFile);
            sendAlert(dumpFile, activeConnections, idleConnections, totalConnections, threadsAwaitingConnection);
        } catch (Exception e) {
            log.error("Failed to perform thread dump", e);
        }
    }

    private void logConnectionPoolStatus(int active, int idle, int total, int awaiting) {
        if (log.isDebugEnabled()) {
            log.debug("HikariCP Status - Active: {}, Idle: {}, Total: {}, Awaiting: {}",
                    active, idle, total, awaiting);
        }
        auditLog.info("HikariCP - Active: {}, Idle: {}, Total: {}, Awaiting: {}",
                active, idle, total, awaiting);
    }

    private void sendAlert(String dumpFile, int active, int idle, int total, int awaiting) {
        if (!config.getAlert().isEnabled()) {
            return;
        }
        String subject = "HikariCP Connection Pool Timeout Alert";
        String content = String.format(
                "Connection pool timeout detected!\n\n" +
                "Active Connections: %d\n" +
                "Idle Connections: %d\n" +
                "Total Connections: %d\n" +
                "Threads Awaiting: %d\n\n" +
                "Dump file: %s",
                active, idle, total, awaiting, dumpFile);
        alertService.sendAlert(subject, content);
    }
}
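Worth noting: if Spring Boot Actuator and Micrometer are on the classpath, HikariCP already publishes its pool gauges (active, idle, pending threads, and so on) as `hikaricp.connections.*` metrics, so the polling service above can be complemented, or partly replaced, by standard metrics scraping. A minimal configuration sketch, assuming the standard Actuator starter:

```yaml
# build: add spring-boot-starter-actuator (plus a registry such as
# micrometer-registry-prometheus if you scrape with Prometheus)
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus
# then e.g. GET /actuator/metrics/hikaricp.connections.pending
# reports the number of threads currently waiting for a connection
```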
(3) The thread dump service
@Service
@Slf4j
public class ThreadDumpService {

    public String generateThreadDump() {
        StringBuilder sb = new StringBuilder();
        Map<Thread, StackTraceElement[]> allThreads = Thread.getAllStackTraces();
        sb.append(String.format("Total threads: %d\n\n", allThreads.size()));
        for (Map.Entry<Thread, StackTraceElement[]> entry : allThreads.entrySet()) {
            Thread thread = entry.getKey();
            StackTraceElement[] stackTrace = entry.getValue();
            sb.append(String.format("Thread: %s (ID: %d, State: %s)\n",
                    thread.getName(), thread.getId(), thread.getState()));
            if (isDatabaseThread(thread, stackTrace)) {
                sb.append("    [DATABASE THREAD]\n");
            }
            for (StackTraceElement element : stackTrace) {
                sb.append(String.format("    at %s.%s(%s:%d)\n",
                        element.getClassName(), element.getMethodName(),
                        element.getFileName(), element.getLineNumber()));
            }
            sb.append("\n");
        }
        return sb.toString();
    }

    public String analyzeThreadDump(String threadDump) {
        StringBuilder analysis = new StringBuilder();
        analysis.append("=== Thread Dump Analysis ===\n\n");
        int dbThreadCount = 0;
        int waitingThreads = 0;
        int blockedThreads = 0;
        for (String line : threadDump.split("\n")) {
            if (line.contains("[DATABASE THREAD]")) {
                dbThreadCount++;
            }
            // "TIMED_WAITING" contains "WAITING", so one check covers both states
            if (line.contains("WAITING")) {
                waitingThreads++;
            }
            if (line.contains("BLOCKED")) {
                blockedThreads++;
            }
        }
        analysis.append(String.format("Database Threads: %d\n", dbThreadCount));
        analysis.append(String.format("Waiting Threads: %d\n", waitingThreads));
        analysis.append(String.format("Blocked Threads: %d\n\n", blockedThreads));
        analysis.append("=== Slow SQL Analysis ===\n");
        analysis.append(analyzeSlowSQL(threadDump));
        return analysis.toString();
    }

    private boolean isDatabaseThread(Thread thread, StackTraceElement[] stackTrace) {
        String threadName = thread.getName().toLowerCase();
        if (threadName.contains("hikari") || threadName.contains("mysql") || threadName.contains("mariadb")) {
            return true;
        }
        for (StackTraceElement element : stackTrace) {
            String className = element.getClassName().toLowerCase();
            if (className.contains("com.zaxxer.hikari")
                    || className.contains("com.mysql")
                    || className.contains("org.hibernate")) {
                return true;
            }
        }
        return false;
    }

    // Best-effort only: stack traces rarely contain literal SQL text, so this
    // finds statements only when they appear in appended context (for example,
    // SQL that a logging interceptor wrote into the report).
    private String analyzeSlowSQL(String threadDump) {
        StringBuilder result = new StringBuilder();
        Pattern sqlPattern = Pattern.compile("(SELECT|INSERT|UPDATE|DELETE).*?;",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher matcher = sqlPattern.matcher(threadDump);
        int count = 0;
        while (matcher.find() && count < 10) {
            result.append(String.format("%d. %s\n\n", count + 1, matcher.group()));
            count++;
        }
        if (count == 0) {
            result.append("No obvious SQL statements found in thread dump.\n");
        }
        return result.toString();
    }
}
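`Thread.getAllStackTraces()` returns stacks only. The JDK's `ThreadMXBean` additionally reports lock owners and monitor/synchronizer information, which is exactly what you need to tell a truly blocked thread from a merely waiting one. A sketch of an alternative dump built on it, using only the standard `java.lang.management` API:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class MxBeanDump {

    // Dump all threads with lock details (lockedMonitors / lockedSynchronizers).
    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            // ThreadInfo.toString() includes name, state, stack frames and
            // "- locked ..." / "waiting on ..." annotations
            sb.append(info.toString());
        }
        // deadlock detection comes for free with the same bean
        long[] deadlocked = mx.findDeadlockedThreads();
        if (deadlocked != null) {
            sb.append("!!! ").append(deadlocked.length).append(" deadlocked threads\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

One caveat: `ThreadInfo.toString()` truncates each stack to a handful of frames; iterate `getStackTrace()` yourself if you need full depth in the report.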
(4) The alert service
@Service
@Slf4j
public class AlertService {

    @Autowired
    private HikariCPMonitorConfig config;

    @Value("${spring.mail.username:}")
    private String fromEmail;

    @Autowired
    private JavaMailSender mailSender;

    public void sendAlert(String subject, String content) {
        if (!config.getAlert().isEnabled()) {
            return;
        }
        if (config.getAlert().getEmail() != null) {
            sendEmail(subject, content);
        }
        if (config.getAlert().getWebhook() != null) {
            sendWebhook(config.getAlert().getWebhook(), subject, content);
        }
    }

    private void sendEmail(String subject, String content) {
        try {
            SimpleMailMessage message = new SimpleMailMessage();
            message.setFrom(fromEmail);
            message.setTo(config.getAlert().getEmail());
            message.setSubject("[HikariCP Alert] " + subject);
            message.setText(content);
            mailSender.send(message);
            log.info("Alert email sent successfully");
        } catch (Exception e) {
            log.error("Failed to send alert email", e);
        }
    }

    private void sendWebhook(String webhook, String subject, String content) {
        // left as an exercise: POST the payload to the configured webhook URL
        log.info("Webhook alert: {} - {}", subject, webhook);
    }
}
(5) The scheduled monitoring task
@Component
@Slf4j
public class HikariCPMonitorScheduler {

    @Autowired
    private HikariCPMonitorService monitorService;

    // poll the pool every 5 seconds; requires @EnableScheduling
    // on some @Configuration class
    @Scheduled(fixedRate = 5000)
    public void monitor() {
        try {
            monitorService.checkConnectionPool();
        } catch (Exception e) {
            log.error("Error during connection pool monitoring", e);
        }
    }
}
(6) The pool health-check controller
@RestController
@RequestMapping("/api/hikari")
@Slf4j
public class HikariController {

    @Autowired
    private HikariDataSource hikariDataSource;

    @Autowired
    private ThreadDumpService threadDumpService;

    @GetMapping("/status")
    public ResponseEntity<Map<String, Object>> getStatus() {
        try {
            HikariPoolMXBean poolBean = hikariDataSource.getHikariPoolMXBean();
            Map<String, Object> status = new HashMap<>();
            status.put("activeConnections", poolBean.getActiveConnections());
            status.put("idleConnections", poolBean.getIdleConnections());
            status.put("totalConnections", poolBean.getTotalConnections());
            status.put("threadsAwaitingConnection", poolBean.getThreadsAwaitingConnection());
            // config values come from the data source itself (HikariConfigMXBean)
            status.put("maximumPoolSize", hikariDataSource.getMaximumPoolSize());
            status.put("minimumIdle", hikariDataSource.getMinimumIdle());
            status.put("connectionTimeout", hikariDataSource.getConnectionTimeout());
            return ResponseEntity.ok(status);
        } catch (Exception e) {
            log.error("Failed to get HikariCP status", e);
            return ResponseEntity.status(500).body(Map.of("error", e.getMessage()));
        }
    }

    @GetMapping("/dump")
    public ResponseEntity<Map<String, Object>> triggerDump() {
        try {
            String dump = threadDumpService.generateThreadDump();
            String analysis = threadDumpService.analyzeThreadDump(dump);
            Map<String, Object> result = new HashMap<>();
            result.put("threadDump", dump);
            result.put("analysis", analysis);
            return ResponseEntity.ok(result);
        } catch (Exception e) {
            log.error("Failed to trigger thread dump", e);
            return ResponseEntity.status(500).body(Map.of("error", e.getMessage()));
        }
    }

    @GetMapping("/health")
    public ResponseEntity<Map<String, Object>> healthCheck() {
        try {
            HikariPoolMXBean poolBean = hikariDataSource.getHikariPoolMXBean();
            Map<String, Object> health = new HashMap<>();
            health.put("status", "UP");
            health.put("activeConnections", poolBean.getActiveConnections());
            health.put("idleConnections", poolBean.getIdleConnections());
            health.put("totalConnections", poolBean.getTotalConnections());
            if (poolBean.getThreadsAwaitingConnection() > 0) {
                health.put("status", "DEGRADED");
                health.put("warning", "There are threads awaiting connection");
            }
            return ResponseEntity.ok(health);
        } catch (Exception e) {
            return ResponseEntity.status(500).body(Map.of(
                    "status", "DOWN",
                    "error", e.getMessage()));
        }
    }
}
3. Configuration in detail
application.yml
spring:
  datasource:
    url: jdbc:mysql://localhost:3306/demo?useUnicode=true&characterEncoding=utf8
    username: root
    password: 123456
    driver-class-name: com.mysql.cj.jdbc.Driver
    hikari:
      maximum-pool-size: 20
      minimum-idle: 5
      connection-timeout: 30000
      idle-timeout: 600000
      max-lifetime: 1800000
      # note: connection-test-query is unnecessary for JDBC4-compliant drivers
      # such as mysql-connector-j; HikariCP's default Connection.isValid()
      # check is cheaper. Set it only for legacy drivers.

hikari-cp-monitor:
  enabled: true
  dump-threshold-ms: 30000
  dump-interval-ms: 60000
  max-dump-count: 10
  dump-path: /tmp/hikari-dump
  alert:
    enabled: true
    email: admin@example.com
    webhook: https://webhook.example.com
Key HikariCP parameters
| Parameter | Meaning | Default | Tuning advice |
|---|---|---|---|
| maximum-pool-size | Maximum number of connections | 10 | Size it to what the database can actually handle concurrently |
| minimum-idle | Minimum number of idle connections | same as maximum-pool-size | Usually left equal to maximum-pool-size, i.e. a fixed-size pool |
| connection-timeout | Maximum wait to acquire a connection | 30000 ms | Keep it short; long waits just let requests pile up |
| idle-timeout | How long an idle connection may live | 600000 ms | Adjust to the traffic pattern |
| max-lifetime | Maximum lifetime of a connection | 1800000 ms | Keep it shorter than the database server's own timeout (e.g. MySQL wait_timeout) |
V. Analysis in Practice
1. Typical problem scenarios
Scenario 1: slow SQL holding connections
-- a query with no usable index
SELECT * FROM orders WHERE user_name = '张三' AND status = 1;
Execution time: 30 seconds, so the connection is held for the full 30 seconds.
With 10 such requests in flight, half of a 20-connection pool is pinned for half a minute, and ordinary traffic exhausts the remaining connections within a second or two.
Dump analysis:
Thread: http-nio-8080-exec-3 (ID: 32, State: RUNNABLE)
 [DATABASE THREAD]
 at com.mysql.cj.jdbc.ClientPreparedStatement.executeQuery(ClientPreparedStatement.java:...)
 at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:...)
 at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(...)
Scenario 2: connection leak
@DataSource("write")
public void writeData() {
    Connection conn = null;
    PreparedStatement ps = null;
    try {
        conn = dataSource.getConnection();
        ps = conn.prepareStatement(sql);
        // business logic
    } catch (Exception e) {
        // exception handling
    }
    // the connection is never closed!
}
Dump analysis:
Thread: http-nio-8080-exec-10 (ID: 45, State: TIMED_WAITING)
 at sun.misc.Unsafe.park(Native Method)
 at java.util.concurrent.SynchronousQueue$TransferQueue.awaitFulfill(...)
 at com.zaxxer.hikari.util.ConcurrentBag.borrow(ConcurrentBag.java:...)
 at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:...)
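Before building custom leak tooling, note that HikariCP ships with a built-in leak detector: set `leakDetectionThreshold` (minimum 2000 ms) and the pool logs a warning, including the stack trace of the call site that borrowed the connection, whenever a connection stays out of the pool longer than the threshold. A configuration sketch:

```yaml
spring:
  datasource:
    hikari:
      # warn, with the borrower's stack trace, if a connection
      # is held for more than 60 seconds
      leak-detection-threshold: 60000
```

Combined with try-with-resources (`try (Connection conn = dataSource.getConnection()) { ... }`), which closes the connection on every exit path, this catches most leaks like the one above.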
2. Optimization strategies
Strategy 1: optimize the SQL
-- One composite index serves the combined predicate; by the leftmost-prefix
-- rule it also covers queries filtering on user_name alone, so a separate
-- idx_user_name would be redundant (and idx_status alone is rarely selective).
ALTER TABLE orders ADD INDEX idx_user_status (user_name, status);
Strategy 2: tune the pool parameters
spring:
  datasource:
    hikari:
      maximum-pool-size: 50
      minimum-idle: 10
      connection-timeout: 10000
      idle-timeout: 300000
Strategy 3: move slow work off the request path
Make time-consuming operations asynchronous so they do not tie up pool connections:
@Async
public CompletableFuture<Void> processOrderAsync(Long orderId) {
    // time-consuming work
    return CompletableFuture.completedFuture(null);
}
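Two cautions about `@Async`: it only takes effect when `@EnableAsync` is present, and, depending on configuration, the default executor may be effectively unbounded. A dedicated, bounded pool keeps slow jobs from starving everything else. The same isolation idea, shown with plain `CompletableFuture` and a fixed executor (standard library only; names here are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncOffload {

    // A small, dedicated pool for slow background work, so it cannot
    // monopolize web threads (daemon threads, so it never blocks JVM exit).
    private static final ExecutorService SLOW_WORK_POOL =
            Executors.newFixedThreadPool(4, r -> {
                Thread t = new Thread(r, "slow-work");
                t.setDaemon(true);
                return t;
            });

    public static CompletableFuture<String> processOrderAsync(long orderId) {
        return CompletableFuture.supplyAsync(() -> {
            // placeholder for the time-consuming part of order processing
            return "processed-" + orderId;
        }, SLOW_WORK_POOL);
    }

    public static void main(String[] args) {
        System.out.println(processOrderAsync(42L).join()); // processed-42
    }
}
```

One more caveat: offloading helps the web tier, but if the background job itself queries the database it still holds a pool connection while it runs, so keep the background pool smaller than the connection pool.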
VI. Best Practices
1. Monitoring configuration
- Set thresholds sensibly: tune the dump threshold to the characteristics of the workload
- Throttle dumps: avoid generating a flood of dump files in a short window
- Clean up regularly: schedule a job to delete expired dump files
- Track live metrics: monitor the pool's real-time indicators as well, not just dumps
2. Alerting strategy
- Tiered alerts: distinguish warnings from emergencies
- Multiple channels: configure several notification paths so alerts actually get through
- Alert deduplication: merge alerts for the same issue to avoid alert storms
- On-call rotation: have an on-call schedule so alerts are handled promptly
3. Troubleshooting workflow
- Read the alert: when an alert arrives, open the dump file first
- Analyze the stacks: focus on threads in WAITING and BLOCKED states
- Locate the SQL: find the statements holding the connections
- Determine the cause: slow SQL, or a connection leak?
- Fix it: optimize the SQL or repair the code
- Verify: watch the system after release to confirm the problem is gone
4. Prevention
- Code review: scrutinize database-access code in particular
- SQL review: vet SQL performance before release
- Load testing: validate the pool configuration under load before going live
- Routine inspection: check pool health on a regular schedule
VII. Summary and Outlook
What the solution delivers
- Automated: monitoring and dumping run without human intervention
- Timely: the scene is captured the moment the problem occurs
- Complete: full thread-stack information makes diagnosis practical
- Multi-dimensional: supports several analysis angles, such as SQL analysis and thread analysis
- Extensible: easy to add further metrics and alert channels
Future directions
- Smarter analysis: apply AI to analyze thread dumps and suggest root causes automatically
- Visualization: a web UI that shows pool state at a glance
- Auto-tuning: recommend pool parameter changes based on the collected metrics
- Distributed support: pool monitoring across a distributed deployment
Why it matters
- Faster diagnosis: automatic dumps pinpoint the root cause of pool problems quickly
- Less downtime: problems get fixed sooner, reducing business impact
- Lower maintenance cost: automation replaces manual babysitting
- Higher reliability: issues are found and resolved before they escalate
VIII. Final Thoughts
Connection pool problems are among the hardest in Java backend work, because they are transient and difficult to reproduce. An automatic dump triggered by HikariCP acquisition timeouts records the scene the moment the problem occurs, which dramatically speeds up troubleshooting.
Of course, this is no silver bullet. It only captures the scene; real root-cause analysis still requires understanding the business logic and the code. But at least you no longer have to scramble for a thread dump after the fact, or worry that the evidence vanished before you could grab it.
I hope this article gives you some ideas for tackling connection pool problems in your own projects.
If you have experience with this approach, or run into trouble using it, feel free to share in the comments!
Title: Spring Boot + Connection Pool Timeout Troubleshooting: HikariCP Connection Acquisition Timing Out? Auto-Dump the Thread Stacks to Locate the Cause
Author: jiangyi
Link: http://www.jiangyi.space/articles/2026/04/27/1777080141786.html
WeChat official account: 服务端技术精选