一、问题背景:线上故障的惊魂时刻
某日下午,线上系统突然出现大量接口超时,错误日志显示:
Could not get JDBC Connection; nested exception is
java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available,
request timed out after 30000ms.
更诡异的是:数据库服务器正常,网络连接正常,应用日志中却没有任何 SQL 执行超时。
这正是数据库连接池"假死"现象的典型特征。
二、核心概念:HikariCP 连接池机制
2.1 连接池工作原理
┌────────────────────────────────────────────────────────────────┐
│ HikariCP 连接池架构 │
├────────────────────────────────────────────────────────────────┤
│ │
│ 应用线程 │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Connection 请求 │ │
│ └────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ 连接池管理器 │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Conn-1 │ │ Conn-2 │ │ Conn-3 │ ... │ │
│ │ │ ACTIVE │ │ IDLE │ │ ACTIVE │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │
│ │ totalConnections: 20 │ │
│ │ activeConnections: 18 │ │
│ │ idleConnections: 2 │ │
│ │ threadsAwaitingConnection: 5 │ │
│ └────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ MySQL 数据库 │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────┘
2.2 连接泄漏的成因分析
┌────────────────────────────────────────────────────────────────┐
│ 连接泄漏成因分析 │
├────────────────────────────────────────────────────────────────┤
│ │
│ 1. 未关闭连接 │
│ try { │
│ Connection conn = ds.getConnection(); │
│ // 业务逻辑 │
│ // 忘记 conn.close() │
│ } catch (...) {} │
│ │
│ 2. 连接未正确归还 │
│ try { │
│ Connection conn = ds.getConnection(); │
│ if (condition) { │
│ return; // 早期返回,连接未关闭 │
│ } │
│ conn.close(); │
│ } catch (...) {} │
│ │
│ 3. 长事务占用连接 │
│ BEGIN TRANSACTION; │
│ // 处理大量数据,耗时过长 │
│ // 其他请求等待连接 │
│ COMMIT; │
│ │
│ 4. 异常处理不当 │
│ try { │
│ Connection conn = ds.getConnection(); │
│ throw new RuntimeException(); │
│ conn.close(); │
│ } catch (Exception e) { │
│ // 连接未关闭 │
│ } │
│ │
└────────────────────────────────────────────────────────────────┘
2.3 HikariCP 配置参数解析
| 参数 | 默认值 | 说明 |
|---|---|---|
maximumPoolSize | 10 | 连接池最大连接数 |
minimumIdle | 10 | 空闲时保留的最小连接数 |
connectionTimeout | 30000ms | 获取连接超时时间 |
idleTimeout | 600000ms | 空闲连接超时时间 |
maxLifetime | 1800000ms | 连接最大生命周期 |
connectionTestQuery | - | 连接测试 SQL |
leakDetectionThreshold | 0 (禁用) | 泄漏检测阈值 |
poolName | HikariPool | 连接池名称 |
三、实现方案:连接泄漏检测 + 线程 Dump 分析
3.1 方案架构设计
┌────────────────────────────────────────────────────────────────┐
│ 连接池健康监控架构 │
├────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ 泄漏检测 │───▶│ 告警通知 │ │
│ └──────────────────┘ └──────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ 线程 Dump │───▶│ 堆栈分析 │ │
│ └──────────────────┘ └──────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ 连接池监控 │───▶│ 性能指标 │ │
│ └──────────────────┘ └──────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────┘
3.2 HikariCP 连接池配置
@Configuration
public class HikariConfig {
@Bean
@ConfigurationProperties(prefix = "spring.datasource.hikari")
public DataSource dataSource() {
HikariConfigSource config = new HikariConfigSource();
// 基础配置
config.setPoolName("OrderPool");
config.setMaximumPoolSize(20);
config.setMinimumIdle(5);
// 超时配置
config.setConnectionTimeout(30000);
config.setIdleTimeout(600000);
config.setMaxLifetime(1800000);
// 泄漏检测配置(关键!)
config.setLeakDetectionThreshold(60000); // 60秒未归还视为泄漏
// 连接测试
config.setConnectionTestQuery("SELECT 1");
// 其他优化配置
config.setAutoCommit(true);
config.setConnectionInitSql("SELECT 1");
return new HikariDataSource(config);
}
}
3.3 连接泄漏检测工具
@Component
@Slf4j
public class ConnectionLeakDetector {
@Autowired
private DataSource dataSource;
private ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
/**
* 启动连接泄漏检测
*/
@PostConstruct
public void start() {
// 每分钟检查一次连接池状态
scheduler.scheduleAtFixedRate(this::checkForLeaks, 1, 1, TimeUnit.MINUTES);
}
@PreDestroy
public void stop() {
scheduler.shutdown();
}
private void checkForLeaks() {
if (!(dataSource instanceof HikariDataSource)) {
return;
}
HikariDataSource hikariDataSource = (HikariDataSource) dataSource;
HikariPool pool = hikariDataSource.getHikariPoolMXBean();
if (pool == null) {
return;
}
try {
PoolStats stats = pool.getPoolStats();
log.info("HikariCP Pool Stats: active={}, idle={}, total={}, waiting={}",
stats.getActiveConnections(),
stats.getIdleConnections(),
stats.getTotalConnections(),
stats.getThreadsAwaitingConnection());
// 检测潜在泄漏
if (stats.getThreadsAwaitingConnection() > 0) {
log.warn("Connection pool is under pressure! {} threads waiting for connection",
stats.getThreadsAwaitingConnection());
// 触发线程 Dump 分析
triggerThreadDump();
}
// 监控连接泄漏
if (stats.getActiveConnections() > hikariDataSource.getMaximumPoolSize() * 0.8) {
log.error("Connection pool usage exceeds 80%! Active: {}, Max: {}",
stats.getActiveConnections(),
hikariDataSource.getMaximumPoolSize());
// 发送告警
sendAlert(stats);
}
} catch (Exception e) {
log.error("Failed to check connection pool", e);
}
}
/**
* 触发线程 Dump
*/
private void triggerThreadDump() {
StringBuilder dump = new StringBuilder();
Map<Thread, StackTraceElement[]> allStacks = Thread.getAllStackTraces();
dump.append("\n========== Connection-Related Thread Dump ==========\n");
for (Map.Entry<Thread, StackTraceElement[]> entry : allStacks.entrySet()) {
Thread thread = entry.getKey();
StackTraceElement[] stack = entry.getValue();
// 只关注等待数据库连接的线程
String threadInfo = thread.getName() + " (state=" + thread.getState() +
", daemon=" + thread.isDaemon() + ")";
// 检查是否在等待连接
boolean waitingForConnection = Arrays.stream(stack)
.anyMatch(s -> s.toString().contains("Hikari") ||
s.toString().contains("getConnection"));
if (waitingForConnection || thread.getName().contains("Hikari")) {
dump.append("\n").append(threadInfo).append("\n");
for (StackTraceElement element : stack) {
dump.append(" at ").append(element).append("\n");
}
}
}
log.error(dump.toString());
}
private void sendAlert(PoolStats stats) {
// 发送告警(可对接钉钉、企业微信等)
log.error("ALERT: Connection pool exhaustion detected! Active: {}, Waiting: {}",
stats.getActiveConnections(),
stats.getThreadsAwaitingConnection());
}
}
3.4 线程 Dump 分析工具
@Component
@Slf4j
public class ThreadDumpAnalyzer {
/**
* 获取完整线程 Dump
*/
public String generateFullDump() {
StringBuilder dump = new StringBuilder();
dump.append("\n");
dump.append("================================================================================\n");
dump.append(" FULL THREAD DUMP\n");
dump.append("================================================================================\n");
dump.append("Generated at: ").append(LocalDateTime.now()).append("\n");
dump.append("Total threads: ").append(Thread.activeCount()).append("\n\n");
Map<Thread, StackTraceElement[]> allStacks = Thread.getAllStackTraces();
// 按线程状态分组统计
Map<Thread.State, List<Thread>> threadsByState = allStacks.keySet().stream()
.collect(Collectors.groupingBy(Thread::getState));
dump.append("Thread State Summary:\n");
for (Map.Entry<Thread.State, List<Thread>> entry : threadsByState.entrySet()) {
dump.append(" ").append(entry.getKey()).append(": ").append(entry.getValue().size()).append("\n");
}
dump.append("\n");
// 详细线程信息
for (Map.Entry<Thread, StackTraceElement[]> entry : allStacks.entrySet()) {
Thread thread = entry.getKey();
StackTraceElement[] stack = entry.getValue();
dump.append("--------------------------------------------------------------------------------\n");
dump.append('"').append(thread.getName()).append('"');
dump.append(" nid=").append(thread.getId());
dump.append(" state=").append(thread.getState());
dump.append(" daemon=").append(thread.isDaemon());
dump.append(" priority=").append(thread.getPriority());
dump.append("\n");
// Thread group
ThreadGroup group = thread.getThreadGroup();
if (group != null) {
dump.append(" java.lang.ThreadGroup: ").append(group.getName());
dump.append(" [maximumActiveThreads=").append(group.activeCount()).append("]\n");
}
// Lock info
dump.append(" java.lang.Thread@").append(Integer.toHexString(thread.hashCode())).append("\n");
// Stack trace
for (StackTraceElement element : stack) {
dump.append(" at ");
dump.append(element.getClassName()).append(".");
dump.append(element.getMethodName());
dump.append("(").append(element.getFileName());
if (element.getLineNumber() >= 0) {
dump.append(":").append(element.getLineNumber());
}
dump.append(")\n");
}
dump.append("\n");
}
return dump.toString();
}
/**
* 分析数据库连接相关线程
*/
public String analyzeDatabaseThreads() {
StringBuilder analysis = new StringBuilder();
analysis.append("\n");
analysis.append("================================================================================\n");
analysis.append(" DATABASE CONNECTION ANALYSIS\n");
analysis.append("================================================================================\n\n");
Map<Thread, StackTraceElement[]> allStacks = Thread.getAllStackTraces();
// 1. 等待连接的线程
List<Thread> waitingThreads = new ArrayList<>();
// 2. 持有连接的线程
List<Thread> holdingThreads = new ArrayList<>();
// 3. HikariCP 内部线程
List<Thread> hikariThreads = new ArrayList<>();
for (Map.Entry<Thread, StackTraceElement[]> entry : allStacks.entrySet()) {
Thread thread = entry.getKey();
String threadName = thread.getName().toLowerCase();
StackTraceElement[] stack = entry.getValue();
if (threadName.contains("hikari")) {
hikariThreads.add(thread);
}
boolean waitingForConn = Arrays.stream(stack)
.anyMatch(s -> s.toString().contains("getConnection") ||
s.toString().contains("Hikari"));
boolean holdingConn = Arrays.stream(stack)
.anyMatch(s -> s.toString().contains("java.sql") ||
s.toString().contains("Connection") ||
s.toString().contains("PreparedStatement"));
if (waitingForConn && thread.getState() == Thread.State.WAITING) {
waitingThreads.add(thread);
}
if (holdingConn) {
holdingThreads.add(thread);
}
}
// 输出分析结果
analysis.append("1. HikariCP Pool Threads:\n");
analysis.append(" Count: ").append(hikariThreads.size()).append("\n");
for (Thread t : hikariThreads) {
analysis.append(" - ").append(t.getName())
.append(" (state=").append(t.getState()).append(")\n");
}
analysis.append("\n2. Threads WAITING for connection:\n");
analysis.append(" Count: ").append(waitingThreads.size()).append("\n");
for (Thread t : waitingThreads) {
analysis.append(" - ").append(t.getName())
.append(" (state=").append(t.getState()).append(")\n");
for (StackTraceElement s : t.getStackTrace()) {
analysis.append(" at ").append(s).append("\n");
}
}
analysis.append("\n3. Threads HOLDING database connections:\n");
analysis.append(" Count: ").append(holdingThreads.size()).append("\n");
for (Thread t : holdingThreads) {
analysis.append(" - ").append(t.getName())
.append(" (state=").append(t.getState()).append(")\n");
// 只输出前5个堆栈
StackTraceElement[] stack = t.getStackTrace();
for (int i = 0; i < Math.min(5, stack.length); i++) {
analysis.append(" at ").append(stack[i]).append("\n");
}
}
return analysis.toString();
}
/**
* 检测死锁
*/
public boolean detectDeadlock() {
ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
long[] deadlockedThreads = threadMXBean.findDeadlockedThreads();
if (deadlockedThreads != null && deadlockedThreads.length > 0) {
log.error("DEADLOCK DETECTED! {} threads in deadlock", deadlockedThreads.length);
return true;
}
return false;
}
}
3.5 连接池监控指标
@Component
@Slf4j
public class HikariPoolMetrics {
private final MeterRegistry meterRegistry;
private final DataSource dataSource;
public HikariPoolMetrics(MeterRegistry meterRegistry, DataSource dataSource) {
this.meterRegistry = meterRegistry;
this.dataSource = dataSource;
registerMetrics();
scheduleMetricsCollection();
}
private void registerMetrics() {
// 连接池大小指标
Gauge.builder("hikari.connections.active", this, HikariPoolMetrics::getActiveConnections)
.register(meterRegistry);
Gauge.builder("hikari.connections.idle", this, HikariPoolMetrics::getIdleConnections)
.register(meterRegistry);
Gauge.builder("hikari.connections.total", this, HikariPoolMetrics::getTotalConnections)
.register(meterRegistry);
Gauge.builder("hikari.connections.waiting", this, HikariPoolMetrics::getWaitingThreads)
.register(meterRegistry);
Gauge.builder("hikari.connections.max", this, HikariPoolMetrics::getMaxConnections)
.register(meterRegistry);
}
private void scheduleMetricsCollection() {
Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
try {
collectMetrics();
} catch (Exception e) {
log.error("Failed to collect pool metrics", e);
}
}, 0, 30, TimeUnit.SECONDS);
}
private void collectMetrics() {
if (dataSource instanceof HikariDataSource) {
HikariDataSource hds = (HikariDataSource) dataSource;
HikariPoolMXBean poolBean = hds.getHikariPoolMXBean();
if (poolBean != null) {
log.debug("Pool stats - active: {}, idle: {}, waiting: {}",
poolBean.getActiveConnections(),
poolBean.getIdleConnections(),
poolBean.getThreadsAwaitingConnection());
}
}
}
private int getActiveConnections() {
if (dataSource instanceof HikariDataSource) {
HikariPoolMXBean bean = ((HikariDataSource) dataSource).getHikariPoolMXBean();
return bean != null ? bean.getActiveConnections() : 0;
}
return 0;
}
private int getIdleConnections() {
if (dataSource instanceof HikariDataSource) {
HikariPoolMXBean bean = ((HikariDataSource) dataSource).getHikariPoolMXBean();
return bean != null ? bean.getIdleConnections() : 0;
}
return 0;
}
private int getTotalConnections() {
if (dataSource instanceof HikariDataSource) {
HikariPoolMXBean bean = ((HikariDataSource) dataSource).getHikariPoolMXBean();
return bean != null ? bean.getTotalConnections() : 0;
}
return 0;
}
private int getWaitingThreads() {
if (dataSource instanceof HikariDataSource) {
HikariPoolMXBean bean = ((HikariDataSource) dataSource).getHikariPoolMXBean();
return bean != null ? bean.getThreadsAwaitingConnection() : 0;
}
return 0;
}
private int getMaxConnections() {
if (dataSource instanceof HikariDataSource) {
return ((HikariDataSource) dataSource).getMaximumPoolSize();
}
return 0;
}
}
3.6 连接泄漏追踪拦截器
@Component
@Aspect
@Slf4j
public class ConnectionLeakTrackingAspect {
@Autowired
private DataSource dataSource;
private final ConcurrentHashMap<String, ConnectionTrace> connectionTraces = new ConcurrentHashMap<>();
@Around("execution(* javax.sql.DataSource.getConnection(..))")
public Object trackConnectionAcquisition(ProceedingJoinPoint joinPoint) throws Throwable {
String traceId = UUID.randomUUID().toString();
long startTime = System.currentTimeMillis();
// 获取调用堆栈
StackTraceElement[] stackTrace = Thread.currentThread().getStackTrace();
String caller = formatStackTrace(stackTrace);
ConnectionTrace trace = ConnectionTrace.builder()
.traceId(traceId)
.acquireTime(startTime)
.caller(caller)
.threadName(Thread.currentThread().getName())
.build();
connectionTraces.put(traceId, trace);
try {
Connection connection = (Connection) joinPoint.proceed();
// 包装连接以跟踪关闭
return new TrackedConnection(connection, traceId, () -> {
connectionTraces.remove(traceId);
});
} catch (Exception e) {
connectionTraces.remove(traceId);
throw e;
}
}
/**
* 报告未关闭的连接
*/
@Scheduled(fixedDelay = 60000)
public void reportUnclosedConnections() {
long now = System.currentTimeMillis();
long threshold = 30000; // 30秒未关闭
List<ConnectionTrace> leaks = connectionTraces.values().stream()
.filter(t -> (now - t.getAcquireTime()) > threshold)
.collect(Collectors.toList());
if (!leaks.isEmpty()) {
log.error("=== Connection Leak Detected! ===");
for (ConnectionTrace trace : leaks) {
log.error("Leaked connection: traceId={}, duration={}ms, caller={}",
trace.getTraceId(),
now - trace.getAcquireTime(),
trace.getCaller());
}
}
}
private String formatStackTrace(StackTraceElement[] stackTrace) {
// 获取调用者的堆栈信息
for (int i = 2; i < Math.min(stackTrace.length, 10); i++) {
StackTraceElement element = stackTrace[i];
if (!element.getClassName().contains("Hikari")) {
return element.getClassName() + "." + element.getMethodName() +
"(" + element.getFileName() + ":" + element.getLineNumber() + ")";
}
}
return "Unknown";
}
@Builder
@Data
private static class ConnectionTrace {
private String traceId;
private long acquireTime;
private String caller;
private String threadName;
}
/**
* 包装连接以跟踪关闭
*/
private static class TrackedConnection implements InvocationHandler {
private final Connection delegate;
private final String traceId;
private final Runnable onClose;
public TrackedConnection(Connection delegate, String traceId, Runnable onClose) {
this.delegate = delegate;
this.traceId = traceId;
this.onClose = onClose;
}
@Override
public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
if (method.getName().equals("close")) {
onClose.run();
}
return method.invoke(delegate, args);
}
}
}
四、配置文件示例
server:
port: 8080
spring:
application:
name: hikari-connection-pool-demo
datasource:
url: jdbc:mysql://localhost:3306/example_db?useSSL=false&serverTimezone=UTC
username: root
password: password
driver-class-name: com.mysql.cj.jdbc.Driver
type: com.zaxxer.hikari.HikariDataSource
hikari:
pool-name: OrderHikariPool
maximum-pool-size: 20
minimum-idle: 5
connection-timeout: 30000
idle-timeout: 600000
max-lifetime: 1800000
connection-test-query: SELECT 1
# 泄漏检测(生产环境建议开启)
leak-detection-threshold: 60000
management:
endpoints:
web:
exposure:
include: health, info, prometheus, metrics
metrics:
tags:
application: ${spring.application.name}
logging:
level:
com.zaxxer.hikari: DEBUG
com.zaxxer.hikari.pool: TRACE
五、线程 Dump 分析实战
5.1 常见连接池假死场景
场景一:连接未关闭导致泄漏
"HikariPool-1 housekeeper" #50 daemon prio=5 os_prio=31 tid=0x00007f8a9c01a800
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take()
"http-nio-8080-exec-1" #30 daemon prio=5 os_prio=31 tid=0x00007f8a9c03c800
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait()
at java.lang.Object.wait(Object.java:502)
at com.zaxxer.hikari.pool.PoolBase.tryEndTask()
"application-thread" #25 prio=5 os_prio=31 tid=0x00007f8a9b80f800
java.lang.Thread.State: RUNNABLE
at com.mysql.cj.jdbc.ConnectionImpl.prepareStatement()
at sun.reflect.NativeMethodAccessorImpl.invoke0()
分析:应用线程持有连接但未释放
场景二:长事务占用连接
"batch-processing-thread" #42 prio=5 os_prio=31 tid=0x00007f8a9c04a800
java.lang.Thread.State: BLOCKED (on object monitor)
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException()
java.sql.SQLException: Connection is not available, request timed out after 30000ms
分析:批量处理线程占用连接时间过长
5.2 快速诊断命令
# 生成线程 Dump
jstack -l <pid> > thread_dump.txt
# 查看 HikariCP 线程
grep -A 20 "HikariPool" thread_dump.txt
# 统计线程状态
grep "java.lang.Thread.State" thread_dump.txt | sort | uniq -c
# 查看 WAITING 线程的完整堆栈
grep -B 5 "WAITING" thread_dump.txt | grep -A 10 "HTTP"
# 检测死锁
jstack -ld <pid> | grep -A 10 "deadlock"
六、监控告警配置
6.1 Prometheus 告警规则
groups:
- name: hikari_connection_pool_alerts
rules:
- alert: HikariPoolConnectionTimeout
expr: hikari_connections_waiting > 0
for: 1m
labels:
severity: critical
annotations:
summary: "HikariCP 连接获取超时"
description: "有 {{ $value }} 个线程在等待获取数据库连接"
- alert: HikariPoolExhaustion
expr: hikari_connections_active / hikari_connections_max > 0.9
for: 5m
labels:
severity: warning
annotations:
summary: "HikariCP 连接池耗尽"
description: "连接池使用率超过 90%,活跃: {{ $value }}"
- alert: HikariPoolLeakDetected
expr: increase(hikari_connections_leak[5m]) > 0
for: 1m
labels:
severity: error
annotations:
summary: "检测到数据库连接泄漏"
description: "HikariCP 检测到连接泄漏"
七、最佳实践建议
7.1 连接使用规范
| 规范 | 说明 |
|---|---|
| 使用 try-with-resources | 自动关闭连接,推荐方式 |
| 及时关闭 | 业务完成后立即关闭连接 |
| 避免长事务 | 尽量缩短事务持有时间 |
| 异常处理 | 确保在 finally 中关闭连接 |
| 连接复用 | 不要在循环中反复获取连接 |
7.2 连接池配置建议
| 配置项 | 推荐值 | 说明 |
|---|---|---|
maximumPoolSize | CPU核心数×2 | 根据数据库并发能力调整 |
minimumIdle | maximumPoolSize的50% | 保持适量预热连接 |
connectionTimeout | 30秒 | 足够等待时间 |
idleTimeout | 10分钟 | 非高峰时段回收 |
maxLifetime | 30分钟 | 防止连接老化 |
leakDetectionThreshold | 60秒 | 生产环境建议开启 |
7.3 监控检查清单
- 定期检查连接池状态
- 配置连接泄漏检测
- 设置超时告警
- 定期生成线程 Dump 分析
- 监控慢查询
- 检查数据库连接数限制
互动话题
您在排查数据库连接池问题时有哪些经验?是否遇到过连接泄漏导致的故障?欢迎在评论区分享!
