记一个 SpringBoot 优雅停机失败问题
概要
最近,生产环境 EKS(Elastic Kubernetes Service)偶尔出现 POD 被强杀问题。
项目是 SpringBoot 框架,启动了优雅停机,并且配置了停机时长为 120 秒。既然是被强杀,那肯定是因为在 120 秒内没有正常停掉应用。
经过日志分析,是由于开了弹性伸缩,项目还没启动完就收到了 kill 信号,这时候启动线程和停机线程并行跑,导致被停掉的线程又被启动起来,最终导致优雅停机失败,过了 120 秒后,容器被 kill -9 了。
过程分析
为什么说“启动线程和停机线程并行跑,导致被停掉的线程又被启动起来”,难道 SpringBoot 就这么 LOW ,就不知道让启动线程跑完再执行停机线程?
带着问题,我们一起翻一下 Spring 源码,来个刨根问底。
首先来看一下 SpringBoot 的启动过程
org.springframework.boot.SpringApplication#run(java.lang.String...)
public ConfigurableApplicationContext run(String... args) {
StopWatch stopWatch = new StopWatch();
stopWatch.start();
ConfigurableApplicationContext context = null;
Collection<SpringBootExceptionReporter> exceptionReporters = new ArrayList<>();
configureHeadlessProperty();
SpringApplicationRunListeners listeners = getRunListeners(args);
listeners.starting();
try {
ApplicationArguments applicationArguments = new DefaultApplicationArguments(args);
ConfigurableEnvironment environment = prepareEnvironment(listeners, applicationArguments);
configureIgnoreBeanInfo(environment);
Banner printedBanner = printBanner(environment);
// 1.创建ApplicationContext
context = createApplicationContext();
exceptionReporters = getSpringFactoriesInstances(SpringBootExceptionReporter.class,
new Class[] { ConfigurableApplicationContext.class }, context);
prepareContext(context, environment, listeners, applicationArguments, printedBanner);
// 2.初始化Bean,完成后会发布ContextRefreshedEvent事件
refreshContext(context);
afterRefresh(context, applicationArguments);
stopWatch.stop();
if (this.logStartupInfo) {
new StartupInfoLogger(this.mainApplicationClass).logStarted(getApplicationLog(), stopWatch);
}
// 3.执行ApplicationListener,完成后发布ApplicationStartedEvent事件
listeners.started(context);
// 4.执行ApplicationRunner和CommandLineRunner
callRunners(context, applicationArguments);
}
catch (Throwable ex) {
handleRunFailure(context, ex, exceptionReporters, listeners);
throw new IllegalStateException(ex);
}
try {
// 5.发布ApplicationReadyEvent事件
listeners.running(context);
}
catch (Throwable ex) {
handleRunFailure(context, ex, exceptionReporters, null);
throw new IllegalStateException(ex);
}
return context;
}
从上面代码可以看出来,SpringBoot 主要启动流程中是没有加锁的,那就是说当启动了一半,这时候收到了 shutdown 信号,那就会一边继续启动,一边又关闭线程。那不就乱套了吗?
带着疑问,我们继续往深处翻看源码。从refreshContext(context)
往里面跟,一直跟到
org.springframework.context.support.AbstractApplicationContext#refresh
public void refresh() throws BeansException, IllegalStateException {
synchronized(this.startupShutdownMonitor) {
this.prepareRefresh();
ConfigurableListableBeanFactory beanFactory = this.obtainFreshBeanFactory();
this.prepareBeanFactory(beanFactory);
try {
this.postProcessBeanFactory(beanFactory);
this.invokeBeanFactoryPostProcessors(beanFactory);
this.registerBeanPostProcessors(beanFactory);
this.initMessageSource();
this.initApplicationEventMulticaster();
this.onRefresh();
this.registerListeners();
this.finishBeanFactoryInitialization(beanFactory);
this.finishRefresh();
} catch (BeansException var9) {
if (this.logger.isWarnEnabled()) {
this.logger.warn("Exception encountered during context initialization - cancelling refresh attempt: " + var9);
}
this.destroyBeans();
this.cancelRefresh(var9);
throw var9;
} finally {
this.resetCommonCaches();
}
}
}
然后再看下这个类的close()
方法
public void close() {
synchronized(this.startupShutdownMonitor) {
this.doClose();
if (this.shutdownHook != null) {
try {
Runtime.getRuntime().removeShutdownHook(this.shutdownHook);
} catch (IllegalStateException var4) {
}
}
}
}
看出什么来了吗?是的,两个方法都加了synchronized
同步锁,锁对象都是startupShutdownMonitor
。
这就明白了,原来 SpringBoot 只会保证某一个节点的原子性,而不会保证整个启动过程不被打断。
当正在执行 refresh 方法时,刚好接收到 shutdown 信号,会等 refresh 方法执行完,再执行 close 方法。那就是说,如果自定义线程池是在 ContextRefreshedEvent 事件中启动的,且在 ContextClosedEvent 事件中关闭的,那是能保证顺序的。但如果线程池是在 ApplicationStartedEvent 、ApplicationRunner 、CommandLineRunner 和 ApplicationReadyEvent 事件中启动的,那就有可能先执行了 ContextClosedEvent 事件方法再执行这些事件。这时候,就要自己做好执行顺序的保障。
private final AtomicBoolean started = new AtomicBoolean(false);
private final AtomicBoolean stopped = new AtomicBoolean(false);
private final Object startupShutdownMonitor = new Object();
private final ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(1);
@Inject
private ThreadPoolExecutor threadPoolExecutor;
@Inject
private RedisTemplate<String, String> redisTemplate;
@EventListener
public void start(ApplicationReadyEvent event) {
synchronized (this.startupShutdownMonitor) {
if (stopped.get() || started.getAndSet(true)) {
return;
}
// 启动定时任务
scheduledExecutorService.scheduleAtFixedRate(() -> {
// 从redis队列里获取任务
List<String> list = redisTemplate.opsForList().range("key", 0, 10);
if (list != null) {
// 启动线程池异步执行任务
list.forEach(e -> threadPoolExecutor.execute(() -> {
// 执行任务
}));
}
}, 0, 10, TimeUnit.HOURS);
}
}
@EventListener
public void stop(ContextClosedEvent event) {
synchronized (this.startupShutdownMonitor) {
if (stopped.getAndSet(true) || !started.get()) {
return;
}
// 先停止定时任务,让线程池队列里的任务继续执行
scheduledExecutorService.shutdown();
}
}
@PreDestroy
public void close() {
synchronized (this.startupShutdownMonitor) {
// 最后关闭线程池
threadPoolExecutor.shutdownNow();
}
}