
Notes on a Spring Boot graceful shutdown failure

Mr.Chen | About 3 min | article | Java | Spring Boot

Summary

Recently, pods in our production EKS (Elastic Kubernetes Service) cluster were occasionally being forcibly killed.

The project uses Spring Boot with graceful shutdown enabled and a shutdown timeout of 120 seconds. A forced kill therefore means the application failed to stop normally within those 120 seconds.

Log analysis showed the cause: auto scaling is enabled, and the application received a termination signal before it had finished starting. The startup thread and the shutdown thread then ran in parallel, so work that the shutdown thread had already stopped was started again by the startup thread, and graceful shutdown never completed. After 120 seconds the container was killed with kill -9.

Process analysis

Why claim that "the startup thread and the shutdown thread run in parallel, so stopped work gets started again"? Is Spring Boot really so crude that it cannot let the startup thread finish before running the shutdown thread?

With that question in mind, let's walk through the Spring source code and get to the bottom of it.

First, look at the Spring Boot startup process:

org.springframework.boot.SpringApplication#run(java.lang.String...)

public ConfigurableApplicationContext run(String... args) {
     StopWatch stopWatch = new StopWatch();
     stopWatch.start();
     ConfigurableApplicationContext context = null;
     Collection<SpringBootExceptionReporter> exceptionReporters = new ArrayList<>();
     configureHeadlessProperty();
     SpringApplicationRunListeners listeners = getRunListeners(args);
     listeners.starting();
     try {
       ApplicationArguments applicationArguments = new DefaultApplicationArguments(args);
       ConfigurableEnvironment environment = prepareEnvironment(listeners, applicationArguments);
       configureIgnoreBeanInfo(environment);
       Banner printedBanner = printBanner(environment);
       // 1. Create ApplicationContext
       context = createApplicationContext();
       exceptionReporters = getSpringFactoriesInstances(SpringBootExceptionReporter.class,
           new Class[] { ConfigurableApplicationContext.class }, context);
       prepareContext(context, environment, listeners, applicationArguments, printedBanner);
       // 2. Initialize the beans, then publish ContextRefreshedEvent
       refreshContext(context);
       afterRefresh(context, applicationArguments);
       stopWatch.stop();
       if (this.logStartupInfo) {
         new StartupInfoLogger(this.mainApplicationClass).logStarted(getApplicationLog(), stopWatch);
       }
       // 3. Publish ApplicationStartedEvent to the ApplicationListeners
       listeners.started(context);
       // 4. Execute ApplicationRunner and CommandLineRunner
       callRunners(context, applicationArguments);
     }
     catch (Throwable ex) {
       handleRunFailure(context, ex, exceptionReporters, listeners);
       throw new IllegalStateException(ex);
     }

     try {
       // 5. Publish the ApplicationReadyEvent event
       listeners.running(context);
     }
     catch (Throwable ex) {
       handleRunFailure(context, ex, exceptionReporters, null);
       throw new IllegalStateException(ex);
     }
     return context;
   }

As the code above shows, the main Spring Boot startup flow takes no lock. In other words, if a shutdown signal arrives halfway through startup, the startup thread keeps going while the shutdown thread runs at the same time. Wouldn't that be a mess?

With that doubt in mind, we dig deeper into the source code, following refreshContext(context) inward:

org.springframework.context.support.AbstractApplicationContext#refresh

public void refresh() throws BeansException, IllegalStateException {
     synchronized(this.startupShutdownMonitor) {
         this.prepareRefresh();
         ConfigurableListableBeanFactory beanFactory = this.obtainFreshBeanFactory();
         this.prepareBeanFactory(beanFactory);

         try {
             this.postProcessBeanFactory(beanFactory);
             this.invokeBeanFactoryPostProcessors(beanFactory);
             this.registerBeanPostProcessors(beanFactory);
             this.initMessageSource();
             this.initApplicationEventMulticaster();
             this.onRefresh();
             this.registerListeners();
             this.finishBeanFactoryInitialization(beanFactory);
             this.finishRefresh();
         } catch (BeansException var9) {
             if (this.logger.isWarnEnabled()) {
                 this.logger.warn("Exception encountered during context initialization - canceling refresh attempt: " + var9);
             }

             this.destroyBeans();
             this.cancelRefresh(var9);
             throw var9;
         } finally {
             this.resetCommonCaches();
         }

     }
}

Then look at the close() method of the same class:

public void close() {
     synchronized(this.startupShutdownMonitor) {
         this.doClose();
         if (this.shutdownHook != null) {
             try {
                 Runtime.getRuntime().removeShutdownHook(this.shutdownHook);
             } catch (IllegalStateException var4) {
             }
         }

     }
}

See it? Both methods are wrapped in a synchronized block, and in both cases the lock object is startupShutdownMonitor.

That makes things clear: Spring Boot only guarantees mutual exclusion for individual stages such as refresh and close. It does not guarantee that the startup process as a whole runs without being interrupted.
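The effect of sharing one monitor can be reproduced without Spring. The following is a minimal sketch, not Spring's implementation: the class MonitorOrderingDemo, its event log, and the 200 ms sleep are all hypothetical stand-ins, and only the locking is modeled. The "shutdown" thread calls close() while refresh() is still holding the monitor, so close() has to wait.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a context: only the shared-monitor locking is modeled.
class MonitorOrderingDemo {
    private final Object startupShutdownMonitor = new Object();
    private final List<String> events = new CopyOnWriteArrayList<>();
    private final CountDownLatch refreshHoldsLock = new CountDownLatch(1);

    void refresh() {
        synchronized (startupShutdownMonitor) {
            events.add("refresh:start");
            refreshHoldsLock.countDown();   // from here on the monitor is held
            sleepQuietly(200);              // simulate slow bean initialization
            events.add("refresh:end");
        }
    }

    void close() {
        synchronized (startupShutdownMonitor) { // blocks while refresh() holds the monitor
            events.add("close");
        }
    }

    // Runs refresh() on the calling thread; a second thread calls close() mid-refresh.
    static List<String> run() {
        MonitorOrderingDemo ctx = new MonitorOrderingDemo();
        Thread shutdown = new Thread(() -> {
            awaitQuietly(ctx.refreshHoldsLock); // the "signal" arrives during refresh
            ctx.close();
        }, "shutdown-hook");
        shutdown.start();
        ctx.refresh();
        joinQuietly(shutdown);
        return ctx.events;
    }

    private static void sleepQuietly(long millis) {
        try { Thread.sleep(millis); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try { latch.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    private static void joinQuietly(Thread thread) {
        try { thread.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [refresh:start, refresh:end, close]
    }
}
```

The close entry always lands after refresh:end, which is the guarantee the shared monitor provides. Anything that runs outside the synchronized block gets no such guarantee.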

If a shutdown signal arrives while the refresh method is executing, the close method runs only after refresh has finished. So if a custom thread pool is started on ContextRefreshedEvent and closed on ContextClosedEvent, the order is guaranteed. But if the thread pool is started on ApplicationStartedEvent, in an ApplicationRunner or CommandLineRunner, or on ApplicationReadyEvent, the ContextClosedEvent handler may run before those start hooks do. In that case you have to enforce the ordering yourself.
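The failing interleaving can be sketched in plain Java. This is an illustrative model under stated assumptions, not Spring code: LateStartRaceDemo, onReady() and onClosed() are hypothetical names standing in for an unguarded ApplicationReadyEvent listener and a ContextClosedEvent listener. The interleaving is forced by joining the "shutdown" thread before the start hook runs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model of a start hook that runs after the close hook has already fired.
class LateStartRaceDemo {
    private volatile ExecutorService pool;

    // Stands in for an unguarded ApplicationReadyEvent listener.
    void onReady() {
        pool = Executors.newSingleThreadExecutor();
    }

    // Stands in for a ContextClosedEvent listener.
    void onClosed() {
        ExecutorService current = pool;
        if (current != null) {
            current.shutdown();
        }
    }

    // Returns true when the pool is still accepting work after "close" has finished.
    static boolean poolSurvivesClose() {
        LateStartRaceDemo app = new LateStartRaceDemo();
        Thread shutdown = new Thread(app::onClosed, "shutdown-hook");
        shutdown.start();
        try {
            shutdown.join();            // the close hook finishes first and finds nothing to stop
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        app.onReady();                  // the startup thread carries on and starts the pool
        boolean survived = !app.pool.isShutdown();
        app.pool.shutdownNow();         // clean up so the demo itself can exit
        return survived;
    }

    public static void main(String[] args) {
        System.out.println("pool still running after close: " + poolSurvivesClose());
    }
}
```

In this model nobody is left to stop the pool once the close hook has run, which matches the symptom described in the summary. The listing that follows closes the gap with a pair of flags and a shared monitor.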

private final AtomicBoolean started = new AtomicBoolean(false);
private final AtomicBoolean stopped = new AtomicBoolean(false);
private final Object startupShutdownMonitor = new Object();
private final ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(1);

@Inject
private ThreadPoolExecutor threadPoolExecutor;
@Inject
private RedisTemplate<String, String> redisTemplate;

@EventListener
public void start(ApplicationReadyEvent event) {
   synchronized (this.startupShutdownMonitor) {
     if (stopped.get() || started.getAndSet(true)) {
       return;
     }
     // Start the scheduled task
     scheduledExecutorService.scheduleAtFixedRate(() -> {
       // Get tasks from the redis queue
       List<String> list = redisTemplate.opsForList().range("key", 0, 10);
       if (list != null) {
         // Start the thread pool to execute tasks asynchronously
         list.forEach(e -> threadPoolExecutor.execute(() -> {
           // execute the task
         }));
       }
     }, 0, 10, TimeUnit.HOURS);
   }
}

@EventListener
public void stop(ContextClosedEvent event) {
   synchronized (this.startupShutdownMonitor) {
     if (stopped.getAndSet(true) || !started.get()) {
       return;
     }
     // Stop the scheduled task first; tasks already queued in the thread pool keep running
     scheduledExecutorService.shutdown();
   }
}

@PreDestroy
public void close() {
   synchronized (this.startupShutdownMonitor) {
     // Finally close the thread pool
     threadPoolExecutor.shutdownNow();
   }
}
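
To check both orderings without a Spring context, the same guard can be exercised in isolation. The sketch below is a simplified, hypothetical variant of the code above: GuardedLifecycleDemo, the boolean return values, and the empty scheduled task are assumptions added for illustration, and the Redis and thread pool parts are left out.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, Spring-free variant of the start/stop guard.
class GuardedLifecycleDemo {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private final Object startupShutdownMonitor = new Object();
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

    // Returns true only if this call actually started the scheduled task.
    boolean start() {
        synchronized (startupShutdownMonitor) {
            if (stopped.get() || started.getAndSet(true)) {
                return false;           // already stopped, or already started
            }
            scheduler.scheduleAtFixedRate(() -> { /* placeholder task */ }, 0, 10, TimeUnit.HOURS);
            return true;
        }
    }

    // Returns true only if this call actually shut the scheduler down.
    boolean stop() {
        synchronized (startupShutdownMonitor) {
            if (stopped.getAndSet(true) || !started.get()) {
                return false;           // already stopped, or never started
            }
            scheduler.shutdown();
            return true;
        }
    }

    boolean schedulerShutDown() {
        return scheduler.isShutdown();
    }

    public static void main(String[] args) {
        GuardedLifecycleDemo late = new GuardedLifecycleDemo();
        System.out.println(late.stop());    // false: nothing was running yet
        System.out.println(late.start());   // false: start after stop is refused

        GuardedLifecycleDemo normal = new GuardedLifecycleDemo();
        System.out.println(normal.start()); // true
        System.out.println(normal.stop());  // true
    }
}
```

In the stop-before-start ordering the scheduler is never shut down, but it has not created a worker thread either, because the pool only creates threads once a task is submitted.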