In this post, we will quickly go through how to implement graceful shutdown in Go apps.

Before the how, let's briefly look at the why. Running instances of an app get stopped regularly:

To run one-off work such as migrations
During deployments, whatever the strategy: Rolling, Blue/Green or Canary

What do we achieve with this?

Avoid data loss: partial transactions, inconsistent states and corrupted files
Active requests are processed and completed
Plays well with orchestrators
Clean up resources

How do we do this?

package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"

	"github.com/gofiber/fiber/v2"
)

// isShuttingDown is set once a shutdown signal arrives; the middleware reads it.
var isShuttingDown atomic.Bool

// gracePeriod is 80% of the orchestrator's 30-second default grace period.
const gracePeriod = 24 * time.Second

func main() {
	// Buffered so that a signal is not dropped if nobody is receiving yet.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

	done := make(chan struct{}, 1)

	go func() {
		sig := <-sigs
		fmt.Println("received signal:", sig)

		// From now on, the middleware rejects new requests.
		isShuttingDown.Store(true)

		done <- struct{}{}
	}()

	app := fiber.New()

	// Reject new requests with 503 once shutdown has started.
	app.Use(func(c *fiber.Ctx) error {
		if isShuttingDown.Load() {
			return c.Status(fiber.StatusServiceUnavailable).SendString("Shutting Down!!")
		}
		return c.Next()
	})

	app.Get("/", func(c *fiber.Ctx) error {
		return c.SendString("Hello, World!")
	})

	app.Get("/slow-request", func(c *fiber.Ctx) error {
		time.Sleep(8 * time.Second)
		return c.SendString("Slow request Processed")
	})

	app.Get("/very-slow-request", func(c *fiber.Ctx) error {
		time.Sleep(40 * time.Second)
		return c.SendString("Very Slow request Processed")
	})

	app.Get("/health", func(c *fiber.Ctx) error {
		return c.SendString("Up")
	})

	go func() {
		fmt.Println("starting the server")
		if err := app.Listen(":3000"); err != nil {
			fmt.Println("Error occurred while listening:", err)
		}
	}()

	// Block until a shutdown signal has been received.
	<-done
	// Stop relaying further signals to sigs.
	signal.Stop(sigs)

	fmt.Println("Initiating Graceful Shutdown")
	// Stop accepting connections and wait up to gracePeriod for active requests.
	if err := app.ShutdownWithTimeout(gracePeriod); err != nil {
		fmt.Println("Error occurred while shutting down:", err)
	}
	// Returning from main runs any deferred Close calls.
	fmt.Println("shutting down!!!")
}

Components

REST App and Handlers
Signal Capture
Middleware
Close Connections
Release resources

REST App and Handlers

In our case, we are using Go Fiber (github.com/gofiber/fiber/v2) for our REST app.

We have 4 endpoints:

/ - returns “Hello, World!”
/slow-request - a slow endpoint that takes 8 seconds to complete
/very-slow-request - a very slow endpoint that takes 40 seconds to complete
/health - returns “Up”

Once we get a shutdown signal, our goal is to:

Block any new incoming requests
Let active requests finish processing

Signal Capture

For graceful termination, the orchestrator sends the app a SIGTERM signal and gives it a grace period to finish its work before exiting.

If the app takes longer than this grace period, the orchestrator kills it forcefully (with SIGKILL, which the app cannot catch).

This grace period is configurable and commonly defaults to 30 seconds (for example, Kubernetes' terminationGracePeriodSeconds).

In our implementation, we keep a 20% buffer and set the internal grace period to 24 seconds (80% of 30 seconds).

We listen for the SIGINT and SIGTERM signals.

Once we receive either of them, we record that the app is shutting down by setting the atomic flag isShuttingDown.

The goroutine then notifies main through the done channel, and main calls app.ShutdownWithTimeout with our grace period.

Middleware

In the middleware, we check whether the app is shutting down.

If it is, we respond with 503 Service Unavailable and a short message.

Otherwise, we pass the request on to the next stage, the route handlers in this case.

This allows us to block new incoming requests.

Release resources

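We also need to close connections to databases, caches and queues. In Go, we usually defer the Close call right after opening a connection. This fits what we want: once shutdown completes and main returns, the deferred calls run and the connections are closed in the reverse order they were opened. Note that deferred calls only run when the function returns normally. os.Exit (and therefore log.Fatal) skips them, which is another reason to return from main after shutdown.

Allocated memory, file descriptors and other OS-level resources are automatically reclaimed by the operating system when the process exits.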

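What happens when a long-running request starts just before we get the shutdown signal?

In our current setup, /slow-request (8 seconds) finishes within the 24-second grace period, whereas /very-slow-request (40 seconds) cannot. Once ShutdownWithTimeout gives up, main returns, the process exits and that request is cut off prematurely.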

We can handle this in a few ways:

We can increase the grace period, but before doing that we should check whether our approach is right. Are we doing asynchronous work inside this request and treating it as synchronous? If yes, we should probably change the architecture instead.
If it is a read request, we can just let it fail and let the client decide how to handle the failure (for example, by retrying).
If it is task processing, the task can be marked as pending again so that it is picked up later.

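Why do we need a buffered channel for the signal channel?

From the os/signal documentation for signal.Notify:

"Package signal will not block sending to c: the caller must ensure that c has sufficient buffer space to keep up with the expected signal rate. For a channel used for notification of just one signal value, a buffer of size 1 is sufficient."

Without a buffer, a signal that arrives while our goroutine is not yet waiting on the channel is dropped rather than delivered later. The app could miss the SIGTERM entirely and be killed once the grace period runs out. go vet also flags unbuffered channels passed to signal.Notify.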
