In this post, we will quickly go through how to shut down a Go app gracefully.
Before the how, let’s briefly look at the why:
- To do one-off work such as migrations
- For different deployment strategies like: Rolling, Blue/Green, Canary
What do we achieve with this?
- Avoid data loss: partial transactions, inconsistent states and corrupted files
- Active requests are processed and completed
- Plays well with orchestrators
- Clean up resources
How do we do this?
package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"

	"github.com/gofiber/fiber/v2"
)

// isShuttingDown is set once a shutdown signal arrives; the middleware reads it.
var isShuttingDown atomic.Bool

// GRACE_TIME_PERIOD is the internal shutdown timeout, in seconds.
const GRACE_TIME_PERIOD = 24

func main() {
	// Buffered so a signal is not dropped if it arrives before the receiver is ready.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

	done := make(chan struct{}, 1)
	go func() {
		sig := <-sigs
		fmt.Println(sig)
		isShuttingDown.Store(true)
		done <- struct{}{}
	}()

	app := fiber.New()

	// Reject new requests once shutdown has started.
	app.Use(func(c *fiber.Ctx) error {
		if isShuttingDown.Load() {
			return c.Status(503).SendString("Shutting Down!!")
		}
		return c.Next()
	})

	app.Get("/", func(c *fiber.Ctx) error {
		return c.SendString("Hello, World!")
	})
	app.Get("/slow-request", func(c *fiber.Ctx) error {
		time.Sleep(8 * time.Second)
		return c.SendString("Slow request Processed")
	})
	app.Get("/very-slow-request", func(c *fiber.Ctx) error {
		time.Sleep(40 * time.Second)
		return c.SendString("Very Slow request Processed")
	})
	app.Get("/health", func(c *fiber.Ctx) error {
		return c.SendString("Up")
	})

	// Serve in the background so main can wait for the signal.
	go func() {
		fmt.Println("starting the server")
		if err := app.Listen(":3000"); err != nil {
			fmt.Println("Error occurred while listening", err.Error())
		}
	}()

	<-done
	signal.Stop(sigs)
	fmt.Println("Initiating Graceful Shutdown")

	// Give active requests up to GRACE_TIME_PERIOD seconds to finish.
	err := app.ShutdownWithTimeout(GRACE_TIME_PERIOD * time.Second)
	if err != nil {
		fmt.Println("Error occurred while shutting down", err.Error())
	}
	fmt.Println("shutting down!!!")
}
Components
- Rest App & Handlers
- Signal Capture
- Middleware
- Close Connections
- Release resources
Rest App and Handlers
In our case, we are using Go Fiber for our REST app.
We have 4 endpoints:
- / - returns “Hello, World!”
- /slow-request - a slow endpoint that takes 8 seconds to complete
- /very-slow-request - a very slow endpoint that takes 40 seconds to complete
- /health - returns “Up”
Our goal, once we get a shutdown signal, is to:
- Block any new incoming requests
- Process active requests
Signal Capture
For graceful termination, the orchestrator sends the app a SIGTERM signal and gives it some grace time to finish its activities and shut down.
If the app takes longer than this grace period, the orchestrator kills it (typically with SIGKILL, which cannot be caught).
This grace period is configurable; in Kubernetes, for example, it defaults to 30 seconds.
In our implementation, we keep a 20% buffer and set the internal grace time to 24 seconds.
We listen for the SIGINT and SIGTERM signals.
Once we get either signal, we record that the app is shutting down in the flag “isShuttingDown”.
We then initiate the shutdown with a timeout.
Middleware
In the middleware, we check whether the app is shutting down.
If it is, we respond with a 503 and a message.
Otherwise, we let the request continue to the next stage, the handlers in this case.
This allows us to block new incoming requests.
Release resources
We also need to close connections to databases, caches and queues. In Go, we usually defer the close right after opening a connection. This plays well with what we want: once the shutdown completes and main returns, the deferred calls run and the connections are closed in the reverse order of opening.
Allocated memory, file descriptors and other OS-level resources are reclaimed automatically by the operating system when the process exits.
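The reverse-order behaviour of defer can be checked with a small sketch. The `conn` type below is a hypothetical stand-in for a real database, cache or queue client; it only records the order in which Close is called.

```go
package main

import "fmt"

// conn is a hypothetical stand-in for a database, cache or queue connection.
type conn struct {
	name   string
	closed *[]string // records the order in which connections are closed
}

// Close notes that this connection has been closed.
func (c conn) Close() {
	*c.closed = append(*c.closed, c.name)
}

// closeOrder opens three connections the way main would, defers each Close,
// and returns the order in which the deferred calls ran.
func closeOrder() []string {
	var closed []string
	func() {
		db := conn{"db", &closed}
		defer db.Close()
		cache := conn{"cache", &closed}
		defer cache.Close()
		queue := conn{"queue", &closed}
		defer queue.Close()
		// The server would run and shut down here.
	}()
	return closed
}

func main() {
	fmt.Println(closeOrder()) // [queue cache db]
}
```

Note that deferred calls run only when the function returns normally or panics. They are skipped if the program exits through os.Exit (which log.Fatal calls), so the shutdown path should let main return.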
What happens when a long-running request is initiated just before we get the shutdown signal?
In our current case, slow-request (8 seconds) completes within the 24-second grace time, whereas very-slow-request (40 seconds) is cut off prematurely.
We can handle this in a few ways:
- We can increase the grace time period, but before doing that we should ask whether our approach is correct. Are we doing async work in this request and treating it as synchronous? If yes, shouldn’t we change the architecture?
- If it is a read request, we can just let it fail and let the client decide how to handle the failure.
- If it is some task processing, the task can be marked as pending again so that it can be picked up later.
Why do we need a buffered channel for the signal channel?
The os/signal documentation for Notify says:
Package signal will not block sending to c: the caller must ensure
that c has sufficient buffer space to keep up with the expected
signal rate. For a channel used for notification of just one signal value,
a buffer of size 1 is sufficient.
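If the channel has no buffer and the receiver is not ready at the moment the signal arrives, the send does not block; the signal is simply dropped, and the app would miss the shutdown request.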
I found these resources really useful