AWS RDS Proxy - It's not set and forget

by Yuval Oren, Co-Founder / CEO

We helped a client tackle a tricky AWS RDS Proxy issue: API spikes overwhelmed their database despite the proxy. Here’s what we learned

Every once in a while, they have a surge of API calls that launch a large number of Lambda functions, which then clog the RDS connection pool.

Well, the obvious textbook solution is to put an AWS RDS Proxy in front of RDS so it reuses connections to the database and prevents this exact behavior.

We did this initially - We pointed the Lambda functions to the AWS RDS Proxy endpoint, which reduced the impact of the issue.

A while later, they reached out saying that they had a spike in API calls, which again caused an issue with the database.

We started investigating the problem by looking at the AWS RDS Proxy CloudWatch metrics:

AWS RDS Proxy CloudWatch metrics with pinned sessions

(This screenshot is from a lab environment that I created to demonstrate the issue and fix)

You can clearly see that the number of client connections matches the database connections, which means that the RDS Proxy didn’t do what it was intended to do.

This led us to look at another stat, DatabaseConnectionsCurrentlySessionPinned, which, as you can see in the screenshot, matches the client and database connections.

DatabaseConnectionsCurrentlySessionPinned shows, at any moment, how many of the proxy’s open database connections are “stuck” to a single client because the session contains state (for example, an open transaction, a prepared statement, or a temporary table). When this number is high, every Lambda or application thread is holding its own private connection, so the proxy can’t reuse them.

So, what is causing this behavior?

The answer is hiding in the code, and in our case, the Golang Lambda function.

On the surface, the SQL queries are fairly straightforward. Mostly simple SELECT and SET queries like this:

	var uname string
	if err := db.QueryRowContext(ctx,
		`SELECT username FROM users WHERE id = ?`, userID).
		Scan(&uname); err != nil {
		return Response{}, err
	}

Seems harmless enough. But turns out that sometimes even simple code requires adjustments to work well with the AWS RDS Proxy.

The issue is caused by the default behavior of the Golang mysql driver - which sends the query to the server to prepare the statement (and replace the “?” with the userID).

And that’s what is causing the pinned sessions.

The solution was simple - pass the interpolateParams=true parameter to the database connection string:

	dsnSuffix := "&interpolateParams=true"
	dsn := fmt.Sprintf("%s:%s@tcp(%s)/%s?charset=utf8mb4&parseTime=true&loc=UTC%s",
		os.Getenv("DB_USER"), os.Getenv("DB_PASS"),
		os.Getenv("DB_HOST"), os.Getenv("DB_NAME"), dsnSuffix)

Now the query is prepared on the client side instead of the server.

And this is what it looks like in our lab environment:

AWS RDS Proxy CloudWatch metrics with no pinned sessions

You can see a few interesting things (we ran the exact same stress test):

  1. We have way more client connections than before.
  2. Database connections are steady and lower than client connections.
  3. We have no pinned sessions.
  4. The number of queries is much higher.

This is because there are no errors, and the app can scale.

Now this is one potential fix for the problem, and there are more use cases where code can cause pinned sessions, but my main goal here is to show you that AWS RDS Proxy is not necessarily a set-and-forget solution, and your codebase will have to support it as well.

The full code example:

package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"fmt"
	"log"
	"os"

	"github.com/aws/aws-lambda-go/lambda"
	_ "github.com/go-sql-driver/mysql"
)

var db *sql.DB

func init() {
	// 👉 For the “pinned” variant leave dsnSuffix := ""
	// 👉 For the “pooled” variant use dsnSuffix := "&interpolateParams=true"
	dsnSuffix := "&interpolateParams=true"

	dsn := fmt.Sprintf("%s:%s@tcp(%s)/%s?charset=utf8mb4&parseTime=true&loc=UTC%s",
		os.Getenv("DB_USER"), os.Getenv("DB_PASS"),
		os.Getenv("DB_HOST"), os.Getenv("DB_NAME"), dsnSuffix)

	var err error
	db, err = sql.Open("mysql", dsn)
	if err != nil {
		log.Fatal(err)
	}

	db.SetMaxIdleConns(20)
	db.SetMaxOpenConns(50)
}

type Response struct {
	ID       int64  `json:"id"`
	Username string `json:"username"`
	Count    int64  `json:"hit_count"`
}

func HandleRequest(ctx context.Context) (Response, error) {
	const userID = 42

	var uname string
	if err := db.QueryRowContext(ctx,
		`SELECT username FROM users WHERE id = ?`, userID).
		Scan(&uname); err != nil {
		return Response{}, err
	}

	if _, err := db.ExecContext(ctx,
		`UPDATE users SET hit_count = hit_count+1 WHERE id = ?`, userID); err != nil {
		return Response{}, err
	}

	var count int64
	if err := db.QueryRowContext(ctx,
		`SELECT hit_count FROM users WHERE id = ?`, userID).
		Scan(&count); err != nil {
		return Response{}, err
	}

	return Response{ID: userID, Username: uname, Count: count}, nil
}

func main() {
	// aws‑lambda‑go wires the handler
	lambda.Start(HandleRequest)

	// (The json Marshal below prevents “imported but not used” if you edit)
	_, _ = json.Marshal(nil)
}

More articles

AWS Geo Blocking with CloudFront and WAF

_In this post, I will show you how to use CloudFront or AWS WAF to enable geo-blocking (blocking based on geographical location) for your website or APIs._

Read more

Our Approach to Onboarding New DevOps Hires

What's the best way to onboard a new DevOps hire? Instead of having them aimlessly go through Confluence we have a better approach.

Read more

Tell us about your project

Our offices

  • Israel
    5 Hatidhar St
    Ra'anana, Israel