Automating RDS Password Rotation with Fargate Task Updates

AWS Secrets Manager offers automatic password rotation for RDS databases, which is an opportunity to strengthen security. However, rotating passwords without disrupting connectivity for applications running on ECS Fargate is challenging. In this article, we’ll discuss a solution that addresses this by using multiple database users, scheduled rotations, and a Lambda function to update passwords smoothly without any application code changes.

Let’s get started!

Let’s understand the scenario:

Current Architecture Overview:

  • Application platform: ECS Fargate cluster

  • Database: RDS instance

  • Password management for the Fargate task: the task fetches the database password from Secrets Manager

Requirements: To add a layer of security, the database password must be rotated monthly. AWS Secrets Manager’s automatic rotation can update the RDS database password seamlessly. However, the application reads the password from an environment variable that is set when the container is created, so a running container keeps the old value and may lose connectivity after a rotation.

The Challenge: The application relies on environment variables for database connectivity, so a password rotation could disrupt the running containers. Although AWS provides options like automatic password refresh and retry mechanisms, rewriting the application’s database connection logic is not feasible in our case. Therefore, as DevOps engineers, we need an alternative solution that requires no application code changes and avoids denied database calls during rotation.

Refer to the AWS-provided solution: Rotate RDS Password without restarting the container.

As per the AWS blog, our issue here is the part in bold:

“When the secret rotates, open database connections are not dropped. While rotation is happening, there is a short period between when the password in the database changes and when the secret is updated. **During this time, there is a low risk of the database denying calls that use the rotated credentials. You can mitigate this risk with an [appropriate retry strategy](aws.amazon.com/blogs/architecture/exponenti..).** After rotation, new connections use the new credentials.”
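For context, the retry strategy the AWS blog points to is usually some form of exponential backoff with jitter. The sketch below is only an illustration of that idea, not code from the AWS blog or from our application; the function name `call_with_backoff` and its parameters are hypothetical.

```python
import random
import time


def call_with_backoff(operation, max_attempts=5, base_delay=0.2,
                      max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff and full jitter.

    `operation` is any zero-argument callable (hypothetical here), for
    example one that re-reads the secret and opens a database connection.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Double the delay ceiling on each attempt, capped at max_delay
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time between 0 and the ceiling
            sleep(random.uniform(0, ceiling))
```

This is exactly the kind of change that would have to live inside the application’s connection logic, which is why it was not an option for us.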

The AWS blog clearly states the risk and the mitigation, which is retry logic in the application code. In our situation, however, we cannot touch the application code, and we still need to mitigate the risk of denied database calls during the password rotation.

Here is how we solved this challenging scenario:

Solution Overview:

This approach ensures that we never rotate the password of the user currently being used by the container, minimizing disruptions to the system’s functionality. By implementing this method, we enhance security while maintaining seamless operation.

The strategy is simple. Let me try to explain.

1. Create two database users with the same permissions:
a). Blue User: Password rotation scheduled on the 1st day of every month.
b). Green User: Password rotation scheduled on the 15th day of every month.

2. The ECS task always uses the most recently rotated user. To achieve this, we rotate the password of the user that is not currently used by the ECS task, then restart the container. The recently rotated user becomes active, and the cycle continues.

Because each user’s password is rotated only while the other user is active, a running container never holds credentials that are in the middle of a rotation.
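To make the alternation concrete, here is a small simulation of the schedule. It is only a sketch: `ROTATION_DAYS` mirrors the 1st/15th schedule described above, while the `simulate` function and its tuple layout are made up for illustration.

```python
# Day of month -> database user whose password is rotated on that day
ROTATION_DAYS = {1: "blue", 15: "green"}


def simulate(days, initial_active="green"):
    """Replay rotation days in order.

    Returns a list of (day, rotated_user, active_before, active_after).
    """
    active = initial_active
    timeline = []
    for day in days:
        rotated = ROTATION_DAYS.get(day)
        if rotated is None:
            continue  # nothing is scheduled on this day
        # The invariant the strategy relies on: the rotated user is idle
        assert rotated != active, "rotated the user the containers are using"
        timeline.append((day, rotated, active, rotated))
        # The redeployment switches the tasks to the freshly rotated user
        active = rotated
    return timeline


for step in simulate([1, 15, 1, 15]):
    print(step)
```

Note that the invariant only holds if the starting state is right: with Blue rotating on the 1st, the service has to be running as Green before the first rotation fires.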

I know this might be confusing :-( Let me explain with a scenario (please keep an eye on the flow chart for better understanding):

Imagine the currently active database user in the ECS task is Green.
On the 1st day of the month, the rotation of user “Blue” starts (as per the fixed schedule) → the AWS-managed Lambda function takes care of the password rotation on the RDS server → this password rotation event triggers the “Custom Lambda”.

Custom Lambda Function Logic: To orchestrate the password rotation and ECS update process, we will employ a Lambda function triggered by a CloudWatch event. Here’s the logic it follows:

  1. Trigger: RDS user rotation event → check the current date → if it’s the 1st day of the month, update the active user parameter to “Blue” in the Parameter Store; if it’s the 15th day of the month, update it to “Green”.

  2. Trigger an ECS service redeployment using a rolling deployment:

  • When a container starts, it pulls the current username from the Parameter Store and retrieves the active user’s password from Secrets Manager.
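The article does not show how the container performs this lookup, so the following is only one possible sketch. It assumes a small startup wrapper that runs before the application, one secret per database user named with a hypothetical `rds/{user}` scheme, and the standard Secrets Manager RDS secret format (a JSON document with `username` and `password` keys). The function name `resolve_db_credentials` is also made up; `activeuser` is the parameter name used by the Lambda in this article.

```python
import json


def resolve_db_credentials(ssm_client, secrets_client,
                           parameter_name="activeuser",
                           secret_id_template="rds/{user}"):
    """Return (username, password) for the currently active database user."""
    # Which user is active right now? The custom Lambda maintains this value.
    parameter = ssm_client.get_parameter(Name=parameter_name)
    active_user = parameter["Parameter"]["Value"]

    # Assumption: one secret per user, e.g. "rds/blue" and "rds/green"
    secret = secrets_client.get_secret_value(
        SecretId=secret_id_template.format(user=active_user)
    )

    # RDS secrets are stored as a JSON string
    payload = json.loads(secret["SecretString"])
    return payload["username"], payload["password"]
```

In a real task you would pass `boto3.client("ssm")` and `boto3.client("secretsmanager")` as the two clients and hand the result to the application as environment variables, so the application itself stays unchanged.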

Custom Lambda Code:

import boto3
import datetime

def lambda_rotater(event, context):
    # Specify your AWS region
    region = 'ca-central-1'

    # Specify your ECS cluster and service details
    cluster_name = 'backend-cluster'
    service_name = 'backendservice'

    # Specify SSM parameter details
    parameter_name = 'activeuser'

    # Create ECS and SSM clients
    ecs_client = boto3.client('ecs', region_name=region)
    ssm_client = boto3.client('ssm', region_name=region)

    try:
        # Get the current day of the month (explicitly in UTC, which is what Lambda uses)
        current_day = datetime.datetime.now(datetime.timezone.utc).day

        # Map the current day to the parameter value
        parameter_map = {
            1: 'blue',
            15: 'green'
        }

        # Get the parameter value from the map
        parameter_value = parameter_map.get(current_day)

        if parameter_value is None:
            return {
                'statusCode': 200,
                'body': 'No update required for ECS service.'
            }

        # Update the SSM parameter
        response = ssm_client.put_parameter(
            Name=parameter_name,
            Value=parameter_value,
            Type='String',
            Overwrite=True
        )
        print("SSM parameter updated:", response)

        # Describe the ECS service to get the current task definition
        response = ecs_client.describe_services(
            cluster=cluster_name,
            services=[service_name]
        )
        print(response)

        # Get the current task definition ARN
        task_definition_arn = response['services'][0]['taskDefinition']
        print("Task definition ARN:", task_definition_arn)

        # Update the service with the same task definition to trigger a redeployment
        response = ecs_client.update_service(
            cluster=cluster_name,
            service=service_name,
            taskDefinition=task_definition_arn,
            forceNewDeployment=True
        )
        print("Response of task update:", response)

        return {
            'statusCode': 200,
            'body': 'ECS service redeployment initiated successfully.'
        }
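    except Exception as e:
        # Log the failure so it is visible in CloudWatch Logs
        print("Error while updating parameter or ECS service:", e)
        return {
            'statusCode': 500,
            'body': f'Error: {str(e)}'
        }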
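One thing to watch with the date check in the function above: if a rotation finishes after midnight UTC, or is triggered manually on another day, the parameter is not updated. A possible alternative is to derive the user from the secret that actually rotated. The sketch below assumes the triggering event is a Secrets Manager `RotationSucceeded` event delivered through CloudTrail, with the secret ARN under `detail.additionalEventData.SecretId`; please verify this shape against a real event in your account. The secret names `rds/blue` and `rds/green` and the function name are hypothetical.

```python
def active_user_from_event(event, secret_to_user):
    """Return the user whose secret just rotated, or None for other events.

    `secret_to_user` maps a fragment of the secret name to a user label,
    e.g. {"rds/blue": "blue", "rds/green": "green"} (hypothetical names).
    """
    detail = event.get("detail", {})
    # Only act once the rotation has fully completed
    if detail.get("eventName") != "RotationSucceeded":
        return None
    # Assumed location of the rotated secret's ARN in the event
    secret_id = detail.get("additionalEventData", {}).get("SecretId", "")
    for name_fragment, user in secret_to_user.items():
        if name_fragment in secret_id:
            return user
    return None
```

The returned value could then replace `parameter_map.get(current_day)` in the handler, with the rest of the function unchanged.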

Disclaimer:
The solution presented in this article is based on a hypothetical scenario and may require customization based on specific application requirements.
