Giter Site home page Giter Site logo

nemo64 / dbal-rds-data Goto Github PK

View Code? Open in Web Editor NEW
27.0 6.0 1.0 188 KB

A driver to use the aws aurora serverless rds data api in the doctrine database abstraction layer

License: MIT License

PHP 100.00%
aws aws-aurora doctrine-dbal aws-rds database aurora aurora-mysql doctrine

dbal-rds-data's Introduction

Packagist Version GitHub Workflow Status Packagist License Packagist Downloads

doctrine driver for the Aurora Serverless rds data api

This is a driver to use the aws rds-data api on projects that are using dbal for their database access.

It emulates a MySQL connection including transactions. However: the driver does never establish a persistent connection.

This is experimental. I implemented it in a symfony project with the doctrine orm and with this driver it worked fine. I tested the schema tools, migrations and transactions.

At this moment, I wouldn't recommend using the rds-data api at all because of it's 1000 request per second limitation that they secretly added at the end of 2020. It is listed as "Data API requests per second" in your service quota list but not in the documentation.

Why would you use it?

  • The data api makes it possible to use a database in an aws hosting environment without the need for VPC's which add complexity and cost money if you need internet access though NAT Gateways.
  • Your application does not need the database password in plain text. You just need access to the aws api which can be managed a lot better. (there are other ways to achieve the same but still, it is really easy with the data api)
  • There might be a performance benefit due to not needing to establish a direct database connection and automatic pool management.

Why wouldn't you use it?

  • This implementation hasn't been battle tested for 20 years. Have a look into the Implementation Details section and see if you are comfortable with them.
  • Performance while running many small queries is probably the biggest issue. The rds data api has (as of writing this) a not well documented limit of 1000 queries per second per account. If you have ever worked with the doctrine orm, then you know that that's not a lot, especially if you run unoptimized background jobs. The aws sdk will retry queries, which just means that everything will become slow.
  • The rds-data api has size restrictions in the ExecuteStatement call which might become a problem when your application grows although they don't seem to be enabled at this moment.
  • The rds-data api is not available everywhere. This limitation is slowly getting lifted though.
  • The rds-data api has some inherit limitations base on the fact that it is mostly stateless. The biggest problem is that you can't set (session) variables. This also means you can't setTransactionIsolation levels although that is an optional feature in dbal anyways. You can still use normal locking in transactions though.
  • The rds-data api is only available with Aurora Serverless and this library also limits you to MySQL mode. If you plan on using other databases then you can't use the rds-data api and this library (yet). Here are alternatives you might want to consider:
    • Aurora Serverless in Postgres mode (although this can probably very easily be added here, I'm open to pull requests)
    • Aurora Classic to get an SLA or to benefit from reserved instance pricing on predictable workloads
    • Aurora Global for better availability and all the benefits of Aurora Classic
    • or even normal RDS to save money or use engines that are not emulated by Aurora

How to use it

First you must store your database credentials as a secret including the username. Then make sure to correctly configure access to your database to use the secret and the database. If you create a iam user then there is a "AmazonRDSDataFullAccess" policy that can be used directly.

If you use dbal directly than this is the way:

<?php
$connectionParams = array(
    'driverClass' => \Nemo64\DbalRdsData\RdsDataDriver::class,
    'host' => 'eu-west-1', // the aws region
    'user' => '[aws-api-key]', // optional if it is defined in the environment 
    'password' => '[aws-api-secret]', // optional if it is defined in the environment
    'dbname' => 'mydb',
    'driverOptions' => [
        'resourceArn' => 'arn:aws:rds:eu-west-1:012345678912:cluster:database-1np9t9hdbf4mk',
        'secretArn' => 'arn:aws:secretsmanager:eu-west-1:012345678912:secret:db-password-tSo334',
    ]
);
$conn = \Doctrine\DBAL\DriverManager::getConnection($connectionParams);

Or use the short url syntax which is easier to change using environment variables:

<?php
$connectionParams = array(
    'driverClass' => \Nemo64\DbalRdsData\RdsDataDriver::class,
    'url' => '//eu-west-1/mydb'
        . '?driverOptions[resourceArn]=arn:aws:rds:eu-west-1:012345678912:cluster:database-1np9t9hdbf4mk'
        . '&driverOptions[secretArn]=arn:aws:secretsmanager:eu-west-1:012345678912:secret:db-password-tSo334'
);
$conn = \Doctrine\DBAL\DriverManager::getConnection($connectionParams);

Since I developed in symfony project, I might as well add how to define the driver in symfony:

# doctrine.yaml
doctrine:
    dbal:
        # the url can override the driver class
        # but I can't define this driver in the url which is why i made it the default
        # Doctrine\DBAL\DriverManager::parseDatabaseUrlScheme
        driver_class: Nemo64\DbalRdsData\RdsDataDriver
        url: '%env(resolve:DATABASE_URL)%'
# .env

# you must not include a driver in the database url
# in this case I also didn't include the aws tokens in the url 
DATABASE_URL=//eu-west-1/mydb?driverOptions[resourceArn]=arn&driverOptions[secretArn]=arn

# the aws-sdk will pick those up
# they are automatically configured in lambda and ec2 environments 
#AWS_ACCESS_KEY_ID=...
#AWS_SECRET_ACCESS_KEY=...
#AWS_SESSION_TOKEN=...

Other than the configuration it should work exactly like any other dbal connection.

Driver options

  • resourceArn (string; required) this is the ARN of the database cluster. Go into the your RDS-Management > your database > Configuration and copy it from there.
  • secretArn (string; required) this is the ARN of the secret that stores the database password. Go into the [SecretManager] > your secret and use the Secret ARN.
  • timeout (int; default=45) The timeout setting of guzzle. Set to 0 for indefinite but that might not be useful. The rds-data api will block for a maximum of 45 seconds (see the rds-data docs). Schema update queries will automatically be executed with the continueAfterTimeout option. If you need to run long update queries than you might want to use the rds data client directly. Use $dbalConnection->getWrappedConnection()->getClient() to get the aws-sdk client.
  • pauseRetries (int; default=0) The amount of retries when the database is paused. If you set this, also consider setting pauseRetryDelay to ensure a somewhat correct retry time.
  • pauseRetryDelay (int; default=10) The amount of seconds to wait until another attempt is made if the last one failed due to the database being paused. As of writing this, Aurora takes anywhere from 30 seconds to 2 minutes to unpause. This is way too long to wait in most cases. Lambda will automatically retry events, so you are normally better of just letting the event fail. A user also usually won't wait a minute for a page to load, so you should present them with a proper error. See Paused databases for a way to do that.

CloudFormation

Sure, here is a CloudFormation template to configure Aurora Serverless and a Secret, putting both together and setting an environment variable with the needed information.

This might be serverless flavoured but you should get the hang of it.

# [...]

  iamRoleStatements:
    # allow using the rds-data api
    # https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html#data-api.access
    - Effect: Allow
      Resource: '*' # it isn't supported to limit this
      Action:
        # https://docs.aws.amazon.com/IAM/latest/UserGuide/list_amazonrdsdataapi.html
        - rds-data:ExecuteStatement
        - rds-data:BeginTransaction
        - rds-data:CommitTransaction
        - rds-data:RollbackTransaction
    # this rds-data endpoint will use the same identity to get the secret 
    # so you need to be able to read the password secret
    - Effect: Allow
      Resource: !Ref DatabaseSecret
      Action:
        # https://docs.aws.amazon.com/IAM/latest/UserGuide/list_awssecretsmanager.html
        - secretsmanager:GetSecretValue

# [...]

  environment:
    DATABASE_URL: !Join
      - ''
      - - '//' # rds-data is set to default because custom drivers can't be named in a way that they can be used here
        - !Ref AWS::Region # the hostname is the region
        - '/mydb'
        - '?driverOptions[resourceArn]='
        - !Join [':', ['arn:aws:rds', !Ref AWS::Region, !Ref AWS::AccountId, 'cluster', !Ref Database]]
        - '&driverOptions[secretArn]='
        - !Ref DatabaseSecret

# [...]

  # Make sure that there is a default VPC in your account.
  # https://console.aws.amazon.com/vpc/home#vpcs:isDefault=true
  # If not, click "Actions" > "Create Default VPC"
  # While your applications doesn't need it, the database must still be provisioned into a VPC so use the default. 
  Database:
    Type: AWS::RDS::DBCluster # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-rds-dbcluster.html
    Properties:
      Engine: aurora
      EngineMode: serverless
      EnableHttpEndpoint: true # https://stackoverflow.com/a/58759313 (not fully documented in every language yet)
      DatabaseName: 'mydb'
      MasterUsername: !Join ['', ['{{resolve:secretsmanager:', !Ref DatabaseSecret, ':SecretString:username}}']]
      MasterUserPassword: !Join ['', ['{{resolve:secretsmanager:', !Ref DatabaseSecret, ':SecretString:password}}']]
      BackupRetentionPeriod: 1 # day
      # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-rds-dbcluster-scalingconfiguration.html
      ScalingConfiguration: {MinCapacity: 1, MaxCapactiy: 2, AutoPause: true}
  DatabaseSecret:
    Type: AWS::SecretsManager::Secret # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-secretsmanager-secret.html
    Properties:
      GenerateSecretString:
        SecretStringTemplate: '{"username": "admin"}'
        GenerateStringKey: "password"
        PasswordLength: 41 # max length of a mysql password
        ExcludeCharacters: '"@/\'
  DatabaseSecretAttachment:
    Type: AWS::SecretsManager::SecretTargetAttachment # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-secretsmanager-secrettargetattachment.html
    Properties:
      SecretId: !Ref DatabaseSecret
      TargetId: !Ref Database
      TargetType: AWS::RDS::DBCluster

I also wrote an article going more into detail on how to setup and share an Aurora Serverless between multiple stacks: www.marco.zone/shared-aurora-serverless-using-cloudformation

Implementation details

Error handling

The rds data api only provides error messages, not error codes. To correctly map those errors to dbal exceptions I use a huge regular expression. See: Nemo64\DbalRdsData\RdsDataException::EXPRESSION. Since most of it is generated using the mysql error documentation, it should be fine but it might not be 100% reliable. I asked for this in the amazon developer forum but haven't gotten a response yet.

Paused databases

If an Aurora Serverless is paused, you'll get this error message:

Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.

I mapped this error message to error code 6000 (server errors are 1xxx and client errors 2xxx). It'll also be converted to dbal's Doctrine\DBAL\Exception\ConnectionException which existing application might already handle gracefully. But the most important thing is that you can catch and handle code 6000 specifically to better tell your user that the database is paused and will probably be available soon.

Parameters in prepared statements

While ExecuteStatement does support parameters, it only supports named parameters. Question mark placeholders need to be emulated (which is funny because the mysqli driver only supports question mark placeholders and no named parameters). This is achieved by replacing ? with :1, :2 etc. The replacement algorithm will avoid replacing ? within string literals but be aware that you shouldn't mix heavy string literals and question mark placeholders, just to be safe.

String literals

Every driver has some form of connection aware literal string escape function. But because the rds-data api is connectionless, it doesn't have such a method (except for parameters of course). To emulate the escape feature, a check for none ASCII characters is performed. If the string is pure ASCII it'll just pass though addslashes and gets quotes. If it has more exotic characters it'll be base64 encoded to prevent any chance of multibyte sql injection attacks. This should work transparently in most situations but you should definitely avoid using the literal function and instead use parameter binding whenever possible.

dbal-rds-data's People

Contributors

nemo64 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

Forkers

chatii

dbal-rds-data's Issues

Investigate (and document) transaction isolation

dbal defines Doctrine\DBAL\Connection::setTransactionIsolation which allows to set the transaction isolation to 4 different levels.
https://www.doctrine-project.org/projects/doctrine-dbal/en/2.10/reference/transactions.html

This function runs a SET SESSION TRANSACTION ISOLATION LEVEL query in the background which should fail over the rds-data api (I haven't tested yet).

It doesn't look like it is possible to set the isoluation level within a transaction which is the only way to get something resembling a state.
The documentation says that you can't change the transaction characteristicts during a transaction.

So I should probably document that it isn't possible to use that feature and maybe adjust the platform definition so that it isn't even tried.

Fetch mode 8 not supported (class)

While using the AuditBundle I got Fetch mode 8 not supported which is the Doctrine\DBAL\FetchMode::CUSTOM_OBJECT fetch mode or the PDO::FETCH_CLASS fetch mode.

Seems like this must be supported.

Gracefully handle huge results

The ExecuteStatement documentation warns about huge results.

The response size limit is 1 MB or 1,000 records. If the call returns more than 1 MB of response data or over 1,000 records, the call is terminated.

I tested queries with over 10,000 results and it worked fine but I wouldn't rely on it.

To prepare for the day they actually limit the result, queries should automatically get a LIMIT applied and be paged while iterating the result. The limit should be configurable (including unlimited) but default to 1,000 so the implementation follows the specification.

It may be possible to execute the request for the next page async but I have the experience that guzzle only actually starts the request when accessing it so there needs to be some investigation.

Investigate async-aws

I noticed that some projects switch from the official aws-sdk to async-aws which looks promising.

I should investigate if it is possible to contribute there to add support for the rds-data api and possibly integrate it here.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.