One of the most useful, and often most difficult, things to get right when developing software is quick access to data. Seeds give you data quickly, but only a small amount of it, and as your application changes they become a nightmare to maintain. In many cases a better approach is to keep a pool of resources (here, data in a database, cache, search infrastructure, etc.) that an environment can claim instantly and use for some amount of time. This is what Instant Datasets do for you and your teammates.
Release supports exactly this for databases on RDS, and in the future we will expand the offering to anything that holds a set of data. With a small amount of setup, you can make a replica of your production data available to any of your environments instantly, regardless of how much data there is.
RDS is great technology, and we have used it at Release from the beginning to power our production application (releaseapp.io). It automatically takes daily snapshots of your application databases and gives you ready access to them when needed. The one major drawback to using RDS for environments other than production is the long spin-up time for any particular database, which makes it a poor fit for ephemeral or staging environments. The solution to this problem and the solution to giving your staging environments production-like data are one and the same: Instant Datasets.
An Instant Dataset is a collection of databases that are ready to be used by ephemeral environments. The databases are restored from the production snapshots that RDS already creates for you. Each time an environment "checks out" an existing database (claims it from the set for itself), another is created to replace it, so the set never runs out of databases. This gives any environment you create instant access to production-like data.
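The check-out behavior described above can be pictured with a small toy model. This is only a sketch of the idea, not how Release implements it: the `DatasetPool` class, its method names, and the `db-N` identifiers are all hypothetical, and restoring from a snapshot is reduced to generating a name.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count


@dataclass
class DatasetPool:
    """Toy model of an Instant Dataset: a fixed-size set of ready databases."""

    size: int
    ready: deque = field(default_factory=deque)    # restored and waiting to be claimed
    pending: deque = field(default_factory=deque)  # replacements still being restored
    claimed: dict = field(default_factory=dict)    # environment name -> database
    _ids: count = field(default_factory=lambda: count(1))

    def __post_init__(self) -> None:
        # Pre-fill the set so the first check-outs do not wait on a restore.
        for _ in range(self.size):
            self.ready.append(self._new_database())

    def _new_database(self) -> str:
        # Stand-in for restoring a database from the RDS snapshot (hypothetical).
        return f"db-{next(self._ids)}"

    def check_out(self, environment: str) -> str:
        """Claim a ready database and queue a replacement for it."""
        if not self.ready:
            raise RuntimeError("no database is ready yet; replacements are still restoring")
        database = self.ready.popleft()
        self.claimed[environment] = database
        self.pending.append(self._new_database())
        return database

    def finish_restores(self) -> None:
        """Simulate the background work completing: pending databases become ready."""
        while self.pending:
            self.ready.append(self.pending.popleft())


pool = DatasetPool(size=2)
first = pool.check_out("pr-101")
second = pool.check_out("pr-102")
pool.finish_restores()
print(first, second, len(pool.ready))  # db-1 db-2 2
```

After two check-outs and the simulated restores, the set is back at its original size, which is the property that keeps environments from waiting on a fresh RDS restore.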
To use Instant Datasets in Release, we need a snapshot to restore from, which means you must have automatic snapshots set up. Please refer to this document to set up snapshots in RDS.
To set up Instant Datasets in Release you will need two things:
AWS Snapshot ID
Database Password (referenced above)
You can get an RDS snapshot ID from AWS by doing the following:
Log in to your AWS Management Console and go to the RDS service.
Click Snapshots (on the left), then click System (in the tabs in the upper middle).
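If you would rather look the snapshot up programmatically than click through the console, something like the following sketch may help. It assumes that the System tab corresponds to snapshots of type `automated` in the RDS `DescribeDBSnapshots` API, and that you have `boto3` and AWS credentials available for the optional `fetch_snapshots` helper. The helper names and the sample data (including the instance name `example-db`) are made up for illustration.

```python
from datetime import datetime, timezone


def latest_automated_snapshot_id(snapshots, db_instance_id):
    """Return the newest available automated snapshot ID for one instance, or None.

    `snapshots` is a list of dicts shaped like the entries under "DBSnapshots"
    in a DescribeDBSnapshots response.
    """
    candidates = [
        s for s in snapshots
        if s["DBInstanceIdentifier"] == db_instance_id
        and s["SnapshotType"] == "automated"
        and s["Status"] == "available"
    ]
    if not candidates:
        return None
    newest = max(candidates, key=lambda s: s["SnapshotCreateTime"])
    return newest["DBSnapshotIdentifier"]


def fetch_snapshots():
    """Fetch automated snapshots with boto3 (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the rest of the sketch runs without it

    paginator = boto3.client("rds").get_paginator("describe_db_snapshots")
    pages = paginator.paginate(SnapshotType="automated")
    return [snap for page in pages for snap in page["DBSnapshots"]]


# Sample data standing in for a real response; every value below is made up.
sample = [
    {
        "DBSnapshotIdentifier": "rds:example-db-2024-01-01-06-10",
        "DBInstanceIdentifier": "example-db",
        "SnapshotType": "automated",
        "Status": "available",
        "SnapshotCreateTime": datetime(2024, 1, 1, 6, 10, tzinfo=timezone.utc),
    },
    {
        "DBSnapshotIdentifier": "rds:example-db-2024-01-02-06-10",
        "DBInstanceIdentifier": "example-db",
        "SnapshotType": "automated",
        "Status": "available",
        "SnapshotCreateTime": datetime(2024, 1, 2, 6, 10, tzinfo=timezone.utc),
    },
]

print(latest_automated_snapshot_id(sample, "example-db"))
# rds:example-db-2024-01-02-06-10
```

With real credentials you would pass `fetch_snapshots()` in place of `sample`.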
To use an Instant Dataset with an application in Release, there are two things you need to do:
Set up the actual Instant Dataset, using the data you collected from AWS.
Set up your application to use the dataset when staging and/or ephemeral environments are deployed.
Each Instant Dataset is limited to a single account.
Log in to Release and then, in the top right corner, click the Account Settings gear.
Click the DATASETS tab and click NEW.
Name: Anything you like to help you remember what this dataset contains.
Cluster: Instant Datasets must be assigned to a cluster. For most people this will be the default cluster. This cluster must have access to the snapshot.
AWS Snapshot ID: You have this from the previous steps in the AWS console. It should begin with rds: and be of the form rds:name-of-snapshot-datetime-stamp.
RDS Database Password: AWS gives you this when you first set up the RDS instance. If you don't know it, you won't be able to set up your Instant Dataset. It is often kept with your other secrets.
Instant Dataset Size: This is the number of available databases in the Instant Dataset. Each time an environment is created, one of the instances is claimed, which also starts the creation of a replacement instance in the set. If you have 5 ephemeral or staging environments at any one time, a set of size 5 should be sufficient.
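Before filling in the form, it can be handy to sanity-check the values you collected. The sketch below is not part of Release; the `DatasetForm` class is hypothetical, and the regular expression assumes that automated snapshot IDs look like `rds:<name>-YYYY-MM-DD-HH-MM`, which matches the form described above but should be treated as an assumption.

```python
import re
from dataclasses import dataclass

# Assumed shape of an automated snapshot ID: rds:<name>-YYYY-MM-DD-HH-MM
SNAPSHOT_ID = re.compile(r"^rds:(?P<name>.+)-(?P<stamp>\d{4}-\d{2}-\d{2}-\d{2}-\d{2})$")


@dataclass
class DatasetForm:
    """Hypothetical container for the fields of the NEW dataset form."""

    name: str
    cluster: str
    snapshot_id: str
    password: str
    size: int

    def problems(self) -> list:
        """Return a list of human-readable problems; empty means the form looks fine."""
        found = []
        if not self.name.strip():
            found.append("Name is empty")
        if not self.cluster.strip():
            found.append("Cluster is empty")
        if not SNAPSHOT_ID.match(self.snapshot_id):
            found.append("AWS Snapshot ID should look like rds:name-of-snapshot-datetime-stamp")
        if not self.password:
            found.append("RDS Database Password is empty")
        if self.size < 1:
            found.append("Instant Dataset Size must be at least 1")
        return found


good = DatasetForm(
    name="release-prod-for-development",
    cluster="default",
    snapshot_id="rds:example-db-2024-01-02-06-10",  # made-up snapshot ID
    password="not-a-real-password",
    size=5,
)
bad = DatasetForm(name="", cluster="default", snapshot_id="example-db", password="", size=0)

print(good.problems())
print(len(bad.problems()))
```

Sizing the set to the number of environments you expect to run at the same time, as suggested above, is what the `size` field represents here.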
Click CREATE to begin creating the database instances. How long this takes depends on the size of the database: expect at least a few minutes, and possibly hours.
Once the dataset is ready to be used, it transitions to the AVAILABLE state, and environments can use it when they are deployed. Click VIEW to see the details of a specific dataset.
Set up the environments that you want to use Instant Datasets
Add mappings of the Generated ENVs to your ENV file
Navigate to an Application and click Settings. To use the Instant Dataset you just created in your templates, you will need to add a couple of lines to your default environment config.
    environment_templates:
    - name: ephemeral
      datasets:
      - name: release-prod-for-development
In this example we have set up the ephemeral template to use our Instant Dataset. When we create an environment manually or through a pull request, Release will generate an Environment Specific Configuration file with our new dataset.
If we didn't set up the Instant Dataset to work with every ephemeral environment through our default configuration, we can also add it explicitly to an environment we create. The syntax is the same whether Release creates it automatically or we do it manually.
    datasets:
    - name: release-prod-for-development
Edit the Environment Variables and add mappings from the Generated ENVs to your default or environment-specific configuration; the syntax is the same in both.
    ---
    mapping:
      DATABASE_HOST: RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_HOST
      DATABASE_PASSWORD: RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_PASS
      DATABASE_USER: RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_USER
Your envs are on the left side of the ':' and the generated envs are on the right.
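To make the direction of the mapping concrete, here is a small sketch of what resolving it amounts to: each of your variables takes the value of the generated variable named on the right. The `apply_mapping` function is hypothetical and only illustrates the idea; the generated values are invented, and Release performs this resolution for you.

```python
def apply_mapping(mapping, generated):
    """Resolve each application variable to the value of its generated variable."""
    missing = sorted(set(mapping.values()) - set(generated))
    if missing:
        raise KeyError(f"generated variables not found: {missing}")
    return {app_name: generated[gen_name] for app_name, gen_name in mapping.items()}


# Left side: your variables. Right side: the generated variables.
mapping = {
    "DATABASE_HOST": "RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_HOST",
    "DATABASE_PASSWORD": "RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_PASS",
    "DATABASE_USER": "RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_USER",
}

# Made-up values standing in for what a checked-out database would provide.
generated = {
    "RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_HOST": "db-3.example.internal",
    "RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_PASS": "not-a-real-password",
    "RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_USER": "app",
}

resolved = apply_mapping(mapping, generated)
print(resolved["DATABASE_HOST"])  # db-3.example.internal
```

Raising an error for a missing generated variable is a design choice of this sketch; it surfaces a misspelled name on the right-hand side early.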
Your application is now ready to use your Instant Dataset! Whenever you deploy an ephemeral or staging environment, it will check out one of the databases to use for as long as the environment exists.
Deploy an ephemeral environment
Your environment will check out a database and use it for the lifetime of the environment.
A background job will be kicked off to create a new db instance to replace the one your environment used. This will maintain the dataset at the same size after one of the databases is claimed.
Instant Datasets give your staging environments instant access to data, regardless of its size or complexity. You can also create multiple RDS datasets, add them to your app config, and map their env variables. This allows you to have services using different datasets, or a single service accessing multiple datasets.
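When several datasets are in play, the mapping grows quickly, so a helper that builds it can be useful. The sketch below guesses the generated variable names from the single example in this document (dataset name upper-cased, hyphens turned into underscores, followed by `_RDS_DB_POOL_HOST`, `_USER`, or `_PASS`). That naming rule is an assumption inferred from one example, not a documented guarantee, and the second dataset name and the application-side prefixes are invented.

```python
def generated_env_names(dataset_name):
    """Guess the generated variable names for a dataset (assumed naming rule)."""
    prefix = dataset_name.upper().replace("-", "_")
    return {suffix: f"{prefix}_RDS_DB_POOL_{suffix}" for suffix in ("HOST", "USER", "PASS")}


def build_mapping(datasets_by_prefix):
    """Build a mapping for several datasets.

    `datasets_by_prefix` maps an application-side variable prefix
    (for example "DATABASE") to the name of the dataset backing it.
    """
    mapping = {}
    for app_prefix, dataset_name in datasets_by_prefix.items():
        names = generated_env_names(dataset_name)
        mapping[f"{app_prefix}_HOST"] = names["HOST"]
        mapping[f"{app_prefix}_USER"] = names["USER"]
        mapping[f"{app_prefix}_PASSWORD"] = names["PASS"]
    return mapping


mapping = build_mapping({
    "DATABASE": "release-prod-for-development",
    "ANALYTICS_DATABASE": "analytics-prod-for-development",  # hypothetical second dataset
})

for app_name, generated_name in mapping.items():
    print(f"{app_name}: {generated_name}")
```

Check the names against the Generated ENVs that Release actually shows for your datasets before relying on them.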