Instant Datasets
This guide describes the purpose of creating and having instant datasets for your applications and how to do it using Release.

One of the most useful, and often most difficult, things to get when developing software is fast access to data. Seeds give you data quickly, but only a small amount of it, and they become a nightmare to maintain as your application changes. In many cases a better approach is to keep a pool of resources (here, data in a database, cache, search infrastructure, etc.) that an environment can claim instantly and use for some amount of time. This is what Instant Datasets does for you and your teammates.
Release currently supports this for databases on RDS, and we plan to expand the offering to anything that holds a set of data. With a small amount of setup, you can have a replica of production data available to any of your environments, instantly, regardless of the amount of data.

Why use Instant Database Datasets with Release?

RDS is great technology, and we have used it at Release from the beginning to power our production application (releaseapp.io). It automatically takes daily snapshots of your application databases and gives you ready access to them when needed. The one major drawback of using RDS for environments other than production is the long spin-up time for any particular database, which makes it a poor fit for ephemeral/staging environments. The solution to this problem and the solution to giving your staging environments production-like data are the same: Instant Datasets.
An Instant Dataset is a collection of databases that are ready to be used by ephemeral environments. The databases are restored from production snapshots that RDS has already created for you. Each time an environment "checks out" a database (claims one of the databases in the set for itself), another is created to replace it, so the set never runs out. This gives any environment you create instant access to production-like data.
High-level diagram of using Instant Datasets
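The check-out behaviour described above can be sketched as a toy model. This is not Release's implementation: the `DatasetPool` class, its method names, and the `db-N` identifiers are all hypothetical, and the replacement database is created immediately here, whereas the real replacement is created in the background and takes time.

```python
from collections import deque
from itertools import count


class DatasetPool:
    """Toy model of an Instant Dataset: a fixed-size pool of ready databases."""

    def __init__(self, size):
        self._ids = count(1)
        self.size = size
        self.available = deque(self._new_db() for _ in range(size))
        self.claimed = {}

    def _new_db(self):
        # Stands in for restoring a database from the snapshot.
        return f"db-{next(self._ids)}"

    def check_out(self, environment):
        """Claim a ready database for an environment and queue a replacement."""
        if not self.available:
            raise RuntimeError("no database is ready yet")
        db = self.available.popleft()
        self.claimed[environment] = db
        # Assumption: replacement is instantaneous in this sketch only.
        self.available.append(self._new_db())
        return db


pool = DatasetPool(size=3)
first = pool.check_out("pr-101")
second = pool.check_out("pr-102")
print(first, second, len(pool.available))  # prints "db-1 db-2 3"
```

The point of the model is that the number of ready databases returns to the configured size after every claim, so a new environment never waits for a restore as long as claims do not outpace replacements.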

AWS Setup

Setting up Snapshots in RDS

To use Instant Datasets in Release, we need a snapshot to restore from, which means you must have automatic snapshots set up. Please refer to this document to set up snapshots in RDS.
For your applications to use the snapshots, you MUST have the database password. This is not something you or Release can look up after the database has been created.
Disclaimer from Amazon when creating your RDS Database

Collecting the data needed to set up Instant Datasets in Release

To set up Instant Datasets in Release you will need two things:
    AWS Snapshot ID
    Database Password (referenced above)
You can get an RDS snapshot ID from AWS by doing the following: Log in to your AWS Management Console and go to the RDS service.
Log in to the Management Console and find RDS
Click Snapshots (on the left) -> Click System (one of the tabs in the upper middle)
List of snapshots
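If you prefer to look the snapshot ID up from a script rather than the console, the sketch below shows one possible approach. It assumes the System tab corresponds to snapshots of type `automated` and that you pass in a boto3 RDS client; the function name and the `FakeRDSClient` stand-in (with made-up snapshot data) are hypothetical and exist only so the example runs without AWS access.

```python
from datetime import datetime, timezone


def latest_automated_snapshot_id(rds_client, db_instance_id):
    """Return the ID of the newest available automated snapshot, or None."""
    # describe_db_snapshots is paginated; this sketch reads only the first page.
    response = rds_client.describe_db_snapshots(
        DBInstanceIdentifier=db_instance_id,
        SnapshotType="automated",
    )
    ready = [s for s in response["DBSnapshots"] if s.get("Status") == "available"]
    if not ready:
        return None
    newest = max(ready, key=lambda s: s["SnapshotCreateTime"])
    return newest["DBSnapshotIdentifier"]


class FakeRDSClient:
    """Hypothetical stand-in for boto3.client("rds"), returning made-up data."""

    def describe_db_snapshots(self, **kwargs):
        return {
            "DBSnapshots": [
                {
                    "DBSnapshotIdentifier": "rds:example-db-2024-01-01-03-04",
                    "SnapshotCreateTime": datetime(2024, 1, 1, 3, 4, tzinfo=timezone.utc),
                    "Status": "available",
                },
                {
                    "DBSnapshotIdentifier": "rds:example-db-2024-01-02-03-04",
                    "SnapshotCreateTime": datetime(2024, 1, 2, 3, 4, tzinfo=timezone.utc),
                    "Status": "available",
                },
            ]
        }


snapshot_id = latest_automated_snapshot_id(FakeRDSClient(), "example-db")
print(snapshot_id)
```

With real credentials you would pass `boto3.client("rds")` and your own instance identifier instead of the fake client.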

Setting up an Instant Dataset in Release

To use an Instant Dataset with an application in Release, there are two things you need to do:
    1. Set up the actual Instant Dataset, using the data you collected from AWS.
    2. Set up your application to use the dataset upon deployment of staging and/or ephemeral environments.
Each Instant Dataset is limited to a single account.

Set up your first Instant Dataset

Log in to Release and then, in the top right corner, click on the Account Settings gear.
Account Settings gear for the Release account
Click the DATASETS tab and click NEW.
New Instant Dataset form
    Name: Anything you like that helps you remember what this dataset contains.
    Cluster: Instant Datasets must be assigned to a cluster. For most people this will be the default cluster. This cluster must have access to the snapshot.
    AWS Snapshot ID: You have this from the previous steps in the AWS console. It should begin with rds: and be of the form rds:name-of-snapshot-datetime-stamp.
    RDS Database Password: AWS gives you this when you first set up the RDS instance. If you don't know it, you won't be able to set up your Instant Dataset. It is often kept with your other secrets.
    Instant Dataset Size: This is the number of available databases in the Instant Dataset. Each time a space is created, it claims one of the instances, which also starts the creation of a replacement in the set. If you have 5 ephemeral or staging spaces at any one time, a set of size 5 should be sufficient.
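Before pasting a snapshot ID into the form, a quick sanity check can catch a manual snapshot name or a copy-paste slip. The sketch below is only an illustration: the exact timestamp layout (`YYYY-MM-DD-HH-MM`) is an assumption about how the datetime stamp is written, and `parse_snapshot_id` is a hypothetical helper, not part of Release or AWS.

```python
import re

# Assumed shape: "rds:" + snapshot name + "-YYYY-MM-DD-HH-MM".
AUTOMATED_SNAPSHOT_ID = re.compile(
    r"^rds:(?P<name>.+)-(?P<stamp>\d{4}-\d{2}-\d{2}-\d{2}-\d{2})$"
)


def parse_snapshot_id(snapshot_id):
    """Split an automated snapshot ID into (name, datetime stamp)."""
    match = AUTOMATED_SNAPSHOT_ID.match(snapshot_id)
    if match is None:
        raise ValueError(f"not of the form rds:name-datetime-stamp: {snapshot_id!r}")
    return match.group("name"), match.group("stamp")


name, stamp = parse_snapshot_id("rds:example-db-2024-01-02-03-04")
print(name, stamp)
```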
Click CREATE to begin creating the database instances. How long this takes depends on the size of the database: expect at least a few minutes, and possibly hours.
RDS takes around 5-6 minutes to create the database. After that, restoring the data from the snapshot takes additional time that depends on how large the snapshot is.
List of Instant Datasets
Once the dataset is ready to be used it will transition to an AVAILABLE state and environments can now use this dataset when they are deployed. Click on VIEW to see the details of a specific dataset.
Take note of the Generated Envs

Set up your application to use Instant Datasets

    Set up the environments that you want to use Instant Datasets
    Add mappings of the Generated ENVs to your ENV file
Copy the 'Name' and the 'Generated Envs' into a text buffer, as you will need them to set up your application.
Name and Generated ENVs from the example above.
    release-prod-for-development
    RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_HOST
    RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_PASS
    RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_USER
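The example above suggests a naming pattern: the dataset name upper-cased, with hyphens turned into underscores, followed by `_RDS_DB_POOL_` and a suffix. The sketch below encodes that pattern. It is inferred from this single example, so treat it as an assumption and rely on the Generated Envs shown on the dataset's detail page; `generated_env_names` is a hypothetical helper.

```python
def generated_env_names(dataset_name):
    """Guess the generated env names for a dataset (pattern inferred, not documented)."""
    prefix = dataset_name.upper().replace("-", "_")
    return [f"{prefix}_RDS_DB_POOL_{suffix}" for suffix in ("HOST", "PASS", "USER")]


names = generated_env_names("release-prod-for-development")
for env_name in names:
    print(env_name)
```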

Set up your environments to use Instant Datasets

Navigate to an Application and click on Settings. To use the Instant Dataset you just created, you will need to add a couple of lines to the templates in your default environment config.
Setting up Instant Datasets for the ephemeral environment template
environment_templates:
- name: ephemeral
  datasets:
  - name: release-prod-for-development
In this example we have set up the ephemeral template to use our Instant Dataset. When we create an environment manually or through a pull request, Release will generate an Environment Specific Configuration file with our new dataset.
If we didn't set up the Instant Dataset to work with every ephemeral environment through our default configuration, we can also add it explicitly to an environment we create. The syntax is the same whether Release automatically creates it or we do it manually.
Setting up Instant Datasets for a specific environment
datasets:
- name: release-prod-for-development

Map generated envs to your application's envs

Edit the Environment Variables and add mappings from the Generated ENVs to your default or environment-specific configuration. The syntax is the same in both cases.
---
mapping:
  DATABASE_HOST: RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_HOST
  DATABASE_PASSWORD: RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_PASS
  DATABASE_USER: RELEASE_PROD_FOR_DEVELOPMENT_RDS_DB_POOL_USER
Your envs are on the left side of the ':' and the generated envs are on the right.
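Once the mapping is in place, your application only sees its own variable names. As a rough sketch of the consuming side, the function below builds a connection URL from `DATABASE_HOST`, `DATABASE_USER`, and `DATABASE_PASSWORD`. The PostgreSQL scheme, port 5432, and the database name `app` are assumptions made for illustration, as are the sample values; substitute whatever your production database actually uses.

```python
import os
from urllib.parse import quote


def database_url(env=os.environ, scheme="postgresql", port=5432, database="app"):
    """Build a connection URL from the mapped environment variables."""
    host = env["DATABASE_HOST"]
    # Percent-encode credentials so characters like "@" or "/" cannot break the URL.
    user = quote(env["DATABASE_USER"], safe="")
    password = quote(env["DATABASE_PASSWORD"], safe="")
    return f"{scheme}://{user}:{password}@{host}:{port}/{database}"


# Made-up values standing in for what the environment would provide.
sample_env = {
    "DATABASE_HOST": "db.example.internal",
    "DATABASE_USER": "app_user",
    "DATABASE_PASSWORD": "p@ss/word",
}
url = database_url(sample_env)
print(url)
```

Reading the values at startup rather than hard-coding them means the same build works against whichever database the environment checked out.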

Deploy your application to use your dataset

Your application is now ready to use your Instant Dataset! Whenever you deploy an ephemeral/staging environment, it will check out one of the databases and use it for as long as the environment exists.
    Deploy an ephemeral environment.
    Your environment will check out a database and use it for the lifetime of the environment.
    A background job will be kicked off to create a new database instance to replace the one your environment claimed. This keeps the dataset at the same size after one of the databases is claimed.
One database is being used by my environment and another is being created

Conclusion

Instant Datasets give your staging environments instant access to data of any size or complexity. You can create multiple RDS datasets, add them to your app config, and map their env variables too. This allows you to have services using different datasets, or a single service accessing multiple datasets.