Storing JSON Blobs in Amazon S3 With Elixir

May 29, 2018

The idea for this project of using S3 with Elixir came from an actual project and a blog post I stumbled on.

Quite a while ago, I was working on a Ruby on Rails application with millions of rows of data. We often pulled reports from this data, and since the app ran on the Heroku platform, we regularly saw warnings in our Slack monitoring channel about memory consumption exceeding 100%.

We were worried that this approach was no longer sustainable, so we wanted to start archiving data that no longer needed to be available in real time. I found a great post on the Moz blog describing how they used Amazon S3 buckets as a cold storage solution (based on their description, I assume they stored something like nested JSON data files).

JSON Blob Experiment

This led me to start experimenting with storing JSON files (let’s call them “JSON blobs”) on Amazon S3 with Elixir. It let me follow the approach Moz described in that blog post while also giving me a chance to play with Elixir.

Here is the five-step approach I used to have an Elixir project store JSON blobs on S3.

Note: At the time I created this, Phoenix 1.2 was the current version. I’ve updated the commands to use Phoenix 1.3, although the structure of the code may still resemble Phoenix 1.2.

Step 1: Create Mix Project with ex_aws

First, I generated a new Phoenix project.

mix phx.new json_blobber

In the mix.exs file, I also added the ex_aws package as a dependency for storing files in Amazon S3 as shown below.

# Type `mix help deps` for examples and options.
defp deps do
  [{:phoenix, "~> 1.3.0-rc"},
   {:phoenix_pubsub, "~> 1.0"},
   {:phoenix_ecto, "~> 3.2"},
   {:postgrex, ">= 0.0.0"},
   {:phoenix_html, "~> 2.6"},
   {:phoenix_live_reload, "~> 1.0", only: :dev},
   {:gettext, "~> 0.11"},
   {:ex_aws, "~> 1.1"},
   {:sweet_xml, "~> 0.6"},
   {:hackney, "~> 1.7"},
   {:poison, "~> 3.1"},
   {:cowboy, "~> 1.0"}]
end

Step 2: Module to Generate the JSON records

The next step was to create a JsonGenerator module. This module simply generated the “json blobs” I wanted to store on S3 as shown below.

defmodule JsonGenerator do
  # Builds n random records and encodes them as one JSON array string.
  def generate_records(n) do
    for _ <- 1..n do
      json_record()
    end
    |> Poison.encode!()
  end

  def json_record do
    %{video_id: :rand.uniform(200000), snapshot_id: :rand.uniform(56000),
      view_count: :rand.uniform(50000), comment_count: :rand.uniform(500),
      like_count: :rand.uniform(150), dislike_count: nil,
      share_count: :rand.uniform(400), created_at: random_date_string(),
      updated_at: random_date_string(), status: "fetched"}
  end

  # Returns a zero-padded "YYYY-MM-DD HH:MM:SS" string with random components.
  defp random_date_string do
    [year, month, day, hours, minutes, seconds] =
      [Enum.random(1900..2020), Enum.random(1..12), Enum.random(1..28),
       Enum.random(0..23), Enum.random(0..59), Enum.random(0..59)]
      |> Enum.map(fn e ->
        e
        |> Integer.to_string()
        |> String.pad_leading(2, "0")
      end)

    "#{year}-#{month}-#{day} #{hours}:#{minutes}:#{seconds}"
  end
end
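
To make the zero-padding step in the date helper easier to follow, here is a minimal, deterministic sketch that uses only the Elixir standard library. The module name DateStringSketch and its format function are hypothetical and not part of the project; they take fixed components instead of random ones so the result can be checked.

```elixir
defmodule DateStringSketch do
  # Hypothetical helper: zero-pads each component to at least two
  # characters and joins them into a "YYYY-MM-DD HH:MM:SS" string.
  def format([_year, _month, _day, _hours, _minutes, _seconds] = parts) do
    [year, month, day, hours, minutes, seconds] =
      Enum.map(parts, fn part ->
        part
        |> Integer.to_string()
        |> String.pad_leading(2, "0")
      end)

    "#{year}-#{month}-#{day} #{hours}:#{minutes}:#{seconds}"
  end
end

formatted = DateStringSketch.format([2018, 5, 29, 7, 3, 0])
IO.puts(formatted)

# The padded string is valid input for NaiveDateTime.from_iso8601/1,
# which accepts a space between the date and the time.
{:ok, parsed} = NaiveDateTime.from_iso8601(formatted)
IO.inspect(parsed)
```

Without the padding, a value such as 7 for the hour would produce "7:3:0", which date parsers generally reject. Keeping the day within 1..28 and the hour within 0..23 means every random combination is a real calendar date and time.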

Step 3: Module to Write to AWS

To hew to the “separation of concerns” principle, I created another module specifically to write to AWS. Using Elixir’s import statement, I imported the JsonGenerator module to use the json blob generating functionality I created in Step 2.

The API for the JsonAws module is simple. The write_records function writes a user-specified number of JSON blobs to S3 using the configured S3 bucket. The read_file function then reads back the file that stores the JSON blobs.

defmodule JsonAws do
  @s3_bucket Application.get_env(:ex_aws, :s3_bucket)

  import JsonGenerator

  # Generates n records and uploads them to S3 as a single JSON file.
  def write_records(n) do
    n
    |> generate_records()
    |> (&ExAws.S3.put_object(@s3_bucket, "json_file.txt", &1)).()
    |> ExAws.request!()
  end

  # Downloads the file and returns the result of Poison.decode
  # ({:ok, data} on success).
  def read_file(file \\ "json_file.txt") do
    file
    |> (&ExAws.S3.get_object(@s3_bucket, &1)).()
    |> ExAws.request!()
    |> parse_json()
  end

  defp parse_json(%{body: body, status_code: 200}) do
    body
    |> Poison.decode()
  end
end
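
The functions above call ExAws.request!, which raises when the request fails (for example, when the file has not been written yet). As a rough sketch, assuming the same ex_aws 1.x and Poison dependencies and the same :s3_bucket config key, a non-raising reader could use ExAws.request instead and hand the error tuple back to the caller. The module name JsonAwsSafe is hypothetical and not part of the original project.

```elixir
defmodule JsonAwsSafe do
  # Assumes the bucket name is configured under :ex_aws, :s3_bucket,
  # as in the JsonAws module.
  @s3_bucket Application.get_env(:ex_aws, :s3_bucket)

  # Returns the result of Poison.decode on success, or {:error, reason}
  # when the S3 request itself fails.
  def read_file(file \\ "json_file.txt") do
    @s3_bucket
    |> ExAws.S3.get_object(file)
    |> ExAws.request()
    |> case do
      {:ok, %{body: body}} -> Poison.decode(body)
      {:error, reason} -> {:error, reason}
    end
  end
end
```

This keeps the decision about what to do with a missing file or a permissions problem in the calling code rather than crashing the process.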

Step 4: Set Up Your AWS Keys

One of the last few steps is to configure ex_aws. You can simply copy the config.exs file below.

In config.exs:

config :ex_aws,
  access_key_id: [{:system, "AWS_ACCESS_KEY_ID"}, :instance_role],
  secret_access_key: [{:system, "AWS_SECRET_ACCESS_KEY"}, :instance_role],
  region: System.get_env("AWS_DEFAULT_REGION"),
  s3_bucket: System.get_env("S3_BUCKET")

To make this all work, you’ll have to set up your AWS keys in your Amazon account and then put those values in a .env file at the top level of your application.

Set up a .env file as follows:

export AWS_ACCESS_KEY_ID=yyy
export AWS_SECRET_ACCESS_KEY=yyy
export AWS_DEFAULT_REGION=us-east-1
export S3_BUCKET=yyy

You get these values from the AWS Management Console. You may have to sign up for an account if you don’t have one.
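
Because the configuration reads these values from the environment, a missing variable only shows up later as a failed request. The following is a small sketch, using only the standard library, of a check you could run before talking to S3. The module name EnvCheckSketch is hypothetical, and the list of variable names is taken from the config.exs shown earlier.

```elixir
defmodule EnvCheckSketch do
  # Variable names referenced by the ex_aws configuration in config.exs.
  @required ~w(AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION S3_BUCKET)

  # Returns the names of required variables that are unset or empty.
  # The env argument defaults to the real environment and can be
  # replaced with a plain map for testing.
  def missing(env \\ System.get_env()) do
    Enum.filter(@required, fn name -> Map.get(env, name) in [nil, ""] end)
  end
end

case EnvCheckSketch.missing() do
  [] -> IO.puts("All AWS environment variables are set")
  names -> IO.puts("Missing: " <> Enum.join(names, ", "))
end
```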

Step 5: Run Your New Program

Finally, load the environment variables from the .env file, boot up an iex session, and run the write_records function, specifying the number of JSON records you want to create and store in S3.

$ source .env
$ iex -S mix
iex(1)> JsonAws.write_records(10)
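
To confirm the upload, you can read the file back in the same session. This is a sketch that assumes read_file returns an {:ok, data} tuple from Poison.decode when the download and decoding succeed; the variable name records is only illustrative.

```elixir
iex(2)> {:ok, records} = JsonAws.read_file()
iex(3)> length(records)
iex(4)> List.first(records)
```

Each decoded record should be a map with string keys such as "video_id" and "created_at", since Poison decodes JSON object keys as strings by default.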


Boom! And that’s all there is to storing simple JSON data on S3. It’s a good way to get your feet wet with Elixir.