Storing JSON Blobs in Amazon S3 With Elixir

May 29, 2018

The idea of using S3 with Elixir came from a real project I worked on and a blog post I stumbled on.

Quite a while ago, I was working on a Ruby on Rails application with millions of rows of data. We would often pull reports from this data, and since the app ran on the Heroku platform, we regularly saw warning messages in our Slack monitoring channel about memory usage exceeding 100% of our quota.

We were worried that we couldn’t sustain this approach, so we wanted to start archiving the data that no longer needed to be available in real time. I found a great post on the Moz blog about how they essentially used Amazon S3 buckets as a cold storage solution (based on their description, I assume they stored something similar to nested JSON data files).

JSON Blob Experiment

This led me to start experimenting with storing JSON files (let’s call them “JSON blobs”) on Amazon S3 with Elixir. It let me follow the approach Moz described in that blog post and gave me an excuse to play with Elixir.

Here is the five-step approach I used to have an Elixir project store JSON blobs on S3.

Note: At the time I created this, Phoenix 1.2 was the current version. I’ve updated the commands to use Phoenix 1.3, although the structure of the code may still resemble Phoenix 1.2.

Step 1: Create Mix Project with ex_aws

First, I generated a new Phoenix project.

mix phx.new json_blobber

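In the mix.exs file, I also added the ex_aws package as a dependency for storing files in Amazon S3, along with sweet_xml, hackney, and poison, which ex_aws uses for XML parsing, HTTP requests, and JSON encoding. Running mix deps.get afterwards fetches the new dependencies.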

# Type `mix help deps` for examples and options.
defp deps do
  [{:phoenix, "~> 1.3.0-rc"},
   {:phoenix_pubsub, "~> 1.0"},
   {:phoenix_ecto, "~> 3.2"},
   {:postgrex, ">= 0.0.0"},
   {:phoenix_html, "~> 2.6"},
   {:phoenix_live_reload, "~> 1.0", only: :dev},
   {:gettext, "~> 0.11"},
   {:ex_aws, "~> 1.1"},
   {:sweet_xml, "~> 0.6"},
   {:hackney, "~> 1.7"},
   {:poison, "~> 3.1"},
   {:cowboy, "~> 1.0"}]
end

Step 2: Module to Generate the JSON Records

The next step was to create a JsonGenerator module. This module simply generates the “JSON blobs” I wanted to store on S3, as shown below.

defmodule JsonGenerator do
  # Returns a JSON-encoded list of n random records.
  def generate_records(n) do
    for _ <- 1..n do
      json_record()
    end
    |> Poison.encode!()
  end

  # Builds one fake video-stats record with random values.
  def json_record do
    %{
      video_id: :rand.uniform(200000),
      snapshot_id: :rand.uniform(56000),
      view_count: :rand.uniform(50000),
      comment_count: :rand.uniform(500),
      like_count: :rand.uniform(150),
      dislike_count: nil,
      share_count: :rand.uniform(400),
      created_at: random_date_string(),
      updated_at: random_date_string(),
      status: "fetched"
    }
  end

  # Random timestamp as "YYYY-MM-DD HH:MM:SS".
  # Days stop at 28 so every month is valid; hours run 0..23.
  defp random_date_string do
    [year, month, day, hours, minutes, seconds] =
      [Enum.random(1900..2020), Enum.random(1..12), Enum.random(1..28),
       Enum.random(0..23), Enum.random(0..59), Enum.random(0..59)]
      |> Enum.map(fn e ->
        e |> Integer.to_string() |> String.pad_leading(2, "0")
      end)

    "#{year}-#{month}-#{day} #{hours}:#{minutes}:#{seconds}"
  end
end
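
As an aside, the hand-padded timestamp in random_date_string could also be built with the standard library's NaiveDateTime, which validates the components and handles the zero padding itself. The following is only a sketch of that alternative, not the code I used; the module name RandomDateSketch is hypothetical.

```elixir
# Hypothetical alternative to random_date_string, using only the standard library.
defmodule RandomDateSketch do
  # Returns a random timestamp string such as "YYYY-MM-DD HH:MM:SS".
  def random_date_string do
    # NaiveDateTime.new/6 returns {:error, reason} for invalid components,
    # so an out-of-range value (for example hour 24) fails the match here.
    {:ok, datetime} =
      NaiveDateTime.new(
        Enum.random(1900..2020),
        Enum.random(1..12),
        # Days stop at 28 so the date is valid in every month.
        Enum.random(1..28),
        Enum.random(0..23),
        Enum.random(0..59),
        Enum.random(0..59)
      )

    NaiveDateTime.to_string(datetime)
  end
end
```

The trade-off is that an invalid component raises a MatchError immediately rather than producing a malformed string that ends up in S3.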

Step 3: Module to Write to AWS

To hew to the “separation of concerns” principle, I created another module specifically to write to AWS. Using Elixir’s import statement, I imported the JsonGenerator module to reuse the JSON blob generating functionality I created in Step 2.

The API for the JsonAws module is simple. The write_records function writes a user-specified number of JSON blobs to S3 as a single file in the configured bucket. The read_file function then reads that file back and decodes the JSON blobs it contains.

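defmodule JsonAws do
  # Bucket name is read from the :ex_aws config at compile time.
  @s3_bucket Application.get_env(:ex_aws, :s3_bucket)

  import JsonGenerator

  # Generates n JSON records and uploads them to S3 as a single object.
  def write_records(n) do
    body = generate_records(n)

    @s3_bucket
    |> ExAws.S3.put_object("json_file.txt", body)
    |> ExAws.request!()
  end

  # Downloads the object and decodes it.
  # Returns the result of Poison.decode/1: {:ok, data} or an error tuple.
  def read_file(file \\ "json_file.txt") do
    @s3_bucket
    |> ExAws.S3.get_object(file)
    |> ExAws.request!()
    |> parse_json()
  end

  # Only a 200 response is handled; ExAws.request! raises on request errors.
  defp parse_json(%{body: body, status_code: 200}) do
    Poison.decode(body)
  end
end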
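
Since read_file returns whatever Poison.decode returns, callers get a tagged tuple and the decoded records come back as maps with string keys. As a rough sketch of how that result might be consumed, here is a small helper that sums the view counts. The module ReadResultSketch and the function total_views are hypothetical names for illustration and are not part of the project.

```elixir
# Hypothetical helper that consumes the tuple returned by JsonAws.read_file/1.
defmodule ReadResultSketch do
  # Decoded records are maps with string keys, e.g. %{"view_count" => 42}.
  def total_views({:ok, records}) when is_list(records) do
    total =
      records
      |> Enum.map(&Map.get(&1, "view_count", 0))
      |> Enum.sum()

    {:ok, total}
  end

  # Pass decode errors through untouched so the caller can decide what to do.
  def total_views({:error, _reason} = error), do: error
end
```

In an iex session with AWS configured, this could be called as ReadResultSketch.total_views(JsonAws.read_file()).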

Step 4: Set Up Your AWS Keys

The next step is to configure ex_aws. You can simply copy the configuration below.

In config.exs:

config :ex_aws,
  access_key_id: [{:system, "AWS_ACCESS_KEY_ID"}, :instance_role],
  secret_access_key: [{:system, "AWS_SECRET_ACCESS_KEY"}, :instance_role],
  region: System.get_env("AWS_DEFAULT_REGION"),
  s3_bucket: System.get_env("S3_BUCKET")

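To make this all work, you’ll have to set up your AWS keys in your Amazon account and then put those values in a .env file in the top level of your application. The file is not picked up automatically, so load it into your shell (for example with source .env) before starting the application.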

Set up a .env file as follows:

export AWS_ACCESS_KEY_ID=yyy
export AWS_SECRET_ACCESS_KEY=yyy
export AWS_DEFAULT_REGION=us-east-1
export S3_BUCKET=yyy

You get these values from the Amazon AWS management console. You may have to sign up for an account if you don’t have one.
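
Because the configuration reads these variables from the environment, a forgotten export shows up as a nil region or bucket rather than an obvious error. A quick check like the sketch below can catch that early. The module name EnvCheckSketch is hypothetical and the check is not something ex_aws provides; it only uses System.get_env from the standard library.

```elixir
# Hypothetical sanity check for the variables listed in the .env file.
defmodule EnvCheckSketch do
  @required ~w(AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION S3_BUCKET)

  # Returns the names of required variables that are unset or empty.
  # `env` defaults to the real environment; a plain map can be passed for testing.
  def missing(env \\ System.get_env()) do
    Enum.filter(@required, fn name -> Map.get(env, name) in [nil, ""] end)
  end
end
```

Calling EnvCheckSketch.missing() in iex returns an empty list when everything is set, and the names of the missing variables otherwise.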

Step 5: Run Your New Program

Finally, boot up an iex session as follows and run the write_records function, specifying the number of JSON records you want to create and store in S3.

$ iex -S mix
iex(1)> JsonAws.write_records(10)

Summary

Boom! And that’s all there is to storing simple JSON data on S3. It’s a good way to get your feet wet with Elixir.


Written by Bruce Park who lives and works in the USA building useful things. He is sometimes around on Twitter.