Migrating your information to a cloud-based infrastructure is a critical first step in digital transformation efforts. Doing so will allow you to scale your technology infrastructure to meet growing demands, eliminate the need to maintain hardware, and allow your company to become more agile.
You will no longer have to worry about a coffee spill destroying a server, or about the physical security of your mission-critical infrastructure.
Moving to the cloud streamlines operations and security, giving you added flexibility and more time to focus on the core of what you do best.
There are several leading players in cloud-based data infrastructure, including Amazon Web Services (AWS). This overview provides a reference for which AWS products might best support your efforts when working with cloud-based data.
Navigating the range of products available through AWS can be daunting; even a quick look at AWS’s documentation reveals dozens of options.
With so many to choose from, this article will take a look at some of the most valuable options for getting started when working with data.
The first step, as you might expect, is to register. Once you head over to the registration page, you can create an account to get started. You’ll need to provide a credit card number, but free trial services are provided, including resources for each of the services described below.
After registering, head over to the AWS Management Console page. From here, you can access the various services to power your project. Be sure to make note of which AWS Region you are starting your project in. The region is displayed in the upper right-hand corner of the console (see picture below). AWS Regions are physical locations where AWS has data centers, and most resources are tied to hardware housed in these specific locations, so if you change regions, you won’t see your current work or projects.
With that, here are some of the services you’ll want to use for your next data project on the cloud.
VPC stands for virtual private cloud. While it’s not exactly a charged service in the same way that the remaining products below are, a VPC is essential for setting up your initial infrastructure, and serves as a virtual network where you can connect various services from AWS.
AWS provides a default VPC in each region, and a new EC2 instance launches into it unless you choose another. It is still a good detail to keep in mind, as you’ll want other services to run within the same VPC so they can communicate with one another. For many more details, see AWS’s VPC overview.
EC2 (Elastic Compute Cloud) provides you with a full-featured cloud server with a terminal interface. You can execute shell commands and administer privileges just as you would from your home computer.
First, you select an image, which includes an operating system and additional drivers. From there, you’ll select the instance type, that is, the hardware configuration you’ll rent from AWS. At that point, you’ll be ready to launch the instance and can follow AWS’s guidelines for connecting to it, typically via SSH for a Linux instance or Remote Desktop for a Windows one.
From there, you can use all of your normal command line tools, including scheduling recurring tasks using cron.
In addition to the computing power provided by EC2, you will also want to attach cloud storage to your project to manage information, documents and other files.
While EC2 is a normal machine which includes local storage, you will generally want to connect an EC2 instance to a separate storage service in order to allow you to scale the EC2 instance to meet the needs of your project. After all, one of the many advantages of a cloud infrastructure is the ability to scale systems on demand.
An EC2 instance’s local storage is not a good home for long-lived data: when you terminate an instance that is no longer needed, its root volume is deleted by default, and anything kept on instance store volumes is lost. S3 (Simple Storage Service) provides data redundancy, ensuring that data is not lost and providing a safer medium which is easier to scale as your needs grow.
Setting up an S3 bucket is very straightforward. Return to the AWS console and click on, or search for, S3.
From there, you can spin up a bucket in a few clicks. Afterwards, you’ll undoubtedly want to connect this to your EC2 instance so that you can fuse your processing and storage powers. Here’s a quick guide on linking S3 and EC2.
For further details about storage options that you can attach to your EC2 instance, see AWS’s EC2 storage documentation.
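As a rough illustration of what using S3 from an EC2 instance can look like in code, here is a minimal sketch that serializes a few rows to CSV and writes them to a bucket. It assumes the boto3 library is installed and that the instance has credentials that allow writing to the bucket (for example, through an attached IAM role). The helper names, the bucket name and the object key are hypothetical and not taken from AWS’s guide; `put_object` is the boto3 S3 client call for writing a single object.

```python
import csv
import io


def rows_to_csv_bytes(rows, fieldnames):
    """Serialize a list of dicts to CSV bytes, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def upload_rows(s3_client, bucket, key, rows, fieldnames):
    """Write rows to the given bucket and key as a CSV object; return its size in bytes."""
    body = rows_to_csv_bytes(rows, fieldnames)
    # put_object writes one object; the client is passed in so it can be swapped out.
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return len(body)


# On the EC2 instance (assumed setup: boto3 installed, IAM role attached):
#   import boto3
#   s3 = boto3.client("s3")
#   upload_rows(s3, "my-example-bucket", "reports/daily.csv",
#               [{"day": "2024-01-01", "orders": 12}], ["day", "orders"])
```

Passing the client in as an argument, rather than creating it inside the function, is a design choice that makes the helper easy to exercise with a stand-in object before touching a real bucket.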
Moving beyond S3, Amazon also provides a fully managed database system in RDS. S3 itself can host any file type and is typically used for simple data or large files. Amazon Relational Database Service (RDS) is not meant for storing arbitrary files, but instead provides you with a managed relational database.
Currently RDS supports PostgreSQL, MySQL, MariaDB, Oracle, SQL Server, and Amazon Aurora, which is MySQL and PostgreSQL compatible. Again, setup is straightforward: simply click through a few screens to initialize a database server and you will be up and running.
Here’s the full documentation for setting up a database and documentation for linking the database to your EC2 instance.
AWS Lambda allows you to execute functions and code without creating and managing a server through EC2, so you won’t have to worry about selecting an operating system or applying security patches.
You simply upload the code and pay for the compute time you use. Typically, this can reduce your costs, since you pay only for what you use as opposed to paying for capacity that may or may not be consumed.
For example, one might have an ETL pipeline that needs to run on a regular cadence to meet reporting needs, performing filtering and aggregations on raw data stored in a database.
Such a script would be an important component of building a dashboard. While you could spin up an EC2 instance and create a cron job to run this script on a recurring schedule, you would pay for the EC2 resource whether or not the ETL script is running. With Lambda, you can run the script on the same regular schedule while paying only for the time the script actually runs.
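To make the ETL example a little more concrete, here is a minimal sketch of what such a function might look like in Python. Lambda calls a handler with an `event` and a `context` argument; the handler name `lambda_handler`, the event shape, and the `region` and `amount` fields are all assumptions made for illustration. A real pipeline would read from and write to a database rather than take its records from the event, and the schedule itself would be configured separately (for example, with an Amazon EventBridge rule).

```python
from collections import defaultdict


def aggregate_orders(records, min_amount=0.0):
    """Drop orders below min_amount, then total the remaining amounts per region."""
    totals = defaultdict(float)
    for record in records:
        if record["amount"] >= min_amount:
            totals[record["region"]] += record["amount"]
    return dict(totals)


def lambda_handler(event, context):
    """Entry point that Lambda invokes, for example on a schedule."""
    # Hypothetical event shape: {"records": [...], "min_amount": 10}.
    records = event.get("records", [])
    totals = aggregate_orders(records, event.get("min_amount", 0.0))
    return {"status": "ok", "totals": totals}
```

Keeping the filtering and aggregation in a plain function, separate from the handler, means the same logic can be run locally or from a cron job on EC2 if Lambda turns out not to fit.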
The newest of the AWS services covered here, SageMaker is the go-to tool for machine learning. SageMaker provides several different benefits.
First, you can spin up Jupyter Notebooks directly from SageMaker, allowing you to interactively conduct exploratory data analysis and tune models on the fly. From there, SageMaker also allows you to publish these models into production and create endpoints which can be leveraged as a service, say from your website.
Finally, SageMaker also provides a valuable service for labeling the data itself. If you’re training a supervised learning algorithm, for example, SageMaker may be able to automatically label some of your data, or outsource datapoints to vetted individuals who can manually label cases in order to give you the data needed to build a predictive algorithm.
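As a sketch of how a published model endpoint might be called from another application, such as the website mentioned above, the function below sends one request and parses the reply. It assumes boto3 is installed and that an endpoint has already been deployed; `invoke_endpoint` is the call on the boto3 `sagemaker-runtime` client. The endpoint name and the JSON payload shape (`{"features": [...]}`) are hypothetical, since the expected format depends on how the model was built and deployed.

```python
import json


def predict(runtime_client, endpoint_name, features):
    """Send one feature vector to a deployed endpoint and return the parsed JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        # Assumed payload shape; adjust to whatever the deployed model expects.
        Body=json.dumps({"features": features}).encode("utf-8"),
    )
    # The response body is a stream, so read it fully before decoding.
    return json.loads(response["Body"].read())


# Assumed usage, with credentials and region already configured:
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   predict(runtime, "my-example-endpoint", [5.1, 3.5, 1.4, 0.2])
```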
AWS has a myriad of services, which can be daunting for a newcomer to navigate. Having a preliminary overview can help you get started and choose the right services based on your individual needs. Typically, starting with managed services will streamline your workflow, leaving you with more time and resources to focus on higher-level tasks. In these cases, SageMaker, Lambda and RDS are good go-to tools for putting data pipelines into production.
If you need more flexible data storage for files such as images that aren’t well suited to a database, S3 is the go-to option, and if you need a full-fledged server for your project, then EC2 is the tool of choice. Whatever your needs, you can also find plenty more information and guidance on AWS’s getting started page.
Chisel provides end-to-end solutions for your data and analytics programs. Looking to get started migrating to the cloud? Contact us today and find the experts you need to get started.