Cloud hosting differs from a normal unmanaged VPS because of the way that server instances are started and run. Server data does not automatically persist between instances, and each instance gets a unique IP address and its own default storage. With services like Amazon Web Services, it's possible to attach persistent storage to an instance after it boots, but the instance will always create a new root device based on the snapshot associated with the Amazon Machine Image (AMI) that was selected to run. Living in the cloud also changes the way a server is accessed, because all traffic must pass through the Amazon firewall first. As a consequence, configuring a basic AWS instance is a bit different from configuring a normal unmanaged VPS, so, having recently set up my own server, I'll describe the entire process of creating a basic web hosting server to run a small blog (such as this one).

Launching an Instance

The AWS Management Console is much easier to use than the command line tools because it displays all of the options for you, so I would recommend using it when possible. As a new user, I did everything through the AWS Management Console except packaging an AMI from a snapshot, because that can only be done with the command line tools. For the first instance you launch, if you plan on making use of the free usage tier (perfect for a small website, although you will likely run $1-2/month in storage costs for your image snapshots unless you decrease the size of your root device when you create your own AMI), select one of the two Basic Amazon Linux AMIs. After choosing an AMI, you should choose which availability zone to launch in. For the first instance you ever use, you can leave this as "No Preference," but if you plan on attaching persistent storage in the future, you must note which availability zone contains your persistent EBS Volumes and launch your server instance in the same zone. Snapshots, however, are available for launching AMIs in all zones. Currently, I'm running a 64-bit micro instance in the us-east-1b zone, because the micro instance is well suited to my needs: infrequent accesses (low traffic) with occasional usage spikes, during which up to two virtual cores are available.
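For the record, the same launch can also be done with the EC2 command line tools; something like the following (the bracketed values are placeholders you would fill in yourself) would start a micro instance in us-east-1b:

C:\> ec2-run-instances [ami-id] -t t1.micro -z us-east-1b -k [keypair-name] -g [security-group]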

On the advanced options page, I chose to use all of the default settings. The AWS guidelines recommend turning on instance termination protection (which is free), but I didn't bother because the AMI defaults to a Stopped state on shutdown. This means that the EBS volume created for the instance will not be deleted, and the instance can be restarted without data loss. For this first instance, I planned on creating a snapshot to use for a custom AMI anyway, so I actually wanted the instance and its associated EBS volume to disappear by default when I terminated it. Similarly, on the tags page, because I had not configured any special cloud-init scripts, I didn't need to tag my instance with anything, although I chose to add the domain name to the instance "Name" tag.

The first time I launched the instance, I generated a keypair, which is a relatively straightforward process. Because I am using Windows, after downloading the keypair I had to import it for use with PuTTY, which is also very easy: using puttygen, I simply loaded the keyfile downloaded from Amazon and then saved the private key in PuTTY's internal format. I also loaded the key into pageant so that I can use it with psftp or for tunneling with svn+ssh later on. The final system that I configured does not allow key-less (password) logins, as Amazon recommends, so the key has to be made available to every local program that I will use to communicate with my server.

Finally, for the firewall, I created a new security group. This firewall uses a whitelist-only approach where all traffic is blocked by default, so all you can do is specify which blocks of addresses may access specific ports on the machine. This is great as a basic firewall, since you can block all traffic to a port or allow only yourself to access the machine via SSH, but it is insufficient for conveniently blocking malicious addresses without either denying access to all of your users or adding a large number of rules manually; for that you will still need to fall back on traditional approaches such as iptables. For restricting access to SSH, however, it works great, so initially I added a rule allowing SSH access only from my home machine, plus additional rules allowing SMTP, HTTP, and HTTPS traffic from all addresses. Here, the source must be specified in CIDR notation, such as 107.20.240.83/32 for a single address or 0.0.0.0/0 for all incoming addresses.
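If you prefer the command line tools, equivalent rules can be added with ec2-authorize (the group name here is just a placeholder, and the SSH rule reuses the example address above):

C:\> ec2-authorize [security-group] -P tcp -p 22 -s 107.20.240.83/32
C:\> ec2-authorize [security-group] -P tcp -p 25 -s 0.0.0.0/0
C:\> ec2-authorize [security-group] -P tcp -p 80 -s 0.0.0.0/0
C:\> ec2-authorize [security-group] -P tcp -p 443 -s 0.0.0.0/0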

After the instance launched, I obtained an Elastic IP address so that I could use a common IP for all of my DNS records, and assigned it to the running instance. The public DNS name of each instance is assigned at boot time and will not be the same from run to run, even for a "Stopped" instance that is restarted. Using an Elastic IP that is associated with an instance after it boots is the only way to ensure that it has a specific IP address.
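Again, this can also be done from the command line if you prefer (the instance ID and address are placeholders):

C:\> ec2-allocate-address
C:\> ec2-associate-address -i [instance-id] [elastic-ip]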

Installing Server Software

Now that my instance was running, I connected with PuTTY, using my private key to authenticate. The default user name on the Amazon Linux AMIs is ec2-user, which I felt like changing, so I created a user account for myself, greg, with the permissions that I would normally expect in order to perform basic server administration tasks:

[ec2-user@thelonepole /]$ sudo useradd -g users -G wheel -m greg
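One extra step worth noting: because password logins are disabled, SSHing in as the new user also requires copying the public key over from ec2-user's account. A minimal sketch, assuming the stock home directory layout:

[ec2-user@thelonepole /]$ sudo mkdir -m 700 /home/greg/.ssh
[ec2-user@thelonepole /]$ sudo cp ~/.ssh/authorized_keys /home/greg/.ssh/
[ec2-user@thelonepole /]$ sudo chown -R greg:users /home/greg/.ssh
[ec2-user@thelonepole /]$ sudo chmod 600 /home/greg/.ssh/authorized_keys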

Much to my surprise, after switching to my new account, I was unable to use sudo to perform any tasks. This is because, by default on the Amazon Linux AMI, the /etc/sudoers file does not have an entry for the wheel group, even though the group exists and ec2-user is a member. I chose to simply add an entry for my new account allowing me to use sudo without a password, in the same style in which ec2-user was originally configured:

greg ALL = NOPASSWD: ALL
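The safe way to make that edit is with visudo, which checks the file's syntax before saving it:

[ec2-user@thelonepole /]$ sudo visudo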

Now that my account was configured, it was time to install some basic server applications: Apache and MySQL, with PHP 5.3. Luckily, all of these applications are available, properly compiled for a virtualized server, from the built-in yum repository. I come from a Gentoo background, so the naming seemed a little strange here and there, as did the separation of basic features into distinct "packages" rather than USE flags, but every distribution has to be different somehow (and USE flags only really work when you're compiling from source). As root (or with sudo), use yum to install packages:

[root@thelonepole /]# yum install httpd

In total, to get a basic server with support for PHP 5.3, MySQL, and SSL connections, I had to install these packages: httpd, mod_ssl, mysql, mysql-server, php (which pulls in php-cli and php-common), php-gd, php-mcrypt, php-mysql, php-pdo, and php-mbstring. If you plan on using it, APC is also available as php-pecl-apc. Not all of the PHP modules are available, but presumably you can build from source or from PECL if you need one that is missing.
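In other words, the whole set can be installed in one go:

[root@thelonepole /]# yum install httpd mod_ssl mysql mysql-server php php-gd php-mcrypt php-mysql php-pdo php-mbstring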

With the binaries installed, all of these services had to be configured. Luckily, for PHP, the default settings were pretty much perfect, as they disable register_globals and magic_quotes_gpc. The default memory limit is a bit high at 128M, but this shouldn't actually be a problem because that memory is not reserved up front. A correct php.conf is added to the Apache configuration directory, so PHP will be enabled by default the next time you start the httpd service.
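For reference, the relevant lines in /etc/php.ini look roughly like this out of the box:

register_globals = Off
magic_quotes_gpc = Off
memory_limit = 128M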

Next was Apache. The default configuration structure for Apache on the Linux AMI is a little different from the Gentoo structure (which, of course, I think is more logical), so I had a few small hiccups while setting things up. All configuration is stored in /etc/httpd, with httpd.conf living in /etc/httpd/conf/ and further generic configuration files (which are automatically included) living in /etc/httpd/conf.d/. My DNS entries have multiple CNAMEs for the various services I host on my site, including one for webmail and one for this blog. To set this up on Apache's side, I used the NameVirtualHost feature to enable vhosts, and then I configured all vhosts by name, NOT by IP address. This is very important because the Elastic IP address used for the server is not exposed to the server instance itself: ifconfig makes it easy to see that eth0 is assigned a private address behind a NAT. Therefore, although Apache can bind to the Elastic IP, it will never receive any traffic on it and the vhosts won't work. Given the unpredictability of a server instance's IP addresses, it is most useful to bind to all incoming addresses and do all vhost configuration by name only. The end result is that my vhosts.conf file looks pretty standard, just with no IP-based vhosts:

NameVirtualHost *:80
NameVirtualHost *:443

# Default entry as fall-through, also handles www subdomain, matches main server config
<VirtualHost *:80>
    SSLEngine Off
    ServerAdmin greg@thelonepole.com
    DocumentRoot /var/www/html
    ServerName www.thelonepole.com
</VirtualHost>

<VirtualHost *:80>
    SSLEngine Off
    ServerAdmin greg@thelonepole.com
    DocumentRoot /var/www/blog
    ServerName blog.thelonepole.com
</VirtualHost>

I also set up several SSL-enabled vhosts to secure my webmail traffic, but I will talk about those another day, because their setup was almost completely routine and therefore not really part of my experience with AWS. However, there is one important thing to mention about the default configuration file included in the mod_ssl package, which will cause a problem whether you plan on using SSL immediately or not. The default ssl.conf file includes an enabled VirtualHost entry that references several nonexistent keys. I am not sure why those references are in there, but they are, so the best thing to do is either create the keys or delete that part of the configuration. I went the latter route because it was faster and removed the entire VirtualHost entry from ssl.conf. The rest of the file, which sets global options for the SSL engine, is very useful, however, and serves as a fine set of defaults.
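If you would rather create the keys than delete the entry, a self-signed pair will satisfy the references; a minimal sketch (the output paths should match whatever your stock ssl.conf actually points at):

[root@thelonepole /]# openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/pki/tls/private/localhost.key -out /etc/pki/tls/certs/localhost.crt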

With Apache successfully configured, that leaves only mysqld. Luckily, only the root password needs to be set from the command line; the rest can be configured from a much more straightforward interface, such as phpMyAdmin. Unfortunately, like the system-wide root password, the MySQL root password appears to be scrambled during the mysql-server package installation. This created a huge headache for me at first, but the MySQL documentation explains how to bypass the root password at startup and then secure the server:

[root@thelonepole /]# mysqld_safe --skip-grant-tables &
[root@thelonepole /]# mysql_secure_installation

The secure installation script also let me remove anonymous access and drop the test database during setup, saving me time later on. Then, after unpacking a phpMyAdmin tarball into the webroot, it was easy to log in to MySQL as root and add additional users.

With all of the services configured properly, I had to set them to start automatically. This is done with the chkconfig tool, as on CentOS, so to set httpd and mysqld to start in runlevels 3, 4, and 5 (if the system boots into runlevel 2, we are not going to get anywhere anyway, so it doesn't really matter), I only needed two commands:

[root@thelonepole /]# chkconfig --level 345 httpd on
[root@thelonepole /]# chkconfig --level 345 mysqld on
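You can verify the result with chkconfig --list; the output should look something like this:

[root@thelonepole /]# chkconfig --list httpd
httpd           0:off   1:off   2:off   3:on    4:on    5:on    6:off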

At this point, I started both Apache and MySQL with the service interface used on CentOS:

[root@thelonepole /]# service httpd start
[root@thelonepole /]# service mysqld start

and I was able to access both static HTML files and phpMyAdmin, as well as manage my databases. With the basic server configuration done, only one thing remained: preparing an AMI so that the next time I launch an instance, I don't need to do any of this configuration again.

Creating an AMI

AMIs are backed by snapshots of EBS volumes that were attached to running server instances, which means that an AMI cannot be created from an EBS Volume directly. In fact, you cannot create an AMI through the current AWS Management Console at all unless you first export a snapshot to Amazon S3, something I have had no interest in doing. Assuming that you're able to set up the EC2 command line tools as described in the EC2 User Guide, it is very easy to create a new AMI from a snapshot of an EBS volume.

First, from the Volumes section of the EC2 tab in the Management Console, I created a new snapshot of the volume attached to my running server instance. When the snapshot was ready, I opened up my local command prompt and ran straight through the configuration process described in the "setting up your tools" section of the User Guide. There is one important omission from the User Guide, however: the JAVA_HOME environment variable must be set. On Windows it will usually look something like this:

C:\> set JAVA_HOME="c:\program files\java\jre6"

From here there is only one command required to create a new AMI from the snapshot:

C:\> ec2-register -n Image_Name -d Image_description --root-device-name /dev/sda1 -b /dev/sda1=[snapshot-id]:[size]

Note: Image_Name and Image_description cannot contain spaces. The Management Console doesn't seem to parse them properly anyway, so there isn't much reason to put details into the description in particular.

This creates a private AMI backed by EBS storage (creating one backed by instance storage is not recommended) that will boot by creating a new EBS volume from the given snapshot. Note that the snapshot-id is NOT the snapshot's Name value (the Name field is just for human administrators) but the value listed under Snapshot ID in the Management Console. The size is specified in GiB, so for a default instance based on the Basic Amazon Linux AMI it would be 8; it can also be left blank, in which case AWS infers the value from the listed size of the snapshot (not its disk usage). The size parameter can also be used to grow a drive, which is great if you are nearing maximum capacity on your server root. If you have multiple snapshots that need to be attached at boot time, for instance if you mount /var or /home on a separate drive, then additional -b parameters should be given, such as -b /dev/sdf=[snapshot-id], as in the example below. For some reason, Amazon recommends attaching secondary drives as /dev/sdf through /dev/sdp and NOT numbering the partitions.
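For example, a hypothetical registration with a separate data volume might look like this:

C:\> ec2-register -n Image_Name -d Image_description --root-device-name /dev/sda1 -b /dev/sda1=[root-snapshot-id]:8 -b /dev/sdf=[data-snapshot-id]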

Personally, I use a multiple-drive setup where I've separated the base operating system from my data, so most of /var is on a separate drive that is attached at boot time. By separating my OS from my configuration and data, I can swap out the underlying system without having to migrate my data, so long as both systems are configured to look in the same place. One caveat of such a system is that, because I use snapshots to generate EBS Volumes at boot, running two servers at once would give me two separate drives, which creates a problem for synchronizing data. I see two ways to resolve this for multi-server systems, though I haven't explored either option fully because I don't need a second server for performance. (1) Set up a server that has the "real" drives attached to it and expose them as NFS volumes to the other servers, so that they immediately see each other's modifications and do not have write conflicts. (2) Set up dedicated servers for each service, as in a normal infrastructure: one for handling email, one (or more) for hosting content, and one master plus multiple slaves for handling databases, along with a load balancer for directing traffic. I think (1) is acceptable as a transitional solution (the drive is shared but the servers are heterogeneous, so there are no database conflicts) or in specific situations (such as serving static content on multiple IP addresses), but it would not scale or survive long in a high-traffic environment. I think (2) is more robust because it forces all data to be written in one place (so there aren't two separate databases trying to write to the same files), although it will be more expensive to run that many dedicated servers. I will revisit this in the future and explain how I've moved things like /var/www off of my main drive onto a data drive, as well as tricks for connecting to persistent EBS Volumes (not ones created from snapshots) at boot time, using the cloud-init scripts.
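In the meantime, just to give an idea of what the data-drive setup looks like, a hypothetical /etc/fstab entry for a volume attached as /dev/sdf and mounted at /var would be something like:

# hypothetical entry: data volume attached as /dev/sdf, mounted at /var (filesystem type is an assumption)
/dev/sdf    /var    ext3    defaults    0 0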

This just about covers all of the basic system configuration that I had to do to get a small web server running using AWS. I also configured sendmail so that I could send and receive email, as well as an IMAP daemon (cyrus) for use with webmail (squirrelmail) and deployed self-signed SSL certificates to encrypt sensitive traffic (such as webmail and phpmyadmin). I also created my own (untrusted) CA certificate in order to sign certificates for use with client authentication, to further restrict access to the webmail and phpmyadmin services, but all of that configuration is beyond the scope of simply setting up an AWS server and will have to wait for another day.