After setting up a basic web server, I wanted to secure access to privileged resources using SSL. Normally, if I were dealing with customers or others who need to establish some kind of trust with me from the outside world, I would have to purchase a properly signed and verified SSL certificate from VeriSign or another Certificate Authority. However, because my goal is to secure internal resources that must be exposed publicly due to the nature of the internet (webmail and phpMyAdmin), it is cheaper (read: free) to create and sign my own certificates. End users do not need to trust me, and because I can personally verify the origin of my CA certificate, I do not need one signed by an external authority. I can then use this to sign additional certificates in order to create an area of my server that requires a user to present a certificate signed by my personal CA certificate in order to proceed, allowing very tight access control over sensitive areas.

On the 64-bit Basic Linux AMI, OpenSSL is already installed. The configuration file, openssl.cnf, is located at /etc/pki/tls/openssl.cnf. Luckily, it has a pretty reasonable set of defaults, and in particular the directory tree is already set up, with appropriate permissions (0700) on the directory used for storing the CA private key. Because openssl.cnf points to the CA certificate and the private key directly, I chose to rename my keys from the default values and modified the certificate and private_key entries to reflect this. With the basic OpenSSL configuration complete, it was time to create a self-signed CA certificate, set up the directory structure, and configure apache to use mod_ssl.
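For reference, the edits land in the [ CA_default ] section of openssl.cnf and look something like this, with the tlpCA names matching the certificate created below (the stock values are cacert.pem and cakey.pem):

[ CA_default ]
dir             = /etc/pki/CA
certificate     = $dir/tlpCA.crt              # default: $dir/cacert.pem
private_key     = $dir/private/tlpCA.key      # default: $dir/private/cakey.pem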

Signing the CA certificate

Creating a certificate that you can use to sign other requests is very straightforward:

[root@thelonepole /]# openssl req -newkey rsa:2048 -keyout tlpCA.key -x509 -days 365 -out tlpCA.crt

This creates a new X.509 format certificate along with its private key. It is automatically self-signed (this form of the command combines the key generation step, certificate request step, and self-signing step into a single command). My certificate requires that a password be entered when it is used to sign a certificate request, but for a small private server with exactly one trusted user the password can be removed by adding the -nodes flag, for convenience. You may also want to change the number of days for which the certificate is valid so that you don't have to recreate it every year (for a CA certificate, you might as well set this to 10 or more years).
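For instance, a password-less CA certificate valid for ten years could be generated with:

[root@thelonepole /]# openssl req -newkey rsa:2048 -nodes -keyout tlpCA.key -x509 -days 3650 -out tlpCA.crt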

Once the certificate and key were ready, I moved them to the locations specified earlier in openssl.cnf for certificate and private_key. Then, I had to create the file used to track the next available serial number. For some reason this isn't created by default, possibly because publishing the next available serial number is considered an obscure vulnerability, but we are in a relatively low security situation, so starting at 1 is pretty reasonable:

[root@thelonepole /]# touch /etc/pki/CA/serial
[root@thelonepole /]# echo 01 > /etc/pki/CA/serial

If this step is omitted, openssl will complain when it comes time to sign certificate requests because it does not know which serial number to fill in.

Signing Other Certificates

At this point, with the CA certificate in place, it is easy to create and sign certificate requests. It is necessary to create a different certificate for each subdomain that you wish to use SSL with, because the certificate includes the domain that it is valid for in its Common Name. If this doesn't match the ServerName directive, apache will generate a warning at startup, and users (even those who have added your CA certificate to their trusted store, i.e. you) will also see a warning in their browsers when using SSL. Creating and signing a certificate is done with two commands:

[root@thelonepole /]# openssl req -newkey rsa:2048 -nodes -keyout webmail.key -out webmail.csr
[root@thelonepole /]# openssl ca -in webmail.csr -out webmail.crt

Note that the certificates that are created for use with apache should be done WITHOUT a password, using the -nodes flag. If you don't do this, you'll be prompted for a password each time you start apache, which is a huge problem if your server goes down and/or is automatically restarted and you aren't available to enter the password. The certificates generated by the above command will also be valid for the period of time specified in your openssl.cnf. If you want them to last longer, be sure to specify -days XXX.
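For example, to sign a certificate that remains valid for two years rather than the configured default:

[root@thelonepole /]# openssl ca -days 730 -in webmail.csr -out webmail.crt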

Copy these certificates to a known location, such as /etc/pki/tls and note their location so that it can be used when configuring apache. You will also want to create a certificate (in exactly the same way) for yourself, which will be used for client authentication for SSL-enabled subdomains as well. However, for a client certificate, there is an additional step: it must be converted from X.509 format to the PKCS #12 format that is often the only one understood by browsers. The single output file will combine both the signed certificate and the private key, and you will have it on any computer or device that you expect to use to access your secure areas, so BE SURE to use a strong export password when prompted during this step. The conversion can be done with one command, which takes in your private key and signed certificate and exports it as a client certificate in PKCS format:

[root@thelonepole /]# openssl pkcs12 -export -clcerts -out greg.p12 -in greg.crt -inkey greg.key

Then, transfer the PKCS #12 file to your computer (usually with sftp) and install it in your web browser. Under Chrome, I had the option to install it specifically as a key for client authentication, but I am not sure whether all browsers categorize it in this manner. You will also need to install a copy of your CA certificate into the trusted root certificate store, or the browser will not accept the client certificate. Luckily you can simply download the X.509 format certificate from your server and tell your browser about it without converting it (conversion wouldn't make sense anyway, because it would require the private key…).

Configuring mod_ssl

Finally, there are several directives to add to the VirtualHost entries in order to configure them to use SSL. First, if you intend to only allow SSL to be used for a specific subdomain, like I do with my webmail, create an entry for normal HTTP that redirects all traffic to HTTPS:

<VirtualHost *:80>
    ServerName webmail.thelonepole.com
    Redirect permanent / https://webmail.thelonepole.com
</VirtualHost>

Next, a VirtualHost entry must be created that has the desired SSL settings, listening on port 443. Obviously, it must include SSLEngine On in order to enable SSL, but there are actually several extra directives that should be included in order to have client authentication work. Be sure to set SSLCACertificateFile to point to your local CA certificate and enable client authentication with SSLVerifyClient. In order to require that clients use certificates directly signed by your CA certificate, set SSLVerifyDepth to 1. I've included a complete VirtualHost entry as an example, based on the configuration I use here.

<VirtualHost *:443>
    ServerAdmin greg@thelonepole.com
    DocumentRoot /var/www/webmail
    ServerName webmail.thelonepole.com
    ErrorLog logs/webmail-error_log
    CustomLog logs/webmail-access_log combined
    SSLEngine On
    SSLCACertificateFile /path/to/CA/tlpCA.crt
    SSLCertificateFile /path/to/apache/ssl/webmail.crt
    SSLCertificateKeyFile /path/to/apache/ssl/webmail.key
    SSLVerifyClient require
    SSLVerifyDepth 1
    SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown
    CustomLog logs/webmail-ssl_request_log "%t %h %{SSL_PROTOCOL}x %{SSL_CIPHER}x \"%r\" %b"
</VirtualHost>

At this point, simply restart apache, and, if DNS was already set up, you should be prompted for a client certificate when accessing your subdomain. If you have not added your certificate to the browser's store (or want to verify that a client certificate really is required, using an unconfigured browser), you should see a message notifying you that SSL negotiation failed with the error code "ssl_error_handshake_failure_alert".
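You can also test the handshake from the command line instead of a browser; openssl's s_client will present a client certificate directly (the paths here assume the files created above):

[root@thelonepole /]# openssl s_client -connect webmail.thelonepole.com:443 -cert greg.crt -key greg.key -CAfile tlpCA.crt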

This has been updated after some feedback from a more experienced FPGA designer, who explained that asynchronous reset is an inappropriate default approach because it can cause partial resets, metastability, and partially correct computations as it propagates across the device. A synchronous reset should therefore be preferred, along with a dedicated reset synchronizer (not shown or considered here).

Last time I wrote about how to organize a basic Verilog module that implements a combinational circuit in order to improve clarity for readers and synthesis tools alike. Implementing a sequential circuit correctly is a lot more challenging because it requires an internal combinational circuit. For my discussion I will largely focus on Altera's tools, because they are what I use on a day-to-day basis, but I would expect the same principles to apply to code targeting Xilinx or another vendor.

The traditional approach to designing a sequential circuit begins by creating a state diagram. In a classroom setting, students would then go on to pick a state encoding, create transition tables, and then optimize the combinational logic for implementation. Luckily, as an engineer I don't have to do things the academic way, and I can instead rely on the synthesis tools to optimize the combinational circuit for me. With a good set of tools, it is actually possible for the synthesis step to choose a better state encoding (to minimize logic usage, propagation delay, or register usage, as necessary) than one that I will choose arbitrarily, which means that I can focus primarily on making sure the transition and output logic is correct. Frequently, I don't even create a state diagram on paper, instead setting up the template that I normally use and then writing in the state transitions directly while defining the state machine's behavior.

With that as the goal, a specific approach to writing state machines is necessary to ensure that it's clear to the synthesis tools and to other readers what is going on. There are two options: one is to use a graphical state machine builder, which I would obviously prefer not to do, as Verilog is usually faster to write and easier to maintain, because clicking is not an efficient way to transfer input to the computer (at least, not as efficient as the keyboard, in this case). The other option is to use language features to alert the synthesis tools that you are creating a state machine. In Quartus II, this can be done using the parameter data type to create state variable templates, or, alternatively, I prefer using the `define directive to define states as constants. Both will be picked up by the synthesis tools and allow for state remapping if desired. Although Altera recommends using parameter, I prefer `define because it does not expose the state values to the hierarchy as modifiable in the way a module parameter does. Note, however, that the Altera tools compile all the files in a shared namespace, so all `define'd symbols must have unique names. This is easy to do with a consistent naming scheme; personally I use STATE_<MOD>_<STATE NAME>, with MOD being up to 5 letters of abbreviation for the module being implemented.
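As an illustration of the naming scheme, these are the definitions assumed by the vending machine example later in this post (the encoding itself is one arbitrary choice; with user encoding the tools will keep it, otherwise they may remap it):

`define STATE_EX_IDLE   2'b00
`define STATE_EX_S1     2'b01
`define STATE_EX_S2     2'b10
`define STATE_EX_S3     2'b11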

One important behavior to note here is that Altera has different options for state machine synthesis, some of which increase the number of bits used for state encoding (such as one-hot encoding). Therefore, it may not always be a good thing for the synthesis tools to infer a state machine; if explicit structural storage is instantiated (such as lpm_ff), the state machine will not be optimized by the compiler. The state machine can also be set to use "user-defined" encoding by placing a synthesis directive on the same line as the state variable's declaration. All of the options for state machine encoding are described in Altera's documentation of Quartus II synthesis options.

(* syn_encoding = "user" *) reg [1:0] state;

Structurally, it is important to separate storage logic from state calculation logic, for a couple of reasons. First, the prevailing belief is that blocking assignments (with the = operator) should be used for combinational logic only, while non-blocking assignments (with the <= operator) should be used for sequential logic only. Assignment types should never be mixed within a single behavioral block because unintended or unpredictable simulation and synthesis behavior can result from differences in interpretation. Therefore, use non-blocking assignment in an edge-sensitive behavioral block to implement the storage logic, and use blocking assignment in a combinational behavioral block to implement next state and output.

Second, it makes the module easier to maintain if its organization reflects the hardware structures that it implements, and the synthesis tools are then more likely to produce the best implementation. In particular, consider enable, synchronous or asynchronous preset or clear signals, and other features normally associated with the flip flop elements themselves (as opposed to being part of the combinational logic before the flip flop): on a modern FPGA these features can either be implemented as logic inside the LUTs or with the dedicated enable, preset, and clear lines of the storage element within the LE. In order to make it clear to the tools that an enable is being used, the conditional use of the enable should be in the storage block, not the combinational block. For example, consider the following two implementations of a simple vending machine that accepts two coin types (1 and 2, where coin 2 is worth two of coin 1) and counts up to a total of three of coin 1. Then it issues a vend signal and returns to the idle state, with no change calculations, for simplicity.

/** Implementation A */
assign vend = (state == `STATE_EX_S3);

always @* begin
    nextState = state;

    case(state)
        `STATE_EX_IDLE : begin
            if (coin1 == 1'b1) begin
                if (enable == 1'b1)
                    nextState = `STATE_EX_S1;
            end
            else if (coin2 == 1'b1) begin
                if (enable == 1'b1)
                    nextState = `STATE_EX_S2;
            end
        end
        `STATE_EX_S1 : begin
            if (coin1 == 1'b1) begin
                if (enable == 1'b1)
                    nextState = `STATE_EX_S3;
            end
            else if (coin2 == 1'b1) begin
                if (enable == 1'b1)
                    nextState = `STATE_EX_S3;
            end
        end
        `STATE_EX_S2 : begin
            if (coin1 == 1'b1) begin
                if (enable == 1'b1)
                    nextState = `STATE_EX_S3;
            end
            else if (coin2 == 1'b1) begin
                if (enable == 1'b1)
                    nextState = `STATE_EX_S3;
            end
        end
        `STATE_EX_S3 :  begin
            if (enable == 1'b1)
                nextState = `STATE_EX_IDLE;
        end
    endcase
end

always @(posedge clk) begin
    if (reset_L == 1'b0) begin
        state <= `STATE_EX_IDLE;
    end
    else begin
        state <= nextState;
    end
end
/** Implementation B */
assign vend = (state == `STATE_EX_S3);

always @* begin
    nextState = state;

    case(state)
        `STATE_EX_IDLE : begin
            if (coin1 == 1'b1) begin
                nextState = `STATE_EX_S1;
            end
            else if (coin2 == 1'b1) begin
                nextState = `STATE_EX_S2;
            end
        end
        `STATE_EX_S1 : begin
            if (coin1 == 1'b1) begin
                nextState = `STATE_EX_S3;
            end
            else if (coin2 == 1'b1) begin
                nextState = `STATE_EX_S3;
            end
        end
        `STATE_EX_S2 : begin
            if (coin1 == 1'b1) begin
                nextState = `STATE_EX_S3;
            end
            else if (coin2 == 1'b1) begin
                nextState = `STATE_EX_S3;
            end
        end
        `STATE_EX_S3 :  begin
            nextState = `STATE_EX_IDLE;
        end
    endcase
end

always @(posedge clk) begin
    if (reset_L == 1'b0) begin
        state <= `STATE_EX_IDLE;
    end
    else if (enable == 1'b1) begin
        state <= nextState;
    end
end

When I synthesize these two files, I would normally expect the same result, as they are logically equivalent. However, because the default assignment of nextState is not gated by the enable signal while all of the state transitions are, the synthesis tools are unable to push the enable condition up to the register, and implementation A requires two more LEs than implementation B. Unfortunately, state machine inference introduces further problems: the enable signal is inferred as part of the state transitions (technically, it is, although few think about it that way, because it affects all of the state transitions in the same way). If I simply replace the behaviorally generated flip flop with an explicit structural D flip flop using the Altera primitives, the same relationship exists (A requires two more LEs than B), but both implementations require two fewer LEs overall:

/** Implementation A */
lpm_ff #(.lpm_width(2)) storage(.clock(clk), .sclr(!reset_L), .data(nextState), .q(state));

/** Implementation B */
lpm_ff #(.lpm_width(2)) storage(.clock(clk), .sclr(!reset_L), .data(nextState), .q(state), .enable(enable));

A quick look at the RTL Viewer shows that the discrepancy between implementations A and B is due to the enable signal not being pushed into the flip flops and instead being implemented in logic.

[Figure: RTL for Implementation A vs. RTL for Implementation B]

We can see that the same six multiplexers (more or less) are used to implement the state transition logic, but for implementation A, the enable signal is handled purely in logic before the state transition logic, whereas in implementation B it is correctly passed to the enable input of the D flip flop that is already available in the LE's register, with no additional logic cost. As a result, be sure to separate signals that are (or can be) associated with the storage elements themselves and handle them near the storage element, either inside the appropriate sequential behavioral block or with the appropriate input on the structural storage element, rather than with the rest of the next state logic. This is both cleaner to read and likely to produce better synthesis results.

Third, and last for now, on the list of file organization techniques is the use of the case statement (or its relatives, casex and casez) inside the combinational block for next-state logic. The exception is a counter or another state machine whose transitions are systematic, where simple statements like nextState = state + 1'b1; are sufficient. If a counter is used inside another state machine to determine timing, I believe it is sufficient to increment the counter as a default assignment and reset or modify it as necessary in the states that actually govern its progression, as in the sketch below. While examining the earlier vending machine example, when the resource usage was significantly higher than expected, I found that if the vend signal was assigned inside the combinational block and the state-handling logic was an if-else if tree, three unnecessary multiplexers were generated to drive the output, although the state machine logic was still calculated correctly. If, instead, I used the case structure or the continuous assignment that I finally settled on (in keeping with my guidelines for combinational logic: it was simple enough to be stated as a continuous assignment, therefore it should be handled that way), the multiplexers did not appear. I believe this is due to a difference in how the synthesis tools represent if-else trees and case statements internally, and because the case structure requires behavior to be defined for every state, more or less, the tools handle it much better.
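Here is a minimal sketch of the counter-inside-a-state-machine pattern (the states, signals, and widths here are hypothetical, not from the vending machine example): the counter advances by default, and only the states that watch it reset or modify it.

always @* begin
    nextState = state;
    nextCount = count + 3'b001;            // default: the timer simply advances

    case (state)
        `STATE_EX_WAIT : begin
            if (count == 3'b111) begin     // only the state that depends on the
                nextState = `STATE_EX_GO;  // timer resets or modifies it
                nextCount = 3'b000;
            end
        end
        `STATE_EX_GO : begin
            nextState = `STATE_EX_WAIT;
            nextCount = 3'b000;
        end
    endcase
end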

A potential fourth rule that some people would advocate is physically separating the logic for calculating the next state and the logic for calculating the machine outputs into two behavioral blocks, but I make no recommendation here. For Moore-type machines, this can be reasonable because it provides visual clues as to what each block corresponds to on the typical mental image that most people have for a Moore-type machine (input -> combinational logic -> flip flops -> output logic -> outputs), but personally I find that this just leads to more coding (or copy/pasting the case structure) with no significant benefit. I also use primarily Mealy-type machines, where the separation argument doesn't make much sense, because the outputs will also depend on the input. As a result the two blocks would end up mirroring each other very closely in a structural sense, and, without the correct set of tool options, the synthesizer could generate two copies of the control/condition signals for the relevant multiplexers, wasting resources. I haven't had the opportunity (read: desire) to explore possible differences in synthesis for the two approaches, so I cannot provide definitive advice on which style is better.

Finally, with these three (four?) rules in mind, I present a general Verilog state machine template based on the guidelines described above. I also realized that it was more useful to provide this as a copy/paste-able fill-in-the-blanks template, rather than one with dummy logic already present. Plus, the earlier vending machine implementations also exemplify this, just without the description of what goes where.

// Verilog state machine template
module state_proto(clk, reset_L, ...);
    input clk;
    input reset_L;
    // ... other inputs/outputs

    // State declarations
    `define STATE_EX_S0     2'b00
    `define STATE_EX_S1     2'b01

    // State variables
    reg [1:0] state;
    reg [1:0] nextState;
    // ... more state variables for embedded state machines

    // ... Internal signals/wires

    // Continuous time assignment for simple outputs and internal signals

    // Next state and output logic
    always @* begin
        nextState = state;
        // ... default values for outputs and other state machines

        case (state)
            `STATE_EX_S0 : // ... transition logic/output logic
            `STATE_EX_S1 : // ... transition logic/output logic
            // ... more states
        endcase
    end

    // Storage, combine all state machine storage, with synchronous clear
    always @(posedge clk) begin
        if (reset_L == 1'b0) begin
            state <= `STATE_EX_S0;
            // ... other state defaults
        end
        else begin
            state <= nextState;
            // ... flop in other storage unconditionally
        end
    end

    // Structural storage, instantiate things like shift registers, FIFOs, etc. Control signals
    // are generated above
endmodule

Hardware implementation is fundamentally different from software implementation. A lot of beginners are confused by the fact that hardware description languages such as Verilog resemble C or Python programs. Modules have the same basic ingredients: variables, conditional statements, and assignment. Verilog even allows a form of control flow, using blocking assignment inside always blocks, which frequently tempts people into familiar yet complicated techniques from software design.

Don't do it.

Last year I was a TA for an introductory digital design course, and a lot of students struggled once we started writing Verilog modules to implement complex communicating state machines or combinational processes that operate sequentially, like a single-cycle microprocessor. Most of them understood the concepts of digital design, but were thrown off by how Verilog works and how it is converted into actual hardware, especially with regard to distinguishing the combinational and sequential sections of circuits and writing appropriate code for each.

In software, functions and classes can take almost any form and provide a wide variety of behaviors. Entire books exist on different types of design patterns that should be used to simplify the task of implementing a set of classes to provide a specific interface or implementation tool. In (synchronous) digital hardware, however, there are two fundamental types of circuits: combinational circuits and sequential circuits. This simplifies the design process into identifying the appropriate type of circuit and then following the standard pattern for implementing it. The implementation, however, is usually clouded by the three distinct implementation styles possible with Verilog: structural, data flow, and behavioral. I will essentially ignore structural because it is equivalent to designing a circuit with a schematic, but in text form, so it avoids most of the complications of dealing with Verilog.

On an abstract level, combinational circuits are those whose outputs are a continuous time function of their inputs. Basic combinational circuits should be implemented using data flow style code, with continuous assignment statements using the assign keyword. This means that a simple 2:1 mux primitive could be implemented like so:

module mux21(a, b, s, c);
    input a;
    input b;
    input s;
    output c;

    assign  c = (s == 1'b1) ? b : a;
endmodule

More complicated combinational circuits should be implemented using a combination of the two styles: continuous assignment statements for simple logic, and behavioral statements for expressions that are most easily described with nested ifs or case statements. For instance, a 4-to-2 priority encoder is easily implemented using a casex statement:

module priority_encoder(in,  out);
    input [3:0] in;
    output [1:0] out;
    reg [1:0] out;

    always @* begin
        out = 2'b00;

        casex (in)
            4'b1xxx : out = 2'b11;
            4'b01xx : out = 2'b10;
            4'b001x : out = 2'b01;
            4'b0001 : out = 2'b00;
        endcase
    end
endmodule

A common problem with behavioral combinational circuits, however, is the accidental creation of transparent latches, where the output retains its previous value for one or more of the input cases. For the above priority encoder, if the default assignment out = 2'b00 at the top of the always block were not present, the synthesis tools would infer a latch, because if in is equal to 0000, then out is not driven with a new value, and the language specification states that it should retain its previous value. This is not only incorrect behavior but also wastes resources inside the device: a latch must be used, plus additional LUTs to implement the latch enable, on top of the actual logic function being implemented.

I propose instead the use of a simple template for all combinational circuits in order to build a hardware-specific mental approach. Declare all ports, internal wires, and variables at the top. Organize the file vertically the way one would normally organize a schematic from left to right on a piece of paper: each block uses the results of the block(s) above it. Avoid having a continuous time assignment use results that are calculated later by a behavioral block; create another continuous time assignment block AFTER the behavioral structure, to reinforce the signal path visually. Behavioral blocks themselves should be organized as a set of default values followed by a set of conditional statements that overwrite the values for specific input cases. Most importantly, try to organize the code so that it mirrors the circuit you are hoping to create: for decisions, provide both true and false assignment information (unless one is the default value) to resemble a 2:1 mux, and avoid using two disjoint if statements to assign a value, because that creates a situation where the semantics of the blocking assignment operator become very significant, which makes it difficult for others (or yourself, a few months later) to understand the original intentions of the code. Such if statements also obscure the parallelism inherent in the evaluation: while all of the if statements can be evaluated in parallel, some kind of additional glue logic must be invisibly added in order to arbitrate the final assignment to the enable signal. For instance, do not write code like this:

always @* begin
    enable = 1'b0;

    if (counter > 3'b100)
        enable = 1'b1;

    if (sload == 1'b1)
        enable = 1'b1;

    if (force_hold == 1'b1)
        enable = 1'b0;
end

This code obscures the true purpose of the force_hold signal by relying on the sequential evaluation of the always block to ensure that it overrides the counter and synchronous load signals. Understanding a complex circuit by reading the code once and forming an appropriate mental image can be tricky, so it is much more reasonable to nest if statements. Because this is hardware and not software, there are no branches and no additional penalties for having a large chain of if statements—if the two circuits are equivalent, then the same hardware should be generated in either case, but one makes the behavior more obvious. Nesting the control flow also makes the final physical tree-like mux implementation more apparent in the source code, allowing you to estimate the necessary resources more easily. Therefore, I would re-write the above block like this:

always @* begin
    enable = 1'b0;

    if (force_hold == 1'b0) begin
        if (counter > 3'b100 || sload == 1'b1)
            enable = 1'b1;
    end
end

This immediately makes it clear that the value of force_hold overrules the counter value and any synchronous load signals, so it would drive a 2:1 mux on the output, with the 0-value input taking its input from an earlier mux and the 1-value input taking the constant value of 0. The earlier mux would then be an optimized implementation of the conditional choice, with, for example, the counter condition as the control signal, the 1-value input as the constant 1, and the 0-value input as the synchronous load condition, implementing an OR function with a mux. The placement of the two conditions could also be swapped, but the synthesis tools will usually choose the correct assignment in order to optimize for the delay of each incoming signal.

From these examples, I can create a prototypical combinational Verilog circuit:

module proto(i0, i1, i2, o0, o1, o2);
    input i0, i1;
    input [5:0] i2;
    output o0, o1, o2;

    // Redefine outputs that will be driven inside an always block
    reg o1, o2;

    // Internal signals here
    wire p1;

    // Continuous time assignments come first for simple expressions like bus parity or
    // logic expressions with only a few gates
    assign p1 = ^i2;
    assign o0 = i0 | i1;

    // Then afterwards behavioral blocks specify more complex relationships
    always @* begin
        // First, default values for everything assigned in this always block
        o1 = 1'b0;
        o2 = 1'b0;

        // Then, either per output or combined if they share enough of the decision tree
        // have the actual logic for computing outputs
        if (i2 > 6'b110000) begin
            if (i0 == 1'b1)
                o1 = p1;
            if (i1 == 1'b1)
                o2 = p1;
        end
        else begin
            o1 = 1'b1;
            o2 = 1'b1;
        end     
    end
endmodule

Writing simple combinational circuits that reflect the type of circuit being created, without being verbose, helps with synthesis. Tools are better at optimizing large circuits than people are (just like compilers are better at optimizing large programs than people are), but they must first be able to understand the circuits. Sticking to simple constructs and suggestive organization makes it easier for the tools to produce the simplest circuit without accidentally creating additional logic because of unintended semantics. Some tools (such as Altera's Quartus or Xilinx's ISE) allow you to view the schematic that was produced from the Verilog that you compiled; I highly recommend doing this for complex designs, especially if timing becomes an issue, to determine if the synthesis tools are doing what you want, or if you need to massage the source code into a cleaner format.

Only once a decent style has been developed for approaching combinational circuits is it even possible to begin creating successful, complex sequential circuits, because every sequential circuit must contain at least one combinational circuit for calculating the next state and the circuit's output. Next time: building proper edge-triggered sequential circuits using Verilog.

After switching from tr-ircd to InspIRCd 2.0, a lot of users complained that there was no longer a usermode to hide their idle time in /whois. In order to remedy this, I chose a random unused usermode (+l) and implemented m_hideidle.so, which allows users to prevent idle time from being shown in /whois. Opers with priv users/auspex will still be able to see idle time, because it is important for opers with the correct privileges to be able to see information that normal users cannot.

If you plan on using this module, be sure to add information to your HELPOP scripts to tell users that mode +l is available and what it does.
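For completeness, loading it follows the usual third-party module procedure; after building, the configuration line is just the standard module tag (the name is assumed to match the file below):

<module name="m_hideidle.so">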

/******************************************************************************
 *    InspIRCd 2.0 - m_hideidle.so
 *    Greg Malysa, 2011.
 *
 *    This program is free software: you can redistribute it and/or modify
 *    it under the terms of the GNU General Public License as published by
 *    the Free Software Foundation, either version 3 of the License, or
 *    (at your option) any later version.
 *
 *    This program is distributed in the hope that it will be useful,
 *    but WITHOUT ANY WARRANTY; without even the implied warranty of
 *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *    GNU General Public License for more details.
 *
 *    You should have received a copy of the GNU General Public License
 *    along with this program.  If not, see <http://www.gnu.org/licenses/>.
 *
 ******************************************************************************
 *
 *    Third party module to provide usermode +l, which allows users to hide
 *    their idle time in /whois from normal users. Opers with users/auspex will
 *    always see idle times.
 *
 *****************************************************************************/

#include "inspircd.h"

// ModeHandler class for applying usermodes
class HideIdle : public ModeHandler {
public:
    HideIdle(Module *Creator) : ModeHandler(Creator, "hideidle", 'l', PARAM_NONE, MODETYPE_USER) {
        oper = false;
    }

    ModeAction OnModeChange(User *source, User *dest, Channel *channel, std::string &parameter, bool adding) {
        if (adding) {
            if (!dest->IsModeSet('l')) {
                dest->SetMode('l', true);
                return MODEACTION_ALLOW;
            }
        }
        else {
            if (dest->IsModeSet('l')) {
                dest->SetMode('l', false);
                return MODEACTION_ALLOW;
            }
        }
        return MODEACTION_DENY;
    }
};

// Module class interfaces with IRCD
class ModuleHideIdle : public Module {
    HideIdle hi;
public:
    ModuleHideIdle() : hi(this) {
        if (!ServerInstance->Modes->AddMode(&hi))
            throw ModuleException("Could not add +l (hide idle) usermode!");
        Implementation eventlist[] = {I_OnWhoisLine};
        ServerInstance->Modules->Attach(eventlist, this, 1);
    }
    virtual ~ModuleHideIdle() {}

    ModResult OnWhoisLine(User *user, User *dest, int &numeric, std::string &text) {
        // Check for the numeric about idle time and hide if necessary
        if (numeric == 317) {
            if (dest->IsModeSet('l') && !user->HasPrivPermission("users/auspex")) {
                return MOD_RES_DENY;
            }
        }

        // Default behavior is to show idle time
        return MOD_RES_PASSTHRU;
    }

    Version GetVersion() {
        return Version("Allows users to toggle their idle time being shown in /whois with usermode +l", 0);
    }
};

MODULE_INIT(ModuleHideIdle)

Cloud hosting differs from a normal unmanaged VPS in the way that server instances are started and run. Server data does not automatically persist between instances, and each instance has a unique IP address and default storage. With services like Amazon Web Services, it's possible to have persistent storage that is attached to an instance after booting, but the instance will always create a new root device based on the snapshot associated with the Amazon Machine Image (AMI) that was selected to run. Existing within the cloud also changes the way a server is accessed, because all traffic must pass through the Amazon firewall first. As a consequence, configuring a basic AWS instance is a bit different from a normal unmanaged VPS, so, as I recently set up my own server, I'll describe the entire process for creating a basic web hosting server to run a small blog (such as this one).

Launching an Instance

The AWS Management Console is much easier to use than the command line tools because it displays all of the options for you, so I would recommend using it when possible. As a new user, I did everything through the AWS Management Console except packaging an AMI from a snapshot, which can only be done with the command line tools. For the first instance you launch, if you plan on making use of the free usage tier (perfect for a small website, although you will likely run $1-2/month in storage costs for your image snapshots unless you decrease the size of your root device when you create your own AMI), select one of the two Basic Amazon Linux AMIs. After choosing an instance type, you should choose which availability zone to launch in. For the first instance you ever use, you can leave this as "No Preference," but if you plan on attaching persistent storage in the future, you must note which availability zones contain your persistent EBS Volumes and launch your server instance in the same zone. Snapshots, however, are available for launching AMIs in all zones. Currently, I'm running a 64-bit micro instance in the us-east-1b zone, because the micro instance is well suited to my needs: infrequent accesses (low traffic) with occasional usage spikes, where up to two virtual cores are available.

On the advanced options page, I chose to use all of the default settings. The AWS guidelines recommend turning on instance termination protection (which is free), but I didn't bother because the AMI defaults to a Stopped state on shutdown. This means that the EBS volume that was created for the instance will not be deleted, and the instance can be restarted without data loss. For the first instance, I planned on creating a snapshot to use for a custom AMI anyway, so I actually wanted the instance and its associated EBS Volume to disappear by default when I terminated it. Similarly, on the tags page, because I had not configured any special cloud-init scripts, I didn't need to tag my instances with anything, although I chose to add the domain name to the instance "Name" tag.

The first time I launched the instance, I generated a keypair, which is a relatively straightforward process. Because I am using Windows, after downloading the keypair, I had to import it for use with PuTTY, which is also very easy, using puttygen to convert the key. I simply loaded the keyfile that was downloaded from Amazon and then saved the private key in PuTTY's internal format. I also loaded the key into pageant so that I can use it with psftp or for tunneling with svn+ssh later on. The final system that I configured does not allow key-less authorization, as is recommended by Amazon, so it is necessary to expose the key to all of the local programs that I will need to use to communicate with my server.

Finally, for the firewall, I created a new policy. This firewall uses a whitelist-only approach where all traffic is blocked by default, so you can only specify which blocks of addresses may access specific ports on the machine. This is great as a basic firewall, because you can block all traffic to a port or allow only yourself to access the machine via SSH, but it is insufficient for conveniently blocking malicious addresses without either denying access to all of your users or adding a large number of security rules manually; for that you will still need to fall back on traditional approaches such as iptables. For restricting access to SSH, however, this works great, so initially I added rules for SSH to allow access only from my home machine, and I added additional rules to allow SMTP, HTTP, and HTTPS traffic from all addresses. Here, the source must be specified in CIDR notation, such as 107.20.240.83/32 or 0.0.0.0/0 for all incoming addresses.
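The same rules can also be added from the command line with the EC2 API tools, if you prefer (the security group name here is hypothetical):

C:\> ec2-authorize websrv -P tcp -p 22 -s 107.20.240.83/32
C:\> ec2-authorize websrv -P tcp -p 25 -s 0.0.0.0/0
C:\> ec2-authorize websrv -P tcp -p 80 -s 0.0.0.0/0
C:\> ec2-authorize websrv -P tcp -p 443 -s 0.0.0.0/0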

After the instance launched, I obtained an Elastic IP address so that I could use a common IP for all of my DNS records, and assigned it to the running instance. The public DNS name of each instance is assigned at boot time and will not be the same from run to run, even for a "Stopped" instance that is restarted. Using an Elastic IP that is associated with an instance after it boots is the only way to ensure that it has a specific IP address.

Installing Server Software

Now that my instance was running, I connected with PuTTY, using my private key to authenticate. The default user name on the Amazon Linux AMIs is ec2-user, which I felt like changing. So, I created for myself a user account, greg, with the permissions that I would normally expect to allow me to perform basic server administration tasks:

[ec2-user@thelonepole /]$ sudo useradd -g users -G wheel -m greg

Much to my surprise, after changing to my new account, I was unable to use sudo to perform any tasks. This is because, by default, on the Linux AMIs, the /etc/sudoers file does not have an entry for wheel, even though the group exists and ec2-user is a member. I chose to simply add an entry for my new account to allow me to use sudo without a password, in the same style as ec2-user was originally configured:

greg ALL = NOPASSWD: ALL

Now that my account was configured, it was time to install some basic server applications: Apache and MySQL, with PHP 5.3. Luckily, all of these applications are available properly compiled for a virtualized server from the built-in yum repository. I come from a Gentoo background, so the naming seemed a little bit strange here and there, as well as the separation of basic features into distinct "packages," rather than USE flags, but every distribution has to be different somehow (and, USE flags only really work when you're compiling from source). As root, use yum to install packages (or with sudo):

[root@thelonepole /]# yum install httpd

In total, to get a basic server with support for PHP 5.3, MySQL, and SSL connections, I had to install these packages: httpd, mod_ssl, mysql, mysql-server, php (which includes php-cli and php-common), php-gd, php-mcrypt, php-mysql, php-pdo, and php-mbstring. If you plan on using it, APC is also available as php-pecl-apc. Not all of the php modules are available, but presumably you can compile from source or from PECL if you need something that is unavailable.
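In one shot, that's:

[root@thelonepole /]# yum install httpd mod_ssl mysql mysql-server php php-gd php-mcrypt php-mysql php-pdo php-mbstring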

With the binaries installed, all of these services had to be configured. Luckily, for PHP, the default settings were pretty much perfect, as they disable register_globals and magic_quotes_gpc. The default memory limit is a bit high at 128M, but this shouldn't actually be a problem because it doesn't reserve this memory. A correct php.conf is added to the apache config directory, so PHP will be enabled by default next time you start the httpd service.

Next was apache. The default configuration structure for Apache on the Linux AMI is a little bit different from the Gentoo structure (which, of course, I think is more logical), so I had a few small hiccups while setting things up. All configuration is stored in /etc/httpd, with httpd.conf being stored in /etc/httpd/conf/ and further generic configuration files (which are automatically included) being stored in /etc/httpd/conf.d/. My DNS entries have multiple CNAMEs for the various services I host on my site, including one for webmail and one for this blog. In order to set this up from apache's perspective, I used the NameVirtualHost feature to enable vhosts, and then I configured all vhosts by name, NOT by IP address. This was very important because the Elastic IP address that is used for the server is not exposed to the server instance itself: with ifconfig it is easy to see that eth0 is assigned a private address behind a NAT. Therefore, although apache can bind to the Elastic IP, it'll never receive any traffic on it, and the vhosts won't work, so, due to the unpredictability of IP addresses for a server instance, it is most useful to bind to all incoming addresses and do all vhost configuration by name only. The end result was that my vhosts.conf file looks pretty standard, only with no IP-based vhosts:

NameVirtualHost *:80
NameVirtualHost *:443

# Default entry as fall-through, also handles www subdomain, matches main server config
<VirtualHost *:80>
    SSLEngine Off
    ServerAdmin greg@thelonepole.com
    DocumentRoot /var/www/html
    ServerName www.thelonepole.com
</VirtualHost>

<VirtualHost *:80>
    SSLEngine Off
    ServerAdmin greg@thelonepole.com
    DocumentRoot /var/www/blog
    ServerName blog.thelonepole.com
</VirtualHost>

I also set up several SSL-enabled vhosts to secure my webmail traffic, but I will talk about those another day, because their setup was almost completely routine and therefore not really a part of my experience with AWS. However, there is one important thing to mention about the default configuration file included in the mod_ssl package, which will cause a problem whether or not you plan on using SSL immediately. The default ssl.conf file includes an enabled VirtualHost entry that references several nonexistent keys. I am unsure why those keys are referenced there, but for some reason they are, so the best thing to do is either create the keys or delete the configuration. I chose the latter route because it was faster, so I removed the entire VirtualHost entry from ssl.conf. The rest of the file, which sets up global options for the SSL engine, is very useful and serves as a fine set of default settings.

With Apache successfully configured, this leaves only mysqld. Luckily, only the root password needs to be set from the command line, and the rest can be configured from a much more straightforward interface, such as phpmyadmin. Unfortunately, like the systemwide root password, it appears that the MySQL root password is scrambled during the mysql-server package install process. This created a huge headache for me at first, but the MySQL documentation includes useful information for bypassing the root password at start time and configuring the server automatically:

[root@thelonepole /]# mysqld --skip-grant-tables
[root@thelonepole /]# mysql_secure_installation

The secure installation script also allowed me to remove anonymous access and delete the test database during setup, saving me time later on. Then, after unpacking a phpmyadmin tarball into the webroot, it was easy to log in to MySQL as root and add additional users.
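Adding a user can of course also be done from the mysql command line client; a minimal example (the account name, password, and database here are hypothetical):

mysql> CREATE USER 'blog'@'localhost' IDENTIFIED BY 'some-strong-password';
mysql> GRANT ALL PRIVILEGES ON blog.* TO 'blog'@'localhost';
mysql> FLUSH PRIVILEGES;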

After all services were configured properly, I had to set them to start automatically. This is done on CentOS-like systems using the chkconfig tool, so to set mysqld and httpd to start in runlevels 3, 4, and 5 (if the system boots into runlevel 2, we are not going to get anywhere anyway, so it doesn't really matter), I only issued two commands:

[root@thelonepole /]# chkconfig --level 345 httpd on
[root@thelonepole /]# chkconfig --level 345 mysqld on

At this point, I started both Apache and MySQL with the service interface used on CentOS:

[root@thelonepole /]# service httpd start
[root@thelonepole /]# service mysqld start

and I was able to access both standard html files and phpmyadmin, as well as manage my databases. With the basic server configuration done, only one thing remained: preparing an AMI so that next time I launch an instance, I do not need to do any configuration.

Creating an AMI

AMIs are backed by snapshots of EBS volumes that were attached to running server instances. This means that an AMI cannot be created from an EBS Volume directly. In fact, you cannot create an AMI through the current AWS Management Console unless you first export a snapshot to Amazon S3, something that I have had no interest in doing. Assuming that you're able to set up the EC2 tools as described in the EC2 User Guide, it is very easy to create a new AMI from a snapshot of the EBS volume.

First, from the Management Console, I created a new snapshot of the volume attached to my running server instance from Volumes section of the EC2 management tab. When the snapshot was ready, I opened up my local command prompt and ran straight through the configuration process described in the setting up your tools section of the User Guide. There is one important omission from the User Guide, however: the JAVA_HOME environment variable must be set. On Windows it will usually look like this:

C:\> set JAVA_HOME=c:\program files\java\jre6

From here there is only one command required to create a new AMI from the snapshot:

C:\> ec2-register -n Image_Name -d Image_description --root-device-name /dev/sda1 -b /dev/sda1=[snapshot-id]:[size]

Note: Image_Name and Image_description cannot contain spaces. The Management Console doesn't seem to parse them properly anyway, so there isn't much reason to put details into the description in particular.

This creates a private AMI backed by EBS storage (creating one with instance storage is not recommended) that will start by creating an EBS volume from the given snapshot. Note that the snapshot-id is NOT its Name value (the Name field is just for human administrators), but actually the value listed in Snapshot-ID in the Management Console. The size is specified in GiB, so for a default instance based on the Amazon Basic Linux AMI it would be 8, but it can also be left blank, which will cause AWS to infer its value from the listed size of the snapshot (not disk usage). The size parameter can also be used to increase the size of a drive, which is great if you are nearing maximum capacity on your server root. If you have multiple snapshots that you need to attach at boot time, for instance if you mount /var or /home on a separate drive, then additional -b parameters should be given, such as -b /dev/sdf=[snapshot-id]. For some reason, Amazon recommends attaching secondary drives using /dev/sdf through /dev/sdp and NOT numbering partitions.
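For example, a register command for a root device plus one extra data drive might look like this (the snapshot IDs and size here are hypothetical placeholders):

C:\> ec2-register -n Image_Name -d Image_description --root-device-name /dev/sda1 -b /dev/sda1=snap-11111111:8 -b /dev/sdf=snap-22222222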

Personally, I use a multiple drive set up where I've separated the base operating system from my data, so most of /var/ is on a separate drive that is attached at boot time. By separating my OS from my configuration data, I can swap out the underlying system without having to migrate my data, so long as both systems are configured to look in the same place. One caveat of such a system is that because I use snapshots to generate EBS Volumes at boot, if I run two servers at once, I will get two separate drives, which can create an issue for synchronizing data. I feel like there are two ways to resolve this for multi-server systems, but I haven't really explored either option fully because I don't need to run two servers for performance. (1) Set up a server that has the "real" drives attached to it and then expose them as NFS volumes to the other servers, so that they immediately see each others' modifications and do not have write conflicts. (2) Set up dedicated servers for each service, as in a normal infrastructure situation: one for handling email, one (or more) for hosting content, and one master plus multiple slaves for handling databases, along with a load balancer for directing traffic. I think (1) is acceptable as a transition solution (the drive is shared but the servers are heterogeneous so that there are no database conflicts) or in specific situations (such as serving static content on multiple IP addresses) but would not scale or survive long in a high traffic environment. I think that (2) is more robust because it forces all data to be written in one place (so there aren't two separate databases trying to write to the same files) although it will be more expensive to run that many dedicated servers. I will revisit this in the future and explain how I've moved things like /var/www off of my main drive onto a data drive, as well as tricks for connecting to persistent EBS Volumes (not ones based off of snapshots) at boot time, using the cloud-init scripts.

This just about covers all of the basic system configuration that I had to do to get a small web server running using AWS. I also configured sendmail so that I could send and receive email, as well as an IMAP daemon (cyrus) for use with webmail (squirrelmail) and deployed self-signed SSL certificates to encrypt sensitive traffic (such as webmail and phpmyadmin). I also created my own (untrusted) CA certificate in order to sign certificates for use with client authentication, to further restrict access to the webmail and phpmyadmin services, but all of that configuration is beyond the scope of simply setting up an AWS server and will have to wait for another day.