Hardware implementation is fundamentally different from software implementation. Many beginners are misled by the fact that hardware description languages such as Verilog resemble C or Python programs. Modules have the same basic ingredients: variables, conditional statements, and assignment. Verilog even allows a form of control flow, using blocking assignments inside always blocks, which frequently tempts people into reusing familiar yet complicated techniques from software design.

Don't do it.

Last year I was a TA for an introductory digital design course, and a lot of students struggled once we started writing Verilog modules to implement complex communicating state machines or to build combinational processes that operate sequentially, like a single-cycle microprocessor. Most of them understood the concepts of digital design, but were thrown off by how Verilog works and how it is converted into actual hardware, especially with regard to distinguishing the combinational and sequential sections of a circuit and writing appropriate code for each.

In software, functions and classes can take almost any form and provide a wide variety of behaviors. Entire books exist on different types of design patterns that should be used to simplify the task of implementing a set of classes to provide a specific interface or implementation tool. In (synchronous) digital hardware, however, there are two fundamental types of circuits: combinational circuits and sequential circuits. This simplifies the design process into identifying the appropriate type of circuit and then following the standard pattern for implementing it. The implementation, however, is usually clouded by the three distinct implementation styles possible with Verilog: structural, data flow, and behavioral. I will essentially ignore structural because it is equivalent to designing a circuit with a schematic, but in text form, so it avoids most of the complications of dealing with Verilog.

On an abstract level, combinational circuits are those whose outputs are a continuous-time function of their current inputs. Basic combinational circuits should be implemented using data flow style code, with continuous assignment statements using the assign keyword. This means that a simple 2:1 mux primitive could be implemented like so:

module mux21(a, b, s, c);
    input a;
    input b;
    input s;
    output c;

    assign c = (s == 1'b1) ? b : a;
endmodule

More complicated combinational circuits should be implemented using a combination of the data flow and behavioral styles: continuous assignment statements for simple logic, and behavioral statements for expressions that are most easily described with nested ifs or case statements. For instance, a 4-to-2 priority encoder is easily implemented using a casex statement:

module priority_encoder(in, out);
    input [3:0] in;
    output [1:0] out;
    reg [1:0] out;

    always @* begin
        out = 2'b00;

        casex (in)
            4'b1xxx : out = 2'b11;
            4'b01xx : out = 2'b10;
            4'b001x : out = 2'b01;
            4'b0001 : out = 2'b00;
        endcase
    end
endmodule

A common problem with behavioral combinational circuits, however, is the accidental creation of transparent latches, where the output retains its previous value for one or more of the input cases. For the above priority encoder, if the statement at the top of the always block that assigns out its default value of 2'b00 were removed, the synthesis tools would infer a latch: when in is equal to 0000, no case item matches, out is not driven with a new value, and the language semantics state that it should retain its previous value. This is not only incorrect behavior but also wastes resources inside the device, since a latch must be used, along with additional LUTs to implement the latch enable, on top of the actual logic function being implemented.
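
To make the failure mode concrete, here is a sketch of the same encoder with the default assignment removed. The module name priority_encoder_latch is my own and exists only for illustration; whether and how a given tool reports the inferred latch varies.

```verilog
// Hypothetical variant of priority_encoder, for illustration only:
// the default assignment at the top of the always block is gone.
module priority_encoder_latch(in, out);
    input [3:0] in;
    output [1:0] out;
    reg [1:0] out;

    always @* begin
        // No default here. When in == 4'b0000 no case item matches,
        // out is never assigned, and it must hold its old value: a latch.
        casex (in)
            4'b1xxx : out = 2'b11;
            4'b01xx : out = 2'b10;
            4'b001x : out = 2'b01;
            4'b0001 : out = 2'b00;
        endcase
    end
endmodule
```

Adding a default item to the casex would close the gap just as well as the leading default assignment does; the point is only that every input case must assign out.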

I propose instead the use of a simple template for all combinational circuits in order to create a hardware-specific mental approach. Declare all ports, internal wires, and variables at the top. Organize the file vertically as one would normally organize a schematic from left to right on a piece of paper: each block uses the results of the block(s) above it. Avoid having a continuous assignment use results that are calculated later in the file by a behavioral block; instead, create another group of continuous assignments AFTER the behavioral block, to reinforce the signal path visually. Behavioral blocks themselves should be organized as a set of default values followed by a set of conditional statements that overwrite the values for specific input cases.

Most importantly, try to organize the code so that it mirrors the circuit you are hoping to create. For decisions, provide both true and false assignment information (unless one is the default value) so that the code resembles a 2:1 mux. Avoid using two disjoint if statements to assign the same signal, because that makes the semantics of the blocking assignment operator very significant, which makes it difficult for others (or yourself a few months later) to understand the original intentions of the code. Such if statements also obscure the parallelism inherent in the evaluation: while all of the conditions can be evaluated in parallel, additional glue logic must be invisibly added in order to arbitrate the final assignment to the signal. For instance, do not write code like this:

always @* begin
    enable = 1'b0;

    if (counter > 3'b100)
        enable = 1'b1;

    if (sload == 1'b1)
        enable = 1'b1;

    if (force_hold == 1'b1)
        enable = 1'b0;
end

This code obscures the true purpose of the force_hold signal by relying on the sequential evaluation of the always block to ensure that it overrides the counter and synchronous load signals. Understanding a complex circuit by reading the code once and forming an appropriate mental image can be tricky, so it is much more reasonable to nest if statements. Because this is hardware and not software, there are no branches and no additional penalties for having a large chain of if statements. If the two circuits are equivalent, then the same hardware should be generated in either case, but one version makes the behavior more obvious. Nesting the control flow also makes the final physical tree-like mux implementation more apparent in the source code, allowing you to estimate the necessary resources more easily. Therefore, I would rewrite the above block like this:

always @* begin
    enable = 1'b0;

    if (force_hold == 1'b0) begin
        if (counter > 3'b100 || sload == 1'b1)
            enable = 1'b1;
    end
end

This immediately makes it clear that the value of force_hold overrules the counter value and any synchronous load signals, so it would drive a 2:1 mux on the output, with the 0-value input taking its input from an earlier mux and the 1-value input taking the constant value of 0. The earlier mux would then be an optimized implementation of the conditional choice, with, for example, the counter condition as the control signal, the 1-value input as the constant 1, and the 0-value input as the synchronous load condition, implementing an OR function with a mux. The placement of the two conditions could also be swapped, but the synthesis tools will usually choose the correct assignment in order to optimize for the delay of each incoming signal.
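
The mux structure described above can be spelled out in data flow style as a rough sketch. The module name enable_mux and the intermediate wires count_hit and request are hypothetical names of mine, and the 3-bit counter width is an assumption taken from the 3'b100 literal.

```verilog
// Hypothetical wrapper module, for illustration only.
// The 3-bit counter width is assumed from the 3'b100 literal.
module enable_mux(counter, sload, force_hold, enable);
    input [2:0] counter;
    input sload;
    input force_hold;
    output enable;

    wire count_hit;
    wire request;

    assign count_hit = (counter > 3'b100);

    // Earlier mux: count_hit selects the constant 1, otherwise sload
    // passes through, which implements an OR with a mux
    assign request = count_hit ? 1'b1 : sload;

    // Output mux: force_hold selects the constant 0, otherwise the
    // result of the earlier mux passes through
    assign enable = force_hold ? 1'b0 : request;
endmodule
```

For known input values this computes the same function as the nested if version; it is written this way only to make the two muxes visible, not because it should synthesize any differently.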

From these examples, I can create a prototypical combinational Verilog circuit:

module proto(i0, i1, i2, o0, o1, o2);
    input i0, i1;
    input [5:0] i2;
    output o0, o1, o2;

    // Redeclare as reg the outputs that will be driven inside an always block
    reg o1, o2;

    // Internal signals here
    wire p1;

    // Continuous time assignments come first for simple expressions like bus parity or
    // logic expressions with only a few gates
    assign p1 = ^i2;
    assign o0 = i0 | i1;

    // Then afterwards behavioral blocks specify more complex relationships
    always @* begin
        // First, default values for everything assigned in this always block
        o1 = 1'b0;
        o2 = 1'b0;

        // Then, either per output or combined if they share enough of the decision tree
        // have the actual logic for computing outputs
        if (i2 > 6'b110000) begin
            if (i0 == 1'b1)
                o1 = p1;
            if (i1 == 1'b1)
                o2 = p1;
        end
        else begin
            o1 = 1'b1;
            o2 = 1'b1;
        end
    end
endmodule
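
One part of the template that the prototype does not exercise is a continuous assignment that consumes a result computed by a behavioral block. A minimal sketch of that ordering follows; the module name proto_post and the internal signal sel are hypothetical and chosen only for this example.

```verilog
// Hypothetical module and signal names, for illustration only.
module proto_post(i0, i1, i2, o0);
    input i0, i1;
    input [5:0] i2;
    output o0;

    wire p1;    // computed before the behavioral block
    reg sel;    // computed inside the behavioral block

    // Stage 1: simple logic that feeds the behavioral block
    assign p1 = ^i2;

    // Stage 2: behavioral block, default value first, then the override
    always @* begin
        sel = 1'b0;

        if (i2 > 6'b110000)
            sel = i0;
    end

    // Stage 3: continuous assignment that uses the behavioral result,
    // placed after the block so the file reads top to bottom in the
    // same order as the signal path
    assign o0 = sel ? p1 : i1;
endmodule
```

The simulator and the synthesis tools do not care about the order of these three pieces; the ordering is purely for the reader.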

Writing simple combinational circuits that reflect the type of circuit being created, without being verbose, helps with synthesis. Tools are better at optimizing large circuits than people are (just like compilers are better at optimizing large programs than people are), but they must first be able to understand the circuits. Sticking to simple constructs and suggestive organization makes it easier for the tools to produce the simplest circuit without accidentally creating additional logic because of unintended semantics. Some tools (such as Altera's Quartus or Xilinx's ISE) allow you to view the schematic that was produced from the Verilog that you compiled; I highly recommend doing this for complex designs, especially if timing becomes an issue, to determine if the synthesis tools are doing what you want, or if you need to massage the source code into a cleaner format.

Only once a decent style has been developed for approaching combinational circuits is it even possible to begin creating successful, complex sequential circuits, because every sequential circuit must contain at least one combinational circuit for calculating the next state and the circuit's output. Next time: building proper edge-triggered sequential circuits using Verilog.