Wednesday, November 25, 2020

A Comparison of Formal and Simulation for a Simple, Yet Non-Trivial Design - Part 1

I've talked a lot about constrained random verification on the blog, but now it's time to branch out to formal verification. As a fun first post on the topic, I thought we could do a comparative study. We can take a design and write two verification environments for it, one using formal verification and the other using simulation, based on UVM. Once we're done, it should be very interesting to be able to look at them side-by-side and to do an analysis.

I thought long and hard about which one of the verification environments should come first. While it might have made more sense to start with simulation, as this is what is probably most familiar to many of you, excitement got the better of me and I decided to go with formal verification.

Description of the Design Under Test

Let's have a look at the design we'll be verifying. My goal was to keep the design as simple as possible, so that it doesn't require a lot of effort to understand, but for it to have a few interesting quirks that make verifying it (especially using simulation) a bit more complicated.

Our design under test (DUT) will be a device that can process shapes by performing various operations on them. It can process circles, rectangles and triangles. As operations, it can compute the perimeter or the area of the shape. For rectangles, it can also figure out if a given one is a square. We can only perform this query on a rectangle, though, so it won't be possible to ask, for example, whether a circle is a square. It can also figure out whether a triangle is equilateral or whether it is isosceles. As before, these queries can only be performed on triangles.

Our design will have a single special function register (SFR), called CTRL. The CTRL register will be used to configure the type of shape to process and the operation to perform on it. It will have the following layout:

CTRL register layout generated using https://github.com/wavedrom/bitfield

The contents of the SHAPE field are one-hot encoded:

Value Shape
'b001 CIRCLE
'b010 RECTANGLE
'b100 TRIANGLE

The coding of the OPERATION field is a bit more complicated:

Value Operation
'b000_0000 PERIMETER
'b000_0001 AREA
'b010_0000 IS_SQUARE
'b100_0000 IS_EQUILATERAL
'b100_0001 IS_ISOSCELES

The uppermost three bits define which shapes can be combined with the operation. Operations that contain 000 in these bits can be used with any shape. In all other cases, these bits contain the code of the shape on which the operation can be performed. For example, IS_SQUARE starts with 'b010, the code for RECTANGLE.

Writing a reserved value for any of the fields will cause the write to be ignored, in which case the SFR will keep its value. Writing an illegal combination of SHAPE and OPERATION values will also cause the write to be ignored.

To make things more interesting, the DUT provides a way of changing the operation without changing the shape. This is done by writing the special value 111 to the SHAPE field, together with the new value of OPERATION. We'll call this special value KEEP_SHAPE. Having this mechanism makes it easier for software to only update the OPERATION field, without having to do a read-modify-write of the CTRL register. The resulting value of the SFR after this write operation has to contain a legal combination, otherwise the write will be ignored. For example, it is allowed to write KEEP_SHAPE together with IS_ISOSCELES if CTRL.SHAPE is TRIANGLE, because it would result in (TRIANGLE, IS_ISOSCELES), which is a legal combination. A write of KEEP_SHAPE together with IS_SQUARE would be ignored in the previous situation, as this would result in (TRIANGLE, IS_SQUARE), which is an illegal combination.

A similar mechanism exists for the OPERATION field, where the special value 111_1111, called KEEP_OPERATION, behaves like KEEP_SHAPE.

A real design would need some more SFRs to configure the shapes and to read out the results. For simplicity, we'll skip those and focus our verification efforts on the CTRL register.

Our design will have a simple bus interface:

module shape_processor(
    input bit rst_n,
    input bit clk,

    input bit write,
    input bit [31:0] write_data,

    input bit read,
    output bit [31:0] read_data
    );

  // ...

endmodule

I didn't include an address signal to keep things simple. All bus operations will always target the CTRL register.

You can find the full code here, including the verification environment, if you want to dig deeper into it or to experiment with it yourself.

Formal Verification

Before we start writing anything, let's stop for a second and think about what we want to achieve. It's true that we could write a few assertions about what the design is supposed to do whenever this or that value is written. This would mean focusing on the implementation.

What is the most important thing that the design requirements try to ensure? All of them are there to make sure that the DUT is always in a legal configuration, where it can properly perform its function. Requirements like "writes with reserved values are ignored" or "writes with illegal combinations are ignored" merely serve to support this core requirement. The cool thing about formal verification is that we can focus on these core requirements.

To check these core requirements, we'll use a gray-box verification approach. We'll look directly at the internal DUT SFR. This way we can start writing properties without having to write any kind of infrastructure for the bus interface.

Of course, we could create a separate model of the DUT SFRs and write our properties using that. We could update it based on the bus operations that execute on the interface, similarly to how we would do it in a UVM-based testbench. This would not only require extra effort to implement the model and the monitoring, but it would increase the state space, making it more difficult for the tool to achieve proofs.

I was previously against any kind of white-box verification at block level, but I have since softened my stance, including when doing simulation. I now find it to be an acceptable trade-off, as long as it is done responsibly and in moderation. It should also only be done on parts of the design that are architecturally "visible" and that are relatively stable.

When accessing the values of the DUT's internal CTRL register, we should try to protect ourselves from implementation-specific aspects that might change unexpectedly and cause a lot of rework on our side. For example, the designer might choose to define a single 32-bit vector, where the bit positions for the fields match those in the specification. Or they might decide to have two separate signals, one for each field. There are many possibilities to choose from.

We can define our "references" to the internal signals using the following let statements:

let shape_in_sfr = shape_processor.ctrl_sfr.shape;
let operation_in_sfr = shape_processor.ctrl_sfr.operation;

Our design currently uses a struct for the CTRL register, but if this were to ever change, we would only need to update these let statements.

To make our code more readable, we should model the encodings for the SFR fields using enums:

typedef enum bit [2:0] {
  CIRCLE = 'b001,
  RECTANGLE = 'b010,
  TRIANGLE = 'b100,
  KEEP_SHAPE = '1
} shape_e;

typedef enum bit [6:0] {
  PERIMETER = 'b000_0000,
  AREA = 'b000_0001,
  IS_SQUARE = 'b010_0000,
  IS_EQUILATERAL = 'b100_0000,
  IS_ISOSCELES = 'b100_0001,
  KEEP_OPERATION = '1
} operation_e;
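
One bit of plumbing is worth spelling out: the concurrent assertions and covers below need a clock and a reset to be evaluated against. Here's a minimal sketch of how the properties could be hosted, assuming they live in a separate module that is bound into the DUT (the module and instance names are my own, not taken from the actual code):

module shape_processor_props(input bit clk, input bit rst_n);

  // Sample all properties on the rising clock edge and ignore them while in reset.
  default clocking @(posedge clk); endclocking
  default disable iff (!rst_n);

  // ... let statements, helper functions, assertions and covers go here ...

endmodule

// Attach the property module to every instance of the DUT.
bind shape_processor shape_processor_props props(.clk(clk), .rst_n(rst_n));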

With the infrastructure laid out, it's time to get to work! The most basic check we can do is to ensure that the values of the CTRL SFR fields will never have any of the reserved values. For the SHAPE field we would have the following assertion:

no_reserved_shapes_in_sfr: assert property (
    !is_reserved_shape(shape_in_sfr));

The is_reserved_shape(...) function will return 1 when it is supplied a value that doesn't match any of the values defined in the shape_e type:

function bit is_reserved_shape(bit [2:0] val);
  return !(val inside { KEEP_SHAPE, CIRCLE, RECTANGLE, TRIANGLE });
endfunction

(Ideally, we would have used a $cast(...) to check whether the bit vector argument can be converted to the desired enum type, but I couldn't get this to work.)

If the DUT did not properly validate the value of the SHAPE field before writing it, the assertion above would catch it.

We would also need a similar assertion for the OPERATION field:

no_reserved_operations_in_sfr: assert property (
    !is_reserved_operation(operation_in_sfr));
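
The is_reserved_operation(...) function isn't shown above; it would simply mirror its shape counterpart (a sketch based on the operation_e literals):

function bit is_reserved_operation(bit [6:0] val);
  return !(val inside { KEEP_OPERATION, PERIMETER, AREA, IS_SQUARE,
      IS_EQUILATERAL, IS_ISOSCELES });
endfunction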

While KEEP_SHAPE and KEEP_OPERATION are defined write values for the CTRL fields, it is not possible for the SFR to contain any of these values. We can check that the DUT doesn't latch KEEP_SHAPE by mistake using the following assertion:

no_keep_shape_in_sfr: assert property (
    shape_in_sfr != KEEP_SHAPE);

Also, for KEEP_OPERATION we would have:

no_keep_operation_in_sfr: assert property (
    operation_in_sfr != KEEP_OPERATION);

We also want to ensure that we only ever see legal combinations in the CTRL SFR. For this, we need an extra assertion:

only_legal_combinations_in_sfr: assert property (
    is_legal_combination(shape_in_sfr, operation_in_sfr));

The is_legal_combination(...) function models the rules that govern which SHAPES can be combined with which OPERATIONS:

function bit is_legal_combination(shape_e shape, operation_e operation);
  if (operation inside { PERIMETER, AREA })
    return 1;
  if (operation == IS_SQUARE)
    return shape == RECTANGLE;
  if (operation inside { IS_EQUILATERAL, IS_ISOSCELES })
    return shape == TRIANGLE;
  return 0;
endfunction

Ideally such modeling code should not be a one-to-one copy of the RTL code. Where the RTL compares bits, the modeling code uses enum values, which also makes it very readable.

It's not the task of the is_legal_combination(...) function to model the behavior of KEEP_SHAPE or of KEEP_OPERATION. It should never be called with any of these values. The fact that it returns 0 if any of these values is passed as an argument is an implementation artifact. Normally, in simulation, I would trigger a fatal error in such cases, to make it clear to the caller that it's doing something wrong. Sadly, I couldn't get the tool to do this.
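
For reference, in simulation the guard would look something like this (a sketch only; as mentioned, I couldn't get the formal tool to accept it):

function bit is_legal_combination(shape_e shape, operation_e operation);
  // Catch misuse early: callers must resolve KEEP_* values before calling this function.
  if (shape == KEEP_SHAPE || operation == KEEP_OPERATION)
    $fatal(0, "is_legal_combination(...) called with a KEEP_* value");
  // ... same body as above ...
endfunction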

The assertions we wrote up to now will prove that our DUT can never be put in a faulty configuration. Unfortunately, they will also hold for a DUT that completely ignores writes (assuming, of course, that it comes out of reset with a legal combination of fields). This is because we haven't made it a requirement for the DUT to update its fields based on the write data presented to it. We only said that should it update its fields, the resulting values must satisfy the assertions we wrote.

A very cheap way of making sure that the DUT updates its fields is to write a cover property that proves that each (proper) value of the SHAPE and OPERATION fields can be stored in the SFR. We'll use the term "proper" to refer to all modes other than the two KEEP_* modes.

We can check that a CIRCLE can be seen in the SHAPE field using the following cover:

circle_in_sfr: cover property (
    shape_in_sfr == CIRCLE);

This won't bring us much if CIRCLE is the reset value of the field. Simply coming out of reset will be enough to show us a trace. We should also check that we can see a RECTANGLE or a TRIANGLE:

rectangle_in_sfr: cover property (
    shape_in_sfr == RECTANGLE);

triangle_in_sfr: cover property (
    shape_in_sfr == TRIANGLE);

The DUT will have to be able to update the field, otherwise we won't get traces for all of the properties.

We're assuming that the CTRL register can be reset, otherwise the tool could just show us the DUT starting up with any SHAPE value, which would make the covers trivially reachable. The starting value could also be a reserved value, though, which would be caught by our assertions. Indirectly, we are ensuring that the SFR is reset. Moreover, if the SFR did not have a reset, the tool would generally warn us about it.

We should give the OPERATION field the same treatment to make sure that all its proper values are reachable. Writing a cover for each possible operation, like we did for shapes, could get pretty tedious, though. To save typing we could use a generate statement and have the tool do the looping for us:

for (genvar operation = 0; operation < 2**$bits(operation_e); operation++) begin : operation_covers
  if (operation inside { PERIMETER, AREA, IS_SQUARE, IS_EQUILATERAL, IS_ISOSCELES }) begin
    operation_in_sfr_seen: cover property (
        operation_in_sfr == operation);
  end
end

The code suffers from some SystemVerilog limitations. Genvars are integers, so the names of the generated blocks won't be very suggestive because they'll contain the numeric values of the enum literals, not the literals themselves. It's also not possible to loop over enums directly, which makes the code longer and invites duplication. (The SystemVerilog syntax for looping over enums is not that great anyway, but at least it can handle changes in the enum type.)
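
For reference, the procedural enum iteration idiom alluded to above looks roughly like this (a sketch; it can't replace the generate loop, since generates need elaboration-time constants):

initial begin
  // Walk through all values of operation_e, whatever they happen to be.
  operation_e op = op.first();
  forever begin
    // ... use op ...
    if (op == op.last()) break;
    op = op.next();
  end
end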

The assertions that we wrote up to now for the internal SFR values, together with the covers, give us a very solid base. They provide enough guidance to implement the design properly and to avoid many non-trivial bugs. If we had high confidence in our designer and a good testing setup at the integration level we could probably stop here. If we were to go for the gold and verify the design completely we would still need a few more properties.

Since we're doing gray-box verification, we should also check that the SFR contents are delivered on the read data bus when a bus read is performed.

When working with the bus data, it would be really annoying to have to fiddle with bit positions to figure out what shape or operation we are getting as a result of the read. To help us extract the fields, we can define a struct with the same layout as the CTRL register:

typedef struct packed {
  bit [12:0] reserved1;
  bit [2:0] SHAPE;
  bit [8:0] reserved0;
  bit [6:0] OPERATION;
} ctrl_sfr_reg;

We can convert the read data to this type, for easy access to the fields using the . operator:

ctrl_sfr_reg read_data_as_ctrl_sfr;
assign read_data_as_ctrl_sfr = read_data;

To make the code that uses them shorter, we can define some let statements for the read values of the fields, like we did for the internal SFR:

let shape_on_read_bus = read_data_as_ctrl_sfr.SHAPE;
let operation_on_read_bus = read_data_as_ctrl_sfr.OPERATION;

Now we can write an assertion for each of the fields:

shape_delivered_as_read_data: assert property (
    read |-> shape_on_read_bus == shape_in_sfr);

operation_delivered_as_read_data: assert property (
    read |-> operation_on_read_bus == operation_in_sfr);

We haven't said anything about the behavior of the KEEP_SHAPE mode. For this, we'll have to look at the data being written on the bus.

Before we look at the write data, though, we can write another assertion. We can check that the SFR is only ever updated as a result of a bus write:

sfr_constant_if_no_write: assert property (
    !write |=> $stable(shape_processor.ctrl_sfr));

If the DUT were to change the value of the SFR when, for example, no access is taking place, this assertion would fail. This is a very powerful check, which is also very simple to write. It's pretty typical to find bugs where the RTL code doesn't properly sample the bus control signals.

We can do the same trick for the write data that we did with the read data, in order to get the values of the fields:

ctrl_sfr_reg write_data_as_ctrl_sfr;
assign write_data_as_ctrl_sfr = write_data;

let shape_on_write_bus = write_data_as_ctrl_sfr.SHAPE;
let operation_on_write_bus = write_data_as_ctrl_sfr.OPERATION;

Now we can check that when we try to write KEEP_SHAPE, the value of the DUT's SFR doesn't change:

shape_constant_if_keep_shape_write: assert property (
    write && shape_on_write_bus == KEEP_SHAPE |=>
        $stable(shape_in_sfr));

We can do the same for KEEP_OPERATION:

operation_constant_if_keep_operation_write: assert property (
    write && operation_on_write_bus == KEEP_OPERATION  |=>
        $stable(operation_in_sfr));

The previous assertions don't say anything about what should happen with the other field whenever a KEEP_* is written. For example, a DUT that completely ignores a write that contains KEEP_SHAPE and keeps the value of OPERATION as well will also satisfy the assertions. We can write a cover property to see that the DUT can update the value of OPERATION in such cases:

operation_updated_when_keep_shape: cover property (
    write && shape_on_write_bus == KEEP_SHAPE ##1 $changed(operation_in_sfr));

This cover will be unreachable if the DUT completely ignores writes with KEEP_SHAPE.

Naturally, we can write a similar cover property for KEEP_OPERATION:

shape_updated_when_keep_operation: cover property (
    write && operation_on_write_bus == KEEP_OPERATION ##1 $changed(shape_in_sfr));

The properties we have don't fully specify what should happen whenever we try to write a reserved value to one of the fields. A DUT that treats a write with a reserved SHAPE value the same way it treats a KEEP_SHAPE will also pass all checks. To solve this, we can add the following assertions:

sfr_constant_if_reserved_shape_write: assert property (
    write && is_reserved_shape(shape_on_write_bus) |=>
        $stable(shape_processor.ctrl_sfr));

sfr_constant_if_reserved_operation_write: assert property (
    write && is_reserved_operation(operation_on_write_bus) |=>
        $stable(shape_processor.ctrl_sfr));

With these assertions, we make sure that writes that contain reserved values are completely ignored.

It's becoming more and more difficult to find a DUT that fulfills these properties, but doesn't fulfill the specification. The kinds of faulty implementations that these properties won't catch are starting to look like intentional mistakes. We've reached another point where we could probably just stop, but still be able to get a good night's sleep, knowing that the design is in very good shape. We could also write a few more properties and leave no stone unturned where bugs could hide.

Let's recall that whenever an illegal combination of SHAPE and OPERATION values is written, the DUT is supposed to ignore it. If, however, the DUT were to write some default combination into the SFR in such a case, for example the reset value of (CIRCLE, PERIMETER), then none of the properties we have would fail. This doesn't sound like the kind of mistake a designer would make when building the design from scratch. It might be an issue if a previous version of the specification used to require this behavior, in which case it would be possible to overlook the new requirement.

Writing a single assertion for this might get a bit more complicated because of the KEEP_* values, but we can split the problem into multiple cases.

First, we should check that illegal combinations of proper modes are ignored:

sfr_constant_if_illegal_combination_write_of_proper_modes: assert property (
    write && shape_on_write_bus != KEEP_SHAPE && operation_on_write_bus != KEEP_OPERATION
        && !is_legal_combination(shape_on_write_bus, operation_on_write_bus) |=>
            $stable(shape_processor.ctrl_sfr));

The fact that we already have the is_legal_combination(...) function, which can operate on proper modes, will nudge us into thinking about this behavior as a case of its own.

The next case we have is when we are writing KEEP_SHAPE, but the resulting combination would be illegal. This can also be implemented using the is_legal_combination(...) function, but instead of taking the bus value for the shape, we take the value in the SFR:

sfr_constant_if_illegal_combination_write_of_keep_shape: assert property (
    write && shape_on_write_bus == KEEP_SHAPE
        && !is_legal_combination(shape_in_sfr, operation_on_write_bus) |=>
            $stable(shape_processor.ctrl_sfr));

Of course, we have to do the same for KEEP_OPERATION:

sfr_constant_if_illegal_combination_write_of_keep_operation: assert property (
    write && operation_on_write_bus == KEEP_OPERATION
        && !is_legal_combination(shape_on_write_bus, operation_in_sfr) |=>
            $stable(shape_processor.ctrl_sfr));

Now we can prove that the DUT always ignores illegal writes, regardless of the reason (either reserved value or illegal combination), as we have properties for each of the cases.

I'm having a really difficult time coming up with a faulty implementation that still passes the current set of properties. The best I can come up with at this point is a design that somehow writes a "twisted" version of the data given to it: instead of writing CIRCLE, it writes RECTANGLE and vice-versa, but only when also trying to write PERIMETER (otherwise we might get an illegal combination in the SFR).

For this design it maybe isn't that important to check that writes properly update the SFR fields, as there isn't that much room for error. For a design where one of the shapes can be disabled via a separate SFR bit, I could see it becoming very important. Let's imagine, for example, that the CIRCLE shape can be disabled when the pi computation unit is powered off. If this design also had a bug where it disabled the TRIANGLE shape as well, we would not be able to find it if we only cover that the SHAPE field can be TRIANGLE. This is because the tool could always show us such traces with the pi computation unit turned on.

To weed out any remaining pathological design behaviors we can write one more pair of assertions: that legal writes, that are supposed to update the SFR, do so properly for each of the fields. Here's what the assertion would look like for SHAPE:

legal_write_data_written_to_shape: assert property (
    write && is_legal_ctrl_write_data() |=>
        shape_in_sfr == $past(shape_on_write_bus));

While we were able to split the cases where the write has to be ignored across multiple assertions, here we have to capture everything in a single assertion. We have to model what it means for the write data to be legal, and this means describing all the properties required of it. None of the fields are allowed to contain reserved values and the combination of fields has to be legal:

function bit is_legal_ctrl_write_data();
  return !is_reserved_shape(shape_on_write_bus)
      && !is_reserved_operation(operation_on_write_bus)
      && is_legal_ctrl_write_data_combination();
endfunction

The combination being legal is defined differently, depending on whether we are dealing with a KEEP_* or with a proper mode:

function bit is_legal_ctrl_write_data_combination();
  if (shape_on_write_bus == KEEP_SHAPE)
    return is_legal_combination(shape_in_sfr, operation_on_write_bus);
  if (operation_on_write_bus == KEEP_OPERATION)
    return is_legal_combination(shape_on_write_bus, operation_in_sfr);
  return is_legal_combination(shape_on_write_bus, operation_on_write_bus);
endfunction

If we try to prove this assertion, we'll notice that it fails: the tool will show us a trace where KEEP_SHAPE is written, a value which can never end up in the SFR. This is because KEEP_SHAPE is not a value that leads to an update of the SHAPE field. We have to exclude this case and then the assertion will pass:

legal_write_data_written_to_shape: assert property (
    write && is_legal_ctrl_write_data() && shape_on_write_bus != KEEP_SHAPE |=>
        shape_in_sfr == $past(shape_on_write_bus));

Now imagine that the design had contained the same kind of mistake and latched KEEP_SHAPE into the SFR. We could have let it slip through, since this assertion would have passed. Fortunately, even if this were the case, we still have the assertions on the internal SFR, which disallow KEEP_SHAPE as a value. This is the power of thinking about the core requirements of the design!

It's not that we wouldn't be able to do something similar in simulation. We could very well write checks on the internal SFR, alongside our register modeling which describes which writes are legal and which are not. It's just that we probably wouldn't, because our thinking is very focused on describing cause/effect relationships. We would probably stop after finishing with our register modeling and move on to the next thing.

For completeness, we also need a property for the OPERATION field:

legal_write_data_written_to_operation: assert property (
    write && is_legal_ctrl_write_data() && operation_on_write_bus != KEEP_OPERATION |=>
        operation_in_sfr == $past(operation_on_write_bus));

Before we end, I would like to revisit illegal writes. Now that we have an is_legal_ctrl_write_data() function, we could use it to write a single assertion that handles all the cases of illegal writes that are supposed to be ignored:

sfr_constant_if_illegal_write: assert property (
    write && !is_legal_ctrl_write_data() |=>
        $stable(shape_processor.ctrl_sfr));

This could replace the other assertions that we have that handle each illegal case separately.

Depending on the situation, it might be more advantageous to have either a single assertion for a requirement or to split it into multiple assertions. Multiple assertions are easier to debug and might be easier for the tool to prove (at least for big designs), because we are asking it slightly easier questions. A single assertion has the advantage of requiring less code, and changes to the requirements are easier to handle when there are fewer places to update.

For this design I would go with the single property.

Conclusions

We've seen how formal verification nudges us to think about requirements, not about how to implement a testbench that is a faithful representation of the design.

We made our properties very readable and modular, which allowed us to tackle the verification task in incremental chunks. We should strive to avoid replicating the design and to make the properties read like sentences in the specification. In the end, our property checking environment doesn't have many more lines of code than the design itself.

We also saw that formal verification doesn't have to be an "all or nothing" proposition. Depending on our level of risk aversion we could decide to live with only a sub-set of checks, but still get high design quality. We can focus on the things that are truly important or difficult and not spend too much effort on the parts that are unlikely to have bugs.

Stay tuned for part 2, where we'll build a UVM testbench for the same design.

Sunday, March 29, 2020

Favor Composition Over Inheritance - Even for Constraints

Simulation is currently the dominant functional verification technique, with constrained random verification the most widely used methodology. While producing random data is a big part of it, letting the solver blindly generate stimulus isn't going to be very efficient. Constraints are needed to guide the stimulus toward interesting scenarios.

A good constrained random test suite contains a mixture of tests with varying degrees of randomness. This is achieved by progressively adding constraints to tests to reduce their randomness. This is best explained with an example.

Let's assume we're verifying a device that can handle read and write accesses to locations in its address map. These accesses can either be done in secure mode or in non-secure mode. We model an access using a UVM sequence item:

class sequence_item extends uvm_sequence_item;

  typedef enum {
    READ,
    WRITE
  } direction_e;

  typedef enum {
    SECURE,
    NONSECURE
  } sec_mode_e;

  rand direction_e direction;
  rand bit [31:0] address;
  rand sec_mode_e sec_mode;

  // ...
endclass

Not all accesses are legal and illegal accesses would be rejected by the device.

Only certain address ranges are mapped, while accesses to unmapped addresses are illegal. If we were to write a test that only accesses mapped addresses, we would have to add the following constraints to generated items:

constraint only_mapped_addresses {
  address inside {
      [CODE_START_ADDR:CODE_END_ADDR],
      [SRAM_START_ADDR:SRAM_END_ADDR],
      [PERIPHERAL_START_ADDR:PERIPHERAL_END_ADDR] };
}

Our device also only allows writes to aligned addresses. For a 32-bit bus, this would mean that the lowest two address bits have to be 0:

constraint only_writes_to_aligned_addresses {
  direction == WRITE;
  address[1:0] == 0;
}

Lastly, certain ranges of our device's address map are restricted to secure code. Let's assume that the address map is split into 16 regions of 256 MB each. Within each of these regions, the lower half is reserved for secure accesses. This means that bit 27 of the address is always 0 for a secure access:

constraint only_secure_accesses_to_lower_half_of_range {
  sec_mode == SECURE;
  address[27] == 0;
}

The test suite for this device would contain a random test where unconstrained items are generated. One test would be directed toward generating accesses to mapped addresses, another test would only perform writes to aligned addresses, while another test would perform only secure accesses. At the same time, we would also need tests that lie at the intersection of the three features, so we would want tests that do pairwise combinations: aligned writes to mapped addresses, aligned writes in secure mode and secure access to mapped addresses. Finally, we also need a test that combines all three and only does secure writes to mapped addresses.

(While a real test suite would definitely need a lot more classes of tests, this post isn't focused on verification planning, but on the mechanical aspects of implementing a robust constraint management strategy, so please ignore the simplicity of the example.)

It might be the case that these behaviors will get tweaked over time as the project moves forward or as a new derivative of the device is developed. The address map might change as regions are moved, removed or resized, or new regions are added. The bus width might change, which would change which addresses are aligned, or we could get a feature request to implement writes of other granularities (e.g. half-word). The definition of secure regions could also change or they could become configurable via special function registers. Any of these changes should be easy to handle and shouldn't require massive changes to the verification code.

Let's skip the obvious idea of putting all constraints into the sequence item class and activating/deactivating them selectively based on the test. This won't scale for real projects, where we would have many more constraints, which would make the code unreadable.
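
(For completeness, that skipped approach would boil down to toggling constraint_mode(...) per test, roughly like this:)

// In the skipped approach, all constraints would live inside sequence_item and
// each test would switch off the ones it doesn't want.
sequence_item item = new();
item.only_mapped_addresses.constraint_mode(0);
item.only_writes_to_aligned_addresses.constraint_mode(0);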

Using mixins

Quite a while back I wrote about how to use mixins to manage constraints.

The mixin approach is flexible, because it allows us to handle each aspect individually. Instead of having all constraints in a single class, we can have one mixin for each constrained feature.

We need one for mapped addresses:

class only_mapped_addresses_mixin #(type T = sequence_item) extends T;

  constraint only_mapped_addresses {
    address inside {
        [CODE_START_ADDR:CODE_END_ADDR],
        [SRAM_START_ADDR:SRAM_END_ADDR],
        [PERIPHERAL_START_ADDR:PERIPHERAL_END_ADDR] };
  }

  // ...
endclass

We also need one for writes:

class only_legal_writes_mixin #(type T = sequence_item) extends T;

  constraint only_writes_to_aligned_addresses {
    direction == WRITE;
    address[1:0] == 0;
  }

  // ...
endclass

Finally, we need a mixin for secure accesses:

class only_legal_secure_accesses_mixin #(type T = sequence_item) extends T;

  constraint only_secure_accesses_to_lower_half_of_range {
    sec_mode == SECURE;
    address[27] == 0;
  }

  // ...
endclass

Assuming that we have a random test that starts regular sequence items, we would use these mixins to write our more directed tests by replacing the original sequence item type with one with constraints mixed in.

The test that only accesses mapped addresses would do the following factory override:

class test_mapped_addresses extends test_all_random;

  protected virtual function void set_factory_overrides();
    sequence_item::type_id::set_type_override(
        only_mapped_addresses_mixin #(sequence_item)::get_type());
  endfunction

  // ...
endclass

The other two feature tests would look similar, but would use their respective mixins.

To only perform writes to mapped addresses we would need to chain the two mixins:

class test_legal_writes_to_mapped_addresses extends test_all_random;

  protected virtual function void set_factory_overrides();
    sequence_item::type_id::set_type_override(
        only_legal_writes_mixin #(only_mapped_addresses_mixin #(sequence_item))::get_type());
  endfunction

  // ...
endclass

Of course, we would do the same to handle the other two pairs.

Similarly, we could use the same principle to combine all three features:

class test_legal_writes_to_mapped_addresses_in_secure_mode extends test_all_random;

  protected virtual function void set_factory_overrides();
    sequence_item::type_id::set_type_override(
        only_legal_writes_mixin #(
            only_mapped_addresses_mixin #(
                only_legal_secure_accesses_mixin #(sequence_item)))::get_type());
  endfunction

  // ...
endclass

The mixin approach comes with some issues, though.

Constraints are always polymorphic, so we have to be very careful to use unique constraint names across all mixins. Applying two different mixins that use the same constraint name would result in only the outer mixin's constraints being applied, because it would override the constraint defined in the inner mixin. It's very easy to run into this issue when using copy/paste to define a new mixin and forgetting to change the name of the constraint. Frustration will follow, as the code looks right, but leads to unexpected results. Moreover, the more mixins are used in the code base, the easier it is for constraint name collisions to happen.

Chaining of mixins is not particularly readable. It is bearable for one or two levels, but the more levels there are, the worse it's going to get.

Finally, using mixins will cause the number of types in our code to explode. Each mixin applied on top of a class creates a new type. From a coding standpoint this isn't such a big deal, as we won't be referencing those types directly. The more types we have, though, the longer our compile times are going to get. Also, note that for the compiler mixin1 #(mixin2 #(some_class)) is a distinct type from mixin2 #(mixin1 #(some_class)), regardless of whether it results in the "same" class. It's very easy to use mixin1 #(mixin2 #(some_class)) in one test, but mixin3 #(mixin2 #(mixin1 #(some_class))) in another, which makes the compiler "see" an extra type.

The mixin pattern uses inheritance, which doesn't match the call to action in the post title, so obviously we're not going to stop here.

Using aspect oriented programming

It's much easier to write our test family using aspect oriented programming (AOP). AOP allows us to alter the definition of a class from a different file. Even though SystemVerilog doesn't support AOP, I'd still like to show an example in e, as it can provide us with some hints on how we could improve the mixin-based solution.

(Please note that the following code may not be idiomatic, so don't take it as a reference on how to handle constraints in e.)

Our sequence item definition would look similar:

<'
struct sequence_item like any_sequence_item {

  direction: direction_e;
  address: uint(bits: 32);
  sec_mode: sec_mode_e;

};
'>

In our test that only does mapped accesses, we would tell the compiler to add the constraint to the sequence item:

<'
import test_all_random;

extend sequence_item {
  keep address in [CODE_START_ADDR..CODE_END_ADDR] or
      address in [SRAM_START_ADDR..SRAM_END_ADDR] or
      address in [PERIPHERAL_START_ADDR..PERIPHERAL_END_ADDR];
};
'>

This does not result in a new type. It tweaks the existing sequence_item type for the duration of that test.

If we would like to reuse the constraint in the test that only writes to mapped addresses, we could put the extension into its own file. We could do the same for the other extensions that handle the other features. This would allow each test to load the relevant extension files. For example, for legal writes to mapped addresses we would have:

<'
import test_all_random;
import constraints/only_legal_writes;
import constraints/only_mapped_addresses;
'>

The file structure is similar to what we had when we used mixins, but the code is much cleaner.

Pay special attention to the natural language description of what we are doing: in test_mapped_addresses we are adding the constraint to the sequence_item type.

Using constraint objects

Regular object oriented programming doesn't allow us to change type definitions. What we can do, however, is build our code in such a way as to allow it to be extended when it is being used.

Back in 2015, there was an exciting DVCon paper that presented how to add constraints using composition. It showed how to add additional constraints to an instance of an object without changing the type of that object. This is done by encapsulating the constraints into their own objects which extend the behavior of the original object's randomize() function. Have a quick look at the paper before proceeding, to understand the exact way this is done.

While the paper shows how to add constraints to object instances, we can extend the approach to add constraints globally, to all instances of a type. If we look back at the AOP case from before, this would be conceptually similar to what we were doing there. We would be emulating the addition of constraints to the sequence_item type.

The paper makes an attempt at global constraints in its final section, by using the UVM configuration DB. While that approach works, I feel that it is not expressive enough. A better API, consisting of a static function to add constraints globally, would make the code much more readable than a very verbose config DB set(...) call.

To get the extensibility we want, we have to set up the necessary infrastructure for it. If the sequence item class is under our control, we can modify it directly. Alternatively, if the sequence item is part of an external UVC package, we can define a sub-class which contains the necessary code.

We'll assume that sequence_item can't be changed and we'll create a new constrained_sequence_item class. We would either use this sub-class in our sequences directly or use a factory override.

To execute code that affects all instances, the sequence item class needs a static function through which constraints are added:

class constrained_sequence_item extends sequence_item;

  static function void add_global_constraint(abstract_constraint c);
    // ...
  endfunction

  // ...
endclass

The abstract_constraint class would be the base class for our constraints and would provide us with a reference to the object that is being randomized:

virtual class abstract_constraint;

  protected sequence_item object;

  function void set_object(sequence_item object);
    this.object = object;
  endfunction

endclass

The code to handle global constraints is similar to the one presented in the paper. We store all global constraints in a static array:

class constrained_sequence_item extends sequence_item;

  local static rand abstract_constraint global_constraints[$];

  static function void add_global_constraint(abstract_constraint c);
     global_constraints.push_back(c);
  endfunction

  // ...
endclass

Before randomizing a sequence item instance, we have to set up the constraint objects to point to it:

class constrained_sequence_item extends sequence_item;

  function void pre_randomize();
    foreach (global_constraints[i])
      global_constraints[i].set_object(this);
  endfunction

  // ...
endclass

With the infrastructure set up, we can move on. We encapsulate the constraints for our features into their own constraint classes:

class only_mapped_addresses_constraint extends abstract_constraint #(sequence_item);

  constraint c {
    object.address inside {
        [CODE_START_ADDR:CODE_END_ADDR],
        [SRAM_START_ADDR:SRAM_END_ADDR],
        [PERIPHERAL_START_ADDR:PERIPHERAL_END_ADDR] };
  }

endclass
class only_legal_writes_constraint extends abstract_constraint #(sequence_item);

  constraint c {
    object.direction == sequence_item::WRITE;
    object.address[1:0] == 0;
  }

endclass
class only_legal_secure_accesses_constraint extends abstract_constraint #(sequence_item);

  constraint c {
    object.sec_mode == sequence_item::SECURE;
    object.address[27] == 0;
  }

endclass

In the test that only accesses mapped addresses we would make sure to add the required constraints:

class test_mapped_addresses extends test_all_random;

  protected virtual function void add_constraints();
    only_mapped_addresses_constraint c = new();
    constrained_sequence_item::add_global_constraint(c);
  endfunction

  // ...
endclass

The add_constraints() function should be called before any sequence items are started. A good place to call it from is the end_of_elaboration_phase(...) function.

In the other feature oriented tests we would simply add their respective constraints.

For the test that does writes to mapped addresses we just need to make sure that both constraints are added. We could do this by extending the random test and making two add_global_constraint(...) calls, one for each constraint object:

class test_legal_writes_to_mapped_addresses extends test_all_random;

  protected virtual function void add_constraints();
    only_legal_writes_constraint c0 = new();
    only_mapped_addresses_constraint c1 = new();
    constrained_sequence_item::add_global_constraint(c0);
    constrained_sequence_item::add_global_constraint(c1);
  endfunction

  // ...
endclass

We could also extend the test that only does legal writes and add the constraints for mapped addresses:

class test_legal_writes_to_mapped_addresses extends test_legal_writes;

  protected virtual function void add_constraints();
    only_mapped_addresses_constraint c = new();
    super.add_constraints();
    constrained_sequence_item::add_global_constraint(c);
  endfunction

  // ...
endclass

Of course, this approach can be used to handle all combinations of constraints.

Adding constraints dynamically has the same advantages as the mixin approach we looked at earlier.

It doesn't suffer from the same readability issue, because we don't rely on long parameterization chains. It is a bit verbose due to the multiple add_global_constraint(...) calls, though this could be improved by adding a variant of the function that accepts a list of constraint objects.
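
Such a variant would only take a few lines (a sketch; add_global_constraints is a hypothetical name):

// Convenience variant that accepts several constraint objects at once.
static function void add_global_constraints(abstract_constraint cs[]);
  foreach (cs[i])
    add_global_constraint(cs[i]);
endfunction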

This approach also avoids the type explosion issue that mixins have and is potentially faster to compile.

There is a bit of boilerplate code required for the infrastructure. This can be extracted into a reusable library.

The first thing we need to do is to make the abstract constraint class parameterizable:

virtual class abstract_constraint #(type T = int);

  protected T object;

  function void set_object(T object);
    this.object = object;
  endfunction

endclass

The package should expose a macro to handle the constraint infrastructure:

`define constraints_utils(TYPE) \
  static function void add_global_constraint(constraints::abstract_constraint #(TYPE) c); \
  // ...

There was a subtle issue with the simplistic infrastructure code we looked at before. It wasn't able to handle randomization of multiple instances at the same time (for example, when randomizing an array of sequence items). As this is a more exotic use case, the problem won't show up immediately. It's a simple fix to make, but it would be very annoying to have to make it in multiple projects. Even when code looks deceptively simple and has us thinking it's not worth the hassle to put into its own library, doing so makes it easier to implement and propagate fixes for such issues.
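
To make the issue concrete, here is roughly the scenario that trips up the simplistic version (a sketch, using a hypothetical container class):

class transfer;
  // Both items get solved in the same randomize() call.
  rand constrained_sequence_item items[2];
  // ... items constructed in new(), not shown ...
endclass

transfer t = new();
// Each item's pre_randomize() repoints the shared, static constraint objects at
// itself, so only the item whose pre_randomize() runs last ends up constrained.
void'(t.randomize());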

The macro makes the definition of constrained_sequence_item much cleaner:

class constrained_sequence_item extends sequence_item;

  `constraints_utils(sequence_item)

  // ...
endclass

You can find the constraints library on GitHub.

It already supports instance and global constraints. What I would like to add is the possibility to add constraints to all items under a UVM scope, similar to what the paper presents at the end, but using a nicer API that doesn't require the user to do any UVM config DB calls.

I also want to look at the possibility of magically generating all test combinations, given a list of constraint objects. Currently we have to enumerate all combinations of constraints by writing a test class for each, which is very repetitive. It would be great if we could get this automatically and save ourselves the typing. This is something I'll definitely look at in a future post.

In the meantime you can have a look at the full example code on GitHub and try it out yourselves. I hope it inspires you to write flexible test suites that help you reach your verification goals faster.

Sunday, February 9, 2020

Bigger Is Not Always Better: Builds Are Faster with Smaller Packages

One trend over the past few years is that the projects I've been working on tend to get bigger and more complicated. Bigger projects come with new challenges. Among these are the fact that it's much more difficult to keep the entire project in one's head, the need to synchronize with more developers because team sizes grow, a higher risk of having to re-write code because of poorly understood requirements or because some requirements change, and many more.

There's one thing, though, that crept up on me: compile times get much longer. While this doesn't sound like a big deal, I've found that long build times are an absolute drain on productivity. I use SVUnit a lot, so I'm used to having a very short path between making a change and seeing the effects of that change. Ideally, there should be no delay between starting the test script and getting the result. A delay of a couple of seconds is tolerable. More than 10 seconds becomes noticeable. After exceeding the one minute mark, the temptation to switch to something else (like the Internet browser) becomes very high. This slowdown happens gradually, with each new class that is added, and it decreases development speed.

In this post I'd like to talk about compilation. This topic has a tendency to be trivialized and underestimated, even more so in the hardware industry, where it's common to have design flows already set up to deal with this process.

Full vs. incremental builds

A build is the process of taking the source code and producing an executable object. When talking about builds, there are two terms we need to be familiar with: full builds and incremental builds. A full build is performed when there isn't any build output, which requires the entire source code to be built. This is either the case when starting in a new workspace (for example, after cloning the source repository) or after deleting the previous build output. An incremental build only builds the parts of the source code that have changed since the previous build. Because only parts of the project are rebuilt in this case, this means that, generally, the process is faster.

We'll limit our discussion about builds to SystemVerilog packages, though the same concepts also apply to modules and interfaces.

Let's say we have two packages, package0 and package1, which we use in our verification environment:

// File: package0.sv

package package0;

  class some_base_class;
  endclass

endpackage
// File: package1.sv

package package1;

  import package0::*;

  class some_derived_class extends some_base_class;
  endclass

endpackage

Compiling these two packages using an EDA tool is pretty straightforward:

compiler package0.sv package1.sv

The very first time we run this command, the tool will parse the two source files and generate the data structures it uses to represent the compiler output. Since we didn't have any build output when we ran the command, we were performing a full build.

If we add a new class to package1 and run the compiler again, we will be performing an incremental build:

package package1;

  import package0::*;

  class some_derived_class extends some_base_class;
  endclass

  class some_other_class;
  endclass

endpackage

The tool will only recompile package1. It won't touch package0, since it didn't change. If compiling package0 takes a lot of time, this will save us that time.

A deeper dive into SystemVerilog compilation

Before we continue with our main discussion, it makes sense to look a bit deeper into how SystemVerilog compilation works. Before I investigated this topic I had some misplaced ideas, which I would like to dispel.

I have only ever really looked at the behavior of one particular EDA tool, but I assume that other simulators behave similarly, as they all have a common heritage. Some SystemVerilog tools differentiate between compilation and elaboration. These definitions depend on the tool you're using. I've seen compilation used to mean parsing the code and generating syntax trees. Elaboration takes these syntax trees and generates executable code that is run in the simulator. I'll use the term compile to mean both of these steps.

Let's start small, with a single package that contains only one class:

package package0;

  class some_base_class;
  endclass

endpackage

After we compile the package, we will have performed a full build. Now, let's add another class to the package:

package package0;

  class some_base_class;
  endclass

  class some_base_class2;
  endclass

endpackage

In this case, you'll notice that the tool compiles both classes. I'm a bit cautious about posting the log files that show how I can tell it's compiling both classes. Some tools make this easier to see than others. One clear sign is that the compile takes longer. You can try it out by adding more and more classes and recompiling. I've created some scripts that can help out with such experiments: https://github.com/verification-gentleman-blog/smaller-packages/.

In this case, an incremental compile takes about as much time as a full build, which suggests that nothing is being reused from previous build attempts. Even if we only add classes, the build output for previously built classes is discarded.

What did we learn from this? That tools only organize build output using packages as their most granular unit. Changes within packages are "lost", from an incremental build point of view.

You could argue that, from the previous experiment, we could infer that tools organize build output based on files. If we were to put each class in its own file and `include them in the package, then the tool might somehow behave differently. This isn't the case, though. `include directives are handled by the pre-processor. It interleaves all of the files together and gives the compiler one big file with all the class definitions inline (the situation we had previously).
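
As an example, a package assembled from `include files still reaches the compiler as one flat unit after pre-processing (file names are illustrative):

// package0.sv
package package0;
  `include "some_base_class.svh"
  `include "some_base_class2.svh"
endpackage

// After pre-processing, the compiler sees the same thing as if both classes had
// been written inline in package0.sv, so changing either one rebuilds the package.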

We can do another experiment to convince ourselves that builds aren't organized by files. Let's put two packages inside the same file:

package package0;
endpackage

package package1;
endpackage

Let's modify package1 by adding a new variable:

package package0;
endpackage

package package1;
  bit some_var;
endpackage

When rebuilding, we'll notice that only package1 gets rebuilt, but package0 is left alone. (This is also the behavior we would have liked to have for classes inside a package.)

Now let's also modify package0 by adding a variable to it:

package package0;
  bit some_var;
endpackage

package package1;
  bit some_var;
endpackage

When rebuilding, we'll see that package0 is being rebuilt, as we expected, but, surprisingly, so is package1. This is very confusing initially, but obvious once you know the explanation. Because we shifted the lines where package1 and its items are defined in the file, the tool has to update debug information regarding line numbers. This is important for debuggers and for messages that contain line numbers (like assertion errors, $info(...) calls, etc.). This, by the way, is a very good reason to only define one element (package, interface, module) per file.

Let's look at one more thing. Let's take two packages that have a dependency relationship:

package package0;

  class some_base_class;
  endclass

endpackage
package package1;

  import package0::*;

  class some_derived_class extends some_base_class;
  endclass

endpackage

It's clear that changes to package1 shouldn't (and indeed won't) cause rebuilds of package0. It's also clear that changing some_base_class will have to trigger a rebuild of package1. Now, let's add a new class to package0:

package package0;

  class some_base_class;
  endclass

  class some_base_class2;
  endclass

endpackage

At this point, we shouldn't be surprised anymore that both packages are rebuilt in this case. This is because the tool only understands changes at the package level. package1 depends on package0, so any change to package0 will lead to a rebuild of package1, regardless if this is really needed. Unfortunately, this isn't the behavior we would like to have.

Contrast the way SystemVerilog builds work with C++, where files are compiled individually and linked together in a separate step (a gross over-simplification). Changes to one class don't cause recompiles of other classes in the same namespace, if the two classes are unrelated. This is because C++ classes are split between the header (which declares which functions a class provides) and the implementation (which contains the function bodies). A class that depends on another class includes its header, to let the compiler know that it relies on the other class. Only changes in a class's header cause recompiles of dependent classes, while changes to its implementation don't. Because of this setup, C++ builds are much better at build avoidance, only rebuilding the parts that absolutely have to be rebuilt. This allows for guidelines that incremental builds should take 5-10 seconds and that full builds (including tests) should take 1-2 minutes, according to http://www.bitsnbites.eu/faster-c-builds/, numbers which are incredibly low by SystemVerilog standards, where merely starting the simulator takes double-digit numbers of seconds.

Case study

The classic testbench structure for an IP block consists of one or more interface verification components (IVCs), that contain code related to the signal protocols used by the design, and one module verification component (MVC), that contains code for aspects related to the design functionality.

IVCs typically consist of a package and one or more interfaces. We don't usually make changes to the IVCs, so once we've built them via a full build, they won't have any impact on subsequent incremental builds.

Most of our work is focused on the MVC. As we've seen above, if we place our MVC code into one package, then any change we make to it will trigger a new build, because of the way SystemVerilog tools handle incremental builds. This isn't going to be very efficient, as an incremental build of the package after each change will take about as long as a full build.

What would happen if we could split our big MVC package into multiple smaller packages?

It's experiment time again! We'll assume that we can split the code such that building each package takes the same amount of time. We'll also ignore any extra costs from building multiple packages instead of one single package. This means that if an incremental build of the entire mvc package would have taken N seconds, then by splitting it into P packages each of the smaller packages takes N/P seconds to build. We'll also assume that we are just as likely to make changes to any of the smaller packages. This means that the probability of changing any given package is 1/P.
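
In general, with these assumptions, the average incremental build time is the sum, over all P packages, of 1/P (the probability of changing a given package) multiplied by the time it takes to rebuild that package and everything that depends on it. The examples below just instantiate this formula for different package topologies.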

Let's assume that we can create two independent packages, p0 and p1. We can misuse UML to visualize the package topology:

(Diagram: packages p0 and p1, with no dependency between them.)

Any change we make to p0 won't cause rebuilds of p1 and vice-versa. We can compute the average incremental build time in this case. Building either of the packages takes N/2 seconds, and we do it only half of the time (since in the other half we change the other package). The average incremental build time is the mean: N/2 * 1/2 + N/2 * 1/2 = N/2. By splitting the code into two independent packages, we've managed to halve our incremental build time. It's not very realistic, though, that we could manage such a split on a real project.

Let's have a look at something closer to reality. Let's assume that we can split our MVC into two packages, p0 and p1, but p1 depends on p0:

(Diagram: package p1 depending on package p0.)

An incremental build of p1 would still take only N/2 seconds, because changing anything in p1 doesn't have any effect on p0. A change in p0 would mean that we also have to rebuild p1, which means that it would take N/2 + N/2 = N seconds. On average, we would need N/2 * 1/2 + N * 1/2 = 3/4 * N seconds.

We should try to structure our code in such a way as to increase the number of packages without any dependencies to each other. Let's say we can split p1 from the previous example into two independent packages, p1_0 and p1_1:

(Diagram: packages p1_0 and p1_1, both depending on p0 and independent of each other.)

In this case, changing anything in either p1_0 or p1_1 would take N/3 seconds. A change in p0 would require all three packages to be rebuilt and would take the full N seconds. On average, a change would take N/3 * 1/3 + N/3 * 1/3 + N * 1/3 = 5/9 * N seconds.

We could go on further with deeper package hierarchies, but I think you get the idea.

MVC code lends itself nicely to such a structure. We typically have some "common" code that models the basics of our DUT, from which we can model different higher level aspects, relating to the features of the DUT. We would use our models inside checks or coverage, which could be built independently from each other:

(Diagram: a common modeling package at the bottom, feature model packages built on top of it, and independent checks and coverage packages using those models.)
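
A sketch of what such a package split could look like (the package names are illustrative):

// Basic modeling of the DUT, shared by everything else.
package mvc_common;
  // ...
endpackage

// Higher level model of one DUT feature, independent of the other feature models.
package mvc_feature_x_model;
  import mvc_common::*;
  // ...
endpackage

// Checks and coverage only depend on the models they use, so a change in the
// coverage code doesn't rebuild the checks and vice-versa.
package mvc_checks;
  import mvc_feature_x_model::*;
  // ...
endpackage

package mvc_coverage;
  import mvc_feature_x_model::*;
  // ...
endpackage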

Conclusions

Splitting code across multiple packages will generally be better for compilation speed. There are also other advantages. It could make the code base easier to understand, by grouping code by theme (code for X goes in package p_x, code for Y goes in package p_y). It could also make development easier, by allowing developers to specialize in only a handful of the packages, instead of having to deal with the entire code base.

Having to manage multiple packages brings its own set of challenges, though. It could make the code base more difficult to understand if the boundaries between packages are arbitrary (where does code for X go, in p0 or p1?). More packages, especially when they have intricate dependency relationships, also make compilation more difficult to set up.

I'm not going to recommend making one package per class, just to improve build times. Ideally, SystemVerilog compilers should evolve to better handle incremental compilation, by working at levels lower than just packages. At the same time, you should care about turnaround time, so dumping all code into one package shouldn't be your default mode of operation.