Sunday, March 29, 2020

Favor Composition Over Inheritance - Even for Constraints

Simulation is currently the dominant functional verification technique, with constrained random verification the most widely used methodology. While producing random data is a big part of it, letting the solver blindly generate stimulus isn't going to be very efficient. Constraints are needed to guide the stimulus toward interesting scenarios.

A good constrained random test suite contains a mixture of tests with varying degrees of randomness. This is achieved by progressively adding constraints to tests to reduce their randomness. This is best explained with an example.

Let's assume we're verifying a device that can handle read and write accesses to locations in its address map. These accesses can either be done in secure mode or in non-secure mode. We model an access using a UVM sequence item:

class sequence_item extends uvm_sequence_item;

  typedef enum {
    READ,
    WRITE
  } direction_e;

  typedef enum {
    SECURE,
    NONSECURE
  } sec_mode_e;

  rand direction_e direction;
  rand bit [31:0] address;
  rand sec_mode_e sec_mode;

  // ...
endclass

Not all accesses are legal; illegal accesses would be rejected by the device.

Only certain address ranges are mapped, while accesses to unmapped addresses are illegal. If we were to write a test that only accesses mapped addresses, we would have to add the following constraints to generated items:

constraint only_mapped_addresses {
  address inside {
      [CODE_START_ADDR:CODE_END_ADDR],
      [SRAM_START_ADDR:SRAM_END_ADDR],
      [PERIPHERAL_START_ADDR:PERIPHERAL_END_ADDR] };
}

Our device also only allows writes to aligned addresses. For a 32-bit bus, this would mean that the lowest two address bits have to be 0:

constraint only_writes_to_aligned_addresses {
  direction == WRITE;
  address[1:0] == 0;
}

Lastly, certain ranges of our device's address map are restricted to secure code. Let's assume that the address map is split into 16 regions of 256 MB each. Within each of these regions, the lower half is reserved for secure accesses. This means that bit 27 of the address is always 0 for a secure access:

constraint only_secure_accesses_to_lower_half_of_range {
  sec_mode == SECURE;
  address[27] == 0;
}

The test suite for this device would contain a random test where unconstrained items are generated. One test would be directed toward generating accesses to mapped addresses, another test would only perform writes to aligned addresses, while another test would perform only secure accesses. At the same time, we would also need tests that lie at the intersection of the three features, so we would want tests that do pairwise combinations: aligned writes to mapped addresses, aligned writes in secure mode, and secure accesses to mapped addresses. Finally, we also need a test that combines all three and only does secure writes to mapped addresses.

(While a real test suite would definitely need a lot more classes of tests, this post isn't focused on verification planning, but on the mechanical aspects of implementing a robust constraint management strategy, so please ignore the simplicity of the example.)

It might be the case that these behaviors will get tweaked over time as the project moves forward or as a new derivative of the device is developed. The address map might change as regions are moved, removed or resized, or new regions are added. The bus width might change, which would change which addresses are aligned, or we could get a feature request to implement writes of other granularities (e.g. half-word). The definition of secure regions could also change or they could become configurable via special function registers. Any of these changes should be easy to handle and shouldn't require massive changes to the verification code.

Let's skip the obvious idea of putting all constraints into the sequence item class and activating/deactivating them selectively based on the test. This won't scale for real projects, where we would have many more constraints, which would make the code unreadable.
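
For reference, here is a minimal sketch of that idea: the item itself would carry every constraint and each test would switch individual constraints on or off via constraint_mode():

class sequence_item extends uvm_sequence_item;

  // ... fields as before ...

  constraint only_mapped_addresses { /* mapped address ranges, as above */ }

  constraint only_writes_to_aligned_addresses {
    direction == WRITE;
    address[1:0] == 0;
  }

  constraint only_secure_accesses_to_lower_half_of_range {
    sec_mode == SECURE;
    address[27] == 0;
  }
endclass

// In the test that only does aligned writes, for example:
//   item.only_mapped_addresses.constraint_mode(0);
//   item.only_secure_accesses_to_lower_half_of_range.constraint_mode(0);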

Using mixins

Quite a while back I wrote about how to use mixins to manage constraints.

The mixin approach is flexible, because it allows us to handle each aspect individually. Instead of having all constraints in a single class, we can have one mixin for each constrained feature.

We need one for mapped addresses:

class only_mapped_addresses_mixin #(type T = sequence_item) extends T;

  constraint only_mapped_addresses {
    address inside {
        [CODE_START_ADDR:CODE_END_ADDR],
        [SRAM_START_ADDR:SRAM_END_ADDR],
        [PERIPHERAL_START_ADDR:PERIPHERAL_END_ADDR] };
  }

  // ...
endclass

We also need one for writes:

class only_legal_writes_mixin #(type T = sequence_item) extends T;

  constraint only_writes_to_aligned_addresses {
    direction == WRITE;
    address[1:0] == 0;
  }

  // ...
endclass

Finally, we need a mixin for secure accesses:

class only_legal_secure_accesses_mixin #(type T = sequence_item) extends T;

  constraint only_secure_accesses_to_lower_half_of_range {
    sec_mode == SECURE;
    address[27] == 0;
  }

  // ...
endclass

Assuming that we have a random test that starts regular sequence items, we would use these mixins to write our more directed tests by replacing the original sequence item type with one that has the constraints mixed in.

The test that only accesses mapped addresses would do the following factory override:

class test_mapped_addresses extends test_all_random;

  protected virtual function void set_factory_overrides();
    sequence_item::type_id::set_type_override(
        only_mapped_addresses_mixin #(sequence_item)::get_type());
  endfunction

  // ...
endclass

The other two feature tests would look similar, but would use their respective mixins.
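
For example, the test that only does legal writes would override with its own mixin (a sketch following the same pattern):

class test_legal_writes extends test_all_random;

  protected virtual function void set_factory_overrides();
    sequence_item::type_id::set_type_override(
        only_legal_writes_mixin #(sequence_item)::get_type());
  endfunction

  // ...
endclass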

To only perform writes to mapped addresses we would need to chain the two mixins:

class test_legal_writes_to_mapped_addresses extends test_all_random;

  protected virtual function void set_factory_overrides();
    sequence_item::type_id::set_type_override(
        only_legal_writes_mixin #(only_mapped_addresses_mixin #(sequence_item))::get_type());
  endfunction

  // ...
endclass

Of course, we would do the same to handle the other two pairs.

Similarly, we could use the same principle to combine all three features:

class test_legal_writes_to_mapped_addresses_in_secure_mode extends test_all_random;

  protected virtual function void set_factory_overrides();
    sequence_item::type_id::set_type_override(
        only_legal_writes_mixin #(
            only_mapped_addresses_mixin #(
                only_legal_secure_accesses_mixin #(sequence_item)))::get_type());
  endfunction

  // ...
endclass

The mixin approach comes with some issues, though.

Constraints are always polymorphic, so we have to be very careful to use unique constraint names across all mixins. Applying two different mixins that use the same constraint name would result in only the outer mixin's constraints being applied, because it would override the constraint defined in the inner mixin. It's very easy to run into this issue when using copy/paste to define a new mixin and forgetting to change the name of the constraint. Frustration will follow, as the code looks right, but leads to unexpected results. Moreover, the more mixins are used in the code base, the easier it is for constraint name collisions to happen.
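
To illustrate the collision, imagine two hypothetical mixins that both name their constraint c; when the two are chained, only the outer c applies:

class first_feature_mixin #(type T = sequence_item) extends T;
  constraint c { address[1:0] == 0; }
endclass

class second_feature_mixin #(type T = sequence_item) extends T;
  // Same constraint name as in first_feature_mixin: when chained on top of
  // it, this c overrides the inherited one and the alignment constraint
  // silently disappears.
  constraint c { sec_mode == SECURE; }
endclass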

Chaining of mixins is not particularly readable. It is bearable for one or two levels, but the more levels there are, the worse it's going to get.

Finally, using mixins will cause the number of types in our code to explode. Each mixin on top of a class will create a new type. From a coding standpoint this isn't such a big deal, as we won't be referencing those types directly. The more types we have, though, the longer our compile times are going to get. Also, note that for the compiler mixin1 #(mixin2 #(some_class)) is a distinct type from mixin2 #(mixin1 #(some_class)), regardless of whether it results in the "same" class. It's very easy to use mixin1 #(mixin2 #(some_class)) in one test, but use mixin3 #(mixin2 #(mixin1 #(some_class))) in another, which would make the compiler "see" an extra type.

The mixin pattern uses inheritance, which doesn't match the call to action in the post title, so obviously we're not going to stop here.

Using aspect oriented programming

It's much easier to write our test family using aspect oriented programming (AOP). AOP allows us to alter the definition of a class from a different file. Even though SystemVerilog doesn't support AOP, I'd still like to show an example in e, as it can provide us with some hints into how we could improve the mixin-based solution.

(Please note that the following code may not be idiomatic, so don't take it as a reference on how to handle constraints in e.)

Our sequence item definition would look similar:

<'
struct sequence_item like any_sequence_item {

  direction: direction_e;
  address: uint(bits: 32);
  sec_mode: sec_mode_e;

};
'>

In our test that only does mapped accesses, we would tell the compiler to add the constraint to the sequence item:

<'
import test_all_random;

extend sequence_item {
  keep address in [CODE_START_ADDR..CODE_END_ADDR] or
      address in [SRAM_START_ADDR..SRAM_END_ADDR] or
      address in [PERIPHERAL_START_ADDR..PERIPHERAL_END_ADDR];
};
'>

This does not result in a new type. It tweaks the existing sequence_item type for the duration of that test.

If we would like to reuse the constraint in the test that only writes to mapped addresses, we could put the extension into its own file. We could do the same for the other extensions that handle the other features. This would allow each test to load the relevant extension files. For example, for legal writes to mapped addresses we would have:

<'
import test_all_random;
import constraints/only_legal_writes;
import constraints/only_mapped_addresses;
'>

The file structure is similar to what we had when we used mixins, but the code is much cleaner.

Pay special attention to the natural language description of what we are doing: in test_mapped_addresses we are adding the constraint to the sequence_item type.

Using constraint objects

Regular object oriented programming doesn't allow us to change type definitions. What we can do, however, is build our code in such a way as to allow it to be extended when it is being used.

Back in 2015, there was an exciting DVCon paper that presented how to add constraints using composition. It showed how to add additional constraints to an instance of an object without changing the type of that object. This is done by encapsulating the constraints into their own objects which extend the behavior of the original object's randomize() function. Have a quick look at the paper before proceeding, to understand the exact way this is done.

While the paper shows how to add constraints to object instances, we can extend the approach to add constraints globally, to all instances of a type. If we look back at the AOP case from before, this would be conceptually similar to what we were doing there. We would be emulating the addition of constraints to the sequence_item type.

The paper makes an attempt at global constraints in its final section, by using the UVM configuration DB. While that approach works, I feel that it is not expressive enough. A better API, consisting of a static function to add constraints globally, would make the code much more readable than a very verbose config DB set(...) call.

To get the extensibility we want, we have to set up the necessary infrastructure for it. If the sequence item class is under our control, we can modify it directly. Alternatively, if the sequence item is part of an external UVC package, we can define a sub-class which contains the necessary code.

We'll assume that sequence_item can't be changed and we'll create a new constrained_sequence_item class. We would either use this sub-class in our sequences directly or use a factory override.
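
The factory override variant would be a single call in the test (a sketch):

sequence_item::type_id::set_type_override(
    constrained_sequence_item::get_type());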

To execute code that affects all instances, the sequence item class needs a static function through which constraints are added:

class constrained_sequence_item extends sequence_item;

  static function void add_global_constraint(abstract_constraint c);
    // ...
  endfunction

  // ...
endclass

The abstract_constraint class would be the base class for our constraints and would provide us with a reference to the object that is being randomized:

virtual class abstract_constraint;

  protected sequence_item object;

  function void set_object(sequence_item object);
    this.object = object;
  endfunction

endclass

The code to handle global constraints is similar to the one presented in the paper. We store all global constraints in a static array:

class constrained_sequence_item extends sequence_item;

  local static rand abstract_constraint global_constraints[$];

  static function void add_global_constraint(abstract_constraint c);
     global_constraints.push_back(c);
  endfunction

  // ...
endclass

Before randomizing a sequence item instance, we have to set up the constraint objects to point to it:

class constrained_sequence_item extends sequence_item;

  function void pre_randomize();
    foreach (global_constraints[i])
      global_constraints[i].set_object(this);
  endfunction

  // ...
endclass

With the infrastructure set up, we can move on. We encapsulate the constraints for our features into their own constraint classes:

class only_mapped_addresses_constraint extends abstract_constraint;

  constraint c {
    object.address inside {
        [CODE_START_ADDR:CODE_END_ADDR],
        [SRAM_START_ADDR:SRAM_END_ADDR],
        [PERIPHERAL_START_ADDR:PERIPHERAL_END_ADDR] };
  }

endclass
class only_legal_writes_constraint extends abstract_constraint;

  constraint c {
    object.direction == sequence_item::WRITE;
    object.address[1:0] == 0;
  }

endclass
class only_legal_secure_accesses_constraint extends abstract_constraint;

  constraint c {
    object.sec_mode == sequence_item::SECURE;
    object.address[27] == 0;
  }

endclass

In the test that only accesses mapped addresses we would make sure to add the required constraints:

class test_mapped_addresses extends test_all_random;

  protected virtual function void add_constraints();
    only_mapped_addresses_constraint c = new();
    constrained_sequence_item::add_global_constraint(c);
  endfunction

  // ...
endclass

The add_constraints() function should be called before any sequence items are started. A good place to call it from is the end_of_elaboration_phase(...) function.
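
A minimal sketch of how the base test could hook this up (assuming test_all_random is the common base class, as in the earlier examples):

class test_all_random extends uvm_test;

  virtual function void end_of_elaboration_phase(uvm_phase phase);
    super.end_of_elaboration_phase(phase);
    add_constraints();
  endfunction

  // The fully random test doesn't add any constraints.
  protected virtual function void add_constraints();
  endfunction

  // ...
endclass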

In the other feature oriented tests we would simply add their respective constraints.

For the test that does writes to mapped addresses we just need to make sure that both constraints are added. We could do this by extending the random test and making two add_global_constraint(...) calls, one for each constraint object:

class test_legal_writes_to_mapped_addresses extends test_all_random;

  protected virtual function void add_constraints();
    only_legal_writes_constraint c0 = new();
    only_mapped_addresses_constraint c1 = new();
    constrained_sequence_item::add_global_constraint(c0);
    constrained_sequence_item::add_global_constraint(c1);
  endfunction

  // ...
endclass

We could also extend the test that only does legal writes and add the constraints for mapped addresses:

class test_legal_writes_to_mapped_addresses extends test_legal_writes;

  protected virtual function void add_constraints();
    only_mapped_addresses_constraint c = new();
    super.add_constraints();
    constrained_sequence_item::add_global_constraint(c);
  endfunction

  // ...
endclass

Of course, this approach can be used to handle all combinations of constraints.

Adding constraints dynamically has the same advantages as the mixin approach we looked at earlier.

It doesn't suffer from the same readability issue, because we don't rely on long parameterization chains. It is a bit verbose due to the multiple add_global_constraint(...) calls, though this could be improved by adding a variant of the function that accepts a list of constraint objects.
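
Such a variant could be a thin wrapper (a sketch; the function name is made up):

static function void add_global_constraints(abstract_constraint cs[]);
  foreach (cs[i])
    add_global_constraint(cs[i]);
endfunction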

This approach also avoids the type explosion issue that mixins have and is potentially faster to compile.

There is a bit of boilerplate code required for the infrastructure. This can be extracted into a reusable library.

The first thing we need to do is to make the abstract constraint class parameterizable:

virtual class abstract_constraint #(type T = int);

  protected T object;

  function void set_object(T object);
    this.object = object;
  endfunction

endclass

The package should expose a macro to handle the constraint infrastructure:

`define constraints_utils(TYPE) \
  static function void add_global_constraint(constraints::abstract_constraint #(TYPE) c); \
  // ...

There was a subtle issue with the simplistic infrastructure code we looked at before. It wasn't able to handle randomization of multiple instances at the same time (for example, when randomizing an array of sequence items). As this is a more exotic use case, the problem won't show up immediately. It's a simple fix to make, but it would be very annoying to have to make it in multiple projects. Even though the code might look deceptively simple and make us think it's not worth the hassle of putting it into its own library, doing so makes it easier to implement and propagate fixes for such issues.

The macro makes the definition of constrained_sequence_item much cleaner:

class constrained_sequence_item extends sequence_item;

  `constraints_utils(sequence_item)

  // ...
endclass

You can find the constraints library on GitHub.

It already supports instance and global constraints. What I would like to add is the possibility to add constraints to all items under a UVM scope, similar to what the paper presents at the end, but using a nicer API that doesn't require the user to do any UVM config DB calls.

I also want to look at the possibility of magically generating all test combinations, given a list of constraint objects. Currently, we have to enumerate all combinations of constraints by writing a test class for each, which is very repetitive. It would be great if we could get this automatically and save ourselves the typing. This is something I'll definitely look at in a future post.

In the meantime you can have a look at the full example code on GitHub and try it out yourselves. I hope it inspires you to write flexible test suites that help you reach your verification goals faster.

Sunday, February 9, 2020

Bigger Is Not Always Better: Builds Are Faster with Smaller Packages

One trend over the past few years is that the projects I've been working on tend to get bigger and more complicated. Bigger projects come with new challenges. Among these are the fact that it's much more difficult to keep the entire project in one's head, the need to synchronize with more developers because team sizes grow, a higher risk of having to re-write code because of poorly understood requirements or because some requirements change, and many more.

There's one thing, though, that crept up on me: compile times get much bigger. While this doesn't sound like a big deal, I've found that long build times are an absolute drain on productivity. I use SVUnit a lot, so I'm used to having a very short path between making a change and seeing the effects of that change. Ideally, there should be no delay between starting the test script and getting the result. A delay of a couple of seconds is tolerable. More than 10 seconds becomes noticeable. After exceeding the one minute mark, the temptation to switch to something else (like the Internet browser) becomes very high. This slowdown happens gradually, with each new class that is added, decreasing development speed.

In this post I'd like to talk about compilation. This topic has a tendency to be trivialized and underestimated, even more so in the hardware industry, where it's common to have design flows already set up to deal with this process.

Full vs. incremental builds

A build is the process of taking the source code and producing an executable object. When talking about builds, there are two terms we need to be familiar with: full builds and incremental builds. A full build is performed when there isn't any build output, which requires the entire source code to be built. This is either the case when starting in a new workspace (for example, after cloning the source repository) or after deleting the previous build output. An incremental build only builds the parts of the source code that have changed since the previous build. Because only parts of the project are rebuilt, the process is generally faster.

We'll limit our discussion about builds to SystemVerilog packages, though the same concepts also apply to modules and interfaces.

Let's say we have two packages, package0 and package1, which we use in our verification environment:

// File: package0.sv

package package0;

  class some_base_class;
  endclass

endpackage
// File: package1.sv

package package1;

  import package0::*;

  class some_derived_class extends some_base_class;
  endclass

endpackage

Compiling these two packages using an EDA tool is pretty straightforward:

compiler package0.sv package1.sv

The very first time we run this command, the tool will parse the two source files and generate the data structures it uses to represent the compiler output. Since we didn't have any build output when we ran the command, we were performing a full build.

If we add a new class to package1 and run the compiler again, we will be performing an incremental build:

package package1;

  import package0::*;

  class some_derived_class extends some_base_class;
  endclass

  class some_other_class;
  endclass

endpackage

The tool will only recompile package1. It won't touch package0, since it didn't change. If compilation for package0 takes a lot of time, this will save us that time.

A deeper dive into SystemVerilog compilation

Before we continue with our main discussion, it makes sense to look a bit deeper into how SystemVerilog compilation works. Before I investigated this topic I had some misplaced ideas, which I would like to dispel.

I have only ever really looked at the behavior of one particular EDA tool, but I assume that other simulators behave similarly, as they all have a common heritage. Some SystemVerilog tools differentiate between compilation and elaboration. These definitions depend on the tool you're using. I've seen compilation used to mean parsing the code and generating syntax trees. Elaboration takes these syntax trees and generates executable code that is run in the simulator. I'll use the term compile to mean both of these steps.

Let's start small, with a single package that contains only one class:

package package0;

  class some_base_class;
  endclass

endpackage

After we compile the package, we will have performed a full build. Now, let's add another class to the package:

package package0;

  class some_base_class;
  endclass

  class some_base_class2;
  endclass

endpackage

In this case, you'll notice that the tool compiles both classes. I'm a bit cautious about posting the log files and how I can tell that it's compiling both classes. Some tools make this easier to see than others. One clear sign is that compile takes longer. You can try it out by adding more and more classes and recompiling. I've created some scripts that can help out with such experiments: https://github.com/verification-gentleman-blog/smaller-packages/.

In this case, an incremental compile takes about as much time as a full build, which suggests that nothing is being reused from previous build attempts. Even if we only add classes, the build output for previously built classes is discarded.

What did we learn from this? That tools only organize build output using packages as their most granular unit. Changes within packages are "lost", from an incremental build point of view.

You could argue that from the previous experiment we could infer that tools organize build output based on files. If we were to put each class in its own file and `include them in the package, then the tool would be able to somehow behave differently. This isn't the case, though. `include directives are handled by the pre-processor. It interleaves all of the files together and gives the compiler a big file with all the class definitions inline (the situation we had previously).
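
In other words, even the usual class-per-file layout still yields a single compilation unit per package (a sketch, with made up file names):

// File: package0.sv
package package0;
  `include "some_base_class.svh"
  `include "some_base_class2.svh"
endpackage

// After pre-processing, the compiler effectively sees:
//
// package package0;
//   class some_base_class;
//   endclass
//   class some_base_class2;
//   endclass
// endpackage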

We can do another experiment to convince ourselves that builds aren't organized by files. Let's put two packages inside the same file:

package package0;
endpackage

package package1;
endpackage

Let's modify package1 by adding a new variable:

package package0;
endpackage

package package1;
  bit some_var;
endpackage

When rebuilding, we'll notice that only package1 gets rebuilt, but package0 is left alone. (This is also the behavior we would have liked to have for classes inside a package.)

Now let's also modify package0 by adding a variable to it:

package package0;
  bit some_var;
endpackage

package package1;
  bit some_var;
endpackage

When rebuilding, we'll see that package0 is being rebuilt, as we expected, but, surprisingly, so is package1. This is very confusing initially, but obvious once you know the explanation. Because we shifted the lines where package1 and its items are defined in the file, the tool has to update debug information regarding line numbers. This is important for debuggers and for messages that contain line numbers (like assertion errors, $info(...) calls, etc.). This, by the way, is a very good reason to only define one element (package, interface, module) per file.

Let's look at one more thing. Let's take two packages that have a dependency relationship:

package package0;

  class some_base_class;
  endclass

endpackage
package package1;

  import package0::*;

  class some_derived_class extends some_base_class;
  endclass

endpackage

It's clear that changes to package1 shouldn't (and indeed won't) cause rebuilds of package0. It's also clear that changing some_base_class will have to trigger a rebuild of package1. Now, let's add a new class to package0:

package package0;

  class some_base_class;
  endclass

  class some_base_class2;
  endclass

endpackage

At this point, we shouldn't be surprised anymore that both packages are rebuilt in this case. This is because the tool only understands changes at the package level. package1 depends on package0, so any change to package0 will lead to a rebuild of package1, regardless of whether this is really needed. Unfortunately, this isn't the behavior we would like to have.

Contrast the way SystemVerilog builds work to C++, where files are compiled individually and are linked together in a separate step (a gross over-simplification). Changes to one class don't cause recompiles of other classes in the same namespace, if the two classes are unrelated. This is because C++ classes are split between the header (which declares which functions a class provides) and the implementation (which contains the function bodies). A class that depends on another class includes its header, to let the compiler know that it relies on the other class. Only changes in a class's header cause recompiles of dependent classes, while changes to its implementation don't. Because of this setup, C++ builds are much more powerful when it comes to build avoidance, by only rebuilding the parts that they absolutely have to build. This allows for guidelines that incremental builds should take between 5-10 seconds and that full builds (including tests) should take between 1-2 minutes, according to http://www.bitsnbites.eu/faster-c-builds/, numbers which are incredibly low by SystemVerilog standards, where merely starting the simulator takes double digit numbers of seconds.

Case study

The classic testbench structure for an IP block consists of one or more interface verification components (IVCs), that contain code related to the signal protocols used by the design, and one module verification component (MVC), that contains code for aspects related to the design functionality.

IVCs typically consist of a package and one or more interfaces. We don't usually make changes to the IVCs, so once we've built them via a full build, they won't have any impact on subsequent incremental builds.

Most of our work is focused on the MVC. As we've seen above, if we place our MVC code into one package, then any change we make to it will trigger a new build, because of the way SystemVerilog tools handle incremental builds. This isn't going to be very efficient, as an incremental build of the package after each change will take about as long as a full build.

What would happen if we could split our big MVC package into multiple smaller packages?

It's experiment time again! We'll assume that we can split the code such that building each package takes the same amount of time. We'll also ignore any extra costs from building multiple packages instead of one single package. This means that if an incremental build of the entire mvc package would have taken N seconds, then by splitting it into P packages each of the smaller packages would take N/P seconds to build. We'll also assume that we are just as likely to make changes to any of the smaller packages. This means that the probability of changing any package is 1/P.

Let's assume that we can create two independent packages, p0 and p1. We can misuse UML to visualize the package topology:

[UML package diagram: p0 and p1 as two independent packages]

Any change we make to p0 won't cause rebuilds of p1 and vice-versa. We can compute the average incremental build time in this case. Building any of the packages takes N/2 seconds, but we do it only half of the time (since in the other half we change the other package). The average incremental build time is the mean: N/2 * 1/2 + N/2 * 1/2 = N/2. By splitting the code into two independent packages, we've managed to halve our incremental build time. It's not very realistic, though, that we could manage to do such a split on a real project.

Let's have a look at something closer to reality. Let's assume that we can split our MVC into two packages, p0 and p1, but p1 depends on p0:

[UML package diagram: p1 depends on p0]

An incremental build of p1 would still take only N/2 seconds, because changing anything in p1 doesn't have any effect on p0. A change in p0 would mean that we also have to rebuild p1, which means that it would take N/2 + N/2 = N seconds. On average, we would need N/2 * 1/2 + N * 1/2 = 3/4 * N seconds.

We should try to structure our code in such a way as to increase the number of packages without any dependencies to each other. Let's say we can split p1 from the previous example into two independent packages, p1_0 and p1_1:

[UML package diagram: p1_0 and p1_1 both depend on p0, but not on each other]

In this case, changing anything in either p1_0 or p1_1 would take N/3 seconds. A change in p0 would require all three packages to be rebuilt and would take the full N seconds. On average, a change would take N/3 * 1/3 + N/3 * 1/3 + N * 1/3 = 5/9 * N seconds.

We could go on further with deeper package hierarchies, but I think you get the idea.

MVC code lends itself nicely to such a structure. We typically have some "common" code that models the basics of our DUT, from which we can model different higher level aspects, relating to the features of the DUT. We would use our models inside checks or coverage, which could be built independently from each other:

[UML package diagram: checks and coverage packages built independently on top of the common model code]
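
A sketch of such a package topology (the package names are made up):

package mvc_common_pkg;        // basic model of the DUT
endpackage

package mvc_feature_x_pkg;     // models one higher level aspect
  import mvc_common_pkg::*;
endpackage

package mvc_checks_pkg;        // checks, independent of coverage
  import mvc_feature_x_pkg::*;
endpackage

package mvc_coverage_pkg;      // coverage, independent of checks
  import mvc_feature_x_pkg::*;
endpackage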

Conclusions

Splitting code across multiple packages will generally be better for compilation speed. There are also other advantages. It could make the code base easier to understand, by grouping code by theme (code for X goes in package p_x, code for Y goes in package p_y). It could also make development easier, by allowing developers to specialize in only a handful of the packages, instead of having to deal with the entire code base.

Having to manage multiple packages brings its own set of challenges, though. It could make the code base more difficult to understand if the boundaries between packages are arbitrary (where does code for X go, in p0 or p1?). More packages, especially when they have intricate dependency relationships, also make compilation more difficult to set up.

I'm not going to recommend making one package per class, just to improve build times. Ideally, SystemVerilog compilers should evolve to better handle incremental compilation, by working at levels lower than just packages. At the same time, you should care about turnaround time, so dumping all code into one package shouldn't be your default mode of operation.

Sunday, June 25, 2017

Testing SVA Properties and Sequences

After a pretty long absence, it’s finally time to complete the series on unit testing interface UVCs. I meant to write this post in October/November 2016. While writing the code, I got bogged down by a simulator bug and tried to find an elegant work around, but failed. I got frustrated and shelved the work for a while. In the meantime I got caught up with technical reading and with taking online courses. I’ve also been pretty busy at work, putting in quite a bit of overtime, which left me without much energy to do anything blog related. Well, enough excuses, it’s time to get to it…

Aside from monitoring and driving signals, an interface UVC is also responsible for checking that the DUT conforms to the protocol specification. This is typically done with SystemVerilog assertions, which provide a compact and powerful syntax for describing temporal behavior at the signal level. I already wrote a bit about unit testing SVAs using SVAUnit, a library from our friends at AMIQ Consulting.

I’ve since then had a bit of an epiphany while working on an interface UVC for a new proprietary on-chip interface. On a previous module, I needed to write some bigger SVA properties of the type “when register X is written via AHB, then Y happens”. The AHB UVC I was using (also written by me a while back) only had assertions embedded in a checker, but it didn’t export any SVA sequences for users to combine into their own properties. I ended up doing a bit of clipboard based inheritance and defining the needed sequences in my module SVA checker. I had a similar problem when I was trying to write some simple formal properties for a different module, where I also just copied (gasp!) and patched the needed code. Now, AHB isn’t such a complicated and/or dynamic protocol, but my actions were in direct violation of the DRY principle. For the new UVC I was developing, I decided to not make the same mistake and build a nice hierarchy of SVA sequences, properties and assertions. These would form part of the UVC API, sort of an SVA API which users could “call” in their own code. Like any important part of the exported API, such members have to be unit tested.

Let’s look at some simple AHB SVA constructs. Since this protocol is so popular, I assume that most of you are already acquainted with it and for those of you who aren’t, I’ll try to keep things simple, but I won’t explain any protocol details. Since the spec isn’t open, I can’t link it here, but a quick search should net you some useful resources. Nevertheless, I’m pretty sure you’ll be able to follow the post without becoming an expert in the protocol.

One of the first non-trivial protocol rules for AHB is that “[when] the slave is requesting wait states, the master must not change the transfer type […]”. Let’s write a property for this with the following signature:

property trans_held_until_ready(HTRANS, HREADY);
  // ...
endproperty

In the previous post we saw how we can use the expect statement to check that signal toggles happen as desired. The unit test supplied the property, while the code being tested handled the signals. To check a property, we can reverse the two roles and have the test supply the signal toggles, while the code under test is exactly our property of interest. The same `FAIL_UNLESS_PROP(…) macro can help us check if the property passes for a legal trace:

    `SVTEST(trans_held_until_ready__trans_stable__passes)
      cb.HTRANS <= NONSEQ;
      cb.HREADY <= 0;
      cb.HREADY <= ##3 1;

      `FAIL_UNLESS_PROP(trans_held_until_ready(HTRANS, HREADY))
    `SVTEST_END

I’ve omitted the definitions of the signals, which are bundled in a clocking block called cb. Declaring this clocking block as default also allows us to use the cycle delay operator, ##n, which makes the code a bit more readable.
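
As a reminder, the `FAIL_UNLESS_PROP(…) macro from the previous post wraps an expect statement and flags an error when the property fails; roughly (a sketch, not the exact definition from that post):

`define FAIL_UNLESS_PROP(prop) \
  expect (prop) \
  else \
    `FAIL_IF(1)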

Just checking that properties pass is rather boring though. Not only that, but a property that doesn’t pass when it should results in a false negative, which is instantly visible in the simulation log. It’s much more valuable that a property fail when it should, because false positives are much more insidious and less likely to be caught. We can do this also with an expect statement, but we’ll need to trigger a fail if the corresponding pass block is triggered. As with `FAIL_UNLESS_PROP(…), we can wrap this check inside a macro:

`define FAIL_IF_PROP(prop) \
  expect (prop) \
    `FAIL_IF(1)

Having HTRANS change before an occurrence of HREADY should cause our property to fail:

    `SVTEST(trans_held_until_ready__trans_changes__fails)
      cb.HTRANS <= NONSEQ;
      cb.HREADY <= 0;
      cb.HTRANS <= ##3 SEQ;

      `FAIL_IF_PROP(trans_held_until_ready(HTRANS, HREADY))
    `SVTEST_END

Here’s how a property that satisfies both tests could look:

property trans_held_until_ready(HTRANS, HREADY);
  HTRANS inside { NONSEQ, SEQ } |=>
    $stable(HTRANS) throughout HREADY [->1];
endproperty

Those of you who’ve worked with AHB before might raise an eyebrow looking at that code. What if, for example, HTRANS comes together with HREADY and changes in the following cycle? The property shouldn’t fail, as the first transfer was accepted and a new one can begin. We can add a test for this exact situation:

    `SVTEST(trans_held_until_ready__instant_grant__passes)
      cb.HTRANS <= NONSEQ;
      cb.HREADY <= 1;
      cb.HTRANS <= ##1 SEQ;

      `FAIL_UNLESS_PROP(trans_held_until_ready(HTRANS, HREADY))
    `SVTEST_END

With this test, we can fix the property:

  property trans_held_until_ready(HTRANS, HREADY);
    HTRANS inside { NONSEQ, SEQ } |->
      HREADY or ##1 ($stable(HTRANS) throughout HREADY [->1]);
  endproperty

Some of you might argue that when HTRANS comes at the same time as HREADY, we’re not really “holding” anything. It could be argued that this is a special case and what we’re really interested in for this property are the cases where we actually see some wait states. We could exclude the instant grant case by tweaking the antecedent in such a way that it doesn’t match. This would lead to a vacuous pass of the property. A vacuous pass means that we haven’t really checked anything, because we didn’t particularly care what happened in that situation. Vacuous passes aren’t usually shown in the assertion statistics, so we could “misuse” the number of times an assertion of this property passes as coverage for how many times we’ve seen (correctly) stalled transfers.

A vacuous pass is still a pass, though, and as per the LRM it should also trigger the execution of an assert/expect statement’s pass block. Some tools don’t work like this, choosing instead to not execute pass blocks on vacuous successes (unless the user explicitly enables this, maybe via some command line switch or simulator setting). We can use this to our advantage to distinguish between a “real” pass and a vacuous pass. What we have, then, is a sort of ternary logic, where a property can result in one of the following:

  • (nonvacuous) pass, where the pass block is executed
  • fail, where the fail block is executed
  • vacuous pass, where neither block is executed

Note that there’s no concept of vacuous fails. Something either works, doesn’t work, or isn’t “important”.

Even if a simulator does execute pass blocks for vacuous successes, this behavior can either be turned off via a switch or, in a more portable fashion, via the $assertcontrol(…) system task (if it’s supported by the tool). This means that we can rather safely rely on the behavior described in the outcome list to determine vacuity.

As before, we can wrap such a check inside a macro. Its definition is a bit trickier, since we need to check that neither block was executed. We can do this using variables:

`define FAIL_UNLESS_PROP_VAC(prop) \
  begin \
    bit pass_called; \
    bit fail_called; \
    \
    expect (prop) \
      pass_called = 1; \
    else \
      fail_called = 1; \
    \
    if (pass_called || fail_called) \
      `FAIL_IF(1) \
  end

If any of the two blocks gets executed, one of the variables will be set and we can issue an error. This code, while deceptively simple, fails to compile in some simulators, with them complaining that they can’t find the definition of pass_called inside the pass block (and, of course, the same for fail_called). This is the part where I got bogged down trying to find a suitable workaround. The only way I could get this to work was to define the *_called variables inside a package and use the scope operator to reference them in the pass/fail blocks:

package vgm_svunit_utils_sva;

  bit pass_called;
  bit fail_called;

endpackage

This seems rather crazy, because not only does it require a user to include the file with the macro definition, but to also compile the extra support package. It’s a bit much for just a couple of measly variables, but it’s either this or nothing…

Since we’re going to rely on global variables with a persistent lifetime, we’ll need to make sure to set them to 0 before executing the expect:

`define FAIL_UNLESS_PROP_VAC(prop) \
  begin \
    vgm_svunit_utils_sva::pass_called = 0; \
    vgm_svunit_utils_sva::fail_called = 0; \
    \
    expect (prop) \
      vgm_svunit_utils_sva::pass_called = 1; \
    else \
      vgm_svunit_utils_sva::fail_called = 1; \
    \
    if (vgm_svunit_utils_sva::pass_called || \
        vgm_svunit_utils_sva::fail_called) \
      `FAIL_IF(1) \
  end

Using this macro, we can tweak our test for instantly granted transfers to require a vacuous pass for the property:

    `SVTEST(trans_held_until_ready__instant_grant__vacuous)
      cb.HTRANS <= NONSEQ;
      cb.HREADY <= 1;
      cb.HTRANS <= ##1 SEQ;

      `FAIL_UNLESS_PROP_VAC(trans_held_until_ready(HTRANS, HREADY))
    `SVTEST_END

This will also mean that we have to fix the property:

property trans_held_until_ready(HTRANS, HREADY);
  HTRANS inside { NONSEQ, SEQ } && !HREADY |=>
    $stable(HTRANS) throughout HREADY [->1];
endproperty

Let’s take a step back now. Remember that for the `FAIL_IF_PROP(…) macro we were checking whether the pass block is executed and if it was we issued an error. This doesn’t fit into the whole “ternary logic” scheme we discussed above when talking about vacuity. If we only did this, we wouldn’t be able to distinguish a fail from a vacuous pass. The macro name is also kind of misleading. What do we want here? Do we want the property to fail? Do we want it to fail or be vacuous, but under no circumstances result in a nonvacuous pass? What I intended was the former, but others could just as well interpret it as the latter.

More explicit macros would better clarify our intent. In this case, what we want is a `FAIL_UNLESS_PROP_FAIL(…) macro, because we are explicitly checking that the property can catch errors. What we should check is that the fail block gets executed:

`define FAIL_UNLESS_PROP_FAIL(prop) \
  begin \
    vgm_svunit_utils_sva::fail_called = 0; \
    \
    expect (prop) \
    else \
      vgm_svunit_utils_sva::fail_called = 1; \
    \
    if (!vgm_svunit_utils_sva::fail_called) \
      `FAIL_IF(1) \
  end

In some cases, we would want to forbid a property to fail, but it wouldn’t be important if the pass is vacuous or not. Here we would have a `FAIL_IF_PROP_FAIL(…) macro, that checks that the fail block wasn’t executed. There are six such macros that we can define, a pair for each of the three possible outcomes. We won’t look at all their definitions here, but their construction is pretty simple now that we know the behavior of the pass/fail blocks.
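
As an example of one of the six, `FAIL_IF_PROP_FAIL(…) could look like this (a sketch):

`define FAIL_IF_PROP_FAIL(prop) \
  begin \
    vgm_svunit_utils_sva::fail_called = 0; \
    \
    expect (prop) \
    else \
      vgm_svunit_utils_sva::fail_called = 1; \
    \
    if (vgm_svunit_utils_sva::fail_called) \
      `FAIL_IF(1) \
  end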

It’s time for another realization about our property: the trigger condition is slightly off. Once we assert it, what’s going to happen is that a new evaluation thread is going to be started on each clock cycle where a transfer is stalled. All of these threads are going to run in parallel, perform the same check and end at the very same time – when HREADY finally comes. This isn’t good for performance, especially if we have many AHB interfaces and very long stalls.

The start of an AHB transaction is a pretty interesting event, not only for this property, but potentially for others. A UVC user might be interested in writing their own property that triggers once a transfer has started. We can fix our property and at the same time provide a reusable building block by defining a sequence:

sequence trans_started(HTRANS);
  // ...
endsequence

Just as for properties we wanted to check whether they fail or pass when we want them to, for sequences we want to make sure that they match when they should and don’t match when they shouldn’t. We can check for a sequence match (or lack thereof) by treating it as a property and checking its pass/fail state. Doing the following would look a bit weird, though:

`FAIL_UNLESS_PROP_PASS(trans_started(HTRANS))

The intent of the code becomes a bit muddied: “Are we testing a property? But I thought trans_started(…) was a sequence…”. It would be better to have separate macros for sequence testing:

`define FAIL_IF_SEQ(seq) \
  `FAIL_UNLESS_PROP_FAIL(seq ##0 1)

`define FAIL_UNLESS_SEQ(seq) \
  `FAIL_UNLESS_PROP_PASS(seq ##0 1)

Using these will make the code a bit clearer. Also, notice the extra ##0 1 fused after the sequence. This is to ensure that we can’t accidentally pass a property as an argument, because the fusion operator will cause a compile error unless what comes before it is a sequence.

Coming back to our trans_started(…) sequence, the first thing we would like it to do is to not match when HTRANS is IDLE:

    `SVTEST(trans_started__idle__doesnt_match)
      cb.HTRANS <= IDLE;
      `FAIL_IF_SEQ(trans_started(HTRANS))
    `SVTEST_END

We would also like it to match when HTRANS becomes active after an IDLE:

    `SVTEST(trans_started__coming_from_idle__matches)
      cb.HTRANS <= IDLE;
      ##1;

      cb.HTRANS <= NONSEQ;

      `FAIL_UNLESS_SEQ(trans_started(HTRANS))
    `SVTEST_END

With our tests in place, we can write the sequence implementation that fulfils them:

sequence trans_started(HTRANS);
  HTRANS inside { NONSEQ, SEQ } &&
    $past(HTRANS inside { IDLE, BUSY });
endsequence

Something still doesn’t feel right, though. What about back to back transfers? It’s perfectly legal to start a new transfer immediately after the previous one was granted. In this case, there isn’t any IDLE cycle to use as an anchor. What we can, however, use is the occurrence of HREADY in the previous cycle, which we’ll have to feed to the sequence. Here’s how this could be tested:

    `SVTEST(trans_started__after_done_trans__matches)
      cb.HTRANS <= NONSEQ;
      cb.HREADY <= 1;
      ##1;

      `FAIL_UNLESS_SEQ(trans_started(HTRANS, HREADY))
    `SVTEST_END

The fixed sequence would then be:

sequence trans_started(HTRANS, HREADY);
  HTRANS inside { NONSEQ, SEQ } &&
    $past(HTRANS inside { IDLE, BUSY } || HREADY);
endsequence

If we try to run this in the simulator, though, the test is still going to fail, even though there isn’t anything obviously wrong with our fix. This is because the test contains a very subtle mistake. When the underlying expect statement from the `FAIL_*(…) macro kicks in, in its very first cycle the value returned by $past(HREADY) is 0, because we haven’t actually let it run long enough for there to have been an actual previous cycle. The LRM states that in this case $past(…) returns the default value of the expression passed to it. What we need to do is move the delay operator into the `FAIL_*(…) macro call, to allow the expect to sample HREADY first and look for a match of trans_started(…) afterwards:

    `SVTEST(trans_started__after_done_trans__matches)
      cb.HTRANS <= NONSEQ;
      cb.HREADY <= 1;

      `FAIL_UNLESS_SEQ(##1 trans_started(HTRANS, HREADY))
    `SVTEST_END

This way the test does what we intend it to do. We can now instantiate the sequence inside our trans_held_until_ready(…) property to have it trigger at the appropriate times. Since trans_started(…) has already been tested, we don’t have to write too many tests for the property, focusing just on what’s important. Also, by breaking the problem into smaller parts it’s easier to notice what the corner cases might be. As we’ve seen, writing even such a small property can be tricky, so we should make sure that our code works before trusting it to find design bugs.
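
For reference, one way the property could instantiate the sequence as its trigger would be the following (a sketch; the exact final form isn’t shown in this post):

property trans_held_until_ready(HTRANS, HREADY);
  trans_started(HTRANS, HREADY) ##0 !HREADY |=>
    $stable(HTRANS) throughout HREADY [->1];
endproperty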

Regarding the testing of assertions, I’m not saying that this isn’t important as well. A lot of the more complicated assertions we need to write will have to rely on support code (for example when pipelining comes into the mix) and we’re going to want to check that all parts of an assertion – the property, the support code and the connections between them – fit properly together. For protocol assertions and other simple assertions, I favor breaking down into smaller properties and sequences and testing those, not only to make testing easier, but to also provide a set of reusable elements that UVC users can integrate into their own code.

You can find the full code for the examples here and you can also download the SVUnit utils package if you want to start using these techniques for your own code.

There’s still some work to be done regarding unit tests and SVA constructs. For one, we strongly relied on the assumption that vacuous passes don’t trigger action blocks. We could add some code that tests this assumption by doing a trial run of a known vacuous property (e.g. 0 |-> 1). If this isn’t the case, we could try calling the $assertcontrol(…) system task to disable vacuous success execution of pass blocks, if the task is available. Finally, if all else fails, we could inform the user to change the tool invocation to match our required behavior. This plan makes me feel less bad about the extra *_utils_sva package, which we had to use for the workaround with the status variables, because this is where we’d put all of this extra code. I’d also like to see this code merged into SVUnit at some point, but I’m not sure if now is the right time, due to the differences in tool capabilities across simulator vendors.

Now I’d like to conclude this series on unit testing UVCs. The tips in the past few posts should help you develop your UVCs faster and with higher quality, thereby increasing your confidence that they are ready for life in the harsh and unforgiving world of verification.

Monday, August 15, 2016

Testing UVM Drivers, Part 2

In the previous post we looked at how we can emulate sequencer/driver communication using a lightweight stub of uvm_sequencer. Let's also look at some more tips and tricks I've picked up while writing unit tests for drivers. To mix things up a bit, let's look at the AXI protocol. We're not going to implement a full featured driver; instead, we'll focus on the write channels:

interface vgm_axi_interface(input bit ACLK, input bit ARESETn);
  logic [3:0] AWID;
  logic [31:0] AWADDR;
  logic [3:0] AWLEN;
  logic AWVALID;
  logic AWREADY;

  logic [3:0] WID;
  logic [31:0] WDATA;
  logic WLAST;
  logic WVALID;
  logic WREADY;

  logic [3:0] BID;
  logic [1:0] BRESP;
  logic BVALID;
  logic BREADY;
endinterface

Our sequence item will model the properties of a write transaction:

typedef enum bit [3:0] { LENGTH_[1:16] } length_e;


typedef class transfer;

class sequence_item extends uvm_sequence_item;
  rand bit [3:0] id;
  rand bit [31:0] address;
  rand length_e length;
  rand transfer transfers[];
  rand int unsigned delay;

  // ...
endclass


class transfer extends uvm_sequence_item;
  rand bit[31:0] data;
  rand int unsigned delay;

  // ...
endclass

The driver will consist of the familiar get and drive loop:

class master_driver extends uvm_driver #(sequence_item);
  virtual vgm_axi_interface intf;

  virtual protected task get_and_drive();
    forever begin
      seq_item_port.get_next_item(req);
      drive();
      seq_item_port.item_done();
    end
  endtask

  // ...
endclass

We'll implement the driver based on the following unit tests. Have a quick look at them before going further:

    `SVTEST(drive_awvalid__with_delay)
      sequence_item item = new("item");
      `FAIL_UNLESS(item.randomize() with {
        delay == 3;
      })
      sequencer.add_item(item);

      repeat (3)
        @(posedge clk) `FAIL_UNLESS(intf.AWVALID === 0)
      @(posedge clk) `FAIL_UNLESS(intf.AWVALID === 1)
    `SVTEST_END


    `SVTEST(drive_awvalid__held_until_hready)
      sequence_item item = new("item");
      `FAIL_UNLESS(item.randomize() with {
        delay == 0;
      })
      sequencer.add_item(item);
      intf.AWREADY <= 0;

      repeat (4)
        @(posedge clk) `FAIL_UNLESS(intf.AWVALID === 1)

      intf.AWREADY <= 1;
      @(posedge clk) `FAIL_UNLESS(intf.AWVALID === 1)
      @(posedge clk) `FAIL_UNLESS(intf.AWVALID === 0)
    `SVTEST_END


    `SVTEST(drive_write_addr_channel)
      sequence_item item = new("item");
      `FAIL_UNLESS(item.randomize() with {
        id == 5;
        address == 'h1122_3344;
        length == LENGTH_14;
        delay == 0;
      })
      sequencer.add_item(item);

      @(posedge clk);
      `FAIL_UNLESS(intf.AWID == 5)
      `FAIL_UNLESS(intf.AWADDR == 'h1122_3344)
      `FAIL_UNLESS(intf.AWLEN == 'b1101)
    `SVTEST_END


    `SVTEST(drive_wvalid__with_delay)
      sequence_item item = new("item");
      `FAIL_UNLESS(item.randomize() with {
        delay == 0;
        length == LENGTH_4;
        transfers[0].delay == 1;
        transfers[1].delay == 3;
        transfers[2].delay == 2;
        transfers[3].delay == 0;
      })
      sequencer.add_item(item);

      // Skip over address phase
      @(posedge clk);

      @(posedge clk) `FAIL_UNLESS(intf.WVALID === 0)
      @(posedge clk) `FAIL_UNLESS(intf.WVALID === 1)

      repeat (3)
        @(posedge clk) `FAIL_UNLESS(intf.WVALID === 0)
      @(posedge clk) `FAIL_UNLESS(intf.WVALID === 1)

      repeat (2)
        @(posedge clk) `FAIL_UNLESS(intf.WVALID === 0)
      @(posedge clk) `FAIL_UNLESS(intf.WVALID === 1)

      @(posedge clk) `FAIL_UNLESS(intf.WVALID === 1)
    `SVTEST_END


    `SVTEST(drive_write_data_channel)
      sequence_item item = new("item");
      `FAIL_UNLESS(item.randomize() with {
        delay == 0;
        length == LENGTH_4;
        transfers[0].data == 'hffff_ffff;
        transfers[1].data == 'h0000_0000;
        transfers[2].data == 'haaaa_aaaa;
        transfers[3].data == 'h5555_5555;
        transfers[0].delay == 0;
        transfers[1].delay == 0;
        transfers[2].delay == 0;
        transfers[3].delay == 0;
      })
      sequencer.add_item(item);

      // Skip over address phase
      @(posedge clk);

      @(posedge clk) `FAIL_UNLESS(intf.WDATA === 'hffff_ffff)
      @(posedge clk) `FAIL_UNLESS(intf.WDATA === 'h0000_0000)
      @(posedge clk) `FAIL_UNLESS(intf.WDATA === 'haaaa_aaaa)
      @(posedge clk) `FAIL_UNLESS(intf.WDATA === 'h5555_5555)
    `SVTEST_END


    `SVTEST(drive_write_data_channel__data_held_until_wready)
      sequence_item item = new("item");
      `FAIL_UNLESS(item.randomize() with {
        delay == 0;
        length == LENGTH_4;
        transfers[0].data == 'hffff_ffff;
        transfers[1].data == 'h0000_0000;
        transfers[2].data == 'haaaa_aaaa;
        transfers[3].data == 'h5555_5555;
        transfers[0].delay == 0;
        transfers[1].delay == 0;
        transfers[2].delay == 0;
        transfers[3].delay == 0;
      })
      sequencer.add_item(item);

      // Skip over address phase
      @(posedge clk);

      intf.WREADY <= 0;
      repeat (3)
        @(posedge clk) `FAIL_UNLESS(intf.WDATA === 'hffff_ffff)
      intf.WREADY <= 1;
      @(posedge clk) `FAIL_UNLESS(intf.WDATA === 'hffff_ffff)

      intf.WREADY <= 0;
      @(posedge clk) `FAIL_UNLESS(intf.WDATA === 'h0000_0000)
      intf.WREADY <= 1;
      @(posedge clk) `FAIL_UNLESS(intf.WDATA === 'h0000_0000)

      intf.WREADY <= 0;
      repeat (2)
        @(posedge clk) `FAIL_UNLESS(intf.WDATA === 'haaaa_aaaa)
      intf.WREADY <= 1;
      @(posedge clk) `FAIL_UNLESS(intf.WDATA === 'haaaa_aaaa)

      intf.WREADY <= 0;
      repeat (4)
        @(posedge clk) `FAIL_UNLESS(intf.WDATA === 'h5555_5555)
      intf.WREADY <= 1;
      @(posedge clk) `FAIL_UNLESS(intf.WDATA === 'h5555_5555)
    `SVTEST_END


    `SVTEST(drive_write_data_channel__wlast_driven_for_last_transfer)
      sequence_item item = new("item");
      `FAIL_UNLESS(item.randomize() with {
        delay == 0;
        length == LENGTH_8;
        foreach (transfers[i])
          transfers[i].delay == 0;
      })
      sequencer.add_item(item);

      // Skip over address phase
      @(posedge clk);

      repeat (7)
        @(posedge clk) `FAIL_IF(intf.WLAST)
      @(posedge clk) `FAIL_UNLESS(intf.WLAST)
    `SVTEST_END

The tests above cover the desired functionality that we want to implement in our driver. We won't go through each and every one of them and see what production code we need to write, since the actual implementation of the driver isn't really important for this post. We want to focus more on the tests themselves.

When first confronted with these tests, a new developer won't have an easy time understanding what's going on. The tests are pretty verbose and it's not immediately clear what the focus of each one is. Let's see how we could improve this.

First, we'll notice that the one thing we do in every test is create an item and queue it for the driver. We use randomization to make sure that the item is "consistent", meaning that the constraints defined in it hold. We could set the item variables procedurally, but then we would have to ensure ourselves that the length and the number of transfers match, which would mean even more code. Instead of repeating these steps in each unit test, we could centralize them in one place. Since randomization with inline constraints is involved and a constraint block can't be passed to a function, we can't extract a function. We're forced to use a macro:

  `define add_item_with(CONSTRAINTS) \
    sequence_item item = new("item"); \
    `FAIL_UNLESS(item.randomize() with CONSTRAINTS) \
    sequencer.add_item(item);

This is going to save us some lines of code and it's going to make the intent of the code more obvious. What will differ between unit tests are the constraints they use when adding an item:

    `SVTEST(drive_awvalid__with_delay)
      `add_item_with({
        delay == 3;
      })

      // ...
    `SVTEST_END


    `SVTEST(drive_awvalid__held_until_hready)
      `add_item_with({
        delay == 0;
      })

      // ...
    `SVTEST_END


    `SVTEST(drive_write_addr_channel)
      `add_item_with({
        id == 5;
        address == 'h1122_3344;
        length == LENGTH_14;
        delay == 0;
      })

      // ...
    `SVTEST_END

We can even go one step further. If we look at the tests again, we'll notice that we use the same constraint over and over again to get an item without any delay. Also, whenever we want to check some write data channel aspect, we constrain the delay of each transfer to zero. We do have tests that want non-zero delays, but they are the exceptional cases. What we could do is add default values to the delay variables via soft constraints. This way, whenever we use the `add_item_with(...) macro, we know that we'll get an item without any delay, unless a test explicitly constrains the delays otherwise:

  `define add_item_with(CONSTRAINTS) \
    sequence_item item = new("item"); \
    `FAIL_UNLESS(item.randomize() with { \
      soft delay == 0; \
      foreach (transfers[i]) \
        soft transfers[i].delay == 0; \
      \
      if (1) \
        CONSTRAINTS \
    }) \
    sequencer.add_item(item);

(The if (1) wrapper is needed because the CONSTRAINTS argument arrives as a brace-delimited constraint set, which isn't legal on its own inside the outer constraint block; guarding it with a condition that is always true makes it legal without changing its meaning.) This way, we can write even more compact code to add items. Here are the two data tests I mentioned:

    `SVTEST(drive_write_data_channel)
      `add_item_with({
        length == LENGTH_4;
        transfers[0].data == 'hffff_ffff;
        transfers[1].data == 'h0000_0000;
        transfers[2].data == 'haaaa_aaaa;
        transfers[3].data == 'h5555_5555;
      })

      // ...
    `SVTEST_END


    `SVTEST(drive_write_data_channel__data_held_until_wready)
      `add_item_with({
        length == LENGTH_4;
        transfers[0].data == 'hffff_ffff;
        transfers[1].data == 'h0000_0000;
        transfers[2].data == 'haaaa_aaaa;
        transfers[3].data == 'h5555_5555;
      })

      // ...
    `SVTEST_END

If we compare them with the initial versions from the beginning of the post, we'll see that they are much more compact. It's also clearer that the handling of the data variables is what we're testing, not anything related to delays.

Since we want to check the behavior of signals at certain points in time, we need to do a lot of waits. The statement @(posedge clk) comes up a lot in our unit tests. We could shorten this by using a default clocking block:

module master_driver_unit_test;
  // ...

  default clocking @(posedge clk);
  endclocking
endmodule

Now, instead of using the verbose @(posedge clk) statement, we can use the cycle delay operator:

    `SVTEST(drive_wvalid__with_delay)
      // ...

      // Skip over address phase
      ##1;

      ##1 `FAIL_UNLESS(intf.WVALID === 0)
      ##1 `FAIL_UNLESS(intf.WVALID === 1)

      repeat (3)
        ##1 `FAIL_UNLESS(intf.WVALID === 0)
      ##1 `FAIL_UNLESS(intf.WVALID === 1)

      repeat (2)
        ##1 `FAIL_UNLESS(intf.WVALID === 0)
      ##1 `FAIL_UNLESS(intf.WVALID === 1)

      ##1 `FAIL_UNLESS(intf.WVALID === 1)
    `SVTEST_END

Having fewer words to parse makes each test's intent clearer. Using a default clocking block is an option most of the time, but if you have a more exotic protocol that uses both edges of the clock or multiple clocks, it's not going to work.

One thing you may have noticed is that I've marked some wait statements with comments. If you've read Clean Code (and if you haven't, you should), you'll call me out on this. Uncle Bob says that comments are a crutch for poorly written code that doesn't express its intent properly. Instead of relying on comments, we could create a named task:

  task wait_addr_phase_ended();
    ##1;
  endtask

Now, when a test calls this task, it'll be immediately apparent what the intention is:

    `SVTEST(drive_write_data_channel)
      `add_item_with({
        // ...
      })

      wait_addr_phase_ended();

      // ...
    `SVTEST_END

There is a mismatch between what the task does and its name: the task actually just waits for one cycle. To reflect this, it should have been named wait_cycle(). A task call like that would take us back to square one in terms of test readability; we might as well just use the ##1 statement, since it tells us the same thing. If we want to resolve the mismatch between the task's name and its implementation, we should change the latter. In the context of our tests, we knew that the address phase was going to end after one clock cycle. Generally, though, what we want is to wait for AWVALID and AWREADY to be high at the same time:

  task wait_addr_phase_ended();
    @(posedge clk iff intf.AWVALID && intf.AWREADY);
  endtask

Some unit-testing purists might object to this approach because it adds complexity to the testing logic. That's something we should try to avoid, since we don't want to end up having to test our tests. The task is lean enough, though, that I'd say the objection doesn't apply here.

While we may have streamlined our test preparation code, the code that checks our expectations is still ridiculously long. Procedural code isn't well suited to checking behavior over time. You know what is, though? Assertions. Instead of writing a big mess of procedural code full of repeats and cycle delays, we could write a nice property and check that it holds. When people hear the word property, they normally think of concurrent assertions, but that isn't really what we want here. What we want is to check that a certain property holds after a certain point in time, not during the whole simulation.

Luckily, SystemVerilog provides the expect construct, which does exactly what we want. Given a property, it will begin evaluating it starting with the first subsequent clock. For example, to check that the driver can drive data transfers with delays, we could write the following:

    `SVTEST(drive_wvalid__with_delay)
      // ...

      expect (
        intf.WVALID === 0
          ##1 intf.WVALID === 1
          ##1 intf.WVALID === 0 [*3]
          ##1 intf.WVALID === 1
          ##1 intf.WVALID === 0 [*2]
          ##1 intf.WVALID === 1
          ##1 intf.WVALID === 1)
      else
        `FAIL_IF(1)
    `SVTEST_END

This is much cleaner than the procedural code we had before. It also allows us to structure our unit tests according to the Arrange, Act, Assert pattern (for drivers, Arrange and Act tend to get a bit mixed, but at least Assert is clearly separated).

Since expecting that properties hold is something we'll want to do over and over again, let's wrap it in a utility macro and make it part of the vgm_svunit_utils package:

`define FAIL_UNLESS_PROP(prop) \
  expect (prop) \
  else \
    `FAIL_IF(1)

Using this macro will give the unit tests a more consistent SVUnit look and feel:

    `SVTEST(drive_wvalid__with_delay)
      // ...

      `FAIL_UNLESS_PROP(
        intf.WVALID === 0
          ##1 intf.WVALID === 1
          ##1 intf.WVALID === 0 [*3]
          // ...
      )
    `SVTEST_END

As we saw in the previous post, it's very important to use the === (4-state equality) operator instead of ==, otherwise we're writing tests that always pass. I intentionally wrote the drive_write_addr_channel test with this bug to show it (kudos to anyone who noticed). We also need to watch out for hidden comparisons. In the last test we didn't even use an equality operator:

    `SVTEST(drive_write_data_channel__wlast_driven_for_last_transfer)
      // ...

      repeat (7)
        @(posedge clk) `FAIL_IF(intf.WLAST)
      @(posedge clk) `FAIL_UNLESS(intf.WLAST)
    `SVTEST_END

Because of the way the `FAIL_* macros are written, both checks will pass even if WLAST is never driven and stays X, so the test isn't really doing anything. If we were to rewrite the checks using properties, we would notice that WLAST isn't being driven:

    `SVTEST(drive_write_data_channel__wlast_driven_for_last_transfer)
      // ...

      `FAIL_UNLESS_PROP(
        !intf.WLAST [*7]
          ##1 intf.WLAST)
    `SVTEST_END

If WLAST were X, then its negation would also be X, which gets interpreted as 0, causing the property to fail. For single-bit signals, using properties is much safer than comparing for equality, with the added bonus that the code is more compact. For vectors, though, we still need to make sure that we're using the === operator.
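
For example, the address checks in drive_write_addr_channel should have looked like this (the same checks as at the beginning of the post, just with == swapped for ===), so that an undriven bus gets caught:

      `FAIL_UNLESS(intf.AWID === 5)
      `FAIL_UNLESS(intf.AWADDR === 'h1122_3344)
      `FAIL_UNLESS(intf.AWLEN === 'b1101)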

Another cool thing that properties allow us to do is focus on the aspects that are important for a test. For example, in the data_held_until_wready test we want to check that the driver doesn't change the value of WDATA until it sees a corresponding WREADY. We did this by choosing known values for the data transfers and checking that these values stay on the bus for the appropriate number of clock cycles. Actually, we don't (or shouldn't) care what data is being driven, as long as it stays constant. We could simplify the code to the following:

    `SVTEST(drive_write_data_channel__data_held_until_wready)
      `add_item_with({
        length == LENGTH_4;
      })

      // ...

      `FAIL_UNLESS_PROP(
        stable_for(intf.WDATA, 3)
          ##1 stable_for(intf.WDATA, 1)
          ##1 stable_for(intf.WDATA, 2)
          ##1 stable_for(intf.WDATA, 4))
    `SVTEST_END

The stable_for(...) sequence only exists because my simulator complained about using $past(...) inside an expect statement. Its contents could otherwise have been inlined into the property:

  sequence stable_for(signal, int unsigned num_cycles);
    ##1 ($stable(signal) [*num_cycles]);
  endsequence

We don't really need a property to check that a signal stays stable. As per the LRM, we could use the $stable(...) system function inside procedural code as well, but some tools don't allow this, limiting its use to assertions.
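
If your tool does accept it, a rough procedural sketch of the first check could look like this (it relies on the default clocking block defined above to give $stable(...) its sampling clock):

      // Procedural alternative, usable only if the simulator accepts
      // $stable(...) outside of assertions: WDATA must not change
      // over the next 3 cycles
      repeat (3)
        ##1 `FAIL_UNLESS($stable(intf.WDATA))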

While it's usually frowned upon to have randomness in unit tests, I would argue that it isn't necessarily a problem here. Randomness should be avoided because it causes us to write complicated code for our expectations. As long as we don't need to do that (i.e. the checking code stays the same regardless of what we plug into it), everything should be fine. There is one caveat, though. On the off chance that two consecutive transfers contain the same data, it won't be possible to tell whether the driver moved on to the second transfer too early. In the extreme case, all transfers could carry the same data value, making it impossible to check that the driver really holds each transfer constant for its entire duration. This could be solved by constraining the data values to be distinct, as sketched below.
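
A minimal sketch of what that could look like for the LENGTH_4 test. Strictly speaking, the check above only needs consecutive transfers to differ, so the constraint below settles for that instead of full uniqueness:

    `SVTEST(drive_write_data_channel__data_held_until_wready)
      `add_item_with({
        length == LENGTH_4;
        foreach (transfers[i])
          if (i > 0)
            transfers[i].data != transfers[i - 1].data;
      })

      // ...
    `SVTEST_END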

Last, but not least, I hinted in the previous post that a driver not only drives items from the sequencer. It's also supposed to request items from the sequencer at well-defined points in time. For example, a driver should call get_next_item(...) once it's able to process a new item, but not before, to allow the running sequence to randomize items at the latest possible time (so-called late randomization). This is helpful when sequences use the current state of the system to decide what to do next. For simple protocols this is easy: a new item can start exactly after the previous one has finished. For pipelined protocols, though, it's not as easy. The AXI protocol is heavily pipelined and can have many transactions in flight at any time. I don't want to try to pin down a sensible definition of an available slot for a new item, because the scheme would probably be too complicated. I do, however, want to show how we could verify that a driver calls sequencer methods at defined times.
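
As a reminder, this is the usual shape of a UVM driver's main loop (a generic sketch, not the driver developed in this post; drive_item(...) stands in for whatever code actually wiggles the pins):

  virtual task run_phase(uvm_phase phase);
    forever begin
      // Request an item only once the driver can start processing it, so the
      // running sequence gets to randomize it as late as possible.
      seq_item_port.get_next_item(req);
      drive_item(req);
      seq_item_port.item_done();
    end
  endtask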

We'll take a somewhat contrived example based on response handling, since I couldn't think of anything better. Whether we're checking that put_response(...) or get_next_item(...) was called when expected doesn't really matter, so this example should be enough to prove the point. Let's say that when a response comes in on the write response channel, the driver is supposed to let the sequencer know.

We'll have another sequence item for the response:

typedef enum { OKAY, EXOKAY, SLVERR, DECERR } response_kind_e;

class response extends uvm_sequence_item;
  bit [3:0] id;
  response_kind_e kind;

  // ...
endclass

First of all, we'll want to check that the response contains the proper values:

    `SVTEST(put_response_when_ready)
      response rsp;

      intf.BID <= 5;
      intf.BVALID <= 1;
      ##1;

      intf.BREADY <= 1;
      ##1;

      uvm_wait_for_nba_region();
      `FAIL_UNLESS(sequencer.try_get_rsp(rsp))
      `FAIL_UNLESS(rsp.id == 5)
    `SVTEST_END

The driver waits for a response to be seen on the bus and calls put_response(...) when this is the case:

class master_driver extends uvm_driver #(sequence_item, response);
  // ...

  virtual protected task collect_and_put();
    forever begin
      collect();
      seq_item_port.put_response(rsp);
    end
  endtask

  virtual protected task collect();
    @(posedge intf.ACLK iff intf.BVALID);
    rsp = response::type_id::create("rsp");
    rsp.id = intf.BID;
  endtask
endclass

The test is going to pass with this implementation. The problem, though, is that a response is only considered done once it has also been accepted, i.e. once BREADY goes high. The code above sends responses too early. We'll need to update our unit test to check that put_response(...) was only called after BREADY was also high and that it was called exactly once. This is where the test diagnostic information that sequencer_stub provides comes in handy:

    `SVTEST(put_response_when_ready)
      // ...

      intf.BREADY <= 1;
      ##1;

      `FAIL_UNLESS_TRIGGERED(sequencer.put_response_called)
      `FAIL_UNLESS(sequencer.num_put_response_calls == 1)

      // ...
    `SVTEST_END

The sequencer_stub class contains a named event, put_response_called, that, as the name suggests, is triggered when put_response(...) is called by the driver. The `FAIL_UNLESS_TRIGGERED(...) macro is part of the vgm_svunit_utils package. It wraps the code required to check that an event was triggered in the current time step:

`define FAIL_UNLESS_TRIGGERED(ev) \
  fork \
    wait (ev.triggered); \
    \
    begin \
      #1; \
      `FAIL_IF_LOG(1, `"'ev' not triggered`") \
    end \
  join_any \
  disable fork;

The wait statement checks whether the event was already triggered (since the driver code could have executed first) and blocks if it wasn't. If time moves forward and the event still hasn't been triggered, an error is reported. Since the test will now fail, we'll need to update the driver code.
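
A minimal sketch of a possible fix, assuming a response counts as accepted once BVALID and BREADY are seen high together on a rising clock edge:

  virtual protected task collect();
    // Only treat the response as done once it has also been accepted,
    // i.e. BVALID and BREADY are high in the same clock cycle.
    @(posedge intf.ACLK iff intf.BVALID && intf.BREADY);
    rsp = response::type_id::create("rsp");
    rsp.id = intf.BID;
  endtask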

We've covered a lot of ground in this post on how to write more comprehensive, readable and maintainable unit tests for UVM drivers. You can find the example code here. I hope this helps you be more productive when developing your own drivers.