Monday, September 14, 2015

Be More Assertive about Your Testbench Code

Developing verification environments revolves around writing checks. We need to separate the concepts of checking the DUT from checking testbench code. DUT checks represent the "business logic" of our verification software. The code we write isn't perfect, though. Sprinkling the testbench with checks of its own helps to ensure its correctness by catching programming errors at their source.

Both SystemVerilog and e provide language constructs to reason about the DUT's behavior. In SystemVerilog we have the assert keyword, while e programmers use check in procedural code and expect to verify temporal behavior. These keywords are tightly integrated with EDA tools, allowing users to tag individual checks, inspect their states (for example, using an assertion browser) or annotate them to their verification plans.

If you've ever read up on SystemVerilog, chances are you've seen code snippets similar to this one:

byte some_var;
assert (std::randomize(some_var) with { some_var == 1000; });

Checking the return value of randomize() is in general a good idea, because it helps us find cases where we have contradicting constraints. It's pretty clear that randomization will fail in the code snippet above, since a byte is a signed 8-bit type that can only hold values up to 127. It fails because we made a mistake when setting our constraint, resulting in buggy testbench code.

While we will see an error message when executing this code, using assert to implement such checks is not the way to go. This is because the IEEE 1800-2012 LRM states that "Assertions are primarily used to validate the behavior of a design." It also says that the assert statement is supposed to be used "to specify the property as an obligation for the design that is to be checked to verify that the property holds." The fact that the randomization call was successful doesn't relate in any way to the DUT. It is purely a testbench issue, so we shouldn't be using assert to check it.
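For contrast, this is the kind of design-facing check the LRM has in mind for assert (a sketch; the clk, grant and req signals are made up for illustration):

always @(posedge clk)
  // Immediate assertion checking a DUT property: the grant signal
  // should never be active while the request is idle.
  GRANT_WITHOUT_REQ : assert (!(grant && !req))
    else $error("Grant asserted without a pending request.");

Checks like this one describe obligations of the design, which is why tools track them, let us disable them and let us annotate them to verification plans.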

Misusing assert like this causes multiple problems. First, since EDA tools interpret assert statements as DUT checks and track them, any such testbench checks will appear alongside "real" assertions and pollute the overview. This is more of an annoyance than a major problem. The real trouble starts when we realize that assertions can be disabled using the $assertoff(...) system task. If the simulator encounters an $assertoff(...) before executing the randomize() call above, no error gets flagged, since the check is disabled. This means that in cases where we expect assertion errors (like error injection or fault simulations) and disable some DUT checks, we might accidentally disable some of our testbench's checks in the process. Let's also look at what happens when we disable assertions that would pass. Consider the following code snippet:

byte some_var;
assert (std::randomize(some_var) with { some_var == 10; });

This randomize() call will always be successful, but if we were to disable all assertions, then we'd have the nice surprise of seeing that some_var remains 0. This is because the randomize() doesn't get executed anymore. There was also a rumor at one point that some simulators might execute the statement, while others might not, leading to even more potential for inconsistency between vendors (as if there wasn't enough variation in SystemVerilog simulator implementations...). I'm not sure what the status is right now (all the simulators I've tested won't execute the randomize() statement), but I hope this and the other reasons above have convinced you that using assert in this way is a very bad idea.

The assert keyword is also part of the e language, where it's meant to be used to check e code for correct behavior (remember that the keywords to check the design for correct behavior were check and expect). SystemVerilog doesn't have such a language construct dedicated to checking our own code, but then again neither does C. In C, assertions are implemented using the preprocessor. Programmers include the assert.h header, which defines the assert(...) macro. If the expression passed as an argument to the macro fails, an error message is printed which contains the location of the error (file and line) and the program is stopped.

We can implement something similar for SystemVerilog. Since assert is already taken, I've had the not so original idea of calling our macro prog_assert (for program, not progressive). If you've got a better name for it, please let me know in the comments. Our header will be called "prog_assert.svh". The macro needs to check the expression and, in case of a failure, trigger a $fatal(...) call:

`define prog_assert(expr) \
  begin \
    if (!(expr)) \
      $fatal(0, $sformatf("Assertion '%s' failed.", `"expr`")); \
  end

The $fatal(...) message generated by the tool will already contain the location of the failure (the file, line and scope - this is mandated by the standard). In addition to this, we can also print the expression that caused the failure. Let's see the macro in action. Say we want to implement a rectangle class that takes the sides as constructor arguments:

class rectangle;
  extern function new(int unsigned side0, int unsigned side1);
  // ...
endclass

It doesn't make any sense to pass negative numbers for the side lengths, so we can rule them out by declaring the arguments as int unsigned. It also doesn't make any sense to allow either of the sides to be 0. This is something we need to check at run time, when the constructor gets called:

function rectangle::new(int unsigned side0, int unsigned side1);
  `prog_assert(side0 > 0)
  `prog_assert(side1 > 0)
  // ...
endfunction
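As a quick sanity check, here is what a hypothetical caller would see (the test module below is made up for illustration; the passing call proceeds normally, the failing one stops the simulation with the $fatal message):

module rectangle_test;
  initial begin
    rectangle ok_rect, bad_rect;
    ok_rect  = new(10, 5); // passes both checks
    bad_rect = new(0, 5);  // fails `prog_assert(side0 > 0) and calls $fatal
  end
endmodule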

This way we can ensure that the code that is instantiating a rectangle isn't buggy.

Another feature of the C assert "library" is the ability to disable checks for deployed code. The idea behind this is that while software is being developed, it has bugs. We want to be able to track down those bugs quickly when they cause an assertion to fail and fix them. Production software should (ideally) be free of bugs, so any checks we have will only slow us down without any added benefit (since we know they're all going to pass anyway). Assertions are disabled when the NDEBUG symbol is defined. We can have our macro work the same way:

`ifdef NDEBUG
  `define prog_assert(expr) \
    begin \
    end
`else
  // ...
`endif

When NDEBUG is defined before including prog_assert.svh, the prog_assert macro will expand to basically nothing (as compilers should be able to optimize the empty begin...end block away). This means that the code passed as the expression won't be seen by the compiler. This makes it interesting to look at what happens if we use prog_assert with a randomize() call:

byte some_var;
`prog_assert(std::randomize(some_var) with { some_var == 10; })
$display("some_var = %0d", some_var);

If we simply execute this code, we won't see any error message (since the randomize() call can't fail) and we'll see that some_var gets the value 10. If, however, we define the NDEBUG symbol beforehand, we'll notice that some_var stays 0. This is because the randomize() call never happens. This is a feature, not a bug; the C library works the same way. Programmers are only supposed to use expressions without any side effects inside assert statements.

After a bit of research I learned that the Unreal engine (a big library used by a lot of video games) has some very nice assertion mechanisms in place. Aside from the assert style statement provided by assert.h (which they call check), it also defines two others, which do basically the same thing with some extra sugar on top. The more interesting one is called verify; the difference between it and assert is that the expression it operates on also gets executed in production builds, i.e. in cases where assert would expand to nothing. This is exactly the behavior we need to check the status of randomize():

`ifdef NDEBUG
  `define prog_verify(expr) \
    begin \
      void'(expr); \
    end
`else
  `define prog_verify(expr) \
    `prog_assert(expr)
`endif

During the development stage, prog_verify(...) acts just like prog_assert(...): it checks the expression and issues an error when it evaluates to false. After deployment, it merely evaluates the expression. Why do we need both macros? Wouldn't prog_verify(...) suffice? Well, evaluating the expression uses up processor time, and if it doesn't have any side effects there's no point in doing it. The safest bet would be to always use prog_verify(...), but for cases where we know that executing the expression doesn't change the state of the testbench, we can gain some performance in production mode by using prog_assert(...).
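To make the contrast concrete, here is the earlier randomization example rewritten with prog_verify(...):

byte some_var;
// With NDEBUG undefined: checks the return value of randomize().
// With NDEBUG defined: still calls randomize(), just drops the check.
`prog_verify(std::randomize(some_var) with { some_var == 10; })
$display("some_var = %0d", some_var);

Unlike the prog_assert version, some_var ends up as 10 in both build modes, because the expression is evaluated either way.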

If you want to use these macros in your own code, I intend to maintain them on GitHub. Feel free to follow the micro-project, download it and suggest improvements. I'm still considering adding another macro called prog_ensure(...) that always checks its expression argument, regardless of whether NDEBUG is defined.
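In case you're curious, such a prog_ensure(...) macro could be as simple as defining the checking variant outside the NDEBUG guard (a sketch, not part of the published code):

// Always-on variant: defined unconditionally, so NDEBUG has no effect on it.
`define prog_ensure(expr) \
  begin \
    if (!(expr)) \
      $fatal(0, $sformatf("Assertion '%s' failed.", `"expr`")); \
  end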

What I don't like at all about SystemVerilog is that there isn't any concept of a standard library. This is exactly the kind of thing that should be contained in such a library and come packaged with the simulator. The closest thing to this is UVM, but I'm not particularly thrilled by the design decisions taken there (i.e. building big monoliths that will eventually topple and crush us all!) and I don't want to suggest adding any new features. You might not want to create extra dependencies when developing UVCs by having to also specify the path to "prog_assert.svh". A pragmatic solution would be to just copy the code for the macros inside the UVC (there isn't much code to copy anyway) and change the prefix from prog to <uvc_name>.

We have all of these nice features to find bugs inside the DUT and point us in the direction of where to look to fix them. It's a shame to not pay our own code the same amount of attention. Software programmers have been using assertions for quite some time now to check the validity of their own code or that of their clients. If you want to write more robust verification software, whether it's UVCs or testbenches, give prog_assert a try.

5 comments:

  1. First, let me say -- YES! Everybody should be writing more assertions inside their TB code. They are the best way to immediately catch any unexpected behaviour. And, yes, we shouldn't be using the bare 'assert' statement, and should go with a macro.

    Now, for me, the main reason for using the macro is to be able to control the error message and reporting, not so much for the case where assertions are turned off and we don't run the code inside it. There's a big difference between a testbench and a game using Unreal engine :) -- in case of a game, it is imperative to squeeze every little bit of performance out of the code, and it's never acceptable for the game to crash with an internal code assertion. Therefore, production code will always turn off assertions. In testbenches, however, there is really no good reason to ever run without TB assertions on (IMO) -- the relatively small increase in performance is not good enough to offset the possibility of (1) missing bugs (both TB and RTL), and (2) making it more difficult to categorize and understand fails.

    As for improvements to the assert macros, here are a few things that my assert macro has:
    - An optional second 'msg' parameter that describes the failure. This is much more user-friendly than "Assertion 'x >= 0' has failed", which usually means little without context. It also encourages self-documenting code: `assert(x >= 0, "sqrt() can't be called with negative numbers")
    - Speaking of context, I add `__FILE__ and `__LINE__ to the fail message, so that each assertion fail is easy to find, and to give additional uniqueness to each assert fail when categorizing fail signatures
    - Before the fatal error, I throw in $stacktrace; The majority of assertions check function input parameters, so when one of them fails, all you know is that *someone* called the function with illegal parameters. $stacktrace shows you exactly where the call came from.
    - To play nice with UVM testbenches, an `ifdef can select between $fatal and `uvm_fatal.

    ReplyDelete
    Replies
    1. At work I also have a version of the macro where I can specify an extra message. You make a great point about self documentation when using it! It's something I'll add soon.

      Specifying `__FILE__ and `__LINE__ in the message is unnecessary, since calls to $fatal are required by the standard to print both these things and the scope where they originate.

      I had no idea $stacktrace() existed. That's also a great idea.

      I also thought about this topic, but I'm not really sure how to proceed. On the one hand, I could make a macro that only handles printing the error message. Users could override it to call something else entirely (`uvm_fatal or whatever). This feels a bit like overthinking it (and I have a very strong tendency to do this). On the other hand, UVM is kind of ubiquitous when using SV, so acknowledging its existence by creating such an `ifdef seems reasonable.

      Delete
    2. `uvm_fatal message is printed using the UVM report server format, which usually truncates file paths to something that fits in N characters. So, we can't rely on that for full filename/number of the assertion location in UVM testbenches. That, and in order to make asserts without the optional message more user-friendly, are the main reasons why I still put the `__FILE/LINE__ into the error message.

      Delete
  2. another good post!

    one thing that I think should be addressed is the coverage of the assertions (and "check")

    I found that covering the checkers' "triggering" is a wonderful way to find out about missing checkers and branches that were not hit by your simulations (due to missing stimulus or, more often, disabled checkers and bad "corner case filtering" in the checkers)

    ReplyDelete
    Replies
    1. This is what I mean when I say that 'assert' is tightly integrated with tools. I like to tag checks (in procedural code) with:

      SOME_CHECK_NAME : assert (some_condition)
      else
      `uvm_error(...)

      This way I can back-annotate them to the verification plan. Using 'assert' for sanity checks would just create noise.

      Delete