Saturday, March 28, 2015

Do You Want Sprinkles with That? - Mixing in Constraints

The goal of modern verification techniques is to do as much as possible with as little code as possible. This is best done with a "write once, tweak everywhere" approach to test development. This type of flexibility comes for free in AOP; that's why it's built into e's DNA. For OOP, however, it requires thought and planning, and is achieved by using design patterns. One reason why the UVM exists is to encapsulate some of these patterns for us (especially the factory). Even so, this doesn't mean that some design pattern knowledge won't help us do fancy stuff in our code.

In this post I want to talk about how to layer constraints across the sequence item class hierarchy. My points would be best understood by looking at a concrete example. Let's say that our DUT has an AHB bus. I've chosen AHB because it's very widespread and most of you will have already worked with it. We'll keep things simple and only consider a reduced sequence item:

class vgm_ahb_item extends uvm_sequence_item;
  rand bit[31:0] addr;
  rand direction_e direction;
  rand burst_e burst;
  rand size_e size;
  rand mode_e mode;
  rand privilege_e privilege;

  rand int unsigned delay;


  constraint delay_init_val {
    delay inside { [0 : 10] };
  }

  constraint no_instr_write {
    mode == INSTR -> direction == READ;
  }

  constraint aligned_address {
    size == HALFWORD -> addr[0:0] == 0;
    size == WORD -> addr[1:0] == 0;
  }

  // ...
endclass

Address and direction are pretty self-explanatory. Burst tells us how many bus cycles will be performed. Size represents the number of bytes transferred in each bus cycle. Mode tells us whether we are moving data or instructions. Finally, there is also privilege, which shows us from what part of the code the access originated. The item already contains some structural constraints given by the protocol.

Let's say we've written two tests for our DUT. The first test does write/read-back pairs at random locations to make sure that the entire address space is accessible. It does this by starting the following sequence:

class write_read_sequence extends uvm_sequence #(vgm_ahb_item);
  virtual task body();
    req = vgm_ahb_item::type_id::create("req");

    for (int i = 0; i < 20; i++) begin
      start_item(req);
      if (!req.randomize() with { direction == WRITE; })
        `uvm_error("RANDERR", "Randomization error")
      finish_item(req);

      start_item(req);
      req.direction = READ;
      if (!req.randomize(delay))
        `uvm_error("RANDERR", "Randomization error")
      finish_item(req);
    end
  endtask
endclass

The second test does purely random accesses inside the address space by starting another sequence:

class random_access_sequence extends uvm_sequence #(vgm_ahb_item);
  virtual task body();
    req = vgm_ahb_item::type_id::create("req");

    for (int i = 0; i < 30; i++) begin
      start_item(req);
      if (!req.randomize())
        `uvm_error("RANDERR", "Randomization error")
      finish_item(req);
    end
  endtask
endclass

After running these tests for a while with different seeds we stumble onto a bug. It seems our device has problems when doing privileged data accesses. Addresses between 0x0 and 0x20 cause trouble when being accessed by single word bursts. We want to put more emphasis on these transfers to make sure that we're really stressing this part of the DUT's functionality. This is where "write once, tweak everywhere" comes in. We can just run the same tests as before, but add a new constraint to make the problematic bursts more likely.

This is best done by creating a new test that starts the same sequence, but sets a type override on the sequence item. This new sequence item would be defined in the same file as the test and would contain the extra constraint:

class write_read_corner_case_ahb_item extends vgm_ahb_item;
  constraint corner_case {
    mode dist { DATA := 3, INSTR := 1 };
    privilege dist { PRIVILEGED := 3, USER := 1 };
    (mode == DATA && privilege == PRIVILEGED) ->
      (addr inside { [32'h0:32'h20] } && size == WORD && burst == SINGLE);
  }
endclass

The new write_read test would just extend the previous one that already starts the sequence and just set a type override:

class test_write_read_corner_case extends test_write_read;
  function void end_of_elaboration_phase(uvm_phase phase);
    uvm_factory factory = uvm_factory::get();
    factory.set_type_override_by_type(vgm_ahb_item::get_type(),
      write_read_corner_case_ahb_item::get_type());
  endfunction
endclass

We'd want to do the same thing for the random_access test. If we define a similar sequence item in another test file it immediately becomes clear that we've doubled up information. The same constraint would exist in two files. At this point we could make a tradeoff between encapsulation and maintainability. We can declare the corner_case item outside of the tests, in some common location. This will make it fall under shared ownership (as all test writers would see it), with all the challenges that brings. At least we wouldn't need to maintain the same constraint in two (or potentially more) files.

With that settled, we run our regression longer, but we find another bug. This one has to do with reading words with 0 delay. As before, we want to guide our randomization efforts more on this one too. Adding a constraint to both of the tests is the same case that we looked at above. We can handle it in the same way. What we do notice, however, is that this bug, like the previous one, affects WORD transfers. It makes sense to try and combine this constraint with the one from above and make sure that we don't have any other bugs at the intersection of these two cases.

Before we proceed, let's summarize. We've currently defined a corner_case item and a fast_reads item that we can use to tweak the initial tests with:

class corner_case_ahb_item extends vgm_ahb_item;
  constraint corner_case {
    mode dist { DATA := 3, INSTR := 1 };
    privilege dist { PRIVILEGED := 3, USER := 1 };
    (mode == DATA && privilege == PRIVILEGED) ->
      (addr inside { [32'h0:32'h20] } && size == WORD && burst == SINGLE);
  }
endclass

class fast_reads_ahb_item extends vgm_ahb_item;
  constraint fast_reads {
    size dist { WORD := 3, BYTE := 1, HALFWORD := 1 };
    (direction == READ && size == WORD) -> delay == 0;
  }
endclass

Now we need an item that contains both constraints. This kind of gets us stumped. Which of these items should we extend from? Is our new item a corner_case item with an extra constraint? If so, then we should extend from corner_case_ahb_item:

class corner_case_fast_reads_ahb_item extends corner_case_ahb_item;
  constraint fast_reads {
    size dist { WORD := 3, BYTE := 1, HALFWORD := 1 };
    (direction == READ && size == WORD) -> delay == 0;
  }
endclass

Or is it a fast_reads item with an extra constraint? In that case we should extend from fast_reads_ahb_item:

class corner_case_fast_reads_ahb_item extends fast_reads_ahb_item;
  constraint corner_case {
    mode dist { DATA := 3, INSTR := 1 };
    privilege dist { PRIVILEGED := 3, USER := 1 };
    (mode == DATA && privilege == PRIVILEGED) ->
      (addr inside { [32'h0:32'h20] } && size == WORD && burst == SINGLE);
  }
endclass

No matter what we do, however, we're doubling up some code. The problem only gets worse if we want to add a third constraint and so on.

Our conceptual failure was that this new item is neither a corner_case_ahb_item nor a fast_reads_ahb_item with a little bit on top. It's actually both. We need to do multiple inheritance, but SystemVerilog only supports single inheritance. Bummer, huh?

Actually, no. We already talked about how to fake multiple inheritance using the mixin pattern in a previous post. Let's apply it here. Instead of having a corner_case item or a fast_reads item, let's have a mixin for each constraint:

class corner_case_mixin #(type T) extends T;
  constraint corner_case {
    mode dist { DATA := 3, INSTR := 1 };
    privilege dist { PRIVILEGED := 3, USER := 1 };
    (mode == DATA && privilege == PRIVILEGED) ->
      (addr inside { [32'h0:32'h20] } && size == WORD && burst == SINGLE);
  }
endclass

class fast_reads_mixin #(type T) extends T;
  constraint fast_reads {
    size dist { WORD := 3, BYTE := 1, HALFWORD := 1 };
    (direction == READ && size == WORD) -> delay == 0;
  }
endclass

Actually, we can still have those old items, but we should implement them using the mixins:

class corner_case_ahb_item extends corner_case_mixin #(vgm_ahb_item);
endclass

class fast_reads_ahb_item extends fast_reads_mixin #(vgm_ahb_item);
endclass

We can implement the new item with both constraints by applying the other mixin on top of a previously mixed in item:

class corner_case_fast_reads_ahb_item extends
  fast_reads_mixin #(corner_case_ahb_item);
endclass

It doesn't really matter what order we do it in. We'll get the same great flavor either way:

class fast_reads_corner_case_ahb_item extends
  corner_case_mixin #(fast_reads_ahb_item);
endclass

We can even apply the mixins successively starting from the base ahb_item:

class corner_case_fast_reads_ahb_item extends
  fast_reads_mixin #(corner_case_mixin #(vgm_ahb_item));
endclass

You get the idea. We can add as many as we want in whatever order we want.
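
To try the stacking in isolation, here is a minimal sketch that runs without UVM. It assumes a reduced stand-in item (base_item) instead of vgm_ahb_item, a simplified fast_reads constraint, and a hypothetical slow_writes_mixin whose constraint is made up for this example (it is not the "slow writes" code from the repository):

```systemverilog
// Stand-in types so the sketch compiles without UVM (assumption: simplified item).
typedef enum { READ, WRITE } direction_e;

class base_item;
  rand direction_e direction;
  rand int unsigned delay;

  constraint delay_range {
    delay inside { [0 : 10] };
  }
endclass

// Each mixin extends whatever type it is given and adds one constraint.
class fast_reads_mixin #(type T = base_item) extends T;
  constraint fast_reads {
    direction == READ -> delay == 0;
  }
endclass

// Hypothetical third mixin; the constraint is invented for illustration only.
class slow_writes_mixin #(type T = base_item) extends T;
  constraint slow_writes {
    direction == WRITE -> delay inside { [8 : 10] };
  }
endclass

module top;
  // Both constraints are active on the stacked type.
  slow_writes_mixin #(fast_reads_mixin #(base_item)) item;

  initial begin
    item = new();
    repeat (5) begin
      if (!item.randomize())
        $error("Randomization failed");
      $display("%s delay=%0d", item.direction.name(), item.delay);
    end
  end
endmodule
```

Swapping the nesting order in the declaration of item should give the same set of legal values, since each mixin only adds a constraint to whatever it is layered on.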

As a bonus, we don't even need to have shared items anymore. We can share only the mixins, in some central location in our package, and shift the responsibility of defining items for the overrides back to the tests:

class test_write_read_corner_case_fast_reads extends test_write_read;

  // nested class
  class ovr_seq_item extends fast_reads_mixin #(corner_case_mixin #(
    vgm_ahb_item));
  endclass


  function void end_of_elaboration_phase(uvm_phase phase);
    uvm_factory factory = uvm_factory::get();
    factory.set_type_override_by_type(vgm_ahb_item::get_type(),
      ovr_seq_item::get_type());
  endfunction
endclass

By defining the override item inside the test as a nested class we make it clear that it's not supposed to be used anywhere else. We also make it impossible to accidentally reference items defined in other test files, because these items aren't declared in the package scope anymore. We just have to be careful not to use "by name" overrides, since that might get us into trouble (as items might share the same name).

What we've done here is traded up the value chain. We gained maintainability by doing away with doubled up constraints. Our approach also allows us to shift on the encapsulation scale (global override items vs. test encapsulated override items). We didn't create this from nothing, though. We added intelligence into our code by using the mixin pattern.

I've taken some liberties with the code I posted by removing calls to UVM macros and constructor definitions, to keep it short and focus on the important topics. You can find the complete code on SourceForge. I've also added a third "bug" to investigate - "slow writes". Have a look at the commit history to see how the code base shrinks when using mixins, compared to a classical approach.

Sunday, March 15, 2015

Patching a Leaky Boat - Handling UVM Bugs

This week I stumbled on an issue with the UVM base class library (BCL). I was using the register layer to access some memories and some things just didn't add up. I've posted a description in the forums, so let's see what the higher ups say.

I need that functionality now, though. I can't wait for UVM 1.1e (which will never come out) or UVM 1.2a. I also wouldn't want to switch to UVM 1.2 yet, even if the issue were fixed there. This got me thinking what the best way to handle such a situation is.

A bit of background on the UVM standard: just the API document is standardized, not the BCL. The Accellera UVM library is only a proof of concept. A major plus for the EDA vendors in increased UVM adoption is that they can develop debug extensions for it, which should make us more productive. Such a feature needs infrastructure, though. I've seen two implementation models up to now. In the first case, the BCL is left unchanged and vendor extensions are added on top, via a separate package. In the second case, the BCL itself is modified to include vendor extensions. Both approaches are "legal" according to the UVM philosophy, because as long as the API stays the same and stuff behaves as described in the standard, they get the UVM seal of approval.

In the first case, fixing a BCL bug is easy. We can just take the BCL and patch it and we won't get any problems with the vendor extensions. In the second case, things aren't as straightforward. Here, the UVM package comes bundled with the simulator and is installed in some read-only location. Also, because each simulator version comes with a (potentially) different version of UVM, editing the code directly isn't feasible.

In my current project I'm using a vendor of the second persuasion. A key requirement I have is that I want to be able to easily switch between simulator versions. This means I need a non-intrusive fix. This is only possible if I can replace instances of a specific class with my own extended class. This is where the UVM factory comes in, but to be able to use it, the offending objects have to be created using the factory.

In our case, we want to replace all instances of uvm_reg_map with an extended class we'll call vgm_reg_map. Luckily, uvm_reg_map is registered with the factory and is instantiated using create(...). We'll do our fixes in a separate package, vgm_uvm_fixes. We'll need to import this package and set a type override inside our verification environment:

import vgm_uvm_fixes::vgm_reg_map;

class some_tb_env extends uvm_env;
  function void build_phase(uvm_phase phase);
    patch();

    // Build env
    // ...
  endfunction


  function void patch();
    uvm_factory factory = uvm_factory::get();
    factory.set_type_override_by_type(uvm_reg_map::get_type(),
      vgm_reg_map::get_type());
  endfunction
endclass

How do we go about fixing uvm_reg_map? We'll need to create an extended class and register it with the factory:

class vgm_reg_map extends uvm_reg_map;
  `uvm_object_utils(vgm_reg_map)

  function new(string name="vgm_reg_map");
    super.new(name);
  endfunction
endclass

Because we care about the quality of our work, we're going to create unit tests that expose the issue. Ideally we'd also create unit tests for the existing behavior, to make sure that we don't break anything else. Since this is going to be a small fix, we won't do it because it's not really worth it. Here's a test that fails due to this bug:

`SVTEST(get_physical_addresses__max_offset__returns_end_addr)
  uvm_reg_addr_t addrs[];
  map.get_physical_addresses(32'h0, 32'hffff, 4, addrs);

  `FAIL_IF(addrs.size() != 1)
  `FAIL_IF(addrs[0] != 32'hffff)
`SVTEST_END

When trying to get the physical address of offset 0xffff, we get 0x3_fffc, causing the test to fail. The only thing we can do in this case is to copy the code for the offending function from uvm_reg_map and paste it into our extension. When trying to compile, we'll get some errors that the method tries to use local fields. The first one occurs at the following line:

int multiplier = m_byte_addressing ? bus_width : 1;

Since m_byte_addressing is declared as local, we can't use it in the extended class. The only way to get its value is by using the get_addr_unit_bytes(...) function:

function int unsigned uvm_reg_map::get_addr_unit_bytes();
  return (m_byte_addressing) ? 1 : m_n_bytes;
endfunction

What this function returns is suspiciously similar to what the original developer tried to assign to multiplier. Assigning it the return value of the function and fixing the other references to local fields will make our unit test pass.
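
The access restriction itself can be reproduced without UVM. The following is only a sketch: base_map is a stand-in I made up for illustration, not the real uvm_reg_map, and get_multiplier() is a hypothetical helper that exists just to show the call. It shows why the copied code can't touch the local field, while the public accessor remains usable from the extension:

```systemverilog
// Stand-in for the library class; NOT the real uvm_reg_map.
class base_map;
  local bit m_byte_addressing;
  local int unsigned m_n_bytes = 4;

  function int unsigned get_addr_unit_bytes();
    return m_byte_addressing ? 1 : m_n_bytes;
  endfunction
endclass

class patched_map extends base_map;
  // Hypothetical helper, only here to demonstrate the access rules.
  function int unsigned get_multiplier();
    // Reading m_byte_addressing here would not compile,
    // because the field is local to base_map.
    // The public accessor is visible in the subclass.
    return get_addr_unit_bytes();
  endfunction
endclass

module top;
  patched_map map;

  initial begin
    map = new();
    $display("multiplier = %0d", map.get_multiplier());
  end
endmodule
```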

This method of fixing the issue seems kind of clunky. I'm not very comfortable copy/pasting so much code, but we had to do this because the offending method was so big and poorly encapsulated. It could have been split into sub-methods, which would have made it easier to test and change. We'll talk more about this in a future post after I finish reading Refactoring: Improving the Design of Existing Code.

The fix we made is non-intrusive to the UVM package, but it's intrusive to our own testbench code. We need to compile the package in our run script, but we also have to add the necessary factory override to our environment class. If there were to be an official fix in the future, we'd need to go back and remove both from our code.

I can't resist making a comment as to how aspect oriented programming would have been so much better here. In e we'd just create a file, implement our patch there and import it after importing UVM. All instances of the affected class would get the fix. Since there wouldn't be any changes in our code, such a fix would be truly non-intrusive.

You can find the code for the package on SourceForge. This can be a good starting point for developing a similar package for your company/department/team. If there's interest from the public, I can create a separate repository for this package and add to it. Let me know in the comment section.

Sunday, March 8, 2015

Less Is More - Why I Favor Short Tests

We're not going to be looking at any code in this post. We are, however, going to examine the impact the length of the tests we write has on various aspects of the verification process. The "long vs. short tests" debate is something I often have with colleagues and every time I have to re-iterate the same points. Usually I can't touch on all of them because the discussion is cut short or because I just forget to bring some of them up. This post is my attempt at formalizing my point of view, for myself first of all, but also for others.

When we talk about simulation duration we have two things to consider:

  1. simulation time - how much time has passed within the simulation; it's usually measured at the nano-/micro-/millisecond scale
  2. processor time - how long it takes for the simulation to execute on the machine; values are typically measured in seconds, minutes or hours

The two are connected by the complexity of our DUT. For a bigger design it will take more processor time to simulate 50 microseconds. A short test is a test that can be simulated in a short amount of time. Assuming we can't do anything to influence our simulation speed (which is usually a function of the simulation tool and the machines we simulate on), to get shorter tests they need to simulate less.

 

Short tests are easier to develop

By testing less in one simulation run we don't have to write complex stimuli, which means that we'll finish building our test faster. We'll need more tests though, but this is what we have modern verification techniques for. Constrained random verification's most touted feature might be that it makes it easier to hit states in the design we wouldn't have dreamed of trying to hit, but to me its main use is as a test automation tool.

Even though I promised we won't look at code, let's just take a little sneak peek. Here's what a short test would look like. It just checks that the DUT can perform a certain operation of a specific kind and that's it:

<'
extend MAIN my_virtual_sequence {
  body() @driver.clock is only {
    do init_seq;
    do operation;
  };
};
'>

A longer test would do more than just test that a certain operation works. This test checks that all operations of that kind work and also tries to perform some other tasks afterwards:

<'
extend MAIN my_virtual_sequence {
  body() @driver.clock is only {
    do init_seq;
    
    for each (op) in [ OP1, OP2, OP3, OP4 ] {
      do operation keeping { .op == op };
      
      if op == OP2 {
        do something_else;
        do even_more;
      };
      
      if op == OP4 {
        wait delay (100 ns);  // TODO ask designer what the appropriate delay is
        do some_other_operation;
      };
      
      // ...
    };
  };
};
'>

The example is pretty basic, but the second test would take longer to develop. There are more things to consider now: what values to loop over, what can we perform after a certain operation, what is an appropriate delay, etc. If there's a bug in the design when doing an OP2 operation, for example, it's going to make it all the more difficult for us. We'll get information overload when trying to do too much at once. Humans are constructed to break problems down: we split tasks into sub-tasks, which we then split into further sub-tasks and so forth. Why not write our tests in such a way to mirror this?

More tests doesn't have to mean more code. Our first test has the possibility to check that any operation works and it does this using randomization. We will get a lot of redundant test runs, but what's more expensive, engineering time or compute time? I'd rather optimize the former first and then, only if required, the latter.

 

Short tests are easier to parallelize

Test length also has an impact on the duration of the regression suite. You're probably thinking "Duh! If I have more to simulate my regression will be longer." That's not what I mean. What I mean is that long tests impact our ability to efficiently run tests in parallel. Let's start with the obvious: if we're able to run 100 jobs in parallel, but we only have 50 tests, we're making poor use of our compute resources. Assuming our tests are all the same length, half of the job slots sit idle and our regression takes up to twice as long as it potentially could.

The same thing happens if we have tests that are much longer than the others. At some point, it's only going to be these tests that are running, while the others are already done. We're now in the same situation as before, where we use fewer jobs than we potentially could.

I'll give you an example of how a long test ruined it for me while I was running the final regressions on my last project. I had a problem with the compute farm and had to restart the regression. Almost all of my tests were done in less than 8 hours, but someone wrote one that was taking about 12 hours. When I started the regression again this test probably got submitted to a slower machine, because it took 20 hours or more to complete. This caused me to miss a whole day. Now, what was that test doing? It was checking that all sectors in our non-volatile memory were writable. Immediately from the description you can ask "why not write a test per sector then?". This would have cut down the simulation time by a factor of the number of sectors and it would have sped up the regression significantly.
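
One way to get a test per sector without writing a separate test file for each one is to pick the sector from the command line, so that every regression job runs the same short test on a different sector. This is only a sketch of the idea in SystemVerilog: the +SECTOR plusarg name is made up for this example, and the actual write and read-back stimulus is left out:

```systemverilog
module top;
  int unsigned sector;

  initial begin
    // +SECTOR=<n> is a hypothetical plusarg name chosen for this sketch.
    if (!$value$plusargs("SECTOR=%d", sector))
      $fatal(1, "No sector given, pass +SECTOR=<n>");

    $display("Checking that sector %0d is writable", sector);
    // The write and read-back stimulus for the selected sector would start here.
  end
endmodule
```

The regression script would then submit one job per sector, each with a different +SECTOR value, and the jobs can run in parallel.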

 

Short tests are easier to debug

Regressions are there to weed out failures in the DUT. Should a failure occur, short, focused tests are easier to debug. Imagine running a test that takes one hour and we find an issue at the 50 minute mark. Before we can even begin to analyze it, we'll need to first run up to that point in time with all debug knobs on maximum, which is going to make the simulation take even longer. After reaching that point we have to analyze it, possibly together with the designer, and make a change in either the DUT or in the testbench if it turns out the issue wasn't a bug. Regardless of what we need to change, there's a pretty good chance that the patch won't work on the first try, which means we'll need to repeat the cycle all over again. We can lie to ourselves all we want and say that we'll handle something else while this test is re-running, but modern science tells us that humans aren't as good at multitasking as we think, so we'll be wasting effort by not being fully focused on the task at hand. It's either that or we're going to start surfing the Internet.

Even if we do decide to go down this route, good luck running your simulation for a long time with full waves. We're going to say hello to our friend, the simulator crash. Then again, we can run up to a certain point without full debug, turn it on and run to the point of the failure. At least this way it won't crash, but what will we do if we have to trace the problem backwards and we run out of waves? We'll have to adjust the time when we turn on debugging over and over again until we reach an appropriate tradeoff. That sounds like wasted time to me...

If we're clever, we're going to try to isolate the issue in a short test and do our debugging on that. But, if a short test could have found this issue, why didn't we write it like this in the first place?

 

Short tests are easier to maintain

I deliberately used the word "short" so I can misuse it in this section. We can also refer to length in terms of lines of code. Granted, there are multiple factors that affect maintainability, but less code is easier to maintain because there are fewer opportunities for things to go wrong. What's more likely to be buggy, a "Hello world" program or a big testbench containing hundreds of classes and methods?

 

Long tests aren't inherently evil, but...

This doesn't mean that long tests aren't useful. Some issues might still lurk in the deep dark crevices of the design and it might only be possible to hit these by doing a very complicated sequence of operations. Other issues can also only be found when the planets are in some special alignment. What I'm advocating is to write tests that are as long as necessary and not longer. Other times it might be very useful to have a test that simply stresses the design over a long period, but is this really the kind of test that you need to run in a nightly regression? Most probably not, since it's highly unlikely that it'll find anything new.

I for one will stick to my short tests that are easy to develop, debug, parallelize and maintain. What about you? Don't hesitate to share your thoughts on the topic in the comments section.