Wednesday, December 9, 2015

Fun and Games with CRV: Einstein's Puzzle (Revisited)

Two weeks ago, Aurelian from AMIQ published a post on how to solve the so-called Einstein's puzzle using e. At the end, he challenged us readers to try and improve on his solution. Not being one to shy away, I rolled up my sleeves and got to work.

He started out by defining a struct to hold the information about a resident:

<'
struct resident {
  nationality : nationality_t;
  house_color : house_color_t;
  cigarette : cigarette_t;
  pet : pet_t;
  drink : drink_t;
};
'>

He gave up on this idea after he tried to constrain the residents to have unique nationalities, house colors, etc. Here's what he tried:

<'
struct neighborhood {
  keep residents.nationality.is_a_permutation(all_values(nationality_t));
  // ...
};
'>

This looks reasonable, right? Why didn't it work then? That's because even though residents.nationality returns a list of all residents' nationalities, it isn't generative. I didn't find anything normative in the Language Reference or the Generation User Guide that explicitly states this, but this seems to be the case. The proper way of ensuring all nationalities are unique is to use the all_different(...) pseudo-method:

<'
struct neighborhood {
  keep residents.all_different(it.nationality);
  // ...
};
'>
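Why is all_different(...) enough here? It comes down to a pigeonhole argument. If the list holds exactly five residents and the enum has exactly five values, five distinct values must use up every value, and that is precisely what is_a_permutation(...) would have required. The small Python sketch below checks this exhaustively. It assumes a five-element list, and all_different and is_permutation are plain helper functions, not e's pseudo-methods:

```python
from itertools import product

# Assumed domain: five nationalities, matching five residents.
NATIONALITIES = ("ENGLISH", "SWEDE", "DANE", "NORWEGIAN", "GERMAN")

def all_different(values):
    """True if no value occurs twice, like all_different(...) in e."""
    return len(set(values)) == len(values)

def is_permutation(values, domain):
    """True if values is a reordering of all elements of domain."""
    return sorted(values) == sorted(domain)

# Try every possible list of five nationalities (5**5 = 3125 of them).
agree = all(all_different(v) == is_permutation(v, NATIONALITIES)
            for v in product(NATIONALITIES, repeat=len(NATIONALITIES)))
print(agree)  # True
```

With fewer residents than enum values, the two checks would no longer agree: four distinct nationalities satisfy all_different, but they don't form a permutation of all five.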

Now that we've got our infrastructure set up, it's time to start converting the 15 facts into code. There are three types of facts given to us. The first type only involves individual residents. For example, we know that the one living in the red house is English. This can be represented as a constraint inside the resident struct itself:

<'
extend resident {
  keep nationality == ENGLISH => house_color == RED; // #1
};
'>

There are seven more such constraints, but I won't show them here, since they are very similar to this one. You can find the complete code on SourceForge.

The second type of fact we know about our residents describes a characteristic of the resident living in a particular house. For example, we know that the resident in the first house is Norwegian. This kind of constraint needs to be added to the neighborhood:

<'
extend neighborhood {
  keep residents[0].nationality == NORWEGIAN; // #9
};
'>

The third type of fact gives hints about residents in relation to their neighbors. For example, we know that the green house is located immediately to the left of the white house. Here we need to loop over all elements of the list. By definition, the green house can't be the last one in the list, because then it wouldn't be located to the left of anything:

<'
extend neighborhood {
  // #4
  keep for each (resident) in residents {
    resident.house_color == GREEN =>
      index < 4 and residents[index+1].house_color == WHITE;
  };
};
'>

The fact that the Blend smoker lives next to the cat owner is a bit more involved, but it follows the same principle. The Blend smoker is either located to the left of the cat owner (the same situation as before) or to the right, in which case the Blend smoker's house can't be the first one:

<'
extend neighborhood {
  // #10
  keep for each (resident) in residents {
    resident.cigarette == BLEND =>
      index < 4 and residents[index+1].pet == CAT or
        index > 0 and residents[index-1].pet == CAT;
  };
};
'>

Since we have three more such constraints to write, we could save ourselves some typing by defining a macro:

<'
define <neighbors'statement> "<first'exp> neighbors <second'exp>" as {
  extend neighborhood {
    keep for each (resident) in residents {
      resident.<first'exp> =>
        index < 4 and residents[index+1].<second'exp> or
          index > 0 and residents[index-1].<second'exp>;
    };
  };
};
'>

The macro "call" would look like this:

<'
cigarette == BLEND neighbors pet == CAT; // #10
'>
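The macro works on the source text: `resident.<first'exp>` simply becomes `resident.cigarette == BLEND`. In a general-purpose language, you can get the same reuse by passing predicates instead of expressions. The Python sketch below only illustrates what the generated constraint checks, with residents modeled as dictionaries. `neighbors` is a hypothetical helper, and it uses the list length where the e code hard-codes 4:

```python
def neighbors(first, second):
    """Build a checker: every resident matching `first` must have a direct
    neighbor (left or right) matching `second`."""
    def check(residents):
        last = len(residents) - 1
        for index, resident in enumerate(residents):
            if first(resident):
                # `and` short-circuits, so the bounds guard also prevents
                # reading past either end of the list.
                left_ok = index < last and second(residents[index + 1])
                right_ok = index > 0 and second(residents[index - 1])
                if not (left_ok or right_ok):
                    return False
        return True
    return check

# Equivalent of: cigarette == BLEND neighbors pet == CAT; // #10
blend_next_to_cat = neighbors(lambda r: r["cigarette"] == "BLEND",
                              lambda r: r["pet"] == "CAT")

street = [{"cigarette": "DUNHILL", "pet": "CAT"},
          {"cigarette": "BLEND", "pet": "HORSE"}]
print(blend_next_to_cat(street))  # True
```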

With these three types of constraints we can model all fifteen facts and solve the puzzle. I was pretty surprised that it worked the first time and gave the right solution: the German keeps the fish. When I solved the zebra puzzle in SystemVerilog (which is essentially the same puzzle as this one, just with slightly different facts about the residents), I ran into problems because I used the implication operator. Saying that the English resident lives in the red house doesn't just mean that nationality == ENGLISH => house_color == RED, but also, at the same time, that house_color == RED => nationality == ENGLISH. There isn't any equivalence operator (also called double implication) in e, but equivalence can be expressed using the equality operator, "==":

<'
extend resident {
  keep (nationality == ENGLISH) == (house_color == RED); // #1
};
'>
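To double-check the answer without an e simulator, the whole puzzle also fits in a brute-force Python sketch. It isn't a translation of the e code. Instead, it enumerates permutations of each attribute and prunes as early as possible. Each fact is expressed as equality of house positions (for example, the English resident's position equals the red house's position), so every such fact is a two-way implication by construction. That is the same point the == trick makes above. The fact numbers in the comments follow the usual numbering of the puzzle, which is consistent with the #1, #4, #9 and #10 used in this post. The spellings of the values the post doesn't show are my own:

```python
from itertools import permutations

def next_to(a, b):
    return abs(a - b) == 1

def solve():
    """Yield every assignment satisfying all 15 facts. Each attribute is a
    tuple indexed by house position (0 = leftmost house)."""
    for color in permutations(("RED", "GREEN", "WHITE", "YELLOW", "BLUE")):
        if color.index("GREEN") + 1 != color.index("WHITE"):                 # 4
            continue
        for nat in permutations(("ENGLISH", "SWEDE", "DANE", "NORWEGIAN", "GERMAN")):
            if nat[0] != "NORWEGIAN":                                         # 9
                continue
            if nat.index("ENGLISH") != color.index("RED"):                    # 1
                continue
            if not next_to(nat.index("NORWEGIAN"), color.index("BLUE")):      # 14
                continue
            for drink in permutations(("TEA", "COFFEE", "MILK", "BEER", "WATER")):
                if drink[2] != "MILK":                                        # 8
                    continue
                if drink.index("TEA") != nat.index("DANE"):                   # 3
                    continue
                if drink.index("COFFEE") != color.index("GREEN"):             # 5
                    continue
                for cig in permutations(("PALL_MALL", "DUNHILL", "BLEND",
                                         "BLUE_MASTER", "PRINCE")):
                    if cig.index("DUNHILL") != color.index("YELLOW"):         # 7
                        continue
                    if cig.index("BLUE_MASTER") != drink.index("BEER"):       # 12
                        continue
                    if cig.index("PRINCE") != nat.index("GERMAN"):            # 13
                        continue
                    if not next_to(cig.index("BLEND"), drink.index("WATER")):   # 15
                        continue
                    for pet in permutations(("DOG", "BIRD", "CAT", "HORSE", "FISH")):
                        if pet.index("DOG") != nat.index("SWEDE"):            # 2
                            continue
                        if pet.index("BIRD") != cig.index("PALL_MALL"):       # 6
                            continue
                        if not next_to(cig.index("BLEND"), pet.index("CAT")):   # 10
                            continue
                        if not next_to(pet.index("HORSE"), cig.index("DUNHILL")):  # 11
                            continue
                        yield nat, color, drink, cig, pet

solutions = list(solve())
nat, color, drink, cig, pet = solutions[0]
print(len(solutions), nat[pet.index("FISH")])  # 1 GERMAN
```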

I've shown you my solution. Now it's time to pass the baton to you and challenge you to improve it even more.

Sunday, December 6, 2015

Packages, Class Names and UVM

Some time ago I wrote a post that challenged some of the established coding conventions of modern SystemVerilog. In particular, I expressed my displeasure with the fact that all training material from EDA companies, tutorial sites and other learning resources states that packages should always have a "_pkg" suffix appended to their names and that all identifiers in a package (class/function/constant names) should carry the package name as a prefix. I attribute this to the significant C legacy that exists in our field, as the C language doesn't have any construct for packaging code.

I've started to drop the package name prefix from any new code I'm writing, both for the blog (as you might have noticed) and at work. Seeing how this works out in "real life", I've noticed some pitfalls. The first is, of course, that people will come and complain that this doesn't satisfy the commandments given to us by the lords of SystemVerilog. I've yet to hear any compelling argument against dropping package names from classes. In fact, the only arguments I've ever heard were "this isn't how everybody else is doing it" and my favorite, "we've always done it this way". Until someone can come up with something better, I'll continue to believe that the much larger communities of C++, Java and other modern programming languages are onto something.

Now let's look at what happens when we apply this idea to code that uses UVM. Normally, we'd have a package that contains a class definition. Inside this class, we'd use the utils macro to reduce the amount of boilerplate code needed to make it a productive member of a UVM environment:

package some_package;
  // ...

  class some_class extends uvm_object;
    // ...

    `uvm_object_utils(some_class)
  endclass

endpackage

If we tried to print an object of this class, we'd get something like this:

---------------------------------
Name      Type        Size  Value
---------------------------------
some_obj  some_class  -     @338
---------------------------------

The type column would rightly show some_class, but, as a colleague pointed out, that isn't very informative. Having the package name as a prefix made it possible to instantly identify the scope where the class is defined. This is particularly helpful when classes from different packages use the same name.

And speaking of using the same name for multiple classes... Let's say that we also have another package that defines a some_class type:

package some_other_package;
  // ...

  class some_class extends uvm_object;
    // ...

    `uvm_object_utils(some_class)
  endclass

endpackage

Because the classes have the same name, when they get registered with the factory, we'll get the following warning:

UVM_WARNING @ 0: reporter [TPRGED] Type name 'some_class' already registered with factory. No string-based lookup support for multiple types with the same type name.

Aside from disabling the set_*_override_by_name(...) functions (which I wouldn't recommend using anyway), this has no other effect; everything else still works just fine. Nevertheless, extra warning messages aren't nice, because they clutter the log file. For one or two classes it might be OK, but try working with multiple UVC packages that each define a driver, monitor, agent, etc. class... I tried to come up with a way to disable the warning, but I wasn't successful.

I've thought about these problems on multiple occasions, went down a few dead ends and dreamt up some silly solutions. I kept thinking that the problem was with UVM: that the macros were too restrictive, because they don't consider the class's parent package. Then I realized that the name displayed by the print(...) function and registered with the factory is merely the one supplied as the macro argument. Instead of using just the class name, we can just as well use its fully qualified name, which includes the package name and the scope operator, "::". This means we can change our code to this:

  class some_class extends uvm_object;
    // ...

    `uvm_object_utils(some_package::some_class)
  endclass

Now we won't get any more warnings from the UVM factory, and the text displayed by print(...) will make it clear which class we're dealing with:

-----------------------------------------------
Name      Type                      Size  Value
-----------------------------------------------
some_obj  some_package::some_class  -     @338
-----------------------------------------------

With this small tweak, it's possible to drop the package prefix from classes while still getting nice prints in UVM and avoiding any warnings from the factory. That leaves us with two fewer reasons against shortening our class names.
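Judging by the warning text, the duplicate check is a lookup of the type-name string passed to the macro. That makes it easy to model. The sketch below is a deliberately simplified Python toy. ToyFactory is a made-up name, and UVM's real factory does much more than this. Still, it shows why two identical short names collide while two fully qualified names don't:

```python
class ToyFactory:
    """Toy stand-in for a factory keyed by type-name strings. It only models
    the duplicate-name check behind the TPRGED warning, nothing else."""

    def __init__(self):
        self.types = {}
        self.warnings = []

    def register(self, type_name):
        if type_name in self.types:
            self.warnings.append(
                f"Type name '{type_name}' already registered with factory.")
            return
        self.types[type_name] = object()   # placeholder for the type's proxy

# `uvm_object_utils(some_class) in two packages: the same string twice.
short = ToyFactory()
for name in ("some_class", "some_class"):
    short.register(name)

# Fully qualified names: two different strings.
qualified = ToyFactory()
for name in ("some_package::some_class", "some_other_package::some_class"):
    qualified.register(name)

print(len(short.warnings), len(qualified.warnings))  # 1 0
```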

Monday, November 23, 2015

Accessing Multiple Registers at Once

As I already mentioned, I gave a talk at DVCon Europe this year on how to implement burst accesses to memories modeled using UVM_REG. The motivation for that was that the register package can already handle bus protocols that don't support burst operation, but it requires more user guidance for protocols that do support it. A question that came up afterwards on the conference floor was what the best way to handle burst accesses to multiple registers might be. I tried to sketch out an answer on a piece of paper, but it was rather late in the day and I couldn't really gather my thoughts. I also have a difficult time expressing myself verbally when talking about abstract concepts. Talking is much more difficult than writing, because you don't get the chance to go back and iterate over certain aspects of the topic. Talking about coding problems is also particularly difficult to do when not in front of a computer. People who've worked with me know that I always like to have an editor window open and sketch out some pseudo-code when discussing something in more detail.

The handling of register bursts is a question that comes up from time to time, in places like the Accellera forum or StackOverflow. Since the person who asked me the question is also a reader of the blog, I thought it would be worth making a post out of it.

A solution would need to take two factors into account. First, it would need to be pragmatic, i.e. do the job with the least amount of code necessary. If you had asked me this question a while back, I would have stopped here. In the meantime, I've been toying with the idea of using the register abstraction layer as a means of achieving reuse (both lateral and vertical) of sequences, so you'll most probably be seeing more posts on the topic. The second factor I would thus consider important is portability, i.e. being able to take sequences from one project and use them in another.

As an example, let's take a simple design that has four registers, located at consecutive addresses:

class some_reg extends uvm_reg;
  rand uvm_reg_field FIELD0;
  rand uvm_reg_field FIELD1;
  rand uvm_reg_field FIELD2;
  rand uvm_reg_field FIELD3;

  // ...
endclass


class some_reg_block extends uvm_reg_block;
  rand some_reg SOME_REGS[4];

  virtual function void build();
    // ...
    foreach (SOME_REGS[i]) begin
      SOME_REGS[i].build();
      SOME_REGS[i].configure(this);
      default_map.add_reg(SOME_REGS[i], 'h4 * i);
    end
  endfunction

  // ...
endclass

When updating a register, we would use the built-in methods of uvm_reg to set the desired value and trigger a write:

class write_some_reg0 extends sequence_base;
  // ...

  virtual task body();
    uvm_status_e status;
    model.SOME_REGS[0].FIELD0.set('hff);
    model.SOME_REGS[0].update(status);
  endtask
endclass

Let's assume that the DUT has an AHB interface that supports burst accesses. This means that it's possible to access all four registers using a single AHB transaction. Converting from register accesses to bus items is usually done using a register adapter:

class reg_adapter extends uvm_reg_adapter;
  virtual function uvm_sequence_item reg2bus(const ref uvm_reg_bus_op rw);
    burst b = burst::type_id::create("burst");
    if (!b.randomize() with {
      address == rw.addr;
      kind == SINGLE;
      direction == (rw.kind == UVM_READ ? READ : WRITE); // == binds tighter than ?:
      data[0] == rw.data;
    })
      `uvm_fatal("RANDERR", "Randomization error")
    return b;
  endfunction

  // ...
endclass

I'll assume everybody is familiar with how an adapter works. If not, the UVM User Guide is a good resource to get you up to speed on how the register model is integrated. This adapter can only handle accessing one register at a time. We need some way of telling it that we actually want to access more registers.

In those forum threads, people typically recommend using the optional extension argument of the read(...) and write(...) tasks to tell the adapter that the access it's converting is actually a burst to multiple registers. The use model would be to have a class containing information about whether a register access is a burst:

class reg_burst_extension extends uvm_object;
  rand int unsigned num_regs;

  constraint valid_num_regs {
    num_regs inside { 1, 4 };
  }

  // ...
endclass

If num_regs is 1, then the access is a normal one; otherwise it's a burst. It's also a good idea to make the field of the extension random, to allow for more generic sequences. To write all four registers at once, we could set the values that we want our registers to take, construct an object of this class, set num_regs to 4 and pass it to the update(...) task:

class write_some_regs extends sequence_base;
  // ...

  virtual task body();
    uvm_status_e status;
    reg_burst_extension ext = reg_burst_extension::type_id::create("ext");
    ext.num_regs = 4;

    model.SOME_REGS[0].FIELD0.set('hff);
    model.SOME_REGS[1].FIELD1.set('hff);
    model.SOME_REGS[2].FIELD2.set('hff);
    model.SOME_REGS[3].FIELD3.set('hff);
    model.SOME_REGS[0].update(status, .extension(ext));
  endtask
endclass

The vanilla register adapter doesn't know anything about the extension we passed. We'll need a sub-class that can interpret the extra information and use it to generate a burst. If we don't pass an extension or pass an unsuitable extension, then we can just generate a SINGLE AHB transaction as before:

class ahb_reg_adapter extends vgm_ahb::reg_adapter;
  function new(string name = "ahb_reg_adapter");
    super.new(name);
  endfunction

  virtual function uvm_sequence_item reg2bus(const ref uvm_reg_bus_op rw);
    uvm_reg_item item = get_item();
    reg_burst_extension ext;

    if (item.extension == null || !$cast(ext, item.extension) || ext.num_regs == 1)
      return super.reg2bus(rw);

    // ...
  endfunction
endclass

If we do want to do a burst access, then we need to collect the information from all registers and store it inside the AHB transaction that we want to start:

class ahb_reg_adapter extends vgm_ahb::reg_adapter;
  // ...

  virtual function uvm_sequence_item reg2bus(const ref uvm_reg_bus_op rw);
    vgm_ahb::burst b = vgm_ahb::burst::type_id::create("burst");
    uvm_reg_item item = get_item();
    reg_burst_extension ext;
    uvm_reg regs[];
    uvm_reg_addr_t offset;
    uvm_reg_data_t data[];

    // ...

    offset = regs[0].get_offset(item.map);
    data = new[ext.num_regs];
    for (int i = 1; i < ext.num_regs; i++)
      regs[i] = item.map.get_reg_by_offset(offset + i*4);

    foreach (regs[i])
      data[i] = regs[i].get();

    if (!b.randomize() with {
      address == rw.addr;
      kind == vgm_ahb::INCR4;
      direction == (rw.kind == UVM_READ ? vgm_ahb::READ : vgm_ahb::WRITE); // == binds tighter than ?:
      foreach (data[i])
        data[i] == local::data[i];
    })
      `uvm_fatal("RANDERR", "Randomization error")
    return b;
  endfunction
endclass
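The loop above walks the address map in 4-byte steps, starting from the register that write(...) was called on. The same bookkeeping can be sketched in Python, with a dictionary standing in for the register map and its get_reg_by_offset(...) lookup. The names are hypothetical. The values assume each FIELDn is an 8-bit field at bit position 8*n, which the register definition above doesn't specify:

```python
REG_WIDTH_BYTES = 4

# Hypothetical register map: offset -> (register name, desired value).
reg_map = {0x4 * i: (f"SOME_REGS[{i}]", 0xFF << (8 * i)) for i in range(4)}

def collect_burst(reg_map, first_offset, num_regs):
    """Return the names and desired values of num_regs registers located at
    consecutive 4-byte offsets, starting with the one that was accessed."""
    names, data = [], []
    for i in range(num_regs):
        offset = first_offset + i * REG_WIDTH_BYTES
        if offset not in reg_map:
            raise LookupError(f"no register at offset {offset:#x}")
        name, value = reg_map[offset]
        names.append(name)
        data.append(value)
    return names, data

names, data = collect_burst(reg_map, 0x0, 4)
print(names, [hex(d) for d in data])
```

One design choice differs from UVM. get_reg_by_offset(...) returns null when nothing is mapped at an offset, whereas this sketch raises an exception, so a gap in the address map can't go unnoticed.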

This is the tried and true way of doing it. It's also pretty easy to implement. The problem with it, though, is that it's tightly coupled to the verification environment. Let's assume that we get a second variant of our DUT that is a bit more bare-bones and only has an APB interface. Ideally, we'd want to be able to run the same sequences (or a subset thereof) in this second verification environment. Accessing single registers isn't a problem, as these would be handled by the vanilla APB register adapter (code not shown for brevity). When starting the burst access sequence (the one with the extension), we'd still like to see all four registers getting accessed, albeit via four different APB transfers. This means we'd need a register adapter that can start four transactions in one go:

class apb_reg_adapter extends vgm_apb::reg_adapter;
  // ...

  virtual function uvm_sequence_item reg2bus(const ref uvm_reg_bus_op rw);
    // ...

    foreach (regs[i]) begin
      if (i == 0)
        continue;
      if (rw.kind == UVM_READ)
        fork
          automatic uvm_reg rg = regs[i];
          rg.read(status, data);
        join_none
      else
        fork
          automatic uvm_reg rg = regs[i];
          rg.write(status, rg.get());
        join_none
    end

    return super.reg2bus(rw);
  endfunction
endclass

The reg2bus(...) method is a function, so it can't block. It can also only return one bus transaction: the one corresponding to the register we called write(...) on. If we'd like to access the other three registers as well, one might optimistically think that the other accesses could be forked out. This could get us into a world of trouble with race conditions, because the order in which the accesses get processed isn't defined. It also doesn't work as expected, because the update(...) task returns before all accesses are finished. For writes this might not be such a big issue, but for reads it would be fatal, since we couldn't rely on the values stored in the registers being up-to-date. I didn't really investigate how to improve on this, since the whole idea seems silly. A register adapter isn't meant for this kind of operation. It can only start one bus transaction based on one register access, not more. This was all fine and dandy when that transaction could be a burst (as for AHB), but it falls apart when we need to translate sequences that try to access all four registers at once. This means we can't reliably run the sequences that use the extension mechanism in the APB verification environment, at least not while having them go through an adapter. They could still be reused if we employed a different means of translating register accesses: a register sequencer layered on top of the APB sequencer, with a translation sequence doing the conversion (more on this later).
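The timing problem can be illustrated outside of SystemVerilog with Python's asyncio, where starting a task without awaiting it is roughly analogous to fork ... join_none. This is only an analogy. The names are hypothetical and the delays are arbitrary. It does, however, show the caller getting control back while three of the four accesses are still in flight:

```python
import asyncio

async def bus_access(name, log, delay):
    await asyncio.sleep(delay)   # stands in for the time the bus access takes
    log.append(name)

async def update_with_forks(log, pending):
    # Like fork ... join_none: start the other accesses, but don't wait for them.
    for name in ("SOME_REGS[1]", "SOME_REGS[2]", "SOME_REGS[3]"):
        pending.append(asyncio.create_task(bus_access(name, log, delay=0.1)))
    # The access for the register update(...) was called on is the only one
    # that is actually waited for.
    await bus_access("SOME_REGS[0]", log, delay=0)

async def main():
    log, pending = [], []
    await update_with_forks(log, pending)
    seen_at_return = list(log)        # what the caller sees right after update
    await asyncio.gather(*pending)    # only now have all four accesses finished
    return seen_at_return, log

seen_at_return, final_log = asyncio.run(main())
print(seen_at_return)   # ['SOME_REGS[0]']
print(len(final_log))   # 4
```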

The main takeaway point, though, is that while using the extension is easy to set up for the initial DUT (the one with AHB), it becomes trickier to port it to any subsequent variants of the design that use different bus protocols, particularly so if the protocols don't intrinsically support burst accesses. Even for other protocols that do support burst accesses (e.g. AXI), we'd still need to create a sub-class of the corresponding register adapter that can extract the information contained in the extension.

The problem stems from the fact that we're trying to shoehorn our use case into an unsuitable abstraction. Calls to uvm_reg::read(...)/write(...) ultimately end up creating an abstract register access of type uvm_reg_item. Such a register item (which is a sub-class of uvm_sequence_item) can model anything from a small access that takes one bus cycle to a very big access that takes multiple bus cycles (also called a burst). Yet we're trying to model an access to four registers as an access to one of the registers, plus some side information that says whether it's actually a burst.

A better idea might be to avoid the extension altogether. Instead, we could create a register item "by hand", fill it with the appropriate information and send it out to be processed:

class write_some_regs extends sequence_base;
  // ...

  virtual task body();
    uvm_reg_item item;
    `uvm_create_on(item, model.default_map.get_sequencer());

    model.SOME_REGS[0].FIELD0.set('hff);
    model.SOME_REGS[1].FIELD1.set('hff);
    model.SOME_REGS[2].FIELD2.set('hff);
    model.SOME_REGS[3].FIELD3.set('hff);

    item.kind         = UVM_BURST_WRITE;
    item.offset       = model.SOME_REGS[0].get_offset();
    item.value        = new[4];
    foreach (item.value[i])
      item.value[i] = model.SOME_REGS[i].get();

    `uvm_send(item)
  endtask
endclass

Instead of starting a register item indirectly via a call to uvm_reg::write(...), we create one ourselves. We explicitly state that this is a burst access, by setting the kind field appropriately. The (misleadingly named) value field is actually an array that contains one element per burst transfer. Since we want to write to four registers, we set its size to 4 and its elements to the desired values of the registers.

This is one piece of the puzzle. Now we need to translate this uvm_reg_item to the bus transaction that the DUT needs to see. Trying to send this access through a register adapter might work for the AHB DUT, because the AHB adapter can start a single AHB transaction that is capable of representing the entire register item. Trying to send it through the APB adapter will lead to the same problem that we had before, namely that we can't start multiple APB transactions based on it.

The UVM User Guide shows us how to implement a different translation scheme, more sophisticated than the register adapter. As briefly mentioned above, it involves layering. As described in section 5.9.2.3 of the User Guide (UVM 1.1), we can have a register sequencer that serves as a landing pad for uvm_reg_items. A translation sequence running on the bus sequencer would get items from this register sequencer and convert them to bus transactions.

For AHB, this is pretty easy to write:

class reg_xlate_sequence extends uvm_reg_sequence #(uvm_sequence #(burst));
  // ...

  virtual task do_reg_item(uvm_reg_item rw);
    burst b = burst::type_id::create("burst");

    if (!b.randomize() with {
      if (rw.kind inside { UVM_READ, UVM_BURST_READ })
        direction == READ;
      else
        direction == WRITE;
      address == rw.offset;
      data.size() == rw.value.size();
      foreach (data[i])
        data[i] == rw.value[i];
    })
      `uvm_fatal("RNDERR", "Randomization error")
    `uvm_send(b)
  endtask
endclass

Our translation sequence extends the built-in uvm_reg_sequence, which already provides some facilities to perform translation (albeit based on a register adapter, which is the very thing we're trying to avoid). By overriding the do_reg_item(...) task, which gets called for each item that gets started on the register sequencer, we can implement our own scheme that generates one AHB transaction based on the contents of the uvm_reg_item to be converted. When creating this sequence, we need to specify the instance of the register sequencer and afterwards start it on the bus sequencer:

    vgm_ahb::reg_xlate_sequence reg2ahb_seq =
      vgm_ahb::reg_xlate_sequence::type_id::create("reg2ahb_seq");
    reg2ahb_seq.reg_seqr = reg_sequencer;
    uvm_config_db #(uvm_sequence_base)::set(ahb_agent.sequencer, "run_phase",
      "default_sequence", reg2ahb_seq);

For APB, the translation sequence is also pretty straightforward:

class reg_xlate_sequence extends uvm_reg_sequence #(uvm_sequence #(transfer));
  // ...

  virtual task do_reg_item(uvm_reg_item rw);
    transfer t = transfer::type_id::create("transfer");

    foreach (rw.value[i]) begin
      if (!t.randomize() with {
        if (rw.kind inside { UVM_READ, UVM_BURST_READ })
          direction == READ;
        else
          direction == WRITE;
        address == rw.offset + 4 * i;
        data == rw.value[i];
      })
        `uvm_fatal("RNDERR", "Randomization error")
      `uvm_send(t)
    end
  endtask
endclass
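Side by side, the only real difference between the two do_reg_item(...) implementations is how many bus items come out of one register item. Here is a Python sketch with simplified stand-ins for the UVM classes. RegItem and BusItem are hypothetical, and the data values assume each FIELDn is an 8-bit field at bit position 8*n:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RegItem:
    """Simplified stand-in for uvm_reg_item (hypothetical fields)."""
    kind: str                  # "READ", "WRITE", "BURST_READ" or "BURST_WRITE"
    offset: int
    value: List[int] = field(default_factory=list)  # one element per transfer

@dataclass
class BusItem:
    direction: str
    address: int
    data: List[int]

def direction_of(item):
    return "READ" if item.kind in ("READ", "BURST_READ") else "WRITE"

def to_ahb(item):
    """AHB-style translation: one burst carries every element of the item."""
    return [BusItem(direction_of(item), item.offset, list(item.value))]

def to_apb(item, bytes_per_transfer=4):
    """APB-style translation: one transfer per element, at consecutive addresses."""
    return [BusItem(direction_of(item), item.offset + bytes_per_transfer * i, [v])
            for i, v in enumerate(item.value)]

item = RegItem("BURST_WRITE", 0x0, [0xFF, 0xFF00, 0xFF0000, 0xFF000000])
print(len(to_ahb(item)), [hex(t.address) for t in to_apb(item)])
# 1 ['0x0', '0x4', '0x8', '0xc']
```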

Now you might ask what the advantage of doing it this way is, as opposed to using the extension argument. Clearly, we could save ourselves the trouble of creating our own uvm_reg_item in the register burst sequence (which takes up quite a bit of code, though even that could be encapsulated in a task) and just pass an extension to a call to write(...)/read(...) as we did before. The downside, though, would be that we would need a translation sequence that can extract the extension, which would create an unnecessary dependency. If we were more diligent in creating our register item, we could even save ourselves the trouble of having to start a translation sequence for APB. If we filled a few more of its fields (like local_map and some others), the register package itself could handle splitting a burst into multiple transfers and run each of those through a register adapter. I didn't look too much into this, though, because I see this idea of creating our own uvm_reg_item for a register burst as a stepping stone toward the next idea.

We could conceptually think of our burst access that covers multiple registers as a memory burst starting at a certain offset (in our case the offset of the first register) that is of a certain size (in our case 4). The uvm_mem class provides, aside from the write(...) and read(...) tasks, the burst_write(...) and burst_read(...) tasks, which trigger bursts. We could shadow the registers with a dummy memory, which we would only use to start bursts. The register package would handle the heavy lifting of creating a uvm_reg_item based on our desired access.

Defining such a memory is trivial:

class shadow_mem extends uvm_mem;
  function new(string name = "shadow_mem");
    super.new(name, 4, 32);
  endfunction

  // ...
endclass

Since our register model is probably generated from a specification, we don't want to touch that code. Instead, we can instantiate the shadow memory inside a sub-class and make sure that we instantiate this class in our verification environment instead of the original one:

class ext_reg_block extends regs::some_reg_block;
  shadow_mem SOME_REGS_MEM;

  virtual function void build();
    super.build();

    SOME_REGS_MEM = shadow_mem::type_id::create("SOME_REGS_MEM");
    SOME_REGS_MEM.configure(this);

    default_map.add_mem(SOME_REGS_MEM, SOME_REGS[0].get_offset(default_map));
  endfunction

  // ...
endclass

We'll get warnings that the memory and the registers overlap, but these can be silenced.

We could call burst_write(...) on this memory with the appropriate arguments to trigger a burst that accesses all four registers. Since we have quite a few arguments to pass, this could get tedious, so we can define a helper task:

class shadow_mem extends uvm_mem;
  // ...

  virtual task update_regs(
    output uvm_status_e      status,
    input  uvm_path_e        path   = UVM_DEFAULT_PATH,
    input  uvm_reg_map       map = null,
    input  uvm_sequence_base parent = null,
    input  int               prior = -1,
    input  uvm_object        extension = null,
    input  string            fname = "",
    input  int               lineno = 0
  );
    uvm_reg_data_t values[4];
    ext_reg_block model;

    if (!$cast(model, get_parent()))
      `uvm_fatal("CASTERR", "Cast error")

    foreach (values[i])
      values[i] = model.SOME_REGS[i].get();

    // The offset is relative to the start of the memory, so the burst
    // starts at location 0, which shadows SOME_REGS[0]
    burst_write(status, 0, values, path, map,
      parent, prior, extension, fname, lineno);
  endtask
endclass

The update_regs(...) task is similar to burst_write(...), but it doesn't require us to pass an offset or the data values to be written. These are computed based on the desired values of the registers that the memory shadows. A similar task could be defined to read all the registers.
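The offset passed to uvm_mem::burst_write(...) counts memory locations from the start of the memory. It is not an address in the map. In this example both happen to be 0, since SOME_REGS[0] sits at address 0x0. The toy Python model below shows how the two relate once the registers live at a non-zero address. ShadowMem and its methods are hypothetical, and the base address 0x100 is made up:

```python
class ShadowMem:
    """Toy model of the shadow memory: mapped at base_address (the address of
    SOME_REGS[0]) with one 4-byte location per shadowed register."""

    def __init__(self, base_address, num_words=4, word_bytes=4):
        self.base_address = base_address
        self.num_words = num_words
        self.word_bytes = word_bytes

    def burst_addresses(self, mem_offset, num_values):
        """Bus addresses of a burst starting at memory location mem_offset
        (counted in locations, relative to the start of the memory)."""
        if mem_offset + num_values > self.num_words:
            raise ValueError("burst runs past the end of the memory")
        return [self.base_address + (mem_offset + i) * self.word_bytes
                for i in range(num_values)]

    def update_regs(self, reg_values):
        """Like update_regs(...): always start at location 0, which shadows
        SOME_REGS[0], and pair each bus address with the register's value."""
        return list(zip(self.burst_addresses(0, len(reg_values)), reg_values))

mem = ShadowMem(base_address=0x100)   # hypothetical non-zero base address
print([(hex(a), v) for a, v in mem.update_regs([1, 2, 3, 4])])

try:
    mem.burst_addresses(0x100, 4)     # a map address misused as a memory offset
except ValueError as err:
    print(err)                        # burst runs past the end of the memory
```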

Our sequence that does a register burst would look like this:

class write_some_regs extends sequence_base;
  // ...

  virtual task body();
    uvm_status_e status;

    model.SOME_REGS[0].FIELD0.set('hff);
    model.SOME_REGS[1].FIELD1.set('hff);
    model.SOME_REGS[2].FIELD2.set('hff);
    model.SOME_REGS[3].FIELD3.set('hff);

    model.SOME_REGS_MEM.update_regs(status);
  endtask
endclass

Integrating this sequence is even more straightforward than before. For APB, we don't even need the register sequencer; the register adapter will suffice. For AHB, we could either have a register sequencer layered on the AHB sequencer (as in the previous section) or use a custom frontdoor sequence (as described in my DVCon Europe paper).

I've omitted a lot of the infrastructure code to keep the post focused. You can download the full example from SourceForge.

I don't consider the third approach, using a shadow memory, to be much more complicated than the first one, where we were using the extension argument. Sure, it requires a bit more code to declare the shadow memory, especially the convenience tasks, but even that could be abstracted and made reusable. Layering the shadow memory on bus protocols that don't support bursts is effortless (assuming that a register adapter is already available with the protocol UVC), because UVM_REG already contains a lot of code to handle this. It's only for bus protocols that support burst operation that we need to make sure that register/memory bursts get converted properly. A good UVC for such a protocol will also provide infrastructure for this, in the form of a translation sequence.

When using a custom extension argument to implement such register bursts, the translation scheme always has to be tailored to support this, by extending the generic register adapter or translation sequence to extract the information stored in the extension. It's also rather unintuitive that simpler protocols (that don't support burst operation) cause more headaches. Using the extension argument in this way might also interfere with other uses for it, where a user needs to pass in other side information (such as protection levels) to be translated.

The decision of which scheme to use in a given verification environment depends on whether portability (due to lateral or vertical reuse) is important or not.

If you have any other approaches to handling register bursts, I'd love to hear them in the comments section below.

Tuesday, November 10, 2015

How Do I Transfer Thee? Let Me Count the Ways

I'll be giving a talk this week at DVCon Europe about how to use the UVM REG classes to verify memory sub-systems. In particular, I'll focus on how to translate from abstract memory burst accesses (the kind started by calling uvm_mem::burst_read/write(...)) to bus transactions. This isn't as easy as translating register accesses, where an adapter is enough, mainly because an adapter can't process accesses that are bigger than the underlying bus width.

As an example, let's look at what happens when trying to start a 16-byte memory burst on a 32-bit AHB bus. This could be represented as sequences of AHB bursts in quite a few ways:

  • an INCR4 WORD burst
  • an INCR8 HALFWORD burst
  • an INCR16 BYTE burst

We could also swap out the fixed-length INCR* bursts for INCR bursts of unspecified length and send the appropriate number of transfers. We could also represent the INCR4 burst as four individual SINGLE WORD bursts (and do the same for the HALFWORD and BYTE bursts). Even among these four WORD bursts of length 1, we could send some as SINGLE bursts and some as INCR bursts of length 1. We don't even need to start bursts of the same width; we could send two HALFWORDS, followed by four BYTES, followed by two WORDS. The main point to take away from this paragraph is that there are a lot of possible ways to transfer 16 bytes.

This made me wonder how many there are exactly. I tried to work it out on paper using combinatorics, but this turned out to be pretty complicated. Since I wasn't smart enough to do the math, I decided to take the engineering approach and write a program to do the counting for me.

Finding all the ways to split a memory burst into AHB bursts is something we can model as a constraint problem. First, we need to model an AHB burst and the constraints on its fields:

class ahb_burst;
  rand bit [31:0] address;
  rand enum { SINGLE, INCR, WRAP4, INCR4, WRAP8, INCR8, WRAP16, INCR16 } kind;
  rand enum { BYTE, HALFWORD, WORD } size;
  rand int unsigned incr_length;

  rand int num_transfers;
  rand int unsigned num_bytes;

  // ...
endclass

The num_transfers and num_bytes fields are there to help keep track of how many bytes we're transferring with a certain burst. The AHB spec imposes the following constraints on the fields:

class ahb_burst;
  // ...

  constraint aligned_address {
    size == WORD -> address[1:0] == 0;
    size == HALFWORD -> address[0] == 0;
  }

  constraint legal_incr_size {
    incr_length > 0;
  }

  constraint compute_num_transfers {
    kind == SINGLE -> num_transfers == 1;
    kind == INCR -> num_transfers == incr_length;
    kind inside { WRAP4, INCR4 } -> num_transfers == 4;
    kind inside { WRAP8, INCR8 } -> num_transfers == 8;
    kind inside { WRAP16, INCR16 } -> num_transfers == 16;
  }

  constraint compute_num_bytes {
    num_bytes == num_transfers * 2 ** size;
  }
endclass
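To cross-check these constraints, the following Python sketch enumerates every (kind, size, num_transfers) combination that satisfies aligned_address, legal_incr_size, compute_num_transfers and compute_num_bytes for a single burst that has to move a given number of bytes from a given address. The burst_options function is a hypothetical helper, not part of the post's code. For 16 bytes at 0x0 it lists nine options: INCR, WRAPx and INCRx for each of the three sizes.

```python
KINDS = ["SINGLE", "INCR", "WRAP4", "INCR4", "WRAP8", "INCR8", "WRAP16", "INCR16"]
SIZES = ["BYTE", "HALFWORD", "WORD"]  # transfer width is 2 ** index bytes

# compute_num_transfers: fixed-length kinds pin down the number of transfers.
FIXED_TRANSFERS = {"SINGLE": 1, "WRAP4": 4, "INCR4": 4, "WRAP8": 8,
                   "INCR8": 8, "WRAP16": 16, "INCR16": 16}


def burst_options(address, num_bytes):
    """List every (kind, size, num_transfers) that moves exactly `num_bytes`
    starting at `address`, following the ahb_burst constraints."""
    options = []
    for size_index, size in enumerate(SIZES):
        width = 2 ** size_index
        if address % width != 0:          # aligned_address
            continue
        if num_bytes % width != 0:        # compute_num_bytes: whole transfers only
            continue
        num_transfers = num_bytes // width
        for kind in KINDS:
            if kind == "INCR":
                # legal_incr_size: any positive incr_length is allowed.
                options.append((kind, size, num_transfers))
            elif FIXED_TRANSFERS[kind] == num_transfers:
                options.append((kind, size, num_transfers))
    return options


for option in burst_options(0x0, 16):
    print(option)
```

Like the SystemVerilog class, this doesn't restrict where wrapping bursts may start.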

We need another class to model the 16-byte memory burst, which will contain instances of AHB bursts:

class mem_burst_16;
  rand ahb_burst bursts[];

  constraint legal_size {
    bursts.size() <= 16;
    bursts.size() > 0;
  }

  function void pre_randomize();
    bursts = new[16];
    foreach (bursts[i])
      bursts[i] = new();
  endfunction

  // ...
endclass

Since randomization can't allocate new objects, we need to pre-allocate the AHB bursts. The solver can always throw away the bursts it doesn't need. To keep things simple, let's assume that the first address is 0x0. We need to constrain the bursts to have incrementing addresses. This is important, because the address at which a burst starts determines which transfer sizes it can use. For example, if the first burst is a SINGLE BYTE, then the following burst can't use HALFWORD or WORD transfers, because it would start at address 0x1, which isn't aligned.

class mem_burst_16;
  constraint addresses {
    bursts[0].address == 0;
    foreach (bursts[i])
      if (i > 0)
        bursts[i].address == bursts[i-1].address + bursts[i-1].num_bytes;
  }

  // ...
endclass

We also need to constrain the sum of bytes sent in the memory burst to be exactly 16. We'll need a helper field for this:

class mem_burst_16;
  protected rand int unsigned bursts_num_bytes[];

  constraint max_16_bytes {
    foreach (bursts[i])
      bursts[i].kind == ahb_burst::INCR -> bursts[i].incr_length <= 16;

    bursts_num_bytes.size() == bursts.size();
    foreach (bursts[i])
      bursts_num_bytes[i] == bursts[i].num_bytes;
    bursts_num_bytes.sum() == 16;
  }

  // ...
endclass
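Taken together, the addresses and max_16_bytes constraints say that a list of bursts is a legal solution when each burst starts right where the previous one ended, respects its own alignment, and the byte counts add up to 16. The hypothetical Python checker below (bursts given as (kind, width_in_bytes, num_transfers) tuples) is a sketch of that reasoning, handy for sanity-checking individual solutions returned by the solver.

```python
from itertools import accumulate

# compute_num_transfers for the fixed-length kinds; INCR uses incr_length.
FIXED_TRANSFERS = {"SINGLE": 1, "WRAP4": 4, "INCR4": 4, "WRAP8": 8,
                   "INCR8": 8, "WRAP16": 16, "INCR16": 16}


def is_legal_mem_burst(bursts, total=16):
    """Check (kind, width_in_bytes, num_transfers) tuples against the
    mem_burst_16 constraints. Like the SystemVerilog model, it doesn't
    restrict where wrapping bursts may start."""
    if not 0 < len(bursts) <= 16:                       # legal_size
        return False
    num_bytes = [width * n for _, width, n in bursts]   # compute_num_bytes
    # addresses: the first burst starts at 0x0, every other one right after
    # the previous burst's last byte.
    addresses = [0] + list(accumulate(num_bytes))[:-1]
    for (kind, width, n), address in zip(bursts, addresses):
        if kind == "INCR":
            if not 0 < n <= 16:                         # legal_incr_size, max_16_bytes
                return False
        elif FIXED_TRANSFERS[kind] != n:                # compute_num_transfers
            return False
        if address % width != 0:                        # aligned_address
            return False
    return sum(num_bytes) == total                      # bursts_num_bytes.sum() == 16


print(is_legal_mem_burst([("INCR4", 4, 4)]))                    # True
print(is_legal_mem_burst([("SINGLE", 1, 1), ("INCR", 2, 1)]))   # False: HALFWORD at 0x1
```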

Now we can start randomizing memory bursts. The idea is to keep a list of the combinations we generate. If we encounter one we haven't seen before, we add it to the list; if it's already in the list, we throw it away. We do this for some large number of iterations. This approach isn't very efficient, because we're bound to generate the same combinations many times and have to do a lot of discarding. Also, the more our list grows, the more expensive it gets to check whether a newly generated combination is already in it.

I won't show the code for the search here (but you can get it from SourceForge). I tried running it and, as expected, it was ridiculously slow. Not only that, the tool started to crash at some point. I was lucky enough to get about 10000 iterations done, and I wound up with around 500 unique burst combinations. That's a respectable number, but I wasn't convinced that was all of them. Trying to run more iterations led to the tool crashing again.
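For illustration only, here's roughly what such a generate-and-discard loop looks like in Python. It isn't the SourceForge code: instead of calling a constraint solver, the hypothetical random_split helper builds each solution by repeatedly picking one of the legal bursts at the current address (so combinations aren't drawn uniformly), and the results go into a set, which turns the duplicate check into a hash lookup rather than a scan of an ever-growing list. Duplicates are still generated and discarded, which is the core inefficiency.

```python
import random

SIZES = {"BYTE": 1, "HALFWORD": 2, "WORD": 4}
# Fixed-length kinds and their number of transfers; INCR can have any length.
FIXED_TRANSFERS = {"SINGLE": 1, "WRAP4": 4, "INCR4": 4, "WRAP8": 8,
                   "INCR8": 8, "WRAP16": 16, "INCR16": 16}


def legal_bursts(address, remaining):
    """All (kind, size, num_transfers) bursts that can start at `address`
    without moving more than `remaining` bytes."""
    result = []
    for size, width in SIZES.items():
        if address % width:
            continue                                  # misaligned transfer size
        for n in range(1, remaining // width + 1):
            result.append(("INCR", size, n))
            for kind, fixed in FIXED_TRANSFERS.items():
                wraps = kind.startswith("WRAP")
                # Wrapping bursts must start on their own boundary.
                if fixed == n and (not wraps or address % (width * n) == 0):
                    result.append((kind, size, n))
    return result


def random_split(total, rng):
    """Build one random, legal way of moving `total` bytes starting at 0x0."""
    address, bursts = 0, []
    while address < total:
        burst = rng.choice(legal_bursts(address, total - address))
        bursts.append(burst)
        address += SIZES[burst[1]] * burst[2]
    return tuple(bursts)


def random_search(total, iterations, seed=0):
    """Generate-and-discard: keep only combinations not seen before."""
    rng = random.Random(seed)
    seen = set()                              # O(1) average membership check
    for _ in range(iterations):
        seen.add(random_split(total, rng))    # duplicates are simply discarded
    return seen


print(len(random_search(total=4, iterations=20000)))
```

For a 4-byte transfer there are only 63 combinations under this model, so the set stops growing quickly; for 16 bytes it keeps growing for a very long time.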

It was clear that I was going about it all wrong. The idea of describing the state space of our problem using constraints is a good one, but the way we were searching for solutions was flawed. We were basically throwing darts at the dartboard and trying to hit all possible points. We'd need a more efficient way of throwing the darts.

I remembered reading about Prolog a while back. Programs written in Prolog are declarative by nature (like our constraints), which made it seem like a good candidate for our problem. It's very well suited to expressing logical relationships, and by using the clpfd (constraint logic programming over finite domains) library we can also express integer constraints like the ones we used in SystemVerilog. I don't want this post to become a Prolog tutorial, so I won't explain the code in too much detail.

The first thing we need is to be able to generate a single AHB burst:

:- use_module(library(clpfd)).

gen_burst(X) :-
    Kind in 0..7,

    NumTransfers #> 0,
    NumTransfers #=< 16,
    Kind #= 0 #==> NumTransfers #= 1,
    Kind in 2..3 #==> NumTransfers #= 4,
    Kind in 4..5 #==> NumTransfers #= 8,
    Kind in 6..7 #==> NumTransfers #= 16,

    Size in 0..2,

    NumBytes #= NumTransfers * 2 ^ Size,

    Address in 0..15,
    Size #= 1 #==> mod(Address, 2) #= 0,
    Size #= 2 #==> mod(Address, 4) #= 0,

    X = ahb_burst(Kind, Size, Address, NumTransfers, NumBytes).

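Since clpfd only works with integers, we can't use nice enumerated names like INCR4 or HALFWORD in our constraints. Instead, we use their integer encodings, which follow the order of the SystemVerilog enums: SINGLE = 0, INCR = 1, WRAP4 = 2, INCR4 = 3, WRAP8 = 4, INCR8 = 5, WRAP16 = 6 and INCR16 = 7 for the kind, and BYTE = 0, HALFWORD = 1 and WORD = 2 for the size. This costs us a bit of readability.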
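Since these encodings are easy to mix up, it can help to mirror gen_burst(...) in a second language. The Python sketch below (not part of the original code) brute-forces the same finite domains and keeps only the tuples that satisfy the same constraints, using the encodings listed above. Because INCR (Kind 1) leaves NumTransfers free in 1..16, every aligned (Address, Size) pair yields 1 + 16 + 6 = 23 bursts, and there are 16 + 8 + 4 = 28 such pairs, for 644 solutions in total.

```python
from itertools import product

# Integer encodings, in the same order as the SystemVerilog enums.
KIND = {"SINGLE": 0, "INCR": 1, "WRAP4": 2, "INCR4": 3,
        "WRAP8": 4, "INCR8": 5, "WRAP16": 6, "INCR16": 7}
SIZE = {"BYTE": 0, "HALFWORD": 1, "WORD": 2}


def gen_burst_solutions():
    """Yield (Kind, Size, Address, NumTransfers, NumBytes) tuples that satisfy
    the same constraints as gen_burst/1."""
    for kind, size, address, num_transfers in product(
            range(8), range(3), range(16), range(1, 17)):
        # Fixed-length kinds pin down NumTransfers; INCR (Kind 1) leaves it free.
        if kind == KIND["SINGLE"] and num_transfers != 1:
            continue
        if kind in (KIND["WRAP4"], KIND["INCR4"]) and num_transfers != 4:
            continue
        if kind in (KIND["WRAP8"], KIND["INCR8"]) and num_transfers != 8:
            continue
        if kind in (KIND["WRAP16"], KIND["INCR16"]) and num_transfers != 16:
            continue
        # HALFWORD and WORD transfers must be aligned.
        if size == SIZE["HALFWORD"] and address % 2 != 0:
            continue
        if size == SIZE["WORD"] and address % 4 != 0:
            continue
        yield (kind, size, address, num_transfers, num_transfers * 2 ** size)


solutions = list(gen_burst_solutions())
print(len(solutions))  # 644
```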

Now that we've described what an AHB burst is, we can group several of them together to form a memory burst:

gen_mem_burst(X) :-
    between(1, 16, Len),
    length(Bursts, Len),

    inst_ahb_burst(Bursts),

    % ...


inst_ahb_burst([]).
inst_ahb_burst([X|X1]) :- gen_burst(X), inst_ahb_burst(X1).

We declare a list called Bursts, which contains between 1 and 16 elements. The inst_ahb_burst(...) predicate fills the list with AHB bursts using recursion. Next, we constrain the address of the first burst to be 0x0:

gen_mem_burst(X) :-
    % ...

    nth0(0, Bursts, Burst0),
    constrain_address(Burst0, 0),

    % ...


constrain_address(Burst, Address) :-
    get_address(Burst, BurstAddress),
    BurstAddress #= Address,

    get_kind(Burst, Kind),
    get_num_bytes(Burst, NumBytes),
    Kind in 2 \/ 4 \/ 6 #==>
        Address mod NumBytes #= 0.

We've gotten a bit ahead of ourselves here by declaring a generic constrain_address(...) predicate. The idea is to use it to constrain the address of a burst to a given value. For wrapping bursts, we also need to make sure that the address is aligned to the total number of bytes in the burst (i.e. to the wrap boundary); otherwise we'll miss some addresses. This is a constraint we missed in our SystemVerilog code. Without it, we might end up with a sequence beginning with a SINGLE HALFWORD to address 0x0 followed by a WRAP4 BYTE to address 0x2. The wrapping burst would access addresses 0x2, 0x3, 0x0 and 0x1, as opposed to its INCR4 counterpart, which would access 0x2, 0x3, 0x4 and 0x5. Since we want to access every address exactly once, we need to exclude such situations.

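The difference between the two bursts in this example comes from how AHB computes beat addresses: an incrementing burst keeps adding the transfer size, while a wrapping burst stays inside a block of (number of beats × transfer size) bytes aligned to that same size. The hypothetical beat_addresses helper below is a minimal Python sketch of that rule.

```python
def beat_addresses(start, size_bytes, beats, wrapping):
    """Addresses touched by each beat of an AHB INCRx (wrapping=False)
    or WRAPx (wrapping=True) burst."""
    if not wrapping:
        return [start + i * size_bytes for i in range(beats)]
    block = beats * size_bytes            # wrap boundary in bytes
    base = start - start % block          # aligned block the burst wraps inside
    return [base + (start - base + i * size_bytes) % block for i in range(beats)]


print(beat_addresses(0x2, 1, 4, wrapping=True))   # WRAP4 BYTE: [2, 3, 0, 1]
print(beat_addresses(0x2, 1, 4, wrapping=False))  # INCR4 BYTE: [2, 3, 4, 5]
```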
Using the propagate_address(...) predicate, we constrain the address of each subsequent burst based on the number of bytes transferred before it:

gen_mem_burst(X) :-
    % ...

    propagate_address(Bursts),

    % ...


propagate_address([_]).
propagate_address([X0, X1 | X]) :-
    get_address(X0, Address0),
    get_num_bytes(X0, NumBytes0),
    constrain_address(X1, Address0 + NumBytes0),
    propagate_address([X1|X]).

This is where the constrain_address(...) predicate becomes useful again. If it didn't restrict where wrapping bursts are allowed to start, we would end up skipping some addresses.

The last constraint we need is that the total number of bytes is 16:

gen_mem_burst(X) :-
    % ...

    get_burst_num_bytes(Bursts, NumBytes),
    sum(NumBytes, #=, 16),

    X = mem_burst(Len, Bursts).

Now we can write a predicate that counts (and optionally prints) the ways we can transfer 16 bytes. At first I tried doing this in one go, but it was too much for my computer to handle. To make it more manageable, we can count in separate bins, depending on the number of AHB bursts used. Here's the count(...) predicate:

count(N, R) :-
    gen_mem_burst(X),
    get_num_ahb_bursts(X, N),
    findall(X, label_mem_burst(X), Z),
    %maplist(print_mem_burst, Z),  % uncomment to see the solutions
    length(Z, R).

When starting a single AHB burst to transfer all 16 bytes, it could be an INCR4 WORD burst, an INCR8 HALFWORD burst or an INCR16 BYTE burst. Instead of INCR*, it could just as well be a WRAP* burst or an INCR burst of undefined length with the corresponding number of transfers. This gives us a total of 9 combinations. With two AHB bursts, there are 115 possible combinations; with three, there are already 1591. Here's a table showing how many possible combinations there are for each number of bursts, along with the total:

Num Bursts    Num Combinations
1             9
2             115
3             1591
4             12584
5             68499
6             270482
7             817974
8             1934737
9             3591184
10            5273632
11            6506624
12            5639936
13            3676160
14            1748992
15            507904
16            65536
Total         30115959
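The per-bin counts can also be obtained without a constraint solver. The Python sketch below swaps in a different technique, dynamic programming over the next address to transfer, using the same model as the Prolog code above with the first version of constrain_address(...) (wrapping bursts must start on their own boundary). The hypothetical ways_from(address) function returns a Counter that maps a number of bursts to the number of ways of transferring the remaining bytes. It gives 9, 115 and 1591 for one, two and three bursts and 65536 for sixteen, in line with the corresponding rows of the table.

```python
from collections import Counter
from functools import lru_cache

TOTAL = 16              # bytes in the memory burst
WIDTHS = (1, 2, 4)      # BYTE, HALFWORD, WORD


def burst_choices(address, num_bytes):
    """Number of (kind, size) pairs for one AHB burst that moves exactly
    `num_bytes` starting at `address`."""
    count = 0
    for width in WIDTHS:
        if address % width or num_bytes % width:
            continue                    # misaligned, or not whole transfers
        transfers = num_bytes // width
        if transfers == 1:
            count += 2                  # SINGLE, or INCR with one transfer
        elif transfers in (4, 8, 16):
            count += 2                  # INCR, or INCR4/INCR8/INCR16
            if address % num_bytes == 0:
                count += 1              # WRAP4/WRAP8/WRAP16 on its boundary
        else:
            count += 1                  # INCR of undefined length only
    return count


@lru_cache(maxsize=None)
def ways_from(address):
    """Counter mapping a number of bursts to the number of ways of
    transferring the bytes from `address` up to TOTAL."""
    if address == TOTAL:
        return Counter({0: 1})
    result = Counter()
    for num_bytes in range(1, TOTAL - address + 1):
        choices = burst_choices(address, num_bytes)
        for bursts, ways in ways_from(address + num_bytes).items():
            result[bursts + 1] += choices * ways
    return result


table = ways_from(0)
for num_bursts in sorted(table):
    print(num_bursts, table[num_bursts])
```

Memoizing on the address is what makes this fast: there are only 16 distinct subproblems, so the counts come back instantly instead of every combination having to be enumerated.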

Now that's a ridiculous number of possibilities. If it took one second to simulate each of these scenarios, we'd need to leave our simulator running for about a year (30,115,959 seconds is roughly 349 days). When they say exhaustive verification is impossible these days, they really aren't kidding. Trying out all of them when verifying an AHB design doesn't necessarily make sense, but it's interesting to see just how many there are.

But wait, there's more... We made a logical error regarding wrapping bursts. A wrapping burst doesn't need to start at the address following the previous burst (or at 0x0 if it's the first one); it just needs to cover the block that begins at that address. For example, to read addresses 0x0, 0x4, 0x8 and 0xC, we could perform a WRAP4 WORD burst starting at any of these addresses. To reflect this, we need to fix the constrain_address(...) predicate:

constrain_address(Burst, Address) :-
    get_address(Burst, BurstAddress),
    get_kind(Burst, Kind),

    % Incremental bursts
    Kind in 0..1 \/ 3 \/ 5 \/ 7 #==> BurstAddress #= Address,

    % Wrapping bursts
    get_num_bytes(Burst, NumBytes),
    Kind in 2 \/ 4 \/ 6 #==>
        Address mod NumBytes #= 0 #/\
        BurstAddress #>= Address #/\
        BurstAddress #< Address + NumBytes.
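To see what the relaxed rule allows, here's a hypothetical Python helper that lists the start addresses a wrapping burst may use when it has to cover the block that begins at the next untransferred address. It combines the revised constraint (the block must be aligned to the burst's total size, and the start address must lie inside it) with the size alignment from gen_burst(...).

```python
def wrap_start_addresses(next_address, size_bytes, beats):
    """Legal start addresses for a WRAPx burst that covers the block
    [next_address, next_address + beats * size_bytes)."""
    num_bytes = beats * size_bytes
    if next_address % num_bytes != 0:
        return []                         # block isn't on a wrap boundary
    return [a for a in range(next_address, next_address + num_bytes)
            if a % size_bytes == 0]       # each start must be size-aligned


print(wrap_start_addresses(0x0, 4, 4))   # WRAP4 WORD: [0, 4, 8, 12]
print(wrap_start_addresses(0x2, 1, 4))   # WRAP4 BYTE: [] (0x2 isn't 4-aligned)
```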

This will increase the number of burst combinations we have. I'll leave it as an exercise for interested readers to compute just how many this is.

What does this have to do with UVM REG? Well, if we used a classical register adapter to convert this memory burst, we could only generate 16 of the many possible combinations: four WORD bursts, each of which could be either SINGLE or INCR with one transfer (2^4 = 16). If you want to see how to get around this limitation and generate any legal combination, check out my DVCon paper.
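The count of 16 follows from four independent two-way choices, which a couple of lines of Python can confirm:

```python
from itertools import product

# Each of the four WORD accesses is issued either as a SINGLE burst or as an
# INCR burst with one transfer; the choices are independent of each other.
choices = ["SINGLE", "INCR (1 transfer)"]
adapter_sequences = list(product(choices, repeat=4))
print(len(adapter_sequences))  # 16 == 2 ** 4
```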