Monday, November 23, 2015

Accessing Multiple Registers at Once

As I already mentioned, I gave a talk at DVCon Europe this year on how to implement burst accesses to memories modeled using UVM_REG. The motivation for that was that the register package can already handle bus protocols that don't support burst operation, but it requires more user guidance for protocols that do support it. A question that came up afterwards on the conference floor was what the best way to handle burst accesses to multiple registers might be. I tried to sketch out an answer on a piece of paper, but it was rather late in the day and I couldn't really gather my thoughts. I also have a difficult time expressing myself verbally when talking about abstract concepts. Talking is much more difficult than writing, because you don't get the chance to go back and iterate over certain aspects of the topic. Talking about coding problems is also particularly difficult to do when not in front of a computer. People who've worked with me know that I always like to have an editor window open and sketch out some pseudo-code when discussing something in more detail.

The handling of register bursts is a question that comes up from time to time, in places like the Accellera forum or StackOverflow. Since the person who asked me the question is also a reader of the blog, I thought it would be worth making a post out of it.

A solution would need to take two factors into account. It would need to be pragmatic, i.e. do the job with the least amount of code necessary. If you had asked me this question a little while back, I would have stopped here. In the meantime, I've been toying with the idea of using the register abstraction layer as a means of achieving reuse (both lateral and vertical) of sequences. Most probably you'll be seeing more posts on the topic. The second factor I would thus consider important is portability, i.e. being able to take sequences from one project and use them in another.

As an example, let's take a simple design that has four registers, located at consecutive addresses:

class some_reg extends uvm_reg;
  rand uvm_reg_field FIELD0;
  rand uvm_reg_field FIELD1;
  rand uvm_reg_field FIELD2;
  rand uvm_reg_field FIELD3;

  // ...
endclass


class some_reg_block extends uvm_reg_block;
  rand some_reg SOME_REGS[4];

  virtual function void build();
    // ...
    foreach (SOME_REGS[i]) begin
      SOME_REGS[i].build();
      SOME_REGS[i].configure(this);
      default_map.add_reg(SOME_REGS[i], 'h4 * i);
    end
  endfunction

  // ...
endclass

When updating a register, we would use the built-in methods of uvm_reg to set the desired value and trigger a write:

class write_some_reg0 extends sequence_base;
  // ...

  virtual task body();
    uvm_status_e status;
    model.SOME_REGS[0].FIELD0.set('hff);
    model.SOME_REGS[0].update(status);
  endtask
endclass

Let's assume that the DUT has an AHB interface that supports burst accesses. This means that it's possible to access all four registers using a single AHB transaction. Converting from register accesses to bus items is usually done using a register adapter:

class reg_adapter extends uvm_reg_adapter;
  virtual function uvm_sequence_item reg2bus(const ref uvm_reg_bus_op rw);
    burst b = burst::type_id::create("burst");
    if (!b.randomize() with {
      address == rw.addr;
      kind == SINGLE;
      // Parenthesized: '==' binds tighter than '?:', so the unparenthesized form parses differently
      direction == ((rw.kind == UVM_READ) ? READ : WRITE);
      data[0] == rw.data;
    })
      `uvm_fatal("RANDERR", "Randomization error")
    return b;
  endfunction

  // ...
endclass

I'll assume everybody is familiar with how an adapter works. If not, the UVM User Guide is a good resource to get you up to speed on how the register model is integrated. This adapter can only handle accessing one register at a time. We need some way of telling it that we actually want to access more registers.
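
For readers who haven't written one, here is a minimal sketch of a complete single-access adapter, including the bus2reg(...) direction that was elided above. The sketch_burst item and its fields are hypothetical stand-ins for whatever the protocol UVC provides; only uvm_reg_adapter and uvm_reg_bus_op come from UVM itself.

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

typedef enum { SKETCH_READ, SKETCH_WRITE } sketch_direction_e;

// Hypothetical stand-in for the protocol UVC's bus item.
class sketch_burst extends uvm_sequence_item;
  rand bit [31:0]         address;
  rand sketch_direction_e direction;
  rand bit [31:0]         data[];

  `uvm_object_utils(sketch_burst)

  function new(string name = "sketch_burst");
    super.new(name);
  endfunction
endclass

class sketch_reg_adapter extends uvm_reg_adapter;
  `uvm_object_utils(sketch_reg_adapter)

  function new(string name = "sketch_reg_adapter");
    super.new(name);
  endfunction

  // One register access becomes one single-transfer bus item.
  virtual function uvm_sequence_item reg2bus(const ref uvm_reg_bus_op rw);
    sketch_burst b = sketch_burst::type_id::create("b");
    b.address   = rw.addr;
    b.direction = (rw.kind == UVM_READ) ? SKETCH_READ : SKETCH_WRITE;
    b.data      = new[1];
    b.data[0]   = rw.data;
    return b;
  endfunction

  // A completed bus item is converted back into a register access.
  virtual function void bus2reg(uvm_sequence_item bus_item, ref uvm_reg_bus_op rw);
    sketch_burst b;
    if (!$cast(b, bus_item))
      `uvm_fatal("CASTERR", "Unexpected bus item type")
    rw.kind   = (b.direction == SKETCH_READ) ? UVM_READ : UVM_WRITE;
    rw.addr   = b.address;
    rw.data   = b.data[0];
    rw.status = UVM_IS_OK;
  endfunction
endclass
```

Note that both directions deal with exactly one uvm_reg_bus_op, which is why a plain adapter has no natural place for a multi-register access.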

As seen in the links above, people will recommend using the optional extension argument of the read(...) and write(...) tasks to instruct the adapter that the access it's converting is actually a burst to more registers. The use model would be to have a class containing information about whether a register access is a burst:

class reg_burst_extension extends uvm_object;
  rand int unsigned num_regs;

  constraint valid_num_regs {
    num_regs inside { 1, 4 };
  }

  // ...
endclass

If num_regs is 1, then the access is a normal one, otherwise it's a burst. It's also a good idea to make the field of the extension random to allow for more generic sequences. When wanting to write all four registers at a time, we could set the values that we want our registers to take, construct an object of this class, set num_regs to 4 and pass it to the update(...) task:

class write_some_regs extends sequence_base;
  // ...

  virtual task body();
    uvm_status_e status;
    reg_burst_extension ext = reg_burst_extension::type_id::create("ext");
    ext.num_regs = 4;

    model.SOME_REGS[0].FIELD0.set('hff);
    model.SOME_REGS[1].FIELD1.set('hff);
    model.SOME_REGS[2].FIELD2.set('hff);
    model.SOME_REGS[3].FIELD3.set('hff);
    model.SOME_REGS[0].update(status, .extension(ext));
  endtask
endclass

The vanilla register adapter doesn't know anything about the extension we passed. We'll need a sub-class that can interpret the extra information and use it to generate a burst. If we don't pass an extension or pass an unsuitable extension, then we can just generate a SINGLE AHB transaction as before:

class ahb_reg_adapter extends vgm_ahb::reg_adapter;
  function new(string name = "ahb_reg_adapter");
    super.new(name);
  endfunction

  virtual function uvm_sequence_item reg2bus(const ref uvm_reg_bus_op rw);
    uvm_reg_item item = get_item();
    reg_burst_extension ext;

    if (item.extension == null || !$cast(ext, item.extension) || ext.num_regs == 1)
      return super.reg2bus(rw);

    // ...
  endfunction
endclass

If we do want to do a burst access, then we need to collect the information from all registers and store it inside the AHB transaction that we want to start:

class ahb_reg_adapter extends vgm_ahb::reg_adapter;
  // ...

  virtual function uvm_sequence_item reg2bus(const ref uvm_reg_bus_op rw);
    vgm_ahb::burst b = vgm_ahb::burst::type_id::create("burst");
    uvm_reg_item item = get_item();
    reg_burst_extension ext;
    uvm_reg regs[];
    uvm_reg_addr_t offset;
    uvm_reg_data_t data[];

    // ...

    offset = regs[0].get_offset(item.map);
    data = new[ext.num_regs];
    for (int i = 1; i < ext.num_regs; i++)
      regs[i] = item.map.get_reg_by_offset(offset + i*4);

    foreach (regs[i])
      data[i] = regs[i].get();

    if (!b.randomize() with {
      address == rw.addr;
      kind == vgm_ahb::INCR4;
      // Parenthesized: '==' binds tighter than '?:', so the unparenthesized form parses differently
      direction == ((rw.kind == UVM_READ) ? vgm_ahb::READ : vgm_ahb::WRITE);
      foreach (data[i])
        data[i] == local::data[i];
    })
      `uvm_fatal("RANDERR", "Randomization error")
    return b;
  endfunction
endclass

This is the tried and true way of doing it. It's also pretty easy to implement. The problem with it, though, is that it's rather coupled with the verification environment. Let's assume that we get a second variant of our DUT that is a bit more bare-bones and only has an APB interface. Ideally, we'd want to be able to run the same sequences (or a subset thereof) in this second verification environment. Accessing single registers isn't a problem, as these would be handled by the vanilla APB register adapter (code not shown for brevity). When starting the burst access sequence (the one with the extension), we'd still like to see all four registers getting accessed, albeit via four different APB transfers. This means we'd need to have a register adapter that can start four transactions in one go:

class apb_reg_adapter extends vgm_apb::reg_adapter;
  // ...

  virtual function uvm_sequence_item reg2bus(const ref uvm_reg_bus_op rw);
    // ...

    foreach (regs[i]) begin
      if (i == 0)
        continue;
      if (rw.kind == UVM_READ)
        fork
          automatic uvm_reg rg = regs[i];
          rg.read(status, data);
        join_none
      else
        fork
          automatic uvm_reg rg = regs[i];
          rg.write(status, rg.get());
        join_none
    end

    return super.reg2bus(rw);
  endfunction
endclass

The reg2bus(...) method is a function, so it can't block. It can also only return one bus transaction. That would be the one corresponding to the register we called write(...) on. If we'd like to access the other three registers as well, one would optimistically think that the other accesses could be forked out. This could get us in a world of trouble with race conditions, because the order in which the accesses would get processed isn't defined. It also doesn't work as expected, because the update(...) task returns before all accesses are finished. For writes this might not be such a big issue, but for reads this would be fatal, since we wouldn't be able to rely on the values stored in the registers to be up-to-date.

I didn't really investigate how to improve on this, since the whole idea seems silly. A register adapter isn't meant for this kind of operation. It can only start one bus transaction based on one register access, not more. This was all fine and dandy when that transaction could be a burst (as for AHB), but it falls apart when we need to translate sequences that try to access all four registers at once. This means we can't reliably run the sequences that use the extension mechanism in the APB verification environment, at least not while having them go through an adapter. They could still be reused if we employed a different means of translating from register accesses, using a register sequencer layered on the APB sequencer that would run a translation sequence (more on this later).

The main takeaway point, though, is that while using the extension is easy to set up for the initial DUT (the one with AHB), it becomes trickier to port it to any subsequent variants of the design that use different bus protocols, particularly so if the protocols don't intrinsically support burst accesses. Even for other protocols that do support burst accesses (e.g. AXI), we'd still need to create a sub-class of the corresponding register adapter that can extract the information contained in the extension.

The problem stems from the fact that we're trying to shoehorn an unsuitable abstraction. Calls to uvm_reg::read(...)/write(...) ultimately end up creating an abstract register access, of type uvm_reg_item. Such a register item (which is a sub-class of uvm_sequence_item) can model anything from a small access that takes one bus cycle, to a very big access that takes multiple bus cycles (also called a burst). We're trying to model an access to four registers as an access to one of the registers that includes some side information to say if it's actually a burst or not.

A better idea might be to not go the way of using an extension. Instead, we could create a register item "by hand", fill it up with the appropriate information and send it out to be processed:

class write_some_regs extends sequence_base;
  // ...

  virtual task body();
    uvm_reg_item item;
    `uvm_create_on(item, model.default_map.get_sequencer());

    model.SOME_REGS[0].FIELD0.set('hff);
    model.SOME_REGS[1].FIELD1.set('hff);
    model.SOME_REGS[2].FIELD2.set('hff);
    model.SOME_REGS[3].FIELD3.set('hff);

    item.kind         = UVM_BURST_WRITE;
    item.offset       = model.SOME_REGS[0].get_offset();
    item.value        = new[4];
    foreach (item.value[i])
      item.value[i] = model.SOME_REGS[i].get();

    `uvm_send(item)
  endtask
endclass

Instead of starting a register item indirectly via a call to uvm_reg::write(...), we create one ourselves. We explicitly state that this is a burst access, by setting the kind field appropriately. The (misleadingly named) value field is actually an array that contains one element per burst transfer. Since we want to write to four registers, we set its size to 4 and its elements to the desired values of the registers.

This is one piece of the puzzle. Now we need to translate this uvm_reg_item to the bus transaction that the DUT needs to see. Trying to send this access through a register adapter might work for the AHB DUT, because the AHB adapter can start a single AHB transaction that is capable of representing the entire register item. Trying to send it through the APB adapter will lead to the same problem that we had before, namely that we can't start multiple APB transactions based on it.

The UVM User Guide shows us how to implement a different translation scheme, more sophisticated than the register adapter. As briefly mentioned above, it involves layering. As described in section 5.9.2.3 of the User Guide (UVM 1.1), we can have a register sequencer that serves as a landing pad for uvm_reg_items. A translation sequence running on the bus sequencer would get items from this register sequencer and could convert them to bus transactions.
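
The environment side of this layering can be sketched as follows. This is only an outline under assumptions: the class name layered_env_sketch is made up, and the register model handle is assumed to be created, built and locked elsewhere. The key step is passing a null adapter to set_sequencer(...), so that the map forwards uvm_reg_items to the register sequencer untranslated.

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

class layered_env_sketch extends uvm_env;
  uvm_reg_block                 model;          // assumed to be set up elsewhere
  uvm_sequencer #(uvm_reg_item) reg_sequencer;  // landing pad for abstract register items

  `uvm_component_utils(layered_env_sketch)

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  virtual function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    reg_sequencer =
      uvm_sequencer #(uvm_reg_item)::type_id::create("reg_sequencer", this);
  endfunction

  virtual function void connect_phase(uvm_phase phase);
    super.connect_phase(phase);
    if (model == null)
      `uvm_fatal("NOMODEL", "Register model handle was not set")
    // No adapter: register items reach the register sequencer as they are,
    // and a translation sequence on the bus sequencer picks them up from there.
    model.default_map.set_sequencer(reg_sequencer, null);
  endfunction
endclass
```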

For AHB, this is pretty easy to write:

class reg_xlate_sequence extends uvm_reg_sequence #(uvm_sequence #(burst));
  // ...

  virtual task do_reg_item(uvm_reg_item rw);
    burst b = burst::type_id::create("burst");

    if (!b.randomize() with {
      if (rw.kind inside { UVM_READ, UVM_BURST_READ })
        direction == READ;
      else
        direction == WRITE;
      address == rw.offset;
      data.size() == rw.value.size();
      foreach (data[i])
        data[i] == rw.value[i];
    })
      `uvm_fatal("RNDERR", "Randomization error")
    `uvm_send(b)
  endtask
endclass

Our translation sequence extends the built in uvm_reg_sequence, which already provides some facilities to perform translation (albeit based on a register adapter, which is the very thing we're trying to avoid). By overriding the do_reg_item(...) task, which gets called for each item that gets started on the register sequencer, we can implement our own scheme that generates one AHB transaction based on the contents of the uvm_reg_item to be converted. When creating this sequence, we need to specify the instance of the register sequencer and afterwards start it on the bus sequencer:

    vgm_ahb::reg_xlate_sequence reg2ahb_seq =
      vgm_ahb::reg_xlate_sequence::type_id::create("reg2ahb_seq");
    reg2ahb_seq.reg_seqr = reg_sequencer;
    uvm_config_db #(uvm_sequence_base)::set(ahb_agent.sequencer, "run_phase",
      "default_sequence", reg2ahb_seq);

For APB, the translation sequence is also pretty straightforward:

class reg_xlate_sequence extends uvm_reg_sequence #(uvm_sequence #(transfer));
  // ...

  virtual task do_reg_item(uvm_reg_item rw);
    transfer t = transfer::type_id::create("transfer");

    foreach (rw.value[i]) begin
      if (!t.randomize() with {
        if (rw.kind inside { UVM_READ, UVM_BURST_READ })
          direction == READ;
        else
          direction == WRITE;
        address == rw.offset + 4 * i;
        data == rw.value[i];
      })
        `uvm_fatal("RNDERR", "Randomization error")
      `uvm_send(t)
    end
  endtask
endclass

Now you might ask what the advantage is of doing it this way, as opposed to using the extension argument. Clearly we could save ourselves the trouble of creating our own uvm_reg_item in the register burst sequence (which takes up quite a bit of code, but even that could be encapsulated in a task) and just pass an extension to a call to write(...)/read(...) as we did before. The downside to this, though, would be that we would need a translation sequence that can extract the extension, which would create an unnecessary dependency. If we were more diligent in creating our register item, we could even save ourselves the trouble of having to start a translation sequence for APB. If we filled in a few more of its fields (like local_map and some others), the register package itself could handle splitting a burst into multiple transfers and run each of those through a register adapter. I didn't look too much into this, though... The reason for that is that I see this idea of creating our own uvm_reg_item for a register burst as a stepping stone for the next idea.
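
The "encapsulated in a task" remark could look roughly like the sketch below. The base class and task names are hypothetical, and the task fills in only the same fields the hand-written sequence above did (kind, offset and value), so it relies on a translation sequence rather than on the register package to split the burst.

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

class reg_burst_seq_base_sketch extends uvm_sequence #(uvm_reg_item);
  `uvm_object_utils(reg_burst_seq_base_sketch)

  function new(string name = "reg_burst_seq_base_sketch");
    super.new(name);
  endfunction

  // Hypothetical helper: sends the desired values of a list of consecutive
  // registers as one burst write item on the given register sequencer.
  virtual task write_regs_burst(uvm_reg regs[], uvm_reg_map map,
      uvm_sequencer_base reg_seqr);
    uvm_reg_item item = uvm_reg_item::type_id::create("item");

    if (regs.size() == 0)
      `uvm_fatal("NOREGS", "Need at least one register for a burst")

    item.kind   = UVM_BURST_WRITE;
    item.offset = regs[0].get_offset(map);
    item.value  = new[regs.size()];
    foreach (regs[i])
      item.value[i] = regs[i].get();

    start_item(item, -1, reg_seqr);
    finish_item(item);
  endtask
endclass
```

A sequence extending this class would set the desired field values as before and then call write_regs_burst(...) with model.SOME_REGS, model.default_map and the register sequencer.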

We could conceptually think of our burst access that covers multiple registers as a memory burst starting at a certain offset (in our case the offset of the first register) that is of a certain size (in our case 4). The uvm_mem class provides, aside from the write(...) and read(...) tasks, the burst_write(...) and burst_read(...) tasks which trigger bursts. We could shadow the registers with a dummy memory, that we would only use to start bursts. The register package would handle the heavy lifting of creating a uvm_reg_item based on our desired access.

Defining such a memory is trivial:

class shadow_mem extends uvm_mem;
  function new(string name = "shadow_mem");
    super.new(name, 4, 32);
  endfunction

  // ...
endclass

Since our register model is probably generated from a specification, we don't want to touch that code. Instead, we can instantiate the shadow memory inside a sub-class and make sure that we instantiate this class in our verification environment instead of the original one:

class ext_reg_block extends regs::some_reg_block;
  shadow_mem SOME_REGS_MEM;

  virtual function void build();
    super.build();

    SOME_REGS_MEM = shadow_mem::type_id::create("SOME_REGS_MEM");
    SOME_REGS_MEM.configure(this);

    default_map.add_mem(SOME_REGS_MEM, SOME_REGS[0].get_offset(default_map));
  endfunction

  // ...
endclass

We'll get warnings that the memory and the registers overlap, but these can be silenced.
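
One possible way of silencing them is a report catcher, sketched below. The message ID "RegModel" and the "overlaps" wording are assumptions about what the register layer prints; it's best to check them against the warnings in your own log before relying on the filter.

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

class overlap_warning_catcher extends uvm_report_catcher;
  function new(string name = "overlap_warning_catcher");
    super.new(name);
  endfunction

  virtual function action_e catch();
    // Assumed ID and message pattern; adjust to match the actual warning text.
    if (get_severity() == UVM_WARNING && get_id() == "RegModel" &&
        uvm_is_match("*overlaps*", get_message()))
      return CAUGHT;  // swallow the overlap warning
    return THROW;     // let every other message through
  endfunction
endclass
```

An instance of the catcher would be registered with uvm_report_cb::add(null, ...) before the register model is built and locked, since that is when the address map gets checked. Because the filter is fairly broad, it would also hide genuine overlaps, so it might be worth narrowing the pattern to the name of the shadow memory.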

We could call burst_write(...) on this memory with the appropriate arguments to trigger a burst that accesses all four registers. Since we have quite a few arguments to pass, this could get tedious, so we can define a helper task:

class shadow_mem extends uvm_mem;
  // ...

  virtual task update_regs(
    output uvm_status_e      status,
    input  uvm_path_e        path   = UVM_DEFAULT_PATH,
    input  uvm_reg_map       map = null,
    input  uvm_sequence_base parent = null,
    input  int               prior = -1,
    input  uvm_object        extension = null,
    input  string            fname = "",
    input  int               lineno = 0
  );
    uvm_reg_data_t values[4];
    ext_reg_block model;

    if (!$cast(model, get_parent()))
      `uvm_fatal("CASTERR", "Cast error")

    foreach (values[i])
      values[i] = model.SOME_REGS[i].get();

    burst_write(status, model.SOME_REGS[0].get_offset(), values, path, map,
      parent, prior, extension, fname, lineno);
  endtask
endclass

The update_regs(...) task is similar to burst_write(...), but it doesn't require us to pass an offset or the data values to be written. These are computed based on the desired values of the registers that the memory shadows. A similar task could be defined to read all the registers.
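
As a sketch of what such a read task might look like, here is a more generic variant of the shadow memory that keeps its own list of shadowed registers instead of casting its parent block. The class, the shadow(...) function and the mirror_regs(...) task are hypothetical names. The burst starts at location 0 of the memory, which assumes that the memory is mapped at the address of the first shadowed register and that the registers are as wide as a memory location.

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

class reg_shadow_mem_sketch extends uvm_mem;
  protected uvm_reg m_regs[$];  // registers covered by this memory, in address order

  function new(string name = "reg_shadow_mem_sketch",
      longint unsigned size = 4, int unsigned n_bits = 32);
    super.new(name, size, n_bits);
  endfunction

  // Hypothetical registration hook, to be called from the block's build().
  function void shadow(uvm_reg rg);
    m_regs.push_back(rg);
  endfunction

  // Reads all shadowed registers in one burst and updates their mirrors.
  virtual task mirror_regs(
    output uvm_status_e      status,
    input  uvm_path_e        path   = UVM_DEFAULT_PATH,
    input  uvm_reg_map       map    = null,
    input  uvm_sequence_base parent = null
  );
    uvm_reg_data_t values[] = new[m_regs.size()];

    burst_read(status, 0, values, path, map, parent);
    if (status == UVM_NOT_OK)
      return;

    foreach (m_regs[i])
      void'(m_regs[i].predict(values[i], .kind(UVM_PREDICT_READ)));
  endtask
endclass
```

Whether the mirrors should be updated here or left to a predictor depends on how prediction is set up in the environment, so the predict(...) calls are a design choice rather than a requirement.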

Our sequence that does a register burst would look like this:

class write_some_regs extends sequence_base;
  // ...

  virtual task body();
    uvm_status_e status;

    model.SOME_REGS[0].FIELD0.set('hff);
    model.SOME_REGS[1].FIELD1.set('hff);
    model.SOME_REGS[2].FIELD2.set('hff);
    model.SOME_REGS[3].FIELD3.set('hff);

    model.SOME_REGS_MEM.update_regs(status);
  endtask
endclass

Integrating this sequence is even more straightforward than before. For APB, we don't even need the register sequencer; the register adapter will suffice. For AHB, we could either have a register sequencer layered on the AHB sequencer (as in the previous section) or we could use a custom frontdoor sequence (as described in my DVCon Europe paper).

I've omitted a lot of the infrastructure code to keep the post focused. You can download the full example from SourceForge.

I don't consider the third approach, using a shadow memory, to be much more complicated than the first one, where we were using the extension argument. Sure it requires a bit more code to declare the shadow memory, especially the convenience tasks, but even that could be abstracted and made reusable. Layering the shadow memory on bus protocols that don't support bursts is effortless (assuming that a register adapter is already available with the protocol UVC), because UVM_REG already contains a lot of code to handle this. It's only for bus protocols that support burst operation that we need to make sure that register/memory bursts get converted properly. A good UVC for such a protocol will also provide infrastructure for this, in the form of a translation sequence.

When using a custom extension argument to implement such register bursts, the translation scheme always has to be tailored to support this, by extending the generic register adapter or translation sequence to extract the information stored in the extension. It's also rather unintuitive that simpler protocols (that don't support burst operation) cause more headaches. Using the extension argument in this way might also interfere with other uses for it, where a user needs to pass in other side information (such as protection levels) to be translated.

Which scheme to use in a given verification environment depends on whether portability (for lateral or vertical reuse) matters.

If you have any other approaches to handling register bursts, I'd love to hear them in the comments section below.

Tuesday, November 10, 2015

How Do I Transfer Thee? Let Me Count the Ways

I'll be giving a talk this week at DVCon Europe about how to use the UVM REG classes to verify memory sub-systems. In particular, I'll focus on how to translate from abstract memory burst accesses (the kind started by calling uvm_mem::burst_read/write(...)) to bus transactions. This isn't as easy as translating register accesses where an adapter is enough, mainly because an adapter can't process accesses that are bigger than the underlying bus width.

As an example, let's look at what happens when trying to start a 16 byte memory burst on a 32 bit AHB bus. This could be represented in quite a few ways as sequences of AHB bursts:

  • an INCR4 WORD burst
  • an INCR8 HALFWORD burst
  • an INCR16 BYTE burst

We could also swap out the fixed INCR* bursts for INCR of non-fixed length and send the appropriate number of transfers. We could also represent the INCR4 burst as four individual SINGLE WORD bursts (and do the same for the HALFWORD and BYTE bursts). Even within these four WORD bursts of length 1, we could send some of them as SINGLE bursts and some as INCR of length 1. We don't even need to start bursts of the same widths; we could send two HALFWORDS, followed by four BYTES, followed by two WORDS. The main point to take away from this paragraph is that there are a lot of possible ways to transfer 16 bytes.

This got me interested as to how many there are exactly. I tried to figure it out on paper using combinatorics, but this turned out to be pretty complicated. Since I wasn't smart enough to do the math, I decided to take the engineering approach and write a program that would count for me.

Counting how many AHB bursts are required is something we could model as a constraint problem. We need to model an AHB transaction and the constraints on its fields:

class ahb_burst;
  rand bit [31:0] address;
  rand enum { SINGLE, INCR, WRAP4, INCR4, WRAP8, INCR8, WRAP16, INCR16 } kind;
  rand enum { BYTE, HALFWORD, WORD } size;
  rand int unsigned incr_length;

  rand int num_transfers;
  rand int unsigned num_bytes;

  // ...
endclass

The num_transfers and num_bytes fields are there to help keep track of how many bytes we're transferring with a certain burst. The AHB spec imposes the following constraints on the fields:

class ahb_burst;
  // ...

  constraint aligned_address {
    size == WORD -> address[1:0] == 0;
    size == HALFWORD -> address[0] == 0;
  }

  constraint legal_incr_size {
    incr_length > 0;
  }

  constraint compute_num_transfers {
    kind == SINGLE -> num_transfers == 1;
    kind == INCR -> num_transfers == incr_length;
    kind inside { WRAP4, INCR4 } -> num_transfers == 4;
    kind inside { WRAP8, INCR8 } -> num_transfers == 8;
    kind inside { WRAP16, INCR16 } -> num_transfers == 16;
  }

  constraint compute_num_bytes {
    num_bytes == num_transfers * 2 ** size;
  }
endclass

We need another class to model the 16 byte memory burst, which will contain instances of AHB bursts:

class mem_burst_16;
  rand ahb_burst bursts[];

  constraint legal_size {
    bursts.size() <= 16;
    bursts.size() > 0;
  }

  function void pre_randomize();
    bursts = new[16];
    foreach (bursts[i])
      bursts[i] = new();
  endfunction

  // ...
endclass

Since randomization can't allocate new objects, we need to pre-allocate the AHB bursts. The solver can always throw away bursts it doesn't need. To keep things simple, let's assume that the first address is 0x0. We need to constrain the bursts to have incrementing addresses. This is important, because the address at which a burst starts determines the possible widths it has. For example, if the first burst is a SINGLE BYTE, then the following burst can't be of width HALFWORD or WORD, because it would start at address 0x1, which isn't aligned.

class mem_burst_16;
  constraint addresses {
    bursts[0].address == 0;
    foreach (bursts[i])
      if (i > 0)
        bursts[i].address == bursts[i-1].address + bursts[i-1].num_bytes;
  }

  // ...
endclass

We also need to constrain the sum of bytes sent in the memory burst to be exactly 16. We'll need a helper field for this:

class mem_burst_16;
  protected rand int unsigned bursts_num_bytes[];

  constraint max_16_bytes {
    foreach (bursts[i])
      bursts[i].kind == ahb_burst::INCR -> bursts[i].incr_length <= 16;

    bursts_num_bytes.size() == bursts.size();
    foreach (bursts[i])
      bursts_num_bytes[i] == bursts[i].num_bytes;
    bursts_num_bytes.sum() == 16;
  }

  // ...
endclass

Now we can start randomizing memory bursts. The idea is to keep a list of bursts we generate. If we encounter a burst we haven't seen before, we add it to the list. If the burst we generate is already in the list, we throw it away. We do this for some large number of iterations. This approach isn't the most efficient, because we're guaranteed to generate the same bursts quite a few times and have to do a lot of discarding. Also, the more our list grows, the more expensive it's going to be to check if a newly generated burst is already in the list.

I won't show the code for the search here (but you can get it from SourceForge). I tried executing it and as expected, it was ridiculously slow. Not only that, the tool started to crash at some point. I was lucky enough to get about 10000 iterations done and I wound up with around 500 unique burst combinations. That's a respectable number, but I wasn't really convinced that was it. Trying to run more iterations led to the tool crashing again.

It was clear that I was going about it all wrong. The idea of describing the state space of our problem using constraints is a good one, but the way we were searching for solutions was flawed. We were basically throwing darts at the dartboard and trying to hit all possible points. We'd need a more efficient way of throwing the darts.

I remember reading about Prolog a while back. Programs written in Prolog are declarative by nature (like our constraints). It seems like a good candidate to try to solve our problem. It's very well suited for expressing logical relationships and by using the clpfd (constraint logic programming over finite domains) package we can also express integer constraints like the ones we use in SystemVerilog. I don't want this post to become a Prolog tutorial, so I won't explain the code in too much detail.

The first thing we need is to be able to generate a single AHB burst:

gen_burst(X) :-
    Kind in 0..7,

    NumTransfers #> 0,
    NumTransfers #=< 16,
    Kind #= 0 #==> NumTransfers #= 1,
    Kind in 2..3 #==> NumTransfers #= 4,
    Kind in 4..5 #==> NumTransfers #= 8,
    Kind in 6..7 #==> NumTransfers #= 16,

    Size in 0..2,

    NumBytes #= NumTransfers * 2 ^ Size,

    Address in 0..15,
    Size #= 1 #==> mod(Address, 2) #= 0,
    Size #= 2 #==> mod(Address, 4) #= 0,

    X = ahb_burst(Kind, Size, Address, NumTransfers, NumBytes).

Since the clpfd package can only work with integers, we can't have nice enumerated names like INCR4 or HALFWORD in our constraints. We need to use their bit vector representations, which makes us lose a bit of readability.

Now that we described what an AHB burst is, we can group more of them together to form a memory burst:

gen_mem_burst(X) :-
    between(1, 16, Len),
    length(Bursts, Len),

    inst_ahb_burst(Bursts),

    % ...


inst_ahb_burst([]).
inst_ahb_burst([X|X1]) :- gen_burst(X), inst_ahb_burst(X1).

We declare a list called Bursts which contains at most 16 elements. The inst_ahb_burst(...) predicate fills the list with AHB bursts using recursion. Next we constrain the address of the first burst to be 0x0:

gen_mem_burst(X) :-
    % ...

    nth0(0, Bursts, Burst0),
    constrain_address(Burst0, 0),

    % ...


constrain_address(Burst, Address) :-
    get_address(Burst, BurstAddress),
    BurstAddress #= Address,

    get_kind(Burst, Kind),
    get_num_bytes(Burst, NumBytes),
    Kind in {2, 4, 6} #==>
        Address mod NumBytes #= 0.

We've gotten a bit ahead of ourselves here by declaring a generic constrain_address(...) predicate. The idea is to use it to constrain the address of a burst depending on a certain value. For wrapping bursts, we also need to make sure that the address is aligned to the beginning of a line, otherwise we'll be missing values. This is a constraint we missed in our SystemVerilog code. Without it, we might end up with something like a sequence beginning with a SINGLE HALFWORD to address 0x0 followed by a WRAP4 BYTE to address 0x2. The wrapping burst would access address 0x2, 0x3, 0x0 and 0x1, as opposed to its INCR4 counterpart which would access 0x2, 0x3, 0x4 and 0x5. Since we only want to access every address once and only once, we need to exclude such situations.

Using the propagate_address(...) predicate we constrain the address of each subsequent burst, depending on the number of bytes transferred before it:

gen_mem_burst(X) :-
    % ...

    propagate_address(Bursts),

    % ...


propagate_address([_]).
propagate_address([X0, X1 | X]) :-
    get_address(X0, Address0),
    get_num_bytes(X0, NumBytes0),
    constrain_address(X1, Address0 + NumBytes0),
    propagate_address([X1|X]).

This is where the constrain_address(...) predicate becomes useful again. If it didn't restrict when we're allowed to start wrapping bursts, we would end up missing address locations.
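
To make the wrapping behavior concrete, here is a small sketch that lists the addresses a wrapping burst visits. It assumes the usual AHB rule that a wrapping burst wraps at a boundary of (number of beats x transfer size) bytes; the module and function names are invented for illustration.

```systemverilog
module wrap_addresses_sketch;
  typedef int unsigned addr_q[$];

  // Addresses visited by a wrapping burst, in the order they are accessed.
  function automatic addr_q wrap_burst_addresses(int unsigned start,
      int unsigned size_bytes, int unsigned beats);
    int unsigned span = size_bytes * beats;     // bytes covered by the burst
    int unsigned base = (start / span) * span;  // start of the wrap line
    addr_q addrs;
    for (int unsigned i = 0; i < beats; i++)
      addrs.push_back(base + (start - base + i * size_bytes) % span);
    return addrs;
  endfunction

  initial begin
    addr_q addrs;
    // WRAP4 BYTE burst starting at 0x2: visits 0x2, 0x3, 0x0, 0x1
    addrs = wrap_burst_addresses('h2, 1, 4);
    foreach (addrs[i])
      $display("beat %0d: 'h%0h", i, addrs[i]);
  end
endmodule
```

A burst that starts at the beginning of its line (start equal to base) visits the same addresses in the same order as its INCR counterpart, which is the only case the alignment constraint allows.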

The last constraint we need is that the total number of bytes is 16:

gen_mem_burst(X) :-
    % ...

    get_burst_num_bytes(Bursts, NumBytes),
    sum(NumBytes, #=, 16),

    X = mem_burst(Len, Bursts).

Now we can write a predicate that prints and counts the number of ways we can transfer 16 bytes. At first I tried doing this in one go, but it was too much for my computer to handle. To make it more manageable we can count in different bins, depending on the number of AHB bursts we need. Here's the count(...) predicate:

count(N, R) :-
    gen_mem_burst(X),
    get_num_ahb_bursts(X, N),
    findall(X, label_mem_burst(X), Z),
    %maplist(print_mem_burst, Z),  % uncomment to see the solutions
    length(Z, R).

When starting a single AHB burst to transfer all 16 bytes, it could either be an INCR4 WORD burst, an INCR8 HALFWORD burst or an INCR16 BYTE burst. Instead of INCR* it could just as well be a WRAP* burst or a non-fixed length INCR with the corresponding number of transfers. This leads us to a total of 9 bursts. When starting two AHB bursts to transfer the 16 bytes, there are 115 possible combinations. When starting three, there are already 1591 possible combinations. Here's a table showing how many possible combinations there are per number of bursts and the total:

Num Bursts   Num Combinations
1            9
2            115
3            1591
4            12584
5            68499
6            270482
7            817974
8            1934737
9            3591184
10           5273632
11           6506624
12           5639936
13           3676160
14           1748992
15           507904
16           65536

Total:       30115959

Now that's a ridiculous number of possibilities. If it took one second to simulate each of these scenarios, we'd need to leave our simulator running for about a year (roughly 349 days). When they say exhaustive verification is impossible these days, they really aren't kidding. Trying out all of them when verifying an AHB design doesn't necessarily make sense, but it's interesting to see just how many there are.
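
As a cross-check of the table, the per-bin counts can also be computed without a constraint solver. The sketch below is not the Prolog approach used in this post. It is a small dynamic-programming count in plain SystemVerilog, under the same rules as the table (bursts are back to back starting at 0x0, addresses are aligned to the transfer size, and wrapping bursts must start at the beginning of their line). All names are made up for illustration. Working the recurrence by hand for one and two bursts gives 9 and 115, which agrees with the first two rows.

```systemverilog
module count_ahb_bursts_sketch;
  // Number of distinct single AHB bursts that start at 'addr' and move
  // exactly 'nbytes' bytes (distinct in kind, size or number of transfers).
  function automatic int unsigned num_single_bursts(int addr, int nbytes);
    int unsigned n = 0;
    for (int size = 1; size <= 4; size *= 2) begin
      int transfers;
      if (addr % size != 0 || nbytes % size != 0)
        continue;                    // misaligned, or not a whole number of transfers
      transfers = nbytes / size;
      n++;                           // INCR of unspecified length
      if (transfers == 1)
        n++;                         // SINGLE
      if (transfers inside {4, 8, 16}) begin
        n++;                         // INCR4 / INCR8 / INCR16
        if (addr % nbytes == 0)
          n++;                       // WRAP4 / WRAP8 / WRAP16, line-aligned only
      end
    end
    return n;
  endfunction

  initial begin
    // ways[n][a]: ways to cover bytes 0 .. a-1 with exactly n bursts
    int unsigned ways[17][17];
    int unsigned total;

    foreach (ways[n, a])
      ways[n][a] = 0;
    ways[0][0] = 1;

    for (int n = 1; n <= 16; n++)
      for (int a = 1; a <= 16; a++)
        for (int b = 1; b <= a; b++)  // b: bytes moved by the last burst
          ways[n][a] += ways[n-1][a-b] * num_single_bursts(a - b, b);

    total = 0;
    for (int n = 1; n <= 16; n++) begin
      $display("%0d bursts: %0d combinations", n, ways[n][16]);
      total += ways[n][16];
    end
    $display("total: %0d", total);
  end
endmodule
```

This works because the number of choices for a burst depends only on its start address and byte count, not on which bursts came before it.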

But wait, there's more... We made a logical error regarding wrapping bursts. A wrapping burst doesn't need to start at the address following the previous burst (or 0x0 if it's the first one). It just needs to contain that address. For example, to read addresses 0x0, 0x4, 0x8 and 0xC, we could perform a WRAP4 WORD burst starting at any of these addresses. To reflect this, we need to fix the constrain_address(...) predicate:

constrain_address(Burst, Address) :-
    get_address(Burst, BurstAddress),
    get_kind(Burst, Kind),

    % Incremental bursts
    Kind in {0, 1, 3, 5, 7} #==> BurstAddress #= Address,

    % Wrapping bursts
    get_num_bytes(Burst, NumBytes),
    Kind in {2, 4, 6} #==>
        Address mod NumBytes #= 0 #/\
        BurstAddress #>= Address #/\
        BurstAddress #< Address + NumBytes.

This will increase the number of burst combinations we have. I'll leave it as an exercise for interested readers to compute just how many this is.

What does this have to do with UVM REG? Well, if we were using a classical register adapter to convert this memory burst, we could only generate 16 combinations out of the multitude of possible ones: 4 WORD bursts, where the burst kind could be either SINGLE or INCR with one transfer. If you want to see how to get around this limitation and be able to generate any legal combination, check out my DVCon paper.