Tuesday, November 10, 2015

How Do I Transfer Thee? Let Me Count the Ways

I'll be giving a talk this week at DVCon Europe about how to use the UVM REG classes to verify memory sub-systems. In particular, I'll focus on how to translate from abstract memory burst accesses (the kind started by calling uvm_mem::burst_read/write(...)) to bus transactions. This isn't as easy as translating register accesses where an adapter is enough, mainly because an adapter can't process accesses that are bigger than the underlying bus width.

As an example, let's look at what happens when trying to start a 16 byte memory burst on a 32 bit AHB bus. This could be represented in quite a few ways as sequences of AHB bursts:

  • an INCR4 WORD burst
  • an INCR8 HALFWORD burst
  • an INCR16 BYTE burst

We could also swap out the fixed INCR* bursts for INCR of non-fixed length and send the appropriate number of transfers. We could also represent the INCR4 burst as a four individual SINGLE WORD bursts (and do the same for the HALFWORD and BYTE bursts). Even within these four WORD bursts of length 1, we could send some of them as SINGLE bursts and some as INCR of length 1. We don't even need to start bursts of the same widths; we could send two HALFWORDS, followed by four BYTES, followed by two WORDS. The main point to take away from this paragraph is that there are a lot of possible ways to transfer 16 bytes.

This got me interested as to how many there are exactly. I tried to figure it out on paper using combinatorics, but this turned out to be pretty complicated. Since I wasn't smart enough to do the math, I decided to take the engineering approach and write a program that would count for me.

Counting how many AHB bursts are required is something we could model as a constraint problem. We need to model an AHB transaction and the constraints on its fields:

class ahb_burst;
  rand bit [31:0] address;
  rand enum { SINGLE, INCR, WRAP4, INCR4, WRAP8, INCR8, WRAP16, INCR16 } kind;
  rand enum { BYTE, HALFWORD, WORD } size;
  rand int unsigned incr_length;

  rand int num_transfers;
  rand int unsigned num_bytes;

  // ...
endclass

The num_transfers and num_bytes fields are there to help keep track of how many bytes we're transferring with a certain burst. The AHB spec imposes the following constraints on the fields:

class ahb_burst;
  // ...

  constraint aligned_address {
    size == WORD -> address[1:0] == 0;
    size == HALFWORD -> address[0] == 0;
  }

  constraint legal_incr_size {
    incr_length > 0;
  }

  constraint compute_num_transfers {
    kind == SINGLE -> num_transfers == 1;
    kind == INCR -> num_transfers == incr_length;
    kind inside { WRAP4, INCR4 } -> num_transfers == 4;
    kind inside { WRAP8, INCR8 } -> num_transfers == 8;
    kind inside { WRAP16, INCR16 } -> num_transfers == 16;
  }

  constraint compute_num_bytes {
    num_bytes == num_transfers * 2 ** size;
  }
endclass

We need another class to model the 16 byte memory burst, which will contain instances of AHB bursts:

class mem_burst_16;
  rand ahb_burst bursts[];

  constraint legal_size {
    bursts.size() <= 16;
    bursts.size() > 0;
  }

  function void pre_randomize();
    bursts = new[16];
    foreach (bursts[i])
      bursts[i] = new();
  endfunction

  // ...
endclass

Since randomization can't allocate new objects, we need to pre-allocate the AHB bursts. The solver can always throw away bursts it doesn't need. To keep things simple, let's assume that the first address is 0x0. We need to constrain the bursts to have incrementing addresses. This is important, because the address at which a burst starts determines the possible widths it has. For example, if the first burst is a SINGLE BYTE, then the following burst can't be of width HALFWORD or WORD, because it would start at address 0x1, which isn't aligned.

class mem_burst_16;
  constraint addresses {
    bursts[0].address == 0;
    foreach (bursts[i])
      if (i > 0)
        bursts[i].address == bursts[i-1].address + bursts[i-1].num_bytes;
  }

  // ...
endclass

We also need to constrain the sum of bytes sent in the memory burst to be exactly 16. We'll need a helper field for this:

class mem_burst_16;
  protected rand int unsigned bursts_num_bytes[];

  constraint max_16_bytes {
    foreach (bursts[i])
      bursts[i].kind == ahb_burst::INCR -> bursts[i].incr_length <= 16;

    bursts_num_bytes.size() == bursts.size();
    foreach (bursts[i])
      bursts_num_bytes[i] == bursts[i].num_bytes;
    bursts_num_bytes.sum() == 16;
  }

  // ...
endclass

Now we can start randomizing memory bursts. The idea is to keep a list of bursts we generate. If we encounter a burst we haven't seen before, we add it to the list. If the burst we generate is already in the list, we throw it away. We do this for some large number of iterations. This approach isn't the most efficient, because we're guaranteed to generate the same bursts quite a few times and have to do a lot of discarding. Also, the more our list grows, the more expensive it's going to be to check if a newly generated burst is already in the list.

I won't show the code for the search here (but you can get it from SourceForge). I tried executing it and as expected, it was ridiculously slow. Not only that, the tool started to crash at some point. I was lucky enough to get about 10000 iterations done and I wound up with around 500 unique burst combinations. That's a respectable number, but I wasn't really convinced that was it. Trying to run more iterations led to the tool crashing again.

It was clear that I was going about it all wrong. The idea of describing the state space of our problem using constraints is a good one, but the way we were searching for solutions was flawed. We were basically throwing darts at the dartboard and trying to hit all possible points. We'd need a more efficient way of throwing the darts.

I remember reading about Prolog a while back. Programs written in Prolog are declarative by nature (like our constraints). It seems like a good candidate to try to solve our problem. It's very well suited for expressing logical relationships and by using the clpfd (constraint logic programming over finite domains) package we can also express integer constraints like the ones we use in SystemVerilog. I don't want this post to become a Prolog tutorial, so I won't explain the code in too much detail.

The first thing we need is to be able to generate a single AHB burst:

gen_burst(X) :-
    Kind in 0..7,

    NumTransfers #> 0,
    NumTransfers #=< 16,
    Kind #= 0 #==> NumTransfers #= 1,
    Kind in 2..3 #==> NumTransfers #= 4,
    Kind in 4..5 #==> NumTransfers #= 8,
    Kind in 6..7 #==> NumTransfers #= 16,

    Size in 0..2,

    NumBytes #= NumTransfers * 2 ^ Size,

    Address in 0..15,
    Size #= 1 #==> mod(Address, 2) #= 0,
    Size #= 2 #==> mod(Address, 4) #= 0,

    X = ahb_burst(Kind, Size, Address, NumTransfers, NumBytes).

Since the clpfd package can only work with integers, we can't have nice enumerated names like INCR4 or HALFWORD in our constraints. We need to use their bit vector representations, which makes us lose a bit of readability.

Now that we described what an AHB burst is, we can group more of them together to form a memory burst:

gen_mem_burst(X) :-
    between(1, 16, Len),
    length(Bursts, Len),

    inst_ahb_burst(Bursts),

    % ...


inst_ahb_burst([]).
inst_ahb_burst([X|X1]) :- gen_burst(X), inst_ahb_burst(X1).

We declare a list called Bursts which contains at most 16 elements. The inst_ahb_bursts(...) predicate fills the list with AHB bursts using recursion. Next we constrain the address of the first burst to be 0x0:

gen_mem_burst(X) :-
    % ...

    nth0(0, Bursts, Burst0),
    constrain_address(Burst0, 0),

    % ...


constrain_address(Burst, Address) :-
    get_address(Burst, BurstAddress),
    BurstAddress #= Address,

    get_kind(Burst, Kind),
    get_num_bytes(Burst, NumBytes),
    Kind in {2, 4, 6} #==>
        Address mod NumBytes #= 0.
We've gotten a bit ahead of ourselves here by declaring a generic constrain_address(...) predicate. The idea is to use it to constrain the address of a burst depending on a certain value. For wrapping bursts, we also need to make sure that the address is aligned to the beginning of a line, otherwise we'll be missing values. This is a constraint we missed in our SystemVerilog code. Without it, we might end up with something like a sequence beginning with a SINGLE HALFWORD to address 0x0 followed by a WRAP4 BYTE to address 0x2. The wrapping burst would access address 0x2, 0x3, 0x0 and 0x1, as opposed to its INCR4 counterpart which would access 0x2, 0x3, 0x4 and 0x5. Since we only want to access every address once and only once, we need to exclude such situations.
Using the propagate_address(...) predicate we constrain the address of each subsequent burst, depending on the number of bytes transferred before it:
gen_mem_burst(X) :-
    % ...

    propagate_address(Bursts),

    % ...


propagate_address([_]).
propagate_address([X0, X1 | X]) :-
    get_address(X0, Address0),
    get_num_bytes(X0, NumBytes0),
    constrain_address(X1, Address0 + NumBytes0),
    propagate_address([X1|X]).

This is where the constrain_address(...) predicate becomes useful again. If it wouldn't restrict when we're allowed to start wrapping bursts we would end up missing address locations.

The last constraint we need is that the total number of bytes is 16:

gen_mem_burst(X) :-
    % ...

    get_burst_num_bytes(Bursts, NumBytes),
    sum(NumBytes, #=, 16),

    X = mem_burst(Len, Bursts).

Now we can write a predicate that prints and counts the number of ways we can transfer 16 bytes. At first I tried doing this in one go, but it was too much for my computer to handle. To make it more manageable we can count in different bins, depending on the number of AHB bursts we need. Here's the count(...) predicate:

count(N, R) :-
    gen_mem_burst(X),
    get_num_ahb_bursts(X, N),
    findall(X, label_mem_burst(X), Z),
    %maplist(print_mem_burst, Z),  % uncomment to see the solutions
    length(Z, R).

When starting a single AHB burst to transfer all 16 bytes, it could either be an INCR4 WORD burst, an INCR8 HALFWORD burst or an INCR16 BYTE burst. Instead of INCR* it could just as well be a WRAP* burst or a non-fixed length INCR with the corresponding number of transfers. This leads us to a total of 9 bursts. When starting two AHB bursts to transfer the 16 bytes, there are 115 possible combinations. When starting three, there are already 1591 possible combinations. Here's a table showing how many possible combinations there are per number of bursts and the total:

Num Bursts

Num Combinations
1 9
2 115
3 1591
4 12584
5 68499
6 270482
7 817974
8

1934737

9 3591184
10 5273632
11 6506624
12 5639936
13 3676160
14 1748992
15 507904
16 65536

Total:

30115959

Now that's a ridiculous amount of possibilities. If it would take one second to simulate each of these scenarios, then we'd need to leave our simulator running for about a year. When they say exhaustive verification is impossible these days, they really aren't kidding. Trying out all of them when verifying an AHB design doesn't necessarily make sense, but it's interesting to see just how many there are.

But wait, there's more... We made a logical error regarding wrapping bursts. A wrapping burst doesn't need to start at the address following the previous burst (or 0x0 if it's the first one). It just needs to contain that address. For example, to read addresses 0x0, 0x4, 0x8 and 0xC, we could perform a WRAP4 WORD burst starting at any of these addresses. To reflect this, we need to fix the constraint_address(...) predicate:

constrain_address(Burst, Address) :-
    get_address(Burst, BurstAddress),
    get_kind(Burst, Kind),

    % Incremental bursts
    Kind in {0, 1, 3, 5, 7} #==> BurstAddress #= Address,

    % Wrapping bursts
    get_num_bytes(Burst, NumBytes),
    Kind in {2, 4, 6} #==>
        Address mod NumBytes #= 0 #/\
        BurstAddress #>= Address #/\
        BurstAddress #< Address + NumBytes.

This will increase the number of burst combinations we have. I'll leave it as an exercise for interested readers to compute just how many this is.

What does this have to do with UVM REG? Well, if we were using a classical register adapter to convert this memory burst, we could only generate 16 combinations out of the multitude of possible ones: 4 WORD bursts, where the burst kind could be either SINGLE or INCR with one transfer. If you want to see how to get around this limitation and be able to generate any legal combination, check out my DVCon paper.

5 comments:

  1. Thanks! There's some fun math to try here. How'd your talk go, and where can we find a link to the paper?

    ReplyDelete
    Replies
    1. The talk went ok, but not stellar (mostly, I guess, because it wasn't a topic of general interest). The proceedings should be available online sometime soon. I'll post a link when they are.

      Delete
  2. Great article as usual Tudor! Can't wait for the paper!

    ReplyDelete
  3. Very nice post!

    One small hint: You can pull the unifications directly into the head, shortening your code a bit and making it immediately clear what the argument is:

    gen_mem_burst(mem_burst(Len, Bursts)) :-
    ...

    Actually, you can also improve the name, since this predicate can not only be used to *generate*, but also to *test* the argument. So `mem_burst/1` would be a nice declarative name for a predicate that works in several directions.

    ReplyDelete