Monday, March 10, 2014

A Subtle Gotcha When Using fork...join

I want to start things out light with a recent experience I've had using fork...join statements.

My intention was to start several parallel threads inside a for loop, with each thread taking the loop variable as an input. I naively assumed the following code would do the job:

module top;

  initial begin
    for (int i = 0; i < 3; i++)
      fork
        some_task(i);
      join_none
  end

  task some_task(int i);
    $display("i = %d", i);
  endtask

endmodule

Do you see the problem with this code? If not, don't worry as I didn't at first either. Here is the output if you run it:

# i =           3
# i =           3
# i =           3

It seems that the for loop ran to completion first and only then did the spawned processes start, by which point they all saw the final value of i.

After digging around on the Internet I discovered the answer. The SystemVerilog LRM mentions that "spawned processes do not start executing until the parent thread executes a blocking statement". This explains why the for loop finished executing and why by the time the processes were spawned i had already reached the value '3'.

The LRM also shows exactly how to fix our problem: "Automatic variables declared in the scope of the fork...join block shall be initialized to the initialization value whenever execution enters their scope, and before any processes are spawned". Applying this to our code yields:

module top;

  initial begin
    for (int i = 0; i < 3; i++)
      fork
        automatic int j = i;
        some_task(j);
      join_none
  end

  task some_task(int i);
    $display("i = %d", i);
  endtask

endmodule

Now, for every loop iteration a new variable is allocated, which is then passed to the respective task. Running this example does indeed give the desired result:

# i =           2
# i =           1
# i =           0

I've been playing with parallel processes for quite some time now, but I didn't know about this until recently. I've mostly learned SystemVerilog from online tutorials, but I see that a quick read of the LRM might be of real use. Who knows what other tricks are in there?

I'll keep you updated with any more subtleties I find.

6 comments:

  1. in "desired result", why are the values in reversed order? I guess that's a copy/paste typo

    btw, another trick you can play with here is to add "#0;" after the join_none in the for loop.

    Replies
    1. The order in which the spawned threads are scheduled is not defined. This particular simulator chose to switch to the last spawned thread first (the thread that prints 2). It's actually pretty cool that it did that, because this way we can also see that we shouldn't rely on any specific execution order when it comes to parallel threads, unless we use explicit synchronization mechanisms (such as semaphores).
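      To make that concrete, here is one way to force a fixed execution order with an explicit handshake. This is just a sketch; the shared `turn` counter is one illustration of a synchronization mechanism, and a semaphore, mailbox, or event chain would work just as well:

      module top;

        int turn = 0;

        initial begin
          for (int i = 0; i < 3; i++)
            fork
              automatic int j = i;
              begin
                wait (turn == j);   // block until it's this thread's turn
                $display("i = %d", j);
                turn++;             // let the next thread go
              end
            join_none
        end

      endmodule

      With the handshake in place the threads should print 0, 1, 2 in that order, regardless of which thread the simulator schedules first.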

    2. Also, thanks for the tip with the extra #0 delay. This works inside tasks, where you're allowed to consume time, but if you have a function that spawns multiple threads, you won't be allowed to use any delays (# delays, @ event controls, etc.).
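      For reference, the variant with the extra #0 would look something like this (a sketch only, not an endorsement):

      module top;

        initial begin
          for (int i = 0; i < 3; i++) begin
            fork
              some_task(i);
            join_none
            #0;  // blocking statement: gives the spawned thread a chance
                 // to read i before the loop increments it
          end
        end

        task some_task(int i);
          $display("i = %d", i);
        endtask

      endmodule

      Because the parent blocks at the #0, each spawned thread reads i before the loop moves on.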

    3. You are right about the order. It's just that I've never seen the reversed order with the simulators I've used.
      As to the #0, it's particularly useful when fork/join_none is used and you want the spawned thread to run without advancing time. In fact, it's used in the UVM source code.
      Good posts~

    4. Using #0 is the wrong thing to do. It is a hack, a short-term fix that creates problems later. The #0's in the UVM and other code create unnatural ordering issues that then get fixed by adding extraneous loops of #0's. The problem with using them to fix the example here is that instead of getting parallel threads, you get a series of ordered threads, each delayed by one delta cycle. -dave_59

  2. This comment has been removed by a blog administrator.
