
Concurrency in Ruby has always been somewhat of a black box to me. I've taken many of its traits for granted and never really dived deeper to understand them better. Its most prominent feature, the GVL (Global VM Lock)*, which forbids true parallelism and makes our threads run only one at a time (I will talk about it briefly in the next section), is particularly interesting. It raises several fundamental questions, like:

  • When does Ruby put the currently running thread on the shelf and start another one?
  • Is there some kind of timer that tracks how long a thread has been running and stops it when it exceeds its time slice?
  • Does a thread voluntarily hand control off to the next one, or is it forced to do so by some other entity?

To answer these questions and to fill in some of my knowledge gaps, I’ve decided to put on my detective hat and go on a journey through Ruby internals. I’d be more than happy if you joined me on that trip!

* This article is all about CRuby/MRI 2.x. Other Ruby implementations may have a very different threading scheme.

Concurrency Model in Ruby

Before we hit the road, let's take a step back and quickly go over the concurrency model in Ruby. There are a lot of great articles, books, and other sources [1, 2] that thoroughly explain this topic, so I won't go too deep into details here. Instead, let me just walk you through the most important parts. It may come in handy during our journey later on.

Threads Implementation

What happens when you create a new thread in Ruby? Under the hood, Ruby spawns a brand new native OS thread. To illustrate that, I've prepared a simple script that starts two threads, each running an empty, infinite loop:

t1 = Thread.new do
  loop {}
end
t1.name = 'Thread 1'

t2 = Thread.new do
  loop {}
end
t2.name = 'Thread 2'

[t1, t2].each(&:join)

If you ran this program and then looked at the list of active tasks on your system, you'd see something similar to this:

$ top -H -p <Ruby process PID>
PID    USER    COMMAND
278    root    Thread 1
279    root    Thread 2
276    root    ruby

Clearly, it shows that both of them were started as independent OS threads.

Global VM Lock

Does the 1-to-1 mapping between Ruby and OS threads mean that our code can run in parallel on multicore architectures? Unfortunately not. Despite using native threads, there's a mechanism that prevents multiple threads from executing at the same time. It's called the Global VM Lock, or GVL for short. Only one thread - the one currently holding the lock - can be active. One question may come to your mind right now - why would anyone want to limit the number of active threads to just one? Well, this is a very good question.
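
You can observe this limitation yourself. Below is a minimal sketch (the workload and timings are only illustrative): it runs the same CPU-bound work twice, first sequentially and then in two threads. Under MRI, both variants take roughly the same amount of time, because the threads never truly run in parallel.

require 'benchmark'

def cpu_work
  1_000_000.times { |i| Math.sqrt(i) }
end

sequential = Benchmark.realtime { 2.times { cpu_work } }
threaded = Benchmark.realtime do
  2.times.map { Thread.new { cpu_work } }.each(&:join)
end

# Under MRI the threaded version is not ~2x faster - the GVL lets
# only one thread execute Ruby code at a time.
puts "sequential: #{sequential.round(2)}s, threaded: #{threaded.round(2)}s"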

One of the reasons is that the MRI implementation is not thread-safe. Ruby's internal structures could easily get corrupted if two threads tried to access the same object at the same time. Of course, there are tools, like locks or mutexes, which can be used to prevent that. That being said, this is a much broader topic that deserves a separate article of its own [3]. For this article, we just need to be aware of the GVL's existence and its implications.
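
In Ruby code, those tools look like this - a minimal sketch using the built-in Mutex class (the counter and iteration counts are arbitrary, just for illustration):

counter = 0
lock = Mutex.new

threads = 4.times.map do
  Thread.new do
    100_000.times do
      # Only one thread at a time can enter this critical section
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
puts counter # => 400000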

Closer Look at the Internals

After this rather short introduction, we are finally ready to let our adventure begin. However, every journey has to start somewhere, right? Where should we kick off ours then? Ruby source code sounds like a good choice, wouldn’t you agree?

The implementation of threads in Ruby lives in the thread.c file. Here you can see, with your own eyes, the guts that make up every thread in your program. If you're looking for a single source of truth, this is the right place.

There’s a short description of the thread design at the beginning of that file:

A thread has mutex (GVL: Global VM Lock or Giant VM Lock) can run. When thread scheduling, running thread release GVL. If running thread try blocking operation, this thread must release GVL and another thread can continue this flow. After blocking operation, thread must check interrupt (RUBY_VM_CHECK_INTS).

This note tells us a few interesting things:

  • A thread has to acquire the lock to run
  • A thread releases the lock by itself
  • The lock is released during blocking operations
  • RUBY_VM_CHECK_INTS has something to do with interrupt handling

First of all, it confirms the existence of the GVL and the multitasking limit we've discussed previously. Then, we learn that threads are non-preemptive at the implementation level - they are responsible for releasing the lock to let another thread continue. This answers one of our questions, great!
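
Incidentally, this cooperative behavior is even exposed at the Ruby level: Thread.pass hints to the scheduler that the current thread is willing to yield. A minimal sketch (the interleaving of the output is not guaranteed - it's only a hint):

t1 = Thread.new do
  3.times do |i|
    puts "T1: step #{i}"
    Thread.pass # voluntarily offer the GVL to other runnable threads
  end
end

t2 = Thread.new do
  3.times { |i| puts "T2: step #{i}" }
end

[t1, t2].each(&:join)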

Another interesting insight is that the lock is released when our code starts a blocking operation, like reading from or writing to a file. Is that the only thing that triggers context switching? If so, a thread not issuing any I/O calls would prevent other threads from running. It would starve them! That doesn't sound good, does it? Let's verify whether that's what happens!
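
Before we do, we can quickly confirm the blocking-call part of the story. In the minimal sketch below, sleep is a blocking call, so each thread releases the GVL while it waits - and the two one-second waits overlap:

require 'benchmark'

elapsed = Benchmark.realtime do
  2.times.map { Thread.new { sleep 1 } }.each(&:join)
end

puts elapsed.round(2) # ~1.0, not ~2.0 - the waits overlap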

Experiment 1

What kind of non-blocking operations can we do to track the execution order of our threads? We could use an in-memory data structure to record their traces - ideally, something that remembers the order of interactions with it. There is one fundamental data structure with this property: the good old array.

How can we employ it? How about making every thread push its ID many times to a shared array? As a result, we'd get an array filled with thread IDs. Since each ID is distinct and belongs to only one thread, we can determine in which order the threads were running (by looking at the positions of their IDs in the array). That sounds like a good plan. Below is a simple script that does exactly that:

def experiment_1
  array = []

  t1 = Thread.new do
    # Push this thread's ID into the shared array a million times
    1_000_000.times do
      array << "T1"
    end
  end

  t2 = Thread.new do
    1_000_000.times do
      array << "T2"
    end
  end

  t1.join
  t2.join

  # Return the array so we can inspect the order of entries
  array
end

If you run this code, you will get an array of size 2,000,000, filled with two values: T1 and T2. After that, we can group consecutive values and count them. That tells us how many push operations one thread performed before yielding control to the other one.
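
Here is a minimal sketch of such a grouping, using Enumerable#chunk (the run lengths in the comment are made-up placeholders; real values differ between executions):

# Collapse consecutive identical entries into [id, run_length] pairs
groups = array.chunk { |id| id }.map { |id, run| [id, run.size] }
p groups # e.g. [["T1", 52000], ["T2", 48000], ...] - values are illustrative

Below is a simple visualization of such groupings (the length of a group corresponds to the number of push operations):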

As we can see, the threads were definitely taking turns. For some time, T1 was adding elements to the array, then it passed control to T2, and so on. It means that, even without any blocking calls, threads were being switched from time to time. Is there something else, then, that ensures context switching even in the absence of I/O operations? Apparently, there is! Let's go back to that mysterious RUBY_VM_CHECK_INTS expression to see what's behind this magic formula.

RUBY_VM_CHECK_INTS

If you look for its definition, you’ll find it in the vm_core.h file:

#define RUBY_VM_CHECK_INTS(ec) rb_vm_check_ints(ec)

static inline void
rb_vm_check_ints(rb_execution_context_t *ec)
{
    VM_ASSERT(ec == GET_EC());
    if (UNLIKELY(RUBY_VM_INTERRUPTED_ANY(ec))) {
        rb_threadptr_execute_interrupts(rb_ec_thread_ptr(ec), 0);
    }
}

As you can see, it's a macro that expands to rb_vm_check_ints which, whenever any interrupt is pending, calls another function - rb_threadptr_execute_interrupts, defined in the thread.c file. Its implementation is much more complex, but a few interesting things are going on in there. First of all, as the name suggests, this routine handles a bunch of different interrupts. Secondly, there is one particularly interesting interrupt being dealt with, namely timer_interrupt:

rb_threadptr_execute_interrupts(rb_thread_t *th, int blocking_timing)
{
    ...
    if (timer_interrupt) {
        ...
        rb_thread_schedule_limits(limits_us);
    }
    ...
}

That looks promising! Judging by the name of the routine inside the if block, it has something to do with thread scheduling. Let's take a look at the actual implementation of that function to confirm our suspicions:

static void
rb_thread_schedule_limits(uint32_t limits_us)
{
    if (!rb_thread_alone()) {
        rb_thread_t *th = GET_THREAD();
        if (th->running_time_us >= limits_us) {
            gvl_yield(th->vm, th);
            rb_thread_set_current(th);
        }
    }
}

Great! It looks like we've finally found the place we were looking for. Without a doubt, this is the control tower of our threads - it stops the current one to make room for others. The lock is released when two conditions are met:

  • There is more than one living thread
  • Time given to the current thread has elapsed

That makes sense. We are not really interested in yielding control if there's only one thread within a Ruby process. If there are other threads, though, we need to routinely check whether the time given to the current one has elapsed (in CRuby 2.x the default time slice, TIME_QUANTUM_USEC, is 100 ms) and, if so, hand control over to another thread.

We can finally say that the RUBY_VM_CHECK_INTS macro is responsible for thread scheduling. What does that tell us? Well, we can, for example, look at where it's used to understand which operations, other than I/O calls, trigger context switching. It turns out that, besides blocking calls, this macro is used during several other activities, like fiber switching or waiting for a child process to finish. However, there is one particularly interesting place in which it appears - the insns.def file, where YARV instructions are defined.

What is YARV? It's an acronym for Yet Another Ruby VM - the virtual machine that executes our code. Before our code can be interpreted by YARV, it first has to be compiled into a series of low-level instructions. These instructions are defined in the insns.def file.

What kind of YARV commands trigger the RUBY_VM_CHECK_INTS macro? Here is the full list:

  • leave
  • throw
  • jump
  • branchif
  • branchunless
  • branchnil

If you disassemble your Ruby code, you'll notice that these instructions form the basis of fundamental control structures, like loops, conditional statements, function calls, or error raising. Below is a very simple example, showing the disassembled form of a while loop:

code = <<-CODE
while true do
  puts 'Hello'
end
CODE
 
puts RubyVM::InstructionSequence.compile(code).disasm

The (abridged) output looks like this:

0006 trace	1
0008 putself
0009 putstring	"Hello"
0011 opt_send_without_block <callinfo!mid:puts, argc:1, FCALL|ARGS_SIMPLE>, <callcache>
0014 pop
0015 jump	6

As we can see, the jump command (the last YARV instruction) lies at the heart of our infinite while loop. After calling the puts method, it makes the VM jump back to the instruction at the 0006 address, which is the starting point of our loop. When Ruby processes the jump command, it goes through the scheduling functions we've seen before and gives our thread a chance to release the lock.
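
If you want to see the branch instructions from the list above as well, you can disassemble a conditional in exactly the same way (the exact output differs between Ruby versions, so it's omitted here):

# An if statement compiles down to a branchunless instruction -
# look for it in the printed bytecode
puts RubyVM::InstructionSequence.compile("x = 1; puts('hi') if x > 0").disasm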

Now it all makes sense! In Experiment 1, we didn't have any blocking operations, but since we had a loop, our threads could take turns. Could we then write code that prevents context switching entirely?

Writing code without any blocking calls, loops, or method calls may sound like a cumbersome task. There are certain operations, though, that can be done without these structures.

The well-known << operator needs no introduction. It works like the Array#push method - it appends a given element to the end of an array. However, there is one important difference between them: the << operator is not compiled into a method call. Instead, it's represented by a single YARV instruction, opt_ltlt. What's more, it's not one of those instructions that invoke the RUBY_VM_CHECK_INTS macro internally. How can we leverage that? For example, we can write a very long routine that appends values to an array, one statement at a time. Something like this:

array << 1
array << 2
array << 3
# … repeat x times
array << 9999999

Experiment 2

Our test will be very similar to the first experiment. The only difference is that, instead of using a loop to populate the array, we will call the << operator for each element explicitly, as a separate statement. Such a program would be very long, though, so instead of writing it by hand, I'll use the eval method, which takes a string and evaluates it as Ruby code:

def experiment_2
  array = []

  # Build one long string of a million separate `array << ...` statements
  code_t_1 = 1_000_000.times.map { "array << 'T1'" }.join(';')
  code_t_2 = 1_000_000.times.map { "array << 'T2'" }.join(';')

  t1 = Thread.new do
    eval(code_t_1)
  end

  t2 = Thread.new do
    eval(code_t_2)
  end

  t1.join
  t2.join

  # Return the array so we can inspect the order of entries
  array
end

We can use the same strategy as we did in Experiment 1 to determine the order in which the threads were running.

Here is the result:

Wow! That’s cool. We were able to turn off the thread scheduling completely! As we can see, Thread T1 released the lock only after it finished populating the array, with one million copies of the T1 value. T2 hadn’t been able to kick in until then.

That’s a really interesting observation. It proves that thread switching may happen only during some specific operations (mentioned earlier). If our thread doesn’t execute any of them, it never gets interrupted.

Summary

After our little investigation, we can summarize what we’ve learned along the way.

The first discovery was that Ruby threads are non-preemptive (at the C level) - they voluntarily yield the GVL to enable concurrency. Then, we learned that scheduling takes place when the currently running thread performs an I/O call. On top of that, a thread also periodically checks whether its time slice has elapsed and releases the GVL accordingly. These time checks run only during particular operations, like:

  • Fiber switching
  • Waiting for the termination of a child process
  • Processing special YARV instructions

Hopefully, some of the findings in this article were interesting to you and shed more light on how threads work under the hood. That knowledge led us to an interesting experiment, which showed that if we don't use any control structures or method calls, we can starve the other threads waiting for their turn. By all means, this is not a very realistic scenario, since it's hard to imagine a program without loops or method calls.

Nevertheless, it feels good when you understand how tools that you use on a daily basis work internally. I had a really good time playing around with the scheduling scheme in Ruby, and I hope that you enjoyed it as well.

References

1) honeybadger.io/blog/ruby-concurrency-parallelism
2) ruby-hacking-guide.github.io/thread
3) speedshop.co/2020/05/11/the-ruby-gvl-and-scaling