echelon


A simple lock-free queue in C++

A while ago I needed a simple lock-free queue in C++ that supports a single producer as well as one consumer. The reason to use a lock-free data structure instead of a regular std::queue guarded by mutex and condition variable for me was to avoid the overhead of exactly those mechanisms. Also since I never came in contact with the interesting concept of lock-free data types, I thought this was an excellent opportunity to look into the matter.

You can never go too far, image by Baboon

Unfortunately (or fortunately) I could not find a production ready lock-free queue in C++ for my platform (Linux). There is the concurrent_queue class available in Visual Studio 2010, there’s the unofficial Boost.Lockfree library (code is here) but those options were not satisfying. I needed something simple that worked right out of the box.

Herb Sutter wrote an excellent article a while back about designing a lock-free queue. It’s straight forward and functional. The only flaw it has from my point of view is the use of C++0x atomics which I wanted to refrain from including in my project for now. There’s of course the unofficial Boost.Atomic library, but again I wanted something simple.

So I reread Herbs article, especially the passage that explains the requirements for a “lock-free-safe” variable: atomicity, order and the availability of a compare and swap operation. Digging into the Boost.Atomic code I found that it is rather simple to get atomicity and order when using gcc. I’m using version 4.4.3.

Atomicity can be ensured using the atomic functions of gcc and order can be ensured using a memory barrier trick:

type __sync_lock_test_and_set (type *ptr, type value)
asm volatile("" ::: "memory")

The first line is an atomic test and set operation and the second line is equivalent to a full memory barrier. With this functionality I was all set. Here is the modified code for the method the producer calls:

void produce(const T& t)
{
  last->next = new Node(t);                      // add the new item
  asm volatile("" ::: "memory");                   // memory barrier
  (void)__sync_lock_test_and_set(&last, last->next);
  while(first != divider)                       // trim unused nodes
  {
    Node* tmp = first;
    first = first->next;
    delete tmp;
  }
}

I replaced the atomic members with regular pointers but exchanged the assignments that require atomicity and order by calls to __sync_lock_test_and_set preceded by the full memory barrier trick. The function call must be cast to void to prevent a “value computed is not used” warning.

bool consume(T& result)
{
  if(divider != last)                        // if queue is nonempty
  {
    result = divider->next->value; 	              // C: copy it back
    asm volatile("" ::: "memory");                 // memory barrier
    (void)__sync_lock_test_and_set(÷r, divider->next);
    return true;          	                       // report success
  }
  return false;           	                    // else report empty
}

The full class can be downloaded here. Please note that the class lacks some important functionality if your use-case is non-trivial (a proper copy-constructor comes to mind).

What it is all about

There is a collection of definitions of lock-free and non-blocking and wait-free algorithms and data structures that can be interesting to read.

The idea of lock-free data structures is that they are written in a way that allows simultaneous access by multiple threads without the use of critical sections. They must be defined in a way that at any point in time any thread that has access to the data sees a well defined state of the structure. A modification was either done or not done, there is no in-between, no inconsistency.

This is why all critical modifications to the structure are implemented as atomic operations. It also explains why order is important: before critical stuff can be done, preparations have to be finished and must be seen by all participants.



2 Comments

#1 Yevgeni wrote on April 18, 2012:

There is a google cross-platform implementation of atomic operations in gperftools and chromium projects. The implementation can be seen here:

http://code.google.com/p/gperftools/source/browse/trunk/src/base/?r=109

Did not try it myself, but intend to do it in the near future.

#2 virgo wrote on August 16, 2012:

I added one more trivial yet useful function.

bool empty()
{
return(divider == last);
}

Sorry, the comment form is closed at this time.