The six deadly coding sins concerning the volatile modifier

1. No volatile when it’s needed

Let’s look at a specific example.  Suppose we’re developing software for an AVR 8-bit embedded processor.  Since multiplies are going to happen in software, we’re probably interested in seeing how slow they are, so we know how hard to try to avoid them.  So we write a little benchmark program like this:

/*
 * volatile test.c
 * Created: 7/14/2016 2:51:17 PM
 * Author : Kareem A.Abdullah
 */
#include <stdint.h>
#define TCCR1B (*(uint16_t *)(0x4E))
#define TCNT1  (*(uint16_t *)(0x4C))
uint16_t time_mul (void);
signed char a, b, c;

int main(void)
{
 uint16_t process_time;
 process_time = time_mul();
 while (1)
     {
     }
}

uint16_t time_mul (void)
{
 TCCR1B = 0x01;
 uint16_t before = TCNT1;
 c = a * b;
 uint16_t after = TCNT1;
 return after - before;
}

Here TCNT1 points to a hardware register living at address 0x4C.  This register provides access to Timer/Counter 1: a free-running 16-bit timer that we assume is configured to run at some rate appropriate for this experiment.  We read the register before and after the multiply operation, and subtract to find the duration.  Side note: although at first glance this code looks like it fails to account for the case where TCNT1 overflows from 65535 to 0 during the timing run, it actually works properly for all durations between 0 and 65535 ticks.
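
As a quick illustration of that side note, unsigned subtraction in C is defined modulo 2^16 for uint16_t, so the duration comes out right even if the counter rolls over between the two reads.  The numbers below are made up purely for illustration:

uint16_t before = 65500u;          /* read just before the multiply             */
uint16_t after  = 36u;             /* the timer wrapped past 65535 -> 0 meanwhile */
uint16_t ticks  = after - before;  /* (36 - 65500) mod 65536 = 72, which is the  */
                                   /* true number of elapsed ticks               */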

Unfortunately, when we compile this code at optimization level -O2 and run it, it always reports that the multiply operation required zero clock ticks.  To see what went wrong, let us look at the generated assembly language:

[avr-gcc assembly listing for time_mul() at -O2, shown as an image in the original post]

Now the problem is obvious: both reads from the TCNT1 register have been eliminated and the function is simply returning the constant zero (avr-gcc returns a 16-bit value in the r24:r25 register pair).

How can the compiler get away with never reading from TCNT1?  First, let's remember that the meaning of a C program is defined by the abstract machine described in the C standard.  Since the rules for the abstract machine say nothing about hardware registers, the C implementation is permitted to assume that two reads from an object, with no intervening stores, both return the same value.  Of course, any value subtracted from itself is zero.  So the translation performed by avr-gcc here is perfectly correct; it is our program that's wrong.
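
In effect, the optimizer is entitled to treat time_mul() as if it had been written like this (a hand-written illustration of the reasoning, not actual compiler output):

uint16_t time_mul (void)
{
 TCCR1B = 0x01;
 uint16_t before = TCNT1;  /* the two reads are assumed to agree...        */
 c = a * b;
 uint16_t after = before;  /* ...so the second one can reuse the first     */
 return after - before;    /* which folds to 0 and makes both reads dead   */
}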

To fix the problem, we need to change the code so that TCNT1 points to a volatile location.

#define TCNT1 (*(volatile uint16_t *)(0x4C))

Now the C implementation is neither free to eliminate the reads nor free to assume that both reads return the same value.  This time the compiler outputs better code:

[avr-gcc assembly listing for the corrected time_mul(), with both TCNT1 reads present, shown as an image in the original post]

Although this assembly code is correct, our C code still contains a latent error.  We’ll explore it later.

Normally, you will find definitions for device registers in system header files, in which case you will not need to write the volatile yourself.  Even so, it may be worth checking that the definitions are correct; they aren't always.
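
Such definitions usually boil down to a volatile-qualified dereference of a fixed address, along these lines (a simplified sketch with a made-up helper macro, not the text of any particular vendor header):

#define MY_MMIO_16(addr) (*(volatile uint16_t *)(addr))
#define TCNT1 MY_MMIO_16(0x4C)   /* the address used earlier in this article */

If the volatile is missing from a definition like this, you are right back in the situation of the first example above.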

Let’s look at another example.  In an embedded system you are implementing, some computation must wait for an interrupt handler to fire.  Your code looks like this:

#include <stdint.h>
#include <avr/interrupt.h>

int done=0;
void wait_for_done (void);
void __vector_4 (void) __attribute__((signal));

int main(void)
{
 wait_for_done();
 while (1)
 {
 }
}

void __vector_4 (void)
 {
 done = 1;
 }
void wait_for_done (void)
 {
 while (!done) ;
 }

Here wait_for_done() is designed to be called from the non-interrupt context, whereas __vector_4() will be invoked by the interrupt controller in response to some external event.  We compile this code into assembly:

__vector_4:
  push r0
  in r0,__SREG__
  push r0
  push r24
  ldi r24,lo8(1)
  sts done,r24
  pop r24
  pop r0
  out __SREG__,r0
  pop r0
  reti
wait_for_done:
  lds r24,done
.L3:
  tst r24
  breq .L3
  ret

The code for the interrupt handler looks good: it stores to done as intended, and the rest of it is just AVR interrupt boilerplate.  However, the code for wait_for_done() contains an important flaw: it loads done into r24 once, before the loop, and then spins testing that register; it never looks at the RAM location again.  This happens because the C abstract machine has no notion of communication between concurrent flows (whether they are threads, interrupts, or anything else).  Again, the translation is perfectly correct, but it does not match the developer's intent.

If we mark done as a volatile variable, the interrupt handler code does not change, but wait_for_done() now looks like this:

wait_for_done:
.L3:
  lds r24,done
  tst r24
  breq .L3
  ret

This code will work.  The issue here is one of visibility.  When you store to a global variable in C, what computations running on the machine are guaranteed to see the store?  When you load from a global variable, what computations are assumed to have produced the value?  In both cases, the answer is “the computation that performs the load or store is assumed to be the only computation that matters.”  That is, C makes no visibility guarantees for normal variable references.  The volatile qualifier forces stores to go to memory and loads to come from memory, giving us a way to ensure visibility across multiple computations (threads, interrupts, co-routines, or whatever).
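
For reference, the only source-level change needed in this example is on the flag itself:

volatile int done = 0;   /* shared between wait_for_done() and __vector_4() */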

Again, our C code contains a latent bug that we’ll investigate later.

Summary: The abstract C machine is connected to the actual machine in only a few places.  The memory behavior of the actual machine may be very different from the operations specified in source code.  If you require additional connections between the two levels of abstraction, for example to access device registers, the volatile qualifier can help.

2. Too much volatile when it’s not needed

In a well-designed piece of software, volatile is used exactly where it is needed.  It serves as documentation, saying in effect "this variable does not play by the C rules: it requires a strong connection with the memory subsystem."  In a system that uses too much volatile, variables are labeled volatile indiscriminately, without any technical justification.  There are three reasons why this is bad.  First, it is bad documentation and will confuse subsequent maintainers.  Second, volatile sometimes has the effect of hiding program bugs such as race conditions (I'll cover race conditions in a later article).  If your code needs volatile and you don't understand why, this is probably what is happening; it is far better to actually fix the problem than to rely on a hack you do not understand to solve a problem you do not understand.  Finally, volatile causes inefficiency by handicapping the optimizer, and the overhead it introduces is hard to track down since it is spread out all over the system.

Summary: Use volatile only when you can provide a precise technical justification.  Volatile is not a substitute for thought.

3. Misplacing the volatile qualifier

At the level of C syntax, volatile is a type qualifier.  It can be applied to any type, following rules that are similar to, but not quite the same as, the rules for the const qualifier.  The situation can become confusing when qualified types are used to build up more complex types.  For example, here are the four possibilities for a single-level pointer (counting the unqualified one):

int *p;                              // pointer to int
volatile int *p_to_vol;              // pointer to volatile int
int *volatile vol_p;                 // volatile pointer to int
volatile int *volatile vol_p_to_vol; // volatile pointer to volatile int

In each case, either the pointer is volatile or not, and the pointer target is volatile or not.  The distinction is crucial: if you use a "volatile pointer to regular int" to access a device register, the compiler is free to optimize away accesses to the register.  On top of that, you will get slower code, since the compiler is not free to optimize accesses to the pointer variable itself.  It's an easy mistake to make, and it's also easy to overlook when vetting code, since your eye may just be looking for a volatile somewhere.

For example, this code is wrong:

int *volatile REGISTER = (int *) 0xfeed;
*REGISTER = new_val;

To write clear, maintainable code using volatile, a reasonable approach is to build up more complex types using typedefs.  For example, we could first make a new type, __IOint, which is a volatile int:

typedef volatile int __IOint;

Next, we create a pointer to an __IOint:

__IOint *REGISTER = (__IOint *) 0xfeed;

Members of a struct or union can be volatile, and structs/unions can also be volatile.  If an aggregate type is volatile, the effect is the same as making all members volatile.
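
For example, a memory-mapped peripheral is often described as a struct accessed through a pointer to a volatile-qualified aggregate.  The layout and mapping below are made up for illustration:

struct timer16 {
 uint16_t count;    /* free-running counter   */
 uint16_t control;  /* prescaler / mode bits  */
};

/* Qualifying the aggregate makes every member access volatile,
   exactly as if each member had been declared volatile itself. */
#define TIMER1 (*(volatile struct timer16 *) 0x4C)   /* made-up mapping */

A read such as TIMER1.count then compiles to a real load, just like the volatile TCNT1 macro earlier.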

We might ask, does it make sense to declare an object as both const and volatile?

const volatile int *p;

Although this initially looks like a contradiction, it is not.  The semantics of const in C are "I agree not to try to store to it" rather than "it does not change."  So this qualification is perfectly meaningful and even useful, for example to declare a timer register that spontaneously changes value, but that should not be stored to.
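
A minimal sketch of that idea, using a made-up address for a read-only status register:

/* The hardware may change this register at any time (volatile),
   but our code promises never to store to it (const). */
#define STATUS_REG (*(const volatile uint16_t *) 0x52)   /* made-up address */

uint16_t read_status (void)
 {
 return STATUS_REG;   /* always performs a real load; a store would not compile */
 }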

Summary: Since C’s type declaration syntax is not particularly readable or intuitive, volatile qualifiers must be placed with care.  Typedefs are a useful way to structure complex declarations.

4. Expecting volatile to enforce ordering with non-volatile accesses

The question is: What was wrong with the fixed C code examples above, where we added volatile to the TCNT1 register handle and to the done flag?  The answer, depending on who you believe, is either “nothing” or else “the compiler may reorder the operations in such a way as to create broken output.”

You might hope that compilers are not permitted to move accesses to ordinary, non-volatile variables around accesses to volatile variables, and there is a consistent reading of the standard that backs this up.  The problem with this reading is that important compilers follow a different interpretation, under which accesses to non-volatile objects can be moved freely around volatile accesses.

Take this simple example:

volatile int ready;
int message[100];

void foo (int i)
 {
 message[i/10] = 42;
 ready = 1;
 }

The purpose of foo() is to store a value into the message array and then set the ready flag so that another interrupt or thread can see the value.  From this code, GCC and some other compilers emit very similar assembly:

foo:
  movl 4(%esp), %ecx
  movl $1717986919, %edx
  movl $1, ready
  movl %ecx, %eax
  imull %edx
  sarl $31, %ecx
  sarl $2, %edx
  subl %ecx, %edx
  movl $42, message(,%edx,4)
  ret

Obviously the programmer’s intent is not respected here, since the flag is stored prior to the value being written into the array.  A number of embedded compilers refuse to do this kind of reordering as a deliberate choice to prefer safety over performance.

One way to fix this problem is to declare message as a volatile array.  The C standard is unambiguous that volatile side effects must not move past sequence points, so this will work.  On the other hand, adding more volatile qualifiers may suppress interesting optimizations elsewhere in the program.  Wouldn’t it be nice if we could force data to memory only at selected program points without making things volatile everywhere?

The construct that we need is a "compiler barrier."  The C standard does not provide one, but many compilers do.  For example, GCC and sufficiently compatible compilers support a barrier that looks like this:

asm volatile ("" : : : "memory");

It means roughly "this inline assembly code, although it contains no instructions, may read or write all of RAM."  The effect is that the compiler flushes any values it is caching in registers back to RAM before the barrier and reloads them afterwards.  Moreover, code motion is not permitted around the barrier in either direction.  Basically, a compiler barrier is to an optimizing compiler as a memory barrier is to an out-of-order processor.

We can use a barrier in the code example:

volatile int ready;
int message[100];
void foo (int i)
 {
 message[i/10] = 42;
 asm volatile ("": : : "memory");
 ready = 1;
 }

Now the output is what we wanted:

foo:
  movl 4(%esp), %ecx
  movl $1717986919, %edx
  movl %ecx, %eax
  imull %edx
  sarl $31, %ecx
  sarl $2, %edx
  subl %ecx, %edx
  movl $42, message(,%edx,4)
  movl $1, ready
  ret

What about compilers that fail to support memory barriers?  One bad solution is to hope that this kind of compiler isn’t aggressive enough to move accesses around in a harmful way.  Another bad solution is to insert a call to an external function where you would put the barrier.  Since the compiler doesn’t know what memory will be touched by this function, it may have a barrier-like effect.  A better solution would be to ask your compiler vendor to fix the problem and also to recommend a workaround in the meantime.
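
A rough sketch of the external-function workaround just mentioned, assuming the helper is defined in a separate translation unit so the compiler cannot see that its body is empty (the function name is made up):

void pretend_barrier (void);   /* defined, with an empty body, in another .c file */

volatile int ready;
int message[100];

void foo (int i)
 {
 message[i/10] = 42;
 pretend_barrier();   /* the compiler must assume this call may touch any memory */
 ready = 1;
 }

Keep in mind that link-time optimization can see through this trick, which is one more reason it counts as a bad solution.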

Summary: Most compilers can and will move accesses to non-volatile objects around accesses to volatile objects, so don’t rely on the program ordering being respected.

5. Assuming volatile accesses are translated correctly

Compilers are not totally reliable in their translation of accesses to volatile-qualified objects.  Here's a quick example:

volatile int x;
void foo (void)
 {
 x = x;
 }

The proper behavior of this code on the actual machine is unambiguous: there should be a load from x, then a store to it.  However, the port of GCC to the MSP430 processor behaves differently:

foo:
  ret

The emitted function does nothing at all: both the load and the store have disappeared, which is simply wrong.  In general, compilers based on GCC 4.x are mostly volatile-correct.
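
One defensive trick some developers use is to funnel volatile accesses through tiny helper functions: a call is harder for the compiler to mangle than an inline access, and the helpers give you a single place to check in the assembly output.  A sketch (the function names are made up):

int vol_read_int (volatile int *vp)
 {
 return *vp;
 }

void vol_write_int (volatile int *vp, int value)
 {
 *vp = value;
 }

/* Instead of  x = x;  one would write:       */
/*   vol_write_int (&x, vol_read_int (&x));   */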

Summary: If your code makes correct use of volatiles and still does not work, consider reading the compiler’s output to make sure it has emitted the proper memory operations.

6. Using volatile to get atomicity

Earlier we saw a case where volatile was used to make a value visible to a concurrently running computation.  This was — in limited circumstances — a valid implementation choice.  On the other hand it is never valid to use volatile to get atomicity.

Somewhat surprisingly for a systems programming language, C does not provide guarantees about atomicity of its memory operations, regardless of the volatility of objects being accessed.  Generally, however, individual compilers will make guarantees such as “aligned accesses to word-sized variables are atomic.”
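
To see why this matters, consider a 16-bit counter on an 8-bit AVR; the variable and values below are made up for illustration.  Reading the counter takes two separate one-byte loads, and an interrupt can fire between them:

volatile uint16_t ticks;   /* incremented from a timer interrupt handler */

uint16_t read_ticks (void)
 {
 /* On an 8-bit machine this is two byte loads.  If the interrupt
    advances ticks from 0x00FF to 0x0100 between those loads, the
    caller can observe 0x01FF or 0x0000: a torn value.  volatile
    guarantees the loads happen, not that they happen atomically. */
 return ticks;
 }

The cure is a lock of some kind, as discussed next.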

In most cases, you use locks to get atomicity.  If you’re lucky, you have access to well-designed locks that contain compiler barriers.  If you’re programming on bare metal on an embedded processor, you may not be so lucky.  If you have to devise your own locks, it would be wise to add compiler barriers.  For example, older versions of TinyOS for AVR chips used these functions to acquire and release the global interrupt lock:

char __nesc_atomic_start (void)
 {
 char result = SREG;
 __nesc_disable_interrupt();
 return result;
 }
void __nesc_atomic_end (char save)
{
 SREG = save;
 }

Since these functions can be inlined, it was always possible for the compiler to move code outside of a critical section, so the locks were changed to look like this:

char __nesc_atomic_start (void)
 {
 char result = SREG;
 __nesc_disable_interrupt();
 asm volatile("" : : : "memory");
 return result;
 }
void __nesc_atomic_end(char save)
 {
 asm volatile("": : : "memory");
 SREG = save;
 }

Perhaps interestingly, this had no effect on TinyOS efficiency and even made the code smaller in some cases.
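
For completeness, here is how such a lock pair is typically used around a critical section; the shared counter is made up for illustration:

volatile uint16_t shared_count;   /* also modified by an interrupt handler */

void bump_shared_count (void)
 {
 char save = __nesc_atomic_start();   /* disable interrupts, remember old state */
 shared_count++;                      /* read-modify-write, now uninterruptible */
 __nesc_atomic_end(save);             /* restore the previous interrupt state   */
 }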

Summary: Volatile has nothing to do with atomicity.  Use locks.

Conclusion

Optimizing compilers are tricky to reason about, as are out-of-order processors.  Also, the C standard contains some very dark corners.  The volatile qualifier invites all of these difficulties to come together in one place and interact with one another.  Furthermore, it provides weaker guarantees than people commonly assume.

If you have any questions concerning the article, you can leave them as a comment.  Follow us on our Facebook page and stay tuned.

PART 1: HISTORY OF VOLATILE MODIFIER
