Facts

Besides myths, there are also quite a few facts related to an RTOS. Here you will find a number of them.

- Why Regular Tracing does not work
- Why Time-Outs are used wrong

 

Why Regular Tracing does not work

 


To prove that a preemptive multithreaded system works correctly, some form of tracing is often used. Most forms of tracing, however, are intrusive: the trace code influences the system’s behavior in such a way that the result of the trace cannot be considered proof of the system’s correctness. There are a number of reasons for this.

First, the added trace code influences the timing behavior of the system, because trace statements take time. Especially when the system contains race conditions, these may well go undetected because of the added cycles: with trace statements the system works correctly, and without them it doesn’t.

But let’s assume there are no race conditions. The threads making up the system are well synchronized, and tracing is added to check whether they are activated in the order dictated by the design. Then another problem pops up: most types of tracing use a shared resource. Take the most basic form, printf, which is used to write some diagnostic text to a file. Access to this file must not be interrupted by another thread, so the printf statements must be executed atomically, which is accomplished by using a mutex. The effect is that when two or more threads try to trace at the same moment, only one can continue and the others are blocked. This changes the complete scheduling model of the system. Again, the trace is no proof of the system’s correctness.

Because printf imposes a relatively large performance penalty, a circular memory buffer is often used instead. The threads place trace information in this buffer. This buffer, however, suffers from the same problem as the file used with printf: it is a shared resource which must be protected by a mutex, again ruining the scheduling.

The final solution observed is to give each thread its own buffer, which prevents the aforementioned problem. The trace statements are then written with a time stamp, and after the test run the contents of the different buffers are merged based on this time stamp. The big disadvantages are that this is complex to implement, uses a lot of system resources (time and memory) and still is not completely reliable, since the resolution of the time stamps needs to be very high in order to obtain a reliable model of the thread activation.
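The time stamp resolution problem of per-thread buffers can be illustrated with a small host-side sketch. It is not AVIX-RT code; the names `TraceEvent`, `merge_traces`, `quantize` and `ambiguous_pairs`, as well as the sample events and tick sizes, are assumptions made up for this illustration. The same four events are merged twice: once with fine-grained time stamps and once after rounding them down to a coarse tick.

```python
import heapq
from typing import NamedTuple


class TraceEvent(NamedTuple):
    timestamp: int  # ticks of a hypothetical trace clock
    thread: str
    message: str


def merge_traces(buffers):
    """Merge per-thread buffers (each already in time order) by time stamp."""
    return list(heapq.merge(*buffers.values(), key=lambda e: e.timestamp))


def quantize(buffers, tick):
    """Model a coarser trace clock by rounding every time stamp down to a tick."""
    return {
        name: [e._replace(timestamp=e.timestamp // tick) for e in events]
        for name, events in buffers.items()
    }


def ambiguous_pairs(merged):
    """Adjacent events of different threads whose order the time stamps cannot decide."""
    return [
        (a, b)
        for a, b in zip(merged, merged[1:])
        if a.timestamp == b.timestamp and a.thread != b.thread
    ]


# Hypothetical trace contents, time stamps in microseconds.
fine = {
    "A": [TraceEvent(100, "A", "take mutex"), TraceEvent(350, "A", "release mutex")],
    "B": [TraceEvent(120, "B", "wait for mutex"), TraceEvent(360, "B", "got mutex")],
}

fine_merged = merge_traces(fine)
coarse_merged = merge_traces(quantize(fine, tick=1000))  # 1 ms resolution
```

With the fine time stamps the merge reproduces the real interleaving. With the coarse clock all four events carry the same time stamp, so the merge simply lists thread A before thread B and `ambiguous_pairs(coarse_merged)` reports an ordering that the trace cannot decide.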

Even if one succeeds in adding tracing while avoiding the problems mentioned, tracing in user code often still does not provide the desired information. In principle, a thread can be preempted at any point in the code. The preemption may well take place just before the trace statement, making it look as if the thread ran at a later moment. Likewise, the thread can be preempted immediately after the trace statement, leading to the inverse problem. Often this is solved by adding many trace statements, which in turn magnifies the aforementioned problems. The actual difficulty with this type of tracing is that it is added to user code and not to the scheduler itself. Only the scheduler has the correct information as to when a thread is activated or deactivated.
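As a rough illustration of why the scheduler is the right place to record activations, here is a toy model in which the trace is written at the dispatch point and the thread bodies contain no trace statements at all. Everything in it is an assumption for the sake of the sketch: the class `TracingScheduler`, the generator-based `worker` and the simple round-robin policy are invented here and say nothing about how a real RTOS, AVIX included, implements this.

```python
from collections import deque


def worker(name, steps):
    """A thread body without any trace statements."""
    for i in range(steps):
        # Each yield models a point where the thread is switched out.
        yield f"{name} step {i}"


class TracingScheduler:
    """Toy round-robin scheduler that records every activation itself."""

    def __init__(self):
        self.ready = deque()
        self.trace = []  # (clock, thread name, event)
        self.clock = 0

    def spawn(self, name, task):
        self.ready.append((name, task))

    def run(self):
        while self.ready:
            name, task = self.ready.popleft()
            self.trace.append((self.clock, name, "activated"))
            try:
                next(task)  # run the thread until its next switch point
                self.ready.append((name, task))
                event = "deactivated"
            except StopIteration:
                event = "finished"
            self.clock += 1
            self.trace.append((self.clock, name, event))


scheduler = TracingScheduler()
scheduler.spawn("A", worker("A", 2))
scheduler.spawn("B", worker("B", 1))
scheduler.run()
activation_order = [name for _, name, event in scheduler.trace if event == "activated"]
```

Because the record is written where the switch happens, it cannot end up just before or just after a preemption the way a trace statement in user code can.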

Why Time-Outs are used wrong

Most real-time kernels offer a time-out mechanism by letting a time-out value be specified with every function that can wait for a resource. Why do we at AVIX-RT think this is wrong? To start, one should think of different categories of errors. Seen from a certain level of abstraction, there are two types of errors. The first type is related to the domain an embedded system is targeting. These are the things that can go wrong in the context the embedded system operates in. Examples are a communication link breaking down, an oil pressure measured by the system becoming too high, etc. In other words, these are the events that should be handled by the embedded system, since that is the purpose it has been developed for. The second category consists of software errors. These are the errors we, as software people, make and that should be found during development and testing. Think of illegal memory access, divide by zero, etc. Most of the time both categories are simply called errors, but things become clearer if we make explicit what we mean. The first category I will call domain errors, the second software errors.

Second, one can look at a system to determine where its behavior is deterministic. It is generally accepted that a software system should behave in a deterministic fashion. The problem is that on the interface with the outside world, the system as a whole can never rely on everything happening in a deterministic fashion. One cannot tell if and when an oil pressure becomes too high or when a communication link will break down. Nevertheless, the desire is to make the system as deterministic as possible. For this reason a clean approach is to divide the system into an internal part and an external part, the latter forming the interface to the outside world. It is on this interface that time-outs conceptually prove to be useful. For a thread waiting for an oil pressure to become too high, it is acceptable that it wakes up now and then just to do something else, even if this is only to let other parts of the system know it is still alive and the system as a whole is behaving as intended. Once more, this is a conceptual time-out.
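The conceptual time-out on the external interface could look roughly like the sketch below. It uses Python’s standard `queue.Queue.get(timeout=...)` as a stand-in for an RTOS wait primitive; the function `pressure_monitor`, the helper `run_demo`, the reading value and the poll interval are all hypothetical and only serve the illustration.

```python
import queue
import threading
import time


def pressure_monitor(alarms, handled, heartbeats, stop, poll_s=0.01):
    """Wait for alarm readings; wake up on time-out to report liveness."""
    while not stop.is_set():
        try:
            reading = alarms.get(timeout=poll_s)  # wait for the outside world
        except queue.Empty:
            heartbeats.append("alive")  # nothing happened: just report "still alive"
            continue
        handled.append(reading)  # the domain event the thread exists for


def run_demo():
    alarms = queue.Queue()
    handled, heartbeats = [], []
    stop = threading.Event()
    thread = threading.Thread(
        target=pressure_monitor, args=(alarms, handled, heartbeats, stop)
    )
    thread.start()
    time.sleep(0.05)   # outside world is quiet: only heartbeats are produced
    alarms.put(7.5)    # hypothetical "oil pressure too high" reading
    time.sleep(0.05)
    stop.set()
    thread.join()
    return handled, heartbeats
```

Here the expiry of the time-out is not an error of any kind. It is only the moment at which the thread does its other work, which is what makes the time-out acceptable at this boundary.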

As it turns out, this waiting for the outside world is done using RTOS primitives, and it is for this reason that most vendors placed a time-out capability on the RTOS primitives. Through this time-out capability, a thread can be awakened when the desired event does not occur and carry out other tasks. Although this is acceptable on the interface to the outside world, the inside of the system should behave deterministically. When two threads battle over a mutex, one of the threads wins and the other is blocked. In places like this a time-out actually makes no sense. Since the time-out is available on the RTOS primitive, however, programmers have started to use it in these places too, and sometimes they even have a good reason to do so.
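For contrast, the internal case can be sketched as two threads contending for a mutex that is taken without any time-out value. This is a minimal model using Python’s standard `threading.Lock`; the function name `count_with_mutex` and the counts are assumptions chosen for the example.

```python
import threading


def count_with_mutex(increments_per_thread=1000):
    """Two threads update a shared counter under a mutex taken without time-out."""
    lock = threading.Lock()
    total = 0

    def worker():
        nonlocal total
        for _ in range(increments_per_thread):
            with lock:  # blocks until the mutex is free; no time-out value
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Whichever thread loses the battle simply waits until the other releases the mutex, so the result is the same on every run and there is no time-out branch to write or test.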

AVIX-RT © 2006-2015, All Rights Reserved
