What is a Race Condition?
The definition of a race condition used here is the same as the definition of a data race in the Java Language Specification (JLS).
http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.5
“A race condition is a condition in which multiple threads access shared memory in an undetermined order, and at least one access is a “write”, i.e. it modifies the memory content.”
Please note that the term “data race” denotes just a subset of the more general term “race condition”.
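As a minimal sketch of the definition above, the following Java program lets several threads increment a shared plain int with no synchronization, next to an AtomicInteger for comparison. The class CounterRaceDemo and all names in it are hypothetical and chosen only for illustration; they do not come from the JLS or from any tool mentioned in this article.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names are illustrative only.
class CounterRaceDemo {
    static final int THREADS = 4;
    static final int INCREMENTS = 50_000;

    static int plainCounter;                                  // shared, unsynchronized
    static final AtomicInteger atomicCounter = new AtomicInteger();

    // Runs the workers and returns {plain result, atomic result}.
    static int[] run() {
        plainCounter = 0;
        atomicCounter.set(0);
        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < INCREMENTS; i++) {
                    plainCounter++;                  // read, add, write back: updates can be lost
                    atomicCounter.incrementAndGet(); // atomic read-modify-write: no update is lost
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();                            // join() orders the final reads after all writes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { plainCounter, atomicCounter.get() };
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("expected: " + (THREADS * INCREMENTS));
        System.out.println("plain:    " + result[0] + " (may be lower, and may differ between runs)");
        System.out.println("atomic:   " + result[1]);
    }
}
```

The atomic counter always reaches the expected total. The plain counter may fall short by a different amount on each run, which is the intermittent behavior this article is about.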
FAQ About Multithreading Reliability
If you had searched Stackoverflow.com for the term “race condition” in February 2015, you would have found 12,462 matching questions. In October 2016 the number was 14,118. That is an increase of 1,656 questions in 20 months, or roughly 80 additional questions per month, and each question has multiple answers. The questions fall into two categories: "Is this ... a race condition?" and "How to locate a race condition in a specific ... code".
http://stackoverflow.com/search?q=race+condition
Here is an answer that offers a generic view on the subject and provides its reasoning.
In very abstract terms, a race condition is a condition of a race: a condition of intermittently unpredictable results.
Race conditions are one of the most challenging issues in contemporary programming and a primary cause of unstable, intermittent, and unreliable software behavior. They cannot be properly diagnosed by traditional debuggers or by log files (both are discussed further below), and the cognitive, 'between the ears' approach to solving them has been shown to yield over 30% improper fixes, even when the presence of race conditions was noticed.
How Do You Detect Race Conditions?
A solution for detecting all EXPERIENCED race conditions in multithreaded code exists: a proper dynamic analysis tool can decide the problem with a 0% false positive rate. The technology was built by our team at Thinking Software, Inc., and the tool is called Race Catcher™.
Can You Prevent Race Conditions From Occurring?
Reasoning about race conditions has proven to be a difficult task for humans. Using specially built libraries also requires not making cognitive mistakes.
“If debugging is the process of removing bugs, then programming must be the process of putting them in.” (Edsger W. Dijkstra)
We cannot prevent them from occurring. HOWEVER, we can identify them immediately upon their very first manifestation and prevent them from recurring, much as we cannot prevent misspellings or syntax errors from manifesting the first time.
The ability to identify misspellings or software syntax errors statically follows from their static nature: they manifest as soon as you type them. Race conditions have a dynamic nature, and they manifest dynamically.
Having a proper tool that catches and automatically diagnoses them upon their very first occurrence has the same effect on saving one's time and on the reliability of the final result as a built-in checker that catches every spelling error the moment it appears in one's writing.
Are “data race” and “race condition” two different sets of conditions? Is one a subset of the other? Are these the same conditions?
Race Catcher™ addresses the generic issue of uncertainty in the ordering of events, which is possible due to the freedom of a generic JVM thread scheduler.
Separating ‘race condition’ in the context of a software program from ‘data race’ is not done by the book, and it does not address the real issue: eliminating intermittently incorrect results and providing a higher level of software reliability.
The Java Language Specification (JLS) http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.5
formally defines “data race” using 'happens-before relationships' between actions within a process. It in turn defines 'happens-before' via the order of the actions and the visibility of their results to the actions ordered after them.
The disagreement with the proponents of defining “data race” separately comes from the notion of what counts as "simultaneous" or "concurrent" access to shared memory. How simultaneous is "simultaneous"?
The answer obviously does not lie there, since what we are really trying to define is the uncertainty of ordering. Must the “read-modify-write back” sequences from two or more threads occur so simultaneously that one thread reads before the other writes back? Or is it sufficient to say that “simultaneously” means that one event can come before, after, or on top of another in absolute time, such that one thread’s “read-modify-write” events overlap with another thread’s “read-modify-write” events or with another thread’s “read” event?
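The overlap of two “read-modify-write back” sequences can be sketched deterministically. In the hypothetical LostUpdateDemo below (all names are illustrative), a CyclicBarrier forces both threads to finish their read before either one writes back. Each individual read and write is atomic, yet one update is still lost, which suggests that the ordering of events matters rather than how "simultaneous" they are in absolute time.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names are illustrative only.
class LostUpdateDemo {
    // Two threads each try to add 1 to a shared value; returns the final value.
    static int run() {
        AtomicInteger shared = new AtomicInteger(0);
        CyclicBarrier bothHaveRead = new CyclicBarrier(2);

        Runnable addOne = () -> {
            int seen = shared.get();        // read
            try {
                bothHaveRead.await();       // force both reads to complete before either write
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
            shared.set(seen + 1);           // modify and write back, based on a stale read
        };

        Thread a = new Thread(addOne);
        Thread b = new Thread(addOne);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(run());          // prints 1, although two increments were attempted
    }
}
```

In real code nothing forces this interleaving, so it shows up only occasionally. The barrier is there only to make the lost update reproducible.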
While defining the rules for correctly synchronized programs, the JLS uses the term “happens-before”, written hb(x, y), meaning that ‘x’ must happen before ‘y’ and that the result of ‘x’ must be “visible” to ‘y’. The specification does not restrict hb(x, y) to the “read-modify-write back” components of a ‘write’; it speaks in general of any events that must be ordered for the intended algorithm to execute correctly, no matter what reordering a JVM’s thread scheduler may decide to make.
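One common way to establish hb(x, y) in Java is a volatile field: a write to a volatile variable happens-before every subsequent read of that variable. The sketch below assumes a hypothetical HappensBeforeDemo class (names are illustrative). The plain field payload is written before the volatile flag ready is set, so a reader that observes ready == true is guaranteed to see the payload as well.

```java
// Hypothetical demo class; names are illustrative only.
class HappensBeforeDemo {
    static int payload;                 // plain field with no synchronization of its own
    static volatile boolean ready;      // the volatile write/read pair creates the hb edge

    // Returns the payload value observed by the reader thread.
    static int run() {
        payload = 0;
        ready = false;
        int[] observed = new int[1];

        Thread reader = new Thread(() -> {
            while (!ready) {            // volatile read
                Thread.yield();
            }
            observed[0] = payload;      // ordered after the writer's payload = 42
        });
        Thread writer = new Thread(() -> {
            payload = 42;               // x: plain write
            ready = true;               // volatile write, publishes x to the reader
        });

        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed[0];
    }

    public static void main(String[] args) {
        System.out.println(run());      // prints 42
    }
}
```

If ready were a plain boolean, no happens-before edge would connect the two threads, and the reader would be allowed to see a stale payload or to never leave the loop.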
Is race detection an undecidable problem? Is it even possible to find a race using any tool at all?
A properly built dynamic analysis tool will immediately pinpoint and automatically diagnose race conditions (or data races). As mentioned above, a race condition has to manifest itself (it has to happen) to be diagnosed by such a tool; however, the result will be immediate, with 0% false positives.
Is the presence of context switching required for a race condition to occur?
Context switching is not required for a race condition to occur when more than one core is running the process.
Is it possible to “debug” a race condition using a debugger? Should one use logging to “debug” a race?
Debuggers will not help you catch a race, since a debugging environment debugs the debugging environment, not the production one. Under a debugger, the thread scheduler is presented with a completely different set of threads and locks.
Using logging to debug a race, and tracing backwards through the log to understand it, is also simply impractical for any sufficiently complex multithreaded application. In addition, logging to a file introduces extra synchronization, which disappears as soon as logging is disabled.
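The extra synchronization introduced by logging can be sketched as follows. The example assumes a hypothetical logger, modeled here as a synchronized log() method in a LoggingSyncDemo class (names are illustrative, and the assumption is that the real logger takes a lock when it writes). Releasing and re-acquiring that lock creates a happens-before edge between the threads, so the unsynchronized field data becomes visible to the reader as a side effect of logging.

```java
// Hypothetical demo class; log() stands in for a real, lock-taking logger.
class LoggingSyncDemo {
    static int data;                        // plain shared field, deliberately unsynchronized
    static int logCalls;                    // guarded by the class lock taken in log()

    static synchronized void log(String message) {
        logCalls++;                         // a real logger would write the message out here
    }

    // Returns the value of data observed by the reader thread.
    static int run() {
        data = 0;
        int[] observed = new int[1];

        Thread writer = new Thread(() -> {
            data = 42;
            log("wrote data");              // releasing the lock publishes the write
        });
        Thread reader = new Thread(() -> {
            int seen;
            do {
                log("polling");             // acquiring the lock makes published writes visible
                seen = data;
            } while (seen != 42);
            observed[0] = seen;
        });

        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed[0];
    }

    public static void main(String[] args) {
        System.out.println(run());          // prints 42
    }
}
```

With the log() calls removed, nothing orders the reader's reads after the writer's write, and the polling loop is allowed to spin forever. The race is then masked while logging is on and comes back once it is turned off.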
Can we label some race conditions as “benign”?
The question of "Which race condition can be called “benign” and can be ignored?" is best answered here: ‘How to miscompile programs with “benign” data races’
http://hboehm.info/boehm-hotpar11.pdf
The point is that what one may see as “benign” can easily become very harmful as a result of different compiler optimizations.
The best approach to this question is “Just say No to “benign” races” as it is well said in the article ‘Benign data races: what could possibly go wrong’ https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
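A Java analogue of such a “benign”-looking race is a plain boolean stop flag. The sketch below is not taken from the two articles cited above; the class StopFlagDemo and its members are hypothetical. Because nothing orders the worker's reads after the write to a plain flag, a JIT compiler is permitted to read it once and hoist the check out of the loop. The variant that actually runs uses a volatile flag.

```java
// Hypothetical demo class; names are illustrative only.
class StopFlagDemo {
    // "Benign"-looking variant, shown for contrast and never called here.
    static boolean plainStop;

    static void spinOnPlainFlag() {
        while (!plainStop) {
            // busy-wait; the compiler may legally read plainStop only once
        }
    }

    // Correct variant: every loop iteration must re-read the volatile flag.
    static volatile boolean stop;

    // Returns true if the worker observed the flag and terminated.
    static boolean run() {
        stop = false;
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy-wait
            }
        });
        worker.setDaemon(true);             // do not keep the JVM alive if something goes wrong
        worker.start();
        try {
            Thread.sleep(50);
            stop = true;                    // volatile write, visible to the worker
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + run());
    }
}
```

The plain-flag variant often works in short tests, which is why such races get labeled “benign”. Whether it keeps working depends on optimizations that the programmer does not control.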
What technologies are available to address detection of race conditions?
Static analysis tools. The shortcomings:
- a) A widely acknowledged shortcoming of static analysis tools is their high rate of false positives. False positives come from assuming that specific states are possible when in fact they are not, because the reasoning needed to rule those states out would be too complex.
- b) The other shortcoming of static analysis tools is that they miss actual races. Static analysis tools have to deal with an unbounded number of state combinations (the “state explosion” issue): http://babelfish.arc.nasa.gov/trac/jpf/wiki/intro/testing_vs_model_checking
Thus they approach the subject by studying only the subsets they can handle, and as such they miss actual races.
Dynamic analysis tools: the traditional shortcoming is a large overhead that prohibits their use in production. However, not all dynamic analysis tools are created equal.
The tool that we have built, Race Catcher™, after years of work on various optimizations, has an overhead hundreds of times smaller than that of some other dynamic analysis tools, and is actually usable in production.
A good dynamic code analyzer produces 0% false positives. This is because it pinpoints and analyzes only races that have actually manifested.