As I think most people in the computer industry are now aware, we have moved from a period of computers having ever-increasing clock speeds, to a period of computers having ever-increasing core count. Although today's workstations and laptops only have 2- or 4-core processors, the days of having 64, 256, 2048 cores are rapidly approaching. Of course, mainframes and supercomputers have had this level of parallelism for many years, but the crucial difference is that this sort of processor count is now widespread and not the preserve of just a small, elite part of the business.
Concurrency has been with us since the beginning of computing; first as time-sharing, later as multi-processing (aka multi-tasking) and then multi-threading. Time-sharing was introduced to increase utilization and availability, multi-processing to allow applications to have concurrency, and multi-threading because multiple processes were often too heavyweight for the concurrency applications needed. As the mainframe server people and the HPC people know, the tools of the multi-threading trade—locks, monitors, semaphores, etc—create overhead.
In many of the processors available today, and many of those available in the near future, the model is of communication between processors through shared memory. Shared memory is also a big problem: many of the problems associated with shared memory, multi-threaded programming are caused by unanticipated changes to memory shared between threads. Debugging such problems can be problematic because the problems are non-deterministic—i.e. you are unlikely to get the same behaviour for two consecutive executions of a program.
Java was a revolution in programming in two ways, it reintroduced virtual machine based languages into the mainstream of computing, and it made support for multi-threaded programming integral to, and at the heart of, the programming language. C++ and C have always relied on external libraries, most notably PThreads and Boost.Threads. As followers of C++ will know though, the new C++ standard (C++0x) introduces threads as an integral part of the C++ language.
Processor manufacturers and language designers are promoting threads as the tool for managing concurrency. Yet it is well known amongst trainers and people actually writing multi-threaded systems that multi-threading is hard, and is the most problematic topic for most programmers. Multi-threaded programming is complex and difficult; it is the source of many errors and bugs; and it is hard to debug multi-threaded systems. So more cores result in more threads, and hence more trouble.
What have the people who have been doing parallel programming for years been doing to combat the problem? The HPC community using Fortran, C++ and C have focused on OpenMP and MPI. MPI is a standard library for handling message passing between processes on different processors. This is really a way of managing a program spread over a cluster of network-connected processors. OpenMP is a way of using annotations in a Fortran, C++ or C program to direct the compiler how to "automatically" parallelize the code for using multiple threads on a multi-threaded system. So MPI seems less relevant to multicore than OpenMP. Could OpenMP be the answer to the problem of programming threads? Possibly, but using OpenMP brings its own problems. It is however a step forward from programming threads directly.
Where is Java going in this increasingly parallel world? The java.util.concurrent package puts the emphasis on lock-free programming, data structures with more fine-grained control of parallel access, and futures. Futures are a way of programming that avoids explicit locking. A future is a value that is being calculated in a parallel thread but the value of which is not needed just yet: execution occurs in parallel until it cannot continue without the value. So threads are executed to the point where they cannot continue and then they block until the value is available. This is very much a fork-join way of working, and in most cases can be used to create the maximum amount of parallelism.
Are futures the future? Well the C++ standards committee seems to think so. The C++0x standard introduces futures as an integral part of the language. This is especially pleasing for me as I led various research programs during the 1990s, the most important of which created and tested a futures-based way of programming using C++.
Are futures the only future? Definitely not, they are but one tool in the armoury. The important thing in concurrency and parallelism is to minimize sharing and the need for critical sections. This can be taken to the extreme, as was done in occam and later Erlang. In these languages, processes with their own memory are the unit of concurrency and parallelism; there is no shared memory per se. The processes communicate using message passing. In a sense occam and Erlang take the MPI model to its logical conclusion. Moreover, there is a mathematical formalism that allows a much more disciplined approach to system construction, Communicating Sequential Processes (CSP). In a way, it is a shame that threads were invented. In many ways it would have been better to stay with lightweight processes, message passing and no shared memory. Of course, this way of working can be imposed on a shared memory, thread-based system, so there is hope.
Is there another alternative? Indeed there is, it is called transactional memory. In this model, the memory itself provides a transaction-based update scheme, very much based on the idea of an atomic database transaction. Sun especially, but also IBM and ARM, and others, but not Intel, are producing or planning processors that offer hardware support for transactional memory. In the x86 world though the only option down this route is software transactional memory (STM). In this model, the atomic commits are handled entirely in software. Perhaps surprisingly, the Haskell community has been at the forefront of experimenting with this model, though Intel have a specialized compiler providing STM for C++. We believe they are using this to understand whether the transactional memory approach has any mileage, and whether they need to go the route already trodden by IBM, ARM and Sun of providing hardware support for transactional memory in their future processors.
So should Fortran, C++, C, Java et al. invest in a transactional memory model? Well it would be a way forward. Should they instead drop the notion of shared memory and create lightweight processes as the tool for applications programmers following the occam and Erlang approach? Possibly. It would certainly remove a large amount of non-determinacy. Should Java follow Fortran, C++ and C and implement OpenMP? Well it is possible, the annotations system allows such a thing to be done. However it seems unlikely as the Java community has become wedded to multi-threading, and the java.util.concurrent package is not finding the penetration of use that it should
Everyone agrees that threads are as much the problem as they are the solution. Application programmers should not be using threads directly, they should be using high-level abstractions. The question is will they realize this? Will they have the gumption to get trained up in the tools and techniques, especially those of algorithm design, before writing their massively parallel applications. We certainly hope so. Two reasons: we want to offer the training, but also the successful exploitation of multicore devices depends on it.
Posted: 17th May 2008 | By Gil Tene :
Interesting overview, with some very good points. Transactional memory is already alive and kicking, and powering production systems in everything from banks and telcos to retail and web portal applications.
As you may be aware, Azul Systems has been shipping Java focused transactional memory systems for the past couple of years, with machines ranging to 768 cores in a flat SMP configuration, so we can speak to the subject of MANY shared-memory cores and practical use of hardware assisted transactional memory from real experience.
Our systems are used to power JVMs on Solaris, Linux, HPUX and AIX, and can be found running a wide variety of commercial Java workloads in the wild today. With so many cores, you REALLY care about Amdahl's law. We've found that with transactional memory supported from the hardware all the way up to the JVM, we've been able to safely treat Java synchronized block as transactions (which is what they are semantically equivalent to), rather than forced-serial blocks (which is simply one valid way of executing them).
You can find some detailed information in our "Optimistic Thread Concurrency" whitepaper at
CTO, Azul Systems
The messages above were all contributed by IT-Director.com readers. Whilst we take care to remove any posts deemed inappropriate, we can take no responsibility for these comments. If you would like a comment removed please contact our editorial team.
We automatically stop accepting comments 180 days after a post is published. If you would like to know more about this subject, please contact us and we'll try to help.
Published by: Electronicdawn Ltd.
T: +44 (0)190 888 0760 | F: +44 (0)190 888 0761