specSTM: Software Transactional Memory with Thread-Level Speculation Support
Multicore architectures are already the norm for most commodity computing devices. Each year, affordable multicore machines are including more and more cores. While quad-core processors supporting up to eight simultaneous hardware threads are already regarded as commodity hardware, hexa-cores and octo-core or even chips with tens or even hundreds of cores promise to be an affordable reality in the close future.
This trend calls for concurrent programs that expose enough parallelism to maximize the utilization of such increasing computing resources. Yet, concurrent programs are significantly more difficult to code than sequential ones, especially for inexpert programmers. As the number of available cores grows each year, the challenge of coding programs that exploit that much parallelism becomes more and more amplified. Although new emerging paradigms such as Transactional Memory or Thread-Level Speculation promise to simplify concurrent programming, they still exhibit key limitations when the goal is to expose high levels of parallelism.
The traditional approach for a programmer that departs from some original single-threaded program that he wishes to parallelize consists of two main tasks: i) to identify code blocks that can run concurrently and fork them into threads; and ii) to identify and synchronize atomic critical sections in each thread’s code. Both tasks constitute hard challenges for any programmer.
In this project, we advocate that that combining TM and TLS is not only possible as it can boost parallelism that each individual approach would attain at the cost of comparable overheads. The key insight of the specSTM project is that the levels of parallelism that are individually attainable through TM or TLS can be multiplied if both approaches are efficiently combined. specSTM aims to unify TM and TLS in a single software run-time environment, building the first hybrid TLS+TM run-time
environment.
Building from our experience in STM, specSTM project will depart from stand-alone STM algorithms combine them with TLS support. Hence, specSTM will leverage STM with the ability to automatically parallelize each thread forked by the programmer (in the multi-threaded TM program). More precisely, specSTM will divide the code of the currently active transaction at each thread into multiple tasks that will run in parallel. If no conflicts arise among the multiple tasks, then the transaction can commit earlier. Furthermore, specSTM can even be more optimistic and speculative execute future transactions of a thread, even when the current transaction in that thread is still active. If the current transaction commits and the transaction(s) running in out of order tasks spawned by TLS did not perform any read that violates the corresponding thread's program order, then further parallelism is possible.
Building an efficient hybrid run-time poses a number of challenges. Firstly, a naïve combination of TLS and TM can easily result in a space and time overhead explosion due to undesirable collateral effects of each approach on the other. Furthermore, well-studied issues such as task division and spawning in TLS or contention management in TM are designed for stand-alone scenarios, and may lead to pathological situations in specSTM’s combined scenario.
To achieve our goals, we propose to work on the following main topics:
- Unified TLS+STM algorithms, that multiply the levels of parallelism that TM achieves by the parallelism that TLS extracts from each thread, while retaining acceptable execution overheads;
- Efficient support for task nesting, in order to enable advanced parallelization methods;
- Strategies for dividing the multi-threaded program into tasks and for deciding whether or not to spawn each such task;
- Mechanisms for controlling the side-effects of inconsistent speculative reads;
- Support for replication on specSTM, for increased dependability.
To evaluate the results of this work, we plan to use the reference TM benchmarks and reference TLS benchmarks. The first collection will allow us to measure the effective speed-ups that specSTM achieves relatively to a TM-only run-time, whereas the second one will provide us with results relatively to state of the art TLS engines.
Main coordinator: João Pedro Barreto
Partners: LPD/EPFL

