GCC for Transactional Memory


Page for collaborative developments on GCC for Transactional Memory (TM).

The project is part of the compilation platform in HiPEAC. Please provide feedback on the features and the preliminary design. Discussion holds topics of interest for TM-researchers. Further, an email address is found at the end of the design document. Thank you very much.

Current Efforts

This section describes the ongoing efforts as well as achievements from the near past. It is updated regularly and hopes to provoke some comments. Until now, the most effort was spent on literature studies to setup the goals of the project and create a roadmap. Goals and influential literature can be found in the section Design Document below. Currently we are investigating how to do implement checkpointing in GCC and how to instrument with calls to the STM run-time.

Weekly Reports

01.06. - 07.06.
Presentation of the roadmap and goals at the Compilation Platform meeting at the Computing Systems Week in Barcelona. Concerns were expressed about the weak isolation memory model, targeted by our work. Further, the task concept of OpenMP 3.0 will influence the implementation and the interaction of tasks and transactions is yet undefined. A last question addressed the compiler optimisations implemented during the project, which is not our main focus.
In case you are interested in participating in the discussion, please go to the section Discussion or send us an email, if you would like to see a new topic created.
Other points were attending a meeting with participants from the Velox project, RedHat, AMD, BSC and Intel to coordinate the efforts in TM.
First meeting with Andrea Marongiu at INRIA to work on the GCC implementation.

08.06. - 14.06.
Announced the project on the GCC mailing list, hoping to get some feedback from people working on the TUPLE and the GOMP-3.0 branches.
Close inspection of the OpenTM implementation with emphasis on OpenMP-based code. Identifying the differences to the bare TM implementation and learn from their approach. Revisited the interface of the TinySTM including the source code to acquire a deeper understanding of the STM library.

15.06. - 21.06.
Advancing to the Tanger environment to see how code generation for the TinySTM is done. Tanger's checkpointing mechanism copies the active stack frame before entering a transaction. The mechanism is necessary to undo effects of aborted transactions in that particular stack frame. Compared to Intel's rather sophisticated scheme, it seemed a simple and reasonable solution to drop the instrumentation of stack variables. A following investigation in the GCC revealed that stack specific information is not available at the level of Gimple. The idea of writing an RTL pass just to copy the stack is not to tempting. Any ideas on how to solve the problem (preferably at Gimple level) are appreciated.
A first version of the Wiki is setup.

22.06. - 28.06.
Thanks to Torvald, we gained some new insights in the Tanger checkpointing. Unfortunately, in GCC the unavailability of stack information at Gimple level persists and, thus, we opt for a different scheme. Similar to the one proposed by Intel, we want to introduce two basic blocks (one to capture the contents of live-in variables and one to restore the values in case of a rollback). This includes moving to SSA form and hoping to arrange with the optimisations that take place.
The implementation advanced in the following ways: The "lowering" pass, resembling the OpenMP structure was dropped in favour of some additional modifications of the parser. The "expansion" pass remains and inserts calls to the STM runtime. The instrumentation of variables in transactions was implemented, but further refinements are necessary.

29.06. - 05.07.
A first step was to find a way to introduce the call to setjmp. Simply using the GCC builtin did not work, because the compiler assumes, that the BUILT_IN_SETJMP was already processed and converted in three separate calls. Correspondingly GCC triggers an exception. The current implementation works around this issue by building the function call directly. First tests are very promising. Surprisingly, a builtin for sigsetjmp was not found in GCC.
As a next step a few issues with the control flow graph were solved. The code cleanups included finding some bugs related to exceeding the basic block boundary while using a block_stmt_iterator.
The next step was to start working on the implementation details to introduce new basic blocks.
An updated version (0.91) of the design document is now available on the webpage.

06.07. - 11.07.
The checkpointing mechanism on SSA form advances. Currently two basic blocks (one to save the values of variables and one to recover the values of variables) are introduced during the gtm_expand-pass. Depending on the return value of the setjmp-call one or the other basic block is executed before entering the transaction. This allows to return to a consistent state in case of a transaction rollback. The next step will be to introduce the actual checkpointing on the basic blocks depending on the liveness analysis available on SSA form.
A more detailed description of the interaction of passes is found in the updated design document (version 0.92).

12.07. - 20.07.
Attended the HiPEAC Summer School ACACES 2008.

21.07. - 27.07.
Access to the liveness information of the variables on SSA form is now available inside the pass. Copying the contents of the variables introduces some challenges while SSA form has to be preserved. The next efforts will be to solve these issues. Currently adjusting the dominance information of the basic blocks as well as introducing Phi nodes to merge the different versions of variables are the main challenges.

28.07. - 03.08.
Now the dominance information as well as the SSA properties are preserved. The checkpointing mechanism "suffers" from the optimisation passes. The next step is to prevent the kind of optimisations that does not preserve the semantics of the checkpointing pass while still enabling for "useful" optimisations.
Design document (version 0.93) now illustrates the checkpointing pass.

04.08. - 10.08.
Still no perfect solution to the optimisation problems from last week. In addition a problem with the instrumentation occured. The function call of the instrumentation takes the address of a variable as argument. Taking the address of a variable leads to a corrupted program representation (the verify_stmts-check triggers). The problem occurs because in GIMPLE represenation a variable is a register and taking the address changes it from being a variable to a memory location. Subsequently the value must be transferred to a register before it is usable. The amount of additional variables rises and we will investigate how to avoid some unnecessary instrumentation.

11.08. - 17.08.
Moving the expansion pass to SSA failed this week, although it would have been nice to have just one pass, it proved to difficult to maintain all SSA invariants "manually". The main problem was that introducing a void-Pointer on SSA is different from introducing a temporary variable. Subsequently, some specific data structures were not available (the symbol memory tag was missing). Since other passes like alias analysis rely on the correct layout of the information, this pass broke. Consequently, the expansion is done on GIMPLE-level. The ongoing efforts to compensate for taking the address of a variable still persist.

18.08. - 24.08.
First small programs synchronised with transactions are working correctly. The test cases are using Pthreads or OpenMP to parallelise the code. For a working integration with OpenMP the modifications to the parser had to be revised. The implementation is still experimental and not robust. Further, the checkpointing mechanism seems to work fine, just looking at the assembly. At run time it fails leading to unexpected results. This behaviour bares further investigation and will hopefully be solved during the next week.

25.08. - 31.08.
Finally the checkpointing mechanism works. After reorganising the basic blocks and their order, the implementation works. Now the run-time behaviour of the checkpointing mechanism is as expected. In addition some checks to avoid leaving the scope of a transaction without committing were implemented. This means that return, break, continue and goto should be handled correctly in case they leave the transaction. This week we managed to move from the 4.3.0 branch to the 4.3.2 and started a pre-release of the GnuTM implementation. At the bottom of this page you may download a patch against GCC version 4.3.2 , a document containing information how to install and use the features and some test cases. Please feel free to experiment and get back to us. We are very interested in all arising problems and look forward to your feedback. Thank you very much.

01.09. - 07.09.
The development of GnuTM continued this week with the following features. First, support for the keyword "tm_abort" was added and mapped to the corresponding STM function. A warning is issued in case tm_abort is called without a transactional context, causing no side effects. Second, TinySTM version 0.9.5 is now supported. Some optimisations concerning the placement of function calls to the STM run-time are investigated. So far no general solution is found but we are talking to the STM developers. Then, a bug was fixed that was caused because GCC collapsed two basic blocks in between the two GTM-passes. Subsequently the information from the first pass was no longer valid in the second. We are still hoping to get some feedback and experience reports from our release and would be happy to help if you are facing problems.

08.09. - 14.09.
Our current efforts are to implement function cloning to allow for transactional versions of functions. The idea is to modify the existing GCC code base as little as possible and reuse parts of the existing GTM implementation to decide which parts (variables, memory locations, ...) of the function to instrument. Currently the biggest issue is to avoid the sharing of tree nodes between the original and the cloned function. As soon as this issue is fixed, the transactional version is instrumented with calls to the STM run-time. Calls inside transactions to a function where a transactional clone is available will be redirected to it.

15.09. - 21.09.
Now parts of the function cloning work. Currently the function cloning is done eagerly. Subsequently all functions seen at compile time are cloned. This assures that every function that is compiled at the same time as the main function has a transactional clone. Since not every function needs a transactional version, the overhead is rather high. Here is some potential for optimisations: on demand function cloning. For TinySTM version 0.9.5 the function cloning and redirection of the calls works fine. With the older version the compilation works but at run-time a segmentation fault occurs due to passing the transaction handle as a parameter.

22.09. - 28.09.
Now passing the transaction handle as a parameter to the cloned function works fine. Since this was the main issue with STM interfaces that require explicit passing of transaction handles, the transactional clones are now usable. Further, issues with the GCC IPA passes are identified. The temporary solution is to prevent the inlining of functions to prevent the execution of uninstrumented code. Since this solution does not perform very well, we start to investigate how to integrate TM into the IPA frame work.

29.09. - 05.10.
Trying to compile some transactified SPEC loop kernels (parallelised with OpenMP), revealed some more problems with the function cloning. The current solution is to disable the function cloning for TM if OpenMP is used. Further the integration of the TANGER interface advances but has not been tested yet. This as well as the bug fixes and IPA integration are on the list for future developments.

06.10. - 12.10.
The transactified loop kernels also revealed some serious problem with the OpenMP front-end integration, leading to some uninstrumented pointers inside transactions. After fixing this issue another instrumentation problem became obvious: the optimisation on SSA form in GCC did not remove any stm read barriers. On one hand this is the intended behaviour of a barrier to prevent code motion and resist optimisations. On the other hand the optimisation passes even optimised the temporary that the return value of the stm load was asigned to and, thus, left an orphaned read barrier in the code. By changing the attributes assigned to the barrier, we were able to achieve the desired results.

13.10. - 19.10.
This week was dedicated to investigating further possibilities to exploit the GCC attribute mechanism to enable more aggressive optimisations on barriers while still preserving the transactional properties. The findings so far are that the attributes found last week are optimal in a sense that they allow the optimisers to remove orphaned barriers and still yield correct results. Further tests with marking the write barriers as being pure as well, lead to optimisations that were not respecting the barrier properties anymore and, thus, lead to incorrect code. The next step will be to introduce the barriers at a later stage in the optimisation process so that the number of introduced barriers is already minimal.
Since last week and thanks to Red Hat our patch found its way into a seperate branch. Please checkout:
svn://gcc.gnu.org/svn/gcc/branches/transactional-memory/
Since our patch was for version 4.3.2 and this branch tracks mainline gcc, there is still some transformation work to be done so don't expect it to work right away.

Memory Consistency Model

Memory consistency has a central position in assuring the programmer that his program executes as expected. Currently, two views which memory model transactions should target are expressed. On one hand there is the approach targeting an enhanced weak-isolation memory model. Transactions are expected to behave like locks and the STM enforces an ordering of transactions [here]. On the other hand a strong isolation memory model is proposed. In addition STM mechanisms to change the commit order are favoured [here].
Although strong isolation grants composability, we do not consider it to be practical in an unmanaged environment.
Further, the upcoming C++ standard will not consider strong isolation:
From the Abstract: "We give no semantics to programs with data races. There are no benign data races in C++."
We share this view of treating racy programs and want transactions to behave accordingly in an unmanaged environment.

Design Document

Please find the design document below this paragraph. The document describes the road map of the project as well as the design goals. Originally the design document was intended for internal use only. Due to popular demand we decided to published it here. In case you find errors or have questions, please don't hesitate to ask. Please provide feedback on the features and the preliminary design. An email address is listed at the end of the document. Thank you very much.

AttachmentSize
TM_for_GCC_design_0_94.pdf101.67 KB
Install_and_Usage_of_GnuTM.pdf61.14 KB
TM_test_cases_PLEASE_Remove_Fake_Ending.tar_.txt20 KB
GnuTM_patch_0.94.txt74.74 KB