Boosting Performance of Transactional Memory through Transactional Read Tracking and Set Associative Locks
Abstract
Multi-core processors have become so prevalent in server, desktop, and even embedded systems that they are considered the norm for modern computing systems. The trend is likely toward many-core processors with far more than 2, 4, or 8 cores per CPU. To benefit from the increasing number of cores per chip, application developers have to write parallel programs [1]. Traditional lock-based programming is too difficult and error-prone for most programmers and remains the domain of experts: deadlocks, data races, and other synchronization bugs are among its challenges. To make parallel programming mainstream, it must be adopted by the majority of programmers, not just experts; simplifying parallel programming has therefore become an important challenge.
Transactional memory (TM) is a promising programming model for managing concurrent accesses to shared memory locations. TM allows a programmer to mark a section of code as "transactional", and the underlying system guarantees that the section executes atomically. This simplifies parallel programming and reduces the likelihood of synchronization bugs.
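To make the programming model concrete, here is a minimal sketch of a word-based software TM in Python. It is not the thesis's baseline system, whose design this abstract does not describe. It assumes per-variable version numbers, buffered writes, and a single global commit lock. The names `TVar`, `Transaction`, and `atomic` are hypothetical. The programmer only writes the body of `transfer`, and the system re-executes that body until it commits atomically.

```python
import threading


class Conflict(Exception):
    """Raised when a transaction observes a variable changing under it."""


class TVar:
    """Hypothetical transactional variable. (value, version) is stored as one
    tuple so a reader never pairs a value with the wrong version."""

    def __init__(self, value):
        self.state = (value, 0)


_commit_lock = threading.Lock()  # serializes commits only, not transaction bodies


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed on first read
        self.writes = {}  # TVar -> buffered new value (invisible until commit)

    def read(self, tv):
        if tv in self.writes:  # read-your-own-writes
            return self.writes[tv]
        value, version = tv.state
        if self.reads.setdefault(tv, version) != version:
            raise Conflict  # tv changed since this transaction first read it
        return value

    def write(self, tv, value):
        self.writes[tv] = value

    def commit(self):
        with _commit_lock:
            # Validate: every variable read must still be at the version seen.
            if any(tv.state[1] != v for tv, v in self.reads.items()):
                return False
            for tv, value in self.writes.items():
                tv.state = (value, tv.state[1] + 1)
            return True


def atomic(body):
    """Run body(tx) atomically, re-executing it until it commits."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
        except Conflict:
            continue
        if tx.commit():
            return result


# Usage: 4 threads each perform 100 concurrent transfers between two accounts.
src, dst = TVar(1000), TVar(0)


def transfer(tx):
    tx.write(src, tx.read(src) - 1)
    tx.write(dst, tx.read(dst) + 1)


def worker():
    for _ in range(100):
        atomic(transfer)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(src.state[0], dst.state[0])  # 600 400
```

None of the 400 transfers is lost, even though `transfer` itself contains no locking code. Conflicting transactions are detected at commit time and retried.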
This thesis develops several software- and hardware-based techniques to improve the performance of existing transactional memory systems. The first technique is transactional read tracking (TRT), a software-based approach that uses a locking mechanism for both transactional read and write operations. The performance of TRT depends on the memory access patterns of applications, and in some cases TRT falls behind the baseline scheme. To further improve its performance, we introduce two hybrid methods that dynamically switch between TRT and the baseline scheme based on application behavior.
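The following sketch shows one plausible reading of "a locking mechanism for transactional read operations". Each transactional read takes a visible read lock in a lock table, so a conflicting writer is detected at encounter time rather than at commit. The classes `LockEntry`, `LockTable`, and `TRTTransaction` are hypothetical illustrations, not the thesis's implementation. The sketch is also single-threaded for clarity; a real implementation would update lock entries with atomic instructions.

```python
class Conflict(Exception):
    """Raised when a lock request conflicts with another transaction."""


class LockEntry:
    def __init__(self):
        self.readers = set()  # ids of transactions holding a read lock
        self.writer = None    # id of the transaction holding the write lock


class LockTable:
    def __init__(self, size):
        self.entries = [LockEntry() for _ in range(size)]

    def entry(self, addr):
        # Direct-mapped: each address maps to exactly one entry.
        return self.entries[addr % len(self.entries)]


class TRTTransaction:
    def __init__(self, tid, table):
        self.tid = tid
        self.table = table
        self.held = []  # entries locked by this transaction

    def read(self, addr):
        e = self.table.entry(addr)
        if e.writer not in (None, self.tid):
            raise Conflict(f"read of {addr:#x} blocked by writer tx{e.writer}")
        e.readers.add(self.tid)  # the read is now visible to other transactions
        self.held.append(e)

    def write(self, addr):
        e = self.table.entry(addr)
        if e.writer not in (None, self.tid) or e.readers - {self.tid}:
            raise Conflict(f"write of {addr:#x} conflicts with another transaction")
        e.writer = self.tid
        self.held.append(e)

    def release(self):
        # Called on commit or abort: drop every lock this transaction holds.
        for e in self.held:
            e.readers.discard(self.tid)
            if e.writer == self.tid:
                e.writer = None
        self.held.clear()


table = LockTable(16)
t1, t2 = TRTTransaction(1, table), TRTTransaction(2, table)
t1.read(0x40)          # t1 takes a read lock on 0x40
try:
    t2.write(0x40)     # t2 sees t1's read lock immediately
    outcome = "no conflict"
except Conflict:
    outcome = "conflict detected"
t1.release()           # t1 commits and releases its locks
t2.write(0x40)         # the write lock is now granted
print(outcome, table.entry(0x40).writer)  # conflict detected 2
```

Visible read locks make conflicts cheap to detect, but every read now writes to shared lock state. This is one reason why a scheme like this can fall behind a baseline on read-heavy access patterns, which motivates the hybrid switching described above.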
The second optimization technique is the set associative lock (SAL). To synchronize accesses to shared memory locations, memory locations are mapped to entries in a lock table. Direct-mapped lock tables often suffer from collisions, in which distinct locations share the same lock entry, and these collisions lead to false aborts. SAL increases the associativity of the lock table to reduce false aborts. While SAL improves performance in most applications, in some cases it increases execution time because of the overhead of managing a set associative lock table in software. To address this problem, we propose hardware SAL (HW-SAL), which moves the set associative lock table into hardware so that the benefits of set associativity are obtained without this software overhead.
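The next sketch illustrates why associativity reduces false aborts. It assumes modulo set indexing and tags each lock entry with the full address. The class `SetAssocLockTable` and its methods are hypothetical. Both tables hold 256 entries in total: one is direct-mapped with 256 sets of 1 way, and the other has 128 sets of 2 ways. Two transactions lock two different addresses that land in the same set.

```python
class SetAssocLockTable:
    """Hypothetical lock table with num_sets sets of `ways` entries each.
    ways=1 behaves like a direct-mapped table."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]  # entries: (tag=address, owner tid)

    def acquire(self, addr, tid):
        """Try to lock addr for tid; False means the transaction must abort."""
        s = self.sets[addr % self.num_sets]
        for tag, owner in s:  # software searches the ways one by one
            if tag == addr:
                return owner == tid  # another owner is a true conflict
        if len(s) < self.ways:
            s.append((addr, tid))  # a free way: lock this location
            return True
        return False  # set full of other addresses: a false conflict

    def release_all(self, tid):
        for i, s in enumerate(self.sets):
            self.sets[i] = [e for e in s if e[1] != tid]


A, B = 0x100, 0x900  # distinct addresses that map to set 0 in both tables
direct = SetAssocLockTable(num_sets=256, ways=1)  # 256 entries, direct-mapped
sal = SetAssocLockTable(num_sets=128, ways=2)     # 256 entries, 2-way

results = {}
for name, table in (("direct", direct), ("2-way", sal)):
    results[name] = (table.acquire(A, tid=1), table.acquire(B, tid=2))
print(results)  # {'direct': (True, False), '2-way': (True, True)}
```

In the direct-mapped table, transaction 2 must abort even though it never touches address `A`. The 2-way table gives each address its own way. The loop over ways in `acquire` is the kind of per-access cost that a software SAL pays. A hardware table could presumably compare all ways in parallel, as set-associative caches do, which is the motivation given for HW-SAL.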