Multiprocessor Technology for Transaction-Oriented Systems

A project at Computer Engineering, Chalmers: High-Performance Computer Architecture Group


Contents:

o Project description
o Project members
o Publications
o Collaborations
o Funding
o Open positions

Project description

Shared-memory multiprocessors are an enabling technology for high-performance computing because of their general-purpose nature, their simple programming interface, and their use of commodity components, which makes them cost-effective.

Our focus in this project is on transaction-oriented applications, exemplified by on-line transaction processing (OLTP), decision-support systems (DSS), and telecom applications. These applications play an increasingly important role and pose a challenge for multiprocessor architects, especially in the design of the memory hierarchy. Apart from studying new approaches to designing memory hierarchies, we also study design methods for the software systems (application and OS) that aim to take advantage of data access locality.

A key methodological approach in our research is to develop detailed simulation models of entire systems, with all interacting hardware and software components, so as to study how design alternatives affect application performance. Because the execution of parallel applications can be accurately modelled, performance bottlenecks can be revealed and new design methods can be developed. For this purpose, the group has developed, together with the Swedish Institute of Computer Science, a simulation platform for commercial applications that is based on the SimICS simulation technology. The platform can execute operating systems for the sun4m kernel architecture and currently boots unmodified Linux 2.0.30 as well as Solaris 2.6. In addition, any application compiled for those operating systems on the sun4m architecture executes on the simulator. For multiprocessor architecture evaluations, we have added a detailed memory system simulator to the platform. It allows us to model the impact that the system organization and latencies have on execution time, for any multiprocessor organization and any application that executes on the platform. One example is the PostgreSQL database management system from Berkeley, running queries from the TPC-D decision-support benchmark suite on the Linux 2.0.30 operating system (all binaries are the same as for a SPARCstation 4 workstation).
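As a rough illustration of the kind of bookkeeping a memory system simulator performs, the sketch below replays an address trace through a single direct-mapped cache and adds up access latencies. It is not the project's simulator: the cache geometry, the latencies, and the names `simulate_trace`, `HIT_CYCLES`, and `MISS_CYCLES` are all assumptions chosen for illustration.

```python
# Hypothetical parameters, not taken from the project.
BLOCK_BYTES = 32    # cache block size in bytes
NUM_SETS = 4        # direct-mapped cache: one block per set
HIT_CYCLES = 1      # assumed latency of a cache hit
MISS_CYCLES = 100   # assumed latency of a miss served by memory


def simulate_trace(addresses):
    """Return (hits, misses, total_cycles) for a direct-mapped cache."""
    tags = [None] * NUM_SETS          # tag currently stored in each set
    hits = misses = 0
    for addr in addresses:
        block = addr // BLOCK_BYTES   # block number of this address
        index = block % NUM_SETS      # set the block maps to
        tag = block // NUM_SETS       # identifies the block within the set
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag         # replace whatever was there
    cycles = hits * HIT_CYCLES + misses * MISS_CYCLES
    return hits, misses, cycles


# A sequential walk reuses each block; a strided walk conflicts in one set.
sequential = [i * 4 for i in range(32)]
scattered = [i * 4096 for i in range(32)]

print(simulate_trace(sequential))
print(simulate_trace(scattered))
```

The two traces issue the same number of accesses, yet the strided one misses on every access. Differences of this kind in memory behavior are what a detailed model lets one attribute to a given organization and set of latencies.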

Other concrete results obtained so far include a number of solutions for achieving higher memory system performance, especially for database applications. In OLTP workloads in particular, database tables are often traversed through indexes that are implemented as hash structures, which results in poor locality. We have considered how to improve the performance of pointer-chasing by using prefetching, and have proposed and evaluated a particularly promising prefetch technique that has also shown promise on other important pointer-intensive workloads (e.g., expert systems). Another notable result is the identification of a particular performance problem in coherence mechanisms caused by load-store sequences from the same processor. Previously proposed methods to tackle such sequences have not been successful for OLTP workloads. We have proposed an improved cache coherence protocol that detects such sequences and eliminates their coherence overhead.
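To make the load-store overhead concrete, the sketch below counts global coherence transactions for a single memory block under a simplified MSI-style invalidation protocol. It is an illustrative model only, not the protocol proposed in the publications: the function `count_transactions` and the switch `exclusive_on_load` are hypothetical, and the switch simply assumes that every load miss is known to start a load-store sequence.

```python
def count_transactions(accesses, num_procs, exclusive_on_load=False):
    """Count global coherence transactions for one memory block.

    accesses: list of (processor_id, "load" or "store") pairs.
    exclusive_on_load: hypothetical switch. If True, a load miss also
    acquires ownership, so a following store by the same processor
    completes without a further transaction.
    """
    state = ["I"] * num_procs          # per-processor state: I, S, or M
    transactions = 0
    for proc, op in accesses:
        if op == "load":
            if state[proc] == "I":
                transactions += 1      # read miss goes to the memory system
                if exclusive_on_load:
                    # Take ownership at once: all other copies are invalidated.
                    state = ["I"] * num_procs
                    state[proc] = "M"
                else:
                    # A previous owner is downgraded to shared.
                    state = ["S" if s == "M" else s for s in state]
                    state[proc] = "S"
        else:
            if state[proc] != "M":
                transactions += 1      # ownership request (or write miss)
                state = ["I"] * num_procs
                state[proc] = "M"
    return transactions


# Four processors take turns reading and then updating the same record.
pattern = [(p, op) for p in range(4) for op in ("load", "store")]
baseline = count_transactions(pattern, 4)
optimized = count_transactions(pattern, 4, exclusive_on_load=True)
print(baseline, optimized)
```

In this toy pattern each load-store pair costs two transactions in the baseline and one when the load already acquires ownership. Acquiring ownership on every load would penalize data that is only read by several processors, which is why a practical scheme has to detect load-store sequences rather than assume them.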

The work done in this project is supported by NUTEK, TFR, SSF, and Sun Microsystems. Besides a collaboration with Sun Microsystems, we also collaborate with Ericsson Research in a project that studies how to use multiprocessor technology for telecom applications. This collaboration is funded by NUTEK.


Project members


Publications

Jim Nilsson and Fredrik Dahlgren, ``Reducing Ownership Overhead for Load-Store Sequences in Cache-Coherent Multiprocessors,'' To appear in Proceedings of the 2000 International Parallel and Distributed Computing Symposium, May 2000. (PDF).

Martin Kämpe and Fredrik Dahlgren, ``Exploration of the Spatial Locality on Emerging Applications and the Consequences for Cache Performance,'' To appear in Proceedings of the 2000 International Parallel and Distributed Computing Symposium, May 2000. (PDF).

Magnus Karlsson, Fredrik Dahlgren and Per Stenström: ``A Prefetching Technique for Irregular Accesses to Linked Data Structures,'' To appear in the Proceedings of the 6th International Conference on High Performance Computer Architecture (HPCA'6), January 2000. (PS).

Jim Nilsson and Fredrik Dahlgren, ``Improving Performance of Load-Store Sequences for Transaction Processing Workloads on Multiprocessors,'' in Proceedings of the 1999 International Conference on Parallel Processing, September 1999. (PDF).

Ashley Saulsbury, Su-Jaen Huang, and Fredrik Dahlgren, ``Efficient Management of DRAM Memory Hierarchies in Embedded DRAM Systems,'' in Proc. of the ACM International Conference on Supercomputing, June 1999.

Peter Magnusson, Fredrik Dahlgren, Håkan Grahn, Magnus Karlsson, Fredrik Larsson, Fredrik Lundholm, Andreas Moestedt, Jim Nilsson, and Per Stenström: ``SimICS/sun4m: A Virtual Workstation,'' in Proceedings of the 1998 USENIX Annual Technical Conference, June 1998.

Jim Nilsson, Fredrik Dahlgren, Magnus Karlsson, Peter Magnusson, and Per Stenström: ``Computer System Evaluation with Commercial Workloads,'' in Proceedings of the 1998 IASTED Conference on Modelling and Simulation, May 1998.

Per Stenström, E. Hagersten, D. Lilja, M. Martonosi, and M. Venugopal: ``Trends in Shared-Memory Multiprocessing,'' in IEEE Computer, Vol. 30, No. 12, pp. 44-50, December 1997.


Collaborations

The project has active collaborations with groups at Sun Microsystems, Ericsson Research, and the Swedish Institute of Computer Science.

Funding

The project is funded by NUTEK, TFR, SSF, and Sun Microsystems.