-
P. Stenström and L. Philipson: "A layered emulator for design evaluation of MIMD multiprocessors with shared memory," in Proc. of PARLE (Parallel Architectures and Languages Europe), Lecture Notes in Computer Science, No 258, Springer-Verlag, pp. 329-344, June 1987.(pdf)
-
P. Stenström: "VLSI Support for a Cactus Stack Oriented Memory Organization," in Proc. of the 21st Hawaii International Conference on System Sciences, pp. 211-220, January 1988.(pdf)
-
P. Stenström, D. Vrsalovic, and Z. Segall: "Shared Data Structures in a Distributed System -- Performance Evaluation and Practical Considerations," in Proc. of IFIP TC 7/WG 7.3, International Seminar on Performance of Distributed and Parallel Systems, pp. 15-30, December 1988.(pdf)
-
P. Stenström: "A Cache Consistency Protocol for Multiprocessors with Multistage Networks," in Proc. of 16th Annual International Symposium on Computer Architecture, pp. 407-415, May 1989.(pdf)
-
E. Belitskaja, V. Sidorenko, and P. Stenström: "Testing of Memory with Defects of Fixed Configuration," in Second International Workshop on Algebraic and Combinatorial Coding Theory, pp. 24-27, Leningrad, September 1990. (pdf)
-
P. Stenström, F. Dahlgren, and L. Lundberg: "A Lockup-free Multiprocessor Cache Design," in Proc. of International Conference on Parallel Processing, Vol. 1, pp. 246-250, August 1991.(pdf)
-
F. Dahlgren and P. Stenström: "On Reconfigurable On-chip Data Caches," in Proc. of 24th ACM/IEEE International Symposium on Microarchitecture, pp. 189-198, November 1991.(pdf)
-
F. Dahlgren and P. Stenström: "Reducing Write Latencies for Shared Data in a Multiprocessor with a Multistage Network," in Proc. of 25th Hawaii International Conference on System Sciences, pp. 449-456, January 1992.(pdf)
-
P. Stenström: "A Latency-Hiding Scheme for Multiprocessors with Buffered Multistage Networks," in Proc. of International Parallel Processing Symposium, pp. 39-42, March 1992.(pdf)
-
P. Stenström, T. Joe, and A. Gupta: "Comparative Performance Evaluation of Cache-Coherent NUMA and COMA Architectures," in Proc. of 19th Annual International Symposium on Computer Architecture, pp. 80-91, May 1992.(pdf)
-
H. Nilsson and P. Stenström: "The Scalable Tree Protocol -- A Cache Coherence Approach for Large-Scale Multiprocessors," in Proc. of Fourth IEEE Symposium on Parallel and Distributed Processing, pp. 498-507, December 1992.(pdf)
-
M. Brorsson and P. Stenström: "Visualising Sharing Behaviour and its Relation to Shared Memory Management," in Proc. of 1992 International Conference on Parallel and Distributed Systems, pp. 528-536, December 1992.(pdf)
-
H. Nilsson and P. Stenström: "Performance Evaluation of Link-Based Cache Coherence Schemes," in Proc. of 26th Hawaii International Conference on System Sciences, pp. 486-495, January 1993.(pdf)
-
P. Stenström, H. Nilsson, and J. Skeppstedt: "Using Graphics and Animation to Visualize Instruction Pipelining and its Hazards," in Proc. of ICSEE'93, pp. 130-135, January 1993.(pdf)
-
M. Brorsson, F. Dahlgren, H. Nilsson, and P. Stenström: "The CacheMire Test Bench--A Flexible and Effective Approach for Simulation of Multiprocessors," in Proc. of 26th IEEE Annual Simulation Symposium, pp. 41-49, March 1993.(pdf)
-
P. Stenström, M. Brorsson, and L. Sandberg: "An Adaptive Cache Coherence Protocol Optimized for Migratory Sharing," in Proc. of 20th ACM/IEEE Annual International Symposium on Computer Architecture, pp. 109-118, May 1993.(pdf)
-
M. Dubois, J. Skeppstedt, L. Ricciulli, K. Ramamurthy, and P. Stenström: "The Detection and Elimination of Useless Misses in Multiprocessors," in Proc. of 20th ACM/IEEE Annual International Symposium on Computer Architecture, pp. 88-97, May 1993. (pdf)
-
F. Dahlgren, M. Dubois, and P. Stenström: "Fixed and Adaptive Sequential Prefetching for Shared-Memory Multiprocessors," in Proc. of 1993 International Conference on Parallel Processing, pp. 56-63, August 1993.(pdf)
-
F. Dahlgren, M. Dubois, and P. Stenström: "Combined Performance Gains of Simple Cache Protocol Extensions," in Proc. of 21st ACM/IEEE Annual International Symposium on Computer Architecture, pp. 187-197, April 1994. (pdf)
-
H. Nilsson and P. Stenström: "An Adaptive Update-Based Cache Coherence Protocol for Reduction of Miss Rate and Traffic," in Proc. of PARLE (Parallel Architectures and Languages Europe), pp. 363-374, June 1994. Best Paper Award at the conference.(pdf)
-
F. Pong, P. Stenström, and M. Dubois: "An Integrated Methodology for Verification of Correctness of Cache Coherence Protocols," in Proc. of 1994 International Conference on Parallel Processing, pp. 158-165, August 1994.(pdf)
-
F. Dahlgren and P. Stenström: "Reducing the Write Traffic for a Hybrid Cache Protocol," in Proc. of 1994 International Conference on Parallel Processing, pp. 166-173, August 1994. (pdf)
-
J. Skeppstedt and P. Stenström: "Simple Compiler Algorithms to Reduce Ownership Overhead in Cache Coherence Protocols," in Proc. of 6th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pp. 286-296, October 1994.(pdf)
-
M. Brorsson and P. Stenström: "Modelling Accesses to Stationary Data in Shared Memory Multiprocessors," in Proc. of the 7th International Conference on Parallel and Distributed Computing (PDCS'94), pp. 802-807, October 1994. (pdf)
-
M. Brorsson and P. Stenström: "Modelling Accesses to Migratory and Producer-Consumer Characterised Data in a Shared Memory Multiprocessor," in Proc. of Sixth IEEE Symposium on Parallel and Distributed Processing, pp. 612-619, October 1994.(pdf)
-
M. Björkman, F. Dahlgren, and P. Stenström: "Using Hints to Reduce Read Miss Penalties for Flat COMA Protocols," in Proc. of 28th Hawaii International Conference on System Sciences, pp. 242-251, January 1995.(pdf)
-
F. Dahlgren and P. Stenström: "Effectiveness of Stride and Sequential Hardware-based Prefetching in Shared-Memory Multiprocessors," in Proc. of First International Conference on High Performance Computer Architecture (HPCA-1), pp. 68-77, January 1995.(pdf)
-
H. Grahn and P. Stenström: "Efficient Strategies for Software-Only Directory Protocols in Shared-Memory Multiprocessors," in Proc. of 22nd ACM/IEEE Annual International Symposium on Computer Architecture, pp. 38-47, June 1995.(pdf)
-
J. Skeppstedt and P. Stenström: "A Compiler Algorithm that Reduces Read Latency in Ownership-Based Cache Coherence Protocols," in Proc. of Parallel Architectures and Compilation Techniques, pp. 69-78, July 1995.(pdf)
-
F. Dahlgren, J. Skeppstedt, and P. Stenström: "Effectiveness of Hardware-Based and Compiler-Controlled Snooping Protocol Extensions," in Proc. of the International Conference on High Performance Computing, pages 87-92, December 1995.(pdf)
-
M. Karlsson and P. Stenström: "Performance Evaluation of a Cluster-Based Multiprocessor Built from ATM-Switches and Bus-Based Multiprocessor Servers," in Proc. of Second International Conference on High Performance Computer Architecture, pages 4-13, Jan. 1996. (pdf)
-
H. Grahn and P. Stenström: "Relative Performance of Software-Only and Hardware-Only Directory Protocols Under Latency Tolerating and Reducing Techniques," in Proceedings of the 11th International Parallel Processing Symposium, pages 500-506, April 1997.(pdf)
-
P. Stenström and J. Skeppstedt: "A Performance Tuning Approach for Shared-Memory Multiprocessors," in Proceedings of EUROPAR'97, pp. 72-84, August 1997.(pdf)
-
J. Nilsson, F. Dahlgren, M. Karlsson, P. Magnusson, P. Stenström: "Computer System Evaluation with Commercial Workloads" in Proc. of IASTED Conference on Modeling and Simulation. pp. 293-297, May 1998.(pdf)
-
P. Magnusson, F. Dahlgren, H. Grahn, M. Karlsson, F. Larsson, A. Moestedt, J. Nilsson, P. Stenström, and B. Werner: "SimICS/Sun4m: A Virtual Workstation," in Proc. of USENIX98, pp. 119-130, June 1998.(pdf)
-
T. Lundqvist and P. Stenström: "Integrating Path and Timing Analysis using Instruction-Level Simulation Techniques," in Proc. of ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems. June 1998.(pdf)
-
T. Lundqvist and P. Stenström: "Timing Anomalies in Dynamically Scheduled Processors," in Proc. of 1999 IEEE Real-Time Systems Symposium (RTSS'99), pp. 12-21, Dec. 1999.(pdf)
-
T. Lundqvist and P. Stenström: "A Method to Improve the Estimated Worst-Case Performance of Data Caching," in Proc. of 6th International Conference on Real-Time Computing Systems and Applications (RTCSA'99), pp. 255-262, Dec. 1999.(pdf)
-
M. Karlsson, F. Dahlgren, and P. Stenström: "A Prefetching Technique for Irregular Accesses to Linked Data Structures," 6th IEEE Int. Symp. on High-Performance Computer Architecture (HPCA-6), pp. 206-217, 2000.(pdf)
-
M. Karlsson, F. Dahlgren, and P. Stenström: "An Analytical Model for Working-Set Sizes in Decision Support Systems," in Proc. of ACM SIGMETRICS, pp. 275-285, 2000.(pdf)
-
A. Saulsbury, F. Dahlgren, and P. Stenström: "Recency-Based TLB Preloading" in 27th ACM/IEEE Int. Symp. on Computer Architecture (ISCA-27), pp. 117-127, 2000.(pdf)
-
J. Jalminger and P. Stenström: "Boosting Energy-Efficiency of Off-Chip Caches using Selective Data Prefetching," in Proc. of IEEE Workshop on Complexity-Effective Computer Design, held in conjunction with ISCA-2000, June 2000.(pdf)
-
P. Rundberg and P. Stenström: "Low-Cost Thread-Level Data Dependence Speculation on Multiprocessors," in 4th Workshop on IEEE Multi-Threaded Execution, Architecture and Compilation (in conj. with Micro-33), Dec 2000. (Received the Best Paper award.)(pdf)
-
U. Assarsson and P. Stenström: Evaluation of Load-Distribution Strategies for Hierarchical View Frustum Culling and Collision Detection. In EuroPar 2001, pages 663-673, Aug 2001.(pdf)
-
F. Warg and P. Stenström: Limits on Speculative Module-Level Parallelism in Imperative and Object-Oriented Programs on CMP Platforms. In Proc. of Int. Conf. on Parallel Architectures and Compilation Techniques (PACT'2001), pages 221-230, Sept. 2001.(pdf)
-
M. Kämpe, P. Stenström, M. Dubois: The FAB Predictor: Using Fourier Analysis to Predict the Outcome of a Conditional Branch. In Proc. of 8th IEEE Int. Symp. on High-Performance Computer Architecture (HPCA-8), February 2002.(pdf)
-
J. Hollmann, A. Ardö, and P. Stenström: "Empirical Observations regarding Predictability in User Access Behavior in a Distributed Digital Library System". In Second International Workshop on Internet Computing and E-Commerce (ICEC'02), April 2002.(pdf)
-
M. Ekman, F. Dahlgren, and P. Stenström: "TLB and Snoop Energy-Reduction using Virtual Caches for Low-Power Chip-Multiprocessors". In Proc. of ACM ISLPED-2002.(pdf)
-
M. Ekman, F. Dahlgren, and P. Stenström: Evaluation of Snoop-Energy Reduction Techniques for Chip-Multiprocessors. In Proc. of Workshop on Duplicating, Deconstructing, and Debunking (WDDD-1), May 2002. (pdf)
-
M. Kämpe, P. Stenström, M. Dubois: Self-Correcting LRU Replacement Policies. Tech. Report, Department of Computer Engineering. In Second Workshop on Caching, Coherence, and Consistency (WC3 '02), June 2002.(pdf)
-
Jianwei Chen, Michel Dubois, and P. Stenstrom: SimWattch: An Approach to Integrate Complete-System with User-Level Performance/Power Simulators. In Proc. of IEEE ISPASS-2003, March 2003. (pdf)
-
J. Nilsson, A. Landin, P. Stenström: Coherence Predictor Cache: A Resource Efficient Coherence Message Prediction Infrastructure. In 6th IEEE International Parallel and Distributed Processing Symposium, Abstract: page 10 (on CD), April 2003. (pdf)
-
P. Rundberg and P. Stenström: Speculative Lock Reordering: Optimistic Out-of-Order Execution of Critical Sections. In 6th IEEE International Parallel and Distributed Processing Symposium, Abstract: page 11 (on CD), April 2003.(pdf)
-
F. Warg and P. Stenström: Improving Speculative Thread-Level Parallelism through Module Run-Length Prediction. In 6th IEEE International Parallel and Distributed Processing Symposium, Abstract: page 12 (on CD), April 2003.(pdf)
-
J. Hollmann, A. Ardö, P. Stenström: Evaluation of Document Prefetching in a Distributed Digital Library. March 2003. To appear in 7th European Conference on Research and Advanced Technology for Digital Libraries (ECDL'2003).(pdf)
-
J. Jalminger and P. Stenström: A Novel Approach to Cache Block Reuse Prediction. To appear in ICPP-2003, Oct. 2003. (pdf)
-
M. Ekman and P. Stenstrom: Performance and Power Impact of Issue-width in Chip-Multiprocessor Cores. To appear in ICPP-2003, Oct. 2003. (pdf)
-
John Hughes, Kjell Jeppsson, Per Larsson-Edefors, Mary
Sheeran, Per Stenstrom, Lars "J" Svensson, FlexSoC: Combining Flexibility and
Efficiency in SoC Designs. To appear at the IEEE Norchip 2003 Conference.
November 2003.
-
M. Kämpe, P. Stenström, M. Dubois: Self-Correcting LRU
Replacement Policies. Tech. Report, Department of Computer Engineering, In ACM
Computing Frontiers (Invited). April 2004.
-
Magnus Ekman and Per Stenstrom. Enhancing Simulation Speed
using Matched-Pair Comparison. In Proc. of 2005 IEEE ISPASS. April
2005.
-
Magnus Ekman and Per Stenstrom. A Cost-Effective Memory
Organization for Future Servers. In Proc. of 2005 IEEE Int. Symp. on
Parallel and Distributed Systems.
-
Fredrik Warg and Per Stenstrom: Reducing Misspeculation
Overhead for Module-Level Speculative Execution. In ACM Computing
Frontiers. May 2005.
-
Martin Thuresson and Per Stenstrom. Evaluation of Extended
Dictionary-Based Static Code Compression Techniques. In ACM Computing
Frontiers. May 2005.
-
Magnus Ekman and Per Stenstrom: A Robust Memory Compression
Scheme. In the 32nd IEEE/ACM Ann. Int. Symposium on Computer Architecture.
Madison, June, 2005.
-
E. Vallejo, M. Galluzzi, A. Cristal, F. Vallejo, R. Beivide, P.
Stenstrom, J. Smith, M. Valero. Implementing Kilo-Instruction Multiprocessors.
In Proc. of 2005 IEEE International Conference on Pervasive Services.
Santorini. July 2005.
-
Md. Mafijul Islam and Per Stenstrom: Reduction of Energy
Consumption in Processors by Early Detection and Bypassing of Trivial
Operations. To appear in 6th Conference on Embedded Computer Systems:
Architectures, Modelling, and Simulation (SAMOS VI). July 2006.
-
J. Jeong, P. Stenstrom, and M. Dubois. Simple,
Penalty-Sensitive Replacement Policies for Caches. In Proc. of 2006 ACM Int.
Conf. on Computing Frontiers. May 2006.
-
H. Dybdahl and P. Stenstrom. Enhancing Lower Level Cache
Performance by Early Miss Determination and Bypassing. To appear in the Proc. of
the 11th Asia-Pacific Computer Systems Architecture Conference (ACSAC06).
Shanghai, Sept 2006.
-
F. Warg and P. Stenstrom. Dual-Thread Speculation. Two
Threads in the Machine is Better than Eight in the Bush. Accepted to SBAC 2006.
May 2006
-
M. Thuresson and P. Stenstrom. Scalable Value-Cache Based
Compression Schemes for Multiprocessors. Accepted to SBAC 2006. May 2006.
-
H. Dybdahl, P. Stenstrom, L. Natvig, A Cache-Partition Aware
Replacement Policy for Chip Multiprocessors. (Best Paper Award.) Accepted to ACM
2006 HiPC. July 2006.
-
H. Dybdahl, P. Stenstrom, L. Natvig, A Cache Replacement
Algorithm based on Frequency and Recency for Chip Multiprocessors. Accepted to
2006 IEEE MEDEA workshop (in conjunction with PACT 2006), September 2006.
-
M. M. Waliullah and P. Stenstrom. Starvation-Free Commit
Arbitration Policies for Transactional Memory Systems. Accepted to IEEE dasCMP
workshop (held in conjunction with IEEE Micro 2006). Dec. 2006.
-
Shekhar Y. Borkar, Norm Jouppi, Per Stenstrom. Microprocessors in the Era of
Terascale Integration. Invited Paper. To appear in DATE 2007. April 2007.
-
H. Dybdahl and P. Stenstrom. An Adaptive Shared/Private NUCA
Cache Partitioning Scheme for Chip Multiprocessors. Accepted to IEEE HPCA 2007.
February 2007.
-
Md. Mafijul Islam, Alexander Busck, Mikael Engbom, Simji Lee,
Michel Dubois, Per Stenstrom. Limits on Thread-Level Speculative Parallelism in
Embedded Applications. Accepted 11th IEEE INTERACT workshop (in conjunction with
IEEE HPCA 2007). January 2007.
-
Magnus Björk, Magnus Själander, Lars Svensson, Martin
Thuresson, John Hughes, Kjell Jeppsson, Jonas Karlsson, Per Larsson-Edefors, Mary
Sheeran, and Per Stenstrom. Exposed Datapath for Efficient Computing. 2007
HiPEAC workshop on Reconfigurable Computing. January 2007.
-
Martin Thuresson, Magnus Själander, Magnus Björk, Lars
Svensson, Per Larsson-Edefors, Per Stenstrom. FlexSoC: Utilizing Exposed
Datapath Control for Efficient Computing. In Proc. of IEEE SAMOS 2007. July 2007
-
Md. Mafijul Islam and Per Stenstrom. Energy and Performance
Tradeoffs between Instruction Reuse and Trivial Computations for Embedded
Applications. Accepted in IEEE International Symposium on Embedded Computer
Systems. April 2007.
-
M. M. Waliullah and Per Stenstrom. Starvation-Free Commit
Arbitration Policies for Transactional Memory Systems. In ACM Computer
Architecture News, Vol. 35, No. 1, March 2007.
-
Md. Mafijul Islam, Alexander Busck, Mikael Engbom, Simji Lee,
Michel Dubois, Per Stenstrom. Limits on Thread-Level Speculative Parallelism in
Embedded Applications. To appear in ICPP 2007, September 2007.
-
M. M. Waliullah and P. Stenstrom. Starvation-Free
Transactional Memory System Protocols. EUROPAR 2007. August 2007
-
E. Vallejo, M. Galluzzi, A. Cristal, F. Vallejo, R. Beivide, P. Stenstrom, J.
Smith, M. Valero: Implicit Transactional Memory in Kilo-Instruction Processors.
Invited. In Proc. of the 11th Asia-Pacific Computer Systems Architecture
Conference (ACSAC06). Shanghai, Sept 2007.
-
A. Bardine, P. Foglia, G. Gabrielli, C. A. Prete, and P.
Stenstrom. Improving Power Efficiency of D-NUCA Caches. In ACM SIGARCH
Computer Architecture News. December 2007.
-
M. M. Waliullah and P. Stenstrom. Reducing Roll-back Overhead
in Transactional Memory Systems by Checkpointing Conflicting Accesses. In
Proc. of IEEE IPDPS 2008. March 2008.
-
M. M. Waliullah and P. Stenstrom. Efficient Management of
Speculative Data in Hardware Transactional Memory Systems. In Proc. of IEEE
SAMOS 2008. July 2008.
-
M. Thuresson and P. Stenstrom. Accommodation of the Bandwidth
of Large Cache Blocks using Cache/Memory Link Compression. In Proc. of ICPP
2008. September 2008.
-
M. Thuresson, M. Själander, P. Stenstrom. A Flexible
Code-Compression Scheme using Partitioned Look-Up Tables. Submitted to 4th
Int. Conf. on High-Performance and Embedded Architectures and Compilers.
January 2009.
-
M. M. Waliullah and P. Stenstrom. Intermediate Checkpointing
with Conflicting Access Prediction in Transactional Memory Systems. In Proc.
of First MULTIPROG workshop (in conjunction with the Third Int. Conf on
HiPEAC). January 2008.
-
Alessandro Bardine, Pierfrancesco Foglia, Giacomo Gabrielli,
Cosimo Antonio Prete and Per Stenstrom. A Micro-Architectural Power-Saving
Technique for D-NUCA Caches. In Proc. of 4th Workshop on Unique Chips and
Systems (in conjunction with 2008 IEEE ISPASS). April 2008.
-
Md. Mafijul Islam and Per Stenstrom. Zero Loads: Canceling
Load Requests by Tracking Zero Values. In the IEEE MEDEA Workshop (in
conjunction with PACT). October 2008.