Publications

Textbooks

  1. L. Ohlsson and P. Stenström: "Computer Organization and Assembly Language Programming," Studentlitteratur and Chartwell-Bratt, ISBN 91-44-26461-5, January 1987.
  2. P. Stenström: "68000 Microcomputer Organization and Programming," Prentice-Hall, ISBN 0-13-584855-5, September 1992.

Journal and Magazine Papers

  1. P. Stenström: "Reducing Contention in Shared-Memory Multiprocessors," in IEEE Computer, Vol. 21, No. 11, pp. 26-37, November 1988.
  2. P. Stenström: "A Survey of Cache Coherence Schemes for Multiprocessors," in IEEE Computer, Vol. 23, No. 6, pp. 12-24, June 1990.
  3. H. Grahn, P. Stenström, and M. Dubois: "Implementation and Evaluation of Update-Based Cache Protocols Under Relaxed Memory Consistency Models," in Future Generation Computer Systems, Vol. 11, No. 3, pp. 247-271, June 1995.
  4. F. Dahlgren and P. Stenström: "Using Write Caches to Improve Performance of Cache Coherence Protocols in Shared-Memory Multiprocessors," in Journal of Parallel and Distributed Computing, Vol. 26, No. 2, pp. 193-210, April 1995.
  5. F. Dahlgren, M. Dubois, and P. Stenström: "Sequential Hardware Prefetching in Shared-Memory Multiprocessors," in IEEE Trans. on Parallel and Distributed Systems, Vol. 6, No. 7, pp. 733-746, July 1995.
  6. M. Dubois, J. Skeppstedt, and P. Stenström: "Essential Misses and Memory Traffic in Coherence Protocols," in Journal of Parallel and Distributed Computing, Vol. 29, No. 2, pp. 108-125, October 1995.
  7. F. Dahlgren and P. Stenström: "Evaluation of Stride and Sequential Hardware-based Prefetching in Shared-Memory Multiprocessors," in IEEE Trans. on Parallel and Distributed Systems, Vol. 7, No. 4, pp. 385-398, April 1996.
  8. M. Brorsson and P. Stenström: "Characterising and Modelling Shared-Memory Accesses in Multiprocessor Programs," in Parallel Computing, No. 22, pp. 869-893, 1996.
  9. P. Stenström, M. Balldin, and J. Skeppstedt: "The Design of a Non-Blocking Load Processor Architecture," in Microprocessors and Microsystems, No. 20, pp. 111-123, 1996.
  10. J. Skeppstedt and P. Stenström: "Using Dataflow Analysis to Reduce Overhead in Cache Coherence Protocols," in ACM Transactions on Programming Languages and Systems, Vol. 18, No. 6, pp. 659-682, November 1996.
  11. H. Grahn and P. Stenström: "Evaluation of an Adaptive Update-Based Cache Protocol," in Journal of Parallel and Distributed Computing, Vol. 39, No. 2, pp. 168-180, December 1996.
  12. P. Stenström, M. Brorsson, F. Dahlgren, H. Grahn, and M. Dubois: "Boosting Performance of Shared-Memory Multiprocessors," in IEEE Computer, pp. 63-70, July 1997.
  13. M. Karlsson and P. Stenström: "Effectiveness of Dynamic Prefetching in Multiple-Writer Distributed Virtual Shared Memory Systems," in Journal of Parallel and Distributed Computing, Vol. 43, No. 2, pp. 79-93, 1997.
  14. F. Dahlgren, M. Björkman, and P. Stenström: "Reducing the Read Miss Penalty for Flat COMA Protocols," in The Computer Journal, Vol. 40, No. 4, pp. 208-219, 1997.
  15. P. Stenström, Erik Hagersten, David Lilja, Margaret Martonosi, and Madan Venugopal: "Trends in Shared-Memory Multiprocessing," in IEEE Computer, Vol. 30, No. 12, pp. 44-50, December 1997.
  16. F. Dahlgren, J. Skeppstedt, and P. Stenström: "An Evaluation of Hardware-Based and Compiler-Controlled Snooping Cache Protocol Extensions," in Future Generation Computer Systems, No. 13, pp. 469-487, 1998.
  17. F. Dahlgren, M. Dubois, and P. Stenström: "Performance Evaluation and Cost Analysis of Cache Protocol Extensions for Shared-Memory Multiprocessors," in IEEE Transactions on Computers, Vol. 47, No. 10, pp. 1041-1055, October 1998.
  18. J. Skeppstedt, F. Dahlgren, and P. Stenström: "Evaluation of Compiler-Controlled Updating to Reduce Coherence-Miss Penalties in Shared-Memory Multiprocessors," in Journal of Parallel and Distributed Computing, Vol. 56, No. 2, pp. 122-153, 1999.
  19. T. Lundqvist and P. Stenström: "An Integrated Path and Timing Analysis Method Based on Cycle-Level Symbolic Execution," in Journal of Real-Time Systems, Vol. 17, No. 2/3, pp. 183-207, November 1999.
  20. H. Grahn and P. Stenström: "Comparative Evaluation of Latency-Tolerating and Reducing Techniques for Hardware-Only and Software-Only Directory Protocols," in Journal of Parallel and Distributed Computing, Vol. 60, No. 7, pp. 807-834, July 2000.
  21. J. Jalminger and P. Stenström: "Improving Energy-Efficiency in Off-Chip Caches Using Selective Prefetching," in Microprocessors and Microsystems, No. 26, pp. 107-121, 2002.
  22. P. Rundberg and P. Stenström: "An All-Software Thread-Level Data Dependence Speculation System for Multiprocessors," in Journal of Instruction-Level Parallelism, Vol. 3, October 2002.
  23. Håkan Grahn and Per Stenström: "A Comparative Evaluation of Hardware-Only and Software-Only Directory Protocols in Shared-Memory Multiprocessors," in Journal of Systems Architecture, 2003.

  24. Jonas Jalminger and Per Stenstrom: A Cache Block Reuse Prediction Scheme. Journal of Microprocessors and Microsystems. July 200

  25. J. Chen, M. Dubois, and P. Stenstrom: Integrating Complete-system and User-level Performance/Power Simulators: the SimWattch Approach. To appear in IEEE Micro Magazine, June 2006.

  26. K. De Bosschere, G. Gaydadjiev, X. Martorell, N. Navarro, M. O’Boyle, D. Pnevmatikatos, A. Ramirez, P. Sainrat, A. Seznec, P. Stenstrom, and O. Temam. High-Performance Embedded Architecture and Compilation Roadmap. In Transactions on High-Performance Embedded Architectures and Compilers. Vol 1, No 3. Dec. 2006.

  27. Reinhard Wilhelm, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing, David B. Whalley, Guillem Bernat, Christian Ferdinand, Reinhold Heckmann, Tulika Mitra, Frank Mueller, Isabelle Puaut, Peter P. Puschner, Jan Staschulat, Per Stenstrom. The Determination of Worst-Case Execution Times — Overview of Methods and Survey of Tools. ACM Trans. Embedded Comput. Syst. 7(3): (2008)

  28. F. Warg and P. Stenstrom. Dual-Thread Speculation: A Simple Approach to Uncover Thread-Level Parallelism on a Simultaneous Multithreaded Processor. International Journal of Parallel Programming 36(2): 166-183 (2008)

  29. M. Thuresson, L. Spracklen, P. Stenstrom. Memory Link Compression Schemes: A Value Locality Perspective. IEEE Transactions on Computers, Jan 2008.

  30. M. M. Waliullah and P. Stenstrom. Schemes for Avoiding Starvation in Transactional Memory Protocols. Accepted to Concurrency and Computation: Practice and Experience. May 2008.

  31. Jaeheon Jeong, Per Stenstrom, and Michel Dubois. Simple Penalty-Sensitive Cache Replacement Policies. Journal of ILP. Vol 10, July 2008.

Conference and Workshop Papers (refereed)

  1. P. Stenström and L. Philipson: "A layered emulator for design evaluation of MIMD multiprocessors with shared memory," in Proc. of PARLE (Parallel Architectures and Languages Europe), Lecture Notes in Computer Science, No 258, Springer-Verlag, pp. 329-344, June 1987.
  2. P. Stenström: "VLSI Support for a Cactus Stack Oriented Memory Organization," in Proc. of the 21st Hawaii International Conference on System Sciences, pp. 211-220, January 1988.
  3. P. Stenström, D. Vrsalovic, and Z. Segall: "Shared Data Structures in a Distributed System -- Performance Evaluation and Practical Considerations," in Proc. of IFIP TC 7/WG 7.3, International Seminar on Performance of Distributed and Parallel Systems, pp. 15-30, December 1988.
  4. P. Stenström: "A Cache Consistency Protocol for Multiprocessors with Multistage Networks," in Proc. of 16th Annual International Symposium on Computer Architecture, pp. 407-415, May 1989.
  5. E. Belitskaja, V. Sidorenko, and P. Stenström: "Testing of Memory with Defects of Fixed Configuration," in Second International Workshop on Algebraic and Combinatorial Coding Theory, pp. 24-27, Leningrad, September 1990.
  6. P. Stenström, F. Dahlgren, and L. Lundberg: "A Lockup-free Multiprocessor Cache Design," in Proc. of International Conference on Parallel Processing, Vol. 1, pp. 246-250, August 1991.
  7. F. Dahlgren and P. Stenström: "On Reconfigurable On-chip Data Caches," in Proc. of 24th ACM/IEEE International Symposium on Microarchitecture, pp. 189-198, November 1991.
  8. F. Dahlgren and P. Stenström: "Reducing Write Latencies for Shared Data in a Multiprocessor with a Multistage Network," in Proc. of 25th Hawaii International Conference on System Sciences, pp. 449-456, January 1992.
  9. P. Stenström: "A Latency-Hiding Scheme for Multiprocessors with Buffered Multistage Networks," in Proc. of International Parallel Processing Symposium, pp. 39-42, March 1992.
  10. P. Stenström, T. Joe, and A. Gupta: "Comparative Performance Evaluation of Cache-Coherent NUMA and COMA Architectures," in Proc. of 19th Annual International Symposium on Computer Architecture, pp. 80-91, May 1992.
  11. H. Nilsson and P. Stenström: "The Scalable Tree Protocol -- A Cache Coherence Approach for Large-Scale Multiprocessors," in Proc. of Fourth IEEE Symposium on Parallel and Distributed Processing, pp. 498-507, December 1992.
  12. M. Brorsson and P. Stenström: "Visualising Sharing Behaviour and its Relation to Shared Memory Management," in Proc. of 1992 International Conference on Parallel and Distributed Systems, pp. 528-536, December 1992.
  13. H. Nilsson and P. Stenström: "Performance Evaluation of Link-Based Cache Coherence Schemes," in Proc. of 26th Hawaii International Conference on System Sciences, pp. 486-495, January 1993.
  14. P. Stenström, H. Nilsson, and J. Skeppstedt: "Using Graphics and Animation to Visualize Instruction Pipelining and its Hazards," in Proc. of ICSEE'93, pp. 130-135, January 1993.
  15. M. Brorsson, F. Dahlgren, H. Nilsson, and P. Stenström: "The CacheMire Test Bench -- A Flexible and Effective Approach for Simulation of Multiprocessors," in Proc. of 26th IEEE Annual Simulation Symposium, pp. 41-49, March 1993.
  16. P. Stenström, M. Brorsson, and L. Sandberg: "An Adaptive Cache Coherence Protocol Optimized for Migratory Sharing," in Proc. of 20th ACM/IEEE Annual International Symposium on Computer Architecture, pp. 109-118, May 1993.
  17. M. Dubois, J. Skeppstedt, L. Ricciulli, K. Ramamurthy, and P. Stenström: "The Detection and Elimination of Useless Misses in Multiprocessors," in Proc. of 20th ACM/IEEE Annual International Symposium on Computer Architecture, pp. 88-97, May 1993.
  18. F. Dahlgren, M. Dubois, and P. Stenström: "Fixed and Adaptive Sequential Prefetching for Shared-Memory Multiprocessors," in Proc. of 1993 International Conference on Parallel Processing, pp. 56-63, August 1993.
  19. F. Dahlgren, M. Dubois, and P. Stenström: "Combined Performance Gains of Simple Cache Protocol Extensions," in Proc. of 21st ACM/IEEE Annual International Symposium on Computer Architecture, pp. 187-197, April 1994.
  20. H. Nilsson and P. Stenström: "An Adaptive Update-Based Cache Coherence Protocol for Reduction of Miss Rate and Traffic," in Proc. of PARLE (Parallel Architectures and Languages Europe), pp. 363-374, June 1994. Best Paper Award at the conference.
  21. F. Pong, P. Stenström, and M. Dubois: "An Integrated Methodology for Verification of Correctness of Cache Coherence Protocols," in Proc. of 1994 International Conference on Parallel Processing, pp. 158-165, August 1994.
  22. F. Dahlgren and P. Stenström: "Reducing the Write Traffic for a Hybrid Cache Protocol," in Proc. of 1994 International Conference on Parallel Processing, pp. 166-173, August 1994.
  23. J. Skeppstedt and P. Stenström: "Simple Compiler Algorithms to Reduce Ownership Overhead in Cache Coherence Protocols," in Proc. of 6th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pp. 286-296, October 1994.
  24. M. Brorsson and P. Stenström: "Modelling Accesses to Stationary Data in Shared Memory Multiprocessors," in Proc. of the 7th International Conference on Parallel and Distributed Computing (PDCS'94), pp. 802-807, October 1994.
  25. M. Brorsson and P. Stenström: "Modelling Accesses to Migratory and Producer-Consumer Characterised Data in a Shared Memory Multiprocessor," in Proc. of Sixth IEEE Symposium on Parallel and Distributed Processing, pp. 612-619, October 1994.
  26. M. Björkman, F. Dahlgren, and P. Stenström: "Using Hints to Reduce Read Miss Penalties for Flat COMA Protocols," in Proc. of 28th Hawaii International Conference on System Sciences, pp. 242-251, January 1995.
  27. F. Dahlgren and P. Stenström: "Effectiveness of Stride and Sequential Hardware-based Prefetching in Shared-Memory Multiprocessors," in Proc. of First International Conference on High Performance Computer Architecture (HPCA-1), pp. 68-77, January 1995.
  28. H. Grahn and P. Stenström: "Efficient Strategies for Software-Only Directory Protocols in Shared-Memory Multiprocessors," in Proc. of 22nd ACM/IEEE Annual International Symposium on Computer Architecture, pp. 38-47, June 1995.
  29. J. Skeppstedt and P. Stenström: "A Compiler Algorithm that Reduces Read Latency in Ownership-Based Cache Coherence Protocols," in Proc. of Parallel Architectures and Compilation Techniques, pp. 69-78, July 1995.
  30. F. Dahlgren, J. Skeppstedt, and P. Stenström: "Effectiveness of Hardware-Based and Compiler-Controlled Snooping Protocol Extensions," in Proc. of the International Conference on High Performance Computing, pp. 87-92, December 1995.
  31. M. Karlsson and P. Stenström: "Performance Evaluation of a Cluster-Based Multiprocessor Built from ATM-Switches and Bus-Based Multiprocessor Servers," in Proc. of Second International Conference on High Performance Computer Architecture, pp. 4-13, Jan. 1996.
  32. H. Grahn and P. Stenström: "Relative Performance of Software-Only and Hardware-Only Directory Protocols Under Latency Tolerating and Reducing Techniques," in Proc. of the 11th International Parallel Processing Symposium, pp. 500-506, April 1997.
  33. P. Stenström and J. Skeppstedt: "A Performance Tuning Approach for Shared-Memory Multiprocessors," in Proc. of EUROPAR'97, pp. 72-84, August 1997.
  34. J. Nilsson, F. Dahlgren, M. Karlsson, P. Magnusson, and P. Stenström: "Computer System Evaluation with Commercial Workloads," in Proc. of IASTED Conference on Modeling and Simulation, pp. 293-297, May 1998.
  35. P. Magnusson, F. Dahlgren, H. Grahn, M. Karlsson, F. Larsson, A. Moestedt, J. Nilsson, P. Stenström, and B. Werner: "SimICS/Sun4m: A Virtual Workstation," in Proc. of USENIX98, pp. 119-130, June 1998.
  36. T. Lundqvist and P. Stenström: "Integrating Path and Timing Analysis using Instruction-Level Simulation Techniques," in Proc. of ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems, June 1998.
  37. T. Lundqvist and P. Stenström: "Timing Anomalies in Dynamically Scheduled Processors," in Proc. of 1999 IEEE Real-Time Systems Symposium (RTSS'99), pp. 12-21, Dec. 1999.
  38. T. Lundqvist and P. Stenström: "A Method to Improve the Estimated Worst-Case Performance of Data Caching," in Proc. of 6th International Conference on Real-Time Computing Systems and Applications (RTCSA'99), pp. 255-262, Dec. 1999.
  39. M. Karlsson, F. Dahlgren, and P. Stenström: "A Prefetching Technique for Irregular Accesses to Linked Data Structures," in Proc. of 6th IEEE Int. Symp. on High-Performance Computer Architecture (HPCA-6), pp. 206-217, 2000.
  40. M. Karlsson, F. Dahlgren, and P. Stenström: "An Analytical Model for Working-Set Sizes in Decision Support Systems," in Proc. of ACM SIGMETRICS, pp. 275-285, 2000.
  41. A. Saulsbury, F. Dahlgren, and P. Stenström: "Recency-Based TLB Preloading," in Proc. of 27th ACM/IEEE Int. Symp. on Computer Architecture (ISCA-27), pp. 117-127, 2000.
  42. J. Jalminger and P. Stenström: "Boosting Energy-Efficiency of Off-Chip Caches using Selective Data Prefetching," in Proc. of IEEE Workshop on Complexity-Effective Computer Design, held in conjunction with ISCA-2000, June 2000.
  43. P. Rundberg and P. Stenström: "Low-Cost Thread-Level Data Dependence Speculation on Multiprocessors," in 4th IEEE Workshop on Multi-Threaded Execution, Architecture and Compilation (in conj. with Micro-33), Dec. 2000. (Received the Best Paper Award.)
  44. U. Assarsson and P. Stenström: "Evaluation of Load-Distribution Strategies for Hierarchical View Frustum Culling and Collision Detection," in EuroPar 2001, pp. 663-673, Aug. 2001.
  45. F. Warg and P. Stenström: "Limits on Speculative Module-Level Parallelism in Imperative and Object-Oriented Programs on CMP Platforms," in Proc. of Int. Conf. on Parallel Architectures and Compilation Techniques (PACT'2001), pp. 221-230, Sept. 2001.
  46. M. Kämpe, P. Stenström, and M. Dubois: "The FAB Predictor: Using Fourier Analysis to Predict the Outcome of a Conditional Branch," in Proc. of 8th IEEE Int. Symp. on High-Performance Computer Architecture (HPCA-8), February 2002.
  47. J. Hollmann, A. Ardö, and P. Stenström: "Empirical Observations regarding Predictability in User Access Behavior in a Distributed Digital Library System," in Second International Workshop on Internet Computing and E-Commerce (ICEC'02), April 2002.
  48. M. Ekman, F. Dahlgren, and P. Stenström: "TLB and Snoop Energy-Reduction using Virtual Caches for Low-Power Chip-Multiprocessors," in Proc. of ACM ISLPED-2002.
  49. M. Ekman, F. Dahlgren, and P. Stenström: "Evaluation of Snoop-Energy Reduction Techniques for Chip-Multiprocessors," in Proc. of Workshop on Duplicating, Deconstructing, and Debunking (WDDD-1), May 2002.
  50. M. Kämpe, P. Stenstrom, and M. Dubois: "Self-Correcting LRU Replacement Policies," Tech. Report, Department of Computer Engineering, in Second Workshop on Caching, Coherence, and Consistency (WC3 '02), June 2002.
  51. Jianwei Chen, Michel Dubois, and P. Stenstrom: "SimWattch: An Approach to Integrate Complete-System with User-Level Performance/Power Simulators," in Proc. of IEEE ISPASS-2003, March 2003.
  52. J. Nilsson, A. Landin, and P. Stenström: "Coherence Predictor Cache: A Resource Efficient Coherence Message Prediction Infrastructure," in 6th IEEE International Parallel and Distributed Processing Symposium, Abstract: p. 10 (on CD), April 2003.
  53. P. Rundberg and P. Stenström: "Speculative Lock Reordering: Optimistic Out-of-Order Execution of Critical Sections," in 6th IEEE International Parallel and Distributed Processing Symposium, Abstract: p. 11 (on CD), April 2003.
  54. F. Warg and P. Stenström: "Improving Speculative Thread-Level Parallelism through Module Run-Length Prediction," in 6th IEEE International Parallel and Distributed Processing Symposium, Abstract: p. 12 (on CD), April 2003.
  55. J. Hollmann, A. Ardö, and P. Stenström: "Evaluation of Document Prefetching in a Distributed Digital Library," to appear in 7th European Conference on Research and Advanced Technology for Digital Libraries (ECDL'2003). March 2003.
  56. J. Jalminger and P. Stenström: "A Novel Approach to Cache Block Reuse Prediction," to appear in ICPP-2003, Oct. 2003.
  57. M. Ekman and P. Stenstrom: "Performance and Power Impact of Issue-width in Chip-Multiprocessor Cores," to appear in ICPP-2003, Oct. 2003.
  58. John Hughes, Kjell Jeppson, Per Larsson-Edefors, Mary Sheeran, Per Stenstrom, and Lars "J" Svensson: FlexSoC: Combining Flexibility and Efficiency in SoC Designs. To appear at the IEEE Norchip 2003 Conference. November 2003.

  59. M. Kämpe, P. Stenström, M. Dubois: Self-Correcting LRU Replacement Policies. Tech. Report, Department of Computer Engineering, In ACM Computing Frontiers (Invited). April 2004.

  60. Magnus Ekman and Per Stenstrom. Enhancing Simulation Speed using Matched-Pair Comparison. In Proc. of 2005 IEEE ISPASS. April 2005.

  61. Magnus Ekman and Per Stenstrom. A Cost-Effective Memory Organization for Future Servers. In Proc. of 2005 IEEE Int. Symp. on Parallel and Distributed Systems.

  62. Fredrik Warg and Per Stenstrom: Reducing Misspeculation Overhead for Module-Level Speculative Execution. In ACM Computing Frontiers. May 2005.

  63. Martin Thuresson and Per Stenstrom. Evaluation of Extended Dictionary-Based Static Code Compression Techniques. In ACM Computing Frontiers. May 2005.

  64. Magnus Ekman and Per Stenstrom: A Robust Memory Compression Scheme. In the 32nd IEEE/ACM Ann. Int. Symposium on Computer Architecture. Madison, June 2005.

  65. E. Vallejo, M. Galluzzi, A. Cristal, F. Vallejo, R. Beivide, P. Stenstrom, J. Smith, M. Valero. Implementing Kilo-Instruction Multiprocessors. In Proc. of 2005 IEEE International Conference on Pervasive Services. Santorini, July 2005.

  66. Md. Mafijul Islam and Per Stenstrom: Reduction of Energy Consumption in Processors by Early Detection and Bypassing of Trivial Operations. To appear in 6th Conference on Embedded Computer Systems: Architectures, Modelling, and Simulation (SAMOS VI). July 2006.

  67. J. Jeong, P. Stenstrom, and M. Dubois. Simple, Penalty-Sensitive Replacement Policies for Caches. In Proc. of 2006 ACM Int. Conf. on Computing Frontiers. May 2006.

  68. H. Dybdahl and P. Stenstrom. Enhancing Lower Level Cache Performance by Early Miss Determination and Bypassing. To appear in the Proc. of the 11th Asia-Pacific Computer Systems Architecture Conference (ACSAC06). Shanghai, Sept 2006.

  69. F. Warg and P. Stenstrom. Dual-Thread Speculation: Two Threads in the Machine is Better than Eight in the Bush. Accepted to SBAC 2006. May 2006.

  70. M. Thuresson and P. Stenstrom. Scalable Value-Cache Based Compression Schemes for Multiprocessors. Accepted to SBAC 2006. May 2006.

  71. H. Dybdahl, P. Stenstrom, and L. Natvig. A Cache-Partition Aware Replacement Policy for Chip Multiprocessors. (Best Paper Award.) Accepted to ACM 2006 HiPC. July 2006.

  72. H. Dybdahl, P. Stenstrom, and L. Natvig. A Cache Replacement Algorithm based on Frequency and Recency for Chip Multiprocessors. Accepted to 2006 IEEE MEDEA workshop (in conjunction with PACT 2006), September 2006.

  73. M. M. Waliullah and P. Stenstrom. Starvation-Free Commit Arbitration Policies for Transactional Memory Systems. Accepted to IEEE dasCMP workshop (held in conjunction with IEEE Micro 2006). Dec. 2006.
    Shekhar Y. Borkar, Norm Jouppi, Per Stenstrom. Microprocessors in the Era of Terascale Integration. Invited Paper. To appear in DATE 2007. April 2007.

  74. H. Dybdahl and P. Stenstrom. An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors. Accepted to IEEE HPCA 2007. February 2007.

  75. Md. Mafijul Islam, Alexander Busck, Mikael Engbom, Simji Lee, Michel Dubois, Per Stenstrom. Limits on Thread-Level Speculative Parallelism in Embedded Applications. Accepted to 11th IEEE INTERACT workshop (in conjunction with IEEE HPCA 2007). January 2007.

  76. Magnus Bjork, Magnus Sjalander, Lars Svensson, Martin Thuresson, John Hughes, Kjell Jeppson, Jonas Karlsson, Per Larsson-Edefors, Mary Sheeran, and Per Stenstrom. Exposed Datapath for Efficient Computing. 2007 HiPEAC workshop on Reconfigurable Computing. January 2007.

  77. Martin Thuresson, Magnus Själander, Magnus Björk, Lars Svensson, Per Larsson-Edefors, Per Stenstrom. FlexSoC: Utilizing Exposed Datapath Control for Efficient Computing. In Proc. of IEEE SAMOS 2007. July 2007.

  78. Md. Mafijul Islam and Per Stenstrom. Energy and Performance Tradeoffs between Instruction Reuse and Trivial Computations for Embedded Applications. Accepted in IEEE International Symposium on Embedded Computer Systems. April 2007.

  79. M. M. Waliullah and Per Stenstrom. Starvation-Free Commit Arbitration Policies for Transactional Memory Systems. In ACM Computer Architecture News, Vol. 35, No. 1, March 2007.

  80. Md. Mafijul Islam, Alexander Busck, Mikael Engbom, Simji Lee, Michel Dubois, Per Stenstrom. Limits on Thread-Level Speculative Parallelism in Embedded Applications. To appear in ICPP 2007, September 2007.

  81. M. M. Waliullah and P. Stenstrom. Starvation-Free Transactional Memory System Protocols. EUROPAR 2007. August 2007.
    E. Vallejo, M. Galluzzi, A. Cristal, F. Vallejo, R. Beivide, P. Stenstrom, J. Smith, M. Valero: Implicit Transactional Memory in Kilo-Instruction Processors. Invited. In Proc. of the 11th Asia-Pacific Computer Systems Architecture Conference (ACSAC06). Shanghai, Sept 2007.

  82. A. Bardine, P. Foglia, G. Gabrielli, C. A. Prete, and P. Stenstrom. Improving Power Efficiency of D-NUCA Caches. In ACM SIGARCH Computer Architecture News. December 2007.

  83. M. M. Waliullah and P. Stenstrom. Reducing Roll-back Overhead in Transactional Memory Systems by Checkpointing Conflicting Accesses. In Proc. of IEEE IPDPS 2008. March 2008.

  84. M. M. Waliullah and P. Stenstrom. Efficient Management of Speculative Data in Hardware Transactional Memory Systems. In Proc. of IEEE SAMOS 2008. July 2008.

  85. M. Thuresson and P. Stenstrom. Accommodation of the Bandwidth of Large Cache Blocks using Cache/Memory Link Compression. In Proc. of ICPP 2008. September 2008.

  86. M. Thuresson, M. Själander, P. Stenstrom. A Flexible Code-Compression Scheme using Partitioned Look-Up Tables. Submitted to 4th Int. Conf. on High-Performance and Embedded Architectures and Compilers. January 2009.

  87. M. M. Waliullah and P. Stenstrom. Intermediate Checkpointing with Conflicting Access Prediction in Transactional Memory Systems. In Proc. of First MULTIPROG workshop (in conjunction with the Third Int. Conf. on HiPEAC). January 2008.

  88. Alessandro Bardine, Pierfrancesco Foglia, Giacomo Gabrielli, Cosimo Antonio Prete and Per Stenstrom. A Micro-Architectural Power-Saving Technique for D-NUCA Caches. In Proc. of 4th Workshop on Unique Chips and Systems (in conjunction with 2008 IEEE ISPASS). April 2008.

  89. Md. Mafijul Islam and Per Stenstrom. Zero Loads: Canceling Load Requests by Tracking Zero Values. In the IEEE MEDEA Workshop (in conjunction with PACT). October 2008.

Book Chapters (refereed)

  1. P. Stenström: "Shared-Memory Multiprocessors: A Cost-Effective Approach to High-Performance Computing," in Parallel Computing: Paradigms and Applications, Albert Zomaya (editor), ISBN: 1-85032-188-4, International Thomson Computer Press (London, U.K.), 1996.
  2. P. Stenström, E. Hagersten, D. Lilja, M. Martonosi, and M. Venugopal: "Shared-Memory Multiprocessing: Current State and Future Directions," in Advances in Computers, Marvin Zelkowitz (editor), Academic Press, Vol. 53, pp. 2-46, 2000.

Newsletters (unrefereed)

  1. M. Brorsson and P. Stenström: "Visualisation of Cache Coherence Bottlenecks in Shared-Memory Multiprocessor Applications," in Newsletter of the Technical Committee on Computer Architecture, No. 3, pp. 32-36, 1993.
  2. P. Stenström: "Conception de la mémoire dans les multiprocesseurs à mémoire partagée" (Memory Design in Shared-Memory Multiprocessors), in Calculateurs Parallèles, Vol. 6, No. 3, pp. 83-136, 1994. Translated into French by Christine Rochange and Pascal Sainrat of the Institut de Recherche en Informatique de Toulouse.
  3. M. Karlsson and P. Stenström: "Using Prefetching to Hide Lock Acquisition Latency in Distributed Virtual Shared Memory Systems," in Newsletter of the Technical Committee on Computer Architecture, March 1997.
  4. P. Stenström and F. Dahlgren: "A Holistic Approach to Computer System Design Education based on System Simulation Techniques," in Newsletter of the Technical Committee on Computer Architecture, pp. 48-50, February 1999.

Editorials

  1. P. Stenström: "Scalable Shared-Memory Architectures: Introduction to Minitrack," in Proc. of 27th Hawaii International Conference on System Sciences, pp. 520-521, January 1994.
  2. P. Stenström and F. Dahlgren: "Applications for Shared-Memory Multiprocessors: Guest Editors' Introduction," in IEEE Computer, December 1996.
  3. P. Stenström: "Architectural Trends for Shared-Memory Multiprocessors," in Proc. of 30th Hawaii International Conference on System Sciences, January 1997.
  4. P. Stenström and P. Quinton: "Parallel Computer Architecture and Image Processing," in Proc. of EUROPAR'97, pp. 763-765, Aug. 1997.
  5. V. Milutinovic and P. Stenström: "Opportunities and Challenges for Distributed Shared-Memory Multiprocessors: Guest Editors' Introduction," in Proceedings of the IEEE, Vol. 87, No. 3, pp. 399-404, March 1999.
  6. S. Muller, P. Stenström, M. Valero, and S. Vassiliadis: "Parallel Computer Architecture," in Proc. of EUROPAR'00, Aug. 2000.
  7. F. Mueller and P. Stenström (eds). Proceedings of the 2003 International Conference on Languages, Compilers, and Tools for Embedded Systems. ACM SIGPLAN, San Diego, June 2003.
  8. M. Dubois, A. Bode, and P. Stenström (eds). Proceedings of the 31st IEEE/ACM International Symposium on Computer Architecture. Munich, June 2004.
  9. F. Mueller and P. Stenstrom. Introduction to Special Issue on “Languages, Compilers, and Tools for Embedded Systems,” in ACM Trans. on Embedded Computer Systems. 2005.
  10. B. Monien, G. Gao, H. Simon, P. Spirakis, P. Stenstrom. Introduction to Special Issue on “2004 International Parallel and Distributed Processing Symposium” in Journal of Parallel and Distributed Computing. 2005.
  11. P. Stenstrom, M. O’Boyle, F. Bodin, M. Cintra, and S. A. McKee (eds). Transactions on HiPEAC, Vol. 1. Springer Verlag, 2007.
  12. K. De Bosschere, D. Kaeli, P. Stenstrom, T. Ungerer, and D. Whalley (eds). Proceedings of the 2007 International Conference on HiPEAC. Springer Verlag, January 2007.
  13. M. Dubois and P. Stenstrom (eds). Proceedings of the 2007 ACM International Conference on Computing Frontiers. May 2007.
  14. P. Stenstrom, M. Dubois, M. Katevenis, and R. Gupta (eds). Proceedings of the 2008 International Conference on HiPEAC. Springer Verlag, January 2008.

  15. J. Carter, A. Gonzalez, and P. Stenstrom (eds). Proceedings of the 2008 IEEE International Symposium on High-Performance Computer Architecture. February 2008.