High Performance Memory Systems
Haldun Hadimioglu, David Kaeli, Jeffrey Kuskin, Ashwini Nanda, Josep Torrellas
Springer Science & Business Media, Oct 31, 2003 - 297 pages
The State of Memory Technology
Over the past decade there has been rapid growth in the speed of microprocessors. CPU speeds are approximately doubling every eighteen months, while main memory speed doubles about every ten years. The International Technology Roadmap for Semiconductors (ITRS) study suggests that memory will remain on its current growth path. The ITRS short- and long-term targets indicate continued scaling improvements at about the current rate by 2016. This translates to bit densities doubling every two years until the introduction of 8 gigabit dynamic random access memory (DRAM) chips, after which densities will quadruple every five years. A similar growth pattern is forecast for other high-density chip areas and high-performance logic (e.g., microprocessors and application-specific integrated circuits (ASICs)). In the future, molecular devices, 64 gigabit DRAMs and 28 GHz clock signals are targeted. Although densities continue to grow, we still do not see significant advances that will improve memory speed. These trends have created a problem that has been labeled the Memory Wall or Memory Gap.
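The size of the gap implied by the two doubling periods quoted above can be worked out with a short calculation. The Python sketch below is only an illustration: it assumes both trends are smooth exponentials, and the function names `growth_factor` and `speed_gap` are hypothetical helpers, not anything taken from the book.

```python
def growth_factor(years, doubling_period_years):
    """Return how many times a quantity grows over `years`,
    assuming it doubles once every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


def speed_gap(years, cpu_doubling_years=1.5, memory_doubling_years=10.0):
    """Ratio of CPU speed growth to memory speed growth after `years`.

    The defaults are the figures quoted in the description:
    CPU speed doubling every eighteen months, memory speed every ten years.
    """
    cpu = growth_factor(years, cpu_doubling_years)
    memory = growth_factor(years, memory_doubling_years)
    return cpu / memory


if __name__ == "__main__":
    for years in (3, 5, 10):
        cpu = growth_factor(years, 1.5)
        memory = growth_factor(years, 10.0)
        print(f"after {years:2d} years: CPU x{cpu:.1f}, "
              f"memory x{memory:.2f}, gap x{speed_gap(years):.1f}")
```

Under these assumptions the relative gap widens roughly fiftyfold over ten years, which is the kind of divergence the Memory Wall label refers to.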
Contents
Introduction to High-Performance Memory Systems
1.2 Power-Aware, Reliable, and Reconfigurable Memory
1.3 Software-Based Memory Tuning
1.4 Architecture-Based Memory Tuning
1.5 Workload Considerations
Speculative Locks: Concurrent Execution of Critical Sections in Shared-Memory Multiprocessors
2.2 Speculative Locks
2.3 Evaluation
2.4 Conclusions
Acknowledgments
Dynamic Verification of Cache Coherence Protocols
3.2 Dynamic Verification of Cache Coherence
3.3 SMP Coherence Checker Correctness Coverage and Specificity
3.4 Coherence Checker Overhead
3.5 Related Work
3.6 Future Work
References
Timestamp-Based Selective Cache Allocation
4.2 Related Work
4.3 Evaluation Methodology
4.5 Selective Allocation
4.6 Experimental Results
4.7 Future Work
Power-Efficient Cache Coherence
5.2 Snoopy Coherence Protocols
5.3 Methodology
5.4 Directory Protocols
5.5 Simulation Results
5.6 Conclusion
References
Improving Power Efficiency with an Asymmetric Set-Associative Cache
6.2 Related Work
6.3 Methodology and Modeling
6.4 Asymmetric Set-Associative Cache
6.5 Results
6.6 Discussion and Future Work
6.7 Conclusions
References
Memory Issues in Hardware-Supported Software Safety
7.2 Historical Context
7.3 Motivating Applications
7.4 Architectural Mechanisms
7.5 Results
7.6 Conclusions
References
Reconfigurable Memory Module in the RAMP System for Stream Processing
8.2 RAMP Architecture
8.3 Cluster Architecture
8.4 Memory Module Architecture
8.6 Controller
8.7 Handshake Blocks
8.8 Scan Chain Register
8.10 Conclusion
Performance of Memory Expansion Technology (MXT)
9.2 Overview of MXT Hardware
9.3 The MXT Memory Management Software
9.4 Performance Evaluation
9.5 Related Work
9.6 Conclusions
Profile-Tuned Heap Access
10.2 Related Work
10.3 Algorithms
10.4 Results
References
Array Merging: A Technique for Improving Cache and TLB Behavior
11.2 Related Work
11.3 Basic Notions
11.4 Cache-conscious Merging
11.5 Case study
11.6 Experimental Results
11.7 Conclusions
Software Logging under Speculative Parallelization
12.2 Speculative Parallelization and Versioning
12.3 Speculation Protocol Used
12.4 Efficient Software Logging
12.4.1 Log Operations
12.5 Evaluation Methodology
12.5.1 Simulation Environment
12.6 Evaluation
12.8 Conclusion
An Analysis of Scalar Memory Accesses in Embedded and Multimedia Systems
13.2 Previous Work
13.3 Experimental Setup
13.4 Results
13.5 Conclusion and Future Work
References
Bandwidth-Based Prefetching for Constant-Stride Arrays
14.2 Previous Work
14.3 Off-Chip Bandwidth
14.4 Cache Conflicts
14.5 Algorithm Details
14.7 Conclusion
References
Performance Potential of Effective Address Prediction of Load Instructions
15.2 Effective Address Predictors
15.3 Evaluation Methodology
15.4 Results
15.5 Related Work
15.6 Conclusion and Future Work
References
Evaluating Novel Memory System Alternatives for Speculative Multithreaded Computer Systems
16.2 Background and Motivation
16.3 The Superthreaded Architecture Model
16.4 Methodology
16.5 Results
16.6 Conclusion
References
Evaluation of Large L3 Caches Using TPC-H Trace Samples
17.2 TPC-H Traces
17.3 Evaluation Methodology
17.4 Simulation Results
17.6 Conclusion
Acknowledgments
Exploiting Intelligent Memory for Database Workloads
18.2 Related Work
18.3 FlexRAM
18.4 FlexDB
18.5 Experimental Setup
18.6 Experimental Results
Other editions
High Performance Memory Systems, Haldun Hadimioglu, David Kaeli, Jeffrey Kuskin, Ashwini Nanda, Josep Torrellas. Limited preview - 2011
High Performance Memory Systems, Haldun Hadimioglu, David Kaeli, Jeffrey Kuskin, Ashwini Nanda, Josep Torrellas. No preview available - 2012
High Performance Memory Systems, Haldun Hadimioglu, David Kaeli, Jeffrey Kuskin, Ashwini Nanda, Josep Torrellas. No preview available - 2012
Common terms and phrases
Address Predictor algorithm allocation applications array array merging asymmetric caches benchmarks byte cache block cache coherence cache conflicts cache line cache misses chapter checker checking cluster coherence misses coherence protocol compiler compression Computer Architecture configuration counter cycle data fetch DEAP direct-mapped effective address prediction entry evaluation execution Figure hardware hybrid IEEE implementation instructions International Symposium iteration L3 cache latency level-1 data cache logging lookup loop loop fusion memory accesses memory hierarchy memory module memory system merging miss rate Miss ratio multiprocessor optimization P.Array P.Mem parallel performance physical memory pipeline pointer power consumption prefetch Proceedings processor queries references request reuse scalar scan chain serial snooping set-associative cache shared level-1 shows simulation SPEC speculative locks speculative memory buffer speculative threads speedup structure Superscalar Superthreaded Table techniques thread units trace samples transaction workloads