Data Access and Storage Management for Embedded Programmable Processors
Francky Catthoor, K. Danckaert, K.K. Kulkarni, E. Brockmeyer, Per Gunnar Kjeldsberg, T. van Achteren, Thierry Omnes
Springer Science & Business Media, 14 March 2013 - 306 pages
Data Access and Storage Management for Embedded Programmable Processors gives an overview of the state of the art in system-level data access and storage management for embedded programmable processors. The targeted application domain covers complex embedded real-time multi-media and communication applications. Many of these applications are data-dominated in the sense that their cost-related aspects, namely power consumption and footprint, are heavily influenced (if not dominated) by the data access and storage aspects. The material is mainly based on research at IMEC in this area in the period 1996-2001. In order to deal with the stringent timing requirements and the data-dominated characteristics of this domain, we have adopted a target architecture style that is compatible with modern embedded processors, and we have developed a systematic step-wise methodology to make the exploration and optimization of such applications feasible in a source-to-source precompilation approach.
In the first part of the book, we introduce the context and motivation, followed by a once-over-lightly view of the entire approach, illustrated on a relevant driver from the targeted application domain.

In part 2, we show how source-to-source code transformations play a crucial role in solving the previously mentioned data transfer and storage bottleneck in modern processor architectures for multi-media and telecommunication applications. This is especially true for embedded applications, where cost issues like memory footprint and power consumption are vital. It is also shown that many of these code transformations can be defined in a platform-independent way. The resulting optimized code behaves better on any of the modern platforms. The steps include global data-flow and loop transformations, data reuse decisions, high-level estimators, and the link with parallelisation and multi-processor partitioning.

In part 3, we discuss our research efforts relating to the mapping of embedded applications to specific memory organisations in embedded programmable processors. In a traditional processor-based environment, compilers perform memory optimizations assuming a fully fixed hardware target architecture with only maximal performance in mind. However, in an embedded context, cost issues, especially power consumption and memory footprint, also play a dominant role. Usually the timing requirements are given, and the application designer is mostly interested in the trade-off between the timing characteristics of the different application tasks and their cost effects. Pareto-type trade-off curves are the most suitable vehicle for addressing this design problem. The steps involved here include the storage cycle budget distribution, support of modern memory architectures like SDRAMs, and cache-related issues.
RELATED COMPILER WORK ON DATA TRANSFER AND STORAGE MANAGEMENT
GLOBAL LOOP TRANSFORMATIONS
AUTOMATED DATA REUSE EXPLORATION TECHNIQUES
STORAGE CYCLE BUDGET DISTRIBUTION
Systemwide energy cost versus cycle budget tradeoff
CONCLUSIONS AND FUTURE WORK