On March 11, 2015 the ALMA consortium presented a paper titled 'Reuse Distance Analysis for Locality Optimization in Loop-Dominated Application' at DATE 2015. This paper discusses MemAddIn, a compiler-assisted dynamic code analysis tool that analyzes C code and exposes the parts that are critical for memory-related optimizations on embedded systems, optimizations that can heavily affect system performance, power and cost. The tool includes enhanced features for data reuse distance analysis and source code transformation recommendations for temporal locality optimization. Several data reuse distance measurement algorithms have been implemented, leading to different trade-offs between accuracy and profiling execution time. The proposed tool can be easily and seamlessly integrated into different software development environments, offering a unified environment for application development and optimization. The novelties of this work over a similar optimization tool are also discussed. MemAddIn has been applied for the dynamic computation of data reuse distance in a number of different applications. Experimental results demonstrate the effectiveness of the tool through the analysis and optimization of a realistic image processing application.
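MemAddIn's own measurement algorithms are not detailed in this summary; as a minimal illustration of the underlying concept, the reuse distance of an access is the number of distinct addresses touched since the previous access to the same address. A straightforward (non-optimized) sketch in Python, with all names being illustrative assumptions:

```python
from collections import OrderedDict

def reuse_distances(trace):
    """For each access in the trace, return the number of distinct
    addresses touched since the previous access to the same address
    (infinity for a first-time access)."""
    last_seen = OrderedDict()  # addresses in order of most recent access
    distances = []
    for addr in trace:
        if addr in last_seen:
            keys = list(last_seen)
            # distinct addresses accessed more recently than addr
            distances.append(len(keys) - keys.index(addr) - 1)
            last_seen.move_to_end(addr)
        else:
            distances.append(float("inf"))
            last_seen[addr] = True
    return distances

print(reuse_distances(["a", "b", "c", "a", "b"]))  # [inf, inf, inf, 2, 2]
```

Production tools use tree- or sampling-based algorithms for this computation, which is exactly the accuracy/profiling-time trade-off the paper discusses; this linear-scan version is only the definitional baseline.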
On Nov 5th, 2014 the ALMA consortium presented a paper titled 'Toward Scalable Source Level Accuracy Analysis for Floating-point to Fixed-point Conversion' at ICCAD 2014. In this paper, we explain how polyhedral methods can be used to build high-level, more scalable accuracy models, extending the applicability of analytical approaches to verifying floating-point to fixed-point conversion. In an embedded systems context, many numerical algorithms must be converted from floating-point to fixed-point to meet cost and area constraints. While cheap and power-efficient, fixed-point arithmetic comes at the price of reduced accuracy, and the designer must make sure that accuracy constraints are satisfied. Usually, one resorts to (slow) simulations. Analytical approaches have been proposed in order to avoid costly simulations and allow more thorough design-space exploration. However, their applicability is limited by severe scalability issues. We explain how to overcome these issues using polyhedral methods.
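The paper's polyhedral accuracy models are not reproduced here; for intuition, the slow simulation baseline they replace amounts to quantizing operands to a fixed number of fractional bits and measuring the resulting error against the floating-point reference. A minimal sketch (the Q-format choice and the dot-product example are illustrative assumptions):

```python
def to_fixed(x, frac_bits):
    """Quantize a float to a fixed-point grid with `frac_bits`
    fractional bits (round to nearest), returned as a float."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

# Simulation-based accuracy check: compare a floating-point dot product
# against its fixed-point counterpart with 8 fractional bits.
a = [0.1, -0.25, 0.7]
b = [0.33, 0.5, -0.125]
exact = sum(x * y for x, y in zip(a, b))
fixed = sum(to_fixed(x, 8) * to_fixed(y, 8) for x, y in zip(a, b))
print(abs(exact - fixed))  # quantization error, well below the 2**-8 step
```

Repeating such simulations over many inputs and many candidate bit-widths is what makes the approach slow, and what analytical error models aim to avoid.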
On Aug 28th, 2014 the ALMA consortium presented a paper titled 'Assigning and Scheduling Hierarchical Task Graphs to Heterogeneous Resources' at PATAT 2014.
Task scheduling is an important problem with many practical applications. In the paper, the authors present a new method for solving complex real-life scenarios: first, a series of subproblems is solved using Hierarchical Task Graphs (HTGs), graphs that use existing “logical groupings” of tasks to describe hierarchy; then the method gradually works from the deeper levels of the HTG up to the top level until a solution to the full problem emerges.
The paper is published in the Proceedings of the 10th International Conference on the Practice and Theory of Automated Timetabling (PATAT 2014), pp. 40-52, ISBN: 978-0-9929984-0-0.
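The paper's decomposition is not reproduced here; the bottom-up idea, schedule an inner grouping first and then treat it as a single task whose duration is that sub-schedule's makespan, can be sketched as follows. All names are illustrative, and the trivial serial "scheduler" stands in for whatever solver handles each level:

```python
def schedule_serial(durations):
    """Trivial single-processor stand-in scheduler for a flat task
    group: the makespan is just the sum of the task durations."""
    return sum(durations.values())

def collapse(htg):
    """Bottom-up pass over a hierarchical task graph: each nested
    grouping (a dict) is scheduled first and replaced by a single
    composite task whose duration is the sub-schedule's makespan."""
    flat = {}
    for name, node in htg.items():
        flat[name] = collapse(node) if isinstance(node, dict) else node
    return schedule_serial(flat)

# Outer level contains one plain task and one nested grouping.
htg = {"init": 2, "kernel": {"load": 1, "compute": 4, "store": 1}}
print(collapse(htg))  # 8
```

In the actual method, each level's subproblem is a genuine assignment-and-scheduling instance over heterogeneous resources rather than a serial sum; only the collapse structure is illustrated here.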
On Aug 28th, 2014 the ALMA consortium presented a paper titled 'A Hierarchical Architecture Description for Flexible Multicore System Simulation' at ISPA 2014.
System simulation is a crucial step within an iterative optimization and parallelization flow to find the optimal distribution of tasks and resources of a parallel application on a multicore system.
In this paper, we present a hierarchical architecture description in combination with a flexible simulation environment. The hierarchical description enables the simulation framework to simulate each system module at a different abstraction level. Application and software developers can thus select the appropriate accuracy level of the system simulation modules to find an optimal trade-off between simulation speed and simulation accuracy.
On May 19th, 2014 the ALMA consortium presented a paper titled 'A hybrid ILP-CP model for mapping Directed Acyclic Task Graphs to multicore architectures' during the 21st Reconfigurable Architectures Workshop at the 28th Annual International Parallel & Distributed Processing Symposium (IPDPS 2014) in Phoenix, Arizona.
Directed Acyclic Task Graphs serve as a typical kernel representation for embedded applications. Modern embedded multicore architectures, which provide a large number of heterogeneous resources, raise new challenges for the efficient mapping and scheduling of task DAGs. In this paper, a hybrid Integer Linear Programming - Constraint Programming method that uses Benders decomposition is used to find provably optimal solutions. The proposed method is augmented with cut generation schemes for accelerating the solution process. Experimental results show that the proposed method systematically outperforms an ILP-based solution method. The paper is published in the Proceedings of the 2014 IEEE 28th International Parallel & Distributed Processing Symposium Workshops.
The HiPEAC newsletter 38 features a one-page article on the ALMA project titled 'Wondering how to program your next multi-core?'
Your next embedded platform will probably have more processor cores than the previous one. Do you, as an embedded systems programmer, have the tools to program and exploit the performance of the many cores? At DATE 2014, the ALMA consortium showed how ALMA makes life easier for multi-core programmers with a first demonstration of the automatic parallelization approach of the ALMA tool chain. Read more in the HiPEAC newsletter....
The ALMA consortium submitted a paper titled 'Profile-Guided Compilation of Scilab Algorithms for Multiprocessor Systems' to ARC 2014.
The paper describes the iterative approach within the ALMA toolchain for optimizing application parallelization. The profile-guided approach is based on application and system simulation, which generates valuable performance information used as feedback to the application parallelization tools. This allows the toolchain to improve the performance and quality of the generated parallel applications.
During ARC, the ALMA consortium also organises a special session on the ALMA tool chain.
On December 10, 2013, the ALMA consortium submitted a paper titled 'A flexible implementation of the PSO algorithm for fine- and coarse-grained reconfigurable embedded systems' during ReConFig 2013.
This paper evaluates the parallel implementation of the Particle Swarm Optimization (PSO) algorithm on the Kahrisma architecture. The performance results are compared to the results obtained for a software solution running on a Microblaze-based SoC implemented on an FPGA.
The paper is published in the Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig), 2013.
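The Kahrisma-specific implementation is not reproduced here; for readers unfamiliar with the algorithm itself, a minimal global-best PSO in plain Python, minimizing the standard sphere function, looks roughly as follows (the parameter values are common textbook defaults, not the paper's settings):

```python
import random

def pso(f, dim, bounds, swarm=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    """Minimal global-best Particle Swarm Optimization minimizing f
    over the box [bounds[0], bounds[1]]^dim."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]              # per-particle best position
    pbest_val = [f(p) for p in pos]
    g = min(range(swarm), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # swarm-wide best
    for _ in range(iters):
        for i in range(swarm):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # inertia + cognitive pull + social pull
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

random.seed(1)
best, best_val = pso(lambda x: sum(v * v for v in x), dim=3, bounds=(-5, 5))
print(best_val)  # typically very close to 0, the sphere minimum
```

The per-particle velocity updates are independent within an iteration, which is what makes PSO an attractive target for the fine- and coarse-grained parallel implementations the paper evaluates.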
On Oct 10, 2013 the ALMA consortium presented a paper titled 'Dynamic Source Code Analysis for Memory Hierarchy Optimization in Multimedia Applications' at DASIP 2013.
Realizing image and signal processing algorithms in embedded systems is a three-step process comprising algorithmic design, implementation, and mapping to a target architecture and memory hierarchy.
On Sept 11, 2013 the ALMA consortium's paper titled 'Compiling Scilab to high performance embedded multicore systems' was published in Elsevier's Microprocessors and Microsystems (MICPRO) journal.
The mapping process of high performance embedded applications to today’s multiprocessor system-on-chip devices suffers from a complex toolchain and programming process. The problem lies in expressing parallelism with a pure imperative programming language, which is commonly C. This traditional approach limits the mapping, partitioning and the generation of optimized parallel code, and consequently the achievable performance and power consumption of applications from different domains.
The Architecture oriented paraLlelization for high performance embedded Multicore systems using scilAb (ALMA) European project aims to bridge these hurdles through the introduction and exploitation of a Scilab-based toolchain which enables the efficient mapping of applications on multiprocessor platforms from a high level of abstraction. The holistic solution of the ALMA toolchain allows the complexity of both the application and the architecture to be hidden, which leads to better acceptance, reduced development cost, and shorter time-to-market.
Driven by the technology restrictions in chip design, the end of exponential growth of clock speeds, and an unavoidably increasing demand for computing performance, ALMA is a fundamental step forward in the necessary introduction of novel computing paradigms and methodologies.
Keywords: Software toolchain; Multi-processor system-on-chip; Scilab; Compilation; Fine- and coarse-grain parallelization.
Get the article 'Compiling Scilab to high performance embedded multicore systems'.....
On August 27th, 2013 the ALMA consortium submitted a paper titled 'Scheduling using Integer Programming in heterogeneous parallel execution environments' during MISTA 2013.
A computer program can be represented by a Directed Acyclic Graph (DAG) that captures the dependencies between the individual tasks in the program. This paper outlines a new Integer Programming model for scheduling tasks over multiple processors serving as the execution environment. The multi-level approach, called MATHL, is benchmarked and consistently outperforms other DAG approaches with regard to the minimum overall execution time (a.k.a. schedule length or makespan), even for cases consisting of several hundred nodes.
The paper is published in the Proceedings of the 6th Multidisciplinary International Conference on Scheduling (MISTA 2013), pp. 392-401.
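The MATHL model itself is not reproduced in this summary; as a point of reference, the kind of greedy heuristic baseline that exact methods are compared against, list scheduling a task DAG onto identical processors, can be sketched as follows (task names and durations are illustrative):

```python
from collections import deque

def list_schedule(durations, deps, num_procs):
    """Greedy list scheduling of a task DAG on identical processors.

    durations: {task: execution time}
    deps:      {task: iterable of predecessor tasks}
    Returns the makespan (overall schedule length)."""
    # Kahn's algorithm for a topological order of the tasks.
    indeg = {t: 0 for t in durations}
    succs = {t: [] for t in durations}
    for t, preds in deps.items():
        for p in preds:
            indeg[t] += 1
            succs[p].append(t)
    queue = deque(t for t in durations if indeg[t] == 0)
    procs = [0.0] * num_procs   # time at which each processor is next free
    finish = {}
    while queue:
        t = queue.popleft()
        # A task may start once all its predecessors have finished.
        ready = max((finish[p] for p in deps.get(t, ())), default=0.0)
        i = min(range(num_procs), key=lambda j: max(procs[j], ready))
        finish[t] = max(procs[i], ready) + durations[t]
        procs[i] = finish[t]
        for s in succs[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                queue.append(s)
    return max(finish.values())

# Diamond DAG: A -> {B, C} -> D, scheduled on two processors.
print(list_schedule({"A": 2, "B": 3, "C": 1, "D": 2},
                    {"B": ["A"], "C": ["A"], "D": ["B", "C"]}, 2))  # 7.0
```

An exact Integer Programming formulation instead searches over all feasible assignments and orderings, which is why it can prove optimality where a greedy heuristic like this cannot.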
On September 4, 2013 the ALMA consortium submitted a paper titled 'Coarse-grain Optimization and Code Generation for Embedded Multicore Systems' during the 16th Euromicro Conference on Digital System Design (DSD 2013).
This paper discusses coarse grain parallelism extraction and optimization issues and explains how parallel code is generated for the ALMA toolset.
The paper is published in the Proceedings of the 16th Euromicro Conference on Digital System Design (DSD 2013).
On July 30th, 2013 the ALMA consortium submitted a paper titled 'Coarse grain parallelization using Integer Programming' during the 11th IEEE International Conference on Industrial Informatics.
The paper presents key parts of the coarse grain parallelism optimization, which considers the hierarchical task graph of a program and produces an optimized parallel schedule.
The paper is published in the Proceedings of the 11th IEEE International Conference on Industrial Informatics.
On March 27, 2013 the ALMA consortium submitted a paper titled 'Coarse Grained Parallelism Optimization for Multicore Architectures: The ALMA Project Approach' during the ARC2013 conference.
The paper (published by Springer) discusses the coarse grained parallelism optimization step of the ALMA EU FP7 project. The current results look promising: using Integer Programming to provide optimal solutions to the problem model appears both feasible and efficient.
Keywords: Coarse grain parallelization, Scilab, Integer Programming.
Get the article 'Coarse Grained Parallelism Optimization for Multicore Architectures: The ALMA Project Approach' .....
On Dec 5, 2012 the ALMA consortium submitted a paper titled 'A Compilation- and Simulation-Oriented Architecture Description Language for Multicore Systems' during the EUC2012 conference.
The paper (see abstract on the right) explains the ALMA Architecture Description Language (ADL) and how its use makes the ALMA Toolchain more independent of the hardware. The ADL provides the necessary hardware architecture information for optimizing and parallelizing the application source code for multiprocessor Systems-on-Chip. The addition of simulation aspects to the ADL, and of a library-based system simulator to the ALMA Toolchain, further simplifies system and application performance evaluation, providing additional information for iterative application optimizations.
Get the article 'A Compilation- and Simulation-Oriented Architecture Description Language for Multicore Systems'.....
On Sept 6, 2012 the ALMA consortium submitted a paper titled 'From Scilab To High Performance Embedded Multicore Systems – The ALMA Approach' during the DSD 2012 conference.
The paper (see abstract on the right) explains how the ALMA parallelization approach will streamline the work for experts and narrow the entry gap for non-experts in parallel software development. Driven by the technology restrictions in chip design, the end of exponential growth of clock speeds, and an unavoidably increasing demand for computing performance, ALMA is a fundamental step forward in the necessary introduction of novel computing paradigms and methodologies.
Get the article 'From Scilab To High Performance Embedded Multicore Systems – The ALMA Approach'.....
On July 17, 2012 the ALMA consortium submitted a paper titled 'From Scilab to Multicore Embedded Systems: Algorithms and Methodologies' during the SAMOS 2012 conference.
The paper (see abstract on the right) explains that, in a nutshell, the ALMA parallelization approach attempts to manage the complexity associated with the creation of parallel software for multi-core systems by alternating focus between very localized and holistic program optimization strategies. In this manner, ALMA intends to narrow the entry gap for non-experts in parallel software development as well as to streamline the work for experts.
Get the article 'From Scilab to Multicore Embedded Systems: Algorithms and Methodologies'.....
On July 10, 2012 the ALMA consortium submitted a paper titled 'A Flexible Approach for Compiling Scilab to Reconfigurable Multi-Core Embedded Systems' during the ReCoSoc 2012 conference.
The paper (see abstract on the right) explains how the ALMA parallelization approach will streamline the work for experts and narrow the entry gap for non-experts in parallel software development.
Get the article 'A Flexible Approach for Compiling Scilab to Reconfigurable Multi-Core Embedded Systems'.....
On Jan 25, 2012, the ALMA consortium presented a poster titled 'Architecture oriented paraLlelization for high performance embedded Multicore systems using scilAb' during the HiPEAC 2012 conference.
The poster outlines the key features of the ALMA project. It shows Recore's and KIT's multi-core architectures, which will be used in validating the ALMA tool chain.
It also illustrates how ALMA makes a target hardware architecture into 'just another parameter' in the tool chain, freeing the programmer of a mandatory understanding of the target hardware architecture.