Monday 28 May 2012

Heterogeneous computing


Heterogeneous computing systems refer to electronic systems that use a variety of different types of computational units. A computational unit could be a general-purpose processor (GPP), a special-purpose processor (e.g. a digital signal processor (DSP) or graphics processing unit (GPU)), a co-processor, or custom acceleration logic (an application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA)). In general, a heterogeneous computing platform consists of processors with different instruction set architectures (ISAs). The demand for increased heterogeneity in computing systems is partially due to the need for high-performance, highly reactive systems that interact with other environments (audio/video systems, control systems, networked applications, etc.). In the past, huge advances in technology and frequency scaling allowed the majority of computer applications to increase in performance without requiring structural changes or custom hardware acceleration. While these advances continue, their effect on modern applications is not as dramatic, as other obstacles such as the memory wall and power wall come into play.[1][2] Now, with these additional constraints, the primary method of gaining extra performance out of computing systems is to introduce additional specialized resources, thus making a computing system heterogeneous.[3][4] This allows a designer to use multiple types of processing elements, each able to perform the tasks that it is best suited for.[5] The addition of extra, independent computing resources means that most heterogeneous systems can also be considered parallel computing, or multi-core, systems. Another term sometimes seen for this type of computing is "hybrid computing".[6] Hybrid-core computing is a form of heterogeneous computing wherein asymmetric computational units coexist with a "commodity" processor.
The level of heterogeneity in modern computing systems is gradually rising as increases in chip area and further scaling of fabrication technologies allow formerly discrete components to become integrated parts of a system-on-chip (SoC). As an example, many new processors now include built-in logic for interfacing with other devices (SATA, PCI, Ethernet, RFID, radios, UARTs, and memory controllers), as well as programmable functional units and hardware accelerators (GPUs, encryption co-processors, programmable network processors, A/V encoders/decoders, etc.).

Common features


Heterogeneous computing systems present new challenges not found in typical homogeneous systems. The presence of multiple processing elements raises all of the issues involved with homogeneous parallel processing systems, while the level of heterogeneity in the system can introduce non-uniformity in system development, programming practices, and overall system capability. Areas of heterogeneity can include:[7]

ISA or instruction set architecture

Compute elements may have different instruction set architectures, leading to binary incompatibility.

ABI or application binary interface

Compute elements may interpret memory in different ways. This may include endianness, calling convention, and memory layout, and depends on both the architecture and the compiler being used.
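The same bytes can decode to different values depending on byte order. A minimal sketch, using Python's struct module to stand in for two compute elements that disagree on endianness:

```python
import struct

# A 32-bit value serialized by a big-endian compute element.
raw = struct.pack(">I", 0x01020304)

# A big-endian consumer recovers the original value...
as_big = struct.unpack(">I", raw)[0]

# ...while a little-endian consumer sees the bytes in reverse order.
as_little = struct.unpack("<I", raw)[0]

print(hex(as_big))     # 0x1020304
print(hex(as_little))  # 0x4030201
```

This is why heterogeneous systems must agree on a wire or memory format (an ABI) before exchanging anything wider than a byte.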

API or application programming interface

Library and OS services may not be uniformly available to all compute elements.

Low-Level Implementation of Language Features

Language features such as functions and threads are often implemented using function pointers, a mechanism which requires additional translation or abstraction when used in heterogeneous environments.
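A common form of that abstraction is to replace raw code addresses with symbolic handles that each compute element resolves to its own implementation, since a code address valid on the CPU means nothing to a DSP or GPU. A hedged sketch (the kernel names, element names, and launch helper below are all hypothetical, not a real runtime API):

```python
# Hypothetical per-element implementations of the same kernel.
def saxpy_cpu(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_dsp(a, x, y):
    # Stand-in: a real system would hand this off to DSP firmware.
    return [a * xi + yi for xi, yi in zip(x, y)]

# The "function pointer" becomes a name, looked up per compute element.
KERNELS = {
    "saxpy": {"cpu": saxpy_cpu, "dsp": saxpy_dsp},
}

def launch(kernel, element, *args):
    return KERNELS[kernel][element](*args)

print(launch("saxpy", "cpu", 2.0, [1.0, 2.0], [3.0, 4.0]))  # [5.0, 8.0]
```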

Memory Interface and Hierarchy

Compute elements may have different cache structures and cache coherency protocols, and memory access may be uniform or non-uniform (NUMA). Differences can also be found in the ability to read arbitrary data lengths, as some processors/units can only perform byte-, word-, or burst accesses.
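A unit restricted to byte accesses must assemble wider values in software. A minimal sketch in Python, emulating a byte-only read path and checking it against a whole-word read (the function name is illustrative):

```python
import struct

def read_u32_bytewise(buf, off):
    """Assemble a little-endian 32-bit word from four single-byte reads,
    as software must on a unit that can only issue byte accesses."""
    return (buf[off]
            | buf[off + 1] << 8
            | buf[off + 2] << 16
            | buf[off + 3] << 24)

buf = bytes([0x78, 0x56, 0x34, 0x12, 0xAA])

# A word-capable unit reads the same value in one access; the results
# agree, including at the unaligned offset 1.
assert read_u32_bytewise(buf, 0) == struct.unpack_from("<I", buf, 0)[0]
assert read_u32_bytewise(buf, 1) == struct.unpack_from("<I", buf, 1)[0]
print(hex(read_u32_bytewise(buf, 0)))  # 0x12345678
```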

Interconnect

Compute elements may have differing types of interconnect aside from basic memory/bus interfaces. This may include dedicated network interfaces, direct memory access (DMA) devices, mailboxes, FIFOs, scratchpad memories, etc.

Heterogeneous platforms often require the use of multiple compilers in order to target the different types of compute elements found in such platforms. This results in a more complicated development process compared to homogeneous systems, as multiple compilers and linkers must be used together in a cohesive way in order to properly target a heterogeneous platform. Interpretive techniques can be used to hide heterogeneity, but the cost (overhead) of interpretation often requires the use of just-in-time compilation mechanisms, which result in a more complex run-time system that may be unsuitable in embedded or real-time scenarios.

Heterogeneous computing platforms


Texas Instruments OMAP
Analog Devices Blackfin
IBM Cell
SpursEngine
Emotion Engine
Intel IXP Network Processors
Xilinx Platform FPGAs (Virtex-II Pro, Virtex 4 FX, Virtex 5 FXT)
Cray XD1
SRC Computers SRC-6 and SRC-7
Convey Computer Corporation's HC-1
Atmel Diopsis
Intel Sandy Bridge and AMD Fusion CPUs
Intel "Stellarton" (Atom + FPGA)

Applications

A sensor-grid-based architecture has many applications, such as environmental and habitat monitoring, healthcare monitoring of patients, weather monitoring and forecasting, military and homeland security surveillance, tracking of goods and manufacturing processes, safety monitoring of physical structures and construction sites, smart homes and offices, and many other uses currently beyond our imagination. Various architectures can be used for such applications, as well as different kinds of data analysis and data mining. Examples of architectures that integrate a mobile sensor network and Grids are given in [7].

Programming Heterogeneous Computing Architectures

Programming heterogeneous machines can be difficult, since developing programs that make the best use of the characteristics of different processors increases the programmer's burden. It increases code complexity and decreases portability by requiring hardware-specific code to be interleaved throughout application code.[8] Balancing the application workload across processors can be problematic, especially given that they typically have different performance characteristics. There are different conceptual models to deal with the problem, for example using a coordination language and program building blocks (programming libraries and/or higher-order functions). Each block can have a different native implementation for each processor type. Users simply program using these abstractions, and an intelligent compiler chooses the best implementation based on the context.[9]
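The building-block model above can be sketched as a library of blocks, each with one native implementation per processor type, plus a selection heuristic. Everything here (the block name, the element names, and the size threshold) is an illustrative assumption, not an actual coordination-language API:

```python
# Per-element implementations of one building block.
def dot_scalar(x, y):
    return sum(a * b for a, b in zip(x, y))

def dot_accelerator(x, y):
    # Stand-in for an offloaded version; same result, different engine.
    return sum(a * b for a, b in zip(x, y))

IMPLS = {"dot": {"cpu": dot_scalar, "accel": dot_accelerator}}

def choose_element(n, threshold=1024):
    """Toy heuristic: offload only when the work amortizes transfer cost."""
    return "accel" if n >= threshold else "cpu"

def dot(x, y):
    # The user calls the abstract block; the runtime picks an implementation.
    return IMPLS["dot"][choose_element(len(x))](x, y)

print(dot([1, 2, 3], [4, 5, 6]))  # 32
```

In a real system the chooser would weigh data placement, transfer cost, and each element's measured performance rather than a fixed length threshold.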