Computer Science

Parallel Architectures

Parallel architectures refer to computer systems designed to carry out multiple operations simultaneously. They are characterized by the use of multiple processing units that work together to execute tasks in parallel, leading to improved performance and efficiency. Parallel architectures are commonly used in high-performance computing, scientific simulations, and data-intensive applications.

Written by Perlego with AI-assistance

6 Key excerpts on "Parallel Architectures"

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.
  • Microelectronics
    • Jerry C. Whitaker (Author)
    • 2018 (Publication Date)
    • CRC Press (Publisher)

    ...Multithreaded architectures take advantage of these advances to obtain high-performance systems. FIGURE 16.5 Superscalar and superpipelined systems: (a) pipelined, (b) superpipelined, (c) superscalar. 16.4 Multiple Processor Systems Parallelism can be introduced into a computer system at several levels. Probably the simplest level is to have multiple processors in the system. If parallelism is to be incorporated at the processor level, usually one of the following structures is used: SIMD, MIMD, or multicomputers. A SIMD structure allows several data streams to be acted on by the same instruction stream, as shown in Fig. 16.6(a). Some problems map well into a SIMD architecture, which uses a single instruction stream and avoids many of the pitfalls of coordinating multiple streams. Usually, this structure requires that the data be bit serial, and this structure is used extensively in applications such as computer graphics and image processing. The SIMD structure provides significant throughput for these problems. For many applications that require a single data stream to be manipulated by a single instruction stream, the SIMD structure works slower than the other structures because only one instruction stream is active at a time. To overcome this difficulty, structures that could be classified as a combination SIMD/MIMD structure have been applied. In a SIMD system, one instruction stream may control thousands of data streams. Each operation is performed on all data streams simultaneously. MIMD systems allow several processes to share a common set of processors and resources, as shown in Fig. 16.6(b). Multiple processors are joined together in a cooperating environment to execute programs. Typically, one process executes on each processor at a time...
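
    A minimal Python sketch of the distinction drawn above, under illustrative assumptions (the array contents and worker functions are invented): a NumPy vectorized expression stands in for the SIMD case, where one instruction stream acts on many data elements at once, while two independent worker functions on separate threads stand in for the MIMD case, where each processor follows its own instruction stream.

    # SIMD vs. MIMD, sketched in Python (illustrative only).
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    data = np.arange(8, dtype=np.float64)

    # SIMD style: a single instruction stream ("multiply by 2, add 1")
    # is applied to every element of the data stream simultaneously.
    simd_result = data * 2.0 + 1.0

    # MIMD style: each worker runs its own instruction stream on its own data.
    def worker_a(x):
        return x.sum()           # one processor sums its slice

    def worker_b(x):
        return (x ** 2).max()    # another processor squares its slice and takes the max

    with ThreadPoolExecutor(max_workers=2) as pool:
        a = pool.submit(worker_a, data[:4])
        b = pool.submit(worker_b, data[4:])
        mimd_results = (a.result(), b.result())

    print(simd_result, mimd_results)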

  • The Electrical Engineering Handbook

    ...The extraction of coarse-grained parallelism from a software description and, indeed, the study of languages used to describe parallel software are a flourishing area of research. With the introduction of the single-chip multiprocessor (Olukotun et al., 1996), the dividing line between research into high-performance system architectures and high-performance processors is becoming blurred. Very coarse-grained programmable logic devices such as Chess (Marshall et al., 1999) or the RAW machines proposed by Waingold et al. (1997) can be considered radically different forms of multiprocessor architecture. These approaches eliminate traditional instruction-set interfaces and instead rely heavily on compilation to directly customize the hardware to a particular application. Relying on compilation is possible because the hardware consists of a simple regular array of interchangeable processing units. Clearly, this blurs the boundary between compiler research and traditional high-level synthesis, another area under intense investigation (DeHon and Wawrzynek, 1999). Extending this approach of exposing the inner architecture of a processor to the compiler even further results in the possibility of more fine-grained reconfigurable computing techniques using field programmable gate arrays (Luk et al.,). 2.6 Summary It is becoming increasingly attractive to build computer systems containing several interacting processors. In general, it is significantly more cost-effective to exploit the parallelism inherent in an algorithm by using multiprocessor approaches than it is to design a single faster uniprocessor. Multiprocessor architectures have been categorized by the existence of single or multiple instruction and data streams. This chapter has examined multiple-instruction, multiple-data (MIMD) architectures in some detail. The most common form of MIMD multiprocessor arrangement is one of symmetric multiprocessors (SMP)...

  • A Workout in Computational Finance
    • Andreas Binder, Michael Aichinger (Authors)
    • 2013 (Publication Date)
    • Wiley (Publisher)

    ...In shared memory architectures, all processors have access to the whole available memory as a global address space, although access speeds to different parts of the memory may not be uniform on some architectures. In distributed memory architectures, each processor has its own local memory, and communicates with others by sending messages via a communication network. Distributed memory systems are often composed of smaller shared memory systems connected by a fast network. Another way of classifying parallel architectures is by their execution model. Flynn’s taxonomy (Barney) distinguishes between four classes – the two most relevant for present hardware are MIMD (multiple instruction, multiple data) and SIMD (single instruction, multiple data). In MIMD systems, the processing units are capable of executing different instruction streams on different data independently, as schematically shown in Figure 18.1. Multicore CPUs, for instance, fall into this class. In SIMD architectures, on the other hand, all processing units execute the same instruction at any given time, but each processing unit can operate on different data. This way, certain problems can be parallelized very efficiently, as shown in Figure 18.2. Streaming multiprocessors contained in modern GPUs are of this type. FIGURE 18.1 The figure shows the principle of MIMD processors, where every processor may be executing a different instruction stream. Currently, computers built on the MIMD architecture are the most common type of parallel computers. Note that many MIMD architectures nowadays also include SIMD execution sub-components. FIGURE 18.2 The figure shows the working principle of SIMD processors for the example of an element-wise multiplication of two arrays A and B. Note how a single instruction stream is executed on multiple data...
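
    The element-wise multiplication of two arrays A and B described for Figure 18.2 can be sketched as follows (the array values are invented for illustration): the explicit loop corresponds to issuing one multiplication at a time, while the NumPy expression corresponds to the SIMD-style view in which a single instruction operates on all elements at once.

    # Element-wise multiplication of A and B: loop view vs. SIMD-style view.
    import numpy as np

    A = np.array([1.0, 2.0, 3.0, 4.0])
    B = np.array([10.0, 20.0, 30.0, 40.0])

    # Sequential view: one multiplication per step.
    C_loop = np.empty_like(A)
    for i in range(len(A)):
        C_loop[i] = A[i] * B[i]

    # SIMD-style view: the same single operation applied to all elements at once.
    C_simd = A * B

    assert np.allclose(C_loop, C_simd)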

  • VLSI Design
    • M. Michael Vai (Author)
    • 2017 (Publication Date)
    • CRC Press (Publisher)

    ...Chapter 11 Parallel Structures Two brains are better than one … Pipelining and parallel processing can be used to improve system throughputs, often significantly. If an application either processes a large amount of data or contains a large number of iterations (i.e., loops), it is a candidate for parallelism exploration. In a parallel processing system, processing elements cooperate to produce the desired result. This requires the creation of a parallel algorithm which, in many cases, involves the parallelization of an existing sequential algorithm. Parallel architectures and algorithms for general-purpose parallel processing have been extensively studied for several decades. We are interested in the creation and application of application-specific parallel processing techniques enabled by the low cost and high density of VLSI circuits. General-purpose parallel processing is discussed in this chapter to provide the necessary background for the VLSI parallel algorithm/architecture development to be presented in Chapter 12. There is one main difference between general-purpose parallel processing and special-purpose parallel architectures. With only a few exceptions, general-purpose parallel architectures employ only a few relatively powerful processing elements (PEs), while special-purpose parallel architectures have the potential of using a large number of simple PEs. We use the term “PE” instead of “processor” since the latter often implies a microprocessor. In the realm of VLSI, a PE could be as simple as an integrated multiplier and accumulator. 11.1 Parallel Architectures An informal definition for the term “parallel processing” is that multiple PEs are utilized in a coordinated manner to support the solving of a problem...
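
    A minimal sketch, not the book's design, of the point that a PE can be as simple as an integrated multiplier and accumulator: a toy MAC PE class and a small array of such PEs cooperating to compute a dot product. The class and function names are illustrative assumptions.

    # A processing element reduced to a multiplier and an accumulator.
    class MacPE:
        def __init__(self):
            self.acc = 0.0

        def mac(self, a, b):
            self.acc += a * b    # multiply and accumulate

    def dot_product(x, y, num_pes=4):
        # Distribute element pairs over a small array of simple PEs.
        pes = [MacPE() for _ in range(num_pes)]
        for i, (a, b) in enumerate(zip(x, y)):
            pes[i % num_pes].mac(a, b)
        # Combine the partial results held in each PE's accumulator.
        return sum(pe.acc for pe in pes)

    print(dot_product([1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1]))  # 120.0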

  • Parallel Programming for Modern High Performance Computing Systems

    ...Furthermore, volunteer-based computing systems are discussed, which can be lower-cost alternative approaches suitable for selected problems. In these cases, however, the reliability of computations, as well as privacy, might be a concern. Finally, for completeness, a grid-based approach is discussed as a way to integrate clusters into larger computing systems. Chapter 3 first describes the main concepts related to parallelization. These include data partitioning and granularity, communication, allocation of data, load balancing, and how these elements may impact the execution time of a parallel application. Furthermore, the chapter introduces important metrics such as speed-up and parallel efficiency that are typically measured in order to evaluate the quality of parallelization. The chapter presents the main parallel processing paradigms, their concepts, control and data flow, and potential performance issues and optimizations. These are abstracted from programming APIs, described in general terms, and then followed by implementations in the following chapters. Chapter 4 introduces the basic and important parts of selected popular APIs for programming parallel applications. For each API, a sample application is presented. Specifically, the following APIs are presented: 1.  Message Passing Interface (MPI) for parallel applications composed of processes that can exchange messages between each other...
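
    The speed-up and parallel efficiency metrics mentioned above are commonly defined as S = T1 / Tp and E = S / p, where T1 is the sequential run time and Tp the run time on p processing units. A small sketch with invented timings:

    # Speed-up and parallel efficiency from measured run times (timings are made up).
    def speedup(t_serial, t_parallel):
        return t_serial / t_parallel

    def parallel_efficiency(t_serial, t_parallel, p):
        return speedup(t_serial, t_parallel) / p

    t1 = 120.0   # seconds on one core (hypothetical)
    tp = 18.0    # seconds on eight cores (hypothetical)
    p = 8

    print(f"speed-up   = {speedup(t1, tp):.2f}")                 # ~6.67
    print(f"efficiency = {parallel_efficiency(t1, tp, p):.2f}")  # ~0.83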

  • Software Engineering for Embedded Systems

    Methods, Practical Techniques, and Applications

    • Robert Oshana, Mark Kraeling (Authors)
    • 2019 (Publication Date)
    • Newnes (Publisher)

    ...22 shows the scalable nature of data parallelism. Fig. 22 Data parallelism is scalable with the data size. In the example given in Fig. 23, an image is decomposed into sections or “chunks” and partitioned to multiple cores to process in parallel. The “image in” and “image out” management tasks are usually performed by one of the cores (an upcoming case study will go into this in more detail). Fig. 23 Data parallel approach. 4.3 Task Parallelism Task parallelism distributes different applications, processes, or threads to different units. This can be done either manually or with the help of the operating system. The challenge with task parallelism is how to divide the application into multiple threads. For systems with many small units, such as a computer game, this can be straightforward. However, when there is only one heavy and well-integrated task, the partitioning process can be more difficult and often faces the same problems associated with data parallelism. Fig. 24 is an example of task parallelism. Instead of partitioning data to different cores, the same data are processed by each core (task), but each task is doing something different on the data. Fig. 24 Task parallel approach. Task parallelism is about functional decomposition. The goal is to assign tasks to distinct functions in the program. This can only scale to a constant factor. Each functional task, however, can also be data parallel. Fig. 25 shows this. Each of these functions (atmospheric, ocean, data fusion, surface, wind) can be allocated to a dedicated core, but the scalability is limited to a constant factor. Fig. 25 Function allocation in a multicore system (scalability limited). 5 Multicore Programming Models A “programming model” defines the languages and libraries that create an abstract view of a machine. For multicore programming, the programming model should consider the following: • Control—this part of the programming model defines how parallelism is created and how dependencies (orderings) are enforced...
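
    A minimal sketch of the two patterns described above, assuming Python's multiprocessing module stands in for the cores (the functions and the toy "image" are invented for illustration): the data-parallel case applies the same operation to different chunks of the data, while the task-parallel case applies different functions to the same data.

    # Data parallelism vs. task parallelism on a toy 1-D "image".
    from multiprocessing import Pool

    def brighten(chunk):
        # Identical work applied to every chunk (data parallelism).
        return [min(pixel + 10, 255) for pixel in chunk]

    def histogram(data):
        # One of several different tasks applied to the same data (task parallelism).
        counts = {}
        for pixel in data:
            counts[pixel] = counts.get(pixel, 0) + 1
        return counts

    def mean(data):
        return sum(data) / len(data)

    if __name__ == "__main__":
        image = list(range(0, 256, 16))
        chunks = [image[i:i + 4] for i in range(0, len(image), 4)]

        with Pool(processes=4) as pool:
            # Data parallel: the same function on different chunks.
            processed = pool.map(brighten, chunks)
            # Task parallel: different functions on the same data.
            h = pool.apply_async(histogram, (image,))
            m = pool.apply_async(mean, (image,))
            print(processed, h.get(), m.get())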