Computer Science

MIMD

MIMD, or Multiple Instruction, Multiple Data, is a parallel computing architecture where multiple processors execute different instructions on different pieces of data simultaneously. This allows for independent processing of multiple tasks, making it suitable for complex and diverse computational workloads. MIMD systems can be either shared memory or distributed memory architectures.
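The following is a minimal sketch of that idea on a shared-memory machine, assuming POSIX threads (the array contents and function names are illustrative, not from any of the excerpts below): two threads run different instruction streams on different data at the same time, which is what distinguishes MIMD from a single-instruction-stream design.

    /* MIMD on shared memory: two threads, two different instruction streams,
       two different data sets. Build with: gcc -pthread mimd_sketch.c */
    #include <pthread.h>
    #include <stdio.h>

    static double prices[4] = {10.0, 12.5, 9.8, 11.1};   /* data for thread 1 */
    static int    counts[4] = {3, 7, 2, 5};              /* data for thread 2 */

    /* Instruction stream 1: sum the price array. */
    static void *sum_prices(void *arg) {
        double s = 0.0;
        for (int i = 0; i < 4; i++) s += prices[i];
        printf("total price: %.2f\n", s);
        return NULL;
    }

    /* Instruction stream 2: find the largest count. */
    static void *max_count(void *arg) {
        int m = counts[0];
        for (int i = 1; i < 4; i++) if (counts[i] > m) m = counts[i];
        printf("max count: %d\n", m);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, sum_prices, NULL);  /* different code ...    */
        pthread_create(&t2, NULL, max_count, NULL);   /* ... on different data */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }

In a distributed-memory MIMD system the same two activities would run in separate processes that exchange results over a network instead of sharing the arrays directly.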

Written by Perlego with AI-assistance

4 Key excerpts on "MIMD"

Index pages curate the most relevant extracts from our library of academic textbooks. They have been created using an in-house natural language model (NLM), and each adds context and meaning to a key research topic.
  • Microelectronics
    • Jerry C. Whitaker (Author)
    • 2018 (Publication Date)
    • CRC Press (Publisher)
    Fig. 16.6(a). Some problems map well into a SIMD architecture, which uses a single instruction stream and avoids many of the pitfalls of coordinating multiple streams. Usually, this structure requires that the data be bit serial; it is used extensively in applications such as computer graphics and image processing. The SIMD structure provides significant throughput for these problems. For many applications that require a single data stream to be manipulated by a single instruction stream, the SIMD structure is slower than the other structures because only one instruction stream is active at a time. To overcome this difficulty, structures that could be classified as a combination SIMD/MIMD structure have been applied. In a SIMD system, one instruction stream may control thousands of data streams. Each operation is performed on all data streams simultaneously.
    MIMD systems allow several processes to share a common set of processors and resources, as shown in Fig. 16.6(b) . Multiple processors are joined together in a cooperating environment to execute programs. Typically, one process executes on each processor at a time. The difficulties with traditional MIMD architectures lie in fully utilizing the resources when instruction streams stall (due to data dependencies, control dependencies, synchronization problems, memory accesses, or I/O accesses) or in assigning new processes quickly once the current process has finished execution. An important problem with this structure is that processors may become idle due to improper load balancing. Implementing an operating system (OS) that can execute on the system without creating imbalances is important to maintain a high utilization of resources.
    FIGURE 16.6 System level parallelism: (a) SIMD, (b) MIMD, (c) multicomputer.
    The next system, which is also popular due to its simple connectivity, is the distributed system or multicomputer. A network connects independent processors as shown in Fig. 16.6(c) . Each processor is a separate entity, usually running an independent operating system process. A multicomputer will usually use message passing to exchange data and/or instruction streams between the processors. The main difficulties with the multicomputer are the latency involved in passing messages and the difficulty in mapping some algorithms to a distributed memory system.
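    The load-balancing concern raised above can be made concrete with a small sketch (not from the excerpt; POSIX threads and C11 atomics are assumed, and the task function is a placeholder): instead of statically assigning tasks to processors, each worker repeatedly claims the next unfinished task from a shared counter, so no processor sits idle while others still have work queued.

        /* Dynamic load balancing via a shared work queue (an atomic counter).
           Build with: gcc -pthread work_queue.c */
        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdio.h>

        #define NTASKS   64
        #define NTHREADS  4

        static atomic_int next_task = 0;        /* shared "queue" index       */
        static double results[NTASKS];

        static double do_task(int id) {         /* placeholder for real work  */
            double x = 0.0;
            for (int i = 0; i <= id * 1000; i++) x += 1.0 / (i + 1);
            return x;
        }

        static void *worker(void *arg) {
            for (;;) {
                int t = atomic_fetch_add(&next_task, 1);  /* claim the next task */
                if (t >= NTASKS) break;                   /* queue drained: stop */
                results[t] = do_task(t);
            }
            return NULL;
        }

        int main(void) {
            pthread_t th[NTHREADS];
            for (int i = 0; i < NTHREADS; i++) pthread_create(&th[i], NULL, worker, NULL);
            for (int i = 0; i < NTHREADS; i++) pthread_join(th[i], NULL);
            printf("last result: %f\n", results[NTASKS - 1]);
            return 0;
        }

    Because tasks of very different sizes are claimed one at a time, fast workers simply claim more of them, which is the same idea an operating system scheduler uses to keep MIMD processors busy.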
    16.5  Memory Hierarchy
    High-performance computer systems use a multiple level memory hierarchy ranging from small, fast cache memory to larger, slower main memory to improve performance. Parallelism can be introduced into a system through the memory hierarchy as depicted in Fig. 16.7
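    A small sketch of why the hierarchy matters (the array size is arbitrary, timing uses the standard C clock(), and the exact numbers are machine dependent): the same 2048 x 2048 matrix is summed twice, once walking memory sequentially so each cache line fetched from main memory is fully used, and once with a large stride so most of each cache line is wasted.

        /* Cache-friendly vs. cache-hostile traversal of the same data. */
        #include <stdio.h>
        #include <time.h>

        #define N 2048
        static double a[N][N];

        int main(void) {
            clock_t t0 = clock();
            double s1 = 0.0;
            for (int i = 0; i < N; i++)        /* row-major: consecutive addresses, */
                for (int j = 0; j < N; j++)    /* cache lines are reused            */
                    s1 += a[i][j];
            clock_t t1 = clock();
            double s2 = 0.0;
            for (int j = 0; j < N; j++)        /* column-major: strided accesses,   */
                for (int i = 0; i < N; i++)    /* most of each cache line is wasted */
                    s2 += a[i][j];
            clock_t t2 = clock();
            printf("sums %.1f %.1f  row-major %.3fs  column-major %.3fs\n",
                   s1, s2,
                   (double)(t1 - t0) / CLOCKS_PER_SEC,
                   (double)(t2 - t1) / CLOCKS_PER_SEC);
            return 0;
        }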
  • A Workout in Computational Finance
    • Andreas Binder, Michael Aichinger (Authors)
    • 2013 (Publication Date)
    • Wiley (Publisher)
    FIGURE 18.1 The figure shows the principle of MIMD processors, where every processor may be executing a different instruction stream. Currently, computers built on the MIMD architecture are the most common type of parallel computers. Note that many MIMD architectures nowadays also include SIMD execution sub-components.
    FIGURE 18.2 The figure shows the working principle of SIMD processors for the example of an element-wise multiplication of two arrays A and B. Note how a single instruction stream is executed on multiple data. Modern GPUs are often wide SIMD implementations which are capable of branches, loads, and stores on 128 or 256 bits at a time.
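    As a rough illustration of the 256-bit case mentioned above (x86 AVX intrinsics are assumed; the array contents are made up, and the length is kept a multiple of 8 to avoid a remainder loop), one multiply instruction operates on eight float elements of A and B at once:

        /* SIMD element-wise multiplication C[i] = A[i] * B[i] with 256-bit AVX.
           Build with: gcc -mavx simd_mul.c */
        #include <immintrin.h>
        #include <stdio.h>

        #define N 1024   /* multiple of 8, so no scalar remainder loop is needed */

        int main(void) {
            static float a[N], b[N], c[N];
            for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f; }

            for (int i = 0; i < N; i += 8) {          /* 8 floats = 256 bits         */
                __m256 va = _mm256_loadu_ps(&a[i]);   /* load 8 elements of A        */
                __m256 vb = _mm256_loadu_ps(&b[i]);   /* load 8 elements of B        */
                __m256 vc = _mm256_mul_ps(va, vb);    /* one instruction, 8 products */
                _mm256_storeu_ps(&c[i], vc);          /* store 8 results             */
            }
            printf("c[10] = %.1f\n", c[10]);          /* expect 20.0                 */
            return 0;
        }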
    As a consequence of the variety in hardware, different high-level programming frameworks and APIs (application programming interfaces) have evolved. Here, we only discuss a number of widely used ones, and again refer the reader to Barney for a more exhaustive list. In the Message Passing Interface (MPI) programming model, for instance, each process only has access to its own local memory; multiple processes can be executed either on the same physical machine and/or on an arbitrary number of machines connected via a network. Processes can communicate and exchange data by sending and receiving messages. This is a cooperative operation, i.e., data is sent by one process and explicitly received by another; any change in the receiving process’s memory is thus made with its explicit participation. MPI originated from distributed memory architectures, but can also be used on shared memory systems. An excellent tutorial for MPI can be found in Barney.
    The OpenMP framework is designed for shared memory MIMD systems and accomplishes parallelism by using threads, where a thread is the smallest unit of processing that can be scheduled by the operating system. It uses the fork-join model of parallel execution: a master thread creates a team of parallel threads (fork). The directives of the program enclosed in the parallel region (for example in a loop) are then executed in parallel. All threads have unrestricted access to the same shared memory, which is also used for communication between the threads. When the team’s threads complete the execution of the parallel region, they synchronize and terminate, leaving only the master thread (join). The fork-join model is schematically displayed in Figure 18.3
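    Two minimal sketches of the models described above, with made-up payloads and sizes rather than anything from the text. First, MPI point-to-point communication: rank 0 sends a value that only appears in rank 1's memory because rank 1 explicitly receives it (run with something like mpirun -np 2).

        /* Cooperative message passing with MPI. */
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            double payload = 3.14;
            if (rank == 0) {
                /* data leaves rank 0's private address space only via a send ...  */
                MPI_Send(&payload, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            } else if (rank == 1) {
                double received;
                /* ... and enters rank 1's memory only because rank 1 receives it */
                MPI_Recv(&received, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("rank 1 received %f\n", received);
            }
            MPI_Finalize();
            return 0;
        }

    Second, the OpenMP fork-join model: the master thread forks a team at the parallel directive, the loop iterations are divided among the threads over shared memory, and the team joins back into the master thread when the region ends (build with -fopenmp).

        /* Fork-join parallelism over shared memory with OpenMP. */
        #include <omp.h>
        #include <stdio.h>

        #define N 1000000

        int main(void) {
            static double x[N];
            double sum = 0.0;

            /* fork: a team of threads splits the iterations of this loop */
            #pragma omp parallel for reduction(+:sum)
            for (int i = 0; i < N; i++) {
                x[i] = 0.5 * i;
                sum += x[i];
            }
            /* join: the team synchronizes here; only the master thread continues */

            printf("sum = %f (threads available: %d)\n", sum, omp_get_max_threads());
            return 0;
        }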
  • VLSI Design
    • M. Michael Vai (Author)
    • 2017 (Publication Date)
    • CRC Press (Publisher)
    Even a strictly sequential architecture can be improved by embedding a special form of parallel processing called pipelining. For example, microprocessors can use a pipeline to streamline the multiple phases of a CPU cycle (e.g., fetch instruction, decode instruction, fetch operand, execute, store data). Pipelining borrows the idea from an assembly line. The principle is to divide the task into a series of sequential subtasks, each of which is executed in a hardware stage that operates concurrently with the other stages in the pipeline. Assuming a continuous stream of tasks, a pipeline can improve the throughput of a system by overlapping the subtasks of multiple tasks. Pipelining is an important technique in VLSI architecture and we will discuss its details in this chapter.
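    The throughput benefit described above can be checked with a little arithmetic (the stage count and task count below are arbitrary examples): once the pipeline is full, one task completes per cycle, so a stream of n tasks through k stages needs about k + n - 1 cycles instead of k * n.

        /* Pipeline throughput: overlapping subtasks vs. strictly serial execution. */
        #include <stdio.h>

        int main(void) {
            int stages = 5;    /* e.g., fetch, decode, fetch operand, execute, store */
            int tasks  = 100;  /* a continuous stream of tasks                       */

            /* serial: each task occupies every stage before the next one starts */
            int serial_cycles = tasks * stages;

            /* pipelined: 'stages' cycles to fill the pipe, then 1 task per cycle */
            int pipelined_cycles = stages + (tasks - 1);

            printf("serial: %d cycles, pipelined: %d cycles, speedup: %.2fx\n",
                   serial_cycles, pipelined_cycles,
                   (double)serial_cycles / pipelined_cycles);
            return 0;
        }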
    According to the taxonomy that uses instruction and data streams to classify parallel architectures, four architecture classes are possible. We have already explained the SISD architecture. Two other practical parallel architectures are SIMD (single instruction stream, multiple data streams) and MIMD (multiple instruction streams, multiple data streams). A general diagram of a parallel architecture is shown in Fig. 11.2. A number of PEs (processing elements) are interconnected through an interconnection network for the purpose of communication. Typically, a parallel architecture operates as a back-end co-processor and its interface to the world is handled by a host processor. In Fig. 11.2 the host processor is shown to communicate with the PEs through an interconnection network.
    Fig. 11.2 Conceptual view of a parallel architecture.
    Due to the complexity of an MIMD architecture, most general-purpose parallel computers have adopted the SIMD architecture. Recently, expensive general-purpose parallel computers have been largely replaced by an approach called distributed computing, which uses a group of networked computers to form an MIMD architecture. While it is theoretically possible to configure an MISD (multiple instruction streams, single data stream) architecture, its usefulness is extremely limited.
    Many derivations from the parallel architecture shown in Fig. 11.2
  • The Electrical Engineering Handbook
    Chapter 1, “Computer Architecture,” by Morris Chang, provides an introduction to computer architecture, including microprogramming, memory hierarchy in computer systems, and input and output systems. A computer system consists of processor(s), main memory, clocks, terminals, disks, network interfaces, and input/output devices. The power of computation can be maximized via a systematic and seamless integration of hardware cores, operating systems, and application software. Throughout the 1970s, microprogramming was the dominant approach to computer architecture design, and it had a fundamental influence on the early development of computing systems. In general, the hierarchy of systems using microprogramming is divided into application software, operating systems, machine language, microprogramming, and physical devices. Since the early 1980s, the ever-increasing processing power offered by VLSI technology (as governed by Moore’s law) has fundamentally changed the computer design concept. The popularity of reduced instruction set computing (RISC) has virtually eliminated the need for microprogramming. More generally, computer architectures have undergone rapid change driven by VLSI, deep-submicron, and nanoscale device technologies.
    Chapter 2 is “Multiprocessors,” by Peter Y. K. Cheung, G. A. Constantinides, and Wayne Luk. Multiprocessors are categorized by Flynn into SISD (single instruction single data), SIMD (single instruction multiple data), MISD (multiple instruction single data), and MIMD (multiple instruction multiple data) machines. The chapter addresses fundamental issues such as how multiple processors share data, how cache memories on different processors keep their data consistent, and how multiple processors coordinate with each other. In addition, the following two performance goals of a multiprocessor system are explored: (1) increased throughput for independent tasks distributed among a number of processors, and (2) faster execution of a single task on multiple processors. The organization and programming languages for multiprocessor systems vary significantly, depending on the goal.
    Chapter 3