
CPU Performance

CPU performance refers to the speed and efficiency with which a computer's central processing unit (CPU) executes instructions. It is commonly characterized by clock speed, the number of cycles per second the CPU performs, and by instructions per clock (IPC), the number of instructions that can be executed in a single clock cycle. Improving CPU performance is a key goal in computer hardware design.

Written by Perlego with AI assistance

7 Key excerpts on "CPU Performance"

  • IT Career JumpStart

    An Introduction to PC Hardware, Software, and Networking

    • Naomi J. Alpern, Joey Alpern, Randy Muller (Authors)
    • 2011 (Publication Date)
    • Sybex (Publisher)
    The goal of processor performance is to make applications run faster. Performance is commonly defined by how long it takes for a specific task to be executed. Traditionally, processor performance has been defined as how many instructions can be completed in each clock cycle, or instructions per clock (IPC), multiplied by the number of clock cycles per second (the frequency). Thus, performance is measured as
    IPC × Frequency
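    As a quick illustration of this definition, here is a minimal Python sketch (the IPC and frequency figures are invented for illustration, not taken from the excerpt) that computes instruction throughput:

```python
# Hypothetical figures, for illustration only.
ipc = 4.0             # average instructions completed per clock cycle
frequency_hz = 3.6e9  # clock frequency: 3.6 GHz

# Performance as defined above: IPC x Frequency,
# i.e., instructions executed per second.
instructions_per_second = ipc * frequency_hz

print(f"{instructions_per_second / 1e9:.1f} billion instructions/second")
# -> 14.4 billion instructions/second
```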
    Processor Types: A First Look
    So many types of computer processors, also referred to as microprocessors, are on the market today that it can be quite confusing to wade through them all. All processors are not created equal, and each processor has its own characteristics that make it unique. For instance, a processor that is built around an architecture common to other processors of the same time period may actually operate at double or triple the speed. Fierce competition among the various chip makers lays the groundwork for new technological innovations and constant improvements.
    The most obvious difference among processors is the physical appearance of the chips: many processors differ noticeably from one another in size and shape. The first processor that Intel released was packaged in a small chip that contained two rows of 20 pins each. As processor technology improved, the shape and packaging scheme of the processor also changed. Modern processors, such as the Intel Core i7 class, use the same socket as the Xeon processors and can only be placed on a motherboard that has the appropriate socket. This design also reduces the cost involved in producing the CPU.
    motherboard: The main board in a computer that manages communication between devices internally and externally.
    central processing unit (CPU): The microprocessor, or brain, of the computer. It uses logic to perform mathematical operations that are used to manipulate data.
    Another noticeable difference among processors is the type of instruction set they use. The instruction sets most common to processors are either Complex Instruction Set Computing (CISC) or Reduced Instruction Set Computing (RISC).
  • Computer Principles and Design in Verilog HDL
    • Yamin Li (Author)
    • 2015 (Publication Date)
    • Wiley (Publisher)
    …load and store, for example, to read and write I/O data. We say that these kinds of CPUs adopt a memory-mapped I/O address space.

    1.3 Improving Computer Performance

    Computer performance has improved incredibly since the first electronic computer was created. This rapid improvement came from the advances in IC technology used to build computers and the innovation in computer design. This section describes computer performance evaluation, trace-driven simulation, and high-performance parallel computers.

    1.3.1 Computer Performance Evaluation

    If we focus only on the execution time of real programs, then we can say that the shorter the execution time, the higher the performance. Therefore, we simply define the performance as the reciprocal of the time required. To calculate the execution time of a program, we have the following equation:
    Execution time = I × CPI × TPC = (I × CPI) / F
    where I is the number of executed instructions of a program, CPI is the average clock cycles per instruction, and TPC is the time per clock cycle, which is the reciprocal of the clock frequency (F).
    Many researchers around the world are trying to improve computer performance by reducing the value of each of the three terms in the expression above. Architecture designers and compiler developers are trying to reduce the number of required instructions (I) of a program. Architecture and CPU designers are trying to decrease the CPI. And CPU designers and IC engineers are trying to increase the clock frequency (F).
    Note that these three parameters are not independent. For example, CISC CPUs may reduce the instruction count I by providing complex instructions, but this may increase the CPI; RISC CPUs may decrease the CPI and TPC, but this may increase I.
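    To make this tradeoff concrete, the following sketch (with invented instruction counts, CPIs, and clock rates, not figures from the book) evaluates the execution-time equation for two hypothetical designs:

```python
def execution_time(instructions, cpi, frequency_hz):
    """Execution time = I x CPI x TPC, where TPC = 1 / F."""
    return instructions * cpi / frequency_hz

# Hypothetical CISC-style design: fewer instructions, higher CPI.
t_cisc = execution_time(instructions=1.0e9, cpi=3.0, frequency_hz=2.0e9)

# Hypothetical RISC-style design: more instructions, lower CPI.
t_risc = execution_time(instructions=1.5e9, cpi=1.2, frequency_hz=2.0e9)

print(f"CISC-style: {t_cisc:.2f} s, RISC-style: {t_risc:.2f} s")
# -> CISC-style: 1.50 s, RISC-style: 0.90 s
```

    Which design wins depends entirely on how the three parameters move together, which is exactly the point made above.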
  • Programming for Problem-solving with C

    Formulating algorithms for complex problems (English Edition)

    The clock speed refers to the rate at which the CPU can execute instructions. It is measured in hertz (Hz). A clock governs CPU operation: the CPU retrieves and carries out one instruction with each tick of the clock. Clock speed is measured in cycles per second, and one hertz equals one cycle per second. When a CPU has a greater clock speed, it can process instructions at a faster rate. A processor running at 3.6 GHz executes 3.6 billion cycles per second. The speed of older processors was measured in megahertz (millions of cycles per second).
    The total number of processor cores
    A CPU is built from one or more processing elements known as cores. Early CPUs were single-core processors, but most contemporary central processing units feature two, four, or even more cores. For instance, a dual-core CPU contains two cores, whereas a quad-core CPU contains four cores. A single-core processor can only fetch and carry out one instruction at a time, whereas a dual-core processor can fetch and carry out two instructions at a time. A CPU with four cores can execute even more instructions in the same amount of time than a processor with only two cores.
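    As a rough illustration of why more cores can raise throughput, here is a sketch (my addition, not from the book; the workload is an arbitrary CPU-bound task) that splits work across one worker process per core:

```python
from multiprocessing import Pool
import os

def count_primes(bounds):
    """Count primes in [lo, hi) by trial division (deliberately CPU-bound)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # One chunk per core, so each core can fetch and execute
    # instructions independently of the others.
    cores = os.cpu_count() or 1
    step = 200_000 // cores
    chunks = [(i * step, (i + 1) * step) for i in range(cores)]
    with Pool(cores) as pool:
        total = sum(pool.map(count_primes, chunks))
    print(f"{total} primes found using {cores} cores")
```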
    Cache memory
    A cache is a small, very fast memory located on the CPU. It holds the data and instructions that are used again and again. The larger the cache, the more of these frequently used instructions and data can be kept close to the processor, where they can be transferred in and used quickly.
    The memory unit
    Memory units are responsible for storing not only data but also instructions or programs. The central processing unit (CPU) retrieves both the data and the program from memory and then performs operations on them (processes them). Any intermediate results produced are also preserved in memory, and the final result generated by the CPU is saved in memory (see Figure 2.7).
  • The Art of Writing Efficient Programs
    A high-performing program cannot be described so simply because performance can be defined only with respect to specific targets. Nonetheless, in this book, and in particular in this chapter, we are largely concerned with computational performance or throughput: how fast can we solve a given problem with the hardware resources we have? This type of performance is closely related to efficiency: our program will deliver the result faster if every computation it executes brings us closer to the result and, at every moment, we do as much computing as possible.

    This brings us to the next question: just how much computing can be done, say, in one second? The answer, of course, will depend on what hardware you have, how much of it, and how efficiently your program can use it. Any program needs multiple hardware components: processors and memory, obviously, but also networking for any distributed program, storage and other I/O channels for any program that manipulates large amounts of external data, and possibly other hardware, depending on what the program does. But everything starts with the processor, and so, perforce, does our exploration of high-performance programming.

    Furthermore, in this chapter, we will limit ourselves to a single thread of execution; concurrency will come later. With this narrower focus, we can define what this chapter is about: how to make the best use of the CPU resources using a single thread. To understand this, we first need to explore what resources a CPU has. Of course, different generations and different models of processors will have a different assortment of hardware capabilities, but the goal of this book is two-fold: first, to give you a general understanding of the subject, and second, to equip you with the tools necessary to acquire more detailed and specific knowledge. The general overview of the computational resources available on any modern CPU can be summarized, unfortunately, as "it's complicated".
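    To get a feel for the "how much computing can be done in one second" question on your own machine, a minimal single-threaded micro-benchmark might look like the sketch below (my illustration, not the book's code; pure Python badly understates what the hardware can do, but the method carries over):

```python
import time

def flops_estimate(iterations=10_000_000):
    """Roughly estimate floating-point operations per second on one thread."""
    x = 1.0
    start = time.perf_counter()
    for _ in range(iterations):
        x = x * 1.0000001 + 1e-9  # two floating-point operations per pass
    elapsed = time.perf_counter() - start
    return 2 * iterations / elapsed

print(f"~{flops_estimate() / 1e6:.0f} million FLOP/s on one thread")
```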
  • vSphere High Performance Cookbook - Second Edition
    • Kevin Elder, Christopher Kusek, Prasenjit Sarkar (Authors)
    • 2017 (Publication Date)
    • Packt Publishing (Publisher)

    CPU Performance Design

    In this chapter, we will cover the tasks related to CPU Performance design. You will learn the following aspects of CPU Performance design:
    • Critical performance consideration - VMM scheduler
    • CPU scheduler - processor topology/cache-aware
    • Ready time - warning sign
    • Spotting CPU overcommitment
    • Fighting guest CPU saturation in SMP VMs
    • Controlling CPU resources using resource settings
    • What is most important to monitor in CPU Performance
    • CPU Performance best practices

    Introduction

    Ideally, a performance problem should be defined within the context of an ongoing performance management process. Performance management refers to the process of establishing performance requirements for applications in the form of a service-level agreement (SLA) and then tracking and analyzing the achieved performance to ensure that those requirements are met. A complete performance management methodology includes collecting and maintaining baseline performance data for applications, systems, and subsystems, for example, storage and network.
    In the context of performance management, a performance problem exists when an application fails to meet its predetermined SLA. Depending on the specific SLA, the failure might be in the form of excessively long response times or throughput below some defined threshold.
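    As a minimal sketch of such an SLA check (my own illustration; the threshold, the percentile target, and the samples are all invented), a monitoring script might flag violations like this:

```python
# Invented SLA: 95th-percentile response time must stay under 250 ms.
SLA_P95_MS = 250.0

# Fabricated response-time samples from monitoring, in milliseconds.
samples = [120, 180, 90, 300, 210, 150, 410, 130, 95, 175]

def percentile(values, pct):
    """Nearest-rank percentile; good enough for a sketch."""
    ordered = sorted(values)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[rank]

p95 = percentile(samples, 95)
if p95 > SLA_P95_MS:
    print(f"SLA violated: p95 = {p95} ms > {SLA_P95_MS} ms")
else:
    print(f"SLA met: p95 = {p95} ms")
```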
    ESXi and virtual machine (VM) performance tuning are complicated because VMs share the underlying physical resources, in particular, the CPU.
    Finally, configuration issues or inadvertent user errors might lead to poor performance. For example, a user might use a symmetric multiprocessing (SMP) VM when a single-processor VM would work well. You might also see a situation where a user sets shares but then forgets to reset them, resulting in poor performance because of the changing characteristics of other VMs in the system.
    If you overcommit any of these resources, you might see performance bottlenecks. For example, if too many VMs are CPU-intensive, you might experience slow performance because all the VMs need to share the underlying physical CPU.
  • Computer Systems Performance Evaluation and Prediction
    • Paul Fortier, Howard Michel (Authors)
    • 2003 (Publication Date)
    • Digital Press (Publisher)
    3 Fundamental Concepts and Performance Measures

    3.1 Introduction

    Computer systems architects and designers look for configurations of computer systems elements so that system performance meets desired measures. What this means is that the computer system delivers a quality of service that meets the demands of the user applications. But the measure of this quality of service and the expectation of performance vary depending on who you are. In the broadest context we may mean user response time, ease of use, reliability, fault tolerance, and other such performance quantities. The problem with some of these is that they are qualitative versus quantitative measures. To be scientific and precise in our computer systems performance studies, we must focus on measurable quantitative qualities of a system under study.
    There are many possible choices for measuring performance, but most fall into one of two categories: system-oriented or user-oriented measures. The system-oriented measures typically revolve around the concepts of throughput and utilization. Throughput is defined as the average number of items (e.g., transactions, processes, customers, jobs, etc.) processed per unit of measured time. Throughput is meaningful when we also know information about the capacity of the measured entity and the presented workload of items at the entity over the measured time period. We can use throughput measures to determine system capacity by observing the workload level at which there are always items waiting and, conversely, the level at which items never wait. Utilization is a measure of the fraction of time that a particular resource is busy. One example is CPU utilization, which could measure when the CPU is idle and when it is functioning to perform a presented program.
    The user-oriented performance measures typically include response time or turnaround time. Response time and turnaround time refer to the system's elapsed time from the point a user or application initiates a job on the system to the point when the job's answer or response is returned to the user. From this simple definition it can readily be seen that these are not clear, unambiguous measures, since there are many variables involved. For example, I/O channel traffic may cause variations in the measure for the same job, as would operating system load or CPU load. Therefore, it is imperative that if this measure is to be used, the performance modeler be unambiguous in his or her definition of this measure's meaning. These user-oriented measures are all considered random and, therefore, are typically discussed in terms of expected or average values as well as variances from those values.
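    A toy sketch of these measures (my own illustration with fabricated job timings, not from the chapter) might compute them from a log of arrival, start, and finish times:

```python
# Each job: (arrival_time, start_time, finish_time), in seconds.
# Fabricated numbers, for illustration only.
jobs = [(0.0, 0.0, 2.0), (1.0, 2.0, 3.5), (2.0, 3.5, 4.0), (6.0, 6.0, 7.0)]

observation_window = 8.0  # measured time period, in seconds

# System-oriented measures.
throughput = len(jobs) / observation_window           # jobs per second
busy_time = sum(finish - start for _, start, finish in jobs)
utilization = busy_time / observation_window          # fraction of time busy

# User-oriented measure: average response (turnaround) time.
avg_response = sum(finish - arrival for arrival, _, finish in jobs) / len(jobs)

print(f"throughput        = {throughput:.2f} jobs/s")
print(f"utilization       = {utilization:.0%}")
print(f"avg response time = {avg_response:.2f} s")
```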
  • Computer Systems Architecture
    The first value in the formula defines the size of the program, that is, the number of instructions to be executed. This value is derived from the program written by the developers and cannot be changed by the hardware. In this sense, the hardware assumes that the number of instructions to be executed is a constant number. Of course, for a different input, the number of instructions executed may differ. If there is a need to decrease this number, for example, due to long execution times, the program will have to be analyzed and the time-consuming portion rewritten using a more efficient algorithm. Alternatively, using a more sophisticated compiler may produce shorter code, in terms of the number of instructions executed. The compiler, which is responsible for converting the high-level programming language instructions to machine-level instructions, may sometimes speed up execution, for example, by eliminating redundant pieces of code or through better register usage.
    The second value in the formula (CPI ratio) is an important enhancement factor that has changed over the years in order to increase execution speed. Reducing the number of cycles required for a single instruction has a direct effect on the processor’s performance. During the 1980s, the average CPI was five; that is, for executing a single machine instruction, five cycles were required. Modern processors, on the other hand, have a CPI of less than one, which means the processor is capable of running several instructions in parallel during the same cycle.
    The third value (cycle time) is another important enhancement factor addressed by many computer manufacturers. Over the past three decades, the clock rate has increased by more than three orders of magnitude. In the last decade, however, the trend of reducing the cycle time was replaced by the trend of increasing the number of processors or cores. Combined with the software engineering trend of using threads, the multiple execution units provide much better performance enhancements.
    Following this brief explanation of performance, we can proceed to a more general discussion.
    If processor X is said to be n times faster than processor Y, it means that the performance of X is n times the performance of Y. However, since performance and execution times are inversely proportional, the execution time on processor Y will be n times the execution time of processor X.
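    A quick numeric check of this inverse relationship (made-up times, purely illustrative):

```python
# Made-up execution times for the same program on two processors.
time_x = 2.0  # seconds on processor X
time_y = 6.0  # seconds on processor Y

# Performance is the reciprocal of execution time,
# so the speedup n of X over Y is time_y / time_x.
n = time_y / time_x
print(f"X is {n:.1f}x faster than Y")    # -> X is 3.0x faster than Y
print(f"Y takes {n:.1f}x as long as X")  # -> Y takes 3.0x as long as X
```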