Computer Science

Cache Memory

Cache memory is a small, high-speed memory that stores frequently used data for quick access. Because it is located closer to the CPU than main memory, the processor can reach that data far more quickly, and the use of cache memory can significantly improve the performance of a computer system.

Written by Perlego with AI-assistance

12 Key excerpts on "Cache Memory"

  • Foundations of Computing
    Essential for Computing Studies, Profession And Entrance Examinations - 5th Edition

    • Pradeep K. Sinha, Priti Sinha(Authors)
    • 2022(Publication Date)
    • BPB Publications
      (Publisher)
    When the CPU attempts to read a memory word (instruction or data) during execution of a program, the system checks whether the word is in the cache. If so, the system delivers the word to the CPU from the cache. If not, the system reads a block of main memory, consisting of some fixed number of words including the requested word, into the cache and then delivers the requested word to the CPU. Because of locality of reference, when the system fetches a block of memory words into the cache to satisfy a single memory reference, it is likely that there will soon be references to other words in that block. That is, the next time the CPU attempts to read a word, it is likely to find it in the cache, saving the time needed to read the word from main memory. You might think that the odds of the CPU finding the word it needs in the cache are small, but statistics show that more than 90% of the time, the needed word is available in the cache. As the name implies, Cache Memory is a memory in hiding (the word “cache” literally means a hiding place for treasure or stores) and is not addressable by normal users of the computer system. The hardware transfers the needed instructions/data between the cache and main memory without any programmer intervention. In fact, application programmers are unaware of its presence and use.
    Figure 4.7: Illustrating the operation of Cache Memory.
    Figure 4.7 illustrates the general concept of Cache Memory usage. Actual systems may have some variations. For example, many computer systems have two separate cache memories called instruction cache and data cache. They use the instruction cache for storing program instructions and the data cache for storing data. This allows faster identification of whether an accessed word is available in Cache Memory and helps further improve the system's performance. Many computer systems also have multiple levels of caches (such as level one and level two caches, often referred to as L1 and L2 caches)
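    The check-then-fetch behaviour described above can be sketched in C. This is a minimal illustration only, not the mechanism of any particular processor; the geometry (eight direct-mapped lines of four words each) and the names (cache_read, mem, cache_line_t) are invented for the example.

      #include <stdint.h>
      #include <string.h>
      #include <stdbool.h>

      #define WORDS_PER_BLOCK 4          /* fixed number of words per block   */
      #define NUM_LINES       8          /* direct-mapped: one candidate line */
      #define MEM_WORDS       1024

      static uint32_t mem[MEM_WORDS];    /* stand-in for main memory          */

      typedef struct {
          bool     valid;
          uint32_t tag;                        /* which block this line holds */
          uint32_t data[WORDS_PER_BLOCK];      /* one whole block of words    */
      } cache_line_t;

      static cache_line_t cache[NUM_LINES];

      /* Read one word: check the cache first; on a miss, bring in the whole
       * surrounding block so that nearby words become hits (locality of reference). */
      uint32_t cache_read(uint32_t word_addr)
      {
          uint32_t block  = word_addr / WORDS_PER_BLOCK;
          uint32_t offset = word_addr % WORDS_PER_BLOCK;
          uint32_t index  = block % NUM_LINES;        /* candidate cache line   */
          uint32_t tag    = block / NUM_LINES;        /* identifies the block   */
          cache_line_t *line = &cache[index];

          if (!line->valid || line->tag != tag) {     /* miss: fetch the block  */
              memcpy(line->data, &mem[block * WORDS_PER_BLOCK], sizeof line->data);
              line->tag   = tag;
              line->valid = true;
          }
          return line->data[offset];                  /* deliver word to the CPU */
      }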
  • Modern Computer Architecture and Organization
    • Jim Ledin, Dave Farley(Authors)
    • 2022(Publication Date)
    • Packt Publishing
      (Publisher)
    Cache Memory is a high-speed memory region (compared to the speed of main memory) that temporarily stores program instructions or data for future use. Usually, these instructions or data items have been retrieved from main memory recently and are likely to be needed again shortly.
    The primary purpose of Cache Memory is to increase the speed of repeatedly accessing the same memory location and nearby memory locations. To be effective, accessing the cached items must be significantly faster than accessing the original source of the instructions or data, referred to as the backing store.
    When caching is in use, each attempt to access a memory location begins with a search of the cache. If the requested item is present, the processor retrieves and uses it immediately. This is called a cache hit. If the cache search is unsuccessful (a cache miss), the instruction or data item must be retrieved from the backing store. In the process of retrieving the requested item, a copy is added to the cache for anticipated future use.
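    The hit/miss flow around a backing store is easy to mimic in software. The sketch below is a rough illustration under assumed names (cached_read, slow_backing_store_read, a 16-entry table, hit/miss counters); it is not code from the book.

      #include <stdint.h>
      #include <stdbool.h>

      #define CACHE_ENTRIES 16

      typedef struct { bool valid; uint32_t key; uint32_t value; } entry_t;

      static entry_t  cache[CACHE_ENTRIES];
      static unsigned hits, misses;

      /* Stand-in for the slow backing store (main memory, a disk, ...). */
      static uint32_t slow_backing_store_read(uint32_t key) { return key * 2u; }

      uint32_t cached_read(uint32_t key)
      {
          entry_t *e = &cache[key % CACHE_ENTRIES];
          if (e->valid && e->key == key) {        /* cache hit: use it immediately  */
              hits++;
              return e->value;
          }
          misses++;                               /* cache miss: go to the backing  */
          e->value = slow_backing_store_read(key);/* store, then keep a local copy  */
          e->key   = key;
          e->valid = true;
          return e->value;
      }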
    Cache Memory is used for a variety of purposes in computer systems. Some examples of Cache Memory applications are:
    • Translation lookaside buffer (TLB): The TLB, as we saw in Chapter 7, Processor and Memory Architectures, is a form of Cache Memory used in processors supporting paged virtual memory. The TLB contains a collection of virtual-to-physical address translations that speed up access to page frames in physical memory. As instructions execute, each main memory access requires a virtual-to-physical translation. Successful searches of the TLB result in much faster instruction execution compared to the page table lookup process following a TLB miss. The TLB is part of the MMU and is not directly related to the varieties of processor Cache Memory discussed later in this section. (A simplified TLB lookup is sketched after this list.)
    • Disk drive caches: Reading and writing the magnetized platters of rotating disk drives is orders of magnitude slower than accessing dynamic RAM (DRAM) devices. Disk drives generally implement Cache Memory to store the output of read operations and to temporarily hold data in preparation for writing. Drive controllers often store more data than the quantity originally requested in internal Cache Memory, with the expectation that future reads will request data adjacent to the initial request. If this turns out to be a correct assumption, which it often is, the drive can satisfy the second request immediately from the cache without the delay associated with accessing the disk platters.
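    As promised above, here is a much-simplified picture of what a TLB lookup does: map a virtual page number to a physical frame number, falling back to a slow page-table walk on a miss. The 4 KB page size, the 64-entry table, and the function names are assumptions made for this sketch, not details of any real MMU.

      #include <stdint.h>
      #include <stdbool.h>

      #define PAGE_SHIFT  12                  /* assume 4 KB pages            */
      #define TLB_ENTRIES 64

      typedef struct { bool valid; uint64_t vpn; uint64_t pfn; } tlb_entry_t;

      static tlb_entry_t tlb[TLB_ENTRIES];

      /* Placeholder for the much slower page-table walk done on a TLB miss. */
      static uint64_t page_table_walk(uint64_t vpn) { return vpn; /* identity map */ }

      uint64_t translate(uint64_t vaddr)
      {
          uint64_t vpn    = vaddr >> PAGE_SHIFT;            /* virtual page number */
          uint64_t offset = vaddr & (((uint64_t)1 << PAGE_SHIFT) - 1);
          tlb_entry_t *e  = &tlb[vpn % TLB_ENTRIES];

          if (!e->valid || e->vpn != vpn) {   /* TLB miss: take the slow path      */
              e->pfn   = page_table_walk(vpn);
              e->vpn   = vpn;
              e->valid = true;
          }
          /* TLB hit (or freshly refilled entry): build the physical address fast. */
          return (e->pfn << PAGE_SHIFT) | offset;
      }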
  • DSP Software Development Techniques for Embedded and Real-Time Systems
    • Robert Oshana(Author)
    • 2006(Publication Date)
    • Newnes
      (Publisher)
    C Cache Optimization in DSP and Embedded Systems
    A cache is an area of high-speed memory linked directly to the embedded CPU. The embedded CPU can access information in the processor cache much more quickly than information stored in main memory. Frequently-used data is stored in the cache.
    There are different types of caches, but they all serve the same basic purpose: they store recently used information in a place where it can be accessed very quickly. One common type of cache is a disk cache. This cache model stores information you have recently read from your hard disk in the computer's RAM, or memory. Accessing RAM is much faster than reading data off the hard disk, so this can help you access common files or folders on your hard drive much faster. Another type of cache is a processor cache, which stores information right next to the processor. This makes the processing of common instructions much more efficient, thereby speeding up computation.
    There has been historical difficulty in transferring data from external memory to the CPU in an efficient manner. This matters because the functional units in a processor must be kept busy in order to achieve high performance, yet the gap between memory speed and CPU speed is increasing rapidly. RISC and CISC architectures use a memory hierarchy to offset this growing gap, and high performance is achieved by exploiting data locality.

    Principle of locality

    The principle of locality says a program will access a relatively small portion of its overall address space at any point in time. When a program reads data from address N, it is likely that data from address N+1 will also be read in the near future (spatial locality) and that the program will reuse the recently read data several times (temporal locality). In this context, locality is what makes a hierarchy work: the overall speed approaches that of the uppermost level, while the overall cost and size are those of the lowermost level. A memory hierarchy from top to bottom contains registers, different levels of cache, main memory, and disk space, respectively (Figure C.1).
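    Both kinds of locality show up in an ordinary C loop. The array size and the variable names below are purely illustrative.

      #include <stddef.h>

      #define N 1024
      static double a[N][N];        /* C stores this row by row (row-major) */

      double sum_rows(void)
      {
          double sum = 0.0;         /* 'sum' is reused every iteration: temporal locality */
          for (size_t i = 0; i < N; i++)
              for (size_t j = 0; j < N; j++)
                  sum += a[i][j];   /* consecutive j touch adjacent addresses: spatial
                                       locality, so one fetched cache block serves many
                                       of the following iterations                      */
          return sum;
      }

      /* Swapping the loops (j outer, i inner) strides through memory N*8 bytes at a
       * time, defeating spatial locality; it typically runs noticeably slower on a
       * cached machine even though it performs exactly the same arithmetic.          */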
  • Embedded Systems
    A Contemporary Design Tool

    • James K. Peckol(Author)
    • 2019(Publication Date)
    • Wiley
      (Publisher)
    These are known as secondary memory and are shown in the diagram by the block on the left. At the bottom are the smallest, fastest memories, called Cache Memory; these are typically higher-speed SRAMs. These devices also tend to be the most expensive. In the middle of the hierarchy is main or primary memory. These are either lower-speed SRAM devices or, more commonly, DRAM memories. CPU registers are sometimes included in the ranking as higher-speed memory than cache. The motivation for building a memory system as a hierarchical collection of different kinds of memories is that we would prefer an application program to execute as quickly as possible. Accessing memory takes time; each access contributes to the time required to execute an instruction, which can have a significant negative impact on real-time performance in an embedded application. We will not consider secondary storage; typical embedded applications will not use it. The discussion here will focus on main memory and cache, the last two blocks on the right. These can be implemented using (variations on) the designs presented in the previous sections.
    4.15 Basic Concepts of Caching: icache, dcache
    Cache is a small, fast memory that temporarily holds copies of blocks of data and program instructions from the main memory. The increased speed of Cache Memory over that of main memory components offers the prospect for programs to execute much more rapidly if the instructions and data can be held in cache. Many of today's higher-performance microprocessors, implemented around the Harvard architecture, will internally support both an icache (instruction cache) and a dcache (data cache). We will now examine the concept of caching in greater detail. We will look first at the ideas behind caching, what cache is, why it works, and some of the potential difficulties encountered in embedded applications
  • Modern Computer Architecture and Organization
    Learn x86, ARM, and RISC-V architectures and the design of smartphones, PCs, and cloud servers

    Chapter 8: Performance-Enhancing Techniques
    The fundamental aspects of processor and memory architectures discussed in previous chapters enable the design of a complete and functional computer system. However, the performance of such a system would be poor compared to most modern processors without the addition of features to increase the speed of instruction execution.
    Several performance-enhancing techniques are employed routinely in processor and system designs to achieve peak execution speed in real-world computer systems. These techniques do not alter what the processor does in terms of program execution and data processing; they just help get it done faster.
    After completing this chapter, you will understand the value of multilevel Cache Memory in computer architectures and the benefits and challenges associated with instruction pipelining. You'll also understand the performance improvement resulting from simultaneous multithreading and the purpose and applications of single instruction, multiple data processing.
    The following topics will be covered in this chapter:
    • Cache Memory
    • Instruction pipelining
    • Simultaneous multithreading
    • SIMD processing

    Cache Memory

    A Cache Memory is a memory region that stores program instructions or data, usually instructions or data that have been accessed recently, for future use. The primary purpose of Cache Memory is to increase the speed of repeatedly accessing the same memory location or nearby memory locations. To be effective, accessing the cached data must be significantly faster than accessing the original source of the data, referred to as the backing store.
    When caching is in use, each attempt to access a memory location begins with a search of the cache. If the data is present, the processor retrieves and uses it immediately. This is called a cache hit. If the cache search is unsuccessful (a cache miss), the data must be retrieved from the backing store.
  • Computer Systems Architecture
    6 Cache Memory
    This chapter focuses on Cache Memory. By using the general architecture figure, we can place the Cache Memory in context and see its contribution to system performance (Figure 6.1).
    As stated in the previous chapter, Cache Memory is an important layer in the memory hierarchy, and its main contribution is in improving execution speed. The memory hierarchy is depicted once again in Figure 6.2, but this time the emphasis is on the sizes of the various levels of the hierarchy. The slowest and largest level (as far as capacity is concerned) is the disks. Currently, the standard disks used in personal computers (PCs) have a capacity that starts at several hundred gigabytes and goes up to several terabytes. Furthermore, with cloud computing, in which the system's resources reside on remote servers, the available disk capacity increases significantly. The main memory (random access memory [RAM]), which represents the second level, has a standard capacity ranging from several gigabytes up to hundreds of gigabytes. The Cache Memory, which is the next level, is usually divided into several components, each with a different purpose and a different size. The last level of the memory hierarchy is the registers, which are usually very limited.
    The RAM described in the previous chapter is used for storing programs and data. There is another memory component called read-only memory (ROM), which is used by the operating system and the hardware and is intended for components (programs and/or data) that do not change frequently. Despite its name, some of the currently available ROMs can be changed; sometimes, a special recording device is required. Even so, their main use remains for special operating systems or hardware functions. As such, ROM is not available for standard computer programs.
    One of the important attributes of ROM is the fact it is a nonvolatile memory, which means it retains its content even if the power is switched off. For that reason, ROM is used, for example, by the boot programs that are responsible for bringing the system up. Other components stored in the ROM are programs or data required for managing some input and output devices. Usually, these types of data will not be modified during the life span of the device. In modern computers, some of the ROM is replaced by flash memory, which is a nonvolatile device that can be rewritten if the need arises.
  • Computer Architecture and Security
    Fundamentals of Designing Secure Computer Systems

    • Shuangbao Paul Wang, Robert S. Ledley(Authors)
    • 2012(Publication Date)
    • Wiley
      (Publisher)
    Modern computers usually use random access memory (RAM) as the main memory. Dynamic RAM (DRAM) is most often seen in personal computers, as it is cheaper and can be highly integrated due to its lower power consumption. DRAM needs to be refreshed periodically to avoid data loss. Static RAM (SRAM) is faster than DRAM but has a lower integration density. It is also more expensive, so it is commonly seen in servers or special, fast computers.

    3.3 Cache Memory

    We know registers are a special type of memory that offers the fastest speed but is available only in very limited numbers. On the other hand, memory (RAM) is much cheaper than registers and can be integrated in large quantities with easy access, but its speed is slower. To fill the gap, there is another type of memory called Cache Memory. A memory hierarchy of computer systems is shown in Figure 3.6.
    Figure 3.6 A memory hierarchy of computer systems
    The cache is a small amount of fast memory that sits between the processor and memory to bridge the speed gap between the CPU and main memory (Hwang, 1993). Cache is much smaller than the main memory. The working mechanism for the Cache Memory is to prefetch the data from the main memory and make them handy when the processor needs them. If the prediction is accurate then the processor can get the data directly from the fast Cache Memory without requiring the main memory to be accessed.
    It is not surprising that people would ask why Cache Memory works and how the needed data can be predicted before the program executes. Let us look at the “block” idea. Suppose we want to add two matrices with M rows and N columns; we need to perform M × N additions. If the data are all in the cache, we call it a read hit, and getting the data from the cache saves the time of sending the address to the address buffer (MAR) and waiting for data from the data buffer (MDR) on every add operation.
    If the data that the processor is requesting are not in the cache, we call it a read miss
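    The matrix addition used as the example above looks like this in C (the dimensions and names are ours, chosen for illustration). Because C lays each row out contiguously, a block fetched on one read miss supplies several of the following element reads as read hits.

      #define M 256
      #define N 256

      /* c = a + b, element by element: M * N additions in total. */
      void matrix_add(const int a[M][N], const int b[M][N], int c[M][N])
      {
          for (int i = 0; i < M; i++)
              for (int j = 0; j < N; j++)
                  /* a[i][j] and a[i][j+1] are neighbours in memory, so after one
                   * read miss the fetched block makes the next few reads hits.  */
                  c[i][j] = a[i][j] + b[i][j];
      }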
  • Computer Principles and Design in Verilog HDL
    • Yamin Li(Author)
    • 2015(Publication Date)
    • Wiley
      (Publisher)
    Chapter 11 Memory Hierarchy and Virtual Memory Management
    Memory is a temporary place for storing programs (instructions and data). It is commonly implemented with dynamic random access memory (DRAM). Because DRAM is slower than the CPU (central processing unit), an instruction cache and a data cache are fabricated inside the CPU. Not only the caches but also TLBs (translation lookaside buffers) are fabricated for fast translation from a virtual address to a physical memory address.
    This chapter describes the memory structures, cache organizations, virtual memory management, and TLB organizations. The mechanism of the TLB-based MIPS (microprocessor without interlocked pipeline stages) virtual memory management is also introduced.

    11.1 Memory

    A computer consists of a CPU, the memory, and I/O interfaces. Memory is used to store programs that are being executed by the CPU. There are many types of memory, but we discuss only the following four types of memory in this book.
    1. SRAM (static random access memory), which is fast and expensive, is used to design caches and TLBs. Some high-performance computers also use it as the main memory.
    2. DRAM, which is large and inexpensive, is mainly used as the computer's main memory.
    3. ROM (read-only memory), which is nonvolatile and cheap, is typically used to store the computer's initial start-up program or firmware in embedded systems.
    4. CAM (content addressable memory), which is a very special memory, is mainly used to design a fully associative cache or TLB.
    Except for ROM, all of these memories are volatile. This means that when the power supply is off, the contents of the memory will be lost. The contents of such memories are not usable when the power supply has just been turned on. Therefore, there must be a ROM in a computer or embedded system.
    “Random access” means that any location of the memory can be accessed directly by providing the address of that location. There are some other types of memory that cannot be accessed randomly, for instance the FIFO (first-in first-out) memory.
  • Programming for Problem-solving with C
    Formulating algorithms for complex problems (English Edition)

    Cache Memory is located between the ultra-fast registers and main memory. It holds the frequently used data that the CPU needs again and again. It is made up of SRAM chips. Holding repeatedly required data in the cache avoids CPU accesses to the slower main memory (DRAM), which enhances the computer's performance, as SRAM chips are faster than DRAM chips. The Cache Memory is generally divided into levels:
    • L1 cache: Present on the CPU chip (internal cache).
    • L2 cache: Built outside the CPU, on the motherboard; its size is greater than that of L1.
    • L3 cache: Extra cache, not normally used, built outside of the CPU on the motherboard. L3 is larger than the L1 and L2 caches but faster than the main memory.
    The sizes of Cache Memory are generally in KB and MB. The connection of memories with the CPU is given in Figure 2.16 (address, data, and control buses).
    Figure 2.16: Connection of memories with CPU
    • Main memory: It can hold data in GB (gigabytes). Modern computers have 4 GB, 8 GB, or 16 GB of RAM. The main memory (RAM) is also present inside the computer and cannot be separated from it. It is present on the motherboard.
    • Secondary memory: It can hold data in TB (terabytes). Modern computers have 1 TB, 2 TB, or 4 TB of storage. It can be internal (online) or external (offline). It is treated as effectively infinite memory because more can be added to the computer when required.
    Measuring the memory
    The smallest unit for measuring memory is a bit. One bit is either 0 or 1. A group of four bits is known as a nibble, and a group of eight bits is a byte. The following Table 2.1 contains the details of the memory units.
    Name         Description            In base 2     In base 10    Symbol
    1 Bit        Binary digit: 0 or 1   0 or 1        0 or 1        bit
    1 Nibble     4 bits                 2^2 bits      -             Nibble
    1 Byte       8 bits                 2^3 bits      -             B
    1 Kilobyte   1,024 bytes            2^10 bytes    10^3 bytes    KB
    1 Megabyte   1,024 KB               2^20 bytes    10^6 bytes    MB
    1 Gigabyte   1,024 MB               2^30 bytes    10^9 bytes    GB
    1 Terabyte   1,024 GB               2^40 bytes    10^12 bytes   TB
    1 Petabyte   1,024 TB               2^50 bytes    10^15 bytes   PB
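    As a small worked example of the base-2 units in the table, the helper below formats a byte count using the 1 KB = 1,024 bytes convention. The function name and the formatting choices are assumptions made for this sketch.

      #include <stdio.h>
      #include <stdint.h>

      /* Print a byte count in the largest convenient base-2 unit (KB = 2^10, MB = 2^20, ...). */
      static void print_size(uint64_t bytes)
      {
          static const char *unit[] = { "B", "KB", "MB", "GB", "TB", "PB" };
          double value = (double)bytes;
          int    u     = 0;
          while (value >= 1024.0 && u < 5) {   /* divide by 1,024 per step */
              value /= 1024.0;
              u++;
          }
          printf("%.1f %s\n", value, unit[u]);
      }

      int main(void)
      {
          print_size(8ull * 1024 * 1024 * 1024);   /* 8 GB of RAM -> "8.0 GB" */
          print_size(512);                         /* -> "512.0 B"            */
          return 0;
      }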
  • Digital System Design - Use of Microcontroller
    • Shenouda Dawoud, R. Peplow(Authors)
    • 2022(Publication Date)
    • River Publishers
      (Publisher)
    6 System Memory
    THINGS TO LOOK FOR…
    • The memory as the centre of the computing system model
    • The different types and classes of memories
    • Semiconductor memories-SRAM and DRAM
    • Memory interfacing
    • Timing diagrams
    • AVR and Intel microcontroller memory systems

    6.1 Introduction

    Any computing system (microprocessor- or microcontroller-based, or a general-purpose computer) can be defined as a state machine that combines three components (Figure 6.1): memory, processor, and an I/O system. The memory is the centre of this model. Because of this centrality, it is possible to describe the computer as a memory-centred system. The memory is the space that holds information consisting of programmes and data. The programme space stores the instructions of the programme in execution, the operating system, compiler, and other system software. In this model, the information in memory at any instant represents the process state at that instant. The information (data and instructions) flows to the processor (the logic in the figure), where it is processed and modified according to the instructions, and then the new, modified data (and in some designs the instructions) returns to the memory. Storing the new data in the memory updates the system state. This flow is shown in Figure 6.1 as a state machine. The data arriving at the computer system from its inputs have to be stored first in the memory to become part of its contents. Any information needed from the computer must come from the information stored in the memory, and the system uses the output devices to provide the external world with the needed information.
    Figure 6.1 Computer as State machine.
    As the centre of the computer system, the memory has a significant effect on many of the design metrics discussed in Chapter 1:
    • Performance: The use of slow memory (i.e., memory with a large access time) degrades the overall performance of the system. Slow memory may create “memory bottlenecks”, which cause the processor to work below its performance capabilities.
  • Computer Architecture
    Fundamentals and Principles of Computer Design, Second Edition

    • Joseph D. Dumas II(Author)
    • 2016(Publication Date)
    • CRC Press
      (Publisher)
    Notice that the upper levels of the hierarchy are the fastest (most closely matched to the speed of the computational hardware) but the smallest in terms of storage capacity. This is often due at least somewhat to space limitations, but it is mainly because the fastest memory technologies, such as SRAM, are the most expensive. As we move down the hierarchy, lower levels are composed of slower but cheaper and higher density components, so they have larger storage capacities. This varying capacity of each level is symbolized by drawing the diagram in the shape of a triangle.
    Figure 2.2 Memory hierarchy (conceptual).
    Figure 2.3 Memory hierarchy (typical of modern computer systems).
    Because the higher levels of the memory hierarchy have smaller capacities, it is impossible to keep all the information (program code and data) we need in these levels at one time. In practice, each higher level of the hierarchy contains only a subset of the information from the levels below it. The fundamental idea underlying the hierarchical memory concept is that we want to make as many accesses as possible (as a percentage of the total) to the upper levels of the hierarchy, while only rarely having to access the lower levels. The resulting overall memory system (taking into account all devices) then approaches the speed of the highest levels while maintaining a capacity and cost per gigabyte approximating those of the lowest levels (the secondary storage devices). This requires a complex and well-thought-out design whose details should, for best acceptance, be hidden from the end user. As much as possible, only the system designers should have to deal with the details of managing the memory system for optimal performance. However, if one is to be responsible for specifying computer systems whose performance is important, or for developing code to run in such an environment, it is worthwhile to study the techniques used to optimize memory systems.
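    One way to quantify this goal is the usual effective (average) access time estimate for a two-level hierarchy. The hit ratio and latencies below are illustrative numbers chosen for the sketch, not figures from this book.

      #include <stdio.h>

      int main(void)
      {
          double hit_ratio   = 0.95;   /* fraction of accesses served by the cache */
          double t_cache_ns  = 1.0;    /* illustrative cache access time           */
          double t_memory_ns = 100.0;  /* illustrative main-memory access time     */

          /* Effective access time = h * t_cache + (1 - h) * (t_cache + t_memory):
           * a miss pays for the cache lookup and then the slower memory access.  */
          double t_eff = hit_ratio * t_cache_ns
                       + (1.0 - hit_ratio) * (t_cache_ns + t_memory_ns);

          printf("effective access time = %.2f ns\n", t_eff);   /* 6.00 ns here */
          return 0;
      }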
  • Fundamentals of Parallel Multicore Architecture
    For decades, the increase in CPU speed has been much faster than the decrease in the access latency of the main memory. Up until roughly 2001-2005, CPU speed as measured in its clock frequency grew at the rate of 55% annually, while the memory speed grew at the rate of only 7% annually [24]. This speed gap produced an interesting implication. While in the past, a load instruction could get the needed datum from main memory in one CPU clock cycle, in recent systems it requires hundreds of processor clock cycles to get a datum from the main memory. Dependences between a load instruction (producer) and instructions that use the loaded value (consumers) dictate that the consumer instructions must wait until the load obtains its datum before they can execute. With the latency of loading datum from the main memory in the order of hundreds of cycles, the CPU may stall for much of that time because it runs out of instructions that are not dependent on the load. Hence, it is critical to performance that most data accesses are supplied to the CPU with low latencies. Caches provide such support.
    A cache is a relatively small memory for keeping data that is likely needed by the requestor. The concept of a cache is universal since it can be used as a software construct or a hardware component. In this chapter, we focus on hardware caches that exist between the processor and the main memory.
    An example of a memory hierarchy is shown in Figure 5.1. It shows a configuration in which there are twelve processor cores on a chip. Each core has a private Level 1 (L1) data cache and a Level 1 instruction cache. Each core also has a Level 2 (L2) cache that holds both instructions and data (referred to as a unified cache). There is a Level 3 (L3) cache that is shared by all cores. Due to its size, the L3 cache may be banked, and each bank may be local to one core but remote to the other cores, meaning that it is accessible by all cores but at differing latencies. The typical range of access latencies in 2013 (in terms of CPU clock cycles) and the capacity of each cache are shown in the figure. After the L3 cache, there may be an off-die L4 cache and the main memory.
    Figure 5.1: A memory hierarchy configuration in a multicore system in 2013.
    The example in the figure is similar to the memory hierarchy of the IBM Power8 processor. In the Power8, each core has 4-way simultaneous multithreading (SMT), which means that it can execute four threads simultaneously by fetching from two different program counters. Most of the processor core resources, such as register files and functional units, are shared by the four threads. A Power8 die also has twelve cores, so there are a total of 48 threads that can run simultaneously. Each of the cores has a 32KB L1 instruction cache and a 64KB L1 data cache. Each core also has a private 512KB L2 cache, so in total the L2 caches have 6MB of capacity. Both the L1 and L2 caches use SRAM cells. The L3 cache is 12-way banked, and each bank has an 8MB capacity, for a total of 96MB over all banks. The L4 cache is located off the die on the memory buffer controller, which is connected to the main memory. The L3 and L4 caches are implemented in DRAM on a logic process, a technology referred to as embedded DRAM (eDRAM).