             <<< EISNER::$2$DIA6:[NOTES$READONLY]MICRONOTE.NOTE;1 >>>
                              -< TOEM MicroNotes >-
================================================================================
Note 9.0                  LSI-11/73 Cache Concepts                    No replies
JAWS::KAISER                                       307 lines  25-MAR-1985 09:18
--------------------------------------------------------------------------------

    +---------------+                                  +-----------------+
    | d i g i t a l |                                  |   uNOTE # 009   |
    +---------------+                                  +-----------------+

    +----------------------------------------------------+-----------------+
    | Title: Cache Concepts and the LSI-11/73            | Date: 02-JUL-84 |
    +----------------------------------------------------+-----------------+
    | Originator: Charlie Giorgetti                      | Page 1 of 6     |
    +----------------------------------------------------+-----------------+


    The goal of this note is to introduce the concept of a cache and its
    particular implementation on the LSI-11/73 (KDJ11-A).  This is not a
    detailed discussion of the different cache organizations and their
    impact on system performance.


    What Is A Cache?
    ----------------

    The purpose of having a cache is to simulate a system having a large
    amount of moderately fast memory.  To do this the cache system relies
    on a small amount of very fast, easily accessed memory (the cache), a
    larger amount of slower, less expensive memory (the backing store),
    and the statistics of program behavior.  The goal is to store some of
    the data and its associated addresses in the cache, and all of the
    data at its usual addresses (including the currently cached data) in
    the backing store.  If it can be arranged that most of the time when
    the processor needs data it is located in fast memory, then the
    program will execute more quickly, slowing down only occasionally for
    main memory operations.

    The placement of data in the cache should not be a concern to the
    programmer but is a consequence of how the cache functions.  Figure 1
    is an example of a memory organization showing a cache with backing
    store.
    If the data needed by the microprocessor (uP) can be found in the
    cache, then it is accessed much faster, due to the local data path
    and the faster cache memory, than by having to access the backing
    store over the slower system bus.

    +------+       System Bus          +-----------+
    |      |<-----------------+------->|  System   |  CPU Internal Buses
    |  uP  |                  |        |    Bus    |<-------------------->
    |      |<----------+      |        | Interface |  For Memory and
    +------+           |      |        +-----------+  I/O Options
                       |      |
       Fast Path       |      |        +-----------------+
       to Cache        |      |        |  System Memory  |
                   +---+---+  +------->| (Backing Store) |
                   | Cache |           |                 |
                   +-------+           +-----------------+

              Figure 1 - An Example Memory System with Cache

    A cache memory system can only work if it can successfully predict,
    most of the time, what memory locations the program will require.  If
    a program accessed data from memory in a completely random fashion,
    it would be impossible to predict what data would be needed next.  If
    this were the case a cache would operate no better than a
    conventional memory system.

    Programs, however, rarely generate random addresses.  In many cases
    the next memory address referenced is very near the current address
    accessed.  This is the principle of program locality: the next
    address generated is in the neighborhood of the current address.
    This behavior helps make cache systems feasible.

    The principle of program locality is not a law but a statement of how
    many programs behave.  Many programs execute code in a linear fashion
    or in loops, with predictable results in next-address generation.
    Jumps and context switching, on the other hand, give the appearance
    of random address generation.  The ability to determine what word a
    program will reference next is never completely successful, and
    therefore the rate of correct "guesses" is a statistical measure of
    the size and organization of the cache and the behavior of the
    program being executed.
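    The payoff of program locality can be seen with a small model that
    simply counts how often word references are found in a direct map
    cache of 4096 entries (the size used by the LSI-11/73).  The sketch
    below is hypothetical illustration code, not DEC software; the
    function `run` and its structure are invented here.

```python
# A minimal direct map cache model (hypothetical sketch, not DEC code).
# Entry count follows the LSI-11/73: 4096 entries, one 16-bit word each.

ENTRIES = 4096

def run(addresses):
    """Count how many word references are found in the cache."""
    tags = {}          # entry index -> tag of the word cached there
    hits = 0
    for addr in addresses:
        index = (addr >> 1) % ENTRIES   # word address selects an entry
        tag = addr >> 13                # remaining high-order bits
        if tags.get(index) == tag:
            hits += 1
        else:
            tags[index] = tag           # allocate the entry on a miss
    return hits

# A loop that re-reads the same 100 words shows locality at work: only
# the first pass misses, the other nine passes hit.
loop = [a * 2 for a in range(100)] * 10
print(run(loop))    # prints 900 (of 1000 references)
```

    A purely random address stream, by contrast, would drive the count
    toward zero, which is the behavior described above.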
    The measure of cache performance is a statistical evaluation of the
    number of memory references found versus not found in the cache.
    When memory is referenced and the address is found in the cache, this
    is known as a hit.  When it is not, it is termed a miss.  Cache
    performance is usually stated in terms of the hit ratio or the miss
    ratio, where these are defined as:

                          Number of Cache Hits
        Hit Ratio = ---------------------------------
                    Total Number of Memory References

        Miss Ratio = 1 - Hit Ratio


    The LSI-11/73 Cache Implementation
    ----------------------------------

    The cache organization chosen must be one that can be implemented
    within the physical and cost constraints of the design.  The
    LSI-11/73 implements a direct map cache.  A direct map organization
    has a single unique cache location for a given address, and this is
    where the associated data from the backing store are maintained.
    This means an access to the cache requires only one address
    comparison to determine if there is a hit.  The significance of this
    is that only a small amount of circuitry is required to perform the
    comparison operation.

    The LSI-11/73 has an 8 KByte cache.  This means that there are 4096
    unique address locations, each of which stores two bytes of
    information.

    The cache not only maintains the data from the backing store but also
    includes other information that is needed to determine if its content
    is valid: parity detection and valid entry checking.  The following
    diagram shows the logical layout of the cache and what each field and
    its associated address in the cache is used for.

    Binary Cache
    Entry Address     P   V      TAG      P1     B1     P0     B0
                    +---+---+-------------+---+----------+---+----------+
    000000000000    |   |   |             |   |          |   |          |
                    +---+---+-------------+---+----------+---+----------+
    000000000001    |   |   |             |   |          |   |          |
                    +---+---+-------------+---+----------+---+----------+
    000000000010    |   |   |             |   |          |   |          |
                    +---+---+-------------+---+----------+---+----------+
          .                               .
          .                               .
          .                               .
                    +---+---+-------------+---+----------+---+----------+
    111111111101    |   |   |             |   |          |   |          |
                    +---+---+-------------+---+----------+---+----------+
    111111111110    |   |   |             |   |          |   |          |
                    +---+---+-------------+---+----------+---+----------+
    111111111111    |   |   |             |   |          |   |          |
                    +---+---+-------------+---+----------+---+----------+

                      Figure 2 - LSI-11/73 Cache Layout

    The Cache Entry Address is the address of one of the 4096 entries
    within the cache.  This value has a one-to-one relationship with a
    field in each address generated by the processor (described in the
    next section on how the physical address accesses the cache).  Each
    field has the following meaning:

    Tag (TAG) - This nine-bit field contains information that is compared
    to the address label, described in the next section.  When the
    physical address is generated, the address label is compared to the
    tag field.  If there is a match, it can be considered a hit provided
    that the entry is valid and there are no parity errors.

    Cache Data (B0 and B1) - These two bytes are the actual data stored
    in the cache.

    Valid Bit (V) - The valid bit indicates whether the information in B0
    and B1 is usable as data if a cache hit occurs.  The valid bit is set
    when the entry is allocated during a cache update, which occurs as a
    result of a miss.

    Tag Parity Bit (P) - Even parity calculated for the value stored in
    the tag field.

    Parity Bits (P0 and P1) - P0 is even parity calculated for the data
    byte B0, and P1 is odd parity calculated for the data byte B1.

    When the processor generates a physical address, the on-board cache
    control logic must determine if there is a hit by looking at the
    unique location in the cache.  To determine what location to check,
    the cache control logic considers each address generated as being
    made up of three unique parts.
    The following are the three fields of a 22-bit address (in an
    unmapped or 18-bit environment the label field is six or four bits
    smaller, respectively):

     21 20 19 18 17 16 15 14 13   12 11 10 09 08 07 06 05 04 03 02 01   00
    +--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+ +--+
    |  |  |  |  |  |  |  |  |  | |  |  |  |  |  |  |  |  |  |  |  |  | |  |
    +--+--+--+--+--+--+--+--+--+ +--+--+--+--+--+--+--+--+--+--+--+--+ +--+
    |<-------- LABEL --------->| |<-------------- INDEX ------------>| BYTE
                                                                     SELECT

     Figure 3 - Components of a 22-bit Address For Cache Address Selection

    Each field has the following meaning:

    Index - This twelve-bit field determines which one of the 4096 cache
    data entries to compare with for a cache hit.  The index field is the
    displacement into the cache and corresponds to the Cache Entry
    Address.

    Label - Once the location in the cache is selected, the nine-bit
    label field is compared to the tag field stored in the cache entry
    under consideration.  If the address label and the tag field match,
    the valid bit is set, and there is no parity error, then a hit has
    occurred.

    Byte Select Bit - This bit determines if the reference is on an odd
    or even byte boundary.  All Q-bus reads are word-only, so this bit
    has no effect on a cache read.  Q-bus writes can access either words
    or bytes.  On a word write the cache will be updated if there is a
    hit; if there is a miss, a new cache entry will be made.  On a byte
    write the cache will only be updated if there is a hit; a miss will
    not create a new entry.

    The LSI-11/73 direct map cache must update the backing store on a
    memory write.  The LSI-11/73 uses the write-through method.  With
    this technique, writes to the backing store occur concurrently with
    cache writes.  The result is that the backing store always contains
    the same data as the cache.
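    The address decomposition above comes down to a few bit operations.
    The sketch below is illustrative only (the function `cache_fields`
    is an invented name, not part of the KDJ11-A documentation); it
    follows the 22-bit field layout described above, using octal in the
    PDP-11 tradition.

```python
# Decomposing a 22-bit physical address into the three cache fields
# (a hypothetical sketch following the field layout in this note).

def cache_fields(addr):
    byte_select = addr & 0o1            # bit  0      : odd/even byte
    index = (addr >> 1) & 0o7777        # bits 12..1  : one of 4096 entries
    label = (addr >> 13) & 0o777        # bits 21..13 : compared with the tag
    return label, index, byte_select

# Example: physical address 17777777 (octal) is all ones in 22 bits.
print(cache_fields(0o17777777))   # prints (511, 4095, 1)
```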
    Features Of The LSI-11/73 Cache
    -------------------------------

    The LSI-11/73 direct map cache has a number of features that assist
    overall system performance, in addition to the speed enhancement that
    results from faster memory access.  These features are the following:

        o  Q-bus DMA monitoring
        o  I/O page reference monitoring
        o  Memory management control of cache access
        o  Program control of cache parameters
        o  Statistical monitoring of cache performance

    The LSI-11/73 cache control logic monitors the Q-bus during DMA
    transactions.  When an address that has its data stored in the cache
    is accessed during DMA, the cache and backing store contents might no
    longer be the same.  This is an unacceptable situation, so the cache
    control logic invalidates a cache entry if its address is used during
    DMA.  This also includes addresses used during Q-bus Block Mode DMA
    transfers.

    Memory references to the I/O page are not cached, since that data is
    volatile, meaning its contents can change without a Q-bus access.
    Because the cache could end up with stale data, I/O page references
    are never cached.

    There are other situations in which using the cache to store
    information for faster access is not desirable.  One is a device that
    does not reside in the I/O page but can change its contents without a
    bus reference, such as dual-ported memory.  Another is partitioning
    and tuning an application for instruction code execution versus the
    data being manipulated.  In this case the instruction stream may
    execute many times over for different data values.  A speed
    enhancement can be obtained if the instructions are cached while the
    data is not: by forcing the data never to be cached, it cannot
    replace instructions in the cache.  The memory management unit (MMU)
    of the LSI-11/73 can assist in this situation.
    Pages of memory allocated for data can be marked to bypass the cache
    and therefore not affect instructions that loop many times.  The
    cache and the MMU work together to achieve the goal of increased
    system performance.

    The dynamics of cache operation are under program control through use
    of the Cache Control Register (CCR), an LSI-11/73 on-board register.
    This register can "turn" the cache on or off, force cache parity
    errors for diagnostic testing, and invalidate all cache entries.  The
    details of the CCR are described in the KDJ11-A CPU Module User's
    Guide (part number EK-KDJ1A-UG-001).

    During system design, or at run time, the performance enhancement
    provided by the cache system can be monitored under program control.
    This is accomplished by using another LSI-11/73 on-board register,
    the Hit/Miss Register (HMR).  This register tracks the last six
    memory references and indicates whether a hit or miss took place.
    The details of the HMR are also described in the KDJ11-A CPU Module
    User's Guide.


    Summary
    -------

    A cache is a mechanism that can help improve overall system
    performance.  The dynamics of a given cache are dictated by its
    organization and by the behavior of the programs running on the
    machine.  The LSI-11/73 cache is designed to be flexible in its use,
    simple in its implementation, and to enhance application performance.

    More detailed discussions of how caches work and of other cache
    organizations can be found in computer architecture texts that cover
    the memory hierarchy.
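    As a closing sketch, the update rules described in this note can be
    drawn together into one small model: write-through on every write,
    allocation on word-write misses, no allocation on byte-write misses,
    and invalidation of entries touched by DMA.  This is hypothetical
    illustration code (the class `DirectMappedCache` and all of its names
    are invented here), not the KDJ11-A control logic, and it omits the
    tag and data parity bits.

```python
# A sketch of the LSI-11/73 update rules as described in this note
# (hypothetical code, not the KDJ11-A logic; parity bits omitted).

ENTRIES = 4096

class DirectMappedCache:
    def __init__(self):
        self.valid = [False] * ENTRIES
        self.tag = [0] * ENTRIES
        self.data = [0] * ENTRIES       # one 16-bit word per entry
        self.memory = {}                # backing store, word-addressed

    def _split(self, addr):
        return addr >> 13, (addr >> 1) % ENTRIES

    def write_word(self, addr, value):
        tag, index = self._split(addr)
        self.memory[addr >> 1] = value          # write through, always
        self.valid[index] = True                # a hit updates, a miss allocates
        self.tag[index] = tag
        self.data[index] = value

    def write_byte(self, addr, value):
        tag, index = self._split(addr)
        shift = 8 * (addr & 1)                  # byte select bit
        mask = 0xFF00 >> shift                  # keep the other byte
        word = self.memory.get(addr >> 1, 0)
        self.memory[addr >> 1] = (word & mask) | (value << shift)  # write through
        if self.valid[index] and self.tag[index] == tag:
            self.data[index] = (self.data[index] & mask) | (value << shift)
        # a byte-write miss does not allocate a new cache entry

    def dma_write(self, addr, value):
        tag, index = self._split(addr)
        self.memory[addr >> 1] = value
        if self.valid[index] and self.tag[index] == tag:
            self.valid[index] = False           # invalidate the stale entry

    def read_word(self, addr):
        tag, index = self._split(addr)
        if self.valid[index] and self.tag[index] == tag:
            return self.data[index]             # hit: fast path
        value = self.memory.get(addr >> 1, 0)   # miss: go to backing store
        self.valid[index], self.tag[index], self.data[index] = True, tag, value
        return value

cache = DirectMappedCache()
cache.write_word(0o1000, 0o123456)
cache.dma_write(0o1000, 0o165432)       # DMA changes memory behind the cache
print(oct(cache.read_word(0o1000)))     # prints 0o165432: the stale entry
                                        # was invalidated and re-read
```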