Cache has been introduced into many graphics processing units (GPUs) to decrease the frequency of data transfer between high-performance computing units and low-speed, long-latency external memory. The traditional index mapping scheme, designed originally for CPU caches, exploits only the spatial locality in address space. Access to graphics data, however, always has region locality on the frame buffer: there is high spatial locality in both the X and Y directions.

The cache memory has a direct effect on the performance of a computer system. Instructions and data are fetched from a fast cache instead of a slow memory to save hundreds of cycles. Reducing the cache miss ratio therefore improves the execution time of an application. In this work, we propose cache memory designs that significantly reduce the number of conflict misses in the last-level multi-way set-associative cache. Each set is divided into a group of subsets: the first is referred to as the exclusive subset, and the rest are the shared subsets. The exclusive subset is configured as a traditional cache, where each block is mapped to the set whose index matches the block index. In addition to their standard cache indexing role, the shared subsets are configured to host blocks with different indices. A memory block can be mapped to one subset of the exclusive type or to one of multiple subsets of the shared type. The decision as to where to map a memory block depends on the number of misses encountered at each of the potential target sets. Since the proposed technique is based on combining multiple sets of the shared part to form a larger set that is shared between memory blocks with different indices, we have chosen the name "set folding." To evaluate the proposed design based on the overall hit rate, twenty-three benchmarks from SPEC CPU 2006 were simulated using the SuperESCalar simulator. The proposed designs require a few extra storage bits, which adds a small overhead to the hardware complexity in comparison with the conventional cache. However, the proposed designs achieve lower miss rates for most of the benchmarks.
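
To make the set folding idea more concrete, the following is a minimal Python sketch of a cache whose sets are split into an exclusive subset and a shared subset. It is an illustration under stated assumptions, not the authors' design: the class name `SetFoldingCache`, the `fold` parameter, the stride-based grouping of sets, the LRU replacement, the preference for free slots, and the rule "evict from the candidate set with the fewest recorded misses" are all hypothetical choices made here. The abstract only states that placement depends on the miss counts of the potential target sets.

```python
from collections import OrderedDict


class SetFoldingCache:
    """Toy model: each set has an exclusive subset and a shared subset.

    Assumptions (not from the source): `fold` sets at a fixed stride pool
    their shared subsets, replacement is LRU, and exclusive_ways and
    shared_ways are both at least 1.
    """

    def __init__(self, num_sets=4, exclusive_ways=1, shared_ways=1, fold=2):
        self.num_sets = num_sets
        self.exclusive_ways = exclusive_ways
        self.shared_ways = shared_ways
        self.fold = fold
        # OrderedDict keeps blocks in LRU order (oldest first).
        self.exclusive = [OrderedDict() for _ in range(num_sets)]
        self.shared = [OrderedDict() for _ in range(num_sets)]
        self.misses = [0] * num_sets  # per-set miss counters
        self.hits = 0
        self.total_misses = 0

    def _group(self, index):
        # Hypothetical folding rule: sets that are `stride` apart share
        # their shared subsets. The home set always comes first.
        stride = self.num_sets // self.fold
        return [(index + k * stride) % self.num_sets for k in range(self.fold)]

    def access(self, block):
        """Return True on a hit, False on a miss (the block is then placed)."""
        home = block % self.num_sets
        group = self._group(home)
        # Look in the home set's exclusive subset first.
        if block in self.exclusive[home]:
            self.exclusive[home].move_to_end(block)
            self.hits += 1
            return True
        # Then look in every shared subset of the folded group.
        for s in group:
            if block in self.shared[s]:
                self.shared[s].move_to_end(block)
                self.hits += 1
                return True
        self.misses[home] += 1
        self.total_misses += 1
        self._place(block, home, group)
        return False

    def _place(self, block, home, group):
        # Use a free slot if one exists: exclusive subset first, then the
        # shared subsets of the group (home set first).
        if len(self.exclusive[home]) < self.exclusive_ways:
            self.exclusive[home][block] = True
            return
        for s in group:
            if len(self.shared[s]) < self.shared_ways:
                self.shared[s][block] = True
                return
        # Everything is full: pick the candidate set with the fewest misses.
        target = min(group, key=lambda s: self.misses[s])
        subset = self.exclusive[home] if target == home else self.shared[target]
        subset.popitem(last=False)  # evict the least recently used block
        subset[block] = True


if __name__ == "__main__":
    cache = SetFoldingCache(num_sets=4, exclusive_ways=1, shared_ways=1, fold=2)
    for b in [0, 4, 8, 0, 4, 8]:  # three blocks that all index set 0
        cache.access(b)
    print("misses:", cache.total_misses, "hits:", cache.hits)
```

In this toy run, blocks 0, 4, and 8 all index set 0. The third block is placed in the shared subset of set 2, so the second pass over the three blocks hits every time (3 misses, 3 hits). A conventional 2-way LRU cache of the same size would miss on all six accesses, which is the kind of conflict miss the folding is meant to absorb.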