Why AMD and Intel Opted for Shared L3 Cache Over a Single Large L2 Cache

The choice to give each core a private L2 cache backed by an L3 shared among cores, rather than one large L2 shared by all cores, stems from several technical and architectural considerations. Here are the main reasons AMD and Intel opted for a shared L3 cache architecture:

Performance and Latency: L2 Cache Speed

L2 cache is typically several times faster than L3, and part of that speed comes from its small size and proximity to the core: a cache's access latency grows with its physical area and the wire distance signals must travel. A single large L2 shared by many cores would sit farther from most of them and would need to serve many simultaneous requests, so cores would contend for its ports and bandwidth, and every access would pay higher latency. Keeping L2 small and private preserves its speed for the core it serves.

Scalability: Core Count and Cache Coherency

As the number of cores increases, a single shared L2 becomes a bottleneck: every core's working set competes for the same structure, and the interconnect to it must fan out to all cores. Giving each core a private L2 backed by a shared L3 scales better, since the L3 centralizes shared data while private L2s keep most traffic local. It also simplifies cache coherency. With private L2s, the shared L3 can serve as the common point of coherence; in many designs it is inclusive of the private caches, or is paired with a snoop filter or directory, so a core can learn whether another core holds a cache line by checking the L3 rather than broadcasting a query to every peer.

Cost and Complexity: Silicon Real Estate and Design Complexity

Cache SRAM is expensive in both silicon area and power, and a large L2 with enough ports and interconnect to serve many cores would be costlier still. Smaller private L2 caches per core, combined with a single centralized L3, make better use of die area and power budget. The split also keeps the design tractable: each private L2 only has to serve one core, while the L3 handles the cross-core cases with a single set of management and replacement policies.

Access Patterns and Workloads: Data Sharing and Locality of Reference

In many workloads, cores share data, and a shared L3 lets one core reuse data another core has already fetched, avoiding repeated trips to slower main memory. The hierarchy also exploits the principle of locality: the hottest data stays in each core's fast private L1 and L2 caches, while warmer, shared, or recently evicted data resides in the larger shared L3.

Conclusion

In summary, AMD and Intel's decision to implement a shared L3 cache architecture is a result of balancing performance, scalability, manufacturing costs, and complexity. This design allows for efficient data access patterns, minimizes latency issues, and simplifies cache coherency, making it a practical choice for modern multi-core processors.

Keywords: L3 Cache, CPU Architecture, Cache Coherency