Everything You Should Know About Caching


Designing high-performance computing systems requires working within the constraints of the memory hierarchy, which separates computer storage devices into levels based on response time. Aron Brand, CTO, CTERA, discusses what a cache is, how it works, and how it can help organizations ride data trends.

Since response time and capacity tend to trade off against each other, higher-capacity devices almost always have higher access latency. Take, for example, a cloud storage system such as Amazon S3. S3 offers practically limitless capacity, but accessing an S3 object may be millions of times slower, in terms of latency, than accessing data in local RAM.

When thinking about maximizing the performance of these high-capacity, high-latency storage devices, it is important to consider the concept of access locality. The observation is that in most software workloads, “warm” data that was accessed recently is more likely to be needed in the near future than “cold” data that has not been accessed for a while. Caching is a technique that exploits access locality to reduce the latency of accessing high-capacity storage devices: a small, low-latency memory device serves as a “cache” for warm data, reducing the number of accesses to the large, high-capacity device.

What Is a Cache?

A cache is a small, fast storage space that stores frequently accessed data. It is typically used to improve the performance of accessing information. When a user tries to access a data element such as an object, file or record, the cache checks to see if the data is already stored in the cache. If it is, the data is retrieved from the cache, which is much faster than accessing the data from its original location. If the data is not in the cache, it is retrieved from its original location and stored in the cache for future access.
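To make this flow concrete, here is a minimal sketch in Python. The dictionary cache and the `slow_fetch` stand-in for the data’s original location are illustrative assumptions, not a description of any particular product:

```python
# Minimal sketch of a cache lookup: check the cache first, fall back to the
# slow original location on a miss, and store the result for future access.
import time

def slow_fetch(key):
    time.sleep(0.1)              # simulate the latency of the original location
    return f"value-of-{key}"

cache = {}                       # small, fast storage for frequently accessed data

def get(key):
    if key in cache:             # cache hit: served at memory speed
        return cache[key]
    value = slow_fetch(key)      # cache miss: pay the latency once
    cache[key] = value           # keep a copy for future access
    return value

get("report.pdf")                # slow: fetched from the original location
get("report.pdf")                # fast: served from the cache
```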

Caches are used in many different areas to improve performance. For example, web browsers use caches to store frequently accessed web pages so they can be quickly retrieved the next time they are needed. Operating systems use caches to store often-used files and data so that they can be quickly accessed when needed.

In a hybrid cloud environment, a cache can be used to store frequently accessed data from a cloud storage system at an edge location near the users. This can improve performance by reducing the need to access data from the public cloud over Internet links with much higher latency and lower bandwidth than a private LAN connection.

In this context, the cache has another benefit: improved reliability. If the link to the cloud goes down, cached data remains available and can still be accessed locally.

How Does a Cache Work?

Since the cache is, by definition, smaller than the cached dataset, it is sometimes necessary to evict data from the cache to make room for other data. Cache eviction is the process of removing data from the cache when it is no longer needed or when space is needed for new data. Several algorithms can be used to determine which data to evict; they take into account factors such as how often the data is accessed, when it was last accessed, and how much space is available in the cache.
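The article does not name a specific algorithm, but a common example is least-recently-used (LRU) eviction, which removes the entry that has gone longest without being accessed. A minimal sketch, assuming a fixed capacity and a caller-supplied fetch function:

```python
# Minimal LRU (least-recently-used) eviction sketch using OrderedDict.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()            # ordered from least to most recently used

    def get(self, key, fetch):
        if key in self.data:
            self.data.move_to_end(key)       # mark as recently used
            return self.data[key]
        value = fetch(key)                   # cache miss: go to the origin
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)    # evict the least recently used entry
        return value
```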

Can Caches also Accelerate Writes?

It depends.

“Write-back” caches can accelerate writes by storing data in the cache and writing it back to the underlying storage system in the background. This can dramatically improve responsiveness and user experience, as a write is considered successful as soon as the data is stored in the cache. In contrast, “write-through” caches write data to the underlying storage system and the cache simultaneously, which can be much slower.
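A rough sketch of the two policies; the in-memory `backing_store` and the explicit `flush()` call are simplifications of a real storage system and its background writer:

```python
# Sketch contrasting write-through and write-back policies.
backing_store = {}                      # stands in for the slower underlying storage

class WriteThroughCache:
    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value
        backing_store[key] = value      # write completes only after the slow store is updated

class WriteBackCache:
    def __init__(self):
        self.data = {}
        self.dirty = set()

    def put(self, key, value):
        self.data[key] = value
        self.dirty.add(key)             # acknowledged immediately; written back later

    def flush(self):
        for key in self.dirty:          # background write-back to the slow store
            backing_store[key] = self.data[key]
        self.dirty.clear()
```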

What are the Important Metrics for Monitoring a Cache?

When accessing data from a cache, the operation may hit or miss. A cache hit occurs when the data a user wants to access is already stored in the cache and can therefore be served at local speeds. A cache miss occurs when the data is not in the cache and must be retrieved from the slower origin. The cache hit rate is the percentage of accesses in which the data is found in the cache. Monitoring hits, misses, and the hit rate is important because it helps determine whether the cache is working as expected. If the hit rate is too low, it may indicate that the cache is not being used effectively and needs to be reconfigured.
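A small sketch of how such metrics might be tracked; the class and counter names are illustrative:

```python
# Sketch of tracking cache hits, misses, and the hit rate.
class CacheStats:
    def __init__(self):
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0   # e.g. 0.9 means 90% of reads were served locally
```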

What if I Want Certain Data to Always be Retained in the Cache?

Cache pinning is a technique that can be used to ensure that certain data is always retained in the cache. This is useful for data that is accessed frequently or is mission-critical. When data is pinned in the cache, it is never evicted, even when the cache is full.
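Building on the LRU sketch above, pinning can be modeled by skipping pinned keys when choosing an eviction victim; the names here are illustrative rather than any vendor’s API:

```python
# Sketch of cache pinning on top of the LRU example: pinned keys are never evicted.
class PinnableLRUCache(LRUCache):
    def __init__(self, capacity):
        super().__init__(capacity)
        self.pinned = set()

    def pin(self, key):
        self.pinned.add(key)                 # keep this key in the cache permanently

    def get(self, key, fetch):
        if key in self.data:
            self.data.move_to_end(key)
            return self.data[key]
        value = fetch(key)
        self.data[key] = value
        while len(self.data) > self.capacity:
            victim = next((k for k in self.data if k not in self.pinned), None)
            if victim is None:               # everything remaining is pinned
                break
            del self.data[victim]
        return value
```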

Are there Any Drawbacks to Using a Cache?

One drawback of using a cache is that it adds cost and complexity to the system: caches must be carefully designed and managed to maximize their effectiveness. Some workloads also exhibit poor access locality and are not well-suited for caching. Consider, for example, an application that performs bulk processing of the entire dataset. Such a traversal retrieves the entire corpus of data, and each portion is accessed only once, so the cache provides no benefit. In fact, the situation is even worse: the traversal “pollutes” the cache with data that is unlikely to be reused, making it ineffective for other users trying to access their data. To address this, intelligent caching systems may introduce fairness policies that prevent a single user from degrading the performance of others. In addition, state-of-the-art caching algorithms may use heuristics or machine learning techniques to detect and avoid caching data that is not likely to be reused.
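As a simple illustration of such a heuristic (a sketch only, not how any particular system implements it), reads flagged as part of a bulk scan can bypass the cache so they neither evict nor pollute warm data:

```python
# Sketch of protecting the cache from bulk scans: traversal reads are not admitted.
def get(key, fetch, cache, is_bulk_scan=False):
    if key in cache:
        return cache[key]
    value = fetch(key)
    if not is_bulk_scan:        # one-off scan data is unlikely to be reused
        cache[key] = value      # only admit data that may exhibit access locality
    return value
```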


Are Caching and Tiering the Same Thing?

No. The fundamental difference is that caching stores a COPY of the frequently accessed data in fast, nearby storage while ALL the data remains at the origin. Tiering, in contrast, divides the data: frequently accessed data is stored only on the “fast” storage tier, and infrequently accessed data is stored only on the “capacity” storage tier.
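The contrast can be sketched in a few lines; `fast_store` and `capacity_store` are illustrative placeholders, not a real product’s API:

```python
# Caching keeps a copy of warm data; tiering moves data so it lives on exactly one tier.
fast_store, capacity_store = {}, {}

def cache_promote(key, value):
    capacity_store[key] = value                  # the origin keeps ALL the data
    fast_store[key] = value                      # the cache holds a COPY of warm data

def tier_promote(key):
    fast_store[key] = capacity_store.pop(key)    # data is MOVED to the fast tier, not copied

def tier_demote(key):
    capacity_store[key] = fast_store.pop(key)    # cold data moves down to the capacity tier
```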

