A Practical Guide to Semantic Caching With Redis LangCache

By Admin
January 6, 2026


A semantic cache is an advanced caching mechanism that differs from traditional caching, which relies on exact keyword matching: it stores and retrieves data based on semantic similarity. Redis LangCache is a fully hosted semantic caching service that caches LLM prompts and responses semantically, thereby reducing LLM usage costs.

In this tutorial, let's quickly build a simple application that uses LangCache to cache LLM queries, and then see whether combining it with fuzzy matching can improve the results.

Step 1: Set Up Redis LangCache

If you don't have an account yet, create one at https://cloud.redis.io/. Once you have logged in:

  1. Navigate to Databases and create a new database (I used the free plan here).
  2. Click LangCache in the left menu and create an instance of the LangCache service. I used the "Quick service creation" option to create the LangCache service.
  3. Copy the API key and keep it safe.
  4. Click the LangCache service you just created. When you click the "Connect" button, a quick-connect guide appears on the right with examples of how to connect to your LangCache instance.
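Before writing any code, it helps to keep these credentials out of the source. A minimal sketch; the environment variable names here are my own choice, not mandated by the service:

import os

# Load the LangCache credentials from the environment
# (variable names are arbitrary; set them however you deploy).
LANGCACHE_SERVER_URL = os.environ["LANGCACHE_SERVER_URL"]
LANGCACHE_CACHE_ID = os.environ["LANGCACHE_CACHE_ID"]
LANGCACHE_API_KEY = os.environ["LANGCACHE_API_KEY"]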

Step 2: Create a Simple Python Script

Let's create a simple Python script that first checks the cache for a prompt. If a match is found, it returns the cached response. If not, it sends the prompt to the LLM, caches the response, and returns it.

Connect to LangCache (using the langcache Python SDK):

from langcache import LangCache

lang_cache = LangCache(
    server_url=LANGCACHE_SERVER_URL,
    cache_id=LANGCACHE_CACHE_ID,
    service_key=LANGCACHE_API_KEY,  # the API key copied in Step 1
)

Search the cache before sending the prompt to the LLM:

result = lang_cache.search(
    prompt=query,
    similarity_threshold=0.90,
)

The similarity_threshold determines how closely a cached prompt must match the query. A higher value means stricter matching.

Handle a cache hit:

if result and result.data:
    for entry in result.data:
        print("Cache Hit!")
        print("Cache Response:::")
        print(f"Prompt: {entry.prompt}")
        print(f"Response: {entry.response}")
        print(f"Score: {entry.similarity}")
        return

Handle a cache miss and store the response:

# Calling the LLM here
response = requests.post(url, json=payload, headers=headers)
response_json = response.json()
response_text = response_json["choices"][0]["message"]["content"]

# --- Storing the response from the LLM in LangCache ---
save_response = lang_cache.set(
    prompt=query,
    response=response_text,
)
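Putting the pieces together, here is a minimal end-to-end sketch. It assumes an OpenAI-style chat-completions endpoint; the URL, model name, and header layout are stand-ins to replace with your provider's details.

import os

import requests
from langcache import LangCache

lang_cache = LangCache(
    server_url=os.environ["LANGCACHE_SERVER_URL"],
    cache_id=os.environ["LANGCACHE_CACHE_ID"],
    service_key=os.environ["LANGCACHE_API_KEY"],
)

def answer(query: str) -> str:
    # 1. Check the semantic cache first.
    result = lang_cache.search(prompt=query, similarity_threshold=0.90)
    if result and result.data:
        entry = result.data[0]
        print(f"Cache Hit! Score: {entry.similarity}")
        return entry.response

    # 2. Cache miss: call the LLM (OpenAI-style endpoint as a stand-in).
    print("Cache Miss! Redirecting to LLM")
    payload = {
        "model": "gpt-4o-mini",  # stand-in model name
        "messages": [{"role": "user", "content": query}],
    }
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    resp = requests.post("https://api.openai.com/v1/chat/completions",
                         json=payload, headers=headers)
    response_text = resp.json()["choices"][0]["message"]["content"]

    # 3. Store the new prompt/response pair for future lookups.
    lang_cache.set(prompt=query, response=response_text)
    return response_text

print(answer("Brief history on Capital of France"))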

Benefits of Semantic Caching

  • For similar queries, responses are fetched from the cache, avoiding expensive LLM calls.
  • Faster response times.

Things to Watch Out For

  • Similarity threshold: Set it thoughtfully. Too high and you'll miss useful matches; too low and you'll get irrelevant results.
  • Accuracy: Even with an optimal threshold, results may not always be perfect.
  • Data privacy: In multi-tenant architectures, ensure proper data partitioning so users don't see each other's information.
  • Cache eviction: Know when and how to evict cache entries.

Let’s Run the Code

First Query (Cache Is Empty)

Query: Brief history on Capital of France
Response: 
    Cache Miss!
    Redirecting to LLM

The query and the response are now stored in the cache.

Modified Query

Query:  ----Brief history on Paris---
Response: 
    Cache Hit!
    Cache Response:::
    Prompt: Brief history on Capital of France
    Score: 0.92440444

Although the query changed, the cache returned a semantically similar result with a similarity score of 0.92.

Another Variation

Query:  ----Brief history on France---
Response: 
    Cache Hit!
    Cache Response:::
    Prompt: Brief history on Capital of France
    Score: 0.9121176

Oops! We asked for the history of France, but we got the history of the capital of France. Though Paris plays a major role in France's history, the context is different. One is a city, and the other is a country!

Tuning the Threshold

Let's increase the threshold to 0.92 and clear the cache.

Query:  ----Brief history on Capital of France---
Response:
    Cache Miss!
    LLM Response: 
#####################################

Query:  ----Brief history on Paris---
Response:
    Cache Hit!
    Cache Response:::
        Prompt: Brief history on Capital of France
        Score: 0.92440444
#####################################

Query:  ----Brief history on France---
Response:
    Cache Miss!
    LLM Response: 

It seems to be working better!

Performance Comparison

Let's compare the time it takes to serve a query from the cache versus from the LLM.
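Timings like those in the table below can be captured with a simple stopwatch around each call; a minimal sketch using time.perf_counter() and the answer() helper sketched earlier:

import time

def timed(query: str) -> None:
    # Wall-clock time for one lookup (cache hit or full LLM round trip).
    start = time.perf_counter()
    answer(query)
    print(f"Time: {time.perf_counter() - start:.4f} seconds")

timed("Brief history on Paris")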

Semantic Cache Results (Sorted by Time)

| Query                                   | Result     | Matched Prompt                          | Similarity Score | Response Source | Time (seconds) |
|-----------------------------------------|------------|-----------------------------------------|------------------|-----------------|----------------|
| Brief history on the Capital of France  | Cache Miss | –                                        | –                | LLM             | 0.8499         |
| Brief history on Paris                  | Cache Hit  | Brief history on the Capital of France   | 0.9244           | Cache           | 0.2705         |
| Brief history on France                 | Cache Miss | –                                        | –                | LLM             | 1.2543         |
| Brief history on the Capital of France  | Cache Hit  | Brief history on the Capital of France   | 1.0              | Cache           | 1.1139         |
| Brief history on Paris                  | Cache Hit  | Brief history on France                  | 0.9386           | Cache           | 0.2761         |
| Brief history on France                 | Cache Hit  | Brief history on France                  | 1.0              | Cache           | 0.2798         |
| Brief history on the Capital of France  | Cache Hit  | Brief history on the Capital of France   | 1.0              | Cache           | 1.0178         |
| Brief history on Paris                  | Cache Hit  | Brief history on France                  | 0.9386           | Cache           | 0.2806         |
| Brief history on France                 | Cache Hit  | Brief history on France                  | 1.0              | Cache           | 0.2778         |
| How does LangCache work? explain…       | Cache Hit  | How does LangCache work? explain…        | 1.0              | Cache           | 0.2930         |

Observations:

  1. Although there are some anomalies, the response from the cache is far quicker, which is apparent.
  2. A excessive similarity threshold is held to reuse the cached response.
  3. For distinct solutions, a better similarity threshold is really helpful.
  4. Primarily based on the question and enterprise necessities, at all times tune and experiment, as outcomes fluctuate with embedding fashions, similarity thresholds, and so forth.

One More Example

Query:  ----How does LangCache work---
Response:
    Cache Miss!
    LLM Response: <Gives a 20-line response>
    Time to get response from LLM API: 0.8144 seconds
----#####################################---
Query:  ----How does LangCache work---
Response:  
    Cache Hit!
    Time for cache lookup: 0.3162 seconds
    Cache Response:::
    Prompt: How does LangCache work
    Response: <Gives the same 20-line response>
    Score: 1.0

All is well so far. Let's modify the query a bit!

----#####################################---
Query:  ----How does LangCache work, explain in 5 lines---
Response:
    Cache Hit!
    Time for cache lookup: 0.2719 seconds
    Cache Response:::
    Prompt: How does LangCache work
    Response: <Gives the same 20-line response>
    Score: 0.9714471

----#####################################---

We asked for a 5-line response but got the cached 20-line one. This highlights the importance of tuning, and of using attributes to scope responses.
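LangCache lets you attach attributes to cached entries and filter searches by them, which is one way to scope responses like this. A sketch under assumptions: the attribute name response_style is my own invention, and attributes generally have to be declared on the cache when it is created.

# Store the long answer tagged with a scope attribute (name is illustrative).
lang_cache.set(
    prompt=query,
    response=response_text,
    attributes={"response_style": "detailed"},
)

# A "5 lines" request then searches only its own scope, so the
# 20-line entry above can never be returned for it.
result = lang_cache.search(
    prompt=query,
    similarity_threshold=0.90,
    attributes={"response_style": "short"},
)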

Semantic Cache vs. Fuzzy Match 

Fuzzy matching works on approximate string matching. It is best suited to handling typos, spelling variants, and near-duplicate strings, while semantic matching operates at the level of meaning and context.

Let's see the difference between them in action by comparing the semantic score (LangCache) with the fuzzy score (Ratcliff–Obershelp algorithm) when matching two strings.
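Python's standard library already implements the Ratcliff–Obershelp measure via difflib.SequenceMatcher, so computing the fuzzy score needs no extra dependency:

from difflib import SequenceMatcher

def fuzzy_score(a: str, b: str) -> float:
    # SequenceMatcher.ratio() implements Ratcliff–Obershelp:
    # 2 * (matching characters) / (total characters in both strings).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(fuzzy_score("How does Semantic cache work?",
                  "What does Semantic cache mean?"))  # roughly 0.79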

Querying for the first time:

Query: Does Semantic cache work?
Response:
    Cache Miss!
    Cache Response:::
    Prompt: How does Semantic cache work?
    Response: Semantic caching stores query results together with their semantic descriptions, enabling new queries to be answered partially or fully by reusing cached data.

Querying the same again; notice that the semantic score and the fuzzy score are quite close.

Query: How does Semantic cache work?
Response:
    Cache Hit!
    Cache Response:::
    Prompt: How does Semantic cache work?
    Response: Semantic caching stores query results together with their semantic descriptions, enabling new queries to be answered partially or fully by reusing cached data.
    Semantic score (LangCache): 0.95
    Fuzzy match score: 0.93

Let's try a few more variations:

Query: Will Semantic cache work?
Response:
    Cache Hit!
    Cache Response:::
    Prompt: How does Semantic cache work?
    Response: Semantic caching stores query results together with their semantic descriptions, enabling new queries to be answered partially or fully by reusing cached data.
    Semantic score (LangCache): 0.94
    Fuzzy match score: 0.83

Query: What does Semantic cache mean?
Response:
    Cache Hit!
    Cache Response:::
    Prompt: How does Semantic cache work?
    Response: Semantic caching stores query results together with their semantic descriptions, enabling new queries to be answered partially or fully by reusing cached data.
    Semantic score (LangCache): 0.94
    Fuzzy match score: 0.79

As you can see, fuzzy matching focuses on "looks alike," while semantic matching focuses on "means alike."

The full implementation of the above is available here.

Fuzzy matching can be combined with the semantic cache in a few ways:

  1. Store the last 'n' prompts and do a fuzzy match against those first. Use the semantic cache only if no fuzzy match is found (sketched after this list).
  2. When a high similarity threshold is used for the semantic cache (e.g., > 0.95), we end up caching prompts on almost every miss, which produces many near-duplicates. We can use fuzzy matching to identify those near-duplicates and store only the prompts that are genuinely different.
  3. If the caching layer contains many near-duplicates, we can use fuzzy matching for compaction.
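A minimal sketch of the first approach, reusing the fuzzy_score() and answer() helpers from earlier; the window size and the 0.95 fuzzy cutoff are arbitrary choices to tune:

from collections import deque

recent = deque(maxlen=50)  # (prompt, response) pairs for the last n queries

def lookup(query: str) -> str:
    # 1. Cheap fuzzy pass over recent prompts (no network call).
    for prompt, response in recent:
        if fuzzy_score(query, prompt) >= 0.95:
            return response

    # 2. No fuzzy match: fall back to the semantic cache.
    result = lang_cache.search(prompt=query, similarity_threshold=0.92)
    if result and result.data:
        return result.data[0].response

    # 3. Still nothing: answer() calls the LLM and populates the
    #    semantic cache; remember the pair locally as well.
    response = answer(query)
    recent.append((query, response))
    return response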

Final Thoughts

When building applications with semantic caching, good results depend on continuous testing and context-aware tuning. Similarity thresholds, prompt patterns, and cache scope should be adjusted to workload behavior and accuracy requirements. Redis LangCache enables fine-grained control through attributes that partition and scope cached responses. Semantic caching becomes even more effective when fuzzy-matching logic is added, striking a balance between accuracy and higher cache hit rates. Combined, these techniques can improve latency, lower LLM costs, and deliver consistent results while maintaining accuracy and relevance.

Happy coding!
