Introduction

Search has been evolving rapidly as new technologies arrive. In today's data-driven world, the ability to search efficiently through vast amounts of information is paramount. Traditional keyword-based search algorithms have served us well, but advances in machine learning have opened up new possibilities. One such breakthrough is neural search, which uses text embeddings and sentence similarity models to create a more intuitive and accurate search experience. In this blog post, we will take a deep dive into neural search and explore how it can take your search workflows from zero to hero.

Step 1. What Is an Embedding and How Does It Work?

In simple terms, an embedding can be thought of as a way to represent words or phrases as points in a contextual space. For example, the word `lead` has a different meaning in each of these two sentences:

The doctor told me that he has lead poisoning.
New information will lead me to believe we will lose the case.

As humans, we can easily understand the contextual difference between these two uses of the word. However, teaching this contextual understanding to computers is challenging. To overcome this, we create a `contextual space` with multiple dimensions, where we map every known word and sentence. By doing this, we can establish a `distance` between different meanings of words, allowing computers to better grasp the nuances of language.

As you can see from the pseudo-example, even though the word is the same, its coordinates in the “contextual space” are vastly different for the two meanings, or in other words, “far” apart.

This basic technique forms the foundation of today's Large Language Models (LLMs). You see, computers don't naturally understand words like we do; they work with numbers. By using text embedding, we can create numerical representations that capture the meaning of words. This allows us to build models that can work with these numbers and process language in a more sophisticated way.
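To make the idea of coordinates and distance concrete, here is a minimal sketch in Python. The three-dimensional coordinates are hand-picked for illustration and do not come from any real model (real embeddings usually have hundreds of dimensions), and the names `toy_embeddings` and `euclidean_distance` are hypothetical helpers, not part of any library.

```python
import math

# Hypothetical coordinates, invented for illustration only.
# A real embedding model would learn these values from data.
toy_embeddings = {
    "lead (metal, as in lead poisoning)": [0.9, 0.1, 0.0],
    "lead (verb, as in lead me to believe)": [0.1, 0.9, 0.3],
    "guide (verb)": [0.2, 0.8, 0.4],
}


def euclidean_distance(a, b):
    """Straight-line distance between two points in the contextual space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


metal = toy_embeddings["lead (metal, as in lead poisoning)"]
verb = toy_embeddings["lead (verb, as in lead me to believe)"]
guide = toy_embeddings["guide (verb)"]

# The two senses of "lead" sit far apart...
print("metal vs. verb:", euclidean_distance(metal, verb))
# ...while the verb sense sits close to a word with a related meaning.
print("verb vs. guide:", euclidean_distance(verb, guide))
```

With these made-up numbers, the first distance is much larger than the second, which is the behavior we want from a contextual space.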

Step 2. How Do We Evolve into Sentence Similarity Models?

Sentence similarity models take these concepts to the next level. They go beyond representing individual words and calculate the average of the coordinates of the words within a sentence. Various techniques, such as the arithmetic mean or the root mean square, can be used to calculate this average. Once this calculation is done, we measure the distance between the coordinates of two sentences. This distance indicates how similar the sentences are in meaning: the smaller the distance, the more similar they are.

Let’s revisit the example from before, this time adding the earlier sentences to the mix to see the mean of each sentence:

As you can observe, even though the two sentences do not have any words in common, they are still considered close in terms of similarity. This is because they share the same context and convey a similar meaning. Just as we humans understand the connection between these sentences from their context, the model can recognize it from their coordinates.
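The averaging idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the word vectors are invented, the sentences are toy examples of our own, and `mean_pool` and `distance` are hypothetical helper names. Real sentence similarity models work on learned, context-dependent vectors rather than a fixed lookup table.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration.
word_vectors = {
    "doctor": [0.9, 0.1, 0.1], "treats": [0.8, 0.2, 0.2],
    "patients": [0.9, 0.2, 0.1], "nurse": [0.9, 0.1, 0.2],
    "helps": [0.7, 0.3, 0.2], "sick": [0.9, 0.1, 0.1],
    "people": [0.6, 0.4, 0.3], "lawyer": [0.1, 0.9, 0.2],
    "wins": [0.3, 0.7, 0.3], "case": [0.1, 0.9, 0.1],
}


def mean_pool(sentence):
    """Average the word vectors of a sentence, dimension by dimension."""
    vectors = [word_vectors[word] for word in sentence.lower().split()]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two sentence vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


medical_1 = mean_pool("doctor treats patients")
medical_2 = mean_pool("nurse helps sick people")
legal = mean_pool("lawyer wins case")

# No shared words, yet the two medical sentences land close together.
print("medical vs. medical:", distance(medical_1, medical_2))
# The legal sentence lands far from both.
print("medical vs. legal:  ", distance(medical_1, legal))
```

The two medical sentences share no words, but because their words live in the same region of the space, their averaged coordinates end up close to each other.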

Step 3. Full-Fledged Semantic Search

Now that we have laid the foundation, let's delve into semantic search. Simply put, semantic search involves finding the closest matches to a given input. To accomplish this, we need to compute embeddings for all the content we have and organize them in a contextual space beforehand. Once this contextual space is created, we can use the same technique of measuring sentence similarity to identify the points nearest to our query. This allows us to retrieve the most relevant results based on the meaning and context of the input.

As you can see from the graph, the content is laid out as points, and once we add the query, the search picks the nearest points for us. The procedure goes like this:

Get the search input.
Convert it to an embedding.
Calculate the mean of the input and place its coordinates on the graph.
Find and return the nearest points available.
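The four steps above can be sketched end to end in Python. Treat this as a toy under stated assumptions rather than a production design: the vectors in `WORD_VECTORS` are invented, the documents are made up, and `embed`, `build_index`, and `search` are hypothetical helper names. A real system would use a trained embedding model and a dedicated vector index instead of sorting a Python list.

```python
import math

# Hypothetical toy word vectors, invented for illustration.
WORD_VECTORS = {
    "doctor": [0.9, 0.1, 0.1], "treats": [0.8, 0.2, 0.2],
    "patients": [0.9, 0.2, 0.1], "nurse": [0.9, 0.1, 0.2],
    "helps": [0.7, 0.3, 0.2], "sick": [0.9, 0.1, 0.1],
    "people": [0.6, 0.4, 0.3], "lawyer": [0.1, 0.9, 0.2],
    "wins": [0.3, 0.7, 0.3], "case": [0.1, 0.9, 0.1],
    "judge": [0.1, 0.9, 0.3], "hears": [0.3, 0.6, 0.3],
    "trial": [0.1, 0.8, 0.2], "hospital": [0.9, 0.1, 0.2],
    "medicine": [0.8, 0.2, 0.1], "court": [0.1, 0.9, 0.2],
}


def embed(text):
    """Steps 2 and 3: look up each known word, then average the vectors."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError("no known words in input")
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two points in the contextual space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(documents):
    """Done once, ahead of time: place every document in the space."""
    return [(doc, embed(doc)) for doc in documents]


def search(query, index, top_k=2):
    """Steps 1 to 4: embed the query and return the nearest documents."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: distance(query_vector, item[1]))
    return [doc for doc, _ in ranked[:top_k]]


documents = [
    "doctor treats patients",
    "nurse helps sick people",
    "lawyer wins case",
    "judge hears trial",
]
index = build_index(documents)

# The query shares no words with the documents it should match.
print(search("hospital medicine", index))
```

Sorting every document by distance is fine for a handful of entries, but it scans the whole index on each query, which is one reason larger systems rely on specialized nearest-neighbor indexes.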

By implementing these key components, you are equipped to build your own semantic search system. Although the search step looks simpler than the concepts it builds on, it still requires careful handling.

In our upcoming blog post, we will delve into the architecture of a semantic search system and explore the tools available to support and automate the entire migration process. We will guide you on how to seamlessly integrate semantic search into your existing system, making implementation a breeze. Stay tuned for more insights and practical tips!