Searching through a large database of blog posts can be a daunting task, especially if there are thousands of articles. However, using vectorized search and cosine similarity, you can quickly query your blog posts and retrieve the most relevant content.
In this blog post, we’ll show you how to query a list of blog posts (from this blog) using a combination of vectorized search with cosine similarity and OpenAI ChatCompletions. We’ll be using OpenAI’s embeddings API to vectorize the blog post articles and Pinecone, a vector database, to store and query the vectors. We’ll also show you how to retrieve the contents of the article, create a prompt using the ChatCompletion API, and return the result to a web page.
ℹ️ Sample code is on GitHub: https://github.com/gbaeke/gpt-vectors
ℹ️ If you want an introduction to embeddings and cosine similarity, watch the video on YouTube by Part Time Larry.
Setting Up Pinecone
Before we can start querying our blog posts, we need to set up Pinecone. Pinecone is a vector database that makes it easy to store and query high-dimensional data. It’s perfect for our use case since we’ll be working with high-dimensional vectors.
ℹ️ Using a vector database is not strictly required. The GitHub repo contains app.py, which uses scikit-learn to create the vectors and perform a cosine similarity search. Many other approaches are possible. Pinecone just makes storing and querying the vectors super easy.
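To illustrate that alternative, here is a minimal sketch of a cosine similarity search with scikit-learn. It assumes you already created OpenAI embeddings for the posts and for the query (as shown later in this post); it is not the exact code from app.py.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# assumed inputs: post_embeddings is a list of embedding lists (one per post),
# query_embedding is the embedding of the user's question
post_vectors = np.array(post_embeddings)
query_vector = np.array(query_embedding).reshape(1, -1)

# cosine similarity between the query and every post vector
scores = cosine_similarity(query_vector, post_vectors)[0]
best_match = int(np.argmax(scores))
print("Best matching post:", best_match, "score:", scores[best_match])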
ℹ️ If you want more information about Pinecone and the concept of a vector database, watch this introduction video.
First, we’ll need to create an account with Pinecone and get the API key and environment name. In the Pinecone UI, you will find these as shown below. There will be a Show Key and Copy Key button in the Actions section next to the key.

Once we have an API key and the environment, we can use the Pinecone Python library to create and use indexes. Install the Pinecone library with pip install pinecone-client.
Although you can create a Pinecone index from code, we will create the index in the Pinecone portal. Go to Indexes and select Create Index. Create the index using cosine as metric and 1536 dimensions:

The embedding model we will use to create the vectors, text-embedding-ada-002, outputs vectors with 1536 dimensions. For more info, see OpenAI's blog post of December 15, 2022.
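For reference, the index can also be created from code with the pinecone-client library. A small sketch, assuming the API key and environment variables are set as in the next snippet:
import pinecone

pinecone.init(api_key=pinecone_api, environment=pinecone_env)

# create the index only if it does not exist yet; ada-002 embeddings have 1536 dimensions
if 'blog-index' not in pinecone.list_indexes():
    pinecone.create_index('blog-index', dimension=1536, metric='cosine')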
To use the Pinecone index from code, look at the snippet below:
import pinecone
pinecone_api = "<your_api_key>"
pinecone_env = "<your_environment>"
pinecone.init(api_key=pinecone_api, environment=pinecone_env)
index = pinecone.Index('blog-index')
We create an instance of the Index class with the name "blog-index" and store it in index. This index will be used to store our blog post vectors and to perform searches.
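As a quick sanity check that the index exists and is reachable, you can ask it for its statistics. A minimal sketch; the exact fields in the returned dictionary depend on the client version:
# returns information such as the vector count and dimension of the index
stats = index.describe_index_stats()
print(stats)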
Vectorizing Blog Posts with OpenAI’s Embeddings API
Next, we’ll need to vectorize our blog post articles. We’ll be using OpenAI’s embeddings API to do this. The embeddings API takes a piece of text and returns a high-dimensional vector representation of that text. Here’s an example of how to do that for one article or string:
import openai

openai.api_key = "<your_api_key>"

article = "Some text from a blog post"

vector = openai.Embedding.create(
    input=article,
    model="text-embedding-ada-002"
)["data"][0]["embedding"]
We create a vector representation of our blog post article by calling the Embedding class's create method. We pass in the article text as input and the text-embedding-ada-002 model, which is a pre-trained language model that can generate high-quality embeddings.
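Note that input can also be a list of strings, so several articles can be vectorized in one call. A minimal sketch, with made-up article texts:
articles = ["Text of the first post", "Text of the second post"]

response = openai.Embedding.create(
    input=articles,
    model="text-embedding-ada-002"
)

# one embedding per input string, returned in the same order
vectors = [item["embedding"] for item in response["data"]]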
Storing Vectors in Pinecone
Once we have the vector representations of our blog post articles, we can store them in Pinecone. Instead of storing the vectors one by one, we can use upsert to store a list of vectors. The code below uses the feed of this blog to grab the URLs of 50 posts. Every post is vectorized and the vector is added to a Python list of tuples, as expected by the upsert method. The list is then added to Pinecone in one go. The tuple that Pinecone expects is:
(id, vector, metadata dictionary), e.g. (0, vector for post 1, {"url": url to post 1})
Here is the code that uploads the first 50 posts of baeke.info to Pinecone. You need to set the Pinecone key and environment and the OpenAI key as environment variables. The code uses feedparser to grab the blog feed and BeautifulSoup to parse the retrieved HTML. The code serves as an example only; it is not very robust when it comes to error checking.
import feedparser
import os
import pinecone
import numpy as np
import openai
import requests
from bs4 import BeautifulSoup
# OpenAI API key
openai.api_key = os.getenv('OPENAI_API_KEY')
# get the Pinecone API key and environment
pinecone_api = os.getenv('PINECONE_API_KEY')
pinecone_env = os.getenv('PINECONE_ENVIRONMENT')
pinecone.init(api_key=pinecone_api, environment=pinecone_env)
# set index; must exist
index = pinecone.Index('blog-index')
# URL of the RSS feed to parse
url = 'https://blog.baeke.info/feed/'
# Parse the RSS feed with feedparser
feed = feedparser.parse(url)
# get number of entries in feed
entries = len(feed.entries)
print("Number of entries: ", entries)
post_texts = []
pinecone_vectors = []
for i, entry in enumerate(feed.entries[:50]):
    # report progress
    print("Processing entry ", i, " of ", entries)

    r = requests.get(entry.link)
    soup = BeautifulSoup(r.text, 'html.parser')
    article = soup.find('div', {'class': 'entry-content'}).text

    # vectorize with OpenAI text-embedding-ada-002
    embedding = openai.Embedding.create(
        input=article,
        model="text-embedding-ada-002"
    )

    # get the embedding (length = 1536)
    vector = embedding["data"][0]["embedding"]

    # append tuple to pinecone_vectors list
    pinecone_vectors.append((str(i), vector, {"url": entry.link}))

# all vectors can be upserted to Pinecone in one go
upsert_response = index.upsert(vectors=pinecone_vectors)
print("Vector upload complete.")
Querying Vectors with Pinecone
Now that we have stored our blog post vectors in Pinecone, we can start querying them. We’ll use cosine similarity to find the closest matching blog post. Here is some code that does just that:
def get_highest_score_url(items):
    highest_score_item = max(items, key=lambda item: item["score"])
    if highest_score_item["score"] > 0.8:
        return highest_score_item["metadata"]['url']
    else:
        return ""

query_vector = <vector representation of query> # vector created with OpenAI as well

search_response = index.query(
    top_k=5,
    vector=query_vector,
    include_metadata=True
)

url = get_highest_score_url(search_response['matches'])
We create a vector representation of our query (not shown above; a sketch follows below, using the same code we used to vectorize the blog posts) and pass it to the query method of the Pinecone Index class. We set top_k=5 to retrieve the top 5 matching blog posts. We also set include_metadata=True to include the metadata associated with each vector in our response. That way, we also have the URL of the top 5 matching posts.
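Here is a sketch of how that query vector can be created, reusing the embeddings call from earlier. The example question is made up; your_query is the text the user typed:
your_query = "How do I set up Flux on AKS?"  # example question; same variable is used in the prompt later

query_vector = openai.Embedding.create(
    input=your_query,
    model="text-embedding-ada-002"
)["data"][0]["embedding"]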
The query method returns a dictionary that contains a matches key. The matches value is a list of dictionaries, with each dictionary representing a matching blog post. The score key in each dictionary represents the cosine similarity score between the query vector and the blog post vector. We use the get_highest_score_url function to find the blog post with the highest cosine similarity score.
The function only returns the highest-scoring URL if the score is above 0.8. It's of course up to you to accept lower-scoring matches. There is a risk that the vector query returns an article that is not highly relevant, which results in an irrelevant context for the OpenAI ChatCompletion call we will make later.
Retrieving the Contents of the Blog Post
Once we have the URL of the closest matching blog post, we can retrieve the contents of the article using the Python requests and BeautifulSoup libraries.
import requests
from bs4 import BeautifulSoup
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
article = soup.find('div', {'class': 'entry-content'}).text
We send a GET request to the URL of the closest matching blog post and retrieve the HTML content. We use the BeautifulSoup library to parse the HTML and extract the contents of the <div> element with the class "entry-content".
Creating a Prompt for the ChatCompletion API
Now that we have the contents of the blog post, we can create a prompt for the ChatCompletion API. The crucial part here is that our OpenAI query should include the blog post we just retrieved!
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        { "role": "system", "content": "You are a polite assistant" },
        { "role": "user", "content": "Based on the article below, answer the following question: " + your_query +
            "\nAnswer as follows:" +
            "\nHere is the answer directly from the article:" +
            "\nHere is the answer from other sources:" +
            "\n---\n" + article }
    ],
    temperature=0,
    max_tokens=200
)

response_text = f"\n{response.choices[0]['message']['content']}"
We use the ChatCompletion API with the gpt-3.5-turbo model to ask our question. This is the same as using ChatGPT on the web with that model. At the time of writing, the GPT-4 model was not available yet.
Instead of one prompt, we send a list of message dictionaries. The first item in the list sets the system message. The second item is the actual user question. We ask the model to answer the question based on the blog post we stored in the article variable and we provide some instructions on how to answer. We add the contents of the article to our query.
If the article is long, you run the risk of using too many tokens. If that happens, the ChatCompletion call will fail. You can use the tiktoken library to count the tokens and prevent the call from happening in the first place, or you can catch the exception and tell the user. In the above code, there is no error handling; we only include the core code that's required.
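Here is a minimal sketch of such a token check with tiktoken. The 3,000-token threshold is an arbitrary example, chosen to leave room for the question, the instructions and the response within the model's 4,096-token context:
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
token_count = len(encoding.encode(article))

if token_count > 3000:
    # the article plus prompt and response would likely exceed the context window
    print("Article too long for a single prompt:", token_count, "tokens")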
Returning the Result to a Web Page
If you are running the search code in an HTTP handler as the result of the user typing a query in a web page, you can return the result to the caller:
return jsonify({
    'url': url,
    'response': response_text
})
The full example, including an HTML page and Flask code, can be found on GitHub.
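To give an idea of how the pieces fit together, here is a heavily simplified sketch of such a Flask handler. The route, the request format and the answer_question helper (which wraps the vectorize, query, retrieve and ChatCompletion steps shown earlier) are assumptions for the example, not the exact code from the repo:
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/search', methods=['POST'])
def search():
    your_query = request.json['query']  # question typed on the web page
    # hypothetical helper that vectorizes the query, queries Pinecone,
    # retrieves the article and calls the ChatCompletion API
    url, response_text = answer_question(your_query)
    return jsonify({
        'url': url,
        'response': response_text
    })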
The result could look like this:

Conclusion
Using vectorized search and cosine similarity, we can quickly query a database of blog posts and retrieve the most relevant post. By combining OpenAI’s embeddings API, Pinecone, and the ChatCompletion API, we can create a powerful tool for searching and retrieving blog post content using natural language.
Note that there are some potential issues as well. The code we show is merely a starting point:
- Limitations of cosine similarity: it only measures the angle between vectors and ignores other properties, which can lead to misleading results
- Prompt engineering: the prompt we use works but there might be prompts that just work better. Experimentation with different prompts is crucial!
- Embeddings: OpenAI embeddings are trained on a large corpus of text, which may not be representative of the domain-specific language in the posts
- Performance might not be sufficient if the size of the database grows large. For my blog, that’s not really an issue. 😀