OpenAI Text Embeddings + Streamlit + Pinecone Vector Database - part 2

In the previous post, I covered the basic idea of what I was trying to achieve. It's a short read and contains the context backing this article. However, here is a quick recap:

I'm using OpenAI to generate text and embeddings of that text, then saving the embeddings in the Pinecone vector database. Pinecone's similarity search lets me check whether a question has been asked before. All of this sits behind a UI put together with the Streamlit framework.
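To make the "has this question been asked before" check concrete, here is a minimal sketch of the idea in plain Python. It is not the code from the repo and it does not call OpenAI or Pinecone: the in-memory dictionary stands in for the vector database, the tiny three-dimensional vectors stand in for real embeddings, and the function names and the 0.9 threshold are assumptions made up for illustration. Cosine similarity is used here as one common choice of metric.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)


def find_previous_question(query_embedding, stored, threshold=0.9):
    """Return the id of the most similar stored question, or None.

    `stored` maps a question id to its embedding (hypothetical stand-in
    for the vector database). `threshold` is an assumed cut-off; a match
    must score at least this high to count as "asked before".
    """
    best_id, best_score = None, threshold
    for question_id, embedding in stored.items():
        score = cosine_similarity(query_embedding, embedding)
        if score >= best_score:
            best_id, best_score = question_id, score
    return best_id


# Toy data: real embeddings would have hundreds or thousands of dimensions.
stored = {
    "q1": [1.0, 0.0, 0.0],
    "q2": [0.0, 1.0, 0.0],
}

match = find_previous_question([0.9, 0.1, 0.0], stored)
no_match = find_previous_question([1.0, 1.0, 1.0], stored)
```

In the prototype the search itself is delegated to Pinecone, so a loop like this would not be needed. The sketch only shows what "similar enough" means once you have two embeddings to compare.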

In this post I'm just going to provide the codebase - still a work in progress - used so far to put together this prototype. Note that this was my first foray into Python (Streamlit is a Python-based framework), so there may be places where I could have done a better job. Feel free to offer advice.

You can find the code in my GitHub repo, flashcards_data. There is still some work to do around:

  • fixes to the UI

  • more refactoring

  • design and implementation of a communication mechanism to transfer/share the data with my web application.

Thank you for reading.