Floris Schoenmakers

Create a Custom AI RAG from your Google Sheet

Imagine you want to build AI applications using your own data but lack expertise in databases, APIs, or large language models (LLMs). A straightforward solution is to create a copy of your data in a ‘shadow’ database using tools like Google Sheets, Notion, or Airtable.

Then, you can connect this database to an LLM without writing code. In this example, we are going to connect a (live) Google Sheet to an LLM. The relations are as follows: Google Sheet → API (Sheety) → workflow (Langflow) → LLM.

Introduction

When experimenting with task-specific LLMs and agentic applications, having relevant source data is essential. Bearing in mind that:

  1. Privacy: privacy-sensitive data must remain within your control. When you are unsure where or how the data is stored and used, anonymize it first.
  2. Dynamic updates: the data most relevant to a company is usually updated frequently, which justifies setting up a live shadow database instead of relying on data exports. The workflow in this demo can handle both live data and data exports.

For innovation teams experimenting with AI applications, setting up no-code ‘shadow’ databases can be valuable. These databases allow teams to query and build upon their own data. By developing small proofs of concept or proofs of technology, organizations can better justify future investments in enterprise applications and agents.

In this blog, I’ll show you how to create a simple ‘layman’s’ database that connects to an LLM via a no-code solution. The steps are as follows:

  1. Choose a platform like Google Sheets, Airtable, or Notion. (This example uses Google Sheets.)
  2. Create an API (JSON) using an external service (Sheety).
  3. Set up a workflow with DataStax Langflow (low code/no code).
  4. Start querying and interacting with your own data directly.

Background

Large Language Models (LLMs) are trained on fixed datasets and may not include the most up-to-date information or specific details about niche topics. To make LLMs useful in a business context, they need access to detailed, specific, and frequently updated data.

This is where Retrieval Augmented Generation (RAG) comes into play. RAG enables LLMs to retrieve relevant information from external sources—such as documents, images, or databases—and use it as context in prompts. This approach improves the model’s responses by making them more accurate and contextually relevant.

To retrieve this information efficiently, RAG often relies on a vector store or vector database. These databases store data (text, images, etc.) as numerical embeddings (vectors) that represent their meaning. When a query is made, the vector database finds the most relevant items by comparing embeddings, ensuring the LLM has the right context to generate better answers.
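The ranking a vector database performs can be sketched in a few lines of plain Python. The three-dimensional ‘embeddings’ below are invented for illustration; real embedding models produce vectors with hundreds or thousands of dimensions, and a real vector store adds indexing to make the search fast.

```python
import math

def cosine_similarity(a, b):
    """Score how similar two embedding vectors are (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- made-up track names and vectors, for illustration only.
store = {
    "Edelweiss":             [0.9, 0.1, 0.0],
    "Highway to Hell":       [0.0, 0.2, 0.9],
    "Tulips from Amsterdam": [0.8, 0.3, 0.1],
}
query = [0.85, 0.2, 0.05]  # pretend embedding of "songs about flowers"

# Rank stored items by similarity to the query -- this is the core of vector search.
ranked = sorted(store, key=lambda name: cosine_similarity(query, store[name]), reverse=True)
print(ranked)
```

The flower-themed tracks end up at the top of the ranking, which is exactly the behavior the RAG workflow relies on when it fetches context for the LLM.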


Step 1: create an API

First, we need to turn our Google Sheet into a database that can ‘talk’. Using a platform such as Sheety, you can generate an API for your sheet, optionally protected with authentication.
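To get a feel for what such an API returns, here is a sketch that parses a payload in the shape Sheety produces: rows nested under the camelCased sheet name, each with an `id` field. The endpoint URL in the comment and the column names are illustrative, not the real project’s.

```python
import json

# In Langflow's API Request component you would call something like:
#   GET https://api.sheety.co/<project-id>/<project>/tracks
# with an "Authorization: Bearer <token>" header if authentication is enabled.
# The payload below mimics the response shape for a sheet named "tracks".
payload = json.loads("""
{
  "tracks": [
    {"id": 2, "trackName": "Edelweiss", "artist": "Julie Andrews", "popularity": 61},
    {"id": 3, "trackName": "La Vie en Rose", "artist": "Edith Piaf", "popularity": 74}
  ]
}
""")

rows = payload["tracks"]  # rows are nested under the sheet name
print(len(rows), "rows; first track:", rows[0]["trackName"])
```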

Step 2: set up a workflow

By using a platform like DataStax, you can work from a template in which the most essential steps are already pre-programmed. Since we want to load data and ‘add’ it to the knowledge of the LLM, we pick the Vector Store RAG template.

The template contains two workflows:

  1. The Load Data workflow: this workflow loads the data and enables a Vector Search. In the template, the data source is a file upload, but we are going to change that to an API Request.
  2. The Retriever workflow: this workflow integrates the front-end chat interface with our own database. Note that it must search the same Database and Collection that was populated by the Load Data workflow.
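Before a sheet row lands in the collection, the Load Data workflow has to turn it into a text chunk the embedding model can ingest. A minimal sketch of that flattening step, with made-up column names for illustration:

```python
def row_to_document(row):
    """Flatten one sheet row into a text chunk for embedding.

    Skips the row id, which carries no semantic meaning worth embedding.
    """
    return ", ".join(f"{key}: {value}" for key, value in row.items() if key != "id")

# Illustrative rows -- column names are assumptions, not the real sheet's.
rows = [
    {"id": 2, "trackName": "Edelweiss", "artist": "Julie Andrews", "genre": "show tunes"},
    {"id": 3, "trackName": "La Vie en Rose", "artist": "Edith Piaf", "genre": "chanson"},
]

documents = [row_to_document(r) for r in rows]
print(documents[0])
```

Keeping the column names in the text (rather than embedding bare values) gives the model context about what each value means, which tends to improve retrieval quality.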

Step 3: adjust workflow and parameters

The whole workflow looks like this:

By going to the Playground in the upper right corner, you can now chat with your own database. 

My Google Sheet database contains over 150,000 records of Spotify tracks, complete with detailed variables such as track name, artist, popularity, genre, and more. To demonstrate how this setup works, I asked the system: “What are the most tracks that have something to do with a flower? Give me 10 examples.” 

The LLM interpreted the context of the question—searching for associations with flowers—and queried the database accordingly. This example shows how the system combines natural language understanding with precise data retrieval, making it a powerful tool for exploring large datasets without complex queries or technical expertise.

Concluding

Google Sheets offers a simple and accessible way to start creating shadow databases for AI experiments. Tools like Notion and Airtable expand this functionality further by allowing the inclusion of PDFs and other documents, making them powerful alternatives for more complex datasets. With platforms like DataStax, you can publish your own applications, streamlining the process of turning experimental workflows into tangible results.

In an upcoming blog, I’ll show how you can use similar workflows to build AI agents that perform research and analysis on competitors.

The same principles of LLMs and RAG outlined in this guide can be applied to more complex enterprise solutions. Eli5 leverages these methods to create tailored AI applications and agents that interact seamlessly with their own data, opening up new opportunities for innovation and operational efficiency.

Floris Schoenmakers
Chief Venture and Growth Officer