Two ways to feed external data into GPT without using plugins

Jason Fan
4 min read · Apr 17, 2023


As more and more companies try building applications with LLMs, one of the key problems that has emerged is how to use external data in those applications. Without a way to feed external data into the prompt, LLMs like ChatGPT are limited to the topics covered in their training data, which ends in 2021 and is therefore not very useful for enterprise applications.

ChatGPT plugins are a great way to solve this problem for use cases where companies find it acceptable to send their users to OpenAI’s website. However, there are far more use cases where companies would never consider sending their customers to another company’s platform and giving up control over the customer experience. Customer support is the obvious example, but so are the emerging conversational interfaces companies are building into their core products, like the features Duolingo and Quizlet launched recently.

For these cases, there are only two ways for companies to feed external data into LLMs: fine-tuning and retrieval augmented generation (RAG).

Fine Tuning

Fine tuning involves providing the model with a list of example prompt/completion pairs and adjusting its weights so that it becomes biased towards tokens that appear frequently in those examples. As long as the examples include information from your external dataset, the fine-tuned model will generate completions that use that information.
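
To make this concrete, here’s a minimal sketch of what that looks like with OpenAI’s fine-tuning API at the time of writing (the pre-1.0 openai Python package). The file name and example pairs are placeholders, not a recommendation for how to format your data.

```python
import json
import openai

# Example prompt/completion pairs drawn from your external dataset (placeholders)
examples = [
    {"prompt": "What is our refund policy? ->",
     "completion": " Refunds are issued within 5 business days of a request."},
    {"prompt": "Which plans include SSO? ->",
     "completion": " SSO is available on the Business and Enterprise plans."},
]

# Fine-tuning data is uploaded as a JSONL file: one prompt/completion pair per line
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the file, then start a fine-tune job against a base model
training_file = openai.File.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = openai.FineTune.create(training_file=training_file["id"], model="davinci")
print(job["id"])
```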

Pros

  • You can easily generate question & answer pairs from a dataset with an LLM (see the sketch after this list)
  • Once fine-tuned, there’s very little work required to generate queries
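
As a rough sketch of that first point, you can ask a chat model to turn each chunk of your dataset into training pairs. The model name and prompt wording here are assumptions, not a prescription:

```python
import openai

def generate_qa_pairs(chunk: str) -> str:
    """Ask the model to turn a chunk of source text into prompt/completion training pairs."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Write 3 question/answer pairs, one per line as JSON with "
                        "'prompt' and 'completion' keys, based only on the text provided."},
            {"role": "user", "content": chunk},
        ],
    )
    return response["choices"][0]["message"]["content"]
```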

Cons

  • Fine tuning is expensive
  • Fine tuning does not provide the ability to prove the provenance of responses
  • When the source of truth changes, models must be retrained on the new information
  • It’s difficult to measure the impact of fine tuning because responses are non-deterministic
  • Fine tuning tends to lead to over-fitting and the inability to handle new and unexpected user inputs. If the user asks something unexpected, a fine-tuned model is likely to hallucinate and give its best guess at what the answer should be instead of stating that it doesn’t know the answer.

Fine tuning is like sending your LLM to college — it’ll be able to answer questions with high confidence, but you have no idea where it got that information from or whether it’s correct. As a result, fine tuning is a better fit for use cases where creativity is more important than truthfulness and traceability. For example, fine tuning a foundation model to turn it into one optimized for storytelling, therapy, or to play the role of a specific character works wonderfully.

Retrieval augmented generation (RAG)

The RAG technique involves taking external data, generating embeddings from it, and storing those embeddings in a vector database. Whenever a user has a query, you perform a semantic search on your vector database to find the data that’s most relevant to the query. Then, you feed the results of that search into the LLM’s context window, giving it access to information not present in its training data, which it can summarize and use to respond to the query.
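
Here’s a minimal sketch of that loop, assuming the pre-1.0 openai Python package. An in-memory list and cosine similarity stand in for a real vector database, and the documents are placeholders:

```python
import numpy as np
import openai

# 1. Embed your external documents up front (placeholders here);
#    the in-memory list stands in for a vector database
documents = [
    "Refunds are issued within 5 business days of a request.",
    "SSO is available on the Business and Enterprise plans.",
]
doc_embeddings = [
    openai.Embedding.create(model="text-embedding-ada-002", input=d)["data"][0]["embedding"]
    for d in documents
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Semantic search: rank documents by cosine similarity to the query embedding."""
    q = openai.Embedding.create(model="text-embedding-ada-002", input=query)["data"][0]["embedding"]
    scores = [np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e)) for e in doc_embeddings]
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

def answer(query: str) -> str:
    """Feed the retrieved documents into the model's context window, then answer the query."""
    context = "\n".join(retrieve(query))
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"Answer using only the following context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return response["choices"][0]["message"]["content"]

print(answer("How long do refunds take?"))
```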

Pros

  • Easy to scale — just add more content to your vector database and your LLM automatically gets smarter
  • Can be optimized by using better embeddings and cleaner data, both of which can be improved deterministically
  • Generalizes well for unexpected input. If no relevant information can be found, it’s possible to use a different prompt or even a hard-coded response to handle edge cases gracefully (see the sketch after this list).
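
A rough sketch of that fallback, reusing openai, np, doc_embeddings, and answer from the earlier example; the similarity threshold and canned response are assumptions:

```python
def answer_with_fallback(query: str, min_similarity: float = 0.8) -> str:
    """Only answer from retrieved context if the best match clears a relevance threshold."""
    q = openai.Embedding.create(model="text-embedding-ada-002", input=query)["data"][0]["embedding"]
    scores = [np.dot(q, e) / (np.linalg.norm(q) * np.linalg.norm(e)) for e in doc_embeddings]
    if max(scores) < min_similarity:
        # No relevant information found: fall back to a hard-coded response
        return "Sorry, I couldn't find anything about that in our knowledge base."
    return answer(query)
```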

Cons

  • LLMs powered by RAG are less creative since they are constrained to responding only with the information provided in their context window.
  • Context windows are limited and get shorter as the required completion gets longer.

Using RAG is like asking your LLM to do its own research. It won’t be as creative with its answers, but you can see exactly why it gave the answer it did and where the information comes from. RAG is a great fit for use cases where knowing the provenance of the information provided by your LLM is important, such as for customer support, workplace search, or other enterprise use cases.

Summary

In summary, fine-tuning involves adjusting the model’s weights based on a list of example prompt/completion pairs, while RAG involves generating embeddings from external data and performing a semantic search to find relevant information to feed into the LLM’s context window. Both methods are valuable for different use cases, but RAG really shines when it comes to high stakes use cases like those found in enterprise applications.

If you’re thinking of building ChatGPT-like applications for your company’s internal data, here’s a list of tools you can use today to help implement retrieval augmented generation.

Libraries

  • LangChain — a popular Python library for chaining prompts together to build LLM agents
  • LlamaIndex — another popular Python library for implementing data models that can be used to load external data into vector databases
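
For example, a minimal retrieval pipeline with LangChain might look like the following. This is based on the LangChain 0.0.x API current as of this writing; module paths and class names may move in later releases, and the documents are placeholders:

```python
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Chroma

# Load your external documents into a vector store (placeholders here)
texts = [
    "Refunds are issued within 5 business days of a request.",
    "SSO is available on the Business and Enterprise plans.",
]
vectorstore = Chroma.from_texts(texts, OpenAIEmbeddings())

# The chain retrieves the most relevant chunks and stuffs them into the prompt
# before calling the LLM — retrieval augmented generation in a few lines
qa = RetrievalQA.from_chain_type(llm=OpenAI(), retriever=vectorstore.as_retriever())
print(qa.run("How long do refunds take?"))
```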

SaaS tools

  • Sidekick — a tool that lets you sync data from your SaaS tools (Notion, Confluence, Zendesk, etc) to a vector database and query it with GPT from an API endpoint. Get started in less than 15 min and without writing a single line of code.
