LLMs give you the flexibility of natural language to express your query and format your answers.
But this comes with the same flaws as human interaction: fuzzy interpretation and a lack of truthfulness.
By plugging a database in before the query or after the answer, one can build much-needed safeguards.
An example is Question & Answering with a vector database:
– Use an SBERT model to vectorize the query
– Retrieve context relevant to the query from the database using cosine similarity
– Build a prompt from the query and context
– Send the prompt to the QnA model
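The steps above can be sketched end to end. This is a toy illustration only: the `embed` function below is a bag-of-words stand-in for a real SBERT model (in practice you would call `SentenceTransformer.encode` from the `sentence-transformers` library), the in-memory `index` stands in for a vector database, and the document texts are invented examples.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would use SBERT vectors.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Previously indexed documents (e.g. product descriptions); invented examples.
documents = [
    "Blue cotton t-shirt, sizes S to XL, machine washable",
    "Stainless steel water bottle, 750 ml, keeps drinks cold",
    "Wireless mouse with USB receiver, 2-year battery life",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank indexed documents by cosine similarity to the query vector.
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    # Assemble the retrieved context and the query into a single prompt.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("Do you sell a water bottle?")
# The prompt would then be sent to the QnA model.
```

The key design point is that the model only ever sees the retrieved context plus the question, which is what constrains its answers.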
In that example, the context is retrieved from a previously indexed database, for instance WooCommerce product descriptions and attributes.
This limits the answers to a subset of the model's universe: the content indexed in the database.
A step further is to fine-tune the model.
And another step further is to fully train the model, including the tokenizer: this restricts answers even more through a limited vocabulary (for safety, or for specialised domains such as mathematics, chemistry, finance, or legal).
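To make the vocabulary restriction concrete, here is a minimal sketch of a tokenizer limited to a domain vocabulary. A real tokenizer would be trained (e.g. with BPE) on a domain corpus; the `VOCAB` table and the finance-flavoured words below are invented for illustration, and out-of-vocabulary words simply collapse to an `<unk>` token.

```python
# Hypothetical restricted vocabulary for a finance-only model.
VOCAB = {"<unk>": 0, "asset": 1, "bond": 2, "yield": 3, "price": 4, "risk": 5}

def tokenize(text: str) -> list[int]:
    # Map each word to its vocabulary id; anything outside the
    # domain vocabulary becomes <unk> and is effectively unusable.
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]

ids = tokenize("bond yield risk premium")
# "premium" is out of vocabulary, so it maps to <unk> (id 0)
```

A model trained on such a tokenizer simply cannot emit words outside the domain vocabulary, which is the restriction the paragraph above describes.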