Agent context and knowledge
LLMs are trained to generate plausible text that completes a prompt. Despite many advances in training these models, there is no general guarantee that a completion is factually correct as well as plausible. To improve accuracy, you can provide extra knowledge to the LLM within its "context window".
Context management
Last N-Messages
Configure how many of the "old" messages in a conversation thread are resent to the LLM when generating a new response. Use a lower number to save token costs, or a higher number to give the LLM more context about the conversation.
When compaction is enabled, Last N-Messages is disabled. Compaction takes over managing context limits automatically.
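As a rough illustration of what a Last N-Messages limit means, the sketch below trims a thread's history to its most recent entries before a new request is built. The function name `selectHistory` and the message shape are assumptions made for this example, not part of the product.

```javascript
// Hypothetical helper: keep only the last `n` messages of a thread.
// The message shape ({ role, content }) is assumed for illustration.
function selectHistory(messages, n) {
  // slice(-0) would return the whole array, so handle n <= 0 explicitly.
  if (n <= 0) return [];
  return messages.slice(-n);
}

const history = [
  { role: "user", content: "Hi" },
  { role: "assistant", content: "Hello, how can I help?" },
  { role: "user", content: "What is my order status?" },
  { role: "assistant", content: "Your order has shipped." },
];

// With n = 2, only the last two messages are resent to the LLM.
const resent = selectHistory(history, 2);
```

A smaller `n` means fewer tokens per request, at the cost of the model "forgetting" earlier turns.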
Compaction
Enable compaction to automatically summarize older parts of a conversation thread when token usage approaches the model’s context window limit. This allows long-running threads to continue without losing important context, and is an alternative to limiting history via Last N-Messages.
When compaction is triggered, all messages since the last compaction boundary are summarized into a compact log. The summary is injected at the start of the next request in place of the raw message history.
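The flow described above can be sketched as follows. This is a conceptual model only: the thread shape, the `summarize` callback (standing in for the summary model), and the choice to send the summary as a system message are all assumptions for illustration, not the product's actual implementation.

```javascript
// Assumed thread shape: { messages: [...], boundary: number, summary: string }.
// `summarize` is a hypothetical callback standing in for the summary model.
function compact(thread, summarize) {
  // Only messages after the last compaction boundary are summarized.
  const sinceBoundary = thread.messages.slice(thread.boundary);
  return {
    messages: thread.messages,
    boundary: thread.messages.length, // new boundary: end of current history
    summary: summarize(thread.summary, sinceBoundary),
  };
}

// Build the next request: the summary comes first, followed only by
// messages after the boundary, then the new user message.
function buildRequest(thread, userMessage) {
  const head = thread.summary
    ? [{ role: "system", content: `Conversation summary:\n${thread.summary}` }]
    : [];
  return [...head, ...thread.messages.slice(thread.boundary), userMessage];
}
```

After `compact` runs, the raw messages before the new boundary are no longer sent; only the summary represents them.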
Configure the following when compaction is enabled:
- Model: The model used to generate the conversation summary. If the configured model fails to load, the agent’s own model is used as a fallback.
- Context Window (tokens): The context window size of the model in tokens. Used together with the threshold to determine when compaction is triggered.
- Threshold (%): When the latest response’s total token usage reaches this percentage of the configured context window, compaction is triggered automatically before the next request.
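To make the interaction between Context Window and Threshold concrete, the following sketch shows the comparison these two settings imply. The function name `shouldCompact` is hypothetical, and treating "reaches" as greater-than-or-equal is an assumption.

```javascript
// Hypothetical check: has token usage reached the configured threshold?
// Integer arithmetic avoids floating-point rounding at the boundary.
function shouldCompact(totalTokens, contextWindow, thresholdPercent) {
  return totalTokens * 100 >= contextWindow * thresholdPercent;
}

// Example: a 128,000-token window with an 80% threshold triggers
// compaction once the latest response used 102,400 tokens or more.
const below = shouldCompact(90000, 128000, 80);
const atLimit = shouldCompact(102400, 128000, 80);
```

A lower threshold compacts earlier and leaves more headroom for long responses; a higher one keeps raw history around for longer.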
Pre-processing script
Use a pre-processing script to dynamically extend the agent’s system instruction at runtime. The script runs on every agent invocation, before the LLM is called, and its output is appended to the static system instruction you configured.
This is useful when the context you want to provide to the agent is not known at design time, for example when fetching user-specific data, resolving runtime variables, or injecting content from an external source.
The script receives the following data in the variable payload:
{
  userInfo: {
    id: string,
    username: string,
    name: string,
    email: string,
    language: string,
  }, // the logged-in user
  input: string, // the request from the user
  variables: Record<string, string>, // the variables
  threadID: string, // the thread ID
  agentConfig: {
    model: string, // the model uuid
    temperature: number, // the temperature
    tools: ai_tool[], // the AI tools configured for the agent
  },
}
Assign a string value to the variable result in the script. This string is
appended to the system instruction before the request is sent to the LLM.
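// `payload` is provided by the runtime; see the structure above.
const { userInfo } = payload;

// IDs of users who should be treated as medical professionals.
const DOCTORS = [
  "f705a4dc-1546-489a-9780-b8c464257df7",
  "3c9e1a02-84fb-4d1e-b77a-df012e567890",
];

// The string assigned to `result` is appended to the system instruction.
result = DOCTORS.includes(userInfo.id)
  ? `The user is a medical professional (${userInfo.name}). You may discuss clinical details, diagnoses, and treatment options.`
  : `The user is a patient (${userInfo.name}). Keep responses simple and avoid clinical terminology.`;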
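A second sketch, this time drawing on `userInfo.language` and `variables` from the payload. The variable name `department` and the helper `buildInstruction` are assumptions made for this example; substitute whatever variables your agent actually defines.

```javascript
// Hypothetical helper: builds extra instruction text from the payload.
function buildInstruction(payload) {
  const { userInfo, variables } = payload;
  const lines = [`Respond in the user's language (${userInfo.language}).`];

  // "department" is an assumed variable name, not one defined by the product.
  if (variables.department) {
    lines.push(`The user works in the ${variables.department} department.`);
  }
  return lines.join("\n");
}

// In the pre-processing script, `payload` is provided by the runtime and
// the output is assigned to `result`:
//   result = buildInstruction(payload);
```

Wrapping the logic in a function keeps it easy to try out with a sample payload before using it in the agent.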
Vector search configurations
Select one or more vector-enabled tables as passive knowledge sources. On every user message, a semantic similarity search runs automatically against the selected tables and the matching results are injected into the agent’s context, with no tool call or agent decision required.
Configure Max. number of supplied contexts and Similarity Threshold to control how many results are injected and how closely they must match.
Similarity scores are not universally calibrated. A score of 0.7 with
one embedding model may be a strong match; with another it may be noise. Test
your dataset and tune the threshold for your specific case.
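Conceptually, the two settings act as a filter and a cap on the search results. The sketch below illustrates that idea; the match shape (`{ text, score }`), the function name `selectContexts`, and the assumptions that higher scores mean closer matches and that the threshold is inclusive are all illustrative rather than taken from the product.

```javascript
// Hypothetical illustration of Similarity Threshold and
// Max. number of supplied contexts applied to search results.
function selectContexts(matches, similarityThreshold, maxContexts) {
  return matches
    .filter((m) => m.score >= similarityThreshold) // drop weak matches
    .sort((a, b) => b.score - a.score)             // best matches first
    .slice(0, maxContexts);                        // cap how many are injected
}

const matches = [
  { text: "Reset procedure for model X", score: 0.82 },
  { text: "Warranty terms", score: 0.41 },
  { text: "Model X troubleshooting", score: 0.77 },
  { text: "Model X spare parts", score: 0.71 },
];

// Threshold 0.7, at most 2 contexts: keeps the 0.82 and 0.77 matches.
const injected = selectContexts(matches, 0.7, 2);
```

Running the same query with a few different thresholds against your own data is a practical way to find the point where noise starts to appear.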
Use Vector Search Configurations for unstructured data such as instruction manuals, knowledge base articles, and support notes, where the agent should always have relevant context injected automatically. For structured data with discrete fields (sales orders, inventory, records), use an AI tool of type Table Definition instead. The agent calls it on demand and can filter, paginate, and write back.
For full control over the similarity search, use a Server Script tool and call
the findSimilar function inside it.
The agent generates the search input itself based on the conversation, and you
control exactly what is queried and returned. This lets you implement a custom
semantic search strategy, add WHERE filters, combine results from multiple
tables, or post-process matches before they reach the agent.
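As one example of post-processing, the sketch below merges matches from several tables, removes duplicates, and returns the strongest results. It deliberately leaves out the `findSimilar` call itself, since its signature is not shown here; the result shape (`{ id, score, text }`) and the helper name `combineMatches` are assumptions for illustration.

```javascript
// Hypothetical post-processing step for a Server Script tool.
// `resultSets` is assumed to look like:
//   [{ table: "manuals", matches: [{ id, score, text }, ...] }, ...]
function combineMatches(resultSets, limit) {
  const best = new Map();
  for (const { table, matches } of resultSets) {
    for (const m of matches) {
      // Deduplicate per table and record, keeping the highest score.
      const key = `${table}:${m.id}`;
      const prev = best.get(key);
      if (!prev || m.score > prev.score) {
        best.set(key, { ...m, table });
      }
    }
  }
  // Strongest matches first, capped at `limit`.
  return [...best.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Tagging each match with its source table lets the agent tell the user where a piece of information came from.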
Vector Search Configurations fire silently on every message. A Server Script tool fires only when the agent decides it is needed.