Agent context and knowledge

LLMs are trained to generate plausible text that completes a prompt. Despite many advances in training these models, there is no general guarantee that a completion is factually correct as well as plausible.

To improve this, you can provide extra knowledge to the LLM within its "context window".

Context management

Last N-Messages

Configure how many of the earlier messages in a conversation thread are resent to the LLM when it generates a new response. Use a low number to reduce token costs, or a higher number to give the LLM more of the conversation as context.
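
As a rough illustration of what this setting does, the following sketch trims a thread to its most recent messages before a request is built. The helper name lastNMessages and the message shape (role, content) are assumptions made for this example, not the platform's internal implementation.

```javascript
// Hypothetical helper: keep only the most recent `n` messages of a thread.
// The { role, content } message shape is an assumption for this sketch.
function lastNMessages(thread, n) {
    if (!Number.isInteger(n) || n <= 0) {
        return []; // nothing from the history is resent
    }
    return thread.slice(-n);
}

const thread = [
    { role: "user", content: "What is our return policy?" },
    { role: "assistant", content: "Returns are accepted within 30 days." },
    { role: "user", content: "Does that include sale items?" },
    { role: "assistant", content: "Yes, sale items are included." },
];

// With n = 2, only the last user/assistant exchange is resent.
const resent = lastNMessages(thread, 2);
console.log(resent.map((m) => m.content));
```

A larger n keeps more of the conversation at the price of more tokens per request; if n exceeds the thread length, the whole thread is returned.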

When compaction is enabled, Last N-Messages is disabled. Compaction takes over managing context limits automatically.

Compaction

Enable compaction to automatically summarize older parts of a conversation thread when token usage approaches the model’s context window limit. This allows long-running threads to continue without losing important context, and is an alternative to limiting history via Last N-Messages.

When compaction is triggered, all messages since the last compaction boundary are summarized into a compact log. The summary is injected at the start of the next request in place of the raw message history.
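
The replacement of raw history by a summary can be pictured as follows. This is a minimal sketch: the function buildRequestMessages, the message shape, and the wording that wraps the summary are assumptions, and the real request format may differ.

```javascript
// Hypothetical sketch: build the message list for the next request after compaction.
// `summary` stands in for the compact log produced by the summarization model;
// `messagesSinceBoundary` are the messages added after the last compaction boundary.
function buildRequestMessages(summary, messagesSinceBoundary) {
    const summaryMessage = {
        role: "system",
        content: `Summary of the earlier conversation:\n${summary}`,
    };
    // The summary comes first and replaces the raw messages it covers.
    return [summaryMessage, ...messagesSinceBoundary];
}

const request = buildRequestMessages(
    "The user asked about the return policy; returns are accepted within 30 days.",
    [{ role: "user", content: "Can I return an item without a receipt?" }]
);
console.log(request.length); // 2
```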

Configure the following when compaction is enabled:

Model: The model used to generate the conversation summary. If the configured model fails to load, the agent’s own model is used as a fallback.
Context Window (tokens): The context window size of the model in tokens. Used together with the threshold to determine when compaction is triggered.
Threshold (%): When the latest response’s total token usage reaches this percentage of the configured context window, compaction is triggered automatically before the next request.
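
How Context Window and Threshold combine can be sketched with a small check. The function name shouldCompact and the example numbers are assumptions for illustration; only the rule itself (usage reaching the configured percentage of the context window) comes from the settings described above.

```javascript
// Hypothetical check mirroring the Context Window and Threshold settings.
// `totalTokens` is the total token usage reported for the latest response.
function shouldCompact(totalTokens, contextWindowTokens, thresholdPercent) {
    // Compare in integer arithmetic to avoid floating-point rounding.
    return totalTokens * 100 >= contextWindowTokens * thresholdPercent;
}

// Example: a 128,000-token context window with an 80% threshold
// triggers compaction once usage reaches 102,400 tokens.
console.log(shouldCompact(95000, 128000, 80));  // false
console.log(shouldCompact(102400, 128000, 80)); // true
```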

Pre-processing script

Use a pre-processing script to dynamically extend the agent’s system instruction at runtime. The script runs on every agent invocation, before the LLM is called, and its output is appended to the static system instruction you configured.

This is useful when the context you want to provide to the agent is not known at design time, for example when fetching user-specific data, resolving runtime variables, or injecting content from an external source.

The script receives the following data in the variable payload:

{
    userInfo: {
        id: string,
        username: string,
        name: string,
        email: string,
        language: string,
    }, // the logged-in user
    input: string, // the request from the user
    variables: Record<string, string>, // the variables
    threadID: string, // the thread ID
    agentConfig: {
        model: string, // the model uuid
        temperature: number, // the temperature
        tools: ai_tool[], // the AI tools configured for the agent
    },
}

Assign a string value to the variable result in the script. This string is appended to the system instruction before the request is sent to the LLM.

// Read the logged-in user from the payload.
const { userInfo } = payload;

// IDs of the users who are medical professionals.
const DOCTORS = [
    "f705a4dc-1546-489a-9780-b8c464257df7",
    "3c9e1a02-84fb-4d1e-b77a-df012e567890",
];

// The string assigned to result is appended to the system instruction.
result = DOCTORS.includes(userInfo.id)
    ? `The user is a medical professional (${userInfo.name}). You may discuss clinical details, diagnoses, and treatment options.`
    : `The user is a patient (${userInfo.name}). Keep responses simple and avoid clinical terminology.`;
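
The same pattern extends to the other fields of payload. The following sketch builds the appended text from userInfo.language and variables. The variable key customerTier and the sample payload are hypothetical and only serve to make the example runnable outside the platform.

```javascript
// Builds the text to append to the system instruction from the payload
// fields documented above. `customerTier` is a hypothetical variable name.
function buildContext(payload) {
    const { userInfo, variables = {} } = payload;
    const lines = [];

    if (userInfo?.language) {
        lines.push(`Respond in the user's preferred language (${userInfo.language}).`);
    }
    if (variables.customerTier) {
        lines.push(`The user's customer tier is "${variables.customerTier}".`);
    }
    return lines.join("\n");
}

// Sample payload for trying the function locally (not real data).
const samplePayload = {
    userInfo: { id: "u-1", username: "jdoe", name: "Jane Doe", email: "jane@example.com", language: "de" },
    input: "Where is my order?",
    variables: { customerTier: "gold" },
    threadID: "t-1",
    agentConfig: { model: "m-1", temperature: 0.2, tools: [] },
};
console.log(buildContext(samplePayload));

// Inside the actual pre-processing script, the last step would be:
// result = buildContext(payload);
```

Keeping the logic in a function that returns a string makes it easy to check the output with a sample payload before relying on it in an agent. Missing fields simply produce no line, so the appended text stays empty rather than containing "undefined".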

Vector search configurations

Select one or more vector-enabled tables as passive knowledge sources. On every user message, a semantic similarity search runs automatically against the selected tables and the matching results are injected into the agent’s context, with no tool call or agent decision required.

Configure Max. number of supplied contexts and Similarity Threshold to control how many results are injected and how closely they must match.

Similarity scores are not universally calibrated. A score of 0.7 with one embedding model may be a strong match; with another it may be noise. Test your dataset and tune the threshold for your specific case.
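
To make the interplay of the two settings concrete, the following sketch filters scored matches by a similarity threshold and then caps them at a maximum count. The use of cosine similarity, the function names, the row shape, and the toy two-dimensional vectors are all assumptions for illustration; this page does not state which similarity measure the platform uses.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA === 0 || normB === 0) return 0;
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep matches at or above the threshold, best first, capped at maxContexts.
function selectContexts(queryVector, rows, { maxContexts, similarityThreshold }) {
    return rows
        .map((row) => ({ text: row.text, score: cosineSimilarity(queryVector, row.vector) }))
        .filter((match) => match.score >= similarityThreshold)
        .sort((x, y) => y.score - x.score)
        .slice(0, maxContexts);
}

// Toy embeddings; real embedding vectors have hundreds of dimensions or more.
const rows = [
    { text: "Resetting the device", vector: [1, 0] },
    { text: "Warranty terms", vector: [0, 1] },
    { text: "Factory reset steps", vector: [0.9, 0.1] },
];
const selected = selectContexts([1, 0], rows, { maxContexts: 2, similarityThreshold: 0.7 });
console.log(selected.map((m) => m.text));
```

Logging the scores for a handful of representative queries against your own data is a practical way to see where relevant and irrelevant matches separate before choosing a threshold.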

Use Vector Search Configurations for unstructured data such as instruction manuals, knowledge base articles, and support notes, where the agent should always have relevant context injected automatically. For structured data with discrete fields (sales orders, inventory, records), use an AI tool of type Table Definition instead. The agent calls it on demand and can filter, paginate, and write back.

For full control over the similarity search, use a Server Script tool and call the findSimilar function inside it. The agent generates the search input itself based on the conversation, and you control exactly what is queried and returned. This lets you implement a custom semantic search strategy, add WHERE filters, combine results from multiple tables, or post-process matches before they reach the agent.
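
The post-processing step mentioned above might look like the following sketch. It deliberately does not call findSimilar, whose signature is not shown on this page. Instead it assumes the matches from each table have already been fetched and that each match has id, score, and text fields, which is an assumption about the result shape. The table names are hypothetical.

```javascript
// Hypothetical post-processing for a Server Script tool: merge matches from
// several tables, drop duplicate passages, and return the best `limit` results.
function mergeMatches(matchesByTable, limit) {
    const seen = new Set();
    const merged = [];
    for (const [table, matches] of Object.entries(matchesByTable)) {
        for (const match of matches) {
            if (seen.has(match.text)) continue; // keep the first occurrence only
            seen.add(match.text);
            merged.push({ ...match, table });
        }
    }
    // Highest score first, then cap the number of results.
    return merged.sort((a, b) => b.score - a.score).slice(0, limit);
}

const combined = mergeMatches(
    {
        manuals: [{ id: 1, score: 0.91, text: "Hold the power button for 10 seconds." }],
        support_notes: [
            { id: 7, score: 0.84, text: "Hold the power button for 10 seconds." },
            { id: 8, score: 0.88, text: "Reset clears all paired devices." },
        ],
    },
    5
);
console.log(combined.map((m) => `${m.table}: ${m.text}`));
```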

Vector Search Configurations fire silently on every message. A Server Script tool fires only when the agent decides it is needed.
