Relevance AI's two-part billing model confuses almost every new user. Here is how it works and how to stay in control.
## The Two-Part Billing Model
Relevance AI charges for two things separately, and the interaction between them is what catches most users off guard:
| Cost type | What it covers | Analogy |
|---|---|---|
| Actions | Every step your agent takes (tool calls, LLM calls, data reads, outputs) | Like CPU credits -- consumed per operation |
| Vendor Credits | The underlying LLM API costs (GPT-4o tokens, Claude tokens, etc.) | Passed-through API costs with no markup |
Since September 2025, Vendor Credits have zero markup -- you pay the same as calling the API directly. Paid plans also let you bring your own API key to bypass Vendor Credits entirely, routing model costs through your own OpenAI/Anthropic account.
## What Consumes Actions
Actions are consumed by every discrete step an agent takes. A single agent run can consume many Actions:
- Each LLM call the agent makes: 1 Action
- Each tool execution (web search, code runner, API call): 1 Action
- Each knowledge base search: 1 Action
- Each data transformation or enrichment step: 1 Action
An agent that does: classify query (1) -> search knowledge base (1) -> call external API (1) -> generate response (1) consumes 4 Actions per user message. A conversation with 5 back-and-forth turns: up to 20 Actions.
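The arithmetic above can be sketched as a quick estimator. This is an illustrative helper, not part of any Relevance AI SDK; the step names are hypothetical and the per-step cost of 1 Action follows the list above:

```python
# Rough per-conversation Action estimator. Each discrete agent step
# costs 1 Action, per the pricing model described above.
STEPS_PER_MESSAGE = {
    "classify_query": 1,
    "knowledge_base_search": 1,
    "external_api_call": 1,
    "generate_response": 1,
}

def actions_per_message(steps: dict = STEPS_PER_MESSAGE) -> int:
    """Sum the Action cost of every step one user message triggers."""
    return sum(steps.values())

def actions_per_conversation(turns: int) -> int:
    """Upper bound: every turn runs the full step pipeline."""
    return turns * actions_per_message()

print(actions_per_message())        # 4 Actions per user message
print(actions_per_conversation(5))  # 20 Actions for a 5-turn conversation
```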
Audit your most-used agents with Relevance AI's execution logs to see how many Actions each run consumes. You will likely find 1-2 agent designs consuming most of your Action budget.

## How to Reduce Action Consumption
### 1. Combine tool calls where possible
Instead of an agent making separate calls for each piece of data (3 Actions), design a single tool that fetches everything needed in one call (1 Action). This requires designing tools with broader scope.
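A minimal sketch of the trade-off, assuming a customer-support agent; all function names and return shapes here are hypothetical:

```python
# Three narrow lookups -- if each is exposed as a separate agent tool,
# each call the agent makes consumes its own Action.
def get_profile(user_id: str) -> dict:
    return {"user_id": user_id, "tier": "pro"}

def get_orders(user_id: str) -> list:
    return [{"order_id": "A-1", "user_id": user_id}]

def get_tickets(user_id: str) -> list:
    return [{"ticket_id": "T-9", "user_id": user_id}]

# One broad tool: the agent fetches everything it needs in a single
# call (1 Action instead of 3), at the cost of sometimes over-fetching.
def get_customer_context(user_id: str) -> dict:
    return {
        "profile": get_profile(user_id),
        "orders": get_orders(user_id),
        "tickets": get_tickets(user_id),
    }
```

The broad tool wins when the agent usually needs all three pieces of data anyway; keep narrow tools for data that is rarely requested.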
### 2. Use simpler agents for simple queries
Not every user message needs a full reasoning chain. Use a classifier to route simple queries (FAQ-style) to a direct knowledge base answer (2 Actions) instead of the full agent pipeline (5-8 Actions).
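A routing sketch under stated assumptions: the keyword check below stands in for a real classifier, and the Action counts mirror the estimates above (2 for a direct KB answer, 5-8 for the full pipeline):

```python
# Route FAQ-style queries down a cheap path instead of the full agent.
# FAQ_KEYWORDS is an illustrative stand-in for a trained classifier.
FAQ_KEYWORDS = ("pricing", "refund", "hours", "password")

def route(query: str) -> tuple:
    """Return (pipeline name, estimated Actions) for a user query."""
    if any(kw in query.lower() for kw in FAQ_KEYWORDS):
        return ("kb_lookup", 2)   # classify + direct knowledge base answer
    return ("full_agent", 8)      # full reasoning pipeline, upper bound

print(route("How do I reset my password?"))  # ('kb_lookup', 2)
```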
### 3. Reduce agent iterations
Agents with vague instructions tend to iterate more -- trying multiple approaches before reaching an answer. Tighter, more specific instructions reduce unnecessary iterations and Action consumption.
## Bring Your Own API Key
On paid plans, you can connect your own OpenAI or Anthropic API key. This routes all model costs through your account, eliminates Vendor Credits, and makes model costs predictable and directly visible in your OpenAI/Anthropic usage dashboard.
1. Go to Relevance AI Settings > Integrations > LLM Providers
2. Add your OpenAI API key (or Anthropic, Cohere, etc.)
3. Select this key as the default for your agent or specific tools
4. Monitor costs in your OpenAI/Anthropic dashboard -- they now appear there directly
When using BYOK (Bring Your Own Key), your costs split across two bills: Actions to Relevance AI, model tokens to OpenAI/Anthropic. Set up billing alerts on both accounts.

## Setting Budget Limits
Relevance AI does not yet have native budget caps per agent (as of early 2026), but you can implement soft limits through monitoring:
- Set billing alerts in your Relevance AI account for Action thresholds
- Use the Relevance AI API to check Action consumption programmatically and disable agents that exceed a threshold
- For always-on agents: set a daily Action budget and monitor it via the dashboard
```python
# Check usage via the Relevance AI API (endpoint path taken from this
# guide; verify it against the current API documentation).
import asyncio

import httpx

async def get_current_usage(api_key: str, region: str) -> dict:
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"https://api-{region}.stack.tryrelevance.com/latest/usage",
            headers={"Authorization": api_key},
        )
        response.raise_for_status()
        return response.json()

async def check_usage() -> None:
    # Run this from a monitoring cron job to alert when nearing limits.
    usage = await get_current_usage(api_key="your-key", region="us")
    if usage.get("actions_used", 0) > 0.8 * usage.get("actions_limit", 1):
        # send_alert is your own alerting hook (Slack webhook, email, etc.)
        send_alert("Relevance AI Actions at 80% of monthly limit")

asyncio.run(check_usage())
```

## Quick Reference
- Actions = per-step cost; Vendor Credits = model token cost (no markup on paid plans)
- Audit execution logs to find which agents consume the most Actions
- Design tools with broader scope to reduce per-run Action count
- Use BYOK (Bring Your Own Key) to route model costs through your own API account
- Set billing alerts on both Relevance AI (Actions) and OpenAI/Anthropic (tokens) when using BYOK
- Route simple queries to direct KB lookups to avoid full agent pipeline costs