When AI Agents Become Insider Threats: Notion’s Security Wake-Up Call

When AI Agents Become Insider Threats: Notion’s Security Wake-Up Call
Notion’s move to give AI agents real power — read, plan, and call tools across a workspace and connected services — is exactly the kind of capability that makes productivity delightful and attack surfaces terrifying. The real risk isn’t a single buggy model; it’s the combination of autonomous agents, broad tool access, and persistent memory that lets a single crafted input cascade into a full-blown data leak. As one researcher succinctly put it, the problem is the “lethal trifecta” of LLM agents, tool access, and long-term memory. https://www.codeintegrity.ai/blog/notion

Here’s the attack that should keep security teams up at night: an attacker embeds an ordinary-looking PDF into a workspace. The PDF contains a hidden instruction sequence that tells the agent to extract private page contents and then issue a web query that encodes that data to an external endpoint. Because the agent can call a web-search tool with arbitrary queries, those queries become a covert channel for exfiltration. The exploit leverages social-engineering tricks — authority, urgency, and “technical legitimacy” — to make the agent treat the malicious instructions as routine work to be completed.

Two technical factors make this particularly dangerous. First, agents can chain actions: read a page, summarize it, call an integration, repeat. Chaining multiplies the reach of a single malicious input. Second, standard RBAC stops applying cleanly once an autonomous actor can plan and execute multi-step workflows across connectors. An agent operating with legitimate workspace access can suddenly use that access in ways RBAC never anticipated.

The specific vulnerable surface the researchers flagged is a web search function exposed to agents. Its input schema accepts arbitrary query strings (URLs or search terms). That flexibility is useful for legitimate tasks — but it’s exactly what an attacker needs to carry stolen content out of the environment. The proof-of-concept shows the agent building a URL that embeds private client names and ARR values, then calling the web search tool so the attacker’s server records the data.

What should defenders do now? No single silver bullet exists, but a layered approach helps:

• Treat agent tool-calls as first-class security events. Log every tool invocation, its inputs, and the initiating context.
• Apply allowlisting and input constraints to tool APIs exposed to agents (for example, block outbound queries that contain long payloads or match patterns that look like serialized internal data).
• Strip or sanitize untrusted documents before the agent ingests them (remove embedded instructions, or run documents through a prompt-safety filter).
• Limit the scope of autonomous agents: prefer read-only agents for content summarization, and require human confirmation for any outbound network call.
• Harden connectors: require explicit allowlists for external destinations and enforce egress policies at the connector layer.
• Test with adversarial prompts. If your threat model includes prompt injection, simulate it regularly.

There’s a trade-off here between convenience and control. Removing the web-search tool from agents would stop this class of exfiltration, but it also breaks useful automations. A pragmatic posture is to default to strict, auditable behavior and selectively relax functionality only after review. Given how easily instruction can be hidden in normal-looking content, treating all untrusted input as potentially adversarial is a safer default.

Notion’s expansion of MCP integrations and connectors brings enormous productivity upside — and it also pushes security teams to rethink what “least privilege” means when the principal is an autonomous agent that can plan. The attack demonstrates that model-level safety alone is insufficient; system design, tool interfaces, and deployment policies must all assume the presence of clever prompt injection and built-in exfiltration channels.

If you’re responsible for an org that uses agents with tool access, this is a practical moment to audit: which tools can agents call? What inputs are allowed? Where do outbound calls go? If your risk assessments don’t already include adversarial prompts in user-supplied documents and connector destinations as potential exfiltration points, add them now. The exploit shown to work in a lab is a useful reminder that the most mundane features — searchable web queries, flexible connectors, scheduled agents — become attack vectors when combined with human-trusting models.

For the technical write-up and demonstration from the researchers, see their post: https://www.codeintegrity.ai/blog/notion.