MCP in LLMs Allows Major Exploits: Security Audit Exposes Critical Flaws

Related Blogs
Keerthi
Keerthi Ganesh

AI/ML Developer

5 min read

Introduction: The Rise of Agentic AI and the Need for Secure Integration

As AI agents grow more sophisticated and interconnected, the infrastructure supporting them must scale in complexity and security. A key innovation addressing this need is the Model Context Protocol (MCP) — a standardized framework from Anthropic that facilitates consistent communication between large language models (LLMs) and external APIs, services, and tools.

The MCP is envisioned as the “USB-C port of AI,” offering a plug-and-play experience that simplifies the orchestration of complex, agentic workflows. Services like Claude Desktop, OpenAI Agents, slack have already adopted the protocol, and its open-source repository has exploded in popularity.

However, with such power and flexibility come equally potent risks. This blog explores the findings of a groundbreaking study— MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits by Brandon Radosevich and John Halloran — that reveals how MCP-enabled LLMs can be exploited to perform malicious actions, posing significant security threats to developers and users alike.

The Problem: LLMs Can Be Tricked into Malicious Behavior via MCP

The Attack Surface

The paper outlines how attackers can exploit the tooling and access levels granted through MCP to manipulate LLMs into executing harmful behaviors. These threats fall into three main categories:

  1. Malicious Code Execution (MCE): Injecting scripts or commands into system-critical files to establish backdoors or persistent threats.
  2. Remote Access Control (RAC): Granting unauthorized users remote login privileges, such as through SSH key injection.
  3. Credential Theft (CT): Stealing API keys, access tokens, and sensitive environment variables — often silently.

These attacks are made more dangerous because they don’t require direct interaction with the LLM; indirect access via retrieval tools or improperly sanitized inputs can be enough.

How Do These Attacks Work?

1. Malicious Code Execution (MCE)

MCP allows LLMs to interface directly with a file system. Attackers can exploit this by instructing the model to inject commands (e.g., netcat) into shell config files like .bashrc. When a terminal session is opened, the payload executes, creating a covert access point.

End-to-End MCE attack

In Figure (a), Claude inserts a netcat command into the victim’s bash run configuration file, setting up a listener on port 4444 that activates each time the victim launches a new terminal session. In Figure (b), the attacker waits for incoming connections by listening on port 4444. When the victim opens a terminal and the connection is established, as shown in Figure (c), the attacker gains unauthorized system access.

End-to-End MCE attack

Claude Example:

  1. Encoded Attack: Claude decodes octal-encoded commands and halts execution, citing security risks.
  2. Plaintext Attack: When presented in a straightforward form, Claude executes it without adequate objection.

Implication: In shared environments such as offices, coworking spaces, or labs, this could lead to unauthorized shell access, allowing lateral movement or data exfiltration.

2. Remote Access Control (RAC)

By leveraging MCP’s file-writing capabilities, LLMs can add an attacker’s public SSH key to the victim’s authorized_keys file. Once embedded, the attacker can access the system remotely with minimal effort.

Refused and completed RAC attacks

Claude’s safety measures are activated by a RAC attack (Figure a), where Claude identifies the potential security threats involved in the request. Despite this, Claude proceeds to execute the steps of the RAC attack (Figure b).

Remote Access Control (RAC)
  • Guardrail Weakness: Claude may flag suspicious phrasing, but a subtle rewording often bypasses its safety systems.
  • Llama-3.3-70B-Instruct typically completes such requests unless keywords like “backdoor” or “break into” are explicitly used.

Key Concern: Attackers don’t need to be sophisticated—just creative in prompt phrasing.

3. Credential Theft (CT)

Using tools like Slack MCP servers or environment variable access, LLMs can retrieve sensitive keys (OpenAI, AWS, Huggingface, etc.) and post them in public or internal Slack channels.

Successful RADE attack for CT

Using a vector database containing CT instructions centered on the theme of “MCP,” Claude is directed to locate entries related to the MCP and carry out associated tasks. Claude follows these instructions, successfully executing a RAC attack and granting attackers access to the target system. The conversation has been shortened for clarity; the full exchange is shown in Figure a and b.

RADE attack for CT

Real-World Example: Claude used Slack MCP tools to exfiltrate credentials to a shared company Slack channel. These credentials became visible to every team member, drastically increasing the risk of exposure or abuse.

Beyond Direct Prompts: RADE Attacks

The authors introduce a novel class of indirect attacks: Retrieval-Agent Deception (RADE).

What is a RADE Attack?

An attacker creates a public file with hidden MCP commands related to a specific theme (e.g., "MCP"). This file ends up on the victim's system and is added to a vector database (using a Chroma MCP server). When a user queries the database using Claude, the LLM retrieves and executes the malicious commands.

RADE attack for CT

Why Is This Dangerous?

  1. No direct prompt required — attackers can embed commands in files shared online or via third-party tools.
  2. Autonomous execution — LLMs follow prompts to query, retrieve, and execute, unaware of the embedded payloads.

Example: Claude retrieves a file with embedded export commands for API keys and posts them to Slack using MCP tools. No malicious prompt needed — just a user querying “MCP.”

McpSafetyScanner: A Proactive Defense Tool

To address these vulnerabilities, the researchers introduced McpSafetyScanner, the first multi-agent framework designed to audit and secure MCP servers.

How It Works (Three-Agent Architecture):

  1. Hacker Agent
    Simulates an attacker probing the MCP setup to find weaknesses.
  2. Auditor Agent
    Cross-references discovered vulnerabilities with public knowledge bases (arXiv, Hacker News, etc.) and recommended remediations.
  3. Supervisor Agent
    Oversees the scanning process and produces a detailed, actionable security report.
Three-Agent Architecture

Broader Impact and Future Directions

The MCP has the potential to revolutionize agentic AI workflows by offering seamless integration and interoperability. However, this paper is a strong signal to the community: convenience must not come at the cost of security.

Key Insights:

  • LLMs can be manipulated in subtle, non-obvious ways.
  • Current safety systems are inconsistent and fallible.
  • Proactive, layered defense mechanisms (like McpSafetyScanner) are essential.

What's Next?

  • Open-source release of McpSafetyScanner.
  • Continuous community-led auditing and improvement of MCP integrations.
  • Integration of automated scanning into AI deployment pipelines.

Conclusion

MCP is poised to be a cornerstone in the future of AI development, enabling dynamic and flexible AI systems. But this flexibility comes with significant risk.

This study confirms:

  • LLMs can be tricked into dangerous actions with or without explicit prompts.
  • Guardrails alone are not sufficient.
  • Tools like McpSafetyScanner are essential to maintaining security in complex agentic architectures.

As the foundation of AI systems shifts from isolated models to interconnected agents, it's imperative we secure the plumbing — because the smartest AI means little if it’s also the easiest to exploit.

Read our blog on ‘Implementing Anthropic’s Model Context Protocol (MCP) for AI Applications and Agents’.

At Bluetick Consultants, we don't just build AI — we secure it. Our team specializes in developing robust, enterprise-grade models with security at the core. We ensure your AI systems are not just intelligent, but resilient.

Reference

Brandon Radosevich, John Halloran, "MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits", arXiv, 2024.

Security blind spots in AI models can cost more than just performance. Let us help you build safe, reliable, and future-ready AI solutions.

Talk to our AI Security Experts at Bluetick Consultants.

Back To Blogs


contact us