Four Things the Docs Don't Tell You About MCP on APIM
Reference architectures get messy in practice, and you will run into platform quirks that the official documentation ignores. If you are running real Model Context Protocol (MCP) workloads behind Azure API Management (APIM), this post should save you days of maddening debugging.
Here are the 'gotchas' you need to know.
1. The Silent Streaming Hang
Your MCP streams are hanging. You're waiting for a response, but nothing arrives. No errors. No warnings in the logs.
Just silent buffering in the gateway.
By default, APIM buffers the backend response, which turns your streaming agent into a frustratingly slow experience. The request looks like it's dragging, but the gateway is actually holding every chunk until the backend closes the connection. The fix is to turn off response buffering in your policy. You are missing exactly one attribute:
<forward-request buffer-response="false" />
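For context, here is a minimal sketch of where that attribute sits in an API-level policy. It assumes the standard four policy sections and nothing else, so merge it into whatever your existing policy already contains rather than pasting it over the top.

```xml
<policies>
    <inbound>
        <base />
    </inbound>
    <backend>
        <!-- Pass chunks to the client as they arrive instead of holding the full response in the gateway -->
        <forward-request buffer-response="false" />
    </backend>
    <outbound>
        <base />
        <!-- Keep body-reading policies (e.g. response caching, body logging) off this API; they need the whole response -->
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```

As far as I can tell from the APIM guidance on streaming responses, anything that needs the complete response body can reintroduce the wait, so it is worth checking inherited policies (the `<base />` elements) as well.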
2. GenAI Policies Break Raw MCP Traffic
Regular GenAI gateway policies don't play well with raw MCP traffic. Policies like llm-token-limit expect LLM completion payloads they can count tokens in, while MCP traffic is JSON-RPC tool calls. If you drop them directly onto these streams, they break, so you need to handle token limits and rate shaping differently for MCP payloads.
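One option is to shape MCP traffic by request count instead of by tokens. The sketch below uses the generic rate-limit-by-key policy in the inbound section; the limit of 100 calls per 60 seconds and the choice of key (APIM subscription ID, falling back to the caller's IP address) are placeholder assumptions, not recommendations.

```xml
<inbound>
    <base />
    <!-- Placeholder limit: 100 requests per 60-second window per counter key -->
    <!-- Key on the APIM subscription when present, otherwise on the caller's IP address -->
    <rate-limit-by-key calls="100"
                       renewal-period="60"
                       counter-key="@(context.Subscription?.Id ?? context.Request.IpAddress)" />
</inbound>
```

Counting requests works reasonably well here because, with MCP's Streamable HTTP transport, each JSON-RPC message the client sends is its own HTTP POST, so a call limit maps roughly onto tool invocations (plus initialization and notification messages).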
3. The 4-Minute Wall
Long-running agent loops will eventually hit a wall: the Azure load balancer's default 4-minute (240-second) idle timeout. If your agent goes quiet for too long while "thinking", the connection is dropped silently. You have to design your agents to handle this timeout gracefully, for example by keeping traffic flowing during long operations or by reconnecting and resuming.
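On the gateway side, it helps to make the timeout explicit. The forward-request timeout (in seconds, default 300) controls how long APIM waits for the backend's response headers, and the policy reference notes that values above 240 may not be honored because idle connections can be dropped around that point. The sketch below pins it at 240 as an assumption; on its own it does not keep a quiet stream alive.

```xml
<backend>
    <!-- Wait at most 240 seconds for backend response headers; longer values may not be honored -->
    <!-- because idle connections can be dropped around the 4-minute mark -->
    <forward-request timeout="240" buffer-response="false" />
</backend>
```

Keeping the stream alive is the server's job. Emitting something well before the idle window closes, such as MCP progress notifications (`notifications/progress`) during long tool calls or SSE comment lines (lines starting with a colon, which clients ignore), keeps bytes on the wire. Treat the exact interval as something to test against your own setup.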
4. Tool-Calling Identities: AD Token vs. JWT
Choose carefully between validate-azure-ad-token and validate-jwt. Use the AD token policy when callers present tokens issued by your own Microsoft Entra ID tenant and you want strict internal access, but rely on standard JWT validation if your tools deal with external or custom identity providers.
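A minimal sketch of both options for the inbound section follows. The `{{...}}` entries are hypothetical named values, and `idp.example.com` is a placeholder identity provider; substitute your own tenant, client application, audience, and issuer, and use one option or the other on a given API.

```xml
<!-- Option A: tokens issued by your own Microsoft Entra ID tenant -->
<validate-azure-ad-token tenant-id="{{entra-tenant-id}}"
                         header-name="Authorization"
                         failed-validation-httpcode="401"
                         failed-validation-error-message="Unauthorized">
    <!-- Only these client applications may call the MCP tools -->
    <client-application-ids>
        <application-id>{{mcp-client-app-id}}</application-id>
    </client-application-ids>
    <audiences>
        <audience>{{mcp-api-audience}}</audience>
    </audiences>
</validate-azure-ad-token>
```

```xml
<!-- Option B: tokens from an external or custom OpenID Connect provider (placeholder URLs) -->
<validate-jwt header-name="Authorization"
              require-scheme="Bearer"
              failed-validation-httpcode="401"
              failed-validation-error-message="Unauthorized">
    <!-- Signing keys are discovered from the provider's OpenID configuration document -->
    <openid-config url="https://idp.example.com/.well-known/openid-configuration" />
    <audiences>
        <audience>{{mcp-api-audience}}</audience>
    </audiences>
    <issuers>
        <issuer>https://idp.example.com/</issuer>
    </issuers>
</validate-jwt>
```

The strictness of Option A comes from the policy only accepting tokens from the named Entra ID tenant and listed client applications; Option B trades that for the flexibility to trust whichever issuer you configure.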
See the accompanying code at https://github.com/jackweldonweb/apim-mcp-terraform, and check the repository's license for usage terms.