For cloud engineers and DevOps professionals managing applications on Azure Kubernetes Service (AKS), one of the most common yet overlooked issues is SNAT (Source Network Address Translation) port exhaustion. This problem can severely impact outbound connectivity. In particular, it hits hard when applications create a high number of simultaneous connections to a small set of external destinations — such as databases or third-party APIs.

In this article, you will learn what SNAT port exhaustion is, how to detect it, and step-by-step methods to fix it using Azure CLI. You will also learn how to increase outbound ports and frontend IPs to keep your AKS cluster stable under heavy workloads.

Introduction

When SNAT ports run out, your applications can no longer open outbound connections. As a result, you get downtime, timeouts, and performance degradation — often at the worst possible moment. Fortunately, Azure gives you direct control over outbound rules, allocated ports, and frontend IPs on your AKS load balancer. In this guide, we walk through how to identify the problem and fix it step by step.

What Is SNAT Port Exhaustion?

To understand the problem, it helps to know how outbound traffic flows in AKS. Each outbound connection from a pod uses a combination of the pod’s IP address and a port number. SNAT then translates those internal addresses and ports to the load balancer’s public IP and a different port number.

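However, this translation draws on a limited pool. Each frontend IP on the load balancer provides 64,000 SNAT ports, and the load balancer preallocates them to nodes in fixed blocks. A port can be reused for connections to different destinations, but every connection to the same destination IP and port needs its own. When your app opens many connections to the same destination, for example a database, it consumes its node's ports quickly. If all ports are used up, new connections fail. That failure is called SNAT port exhaustion.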

💡 In simple terms: think of SNAT ports like phone lines. If all lines are busy, no new calls can get through — even if your app keeps trying.

Real-Life Scenario: SNAT Port Exhaustion in AKS

Imagine you have a high-traffic frontend application running on AKS. It connects to a database hosted on a public IP address. Over time, users start reporting intermittent connectivity errors. After investigating, you discover that the root cause is SNAT port exhaustion — the app has run out of outbound ports to open new connections to the database.

This is a very common pattern in production AKS clusters. Moreover, it is easy to miss because the app appears to work fine under normal load. Only under peak traffic does the exhaustion show up.

How to Identify SNAT Port Exhaustion

Before making any changes, first confirm that SNAT port exhaustion is the actual root cause. Specifically, look for these symptoms:

  • Connection failures: Applications fail to open new outbound connections, even though the destination is reachable.
  • Timeout errors: Connections time out because no SNAT port is available to complete the handshake.
  • Intermittent connectivity: Everything works at low traffic but fails under heavy load — a classic sign of port exhaustion.
  • Azure Monitor metrics: The load balancer's SNAT Connection Count metric reports failed connections, or Used SNAT Ports climbs close to Allocated SNAT Ports.

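To confirm the issue, check your application logs for connection refused or timeout errors. In addition, open the Azure Portal → Load Balancer (the load balancer named kubernetes in your cluster's node resource group) → Metrics and monitor the SNAT Connection Count, Used SNAT Ports, and Allocated SNAT Ports metrics. Failed SNAT connections, or Used SNAT Ports sitting close to Allocated SNAT Ports during peak traffic, confirm exhaustion.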
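If you prefer the command line to the portal, a minimal sketch along these lines pulls the same load balancer metrics with Azure CLI. It assumes the metric IDs AllocatedSnatPorts and UsedSnatPorts, looks up the AKS-managed load balancer named kubernetes, and uses the placeholder names myResourceGroup and myAKSCluster. The helper name show_snat_metrics is hypothetical.

```shell
# Placeholders: replace with your own resource group and cluster name.
RESOURCE_GROUP="myResourceGroup"
CLUSTER_NAME="myAKSCluster"

# Hypothetical helper: prints the peak allocated and used SNAT ports on the
# AKS-managed load balancer, aggregated into 5-minute buckets.
show_snat_metrics() {
  local node_rg lb_id
  # The load balancer lives in the auto-generated node resource group.
  node_rg=$(az aks show --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" --query nodeResourceGroup -o tsv)
  # Look up the load balancer's full resource ID for the metrics query.
  lb_id=$(az network lb show --resource-group "$node_rg" \
    --name kubernetes --query id -o tsv)
  az monitor metrics list \
    --resource "$lb_id" \
    --metrics AllocatedSnatPorts UsedSnatPorts \
    --aggregation Maximum \
    --interval 5m \
    -o table
}
```

Run show_snat_metrics and compare the two columns. Add --start-time and --end-time to focus on a specific peak-traffic window.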

⚠️ Do not skip this step. Increasing ports without confirming the root cause wastes resources and may not solve the actual problem.

Once confirmed, you need to take two actions:

  • Identify the current outbound rule configuration on the AKS load balancer.
  • Increase the allocated outbound ports and frontend IPs to handle more simultaneous connections.

Using Azure CLI to Fix SNAT Port Exhaustion in AKS

The following steps use Azure CLI to inspect and fix the outbound configuration on your AKS cluster’s load balancer. Follow each step in order.

Step 1: Get the Node Resource Group

First, you need to find the node resource group. In AKS, the underlying infrastructure — including VMs and load balancers — is managed in a separate, auto-generated resource group. Run this command to get its name:
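A minimal sketch of that command, reconstructed from the parameters listed below and wrapped in a hypothetical helper named get_node_rg. The values myResourceGroup and myAKSCluster are placeholders for your own resource group and cluster.

```shell
# Placeholders: replace with your own resource group and cluster name.
RESOURCE_GROUP="myResourceGroup"
CLUSTER_NAME="myAKSCluster"

# Hypothetical helper: prints the auto-generated node resource group
# (by default named MC_<resource-group>_<cluster>_<region>) as plain text.
get_node_rg() {
  az aks show \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --query nodeResourceGroup \
    -o tsv
}
```

Capture the output for the following steps with NODE_RG=$(get_node_rg).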

What each part does:

  • az aks show — retrieves details about your AKS cluster.
  • --resource-group myResourceGroup — the resource group where your AKS cluster lives.
  • --name myAKSCluster — the name of your AKS cluster.
  • --query nodeResourceGroup — extracts only the node resource group name from the response.
  • -o tsv — outputs the result as plain text, ready to use in the next command.

Step 2: List Current Outbound Rules

Next, inspect the current outbound rule configuration on your load balancer. This shows you exactly how many ports are currently allocated and how many frontend IPs are in use:
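A minimal sketch of the listing command, written as a hypothetical helper that takes the node resource group from Step 1 as its only argument:

```shell
# Hypothetical helper: lists the outbound rules on the AKS-managed load
# balancer (named "kubernetes") in the node resource group passed as $1.
list_outbound_rules() {
  local node_rg="${1:?usage: list_outbound_rules <node-resource-group>}"
  az network lb outbound-rule list \
    --resource-group "$node_rg" \
    --lb-name kubernetes \
    -o table
}
```

Call it as list_outbound_rules "$NODE_RG".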

What each part does:

  • az network lb outbound-rule list — lists all outbound rules for the specified load balancer.
  • --resource-group $NODE_RG — uses the node resource group from Step 1.
  • --lb-name kubernetes — targets the AKS-managed load balancer, which AKS names kubernetes.
  • -o table — formats the output as a readable table so you can easily spot the current port allocation.


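Review the output carefully. Look at the AllocatedOutboundPorts and FrontendIPConfigurations columns. An AllocatedOutboundPorts value of 0 means AKS uses automatic allocation based on backend pool size (for example, 1,024 ports per node for clusters with 50 or fewer nodes). If the allocated ports are low for the number of concurrent connections your app opens from each node, that confirms the problem.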

Step 3: Increase Outbound Ports and Frontend IPs

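Now that the problem is confirmed, fix it by increasing the allocated outbound ports and the number of frontend IPs. More frontend IPs mean a larger total pool of SNAT ports on the load balancer, so it can serve more nodes; the number of ports each node receives is set separately by the outbound ports value. Use az aks update with the parameters described below, and adjust the values to match your cluster name and resource group.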
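A minimal sketch of the update, wrapped in a hypothetical helper named set_outbound_capacity. The resource group and cluster names are placeholders, and the IP count and per-node port count are passed as arguments so you can size them for your own cluster.

```shell
# Placeholders: replace with your own resource group and cluster name.
RESOURCE_GROUP="myResourceGroup"
CLUSTER_NAME="myAKSCluster"

# Hypothetical helper: sets the managed outbound IP count ($1) and the
# SNAT ports allocated to each node ($2) on the cluster's load balancer.
set_outbound_capacity() {
  local ip_count="${1:?usage: set_outbound_capacity <ip-count> <ports-per-node>}"
  local ports="${2:?usage: set_outbound_capacity <ip-count> <ports-per-node>}"
  az aks update \
    --resource-group "$RESOURCE_GROUP" \
    --name "$CLUSTER_NAME" \
    --load-balancer-managed-outbound-ip-count "$ip_count" \
    --load-balancer-outbound-ports "$ports"
}
```

Running set_outbound_capacity 7 2000 applies the example values explained below.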

What each part does:

  • az aks update — updates the configuration of an existing AKS cluster.
  • --load-balancer-managed-outbound-ip-count 7 — sets the number of AKS-managed outbound public IPs to 7. Each IP adds up to 64,000 SNAT ports to the load balancer's total pool.
  • --load-balancer-outbound-ports 2000 — allocates 2,000 SNAT ports to each node. The value must be a multiple of 8 and no more than 64,000; adjust it based on how many concurrent outbound connections each node needs.

📊 Example calculation: 7 IPs × 64,000 ports per IP = 448,000 total SNAT ports, compared with 64,000 for the single managed outbound IP that AKS creates by default.
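The same arithmetic can be checked with plain shell arithmetic, extended one step to show how many nodes that total can serve when each node reserves 2,000 ports. The variable names are illustrative.

```shell
# Each frontend (outbound) IP contributes up to 64,000 SNAT ports.
PORTS_PER_IP=64000
ip_count=7
ports_per_node=2000

total_ports=$(( ip_count * PORTS_PER_IP ))
# Every node reserves ports_per_node ports, so this is the largest node
# count (including surge nodes during upgrades) the configuration can serve.
max_nodes=$(( total_ports / ports_per_node ))

echo "total SNAT ports: $total_ports"   # prints 448000
echo "max nodes: $max_nodes"            # prints 224
```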

For a real production cluster, replace the placeholder resource group, cluster name, IP count, and port count with values that fit your own cluster.

Step 4: Verify the Fix

After applying the update, verify the new configuration by running the outbound rule list command again:
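To make that check quicker, a variant of the Step 2 command can select only the relevant fields with a JMESPath --query. This sketch assumes the CLI's JSON output names the properties allocatedOutboundPorts and frontendIpConfigurations; if the columns come back empty or the query errors, inspect the raw output with -o json. The helper name is hypothetical.

```shell
# Hypothetical helper: shows each outbound rule's per-node port allocation
# and how many frontend IP configurations it uses. Pass the node resource group as $1.
show_outbound_capacity() {
  local node_rg="${1:?usage: show_outbound_capacity <node-resource-group>}"
  az network lb outbound-rule list \
    --resource-group "$node_rg" \
    --lb-name kubernetes \
    --query "[].{Name:name, AllocatedOutboundPorts:allocatedOutboundPorts, FrontendIPs:length(frontendIpConfigurations)}" \
    -o table
}
```

Call it as show_outbound_capacity "$NODE_RG". AllocatedOutboundPorts should show your new per-node value (2000 in the example), and FrontendIPs should reflect the new IP count.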

Confirm that the AllocatedOutboundPorts and FrontendIPConfigurations columns now reflect your new values. In addition, monitor your Azure Load Balancer metrics over the next 30–60 minutes. Specifically, watch the Used SNAT Ports metric — it should stay well below your new limit.

How to Choose the Right Port Count

Choosing the right values depends on your cluster size and traffic pattern. Here is a simple way to calculate what you need:

  • Maximum ports per node = Total SNAT ports ÷ Number of nodes. Count any extra nodes added during upgrades (maxSurge) as well. For example, 7 IPs × 64,000 = 448,000 ports ÷ 10 nodes = 44,800 ports per node at most.
  • Start conservative — set --load-balancer-outbound-ports to 1,000 or 2,000 per node and monitor the metrics.
  • Scale up gradually — if Used SNAT Ports still approaches the limit under peak load, increase the IP count or port count further.

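⚠️ Important: Every node reserves its full --load-balancer-outbound-ports allocation, so setting the value too high reduces the number of nodes the load balancer can serve. If a scale-out or upgrade needs more ports than the frontend IPs provide, the operation can fail. Always test in a staging environment before applying to production.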
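The per-node formula above can be turned into a small helper that also applies Azure's constraints on the value: it must be a multiple of 8 and cannot exceed 64,000. This is a sketch; the name max_ports_per_node is hypothetical, and the node count you pass should already include any surge nodes used during upgrades.

```shell
# Hypothetical helper: prints the largest --load-balancer-outbound-ports value
# that fits every node, rounded down to a multiple of 8 and capped at 64,000.
# $1 = managed outbound IP count, $2 = node count including upgrade surge nodes.
max_ports_per_node() {
  local ips="$1" nodes="$2"
  local ports=$(( ips * 64000 / nodes ))
  # The value must be a multiple of 8, so round down.
  ports=$(( ports - ports % 8 ))
  # A single node can never be allocated more than 64,000 ports.
  if [ "$ports" -gt 64000 ]; then
    ports=64000
  fi
  echo "$ports"
}
```

For example, with 7 IPs and 10 nodes plus one surge node, max_ports_per_node 7 11 prints 40720. Any value at or below that, such as the conservative 2,000, leaves room for every node.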

Conclusion

For cloud engineers and DevOps teams, managing SNAT port exhaustion in AKS is a critical part of keeping applications reliable under load. By using simple Azure CLI commands, you can identify your current outbound rule configuration, confirm whether exhaustion is occurring, and increase allocated ports and frontend IPs to fix it.

In summary, the fix involves three steps: get the node resource group, inspect the outbound rules, and update the load balancer configuration. Adding more managed outbound IPs is the most scalable option because it grows the total pool of SNAT ports. With 7 IPs, for example, the load balancer has 448,000 SNAT ports to share, enough for 224 nodes at 2,000 ports each.

Most importantly, always monitor your Used SNAT Ports metric in Azure Monitor after applying changes. That way, you catch any future exhaustion early, before it impacts your users.

Quick Reference: Commands Used in This Guide

  • Get node resource group: az aks show ... --query nodeResourceGroup
  • List outbound rules: az network lb outbound-rule list ... -o table
  • Increase outbound ports and IPs: az aks update ... --load-balancer-managed-outbound-ip-count 7 --load-balancer-outbound-ports 2000
  • Verify the fix: Re-run the outbound rule list and check Azure Monitor metrics.