What is a DoS Attack?
A Denial-of-Service (DoS) attack is meant to shut down a machine or network, making it inaccessible to its intended users. DoS attacks accomplish this by flooding the target with traffic or by sending it information that triggers a crash. There is often no direct financial gain to be made from this sort of activity. While the attacker could be hired or paid to perform the attack, their motivation could simply be to disrupt legitimate users who rely on that web application or service.
Keep reading for everything you need to know about denial of service, including how it impacts Kubernetes, how to identify indicators of compromise, and how to use cloud-native tools to prevent it.
What does a DoS Attack look like in Kubernetes?
DoS is often considered to be something external to a Kubernetes cluster (i.e., a threat coming from external IP addresses). However, some vulnerabilities cause internal DoS, and not every DoS incident is intentional. In some cases, a misconfigured deployment could send a large number of bad requests to your API, which increases the load on your nodes.
Kubernetes changes how a DoS attack plays out. Let’s say you’re running an auto-scaling service on a cloud platform like AWS. In principle, this is hardened against DoS attacks, since the service will ensure additional resources are assigned to handle the new traffic. (Automatically scaling up for unexpectedly large volumes of traffic is one of the core reasons people choose Kubernetes as a container workload orchestrator.) However, there are specific issues with this design.
If you’ve got capacity to spare, your workloads should be able to scale up in response to a spike in traffic. If you do not have additional capacity to offer (let’s say you’re running your cloud-native workloads in an on-premises datacenter), your systems are likely to reach maximum load and crash.
Assuming you’re using something like the Cluster Autoscaler, you can provision new nodes when your capacity starts to fill up. This causes a separate issue: paying for unwanted resources just to keep your application up and available. Spinning up new nodes will increase your overall AWS spending. That’s why it’s important to have hard limits and enforcement policies in place to prevent your AWS bill from spiraling out of control.
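As a minimal sketch of what such hard limits could look like, the manifests below cap how far a single workload can scale and how much a namespace can request in total. The names (web, prod) and every numeric value are hypothetical placeholders rather than recommendations.

```yaml
# Hypothetical example: cap pod-level scaling for one Deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # assumed name
  namespace: prod          # assumed namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # assumed Deployment
  minReplicas: 2
  maxReplicas: 10          # hard ceiling: a traffic flood cannot scale past this
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
# Hypothetical example: cap the total resources the namespace may request.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: prod-quota
  namespace: prod
spec:
  hard:
    pods: "50"
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
```

With a ceiling on replicas and a namespace quota in place, a traffic spike degrades the service once the limit is reached instead of provisioning capacity without bound. The maximum node count is usually capped separately, in the configuration of the node group or autoscaler itself.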
How to identify signs of a DoS Attack in Kubernetes?
A good place to start is with the Container Network Interface (CNI) plugin in Kubernetes. There are several different CNI plugins, such as Project Calico, Cilium, and Weave Net, to name a few. The CNI plugin is responsible for handling all of the network traffic (north/south or east/west) to and from containerized workloads. Without a CNI plugin, there is no pod network connectivity in Kubernetes.
You can usually view the activity in iptables, or trace the traffic requests using eBPF. However, to better understand the volume of traffic, we should send the data to a monitoring stack such as Prometheus (for metrics collection and alerting) and Grafana (for dashboards) for correlation and reporting. Alternatively, you can use Sysdig Secure to understand whether the Kubernetes API server is receiving a large volume of inbound traffic. It also correlates this with metadata, such as the container name or the specific threshold that was crossed.
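If Prometheus is already scraping the Kubernetes API server, an alerting rule is one way to surface an unusual request volume. The sketch below is a plain Prometheus rule file that relies on the apiserver_request_total metric exposed by the API server. The alert name, the threshold of 500 requests per second, and the durations are assumptions that you would replace with values based on your own baseline.

```yaml
# Hypothetical Prometheus alerting rule; threshold and names are placeholders.
groups:
  - name: dos-indicators
    rules:
      - alert: APIServerRequestSpike
        # Total API server request rate, averaged over the last 5 minutes.
        expr: sum(rate(apiserver_request_total[5m])) > 500
        # Only fire if the condition holds continuously for 10 minutes.
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Kubernetes API server request rate is unusually high"
```

A sustained threshold like this catches floods but not slow resource exhaustion, so it works best alongside the node and pod saturation metrics you already graph.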
How to prepare for a DoS Attack in Kubernetes?
1. Configure rate-limiting for your workloads
In a Kubernetes environment, rate limiting is traditionally applied at the ingress layer, where it restricts the number of requests that an external user can make into the cluster. Users should also consider applying rate limits between the microservice workloads running inside the cluster. Where applicable, you can configure rate limiting in your service mesh or ingress controllers to prevent unusual spikes in connections between workloads that could cause a DoS incident. This should prevent any single node from using up too much bandwidth within a cluster.
For rate limits, Traefik Proxy can define the limits in a generic way:
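apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: prod-rate-limit
spec:
  rateLimit:
    # average: sustained rate allowed per source (requests per second by default)
    average: 30
    # burst: maximum number of requests allowed through in a short spike
    burst: 50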
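A Middleware only takes effect once a route references it. As a rough sketch, an IngressRoute in the same namespace could attach prod-rate-limit as shown below. The route name, hostname, entry point, and backend Service are all hypothetical, and newer Traefik releases use the traefik.io/v1alpha1 API group instead of traefik.containo.us, so check which one your installation serves.

```yaml
# Hypothetical IngressRoute that applies the rate-limit Middleware to one host.
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: prod-web                    # assumed name
spec:
  entryPoints:
    - websecure                     # assumed entry point name
  routes:
    - match: Host(`app.example.com`)   # assumed hostname
      kind: Rule
      middlewares:
        - name: prod-rate-limit     # the Middleware defined above
      services:
        - name: web                 # assumed backend Service
          port: 80
```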
2. Run load testing to understand how well your application scales.
You’ll want to set up your application on a staging cluster and hit it with traffic. It’s a bit harder to test against Distributed DoS (DDoS) attacks, where traffic could be coming from many different IPs at once, but there are a few services out there to help with this.
Use Speedscale to intercept the calls and understand which services are involved in the connections. If you don’t have Speedscale already, you can sign up for a free trial and download speedctl:
sh -c "$(curl -Lfs https://downloads.speedscale.com/speedctl/install)"
3. Make use of eXpress Data Path (XDP) in either Calico or Cilium’s NetworkPolicy frameworks
XDP allows users to drop potentially malicious packets at the earliest possible point in the packet processing pipeline, which makes it well suited to DoS/DDoS protection. Some NIC (Network Interface Card) hardware and drivers also support XDP offload. XDP provides bare-metal packet processing at the lowest point in the software stack, which makes it ideal for speed without compromising programmability.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: dos-mitigation
spec:
  selector: apply-dos-mitigation == 'true'
  doNotTrack: true
  applyOnForward: true
  types:
  - Ingress
  ingress:
  - action: Deny
    source:
      selector: dos-deny-list == 'true'
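The policy above denies traffic from any source labeled dos-deny-list == 'true', so something has to carry that label. One way to supply the sources, sketched here, is a Calico GlobalNetworkSet. The resource name is arbitrary and the CIDRs are reserved documentation ranges standing in for real attacker addresses.

```yaml
# Hypothetical deny list; the CIDRs are documentation ranges, not real threats.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkSet
metadata:
  name: dos-deny-list
  labels:
    dos-deny-list: 'true'     # matched by the policy's source selector
spec:
  nets:
  - 198.51.100.0/24
  - 203.0.113.7/32
```

Updating the nets list changes which sources are dropped without touching the policy itself. The policy's own selector (apply-dos-mitigation == 'true') still needs to match the endpoints you want to protect.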
Additionally, you can use network policies to restrict which pods can talk to one another. If an attacker gains access to a particular pod, but is isolated from the rest of your cluster, they won’t be able to perform lateral movement.
Outsider threats
If you have public-facing apps, there’s a real possibility that outsiders will find ways to exploit your application logic. Even if your applications are air-gapped and don’t face the public internet, all it takes is a misconfigured network policy in your cloud to suddenly open up a whole new attack vector internally.
Unfortunately, Kubernetes has no ability to fix your misconfigured code. However, we can use cloud-native technologies to mitigate the damage caused by a security vulnerability in your application. There are a few ways to make sure one compromised component doesn’t lead to a full-scale DDoS attack:
Network policies
Use network policies to restrict which pods can talk to one another. If an attacker gains access to a particular pod, but is isolated from the rest of your cluster, they won’t be able to advance any further.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-app-policy
spec:
  namespaceSelector: has(projectcalico.org/name) && projectcalico.org/name not in {"kube-system", "calico-system", "calico-apiserver"}
  types:
  - Ingress
  - Egress
  egress:
  # allow all namespaces to communicate to DNS pods
  - action: Allow
    protocol: UDP
    destination:
      selector: 'k8s-app == "kube-dns"'
      ports:
      - 53
The above example applies a default-deny behavior to all non-system pods. The policy also allows access to kube-dns, which simplifies per-pod policies since you don’t need to duplicate the DNS rules in every policy.
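With a default-deny baseline in place, each application then needs explicit allow rules for the traffic it legitimately uses. The following sketch uses standard Kubernetes NetworkPolicy resources to let pods labeled app: frontend reach pods labeled app: backend on TCP port 8080. The namespace, labels, and port are hypothetical. Because the default-deny policy covers both directions, the sketch allows ingress on the backend and egress on the frontend.

```yaml
# Hypothetical per-application allow rules layered on top of default-deny.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: prod                 # assumed namespace
spec:
  podSelector:
    matchLabels:
      app: backend                # policy applies to backend pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend       # only frontend pods may connect
      ports:
        - protocol: TCP
          port: 8080
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-egress-to-backend
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: frontend               # policy applies to frontend pods
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: backend
      ports:
        - protocol: TCP
          port: 8080
```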
Audit the security configuration of each of your workloads.
Tools like Falco will check to make sure the users are not shelling into containers as root, that they don’t have access to restricted host networks, and for other potentially insecure configurations.
- macro: container
  condition: container.id != host
- macro: spawned_process
  condition: evt.type = execve and evt.dir=<
- rule: run_shell_in_container
  desc: a shell was spawned by a non-shell program in a container. Container entrypoints are excluded.
  condition: container and proc.name = bash and spawned_process and proc.pname exists and not proc.pname in (bash, docker)
  output: "Shell spawned in a container other than entrypoint (user=%user.name container_id=%container.id container_name=%container.name shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline)"
  priority: WARNING
Build your images using the scratch base image.
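A scratch-based image contains little beyond your application binary: no shell, package manager, or system utilities. The attacker’s main goal is to operate through a vulnerable image, so if there is nothing in the image they can use to run commands or make further system calls, they are stumped.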
As long as vulnerable images are not running in your pipeline, the attacker cannot utilize those images. One way to ensure only safe images are admitted into production is via an admission control policy engine like Open Policy Agent (OPA). In the following example, any Pod whose container images do not come from the trusted hooli.com registry is denied:
package kubernetes.admission

import future.keywords

deny contains msg if {
    input.request.kind.kind == "Pod"
    some container in input.request.object.spec.containers
    image := container.image
    not startswith(image, "hooli.com/")
    msg := sprintf("image '%s' comes from untrusted registry", [image])
}
Insider threats
As mentioned above, not all threats originate outside your organization.
We mentioned a simple misconfiguration, but it’s also possible that an irate employee with SSH access to your cluster could exfiltrate data or deliberately cause an outage. Without the right controls in place, former employees may still have access to your control plane nodes via SSH keys. Whether you’re a large enterprise or a small tech shop, it’s important to take steps to prevent insider threats and to establish good security practices from the very beginning. There’s no better place to start than Role-Based Access Controls (RBAC) within Kubernetes.
Role-Based Access Controls
The most important way to prevent insider threats is to avoid sharing credentials. Each user should have their own ServiceAccount (essentially a user profile in Kubernetes). We can then use RBAC to limit the scope of permissions for a ServiceAccount in Kubernetes. The worst-case scenario is every user sharing the same ServiceAccount credentials to interact with your cluster. In that case, we cannot tell who made a given change to the cluster when investigating an insider threat. Similarly, when an employee leaves the company, we cannot deactivate their privileges without disrupting the entire organization, since a new ServiceAccount would need to be configured for the whole team. If we want to audit who did what, so that no action in the cluster is anonymous, we need to set boundaries for each user.
Once you have separate credentials provisioned for each individual interacting with the cluster, it’s important to adhere to the principle of least privilege. Kubernetes has built-in RBAC to help you manage who can do what. Grant each individual access only to the resources and operations needed to perform their job. This helps not only in the case of a nefarious employee; it also helps prevent genuine mistakes.
Managed Kubernetes services, like Google’s GKE and AWS’s EKS, tend to come with pre-packaged Identity and Access Management (IAM) profiles and Kubernetes credentials. Open source projects like rbac-manager can also help you keep your RBAC configurations simple and manageable. Here’s an example Role in the “default” namespace that can be used to grant read access to pods:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
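A Role grants nothing until it is bound to a subject. As a minimal sketch, the manifests below create a dedicated ServiceAccount and bind the pod-reader Role to it in the default namespace. The account name jane and the binding name read-pods are hypothetical.

```yaml
# Hypothetical per-user identity; one ServiceAccount per person, never shared.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jane                  # assumed name
  namespace: default
---
# Bind the pod-reader Role to that single identity.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods             # assumed name
  namespace: default
subjects:
- kind: ServiceAccount
  name: jane
  namespace: default
roleRef:
  kind: Role
  name: pod-reader            # the Role defined above
  apiGroup: rbac.authorization.k8s.io
```

Revoking access then comes down to deleting one RoleBinding or ServiceAccount, which does not affect anyone else on the team.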
Best practices with Falco
While all of these open-source tools can be used together to prevent DoS attacks in Kubernetes, it’s important that we always have rules configured for Digital Forensics and Incident Response (DFIR). Falco can identify the signs of compromise that let users take action against potential DoS attacks.
- Look for unexpected outbound connections from a compromised workload:
Falco can observe outbound network traffic from elasticsearch on a port other than the standard ports used by Elasticsearch (as an example):
- macro: outbound
  condition: syscall.type=connect and evt.dir=< and (fd.typechar=4 or fd.typechar=6)
- macro: elasticsearch_cluster_port
  condition: fd.sport=9300
- rule: elasticsearch_unexpected_network_outbound
  desc: outbound network traffic from elasticsearch on a port other than the standard ports
  condition: user.name = elasticsearch and outbound and not elasticsearch_cluster_port
  output: "Outbound network traffic from Elasticsearch on unexpected port (connection=%fd.name)"
  priority: WARNING
- Observe outbound connection attempts from restricted Namespaces:
We can monitor outbound connections to internal networks (RFC 1918), which is relevant in the majority of enterprise cases. We can do this by defining our own macro and rule:
- macro: outbound_corp
  condition: >
    (((evt.type = connect and evt.dir=<) or
    (evt.type in (sendto,sendmsg) and evt.dir=< and
    fd.l4proto != tcp and fd.connected=false and fd.name_changed=true)) and
    (fd.typechar = 4 or fd.typechar = 6) and
    (fd.ip != "0.0.0.0" and fd.net != "127.0.0.0/8") and
    (evt.rawres >= 0 or evt.res = EINPROGRESS))
- list: k8s_not_monitored
  items: ['"green"', '"blue"']
- rule: kubernetes outbound connection
  desc: A pod in namespace attempted to connect to the outer world
  condition: outbound_corp and k8s.ns.name != "" and not k8s.ns.label.network in (k8s_not_monitored)
  output: "Outbound network traffic connection from a Pod: (pod=%k8s.pod.name namespace=%k8s.ns.name srcip=%fd.cip dstip=%fd.sip dstport=%fd.sport proto=%fd.l4proto procname=%proc.name)"
  priority: WARNING
- Specify trusted vs. untrusted domains for outbound connections:
In the example below, we have defined a list of trusted domain names (sysdig.es, github.com, and google.com). Any network connection to an IP address that none of these domain names resolves to will trigger the rule. This is great for telling Falco what’s allowed and being alerted about everything else.
- list: trusted_domains
  items: [sysdig.es, github.com, google.com]
- rule: Unexpected outbound network connection
  desc: Detect outbound connections with destinations not on allowed list
  condition: >
    outbound
    and not (fd.sip.name in (trusted_domains))
  output: Unexpected Outbound Connection
    (container=%container.name
    command=%proc.cmdline
    procpname=%proc.pname
    connection=%fd.name
    servername=%fd.sip.name
    serverip=%fd.sip
    type=%fd.type
    typechar=%fd.typechar
    fdlocal=%fd.lip
    fdremote=%fd.rip)
  priority: NOTICE
Conclusion
Cloud-native and Kubernetes are built on a platform of transparency and open source. We hope that the above examples give you an idea of the possible vectors of attack that can lead to a DoS incident, as well as the free, open source tools that can be used to mitigate a DoS attack.
DoS Mitigation is not one size fits all. Thus, you must continuously monitor for risks across the container lifecycle, while also ensuring:
- Your network policies apply a zero-trust architecture.
- Vulnerability databases stay up-to-date to detect known vulnerabilities.
- Configurations get scanned at the earliest point in your CI/CD pipeline to ensure that you continue to adhere to best practices for container security as your environment evolves.
As for rate limiting, this is a MUST DO. You can apply rate limits at your ingress controllers as well as resource limits within the container. To avoid excess spending with your cloud provider, you can configure budgets and billing alerts on your AWS account, as an example.
Aside from mitigation, it’s also important to be able to identify the patterns or behaviors leading to the breach. Falco can alert users on connections to known bad IPs such as C2 servers, connections to unwarranted IPs/FQDNs, as well as connections made on ports or protocols that deviate from the baseline design of our application architecture. Using each of these technologies, you should have a good understanding of how a DoS attack can be mitigated with cloud-native tools.