Autonomous Container Scaling in Kubernetes via Reinforcement Learning

Authors

  • Imran Qureshi, Independent Researcher, Jinnah Colony, Faisalabad, Pakistan (PK) – 38000

DOI:

https://doi.org/10.63345/1mymsn26

Keywords:

Autonomous Container Scaling, Kubernetes, Reinforcement Learning, Deep Q-Learning, Autoscaling Performance, Service-Level Objectives

Abstract

Autonomous container scaling within Kubernetes environments has emerged as a crucial mechanism to guarantee both application performance and cost‐effective resource utilization under highly dynamic workloads. Traditional autoscaling solutions—most notably Kubernetes’s native Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA)—operate on static threshold‐based rules that monitor CPU and memory utilization. While straightforward to configure, these mechanisms frequently underperform in the presence of bursty traffic patterns or sudden workload shifts, resulting in oscillatory scaling behavior, frequent SLA (Service Level Agreement) violations, and unnecessary overprovisioning. Reinforcement Learning (RL), by contrast, offers a data‐driven, adaptive approach: an RL agent continuously interacts with the cluster environment, observes multidimensional system metrics, and learns an optimal scaling policy through trial and error, balancing performance objectives against resource costs. In this work, we present the design, implementation, and experimental evaluation of the “Multidimensional Pod Autoscaler” (MPA), a Deep Q‐Learning–based autoscaler integrated into Kubernetes as a custom controller. MPA’s state representation comprises percentile‐based CPU and memory metrics, request arrival rates, error rates, and current replica counts. Its action space supports both horizontal scaling (incrementing or decrementing pod replicas) and vertical adjustments (tuning CPU/memory limits), plus a no‐op option for stability. The reward function penalizes SLA breaches—defined as requests exceeding a 200 ms latency threshold—and resource overprovisioning, weighted to reflect business priorities.
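
To make the formulation concrete, the following minimal Python sketch encodes a state vector, a discrete action set, and a reward of the shape described above. Only the 200 ms latency threshold and the state/action dimensions come from the abstract; the metric keys, the weights W_SLA and W_COST, and the normalization are illustrative assumptions, not the authors' implementation.

    import numpy as np

    LATENCY_SLO_MS = 200.0    # SLA threshold stated in the abstract
    W_SLA, W_COST = 1.0, 0.3  # assumed weights reflecting business priorities

    # Discrete action set: horizontal +/- one replica, vertical CPU/memory
    # steps, and a no-op for stability, as described above.
    ACTIONS = ("scale_out", "scale_in", "cpu_up", "cpu_down",
               "mem_up", "mem_down", "noop")

    def build_state(m: dict) -> np.ndarray:
        """Assemble the multidimensional state vector: percentile CPU/memory,
        request arrival rate, error rate, and current replica count."""
        return np.array([m["cpu_p95"],     # 95th-percentile CPU utilization, 0..1
                         m["mem_p95"],     # 95th-percentile memory utilization, 0..1
                         m["req_rate"],    # normalized request arrival rate
                         m["error_rate"],  # fraction of failed requests
                         m["replicas"]],   # normalized replica count
                        dtype=np.float32)

    def reward(frac_over_slo: float, provisioned: float, used: float) -> float:
        """Negative reward: weighted fraction of requests breaching the 200 ms
        SLO plus weighted overprovisioned capacity (e.g., idle CPU cores)."""
        return -(W_SLA * frac_over_slo + W_COST * max(provisioned - used, 0.0))

A Deep Q-Network trained against this interface simply selects the index into ACTIONS that maximizes the predicted Q-value for the current state vector.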

We trained MPA offline on historical workload traces and then deployed it for online fine‐tuning under live traffic, comparing its performance against HPA and a heuristics‐driven Smart HPA. Experiments using both the Bookinfo microservices benchmark and a synthetic Poisson‐arrival workload generator demonstrate that MPA can increase average CPU utilization from 65% to 85%, reduce 99th‐percentile request latency by 40%, cut SLA violation rates from 5% to 1%, and achieve a 25% reduction in cloud resource costs. We provide a statistical analysis table summarizing these gains. This manuscript details the full system architecture, state and action definitions, neural network design, training methodology, and deployment strategy. We discuss practical considerations—such as safe exploration policies, integration with Prometheus metrics, and fallback mechanisms—and conclude with an in‐depth look at future research directions, including multi‐agent RL, meta‐RL transfer learning, workload forecasting integration, explainability, and extension to GPU‐aware and edge‐cloud scenarios.
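
The deployment considerations above (Prometheus integration, safe exploration, fallback mechanisms) might look like the following sketch in practice: an instant-query helper against Prometheus's HTTP API plus a bounded horizontal action applied through the official Kubernetes Python client. The Prometheus address, target Deployment, and replica bounds are hypothetical placeholders; only the general pattern is implied by the abstract.

    import requests
    from kubernetes import client, config

    PROM_URL = "http://prometheus:9090/api/v1/query"  # assumed in-cluster address
    NAMESPACE, DEPLOYMENT = "default", "productpage"  # hypothetical Bookinfo target
    MIN_REPLICAS, MAX_REPLICAS = 1, 20                # safety bounds on exploration

    def prom_scalar(query: str) -> float:
        """Run an instant PromQL query against Prometheus's HTTP API and return
        the first sample's value (0.0 if the result set is empty)."""
        resp = requests.get(PROM_URL, params={"query": query}, timeout=5)
        resp.raise_for_status()
        result = resp.json()["data"]["result"]
        return float(result[0]["value"][1]) if result else 0.0

    def apply_horizontal_action(delta: int) -> None:
        """Clamp the desired replica count into safe bounds before patching the
        Deployment's scale subresource; out-of-bounds requests become a no-op."""
        config.load_incluster_config()
        apps = client.AppsV1Api()
        scale = apps.read_namespaced_deployment_scale(DEPLOYMENT, NAMESPACE)
        desired = max(MIN_REPLICAS, min(MAX_REPLICAS, scale.spec.replicas + delta))
        if desired != scale.spec.replicas:
            scale.spec.replicas = desired
            apps.patch_namespaced_deployment_scale(DEPLOYMENT, NAMESPACE, scale)

Bounding actions this way is one simple form of the safe exploration the abstract mentions: the agent can never scale outside operator-approved limits, and a no-op remains available as a fallback.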

Published

2025-03-01

Section

Original Research Articles

How to Cite

Autonomous Container Scaling in Kubernetes via Reinforcement Learning. (2025). World Journal of Future Technologies in Computer Science and Engineering (WJFTCSE), 1(1), 1–9. https://doi.org/10.63345/1mymsn26
