Kubecost with AWS integration: Implementing and automating with Terraform

In this article, we'll take a look at what Kubecost is capable of and how it integrates with AWS. We'll also cover a case study in which we helped our customer take advantage of these features and used Terraform to deploy everything they needed automatically.

The need for cost monitoring

Managing infrastructure costs is an essential part of workflows in today's IT companies. With infrastructure costs on the rise, businesses need to see which infrastructure components contribute most to their spending. On top of that, they need to track how their user base grows so they can adjust budgets accordingly. The additional level of abstraction that Kubernetes clusters introduce makes this task even more challenging.

So how does one go about addressing this problem? For starters, cloud providers offer their own cost calculation services. According to the CNCF FinOps survey published at the end of 2023, AWS Cost Explorer, GCP Cost Tools, and Azure Cost Management ranked as the top-choice Kubernetes cost monitoring tools.

In practice, however, cloud providers often fail to provide in-depth cost analysis even for the Kubernetes clusters they manage themselves. For example, AWS Billing allows you to calculate cluster costs for EC2 instances but not for EKS cluster entities. To make matters worse, the data is only made available after a 24-hour delay.

Thus, third-party offerings have emerged on the market to address this problem effectively. One of the most prominent is Kubecost (even AWS recommends it).

Kubecost features overview

Kubecost is a Kubernetes cluster cost management tool capable of tracking application and infrastructure costs in real time. It allows you to calculate costs per namespace, deployment, and even individual container, with the bottom line broken down by CPU/memory/disk. Its network cost calculations may come in handy as well.

On top of featuring integration with major cloud providers (AWS, Azure, GCP, and a number of others), Kubecost also features cost management for on-premises K8s clusters — allowing you to manually enter the pricing model to use (note: this is a paid option in the Enterprise edition). In the case of AWS, you can also calculate the cost of resources that are not part of a K8s cluster.

In addition to directly monitoring resource costs, Kubecost features a wide array of reports on how to optimize them: from picking the best requests/limits in Kubernetes to choosing which types of instances to use. You can even roughly calculate the cost of resources that are not yet running in the cluster using the kubectl-cost plugin.

There is a public demo available to evaluate Kubecost’s functionality without having to install it.

In the Enterprise version of Kubecost, along with support and the ability to use more than 250 CPUs in a cluster, customers also get to enjoy such handy features as saved reports, auditing, multi-cloud setups, native SAML/OIDC, and Kubecost Cloud (SaaS with Web UI and data storage with unlimited retention).

There is also an open-source counterpart to Kubecost: OpenCost, a CNCF Sandbox project whose source code is distributed under the Apache 2.0 license.

Our practical Kubecost & AWS use case

One of our customers decided on Kubecost for comprehensive cost monitoring of their AWS-hosted Kubernetes clusters. Where does Kubecost get cost data for a given cloud provider? Essentially, it queries the provider's API and applies the results to cluster metrics. For further technical details on this process, refer to the product documentation: Cloud Billing Integrations and AWS Cloud Billing Integration.

Now, let’s look at our real-life case to see how this plays out in practice. We had:

  • multiple AWS EKS clusters running in different regions;
  • spot instances that were actively used in those clusters;
  • Karpenter, an Open Source node autoscaler for Kubernetes;
  • a wide variety of server software running in the clusters;
  • Terraform, which was actively used to manage resources.

What was our customer aiming for in adopting Kubecost?

  1. The application heavily used spot instances in multiple dev environments within the same region, in addition to multiple prod environments in different regions. The customer wanted to calculate the costs related to various application and infrastructure components as well as networking.
  2. For more accurate cost estimation, the additional components for integrating Kubecost with AWS had to be installed in a fully automated fashion. Those components included: a) a Cost and Usage Report; b) Amazon Athena; c) a Spot Instance data feed; d) S3 buckets and IAM policies.

So, how would we go about accomplishing those goals? Let’s dive in!

Implementation

First of all, we built two Terraform modules to automate component deployment:

  1. aws-kubecost-iam handled the IAM configuration for each of the EKS clusters;
  2. aws-kubecost-athena created the necessary AWS resources (buckets + CUR + Spot feed + Athena) within a single account/region. These resources were shared across multiple (dev) clusters. A hypothetical wiring sketch for both modules is shown right after this list.
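
Here is a hypothetical sketch of how the two modules could be wired up in a root module. The source paths, variable names, and values below are illustrative, not the exact interface of our modules:

```hcl
# Illustrative root-module wiring; variable names are assumptions.
provider "aws" {
  region = "eu-west-1"
}

# CUR definitions only exist in us-east-1, so the second module
# receives a dedicated provider alias (more on that below).
provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

module "kubecost_iam" {
  source = "./modules/aws-kubecost-iam"

  cluster_name    = "dev-eks"
  eks_oidc_issuer = module.eks.oidc_provider # issuer URL without https://
}

module "kubecost_athena" {
  source = "./modules/aws-kubecost-athena"

  # The module must declare this alias via configuration_aliases.
  providers = {
    aws.us_east_1 = aws.us_east_1
  }

  bucket_prefix = "kubecost"
}
```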

Finally, we used Kubecost’s official Helm chart to deploy it.

Let’s have a look at all these steps in detail.

1. Terraform module #1: aws-kubecost-iam

Let’s start with the Terraform modules. Their source code is available in our examples GitHub repository.

The first module (find its source code here) targets EKS: permissions are granted to the Kubernetes service account via sts:AssumeRoleWithWebIdentity and the cluster's eks_oidc_issuer. Kubecost, for its part, only needs the role ARN set in the service account's eks.amazonaws.com/role-arn annotation. Please refer to the EKS documentation to learn more about how this works.
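
For illustration, here is a minimal sketch of what such a trust relationship looks like in Terraform. The role name, namespace, and service account name are assumptions:

```hcl
variable "eks_oidc_provider_arn" {
  type = string
}

# Issuer URL without the https:// prefix,
# e.g. oidc.eks.eu-west-1.amazonaws.com/id/ABC123
variable "eks_oidc_issuer" {
  type = string
}

# A simplified IRSA trust policy sketch; names are illustrative.
data "aws_iam_policy_document" "kubecost_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [var.eks_oidc_provider_arn]
    }

    # Only the Kubecost service account may assume this role.
    condition {
      test     = "StringEquals"
      variable = "${var.eks_oidc_issuer}:sub"
      values   = ["system:serviceaccount:kubecost:kubecost-cost-analyzer"]
    }
  }
}

resource "aws_iam_role" "kubecost" {
  name               = "kubecost"
  assume_role_policy = data.aws_iam_policy_document.kubecost_trust.json
}
```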

If you don’t use EKS, the role/policies can be applied via the EC2 instance profile of the Kubernetes nodes. You can also use kube2iam with iam.amazonaws.com/role annotations on Kubecost Pods for more granular role assignment.

2. Terraform module #2: aws-kubecost-athena and AWS resources

The second module (find source code here) creates the following AWS resources:

1. AWS Cost and Usage Report (CUR): A daily cloud infrastructure cost report. Kubecost uses it to acquire more accurate information and analyze objects that are outside of the K8s clusters. For more information, see the AWS documentation.

The curious thing about this resource is that it only exists in the us-east-1 region, so you need to pass the appropriate Terraform provider to the module for it. Even though it is tied to a specific region, CUR can analyze costs and upload the results to a bucket in any region.
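
As a sketch, the CUR definition could look as follows. It assumes the feed_cur_bucket resource defined later in the module and an eu-west-1 report bucket; both names and regions are illustrative:

```hcl
# CUR definitions can only be created in us-east-1, hence the alias.
resource "aws_cur_report_definition" "kubecost" {
  provider = aws.us_east_1

  report_name                = "kubecost-cur"
  time_unit                  = "DAILY"
  additional_schema_elements = ["RESOURCES"]

  # The ATHENA artifact requires Parquet and OVERWRITE_REPORT versioning.
  format               = "Parquet"
  compression          = "Parquet"
  additional_artifacts = ["ATHENA"]
  report_versioning    = "OVERWRITE_REPORT"

  # The report bucket itself may live in any region.
  s3_bucket = aws_s3_bucket.feed_cur_bucket.id
  s3_region = "eu-west-1"
  s3_prefix = "cur"
}
```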

Note that you must enable Athena when configuring CUR for Kubecost. In that case, along with the cost information, the crawler-cfn.yml CloudFormation template will be added to the bucket as well (Kubecost recommends using it to create the Athena resources). The tricky part here is that this template, as well as all CUR data, only becomes available in the bucket after a day on average, while we would like to run Terraform immediately…

So what could be done in this case? The solution was simple: we vendored the template into the module. Keep in mind, however, that AWS actively reshapes CUR, so there may be changes in the future. Currently, we have the bucket permissions configured for two services: billingreports and bcm-data-exports.
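
Here is a rough Terraform sketch of that bucket policy (the bucket resource name is illustrative):

```hcl
# Allow both CUR delivery services to verify and write into the bucket.
data "aws_iam_policy_document" "cur_delivery" {
  statement {
    principals {
      type = "Service"
      identifiers = [
        "billingreports.amazonaws.com",
        "bcm-data-exports.amazonaws.com",
      ]
    }
    actions   = ["s3:GetBucketAcl", "s3:GetBucketPolicy"]
    resources = [aws_s3_bucket.feed_cur_bucket.arn]
  }

  statement {
    principals {
      type = "Service"
      identifiers = [
        "billingreports.amazonaws.com",
        "bcm-data-exports.amazonaws.com",
      ]
    }
    actions   = ["s3:PutObject"]
    resources = ["${aws_s3_bucket.feed_cur_bucket.arn}/*"]
  }
}

resource "aws_s3_bucket_policy" "feed_cur" {
  bucket = aws_s3_bucket.feed_cur_bucket.id
  policy = data.aws_iam_policy_document.cur_delivery.json
}
```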

2. CloudFormation stack for creating the Athena resources (it is created from the crawler-cfn.yml template, see above). The following parameters are used in the template (a Terraform sketch follows the list):

  • CUR name;
  • bucket;
  • prefix.
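
A sketch of what creating this stack could look like, assuming the template is vendored into the module as described above. The stack name and parameter keys are illustrative; take the actual parameter names from the template itself:

```hcl
resource "aws_cloudformation_stack" "athena_crawler" {
  name          = "kubecost-athena-crawler"
  template_body = file("${path.module}/templates/crawler-cfn.yml")
  capabilities  = ["CAPABILITY_IAM"]

  # Parameter names are illustrative; check the template for the real ones.
  parameters = {
    CURReportName = aws_cur_report_definition.kubecost.report_name
    CURBucket     = aws_s3_bucket.feed_cur_bucket.id
    CURPrefix     = "cur"
  }
}
```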

3. AWS Spot Data Feed subscription provides usage and pricing information for your spot instances. This one is easy to set up: just specify the bucket and the prefix the data should be sent to (it will be forwarded to Kubecost afterwards). For more information, see the AWS documentation.
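
In Terraform, this boils down to a single resource. Keep in mind there can be only one spot data feed subscription per AWS account; the prefix below is illustrative:

```hcl
resource "aws_spot_datafeed_subscription" "kubecost" {
  bucket = aws_s3_bucket.feed_cur_bucket.id
  prefix = "spot-feed"
}
```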

4. Two buckets:

  1. feed_cur_bucket stores the spot data feed. The same bucket, albeit with a different prefix, is used to store the AWS Cost and Usage Report (CUR) data.
  2. athena_bucket is where Athena dumps the processed CUR data at Kubecost's request. This bucket follows the recommended policy of purging data older than one day, as sketched below.
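
A minimal sketch of that purge policy (bucket and rule names are illustrative):

```hcl
resource "aws_s3_bucket" "athena_bucket" {
  bucket = "kubecost-athena-results" # illustrative name
}

resource "aws_s3_bucket_lifecycle_configuration" "athena_purge" {
  bucket = aws_s3_bucket.athena_bucket.id

  rule {
    id     = "purge-old-query-results"
    status = "Enabled"

    filter {} # apply to all objects in the bucket

    expiration {
      days = 1
    }
  }
}
```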

3. Kubecost Helm chart and its features

It’s time to talk a bit about the Kubecost deployment process. You can install Kubecost in your EKS cluster as an add-on. However, we opted for the Helm chart (and installed it using Argo CD) since that method gives us more flexibility.

Here are the sample values for chart version v1.107.1. These values take into account the AWS integrations that our Terraform modules use. Let's go through a few noteworthy details regarding this chart.

First of all, we used Prometheus, which was already running in the K8s cluster, to scrape metrics via ServiceMonitor resources, as well as securityContext settings to meet security guidelines. Authorization in the web interface was handled by the company's OAuth2 service. Since the cluster uses Karpenter for node management, its spot instance labels made it into the chart values as well.

Below are the Helm chart parameters that require your specific attention:

  1. The serviceAccount annotation to assign the created IAM role. It is necessary to access S3 buckets and Athena.
  2. Regions and names for the S3 buckets: awsSpotDataBucket and athenaBucketName.
  3. Database table and workgroup names for Athena.

Note also that the networkCosts DaemonSet is enabled with the AWS provider specified, the recommended CPU limits are set for it, and metrics collection is enabled. A sketch of the relevant values is shown below.
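
To illustrate these settings, here is a hypothetical fragment of the chart configuration, expressed as a Terraform helm_release for consistency with the rest of the examples (we actually deployed the chart via Argo CD). The role ARN, bucket, region, and Athena names are placeholders:

```hcl
resource "helm_release" "kubecost" {
  name       = "kubecost"
  repository = "https://kubecost.github.io/cost-analyzer"
  chart      = "cost-analyzer"
  version    = "1.107.1"
  namespace  = "kubecost"

  values = [<<-YAML
    serviceAccount:
      annotations:
        # IAM role created by the aws-kubecost-iam module
        eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/kubecost
    kubecostProductConfigs:
      awsSpotDataRegion: eu-west-1
      awsSpotDataBucket: kubecost-feed-cur
      awsSpotDataPrefix: spot-feed
      athenaBucketName: s3://kubecost-athena-results
      athenaRegion: eu-west-1
      athenaDatabase: athenacurcfn_kubecost_cur
      athenaTable: kubecost_cur
      athenaWorkgroup: primary
    networkCosts:
      enabled: true
  YAML
  ]
}
```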

Finally, the following components must also be present in the Kubernetes cluster for Kubecost to operate properly:

  • Node Exporter to expose system-level metrics, such as K8s node load;
  • kube-state-metrics to expose statuses, labels/annotations, capacity, and requests/limits of objects in the K8s cluster;
  • cAdvisor to expose resource consumption in individual containers.

Basically, it’s a fairly common Prometheus stack — except perhaps for cAdvisor, which provides some additional metrics. For more information about them, refer to the Kubecost docs.

When everything seems to be ready, go to Settings → Diagnostics in the Kubecost dashboard to check if the AWS integration is running properly. It also features availability checks for Node Exporter, kube-state-metrics, and cAdvisor metrics, as well as an option to enable Kubecost metrics export.

Outcome

To summarize, we have successfully integrated our Terraform modules and the official Helm chart into an automated Kubernetes cluster deployment process. This allowed us to generate detailed cost statistics for the whole environment as well as specific workloads running there.

Since our customer actively uses spot instances, the integration of Kubecost and AWS Billing using Cost and Usage Reports has enabled us to capture the exact cost of compute resources used in EKS clusters. We successfully achieved all of the goals, much to our and the customer’s delight!

Still, there is always room for improvement. Here’s the potential we envision for this project:

  1. Some EKS clusters are short-lived development environments that exist only during a feature’s development and testing (they are deleted permanently afterwards). So, in the future, it’s advisable to deploy a single Kubecost installation that runs in a static cluster and automatically collects information on the resources consumed by each environment. In theory, this approach would allow companies to estimate the value of the infrastructure involved in developing new features.
  2. The AWS integration can be expanded. On top of resource consumption stats for the cluster and workloads in it, you can collect data about the managed services used by the environment so that you have a unified overview of the information. (It’s worth noting, though, that this information is already available in AWS billing reports — and no additional manipulations are required. It’s just not that convenient.)
  3. Customizable budgets and alerts based on those budgets offer exciting potential as well.

Do you have experience implementing Kubecost or other FinOps solutions? Did you get some value out of our case in AWS? Feel free to leave feedback or share your own experiences in the comments below!
