Exploring Cloud Native projects in CNCF Sandbox. Part 4: 13 arrivals of 2024 H2

This article covers the second half of the new Open Source projects accepted into the CNCF Sandbox last year. They were added as a result of the CNCF TOC (Technical Oversight Committee) votes held in August, September, and October. The projects below are grouped by their formal categories, starting with the categories that have the most projects.

Security & Compliance

1. Ratify

With Ratify, you can improve your software supply chain security by verifying artifact security metadata against specific policies. When used in Kubernetes, it leverages Gatekeeper as the admission controller.

The Ratify design is based on a framework that follows the provider model and supports two types of providers: internal (built-in) and external (plugins). Essential framework components include:

  • stores that can store and distribute OCI artifacts;
  • verifiers that are responsible for verifying a specific artifact type based on the provided configuration;
  • executors that are the “glue,” linking all Ratify plugin-based components such as the verifiers, referrer stores, and policy providers.

Since providers can be plugins, it’s easy to integrate custom stores and verifiers for your needs. Ratify orchestrates the various verifiers to produce a final verification result based on your policy. Admission controllers can then rely on this result at different stages to decide whether the verification was successful.
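
To make the provider model more tangible, here’s a minimal Go sketch of how stores, verifiers, and an executor could fit together. The interfaces and toy implementations are purely illustrative — they are not Ratify’s actual APIs.

```go
package main

import "fmt"

// Store retrieves security metadata (signatures, SBOMs, ...) attached to an
// artifact — analogous to Ratify's referrer stores.
type Store interface {
	ListReferrers(artifact string) []string
}

// Verifier validates one kind of security metadata — analogous to Ratify's
// verifier plugins (Notation, Cosign, SBOM, ...).
type Verifier interface {
	Name() string
	Verify(artifact, referrer string) bool
}

// Executor is the "glue": it asks the stores for referrers, runs every
// verifier on them, and aggregates an overall result for the policy layer.
type Executor struct {
	Stores    []Store
	Verifiers []Verifier
}

func (e Executor) VerifyArtifact(artifact string) bool {
	ok := true
	for _, s := range e.Stores {
		for _, ref := range s.ListReferrers(artifact) {
			for _, v := range e.Verifiers {
				passed := v.Verify(artifact, ref)
				fmt.Printf("%s on %s: %v\n", v.Name(), ref, passed)
				ok = ok && passed
			}
		}
	}
	return ok // an admission controller would act on this aggregated result
}

// Toy in-memory implementations so the sketch runs end to end.
type memStore struct{ refs []string }

func (m memStore) ListReferrers(string) []string { return m.refs }

type alwaysPass struct{ name string }

func (a alwaysPass) Name() string            { return a.name }
func (a alwaysPass) Verify(_, _ string) bool { return true }

func main() {
	e := Executor{
		Stores:    []Store{memStore{refs: []string{"sig:sha256:abc"}}},
		Verifiers: []Verifier{alwaysPass{name: "notation"}},
	}
	fmt.Println("verified:", e.VerifyArtifact("registry.example.com/app:v1"))
}
```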

Ratify ships with a store plugin for ORAS as well as various verifier plugins: Notation from the Notary Project and Cosign from Sigstore, verifiers for vulnerability reports (including those generated by Trivy or Grype) and for schema validation, and an alpha plugin for SBOMs.

2. Cartography

Cartography is a tool for IT asset inventory focused on security risks. Basically, it explores your existing infrastructure components and builds a map representing your assets and their dependencies as a graph.

Cartography covers numerous infrastructure providers, from Kubernetes and cloud accounts (AWS, Azure, DigitalOcean, GCP, Oracle Cloud) to GitHub, PagerDuty, and AI vendors (Anthropic, OpenAI). The security-related providers it supports include the NIST CVE database, the Trivy security scanner, and CrowdStrike Falcon.

All you need to use Cartography is to configure your data sources — i.e. provide access to the services you use (e.g., by configuring relevant API tokens) — and run a CLI command. When it’s done, you can view your infrastructure graph via a web browser. Cartography leverages the Neo4j database to store and display the data.

The latter is significant since it lets you run powerful queries to extract any information about your resources. That’s why the project offers a usage tutorial showcasing several typical queries you might need, such as finding EC2 instances that are directly exposed to the internet, S3 buckets that allow anonymous access, or the dependencies used across all your GitHub repos.
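
For instance, here’s a minimal Go sketch (using the official Neo4j Go driver) that runs the “internet-exposed EC2 instances” query against the graph Cartography builds. The connection settings are placeholders, and the node label and property names follow Cartography’s schema as I recall it from the tutorial — treat them as assumptions.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/neo4j/neo4j-go-driver/v5/neo4j"
)

func main() {
	ctx := context.Background()

	// Placeholder connection settings for the Neo4j instance Cartography writes to.
	driver, err := neo4j.NewDriverWithContext(
		"bolt://localhost:7687",
		neo4j.BasicAuth("neo4j", "password", ""),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer driver.Close(ctx)

	session := driver.NewSession(ctx, neo4j.SessionConfig{})
	defer session.Close(ctx)

	// Query adapted from the Cartography tutorial (label/property names are assumptions):
	// EC2 instances directly exposed to the internet.
	query := `MATCH (i:EC2Instance{exposed_internet: true})
	          RETURN i.instanceid, i.publicdnsname`

	result, err := session.Run(ctx, query, nil)
	if err != nil {
		log.Fatal(err)
	}
	for result.Next(ctx) {
		fmt.Println(result.Record().Values...)
	}
}
```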

Cartography has a drift detection module to help you track data changes over time. Using it, you can specify a Neo4j query to be run periodically and see how its results change between runs.

Scheduling & Orchestration

3. HAMi

This project was originally known as k8s-vGPU-scheduler. Today, HAMi stands for Heterogeneous AI Computing Virtualization Middleware. It aims to simplify and automate managing the devices used for GenAI needs (GPUs, NPUs, and MLUs) in Kubernetes.

HAMi improves resource efficiency by sharing the same devices among multiple tasks running in parallel in Kubernetes Pods. You can select specific types of devices or target a concrete device. It also allows you to control allocated memory, enforce hard limits on streaming multiprocessors, and perform MIG adjustments (via mig-parted for dynamic-mig).
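
To show what device sharing looks like from the user’s side, here’s a sketch in Go that generates a Pod manifest requesting a slice of a shared GPU via HAMi-style extended resources. The resource names (nvidia.com/gpumem, nvidia.com/gpucores) are taken from the HAMi docs as I remember them, so treat them as assumptions and check the documentation for your device vendor.

```go
package main

import (
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	pod := corev1.Pod{
		TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "Pod"},
		ObjectMeta: metav1.ObjectMeta{Name: "gpu-task"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "cuda",
				Image: "nvidia/cuda:12.4.0-base-ubuntu22.04",
				Resources: corev1.ResourceRequirements{
					Limits: corev1.ResourceList{
						// One shared GPU slice; the memory and SM limits below are
						// HAMi-style extended resources (assumed names).
						"nvidia.com/gpu":      resource.MustParse("1"),
						"nvidia.com/gpumem":   resource.MustParse("3000"), // MiB of device memory
						"nvidia.com/gpucores": resource.MustParse("30"),   // % of streaming multiprocessors
					},
				},
			}},
		},
	}

	out, err := yaml.Marshal(pod)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out)) // pipe to `kubectl apply -f -`
}
```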

Here’s a high-level architecture of HAMi:

  • Mutating Webhook checks whether the required resources are available to process a task.
  • Scheduler assigns tasks to nodes and devices.
  • Device plugin maps the needed device to a container according to the schedule.
  • HAMi-Core monitors resource usage in the container and ensures isolation.

The project also features a web UI to visualize and manage resources, their usage, and tasks.

Currently, HAMi supports devices from NVIDIA, Cambricon, Hygon, Huawei Ascend, Iluvatar, Mthreads, Metax, and Enflame.

4. Kubernetes AI Toolchain Operator (KAITO)

As the name suggests, KAITO is an operator that helps you work with AI/ML workloads — namely, model inference and tuning — in Kubernetes.

Since it originated in the Azure Kubernetes Service (AKS) team, the project is focused on using managed K8s from this cloud provider. However, the documentation also includes instructions for installing KAITO on an AWS EKS cluster.

KAITO implements container-based model management and leverages an OpenAI-compatible server to perform inference calls. The project simplifies deploying models by offering easy-to-use built-in configurations and automatic provisioning of the required GPU nodes (in Azure). As for inference runtimes, it works with vLLM and Transformers from Hugging Face.
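
Since the inference endpoint is OpenAI-compatible, any OpenAI-style client will do. Here’s a minimal Go sketch of such a call; the in-cluster service URL and the model name are hypothetical placeholders rather than anything KAITO prescribes.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Hypothetical in-cluster address of a KAITO inference workspace service.
	url := "http://workspace-phi-mini.default.svc.cluster.local/v1/chat/completions"

	// Standard OpenAI-style chat completion request; the model name is a placeholder.
	body, _ := json.Marshal(map[string]any{
		"model": "phi-mini-instruct",
		"messages": []map[string]string{
			{"role": "user", "content": "Summarize what KAITO does in one sentence."},
		},
	})

	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	answer, _ := io.ReadAll(resp.Body)
	fmt.Println(string(answer)) // raw OpenAI-style JSON response
}
```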

Here’s an overall KAITO architecture:

KAITO’s two main components, the Workspace controller and the Node provisioner, are deployed on a system node. The former is responsible for processing a custom resource provided by the user, creating a Machine custom resource for node auto-provisioning, and creating the required workload for inference or tuning. The latter uses this Machine custom resource (from Karpenter) and requests Azure Resource Manager to add GPU nodes to the AKS cluster.

In addition to the inference and fine-tuning features, KAITO also offers RAG (Retrieval-augmented generation), starting from the soon-to-be-released v0.5.0.

Service Mesh

5. Kmesh

Kmesh is a high-performance data plane for service mesh. It aims to address two existing issues with Istio: unwanted latency overhead at the proxy layer and high resource consumption. To do so, it leverages eBPF to implement traffic orchestration, including dynamic routing, authorization, and load balancing. Importantly, no code changes in the end-user applications are needed to benefit from the optimizations that Kmesh brings.

Kmesh uses Istio as its control plane and can operate in two modes:

  1. Kernel-native mode that provides the full experience. It delegates L4 and HTTP traffic governance to the kernel and, thus, doesn’t need to pass the data through the proxy layer.
  2. Dual-engine mode, which is for those who prefer an incremental transition. It adds Waypoint to manage L7 traffic. For this mode, running Istio in ambient mode is required.

Kmesh operating in the dual-engine mode. The flow for the kernel-native mode is similar, but with no Waypoint involved.

To evaluate the performance and resource consumption gains Kmesh brings, you can follow the relevant instructions its authors provide in the project’s documentation.

6. Sermant

Sermant is a proxyless service mesh that leverages Java bytecode enhancement to solve service governance issues in large-scale Java applications built as microservices (based on Spring Cloud, Apache Dubbo, etc.). It provides numerous features, such as dynamic configuration, messaging, heartbeat, service registration, load balancing, tag-based routing, flow control, distributed tracing, and more.

Some of these features are implemented on the framework level, while others are available as plugins. The project’s documentation lists existing plugins split by categories: service discovery and real-time configuration; limiting, downgrading and serviceability; application traffic routing; application observability. There’s also a developer guide for creating new plugins.

Architecturally, Sermant is shaped by three main components:

  • JavaAgent that instruments the application to benefit from the service governance.
  • Backend that connects all JavaAgents and pre-processes the uploaded data.
  • Dynamic Configuration Center that dynamically updates the configuration in JavaAgents. This component is not part of Sermant itself: it is meant to be backed by existing Open Source solutions such as ZooKeeper, ServiceComb Kie, or Nacos.

Thanks to the implementation of the xDS protocol in Sermant, integration with the Istio service mesh is possible. In this case, Sermant will communicate directly with Istio’s control plane and replace Envoy as Istio’s data plane for service governance.

Service Proxy

7. LoxiLB

LoxiLB is a feature-rich load balancer for Kubernetes that aims to be infrastructure-agnostic (i.e. support on-prem, public and hybrid cloud environments), performant, and programmable. It is primarily focused on operating as a service-type load balancer, yet other cases are supported as well. The project leverages eBPF as its core engine.

LoxiLB comes with Go-based control plane components, an eBPF-based data path implementation, an integrated GoBGP-based routing stack, a Kubernetes operator (kube-loxilb), and a Kubernetes Ingress implementation. Here’s the diagram of a typical Kubernetes cluster with LoxiLB in use:

LoxiLB can not only be installed on various infrastructures but also works with any Kubernetes distribution and CNI (including Flannel, Cilium, and Calico). It supports dual-stack with NAT64 and NAT66 for Kubernetes and a broad spectrum of protocols: TCP, UDP, SCTP, QUIC, etc. Its numerous features include kube-proxy replacement with eBPF (full cluster mesh for K8s), high availability with fast failover detection, extensive and scalable endpoint liveness probes, and stateful firewalling with IPsec/WireGuard support.
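
In practice, exposing a Service via LoxiLB usually boils down to setting its loadBalancerClass so that kube-loxilb handles it. Here’s a Go sketch that generates such a manifest; the class name loxilb.io/loxilb is my recollection of the kube-loxilb default, so verify it against the documentation.

```go
package main

import (
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	lbClass := "loxilb.io/loxilb" // assumed load balancer class handled by kube-loxilb

	svc := corev1.Service{
		TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "Service"},
		ObjectMeta: metav1.ObjectMeta{Name: "web"},
		Spec: corev1.ServiceSpec{
			Type:              corev1.ServiceTypeLoadBalancer,
			LoadBalancerClass: &lbClass,
			Selector:          map[string]string{"app": "web"},
			Ports: []corev1.ServicePort{{
				Port:     80,
				Protocol: corev1.ProtocolTCP,
			}},
		},
	}

	out, err := yaml.Marshal(svc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out)) // apply it and LoxiLB allocates the external IP
}
```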

To get started with LoxiLB, you can choose from a variety of guides depending on the mode you want to run it in: external cluster, in-cluster, service proxy, Kubernetes Ingress, Kubernetes Egress, or standalone (without Kubernetes at all).

Cloud Native Network

8. OVN-Kubernetes

OVN-Kubernetes is a CNI (Container Network Interface) plugin for Kubernetes clusters implementing OVN (Open Virtual Network), an abstraction on top of Open vSwitch (OVS). It aims to enhance K8s networking by offering advanced features for enterprise and telco use cases.

Some of these features include fine-grained cluster egress traffic controls, support for creating secondary and local networks (e.g., for multihoming), hybrid networking for mixed Windows/Linux clusters (using VXLAN tunnels), and offloading networking tasks from the CPU to the NIC. It also enables live migration of KubeVirt-managed virtual machines by keeping established TCP connections alive. You can find a detailed list of features in the project’s documentation.
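
As an example of the egress traffic controls, OVN-Kubernetes offers an EgressIP custom resource that pins the source IP used by the egress traffic of selected namespaces. The Go sketch below emits such a manifest; the API group/version (k8s.ovn.org/v1) and field names are given as I recall them from the docs, so double-check them before applying.

```go
package main

import (
	"fmt"
	"log"

	"sigs.k8s.io/yaml"
)

func main() {
	// EgressIP: route egress traffic of matching namespaces via fixed source IPs.
	egressIP := map[string]any{
		"apiVersion": "k8s.ovn.org/v1", // assumed API group/version
		"kind":       "EgressIP",
		"metadata":   map[string]any{"name": "egressip-prod"},
		"spec": map[string]any{
			"egressIPs": []string{"172.18.0.33"},
			"namespaceSelector": map[string]any{
				"matchLabels": map[string]string{"env": "prod"},
			},
		},
	}

	out, err := yaml.Marshal(egressIP)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out)) // apply with kubectl to pin the egress source IP
}
```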

OVN-Kubernetes supports two deployment modes with different architectures: default (centralized control plane) and interconnect (distributed control plane). The latter involves connecting multiple OVN deployments via OVN-managed GENEVE tunnels.

The architecture for the default mode involves:

  • control plane: the ovnkube-master Pod (it watches K8s objects and translates them into OVN logical entities), the OVN northbound database, NBDB (stores these logical entities), northd (converts the entities into OVN logical flows), and the southbound database, SBDB (stores the flows);
  • data plane: the ovnkube-node Pod (it runs the CNI executable), ovn-controller (converts logical flows from the SBDB into OpenFlow flows), and the ovs-node Pod (the OVS daemon and database, i.e. the virtual switch).

OVN-Kubernetes is the default CNI in Red Hat’s OpenShift Container Platform (as of version 4.19, released last month) and counts NVIDIA among its other well-known adopters.

Observability

9. Perses

Perses is an observability visualization project that is currently developing a dashboard tool, with bigger plans in mind. It aims to provide a standardized dashboard specification to improve interoperability across various observability tools.

You can easily deploy Perses dashboards as Custom Resources in Kubernetes by leveraging the Perses operator. The project also calls itself GitOps-friendly since it has everything you need to manage your dashboards in Git: static validation, a CLI to perform actions in CI/CD pipelines, CI/CD libraries, and SDKs (in Go and in CUE).

Which observability data can Perses visualize? Currently, it supports Prometheus metrics and Tempo traces. The authors plan to add more data sources, naming logs — particularly those stored in OpenSearch and Loki — as one of their priorities. Supporting ClickHouse as an observability backend is another item on the project’s roadmap.

Perses’ design allows you to use it as a standalone tool or embed the panels and dashboards in other UIs (by using npm packages). The project follows a plugin-based architecture, with its core plugins available in a separate GitHub repo and guidance on creating custom plugins for any specific needs.

Application Definition & Image Build

10. Shipwright

Shipwright is a framework for building container images on Kubernetes by leveraging existing tools. With a simple YAML configuration (via CRDs), you describe which applications should be built and with which tools.

Basically, with Shipwright, you will need to:

  1. Define what you’re building: the source of the image (Git repository or OCI artifact, Dockerfile, etc.), required volumes, the container registry to push the resulting image to, and so on. You can also enable a vulnerability scan for newly built images.
  2. Define how you will build it, i.e., which tool you will use. Shipwright supports the following builders: Buildah, BuildKit (from Docker), Buildpacks, Kaniko, ko, and S2I (Source-To-Image from OpenShift). Different backends come with different features and limitations. You can control the building process by specifying relevant values (directly or via Kubernetes ConfigMaps and Secrets) for each build.
  3. Run the build process. Thanks to triggers, it can also be event-driven: notably, you can react to events from GitHub Webhooks or watch Tekton Pipelines.

Everything happens in Kubernetes, so this whole workflow is managed by a Kubernetes operator. The project also has a CLI tool that can be used as a standalone binary or as a kubectl plugin.
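
To illustrate steps 1 and 2 above, a Build object is just a small custom resource. The Go sketch below generates one that builds Shipwright’s sample Go application with Buildah; the API version (shipwright.io/v1beta1) and field layout reflect the docs to the best of my recollection, so treat the specifics as assumptions.

```go
package main

import (
	"fmt"
	"log"

	"sigs.k8s.io/yaml"
)

func main() {
	// A minimal Shipwright Build: what to build (a Git repo) and how (Buildah),
	// plus where to push the resulting image.
	build := map[string]any{
		"apiVersion": "shipwright.io/v1beta1", // assumed API version
		"kind":       "Build",
		"metadata":   map[string]any{"name": "sample-go-build"},
		"spec": map[string]any{
			"source": map[string]any{
				"type":       "Git",
				"git":        map[string]any{"url": "https://github.com/shipwright-io/sample-go"},
				"contextDir": "docker-build",
			},
			"strategy": map[string]any{
				"name": "buildah",
				"kind": "ClusterBuildStrategy",
			},
			"output": map[string]any{
				"image": "registry.example.com/team/sample-go:latest",
			},
		},
	}

	out, err := yaml.Marshal(build)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out)) // a BuildRun object then triggers the actual build
}
```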

In terms of observability, Shipwright exposes several Prometheus metrics — such as total builds and build duration — and supports a pprof profiling mode.

Automation & Configuration

11. KusionStack

KusionStack offers a variety of tools focused on building an Internal Developer Platform (IDP). Its core is a platform orchestrator called Kusion. Declarative and intent-driven, it revolves around a single specification where developers define the workload and all dependencies required for the application deployment. Kusion will do the rest to ensure the application runs.

How does this magic happen? Platform engineers create so-called Kusion modules — the project’s basic building blocks. They implement the actual infrastructure required by the application (databases, networking services, etc.) while abstracting it away from developers. Each module can be made up of one or several Kubernetes or Terraform resources.

Originally, interacting with Kusion was possible only through a CLI tool, but the recent v0.14.0 release (January 2025) introduced Kusion Server, which features a Developer Portal and RESTful APIs for managing Kusion objects. Additionally, this web UI visualizes the topology of application resources.

Other notable KusionStack tools are Karpor and Kuperator. Karpor, which boasts even more GitHub stars than the main Kusion repo, is a web UI for Kubernetes focused on three main features:

  1. Search: powerful SQL-style queries to quickly select any Kubernetes resources you need from multi-cluster setups.
  2. Insights for Kubernetes resources: a dashboard showcasing existing issues* and an interactive topology view.
  3. AI: GenAI-based interpretations for existing issues.

* All the abovementioned issues Karpor currently displays rely on the output of the kubeaudit tool. However, the authors are considering adding other tools (such as Trivy, Kubescape, and Falco) for this functionality.

Finally, Kuperator is a set of workloads and operators “aiming to bridge the gap between platform development and Kubernetes.” They include controllers for workload lifecycle management, injecting specific configurations to Pods that meet certain criteria, performing one-shot operational tasks on a batch of Pods, and more.

Container Runtime

12. youki

Have you ever thought of a container runtime written in Rust? That’s precisely what youki is. It started as a hobby project exploring container runtimes and eventually grew into a significant community effort (see the impressive GitHub stats above!). As the author highlights, being fast and having a small memory footprint makes youki a good candidate for environments with strict resource limitations.

youki is an OCI-compliant, low-level runtime similar to runc and crun. It can be used directly to create, start, and run containers. However, it is more convenient to combine it with a higher-level runtime (such as Docker or Podman). Some other facts about youki:

  • It can work in a rootless mode.
  • Currently, it works only on Linux. Using it on other platforms is possible with virtualization involved, and there are ready-to-use Vagrantfiles for setting up such VMs with Vagrant.
  • It supports WebAssembly, meaning you can build a container image with the WebAssembly module and then run this container with youki.

Cloud Native Storage

13. OpenEBS

OpenEBS is a persistent storage solution for Kubernetes workloads. Its story of (re-)joining the CNCF Sandbox is unique:

  • Originally, OpenEBS was accepted into the CNCF as a Sandbox project in 2019.
  • However, in 2023, a public discussion about archiving this project due to a lack of activity was started. In February 2024, it was indeed archived.
  • Soon after, the team behind OpenEBS introduced numerous changes, paving the way for the project’s resubmission to the Sandbox. In October 2024, it was unarchived and became a Sandbox project again.

An overall architecture of OpenEBS in Kubernetes and in relation to other tooling is illustrated in this diagram:

Here:

  • OpenEBS Control Plane manages the data engines and storage available on the K8s worker nodes and interacts with CSI and other tools to manage the lifecycle of volumes: creating snapshots, performing resizes, and so on.
  • OpenEBS Data Engines are used by Kubernetes stateful workloads and perform read and write operations on the underlying persistent storage.

As for the storage, OpenEBS supports two types:

  • Local storage. It’s limited to the node with the volume and comes with a minimal overhead. This type has several implementations: Hostpath, ZFS, LVM, Rawfile.
  • Replicated storage. It replicates data across multiple nodes (synchronously) and has a single implementation: Mayastor.

All of these implementations, with the exception of Local PV Hostpath, are presented as CSI (Container Storage Interface) drivers.
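
From a workload’s point of view, consuming OpenEBS storage is simply a matter of referencing the right StorageClass in a PersistentVolumeClaim. The Go sketch below generates such a claim against the openebs-hostpath class that a default Local PV installation typically creates — the class name is an assumption, so adjust it for your data engine (e.g., a Mayastor-backed class for replicated storage).

```go
package main

import (
	"fmt"
	"log"

	"sigs.k8s.io/yaml"
)

func main() {
	// A claim against the (assumed) default Local PV Hostpath StorageClass.
	pvc := map[string]any{
		"apiVersion": "v1",
		"kind":       "PersistentVolumeClaim",
		"metadata":   map[string]any{"name": "app-data"},
		"spec": map[string]any{
			"storageClassName": "openebs-hostpath", // assumption: default OpenEBS class name
			"accessModes":      []string{"ReadWriteOnce"},
			"resources": map[string]any{
				"requests": map[string]string{"storage": "5Gi"},
			},
		},
	}

	out, err := yaml.Marshal(pvc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out)) // reference the claim from a Pod's volumes section
}
```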

Mayastor is written in Rust and, due to its complexity, is the most actively developed part of OpenEBS. It works only in the ReadWriteOnce access mode, supports both filesystem and block volume modes, and handles the ext4, btrfs, and xfs file systems. It also allows volume resizing, backups, snapshots, and monitoring (Prometheus metrics are exposed). Mayastor features a kubectl plugin for viewing and managing its resources.

Afterword

Again, we can see that most of the projects joining the CNCF are 2-3 years old. The exceptions are OVN-Kubernetes (started in 2016), OpenEBS (also 2016, but we have already explained why this case is unique), and Cartography (2019). Almost half of the projects (6 out of 13) originate from Asian companies and individuals (China, Korea, and Japan).

This time, AI/ML shines among the popular categories of the newly added projects, which is totally relevant as we see more such workloads in the Cloud Native space. Many networking-related tools are noticeable in this Sandbox batch, too.

As for the programming languages used, Go still dominates, followed by a small but steady presence of Rust and one exception (Java). There is even more unanimity when it comes to licenses: 12 out of 13 projects chose Apache 2.0.

The following year (2025) was even more fruitful for the ecosystem, boasting 13 new CNCF Sandbox additions in January alone! Our next overview will be published shortly — feel free to subscribe to our blog so you won’t miss it and other new articles.

P.S. Other articles in this series

  • Part 1: 13 arrivals of 2023 H1: Inspektor Gadget, Headlamp, Kepler, SlimToolkit, SOPS, Clusternet, Eraser, PipeCD, Microcks, kpt, Xline, HwameiStor, and KubeClipper.
  • Part 2: 12 arrivals of 2023 H2: Logging operator, K8sGPT, kcp, KubeStellar, Copa, Kanister, KCL, Easegress, Kuasar, krkn, kube-burner, and Spiderpool.
  • Part 3: 14 arrivals of 2024 H1: Radius, Stacker, Score, Bank-Vaults, TrestleGRC, bpfman, Koordinator, KubeSlice, Atlantis, Kubean, Connect, Kairos, Kuadrant, and openGemini.
