KubeShark: Kubernetes Skill for Claude Code to Catch Bad YAML

Lukas Niessen built KubeShark, a Kubernetes skill for Claude Code and Codex that tackles a specific problem: LLMs hallucinate when writing Kubernetes YAML. They generate deprecated API versions, forget security contexts, create Services selecting no pods, misconfigure probes, omit resource requests, and produce rollouts that look valid but fail under load. Kubernetes is unforgiving here — a wrong Service selector or broken liveness probe applies successfully but causes silent failures or pod restarts.

Failure-Mode-First Workflow

KubeShark is not a dump of best practices. Before generating any YAML, the agent must reason about what can go wrong across six failure domains:

Insecure workload defaults
Resource starvation
Network exposure
Privilege sprawl
Fragile rollouts
API drift

Only after that reasoning does it produce manifests, Helm charts, Kustomize overlays, RBAC, NetworkPolicies, or validation steps. The idea is to make operational details impossible to skip.

Specific Mistakes It Catches

Service selector that does not match Deployment labels
Ingress using an API version removed from current Kubernetes (extensions/v1beta1 and networking.k8s.io/v1beta1 were removed in 1.22)
Deployment running as root with no security context
Liveness probe checking an external database
ClusterRoleBinding where a RoleBinding would suffice
StatefulSet assuming PVCs disappear on scale-down
Helm template that renders valid YAML but targets the wrong Kubernetes API version
Kustomize patch silently targeting the wrong resource
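
To make two of these mistakes concrete, here is a minimal Python sketch of the kind of cross-resource check involved. It is an illustration, not KubeShark's implementation: the function names are hypothetical, and manifests are assumed to be plain dicts of the shape a YAML parser would return.

```python
# Hypothetical sketch: manifests are plain dicts, as a YAML parser would return them.

# Ingress API versions that were removed in Kubernetes 1.22.
REMOVED_INGRESS_APIS = {"extensions/v1beta1", "networking.k8s.io/v1beta1"}


def selector_matches(service: dict, deployment: dict) -> bool:
    """True if every Service selector pair appears in the pod template labels."""
    selector = service.get("spec", {}).get("selector") or {}
    labels = (
        deployment.get("spec", {})
        .get("template", {})
        .get("metadata", {})
        .get("labels")
        or {}
    )
    # A Service without a selector gets no automatically managed endpoints,
    # so this sketch treats an empty selector as a mismatch.
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())


def uses_removed_ingress_api(manifest: dict) -> bool:
    """True if the manifest is an Ingress on an API version removed in 1.22."""
    return (
        manifest.get("kind") == "Ingress"
        and manifest.get("apiVersion") in REMOVED_INGRESS_APIS
    )


deployment = {
    "kind": "Deployment",
    "spec": {"template": {"metadata": {"labels": {"app": "web", "tier": "frontend"}}}},
}
broken_service = {"kind": "Service", "spec": {"selector": {"app": "webapp"}}}
fixed_service = {"kind": "Service", "spec": {"selector": {"app": "web"}}}
old_ingress = {"kind": "Ingress", "apiVersion": "extensions/v1beta1"}
new_ingress = {"kind": "Ingress", "apiVersion": "networking.k8s.io/v1"}
```

Both broken manifests here would be accepted as well-formed YAML, which is why the check has to compare resources against each other and against the cluster's API versions rather than validate each file in isolation.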

Token-Efficient Architecture

KubeShark's main SKILL.md stays compact and procedural. Deeper knowledge lives in focused reference files loaded only when relevant — for example, probe guidance doesn't load RBAC rules, and Helm tasks don't load NetworkPolicy guidance. This prevents token waste and reduces the chance the agent mixes unrelated concepts.

The skill also supports platform-specific contexts via Conditional Reference Retrieval. It detects signals like IRSA, Karpenter, Azure Workload Identity, GKE Autopilot, OpenShift Routes, ApplicationSet, HelmRelease, ServiceMonitor, or OpenTelemetry Collector, then loads the matching reference. This gives EKS-aware, AKS-aware, GKE-aware, OpenShift-aware, GitOps-aware, or observability-aware manifest generation and review — only when the context is relevant.
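
The retrieval idea can be sketched as a lookup from detected signals to reference files. This is a hedged illustration only: the file paths and the `select_references` function are invented here, the signal strings are taken from the list above, and an agent skill would more likely express this as instructions than as code.

```python
# Hypothetical signal-to-reference table. The file names are invented for
# illustration and are not KubeShark's actual layout.
REFERENCE_SIGNALS = {
    "references/eks.md": ("irsa", "karpenter"),
    "references/aks.md": ("azure workload identity",),
    "references/gke.md": ("gke autopilot",),
    "references/openshift.md": ("openshift route",),
    "references/gitops.md": ("applicationset", "helmrelease"),
    "references/observability.md": ("servicemonitor", "opentelemetry collector"),
}


def select_references(context: str) -> list[str]:
    """Return only the reference files whose signals appear in the task context."""
    text = context.lower()
    return [
        path
        for path, signals in REFERENCE_SIGNALS.items()
        if any(signal in text for signal in signals)
    ]


request = "Add a Karpenter NodePool and a ServiceMonitor for the API"
selected = select_references(request)
```

With this request, only the EKS and observability references would be selected. A request that mentions none of the signals selects nothing, so no platform-specific guidance enters the context.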

Defaults lean toward security: Pod Security Standards, cross-resource consistency checks, label/selector/port alignment, deprecated API avoidance, and rollback guidance are built in.
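
As one example of a port alignment check, the following Python sketch compares a Service's target ports with the ports a Deployment's containers declare. It is an assumption about what such a check might look like, not the skill's own logic, and `unmatched_target_ports` is a hypothetical name. Because declaring `containerPort` is informational in Kubernetes, a miss is best treated as a warning rather than a definite error.

```python
# Hypothetical sketch: manifests are plain dicts, as a YAML parser would return them.

def unmatched_target_ports(service: dict, deployment: dict) -> list:
    """Return Service target ports that no container in the Deployment declares."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    declared = set()
    for container in containers:
        for port in container.get("ports", []):
            declared.add(port["containerPort"])
            if "name" in port:
                declared.add(port["name"])  # targetPort may refer to a port by name

    missing = []
    for port in service["spec"]["ports"]:
        # When targetPort is omitted, Kubernetes defaults it to the value of port.
        target = port.get("targetPort", port["port"])
        if target not in declared:
            missing.append(target)
    return missing


deployment = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "web", "ports": [{"name": "http", "containerPort": 3000}]}
                ]
            }
        }
    }
}
misaligned = {"spec": {"ports": [{"port": 80, "targetPort": 8080}]}}
aligned = {"spec": {"ports": [{"port": 80, "targetPort": "http"}]}}
```

Here the first Service points at port 8080, which no container declares, while the second resolves through the named port `http`.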