I Accidentally Exposed TLS Secrets to GitHub, Let's Learn From It
Whoop-dee-fucking-doo! I exposed secrets in my public GitHub repository. Join me on a ride!
This wasn't some complex supply chain attack or sophisticated breach. This was me! Using tools without understanding their impact or having proper safety nets in place.
Here's what happened, how I fixed it, and most importantly, what you can learn from my fuckup.
The Incident Timeline
Picture this: I'm redeploying my Kubernetes cluster after some infrastructure changes. Everything's going smoothly. Talos Linux is humming along, Terraform is doing its thing, but then the LoadBalancer doesn't get an external IP. Uuugh...
Time to debug this issue. I swear to god, if it's DNS! Because it's always DNS. So I start digging into the configuration files to figure out what's going wrong.
But then something catches my eye: tls.key. Here is my thought process:
Huh....?!?!?!?!... Did... did I commit a secret? INTO MY PUBLIC REPO?!!!!!!
apiVersion: v1
kind: Secret
metadata:
  name: cilium-ca
  namespace: kube-system
type: Opaque
data:
  tls.key: LS0tLS1CRUdJTi... # base64 encoded private key
  tls.crt: LS0tLS1CRUdJTi... # base64 encoded certificate
  ca.crt: LS0tLS1CRUdJTi... # base64 encoded CA cert
There it is. In all its glory. A secret. Base64 encoded, and remember, base64 is not secure!
Don't trust me? Check yourself:
YmFzZTY0IGlzIG5vdCBzZWN1cmUhISEhIQ==
Decode the string above and know that this is not secure!
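If you want to see it for yourself, one shell command does the decoding (the string is merely encoded, not encrypted):
# Decode the base64 string above; base64 is an encoding, not encryption
echo 'YmFzZTY0IGlzIG5vdCBzZWN1cmUhISEhIQ==' | base64 --decode
# Prints: base64 is not secure!!!!!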
Back to the topic: the TLS private key was committed on September 10, 2025 and published to GitHub. Two months. Yes, for two months I was presenting that secret to the public.
Root Cause Analysis
So, how exactly did this happen? I used the following command to generate the Cilium manifest:
helm template cilium cilium/cilium --version 1.18.0 \
--namespace kube-system \
--set ipam.mode=kubernetes \
--set kubeProxyReplacement=true \
--set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
--set cgroup.autoMount.enabled=false \
--set cgroup.hostRoot=/sys/fs/cgroup > cilium.yaml
Then apply it with kubectl, watch it come up, commit, push, task done. Simple, right? What could go wrong?
My mistake: I never actually looked at what helm template was generating. I just trusted it blindly. The Cilium Helm chart, being a comprehensive CNI solution, includes TLS certificate generation for secure communication between components. When you run helm template, it generates everything, including those certificates and private keys.
This is a classic case of automation without verification. I automated the template generation but forgot to automate the security review. I was so focused on getting the cluster up and running that I skipped the most basic security practice: actually checking what I was about to push.
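A quick scan of the rendered output would have caught it. Here is a minimal sketch of that check, assuming mikefarah's yq (v4) is installed; the grep variant needs nothing beyond standard tools:
# List every Secret resource the chart would render (yq v4 assumed)
helm template cilium cilium/cilium --version 1.18.0 --namespace kube-system \
  | yq 'select(.kind == "Secret") | .metadata.name'
# Or, with plain grep, at least see where Secrets show up in the output
helm template cilium cilium/cilium --version 1.18.0 --namespace kube-system \
  | grep -n "kind: Secret"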
Immediate Response and Remediation
Here is my battle plan:
- Root Cause Analysis: Identify the flawed static YAML approach (done)
- Access Review: Verify no unauthorised access to the homelab network (done)
- Secret Rotation: Regenerate all affected TLS certificates and keys (rough sketch after this list)
- Implement a Solution: Provide a permanent fix for these Helm templates.
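For the rotation itself, this is roughly what it boils down to. Treat it as a sketch of the idea, not an exact runbook; which secrets exist beyond cilium-ca depends on the chart features you have enabled:
# Rotation sketch: drop the compromised CA, re-render the chart so a brand-new
# CA and certificates are generated, then re-apply and restart the agents
kubectl delete secret cilium-ca -n kube-system
helm template cilium cilium/cilium --version 1.18.0 --namespace kube-system \
  > cilium-new.yaml   # plus the same --set flags as in the original command
kubectl apply -f cilium-new.yaml
kubectl rollout restart daemonset/cilium -n kube-system
rm cilium-new.yaml    # and don't keep the freshly generated secrets lying around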
The Technical Fix
Enter the terraform_data resource.
/terraform/kubernetes/cilium.tf
locals {
  dist_directory  = "${path.module}/dist"
  cilium_filepath = "${local.dist_directory}/cilium.yaml"
}

resource "terraform_data" "cilium_yaml" {
  input = local.cilium_filepath

  lifecycle {
    replace_triggered_by = [talos_machine_secrets.this]
  }

  provisioner "local-exec" {
    command = <<-EOT
      mkdir -p ${local.dist_directory}
      echo "*" > ${local.dist_directory}/.gitignore
      helm template cilium cilium/cilium --version 1.18.0 \
        --namespace kube-system \
        --set ipam.mode=kubernetes \
        --set kubeProxyReplacement=true \
        --set securityContext.capabilities.ciliumAgent="{CHOWN,KILL,NET_ADMIN,NET_RAW,IPC_LOCK,SYS_ADMIN,SYS_RESOURCE,DAC_OVERRIDE,FOWNER,SETGID,SETUID}" \
        --set cgroup.autoMount.enabled=false \
        --set cgroup.hostRoot=/sys/fs/cgroup > ${local.cilium_filepath}
    EOT
  }

  provisioner "local-exec" {
    when    = destroy
    command = <<-EOT
      rm -rf ${local.dist_directory}
    EOT
  }
}
resource "talos_machine_configuration_apply" "cp" {
# ... other configuration ...
config_patches = [
yamlencode({
cluster = {
inlineManifests = [
{
name = "cilium"
contents = file(terraform_data.cilium_yaml.output)
}
]
}
}),
# ... other patches ...
]
}
So, what's going on here?
Dynamic Generation: The terraform_data resource generates the Helm template only when needed, triggered by changes to the Talos machine secrets.
Proper Lifecycle Management: The template file gets created during apply (by the default create-time provisioner) and cleaned up during destruction (when = destroy). The replace_triggered_by ensures regeneration whenever the cluster secrets change.
Automatic Gitignore: Creates a .gitignore file in the dist directory to prevent accidental commits.
Direct Integration: Uses file() function to read the generated template directly into the Talos configuration, eliminating the need for complex kubectl manifest resources.
This allows me to mark the secret rotation and the implementation as complete. Done!
Wrong! What did I learn? Never trust, always verify! So, time to verify the fix before moving on.
Verification and Testing
First, is it properly excluded from git?
git status
# dist/ directory should not appear in untracked files due to .gitignore
ls -la terraform/kubernetes/dist/
# Should show .gitignore file preventing commits
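Another quick check: ask git directly whether the generated file is ignored and which rule matches (path assumed from the layout above):
git check-ignore -v terraform/kubernetes/dist/cilium.yaml
# Prints the matching rule, something like: terraform/kubernetes/dist/.gitignore:1:*	terraform/kubernetes/dist/cilium.yaml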
Next, is my lifecycle working for that resource?
tofu plan
# Output shows:
# terraform_data.cilium_yaml will be created
# talos_machine_configuration_apply.cp will be updated
tofu apply
# Template should be generated in dist/cilium.yaml
ls -la terraform/kubernetes/dist/cilium.yaml
# File should exist temporarily during apply
# After successful apply, verify the inline manifest
tofu show | grep -A 10 "inlineManifests"
# Should show cilium manifest content loaded from file
Finally, is my cilium deployment working?
kubectl get pods -n kube-system | grep cilium
# All Cilium pods should be running
kubectl logs -n kube-system -l app.kubernetes.io/name=cilium | head -20
# No certificate or TLS errors in startup logs
# Verify Cilium status
cilium status
# Should show healthy cluster connectivity
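If you want to go deeper, the Cilium CLI also ships an end-to-end connectivity test; it spins up test workloads, so it takes a few minutes and is strictly optional here:
cilium connectivity test
# Deploys test pods and runs pod-to-pod, service, and DNS checks, then reports pass/fail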
Now the good news: the dynamic approach isn't just more secure, it's also more reliable. The manifest gets regenerated from the chart whenever the cluster secrets change, instead of drifting out of date in a static file. Improvement!
Pushed and fixed on November 8, 2025. Issue resolved!
Lessons Learned
This incident taught me several valuable lessons:
1. Always Inspect Generated Content
The Lesson: Zero Trust! Never trust automated tools blindly, especially when they're generating a configuration that might contain sensitive data.
The Practice: Before using any generated YAML, take a moment to actually look at it. Here are commands that will help me in the future:
# Quick security scan of generated templates
helm template myapp ./chart | grep -i -E "(secret|password|key|token|tls\.key|tls\.crt)"
Automated Detection Tools: I'm also planning to add automated secret-detection tools to my workflow; see the example below.
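As one example (my pick for illustration, not a settled decision), gitleaks can scan a repository's history for anything that looks like a secret:
# Scan the repo's git history and report likely secrets
gitleaks detect --source . --verbose
# Run it as a pre-commit hook or CI step so leaks are caught before they're pushed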
2. Security is a Process, Not a One-Time Thing
The Lesson: Security isn't something you implement once and forget about. It's an ongoing process that needs to be built into every step of your workflow.
Automated Gitignore Rules:
# Generated templates that may contain secrets
**/dist/
**/*-generated.yaml
**/*-template.yaml
cilium.yaml
helm-output/
# Common secret files
.env
.env.local
*.pem
*.key
*.p12
*.pfx
**secrets**
3. Fail Fast and Fail Safe
The Lesson:
Have an incident response plan, even for your homelab.
The Practice:
To quote Mike Tyson:
Everybody has a plan until they get punched in the face.
Get used to it. And be prepared to get punched a lot in the face.
4. Dynamic is Better Than Static (For Secrets)
The Lesson:
Static configuration files are convenient, but they're also dangerous when they contain sensitive data. Dynamic generation with proper lifecycle management is worth the extra complexity.
Even better: When you can auto-rotate secrets, do it!
The Practice:
Use tools like OpenTofu/Terraform, Helm, or Kustomize to generate configurations at deployment time rather than storing them as static files.
Next Steps and Future Improvements
This incident got me thinking about how to prevent similar issues in the future. Here's what I'm implementing:
AI-Powered Security Review (Future Concept)
This is where it gets interesting. I'm exploring the possibility of using AI to review generated configurations before they get applied. This is future speculation, not a current implementation, but the concept involves:
- Identifying potential security issues
- Suggesting safer alternatives
- Learning from past incidents
Takeaways
If you take away just three things from this post, let them be these:
First: Always inspect what your tools generate. Automation is powerful, but it's not infallible. A quick review can save you from major headaches later.
Second: Design your systems to fail safely. Use dynamic generation, proper lifecycle management, and automated cleanup to minimise the risk of persistent secrets.
Third: When incidents happen (and they will), respond quickly but systematically. Document what goes wrong, fix the immediate issue, and implement controls to prevent recurrence.
Security incidents are learning opportunities in disguise. Yes, they're stressful and potentially dangerous, but they also force us to examine our assumptions and improve our practices. This particular incident has made me a better DevOps engineer, and I hope sharing it helps you avoid similar mistakes.
Remember: we're all human, we all make mistakes, and we all have the opportunity to learn from them. The key is to fail fast, fail safe, and always be improving.
For any questions about this incident or the fix, hit me up on Mastodon or LinkedIn. I'm always happy to discuss security practices and lessons learned.
Happy Coding!