Blog & Insights

Engineering Perspectives

Deep dives, lessons from the field, and engineering perspectives from the CloudBrx team. No marketing fluff — just practical insights from real cloud deployments.

In-Depth Articles

Long-Form Writing

Architecture patterns, migration war stories, and FinOps playbooks with real numbers from our engineering team.

Cloud Architecture 14 Apr 2025 • 12 min read

The Hidden Cost of Cloud Lock-In: How Multi-Cloud Strategy Protects Your Business

When a major financial services firm asked us to review their Azure estate in 2023, we found something alarming: 94% of their workloads used Azure-specific managed services with no cloud-portable equivalents. Migrating a single application would require a complete rewrite. This is the hidden cost of cloud lock-in — and it is not theoretical.

Cloud vendors design their managed services to be excellent and addictive. Azure Service Bus, AWS Step Functions, GCP Pub/Sub — these are genuinely great products. The problem emerges when you realise that using them deeply means your infrastructure becomes non-portable, your negotiating leverage disappears, and your roadmap is hostage to a vendor's pricing decisions.

At CloudBrx, we advise clients to make a deliberate distinction between anchor services (databases, compute, networking — where you want portability) and acceleration services (managed ML, CDN, serverless — where lock-in is often worth the productivity gain). The key is intentionality: know what you are trading, and trade deliberately.

Multi-Cloud · Architecture · Strategy
Cloud Security 2 Apr 2025 • 9 min read

Zero Trust Architecture on AWS: A Practical Implementation Guide for Enterprise Teams

Zero Trust is one of those terms that has been so thoroughly marketed it risks becoming meaningless. "Never trust, always verify" sounds simple. Implementing it across a 200-account AWS organisation with 40 engineering teams is anything but.

This guide documents what Zero Trust actually looks like in practice on AWS — not the whitepaper version, but the version we have implemented for financial services and healthcare clients who needed to pass rigorous third-party security audits. We cover IAM permission boundaries, VPC micro-segmentation with AWS Network Firewall, secrets management with Vault, and the often-overlooked challenge of service-to-service authentication in microservice architectures.

The most important insight we have developed: Zero Trust is not a product you buy. It is an operating model you implement across identity, network, data, and workload layers simultaneously. Miss one layer and the model breaks down.
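As one concrete example of the identity layer: an IAM permission boundary caps what any team-created role can ever do, no matter which policies get attached to it later. A minimal sketch, where the allowed services and region condition are illustrative placeholders rather than a recommendation (and real tamper-proofing also needs SCPs):

```python
import json

# Illustrative permission boundary for developer-created roles: only
# these services, only in one region, regardless of attached policies.
PERMISSION_BOUNDARY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowedServices",
            "Effect": "Allow",
            "Action": ["s3:*", "dynamodb:*", "lambda:*", "logs:*"],
            "Resource": "*",
            "Condition": {"StringEquals": {"aws:RequestedRegion": "eu-west-1"}},
        },
        {
            "Sid": "DenyBoundaryTampering",
            "Effect": "Deny",
            "Action": [
                "iam:DeleteRolePermissionsBoundary",
                "iam:DeleteUserPermissionsBoundary",
            ],
            "Resource": "*",
        },
    ],
}

policy_json = json.dumps(PERMISSION_BOUNDARY, indent=2)
```

Attached as the boundary on every role a team can create, this turns "what can this role do?" into an intersection question, which is exactly the posture Zero Trust asks for at the identity layer.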

Security · AWS · Zero Trust
FinOps 20 Mar 2025 • 8 min read

FinOps in Practice: The Exact Playbook We Used to Save $890K for an E-Commerce Client

When ShopGrid came to us, they had a $2.3M annual AWS bill, no team-level cost attribution, and a VP of Engineering who described the monthly invoice as "basically a surprise every time". Within 12 weeks, we had reduced their annual run rate by $890K — without decommissioning a single feature or compromising any SLA.

This article lays out the exact playbook, in the exact sequence: first, visibility (tagging taxonomy, cost allocation, anomaly detection); second, quick wins (idle resource cleanup, oversized instance right-sizing, scheduling dev environments off overnight); third, structural savings (Reserved Instance purchasing strategy, Savings Plans modelling, architecture changes with clear ROI).

The single highest-impact action we took — which cost nothing and delivered $340K in savings in 60 days — was automated scheduling of non-production environments. Staging, QA, and dev environments were running 24/7 when engineers worked 9 to 6. Turning them off overnight and on weekends eliminated 65% of their non-production compute cost immediately.
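The scheduling logic itself is trivially small. A sketch, where the 08:00–20:00 weekday window and the maths are illustrative assumptions, not ShopGrid's exact configuration:

```python
from datetime import datetime

# Keep non-production environments on 08:00-20:00, Monday-Friday only.
ON_HOUR, OFF_HOUR = 8, 20

def should_run(now: datetime) -> bool:
    """True if a non-production environment should be up at this time."""
    is_weekday = now.weekday() < 5           # Mon=0 .. Fri=4
    in_window = ON_HOUR <= now.hour < OFF_HOUR
    return is_weekday and in_window

# Fraction of the week the fleet is off under this schedule:
on_hours = 12 * 5                            # 60 of 168 hours
off_fraction = (168 - on_hours) / 168        # ~0.64
```

Wired into a scheduled Lambda or cron job that stops and starts tagged instances, this window alone keeps compute off for roughly 64% of the week, which lines up with the non-production saving described above.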

FinOps · AWS · Cost Optimisation
Platform Engineering 10 Mar 2025 • 11 min read

Kubernetes at Scale: Engineering Lessons from 50 Production Deployments Across Five Industries

After designing and operating 50+ production Kubernetes environments — from 3-node clusters for Series A startups to 400-node multi-region platforms for listed enterprises — we have accumulated lessons that do not appear in the official documentation.

Lesson one: namespaces are not a security boundary. We have seen numerous organisations use namespace separation as their primary tenancy model, only to discover that a misconfigured pod can still reach across namespaces via the cluster network. True multi-tenancy requires network policies, admission controllers, and — for hard-tenancy requirements — separate clusters.
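A per-namespace default-deny ingress policy is the usual first step towards real isolation. Here it is expressed as the Python object you might pass to a Kubernetes client (the namespace name is illustrative):

```python
# Default-deny: selects every pod in the namespace and allows no
# ingress. Equivalent to the standard deny-all NetworkPolicy YAML;
# allow rules are then added per workload.
default_deny = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "default-deny-ingress", "namespace": "team-a"},
    "spec": {
        "podSelector": {},           # empty selector = all pods in namespace
        "policyTypes": ["Ingress"],  # no ingress rules listed -> deny all
    },
}
```

Note that this only works if the cluster's CNI actually enforces NetworkPolicy; on a CNI that ignores it, the object applies cleanly and does nothing, which is its own unpleasant discovery.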

Lesson two: HPA alone is insufficient for traffic spikes. Horizontal Pod Autoscaler works well for gradual load increases. For sudden spikes — Black Friday, product launches, viral events — you need pre-scaling, KEDA for event-driven scaling, and cluster autoscaler tuned to provision nodes before pods enter Pending state.
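The pre-scaling arithmetic is simple; a sketch, where per-pod throughput and the headroom factor are values you would measure and choose for your own service:

```python
import math

def prescale_replicas(expected_peak_rps: float,
                      per_pod_rps: float,
                      headroom: float = 0.3) -> int:
    """Replicas to set *before* the spike, with safety headroom,
    so HPA only has to handle drift rather than the initial surge."""
    return math.ceil(expected_peak_rps * (1 + headroom) / per_pod_rps)

# e.g. a launch expected to peak at 12,000 rps, 150 rps per pod:
replicas = prescale_replicas(12_000, 150)   # 104
```

Scale the deployment to this number shortly before the event (a `kubectl scale` in a scheduled job is enough), then let HPA or KEDA take over once traffic settles.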

Kubernetes · Platform Engineering · DevOps
Cloud Migration 26 Feb 2025 • 10 min read

The 6R Framework in Practice: Choosing the Right Migration Strategy for Each Workload

AWS popularised the 6R migration framework — Rehost, Replatform, Refactor, Repurchase, Retire, Retain — and it remains the best-known model for categorising migration approaches. What the documentation does not tell you is how to classify real workloads, handle edge cases, and avoid the expensive mistakes that come from applying the wrong R to a workload.

The most common mistake: Rehosting workloads that should be Refactored. A monolithic application on a VM can usually be Rehosted in two days. But if that application has hardcoded IP addresses, Windows-specific file system dependencies, or a licence that does not transfer to the cloud, you have just moved your problem without solving it.

In this guide, we share the decision tree we use internally to classify every workload before a single migration begins — including the 15 questions we ask about each application that determine which R is appropriate and why.
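We can't reproduce the full 15-question tree here, but its first branches can be compressed into a few lines. A sketch, where the field names are illustrative stand-ins for fuller assessment questions, not the actual checklist:

```python
def classify(workload: dict) -> str:
    """Compressed first branches of a 6R decision, in the order
    the questions are asked."""
    if workload.get("retiring_soon"):
        return "Retire"
    if workload.get("compliance_blocks_cloud"):
        return "Retain"
    if workload.get("saas_equivalent_exists"):
        return "Repurchase"
    # Portability blockers force more than a lift-and-shift:
    blockers = ("hardcoded_ips", "os_specific_fs", "non_transferable_licence")
    if any(workload.get(b) for b in blockers):
        return "Refactor"
    if workload.get("wants_managed_db"):
        return "Replatform"
    return "Rehost"
```

The ordering matters as much as the questions: Retire and Retain come first because every later branch costs money to evaluate, and a workload nobody needs should never reach the Rehost-vs-Refactor debate.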

Cloud Migration · Architecture · Strategy
Short Posts

Quick Insights

Shorter observations, quick tips, and news from the team.

Quick Tip 18 Apr 2025 • 3 min

Reduce Your RDS Costs by 40% with Aurora Serverless v2

Aurora Serverless v2 is one of the most underused cost optimisation tools in the AWS arsenal. Unlike the original Serverless v1 (which had cold-start problems and limited compatibility), v2 scales in fine-grained 0.5-ACU increments and, since AWS added scale-to-zero in late 2024, can pause completely during idle periods. For development, staging, and low-traffic production databases, the savings are immediate. We have seen clients cut their RDS spend by 35–45% within 30 days of migration, with zero application changes required.
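The savings arithmetic for a mostly-idle database is easy to sketch. All prices and usage figures below are illustrative placeholders, not current AWS list prices; check the pricing page for your region:

```python
# Illustrative comparison: a provisioned instance vs Serverless v2 for
# a dev database that is busy ~40 hours/week. Prices are placeholders.
HOURS_PER_MONTH = 730
PROVISIONED_HOURLY = 0.26      # assumed instance price, USD/hour
ACU_HOURLY = 0.16              # assumed price per ACU-hour, USD

busy_hours = 40 * 730 / 168    # ~174 busy hours/month
busy_acus, idle_acus = 4, 0    # scales to zero ACUs when paused

provisioned = PROVISIONED_HOURLY * HOURS_PER_MONTH
serverless = ACU_HOURLY * (busy_hours * busy_acus
                           + (HOURS_PER_MONTH - busy_hours) * idle_acus)
saving = 1 - serverless / provisioned      # ~0.41 with these inputs
```

With these assumed inputs the saving lands around 41%, inside the 35–45% range above; the driver is simply that a dev database is idle for most of the 730 hours in a month.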

AWS · Cost Tip · RDS
Engineering Insight 11 Apr 2025 • 4 min

Your CI/CD Pipeline Is Probably Your Biggest Security Vulnerability

When we conduct cloud security assessments, the finding that surprises clients most is their CI/CD pipeline. A pipeline that uses long-lived IAM credentials, runs third-party GitHub Actions without pinned versions, and has access to production secrets is one compromised dependency away from a catastrophic breach. We have seen this pattern in 70% of the pipelines we have audited. The fix: OIDC-based cloud authentication, pinned action versions with hash references, and least-privilege pipeline roles.
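Pinning is easy to check mechanically. A small sketch that flags `uses:` references not pinned to a full commit SHA; the workflow references below are hypothetical examples:

```python
import re

# A full 40-character commit SHA after '@' counts as pinned;
# tags ('@v4') and branch names ('@main') do not.
PINNED = re.compile(r"@[0-9a-f]{40}$")

def unpinned_actions(uses_refs: list[str]) -> list[str]:
    """Return the references that a pipeline audit should flag."""
    return [ref for ref in uses_refs if not PINNED.search(ref)]

workflow_refs = [
    "actions/checkout@8f4b7f84864484a7bf31766abe9204da3cbe65b3",  # pinned
    "actions/setup-node@v4",                                      # tag only
    "some-org/deploy-action@main",                                # branch
]
flagged = unpinned_actions(workflow_refs)   # the last two
```

A tag can be force-moved to new code by whoever controls the repository; a commit SHA cannot, which is why the hash form is the one that survives a dependency compromise.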

DevSecOps · Security · CI/CD
Engineering Insight 4 Apr 2025 • 5 min

The Terraform Module Pattern That Changed How We Manage Multi-Account AWS

For two years, we managed multi-account AWS deployments by duplicating Terraform root modules per account — a pattern that works until it does not. The turning point was a client with 23 AWS accounts and a 4-hour blast radius when a module change needed applying everywhere. The solution: a meta-module pattern where a single root module iterates over an account manifest, applying child modules per account in parallel with explicit state isolation. Combined with Atlantis for PR-based plan/apply workflows, average multi-account change windows dropped from 4 hours to 22 minutes.
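The orchestration side of that pattern is straightforward to sketch: iterate an account manifest and give each account its own backend state key, so the blast radius of any one apply stays per-account. Account names, IDs, and the key scheme here are illustrative:

```python
# Each account gets an isolated state key; a plan/apply for one
# account can never touch another account's state.
ACCOUNTS = [
    {"name": "prod-payments", "id": "111111111111"},
    {"name": "staging-core",  "id": "222222222222"},
    {"name": "dev-sandbox",   "id": "333333333333"},
]

def state_key(account: dict) -> str:
    return f"accounts/{account['name']}/terraform.tfstate"

def plan_commands(accounts: list[dict]) -> list[str]:
    """Commands an orchestrator (or Atlantis) could run in parallel,
    one per account: re-init against that account's state, then plan."""
    return [
        f"terraform init -reconfigure -backend-config=key={state_key(a)}"
        f" && terraform plan -var=account_id={a['id']}"
        for a in accounts
    ]
```

Running these in parallel rather than sequentially is where the 4-hours-to-22-minutes change window comes from; the explicit per-account state key is what makes the parallelism safe.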

Terraform · AWS · Multi-Account
Engineering Insight 28 Mar 2025 • 6 min

GCP vs AWS for ML Workloads: What We Have Learned After 30 Deployments

If a client asks which cloud to run ML training workloads on, our honest answer is almost always GCP — but the reasoning is nuanced. GCP's Vertex AI, TPU availability, BigQuery ML integration, and native TensorFlow support create a cohesive ML platform that AWS's fragmented SageMaker ecosystem does not yet match for teams working primarily in Python and TensorFlow. However, for teams already invested in AWS, the migration cost often outweighs the advantages. AWS Trainium and Inferentia chips have also closed the price-performance gap significantly for inference workloads.

GCP · AWS · Machine Learning
Company News 17 Mar 2025 • 2 min

CloudBrx Is Now a Certified Kubernetes Training Partner

We are proud to announce that CloudBrx has joined the CNCF ecosystem as a certified Kubernetes training partner. Our internal certification programme — which has prepared all 45 engineers for CKA and CKAD — is now available to client engineering teams as a structured 3-day intensive. The programme covers cluster architecture, workload design, networking, storage, and security, with hands-on labs in a production-replica environment. The first public cohort runs in May 2025.

Kubernetes · Training · Company News
Stay Current

Cloud Insights in Your Inbox

One email per month. Practical cloud engineering, FinOps tips, and security insights — no marketing, no product pitches.

No spam. Unsubscribe any time.