Maturity Model & Methodology¶
This document explains the maturity model used in the Enterprise Readiness Assessment — why this methodology was chosen, what it does and does not tell us, what each pillar means, where the pillars come from, and the criteria for rating each pillar at each level.
It exists so that the ratings in the assessment are auditable: any reader can independently verify the level assigned to a pillar by checking the criteria here against observable evidence in the codebase and the operational environment.
1. Purpose of a Maturity Model¶
A maturity model answers four practical questions that a list of findings cannot:
- Where are we, on a comparable scale? Saying "the platform has no rate limiting" is true but lacks calibration. Saying "Security is at Level 1 of 5" tells you immediately how that compares to a typical enterprise platform (Level 3) or a best-in-class one (Level 4).
- Where do we need to be? A target rating per pillar makes the destination explicit. "Get to Level 3 on Observability" is a tractable goal; "improve observability" is not.
- What is the gap? The difference between current and target tells you scope. The difference between current and typical tells you whether you are ahead, behind, or normal.
- Are we making progress? A repeatable rubric lets you re-rate quarterly and see motion. Without a model, "are we improving?" is a vibe check.
Maturity ratings are also a communication tool. They compress technical findings into a form that non-technical stakeholders (board, finance, sales, customers) can understand. "We're at Level 1 on Reliability and need to be at Level 3 to sell into regulated industries" is a sentence a CFO can act on.
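The gap questions above reduce to simple arithmetic on levels. The following is a minimal sketch, assuming a hypothetical `PillarRating` shape and treating Level 3 as the typical baseline; none of these names come from the assessment itself.

```typescript
// Hypothetical types for illustration; the assessment does not define these names.
type Level = 1 | 2 | 3 | 4 | 5;

interface PillarRating {
  pillar: string;
  current: Level;
  target: Level;
}

interface PillarGap {
  pillar: string;
  gapToTarget: number;  // levels still to climb to reach the target
  gapToTypical: number; // positive = behind typical, negative = ahead
}

// Assumption: Level 3 is the "typical enterprise platform" baseline.
const TYPICAL_LEVEL: Level = 3;

function computeGap(rating: PillarRating): PillarGap {
  return {
    pillar: rating.pillar,
    gapToTarget: Math.max(0, rating.target - rating.current),
    gapToTypical: TYPICAL_LEVEL - rating.current,
  };
}

// Illustrative input: Security at Level 1 with a Level 3 target.
console.log(computeGap({ pillar: "Security & Compliance", current: 1, target: 3 }));
```

Keeping the two gaps separate preserves the distinction drawn above: one number scopes the work, the other says whether the platform is ahead of, behind, or at the norm.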
2. Why This Specific Methodology¶
Several methodologies could have been used. The hybrid chosen here was selected because each alternative has a specific weakness this model corrects.
| Alternative | What it does well | Why it isn't sufficient alone |
|---|---|---|
| CMMI applied broadly | The 1–5 level scale is industry-standard and well-understood | Treats the whole platform as a single number — too coarse to drive a roadmap |
| DORA metrics only | Excellent for delivery performance | Measures delivery throughput; says nothing about architecture, data, or security |
| AWS Well-Architected only | Strong on cloud-native pillars (security, reliability, cost) | Vendor-flavored, light on developer practices and team governance |
| OWASP / ISO 27001 only | Authoritative on security | Single pillar; doesn't surface trade-offs against other pillars |
| Pure narrative audit | Rich detail | Hard to track over time, hard to compare across teams |
The model used here combines pillar selection from multiple sources with CMMI's 1–5 level scale. Pillars are independent — a team can be Level 4 on Documentation and Level 1 on Security; that is informative, not contradictory. Independence is the model's most useful property.
What the model tells us¶
- The shape of the gap, not just the size. A platform with even ratings across pillars is in different trouble than one with a Level 4 on observability and a Level 1 on security.
- The order of investment. High-leverage pillars (Architecture, Background Processing, Security) constrain progress on others; you cannot reach Level 3 on Reliability while Background Processing is at Level 1.
- The realistic phasing. The transition from Level 1 to Level 2 on a pillar is typically one quarter; Level 2 to Level 3 is two quarters; Level 3 to Level 4 is a year or more.
What the model does NOT tell us¶
- Quality of customer experience. A Level 4 platform can still build the wrong product.
- Velocity of delivery. DORA metrics are partly captured under CI/CD but the model does not measure feature throughput.
- Strategic fit. Whether the right modules exist at all is a product-architecture question, not a maturity question.
- Comparative ranking against competitors. Maturity is internal-facing; competitive position requires market analysis.
The model is a health check, not a strategy.
3. The 1–5 Scale¶
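The level scale is descended from CMMI (Capability Maturity Model Integration), originally developed by the Software Engineering Institute at Carnegie Mellon. The level names below follow CMMI, except that Level 2 also carries "Repeatable", the name used by the earlier Capability Maturity Model; the plain-reading column restates each level in contemporary engineering vocabulary: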
| Level | CMMI name | Plain reading |
|---|---|---|
| 1 | Initial | Ad-hoc, heroic, undocumented, single point of human failure |
| 2 | Repeatable / Managed | The same person can do it twice the same way; partly documented |
| 3 | Defined | Codified, anyone on the team can do it, low truck-factor risk |
| 4 | Quantitatively Managed | Instrumented, measured, bounded by SLOs / error budgets |
| 5 | Optimizing | Continuous-improvement loop, automated remediation, self-correcting |
Why this scale¶
- Five levels is enough granularity to drive roadmap decisions without producing false precision. A 10-point scale invites debates over half-points; a 3-point scale loses too much signal.
- Level 3 is "normal." Most healthy enterprise platforms operate at Level 3–4 across most pillars. Level 5 is rare and usually only worth pursuing for the 1–2 pillars where the platform has a competitive moat.
- Level transitions correspond to cultural changes, not just tooling. Moving from Level 2 to Level 3 typically requires writing things down (codification); from Level 3 to Level 4 requires instrumentation and measurement; from Level 4 to Level 5 requires automation of the improvement loop itself.
- Adopted convention. Engineers, auditors, and procurement teams recognize this scale. It is not a custom dialect.
The "off the scale" cases¶
- A score below 1 is not used. Either a capability is at Level 1 (chaotic but present) or it is genuinely absent — in which case the rating is N/A and the gap is explicitly flagged.
- A score above 5 is not meaningful. Level 5 already implies continuous improvement; there is nothing higher to aspire to within this scale.
4. Origin of the Pillars¶
The 14 pillars are not invented for this assessment. Each is drawn from one or more established frameworks; the synthesis is opinionated but the components are standard.
| Pillar | Primary source | Secondary sources |
|---|---|---|
| Architecture & Modularity | TOGAF, "Building Microservices" (Newman) | Conway's Law / Domain-Driven Design |
| API Contract & Versioning | OpenAPI Initiative, REST API best practices | Stripe / Twilio API design guides |
| Data Architecture | "Designing Data-Intensive Applications" (Kleppmann) | AWS Well-Architected — Reliability pillar |
| Background Processing | Industry pattern (queue-based architecture) | Twelve-Factor — concurrency, processes |
| Security & Compliance | OWASP API Security Top 10, CIS Controls v8, ISO/IEC 27001 | SOC 2 Trust Services Criteria |
| Observability | OpenTelemetry, "Observability Engineering" (Majors et al.) | Google SRE Book |
| Reliability & Resilience | AWS Well-Architected — Reliability | Google SRE Book — error budgets, SLOs |
| Scalability | AWS Well-Architected — Performance Efficiency | Twelve-Factor — concurrency |
| Testing & Quality Gates | "Continuous Delivery" (Humble & Farley) | Test pyramid (Cohn) |
| CI/CD & Deployment | DORA / "Accelerate" (Forsgren et al.) | Continuous Delivery |
| Infrastructure as Code | DORA, AWS Well-Architected — Operational Excellence | Terraform / Pulumi conventions |
| Cost Visibility & FinOps | FinOps Foundation framework | AWS Well-Architected — Cost Optimization |
| Documentation & Knowledge Mgmt | DORA, internal-developer-platform best practice | "Software Engineering at Google" |
| Team Practices & Governance | DORA, SPACE framework, ITIL change management | "Accelerate" |
The pillars were chosen to cover both technical and operational dimensions — a pure technical model would miss governance and team practices; a pure operational model would miss data architecture and API contract. Both classes of failure are common at Level 1 platforms.
Pillars deliberately excluded¶
- Performance as a standalone pillar — folded into Scalability and Reliability. Performance numbers without reliability and scalability context are not actionable.
- User experience / product quality — out of scope for an engineering-platform assessment.
- Hiring / talent strategy — partly captured under Team Practices, but the broader org-design question is the CTO's responsibility, not the platform assessment's.
5. What Each Pillar Means¶
Brief definitions; the assessment doc provides per-pillar findings.
5.1 Architecture & Modularity¶
How the codebase is structured: bounded contexts, module boundaries, separation of concerns, file/component sizing, the ability to change one part without disturbing others. Why it matters: dictates the cost of every future change.
5.2 API Contract & Versioning¶
Existence and discipline of an API specification (OpenAPI / AsyncAPI / GraphQL schema), version management, deprecation policy, schema validation. Why it matters: defines the platform's contract with external integrators and shields it from breaking changes.
5.3 Data Architecture¶
Schema management (migrations), driver consistency, separation of OLTP and analytics, replication strategy, retention policy, data-locality concerns. Why it matters: the database is usually the hardest thing to change in an enterprise platform; getting it wrong has decade-long consequences.
5.4 Background Processing¶
Job queue infrastructure, worker concurrency model, retry / dead-letter / idempotency semantics, observable queue health. Why it matters: for SaaS platforms, the worker fleet typically does more work than the request fleet; instability here is felt by every user.
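The idempotency semantics mentioned above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `Job` shape and an in-memory `Set` standing in for a durable store; a real worker would record keys atomically (for example behind a unique constraint) so that two workers cannot both pass the check.

```typescript
// Hypothetical job shape; names are illustrative only.
interface Job {
  idempotencyKey: string; // stable per logical operation, reused on retries
  recipient: string;
}

// In-memory stand-in for a durable store of already-processed keys.
const processedKeys = new Set<string>();

// Stand-in for a real side-effect (sending an email, charging a card, ...).
const sentTo: string[] = [];
function sendEmail(recipient: string): void {
  sentTo.push(recipient);
}

// Runs the side-effect at most once per idempotency key.
function handleJob(job: Job): "executed" | "skipped" {
  if (processedKeys.has(job.idempotencyKey)) {
    return "skipped"; // a retry or duplicate delivery: do nothing
  }
  sendEmail(job.recipient);
  processedKeys.add(job.idempotencyKey);
  return "executed";
}

const job: Job = { idempotencyKey: "welcome-email:user-42", recipient: "user-42" };
console.log(handleJob(job)); // first delivery
console.log(handleJob(job)); // redelivery of the same job
```

With this shape, a queue that delivers a job twice, or a retry after a timeout, does not repeat the side-effect.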
5.5 Security & Compliance¶
Authentication, authorization, secrets management, network and platform-level controls, compliance posture (SOC 2, ISO 27001, GDPR), audit logging, supply-chain security. Why it matters: procurement gates, breach risk, regulatory exposure.
5.6 Observability¶
Structured logging, metrics, distributed tracing, correlation IDs, dashboards, alerting, SLOs. Why it matters: determines mean-time-to-detect and mean-time-to-resolve. Without observability, on-call is reactive guessing.
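As a rough illustration of structured logging with correlation IDs, the sketch below emits one JSON object per log line. The function name `formatLog` and the field names are assumptions made for this example, not an existing logger API.

```typescript
// Hypothetical helper; a real service would normally use an established logging library.
type LogLevel = "info" | "warn" | "error";

interface LogFields {
  [key: string]: string | number | boolean;
}

// Builds one machine-parseable log line. The correlation ID ties together
// every line produced while handling the same request or job.
function formatLog(
  level: LogLevel,
  message: string,
  correlationId: string,
  fields: LogFields = {},
): string {
  return JSON.stringify({
    ...fields, // caller-supplied context first, so reserved keys below win
    timestamp: new Date().toISOString(),
    level,
    message,
    correlationId,
  });
}

console.log(formatLog("info", "job completed", "req-7f3a", { jobType: "export", durationMs: 412 }));
```

Because every line is JSON and carries the same `correlationId`, a log search for that ID reconstructs the full path of one request across services.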
5.7 Reliability & Resilience¶
Circuit breakers, retries with backoff, graceful shutdown, single-points-of-failure analysis, disaster recovery, backup/restore validation, RTO / RPO commitments. Why it matters: uptime SLAs are the most visible promise to customers.
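Retry with backoff, one of the items listed above, can be sketched as below. This is a simplified version under stated assumptions: the default delays, cap, and attempt count are arbitrary illustrative values, and production implementations usually add random jitter, which is omitted here to keep the delays deterministic.

```typescript
// Exponential backoff: the delay doubles per attempt, up to a cap.
// baseMs and capMs defaults are illustrative, not recommendations.
function backoffDelayMs(attempt: number, baseMs = 100, capMs = 5000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Retries an async operation, waiting longer after each failure.
// Rethrows the last error once maxAttempts is exhausted.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts = 4,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

console.log([0, 1, 2, 3].map((attempt) => backoffDelayMs(attempt))); // [ 100, 200, 400, 800 ]
```

The cap matters as much as the doubling: without it, a long outage pushes retry delays past any useful recovery window.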
5.8 Scalability¶
Horizontal scaling readiness, bottleneck analysis, load testing, capacity planning, resource quotas. Why it matters: distinguishes platforms that grow gracefully from platforms that hit a ceiling and require a rewrite.
5.9 Testing & Quality Gates¶
Static analysis, unit / integration / E2E tests, contract tests, code coverage on critical paths, test gates in CI. Why it matters: the floor below which refactoring becomes too dangerous to attempt.
5.10 CI/CD & Deployment¶
Build pipeline, deployment automation, blue/green / canary, rollback, artifact signing, environment promotion. Why it matters: governs the cost and risk of every change reaching production.
5.11 Infrastructure as Code¶
Terraform / Pulumi / equivalent, environment definitions, IaC for the full stack (compute, network, data, observability), drift detection. Why it matters: without IaC, environments diverge invisibly and disaster recovery is a guess-and-check exercise.
5.12 Cost Visibility & FinOps¶
Per-service / per-feature / per-tenant cost attribution, anomaly detection, budgets, unit-economics tracking. Why it matters: unprofitable customers, runaway AI spend, and pricing decisions all depend on this visibility.
5.13 Documentation & Knowledge Management¶
Architecture documents, runbooks, ADRs, API docs, onboarding materials. Why it matters: truck-factor mitigation; onboarding speed; transfer of context across team transitions.
5.14 Team Practices & Governance¶
Code review standards, change management, on-call rotation, postmortems, ADRs as a process, security-aware PR review. Why it matters: technology fixes alone do not produce maturity; processes are how maturity is sustained.
6. Grading Rubric¶
Each level has both a generic definition and pillar-specific criteria. A pillar earns a level only when all criteria for that level are met.
6.1 Generic level definitions¶
Level 1: Initial.
- Capability exists in some form, but ad-hoc.
- Outcomes depend on which individual is doing the work.
- No documentation, or documentation is so out-of-date it misleads.
- Failures are common; root-cause analysis is rare; the same failure recurs.
- Truck factor of 1 (a single person leaves and the capability degrades materially).

Level 2: Repeatable.
- A team can produce the same outcome twice if conditions are similar.
- Some documentation exists; some tooling is in place.
- Outcomes are inconsistent across people or contexts.
- Failures are noticed; root-cause analysis is sometimes done; learnings are sometimes captured.
- Truck factor of 2–3.

Level 3: Defined.
- The way the capability is delivered is codified: written down, agreed on, tooled.
- Anyone on the team can produce a typical outcome by following the process.
- Tooling enforces the process where possible (linters, CI gates, templates).
- Failures are investigated; postmortems are written; learnings inform the codified process.
- Truck factor approaches team size in most areas (nearly anyone can cover for anyone else).

Level 4: Quantitatively Managed.
- The capability is measured. Specific quantitative targets exist (SLOs, coverage thresholds, error budgets).
- Trend lines are tracked; deviations are alerted on.
- Decisions about the capability are evidence-based, not opinion-based.
- Improvement work is scoped to specific metrics moving in specific directions.

Level 5: Optimizing.
- The capability has a continuous improvement loop built in.
- Process / tooling improvements are surfaced automatically (e.g., flaky-test detection, anomaly detection on cost).
- Self-healing or auto-remediation is present where appropriate.
- The capability gets better without explicit human intervention each cycle.
6.2 Pillar-specific criteria¶
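The table below shows the minimum concrete evidence required for Levels 1, 3, and 4 of each pillar. Ratings are cumulative: a pillar at Level 3 must meet all Level 3 criteria and the criteria of every level below it. Levels 2 and 5 have no pillar-specific column; the generic definitions in 6.1 are the only criteria given for them.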
| Pillar | Level 1 | Level 3 (target) | Level 4 (aspirational) |
|---|---|---|---|
| Architecture & Modularity | Single-file modules > 5k lines; no module boundaries | Bounded contexts with explicit module APIs; lint enforces import boundaries | Module health metrics (cyclomatic complexity, dependency churn) tracked over time |
| API Contract & Versioning | No spec; no version prefix; no deprecation policy | OpenAPI spec generated from code; URL versioning; documented deprecation policy | Spec compliance gated in CI; consumer-driven contract tests |
| Data Architecture | No migration tool; multiple DB drivers; analytics in OLTP | One driver; migration tool used for every schema change; read replica for analytics | Slow-query SLO; per-tenant data partitioning; query-cost dashboards |
| Background Processing | Cron-polling on DB; no queue; no idempotency | Real queue (Redis / SQS / etc.); idempotency keys on side-effects; DLQ | Queue-depth SLO; per-job-type error rate dashboards; auto-scaling on backlog |
| Security & Compliance | Secrets in source; no rate limit; CORS open | Secrets manager; rate limit on auth + cost endpoints; helmet; SAST + SCA in CI | SOC 2 Type II; quarterly pentest; audit log on all privileged actions |
| Observability | `console.log` only; no metrics; no traces | Structured logs with correlation IDs; metrics on golden signals; distributed traces | SLOs and error budgets defined and tracked; alerts mapped to runbooks |
| Reliability & Resilience | No circuit breakers; no graceful shutdown; single DB SPOF | Circuit breakers + retry-with-backoff; graceful shutdown; tested backups | Defined RTO / RPO; chaos testing in staging; multi-AZ |
| Scalability | Cannot run on a 2nd host without correctness regressions | Horizontally scalable; load tests run before release | Auto-scaling on measured signals; capacity model maintained per-service |
| Testing & Quality Gates | Zero tests | Unit + integration on critical paths; CI gate; ≥ 60% coverage on changed code | Mutation testing; contract tests; flaky-test quarantine |
| CI/CD & Deployment | Manual deploy; SSH + process restart | Containerized deploy; automated rollback on health-check fail; ≥ 1 deploy/day | Canary / blue-green; DORA elite metrics tracked; feature flags |
| Infrastructure as Code | No IaC; environments differ invisibly | Terraform (or equiv) for all infra; environments parameterized | Drift detection automated; IaC drift gates merges |
| Cost Visibility & FinOps | No per-service tagging; cloud bill is a surprise | Cost tags on every resource; weekly review; anomaly alerts | Per-tenant unit economics; cost-attribution gates feature decisions |
| Documentation & Knowledge Mgmt | Tribal knowledge only | Architecture, runbook, ADR documents in-repo; reviewed in PRs | Documentation health metric (freshness, completeness) tracked |
| Team Practices & Governance | Ad-hoc PR review; no standard | Codified review checklist; change-management for high-risk changes; postmortems | DORA metrics tracked per-team; postmortem actions completed and verified |
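The cumulative rule (a level is earned only when all of its criteria and all lower-level criteria are met) can be expressed as a short function. This is a sketch under assumptions: the `PillarEvidence` shape is hypothetical, Level 1 is treated as the floor for any capability that exists, and a level with no recorded checks is treated as not met.

```typescript
type Level = 1 | 2 | 3 | 4 | 5;
type Rating = Level | "N/A";

// Hypothetical evidence record for one pillar.
interface PillarEvidence {
  capabilityPresent: boolean; // false means the capability is genuinely absent
  // Outcome of each criterion check per level above 1 (true = evidence found).
  criteriaMet: Record<2 | 3 | 4 | 5, boolean[]>;
}

// Climbs from Level 1 and stops at the first level with an unmet criterion,
// so meeting Level 3 criteria cannot compensate for a gap at Level 2.
function ratePillar(evidence: PillarEvidence): Rating {
  if (!evidence.capabilityPresent) return "N/A";
  let rating: Level = 1;
  const higherLevels: Array<2 | 3 | 4 | 5> = [2, 3, 4, 5];
  for (const level of higherLevels) {
    const checks = evidence.criteriaMet[level];
    // Assumption: no recorded checks means no evidence, so the level is not earned.
    if (checks.length === 0 || !checks.every((met) => met)) break;
    rating = level;
  }
  return rating;
}

// Illustrative input: all Level 2 and Level 3 checks pass, one Level 4 check fails.
console.log(
  ratePillar({
    capabilityPresent: true,
    criteriaMet: { 2: [true, true], 3: [true, true, true], 4: [true, false], 5: [false] },
  }),
); // 3
```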
6.3 Worked examples¶
Architecture & Modularity at Level 1. Two route files totaling 46k lines; no module boundaries; cross-cutting helpers imported from anywhere. Why this is L1, not L2: there is no internal structure that a second engineer could reproduce; understanding the code requires reading both files in full. (This describes the current state.)
Architecture & Modularity at Level 3. Codebase organized into modules/identity, modules/content, etc.; each module exposes a typed public API; ESLint or equivalent prevents internals from being imported across module boundaries; new engineers can be productive in one module without reading the others.
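To make the import-boundary rule concrete, the sketch below shows the check such a lint rule performs. It assumes a hypothetical layout in which each module's public API is `modules/<name>/index.ts` and everything else under the module is internal; in practice the check would live in lint configuration (ESLint's built-in `no-restricted-imports` rule is one option) rather than in hand-written code.

```typescript
// Returns the module name for a path like "modules/identity/internal/hash.ts",
// or null when the path is not inside a module.
function moduleOf(path: string): string | null {
  const match = /^modules\/([^/]+)\//.exec(path);
  return match?.[1] ?? null;
}

// True when a file reaches into another module's internals.
// Assumption: "modules/<name>/index.ts" is the only public entry point.
function isBoundaryViolation(importingFile: string, importedPath: string): boolean {
  const fromModule = moduleOf(importingFile);
  const toModule = moduleOf(importedPath);
  if (toModule === null || fromModule === toModule) {
    return false; // not a module import, or an import within the same module
  }
  return importedPath !== `modules/${toModule}/index.ts`;
}

console.log(isBoundaryViolation("modules/content/service.ts", "modules/identity/internal/hash.ts")); // true
console.log(isBoundaryViolation("modules/content/service.ts", "modules/identity/index.ts")); // false
```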
Security & Compliance at Level 1. Hardcoded secrets in source; CORS accepts all origins; no rate limiting; no helmet; no SAST. Why this is L1, not L2: none of the basic controls are reproducibly in place. (This describes the current state.)
Security & Compliance at Level 3. Secrets in a manager; rate limits on auth and cost-bearing endpoints; helmet with reasonable defaults; SAST and dependency scanning in CI; threat model exists. SOC 2 Type I in progress.
Background Processing at Level 1. 108 PM2 workers polling MySQL; no idempotency keys; concurrency control is a per-process flag that doesn't survive multi-host deployment. (This describes the current state.)
Background Processing at Level 4. Queue-depth SLO defined and tracked; per-job-type latency and error-rate dashboards; auto-scaling on backlog; idempotency tested in CI; chaos tests for retry behavior.
7. How to Apply Ratings¶
7.1 Evidence required¶
A rating must be backed by observable evidence, not opinion. Acceptable evidence:
- Source code / repository state (file sizes, dependency graph, presence of tests)
- Configuration files (`ecosystem.config.js`, IaC files, CI workflows)
- Operational artifacts (dashboards, runbooks, alert configurations, postmortem documents)
- Process artifacts (PR templates, code review checklists, change-management tickets)
- Direct measurement (uptime numbers, deployment frequency, MTTR from real incidents)
Self-assessment without evidence is not a rating; it is a hope.
7.2 Who decides¶
Ratings should be reviewed by at least two senior engineers, ideally with one external to the team to counteract familiarity bias. For pillars with compliance implications (Security, Data Architecture), an external auditor's view is the strongest signal.
7.3 Cadence¶
- Quarterly re-rating is the recommended cadence for an active modernization program. More frequent than that produces noise; less frequent loses the feedback loop.
- Annual re-rating is sufficient once the platform reaches Level 3 across most pillars.
- Event-driven re-rating is warranted when major changes occur (a service extraction, a compliance certification, an incident that exposes a gap).
7.4 What counts as a level transition¶
A pillar is considered to have transitioned to a new level only when the criteria for that level have been met for one full quarter without regression. A capability that meets Level 3 criteria for two weeks and then quietly degrades is still effectively at Level 2 — discipline is part of the rating.
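The "one full quarter without regression" rule can be sketched as a check over periodic audit results. The `Check` shape, the 90-day definition of a quarter, and the assumption that checks arrive sorted oldest-first are all illustrative choices, not part of the model.

```typescript
// Hypothetical record of one periodic criteria audit.
interface Check {
  date: string;         // ISO date, e.g. "2024-03-01"
  criteriaMet: boolean; // true if all criteria for the new level held on that date
}

// A transition is confirmed only if the most recent unbroken run of passing
// checks spans at least one quarter. Assumes checks are sorted oldest-first.
function transitionConfirmed(checks: Check[], quarterDays = 90): boolean {
  let newest: Check | undefined;
  let streakStart: Check | undefined;
  for (const check of [...checks].reverse()) {
    if (!check.criteriaMet) break; // a regression ends the streak
    newest = newest ?? check;
    streakStart = check;
  }
  if (!newest || !streakStart) return false;
  const msPerDay = 24 * 60 * 60 * 1000;
  const spanDays = (Date.parse(newest.date) - Date.parse(streakStart.date)) / msPerDay;
  return spanDays >= quarterDays;
}

console.log(
  transitionConfirmed([
    { date: "2024-01-01", criteriaMet: true },
    { date: "2024-02-01", criteriaMet: false }, // regression resets the clock
    { date: "2024-03-01", criteriaMet: true },
    { date: "2024-04-05", criteriaMet: true },
  ]),
); // false
```

Measuring only the latest unbroken streak is what makes a quiet regression costly: the clock restarts from the first passing check after it.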
7.5 Targets vs. fantasies¶
A 12-month target for a pillar should be at most one level above current, except in unusual circumstances. Two-level transitions in a single year are rare and usually involve hiring or organizational change, not just engineering. Targets that exceed plausibility undermine the model.
8. Limitations¶
8.1 Subjectivity¶
The rubric is structured to minimize judgment calls, but no rubric is perfectly objective. Two qualified raters will agree on most pillars and disagree on edges. The model's value comes from trend over time, not from any single rating's precision.
8.2 Pillar interaction is not modeled¶
The pillars are rated independently but are interdependent in practice. Improving Observability while Security remains at Level 1 produces a platform that can see its own breaches but cannot prevent them. The model surfaces gaps and indicates which pillars constrain others, but it does not prescribe a detailed sequence of work; that is the roadmap's job.
8.3 Maturity is not value¶
A Level 5 platform that solves the wrong problem is worth less than a Level 2 platform that solves the right one. The model assesses how the platform is built, not whether it should exist.
8.4 The model can be gamed¶
Any rubric can be optimized for. A team that wants to "show progress" can re-categorize criteria to artificially advance ratings. The model assumes good-faith application, with external validation as the safeguard.
8.5 Best-in-class is rare¶
Level 5 across multiple pillars is uncommon outside large platform companies that have invested decades and hundreds of engineers. For most enterprise platforms, Level 3 across most pillars and Level 4 on the 2–3 pillars most critical to the business is the realistic destination.
9. Related¶
- Enterprise Readiness Assessment — uses this model to rate the platform and produce a roadmap
- Architecture Overview — current technical state referenced in pillar evidence
- AWS Well-Architected Framework whitepapers — the source for several pillars
- DORA / "Accelerate" — source for CI/CD and team-practices pillars
- CMMI — source for the 1–5 level scale