
Maturity Model & Methodology

This document explains the maturity model used in the Enterprise Readiness Assessment — why this methodology was chosen, what it does and does not tell us, what each pillar means, where the pillars come from, and the criteria for rating each pillar at each level.

It exists so that the ratings in the assessment are auditable: any reader can independently verify the level assigned to a pillar by checking the criteria here against observable evidence in the codebase and the operational environment.


1. Purpose of a Maturity Model

A maturity model answers four practical questions that a list of findings cannot:

  1. Where are we, on a comparable scale? Saying "the platform has no rate limiting" is true but lacks calibration. Saying "Security is at Level 1 of 5" tells you immediately how that compares to a typical enterprise platform (Level 3) or a best-in-class one (Level 4).
  2. Where do we need to be? A target rating per pillar makes the destination explicit. "Get to Level 3 on Observability" is a tractable goal; "improve observability" is not.
  3. What is the gap? The difference between current and target tells you scope. The difference between current and typical tells you whether you are ahead, behind, or normal.
  4. Are we making progress? A repeatable rubric lets you re-rate quarterly and see motion. Without a model, "are we improving?" is a vibe check.
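
As a minimal sketch of questions 1 to 3, the gap arithmetic can be written down directly. The class name, field names, and the example ratings are illustrative assumptions, not part of the assessment; only the "typical" level of 3 comes from the text above.

```python
from dataclasses import dataclass

TYPICAL_LEVEL = 3  # the "typical enterprise platform" level quoted in the text


@dataclass(frozen=True)
class PillarRating:
    """One pillar's current and target level on the 1-5 scale (hypothetical type)."""

    pillar: str
    current: int
    target: int

    def __post_init__(self) -> None:
        for name, level in (("current", self.current), ("target", self.target)):
            if not 1 <= level <= 5:
                raise ValueError(f"{name} level must be between 1 and 5, got {level}")

    def gap_to_target(self) -> int:
        """Number of levels between the current rating and the target."""
        return self.target - self.current

    def versus_typical(self) -> str:
        """Whether the current rating is ahead of, behind, or at the typical level."""
        if self.current > TYPICAL_LEVEL:
            return "ahead"
        if self.current < TYPICAL_LEVEL:
            return "behind"
        return "normal"


# Illustrative values only.
security = PillarRating("Security & Compliance", current=1, target=3)
print(security.gap_to_target())   # 2
print(security.versus_typical())  # behind
```

Re-running the same calculation each quarter against fresh ratings is one simple way to make question 4 visible as a trend.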

Maturity ratings are also a communication tool. They compress technical findings into a form that non-technical stakeholders (board, finance, sales, customers) can understand. "We're at Level 1 on Reliability and need to be at Level 3 to sell into regulated industries" is a sentence a CFO can act on.


2. Why This Specific Methodology

Several methodologies could have been used. The hybrid chosen here was selected because each alternative has a specific weakness this model corrects.

| Alternative | What it does well | Why it isn't sufficient alone |
|---|---|---|
| CMMI applied broadly | The 1–5 level scale is industry-standard and well-understood | Treats the whole platform as a single number; too coarse to drive a roadmap |
| DORA metrics only | Excellent for delivery performance | Measures delivery throughput; says nothing about architecture, data, or security |
| AWS Well-Architected only | Strong on cloud-native pillars (security, reliability, cost) | Vendor-flavored, light on developer practices and team governance |
| OWASP / ISO 27001 only | Authoritative on security | Single pillar; doesn't surface trade-offs against other pillars |
| Pure narrative audit | Rich detail | Hard to track over time, hard to compare across teams |

The model used here combines pillar selection from multiple sources with CMMI's 1–5 level scale. Pillars are rated independently: a team can be Level 4 on Documentation and Level 1 on Security, and that is informative, not contradictory. Independent rating is the model's most useful property.

What the model tells us

  • The shape of the gap, not just the size. A platform with even ratings across pillars is in different trouble than one with a Level 4 on observability and a Level 1 on security.
  • The order of investment. High-leverage pillars (Architecture, Background Processing, Security) constrain progress on others; you cannot reach Level 3 on Reliability while Background Processing is at Level 1.
  • The realistic phasing. The transition from Level 1 to Level 2 on a pillar is typically one quarter; Level 2 to Level 3 is two quarters; Level 3 to Level 4 is a year or more.
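
The phasing rule of thumb can be turned into a rough lower bound. This is a sketch under stated assumptions: "a year or more" is treated as a minimum of four quarters, no figure is given in the text for Level 4 to Level 5, and the function name is hypothetical. The result is a floor, not a forecast.

```python
# Minimum quarters per single-level step, read off the rule of thumb above.
# Assumption: "a year or more" is counted as at least 4 quarters.
MIN_QUARTERS_PER_STEP = {(1, 2): 1, (2, 3): 2, (3, 4): 4}


def min_quarters(current: int, target: int) -> int:
    """Lower bound on the quarters needed to move one pillar from current to target."""
    if target < current:
        raise ValueError("target must not be below current")
    total = 0
    for level in range(current, target):
        step = (level, level + 1)
        if step not in MIN_QUARTERS_PER_STEP:
            # The text gives no typical duration for this step (e.g. Level 4 to 5).
            raise ValueError(f"no phasing estimate for step {step}")
        total += MIN_QUARTERS_PER_STEP[step]
    return total


print(min_quarters(1, 3))  # 3
print(min_quarters(1, 4))  # 7
```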

What the model does NOT tell us

  • Quality of customer experience. A Level 4 platform can still build the wrong product.
  • Velocity of delivery. DORA metrics are partly captured under CI/CD but the model does not measure feature throughput.
  • Strategic fit. Whether the right modules exist at all is a product-architecture question, not a maturity question.
  • Comparative ranking against competitors. Maturity is internal-facing; competitive position requires market analysis.

The model is a health check, not a strategy.


3. The 1–5 Scale

The level scale is descended from CMMI (Capability Maturity Model Integration), originally developed by the Software Engineering Institute at Carnegie Mellon. CMMI's level names have been lightly modernized to match contemporary engineering vocabulary:

| Level | CMMI name | Plain reading |
|---|---|---|
| 1 | Initial | Ad-hoc, heroic, undocumented, single point of human failure |
| 2 | Repeatable / Managed | The same person can do it twice the same way; partly documented |
| 3 | Defined | Codified, anyone on the team can do it, low truck-factor risk |
| 4 | Quantitatively Managed | Instrumented, measured, bounded by SLOs / error budgets |
| 5 | Optimizing | Continuous-improvement loop, automated remediation, self-correcting |

Why this scale

  • Five levels is enough granularity to drive roadmap decisions without producing false precision. A 10-point scale invites debates over half-points; a 3-point scale loses too much signal.
  • Level 3 is "normal." Most healthy enterprise platforms operate at Level 3–4 across most pillars. Level 5 is rare and usually only worth pursuing for the 1–2 pillars where the platform has a competitive moat.
  • Level transitions correspond to cultural changes, not just tooling. Moving from Level 2 to Level 3 typically requires writing things down (codification); from Level 3 to Level 4 requires instrumentation and measurement; from Level 4 to Level 5 requires automation of the improvement loop itself.
  • Adopted convention. Engineers, auditors, and procurement teams recognize this scale. It is not a custom dialect.

The "off the scale" cases

  • A score below 1 is not used. Either a capability is at Level 1 (chaotic but present) or it is genuinely absent — in which case the rating is N/A and the gap is explicitly flagged.
  • A score above 5 is not meaningful. Level 5 already implies continuous improvement; there is nothing higher to aspire to within this scale.

4. Origin of the Pillars

The 14 pillars are not invented for this assessment. Each is drawn from one or more established frameworks; the synthesis is opinionated but the components are standard.

| Pillar | Primary source | Secondary sources |
|---|---|---|
| Architecture & Modularity | TOGAF, "Building Microservices" (Newman) | Conway's Law / Domain-Driven Design |
| API Contract & Versioning | OpenAPI Initiative, REST API best practices | Stripe / Twilio API design guides |
| Data Architecture | "Designing Data-Intensive Applications" (Kleppmann) | AWS Well-Architected: Reliability pillar |
| Background Processing | Industry pattern (queue-based architecture) | Twelve-Factor: concurrency, processes |
| Security & Compliance | OWASP API Top 10, CIS Controls v8, ISO/IEC 27001 | SOC 2 Trust Services Criteria |
| Observability | OpenTelemetry, "Observability Engineering" (Majors et al.) | Google SRE Book |
| Reliability & Resilience | AWS Well-Architected: Reliability | Google SRE Book: error budgets, SLOs |
| Scalability | AWS Well-Architected: Performance Efficiency | Twelve-Factor: concurrency |
| Testing & Quality Gates | "Continuous Delivery" (Humble & Farley) | Test pyramid (Cohn) |
| CI/CD & Deployment | DORA / "Accelerate" (Forsgren et al.) | Continuous Delivery |
| Infrastructure as Code | DORA, AWS Well-Architected: Operational Excellence | Terraform / Pulumi conventions |
| Cost Visibility & FinOps | FinOps Foundation framework | AWS Well-Architected: Cost Optimization |
| Documentation & Knowledge Mgmt | DORA, internal-developer-platform best practice | "Software Engineering at Google" |
| Team Practices & Governance | DORA, SPACE framework, ITIL change management | "Accelerate" |

The pillars were chosen to cover both technical and operational dimensions — a pure technical model would miss governance and team practices; a pure operational model would miss data architecture and API contract. Both classes of failure are common at Level 1 platforms.

Pillars deliberately excluded

  • Performance as a standalone pillar — folded into Scalability and Reliability. Performance numbers without reliability and scalability context are not actionable.
  • User experience / product quality — out of scope for an engineering-platform assessment.
  • Hiring / talent strategy — partly captured under Team Practices, but the broader org-design question is the CTO's responsibility, not the platform assessment's.

5. What Each Pillar Means

Brief definitions; the assessment doc provides per-pillar findings.

5.1 Architecture & Modularity

How the codebase is structured: bounded contexts, module boundaries, separation of concerns, file/component sizing, the ability to change one part without disturbing others. Why it matters: dictates the cost of every future change.

5.2 API Contract & Versioning

Existence and discipline of an API specification (OpenAPI / AsyncAPI / GraphQL schema), version management, deprecation policy, schema validation. Why it matters: defines the platform's contract with external integrators and shields it from breaking changes.

5.3 Data Architecture

Schema management (migrations), driver consistency, separation of OLTP and analytics, replication strategy, retention policy, data-locality concerns. Why it matters: the database is usually the hardest thing to change in an enterprise platform; getting it wrong has decade-long consequences.

5.4 Background Processing

Job queue infrastructure, worker concurrency model, retry / dead-letter / idempotency semantics, observable queue health. Why it matters: for SaaS platforms, the worker fleet typically does more work than the request fleet; instability here is felt by every user.
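
To make the retry / dead-letter / idempotency vocabulary concrete, here is a minimal in-memory sketch. The class and its fields are hypothetical and stand in for a real queue and a durable store; it is not a description of the platform's workers.

```python
from typing import Any, Callable, Dict, List, Tuple


class InMemoryJobRunner:
    """Hypothetical worker: dedupes by idempotency key, retries, then dead-letters."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self.completed: Dict[str, Any] = {}               # idempotency key -> result
        self.dead_letter: List[Tuple[str, Any, str]] = []  # (key, payload, last error)

    def run(self, key: str, payload: Any, handler: Callable[[Any], Any]) -> Any:
        # Idempotency: a key that already completed never triggers the side effect again.
        if key in self.completed:
            return self.completed[key]
        last_error = ""
        for _ in range(self.max_attempts):
            try:
                result = handler(payload)
            except Exception as exc:  # retry on any failure in this sketch
                last_error = repr(exc)
                continue
            self.completed[key] = result
            return result
        # Retries exhausted: park the job for inspection instead of dropping it.
        self.dead_letter.append((key, payload, last_error))
        return None


sent: List[str] = []
runner = InMemoryJobRunner()
runner.run("email:order-42", "order-42", lambda p: sent.append(p) or "sent")
runner.run("email:order-42", "order-42", lambda p: sent.append(p) or "sent")  # redelivery
print(len(sent))  # 1
```

A production version would keep the completed-key record in durable shared storage so that the guarantee survives restarts and multi-host deployment.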

5.5 Security & Compliance

Authentication, authorization, secrets management, network and platform-level controls, compliance posture (SOC 2, ISO 27001, GDPR), audit logging, supply-chain security. Why it matters: procurement gates, breach risk, regulatory exposure.

5.6 Observability

Structured logging, metrics, distributed tracing, correlation IDs, dashboards, alerting, SLOs. Why it matters: determines mean-time-to-detect and mean-time-to-resolve. Without observability, on-call is reactive guessing.
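
As an illustration of structured logging with correlation IDs, the sketch below emits one JSON object per log line and threads a single ID through every line of a request. The function names, field names, and the example order ID are assumptions chosen for the example.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, List


def log_event(message: str, correlation_id: str, **fields: Any) -> str:
    """Emit one machine-parseable log line and return it."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "msg": message,
        "correlation_id": correlation_id,
        **fields,
    }
    line = json.dumps(record)
    print(line)
    return line


def handle_request(order_id: str) -> List[str]:
    """Hypothetical request handler: every line it logs shares one correlation ID."""
    correlation_id = str(uuid.uuid4())
    return [
        log_event("request received", correlation_id, order_id=order_id),
        log_event("job enqueued", correlation_id, order_id=order_id, queue="emails"),
    ]


lines = handle_request("order-42")
```

Filtering logs by `correlation_id` then reconstructs the path of one request across components, which is what shortens time-to-resolve.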

5.7 Reliability & Resilience

Circuit breakers, retries with backoff, graceful shutdown, single-points-of-failure analysis, disaster recovery, backup/restore validation, RTO / RPO commitments. Why it matters: uptime SLAs are the most visible promise to customers.
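
A minimal sketch of retry with exponential backoff follows; it covers only the retry item from the list above, not circuit breaking. The function name, defaults, and the use of full jitter are assumptions for illustration.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with capped exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter avoids synchronized retries
    raise ValueError("attempts must be at least 1")


calls = {"n": 0}


def flaky() -> str:
    """Fails twice, then succeeds (stand-in for a transient dependency error)."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


sleeps: List[float] = []
result = retry_with_backoff(flaky, sleep=sleeps.append)  # record delays instead of sleeping
print(result, len(sleeps))  # ok 2
```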

5.8 Scalability

Horizontal scaling readiness, bottleneck analysis, load testing, capacity planning, resource quotas. Why it matters: distinguishes platforms that grow gracefully from platforms that hit a ceiling and require a rewrite.

5.9 Testing & Quality Gates

Static analysis, unit / integration / E2E tests, contract tests, code coverage on critical paths, test gates in CI. Why it matters: the floor below which refactoring becomes too dangerous to attempt.

5.10 CI/CD & Deployment

Build pipeline, deployment automation, blue/green / canary, rollback, artifact signing, environment promotion. Why it matters: governs the cost and risk of every change reaching production.

5.11 Infrastructure as Code

Terraform / Pulumi / equivalent, environment definitions, IaC for the full stack (compute, network, data, observability), drift detection. Why it matters: without IaC, environments diverge invisibly and disaster recovery is a guess-and-check exercise.

5.12 Cost Visibility & FinOps

Per-service / per-feature / per-tenant cost attribution, anomaly detection, budgets, unit-economics tracking. Why it matters: unprofitable customers, runaway AI spend, and pricing decisions all depend on this visibility.

5.13 Documentation & Knowledge Management

Architecture documents, runbooks, ADRs, API docs, onboarding materials. Why it matters: truck-factor mitigation; onboarding speed; transfer of context across team transitions.

5.14 Team Practices & Governance

Code review standards, change management, on-call rotation, postmortems, ADRs as a process, security-aware PR review. Why it matters: technology fixes alone do not produce maturity; processes are how maturity is sustained.


6. Grading Rubric

Each level has both a generic definition and pillar-specific criteria. A pillar earns a level only when all criteria for that level are met.

6.1 Generic level definitions

Level 1: Initial.

  • Capability exists in some form, but ad-hoc.
  • Outcomes depend on which individual is doing the work.
  • No documentation, or documentation is so out-of-date it misleads.
  • Failures are common; root-cause analysis is rare; the same failure recurs.
  • Truck factor of 1 (a single person leaves and the capability degrades materially).

Level 2: Repeatable.

  • A team can produce the same outcome twice if conditions are similar.
  • Some documentation exists; some tooling is in place.
  • Outcomes are inconsistent across people or contexts.
  • Failures are noticed; root-cause analysis is sometimes done; learnings are sometimes captured.
  • Truck factor of 2–3.

Level 3: Defined.

  • The way the capability is delivered is codified: written down, agreed on, tooled.
  • Anyone on the team can produce a typical outcome by following the process.
  • Tooling enforces the process where possible (linters, CI gates, templates).
  • Failures are investigated; postmortems are written; learnings inform the codified process.
  • Truck factor approaches team size in most areas.

Level 4: Quantitatively Managed.

  • The capability is measured. Specific quantitative targets exist (SLOs, coverage thresholds, error budgets).
  • Trend lines are tracked; deviations are alerted on.
  • Decisions about the capability are evidence-based, not opinion-based.
  • Improvement work is scoped to specific metrics moving in specific directions.

Level 5: Optimizing.

  • The capability has a continuous improvement loop built in.
  • Process / tooling improvements are surfaced automatically (e.g., flaky-test detection, anomaly detection on cost).
  • Self-healing or auto-remediation is present where appropriate.
  • The capability gets better without explicit human intervention each cycle.

6.2 Pillar-specific criteria

The table below shows the minimum concrete evidence for a pillar to be rated at Level 1, Level 3 (the target), and Level 4 (aspirational). Criteria are cumulative: a pillar at Level 3 must meet all Level 3 criteria and all criteria of the levels below it.

| Pillar | Level 1 | Level 3 (target) | Level 4 (aspirational) |
|---|---|---|---|
| Architecture & Modularity | Single-file modules > 5k lines; no module boundaries | Bounded contexts with explicit module APIs; lint enforces import boundaries | Module health metrics (cyclomatic complexity, dependency churn) tracked over time |
| API Contract & Versioning | No spec; no version prefix; no deprecation policy | OpenAPI spec generated from code; URL versioning; documented deprecation policy | Spec compliance gated in CI; consumer-driven contract tests |
| Data Architecture | No migration tool; multiple DB drivers; analytics in OLTP | One driver; migration tool used for every schema change; read replica for analytics | Slow-query SLO; per-tenant data partitioning; query-cost dashboards |
| Background Processing | Cron-polling on DB; no queue; no idempotency | Real queue (Redis / SQS / etc.); idempotency keys on side-effects; DLQ | Queue-depth SLO; per-job-type error rate dashboards; auto-scaling on backlog |
| Security & Compliance | Secrets in source; no rate limit; CORS open | Secrets manager; rate limit on auth + cost endpoints; helmet; SAST + SCA in CI | SOC 2 Type II; quarterly pentest; audit log on all privileged actions |
| Observability | console.log only; no metrics; no traces | Structured logs with correlation IDs; metrics on golden signals; distributed traces | SLOs and error budgets defined and tracked; alerts mapped to runbooks |
| Reliability & Resilience | No circuit breakers; no graceful shutdown; single DB SPOF | Circuit breakers + retry-with-backoff; graceful shutdown; tested backups | Defined RTO / RPO; chaos testing in staging; multi-AZ |
| Scalability | Cannot run on a 2nd host without correctness regressions | Horizontally scalable; load tests run before release | Auto-scaling on measured signals; capacity model maintained per-service |
| Testing & Quality Gates | Zero tests | Unit + integration on critical paths; CI gate; ≥ 60% coverage on changed code | Mutation testing; contract tests; flaky-test quarantine |
| CI/CD & Deployment | Manual deploy; SSH + process restart | Containerized deploy; automated rollback on health-check fail; ≥ 1 deploy/day | Canary / blue-green; DORA elite metrics tracked; feature flags |
| Infrastructure as Code | No IaC; environments differ invisibly | Terraform (or equiv) for all infra; environments parameterized | Drift detection automated; IaC drift gates merges |
| Cost Visibility & FinOps | No per-service tagging; cloud bill is a surprise | Cost tags on every resource; weekly review; anomaly alerts | Per-tenant unit economics; cost-attribution gates feature decisions |
| Documentation & Knowledge Mgmt | Tribal knowledge only | Architecture, runbook, ADR documents in-repo; reviewed in PRs | Documentation health metric (freshness, completeness) tracked |
| Team Practices & Governance | Ad-hoc PR review; no standard | Codified review checklist; change-management for high-risk changes; postmortems | DORA metrics tracked per-team; postmortem actions completed and verified |
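
The "all criteria at this level and every level below" rule can be sketched as a small function. This is an illustration under assumptions: the function name and data shape are hypothetical, the evidence values are invented for the example, and only the levels that have tabulated criteria (3 and 4 here) are evaluated. Levels without pillar-specific criteria would still have to be judged against the generic definitions in 6.1.

```python
from typing import Dict


def earned_level(evidence: Dict[int, Dict[str, bool]], floor: int = 1) -> int:
    """Highest tabulated level whose criteria, and those of every lower
    tabulated level, are all met. Falls back to `floor` otherwise."""
    level = floor
    for candidate in sorted(evidence):
        criteria = evidence[candidate]
        if criteria and all(criteria.values()):
            level = candidate
        else:
            break  # a miss at this level blocks every higher level
    return level


# Criteria text taken from the Observability row; the True/False values are invented.
observability = {
    3: {
        "structured logs with correlation IDs": True,
        "metrics on golden signals": True,
        "distributed traces": True,
    },
    4: {
        "SLOs and error budgets defined and tracked": True,
        "alerts mapped to runbooks": False,
    },
}
print(earned_level(observability))  # 3
```

Note that meeting the Level 4 criteria while missing a Level 3 criterion does not earn Level 4; the loop stops at the first level with an unmet criterion.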

6.3 Worked examples

Architecture & Modularity at Level 1. Two route files totaling 46k lines; no module boundaries; cross-cutting helpers imported from anywhere. Why this is L1, not L2: there is no internal structure that a second engineer could reproduce; understanding requires reading the entire file. (This describes the current state.)

Architecture & Modularity at Level 3. Codebase organized into modules/identity, modules/content, etc.; each module exposes a typed public API; ESLint or equivalent prevents internals from being imported across module boundaries; new engineers can be productive in one module without reading the others.

Security & Compliance at Level 1. Hardcoded secrets in source; CORS accepts all origins; no rate limiting; no helmet; no SAST. Why this is L1, not L2: none of the basic controls are reproducibly in place. (This describes the current state.)

Security & Compliance at Level 3. Secrets in a manager; rate limits on auth and cost-bearing endpoints; helmet with reasonable defaults; SAST and dependency scanning in CI; threat model exists. SOC 2 Type I in progress.

Background Processing at Level 1. 108 PM2 workers polling MySQL; no idempotency keys; concurrency control is a per-process flag that doesn't survive multi-host deployment. (This describes the current state.)

Background Processing at Level 4. Queue-depth SLO defined and tracked; per-job-type latency and error-rate dashboards; auto-scaling on backlog; idempotency tested in CI; chaos tests for retry behavior.


7. How to Apply Ratings

7.1 Evidence required

A rating must be backed by observable evidence, not opinion. Acceptable evidence:

  • Source code / repository state (file sizes, dependency graph, presence of tests)
  • Configuration files (ecosystem.config.js, IaC files, CI workflows)
  • Operational artifacts (dashboards, runbooks, alert configurations, postmortem documents)
  • Process artifacts (PR templates, code review checklists, change-management tickets)
  • Direct measurement (uptime numbers, deployment frequency, MTTR from real incidents)

Self-assessment without evidence is not a rating; it is a hope.
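
One way to enforce this rule in tooling is to make a rating impossible to record without evidence attached. The types below are a hypothetical sketch; the evidence categories mirror the list above, and the reference strings are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class EvidenceKind(Enum):
    SOURCE = "source code / repository state"
    CONFIGURATION = "configuration files"
    OPERATIONAL = "operational artifacts"
    PROCESS = "process artifacts"
    MEASUREMENT = "direct measurement"


@dataclass(frozen=True)
class Evidence:
    kind: EvidenceKind
    reference: str  # where a reviewer can check it (path, dashboard name, ticket)


@dataclass(frozen=True)
class Rating:
    pillar: str
    level: int
    evidence: Tuple[Evidence, ...]

    def __post_init__(self) -> None:
        if not 1 <= self.level <= 5:
            raise ValueError("level must be between 1 and 5")
        if not self.evidence:
            # A rating with nothing to verify is rejected outright.
            raise ValueError(f"rating for {self.pillar!r} has no evidence")


rating = Rating(
    pillar="Background Processing",
    level=1,
    evidence=(Evidence(EvidenceKind.CONFIGURATION, "ecosystem.config.js"),),
)
print(rating.level, len(rating.evidence))  # 1 1
```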

7.2 Who decides

Ratings should be reviewed by at least two senior engineers, ideally with one external to the team to counteract familiarity bias. For pillars with compliance implications (Security, Data Architecture), an external auditor's view is the strongest signal.

7.3 Cadence

  • Quarterly re-rating is the recommended cadence for an active modernization program. More frequent than that produces noise; less frequent loses the feedback loop.
  • Annual re-rating is sufficient once the platform reaches Level 3 across most pillars.
  • Event-driven re-rating when major changes occur (a service extraction, a compliance certification, an incident that exposes a gap).

7.4 What counts as a level transition

A pillar is considered to have transitioned to a new level only when the criteria for that level have been met for one full quarter without regression. A capability that meets Level 3 criteria for two weeks and then quietly degrades is still effectively at Level 2 — discipline is part of the rating.
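
The hold-period rule can be checked mechanically from dated observations. In this sketch, "one full quarter" is approximated as 90 days, the function name is hypothetical, and the periods between checks are assumed to have been passing if the checks on either side passed; all three are assumptions.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

HOLD_PERIOD = timedelta(days=90)  # assumption: one quarter approximated as 90 days


def transition_confirmed(checks: List[Tuple[date, bool]], as_of: date) -> bool:
    """True if the new level's criteria have been met continuously for the hold period.

    `checks` are dated observations of whether all criteria for the new level were met.
    Any failed check resets the clock.
    """
    run_start: Optional[date] = None
    for when, met in sorted(checks):
        if when > as_of:
            break
        if met:
            if run_start is None:
                run_start = when
        else:
            run_start = None  # regression: the run starts over
    return run_start is not None and as_of - run_start >= HOLD_PERIOD


steady = [(date(2024, m, 1), True) for m in (1, 2, 3, 4)]
regressed = [(date(2024, 1, 1), True), (date(2024, 2, 1), True),
             (date(2024, 3, 1), False), (date(2024, 4, 1), True)]
print(transition_confirmed(steady, date(2024, 4, 1)))     # True
print(transition_confirmed(regressed, date(2024, 4, 1)))  # False
```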

7.5 Targets vs. fantasies

A 12-month target for a pillar should be at most one level above current, except in unusual circumstances. Two-level transitions in a single year are rare and usually involve hiring or organizational change, not just engineering. Targets that exceed plausibility undermine the model.
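
A simple guard can flag targets that break this rule before they reach a roadmap. The function name, return strings, and the idea of recording a written justification for exceptions are assumptions for illustration.

```python
from typing import Optional

MAX_LEVELS_PER_YEAR = 1  # from the rule above: at most one level above current


def review_target(current: int, target: int, justification: Optional[str] = None) -> str:
    """Classify a 12-month target as plausible, a justified exception, or implausible."""
    if not (1 <= current <= 5 and 1 <= target <= 5):
        raise ValueError("levels must be between 1 and 5")
    if target - current <= MAX_LEVELS_PER_YEAR:
        return "plausible"
    if justification:
        # Larger jumps are allowed only with an explicit reason on record.
        return f"exception: {justification}"
    return "implausible"


print(review_target(1, 2))  # plausible
print(review_target(1, 3))  # implausible
print(review_target(1, 3, "dedicated platform team hired"))
```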


8. Limitations

8.1 Subjectivity

The rubric is structured to minimize judgment calls, but no rubric is perfectly objective. Two qualified raters will agree on most pillars and disagree on edges. The model's value comes from trend over time, not from any single rating's precision.

8.2 Pillar interaction is not modeled

The pillars are independent in the table but interdependent in reality. Improving Observability while Security remains at Level 1 produces a platform that can see its own breaches but cannot prevent them. The model surfaces gaps; it does not prescribe sequencing — that is the roadmap's job.

8.3 Maturity is not value

A Level 5 platform that solves the wrong problem is worth less than a Level 2 platform that solves the right one. The model assesses how the platform is built, not whether it should exist.

8.4 The model can be gamed

Any rubric can be optimized for. A team that wants to "show progress" can re-categorize criteria to artificially advance ratings. The model assumes good-faith application, with external validation as the safeguard.

8.5 Best-in-class is rare

Level 5 across multiple pillars is uncommon outside large platform companies that have invested decades and hundreds of engineers. For most enterprise platforms, Level 3 across most pillars and Level 4 on the 2–3 pillars most critical to the business is the realistic destination.


9. Related Documents

  • Enterprise Readiness Assessment: uses this model to rate the platform and produce a roadmap
  • Architecture Overview: current technical state referenced in pillar evidence
  • AWS Well-Architected Framework whitepapers: the source for several pillars
  • DORA / "Accelerate": source for CI/CD and team-practices pillars
  • CMMI: source for the 1–5 level scale