Maturity Model & Methodology

This document explains the maturity model used in the Enterprise Readiness Assessment — why this methodology was chosen, what it does and does not tell us, what each pillar means, where the pillars come from, and the criteria for rating each pillar at each level.

It exists so that the ratings in the assessment are auditable: any reader can independently verify the level assigned to a pillar by checking the criteria here against observable evidence in the codebase and the operational environment.

1. Purpose of a Maturity Model

A maturity model answers four practical questions that a list of findings cannot:

Where are we, on a comparable scale? Saying "the platform has no rate limiting" is true but lacks calibration. Saying "Security is at Level 1 of 5" tells you immediately how that compares to a typical enterprise platform (Level 3) or a best-in-class one (Level 5).
Where do we need to be? A target rating per pillar makes the destination explicit. "Get to Level 3 on Observability" is a tractable goal; "improve observability" is not.
What is the gap? The difference between current and target tells you scope. The difference between current and typical tells you whether you are ahead, behind, or normal.
Are we making progress? A repeatable rubric lets you re-rate quarterly and see motion. Without a model, "are we improving?" is a vibe check.
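
The gap questions above reduce to simple arithmetic on ratings. The following is a minimal sketch, not part of the assessment tooling: the `PillarRating` class and its field names are hypothetical, the target of 3 is illustrative, and the "typical" reference point of Level 3 is taken from the text.

```python
from dataclasses import dataclass

TYPICAL_LEVEL = 3  # the "typical enterprise platform" reference point from the text


@dataclass(frozen=True)
class PillarRating:
    """Hypothetical record of one pillar's current and target level (1-5)."""
    pillar: str
    current: int
    target: int

    def gap_to_target(self) -> int:
        """Levels still to climb; this is the scope of the work."""
        return self.target - self.current

    def gap_to_typical(self) -> int:
        """Negative means behind a typical platform, zero normal, positive ahead."""
        return self.current - TYPICAL_LEVEL


security = PillarRating("Security & Compliance", current=1, target=3)
print(security.gap_to_target())   # 2
print(security.gap_to_typical())  # -2
```

Keeping the two gaps separate preserves the distinction the text draws: one number describes scope, the other describes standing relative to peers.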

Maturity ratings are also a communication tool. They compress technical findings into a form that non-technical stakeholders (board, finance, sales, customers) can understand. "We're at Level 1 on Reliability and need to be at Level 3 to sell into regulated industries" is a sentence a CFO can act on.

2. Why This Specific Methodology

Several methodologies could have been used. The hybrid chosen here was selected because each alternative has a specific weakness this model corrects.

Alternative	What it does well	Why it isn't sufficient alone
CMMI applied broadly	The 1–5 level scale is industry-standard and well-understood	Treats the whole platform as a single number — too coarse to drive a roadmap
DORA metrics only	Excellent for delivery performance	Measures delivery throughput; says nothing about architecture, data, or security
AWS Well-Architected only	Strong on cloud-native pillars (security, reliability, cost)	Vendor-flavored, light on developer practices and team governance
OWASP / ISO 27001 only	Authoritative on security	Single pillar; doesn't surface trade-offs against other pillars
Pure narrative audit	Rich detail	Hard to track over time, hard to compare across teams

The model used here combines pillar selection from multiple sources with CMMI's 1–5 level scale. Pillars are rated independently: a team can be Level 4 on Documentation and Level 1 on Security, and that is informative, not contradictory. Independent rating is the model's most useful property.

What the model tells us

The shape of the gap, not just the size. A platform with even ratings across pillars is in different trouble than one with a Level 4 on observability and a Level 1 on security.
The order of investment. High-leverage pillars (Architecture, Background Processing, Security) constrain progress on others; you cannot reach Level 3 on Reliability while Background Processing is at Level 1.
The realistic phasing. The transition from Level 1 to Level 2 on a pillar is typically one quarter; Level 2 to Level 3 is two quarters; Level 3 to Level 4 is a year or more.
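
The phasing figures above can be turned into a rough lower-bound estimate. This is a sketch under stated assumptions: the function name is hypothetical, "a year or more" is treated as a minimum of four quarters, steps are assumed to happen one after another, and no figure is given in the text for Level 4 to Level 5, so that step is rejected rather than guessed.

```python
# Typical quarters per one-level step, as stated in the text.
# "A year or more" for 3 -> 4 is treated as a lower bound of 4 quarters.
QUARTERS_PER_STEP = {(1, 2): 1, (2, 3): 2, (3, 4): 4}


def min_quarters(current: int, target: int) -> int:
    """Lower-bound estimate of quarters needed, taking one level at a time."""
    if target < current:
        raise ValueError("target must not be below current")
    total = 0
    for level in range(current, target):
        step = (level, level + 1)
        if step not in QUARTERS_PER_STEP:
            raise ValueError(f"no typical duration given for step {step}")
        total += QUARTERS_PER_STEP[step]
    return total


print(min_quarters(1, 3))  # 3
print(min_quarters(2, 4))  # 6
```

The result is an optimistic floor for planning conversations, not a forecast.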

What the model does NOT tell us

Quality of customer experience. A Level 4 platform can still build the wrong product.
Velocity of delivery. DORA metrics are partly captured under CI/CD but the model does not measure feature throughput.
Strategic fit. Whether the right modules exist at all is a product-architecture question, not a maturity question.
Comparative ranking against competitors. Maturity is internal-facing; competitive position requires market analysis.

The model is a health check, not a strategy.

3. The 1–5 Scale

The level scale is descended from CMMI (Capability Maturity Model Integration), originally developed by the Software Engineering Institute at Carnegie Mellon. CMMI's level names have been lightly modernized to match contemporary engineering vocabulary:

Level	CMMI name	Plain reading
1	Initial	Ad-hoc, heroic, undocumented, single point of human failure
2	Repeatable / Managed	The same person can do it twice the same way; partly documented
3	Defined	Codified, anyone on the team can do it, low truck-factor risk
4	Quantitatively Managed	Instrumented, measured, bounded by SLOs / error budgets
5	Optimizing	Continuous-improvement loop, automated remediation, self-correcting

Why this scale

Five levels is enough granularity to drive roadmap decisions without producing false precision. A 10-point scale invites debates over half-points; a 3-point scale loses too much signal.
Level 3 is "normal." Most healthy enterprise platforms operate at Level 3–4 across most pillars. Level 5 is rare and usually only worth pursuing for the 1–2 pillars where the platform has a competitive moat.
Level transitions correspond to cultural changes, not just tooling. Moving from Level 2 to Level 3 typically requires writing things down (codification); from Level 3 to Level 4 requires instrumentation and measurement; from Level 4 to Level 5 requires automation of the improvement loop itself.
Adopted convention. Engineers, auditors, and procurement teams recognize this scale. It is not a custom dialect.

The "off the scale" cases

A score below 1 is not used. Either a capability is at Level 1 (chaotic but present) or it is genuinely absent — in which case the rating is N/A and the gap is explicitly flagged.
A score above 5 is not meaningful. Level 5 already implies continuous improvement; there is nothing higher to aspire to within this scale.

4. Origin of the Pillars

The 14 pillars are not invented for this assessment. Each is drawn from one or more established frameworks; the synthesis is opinionated but the components are standard.

Pillar	Primary source	Secondary sources
Architecture & Modularity	TOGAF, "Building Microservices" (Newman)	Conway's Law / Domain-Driven Design
API Contract & Versioning	OpenAPI Initiative, REST API best practices	Stripe / Twilio API design guides
Data Architecture	"Designing Data-Intensive Applications" (Kleppmann)	AWS Well-Architected — Reliability pillar
Background Processing	Industry pattern (queue-based architecture)	Twelve-Factor — concurrency, processes
Security & Compliance	OWASP API Top 10, CIS Controls v8, ISO/IEC 27001	SOC 2 Trust Services Criteria
Observability	OpenTelemetry, "Observability Engineering" (Majors et al.)	Google SRE Book
Reliability & Resilience	AWS Well-Architected — Reliability	Google SRE Book — error budgets, SLOs
Scalability	AWS Well-Architected — Performance Efficiency	Twelve-Factor — concurrency
Testing & Quality Gates	"Continuous Delivery" (Humble & Farley)	Test pyramid (Cohn)
CI/CD & Deployment	DORA / "Accelerate" (Forsgren et al.)	Continuous Delivery
Infrastructure as Code	DORA, AWS Well-Architected — Operational Excellence	Terraform / Pulumi conventions
Cost Visibility & FinOps	FinOps Foundation framework	AWS Well-Architected — Cost Optimization
Documentation & Knowledge Mgmt	DORA, internal-developer-platform best practice	"Software Engineering at Google"
Team Practices & Governance	DORA, SPACE framework, ITIL change management	"Accelerate"

The pillars were chosen to cover both technical and operational dimensions — a pure technical model would miss governance and team practices; a pure operational model would miss data architecture and API contract. Both classes of failure are common at Level 1 platforms.

Pillars deliberately excluded

Performance as a standalone pillar — folded into Scalability and Reliability. Performance numbers without reliability and scalability context are not actionable.
User experience / product quality — out of scope for an engineering-platform assessment.
Hiring / talent strategy — partly captured under Team Practices, but the broader org-design question is the CTO's responsibility, not the platform assessment's.

5. What Each Pillar Means

Brief definitions; the assessment doc provides per-pillar findings.

5.1 Architecture & Modularity

How the codebase is structured: bounded contexts, module boundaries, separation of concerns, file/component sizing, the ability to change one part without disturbing others. Why it matters: dictates the cost of every future change.

5.2 API Contract & Versioning

Existence and discipline of an API specification (OpenAPI / AsyncAPI / GraphQL schema), version management, deprecation policy, schema validation. Why it matters: defines the platform's contract with external integrators and shields it from breaking changes.

5.3 Data Architecture

Schema management (migrations), driver consistency, separation of OLTP and analytics, replication strategy, retention policy, data-locality concerns. Why it matters: the database is usually the hardest thing to change in an enterprise platform; getting it wrong has decade-long consequences.

5.4 Background Processing

Job queue infrastructure, worker concurrency model, retry / dead-letter / idempotency semantics, observable queue health. Why it matters: for SaaS platforms, the worker fleet typically does more work than the request fleet; instability here is felt by every user.
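
To make the retry, dead-letter, and idempotency terms concrete, here is a minimal in-memory sketch. It is not the platform's worker code: `run_worker`, `MAX_ATTEMPTS`, and the job shape are hypothetical, and a real system would use a durable queue and a persistent store for processed keys.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry budget


def run_worker(jobs, handler, processed_keys, dead_letters):
    """Drain jobs, retrying failures and parking exhausted jobs in dead_letters."""
    queue = deque((job, 1) for job in jobs)
    while queue:
        job, attempt = queue.popleft()
        key = job["idempotency_key"]
        if key in processed_keys:
            continue  # duplicate delivery: the side effect was already applied
        try:
            handler(job)
        except Exception:
            if attempt < MAX_ATTEMPTS:
                queue.append((job, attempt + 1))  # retry later
            else:
                dead_letters.append(job)  # give up, keep the job for inspection
        else:
            processed_keys.add(key)


sent = []


def send_email(job):
    """Stand-in side effect that fails permanently for one address."""
    if job["to"] == "bad@example.com":
        raise RuntimeError("mailbox rejected")
    sent.append(job["to"])


jobs = [
    {"idempotency_key": "welcome-42", "to": "a@example.com"},
    {"idempotency_key": "welcome-42", "to": "a@example.com"},  # duplicate delivery
    {"idempotency_key": "welcome-43", "to": "bad@example.com"},
]
processed, dlq = set(), []
run_worker(jobs, send_email, processed, dlq)
print(sent)      # ['a@example.com']
print(len(dlq))  # 1
```

The duplicate job is skipped because its key was already recorded, and the failing job lands in the dead-letter list after three attempts instead of being retried forever.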

5.5 Security & Compliance

Authentication, authorization, secrets management, network and platform-level controls, compliance posture (SOC 2, ISO 27001, GDPR), audit logging, supply-chain security. Why it matters: procurement gates, breach risk, regulatory exposure.

5.6 Observability

Structured logging, metrics, distributed tracing, correlation IDs, dashboards, alerting, SLOs. Why it matters: determines mean-time-to-detect and mean-time-to-resolve. Without observability, on-call is reactive guessing.
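
As an illustration of structured logging with correlation IDs, the sketch below uses Python's standard `logging` module. The `JsonFormatter` class, the logger name, and the `correlation_id` field name are assumptions made for this example; the point is only that every log line is machine-parseable and carries the same ID for one request.

```python
import io
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            # Set via the `extra` argument on each logging call.
            "correlation_id": getattr(record, "correlation_id", None),
        })


stream = io.StringIO()  # stands in for stdout or a log shipper
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("maturity_example")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

correlation_id = str(uuid.uuid4())  # one ID per incoming request
logger.info("order received", extra={"correlation_id": correlation_id})
logger.info("payment charged", extra={"correlation_id": correlation_id})

lines = [json.loads(line) for line in stream.getvalue().splitlines()]
print(lines[0]["correlation_id"] == lines[1]["correlation_id"])  # True
```

With both properties in place, an on-call engineer can filter all log lines for a single request rather than reading interleaved free-text output.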

5.7 Reliability & Resilience

Circuit breakers, retries with backoff, graceful shutdown, single-points-of-failure analysis, disaster recovery, backup/restore validation, RTO / RPO commitments. Why it matters: uptime SLAs are the most visible promise to customers.
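
Two of the patterns named above, circuit breakers and retries with backoff, can be sketched in a few lines. This is a simplified illustration, not a production implementation: the class, its thresholds, and the delay values are hypothetical, and a real service would more likely use an established library.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when calls are rejected without touching the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0  # success closes the circuit
        return result


def backoff_delays(base=0.5, factor=2.0, attempts=4):
    """Exponential backoff schedule in seconds, without jitter."""
    return [base * factor ** i for i in range(attempts)]


print(backoff_delays())  # [0.5, 1.0, 2.0, 4.0]
```

The breaker protects a struggling dependency from further load, while the backoff schedule spaces out retries; adding random jitter to the delays is a common refinement omitted here for brevity.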

5.8 Scalability

Horizontal scaling readiness, bottleneck analysis, load testing, capacity planning, resource quotas. Why it matters: distinguishes platforms that grow gracefully from platforms that hit a ceiling and require a rewrite.

5.9 Testing & Quality Gates

Static analysis, unit / integration / E2E tests, contract tests, code coverage on critical paths, test gates in CI. Why it matters: the floor below which refactoring becomes too dangerous to attempt.

5.10 CI/CD & Deployment

Build pipeline, deployment automation, blue/green / canary, rollback, artifact signing, environment promotion. Why it matters: governs the cost and risk of every change reaching production.

5.11 Infrastructure as Code

Terraform / Pulumi / equivalent, environment definitions, IaC for the full stack (compute, network, data, observability), drift detection. Why it matters: without IaC, environments diverge invisibly and disaster recovery is a guess-and-check exercise.

5.12 Cost Visibility & FinOps

Per-service / per-feature / per-tenant cost attribution, anomaly detection, budgets, unit-economics tracking. Why it matters: unprofitable customers, runaway AI spend, and pricing decisions all depend on this visibility.

5.13 Documentation & Knowledge Management

Architecture documents, runbooks, ADRs, API docs, onboarding materials. Why it matters: truck-factor mitigation; onboarding speed; transfer of context across team transitions.

5.14 Team Practices & Governance

Code review standards, change management, on-call rotation, postmortems, ADRs as a process, security-aware PR review. Why it matters: technology fixes alone do not produce maturity; processes are how maturity is sustained.

6. Grading Rubric

Each level has both a generic definition and pillar-specific criteria. A pillar earns a level only when all criteria for that level are met.

6.1 Generic level definitions

Level 1: Initial.
- Capability exists in some form, but ad-hoc.
- Outcomes depend on which individual is doing the work.
- No documentation, or documentation is so out-of-date it misleads.
- Failures are common; root-cause analysis is rare; the same failure recurs.
- Truck factor of 1 (a single person leaves and the capability degrades materially).

Level 2: Repeatable.
- A team can produce the same outcome twice if conditions are similar.
- Some documentation exists; some tooling is in place.
- Outcomes are inconsistent across people or contexts.
- Failures are noticed; root-cause analysis is sometimes done; learnings are sometimes captured.
- Truck factor of 2–3.

Level 3: Defined.
- The way the capability is delivered is codified: written down, agreed on, tooled.
- Anyone on the team can produce a typical outcome by following the process.
- Tooling enforces the process where possible (linters, CI gates, templates).
- Failures are investigated; postmortems are written; learnings inform the codified process.
- Truck factor approaches team size in most areas.

Level 4: Quantitatively Managed.
- The capability is measured. Specific quantitative targets exist (SLOs, coverage thresholds, error budgets).
- Trend lines are tracked; deviations are alerted on.
- Decisions about the capability are evidence-based, not opinion-based.
- Improvement work is scoped to specific metrics moving in specific directions.

Level 5: Optimizing.
- The capability has a continuous improvement loop built in.
- Process / tooling improvements are surfaced automatically (e.g., flaky-test detection, anomaly detection on cost).
- Self-healing or auto-remediation is present where appropriate.
- The capability gets better without explicit human intervention each cycle.

6.2 Pillar-specific criteria

The table below shows the minimum concrete evidence required for a pillar to earn Levels 1, 3, and 4; for Levels 2 and 5, apply the generic definitions in 6.1. A pillar at Level 3 must meet all Level-3 criteria and the criteria of every preceding level.

Pillar	Level 1	Level 3 (target)	Level 4 (aspirational)
Architecture & Modularity	Single-file modules > 5k lines; no module boundaries	Bounded contexts with explicit module APIs; lint enforces import boundaries	Module health metrics (cyclomatic complexity, dependency churn) tracked over time
API Contract & Versioning	No spec; no version prefix; no deprecation policy	OpenAPI spec generated from code; URL versioning; documented deprecation policy	Spec compliance gated in CI; consumer-driven contract tests
Data Architecture	No migration tool; multiple DB drivers; analytics in OLTP	One driver; migration tool used for every schema change; read replica for analytics	Slow-query SLO; per-tenant data partitioning; query-cost dashboards
Background Processing	Cron-polling on DB; no queue; no idempotency	Real queue (Redis / SQS / etc.); idempotency keys on side-effects; DLQ	Queue-depth SLO; per-job-type error rate dashboards; auto-scaling on backlog
Security & Compliance	Secrets in source; no rate limit; CORS open	Secrets manager; rate limit on auth + cost endpoints; helmet; SAST + SCA in CI	SOC 2 Type II; quarterly pentest; audit log on all privileged actions
Observability	`console.log` only; no metrics; no traces	Structured logs with correlation IDs; metrics on golden signals; distributed traces	SLOs and error budgets defined and tracked; alerts mapped to runbooks
Reliability & Resilience	No circuit breakers; no graceful shutdown; single DB SPOF	Circuit breakers + retry-with-backoff; graceful shutdown; tested backups	Defined RTO / RPO; chaos testing in staging; multi-AZ
Scalability	Cannot run on a 2nd host without correctness regressions	Horizontally scalable; load tests run before release	Auto-scaling on measured signals; capacity model maintained per-service
Testing & Quality Gates	Zero tests	Unit + integration on critical paths; CI gate; ≥ 60% coverage on changed code	Mutation testing; contract tests; flaky-test quarantine
CI/CD & Deployment	Manual deploy; SSH + process restart	Containerized deploy; automated rollback on health-check fail; ≥ 1 deploy/day	Canary / blue-green; DORA elite metrics tracked; feature flags
Infrastructure as Code	No IaC; environments differ invisibly	Terraform (or equiv) for all infra; environments parameterized	Drift detection automated; IaC drift gates merges
Cost Visibility & FinOps	No per-service tagging; cloud bill is a surprise	Cost tags on every resource; weekly review; anomaly alerts	Per-tenant unit economics; cost-attribution gates feature decisions
Documentation & Knowledge Mgmt	Tribal knowledge only	Architecture, runbook, ADR documents in-repo; reviewed in PRs	Documentation health metric (freshness, completeness) tracked
Team Practices & Governance	Ad-hoc PR review; no standard	Codified review checklist; change-management for high-risk changes; postmortems	DORA metrics tracked per-team; postmortem actions completed and verified
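
The cumulative rule stated before the table (a level counts only if every lower level's criteria are also met) can be sketched as follows. The function name and data shape are hypothetical, and the Level-2 checks in the example are placeholders, since the table lists concrete criteria only for Levels 1, 3, and 4.

```python
def earned_level(criteria_met):
    """Return the highest level whose criteria, and all lower levels' criteria, are met.

    criteria_met maps a level (2-5) to a list of booleans, one per criterion.
    Level 1 is the floor: the capability exists, however ad hoc.
    """
    level = 1
    for candidate in range(2, 6):
        checks = criteria_met.get(candidate)
        if not checks or not all(checks):
            break  # a gap at this level caps the rating below it
        level = candidate
    return level


observability = {
    2: [True, True],         # placeholder Level-2 checks
    3: [True, True, False],  # structured logs and metrics, but no distributed traces
    4: [True, True],         # SLOs and runbook-mapped alerts exist
}
print(earned_level(observability))  # 2
```

The example shows why partial Level-4 work does not raise a rating: the missing Level-3 criterion caps the pillar at Level 2.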

6.3 Worked examples

Architecture & Modularity at Level 1. Two route files totaling 46k lines; no module boundaries; cross-cutting helpers imported from anywhere. Why this is L1, not L2: there is no internal structure that a second engineer could reproduce; understanding requires reading both files in full. (This describes the current state.)

Architecture & Modularity at Level 3. Codebase organized into modules/identity, modules/content, etc.; each module exposes a typed public API; ESLint or equivalent prevents internals from being imported across module boundaries; new engineers can be productive in one module without reading the others.

Security & Compliance at Level 1. Hardcoded secrets in source; CORS accepts all origins; no rate limiting; no helmet; no SAST. Why this is L1, not L2: none of the basic controls are reproducibly in place. (This describes the current state.)

Security & Compliance at Level 3. Secrets in a manager; rate limits on auth and cost-bearing endpoints; helmet with reasonable defaults; SAST and dependency scanning in CI; threat model exists. SOC 2 Type I in progress.

Background Processing at Level 1. 108 PM2 workers polling MySQL; no idempotency keys; concurrency control is a per-process flag that doesn't survive multi-host deployment. (This describes the current state.)

Background Processing at Level 4. Queue-depth SLO defined and tracked; per-job-type latency and error-rate dashboards; auto-scaling on backlog; idempotency tested in CI; chaos tests for retry behavior.

7. How to Apply Ratings

7.1 Evidence required

A rating must be backed by observable evidence, not opinion. Acceptable evidence:

Source code / repository state (file sizes, dependency graph, presence of tests)
Configuration files (ecosystem.config.js, IaC files, CI workflows)
Operational artifacts (dashboards, runbooks, alert configurations, postmortem documents)
Process artifacts (PR templates, code review checklists, change-management tickets)
Direct measurement (uptime numbers, deployment frequency, MTTR from real incidents)

Self-assessment without evidence is not a rating; it is a hope.
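
One way to enforce the evidence rule is to make a rating impossible to record without it. The sketch below is illustrative only: the `Evidence` and `Rating` classes, the category labels, and the `reference` field are hypothetical names that mirror the five evidence types listed above.

```python
from dataclasses import dataclass

# Labels mirroring the five acceptable evidence types listed above.
EVIDENCE_KINDS = {"source", "configuration", "operational", "process", "measurement"}


@dataclass(frozen=True)
class Evidence:
    kind: str
    reference: str  # hypothetical: a path, dashboard name, or ticket ID

    def __post_init__(self):
        if self.kind not in EVIDENCE_KINDS:
            raise ValueError(f"unknown evidence kind: {self.kind}")


@dataclass(frozen=True)
class Rating:
    pillar: str
    level: int
    evidence: tuple = ()

    def __post_init__(self):
        if not 1 <= self.level <= 5:
            raise ValueError("level must be between 1 and 5")
        if not self.evidence:
            raise ValueError("a rating without evidence is not a rating")


rating = Rating(
    pillar="Background Processing",
    level=1,
    evidence=(Evidence("configuration", "ecosystem.config.js"),),
)
print(rating.level)  # 1
```

Attempting `Rating(pillar="Observability", level=3)` raises `ValueError`, which turns the principle into a check a reviewer cannot skip.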

7.2 Who decides

Ratings should be reviewed by at least two senior engineers, ideally with one external to the team to counteract familiarity bias. For pillars with compliance implications (Security, Data Architecture), an external auditor's view is the strongest signal.

7.3 Cadence

Quarterly re-rating is the recommended cadence for an active modernization program. Rating more often produces noise; rating less often loses the feedback loop.
Annual re-rating is sufficient once the platform reaches Level 3 across most pillars.
Event-driven re-rating is warranted when major changes occur (a service extraction, a compliance certification, an incident that exposes a gap).

7.4 What counts as a level transition

A pillar is considered to have transitioned to a new level only when the criteria for that level have been met for one full quarter without regression. A capability that meets Level 3 criteria for two weeks and then quietly degrades is still effectively at Level 2 — discipline is part of the rating.
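
The sustain rule can be expressed as a check over dated observations. This is a sketch under stated assumptions: "one full quarter" is approximated as 90 days, the function name is hypothetical, and each observation records the level whose criteria were met on that date.

```python
from datetime import date, timedelta

SUSTAIN_PERIOD = timedelta(days=90)  # "one full quarter", approximated


def confirmed_level(observations, today):
    """Lowest level observed across the trailing quarter.

    observations is a list of (date, level) pairs in any order. The last
    observation at or before the window start is included so that the whole
    quarter is covered.
    """
    window_start = today - SUSTAIN_PERIOD
    ordered = sorted(observations)
    before = [level for day, level in ordered if day <= window_start]
    inside = [level for day, level in ordered if window_start < day <= today]
    if not before:
        raise ValueError("less than one full quarter of history")
    return min([before[-1]] + inside)


today = date(2025, 7, 1)
steady = [(date(2025, 3, 15), 3), (date(2025, 5, 1), 3), (date(2025, 6, 15), 3)]
dipped = [(date(2025, 3, 15), 3), (date(2025, 5, 1), 2), (date(2025, 6, 15), 3)]
print(confirmed_level(steady, today))  # 3
print(confirmed_level(dipped, today))  # 2
```

Taking the minimum over the window means a single regression resets the clock, which matches the intent that discipline is part of the rating.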

7.5 Targets vs. fantasies

A 12-month target for a pillar should be at most one level above current, except in unusual circumstances. Two-level transitions in a single year are rare and usually involve hiring or organizational change, not just engineering. Targets that exceed plausibility undermine the model.
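
A target list can be screened against this rule mechanically. The following sketch uses a hypothetical function name and made-up ratings; it flags targets for discussion rather than rejecting them, since the text allows exceptions in unusual circumstances.

```python
def implausible_targets(ratings, max_step=1):
    """Return pillars whose 12-month target is more than max_step levels above current."""
    return sorted(
        pillar
        for pillar, (current, target) in ratings.items()
        if target - current > max_step
    )


# Illustrative (current, target) pairs, not findings from the assessment.
ratings = {
    "Security & Compliance": (1, 3),
    "Observability": (1, 2),
    "Documentation & Knowledge Mgmt": (3, 4),
}
print(implausible_targets(ratings))  # ['Security & Compliance']
```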

8. Limitations

8.1 Subjectivity

The rubric is structured to minimize judgment calls, but no rubric is perfectly objective. Two qualified raters will agree on most pillars and disagree on edges. The model's value comes from trend over time, not from any single rating's precision.

8.2 Pillar interaction is not modeled

The pillars are rated independently but are interdependent in reality. Improving Observability while Security remains at Level 1 produces a platform that can see its own breaches but cannot prevent them. The model surfaces gaps and indicates which pillars constrain others; it does not prescribe detailed sequencing, which is the roadmap's job.

8.3 Maturity is not value

A Level 5 platform that solves the wrong problem is worth less than a Level 2 platform that solves the right one. The model assesses how the platform is built, not whether it should exist.

8.4 The model can be gamed

Any rubric can be optimized for. A team that wants to "show progress" can re-categorize criteria to artificially advance ratings. The model assumes good-faith application, with external validation as the safeguard.

8.5 Best-in-class is rare

Level 5 across multiple pillars is uncommon outside large platform companies that have invested decades and hundreds of engineers. For most enterprise platforms, Level 3 across most pillars and Level 4 on the 2–3 pillars most critical to the business is the realistic destination.

Related Documents

Enterprise Readiness Assessment: uses this model to rate the platform and produce a roadmap
Architecture Overview: current technical state referenced in pillar evidence
AWS Well-Architected Framework whitepapers: the source for several pillars
DORA / "Accelerate": source for CI/CD and team-practices pillars
CMMI: source for the 1–5 level scale