This guide is for SRE and Infrastructure leaders who need clearer signals on change risk, operational load, and how delivery patterns intersect with reliability outcomes. It focuses on using Metrics → Delivery, Teams → Iterations, Developer Coaching, and gitStream (if enabled), plus reliability/incident metrics where configured.
TL;DR – SRE / Infra / Reliability:
- Use Metrics → Delivery to find flow patterns that increase change risk.
- Use Teams → Iterations (Completed) to track unplanned work and reliability-driven scope shifts.
- Use Developer Coaching to spot workload patterns that signal operational strain.
- If configured, pair incident / reliability metrics with Delivery trends to strengthen your story.
- Use gitStream to standardize safe-change behavior with low noise.
Start here in 15 minutes
- Pick one reliability-critical service or team.
- In Metrics → Delivery, set the time window to the last 4–8 weeks.
- Scan for:
  - Spikes in PR size.
  - Periods with slower Review or Deploy Time.
- Open Teams → Iterations → Completed for the same team and:
  - Estimate how much work was unplanned (operational / incident-driven).
- Write a one-line summary:
  “When X happens in delivery, we see more reliability load / incidents.”
- Use that summary to propose one experiment (e.g., smaller PRs or extra review on a service).
Who this guide is for
This path is for people who:
- Own or influence availability, incident response, and change management.
- Need to show how delivery practices affect reliability and operational load.
- Partner with DevEx, Platform, QA/Release, and PMO.
What you likely care about
- Are change patterns increasing reliability risk?
- Is operational work visible and linked to planning, or hidden as “background noise”?
- Where is unplanned reliability work eroding feature capacity?
- Which low-noise standards reduce risk without slowing flow?
Before you begin
- Git integration and key repos are connected.
- Teams, services, and ownership are clear enough to slice metrics by team or area.
- If available, incident / reliability metrics are configured and mapped to teams/services.
- Developer Coaching is enabled for relevant teams (where available).
- gitStream is enabled on at least some reliability-critical repos (if your org uses it).
Step 1: Use Delivery metrics to identify risky change patterns
Goal: Connect reliability issues to concrete delivery behavior.
Where: Metrics → Delivery
- Select a team or service that has seen incidents or reliability concerns.
- Choose a timeframe that includes recent incidents (e.g., last 4–8 weeks).
- Review:
  - Cycle Time stage trends (especially Review and Deploy Time).
  - PR size patterns and any spikes in large, late changes.
  - Any visible trends around rushes to deploy before cutoffs.
- Mark 1–2 concrete risk signals, such as:
  - “Frequent large PRs merged shortly before deploy.”
  - “Review Time compressed when incident backlog is high.”
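The rushed-large-change signal above can be sketched as a quick script. This is illustrative only: the sample rows, field names, and thresholds below are assumptions, not any product's export format — adapt them to whatever PR data you can pull.

```python
from datetime import datetime, timedelta

# Illustrative PR rows: (id, lines_changed, merged_at, deployed_at).
# Swap in real data from your Git provider or a metrics export.
prs = [
    ("PR-101", 1240, datetime(2024, 5, 6, 16, 30), datetime(2024, 5, 6, 17, 0)),
    ("PR-102", 85,   datetime(2024, 5, 6, 10, 0),  datetime(2024, 5, 7, 9, 0)),
    ("PR-103", 730,  datetime(2024, 5, 7, 17, 55), datetime(2024, 5, 7, 18, 10)),
]

LARGE_PR_LINES = 500              # assumed "large change" threshold
RUSH_WINDOW = timedelta(hours=1)  # merge-to-deploy gap that suggests a rush

def risky(pr):
    """Flag PRs that are both large and deployed shortly after merge."""
    _, lines, merged, deployed = pr
    return lines >= LARGE_PR_LINES and (deployed - merged) <= RUSH_WINDOW

flagged = [pr[0] for pr in prs if risky(pr)]
print(flagged)  # → ['PR-101', 'PR-103']
```

Even a rough flag like this turns “we sometimes rush big changes” into a countable weekly signal you can track alongside incidents.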
Step 2: Make reliability work visible in Iterations
Goal: Show how unplanned reliability work affects delivery capacity.
Where: Teams → Iterations (Completed)
- Open the last few completed iterations for teams covering critical services.
- Review:
  - Unplanned work that came from incidents / reliability tasks.
  - Scope removed or delayed because of operational load.
  - Patterns across iterations (e.g., every sprint loses 20–30% of capacity to incidents).
- Use these patterns to:
  - Quantify reliability work in terms of lost feature capacity.
  - Make the case for more SRE capacity or automation.
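Quantifying lost capacity is simple arithmetic once you have the per-iteration numbers. The figures and field names below are invented for illustration; pull the real ones from Teams → Iterations (Completed).

```python
# Illustrative iteration data: completed work vs. the portion that was
# unplanned (incident- or reliability-driven). Units can be points or items.
iterations = [
    {"name": "Sprint 14", "completed": 40, "unplanned": 9},
    {"name": "Sprint 15", "completed": 38, "unplanned": 12},
    {"name": "Sprint 16", "completed": 42, "unplanned": 11},
]

for it in iterations:
    share = it["unplanned"] / it["completed"]
    print(f'{it["name"]}: {share:.0%} of completed work was unplanned')

# Aggregate share across the window — the "lost feature capacity" headline.
avg = sum(i["unplanned"] for i in iterations) / sum(i["completed"] for i in iterations)
print(f"Average unplanned share: {avg:.0%}")
```

A single percentage (“~27% of our capacity goes to incident-driven work”) is usually the most persuasive line in a staffing or automation proposal.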
Step 3: Use Developer Coaching to spot operational strain
Goal: Find hotspots where a few people carry too much reliability burden.
Where: Developer Coaching (if enabled)
- Look for contributors who:
  - Handle a disproportionate share of reviews or critical PRs.
  - Frequently appear in incident / operational work.
- Compare those hotspots with:
  - High Cycle Time or Rework in their services.
  - Known incident trends.
- Use this to justify:
  - Spreading knowledge via pairing, documentation, or ownership changes.
  - Targeted automation or standards for high-risk areas.
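One way to make “a few people carry too much” concrete is the top reviewer's share of total reviews. This sketch assumes you can tally reviews per person for a service over a period (the names, counts, and the 50% threshold are illustrative):

```python
from collections import Counter

# Illustrative reviewer tallies for one service over a month; replace with
# real counts from Developer Coaching or your Git provider.
reviews = Counter({"ana": 34, "ben": 9, "chris": 7, "dana": 4})

total = sum(reviews.values())
top, top_count = reviews.most_common(1)[0]
top_share = top_count / total

print(f"{top} handled {top_share:.0%} of reviews")
if top_share > 0.5:  # assumed threshold for "concentrated" load
    print("Review load is concentrated; consider pairing or rotating ownership.")
```

If one person holds well over half the reviews for a reliability-critical service, that is both a burnout risk and a single point of failure for incident response.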
Step 4: Pair reliability metrics with delivery trends (if configured)
Goal: Tell a clean “change → incident → improvement” story.
- Identify periods or services with higher incident volume or failure signals.
- Overlay those periods with:
  - Spikes in large or rushed PRs.
  - Increased unplanned work in Iterations.
- Capture 1–2 specific narratives per quarter to bring to leadership and DevEx/QA:
  - “When we tightened review standards and reduced oversized PRs, incidents dropped the next month.”
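The overlay can start as two weekly series and a simple correlation check. The weekly counts below are invented to show the shape of the analysis; use your own Delivery and incident exports, and treat the result as supporting evidence, not proof of causation.

```python
# Illustrative weekly series: oversized/rushed PRs merged vs. incidents opened.
weeks     = ["W1", "W2", "W3", "W4", "W5", "W6"]
large_prs = [2, 5, 1, 6, 2, 7]
incidents = [1, 3, 0, 4, 1, 5]

def pearson(xs, ys):
    """Pearson correlation coefficient, computed directly."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(large_prs, incidents)
print(f"large-PR weeks vs. incident weeks: r = {r:.2f}")
# A strong positive r supports the "rushed/oversized changes → incidents"
# narrative; pair it with the concrete risk signals from Step 1.
```

Even without the coefficient, plotting the two series side by side per week is often enough for a leadership conversation.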
Step 5: Use gitStream to standardize safe-change patterns
Goal: Turn reliability learnings into guardrails.
Where: gitStream Hub (if enabled)
- Start with patterns that directly reduce risk:
  - Flagging changes in critical services for extra review.
  - Protecting against massive PRs in sensitive areas.
  - Encouraging AI review or additional checks for high-risk files.
- Roll guardrails out to a few services, then expand once teams are comfortable.
- Use Delivery and incident trends to verify impact.
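Guardrails like these are typically expressed as gitStream automations in a `.cm` file. The sketch below is a hedged example, not a drop-in rule set: the service path, reviewer group, label, and size threshold are placeholders, and exact action and filter names should be verified against the gitStream documentation for your version.

```yaml
# .cm file — illustrative guardrails; paths, teams, and thresholds are placeholders.
manifest:
  version: 1.0

automations:
  extra_review_for_critical_service:
    # Flag changes touching a reliability-critical service for extra review.
    if:
      - {{ files | match(regex=r/^services\/payments\//) | some }}
    run:
      - action: add-reviewers@v1
        args:
          reviewers: [my-org/sre-team]
      - action: add-label@v1
        args:
          label: critical-change

  warn_on_oversized_pr:
    # Nudge authors to split very large changes in sensitive areas.
    if:
      - {{ branch.diff.size > 500 }}
    run:
      - action: add-comment@v1
        args:
          comment: "This PR changes 500+ lines — consider splitting it before review."
```

Starting with comment- and label-level automations (rather than hard blocks) keeps the noise low while teams adjust, which matches the roll-out-then-expand approach above.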
Recommended operating rhythm
Weekly
- Review Delivery stage trends for high-risk services.
- Scan Completed Iterations for reliability-driven unplanned work.
- Bring one reliability+flow observation to your platform/DevEx or EM partners.
Monthly / per release
- Summarize how delivery patterns correlated with incidents.
- Agree on one safe-change experiment (standard or automation) to test.
- Update gitStream and team standards based on outcomes.