
Why We Built IncidentFox

The IncidentFox Founders, Dec 11, 2025

Manual on-call is broken — and SREs deserve better.

Site Reliability Engineering was created because reliability matters.

It's the backbone of every modern system. It ensures availability, performance, and resilience. And in its ideal form, SRE is a discipline filled with deep engineering: automation, infrastructure design, distributed systems thinking, long-term improvements, postmortems, and reliability strategy.

That's what SRE was meant to be.

But ask yourself this:

When you hear "SRE," what do you picture?

How many SREs have you met who are actually doing the reliability engineering they signed up for — and how many are mostly… on call?

The reality today is that "SRE" has quietly become synonymous with "the on-call people."

And I say this as someone who has lived it firsthand.

Where SRE Drifted — And What It Feels Like Inside

I've had the honor of working on the real, exciting side of SRE: building automation that deployed across thousands of machines, improving distributed systems, designing better reliability pipelines. Work that required thought, creativity, and experience — the work that makes SRE a craft.

But then there's on-call week.

Every time it came around, I crossed my fingers and hoped nothing big would happen. Big outages were rare. What happened far more often were the draining, constant interruptions: medium issues, small degradations, noisy alerts, misbehaving cron jobs, slow endpoints, partial outages scattered across unfamiliar services.

Late-night incident coordination.

Jumping into systems I barely knew.

Applying bandaid fixes even though I knew they weren't the right long-term solution.

Writing the same postmortem for the third time in a month because we never had time for the real fix.

Spending the next day mentally fragmented from context switching.

It doesn't feel good — not because the problems aren't solvable, but because this is not what SRE talent is meant for.

If this resonates with you, that's because the experience is widely shared.

And it's not just anecdotal. The data is clear:

  • 22% of engineering leaders face critical burnout — with on-call being a leading cause
  • Repetitive response tasks are the #1 driver of incident responder fatigue
  • On-call engineers report anxiety even when not actively responding
  • Burnout directly increases mistakes during incidents, creating a vicious cycle

This isn't an individual problem. It's a systemic one.

The Real Problem: On-Call Has Become Work Humans Shouldn't Be Doing

Modern systems are too complex. Too distributed. They generate too much operational noise for humans to manually triage.

Yet we're still asking people — talented engineers — to sift through logs, correlate alerts, coordinate responders, apply temporary fixes, and write summaries in the middle of the night.

But here's the truth:

Most of on-call is not deep engineering.

It's structured, repeatable, cognitively demanding work — the kind of work AI is exceptionally good at.

And because we overload SREs with this work, the actual high-leverage parts of the job suffer: reliability design, automation, long-term improvements, systems thinking. The work that truly requires experience and judgment.

SRE is an important role.

It deserves the space for deep engineering, not reactive firefighting.

And that's exactly why we built IncidentFox.

Our Belief — And the Future We're Building

We believe AI can handle the majority of the incident lifecycle:

  • Triage & investigation
  • Alert correlation
  • Incident coordination
  • Root-cause analysis
  • Postmortems
  • Proposals for pipeline and reliability improvements

Not because AI will replace SREs — but because AI should take over the parts of on-call that were never meant for humans in the first place.

Humans will still do the hard engineering, the creative system design, the nuanced judgment calls. But AI can and should take on the repetitive, reactive, mentally draining parts of on-call that burn people out and pull them away from meaningful work.

We believe SREs should spend their talent on engineering — not endless context switching, not midnight bandaid fixes, not incident coordination theater.

And so we built IncidentFox: an AI SRE agent that absorbs the operational load and restores what SRE was always meant to be.

Manual on-call is broken.

AI can fix it.

Let's give SREs their craft back.

— The IncidentFox Founders

Ready to Reclaim Your Craft?

We're accepting 2 design partners for our Q1 pilot program. Let AI handle the grind so you can focus on engineering that matters.
