Data Center Incident Program Manager

Jobgether

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Data Center Incident Program Manager in the United States.

The Data Center Incident Program Manager will lead the end-to-end incident management lifecycle for mission-critical data center environments, ensuring operational resilience and rapid recovery. This role requires a strategic and detail-oriented professional who can define standards, establish protocols, and lead cross-functional teams during high-impact incidents. You will serve as Incident Commander when necessary, drive post-incident analysis, and implement corrective actions to prevent recurrence. By designing governance frameworks, reporting structures, and readiness exercises, you will enhance reliability, accountability, and operational excellence. The ideal candidate thrives under pressure, brings technical credibility in data center operations, and fosters continuous improvement across teams and processes. Your work will directly impact the stability and scalability of high-density compute infrastructure.

Accountabilities:

  • Define incident severity levels, escalation thresholds, and lifecycle stages from declaration to closure
  • Establish and maintain incident response standards, war rooms, runbooks, and stakeholder communication templates
  • Lead readiness activities including simulations, tabletop exercises, and on-call Incident Commander rotations
  • Serve as Incident Commander during high-impact events, coordinating cross-functional teams and driving structured response
  • Conduct post-incident reviews, perform root cause analyses, and track corrective and preventive actions to closure
  • Implement incident management tools, dashboards, and program metrics to monitor performance and readiness
  • Communicate trends and systemic gaps to design and operations teams for ongoing improvement

Requirements:

  • 7+ years of experience in mission-critical infrastructure, data center operations, or reliability engineering
  • Proven experience leading major incidents and war rooms, with calm and decisive leadership under pressure
  • Strong familiarity with facilities systems, hardware operations, or network infrastructure
  • Demonstrated ability to run post-incident reviews and track corrective actions effectively
  • Experience defining and operationalizing incident management processes, documentation, and escalation paths
  • Preferred: experience in hyperscale or high-density AI compute environments, familiarity with ISO-based quality systems, and proficiency with incident tooling such as PagerDuty, ServiceNow, or Jira

Benefits:

  • Competitive salary range of $125,600–$228,000 USD plus equity and performance-related bonuses
  • Medical, dental, and vision insurance with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses
  • 401(k) plan with employer match
  • Paid parental, medical, and caregiver leave
  • Flexible paid time off and 13+ company holidays, plus additional coordinated office closures
  • Mental health and wellness support
  • Employer-paid life and disability coverage
  • Annual learning and development stipend and relocation support for eligible employees
  • Meal benefits and other taxable fringe perks such as charitable donation matching and wellness stipends

Why Apply Through Jobgether?

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role’s core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
