Site Reliability Engineer (SRE)

Full Time - Hybrid / Remote - Toronto (CA) / San Francisco (USA)

About Katalyze AI

Katalyze AI is a fast-growing AI-driven biotech platform company on a mission to make life-saving drugs accessible and affordable for everyone. Our AI Agents help pharmaceutical and biotech companies increase production efficiency, reduce costs, and minimize waste. We're a team of humble, fast-moving, and curious craftspeople working at the intersection of science and AI.

About the Role

We're looking for a Site Reliability Engineer to ensure Katalyze AI's platform is reliable, scalable, and secure as we grow with enterprise customers. You'll build and maintain the infrastructure and practices that keep our systems running smoothly — and help us move fast without breaking things.

What You'll Do

  • Define and maintain SLOs, SLIs, and error budgets for critical platform services

  • Build and operate CI/CD pipelines, monitoring, alerting, and incident response systems

  • Design and manage cloud infrastructure (AWS/GCP/Azure) using infrastructure-as-code (Terraform, Pulumi)

  • Implement observability tooling (logging, tracing, metrics) across the platform

  • Partner with engineering to embed reliability practices into the development lifecycle

  • Lead incident response and post-mortems; drive systemic improvements

  • Support security and compliance requirements for enterprise customer deployments

  • Build automation to reduce toil and improve operational efficiency

What We're Looking For

  • 4+ years of SRE, DevOps, or platform engineering experience

  • Strong experience with Kubernetes, Docker, and container orchestration

  • Proficiency with cloud platforms (AWS preferred) and infrastructure-as-code

  • Experience with observability tools (Datadog, Grafana, Prometheus, or similar)

  • Understanding of security best practices and enterprise compliance requirements (SOC 2; awareness of HIPAA)

  • Experience with Python or Go for automation scripting

  • Startup experience preferred — you're comfortable building from scratch
