[Remote] Site Reliability Engineer, Inference Infrastructure

Work from home | Full-time role | Hiring

Note: This is a remote role open to candidates in the USA. Cohere is a company focused on scaling intelligence to serve humanity through AI systems. They are seeking a Site Reliability Engineer to join their Model Serving team, which develops and operates AI platforms that deliver large language models through API endpoints with high performance and reliability.

Responsibilities

  • Build self-service systems that automate managing, deploying and operating services, including our custom Kubernetes operators that support language model deployments
  • Automate environment observability and resilience. Enable all developers to troubleshoot and resolve problems
  • Take steps required to ensure we hit defined SLOs, including participation in an on-call rotation
  • Build strong relationships with internal developers and influence the Infrastructure team’s roadmap based on their feedback
  • Develop our team through knowledge sharing and an active review process

Skills

  • 5+ years of engineering experience running production infrastructure at a large scale
  • Experience designing large, highly available distributed systems with Kubernetes, and GPU workloads on those clusters
  • Experience with Kubernetes development, production coding, and support
  • Experience with GCP, Azure, AWS, OCI, and multi-cloud, on-prem, or hybrid serving
  • Experience in designing, deploying, supporting, and troubleshooting in complex Linux-based computing environments
  • Experience in compute/storage/network resource and cost management
  • Excellent collaboration and troubleshooting skills to build mission-critical systems, and ensure smooth operations and efficient teamwork
  • The grit and adaptability to solve complex technical challenges that evolve day to day
  • Familiarity with computational characteristics of accelerators (GPUs, TPUs, and/or custom accelerators), especially how they influence latency and throughput of inference
  • Strong understanding or working experience with distributed systems
  • Experience in Golang, C++ or other languages designed for high-performance scalable servers

Benefits

  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% Parental Leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)

Company Overview

  • Cohere develops enterprise artificial intelligence software and provides language models, retrieval tools, and workplace platforms. It was founded in 2019 and is headquartered in Toronto, Ontario, Canada, with a workforce of 201-500 employees. Its website is https://cohere.com.

Company H1B Sponsorship

  • Cohere has a track record of offering H1B sponsorships, with 11 in 2025, 14 in 2024, 13 in 2023, 5 in 2022, 2 in 2021. Please note that this does not guarantee sponsorship for this specific role.