Senior MLOps Engineer

• Level: Senior

• Job type: Full-time (contract)

• Location: Global - Remote

• Category: ML & DS

Join a team revolutionizing mental healthcare.

At Mentalyc, we are redefining the future of mental health care by merging the power of AI with clinical expertise. Our vision is to make therapy more effective, efficient, and truly measurable through insightful, data-driven interventions. Our mission is to build an AI-based platform that fully automates note-taking in therapy, creating an anonymized data set to help uncover the most effective therapeutic methods.

We believe in elevating mental health care, one note at a time. As a team, we are driven by curiosity, care, and collaboration. We push boundaries, embrace new ideas, and trust in each other and the process. We strive to inspire, innovate, and explore, all while ensuring data privacy and supporting therapists so they can focus on what matters most—delivering quality care.

Our values guide everything we do:

Curiosity drives us to explore new possibilities, innovate, and continuously improve. Whether it's developing our AI tools or growing personally, we embrace learning at every level.

Care extends beyond our mission to improve therapy. We care deeply for our team, clients, and partners, ensuring we create an environment of support, respect, and well-being.

Trust forms the foundation of our relationships. We maintain data security, uphold transparency, and foster trust internally within our team and externally with clients and partners.

Collaboration is at the heart of our success. We work together across departments and with therapists to create solutions that truly make a difference in mental health care.

If you share our passion for making a positive impact on mental healthcare and are excited to be part of a groundbreaking team, we invite you to join us at Mentalyc.

What We Offer:

Innovative and Mission-Driven Environment: Be part of a fast-growing, high-performing company dedicated to transforming mental health technology with impactful AI solutions.

Culture of Excellence: Collaborate with driven and talented individuals who share a commitment to innovation, continuous improvement, and achieving outstanding results.

Scaling up our Impact Worldwide: Lead efforts to build scalable machine learning systems that maximize the impact of our products for clinicians and patients around the world.

High-Impact Role: Drive key technical initiatives, enhance platform scalability and quality, and play a critical role in achieving the company’s strategic objectives.

Flexible and International Team: Join a global team that values excellence and adaptability. We offer fully remote work and flexible hours to meet the demands of a high-growth environment.

Responsibilities

Deploy, Optimize, and Monitor ML Infrastructure: Lead efforts to ensure models are efficiently deployed for GPU inference, including parallelization and low-level optimization strategies. Establish logging, monitoring, and alerting mechanisms to guarantee 24/7 system reliability.

Performance and Turnaround Time Improvements: Identify and resolve bottlenecks in our processing pipeline (speech-to-text, structured note creation, downstream features) to reduce turnaround times.

Dynamic Scaling and Cost Optimization: Implement Kubernetes-based solutions (Helm, KEDA, Kubeflow) for autoscaling to handle fluctuating workloads. Fine-tune resource allocation, particularly GPU resources, to balance high performance with cost-effectiveness.

CI/CD and Model Lifecycle Management: Build and maintain automated CI/CD pipelines for model training, testing, and fine-tuning in collaboration with ML engineers and clinicians. Drive best practices in model versioning, QA, and end-to-end deployment processes.

Infrastructure as Code and Cloud Management: Use Terraform to provision and manage AWS infrastructure. Streamline deployment pipelines to ensure reliable releases and updates in a high-availability environment.

Collaboration on Model Iterations: Work closely with ML engineers and clinicians to refine model performance for session note generation and advanced analytics (progress assessment, treatment recommendations, etc.). Ensure models are production-ready and integrate seamlessly into a scalable cluster.

Requirements

Professional Experience: 5+ years in MLOps, ML engineering with a DevOps focus, or similar roles, with an emphasis on deploying and managing ML workflows at scale. Demonstrated success handling GPU-based inference for real-time or batch processing.

Technical Proficiency: Solid experience with Kubernetes (Helm, KEDA, Kubeflow) and Terraform for infrastructure provisioning. Proficient in Python, with C++ a plus for performance optimization. Skilled in PyTorch for model fine-tuning and serving.

Cloud and Observability Skills: Strong knowledge of AWS (EC2, S3, EKS, etc.) and experience configuring monitoring tools (Prometheus, Grafana, CloudWatch) to ensure observability, uptime, and scalability.

Scalability Focus: Proven track record in designing and implementing solutions for high-throughput, GPU-centric ML workflows. Ability to optimize resource usage and cost, particularly under dynamic workload demands.

Open-Ended Scalability Problem-Solving: Demonstrated ability to tackle complex, ambiguous scalability challenges with innovative solutions. Skilled at identifying bottlenecks—from GPU infrastructure to data pipelines—and driving robust, cost-effective optimizations in production environments.

Personal Qualities

Problem-Solver: You tackle complex challenges—from queue backlogs to GPU optimization—with creativity and determination.

Proactive and Results-Driven: You take initiative to enhance reliability and performance, consistently seeking ways to reduce costs and streamline processes.

Adaptable: You thrive in fast-paced settings, capable of pivoting quickly in response to new requirements or technologies.

Collaborative Leader: You mentor team members, encourage knowledge sharing, and foster an environment of mutual support.

Curiosity and Empathy: You’re eager to learn—from new MLOps tools to the workflows of clinicians—ensuring solutions meet real-world needs.

Nice to Have

C++ Optimizations: Proficiency in C++ for high-performance model or library optimizations.

Security and Compliance: Familiarity with HIPAA, SOC 2, or other healthcare data protection standards.

Experience with Additional MLOps Tools and Innovations: Familiarity with other orchestration frameworks (e.g., Kubeflow) and a commitment to staying current with emerging MLOps trends and best practices.

JavaScript, GraphQL, and SQL Experience: Knowledge of front-end, back-end, and database technologies can enhance cross-team collaboration and end-to-end solution delivery.