Description:
• Provide leadership, mentoring, and sound judgment as the reliability engineering lead on the team.
• Design and maintain autonomous systems for building, deploying, testing, and operating Filevine products.
• Serve as the authoritative voice of reliability across the full software development lifecycle.
• Monitor and aggregate software and infrastructure events, and build dashboards and alerts on them to ensure visibility and rapid response.
• Continuously improve CI/CD pipelines, automation scripts, playbooks, and tools to streamline operations and reduce resolution time.
• Identify and resolve gaps in system availability, performance, and security while strengthening the overall security posture.
• Document processes, architecture, procedures, and best practices to support team effectiveness.
• Research, adopt, or build reliable tools that improve engineer productivity.
• Collaborate with team members and stakeholders, mentor junior engineers, and participate in a 24/7 on-call rotation for production support and emergency response.
Requirements:
• 8+ years of hands-on technical experience in software engineering, infrastructure, or operations roles, including at least 4 years dedicated to Site Reliability Engineering.
• Strong curiosity, self-motivation, and a continuous-learning mindset, with proactive enthusiasm for improving systems and processes.
• Strong proficiency in Python, Bash, PowerShell, and other common SRE scripting and tooling technologies.
• Expert-level experience designing, building, and maintaining autonomous systems for build, deployment, testing, monitoring, and operations.
• Hands-on experience with AWS services such as EC2, Kubernetes/EKS, CloudWatch, Lambda, S3, and IAM.
• Proficiency in core SRE skills including monitoring and alerting, incident response, capacity planning, performance optimization, CI/CD enhancement, and reliability best practices.
• Bachelor’s degree in Computer Science, Information Systems, or a related field; equivalent certifications, such as AWS or Google Cloud Professional; or substantial comparable direct work experience.
• Proven track record of independently driving reliability improvements, reducing toil through automation, and supporting highly available, scalable production systems in a fast-paced environment.
Benefits:
• $160,000 - $190,000 base salary.
• Eligible for paid time off.
• Comprehensive benefits package.
• Medical, dental, and vision insurance for full-time employees.
• Maternity and paternity leave for full-time employees.
• Short- and long-term disability coverage.
• Opportunity to learn from a dedicated leadership team.
• Top-of-the-line company swag.