Posted May 21, 2026

Python Infrastructure Engineer - Model Evaluation

Python Infrastructure Engineer - Model Evaluation (AI Training)

About The Role

What if your Python expertise could directly shape the systems that power next-generation AI models? We're looking for a senior Python engineer to design and build the data pipelines, evaluation harnesses, and annotation infrastructure that leading AI labs depend on to train and benchmark their models. This is a high-impact, fully remote contract role working on real production systems, not toy projects. You'll collaborate directly with data, research, and engineering teams at the frontier of AI development.

• Organization: Alignerr
• Type: Hourly Contract
• Location: Remote
• Commitment: 20–40 hours/week

What You'll Do

• Design, build, and optimize high-performance Python systems supporting AI data pipelines and model evaluation workflows
• Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
• Build and maintain evaluation harnesses that integrate with inference frameworks and benchmark AI model performance
• Improve reliability, performance, and safety across existing Python codebases
• Instrument systems with observability tooling (metrics, logging, and monitoring) to track system reliability and model performance
• Identify bottlenecks and edge cases in data and system behavior, and implement scalable fixes
• Collaborate in synchronous design reviews to iterate on architecture and implementation decisions

Who You Are

• Native or fluent English speaker with strong written and verbal communication skills
• Full-stack developer with a solid systems programming background in Python
• 3–5+ years of professional experience writing production-grade Python
• Experienced building evaluation harnesses for ML models and integrating with inference frameworks
• Strong understanding of observability and metrics collection for monitoring system and model performance
• Able to commit 20–40 hours per week with reliability and focus

Nice to Have

• Prior experience with data annotation platforms, data quality systems, or evaluation pipelines
• Familiarity with AI/ML workflows, model training, or benchmarking infrastructure
• Experience with distributed systems or developer tooling at scale
• Background in MLOps, data engineering, or research engineering environments

Why Join Us

• Work on cutting-edge AI projects alongside leading research labs at the frontier of the field
• Fully remote and async-friendly: work from wherever you do your best work
• Freelance autonomy with the substance of meaningful, high-impact engineering work
• Make a direct, tangible contribution to the systems that shape how AI models are built and evaluated
• Potential for ongoing work and contract extension as new projects launch