Posted May 31, 2026

AI Engineer - Clinical Data Science

Job Description:
• We are looking for an AI Engineer to join our Data Science team, building AI-powered solutions for clinical data processing and analysis within a major pharmaceutical organization. You will design, develop and deploy generative AI systems that automate clinical reporting workflows, extract intelligence from documents, and accelerate data-driven decision-making.
• This is a hands-on engineering role - you'll be writing production code, not just building prototypes.

Responsibilities:

Generative AI & Automation:
• Develop LLM-powered automation tools for clinical reporting and document generation workflows.
• Build AI-driven code generation pipelines and quality assessment frameworks.
• Design and implement human-in-the-loop review workflows with feedback loops to continuously improve output quality.

Research & Evaluation:
• Research and evaluate emerging AI methods, frameworks, and techniques for specific tasks, e.g. comparing fine-tuning vs zero-shot approaches, assessing new document extraction tools, or trialling new agentic frameworks.
• Prototype and benchmark new approaches before recommending adoption.
• Stay current with a rapidly evolving field and bring new ideas to the team.

Agentic AI & Orchestration:
• Design and build multi-agent systems for data workflows - agents that retrieve, generate, validate, and iterate autonomously.
• Implement agent orchestration using frameworks such as Google ADK, LangGraph, or LangChain.
• Deploy and manage agents on Google Vertex AI.

Document Understanding & RAG:
• Build document processing pipelines (PDFs, Word/DOCX) covering extraction, parsing, table detection, and structure recognition.
• Design and build RAG pipelines grounded in source documents.
• Process, extract and transform data from unstructured and semi-structured sources.

Code Quality & Engineering Practices:
• Write clean, well-tested, maintainable Python code following SOLID principles and recognised design patterns.
• Apply single responsibility, dependency inversion, and interface segregation in real codebases - not just in theory.
• Write meaningful tests and maintain high standards across the team.
• Refactor and improve existing code as part of the normal development workflow.

AI-Assisted Development:
• Use AI coding tools (e.g. Gemini CLI, GitHub Copilot) as a core part of your development workflow.
• Critically review and validate AI-generated code - understanding what it produces, why, and when it's wrong.
• Write effective prompts to direct AI tools toward correct, secure, well-structured output.
• Know when to use AI and when to write code manually - judgement over speed.

Platform & Infrastructure:
• Integrate and orchestrate LLM providers available through Google Vertex AI (Gemini, etc.).
• Build internal tools and applications using Streamlit and FastAPI.
• Containerize and deploy services using Docker.

Required Skills & Experience:
• MSc in Data Science, Computer Science, Bioinformatics, or a related field (or equivalent practical experience).
• Strong Python skills.
• Hands-on experience building RAG systems or LLM-powered applications (using LangChain, LlamaIndex, or similar frameworks).
• Experience integrating LLM APIs (Google Gemini, OpenAI, or similar) - we work primarily through Google Vertex AI.
• Working knowledge of vector databases (ChromaDB, Weaviate, Qdrant, Pinecone, or similar).
• Cloud platform experience (GCP preferred, especially Vertex AI).
• Docker and containerized deployments.
• Strong software engineering fundamentals - SOLID principles, clean code practices, design patterns, testing, version control (Git), code review.
• Comfortable using AI-assisted development tools (e.g. Gemini CLI, GitHub Copilot) and critically evaluating what they produce.

Strongly Preferred:
• Experience with agentic AI patterns - multi-agent orchestration, tool use, autonomous workflows (LangGraph, Google ADK, or similar).
• Document processing experience - extracting and parsing data from PDFs and Word/DOCX files programmatically.
• Understanding of LLM evaluation principles and output quality assessment (BLEU, ROUGE, code execution metrics, or similar).
• Data science fundamentals - Pandas, NumPy, scikit-learn, statistical analysis, data visualization.
• Prompt engineering and optimisation techniques.
• Streamlit application development.

Domain Knowledge:
• Clinical trials or pharmaceutical industry experience.
• Familiarity with clinical data standards.
• Awareness of regulatory and data privacy requirements in life sciences.

Infrastructure & DevOps:
• Terraform or other infrastructure-as-code experience.
• CI/CD pipeline design (GitHub Actions or similar).

Knowledge Graphs:
• Neo4j and the Cypher query language.
• NetworkX for graph analytics.
• Graph-based RAG or knowledge extraction.

AI/ML:
• Experience with LLM-driven code generation.
• LLM fine-tuning experience (e.g. LoRA, PEFT, RLHF, Vertex AI model tuning, or similar approaches).
• NLP and text processing (Hugging Face Transformers, Sentence-Transformers).
• PyTorch or TensorFlow (for custom model work if needed).
• Google ADK (Agent Development Kit) or Vertex AI Agent Builder.
• Model Context Protocol (MCP) for tool integration and interoperability.

Other:
• Frontend experience (React, TypeScript).
• FastAPI or Flask REST API development.
• PostgreSQL or similar relational databases.

What You'll Work With:
• Languages: Python (primary), SQL, some TypeScript/R.
• AI/ML: LangChain, LlamaIndex, LangGraph, Google ADK, MCP, Hugging Face Transformers, Sentence-Transformers, Google Gemini (via Vertex AI).
• Document Processing: PyMuPDF, python-docx, pdfplumber, OCR tools.
• Data: Pandas, NumPy, SciPy, scikit-learn, Plotly.
• Databases: Vector databases, graph databases, relational databases.
• Infrastructure: Docker, Google Cloud Platform (Vertex AI, GCS), Terraform, GitHub Actions.
• Applications: Streamlit, FastAPI, Flask.
• Tools: Python packaging, testing frameworks, linting, Git.

About You:
• You care about code quality - not just making things work, but making them maintainable.
• You're comfortable working across the full stack of an AI application, from data ingestion to user-facing tools.
• You can context-switch between multiple projects and work autonomously.
• You're curious about the clinical/pharmaceutical domain and motivated to learn it.
• You see AI-assisted development as a force multiplier, not a replacement for engineering judgement.
• You're a self-directed learner who researches new methods and tools, evaluates them critically, and knows when to adopt them and when to stick with what works.