Job Description:
• We are looking for an AI Engineer to join our Data Science team, building AI-powered solutions for clinical data processing and analysis within a major pharmaceutical organization. You will design, develop and deploy generative AI systems that automate clinical reporting workflows, extract intelligence from documents, and accelerate data-driven decision making.
• This is a hands-on engineering role - you'll be writing production code, not just building prototypes.
Responsibilities:
Generative AI & Automation:
• Develop LLM-powered automation tools for clinical reporting and document generation workflows.
• Build AI-driven code generation pipelines and quality assessment frameworks.
• Design and implement human-in-the-loop review workflows with feedback loops to continuously improve output quality.
Research & Evaluation:
• Research and evaluate emerging AI methods, frameworks, and techniques for specific tasks - e.g. comparing fine-tuning vs zero-shot approaches, assessing new document extraction tools, or trialling new agentic frameworks.
• Prototype and benchmark new approaches before recommending adoption.
• Stay current with a rapidly evolving field and bring new ideas to the team.
Agentic AI & Orchestration:
• Design and build multi-agent systems for data workflows - agents that retrieve, generate, validate, and iterate autonomously.
• Implement agent orchestration using frameworks such as Google ADK, LangGraph, or LangChain.
• Deploy and manage agents on Google Vertex AI.
Document Understanding & RAG:
• Build document processing pipelines (PDFs, Word/DOCX) - extraction, parsing, table detection, structure recognition.
• Design and build RAG pipelines grounded in source documents.
• Process, extract and transform data from unstructured and semi-structured sources.
Code Quality & Engineering Practices:
• Write clean, well-tested, maintainable Python code following SOLID principles and recognised design patterns.
• Apply single responsibility, dependency inversion, and interface segregation in real codebases - not just theory.
• Write meaningful tests and maintain high standards across the team.
• Refactor and improve existing code as part of normal development workflow.
AI-Assisted Development:
• Use AI coding tools (e.g. Gemini CLI, GitHub Copilot) as a core part of your development workflow.
• Critically review and validate AI-generated code - understanding what it produces, why, and when it's wrong.
• Write effective prompts to direct AI tools toward correct, secure, well-structured output.
• Know when to use AI and when to write code manually - judgement over speed.
Platform & Infrastructure:
• Integrate and orchestrate LLM providers available through Google Vertex AI (Gemini, etc.).
• Build internal tools and applications using Streamlit and FastAPI.
• Containerize and deploy services using Docker.
Required Skills & Experience:
• MSc in Data Science, Computer Science, Bioinformatics, or a related field (or equivalent practical experience).
• Strong Python skills.
• Hands-on experience building RAG systems or LLM-powered applications (using LangChain, LlamaIndex, or similar frameworks).
• Experience integrating LLM APIs (Google Gemini, OpenAI, or similar) - we work primarily through Google Vertex AI.
• Working knowledge of vector databases (ChromaDB, Weaviate, Qdrant, Pinecone, or similar).
• Cloud platform experience (GCP preferred, especially Vertex AI).
• Docker and containerized deployments.
• Strong software engineering fundamentals - SOLID principles, clean code practices, design patterns, testing, version control (Git), code review.
• Comfortable using AI-assisted development tools (e.g. Gemini CLI, GitHub Copilot) - and critically evaluating what they produce.
Strongly Preferred:
• Experience with agentic AI patterns - multi-agent orchestration, tool use, autonomous workflows (LangGraph, Google ADK, or similar).
• Document processing experience - extracting and parsing data from PDFs and Word/DOCX files programmatically.
• Understanding of LLM evaluation principles and output quality assessment (BLEU, ROUGE, code execution metrics, or similar).
• Data science fundamentals - Pandas, NumPy, scikit-learn, statistical analysis, data visualization.
• Prompt engineering and optimisation techniques.
• Streamlit application development.
Domain Knowledge:
• Clinical trials or pharmaceutical industry experience.
• Familiarity with clinical data standards.
• Awareness of regulatory and data privacy requirements in life sciences.
Infrastructure & DevOps:
• Terraform or infrastructure-as-code experience.
• CI/CD pipeline design (GitHub Actions or similar).
Knowledge Graphs:
• Neo4j, Cypher query language.
• NetworkX for graph analytics.
• Graph-based RAG or knowledge extraction.
AI/ML:
• Experience with LLM-driven code generation.
• LLM fine-tuning experience (e.g. LoRA, PEFT, RLHF, Vertex AI model tuning, or similar approaches).
• NLP and text processing (Hugging Face Transformers, Sentence-Transformers).
• PyTorch or TensorFlow (for custom model work if needed).
• Google ADK (Agent Development Kit) or Vertex AI Agent Builder.
• Model Context Protocol (MCP) for tool integration and interoperability.
Other:
• Frontend experience (React, TypeScript).
• FastAPI or Flask REST API development.
• PostgreSQL or similar relational databases.
What You'll Work With:
• Languages: Python (primary), SQL, some TypeScript/R.
• AI/ML: LangChain, LlamaIndex, LangGraph, Google ADK, MCP, Hugging Face Transformers, Sentence-Transformers, Google Gemini (via Vertex AI).
• Document Processing: PyMuPDF, python-docx, pdfplumber, OCR tools.
• Data: Pandas, NumPy, SciPy, scikit-learn, Plotly.
• Databases: Vector databases, graph databases, relational databases.
• Infrastructure: Docker, Google Cloud Platform (Vertex AI, GCS), Terraform, GitHub Actions.
• Applications: Streamlit, FastAPI, Flask.
• Tools: Python packaging, testing frameworks, linting, Git.
About You:
• You care about code quality - not just making things work, but making them maintainable.
• You're comfortable working across the full stack of an AI application, from data ingestion to user-facing tools.
• You can context-switch between multiple projects and work autonomously.
• You're curious about the clinical/pharmaceutical domain and motivated to learn it.
• You see AI-assisted development as a force multiplier, not a replacement for engineering judgment.
• You're a self-directed learner who researches new methods and tools, evaluates them critically, and knows when to adopt vs when to stick with what works.