Must be able to obtain a Public Trust clearance
Must be able to work remotely during EST hours
Overview
• We are building the next generation of agentic AI to transform how the agency accelerates research, makes decisions, and ships products at scale.
• We are a small, startup-minded team that ships fast and owns what we build end-to-end.
• We are looking for an SDE II who is hungry to contribute to a real production system, not a sandbox.
• You will work across the application and infrastructure layers, implement features that users interact with every day, and be expected to own what you build from design through deployment.
• You will not be handed perfectly scoped tickets.
• You will be expected to ask good questions, figure things out, and move.
• The best person for this role communicates clearly, collaborates without ego, and brings genuine empathy for the users whose work they are making better.
• You are a self-starter with a high bar and a high sense of urgency.
• You play well with others and make the people around you better.
What You Will Do
Build Agentic AI Systems
• Implement and iterate on our agentic workflows: tool-calling, multi-step reasoning, planning, memory, and agent-to-agent (A2A) communication patterns at the application layer
• Build and maintain MCP (Model Context Protocol) client-side integrations: how agents discover, invoke, and compose tools
• Implement tool definitions, input/output schemas, error handling, retry logic, and result formatting for GRACE's growing tool library
• Contribute to multi-agent orchestration patterns that are reliable and debuggable in production, not just in demos
Build LLM-Powered Features
• Implement LLM orchestration logic: prompt construction, context management, model selection, and response parsing across OpenAI GPT, Anthropic Claude, and Google Gemini
• Build and maintain RAG pipeline components: query formulation, result ranking, citation grounding, and hallucination mitigation
• Implement and iterate on prompt engineering patterns and system prompts that drive quality and consistency across model families
• Contribute to context window budget management: truncation, summarization, and pagination logic that makes the right call at runtime
• Build LLM evaluation components: grounding assessment, regression tests, safety checks, and quality metrics
• Write prompts and pipelines with token economics in mind; cost-per-query is a real constraint, not an afterthought
Own the Backend
• Build secure, well-tested backend features end-to-end: from application logic through to the API contract the frontend consumes
• Implement integrations with internal and external data sources and APIs, including Dimensions, Google Search, Slack, SharePoint, and LLM provider APIs
• Contribute to monitoring, logging, and distributed tracing so that failures are diagnosable and regressions are caught before users report them
• Implement fallback, retry, and graceful degradation patterns for AI service dependencies
• Write production-quality code: readable, tested, reviewed, and documented
Contribute to Infrastructure
• Work within Microsoft Azure infrastructure: Azure Functions, Azure API Management, Azure Container Apps, and Azure OpenAI Service
• Contribute to CI/CD pipelines, deployment automation, and release processes
• Work with containerization tools and infrastructure as code; understand the environment your code runs in
• Contribute to application-level SLOs: tool call success rates, response quality, and latency from the user's perspective
Collaborate and Grow
• Participate actively in design reviews, sprint planning, and retrospectives; ask good questions and push back when something does not add up
• Communicate technical decisions clearly to both engineers and non-engineers; no one should have to guess what you built or why
• Work closely with the PM, researcher, designer, and senior engineers to translate ambiguous requirements into clear, actionable implementations
• Bring genuine curiosity and empathy to every feature; understand who is using what you build and why it matters to them
• Ensure strong privacy, security, and compliance in all systems, integrations, and data handling
Basic Qualifications
• Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience
• 3+ years of professional software engineering experience building and operating production systems
• Proven experience in high-velocity environments where you contributed to shipping real products end-to-end
• Strong proficiency in Python and at least one other backend language; familiarity with modern backend frameworks and async patterns
• Solid understanding of algorithms, data structures, distributed systems, and software design patterns
• Experience building and operating systems on major cloud platforms (AWS, GCP, or Azure)
• Experience with containerization (Docker) and working within CI/CD pipelines
• Clear, direct communicator who gives and receives feedback well, works with empathy, and makes the people around them better
Preferred Qualifications
• Hands-on experience building features on top of LLMs in production: tool-calling, RAG, multi-step reasoning, and context management
• Familiarity with A2A (Agent-to-Agent) communication patterns and multi-agent orchestration frameworks
• Familiarity with MCP at the client/consumer layer: how agents discover and invoke tools via MCP
• Working knowledge of prompt engineering and LLM behavior across model families; you understand why Claude and GPT respond differently to the same prompt
• Experience with LLM evaluation, grounding assessment, or regression testing for AI-powered systems
• Awareness of token economics at the application layer: cost-per-query, context budget management, and prompt efficiency
• Experience on Microsoft Azure: Azure Functions, API Management, Container Apps, or Azure OpenAI Service
• Familiarity with secrets management, least-privilege access, and security-conscious engineering practices
• Experience in startup or early-stage environments: comfort with ambiguity, rapid iteration, and wearing multiple hats
• Experience in healthcare, life sciences, or other regulated domains is a plus but not required
Why This Role
• You will work on a production system that real users depend on every day to do meaningful work.
• You will not be one of hundreds of engineers on a feature nobody uses.
• You will see the impact of what you build quickly, get direct feedback, and have real ownership over your work.