Job Description
About Us
Gentoro was founded by a team with deep experience in enterprise infrastructure and AI, with leadership roots at companies including Splunk, WebLogic, and Asurion. Gentoro helps organizations simplify AI integration into real-world systems, with the observability, manageability, and security required for production deployments. As agentic workflows shift from experimentation to real execution, Gentoro helps teams enforce governance, maintain auditability, and deliver reliable outcomes at scale.
About the Role
We are looking for a visionary Principal Engineer who will bridge the gap between high-level architecture and hands-on execution, specifically focusing on simplifying enterprise integration for AI agents. As a key hire during our current growth phase, you will define the standards for how our platform scales and interacts with other enterprise applications.
What You’ll Do
- Design and implement multi-agent systems and orchestration layers.
- Build and operate observability stacks (e.g., OpenTelemetry) to monitor agent reasoning paths, tool usage, and performance in real time.
- Develop and enforce technical safety mechanisms—such as input/output filtering and behavioral boundaries—to mitigate risks like hallucinations, prompt injections, and bias.
- Analyze telemetry and execution traces to create feedback loops for continuous agent improvement and automated evaluation.
- Securely connect agents to external services, unstructured data, and enterprise APIs via robust tool-calling schemas.
- Implement fallback mechanisms, human-in-the-loop (HITL) checkpoints, and automated recovery for agentic failures.
- Implement best practices for MLOps, monitoring, and performance tuning of AI models in live environments.
- Automate SDLC processes and CI/CD pipelines, elevate QA standards, and develop incident response protocols to enable high velocity, availability, and reliability of our platform.
Requirements
- 10+ years of senior engineering experience at a fast-paced, high-growth technology startup that has successfully scaled from early stage through Series A/B funding (or equivalent growth phase).
- 5+ years of ML experience, including 2+ years focused on LLMs or agentic workflows.
- Proficiency in agent orchestration and memory-augmented systems.
- Hands-on experience analyzing tracing and logging data.
- Experience using feedback loops to continuously improve ML systems.
- Experience building agents that invoke tools or use the Model Context Protocol (MCP) to access enterprise data sources.
- Proficiency in modern technologies (e.g., Python, semantic search, vector DBs, GraphQL, queues, containers, Kubernetes, real-time data processing, Spark, OpenTelemetry, ClickHouse).
- Thrives in startup ambiguity while maintaining the discipline of an enterprise-grade engineer.
- Acts as a force multiplier who elevates the technical bar for the entire team.
- Obsessed with practical application of AI systems and capable of building enterprise solutions that solve real-world customer problems.