Senior AI Engineer

Unico Connect Private Limited | Mumbai, Maharashtra | ₹2,000,000 – ₹3,500,000
Adzuna IN | Posted 2h ago

Job Description

About the role:

Unico Connect is an AI-first technology partner that builds custom mobile, web, and AI products for clients across multiple geographies. We are hiring a Senior AI Engineer for a dedicated client engagement focused on building an AI-powered application builder platform - a product where users describe software in plain English and the system generates, previews, and iteratively refines working code. The mandatory requirement for this role is hands-on production experience shipping LLM-powered systems with agent architectures, with experience in code generation or developer tooling contexts a strong advantage.

The role is product-focused and deeply hands-on. You will own everything between the user's prompt and correct code landing in the project: the agentic loop, code generation pipeline, context management, evaluation suite, and model cost strategy. You will work alongside the Senior MLOps Engineer who operationalises the infrastructure around your system, and collaborate closely with backend, frontend, and DevOps engineers.

Responsibilities:

Agent architecture: Design and own the agentic loop for the platform - request interpretation, planning, tool-calling sequence (read file, edit file, run build, search code, install package), and stop conditions. Make and revisit architectural decisions on single-agent vs. multi-agent designs, including planner/executor splits and dedicated build-repair sub-agents.

Code generation pipeline: Own the end-to-end generation flow: task classification, context gathering, planning, targeted edits, verification, and commit. Implement diff/search-replace-based file editing with fuzzy matching and fallback strategies. Enforce scope discipline so the agent makes minimal diffs and does not modify code it was not asked to touch.

Self-repair loop: Build and tune the automated repair loop that pipes compiler, lint, build, and runtime errors back to the model with retry budgets and model escalation. This loop is the primary quality lever - the difference between 60-70% and 90%+ build success rates.

Context management: Build file-relevance retrieval so the agent sees the right files, not the whole codebase: dependency graphs, AST/tree-sitter-based chunking, embeddings, recency signals, and hybrid retrieval. Implement conversation summarisation and memory for long sessions, and address long-project degradation through codebase summaries and periodic consistency passes. Own token budgeting and prompt caching strategy.

Prompt engineering as a discipline: Own the system prompt and per-task prompt variants (new feature, bug fix, styling change). Maintain few-shot examples and enforce coding conventions, stack rules, and prohibited behaviours such as no hardcoded secrets and no whole-file rewrites. Version prompts like code with changelogs and rollback capability.

Evaluation and quality measurement: Design and own the evaluation suite: representative test prompts run on every prompt and model change, scored on build success rate, instruction adherence, and output quality including LLM-as-judge and visual/screenshot checks where relevant. Define regression gates that block quality-degrading changes from shipping. Treat evals the way engineers treat automated testing: versioned, automated, and tracked over time. This responsibility is non-negotiable at this level.

Model strategy and cost: Design model routing - cheap and fast models for classification and small edits, frontier models for complex generation. Drive cost optimisation through prompt caching, diff-based edits over full-file rewrites, and tighter context selection. Track cost per agent run and tokens per task; evaluate new model releases against the eval suite and lead migrations when results justify it.

Safety and reliability of agent behaviour: Defend against prompt injection from user content and fetched web content. Ensure secrets never appear in generated client code. Define what the agent's tools may and may not do in collaboration with the platform team. Contribute to output moderation and abuse-pattern awareness.

Mentorship and engineering standards: Run code reviews, define engineering conventions for AI work, and raise the engineering bar across the AI team. Work closely with the Senior MLOps Engineer on handoff of eval design, prompt configurations, and model routing logic.

Requirements:

Hands-on production ownership of LLM-powered systems with agent architectures (mandatory). Must have personally shipped and operated at least one complex production AI system - agentic, multi-step, or code generation - with end-to-end ownership of architecture, evaluation, and cost. POCs, internal demos, and tutorial-grade work do not qualify.

5+ years of professional software or AI engineering experience, with at least 3 years focused on LLM applications, AI engineering, or production AI systems. Candidates with strong backend backgrounds and a clear, substantive pivot into LLM systems qualify.

Strong Python proficiency and service development. Production-grade Python with FastAPI or equivalent: type hints, async patterns, streaming responses, testing, and packaging. Not notebook-only.

Depth across LLM APIs and agent systems. Production experience with at least two of OpenAI, Anthropic Claude, Google Gemini, or open-weight models (vLLM, Ollama, Together). Production experience with at least one agent framework (LangGraph, CrewAI, AutoGen, LlamaIndex Agents) or hand-rolled equivalent. Hands-on with tool calling, structured outputs, and multi-step reasoning.

Demonstrated, systematic evaluation practice - non-negotiable. Must have built evaluation harnesses that gate production releases, not ad-hoc testing. Hands-on with at least one of LangSmith, Langfuse, Promptfoo, Ragas, or DeepEval. Candidates with no systematic answer to evaluation should not be considered at senior level regardless of other strengths.

Cost discipline for production AI. Track record of measurable cost optimisation on production AI features. Able to speak in specifics: cost per request, savings achieved through caching or model routing, context reduction decisions.

AWS working knowledge. Hands-on with EC2, S3, IAM, and Docker. Comfort with CI/CD workflows and deploying AI services.

Awareness of LLM security failure modes. Familiar with prompt injection patterns, understands that system prompt rules alone are insufficient, and has experience with output validation and content safety in production.

Nice to have: experience with AST/tree-sitter tooling, diff-based editing systems, or compiler-adjacent work; MCP server authoring; open-source AI contributions; published technical writing on LLM systems; multi-modal model experience; fine-tuning exposure (LoRA, QLoRA, PEFT).

Skills: Generative AI (GenAI), Large Language Models (LLM), Agentic AI and Artificial Intelligence (AI)
