Job Description
About the job

Key Accountability
- Monitoring effectiveness: ensure the monitoring framework and its enhancements are set up to increase proactive identification and resolution of issues before they impact customers.
- Set up and maintain centralized monitoring configuration as code.
- Consistently drive alert volume down and eliminate false alerts.
- Set up advanced monitoring alerts for the golden signals, i.e. latency, errors, throughput, and saturation.
- Transform traditional symptomatic CPU and memory monitors into more advanced alert correlation that pinpoints issues directly, enabling predictive monitoring.
- Create and implement synthetic or end-user monitoring using Python and Selenium for customer experience monitoring.
- Set up API endpoint monitoring and measure uptime and availability across customer, product, and infrastructure endpoints.
- Implement SLO, SLI, and error budget concepts to measure and establish a maturity model.
- Maintain and manage a code repository built for scale, with appropriate security measures.
- Leverage automation to push changes to monitoring tools.
- Set up an orchestration mechanism for onboarding and decommissioning to ensure operational readiness.
- Set up dashboards and create visibility across all cross-functional teams.
- Establish telemetry for automated collection of data across metrics, logs, and traces.
- Continuously analyze data to identify gaps and implement improvements.

Minimum Requirements
- Associate's degree (or equivalent) in Computer Science, Information Technology, or a related field preferred.
- 10-12 years of IT experience, including 6 years of monitoring experience.
- Experience administering monitoring tools: AppDynamics, SolarWinds, Grafana, Zabbix, Datadog, ELK Stack, etc.
- Hands-on experience with logs, metrics, traces, parsing, RegEx, and tagging.
- Hands-on experience implementing APM, EUM, synthetics, API endpoint monitoring, etc.
- Hands-on experience with integrations with ITSM tools such as ServiceNow and Jira.
- Hands-on experience with Ansible, Python, Selenium, and Shell.
- Hands-on experience with enterprise-scale Azure, VMware, and AWS environments.
- Hands-on experience creating dashboards and performing analysis.
- Excellent interpersonal and influencing skills: interacts appropriately with colleagues of many technical skill levels and remains calm and courteous while resolving problems in high-stress situations.

Skills
- Technical skills: monitoring tool administration, log indexing and pipelines, Azure, VMware, Ansible, Python, Selenium, Terraform, Shell, Windows, Linux, GROK parsing.
- Problem-solving skills: able to devise technical and creative solutions, and to use analytics to understand patterns and proactively identify gaps.
- Communication skills: effective communication is key in this role to gather data about problems, prepare detailed notes and reports, and update users on next steps.
- Time management: maintains excellent time management and can set priorities when handling multiple cases.
- Team collaboration: routinely works with other functions to resolve user issues, so must collaborate successfully with team members and coworkers.
- Highly motivated, hands-on personality.
- Ability to learn quickly in a challenging environment.

Skills: Monitoring
Experience: 10-12 years