Ragoth Varma Shanmugasundaram

Gurugram

Summary

Results-driven professional with a robust background in systems engineering and software development, dedicated to ensuring high availability and reliability of applications. Expertise in automation, monitoring, and performance optimization enhances efficiency and operational effectiveness. Strong collaborator with a proven ability to adapt to evolving needs, consistently focused on achieving team goals. Proficient in scripting, cloud services, and incident management, recognized for reliability and exceptional problem-solving skills.

Overview

8 years of professional experience
1 Certification

Work History

Senior Site Reliability Engineer

Optum, UnitedHealth Group
Gurgaon, India
05.2025 - Current
  • Developed full-stack NLP analytics platform utilizing React, Flask, MongoDB Atlas, Azure, and OpenAI to facilitate natural language querying of healthcare member feedback across 8+ lines of business.
  • MissionControl-IQ (AI-Powered IT Incident Resolution Assistant): Built an AI-driven incident resolution platform that reduced P1/P2 incident resolution time by leveraging historical incident data, TF-IDF similarity search, and Azure OpenAI (GPT-4) to generate structured solutions and actionable recommendations.
  • Data Pipeline: Built an automated data pipeline that extracts customer survey feedback from Qualtrics via Azure Databricks, performs PII sanitization and AI-powered intent classification using Azure OpenAI, stores structured results in PostgreSQL, and generates vector embeddings persisted to MongoDB Atlas for semantic search capabilities.
  • Monitored system performance and reliability using Splunk and Datadog, proactively identifying anomalies and preventing potential outages.
  • Led war room calls during critical production outages, coordinating with development, QA, and dependent teams to identify root causes and drive rapid resolution.
  • Contributed to Terraform-based AWS infrastructure provisioning, supporting scalable, version-controlled deployments and improving system reliability.

Site Reliability Engineer

AIRTEL
New Delhi, India
08.2023 - 04.2025

Incident Management & Automation:

  • Implemented Incident and Service Request (SR) tracking for the Optimus and Retina dashboards, identifying and automating solutions for ServiceNow Portal challenges, reducing manual effort and increasing process efficiency.
  • Developed an automated web dashboard with Python, Pandas, NumPy, and Streamlit, hosted on Airtel Linux servers, to improve monitoring and streamline incident reporting.

Process Automation & Reporting:

  • Led the automation of critical reporting processes, using Python to create seamless, self-updating reports, enhancing operational visibility and reducing turnaround times for issue resolution.

Performance Monitoring:

  • Established performance benchmarks and monitored system health and responsiveness using Kibana, implementing improvements based on data insights to maintain system reliability.

Cross-functional Coordination:

  • Coordinated cross-functional teams for major releases and updates, ensuring minimal downtime and smooth rollouts, demonstrating effective project management and collaboration skills.
  • Cloud Platform Proficiency: Hands-on experience with AWS services for managing scalable cloud solutions, ensuring reliable and optimized deployments.
  • Kubernetes & Containerization: Deployed Kubernetes clusters using GCP, managing containerized applications to ensure scalability and reliability. Configured and managed Nginx as a reverse proxy using Jhoster to enhance application performance.
  • CI/CD & Infrastructure as Code: Set up and managed Jenkins servers for CI/CD pipelines, automating deployment workflows to reduce manual intervention and increase deployment speed. Integrated infrastructure as code practices for efficient resource management.

Kafka Data Management:

  • Managed Kafka data flow, including topic creation, configuration, and troubleshooting through Spark queries to ensure data quality, optimizing data ingestion and consumption processes.

Job Scheduling & Resource Allocation:

  • Scheduled and managed data ingestion jobs on Airflow and DBT, balancing resources and increasing capacity as needed to meet performance targets.

Documentation & Troubleshooting Guides:

  • Authored comprehensive documentation for system architectures, automated processes, and troubleshooting protocols, fostering team knowledge sharing and streamlined issue resolution.

Internal Project:

  • Implemented Kyuubi connection checks for sanity testing, with automated email notifications.
  • Automated JCEKS entries management via Vault.
  • Streamlined automation for 90+ DMU user connections every 12 hours, with email alerts.
  • Developed automation for Oracle entry creation.
  • Automated testing of S3 bucket configurations and permissions.
  • Implemented automated processes for checkpoint deletion, data purging, and dropping Hive tables.
  • Automated data deletion workflows in response to user requests.
  • Created scripts for sanity checks to hit endpoints, retrieve responses, and send email notifications.
  • Implemented incident and SR tracking for the Optimus and Retina web dashboards.

Support Engineer L2

CSS Corp - Client Google
Hyderabad, India
06.2019 - 06.2024
  • Provided support via handset and chat to global users on all platforms: Windows, Linux, Macintosh, Android, and iOS. Addressed client queries and issues on priority through tickets and calls, which helped achieve customer satisfaction metrics.
  • Enabled users by troubleshooting port-related issues and connecting them to the cloud through secure channels.
  • Understanding of Linux-based systems, including system troubleshooting and process management.

Member of Technical Staff

BLUESTACK
New Delhi, India
06.2020 - 07.2023
  • Automated several processes using Python and Apps Script.
  • Tested the application at various levels before deploying to production.
  • Ran quality checks on app features and on bugs moving into the development environment, provided first-level debugging with help from the development team, and worked closely with the automation team to run sanity checks on newly added features.
  • Worked in BigQuery to resolve day-to-day queries in support of country managers.
  • As the senior member of the process, managed the entire support team.
  • Worked closely with the development team to create a CI/CD pipeline for deploying fixes.

Internal Projects:

1. Built an SSL certificate check with email notifications using a Python script.

2. Created a Kubernetes cluster with worker nodes for app deployment.

3. Created a CI/CD pipeline by integrating Bitbucket and Jenkins for continuous deployment.

4. Automated Jira workflows using Google Sheets.

5. Fetched Play Store ratings and reviews into Google Sheets using the Console API.

Desktop Support Engineer

Teams Computer - Vodafone Client
Bangalore, India
09.2018 - 05.2019
  • Technical Support and Troubleshooting: Provided assistance for hardware, software, and network-related issues, resolved technical problems by diagnosing and troubleshooting user-reported incidents, and performed system upgrades and installations.
  • System Maintenance and Monitoring: Maintained and updated operating systems, software, and security patches. Monitored desktop performance to keep systems operating efficiently with minimal downtime.
  • User Training and Documentation: Offered guidance to end-users on system operations, new software, and hardware. Documented support processes and solutions to common issues, and maintained an inventory of equipment and software licences.

Education

Certificate of Higher Education - ML & Cloud

IIT Madras With Upgrad
New Delhi, India
2021

Bachelor of Arts - Bachelor of Computer Application

Sri Krishna Arts & Science College
Coimbatore, India
2016

GCSEs - Accounting

Shri KK Naidu Memorial HR sec School
Coimbatore, India
2013

Skills

  • Python
  • Bash
  • Docker
  • Hive
  • Spark
  • DBT
  • Kyuubi
  • Troubleshooting
  • Kubernetes (K8s)
  • Vault
  • Ranger
  • DataMesh
  • AWS
  • Linux
  • Hadoop
  • OCP (OpenShift Container Platform)
  • Jenkins
  • Kibana
  • Observability
  • Project management
  • Documentation
  • Team Management
  • Airflow
  • DataHub
  • Kafka
  • Bitbucket (version control)
  • SQL
  • Jira
  • Incident management
  • Microservices architecture
  • Log analysis
  • Scripting languages
  • Infrastructure automation
  • Terraform
  • Datadog
  • GitHub Actions
  • Cloud

Accomplishments

  • Rising Star Performance Award
  • Execution Excellence Award

Certification

- Advanced Machine Learning & Cloud, upGrad in collaboration with IIT Madras

- CCNA Certification 2017

Languages

English: Advanced (C1)
Hindi: Intermediate (B1)
Tamil: Proficient (C2)

Timeline

Senior Site Reliability Engineer

Optum, UnitedHealth Group
05.2025 - Current

Site Reliability Engineer

AIRTEL
08.2023 - 04.2025

Member of Technical Staff

BLUESTACK
06.2020 - 07.2023

Support Engineer L2

CSS Corp - Client Google
06.2019 - 06.2024

Desktop Support Engineer

Teams Computer - Vodafone Client
09.2018 - 05.2019

Bachelor of Arts - Bachelor of Computer Application

Sri Krishna Arts & Science College

GCSEs - Accounting

Shri KK Naidu Memorial HR sec School

Certificate of Higher Education - ML & Cloud

IIT Madras With Upgrad