Played a key role as a backend engineer for large-scale Android application stacks, ensuring their reliability and performance. Collaborated with cross-functional teams to optimize the backend infrastructure and enhance system resilience.
Demonstrated expertise in managing mission-critical applications in large-scale environments, ensuring high availability and fault tolerance. Implemented robust monitoring systems and KPIs to proactively identify and address performance bottlenecks and potential failures.
Utilized strong system troubleshooting skills to diagnose and resolve complex issues, working closely with team members and developers to implement effective solutions. Conducted root cause analysis and documented troubleshooting steps to enhance system stability and reduce downtime.
Automated routine tasks to improve operational efficiency and reduce manual effort. Developed scripts and tools for configuration management, deployment, and monitoring, resulting in streamlined processes and increased productivity.
Applied network security knowledge to handle incident response, investigation, and remediation. Collaborated with security teams to identify and mitigate vulnerabilities, ensuring the confidentiality, integrity, and availability of systems and data.
Demonstrated strong project management abilities by successfully leading and coordinating projects. Managed project timelines, resources, and deliverables to ensure timely and successful implementation of system improvements and upgrades.
Worked autonomously and collaboratively within a team environment, effectively communicating with stakeholders, developers, and other team members. Provided clear and concise documentation, reports, and status updates to convey technical information to both technical and non-technical audiences.
Overview
9 years of professional experience
Work History
Senior Site Reliability Engineer
Huawei Singapore
12.2022 - Current
Collaborated with app developers to understand their requirements and design infrastructure solutions that met their needs, ensuring seamless integration and optimal performance.
Set up and maintained backend application clusters and middleware stacks (MySQL, Redis, and Kafka) to support high availability and scalability.
Containerized application deployments using Docker and Kubernetes, enabling efficient and scalable deployment of microservices.
Configured load balancing for application servers using Nginx as a reverse proxy, ensuring high availability and efficient distribution of traffic.
Performed database DDL/DML operations for production changes, ensuring data integrity and minimal disruption to services. Collaborated with developers to optimize database performance and implement efficient query strategies.
Tested and deployed new microservice code through Infrastructure as Code (IaC) practices, ensuring consistency, reproducibility, and efficient deployment processes.
Implemented log monitoring for microservices in a large-scale environment using the ELK (Elasticsearch, Logstash, Kibana) stack. Developed monitoring dashboards to gain real-time insights into system performance, troubleshoot issues, and optimize resource utilization.
Conducted comprehensive disaster testing to validate system resilience and recovery processes, ensuring business continuity and minimizing potential downtime.
Developed monitoring dashboards and automated tasks through shell scripting, improving operational efficiency and enabling proactive monitoring and maintenance.
Troubleshot production issues within defined Service Level Agreements (SLAs), monitored key performance indicators (KPIs), and collaborated with teams to investigate root causes and implement preventive measures.
Fostered collaboration with team members to achieve target results, including cost optimization initiatives, resource management, and performance optimization. Encouraged knowledge sharing and cross-functional collaboration to drive innovation and operational excellence.
Lead Engineer
7-Eleven
10.2022 - 11.2022
Led a team of engineers in the successful deployment and management of New Relic systems for monitoring, observability, and performance management of critical applications.
Implemented streamlined deployment processes, leveraging CI/CD pipelines to automate building, testing, and deployment of code changes, reducing deployment time and improving release consistency.
Conducted comprehensive code reviews and implemented coding standards, ensuring high-quality code and identifying potential issues, security vulnerabilities, and performance bottlenecks.
Utilized New Relic's monitoring capabilities to gain real-time insights into application performance, resource usage, and error rates. Set up customized alerts and notifications to proactively detect and resolve issues.
Collaborated with cross-functional teams, including developers and operations, to optimize application performance. Analyzed performance metrics, identified bottlenecks, and implemented targeted optimizations, resulting in improved response times.
Developed and executed capacity planning strategies using New Relic's capacity planning tools. Monitored system utilization, forecasted resource requirements, and recommended scaling measures, ensuring optimal system performance and availability.
Ensured adherence to security practices and compliance standards during deployment processes. Implemented access controls, encryption, and authentication mechanisms, and regularly audited security configurations to mitigate vulnerabilities and maintain compliance.
Technical Solution Lead
Huawei International Pte Ltd
12.2019 - 10.2022
Worked closely with app developers to understand their requirements and design infrastructure solutions that met their needs, ensuring seamless integration and optimal performance.
Set up and maintained backend application clusters, middleware stacks (MySQL, Redis, and Kafka), and load-balanced application servers using Nginx as a reverse proxy. Implemented robust architectures to support high availability and scalability.
Executed database DDL/DML operations for production changes, ensuring data integrity and minimal disruption to services. Collaborated with developers to optimize database performance and implement efficient query strategies.
Leveraged Infrastructure as Code (IaC) practices to test and deploy new microservice code, ensuring consistency and reproducibility across environments. Implemented automation scripts for streamlined deployments and increased development efficiency.
Implemented log monitoring for microservices in a large-scale environment using the ELK (Elasticsearch, Logstash, Kibana) stack. Designed and configured dashboards to gain real-time insights into system performance, troubleshoot issues, and optimize resource utilization.
Conducted comprehensive disaster testing to validate system resilience and recovery processes. Collaborated with cross-functional teams to identify potential failure scenarios, simulate disaster events, and ensure business continuity.
Developed monitoring dashboards to track key performance indicators (KPIs) and automated tasks using shell scripting. Improved operational efficiency by implementing proactive monitoring and automating routine maintenance tasks.
Proactively troubleshot and resolved production issues within defined Service Level Agreements (SLAs), minimizing downtime and ensuring a high level of availability. Collaborated with teams to investigate root causes, implement preventive measures, and continuously improve system stability.
Fostered collaboration with team members to achieve target results, including cost optimization initiatives, resource management, and performance optimization. Encouraged knowledge sharing and cross-functional collaboration to drive innovation and operational excellence.
Senior System Engineer
Cerner Corporation
12.2016 - 12.2019
Provisioned Linux and Windows nodes for health systems, ensuring smooth operation and system availability.
Conducted production system patching through Yum, ensuring security updates and bug fixes were applied in a timely manner.
Performed Windows server vulnerability patching to maintain a secure infrastructure and protect against potential threats.
Managed SSL certificate updates for Elastic Load Balancers (ELBs), ensuring secure communication and compliance with industry standards.
Provided support for Linux and Windows servers running on bare metal and virtual environments, including VMware. Performed administration tasks, troubleshooting, and performance tuning to ensure optimal system performance.
Administered infrastructure tools such as Chef, Zabbix, and Jenkins, ensuring their reliability and effective utilization within the infrastructure.
Troubleshot and resolved system errors through fault-finding, analysis, and inspections. Collaborated with cross-functional teams to identify root causes and implement appropriate solutions.
Built proof-of-concept (PoC) environments on AWS, leveraging its services to showcase the feasibility and benefits of proposed solutions.
Senior System Engineer
Capgemini India Ltd
07.2014 - 12.2016
Monitored backup jobs for mission-critical servers, ensuring data integrity, completion, and compliance with backup policies and schedules.
Designed cabling diagrams for new disaster recovery (DR) setups at data centers, ensuring optimal connectivity and reliable backup and recovery processes.
Set up and managed Windows, Linux, and database backups, implementing appropriate backup strategies and schedules to ensure data protection and availability.
Built VMware environments for new customers, configuring virtual machines, storage, and networking to meet customer requirements and ensure optimal performance.
Created servers as per customer requirements, deploying and configuring both Windows and Linux systems, including installation of necessary software and services.
Managed servers in both Windows and Linux environments, performing routine maintenance, monitoring system performance, and addressing issues to ensure high availability and performance.
Led the recovery of data in the event of a disaster or complete data loss, implementing recovery processes and procedures to minimize downtime and data loss.
Senior Business Development Manager at Huawei International Singapore Pte Ltd