Eva Agarwal

Summary

Passionate and dedicated developer with 9 years of experience in the data-driven industry. Expertise in optimizing ingestion and computation frameworks for operational efficiency and actionable insights. Strong analytical and problem-solving skills and a deep understanding of database technologies and systems. Demonstrated success in applying AI/ML to data center prediction and pattern matching, identifying relationships, and building solutions to business problems.

Overview

9 years of professional experience

Work History

Lead Data Engineer

Keppel DC&N Singapore
11.2023 - Current
  • Led the implementation of AI/ML solutions across multiple Keppel data centers, enhancing operational efficiency and predictive maintenance capabilities.
  • Automated routine tasks such as monitoring, alerting, and reporting, reducing manual intervention.
  • Deployed IoT edge devices equipped with diverse sensor modules, connecting to a central server IP for data transmission and monitoring. Established communication protocols for real-time data reads, enabling seamless integration with the server infrastructure. Implemented robust monitoring mechanisms to track sensor data streams, ensuring reliability and responsiveness in IoT ecosystem operations.
  • Managed Spark cluster to process massive data files using Scala, implementing efficient algorithms for data processing and analysis. Leveraged Spark DataFrame for reading and transforming data, optimizing performance through caching and partitioning strategies.
  • Implemented sharding and partitioning strategies in PostgreSQL Citus DB to optimize query processing and divide large files for batch and streaming data processing. Leveraged Citus's distributed architecture to shard data across multiple nodes, enabling parallel processing and efficient query execution for improved scalability and performance.
  • Developed and implemented a Grafana dashboard to visualize assets, ensuring real-time monitoring with data updates every 5 seconds.
  • Utilized machine learning libraries to build a prediction model for deriving insights and detecting failures.
  • File formats: Avro, Parquet, blob storage
  • Azure IoT: IoT Hub, Event Hub, IoT Edge modules
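The sharding approach described above can be sketched in miniature: Citus-style distributed PostgreSQL routes each row to a shard by hashing a distribution key, so rows sharing a key always land on the same node and per-key queries run on a single shard. This is an illustrative sketch only; the shard count and key names are hypothetical, not the actual Keppel configuration.

```python
import hashlib

SHARD_COUNT = 4  # hypothetical number of worker shards


def shard_for(distribution_key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution key to a shard deterministically via hashing."""
    digest = hashlib.sha256(distribution_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Rows with the same key always land on the same shard,
# so queries filtered on the key touch a single node.
rows = [("sensor-1", 20.5), ("sensor-2", 21.0), ("sensor-1", 20.7)]
placement = {}
for key, value in rows:
    placement.setdefault(shard_for(key), []).append((key, value))
```

In real Citus the hash ranges are managed by the coordinator, but the routing principle is the same: deterministic key-to-shard placement enables parallel batch and streaming processing across nodes.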

Data Engineer

NCS (Client: DBS), Singapore
10.2020 - 10.2023
  • Created, built, and maintained the data infrastructure required for ingestion, computation, extraction, transformation, and loading of data from a wide variety of sources.
  • Delivered batch processing and real-time data management, data governance, and end-to-end analysis for control-tower dashboards, including solution design.
  • Developed solutions using Apache Kafka, Python, Spark, Hadoop, and HDFS.
  • Proficient with tools such as Jenkins pipelines, Sparkola, Theia, Jira, Bitbucket, Postman, Superset, and in-built AI platforms for streaming-data monitoring.
  • Strong understanding of SQL and MariaDB; integrated different source systems using ELK and Kafka topics to achieve fast search responses.
  • Performed technical analysis, solution design, development, unit testing, code review, architecture, and documentation; engaged with QA to prepare infrastructure for integration and load-testing activities.
  • Developed solutions for SIT and UAT testing activities and led demonstration sessions with business stakeholders.
  • Utilized Python libraries such as Pandas, NumPy, and Requests for data manipulation and API integration
  • Leveraged PySpark's distributed processing capabilities to handle large volumes of data efficiently.
  • Employed SQL databases (e.g., MySQL) for data storage and retrieval.
  • Implemented scheduling tools such as Apache Airflow to automate data-pulling processes.
  • Designed and developed scalable Python scripts and PySpark jobs for automated data extraction from various sources.
  • Implemented data cleaning and transformation techniques to ensure data quality and consistency.
  • Collaborated with cross-functional teams to understand data requirements and optimize data pulling processes.
  • Created and maintained data pipelines, incorporating error handling, logging, and monitoring mechanisms.
  • Worked closely with stakeholders to identify and address performance bottlenecks, resulting in a significant reduction in data pulling time.
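The pipeline pattern in the bullets above — extraction with error handling, logging, and data-quality cleaning — can be sketched as a minimal sample, assuming hypothetical record shapes; the production jobs used PySpark and Airflow rather than plain Python.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def with_retries(step, attempts=3):
    """Run a pipeline step, logging each failure and retrying."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise


def clean(records):
    """Drop malformed rows and normalise fields (data-quality step)."""
    return [
        {"id": r["id"], "value": float(r["value"])}
        for r in records
        if "id" in r and "value" in r
    ]


# Hypothetical raw extract: one row is missing its id and is dropped.
raw = [{"id": 1, "value": "3.5"}, {"value": "x"}, {"id": 2, "value": "4"}]
cleaned = with_retries(lambda: clean(raw))
```

Wrapping each step in a retry-with-logging helper is what makes failures visible to monitoring instead of silently truncating the load.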

Data Analyst

NIF
01.2017 - 04.2019
  • Applied data modeling, data cleaning, data wrangling, and data enrichment skills; established new data-processing procedures.
  • Quality assurance, validation and data linkage: ensured that data and models were managed and documented according to quality standards and procedures.
  • Designed and executed 52 bespoke tests of models and software, using various libraries, for unique traditional medicinal plants.
  • Continuously improved data preparation process through scripting and automation.

Bioinformatics Analyst

BioInnovations
09.2015 - 11.2016
  • Researched and adopted new technologies to add value to existing offerings.
  • Assessed data modeling and statistics to integrate high-level business processes with data rules.
  • Devised and implemented processes and procedures to streamline operations.
  • Leveraged big data technologies to manage large datasets efficiently while maintaining high levels of performance.
  • Developed new analytical models that improved forecasting accuracy and reduced risk exposure.
  • Increased efficiency by streamlining data analysis processes and implementing automation tools.

Education

Master of Technology

Banasthali Vidyapith
07.2015

Skills

  • Databases & Tools: Azure cloud, Azure Storage Explorer, Databricks, MySQL/MariaDB, HeidiSQL, RDBMS, PostgreSQL, Citus, DBeaver, sharding & partitioning, Git, IoT Edge modules & devices, Apache Spark clusters, Kafka, Kubernetes, Docker, Grafana, Ansible, Pandas, NumPy, scikit-learn, Matplotlib, Seaborn, Plotly, Cufflinks, Keras, TensorFlow, Apache Beam pipelines
  • Languages & Frameworks: Python, Dask, Polars, Rust, SQL, Scala, PySpark
  • Statistical Methods : Regression/Classification analysis, Time Series, RNN, LSTM
  • Real-time Analytics, Data Engineering & Visualisation
  • Worked on: User Acceptance Testing (UAT), SIT, staging, and PROD environments
  • MLOps & Big Data

Projects

Project Title: AI/ML-Based Time Series Analysis for Data Center Prediction and Pattern Matching

Keppel DC&N division

  • Developed and deployed Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) models to perform time series analysis for predicting data center performance and identifying operational patterns.
  • Implemented batch and streaming data processing pipelines to handle data from both staging and production environments, ensuring real-time and accurate model predictions.
  • Built and maintained Apache Beam pipelines to process and transform data post-AI/ML analysis, enabling efficient data flow and integration into downstream systems.
  • Utilized Python and key ML libraries (TensorFlow, Keras) for model development and training on extensive datasets.
  • Integrated ML models into data center operations for proactive maintenance, leading to a 25% reduction in downtime and a 20% increase in efficiency.
  • Conducted thorough testing and validation of models, achieving high accuracy and robustness in predictions.
  • Presented findings and insights to stakeholders, demonstrating the value of ML-driven predictions in improving data center operations.
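The time-series modelling above rests on a standard preparation step: slicing the sensor history into sliding windows, where each window of past readings becomes one input sequence and the next reading its target. A minimal sketch, assuming a hypothetical window length; the actual models were built in TensorFlow/Keras.

```python
def make_windows(series, window=3):
    """Split a series into (input_window, next_value) training pairs
    for a sequence model such as an RNN/LSTM."""
    pairs = []
    for i in range(len(series) - window):
        pairs.append((series[i:i + window], series[i + window]))
    return pairs


# Hypothetical temperature readings from a data center sensor.
readings = [20.1, 20.3, 20.6, 21.0, 21.4, 21.9]
pairs = make_windows(readings, window=3)
# Each pair: 3 past readings -> the reading that follows them.
```

The window length trades context against training-set size: longer windows let the LSTM see longer patterns but yield fewer training pairs from the same history.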
