Eva Agarwal

Summary

Passionate and dedicated developer with 9 years of experience in the data-driven industry. Expertise in optimizing ingestion and computation frameworks for operational efficiency and actionable insights. Strong analytical and problem-solving skills and a deep understanding of database technologies and systems. Demonstrated success in applying AI/ML to data center prediction and pattern matching, identifying relationships, and building solutions to business problems.

Overview

9 years of professional experience

Work History

Lead Data Engineer

Keppel DC&N Singapore
11.2023 - Current
  • Led the implementation of AI/ML solutions across multiple Keppel data centers, enhancing operational efficiency and predictive maintenance capabilities.
  • Automated routine tasks such as monitoring, alerting, and reporting, reducing manual intervention.
  • Deployed IoT edge devices equipped with diverse sensor modules, connecting to a central server IP for data transmission and monitoring. Established communication protocols for real-time data reads, enabling seamless integration with the server infrastructure. Implemented robust monitoring mechanisms to track sensor data streams, ensuring reliability and responsiveness in IoT ecosystem operations.
  • Managed Spark cluster to process massive data files using Scala, implementing efficient algorithms for data processing and analysis. Leveraged Spark DataFrame for reading and transforming data, optimizing performance through caching and partitioning strategies.
  • Implemented sharding and partitioning strategies in PostgreSQL Citus DB to optimize query processing and divide large files for batch and streaming data processing. Leveraged Citus's distributed architecture to shard data across multiple nodes, enabling parallel processing and efficient query execution for improved scalability and performance.
  • Developed and implemented a Grafana dashboard to visualize assets, ensuring real-time monitoring with data updates every 5 seconds.
  • Utilized machine learning libraries to build a prediction model for deriving insights and detecting failures.
  • File formats: Avro, Parquet, blob storage
  • Azure IoT: IoT Hub, Event Hub, IoT Edge modules
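The sharding approach described above can be sketched in miniature: Citus-style distributed PostgreSQL routes each row to a shard by hashing a distribution key, so rows sharing a key always land on the same node and per-key queries run on a single shard. This is an illustrative sketch only; the shard count and key names are hypothetical, not the actual Keppel configuration.

```python
import hashlib

SHARD_COUNT = 4  # hypothetical number of worker shards


def shard_for(distribution_key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution key to a shard deterministically via hashing."""
    digest = hashlib.sha256(distribution_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Rows with the same key always land on the same shard,
# so queries filtered on the key touch a single node.
rows = [("sensor-1", 20.5), ("sensor-2", 21.0), ("sensor-1", 20.7)]
placement = {}
for key, value in rows:
    placement.setdefault(shard_for(key), []).append((key, value))
```

In real Citus the hash ranges are managed by the coordinator, but the routing principle is the same: deterministic key-to-shard placement enables parallel batch and streaming processing across nodes.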

Data Engineer

NCS (Client: DBS), Singapore
10.2020 - 10.2023
  • Created, built, and maintained the data infrastructure required for ingestion, computation, extraction, transformation, and loading of data from a wide variety of sources.
  • Delivered batch processing and real-time data management, data governance, and end-to-end analysis for control-tower dashboards, including solution design.
  • Developed solutions using Apache Kafka, Python, Spark, Hadoop, and HDFS.
  • Proficient with tools such as Jenkins pipelines, Sparkola, Theia, Jira, Bitbucket, Postman, Superset, and in-built AI platforms for streaming-data monitoring.
  • Strong understanding of SQL and MariaDB; integrated different source systems using ELK and Kafka topics to achieve fast search responses.
  • Performed technical analysis, solution design, development, unit testing, code review, architecture, and documentation; engaged with QA to prepare infrastructure for integration and load-testing activities.
  • Developed solutions for SIT and UAT testing activities and led demonstration sessions with business stakeholders.
  • Utilized Python libraries such as Pandas, NumPy, and Requests for data manipulation and API integration
  • Leveraged PySpark's distributed processing capabilities to handle large volumes of data efficiently.
  • Employed SQL databases (e.g., MySQL) for data storage and retrieval.
  • Implemented scheduling tools such as Apache Airflow to automate data-pulling processes.
  • Designed and developed scalable Python scripts and PySpark jobs for automated data extraction from various sources.
  • Implemented data cleaning and transformation techniques to ensure data quality and consistency.
  • Collaborated with cross-functional teams to understand data requirements and optimize data pulling processes.
  • Created and maintained data pipelines, incorporating error handling, logging, and monitoring mechanisms.
  • Worked closely with stakeholders to identify and address performance bottlenecks, resulting in a significant reduction in data pulling time.
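The pipeline pattern in the bullets above — extraction with error handling, logging, and data-quality cleaning — can be sketched as a minimal sample, assuming hypothetical record shapes; the production jobs used PySpark and Airflow rather than plain Python.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def with_retries(step, attempts=3):
    """Run a pipeline step, logging each failure and retrying."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise


def clean(records):
    """Drop malformed rows and normalise fields (data-quality step)."""
    return [
        {"id": r["id"], "value": float(r["value"])}
        for r in records
        if "id" in r and "value" in r
    ]


# Hypothetical raw extract: one row is missing its id and is dropped.
raw = [{"id": 1, "value": "3.5"}, {"value": "x"}, {"id": 2, "value": "4"}]
cleaned = with_retries(lambda: clean(raw))
```

Wrapping each step in a retry-with-logging helper is what makes failures visible to monitoring instead of silently truncating the load.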

Data Analyst

NIF
01.2017 - 04.2019
  • Applied data modeling, data cleaning, data wrangling, and data enrichment skills; established new data-processing procedures.
  • Quality assurance, validation and data linkage: ensured that data and models were managed and documented according to quality standards and procedures.
  • Designed and executed 52 bespoke tests of models and software, using various libraries, for unique traditional medicinal plants.
  • Continuously improved data preparation process through scripting and automation.

Bioinformatics Analyst

BioInnovations
09.2015 - 11.2016
  • Researched and adopted new technologies to add value to existing offerings.
  • Assessed data modeling and statistics to integrate high-level business processes with data rules.
  • Devised and implemented processes and procedures to streamline operations.
  • Leveraged big data technologies to manage large datasets efficiently while maintaining high levels of performance.
  • Developed new analytical models that improved forecasting accuracy and reduced risk exposure.
  • Increased efficiency by streamlining data analysis processes and implementing automation tools.

Education

Master of Technology

Banasthali Vidyapith
07.2015

Skills

  • Databases & Tools: Azure cloud, Azure Storage Explorer, Databricks, MySQL/MariaDB, HeidiSQL, RDBMS, PostgreSQL, Citus, DBeaver, sharding & partitioning, Git, IoT Edge modules & devices, Apache Spark clusters, Kafka, Kubernetes, Docker, Grafana, Ansible, Pandas, NumPy, scikit-learn, Matplotlib, Seaborn, Plotly, Cufflinks, Keras, TensorFlow, Apache Beam pipelines
  • Languages & Frameworks: Python, Dask, Polars, Rust, SQL, Scala, PySpark
  • Statistical Methods : Regression/Classification analysis, Time Series, RNN, LSTM
  • Real-time Analytics, Data Engineering & Visualisation
  • Worked on: User Acceptance Testing (UAT), SIT, staging, and PROD environments
  • MLOps & Big Data

Projects

Project Title: AI/ML-Based Time Series Analysis for Data Center Prediction and Pattern Matching

Keppel DC&N division

  • Developed and deployed Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) models to perform time series analysis for predicting data center performance and identifying operational patterns.
  • Implemented batch and streaming data processing pipelines to handle data from both staging and production environments, ensuring real-time and accurate model predictions.
  • Built and maintained Apache Beam pipelines to process and transform data post-AI/ML analysis, enabling efficient data flow and integration into downstream systems.
  • Utilized Python and key ML libraries (TensorFlow, Keras) for model development and training on extensive datasets.
  • Integrated ML models into data center operations for proactive maintenance, leading to a 25% reduction in downtime and a 20% increase in efficiency.
  • Conducted thorough testing and validation of models, achieving high accuracy and robustness in predictions.
  • Presented findings and insights to stakeholders, demonstrating the value of ML-driven predictions in improving data center operations.
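The time-series modelling above rests on a standard preparation step: slicing the sensor history into sliding windows, where each window of past readings becomes one input sequence and the next reading its target. A minimal sketch, assuming a hypothetical window length; the actual models were built in TensorFlow/Keras.

```python
def make_windows(series, window=3):
    """Split a series into (input_window, next_value) training pairs
    for a sequence model such as an RNN/LSTM."""
    pairs = []
    for i in range(len(series) - window):
        pairs.append((series[i:i + window], series[i + window]))
    return pairs


# Hypothetical temperature readings from a data center sensor.
readings = [20.1, 20.3, 20.6, 21.0, 21.4, 21.9]
pairs = make_windows(readings, window=3)
# Each pair: 3 past readings -> the reading that follows them.
```

The window length trades context against training-set size: longer windows let the LSTM see longer patterns but yield fewer training pairs from the same history.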
