JAGAN KUMAR

Bengaluru

Summary

Big Data Engineer and Administrator with 7 years of experience, specializing in ETL processes and tools including Cloudera Data Science Workbench (CDSW), Neo4j, Unravel, Cloudera Data Platform (CDP), Cloudera Distributed Hadoop (CDH), Hortonworks Data Platform (HDP), Apache Airflow, and cloud services such as AWS and GCP. Strong knowledge of Hadoop administration and a deep understanding of Hadoop ecosystem components such as HDFS, MapReduce, YARN, Pig, Sentry, Sqoop, Hive, HBase, Oozie, Zookeeper, and Ranger. Self-motivated, responsible professional with excellent initiative, teamwork, and communication skills.

Overview

8 years of professional experience

Work History

Big Data Engineer

Tookitaki
03.2024 - Current
  • Designed, developed, and maintained ETL data flows using shell scripts to automate data extraction, transformation, and loading processes for large-scale datasets.
  • Implemented and optimized ETL workflows in cloud environments, leveraging services in AWS (S3, Lambda, Glue) and GCP (BigQuery, Cloud Storage, Dataflow).
  • Created and managed DAGs in Apache Airflow to schedule, monitor, and optimize ETL workflows, ensuring timely data availability for downstream processes.
  • Automated recurring ETL tasks and optimized shell scripts for better performance and resource utilization.
  • Developed robust mechanisms for data quality checks, ensuring accuracy and consistency across data pipelines.
  • Worked closely with data analysts, data scientists, and stakeholders to understand requirements and deliver reliable, scalable ETL solutions.
  • Monitored ETL processes for failures or delays, proactively identifying and resolving issues to ensure system reliability.
  • Maintained detailed documentation for ETL workflows, shell scripts, and cloud integration processes to support knowledge transfer and team collaboration.
  • Ensured adherence to data governance, compliance policies, and security best practices in all ETL processes.
  • Actively used GitHub for version control, pushing ETL scripts and workflows to repositories, managing branches, and collaborating with team members on code reviews and deployments.
  • Worked with relational databases such as Oracle, MySQL, and PostgreSQL for data integration.
  • Handled file formats such as CSV, JSON, XML, and Parquet for data exchange.
  • Used UNIX/Linux environments for ETL workflow development and job orchestration.
  • Integrated ETL flows with Hadoop ecosystems and cloud data platforms.

Sr. Software Engineer

Harman International
03.2020 - 09.2021
  • Upgraded cluster from CDH to CDP.
  • Installed Hadoop patches and performed version upgrades as required.
  • Conducted performance tuning of Hadoop clusters and jobs for optimized efficiency.
  • Performed data balancing across clusters to ensure uniform distribution.
  • Set up Cloudera Data Platform (CDP) clusters from scratch.
  • Imported and exported data into HDFS using Sqoop for data migration and integration.
  • Analyzed system failures, identified root causes, and recommended appropriate solutions.
  • Created and updated policies in Apache Ranger, including adding users to maintain data security.
  • Documented system processes and procedures for future reference.
  • Participated in end-to-end Hadoop cluster setup, including installation, configuration, and monitoring.
  • Built ETL pipelines and created/configured Autosys jobs for workflow management and automation.
  • Developed Python scripts for data validation and cluster monitoring, including triggering Hive and Spark jobs and monitoring Hadoop services.

Technical Associate

Teradata Corporation
03.2020 - 09.2021
  • Responsible for Hadoop cluster maintenance, including commissioning and decommissioning of DataNodes, cluster monitoring, and troubleshooting.
  • Managed and reviewed data backups to ensure data integrity and availability.
  • Added and removed nodes to/from existing Hortonworks Data Platform (HDP) Hadoop clusters.
  • Defined job flows using Oozie for workflow automation.
  • Managed and reviewed Hadoop log files for troubleshooting and administration purposes.
  • Communicated and escalated issues appropriately to ensure timely resolution.
  • Created and updated policies in Apache Ranger, including adding users to policies as requested, to maintain data security.
  • Followed standard backup policies to ensure high availability of the cluster.
  • Analyzed system failures, identified root causes, and recommended appropriate courses of action.
  • Documented system processes and procedures for future reference.
  • Monitored multiple Hadoop cluster environments to ensure performance and stability.
  • Monitored workload and job performance and performed capacity planning using Hortonworks tooling.
  • Worked with tools such as Pepperdata and Datameer for performance monitoring and data analysis.

Systems Engineer

Tata Consultancy Services
03.2017 - 03.2020
  • Set up and managed end-to-end Hadoop cluster, including installation, configuration, and continuous monitoring.
  • Decommissioned and commissioned nodes on running Cloudera Distributed Hadoop (CDH) cluster.
  • Installed and configured Hadoop ecosystem tools such as Sqoop, Flume, HBase, Zookeeper, and Oozie.
  • Managed and reviewed Hadoop log files for troubleshooting and performance optimization.
  • Conducted performance tuning for Hadoop clusters and jobs, including data balancing across clusters.
  • Imported and exported data into HDFS using Sqoop for seamless data migration.
  • Analyzed system failures, identified root causes, and recommended appropriate corrective actions.
  • Documented system processes and procedures for future reference.
  • Installed Hadoop patches and performed version upgrades as needed.
  • Worked with tools such as Cloudera Data Science Workbench, Neo4j, Geneos, Control-M, and Unravel.
  • Performed metadata backups for Hadoop using snapshots.
  • Created and assigned role-based access control using Sentry to ensure data security.
  • Collaborated with infrastructure, network, and application teams to ensure high data quality and availability.
  • Transferred data between clusters as part of migration and replication processes.
  • Troubleshot and resolved issues related to application usage, HDFS, and MapReduce jobs, providing RCA documentation.
  • Supported application users by addressing issues raised in ServiceNow tickets and JIRA.
  • Worked with version control systems, including Git and Bitbucket, for code management, commits, and deployments.

Education

Master of Science - Control and Automation

NIT Rourkela
Rourkela, India
05.2016

Bachelor of Technology - Electrical and Electronics Engineering

JNTU-Kakinada
Kakinada, India
04.2013

Skills

    • Big Data frameworks: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Ranger, Spark, PySpark, Impala, Sentry, Hue, Oozie
    • Data platforms: Cloudera Data Platform (CDP), Cloudera Distributed Hadoop (CDH), Hortonworks Data Platform (HDP)
    • Operating systems: Windows, macOS, Linux, Ubuntu
    • Scripting and programming: Ansible, Shell scripting, Python
    • Project and version control tools: MS Office, MS Project, MS Visio, MS Visual Studio, PowerPoint, Git, GitHub, Bitbucket, GitLab, AWS CodeCommit
    • CI/CD and containerization: Jenkins, Docker
    • Cloud platforms: AWS, GCP
    • Scheduling and monitoring: Control-M, Autosys, Geneos, Grafana
    • Ticketing and workflow management: Jira, ServiceNow, Symphony SummitAI, Apache Airflow
    • ETL processes
