Solid hands-on experience with big data technologies such as Apache Hadoop, Spark, Hive, and Kafka.
Specialized in using AWS services such as S3, Redshift, EMR, Lambda, Glue, and EC2 to build scalable and secure cloud data architectures.
Extensive exposure to data ingestion involving large or near-real-time datasets, with a focus on high data quality, increased pipeline robustness, and reduced need for human intervention.
Solid experience building batch, real-time, and near-real-time data ingestion pipelines with Apache Kafka and Apache Spark Streaming using Python, Java, and Scala.
Proven experience designing, developing, testing, building, and deploying big data and machine learning solutions that handle petabyte-scale data using Python, Java, and Scala.
Good exposure to Azure cloud technologies such as ADLS, Synapse Analytics, Azure Databricks, ADF, Azure Functions, Key Vault, Azure ML, Logic Apps, and AKS.
Hands-on experience writing HiveQL and Spark SQL queries to query and analyze large volumes of data.
Hands-on experience with Hive joins, partitions, bucketing, and transactions.
Worked with various file formats, including flat files, images, GIS, Avro, ORC, Parquet, XML, and JSON.
Solid hands-on experience with containerization and orchestration using Docker and Kubernetes on various cloud platforms and on premises.
Diligent Data Engineer with a robust background in data engineering and a proven ability to design and implement complex data pipelines. Successfully contributed to optimizing data architecture and enhancing data processing efficiency. Demonstrated expertise in big data technologies and proficiency in Python and SQL.
Apache Hadoop
Apache Spark
Kafka
Hive
HDFS
YARN
Python
Java
Scala
MySQL
Oracle
SQL Server
HBase
Cassandra
MongoDB
AWS
MS Azure