Senior Data Engineer
Summary
	Over 8 years of diversified experience in software design and development, including work as a Big Data Engineer solving business use cases for several clients, with expertise in backend applications.
	Strong experience across the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
	Implemented various frameworks for data pipelines and workflows using HBase, Kafka with Spark/PySpark, Python and Scala.
	Experienced in building automated regression scripts in Python to validate ETL processes across multiple databases such as Oracle, SQL Server, Hive, and MongoDB.
	Experience in setting up monitoring infrastructure for Hadoop cluster using Nagios and Ganglia.
	Experience working with AWS services such as EC2, EMR, S3, KMS, Kinesis, Lambda, API Gateway, and IAM.
	Implemented Kafka producer and consumer applications on a Kafka cluster set up with the help of ZooKeeper.
	Hands-on use of the Spark and Scala APIs to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala.
	Expert knowledge of developing Power BI solutions: configuring data security, assigning licenses, and sharing reports and dashboards with different user groups in the organization.
	Expertise in Python and Scala, including writing user-defined functions (UDFs) for Hive and Pig in Python.
	Experience in designing star and snowflake schemas for data warehouse and ODS architectures.
	Extensively worked with the Teradata utilities FastExport and MultiLoad to export and load data to and from different source systems, including flat files.
	Experience in building data pipelines using Azure Data Factory and Azure Databricks, loading data into Azure Data Lake, Azure SQL Database, and Azure SQL Data Warehouse, and controlling and granting database access.
	Implemented a near-real-time data pipeline using a framework based on Kafka and Spark.
	Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.
	Expertise in designing complex mappings, performance tuning, and working with slowly changing dimension tables and fact tables.
	Participated in the development, improvement, and maintenance of Snowflake database applications.
	Solid ability to query and optimize diverse SQL databases such as MySQL, Oracle, and PostgreSQL, and NoSQL databases such as Apache HBase and Cassandra.
	Good experience in implementing and orchestrating data pipelines using Oozie and Airflow.
	Strong experience in writing scripts using the Python, PySpark, and Spark APIs to analyze data.
	Extensively used Python libraries including PySpark, pytest, PyMongo, cx_Oracle, PyExcel, Boto3, Psycopg, embedPy, NumPy, and Beautiful Soup.
	Experience in working with Flume and NiFi for loading log files into Hadoop.
	Well experienced in normalization and denormalization techniques for optimum performance in relational and dimensional database environments.
	Experience in developing MapReduce programs using Apache Hadoop to analyze big data as required.
	Experience with ETL concepts using Informatica PowerCenter and Ab Initio.
	Working experience with Linux distributions such as Red Hat and CentOS.
	Experienced in creating shell scripts to push data loads from various sources on the edge nodes onto HDFS.
Expectations
As a senior data engineer, I should possess a solid technical background and proficiency with data engineering principles and procedures. Programming skills in Python, SQL, or Java are a must, as is knowledge of data processing frameworks like Apache Spark, Apache Hadoop, or Apache Kafka. Additionally, I should be well-versed in data modeling, ETL (Extract, Transform, Load) procedures, and data integration methods. One of the key responsibilities of a senior data engineer is designing and implementing efficient and scalable data pipelines. This involves understanding the data requirements, working closely with data scientists and analysts to gather their needs, and building robust and reliable pipelines that transform and move data from various sources to the target destinations. I should be comfortable with tools and technologies commonly used for data integration and workflow orchestration, such as Apache Airflow, Luigi, or similar tools.
Employment Preferences
Expected Hourly Rate
** USD/hr
Academic Degree
Experience
Total Professional Experience
Startup Experience
Big-Tech Companies
Enterprise Experience
Skills
- Hadoop
- MapReduce
- Pig
- Hive
- YARN
- Kafka
- Flume
- Sqoop
- Impala
- Oozie
- Zookeeper
- Spark 2.0
- Ambari
- Mahout
- MongoDB
- Cassandra
- Avro
- Storm
- Parquet
- Snappy
- Cloudera
- CDH3
- CDH4
- CDH5
- Hortonworks
- MapR
- Apache
- Java
- Python
- JRuby
- SQL
- HTML
- DHTML
- Scala
- JavaScript
- XML
- C
- C++
- HBase
- Servlets
- JavaBeans
- JSP
- JDBC
- JNDI
- EJB
- Struts
- XSD
- DTD
- JAXP
- SAX
- DOM
- JAXB
- Agile
- Waterfall
- Eclipse
- Ant
- Maven
- IntelliJ
- JUnit
- Log4j
- Spring
- Hibernate
- WebSphere
- WebLogic
- JBoss
- Tomcat
- MySQL
- PL/SQL
- PostgreSQL
- Oracle
- AWS
- Azure
