Data Engineer candidate in Oklahoma City, Oklahoma, United States

Summary

Automated web scraping from various online sources and performed data cleaning and data analysis.
Collaborated with developers to design Access ETL, data pipelines, and database architecture using data warehousing concepts.
Worked extensively with Alteryx, Informatica, AWS Glue, and SnapLogic to build ETL data workflows, push data into Snowflake and Databricks data lakes, and create data visualizations in Power BI.
Experienced in building ETL data pipelines using Sqoop, processing data with Hive, Spark, and Scala, and storing it in Parquet, Avro, CSV, and ORC formats in HDFS; also worked with NoSQL databases such as HBase and Neo4j.
Good experience working on the Azure cloud platform (Azure Synapse, Azure SQL, Azure Databricks, Azure Data Factory, Azure Data Lake, and Blob Storage).
Designed and optimized Spark SQL queries and DataFrames: imported data from data sources, performed transformations in Spark, Python, and dbt, performed read/write operations, and saved the results to output directories in HDFS/Azure.
Developed PySpark scripts using both DataFrames/SQL/Datasets and RDD/MapReduce for data aggregation queries, and wrote data back into the OLTP system through Sqoop.
Worked extensively with analytics tools such as Alteryx, Everstream Analytics, and ThoughtSpot.
Responsible for creating data pipelines from scratch for marketing data in a hybrid cloud where the data warehouse and reporting layers are distributed across AWS and GCP.
Advanced data retrieval experience with SQL in SQL Server and SQL Workbench, and with dbt in Snowflake.
Worked on data pre-processing and cleaning, and applied data imputation techniques for missing values in datasets using Python Dask.
Good exposure to working with streaming data using Kafka and Spark Streaming.
Highly experienced working with databases such as Oracle, MySQL, SQL Server, DynamoDB, PostgreSQL, Neo4j, MongoDB, MS Access, Cassandra, Cosmos DB, and Couchbase.
Good experience in creating Docker images and Kubernetes (k8s) clusters and managing them using Terraform.
Worked with Python for data manipulation, writing functions, and performing statistical operations.
Created Airflow jobs with the Snowflake operator to create fact and dimension tables in the Snowflake warehouse.
Created dataflows using Python Prefect for managing, monitoring, and scheduling workflows in the Prefect UI.
Stored and retrieved data from data warehouses using Snowflake, and integrated Snowflake with Power BI and Tableau for dashboards and visualizations.
Well versed in Microsoft Business Intelligence tools, creating SSIS (Integration Services), SSRS (Reporting Services), and SSAS (Analysis Services) packages.
Worked on data extraction from APIs, data munging, Azure, data modeling, and loading data into Snowflake.
Created macros and used existing macros to develop SAS programs for Health Care clinical data analysis.
Identified trends and opportunities using data analysis and conducted statistical analysis to support business growth and avoid late payments.
Developed Stored Procedures to generate various Drill-through reports, Parameterized reports, Tabular reports, and Matrix reports using SSRS.
Created various repositories and implemented CI/CD using Git and Jenkins.
Converted MapReduce programs into Spark transformations using Spark RDDs in PySpark. Used Spark to improve the performance of existing Hadoop algorithms and optimize them, using SparkContext, Spark SQL, DataFrames, and pair RDDs.
Worked with PySpark, using Spark libraries through Python scripting for data analysis.
Created interactive dashboards and reports using data visualization/BI tools (such as Power BI, Tableau, Spotfire, QlikView, SAP Lumira, ClicData, Cognos, and Business Objects) to provide savings metrics, finance performance, KPI (Key Performance Indicator) metrics, and operational reporting to managers, business owners, and customers.
Developed and scheduled Python scripts to automate data formatting.
Ensured data integrity by checking for completeness, duplication, accuracy, and consistency.

Expectations

As organizations deal with massive amounts of data on a day-to-day basis, their data problems are growing too. I can help with maintaining, migrating, storing, and bulk loading data; providing data access to other teams and clients; performing data transformations (ELT); working with BI tools on data warehouse datasets; and analyzing data using PySpark, Python Dask, and SQL. I am looking to work on the modern data stack and data infrastructure, maintain data quality, and solve business problems.