Software Engineer

General Software Development, San Jose, California, United States

Summary

The team focuses on improving read/write throughput for heterogeneous data sources and sinks, reducing latencies and bounding upper-percentile (tail) latencies when communicating with data stores, adding new query optimization rules, and choosing better default memory configurations for applications, JVMs, and the OS
Extended the benchmark tool to support benchmarking ad-hoc queries regardless of the benchmark type
Modified our tool to submit Spark jobs in a single session (a single EMR step) on EMR, reducing total execution time from 2 hours to 30 minutes
Added Spark configurations (executor and driver) to better handle out-of-memory (OOM) errors and JVM crashes, and improved logging to make debugging easier when an error occurs
Benchmarked outlier queries, investigated their causes (e.g., S3 throttling), and measured the resulting improvements
Designed and implemented a CI/CD pipeline on AWS Fargate using the AWS CDK, enabling integration tests to run automatically on every code commit
Merged all new open-source Spark (OSS) commits into our fork of Spark to pick up the latest bug fixes and improvements