Professional Summary
Lead Software Engineer with 18+ years of experience architecting distributed streaming and data management systems for exabyte-scale platforms. Deep expertise in stream processing, fault-tolerant architectures, and high-availability transaction frameworks. Delivered 290+ SDE years of efficiency gains, eliminated 4000+ tickets per year, and prevented $60M in data quality incidents. Proven leader in technical strategy, cross-team engineering efforts, and balancing critical trade-offs in complex distributed environments while maintaining exceptional reliability and performance at scale.
Employment
Data Lifecycle Management Platform (2025)
Architected centralized Data Lifecycle Management platform, simplifying development, reducing 4000+ yearly tickets across 20+ teams, and preventing $60M in data quality incidents.
- Architected and evangelized a centralized Data Lifecycle Management Platform for Amazon's exabyte-scale data lake, reducing 4000+ yearly tickets across 20+ teams.
- Designed a fault-tolerant state machine with federated ownership and an event-driven architecture with guaranteed execution semantics, simplifying complex data workflows and preventing $60M in potential data quality incidents.
- Mentored 3 Senior Engineers and established organization-wide architectural standards, earning VP-level endorsement.
Led high-performance lineage system processing billions of relationships daily, reducing processing time from 12+ hours to 8 minutes.
- Led architecture and implementation of a high-performance data lineage system, processing billions of relationships daily with sub-second latency for compliance and operational workflows.
- Integrated a novel Rust-based distributed graph processing engine with stream processing capabilities, reducing data processing time from 12+ hours to 8 minutes and achieving 10x faster queries at scale.
- Designed a real-time streaming refresh pipeline maintaining consistent 50-100ms query latency and implemented zero-downtime deployment for 24/7 availability of mission-critical services.
Led redesign of Data Completeness Platform, addressing 1000+ customer tickets; developed Completeness Time-Travel Debugger, reducing support incidents by 80%.
- Led the redesign of a Data Completeness Platform, addressing systemic issues identified from 1000+ customer tickets and influencing product vision with VP and Sr. Principal Engineers.
- Architected a next-generation system with an append-only segment stream and snapshot-based model, enabling time-travel queries and earning praise for technical excellence.
- Used generative AI to automate analysis of org-wide user-reported tickets, identifying usability gaps in the data lake, and presented a live dashboard and weekly reports on recommended areas of investment to senior leadership.
- Developed a Completeness Debugger tool, reducing support incidents by 80% and MTTR from days to minutes for critical data issues.
- Delivered mission-critical Data Completeness APIs, integrating with a core internal resolution system to process 2M+ daily jobs with zero Sev-2 incidents post-launch, unblocking critical dataset onboarding.
- Provided technical consultation for the migration of all relevant data producers to the new Completeness APIs, significantly reducing operational burden and enhancing scalability.
Architected system processing millions of events per second with sub-100ms latency, saving 60+ SDE years.
- Architected an event-driven streaming dependency resolution and lineage tracking system, processing millions of events per second with sub-100ms latency, saving substantial SDE years across numerous Amazon teams.
- Designed APIs and led the implementation of a comprehensive SDK with high-level constructs, AST, and code generation, reducing integration time from weeks to minutes.
- Ensured system resilience and reliable message delivery during traffic spikes, establishing cross-team architectural standards.
Architected platform processing billions of daily metrics, attributing $1B+ in costs and driving $200M+ annual savings.
- Architected a company-wide cost attribution platform, processing billions of daily metrics for 2000+ auto-onboarded services.
- Designed a reliable calculation engine attributing $1B+ in infrastructure costs with reproducible results and automated error analysis, driving $200M+ in annual infrastructure savings.
- Integrated infrastructure cost metrics into A/B analysis tools, enabling data-driven infrastructure decisions and earning recognition from a Senior Vice President.
Architected data bridge system across 4000+ pipelines, helping Amazon deprecate its Oracle data warehouse while saving 200+ SDE years.
- Architected a high-throughput data bridge system integrating NoSQL/streaming data sources with analytics platforms, facilitating critical migration from Oracle to DynamoDB and reducing infrastructure costs.
- Developed a scalable solution for streaming ingestion, delta aggregation, and transformation across 4000+ pipelines, saving 200+ SDE years.
- Designed a self-service console reducing pipeline creation from days to minutes and implemented a thorough testing framework that reduced production incidents by 80%.
Redesigned EventBus V2 handling multi-million TPS, architected Sagan V2 serving 2000+ services, and optimized systems for $5M+ in annual savings.
- Redesigned EventBus V2, a high-throughput stream processing service handling 10M+ events per second, implementing an in-memory database architecture that reduced complexity by 70% and incidents by 80%.
- Architected Sagan V2, a distributed sequence generation service, achieving microsecond latency and serving 2000+ Amazon services.
- Optimized a legacy ordering storage and routing system, reducing backend requests by 90% and fleet capacity by 40%, resulting in $5M+ annual savings and an ECPS Platform award.
- Architected enterprise web portals for technical problem resolution and business intelligence, extending open-source frameworks with cross-platform compatibility across desktop and mobile devices.
- Led IBM Rational Application Developer projects, creating patented communication tools improving productivity by 2000% and components featured in IBM technical publications.