Jatin Kumar

Seattle, WA

jatinvarshney

Professional Summary

Lead Software Engineer with 18+ years architecting distributed streaming and data management systems for exabyte-scale platforms. Deep expertise in stream processing, fault-tolerant architectures, and high-availability transaction frameworks. Delivered 290+ SDE-years of efficiency gains, cut ticket volume by 4000+ per year, and prevented $60M in data quality incidents. Proven leader in technical strategy and cross-team engineering efforts, balancing critical trade-offs in complex distributed environments while maintaining exceptional reliability and performance at scale.

Technical Skills

Languages:

  • Java
  • Python
  • Scala
  • C++
  • Rust
  • SQL
  • PartiQL
  • Smithy
  • TypeScript

Architecture & Systems:

  • Distributed Systems
  • Fault-Tolerant Design
  • Scalable Architecture
  • Event-Driven Systems
  • Microservices
  • Metadata Management
  • Transaction Processing
  • Data Lifecycle Management
  • Multi-threading
  • Memory Management
  • Networking
  • Storage
  • Caching

Streaming, Big Data & Cloud:

  • AWS
  • Apache Flink
  • Apache Kafka
  • Apache Spark
  • EMR
  • Parquet/ORC
  • Kinesis
  • SQS
  • EventBridge
  • Kubernetes
  • Docker
  • Redis/MemoryDB
  • Cassandra
  • DynamoDB
  • S3
  • Elasticsearch
  • Aurora/PostgreSQL

Engineering Excellence:

  • RESTful Web API Design
  • Code Reviews
  • Unit/Integration Testing
  • A/B Testing
  • Observability
  • Blue/Green Deployments
  • Performance Optimization
  • Scalable and Resilient Design
  • Algorithms & Data Structures

Application Areas:

  • Generative AI
  • Agentic AI
  • RAG
  • LLM
  • Stream Processing Platforms
  • Real-time Systems
  • Data Pipeline Orchestration
  • State Machines & Workflows
  • Data Lineage
  • Cost Attribution
  • Data Integration
  • Low-latency Processing

Employment

Jul 2011 - Present | Amazon | Senior Software Engineer / Technical Lead

Data Lifecycle Management Platform (2025)
Architected a centralized Data Lifecycle Management platform, simplifying development, cutting ticket volume by 4000+ per year across 20+ teams, and preventing $60M in data quality incidents.
  • Architected and evangelized a centralized Data Lifecycle Management Platform for Amazon's exabyte-scale data lake, cutting ticket volume by 4000+ per year across 20+ teams.
  • Designed a fault-tolerant state machine with federated ownership and an event-driven architecture with guaranteed execution semantics, simplifying complex data workflows and preventing $60M in potential data quality incidents.
  • Mentored 3 Senior Engineers and established organization-wide architectural standards, earning VP-level endorsement.
Enterprise-Scale Data Lineage System (2025)
Led a high-performance lineage system processing billions of relationships daily, reducing processing time from 12+ hours to 8 minutes.
  • Led architecture and implementation of a high-performance data lineage system, processing billions of relationships daily with sub-second latency for compliance and operational workflows.
  • Integrated a novel Rust-based distributed graph processing engine with stream processing capabilities, reducing data processing time from 12+ hours to 8 minutes and achieving 10x faster queries at scale.
  • Designed a real-time streaming refresh pipeline maintaining consistent 50-100ms query latency and implemented zero-downtime deployment for 24/7 availability of mission-critical services.
Data Completeness Platform (2023-2024)
Led the redesign of the Data Completeness Platform, addressing 1000+ customer tickets, and developed a Completeness Time-Travel Debugger that reduced support incidents by 80%.
  • Led the redesign of a Data Completeness Platform, addressing systemic issues identified from 1000+ customer tickets and influencing product vision with VP and Sr. Principal Engineers.
  • Architected a next-generation system with an append-only segment stream and snapshot-based model, enabling time-travel queries and earning praise for technical excellence.
  • Used generative AI to automate analysis of org-wide user-reported tickets, identifying usability gaps in the data lake and presenting a live dashboard and weekly reports on recommended areas of investment to senior leadership.
  • Developed a Completeness Debugger tool, reducing support incidents by 80% and MTTR from days to minutes for critical data issues.
  • Delivered mission-critical Data Completeness APIs, integrating with a core internal resolution system to process 2M+ daily jobs with zero Sev-2 incidents post-launch, unblocking critical dataset onboarding.
  • Provided technical consultation for the migration of all relevant data producers to the new Completeness APIs, significantly reducing operational burden and enhancing scalability.
Event-Driven Stream Processing & Dependency Resolution System (2021-2023)
Architected a system processing millions of events per second with sub-100ms latency, saving 60+ SDE years.
  • Architected an event-driven streaming dependency resolution and lineage tracking system, processing millions of events per second with sub-100ms latency, saving substantial SDE years across numerous Amazon teams.
  • Designed APIs and led the implementation of a comprehensive SDK with high-level constructs, AST, and code generation, reducing integration time from weeks to minutes.
  • Ensured system resilience and reliable message delivery during traffic spikes, establishing cross-team architectural standards.
Enterprise Cost Attribution Platform (2018-2020)
Architected a platform processing billions of daily metrics, attributing $1B+ in costs and driving $200M+ in annual savings.
  • Architected a company-wide cost attribution platform, processing billions of daily metrics for 2000+ auto-onboarded services.
  • Designed a reliable calculation engine attributing $1B+ in infrastructure costs with reproducible results and automated error analysis, driving $200M+ in annual infrastructure savings.
  • Integrated infrastructure cost metrics into A/B analysis tools, enabling data-driven infrastructure decisions and earning recognition from a Senior Vice President.
NoSQL to Data Warehouse Integration Platform (2016-2017)
Architected a data bridge system across 4000+ pipelines, helping Amazon deprecate its Oracle data warehouse while saving 200+ SDE years.
  • Architected a high-throughput data bridge system integrating NoSQL/streaming data sources with analytics platforms, facilitating a critical migration from Oracle to DynamoDB and reducing infrastructure costs.
  • Developed a scalable solution for streaming ingestion, delta aggregation, and transformation across 4000+ pipelines, saving 200+ SDE years.
  • Designed a self-service console reducing pipeline creation from days to minutes and implemented a thorough testing framework that reduced production incidents by 80%.
Core Streaming Platform Services (2011-2015)
Redesigned EventBus V2 handling multi-million TPS, architected Sagan V2 serving 2000+ services, and optimized systems for $5M+ in annual savings.
  • Redesigned EventBus V2, a high-throughput stream processing service handling 10M+ events per second, implementing an in-memory database architecture that reduced complexity by 70% and incidents by 80%.
  • Architected Sagan V2, a distributed sequence generation service, achieving microsecond latency and serving 2000+ Amazon services.
  • Optimized a legacy ordering storage and routing system, reducing backend requests by 90% and fleet capacity by 40%, resulting in $5M+ annual savings and an ECPS Platform award.

Oct 2010 - Jul 2011 | RAMPGreen Technologies Pvt. Ltd. | Sr. Software Engineer

  • Architected enterprise web portals for technical problem resolution and business intelligence, extending open-source frameworks with cross-platform compatibility across desktop and mobile devices.

Jul 2007 - Oct 2010 | IBM India Software Labs | System Software Engineer / Technical Architect

  • Led IBM Rational Application Developer projects, creating patented communication tools improving productivity by 2000% and components featured in IBM technical publications.

Patents and Achievements

  • Patent holder for "System for generating modified video output" and "Real-time preview of URL addressable dynamic resources"
  • Winner: Cloudtune Hackathon 2021, Big Data Technologies Hackathon 2025 (implemented cost-saving infrastructure solutions)
  • Received ECPS Platform award from SVP for technical excellence in backend system design
  • Member of API Bar-raiser group and Operational Readiness Review committee, setting engineering standards for scalable systems
  • Published research paper in IEEE Computer Society and ACM Journal on archaeological artifact restoration using novel algorithms

Education

Bachelor of Technology, Computer Science and Engineering - Harcourt Butler Technological Institute, Kanpur, India (2007)