Software Engineer | Distributed Systems | Real-Time Architecture
Building real-time, scalable systems that power collaborative human-AI interaction through distributed state management and event-driven architectures.
I approach software as distributed systems first. My thinking centers on:
- Latency vs. Consistency Tradeoffs – Understanding when to sacrifice immediate consistency for responsiveness, and vice versa. Real-time systems require explicit tradeoff decisions.
- Event-Driven Thinking – Every system is a flow of immutable events processed through independent, scalable services rather than coupled monolithic operations.
- Distributed State Synchronization – The core challenge: maintaining eventual consistency across multiple clients and services without creating bottlenecks or race conditions.
- Designing for Scale – Multi-user concurrency is built in from the ground up, not as an afterthought. This shapes architectural choices from day one.
- Low-Latency Requirements – Real-time applications demand sub-100ms response times. Architecture must support this through careful service layering and data locality.
This mindset informs every design decision: How do components communicate? What happens during network partitions? Where are the single points of failure?
The Socratic Arena | github.com/Ayush-Kumar0207/The_Socratic_Arena
Problem: Moderate real-time competitive debates with AI-driven scoring while maintaining consistent rankings across concurrent matches.
Architecture:
Client (React) → WebSocket → Message Queue / Broker → Score Pipeline (AI Evaluation) → Leaderboard State Machine → Database (Event Log + Current State)
Key Challenges:
- AI Latency Boundary: Gemini API calls (500-2000ms) cannot block real-time message delivery. Solution: Async scoring pipeline with eventual consistency for leaderboard updates.
- Concurrent Match State: Multiple debates running simultaneously. Centralized state machine prevents race conditions in Elo calculations.
- Debate Ordering Guarantee: Messages must be processed in order within a debate, but debates themselves are independent. Partitioned event log per debate.
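The per-debate ordering guarantee above can be sketched as a partitioned processing log: one ordered queue per debate, with no cross-debate coordination. This is a minimal in-memory illustration, not the project's actual implementation; `DebateEvent` and the handler signature are assumptions.

```typescript
// Sketch: per-debate FIFO processing. Events within one debate are
// handled strictly in order; independent debates never wait on each other.
type DebateEvent = { debateId: string; seq: number; payload: string };

class PartitionedEventLog {
  // One promise chain (the partition "tail") per debate ID.
  private tails = new Map<string, Promise<void>>();

  append(
    event: DebateEvent,
    handle: (e: DebateEvent) => Promise<void>,
  ): Promise<void> {
    const prev = this.tails.get(event.debateId) ?? Promise.resolve();
    // Chain onto the partition's tail so handlers run sequentially
    // within a debate, concurrently across debates.
    const next = prev.then(() => handle(event));
    // Store a caught copy so one failed handler doesn't poison the partition.
    this.tails.set(event.debateId, next.catch(() => {}));
    return next;
  }
}
```

A real broker (e.g. Kafka-style partitioning by key) gives the same property durably; the point is that the partition key, not a global lock, provides the ordering.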
Tech Stack: Node.js, Socket.IO, Supabase (PostgreSQL), Gemini API, React, TailwindCSS
CodeVerse | github.com/Ayush-Kumar0207/codeverse
Problem: Enable real-time collaborative coding with live preview and multi-language execution without state divergence between clients.
Architecture:
Client A (Editor State) ↔ WebSocket ↔ Client B (Editor State) → Operational Transform / CRDT Layer → Canonical Document State → Code Execution Service (isolated runtime) → Output broadcast to all clients
Key Challenges:
- Concurrent Edits: Multiple clients editing the same document simultaneously. Without CRDT/OT, edits conflict. Solution: operation-based merging to ensure convergence.
- Execution Isolation: Running untrusted code (Python, C++, Java) safely. Each execution is containerized with resource limits and timeouts.
- Latency Hiding: Execution takes 200-500ms. UI remains responsive through optimistic rendering while awaiting server-side results.
- Multi-Language Support: Backend abstraction over language runtimes. Each language has separate execution handler that returns standardized output.
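The convergence property behind the operation-based merging mentioned above can be shown with the simplest possible CRDT, a grow-only set: because "add" operations commute and are idempotent, replicas that see the same operations in any order end in the same state. This is an illustrative sketch only; real collaborative text editing needs a sequence CRDT (RGA/Yjs-style), which is considerably more involved.

```typescript
// Sketch: a G-Set CRDT. Applying the same ops in any order converges.
type AddOp = { type: "add"; value: string };

class GSet {
  private items = new Set<string>();

  apply(op: AddOp): void {
    // Set insertion is idempotent and commutative, so replicas
    // need no coordination to agree.
    this.items.add(op.value);
  }

  snapshot(): string[] {
    return [...this.items].sort();
  }
}
```

The design choice this illustrates: push the conflict-resolution burden into the data type's merge semantics, so the network layer can deliver operations in whatever order they arrive.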
Tech Stack: Next.js, Node.js, Socket.IO, TypeScript, Docker (execution), Express, Vercel
AlgoVista | github.com/Ayush-Kumar0207/algovista
Problem: Visualize algorithm execution state in real-time while maintaining step-wise consistency and enabling progress tracking across sessions.
Architecture:
Algorithm Executor (step-by-step iterator, yields state at each step) → State Manager (immutable snapshots) → Visualization Renderer (React components) → Streak & Progress Tracker (also persists to DB)
Key Challenges:
- Execution State Complexity: Algorithms involve multiple data structures changing simultaneously. Each step is an atomic state transition. State must be serializable for session resumption.
- Rendering Efficiency: Redraw visualization on every step (can be 100+ steps). Solution: Memoization + incremental DOM updates. Only changed elements re-render.
- Progress Consistency: User pauses, closes browser, returns later. State snapshot on database allows resumption at exact step. No desync between visual state and persisted state.
- Streak Gamification: Track consecutive days without breaking state consistency. Distributed timestamp validation prevents clock-skew exploits.
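The "atomic, serializable step" idea above can be sketched as a generator that yields an immutable, JSON-serializable snapshot per step. Bubble sort stands in for the real algorithm set; the `Snapshot` shape is an assumption, not AlgoVista's actual schema.

```typescript
// Sketch: algorithm executor as a step iterator. Each yielded snapshot
// is a deep copy, so pausing, persisting, and resuming at an exact
// step index is just a matter of replaying or storing snapshots.
type Snapshot = { step: number; array: number[]; comparing: [number, number] };

function* bubbleSortSteps(input: number[]): Generator<Snapshot> {
  const a = [...input]; // never mutate the caller's array
  let step = 0;
  for (let i = 0; i < a.length - 1; i++) {
    for (let j = 0; j < a.length - 1 - i; j++) {
      if (a[j] > a[j + 1]) [a[j], a[j + 1]] = [a[j + 1], a[j]];
      // Copy the array so earlier snapshots are never retroactively changed.
      yield { step: step++, array: [...a], comparing: [j, j + 1] };
    }
  }
}
```

Because every snapshot is a plain object, `JSON.stringify` on the latest one is enough to persist resumable session state.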
Tech Stack: TypeScript, React, Algorithm visualization library, Supabase, Streak tracking service
| Capability | Application |
|---|---|
| Distributed Systems Design | Multi-service architectures with async communication patterns |
| Event-Driven Architectures | Decoupled systems communicating through immutable events and message brokers |
| Real-Time Synchronization | WebSocket-based state propagation with consistency guarantees |
| Consistency Models | Strong, Eventual, Causal, and Weak consistency tradeoff analysis |
| Scalable Backend Design | Horizontal scaling through partitioning, caching, and load balancing |
| AI Integration Pipelines | Non-blocking LLM calls with fallback and degradation strategies |
| Concurrency & State Management | Race condition prevention, distributed locks, atomic operations |
| Low-Latency System Design | Optimizing for p50, p95, p99 latency SLOs |
Computation Layer
TypeScript · JavaScript · Python · C++ · Java
Interface Layer
React · Next.js · TailwindCSS
Service Layer
Node.js · Express · Socket.IO
Real-Time Synchronization Layer
WebSocket (Socket.IO) · Event-driven message passing
Data Layer
PostgreSQL (Supabase) · Event sourcing for auditability
Execution Layer
Docker · Containerized runtimes with resource limits
Intelligence Layer
Gemini API · Async AI evaluation pipelines
Infrastructure Layer
Vercel · Cloud deployment · GitHub OAuth
What I Optimize For:
- Latency Budget Allocation – Every millisecond counts in real-time systems. I design with explicit latency budgets per service. When a Gemini API call exceeds its budget, the system falls back to a degraded path rather than blocking.
- Consistency Under Concurrency – Multi-user systems are inherently chaotic. I use event sourcing, CRDT algorithms, or distributed locks depending on consistency requirements and failure modes.
- Scalability Through Partitioning – Vertical scaling hits limits fast. Instead, I partition horizontally: debates by debate ID, users by region, code execution by language runtime.
- Failure Mode Design – Systems fail. I design for graceful degradation: cache misses don't crash the system, AI timeouts don't block live editing, and network partitions preserve data through event logs.
- Observability First – Distributed systems are opaque. Every component emits structured logs, metrics, and traces; debugging production requires comprehensive observability.
- State Machine Discipline – Complex systems need explicit state machines. Transitions are validated, and race conditions are eliminated through centralized orchestration.
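The latency-budget fallback described above can be sketched as racing the AI call against a deadline. `scoreWithAI` is a hypothetical placeholder, not the Gemini SDK's API; the pattern, not the names, is the point.

```typescript
// Sketch: enforce a per-call latency budget around an AI scoring call.
// If the call exceeds the budget, resolve with a fallback score instead
// of blocking real-time message delivery.
async function scoreWithinBudget(
  scoreWithAI: (text: string) => Promise<number>, // hypothetical scorer
  text: string,
  budgetMs: number,
  fallback: number,
): Promise<number> {
  const deadline = new Promise<number>((resolve) =>
    setTimeout(() => resolve(fallback), budgetMs),
  );
  // Whichever settles first wins; the slow call's result is discarded
  // (a production version would also cancel or log it).
  return Promise.race([scoreWithAI(text), deadline]);
}
```

In an eventually consistent pipeline, the late real score can still be applied when it arrives; the fallback only protects the interactive path.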
- Sub-100ms round-trip latency in collaborative editing with concurrent multi-user edits
- Consistency of AI-generated rankings across a distributed evaluation pipeline without blocking real-time updates
- State synchronization when clients reconnect after network failures (no data loss)
- Horizontal scalability to support thousands of simultaneous concurrent editors/debaters
- Safe execution of arbitrary user code without resource exhaustion or security vulnerabilities
- Recovery guarantees through immutable event logs and incremental state snapshots
- Race condition elimination in competitive state (Elo rankings, scores) under high concurrency
- Operational Transform vs. CRDT tradeoffs for documents >100MB with 100+ concurrent editors
- Sub-50ms AI inference latency for real-time scoring without accuracy degradation
- Distributed consensus algorithms for multi-region state consistency
- Observability & tracing patterns for diagnosing latency tail in distributed systems
- Event sourcing retention policies and snapshot compression for long-running systems
Recent optimization decisions:
- WebSocket over HTTP polling – The latency budget required <100ms round trips; polling adds 500-1000ms of overhead. WebSocket brought p95 latency down to 50-80ms.
- Async AI scoring – Blocking on the Gemini API stalled the UI. An async pipeline with eventual consistency maintains <100ms message delivery.
- Partition debates by ID – A single global state machine became the bottleneck at 500+ concurrent debates. Moved to a partitioned event log per debate with a single leaderboard update queue.
- CRDT over operational transform – OT requires replaying operational history; CRDTs enable tombstone-based deletion without replay. Reduced memory overhead by 60%.
I'm optimizing for problems where:
- Distributed systems thinking is critical to solution quality
- Real-time constraints drive architectural decisions
- Scalability is non-negotiable from day one
- Consistency under concurrency requires careful tradeoff analysis
- AI integration must not compromise latency requirements
Available for discussion on system design, distributed architecture, real-time systems, and building resilient high-scale platforms.
GitHub: Ayush-Kumar0207