Architecture
State Disaggregation
Description
- Built a standalone control plane that separates tasks from state, optimizing the state migration mechanism in Flink.
- Applied Java and gRPC to create a distributed event-driven framework, where TaskManager manages operators.
- Utilized watermarks as the logical ingestion time to handle late-arriving events in window operators.
- Implemented consistent hashing with virtual nodes to minimize state migration cost during operator scaling.
- Used RocksDB to store TaskManager state and etcd to store the routing table, ensuring fault tolerance.
- Constructed a scalable deployment on AWS EC2 using Docker Compose, auto-scaling, and load balancing.
- Evaluated latency during state migration with Prometheus and Grafana, observing no downtime and only a 30% increase in latency.
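
To illustrate the consistent-hashing item above, here is a minimal Java sketch of a hash ring with virtual nodes. It is not the project's actual code: the class name `ConsistentHashRing`, the methods `addNode`, `removeNode`, and `nodeFor`, the `node#i` virtual-node naming scheme, and the use of the first 8 bytes of an MD5 digest as the ring position are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a consistent-hash ring that maps state keys to nodes.
class ConsistentHashRing {
    // Sorted map from ring position to the physical node owning that position.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        if (virtualNodes <= 0) {
            throw new IllegalArgumentException("virtualNodes must be positive");
        }
        this.virtualNodes = virtualNodes;
    }

    // Place `virtualNodes` points on the ring for one physical node.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // Remove only the points that still belong to this node.
    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i), node);
        }
    }

    // A key is owned by the first point at or after its hash, wrapping around.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Ring position: first 8 bytes of the MD5 digest (an assumed choice).
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, adding a node only reassigns the keys that fall just before one of its new ring points, so only those keys need their state migrated. Every other key keeps its owner, and a larger virtual-node count tends to spread the moved keys more evenly across the existing nodes.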