A Stream Processing System with State Disaggregation

Feb. 2023 – May 2023

Architecture

State Disaggregation

Description

  • Built a standalone control plane that separates tasks from state, optimizing the state migration mechanism in Flink.
  • Used Java and gRPC to build a distributed event-driven framework in which each TaskManager manages operators.
  • Utilized watermarks as the logical ingestion time to handle late-arriving events in window operators.
  • Implemented consistent hashing with virtual nodes to minimize state migration cost during operator scaling.
  • Stored TaskManager state in RocksDB and the routing table in etcd to ensure fault tolerance.
  • Constructed a scalable deployment on AWS EC2 using Docker Compose, auto-scaling, and load balancing.
  • Measured latency during state migration with Prometheus and Grafana, finding no downtime and only a 30% rise in latency.
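
The consistent hashing idea in the list above can be illustrated with a minimal sketch. This is not the project's actual code: the class name `ConsistentHashRing`, the method names, the node names such as "tm-1", and the choice of MD5 as the hash function are all assumptions made for illustration. Each node is placed on the ring at many virtual positions, and a key belongs to the first virtual node at or after its hash, so adding a node only takes over the keys that fall just before its new positions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a consistent-hash ring with virtual nodes. */
final class ConsistentHashRing {
    // Sorted map from ring position to the physical node that owns it.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Assumption: the first 8 bytes of an MD5 digest serve as the ring position.
    private static long hash(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Places the node on the ring at `virtualNodes` positions. */
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    /** Removes every virtual position that belongs to the node. */
    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    /** Returns the node owning the key: the first position at or after its hash, wrapping around. */
    String ownerOf(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

In this sketch, after `addNode("tm-4")` on a ring that already holds three nodes, any key whose owner changes is now owned by "tm-4"; no key moves between the existing nodes, which is what keeps the migrated state small during scaling.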
Yingzhe Dong
Graduate Student of Computer Science

I am a full-stack developer with interests in software development, system architecture, and distributed systems.