🚦 FlowSync: A Multi-Agent Deep Reinforcement Learning Framework for Coordinated Traffic Signal Control
FlowSync is a Multi-Agent Deep Reinforcement Learning (MARL) framework for adaptive traffic signal control.
Each intersection is modeled as an agent that cooperatively learns to optimize signal timing to minimize global traffic congestion.
A key innovation is a neighbor-state attention mechanism, enabling agents to dynamically weigh neighboring intersections’ states and learn coordinated behaviour.
Built with Python, TensorFlow/Keras, and validated using SUMO (Simulation of Urban MObility).
We formulate Traffic Signal Control (TSC) as a Multi-Agent Markov Decision Process (MDP):

- Agents: $\mathcal{I} = \{1, 2, \dots, N\}$, where each agent $i$ is an intersection.
- State: $s = (s_1, s_2, \dots, s_N)$; each agent observes only its local state $s_i$.
- Actions: $A_i = \{\text{Keep Phase}, \text{Change Phase}\}$.
- Transitions: $P(s' \mid s, a)$, determined by the SUMO simulator.
- Reward: each agent receives a local reward $r_i = R(s, a, s')$.
- Discount factor: $\gamma \in [0, 1]$.

Objective: each agent maximizes its expected discounted return $\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t \, r_{i,t}\right]$.
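The discounted-return objective can be sketched numerically; `discounted_return` is an illustrative helper, not a function from this repository, and assumes the standard discounted-sum formulation with the repo's default $\gamma = 0.8$:

```python
def discounted_return(rewards, gamma=0.8):
    """Discounted cumulative reward: sum_t gamma^t * r_t for one agent."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Each agent i maximizes the expectation of this quantity over its own
# local (penalty-based, hence typically negative) reward stream r_i.
ret = discounted_return([-1.0, -0.5, 0.0], gamma=0.8)
```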
Each agent’s local state combines visual, tabular and neighbor information:
| Component | Shape / Type | Description |
|---|---|---|
| Local Map | 150 × 150 × 1 | Visual occupancy map of nearby vehicles |
| Queue Length | 1 × 12 | Per-lane queue counts |
| Vehicle Count | 1 × 12 | Per-lane vehicle counts |
| Waiting Time | 1 × 12 | Per-lane accumulated waiting time |
| Current Phase | one-hot | Current signal phase |
| Neighbor Queues | 4 × 12 | Up to 4 neighbors' queue lengths |
| Neighbor Phases | 4 × 1 | Up to 4 neighbors' current phases |
| Neighbor Mask | 1 × 4 | Binary mask indicating active neighbors |
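As a concrete reading of the table, a zero-initialized observation with these shapes can be built as follows; the field names are illustrative, not the repository's actual variable names:

```python
import numpy as np

def make_observation(num_phases=4):
    """Zero-filled observation whose shapes mirror the state table above.
    num_phases is an assumption for the one-hot phase encoding."""
    return {
        "local_map":       np.zeros((150, 150, 1), dtype=np.float32),
        "queue_length":    np.zeros((1, 12), dtype=np.float32),
        "vehicle_count":   np.zeros((1, 12), dtype=np.float32),
        "waiting_time":    np.zeros((1, 12), dtype=np.float32),
        "current_phase":   np.eye(num_phases, dtype=np.float32)[0],  # one-hot
        "neighbor_queues": np.zeros((4, 12), dtype=np.float32),
        "neighbor_phases": np.zeros((4, 1), dtype=np.float32),
        "neighbor_mask":   np.zeros((1, 4), dtype=np.float32),
    }
```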
| Action | Meaning |
|---|---|
| 0 — Keep Phase | Continue the current traffic phase |
| 1 — Change Phase | End the current phase (via yellow) and move to the next |
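The action semantics can be sketched as a small phase state machine; the phase indexing, `num_phases`, and the boolean yellow flag are illustrative, not the repository's actual implementation:

```python
KEEP, CHANGE = 0, 1

def apply_action(current_phase, action, num_phases=4):
    """Sketch of the two-action semantics: KEEP leaves the phase unchanged,
    CHANGE advances cyclically to the next phase, signalling that a yellow
    interval must be served first (second return value)."""
    if action == KEEP:
        return current_phase, False          # no yellow interval needed
    next_phase = (current_phase + 1) % num_phases
    return next_phase, True                  # True: run yellow before switching
```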
The reward is penalty-based and designed to reduce congestion. The component weights are configured in `conf/grid_2x2/sumo_agent.conf`.
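A minimal sketch of such a penalty-based reward, assuming (hypothetically) that it combines queue length, waiting time, and a phase-switch penalty; the actual components and weights live in `conf/grid_2x2/sumo_agent.conf` and may differ:

```python
import numpy as np

def local_reward(queue, wait, changed_phase,
                 w_queue=0.25, w_wait=0.25, w_switch=1.0):
    """Illustrative penalty-based reward for one intersection: larger
    queues, longer waits, and phase switches all incur penalties.
    The weight values here are placeholders, not the repo's settings."""
    return -(w_queue * np.sum(queue)
             + w_wait * np.sum(wait)
             + w_switch * float(changed_phase))
```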
We approximate the optimal action-value function with a deep network:

$$Q(s_i, a_i; \theta) \approx Q^*(s_i, a_i)$$

Training minimizes the TD loss

$$L(\theta) = \mathbb{E}\left[\left(y_i - Q(s_i, a_i; \theta)\right)^2\right],$$

with target

$$y_i = r_i + \gamma \max_{a'} Q(s'_i, a'; \bar{\theta}),$$

where $\bar{\theta}$ are the parameters of a periodically synced target network.
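A minimal numpy sketch of the TD target and loss, assuming the standard DQN formulation with a periodically synced target network (cf. `UPDATE_Q_BAR_FREQ` in the hyperparameter table); function names are illustrative:

```python
import numpy as np

def td_targets(rewards, next_q_bar, dones, gamma=0.8):
    """y = r + gamma * max_a' Q_bar(s', a'), zeroed for terminal transitions.
    next_q_bar: (batch, num_actions) Q-values from the target network."""
    return rewards + gamma * (1.0 - dones) * next_q_bar.max(axis=1)

def td_loss(q_pred, actions, targets):
    """Mean squared TD error, evaluated only on the actions actually taken."""
    q_taken = q_pred[np.arange(len(actions)), actions]
    return np.mean((targets - q_taken) ** 2)
```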
Inputs: visual map, local vectors, neighbor states.

Local feature extraction:

- Visual Encoder: small CNN (32 filters @ 8×8, 16 filters @ 4×4) → $v_{\text{map}}$
- Vector Encoder: dense layers on concatenated per-lane vectors → $v_{\text{local}}$
- Local embedding: concatenate $v_{\text{map}}$ and $v_{\text{local}}$ → dense → $e_{\text{local}}$

Coordination via attention:

- Form neighbor embeddings, then apply dot-product attention over them. Output: the neighbor context vector $c_{\text{neighbor}}$.

Q-value head:

- Concatenate $e_{\text{local}}$ and $c_{\text{neighbor}}$, pass through two dense layers, and output 2 linear units for {Keep, Change}.
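The coordination step, the framework's key innovation, can be sketched as masked dot-product attention in plain numpy; dimensions and the flat length-4 mask are illustrative, and this is not the repository's Keras implementation:

```python
import numpy as np

def neighbor_attention(e_local, neighbor_emb, mask):
    """Masked dot-product attention: the local embedding e_local queries
    up to four neighbor embeddings (rows of neighbor_emb); inactive
    neighbors (mask == 0) receive zero attention weight."""
    if not np.any(mask > 0):
        return np.zeros_like(e_local)        # isolated intersection
    scores = neighbor_emb @ e_local          # (4,) dot-product scores
    scores = np.where(mask > 0, scores, -np.inf)
    scores = scores - scores.max()           # numerical stability
    weights = np.exp(scores) * (mask > 0)    # softmax numerator, masked
    weights = weights / weights.sum()
    return weights @ neighbor_emb            # context vector c_neighbor
```

With one active neighbor the context collapses to that neighbor's embedding; with several, agents learn (through the embeddings) which neighbors matter most for coordination.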
| Setting | Value |
|---|---|
| Environment | SUMO |
| Scenario | 2×2 grid (J1–J4) |
| Controller | `traffic_light_dqn.py` |

| Parameter | Value |
|---|---|
| LEARNING_RATE | 0.001 |
| GAMMA | 0.8 |
| BATCH_SIZE | 20 |
| MAX_MEMORY_LEN | 1000 |
| UPDATE_PERIOD | 300 |
| UPDATE_Q_BAR_FREQ | 5 |
| D_DENSE | 20 |
| EPSILON | 0.00 |
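A few of these hyperparameters can be read off in a minimal replay-memory and action-selection sketch; the function names are illustrative, and note that with `EPSILON = 0.00` action selection is purely greedy:

```python
import random
from collections import deque

import numpy as np

MAX_MEMORY_LEN = 1000
BATCH_SIZE = 20
EPSILON = 0.00  # from the table above: pure greedy selection

# Bounded replay memory: oldest transitions are evicted automatically.
memory = deque(maxlen=MAX_MEMORY_LEN)

def choose_action(q_values, epsilon=EPSILON):
    """Epsilon-greedy over the two actions; epsilon = 0 is always argmax."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))

def sample_batch():
    """Uniform minibatch from replay memory (capped by its current size)."""
    return random.sample(list(memory), min(BATCH_SIZE, len(memory)))
```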
Agents were trained on the `grid_2x2` scenario (`data/grid_2x2/grid.sumocfg`).
Observations
- Learning trend: average reward improves over training (penalties reduced).
- Agent specialization: some intersections handle more traffic than others, yet all agents show learning improvement.
Install SUMO and the Python dependencies:

```bash
sudo apt install sumo sumo-tools
pip install tensorflow numpy traci pandas matplotlib
```

Build the network:

```bash
cd data/grid_2x2/
netconvert --node-files=grid.nod.xml --edge-files=grid.edg.xml --output-file=grid.net.xml
```

Edit `runexp.py`:

```python
sumoBinary_nogui = "/usr/bin/sumo"
sumoBinary_gui = "/usr/bin/sumo-gui"
setting_memo = "grid_2x2"
```

Run:

```bash
python runexp.py
```

For headless runs, ensure `sumo_cmd_str` points to the nogui binary.

Plot the results:

```bash
python plot_results.py
```

This generates `rewards_over_time.png` from the latest `memories.txt`.
This implementation builds on the ideas in the IntelliLight work:

Hua Wei\*, Guanjie Zheng\*, Huaxiu Yao, and Zhenhui Li. "IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light Control." KDD 2018 (\*equal contribution). See the IntelliLight repository.
```
FlowSync/
├── conf/
│   └── grid_2x2/
├── data/
│   └── grid_2x2/
├── models/
│   ├── deeplight_agent.py
│   ├── network_agent.py
│   └── agent.py
├── runexp.py
├── traffic_light_dqn.py
├── plot_results.py
├── rewards_over_time.png
└── README.md
```
- Built on IntelliLight architecture (Wei et al., 2018)
- Inspired by MARL attention-based coordination frameworks
- Simulation powered by SUMO
