EasyRogue-2

A procedurally generated roguelike dungeon game with a Gymnasium RL environment. Train agents with DQN/PPO or use a frozen VLM (Qwen3.5-4B) with a trainable action head. Play it yourself too.

The Game

You're @, navigating procedural dungeon floors to reach the exit >. Fight zombies z and vampires v, pick up health potions +, and descend deeper. Rooms connect via tunnels, enemies get harder with depth, and shops appear every 2 floors (if enabled).

Controls: 4 discrete actions: right, left, down, up.

Text render (LLMs). Image render (VLMs / RGB RL agents). Numeric observation (RL agents, 16x18 int grid).

Project Structure

src/                    # Game engine
  engine.py             # Core loop, FOV, bump system, level transitions
  world/level.py        # Level generation (rooms, tunnels, spawning)
  entities/             # Entity hierarchy (fighters, items, shops)
  utilities/            # Actions, A* pathfinding
env/                    # Gymnasium environment
  __init__.py           # EasyRogue env (3 render modes, configurable observability)
conf/                   # Reward configs (YAML)
scripts/                # Training, evaluation, benchmarking

Environment

from env import EasyRogue

env = EasyRogue(
    render_mode="numeric",   # "numeric" (16x18 int grid), "text" (ASCII), "image" (280x180 RGB)
    perfect_info=True,       # Full map vs 7x7 FOV
    reward_config="conf/rewards_default.yaml",
)
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(action)  # action in {0,1,2,3}

The info dict includes: map (ASCII), player health/attack/defense, gold, enemies, potions, depth, exits taken.

RL Training

Uses Stable Baselines 3 with a custom GridCNN feature extractor that treats the 16x18 integer grid as a single-channel image (3 conv layers, 32->64->64 channels).

# Train PPO (best performer)
python scripts/train_ppo.py --perfect-info --timesteps 5000000

# Train DQN
python scripts/train_dqn.py --perfect-info --timesteps 5000000

# Evaluate and compare models
python scripts/evaluate_rl.py --models models/ppo_*.zip --algo ppo --n-episodes 500

Things that mattered:

CNN is required. MLP on flattened grids doesn't learn spatial navigation.
Don't use VecNormalize with discrete integer observations. It corrupts the replay buffer for DQN especially.
PPO needs ent_coef=0.02 to avoid collapsing to wall-bumping on procedural levels.
DQN needs long exploration (exploration_fraction=0.5, final_eps=0.05) or it collapses too.

VLM Benchmarking

Zero-shot evaluation of Qwen3.5-4B on the game, using either ASCII text or rendered screenshots.

# Text mode (local model, no server needed)
python scripts/benchmark_vlm_text.py --num-episodes 50

# Image mode
python scripts/benchmark_vlm_image.py --num-episodes 20

# With vLLM server instead of local inference
python scripts/benchmark_vlm_text.py --api-url http://localhost:8000

VLM Action Head Finetuning

Freezes Qwen3.5-4B as a feature extractor and trains a small MLP action/value head with PPO.

python scripts/finetune_vlm_actionhead.py \
    --render-mode text --no-perfect-info \
    --total-timesteps 50000 --device cuda

Architecture: game state (text or image) -> frozen VLM -> last-token features (2560-dim) -> 3-layer MLP -> 4 action logits + value estimate.

Training uses cosine LR decay, return normalization, value function clipping, and saves the best checkpoint by eval exit rate. The VLM runs once per step during rollout collection; cached features are reused across PPO epochs.

Results

All numbers are exit rate (% of episodes where the agent reaches the exit). Random baseline is 17%.

RL Agents (500-episode eval, numeric observations)

Model	Perfect Info	Imperfect Info (7x7 FOV)
PPO (5M steps)	62.6% (+98.0 reward)	29.2% (-23.6 reward)
DQN (5-10M steps)	10.0% (-20.6 reward)	34.4% (-39.2 reward)

PPO with full observability is the strongest RL agent: aggressive play, fast exits (85 avg steps), high kill rate (1.12/ep). DQN struggles with perfect info (exploration collapse) but is the best RL agent under imperfect info, playing patiently (179 avg steps, 66.6% survival).

VLM Zero-Shot (Qwen3.5-4B, no training)

Mode	Perfect Info	Imperfect Info
Text	4%	4%
Image	5%	5%

Worse than random. The model moves directionally (right/down bias) instead of exploring, so it rarely finds the exit.

VLM Action Head (Frozen Qwen3.5-4B + Trained MLP, 50K steps)

Mode	Perfect Info	Imperfect Info
Text	45%	70%
Image	50%	40%

Text+imperfect is the best agent in the entire project at 70% exit rate. Direction hints in the text prompt ("exit is 3 tiles RIGHT and 2 DOWN") let the VLM's language understanding handle spatial navigation better than learned CNN features with full map visibility. Image mode does better with perfect info (50% vs 45%) since the full map screenshot contains direct spatial information, but text with direction hints dominates under imperfect info.

Play It

python scripts/gameonly.py

Install

pip install -r requirements.txt

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
assets		assets
conf		conf
env		env
scripts		scripts
src		src
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

EasyRogue-2

The Game

Project Structure

Environment

RL Training

VLM Benchmarking

VLM Action Head Finetuning

Results

RL Agents (500-episode eval, numeric observations)

VLM Zero-Shot (Qwen3.5-4B, no training)

VLM Action Head (Frozen Qwen3.5-4B + Trained MLP, 50K steps)

Play It

Install

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

EasyRogue-2

The Game

Project Structure

Environment

RL Training

VLM Benchmarking

VLM Action Head Finetuning

Results

RL Agents (500-episode eval, numeric observations)

VLM Zero-Shot (Qwen3.5-4B, no training)

VLM Action Head (Frozen Qwen3.5-4B + Trained MLP, 50K steps)

Play It

Install

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages