SulRash/EasyRogue-2

EasyRogue-2

A procedurally generated roguelike dungeon game with a Gymnasium RL environment. Train agents with DQN/PPO or use a frozen VLM (Qwen3.5-4B) with a trainable action head. Play it yourself too.

The Game

You're @, navigating procedural dungeon floors to reach the exit >. Fight zombies z and vampires v, pick up health potions +, and descend deeper. Rooms connect via tunnels, enemies get harder with depth, and shops appear every 2 floors (if enabled).

Controls: four discrete actions (right, left, down, up).
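As a rough illustration of the four-action control scheme, the sketch below maps action indices to grid moves. The index order (0 = right, 1 = left, 2 = down, 3 = up) just follows the order the actions are listed in and is an assumption, as are the names `ACTION_DELTAS` and `apply_action`; the engine's real bump system (walls, enemies) is not modelled here.

```python
# Hypothetical action-to-movement table. The index order follows the order in
# which the actions are listed above and is an assumption, not the engine's
# confirmed mapping.
ACTION_DELTAS = {
    0: (1, 0),   # right: x + 1
    1: (-1, 0),  # left:  x - 1
    2: (0, 1),   # down:  y + 1 (rows grow downward, as in a text grid)
    3: (0, -1),  # up:    y - 1
}


def apply_action(pos, action, width, height):
    """Return the new (x, y); a move that would leave the grid is ignored."""
    dx, dy = ACTION_DELTAS[action]
    x, y = pos[0] + dx, pos[1] + dy
    if 0 <= x < width and 0 <= y < height:
        return (x, y)
    return pos  # out-of-bounds moves leave the position unchanged
```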

Render modes: text render (LLMs), image render (VLMs / RGB RL agents), and numeric observation (RL agents, 16x18 int grid).

Project Structure

src/                    # Game engine
  engine.py             # Core loop, FOV, bump system, level transitions
  world/level.py        # Level generation (rooms, tunnels, spawning)
  entities/             # Entity hierarchy (fighters, items, shops)
  utilities/            # Actions, A* pathfinding
env/                    # Gymnasium environment
  __init__.py           # EasyRogue env (3 render modes, configurable observability)
conf/                   # Reward configs (YAML)
scripts/                # Training, evaluation, benchmarking

Environment

from env import EasyRogue

env = EasyRogue(
    render_mode="numeric",   # "numeric" (16x18 int grid), "text" (ASCII), "image" (280x180 RGB)
    perfect_info=True,       # True: full map; False: 7x7 FOV
    reward_config="conf/rewards_default.yaml",
)
obs, info = env.reset()
action = env.action_space.sample()  # random action in {0,1,2,3}
obs, reward, terminated, truncated, info = env.step(action)

The info dict includes: map (ASCII), player health/attack/defense, gold, enemies, potions, depth, exits taken.
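The reset/step calls above extend naturally into a full episode loop. The sketch below shows one way to write it. `StubEnv`, `run_episode`, and `random_policy` are hypothetical names; `StubEnv` is only a stand-in with the same Gymnasium-style signatures so the loop runs without the repository, and it does not reproduce EasyRogue's observations, rewards, or info keys.

```python
import random


class StubEnv:
    """Hypothetical stand-in with Gymnasium-style reset/step signatures.

    This is NOT the EasyRogue environment; it ends every episode after a
    fixed number of steps and pays a reward of 1.0 per step.
    """

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.episode_length
        return self.t, 1.0, terminated, False, {}


def run_episode(env, policy, max_steps=1000):
    """Play one episode and return (total_reward, steps)."""
    obs, info = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        if terminated or truncated:
            break
    return total_reward, steps


def random_policy(obs):
    """Uniform random choice among the four discrete actions."""
    return random.randrange(4)
```

With the real environment, `run_episode(env, random_policy)` should work unchanged as long as the env follows the five-value `step` return shown above.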

RL Training

Uses Stable Baselines 3 with a custom GridCNN feature extractor that treats the 16x18 integer grid as a single-channel image (3 conv layers, 32->64->64 channels).
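For orientation, here is a minimal PyTorch sketch of a network with that shape: a single-channel 16x18 grid passed through three conv layers with 32, 64, and 64 channels. The kernel size (3x3 with padding 1), the ReLU activations, the 256-dim output, and the class name `GridCNNSketch` are all assumptions; the repository's actual GridCNN may differ.

```python
import torch
import torch.nn as nn


class GridCNNSketch(nn.Module):
    """Illustrative CNN for a single-channel integer grid (not the repo's GridCNN)."""

    def __init__(self, grid_shape=(16, 18), features_dim=256):
        super().__init__()
        # Three conv layers, 32 -> 64 -> 64 channels. Kernel size and padding
        # are assumptions chosen so the spatial size stays 16x18.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened size with a dummy forward pass.
        with torch.no_grad():
            n_flat = self.conv(torch.zeros(1, 1, *grid_shape)).shape[1]
        self.linear = nn.Sequential(nn.Linear(n_flat, features_dim), nn.ReLU())

    def forward(self, grid):
        # grid: (batch, H, W) integer tile ids -> (batch, 1, H, W) floats
        x = grid.float().unsqueeze(1)
        return self.linear(self.conv(x))
```

To use a network like this with Stable Baselines 3, it would typically subclass `BaseFeaturesExtractor` and be passed through `policy_kwargs` as `features_extractor_class`.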

# Train PPO (best performer)
python scripts/train_ppo.py --perfect-info --timesteps 5000000

# Train DQN
python scripts/train_dqn.py --perfect-info --timesteps 5000000

# Evaluate and compare models
python scripts/evaluate_rl.py --models models/ppo_*.zip --algo ppo --n-episodes 500

Things that mattered:

  • CNN is required. MLP on flattened grids doesn't learn spatial navigation.
  • Don't use VecNormalize with discrete integer observations. It corrupts the observations, and the damage is worst for DQN, where the corrupted values end up in the replay buffer.
  • PPO needs ent_coef=0.02 to avoid collapsing to wall-bumping on procedural levels.
  • DQN needs long exploration (exploration_fraction=0.5, exploration_final_eps=0.05) or it collapses too.
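To make the DQN exploration settings concrete, the sketch below computes a linearly annealed epsilon: it decays from an initial value to the final value over the first `exploration_fraction` of training and then stays flat. The function name `linear_epsilon` and the initial epsilon of 1.0 are assumptions; the fraction of 0.5 and final value of 0.05 come from the list above.

```python
def linear_epsilon(progress, exploration_fraction=0.5,
                   initial_eps=1.0, final_eps=0.05):
    """Epsilon for epsilon-greedy exploration at training progress in [0, 1].

    Decays linearly from initial_eps to final_eps over the first
    exploration_fraction of training, then stays at final_eps.
    initial_eps=1.0 is an assumed starting value.
    """
    if progress >= exploration_fraction:
        return final_eps
    slope = (final_eps - initial_eps) / exploration_fraction
    return initial_eps + progress * slope
```

With these settings the agent still acts randomly more than half the time a quarter of the way through training, which is what "long exploration" amounts to in practice.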

VLM Benchmarking

Zero-shot evaluation of Qwen3.5-4B on the game, using either ASCII text or rendered screenshots.

# Text mode (local model, no server needed)
python scripts/benchmark_vlm_text.py --num-episodes 50

# Image mode
python scripts/benchmark_vlm_image.py --num-episodes 20

# With vLLM server instead of local inference
python scripts/benchmark_vlm_text.py --api-url http://localhost:8000

VLM Action Head Finetuning

Freezes Qwen3.5-4B as a feature extractor and trains a small MLP action/value head with PPO.

python scripts/finetune_vlm_actionhead.py \
    --render-mode text --no-perfect-info \
    --total-timesteps 50000 --device cuda

Architecture: game state (text or image) -> frozen VLM -> last-token features (2560-dim) -> 3-layer MLP -> 4 action logits + value estimate.
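The trainable part of this pipeline can be sketched as a small PyTorch module that takes the 2560-dim features and returns four action logits plus a value estimate. The hidden width of 256, the ReLU activations, the reading of "3-layer" as two hidden layers plus an output layer, and the name `ActionValueHead` are assumptions; the frozen VLM itself is left out.

```python
import torch
import torch.nn as nn


class ActionValueHead(nn.Module):
    """Illustrative action/value head on top of frozen VLM features."""

    def __init__(self, feature_dim=2560, hidden_dim=256, n_actions=4):
        super().__init__()
        # Two shared hidden layers; each output below is the third layer.
        # hidden_dim is an assumed value.
        self.trunk = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.policy_out = nn.Linear(hidden_dim, n_actions)  # action logits
        self.value_out = nn.Linear(hidden_dim, 1)           # state value

    def forward(self, features):
        # features: (batch, feature_dim) last-token features from the VLM
        h = self.trunk(features)
        return self.policy_out(h), self.value_out(h).squeeze(-1)
```

Actions can then be sampled with `torch.distributions.Categorical(logits=logits)`. Because the VLM is frozen, only the head's parameters need to go to the optimizer.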

Training uses cosine LR decay, return normalization, value function clipping, and saves the best checkpoint by eval exit rate. The VLM runs once per step during rollout collection; cached features are reused across PPO epochs.
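Two of the training details named above are easy to show in isolation. The sketch below gives a cosine learning-rate schedule and a PPO-style clipped value loss for a single sample. The function names, the peak learning rate of 3e-4, and the clip range of 0.2 are assumptions rather than values taken from the training script.

```python
import math


def cosine_lr(step, total_steps, lr_max=3e-4, lr_min=0.0):
    """Cosine decay from lr_max at step 0 to lr_min at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def clipped_value_loss(value, old_value, ret, clip_range=0.2):
    """PPO-style clipped value loss for one sample.

    The new prediction may move at most clip_range away from the prediction
    made at rollout time; the loss is the larger of the clipped and unclipped
    squared errors, so moving further than that is never rewarded.
    """
    delta = max(-clip_range, min(clip_range, value - old_value))
    clipped = old_value + delta
    return max((value - ret) ** 2, (clipped - ret) ** 2)
```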

Results

All numbers are exit rate (% of episodes where the agent reaches the exit). Random baseline is 17%.
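Exit rate is a proportion over a finite number of episodes, so it helps to see how much sampling noise a given episode count leaves. The sketch below computes a Wilson score interval for a success proportion; the function name `wilson_interval` and the choice of a 95% interval (z = 1.96) are assumptions for illustration, not part of the evaluation scripts.

```python
import math


def wilson_interval(successes, episodes, z=1.96):
    """Wilson score interval for a success proportion (z=1.96 is about 95%)."""
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    p = successes / episodes
    denom = 1.0 + z * z / episodes
    center = (p + z * z / (2 * episodes)) / denom
    half = z * math.sqrt(
        p * (1 - p) / episodes + z * z / (4 * episodes ** 2)
    ) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

For example, a 62.6% exit rate over 500 episodes is 313 exits, and `wilson_interval(313, 500)` gives an interval of roughly 58% to 67%. Runs with fewer episodes give noticeably wider intervals.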

RL Agents (500-episode eval, numeric observations)

| Model | Perfect Info | Imperfect Info (7x7 FOV) |
|---|---|---|
| PPO (5M steps) | 62.6% (+98.0 reward) | 29.2% (-23.6 reward) |
| DQN (5-10M steps) | 10.0% (-20.6 reward) | 34.4% (-39.2 reward) |

PPO with full observability is the strongest RL agent: it plays aggressively, exits fast (85 steps on average), and has a high kill rate (1.12 per episode). DQN struggles with perfect info (exploration collapse; its 10.0% exit rate is below the random baseline) but has the highest exit rate of the RL agents under imperfect info, playing patiently (179 steps on average, 66.6% survival).

VLM Zero-Shot (Qwen3.5-4B, no training)

| Mode | Perfect Info | Imperfect Info |
|---|---|---|
| Text | 4% | 4% |
| Image | 5% | 5% |

Both modes score below the 17% random baseline. The model moves directionally (right/down bias) instead of exploring, so it rarely finds the exit.

VLM Action Head (Frozen Qwen3.5-4B + Trained MLP, 50K steps)

| Mode | Perfect Info | Imperfect Info |
|---|---|---|
| Text | 45% | 70% |
| Image | 50% | 40% |

Text with imperfect info is the best agent in the entire project at 70% exit rate, ahead of PPO's 62.6% with full map visibility. Direction hints in the text prompt ("exit is 3 tiles RIGHT and 2 DOWN") let the VLM's language understanding handle spatial navigation better than learned CNN features. Under perfect info, image mode edges out text (50% vs 45%), since the full-map screenshot contains direct spatial information, but text with direction hints dominates under imperfect info (70% vs 40%).
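A direction hint of the kind quoted above can be produced from two grid coordinates. The sketch below is one hypothetical way to format it. The function name `direction_hint`, the (x, y) convention with x growing rightward and y growing downward, and the wording for the edge cases are assumptions; the prompt builder in the repository may phrase things differently.

```python
def direction_hint(player, exit_pos):
    """Format a hint such as 'exit is 3 tiles RIGHT and 2 DOWN'.

    Positions are (x, y) tuples; x grows to the right and y grows downward
    (an assumed convention).
    """
    dx = exit_pos[0] - player[0]
    dy = exit_pos[1] - player[1]
    moves = []
    if dx:
        moves.append((abs(dx), "RIGHT" if dx > 0 else "LEFT"))
    if dy:
        moves.append((abs(dy), "DOWN" if dy > 0 else "UP"))
    if not moves:
        return "you are on the exit"
    count, direction = moves[0]
    plural = "s" if count != 1 else ""
    text = f"exit is {count} tile{plural} {direction}"
    if len(moves) == 2:
        text += f" and {moves[1][0]} {moves[1][1]}"
    return text
```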

Play It

python scripts/gameonly.py

Install

pip install -r requirements.txt
