Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

World-In-Agent reasoning + Agent-In-World curriculum for interactive LLM agent training

Overview

Role-Agent extends the verl-agent and veRL training stack with dual-role evolution for LLM agents. The project keeps the scalable multi-turn rollout and RL infrastructure from the upstream stack, while adding two Role-Agent components that make the agent learn both as an internal world model and as an actor shaped by past world failures.

| Component | What It Adds | Main Entry Points |
| --- | --- | --- |
| World-In-Agent (WIA) | Predicts the next environment feedback with `<predict_next>` and shapes step rewards by prediction-observation similarity. | `role_agent/wia_utils.py`, `agent_system/multi_turn_rollout/rollout_loop.py` |
| Agent-In-World (AIW) | Tracks failed episodes as lightweight failure fingerprints and upweights similar tasks during training. | `role_agent/aiw_curriculum.py`, `verl/trainer/ppo/ray_trainer.py` |
| Training Integration | Toggles Role-Agent behavior through `algorithm.role_agent.*` without replacing the existing PPO/GiGPO pipeline. | `verl/trainer/config/ppo_trainer.yaml` |
| Launch Recipes | Ready-to-run scripts for ALFWorld, WebShop, Search-R1, and WebShop + GiGPO. | `examples/role_agent_trainer/` |

For detailed design notes, implementation alignment, and known tradeoffs, see `docs/role_agent_alignment.md`.
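
The WIA row above describes reward shaping by prediction-observation similarity. The following is a minimal sketch of that idea, not the code in `role_agent/wia_utils.py`. The closing `</predict_next>` tag, the character-level `difflib` similarity, the `coef` weight, and all function names are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Assumption: predictions are wrapped as <predict_next>...</predict_next>.
# The README only names the opening tag.
_PREDICT_RE = re.compile(r"<predict_next>(.*?)</predict_next>", re.DOTALL)


def extract_prediction(response: str) -> Optional[str]:
    """Return the text of the last <predict_next> block, or None if absent."""
    matches = _PREDICT_RE.findall(response)
    return matches[-1].strip() if matches else None


def prediction_similarity(predicted: str, observed: str, max_chars: int = 512) -> float:
    """Character-level similarity in [0, 1] after truncating both texts.

    max_chars plays the role of a length cap such as text_match_max_chars.
    """
    a = predicted[:max_chars].lower()
    b = observed[:max_chars].lower()
    return SequenceMatcher(None, a, b).ratio()


def shaped_step_reward(
    env_reward: float,
    response: str,
    next_observation: str,
    coef: float = 0.1,  # hypothetical shaping weight
    max_chars: int = 512,
) -> float:
    """Add a similarity bonus to the environment reward for one step."""
    predicted = extract_prediction(response)
    if predicted is None:
        # No prediction emitted: leave the environment reward unchanged.
        return env_reward
    bonus = coef * prediction_similarity(predicted, next_observation, max_chars)
    return env_reward + bonus
```

A step whose prediction matches the observed feedback exactly receives the full `coef` bonus, while a step without a `<predict_next>` block is left untouched, so the shaping term never penalizes below the environment reward in this sketch.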

Highlights

- Dual-role training signal: combines next-state prediction quality with failure-aware task resampling.
- Minimal integration surface: WIA and AIW are optional Hydra flags layered on top of the existing multi-turn rollout loop.
- Long-horizon friendly: inherits verl-agent's step-independent rollout design, avoiding full-history concatenation at every turn.
- Practical training recipes: includes concrete scripts for text-based environments and search/tool-use settings.

Quick Start

Install dependencies following the upstream verl-agent / veRL environment requirements, then run a Role-Agent recipe:

cd /path/to/roleagent
bash examples/role_agent_trainer/run_webshop.sh

Enable Role-Agent behavior with Hydra flags:

algorithm.role_agent.enable_wia=true \
algorithm.role_agent.enable_aiw=true

Common configuration keys:

| Key | Purpose |
| --- | --- |
| `algorithm.role_agent.text_match_max_chars` | Cap on text length for WIA/AIW similarity scoring. |
| `algorithm.role_agent.aiw_top_k` | Number of similar failed tasks to upweight. |
| `algorithm.role_agent.aiw_boost` | Cross-task AIW sampling boost. |
| `algorithm.role_agent.aiw_self_boost` | Self-replay boost for the failed task itself. |
| `algorithm.role_agent.aiw_similarity_thresh` | Optional similarity gate for cross-task boosts. |
| `algorithm.role_agent.aiw_max_history` | Maximum number of retained failure fingerprints. |
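
To make the AIW keys above concrete, here is a hedged sketch of how such parameters could interact in a failure-aware sampler. It is not the implementation in `role_agent/aiw_curriculum.py`. The class name, the use of task text as the fingerprint, the `difflib` similarity, and the additive weighting rule are all assumptions, and `top_k` is read here as "per recorded failure, boost the k most similar other tasks".

```python
import random
from collections import deque
from difflib import SequenceMatcher
from typing import Deque, Dict, List, Tuple


class FailureCurriculum:
    """Hypothetical failure-fingerprint curriculum (illustration only)."""

    def __init__(self, top_k: int = 2, boost: float = 0.5, self_boost: float = 1.0,
                 similarity_thresh: float = 0.0, max_history: int = 100,
                 max_chars: int = 256) -> None:
        self.top_k = top_k
        self.boost = boost
        self.self_boost = self_boost
        self.similarity_thresh = similarity_thresh
        self.max_chars = max_chars
        # Oldest fingerprints are dropped once max_history is reached.
        self.history: Deque[Tuple[str, str]] = deque(maxlen=max_history)

    def _fingerprint(self, text: str) -> str:
        return text[: self.max_chars].lower()

    def record_failure(self, task_id: str, task_text: str) -> None:
        self.history.append((task_id, self._fingerprint(task_text)))

    def weights(self, tasks: Dict[str, str]) -> Dict[str, float]:
        """Return a sampling weight per task, starting from 1.0."""
        weights = {task_id: 1.0 for task_id in tasks}
        for failed_id, fingerprint in self.history:
            if failed_id in weights:
                weights[failed_id] += self.self_boost  # replay the failed task
            scored: List[Tuple[float, str]] = []
            for task_id, text in tasks.items():
                if task_id == failed_id:
                    continue
                sim = SequenceMatcher(None, fingerprint, self._fingerprint(text)).ratio()
                if sim >= self.similarity_thresh:  # optional similarity gate
                    scored.append((sim, task_id))
            scored.sort(reverse=True)
            for _, task_id in scored[: self.top_k]:
                weights[task_id] += self.boost  # cross-task boost
        return weights

    def sample(self, tasks: Dict[str, str], batch_size: int, rng: random.Random) -> List[str]:
        weights = self.weights(tasks)
        ids = list(weights)
        return rng.choices(ids, weights=[weights[i] for i in ids], k=batch_size)


tasks = {
    "a": "put a clean mug on the shelf",
    "b": "put a clean cup on the shelf",
    "c": "buy red running shoes size 9",
}
curriculum = FailureCurriculum(top_k=1, boost=0.5, self_boost=1.0, similarity_thresh=0.5)
curriculum.record_failure("a", tasks["a"])
print(curriculum.weights(tasks))  # {'a': 2.0, 'b': 1.5, 'c': 1.0}
```

In this toy run the failed task `a` gains the self-replay boost, its near-duplicate `b` gains the cross-task boost, and the unrelated task `c` keeps its base weight.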

When AIW is enabled, set `data.dataloader_num_workers=0` so the mutable weighted sampler remains well-defined.
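
The flags and the worker constraint above can be assembled programmatically, for example when generating launch commands. The helper below is a small sketch with hypothetical function names. It only formats `key=value` strings in the style shown above, and it does not assume anything about how the recipe scripts consume extra arguments.

```python
from typing import Dict, List, Union

Value = Union[bool, int, float, str]


def format_value(value: Value) -> str:
    """Render a Python value as the right-hand side of a key=value override."""
    # bool must be checked first because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def role_agent_overrides(enable_wia: bool = True, enable_aiw: bool = True) -> List[str]:
    """Build key=value overrides for the two Role-Agent switches."""
    options: Dict[str, Value] = {
        "algorithm.role_agent.enable_wia": enable_wia,
        "algorithm.role_agent.enable_aiw": enable_aiw,
    }
    if enable_aiw:
        # Per the note above: AIW requires zero dataloader workers.
        options["data.dataloader_num_workers"] = 0
    return [f"{key}={format_value(value)}" for key, value in options.items()]


print(" \\\n".join(role_agent_overrides()))
```

Tying the worker setting to `enable_aiw` in one place keeps the two options from drifting apart across launch scripts.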

Example Recipes

| Script | Environment | Algorithm |
| --- | --- | --- |
| `examples/role_agent_trainer/run_alfworld.sh` | ALFWorld | PPO / GAE |
| `examples/role_agent_trainer/run_webshop.sh` | WebShop | PPO / GAE |
| `examples/role_agent_trainer/run_webshop_gigpo.sh` | WebShop | GiGPO |
| `examples/role_agent_trainer/run_search.sh` | Search-R1 | GiGPO |

Data-root overrides are documented in `examples/role_agent_trainer/README.md`.

Repository Map

role_agent/                         # WIA scoring, AIW curriculum, prompt utilities
agent_system/multi_turn_rollout/    # Multi-turn rollout loop with Role-Agent hooks
verl/trainer/ppo/ray_trainer.py     # PPO/GiGPO trainer integration
examples/role_agent_trainer/        # Role-Agent launch scripts
docs/role_agent_alignment.md        # Detailed design and implementation notes

Upstream Base

This repository is derived from verl-agent, which provides scalable multi-turn interaction, environment wrappers, and RL algorithms including PPO, GRPO, DAPO, RLOO, and GiGPO. Role-Agent focuses on the WIA/AIW training additions on top of that base.

Please also acknowledge the upstream projects, verl-agent and veRL, when using this code.

License

This repository follows the upstream Apache-2.0 license. See LICENSE.
