Research Index

Papers on Agent Skills

A categorized index of 312 research papers on skill architectures, acquisition, composition, orchestration, and evaluation. Last updated March 2026.

312 Papers
15 Categories
1. Skill Architectures & Formats

22 papers

Agent Skills for LLMs: Architecture, Acquisition, Security (2026)

Four-axis survey; SKILL.md spec; four-tier trust framework; 26.1% vulnerability rate

SoK: Agentic Skills — Beyond Tool Use (2026)

Lifecycle systematisation; quality ceiling problem; concept unification table

SkillNet: Create, Evaluate, and Connect AI Skills (2026)

200K+ skills; three-layer ontology; SCECM evaluation; 40% reward improvement

AgentSkillOS: Organizing, Orchestrating, Benchmarking (2026)

Capability tree; DAG orchestration; ablation proving retrieval+orchestration both needed

SkillsBench: Benchmarking How Well Agent Skills Work (2026)

84 tasks x 11 domains; modular > exhaustive; self-generated sometimes > human

2. Skill Acquisition & Discovery

18 papers

SkillWeaver: Web Agents Self-Improve (2025)

Discover -> practice -> synthesise -> hone; 31.8% WebArena improvement

SAGE: RL for Self-Improving Agent with Skill Library (2025)

RL-optimised skill library; sequential rollout; 8.9% higher completion

SEAgent: Self-Evolving Computer Use Agent (2025)

Curriculum-based self-evolution; 23.2pp gain over UI-TARS

EXIF: Automated Skill Discovery (2025)

Exploration-first; Alice discovers, Bob learns; iterative feedback

ExpeL: LLM Agents Are Experiential Learners (2023)

Experience -> insight extraction; transfer learning; no parameter updates

3. Skill Composition & Orchestration

16 papers

From Static Templates to Dynamic Graphs: Workflow Survey (2026)

Workflow structure as first-class optimisation target

DyFlow: Interleaved Design and Execution (2025)

Dynamic workflow revision during execution

EvoFlow: Evolutionary Workflow Construction (2025)

Heterogeneous workflows via evolutionary algorithms

FlowReasoner: Query-Level Workflow Generation (2025)

RL-trained meta-agent generates per-query workflows

Workflow-R1: Multi-Turn Decision Process (2026)

Workflow construction as grouped think-act RL

4. Skill Evaluation & Benchmarking

14 papers

SkillsBench: Agent Skills Across Diverse Tasks (2026)

First skills-as-artifacts benchmark; 84 tasks x 3 conditions

Terminal-Bench: Terminal Agent Evaluation (2026)

Containerised terminal agent evaluation

SWE-bench: Real-World GitHub Issues (2024)

Production-level code issue resolution

SWE-bench Verified (2024)

Human-verified subset of SWE-bench

SWE-bench Pro: Long-Horizon Software Engineering (2025)

Extended software engineering tasks

5. Tool Use & Function Calling

32 papers

Toolformer: LMs Can Teach Themselves Tools (2023)

Self-supervised tool use learning

ToolLLM: Facilitating 16000+ Real-World APIs (2023)

Massive API mastery framework

Tool Learning with LLMs: A Survey (2025)

Comprehensive tool learning survey; four stages

ToolACE: Winning Function Calling (2024)

Automated data generation for tool calling

ReTool: RL for Strategic Tool Use (2025)

Tool-augmented RL; 67% AIME2024 accuracy

6. Knowledge Graphs for Agents

24 papers

GraphRAG: Local to Global Summarization (2024)

Entity-relation graphs from corpora; community summaries

Graph RAG Survey (Peng et al.) (2024)

First comprehensive GraphRAG overview

RAG with Graphs (GraphRAG) (2025)

Unified GraphRAG formalization

Survey of Graph RAG (2025)

IEEE survey on graph-based RAG

A-RAG: Agentic RAG via Hierarchical Retrieval (2026)

Hierarchical retrieval interfaces; agentic autonomy

7. Procedural Memory & Experiential Learning

26 papers

Memory in the Age of AI Agents (2025)

Factual/experiential/working memory taxonomy; dynamics

Rethinking Memory of Foundation Agents (2026)

Cognitive mechanism + memory subject perspectives

Memory Mechanisms in LLM Agents: Survey (2024)

Inside-trial vs cross-trial information; memory management

MACLA: Hierarchical Procedural Memory (2025)

Bayesian selection; meta-procedural playbooks; 78.1% avg

MemP: Exploring Agent Procedural Memory (2025)

Systematic procedural memory analysis

8. Self-Improving & Self-Evolving Agents

22 papers

Comprehensive Survey of Self-Evolving Agents (2025)

Tool creation spectrum; self-evolution taxonomy

Self-Improving AI Agents through Self-Play (2025)

GVU operator; variance inequality for self-improvement

Metacognitive Learning for Self-Improving Agents (2025)

Metacognitive knowledge + planning + evaluation

Self-Play SWE-RL: Training Software Agents (2025)

Self-play bug injection and repair; +10.4pp SWE-bench

Teaching LLM Agents How to Self-Improve (2024)

Methods for agent self-improvement

9. Multi-Agent Coordination & Delegation

20 papers

Multi-Agent Collaboration Mechanisms: Survey (2025)

Collaboration taxonomy: actors, types, structures, strategies

Multi-Agent Coordination Across Applications (2025)

Unified understanding of coordination across domains

LLM-Based Human-Agent Collaboration Survey (2025)

Delegation, supervision, cooperation, coordination

Orchestrating Human-AI Teams (2025)

Manager Agent for ad hoc teamwork

Orchestration of Multi-Agent Systems (2026)

MCP + A2A dual protocol foundation

10. Context Engineering & Prompt Management

18 papers

Building Effective AI Agents (2024)

Prompt chaining, routing, parallelization, orchestrator-workers

Prompt Engineering Interactive Tutorial (2024)

Step-by-step prompt engineering

Self-Instruct: Bootstrapping from Seeds (2023)

Self-generating instruction examples

Tree of Thoughts: Deliberate Problem Solving (2024)

Tree-based deliberation for complex reasoning

Chain-of-Thought Prompting (2022)

Step-by-step reasoning in LLMs

11. Progressive Disclosure & Retrieval

20 papers

DSI: Differentiable Search Index (2022)

Hierarchical doc IDs; prefix-sharing

COLT: Completeness-Oriented Tool Retrieval (2024)

Completeness-oriented retrieval for LLMs

RecMind: LLM-Powered Recommendation Agent (2023)

Memory-augmented recommendation agent

ReadAgent: Long-Context Document Processing (2023)

Episode pagination -> memory gisting -> lookup

From Isolated to Hierarchical Tree Memory (2024)

Dynamic tree memory for LLMs

12. Agent Planning & Reasoning

28 papers

Reasoning with LM is Planning with World Model (2023)

World model for LLM planning

On Planning Abilities of LLMs (2023)

Critical investigation of LLM planning

Can LLM-Reasoning Models Replace Planning? (2025)

Comparison with classical planning

TPTU: Task Planning and Tool Usage (2023)

Task planning with tool usage

Paradigms for LLM Agents: Tool, Planning, Feedback (2025)

Three-paradigm review; COLING 2025

13. Security & Governance for Agent Skills

12 papers

Agent Skills Security Analysis (2025)

26.1% skills contain vulnerabilities

Skills Enable Realistic Prompt Injections (2025)

Skill-file prompt injection risks

Survey of Agent Interoperability Protocols (2025)

MCP, Agent Cards, related protocol comparison

ZeroDayBench: LLM Cybersecurity Capabilities (2026)

Benchmarking vulnerability discovery and remediation

Navigating Risks: Security Threats in LLM Agents (2024)

Security, privacy, ethics threat survey

14. Domain-Specific Agent Applications

18 papers

IoT-SkillsBench: Embedded Systems Skills (2026)

HIL evaluation for MCU programming skills

IoT-MCP: Bridging LLMs and IoT (2025)

MCP for IoT system integration

BrowserAgent: Human-Inspired Web Browsing (2025)

ReAct-style with explicit memory for web

GUI-Eyes: Tool-Augmented Visual Grounding (2026)

Autonomous visual tool invocation for GUIs

GUI-Actor: Coordinate-Free Visual Grounding (2025)

Attention-based coordinate-free GUI grounding

15. Surveys & Systematisations

22 papers

Agent Skills for LLMs: Survey (2026)

Definitive skills survey; four axes

SoK: Agentic Skills (2026)

Lifecycle stages; quality ceiling

Self-Evolving Agents: Comprehensive Survey (2025)

Tool creation spectrum; self-evolution taxonomy

Memory in the Age of AI Agents (2025)

Agent memory taxonomy and dynamics

Rethinking Memory of Foundation Agents (2026)

Cognitive + subject perspectives on memory
