Research Index

Papers on Agent Skills

A categorized index of 312 research papers on skill architectures, acquisition, composition, orchestration, and evaluation. Last updated March 2026.

312 Papers

15 Categories

1. Skill Architectures & Formats (22 papers)
2. Skill Acquisition & Discovery (18 papers)
3. Skill Composition & Orchestration (16 papers)
4. Skill Evaluation & Benchmarking (14 papers)
5. Tool Use & Function Calling (32 papers)
6. Knowledge Graphs for Agents (24 papers)
7. Procedural Memory & Experiential Learning (26 papers)
8. Self-Improving & Self-Evolving Agents (22 papers)
9. Multi-Agent Coordination & Delegation (20 papers)
10. Context Engineering & Prompt Management (18 papers)
11. Progressive Disclosure & Retrieval (20 papers)
12. Agent Planning & Reasoning (28 papers)
13. Security & Governance for Agent Skills (12 papers)
14. Domain-Specific Agent Applications (18 papers)
15. Surveys & Systematisations (22 papers)

Skill Architectures & Formats

22 papers

Agent Skills for LLMs: Architecture, Acquisition, Security (2026)

arXiv:2602.12430

Four-axis survey; SKILL.md spec; four-tier trust framework; 26.1% vulnerability rate

SoK: Agentic Skills — Beyond Tool Use (2026)

arXiv:2602.20867

Lifecycle systematisation; quality ceiling problem; concept unification table

SkillNet: Create, Evaluate, and Connect AI Skills (2026)

arXiv:2603.04448

200K+ skills; three-layer ontology; SCECM evaluation; 40% reward improvement

AgentSkillOS: Organizing, Orchestrating, Benchmarking (2026)

arXiv:2603.02176

Capability tree; DAG orchestration; ablation showing that both retrieval and orchestration are needed

SkillsBench: Benchmarking How Well Agent Skills Work (2026)

arXiv:2602.12670

84 tasks across 11 domains; modular skills beat exhaustive ones; self-generated skills sometimes beat human-written ones

Skill Acquisition & Discovery

18 papers

SkillWeaver: Web Agents Self-Improve (2025)

arXiv:2504.07079

Discover -> practice -> synthesise -> hone; 31.8% WebArena improvement

SAGE: RL for Self-Improving Agent with Skill Library (2025)

arXiv:2512.17102

RL-optimised skill library; sequential rollout; 8.9% higher completion

SEAgent: Self-Evolving Computer Use Agent (2025)

Curriculum-based self-evolution; 23.2pp gain over UI-TARS

EXIF: Automated Skill Discovery (2025)

arXiv:2506.04287

Exploration-first; Alice discovers, Bob learns; iterative feedback

ExpeL: LLM Agents Are Experiential Learners (2023)

arXiv:2308.10144

Experience -> insight extraction; transfer learning; no parameter updates

Skill Composition & Orchestration

16 papers

From Static Templates to Dynamic Graphs: Workflow Survey (2026)

arXiv:2603.22386

Workflow structure as first-class optimisation target

DyFlow: Interleaved Design and Execution (2025)

Dynamic workflow revision during execution

EvoFlow: Evolutionary Workflow Construction (2025)

Heterogeneous workflows via evolutionary algorithms

FlowReasoner: Query-Level Workflow Generation (2025)

RL-trained meta-agent generates per-query workflows

Workflow-R1: Multi-Turn Decision Process (2026)

Workflow construction as grouped think-act RL

Skill Evaluation & Benchmarking

14 papers

SkillsBench: Agent Skills Across Diverse Tasks (2026)

arXiv:2602.12670

First skills-as-artifacts benchmark; 84 tasks x 3 conditions

Terminal-Bench: Terminal Agent Evaluation (2026)

Containerised terminal agent evaluation

SWE-bench: Real-World GitHub Issues (2024)

arXiv:2310.06770

Production-level code issue resolution

SWE-bench Verified (2024)

Human-verified subset of SWE-bench

SWE-bench Pro: Long-Horizon Software Engineering (2025)

arXiv:2509.16941

Extended software engineering tasks

Tool Use & Function Calling

32 papers

Toolformer: LMs Can Teach Themselves Tools (2023)

arXiv:2302.04761

Self-supervised tool use learning

ToolLLM: Facilitating LLMs to Master 16000+ Real-World APIs (2023)

arXiv:2307.16789

Massive API mastery framework

Tool Learning with LLMs: A Survey (2025)

arXiv:2405.17935

Comprehensive tool learning survey; four stages

ToolACE: Winning the Points of LLM Function Calling (2024)

arXiv:2409.00920

Automated data generation for tool calling

ReTool: RL for Strategic Tool Use (2025)

arXiv:2504.11536

Tool-augmented RL; 67% accuracy on AIME 2024

Knowledge Graphs for Agents

24 papers

GraphRAG: Local to Global Summarization (2024)

arXiv:2404.16130

Entity-relation graphs from corpora; community summaries

Graph RAG Survey by Peng et al. (2024)

arXiv:2408.08921

First comprehensive GraphRAG overview

RAG with Graphs (GraphRAG) (2025)

arXiv:2501.00309

Unified GraphRAG formalization

Survey of Graph RAG (2025)

arXiv:2501.13958

IEEE survey on graph-based RAG

A-RAG: Agentic RAG via Hierarchical Retrieval (2026)

arXiv:2602.03442

Hierarchical retrieval interfaces; agentic autonomy

Procedural Memory & Experiential Learning

26 papers

Memory in the Age of AI Agents (2025)

arXiv:2512.13564

Factual/experiential/working memory taxonomy; memory dynamics

Rethinking Memory of Foundation Agents (2026)

arXiv:2602.06052

Memory examined from cognitive-mechanism and memory-subject perspectives

Memory Mechanisms in LLM Agents: Survey (2024)

Inside-trial vs cross-trial information; memory management

MACLA: Hierarchical Procedural Memory (2025)

arXiv:2512.18950

Bayesian selection; meta-procedural playbooks; 78.1% avg

MemP: Exploring Agent Procedural Memory (2025)

arXiv:2508.06433

Systematic procedural memory analysis

Self-Improving & Self-Evolving Agents

22 papers

Comprehensive Survey of Self-Evolving Agents (2025)

arXiv:2507.21046

Tool creation spectrum; self-evolution taxonomy

Self-Improving AI Agents through Self-Play (2025)

arXiv:2512.02731

GVU operator; variance inequality for self-improvement

Metacognitive Learning for Self-Improving Agents (2025)

Metacognitive knowledge + planning + evaluation

Self-Play SWE-RL: Training Software Agents (2025)

arXiv:2512.18552

Self-play bug injection and repair; +10.4pp SWE-bench

Teaching LLM Agents How to Self-Improve (2024)

arXiv:2410.12468

Methods for agent self-improvement

Multi-Agent Coordination & Delegation

20 papers

Multi-Agent Collaboration Mechanisms: Survey (2025)

arXiv:2501.06322

Collaboration taxonomy: actors, types, structures, strategies

Multi-Agent Coordination Across Applications (2025)

arXiv:2502.14743

Unified understanding of coordination across domains

LLM-Based Human-Agent Collaboration Survey (2025)

arXiv:2505.00753

Delegation, supervision, cooperation, coordination

Orchestrating Human-AI Teams (2025)

arXiv:2510.02557

Manager Agent for ad hoc teamwork

Orchestration of Multi-Agent Systems (2026)

arXiv:2601.13671

MCP + A2A dual protocol foundation

Context Engineering & Prompt Management

18 papers

Building Effective AI Agents (2024)

Prompt chaining, routing, parallelization, orchestrator-workers

Prompt Engineering Interactive Tutorial (2024)

Step-by-step prompt engineering

Self-Instruct: Bootstrapping from Seeds (2023)

arXiv:2212.10560

Self-generating instruction examples

Tree of Thoughts: Deliberate Problem Solving (2023)

Tree-based deliberation for complex reasoning

Chain-of-Thought Prompting (2022)

Step-by-step reasoning in LLMs

Progressive Disclosure & Retrieval

20 papers

DSI: Differentiable Search Index (2022)

Hierarchical doc IDs; prefix-sharing

COLT: Completeness-Oriented Tool Retrieval (2024)

Completeness-oriented retrieval for LLMs

RecMind: LLM-Powered Recommendation Agent (2023)

Memory-augmented recommendation agent

ReadAgent: Long-Context Document Processing (2024)

Episode pagination -> memory gisting -> lookup

From Isolated to Hierarchical Tree Memory (2024)

Dynamic tree memory for LLMs

Agent Planning & Reasoning

28 papers

Reasoning with LM is Planning with World Model (2023)

arXiv:2305.14992

World model for LLM planning

On the Planning Abilities of LLMs (2023)

Critical investigation of LLM planning

Can LLM-Reasoning Models Replace Planning? (2025)

arXiv:2412.10395

Comparison with classical planning

TPTU: Task Planning and Tool Usage (2023)

Task planning with tool usage

Paradigms for LLM Agents: Tool, Planning, Feedback (2025)

arXiv:2406.05804

Three-paradigm review; COLING 2025

Security & Governance for Agent Skills

12 papers

Agent Skills Security Analysis (2025)

26.1% of skills contain vulnerabilities

Skills Enable Realistic Prompt Injections (2025)

Skill-file prompt injection risks

Survey of Agent Interoperability Protocols (2025)

Comparison of MCP, Agent Cards, and related protocols

ZeroDayBench: LLM Cybersecurity Capabilities (2026)

arXiv:2603.02297

Benchmarking vulnerability discovery and remediation

Navigating Risks: Security Threats in LLM Agents (2024)

arXiv:2411.09523

Security, privacy, ethics threat survey

Domain-Specific Agent Applications

18 papers

IoT-SkillsBench: Embedded Systems Skills (2026)

arXiv:2603.19583

Hardware-in-the-loop (HIL) evaluation of microcontroller (MCU) programming skills

IoT-MCP: Bridging LLMs and IoT (2025)

MCP for IoT system integration

BrowserAgent: Human-Inspired Web Browsing (2025)

ReAct-style agent with explicit memory for web browsing

GUI-Eyes: Tool-Augmented Visual Grounding (2026)

Autonomous visual tool invocation for GUIs

GUI-Actor: Coordinate-Free Visual Grounding (2025)

Attention-based coordinate-free GUI grounding

Surveys & Systematisations

22 papers

Agent Skills for LLMs: Survey (2026)

arXiv:2602.12430

Skills survey organised along four axes

SoK: Agentic Skills (2026)

arXiv:2602.20867

Lifecycle stages; quality ceiling

Self-Evolving Agents: Comprehensive Survey (2025)

arXiv:2507.21046

Tool creation spectrum; self-evolution taxonomy

Memory in the Age of AI Agents (2025)

arXiv:2512.13564

Agent memory taxonomy and dynamics

Rethinking Memory of Foundation Agents (2026)

arXiv:2602.06052

Cognitive-mechanism and memory-subject perspectives on memory

Missing something?

Know a paper, tool, or repo that should be listed here? We want this index to be exhaustive.
