Papers on Agent Skills
A categorized index of 312 research papers on skill architectures, acquisition, composition, orchestration, and evaluation. Last updated March 2026.
Skill Architectures & Formats
22 papers
Four-axis survey; SKILL.md spec; four-tier trust framework; 26.1% vulnerability rate
Lifecycle systematisation; quality ceiling problem; concept unification table
200K+ skills; three-layer ontology; SCECM evaluation; 40% reward improvement
Capability tree; DAG orchestration; ablation proving retrieval+orchestration both needed
84 tasks x 11 domains; modular > exhaustive; self-generated sometimes > human
Skill Acquisition & Discovery
18 papers
Discover -> practice -> synthesise -> hone; 31.8% WebArena improvement
RL-optimised skill library; sequential rollout; 8.9% higher completion
Curriculum-based self-evolution; 23.2pp gain over UI-TARS
Exploration-first; Alice discovers, Bob learns; iterative feedback
Experience -> insight extraction; transfer learning; no parameter updates
Skill Composition & Orchestration
16 papers
Workflow structure as first-class optimisation target
Dynamic workflow revision during execution
Heterogeneous workflows via evolutionary algorithms
RL-trained meta-agent generates per-query workflows
Workflow construction as grouped think-act RL
Skill Evaluation & Benchmarking
14 papers
First skills-as-artifacts benchmark; 84 tasks x 3 conditions
Containerised terminal agent evaluation
Production-level code issue resolution
Human-verified subset of SWE-bench
Extended software engineering tasks
Tool Use & Function Calling
32 papers
Self-supervised tool use learning
Massive API mastery framework
Comprehensive tool learning survey; four stages
Automated data generation for tool calling
Tool-augmented RL; 67% AIME2024 accuracy
Knowledge Graphs for Agents
24 papers
Entity-relation graphs from corpora; community summaries
First comprehensive GraphRAG overview
Unified GraphRAG formalization
IEEE survey on graph-based RAG
Hierarchical retrieval interfaces; agentic autonomy
Procedural Memory & Experiential Learning
26 papers
Factual/experiential/working memory taxonomy; dynamics
Cognitive mechanism + memory subject perspectives
Inside-trial vs cross-trial information; memory management
Bayesian selection; meta-procedural playbooks; 78.1% avg
Systematic procedural memory analysis
Self-Improving & Self-Evolving Agents
22 papers
Tool creation spectrum; self-evolution taxonomy
GVU operator; variance inequality for self-improvement
Metacognitive knowledge + planning + evaluation
Self-play bug injection and repair; +10.4pp SWE-bench
Methods for agent self-improvement
Multi-Agent Coordination & Delegation
20 papers
Collaboration taxonomy: actors, types, structures, strategies
Unified understanding of coordination across domains
Delegation, supervision, cooperation, coordination
Manager Agent for ad hoc teamwork
MCP + A2A dual protocol foundation
Context Engineering & Prompt Management
18 papers
Prompt chaining, routing, parallelization, orchestrator-workers
Step-by-step prompt engineering
Self-generating instruction examples
Tree-based deliberation for complex reasoning
Step-by-step reasoning in LLMs
Progressive Disclosure & Retrieval
20 papers
Hierarchical doc IDs; prefix-sharing
Completeness-oriented retrieval for LLMs
Memory-augmented recommendation agent
Episode pagination -> memory gisting -> lookup
Dynamic tree memory for LLMs
Agent Planning & Reasoning
28 papers
World model for LLM planning
Critical investigation of LLM planning
Comparison with classical planning
Task planning with tool usage
Three-paradigm review; COLING 2025
Security & Governance for Agent Skills
12 papers
26.1% of skills contain vulnerabilities
Skill-file prompt injection risks
MCP, Agent Cards, related protocol comparison
Benchmarking vulnerability discovery and remediation
Security, privacy, ethics threat survey
Domain-Specific Agent Applications
18 papers
HIL evaluation for MCU programming skills
MCP for IoT system integration
ReAct-style with explicit memory for web
Autonomous visual tool invocation for GUIs
Attention-based coordinate-free GUI grounding
Surveys & Systematisations
22 papers
Definitive skills survey; four axes
Lifecycle stages; quality ceiling
Tool creation spectrum; self-evolution taxonomy
Agent memory taxonomy and dynamics
Cognitive + subject perspectives on memory
Know a paper, tool, or repo that should be listed here? We want this index to be exhaustive.