ProfVLM: A Lightweight Video-Language Model for Multi-View Proficiency Estimation

Existing approaches to skill proficiency estimation often rely on black-box video classifiers, ignoring multi-view context and lacking explainability. We present ProfVLM, a compact vision-language model that reformulates this task as generative reasoning: it jointly predicts skill level and generates expert-like feedback from egocentric and exocentric videos. Central to our method is an AttentiveGatedProjector that dynamically fuses multi-view features from a frozen TimeSformer backbone and projects them into a language model tuned for feedback generation. Trained on EgoExo4D with expert commentaries, ProfVLM surpasses state-of-the-art methods while using up to 20x fewer parameters and reducing training time by up to 60%. Our approach not only achieves superior accuracy across diverse activities, but also outputs natural language critiques aligned with performance, offering transparent reasoning. These results highlight generative vision-language modeling as a powerful new direction for skill assessment.

Submitted & under review
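
To make the fusion idea concrete, here is a minimal, hypothetical sketch of an attentive gated projector in Python (PyTorch): per-view features from a frozen backbone are cross-attended, blended through a learned gate, and projected into the language model's embedding space. The module layout, dimensions, and mean-pooling over views are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class AttentiveGatedProjector(nn.Module):
    # Hypothetical sketch: fuses per-view features and maps them to LM space.
    def __init__(self, feat_dim=768, lm_dim=2048, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.Sigmoid())
        self.proj = nn.Linear(feat_dim, lm_dim)

    def forward(self, views):
        # views: (batch, n_views, feat_dim), one pooled feature per camera view
        attended, _ = self.attn(views, views, views)  # each view attends to all views
        g = self.gate(torch.cat([views, attended], dim=-1))
        fused = g * attended + (1 - g) * views        # gated residual fusion
        return self.proj(fused.mean(dim=1))           # (batch, lm_dim) token for the LM

feats = torch.randn(4, 3, 768)                        # e.g. 1 egocentric + 2 exocentric views
print(AttentiveGatedProjector()(feats).shape)         # torch.Size([4, 2048])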
Gate-Shift-Pose: Enhancing Action Recognition in Sports with Skeleton Information

This paper introduces Gate-Shift-Pose, an enhanced version of Gate-Shift-Fuse networks, designed for athlete fall classification in figure skating by integrating skeleton pose data alongside RGB frames. We evaluate two fusion strategies: early-fusion, which combines RGB frames with Gaussian heatmaps of pose keypoints at the input stage, and late-fusion, which employs a multi-stream architecture with attention mechanisms to combine RGB and pose features. Experiments on the FR-FS dataset demonstrate that Gate-Shift-Pose significantly outperforms the RGB-only baseline, improving accuracy by up to 40% with ResNet18 and 20% with ResNet50. Early-fusion achieves the highest accuracy (98.08%) with ResNet50, leveraging the model's capacity for effective multimodal integration, while late-fusion is better suited for lighter backbones like ResNet18. These results highlight the potential of multimodal architectures for sports action recognition and the critical role of skeleton pose information in capturing complex motion patterns.

2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)
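
For intuition, the following sketch (Python/NumPy) builds the kind of early-fusion input described above: RGB channels are stacked with one Gaussian heatmap per pose keypoint, so the backbone receives 3 + K input channels. The keypoint coordinates and sigma value are assumptions for illustration.

import numpy as np

def keypoint_heatmaps(keypoints, h, w, sigma=4.0):
    # One Gaussian heatmap per (x, y) keypoint; returns shape (K, h, w).
    ys, xs = np.mgrid[0:h, 0:w]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
            for x, y in keypoints]
    return np.stack(maps).astype(np.float32)

def early_fuse(frame, keypoints):
    # Concatenate RGB (3, h, w) with pose heatmaps along the channel axis.
    h, w = frame.shape[1:]
    return np.concatenate([frame, keypoint_heatmaps(keypoints, h, w)], axis=0)

frame = np.random.rand(3, 224, 224).astype(np.float32)   # dummy RGB frame
kpts = [(112, 60), (100, 120), (124, 120)]               # hypothetical keypoints
print(early_fuse(frame, kpts).shape)                     # (6, 224, 224)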
SkillFormer: Unified Multi-View Proficiency Estimation

Assessing human skill levels in complex activities is a challenging problem with applications in sports, rehabilitation, and training. In this work, we present SkillFormer, a parameter-efficient architecture for unified multi-view proficiency estimation from egocentric and exocentric videos. Building on the TimeSformer backbone, SkillFormer introduces a CrossViewFusion module that fuses view-specific features using multi-head cross-attention, learnable gating, and adaptive self-calibration. We leverage Low-Rank Adaptation to fine-tune only a small subset of parameters, significantly reducing training costs. Evaluated on the EgoExo4D dataset, SkillFormer achieves state-of-the-art accuracy in multi-view settings while demonstrating remarkable computational efficiency, using 4.5x fewer parameters and requiring 3.75x fewer training epochs than prior baselines. It excels in multiple structured tasks, confirming the value of multi-view integration for fine-grained skill assessment.

2025 International Conference on Machine Vision
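
The Low-Rank Adaptation step can be sketched in a few lines of Python (PyTorch): a frozen pretrained weight is augmented with a trainable low-rank update B·A, so only a small fraction of parameters is updated. The rank and scaling below are illustrative defaults, not SkillFormer's actual settings.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Sketch of LoRA: y = W x + (alpha / r) * B A x, with W frozen.
    def __init__(self, base, rank=8, alpha=16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():              # freeze pretrained weights
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 12288 trainable vs. 590592 frozen parameters in the dense layer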

PATS: Proficiency-Aware Temporal Sampling for Multi-View Sports Skill Assessment

Free University of Bozen-Bolzano
IEEE International Workshop on Sport, Technology and Research (STAR) 2025

PATS


PATS is a novel video sampling strategy designed specifically for automated sports skill assessment. Unlike traditional methods that randomly sample frames or use uniform intervals, PATS preserves complete fundamental movements within continuous temporal segments.

Abstract

Automated sports skill assessment requires capturing fundamental movement patterns that distinguish expert from novice performance, yet current video sampling methods disrupt the temporal continuity essential for proficiency evaluation. To this end, we introduce Proficiency-Aware Temporal Sampling (PATS), a novel sampling strategy that preserves complete fundamental movements within continuous temporal segments for multi-view skill assessment. PATS adaptively segments videos to ensure each analyzed portion contains full execution of critical performance components, repeating this process across multiple segments to maximize information coverage while maintaining temporal coherence. Evaluated on the EgoExo4D benchmark with SkillFormer, PATS surpasses state-of-the-art accuracy across all viewing configurations (+0.65% to +3.05%) and delivers substantial gains in challenging domains (+26.22% bouldering, +2.39% music, +1.13% basketball). Systematic analysis reveals that PATS successfully adapts to diverse activity characteristics, from high-frequency sampling for dynamic sports to fine-grained segmentation for sequential skills, demonstrating its effectiveness as an adaptive approach to temporal sampling that advances automated skill assessment for real-world applications.

Proficiency-Aware Temporal Sampling

PATS Example

In this example configuration, PATS extracts Ntarget = 32 frames from Ns = 2 continuous temporal segments of duration ds = 3 s from a 10 s video. Within each segment, ⌊Ntarget/Ns⌋ = 16 frames are sampled uniformly (red vertical lines), preserving temporal continuity within segments. Automatic spacing between segments prevents overlap and ensures comprehensive temporal coverage. This configuration is used in the basketball and bouldering domains.
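
A minimal Python sketch of this segment-then-sample procedure, assuming evenly spaced, non-overlapping segments (the paper's exact placement rule may differ):

import numpy as np

def pats_sample(video_dur, fps, n_target=32, n_segments=2, seg_dur=3.0):
    # Returns n_target frame indices drawn from n_segments continuous windows.
    per_seg = n_target // n_segments                  # floor(Ntarget / Ns) per segment
    gap = (video_dur - n_segments * seg_dur) / (n_segments + 1)  # even spacing, no overlap
    indices = []
    for s in range(n_segments):
        start = gap + s * (seg_dur + gap)             # segment start time in seconds
        times = np.linspace(start, start + seg_dur, per_seg)  # uniform within segment
        indices.extend(int(t * fps) for t in times)
    return indices

idx = pats_sample(video_dur=10.0, fps=30)
print(len(idx))   # 32 frames total, 16 from each 3 s segment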

Comparison with State-of-the-Art Methods

PATS Overall Accuracy

SkillFormer+PATS surpasses state-of-the-art accuracy across all viewing configurations. Our approach delivers consistent improvements over the original SkillFormer: 47.3% accuracy for egocentric views (+3.05%), 46.6% for exocentric views (+0.65%), and 48.0% for combined views (+1.05%). These gains are achieved while maintaining computational efficiency, with 14-27M parameters and 4 training epochs.

Optimal Configuration Patterns

Systematic analysis reveals that PATS adapts to activity characteristics: a budget of 32 frames proves universally optimal; sampling rates vary with dynamics (4.0-5.33 FPS for dynamic sports vs. 0.89 FPS for sequential skills); view selection matches skill type (egocentric for proprioceptive skills, fused views for technique-based ones); and the number of segments correlates inversely with action continuity (2-12 segments).
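
As a quick sanity check on these rates, the within-segment sampling rate is simply frames-per-segment divided by segment duration; the example configuration above (16 frames over a 3 s segment) reproduces the 5.33 FPS upper end of the dynamic range. The parameters behind the other reported rates are not fully specified on this page.

def within_segment_fps(n_target, n_segments, seg_dur):
    # Effective sampling rate inside each segment: floor(Ntarget / Ns) / ds.
    return (n_target // n_segments) / seg_dur

print(within_segment_fps(32, 2, 3.0))  # 5.333... FPS, the basketball/bouldering setup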

Scenario-Specific Optimal Configuration

PATS demonstrates strong adaptability across diverse activity domains, with optimal configurations varying by skill characteristics. Basketball achieves the highest accuracy at 78.76% using rapid multi-view sampling, while music reaches 74.14% through fine-grained egocentric capture with 12 segments. Cooking and bouldering show distinct preferences for exocentric-only and egocentric-only views respectively, both utilizing high-frequency sampling.

The most substantial improvements appear in domains requiring precise temporal coordination. Bouldering shows the largest gain at +26.22% over SkillFormer, followed by music at +2.39% and basketball at +1.13%. These results validate PATS' effectiveness for proprioceptive skills where temporal continuity is critical.

However, limitations emerge in certain scenarios. Dancing presents a mixed case where PATS improves over SkillFormer but remains below baseline methods, suggesting the explored parameter combinations may not fully capture rhythmic and aesthetic components. Soccer shows a specific decline in egocentric view accuracy, indicating that PATS' temporal sampling strategy may not suit all activity-view combinations. These challenges point to opportunities for further refinement in scenario-specific configurations and potentially automated parameter selection mechanisms for broader applicability.

BibTeX

@INPROCEEDINGS{Bian2510:PATS,
AUTHOR="Edoardo Bianchi and Antonio Liotta",
TITLE="{PATS:} {Proficiency-Aware} Temporal Sampling for {Multi-View} Sports Skill
Assessment",
BOOKTITLE="2025 IEEE International Workshop on Sport, Technology and Research (STAR)
(IEEE STAR 2025)",
ADDRESS="Trento, Italy",
PAGES=6,
DAYS=29,
MONTH=oct,
YEAR=2025,
ABSTRACT="Automated sports skill assessment requires capturing fundamental movement
patterns that distinguish expert from novice performance, yet current video
sampling methods disrupt the temporal continuity essential for proficiency
evaluation. To this end, we introduce Proficiency-Aware Temporal Sampling
(PATS), a novel sampling strategy that preserves complete fundamental
movements within continuous temporal segments for multi-view skill
assessment. PATS adaptively segments videos to ensure each analyzed portion
contains full execution of critical performance components, repeating this
process across multiple segments to maximize information coverage while
maintaining temporal coherence. Evaluated on the EgoExo4D benchmark with
SkillFormer, PATS surpasses the state-of-the-art accuracy across all
viewing configurations (+0.65\% to +3.05\%) and delivers substantial gains
in challenging domains (+26.22\% bouldering, +2.39\% music, +1.13\%
basketball). Systematic analysis reveals that PATS successfully adapts to
diverse activity characteristics, from high-frequency sampling for dynamic
sports to fine-grained segmentation for sequential skills, demonstrating its
effectiveness as an adaptive approach to temporal sampling that advances
automated skill assessment for real-world applications."
}