Research Project

UR–JEPA: Uniform Rectifiability as a Regularizer for Joint-Embedding Predictive Architectures

Bridging geometric measure theory and self-supervised learning.

LeJEPA (Balestriero & LeCun, 2025) recently identified the isotropic Gaussian as the optimal target distribution for JEPA embeddings. But the manifold hypothesis says real data concentrates on a low-dimensional subset of the ambient space, which is in tension with a full-dimensional isotropic target.

UR–JEPA resolves this tension by replacing the Gaussian target with a uniformly n-rectifiable measure: the canonical geometric-measure-theory notion of “quantitatively n-dimensional at every location and scale.” The regularizer is a Gaussian-kernel smoothed Carleson-type square function built on prior work: Chousionis–Garnett–Le–Tolsa, Square functions and uniform rectifiability (TAMS, 2016).
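As a rough illustration of what a Gaussian-smoothed dyadic square function measures, the sketch below estimates an n-dimensional density at dyadic scales around a point and takes the difference between consecutive scales. It is not the UR–JEPA loss. The function names, the scale grid (`r0`, `num_scales`), and the use of log-density differences (suggested by the "log-density Δ" wording on this page) are assumptions made for illustration only.

```python
import math

def smoothed_density(x, points, r, n):
    """Gaussian-kernel stand-in for mu(B(x, r)) / r^n on the empirical measure of `points`."""
    total = 0.0
    for y in points:
        d2 = sum((a - b) ** 2 for a, b in zip(x, y))
        total += math.exp(-d2 / (2.0 * r * r))
    return total / (len(points) * r ** n)

def log_density_deltas(x, points, n, r0=0.25, num_scales=4):
    """Log-density differences between dyadic scales r and 2r (assumed form of Delta)."""
    deltas = []
    for k in range(num_scales):
        r = r0 * 2 ** k
        deltas.append(math.log(smoothed_density(x, points, r, n))
                      - math.log(smoothed_density(x, points, 2 * r, n)))
    return deltas

def rms_delta(x, points, n, r0=0.25, num_scales=4):
    """Root-mean-square of the dyadic deltas at a single point."""
    d = log_density_deltas(x, points, n, r0, num_scales)
    return math.sqrt(sum(v * v for v in d) / len(d))

def square_function_loss(points, n, r0=0.25, num_scales=4):
    """Mean over points of the summed squared deltas (a discrete Carleson-type sum)."""
    per_point = (sum(v * v for v in log_density_deltas(x, points, n, r0, num_scales))
                 for x in points)
    return sum(per_point) / len(points)

# Toy data: evenly spaced points on a line segment in the plane (a 1-dimensional set).
line = [(0.05 * i, 0.0) for i in range(-200, 201)]
origin = (0.0, 0.0)
print(rms_delta(origin, line, n=1))  # close to zero: the set looks 1-dimensional at every scale
print(rms_delta(origin, line, n=2))  # near log(2): density at dimension 2 changes at every scale
```

On the toy line segment the statistic is small only when `n` matches the set's actual dimension. That is the behaviour a regularizer of this kind rewards.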

Selected results

UR–JEPA pretrained in-domain on Galaxy10 SDSS reaches 81.4% linear-probe accuracy, +18.5 pp over IJEPA–IN22K foundation-model transfer (62.9%). In-domain UR–JEPA on a 21K-image astronomical dataset thus substantially exceeds a 630M-parameter foundation model pretrained on 22M images.
UR–JEPA outperforms LeJEPA on in-domain linear-probe accuracy on ImageNet-10 (Inet10) and ImageNet-100 (Inet100) under a matched recipe with 3 seeds:
- Inet10: Δ = +0.83 pp (paired-t = +15.5, one-sided p ≈ 0.002 with 3 seeds) with ~30% lower seed variance.
- Inet100: final-acc Δ = +1.81 pp (paired-t = 3.14, one-sided p = 0.044, positive on all 3 seeds) at 800 epochs.
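For readers who want to check how a paired t statistic over 3 seeds maps to a one-sided p-value, here is a minimal sketch. It assumes a standard paired t-test with 2 degrees of freedom (3 paired seeds), where the Student-t CDF has the closed form F(t) = 1/2 + t / (2·sqrt(t² + 2)). The function names and the example differences are hypothetical, not taken from the paper.

```python
import math
import statistics

def paired_t_statistic(diffs):
    """t statistic of per-seed accuracy differences (candidate minus baseline)."""
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

def one_sided_p_df2(t):
    """One-sided p-value P(T >= t) for Student's t with exactly 2 degrees of freedom."""
    return 0.5 - t / (2.0 * math.sqrt(t * t + 2.0))

# Hypothetical per-seed differences in percentage points, for illustration only.
example_diffs = [1.0, 2.0, 3.0]
t_example = paired_t_statistic(example_diffs)
print(round(t_example, 4), round(one_sided_p_df2(t_example), 4))

# A t statistic of 3.14 with 2 degrees of freedom gives a one-sided p of about 0.044.
print(round(one_sided_p_df2(3.14), 3))
```

The closed form holds only for 2 degrees of freedom. With more seeds, a general Student-t CDF would be needed.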
UR–JEPA is statistically tied with LeJEPA at convergence on Galaxy10 SDSS and EuroSAT (3 seeds each). On EuroSAT the in-domain pair (UR–JEPA 95.97%, LeJEPA 96.11%) is competitive with Scale-MAE (ViT-L, ~300M params) while using a backbone more than 25× smaller (ResNet-18, 11M params).
UR–JEPA bounds the local-density square function it optimizes. On ImageNet-100, the loss's own per-image RMS|Δ| diagnostic has 5× to 40× fewer high-severity points under UR–JEPA than under LeJEPA at every absolute threshold (e.g., 6 vs 198 above 0.72, a ratio of 33×). The mechanism is direct: L^CGLT minimizes the same statistic.
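The threshold comparison above amounts to counting how many per-image severity values exceed each cutoff under each model and taking the ratio. A minimal sketch, with hypothetical function names and toy severity values that are not results from the paper:

```python
def exceedance_counts(severity, thresholds):
    """Number of per-image severity values strictly above each threshold."""
    return {t: sum(1 for s in severity if s > t) for t in thresholds}

def tail_ratios(baseline, candidate, thresholds):
    """Baseline-to-candidate ratio of exceedance counts; None if the candidate count is zero."""
    base = exceedance_counts(baseline, thresholds)
    cand = exceedance_counts(candidate, thresholds)
    return {t: (base[t] / cand[t] if cand[t] else None) for t in thresholds}

# Toy per-image RMS|Delta| values, for illustration only.
baseline_like = [0.2, 0.5, 0.8, 0.9, 0.95, 0.75]
candidate_like = [0.1, 0.3, 0.4, 0.74, 0.2, 0.1]
print(tail_ratios(baseline_like, candidate_like, [0.3, 0.72]))
```

With the counts quoted above, 198 / 6 = 33, which is the reported ratio at the 0.72 threshold.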
Anomalous classes are templated. A 3-seed clustering probe on the CGLT log-density Δ statistic flags Inet100 test images as anomalous when they are dominated by one repeating structure (American coot, garden spider, tile roof, window screen, computer keyboard, honeycomb). The diagnostic is seed-stable (cross-seed Spearman 0.63) and cloud-robust (Spearman 0.84). Such repetition is exactly what the dyadic log-density square function is built to detect.

Figure: Representative anomalies flagged by the 3-seed Δ-clustering probe on the ImageNet-100 test set. Repetitive-texture, low-diversity classes (American coot, garden spider webs, tile roof, window screen, computer keyboard, honeycomb) dominate the upper-severity tail, which is exactly the failure mode the dyadic log-density square function is built to flag.
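Seed stability of a per-image diagnostic is typically summarized with a Spearman rank correlation between the scores from two seeds. The sketch below computes Spearman's coefficient as the Pearson correlation of average ranks. The function names and scores are hypothetical, and this is not necessarily how the paper's figures were computed.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Toy per-image severity scores from two seeds, for illustration only.
seed_a = [0.10, 0.40, 0.20, 0.90]
seed_b = [0.15, 0.25, 0.35, 0.80]
print(spearman(seed_a, seed_b))
```

A value near 1 means the two seeds rank images by severity in nearly the same order.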

Geometrically distinct representation. Direct visualization of the projector output shows that UR–JEPA leaves a 5 to 6 order-of-magnitude eigenvalue cliff in the embedding covariance spectrum at index ~20 to 25 (out of D = 32), with per-dimension Gaussianity preserved (Shapiro-Wilk W ≈ 0.99). LeJEPA's SIGReg drives the same covariance toward near-isotropy (top-to-bottom eigenvalue ratio at most 3.6). At matched accuracy, the two regularizers yield structurally distinct projected representations.
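The two spectrum summaries mentioned above (where the cliff sits, and the top-to-bottom ratio) can be read off a list of covariance eigenvalues. A minimal sketch, assuming the eigenvalues are already computed and strictly positive; the function name and the toy spectra are hypothetical:

```python
import math

def spectrum_summary(eigenvalues):
    """Locate the largest log10 drop between consecutive sorted eigenvalues."""
    eigs = sorted(eigenvalues, reverse=True)
    if len(eigs) < 2 or eigs[-1] <= 0:
        raise ValueError("need at least two strictly positive eigenvalues")
    logs = [math.log10(v) for v in eigs]
    gaps = [logs[i] - logs[i + 1] for i in range(len(logs) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return {
        "cliff_index": i + 1,        # number of eigenvalues before the largest drop
        "cliff_orders": gaps[i],     # size of that drop in orders of magnitude
        "top_to_bottom": eigs[0] / eigs[-1],
    }

# Toy spectra for D = 32, for illustration only.
cliff_like = [1.0] * 22 + [1e-6] * 10
isotropic_like = [3.6] + [2.0] * 30 + [1.0]
print(spectrum_summary(cliff_like))
print(spectrum_summary(isotropic_like))
```

A spectrum with a low-rank cliff shows one dominant gap of several orders of magnitude, while a near-isotropic spectrum shows only small gaps and a modest top-to-bottom ratio.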
The AD-regularity anchor is empirically dispensable. A 3-seed Galaxy10 ablation removes the Ahlfors–David anchor from L^CGLT and matches the headline with tighter seed variance (±0.0009 vs ±0.0017), simplifying the recipe by one hyperparameter.

Self-supervised learning · JEPA · Geometric measure theory · Uniform rectifiability · World models

Paper: arXiv:2606.01443 · Code: github.com/SPATIOLYX/UR-JEPA · Status: under review at TMLR.
