Filtered out
Friday · May 29, 2026
Ultra-Reduced-Impact-Encased-Logging (URIEL): propose a new method for selective sustainable logging and post-harvest silvicultural treatment in tropical forest using airborne robotics systems
On the Geometry of Games and their Solvers
FHRFormer: A Self-Supervised Masked Transformer Framework for Fetal Heart Rate Time-Series Inpainting and Forecasting
EvoMD-LLM: Learning the Language of Species Evolution in Reactive Molecular Dynamics
Surfacing Isolated Learners with Outcome-Independent Mediation of Feedback between Teachers and Students Using AI
Orthogonal Concept Erasure for Diffusion Models
Differentiable Belief-based Opponent Shaping
Trends in AI and Human-AI Interaction in Clinical Trials -- A Hybrid Human-AI Exploration
OmniMatBench: A Human-Calibrated Multimodal Reasoning Benchmark Across 19 Materials Science Subfields
Quantifying and Optimizing Simplicity via Polynomial Representations
Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion
Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction
Practitioner Beliefs and Behaviors in AI-Enhanced Education: DOT Framework Survey Evidence
Bridging the Sim-to-Real Gap in Reinforcement Learning-Based Industrial Dispatching through Execution Semantics
OptSkills: Learning Generalizable Optimization Skills from Problem Archetypes via Cluster-Based Distillation
LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning
Improving Collaborative Storytelling with a Multi-Agent Framework Based on Large Language Models
BitTP: The Lightweight Trajectory Prediction Model with BitLLM for Edge-Devices
Certified Policy Optimisation for Nested Causal Bandits via PAC-Bayes Risk
MiraBench: Evaluating Action-Conditioned Reliability in Robotic World Models
Uncertainty-Aware Transfer Learning for Cross-Building Energy Forecasting: Toward Robust and Scalable District-Level Energy Management
Benchmarking Positional Encoding Strategies for Transformer-Based EEG Foundation Models
Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction
markdown-svg-renderer
From XXLTraffic to EvoXXLTraffic: Scaling Traffic Forecasting to Sensor-Evolving Networks
FinVerBench: Benchmark Validity and Calibration in Large Language Model Financial Statement Verification
Citation-Closure Retrieval and Per-Rule Attribution for Real-World Regulatory Compliance Question Answering
Tailoring the Curriculum: Student-Centered Reasoning Distillation via Dynamic Data-Model Compatibility
Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces
Paper Agents, Paper Gains: An Empirical Analysis of DeFi Investment Agents
Harmonizing Real-Time Constraints and Long-Horizon Reasoning: An Asynchronous Agentic Framework for Dynamic Scheduling
When and How Human Curation Backfires: Preference Alignment under Multi-Model Self-Consuming Loop
ConMoE: Expert-Pool Consolidation via Prototype Reassignment for MoE Compression
PassNet: Scaling Large Language Models for Graph Compiler Pass Generation
Battery-Sim-Agent: Leveraging LLM-Agent for Inverse Battery Parameter Estimation
Think Fast, Talk Smart: Partitioning Deterministic and Neural Computation for Structured Health Text Generation
PTCG-Bench: Can LLM Agents Master Pokémon Trading Card Game?
NaRA: Noise-Aware LoRA for Parameter-Efficient Fine-Tuning of Diffusion LLMs
MEMENTO: Leveraging Web as a Learning Signal for Low-Data Domains
PRAIB: Peer Review AI Benchmark of Behaviour of LLM-Assisted Reviewing
Harnessing non-adversarial robustness in large language models
🔬ESM: The Bitter Lesson is Coming for Proteins - Alex Rives, BioHub
Planning with the Views via Scene Self-Exploration
DenseSteer: Steering Small Language Models towards Dense Math Reasoning
Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes
UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents
Croissant Tasks: A Metadata Format for Reproducible Machine Learning Evaluations
The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models
Aligned but Fragile: Enhancing LLM Safety Robustness via Zeroth-Order Optimization
Xetrieval: Mechanistically Explaining Dense Retrieval
Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models
The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling
BEAMS: Benchmarking and Evaluating AI for Modeling and Simulation
MINDGAMES: A Live Arena for Evaluating Social and Strategic Reasoning in Multi-Agent LLMs
DeepSurvey: Enhancing Analytical Depth and Citation Reliability in Automated Survey Generation
The Chain Holds, the Answer Folds: Trace-Answer Dissociation in Reasoning Models Under Adversarial Pressure
HiKEY: Hierarchical Multimodal Retrieval for Open-Domain Document Question Answering
SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search
Review Arcade: On the Human Alignment and Gameability of LLM Reviews
VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis
TRACE: Toulmin-based Reasoning Assessment through Constructive Elements for LLM CoT Evaluation
Robust and Efficient Guardrails with Latent Reasoning
Rethinking Literature Search Evaluation: Deep Research Helps, and Human Citation Lists Are Not a Ground Truth
ReasonLight: A Multimodal Foundation Model-Enhanced Reinforcement Learning Framework for Zero-Shot Traffic Signal Control
Rubric-Guided Process Reward for Stepwise Model Routing
ReasonOps: Operator Segmentation for LLM Reasoning Traces
CoHyDE: Iterative Co-Training of LLM Rewriter & Dense Encoder for Tool Retrieval
Better Later Than Sooner: Neuro-Symbolic Knowledge Graph Construction via Ontology-grounded Post-extraction Correction
LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs
Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling
AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security
BenchTrace: A Benchmark for Testing Reflection Ability and Controlled Evolution in LLM Agents
SkillsInjector: Dynamic Skill Context Construction for LLM Agents
OpenClawBench: Benchmarking Process-side Anomalies in Real-world Agent Execution Trajectories
DeepTool: Scaling Interleaved Deliberation in Tool-Integrated Reasoning via Process-Supervised Reinforcement Learning
When Does Persona Prompting Actually Help? A Retrieval and Metric Analysis of Expert Role Injection in LLMs
Indexing the Unreadable: LLM-Native Recursive Construction and Search of Service Taxonomies
Architecture-Sensitive Supervised Fine-Tuning for Screen-Conditioned Action Prediction: A PiSAR Benchmark
PRO-CUA: Process-Reward Optimization for Computer Use Agents
Reliable Reasoning with Large Language Models via Preference-Based Maximum Satisfiability
GTA: Generating Long-Horizon Tasks for Web Agents at Scale
Adopt $\neq$ Adapt: Longitudinal Analyses of LLM Conversations in the Wild
Beyond Attack Success Rate: Temporal Logit Observability for LLM Safety Failures
Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems
The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane
Beyond Consensus: Trace-Level Synthesis in Mixture of Agents
NICE: A Theory-Grounded Diagnostic Benchmark for Social Intelligence of LLMs
GRASP: Gated Regression-Aware Skill Proposer for Self-Improving LLM Agents
The Curse of Helpfulness: Inverse Scaling Law in Robustness to Distractor Instructions via DistractionIF
Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching
Moment-KV: Momentum-Based Decode-Time KV Cache Compression for Long Generation
Redundant or Necessary? A Benchmark for Detecting Redundant Steps in Agent Trajectories
GPS-Enhanced Tourist Mobility Modeling with Seasonal Spatial Priors and LLM-Based Activity Chain Generation
Governing Technical Debt in Agentic AI Systems