AI Frontier Feed

AWS ML Blog models score 67.5 2026-03-23

Overcoming LLM hallucinations in regulated industries: Artificial Genius’s deterministic models on Amazon Nova

In this post, we’re excited to showcase how AWS ISV Partner Artificial Genius is using Amazon SageMaker AI and Amazon Nova to deliver a solution that is probabilistic on input but deterministic on output, helping to ena…

arXiv cs.CL models score 118.9 2026-03-23

From Tokens To Agents: A Researcher's Guide To Understanding Large Language Models

arXiv:2603.19269v1 Announce Type: new Abstract: Researchers face a critical choice: how to use -- or not use -- large language models in their work. Using them well requires understanding the mechanisms that shape what…

arXiv cs.CL models score 110.9 2026-03-23

Structured Prompting for Arabic Essay Proficiency: A Trait-Centric Evaluation Approach

arXiv:2603.19668v1 Announce Type: new Abstract: This paper presents a novel prompt engineering framework for trait-specific Automatic Essay Scoring (AES) in Arabic, leveraging large language models (LLMs) under zero-sho…

arXiv cs.AI agents score 108.9 2026-03-23

The α-Law of Observable Belief Revision in Large Language Model Inference

arXiv:2603.19262v1 Announce Type: cross Abstract: Large language models (LLMs) that iteratively revise their outputs through mechanisms such as chain-of-thought reasoning, self-reflection, or multi-agent debate lack pri…

arXiv cs.CL multimodal score 105.9 2026-03-23

L2V-CoT: Cross-Modal Transfer of Chain-of-Thought Reasoning via Latent Intervention

arXiv:2511.17910v2 Announce Type: replace Abstract: Recently, Chain-of-Thought (CoT) reasoning has significantly enhanced the capabilities of large language models (LLMs), but Vision-Language Models (VLMs) still struggl…

arXiv cs.CL models score 99.9 2026-03-23

FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment

arXiv:2603.19741v1 Announce Type: cross Abstract: Aligning large language models (LLMs) with human preferences in federated learning (FL) is challenging due to decentralized, privacy-sensitive, and highly non-IID prefer…

arXiv cs.CL models score 97.9 2026-03-23

DLLM Agent: See Farther, Run Faster

arXiv:2602.07451v3 Announce Type: replace Abstract: Diffusion large language models (DLLMs) have emerged as an alternative to autoregressive (AR) decoding with appealing efficiency and modeling properties, yet their imp…

arXiv cs.CL models score 96.9 2026-03-23

Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens

arXiv:2406.10985v2 Announce Type: replace Abstract: Large language models (LLMs) have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based…

arXiv cs.LG models score 94.9 2026-03-23

FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning

arXiv:2511.17885v2 Announce Type: replace-cross Abstract: Multimodal large language models (MLLMs) have achieved impressive performance, but high-resolution visual inputs result in long sequences of visual tokens and su…

arXiv cs.CL models score 94.9 2026-03-23

From Comprehension to Reasoning: A Hierarchical Benchmark for Automated Financial Research Reporting

arXiv:2603.19254v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used to generate financial research reports, shifting from auxiliary analytic tools to primary content producers. Yet recent…

arXiv cs.CL models score 94.9 2026-03-23

Test-Time Alignment for Large Language Models via Textual Model Predictive Control

arXiv:2502.20795v4 Announce Type: replace Abstract: Aligning Large Language Models (LLMs) with human preferences through finetuning is resource-intensive, motivating lightweight alternatives at test time. We address tes…

arXiv cs.AI models score 91.9 2026-03-23

PowerLens: Taming LLM Agents for Safe and Personalized Mobile Power Management

arXiv:2603.19584v1 Announce Type: new Abstract: Battery life remains a critical challenge for mobile devices, yet existing power management mechanisms rely on static rules or coarse-grained heuristics that ignore user a…

arXiv cs.LG models score 89.9 2026-03-23

The Autonomy Tax: Defense Training Breaks LLM Agents

arXiv:2603.19423v1 Announce Type: cross Abstract: Large language model (LLM) agents increasingly rely on external tools (file operations, API calls, database transactions) to autonomously complete complex multi-step tas…

arXiv cs.CL agents score 89.9 2026-03-23

ReViSQL: Achieving Human-Level Text-to-SQL

arXiv:2603.20004v1 Announce Type: cross Abstract: Translating natural language to SQL (Text-to-SQL) is a critical challenge in both database research and data analytics applications. Recent efforts have focused on enhan…

arXiv cs.CL models score 89.9 2026-03-23

Identifying and Mitigating Bottlenecks in Role-Playing Agents: A Systematic Study of Disentangling Character Profile Axes

arXiv:2601.04716v2 Announce Type: replace Abstract: Advancements in Large Language Model (LLM) Role-Playing Agents have focused on various construction methodologies, yet it remains unclear which aspects of character pr…

arXiv cs.LG models score 88.9 2026-03-23

MAPLE: Metadata Augmented Private Language Evolution

arXiv:2603.19258v1 Announce Type: cross Abstract: While differentially private (DP) fine-tuning of large language models (LLMs) is a powerful tool, it is often computationally prohibitive or infeasible when state-of-the…

arXiv cs.CL models score 88.9 2026-03-23

Enhancing Legal LLMs through Metadata-Enriched RAG Pipelines and Direct Preference Optimization

arXiv:2603.19251v1 Announce Type: new Abstract: Large Language Models (LLMs) perform well in short contexts but degrade on long legal documents, often producing hallucinations such as incorrect clauses or precedents. In…

arXiv cs.LG models score 86.9 2026-03-23

Speculating Experts Accelerates Inference for Mixture-of-Experts

arXiv:2603.19289v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models have gained popularity as a means of scaling the capacity of large language models (LLMs) while maintaining sparse activations and reduced…

arXiv cs.LG models score 86.9 2026-03-23

Dual Path Attribution: Efficient Attribution for SwiGLU-Transformers through Layer-Wise Target Propagation

arXiv:2603.19742v1 Announce Type: new Abstract: Understanding the internal mechanisms of transformer-based large language models (LLMs) is crucial for their reliable deployment and effective operation. While recent effo…

arXiv cs.LG models score 86.9 2026-03-23

ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking

arXiv:2511.09833v2 Announce Type: replace Abstract: Supervised learning relies on high-quality labeled data, but obtaining such data through human annotation is both expensive and time-consuming. Recent work explores us…

arXiv cs.LG models score 86.9 2026-03-23

Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models

arXiv:2601.18734v3 Announce Type: replace Abstract: Knowledge distillation improves large language model (LLM) reasoning by compressing the knowledge of a teacher LLM to train smaller LLMs. On-policy distillation advanc…

arXiv cs.AI agents score 86.9 2026-03-23

ItinBench: Benchmarking Planning Across Multiple Cognitive Dimensions with Large Language Models

arXiv:2603.19515v1 Announce Type: new Abstract: Large language models (LLMs) with advanced cognitive capabilities are emerging as agents for various reasoning and planning tasks. Traditional evaluations often focus on s…

arXiv cs.AI models score 86.9 2026-03-23

Full-Stack Domain Enhancement for Combustion LLMs: Construction and Optimization

arXiv:2603.19268v1 Announce Type: cross Abstract: Large language models (LLMs) show significant application potential when adapted to, and enhanced for, professional fields. Nevert…

arXiv cs.AI models score 86.9 2026-03-23

Framing Effects in Independent-Agent Large Language Models: A Cross-Family Behavioral Analysis

arXiv:2603.19282v1 Announce Type: cross Abstract: In many real-world applications, large language models (LLMs) operate as independent agents without interaction, thereby limiting coordination. In this setting, we exami…

arXiv cs.AI models score 86.9 2026-03-23

PlanTwin: Privacy-Preserving Planning Abstractions for Cloud-Assisted LLM Agents

arXiv:2603.18377v2 Announce Type: replace-cross Abstract: Cloud-hosted large language models (LLMs) have become the de facto planners in agentic systems, coordinating tools and guiding execution over local environments.…

arXiv cs.CL models score 86.9 2026-03-23

DataProphet: Demystifying Supervision Data Generalization in Multimodal LLMs

arXiv:2603.19688v1 Announce Type: new Abstract: Conventional wisdom for selecting supervision data for multimodal large language models (MLLMs) is to prioritize datasets that appear similar to the target benchmark, such…

arXiv cs.CL models score 86.9 2026-03-23

Rethinking Ground Truth: A Case Study on Human Label Variation in MLLM Benchmarking

arXiv:2603.19744v1 Announce Type: new Abstract: Human Label Variation (HLV), i.e., systematic differences among annotators' judgments, remains underexplored in benchmarks despite rapid progress in large language model (L…

arXiv cs.CL models score 86.9 2026-03-23

SAGE: Sustainable Agent-Guided Expert-tuning for Culturally Attuned Translation in Low-Resource Southeast Asia

arXiv:2603.19931v1 Announce Type: new Abstract: The vision of an inclusive World Wide Web is impeded by a severe linguistic divide, particularly for communities in low-resource regions of Southeast Asia. While large lan…

arXiv cs.CL agents score 86.9 2026-03-23

RouterKGQA: Specialized–General Model Routing for Constraint-Aware Knowledge Graph Question Answering

arXiv:2603.20017v1 Announce Type: new Abstract: Knowledge graph question answering (KGQA) is a promising approach for mitigating LLM hallucination by grounding reasoning in structured and verifiable knowledge graphs. Ex…

arXiv cs.AI models score 84.9 2026-03-23

Goedel-Code-Prover: Hierarchical Proof Search for Open State-of-the-Art Code Verification

arXiv:2603.19329v1 Announce Type: cross Abstract: Large language models (LLMs) can generate plausible code but offer limited guarantees of correctness. Formally verifying that implementations satisfy specifications requ…
