AWS ML Blog
models
score 67.5
2026-03-23
In this post, we’re excited to showcase how AWS ISV Partner Artificial Genius is using Amazon SageMaker AI and Amazon Nova to deliver a solution that is probabilistic on input but deterministic on output, helping to ena…
arXiv cs.CL
models
score 118.9
2026-03-23
arXiv:2603.19269v1 Announce Type: new Abstract: Researchers face a critical choice: how to use -- or not use -- large language models in their work. Using them well requires understanding the mechanisms that shape what…
arXiv cs.CL
models
score 110.9
2026-03-23
arXiv:2603.19668v1 Announce Type: new Abstract: This paper presents a novel prompt engineering framework for trait-specific Automatic Essay Scoring (AES) in Arabic, leveraging large language models (LLMs) under zero-sho…
arXiv cs.AI
agents
score 108.9
2026-03-23
arXiv:2603.19262v1 Announce Type: cross Abstract: Large language models (LLMs) that iteratively revise their outputs through mechanisms such as chain-of-thought reasoning, self-reflection, or multi-agent debate lack pri…
arXiv cs.CL
multimodal
score 105.9
2026-03-23
arXiv:2511.17910v2 Announce Type: replace Abstract: Recently, Chain-of-Thought (CoT) reasoning has significantly enhanced the capabilities of large language models (LLMs), but Vision-Language Models (VLMs) still struggl…
arXiv cs.CL
models
score 99.9
2026-03-23
arXiv:2603.19741v1 Announce Type: cross Abstract: Aligning large language models (LLMs) with human preferences in federated learning (FL) is challenging due to decentralized, privacy-sensitive, and highly non-IID prefer…
arXiv cs.CL
models
score 97.9
2026-03-23
arXiv:2602.07451v3 Announce Type: replace Abstract: Diffusion large language models (DLLMs) have emerged as an alternative to autoregressive (AR) decoding with appealing efficiency and modeling properties, yet their imp…
arXiv cs.CL
models
score 96.9
2026-03-23
arXiv:2406.10985v2 Announce Type: replace Abstract: Large language models (LLMs) have shown promising efficacy across various tasks, becoming powerful tools in numerous aspects of human life. However, Transformer-based…
arXiv cs.LG
models
score 94.9
2026-03-23
arXiv:2511.17885v2 Announce Type: replace-cross Abstract: Multimodal large language models (MLLMs) have achieved impressive performance, but high-resolution visual inputs result in long sequences of visual tokens and su…
arXiv cs.CL
models
score 94.9
2026-03-23
arXiv:2603.19254v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used to generate financial research reports, shifting from auxiliary analytic tools to primary content producers. Yet recent…
arXiv cs.CL
models
score 94.9
2026-03-23
arXiv:2502.20795v4 Announce Type: replace Abstract: Aligning Large Language Models (LLMs) with human preferences through finetuning is resource-intensive, motivating lightweight alternatives at test time. We address tes…
arXiv cs.AI
models
score 91.9
2026-03-23
arXiv:2603.19584v1 Announce Type: new Abstract: Battery life remains a critical challenge for mobile devices, yet existing power management mechanisms rely on static rules or coarse-grained heuristics that ignore user a…
arXiv cs.LG
models
score 89.9
2026-03-23
arXiv:2603.19423v1 Announce Type: cross Abstract: Large language model (LLM) agents increasingly rely on external tools (file operations, API calls, database transactions) to autonomously complete complex multi-step tas…
arXiv cs.CL
agents
score 89.9
2026-03-23
arXiv:2603.20004v1 Announce Type: cross Abstract: Translating natural language to SQL (Text-to-SQL) is a critical challenge in both database research and data analytics applications. Recent efforts have focused on enhan…
arXiv cs.CL
models
score 89.9
2026-03-23
arXiv:2601.04716v2 Announce Type: replace Abstract: Advancements in Large Language Model (LLM) Role-Playing Agents have focused on various construction methodologies, yet it remains unclear which aspects of character pr…
arXiv cs.LG
models
score 88.9
2026-03-23
arXiv:2603.19258v1 Announce Type: cross Abstract: While differentially private (DP) fine-tuning of large language models (LLMs) is a powerful tool, it is often computationally prohibitive or infeasible when state-of-the…
arXiv cs.CL
models
score 88.9
2026-03-23
arXiv:2603.19251v1 Announce Type: new Abstract: Large Language Models (LLMs) perform well in short contexts but degrade on long legal documents, often producing hallucinations such as incorrect clauses or precedents. In…
arXiv cs.LG
models
score 86.9
2026-03-23
arXiv:2603.19289v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models have gained popularity as a means of scaling the capacity of large language models (LLMs) while maintaining sparse activations and reduced…
arXiv cs.LG
models
score 86.9
2026-03-23
arXiv:2603.19742v1 Announce Type: new Abstract: Understanding the internal mechanisms of transformer-based large language models (LLMs) is crucial for their reliable deployment and effective operation. While recent effo…
arXiv cs.LG
models
score 86.9
2026-03-23
arXiv:2511.09833v2 Announce Type: replace Abstract: Supervised learning relies on high-quality labeled data, but obtaining such data through human annotation is both expensive and time-consuming. Recent work explores us…
arXiv cs.LG
models
score 86.9
2026-03-23
arXiv:2601.18734v3 Announce Type: replace Abstract: Knowledge distillation improves large language model (LLM) reasoning by compressing the knowledge of a teacher LLM to train smaller LLMs. On-policy distillation advanc…
arXiv cs.AI
agents
score 86.9
2026-03-23
arXiv:2603.19515v1 Announce Type: new Abstract: Large language models (LLMs) with advanced cognitive capabilities are emerging as agents for various reasoning and planning tasks. Traditional evaluations often focus on s…
arXiv cs.AI
models
score 86.9
2026-03-23
arXiv:2603.19268v1 Announce Type: cross Abstract: Large language models (LLMs) show significant application potential when adapted and enhanced for professional fields. Nevert…
arXiv cs.AI
models
score 86.9
2026-03-23
arXiv:2603.19282v1 Announce Type: cross Abstract: In many real-world applications, large language models (LLMs) operate as independent agents without interaction, thereby limiting coordination. In this setting, we exami…
arXiv cs.AI
models
score 86.9
2026-03-23
arXiv:2603.18377v2 Announce Type: replace-cross Abstract: Cloud-hosted large language models (LLMs) have become the de facto planners in agentic systems, coordinating tools and guiding execution over local environments.…
arXiv cs.CL
models
score 86.9
2026-03-23
arXiv:2603.19688v1 Announce Type: new Abstract: Conventional wisdom for selecting supervision data for multimodal large language models (MLLMs) is to prioritize datasets that appear similar to the target benchmark, such…
arXiv cs.CL
models
score 86.9
2026-03-23
arXiv:2603.19744v1 Announce Type: new Abstract: Human Label Variation (HLV), i.e. systematic differences among annotators' judgments, remains underexplored in benchmarks despite rapid progress in large language model (L…
arXiv cs.CL
models
score 86.9
2026-03-23
arXiv:2603.19931v1 Announce Type: new Abstract: The vision of an inclusive World Wide Web is impeded by a severe linguistic divide, particularly for communities in low-resource regions of Southeast Asia. While large lan…
arXiv cs.CL
agents
score 86.9
2026-03-23
arXiv:2603.20017v1 Announce Type: new Abstract: Knowledge graph question answering (KGQA) is a promising approach for mitigating LLM hallucination by grounding reasoning in structured and verifiable knowledge graphs. Ex…
arXiv cs.AI
models
score 84.9
2026-03-23
arXiv:2603.19329v1 Announce Type: cross Abstract: Large language models (LLMs) can generate plausible code but offer limited guarantees of correctness. Formally verifying that implementations satisfy specifications requ…