To see other types of publications on this topic, follow this link: Large Language Models (LLM).

Journal articles on the topic "Large Language Models (LLM)"

Create a correct reference in APA, MLA, Chicago, Harvard, and several other citation styles

Choose a source type:

Consult the top 50 journal articles for your research on the topic "Large Language Models (LLM)."

Next to every source in the list of references there is an "Add to bibliography" button. Click on it, and we will automatically generate the bibliographic reference for the selected work in your preferred citation style: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the scholarly publication as a PDF and read its abstract online whenever this information is included in the metadata.

Browse journal articles from a wide range of disciplines and organize your bibliography correctly.

1

Fang, Meng, Shilong Deng, Yudi Zhang, Zijing Shi, Ling Chen, Mykola Pechenizkiy, and Jun Wang. "Large Language Models Are Neurosymbolic Reasoners." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 16 (March 24, 2024): 17985–93. http://dx.doi.org/10.1609/aaai.v38i16.29754.

Abstract:
A wide range of real-world applications is characterized by their symbolic nature, necessitating a strong capability for symbolic reasoning. This paper investigates the potential application of Large Language Models (LLMs) as symbolic reasoners. We focus on text-based games, significant benchmarks for agents with natural language capabilities, particularly in symbolic tasks like math, map reading, sorting, and applying common sense in text-based worlds. To facilitate these agents, we propose an LLM agent designed to tackle symbolic challenges and achieve in-game objectives. We begin by initializing the LLM agent and informing it of its role. The agent then receives observations and a set of valid actions from the text-based games, along with a specific symbolic module. With these inputs, the LLM agent chooses an action and interacts with the game environments. Our experimental results demonstrate that our method significantly enhances the capability of LLMs as automated agents for symbolic reasoning, and our LLM agent is effective in text-based games involving symbolic tasks, achieving an average performance of 88% across all tasks.
2

Wang, Runze, Mingqi Yang, and Yanming Shen. "Bridging Molecular Graphs and Large Language Models." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 20 (April 11, 2025): 21234–42. https://doi.org/10.1609/aaai.v39i20.35422.

Abstract:
While Large Language Models (LLMs) have shown exceptional generalization capabilities, their ability to process graph data, such as molecular structures, remains limited. To bridge this gap, this paper proposes Graph2Token, an efficient solution that aligns graph tokens to LLM tokens. The key idea is to represent a graph token with the LLM token vocabulary, without fine-tuning the LLM backbone. To achieve this goal, we first construct a molecule-text paired dataset from multi-sources, including CHEBI and HMDB, to train a graph structure encoder, which reduces the distance between graphs and texts representations in the feature space. Then, we propose a novel alignment strategy that associates a graph token with LLM tokens. To further unleash the potential of LLMs, we collect molecular IUPAC name identifiers, which are incorporated into the LLM prompts. By aligning molecular graphs as special tokens, we can activate LLMs' generalization ability to molecular few-shot learning. Extensive experiments on molecular classification and regression tasks demonstrate the effectiveness of our proposed Graph2Token.
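The central alignment idea in this abstract, expressing a graph token in terms of the frozen LLM vocabulary, can be illustrated with a toy sketch. The following is a minimal, hypothetical illustration of that idea, not the authors' Graph2Token code: the encoder output, vocabulary size, and dimensions are all made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: an LLM token embedding table and a graph encoder output.
vocab_size, dim = 1000, 64
token_embeddings = rng.normal(size=(vocab_size, dim))   # frozen LLM vocabulary
graph_embedding = rng.normal(size=(dim,))                # output of a trained graph encoder

def graph_to_token(graph_emb, token_emb, temperature=1.0):
    """Express a graph embedding as a soft mixture of existing LLM token embeddings.

    The LLM backbone stays frozen; only the mixture weights depend on the graph.
    """
    logits = token_emb @ graph_emb / temperature          # similarity to every vocab entry
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                              # softmax over the vocabulary
    return weights @ token_emb                            # aligned "graph token"

aligned = graph_to_token(graph_embedding, token_embeddings)
print(aligned.shape)   # (64,) -> could be inserted into the LLM prompt as a special token
```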
3

Mochihashi, Daichi. "Large Language Models (LLM) and Robotics." Journal of the Robotics Society of Japan 40, no. 10 (2022): 863–66. http://dx.doi.org/10.7210/jrsj.40.863.
4

Devyatkin, Dmitry A., Vladimir A. Salimovsky, Natalia V. Chudova, Anastasia A. Ryzhova, and Oleg G. Grigoriev. "Large language models and speech genre systematicity." International Journal “Speech Genres” 20, no. 1 (45) (February 21, 2025): 6–23. https://doi.org/10.18500/2311-0740-2025-20-1-45-6-23.

Abstract:
The paper examines a large language model (LLM) applied to speech genre recognition. Although artificial neural networks are used effectively in many important fields, they have a serious drawback: the mechanism of their functioning is hidden from researchers, so the results they produce remain unexplained. The purpose of the study is to reveal the basic mechanisms by which a Transformer-based LLM operates and thereby make the data it provides interpretable. The research is based on two genres of academic text: “Description of a new scientific phenomenon” and “Explication of a scientific concept.” We tested the hypothesis that the LLM feature set is based on the speech systematicity of the recognized genres. It is also shown that, since genre-speech systematicity is determined by extralinguistic factors, primarily the characteristics of human consciousness, its manifestations reflected in the hidden state of the LLM can be used to model cognitive processes embodied in speech. We also analyze existing approaches to the interpretation of LLMs and describe the method applied here. The paper offers the following linguistic interpretation of LLM training and fine-tuning: pre-training on large text corpora allows a model to represent language resources (a system of linguistic units and general principles of their use) relatively completely, while fine-tuning on samples of a certain genre-speech organization restructures linguistic systematicity into speech systematicity. During the experiments we decoded the hidden state of the LLM and accurately reproduced the composition and frequency of lexis in the training dataset. The LLM's classification score for each of the considered genres is an F1 of 0.99, which we attribute to their speech consistency.
5

Yang, Jidong. "Large language models privacy and security." Applied and Computational Engineering 76, no. 1 (July 16, 2024): 177–88. http://dx.doi.org/10.54254/2755-2721/76/20240584.

Abstract:
The advancement of large language models (LLMs) has yielded significant advancements across various domains. Nevertheless, this progress has also raised crucial concerns regarding privacy and security. The paper does a comprehensive literature study to thoroughly examine the fundamental principles of LLM. It also provides a detailed examination of the characteristics and application fields of various LLMs, with a particular focus on Transformer. Furthermore, this study places emphasis on the examination of privacy concerns that may emerge in the context of LLM's handling of personal and sensitive data. It also explores the potential hazards associated with information leakage and misuse, as well as the existing privacy safeguards and the obstacles encountered in their implementation. Overall, LLM has made significant advancements in technology. However, it is imperative to acknowledge the importance of doing research on safeguarding privacy and enhancing security. These aspects are vital for guaranteeing the sustained development and public confidence in LLM technology.
6

Shanahan, Murray. "Talking about Large Language Models." Communications of the ACM 67, no. 2 (January 25, 2024): 68–79. http://dx.doi.org/10.1145/3624724.

Abstract:
Interacting with a contemporary LLM-based conversational agent can create an illusion of being in the presence of a thinking creature. Yet, in their very nature, such systems are fundamentally not like us.
7

Liu, Yuxin. "Attention is All Large Language Model Need." ITM Web of Conferences 73 (2025): 02025. https://doi.org/10.1051/itmconf/20257302025.

Abstract:
With the advent of the Transformer, the attention mechanism has been applied to Large Language Models (LLM), evolving from initial single-modal large models to today's multi-modal large models. This has greatly propelled the development of Artificial Intelligence (AI) and ushered humans into the era of large models. Single-modal large models can be broadly categorized into three types based on their application domains: Text LLM for Natural Language Processing (NLP), Image LLM for Computer Vision (CV), and Audio LLM for speech interaction. Multi-modal large models, on the other hand, can leverage multiple data sources simultaneously to optimize the model. This article also introduces the training process of the GPT series. Large models have also had a significant impact on industry and society, bringing with them a number of unresolved problems. The purpose of this article is to assist researchers in comprehending the various forms of LLM, as well as its development, pre-training architecture, difficulties, and future objectives.
8

Ma, Ziyang, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, et al. "Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 23 (April 11, 2025): 24840–48. https://doi.org/10.1609/aaai.v39i23.34666.

Abstract:
In this paper, we focus on prompting one of the most important tasks in the field of speech processing, i.e., automatic speech recognition (ASR), with speech foundation encoders and large language models (LLM). Despite the growing body of research in this area, we find that many crucial design decisions in LLM-based ASR systems are often inadequately justified. This lack of clarity impedes the field's progress, making it challenging to pinpoint which design choices truly improve model performance. To address these challenges, we conduct a comprehensive series of experiments that explore various aspects, leading to the optimal LLM-based ASR system. We found that delicate designs are not necessary, while a clean setup with little task-specific design is competent. The models achieve strong performance on the Librispeech and Gigaspeech datasets, compared to both LLM-based models and non-LLM-based models. Finally, we explore the capability emergence of LLM-based ASR in the process of modal alignment. We hope that our study can facilitate the research on extending LLM with cross-modality capacity and shed light on the LLM-based ASR community.
9

Zelenkov, Yuri A. "Knowledge management in organization and the large language models." Russian Management Journal 22, no. 3 (2024): 573–601. https://doi.org/10.21638/spbu18.2024.309.

Abstract:
Purpose: to summarize, classify and analyze current academic papers on the use of large language models (LLM) in knowledge management in organization. Methodology: systematic literature review was conducted. It was based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework. 75 papers were selected for the analysis, including academic papers and reports of consulting companies published since 2020. Findings: four main research areas have been identified: (1) LLM implementation issues; (2) the impact of LLM on knowledge management efficiency; the application of LLM in the processes of (3) knowledge usage and (4) knowledge creation. Within each area, the key papers and open questions have been reviewed. Originality and contribution: the paper presents a systematic review of current publications, proposes a classification of research topics, and identifies potential directions for new research. The study also considers limitations hindering the implementation of LLM in the organization's knowledge management practice.
10

Martínez, Gonzalo, Javier Conde, Elena Merino-Gómez, Beatriz Bermúdez-Margaretto, José Alberto Hernández, Pedro Reviriego, and Marc Brysbaert. "Establishing vocabulary tests as a benchmark for evaluating large language models." PLOS ONE 19, no. 12 (December 12, 2024): e0308259. https://doi.org/10.1371/journal.pone.0308259.

Abstract:
Vocabulary tests, once a cornerstone of language modeling evaluation, have been largely overlooked in the current landscape of Large Language Models (LLMs) like Llama 2, Mistral, and GPT. While most LLM evaluation benchmarks focus on specific tasks or domain-specific knowledge, they often neglect the fundamental linguistic aspects of language understanding. In this paper, we advocate for the revival of vocabulary tests as a valuable tool for assessing LLM performance. We evaluate seven LLMs using two vocabulary test formats across two languages and uncover surprising gaps in their lexical knowledge. These findings shed light on the intricacies of LLM word representations, their learning mechanisms, and performance variations across models and languages. Moreover, the ability to automatically generate and perform vocabulary tests offers new opportunities to expand the approach and provide a more complete picture of LLMs’ language skills.
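A vocabulary test of the kind described here is straightforward to generate and score automatically. The sketch below is a hypothetical illustration, not the authors' benchmark: `ask_llm` stands in for whatever interface the evaluated model exposes, and the two items are invented.

```python
import random

# Each item: a target word, one correct definition, and distractors.
ITEMS = [
    ("ephemeral", "lasting for a very short time",
     ["related to horses", "extremely heavy", "full of joy"]),
    ("garrulous", "excessively talkative",
     ["easily frightened", "brightly coloured", "very generous"]),
]

def ask_llm(prompt: str) -> str:
    """Stand-in for a real model call; here it always picks option A."""
    return "A"

def score_vocabulary_test(items, seed=0):
    rng = random.Random(seed)
    correct = 0
    for word, answer, distractors in items:
        options = distractors + [answer]
        rng.shuffle(options)
        letters = "ABCD"[: len(options)]
        prompt = (f"Which option best defines '{word}'?\n"
                  + "\n".join(f"{l}. {o}" for l, o in zip(letters, options))
                  + "\nAnswer with a single letter.")
        reply = ask_llm(prompt).strip().upper()[:1]
        if reply in letters and options[letters.index(reply)] == answer:
            correct += 1
    return correct / len(items)

print(f"accuracy: {score_vocabulary_test(ITEMS):.2f}")
```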
11

Blinn, Andrew, Xiang Li, June Hyung Kim, and Cyrus Omar. "Statically Contextualizing Large Language Models with Typed Holes." Proceedings of the ACM on Programming Languages 8, OOPSLA2 (October 8, 2024): 468–98. http://dx.doi.org/10.1145/3689728.

Abstract:
Large language models (LLMs) have reshaped the landscape of program synthesis. However, contemporary LLM-based code completion systems often hallucinate broken code because they lack appropriate code context, particularly when working with definitions that are neither in the training data nor near the cursor. This paper demonstrates that tighter integration with the type and binding structure of the programming language in use, as exposed by its language server, can help address this contextualization problem in a token-efficient manner. In short, we contend that AIs need IDEs, too! In particular, we integrate LLM code generation into the Hazel live program sketching environment. The Hazel Language Server is able to identify the type and typing context of the hole that the programmer is filling, with Hazel's total syntax and type error correction ensuring that a meaningful program sketch is available whenever the developer requests a completion. This allows the system to prompt the LLM with codebase-wide contextual information that is not lexically local to the cursor, nor necessarily in the same file, but that is likely to be semantically local to the developer's goal. Completions synthesized by the LLM are then iteratively refined via further dialog with the language server, which provides error localization and error messages. To evaluate these techniques, we introduce MVUBench, a dataset of model-view-update (MVU) web applications with accompanying unit tests that have been written from scratch to avoid data contamination, and that can easily be ported to new languages because they do not have large external library dependencies. These applications serve as challenge problems due to their extensive reliance on application-specific data structures. Through an ablation study, we examine the impact of contextualization with type definitions, function headers, and errors messages, individually and in combination. We find that contextualization with type definitions is particularly impactful. After introducing our ideas in the context of Hazel, a low-resource language, we duplicate our techniques and port MVUBench to TypeScript in order to validate the applicability of these methods to higher-resource mainstream languages. Finally, we outline ChatLSP, a conservative extension to the Language Server Protocol (LSP) that language servers can implement to expose capabilities that AI code completion systems of various designs can use to incorporate static context when generating prompts for an LLM.
12

Shi, Zhouxing, Yihan Wang, Fan Yin, Xiangning Chen, Kai-Wei Chang, and Cho-Jui Hsieh. "Red Teaming Language Model Detectors with Language Models." Transactions of the Association for Computational Linguistics 12 (2024): 174–89. http://dx.doi.org/10.1162/tacl_a_00639.

Abstract:
Abstract The prevalence and strong capability of large language models (LLMs) present significant safety and ethical risks if exploited by malicious users. To prevent the potentially deceptive usage of LLMs, recent work has proposed algorithms to detect LLM-generated text and protect LLMs. In this paper, we investigate the robustness and reliability of these LLM detectors under adversarial attacks. We study two types of attack strategies: 1) replacing certain words in an LLM’s output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation. In both strategies, we leverage an auxiliary LLM to generate the word replacements or the instructional prompt. Different from previous works, we consider a challenging setting where the auxiliary LLM can also be protected by a detector. Experiments reveal that our attacks effectively compromise the performance of all detectors in the study with plausible generations, underscoring the urgent need to improve the robustness of LLM-generated text detection systems. Code is available at https://github.com/shizhouxing/LLM-Detector-Robustness.
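The first attack strategy, context-preserving synonym substitution, can be sketched schematically. In the sketch below the detector, the synonym table, and the acceptance rule are toy placeholders; the paper instead uses an auxiliary LLM to propose replacements.

```python
# Hypothetical detector: returns the probability that a text is LLM-generated.
def detector_score(text: str) -> float:
    return 0.9 if "delve" in text else 0.4   # toy heuristic, not a real detector

# Toy synonym table; the paper obtains candidates from an auxiliary LLM instead.
SYNONYMS = {"delve": ["dig", "look"], "utilize": ["use"], "pivotal": ["key"]}

def synonym_attack(text: str, threshold: float = 0.5) -> str:
    """Greedily replace words with synonyms while the detector score stays high."""
    words = text.split()
    for i, word in enumerate(words):
        if detector_score(" ".join(words)) < threshold:
            break                              # detector already evaded
        for candidate in SYNONYMS.get(word.lower(), []):
            trial = words[:i] + [candidate] + words[i + 1:]
            if detector_score(" ".join(trial)) < detector_score(" ".join(words)):
                words = trial                  # keep the replacement that lowers the score
                break
    return " ".join(words)

print(synonym_attack("we delve into the pivotal results"))
```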
13

Smetana, Mason, Lucio Salles de Salles, Igor Sukharev, and Lev Khazanovich. "Highway Construction Safety Analysis Using Large Language Models." Applied Sciences 14, no. 4 (February 6, 2024): 1352. http://dx.doi.org/10.3390/app14041352.

Abstract:
The highway construction industry carries substantial safety risks for workers, necessitating thorough accident analyses to implement effective preventive measures. Current research lacks comprehensive investigations into safety incidents, relying heavily on conventional statistical methods and overlooking valuable textual information in publicly available databases. This study leverages a state-of-the-art large language model (LLM), specifically OpenAI’s GPT-3.5 model. The primary focus is to enhance text-based incident analysis that is sourced from OSHA’s Severe Injury Reports (SIR) database. By incorporating novel natural language processing (NLP) techniques, dimensionality reduction, clustering algorithms, and LLM prompting of incident narratives, the study aims to develop an approach to the analysis of major accident causes in highway construction. The resulting cluster analysis, coupled with LLM summarization and cause identification, reveals the major accident types, such as heat-related and struck-by injuries, as well as commonalities between incidents. This research showcases the potential of artificial intelligence (AI) and LLM technology in data-driven analysis. By efficiently processing textual data and providing insightful analysis, the study fosters practical implications for safety professionals and the development of more effective accident prevention and intervention strategies within the industry.
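The analysis pipeline described, embedding incident narratives, reducing dimensionality, clustering, and then prompting an LLM per cluster, can be outlined in a few lines. This is a compressed, hypothetical sketch with invented narratives; the final LLM summarization step is only indicated by a comment.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

# Invented stand-ins for OSHA-style incident narratives.
narratives = [
    "worker suffered heat exhaustion while paving in high temperatures",
    "flagger struck by passing vehicle in the work zone",
    "laborer collapsed from heat stress during asphalt operations",
    "employee struck by reversing dump truck near the barrier",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(narratives)
reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(vectors)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

for cluster_id in set(labels):
    members = [n for n, l in zip(narratives, labels) if l == cluster_id]
    # In the study, each cluster would now be summarized and its likely cause
    # identified by prompting an LLM; here we just print the grouped narratives.
    print(cluster_id, members)
```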
14

Zheng, Xiang, Longxiang Wang, Yi Liu, Xingjun Ma, Chao Shen, and Cong Wang. "CALM: Curiosity-Driven Auditing for Large Language Models." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 26 (April 11, 2025): 27757–64. https://doi.org/10.1609/aaai.v39i26.34991.

Abstract:
Auditing Large Language Models (LLMs) is a crucial and challenging task. In this study, we focus on auditing black-box LLMs without access to their parameters, only to the provided service. We treat this type of auditing as a black-box optimization problem where the goal is to automatically uncover input-output pairs of the target LLMs that exhibit illegal, immoral, or unsafe behaviors. For instance, we may seek a non-toxic input that the target LLM responds to with a toxic output or an input that induces the hallucinative response from the target LLM containing politically sensitive individuals. This black-box optimization is challenging due to the scarcity of feasible points, the discrete nature of the prompt space, and the large search space. To address these challenges, we propose Curiosity-Driven Auditing for Large Language Models (CALM), which uses intrinsically motivated reinforcement learning to finetune an LLM as the auditor agent to uncover potential harmful and biased input-output pairs of the target LLM. CALM successfully identifies derogatory completions involving celebrities and uncovers inputs that elicit specific names under the black-box setting. This work offers a promising direction for auditing black-box LLMs.
15

Ashikhmin, E. G., V. V. Levchenko, and G. I. Seletkova. "Experience in applying large language models to analyse quantitative sociological data." Vestnik Universiteta, no. 11 (January 2, 2025): 205–15. https://doi.org/10.26425/1816-4277-2024-11-205-215.

Abstract:
The article discusses the possibilities and limitations of using large language models (hereinafter referred to as LLM) to analyse quantitative data in sociological research. Also, attention is paid to the actor-network theory, according to which neural networks act as active participants of social interaction. It is noted that the usage of the LLM can be considered as an innovative process in the field of applied sociological research. The article demonstrates examples of the LLM application for quantitative methods of analysis on the basis of a survey dataset taken from open sources. Practical examples show how the LLM can be used to construct frequency and summary tables, calculate averages and conduct correlation analysis. The application of the LLM is seen as an innovative process that promotes the development of new methodological approaches. The authors analyse examples of the LLM usage in sociology and emphasise the need to build an innovative culture and develop methodological approaches to verify and correct the results. In addition, the authors highlight the importance of interpreting the LLM results in the context of sociological theory and practice. The article also discusses the role of the LLM in empowering the sociological research, especially in the areas of analysing big data and discovering hidden patterns. Finally, the authors suggest paths for future research in the application of the LLM in sociology, including the development of new methods and tools for integrating the LLM into the sociological research.
16

Li, Qinbin, Junyuan Hong, Chulin Xie, Jeffrey Tan, Rachel Xin, Junyi Hou, Xavier Yin, et al. "LLM-PBE: Assessing Data Privacy in Large Language Models." Proceedings of the VLDB Endowment 17, no. 11 (July 2024): 3201–14. http://dx.doi.org/10.14778/3681954.3681994.

Abstract:
Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis. Their profound capabilities in processing and interpreting complex language data, however, bring to light pressing concerns regarding data privacy, especially the risk of unintentional training data leakage. Despite the critical nature of this issue, there has been no existing literature to offer a comprehensive assessment of data privacy risks in LLMs. Addressing this gap, our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs. LLM-PBE is designed to analyze privacy across the entire lifecycle of LLMs, incorporating diverse attack and defense strategies, and handling various data types and metrics. Through detailed experimentation with multiple LLMs, LLM-PBE facilitates an in-depth exploration of data privacy concerns, shedding light on influential factors such as model size, data characteristics, and evolving temporal dimensions. This study not only enriches the understanding of privacy issues in LLMs but also serves as a vital resource for future research in the field. Aimed at enhancing the breadth of knowledge in this area, the findings, resources, and our full technical report are made available at https://llm-pbe.github.io/, providing an open platform for academic and practical advancements in LLM privacy assessment.
17

Schubert, Marc Cicero, Wolfgang Wick, and Varun Venkataramani. "Performance of Large Language Models on a Neurology Board–Style Examination." JAMA Network Open 6, no. 12 (December 7, 2023): e2346721. http://dx.doi.org/10.1001/jamanetworkopen.2023.46721.

Abstract:
Importance: Recent advancements in large language models (LLMs) have shown potential in a wide array of applications, including health care. While LLMs showed heterogeneous results across specialized medical board examinations, the performance of these models in neurology board examinations remains unexplored. Objective: To assess the performance of LLMs on neurology board–style examinations. Design, Setting, and Participants: This cross-sectional study was conducted between May 17 and May 31, 2023. The evaluation utilized a question bank approved by the American Board of Psychiatry and Neurology and was validated with a small question cohort by the European Board for Neurology. All questions were categorized into lower-order (recall, understanding) and higher-order (apply, analyze, synthesize) questions based on the Bloom taxonomy for learning and assessment. Performance by LLM ChatGPT versions 3.5 (LLM 1) and 4 (LLM 2) was assessed in relation to overall scores, question type, and topics, along with the confidence level and reproducibility of answers. Main Outcomes and Measures: Overall percentage scores of 2 LLMs. Results: LLM 2 significantly outperformed LLM 1 by correctly answering 1662 of 1956 questions (85.0%) vs 1306 questions (66.8%) for LLM 1. Notably, LLM 2's performance was greater than the mean human score of 73.8%, effectively achieving near-passing and passing grades in the neurology board examination. LLM 2 outperformed human users in behavioral, cognitive, and psychological–related questions and demonstrated superior performance to LLM 1 in 6 categories. Both LLMs performed better on lower-order than higher-order questions, with LLM 2 excelling in both lower-order and higher-order questions. Both models consistently used confident language, even when providing incorrect answers. Reproducible answers of both LLMs were associated with a higher percentage of correct answers than inconsistent answers. Conclusions and Relevance: Despite the absence of neurology-specific training, LLM 2 demonstrated commendable performance, whereas LLM 1 performed slightly below the human average. While higher-order cognitive tasks were more challenging for both models, LLM 2's results were equivalent to passing grades in specialized neurology examinations. These findings suggest that LLMs could have significant applications in clinical neurology and health care with further refinements.
18

Ishay, Adam, and Joohyung Lee. "LLM+AL: Bridging Large Language Models and Action Languages for Complex Reasoning About Actions." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 23 (April 11, 2025): 24212–20. https://doi.org/10.1609/aaai.v39i23.34597.

Abstract:
Large Language Models (LLMs) have made significant strides in various intelligent tasks but still struggle with complex action reasoning tasks that require systematic search. To address this limitation, we propose a method that bridges the natural language understanding capabilities of LLMs with the symbolic reasoning strengths of action languages. Our approach, termed LLM+AL, leverages the LLM's strengths in semantic parsing and commonsense knowledge generation alongside the action language's proficiency in automated reasoning based on encoded knowledge. We compare LLM+AL against state-of-the-art LLMs, including ChatGPT-4, Claude 3 Opus, Gemini Ultra 1.0, and o1-preview, using benchmarks for complex reasoning about actions. Our findings indicate that, although all methods exhibit errors, LLM+AL, with relatively minimal human corrections, consistently leads to correct answers, whereas standalone LLMs fail to improve even with human feedback. LLM+AL also contributes to automated generation of action languages.
19

Kim, Jun-Hwa, Nam-Ho Kim, Donghyeok Jo, and Chee Sun Won. "Multimodal Food Image Classification with Large Language Models." Electronics 13, no. 22 (November 20, 2024): 4552. http://dx.doi.org/10.3390/electronics13224552.

Abstract:
In this study, we leverage advancements in large language models (LLMs) for fine-grained food image classification. We achieve this by integrating textual features extracted from images using an LLM into a multimodal learning framework. Specifically, semantic textual descriptions generated by the LLM are encoded and combined with image features obtained from a transformer-based architecture to improve food image classification. Our approach employs a cross-attention mechanism to effectively fuse visual and textual modalities, enhancing the model’s ability to extract discriminative features beyond what can be achieved with visual features alone.
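The cross-attention fusion step can be sketched as a small PyTorch module in which image tokens attend to LLM-generated text tokens. Dimensions, head count, and the number of food classes below are invented, so this is an illustrative stand-in rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Fuse image-patch features with LLM-derived text features via cross-attention.

    Minimal sketch: image tokens act as queries, text tokens as keys/values.
    """
    def __init__(self, dim: int = 256, num_heads: int = 4, num_classes: int = 101):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, image_tokens: torch.Tensor, text_tokens: torch.Tensor) -> torch.Tensor:
        fused, _ = self.cross_attn(query=image_tokens, key=text_tokens, value=text_tokens)
        pooled = fused.mean(dim=1)            # average over image tokens
        return self.classifier(pooled)

# Toy usage with random features standing in for image patches and encoded LLM descriptions.
model = CrossAttentionFusion()
logits = model(torch.randn(2, 49, 256), torch.randn(2, 16, 256))
print(logits.shape)   # torch.Size([2, 101])
```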
20

Trott, Sean. "Large Language Models and the Wisdom of Small Crowds." Open Mind 8 (2024): 723–38. http://dx.doi.org/10.1162/opmi_a_00144.

Abstract:
Abstract Recent advances in Large Language Models (LLMs) have raised the question of replacing human subjects with LLM-generated data. While some believe that LLMs capture the “wisdom of the crowd”—due to their vast training data—empirical evidence for this hypothesis remains scarce. We present a novel methodological framework to test this: the “number needed to beat” (NNB), which measures how many humans are needed for a sample’s quality to rival the quality achieved by GPT-4, a state-of-the-art LLM. In a series of pre-registered experiments, we collect novel human data and demonstrate the utility of this method for four psycholinguistic datasets for English. We find that NNB > 1 for each dataset, but also that NNB varies across tasks (and in some cases is quite small, e.g., 2). We also introduce two “centaur” methods for combining LLM and human data, which outperform both stand-alone LLMs and human samples. Finally, we analyze the trade-offs in data cost and quality for each approach. While clear limitations remain, we suggest that this framework could guide decision-making about whether and how to integrate LLM-generated data into the research pipeline.
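The "number needed to beat" can be illustrated with simulated ratings: find the smallest number of averaged human raters whose correlation with gold labels matches the LLM's. The sketch below uses synthetic data and is not the author's analysis code.

```python
import numpy as np

rng = np.random.default_rng(0)

n_items, n_humans = 200, 30
gold = rng.normal(size=n_items)                                   # "ground truth" ratings
human = gold + rng.normal(scale=1.0, size=(n_humans, n_items))    # noisy human raters
llm = gold + rng.normal(scale=0.6, size=n_items)                  # simulated LLM ratings

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

llm_quality = corr(llm, gold)

# NNB: smallest k such that the average of k humans correlates with gold
# at least as well as the LLM does (averaged over random subsamples).
nnb = None
for k in range(1, n_humans + 1):
    scores = [corr(human[rng.choice(n_humans, size=k, replace=False)].mean(axis=0), gold)
              for _ in range(50)]
    if np.mean(scores) >= llm_quality:
        nnb = k
        break

print(f"LLM correlation with gold: {llm_quality:.3f}, NNB ≈ {nnb}")
```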
21

Chandra, Reena. "Automation Frameworks for End-to-End Testing of Large Language Models (LLMs)." Journal of Information Systems Engineering and Management 10, no. 43s (April 30, 2025): 464–72. https://doi.org/10.52783/jisem.v10i43s.8400.

Abstract:
Building and delivering high-quality software is critical in software engineering and requires reliable, robust verification and validation processes for end-to-end testing that deliver correct results fast. Manual testing of LLM models, while feasible, is very time-consuming and inefficient, and has scalability issues depending on how big the model under test (MUT) is. Recent research and cutting-edge technology innovations in LLM models have deeply influenced software engineering, and their impact needs to be integrated robustly in the areas of model analysis, test automation, model execution, debugging, and report generation. This paper focuses on a framework approach for automated software testing of LLM models that reduces human interaction and delivers results in a fast, cost-efficient, and time-efficient manner for industrial automated testing. The proposed Automation Framework (LLMAutoE2E) leverages and integrates LLMs for testing different LLM models (such as BERT, BART, Hugging Face models, and other models available for testing) to automate the end-to-end execution lifecycle of the LLM models. By leveraging LLMs, companies and industries can generate automated test cases, automated unit test code, automated integration and end-to-end tests, and automated reporting of the LLM model's execution results. This research emphasizes the potential of the Automation Framework (LLMAutoE2E) to automate and streamline the overall execution and result generation of LLM models and the overall testing workflows, while addressing challenges in current LLM model testing, its accuracy and scalability for deployments, and reporting. The proposed Automation Framework (LLMAutoE2E) can also automate defect analysis, which improves software reliability manyfold and reduces development cycles for companies. This research paper details the role of automation frameworks for LLMs and how they are transforming QA processes, key methodologies, reliability and efficiency improvements, current challenges such as model safety, bias detection, and continuous monitoring, and future trends in AI-driven software testing.
22

Longwell, Jack B., Ian Hirsch, Fernando Binder, Galileo Arturo Gonzalez Conchas, Daniel Mau, Raymond Jang, Rahul G. Krishnan, and Robert C. Grant. "Performance of Large Language Models on Medical Oncology Examination Questions." JAMA Network Open 7, no. 6 (June 18, 2024): e2417641. http://dx.doi.org/10.1001/jamanetworkopen.2024.17641.

Abstract:
Importance: Large language models (LLMs) recently developed an unprecedented ability to answer questions. Studies of LLMs from other fields may not generalize to medical oncology, a high-stakes clinical setting requiring rapid integration of new information. Objective: To evaluate the accuracy and safety of LLM answers on medical oncology examination questions. Design, Setting, and Participants: This cross-sectional study was conducted between May 28 and October 11, 2023. The American Society of Clinical Oncology (ASCO) Oncology Self-Assessment Series on ASCO Connection, the European Society of Medical Oncology (ESMO) Examination Trial questions, and an original set of board-style medical oncology multiple-choice questions were presented to 8 LLMs. Main Outcomes and Measures: The primary outcome was the percentage of correct answers. Medical oncologists evaluated the explanations provided by the best LLM for accuracy, classified the types of errors, and estimated the likelihood and extent of potential clinical harm. Results: Proprietary LLM 2 correctly answered 125 of 147 questions (85.0%; 95% CI, 78.2%-90.4%; P < .001 vs random answering). Proprietary LLM 2 outperformed an earlier version, proprietary LLM 1, which correctly answered 89 of 147 questions (60.5%; 95% CI, 52.2%-68.5%; P < .001), and the best open-source LLM, Mixtral-8x7B-v0.1, which correctly answered 87 of 147 questions (59.2%; 95% CI, 50.0%-66.4%; P < .001). The explanations provided by proprietary LLM 2 contained no or minor errors for 138 of 147 questions (93.9%; 95% CI, 88.7%-97.2%). Incorrect responses were most commonly associated with errors in information retrieval, particularly with recent publications, followed by erroneous reasoning and reading comprehension. If acted upon in clinical practice, 18 of 22 incorrect answers (81.8%; 95% CI, 59.7%-94.8%) would have a medium or high likelihood of moderate to severe harm. Conclusions and Relevance: In this cross-sectional study of the performance of LLMs on medical oncology examination questions, the best LLM answered questions with remarkable performance, although errors raised safety concerns. These results demonstrated an opportunity to develop and evaluate LLMs to improve health care clinician experiences and patient care, considering the potential impact on capabilities and safety.
23

Jiang, Chunyang, Chi-Min Chan, Wei Xue, Qifeng Liu, and Yike Guo. "Importance Weighting Can Help Large Language Models Self-Improve." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 23 (April 11, 2025): 24257–65. https://doi.org/10.1609/aaai.v39i23.34602.

Abstract:
Large language models (LLMs) have shown remarkable capability in numerous tasks and applications. However, fine-tuning LLMs using high-quality datasets under external supervision remains prohibitively expensive. In response, LLM self-improvement approaches have been actively developed recently. The typical paradigm of LLM self-improvement involves training the LLM on self-generated data, part of which may be detrimental and should be filtered out due to unstable data quality. While current work primarily employs filtering strategies based on answer correctness, in this paper we demonstrate that filtering out samples that are correct but have a high distribution shift extent (DSE) can also benefit the results of self-improvement. Given that the actual sample distribution is usually inaccessible, we propose a new metric called DS weight to approximate DSE, inspired by importance weighting methods. Consequently, we integrate DS weight with self-consistency to comprehensively filter the self-generated samples and fine-tune the language model. Experiments show that with only a tiny valid set (up to 5% of the size of the training set) to compute DS weight, our approach can notably promote the reasoning ability of current LLM self-improvement methods. The resulting performance is on par with methods that rely on external supervision from pre-trained reward models.
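The paper's DS weight is a specific approximation of distribution shift; as general background, importance weights are often approximated with a probabilistic classifier trained to distinguish trusted samples from generated ones. The sketch below shows that classical density-ratio trick on toy data and should not be read as the paper's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy 2-D "features" of trusted valid-set samples vs. self-generated samples.
valid = rng.normal(loc=0.0, size=(100, 2))
generated = rng.normal(loc=0.8, size=(400, 2))

# Classical density-ratio trick: train a classifier to tell the two sets apart,
# then weight each generated sample by p(valid | x) / p(generated | x).
X = np.vstack([valid, generated])
y = np.array([1] * len(valid) + [0] * len(generated))
clf = LogisticRegression().fit(X, y)

p_valid = clf.predict_proba(generated)[:, 1]
weights = p_valid / np.clip(1.0 - p_valid, 1e-6, None)

# Keep only self-generated samples whose estimated shift is small (weight is high).
keep = weights >= np.median(weights)
print(f"kept {keep.sum()} of {len(generated)} self-generated samples")
```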
24

Dvoichenkov, Danylo D. "Knowledge Graphs and Large Language Models." Control Systems and Computers, no. 3 (303) (2023): 54–60. http://dx.doi.org/10.15407/csc.2023.03.054.

Abstract:
Large Language Models (LLM) based on the Transformer architecture are nowadays among the most widely used tools in the Natural Language Processing (NLP) field. Nonetheless, this approach has some limitations and flaws. In particular, these problems become crucial for NLP-based expert systems. LLMs may sometimes hallucinate and provide untrustworthy responses. We advocate the use of Knowledge Graphs for solving this problem.
25

Meng, Yueqi. "A review of research on training responsible large language models." Applied and Computational Engineering 82, no. 1 (November 8, 2024): 88–92. http://dx.doi.org/10.54254/2755-2721/82/20240940.

Abstract:
In recent years, there has been a growing acceptance of large language models (LLM) as a mainstream method in the field of natural language processing. Consequently, numerous studies have been conducted on this topic. Training responsible large language models has become a prominent subject of research in the past few years. This type of research mainly focuses on the examination of bias, morality and other aspects of LLM. There are certain similarities in the methodologies employed in those studies. This article presents a comprehensive overview of numerous recent investigations, analyzing and categorizing the methodologies employed in these studies, and offering a literature review. The review examines three perspectives: LLM bias data set construction, bias detection and bias elimination. It provides a comparative analysis of the advantages and disadvantages of different methods. After completing the relevant evaluations, a comprehensive examination of the research on training responsible LLM is conducted and potential future research directions are proposed.
26

Liu, Xinxin. "A Survey of Hallucination Problems Based on Large Language Models." Applied and Computational Engineering 97, no. 1 (November 26, 2024): 24–30. http://dx.doi.org/10.54254/2755-2721/2024.17851.

Abstract:
Large language models (LLM) have made significant achievements in the field of natural language processing, but the generated text often contains content that is inconsistent with the real world or with user input, known as hallucinations. This article surveys the current situation of hallucinations in LLM, including their definition, types, causes, and solutions. Hallucinations are divided into types such as factuality and faithfulness hallucinations, mainly caused by factors such as training data defects, low utilization of facts, and randomness in the decoding process. The phenomenon of hallucination threatens the reliability of LLM, especially in fields such as healthcare, finance, and law, where it may lead to serious consequences. To address this issue, the article surveys methods such as curating training datasets, knowledge editing, and retrieval-augmented generation. Future research should classify and evaluate hallucinations more finely, explore multimodal strategies, enhance model stability, and integrate human and artificial intelligence to jointly address these challenges, promoting the continuous progress of LLM.
27

Broadhurst, Martin. "Leveraging ChatGPT for Excel: How large language models are changing spreadsheet practices." Journal of AI, Robotics & Workplace Automation 3, no. 3 (September 1, 2024): 26. http://dx.doi.org/10.69554/akgw5928.

Abstract:
The integration of large language models (LLMs) with spreadsheet software has the potential to revolutionise data analysis and task automation in the workplace. This paper explores four key approaches to leveraging LLMs within spreadsheets: 1) LLM as a spreadsheet mentor, providing on-demand assistance and guidance to users; 2) LLM as an analyst with file ingestion, enabling direct data manipulation and in-chat analysis; 3) integrated LLM tools within office productivity suites, offering native LLM capabilities; and 4) LLM-powered add-ons, extending spreadsheet functionality through custom functions and formulas. By examining the strengths and weaknesses of each approach, this paper highlights the potential benefits, such as increased accessibility to advanced features, streamlined workflows and enhanced data insights. Challenges related to hallucinations, limited functionality and user adaptation are also discussed. As LLM technology continues to advance, more sophisticated integrations are expected to emerge, further transforming the way businesses and individuals work with data. The future of LLM spreadsheet integration holds immense promise for improving productivity, automating tasks and unlocking valuable insights from data.
28

Wang, Dongjie. "Application, Challenges and Prospects of Large Language Models in the Medical Field." Applied and Computational Engineering 117, no. 1 (February 21, 2025): 187–97. https://doi.org/10.54254/2755-2721/2025.20950.

Abstract:
Artificial intelligence has become a new trend in the new era. Within its field, large language models (LLM) have unique advantages in today's massive data due to their excellent text understanding and generation capabilities. This article briefly outlines the development background and architecture of LLM. In view of the rapid development of large language models, this article analyzes the practical applications, challenges and future development prospects of LLM from a medical perspective. The advantages of LLM in medical application were analyzed, and the application of LLM in various medical fields was reviewed by taking BioGPT, Zhongjing (CMLM-ZhongJing) large language model and HuatuoGPT as examples. Secondly, the paper analyzes the challenges that LLM currently faces in the medical field, such as the "hallucination" problem, the explainability problem, etc., and provides corresponding solutions. Finally, the future development prospects of LLM in the medical field are proposed, which is to transform from single modality to multimodality, from large-scale to lightweight, strengthen the integration with medical equipment, and better provide services for relevant medical staff, patients, and students in the medical field.
29

Kutomi, Nozomu. "On LLM (Large Language Models) on Local PC." Journal of Japan Society for Fuzzy Theory and Intelligent Informatics 36, no. 3 (August 15, 2024): 70–76. https://doi.org/10.3156/jsoft.36.3_70.
30

Karev, Alexey, and Dong Xu. "ConSCompF: Consistency-focused Similarity Comparison Framework for Generative Large Language Models." Journal of Artificial Intelligence Research 82 (March 12, 2025): 1325–47. https://doi.org/10.1613/jair.1.17028.

Abstract:
Large Language Models (LLM) are one of the most important discoveries in machine learning in recent years. LLM-based artificial intelligence (AI) assistants, such as ChatGPT, have consistently attracted attention from researchers, investors, and the general public, driving the rapid growth of this industry. With dozens of new LLMs released every month, it becomes quite challenging to differentiate between them, thereby creating a demand for new LLM comparison methods. In this research, the Consistency-focused Similarity Comparison Framework (ConSCompF) for generative large language models is proposed. It compares texts generated by two LLMs and produces a similarity score, indicating the overall degree of similarity between their responses. The main advantage of this framework is that it can operate on a small number of unlabeled data, such as chatbot instruction prompts, and does not require LLM developers to disclose any information about their product. To evaluate the efficacy of ConSCompF, two experiments aimed at identifying similarities between multiple LLMs are conducted. Additionally, these experiments examine the correlation between the similarity scores generated by ConSCompF and the differences in outputs produced by other benchmarking techniques, such as ROUGE-L. Finally, a series of few-shot LLM comparison experiments is conducted to evaluate the performance of ConSCompF in a few-shot LLM comparison scenario. The proposed framework can be used for calculating similarity matrices of multiple LLMs, which can be effectively visualized using principal component analysis (PCA). The outputs of ConSCompF may provide useful insights into data that might have been used during LLM training and help detect potential investment fraud attempts.
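The overall shape of such a comparison, scoring pairwise similarity of two models' responses to shared prompts, averaging into a similarity matrix, and projecting it with PCA, can be sketched with a crude bag-of-words similarity. This is only a schematic stand-in for the consistency-focused scoring ConSCompF actually uses, and the model responses are invented.

```python
import numpy as np
from collections import Counter

# Invented responses of three hypothetical LLMs to the same two instruction prompts.
responses = {
    "model_a": ["paris is the capital of france", "water boils at 100 degrees celsius"],
    "model_b": ["the capital of france is paris", "water boils at 100 c at sea level"],
    "model_c": ["i cannot answer that question", "please consult a chemistry textbook"],
}

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two response strings."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in set(ca) & set(cb))
    norm = (sum(v * v for v in ca.values()) ** 0.5) * (sum(v * v for v in cb.values()) ** 0.5)
    return dot / norm if norm else 0.0

names = list(responses)
sim = np.array([[np.mean([cosine(x, y) for x, y in zip(responses[m], responses[n])])
                 for n in names] for m in names])
print(np.round(sim, 2))   # pairwise LLM similarity matrix

# 2-D view of the similarity matrix (PCA via SVD), as in the framework's visualisation step.
centered = sim - sim.mean(axis=0)
coords = centered @ np.linalg.svd(centered)[2].T[:, :2]
print(np.round(coords, 2))
```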
31

Viswanathan, Vijay, Kiril Gashteovski, Carolin Lawrence, Tongshuang Wu, and Graham Neubig. "Large Language Models Enable Few-Shot Clustering." Transactions of the Association for Computational Linguistics 12 (2024): 321–33. http://dx.doi.org/10.1162/tacl_a_00648.

Abstract:
Abstract Unlike traditional unsupervised clustering, semi-supervised clustering allows users to provide meaningful structure to the data, which helps the clustering algorithm to match the user’s intent. Existing approaches to semi-supervised clustering require a significant amount of feedback from an expert to improve the clusters. In this paper, we ask whether a large language model (LLM) can amplify an expert’s guidance to enable query-efficient, few-shot semi-supervised text clustering. We show that LLMs are surprisingly effective at improving clustering. We explore three stages where LLMs can be incorporated into clustering: before clustering (improving input features), during clustering (by providing constraints to the clusterer), and after clustering (using LLMs post-correction). We find that incorporating LLMs in the first two stages routinely provides significant improvements in cluster quality, and that LLMs enable a user to make trade-offs between cost and accuracy to produce desired clusters. We release our code and LLM prompts for the public to use.1
32

Besta, Maciej, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, et al. "Graph of Thoughts: Solving Elaborate Problems with Large Language Models." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 16 (March 24, 2024): 17682–90. http://dx.doi.org/10.1609/aaai.v38i16.29720.

Abstract:
We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT). The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph, where units of information ("LLM thoughts") are vertices, and edges correspond to dependencies between these vertices. This approach enables combining arbitrary LLM thoughts into synergistic outcomes, distilling the essence of whole networks of thoughts, or enhancing thoughts using feedback loops. We illustrate that GoT offers advantages over state of the art on different tasks, for example increasing the quality of sorting by 62% over ToT, while simultaneously reducing costs by >31%. We ensure that GoT is extensible with new thought transformations and thus can be used to spearhead new prompting schemes. This work brings the LLM reasoning closer to human thinking or brain mechanisms such as recurrence, both of which form complex networks
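The core abstraction, thoughts as vertices and dependencies as edges, with an aggregation operation that merges several thoughts into a new one, can be captured in a toy data structure. The sketch below is a hypothetical illustration with a stubbed LLM call, not the GoT framework's API.

```python
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    return f"<answer to: {prompt[:40]}...>"

@dataclass
class Thought:
    content: str
    parents: list = field(default_factory=list)   # dependency edges

@dataclass
class GraphOfThoughts:
    thoughts: list = field(default_factory=list)

    def generate(self, prompt: str, parents=None) -> Thought:
        t = Thought(llm(prompt), parents or [])
        self.thoughts.append(t)
        return t

    def aggregate(self, parents: list, instruction: str) -> Thought:
        # Merge several thoughts into a new vertex, e.g. merging partially sorted lists.
        prompt = instruction + "\n" + "\n".join(p.content for p in parents)
        return self.generate(prompt, parents)

graph = GraphOfThoughts()
chunks = [graph.generate(f"Sort this sublist: {part}") for part in ("[3,1]", "[4,2]")]
merged = graph.aggregate(chunks, "Merge the sorted sublists into one sorted list:")
print(len(graph.thoughts), merged.content)
```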
33

Ray, Sujan, Arpita Nath Sarker, Neelakshi Chatterjee, Kowshik Bhowmik, and Sayantan Dey. "Leveraging Large Language Models for Clinical Trial Eligibility Criteria Classification." Digital 5, no. 2 (April 8, 2025): 12. https://doi.org/10.3390/digital5020012.

Abstract:
The advent of transformer technology and large language models (LLMs) has further broadened the already extensive application field of artificial intelligence (AI). A large portion of medical records is stored in text format, such as clinical trial texts. Part of these texts is information regarding eligibility criteria. We aimed to harness the immense capabilities of an LLM by fine-tuning an open-source LLM (Llama-2) to develop a classifier from the clinical trial data. We were interested in investigating whether a fine-tuned LLM could better decide the eligibility criteria from the clinical trial text and compare the results with a more traditional method. Such an investigation can help us determine the extent to which we can rely on text-based applications developed from large language models and possibly open new avenues of application in the medical domain. Our results are comparable to the best-performing methods for this task. Since we used state-of-the-art technology, this research has the potential to open new avenues in the field of LLM application in the healthcare sector.
34

Hegelich, Simon, and Kolja Hegelich. "Large Language Models: Starke KI mit bekannten Schwächen?" MedienWirtschaft 19, no. 3 (2022): 6–11. http://dx.doi.org/10.15358/1613-0669-2022-3-6.

Abstract:
Artificial intelligence (AI) is becoming ever better at producing texts that appear to have been written by humans. We owe this progress to extremely elaborate deep learning models (Large Language Models (LLM)). But is a computer intelligent if it can produce intelligent-seeming texts? The famous Turing test would say yes. We believe, however, that doubt is warranted. As complex as these modern algorithms may be, at their core they perform pattern recognition. What these machines lack is the ability to create something new through thoughts of their own. By analysing how LLMs work, we aim to make clear how these models operate, where their limits lie, and why they will nevertheless be an essential building block on the way to strong or general artificial intelligence.
35

Zhang, Yiqian. "Application of Large Language Models in Power System Operation and Control." Journal of Computing and Electronic Information Management 15, no. 3 (December 26, 2024): 79–83. https://doi.org/10.54097/sb9qdz28.

Abstract:
With the introduction of "carbon peak" and "carbon neutrality" targets and the vigorous advancement of national energy market construction, renewable energy sources like wind and solar power are developing rapidly. However, this also brings challenges in accounting for uncertainties in the power system's operational scheduling and optimization control processes, especially for theoretical control methods. Fortunately, rapidly developing large language models (LLM) have shown promising prospects in the power sector. This review summarizes the application of LLM technology in power system operation and control, outlining the new power system's demand for AI technology, the impact of LLM on system management, and the technological foundation including network architecture, training methods, and data configuration. Finally, it explores the applications of LLM in power system operation and control from the perspectives of generation, transmission, distribution, consumption, and equipment.
36

Pahune, Saurabh, and Manoj Chandrasekharan. "Several Categories of Large Language Models (LLMs): A Short Survey." International Journal for Research in Applied Science and Engineering Technology 11, no. 7 (July 31, 2023): 615–33. http://dx.doi.org/10.22214/ijraset.2023.54677.

Abstract:
Large Language Models (LLMs) have become effective tools for natural language processing and have been used in many different fields. This essay offers a succinct summary of various LLM subcategories. The survey emphasizes recent developments and efforts made for various LLM kinds, including task-based financial LLMs, multilingual language LLMs, biomedical and clinical LLMs, vision language LLMs, and code language models. The survey gives a general summary of the methods, attributes, datasets, transformer models, and comparison metrics applied in each category of LLMs. Furthermore, it highlights unresolved problems in the field of developing chatbots and virtual assistants, such as boosting natural language processing, enhancing chatbot intelligence, and resolving moral and legal dilemmas. The purpose of this study is to provide readers, developers, academics, and users interested in LLM-based chatbots and virtual intelligent assistant technologies with useful information and future directions.
37

Grashchenkov, Pavel V., Kseniia A. Studenikina, and Lada I. Pasko. "Coordinate structure constraint in the linguistic competence of large language models." Vestnik of Saint Petersburg University. Language and Literature 21, no. 3 (2024): 668–88. https://doi.org/10.21638/spbu09.2024.309.

Abstract:
A syntactic island is a construction extraction from which leads to ungrammaticality. Island constraints are generally demonstrated through the impossibility of A′-movement, e.g. wh-movement. Treating extraction from an island as ungrammatical is common to all native speakers. In terms of natural language understanding and generation, the competence of large language models (LLM) is almost indistinguishable from the human one. However, the difference between the grammatical constraints of native speakers and of LLMs is still insufficiently studied. If the LLM grammar is set up similarly to the human one, the models will demonstrate high sensitivity to island constraints. The current study aims to compare the language competence of native speakers and LLMs based on coordinate structure islands. Three dialogue systems — ChatGPT, YandexGPT and GigaChat — were examined via two tests. The first investigates whether the model is able to give a semantically correct answer to a question that violates island constraints. The second test directly accesses grammaticality judgements. The results clearly show that LLM language competence differs from the human one. The observed models regularly answer questions violating island constraints correctly and consider them grammatical. YandexGPT turns out to be more consistent, while ChatGPT and GigaChat frequently give incorrect answers to questions which they judge acceptable. The influence of the stimuli's grammatical features depends on the model: the island sensitivity of ChatGPT and GigaChat is determined by the same features, in contrast to YandexGPT. Thus, the results call into question the claim that LLM language competence is close to the human one.
Styles APA, Harvard, Vancouver, ISO, etc.
38

Chhun, Cyril, Fabian M. Suchanek et Chloé Clavel. « Do Language Models Enjoy Their Own Stories ? Prompting Large Language Models for Automatic Story Evaluation ». Transactions of the Association for Computational Linguistics 12 (2024) : 1122–42. http://dx.doi.org/10.1162/tacl_a_00689.

Texte intégral
Résumé :
Storytelling is an integral part of human experience and plays a crucial role in social interactions. Thus, Automatic Story Evaluation (ASE) and Generation (ASG) could benefit society in multiple ways, but they are challenging tasks which require high-level human abilities such as creativity, reasoning, and deep understanding. Meanwhile, Large Language Models (LLMs) now achieve state-of-the-art performance on many NLP tasks. In this paper, we study whether LLMs can be used as substitutes for human annotators for ASE. We perform an extensive analysis of the correlations between LLM ratings, other automatic measures, and human annotations, and we explore the influence of prompting on the results and the explainability of LLM behaviour. Most notably, we find that LLMs outperform current automatic measures for system-level evaluation but still struggle at providing satisfactory explanations for their answers.
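As an illustration of the system-level analysis described above, the following hedged Python sketch averages per-system ratings and correlates LLM scores with human scores; the system names and scores are invented.

```python
# Toy example of system-level correlation between LLM and human story ratings.
from statistics import mean
from scipy.stats import kendalltau

human = {"sys_a": [3, 4, 4], "sys_b": [2, 2, 3], "sys_c": [5, 4, 5]}
llm   = {"sys_a": [3, 3, 4], "sys_b": [2, 3, 2], "sys_c": [4, 5, 5]}

systems = sorted(human)
h = [mean(human[s]) for s in systems]   # mean human rating per system
m = [mean(llm[s]) for s in systems]     # mean LLM rating per system
tau, p = kendalltau(h, m)
print(f"system-level Kendall tau = {tau:.2f} (p = {p:.2f})")
```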
Styles APA, Harvard, Vancouver, ISO, etc.
39

Hegde, Narayan, Madhurima Vardhan, Deepak Nathani, Emily Rosenzweig, Cathy Speed, Alan Karthikesalingam et Martin Seneviratne. « Infusing behavior science into large language models for activity coaching ». PLOS Digital Health 3, no 4 (2 avril 2024) : e0000431. http://dx.doi.org/10.1371/journal.pdig.0000431.

Texte intégral
Résumé :
Large language models (LLMs) have shown promise for task-oriented dialogue across a range of domains. The use of LLMs in health and fitness coaching is under-explored. Behavior science frameworks such as COM-B, which conceptualizes behavior change in terms of Capability (C), Opportunity (O), and Motivation (M), can be used to architect coaching interventions in a way that promotes sustained change. Here we aim to incorporate behavior science principles into an LLM using two knowledge infusion techniques: coach message priming (where exemplar coach responses are provided as context to the LLM) and dialogue re-ranking (where the COM-B category of the LLM output is matched to the inferred user need). Simulated conversations were conducted between the primed or unprimed LLM and a member of the research team, and then evaluated by 8 human raters. Ratings for the primed conversations were significantly higher in terms of empathy and actionability. The same raters also compared a single response generated by the unprimed, primed, and re-ranked models, finding a significant uplift in actionability and empathy from the re-ranking technique. This is a proof of concept of how behavior science frameworks can be infused into automated conversational agents for a more principled coaching experience.
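The re-ranking idea can be sketched roughly as follows; the COM-B keyword classifier and the example replies are hypothetical stand-ins for the study's trained components, not its actual code.

```python
# Illustrative sketch of dialogue re-ranking: classify each candidate coach
# reply into a COM-B category and prefer the one matching the inferred user need.
COMB = {"capability":  ["skill", "learn", "how to"],
        "opportunity": ["schedule", "time", "gym", "nearby"],
        "motivation":  ["goal", "reward", "proud", "why"]}

def comb_category(text):
    text = text.lower()
    scores = {c: sum(kw in text for kw in kws) for c, kws in COMB.items()}
    return max(scores, key=scores.get)

def rerank(candidates, user_need):
    # Candidates whose COM-B category matches the inferred need come first.
    return sorted(candidates, key=lambda reply: comb_category(reply) != user_need)

replies = ["Let's set a small goal you can feel proud of this week.",
           "Could you fit a short walk into your lunch-time schedule?"]
print(rerank(replies, user_need="opportunity")[0])
```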
Styles APA, Harvard, Vancouver, ISO, etc.
40

Xu, Jiechen, Lei Han, Shazia Sadiq et Gianluca Demartini. « On the Role of Large Language Models in Crowdsourcing Misinformation Assessment ». Proceedings of the International AAAI Conference on Web and Social Media 18 (28 mai 2024) : 1674–86. http://dx.doi.org/10.1609/icwsm.v18i1.31417.

Texte intégral
Résumé :
The proliferation of online misinformation significantly undermines the credibility of web content. Recently, crowd workers have been successfully employed to assess misinformation in order to address the limited scalability of professional fact-checkers. An alternative approach to crowdsourcing is the use of large language models (LLMs). These models, however, are also not perfect. In this paper, we investigate the scenario of crowd workers collaborating with LLMs to assess misinformation. We perform a study in which we ask crowd workers to judge the truthfulness of statements under different conditions: with and without LLM labels and explanations. Our results show that crowd workers tend to overestimate truthfulness when exposed to LLM-generated information. Crowd workers are misled by wrong LLM labels, but, on the other hand, their self-reported confidence is lower when they make mistakes due to relying on the LLM. We also observe diverse behaviors among crowd workers when the LLM is presented, indicating that leveraging LLMs can be considered a distinct working strategy.
Styles APA, Harvard, Vancouver, ISO, etc.
41

Lai, Honghao, Long Ge, Mingyao Sun, Bei Pan, Jiajie Huang, Liangying Hou, Qiuyu Yang et al. « Assessing the Risk of Bias in Randomized Clinical Trials With Large Language Models ». JAMA Network Open 7, no 5 (22 mai 2024) : e2412687. http://dx.doi.org/10.1001/jamanetworkopen.2024.12687.

Texte intégral
Résumé :
Importance: Large language models (LLMs) may facilitate the labor-intensive process of systematic reviews. However, the exact methods and reliability remain uncertain. Objective: To explore the feasibility and reliability of using LLMs to assess risk of bias (ROB) in randomized clinical trials (RCTs). Design, Setting, and Participants: A survey study was conducted between August 10, 2023, and October 30, 2023. Thirty RCTs were selected from published systematic reviews. Main Outcomes and Measures: A structured prompt was developed to guide ChatGPT (LLM 1) and Claude (LLM 2) in assessing the ROB in these RCTs using a modified version of the Cochrane ROB tool developed by the CLARITY group at McMaster University. Each RCT was assessed twice by both models, and the results were documented. The results were compared with an assessment by 3 experts, which was considered a criterion standard. Correct assessment rates, sensitivity, specificity, and F1 scores were calculated to reflect accuracy, both overall and for each domain of the Cochrane ROB tool; consistent assessment rates and Cohen κ were calculated to gauge consistency; and assessment time was calculated to measure efficiency. Performance between the 2 models was compared using risk differences. Results: Both models demonstrated high correct assessment rates. LLM 1 reached a mean correct assessment rate of 84.5% (95% CI, 81.5%-87.3%), and LLM 2 reached a significantly higher rate of 89.5% (95% CI, 87.0%-91.8%). The risk difference between the 2 models was 0.05 (95% CI, 0.01-0.09). In most domains, domain-specific correct rates were around 80% to 90%; however, sensitivity below 0.80 was observed in domains 1 (random sequence generation), 2 (allocation concealment), and 6 (other concerns). Domains 4 (missing outcome data), 5 (selective outcome reporting), and 6 had F1 scores below 0.50. The consistent rates between the 2 assessments were 84.0% for LLM 1 and 87.3% for LLM 2. LLM 1's κ exceeded 0.80 in 7 domains and LLM 2's in 8 domains. The mean (SD) time needed for assessment was 77 (16) seconds for LLM 1 and 53 (12) seconds for LLM 2. Conclusions: In this survey study of applying LLMs for ROB assessment, LLM 1 and LLM 2 demonstrated substantial accuracy and consistency in evaluating RCTs, suggesting their potential as supportive tools in systematic review processes.
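The accuracy and agreement metrics named above can be reproduced on toy data with a short Python sketch; the labels below are illustrative, not the study's data.

```python
# Scoring LLM risk-of-bias judgments against an expert reference with the
# metrics the study reports: correct rate, sensitivity, specificity, F1, kappa.
from sklearn.metrics import confusion_matrix, f1_score, cohen_kappa_score

expert = ["low", "high", "low", "low", "high", "unclear", "low", "high"]   # criterion standard
llm    = ["low", "high", "low", "high", "high", "low", "low", "high"]      # one LLM assessment

correct_rate = sum(e == m for e, m in zip(expert, llm)) / len(expert)

# Binarise on "high risk" to compute sensitivity/specificity for that class.
y_true = [1 if e == "high" else 0 for e in expert]
y_pred = [1 if m == "high" else 0 for m in llm]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(f"correct rate  : {correct_rate:.2f}")
print(f"sensitivity   : {tp / (tp + fn):.2f}, specificity: {tn / (tn + fp):.2f}")
print(f"F1 (high risk): {f1_score(y_true, y_pred):.2f}")
print(f"Cohen's kappa : {cohen_kappa_score(expert, llm):.2f}")
```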
Styles APA, Harvard, Vancouver, ISO, etc.
42

Ben Shoham, Ofir, et Nadav Rappoport. « CPLLM : Clinical prediction with large language models ». PLOS Digital Health 3, no 12 (6 décembre 2024) : e0000680. https://doi.org/10.1371/journal.pdig.0000680.

Texte intégral
Résumé :
We present Clinical Prediction with Large Language Models (CPLLM), a method that involves fine-tuning a pre-trained Large Language Model (LLM) for predicting clinical disease and readmission. We utilized quantization and fine-tuned the LLM using prompts. For diagnostic predictions, we predicted whether patients would be diagnosed with a target disease during their next visit or in the subsequent diagnosis, leveraging their historical medical records. We compared our results to various baselines, including Retain and Med-BERT, the latter of which is the current state-of-the-art model for disease prediction using temporal structured EHR data. In addition, we also evaluated CPLLM’s utility in predicting hospital readmission and compared our method’s performance with benchmark baselines. Our experiments ultimately revealed that our proposed method, CPLLM, surpasses all the tested models in terms of PR-AUC and ROC-AUC metrics, providing state-of-the-art performance as a tool for predicting disease diagnosis and patient hospital readmission without requiring pre-training on medical data. Such a method can be easily implemented and integrated into the clinical workflow to help care providers plan next steps for their patients.
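As a rough illustration of the prompt-based setup and the reported metrics, a minimal Python sketch might look as follows; the field names, prompt wording, and scores are hypothetical, not taken from CPLLM itself.

```python
# Hypothetical sketch: format a patient's diagnosis history as a fine-tuning
# prompt, then score predictions with the PR-AUC / ROC-AUC metrics the paper reports.
from sklearn.metrics import roc_auc_score, average_precision_score

def build_prompt(history, target_disease):
    codes = ", ".join(history)  # e.g. diagnosis descriptions from prior visits
    return (f"Patient diagnosis history: {codes}. "
            f"Will the patient be diagnosed with {target_disease} at the next visit? Answer yes or no:")

print(build_prompt(["essential hypertension", "type 2 diabetes"], "chronic kidney disease"))

# y_score would come from the fine-tuned model's probability of answering "yes".
y_true  = [1, 0, 1, 0, 0, 1]
y_score = [0.81, 0.22, 0.64, 0.35, 0.10, 0.72]
print("ROC-AUC:", roc_auc_score(y_true, y_score))
print("PR-AUC :", average_precision_score(y_true, y_score))
```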
Styles APA, Harvard, Vancouver, ISO, etc.
43

Liu, Xukun, Bowen Lei, Ruqi Zhang et Dongkuan (DK) Xu. « Adaptive Draft-Verification for Efficient Large Language Model Decoding ». Proceedings of the AAAI Conference on Artificial Intelligence 39, no 23 (11 avril 2025) : 24668–76. https://doi.org/10.1609/aaai.v39i23.34647.

Texte intégral
Résumé :
Large language model (LLM) decoding involves generating a sequence of tokens based on a given context, where each token is predicted one at a time using the model's learned probabilities. The typical autoregressive decoding method requires a separate forward pass through the model for each token generated, which is computationally inefficient and poses challenges for deploying LLMs in latency-sensitive scenarios. The main limitations of current decoding methods stem from their inefficiencies and resource demands. Existing approaches either necessitate fine-tuning smaller models, which is resource-intensive, or rely on fixed retrieval schemes to construct drafts for the next tokens, which lack adaptability and fail to generalize across different models and contexts. To address these issues, we introduce a novel methodology called Adaptix, which accelerates LLM decoding without requiring fine-tuning. Our approach involves an adaptive draft-verification process that evolves over time to improve efficiency. We utilize a tri-gram matrix-based LLM representation to dynamically approximate the output distribution of the LLM, allowing the model to adjust to changing token probabilities during the decoding process. Additionally, we implement a draft construction mechanism that effectively balances exploration and exploitation, ensuring that the drafts generated are both diverse and close to the true output distribution of the LLM. The importance of this design lies in its ability to optimize the draft distribution adaptively, leading to faster and more accurate decoding. Through extensive experiments on various benchmark datasets and LLM architectures, we demonstrate that Adaptix significantly accelerates the decoding process while maintaining high accuracy, making it suitable for deployment in a wide range of practical applications.
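The general draft-then-verify loop with a tri-gram approximation can be pictured with a toy Python sketch; this is an assumption-laden illustration of the idea in the abstract, not the authors' Adaptix implementation.

```python
# Toy draft-then-verify decoding: a tri-gram table proposes cheap draft tokens,
# a (stubbed) target LLM verifies them, and the table adapts as text is generated.
from collections import defaultdict

trigram = defaultdict(list)            # (t1, t2) -> observed next tokens

def observe(tokens):
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        trigram[(a, b)].append(c)

def propose_draft(context, k=3):
    draft, ctx = [], list(context)
    for _ in range(k):
        cands = trigram.get((ctx[-2], ctx[-1]), [])
        if not cands:
            break
        nxt = max(set(cands), key=cands.count)   # most frequent continuation
        draft.append(nxt)
        ctx.append(nxt)
    return draft

def llm_next_token(context):
    # Placeholder for a forward pass of the target LLM; here it reuses the
    # tri-gram table, so in this toy the drafts are always accepted.
    step = propose_draft(context, k=1)
    return step[0] if step else "<eos>"

def decode_step(context):
    draft, accepted = propose_draft(context), []
    for tok in draft:                  # verify draft tokens against the target model
        if llm_next_token(context + accepted) == tok:
            accepted.append(tok)
        else:
            break
    if not accepted:                   # fall back to one ordinary decoding step
        accepted = [llm_next_token(context)]
    observe(context + accepted)        # adapt the tri-gram table over time
    return accepted

observe("the model generates tokens and the model generates text".split())
print(decode_step("the model".split()))
```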
Styles APA, Harvard, Vancouver, ISO, etc.
44

Dehal, Ramandeep Singh, Mehak Sharma et Enayat Rajabi. « Knowledge Graphs and Their Reciprocal Relationship with Large Language Models ». Machine Learning and Knowledge Extraction 7, no 2 (21 avril 2025) : 38. https://doi.org/10.3390/make7020038.

Texte intégral
Résumé :
The reciprocal relationship between Large Language Models (LLMs) and Knowledge Graphs (KGs) highlights their synergistic potential in enhancing artificial intelligence (AI) applications. LLMs, with their natural language understanding and generative capabilities, support the automation of KG construction through entity recognition, relation extraction, and schema generation. Conversely, KGs serve as structured and interpretable data sources that improve the transparency, factual consistency, and reliability of LLM-based applications, mitigating challenges such as hallucinations and lack of explainability. This study conducts a systematic literature review of 77 studies to examine AI methodologies supporting LLM–KG integration, including symbolic AI, machine learning, and hybrid approaches. The research explores diverse applications spanning healthcare, finance, justice, and industrial automation, revealing the transformative potential of this synergy. Through in-depth analysis, this study identifies key limitations in current approaches, including challenges in scaling and maintaining dynamic, real-time Knowledge Graphs, difficulty in adapting general-purpose LLMs to specialized domains, limited explainability in tracing model outputs to interpretable reasoning, and ethical concerns surrounding bias, fairness, and transparency. In response, the study highlights potential strategies to optimize LLM–KG synergy. The findings provide actionable insights for researchers and practitioners aiming to build robust, transparent, and adaptive AI systems that enhance knowledge-driven applications through LLM–KG integration, further advancing generative AI and explainable AI (XAI).
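The LLM-to-KG direction discussed in the review can be pictured with a minimal sketch in which a stubbed LLM extraction step feeds triples into a graph; networkx stands in for a real KG store, and the triples are invented examples.

```python
# Minimal illustration: prompt an LLM for (subject, relation, object) triples
# and load them into a graph. The extraction call is a stub, not a real LLM.
import networkx as nx

def extract_triples(text):
    # Placeholder for an LLM relation-extraction call over `text`.
    return [("aspirin", "treats", "headache"), ("aspirin", "is_a", "NSAID")]

kg = nx.DiGraph()
for s, r, o in extract_triples("Aspirin, an NSAID, is commonly used to treat headache."):
    kg.add_edge(s, o, relation=r)

print(list(kg.edges(data=True)))
```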
Styles APA, Harvard, Vancouver, ISO, etc.
45

Tang, Yiting. « Large Language Models Meet Automated Program Repair : Innovations, Challenges and Solutions ». Applied and Computational Engineering 117, no 1 (19 décembre 2024) : 22–30. https://doi.org/10.54254/2755-2721/2024.18303.

Texte intégral
Résumé :
As the field of Automated Program Repair (APR) continues to evolve, traditional Neural Program Repair (NPR) methods, while successful in low-resource computing scenarios, still confront numerous challenges, including the demand for extensive training data, the limited generality of specially designed networks, and a lack of robustness. In recent years, Large Language Models (LLMs) have demonstrated remarkable efficacy in downstream code-related tasks, thanks to their potent comprehension and text generation capabilities, and have gradually emerged as pivotal tools for automated program repair. Compared to NPR techniques, LLM-based APR exhibits superior repair performance and enhanced generality, leading to its increasing adoption in APR tasks; the performance of zero-shot LLM-based APR has already surpassed that of NPR. Nevertheless, LLM-based APR still faces issues such as excessive fine-tuning costs, data leakage concerns, and a shortage of domain-specific knowledge. This paper reviews and summarizes the latest advancements in LLM-based APR from the perspectives of innovation, challenges, and solutions, providing researchers with deeper insights and future directions.
Styles APA, Harvard, Vancouver, ISO, etc.
46

Long, Robert. « Introspective Capabilities in Large Language Models ». Journal of Consciousness Studies 30, no 9 (30 septembre 2023) : 143–53. http://dx.doi.org/10.53765/20512201.30.9.143.

Texte intégral
Résumé :
This paper considers the kind of introspection that large language models (LLMs) might be able to have. It argues that LLMs, while currently limited in their introspective capabilities, are not inherently unable to have such capabilities: they already model the world, including mental concepts, and already have some introspection-like capabilities. With deliberate training, LLMs may develop introspective capabilities. The paper proposes a method for such training for introspection, situates possible LLM introspection in the 'possible forms of introspection' framework proposed by Kammerer and Frankish, and considers the ethical ramifications of introspection and self-report in AI systems.
Styles APA, Harvard, Vancouver, ISO, etc.
47

Soos, Carlin, et Levon Haroutunian. « On the Question of Authorship in Large Language Models ». KNOWLEDGE ORGANIZATION 51, no 2 (2024) : 83–95. http://dx.doi.org/10.5771/0943-7444-2024-2-83.

Texte intégral
Résumé :
Adoption of pre-trained large language models (LLMs) across an increasingly diverse range of tasks and domains poses significant problems for authorial attribution and other basic knowledge organization practices. Utilizing methods from value-sensitive design, this paper examines the theoretical, practical, and ethical issues introduced by LLMs and describes how their use challenges the supposedly firm boundaries separating specific works and creators. Focusing on the implications of LLM usage for higher education, we use hypothetical value scenarios and stakeholder analysis to weigh the pedagogical risks and benefits of LLM usage, assessing the consequences of their use on and beyond college campuses. While acknowledging the unique challenges presented by this emerging educational trend, we ultimately argue that the issues associated with these novel tools are indicative of preexisting limitations within standard entity-relationship models, not wholly new issues ushered in by the advent of a relatively young technology. We contend that LLM-generated texts largely exacerbate, rather than invent from scratch, the preexisting faults that have frequently posed problems to those seeking to determine, ascribe, and regulate authorship attributions. As the growing popularity of generative AI raises concerns about plagiarism, academic integrity, and intellectual property, we advocate for a reevaluation of reductive work-creator associations and encourage the adoption of more expansive authorial concepts.
Styles APA, Harvard, Vancouver, ISO, etc.
48

Liu, Xingyu. « Research on Optimizing Virtual Reality User Experience Based on Large Language Models ». Journal of Computing and Electronic Information Management 16, no 2 (28 mars 2025) : 5–10. https://doi.org/10.54097/zgaxvc97.

Texte intégral
Résumé :
With the rapid development of virtual reality (VR) technology, how to further improve the user experience in this field has become a research hotspot. Based on Large Language Models (LLMs), this paper discusses their application and optimization paths in the VR field. Firstly, the basic principles and core technologies of LLMs are expounded, with emphasis on their working mechanism. Then, the applications of LLMs in the VR field are discussed, including virtual assistants, intelligent recommendation, natural language interaction and multi-modal collaboration. Finally, a path for optimizing the virtual reality user experience based on LLMs is proposed, aiming to improve the accuracy of voice interaction, realize personalized content recommendation, optimize the interaction quality of the dialogue system and strengthen multi-modal data fusion, so as to enhance the immersion and interactivity of virtual reality.
Styles APA, Harvard, Vancouver, ISO, etc.
49

Xia, Yuchen, Nasser Jazdi et Michael Weyrich. « Applying Large Language Models for Intelligent Industrial Automation ». atp magazin 66, no 6-7 (1 juillet 2024) : 62–71. http://dx.doi.org/10.17560/atp.v66i6-7.2739.

Texte intégral
Résumé :
This paper explores the transformative potential of Large Language Models (LLMs) in industrial automation, presenting a comprehensive framework for their integration into complex industrial systems. We begin with a theoretical overview of LLMs, elucidating their pivotal capabilities such as interpretation, task automation, and autonomous agent functionality. A generic methodology for integrating LLMs into industrial applications is outlined, explaining how to apply LLMs to task-specific applications. Four case studies demonstrate the practical use of LLMs across different industrial environments: transforming unstructured data into a structured asset administration shell model, improving user interactions with document databases through conversational systems, planning and controlling industrial operations autonomously, and interacting with simulation models to determine process parametrization. The studies illustrate the ability of LLMs to manage versatile tasks and interface with digital twins and automation systems, indicating that efficiency and productivity improvements can be achieved by strategically deploying LLM technologies in industrial settings.
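The first case study (unstructured text to a structured, asset-administration-shell-like record) can be sketched as an extraction prompt plus JSON parsing; the schema and values below are simplified assumptions, not the actual Asset Administration Shell metamodel or the paper's implementation.

```python
# Hedged sketch: prompt an LLM to turn an unstructured asset description into a
# simplified structured record, then parse its JSON reply. The LLM call is stubbed.
import json

description = "Pump P-101, manufacturer AcmeFlow, rated power 4 kW, installed 2021."

prompt = (
    "Extract the fields manufacturer, rated_power_kw and year_installed from the "
    f"text below and answer with JSON only.\n\nText: {description}"
)

def parse_llm_output(raw):
    # A real pipeline would send `prompt` to an LLM; here we parse a canned reply.
    return json.loads(raw)

reply = '{"manufacturer": "AcmeFlow", "rated_power_kw": 4, "year_installed": 2021}'
print(parse_llm_output(reply))
```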
Styles APA, Harvard, Vancouver, ISO, etc.
50

Nelson, Jack Wright. « Large language models and the treaty interpretation game ». Cambridge International Law Journal 12, no 2 (28 décembre 2023) : 305–27. http://dx.doi.org/10.4337/cilj.2023.02.08.

Texte intégral
Résumé :
Large language models (LLMs) are currently disrupting law. Yet their precise impact on international law, especially treaty interpretation, remains underexplored. Treaty interpretation can be analogised to a game in which ‘players’ strategically deploy ‘cards’, usually principles of treaty interpretation, to persuade an ‘audience’ that their interpretation is correct. Leveraging this analogy, this paper offers a limited case study of how OpenAI’s ChatGPT, a prominent LLM-based chatbot, navigates the treaty interpretation game. In line with the existing research on ChatGPT’s legal abilities, the author concludes that ChatGPT competently plays the treaty interpretation game. This conclusion leads to a broader discussion of how LLM usage may impact international law’s development. The argument advanced is that, while LLMs have the potential to enhance efficiency and accessibility, biased training data and interpretative standardisation could reinforce international law’s dominant narratives. As such, this paper concludes with a cautionary note: the potential gains derived from LLMs risk being offset by disciplinary stagnation.
Styles APA, Harvard, Vancouver, ISO, etc.
