Academic literature on the topic 'Bandit learning'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Bandit learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Journal articles on the topic "Bandit learning"

1

Ciucanu, Radu, Pascal Lafourcade, Gael Marcadet, and Marta Soare. "SAMBA: A Generic Framework for Secure Federated Multi-Armed Bandits." Journal of Artificial Intelligence Research 73 (February 23, 2022): 737–65. http://dx.doi.org/10.1613/jair.1.13163.

Abstract:
The multi-armed bandit is a reinforcement learning model where a learning agent repeatedly chooses an action (pulls a bandit arm) and the environment responds with a stochastic outcome (reward) coming from an unknown distribution associated with the chosen arm. Bandits have a wide range of applications, such as Web recommendation systems. We address the cumulative reward maximization problem in a secure federated learning setting, where multiple data owners keep their data stored locally and collaborate under the coordination of a central orchestration server. We rely on cryptographic schemes and propose Samba, a generic framework for Secure federAted Multi-armed BAndits. Each data owner has data associated with a bandit arm, and the bandit algorithm has to sequentially select which data owner is solicited at each time step. We instantiate Samba for five bandit algorithms. We show that Samba returns the same cumulative reward as the nonsecure versions of the bandit algorithms, while satisfying formally proven security properties. We also show that the overhead due to cryptographic primitives is linear in the size of the input, which is confirmed by our proof-of-concept implementation.
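
To make the bandit protocol described in this abstract concrete, here is a minimal, self-contained sketch of the underlying model: a plain epsilon-greedy agent pulling Bernoulli arms. This is a generic illustration, not the Samba framework, and the arm means, epsilon, and horizon are illustrative assumptions.

```python
import random

def epsilon_greedy_bandit(true_means, horizon=10_000, epsilon=0.1, seed=0):
    """Simulate a stochastic multi-armed bandit with Bernoulli rewards.

    true_means: success probability of each arm (unknown to the agent).
    Returns the cumulative reward collected by an epsilon-greedy agent.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    pulls = [0] * n_arms          # times each arm was pulled
    estimates = [0.0] * n_arms    # empirical mean reward per arm
    cumulative_reward = 0.0

    for _ in range(horizon):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment responds with a stochastic reward from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        cumulative_reward += reward

        # Incremental update of the empirical mean.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]

    return cumulative_reward

print(epsilon_greedy_bandit([0.2, 0.5, 0.7]))  # illustrative arm means
```
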
2

Azizi, Javad, Branislav Kveton, Mohammad Ghavamzadeh, and Sumeet Katariya. "Meta-Learning for Simple Regret Minimization." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (2023): 6709–17. http://dx.doi.org/10.1609/aaai.v37i6.25823.

Abstract:
We develop a meta-learning framework for simple regret minimization in bandits. In this framework, a learning agent interacts with a sequence of bandit tasks, which are sampled i.i.d. from an unknown prior distribution, and learns its meta-parameters to perform better on future tasks. We propose the first Bayesian and frequentist meta-learning algorithms for this setting. The Bayesian algorithm has access to a prior distribution over the meta-parameters and its meta simple regret over m bandit tasks with horizon n is a mere O(m / √n). On the other hand, the meta simple regret of the frequentist algorithm is O(n√m + m/√n). While its regret is worse, the frequentist algorithm is more general because it does not need a prior distribution over the meta-parameters. It can also be analyzed in more settings. We instantiate our algorithms for several classes of bandit problems. Our algorithms are general, and we complement our theory by evaluating them empirically in several environments.
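
For readers unfamiliar with the objective named above: simple regret measures the suboptimality of the single arm recommended at the end of the exploration phase, rather than the reward accumulated along the way. A standard textbook-style definition (standard notation, not quoted from the cited paper) is:

```latex
r_n \;=\; \mu^{\ast} - \mu_{J_n},
\qquad
\mathbb{E}[r_n] \;=\; \mu^{\ast} - \mathbb{E}\!\left[\mu_{J_n}\right],
```

where \mu^{\ast} is the mean reward of the best arm and J_n is the arm recommended after n rounds of interaction.
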
3

Sharaf, Amr, and Hal Daumé III. "Meta-Learning Effective Exploration Strategies for Contextual Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 11 (2021): 9541–48. http://dx.doi.org/10.1609/aaai.v35i11.17149.

Abstract:
In contextual bandits, an algorithm must choose actions given observed contexts, learning from a reward signal that is observed only for the action chosen. This leads to an exploration/exploitation trade-off: the algorithm must balance taking actions it already believes are good with taking new actions to potentially discover better choices. We develop a meta-learning algorithm, Mêlée, that learns an exploration policy based on simulated, synthetic contextual bandit tasks. Mêlée uses imitation learning against these simulations to train an exploration policy that can be applied to true contextual bandit tasks at test time. We evaluate Mêlée on both a natural contextual bandit problem derived from a learning to rank dataset as well as hundreds of simulated contextual bandit problems derived from classification tasks. Mêlée outperforms seven strong baselines on most of these datasets by leveraging a rich feature representation for learning an exploration strategy.
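
As a rough illustration of the interaction protocol this abstract describes (context in, one action out, reward observed only for that action), the sketch below implements a generic epsilon-greedy contextual bandit learner with one linear reward model per action. It is not the Mêlée algorithm; the class name, dimensions, learning rate, and toy reward rule are assumptions made for the example.

```python
import numpy as np

class EpsilonGreedyContextualBandit:
    """Generic contextual bandit learner: one linear reward model per action,
    updated only with the reward of the action actually chosen (bandit feedback)."""

    def __init__(self, n_actions, dim, epsilon=0.05, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.epsilon = epsilon
        self.lr = lr
        self.weights = np.zeros((n_actions, dim))  # one weight vector per action

    def choose(self, context):
        if self.rng.random() < self.epsilon:            # explore
            return int(self.rng.integers(len(self.weights)))
        return int(np.argmax(self.weights @ context))   # exploit current estimates

    def update(self, context, action, reward):
        # Only the chosen action's model sees the reward signal.
        prediction = self.weights[action] @ context
        self.weights[action] += self.lr * (reward - prediction) * context

# Toy usage: contexts and rewards are synthetic placeholders.
learner = EpsilonGreedyContextualBandit(n_actions=4, dim=8)
rng = np.random.default_rng(1)
for _ in range(100):
    x = rng.normal(size=8)
    a = learner.choose(x)
    r = float(x[a % 8] > 0)   # illustrative reward rule, not meaningful
    learner.update(x, a, r)
```
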
4

Charniauski, Uladzimir, and Yao Zheng. "Autoregressive Bandits in Near-Unstable or Unstable Environment." American Journal of Undergraduate Research 21, no. 2 (2024): 15–25. http://dx.doi.org/10.33697/ajur.2024.116.

Abstract:
AutoRegressive Bandits (ARBs) is a novel model of a sequential decision-making problem as an autoregressive (AR) process. In this online learning setting, the observed reward follows an autoregressive process whose action parameters are unknown to the agent and create an AR dynamic that depends on the actions the agent chooses. This study empirically demonstrates how assigning extreme values to systemic stability indexes and other reward-governing parameters severely impairs ARBs' learning in the respective environment. We show that this algorithm suffers numerically larger regret of higher order under a weakly stable environment and strictly exponential regret under an unstable environment over the considered optimization horizon. We also test ARBs against other bandit baselines in both weakly stable and unstable systems to investigate the deteriorating effect of dropping systemic stability on their performance, and demonstrate the potential advantage of choosing other competing algorithms in case of weakened stability. Finally, we measure the discussed bandit under various assigned values of key input parameters to study how we can possibly improve this algorithm's performance under these extreme environmental conditions.
Keywords: Reinforcement Learning; Machine Learning; Autoregressive Processes; Bandit Algorithms; Non-Stationary Bandits; Online Learning
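
A bare-bones illustration of the reward dynamics described above: an action-dependent AR(1) process. The coefficient values and the random placeholder policy are illustrative assumptions, and the actual ARB model is more general.

```python
import numpy as np

rng = np.random.default_rng(0)
# Per-action AR coefficients (illustrative). Values close to 1 correspond to the
# near-unstable regime, and values above 1 to the unstable regime studied here.
gamma = {0: 0.5, 1: 0.95, 2: 1.02}

reward, history = 0.0, []
for t in range(200):
    action = int(rng.integers(3))   # placeholder policy; a bandit would learn this choice
    # The next reward depends on the previous reward through the chosen action's dynamics.
    reward = gamma[action] * reward + 1.0 + rng.normal(scale=0.1)
    history.append(reward)

print(round(history[-1], 2))
```
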
5

Zhao, Yunfan, Tonghan Wang, Dheeraj Mysore Nagaraj, Aparna Taneja, and Milind Tambe. "The Bandit Whisperer: Communication Learning for Restless Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 22 (2025): 23404–13. https://doi.org/10.1609/aaai.v39i22.34508.

Abstract:
Applying Reinforcement Learning (RL) to Restless Multi-Arm Bandits (RMABs) offers a promising avenue for addressing allocation problems with resource constraints and temporal dynamics. However, classic RMAB models largely overlook the challenges of (systematic) data errors - a common occurrence in real-world scenarios due to factors like varying data collection protocols and intentional noise for differential privacy. We demonstrate that conventional RL algorithms used to train RMABs can struggle to perform well in such settings. To solve this problem, we propose the first communication learning approach in RMABs, where we study which arms, when involved in communication, are most effective in mitigating the influence of such systematic data errors. In our setup, the arms receive Q-function parameters from similar arms as messages to guide behavioral policies, steering Q-function updates. We learn communication strategies by considering the joint utility of messages across all pairs of arms and using a Q-network architecture that decomposes the joint utility. Both theoretical and empirical evidence validate the effectiveness of our method in significantly improving RMAB performance across diverse problems.
6

Wan, Zongqi, Zhijie Zhang, Tongyang Li, Jialin Zhang, and Xiaoming Sun. "Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (2023): 10087–94. http://dx.doi.org/10.1609/aaai.v37i8.26202.

Abstract:
Multi-armed bandit (MAB) and stochastic linear bandit (SLB) are important models in reinforcement learning, and it is well known that classical algorithms for bandits with time horizon T suffer regret of at least the square root of T. In this paper, we study MAB and SLB with quantum reward oracles and propose quantum algorithms for both models with regret of order polylog T, exponentially improving the dependence on T. To the best of our knowledge, this is the first provable quantum speedup for the regret of bandit problems and, more generally, for exploitation in reinforcement learning. Compared to previous literature on quantum exploration algorithms for MAB and reinforcement learning, our quantum input model is simpler and only assumes quantum oracles for each individual arm.
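
The contrast the abstract draws can be summarized as follows (order notation only, constants and logarithmic factors suppressed; this is a paraphrase of the stated rates, not the paper's exact theorem statements):

```latex
\text{classical bandit algorithms:}\quad R_T = \Omega\!\big(\sqrt{T}\big),
\qquad
\text{with quantum reward oracles:}\quad R_T = O\!\big(\operatorname{poly}\log T\big).
```
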
7

Yang, Luting, Jianyi Yang, and Shaolei Ren. "Contextual Bandits with Delayed Feedback and Semi-supervised Learning (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (2021): 15943–44. http://dx.doi.org/10.1609/aaai.v35i18.17968.

Abstract:
Contextual multi-armed bandit (MAB) is a classic online learning problem, where a learner/agent selects actions (i.e., arms) given contextual information and discovers optimal actions based on reward feedback. Applications of contextual bandits have been steadily expanding, including advertisement, personalization, and resource allocation in wireless networks, among others. Nonetheless, the reward feedback is delayed in many applications (e.g., a user may only provide service ratings after a period of time), creating challenges for contextual bandits. In this paper, we address delayed feedback in contextual bandits by using semi-supervised learning: we incorporate estimates of the delayed rewards to improve the estimation of future rewards. Concretely, the reward feedback for an arm selected at the beginning of a round is only observed by the agent/learner with some observation noise and is provided to the agent after a priori unknown but bounded delays. Motivated by semi-supervised learning, which produces pseudo labels for unlabeled data to further improve model performance, we generate fictitious estimates of rewards that are delayed and have yet to arrive, based on already-learnt reward functions. Thus, by combining semi-supervised learning with online contextual bandit learning, we propose a novel extension and design two algorithms, which estimate the values of currently unavailable reward feedback to minimize the maximum estimation error and the average estimation error, respectively.
8

Zhou, Pengjie, Haoyu Wei, and Huiming Zhang. "Selective Reviews of Bandit Problems in AI via a Statistical View." Mathematics 13, no. 4 (2025): 665. https://doi.org/10.3390/math13040665.

Abstract:
Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes multi-armed bandit (MAB) and stochastic continuum-armed bandit (SCAB) problems, which model sequential decision-making under uncertainty. This review outlines the foundational models and assumptions of bandit problems, explores non-asymptotic theoretical tools like concentration inequalities and minimax regret bounds, and compares frequentist and Bayesian algorithms for managing exploration–exploitation trade-offs. Additionally, we explore K-armed contextual bandits and SCAB, focusing on their methodologies and regret analyses. We also examine the connections between SCAB problems and functional data analysis. Finally, we highlight recent advances and ongoing challenges in the field.
9

Kapoor, Sayash, Kumar Kshitij Patel, and Purushottam Kar. "Corruption-tolerant bandit learning." Machine Learning 108, no. 4 (2018): 687–715. http://dx.doi.org/10.1007/s10994-018-5758-5.

10

Qu, Jiaming. "Survey of dynamic pricing based on Multi-Armed Bandit algorithms." Applied and Computational Engineering 37, no. 1 (2024): 160–65. http://dx.doi.org/10.54254/2755-2721/37/20230497.

Abstract:
Dynamic pricing seeks to determine the optimal selling price for a product or service, taking into account factors like limited supply and uncertain demand. This study aims to provide a comprehensive exploration of dynamic pricing using the multi-armed bandit problem framework in various contexts. The investigation highlights the prevalence of Thompson sampling in dynamic pricing scenarios with a Bayesian backdrop, where the seller possesses prior knowledge of demand functions. On the other hand, in non-Bayesian situations, the Upper Confidence Bound (UCB) algorithm family gains traction due to its favorable regret bounds. As markets often exhibit temporal fluctuations, the domain of non-stationary multi-armed bandits within dynamic pricing emerges as crucial. Future research directions include enhancing traditional multi-armed bandit algorithms to suit online learning settings, especially those involving dynamic reward distributions. Additionally, merging prior insights into demand functions with contextual multi-armed bandit approaches holds promise for advancing dynamic pricing strategies. In conclusion, this study sheds light on dynamic pricing through the lens of multi-armed bandit problems, offering insights and pathways for further exploration.
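
To ground the UCB family mentioned in this survey, here is a minimal UCB1 sketch that treats each candidate price as an arm and the normalized revenue as the reward. The price grid and purchase probabilities are illustrative assumptions, not data from the surveyed papers.

```python
import math
import random

def ucb1_pricing(prices, purchase_prob, horizon=5000, seed=0):
    """UCB1 over a discrete price grid: each price is an arm, reward = normalized revenue."""
    rng = random.Random(seed)
    n = len(prices)
    max_price = max(prices)
    pulls = [0] * n
    means = [0.0] * n

    def reward(i):
        # Simulated demand: the customer buys with a price-dependent probability.
        # Revenue is scaled to [0, 1] so UCB1's confidence bonus is on the right scale.
        return prices[i] / max_price if rng.random() < purchase_prob[i] else 0.0

    for i in range(n):               # initialization: try every price once
        means[i], pulls[i] = reward(i), 1

    for t in range(n, horizon):
        # UCB1 index: empirical mean plus an optimism bonus that shrinks with pulls.
        idx = max(range(n), key=lambda i: means[i] + math.sqrt(2 * math.log(t + 1) / pulls[i]))
        r = reward(idx)
        pulls[idx] += 1
        means[idx] += (r - means[idx]) / pulls[idx]

    return prices[max(range(n), key=lambda i: means[i])]

# Illustrative grid: higher prices sell less often.
print(ucb1_pricing(prices=[5, 10, 15, 20], purchase_prob=[0.9, 0.6, 0.35, 0.2]))
```
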
More sources

Dissertations / Theses on the topic "Bandit learning"

1

Liu, Fang. "Efficient Online Learning with Bandit Feedback." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1587680990430268.

2

Klein, Nicolas. "Learning and Experimentation in Strategic Bandit Problems." Diss., LMU München, 2010. http://nbn-resolving.de/urn:nbn:de:bvb:19-122728.

3

Talebi, Mazraeh Shahi Mohammad Sadegh. "Online Combinatorial Optimization under Bandit Feedback." Licentiate thesis, KTH, Reglerteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-181321.

Abstract:
Multi-Armed Bandits (MAB) constitute the most fundamental model for sequential decision making problems with an exploration vs. exploitation trade-off. In such problems, the decision maker selects an arm in each round and observes a realization of the corresponding unknown reward distribution. Each decision is based on past decisions and observed rewards. The objective is to maximize the expected cumulative reward over some time horizon by balancing exploitation (arms with higher observed rewards should be selected often) and exploration (all arms should be explored to learn their average rewards). Equivalently, the performance of a decision rule or algorithm can be measured through its expected regret, defined as the gap between the expected reward achieved by the algorithm and that achieved by an oracle algorithm always selecting the best arm. This thesis investigates stochastic and adversarial combinatorial MAB problems, where each arm is a collection of several basic actions taken from a set of $d$ elements, in a way that the set of arms has a certain combinatorial structure. Examples of such sets include the set of fixed-size subsets, matchings, spanning trees, paths, etc. These problems are specific forms of online linear optimization, where the decision space is a subset of the $d$-dimensional hypercube. Due to the combinatorial nature, the number of arms generically grows exponentially with $d$. Hence, treating arms as independent and applying classical sequential arm selection policies would yield a prohibitive regret. It may then be crucial to exploit the combinatorial structure of the problem to design efficient arm selection algorithms.

As the first contribution of this thesis, in Chapter 3 we investigate combinatorial MABs in the stochastic setting and with Bernoulli rewards. We derive asymptotic (i.e., when the time horizon grows large) lower bounds on the regret of any algorithm under bandit and semi-bandit feedback. The proposed lower bounds are problem-specific and tight in the sense that there exists an algorithm that achieves these regret bounds. Our derivation leverages some theoretical results in adaptive control of Markov chains. Under semi-bandit feedback, we further discuss the scaling of the proposed lower bound with the dimension of the underlying combinatorial structure. For the case of semi-bandit feedback, we propose ESCB, an algorithm that efficiently exploits the structure of the problem and provide a finite-time analysis of its regret. ESCB has better performance guarantees than existing algorithms, and significantly outperforms these algorithms in practice.

In the fourth chapter, we consider stochastic combinatorial MAB problems where the underlying combinatorial structure is a matroid. Specializing the results of Chapter 3 to matroids, we provide explicit regret lower bounds for this class of problems. For the case of semi-bandit feedback, we propose KL-OSM, a computationally efficient greedy-based algorithm that exploits the matroid structure. Through a finite-time analysis, we prove that the regret upper bound of KL-OSM matches the proposed lower bound, thus making it the first asymptotically optimal algorithm for this class of problems. Numerical experiments validate that KL-OSM outperforms state-of-the-art algorithms in practice, as well.

In the fifth chapter, we investigate the online shortest-path routing problem, which is an instance of combinatorial MABs with geometric rewards. We consider and compare three different types of online routing policies, depending (i) on where routing decisions are taken (at the source or at each node), and (ii) on the received feedback (semi-bandit or bandit). For each case, we derive the asymptotic regret lower bound. These bounds help us to understand the performance improvements we can expect when (i) taking routing decisions at each hop rather than at the source only, and (ii) observing per-link delays rather than end-to-end path delays. In particular, we show that (i) is of no use while (ii) can have a spectacular impact. For source routing under semi-bandit feedback, we then propose two algorithms with a trade-off between computational complexity and performance. The regret upper bounds of these algorithms improve over those of the existing algorithms, and they significantly outperform state-of-the-art algorithms in numerical experiments.

Finally, we discuss combinatorial MABs in the adversarial setting and under bandit feedback. We concentrate on the case where arms have the same number of basic actions but are otherwise arbitrary. We propose CombEXP, an algorithm that has the same regret scaling as state-of-the-art algorithms. Furthermore, we show that CombEXP admits lower computational complexity for some combinatorial problems.
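
The verbal definition of regret given in this abstract corresponds to the standard formula below (standard notation, not copied from the thesis): with \mu_a the expected reward of arm a and a_t the arm selected in round t,

```latex
R_T \;=\; T \max_{a} \mu_a \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{a_t}\right].
```
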
4

Lomax, S. E. "Cost-sensitive decision tree learning using a multi-armed bandit framework." Thesis, University of Salford, 2013. http://usir.salford.ac.uk/29308/.

Abstract:
Decision tree learning is one of the main methods of learning from data. It has been applied to a variety of different domains over the past three decades. In the real world, accuracy is not enough; there are costs involved: those of obtaining the data and those incurred when classification errors occur. A comprehensive survey of cost-sensitive decision tree learning has identified over 50 algorithms, developing a taxonomy in order to classify the algorithms by the way in which cost has been incorporated, and a recent comparison shows that many cost-sensitive algorithms can process balanced, two-class datasets well, but produce lower accuracy rates in order to achieve lower costs when the dataset is less balanced or has multiple classes. This thesis develops a new framework and algorithm concentrating on the view that cost-sensitive decision tree learning involves a trade-off between costs and accuracy. Decisions arising from these two viewpoints can often be incompatible, resulting in reduced accuracy rates. The new framework builds on a specific game theory problem known as the multi-armed bandit. This problem concerns a scenario in which both exploration and exploitation are required. For example, a player in a casino has to decide which slot machine (bandit) from a selection of slot machines is likely to pay out the most. Game theory proposes a solution to this problem based on a process of exploration and exploitation in which reward is maximized. This thesis utilizes these concepts from the multi-armed bandit game to develop a new algorithm by viewing the rewards as a reduction in costs, utilizing the exploration and exploitation techniques so that a compromise between decisions based on accuracy and decisions based on costs can be found. The algorithm employs the adapted multi-armed bandit game to select the attributes during decision tree induction, using a look-ahead methodology to explore potential attributes and exploit the attributes which maximize the reward. The new algorithm is evaluated on fifteen datasets and compared to six well-known algorithms: J48, EG2, MetaCost, AdaCostM1, ICET and ACT. The results obtained show that the new multi-armed bandit based algorithm can produce more cost-effective trees without compromising accuracy. The thesis also includes a critical appraisal of the limitations of the developed algorithm and proposes avenues for further research.
5

Sakhi, Otmane. "Offline Contextual Bandit: Theory and Large Scale Applications." Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAG011.

Abstract:
This thesis presents contributions to the problem of learning from logged interactions using the offline contextual bandit framework. We are interested in two related topics: (1) offline policy learning with performance certificates, and (2) fast and efficient policy learning applied to large scale, real world recommendation. For (1), we first leverage results from the distributionally robust optimisation framework to construct asymptotic, variance-sensitive bounds to evaluate policies' performances. These bounds lead to new, more practical learning objectives thanks to their composite nature and straightforward calibration. We then analyse the problem from the PAC-Bayesian perspective, and provide tighter, non-asymptotic bounds on the performance of policies. Our results motivate new strategies that offer performance certificates before deploying the policies online. The newly derived strategies rely on composite learning objectives that do not require additional tuning. For (2), we first propose a hierarchical Bayesian model that combines different signals to efficiently estimate the quality of recommendation. We provide proper computational tools to scale the inference to real world problems, and demonstrate empirically the benefits of the approach in multiple scenarios. We then address the question of accelerating common policy optimisation approaches, particularly focusing on recommendation problems with catalogues of millions of items. We derive optimisation routines, based on new gradient approximations, computed in logarithmic time with respect to the catalogue size. Our approach improves on common, linear time gradient computations, yielding fast optimisation with no loss in the quality of the learned policies.
6

Cella, Leonardo. "Efficiency and Realism in Stochastic Bandits." Doctoral thesis, Università degli Studi di Milano, 2021. http://hdl.handle.net/2434/807862.

Abstract:
This manuscript is dedicated to the analysis of the application of stochastic bandits to the recommender systems domain. Here a learning agent sequentially recommends one item from a catalog of available alternatives. Consequently, the environment returns a reward that is a noisy observation of the rating associated with the suggested item. The peculiarity of the bandit setting is that no information is given about items that were not recommended, and the collected rewards are the only information available to the learning agent. By relying on them, the learner adapts its strategy towards reaching its learning objective, that is, maximizing the cumulative reward collected over all the interactions. In this dissertation we cover the investigation of two main research directions: the development of efficient learning algorithms and the introduction of a more realistic learning setting. In addressing the former objective we propose two approaches to speed up the learning process. The first solution aims to reduce the computational costs associated with the learning procedure, while the second's goal is to boost the learning phase by relying on data corresponding to terminated recommendation sessions. Regarding the latter research line, we propose a novel setting representing use-cases that do not fit in the standard bandit model.
7

Liu, Sige. "Bandit Learning Enabled Task Offloading and Resource Allocation in Mobile Edge Computing." Thesis, The University of Sydney, 2022. https://hdl.handle.net/2123/29719.

Abstract:
The Internet-of-Things (IoT) is envisioned as a promising paradigm for carrying the interconnections of massive devices through various communications protocols. With the rapid development of fifth-generation (5G), IoT has incentivized a large number of new computation-intensive applications and bridges diverse technologies to provide ubiquitous services with intelligence. However, with billions of devices anticipated to be connected in IoT systems in the coming years, IoT devices face a series of challenges from their inherent features. For instance, the IoT devices are usually densely deployed, and the vast data exchange among numerous devices will cause large overheads and communication/computing resource limitations. Integrated with mobile edge computing (MEC), which pushes the computation and storage resources to the edge of the network much closer to the local devices, IoT systems will benefit from a low propagation delay and privacy/security enhancement. Hence, merging MEC and IoT is a new promising paradigm for task offloading and resource allocation in future wireless communications in mobile networks. In this thesis, we introduce different task offloading and resource allocation strategies for IoT devices to efficiently utilize the limited resource, e.g., spectrum, computation, and budget. Bandit learning (BL), a typical online learning approach, offers a promising solution to deal with the communication/computing resource limitation. The inherent idea behind MEC is to design policies to make a better selection for devices or MEC servers. This coincides with the design purpose of BL. This match-in mechanism provides selection policies for better performance, such as lower latency, lower energy consumption, and higher task completion ratio.
8

Jedor, Matthieu. "Bandit algorithms for recommender system optimization." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM027.

Abstract:
In this PhD thesis, we study the optimization of recommender systems with the objective of providing more refined item suggestions to users. The task is modeled using the multi-armed bandit framework. In a first part, we address two problems that commonly occur in recommender systems: the large number of items to handle and the management of sponsored content. In a second part, we investigate the empirical performance of bandit algorithms, and in particular how to tune conventional algorithms to improve results in the stationary and non-stationary environments that arise in practice. This leads us to analyze, both theoretically and empirically, the greedy algorithm, which in some cases outperforms the state of the art.
9

Louëdec, Jonathan. "Stratégies de bandit pour les systèmes de recommandation." Thesis, Toulouse 3, 2016. http://www.theses.fr/2016TOU30257/document.

Abstract:
Current recommender systems need to recommend items that are relevant to users (exploitation), but they must also be able to continuously obtain new information about items and users (exploration). This is the exploration/exploitation dilemma. Such an environment falls within what is called "reinforcement learning". In the statistical literature, bandit strategies are known to provide solutions to this dilemma. The contributions of this multidisciplinary thesis adapt these strategies to address several problems that arise in recommender systems, such as recommending several items simultaneously, taking into account the aging of an item's popularity, and recommending in real time.
10

Nakhe, Paresh. "On bandit learning and pricing in markets." Doctoral thesis, reviewed by Martin Hoefer and Georg Schnitger. Frankfurt am Main: Universitätsbibliothek Johann Christian Senckenberg, 2018. http://d-nb.info/1167856740/34.

More sources

Books on the topic "Bandit learning"

1

Garofalo, Robert Joseph. Chorale and Shaker dance by John P. Zdechlik: A teaching-learning unit. Meredith Music Publications, 1999.

2

Garofalo, Robert Joseph. Suite française by Darius Milhaud: A teaching-learning unit. Meredith Music Publications, 1998.

3

Garofalo, Robert Joseph. On a hymnsong of Philip Bliss by David R. Holsinger: A teaching/learning unit. Meredith Music Publications, 2000.

4

Patterson, James. Retour au collège: Le pire endroit du monde! Hachette romans, 2016.

5

Tebbetts, Christopher, and Laura Park, ill., eds. Just my rotten luck. Little, Brown and Company, 2015.

6

Bubeck, Sébastien, and Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems. Now Publishers, 2012.

7

Zhao, Qing, and R. Srikant. Multi-Armed Bandits: Theory and Applications to Online Learning in Networks. Morgan & Claypool Publishers, 2019.

8

Zhao, Qing. Multi-Armed Bandits: Theory and Applications to Online Learning in Networks. Springer International Publishing AG, 2019.

More sources

Book chapters on the topic "Bandit learning"

1

Kakas, Antonis C., David Cohn, Sanjoy Dasgupta, et al. "Associative Bandit Problem." In Encyclopedia of Machine Learning. Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-30164-8_39.

2

Mannor, Shie, Xin Jin, Jiawei Han, et al. "k-Armed Bandit." In Encyclopedia of Machine Learning. Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-30164-8_424.

3

Fürnkranz, Johannes, Philip K. Chan, Susan Craw, et al. "Multi-Armed Bandit." In Encyclopedia of Machine Learning. Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-30164-8_565.

4

Fürnkranz, Johannes, Philip K. Chan, Susan Craw, et al. "Multi-Armed Bandit Problem." In Encyclopedia of Machine Learning. Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-30164-8_566.

5

Mannor, Shie. "k-Armed Bandit." In Encyclopedia of Machine Learning and Data Mining. Springer US, 2017. http://dx.doi.org/10.1007/978-1-4899-7687-1_424.

6

Madani, Omid, Daniel J. Lizotte, and Russell Greiner. "The Budgeted Multi-armed Bandit Problem." In Learning Theory. Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-540-27819-1_46.

7

Munro, Paul, Hannu Toivonen, Geoffrey I. Webb, et al. "Bandit Problem with Side Information." In Encyclopedia of Machine Learning. Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-30164-8_54.

8

Munro, Paul, Hannu Toivonen, Geoffrey I. Webb, et al. "Bandit Problem with Side Observations." In Encyclopedia of Machine Learning. Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-30164-8_55.

9

Agarwal, Mudit, and Naresh Manwani. "ALBIF: Active Learning with BandIt Feedbacks." In Advances in Knowledge Discovery and Data Mining. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-05981-0_28.

10

Vermorel, Joannès, and Mehryar Mohri. "Multi-armed Bandit Algorithms and Empirical Evaluation." In Machine Learning: ECML 2005. Springer Berlin Heidelberg, 2005. http://dx.doi.org/10.1007/11564096_42.


Conference papers on the topic "Bandit learning"

1

Agarwal, Arpit, Rohan Ghuge, and Viswanath Nagarajan. "Semi-Bandit Learning for Monotone Stochastic Optimization*." In 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2024. http://dx.doi.org/10.1109/focs61266.2024.00083.

2

Xu, Zhuofan, Benedikt Bollig, Matthias Függer, and Thomas Nowak. "Permutation Equivariant Deep Reinforcement Learning for Multi-Armed Bandit." In 2024 IEEE 36th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2024. https://doi.org/10.1109/ictai62512.2024.00140.

3

Li, Miao, Siyi Qiu, Jiong Liu, and Wenping Song. "Content Caching Optimization Based on Improved Bandit Learning Algorithm." In 2024 33rd International Conference on Computer Communications and Networks (ICCCN). IEEE, 2024. http://dx.doi.org/10.1109/icccn61486.2024.10637635.

4

Wu, Xiaoyi, and Bin Li. "Achieving Regular and Fair Learning in Combinatorial Multi-Armed Bandit." In IEEE INFOCOM 2024 - IEEE Conference on Computer Communications. IEEE, 2024. http://dx.doi.org/10.1109/infocom52122.2024.10621191.

5

Huang, Duo. "The Development and Future Challenges of the Multi-armed Bandit Algorithm." In 2024 International Conference on Image Processing, Computer Vision and Machine Learning (ICICML). IEEE, 2024. https://doi.org/10.1109/icicml63543.2024.10957859.

6

Sushma, M., and K. P. Naveen. "Multi-Armed Bandit Based Learning Algorithms for Offloading in Queueing Systems." In 2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring). IEEE, 2024. http://dx.doi.org/10.1109/vtc2024-spring62846.2024.10683365.

7

Hossain, Abrar, Abdel-Hameed A. Badawy, Mohammad A. Islam, Tapasya Patki, and Kishwar Ahmed. "HPC Application Parameter Autotuning on Edge Devices: A Bandit Learning Approach." In 2024 IEEE 31st International Conference on High Performance Computing, Data, and Analytics (HiPC). IEEE, 2024. https://doi.org/10.1109/hipc62374.2024.00011.

8

das Dores, Silvia Cristina Nunes, Carlos Soares, and Duncan Ruiz. "Bandit-Based Automated Machine Learning." In 2018 7th Brazilian Conference on Intelligent Systems (BRACIS). IEEE, 2018. http://dx.doi.org/10.1109/bracis.2018.00029.

9

Deng, Kun, Chris Bourke, Stephen Scott, Julie Sunderman, and Yaling Zheng. "Bandit-Based Algorithms for Budgeted Learning." In 2007 7th IEEE International Conference on Data Mining (ICDM '07). IEEE, 2007. http://dx.doi.org/10.1109/icdm.2007.91.

10

Zong, Jun, Ting Liu, Zhaowei Zhu, Xiliang Luo, and Hua Qian. "Social Bandit Learning: Strangers Can Help." In 2020 International Conference on Wireless Communications and Signal Processing (WCSP). IEEE, 2020. http://dx.doi.org/10.1109/wcsp49889.2020.9299725.


Reports on the topic "Bandit learning"

1

Shum, Matthew, Yingyao Hu, and Yutaka Kayaba. Nonparametric learning rules from bandit experiments: the eyes have it! Institute for Fiscal Studies, 2010. http://dx.doi.org/10.1920/wp.cem.2010.1510.

2

Liu, Haoyang, Keqin Liu, and Qing Zhao. Learning in A Changing World: Non-Bayesian Restless Multi-Armed Bandit. Defense Technical Information Center, 2010. http://dx.doi.org/10.21236/ada554798.

3

Berlin, Noémie, Jan Dul, Marco Gazel, Louis Lévy-Garboua, and Todd Lubart. Creative Cognition as a Bandit Problem. CIRANO, 2023. http://dx.doi.org/10.54932/anre7929.

Abstract:
This paper characterizes creative cognition as a multi-armed bandit problem involving a trade-off between exploration and exploitation in sequential decisions from experience taking place in novel uncertain environments. Creative cognition implements an efficient learning process in this kind of dynamic decision. Special emphasis is put on the optimal sequencing of divergent and convergent behavior by showing that divergence must be inhibited at one point to converge toward creative behavior so that excessive divergence is counterproductive. We test this hypothesis in two behavioral experiments, using both novel and well-known tasks and precise measures of individual differences in creative potential in middle and high school students. Results in both studies confirmed that a task-dependent mix of divergence and convergence predicted high performance in a production task and better satisfaction in a consumption task, but exclusively in novel uncertain environments. These predictions were maintained after controlling for gender, personality, incentives, and other factors. As hypothesized, creative cognition was shown to be necessary for high performance under the appropriate conditions. However, it was not necessary for getting high grades in a traditional school system.
4

García Marrugo, Alexandra I., Katherine Olston, Josh Aarts, Dashiell Moore, and Syed Kaliyadan. SCANA: Supporting students’ academic language development at The University of Sydney. Journal of the Australian and New Zealand Student Services Association, 2023. http://dx.doi.org/10.30688/janzssa.2023-2-01.

Abstract:
In 2021, the Learning Hub at The University of Sydney launched the Student Communication and Needs Analysis (SCANA). This program of support consists of a screening language task and associated support interventions in first year units of study (UoS). The self-marking online screening tool developed by the Language Testing Research Centre at The University of Melbourne classifies students into three bands, with Band 1 identifying students at risk of academic failure due to insufficient language proficiency. All students in selected UoS are encouraged to take SCANA and offered academic language support according to their needs. Students identified in Band 1 are advised to attend discipline-specific support targeting the language issues associated with written assignments. These students are also informed about other offerings, such as one-on-one consultations, generic academic workshops, peer-facilitated programs, and self-access resources. Students in Bands 2 and 3 are also offered options according to their level. The results from Semester 1 2022 showed that students identified in Band 1 who attended at least two support workshops obtained, on average, 12 more points in their final grade and were up to five times less likely to fail than those in Band 1 who did not attend any workshops. These promising results have motivated faculty to expand the program from seven UoS in 2021 to 32 UoS in 2023.
5

Alwan, Iktimal, Dennis D. Spencer, and Rafeed Alkawadri. Comparison of Machine Learning Algorithms in Sensorimotor Functional Mapping. Progress in Neurobiology, 2023. http://dx.doi.org/10.60124/j.pneuro.2023.30.03.

Abstract:
Objective: To compare the performance of popular machine learning (ML) algorithms in mapping the sensorimotor cortex (SM) and identifying the anterior lip of the central sulcus (CS). Methods: We evaluated support vector machines (SVMs), random forest (RF), decision trees (DT), single layer perceptron (SLP), and multilayer perceptron (MLP) against standard logistic regression (LR) to identify the SM cortex, employing validated features from six minutes of NREM sleep icEEG data and applying standard common hyperparameters and 10-fold cross-validation. Each algorithm was tested using vetted features based on the statistical significance of classical univariate analysis (p<0.05) and an extended set of 17 features representing power/coherence of different frequency bands, entropy, and interelectrode-based distance. The analysis was performed before and after weight adjustment for imbalanced data (w). Results: 7 subjects and 376 contacts were included. Before optimization, ML algorithms performed comparably employing conventional features (median CS accuracy: 0.89, IQR [0.88-0.9]). After optimization, neural networks outperformed the others in terms of accuracy (MLP: 0.86), area under the curve (AUC) (SLPw, MLPw, MLP: 0.91), recall (SLPw: 0.82, MLPw: 0.81), precision (SLPw: 0.84), and F1-scores (SLPw: 0.82). SVM achieved the best specificity performance. Extending the number of features and adjusting the weights improved recall, precision, and F1-scores by 48.27%, 27.15%, and 39.15%, respectively, with gains or no significant losses in specificity and AUC across CS and Function (correlation r=0.71 between the two clinical scenarios in all performance metrics, p<0.001). Interpretation: Computational passive sensorimotor mapping is feasible and reliable. Feature extension and weight adjustments improve the performance and counterbalance the accuracy paradox. Optimized neural networks outperform other ML algorithms even in binary classification tasks. The best-performing models and the MATLAB® routine employed in signal processing are available to the public at (Link 1).
6

Maloney, Megan, Sarah Becker, Andrew Griffin, Susan Lyon, and Kristofer Lasko. Automated built-up infrastructure land cover extraction using index ensembles with machine learning, automated training data, and red band texture layers. Engineer Research and Development Center (U.S.), 2024. http://dx.doi.org/10.21079/11681/49370.

Abstract:
Automated built-up infrastructure classification is a global need for planning. However, individual indices have weaknesses, including spectral confusion with bare ground, and computational requirements for deep learning are intensive. We present a computationally lightweight method to classify built-up infrastructure. We use an ensemble of spectral indices and a novel red-band texture layer with global thresholds determined from 12 diverse sites (two seasonally varied images per site). Multiple spectral indexes were evaluated using Sentinel-2 imagery. Our texture metric uses the red band to separate built-up infrastructure from spectrally similar bare ground. Our evaluation produced global thresholds by evaluating ground truth points against a range of site-specific optimal index thresholds across the 24 images. These were used to classify an ensemble, and then spectral indexes, texture, and stratified random sampling guided training data selection. The training data fit a random forest classifier to create final binary maps. Validation found an average overall accuracy of 79.95% (±4%) and an F1 score of 0.5304 (±0.07). The inclusion of the texture metric improved overall accuracy by 14–21%. A comparison to site-specific thresholds and a deep learning-derived layer is provided. This automated built-up infrastructure mapping framework requires only public imagery to support time-sensitive land management workflows.
7

Konsam, Manis Kumar, Amanda Thounajam, Prasad Vaidya, Gopikrishna A, Uthej Dalavai, and Yashima Jain. Machine Learning-Enhanced Control System for Optimized Ceiling Fan and Air Conditioner Operation for Thermal Comfort. Indian Institute for Human Settlements, 2024. http://dx.doi.org/10.24943/mlcsocfacotc6.2023.

Abstract:
This paper proposes and tests the implementation of a sustainable cooling approach that uses a machine learning model to predict operative temperatures, and an automated control sequence that prioritises ceiling fans over air conditioners. The robustness of the machine learning model (MLM) is tested by comparing its prediction with that of a straight-line model (SLM) using the metrics of Mean Bias Error (MBE) and Root Mean Squared Error (RMSE). This comparison is done across several rooms to see how each prediction method performs when the conditions are different from those of the original room where the model was trained. A control sequence has been developed where the MLM’s prediction of Operative Temperature (OT) is used to adjust the adaptive thermal comfort band for increased air speed delivered by the ceiling fans to maintain acceptable OT. This control sequence is tested over a two-week period in two different buildings by comparing it with a constant air temperature setpoint (24ºC).
8

McElhaney, Kevin W., Kelly Mills, Danae Kamdar, Anthony Baker, and Jeremy Roschelle. A Summary and Synthesis of Initial OpenSciEd Research. Digital Promise, 2023. http://dx.doi.org/10.51388/20.500.12265/171.

Abstract:
This report summarizes and synthesizes OpenSciEd research published as of August 2022, addressing two questions about OpenSciEd: (1) To what extent do teachers enact OpenSciEd units with integrity to its distinctive principles? and (2) To what extent do OpenSciEd teacher tools and professional learning experiences support teachers to enact OpenSciEd with integrity? This review includes 16 publications (journal articles, peer-reviewed conference proceedings, conference papers, doctoral dissertations, and published reports). Five of the papers focus on the design of OpenSciEd materials and do not have an empirical focus, seven have an empirical focus on classroom enactment, and four have an empirical focus on teacher supports. All but one of the papers were co-authored by affiliates of the OpenSciEd middle school development consortium, and all but one focus on the middle school grade band.
9

Olivier, Jason, and Sally Shoop. Imagery classification for autonomous ground vehicle mobility in cold weather environments. Engineer Research and Development Center (U.S.), 2021. http://dx.doi.org/10.21079/11681/42425.

Abstract:
Autonomous ground vehicle (AGV) research for military applications is important for developing ways to remove soldiers from harm’s way. Current AGV research tends toward operations in warm climates and this leaves the vehicle at risk of failing in cold climates. To ensure AGVs can fulfill a military vehicle’s role of being able to operate on- or off-road in all conditions, consideration needs to be given to terrain of all types to inform the on-board machine learning algorithms. This research aims to correlate real-time vehicle performance data with snow and ice surfaces derived from multispectral imagery with the goal of aiding in the development of a truly all-terrain AGV. Using the image data that correlated most closely to vehicle performance the images were classified into terrain units of most interest to mobility. The best image classification results were obtained when using Short Wave InfraRed (SWIR) band values and a supervised classification scheme, resulting in over 95% accuracy.
10

Becker, Sarah, Megan Maloney, and Andrew Griffin. A multi-biome study of tree cover detection using the Forest Cover Index. Engineer Research and Development Center (U.S.), 2021. http://dx.doi.org/10.21079/11681/42003.

Abstract:
Tree cover maps derived from satellite and aerial imagery directly support civil and military operations. However, distinguishing tree cover from other vegetative land covers is an analytical challenge. While the commonly used Normalized Difference Vegetation Index (NDVI) can identify vegetative cover, it does not consistently distinguish between tree and low-stature vegetation. The Forest Cover Index (FCI) algorithm was developed to take the multiplicative product of the red and near infrared bands and apply a threshold to separate tree cover from non-tree cover in multispectral imagery (MSI). Previous testing focused on one study site using 2-m resolution commercial MSI from WorldView-2 and 30-m resolution imagery from Landsat-7. New testing in this work used 3-m imagery from PlanetScope and 10-m imagery from Sentinel-2 in imagery in sites across 12 biomes in South and Central America and North Korea. Overall accuracy ranged between 23% and 97% for Sentinel-2 imagery and between 51% and 98% for PlanetScope imagery. Future research will focus on automating the identification of the threshold that separates tree from other land covers, exploring use of the output for machine learning applications, and incorporating ancillary data such as digital surface models and existing tree cover maps.
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!
