Bibliographies thématiques / Adversarial bandits

Sommaire

Articles de revues
Thèses
Livres
Chapitres de livres
Actes de conférences

Littérature scientifique sur le sujet « Adversarial bandits »

Auteur : Grafiati

Publié le 8 mars 2025

Créez une référence correcte selon les styles APA, MLA, Chicago, Harvard et plusieurs autres

Choisissez une source :

Consultez les listes thématiques d’articles de revues, de livres, de thèses, de rapports de conférences et d’autres sources académiques sur le sujet « Adversarial bandits ».

À côté de chaque source dans la liste de références il y a un bouton « Ajouter à la bibliographie ». Cliquez sur ce bouton, et nous générerons automatiquement la référence bibliographique pour la source choisie selon votre style de citation préféré : APA, MLA, Harvard, Vancouver, Chicago, etc.

Vous pouvez aussi télécharger le texte intégral de la publication scolaire au format pdf et consulter son résumé en ligne lorsque ces informations sont inclues dans les métadonnées.

Articles de revues sur le sujet "Adversarial bandits"

Lu, Shiyin, Guanghui Wang et Lijun Zhang. « Stochastic Graphical Bandits with Adversarial Corruptions ». Proceedings of the AAAI Conference on Artificial Intelligence 35, n^o 10 (18 mai 2021) : 8749–57. http://dx.doi.org/10.1609/aaai.v35i10.17060.

Texte intégral

Résumé :

We study bandits with graph-structured feedback, where a learner repeatedly selects an arm and then observes rewards of the chosen arm as well as its neighbors in the feedback graph. Existing work on graphical bandits assumes either stochastic rewards or adversarial rewards, both of which are extremes and appear rarely in real-world scenarios. In this paper, we study graphical bandits with a reward model that interpolates between the two extremes, where the rewards are overall stochastically generated but a small fraction of them can be adversarially corrupted. For this problem, we propose an online algorithm that can utilize the stochastic pattern and also tolerate the adversarial corruptions. The main idea is to restrict exploration to carefully-designed independent sets of the feedback graph and perform exploitation by adopting a soft version of arm elimination. Theoretical analysis shows that our algorithm attains an $O(\alpha \ln{K} \ln{T} + \alpha C)$ regret, where $\alpha$ is the independence number of the feedback graph, $K$ is the number of arms, $T$ is the time horizon, and $C$ quantifies the total corruptions introduced by the adversary. The effectiveness of our algorithm is demonstrated by numerical experiments.

Styles APA, Harvard, Vancouver, ISO, etc.

Pacchiano, Aldo, Heinrich Jiang et Michael I. Jordan. « Robustness Guarantees for Mode Estimation with an Application to Bandits ». Proceedings of the AAAI Conference on Artificial Intelligence 35, n^o 10 (18 mai 2021) : 9277–84. http://dx.doi.org/10.1609/aaai.v35i10.17119.

Texte intégral

Résumé :

Mode estimation is a classical problem in statistics with a wide range of applications in machine learning. Despite this, there is little understanding in its robustness properties under possibly adversarial data contamination. In this paper, we give precise robustness guarantees as well as privacy guarantees under simple randomization. We then introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean. We prove regret guarantees for the problems of top arm identification, top m-arms identification, contextual modal bandits, and infinite continuous arms top arm recovery. We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences, thus rendering modal bandits an attractive choice in situations where the rewards may have outliers or adversarial corruptions.

Styles APA, Harvard, Vancouver, ISO, etc.

Wang, Zhiwei, Huazheng Wang et Hongning Wang. « Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits ». Proceedings of the AAAI Conference on Artificial Intelligence 38, n^o 14 (24 mars 2024) : 15770–77. http://dx.doi.org/10.1609/aaai.v38i14.29506.

Texte intégral

Résumé :

Adversarial attacks against stochastic multi-armed bandit (MAB) algorithms have been extensively studied in the literature. In this work, we focus on reward poisoning attacks and find most existing attacks can be easily detected by our proposed detection method based on the test of homogeneity, due to their aggressive nature in reward manipulations. This motivates us to study the notion of stealthy attack against stochastic MABs and investigate the resulting attackability. Our analysis shows that against two popularly employed MAB algorithms, UCB1 and $\epsilon$-greedy, the success of a stealthy attack depends on the environmental conditions and the realized reward of the arm pulled in the first round. We also analyze the situation for general MAB algorithms equipped with our attack detection method and find that it is possible to have a stealthy attack that almost always succeeds. This brings new insights into the security risks of MAB algorithms.

Styles APA, Harvard, Vancouver, ISO, etc.

Esfandiari, Hossein, Amin Karbasi, Abbas Mehrabian et Vahab Mirrokni. « Regret Bounds for Batched Bandits ». Proceedings of the AAAI Conference on Artificial Intelligence 35, n^o 8 (18 mai 2021) : 7340–48. http://dx.doi.org/10.1609/aaai.v35i8.16901.

Texte intégral

Résumé :

We present simple algorithms for batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve and extend the best known regret bounds of Gao, Han, Ren, and Zhou (NeurIPS 2019), for any number of batches. In particular, our algorithms in both settings achieve the optimal expected regrets by using only a logarithmic number of batches. We also study the batched adversarial multi-armed bandit problem for the first time and provide the optimal regret, up to logarithmic factors, of any algorithm with predetermined batch sizes.

Styles APA, Harvard, Vancouver, ISO, etc.

Chen, Cheng, Canzhe Zhao et Shuai Li. « Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model ». Proceedings of the AAAI Conference on Artificial Intelligence 36, n^o 6 (28 juin 2022) : 6202–10. http://dx.doi.org/10.1609/aaai.v36i6.20569.

Texte intégral

Résumé :

Online learning to rank (OLTR) interactively learns to choose lists of items from a large collection based on certain click models that describe users' click behaviors. Most recent works for this problem focus on the stochastic environment where the item attractiveness is assumed to be invariant during the learning process. In many real-world scenarios, however, the environment could be dynamic or even arbitrarily changing. This work studies the OLTR problem in both stochastic and adversarial environments under the position-based model (PBM). We propose a method based on the follow-the-regularized-leader (FTRL) framework with Tsallis entropy and develop a new self-bounding constraint especially designed for PBM. We prove the proposed algorithm simultaneously achieves O(log T) regret in the stochastic environment and O(m√nT) regret in the adversarial environment, where T is the number of rounds, n is the number of items and m is the number of positions. We also provide a lower bound of order Ω(m√nT) for adversarial PBM, which matches our upper bound and improves over the state-of-the-art lower bound. The experiments show that our algorithm could simultaneously learn in both stochastic and adversarial environments and is competitive compared to existing methods that are designed for a single environment.

Styles APA, Harvard, Vancouver, ISO, etc.

Wang, Lingda, Bingcong Li, Huozhi Zhou, Georgios B. Giannakis, Lav R. Varshney et Zhizhen Zhao. « Adversarial Linear Contextual Bandits with Graph-Structured Side Observations ». Proceedings of the AAAI Conference on Artificial Intelligence 35, n^o 11 (18 mai 2021) : 10156–64. http://dx.doi.org/10.1609/aaai.v35i11.17218.

Texte intégral

Résumé :

This paper studies the adversarial graphical contextual bandits, a variant of adversarial multi-armed bandits that leverage two categories of the most common side information: contexts and side observations. In this setting, a learning agent repeatedly chooses from a set of K actions after being presented with a d-dimensional context vector. The agent not only incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures, which are encoded as a series of feedback graphs. This setting models a variety of applications in social networks, where both contexts and graph-structured side observations are available. Two efficient algorithms are developed based on EXP3. Under mild conditions, our analysis shows that for undirected feedback graphs the first algorithm, EXP3-LGC-U, achieves a sub-linear regret with respect to the time horizon and the average independence number of the feedback graphs. A slightly weaker result is presented for the directed graph setting as well. The second algorithm, EXP3-LGC-IX, is developed for a special class of problems, for which the regret is the same for both directed as well as undirected feedback graphs. Numerical tests corroborate the efficiency of proposed algorithms.

Styles APA, Harvard, Vancouver, ISO, etc.

Wachel, Pawel, et Cristian Rojas. « An Adversarial Approach to Adaptive Model Predictive Control ». Journal of Advances in Applied & ; Computational Mathematics 9 (19 septembre 2022) : 135–46. http://dx.doi.org/10.15377/2409-5761.2022.09.10.

Texte intégral

Résumé :

This paper presents a novel approach to introducing adaptation in Model Predictive Control (MPC). Assuming limited a priori knowledge about the process, we consider a finite set of possible models (a dictionary), and use the theory of adversarial multi-armed bandits to develop an adaptive version of MPC called adversarial adaptive MPC (AAMPC). Under weak assumptions on the dictionary components, we then establish theoretical bounds on the performance of AAMPC and show its empirical behaviour via simulation examples.

Styles APA, Harvard, Vancouver, ISO, etc.

Xu, Xiao, et Qing Zhao. « Memory-Constrained No-Regret Learning in Adversarial Multi-Armed Bandits ». IEEE Transactions on Signal Processing 69 (2021) : 2371–82. http://dx.doi.org/10.1109/tsp.2021.3070201.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

Shi, Chengshuai, et Cong Shen. « On No-Sensing Adversarial Multi-Player Multi-Armed Bandits With Collision Communications ». IEEE Journal on Selected Areas in Information Theory 2, n^o 2 (juin 2021) : 515–33. http://dx.doi.org/10.1109/jsait.2021.3076027.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

Tae, Ki Hyun, Hantian Zhang, Jaeyoung Park, Kexin Rong et Steven Euijong Whang. « Falcon : Fair Active Learning Using Multi-Armed Bandits ». Proceedings of the VLDB Endowment 17, n^o 5 (janvier 2024) : 952–65. http://dx.doi.org/10.14778/3641204.3641207.

Texte intégral

Résumé :

Biased data can lead to unfair machine learning models, highlighting the importance of embedding fairness at the beginning of data analysis, particularly during dataset curation and labeling. In response, we propose Falcon, a scalable fair active learning framework. Falcon adopts a data-centric approach that improves machine learning model fairness via strategic sample selection. Given a user-specified group fairness measure, Falcon identifies samples from "target groups" (e.g., (attribute=female, label=positive)) that are the most informative for improving fairness. However, a challenge arises since these target groups are defined using ground truth labels that are not available during sample selection. To handle this, we propose a novel trial-and-error method, where we postpone using a sample if the predicted label is different from the expected one and falls outside the target group. We also observe the trade-off that selecting more informative samples results in higher likelihood of postponing due to undesired label prediction, and the optimal balance varies per dataset. We capture the trade-off between informativeness and postpone rate as policies and propose to automatically select the best policy using adversarial multi-armed bandit methods, given their computational efficiency and theoretical guarantees. Experiments show that Falcon significantly outperforms existing fair active learning approaches in terms of fairness and accuracy and is more efficient. In particular, only Falcon supports a proper trade-off between accuracy and fairness where its maximum fairness score is 1.8--4.5x higher than the second-best results.

Styles APA, Harvard, Vancouver, ISO, etc.

Plus de sources

Thèses sur le sujet "Adversarial bandits"

Maillard, Odalric-Ambrym. « APPRENTISSAGE SÉQUENTIEL : Bandits, Statistique et Renforcement ». Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2011. http://tel.archives-ouvertes.fr/tel-00845410.

Texte intégral

Résumé :

Cette thèse traite des domaines suivant en Apprentissage Automatique: la théorie des Bandits, l'Apprentissage statistique et l'Apprentissage par renforcement. Son fil rouge est l'étude de plusieurs notions d'adaptation, d'un point de vue non asymptotique : à un environnement ou à un adversaire dans la partie I, à la structure d'un signal dans la partie II, à la structure de récompenses ou à un modèle des états du monde dans la partie III. Tout d'abord nous dérivons une analyse non asymptotique d'un algorithme de bandit à plusieurs bras utilisant la divergence de Kullback-Leibler. Celle-ci permet d'atteindre, dans le cas de distributions à support fini, la borne inférieure de performance asymptotique dépendante des distributions de probabilité connue pour ce problème. Puis, pour un bandit avec un adversaire possiblement adaptatif, nous introduisons des modèles dépendants de l'histoire et traduisant une possible faiblesse de l'adversaire et montrons comment en tirer parti pour concevoir des algorithmes adaptatifs à cette faiblesse. Nous contribuons au problème de la régression en montrant l'utilité des projections aléatoires, à la fois sur le plan théorique et pratique, lorsque l'espace d'hypothèses considéré est de dimension grande, voire infinie. Nous utilisons également des opérateurs d'échantillonnage aléatoires dans le cadre de la reconstruction parcimonieuse lorsque la base est loin d'être orthogonale. Enfin, nous combinons la partie I et II : pour fournir une analyse non-asymptotique d'algorithmes d'apprentissage par renforcement; puis, en amont du cadre des Processus Décisionnel de Markov, pour discuter du problème pratique du choix d'un bon modèle d'états.

Styles APA, Harvard, Vancouver, ISO, etc.

Aubert, Julien. « Théorie de l'estimation pour les processus d'apprentissage ». Electronic Thesis or Diss., Université Côte d'Azur, 2025. http://www.theses.fr/2025COAZ5001.

Texte intégral

Résumé :

Cette thèse examine le problème de l'estimation du processus d'apprentissage d'un individu au cours d'une tâche à partir des actions qu'il effectue. Cette question se situe à l'intersection de la cognition, des statistiques et de l'apprentissage par renforcement, et implique le développement de modèles qui capturent avec précision la dynamique de l'apprentissage, l'estimation des paramètres des modèles et la sélection du modèle le mieux adapté. L'une des principales difficultés réside dans le fait que l'apprentissage, par sa nature même, conduit à des données non indépendantes et non stationnaires, puisque l'individu ajuste son processus de décision en fonction du résultat de ses choix précédents. Les théories et méthodes statistiques existantes sont bien établies pour les données indépendantes et stationnaires, mais leur application à un cadre d'apprentissage introduit des défis inédits. L'enjeu de cette thèse est de développer une théorie statistique capable d'expliquer et de justifier les méthodes empiriques existantes. Je commence par explorer les propriétés de l'estimateur du maximum de vraisemblance sur un modèle d'apprentissage fondé sur un problème de bandit. Je présente ensuite des résultats théoriques généraux sur la sélection de modèles à partir d'un critère de log-vraisemblance pénalisée pour des données non stationnaires et dépendantes. Ces résultats nécessitent le développement d'une nouvelle inégalité de concentration pour des suprema de processus renormalisés. Je présente également une procédure de hold-out et des garanties théoriques pour cette procédure dans un cadre d'apprentissage. Ces résultats théoriques sont étayés par des applications sur des données synthétiques et sur des expériences cognitives réelles en psychologie et en éthologie
This thesis considers the problem of estimating the learning process of an individual during a task based on observed choices or actions of that individual. This question lies at the intersection of cognition, statistics, and reinforcement learning, and involves developing models that accurately capture the dynamics of learning, estimating model parameters, and selecting the best-fitting model. A key difficulty is that learning, by nature, leads to non-independent and non-stationary data, as the individual selects its actions depending on the outcome of its previous choices.Existing statistical theories and methods are well-established for independent and stationary data, but their application to a learning framework introduces significant challenges. This thesis seeks to bridge the gap between empirical methods and theoretical guarantees in computational modeling. I first explore the properties of maximum likelihood estimation on a model of learning based on a bandit problem. I then present general theoretical results on penalized log-likelihood model selection for non-stationary and dependent data, for which I develop a new concentration inequality for the suprema of renormalized processes. I also introduce a hold-out procedure and theoretical guarantees for it in a learning framework. These theoretical results are supported with applications on synthetic data and on real cognitive experiments in psychology and ethology

Styles APA, Harvard, Vancouver, ISO, etc.

Livres sur le sujet "Adversarial bandits"

Parsons, Dave. Bandits ! : Pictorial history of American adversarial aircraft. Osceola, WI : Motorbooks International, 1993.

Trouver le texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

Nelson, Derek, et Dave Parsons. Bandits ! : Pictorial History of American Adversarial Aircraft. Motorbooks Intl, 1993.

Trouver le texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

Chapitres de livres sur le sujet "Adversarial bandits"

Li, Yandi, et Jianxiong Guo. « A Modified EXP3 in Adversarial Bandits with Multi-user Delayed Feedback ». Dans Lecture Notes in Computer Science, 263–78. Cham : Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-49193-1_20.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

Zheng, Rong, et Cunqing Hua. « Adversarial Multi-armed Bandit ». Dans Wireless Networks, 41–57. Cham : Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-50502-2_4.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

St-Pierre, David L., et Olivier Teytaud. « Sharing Information in Adversarial Bandit ». Dans Applications of Evolutionary Computation, 386–98. Berlin, Heidelberg : Springer Berlin Heidelberg, 2014. http://dx.doi.org/10.1007/978-3-662-45523-4_32.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

Uchiya, Taishi, Atsuyoshi Nakamura et Mineichi Kudo. « Algorithms for Adversarial Bandit Problems with Multiple Plays ». Dans Lecture Notes in Computer Science, 375–89. Berlin, Heidelberg : Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-16108-7_30.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

Lee, Chia-Jung, Yalei Yang, Sheng-Hui Meng et Tien-Wen Sung. « Adversarial Multiarmed Bandit Problems in Gradually Evolving Worlds ». Dans Advances in Smart Vehicular Technology, Transportation, Communication and Applications, 305–11. Cham : Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-70730-3_36.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

« Exp3 for Adversarial Linear Bandits ». Dans Bandit Algorithms, 278–85. Cambridge University Press, 2020. http://dx.doi.org/10.1017/9781108571401.034.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

« The Relation between Adversarial and Stochastic Linear Bandits ». Dans Bandit Algorithms, 306–12. Cambridge University Press, 2020. http://dx.doi.org/10.1017/9781108571401.036.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

Srisawad, Phurinut, Juergen Branke et Long Tran-Thanh. « Identifying the Best Arm in the Presence of Global Environment Shifts ». Dans Frontiers in Artificial Intelligence and Applications. IOS Press, 2024. http://dx.doi.org/10.3233/faia240735.

Texte intégral

Résumé :

This paper formulates a new Best-Arm Identification problem in the non-stationary stochastic bandits setting, where the means of all arms are shifted in the same way due to a global influence of the environment. The aim is to identify the unique best arm across environmental change given a fixed total budget. While this setting can be regarded as a special case of Adversarial Bandits or Corrupted Bandits, we demonstrate that existing solutions tailored to those settings do not fully utilise the nature of this global influence, and thus, do not work well in practice (despite their theoretical guarantees). To overcome this issue, in this paper we develop a novel selection policy that is consistent and robust in dealing with global environmental shifts. We then propose an allocation policy, LinLUCB, which exploits information about global shifts across all arms in each environment. Empirical tests depict a significant improvement in our policies against other existing methods.

Styles APA, Harvard, Vancouver, ISO, etc.

Wissow, Stephen, et Masataro Asai. « Scale-Adaptive Balancing of Exploration and Exploitation in Classical Planning ». Dans Frontiers in Artificial Intelligence and Applications. IOS Press, 2024. http://dx.doi.org/10.3233/faia240994.

Texte intégral

Résumé :

Balancing exploration and exploitation has been an important problem in both adversarial games and automated planning. While it has been extensively analyzed in the Multi-Armed Bandit (MAB) literature, and the game community has achieved great success with MAB-based Monte Carlo Tree Search (MCTS) methods, the planning community has struggled to advance in this area. We describe how Upper Confidence Bound 1’s (UCB1’s) assumption of reward distributions with known bounded support shared among siblings (arms) is violated when MCTS/Trial-based Heuristic Tree Search (THTS) in previous work uses heuristic values of search nodes in classical planning problems as rewards. To address this issue, we propose a new Gaussian bandit, UCB1-Normal2, and analyze its regret bound. It is variance-aware like UCB1-Normal and UCB-V, but has a distinct advantage: it neither shares UCB-V’s assumption of known bounded support nor relies on UCB1-Normal’s conjectures on Student’s t and χ2 distributions. Our theoretical analysis predicts that UCB1-Normal2 will perform well when the estimated variance is accurate, which can be expected in deterministic, discrete, finite state-space search, as in classical planning. Our empirical evaluation confirms that MCTS combined with UCB1-Normal2 outperforms Greedy Best First Search (traditional baseline) as well as MCTS with other bandits.

Styles APA, Harvard, Vancouver, ISO, etc.

Actes de conférences sur le sujet "Adversarial bandits"

Huang, Yin, Qingsong Liu et Jie Xu. « Adversarial Combinatorial Bandits with Switching Cost and Arm Selection Constraints ». Dans IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, 371–80. IEEE, 2024. http://dx.doi.org/10.1109/infocom52122.2024.10621364.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

Li, Jinpeng, Yunni Xia, Xiaoning Sun, Peng Chen, Xiaobo Li et Jiafeng Feng. « Delay-Aware Service Caching in Edge Cloud : An Adversarial Semi-Bandits Learning-Based Approach ». Dans 2024 IEEE 17th International Conference on Cloud Computing (CLOUD), 411–18. IEEE, 2024. http://dx.doi.org/10.1109/cloud62652.2024.00053.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

La-aiddee, Panithan, Paramin Sangwongngam, Lunchakorn Wuttisittikulkij et Pisit Vanichchanunt. « A Generative Adversarial Network-Based Approach for Reflective-Metasurface Unit-Cell Synthesis in mmWave Bands ». Dans 2024 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC), 1–5. IEEE, 2024. http://dx.doi.org/10.1109/itc-cscc62988.2024.10628337.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

Immorlica, Nicole, Karthik Abinav Sankararaman, Robert Schapire et Aleksandrs Slivkins. « Adversarial Bandits with Knapsacks ». Dans 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2019. http://dx.doi.org/10.1109/focs.2019.00022.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

Lykouris, Thodoris, Vahab Mirrokni et Renato Paes Leme. « Stochastic bandits robust to adversarial corruptions ». Dans STOC '18 : Symposium on Theory of Computing. New York, NY, USA : ACM, 2018. http://dx.doi.org/10.1145/3188745.3188918.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

Wan, Zongqi, Xiaoming Sun et Jialin Zhang. « Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback ». Dans Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California : International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/486.

Texte intégral

Résumé :

We study the adversarial bandit problem with composite anonymous delayed feedback. In this setting, losses of an action are split into d components, spreading over consecutive rounds after the action is chosen. And in each round, the algorithm observes the aggregation of losses that come from the latest d rounds. Previous works focus on oblivious adversarial setting, while we investigate the harder nonoblivious setting. We show nonoblivious setting incurs Omega(T) pseudo regret even when the loss sequence is bounded memory. However, we propose a wrapper algorithm which enjoys o(T) policy regret on many adversarial bandit problems with the assumption that the loss sequence is bounded memory. Especially, for K armed bandit and bandit convex optimization, our policy regret bound is in the order of T to the two third. We also prove a matching lower bound for K armed bandit. Our lower bound works even when the loss sequence is oblivious but the delay is nonoblivious. It answers the open problem proposed in [Wang, Wang, Huang 2021], showing that nonoblivious delay is enough to incur the regret in the order of T to the two third.

Styles APA, Harvard, Vancouver, ISO, etc.

Bande, Meghana, et Venugopal V. Veeravalli. « Adversarial Multi-user Bandits for Uncoordinated Spectrum Access ». Dans ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019. http://dx.doi.org/10.1109/icassp.2019.8682263.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

Han, Shuguang, Michael Bendersky, Przemek Gajda, Sergey Novikov, Marc Najork, Bernhard Brodowsky et Alexandrin Popescul. « Adversarial Bandits Policy for Crawling Commercial Web Content ». Dans WWW '20 : The Web Conference 2020. New York, NY, USA : ACM, 2020. http://dx.doi.org/10.1145/3366423.3380125.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

Howard, William W., Anthony F. Martone et R. Michael Buehrer. « Adversarial Multi-Player Bandits for Cognitive Radar Networks ». Dans 2022 IEEE Radar Conference (RadarConf22). IEEE, 2022. http://dx.doi.org/10.1109/radarconf2248738.2022.9764226.

Texte intégral

Styles APA, Harvard, Vancouver, ISO, etc.

Rangi, Anshuka, Massimo Franceschetti et Long Tran-Thanh. « Unifying the Stochastic and the Adversarial Bandits with Knapsack ». Dans Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California : International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/459.

Texte intégral

Résumé :

This work investigates the adversarial Bandits with Knapsack (BwK) learning problem, where a player repeatedly chooses to perform an action, pays the corresponding cost of the action, and receives a reward associated with the action. The player is constrained by the maximum budget that can be spent to perform the actions, and the rewards and the costs of these actions are assigned by an adversary. This setting is studied in terms of expected regret, defined as the difference between the total expected rewards per unit cost corresponding the best fixed action and the total expected rewards per unit cost of the learning algorithm. We propose a novel algorithm EXP3.BwK and show that the expected regret of the algorithm is order optimal in the budget. We then propose another algorithm EXP3++.BwK, which is order optimal in the adversarial BwK setting, and incurs an almost optimal expected regret in the stochastic BwK setting where the rewards and the costs are drawn from unknown underlying distributions. These results are then extended to a more general online learning setting, by designing another algorithm EXP3++.LwK and providing its performance guarantees. Finally, we investigate the scenario where the costs of the actions are large and comparable to the budget. We show that for the adversarial setting, the achievable regret bounds scale at least linearly with the maximum cost for any learning algorithm, and are significantly worse in comparison to the case of having costs bounded by a constant, which is a common assumption in the BwK literature.

Styles APA, Harvard, Vancouver, ISO, etc.

Nous offrons des réductions sur tous les plans premium pour les auteurs dont les œuvres sont incluses dans des sélections littéraires thématiques. Contactez-nous pour obtenir un code promo unique!