A selection of scholarly literature on the topic "Adversarial bandits"

Format your source according to APA, MLA, Chicago, Harvard, and other citation styles

Browse lists of current articles, books, dissertations, conference papers, and other scholarly sources on the topic "Adversarial bandits".

Next to each work in the bibliography there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of a publication in .pdf format and read its abstract online, provided that these are available in the work's metadata.

Journal articles on the topic "Adversarial bandits"

1

Lu, Shiyin, Guanghui Wang, and Lijun Zhang. "Stochastic Graphical Bandits with Adversarial Corruptions." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (2021): 8749–57. http://dx.doi.org/10.1609/aaai.v35i10.17060.

Abstract:
We study bandits with graph-structured feedback, where a learner repeatedly selects an arm and then observes rewards of the chosen arm as well as its neighbors in the feedback graph. Existing work on graphical bandits assumes either stochastic rewards or adversarial rewards, both of which are extremes and appear rarely in real-world scenarios. In this paper, we study graphical bandits with a reward model that interpolates between the two extremes, where the rewards are overall stochastically generated but a small fraction of them can be adversarially corrupted. For this problem, we propose an…
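To make the feedback model described in this abstract concrete, here is a minimal simulation sketch. The graph, the reward means, and the corruption rule are illustrative assumptions of ours; the paper's actual algorithm is not reproduced here.

```python
import random

# Toy graphical-bandit environment: pulling an arm reveals the rewards of
# that arm and of its neighbors in the feedback graph; on a small set of
# rounds an adversary corrupts every observed reward (all values assumed).
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1]}   # undirected feedback graph
MEANS = [0.3, 0.5, 0.7]                   # Bernoulli reward means
CORRUPTED_ROUNDS = {3, 17, 42}            # rounds the adversary corrupts

def pull(arm, t):
    """Observe the pulled arm's reward and its neighbors' rewards."""
    observed = {}
    for a in [arm] + NEIGHBORS[arm]:
        reward = 1.0 if random.random() < MEANS[a] else 0.0
        if t in CORRUPTED_ROUNDS:         # adversarial flip on corrupted rounds
            reward = 1.0 - reward
        observed[a] = reward
    return observed
```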
2

Pacchiano, Aldo, Heinrich Jiang, and Michael I. Jordan. "Robustness Guarantees for Mode Estimation with an Application to Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (2021): 9277–84. http://dx.doi.org/10.1609/aaai.v35i10.17119.

Abstract:
Mode estimation is a classical problem in statistics with a wide range of applications in machine learning. Despite this, there is little understanding of its robustness properties under possibly adversarial data contamination. In this paper, we give precise robustness guarantees as well as privacy guarantees under simple randomization. We then introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean. We prove regret guarantees for the problems of top arm identification, top m-arms identification, contextual modal bandits, and in…
3

Wang, Zhiwei, Huazheng Wang, and Hongning Wang. "Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (2024): 15770–77. http://dx.doi.org/10.1609/aaai.v38i14.29506.

Abstract:
Adversarial attacks against stochastic multi-armed bandit (MAB) algorithms have been extensively studied in the literature. In this work, we focus on reward poisoning attacks and find most existing attacks can be easily detected by our proposed detection method based on the test of homogeneity, due to their aggressive nature in reward manipulations. This motivates us to study the notion of stealthy attack against stochastic MABs and investigate the resulting attackability. Our analysis shows that against two popularly employed MAB algorithms, UCB1 and ε-greedy, the success of a stealthy attack…
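For orientation, below is a skeletal version of ε-greedy, one of the two victim algorithms the abstract names, with the single line where a reward-poisoning adversary would intervene. The `poison` hook is hypothetical; the paper's stealthy attack and its homogeneity-test detector are not reproduced here.

```python
import random

def epsilon_greedy(K, T, true_means, poison=lambda arm, r, t: r, eps=0.1):
    """Bernoulli bandit victim; `poison` may perturb each observed reward."""
    counts, sums = [0] * K, [0.0] * K
    for t in range(T):
        if random.random() < eps or 0 in counts:
            arm = random.randrange(K)                               # explore
        else:
            arm = max(range(K), key=lambda a: sums[a] / counts[a])  # exploit
        reward = 1.0 if random.random() < true_means[arm] else 0.0
        reward = poison(arm, reward, t)   # reward-poisoning attack point
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

An aggressive attack shifts an arm's empirical reward distribution enough that a two-sample homogeneity test on its reward history can flag it, which is the detection idea the abstract alludes to.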
4

Esfandiari, Hossein, Amin Karbasi, Abbas Mehrabian, and Vahab Mirrokni. "Regret Bounds for Batched Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (2021): 7340–48. http://dx.doi.org/10.1609/aaai.v35i8.16901.

Abstract:
We present simple algorithms for batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve and extend the best known regret bounds of Gao, Han, Ren, and Zhou (NeurIPS 2019), for any number of batches. In particular, our algorithms in both settings achieve the optimal expected regrets by using only a logarithmic number of batches. We also study the batched adversarial multi-armed bandit problem for the first time and provide the optimal regret, up to logarithmic factors, of any algorithm with predetermined batch sizes…
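The headline result, optimal regret with only logarithmically many batches, depends on how the batch boundaries are scheduled. A common device in batched bandits is a geometric grid of batch endpoints; the sketch below illustrates that general device under our own assumptions, not necessarily the authors' exact schedule.

```python
import math

def geometric_batches(T, num_batches=None):
    """Endpoints of a geometric batch grid; rewards observed only at ends."""
    m = num_batches or max(1, math.ceil(math.log2(T)))
    b = T ** (1.0 / m)                       # common ratio b = T^(1/m)
    ends = sorted({min(T, math.ceil(b ** i)) for i in range(1, m + 1)})
    if ends[-1] != T:
        ends.append(T)
    return ends

# e.g. geometric_batches(10_000) yields ~14 endpoints with geometrically
# growing batch lengths, so feedback is collected only O(log T) times.
```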
5

Chen, Cheng, Canzhe Zhao, and Shuai Li. "Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (2022): 6202–10. http://dx.doi.org/10.1609/aaai.v36i6.20569.

Abstract:
Online learning to rank (OLTR) interactively learns to choose lists of items from a large collection based on certain click models that describe users' click behaviors. Most recent works for this problem focus on the stochastic environment where the item attractiveness is assumed to be invariant during the learning process. In many real-world scenarios, however, the environment could be dynamic or even arbitrarily changing. This work studies the OLTR problem in both stochastic and adversarial environments under the position-based model (PBM). We propose a method based on the follow-the-regularized-leader…
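For reference, the position-based model (PBM) named in the abstract factorizes the click probability of an item into its attractiveness and a position-dependent examination bias (standard PBM definition; notation ours):

```latex
\Pr[\text{click on item } a \text{ at position } k]
  \;=\; \alpha(a)\,\kappa_k,
\qquad \alpha(a),\,\kappa_k \in [0,1],
```

where \alpha(a) is the attractiveness of item a and \kappa_k the probability that a user examines slot k.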
6

Wang, Lingda, Bingcong Li, Huozhi Zhou, Georgios B. Giannakis, Lav R. Varshney, and Zhizhen Zhao. "Adversarial Linear Contextual Bandits with Graph-Structured Side Observations." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 11 (2021): 10156–64. http://dx.doi.org/10.1609/aaai.v35i11.17218.

Abstract:
This paper studies the adversarial graphical contextual bandits, a variant of adversarial multi-armed bandits that leverages two categories of the most common side information: contexts and side observations. In this setting, a learning agent repeatedly chooses from a set of K actions after being presented with a d-dimensional context vector. The agent not only incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures, which are encoded as a series of feedback graphs. This setting models a variety of applications in…
7

Wachel, Pawel, and Cristian Rojas. "An Adversarial Approach to Adaptive Model Predictive Control." Journal of Advances in Applied & Computational Mathematics 9 (September 19, 2022): 135–46. http://dx.doi.org/10.15377/2409-5761.2022.09.10.

Abstract:
This paper presents a novel approach to introducing adaptation in Model Predictive Control (MPC). Assuming limited a priori knowledge about the process, we consider a finite set of possible models (a dictionary), and use the theory of adversarial multi-armed bandits to develop an adaptive version of MPC called adversarial adaptive MPC (AAMPC). Under weak assumptions on the dictionary components, we then establish theoretical bounds on the performance of AAMPC and show its empirical behaviour via simulation examples.
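The mechanism this abstract outlines, an adversarial multi-armed bandit selecting from a finite dictionary of models, can be sketched with EXP3 as the bandit. Whether AAMPC uses exactly this update is not stated in the excerpt; `run_mpc_step` is a hypothetical callback that runs one MPC step with the chosen model and returns a reward in [0, 1].

```python
import math
import random

def exp3_model_selector(num_models, T, run_mpc_step, gamma=0.1):
    """EXP3 over a dictionary of candidate models for an adaptive MPC loop."""
    weights = [1.0] * num_models
    for t in range(T):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / num_models for w in weights]
        m = random.choices(range(num_models), weights=probs)[0]
        reward = run_mpc_step(m)   # assumed in [0, 1], e.g. negative scaled cost
        # importance-weighted exponential update (standard EXP3)
        weights[m] *= math.exp(gamma * reward / (probs[m] * num_models))
    return weights
```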
8

Xu, Xiao, and Qing Zhao. "Memory-Constrained No-Regret Learning in Adversarial Multi-Armed Bandits." IEEE Transactions on Signal Processing 69 (2021): 2371–82. http://dx.doi.org/10.1109/tsp.2021.3070201.

9

Shi, Chengshuai, and Cong Shen. "On No-Sensing Adversarial Multi-Player Multi-Armed Bandits With Collision Communications." IEEE Journal on Selected Areas in Information Theory 2, no. 2 (2021): 515–33. http://dx.doi.org/10.1109/jsait.2021.3076027.

10

Tae, Ki Hyun, Hantian Zhang, Jaeyoung Park, Kexin Rong, and Steven Euijong Whang. "Falcon: Fair Active Learning Using Multi-Armed Bandits." Proceedings of the VLDB Endowment 17, no. 5 (2024): 952–65. http://dx.doi.org/10.14778/3641204.3641207.

Abstract:
Biased data can lead to unfair machine learning models, highlighting the importance of embedding fairness at the beginning of data analysis, particularly during dataset curation and labeling. In response, we propose Falcon, a scalable fair active learning framework. Falcon adopts a data-centric approach that improves machine learning model fairness via strategic sample selection. Given a user-specified group fairness measure, Falcon identifies samples from "target groups" (e.g., (attribute=female, label=positive)) that are the most informative for improving fairness. However, a challenge arises…

Dissertations and theses on the topic "Adversarial bandits"

1

Maillard, Odalric-Ambrym. "APPRENTISSAGE SÉQUENTIEL : Bandits, Statistique et Renforcement." PhD thesis, Université des Sciences et Technologie de Lille - Lille I, 2011. http://tel.archives-ouvertes.fr/tel-00845410.

Abstract:
This thesis covers the following areas of machine learning: bandit theory, statistical learning, and reinforcement learning. Its common thread is the study of several notions of adaptation from a non-asymptotic viewpoint: to an environment or an adversary in Part I, to the structure of a signal in Part II, and to a reward structure or a model of the states of the world in Part III. We first derive a non-asymptotic analysis of a multi-armed bandit algorithm that uses the Kullback-Leibler divergence. This allows…
2

Aubert, Julien. "Théorie de l'estimation pour les processus d'apprentissage." Electronic Thesis or Diss., Université Côte d'Azur, 2025. http://www.theses.fr/2025COAZ5001.

Abstract:
This thesis studies the problem of estimating an individual's learning process over the course of a task from the actions they perform. This question lies at the intersection of cognition, statistics, and reinforcement learning, and involves developing models that accurately capture the dynamics of learning, estimating the parameters of those models, and selecting the best-fitting model. One of the main difficulties lies in the fact that learning, by its very nature, leads to non-independent and non-stationary data…

Books on the topic "Adversarial bandits"

1

Parsons, Dave. Bandits!: Pictorial history of American adversarial aircraft. Motorbooks International, 1993.

2

Nelson, Derek, and Dave Parsons. Bandits!: Pictorial History of American Adversarial Aircraft. Motorbooks Intl, 1993.


Book chapters on the topic "Adversarial bandits"

1

Li, Yandi, and Jianxiong Guo. "A Modified EXP3 in Adversarial Bandits with Multi-user Delayed Feedback." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-49193-1_20.

2

Zheng, Rong, and Cunqing Hua. "Adversarial Multi-armed Bandit." In Wireless Networks. Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-50502-2_4.

3

St-Pierre, David L., and Olivier Teytaud. "Sharing Information in Adversarial Bandit." In Applications of Evolutionary Computation. Springer Berlin Heidelberg, 2014. http://dx.doi.org/10.1007/978-3-662-45523-4_32.

4

Uchiya, Taishi, Atsuyoshi Nakamura, and Mineichi Kudo. "Algorithms for Adversarial Bandit Problems with Multiple Plays." In Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-16108-7_30.

5

Lee, Chia-Jung, Yalei Yang, Sheng-Hui Meng, and Tien-Wen Sung. "Adversarial Multiarmed Bandit Problems in Gradually Evolving Worlds." In Advances in Smart Vehicular Technology, Transportation, Communication and Applications. Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-70730-3_36.

6

"Exp3 for Adversarial Linear Bandits." In Bandit Algorithms. Cambridge University Press, 2020. http://dx.doi.org/10.1017/9781108571401.034.

7

"The Relation between Adversarial and Stochastic Linear Bandits." In Bandit Algorithms. Cambridge University Press, 2020. http://dx.doi.org/10.1017/9781108571401.036.

8

Srisawad, Phurinut, Juergen Branke, and Long Tran-Thanh. "Identifying the Best Arm in the Presence of Global Environment Shifts." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2024. http://dx.doi.org/10.3233/faia240735.

Abstract:
This paper formulates a new Best-Arm Identification problem in the non-stationary stochastic bandits setting, where the means of all arms are shifted in the same way due to a global influence of the environment. The aim is to identify the unique best arm across environmental change given a fixed total budget. While this setting can be regarded as a special case of Adversarial Bandits or Corrupted Bandits, we demonstrate that existing solutions tailored to those settings do not fully utilise the nature of this global influence, and thus, do not work well in practice (despite their theoretical guarantees). To overcome this issue, in this paper we develop a novel selection policy that is consistent and robust in dealing with global environmental shifts. We then propose an allocation policy, LinLUCB, which exploits information about global shifts across all arms in each environment. Empirical tests show a significant improvement of our policies over other existing methods.
9

Wissow, Stephen, and Masataro Asai. "Scale-Adaptive Balancing of Exploration and Exploitation in Classical Planning." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2024. http://dx.doi.org/10.3233/faia240994.

Abstract:
Balancing exploration and exploitation has been an important problem in both adversarial games and automated planning. While it has been extensively analyzed in the Multi-Armed Bandit (MAB) literature, and the game community has achieved great success with MAB-based Monte Carlo Tree Search (MCTS) methods, the planning community has struggled to advance in this area. We describe how Upper Confidence Bound 1’s (UCB1’s) assumption of reward distributions with known bounded support shared among siblings (arms) is violated when MCTS/Trial-based Heuristic Tree Search (THTS) in previous work uses heuristic values of search nodes in classical planning problems as rewards. To address this issue, we propose a new Gaussian bandit, UCB1-Normal2, and analyze its regret bound. It is variance-aware like UCB1-Normal and UCB-V, but has a distinct advantage: it neither shares UCB-V’s assumption of known bounded support nor relies on UCB1-Normal’s conjectures on Student’s t and χ2 distributions. Our theoretical analysis predicts that UCB1-Normal2 will perform well when the estimated variance is accurate, which can be expected in deterministic, discrete, finite state-space search, as in classical planning. Our empirical evaluation confirms that MCTS combined with UCB1-Normal2 outperforms Greedy Best First Search (traditional baseline) as well as MCTS with other bandits.
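For orientation, this is the UCB1 index whose known-bounded-support assumption the abstract says is violated (standard form from Auer et al. 2002; the exact UCB1-Normal2 bonus is not given in the excerpt):

```latex
a_t \;=\; \arg\max_{a}\; \hat{\mu}_a \;+\; \sqrt{\frac{2 \ln t}{n_a}},
```

where \hat{\mu}_a is the empirical mean reward of arm a and n_a its pull count. Variance-aware indices such as UCB1-Normal, UCB-V, and the proposed UCB1-Normal2 replace this fixed-scale exploration bonus with one driven by the empirical variance.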

Conference papers on the topic "Adversarial bandits"

1

Huang, Yin, Qingsong Liu, and Jie Xu. "Adversarial Combinatorial Bandits with Switching Cost and Arm Selection Constraints." In IEEE INFOCOM 2024 - IEEE Conference on Computer Communications. IEEE, 2024. http://dx.doi.org/10.1109/infocom52122.2024.10621364.

2

Li, Jinpeng, Yunni Xia, Xiaoning Sun, Peng Chen, Xiaobo Li, and Jiafeng Feng. "Delay-Aware Service Caching in Edge Cloud: An Adversarial Semi-Bandits Learning-Based Approach." In 2024 IEEE 17th International Conference on Cloud Computing (CLOUD). IEEE, 2024. http://dx.doi.org/10.1109/cloud62652.2024.00053.

3

La-aiddee, Panithan, Paramin Sangwongngam, Lunchakorn Wuttisittikulkij, and Pisit Vanichchanunt. "A Generative Adversarial Network-Based Approach for Reflective-Metasurface Unit-Cell Synthesis in mmWave Bands." In 2024 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC). IEEE, 2024. http://dx.doi.org/10.1109/itc-cscc62988.2024.10628337.

4

Immorlica, Nicole, Karthik Abinav Sankararaman, Robert Schapire, and Aleksandrs Slivkins. "Adversarial Bandits with Knapsacks." In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2019. http://dx.doi.org/10.1109/focs.2019.00022.

5

Lykouris, Thodoris, Vahab Mirrokni, and Renato Paes Leme. "Stochastic bandits robust to adversarial corruptions." In STOC '18: Symposium on Theory of Computing. ACM, 2018. http://dx.doi.org/10.1145/3188745.3188918.

6

Wan, Zongqi, Xiaoming Sun, and Jialin Zhang. "Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/486.

Abstract:
We study the adversarial bandit problem with composite anonymous delayed feedback. In this setting, the loss of an action is split into d components that spread over consecutive rounds after the action is chosen, and in each round the algorithm observes the aggregation of losses coming from the latest d rounds. Previous works focus on the oblivious adversarial setting, while we investigate the harder non-oblivious setting. We show that the non-oblivious setting incurs Ω(T) pseudo-regret even when the loss sequence has bounded memory. However, we propose a wrapper algorithm which enjoys o(T) policy regret…
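The observation model this abstract describes can be written compactly (notation ours): the loss of the action chosen at round s is split into components \ell_s^{(1)}, ..., \ell_s^{(d)} paid out over rounds s, ..., s+d-1, and at round t the learner sees only the anonymous aggregate

```latex
o_t \;=\; \sum_{i=1}^{\min(d,\,t)} \ell^{(i)}_{\,t-i+1},
```

with no attribution of which past action produced which component.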
7

Bande, Meghana, and Venugopal V. Veeravalli. "Adversarial Multi-user Bandits for Uncoordinated Spectrum Access." In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019. http://dx.doi.org/10.1109/icassp.2019.8682263.

8

Han, Shuguang, Michael Bendersky, Przemek Gajda, et al. "Adversarial Bandits Policy for Crawling Commercial Web Content." In WWW '20: The Web Conference 2020. ACM, 2020. http://dx.doi.org/10.1145/3366423.3380125.

9

Howard, William W., Anthony F. Martone, and R. Michael Buehrer. "Adversarial Multi-Player Bandits for Cognitive Radar Networks." In 2022 IEEE Radar Conference (RadarConf22). IEEE, 2022. http://dx.doi.org/10.1109/radarconf2248738.2022.9764226.

10

Rangi, Anshuka, Massimo Franceschetti, and Long Tran-Thanh. "Unifying the Stochastic and the Adversarial Bandits with Knapsack." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/459.

Abstract:
This work investigates the adversarial Bandits with Knapsack (BwK) learning problem, where a player repeatedly chooses to perform an action, pays the corresponding cost of the action, and receives a reward associated with the action. The player is constrained by the maximum budget that can be spent to perform actions, and the rewards and the costs of these actions are assigned by an adversary. This setting is studied in terms of expected regret, defined as the difference between the total expected rewards per unit cost corresponding to the best fixed action and the total expected rewards per unit cost…
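Written out, the regret notion the abstract sketches compares reward per unit cost (our notation, assuming a total budget B and stopping once it is exhausted):

```latex
\mathrm{Regret}
  \;=\; \max_{a}\,\frac{R(a)}{C(a)} \;-\; \frac{R(\mathrm{ALG})}{C(\mathrm{ALG})},
```

where R(\cdot) and C(\cdot) denote the total expected reward and the total cost incurred before the budget B runs out, for a fixed action a or for the algorithm ALG.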
We offer discounts on all premium plans for authors whose works are included in thematic literature collections. Contact us to get a unique promo code!