
Journal articles on the topic 'Epsilon greedy'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Epsilon greedy.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Liu, Yang, Qiuyu Lu, Zhenfan Yu, Yue Chen, and Yinguo Yang. "Reinforcement Learning-Enhanced Adaptive Scheduling of Battery Energy Storage Systems in Energy Markets." Energies 17, no. 21 (2024): 5425. http://dx.doi.org/10.3390/en17215425.

Abstract:
Battery Energy Storage Systems (BESSs) play a vital role in modern power grids by optimally dispatching energy according to the price signal. This paper proposes a reinforcement learning-based model that optimizes BESS scheduling with the proposed Q-learning algorithm combined with an epsilon-greedy strategy. The proposed epsilon-greedy strategy-based Q-learning algorithm can efficiently manage energy dispatching under uncertain price signals and multi-day operations without retraining. Simulations are conducted under different scenarios, considering electricity price fluctuations and battery
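As background for this and several later entries, the combination of tabular Q-learning with an epsilon-greedy strategy can be sketched as follows. This is a minimal sketch: the action set, learning rate, and discount factor are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import defaultdict

ACTIONS = ["charge", "discharge", "idle"]  # hypothetical BESS actions, for illustration only
Q = defaultdict(float)                     # Q[(state, action)] -> estimated value

def epsilon_greedy_action(state, epsilon=0.1):
    """With probability epsilon pick a random action (explore), else the best known one (exploit)."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Tabular Q-learning step toward the TD target r + gamma * max_b Q(s', b)."""
    td_target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
```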
2

Kyoung, Dohyun, and Yunsick Sung. "Transformer Decoder-Based Enhanced Exploration Method to Alleviate Initial Exploration Problems in Reinforcement Learning." Sensors 23, no. 17 (2023): 7411. http://dx.doi.org/10.3390/s23177411.

Abstract:
In reinforcement learning, the epsilon (ε)-greedy strategy is commonly employed as an exploration technique. This method, however, leads to extensive initial exploration and prolonged learning periods. Existing approaches to mitigate this issue involve constraining the exploration range using expert data or utilizing pretrained models. Nevertheless, these methods do not effectively reduce the initial exploration range, as the exploration by the agent is limited to states adjacent to those included in the expert data. This paper proposes a method to reduce the initial exploration range in reinfo
3

Kurniawati, Nazmia, Yuli Kurnia Ningsih, Sofia Debi Puspa, and Tri Swasono Adi. "Algoritma Epsilon Greedy pada Reinforcement Learning untuk Modulasi Adaptif Komunikasi Vehicle to Infrastructure (V2I)." ELKOMIKA: Jurnal Teknik Energi Elektrik, Teknik Telekomunikasi, & Teknik Elektronika 9, no. 3 (2021): 716. http://dx.doi.org/10.26760/elkomika.v9i3.716.

Abstract:
Vehicle to Infrastructure (V2I) communication enables vehicles to connect to a wide range of infrastructure. Because the vehicle is moving, the environmental conditions it passes through affect the communication parameters. Implementing adaptive modulation in the V2I scheme allows the system to use different modulation schemes to accommodate changing environmental conditions. This study uses the QPSK, 8PSK, and 16-QAM modulation schemes, leveraging reinforcement learning and the epsilon greedy algorithm to determine the modulation scheme to be used based on the level of AWG
4

Liu, Zizhuo. "Investigation of progress and application related to Multi-Armed Bandit algorithms." Applied and Computational Engineering 37, no. 1 (2024): 155–59. http://dx.doi.org/10.54254/2755-2721/37/20230496.

Abstract:
This paper discusses four Multi-armed Bandit algorithms: Explore-then-Commit (ETC), Epsilon-Greedy, Upper Confidence Bound (UCB), and Thompson Sampling. The ETC algorithm aims to spend the majority of rounds on the best arm, but it can lead to a suboptimal outcome if the environment changes rapidly. The Epsilon-Greedy algorithm is designed to explore and exploit simultaneously, although it often tries sub-optimal arms even after the algorithm finds the best arm. Thus, the Epsilon-Greedy algorithm performs well when the environment continuously changes. The UCB algorithm is one of the most used M
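A minimal sketch of the Epsilon-Greedy bandit loop contrasted above, assuming a stationary Gaussian reward model and a fixed ε of 0.1 purely for illustration:

```python
import random

def epsilon_greedy_bandit(true_means, rounds=10_000, epsilon=0.1):
    """Play an epsilon-greedy policy against stationary Gaussian arms; return value estimates."""
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)
    for _ in range(rounds):
        if random.random() < epsilon:
            arm = random.randrange(len(true_means))                        # explore
        else:
            arm = max(range(len(true_means)), key=lambda i: estimates[i])  # exploit
        reward = random.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]          # incremental mean
    return estimates

print(epsilon_greedy_bandit([0.1, 0.5, 0.9]))  # the 0.9 arm's estimate concentrates first
```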
5

Yashiki, Koudai, Masayuki Wajima, Takashi Kawakami, Takahumi Oohori, and Masahiro Kinoshita. "2A1-J10 The group behavior using a epsilon-greedy." Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2007 (2007): _2A1—J10_1—_2A1—J10_2. http://dx.doi.org/10.1299/jsmermd.2007._2a1-j10_1.

6

Liu, Qiaojia. "Optimizing Short and Long Term Investment Returns Using Multi-Armed Slot Machine Algorithms." Applied and Computational Engineering 83, no. 1 (2024): 110–19. http://dx.doi.org/10.54254/2755-2721/83/2024glg0068.

Abstract:
This study focuses on comparing the effectiveness of the UCB, Thompson Sampling and Epsilon-Greedy algorithms in multi-armed slot machine algorithms for short-term and long-term investment return optimization in financial markets. This analysis examines stock performance data from Tesla, General Motors, and Ford over specific periods: five years of weekly data (2019-2024) and six months of daily data (February to August 2024). The results show that for long-term investments over the five-year period, Thompson Sampling outperformed the UCB and Epsilon-Greedy algorithms in terms of stabili
7

Dell'Aversana, Paolo. "Reinforcement learning in optimization problems. Applications to geophysical data inversion." AIMS Geosciences 8, no. 3 (2022): 488–502. http://dx.doi.org/10.3934/geosci.2022027.

Abstract:
In this paper, we introduce a novel inversion methodology that combines the benefits offered by Reinforcement-Learning techniques with the advantages of the Epsilon-Greedy method for an expanded exploration of the model space. Among the various Reinforcement Learning approaches, we applied the set of algorithms included in the category of the Q-Learning methods. We show that the Temporal Difference algorithm offers an effective iterative approach that allows finding an optimal solution in geophysical inverse problems. Furthermore, the Epsilon-Greedy method properly co
8

You, Xinhong, Pengping Zhang, Minglin Liu, Lingqi Lin, and Shuai Li. "Epsilon-Greedy-Based MQTT QoS Mode Selection and Power Control Algorithm for Power Distribution IoT." International Journal of Mobile Computing and Multimedia Communications 14, no. 1 (2023): 1–18. http://dx.doi.org/10.4018/ijmcmc.306976.

Abstract:
Employing message queuing telemetry transport (MQTT) in the power distribution internet of things (PD-IoT) can meet the demands of reliable data transmission while significantly reducing energy consumption through the dynamic and flexible selection of three different quality of service (QoS) modes and power control. However, there are still some challenges, including incomplete information, coupling of optimization variables, and dynamic tradeoff between packet-loss ratio and energy consumption. In this paper, the authors propose a joint optimization algorithm named EMMA for MQTT QoS mode sele
9

Zhang, Lingxiang. "Analyzing the strengths and weaknesses of diverse algorithms for solving Multi-Armed Bandit problems using Python." Applied and Computational Engineering 68, no. 1 (2024): 205–14. http://dx.doi.org/10.54254/2755-2721/68/20241407.

Abstract:
With the rapid advancement of science and technology, the internet has become an integral part of daily life, revolutionizing how people access information and make decisions. In this context, algorithms play a pivotal role in helping individuals make informed choices tailored to their preferences across various domains. Utilizing the MovieLens dataset (https://grouplens.org/datasets/movielens/1m/), which contains a rich compilation of movie ratings and metadata, this study conducts a thorough analysis using Python to assess the performance of four distinct algorithms: Explore-then-Commit (ETC
10

Malon, Krzysztof. "Evaluation of Radio Channel Utility using Epsilon-Greedy Action Selection." Journal of Telecommunications and Information Technology 3, no. 2021 (2021): 10–17. http://dx.doi.org/10.26636/jtit.2021.153621.

11

Tran, T. D., and A. E. Koucheryavy. "Resource Optimization of Airborne Base Stations Using Artificial Intelligence Methods." Proceedings of Telecommunication Universities 11, no. 1 (2025): 62–68. https://doi.org/10.31854/1813-324x-2025-11-1-62-68.

Abstract:
In remote areas and disaster-stricken regions, unmanned aerial vehicles (UAVs) can serve as base stations, providing wireless communication to ground users. Due to their high mobility, low cost, and rapid deployment and retrieval capabilities, UAVs can continuously adjust their position in three-dimensional (3D) space, improving wireless connectivity and enhancing data transmission rates. In this paper, we investigate the problem of ABS (Aerial Base Station) deployment in 3D space and power allocation with the aim of maximizing the data transmission rate in the system. To address this non-conv
12

Tian, Chuan. "Monte-Carlo tree search with Epsilon-Greedy for game of amazons." Applied and Computational Engineering 6, no. 1 (2023): 904–9. http://dx.doi.org/10.54254/2755-2721/6/20230956.

Abstract:
The Game of the Amazons is an abstract strategy board game. It has a high computational complexity similar to the game of Go. Due to its NP-complete nature and the large branching factor of its game tree, finding the optimal move given a specific game state is infeasible, and it is not trivial to design a computer algorithm that is competitive with an expert in the Game of the Amazons. One way to tackle this problem is to leverage Monte-Carlo Tree Search by using random simulations. In this article, a computationally cheap heuristic function is proposed and used together with the Monte-Carlo Tree Search algor
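The idea of biasing MCTS playouts with ε-greedy move selection can be sketched as below; `legal_moves` and `heuristic` are hypothetical stand-ins for the paper's move generator and cheap evaluation function, not its actual code.

```python
import random

def epsilon_greedy_playout_move(state, legal_moves, heuristic, epsilon=0.2):
    """One simulation step: mostly follow the cheap heuristic, occasionally play at random."""
    moves = legal_moves(state)
    if random.random() < epsilon:
        return random.choice(moves)                       # random playout move (explore)
    return max(moves, key=lambda m: heuristic(state, m))  # heuristic-greedy move (exploit)
```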
13

Yu, Junpu. "Thompson ε-Greedy Algorithm: An Improvement to the Regret of Thompson Sampling and ε-Greedy on Multi-Armed Bandit Problems." Applied and Computational Engineering 8, no. 1 (2023): 525–34. http://dx.doi.org/10.54254/2755-2721/8/20230264.

Abstract:
The multi-armed bandit problem is one of the most classic reinforcement learning problems, aiming to find balanced decisions of exploration and exploitation and to increase the total reward of the actions from each round. To solve multi-armed bandit problems, algorithms were designed, including some of the most typical and widely used ones, like the Explore-Then-Commit algorithm, Upper Confidence Bound algorithm, Epsilon-Greedy algorithm, and Thompson Sampling algorithm. Some of them are improvements upon others, while all of them seek to increase total reward but contain specific weaknesses.
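For reference, the regret that such comparisons are judged on is the gap between always playing the best arm and the arms actually chosen. A toy computation, with made-up arm means:

```python
def cumulative_regret(chosen_means, best_mean):
    """Sum over rounds of (mean of the best arm - mean of the arm actually pulled)."""
    return sum(best_mean - mu for mu in chosen_means)

# Example: pulling a 0.5-mean arm for 100 rounds when the best arm pays 0.9 on average
print(cumulative_regret([0.5] * 100, 0.9))  # ≈ 40.0 regret over 100 rounds
```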
14

Fei, Bo. "Comparative analysis and applications of classic multi-armed bandit algorithms and their variants." Applied and Computational Engineering 68, no. 1 (2024): 17–30. http://dx.doi.org/10.54254/2755-2721/68/20241389.

Abstract:
The multi-armed bandit problem, a pivotal aspect of Reinforcement Learning (RL), presents a classic dilemma in sequential decision-making, balancing exploration with exploitation. Renowned bandit algorithms like Explore-Then-Commit, Epsilon-Greedy, SoftMax, Upper Confidence Bound (UCB), and Thompson Sampling have demonstrated efficacy in addressing this issue. Nevertheless, each algorithm exhibits unique strengths and weaknesses, necessitating a detailed comparative evaluation. This paper executes a series of implementations of various established bandit algorithms and their derivatives, aimin
15

N, Hariharan, and Paavai Anand G. "A Brief Study of Deep Reinforcement Learning with Epsilon-Greedy Exploration." International Journal of Computing and Digital Systems 11, no. 1 (2022): 541–51. http://dx.doi.org/10.12785/ijcds/110144.

16

Gu, Jiahao. "Assessing the robustness of Multi-Armed Bandit algorithms against biased initialization." Applied and Computational Engineering 54, no. 1 (2024): 213–18. http://dx.doi.org/10.54254/2755-2721/54/20241586.

Abstract:
The robustness of Multi-Armed Bandit (MAB) algorithms forms a cornerstone of the efficacy of contemporary recommender systems. This study provides a comparative analysis of four widely-adopted MAB algorithms (Epsilon Greedy, Explore Then Commit (ETC), Upper Confidence Bound (UCB1), and Thompson Sampling) under the influence of biased initialization. Conducted in a simulated environment that mirrors practical recommender scenarios, the study examines the adaptive responses of these algorithms over time, quantifying their performance using cumulative regret as a primary metric. Our findings indicate
17

Senthil Kumar, S., Nada Alzaben, A. Sridevi, and V. Ranjith. "Improving Quality of Service (QoS) in Wireless Multimedia Sensor Networks using Epsilon Greedy Strategy." Measurement Science Review 24, no. 3 (2024): 113–17. http://dx.doi.org/10.2478/msr-2024-0016.

Abstract:
Wireless Multimedia Sensor Networks (WMSNs) are networks consisting of sensors that have limitations in terms of memory, computational power, bandwidth and battery life. Multimedia transmission using a Wireless Sensor Network (WSN) is a difficult task because certain Quality of Service (QoS) guarantees are required. These guarantees include a large quantity of bandwidth, rigorous latency requirements, improved packet delivery and a lower loss ratio. The main area of research would be to investigate the process of greedy techniques that could be modified to guarantee QoS provisioning for m
18

Daru, April Firman, Kristoko Dwi Hartomo, and Hindriyanto Dwi Purnomo. "IPv6 flood attack detection based on epsilon greedy optimized Q learning in single board computer." International Journal of Electrical and Computer Engineering (IJECE) 13, no. 5 (2023): 5782–91. https://doi.org/10.11591/ijece.v13i5.pp5782-5791.

Abstract:
Internet of things is a technology that allows communication between devices within a network. Since this technology depends on a network to communicate, the vulnerability of the exposed devices increased significantly. Furthermore, the use of internet protocol version 6 (IPv6) as the successor to internet protocol version 4 (IPv4) as a communication protocol constituted a significant problem for the network. Hence, this protocol was exploitable for flooding attacks in the IPv6 network. As a countermeasure against the flood, this study designed an IPv6 flood attack detection by using epsilon g
19

Daru, April Firman, Kristoko Dwi Hartomo, and Hindriyanto Dwi Purnomo. "IPv6 flood attack detection based on epsilon greedy optimized Q learning in single board computer." International Journal of Electrical and Computer Engineering (IJECE) 13, no. 5 (2023): 5782. http://dx.doi.org/10.11591/ijece.v13i5.pp5782-5791.

Abstract:
Internet of things is a technology that allows communication between devices within a network. Since this technology depends on a network to communicate, the vulnerability of the exposed devices increased significantly. Furthermore, the use of internet protocol version 6 (IPv6) as the successor to internet protocol version 4 (IPv4) as a communication protocol constituted a significant problem for the network. Hence, this protocol was exploitable for flooding attacks in the IPv6 network. As a countermeasure against the flood, this study designed an IPv6 flood attack det
20

Pazis, Jason, and Ronald Parr. "PAC Optimal Exploration in Continuous Space Markov Decision Processes." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (2013): 774–81. http://dx.doi.org/10.1609/aaai.v27i1.8678.

Abstract:
Current exploration algorithms can be classified in two broad categories: Heuristic, and PAC optimal. While numerous researchers have used heuristic approaches such as epsilon-greedy exploration successfully, such approaches lack formal, finite sample guarantees and may need a significant amount of fine-tuning to produce good results. PAC optimal exploration algorithms, on the other hand, offer strong theoretical guarantees but are inapplicable in domains of realistic size. The goal of this paper is to bridge the gap between theory and practice, by introducing C-PACE, an algorithm which offers
21

Ouyang, Enqi. "Tackling the cold start issue in movie recommendations with a refined epsilon-greedy approach." Applied and Computational Engineering 54, no. 1 (2024): 21–29. http://dx.doi.org/10.54254/2755-2721/54/20241140.

Abstract:
With the rapid growth of the Internet and the consequent surge in data, the current era is characterized by information overload. As the domain of data processing and storage expands, recommendation systems have become pivotal tools in navigating this deluge, assisting users in filtering through vast information landscapes. A notable segment of this is movie recommendation systems. As living standards rise, so does the demand for cinematic experiences. Enhancing and refining the methodologies of these recommendation systems is, therefore, of significant value. However, a consistent challenge i
22

Song, Ruibo. "Optimizing decision-making in uncertain environments through analysis of stochastic stationary Multi-Armed Bandit algorithms." Applied and Computational Engineering 68, no. 1 (2024): 93–113. http://dx.doi.org/10.54254/2755-2721/68/20241406.

Abstract:
Reinforcement learning traditionally plays a pivotal role in artificial intelligence and various practical applications, focusing on the interaction between an agent and its environment. Within this broad field, the multi-armed bandit (MAB) problem represents a specific subset, characterized by a sequential interaction between a learner and an environment where the agent's actions do not alter the environment or reward distributions. MABs are prevalent in recommendation systems and advertising and are increasingly applied in sectors like agriculture and adaptive clinical trials. The stochastic
23

Zhang, Qinchuan. "Multi-Armed Bandit Algorithms: Analysis and Applications Across Domains." Highlights in Science, Engineering and Technology 94 (April 26, 2024): 170–74. http://dx.doi.org/10.54097/apzhv358.

Abstract:
This study provides an in-depth exploration of the pivotal role of Multi-Armed Bandit (MAB) Algorithms in decision-making across diverse sectors, focusing on their theoretical foundations, real-world applications, and empirical evidence. MAB Algorithms, metaphorically representing choices among various slot machine arms with different rewards, are crucial in optimizing decisions in uncertain settings by striking a balance between exploration and exploitation. It examines four principal algorithms—Greedy, Epsilon-Greedy, Upper Confidence Bound, and Thompson Sampling—each tailored for specific t
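As a quick reference for the Upper Confidence Bound method named here, the UCB1 index adds an optimism bonus to each arm's empirical mean. A sketch of the standard form, not code from the paper:

```python
import math

def ucb1_index(mean_reward, pulls, total_pulls):
    """UCB1 score for one arm: empirical mean plus a bonus that shrinks as the arm is pulled."""
    return mean_reward + math.sqrt(2.0 * math.log(total_pulls) / pulls)

# Each round the arm with the highest index is pulled; unpulled arms are tried once first,
# since the index above is only defined for pulls >= 1.
```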
24

Zhang, Shengshi. "Optimizing Data Filtering in Multi-Armed Bandit Algorithms for Reinforcement Learning." ITM Web of Conferences 73 (2025): 01024. https://doi.org/10.1051/itmconf/20257301024.

Abstract:
This study investigates the performance of data filtering algorithms in multi-armed bandit (MAB) problems for reinforcement learning applications. It focuses on five algorithms: Epsilon-Greedy (ε-Greedy), Upper Confidence Bound (UCB), Linear Upper Confidence Bound (LinUCB), Thompson Sampling, and Linear Thompson Sampling (LinTS). The algorithms were evaluated in static and dynamic environments using the MovieLens dataset, transformed to binary rewards to measure performance. Each algorithm was tested in simulations with 1,000 interactions and compared on cumulative reward, accuracy, and
25

Qiu, Zirou, Chen Chen, Madhav Marathe, et al. "Finding Nontrivial Minimum Fixed Points in Discrete Dynamical Systems: Complexity, Special Case Algorithms and Heuristics." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 9 (2022): 9422–30. http://dx.doi.org/10.1609/aaai.v36i9.21174.

Abstract:
Networked discrete dynamical systems are often used to model the spread of contagions and decision-making by agents in coordination games. Fixed points of such dynamical systems represent configurations to which the system converges. In the dissemination of undesirable contagions (such as rumors and misinformation), convergence to fixed points with a small number of affected nodes is a desirable goal. Motivated by such considerations, we formulate a novel optimization problem of finding a nontrivial fixed point of the system with the minimum number of affected nodes. We establish that, unless
26

Feng, Yunhe, and Chirag Shah. "Has CEO Gender Bias Really Been Fixed? Adversarial Attacking and Improving Gender Fairness in Image Search." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 11 (2022): 11882–90. http://dx.doi.org/10.1609/aaai.v36i11.21445.

Abstract:
Gender bias is one of the most common and well-studied demographic biases in information retrieval, and in AI systems in general. After discovering and reporting that gender bias for certain professions could change searchers' worldviews, mainstream image search engines, such as Google, quickly took action to correct and fix such a bias. However, given the nature of these systems, viz., being opaque, it is unclear if they addressed unequal gender representation and gender stereotypes in image search results systematically and in a sustainable way. In this paper, we propose adversarial attac
27

Guo, Yushi. "Strategy Selection Using Multi-Armed Bandit Algorithms in Financial Markets." Applied and Computational Engineering 83, no. 1 (2024): 81–93. http://dx.doi.org/10.54254/2755-2721/83/2024glg0075.

Abstract:
This paper aims to evaluate the effectiveness of Multi-Armed Bandit (MAB) algorithms in choosing the optimal trading strategy among suboptimal ones within financial markets. The research aims to address the challenge of adapting to dynamic market conditions. By introducing a Composite Trading Strategy that integrates trend-following, mean reversion, and momentum strategies, the study investigates whether increased trading frequency enhances the profitability of various MAB algorithms, including UCB, Thompson Sampling, and epsilon-greedy. The experimental results indicate
28

Ameen, Salem, and Sunil Vadera. "Pruning Neural Networks Using Multi-Armed Bandits." Computer Journal 63, no. 7 (2019): 1099–108. http://dx.doi.org/10.1093/comjnl/bxz078.

Abstract:
The successful application of deep learning has led to increasing expectations of its use in embedded systems. This, in turn, has created the need to find ways of reducing the size of neural networks. Decreasing the size of a neural network requires deciding which weights should be removed without compromising accuracy, which is analogous to the kind of problems addressed by multi-armed bandits (MABs). Hence, this paper explores the use of MABs for reducing the number of parameters of a neural network. Different MAB algorithms, namely ε-greedy, win-stay, lose-shift, UCB1,
29

Bui, Van-Hai, Akhtar Hussain, and Hak-Man Kim. "Q-Learning-Based Operation Strategy for Community Battery Energy Storage System (CBESS) in Microgrid System." Energies 12, no. 9 (2019): 1789. http://dx.doi.org/10.3390/en12091789.

Abstract:
Energy management systems (EMSs) of microgrids (MGs) can be broadly categorized as centralized or decentralized EMSs. The centralized approach may not be suitable for a system having several entities that have their own operation objectives. On the other hand, the use of the decentralized approach leads to an increase in the operation cost due to local optimization. In this paper, both centralized and decentralized approaches are combined for managing the operation of a distributed system, which is comprised of an MG and a community battery storage system (CBESS). The MG is formed by grouping
30

Lin, Yiheng. "Finding the best opening in chess with multi-armed bandit algorithm." Applied and Computational Engineering 13, no. 1 (2023): 21–28. http://dx.doi.org/10.54254/2755-2721/13/20230704.

Abstract:
With many gaming AIs being developed and able to defeat top human players in recent years, AI has once again become the hottest topic in research and even in our daily life. This paper likewise researches gaming AI, specifically chess. Instead of using deep learning and the Monte Carlo Search algorithm, this paper focuses on the opening only, with multi-armed bandit algorithms to find the best moves and openings. Specifically, the methods used in this paper are epsilon greedy and Thompson sampling. The dataset used in this paper is from Kaggle. This paper considers each move as a set of choices one need
31

Deshpande, Mihir. "Deep Reinforcement Learning for Supply Chain Optimization: A DQN and LSTM-Based Approach." International Journal for Research in Applied Science and Engineering Technology 13, no. 4 (2025): 3794–803. https://doi.org/10.22214/ijraset.2025.69117.

Abstract:
Effective inventory management is essential for optimizing supply chains, balancing stock levels, minimizing holding costs, and preventing stockouts. Traditional forecasting and rule-based systems often fail to adapt to real-time demand fluctuations and supply uncertainties. In this research, we propose a Reinforcement Learning (RL)-based approach for dynamic inventory optimization, leveraging Deep Q-Networks (DQN) alongside Multi-Armed Bandit (MAB) strategies such as Epsilon-Greedy, Upper Confidence Bound (UCB), KL-UCB, and Thompson Sampling. The DQN agent learns an optimal replenishment policy by inte
32

Jin, Tianyuan, Hao-Lun Hsu, William Chang, and Pan Xu. "Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 11 (2024): 12956–64. http://dx.doi.org/10.1609/aaai.v38i11.29193.

Abstract:
We study the multi-agent multi-armed bandit (MAMAB) problem, where agents are factored into overlapping groups. Each group represents a hyperedge, forming a hypergraph over the agents. At each round of interaction, the learner pulls a joint arm (composed of individual arms for each agent) and receives a reward according to the hypergraph structure. Specifically, we assume there is a local reward for each hyperedge, and the reward of the joint arm is the sum of these local rewards. Previous work introduced the multi-agent Thompson sampling (MATS) algorithm and derived a Bayesian regret bound. H
33

Mintz, Yonatan, Anil Aswani, Philip Kaminsky, Elena Flowers, and Yoshimi Fukuoka. "Nonstationary Bandits with Habituation and Recovery Dynamics." Operations Research 68, no. 5 (2020): 1493–516. http://dx.doi.org/10.1287/opre.2019.1918.

Abstract:
In many sequential decision-making settings where there is uncertainty about the reward of each action, frequent selection of specific actions may reduce expected reward, while choosing less frequently selected actions could lead to an increase. These effects are commonly observed in settings ranging from personalized healthcare interventions to targeted online advertising. To address this problem, the authors propose a new class of models called ROGUE (reducing or gaining unknown efficacy) multiarmed bandits. In the paper, the authors present a maximum likelihood approach to estimate the para
34

Karimi, Maryam, Reza Javidan, and Manijeh Keshtgari. "A New Method for Intelligent Message Network Management in Ubiquitous Sensor Networks." Computer Engineering and Applications Journal 3, no. 3 (2014): 139–46. http://dx.doi.org/10.18495/comengapp.v3i3.69.

Abstract:
Ubiquitous Sensor Network (USN) computing is a useful technology for autonomic integration in different environments and can be available anywhere. Managing a USN plays an important role in the availability of nodes and paths. In order to manage nodes, there is a cyclic route that starts from the manager, passes through the nodes, and comes back to the manager as feedback. In this paper, a new self-optimizing method is presented for finding this cyclic path by combining epsilon greedy and genetic algorithms; it is then compared with other well-known methods in terms of the cost of the route they find and the power consumption. Th
35

Huang, Wuyue, Wenling Wang, Yudong Wu, and Chuheng Xi. "Comparative Study of Multi-Armed Bandit Algorithms in Clinical Trials." Applied and Computational Engineering 83, no. 1 (2024): 45–51. http://dx.doi.org/10.54254/2755-2721/83/2024glg0067.

Abstract:
In recent years, with the rapid development of the information age, the influence of Multi-Armed Bandit (MAB) models in clinical trials for disease prevention has been increasing. In this study, the Python programming language is used to implement the MAB framework with the Upper Confidence Bound (UCB), Adaptive Epsilon-Greedy, and Thompson Sampling (TS) algorithms to validate the idea of preventing, controlling and predicting the occurrence of diseases. The results show that the MAB model can effectively solve various decision-making problems in clinical tr
36

Wang, Zhiwei, Huazheng Wang, and Hongning Wang. "Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (2024): 15770–77. http://dx.doi.org/10.1609/aaai.v38i14.29506.

Abstract:
Adversarial attacks against stochastic multi-armed bandit (MAB) algorithms have been extensively studied in the literature. In this work, we focus on reward poisoning attacks and find most existing attacks can be easily detected by our proposed detection method based on the test of homogeneity, due to their aggressive nature in reward manipulations. This motivates us to study the notion of stealthy attack against stochastic MABs and investigate the resulting attackability. Our analysis shows that against two popularly employed MAB algorithms, UCB1 and ε-greedy, the success of a stealt
37

Dowlatshahi, Mohammad, Vali Derhami, and Hossein Nezamabadi-pour. "Ensemble of Filter-Based Rankers to Guide an Epsilon-Greedy Swarm Optimizer for High-Dimensional Feature Subset Selection." Information 8, no. 4 (2017): 152. http://dx.doi.org/10.3390/info8040152.

38

Ghulam Mustafa, Hammad. "Self-Operating Stock Exchange – A Deep Reinforcement Learning Approach." UMT Artificial Intelligence Review 1, no. 1 (2021): 1. http://dx.doi.org/10.32350/umtair.11.02.

Abstract:
Stock trading approaches play an important role in equity. However, it is tough to create a financially beneficial approach in a complicated and evolving stock market. In this manuscript, we suggest an epsilon greedy policy in our DQN prototype that allows the agent to obtain an effective policy that optimizes the expected total reward across sequential steps from the present state, i.e., to maximize the state-action-value function Q(s, a) by engaging with the environment, recommending when to buy, sell or hold. In this prototype, the state depends on rout
39

de Curtò, J., I. de Zarzà, Gemma Roig, Juan Carlos Cano, Pietro Manzoni, and Carlos T. Calafate. "LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments." Electronics 12, no. 13 (2023): 2814. http://dx.doi.org/10.3390/electronics12132814.

Abstract:
In this paper, we introduce an innovative approach to handling the multi-armed bandit (MAB) problem in non-stationary environments, harnessing the predictive power of large language models (LLMs). With the realization that traditional bandit strategies, including epsilon-greedy and upper confidence bound (UCB), may struggle in the face of dynamic changes, we propose a strategy informed by LLMs that offers dynamic guidance on exploration versus exploitation, contingent on the current state of the bandits. We bring forward a new non-stationary bandit model with fluctuating reward distributions a
40

Jadhav, Varsha D., et al. "Understanding the Order of 500 and 1000 Rupees Notes Ban using Reinforcement Learning." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 10 (2023): 2482–88. http://dx.doi.org/10.17762/ijritcc.v11i10.9047.

Abstract:
In the field of machine learning called reinforcement learning, complicated sequential decision-making problems have been addressed. The issue that arises when an agent learns behavior through trial-and-error runs to determine the ideal policy, or the sequence of behaviors so that rewards are maximized, is known as reinforcement learning. Because many reinforcement learning methods use dynamic programming approaches, the environment is characterized as a Markov Decision Process (MDP). The research presents reinforcement learning using Bigram, trigram, and 4-gram models for tweets collected for "500
41

Czech, Johannes, Patrick Korus, and Kristian Kersting. "Improving AlphaZero Using Monte-Carlo Graph Search." Proceedings of the International Conference on Automated Planning and Scheduling 31 (May 17, 2021): 103–11. http://dx.doi.org/10.1609/icaps.v31i1.15952.

Abstract:
The AlphaZero algorithm has been successfully applied in a range of discrete domains, most notably board games. It utilizes a neural network that learns a value and policy function to guide the exploration in a Monte-Carlo Tree Search. Although many search improvements such as graph search have been proposed for Monte-Carlo Tree Search in the past, most of them refer to an older variant of the Upper Confidence bounds for Trees algorithm that does not use a policy for planning. We improve the search algorithm for AlphaZero by generalizing the search tree to a directed acyclic graph. This enable
42

Shen, Tongle. "Adaptive Game Mechanics: Leveraging Multi-Armed Bandits for Dynamic Difficulty Adjustment." Applied and Computational Engineering 105, no. 1 (2024): 117–22. https://doi.org/10.54254/2755-2721/2024.tj17900.

Abstract:
Dynamic Difficulty Adjustment (DDA) is a crucial task in video game design to select the appropriate difficulty level to maintain player engagement. Recent studies have highlighted the importance of developing adaptive systems that balance challenge and enjoyment. This paper explores the application of Multi-Armed Bandit algorithms to DDA, providing a computationally efficient and explainable solution to real-time difficulty adjustment. A numerical game engine was built with a damage calculator and a linear attribute generator for players and monsters. The study tested four Multi-Armed Bandits
43

Gupta, Siddharth. "Scaling and Optimizing Consumer Tech Products with Multi-Armed Bandit Algorithms: Applications in eCommerce." International Journal of Scientific Research in Computer Science, Engineering and Information Technology 11, no. 2 (2025): 275–86. https://doi.org/10.32628/cseit251112370.

Abstract:
This article explores the application of Multi-Armed Bandit (MAB) algorithms in optimizing consumer tech products, with a particular focus on eCommerce platforms. It provides a comprehensive overview of the theoretical framework behind MAB algorithms, including the exploration-exploitation trade-off and comparisons with traditional A/B testing methods. The article delves into various MAB strategies commonly used in eCommerce, such as epsilon-greedy, Upper Confidence Bound (UCB), and Thompson Sampling, and examines their applications in personalized product recommendations, dynamic pric
44

Wang, Dongjie, Pengyang Wang, Kunpeng Liu, Yuanchun Zhou, Charles E. Hughes, and Yanjie Fu. "Reinforced Imitative Graph Representation Learning for Mobile User Profiling: An Adversarial Training Perspective." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 5 (2021): 4410–17. http://dx.doi.org/10.1609/aaai.v35i5.16567.

Abstract:
In this paper, we study the problem of mobile user profiling, which is a critical component for quantifying users' characteristics in the human mobility modeling pipeline. Human mobility is a sequential decision-making process dependent on the users' dynamic interests. With accurate user profiles, the predictive model can perfectly reproduce users' mobility trajectories. In the reverse direction, once the predictive model can imitate users' mobility patterns, the learned user profiles are also optimal. Such intuition motivates us to propose an imitation-based mobile user profiling framework by
45

Madarasi, Péter. "Matchings under distance constraints I." Annals of Operations Research 305, no. 1-2 (2021): 137–61. http://dx.doi.org/10.1007/s10479-021-04127-8.

Abstract:
This paper introduces the d-distance matching problem, in which we are given a bipartite graph $G=(S,T;E)$ with $S=\{s_1,\dots,s_n\}$, a weight function on the edges and an integer $d\in \mathbb{Z}_+$. The goal is to find a maximum-weight subset $M\subseteq E$ of the edges satisfying the following two conditions: (i) the degree of every node of S is at most one in M, (ii) if $s_i t, s_j t\in M$, then $|j-i|\ge d$. This question arises naturally, for example, in various scheduling problems
46

Valenzano, Richard, Nathan Sturtevant, Jonathan Schaeffer, and Fan Xie. "A Comparison of Knowledge-Based GBFS Enhancements and Knowledge-Free Exploration." Proceedings of the International Conference on Automated Planning and Scheduling 24 (May 11, 2014): 375–79. http://dx.doi.org/10.1609/icaps.v24i1.13681.

Abstract:
GBFS-based satisficing planners often augment their search with knowledge-based enhancements such as preferred operators and multiple heuristics. These techniques seek to improve planner performance by making the search more informed. In our work, we will focus on how these enhancements impact coverage and we will use a simple technique called epsilon-greedy node selection to demonstrate that planner coverage can also be improved by introducing knowledge-free random exploration into the search. We then revisit the existing knowledge-based enhancements so as to determine if the knowledge these
47

Kletzander, Lucas, and Nysret Musliu. "Large-State Reinforcement Learning for Hyper-Heuristics." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 10 (2023): 12444–52. http://dx.doi.org/10.1609/aaai.v37i10.26466.

Abstract:
Hyper-heuristics are a domain-independent problem solving approach where the main task is to select effective chains of problem-specific low-level heuristics on the fly for an unseen instance. This task can be seen as a reinforcement learning problem; however, the information available to the hyper-heuristic is very limited, usually leading to very limited state representations. In this work, for the first time we use the trajectory of solution changes for a larger set of features for reinforcement learning in the novel hyper-heuristic LAST-RL (Large-State Reinforcement Learning). Further, we
48

An, Lei. "Multi-Armed Bandit Algorithms: Innovations and Applications in Dynamic Environments." Highlights in Science, Engineering and Technology 94 (April 26, 2024): 236–40. http://dx.doi.org/10.54097/3n7ctj84.

Abstract:
This paper delves into the fundamental concept of the Multi-Armed Bandit (MAB) problem, structuring its analysis around two primary phases. The initial phase, exploration, is dedicated to investigating the potential rewards of each arm. Subsequently, the exploitation phase utilizes insights from exploration to maximize returns. The discussion then progresses to elucidate the core methodologies and workflows of three principal MAB algorithms: Upper Confidence Bound (UCB), Thompson Sampling, and Epsilon-Greedy. These algorithms are meticulously analyzed for their unique approaches and efficienci
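To complement the overview above, a minimal Beta-Bernoulli Thompson Sampling step, assuming binary rewards purely for illustration:

```python
import random

def thompson_sampling_choice(successes, failures):
    """Draw one sample from each arm's Beta(s + 1, f + 1) posterior and play the argmax arm."""
    draws = [random.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda i: draws[i])

# Example with three arms: 8/10, 4/10 and 1/2 observed successes
print(thompson_sampling_choice([8, 4, 1], [2, 6, 1]))  # usually 0, sometimes 2 (still uncertain)
```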
49

Han, Huiyan, Jiaqi Wang, Liqun Kuang, Xie Han, and Hongxin Xue. "Improved Robot Path Planning Method Based on Deep Reinforcement Learning." Sensors 23, no. 12 (2023): 5622. http://dx.doi.org/10.3390/s23125622.

Abstract:
With the advancement of robotics, the field of path planning is currently experiencing a period of prosperity. Researchers strive to address this nonlinear problem and have achieved remarkable results through the implementation of the Deep Reinforcement Learning (DRL) algorithm DQN (Deep Q-Network). However, persistent challenges remain, including the curse of dimensionality, difficulties of model convergence and sparsity in rewards. To tackle these problems, this paper proposes an enhanced DDQN (Double DQN) path planning approach, in which the information after dimensionality reduction is fed
50

El Wafi, Mouna, My Abdelkader Youssefi, Rachid Dakir, and Mohamed Bakir. "Intelligent Robot in Unknown Environments: Walk Path Using Q-Learning and Deep Q-Learning." Automation 6, no. 1 (2025): 12. https://doi.org/10.3390/automation6010012.

Abstract:
Autonomous navigation is essential for mobile robots to efficiently operate in complex environments. This study investigates Q-learning and Deep Q-learning to improve navigation performance. The research examines their effectiveness in complex maze configurations, focusing on how the epsilon-greedy strategy influences the agent’s ability to reach its goal in minimal time using Q-learning. A distinctive aspect of this work is the adaptive tuning of hyperparameters, where alpha and gamma values are dynamically adjusted throughout training. This eliminates the need for manually fixed parameters a
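One common way to make ε-greedy exploration adaptive, in the spirit of the dynamic tuning described here; the exponential schedule below is a generic assumption, not the authors' exact rule:

```python
import math

def decayed_epsilon(step, eps_start=1.0, eps_min=0.05, decay_rate=0.001):
    """Anneal epsilon from eps_start toward eps_min so early episodes explore, later ones exploit."""
    return eps_min + (eps_start - eps_min) * math.exp(-decay_rate * step)

# e.g. decayed_epsilon(0) == 1.0, decayed_epsilon(5000) ≈ 0.056
```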