Academic literature on the topic 'State Action Reward State Action'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'State Action Reward State Action.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "State Action Reward State Action"

1

Xu, Kuang, and Se-Young Yun. "Reinforcement with Fading Memories." Mathematics of Operations Research 45, no. 4 (2020): 1258–88. http://dx.doi.org/10.1287/moor.2019.1031.

Full text
Abstract:
We study the effect of imperfect memory on decision making in the context of a stochastic sequential action-reward problem. An agent chooses a sequence of actions, which generate discrete rewards at different rates. She is allowed to make new choices at rate β, whereas past rewards disappear from her memory at rate μ. We focus on a family of decision rules where the agent makes a new choice by randomly selecting an action with a probability approximately proportional to the amount of past rewards associated with each action in her memory. We provide closed form formulas for the agent’s steady-
APA, Harvard, Vancouver, ISO, and other styles
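As a rough, discrete-time illustration of the decision rule described in this abstract (new choices made at rate β, remembered rewards fading at rate μ, actions re-chosen with probability proportional to remembered rewards), one might simulate the process as below; all numerical values are assumptions, and the paper itself analyses the continuous-time model.

```python
import random

# Discrete-time sketch of a fading-memory action-reward process
# (illustrative parameters only; not taken from the paper).
reward_rates = [0.3, 0.7]      # assumed Bernoulli reward rate per action
beta, mu = 0.05, 0.01          # assumed choice rate and memory-decay rate
memory = [1e-3, 1e-3]          # remembered reward mass per action
action = 0

for t in range(100_000):
    # Rewards accumulate for the currently chosen action...
    if random.random() < reward_rates[action]:
        memory[action] += 1.0
    # ...while all remembered rewards fade at rate mu.
    memory = [m * (1.0 - mu) for m in memory]
    # At rate beta the agent re-chooses, with probability proportional
    # to the remembered rewards associated with each action.
    if random.random() < beta:
        threshold = random.random() * sum(memory)
        action = 0 if threshold < memory[0] else 1

print("remembered rewards:", memory, "current action:", action)
```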
2

Jahn, Caroline I., Chiara Varazzani, Jérôme Sallet, Mark E. Walton, and Sébastien Bouret. "Noradrenergic But Not Dopaminergic Neurons Signal Task State Changes and Predict Reengagement After a Failure." Cerebral Cortex 30, no. 9 (2020): 4979–94. http://dx.doi.org/10.1093/cercor/bhaa089.

Full text
Abstract:
The two catecholamines, noradrenaline and dopamine, have been shown to play comparable roles in behavior. Both noradrenergic and dopaminergic neurons respond to cues predicting reward availability and novelty. However, even though both are thought to be involved in motivating actions, their roles in motivation have seldom been directly compared. We therefore examined the activity of putative noradrenergic neurons in the locus coeruleus and putative midbrain dopaminergic neurons in monkeys cued to perform effortful actions for rewards. The activity in both regions correlated with engag
APA, Harvard, Vancouver, ISO, and other styles
3

Anselmi, Jonatha, François Dufour, and Tomás Prieto-Rumeau. "Computable approximations for average Markov decision processes in continuous time." Journal of Applied Probability 55, no. 2 (2018): 571–92. http://dx.doi.org/10.1017/jpr.2018.36.

Full text
Abstract:
In this paper we study the numerical approximation of the optimal long-run average cost of a continuous-time Markov decision process, with Borel state and action spaces, and with bounded transition and reward rates. Our approach uses a suitable discretization of the state and action spaces to approximate the original control model. The approximation error for the optimal average reward is then bounded by a linear combination of coefficients related to the discretization of the state and action spaces, namely, the Wasserstein distance between an underlying probability measure μ and a m
APA, Harvard, Vancouver, ISO, and other styles
4

Sarhan, Shahenda, Mohamed Abu ElSoud, and Hebatullah Rashed. "Enhancing Video Games Policy Based on Least-Squares Continuous Action Policy Iteration: Case Study on StarCraft Brood War and Glest RTS Games and the 8 Queens Board Game." International Journal of Computer Games Technology 2016 (2016): 1–14. http://dx.doi.org/10.1155/2016/7090757.

Full text
Abstract:
With the rapid advent of video games recently and the increasing numbers of players and gamers, only a tough game with high policy, actions, and tactics survives. How the game responds to opponent actions is the key issue of popular games. Many algorithms were proposed to solve this problem such as Least-Squares Policy Iteration (LSPI) and State-Action-Reward-State-Action (SARSA) but they mainly depend on discrete actions, while agents in such a setting have to learn from the consequences of their continuous actions, in order to maximize the total reward over time. So in this paper we proposed
APA, Harvard, Vancouver, ISO, and other styles
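Since SARSA (State-Action-Reward-State-Action), the topic of this collection, is contrasted with LSPI in this abstract, a minimal tabular sketch of its on-policy update may help orient readers; the environment interface (reset/step/actions) and the hyperparameter values are assumptions, not taken from the cited paper.

```python
import random
from collections import defaultdict

def sarsa(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular SARSA: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)).

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done), with env.actions a list.
    """
    Q = defaultdict(float)

    def choose(state):
        # Epsilon-greedy behaviour policy; SARSA evaluates the policy it follows.
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        state = env.reset()
        action = choose(state)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            next_action = choose(next_state)
            target = reward + (0.0 if done else gamma * Q[(next_state, next_action)])
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state, action = next_state, next_action
    return Q
```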
5

Hasselmo, Michael E. "A Model of Prefrontal Cortical Mechanisms for Goal-directed Behavior." Journal of Cognitive Neuroscience 17, no. 7 (2005): 1115–29. http://dx.doi.org/10.1162/0898929054475190.

Full text
Abstract:
Many behavioral tasks require goal-directed actions to obtain delayed reward. The prefrontal cortex appears to mediate many aspects of goal-directed decision making. This article presents a model of prefrontal cortex function emphasizing the influence of goal-related activity on the choice of the next motor output. The model can be interpreted in terms of key elements of Reinforcement Learning Theory. Different neocortical minicolumns represent distinct sensory input states and distinct motor output actions. The dynamics of each minicolumn include separate phases of encoding and retrieval. Dur
APA, Harvard, Vancouver, ISO, and other styles
6

Ma, Shuai, and Jia Yuan Yu. "State-Augmentation Transformations for Risk-Sensitive Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 4512–19. http://dx.doi.org/10.1609/aaai.v33i01.33014512.

Full text
Abstract:
In the framework of MDP, although the general reward function takes three arguments—current state, action, and successor state; it is often simplified to a function of two arguments—current state and action. The former is called a transition-based reward function, whereas the latter is called a state-based reward function. When the objective involves the expected total reward only, this simplification works perfectly. However, when the objective is risk-sensitive, this simplification leads to an incorrect value. We propose three successively more general state-augmentation transformations (SAT
APA, Harvard, Vancouver, ISO, and other styles
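The distinction this abstract draws between transition-based and state-based reward functions, and the idea of recovering the former via state augmentation, can be stated in a few lines; the grid-world state names and the particular augmentation below are illustrative assumptions, not the paper's SAT constructions.

```python
# Transition-based reward: depends on (state, action, successor state).
def r_transition(s, a, s_next):
    return 10.0 if s_next == "goal" else -1.0

# State-based reward: depends on (state, action) only.
def r_state(s, a):
    return 10.0 if s == "goal" else -1.0

# One simple augmentation idea: fold the previous transition into the state,
# so a state-based reward on the augmented chain reproduces the
# transition-based one.
def augment(s_prev, a_prev, s):
    return (s_prev, a_prev, s)

def r_state_augmented(s_aug, a):
    s_prev, a_prev, s = s_aug
    return r_transition(s_prev, a_prev, s)
```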
7

Beutler, Frederick J., and Keith W. Ross. "Time-average optimal constrained semi-Markov decision processes." Advances in Applied Probability 18, no. 02 (1986): 341–59. http://dx.doi.org/10.1017/s0001867800015792.

Full text
Abstract:
Optimal causal policies maximizing the time-average reward over a semi-Markov decision process (SMDP), subject to a hard constraint on a time-average cost, are considered. Rewards and costs depend on the state and action, and contain running as well as switching components. It is supposed that the state space of the SMDP is finite, and the action space compact metric. The policy determines an action at each transition point of the SMDP. Under an accessibility hypothesis, several notions of time average are equivalent. A Lagrange multiplier formulation involving a dynamic programming equation i
APA, Harvard, Vancouver, ISO, and other styles
8

Beutler, Frederick J., and Keith W. Ross. "Time-average optimal constrained semi-Markov decision processes." Advances in Applied Probability 18, no. 2 (1986): 341–59. http://dx.doi.org/10.2307/1427303.

Full text
Abstract:
Optimal causal policies maximizing the time-average reward over a semi-Markov decision process (SMDP), subject to a hard constraint on a time-average cost, are considered. Rewards and costs depend on the state and action, and contain running as well as switching components. It is supposed that the state space of the SMDP is finite, and the action space compact metric. The policy determines an action at each transition point of the SMDP. Under an accessibility hypothesis, several notions of time average are equivalent. A Lagrange multiplier formulation involving a dynamic programming equation is
APA, Harvard, Vancouver, ISO, and other styles
9

Archibald, Christopher, and Delma Nieves-Rivera. "Bayesian Execution Skill Estimation." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6014–21. http://dx.doi.org/10.1609/aaai.v33i01.33016014.

Full text
Abstract:
The performance of agents in many domains with continuous action spaces depends not only on their ability to select good actions to execute, but also on their ability to execute planned actions precisely. This ability, which has been called an agent’s execution skill, is an important characteristic of an agent which can have a significant impact on their success. In this paper, we address the problem of estimating the execution skill of an agent given observations of that agent acting in a domain. Each observation includes the executed action and a description of the state in which the action
APA, Harvard, Vancouver, ISO, and other styles
10

Kuroda, Seiya, Kazuteru Miyazaki, and Hiroaki Kobayashi. "Introduction of Fixed Mode States into Online Reinforcement Learning with Penalties and Rewards and its Application to Biped Robot Waist Trajectory Generation." Journal of Advanced Computational Intelligence and Intelligent Informatics 16, no. 6 (2012): 758–68. http://dx.doi.org/10.20965/jaciii.2012.p0758.

Full text
Abstract:
During a long-term reinforcement learning task, the efficiency of learning is heavily degraded because the probabilistic actions of an agent often cause the task to fail, which makes it difficult to reach the goal and receive a reward. To address this problem, a fixed mode state is proposed in this paper. If the agent acquires an adequate reward, a normal state is switched to a fixed mode state. In this mode, the agent selects an action using a greedy strategy, i.e., it selects the highest weight action deterministically. First, this paper combines Online Profit Sharing reinforcement learning
APA, Harvard, Vancouver, ISO, and other styles
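A toy version of the fixed mode state idea described in this abstract might look like the following: a state that has yielded an adequate reward is switched from probabilistic (roulette-wheel) action selection to deterministic greedy selection. The threshold, the weight representation, and the selection scheme are assumptions made for illustration.

```python
import random

def select_action(weights, fixed_mode):
    """weights: dict mapping action -> learned (positive) weight in the current state."""
    if fixed_mode:
        # Fixed mode: deterministically pick the highest-weight action.
        return max(weights, key=weights.get)
    # Normal mode: roulette-wheel selection proportional to the weights.
    threshold = random.random() * sum(weights.values())
    cumulative = 0.0
    for action, w in weights.items():
        cumulative += w
        if threshold <= cumulative:
            return action
    return action  # fallback for floating-point edge cases

# A state is promoted to fixed mode once it has yielded an adequate reward.
ADEQUATE_REWARD = 1.0          # assumed threshold
fixed_mode_states = set()

def update_mode(state, episode_reward):
    if episode_reward >= ADEQUATE_REWARD:
        fixed_mode_states.add(state)
```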
More sources

Dissertations / Theses on the topic "State Action Reward State Action"

1

Botelho Neto, Gutenberg Pessoa. "Aprendizado por esforço aplicado ao combate em jogos eletrônicos de estratégia em tempo real." Universidade Federal da Paraíba, 2014. http://tede.biblioteca.ufpb.br:8080/handle/tede/6128.

Full text
Abstract:
Electronic games and, in particular, real-time strategy (RTS) games, are increasingly seen as viable and important fields for artificial intelligence research because of commonly held characteristics, like the presence of complex environments, usually dynamic and with multiple agents. In commercial RTS games, the computer beh
APA, Harvard, Vancouver, ISO, and other styles
2

Au, Manix. "Automatic State Construction using Decision Trees for Reinforcement Learning Agents." Queensland University of Technology, 2005. http://eprints.qut.edu.au/15965/.

Full text
Abstract:
Reinforcement Learning (RL) is a learning framework in which an agent learns a policy from continual interaction with the environment. A policy is a mapping from states to actions. The agent receives rewards as feedback on the actions performed. The objective of RL is to design autonomous agents to search for the policy that maximizes the expectation of the cumulative reward. When the environment is partially observable, the agent cannot determine the states with certainty. These states are called hidden in the literature. An agent that relies exclusively on the current observations will not
APA, Harvard, Vancouver, ISO, and other styles
3

Geißer, Florian, and Bernhard Nebel (academic supervisor). "On planning with state-dependent action costs." Freiburg: Universität, 2018. http://d-nb.info/1189066688/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Alexander, Serena E. "From Planning to Action: An Evaluation of State Level Climate Action Plans." Cleveland State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=csu1470908879.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Davis, Gloria-Jeanne. "Affirmative action implementation in Illinois public state universities." Normal, Ill.: Illinois State University, 1986. http://wwwlib.umi.com/cr/ilstu/fullcit?p8626589.

Full text
Abstract:
Thesis (Ph. D.)--Illinois State University, 1986. Title from title page screen, viewed July 14, 2005. Dissertation Committee: Ronald S. Halinski, Mary Ann Lynn (co-chairs), Charles E. Morris, Jeanne B. Morris, Thomas W. Nelson. Includes bibliographical references (leaves 90-93) and abstract. Also available in print.
APA, Harvard, Vancouver, ISO, and other styles
6

Melo, Andrés Felipe. "A state-action model for design process planning." Thesis, University of Cambridge, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.619610.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Nichols, B. "Reinforcement learning in continuous state- and action-space." Thesis, University of Westminster, 2014. https://westminsterresearch.westminster.ac.uk/item/967w8/reinforcement-learning-in-continuous-state-and-action-space.

Full text
Abstract:
Reinforcement learning in the continuous state-space poses the problem of the inability to store the values of all state-action pairs in a lookup table, due to both storage limitations and the inability to visit all states sufficiently often to learn the correct values. This can be overcome with the use of function approximation techniques with generalisation capability, such as artificial neural networks, to store the value function. When this is applied we can select the optimal action by comparing the values of each possible action; however, when the action-space is continuous this is not p
APA, Harvard, Vancouver, ISO, and other styles
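For the continuous state space problem this abstract describes, one common remedy is to replace the lookup table with a parametric approximator. The sketch below uses a linear model over a hand-made feature map with a semi-gradient SARSA-style update; the feature map, hyperparameters, and interface are illustrative assumptions rather than the thesis's actual method.

```python
import numpy as np

def features(state):
    """Illustrative feature map for a one-dimensional continuous state in [0, 1]."""
    s = float(state)
    return np.array([1.0, s, s * s])

class LinearQ:
    """Q(s, a) = w_a . phi(s); one weight vector per discrete action."""

    def __init__(self, n_actions, n_features=3, alpha=0.05, gamma=0.99):
        self.w = np.zeros((n_actions, n_features))
        self.alpha, self.gamma = alpha, gamma

    def value(self, state, action):
        return float(self.w[action] @ features(state))

    def update(self, s, a, reward, s_next, a_next, done):
        # Semi-gradient SARSA update on the linear approximator.
        target = reward if done else reward + self.gamma * self.value(s_next, a_next)
        td_error = target - self.value(s, a)
        self.w[a] += self.alpha * td_error * features(s)
```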
8

Grönland, Axel, and Viktor Eriksson Möllerstedt. "Robust Reinforcement Learning in Continuous Action/State Space." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-293879.

Full text
Abstract:
In this project we aim to apply Robust Reinforcement Learning algorithms, presented by Doya and Morimoto [1], [2], to control problems. Specifically, we train an agent to balance a pendulum in the unstable equilibrium, which is the inverted state. We investigate the performance of controllers based on two different function approximators. One is quadratic, and the other makes use of a Radial Basis Function neural network. To achieve robustness we will make use of an approach similar to H∞ control, which amounts to introducing an adversary in the control system. By changing the mass of the pendulum after
APA, Harvard, Vancouver, ISO, and other styles
9

Juma, Monica Kathina. "The politics of humanitarian assistance : state, non-state actors and displacement in Kenya and Uganda (1989-1998)." Thesis, University of Oxford, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.365626.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Bezuidenhout, Karen. "Compensation for excessive but otherwise lawful regulatory state action." Thesis, Stellenbosch : Stellenbosch University, 2015. http://hdl.handle.net/10019.1/96819.

Full text
Abstract:
Thesis (LLD)--Stellenbosch University, 2015. English abstract: Section 25 of the South African Constitution authorises and sets the limits for two forms of legitimate regulatory interference with property, namely deprivation and expropriation. The focus of this dissertation is on the requirement in section 25(1) that no law may authorise arbitrary deprivation of property. According to the Constitutional Court, deprivation is arbitrary when there is insufficient reason for it. The Court listed a number of factors to consider in determining whether there is a sufficient relationship between t
APA, Harvard, Vancouver, ISO, and other styles
More sources

Books on the topic "State Action Reward State Action"

1

Nwabunwanne, Ifediora Christopher. Obasanjo's 2nd missionary journey, a rescue action: Nigerian teachers reward now on earth! [Creative Forum Publishers], 1999.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

State action practice manual. 2nd ed. ABA Section of Antitrust Law, 2010.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Connecticut General Assembly, Legislative Program Review and Investigations Committee. Affirmative action in state government. The Committee, 1987.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Florida Advisory Council on Intergovernmental Relations. State action on impact fees. The Council, 1985.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Burrow, J. W., ed. The limits of state action. Liberty Fund, 1993.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Ngardmau (Palau). Ngardmau State conservation action plan. [Ngardmau State, Republic of Palau], 2009.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Governor's Advisory Board on Shelter, Nutrition, and Service Program for Homeless Individuals in Maryland. Homelessness: Recommendations for state action. The Board, 1986.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Congress: Facilitator of state action. State University of New York Press, 2010.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Peele, Cheri L. Washington State mercury chemical action plan. Washington State Dept. of Ecology, 2003.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Your governor: State government in action. Rosen Central Primary Source, 2004.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Book chapters on the topic "State Action Reward State Action"

1

Powers, Jeanne M. "State Level Policy Action." In Charter Schools. Palgrave Macmillan US, 2009. http://dx.doi.org/10.1057/9780230622111_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Szabó, Kinga Tibori. "Self-Defence in State-to-State Conflicts." In Anticipatory Action in Self-Defence. T. M. C. Asser Press, 2011. http://dx.doi.org/10.1007/978-90-6704-796-8_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Catlin, George E. G. "Is the State the Community?" In Preface to Action. Routledge, 2021. http://dx.doi.org/10.4324/9781003139911-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Kallenberg, Lodewijk. "Finite State and Action MDPs." In International Series in Operations Research & Management Science. Springer US, 2003. http://dx.doi.org/10.1007/978-1-4615-0805-2_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Wellman, Carl. "Moral Limits on State Action." In Terrorism and Counterterrorism. Springer Netherlands, 2013. http://dx.doi.org/10.1007/978-94-007-6007-3_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Kuznicki, Jason. "The Falsification of State Action." In Technology and the End of Authority. Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-48692-5_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Mabbott, J. D. "The Theories Limiting State Action." In The State and the Citizen. Routledge, 2021. http://dx.doi.org/10.4324/9781003222774-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Cullity, Garrett. "Levels of climate action." In Climate Justice and Non-State Actors. Routledge, 2020. http://dx.doi.org/10.4324/9780429351877-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Rochester, Colin. "A Perilous Partnership? Voluntary Action and the State." In Rediscovering Voluntary Action. Palgrave Macmillan UK, 2013. http://dx.doi.org/10.1057/9781137029461_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Singh, Jagpal. "Public action, castes, and the state." In Caste, State and Society. Routledge India, 2020. http://dx.doi.org/10.4324/9780429343063-7.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "State Action Reward State Action"

1

De Giacomo, Giuseppe, Marco Favorito, Luca Iocchi, Fabio Patrizi, and Alessandro Ronca. "Temporal Logic Monitoring Rewards via Transducers." In 17th International Conference on Principles of Knowledge Representation and Reasoning {KR-2020}. International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/kr.2020/89.

Full text
Abstract:
In Markov Decision Processes (MDPs), rewards are assigned according to a function of the last state and action. This is often limiting, when the considered domain is not naturally Markovian, but becomes so after careful engineering of extended state space. The extended states record information from the past that is sufficient to assign rewards by looking just at the last state and action. Non-Markovian Reward Decision Processes (NRMDPs) extend MDPs by allowing for non-Markovian rewards, which depend on the history of states and actions. Non-Markovian rewards can be specified in temporal logic
APA, Harvard, Vancouver, ISO, and other styles
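The "careful engineering of extended state space" that this abstract contrasts with explicit non-Markovian rewards can be shown on a toy example: the extended state carries a flag recording whether a key state has been visited, so a reward that depends on that piece of history again becomes a function of only the last (extended) state and action. The environment interface and names below are assumptions made for illustration.

```python
class HistoryFlagWrapper:
    """Extend the observed state with a flag remembering whether 'key' was visited,
    so a reward depending on that bit of history becomes Markovian."""

    def __init__(self, env):
        # `env` is assumed to expose reset() -> state and step(action) -> (state, done).
        self.env = env
        self.key_visited = False

    def reset(self):
        self.key_visited = False
        return (self.env.reset(), self.key_visited)

    def step(self, action):
        s_next, done = self.env.step(action)
        if s_next == "key":
            self.key_visited = True
        # The non-Markovian reward "bonus only if the goal is reached after the key"
        # is now a function of the last extended state and action alone.
        reward = 10.0 if (s_next == "goal" and self.key_visited) else 0.0
        return (s_next, self.key_visited), reward, done
```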
2

Ostwald, Dirk, Rasmus Bruckner, and Hauke Heekeren. "Computational mechanisms of human state-action-reward contingency learning under perceptual uncertainty." In 2018 Conference on Cognitive Computational Neuroscience. Cognitive Computational Neuroscience, 2018. http://dx.doi.org/10.32470/ccn.2018.1078-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Sun, Mingfei, and Xiaojuan Ma. "Adversarial Imitation Learning from Incomplete Demonstrations." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/487.

Full text
Abstract:
Imitation learning targets deriving a mapping from states to actions, a.k.a. policy, from expert demonstrations. Existing methods for imitation learning typically require any actions in the demonstrations to be fully available, which is hard to ensure in real applications. Though algorithms for learning with unobservable actions have been proposed, they focus solely on state information and overlook the fact that the action sequence could still be partially available and provide useful information for policy deriving. In this paper, we propose a novel algorithm called Action-Guided Adversari
APA, Harvard, Vancouver, ISO, and other styles
4

Seurin, Mathieu, Florian Strub, Philippe Preux, and Olivier Pietquin. "Don’t Do What Doesn’t Matter: Intrinsic Motivation with Action Usefulness." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/406.

Full text
Abstract:
Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize. Intrinsic motivation guidances have thus been developed toward alleviating the resulting exploration problem. They usually incentivize agents to look for new states through novelty signals. Yet, such methods encourage exhaustive exploration of the state space rather than focusing on the environment's salient interaction opportunities. We propose a new exploration method, called Don't Do What Doesn't Matter (DoWhaM), shifting the emphasis from state novelty to state with relevant act
APA, Harvard, Vancouver, ISO, and other styles
5

Melo, Andrés Felipe, and P. John Clarkson. "Planning and Scheduling Based on an Explicit Representation of the State of the Design." In ASME 2002 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. ASMEDC, 2002. http://dx.doi.org/10.1115/detc2002/dtm-34008.

Full text
Abstract:
This paper describes a computational model that provides planning information useful for scheduling the design process. The model aims to reduce uncertainty in the design process and with it the risk of rework. The view is taken that planning is concerned with choosing between alternative actions and action sequences, but not with resource allocation. The planning model is based on an explicit representation of the state of the design process, the definition of the design capabilities as a pool of tasks, and on the generation and selection of plans by evaluating their reliability. Classical de
APA, Harvard, Vancouver, ISO, and other styles
6

Say, Buser, and Scott Sanner. "Planning in Factored State and Action Spaces with Learned Binarized Neural Network Transition Models." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/669.

Full text
Abstract:
In this paper, we leverage the efficiency of Binarized Neural Networks (BNNs) to learn complex state transition models of planning domains with discretized factored state and action spaces. In order to directly exploit this transition structure for planning, we present two novel compilations of the learned factored planning problem with BNNs based on reductions to Boolean Satisfiability (FD-SAT-Plan) as well as Binary Linear Programming (FD-BLP-Plan). Experimentally, we show the effectiveness of learning complex transition models with BNNs, and test the runtime efficiency of both encodings on
APA, Harvard, Vancouver, ISO, and other styles
7

Shi, Wenjie, Shiji Song, and Cheng Wu. "Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/475.

Full text
Abstract:
Maximum entropy deep reinforcement learning (RL) methods have been demonstrated on a range of challenging continuous tasks. However, existing methods either suffer from severe instability when training on large off-policy data or cannot scale to tasks with very high state and action dimensionality such as 3D humanoid locomotion. Besides, the optimality of desired Boltzmann policy set for non-optimal soft value function is not persuasive enough. In this paper, we first derive soft policy gradient based on entropy regularized expected reward objective for RL with continuous actions. Then, we pre
APA, Harvard, Vancouver, ISO, and other styles
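As a rough illustration of the entropy-regularized expected reward objective this abstract builds on, the sketch below adds a per-step entropy bonus, weighted by an assumed temperature alpha, to an ordinary discounted return; the numbers in the example are made up.

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def soft_return(rewards, policy_probs, alpha=0.2, gamma=0.99):
    """Discounted return with an entropy bonus per step:
    sum_t gamma^t * (r_t + alpha * H(pi(.|s_t)))."""
    total, discount = 0.0, 1.0
    for r, probs in zip(rewards, policy_probs):
        total += discount * (r + alpha * entropy(probs))
        discount *= gamma
    return total

# Example: a two-step trajectory with assumed rewards and action distributions.
print(soft_return([1.0, 0.0], [[0.5, 0.5], [0.9, 0.1]]))
```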
8

Barros, Gabriel Moraes, and Esther Colombini. "Reinforcement and Imitation Learning Applied to Autonomous Aerial Robot Control." In VIII Workshop de Teses e Dissertações em Robótica/Concurso de Teses e Dissertações em Robótica. Sociedade Brasileira de Computação - SBC, 2020. http://dx.doi.org/10.5753/wtdr_ctdr.2020.14956.

Full text
Abstract:
In robotics, the ultimate goal of reinforcement learning is to endow robots with the ability to learn, improve, adapt, and reproduce tasks with dynamically changing constraints based on exploration and autonomous learning. Reinforcement Learning (RL) aims at addressing this problem by enabling a robot to learn behaviors through trial-and-error. With RL, a Neural Network can be trained as a function approximator to directly map states to actuator commands making any predefined control structure not-needed for training. However, the knowledge required to converge these methods is usually built f
APA, Harvard, Vancouver, ISO, and other styles
9

Guo, Jiaming, Rui Zhang, Xishan Zhang, et al. "Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/341.

Full text
Abstract:
Policy gradient methods are appealing in deep reinforcement learning but suffer from high variance of gradient estimate. To reduce the variance, the state value function is applied commonly. However, the effect of the state value function becomes limited in stochastic dynamic environments, where the unexpected state dynamics and rewards will increase the variance. In this paper, we propose to replace the state value function with a novel hindsight value function, which leverages the information from the future to reduce the variance of the gradient estimate for stochastic dynamic environments.
APA, Harvard, Vancouver, ISO, and other styles
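The variance-reduction role of a state value function in policy gradient estimates, which this abstract extends with a hindsight value function, amounts to subtracting V(s_t) as a baseline from the sampled return. A minimal sketch of that computation follows; the per-step rewards and value estimates in the example are assumed inputs, not results from the paper.

```python
def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def advantages(rewards, values, gamma=0.99):
    """Baseline-subtracted targets A_t = G_t - V(s_t); subtracting a state-dependent
    baseline leaves the policy gradient unbiased while reducing its variance."""
    return [g - v for g, v in zip(discounted_returns(rewards, gamma), values)]

# Example with assumed per-step rewards and value estimates.
print(advantages([0.0, 0.0, 1.0], [0.3, 0.5, 0.8]))
```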
10

Abadi, Eden, and Ronen I. Brafman. "Learning and Solving Regular Decision Processes." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/270.

Full text
Abstract:
Regular Decision Processes (RDPs) are a recently introduced model that extends MDPs with non-Markovian dynamics and rewards. The non-Markovian behavior is restricted to depend on regular properties of the history. These can be specified using regular expressions or formulas in linear dynamic logic over finite traces. Fully specified RDPs can be solved by compiling them into an appropriate MDP. Learning RDPs from data is a challenging problem that has yet to be addressed, on which we focus in this paper. Our approach rests on a new representation for RDPs using Mealy Machines that emit a distri
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "State Action Reward State Action"

1

Goldman, Kenneth, and Nancy A. Lynch. Modelling Shared State in a Shared Action Model. Defense Technical Information Center, 1990. http://dx.doi.org/10.21236/ada221279.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Corlette, Sabrina, Sarah J. Dash, and Amy Thomas. Implementing the Affordable Care Act: State Action on Quality Improvement in State-Based Marketplaces. Commonwealth Fund, 2014. http://dx.doi.org/10.15868/socialsector.25006.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Dell, Melissa, Nathaniel Lane, and Pablo Querubin. The Historical State, Local Collective Action, and Economic Development in Vietnam. National Bureau of Economic Research, 2017. http://dx.doi.org/10.3386/w23208.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Author, Not Given. New Mexico state information handbook: formerly utilized sites remedial action program. Office of Scientific and Technical Information (OSTI), 2014. http://dx.doi.org/10.2172/6662931.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Lucia, Kevin W., Sarah J. Dash, and Amy Thomas. Implementing the Affordable Care Act: State Action to Establish SHOP Marketplaces. Commonwealth Fund, 2014. http://dx.doi.org/10.15868/socialsector.24991.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Houwer, Rebecca, Alexander Lovell, Uzo Anucha, and Andrew Galley. Beyond Measure? The State of Evaluation and Action in Ontario's Youth Sector. Youth Research & Evaluation eXchange (YouthREX), 2016. http://dx.doi.org/10.15868/socialsector.33741.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Brown, Elizabeth, and R. Neal Elliott. State opportunities for action: Update of states' combined heat and power activities. Office of Scientific and Technical Information (OSTI), 2003. http://dx.doi.org/10.2172/1216241.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Corlette, Sabrina, Kevin W. Lucia, and Justin Giovannelli. Implementing the Affordable Care Act: State Action to Reform the Individual Health Insurance Market. Commonwealth Fund, 2014. http://dx.doi.org/10.15868/socialsector.25003.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

van Wassenaer, Lan, Mireille van Hilten, Marcel van Asseldonk, and Erik van Ingen. Applying blockchain to climate action in agriculture: State of play and outlook: background paper. Wageningen Economic Research, 2021. http://dx.doi.org/10.18174/532926.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Sarraf, Saket, Shilpi Anand, Yash Shukla, Paul Mathew, and Reshma Singh. Building Energy Benchmarking in India: an Action Plan for Advancing the State-of-the-Art. Office of Scientific and Technical Information (OSTI), 2014. http://dx.doi.org/10.2172/1171348.

Full text
APA, Harvard, Vancouver, ISO, and other styles