Academic literature on the topic 'State Action Reward State Action'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'State Action Reward State Action.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "State Action Reward State Action"

1

Xu, Kuang, and Se-Young Yun. "Reinforcement with Fading Memories." Mathematics of Operations Research 45, no. 4 (2020): 1258–88. http://dx.doi.org/10.1287/moor.2019.1031.

Full text
Abstract:
We study the effect of imperfect memory on decision making in the context of a stochastic sequential action-reward problem. An agent chooses a sequence of actions, which generate discrete rewards at different rates. She is allowed to make new choices at rate β, whereas past rewards disappear from her memory at rate μ. We focus on a family of decision rules where the agent makes a new choice by randomly selecting an action with a probability approximately proportional to the amount of past rewards associated with each action in her memory. We provide closed form formulas for the agent’s steady-
APA, Harvard, Vancouver, ISO, and other styles
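As a rough, discrete-time illustration of the decision rule described in this abstract (new choices made at rate β, remembered rewards fading at rate μ, actions re-chosen with probability proportional to remembered rewards), one might simulate the process as below; all numerical values are assumptions, and the paper itself analyses the continuous-time model.

```python
import random

# Discrete-time sketch of a fading-memory action-reward process
# (illustrative parameters only; not taken from the paper).
reward_rates = [0.3, 0.7]      # assumed Bernoulli reward rate per action
beta, mu = 0.05, 0.01          # assumed choice rate and memory-decay rate
memory = [1e-3, 1e-3]          # remembered reward mass per action
action = 0

for t in range(100_000):
    # Rewards accumulate for the currently chosen action...
    if random.random() < reward_rates[action]:
        memory[action] += 1.0
    # ...while all remembered rewards fade at rate mu.
    memory = [m * (1.0 - mu) for m in memory]
    # At rate beta the agent re-chooses, with probability proportional
    # to the remembered rewards associated with each action.
    if random.random() < beta:
        threshold = random.random() * sum(memory)
        action = 0 if threshold < memory[0] else 1

print("remembered rewards:", memory, "current action:", action)
```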
2

Jahn, Caroline I., Chiara Varazzani, Jérôme Sallet, Mark E. Walton, and Sébastien Bouret. "Noradrenergic But Not Dopaminergic Neurons Signal Task State Changes and Predict Reengagement After a Failure." Cerebral Cortex 30, no. 9 (2020): 4979–94. http://dx.doi.org/10.1093/cercor/bhaa089.

Full text
Abstract:
The two catecholamines, noradrenaline and dopamine, have been shown to play comparable roles in behavior. Both noradrenergic and dopaminergic neurons respond to cues predicting reward availability and novelty. However, even though both are thought to be involved in motivating actions, their roles in motivation have seldom been directly compared. We therefore examined the activity of putative noradrenergic neurons in the locus coeruleus and putative midbrain dopaminergic neurons in monkeys cued to perform effortful actions for rewards. The activity in both regions correlated with engag
APA, Harvard, Vancouver, ISO, and other styles
3

Anselmi, Jonatha, François Dufour, and Tomás Prieto-Rumeau. "Computable approximations for average Markov decision processes in continuous time." Journal of Applied Probability 55, no. 2 (2018): 571–92. http://dx.doi.org/10.1017/jpr.2018.36.

Full text
Abstract:
In this paper we study the numerical approximation of the optimal long-run average cost of a continuous-time Markov decision process, with Borel state and action spaces, and with bounded transition and reward rates. Our approach uses a suitable discretization of the state and action spaces to approximate the original control model. The approximation error for the optimal average reward is then bounded by a linear combination of coefficients related to the discretization of the state and action spaces, namely, the Wasserstein distance between an underlying probability measure μ and a m
APA, Harvard, Vancouver, ISO, and other styles
4

Sarhan, Shahenda, Mohamed Abu ElSoud, and Hebatullah Rashed. "Enhancing Video Games Policy Based on Least-Squares Continuous Action Policy Iteration: Case Study on StarCraft Brood War and Glest RTS Games and the 8 Queens Board Game." International Journal of Computer Games Technology 2016 (2016): 1–14. http://dx.doi.org/10.1155/2016/7090757.

Full text
Abstract:
With the rapid advent of video games recently and the increasing numbers of players and gamers, only a tough game with high policy, actions, and tactics survives. How the game responds to opponent actions is the key issue of popular games. Many algorithms were proposed to solve this problem such as Least-Squares Policy Iteration (LSPI) and State-Action-Reward-State-Action (SARSA) but they mainly depend on discrete actions, while agents in such a setting have to learn from the consequences of their continuous actions, in order to maximize the total reward over time. So in this paper we proposed
APA, Harvard, Vancouver, ISO, and other styles
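Since SARSA (State-Action-Reward-State-Action), the topic of this collection, is contrasted with LSPI in this abstract, a minimal tabular sketch of its on-policy update may help orient readers; the environment interface (reset/step/actions) and the hyperparameter values are assumptions, not taken from the cited paper.

```python
import random
from collections import defaultdict

def sarsa(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular SARSA: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)).

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done), with env.actions a list.
    """
    Q = defaultdict(float)

    def choose(state):
        # Epsilon-greedy behaviour policy; SARSA evaluates the policy it follows.
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        state = env.reset()
        action = choose(state)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            next_action = choose(next_state)
            target = reward + (0.0 if done else gamma * Q[(next_state, next_action)])
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state, action = next_state, next_action
    return Q
```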
5

Hasselmo, Michael E. "A Model of Prefrontal Cortical Mechanisms for Goal-directed Behavior." Journal of Cognitive Neuroscience 17, no. 7 (2005): 1115–29. http://dx.doi.org/10.1162/0898929054475190.

Full text
Abstract:
Many behavioral tasks require goal-directed actions to obtain delayed reward. The prefrontal cortex appears to mediate many aspects of goal-directed decision making. This article presents a model of prefrontal cortex function emphasizing the influence of goal-related activity on the choice of the next motor output. The model can be interpreted in terms of key elements of Reinforcement Learning Theory. Different neocortical minicolumns represent distinct sensory input states and distinct motor output actions. The dynamics of each minicolumn include separate phases of encoding and retrieval. Dur
APA, Harvard, Vancouver, ISO, and other styles
6

Ma, Shuai, and Jia Yuan Yu. "State-Augmentation Transformations for Risk-Sensitive Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 4512–19. http://dx.doi.org/10.1609/aaai.v33i01.33014512.

Full text
Abstract:
In the framework of MDP, although the general reward function takes three arguments—current state, action, and successor state; it is often simplified to a function of two arguments—current state and action. The former is called a transition-based reward function, whereas the latter is called a state-based reward function. When the objective involves the expected total reward only, this simplification works perfectly. However, when the objective is risk-sensitive, this simplification leads to an incorrect value. We propose three successively more general state-augmentation transformations (SAT
APA, Harvard, Vancouver, ISO, and other styles
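The distinction this abstract draws between transition-based and state-based reward functions, and the idea of recovering the former via state augmentation, can be stated in a few lines; the grid-world state names and the particular augmentation below are illustrative assumptions, not the paper's SAT constructions.

```python
# Transition-based reward: depends on (state, action, successor state).
def r_transition(s, a, s_next):
    return 10.0 if s_next == "goal" else -1.0

# State-based reward: depends on (state, action) only.
def r_state(s, a):
    return 10.0 if s == "goal" else -1.0

# One simple augmentation idea: fold the previous transition into the state,
# so a state-based reward on the augmented chain reproduces the
# transition-based one.
def augment(s_prev, a_prev, s):
    return (s_prev, a_prev, s)

def r_state_augmented(s_aug, a):
    s_prev, a_prev, s = s_aug
    return r_transition(s_prev, a_prev, s)
```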
7

Beutler, Frederick J., and Keith W. Ross. "Time-average optimal constrained semi-Markov decision processes." Advances in Applied Probability 18, no. 02 (1986): 341–59. http://dx.doi.org/10.1017/s0001867800015792.

Full text
Abstract:
Optimal causal policies maximizing the time-average reward over a semi-Markov decision process (SMDP), subject to a hard constraint on a time-average cost, are considered. Rewards and costs depend on the state and action, and contain running as well as switching components. It is supposed that the state space of the SMDP is finite, and the action space compact metric. The policy determines an action at each transition point of the SMDP. Under an accessibility hypothesis, several notions of time average are equivalent. A Lagrange multiplier formulation involving a dynamic programming equation i
APA, Harvard, Vancouver, ISO, and other styles
8

Beutler, Frederick J., and Keith W. Ross. "Time-average optimal constrained semi-Markov decision processes." Advances in Applied Probability 18, no. 2 (1986): 341–59. http://dx.doi.org/10.2307/1427303.

Full text
Abstract:
Optimal causal policies maximizing the time-average reward over a semi-Markov decision process (SMDP), subject to a hard constraint on a time-average cost, are considered. Rewards and costs depend on the state and action, and contain running as well as switching components. It is supposed that the state space of the SMDP is finite, and the action space compact metric. The policy determines an action at each transition point of the SMDP. Under an accessibility hypothesis, several notions of time average are equivalent. A Lagrange multiplier formulation involving a dynamic programming equation is
APA, Harvard, Vancouver, ISO, and other styles
9

Archibald, Christopher, and Delma Nieves-Rivera. "Bayesian Execution Skill Estimation." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6014–21. http://dx.doi.org/10.1609/aaai.v33i01.33016014.

Full text
Abstract:
The performance of agents in many domains with continuous action spaces depends not only on their ability to select good actions to execute, but also on their ability to execute planned actions precisely. This ability, which has been called an agent’s execution skill, is an important characteristic of an agent which can have a significant impact on their success. In this paper, we address the problem of estimating the execution skill of an agent given observations of that agent acting in a domain. Each observation includes the executed action and a description of the state in which the action
APA, Harvard, Vancouver, ISO, and other styles
10

Kuroda, Seiya, Kazuteru Miyazaki, and Hiroaki Kobayashi. "Introduction of Fixed Mode States into Online Reinforcement Learning with Penalties and Rewards and its Application to Biped Robot Waist Trajectory Generation." Journal of Advanced Computational Intelligence and Intelligent Informatics 16, no. 6 (2012): 758–68. http://dx.doi.org/10.20965/jaciii.2012.p0758.

Full text
Abstract:
During a long-term reinforcement learning task, the efficiency of learning is heavily degraded because the probabilistic actions of an agent often cause the task to fail, which makes it difficult to reach the goal and receive a reward. To address this problem, a fixed mode state is proposed in this paper. If the agent acquires an adequate reward, a normal state is switched to a fixed mode state. In this mode, the agent selects an action using a greedy strategy, i.e., it selects the highest weight action deterministically. First, this paper combines Online Profit Sharing reinforcement learning
APA, Harvard, Vancouver, ISO, and other styles
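A toy version of the fixed mode state idea described in this abstract might look like the following: a state that has yielded an adequate reward is switched from probabilistic (roulette-wheel) action selection to deterministic greedy selection. The threshold, the weight representation, and the selection scheme are assumptions made for illustration.

```python
import random

def select_action(weights, fixed_mode):
    """weights: dict mapping action -> learned (positive) weight in the current state."""
    if fixed_mode:
        # Fixed mode: deterministically pick the highest-weight action.
        return max(weights, key=weights.get)
    # Normal mode: roulette-wheel selection proportional to the weights.
    threshold = random.random() * sum(weights.values())
    cumulative = 0.0
    for action, w in weights.items():
        cumulative += w
        if threshold <= cumulative:
            return action
    return action  # fallback for floating-point edge cases

# A state is promoted to fixed mode once it has yielded an adequate reward.
ADEQUATE_REWARD = 1.0          # assumed threshold
fixed_mode_states = set()

def update_mode(state, episode_reward):
    if episode_reward >= ADEQUATE_REWARD:
        fixed_mode_states.add(state)
```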
More sources

Dissertations / Theses on the topic "State Action Reward State Action"

1

Botelho Neto, Gutenberg Pessoa. "Aprendizado por esforço aplicado ao combate em jogos eletrônicos de estratégia em tempo real." Universidade Federal da Paraíba, 2014. http://tede.biblioteca.ufpb.br:8080/handle/tede/6128.

Full text
Abstract:
Electronic games and, in particular, real-time strategy (RTS) games, are increasingly seen as viable and important fields for artificial intelligence research because of commonly held characteristics, like the presence of complex environments, usually dynamic and with multiple agents. In commercial RTS games, the computer beh
APA, Harvard, Vancouver, ISO, and other styles
2

Au, Manix. "Automatic State Construction using Decision Trees for Reinforcement Learning Agents." Queensland University of Technology, 2005. http://eprints.qut.edu.au/15965/.

Full text
Abstract:
Reinforcement Learning (RL) is a learning framework in which an agent learns a policy from continual interaction with the environment. A policy is a mapping from states to actions. The agent receives rewards as feedback on the actions performed. The objective of RL is to design autonomous agents to search for the policy that maximizes the expectation of the cumulative reward. When the environment is partially observable, the agent cannot determine the states with certainty. These states are called hidden in the literature. An agent that relies exclusively on the current observations will not
APA, Harvard, Vancouver, ISO, and other styles
3

Geißer, Florian, and Bernhard Nebel (academic supervisor). "On planning with state-dependent action costs." Freiburg: Universität, 2018. http://d-nb.info/1189066688/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Alexander, Serena E. "From Planning to Action: An Evaluation of State Level Climate Action Plans." Cleveland State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=csu1470908879.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Davis, Gloria-Jeanne. "Affirmative action implementation in Illinois public state universities." Normal, Ill.: Illinois State University, 1986. http://wwwlib.umi.com/cr/ilstu/fullcit?p8626589.

Full text
Abstract:
Thesis (Ph. D.)--Illinois State University, 1986. Title from title page screen, viewed July 14, 2005. Dissertation Committee: Ronald S. Halinski, Mary Ann Lynn (co-chairs), Charles E. Morris, Jeanne B. Morris, Thomas W. Nelson. Includes bibliographical references (leaves 90-93) and abstract. Also available in print.
APA, Harvard, Vancouver, ISO, and other styles
6

Melo, Andrés Felipe. "A state-action model for design process planning." Thesis, University of Cambridge, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.619610.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Nichols, B. "Reinforcement learning in continuous state- and action-space." Thesis, University of Westminster, 2014. https://westminsterresearch.westminster.ac.uk/item/967w8/reinforcement-learning-in-continuous-state-and-action-space.

Full text
Abstract:
Reinforcement learning in the continuous state-space poses the problem of the inability to store the values of all state-action pairs in a lookup table, due to both storage limitations and the inability to visit all states sufficiently often to learn the correct values. This can be overcome with the use of function approximation techniques with generalisation capability, such as artificial neural networks, to store the value function. When this is applied we can select the optimal action by comparing the values of each possible action; however, when the action-space is continuous this is not p
APA, Harvard, Vancouver, ISO, and other styles
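For the continuous state space problem this abstract describes, one common remedy is to replace the lookup table with a parametric approximator. The sketch below uses a linear model over a hand-made feature map with a semi-gradient SARSA-style update; the feature map, hyperparameters, and interface are illustrative assumptions rather than the thesis's actual method.

```python
import numpy as np

def features(state):
    """Illustrative feature map for a one-dimensional continuous state in [0, 1]."""
    s = float(state)
    return np.array([1.0, s, s * s])

class LinearQ:
    """Q(s, a) = w_a . phi(s); one weight vector per discrete action."""

    def __init__(self, n_actions, n_features=3, alpha=0.05, gamma=0.99):
        self.w = np.zeros((n_actions, n_features))
        self.alpha, self.gamma = alpha, gamma

    def value(self, state, action):
        return float(self.w[action] @ features(state))

    def update(self, s, a, reward, s_next, a_next, done):
        # Semi-gradient SARSA update on the linear approximator.
        target = reward if done else reward + self.gamma * self.value(s_next, a_next)
        td_error = target - self.value(s, a)
        self.w[a] += self.alpha * td_error * features(s)
```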
8

Grönland, Axel, and Viktor Eriksson Möllerstedt. "Robust Reinforcement Learning in Continuous Action/State Space." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-293879.

Full text
Abstract:
In this project we aim to apply Robust Reinforcement Learning algorithms, presented by Doya and Morimoto [1], [2], to control problems. Specifically, we train an agent to balance a pendulum in the unstable equilibrium, which is the inverted state. We investigate the performance of controllers based on two different function approximators. One is quadratic, and the other makes use of a Radial Basis Function neural network. To achieve robustness we will make use of an approach similar to H∞ control, which amounts to introducing an adversary in the control system. By changing the mass of the pendulum after
APA, Harvard, Vancouver, ISO, and other styles
9

Juma, Monica Kathina. "The politics of humanitarian assistance : state, non-state actors and displacement in Kenya and Uganda (1989-1998)." Thesis, University of Oxford, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.365626.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Bezuidenhout, Karen. "Compensation for excessive but otherwise lawful regulatory state action." Thesis, Stellenbosch : Stellenbosch University, 2015. http://hdl.handle.net/10019.1/96819.

Full text
Abstract:
Thesis (LLD)--Stellenbosch University, 2015. English abstract: Section 25 of the South African Constitution authorises and sets the limits for two forms of legitimate regulatory interference with property, namely deprivation and expropriation. The focus of this dissertation is on the requirement in section 25(1) that no law may authorise arbitrary deprivation of property. According to the Constitutional Court, deprivation is arbitrary when there is insufficient reason for it. The Court listed a number of factors to consider in determining whether there is a sufficient relationship between t
APA, Harvard, Vancouver, ISO, and other styles
More sources

Books on the topic "State Action Reward State Action"

1

Nwabunwanne, Ifediora Christopher. Obasanjo's 2nd missionary journey, a rescue action: Nigerian teachers reward now on earth! [Creative Forum Publishers], 1999.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

State action practice manual. 2nd ed. ABA Section of Antitrust Law, 2010.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Connecticut General Assembly, Legislative Program Review and Investigations Committee. Affirmative action in state government. The Committee, 1987.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Florida Advisory Council on Intergovernmental Relations. State action on impact fees. The Council, 1985.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Burrow, J. W., ed. The limits of state action. Liberty Fund, 1993.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Ngardmau (Palau). Ngardmau State conservation action plan. [Ngardmau State, Republic of Palau], 2009.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Governor's Advisory Board on Shelter, Nutrition, and Service Program for Homeless Individuals in Maryland. Homelessness: Recommendations for state action. The Board, 1986.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Congress: Facilitator of state action. State University of New York Press, 2010.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Peele, Cheri L. Washington State mercury chemical action plan. Washington State Dept. of Ecology, 2003.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Your governor: State government in action. Rosen Central Primary Source, 2004.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
More sources

Book chapters on the topic "State Action Reward State Action"

1

Powers, Jeanne M. "State Level Policy Action." In Charter Schools. Palgrave Macmillan US, 2009. http://dx.doi.org/10.1057/9780230622111_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Szabó, Kinga Tibori. "Self-Defence in State-to-State Conflicts." In Anticipatory Action in Self-Defence. T. M. C. Asser Press, 2011. http://dx.doi.org/10.1007/978-90-6704-796-8_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Catlin, George E. G. "Is the State the Community?" In Preface to Action. Routledge, 2021. http://dx.doi.org/10.4324/9781003139911-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Kallenberg, Lodewijk. "Finite State and Action MDPs." In International Series in Operations Research & Management Science. Springer US, 2003. http://dx.doi.org/10.1007/978-1-4615-0805-2_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Wellman, Carl. "Moral Limits on State Action." In Terrorism and Counterterrorism. Springer Netherlands, 2013. http://dx.doi.org/10.1007/978-94-007-6007-3_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Kuznicki, Jason. "The Falsification of State Action." In Technology and the End of Authority. Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-48692-5_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Mabbott, J. D. "The Theories Limiting State Action." In The State and the Citizen. Routledge, 2021. http://dx.doi.org/10.4324/9781003222774-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Cullity, Garrett. "Levels of climate action." In Climate Justice and Non-State Actors. Routledge, 2020. http://dx.doi.org/10.4324/9780429351877-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Rochester, Colin. "A Perilous Partnership? Voluntary Action and the State." In Rediscovering Voluntary Action. Palgrave Macmillan UK, 2013. http://dx.doi.org/10.1057/9781137029461_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Singh, Jagpal. "Public action, castes, and the state." In Caste, State and Society. Routledge India, 2020. http://dx.doi.org/10.4324/9780429343063-7.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "State Action Reward State Action"

1

De Giacomo, Giuseppe, Marco Favorito, Luca Iocchi, Fabio Patrizi, and Alessandro Ronca. "Temporal Logic Monitoring Rewards via Transducers." In 17th International Conference on Principles of Knowledge Representation and Reasoning {KR-2020}. International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/kr.2020/89.

Full text
Abstract:
In Markov Decision Processes (MDPs), rewards are assigned according to a function of the last state and action. This is often limiting, when the considered domain is not naturally Markovian, but becomes so after careful engineering of extended state space. The extended states record information from the past that is sufficient to assign rewards by looking just at the last state and action. Non-Markovian Reward Decision Processes (NRMDPs) extend MDPs by allowing for non-Markovian rewards, which depend on the history of states and actions. Non-Markovian rewards can be specified in temporal logic
APA, Harvard, Vancouver, ISO, and other styles
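The "careful engineering of extended state space" that this abstract contrasts with explicit non-Markovian rewards can be shown on a toy example: the extended state carries a flag recording whether a key state has been visited, so a reward that depends on that piece of history again becomes a function of only the last (extended) state and action. The environment interface and names below are assumptions made for illustration.

```python
class HistoryFlagWrapper:
    """Extend the observed state with a flag remembering whether 'key' was visited,
    so a reward depending on that bit of history becomes Markovian."""

    def __init__(self, env):
        # `env` is assumed to expose reset() -> state and step(action) -> (state, done).
        self.env = env
        self.key_visited = False

    def reset(self):
        self.key_visited = False
        return (self.env.reset(), self.key_visited)

    def step(self, action):
        s_next, done = self.env.step(action)
        if s_next == "key":
            self.key_visited = True
        # The non-Markovian reward "bonus only if the goal is reached after the key"
        # is now a function of the last extended state and action alone.
        reward = 10.0 if (s_next == "goal" and self.key_visited) else 0.0
        return (s_next, self.key_visited), reward, done
```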
2

Ostwald, Dirk, Rasmus Bruckner, and Hauke Heekeren. "Computational mechanisms of human state-action-reward contingency learning under perceptual uncertainty." In 2018 Conference on Cognitive Computational Neuroscience. Cognitive Computational Neuroscience, 2018. http://dx.doi.org/10.32470/ccn.2018.1078-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Sun, Mingfei, and Xiaojuan Ma. "Adversarial Imitation Learning from Incomplete Demonstrations." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/487.

Full text
Abstract:
Imitation learning targets deriving a mapping from states to actions, a.k.a. policy, from expert demonstrations. Existing methods for imitation learning typically require any actions in the demonstrations to be fully available, which is hard to ensure in real applications. Though algorithms for learning with unobservable actions have been proposed, they focus solely on state information and overlook the fact that the action sequence could still be partially available and provide useful information for policy deriving. In this paper, we propose a novel algorithm called Action-Guided Adversari
APA, Harvard, Vancouver, ISO, and other styles
4

Seurin, Mathieu, Florian Strub, Philippe Preux, and Olivier Pietquin. "Don’t Do What Doesn’t Matter: Intrinsic Motivation with Action Usefulness." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/406.

Full text
Abstract:
Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize. Intrinsic motivation guidances have thus been developed toward alleviating the resulting exploration problem. They usually incentivize agents to look for new states through novelty signals. Yet, such methods encourage exhaustive exploration of the state space rather than focusing on the environment's salient interaction opportunities. We propose a new exploration method, called Don't Do What Doesn't Matter (DoWhaM), shifting the emphasis from state novelty to state with relevant act
APA, Harvard, Vancouver, ISO, and other styles
5

Melo, Andrés Felipe, and P. John Clarkson. "Planning and Scheduling Based on an Explicit Representation of the State of the Design." In ASME 2002 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. ASMEDC, 2002. http://dx.doi.org/10.1115/detc2002/dtm-34008.

Full text
Abstract:
This paper describes a computational model that provides planning information useful for scheduling the design process. The model aims to reduce uncertainty in the design process and with it the risk of rework. The view is taken that planning is concerned with choosing between alternative actions and action sequences, but not with resource allocation. The planning model is based on an explicit representation of the state of the design process, the definition of the design capabilities as a pool of tasks, and on the generation and selection of plans by evaluating their reliability. Classical de
APA, Harvard, Vancouver, ISO, and other styles
6

Say, Buser, and Scott Sanner. "Planning in Factored State and Action Spaces with Learned Binarized Neural Network Transition Models." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/669.

Full text
Abstract:
In this paper, we leverage the efficiency of Binarized Neural Networks (BNNs) to learn complex state transition models of planning domains with discretized factored state and action spaces. In order to directly exploit this transition structure for planning, we present two novel compilations of the learned factored planning problem with BNNs based on reductions to Boolean Satisfiability (FD-SAT-Plan) as well as Binary Linear Programming (FD-BLP-Plan). Experimentally, we show the effectiveness of learning complex transition models with BNNs, and test the runtime efficiency of both encodings on
APA, Harvard, Vancouver, ISO, and other styles
7

Shi, Wenjie, Shiji Song, and Cheng Wu. "Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/475.

Full text
Abstract:
Maximum entropy deep reinforcement learning (RL) methods have been demonstrated on a range of challenging continuous tasks. However, existing methods either suffer from severe instability when training on large off-policy data or cannot scale to tasks with very high state and action dimensionality such as 3D humanoid locomotion. Besides, the optimality of desired Boltzmann policy set for non-optimal soft value function is not persuasive enough. In this paper, we first derive soft policy gradient based on entropy regularized expected reward objective for RL with continuous actions. Then, we pre
APA, Harvard, Vancouver, ISO, and other styles
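As a rough illustration of the entropy-regularized expected reward objective this abstract builds on, the sketch below adds a per-step entropy bonus, weighted by an assumed temperature alpha, to an ordinary discounted return; the numbers in the example are made up.

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def soft_return(rewards, policy_probs, alpha=0.2, gamma=0.99):
    """Discounted return with an entropy bonus per step:
    sum_t gamma^t * (r_t + alpha * H(pi(.|s_t)))."""
    total, discount = 0.0, 1.0
    for r, probs in zip(rewards, policy_probs):
        total += discount * (r + alpha * entropy(probs))
        discount *= gamma
    return total

# Example: a two-step trajectory with assumed rewards and action distributions.
print(soft_return([1.0, 0.0], [[0.5, 0.5], [0.9, 0.1]]))
```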
8

Barros, Gabriel Moraes, and Esther Colombini. "Reinforcement and Imitation Learning Applied to Autonomous Aerial Robot Control." In VIII Workshop de Teses e Dissertações em Robótica/Concurso de Teses e Dissertações em Robótica. Sociedade Brasileira de Computação - SBC, 2020. http://dx.doi.org/10.5753/wtdr_ctdr.2020.14956.

Full text
Abstract:
In robotics, the ultimate goal of reinforcement learning is to endow robots with the ability to learn, improve, adapt, and reproduce tasks with dynamically changing constraints based on exploration and autonomous learning. Reinforcement Learning (RL) aims at addressing this problem by enabling a robot to learn behaviors through trial-and-error. With RL, a Neural Network can be trained as a function approximator to directly map states to actuator commands making any predefined control structure not-needed for training. However, the knowledge required to converge these methods is usually built f
APA, Harvard, Vancouver, ISO, and other styles
9

Guo, Jiaming, Rui Zhang, Xishan Zhang, et al. "Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/341.

Full text
Abstract:
Policy gradient methods are appealing in deep reinforcement learning but suffer from high variance of gradient estimate. To reduce the variance, the state value function is applied commonly. However, the effect of the state value function becomes limited in stochastic dynamic environments, where the unexpected state dynamics and rewards will increase the variance. In this paper, we propose to replace the state value function with a novel hindsight value function, which leverages the information from the future to reduce the variance of the gradient estimate for stochastic dynamic environments.
APA, Harvard, Vancouver, ISO, and other styles
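The variance-reduction role of a state value function in policy gradient estimates, which this abstract extends with a hindsight value function, amounts to subtracting V(s_t) as a baseline from the sampled return. A minimal sketch of that computation follows; the per-step rewards and value estimates in the example are assumed inputs, not results from the paper.

```python
def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def advantages(rewards, values, gamma=0.99):
    """Baseline-subtracted targets A_t = G_t - V(s_t); subtracting a state-dependent
    baseline leaves the policy gradient unbiased while reducing its variance."""
    return [g - v for g, v in zip(discounted_returns(rewards, gamma), values)]

# Example with assumed per-step rewards and value estimates.
print(advantages([0.0, 0.0, 1.0], [0.3, 0.5, 0.8]))
```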
10

Abadi, Eden, and Ronen I. Brafman. "Learning and Solving Regular Decision Processes." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/270.

Full text
Abstract:
Regular Decision Processes (RDPs) are a recently introduced model that extends MDPs with non-Markovian dynamics and rewards. The non-Markovian behavior is restricted to depend on regular properties of the history. These can be specified using regular expressions or formulas in linear dynamic logic over finite traces. Fully specified RDPs can be solved by compiling them into an appropriate MDP. Learning RDPs from data is a challenging problem that has yet to be addressed, on which we focus in this paper. Our approach rests on a new representation for RDPs using Mealy Machines that emit a distri
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "State Action Reward State Action"

1

Goldman, Kenneth, and Nancy A. Lynch. Modelling Shared State in a Shared Action Model. Defense Technical Information Center, 1990. http://dx.doi.org/10.21236/ada221279.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Corlette, Sabrina, Sarah J. Dash, and Amy Thomas. Implementing the Affordable Care Act: State Action on Quality Improvement in State-Based Marketplaces. Commonwealth Fund, 2014. http://dx.doi.org/10.15868/socialsector.25006.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Dell, Melissa, Nathaniel Lane, and Pablo Querubin. The Historical State, Local Collective Action, and Economic Development in Vietnam. National Bureau of Economic Research, 2017. http://dx.doi.org/10.3386/w23208.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Author, Not Given. New Mexico state information handbook: formerly utilized sites remedial action program. Office of Scientific and Technical Information (OSTI), 2014. http://dx.doi.org/10.2172/6662931.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Lucia, Kevin W., Sarah J. Dash, and Amy Thomas. Implementing the Affordable Care Act: State Action to Establish SHOP Marketplaces. Commonwealth Fund, 2014. http://dx.doi.org/10.15868/socialsector.24991.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Houwer, Rebecca, Alexander Lovell, Uzo Anucha, and Andrew Galley. Beyond Measure? The State of Evaluation and Action in Ontario's Youth Sector. Youth Research & Evaluation eXchange (YouthREX), 2016. http://dx.doi.org/10.15868/socialsector.33741.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Brown, Elizabeth, and R. Neal Elliott. State opportunities for action: Update of states' combined heat and power activities. Office of Scientific and Technical Information (OSTI), 2003. http://dx.doi.org/10.2172/1216241.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Corlette, Sabrina, Kevin W. Lucia, and Justin Giovannelli. Implementing the Affordable Care Act: State Action to Reform the Individual Health Insurance Market. Commonwealth Fund, 2014. http://dx.doi.org/10.15868/socialsector.25003.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

van Wassenaer, Lan, Mireille van Hilten, Marcel van Asseldonk, and Erik van Ingen. Applying blockchain to climate action in agriculture: State of play and outlook: background paper. Wageningen Economic Research, 2021. http://dx.doi.org/10.18174/532926.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Sarraf, Saket, Shilpi Anand, Yash Shukla, Paul Mathew, and Reshma Singh. Building Energy Benchmarking in India: an Action Plan for Advancing the State-of-the-Art. Office of Scientific and Technical Information (OSTI), 2014. http://dx.doi.org/10.2172/1171348.

Full text
APA, Harvard, Vancouver, ISO, and other styles