Dissertations / Theses on the topic 'Reinforcement value'

Consult the top 48 dissertations / theses for your research on the topic 'Reinforcement value.'

1

Mahadevan, Swaminathan. "Probabilistic linear function approximation for value-based reinforcement learning." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=98759.

Abstract:
Reinforcement learning (RL) is a computational framework for learning sequential decision strategies from the interaction of an agent with an unknown dynamic environment. This thesis focuses on value-based reinforcement learning methods, which rely on computing utility values for different behavior strategies that can be adopted by the agent. Real-world complex problems involve very large discrete or continuous state spaces where the use of approximate methods is required. It has been observed that subtle differences in the approximate methods result in very different theoretical properties an
2

James, Michael. "The estimation of reward and value in reinforcement learning." Thesis, University of York, 2003. http://etheses.whiterose.ac.uk/14066/.

3

Beus, Ben. "Conditioned Reinforcement and the Value of Praise in Children with Autism." DigitalCommons@USU, 2014. https://digitalcommons.usu.edu/etd/3848.

Abstract:
Many efforts in teaching children with autism are focused on increasing the value of praise as a reward for work. Increasing the value of praise can help children with autism to work in a natural setting, without requiring constant rewards of food or toys for work. In this study, I analyzed a pairing method—a technique of providing verbal praise while simultaneously providing a food reward—to assess whether it would result in an increased value for praise for participants in the study. First, a baseline phase was conducted in which praise statements were provided as a reward for a certain task
4

Wingate, David. "Solving Large MDPs Quickly with Partitioned Value Iteration." Diss., 2004. http://contentdm.lib.byu.edu/ETD/image/etd437.pdf.

5

Mac Dermed, Liam Charles. "Value methods for efficiently solving stochastic games of complete and incomplete information." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50270.

Abstract:
Multi-agent reinforcement learning (MARL) poses the same planning problem as traditional reinforcement learning (RL): What actions over time should an agent take in order to maximize its rewards? MARL tackles a challenging set of problems that can be better understood by modeling them as having a relatively simple environment but with complex dynamics attributed to the presence of other agents who are also attempting to maximize their rewards. A great wealth of research has developed around specific subsets of this problem, most notably when the rewards for each agent are either the same or di
6

Rossi, Martina. "Opponent Modelling using Inverse Reinforcement Learning." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22263/.

Abstract:
A particularly active area of research in artificial intelligence (AI) lately concerns the study of autonomous agents, which are also widespread in everyday life. The main goal is to develop agents that interact efficiently with other agents or with humans. These interactions could be considerably simplified by the ability to infer autonomously the preferences of other entities and to adapt the agent's strategy accordingly. The aim of this thesis is therefore to implement an agent, capable of learning, that
7

Bredthauer, Jennifer Lyn Johnston James M. "The assessment of preference for qualitatively different reinforcers in persons with developmental and learning disabilities: a comparison of value using behavioral economic and standard preference assessment procedures." Auburn, Ala, 2009. http://hdl.handle.net/10415/1809.

8

Russell, Danielle M. "Using Progressive Ratio Schedules to Evaluate Edible, Leisure, and Token Reinforcement." Thesis, University of North Texas, 2013. https://digital.library.unt.edu/ark:/67531/metadc271888/.

Abstract:
The general purpose of the current study was to evaluate the potency of different categories of reinforcers with young children diagnosed with developmental delays. The participants were two boys and one girl who were between the ages of seven and eight. In Phase 1, we evaluated the reinforcing potency of tokens, edible items, and leisure items by using a progressive ratio (PR) schedule. For two participants, we found that tokens resulted in the highest PR break points. For one participant, edibles resulted in the highest break points (tokens were found to have the lowest break points). In Pha
9

Masters, David M. "Verifying Value Iteration and Policy Iteration in Coq." Ohio University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1618999718015199.

10

Francisco, Monica T. "Evaluation of absolute and relative reinforcer value using progressive ratio schedules." Scholarly Commons, 2007. https://scholarlycommons.pacific.edu/uop_etds/672.

Abstract:
We evaluated behavior exhibited by individuals with developmental disabilities using progressive ratio schedules. High- and low-preferred stimuli were determined based on the results of a paired-stimulus preference assessment and were evaluated in subsequent reinforcer and progressive ratio assessments using concurrent- and single-operant schedules of presentation in a modified reversal design. Results showed that for two participants, stimuli determined to be low-preferred via a preference assessment functioned as reinforcers when evaluated independently of high-preferred stimuli and under gr
11

Maglieri, Kristen A. "Assessing preference for and reinforcer value of employee- and manager-selected rewards in an organizational setting." 2005. http://0-gateway.proquest.com.innopac.library.unr.edu/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:1433395.

12

Alt, Jonathan K. "Learning from Noisy and Delayed Rewards: The Value of Reinforcement Learning to Defense Modeling and Simulation." Thesis, Monterey, California. Naval Postgraduate School, 2012. http://hdl.handle.net/10945/17313.

Abstract:
Modeling and simulation of military operations requires human behavior models capable of learning from experience in complex environments where feedback on action quality is noisy and delayed. This research examines the potential of reinforcement learning, a class of AI learning algorithms, to address this need. A novel reinforcement learning algorithm that uses the exponentially weighted average reward as an action-value estimator is described. Empirical results indicate that this relatively straightforward approach improves learning
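The abstract is truncated and does not give the thesis's update rule, so the following is only a minimal sketch of what an exponentially weighted average of rewards can look like as an action-value estimate. The function name `update_estimate`, the step size `alpha`, and the reward sequence are illustrative assumptions, not taken from the thesis.

```python
def update_estimate(estimate, reward, alpha=0.1):
    """Move the running estimate a fraction alpha toward the new reward.

    Repeated application weights older rewards by (1 - alpha) ** age,
    which is what makes the average exponentially weighted.
    """
    return estimate + alpha * (reward - estimate)


# Hypothetical noisy rewards observed for one action.
rewards = [1.0, 0.0, 1.0, 1.0]

q = 0.0  # initial action-value estimate (assumed)
for r in rewards:
    q = update_estimate(q, r, alpha=0.5)
```

A larger `alpha` tracks recent rewards more closely; a smaller one smooths noise at the cost of slower adaptation.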
13

O'Daly, Matthew. "Influence of temporal context on value: an exploration of various operant conditioning procedures." Diss., 2005. http://wwwlib.umi.com/cr/ucsd/fullcit?p3159872.

14

Vinogradska, Julia. "Gaussian Processes in Reinforcement Learning: Stability Analysis and Efficient Value Propagation." Darmstadt: Universitäts- und Landesbibliothek Darmstadt, 2018. http://d-nb.info/1156713633/34.

15

Roper, Zachary Joseph Jackson. "The manifold role of reward value on visual attention." Diss., University of Iowa, 2015. https://ir.uiowa.edu/etd/2005.

Abstract:
The environment is abundant with visual information. Each moment, this information competes for representation in the brain. From billboards and pop-up ads to smart phones and flat screens, in modern society our attention is constantly drawn from one salient object to the next. Learning how to focus on the objects that are most important for the current task is a major developmental hurdle. Fortunately, rewards help us to learn what is important by providing feedback signals to the brain. Sometimes,
16

Au, Manix. "Automatic State Construction using Decision Trees for Reinforcement Learning Agents." Thesis, Queensland University of Technology, 2005. https://eprints.qut.edu.au/15965/1/Manix_Au_Thesis.pdf.

Abstract:
Reinforcement Learning (RL) is a learning framework in which an agent learns a policy from continual interaction with the environment. A policy is a mapping from states to actions. The agent receives rewards as feedback on the actions performed. The objective of RL is to design autonomous agents to search for the policy that maximizes the expectation of the cumulative reward. When the environment is partially observable, the agent cannot determine the states with certainty. These states are called hidden in the literature. An agent that relies exclusively on the current observations will
17

Au, Manix. "Automatic State Construction using Decision Trees for Reinforcement Learning Agents." Queensland University of Technology, 2005. http://eprints.qut.edu.au/15965/.

Abstract:
Reinforcement Learning (RL) is a learning framework in which an agent learns a policy from continual interaction with the environment. A policy is a mapping from states to actions. The agent receives rewards as feedback on the actions performed. The objective of RL is to design autonomous agents to search for the policy that maximizes the expectation of the cumulative reward. When the environment is partially observable, the agent cannot determine the states with certainty. These states are called hidden in the literature. An agent that relies exclusively on the current observations will not
18

Smith, Aaron Paul. "NEUROBEHAVIORAL MEASUREMENTS OF NATURAL AND OPIOID REWARD VALUE." UKnowledge, 2019. https://uknowledge.uky.edu/psychology_etds/164.

Abstract:
In the last decade, (non)prescription opioid abuse, opioid use disorder (OUD) diagnoses, and opioid-related overdoses have risen and represent a significant public health concern. One method of understanding OUD is as a disorder of choice that requires choosing opioid rewards at the expense of other nondrug rewards. The characterization of OUD as a disorder of choice is important as it implicates decision-making processes as therapeutic targets, such as the valuation of opioid rewards. However, reward-value measurement and interpretation are traditionally different in substance abuse research
19

Nebe, Stephan. "Value-based decision making and alcohol use disorder." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2018. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-233855.

Abstract:
Alcohol use disorder (AUD) is a widespread mental disease denoted by chronic alcohol use despite significant negative consequences for a person’s life. It affected more than 14 million persons in Europe alone and accounted for more than 5% of deaths worldwide in 2011-2012. Understanding the psychological and neurobiological mechanisms driving the development and maintenance of pathological alcohol use is key to conceptualizing new programs for prevention and therapy of AUD. There has been a variety of etiological models trying to describe and relate these mechanisms. Lately, the view of AUD as
20

Kaisaravalli, Bhojraj Gokul, and Yeswanth Surya Achyut Markonda. "Policy-based Reinforcement learning control for window opening and closing in an office building." Thesis, Högskolan Dalarna, Mikrodataanalys, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:du-34420.

Abstract:
Indoor comfort can be strongly influenced by occupants' window opening and closing behavior in an office building. If not properly managed, this behavior affects not only the comfort level but also energy consumption. Occupant behavior is not easy to predict and control in a conventional way. Nowadays, for a system to be called smart it must learn user behavior, which gives valuable information to the controlling system. To control a window efficiently, we propose reinforcement learning (RL) in our thesis, which should be able to learn user behavior and maintain
21

Nord, Christina M. "The Behavioral Economics of Effort." Thesis, University of North Texas, 2014. https://digital.library.unt.edu/ark:/67531/metadc699857/.

Abstract:
Although response effort is considered a dimension of the cost to obtain reinforcement, little research has examined the economic impact of effort on demand for food. The goal of the present study was to explore the relationship between effort and demand. Three Sprague Dawley rats were trained to press a force transducer under a series of fixed-ratio schedules (1, 10, 18, 32, 56, 100, 180, 320, and 560) under different force requirements (5.6 g and 56 g). Thus, nominal unit price (responses / food) remained constant while minimal response force requirements varied. Using a force transducer all
22

Nanduri, Vishnuteja. "Generation capacity expansion in restructured energy markets." [Tampa, Fla] : University of South Florida, 2009. http://purl.fcla.edu/usf/dc/et/SFE0003031.

23

Liljekvist, Markus, and Daniel Andersson. "Kostnad-/nyttoanalys av bergtekniska förundersökningar med statistisk datavärdesanalys." Thesis, Luleå tekniska universitet, Institutionen för samhällsbyggnad och naturresurser, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-82089.

Abstract:
Many infrastructure projects involve construction in rock, where there are always uncertainties to manage; these can be reduced by carrying out more investigations. Because pre-investigations in rock are relatively expensive, a balance must be found at which the field program is economically justifiable. One area studied in recent years is the assessment of the cost-benefit that pre-investigations provide. The method that has been developed is called data worth analysis (datavärdesanalys) and is used to assess the cost-benefit of performing additional pre-investigations before they are carried out. The purpose of the study is to evaluate the cost-benefit
24

Kazhuthuveettil, Sreedharan Jithin. "Échantillonnage et inférence dans réseaux complexes." Thesis, Université Côte d'Azur (ComUE), 2016. http://www.theses.fr/2016AZUR4121/document.

Abstract:
The recent emergence of large networks, especially online social networks (OSNs), has revealed the difficulty of crawling the complete network and has triggered the development of new distributed techniques. In this thesis, we design and analyze algorithms based on random walks and diffusion for sampling, estimation, and inference of network functions. The thesis begins with the classical problem of finding the dominant eigenvalues and their eigenvectors of symmetric graph matrices, such as the Laplacian matrix of undirected graphs. Using
25

Hill, Meagan E. "Adding Value to Recycled Polyethylene Through the Addition of Multi-Scale Reinforcements." University of Akron / OhioLINK, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=akron1125419618.

26

Théro, Héloïse. "Contrôle, agentivité et apprentissage par renforcement." Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE028/document.

Abstract:
The sense of agency is defined as the feeling of controlling our actions and, through them, events in the external world. This set of phenomena depends on our ability to learn the contingencies between our actions and their outcomes, and a classical algorithm for modeling this comes from the field of reinforcement learning. In this thesis, we used a cognitive modeling approach to study the interaction between agency and reinforcement learning. First, participants performing a reinforcement learning task
27

Dolabela, Ana Carmen de Freitas Oliveira. "Um estudo sobre as possíveis interações entre o Chronic Mild Stress e o desempenho operante." Pontifícia Universidade Católica de São Paulo, 2004. https://tede2.pucsp.br/handle/handle/16810.

Abstract:
Sponsor: Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. Chronic Mild Stress-induced anhedonia is an animal experimental model that exposes rats to a regime of mild stressors for a long period of time. The model was proposed in 1987 by Willner, Towell, Sampson, Sophokleus, and Muscat. The purpose of the present study was to verify whether the exposure of rats to an operant procedure when a concurrent sch
28

Thomaz, Cássia Roberta da Cunha. "Efeito da submissão ao chronic mild stress (CMS) sobre o valor reforçador do estímulo." Pontifícia Universidade Católica de São Paulo, 2001. https://tede2.pucsp.br/handle/handle/16811.

Abstract:
Sponsor: Fundação de Amparo a Pesquisa do Estado de São Paulo. Chronic Mild Stress (CMS) is an experimental model for depression: rats are subjected to a set of stressful conditions, and as a result their consumption of water and of water with sucrose drops, as does the animals' previous preference for water with sucrose. It is argued that the stress changes the organism and, consequently,
29

Rouault, Marion. "Integration of beliefs and affective values in human decision-making." Thesis, Paris, Ecole normale supérieure, 2015. http://www.theses.fr/2015ENSU0052/document.

Abstract:
Executive control of action refers to the human ability to control and adapt behavior flexibly, in relation to internal mental states. It relies on evaluating the consequences of actions in order to adjust future choices. Actions can be reinforced or devalued depending on the affective value of their consequences, involving in particular the basal ganglia and the medial prefrontal cortex. In addition, the consequences of actions carry information that makes it possible to adjust behavior in relation to internal beliefs, involving the
30

Schach, Rainer, and Manuel Hentschel. "Grundlagen für die Nutzwertanalyse für Verstärkungen aus textilbewehrtem Beton." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2009. http://nbn-resolving.de/urn:nbn:de:bsz:14-ds-1244049476991-75979.

Abstract:
Within the transfer project, construction-management boundary conditions and parameters suitable for assessing the economic application of the method are to be developed. The application of textile-reinforced concrete to the rehabilitation and strengthening of large-area concrete components is to be investigated. In general, construction tasks can in very many cases be carried out using different construction methods, which regularly differ in cost and required construction time, but also in the quality delivered and the impact on the environment. From a
31

Janin, Jean-Pierre. "Tunnels en milieu urbain : Prévisions des tassements avec prise en compte des effets des pré-soutènements (renforcement du front de taille et voûte-parapluie)." Phd thesis, INSA de Lyon, 2012. http://tel.archives-ouvertes.fr/tel-00743362.

Abstract:
In the 1980s, new techniques based on installing pre-support systems ahead of the tunnel face were developed for tunnelling in difficult ground. This thesis contributes to the study of the effects of two types of pre-support, face bolting and the umbrella arch, on the deformation of the ground mass and on the value of the stress release ratio (taux de déconfinement). The objective is to improve methods for predicting surface settlements for tunnels built with pre-support
32

Sanguanpuak, T. (Tachporn). "Radio resource sharing with edge caching for multi-operator in large cellular networks." Doctoral thesis, Oulun yliopisto, 2019. http://urn.fi/urn:isbn:9789526221564.

Abstract:
The aim of this thesis is to devise new paradigms for radio resource sharing, including cache-enabled virtualized large cellular networks for mobile network operators (MNOs). Self-organizing resource allocation for small cell networks is also considered. In such networks, the MNOs rent radio resources from the infrastructure provider (InP) to support their subscribers. The need to reduce operational costs while significantly increasing the usage of existing network resources leads to a paradigm where the MNOs share their infrastructure, i.e., base station
33

Bernigau, Holger. "Causal Models over Infinite Graphs and their Application to the Sensorimotor Loop." Doctoral thesis, Universitätsbibliothek Leipzig, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-164734.

Abstract:
Motivation and background: The enormous range of capabilities that every human learns throughout life is probably among the most remarkable and fascinating aspects of life. Learning has therefore drawn a great deal of interest from scientists working in very different fields, such as philosophy, biology, sociology, educational sciences, computer science, and mathematics. This thesis focuses on the information-theoretical and mathematical aspects of learning. We are interested in the learning process of an agent (which can be, for example, a human, an animal, a robot, an economic institution or a s
34

Painter-Wakefield, Christopher Robert. "Sparse Value Function Approximation for Reinforcement Learning." Diss., 2013. http://hdl.handle.net/10161/7250.

Abstract:
A key component of many reinforcement learning (RL) algorithms is the approximation of the value function. The design and selection of features for approximation in RL is crucial, and an ongoing area of research. One approach to the problem of feature selection is to apply sparsity-inducing techniques in learning the value function approximation; such sparse methods tend to select relevant features and ignore irrelevant features, thus automating the feature selection process. This dissertation describes three contributions in the area of sparse value function approximation for reinforcemen
35

Mitchley, Michael. "Adaptive value function approximation in reinforcement learning using wavelets." Thesis, 2016. http://hdl.handle.net/10539/19298.

Abstract:
A thesis submitted to the Faculty of Science, School of Computational and Applied Mathematics, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, South Africa, July 2015. Reinforcement learning agents solve tasks by finding policies that maximise their reward over time. The policy can be found from the value function, which represents the value of each state-action pair. In continuous state spaces, the value function must be approximated. Often, this is done using a fixed linear combination of functions ac
36

Fazel, Hesham. "Exploring the influence of social threat and value reinforcement on emotional reactions to value transgressions." 2013. http://hdl.handle.net/1993/17593.

Abstract:
Religiosity and morality constitute the fundamental components of any culture; they establish rules and regulate interpersonal behavior. In the context of religion, the focal points of this dissertation are understanding value transgressions, their emotional consequences, the moderating role of social threats (in-group and out-group interactions), the psychological underpinnings of value reinforcement, and the complementing role of self-affirmation at a group level. The findings of study 1 show that value transgression has a direct effect on the level of negative emotion experienced by the
37

Tsao, Yi-Ting, and 曹怡亭. "Laplacian Based State-value Function Transfer in Discrete Reinforcement Learning." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/98130616801409287726.

38

Wu, Tung-Cheng, and 吳東承. "Moderating Maximal Value - a Practical Expectation-Based Method for Value Function Approximation in Reinforcement Learning." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/6mj7s9.

Abstract:
Master's thesis, National Cheng Kung University, Department of Engineering Science, academic year 107. In this study, we propose a practical expectation-based value function approximation method to decrease value overestimation in temporal-difference (TD) learning. Because of the optimizer's curse, values are easily overestimated under either a softmax policy or a greedy policy. To address this problem, we compute the policy from the tanh of the action-value instead of the action-value itself. The tanh function bounds extreme values and reduces the influence of the maximal action-value on the policy. With this tanh softmax policy, our expectation-based method can decrease the
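The abstract describes computing the policy from the tanh of each action-value rather than from the raw value. The thesis's exact formulation is not shown here, so the sketch below only illustrates that idea under assumptions: the function names, the `temperature` parameter, and the sample action-values are all hypothetical.

```python
import math


def softmax(values, temperature=1.0):
    """Standard softmax over a list of action-values."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def tanh_softmax(values, temperature=1.0):
    """Softmax over tanh-squashed action-values (assumed form).

    tanh maps every value into (-1, 1), so one extreme action-value
    cannot dominate the resulting action probabilities.
    """
    return softmax([math.tanh(v) for v in values], temperature)


# Hypothetical action-values with one possibly overestimated entry.
q_values = [0.1, 0.2, 5.0]

plain = softmax(q_values)
squashed = tanh_softmax(q_values)
```

With these sample values the plain softmax puts almost all probability on the third action, while the tanh version still prefers it but leaves substantial probability for the others.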
39

Vinogradska, Julia. "Gaussian Processes in Reinforcement Learning: Stability Analysis and Efficient Value Propagation." Phd thesis, 2018. https://tuprints.ulb.tu-darmstadt.de/7286/1/GPs_in_RL_Stability_Analysis_and_Efficient_Value_Propagation_Version1.pdf.

Abstract:
Control of nonlinear systems on continuous domains is a challenging task for various reasons. For robust and accurate control of complex systems, a precise model of the system dynamics is essential. Building such highly precise dynamics models from physical knowledge often requires substantial manual effort and poses a great challenge in industrial applications. Acquiring a model automatically from system measurements using regression techniques reduces manual effort and is thus an interesting alternative to knowledge-based modeling. Based on such a learned dynamics mo
40

Fan, Huiyuan. "Sequential frameworks for statistics-based value function representation in approximate dynamic programming." 2008. http://hdl.handle.net/10106/1099.

41

Taylor, Gavin. "Feature Selection for Value Function Approximation." Diss., 2011. http://hdl.handle.net/10161/3891.

Abstract:
The field of reinforcement learning concerns the question of automated action selection given past experiences. As an agent moves through the state space, it must recognize which state choices are best in terms of allowing it to reach its goal. This is quantified with value functions, which evaluate a state and return the sum of rewards the agent can expect to receive from that state. Given a good value function, the agent can choose the actions which maximize this sum of rewards. Value functions are often chosen from a linear space defined by a set of features; this method offers a con
42

Banhudo, Guilherme Sousa Falcão Duarte. "Adaptive value-at-risk policy optimization: a deep reinforcement learning approach for minimizing the capital charge." Master's thesis, 2019. http://hdl.handle.net/10071/19197.

Abstract:
In 1995, the Basel Committee on Banking Supervision issued an amendment to the first Basel Accord, allowing financial institutions to develop internal risk models, based on the value-at-risk (VaR), as opposed to using the regulator’s predefined model. From that point onwards, the scientific community has focused its efforts on improving the accuracy of the VaR models to reduce the capital requirements stipulated by the regulatory framework. In contrast, some authors proposed that the key to disclosure optimization would not lie in improving the existing models, but in manipulating the es
43

Maurilus, Emmy. "The Effect of the Establishment of Reinforcement Value for Math on Rate of Learning for Pre-Kindergarten Students." Thesis, 2018. https://doi.org/10.7916/D8WM2WFN.

Abstract:
The objective of Experiment I was to determine whether establishing conditioned reinforcement for engaging in math for pre-kindergarten students was possible using the three conditioning procedures outlined in previous research for conditioning book stimuli. The purpose of Experiment II was to determine whether this change in preference for engaging in math had an effect on 6 pre-kindergarten participants’ rate of learning math. In Experiment I, a counterbalanced pre- and post-intervention ABAB/BABA functional analysis and a delayed multiple probe across dyads design were used to measure the in
44

Nebe, Stephan. "Value-based decision making and alcohol use disorder." Doctoral thesis, 2017. https://tud.qucosa.de/id/qucosa%3A30840.

Abstract:
Alcohol use disorder (AUD) is a widespread mental disease denoted by chronic alcohol use despite significant negative consequences for a person’s life. It affected more than 14 million persons in Europe alone and accounted for more than 5% of deaths worldwide in 2011-2012. Understanding the psychological and neurobiological mechanisms driving the development and maintenance of pathological alcohol use is key to conceptualizing new programs for prevention and therapy of AUD. There has been a variety of etiological models trying to describe and relate these mechanisms. Lately, the view of AUD as
45

Mueller, William Graham. "Essays on Network Formation." Thesis, 2012. http://hdl.handle.net/1969.1/ETD-TAMU-2012-08-11705.

Abstract:
This dissertation contains two essays which examine the roles that individual incentives, competition, and information play in network formation. In the first essay, I examine a model in which two competing groups offer different allocation rules that may depend on the network of connections among the individuals that make up each group. I assume the existence of a single divisible good, such as a monetary prize, which will be divided amongst the members of the winning network. The probability of winning the prize will depend on the network sizes. I examine two well-known allocation rules: the
46

Li, Richard. "Toward Growth-Accommodating Polymeric Heart Valves with Graphene-Network Reinforcement." Thesis, 2021. https://doi.org/10.7916/d8-wjg8-rk59.

Abstract:
Graphene is a 2D material well known for its high intrinsic strength of 100 GPa and Young’s modulus of 1 TPa. Because of its 2D nature, the most promising avenues to utilize graphene as a mechanical material include incorporating it as reinforcement in a nanocomposite and creating free-standing foams and aerogels. However, the current techniques are not well-controlled – the reinforcing graphene particles are often discontinuous and randomly dispersed – making it difficult to accurately model and predict the resulting material properties. Here we aim to develop a framework for a new cla
47

Bernigau, Holger. "Causal Models over Infinite Graphs and their Application to the Sensorimotor Loop: General Stochastic Aspects and Gradient Methods for Optimal Control." Doctoral thesis, 2014. https://ul.qucosa.de/id/qucosa%3A13254.

Abstract:
Motivation and background: The enormous range of capabilities that every human learns throughout life is probably among the most remarkable and fascinating aspects of life. Learning has therefore drawn a great deal of interest from scientists working in very different fields, such as philosophy, biology, sociology, educational sciences, computer science, and mathematics. This thesis focuses on the information-theoretical and mathematical aspects of learning. We are interested in the learning process of an agent (which can be, for example, a human, an animal, a robot, an economic institution or a s
48

Carstens, Christoffel. "Invloed van televisie op die verwestersingsproses by die Swart adolessent." Thesis, 1995. http://hdl.handle.net/10500/16335.
