Academic literature on the topic "Exploitation dilemma"

Browse the topical lists of articles, books, theses, conference proceedings, and other scholarly sources on the topic "Exploitation dilemma".

Journal articles on the topic "Exploitation dilemma"

1

Berger-Tal, Oded, Jonathan Nathan, Ehud Meron and David Saltz. "The Exploration-Exploitation Dilemma: A Multidisciplinary Framework". PLoS ONE 9, no. 4 (April 22, 2014): e95693. http://dx.doi.org/10.1371/journal.pone.0095693.

2

James, Russell N. "Exploration-exploitation: A cognitive dilemma still unresolved". Cognitive Neuroscience 6, no. 4 (August 28, 2015): 219–21. http://dx.doi.org/10.1080/17588928.2015.1051012.

3

Laureiro-Martínez, Daniella, Stefano Brusoni and Maurizio Zollo. "The neuroscientific foundations of the exploration−exploitation dilemma." Journal of Neuroscience, Psychology, and Economics 3, no. 2 (November 2010): 95–115. http://dx.doi.org/10.1037/a0018495.

4

Benner, Mary J. and Michael L. Tushman. "Exploitation, Exploration, and Process Management: The Productivity Dilemma Revisited". Academy of Management Review 28, no. 2 (April 1, 2003): 238. http://dx.doi.org/10.2307/30040711.

5

Benner, Mary J. and Michael L. Tushman. "Exploitation, Exploration, and Process Management: The Productivity Dilemma Revisited". Academy of Management Review 28, no. 2 (April 2003): 238–56. http://dx.doi.org/10.5465/amr.2003.9416096.

6

Awasthi, Ashutosh, Kripal Singh, Audrey O’Grady, Ronan Courtney, Alok Kalra, Rana Pratap Singh, Artemi Cerdà, Yosef Steinberger and D. D. Patra. "Designer ecosystems: A solution for the conservation-exploitation dilemma". Ecological Engineering 93 (August 2016): 73–75. http://dx.doi.org/10.1016/j.ecoleng.2016.05.010.

7

Lunnan, Randi and Theodor Barth. "Managing the exploration vs. exploitation dilemma in transnational “bridging teams”". Journal of World Business 38, no. 2 (May 2003): 110–26. http://dx.doi.org/10.1016/s1090-9516(03)00005-1.

8

Yogeswaran, Mohan and S. G. Ponnambalam. "Reinforcement learning: exploration–exploitation dilemma in multi-agent foraging task". OPSEARCH 49, no. 3 (April 10, 2012): 223–36. http://dx.doi.org/10.1007/s12597-012-0077-2.

9

De Cremer, David. "Trust and fear of exploitation in a public goods dilemma". Current Psychology 18, no. 2 (June 1999): 153–63. http://dx.doi.org/10.1007/s12144-999-1024-0.

10

Domenech, Philippe, Sylvain Rheims and Etienne Koechlin. "Neural mechanisms resolving exploitation-exploration dilemmas in the medial prefrontal cortex". Science 369, no. 6507 (August 27, 2020): eabb0184. http://dx.doi.org/10.1126/science.abb0184.

Abstract
Everyday life often requires arbitrating between pursuing an ongoing action plan by possibly adjusting it versus exploring a new action plan instead. Resolving this so-called exploitation-exploration dilemma involves the medial prefrontal cortex (mPFC). Using human intracranial electrophysiological recordings, we discovered that neural activity in the ventral mPFC infers and tracks the reliability of the ongoing plan to proactively encode upcoming action outcomes as either learning signals or potential triggers to explore new plans. By contrast, the dorsal mPFC exhibits neural responses to action outcomes, which results in either improving or abandoning the ongoing plan. Thus, the mPFC resolves the exploitation-exploration dilemma through a two-stage, predictive coding process: a proactive ventromedial stage that constructs the functional signification of upcoming action outcomes and a reactive dorsomedial stage that guides behavior in response to action outcomes.

Theses on the topic "Exploitation dilemma"

1

Bouhlel, Imen. "Essais sur le dilemme exploration-exploitation". Thesis, Université Côte d'Azur (ComUE), 2019. http://theses.univ-cotedazur.fr/2019AZUR0037.

Abstract
A growing body of empirical evidence over the last two decades has shown inconsistencies between individual choices when individuals make decisions from description (i.e., when they are given perfect knowledge of the state space, including all the possible outcomes and the underlying probabilities) and when they make decisions from experience (i.e., when they do not know all the possible outcomes and/or their occurrence probabilities). These inconsistencies are referred to as the description/experience gap. Undersearch has been pointed out as one of the key determinants of this gap. Hence, even though it has been little studied in economics, search becomes a central question deserving serious interest. This thesis aims to contribute to the theoretical and experimental literature studying search and the related exploration-exploitation dilemma, at both the individual and the collective level. The thesis consists of three essays combining theoretical modelling, agent-based modelling, evolutionary simulations, and laboratory experiments. The first chapter examines the determinants of search behavior in the context of an individual optimal stopping problem and shows that this behavior largely depends on the degree of certainty of the information and is affected by both regret and anticipation. The second chapter investigates information-sharing behavior in competitive collective search using agent-based and evolutionary simulations. It finds robust evidence for the individual benefits of sharing, even when others do not reciprocate, as long as two mechanisms are present: imitation with a certain level of innovation, and local visibility. The third chapter experimentally tests and supports the validity of these results and stresses the crucial role of learning.
2

Fruit, Ronan. "Exploration-exploitation dilemma in reinforcement learning under various form of prior knowledge". Thesis, Lille 1, 2019. http://www.theses.fr/2019LIL1I086.

Abstract
In combination with Deep Neural Networks (DNNs), several Reinforcement Learning (RL) algorithms such as "Q-learning" or "Policy Gradient" are now able to achieve superhuman performance on most Atari games as well as the game of Go. Despite these outstanding and promising achievements, such Deep Reinforcement Learning (DRL) algorithms require millions of samples to perform well, thus limiting their deployment in applications where data acquisition is costly. The lack of sample efficiency of DRL can partly be attributed to the use of DNNs, which are known to be data-intensive in the training phase. But more importantly, it can be attributed to the type of Reinforcement Learning algorithms used, which only perform a very inefficient undirected exploration of the environment. For instance, Q-learning and Policy Gradient rely on randomization for exploration. In most cases, this strategy turns out to be very ineffective at properly balancing the exploration needed to discover unknown and potentially highly rewarding regions of the environment with the exploitation of rewarding regions already identified as such. Other RL approaches with theoretical guarantees on the exploration-exploitation trade-off have been investigated. It is sometimes possible to formally prove that their performance almost matches the theoretical optimum. This line of research is inspired by the Multi-Armed Bandit literature, with many algorithms relying on the same underlying principle, often referred to as "optimism in the face of uncertainty". Even though a significant effort has been made towards understanding the exploration-exploitation dilemma in general, many questions remain open. In this thesis, we generalize existing work on exploration-exploitation to different contexts with different amounts of prior knowledge on the learning problem.
We introduce several algorithmic improvements to current state-of-the-art approaches and derive a new theoretical analysis which allows us to answer several open questions in the literature. We then relax the (very common although not very realistic) assumption that a path between any two distinct regions of the environment should always exist. Relaxing this assumption highlights the impact of prior knowledge on the intrinsic limitations of the exploration-exploitation dilemma. Finally, we show how some prior knowledge, such as the range of the value function or a set of macro-actions, can be efficiently exploited to speed up learning. Throughout this thesis, we strive to take the algorithmic complexity of the proposed algorithms into account. Although all these algorithms are somewhat computationally "efficient", they all require a planning phase and therefore suffer from the well-known "curse of dimensionality", which limits their applicability to real-world problems. Nevertheless, the main focus of this work is to derive general principles that may be combined with more heuristic approaches to help overcome current DRL flaws.
3

Prange, Christiane and Bodo B. Schlegelmilch. "The Role of Ambidexterity in Marketing Strategy Implementation: Resolving the Exploration-Exploitation Dilemma". SpringerOpen, 2009. http://dx.doi.org/10.1007/BF03342712.

Abstract
Formulating consistent marketing strategies is a difficult task, but successfully implementing them is even more challenging. This is even more pertinent as marketing strategies quite often incorporate inherent conflicts between major breakthroughs and consolidation. Consequently, marketers need to balance exploratory and exploitative strategies. However, the literature lacks concrete insights for marketing managers as to how exploratory and exploitative strategies can be best combined. This paper addresses this issue by introducing a framework of multiple types of ambidexterity. Based on qualitative research, tools and procedures are identified to overcome marketing dilemmas and support strategy implementation by drawing on ambidextrous designs. (authors' abstract)
4

Cogliati, Dezza Irene. "“Vanilla, Vanilla .but what about Pistachio?” A Computational Cognitive Clinical Neuroscience Approach to the Exploration-Exploitation Dilemma". Doctoral thesis, Universite Libre de Bruxelles, 2018. https://dipot.ulb.ac.be/dspace/bitstream/2013/278730/3/Document1.pdf.

Abstract
On 24 November 1859, Charles Darwin published the first edition of The Origin of Species. One hundred fifty-nine years later, our understanding of human and animal adaptation to the surrounding environment remains a major scientific challenge. How do humans and animals generate apt decision strategies in order to achieve this adaptation? How does their brain efficiently carry out complex computations in order to produce such adaptive behaviors? Although an exhaustive answer to these questions continues to feel out of reach, the investigation of adaptive processing is relevant to understanding the mind/brain relationship and to elucidating scenarios where mind/brain interactions are corrupted, such as in psychiatric disorders. Additionally, understanding how the brain efficiently scales problems when producing complex and adaptive behaviors can inspire and help resolve Artificial Intelligence (AI) problems (e.g. scaling problems, generalization, etc.) and consequently contribute to the development of intelligent machines. During my PhD, I investigated adaptive behaviors at the behavioral, cognitive, and neural levels. I strongly believe that, as Marr already pointed out, in order to understand how our brain-machine works we need to investigate the phenomenon at three different levels: behavioral, algorithmic, and neural implementation. For this reason, throughout my doctoral work I took advantage of computational modeling methods together with cognitive neuroscience techniques in order to investigate the underlying mechanisms of adaptive behaviors.
Doctorate in Psychological and Educational Sciences
5

Degelder, Francois and Robert Melbye. "Competence Development : What can project-based organizations learn from the management of a hockey team?" Thesis, Linköpings universitet, Företagsekonomi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-141786.

Abstract
Project-based organizations (PBOs) have drawn attention in business management and represent an increasingly important share of organizations. While managing by projects is a suitable way to cope with the current environment, it also comes with new challenges. This research sheds light on the organizational tension between immediate performance and sustained performance in PBOs by focusing on competence development as the crucial means of achieving sustained performance. Because PBOs are temporary by nature, competence development is one of their challenges. This research was therefore conducted to gain a better understanding of how this tradeoff can be managed by PBOs. To that end, we researched how both this organizational tradeoff and competence development processes were managed in a hockey organization. In sport organizations, player succession is crucial to the organization’s overall performance and survival, which makes competence development a key activity. The research gave us a better understanding of the nature of the tradeoff between immediate performance and sustained performance and produced additional findings on competence development processes. More specifically, it was found that this tradeoff requires adaptation to project stages. We summarized and visualized the findings in a framework that practitioners in PBOs can use as a tool to understand, and therefore manage, the tradeoff between immediate performance and sustained performance by implementing competence development.
6

Meuleau, Nicolas. "Le dilemme entre exploration et exploitation dans l'apprentissage par renforcement : optimisation adaptative des modèles de décision multi-états". Caen, 1996. http://www.theses.fr/1996CAEN2038.

Abstract
This thesis addresses the exploration-exploitation dilemma as it arises in reinforcement learning algorithms, that is, the problem of action selection during the adaptive optimization of multi-state decision models. We focus on the case of Markov decision processes. Reinforcement learning is characterized by its use of approximate solutions, and our research aims to improve them. To this end, we draw on the work of other communities such as decision support and adaptive optimal control. In presenting the results from these different fields, we highlight three difficulties: (1) the impossibility of obtaining certainty about the unknown parameters before an infinite number of experiments, and hence the need to choose between never completely abandoning exploration and taking the risk of settling on a suboptimal solution; (2) the inadequacy of reasoning at a local scale, that is, the need to anticipate, from one state of the model, the observations that may be made in the other states; (3) the sensitivity of the algorithms to the problem representation used. Many reinforcement learning algorithms use a distributed approach, which represents the problem of learning a multi-state model as a set of single-state problems called bandit problems. We point out some limits of this approach, in particular that it is unsatisfactory with respect to the second point above. We propose to work around this by introducing a mechanism that back-propagates the measured uncertainty, so as to simulate reasoning at a global scale. This makes it possible to design algorithms that are satisfactory with respect to all three difficulties highlighted above.
Numerical simulations are carried out to highlight the merits and limits of the various proposals. For this purpose, we use a set of Markov decision problems taken from the literature or constructed to cover the range of possible problems as broadly as possible. The contribution of this thesis therefore consists of: a synthesis of the different approaches to the problem, a study of the limits of distributed reinforcement learning architectures, the proposal of algorithms using uncertainty back-propagation, and the results of the numerical simulations.
7

Soulerot, Marion. "Planification et ambidextérité : le cas des programmes d'amélioration de la performance". Phd thesis, Université Paris Dauphine - Paris IX, 2008. http://tel.archives-ouvertes.fr/tel-00472392.

Abstract
A turbulent environment and an accelerating pace of innovation seemed to have sounded the death knell of strategic planning in the 1990s. At the same time, many large corporate groups have launched medium-term programs aimed at improving their operational performance. In this context, the objective of this thesis is to understand what purpose these programs serve. An in-depth literature review leads us to examine these programs through two lenses: how they fit with other planning mechanisms, and the response they offer to the dilemma between the efficient exploitation of resources and the exploration of new resources. Following an exploratory study, an analytical grid with four axes is proposed: the degree of ambidexterity, the structure, the control process, and the stance of the actors. A longitudinal case study shows that these programs create a break in managers' spatiotemporal reference points. This break makes it possible to propose a renewed vision of planning, but it also requires the construction of an ambidextrous managerial stance.
8

Mann, Timothy 1984. "Scaling Up Reinforcement Learning without Sacrificing Optimality by Constraining Exploration". Thesis, 2012. http://hdl.handle.net/1969.1/148402.

Abstract
The purpose of this dissertation is to understand how algorithms can efficiently learn to solve new tasks based on previous experience, instead of being explicitly programmed with a solution for each task that we want them to solve. Here a task is a series of decisions, such as a robot vacuum deciding which room to clean next or an intelligent car deciding to stop at a traffic light. In such cases, state-of-the-art learning algorithms are difficult to employ in practice because they often make thousands of mistakes before reliably solving a task. However, humans learn solutions to novel tasks, often making fewer mistakes, which suggests that efficient learning algorithms may exist. One advantage that humans have over state-of-the-art learning algorithms is that, while learning a new task, humans can apply knowledge gained from previously solved tasks. The central hypothesis investigated by this dissertation is that learning algorithms can solve new tasks more efficiently when they take into consideration knowledge learned from solving previous tasks. Although this hypothesis may appear to be obviously true, what knowledge to use and how to apply that knowledge to new tasks is a challenging, open research problem. I investigate this hypothesis in three ways. First, I developed a new learning algorithm that is able to use prior knowledge to constrain the exploration space. Second, I extended a powerful theoretical framework in machine learning, called Probably Approximately Correct, so that I can formally compare the efficiency of algorithms that solve only a single task to algorithms that consider knowledge from previously solved tasks. With this framework, I found sufficient conditions for using knowledge from previous tasks to improve the efficiency of learning to solve new tasks and also identified conditions where transferring knowledge may impede learning.
I present situations where transfer learning can be used to intelligently constrain the exploration space so that optimality loss can be minimized. Finally, I tested the efficiency of my algorithms in various experimental domains. These theoretical and empirical results provide support for my central hypothesis. The theory and experiments of this dissertation provide a deeper understanding of what makes a learning algorithm efficient so that it can be widely used in practice. These results also contribute to the general goal of creating autonomous machines that can be reliably employed to solve complex tasks.

Books on the topic "Exploitation dilemma"

1

Trammel, Crystal. Tamar's Dilemma: An Overview of Sexual Exploitation. Morris Publishing, 2003.

2

Gureckis, Todd M. and Bradley C. Love. Computational Reinforcement Learning. Edited by Jerome R. Busemeyer, Zheng Wang, James T. Townsend and Ami Eidels. Oxford University Press, 2015. http://dx.doi.org/10.1093/oxfordhb/9780199957996.013.5.

Abstract
Reinforcement learning (RL) refers to the scientific study of how animals and machines adapt their behavior in order to maximize reward. The history of RL research can be traced to early work in psychology on instrumental learning behavior. However, the modern field of RL is a highly interdisciplinary area that lies at the intersection of ideas in computer science, machine learning, psychology, and neuroscience. This chapter summarizes the key mathematical ideas underlying this field, including the exploration/exploitation dilemma, temporal-difference (TD) learning, Q-learning, and model-based versus model-free learning. In addition, a broad survey of open questions in psychology and neuroscience is presented.
3

Ince, Onur Ulas. Colonial Capitalism and the Dilemmas of Liberalism. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780190637293.001.0001.

Abstract
This book analyzes the relationship between liberalism and empire from the perspective of political economy. It investigates the formative impact of “colonial capitalism” on the historical development of British liberal thought between the late seventeenth and early nineteenth centuries. It argues that liberalism as a political language developed through early modern debates over the contested meanings of property, exchange, and labor, which it examines respectively in the context of colonial land appropriations in the Americas, militarized trading in South Asia, and state-led proletarianization in Australasia. The book contends that the British Empire could be extolled as the “empire of liberty”—that is, the avatar of private property, free trade, and free labor—only on the condition that its colonial expropriation, extraction, and exploitation were “disavowed” and dissociated from the increasingly liberal conception of its capitalist economy. It identifies exemplary strategies of disavowal in the works of John Locke, Edmund Burke, and Edward G. Wakefield, who, as three liberal intellectuals of empire, attempted to navigate the ideological tensions between the liberal self-image of Britain and the violence that shaped its imperial economy. Challenging the prevalent tendency to study liberalism and empire around an abstract politics of universalism and colonial difference, the book discloses the ideological contradictions internal to Britain’s imperial economy and their critical influence on the formation of liberalism. It concludes that the disavowal of the violence constitutive of capitalist relations in the colonies has been crucial for crafting a liberal image for Anglophone imperialism and more generally for global capitalism.
4

Rosenthal, Laura J. Ways of the World. Cornell University Press, 2020. http://dx.doi.org/10.7591/cornell/9781501751585.001.0001.

Abstract
This book explores cosmopolitanism as it emerged during the Restoration and the role theater played in both memorializing and satirizing its implications and consequences. Rooted in the Stuart ambition to raise the status of England through two crucial investments — global traffic, including the slave trade, and cultural sophistication — this intensified global orientation led to the creation of global mercantile networks and to the rise of an urban British elite who drank Ethiopian coffee out of Asian porcelain at Ottoman-inspired coffeehouses. Restoration drama exposed cosmopolitanism's most embarrassing and troubling aspects, with such writers as Joseph Addison, Aphra Behn, John Dryden, and William Wycherley dramatizing the emotional and ethical dilemmas that imperial and commercial expansion brought to light. Altering standard narratives about Restoration drama, the book shows how the reinvention of theater in this period helped make possible performances that held the actions of the nation up for scrutiny, simultaneously indulging and ridiculing the violence and exploitation being perpetuated. In doing so, it reveals an otherwise elusive consistency between Restoration genres (comedy, tragedy, heroic plays, and tragicomedy), disrupts conventional understandings of the rise and reception of early capitalism, and offers a fresh perspective on theatrical culture in the context of the shifting political realities of seventeenth- and eighteenth-century Britain.

Book chapters on the topic "Exploitation dilemma"

1

Tantiwechwuttikul, Ranaporn, Masaru Yarime and Kohzo Ito. "Solar Photovoltaic Market Adoption: Dilemma of Technological Exploitation vs Technological Exploration". In Technologies and Eco-innovation towards Sustainability II, 215–27. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-1196-3_18.

2

Rejeb, Lilia, Zahia Guessoum and Rym M’Hallah. "An Adaptive Approach for the Exploration-Exploitation Dilemma for Learning Agents". In Multi-Agent Systems and Applications IV, 316–25. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005. http://dx.doi.org/10.1007/11559221_32.

3

Rejeb, Lilia, Zahia Guessoum and Rym M’Hallah. "An Adaptive Approach for the Exploration-Exploitation Dilemma and Its Application to Economic Systems". In Learning and Adaption in Multi-Agent Systems, 165–76. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11691839_10.

4

Anicho, Ogbonnaya, Philip B. Charlesworth, Gurvinder S. Baicher and Atulya K. Nagar. "Reinforcement Learning for Multiple HAPS/UAV Coordination: Impact of Exploration–Exploitation Dilemma on Convergence". In Advances in Intelligent Systems and Computing, 149–59. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-3290-0_12.

5

Sledge, Isaac J. and José C. Príncipe. "Trading Utility and Uncertainty: Applying the Value of Information to Resolve the Exploration–Exploitation Dilemma in Reinforcement Learning". In Handbook of Reinforcement Learning and Control, 557–610. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-60990-0_19.

6

Glynn, Simon. "Capitalism’s Moral and Ontological Dilemmas: Competition, the Inevitably Exploitative Response, and the Crisis of Overproduction". In The Economic Logic of Late Capitalism and the Inevitable Triumph of Socialism, 55–58. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-52667-2_8.

7

"8. Colonial Profits And The Liberal Dilemma". In The Politics of Colonial Exploitation, 145–61. Cornell University Press, 2018. http://dx.doi.org/10.7591/9781501719127-009.

8

Burton-Chellew, Maxwell N., Alex Kacelnik, Michal Arbilly, Miguel dos Santos, Kimberley J. Mathot, John M. McNamara, Friederike Mengel, Joël van der Weele and Björn Vollan. "The Ecological and Economic Conditions of Exploitation Strategies". In Investors and Exploiters in Ecology and Economics. The MIT Press, 2017. http://dx.doi.org/10.7551/mitpress/9780262036122.003.0003.

Abstract
In many situations across biology and economics, there is often one individual, or “agent,” that invests effort into a beneficial task and also one individual that, in contrast, foregoes the effort of investing, and instead simply exploits the efforts of another. What makes an individual choose to invest in production versus exploiting the efforts of another? If everyone invests, then exploitative strategies become very profitable; however, if everyone is exploitative, there will be no investments to exploit. How does natural selection resolve this dilemma? What can economic institutions do to encourage investment? Can biologists and economists learn from the approach of each other’s discipline? This chapter outlines the commonalities and differences in approach of the two disciplines to the general problem of investment versus exploitation. It develops a model to encapsulate the general features of many scenarios (“games”) involving potential exploitation and explores the benefits of a unified approach, outlining current limitations and important areas for future investigation.
9

"Ambidexterity revisited: the influence of structure and context and the dilemma exploration vs. exploitation". In Knowledge Spillover-based Strategic Entrepreneurship, 168–203. Routledge, 2016. http://dx.doi.org/10.4324/9781315445281-20.

10

Barta, Zoltán. "Producer–Scrounger Models and Aspects of Natural Resource Use". In Investors and Exploiters in Ecology and Economics. The MIT Press, 2017. http://dx.doi.org/10.7551/mitpress/9780262036122.003.0004.

Abstract
Humans are using natural resources at unprecedented rates, a situation that could lead to various global catastrophes. To mitigate eventual consequences, the processes involved must be better understood. Resource use frequently involves groups; thus free-riding behavior must be expected. Exploitation of others’ efforts can dramatically alter how resources are utilized. This chapter argues that exploitation of harvesting efforts can be analyzed as a producer–scrounger evolutionary game. The presence of scroungers (exploiters) in a group usually decreases overall use of resources by the group. Factors that increase the proportion of scroungers can further decrease resource use. By contrast, aggression and the compatibility of scrounger and producer strategies elevate resource use. Encouraging scrounging may lower resource use, but this raises a moral dilemma: individual scrounging is bad, reduced resource overuse by the population is good. The consequences of cheating in natural resource management demand attention in future research.

Conference proceedings on the topic "Exploitation dilemma"

1

Peterson, Erik and Timothy Verstynen. "A way around the exploration-exploitation dilemma". In 2019 Conference on Cognitive Computational Neuroscience. Brentwood, Tennessee, USA: Cognitive Computational Neuroscience, 2019. http://dx.doi.org/10.32470/ccn.2019.1365-0.

2

Zhang, Kaifu and Wei Pan. "The Two Facets of the Exploration-Exploitation Dilemma". In 2006 IEEE/WIC/ACM International Conference on Intelligent Agent Technology. IEEE, 2006. http://dx.doi.org/10.1109/iat.2006.120.

3

Cogliati Dezza, Irene, Xavier Noel, Axel Cleeremans and Angela Yu. "The Exploration-Exploitation Dilemma as a Tool for Studying Addiction". In 2018 Conference on Cognitive Computational Neuroscience. Brentwood, Tennessee, USA: Cognitive Computational Neuroscience, 2018. http://dx.doi.org/10.32470/ccn.2018.1080-0.

4

Shen, Yuanxia and Chuanhua Zeng. "An Adaptive Approach for the Exploration-Exploitation Dilemma in Non-stationary Environment". In 2008 International Conference on Computer Science and Software Engineering. IEEE, 2008. http://dx.doi.org/10.1109/csse.2008.677.

5

Michlmayr, Elke. "Self-Organization for Search in Peer-to-Peer Networks: The Exploitation-Exploration Dilemma". In 2006 1st Bio-Inspired Models of Network, Information and Computing Systems. IEEE, 2006. http://dx.doi.org/10.1109/bimnics.2006.361796.

6

Namiki, Naoya, Kuratomo Oyo and Tatsuji Takahashi. "How Do Humans Handle the Dilemma of Exploration and Exploitation in Sequential Decision Making?" In 8th International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS). ACM, 2015. http://dx.doi.org/10.4108/icst.bict.2014.258045.

7

Ou, Mingdong, Nan Li, Shenghuo Zhu and Rong Jin. "Multinomial Logit Bandit with Linear Utility Functions". In Twenty-Seventh International Joint Conference on Artificial Intelligence IJCAI-18. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/361.

Abstract
Multinomial logit bandit is a sequential subset selection problem which arises in many applications. In each round, the player selects a K-cardinality subset from N candidate items, and receives a reward which is governed by a multinomial logit (MNL) choice model considering both item utility and substitution property among items. The player's objective is to dynamically learn the parameters of the MNL model and maximize cumulative reward over a finite horizon T. This problem faces the exploration-exploitation dilemma, and the involved combinatorial nature makes it non-trivial. In recent years, some algorithms have been developed by exploiting specific characteristics of the MNL model, but all of them estimate the parameters of the MNL model separately and incur a regret bound which is not preferred for large candidate set size N. In this paper, we consider the linear utility MNL choice model whose item utilities are represented as linear functions of d-dimensional item features, and propose an algorithm, titled LUMB, to exploit the underlying structure. It is proven that the proposed algorithm achieves regret which is free of candidate set size. Experiments show the superiority of the proposed algorithm.
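As a reading aid for the abstract above, the following is a minimal sketch of a linear-utility MNL choice model of the kind it describes. It is not the LUMB algorithm and does not reproduce the paper's exact formulation. The outside "no choice" option with utility 0, the per-item rewards, and all function and parameter names (`linear_utility`, `mnl_choice_probabilities`, `expected_reward`, `theta`) are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def linear_utility(theta: Sequence[float], features: Sequence[float]) -> float:
    """Utility of one item as a linear function of its d feature values."""
    return sum(t * f for t, f in zip(theta, features))


def mnl_choice_probabilities(utilities: Sequence[float]) -> Tuple[float, List[float]]:
    """Return (p_no_choice, [p_item, ...]) for one offered subset.

    Assumption: an outside "no choice" option with utility 0 competes with
    the offered items, a common convention in MNL choice models.
    """
    # Subtract the largest exponent before exponentiating to avoid overflow.
    shift = max(0.0, max(utilities, default=0.0))
    item_weights = [math.exp(u - shift) for u in utilities]
    outside_weight = math.exp(-shift)
    total = outside_weight + sum(item_weights)
    return outside_weight / total, [w / total for w in item_weights]


def expected_reward(
    theta: Sequence[float],
    subset_features: Sequence[Sequence[float]],
    rewards: Sequence[float],
) -> float:
    """Expected reward of offering a subset; `rewards` is a hypothetical per-item payoff."""
    utilities = [linear_utility(theta, x) for x in subset_features]
    _, probabilities = mnl_choice_probabilities(utilities)
    return sum(r * p for r, p in zip(rewards, probabilities))
```

Because every item's probability shares one denominator, adding an attractive item to the subset lowers the choice probability of the others, which is one way to read the "substitution property among items" mentioned in the abstract. A bandit algorithm would additionally have to estimate `theta` from observed choices, which this sketch leaves out.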
8

Lindner, David, Hoda Heidari and Andreas Krause. "Addressing the Long-term Impact of ML Decisions via Policy Regret". In Thirtieth International Joint Conference on Artificial Intelligence IJCAI-21. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/75.

Abstract
Machine Learning (ML) increasingly informs the allocation of opportunities to individuals and communities in areas such as lending, education, employment, and beyond. Such decisions often impact their subjects' future characteristics and capabilities in an a priori unknown fashion. The decision-maker, therefore, faces exploration-exploitation dilemmas akin to those in multi-armed bandits. Following prior work, we model communities as arms. To capture the long-term effects of ML-based allocation decisions, we study a setting in which the reward from each arm evolves every time the decision-maker pulls that arm. We focus on reward functions that are initially increasing in the number of pulls but may become (and remain) decreasing after a certain point. We argue that an acceptable sequential allocation of opportunities must take an arm's potential for growth into account. We capture these considerations through the notion of policy regret, a much stronger notion than the often-studied external regret, and present an algorithm with provably sub-linear policy regret for sufficiently long time horizons. We empirically compare our algorithm with several baselines and find that it consistently outperforms them, in particular for long time horizons.