To see the other types of publications on this topic, follow the link: Processos de decisió de Markov.

Dissertations / Theses on the topic 'Processos de decisió de Markov'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Processos de decisió de Markov.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Pellegrini, Jerônimo. "Processos de decisão de Markov limitados por linguagem." [s.n.], 2006. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276256.

Full text
Abstract:
Advisor: Jacques Wainer. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação, 2006.
Markov decision processes (MDPs) are used to model situations where one needs to execute sequences of actions under uncertainty. This work defines a new formulation of Markov decision processes that adds the possibility of restricting the actions and observations to be considered at each decision epoch. These restrictions are described as a finite automaton, so the sequence of possible actions (and observations) considered during the search for an optimal policy forms a regular language. We call these language-limited Markov decision processes (LL-MDPs and LL-POMDPs). The use of automata for specifying restrictions helps make the modeling process easier. We present different approaches for solving these problems and compare their performance, showing that the solution is feasible; we also show that in some situations the restrictions can be used to speed up the search for a solution. In addition, we present a modification of LL-POMDPs that makes it possible to specify discrete probabilistic durations for actions and observations.
Doctorate in Computer Science (Information Systems).
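To make the idea concrete, here is a minimal, hypothetical sketch (not the author's implementation) of finite-horizon value iteration on the product of a toy MDP with a small deterministic finite automaton whose transition function determines which actions are admissible at each decision epoch; every state, action, probability, reward and the automaton itself are invented for illustration.

```python
# Toy sketch: value iteration on the product of an MDP and a DFA that
# restricts the admissible actions at each decision epoch (an LL-MDP-like
# construction). All names and numbers are illustrative, not from the thesis.
S = ["s0", "s1"]
A = ["a", "b"]
P = {("s0", "a"): {"s0": 0.2, "s1": 0.8},   # P(s' | s, a)
     ("s0", "b"): {"s0": 1.0},
     ("s1", "a"): {"s1": 1.0},
     ("s1", "b"): {"s0": 0.5, "s1": 0.5}}
R = {("s0", "a"): 1.0, ("s0", "b"): 0.0,
     ("s1", "a"): 0.0, ("s1", "b"): 2.0}

# DFA over the action alphabet: action "b" is only admissible right after "a".
Q = ["q0", "q1"]
DELTA = {("q0", "a"): "q1", ("q1", "a"): "q1", ("q1", "b"): "q0"}

def finite_horizon_values(horizon=5, gamma=0.95):
    """Backward induction on the product state (s, q)."""
    V = {(s, q): 0.0 for s in S for q in Q}
    for _ in range(horizon):
        V_new = {}
        for s in S:
            for q in Q:
                admissible = [a for a in A if (q, a) in DELTA]
                V_new[(s, q)] = max(
                    R[(s, a)] + gamma * sum(
                        p * V[(s2, DELTA[(q, a)])]
                        for s2, p in P[(s, a)].items())
                    for a in admissible)
        V = V_new
    return V

print(finite_horizon_values())
```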
APA, Harvard, Vancouver, ISO, and other styles
2

Torres, João Vitor. "Representações compactas para processos de decisão de Markov e sua aplicação na administração de impressoras." Universidade de São Paulo, 2006. http://www.teses.usp.br/teses/disponiveis/3/3152/tde-05092006-130307/.

Full text
Abstract:
Markov Decision Processes (MDPs) are an important tool for planning and optimization in environments under uncertainty. The specification and computational representation of the probability distributions underlying MDPs are central difficulties for their application. This work proposes two strategies for representing these probabilities in a compact and efficient way. The strategies use Bayesian networks and regularities among states and variables, and are particularly useful in systems whose variables have many categories and are strongly interrelated. The proposal has been applied to the management of clusters of printers, a real industrial problem that in fact motivated the work, allowing the printers to act as a group rather than individually. The work also presents a comparison between MDPs and fuzzy logic in the context of printer clusters.
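As a rough illustration of what a compact, factored transition representation can look like, the sketch below uses per-variable conditional probability tables in the style of a dynamic Bayesian network; the printer-flavoured variables and probabilities are invented and are not the representation proposed in the thesis.

```python
# Toy sketch of a factored (DBN-style) transition model: instead of one
# joint matrix over all states, each next-state variable gets a small CPT
# conditioned on the current state. Values are illustrative only.
import itertools

VARS = ["toner_low", "queue_full"]          # binary state variables

# P(var' = 1 | current state) for a hypothetical action "print_job"
CPT = {
    "toner_low":  lambda s: 0.9 if s["toner_low"] else 0.1,
    "queue_full": lambda s: 0.7 if s["queue_full"] else 0.3,
}

def joint_transition(state):
    """Recover P(s' | s, print_job) from the per-variable CPTs."""
    dist = {}
    for bits in itertools.product([0, 1], repeat=len(VARS)):
        nxt = dict(zip(VARS, bits))
        p = 1.0
        for v in VARS:
            p1 = CPT[v](state)
            p *= p1 if nxt[v] else (1.0 - p1)
        dist[bits] = p
    return dist

print(joint_transition({"toner_low": 1, "queue_full": 0}))
```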
APA, Harvard, Vancouver, ISO, and other styles
3

Eboli, Mônica Goes. "Transformação de redes de Petri coloridas em processos de decisão markovianos com probabilidades imprecisas." Universidade de São Paulo, 2010. http://www.teses.usp.br/teses/disponiveis/3/3152/tde-20082010-163351/.

Full text
Abstract:
The present work was motivated by the need to consider stochastic behavior when planning the production of a manufacturing system, that is, what to produce and in what order. These systems exhibit stochastic behavior that is usually not considered during production planning. The main goal of this work was to obtain a method to model manufacturing systems and to represent their stochastic behavior when planning their production. Because the methods that were suitable for planning were not adequate for modeling the systems, and vice versa, two methods were combined to achieve this goal: the systems are modeled as Petri nets and then converted into Markov decision processes, with which the planning is done. In order to represent the probabilities involved in the process, a special type of Petri net, named factored Petri net, was proposed. Using this kind of Petri net, a conversion method into Markov decision processes was developed. The conversion was successful, as tests showed that plans can be produced within seconds using state-of-the-art algorithms for Markov decision processes.
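A minimal sketch, under invented numbers, of the kind of bridge the abstract describes: the reachable markings of a tiny probabilistic Petri net are enumerated and can then serve as the state space of a Markov decision process. It is a generic illustration, not the factored Petri net formalism of the thesis.

```python
# Toy sketch: enumerate reachable markings of a tiny Petri net and read them
# off as MDP states; the firing probabilities are made up, not from the thesis.
from collections import deque

# Places: p0 (raw part), p1 (machine free), p2 (finished part).
# Transition "work" consumes a token from p0 and, with probability 0.9,
# produces a finished part in p2; with probability 0.1 the part is scrapped.
def fire_work(marking):
    p0, p1, p2 = marking
    if p0 >= 1 and p1 >= 1:
        return {(p0 - 1, p1, p2 + 1): 0.9, (p0 - 1, p1, p2): 0.1}
    return None  # transition not enabled

def reachable(initial):
    seen, frontier = {initial}, deque([initial])
    while frontier:
        m = frontier.popleft()
        outcomes = fire_work(m)
        if outcomes:
            for m2 in outcomes:
                if m2 not in seen:
                    seen.add(m2)
                    frontier.append(m2)
    return seen

print(sorted(reachable((2, 1, 0))))   # markings become the MDP state space
```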
APA, Harvard, Vancouver, ISO, and other styles
4

Pla, Aragonés Lluís Miquel. "A Markov sow herd model for on-farm decision support." Doctoral thesis, Universitat de Lleida, 2001. http://hdl.handle.net/10803/8155.

Full text
Abstract:
The Spanish pig sector has recently gone through deep changes, due basically to increased competition and the globalisation of the economy. Furthermore, technological advances and the increasing degree of specialisation in the sector favour the development and adoption of advanced tools for decision support. In this context, the objective of this thesis has been to formulate and implement a dynamic stochastic model of the productive behaviour of a sow herd, based on Markov decision processes and able to be used under real conditions. The model is intended to represent alternatives for reproductive management and replacement on pig farms, supporting farmers, technicians and managers in on-farm decision making.
The semi-Markov decision model and the Markov decision model derived from it (embedded Markov process) have proved useful for representing reproductive management and replacement strategies. The availability of field data from individual farms has allowed the model to be validated in real situations. The validation has also shown that the model cannot be applied indiscriminately to any farm; the fit of the model to the specific conditions of each farm must be checked first. It is also shown that, when the model is used to calculate the population structure at equilibrium, the transition matrix does not need to represent a constant time step. This makes it possible to work with transitions associated with biological states, which are easier to estimate and provide computational savings, allowing a faster evaluation of reproductive management alternatives and a more efficient implementation of optimisation algorithms for the replacement problem.
The implementation of the semi-Markov decision model in a decision support system (DSS) has shown the potential for on-farm use of the model. The development of the DSS makes a complex model such as this one available to less specialised users. The DSS allows the farm manager to evaluate different production alternatives on the farm, analyse the sensitivity of the parameters considered critical, and optimise the replacement policy. Moreover, the integration of the DSS into a management information system (BDporc®) eases its dissemination among swine production enterprises and the collection of new variables, such as the number of services per mating, heat detection, pregnancy detection, facilities, etc., which can help increase the precision of the results. The sophisticated design of the DSS interface has improved the interpretation of the model results, which is not always immediate. The built-in sensitivity analysis provides insight into the critical components of the model, which is often more important than obtaining a single precise result.
Finally, the sow herd model, formulated in a flexible way, can be adapted to different purposes with minimal changes, which contributes to a better understanding of the effects of different reproductive management alternatives on the overall economic efficiency of the production system.
APA, Harvard, Vancouver, ISO, and other styles
5

Delgado, Karina Valdivia. "Processos de decisão Markovianos fatorados com probabilidades imprecisas." Universidade de São Paulo, 2010. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-28112010-095311/.

Full text
Abstract:
When modeling real-world decision-theoretic planning problems with the framework of Markov Decision Processes (MDPs), it is often impossible to obtain a completely accurate estimate of the transition probabilities. Uncertainty arises naturally in the specification of a domain, for example during the elicitation of transition probabilities from an expert or from data obtained through sampling, or from non-stationary transition distributions arising from insufficient knowledge of the domain. In the interest of obtaining the most robust policy under transition uncertainty, Markov Decision Processes with Imprecise Transition Probabilities (MDP-IPs) have been used to model these scenarios. Unfortunately, while various solution algorithms exist for MDP-IPs, they often require external calls to optimization routines and can thus be extremely time-consuming in practice. To address this deficiency, this thesis introduces the factored MDP-IP and proposes efficient mathematical programming and dynamic programming methods that exploit the structure of an application domain.
First, we derive efficient approximate solutions for factored MDP-IPs based on mathematical programming, resulting in a multilinear formulation for robust maximin linear-value approximations. By exploiting the factored representation of an MDP-IP, this approach reduces solution time by orders of magnitude with respect to previously proposed non-factored approaches. Second, noting that the key computational bottleneck in the dynamic programming solutions for MDP-IPs proposed in the literature is the need to repeatedly solve nonlinear constrained optimization problems, we show how to represent the value function compactly using a new data structure called Parameterized Algebraic Decision Diagrams, and how to apply approximation techniques to drastically reduce the computational overhead of the calls to a nonlinear solver, producing bounded, approximately optimal solutions. Our results show up to two orders of magnitude speedup in comparison to traditional enumerative dynamic programming approaches and up to an order of magnitude speedup over the extension of factored MDP approximate value iteration techniques to MDP-IPs, while producing the lowest error among all the approximation algorithms evaluated.
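For intuition, the sketch below shows one robust (maximin) Bellman backup for a toy MDP whose transition credal sets are probability intervals, with nature choosing the worst-case distribution by a simple greedy rule; all bounds, rewards and the discount factor are made up, and the thesis itself works with more general, factored multilinear constraints.

```python
# Toy sketch of one robust (maximin) Bellman backup for an MDP-IP whose
# transition credal sets are probability intervals; all numbers are invented.
def worst_case_expectation(values, low, high):
    """Nature picks, within [low_i, high_i] and summing to 1, the
    distribution that minimizes the expected value (greedy on sorted values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    prob = list(low)
    mass_left = 1.0 - sum(low)
    for i in order:                       # pile remaining mass on low values
        extra = min(high[i] - low[i], mass_left)
        prob[i] += extra
        mass_left -= extra
    return sum(p * v for p, v in zip(prob, values))

S = [0, 1]
V = [0.0, 10.0]                           # current value estimates
gamma = 0.9
R = {"a": [1.0, 0.0], "b": [0.0, 2.0]}    # R[a][s]
LOW  = {"a": [[0.5, 0.3], [0.0, 0.8]], "b": [[0.2, 0.6], [0.3, 0.5]]}
HIGH = {"a": [[0.7, 0.5], [0.2, 1.0]], "b": [[0.4, 0.8], [0.5, 0.7]]}

V_new = []
for s in S:
    V_new.append(max(
        R[a][s] + gamma * worst_case_expectation(V, LOW[a][s], HIGH[a][s])
        for a in ("a", "b")))
print(V_new)
```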
APA, Harvard, Vancouver, ISO, and other styles
6

Truzzi, Flávio Sales. "Modelagem e soluções para redes de anúncios." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/3/3141/tde-24042015-113950/.

Full text
Abstract:
Ad networks promote the distribution of ads on the internet so as to maximize the total revenue generated by displaying ads on websites. These networks typically operate under the CPC (cost per click) business model, in which the advertiser pays a given amount only when a user clicks on its advertisement. How the network plans the distribution of ads to websites is of utmost importance, since the click rate on ads is extremely low. Currently, ad allocation is done with an approximate solution based on the optimal allocation computed from data of a previous period, calculated using linear programming combined with heuristics. However, this system is clearly a sequential decision process subject to several constraints, such as the advertisers' budgets, minimum limits on the number of displays of each ad, and ad categories, among others. In this dissertation we argue that MDPs (Markov Decision Processes) provide a better model for the problem, since they take the dynamics of the system into account, considering, for example, that an ad with little chance of being clicked can be allocated efficiently with respect to the long-term return, even when other ads would provide a higher profit in the short term.
However, due to the large number of states, an optimal solution through MDPs is impractical. We therefore analyze the relative performance of the state-of-the-art linear programming approach and the optimal MDP model, deriving guarantees that the approximate solution based on linear programming is not far from the optimal solution and that in large problems (similar to those found in practice) this difference can be ignored. Finally, we propose a model based on reinforcement learning for this problem, using two approaches, one disregarding contextual information and the other using it. We argue that reinforcement learning is better suited to the ad allocation problem, since it is able to adapt its allocation policy to changes that occur, for example, in the user profile.
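As a loose, hypothetical illustration of CPC ad allocation by learning (a degenerate single-state, bandit-like case rather than the full MDP or linear programming formulations discussed in the dissertation), consider:

```python
# Toy sketch of learning which ad to show under a CPC model; the click
# probabilities, payments and budgets below are invented for illustration.
import random

ADS = {"ad1": {"cpc": 0.50, "ctr": 0.02, "budget": 5.0},
       "ad2": {"cpc": 0.10, "ctr": 0.10, "budget": 5.0}}

Q = {ad: 0.0 for ad in ADS}          # single-state value estimates
alpha, epsilon = 0.1, 0.1

random.seed(0)
for impression in range(10_000):
    live = [a for a, d in ADS.items() if d["budget"] > 0]
    if not live:
        break
    ad = random.choice(live) if random.random() < epsilon else max(live, key=Q.get)
    clicked = random.random() < ADS[ad]["ctr"]
    reward = ADS[ad]["cpc"] if clicked else 0.0
    if clicked:
        ADS[ad]["budget"] -= ADS[ad]["cpc"]
    Q[ad] += alpha * (reward - Q[ad])    # incremental value update

print(Q)
```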
APA, Harvard, Vancouver, ISO, and other styles
7

Santos, Felipe Martins dos. "Soluções eficientes para processos de decisão markovianos baseadas em alcançabilidade e bissimulações estocásticas." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-12022014-140538/.

Full text
Abstract:
Planning in artificial intelligence is the task of finding actions that satisfy a given goal. In planning under uncertainty, actions can have probabilistic effects. These problems are modeled as Markov Decision Processes (MDPs), models that allow the computation of optimal solutions by considering the expected value of each action in each state. However, solving large probabilistic planning problems, i.e., those with a large number of states and actions, is a major challenge. Large MDPs can be reduced by computing stochastic bisimulations, i.e., equivalence relations over the set of states of the original MDP. From the stochastic bisimulations, which can be exact or approximate, it is possible to obtain a reduced abstract model that can be easier to solve than the original MDP. For some problem domains, however, computing the stochastic bisimulation over the whole state space is infeasible. The algorithms proposed in this work extend the algorithms used to compute stochastic bisimulations for MDPs so that the bisimulations are computed only over the set of states reachable from a given initial state, which can be much smaller than the complete state set. The experimental results show that it is possible to solve large probabilistic planning problems with better performance than the known stochastic bisimulation techniques.
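A minimal sketch of the reachability step the abstract relies on: restricting the MDP to the states reachable from a given initial state before any bisimulation is computed. The transition table is invented for illustration.

```python
# Toy sketch: restrict an MDP to the states reachable from an initial state
# (breadth-first search over transitions with non-zero probability); the
# bisimulation would then be computed over this smaller set. Toy numbers.
from collections import deque

P = {  # P[s][a] = {s': prob}
    "s0": {"go": {"s1": 0.7, "s2": 0.3}},
    "s1": {"go": {"s3": 1.0}},
    "s2": {"go": {"s1": 0.5, "s2": 0.5}},
    "s3": {},                    # goal / absorbing
    "s4": {"go": {"s4": 1.0}},   # never reached from s0
}

def reachable(s0):
    seen, frontier = {s0}, deque([s0])
    while frontier:
        s = frontier.popleft()
        for outcomes in P[s].values():
            for s2, prob in outcomes.items():
                if prob > 0 and s2 not in seen:
                    seen.add(s2)
                    frontier.append(s2)
    return seen

print(sorted(reachable("s0")))   # s4 is excluded from the abstraction
```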
APA, Harvard, Vancouver, ISO, and other styles
8

Dias, Daniel Baptista. "Programação dinâmica em tempo real para processos de decisão markovianos com probabilidades imprecisas." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-21012015-083016/.

Full text
Abstract:
In sequential decision-making problems modeled as Markov Decision Processes (MDPs), it may not be possible to obtain an exact measure of the state transition probabilities. To address this, Markov Decision Processes with Imprecise Transition Probabilities (MDP-IPs) were introduced. However, while MDP-IPs are a robust framework for real-world planning applications, their solutions are very time-consuming in practice. Seeking to improve these solutions, previous work proposed efficient synchronous dynamic programming algorithms to solve MDP-IPs with a factored representation of the probabilistic transition and reward functions, called factored MDP-IPs. However, when the initial state of a Stochastic Shortest Path problem (SSP MDP) is given, these solutions do not use that information. This work introduces the Stochastic Shortest Path problem with Imprecise Probabilities (SSP MDP-IP) in both its enumerative and its factored form.
An asynchronous dynamic programming algorithm for enumerative SSP MDP-IPs with interval-valued probabilities was proposed by Buffet and Aberdeen (2005). In general, however, a problem is given in factored form, i.e., in terms of state variables, and in that case, even if interval-based imprecision is assumed over the variables, that algorithm can no longer be applied, since the joint transition probabilities become multilinear. We show that factored SSP MDP-IPs are more expressive than enumerative ones, and that moving from the enumerative SSP MDP-IP to the general case of a factored SSP MDP-IP changes the objective function solved in the Bellman backup from a linear to a nonlinear one. We also propose enumerative algorithms, called RTDP-IP (Real-time Dynamic Programming with Imprecise Transition Probabilities), LRTDP-IP (Labeled RTDP-IP), SSiPP-IP (Short-Sighted Probabilistic Planner with Imprecise Transition Probabilities) and LSSiPP-IP (Labeled SSiPP-IP), as well as factored algorithms called factRTDP-IP (factored RTDP-IP) and factLRTDP-IP (factored LRTDP-IP). They are evaluated against the synchronous dynamic programming algorithms in terms of solution convergence time and scalability.
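For intuition about the asynchronous, trial-based updates behind RTDP-style algorithms, here is a generic sketch on a toy stochastic shortest path problem with precise probabilities; in the RTDP-IP variants named above, the expectation in the backup would be replaced by a worst-case expectation over the imprecise transition model. All states, costs and probabilities are invented.

```python
# Toy sketch of trial-based RTDP on an SSP: only the states visited along
# simulated trials from the initial state receive Bellman updates.
import random

P = {("s0", "a"): {"s1": 0.8, "s0": 0.2},
     ("s0", "b"): {"s2": 1.0},
     ("s1", "a"): {"goal": 1.0},
     ("s2", "a"): {"goal": 0.5, "s2": 0.5}}
COST = 1.0
V = {s: 0.0 for s in ["s0", "s1", "s2", "goal"]}

def qvalue(s, a):
    return COST + sum(p * V[s2] for s2, p in P[(s, a)].items())

def actions(s):
    return [a for (st, a) in P if st == s]

random.seed(1)
for trial in range(200):
    s = "s0"
    while s != "goal":
        best = min(actions(s), key=lambda a: qvalue(s, a))
        V[s] = qvalue(s, best)                        # asynchronous update
        r, acc = random.random(), 0.0
        for s2, p in P[(s, best)].items():            # sample the next state
            acc += p
            if r <= acc:
                s = s2
                break

print(V)
```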
APA, Harvard, Vancouver, ISO, and other styles
9

Filho, Ricardo Shirota. "Processos de decisão Markovianos com probabilidades imprecisas e representações relacionais: algoritmos e fundamentos." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/3/3152/tde-13062013-160912/.

Full text
Abstract:
This work is devoted to the theoretical and algorithmic development of Markov decision processes with imprecise probabilities and relational representations. In the literature, this configuration has been important within artificial intelligence planning, where relational representations allow compact descriptions and imprecise probabilities yield more general forms of uncertainty. There are three main contributions. First, we discuss the foundations of sequential decision making with imprecise probabilities, highlighting some problems that remain open; these results directly affect (but are not restricted to) the model of interest in this work, Markov decision processes with imprecise probabilities. Second, we propose three algorithms for Markov decision processes with imprecise probabilities based on mathematical programming (optimization). Third, we develop ideas proposed by Trevizan, Cozman and de Barros (2008) on the use of variants of the Real-Time Dynamic Programming algorithm to solve probabilistic planning problems described in extended versions of the Probabilistic Planning Domain Definition Language (PPDDL).
APA, Harvard, Vancouver, ISO, and other styles
10

Ferreira, L. A. "Programação em lógica não-monotônica aplicada à redução do espaço de planos em processos de decisão de Markov." Biblioteca Digital de Teses e Dissertações da FEI, 2016. https://doi.org/10.31414/EE.2016.T.128438.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Nunes, Luiz Guilherme Nadal. "Processo semi-Markoviano de decisão aplicado ao controle do acesso às facilidades em uma rede digital de serviços integrados." Instituto Tecnológico de Aeronáutica, 1991. http://www.bd.bibl.ita.br/tde_busca/arquivo.php?codArquivo=1798.

Full text
Abstract:
This work studies the problem of controlling access to the facilities of a digital communications network that integrates circuit-switched voice traffic and packet-switched data traffic. We formulate the problem as a Semi-Markov Decision Process and use the Policy Iteration and Value Iteration algorithms to determine an optimal policy for controlling access to the network facilities.
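As a small illustration of one of the algorithms mentioned (policy iteration), here is a generic sketch on a tiny discounted MDP standing in for the admission-control problem; the thesis formulates the problem as a semi-Markov decision process, so this is only an analogy, and all numbers are invented.

```python
# Toy sketch of policy iteration on a tiny discounted MDP; a rough stand-in
# for the admission-control problem described above, with invented numbers.
import numpy as np

S = 2
P = np.array([  # P[a, s, s']
    [[0.9, 0.1], [0.4, 0.6]],   # action 0: reject / stay idle
    [[0.5, 0.5], [0.1, 0.9]],   # action 1: accept the request
])
R = np.array([[0.0, 0.0], [1.0, -2.0]])   # R[a, s]
gamma = 0.95

policy = np.zeros(S, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = P[policy, np.arange(S)]
    R_pi = R[policy, np.arange(S)]
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
    # Policy improvement: greedy action per state.
    Q = R + gamma * P @ V          # Q[a, s]
    new_policy = Q.argmax(axis=0)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print(policy, V)
```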
APA, Harvard, Vancouver, ISO, and other styles
12

Pinheiro, Paulo Gurgel 1983. "Localização multirrobo cooperativa com planejamento." [s.n.], 2009. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276155.

Full text
Abstract:
Advisor: Jacques Wainer. Master's thesis, Universidade Estadual de Campinas, Instituto de Computação, 2009.
In a cooperative multi-robot localization problem, a group of robots is in an environment in which the exact location of each robot is unknown; there is only a probability distribution indicating the chance of a robot being in a particular state. The robots must therefore move through the environment and generate new observations, which are shared in order to compute new estimates. In recent years, many studies have focused on probabilistic techniques, communication models and detection models to solve the localization problem. However, the movement of the robots is generally defined by random actions, which may generate observations that are useless for improving the estimate. This work presents a localization approach with support for action planning. The goal is a model in which the actions performed by the robots are defined by policies: by choosing the best action to perform, a robot obtains more useful information from its internal and external sensors and estimates the poses more quickly. The proposed model, called the Model of Planned Localization (MPL), uses POMDPs to model the localization problems together with specific policy-generation algorithms. Markov localization was used as the probabilistic localization technique, and versions of detection models and an information propagation model were implemented. A simulator for multi-robot localization problems was developed, in which the proposed model was compared with a model that does not plan its actions. The results show that the proposed model estimates the robots' poses in fewer steps, being significantly more efficient than the compared model without planning.
Master's degree in Computer Science (Artificial Intelligence).
APA, Harvard, Vancouver, ISO, and other styles
13

Almeida, Luisa Amaral de. "Soft-UCT : algoritmo para planejamento probabilístico baseado em um operador de média generalizada." Instituto Tecnológico de Aeronáutica, 2015. http://www.bd.bibl.ita.br/tde_busca/arquivo.php?codArquivo=3376.

Full text
Abstract:
Probabilistic planning seeks rational decision making under uncertainty. Among the many algorithms for probabilistic planning, Upper Confidence bounds applied to Trees (UCT) has stood out as a good planner for complex problems represented as Markov Decision Processes (MDPs) when a time constraint is imposed on planning. This work proposes a variant of the UCT algorithm as an alternative for domain-independent probabilistic planning under time constraints. The original UCT algorithm progressively builds the decision tree that represents an MDP and propagates mean rewards through the tree. As a result, high but rare rewards can remain "hidden" in the partial tree generated by UCT. The proposed algorithm, Soft-UCT, uses a "soft" generalized-mean operator when propagating rewards through the decision tree. This operator makes policies that are likely to yield many high rewards preferable to policies that yield a single very high reward. The dissertation details the implementation of Soft-UCT, including the definition of the parameter used to compute the generalized mean and a heuristic to estimate the ideal search horizon in the decision tree. The algorithm is evaluated on two benchmarks: a practical problem in web marketing and the domains of the International Probabilistic Planning Competition (IPPC) 2011. The results on the web-marketing MDP show that Soft-UCT can be applied to highly complex real problems. On the IPPC 2011 benchmark, Soft-UCT outperformed UCT in terms of the mean reward obtained by the resulting policies and also performed better than the planner that placed second in the competition. In general, the algorithm can be applied to any finite-horizon MDP, and the experiments showed good results compared with the original UCT.
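A minimal illustration of the generalized (power) mean that Soft-UCT propagates instead of the arithmetic mean: it interpolates between the plain average (p = 1) and the maximum (p → ∞). How the order p is actually chosen and applied is defined in the dissertation itself; the values below are invented.

```python
# Toy illustration of a generalized (power) mean over child returns.
def power_mean(values, p):
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

returns_rare_spike = [1.0, 1.0, 9.0, 1.0]   # one rare high return
returns_consistent = [3.0, 3.0, 3.0, 3.0]   # consistently moderate returns

for p in (1, 2, 8, 32):                      # p = 1 is the arithmetic mean
    print(f"p={p:>2}  spike={power_mean(returns_rare_spike, p):5.2f}  "
          f"consistent={power_mean(returns_consistent, p):5.2f}")
```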
APA, Harvard, Vancouver, ISO, and other styles
14

Cella, Leonardo Oliveira Gois. "Regressão ordinal Bayesiana." Repositório Institucional da UnB, 2013. http://repositorio.unb.br/handle/10482/15131.

Full text
Abstract:
Master's thesis, Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Estatística, 2013.
This work presents inferences for the ordinal regression model considering the logit link and the multinomial likelihood approach. A reparametrization of the regression model was proposed. The inferences were performed in a Bayesian setting using MCMC (Markov Chain Monte Carlo) techniques. Point estimates of the parameters and their respective HPD intervals are presented, as well as a genuinely Bayesian significance test, the FBST (Full Bayesian Significance Test), for the regression parameters. The methodology was applied to simulated data and illustrated with a genetics problem that investigated the influence of a certain type of radiation on the occurrence of cellular damage. The multinomial likelihood approach combined with the reparametrization of the model is easy to handle, thanks to the increase in computing power and the advances in MCMC methods. Moreover, the FBST proved to be a simple and useful procedure for testing the significance of the regression coefficients, motivating the use of a Bayesian approach in ordinal data modeling.
APA, Harvard, Vancouver, ISO, and other styles
15

Lacerda, Dênis Antonio. "Aprendizado por reforço em lote: um estudo de caso para o problema de tomada de decisão em processos de venda." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-03072014-101251/.

Full text
Abstract:
Probabilistic planning studies the sequential decision-making problems of an agent in which actions have probabilistic effects, modeled as a Markov decision process (MDP). Given the probabilistic state transition function and the reward values of the actions, it is possible to determine an action policy (i.e., a mapping between environment states and agent actions) that maximizes the accumulated expected reward (or minimizes the accumulated expected cost) of executing a sequence of actions. When the MDP model is not completely known, the best policy must be learned through the interaction of the agent with the real environment; this process is called reinforcement learning. However, in applications where experimenting in the real environment is not allowed, for example sales operations, reinforcement learning can be performed over a sample of past experiences, a process called batch reinforcement learning. In this work we study batch reinforcement learning techniques that use a history of past interactions stored in a process database, and we propose some ways of improving the existing algorithms. As a case study, we apply the technique to learning policies for the sales process of large-format printers, with the goal of building an action recommendation system for novice sellers.
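As a rough sketch of the batch setting described above, the code below runs tabular Q-learning over a fixed log of transitions, never interacting with the environment; the logged sales-flavoured tuples are invented, and the algorithms and improvements studied in the thesis may differ from this generic scheme.

```python
# Toy sketch of batch (offline) Q-learning from a fixed log of transitions.
batch = [  # (state, action, reward, next_state)
    ("contact", "offer_demo",  0.0, "demo"),
    ("demo",    "send_quote",  0.0, "negotiation"),
    ("negotiation", "discount", 1.0, "sale"),
    ("contact", "send_quote",  0.0, "lost"),
    ("negotiation", "wait",     0.0, "lost"),
]
gamma, alpha = 0.9, 0.5
Q = {}

def max_q(s):
    vals = [v for (s2, _), v in Q.items() if s2 == s]
    return max(vals) if vals else 0.0

for sweep in range(50):                     # repeated sweeps over the batch
    for s, a, r, s2 in batch:
        q = Q.get((s, a), 0.0)
        Q[(s, a)] = q + alpha * (r + gamma * max_q(s2) - q)

for sa, v in sorted(Q.items()):
    print(sa, round(v, 3))
```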
APA, Harvard, Vancouver, ISO, and other styles
16

Vicini, Lorena. "Modelos de processo de Poisson não-homogêneo na presença de um ou mais pontos de mudança, aplicados a dados de poluição do ar." [s.n.], 2012. http://repositorio.unicamp.br/jspui/handle/REPOSIP/305867.

Full text
Abstract:
Orientadores: Luiz Koodi Hotta, Jorge Alberto Achcar<br>Tese (doutorado) ¿ Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica<br>Made available in DSpace on 2018-08-20T14:22:26Z (GMT). No. of bitstreams: 1 Vicini_Lorena_D.pdf: 75122511 bytes, checksum: 796c27170036587b321bbe88bc0d369e (MD5) Previous issue date: 2012<br>Resumo: A poluição do ar é um problema que tem afetado várias regiões ao redor do mundo. Em grandes centros urbanos, como é esperado, a concentração de poluição do ar é maior. Devido ao efeito do vento, no entanto, este problema não se restringe a esses centros, e consequentemente a poluição do ar se espalha para outras regiões. Os dados de poluição do ar são modelados por processos de Poisson não-homogêneos (NHPP) em três artigos: dois usando métodos Bayesianos com Markov Chain Monte Carlo (MCMC) para dados de contagem, e um usando análise de dados funcionais. O primeiro artigo discute o problema da especificação das distribuições a priori, incluindo a discussão de sensibilidade e convergência das cadeias MCMC. O segundo artigo introduz um modelo incluindo pontos de mudança para NHPP com a função taxa modelada por uma distribuição gama generalizada, usando métodos Bayesianos. Modelos com e sem pontos de mudança foram considerados para fins de comparação. O terceiro artigo utiliza análise de dados funcionais para estimar a função taxa de um NHPP. Esta estimação é feita sob a suposição de que a função taxa é contínua, mas com um número finito de pontos de descontinuidade na sua primeira derivada, localizados exatamente nos pontos de mudança. A função taxa e seus pontos de mudança foram estimadas utilizando suavização splines e uma função de penalização baseada nos candidatos a pontos de mudança. Os métodos desenvolvidos neste trabalho foram testadas através de simulações e aplicados a dados de poluição de ozônio da Cidade do México, descrevendo a qualidade do ar neste centro urbano. Ele conta quantas vezes, em um determinado período, a poluição do ar excede um limiar especificado de qualidade do ar, com base em níveis de concentração de ozônio. Observou-se que quanto mais complexos os modelos, incluindo os pontos de mudança, melhor foi o ajuste<br>Abstract: Air pollution is a problem that is currently affecting several regions around the world. In major urban centers, as expected, the concentration of air pollution is higher. Due to wind effect, however, this problem does not remain constrained in such centers, and air pollution spreads to other regions. In the thesis the air pollution data is modeled by Non-Homogeneous Poisson Process (NHPP) in three papers: two using Bayesian methods with Markov Chain Monte Carlo (MCMC) for count data, and one using functional data analysis. Paper one discuss the problem of the prior specification, including discussion of the sensitivity and convergence of the MCMC chains. Paper two introduces a model including change point for NHPP with rate function modeled by a generalized gamma distribution, using Bayesian methods. Models with and without change points were considered for comparison purposes. Paper three uses functional data analysis to estimate the rate function of a NHPP. This estimation is done under the assumption that the rate function is continuous, but with a finite number of discontinuity points in its first derivative, exactly at the change-points. The rate function and its change-points were estimated using splines smoothing and a penalty function based on candidate change points. 
The methods developed in this work were tested using simulations and applied to ozone pollution data from Mexico City, describing the air quality in this urban center. It counts how many times, in a determined period, air pollution exceeds a specified threshold of air quality, based on ozone concentration levels. It was observed that the more complex the models, including change-points, the better the fitting<br>Doutorado<br>Estatistica<br>Doutor em Estatística
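As a rough illustration of the kind of model described in this abstract (not taken from the thesis), the sketch below simulates a non-homogeneous Poisson process of threshold exceedances whose rate function has a single change point; the rate values and the change-point location are purely hypothetical.

```python
import numpy as np

def simulate_nhpp(rate_fn, t_max, rate_bound, rng):
    """Simulate event times of a non-homogeneous Poisson process on [0, t_max]
    by thinning a homogeneous process with intensity rate_bound >= rate_fn(t)."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate_bound)       # candidate arrival time
        if t > t_max:
            return np.array(times)
        if rng.uniform() < rate_fn(t) / rate_bound:  # keep with probability rate(t)/bound
            times.append(t)

# Hypothetical piecewise-constant rate with one change point at day 120
rate = lambda t: 0.05 if t < 120.0 else 0.15   # exceedances per day

rng = np.random.default_rng(0)
exceedances = simulate_nhpp(rate, t_max=365.0, rate_bound=0.15, rng=rng)
print(f"{exceedances.size} simulated ozone-threshold exceedances in one year")
```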
APA, Harvard, Vancouver, ISO, and other styles
17

Fernandes, Luísa Martins. "Inferência bayesiana em modelos discretos com fração de cura." reponame:Repositório Institucional da UnB, 2013. http://repositorio.unb.br/handle/10482/15136.

Full text
Abstract:
Dissertação (mestrado)—Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Estatística, Programa de Mestrado em Estatística, 2013.<br>Submitted by Alaíde Gonçalves dos Santos (alaide@unb.br) on 2014-01-29T11:43:35Z No. of bitstreams: 1 2013_LuisaMartinsFernandes.pdf: 1810343 bytes, checksum: 245568a555335f8f6f78b949879f36c9 (MD5)<br>Approved for entry into archive by Guimaraes Jacqueline (jacqueline.guimaraes@bce.unb.br) on 2014-02-11T14:14:36Z (GMT) No. of bitstreams: 1 2013_LuisaMartinsFernandes.pdf: 1810343 bytes, checksum: 245568a555335f8f6f78b949879f36c9 (MD5)<br>Made available in DSpace on 2014-02-11T14:14:36Z (GMT). No. of bitstreams: 1 2013_LuisaMartinsFernandes.pdf: 1810343 bytes, checksum: 245568a555335f8f6f78b949879f36c9 (MD5)<br>Este trabalho apresenta inferências do modelo Weibull discreto para dados de sobrevivência com fração de cura. As inferências foram realizadas dentro de um cenário bayesiano fazendo-se o uso das técnicas de MCMC (Markov Chain Monte Carlo). São apresentadas estimativas pontuais dos parâmetros do modelo e seus respectivos intervalos de credibilidade HPD (Highest Posterior Density), assim como um teste de significância genuinamente bayesiano – FBST (Full Bayesian Significance Test) como uma forma de seleção de modelos. A metodologia apresentada foi aplicada em dados simulados e ilustrada por dois problemas práticos: o primeiro sobre o tempo até a rehospitalização de pacientes com esquizofrenia, e o segundo sobre o tempo até a morte de homens com AIDS. O FBST se mostrou um procedimento simples e útil para seleção de modelos, motivando assim uma abordagem bayesiana na modelagem de dados discretos de sobrevivência. ABSTRACT<br>This work presents inferences of the discrete Weibull model for survival data with cure rate. The inferences were conducted within a Bayesian context, using MCMC (Markov Chain Monte Carlo) techniques. Point estimates of the model’s parameters and their respective HPD (Highest Posterior Density) credible intervals are presented, as well as a Full Bayesian Significance Test (FBST) as a means of model selection. The methodology presented was applied to simulated data and illustrated by two practical problems: the time until re-hospitalization of patients with schizophrenia and the time until death of men with AIDS. The FBST proved to be a simple and useful procedure for model selection, thus motivating a Bayesian approach in the modeling of discrete survival data.
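For intuition only, here is a small sketch (not from the dissertation) of one common parameterization of the discrete Weibull survival function combined with a mixture cure fraction; the parameter values are hypothetical, e.g. posterior means that an MCMC run might produce.

```python
import numpy as np

def discrete_weibull_survival(t, q, beta):
    """P(T >= t) under one common discrete Weibull parameterization, t = 0, 1, 2, ..."""
    return q ** (np.asarray(t, dtype=float) ** beta)

def cure_model_survival(t, q, beta, cure_fraction):
    """Mixture cure model: a fraction of subjects never experiences the event."""
    return cure_fraction + (1.0 - cure_fraction) * discrete_weibull_survival(t, q, beta)

t = np.arange(0, 13)  # e.g. months until re-hospitalization
print(np.round(cure_model_survival(t, q=0.9, beta=1.2, cure_fraction=0.3), 3))
```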
APA, Harvard, Vancouver, ISO, and other styles
18

Freitas, Elthon Manhas de. "Planejamento probabilístico sensível a risco com ILAO* e função utilidade exponencial." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-17012019-092638/.

Full text
Abstract:
Os processos de decisão de Markov (Markov Decision Process - MDP) têm sido usados para resolução de problemas de tomada de decisão sequencial. Existem problemas em que lidar com os riscos do ambiente para obter um resultado confiável é mais importante do que maximizar o retorno médio esperado. MDPs que lidam com esse tipo de problemas são chamados de processos de decisão de Markov sensíveis a risco (Risk-Sensitive Markov Decision Process - RSMDP). Dentre as diversas variações de RSMDP, estão os trabalhos baseados em utilidade exponencial que utilizam um fator de risco, o qual modela a atitude a risco do agente e que pode ser propensa ou aversa. Os algoritmos existentes na literatura para resolver esse tipo de RSMDPs são ineficientes se comparados a outros algoritmos de MDP. Neste projeto, é apresentada uma solução que pode ser usada em problemas maiores, tanto por executar cálculos apenas em estados relevantes para atingir um conjunto de estados meta partindo de um estado inicial, quanto por permitir processamento de números com expoentes muito elevados para os ambientes computacionais atuais. Os experimentos realizados evidenciam que (i) o algoritmo proposto é mais eficiente, se comparado aos algoritmos estado-da-arte para RSMDPs; e (ii) o uso da técnica LogSumExp permite resolver o problema de trabalhar com expoentes muito elevados em RSMDPs.<br>Markov Decision Processes (MDPs) have been used very efficiently to solve sequential decision-making problems. There are problems where dealing with environmental risks to get a reliable result is more important than maximizing the expected average return. MDPs that deal with this type of problem are called risk-sensitive Markov decision processes (RSMDPs). Among the several variations of RSMDP are the works based on exponential utility that use a risk factor, which models the agent's risk attitude, which can be risk-prone or risk-averse. The algorithms in the literature to solve this type of RSMDP are inefficient when compared to other MDP algorithms. In this project, a solution is presented that can be used in larger problems, both by performing calculations only in states relevant for reaching a set of goal states from an initial state, and by allowing the processing of numbers whose exponents are too large for current computational environments. The experiments show that (i) the proposed algorithm is more efficient when compared to state-of-the-art algorithms for RSMDPs; and (ii) the LogSumExp technique solves the problem of working with very large exponents in RSMDPs.
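The LogSumExp idea mentioned in the abstract can be illustrated with a minimal value-iteration backup for an exponential-utility RSMDP; this is only a sketch under my own assumptions (cost-minimizing agent, risk factor lam > 0), not the algorithm proposed in the dissertation.

```python
import numpy as np
from scipy.special import logsumexp

def risk_sensitive_backup(V, P, C, lam):
    """One sweep of value iteration for an exponential-utility (risk-sensitive) MDP.

    V   : (S,)    current value estimates
    P   : (A,S,S) transition probabilities P[a, s, s']
    C   : (A,S)   immediate costs
    lam : risk factor (> 0 gives risk aversion with respect to costs)

    The certainty equivalent (1/lam) * log E[exp(lam * V(s'))] is computed with
    logsumexp, so exp(lam * V) never has to be formed explicitly and cannot overflow.
    """
    A, S = C.shape
    Q = np.empty((A, S))
    for a in range(A):
        lse = logsumexp(np.broadcast_to(lam * V, (S, S)), b=P[a], axis=1)
        Q[a] = C[a] + lse / lam
    return Q.min(axis=0)  # greedy, cost-minimizing backup
```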
APA, Harvard, Vancouver, ISO, and other styles
19

Oliveira, Carlos Roberto Mariano de. "Um modelo de processo markoviano de decisão aplicado à assistência domiciliar de pacientes." Instituto Tecnológico de Aeronáutica, 2006. http://www.bd.bibl.ita.br/tde_busca/arquivo.php?codArquivo=385.

Full text
Abstract:
The rationalization of resources in a hospital system can be improved and depends, among other factors, on patient admission policies, which can be implemented through control of admissions and through the combination of early patient discharge with Home Care (Assistência Domiciliar). Home Care is intended for patients whose treatment requires long and frequent hospital stays. The goals of this work are (a) to present the Home Care system as a viable hospital discharge control policy; and (b) to present the formulation of a Markov Decision Process model applied to the Home Care of patients. These patients are classified as type 1 patients (severe) and type 2 patients (non-severe). The proposed mathematical model makes it possible to monitor the arrival of type 2 patients and the early discharge of both types of patients in the system. The adoption of the Home Care program depends on the patient's eligibility for this kind of treatment, which includes social and economic aspects of the patient as well as the classification of the complexity of the disease.
APA, Harvard, Vancouver, ISO, and other styles
20

James, Huw William. "Transient Markov decision processes." Thesis, University of Bristol, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.430192.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Ferns, Norman Francis. "Metrics for Markov decision processes." Thesis, McGill University, 2003. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=80263.

Full text
Abstract:
We present a class of metrics, defined on the state space of a finite Markov decision process (MDP), each of which is sound with respect to stochastic bisimulation, a notion of MDP state equivalence derived from the theory of concurrent processes. Such metrics are based on similar metrics developed in the context of labelled Markov processes, and like those, are suitable for state space aggregation. Furthermore, we restrict our attention to a subset of this class that is appropriate for certain reinforcement learning (RL) tasks, specifically, infinite horizon tasks with an expected total discounted reward optimality criterion. Given such an RL metric, we provide bounds relating it to the optimal value function of the original MDP as well as to the value function of the aggregate MDP. Finally, we present an algorithm for calculating such a metric up to a prescribed degree of accuracy and some empirical results.
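To make the construction concrete, here is a rough sketch (my own simplification, not the thesis algorithm) of the usual fixed-point iteration for a bisimulation-style metric on a small finite MDP, where the Kantorovich (1-Wasserstein) distance between transition distributions is solved as a linear program.

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich(p, q, d):
    """1-Wasserstein distance between distributions p, q under ground metric d (n x n)."""
    n = d.shape[0]
    A_eq, b_eq = [], []
    for i in range(n):                      # row marginals: sum_j flow[i, j] = p[i]
        row = np.zeros(n * n); row[i * n:(i + 1) * n] = 1.0
        A_eq.append(row); b_eq.append(p[i])
    for j in range(n):                      # column marginals: sum_i flow[i, j] = q[j]
        col = np.zeros(n * n); col[j::n] = 1.0
        A_eq.append(col); b_eq.append(q[j])
    res = linprog(d.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun

def bisimulation_metric(P, R, c_r=0.5, c_t=0.5, iters=50):
    """Iterate d(s,t) = max_a [c_r * |R(s,a) - R(t,a)| + c_t * W_d(P(.|s,a), P(.|t,a))]."""
    A, S, _ = P.shape
    d = np.zeros((S, S))
    for _ in range(iters):
        new_d = np.zeros((S, S))
        for s in range(S):
            for t in range(S):
                new_d[s, t] = max(c_r * abs(R[a, s] - R[a, t])
                                  + c_t * kantorovich(P[a, s], P[a, t], d)
                                  for a in range(A))
        d = new_d
    return d
```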
APA, Harvard, Vancouver, ISO, and other styles
22

Silva, Valdinei Freire da. "Extração de preferências por meio de avaliações de comportamentos observados." Universidade de São Paulo, 2009. http://www.teses.usp.br/teses/disponiveis/3/3141/tde-01072009-131819/.

Full text
Abstract:
Recentemente, várias tarefas tem sido delegadas a sistemas computacionais, principalmente quando sistemas computacionais são mais confiáveis ou quando as tarefas não são adequadas para seres humanos. O uso de extração de preferências ajuda a realizar a delegação, permitindo que mesmo pessoas leigas possam programar facilmente um sistema computacional com suas preferências. As preferências de uma pessoa são obtidas por meio de respostas para questões específicas, que são formuladas pelo próprio sistema computacional. A pessoa age como um usuário do sistema computacional, enquanto este é visto como um agente que age no lugar da pessoa. A estrutura e contexto das questões são apontadas como fonte de variações das respostas do usuário, e tais variações podem impossibilitar a factibilidade da extração de preferências. Uma forma de evitar tais variações é questionar um usuário sobre a sua preferência entre dois comportamentos observados por ele. A questão de avaliar relativamente comportamentos observados é mais simples e transparente ao usuário, diminuindo as possíveis variações, mas pode não ser fácil para o agente interpretar tais avaliações. Se existem divergências entre as percepções do agente e do usuário, o agente pode ficar impossibilitado de aprender as preferências do usuário. As avaliações são geradas com base nas percepções do usuário, mas tudo que um agente pode fazer é relacionar tais avaliações às suas próprias percepções. Um outro problema é que questões, que são expostas ao usuário por meio de comportamentos demonstrados, são agora restritas pela dinâmica do ambiente e um comportamento não pode ser escolhido arbitrariamente. O comportamento deve ser factível e uma política de ação deve ser executada no ambiente para que um comportamento seja demonstrado. Enquanto o primeiro problema influencia a inferência de como o usuário avalia comportamentos, o segundo problema influencia quão rápido e acurado o processo de aprendizado pode ser feito. Esta tese propõe o problema de Extração de Preferências com base em Comportamentos Observados utilizando o arcabouço de Processos Markovianos de Decisão, desenvolvendo propriedades teóricas em tal arcabouço que viabilizam computacionalmente tal problema. O problema de diferentes percepções é analisado e soluções restritas são desenvolvidas. O problema de demonstração de comportamentos é analisado utilizando formulação de questões com base em políticas estacionárias e replanejamento de políticas, sendo implementados algoritmos com ambas soluções para resolver a extração de preferências em um cenário sob condições restritas.<br>Recently, computer systems have been delegated to accomplish a variety of tasks, when the computer system can be more reliable or when the task is not suitable or not recommended for a human being. The use of preference elicitation in computational systems helps to improve such delegation, enabling lay people to program easily a computer system with their own preference. The preference of a person is elicited through his answers to specific questions, that the computer system formulates by itself. The person acts as an user of the computer system, whereas the computer system can be seen as an agent that acts in place of the person. The structure and context of the questions have been pointed as sources of variance regarding the users answers, and such variance can jeopardize the feasibility of preference elicitation. 
An attempt to avoid such variance is to ask a user to choose between two behaviours that the user has observed. Evaluating observed behaviours relative to each other makes questions more transparent and simpler for the user, decreasing the variance effect, but it may not be easy for the agent to interpret such evaluations. If divergences between the agent's and the user's perceptions occur, the agent may not be able to learn the user's preference. Evaluations are generated with respect to the user's perception, but all an agent can do is relate such evaluations to its own perception. Another issue is that questions, which are exposed to the user through behaviours, are constrained by the environment dynamics: a behaviour cannot be chosen arbitrarily, since it must be feasible and a policy must be executed in order to produce it. Whereas the first issue influences the inference regarding the user's evaluations, the second influences how fast and how accurately the learning process can proceed. This thesis proposes the problem of Preference Elicitation under Evaluations over Observed Behaviours using the Markov Decision Process framework, and theoretical properties in this framework are developed in order to make the problem computationally feasible. The problem of different perceptions is analysed and constrained solutions are developed. The problem of demonstrating a behaviour is considered under the formulation of questions based on stationary policies and non-stationary policies. Both types of questions were implemented and tested to solve preference elicitation in a scenario with constrained conditions.
APA, Harvard, Vancouver, ISO, and other styles
23

Ataky, Steve Tsham Mpinda. "Análise de dados sequenciais heterogêneos baseada em árvore de decisão e modelos de Markov : aplicação na logística de transporte." Universidade Federal de São Carlos, 2015. https://repositorio.ufscar.br/handle/ufscar/7242.

Full text
Abstract:
Submitted by Bruna Rodrigues (bruna92rodrigues@yahoo.com.br) on 2016-09-16T12:52:39Z No. of bitstreams: 1 DissSATM.pdf: 3079104 bytes, checksum: 51b46ffeb4387370e30fb92e31771606 (MD5)<br>Approved for entry into archive by Marina Freitas (marinapf@ufscar.br) on 2016-09-16T19:59:28Z (GMT) No. of bitstreams: 1 DissSATM.pdf: 3079104 bytes, checksum: 51b46ffeb4387370e30fb92e31771606 (MD5)<br>Approved for entry into archive by Marina Freitas (marinapf@ufscar.br) on 2016-09-16T19:59:34Z (GMT) No. of bitstreams: 1 DissSATM.pdf: 3079104 bytes, checksum: 51b46ffeb4387370e30fb92e31771606 (MD5)<br>Made available in DSpace on 2016-09-16T19:59:41Z (GMT). No. of bitstreams: 1 DissSATM.pdf: 3079104 bytes, checksum: 51b46ffeb4387370e30fb92e31771606 (MD5) Previous issue date: 2015-10-16<br>Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)<br>Lately, the development of data mining techniques has emerged in many application fields with the aim of analyzing large volumes of data, which may be simple and/or complex. Transport logistics, and the railway sector in particular, is an area with such a characteristic, in that the available data are of varied natures (classic variables such as top speed or type of train, symbolic variables such as the set of routes traveled by a train, degree of adherence, etc.). In this dissertation, the problem of classification and prediction of heterogeneous data is addressed through two main approaches. First, an automatic classification approach was implemented based on the classification tree technique, which also allows new data to be efficiently integrated into previously initialized partitions. The second contribution of this work concerns the analysis of sequence data. It is proposed to combine the above classification method with Markov models in order to obtain a partition of time series (temporal sequences) into homogeneous and significant groups based on probabilities. The resulting model offers a good interpretation of the classes built and allows us to estimate the evolution of the sequences of a particular vehicle. Both approaches were then applied to real data from a Brazilian railway information system company, with the aim of supporting strategic planning management and coherent prediction. This work first provides a finer planning typology to solve the problems associated with the existing classification into homogeneous circulation groups. Second, it seeks to define a typology of train paths (successive circulations of the same train) in order to provide or predict the statistical characteristics of the most likely next circulation of a train following the same route. The general methodology provides a decision-support environment to monitor and control the planning organization. To that end, a formula with two variants was proposed to calculate the degree of adherence between the path actually carried out, or in progress, and the planned one.
Como parte desta dissertação, aborda-se o problema de classificação e previsão de dados heterogêneos, propõe-se estudar através de duas abordagens principais. Primeiramente, foi utilizada uma abordagem de classificação automática com base na técnica por árvore de classificação, a qual também permite que novos dados sejam eficientemente integrados nas partições iniciais. A segunda contribuição deste trabalho diz respeito à análise de dados sequenciais. Propôs-se a combinar o método de classificação anterior com modelos de Markov para obter uma partição de sequências temporais em grupos homogêneos e significativos com base nas probabilidades. O modelo resultante oferece uma boa interpretação das classes construídas e permite estimar a evolução das sequências de um determinado veículo. Ambas as abordagens foram então aplicadas nos dados do sistema de informação ferroviário, no espírito de dar apoio à gestão estratégica de planejamentos e previsões aderentes. Este trabalho consiste em fornecer inicialmente uma tipologia mais fina de planejamento para resolver os problemas associados com a classificação existente em grupos de circulações homogêneos. Em segundo lugar, buscou-se definir uma tipologia de trajetórias de trens (sucessão de circulações de um mesmo trem) para assim fornecer ou prever características estatísticas da próxima circulação mais provável de um trem realizando o mesmo percurso. A metodologia geral proporciona um ambiente de apoio à decisão para o monitoramento e controle da organização de planejamento. Deste fato, uma fórmula com duas variantes foi proposta para calcular o grau de aderência entre a trajetória efetivamente realizada ou em curso de realização com o planejado.
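A toy sketch of the second ingredient described above (grouping sequences with Markov models) might look as follows: it fits one first-order transition matrix per group and assigns a new sequence to the group under which its log-likelihood is highest. This is an illustration under my own assumptions, not the dissertation's actual procedure; the example groups and state encodings are made up.

```python
import numpy as np

def fit_transition_matrix(sequences, n_states, alpha=1.0):
    """First-order Markov transition matrix estimated with add-alpha smoothing."""
    counts = np.full((n_states, n_states), alpha)
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def log_likelihood(seq, T):
    return float(sum(np.log(T[a, b]) for a, b in zip(seq[:-1], seq[1:])))

def classify(seq, models):
    """Assign a sequence to the group whose Markov model explains it best."""
    scores = {g: log_likelihood(seq, T) for g, T in models.items()}
    return max(scores, key=scores.get)

# Hypothetical groups of train circulations encoded as integer state sequences
groups = {"regular": [[0, 1, 2, 1, 0], [0, 1, 1, 2, 1]],
          "disrupted": [[0, 2, 2, 2, 0], [2, 2, 0, 2, 2]]}
models = {g: fit_transition_matrix(seqs, n_states=3) for g, seqs in groups.items()}
print(classify([0, 1, 2, 2, 1], models))
```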
APA, Harvard, Vancouver, ISO, and other styles
24

Castro, Rivadeneira Pablo Samuel. "Bayesian exploration in Markov decision processes." Thesis, McGill University, 2007. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=18479.

Full text
Abstract:
Markov Decision Processes are a mathematical framework widely used for stochastic optimization and control problems. Reinforcement Learning is a branch of Artificial Intelligence that deals with stochastic environments where the dynamics of the system are unknown. A major issue for learning algorithms is the need to balance the amount of exploration of new experiences with the exploitation of existing knowledge. We present three methods for dealing with this exploration-exploitation tradeoff for Markov Decision Processes. The approach taken is Bayesian, in that we use and maintain a model estimate. The existence of an optimal policy for Bayesian exploration has been shown, but its computation is infeasible. We present three approximations to the optimal policy by the use of statistical sampling. The first approach uses a combination of Linear Programming and Q-learning. We present empirical results demonstrating its performance. The second approach is an extension of this idea, and we prove theoretical guarantees along with empirical evidence of its performance. Finally, we present an algorithm that adapts itself efficiently to the amount of time granted for computation. This idea is presented as an approximation to an infinite dimensional linear program and we guarantee convergence as well as prove strong duality.<br>Les processus de décision Markoviens sont des modèles mathématiques fréquemment utilisés pour résoudre des problèmes d'optimisation stochastique et de contrôle. L'apprentissage par renforcement est une branche de l'intelligence artificielle qui s'intéresse aux environnements stochastiques où la dynamique du système est inconnue. Un problème majeur des algorithmes d'apprentissage est de bien balancer l'exploration de l'environnement, pour acquérir de nouvelles connaissances, et l'exploitation des connaissances acquises. Nous présentons trois méthodes pour obtenir de bons compromis exploration-exploitation dans les processus de décision Markoviens. L'approche adoptée est Bayésienne, en ce sens où nous utilisons et maintenons une estimation du modèle. L'existence d'une politique optimale pour l'exploration Bayésienne a été démontrée, mais elle est impossible à calculer efficacement. Nous présentons trois approximations de la politique optimale qui utilise l'échantillonnage statistique. La première approche utilise une combinaison de programmation linéaire et de l'algorithme "Q-Learning". Nous présentons des résultats empiriques qui démontrent sa performance. La deuxième approche est une extension de cette idée, et nous démontrons des garanties théoriques de son efficacité, confirmées par des résultats empiriques. Finalement, nous présentons un algorithme qui s'adapte efficacement au temps alloué pour le calcul de la politique. Cette idée est présentée comme une approximation d'un programme linéaire à dimension infini; nous garantissons sa convergence et démontrons une dualité forte.
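One simple member of the family of sampling-based approximations discussed above is posterior (Thompson) sampling of the transition model; the sketch below is a generic illustration under my own assumptions, not one of the three algorithms developed in the thesis.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, iters=200):
    """Standard value iteration; P is (A,S,S), R is (A,S). Returns a greedy policy."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        V = (R + gamma * P @ V).max(axis=0)
    return (R + gamma * P @ V).argmax(axis=0)

class PosteriorSamplingAgent:
    """Dirichlet posterior over each row P(.|s,a); acts greedily in a sampled model."""
    def __init__(self, n_states, n_actions, R, prior=1.0):
        self.counts = np.full((n_actions, n_states, n_states), prior)
        self.R = R
        self.rng = np.random.default_rng(0)
        self.replan()

    def replan(self):
        sampled_P = np.apply_along_axis(self.rng.dirichlet, 2, self.counts)
        self.policy = value_iteration(sampled_P, self.R)

    def act(self, s):
        return int(self.policy[s])

    def observe(self, s, a, s_next):
        self.counts[a, s, s_next] += 1.0   # update posterior counts
```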
APA, Harvard, Vancouver, ISO, and other styles
25

Chu, Shanyun. "Some contributions to Markov decision processes." Thesis, University of Liverpool, 2015. http://livrepository.liverpool.ac.uk/2038000/.

Full text
Abstract:
In a nutshell, this thesis studies discrete-time Markov decision processes (MDPs) on Borel Spaces, with possibly unbounded costs, and both expected (discounted) total cost and long-run expected average cost criteria. In Chapter 2, we systematically investigate a constrained absorbing MDP with expected total cost criterion and possibly unbounded (from both above and below) cost functions. We apply the convex analytic approach to derive the optimality and duality results, along with the existence of an optimal finite mixing policy. We also provide mild conditions under which a general constrained MDP model with state-action-dependent discount factors can be equivalently transformed into an absorbing MDP model. Chapter 3 treats a more constrained absorbing MDP, as compared with that in Chapter 2. The dynamic programming approach is applied to a reformulated unconstrained MDP model and the optimality results are obtained. In addition, the correspondence between policies in the original model and the reformulated one is illustrated. In Chapter 4, we attempt to extend the dynamic programming approach for standard MDPs with expected total cost criterion to the case, where the (iterated) coherent risk measure of the cost is taken as the performance measure to be minimized. The cost function under our consideration is allowed to be unbounded from the below, and possibly arbitrarily unbounded from the above. Under a fairly weak version of continuity-compactness conditions, we derive the optimality results for both the finite and infinite horizon cases, and establish value iteration as well as policy iteration algorithms. The standard MDP and the iterated conditional value-at-risk of the cost function are illustrated as two examples. Chapter 5 and 6 tackle MDPs with long-run expected average cost criterion. In Chapter 5, we consider a constrained MDP with possibly unbounded (from both above and below) cost functions. Under Lyapunov-like conditions, we show the sufficiency of stable policies to the concerned constrained problem. Furthermore, we introduce the corresponding space of performance vectors and manage to characterize each of its extreme points with a deterministic stationary policy. Finally, the existence of an optimal finite mixing policy is justified. Chapter 6 concerns an unconstrained MDP with the cost functions unbounded from the below and possibly arbitrarily unbounded from the above. We provide a detailed discussion on the issue of sufficient policies in the denumerable case, establish the average cost optimality inequality (ACOI) and show the existence of an optimal deterministic stationary policy. In Chapter 7, an inventory-production system is taken as an example of real-world applications to illustrate the main results in Chapter 2 and 5.
APA, Harvard, Vancouver, ISO, and other styles
26

Gonçalves, Luciano Vargas. "Uma arquitetura de Agentes BDI para auto-regulação de Trocas Sociais em Sistemas Multiagentes Abertos." Universidade Catolica de Pelotas, 2009. http://tede.ucpel.edu.br:8080/jspui/handle/tede/105.

Full text
Abstract:
Made available in DSpace on 2016-03-22T17:26:22Z (GMT). No. of bitstreams: 1 dm2_Luciano_vargas.pdf: 637463 bytes, checksum: b08b63e8c6a347cd2c86fc24fdfd8986 (MD5) Previous issue date: 2009-03-31<br>The study and development of systems to control interactions in multiagent systems is an open problem in Artificial Intelligence. The system of social exchange values of Piaget is a social approach that allows for the foundations of the modeling of interactions between agents, where the interactions are seen as service exchanges between pairs of agents, with the evaluation of the realized or received services, that is, the investments and profits in the exchange, and credits and debits to be charged or received, respectively, in future exchanges. This evaluation may be performed in different ways by the agents, considering that they may have different exchange personality traits. In an exchange process over time, the different ways of evaluating profits and losses may cause disequilibrium in the exchange balances, where some agents may accumulate profits and others accumulate losses. To solve the exchange equilibrium problem, we use Partially Observable Markov Decision Processes (POMDP) to help the agent decide on actions that can lead to the equilibrium of the social exchanges. Then, each agent has its own internal process to evaluate its current balance of the results of the exchange process between the other agents, observing its internal state, and with the observation of its partner's exchange behavior, it is able to deliberate on the best action it should perform in order to get the equilibrium of the exchanges. Considering an open multiagent system, a mechanism is necessary to recognize the different personality traits, to build the POMDPs to manage the exchanges between the pairs of agents. This recognizing task is done by Hidden Markov Models (HMM), which, from models of known personality traits, can approximate the personality traits of the new partners, just by analyzing observations done on the agent behaviors in exchanges. The aim of this work is to develop a hybrid agent architecture for the self-regulation of social exchanges between personality-based agents in an open multiagent system, based on the BDI (Beliefs, Desires, Intentions) architecture, where the agent plans are obtained from optimal policies of POMDPs, which model personality traits that are recognized by HMMs. To evaluate the proposed approach, some simulations were done considering (known or new) different personality traits<br>O estudo e desenvolvimento de sistemas para o controle de interações em sistemas multiagentes é um tema em aberto dentro da Inteligência Artificial. O sistema de valores de trocas sociais de Piaget é uma abordagem social que possibilita fundamentar a modelagem de interações de agentes, onde as interações são vistas como trocas de serviços entre pares de agentes, com a valorização dos serviços realizados e recebidos, ou seja, investimentos e ganhos na troca realizada, e, também os créditos e débitos a serem cobrados ou recebidos, respectivamente, em trocas futuras. Esta avaliação pode ser realizada de maneira diferenciada pelos agentes envolvidos, considerando que estes apresentam traços de personalidade distintos. No decorrer de processo de trocas sociais a forma diferenciada de avaliar os ganhos e perdas nas interações pode causar desequilíbrio nos balanços de trocas dos agentes, onde alguns agentes acumulam ganhos e outros acumulam perdas.
Para resolver a questão do equilíbrio das trocas, encontrou-se nos Processos de Decisão de Markov Parcialmente Observáveis (POMDP) uma metodologia capaz de auxiliar a tomada de decisões de cursos de ações na busca do equilíbrio interno dos agentes. Assim, cada agente conta com um mecanismo próprio para avaliar o seu estado interno, e, de posse das observações sobre o comportamento de troca dos parceiros, torna-se apto para deliberar sobre as melhores ações a seguir na busca do equilíbrio interno para o par de agentes. Com objetivo de operar em sistema multiagentes aberto, torna-se necessário um mecanismo para reconhecer os diferentes traços de personalidade, viabilizando o uso de POMDPs nestes ambientes. Esta tarefa de reconhecimento é desempenhada pelos Modelos de Estados Ocultos de Markov (HMM), que, a partir de modelos de traços de personalidade conhecidos, podem inferir os traços aproximados de novos parceiros de interações, através das observações sobre seus comportamentos nas trocas. O objetivo deste trabalho é desenvolver uma arquitetura de agentes híbrida para a auto-regulação de trocas sociais entre agentes baseados em traços de personalidade em sistemas multiagentes abertos. A arquitetura proposta é baseada na arquitetura BDI (Beliefs, Desires, Intentions), onde os planos dos agentes são obtidos através de políticas ótimas de POMDPs, que modelam traços de personalidade reconhecidos através de HMMs. Para avaliar a proposta, foram realizadas simulações envolvendo traços de personalidade conhecidos e novos traços
APA, Harvard, Vancouver, ISO, and other styles
27

Patrascu, Relu-Eugen. "Linear Approximations For Factored Markov Decision Processes." Thesis, University of Waterloo, 2004. http://hdl.handle.net/10012/1171.

Full text
Abstract:
A Markov Decision Process (MDP) is a model employed to describe problems in which a decision must be made at each one of several stages, while receiving feedback from the environment. This type of model has been extensively studied in the operations research community and fundamental algorithms have been developed to solve associated problems. However, these algorithms are quite inefficient for very large problems, leading to a need for alternatives; since MDP problems are provably hard on compressed representations, one becomes content even with algorithms which may perform well at least on specific classes of problems. The class of problems we deal with in this thesis allows succinct representations for the MDP as a dynamic Bayes network, and for its solution as a weighted combination of basis functions. We develop novel algorithms for producing, improving, and calculating the error of approximate solutions for MDPs using a compressed representation. Specifically, we develop an efficient branch-and-bound algorithm for computing the Bellman error of the compact approximate solution regardless of its provenance. We introduce an efficient direct linear programming algorithm which, using incremental constraints generation, achieves run times significantly smaller than existing approximate algorithms without much loss of accuracy. We also show a novel direct linear programming algorithm which, instead of employing constraints generation, transforms the exponentially many constraints into a compact form more amenable for tractable solutions. In spite of its perceived importance, the efficient optimization of the Bellman error towards an approximate MDP solution has eluded current algorithms; to this end we propose a novel branch-and-bound approximate policy iteration algorithm which makes direct use of our branch-and-bound method for computing the Bellman error. We further investigate another procedure for obtaining an approximate solution based on the dual of the direct, approximate linear programming formulation for solving MDPs. To address both the loss of accuracy resulting from the direct, approximate linear program solution and the question of where basis functions come from we also develop a principled system able not only to produce the initial set of basis functions, but also able to augment it with new basis functions automatically generated such that the approximation error decreases according to the user's requirements and time limitations.
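The direct approximate-linear-programming idea referred to above can be sketched on a tiny flat MDP as follows; the basis functions and problem data are made up for illustration, and this is not the dissertation's constraint-generation algorithm or its factored representation.

```python
import numpy as np
from scipy.optimize import linprog

def approx_lp_weights(P, R, Phi, gamma=0.9):
    """Approximate LP for MDPs with V(s) ~= Phi[s] @ w.

    minimize   sum_s Phi[s] @ w
    subject to Phi[s] @ w >= R[a, s] + gamma * sum_s' P[a, s, s'] * (Phi[s'] @ w)  for all (s, a)
    """
    A, S, _ = P.shape
    c = Phi.sum(axis=0)                          # objective: uniform state-relevance weights
    A_ub, b_ub = [], []
    for a in range(A):
        for s in range(S):
            # rearranged constraint: (gamma * P[a,s] @ Phi - Phi[s]) @ w <= -R[a,s]
            A_ub.append(gamma * P[a, s] @ Phi - Phi[s])
            b_ub.append(-R[a, s])
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=(None, None), method="highs")
    return res.x

# Tiny hypothetical 2-action, 3-state MDP with two basis functions
P = np.array([[[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.1, 0.0, 0.9]],
              [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.5, 0.0, 0.5]]])
R = np.array([[0.0, 0.0, 1.0], [0.1, 0.1, 0.5]])
Phi = np.array([[1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])   # constant + "progress" feature
w = approx_lp_weights(P, R, Phi)
print("approximate values:", Phi @ w)
```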
APA, Harvard, Vancouver, ISO, and other styles
28

Cheng, Hsien-Te. "Algorithms for partially observable Markov decision processes." Thesis, University of British Columbia, 1988. http://hdl.handle.net/2429/29073.

Full text
Abstract:
The thesis develops methods to solve discrete-time finite-state partially observable Markov decision processes. For the infinite horizon problem, only discounted reward case is considered. Several new algorithms for the finite horizon and the infinite horizon problems are developed. For the finite horizon problem, two new algorithms are developed. The first algorithm is called the relaxed region algorithm. For each support in the value function, this algorithm determines a region not smaller than its support region and modifies it implicitly in later steps until the exact support region is found. The second algorithm, called linear support algorithm, systematically approximates the value function until all supports in the value function are found. The most important feature of this algorithm is that it can be modified to find an approximate value function. The number of regions determined explicitly by both algorithms is the same as the number of supports in the value function, which is much less than the number of regions generated by the one-pass algorithm. Since the vertices of each region have to be found, these two algorithms are more efficient than the one-pass algorithm. The limited numerical examples also show that both methods are more efficient than the existing algorithms. For the infinite horizon problem, it is first shown that the approximation version of linear support algorithm can be used to substitute the policy improvement step in a standard successive approximation method to obtain an e-optimal value function. Next, an iterative discretization procedure is developed which uses a small number of states to find new supports and improve the value function between two policy improvement steps. Since only a finite number of states are chosen in this process, some techniques developed for finite MDP can be applied here. Finally, we prove that the policy improvement step in iterative discretization procedure can be replaced by the approximation version of linear support algorithm. The last part of the thesis deals with problems with continuous signals. We first show that if the signal processes are uniformly distributed, then the problem can be reformulated as a problem with finite number of signals. Then the result is extended to where the signal processes are step functions. Since step functions can be easily used to approximate most of the probability distributions, this method can be used to approximate most of the problems with continuous signals. Finally, we present some conditions which guarantee that the linear support can be computed for any given state, then the methods developed for finite signal cases can be easily modified and applied to problems for which the conditions hold.<br>Business, Sauder School of<br>Graduate
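For readers unfamiliar with the "supports" mentioned above: the value function of a finite POMDP is piecewise linear and convex in the belief, i.e. a maximum over a set of linear functions (alpha-vectors). The fragment below only illustrates belief updating and evaluation of such a representation under my own interface assumptions; it is not one of the algorithms developed in the thesis.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Bayes filter for a finite POMDP.

    b : (S,) current belief;  T : (A,S,S) transition probs;  Z : (A,S,O) observation probs.
    """
    pred = b @ T[a]                 # predicted state distribution after action a
    new_b = pred * Z[a][:, o]       # weight by likelihood of observation o
    return new_b / new_b.sum()

def value_of_belief(b, alpha_vectors):
    """Piecewise-linear convex value: maximum over the linear supports (alpha-vectors)."""
    return max(float(alpha @ b) for alpha in alpha_vectors)
```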
APA, Harvard, Vancouver, ISO, and other styles
29

Mundt, André Philipp. "Dynamic risk management with Markov decision processes." Karlsruhe Univ.-Verl. Karlsruhe, 2007. http://d-nb.info/987216511/04.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Mundt, André Philipp. "Dynamic risk management with Markov decision processes." Karlsruhe, Baden : Universitätsverl. Karlsruhe, 2008. http://www.uvka.de/univerlag/volltexte/2008/294/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Paduraru, Cosmin. "Off-policy evaluation in Markov decision processes." Thesis, McGill University, 2013. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=117008.

Full text
Abstract:
This dissertation is set in the context of a widely used framework for formalizing autonomous decision-making, namely Markov decision processes (MDPs). One of the key problems that arise in MDPs is that of evaluating a decision-making strategy, typically called a policy. It is often the case that data collected under the policy one wishes to evaluate is difficult or even impossible to obtain. In this case, data collected under some other policy needs to be used, a setting known as off-policy evaluation. The main goal of this dissertation is to offer new insights into the properties of methods for off-policy evaluation. This is achieved through a series of novel theoretical results and empirical illustrations. The first set of results concerns the bandit setting (single state, single decision step MDPs). In this basic setting, the bias and variance of various off-policy estimators can be computed in closed form without resorting to approximations. We also compare the bias-variance trade-offs for the different estimators, both theoretically and empirically. In the sequential setting (more than one decision step), a comparative empirical study of different off-policy estimators for MDPs with discrete state and action spaces is conducted. The methods compared include three existing estimators, and two new ones proposed in this dissertation. All of these estimators are shown to be consistent and asymptotically normal. The empirical study illustrates how the relative behaviour of the estimators is affected by changes in problem parameters. The analysis for discrete MDPs is completed by recursive bias and variance formulas for the commonly used model-based estimator. These are the first analytic formulas for finite-horizon MDPs, and are shown to produce more accurate results than bootstrap estimates. The final contribution consists of introducing a new framework for bounding the return of a policy. The framework can be used whenever bounds on the next state and reward are available, regardless of whether the state and action spaces are discrete or continuous. If the next-state bounds are computed by assuming Lipschitz continuity of the transition function and using a batch of sampled transitions, then our framework can lead to tighter bounds than those proposed in previous work. Throughout this dissertation, the empirical performance of the estimators being studied is illustrated on several computational sustainability problems: a model of food-related greenhouse gas emissions, a mallard population dynamics model, and a fishery management domain.<br>Cette thèse se situe dans le contexte d'un cadre largement utilisé pour formaliser les méchanismes autonomes de décision, à savoir les processus de décision markoviens (MDP). L'un des principaux problèmes qui se posent dans les MDP est celui de l'évaluation d'une stratégie de prise de décision, généralement appelée une politique. C'est souvent le cas qu'obtenir des données recueillies dans le cadre de la politique qu'on souhaite évaluer est difficile, ou même impossible. Dans ce cas, des données recueillies sous une autre politique doivent être utilisées, une situation appelée "évaluation hors-politique". L'objectif principal de cette thèse est de proposer un nouvel éclairage sur les propriétés des méthodes pour l'évaluation hors-politique. Ce résultat est obtenu grâce à une série de nouveaux résultats théoriques et illustrations empiriques. 
La première série de résultats concerne des problèmes de type bandit (des MDP avec un seul état et une seule étape de décision). Dans cette configuration, le biais et la variance de divers estimateurs hors-politique peuvent être calculés sous forme fermée sans avoir recours à des approximations. Nous comparons également le compromis biais-variance pour les différents estimateurs, du point de vue théorique et empirique. Dans le cadre séquentiel (plus d'une étape de décision), une étude empirique comparative des différents estimateurs hors-politique pour les MDP avec des états et des actions discrètes est menée. Les méthodes comparées sont trois estimateurs existants, ainsi que deux nouveaux proposés dans cette thèse. Tous ces estimateurs se sont avérés convergents et asymptotiquement normaux. L'étude empirique montre comment le comportement relatif des estimateurs est affecté par des changements aux paramètres du problème. L'analyse des MDP discrets est complétée par des formules récursives pour le biais et la variance pour l'estimateur basé sur le modèle. Ce sont les premières formules analytiques pour les MDP à horizon fini, et on montre qu'ils produisent des résultats plus précis que les estimations "bootstrap".La contribution finale consiste à introduire un nouveau cadre pour délimiter le retour d'une politique. Le cadre peut être utilisé chaque fois que des bornes sur le prochain état et la récompense sont disponibles, indépendamment du fait que les espaces d'état et d'action soient discrètes ou continues. Si les limites du prochain état sont calculées en supposant la continuité Lipschitz de la fonction de transition et en utilisant un échantillon de transitions, notre cadre peut conduire à des bornes plus strictes que celles qui sont proposées dans des travaux antérieurs.Tout au long de cette thèse, la performance empirique des estimateurs étudiés est illustrée sur plusieurs problèmes de durabilité: un modèle de calcul des émissions de gaz à effet de serre associées à la consommation de nourriture, un modèle dynamique de la population des mallards, et un domaine de gestion de la pêche.
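As background for the estimators being compared, a minimal sketch of the classic per-trajectory importance-sampling estimator is given below (my own illustration, not one of the new estimators proposed in the dissertation).

```python
import numpy as np

def importance_sampling_return(trajectories, pi_eval, pi_behavior, gamma=1.0):
    """Ordinary per-trajectory importance sampling estimate of the target policy's return.

    Each trajectory is a list of (state, action, reward) tuples collected under
    pi_behavior; pi_eval[s, a] and pi_behavior[s, a] are action probabilities.
    """
    estimates = []
    for traj in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for s, a, r in traj:
            weight *= pi_eval[s, a] / pi_behavior[s, a]   # likelihood ratio
            ret += discount * r
            discount *= gamma
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```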
APA, Harvard, Vancouver, ISO, and other styles
32

Dai, Peng. "FASTER DYNAMIC PROGRAMMING FOR MARKOV DECISION PROCESSES." UKnowledge, 2007. http://uknowledge.uky.edu/gradschool_theses/428.

Full text
Abstract:
Markov decision processes (MDPs) are a general framework used by Artificial Intelligence (AI) researchers to model decision theoretic planning problems. Solving real world MDPs has been a major and challenging research topic in the AI literature. This paper discusses two main groups of approaches in solving MDPs. The first group of approaches combines the strategies of heuristic search and dynamic programming to expedite the convergence process. The second makes use of graphical structures in MDPs to decrease the effort of classic dynamic programming algorithms. Two new algorithms proposed by the author, MBLAO* and TVI, are described here.
APA, Harvard, Vancouver, ISO, and other styles
33

Black, Mary. "Applying Markov decision processes in asset management." Thesis, University of Salford, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.400817.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Marbach, Peter 1966. "Simulation-based optimization of Markov decision processes." Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/9660.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.<br>Includes bibliographical references (p. 127-129).<br>Markov decision processes have been a popular paradigm for sequential decision making under uncertainty. Dynamic programming provides a framework for studying such problems, as well as for devising algorithms to compute an optimal control policy. Dynamic programming methods rely on a suitably defined value function that has to be computed for every state in the state space. However, many interesting problems involve very large state spaces ("curse of dimensionality"), which prohibits the application of dynamic programming. In addition, dynamic programming assumes the availability of an exact model, in the form of transition probabilities ("curse of modeling"). In many situations, such a model is not available and one must resort to simulation or experimentation with an actual system. For all of these reasons, dynamic programming in its pure form may be inapplicable. In this thesis we study an approach for overcoming these difficulties where we use (a) compact (parametric) representations of the control policy, thus avoiding the curse of dimensionality, and (b) simulation to estimate quantities of interest, thus avoiding model-based computations and the curse of modeling. Furthermore, our approach is not limited to Markov decision processes, but applies to general Markov reward processes for which the transition probabilities and the one-stage rewards depend on a tunable parameter vector θ. We propose gradient-type algorithms for updating θ based on the simulation of a single sample path, so as to improve a given performance measure. As possible performance measures, we consider the weighted reward-to-go and the average reward. The corresponding algorithms (a) can be implemented online and update the parameter vector either at visits to a certain state or at every time step, and (b) have the property that the gradient (with respect to θ) of the performance measure converges to 0 with probability 1. This is the strongest possible result for gradient-related stochastic approximation algorithms.<br>by Peter Marbach.<br>Ph.D.
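A minimal single-sample-path, gradient-type update of the kind described above can be sketched for a tabular softmax policy as follows; this is a generic REINFORCE-style illustration under my own assumptions (episodic setting, simulator provided by the caller), not the specific algorithms of the thesis.

```python
import numpy as np

def softmax_policy(theta, s):
    prefs = theta[s]                          # action preferences for state s
    p = np.exp(prefs - prefs.max())
    return p / p.sum()

def reinforce_episode(theta, env_step, s0, horizon, alpha=0.05, gamma=0.99, rng=None):
    """One episodic gradient update of a tabular softmax policy from a single sample path.

    env_step(s, a, rng) -> (next_state, reward) is assumed to simulate the system.
    theta has shape (n_states, n_actions).
    """
    rng = rng or np.random.default_rng()
    s, path = s0, []
    for _ in range(horizon):
        p = softmax_policy(theta, s)
        a = rng.choice(len(p), p=p)
        s_next, r = env_step(s, a, rng)
        path.append((s, a, r))
        s = s_next
    G = 0.0
    for s, a, r in reversed(path):            # backward pass accumulates reward-to-go
        G = r + gamma * G
        grad_log = -softmax_policy(theta, s)  # d log pi(a|s) / d theta[s, :]
        grad_log[a] += 1.0
        theta[s] += alpha * G * grad_log      # stochastic gradient ascent on the return
    return theta
```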
APA, Harvard, Vancouver, ISO, and other styles
35

Winder, Lee F. (Lee Francis) 1973. "Hazard avoidance alerting with Markov decision processes." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/28860.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2004.<br>Includes bibliographical references (p. 123-125).<br>This thesis describes an approach to designing hazard avoidance alerting systems based on a Markov decision process (MDP) model of the alerting process, and shows its benefits over standard design methods. One benefit of the MDP method is that it accounts for future decision opportunities when choosing whether or not to alert, or in determining resolution guidance. Another benefit is that it provides a means of modeling uncertain state information, such as unmeasurable mode variables, so that decisions are more informed. A mode variable is an index for distinct types of behavior that a system exhibits at different times. For example, in many situations normal system behavior tends to be safe, but rare deviations from the normal increase the likelihood of a harmful incident. Accurate modeling of mode information is needed to minimize alerting system errors such as unnecessary or late alerts. The benefits of the method are illustrated with two alerting scenarios where a pair of aircraft must avoid collisions when passing one another. The first scenario has a fully observable state and the second includes an uncertain mode describing whether an intruder aircraft levels off safely above the evader or is in a hazardous blunder mode. In MDP theory, outcome preferences are described in terms of utilities of different state trajectories. In keeping with this, alerting system requirements are stated in the form of a reward function. This is then used with probabilistic dynamic and sensor models to compute an alerting logic (policy) that maximizes expected utility. Performance comparisons are made between the MDP-based logics and alternate logics generated with current methods. It is found that in terms of traditional performance measures (incident rate and unnecessary alert rate), the MDP-based logic can meet or exceed that of alternate logics.<br>by Lee F. Winder.<br>Ph.D.
APA, Harvard, Vancouver, ISO, and other styles
36

Junyent, Barbany Miquel. "Width-Based Planning and Learning." Doctoral thesis, Universitat Pompeu Fabra, 2021. http://hdl.handle.net/10803/672779.

Full text
Abstract:
Optimal sequential decision making is a fundamental problem to many diverse fields. In recent years, Reinforcement Learning (RL) methods have experienced unprecedented success, largely enabled by the use of deep learning models, reaching human-level performance in several domains, such as the Atari video games or the ancient game of Go. In contrast to the RL approach in which the agent learns a policy from environment interaction samples, ignoring the structure of the problem, the planning approach for decision making assumes known models for the agent's goals and domain dynamics, and focuses on determining how the agent should behave to achieve its objectives. Current planners are able to solve problem instances involving huge state spaces by precisely exploiting the problem structure that is defined in the state-action model. In this work we combine the two approaches, leveraging fast and compact policies from learning methods and the capacity to perform lookaheads in combinatorial problems from planning methods. In particular, we focus on a family of planners called width-based planners, that has demonstrated great success in recent years due to its ability to scale independently of the size of the state space. The basic algorithm, Iterated Width (IW), was originally proposed for classical planning problems, where a model for state transitions and goals, represented by sets of atoms, is fully determined. Nevertheless, width-based planners do not require a fully defined model of the environment, and can be used with simulators. For instance, they have been recently applied in pixel domains such as the Atari games. Despite its success, IW is purely exploratory, and does not leverage past reward information. Furthermore, it requires the state to be factored into features that need to be pre-defined for the particular task. Moreover, running the algorithm with a width larger than 1 in practice is usually computationally intractable, prohibiting IW from solving higher width problems. We begin this dissertation by studying the complexity of width-based methods when the state space is defined by multivalued features, as in the RL setting, instead of Boolean atoms. We provide a tight upper bound on the amount of nodes expanded by IW, as well as overall algorithmic complexity results. In order to deal with more challenging problems (i.e., those with a width higher than 1), we present a hierarchical algorithm that plans at two levels of abstraction. A high-level planner uses abstract features that are incrementally discovered from low-level pruning decisions. We illustrate this algorithm in classical planning PDDL domains as well as in pixel-based simulator domains. In classical planning, we show how IW(1) at two levels of abstraction can solve problems of width 2. To leverage past reward information, we extend width-based planning by incorporating an explicit policy in the action selection mechanism. Our method, called π-IW, interleaves width-based planning and policy learning using the state-actions visited by the planner. The policy estimate takes the form of a neural network and is in turn used to guide the planning step, thus reinforcing promising paths. Notably, the representation learned by the neural network can be used as a feature space for the width-based planner without degrading its performance, thus removing the requirement of pre-defined features for the planner. 
We compare π-IW with previous width-based methods and with AlphaZero, a method that also interleaves planning and learning, in simple environments, and show that π-IW has superior performance. We also show that the π-IW algorithm outperforms previous width-based methods in the pixel setting of the Atari games suite. Finally, we show that the proposed hierarchical IW can be seamlessly integrated with our policy learning scheme, resulting in an algorithm that outperforms flat IW-based planners in Atari games with sparse rewards.

Sequential optimal decision making is a fundamental problem in many fields. In recent years, reinforcement learning (RL) methods have seen unprecedented success, largely thanks to the use of deep learning models, reaching human-level performance in several domains, such as Atari video games or the ancient game of Go. In contrast to the RL approach, where the agent learns a policy from samples of interaction with the environment, ignoring the structure of the problem, the planning approach assumes known models for the agent's goals and the domain dynamics, and focuses on determining how the agent should behave in order to achieve its goals. Current planners are able to solve problems involving large state spaces precisely by exploiting the structure of the problem, defined in the state-action model. In this work we combine the two approaches, leveraging the fast and compact policies of learning methods and the capacity of planning methods to perform search in combinatorial problems. In particular, we focus on a family of width-based planners, which have shown great success in recent years because their scalability is independent of the size of the state space. The basic algorithm, Iterated Width (IW), was originally proposed for classical planning problems, where the state-transition model and the goals are fully determined and represented by sets of atoms. However, width-based planners do not require a fully defined model of the environment and can be used with simulators. For example, they have recently been applied to graphical domains such as the Atari games. Despite their success, IW is a purely exploratory algorithm and does not exploit information from previous rewards. Moreover, it requires the state to be factored into features, which must be predefined for the task at hand. Furthermore, running the algorithm with a width larger than 1 is usually computationally intractable in practice, which prevents IW from solving problems of higher width. We begin this thesis by studying the complexity of width-based methods when the state space is defined by multivalued features, as in RL problems, instead of Boolean atoms. We provide a tighter upper bound on the number of nodes expanded by IW, as well as general algorithmic complexity results. To deal with more complex problems (that is, those with width larger than 1), we present a hierarchical algorithm that plans at two levels of abstraction. The high-level planner uses abstract features that are discovered progressively from pruning decisions in the low-level tree. We illustrate this algorithm in classical planning PDDL domains, as well as in graphical simulator domains. In classical planning, we show how IW(1) at two levels of abstraction can solve problems of width 2. To exploit information from past rewards, we incorporate an explicit policy into the action-selection mechanism. Our method, called π-IW, interleaves width-based planning and policy learning using the actions visited by the planner. We represent the policy with a neural network that, in turn, is used to guide the planning, thus reinforcing promising paths. Moreover, the representation learned by the neural network can be used as features for the planner without degrading its performance, thereby removing the requirement of using predefined features. We compare π-IW with previous width-based methods and with AlphaZero, a method that also interleaves planning and learning, and show that π-IW has superior performance in simple environments. We also show that the π-IW algorithm outperforms other width-based methods on the Atari games. Finally, we show that the proposed hierarchical IW can be easily integrated with our policy-learning scheme, resulting in an algorithm that outperforms non-hierarchical IW-based planners in Atari games with sparse rewards.
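The width-1 novelty test that gives IW(1) its name can be stated very compactly. The sketch below is an illustrative reconstruction, not code from the thesis, and assumes states are exposed as tuples of feature values: a state is expanded only if it exhibits at least one (feature, value) pair not yet seen during the search.

```python
def make_novelty_check():
    """IW(1)-style novelty test: a state passes if it contains at least one
    (feature index, value) pair never seen in any previously accepted state."""
    seen = set()

    def is_novel(state):
        # state is any sequence of feature values, e.g. a tuple of ints
        new_pairs = {(i, v) for i, v in enumerate(state)} - seen
        if not new_pairs:
            return False          # prune: nothing new at width 1
        seen.update(new_pairs)    # remember the newly observed pairs
        return True

    return is_novel

# Usage: novel = make_novelty_check(); expand only states for which novel(s) is True.
```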
APA, Harvard, Vancouver, ISO, and other styles
37

Zang, Peng. "Scaling solutions to Markov Decision Problems." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42906.

Full text
Abstract:
The Markov Decision Problem (MDP) is a widely applied mathematical model useful for describing a wide array of real world decision problems ranging from navigation to scheduling to robotics. Existing methods for solving MDPs scale poorly when applied to large domains where there are many components and factors to consider. In this dissertation, I study the use of non-tabular representations and human input as scaling techniques. I will show that the joint approach has desirable optimality and convergence guarantees, and demonstrates several orders of magnitude speedup over conventional tabular methods. Empirical studies of speedup were performed using several domains including a clone of the classic video game, Super Mario Bros. In the course of this work, I will address several issues including: how approximate representations can be used without losing convergence and optimality properties, how human input can be solicited to maximize speedup and user engagement, and how that input should be used so as to insulate against possible errors.
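As a rough illustration of where a non-tabular representation enters the picture (and only that; the dissertation's own algorithms are not reproduced here), the sketch below replaces the tabular value table with a linear function of state features and refits it by least squares after each Bellman backup. The arrays `P`, `R` and the feature matrix `Phi` are hypothetical placeholders, and this naive projection does not by itself preserve the convergence and optimality guarantees the abstract refers to.

```python
import numpy as np

def fitted_value_iteration(P, R, Phi, gamma=0.95, iters=200):
    """Approximate value iteration with a linear value function V(s) ~ Phi[s] @ w.
    P: (A, S, S) transition probabilities, R: (A, S) expected rewards,
    Phi: (S, k) state features."""
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        V = Phi @ w                              # current approximate values
        Q = R + gamma * (P @ V)                  # (A, S) Bellman backups
        target = Q.max(axis=0)                   # greedy backup per state
        w, *_ = np.linalg.lstsq(Phi, target, rcond=None)  # project onto feature space
    return w

# A greedy policy can then be read off as argmax_a of R[a] + gamma * P[a] @ (Phi @ w).
```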
APA, Harvard, Vancouver, ISO, and other styles
38

LEAL, Cynthia Feitosa. "Processo markoviano de decisão para alocação dinâmica de recursos e controle de admissão de conexão em redes IEEE 802.16." Universidade Federal do Pará, 2010. http://repositorio.ufpa.br/jspui/handle/2011/2625.

Full text
Abstract:
Made available in DSpace on 2012-04-16T15:41:10Z (GMT). No. of bitstreams: 2 Dissertacao_ProcessoMarkovianoDecisao.pdf: 6066727 bytes, checksum: ac5813e00484fc9c63fc48a18aa5b735 (MD5) license_rdf: 23898 bytes, checksum: e363e809996cf46ada20da1accfcd9c7 (MD5) Previous issue date: 2010
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
This work presents a solution to the problem of connection admission control and dynamic resource allocation in IEEE 802.16 networks, modeled as a Markov Decision Process (MDP) that uses the concept of bandwidth degradation, which is based on the differentiated bandwidth requirements of the IEEE 802.16 service classes. In the performance criterion of the MDP, a different reward is assigned to each service class, so that each flow receives differentiated treatment. On this basis it is possible to evaluate the optimal policy, obtained through a value iteration algorithm, with respect to aspects such as the average degradation level of the service classes, resource utilization, and the blocking probability of each service class as a function of the system load. The results obtained show that the proposed Markov control method is able to prioritize the service classes considered most relevant to the system.
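The abstract names value iteration as the solver; the generic routine below is a minimal sketch of that step, not the dissertation's model, and the inputs `P` and `R` are hypothetical placeholders.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Generic value iteration for a finite MDP.
    P: (A, S, S) transition probabilities, R: (A, S) rewards, already weighted
    to reflect the relative priority of each service class."""
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)                 # (A, S) action values
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)      # optimal values and greedy policy
        V = V_new
```

In an admission-control setting such as the one described, a state would encode how much bandwidth each service class currently holds, the actions would include accepting, rejecting or degrading a connection, and giving higher-priority classes larger entries in `R` is what makes the resulting policy favor them.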
APA, Harvard, Vancouver, ISO, and other styles
39

Aberdeen, Douglas Alexander. "Policy-gradient algorithms for partially observable Markov decision processes /." View thesis entry in Australian Digital Theses Program, 2003. http://thesis.anu.edu.au/public/adt-ANU20030410.111006/index.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Ferns, Norman Francis. "State-similarity metrics for continuous Markov decision processes." Thesis, McGill University, 2007. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=103383.

Full text
Abstract:
In recent years, various metrics have been developed for measuring the similarity of states in probabilistic transition systems (Desharnais et al., 1999; van Breugel & Worrell, 2001a). In the context of Markov decision processes, we have devised metrics providing a robust quantitative analogue of bisimulation. Most importantly, the metric distances can be used to bound the differences in the optimal value function that is integral to reinforcement learning (Ferns et al. 2004; 2005). More recently, we have discovered an efficient algorithm to calculate distances in the case of finite systems (Ferns et al., 2006). In this thesis, we seek to properly extend state-similarity metrics to Markov decision processes with continuous state spaces both in theory and in practice. In particular, we provide the first distance-estimation scheme for metrics based on bisimulation for continuous probabilistic transition systems. Our work, based on statistical sampling and infinite dimensional linear programming, is a crucial first step in real-world planning; many practical problems are continuous in nature, e.g. robot navigation, and often a parametric model or crude finite approximation does not suffice. State-similarity metrics allow us to reason about the quality of replacing one model with another. In practice, they can be used directly to aggregate states.
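For reference, the bisimulation-style metrics referred to here are usually defined as the least fixed point of an operator of the following form (notation and constants may differ slightly from the thesis):

```latex
% Bisimulation metric in the style of Ferns et al. (2004); c_R, c_T are weights,
% and W_1(d) is the Kantorovich (Wasserstein-1) distance induced by d.
d(s,t) = \max_{a \in A} \Big( c_R \,\bigl| R(s,a) - R(t,a) \bigr|
        \;+\; c_T \, W_1(d)\bigl( P(\cdot \mid s,a),\, P(\cdot \mid t,a) \bigr) \Big)
```

The bound on optimal values mentioned in the abstract then takes the form c_R |V*(s) - V*(t)| <= d(s,t), for suitable choices of the constants, which is what licenses aggregating or substituting states that are close under d.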
APA, Harvard, Vancouver, ISO, and other styles
41

Allard, David. "A metric for structured red Markov decision processes /." Thesis, McGill University, 2004. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=81261.

Full text
Abstract:
Solution methods for MDPs employing approximation allow for more acceptable computation time in domains with large state spaces. In this thesis we discuss the use of a metric based on the notion of bisimulation for solving MDPs described in a factored form. We propose an algorithm for performing state aggregation using this metric, and illustrate its performance on a standard benchmark domain.
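A minimal sketch of the kind of metric-based aggregation described above (illustrative only; the thesis's algorithm for factored MDPs is more involved): greedily group states whose distance to a cluster representative is below a threshold `eps`. The function name and inputs are invented for the example.

```python
def aggregate_states(dist, eps):
    """Greedy epsilon-aggregation: dist is a symmetric |S| x |S| matrix of
    state-similarity distances; returns a list mapping each state to a cluster id."""
    n = len(dist)
    cluster = [-1] * n
    representatives = []            # one representative state per cluster
    for s in range(n):
        for c, rep in enumerate(representatives):
            if dist[s][rep] <= eps:
                cluster[s] = c      # close enough to an existing representative
                break
        else:
            cluster[s] = len(representatives)
            representatives.append(s)
    return cluster
```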
APA, Harvard, Vancouver, ISO, and other styles
42

Jaulmes, Robin. "Active learning in partially observable Markov decision processes." Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=98733.

Full text
Abstract:
People are efficient when they make decisions under uncertainty, even when their decisions have long-term ramifications, or when their knowledge and their perception of the environment are uncertain. We are able to experiment with the environment and learn, improving our behavior as experience is gathered. Most of the problems we face in real life are of this kind, as are most of the problems an automated agent would face in robotics.
Our goal is to build Artificial Intelligence algorithms able to reproduce the reasoning of humans for these complex problems. We use the Reinforcement Learning framework, which makes it possible to learn optimal behaviors in dynamic environments. More precisely, we adapt Partially Observable Markov Decision Processes (POMDPs) to environments that are partially known.
We take inspiration from the field of Active Learning: we assume the existence of an oracle that can, during a short learning phase, provide the agent with additional information about its environment. The agent actively learns everything that is useful in the environment, with minimal use of the oracle.
After reviewing existing methods for solving learning problems in partially observable environments, we present a theoretical active-learning setup. We propose an algorithm, MEDUSA, and give theoretical and empirical evidence of its performance.
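As a loose illustration of the active-learning setup described above (and not the MEDUSA algorithm itself), the sketch below keeps a Dirichlet posterior over the transition probabilities of a finite model and updates it only when the oracle reveals the states involved in a transition; the class and variable names are invented for the example.

```python
import numpy as np

class DirichletTransitionModel:
    """Keeps a Dirichlet posterior over P(s' | s, a) for a finite model."""
    def __init__(self, n_states, n_actions, prior=1.0):
        self.counts = np.full((n_actions, n_states, n_states), prior)

    def update(self, s, a, s_next):
        # Called only when the oracle reveals the true states of a transition.
        self.counts[a, s, s_next] += 1.0

    def mean(self):
        # Posterior mean transition probabilities.
        return self.counts / self.counts.sum(axis=2, keepdims=True)

    def sample(self, rng):
        # One plausible model drawn from the posterior.
        return np.array([[rng.dirichlet(row) for row in act] for act in self.counts])
```

An agent in this spirit would sample a few models from the posterior, act according to them, and spend oracle queries only while the sampled models still disagree noticeably.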
APA, Harvard, Vancouver, ISO, and other styles
43

Srisuma, Sorawoot. "Essays on semiparametric estimation of Markov decision processes." Thesis, London School of Economics and Political Science (University of London), 2010. http://etheses.lse.ac.uk/2371/.

Full text
Abstract:
Dynamic models of forward-looking agents, whose goal is to maximize expected intertemporal payoffs, are useful modelling frameworks in economics. With the exception of a small class of dynamic decision processes, the estimation of the primitives in these models is computationally burdensome due to the presence of value functions that have no closed form. We follow a popular two-step approach which estimates the functions of interest rather than using direct numerical approximation. The first chapter, joint with Oliver Linton, considers a class of dynamic discrete choice models that contain observable continuously distributed state variables. Most papers on the estimation of dynamic discrete choice models assume that the observable state variables can only take finitely many values. We show that the extension to the infinite dimensional case leads to a well-posed inverse problem. We derive the distribution theory for the finite and the infinite dimensional parameters. Dynamic models with continuous choice can sometimes avoid the numerical issues related to the value function through the use of Euler's equation. The second chapter considers models with continuous choice that do not necessarily belong to the Euler class but frequently arise in applied problems. In this chapter, a class of minimum distance estimators is proposed, and their distribution theory, along with that of the infinite dimensional parameters of the decision models, is derived. The third chapter demonstrates how the methodology developed for the discrete and continuous choice problems can be adapted to estimate a variety of other dynamic models. The final chapter discusses an important problem, and provides an example, where some well-known estimation procedures in the literature may fail to consistently estimate an identified model. The estimation methodologies I propose in the preceding chapters may not suffer from problems of this kind.
APA, Harvard, Vancouver, ISO, and other styles
44

Vuong, Hon Fai 1975. "Learning bounded optimal behavior using Markov decision processes." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/42173.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2007.
Includes bibliographical references (p. 171-175).
Creating agents that behave rationally in the real world is one goal of Artificial Intelligence. A rational agent is one that takes, at each point in time, the optimal action such that its expected utility is maximized. However, to determine the optimal action the agent may need to engage in lengthy deliberations or computations. The effect of computation is generally not explicitly considered when performing deliberations. In reality, spending too much time in deliberation may yield high-quality plans that do not satisfy the natural timing constraints of a problem, making them effectively useless. Enforcing shortened deliberation times may yield timely plans, but these may be of diminished utility. These two cases suggest the possibility of optimizing an agent's deliberation process. This thesis proposes a framework for generating metalevel controllers that select computational actions to perform by optimally trading off their benefit against their costs. The metalevel optimization problem is posed within a Markov Decision Process framework and is solved off-line to determine a policy for carrying out computations. Once the optimal policy is determined, it serves efficiently as an online metalevel controller that selects computational actions conditioned upon the current state of computation. Solving for the exact policy of the metalevel optimization problem becomes computationally intractable as problem size grows. A learning approach that takes advantage of the problem structure is proposed to generate approximate policies that are shown to perform relatively well in comparison to optimal policies. Metalevel policies are generated for two types of problem scenarios, distinguished by the representation of the cost of computation. In the first case, the cost of computation is explicitly defined as part of the problem description. In the second case, it is implicit in the timing constraints of the problem. Results are presented to validate the beneficial effects of metalevel planning over traditional methods when the cost of computation has a significant effect on the utility of a plan.
by Hon Fai Vuong.
Ph.D.
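As a toy rendering of the metalevel idea described above (not the thesis's formulation), "keep computing versus act now" can itself be posed as a small MDP whose state tracks elapsed time and current plan quality, solved offline by backward induction. All the numbers and variable names below are invented for the example.

```python
import numpy as np

# Toy metalevel MDP: state = (time step t, plan quality q), actions = {compute, act}.
# 'compute' may raise quality by one level but costs a time step; 'act' ends the
# episode with utility equal to current quality minus a penalty for elapsed time.
T, Q = 10, 5                      # horizon and number of quality levels (invented)
p_improve = 0.6                   # chance a computation step improves the plan
deadline_penalty = 0.2            # utility lost per elapsed time step

V = np.zeros((T + 1, Q))          # V[t, q], with V[T, q] = 0 (deadline reached)
policy = np.zeros((T, Q), dtype=int)
for t in reversed(range(T)):
    for q in range(Q):
        act_now = q - deadline_penalty * t
        keep_computing = (p_improve * V[t + 1, min(q + 1, Q - 1)]
                          + (1 - p_improve) * V[t + 1, q])
        V[t, q] = max(act_now, keep_computing)
        policy[t, q] = int(keep_computing > act_now)   # 1 = compute, 0 = act
```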
APA, Harvard, Vancouver, ISO, and other styles
45

Smart, David P. (David Paul). "Tensor decomposition and parallelization of Markov Decision Processes." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/105018.

Full text
Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Computation for Design and Optimization Program, 2016.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 85-81).
Markov Decision Processes (MDPs) with large state spaces arise frequently in real-world applications. Optimal solutions to such problems exist, but may not be computationally tractable, as the required processing scales exponentially with the number of states. Unsurprisingly, investigating methods for efficiently determining optimal or near-optimal policies has generated substantial interest and remains an active area of research. A recent paper introduced an MDP representation as a tensor composition of a set of smaller component MDPs, and suggested a method for solving an MDP by decomposing it into its tensor components, solving the smaller problems in parallel, and combining their solutions into one for the original problem. Such an approach promises an increase in solution efficiency, since each smaller problem could be solved exponentially faster than the original. This thesis develops this MDP tensor decomposition and parallelization algorithm, and analyzes both its computational performance and the optimality of its resultant solutions.
by David P. Smart.
S.M.
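A bare-bones sketch of the parallelization pattern the abstract describes: each component MDP is solved independently in its own process. The `components` list and `solve` callable are assumed placeholders (any single-MDP solver, such as value iteration, would do), and the step that combines the component solutions back into a solution of the original MDP is specific to the tensor construction and is not reproduced here.

```python
from concurrent.futures import ProcessPoolExecutor

def solve_components(components, solve):
    """Solve each component MDP in parallel.
    components: list of (P, R) arrays for the smaller MDPs obtained from the
    decomposition; solve: a picklable single-MDP solver, e.g. value iteration."""
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(solve, P, R) for P, R in components]
        return [f.result() for f in futures]
```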
APA, Harvard, Vancouver, ISO, and other styles
46

Archibald, Thomas Welsh. "Parallel iterative solution methods for Markov decision processes." Thesis, University of Edinburgh, 1992. http://hdl.handle.net/1842/14792.

Full text
Abstract:
Markov decision processes form an important class of dynamic programming problems because they are widely applicable. However, solving real applications of Markov decision processes on serial computers is often impractical due to constraints on memory and processing time. Parallel processing has long been considered a potential solution to the computational intractability of these problems on serial machines, but prior to this work no detailed theoretical or practical studies in this area had been carried out. This thesis examines several successful serial iterative solution methods for infinite-horizon, time-invariant, discounted Markov decision processes and develops efficient analogous parallel algorithms. Particular consideration is given to the two classes of iterative solution methods known as value iteration methods and reward revision, but the techniques developed and the conclusions drawn are applicable to other iterative methods for Markov decision processes (for example policy iteration methods) and also to iterative methods in general. Iterative methods are applied to many other problem areas including dynamic programming and the solution of linear and differential equations. The main thrust of this thesis is concerned with the optimisation of the performance of the parallel algorithms developed. A detailed analysis of the implementation of several parallel iterative solution methods on a distributed-memory, multiple-instruction, multiple-data parallel processor reveals the key issues involved in optimising performance. Timing models are developed for processor communication time, processor calculation time and overall run time. These models guide the choice of the connection topology, the communication protocols and the degree of overlapping of communication and calculation. This leads to the development of a phased pipeline algorithm which yields 60-fold speed-ups when a ring of 121 transputers is used to solve problems with 60,000 states and sparse transition structures.
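Parallel value iteration of this kind rests on the observation that, in a synchronous (Jacobi-style) sweep, the backup of each state depends only on the value vector from the previous iteration, so the state set can be partitioned across processors. The sketch below shows the partitioned sweep sequentially; on a parallel machine each block would sit on a different processor and the updated blocks would be exchanged at the end of every iteration. It is an illustration only, not the phased pipeline algorithm developed in the thesis, and `P`, `R` and `blocks` are assumed inputs.

```python
import numpy as np

def partitioned_value_iteration(P, R, blocks, gamma=0.9, iters=100):
    """Synchronous value iteration over a partition of the state set.
    P: (A, S, S), R: (A, S); blocks: list of index arrays partitioning range(S)."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        V_new = np.empty_like(V)
        for block in blocks:                        # each block is independent given V
            Q_block = R[:, block] + gamma * (P[:, block, :] @ V)
            V_new[block] = Q_block.max(axis=0)
        V = V_new                                   # "exchange" the updated blocks
    return V
```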
APA, Harvard, Vancouver, ISO, and other styles
47

Madani, Omid. "Complexity results for infinite-horizon Markov decision processes /." Thesis, Connect to this title online; UW restricted, 2000. http://hdl.handle.net/1773/7016.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Yu, Huizhen Ph D. Massachusetts Institute of Technology. "Approximate solution methods for partially observable Markov and semi-Markov decision processes." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/35299.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (p. 165-169).
We consider approximation methods for discrete-time infinite-horizon partially observable Markov and semi-Markov decision processes (POMDP and POSMDP). One of the main contributions of this thesis is a lower cost approximation method for finite-space POMDPs with the average cost criterion, and its extensions to semi-Markov partially observable problems and constrained POMDP problems, as well as to problems with the undiscounted total cost criterion. Our method is an extension of several lower cost approximation schemes, proposed individually by various authors, for discounted POMDP problems. We introduce a unified framework for viewing all of these schemes together with some new ones. In particular, we establish that due to the special structure of hidden states in a POMDP, there is a class of approximating processes, which are either POMDPs or belief MDPs, that provide lower bounds to the optimal cost function of the original POMDP problem. Theoretically, POMDPs with the long-run average cost criterion are still not fully understood. The major difficulties relate to the structure of the optimal solutions, such as conditions for a constant optimal cost function, the existence of solutions to the optimality equations, and the existence of optimal policies that are stationary and deterministic. Thus, our lower bound result is useful not only in providing a computational method, but also in characterizing the optimal solution. We show that regardless of these theoretical difficulties, lower bounds of the optimal liminf average cost function can be computed efficiently by solving modified problems using multichain MDP algorithms, and the approximating cost functions can also be used to obtain suboptimal stationary control policies. We prove the asymptotic convergence of the lower bounds under certain assumptions. For semi-Markov problems and total cost problems, we show that the same method can be applied for computing lower bounds of the optimal cost function. For constrained average cost POMDPs, we show that lower bounds of the constrained optimal cost function can be computed by solving finite-dimensional LPs. We also consider reinforcement learning methods for POMDPs and MDPs. We propose an actor-critic type policy gradient algorithm that uses a structured policy known as a finite-state controller, thus providing an alternative to the earlier actor-only algorithm GPOMDP. Our work also clarifies the relationship between the reinforcement learning methods for POMDPs and those for MDPs. For average cost MDPs, we provide a convergence and convergence rate analysis for a least squares temporal difference (TD) algorithm, called LSPE, previously proposed for discounted problems. We use this algorithm in the critic portion of the policy gradient algorithm for POMDPs with finite-state controllers. Finally, we investigate the properties of the limsup and liminf average cost functions of various types of policies. We show various convexity and concavity properties of these cost functions, and we give a new necessary condition for the optimal liminf average cost to be constant. Based on this condition, we prove the near-optimality of the class of finite-state controllers under the assumption of a constant optimal liminf average cost. This result provides a theoretical guarantee for the finite-state controller approach.
by Huizhen Yu.
Ph.D.
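The belief-MDP view that underlies the lower-bound constructions above turns a POMDP into an MDP over probability distributions; for reference, the standard belief update is shown below as a generic textbook computation, not code from the thesis. Array names and shapes are assumptions of the sketch.

```python
import numpy as np

def belief_update(b, a, o, P, O):
    """Bayes update of a POMDP belief.
    b: (S,) current belief, a: action index, o: observation index,
    P: (A, S, S) transition probabilities, O: (A, S, Z) observation probabilities
    with O[a, s_next, o] = Pr(o | s_next, a)."""
    predicted = b @ P[a]                  # (S,) predicted distribution over next states
    unnormalized = predicted * O[a, :, o]
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / total
```

Iterating this update is what produces the continuous-state belief MDP on which the lower-bound approximations and finite-state controllers operate.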
APA, Harvard, Vancouver, ISO, and other styles
49

Lusena, Christopher. "Finite Memory Policies for Partially Observable Markov Decision Proesses." UKnowledge, 2001. http://uknowledge.uky.edu/gradschool_diss/323.

Full text
Abstract:
This dissertation makes contributions to areas of research on planning with POMDPs: complexity theoretic results and heuristic techniques. The most important contributions are probably the complexity of approximating the optimal history-dependent finite-horizon policy for a POMDP, and the idea of heuristic search over the space of FFTs.
APA, Harvard, Vancouver, ISO, and other styles
50

Olafsson, Björgvin. "Partially Observable Markov Decision Processes for Faster Object Recognition." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-198632.

Full text
Abstract:
Object recognition in the real world is a major challenge in the field of computer vision. Given the potentially enormous size of the search space, it is essential to make intelligent decisions about where in the visual field to gather information, so as to reduce the computational resources needed. In this report, a POMDP (Partially Observable Markov Decision Process) learning framework, using a policy gradient method and information rewards as a training signal, has been implemented and used to train fixation policies that aim to maximize the information gathered by each fixation. The purpose of such policies is to make object recognition faster by reducing the number of fixations needed. The trained policies are evaluated in simulation by comparing them with several fixed policies. Finally, it is shown that the framework can be used to train policies that outperform the fixed policies for certain observation models.
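The "information rewards" used as a training signal above are typically some measure of uncertainty reduction in the agent's belief; a common choice, shown here purely as an illustration (the report's exact reward definition may differ), is the drop in belief entropy caused by a fixation.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete belief, guarded against log(0)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p + eps))

def information_reward(belief_before, belief_after):
    """Reward a fixation by how much it reduced uncertainty about the object class."""
    return entropy(belief_before) - entropy(belief_after)
```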
APA, Harvard, Vancouver, ISO, and other styles