Dissertations / Theses on the topic 'Visual recognition system'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Visual recognition system.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Campbell, Larry W. "An intelligent tutor system for visual aircraft recognition." Thesis, Monterey, California: Naval Postgraduate School, 1990. http://hdl.handle.net/10945/27723.
Full text
Visual aircraft recognition (VACR) is a critical skill for U.S. Army Short Range Air Defense (SHORAD) soldiers. It is the most reliable means of identifying aircraft; however, VACR skills are not easy to teach or learn, and once learned they are highly degradable. The numerous training aids that exist to help units train soldiers require qualified instructors, who are not always available. Also, the varying degrees of proficiency among soldiers make group training less than ideal. In an attempt to alleviate the problems in most VACR training programs, an intelligent tutor system has been developed to teach VACR in accordance with the Wings, Engine, Fuselage, Tail (WEFT) cognitive model. The Aircraft Recognition Tutor is a graphics-based, object-oriented instructional program that teaches, reviews, and tests VACR skills at a level appropriate to the student. The tutor adaptively coaches the student from the novice level, through the intermediate level, to the expert level. The tutor was provided to two U.S. Army Air Defense Battalions for testing and evaluation. The six-month implementation, testing, and evaluation process demonstrated that, using existing technology in Computer Science and Artificial Intelligence, useful training tools could be developed quickly and inexpensively for deployment on existing computers in the field.
Dong, Junda. "Designing a Visual Front End in Audio-Visual Automatic Speech Recognition System." DigitalCommons@CalPoly, 2015. https://digitalcommons.calpoly.edu/theses/1382.
Full text
Wojnowski, Christine. "Reasoning with visual knowledge in an object recognition system." Online version of thesis, 1990. http://hdl.handle.net/1850/10596.
Full text
Sun, Yongbin Ph D. Massachusetts Institute of Technology. "An RFID-based visual recognition system for the retail industry." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/104277.
Full text
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 63-67).
In this thesis, I aim to build an accurate fine-grained retail product recognition system for improving the customer in-store shopping experience. To achieve high accuracy, I developed a two-phase visual recognition scheme to identify the viewed retail product by verifying different types of visual features. The proposed scheme is robust enough to distinguish visually similar products in the tests. However, the computation cost of this scheme increases as the database scale becomes larger, since it needs to verify all the products in the database. To improve computation efficiency, my system integrates RFID as a second data source. With an RFID tag attached to each product, the RFID reader is able to capture the identity information of surrounding products. The detection results can help reduce the verification scope from the whole database to the detected products only. Hence, computation cost is saved. In the experiments, I first tested the recognition accuracy of my visual recognition scheme on a database containing visually similar products for different viewing angles, and my scheme achieved over 97.92% recognition accuracy for horizontal viewpoint variations of less than 30 degrees. I then experimentally measured the computation cost of both the original system and the RFID-enhanced system. The computation cost is the processing time to recognize a target product. The RFID-enhanced system speeds up system performance dramatically when the scale of detected surrounding products is small.
by Yongbin Sun.
S.M.
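The RFID-filtering idea described in this abstract can be sketched in a few lines: visual verification is restricted to the products whose tags were detected nearby, falling back to the full database when no tags are seen. The function and score interface below are hypothetical illustrations, not the thesis's actual two-phase matcher.

```python
def recognize(query_feats, database, detected_tags, score):
    """Verify the query image only against products whose RFID tags were
    detected nearby; fall back to the whole database if none were."""
    candidates = {pid: f for pid, f in database.items() if pid in detected_tags}
    if not candidates:  # no tags detected: search everything
        candidates = database
    # return the product whose stored features best match the query
    return max(candidates, key=lambda pid: score(query_feats, candidates[pid]))
```

The computational saving comes purely from shrinking the candidate set before the (expensive) visual verification step.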
Koprnicky, Miroslav. "Towards a Versatile System for the Visual Recognition of Surface Defects." Thesis, University of Waterloo, 2005. http://hdl.handle.net/10012/888.
Full text
This thesis proposes a framework for generalizing and automating the design of the defect classification stage of an automated visual inspection system. It involves using an expandable set of features which are optimized along with the classifier operating on them in order to adapt to the application at hand. The particular implementation explored involves optimizing the feature set in disjoint sets logically grouped by feature type to keep search spaces reasonable. Operator input is kept to a minimum throughout this customization process, since it is limited only to those cases in which the existing feature library cannot adequately delineate the classes at hand, at which time new features (or pools) may have to be introduced by an engineer with experience in the domain.
Two novel methods are put forward which fit well within this framework: cluster-space and hybrid-space classifiers. They are compared in a series of tests against both standard benchmark classifiers, as well as mean and majority vote multi-classifiers, on feature sets comprised of just the logical feature subsets, as well as the entire feature sets formed by their union. The proposed classifiers as well as the benchmarks are optimized with both a progressive combinatorial approach and with a genetic algorithm. Experimentation was performed on true colour industrial lumber defect images, as well as binary hand-written digits.
Based on the experiments conducted in this work, it was found that the sequentially optimized multi hybrid-space methods are capable of matching the performances of the benchmark classifiers on the lumber data, with the exception of the mean-rule multi-classifiers, which dominated most experiments by approximately 3% in classification accuracy. The genetic-algorithm-optimized hybrid-space multi-classifier achieved the best performance, however: an accuracy of 79.2%.
The numeral dataset results were less promising; the proposed methods could not equal benchmark performance. This is probably because the numeral feature sets were much more conducive to good class separation, with standard benchmark accuracies approaching 95% not uncommon. This indicates that the cluster-space transform inherent to the proposed methods appears to be most useful in highly dependent or confusing feature spaces, a hypothesis supported by the outstanding performance of the single hybrid-space classifier in the difficult texture feature subspace: 42.6% accuracy, a 6% increase over the best benchmark performance.
The generalized framework proposed appears promising, because classifier performance over feature sets formed by the union of independently optimized feature subsets regularly met and exceeded those classifiers operating on feature sets formed by the optimization of the feature set in its entirety. This finding corroborates earlier work with similar results [3, 9], and is an aspect of pattern recognition that should be examined further.
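The majority-vote multi-classifier baseline mentioned in this abstract combines one classifier per logical feature subset and takes the most frequent prediction. A minimal sketch of that combination rule (hypothetical interfaces, not Koprnicky's implementation):

```python
from collections import Counter

def majority_vote(classifiers, feature_subsets):
    """Each classifier sees only its own feature subset (e.g. colour,
    texture, shape); the final label is the most frequent prediction."""
    votes = [clf(feats) for clf, feats in zip(classifiers, feature_subsets)]
    return Counter(votes).most_common(1)[0][0]
```

A mean-rule combiner would instead average per-class scores before taking the argmax; both keep each subset's classifier independent, which is what allows the subsets to be optimized separately.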
Sjöholm, Alexander. "Closing the Loop : Mobile Visual Location Recognition." Thesis, Linköpings universitet, Datorseende, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-112547.
Full text
Su, Ying-fung. "Role of temporal texture in visual system: exploration with computer simulations." Click to view the E-thesis via HKUTO, 2010. http://sunzi.lib.hku.hk/hkuto/record/B43703768.
Full text
Kaplan, Bernhard. "Modeling prediction and pattern recognition in the early visual and olfactory systems." Doctoral thesis, KTH, Beräkningsbiologi, CB, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-166127.
Full text
QC 20150504
Adjei-Kumi, Theophilus. "The development of an intelligent system for visual simulation of construction projects." Thesis, University of Strathclyde, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.311845.
Full text
Su, Ying-fung, and 蘇盈峰. "Role of temporal texture in visual system: exploration with computer simulations." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2010. http://hub.hku.hk/bib/B43703768.
Full text
Isik, Leyla. "The dynamics of invariant object and action recognition in the human visual system." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/98000.
Full text
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 123-138).
Humans can quickly and effortlessly recognize objects, people, and their actions from complex visual inputs. Despite the ease with which the human brain solves this problem, the underlying computational steps have remained enigmatic. What makes object and action recognition challenging are identity-preserving transformations that alter the visual appearance of objects and actions, such as changes in scale, position, and viewpoint. The majority of visual neuroscience studies examining visual recognition either use physiology recordings, which provide high spatiotemporal resolution data with limited brain coverage, or functional MRI, which provides high spatial resolution data from across the brain with limited temporal resolution. High temporal resolution data from across the brain is needed to break down and understand the computational steps underlying invariant visual recognition. In this thesis I use magnetoencephalography, machine learning, and computational modeling to study invariant visual recognition. I show that a temporal association learning rule for learning invariance in hierarchical visual systems is very robust to manipulations and visual disruptions that happen during development (Chapter 2). I next show that object recognition occurs very quickly, with invariance to size and position developing in stages beginning around 100 ms after stimulus onset (Chapter 3), and that action recognition occurs on a similarly fast time scale, 200 ms after video onset, with this early representation being invariant to changes in actor and viewpoint (Chapter 4). Finally, I show that the same hierarchical feedforward model can explain both the object and action recognition timing results, putting this timing data in the broader context of computer vision systems and models of the brain.
This work sheds light on the computational mechanisms underlying invariant object and action recognition in the brain and demonstrates the importance of using high temporal resolution data to understand neural computations.
by Leyla Isik.
Ph. D.
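The decoding logic behind invariance claims like those in this abstract can be illustrated simply: train a classifier on neural response patterns to stimuli at one size or position, test it at another, and take above-chance accuracy as evidence of an invariant representation. Below is a toy nearest-centroid sketch of that train/test split, not the thesis's actual MEG analysis pipeline.

```python
def nearest_centroid_fit(X, y):
    """Average the response patterns of each class into a centroid."""
    centroids = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def nearest_centroid_predict(centroids, x):
    """Assign a held-out response pattern to the closest class centroid."""
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda label: dist(centroids[label], x))
```

Running this at each time point of the recording yields the kind of decoding time course from which onset latencies (e.g. ~100 ms for objects) are read off.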
Stone, Thomas Jonathan. "Mechanisms of place recognition and path integration based on the insect visual system." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28909.
Full text
Amundberg, Joel, and Martin Moberg. "System Agnostic GUI Testing : Analysis of Augmented Image Recognition Testing." Thesis, Blekinge Tekniska Högskola, Institutionen för programvaruteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-21441.
Full text
Eccles, John. "Stochastic relaxation labelling of visual features in a multi-sensor sensory system for robotic assembly." Thesis, Heriot-Watt University, 1994. http://hdl.handle.net/10399/1370.
Full text
Wallenberg, Marcus. "Components of Embodied Visual Object Recognition : Object Perception and Learning on a Robotic Platform." Licentiate thesis, Linköpings universitet, Datorseende, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-93812.
Full text
Embodied Visual Object Recognition
Evans, Benjamin D. "Learning transformation-invariant visual representations in spiking neural networks." Thesis, University of Oxford, 2012. https://ora.ox.ac.uk/objects/uuid:15bdf771-de28-400e-a1a7-82228c7f01e4.
Full text
Rezazadegan Tavakoli H. (Hamed). "Visual saliency and eye movement: modeling and applications." Doctoral thesis, Oulun yliopisto, 2014. http://urn.fi/urn:isbn:9789526205816.
Full text
Abstract: Humans can instantly direct their gaze to the essential parts of a scene, which requires the visual system to process enormous amounts of information. Like a human, a computer should be able to handle a correspondingly large amount of visual information. Implementing such a mechanism in computer vision calls for methods that can filter out redundant information. For this reason, saliency has recently become a popular research topic in computing, and especially in the computer vision community, even though it has long been studied in the cognitive sciences. The popularity of saliency methods in computer vision stems mainly from their computational efficiency, which in turn enables their use in many computer vision applications such as image and video compression, object recognition, tracking, etc. This doctoral thesis studies the modeling of visual saliency, meaning the transformation of an image into a saliency map such that the computed saliency corresponds to statistics derived from human eye movements. The thesis examines how image and video processing can be used to develop saliency methods for the needs of computer vision. For example, a saliency measure for images that exploits sparse sampling and kernel density estimation is presented. The role of eye movements in saliency modeling is also investigated; to this end, a particle-filter-based approach to saccade generation is presented, which can be coupled with a saliency model. In addition, eye movements and saliency are exploited in several applications.
The scientific contributions of the research include several proposed saliency models for image and video stimuli, an approach to the computational modeling and generation of eye movements as part of a saliency model, and a study of the applicability of saliency models and eye movements to visual tracking, background subtraction, scene analysis, and valence recognition.
North, Ben. "Learning dynamical models for visual tracking." Thesis, University of Oxford, 1998. http://ora.ox.ac.uk/objects/uuid:6ed12552-4c30-4d80-88ef-7245be2d8fb8.
Full text
Tromans, James Matthew. "Computational neuroscience of natural scene processing in the ventral visual pathway." Thesis, University of Oxford, 2012. http://ora.ox.ac.uk/objects/uuid:b82e1332-df7b-41db-9612-879c7a7dda39.
Full text
Zukauskis, Ronald L. "Tachistoscopic recognition of vertical and horizontal letter symmetry in response to the contralateral organization of the human nervous system." Virtual Press, 2001. http://liblink.bsu.edu/uhtbin/catkey/1221268.
Full text
Department of Educational Psychology
Teichmann, Michael. "A plastic multilayer network of the early visual system inspired by the neocortical circuit." Universitätsverlag der Technischen Universität Chemnitz, 2018. https://monarch.qucosa.de/id/qucosa%3A31832.
Full text
The human visual system has the outstanding ability to recognize objects invariantly. A better understanding of how it works can lead to better computer systems for image understanding and could, beyond that, improve our understanding of the principles underlying our intelligence. This work presents a model of the visual areas V1 and V2 that integrates a complex connectivity structure inspired by the circuits of the neocortex. It combines the three most important cortical plasticities: 1) Hebbian synaptic plasticity, to learn the strengths of the excitatory and inhibitory synapses, which also includes trace learning for learning invariant representations. 2) Intrinsic plasticity, to regulate the response behavior of the neurons and thereby stabilize learning in deeper layers. 3) Structural plasticity, to modify the connections and thereby reduce the influence of the initial wiring on the learning outcome. Among other results, it is shown that the model's neurons learn receptive fields comparable to those of neurons in the visual cortex. The model's performance on invariant object recognition is likewise verified. Furthermore, the relationship of weight strength and connection probability to the correlation of the neurons' activities is demonstrated. The connection probabilities found for the inhibitory neurons are related to the functioning of inhibitory plasticity, which explains why inhibitory connections appear unspecific. The presented model is more detailed than previous work. It makes it possible to reproduce neuroscientific findings while also delivering the main achievement of the visual system: invariant object recognition.
Moreover, its level of detail and its self-organization principles enable further neuroscientific insights and the modeling of more complex models of processing in the brain.
Beuth, Frederik. "Visual attention in primates and for machines - neuronal mechanisms." Universitätsverlag Chemnitz, 2017. https://monarch.qucosa.de/id/qucosa%3A35655.
Full text
Visual attention is an important cognitive concept for humans' daily life, yet it is still not completely understood, so that fundamentally understanding the phenomenon remains a long-standing goal of neuroscience. At the same time, because of this lack of understanding, it is only rarely used in machine vision systems in computer science. Understanding visual attention is, however, a complex challenge, since attention has extremely diverse and seemingly distinct aspects. It alters both neuronal firing rates and human behavior in multiple ways. It is therefore very difficult to find a unified explanation of visual attention that holds equally for all aspects. To address this problem, this work aims to identify a common set of neuronal mechanisms that underlie both the neuronal and the behavioral aspects. The mechanisms are simulated in neuro-computational models, resulting in a single modeling framework that can, for the first time, explain many and highly diverse phenomena of visual attention at once. The aspects chosen in this dissertation are multiple neurophysiological effects, real-world object localization, and a visual masking paradigm (object substitution masking, OSM). In each of these fields, the state of the art is improved at the same time, in order to better understand that subarea of attention itself. The three chosen areas show that the approach can explain fundamental neurophysiological, functional, and behavioral properties of visual attention. Since the identified mechanisms are thus sufficient to explain the phenomenon so comprehensively, they might even constitute the essential neuronal substrate of visual attention in the cortex.
For computer science, the work thus provides a deeper understanding of visual attention. Beyond that, the framework with its neuronal mechanisms even supplies a reference implementation for integrating attention into future systems. According to the present research, attention could be very useful for such systems, since in the brain it provides a task-specific optimization of the visual system. This aspect of human perception is mostly missing from current, powerful computer vision systems, so integrating it should boost their performance substantially and define a new class of systems.
Contents: 1. General introduction. 2. The state-of-the-art in modeling visual attention. 3. Microcircuit model of attention. 4. Object localization with a model of visual attention. 5. Object substitution masking. 6. General conclusion.
Lindqvist, Zebh. "Design Principles for Visual Object Recognition Systems." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-80769.
Full text
Freytag, Alexander [Verfasser]. "Lifelong Learning for Visual Recognition Systems / Alexander Freytag." München : Verlag Dr. Hut, 2017. http://d-nb.info/1126297100/34.
Full text
Rao, Ram Raghavendra. "Audio-visual interaction in multimedia." Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/13349.
Full text
Rabi, Gihad. "Visual speech recognition by recurrent neural networks." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape16/PQDD_0010/MQ36169.pdf.
Full text
Erhard, Matthew John. "Visual intent recognition in a multiple camera environment." Online version of thesis, 2006. http://hdl.handle.net/1850/3365.
Full text
Barb, Adrian S. "Knowledge representation and exchange of visual patterns using semantic abstractions." Diss., Columbia, Mo. : University of Missouri-Columbia, 2008. http://hdl.handle.net/10355/6674.
Full text
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on July 21, 2009). Includes bibliographical references.
Khan, Rizwan Ahmed. "Détection des émotions à partir de vidéos dans un environnement non contrôlé." Thesis, Lyon 1, 2013. http://www.theses.fr/2013LYO10227/document.
Full text
Communication in any form, i.e. verbal or non-verbal, is vital to completing various daily routine tasks and plays a significant role in life. Facial expression is the most effective form of non-verbal communication, and it provides a clue about emotional state, mindset, and intention. Generally, an automatic facial expression recognition framework consists of three steps: face tracking, feature extraction, and expression classification. In order to build a robust facial expression recognition framework that is capable of producing reliable results, it is necessary to extract features (from the appropriate facial regions) that have strong discriminative abilities. Recently, different methods for automatic facial expression recognition have been proposed, but invariably they are all computationally expensive, spending computational time on the whole face image or dividing the facial image based on some mathematical or geometrical heuristic for feature extraction. None of them takes inspiration from the human visual system in completing the same task. In this research thesis we took inspiration from the human visual system in order to find from where (which facial region) to extract features. We argue that the task of expression analysis and recognition could be done in a more conducive manner if only some regions are selected for further processing (i.e. salient regions), as happens in the human visual system. In this research thesis we have proposed different frameworks for automatic recognition of expressions, all drawing inspiration from human vision. Each subsequently proposed framework addresses the shortcomings of the previously proposed one. Our proposed frameworks, in general, achieve results that exceed state-of-the-art methods for expression recognition. Secondly, they are computationally efficient and simple, as they process only the perceptually salient region(s) of the face for feature extraction.
By processing only the perceptually salient region(s) of the face, a reduction in feature vector dimensionality and in computational time for feature extraction is achieved, making the frameworks suitable for real-time applications.
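The salient-region idea from this abstract, extracting features only from perceptually important facial regions rather than the whole face, can be sketched as follows. The region coordinates and the feature extractor are hypothetical placeholders, not the thesis's actual regions or descriptors.

```python
def expression_features(face_img, salient_regions, extract):
    """Concatenate features computed only from salient regions
    (e.g. eyes and mouth) instead of the whole face image."""
    feats = []
    for top, bottom, left, right in salient_regions:
        # crop one salient region out of the 2-D image (list of rows)
        crop = [row[left:right] for row in face_img[top:bottom]]
        feats.extend(extract(crop))
    return feats
```

The dimensionality reduction mentioned above falls out directly: the feature vector length is proportional to the area of the salient crops, not of the full face.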
Athukorala, Aravinda S. "A strategy for the visual recognition of objects in an industrial environment." Thesis, University of Edinburgh, 1985. http://hdl.handle.net/1842/4860.
Full text
Gobin, Paméla. "Propagation de l’activation entre le lexique orthographique et le système affectif." Thesis, Bordeaux 2, 2011. http://www.theses.fr/2011BOR21821/document.
Full text
The aim of this thesis was to study the activation of the affective system mediated by the orthographic lexicon during visual word recognition. More precisely, we investigated the influence of negative emotional orthographic neighbourhood and the sensitivity of orthographic priming to the negative valence of higher-frequency neighbours in the lexical decision task (LDT) combined with a priming paradigm. The recording of behavioural and electrophysiological (event-related brain potential) measures also provides evidence on the early activation of the affective components of the neighbours. Neutral words (e.g., FUSEAU [spindle], TOISON [fleece]) with one higher-frequency neighbour, which was either neutral (e.g., museau [muzzle]) or negative (e.g., poison), were presented in the LDT. They were preceded either by their neighbour or by a non-alphabetic control prime, presented for 66 or 166 ms. Firstly, the emotional state of participants was controlled (Experiments 1-4). Secondly, it was manipulated a priori by a sad mood induction (Experiments 5 and 7) or determined a posteriori by considering the burnout level of participants (Experiments 7-8). The processing of negative or neutral frequent words was also examined (Experiment 6). The results showed an inhibitory effect of negative emotional orthographic neighbourhood on target recognition time and an inhibitory effect of orthographic priming, increased by prime duration. Three components (P150, N200, and N400) were the electrophysiological correlates of the orthographic priming effect, also depending on the negative valence of higher-frequency neighbours and prime duration. Finally, the emotional state of individuals modified the orthographic priming effect. The results are interpreted within an Interactive Activation model extended to affective processing.
Bridges, Seth. "Low-power visual pattern classification in analog VLSI." Thesis, Connect to this title online; UW restricted, 2006. http://hdl.handle.net/1773/6984.
Full text
Tivive, Fok Hing Chi. "A new class of convolutional neural networks based on shunting inhibition with applications to visual pattern recognition." Access electronically, 2006. http://www.library.uow.edu.au/adt-NWU/public/adt-NWU20061025.164437/index.html.
Full text
Jafari, Moghadamfard Ramtin, and Saeid Payvar. "The Potential of Visual Features : to Improve Voice Recognition Systems in Vehicles Noisy Environment." Thesis, Högskolan i Halmstad, Sektionen för Informationsvetenskap, Data– och Elektroteknik (IDE), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-27273.
Full text
Li, Jun. "Image texture decomposition and application in food quality analysis." free to MU campus, to others for purchase, 2001. http://wwwlib.umi.com/cr/mo/fullcit?p3036842.
Full text
Penatti, Otávio Augusto Bizetto 1984. "Image and video representations based on visual dictionaries = Representações de imagens e vídeos baseadas em dicionários visuais." [s.n.], 2012. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275667.
Full text
Doctoral thesis - Universidade Estadual de Campinas, Instituto de Computação
Abstract: Effectively encoding visual properties from multimedia content is challenging. One popular approach to deal with this challenge is the visual dictionary model. In this model, images are handled as an unordered set of local features being represented by the so-called bag-of-(visual-) words vector. In this thesis, we work on three research problems related to the visual dictionary model. The first research problem is concerned with the generalization power of dictionaries, which is related to the ability of representing well images from one dataset even using a dictionary created over other dataset, or using a dictionary created on small dataset samples. We perform experiments in closed datasets, as well as in a Web environment. Obtained results suggest that diverse samples in terms of appearances are enough to generate a good dictionary. The second research problem is related to the importance of the spatial information of visual words in the image space, which could be crucial to distinguish types of objects and scenes. The traditional pooling methods usually discard the spatial configuration of visual words in the image. We have proposed a pooling method, named Word Spatial Arrangement (WSA), which encodes the relative position of visual words in the image, having the advantage of generating more compact feature vectors than most of the existing spatial pooling strategies. Experiments for image retrieval show that WSA outperforms the most popular spatial pooling method, the Spatial Pyramids. The third research problem under investigation in this thesis is related to the lack of semantic information in the visual dictionary model. We show that the problem of having no semantics in the space of low-level descriptions is reduced when we move to the bag-of-words representation. However, even in the bag-of-words space, we show that there is little separability between distance distributions of different semantic concepts. 
Therefore, we consider moving one step further and propose a representation based on visual words that carry more semantics, in line with human visual perception. We have proposed a bag-of-prototypes model, in which the prototypes are the elements carrying more semantics. This approach goes in the direction of reducing the so-called semantic gap problem. We propose a dictionary based on scenes, which is used for video representation in video geocoding experiments. Video geocoding is the task of assigning a geographic location to a given video. The evaluation was performed in the context of the Placing Task of the MediaEval challenge, and the proposed bag-of-scenes model has shown promising performance.
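As background for the abstract above, the bag-of-(visual-)words vector it discusses can be sketched in a few lines: each local descriptor is assigned to its nearest visual word, and the image is represented by the normalized histogram of word counts. The toy vocabulary and descriptors below are illustrative, not from the thesis.

```python
import numpy as np

def bag_of_words(descriptors, vocabulary):
    """Quantize local descriptors against a visual vocabulary and
    return the L1-normalized word histogram (the bag-of-words vector)."""
    # squared Euclidean distance from every descriptor to every visual word
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)  # hard assignment: index of the nearest word
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

# Toy data: six 2-D "descriptors" and a vocabulary of three "visual words".
vocab = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
desc = np.array([[0.1, 0.0], [0.9, 1.1], [5.2, 4.8],
                 [0.0, 0.2], [1.1, 0.9], [4.9, 5.1]])
print(bag_of_words(desc, vocab))  # each word is seen twice -> [1/3, 1/3, 1/3]
```

In a real system the vocabulary would be learned by clustering (e.g. k-means over SIFT-like descriptors); spatial pooling schemes such as the WSA method described above additionally record where in the image each word occurs.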
Doctorate
Computer Science
Doctor in Computer Science
Ma, Xiren. "Deep Learning-Based Vehicle Recognition Schemes for Intelligent Transportation Systems." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42247.
Kozlovski, Nikolai. "Text-Image Restoration and Text Alignment for Multi-Engine Optical Character Recognition Systems." Master's thesis, University of Central Florida, 2006. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3607.
M.S.E.E.
Department of Electrical and Computer Engineering
Engineering and Computer Science
Electrical Engineering
Ndiour, Ibrahima Jacques. "Dynamic curve estimation for visual tracking." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/37283.
Osorio, Fernando Santos. "Um estudo sobre reconhecimento visual de caracteres através de redes neurais." Biblioteca Digital de Teses e Dissertações da UFRGS, 1991. http://hdl.handle.net/10183/24184.
This work presents a study of visual character recognition using neural networks. It describes aspects of Digital Image Processing, character recognition systems, and neural networks. An implementation proposal for an OCR system for printed character recognition is also presented. This system uses a neural network developed specifically for this purpose. The OCR system, named N2OCR, has a prototype implementation, which is also described. Several topics in Digital Image Processing are presented, covering image acquisition, image processing, and pattern recognition. Aspects of image acquisition are treated, such as acquisition equipment and the kinds of image data obtained from it. The following items on text image processing are covered: halftoning, histogram generation and alteration, thresholding, and filtering operations. A brief analysis of pattern recognition related to this theme is provided. Different kinds of character recognition systems are described, as well as the techniques and algorithms they use. In addition, the performance estimation of these OCR systems is discussed, including the description and analysis of typical OCR problems. Neural networks are presented, describing their characteristics, historical aspects, and the evolution of research in this field. Several well-known neural network models are described: Perceptron, Adaline, Madaline, multilevel networks, ART, Hopfield's model, the Boltzmann machine, BAM, and Kohonen's model. From the analysis of these different neural network models, we arrive at a proposal for a new neural network model, covering learning, recognition, and possible extensions of the model. A possible hardware implementation of this model is also presented. A global view of the N2OCR system is given at the end of this work, describing each of its modules, along with a description of the prototype implementation and its functions.
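Among the image processing operations listed in this abstract, thresholding is the step that binarizes a text image before recognition. The thesis does not say which algorithm it uses; the sketch below uses Otsu's method, a standard histogram-based choice, on a tiny synthetic image.

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the binarization threshold that maximizes the between-class
    variance of the grayscale histogram (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                # probability of class 0 per threshold
    mu = np.cumsum(p * np.arange(256))  # cumulative mean intensity
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[np.isnan(sigma_b)] = 0.0    # degenerate thresholds score zero
    return int(np.argmax(sigma_b))

# Toy image: dark "ink" pixels (~20) on a bright background (~200).
img = np.array([[200, 210, 20], [205, 25, 195], [22, 200, 208]], dtype=np.uint8)
t = otsu_threshold(img)
binary = img > t                        # True = background, False = ink
```

The threshold lands between the dark and bright intensity clusters, so the character pixels are cleanly separated from the page.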
Makrushin, Andrey [Verfasser], and Jana [Akademischer Betreuer] Dittmann. "Visual recognition systems in a car passenger compartment with the focus on facial driver identification / Andrey Makrushin. Betreuer: Jana Dittmann." Magdeburg : Universitätsbibliothek, 2014. http://d-nb.info/1054638888/34.
Wåhlén, Herje. "Voice Assisted Visual Search." Thesis, Umeå universitet, Institutionen för informatik, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-38204.
Voice-Assisted Visual Search
Lee, Jehoon. "Statistical and geometric methods for visual tracking with occlusion handling and target reacquisition." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/43582.
Pereira, Joaquim Jose Fantin. "Uma ferramenta de programação visual para previsão e reconhecimento de padrões." [s.n.], 2007. http://repositorio.unicamp.br/jspui/handle/REPOSIP/259920.
Dissertation (Master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação
Abstract: Decision making, in any sector and at many different levels, is an increasingly complex process, mainly due to the level of uncertainty about the future. In this context, the availability of forecasts becomes an important factor for more effective decisions. Pattern recognition tools, in turn, are important in many areas, such as the determination of typical behaviors and in control systems. In this setting, this work explores the creation and use of a visual programming language, called the VisualPREV Language, to ease the conception and execution of forecasting and classification models. In this language, visual blocks placed on a diagram (a computational visual interface) represent the concepts involved in modeling the problem. The model can then be configured, executed, and stored for future access. Although this choice implies losing advantages exclusive to traditional code-based programming, such as greater flexibility for generic programming, the language considerably reduces the time needed to create specific models for data handling in time-series forecasting and pattern recognition. In some applications with relevant data, the language was evaluated with criteria based on usability metrics, and the results are discussed throughout the work.
Abstract: Decision making, in any area and at many different levels, is a process of growing complexity, mainly given the level of uncertainty about the future. In this context, the ability to forecast plays a major role in efficient decisions. Pattern recognition tools, in turn, are important in many areas, such as characterizing typical behaviors, and in control systems as well. In this context, we propose a visual programming language, called the VisualPREV Language, intended to ease the conception and execution of forecasting and pattern recognition models. Within this language, visual blocks placed into a diagram (a computational visual interface) represent the concepts involved when modeling the processes. These models can be configured, executed, and stored for future access. Although this approach implies losing advantages exclusive to traditional programming (such as the flexibility of generic programming), VisualPREV considerably decreases the time needed to create specific models for forecasting and pattern recognition. In a few applications with relevant data, the language was evaluated based on usability metrics, and the results are discussed throughout the text.
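The block-diagram idea behind a tool like VisualPREV can be illustrated with a minimal pipeline runner: each visual block wraps one processing step, and executing the diagram chains the blocks in order. All block names and data below are illustrative, not taken from the dissertation.

```python
# A sketch of a block-diagram pipeline: a "diagram" is an ordered list
# of blocks, and running it feeds each block's output to the next one.

def normalize(xs):
    """Block: rescale a series to the [0, 1] range."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def moving_average(window):
    """Block factory: a configurable smoothing block, as a visual block
    with a 'window' parameter might be configured in the diagram."""
    def block(xs):
        return [sum(xs[i:i + window]) / window
                for i in range(len(xs) - window + 1)]
    return block

def run_diagram(blocks, data):
    """Execute the diagram: apply each block to the previous output."""
    for block in blocks:
        data = block(data)
    return data

series = [2.0, 4.0, 6.0, 8.0, 10.0]
out = run_diagram([normalize, moving_average(2)], series)
print(out)  # -> [0.125, 0.375, 0.625, 0.875]
```

A visual front end would let the user place and wire such blocks graphically; the execution model underneath reduces to this kind of chained application.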
Master's
Electrical Energy
Master in Electrical Engineering
Blanco, Myra. "Relationship Between Driver Characteristics, Nighttime Driving Risk Perception, and Visual Performance under Adverse and Clear Weather Conditions and Different Vision Enhancement Systems." Diss., Virginia Tech, 2002. http://hdl.handle.net/10919/27806.
Ph. D.
Kenklies, Kai Malte. "Instructing workers through a head-worn Augmented Reality display and through a stationary screen on manual industrial assembly tasks : A comparison study." Thesis, Umeå universitet, Institutionen för informatik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-172888.
Lam, Benny, and Jakob Nilsson. "Creating Good User Experience in a Hand-Gesture-Based Augmented Reality Game." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-156878.
Hill, Evelyn June. "Applying statistical and syntactic pattern recognition techniques to the detection of fish in digital images." University of Western Australia. School of Mathematics and Statistics, 2004. http://theses.library.uwa.edu.au/adt-WU2004.0070.
Hernández-Vela, Antonio. "From pixels to gestures: learning visual representations for human analysis in color and depth data sequences." Doctoral thesis, Universitat de Barcelona, 2015. http://hdl.handle.net/10803/292488.
Visual analysis of humans in images is an important research topic, given its relevance to a large number of computer vision applications, such as pedestrian detection, monitoring and surveillance, human-computer interaction, e-health, or content-based image retrieval systems, among others. In this thesis we want to learn different visual representations of the human body that are useful for the visual analysis of humans in images and videos. To that end, we analyze different image modalities, namely RGB color images and depth images, and address the problem at different levels of abstraction, from pixels to gestures: human segmentation, human pose estimation, and gesture recognition. First, we show how binary segmentation (object vs. background) of the human body in image sequences helps to remove noise belonging to the background of the scene. The presented method, based on graph-cut optimization, enforces spatio-temporal consistency on the segmentation masks obtained in consecutive frames. Second, we present a framework for multi-class segmentation, with which we can obtain a more detailed description of the human body: instead of a simple binary representation separating the body from the background, we can obtain finer segmentation masks, separating and categorizing the different body parts. At a higher level of abstraction, we aim to obtain simpler yet sufficiently descriptive representations of the human body. Human pose estimation methods are often based on skeletal models of the human body, formed by segments (or rectangles) representing the body limbs, connected to one another following the kinematic constraints of the human body.
In practice, these skeletal models must satisfy certain constraints so that inference methods can find the optimal solution efficiently, but at the same time these constraints greatly limit the expressiveness the models are able to capture. To address this problem, we propose a top-down approach for predicting the positions of the body parts in the skeletal model, introducing a mid-level part representation based on Poselets. Finally, we propose a gesture recognition framework based on bag-of-visual-words. We exploit the advantages of RGB and depth images by combining modality-specific visual vocabularies through late fusion. We propose a new rotation-invariant descriptor for depth images that improves on the state of the art, and we use spatio-temporal pyramids to capture part of the spatial and temporal structure of gestures. In addition, we present a probabilistic reformulation of the Dynamic Time Warping method for gesture recognition in image sequences. More specifically, we model gestures with a Gaussian probabilistic model that implicitly encodes possible deformations in both the spatial and the temporal domains.
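As a reference point for the probabilistic reformulation of Dynamic Time Warping mentioned above, here is the classic DTW recurrence it builds on. This is a sketch of the standard algorithm, not the thesis's variant, which replaces the pointwise cost with a Gaussian log-likelihood; here a plain absolute difference is used.

```python
import numpy as np

def dtw(a, b):
    """Classic Dynamic Time Warping cost between two 1-D sequences,
    using |a_i - b_j| as the pointwise cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw([0, 1, 2, 1, 0], [0, 1, 2, 1, 0]))     # identical sequences -> 0.0
print(dtw([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]))  # time-warped copy -> 0.0
```

The second call shows why DTW suits gesture matching: a gesture performed more slowly (here, a repeated first sample) still aligns at zero cost, whereas a naive pointwise distance would penalize the misalignment.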
Dekhtiar, Jonathan. "Deep Learning and unsupervised learning to automate visual inspection in the manufacturing industry." Thesis, Compiègne, 2019. http://www.theses.fr/2019COMP2513.
Although studied since 1970, automatic visual inspection on production lines still struggles to be applied at large scale and low cost. The methods used depend greatly on the availability of domain experts, which inevitably leads to increased costs and reduced flexibility. Since 2012, advances in the field of Deep Learning have enabled much progress in this direction, particularly thanks to convolutional neural networks, which have achieved near-human performance in many areas associated with visual perception (e.g. object recognition and detection). This thesis proposes an unsupervised approach to meet the needs of automatic visual inspection. The method, called AnoAEGAN, combines adversarial learning and the estimation of a probability density function. These two complementary approaches make it possible to jointly estimate the pixel-by-pixel probability of a visual defect in an image. The model is trained from a very limited number of images (i.e. fewer than 1000) without using expert knowledge to label the data beforehand. The method allows increased flexibility with limited training time and therefore great versatility, demonstrated on ten different tasks without any modification of the model. It should reduce development costs and the time required to deploy in production, and it can also be deployed in a complementary way to a supervised approach in order to benefit from the advantages of each.
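The abstract describes jointly estimating a pixel-by-pixel probability of a defect. A deliberately simplified stand-in for that idea can be sketched with an independent Gaussian density per pixel, fitted on synthetic defect-free images; this is not AnoAEGAN's adversarial model, and all data and names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" training images: a fixed 16-pixel pattern plus small noise,
# standing in for defect-free product images (synthetic, for illustration).
pattern = np.linspace(0.0, 1.0, 16)
train = pattern + 0.01 * rng.standard_normal((500, 16))

# Fit an independent Gaussian per pixel: a toy density model of normality.
mu = train.mean(axis=0)
sigma = train.std(axis=0) + 1e-12

def pixel_log_likelihood(img):
    """Per-pixel Gaussian log-density: very negative where the image is
    unlikely under the model fitted on defect-free data."""
    return -0.5 * ((img - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# Test image: the normal pattern with a localized defect at pixel 5.
defective = pattern.copy()
defective[5] += 0.3
score = -pixel_log_likelihood(defective)  # anomaly score = negative log-density
print(int(score.argmax()))                # the highest score pinpoints the defect
```

Thresholding such a per-pixel score yields a defect mask without any labeled defect examples, which is the appeal of the unsupervised setting the thesis targets.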