
Dissertations / Theses on the topic 'Sign language recognition'

Consult the top 50 dissertations / theses for your research on the topic 'Sign language recognition.'


1

Nel, Warren. "An integrated sign language recognition system." Thesis, University of the Western Cape, 2014. http://hdl.handle.net/11394/3584.

Full text
Abstract:
Doctor Educationis
Research has shown that five parameters are required to recognize any sign language gesture: hand shape, location, orientation and motion, as well as facial expressions. The South African Sign Language (SASL) research group at the University of the Western Cape has created systems to recognize Sign Language gestures using single parameters. Using a single parameter can cause ambiguities in the recognition of signs that are similarly signed resulting in a restriction of the possible vocabulary size. This research pioneers work at the group towards combining multiple parameters to achieve a larger recognition vocabulary set. The proposed methodology combines hand location and hand shape recognition into one combined recognition system. The system is shown to be able to recognize a very large vocabulary of 50 signs at a high average accuracy of 74.1%. This vocabulary size is much larger than existing SASL recognition systems, and achieves a higher accuracy than these systems in spite of the large vocabulary. It is also shown that the system is highly robust to variations in test subjects such as skin colour, gender and body dimension. Furthermore, the group pioneers research towards continuously recognizing signs from a video stream, whereas existing systems recognized a single sign at a time. To this end, a highly accurate continuous gesture segmentation strategy is proposed and shown to be able to accurately recognize sentences consisting of five isolated SASL gestures.
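The abstract above describes combining hand-location and hand-shape recognition into one system. As a rough, hedged illustration of that idea, the sketch below performs a simple late fusion by averaging per-class probabilities from two single-parameter classifiers; the feature sizes, the use of scikit-learn logistic regression and the averaging rule are illustrative assumptions, not the combination scheme actually built in the thesis.

```python
# Late fusion of two single-parameter recognisers (hand shape + hand location).
# Toy data and classifiers; the thesis's actual combination scheme may differ.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n_signs, per_class = 50, 12
y = np.repeat(np.arange(n_signs), per_class)                      # 50-sign vocabulary, 12 samples each
shape_feats = rng.normal(size=(y.size, 32)) + y[:, None] * 0.05   # stand-in hand-shape features
loc_feats = rng.normal(size=(y.size, 2)) + y[:, None] * 0.05      # stand-in hand-location features

shape_clf = LogisticRegression(max_iter=1000).fit(shape_feats, y)
loc_clf = LogisticRegression(max_iter=1000).fit(loc_feats, y)

# Combine the two parameters by averaging their per-class probabilities (late fusion).
probs = (shape_clf.predict_proba(shape_feats) + loc_clf.predict_proba(loc_feats)) / 2
print("fused training accuracy:", (probs.argmax(axis=1) == y).mean())
```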
2

Zafrulla, Zahoor. "Automatic recognition of American sign language classifiers." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/53461.

Full text
Abstract:
Automatically recognizing classifier-based grammatical structures of American Sign Language (ASL) is a challenging problem. Classifiers in ASL utilize surrogate hand shapes for people or "classes" of objects and provide information about their location, movement and appearance. In the past researchers have focused on recognition of finger spelling, isolated signs, facial expressions and interrogative words like WH-questions (e.g. Who, What, Where, and When). Challenging problems such as recognition of ASL sentences and classifier-based grammatical structures remain relatively unexplored in the field of ASL recognition.  One application of recognition of classifiers is toward creating educational games to help young deaf children acquire language skills. Previous work developed CopyCat, an educational ASL game that requires children to engage in a progressively more difficult expressive signing task as they advance through the game.   We have shown that by leveraging context we can use verification, in place of recognition, to boost machine performance for determining if the signed responses in an expressive signing task, like in the CopyCat game, are correct or incorrect. We have demonstrated that the quality of a machine verifier's ability to identify the boundary of the signs can be improved by using a novel two-pass technique that combines signed input in both forward and reverse directions. Additionally, we have shown that we can reduce CopyCat's dependency on custom manufactured hardware by using an off-the-shelf Microsoft Kinect depth camera to achieve similar verification performance. Finally, we show how we can extend our ability to recognize sign language by leveraging depth maps to develop a method using improved hand detection and hand shape classification to recognize selected classifier-based grammatical structures of ASL.
3

Nayak, Sunita. "Representation and learning for sign language recognition." [Tampa, Fla] : University of South Florida, 2008. http://purl.fcla.edu/usf/dc/et/SFE0002362.

Full text
4

Nurena-Jara, Roberto, Cristopher Ramos-Carrion, and Pedro Shiguihara-Juarez. "Data collection of 3D spatial features of gestures from static Peruvian sign language alphabet for sign language recognition." Institute of Electrical and Electronics Engineers Inc, 2020. http://hdl.handle.net/10757/656634.

Full text
Abstract:
The full text of this work is not available in the UPC Academic Repository due to restrictions imposed by the publisher.
Peruvian Sign Language (PSL) recognition is approached as a classification problem. Previous work has employed 2D features from the position of the hands to tackle this problem. In this paper, we propose a method to construct a dataset consisting of 3D spatial positions of static gestures from the PSL alphabet, using the HTC Vive device and a well-known technique to extract 21 keypoints from the hand to obtain a feature vector. A dataset of 35,400 instances of gestures for PSL was constructed and a novel way to extract data was described. To validate the appropriateness of this dataset, a comparison of four baseline classifiers on the Peruvian Sign Language Recognition (PSLR) task was carried out, achieving an average F1 measure of 99.32% in the best case.
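To make the kind of feature vector described above concrete, the sketch below flattens 21 three-dimensional hand keypoints into a 63-dimensional vector and fits one baseline classifier; the array shapes, class count and the choice of scikit-learn's SVC are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: 21 3D hand keypoints -> 63-dim feature vector -> baseline classifier.
# Shapes and the SVC baseline are illustrative assumptions, not the authors' exact setup.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_samples, n_classes = 1000, 24                    # hypothetical static alphabet gestures
keypoints = rng.normal(size=(n_samples, 21, 3))    # stand-in for per-hand 3D keypoints
labels = rng.integers(0, n_classes, size=n_samples)

X = keypoints.reshape(n_samples, -1)               # flatten to a 63-dimensional feature vector
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("macro F1:", f1_score(y_te, clf.predict(X_te), average="macro"))
```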
Peer reviewed
5

Cooper, H. M. "Sign language recognition : generalising to more complex corpora." Thesis, University of Surrey, 2010. http://epubs.surrey.ac.uk/843617/.

Full text
Abstract:
The aim of this thesis is to find new approaches to Sign Language Recognition (SLR) which are suited to working with the limited corpora currently available. Data available for SLR is of limited quality; low resolution and frame rates make the task of recognition even more complex. The content is rarely natural, concentrating on isolated signs and filmed under laboratory conditions. In addition, the amount of accurately labelled data is minimal. To this end, several contributions are made: tracking the hands is eschewed in favour of detection-based techniques more robust to noise; classifiers for both signs and for linguistically-motivated sign sub-units are investigated, to make best use of limited data sets. Finally, an algorithm is proposed to learn signs from the inset signers on TV, with the aid of the accompanying subtitles, thus increasing the corpus of data available. Tracking fast-moving hands under laboratory conditions is a complex task; move this to real-world data and the challenge is even greater. When using tracked data as a base for SLR, the errors in the tracking are compounded at the classification stage. Proposed instead is a novel sign detection method, which views space-time as a 3D volume and the sign within it as an object to be located. Features are combined into strong classifiers using a novel boosting implementation designed to create optimal classifiers over sparse datasets. Using boosted volumetric features on a robust frame-differenced input, average classification rates reach 71% on seen signers and 66% on a mixture of seen and unseen signers, with individual sign classification rates reaching 95%. Using a classifier-per-sign approach to SLR means that data sets need to contain numerous examples of the signs to be learnt. Instead, this thesis proposes learnt classifiers to detect the common sub-units of sign. The responses of these classifiers can then be combined for recognition at the sign level. This approach requires fewer examples per sign to be learnt, since the sub-unit detectors are trained on data from multiple signs. It is also faster at detection time since there are fewer classifiers to consult, the number of these being limited by the linguistics of sign and not the number of signs being detected. For this method, appearance-based boosted classifiers are introduced to distinguish the sub-units of sign. Results show that when combined with temporal models, these novel sub-unit classifiers can outperform similar classifiers learnt on tracked results. As an added side effect, since the sub-units are linguistically derived, they can be used independently to help linguistic annotators. Since sign language data sets are costly to collect and annotate, there are not many publicly available. Those which are tend to be constrained in content and often taken under laboratory conditions. However, in the UK, the British Broadcasting Corporation (BBC) regularly produces programs with an inset signer and corresponding subtitles. This provides a natural signer, covering a wide range of topics, in real-world conditions. While it has no ground truth, it is proposed that the translated subtitles can provide weak labels for learning signs. The final contributions of this thesis lead to an innovative approach to learn signs from these co-occurring streams of data. Using a unique, temporally constrained version of the Apriori mining algorithm, similar sections of video are identified as possible sign locations.
These estimates are improved upon by introducing the concept of contextual negatives, removing contextually similar noise. Combined with an iterative honing process, to enhance the localisation of the target sign, 23 word/sign combinations are learnt from a 30 minute news broadcast, providing a novel method for automatic data set creation.
6

Li, Pei. "Hand shape estimation for South African sign language." Thesis, University of the Western Cape, 2012. http://hdl.handle.net/11394/4374.

Full text
Abstract:
Magister Scientiae - MSc
Hand shape recognition is a pivotal part of any system that attempts to implement Sign Language recognition. This thesis presents a novel system which recognises hand shapes from a single camera view in 2D. By mapping the recognised hand shape from 2D to 3D, it is possible to obtain 3D co-ordinates for each of the joints within the hand using the kinematics embedded in a 3D hand avatar and smooth the transformation in 3D space between any given hand shapes. The novelty in this system is that it does not require a hand pose to be recognised at every frame, but rather that hand shapes be detected at a given step size. This architecture allows for a more efficient system with better accuracy than other related systems. Moreover, a real-time hand tracking strategy was developed that works efficiently for any skin tone and a complex background.
7

Belissen, Valentin. "From Sign Recognition to Automatic Sign Language Understanding : Addressing the Non-Conventionalized Units." Electronic Thesis or Diss., université Paris-Saclay, 2020. http://www.theses.fr/2020UPASG064.

Full text
Abstract:
Sign Languages (SLs) have developed naturally in Deaf communities. With no written form, they are oral languages, using the gestural channel for expression and the visual channel for reception. These poorly endowed languages do not meet with a broad consensus at the linguistic level. These languages make use of lexical signs, i.e. conventionalized units of language whose form is supposed to be arbitrary, but also - and unlike vocal languages, if we don't take into account the co-verbal gestures - iconic structures, using space to organize discourse. Iconicity, which is defined as the existence of a similarity between the form of a sign and the meaning it carries, is indeed used at several levels of SL discourse.Most research in automatic Sign Language Recognition (SLR) has in fact focused on recognizing lexical signs, at first in the isolated case and then within continuous SL. The video corpora associated with such research are often relatively artificial, consisting of the repetition of elicited utterances in written form. Other corpora consist of interpreted SL, which may also differ significantly from natural SL, as it is strongly influenced by the surrounding vocal language.In this thesis, we wish to show the limits of this approach, by broadening this perspective to consider the recognition of elements used for the construction of discourse or within illustrative structures.To do so, we show the interest and the limits of the corpora developed by linguists. In these corpora, the language is natural and the annotations are sometimes detailed, but not always usable as input data for machine learning systems, as they are not necessarily complete or coherent. We then propose the redesign of a French Sign Language dialogue corpus, Dicta-Sign-LSF-v2, with rich and consistent annotations, following an annotation scheme shared by many linguists.We then propose a redefinition of the problem of automatic SLR, consisting in the recognition of various linguistic descriptors, rather than focusing on lexical signs only. At the same time, we discuss adapted metrics for relevant performance assessment.In order to perform a first experiment on the recognition of linguistic descriptors that are not only lexical, we then develop a compact and generalizable representation of signers in videos. This is done by parallel processing of the hands, face and upper body, using existing tools and models that we have set up. Besides, we preprocess these parallel representations to obtain a relevant feature vector. We then present an adapted and modular architecture for automatic learning of linguistic descriptors, consisting of a recurrent and convolutional neural network.Finally, we show through a quantitative and qualitative analysis the effectiveness of the proposed model, tested on Dicta-Sign-LSF-v2. We first carry out an in-depth analysis of the parameterization, evaluating both the learning model and the signer representation. The study of the model predictions then demonstrates the merits of the proposed approach, with a very interesting performance for the continuous recognition of four linguistic descriptors, especially in view of the uncertainty related to the annotations themselves. The segmentation of the latter is indeed subjective, and the very relevance of the categories used is not strongly demonstrated. Indirectly, the proposed model could therefore make it possible to measure the validity of these categories. 
With several areas for improvement being considered, particularly in terms of signer representation and the use of larger corpora, the results are very encouraging and pave the way for a wider understanding of continuous Sign Language Recognition.
8

Rupe, Jonathan C. "Vision-based hand shape identification for sign language recognition /." Link to online version, 2005. https://ritdml.rit.edu/dspace/handle/1850/940.

Full text
9

Mudduluru, Sravani. "Indian Sign Language Numbers Recognition using Intel RealSense Camera." DigitalCommons@CalPoly, 2017. https://digitalcommons.calpoly.edu/theses/1815.

Full text
Abstract:
The use of gesture-based interaction with devices has been a significant area of research in the field of computer science for many years. The main idea of these kinds of interactions is to ease the user experience by providing a high degree of freedom and a more interactive, natural way of communicating with the technology. The significant areas of application of gesture recognition are in video gaming, human computer interaction, virtual reality, smart home appliances, medical systems, robotics and several others. With the availability of devices such as the Kinect, Leap Motion and Intel RealSense cameras, access to depth as well as color information has become available to the public at affordable cost. The Intel RealSense camera is a USB-powered controller with few hardware requirements, such as Windows 8 and above. It is one such camera that can be used to track human body information, similar to the Kinect and Leap Motion. It was designed specifically to provide more fine-grained information about the different parts of the human body, such as the face and hands. The camera was also designed to give users more natural and intuitive interactions with smart devices by providing features such as creating 3D avatars, high-quality 3D prints, high-quality graphic gaming visuals, virtual reality and others. The main aim of this study is to analyze hand tracking information and build a training model in order to decide if this camera is suitable for sign language. In this study, we extracted the joint information of 22 joint labels per hand. We trained the model to identify the Indian Sign Language (ISL) numbers from 0-9. Through this study we found that the multi-class SVM model showed a higher accuracy of 93.5% when compared to the decision tree and KNN models.
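For the classifier comparison mentioned above, here is a minimal, assumption-laden sketch that contrasts the three model families (multi-class SVM, decision tree, KNN) on per-hand joint features with cross-validation; the data shapes, hyperparameters and scoring are illustrative, not the experimental setup of the thesis.

```python
# Illustrative comparison of the three classifier families on per-hand joint features.
# Data layout (22 joints x 3 coordinates) and parameters are assumptions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 22 * 3))       # stand-in joint features for one hand
y = rng.integers(0, 10, size=500)        # ISL digits 0-9

for name, clf in [("multi-class SVM", SVC()),
                  ("decision tree", DecisionTreeClassifier()),
                  ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```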
10

Brashear, Helene Margaret. "Improving the efficacy of automated sign language practice tools." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/34703.

Full text
Abstract:
The CopyCat project is an interdisciplinary effort to create a set of computer-aided language learning tools for deaf children. The CopyCat games allow children to interact with characters using American Sign Language (ASL). Through Wizard of Oz pilot studies we have developed a set of games, shown their efficacy in improving young deaf children's language and memory skills, and collected a large corpus of signing examples. Our previous implementation of the automatic CopyCat games uses automatic sign language recognition and verification in the infrastructure of a memory repetition and phrase verification task. The goal of my research is to expand the automatic sign language system to transition the CopyCat games to include the flexibility of a dialogue system. I have created a labeling ontology from analysis of the CopyCat signing corpus, and I have used the ontology to describe the contents of the CopyCat data set. This ontology was used to change and improve the automatic sign language recognition system and to add flexibility to language use in the automatic game.
11

Yin, Pei. "Segmental discriminative analysis for American Sign Language recognition and verification." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/33939.

Full text
Abstract:
This dissertation presents segmental discriminative analysis techniques for American Sign Language (ASL) recognition and verification. ASL recognition is a sequence classification problem. One of the most successful techniques for recognizing ASL is the hidden Markov model (HMM) and its variants. This dissertation addresses two problems in sign recognition by HMMs. The first is discriminative feature selection for temporally-correlated data. Temporal correlation in sequences often causes difficulties in feature selection. To mitigate this problem, this dissertation proposes segmentally-boosted HMMs (SBHMMs), which construct the state-optimized features in a segmental and discriminative manner. The second problem is the decomposition of ASL signs for efficient and accurate recognition. For this problem, this dissertation proposes discriminative state-space clustering (DISC), a data-driven method of automatically extracting sub-sign units by state-tying from the results of feature selection. DISC and SBHMMs can jointly search for discriminative feature sets and representation units of ASL recognition. ASL verification, which determines whether an input signing sequence matches a pre-defined phrase, shares similarities with ASL recognition, but it has more prior knowledge and a higher expectation of accuracy. Therefore, ASL verification requires additional discriminative analysis not only in utilizing prior knowledge but also in actively selecting a set of phrases that have a high expectation of verification accuracy in the service of improving the experience of users. This dissertation describes ASL verification using CopyCat, an ASL game that helps deaf children acquire language abilities at an early age. It then presents the "probe" technique which automatically searches for an optimal threshold for verification using prior knowledge and BIG, a bi-gram error-ranking predictor which efficiently selects/creates phrases that, based on the previous performance of existing verification systems, should have high verification accuracy. This work demonstrates the utility of the described technologies in a series of experiments. SBHMMs are validated in ASL phrase recognition as well as various other applications such as lip reading and speech recognition. DISC-SBHMMs consistently produce fewer errors than traditional HMMs and SBHMMs in recognizing ASL phrases using an instrumented glove. Probe achieves verification efficacy comparable to the optimum obtained from manually exhaustive search. Finally, when verifying phrases in CopyCat, BIG predicts which CopyCat phrases, even unseen in training, will have the best verification accuracy with results comparable to much more computationally intensive methods.
12

Starner, Thad. "Visual recognition of American sign language using hidden Markov models." Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/29089.

Full text
13

Adam, Jameel. "Video annotation wiki for South African sign language." Thesis, University of the Western Cape, 2011. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_1540_1304499135.

Full text
Abstract:

The SASL project at the University of the Western Cape aims at developing a fully automated translation system between English and South African Sign Language (SASL). Three important aspects of this system require SASL documentation and knowledge. These are: recognition of SASL from a video sequence, linguistic translation between SASL and English, and the rendering of SASL. Unfortunately, SASL documentation is a scarce resource and no official or complete documentation exists. This research focuses on creating an online collaborative video annotation knowledge management system for SASL to which various members of the community can upload SASL videos and annotate them in any of the sign language notation systems: SignWriting, HamNoSys and/or Stokoe. As such, knowledge about SASL structure is pooled into a central and freely accessible knowledge base that can be used as required. The usability and performance of the system were evaluated. The usability of the system was graded by users on a rating scale from one to five for a specific set of tasks. The system was found to have an overall usability of 3.1, slightly better than average. The performance evaluation included load and stress tests which measured the system response time for a number of users for a specific set of tasks. It was found that the system is stable and can scale up to cater for an increasing user base by improving the underlying hardware.

14

Feng, Qianli. "Automatic American Sign Language Imitation Evaluator." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1461233570.

Full text
15

Zhou, Mingjie. "Deep networks for sign language video caption." HKBU Institutional Repository, 2020. https://repository.hkbu.edu.hk/etd_oa/848.

Full text
Abstract:
In the hearing-loss community, sign language is a primary tool for communication, while there is a communication gap between hearing-loss people and people with normal hearing. Sign language is different from spoken language. It has its own vocabulary and grammar. Recent works concentrate on sign language video captioning, which consists of sign language recognition and sign language translation. Continuous sign language recognition, which can bridge the communication gap, is a challenging task because of the weakly supervised ordered annotations where no frame-level label is provided. To overcome this problem, connectionist temporal classification (CTC) is the most widely used method. However, CTC learning can perform badly if the extracted features are not good. For better feature extraction, this thesis presents novel self-attention-based fully-inception (SAFI) networks for vision-based end-to-end continuous sign language recognition. Considering that the length of sign words differs from each other, we introduce the fully inception network with different receptive fields to extract dynamic clip-level features. To further boost the performance, the fully inception network with an auxiliary classifier is trained with aggregation cross entropy (ACE) loss. Then the encoder of self-attention networks, as the global sequential feature extractor, is used to model the clip-level features with CTC. The proposed model is optimized by jointly training with ACE on clip-level feature learning and CTC on global sequential feature learning in an end-to-end fashion. The best method among the baselines achieves 35.6% WER on the validation set and 34.5% WER on the test set. It employs a better decoding algorithm for generating pseudo labels to do the EM-like optimization to fine-tune the CNN module. In contrast, our approach focuses on better feature extraction for end-to-end learning. To alleviate overfitting on the limited dataset, we employ temporal elastic deformation to triple the real-world dataset RWTH-PHOENIX-Weather 2014. Experimental results on the real-world dataset RWTH-PHOENIX-Weather 2014 demonstrate the effectiveness of our approach, which achieves 31.7% WER on the validation set and 31.2% WER on the test set. Even though sign language recognition can, to some extent, help bridge the communication gap, it is still organized in sign language grammar, which is different from spoken language. Unlike sign language recognition, which recognizes sign gestures, sign language translation (SLT) converts sign language to a target spoken language text which normal hearing people commonly use in their daily life. To achieve this goal, this thesis provides an effective sign language translation approach which gains state-of-the-art performance on the largest real-life German sign language translation database, RWTH-PHOENIX-Weather 2014T. Besides, a direct end-to-end sign language translation approach gives promising results (an impressive gain from 9.94 to 13.75 BLEU and 9.58 to 14.07 BLEU on the validation set and test set) without intermediate recognition annotations. The comparative and promising experimental results show the feasibility of direct end-to-end SLT.
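Since the abstract centres on CTC training over clip-level features, here is a minimal, generic sketch of a CTC training step in PyTorch; the tensor sizes, vocabulary size and the toy linear encoder are assumptions standing in for the SAFI networks, not the thesis's actual architecture.

```python
# Generic CTC training step for continuous sign recognition (illustrative only;
# a toy linear encoder replaces the SAFI networks described in the thesis).
import torch
import torch.nn as nn

T, N, C = 80, 4, 1232                         # frames per clip, batch size, gloss vocabulary (assumed)
features = torch.randn(T, N, 512)             # stand-in for clip-level visual features
encoder = nn.Linear(512, C)                   # toy frame-level classifier

log_probs = encoder(features).log_softmax(dim=2)            # (T, N, C), as required by CTCLoss
targets = torch.randint(1, C, (N, 12), dtype=torch.long)    # gloss label sequences (blank = 0)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                               # gradients flow into the encoder weights
print("CTC loss:", float(loss))
```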
16

Holden, Eun-Jung. "Visual recognition of hand motion." University of Western Australia. Dept. of Computer Science, 1997. http://theses.library.uwa.edu.au/adt-WU2003.0007.

Full text
Abstract:
Hand gesture recognition has been an active area of research in recent years, being used in various applications from deaf sign recognition systems to human-machine interaction. The gesture recognition process, in general, may be divided into two stages: the motion sensing, which extracts useful data from hand motion; and the classification process, which classifies the motion sensing data as gestures. The existing vision-based gesture recognition systems extract 2-D shape and trajectory descriptors from the visual input, and classify them using various classification techniques from maximum likelihood estimation to neural networks, finite state machines, Fuzzy Associative Memory (FAM) or Hidden Markov Models (HMMs). This thesis presents the framework of the vision-based Hand Motion Understanding (HMU) system that recognises static and dynamic Australian Sign Language (Auslan) signs by extracting and classifying 3-D hand configuration data from the visual input. The HMU system is a pioneer gesture recognition system that uses a combination of a 3-D hand tracker for motion sensing, and an adaptive fuzzy expert system for classification. The HMU 3-D hand tracker extracts 3-D hand configuration data that consists of the 21 degrees-of-freedom parameters of the hand from the visual input of a single viewpoint, with the aid of a colour-coded glove. The tracker uses a model-based motion tracking algorithm that makes incremental corrections to the 3-D model parameters to re-configure the model to fit the hand posture appearing in the images through the use of a Newton-style optimisation technique. Finger occlusions are handled to a certain extent by recovering the missing hand features in the images through the use of a prediction algorithm. The HMU classifier then recognises the sequence of 3-D hand configuration data as a sign by using an adaptive fuzzy expert system where sign knowledge is used as inference rules. The classification is performed in two stages. Firstly, for each image, the classifier recognises Auslan basic hand postures that categorise the Auslan signs like the alphabet in English. Secondly, the sequence of Auslan basic hand postures that appear in the image sequence is analysed and recognised as a sign. Both the posture and sign recognition are performed by the same adaptive fuzzy inference engine. The HMU rule base stores 22 Auslan basic hand postures and 22 signs. For evaluation, 44 motion sequences (2 for each of the 22 signs) are recorded. Among them, 22 randomly chosen sequences (1 for each of the 22 signs) are used for testing and the rest are used for training. The evaluation shows that before training the HMU system correctly recognised 20 out of 22 signs. After training, with the same test set, the HMU system recognised 21 signs correctly. None of the failed cases produced any output. The evaluation has successfully demonstrated the functionality of the combined use of a 3-D hand tracker and an adaptive fuzzy expert system for vision-based sign language recognition.
17

Buehler, Patrick. "Automatic learning of British Sign Language from signed TV broadcasts." Thesis, University of Oxford, 2010. http://ora.ox.ac.uk/objects/uuid:2930e980-4307-41bf-b4ff-87e8c4d0d722.

Full text
Abstract:
In this work, we will present several contributions towards automatic recognition of BSL signs from continuous signing video sequences. Specifically, we will address three main points: (i) automatic detection and tracking of the hands using a generative model of the image; (ii) automatic learning of signs from TV broadcasts using the supervisory information available from subtitles; and (iii) generalisation given sign examples from one signer to recognition of signs from different signers. Our source material consists of many hours of video with continuous signing and corresponding subtitles recorded from BBC digital television. This is very challenging material for a number of reasons, including self-occlusions of the signer, self-shadowing, blur due to the speed of motion, and in particular the changing background. Knowledge of the hand position and hand shape is a pre-requisite for automatic sign language recognition. We cast the problem of detecting and tracking the hands as inference in a generative model of the image, and propose a complete model which accounts for the positions and self-occlusions of the arms. Reasonable configurations are obtained by efficiently sampling from a pictorial structure proposal distribution. The results using our method exceed the state-of-the-art for the length and stability of continuous limb tracking. Previous research in sign language recognition has typically required manual training data to be generated for each sign, e.g. a signer performing each sign in controlled conditions - a time-consuming and expensive procedure. We show that for a given signer, a large number of BSL signs can be learned automatically from TV broadcasts using the supervisory information available from subtitles broadcast simultaneously with the signing. We achieve this by modelling the problem as one of multiple instance learning. In this way we are able to extract the sign of interest from hours of signing footage, despite the very weak and "noisy" supervision from the subtitles. Lastly, we show that automatic recognition of signs can be extended to multiple signers. Using automatically extracted examples from a single signer, we train discriminative classifiers and show that these can successfully classify and localise signs in new signers. This demonstrates that the descriptor we extract for each frame (i.e. hand position, hand shape, and hand orientation) generalises between different signers.
18

Achmed, Imran. "Independent hand-tracking from a single two-dimensional view and its application to South African sign language recognition." Thesis, University of the Western Cape, 2014. http://hdl.handle.net/11394/3330.

Full text
Abstract:
Philosophiae Doctor - PhD
Hand motion provides a natural way of interaction that allows humans to interact not only with the environment, but also with each other. The effectiveness and accuracy of hand-tracking is fundamental to the recognition of sign language. Any inconsistencies in hand-tracking result in a breakdown in sign language communication. Hands are articulated objects, which complicates the tracking thereof. In sign language communication the tracking of hands is often challenged by the occlusion of the other hand, other body parts and the environment in which they are being tracked. The thesis investigates whether a single framework can be developed to track the hands independently of an individual from a single 2D camera in constrained and unconstrained environments without the need for any special device. The framework consists of a three-phase strategy, namely, detection, tracking and learning phases. The detection phase validates whether the object being tracked is a hand, using extended local binary patterns and random forests. The tracking phase tracks the hands independently by extending a novel data-association technique. The learning phase exploits contextual features, using the scale-invariant feature transform (SIFT) algorithm and the fast library for approximate nearest neighbours (FLANN) algorithm to assist tracking and the recovery of hands from any form of tracking failure. The framework was evaluated on South African sign language phrases that use a single hand, both hands without occlusion, and both hands with occlusion. These phrases were performed by 20 individuals in constrained and unconstrained environments. The experiments revealed that integrating all three phases to form a single framework is suitable for tracking hands in both constrained and unconstrained environments, where a high average accuracy of 82.08% and 79.83% was achieved respectively.
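The learning phase described above relies on SIFT features matched with FLANN; the snippet below is a generic OpenCV sketch of that matching step (FLANN matches between a stored hand template and a new frame, filtered with a ratio test). The file names and thresholds are assumptions, it requires an OpenCV build that includes SIFT, and it is not the thesis's exact recovery pipeline.

```python
# Generic SIFT + FLANN matching sketch (illustrative; not the thesis's exact pipeline).
import cv2

template = cv2.imread("hand_template.png", cv2.IMREAD_GRAYSCALE)   # hypothetical files
frame = cv2.imread("current_frame.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(template, None)
kp2, des2 = sift.detectAndCompute(frame, None)

# FLANN with a KD-tree index, then Lowe's ratio test to keep only reliable matches.
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
matches = flann.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
print(f"{len(good)} reliable matches; enough matches suggest the hand was re-found")
```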
19

Naidoo, Nathan Lyle. "South African sign language recognition using feature vectors and Hidden Markov Models." Thesis, University of the Western Cape, 2010. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_8533_1297923615.

Full text
Abstract:

This thesis presents a system for performing whole gesture recognition for South African Sign Language. The system uses feature vectors combined with Hidden Markov models. In order to construct a feature vector, dynamic segmentation must occur to extract the signer's hand movements. Techniques and methods for normalising variations that occur when recording a signer performing a gesture are investigated. The system has a classification rate of 69%.
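A minimal sketch of the classic feature-vector-plus-HMM recipe referred to above: one Gaussian HMM is trained per sign with hmmlearn, and a test sequence is assigned to the sign whose model gives the highest log-likelihood. The vocabulary, feature dimensionality, state count and data are assumptions for illustration only.

```python
# One HMM per sign, classification by maximum log-likelihood (illustrative sketch).
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(7)
signs = ["hello", "thanks", "help"]          # hypothetical vocabulary
models = {}

for i, sign in enumerate(signs):
    # 20 training sequences of 30 frames; each frame is a 4-dim hand-motion feature vector.
    seqs = [rng.normal(loc=i, size=(30, 4)) for _ in range(20)]
    X = np.concatenate(seqs)
    lengths = [len(s) for s in seqs]
    m = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
    m.fit(X, lengths)
    models[sign] = m

test = rng.normal(loc=1, size=(30, 4))       # should resemble the "thanks" model
best = max(signs, key=lambda s: models[s].score(test))
print("recognised sign:", best)
```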

20

Ding, Liya. "Modelling and Recognition of Manuals and Non-manuals in American Sign Language." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1237564092.

Full text
21

Blair, James M. "Architectures for Real-Time Automatic Sign Language Recognition on Resource-Constrained Device." UNF Digital Commons, 2018. https://digitalcommons.unf.edu/etd/851.

Full text
Abstract:
Powerful, handheld computing devices have proliferated among consumers in recent years. Combined with new cameras and sensors capable of detecting objects in three-dimensional space, new gesture-based paradigms of human computer interaction are becoming available. One possible application of these developments is an automated sign language recognition system. This thesis reviews the existing body of work regarding computer recognition of sign language gestures as well as the design of systems for speech recognition, a similar problem. Little work has been done to apply the well-known architectural patterns of speech recognition systems to the domain of sign language recognition. This work creates a functional prototype of such a system, applying three architectures seen in speech recognition systems, using a Hidden Markov classifier with 75-90% accuracy. A thorough search of the literature indicates that no cloud-based system has yet been created for sign language recognition and this is the first implementation of its kind. Accordingly, there have been no empirical performance analyses regarding a cloud-based Automatic Sign Language Recognition (ASLR) system, which this research provides. The performance impact of each architecture, as well as the data interchange format, is then measured based on response time, CPU, memory, and network usage across an increasing vocabulary of sign language gestures. The results discussed herein suggest that a partially-offloaded client-server architecture, where feature extraction occurs on the client device and classification occurs in the cloud, is the ideal selection for all but the smallest vocabularies. Additionally, the results indicate that for the potentially large data sets transmitted for 3D gesture classification, a fast binary interchange protocol such as Protobuf has vastly superior performance to a text-based protocol such as JSON.
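To make the final point about interchange formats concrete, the sketch below compares the size of one batch of 3D keypoint frames serialized as JSON text versus a packed binary buffer; Python's struct module is used here only as a stand-in for a schema-based binary format such as Protobuf, and the frame layout is an assumption rather than the thesis's actual message schema.

```python
# Text vs binary payload size for a batch of 3D gesture frames (illustrative only;
# struct.pack stands in for a real schema-based binary format like Protobuf).
import json
import struct
import random

random.seed(0)
# 30 frames x 22 joints x (x, y, z) floats: a plausible hand-tracking payload.
frames = [[random.random() for _ in range(22 * 3)] for _ in range(30)]

json_payload = json.dumps(frames).encode("utf-8")
binary_payload = b"".join(struct.pack(f"{len(f)}f", *f) for f in frames)

print("JSON bytes:  ", len(json_payload))
print("binary bytes:", len(binary_payload))   # typically several times smaller
```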
22

Neyra-Gutierrez, Andre, and Pedro Shiguihara-Juarez. "Feature Extraction with Video Summarization of Dynamic Gestures for Peruvian Sign Language Recognition." Institute of Electrical and Electronics Engineers Inc, 2020. http://hdl.handle.net/10757/656630.

Full text
Abstract:
The full text of this work is not available in the UPC Academic Repository due to restrictions imposed by the publisher.
In Peruvian sign language (PSL), recognition of static gestures has been proposed earlier. However, to hold a conversation using sign language, it is also necessary to employ dynamic gestures. We propose a method to extract a feature vector for dynamic gestures of PSL. We collect a dataset with 288 video sequences of words related to dynamic gestures and we define a workflow to process the keypoints of the hands, obtaining a feature vector for each video sequence with the support of a video summarization technique. We employ 9 neural networks to test the method, achieving an average accuracy ranging from 80% to 90%, using 10-fold cross-validation.
23

Viswavarapu, Lokesh Kumar. "Real-Time Finger Spelling American Sign Language Recognition Using Deep Convolutional Neural Networks." Thesis, University of North Texas, 2018. https://digital.library.unt.edu/ark:/67531/metadc1404616/.

Full text
Abstract:
This thesis presents the design and development of a gesture recognition system to recognize finger spelling American Sign Language hand gestures. We developed this solution using the latest deep learning technique, convolutional neural networks. The system uses blink detection to initiate the recognition process, convex hull-based hand segmentation with adaptive skin color filtering to segment the hand region, and a convolutional neural network to perform gesture recognition. An ensemble of four convolutional neural networks is trained with a dataset of 25,254 images for gesture recognition, and a feedback unit based on head pose estimation is implemented to validate the correctness of predicted gestures. The entire system was developed using the Python programming language and supporting libraries such as OpenCV, TensorFlow and Dlib to perform various image processing and machine learning tasks. The entire application can be deployed as a web application using Flask to make it operating system independent.
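The hand-segmentation step described above (skin-colour filtering followed by a convex hull around the largest contour) can be sketched with OpenCV as below; the HSV thresholds and input file name are assumptions, and the blink-detection trigger, adaptive filtering and CNN ensemble from the thesis are omitted.

```python
# Sketch of skin-colour filtering + convex-hull hand segmentation with OpenCV.
# Thresholds and input are illustrative; the thesis's adaptive filter is more involved.
import cv2
import numpy as np

frame = cv2.imread("webcam_frame.png")                      # hypothetical input frame
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
skin_mask = cv2.inRange(hsv, np.array([0, 40, 60]), np.array([25, 255, 255]))
skin_mask = cv2.morphologyEx(skin_mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

contours, _ = cv2.findContours(skin_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
hand = max(contours, key=cv2.contourArea)                   # assume the largest blob is the hand
hull = cv2.convexHull(hand)

segmented = np.zeros_like(skin_mask)
cv2.drawContours(segmented, [hull], -1, 255, thickness=-1)  # filled hull = hand region
cv2.imwrite("hand_region.png", cv2.bitwise_and(frame, frame, mask=segmented))
```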
24

De Villiers, Hendrik Adrianus Cornelis. "A vision-based South African sign language tutor." Thesis, Stellenbosch : Stellenbosch University, 2014. http://hdl.handle.net/10019.1/86333.

Full text
Abstract:
Thesis (PhD)--Stellenbosch University, 2014.
ENGLISH ABSTRACT: A sign language tutoring system capable of generating detailed context-sensitive feedback to the user is presented in this dissertation. This stands in contrast with existing sign language tutor systems, which lack the capability of providing such feedback. A domain specific language is used to describe the constraints placed on the user’s movements during the course of a sign, allowing complex constraints to be built through the combination of simpler constraints. This same linguistic description is then used to evaluate the user’s movements, and to generate corrective natural language feedback. The feedback is dynamically tailored to the user’s attempt, and automatically targets that correction which would require the least effort on the part of the user. Furthermore, a procedure is introduced which allows feedback to take the form of a simple to-do list, despite the potential complexity of the logical constraints describing the sign. The system is demonstrated using real video sequences of South African Sign Language signs, exploring the different kinds of advice the system can produce, as well as the accuracy of the comments produced. To provide input for the tutor system, the user wears a pair of coloured gloves, and a video of their attempt is recorded. A vision-based hand pose estimation system is proposed which uses the Earth Mover’s Distance to obtain hand pose estimates from images of the user’s hands. A two-tier search strategy is employed, first obtaining nearest neighbours using a simple, but related, metric. It is demonstrated that the two-tier system’s accuracy approaches that of a global search using only the Earth Mover’s Distance, yet requires only a fraction of the time. The system is shown to outperform a closely related system on a set of 500 real images of gloved hands.
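The two-tier search idea above (a cheap metric to shortlist candidates, then the Earth Mover's Distance to rank them) can be sketched as follows; here hand poses are represented as 1D histograms and scipy's wasserstein_distance stands in for the full EMD used in the thesis, so the bin count, shortlist size and data are all assumptions for illustration.

```python
# Two-tier nearest-neighbour search: L2 prefilter, then Earth Mover's Distance re-ranking.
# Illustrative only; the thesis works on richer hand descriptors than 1D histograms.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(3)
database = rng.random((5000, 64))            # stored hand-pose histograms (assumed 64 bins)
database /= database.sum(axis=1, keepdims=True)
query = database[1234] + 0.01 * rng.random(64)
query /= query.sum()

# Tier 1: cheap L2 distance keeps only the 50 closest candidates.
l2 = np.linalg.norm(database - query, axis=1)
candidates = np.argsort(l2)[:50]

# Tier 2: rank the shortlist with the (1D) Earth Mover's Distance.
bins = np.arange(64)
emd = [wasserstein_distance(bins, bins, database[i], query) for i in candidates]
print("best match index:", candidates[int(np.argmin(emd))])
```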
25

Potrus, Dani. "Swedish Sign Language Skills Training and Assessment." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-209129.

Full text
Abstract:
Sign language is used widely around the world as a first language for those that are unable to use spoken language and by groups of people that have a disability which precludes them from using spoken language (such as a hearing impairment). The importance of effective learning of sign language and its applications in modern computer science has grown widely in modern society, and research around sign language recognition has sprouted in many different directions, with some examples using hidden Markov models (HMMs) to train models to recognize different sign language patterns (Swedish sign language, American sign language, Korean sign language, German sign language and so on). This thesis project researches the assessment and skill efficiency of using a simple video game to learn Swedish sign language for children aged 10 to 11 with no learning or health disorders. During the experimental testing, 38 children are divided into two equally sized groups of 19 where each group plays a sign language video game. The context of the video game is the same for both groups, where both listened to a 3D avatar speak to them using both spoken language and sign language. The first group played the game and answered questions given to them by using sign language, whereas the other group answered questions given to them by clicking on an alternative on the video game screen. A week after the children had played the video game, the sign language skills that they acquired from playing the video game are assessed by simple questions where they are asked to provide some of the signs that they saw during the duration of the video game. The main hypothesis of the project is that the group of children that answered by signing outperforms the other group, in both remembering the signs and executing them correctly. A statistical null hypothesis test is performed, and the main hypothesis is confirmed. Lastly, directions for future research within sign language assessment using video games are described in the final chapter of the thesis.
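For the statistical null-hypothesis test mentioned above, a minimal example is shown below comparing invented recall scores of the two groups of 19 children with a one-sided Mann-Whitney U test; both the scores and the choice of test are assumptions for illustration, and the thesis may have used a different procedure.

```python
# Illustrative two-group comparison for the sign-recall experiment (invented scores).
from scipy.stats import mannwhitneyu

signing_group = [8, 7, 9, 6, 8, 7, 9, 8, 7, 6, 9, 8, 7, 8, 9, 7, 8, 6, 9]   # answered by signing
clicking_group = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5, 7, 5, 6, 4, 5, 6, 5, 7, 4]  # answered by clicking

stat, p_value = mannwhitneyu(signing_group, clicking_group, alternative="greater")
print(f"U = {stat}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis: the signing group recalled more signs.")
```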
26

Segers, Vaughn Mackman. "The efficacy of the Eigenvector approach to South African sign language identification." Thesis, University of the Western Cape, 2010. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_2697_1298280657.

Full text
Abstract:

The communication barriers between deaf and hearing society mean that interaction between these communities is kept to a minimum. The South African Sign Language research group, Integration of Signed and Verbal Communication: South African Sign Language Recognition and Animation (SASL), at the University of the Western Cape aims to create technologies to bridge the communication gap. In this thesis we address the subject of whole hand gesture recognition. We demonstrate a method to identify South African Sign Language classifiers using an eigenvector approach. The classifiers researched within this thesis are based on those outlined by the Thibologa Sign Language Institute for SASL. Gesture recognition is achieved in real-time. Utilising a pre-processing method for image registration we are able to increase the recognition rates for the eigenvector approach.
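The eigenvector approach described above is essentially an eigenface-style pipeline; as a hedged sketch, the code below projects flattened hand images into a PCA eigenspace and classifies with a nearest-neighbour rule. The image size, component count, class count and data are assumptions, and the thesis's image-registration pre-processing is omitted.

```python
# Eigenface-style pipeline for hand-shape images: PCA projection + 1-NN classification.
# Toy data; the thesis additionally applies image registration before this step.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
images = rng.random((400, 32 * 32))           # flattened 32x32 grayscale hand images (assumed)
labels = np.repeat(np.arange(8), 50)          # 8 hypothetical SASL classifier classes

X_tr, X_te, y_tr, y_te = train_test_split(images, labels, test_size=0.25, random_state=0)

pca = PCA(n_components=40).fit(X_tr)          # the leading eigenvectors ("eigenhands")
knn = KNeighborsClassifier(n_neighbors=1).fit(pca.transform(X_tr), y_tr)
print("accuracy in eigenspace:", knn.score(pca.transform(X_te), y_te))
```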

27

Sarella, Kanthi. "An image processing technique for the improvement of Sign2 using a dual camera approach /." Online version of thesis, 2008. http://hdl.handle.net/1850/5721.

Full text
28

Achmed, Imran. "Upper body pose recognition and estimation towards the translation of South African sign language." Thesis, University of the Western Cape, 2011. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_2493_1304504127.

Full text
Abstract:

Recognising and estimating gestures is a fundamental aspect towards translating from a sign language to a spoken language. It is a challenging problem and at the same time, a growing phenomenon in Computer Vision. This thesis presents two approaches, an example-based and a learning-based approach, for performing integrated detection, segmentation and 3D estimation of the human upper body from a single camera view. It investigates whether an upper body pose can be estimated from a database of exemplars with labelled poses. It also investigates whether an upper body pose can be estimated using skin feature extraction, Support Vector Machines (SVM) and a 3D human body model. The example-based and learning-based approaches obtained success rates of 64% and 88%, respectively. An analysis of the two approaches have shown that, although the learning-based system generally performs better than the example-based system, both approaches are suitable to recognise and estimate upper body poses in a South African sign language recognition and translation system.

APA, Harvard, Vancouver, ISO, and other styles
29

Gonzalez, Preciado Matilde. "Computer vision methods for unconstrained gesture recognition in the context of sign language annotation." Toulouse 3, 2012. http://thesesups.ups-tlse.fr/1798/.

Full text
Abstract:
This PhD thesis concerns the study of computer vision methods for the automatic recognition of unconstrained gestures in the context of sign language annotation. Sign Language (SL) is a visual-gestural language developed by deaf communities. Continuous SL consists of a sequence of signs performed one after another, involving manual and non-manual features that convey information simultaneously. Even though standard signs are defined in dictionaries, there is huge variability caused by the context dependency of signs. In addition, signs are often linked by movement epenthesis, the meaningless gesture between signs. This variability and the co-articulation effect represent a challenging problem for automatic SL processing. Numerous annotated video corpora are necessary in order to train statistical machine translators and to study this language. Generally, the annotation of SL video corpora is performed manually by linguists or computer scientists experienced in SL. However, manual annotation is error-prone, unreproducible and time-consuming, and the quality of the results depends on the annotator's knowledge of SL. Associating annotator knowledge with image processing techniques facilitates the annotation task, increasing robustness and reducing the required time. The goal of this research is the study and development of image processing techniques to assist the annotation of SL video corpora: body tracking, hand segmentation, temporal segmentation and gloss recognition. Throughout this thesis we address the problem of gloss annotation of SL video corpora. First of all, we intend to detect the limits corresponding to the beginning and end of a sign. This annotation method requires several low-level approaches for performing temporal segmentation and for extracting motion and hand-shape features. First, we propose a particle-filter-based approach for tracking the hands and face robustly under occlusion. Then, a segmentation method is developed for extracting the hand even when it is in front of the face. Motion is used for segmenting signs, and hand shape is later used to improve the results; indeed, hand shape allows limits detected in the middle of a sign to be removed. Once signs have been segmented, we proceed to gloss recognition using a lexical description of signs. We have evaluated our algorithms on international corpora in order to show their advantages and limitations. The evaluation shows the robustness of the proposed methods with respect to high dynamics and numerous occlusions between body parts. The resulting annotation is independent of the annotator and represents a significant gain in consistency.
APA, Harvard, Vancouver, ISO, and other styles
30

Mohamed, Asif, Paul Sujeet, and Vishnu Ullas. "Gauntlet-X1: Smart Glove System for American Sign Language translation using Hand Activity Recognition." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-428743.

Full text
Abstract:
The most common forms of Human Computer Interaction (HCI) devices these dayslike the keyboard, mouse and touch interfaces, are limited to working on atwo-dimensional (2-D) surface, and thus do not provide complete freedom ofaccessibility using our hands. With the vast number of gestures a hand can perform,including the different combinations of motion of fingers, wrist and elbow, we canmake accessibility and interaction with the digital environment much more simplified,without restrictions to the physical surface. Fortunately, this is possible due to theadvancements of Microelectromechanical systems (MEMS) manufacturing of sensors,reducing the size of a sensor to the size of a fingernail.In this thesis we document the design and development of a smart glove systemcomprising of Inertial Measurement Units (IMU) sensors that recognize handactivity/gestures using combinations of neural networks and deep learning techniquessuch as Long Short-Term Memory (LSTM) and Convolutional Neural Network(CNN). This peripheral device is named as the Gauntlet-X1, X1 to denote thecurrent prototype version of the device. The system captures IMU data and interfaceswith the host server. In order to demonstrate this prototype as a proof of concept,we integrate to Android mobile applications based on 3-D interactivity like theAmerican Sign Language(ASL), Augmented Reality (AR)/Virtual Reality (VR)applications and can be extended to further the use of HCI technology.
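A rough sketch of how windows of IMU samples can be classified with a combined CNN and LSTM, in the spirit of the glove system described above, is given below; the window length, channel count, class count and layer sizes are assumptions for illustration, not the Gauntlet-X1's actual architecture.

```python
# Hedged sketch of a CNN + LSTM classifier over fixed-length windows of IMU data.
import tensorflow as tf

WINDOW = 100        # assumed samples per gesture window
CHANNELS = 6        # e.g. 3-axis accelerometer + 3-axis gyroscope
NUM_GESTURES = 24   # assumed number of ASL gesture classes

model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, kernel_size=5, activation="relu",
                           input_shape=(WINDOW, CHANNELS)),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.LSTM(64),   # temporal modelling of the convolved window
    tf.keras.layers.Dense(NUM_GESTURES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(imu_windows, gesture_labels, batch_size=32, epochs=20)
```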
APA, Harvard, Vancouver, ISO, and other styles
31

Jacobs, Kurt. "South African Sign Language Hand Shape and Orientation Recognition on Mobile Devices Using Deep Learning." University of the Western Cape, 2017. http://hdl.handle.net/11394/5647.

Full text
Abstract:
Magister Scientiae - MSc
In order to classify South African Sign Language as a signed gesture, five fundamental parameters need to be considered: hand shape, hand orientation, hand motion, hand location and facial expressions. The research in this thesis will utilise Deep Learning techniques, specifically Convolutional Neural Networks, to recognise hand shapes in various hand orientations. The research will focus on two of the five fundamental parameters, i.e., recognising six South African Sign Language hand shapes for each of five different hand orientations. These hand shape and orientation combinations will be recognised by means of a video stream captured on a mobile device. The efficacy of Convolutional Neural Networks for gesture recognition will be judged with respect to classification accuracy and classification speed in both a desktop and an embedded context. The research methodology employed to carry out the research was Design Science Research, a set of analytical techniques and perspectives for performing research in the fields of Information Systems and Computer Science. Design Science Research necessitates the design of an artefact and the analysis thereof in order to better understand its behaviour in the context of Information Systems or Computer Science.
National Research Foundation (NRF)
APA, Harvard, Vancouver, ISO, and other styles
32

Yang, Ruiduo. "Dynamic programming with multiple candidates and its applications to sign language and hand gesture recognition." [Tampa, Fla.] : University of South Florida, 2008. http://purl.fcla.edu/usf/dc/et/SFE0002310.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Parashar, Ayush S. "Representation and Interpretation of Manual and Non-Manual Information for Automated American Sign Language Recognition." [Tampa, Fla.] : University of South Florida, 2003. http://purl.fcla.edu/fcla/etd/SFE0000055.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Rajah, Christopher. "Chereme-based recognition of isolated, dynamic gestures from South African sign language with Hidden Markov Models." Thesis, University of the Western Cape, 2006. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_4979_1183461652.

Full text
Abstract:

Much work has been done in building systems that can recognize gestures, e.g. as a component of sign language recognition systems. These systems typically use whole gestures as the smallest unit for recognition. Although high recognition rates have been reported, these systems do not scale well and are computationally intensive. The reason these systems generally scale poorly is that they recognize gestures by building an individual model for each separate gesture; as the number of gestures grows, so does the required number of models. Beyond a certain threshold number of gestures to be recognized, this approach becomes infeasible. This work proposes that similarly good recognition rates can be achieved by building models for subcomponents of whole gestures, so-called cheremes. Instead of building models for entire gestures, we build models for cheremes and recognize gestures as sequences of such cheremes. The assumption is that many gestures share cheremes and that the number of cheremes necessary to describe gestures is much smaller than the number of gestures. This small number of cheremes then makes it possible to recognize a large number of gestures with a small number of chereme models. This approach is akin to phoneme-based speech recognition systems, where utterances are recognized as sequences of phonemes which in turn are combined into words.
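The chereme-based idea can be illustrated with a short sketch: one small HMM per chereme, and a gesture scored as its sequence of cheremes, analogous to phoneme-based speech recognition. The hmmlearn package, the chereme inventory, the gesture lexicon and the assumption that the input is already segmented per chereme are all choices made for the example; the thesis's actual models and features are not reproduced here.

```python
# Hedged sketch of chereme-based gesture recognition with per-chereme HMMs.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_chereme_models(chereme_data, n_states=3):
    """chereme_data: dict chereme -> list of (frames, features) observation arrays."""
    models = {}
    for chereme, sequences in chereme_data.items():
        X = np.vstack(sequences)
        lengths = [len(s) for s in sequences]
        models[chereme] = GaussianHMM(n_components=n_states).fit(X, lengths)
    return models

def recognise(segments, models, lexicon):
    """segments: per-chereme feature arrays; lexicon: gesture -> chereme sequence."""
    best_gesture, best_score = None, -np.inf
    for gesture, cheremes in lexicon.items():
        if len(cheremes) != len(segments):
            continue
        # Sum of per-segment log-likelihoods under the gesture's chereme models.
        score = sum(models[c].score(seg) for c, seg in zip(cheremes, segments))
        if score > best_score:
            best_gesture, best_score = gesture, score
    return best_gesture
```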

APA, Harvard, Vancouver, ISO, and other styles
35

Halvardsson, Gustaf, and Johanna Peterson. "Interpretation of Swedish Sign Language using Convolutional Neural Networks and Transfer Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-277859.

Full text
Abstract:
The automatic interpretation of the signs of a sign language involves image recognition. An appropriate approach for this task is to use Deep Learning, and in particular Convolutional Neural Networks. This method typically needs large amounts of data to perform well, so transfer learning could be a feasible approach to achieve high accuracy despite using a small data set. The hypothesis of this thesis is that transfer learning works well for interpreting the hand alphabet of Swedish Sign Language. The goal of the project is to implement a model that can interpret signs, as well as to build a user-friendly web application for this purpose. The final testing accuracy of the model is 85%. Since this accuracy is comparable to that reported in other studies, the project's hypothesis is supported. The final network is based on the pre-trained model InceptionV3 with five frozen layers, and the optimization algorithm mini-batch gradient descent with a batch size of 32 and a step-size factor of 1.2. Transfer learning is used, but not to the extent that the network becomes too specialized in the pre-trained model and its data. The network has been shown to be unbiased across diverse test data sets. Suggestions for future work include integrating dynamic signing data to interpret words and sentences, evaluating the method on another sign language's hand alphabet, and integrating dynamic interpretation in the web application so that several letters or words can be interpreted one after another. In the long run, this research could benefit deaf people who have access to technology and enhance good health, quality education, decent work, and reduced inequalities.
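The transfer-learning configuration summarised above (an InceptionV3 backbone, a small number of frozen layers, mini-batch gradient descent with a batch size of 32) can be sketched roughly as follows; the image size, class count, learning rate and classification head are illustrative assumptions rather than the exact setup reported in the thesis.

```python
# Hedged sketch of a transfer-learning classifier for hand-alphabet images.
import tensorflow as tf

NUM_CLASSES = 26           # assumed number of hand-alphabet signs
IMG_SHAPE = (299, 299, 3)  # InceptionV3's native input size

base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=IMG_SHAPE)

# Freeze the first few layers so the generic low-level filters are kept.
for layer in base.layers[:5]:
    layer.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # mini-batch gradient descent
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_images, train_labels, batch_size=32, epochs=10,
#           validation_data=(val_images, val_labels))
```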
APA, Harvard, Vancouver, ISO, and other styles
36

Ghaziasgar, Mehrdad. "The use of mobile phones as service-delivery devices in sign language machine translation system." Thesis, University of the Western Cape, 2010. http://etd.uwc.ac.za/index.php?module=etd&action=viewtitle&id=gen8Srv25Nme4_7216_1299134611.

Full text
Abstract:

This thesis investigates the use of mobile phones as service-delivery devices in a sign language machine translation system. Four sign language visualization methods were evaluated on mobile phones; three of the methods were synthetic sign language visualization methods. Three factors were considered: the intelligibility of the sign language as rendered by the method; the power consumption; and the bandwidth usage associated with each method. The average intelligibility rate was 65%, with some methods achieving intelligibility rates of up to 92%. The average file size was 162 KB and, on average, the power consumption increased to 180% of the idle state, across all methods. This research forms part of the Integration of Signed and Verbal Communication: South African Sign Language Recognition and Animation (SASL) project at the University of the Western Cape and serves as an integration platform for the group's research. In order to perform this research, a machine translation system that uses mobile phones as service-delivery devices was developed, as well as a 3D Avatar for mobile phones. It was concluded that mobile phones are suitable service-delivery platforms for sign language machine translation systems.

APA, Harvard, Vancouver, ISO, and other styles
37

Glatt, Ruben [UNESP]. "Deep learning architecture for gesture recognition." Universidade Estadual Paulista (UNESP), 2014. http://hdl.handle.net/11449/115718.

Full text
Abstract:
Activity recognition from computer vision plays an important role in research towards applications like human-computer interfaces, intelligent environments, surveillance or medical systems. In this work, a gesture recognition system based on a deep learning architecture is proposed. It is used to analyse the performance when trained with multi-modal input data on an Italian sign language dataset. The underlying research area is the field of human-machine interaction, which combines research on natural user interfaces, gesture and activity recognition, machine learning, and the sensor technologies used to capture environmental input for further processing. These areas are introduced and the basic concepts are described. The development environment for preprocessing data and programming machine learning algorithms in Python is described and the main libraries are discussed. The gathering of the multi-modal data streams is explained and the dataset used is outlined. The proposed learning architecture consists of two steps: the preprocessing of the input data and the actual learning architecture. The preprocessing is limited to three different strategies, which are combined to offer six different preprocessing profiles. In the second step, a Deep Belief Network is introduced and its components are explained. With this setup, 294 experiments are conducted with varying configuration settings. The variables that are altered are the preprocessing settings, the layer structure of the model, the pretraining learning rate and the fine-tuning learning rate. The evaluation of these experiments shows that using a deep learning architecture for an activity or gesture recognition task yields acceptable results, but has not yet reached a level of maturity that would allow the developed models to be used in serious applications.
APA, Harvard, Vancouver, ISO, and other styles
38

Glatt, Ruben. "Deep learning architecture for gesture recognition /." Guaratinguetá, 2014. http://hdl.handle.net/11449/115718.

Full text
Abstract:
Advisor: José Celso Freire Junior
Co-advisor: Daniel Julien Barros da Silva Sampaio
Committee member: Galeno José de Sena
Committee member: Luiz de Siqueira Martins Filho
Abstract: Activity recognition from computer vision plays an important role in research towards applications like human-computer interfaces, intelligent environments, surveillance or medical systems. In this work, a gesture recognition system based on a deep learning architecture is proposed. It is used to analyse the performance when trained with multi-modal input data on an Italian sign language dataset. The underlying research area is the field of human-machine interaction, which combines research on natural user interfaces, gesture and activity recognition, machine learning, and the sensor technologies used to capture environmental input for further processing. These areas are introduced and the basic concepts are described. The development environment for preprocessing data and programming machine learning algorithms in Python is described and the main libraries are discussed. The gathering of the multi-modal data streams is explained and the dataset used is outlined. The proposed learning architecture consists of two steps: the preprocessing of the input data and the actual learning architecture. The preprocessing is limited to three different strategies, which are combined to offer six different preprocessing profiles. In the second step, a Deep Belief Network is introduced and its components are explained. With this setup, 294 experiments are conducted with varying configuration settings. The variables that are altered are the preprocessing settings, the layer structure of the model, the pretraining learning rate and the fine-tuning learning rate. The evaluation of these experiments shows that using a deep learning architecture for an activity or gesture recognition task yields acceptable results, but has not yet reached a level of maturity that would allow the developed models to be used in serious applications.
Master's degree
APA, Harvard, Vancouver, ISO, and other styles
39

de, la Cruz Nathan. "Autonomous facial expression recognition using the facial action coding system." University of the Western Cape, 2016. http://hdl.handle.net/11394/5121.

Full text
Abstract:
Magister Scientiae - MSc
The South African Sign Language research group at the University of the Western Cape is in the process of creating a fully-fledged machine translation system to automatically translate between South African Sign Language and English. A major component of the system is the ability to accurately recognise facial expressions, which are used to convey emphasis, tone and mood within South African Sign Language sentences. Traditionally, facial expression recognition research has taken one of two paths: either recognising whole facial expressions, of which there are six, i.e. anger, disgust, fear, happiness, sadness and surprise, as well as the neutral expression; or recognising the fundamental components of facial expressions as defined by the Facial Action Coding System in the form of Action Units. Action Units are directly related to the motion of specific muscles in the face, combinations of which are used to form any facial expression. This research investigates enhanced recognition of whole facial expressions by means of a hybrid approach that combines traditional whole facial expression recognition with Action Unit recognition to achieve an enhanced classification approach.
APA, Harvard, Vancouver, ISO, and other styles
40

Koller, Oscar Anatol Tobias [Verfasser], Hermann [Akademischer Betreuer] Ney, and Richard [Akademischer Betreuer] Bowden. "Towards large vocabulary continuous sign language recognition: from artificial to real-life tasks / Oscar Tobias Anatol Koller ; Hermann Ney, Richard Bowden." Aachen : Universitätsbibliothek der RWTH Aachen, 2020. http://d-nb.info/1233315951/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Clark, Evan M. "A multicamera system for gesture tracking with three dimensional hand pose estimation /." Link to online version, 2006. https://ritdml.rit.edu/dspace/handle/1850/1909.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Freitas, Fernando de Almeida. "Reconhecimento automático de expressões faciais gramaticais na língua brasileira de sinais." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-10072015-100311/.

Full text
Abstract:
Facial expression recognition has attracted considerable attention from researchers in recent decades, mainly because of its potential applications. In sign languages, which are visual-spatial languages without the support of speech intonation, facial expressions take on an even greater importance, as they also help to form the grammatical structure of the language. Such expressions are called Grammatical Facial Expressions and are present at the morphological and syntactic levels of sign languages. They stand out in the automatic recognition of sign languages because they help to resolve ambiguities between signs that share similar parameters, such as hand configuration and point of articulation, and they contribute to the semantic meaning of sentences. This Master's research project therefore aims to develop a set of pattern recognition models capable of solving the problem of automatic interpretation of Grammatical Facial Expressions used in the context of Brazilian Sign Language (Libras), considering them at the syntactic level.
APA, Harvard, Vancouver, ISO, and other styles
43

Mekala, Priyanka. "Field Programmable Gate Array Based Target Detection and Gesture Recognition." FIU Digital Commons, 2012. http://digitalcommons.fiu.edu/etd/723.

Full text
Abstract:
The move from Standard Definition (SD) to High Definition (HD) represents a six-fold increase in the amount of data that needs to be processed. With expanding resolutions and evolving compression, there is a need for high performance with flexible architectures that allow quick upgradability, keeping pace with advances in display resolution, compression techniques and video intelligence. Software implementations of these systems can attain accuracy only with trade-offs among processing performance (achieving specified frame rates on large image data sets), power and cost constraints, so new architectures are needed to keep pace with the fast innovation in video and imaging. This dissertation contains dedicated hardware implementations of the pixel- and frame-rate processes on a Field Programmable Gate Array (FPGA) to achieve real-time performance. Its contributions are as follows. (1) We develop a target detection system by applying a novel running average mean threshold (RAMT) approach to globalize the threshold required for background subtraction. This approach adapts the threshold automatically to different environments (indoor and outdoor) and different targets (humans and vehicles). For low power consumption and better performance, we design the complete system on an FPGA. (2) We introduce a safe distance factor and develop an algorithm for detecting the occurrence of occlusion during target tracking; a novel mean threshold is calculated by motion-position analysis. (3) A new strategy for gesture recognition is developed using Combinational Neural Networks (CNN) arranged in a tree structure, analysed on American Sign Language (ASL) gestures. We introduce a novel points-of-interest approach to reduce the feature vector size and a gradient threshold approach for accurate classification. (4) We design a gesture recognition system using a hardware/software co-simulated neural network to exploit the high speed and low memory requirements offered by the FPGA. We develop an innovative maximum-distance algorithm which uses only 0.39% of the image as the feature vector to train and test the system. Since the gestures involved in different applications may vary, it is essential to keep the feature vector as small as possible while maintaining the same accuracy and performance.
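The running average mean threshold idea for background subtraction can be illustrated with a short sketch; the update rule and the constants below are assumptions chosen for clarity and do not reproduce the dissertation's FPGA implementation.

```python
# Illustrative sketch of background subtraction with a running-average mean threshold.
import numpy as np

def ramt_step(frame, background, alpha=0.05, k=2.5):
    """frame, background: float grayscale arrays of the same shape."""
    diff = np.abs(frame - background)
    threshold = k * diff.mean()           # global threshold adapts to the scene
    foreground_mask = diff > threshold
    # Update the running-average background with the current frame.
    background = (1 - alpha) * background + alpha * frame
    return foreground_mask, background
```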
APA, Harvard, Vancouver, ISO, and other styles
44

Borgia, Fabrizio. "Informatisation d'une forme graphique des Langues des Signes : application au système d'écriture SignWriting." Thesis, Toulouse 3, 2015. http://www.theses.fr/2015TOU30030/document.

Full text
Abstract:
The studies and the software presented in this work are addressed to a significant minority in our society, namely deaf people. Many studies demonstrate that, for several reasons, deaf people experience significant difficulties with vocal languages (e.g. English, Chinese), and many of them prefer to communicate in a Sign Language (SL). From the point of view of computer science, SLs are currently a set of under-represented linguistic minorities in the digital world, and deaf people are among the individuals most affected by the digital divide. This work is our contribution towards narrowing the digital divide affecting deaf people. In particular, we focus on the computerisation of SignWriting, one of the most promising systems devised for writing SLs.
APA, Harvard, Vancouver, ISO, and other styles
45

Teodoro, Beatriz Tomazela. "Sistema de reconhecimento automático de Língua Brasileira de Sinais." Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-20122015-212746/.

Full text
Abstract:
Sign language recognition is an important research area that aims to mitigate the obstacles faced daily by people who are deaf or hard of hearing and to increase their integration into the majority hearing society in which we live. Based on this, this dissertation proposes the development of an information system for the automatic recognition of Brazilian Sign Language (Libras), which aims to simplify communication between deaf people conversing in Libras and hearing people who do not know the sign language. Recognition is performed by processing digital image sequences (videos) of people communicating in Libras, without the use of coloured gloves, data gloves or sensors, and without requiring high-quality recordings in laboratories with controlled environments, focusing on signs that use only the hands. Given the difficulty of building a system for this purpose, the development was divided into stages, and every stage of the proposed system is a contribution to future work in sign recognition as well as to other work involving image processing, human skin segmentation and object tracking, among others. To achieve this goal, a tool was developed to segment Libras-related image sequences, together with a tool to identify dynamic signs in those sequences and translate them into Portuguese. In addition, an image database of 30 basic words chosen by a Libras expert was built, without coloured gloves, controlled laboratory environments or dress requirements for the individuals performing the signs. The segmentation algorithm implemented and used in this study reached an average accuracy of 99.02% and an overlap index of 0.61 on a set of 180 preprocessed frames extracted from 18 videos recorded for the database, and it was able to segment just over 70% of the samples. Regarding word recognition, the proposed system reached 100% accuracy on the 422 segmented word samples in the database, combining the edit distance technique with a voting scheme and a binary classifier to carry out the recognition, thus successfully achieving the goal proposed in this work.
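The combination of edit distance and voting mentioned above can be illustrated, under assumptions, as follows: each sign is reduced to a sequence of discrete symbols, candidate words are ranked by Levenshtein distance, and the nearest training samples vote for the output word. The symbol alphabet and database layout are placeholders, and the binary-classifier stage described in the abstract is not shown.

```python
# Minimal sketch: edit-distance ranking plus a vote over the nearest samples.
from collections import Counter

def edit_distance(a, b):
    """Classic Levenshtein distance between two symbol sequences."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def recognise_word(sequence, database, k=5):
    """database: list of (symbol_sequence, word). Vote among the k closest entries."""
    ranked = sorted(database, key=lambda item: edit_distance(sequence, item[0]))
    votes = Counter(word for _, word in ranked[:k])
    return votes.most_common(1)[0][0]
```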
APA, Harvard, Vancouver, ISO, and other styles
46

Cardoso, Maria Eduarda de Araújo. "Segmentação automática de Expressões Faciais Gramaticais com Multilayer Perceptrons e Misturas de Especialistas." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-25112018-203224/.

Full text
Abstract:
The recognition of facial expressions is an area of interest in computer science and has attracted researchers from different fields, since it has the potential to support many types of application. Automatically recognising these expressions has become a goal primarily in the area of human behaviour analysis. For the study of sign languages in particular, the analysis of facial expressions is important for the interpretation of discourse, since it is the element that expresses prosodic information, supports the development of the grammatical and semantic structure of the language, and helps to resolve ambiguities between similar signs. In this context, facial expressions are called grammatical facial expressions, and they contribute to the semantic composition of sentences. Among the lines of study that explore this theme is the one that intends to implement automatic sign language analysis. For applications that aim to interpret sign languages automatically, such expressions must be identified in the course of signing, a task defined as the segmentation of grammatical facial expressions. It is therefore useful to develop an architecture capable of identifying such expressions in a sentence, segmenting it according to each different type of expression used in its construction. Given this need, this research presents: an analysis of studies in the area to establish the state of the art; the implementation of pattern recognition algorithms using Multilayer Perceptrons and mixtures of experts to solve the facial expression recognition problem; and a comparison of these algorithms as recognisers of the grammatical facial expressions used in the construction of sentences in Brazilian Sign Language (Libras). The implementation and testing of these algorithms showed that the automatic segmentation of grammatical facial expressions is feasible in user-dependent contexts. For user-independent contexts, the segmentation of facial expressions remains a challenge which demands, above all, a structured learning environment built on data sets that are larger and more diverse than those currently available.
APA, Harvard, Vancouver, ISO, and other styles
47

Silva, Renato Kimura da. "Interfaces naturais e o reconhecimento das línguas de sinais." Pontifícia Universidade Católica de São Paulo, 2013. https://tede2.pucsp.br/handle/handle/18125.

Full text
Abstract:
An interface is an intermediate layer between two faces. In the computational context, the interface exists in the interactive intermediation between two subjects, or between a subject and a program. Over the years, interfaces have evolved constantly: from monochromatic text lines, through the mouse and the exploratory concept of graphical interfaces, to the more recent natural interfaces, which are ubiquitous and aim for interactional transparency. With the new interfaces the user interacts with the computer through the body, so there is no longer a need to learn the interface; its use is more intuitive, with recognition of voice, face and gestures. This technological advance meets basic needs of the individual, such as communication, and makes it feasible to conceive new technologies that benefit people in different spheres. The contribution of this work lies in understanding the technical scenario that makes it possible to design and build natural interfaces for the recognition of the signs of Sign Languages and a considerable part of their grammar. To this end, this research was first grounded in the study of the development of computer interfaces and their close relationship with video games, drawing on the contributions of authors such as Pierre Lévy, Sherry Turkle, Janet Murray and Louise Poissant. We then turned to authors such as William Stokoe, Scott Liddell, Ray Birdwhistell, Lucia Santaella and Winfried Nöth, regarding general and specific themes spanning the multidisciplinarity of Sign Languages. Finally, a survey of the state of the art of natural interfaces aimed at Sign Language recognition was carried out, together with a study of notable research related to the topic, presenting possible future paths to be followed by new lines of multidisciplinary research.
APA, Harvard, Vancouver, ISO, and other styles
48

Anjo, Mauro dos Santos. "Avaliação das técnicas de segmentação, modelagem e classificação para o reconhecimento automático de gestos e proposta de uma solução para classificar gestos da libras em tempo real." Universidade Federal de São Carlos, 2013. https://repositorio.ufscar.br/handle/ufscar/523.

Full text
Abstract:
Universidade Federal de Sao Carlos
Multimodal interfaces are becoming popular and seek to enhance the user experience through natural forms of interaction, among them speech and gestures. Speech recognition is already a common feature of daily life, but gesture recognition has only recently begun to be widely used as a new form of interaction. Brazilian Sign Language (Libras) was recognised as a legal means of communication when the Brazilian Government enacted Law No. 10.436 on 24 April 2002, and it became a compulsory subject in teacher education and an elective subject in undergraduate courses through Decree No. 5.626 of 22 December 2005. In this context, this dissertation presents a study of all the steps necessary to build a complete system for recognising static and dynamic Libras gestures: segmentation; modelling and interpretation; and classification. Results and proposed solutions are presented for each of these steps, and the system is evaluated on the real-time recognition of static and dynamic gestures within a finite set of Libras gestures. All the solutions presented in this dissertation were embedded in the GestureUI software, whose main goal is to simplify research in the field of gesture recognition by allowing communication with multimodal interfaces through a TCP/IP protocol.
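As an illustration of how a client application could consume recognition results from software such as GestureUI over TCP/IP, the sketch below assumes a simple newline-delimited text protocol on a local port; the actual GestureUI message format and port are not documented here, so both are explicit assumptions.

```python
# Hedged sketch of a TCP client that receives one recognised gesture label per line.
import socket

HOST, PORT = "127.0.0.1", 5005   # assumed address of the gesture recogniser

def listen_for_gestures():
    with socket.create_connection((HOST, PORT)) as conn:
        buffer = b""
        while True:
            chunk = conn.recv(1024)
            if not chunk:
                break
            buffer += chunk
            while b"\n" in buffer:
                line, buffer = buffer.split(b"\n", 1)
                yield line.decode("utf-8").strip()   # one gesture label per line

# for gesture in listen_for_gestures():
#     print("recognised:", gesture)
```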
APA, Harvard, Vancouver, ISO, and other styles
49

Silva, Brunna Carolinne Rocha. "Desenvolvimento de tecnologia baseada em redes neurais artificiais para reconhecimento de gestos da língua de sinais." Universidade Federal de Goiás, 2018. http://repositorio.bc.ufg.br/tede/handle/tede/8725.

Full text
Abstract:
The purpose of this work is to design, develop and evaluate four devices capable of identifying the configuration, orientation and movement of the hands, and to determine which one performs best at recognising sign language gestures. The methodology covers the definition of the layout and of the data acquisition and processing components, the construction of a processed database for each gesture to be recognised, and the validation of the proposed devices. Signals are collected from flex sensors, accelerometers and gyroscopes, positioned differently on each device. The recognition of the patterns of each gesture is performed using artificial neural networks. After being trained, validated and tested, the neural network connected to the devices achieves a hit rate of up to 96.8%. The validated device identifies sign language gestures effectively and efficiently and demonstrates that the sensor-based approach is promising.
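A minimal sketch of the pattern-recognition stage described above, assuming the sensor readings have already been assembled into fixed-length feature vectors, might look as follows; the layer sizes and the train/test split are illustrative choices, not the configuration that achieved the reported 96.8%.

```python
# Simple sketch: a multilayer perceptron over flex-sensor/accelerometer/gyroscope features.
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def train_gesture_network(X, y):
    """X: (n_samples, n_sensor_features) readings, y: gesture labels."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    net = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000)
    net.fit(X_train, y_train)
    print("hold-out accuracy:", net.score(X_test, y_test))
    return net
```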
APA, Harvard, Vancouver, ISO, and other styles
50

Bermingham, Rowena. "Describing and remembering motion events in British Sign Language." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/288080.

Full text
Abstract:
Motion events are ubiquitous in conversation, from describing a tiresome commute to recounting a burglary. These situations, where an entity changes location, consist of four main semantic components: Motion (the movement), Figure (the entity moving), Ground (the object or objects with respect to which the Figure carries out the Motion) and Path (the route taken). Two additional semantic components can occur simultaneously: Manner (the way the Motion occurs) and Cause (the source of/reason for the Motion). Languages differ in preferences for provision and packaging of semantic components in descriptions. It has been suggested, in the thinking-for-speaking hypothesis, that these preferences influence the conceptualisation of events (such as their memorisation). This thesis addresses questions relating to the description and memory of Motion events in British Sign Language (BSL) and English. It compares early BSL (acquired before age seven) and late BSL (acquired after age 16) descriptions of Motion events and investigates whether linguistic preferences influence memory. Comparing descriptions by early signers and late signers indicates where their linguistic preferences differ, providing valuable knowledge for interpreters wishing to match early signers. Understanding how linguistic preferences might influence memory contributes to debates around the connection between language and thought. The experimental groups for this study were: deaf early BSL signers, hearing early BSL signers, deaf late BSL signers, hearing late BSL signers and hearing English monolinguals. Participants watched target Motion event video clips before completing a memory and attention task battery. Subsequently, they performed a forced-choice recognition task where they saw each target Motion event clip again alongside a distractor clip that differed in one semantic component. They selected which of the two clips they had seen in the first presentation. Finally, participants were filmed describing all of the target and distractor video clips (in English for English monolinguals and BSL for all other groups). The Motion event descriptions were coded for the inclusion and packaging of components. Linguistic descriptions were compared between languages (English and BSL) and BSL group. Statistical models were created to investigate variation on the memory and attention task battery and the recognition task. Results from linguistic analysis reveal that English and BSL are similar in the components included in descriptions. However, packaging differs between languages. English descriptions show preferences for Manner verbs and spatial particles to express Path ('run out'). BSL descriptions show preferences for serial verb constructions (using Manner and Path verbs in the same clause). The BSL groups are also similar in the components they include in descriptions. However, the packaging differs, with hearing late signers showing some English-like preferences and deaf early signers showing stronger serial verb preferences. Results from the behavioural experiments show no overall relationship between language group and memory. I suggest that the similarity of information provided in English and BSL descriptions undermines the ability of the task to reveal memory differences. However, results suggest a link between individual linguistic description and memory; marking a difference between components in linguistic description is correlated with correctly selecting that component clip in the recognition task. 
I argue that this indicates a relationship between linguistic encoding and memory within each individual, where their personal preference for including certain semantic components in their utterances is connected to their memory for those components. I also propose that if the languages were more distinct in their inclusion of information then there may have been differences in recognition task scores. I note that further research is needed across modalities to create a fuller picture of how information is included and packaged cross-modally and how this might affect individual Motion event memory.
APA, Harvard, Vancouver, ISO, and other styles
