
Journal articles on the topic 'Speech processing systems. Pattern recognition systems'

Consult the top 50 journal articles for your research on the topic 'Speech processing systems. Pattern recognition systems.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1. Järvinen, Kari. "Digital speech processing: Speech coding, synthesis, and recognition." Signal Processing 30, no. 1 (January 1993): 133–34. http://dx.doi.org/10.1016/0165-1684(93)90056-g.


2. Mišković, Dragiša, Milan Gnjatović, Perica Štrbac, Branimir Trenkić, Nikša Jakovljević, and Vlado Delić. "Hybrid methodological approach to context-dependent speech recognition." International Journal of Advanced Robotic Systems 14, no. 1 (January 1, 2017): 172988141668713. http://dx.doi.org/10.1177/1729881416687131.

Abstract:
Although the importance of contextual information in speech recognition has been acknowledged for a long time now, it has remained clearly underutilized even in state-of-the-art speech recognition systems. This article introduces a novel, methodologically hybrid approach to the research question of context-dependent speech recognition in human–machine interaction. To the extent that it is hybrid, the approach integrates aspects of both statistical and representational paradigms. We extend the standard statistical pattern-matching approach with a cognitively inspired and analytically tractable model with explanatory power. This methodological extension allows for accounting for contextual information which is otherwise unavailable in speech recognition systems, and using it to improve post-processing of recognition hypotheses. The article introduces an algorithm for evaluation of recognition hypotheses, illustrates it for concrete interaction domains, and discusses its implementation within two prototype conversational agents.
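
As a rough illustration of the general idea only (not the authors' evaluation algorithm), the Python sketch below rescores an ASR n-best list with a toy context model before choosing the output; the weighting scheme and the context_score callable are assumptions.

```python
def rerank(nbest, context_score, alpha=0.5):
    """Pick the best hypothesis from an n-best list.

    nbest: list of (hypothesis, acoustic_confidence) pairs.
    context_score: callable returning a plausibility score in [0, 1]
    for a hypothesis given the dialogue context.
    alpha: weight between the two evidence sources (a tunable assumption).
    """
    return max(nbest,
               key=lambda h: (1 - alpha) * h[1] + alpha * context_score(h[0]))

# Toy domain context: in a chess-like interaction domain, expected moves are
# far more plausible than acoustically similar alternatives.
expected = {"pawn to e4", "knight to f3"}
best = rerank([("pawn to a4", 0.58), ("pawn to e4", 0.55)],
              lambda h: 1.0 if h in expected else 0.1)
print(best[0])  # context flips the decision to "pawn to e4"
```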

3. Modi, Rohan. "Transcript Anatomization with Multi-Linguistic and Speech Synthesis Features." International Journal for Research in Applied Science and Engineering Technology 9, no. VI (June 20, 2021): 1755–58. http://dx.doi.org/10.22214/ijraset.2021.35371.

Abstract:
Handwriting detection is the ability of a computer program to collect and analyze comprehensible handwritten input from various media such as photographs, newspapers, and paper reports. Handwritten text recognition is a sub-discipline of pattern recognition, which refers to the classification of datasets or objects into categories or classes. Handwriting recognition transforms handwritten text in a specific language into its digitally expressible script, represented by a set of symbols known as letters or characters. Speech synthesis is the artificial production of human speech using machine-learning-based software and audio-output hardware. While many systems convert normal language text into speech, the aim of this paper is to study optical character recognition combined with speech synthesis and to develop a cost-effective, user-friendly, image-based offline text-to-speech conversion system using a CRNN (convolutional recurrent neural network) model and a hidden Markov model. Automated interpretation of handwritten text is useful wherever large amounts of handwritten data must be processed, such as signature verification, analysis of various types of documents, and recognition of handwritten amounts on bank cheques.
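
The paper's pipeline runs from a handwritten image to recognized text to synthesized speech. As a hedged sketch of that flow only, the fragment below uses off-the-shelf stand-ins (pytesseract for recognition, pyttsx3 for offline synthesis) rather than the CRNN and hidden Markov models the paper trains; the file name is a placeholder.

```python
from PIL import Image
import pytesseract   # OCR stand-in for the paper's CRNN recognizer
import pyttsx3       # offline TTS stand-in for the synthesis stage

def image_to_speech(image_path):
    # Stage 1: image -> text (the paper uses a CRNN + HMM here)
    text = pytesseract.image_to_string(Image.open(image_path))
    # Stage 2: text -> audible speech, fully offline
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
    return text

# image_to_speech("handwritten_note.png")  # placeholder file name
```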

4. Dmitriev, V. Ya, T. A. Ignat'eva, and V. P. Pilyavskiy. "Development of Artificial Intelligence and Prospects for Its Application." Economics and Management 27, no. 2 (May 1, 2021): 132–38. http://dx.doi.org/10.35854/1998-1627-2021-2-132-138.

Abstract:
Aim. To analyze the concept of “artificial intelligence” and to justify the effectiveness of using artificial intelligence technologies. Tasks. To study the conceptual apparatus; to propose and justify the authors’ definition of the concept of “artificial intelligence”; to describe speech recognition technology based on artificial intelligence. Methodology. The authors used such general scientific methods of cognition as comparison, deduction and induction, analysis, generalization, and systematization. Results. Based on a comparative analysis of the existing conceptual apparatus, it is concluded that there is no single concept of “artificial intelligence”; each author puts his own vision into it. In this regard, the authors formulate their own definition of the concept. An important area of application for artificial intelligence technologies in various fields of activity is speech recognition. The first commercially successful speech recognition prototypes appeared by the 1990s, and since the beginning of the 21st century great interest in “end-to-end” automatic speech recognition has become evident. While traditional phonetic approaches required separate pronunciation, acoustic, and language models, end-to-end models consider all components of speech recognition jointly, thereby facilitating self-learning and development. A significant increase in the “mental” capabilities of computer technology and the development of new algorithms have led to new achievements in this direction, driven by the growing demand for speech recognition. Conclusions. According to the authors, artificial intelligence is a complex of computer programs that duplicate the functions of the human brain, opening up the possibility of informal learning based on big data processing and making it possible to solve pattern recognition problems (text, image, speech) and to form management decisions. The active development of information and communication technologies and artificial intelligence concepts has led to wide practical application of intelligent technologies, especially in control systems; the impact of these systems can be found in mobile phones and expert systems, in forecasting, and in other areas. Among the obstacles to the development of this technology is the limited accuracy of speech and voice recognition under the acoustic interference that is always present in real environments; however, recent advances are overcoming this disadvantage.

5. Hickt, L. "Speech and speaker recognition." Signal Processing 13, no. 3 (October 1987): 336–38. http://dx.doi.org/10.1016/0165-1684(87)90137-x.


6. Tseng, Juin-Ling. "Intelligent Augmented Reality System based on Speech Recognition." International Journal of Circuits, Systems and Signal Processing 15 (March 18, 2021): 178–86. http://dx.doi.org/10.46300/9106.2021.15.20.

Abstract:
In general, most current augmented reality (AR) systems can combine 3D virtual scenes with live reality, and users usually interact with the 3D objects of an AR system through image recognition. Although image-recognition technology has matured enough to allow users to interact with such systems, the interaction process is usually limited by the number of patterns used to identify the image, which is inconvenient. To provide a more flexible mode of interactive manipulation, this study imports a speech-recognition mechanism that allows users to operate 3D objects in an AR system simply by speaking. In terms of implementation, the program uses Unity3D as the main development environment and the AR e-Desk as the main development platform. The AR e-Desk interacts through the identification mechanism of reacTIVision and its markers. We use Unity3D to build the required 3D virtual scenes and objects in the AR e-Desk and import the Google Cloud Speech suite into the AR e-Desk system to develop the speech-interaction mechanism. In this way, the intelligent AR system is developed.
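
The interaction layer boils down to mapping a recognized transcript onto operations on a 3D object. Since the actual system is built in Unity3D with the Google Cloud Speech suite, the Python fragment below is only a sketch of the dispatch logic; the command phrases and the dictionary-backed scene object are invented for illustration.

```python
# Hypothetical command table: phrase -> action on a toy "3D object" state.
COMMANDS = {
    "rotate":   lambda obj: obj.update(yaw=obj["yaw"] + 15),
    "zoom in":  lambda obj: obj.update(scale=obj["scale"] * 1.2),
    "zoom out": lambda obj: obj.update(scale=obj["scale"] / 1.2),
}

def handle_transcript(transcript, obj):
    """Apply every command phrase found in the recognized transcript."""
    for phrase, action in COMMANDS.items():
        if phrase in transcript.lower():
            action(obj)

cube = {"yaw": 0, "scale": 1.0}
handle_transcript("please zoom in", cube)   # transcript from the ASR service
print(cube)                                 # {'yaw': 0, 'scale': 1.2}
```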

7. Puder, Henning, and Gerhard Schmidt. "Applied speech and audio processing." Signal Processing 86, no. 6 (June 2006): 1121–23. http://dx.doi.org/10.1016/j.sigpro.2005.07.034.


8. Phinyomark, Angkoon, Pornchai Phukpattaranont, and Chusak Limsakul. "Applications of Variance Fractal Dimension: A Survey." Fractals 22, no. 01n02 (March 2014): 1450003. http://dx.doi.org/10.1142/s0218348x14500030.

Abstract:
Chaotic dynamical systems are pervasive in nature and can be shown to be deterministic through fractal analysis. There are numerous methods that can be used to estimate the fractal dimension. Among the usual fractal estimation methods, variance fractal dimension (VFD) is one of the most significant fractal analysis methods that can be implemented for real-time systems. The basic concept and theory of VFD are presented. Recent research and the development of several applications based on VFD are reviewed and explained in detail, such as biomedical signal processing and pattern recognition, speech communication, geophysical signal analysis, power systems and communication systems. The important parameters that need to be considered in computing the VFD are discussed, including the window size and the window increment of the feature, and the step size of the VFD. Directions for future research of VFD are also briefly outlined.
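
As a minimal sketch of the estimator the survey builds on: the Hurst exponent H is half the log-log slope of the variance of signal increments versus lag, and for a 1-D signal the variance fractal dimension is D = 2 - H. The lag set, window size, and window increment below are illustrative choices of exactly the parameters the abstract says must be tuned.

```python
import numpy as np

def variance_fractal_dimension(x, lags=(1, 2, 4, 8, 16)):
    """VFD of a 1-D signal: D = 2 - H, with H estimated from the slope of
    log2 Var[x(n+k) - x(n)] against log2 k."""
    log_k, log_var = [], []
    for k in lags:
        v = np.var(x[k:] - x[:-k])            # variance of increments at lag k
        if v > 0:
            log_k.append(np.log2(k))
            log_var.append(np.log2(v))
    slope, _ = np.polyfit(log_k, log_var, 1)  # least-squares slope
    return 2.0 - slope / 2.0

def vfd_trajectory(x, win=1024, hop=256):
    """Sliding-window VFD, as used for real-time feature extraction."""
    return np.array([variance_fractal_dimension(x[i:i + win])
                     for i in range(0, len(x) - win + 1, hop)])
```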

9. Ujiie, Yuta, and Kohske Takahashi. "Weaker McGurk Effect for Rubin’s Vase-Type Speech in People With High Autistic Traits." Multisensory Research 34, no. 6 (April 16, 2021): 663–79. http://dx.doi.org/10.1163/22134808-bja10047.

Abstract:
While visual information from facial speech modulates auditory speech perception, it is less influential on audiovisual speech perception among autistic individuals than among typically developed individuals. In this study, we investigated the relationship between autistic traits (Autism-Spectrum Quotient; AQ) and the influence of visual speech on the recognition of Rubin’s vase-type speech stimuli with degraded facial speech information. Participants were 31 university students (13 males and 18 females; mean age: 19.2, SD: 1.13 years) who reported normal (or corrected-to-normal) hearing and vision. All participants completed three speech recognition tasks (visual, auditory, and audiovisual stimuli) and the AQ–Japanese version. The results showed that accuracies of speech recognition for visual (i.e., lip-reading) and auditory stimuli were not significantly related to participants’ AQ. In contrast, audiovisual speech perception was less susceptible to facial speech perception among individuals with high rather than low autistic traits. The weaker influence of visual information on audiovisual speech perception in autism spectrum disorder (ASD) was robust regardless of the clarity of the visual information, suggesting a difficulty in the process of audiovisual integration rather than in the visual processing of facial speech.

10. Chen, Qingcai, Xiaolong Wang, Pengfei Su, and Yi Yao. "Auto Adapted English Pronunciation Evaluation: A Fuzzy Integral Approach." International Journal of Pattern Recognition and Artificial Intelligence 22, no. 01 (February 2008): 153–68. http://dx.doi.org/10.1142/s0218001408006090.

Abstract:
To evaluate the pronunciation skills of spoken English is one of the key tasks for computer-aided spoken language learning (CALL). While most researchers focus on improving speech recognition techniques to build a reliable evaluation system, another important aspect of this task has been ignored, i.e. a pronunciation evaluation model that integrates both the reliabilities of existing speech processing systems and the learner's pronunciation personality. To take this aspect into consideration, a Sugeno integral-based evaluation model is introduced in this paper. First, the English phonemes that are hard to distinguish (HDP) for Chinese language learners are grouped into different HDP sets. Then, the system reliabilities for distinguishing the phonemes within an HDP set are computed from a standard speech corpus and are integrated with the phoneme recognition results under the Sugeno integral framework. The fuzzy measures are given for each subset of speech segments that contains n occurrences of phonemes within an HDP set. Rather than providing a quantity of scores, the model gives linguistic descriptions of the evaluation results, which is more helpful for users seeking to improve their spoken language skills. To obtain better performance, genetic algorithm (GA)-based parameter optimization is also applied to the model parameters. Experiments conducted on the Sphinx-4 speech recognition platform show that, with an 84.7% average recognition rate of the SR system on the standard speech corpus, our pronunciation evaluation model obtains reasonable and reliable results for three kinds of test corpora.
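
For readers unfamiliar with the fusion step, here is the textbook Sugeno integral in Python (the standard definition, not the paper's complete evaluation model): with the scores sorted in descending order, the integral is the maximum over i of min(h_(i), g(A_i)), where A_i is the set of the i highest-scoring sources. The toy fuzzy measure is an invented example.

```python
def sugeno_integral(scores, measure):
    """Sugeno fuzzy integral of `scores` with respect to the fuzzy measure
    `measure`, given as a dict from frozensets of source indices to [0, 1]."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best, subset = 0.0, set()
    for idx in order:                         # grow A_i by descending score
        subset.add(idx)
        best = max(best, min(scores[idx], measure[frozenset(subset)]))
    return best

# Toy measure over three phoneme-discrimination evidence sources.
g = {frozenset({0}): 0.4, frozenset({1}): 0.3, frozenset({2}): 0.5,
     frozenset({0, 1}): 0.6, frozenset({0, 2}): 0.8, frozenset({1, 2}): 0.7,
     frozenset({0, 1, 2}): 1.0}
print(sugeno_integral([0.9, 0.5, 0.7], g))    # -> 0.7
```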

11. Murthy, Hema A., and B. Yegnanarayana. "Speech processing using group delay functions." Signal Processing 22, no. 3 (March 1991): 259–67. http://dx.doi.org/10.1016/0165-1684(91)90014-a.


12. Nair, Nishanth Ulhas, and T. V. Sreenivas. "Multi-Pattern Viterbi Algorithm for joint decoding of multiple speech patterns." Signal Processing 90, no. 12 (December 2010): 3278–83. http://dx.doi.org/10.1016/j.sigpro.2010.05.006.


13. Becerra Yoma, Néstor, and Carlos Molina. "Feature-dependent compensation of coders in speech recognition." Signal Processing 86, no. 1 (January 2006): 38–49. http://dx.doi.org/10.1016/j.sigpro.2005.03.019.


14. Mporas, Iosif, Todor Ganchev, Otilia Kocsis, and Nikos Fakotakis. "Context-adaptive pre-processing scheme for robust speech recognition in fast-varying noise environment." Signal Processing 91, no. 8 (August 2011): 2101–11. http://dx.doi.org/10.1016/j.sigpro.2011.03.020.


15. Saitoh, Takeshi. "Research on multi-modal silent speech recognition technology." Impact 2018, no. 3 (June 15, 2018): 47–49. http://dx.doi.org/10.21820/23987073.2018.3.47.

Abstract:
We are all familiar with audio speech recognition technology for interfacing with smartphones and in-car computers, but technology that can interpret our speech signals without audio is a far greater challenge for scientists. Audio speech recognition (ASR) can only work in situations where there is little or no background noise and where speech is clearly enunciated. Other technologies that use visual signals to lip-read, or that use lip-reading in conjunction with degraded audio input, are under development. For situations where a person cannot speak or where the person's face may not be fully visible, silent speech recognition, which uses muscle movements or brain signals to decode speech, is also under development. Associate Professor Takeshi Saitoh's laboratory at the Kyushu Institute of Technology is at the forefront of visual speech recognition (VSR) and is collaborating with researchers worldwide to develop a range of silent speech recognition technologies. Saitoh, whose small team of researchers and students is supported by the Japan Society for the Promotion of Science (JSPS), says: 'The aim of our work is to achieve smooth and free communication in real time, without the need for audible speech.' The laboratory's VSR prototype is already performing at a high level. There are many reasons why scientists are working on speech technology that does not rely on audio. Saitoh points out: 'With an ageing population, more people will suffer from speech or hearing disabilities and would benefit from a means to communicate freely. This would vastly improve their quality of life and create employment opportunities.' Intelligent machines, controlled by human-machine interfaces, are also expected to become increasingly common in our lives, and non-audio speech recognition technology will be useful for interacting with smartphones, driverless cars, surveillance systems and smart appliances. VSR uses a modified camera, combined with image processing and pattern recognition, to convert the moving shapes made by the mouth into meaningful language. Earlier VSR technologies matched the shape of a still mouth with vowel sounds, and others correlated mouth shapes with a key input; however, these do not provide audio output in real time, so they cannot facilitate a smooth conversation. It is also vital that VSR is both easy to use and applicable to a range of situations, such as people bedridden in a supine position, settings with a degree of camera movement, or faces viewed in profile rather than full-frontal. Any reliable system should also be user-independent, in the sense that it will work with any skin colour and any face shape, and in spite of head movement.

16. Sánchez-García, Carolina, Sonia Kandel, Christophe Savariaux, and Salvador Soto-Faraco. "The Time Course of Audio-Visual Phoneme Identification: a High Temporal Resolution Study." Multisensory Research 31, no. 1-2 (2018): 57–78. http://dx.doi.org/10.1163/22134808-00002560.

Abstract:
Speech unfolds in time and, as a consequence, its perception requires temporal integration. Yet, studies addressing audio-visual speech processing have often overlooked this temporal aspect. Here, we address the temporal course of audio-visual speech processing in a phoneme identification task using a Gating paradigm. We created disyllabic Spanish word-like utterances (e.g., /pafa/, /paθa/, …) from high-speed camera recordings. The stimuli differed only in the middle consonant (/f/, /θ/, /s/, /r/, /g/), which varied in visual and auditory saliency. As in classical Gating tasks, the utterances were presented in fragments of increasing length (gates), here in 10 ms steps, for identification and confidence ratings. We measured correct identification as a function of time (at each gate) for each critical consonant in audio, visual and audio-visual conditions, and computed the Identification Point and Recognition Point scores. The results revealed that audio-visual identification is a time-varying process that depends on the relative strength of each modality (i.e., saliency). In some cases, audio-visual identification followed the pattern of one dominant modality (either A or V), when that modality was very salient. In other cases, both modalities contributed to identification, hence resulting in audio-visual advantage or interference with respect to unimodal conditions. Both unimodal dominance and audio-visual interaction patterns may arise within the course of identification of the same utterance, at different times. The outcome of this study suggests that audio-visual speech integration models should take into account the time-varying nature of visual and auditory saliency.
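
To make the Identification Point concrete, here is one common scoring rule (an assumption for illustration, not necessarily the authors' exact criterion): the IP is the first gate from which identification remains correct through the final gate.

```python
import numpy as np

def identification_point(correct_by_gate, gate_ms=10):
    """First gate after which responses stay correct; returns time in ms
    from utterance onset, or None if identification never stabilizes."""
    correct = np.asarray(correct_by_gate, dtype=bool)
    for g in range(len(correct)):
        if correct[g:].all():
            return (g + 1) * gate_ms
    return None

print(identification_point([0, 0, 1, 0, 1, 1, 1, 1]))  # -> 50
```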

17. Gabbouj, Moncef. "Speech production and speech modelling." Signal Processing 23, no. 2 (May 1991): 215. http://dx.doi.org/10.1016/0165-1684(91)90075-t.


18. Lu, Yuanyao, and Jie Yan. "Automatic Lip Reading Using Convolution Neural Network and Bidirectional Long Short-term Memory." International Journal of Pattern Recognition and Artificial Intelligence 34, no. 01 (May 24, 2019): 2054003. http://dx.doi.org/10.1142/s0218001420540038.

Abstract:
Traditional automatic lip-reading systems generally consist of two stages: feature extraction and recognition, while the handcrafted features are empirical and cannot learn the relevance of lip movement sequence sufficiently. Recently, deep learning approaches have attracted increasing attention, especially the significant improvements of convolution neural network (CNN) applied to image classification and long short-term memory (LSTM) used in speech recognition, video processing and text analysis. In this paper, we propose a hybrid neural network architecture, which integrates CNN and bidirectional LSTM (BiLSTM) for lip reading. First, we extract key frames from each isolated video clip and use five key points to locate mouth region. Then, features are extracted from raw mouth images using an eight-layer CNN. The extracted features have the characteristics of stronger robustness and fault-tolerant capability. Finally, we use BiLSTM to capture the correlation of sequential information among frame features in two directions and the softmax function to predict final recognition result. The proposed method is capable of extracting local features through convolution operations and finding hidden correlation in temporal information from lip image sequences. The evaluation results of lip-reading recognition experiments demonstrate that our proposed method outperforms conventional approaches such as active contour model (ACM) and hidden Markov model (HMM).
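
A minimal PyTorch sketch of the architecture family the abstract describes: a per-frame CNN encoder feeding a bidirectional LSTM, with a softmax classifier over isolated words. The layer sizes and input shapes are illustrative and do not reproduce the paper's eight-layer CNN.

```python
import torch
import torch.nn as nn

class LipReader(nn.Module):
    """CNN features per mouth-region frame, BiLSTM over the frame sequence."""
    def __init__(self, n_classes, feat_dim=256, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                    # per-frame encoder
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 4 * 4, feat_dim), nn.ReLU(),
        )
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True,
                           bidirectional=True)       # both time directions
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, frames):                       # (B, T, 1, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out[:, -1])                 # logits; softmax in loss

model = LipReader(n_classes=10)
logits = model(torch.randn(2, 16, 1, 64, 64))        # 2 clips, 16 key frames
```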

19. Di Persia, Leandro, Diego Milone, Hugo Leonardo Rufiner, and Masuzo Yanagida. "Perceptual evaluation of blind source separation for robust speech recognition." Signal Processing 88, no. 10 (October 2008): 2578–83. http://dx.doi.org/10.1016/j.sigpro.2008.04.006.


20. Weismer, Susan Ellis, Elena Plante, Maura Jones, and J. Bruce Tomblin. "A Functional Magnetic Resonance Imaging Investigation of Verbal Working Memory in Adolescents With Specific Language Impairment." Journal of Speech, Language, and Hearing Research 48, no. 2 (April 2005): 405–25. http://dx.doi.org/10.1044/1092-4388(2005/028).

Abstract:
This study used neuroimaging and behavioral techniques to examine the claim that processing capacity limitations underlie specific language impairment (SLI). Functional magnetic resonance imaging (fMRI) was used to investigate verbal working memory in adolescents with SLI and normal language (NL) controls. The experimental task involved a modified listening span measure that included sentence encoding and recognition of final words in prior sets of sentences. The SLI group performed significantly poorer than the NL group for both encoding and recognition and displayed slower reaction times for correct responses on high complexity encoding items. fMRI results revealed that the SLI group exhibited significant hypoactivation during encoding in regions that have been implicated in attentional and memory processes, as well as hypoactivation during recognition in regions associated with language processing. Correlational analyses indicated that adolescents with SLI exhibited different patterns of coordinating activation among brain regions relative to controls for both encoding and recognition, suggesting reliance on a less functional network. These findings are interpreted as supporting the notion that constraints in nonlinguistic systems play a role in SLI.

21. Gülzow, T., T. Ludwig, and U. Heute. "Spectral-subtraction speech enhancement in multirate systems with and without non-uniform and adaptive bandwidths." Signal Processing 83, no. 8 (August 2003): 1613–31. http://dx.doi.org/10.1016/s0165-1684(03)00080-x.


22. Biau, Emmanuel, Salvador Soto-Faraco, Ruth de Diego Balaguer, and Lluís Fuentemilla. "Spontaneous gestures modulate speech processing through phase resetting of delta–theta neural oscillations." Multisensory Research 26, no. 1-2 (2013): 65. http://dx.doi.org/10.1163/22134808-000s0043.


23. Hess, Wolfgang. "Speech enhancement." Signal Processing 12, no. 3 (April 1987): 331–33. http://dx.doi.org/10.1016/0165-1684(87)90104-6.


24. Arısoy, Ebru, Helin Dutağacı, and Levent M. Arslan. "A unified language model for large vocabulary continuous speech recognition of Turkish." Signal Processing 86, no. 10 (October 2006): 2844–62. http://dx.doi.org/10.1016/j.sigpro.2005.12.002.


25. Setiawan, Ariyono. "Pengenalan Bentuk dan Pola Suara bagi Anak Anak Penyandang Tuna Rungu" [Recognition of Voice Forms and Patterns for Deaf Children]. Jurnal Penelitian 3, no. 2 (June 4, 2018): 57–65. http://dx.doi.org/10.46491/jp.v3e2.38.57-65.

Abstract:
Inability to speak is a distinctive characteristic that sets deaf children apart from normal-hearing children. Children with normal hearing understand language through hearing in the months before they begin to talk. Speech recognition, i.e. recognition of voice patterns, for deaf children searches for the degree of fit and appropriateness of an utterance: it builds on techniques and systems that enable a computer to accept input in the form of spoken-word patterns and to map it to approximate word types so that it can be understood. This study proposes a method that utilizes biometrics to recognize the voice patterns of deaf children, which are then matched against the voice of a normal-hearing child. The biometric method used in digital signal processing (in this case, sound) is discrete biometrics, which refers to the automatic identification of humans by physiological or other basic characteristics of the human voice.

26. Bogach, Natalia, Elena Boitsova, Sergey Chernonog, Anton Lamtev, Maria Lesnichaya, Iurii Lezhenin, Andrey Novopashenny, et al. "Speech Processing for Language Learning: A Practical Approach to Computer-Assisted Pronunciation Teaching." Electronics 10, no. 3 (January 20, 2021): 235. http://dx.doi.org/10.3390/electronics10030235.

Abstract:
This article contributes to the discourse on how contemporary computer and information technology may help improve foreign language learning, not only by supporting better and more flexible workflows and digitizing study materials, but also by creating completely new use cases made possible by technological improvements in signal processing algorithms. We discuss an approach and propose a holistic solution to teaching the phonological phenomena that are crucial for correct pronunciation: the phonemes; the energy and duration of syllables and pauses, which construct the phrasal rhythm; and the tone movement within an utterance, i.e., the phrasal intonation. The working prototype of the StudyIntonation Computer-Assisted Pronunciation Training (CAPT) system is a tool for mobile devices which offers a set of tasks based on a “listen and repeat” approach and gives audio-visual feedback in real time. The present work summarizes the efforts taken to enrich the current version of this CAPT tool with two new functions: phonetic transcription and rhythmic patterns of model and learner speech. Both are built on the third-party automatic speech recognition (ASR) library Kaldi, which was incorporated into the StudyIntonation signal processing core. We also examine the scope of ASR applicability within the CAPT system workflow and evaluate the Levenshtein distance between transcriptions made by human experts and those obtained automatically by our code. We developed an algorithm for rhythm reconstruction using acoustic and language ASR models. It is also shown that, even with sufficiently correct phoneme production, learners often fail to produce correct phrasal rhythm and intonation; therefore, the joint training of sounds, rhythm and intonation within a single learning environment is beneficial. To mitigate recording imperfections, voice activity detection (VAD) is applied to all processed speech recordings. The try-outs showed that StudyIntonation can create transcriptions and process rhythmic patterns, although some specific problems with connected speech transcription were detected. The learner feedback for pronunciation assessment was also updated: a conventional mechanism based on dynamic time warping (DTW) was combined with a cross-recurrence quantification analysis (CRQA) approach, which resulted in better discriminating ability. The CRQA metrics combined with those of DTW were shown to add to the accuracy of learner performance estimation. The major implications for computer-assisted English pronunciation teaching are discussed.
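
The assessment step combines DTW with CRQA; the DTW half is standard and easy to sketch. Below is a plain dynamic time warping distance between two feature sequences (say, per-frame pitch values of model versus learner speech); the length normalization is a common convention, not necessarily the one used in StudyIntonation.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping alignment cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.atleast_1d(a[i - 1])
                                  - np.atleast_1d(b[j - 1]))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)        # length-normalized; lower = closer

# Lower score = learner contour closer to the model utterance.
print(dtw_distance([1, 2, 3, 2], [1, 1, 2, 3, 2]))
```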

27. Lee, Woojung, Ji-Hyun Song, and Joon-Hyuk Chang. "Minima-controlled speech presence uncertainty tracking method for speech enhancement." Signal Processing 91, no. 1 (January 2011): 155–61. http://dx.doi.org/10.1016/j.sigpro.2010.06.019.


28. Vlad, Mariana, and Sorin Vlad. "The Use of Machine Learning Techniques in Accounting. A Short Survey." Journal of Social Sciences 4, no. 3 (September 2021): 139–43. http://dx.doi.org/10.52326/jss.utm.2021.4(3).14.

Abstract:
Machine learning (ML) is a subset of artificial intelligence (AI) that aims to develop systems which learn and continuously improve their abilities through generalization in an autonomous manner. ML is presently all around us; almost every facet of our digital and real lives embeds some ML-related content. Customer recommendation systems, customer behavior prediction, fraud detection, speech recognition, image recognition, colorization of black-and-white movies, and accounting fraud detection are just some examples of the vast range of applications in which ML is involved. The techniques this paper investigates are mainly focused on the use of neural networks in accounting and finance research. An artificial neural network models the brain's ability to learn intricate patterns from the information presented at its inputs, using elementary interconnected units, named neurons, grouped in layers and trained by means of a learning algorithm. The performance of the network depends on many factors, such as the number of layers, the number of neurons in each layer, the learning algorithm, and the activation functions, to name just a few. Machine learning algorithms have already started to replace humans in jobs that require document processing and decision making.

29. Zhang, Chengqi, Ling Guan, and Zheru Chi. "Introduction to the Special Issue on Learning in Intelligent Algorithms and Systems Design." Journal of Advanced Computational Intelligence and Intelligent Informatics 3, no. 6 (December 20, 1999): 439–40. http://dx.doi.org/10.20965/jaciii.1999.p0439.

Abstract:
Learning has long been and will continue to be a key issue in intelligent algorithms and systems design. Emulating the behavior and mechanisms of human learning by machines, at levels as high as symbolic processing and as low as neuronal processing, has long been a dominant interest among researchers worldwide. Neural networks, fuzzy logic, and evolutionary algorithms represent the three most active research areas. With advanced theoretical studies and computer technology, many promising algorithms and systems using these techniques have been designed and implemented for a wide range of applications. This Special Issue presents seven papers on learning in intelligent algorithms and systems design from researchers in Japan, China, Australia, and the U.S.

Neural Networks: Emulating low-level human intelligent processing, or neuronal processing, gave birth to artificial neural networks more than five decades ago. It was hoped that devices based on biological neural networks would possess characteristics of the human brain. Neural networks have re-attracted researchers' attention since the late 1980s, when back-propagation algorithms were used to train multilayer feed-forward neural networks. In the last decades, promising progress in this research field has yielded many new models, learning algorithms, and real-world applications, evidenced by the publication of new journals in this field.

Fuzzy Logic: Since L. A. Zadeh introduced fuzzy set theory in 1965, fuzzy logic has increasingly become the focus of many researchers and engineers, opening up new research and problem solving. Fuzzy set theory has been favorably applied to control system design. In the last few years, fuzzy model applications have bloomed in image processing and pattern recognition.

Evolutionary Algorithms: Evolutionary optimization algorithms have been studied for over three decades, emulating the natural evolutionary search and selection that is so powerful in global optimization. The study of evolutionary algorithms includes evolutionary programming (EP), evolutionary strategies (ESs), genetic algorithms (GAs), and genetic programming (GP). In the last few years, multiple computational algorithms have also been combined to maximize system performance, such as neurofuzzy networks, fuzzy neural networks, fuzzy logic with genetic optimization, and neural networks with evolutionary algorithms. This Special Issue also includes papers that introduce such combined techniques.

Wang et al. present an improved fuzzy algorithm for enhancing eyeground images. Examination of the eyeground image is effective in diagnosing glaucoma and diabetes, but conventional eyeground image quality is usually too poor for doctors to obtain useful information, so enhancement is required. Due to the details and uncertainties in eyeground images, conventional enhancement such as histogram equalization, edge enhancement, and high-pass filtering fails to achieve good results. Fuzzy enhancement enhances images in three steps: (1) transferring an image from the spatial domain to the fuzzy domain; (2) conducting enhancement in the fuzzy domain; and (3) returning the image from the fuzzy domain to the spatial domain. The paper proposes improved mapping and fast implementation.

Mohammadian presents a method for designing self-learning hierarchical fuzzy logic control systems based on the integration of evolutionary algorithms and fuzzy logic. The purpose of such an approach is to provide an integrated knowledge base for intelligent control and collision avoidance in a multirobot system. Evolutionary algorithms are used for learning the fuzzy knowledge bases of the control systems and for learning the mapping and interaction between the fuzzy knowledge bases of different fuzzy logic systems.

The fuzzy integral has been found useful in data fusion. Pham and Wagner present an approach based on the fuzzy integral and GAs to combine the likelihood values of cohort speakers. The fuzzy integral nonlinearly fuses similarity measures of an utterance assigned to cohort speakers. In their approach, GAs find the optimal fuzzy densities required for fuzzy fusion. Experiments using the commercial TI46 speech corpus show that their approach achieves more favorable performance than conventional normalization.

Evolution reflects the behavior of a society. Puppala and Sen present a coevolutionary approach to generating behavioral strategies for cooperating agent groups. Agent behavior evolves via GAs, with one genetic algorithm population evolved per individual in the cooperative group. Groups are evaluated by pairing strategies from each population, and the best strategy pairs are stored together in shared memory. The approach is evaluated using asymmetric room painting, and the results demonstrate the superiority of shared memory over random pairing in consistently generating optimal behavior patterns.

Object representation and template optimization are two main factors affecting object recognition performance. Lu et al. present an evolutionary algorithm for optimizing handwritten numeral templates represented by rational B-spline surfaces of character foreground-background-distance distribution maps. Initial templates are extracted from training a feed-forward neural network, instead of using arbitrarily chosen patterns, to reduce the iterations required in evolutionary optimization. To further reduce computational complexity, a fast search is used in selection. Using 1,000 optimized numeral templates, the classifier achieves a classification rate of 96.4% while rejecting 90.7% of nonnumeral patterns when tested on NIST Special Database 3.

Determining an appropriate number of clusters is difficult yet important. Li et al. base their approach on rival penalized competitive learning (RPCL), addressing the problems of overlapped clusters and dependent components of input vectors by incorporating full covariance matrices into the original RPCL algorithm. The resulting learning algorithm progressively eliminates units whose clusters contain only a small amount of training data. The algorithm is applied to determine the number of clusters in a Gaussian mixture distribution and to optimize the architecture of elliptical function networks for speaker verification and vowel classification.

Another important issue in learning is addressed by Kurihara and Sugawara's adaptive reinforcement learning algorithm, which integrates exploitation-oriented and exploration-oriented learning. This algorithm is more robust in dynamically changing, large-scale environments, providing better performance than either exploitation-oriented or exploration-oriented learning alone, which makes it well suited for autonomous systems.

In closing, we would like to thank the authors who submitted papers to this Special Issue and express our appreciation to the referees for their excellent work in reading papers under a tight schedule.

30. Sameti, Hossein, and Li Deng. "Nonstationary-state hidden Markov model representation of speech signals for speech enhancement." Signal Processing 82, no. 2 (February 2002): 205–27. http://dx.doi.org/10.1016/s0165-1684(01)00179-7.


31. Skuratovskii, R., A. Bazarna, and E. Osadhyy. "Analysis of speech MEL scale and its classification as big data by parameterized KNN." Artificial Intelligence 26, jai2021.26(1) (June 30, 2021): 42–57. http://dx.doi.org/10.15407/jai2021.01.042.

Abstract:
Recognizing emotions in human speech has always been an exciting challenge for scientists. In our work, a parameterization of the feature vector is obtained from the sentence, which is divided into an emotionally loaded part and a part that carries only an informational load, and this division is applied effectively. The expressiveness of human speech is enhanced by the emotion it conveys. Several characteristics and features of speech differentiate utterances, i.e. various prosodic features such as pitch, timbre, loudness and vocal tone, which categorize speech into several emotions. We supplement them with a new classification feature of speech: the division of a sentence into an emotionally loaded part and a purely informational part. The sample speech therefore changes when it is subjected to various emotional environments. As identification of the speaker's emotional state can be done on the basis of the Mel scale, MFCC is one variant for studying the emotional aspects of a speaker's utterances. In this work, we implement a model that identifies several emotional states from MFCCs for two datasets, classify emotions for them on the basis of MFCC features, and give a corresponding comparison. Overall, this work implements a classification model based on dataset minimization, performed by taking the mean of the features, to improve the classification accuracy rate of different machine learning algorithms. In addition to the static analysis of the author's tonal portrait, which is used in particular in MFCC, we propose a new method for the dynamic analysis of a phrase, processed and studied as a new linguistic-emotional entity pronounced by the same author. By ranking the Mel-scale features by importance, we are able to parameterize the vector coordinates to be processed by the parameterized KNN method. Speech recognition is a multi-level pattern recognition task: acoustic signals are analyzed and structured into a hierarchy of structural elements, words, phrases, and sentences, and each level of such a hierarchy may provide temporal constraints, such as possible word sequences or known types of pronunciation, that reduce the number of recognition errors at a lower level. Analysis of voice and speech dynamics is appropriate for improving the quality of human perception and of the formation of human speech by a machine, and is within the capabilities of artificial intelligence. The emotion-recognition results can be widely applied in e-learning platforms, vehicle on-board systems, medicine, etc.
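
A compact sketch of the recipe the abstract outlines (per-utterance mean MFCC vectors classified by a tuned KNN), with librosa and scikit-learn assumed as stand-ins for the authors' implementation; the file names and labels are placeholders.

```python
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def mfcc_mean_vector(path, n_mfcc=13):
    """One fixed-length vector per utterance: the mean of each MFCC
    coefficient over time (the dataset-minimization step)."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

paths = ["happy_01.wav", "sad_01.wav", "happy_02.wav", "sad_02.wav"]  # placeholders
labels = ["happy", "sad", "happy", "sad"]
X = np.stack([mfcc_mean_vector(p) for p in paths])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.5)

# "Parameterized" KNN here just means the neighbour count and distance
# weighting are tuned; the paper's exact parameterization may differ.
clf = KNeighborsClassifier(n_neighbors=1, weights="distance").fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```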

32. Di Persia, Leandro, Masuzo Yanagida, Hugo Leonardo Rufiner, and Diego Milone. "Objective quality evaluation in blind source separation for speech recognition in a real room." Signal Processing 87, no. 8 (August 2007): 1951–65. http://dx.doi.org/10.1016/j.sigpro.2007.02.004.


33. Hansen, John H. L. "Analysis and compensation of stressed and noisy speech with application to robust automatic recognition." Signal Processing 17, no. 3 (July 1989): 282. http://dx.doi.org/10.1016/0165-1684(89)90010-8.


34. Huang, Kuo-Chang, Yau-Tarng Juang, and Wen-Chieh Chang. "Robust integration for speech features." Signal Processing 86, no. 9 (September 2006): 2282–88. http://dx.doi.org/10.1016/j.sigpro.2005.10.020.


35. Karjalainen, Matti. "Speech communication, human and machine." Signal Processing 15, no. 2 (September 1988): 217–18. http://dx.doi.org/10.1016/0165-1684(88)90074-6.


36. Hou, Yuchen, and Lawrence B. Holder. "On Graph Mining With Deep Learning: Introducing Model R for Link Weight Prediction." Journal of Artificial Intelligence and Soft Computing Research 9, no. 1 (January 1, 2019): 21–40. http://dx.doi.org/10.2478/jaiscr-2018-0022.

Abstract:
Deep learning has been successful in various domains including image recognition, speech recognition and natural language processing. However, the research on its application in graph mining is still in an early stage. Here we present Model R, a neural network model created to provide a deep learning approach to the link weight prediction problem. This model uses a node embedding technique that extracts node embeddings (knowledge of nodes) from the known links’ weights (relations between nodes) and uses this knowledge to predict the unknown links’ weights. We demonstrate the power of Model R through experiments and compare it with the stochastic block model and its derivatives. Model R shows that deep learning can be successfully applied to link weight prediction and it outperforms stochastic block model and its derivatives by up to 73% in terms of prediction accuracy. We analyze the node embeddings to confirm that closeness in embedding space correlates with stronger relationships as measured by the link weight. We anticipate this new approach will provide effective solutions to more graph mining tasks.
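
A hedged Keras sketch of the general model family described above: two node IDs pass through a shared embedding layer, and dense layers regress the link weight. The dimensions, depth, and random training data are illustrative, not the paper's Model R configuration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

n_nodes, emb_dim = 1000, 32
src = layers.Input(shape=(1,), dtype="int32")
dst = layers.Input(shape=(1,), dtype="int32")
emb = layers.Embedding(n_nodes, emb_dim)             # shared node embeddings
h = layers.Concatenate()([layers.Flatten()(emb(src)),
                          layers.Flatten()(emb(dst))])
h = layers.Dense(64, activation="relu")(h)
out = layers.Dense(1)(h)                             # predicted link weight
model = tf.keras.Model([src, dst], out)
model.compile(optimizer="adam", loss="mse")

# Train on known (source, target, weight) triples; random toy data here.
s = np.random.randint(0, n_nodes, 256)
d = np.random.randint(0, n_nodes, 256)
w = np.random.rand(256).astype("float32")
model.fit([s, d], w, epochs=1, verbose=0)
```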

37. Mitiche, Lahcène, Amel B. H. Adamou-Mitiche, and Daoud Berkani. "Low-order model for speech signals." Signal Processing 84, no. 10 (October 2004): 1805–11. http://dx.doi.org/10.1016/j.sigpro.2004.05.029.


38. Tan, Alan W. C., M. V. C. Rao, and B. S. Daya Sagar. "A composite signal subspace speech classifier." Signal Processing 87, no. 11 (November 2007): 2600–2606. http://dx.doi.org/10.1016/j.sigpro.2007.04.009.


39. Andre-Obrecht, Régine. "Automatic segmentation of continuous speech signals." Signal Processing 9, no. 1 (July 1985): 71. http://dx.doi.org/10.1016/0165-1684(85)90068-4.


40. Veinović, M. DJ, B. D. Kovačević, and M. M. Milosavljević. "Robust non-recursive AR speech analysis." Signal Processing 37, no. 2 (May 1994): 189–201. http://dx.doi.org/10.1016/0165-1684(94)90102-3.


41. Huang, Kuo-Chang, Shin-Lun Tung, and Yau-Tarng Juang. "A likelihood measure based on projection-based group delay scheme for Mandarin speech recognition in noise." Signal Processing 83, no. 3 (March 2003): 611–26. http://dx.doi.org/10.1016/s0165-1684(02)00496-6.


42. Su, Huan-yu. "Acoustic-phonetic recognition of continuous speech using vector quantization; Adaptation of the dictionary to a speaker." Signal Processing 14, no. 4 (June 1988): 388–89. http://dx.doi.org/10.1016/0165-1684(88)90099-0.


43. Ververidis, Dimitrios, and Constantine Kotropoulos. "Fast and accurate sequential floating forward feature selection with the Bayes classifier applied to speech emotion recognition." Signal Processing 88, no. 12 (December 2008): 2956–70. http://dx.doi.org/10.1016/j.sigpro.2008.07.001.


44. Pak, Junhyeong, Inyong Choi, Yu Gwang Jin, and Jong Won Shin. "Multichannel speech reinforcement based on binaural unmasking." Signal Processing 139 (October 2017): 165–72. http://dx.doi.org/10.1016/j.sigpro.2017.04.021.


45. Schnitzler, Jürgen, and Peter Vary. "Trends and perspectives in wideband speech coding." Signal Processing 80, no. 11 (November 2000): 2267–81. http://dx.doi.org/10.1016/s0165-1684(00)00116-x.


46. Jax, Peter, and Peter Vary. "On artificial bandwidth extension of telephone speech." Signal Processing 83, no. 8 (August 2003): 1707–19. http://dx.doi.org/10.1016/s0165-1684(03)00082-3.


47. Akdeniz, Rafet, and Siddik Yarman. "A novel method to represent speech signals." Signal Processing 85, no. 1 (January 2005): 37–50. http://dx.doi.org/10.1016/j.sigpro.2004.08.012.


48. Barros, Allan Kardec, and Noboru Ohnishi. "Single channel speech enhancement by efficient coding." Signal Processing 85, no. 9 (September 2005): 1805–12. http://dx.doi.org/10.1016/j.sigpro.2005.03.011.


49. Chang, Joon-Hyuk. "Perceptual weighting filter for robust speech modification." Signal Processing 86, no. 5 (May 2006): 1089–93. http://dx.doi.org/10.1016/j.sigpro.2005.07.025.


50. Tantibundhit, C., J. R. Boston, C. C. Li, J. D. Durrant, S. Shaiman, K. Kovacyk, and A. El-Jaroudi. "New signal decomposition method based speech enhancement." Signal Processing 87, no. 11 (November 2007): 2607–28. http://dx.doi.org/10.1016/j.sigpro.2007.04.014.
