Journal articles on the topic 'Speech processing systems; Speech synthesis'

Consult the top 50 journal articles for your research on the topic 'Speech processing systems; Speech synthesis.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Järvinen, Kari. "Digital speech processing: Speech coding, synthesis, and recognition." Signal Processing 30, no. 1 (1993): 133–34. http://dx.doi.org/10.1016/0165-1684(93)90056-g.

2

Tasbolatov, M., N. Mekebayev, O. Mamyrbayev, M. Turdalyuly, and D. Oralbekova. "Algorithms and architectures of speech recognition systems." Psychology and Education Journal 58, no. 2 (2021): 6497–501. http://dx.doi.org/10.17762/pae.v58i2.3182.

Abstract:
Digital processing of the speech signal and the voice recognition algorithm are very important for fast and accurate automatic speech recognition. A voice signal carries a great deal of information, which makes direct analysis and synthesis of the complex speech signal difficult. Speech is the most natural way for people to communicate, and the task of speech recognition is to convert speech into a sequence of words using a computer program. This article presents an algorithm for extracting MFCC features for speech recognition; the MFCC algorithm reduces the required processing power by 53% compared to the conventional algorithm. Automatic speech recognition is implemented in Matlab.
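Since the abstract centres on MFCC extraction, a minimal illustrative sketch of that feature pipeline is given below in Python using librosa; it is not the authors' Matlab implementation, and the file name, sample rate, and frame sizes are assumptions.

```python
# Illustrative MFCC feature extraction for speech recognition (not the
# authors' Matlab code). Requires: pip install librosa
import librosa

def extract_mfcc(wav_path, n_mfcc=13):
    """Load a speech recording and compute MFCC features frame by frame."""
    signal, sr = librosa.load(wav_path, sr=16000)            # resample to 16 kHz
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc,
                                n_fft=400, hop_length=160)   # 25 ms window, 10 ms hop
    return mfcc.T                                            # shape: (frames, n_mfcc)

# features = extract_mfcc("utterance.wav")  # "utterance.wav" is a placeholder
```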
3

Delic, Vlado, Darko Pekar, Radovan Obradovic, and Milan Secujski. "Speech signal processing in ASR&TTS algorithms." Facta universitatis - series: Electronics and Energetics 16, no. 3 (2003): 355–64. http://dx.doi.org/10.2298/fuee0303355d.

Abstract:
Speech signal processing and modeling in systems for continuous speech recognition and text-to-speech synthesis in the Serbian language are described in this paper. Both systems were fully developed by the authors and do not use any third-party software. The accuracy of the speech recognizer and the intelligibility of the TTS system are in the range of the best solutions in the world, and all conditions are met for commercial use of these solutions.
4

Varga, A., and F. Fallside. "A technique for using multipulse linear predictive speech synthesis in text-to-speech type systems." IEEE Transactions on Acoustics, Speech, and Signal Processing 35, no. 4 (1987): 586–87. http://dx.doi.org/10.1109/tassp.1987.1165151.

5

Delić, Vlado, Zoran Perić, Milan Sečujski, et al. "Speech Technology Progress Based on New Machine Learning Paradigm." Computational Intelligence and Neuroscience 2019 (June 25, 2019): 1–19. http://dx.doi.org/10.1155/2019/4368036.

Abstract:
Speech technologies have been developed for decades as a typical signal processing area, while the last decade has brought huge progress based on new machine learning paradigms. Owing not only to their intrinsic complexity but also to their relation with the cognitive sciences, speech technologies are now viewed as a prime example of an interdisciplinary knowledge area. This review article on speech signal analysis and processing, the corresponding machine learning algorithms, and applied computational intelligence aims to give an insight into several fields, covering speech production and auditory perception, cognitive aspects of speech communication and language understanding, both speech recognition and text-to-speech synthesis in more detail, and consequently the main directions in the development of spoken dialogue systems. Additionally, the article discusses the concepts and recent advances in speech signal compression, coding, and transmission, including cognitive speech coding. To conclude, the main intention of this article is to highlight recent achievements and challenges based on new machine learning paradigms that, over the last decade, have had an immense impact on the field of speech signal processing.
6

Sunitha, K. V. N., and P. Sunitha Devi. "Text Normalization for Telugu Text-to-Speech Synthesis." INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 11, no. 2 (2013): 2241–49. http://dx.doi.org/10.24297/ijct.v11i2.1176.

Abstract:
Most areas related to language and speech technology, directly or indirectly, require handling of unrestricted text, and text-to-speech systems in particular need to work on real text. To build a natural-sounding speech synthesis system, it is essential that the text processing component produce an appropriate sequence of phonemic units corresponding to an arbitrary input text. A novel approach is used, where the input text is tokenized and classification is done based on token type. Token sense disambiguation is achieved through the semantic nature of the language, and then expansion rules are applied to get the normalized text. However, not much work has been done on text normalization for Telugu. In this paper we discuss our efforts to design a rule-based system that achieves text normalization in the context of building a Telugu text-to-speech system.
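As a rough illustration of the tokenize-classify-expand pipeline described above, here is a toy rule-based normalizer in Python; the token classes and English digit expansion are stand-ins, since the paper's Telugu rules are not reproduced here.

```python
# Minimal sketch of rule-based text normalization: tokenize, classify each
# token, then expand non-standard tokens into words. English digits stand in
# for the paper's Telugu expansion rules.
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]

def classify(token):
    if re.fullmatch(r"\d+", token):
        return "NUMBER"
    if re.fullmatch(r"\d{1,2}:\d{2}", token):
        return "TIME"
    return "WORD"

def expand(token, token_type):
    if token_type == "NUMBER":
        return " ".join(ONES[int(d)] for d in token)     # digit-by-digit expansion
    if token_type == "TIME":
        hours, minutes = token.split(":")
        return f"{expand(hours, 'NUMBER')} {expand(minutes, 'NUMBER')}"
    return token

def normalize(text):
    return " ".join(expand(t, classify(t)) for t in text.split())

print(normalize("meeting at 10:30 in room 42"))
```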
7

Reddy, Bharathi, D. Leela Rani, and S. Varadarajan. "HIGH SPEED CARRY SAVE MULTIPLIER BASED LINEAR CONVOLUTION USING VEDIC MATHAMATICS." INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY 4, no. 2 (2013): 284–87. http://dx.doi.org/10.24297/ijct.v4i2a2.3173.

Abstract:
VLSI applications include digital signal processing, digital control systems, telecommunications, and speech and audio processing for audiology and speech-language pathology. The latest research in VLSI is the design and implementation of the DSP systems that are essential for the above applications. The fundamental computation in DSP systems is convolution; convolution and LTI systems are the heart and soul of DSP. The behavior of LTI systems in continuous time is described by the convolution integral, whereas the behavior in discrete time is described by linear convolution. In this paper, linear convolution is performed using a carry-save multiplier architecture based on the vertically and crosswise (Urdhva-Tiryagbhyam) algorithm of Vedic mathematics. Coding is done using Verilog HDL (Hardware Description Language), and simulation and synthesis are performed on a Xilinx FPGA.
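For reference, the discrete linear convolution that the proposed multiplier architecture accelerates is y[n] = sum over k of x[k]*h[n-k]; the Python sketch below is a software model of that multiply-accumulate structure, not the authors' Verilog design.

```python
# Reference model of linear convolution: the multiply-accumulate structure
# that the carry-save Vedic multiplier architecture implements in hardware.
import numpy as np

def linear_convolution(x, h):
    y = np.zeros(len(x) + len(h) - 1, dtype=np.int64)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]        # one multiply-accumulate per tap
    return y

x = np.array([1, 2, 3, 4])
h = np.array([1, 1, 1])
assert np.array_equal(linear_convolution(x, h), np.convolve(x, h))
```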
8

Chabchoub, Abdelkader, and Adnan Cherif. "Implementation of the Arabic Speech Synthesis with TD-PSOLA Modifier." International Journal of Signal System Control and Engineering Application 3, no. 4 (2010): 77–80. http://dx.doi.org/10.3923/ijssceapp.2010.77.80.

9

Modi, Rohan. "Transcript Anatomization with Multi-Linguistic and Speech Synthesis Features." International Journal for Research in Applied Science and Engineering Technology 9, no. VI (2021): 1755–58. http://dx.doi.org/10.22214/ijraset.2021.35371.

Abstract:
Handwriting detection is the ability of a computer program to collect and analyze comprehensible handwritten input from various types of media such as photographs, newspapers, paper reports, etc. Handwritten text recognition is a sub-discipline of pattern recognition, which refers to the classification of datasets or objects into various categories or classes. Handwriting recognition is the process of transforming handwritten text in a specific language into its digitally expressible script, represented by a set of symbols known as letters or characters. Speech synthesis is the artificial production of human speech using machine learning based software and audio-output computer hardware. While there are many systems that convert normal language text into speech, the aim of this paper is to study optical character recognition with speech synthesis technology and to develop a cost-effective, user-friendly, image-based offline text-to-speech conversion system using a CRNN neural network model and a Hidden Markov Model. The automated interpretation of text that has been written by hand can be very useful in various instances where processing of great amounts of handwritten data is required, such as signature verification, analysis of various types of documents, and recognition of amounts written on bank cheques by hand.
10

Thoidis, Iordanis, Lazaros Vrysis, Dimitrios Markou, and George Papanikolaou. "Temporal Auditory Coding Features for Causal Speech Enhancement." Electronics 9, no. 10 (2020): 1698. http://dx.doi.org/10.3390/electronics9101698.

Abstract:
Perceptually motivated audio signal processing and feature extraction have played a key role in the determination of high-level semantic processes and the development of emerging systems and applications, such as mobile phone telecommunication and hearing aids. In the era of deep learning, speech enhancement methods based on neural networks have seen great success, mainly operating on the log-power spectra. Although these approaches surpass the need for exhaustive feature extraction and selection, it is still unclear whether they target the important sound characteristics related to speech perception. In this study, we propose a novel set of auditory-motivated features for single-channel speech enhancement by fusing temporal envelope and temporal fine structure information in the context of vocoder-like processing. A causal gated recurrent unit (GRU) neural network is employed to recover the low-frequency amplitude modulations of speech. Experimental results indicate that the exploited system achieves considerable gains for normal-hearing and hearing-impaired listeners, in terms of objective intelligibility and quality metrics. The proposed auditory-motivated feature set achieved better objective intelligibility results compared to the conventional log-magnitude spectrogram features, while mixed results were observed for simulated listeners with hearing loss. Finally, we demonstrate that the proposed analysis/synthesis framework provides satisfactory reconstruction accuracy of speech signals.
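The envelope and temporal-fine-structure decomposition underlying such auditory-motivated features can be sketched with a Hilbert transform on a band-passed signal, as below; the band edges and sample rate are illustrative assumptions, not the paper's exact vocoder configuration.

```python
# Sketch of the envelope / temporal-fine-structure split: band-pass a signal,
# then separate the Hilbert envelope (slow amplitude modulations) from the
# fine structure. Filter band and sample rate are illustrative.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def envelope_and_tfs(x, fs, band=(300.0, 700.0)):
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    sub = sosfiltfilt(sos, x)                  # one auditory-like sub-band
    analytic = hilbert(sub)
    envelope = np.abs(analytic)                # temporal envelope
    tfs = np.cos(np.angle(analytic))           # temporal fine structure
    return envelope, tfs

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 500 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
env, tfs = envelope_and_tfs(x, fs)
```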
11

Musthofa, Musthofa. "COMPUTATIONAL LINGUISTICS (Model Baru Kajian Linguistik dalam Perspektif Komputer)." Adabiyyāt: Jurnal Bahasa dan Sastra 9, no. 2 (2010): 247. http://dx.doi.org/10.14421/ajbs.2010.09203.

Abstract:
This paper describes a new discipline in applied linguistics studies: computational linguistics. It is a new model of applied linguistics influenced by computer technology. Computational linguistics is a discipline straddling applied linguistics and computer science that is concerned with the computer processing of natural languages at all levels of linguistic description. Traditionally, computational linguistics was usually performed by computer scientists who had specialized in the application of computers to the processing of a natural language. Computational linguists often work as members of interdisciplinary teams, including linguists (specifically trained in linguistics), language experts (persons with some level of ability in the languages relevant to a given project), and computer scientists. The areas of computational linguistics study encompass such practical applications as speech recognition systems, speech synthesis, automated voice response systems, web search engines, text editors, grammar checking, text-to-speech, corpus linguistics, machine translation, text data mining, and others. This paper presents the definition of computational linguistics, the relation between language and computers, and the areas of computational linguistics studies.
12

Soic, Renato, Marin Vukovic, and Gordan Jezic. "Spoken notifications in smart environments using Croatian language." Computer Science and Information Systems, no. 00 (2020): 36. http://dx.doi.org/10.2298/csis200424036s.

Abstract:
Speech technologies have advanced significantly in the last decade, mostly due to the rise in available computing power combined with novel approaches to natural language processing. As a result, speech-enabled systems have become popular commercial products, successfully integrated with various environments. However, this can be stated for English and a few other "big" languages. From the perspective of a minority language, such as Croatian, there are many challenges ahead to achieve comparable results. In this paper, we propose a model for natural language generation and speech synthesis in a smart environment using the Croatian language. The model is evaluated with 27 users to estimate the quality of the user experience. The evaluation goal was to determine what users perceive to be more important: generated speech quality or grammatical correctness of the spoken content. It is shown that most users perceived grammatically correct spoken texts as being of the highest quality.
13

Jokisch, Oliver, and Markus Huber. "Advances in the development of a cognitive user interface." MATEC Web of Conferences 161 (2018): 01003. http://dx.doi.org/10.1051/matecconf/201816101003.

Abstract:
In this contribution, we summarize recent development steps of the embedded cognitive user interface UCUI, which enables a user-adaptive scenario in human-machine or even human-robot interaction by incorporating sophisticated cognitive and semantic modelling. The interface prototype is developed by different German institutes and companies, with their steering teams at Fraunhofer IKTS and Brandenburg University of Technology. The prototype is able to communicate with users via speech and gesture recognition, speech synthesis and a touch display. The device includes autarkic semantic processing and, beyond that, a cognitive behavior control, which supports intuitive interaction to control different kinds of electronic devices, e.g. in a smart home environment or in interactive and collaborative robotics. Contrary to available speech assistance systems such as Amazon Echo or Google Home, the introduced cognitive user interface UCUI ensures user privacy by processing all necessary information without any network access by the interface device.
14

Pradeep, R., M. Kiran Reddy, and K. Sreenivasa Rao. "LSTM-Based Robust Voicing Decision Applied to DNN-Based Speech Synthesis." Automatic Control and Computer Sciences 53, no. 4 (2019): 328–32. http://dx.doi.org/10.3103/s0146411619040096.

15

Jacob, Agnes, and P. Mythili. "Developing a Child Friendly Text-to-Speech System." Advances in Human-Computer Interaction 2008 (2008): 1–6. http://dx.doi.org/10.1155/2008/597971.

Abstract:
This paper discusses the implementation details of a child friendly, good quality, English text-to-speech (TTS) system that is phoneme-based, concatenative, easy to set up and use with little memory. Direct waveform concatenation and linear prediction coding (LPC) are used. Most existing TTS systems are unit-selection based, which use standard speech databases available in neutral adult voices. Here reduced memory is achieved by the concatenation of phonemes and by replacing phonetic wave files with their LPC coefficients. Linguistic analysis was used to reduce the algorithmic complexity instead of signal processing techniques. Sufficient degree of customization and generalization catering to the needs of the child user had been included through the provision for vocabulary and voice selection to suit the requisites of the child. Prosody had also been incorporated. This inexpensive TTS system was implemented in MATLAB, with the synthesis presented by means of a graphical user interface (GUI), thus making it child friendly. This can be used not only as an interesting language learning aid for the normal child but it also serves as a speech aid to the vocally disabled child. The quality of the synthesized speech was evaluated using the mean opinion score (MOS).
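The memory-saving step described above (storing LPC coefficients for each phoneme instead of its waveform) can be sketched as follows; this illustration uses librosa and scipy rather than the authors' MATLAB code, and the synthetic test signal and predictor order are assumptions.

```python
# Sketch: represent a stored phoneme unit by its LPC coefficients and
# resynthesise it with the all-pole filter. A synthetic vowel-like signal
# stands in for a recorded phoneme; in a real system the residual would be
# replaced by a compact excitation signal.
import numpy as np
import librosa
from scipy.signal import lfilter

sr = 8000
t = np.arange(sr // 10) / sr                           # 100 ms placeholder "phoneme"
y = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)

a = librosa.lpc(y, order=12)                           # a[0] == 1, predictor coefficients
residual = lfilter(a, [1.0], y)                        # inverse (analysis) filtering
reconstructed = lfilter([1.0], a, residual)            # all-pole synthesis filter
print(float(np.max(np.abs(y - reconstructed))))        # ~0 when the full residual is kept
```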
16

Spanias, Andreas S. "A hybrid transform method for analysis/synthesis of speech." Signal Processing 24, no. 2 (1991): 217–29. http://dx.doi.org/10.1016/0165-1684(91)90132-3.

17

Wen, Zhengqi, Jianhua Tao, Shifeng Pan, and Yang Wang. "Pitch-Scaled Spectrum Based Excitation Model for HMM-based Speech Synthesis." Journal of Signal Processing Systems 74, no. 3 (2013): 423–35. http://dx.doi.org/10.1007/s11265-013-0862-z.

18

Savran, Arman, Levent M. Arslan, and Lale Akarun. "Speaker-independent 3D face synthesis driven by speech and text." Signal Processing 86, no. 10 (2006): 2932–51. http://dx.doi.org/10.1016/j.sigpro.2005.12.007.

19

Geçkini, Nezih C., Tülay Güngen, Hilmi Güngen, and Mehmet Etişkol. "Speech synthesis using AM/FM sinusoids and band-pass noise." Signal Processing 8, no. 3 (1985): 339–61. http://dx.doi.org/10.1016/0165-1684(85)90111-2.

20

Zhang, Hao, Richard Sproat, Axel H. Ng, et al. "Neural Models of Text Normalization for Speech Applications." Computational Linguistics 45, no. 2 (2019): 293–337. http://dx.doi.org/10.1162/coli_a_00349.

Abstract:
Machine learning, including neural network techniques, has been applied to virtually every domain in natural language processing. One problem that has been somewhat resistant to effective machine learning solutions is text normalization for speech applications such as text-to-speech synthesis (TTS). In this application, one must decide, for example, that 123 is verbalized as one hundred twenty three in 123 pages but as one twenty three in 123 King Ave. For this task, state-of-the-art industrial systems depend heavily on hand-written language-specific grammars. We propose neural network models that treat text normalization for TTS as a sequence-to-sequence problem, in which the input is a text token in context, and the output is the verbalization of that token. We find that the most effective model, in accuracy and efficiency, is one where the sentential context is computed once and the results of that computation are combined with the computation of each token in sequence to compute the verbalization. This model allows for a great deal of flexibility in terms of representing the context, and also allows us to integrate tagging and segmentation into the process. These models perform very well overall, but occasionally they will predict wildly inappropriate verbalizations, such as reading 3 cm as three kilometers. Although rare, such verbalizations are a major issue for TTS applications. We thus use finite-state covering grammars to guide the neural models, either during training and decoding, or just during decoding, away from such “unrecoverable” errors. Such grammars can largely be learned from data.
21

El-Imam, Yousif A. "Synthesis of the intonation of neutrally spoken Modern Standard Arabic speech." Signal Processing 88, no. 9 (2008): 2206–21. http://dx.doi.org/10.1016/j.sigpro.2008.03.013.

22

Lévy-Véhel, Jacques. "Fractal Approaches in Signal Processing." Fractals 3, no. 4 (1995): 755–75. http://dx.doi.org/10.1142/s0218348x95000679.

Abstract:
Some recent advances in the application of fractal tools for studying complex signals are presented. The first part of the paper is devoted to a brief description of the theoretical methods used. These essentially consist of generalizations of previous techniques that allow us to efficiently handle real signals. We present some results dealing with the multifractal analysis of sequences of Choquet capacities, and the possibility of constructing such capacities with prescribed spectrum. Related results concerning the pointwise irregularity of a continuous function at each point are given in the frame of iterated functions systems. Finally, some results on a particular stochastic process are sketched: the multifractional Brownian motion, which is a generalization of the classical fractional Brownian motion, where the parameter H is replaced by a function. The second part consists of the description of selected applications of current interest, in the fields of image analysis, speech synthesis and road traffic modeling. In each case we try to show how a fractal approach provides new means to solve specific problems in signal processing, sometimes with greater success than classical methods.
23

Balasa, Florin, Frank H. M. Franssen, Francky V. M. Catthoor, and Hugo J. De Man. "Transformation of Nested Loops with Modulo Indexing to Affine Recurrences." Parallel Processing Letters 4, no. 3 (1994): 271–80. http://dx.doi.org/10.1142/s0129626494000260.

Abstract:
For multi-dimensional (M-D) signal and data processing systems, transformation of algorithmic specifications is a major instrument both in code optimization and code generation for parallelizing compilers and in control flow optimization as a preprocessor for architecture synthesis. State-of-the-art transformation techniques are limited to affine index expressions. This is however not sufficient for many important applications in image, speech and numerical processing. In this paper, a novel transformation method is introduced, oriented to the subclass of algorithm specifications that contains modulo expressions of affine functions to index M-D signals. The method employs extensively the concept of Hermite normal form. The transformation method can be carried out in polynomial time, applying only integer arithmetic.
24

Železný, Miloš, Zdeněk Krňoul, Petr Císař, and Jindřich Matoušek. "Design, implementation and evaluation of the Czech realistic audio-visual speech synthesis." Signal Processing 86, no. 12 (2006): 3657–73. http://dx.doi.org/10.1016/j.sigpro.2006.02.039.

25

Chen, Fei, and Yuan-Ting Zhang. "A novel temporal fine structure-based speech synthesis model for cochlear implant." Signal Processing 88, no. 11 (2008): 2693–99. http://dx.doi.org/10.1016/j.sigpro.2008.05.011.

26

Yolchuyeva, Sevinj, Géza Németh, and Bálint Gyires-Tóth. "Grapheme-to-Phoneme Conversion with Convolutional Neural Networks." Applied Sciences 9, no. 6 (2019): 1143. http://dx.doi.org/10.3390/app9061143.

Abstract:
Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form. It has a highly essential role for natural language processing, text-to-speech synthesis and automatic speech recognition systems. In this paper, we investigate convolutional neural networks (CNN) for G2P conversion. We propose a novel CNN-based sequence-to-sequence (seq2seq) architecture for G2P conversion. Our approach includes an end-to-end CNN G2P conversion with residual connections and, furthermore, a model that utilizes a convolutional neural network (with and without residual connections) as encoder and Bi-LSTM as a decoder. We compare our approach with state-of-the-art methods, including Encoder-Decoder LSTM and Encoder-Decoder Bi-LSTM. Training and inference times, phoneme and word error rates were evaluated on the public CMUDict dataset for US English, and the best performing convolutional neural network-based architecture was also evaluated on the NetTalk dataset. Our method approaches the accuracy of previous state-of-the-art results in terms of phoneme error rate.
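To make the model family concrete, here is a heavily simplified PyTorch sketch of a convolutional grapheme encoder producing per-position phoneme scores; the paper's actual models are full sequence-to-sequence architectures with residual connections and a Bi-LSTM decoder, and the alphabet sizes below are placeholders.

```python
# Toy convolutional G2P encoder: character embeddings -> stacked 1-D
# convolutions -> per-position phoneme logits. A simplification of the
# CNN-based sequence-to-sequence models discussed in the paper.
import torch
import torch.nn as nn

class ConvG2P(nn.Module):
    def __init__(self, n_graphemes=28, n_phonemes=40, dim=64):
        super().__init__()
        self.embed = nn.Embedding(n_graphemes, dim)
        self.conv = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.out = nn.Linear(dim, n_phonemes)

    def forward(self, chars):                   # chars: (batch, seq_len) int64
        x = self.embed(chars).transpose(1, 2)   # (batch, dim, seq_len)
        x = self.conv(x).transpose(1, 2)        # (batch, seq_len, dim)
        return self.out(x)                      # phoneme logits per character

letters = "abcdefghijklmnopqrstuvwxyz"
word = torch.tensor([[letters.index(c) + 1 for c in "speech"]])  # 0 = padding
logits = ConvG2P()(word)                        # shape: (1, 6, 40)
```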
27

Kaeslin, Hubert. "Systematic extraction and concatenation of diphone elements for the synthesis of standard German speech." Signal Processing 9, no. 1 (1985): 67. http://dx.doi.org/10.1016/0165-1684(85)90066-0.

28

AbuZeina, Dia, and Taqieddin Mostafa Abdalbaset. "Exploring the Performance of Tagging for the Classical and the Modern Standard Arabic." Advances in Fuzzy Systems 2019 (January 23, 2019): 1–10. http://dx.doi.org/10.1155/2019/6254649.

Abstract:
Part-of-speech (PoS) tagging is a core component in many natural language processing (NLP) applications. In fact, PoS taggers contribute as a preprocessing step in various NLP tasks, such as syntactic parsing, information extraction, machine translation, and speech synthesis. In this paper, we examine the performance of a modern standard Arabic (MSA) based tagger for classical (i.e., traditional or historical) Arabic. In this work, we employed the Stanford Arabic model tagger to evaluate the imperative verbs in the Holy Quran. The Stanford tagger contains 29 tags; however, this work experimentally evaluates just one of them, VB, the imperative verb. The testing set contains 741 imperative verbs, which appear in 1,848 positions in the Holy Quran. Despite the previously reported accuracy of the Arabic model of the Stanford tagger, which is 96.26% for all tags and 80.14% for unknown words, the experimental results show that this accuracy is only 7.28% for the imperative verbs. This result highlights the need for further research to explain why the tagging is severely inaccurate for classical Arabic. The performance decline might indicate the necessity of distinguishing between training data for classical Arabic and MSA in NLP tasks.
29

Musiek, Frank E., Jennifer Shinn, Gail D. Chermak, and Doris-Eva Bamiou. "Perspectives on the Pure-Tone Audiogram." Journal of the American Academy of Audiology 28, no. 07 (2017): 655–71. http://dx.doi.org/10.3766/jaaa.16061.

Abstract:
The pure-tone audiogram, though fundamental to audiology, presents limitations, especially in the case of central auditory involvement. Advances in auditory neuroscience underscore the considerably larger role of the central auditory nervous system (CANS) in hearing and related disorders. Given the availability of behavioral audiological tests and electrophysiological procedures that can provide better insights as to the function of the various components of the auditory system, this perspective piece reviews the limitations of the pure-tone audiogram and notes some of the advantages of other tests and procedures used in tandem with the pure-tone threshold measurement. To review and synthesize the literature regarding the utility and limitations of the pure-tone audiogram in determining dysfunction of peripheral sensory and neural systems, as well as the CANS, and to identify other tests and procedures that can supplement pure-tone thresholds and provide enhanced diagnostic insight, especially regarding problems of the central auditory system. A systematic review and synthesis of the literature. The authors independently searched and reviewed literature (journal articles, book chapters) pertaining to the limitations of the pure-tone audiogram. The pure-tone audiogram provides information as to hearing sensitivity across a selected frequency range. Normal or near-normal pure-tone thresholds sometimes are observed despite cochlear damage. There are a surprising number of patients with acoustic neuromas who have essentially normal pure-tone thresholds. In cases of central deafness, depressed pure-tone thresholds may not accurately reflect the status of the peripheral auditory system. Listening difficulties are seen in the presence of normal pure-tone thresholds. Suprathreshold procedures and a variety of other tests can provide information regarding other and often more central functions of the auditory system. The audiogram is a primary tool for determining type, degree, and configuration of hearing loss; however, it provides the clinician with information regarding only hearing sensitivity, and no information about central auditory processing or the auditory processing of real-world signals (i.e., speech, music). The pure-tone audiogram offers limited insight into functional hearing and should be viewed only as a test of hearing sensitivity. Given the limitations of the pure-tone audiogram, a brief overview is provided of available behavioral tests and electrophysiological procedures that are sensitive to the function and integrity of the central auditory system, which provide better diagnostic and rehabilitative information to the clinician and patient.
30

Venkatesh, Satvik, David Moffat, and Eduardo Reck Miranda. "Investigating the Effects of Training Set Synthesis for Audio Segmentation of Radio Broadcast." Electronics 10, no. 7 (2021): 827. http://dx.doi.org/10.3390/electronics10070827.

Abstract:
Music and speech detection provides valuable information regarding the nature of content in broadcast audio. It helps detect acoustic regions that contain speech, voice over music, only music, or silence. In recent years, there have been developments in machine learning algorithms to accomplish this task. However, broadcast audio is generally well-mixed and copyrighted, which makes it challenging to share across research groups. In this study, we address the challenges encountered in automatically synthesising data that resembles a radio broadcast. First, we compare state-of-the-art neural network architectures such as CNN, GRU, LSTM, TCN, and CRNN. Second, we investigate how audio ducking of background music impacts the precision and recall of the machine learning algorithm. Third, we examine how the quantity of synthetic training data impacts the results. Finally, we evaluate the effectiveness of synthesised, real-world, and combined approaches for training models, to understand whether the synthetic data presents any additional value. Amongst the network architectures, CRNN was the best performing network. Results also show that the minimum level of audio ducking preferred by the machine learning algorithm was similar to that of human listeners. After testing our model on in-house and public datasets, we observe that our proposed synthesis technique outperforms real-world data in some cases and serves as a promising alternative.
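The audio-ducking step studied in the paper amounts to attenuating the music bed by a fixed number of decibels before mixing it under speech; a minimal sketch follows, with placeholder file names and an assumed ducking level.

```python
# Sketch of synthesising a radio-like training example: the background music
# is "ducked" (attenuated) by a fixed amount, mixed with speech, and the
# result is normalised to avoid clipping.
import numpy as np
import soundfile as sf

def mix_with_ducking(speech, music, duck_db=-12.0):
    n = min(len(speech), len(music))
    speech, music = speech[:n], music[:n]
    gain = 10.0 ** (duck_db / 20.0)                      # dB -> linear gain for the music bed
    mixture = speech + gain * music
    return mixture / max(1.0, float(np.max(np.abs(mixture))))

# speech, fs = sf.read("speech.wav")                     # placeholder files
# music, _ = sf.read("music.wav")
# sf.write("mixture.wav", mix_with_ducking(speech, music, duck_db=-18.0), fs)
```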
31

Tonks, James, W. Huw Williams, Ian Frampton, Philip Yates, and Alan Slater. "The Neurological Bases of Emotional Dys-Regulation Arising From Brain Injury in Childhood: A ‘When and Where’ Heuristic." Brain Impairment 8, no. 2 (2007): 143–53. http://dx.doi.org/10.1375/brim.8.2.143.

Abstract:
Lasting emotional and social communication deficits are common among children who have suffered brain injury. Concerns have been raised that current assessment and treatment methods are inadequate in addressing the needs of such children in rehabilitation. We advocate that a proportion of reported deficits occur as a result of compromise to emotion processing systems in the brain. In this article we review adult brain injury research, which indicates that dissociable subsystems are involved in distinguishing the nuances of emotional expression. Findings previously reported in the literature have been integrated into a dissociable heuristic framework, which offers a novel representation of subcomponents of the emotion processing system. In considering the development of the subcomponents of emotion processing, evidence indicates that intrinsic arousal systems are operational from birth, systems associated with sensory/spatial skills that are essential in reading emotional expression develop rapidly from birth, and systems utilised in executive system synthesis become increasingly sophisticated with development, stemming across childhood and into adulthood. In conclusion, it is proposed that the heuristic is a useful tool on which assessment measures may be based when considering the primary effects of brain injury in children.
32

Makarych, M. V., Yu. B. Popova, and M. O. Shved. "Electronic Lexicography: Traditional and Modern Approaches." Science & Technique 19, no. 5 (2020): 421–27. http://dx.doi.org/10.21122/2227-1031-2020-19-5-421-427.

Abstract:
Nowadays there are many modern technologies in electronic lexicography: speech synthesis technology, cross-referencing between dictionary modules, spell-checking functions, etc. The increasing availability of online information has necessitated intensive research in the area of automatic text summarization within the natural language processing community. Belarusian scientists are also interested in this sphere, and new lexicographical approaches for creating a linguistic database are shown in the paper. The authors present the English-Belarusian-Russian electronic dictionary TechLex, a project of the 2nd English Department and the Department of Software for Information Systems and Technologies of the Belarusian National Technical University. The linguistic database of the dictionary is compiled not by the traditional method of processing a large number of paper dictionaries and combining the received translations, but by sequential processing of scientific and technical English-language periodicals. While designing the dictionary, the authors took into account an analysis of modern electronic multilingual translation dictionaries and created a client-server application in the Java programming language. The client part of the system contains a mobile application for the Android operating system, which has been tested on tablets and smartphones with different screen diagonals. The interface of the TechLex dictionary is designed to allow new subject areas to be added and filled with appropriate lexical material. The main advantage of the dictionary is that it is the first technical multilingual electronic dictionary with a Belarusian version.
33

Lochlainn, Mícheál Mac. "Sintéiseoir 1.0: a multidialectical TTS application for Irish." ReCALL 22, no. 2 (2010): 152–71. http://dx.doi.org/10.1017/s0958344010000054.

Abstract:
This paper details the development of a multidialectical text-to-speech (TTS) application, Sintéiseoir, for the Irish language. This work is being carried out in the context of Irish as a lesser-used language, where learners and other L2 speakers have limited direct exposure to L1 speakers and speech communities, and where native sound systems and vocabularies can be seen to be receding even among L1 speakers – particularly the young. Sintéiseoir essentially implements the diphone concatenation model, albeit augmented to include phones, half-phones and, potentially, other phonic units. It is based on a platform-independent framework comprising a user interface, a set of dialect-specific tokenisation engines, a concatenation engine and a playback device. The tokenisation strategy is entirely rule-based and does not refer to dictionary look-ups. Provision has been made for prosodic processing in the framework but has not yet been implemented. Concatenation units are stored in the form of WAV files on the local file system. Sintéiseoir’s user interface (UI) provides a text field that allows the user to submit a grapheme string for synthesis and a prompt to select a dialect. It also filters input to reject graphotactically invalid strings, restrict input to alphabetic and certain punctuation marks found in Irish orthography, and ensure that a dialect has, indeed, been selected. The UI forwards the filtered grapheme string to the appropriate tokenisation engine. This searches for specified substrings and maps them to corresponding tokens that themselves correspond to concatenation units. The resultant token string is then forwarded to the concatenation engine, which retrieves the relevant concatenation units, extracts their audio data and combines them in a new unit. This is then forwarded to the playback device. The terms of reference for the initial development of Sintéiseoir specified that it should be capable of uttering, individually, the 99 most common Irish lemmata in the dialects of An Spidéal, Músgraí Uí Fhloínn and Gort a’ Choirce, which are internally consistent dialects within the Connacht, Munster and Ulster regions, respectively, of the dialect continuum. Audio assets to satisfy this requirement have already been prepared, and have been found to produce reasonably accurate output. The tokenisation engine is, however, capable of processing a wider range of input strings and, when required concatenation units are found to be unavailable, returns a report via the user interface.
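The concatenation engine described above can be reduced to a very small sketch: fetch the stored WAV unit for each token and join the samples. The directory layout, token names and dialect label below are illustrative assumptions, not Sintéiseoir's actual asset scheme.

```python
# Minimal sketch of a diphone-concatenation step: look up the stored unit
# files for a token string and join their samples into one waveform.
import numpy as np
import soundfile as sf

def concatenate_units(tokens, dialect, unit_dir="units"):
    samples, rate = [], None
    for token in tokens:
        audio, sr = sf.read(f"{unit_dir}/{dialect}/{token}.wav")  # placeholder layout
        rate = rate or sr
        samples.append(audio)
    return np.concatenate(samples), rate

# waveform, sr = concatenate_units(["d-i", "i-a", "a-#"], dialect="connacht")
# sf.write("output.wav", waveform, sr)
```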
34

Childers, Donald G., and Jose A. Diaz. "Speech Processing and Synthesis Toolboxes." Journal of the Acoustical Society of America 108, no. 5 (2000): 1975. http://dx.doi.org/10.1121/1.1318896.

35

Dasarathy, Belur V. "Robust speech processing." Information Fusion 5, no. 2 (2004): 75. http://dx.doi.org/10.1016/j.inffus.2004.02.002.

36

Stansfield, E. V. "Electronic Speech Synthesis." IEE Proceedings F Communications, Radar and Signal Processing 132, no. 2 (1985): 127. http://dx.doi.org/10.1049/ip-f-1.1985.0029.

37

Kitawaki, N., and H. Nagabuchi. "Quality assessment of speech coding and speech synthesis systems." IEEE Communications Magazine 26, no. 10 (1988): 36–44. http://dx.doi.org/10.1109/35.7665.

38

Thompson, Laura A., and William C. Ogden. "Visible speech improves human language understanding: Implications for speech processing systems." Artificial Intelligence Review 9, no. 4-5 (1995): 347–58. http://dx.doi.org/10.1007/bf00849044.

39

Taylor, H. Rosemary. "Book Review: Speech Synthesis and Recognition Systems, Speech Synthesis and Recognition." International Journal of Electrical Engineering & Education 26, no. 4 (1989): 366. http://dx.doi.org/10.1177/002072098902600409.

40

Al-Moslmi, Tareq, Mohammed Albared, Adel Al-Shabi, Nazlia Omar, and Salwani Abdullah. "Arabic senti-lexicon: Constructing publicly available language resources for Arabic sentiment analysis." Journal of Information Science 44, no. 3 (2017): 345–62. http://dx.doi.org/10.1177/0165551516683908.

Abstract:
Sentiment analysis is held to be one of the highly dynamic recent research fields in Natural Language Processing, facilitated by the quickly growing volume of Web opinion data. Most of the approaches in this field are focused on English due to the lack of sentiment resources in other languages such as the Arabic language and its large variety of dialects. In most sentiment analysis applications, good sentiment resources play a critical role. Based on that, in this article, several publicly available sentiment analysis resources for Arabic are introduced. This article introduces the Arabic senti-lexicon, a list of 3880 positive and negative synsets annotated with their part of speech, polarity scores, dialects synsets and inflected forms. This article also presents a Multi-domain Arabic Sentiment Corpus (MASC) with a size of 8860 positive and negative reviews from different domains. In this article, an in-depth study has been conducted on five types of feature sets for exploiting effective features and investigating their effect on performance of Arabic sentiment analysis. The aim is to assess the quality of the developed language resources and to integrate different feature sets and classification algorithms to synthesise a more accurate sentiment analysis method. The Arabic senti-lexicon is used for generating feature vectors. Five well-known machine learning algorithms: naïve Bayes, k-nearest neighbours, support vector machines (SVMs), logistic linear regression and neural network are employed as base-classifiers for each of the feature sets. A wide range of comparative experiments on standard Arabic data sets were conducted, discussion is presented and conclusions are drawn. The experimental results show that the Arabic senti-lexicon is a very useful resource for Arabic sentiment analysis. Moreover, results show that classifiers which are trained on feature vectors derived from the corpus using the Arabic sentiment lexicon are more accurate than classifiers trained using the raw corpus.
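In the spirit of the feature-set experiments described above, the sketch below derives a simple lexicon-count feature vector and trains a naive Bayes classifier with scikit-learn; the miniature English lexicon and reviews are placeholders for the 3,880-synset Arabic senti-lexicon and the MASC corpus.

```python
# Toy version of lexicon-based features feeding one of the base classifiers
# evaluated in the paper (naive Bayes). The lexicon and reviews are placeholders.
from sklearn.naive_bayes import MultinomialNB

LEXICON = {"good": 1.0, "excellent": 1.0, "bad": -1.0, "terrible": -1.0}

def features(text):
    tokens = text.lower().split()
    positive = sum(1 for t in tokens if LEXICON.get(t, 0) > 0)
    negative = sum(1 for t in tokens if LEXICON.get(t, 0) < 0)
    return [positive, negative]

reviews = ["good excellent service", "terrible bad food", "excellent food", "bad service"]
labels = [1, 0, 1, 0]                                   # 1 = positive, 0 = negative
clf = MultinomialNB().fit([features(r) for r in reviews], labels)
print(clf.predict([features("good food")]))             # -> [1]
```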
41

Hara, Yoshiyuki, and Tsuneo Nitta. "Text-to-speech synthesis with controllable processing time and speech quality." Journal of the Acoustical Society of America 102, no. 6 (1997): 3251. http://dx.doi.org/10.1121/1.419571.

42

Inbanila, K., and E. Krishnakumar. "Investigation of Speech Synthesis, Speech Processing Techniques and Challenges for Enhancements." Journal of Computational and Theoretical Nanoscience 16, no. 4 (2019): 1581–92. http://dx.doi.org/10.1166/jctn.2019.8079.

43

Scott, Sophie K., and Carolyn McGettigan. "The neural processing of masked speech." Hearing Research 303 (September 2013): 58–66. http://dx.doi.org/10.1016/j.heares.2013.05.001.

44

Adiga, Nagaraj, and S. R. M. Prasanna. "Speech synthesis for glottal activity region processing." International Journal of Speech Technology 22, no. 1 (2018): 79–91. http://dx.doi.org/10.1007/s10772-018-09583-5.

45

Moon, Todd K., Jacob H. Gunther, Cortnie Broadus, Wendy Hou, and Nils Nelson. "Turbo Processing for Speech Recognition." IEEE Transactions on Cybernetics 44, no. 1 (2014): 83–91. http://dx.doi.org/10.1109/tcyb.2013.2247593.

46

Kai, Atsuhiko, and Seiichi Nakagawa. "Comparison of continuous speech recognition systems with unknown-word processing for speech disfluencies." Systems and Computers in Japan 29, no. 9 (1998): 43–53. http://dx.doi.org/10.1002/(sici)1520-684x(199808)29:9<43::aid-scj5>3.0.co;2-j.

47

Lee, Chong R., and Yong K. Park. "Speech segment coding and pitch control methods for speech synthesis systems." Journal of the Acoustical Society of America 102, no. 6 (1997): 3251. http://dx.doi.org/10.1121/1.420238.

48

Kuligowska, Karolina, Paweł Kisielewicz, and Aleksandra Włodarz. "Speech synthesis systems: disadvantages and limitations." International Journal of Engineering & Technology 7, no. 2.28 (2018): 234. http://dx.doi.org/10.14419/ijet.v7i2.28.12933.

Abstract:
The present speech synthesis systems can be successfully used for a wide range of diverse purposes. However, there are serious and important limitations in using various synthesizers. Many of these problems can be identified and resolved. The aim of this paper is to present the current state of development of speech synthesis systems and to examine their drawbacks and limitations. The paper discusses the current classification, construction and functioning of speech synthesis systems, which gives an insight into synthesizers implemented so far. The analysis of disadvantages and limitations of speech synthesis systems focuses on identification of weak points of these systems, namely: the impact of emotions and prosody, spontaneous speech in terms of naturalness and intelligibility, preprocessing and text analysis, problem of ambiguity, natural sounding, adaptation to the situation, variety of systems, sparsely spoken languages, speech synthesis for older people, and some other minor limitations. Solving these problems stimulates further development of speech synthesis domain.
49

Jamieson, Donald G., Vijay Parsa, Moneca C. Price, and James Till. "Interaction of Speech Coders and Atypical Speech II." Journal of Speech, Language, and Hearing Research 45, no. 4 (2002): 689–99. http://dx.doi.org/10.1044/1092-4388(2002/055).

Abstract:
We investigated how standard speech coders, currently used in modern communication systems, affect the quality of the speech of persons who have common speech and voice disorders. Three standardized speech coders (GSM 6.10 RPELTP, FS1016 CELP, and FS1015 LPC) and two speech coders based on subband processing were evaluated for their performance. Coder effects were assessed by measuring the quality of speech samples both before and after processing by the speech coders. Speech quality was rated by 10 listeners with normal hearing on 28 different scales representing pitch and loudness changes, speech rate, laryngeal and resonatory dysfunction, and coder-induced distortions. Results showed that (a) nine scale items were consistently and reliably rated by the listeners; (b) all coders degraded speech quality on these nine scales, with the GSM and CELP coders providing the better quality speech; and (c) interactions between coders and individual voices did occur on several voice quality scales.
50

FUNAKOSHI, KOTARO, TAKENOBU TOKUNAGA, and HOZUMI TANAKA. "Processing Japanese Self-correction in Speech Dialog Systems." Journal of Natural Language Processing 10, no. 4 (2003): 33–53. http://dx.doi.org/10.5715/jnlp.10.4_33.
