
Journal articles on the topic 'Speech pause detection'



Consult the top 36 journal articles for your research on the topic 'Speech pause detection.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Esposito, Anna, Vojtěch Stejskal, and Zdeněk Smékal. "Cognitive Role of Speech Pauses and Algorithmic Considerations for Their Processing." International Journal of Pattern Recognition and Artificial Intelligence 22, no. 05 (August 2008): 1073–88. http://dx.doi.org/10.1142/s0218001408006508.

Abstract:
This study investigates pausing strategies, focusing on empty speech pauses. A cross-modal (video and audio) analysis of spontaneous narratives produced by male and female children and adults showed that a remarkable number of empty speech pauses was used to signal new concepts in the speech flow and to segment discourse units such as clauses and paragraphs. Based on these results, an adaptive mathematical model of pause distribution was proposed that exploits, as pause features, the absence of signal and/or changes of energy along different acoustic dimensions strongly related to auditory perception. These considerations inspired the formulation and implementation of two pause detection procedures that proved more effective than the Likelihood Ratio Test (LRT) and Long-Term Spectral Divergence (LTSD) algorithms recently proposed in the literature for Voice Activity Detection (VAD).
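
The energy-based pause cues described above translate naturally into code. The following is a minimal, hypothetical sketch (not the authors' algorithm): it flags frames whose short-time log energy falls well below the signal peak and merges sufficiently long runs of such frames into pauses; frame size and thresholds are illustrative.

```python
import numpy as np

def detect_empty_pauses(signal, sr, frame_ms=20, floor_db=-40.0, min_pause_ms=200):
    """Flag low-energy frames, then merge long runs into pause segments."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    # Short-time log energy, compared against the loudest frame.
    energy = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    low = energy < energy.max() + floor_db
    pauses, start = [], None
    for i, is_low in enumerate(np.append(low, False)):
        if is_low and start is None:
            start = i
        elif not is_low and start is not None:
            if (i - start) * frame_ms >= min_pause_ms:
                pauses.append((start * frame_len / sr, i * frame_len / sr))
            start = None
    return pauses  # list of (start_s, end_s)

sr = 16000
t = np.arange(3 * sr) / sr
sig = np.sin(2 * np.pi * 220 * t)
sig[sr:2 * sr] *= 0.001                 # one second of near-silence
print(detect_empty_pauses(sig, sr))     # -> [(1.0, 2.0)] approximately
```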
2

Toth, Laszlo, Ildiko Hoffmann, Gabor Gosztolya, Veronika Vincze, Greta Szatloczki, Zoltan Banreti, Magdolna Pakaski, and Janos Kalman. "A Speech Recognition-based Solution for the Automatic Detection of Mild Cognitive Impairment from Spontaneous Speech." Current Alzheimer Research 15, no. 2 (January 3, 2018): 130–38. http://dx.doi.org/10.2174/1567205014666171121114930.

Abstract:
Background: Even today, the reliable diagnosis of the prodromal stages of Alzheimer's disease (AD) remains a great challenge. Our research focuses on the earliest detectable indicators of cognitive decline in mild cognitive impairment (MCI). Since the presence of language impairment has been reported even in the mild stage of AD, the aim of this study is to develop a sensitive neuropsychological screening method based on the analysis of spontaneous speech production while performing a memory task. In the future, this can form the basis of an Internet-based interactive screening software for the recognition of MCI. Methods: Participants were 38 healthy controls and 48 clinically diagnosed MCI patients. Spontaneous speech was provoked by asking the patients to recall the content of 2 short black-and-white films (one immediate, one delayed), and by answering one question. Acoustic parameters (hesitation ratio, speech tempo, length and number of silent and filled pauses, length of utterance) were extracted from the recorded speech signals, first manually (using the Praat software), and then automatically, with an automatic speech recognition (ASR) based tool. First, the extracted parameters were statistically analyzed. Then we applied machine learning algorithms to see whether the MCI and the control group could be discriminated automatically based on the acoustic features. Results: The statistical analysis showed significant differences for most of the acoustic parameters (speech tempo, articulation rate, silent pause, hesitation ratio, length of utterance, pause-per-utterance ratio). The most significant differences between the two groups were found in the speech tempo in the delayed recall task, and in the number of pauses for the question-answering task. The fully automated version of the analysis process, that is, using the ASR-based features in combination with machine learning, was able to separate the two classes with an F1-score of 78.8%. Conclusion: The temporal analysis of spontaneous speech can be exploited in implementing a new, automatic detection-based tool for screening MCI in the community.
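
The temporal parameters listed here are straightforward to compute once silent pauses have been located. A hedged sketch of that feature-extraction step follows; the exact definitions (e.g. of the hesitation ratio) are illustrative, not necessarily the authors':

```python
import numpy as np

def temporal_features(pauses, utterance_s, n_syllables):
    """Pause-based features for one recording; `pauses` is a list of
    (start_s, end_s) silent-pause intervals."""
    pause_time = sum(e - s for s, e in pauses)
    return np.array([
        pause_time / utterance_s,           # hesitation ratio (illustrative)
        n_syllables / utterance_s,          # speech tempo, syllables per second
        len(pauses),                        # number of silent pauses
        pause_time / max(len(pauses), 1),   # mean pause length in seconds
    ])

# Toy recordings: (pause intervals, utterance length in s, syllable count).
control = ([(1.2, 1.5), (4.0, 4.4)], 30.0, 140)
patient = ([(0.8, 2.1), (5.0, 6.2), (9.3, 10.0)], 30.0, 95)
for name, rec in [("control", control), ("patient", patient)]:
    print(name, temporal_features(*rec))
```

Feature rows of this kind can then be fed to any standard classifier, mirroring the machine-learning step of the paper.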
3

Mattys, Sven L., and Jamie H. Clark. "Lexical activity in speech processing: evidence from pause detection." Journal of Memory and Language 47, no. 3 (October 2002): 343–59. http://dx.doi.org/10.1016/s0749-596x(02)00037-2.

4

Hamzah, Raseeda, Nursuriati Jamil, and Rosniza Roslan. "Development of Acoustical Feature Based Classifier Using Decision Fusion Technique for Malay Language Disfluencies Classification." Indonesian Journal of Electrical Engineering and Computer Science 8, no. 1 (October 1, 2017): 262. http://dx.doi.org/10.11591/ijeecs.v8.i1.pp262-267.

Abstract:
Speech disfluency such as the filled pause (FP) is a hindrance in automatic speech recognition, as it degrades accuracy. Previous work on FP detection and classification has fused a number of acoustical features, since fusion is known to improve classification results. This paper presents a new decision fusion of two well-established acoustical features, zero crossing rate (ZCR) and speech envelope (ENV), with eight popular acoustical features for the classification of Malay-language filled pauses (FP) and elongations (ELO). Five hundred ELO and 500 FP samples were selected from spontaneous speeches of a parliamentary session, and a Naïve Bayes classifier was used for the decision fusion classification. The proposed feature fusion produced better classification performance than single-feature classification, with a highest F-measure of 82% for both classes.
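
Both base features are simple to compute per frame, and the decision fusion itself can be as plain as averaging classifier posteriors. A minimal sketch under those assumptions (toy data; not the authors' feature set or fusion rule):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def zcr(frame):
    """Zero-crossing rate: fraction of adjacent samples changing sign."""
    return np.mean(np.abs(np.diff(np.sign(frame))) > 0)

def envelope(frame):
    """Crude speech-envelope proxy: mean absolute amplitude."""
    return np.mean(np.abs(frame))

rng = np.random.default_rng(1)
frames = rng.normal(size=(1000, 400))        # toy stand-ins for FP/ELO segments
y = rng.integers(0, 2, size=1000)            # 0 = FP, 1 = ELO (toy labels)
X_a = np.array([[zcr(f), envelope(f)] for f in frames])   # ZCR + ENV
X_b = rng.normal(size=(1000, 8))             # stand-in for the 8 other features

# Decision fusion (illustrative): average the two models' posteriors.
clf_a = GaussianNB().fit(X_a[:800], y[:800])
clf_b = GaussianNB().fit(X_b[:800], y[:800])
fused = (clf_a.predict_proba(X_a[800:]) + clf_b.predict_proba(X_b[800:])) / 2
pred = fused.argmax(axis=1)
```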
5

Marzinzik, M., and B. Kollmeier. "Speech pause detection for noise spectrum estimation by tracking power envelope dynamics." IEEE Transactions on Speech and Audio Processing 10, no. 2 (2002): 109–18. http://dx.doi.org/10.1109/89.985548.

6

Beritelli, Francesco, Salvatore Casale, and Salvatore Serrano. "A low-complexity speech-pause detection algorithm for communication in noisy environments." European Transactions on Telecommunications 15, no. 1 (January 2004): 33–38. http://dx.doi.org/10.1002/ett.943.

7

Holzgrefe-Lang, Julia, Caroline Wellmann, Barbara Höhle, and Isabell Wartenburger. "Infants’ Processing of Prosodic Cues: Electrophysiological Evidence for Boundary Perception beyond Pause Detection." Language and Speech 61, no. 1 (September 22, 2017): 153–69. http://dx.doi.org/10.1177/0023830917730590.

Abstract:
Infants as young as six months are sensitive to prosodic phrase boundaries marked by three acoustic cues: pitch change, final lengthening, and pause. Behavioral studies suggest that a language-specific weighting of these cues develops during the first year of life; recent work on German revealed that eight-month-olds, unlike six-month-olds, are capable of perceiving a prosodic boundary on the basis of pitch change and final lengthening only. The present study uses Event-Related Potentials (ERPs) to investigate the neuro-cognitive development of prosodic cue perception in German-learning infants. In adults’ ERPs, prosodic boundary perception is clearly reflected by the so-called Closure Positive Shift (CPS). To date, there is mixed evidence on whether an infant CPS exists that signals early prosodic cue perception, or whether the CPS emerges only later—the latter implying that infantile brain responses to prosodic boundaries reflect acoustic, low-level pause detection. We presented six- and eight-month-olds with stimuli containing either no boundary cues, only a pitch cue, or a combination of both pitch change and final lengthening. For both age groups, responses to the former two conditions did not differ, while brain responses to prosodic boundaries cued by pitch change and final lengthening showed a positivity that we interpret as a CPS-like infant ERP component. This hints at an early sensitivity to prosodic boundaries that cannot exclusively be based on pause detection. Instead, infants’ brain responses indicate an early ability to exploit subtle, relational prosodic cues in speech perception—presumably even earlier than could be concluded from previous behavioral results.
8

Raso, Tommaso, Bárbara Teixeira, and Plínio Barbosa. "Modelling Automatic Detection of Prosodic Boundaries for Brazilian Portuguese Spontaneous Speech." Journal of Speech Sciences 9 (September 9, 2020): 105–28. http://dx.doi.org/10.20396/joss.v9i00.14957.

Abstract:
Speech is segmented into intonational units marked by prosodic boundaries. This segmentation is claimed to have important consequences for syntax, information structure and cognition. This work aims both to investigate the phonetic-acoustic parameters that guide the production and perception of prosodic boundaries, and to develop models for the automatic detection of prosodic boundaries in male monological spontaneous speech in Brazilian Portuguese. Two samples were segmented into intonational units by two groups of trained annotators. The boundaries perceived by the annotators were tagged as either terminal or non-terminal. A script was used to extract 111 phonetic-acoustic parameters along the speech signal in right and left windows around the boundary of each phonological word. The extracted parameters comprise measures of (1) speech rate and rhythm; (2) standardized segment duration; (3) fundamental frequency; (4) intensity; (5) silent pause. The script considers as a prosodic boundary any position at which at least 50% of the annotators indicated a boundary of the same type. Models composed of the parameters extracted by the script were trained and then improved heuristically. The models were developed from the two samples and from the whole dataset, using both non-balanced and balanced data. The Linear Discriminant Analysis algorithm was adopted to produce the models. The models for terminal boundaries show a much higher performance than those for non-terminal ones. In this paper we: (i) describe the methodological procedures; (ii) analyze the different models; (iii) discuss some strategies that could lead to an improvement of our results.
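
Since the paper's classifier is Linear Discriminant Analysis over per-boundary feature vectors, the training step has a compact generic form. The sketch below uses random numbers in place of the 111 extracted parameters, purely to show the shape of the computation:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))            # one row per phonological-word boundary
y = (rng.random(500) < 0.2).astype(int)   # 1 where >= 50% of annotators agreed

lda = LinearDiscriminantAnalysis().fit(X[:400], y[:400])
print("held-out accuracy:", lda.score(X[400:], y[400:]))
```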
9

Mertens, Piet. "Polytonia." Journal of Speech Sciences 4, no. 2 (February 5, 2021): 17–57. http://dx.doi.org/10.20396/joss.v4i2.15053.

Abstract:
This paper first proposes a labeling scheme for tonal aspects of speech and then describes an automatic annotation system using this transcription. This fine-grained transcription provides labels indicating the pitch level and pitch movement of individual syllables. Of the five pitch levels, three (low, mid, high) are defined on the basis of pitch changes in the local context and two (bottom, top) are defined relative to the boundaries of the speaker's global pitch range. For pitch movements, both simple and compound, the transcription indicates direction (rise, fall, level) and size, using size categories (pitch intervals) adjusted relative to the speaker's pitch range. The automatic tonal annotation system combines several processing steps: segmentation into syllable peaks, pause detection, pitch stylization, pitch range estimation, classification of the intra-syllabic pitch contour, and pitch level assignment. It uses a dedicated, rule-based procedure which, unlike commonly used supervised learning techniques, does not require a labeled corpus for training the model. The paper also includes a preliminary evaluation of the annotation system, on a reference corpus of nearly 14 minutes of spontaneous speech in French and Dutch, in order to quantify the annotation errors. The results, expressed in terms of the standard measures of precision, recall, accuracy and F-measure, are encouraging. For the pitch levels low, mid and high an F-measure between 0.946 and 0.815 is obtained, and for pitch movements a value between 0.708 and 1. Provided that additional modules for the detection of prominence and prosodic boundaries are added, the resulting annotation may serve as input for a phonological annotation.
10

Biron, Tirza, Daniel Baum, Dominik Freche, Nadav Matalon, Netanel Ehrmann, Eyal Weinreb, David Biron, and Elisha Moses. "Automatic detection of prosodic boundaries in spontaneous speech." PLOS ONE 16, no. 5 (May 3, 2021): e0250969. http://dx.doi.org/10.1371/journal.pone.0250969.

Abstract:
Automatic speech recognition (ASR) and natural language processing (NLP) are expected to benefit from an effective, simple, and reliable method to automatically parse conversational speech. The ability to parse conversational speech depends crucially on the ability to identify boundaries between prosodic phrases. This is done naturally by the human ear, yet has proved surprisingly difficult to achieve reliably and simply in an automatic manner. Efforts to date have focused on detecting phrase boundaries using a variety of linguistic and acoustic cues. We propose a method which does not require model training and utilizes two prosodic cues that are based on ASR output. Boundaries are identified using discontinuities in speech rate (pre-boundary lengthening and phrase-initial acceleration) and silent pauses. The resulting phrases preserve syntactic validity, exhibit pitch reset, and compare well with manual tagging of prosodic boundaries. Collectively, our findings support the notion of prosodic phrases that represent coherent patterns across textual and acoustic parameters.
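
Because the method needs no model training, it can be paraphrased as a short rule over ASR word timings. The following sketch is a simplified reading of the idea (thresholds and the lengthening test are invented for illustration):

```python
import statistics

def prosodic_boundaries(words, min_pause=0.15, lengthen=1.8):
    """`words`: (word, start_s, end_s) triples from ASR output. Mark a
    boundary after word i when a silent pause follows it or the word is
    strongly lengthened relative to the median duration."""
    durs = [end - start for _, start, end in words]
    med = statistics.median(durs)
    return [i for i in range(len(words) - 1)
            if words[i + 1][1] - words[i][2] >= min_pause
            or durs[i] > lengthen * med]

words = [("so", 0.00, 0.18), ("we", 0.20, 0.32), ("left", 0.34, 0.95),
         ("and", 1.40, 1.52), ("then", 1.54, 1.70)]
print(prosodic_boundaries(words))   # boundary after "left" -> [2]
```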
11

O’Malley, Ronan, Lee-Anne Morris, Chloe Longden, Alex Turner, Traci Walker, Annalena Venneri, Bahman Mirheidari, Heidi Christensen, Markus Reuber, and Daniel Blackburn. "26 Can an automated assessment of language help distinguish between Functional Cognitive Disorder and early neurodegeneration?" Journal of Neurology, Neurosurgery & Psychiatry 91, no. 8 (July 20, 2020): e18.2-e19. http://dx.doi.org/10.1136/jnnp-2020-bnpa.43.

Abstract:
Objectives/Aims: We used our automated cognitive assessment tool to explore whether responses to questions probing recent and remote memory could aid in distinguishing between patients with early neurodegenerative disorders and those with Functional Cognitive Disorders (FCD). Hypotheses: pwFCD would have no significant differences in pause-to-speech ratio and measures of linguistic complexity compared to healthy controls; pwFCD would have significant differences in pause-to-speech ratio and measures of linguistic complexity compared to pwMCI and pwAD. Methods: We recruited 15 participants each with FCD, MCI and AD, as well as 15 healthy controls. Participants answered 12 questions posed by the 'Digital Doctor'. Automatic processing of the audio-recorded answers involved automatic speech recognition, including detecting the length of pauses. Two questions probe recent memory, exploring knowledge of current affairs. Two probe remote memory, asking for autobiographical details. We analysed the data using: the pause-to-speech time ratio; the moving average type token ratio (MATTR), an automated measure of vocabulary richness; and the computerised propositional idea density rater (CPIDR), an automated measure of propositional idea density. Results: There was a significant difference in the pause-to-speech ratio for recent memory questions for HC versus AD (p=0.0012) and MCI (p<0.0001), but also compared to those with FCD (p=0.0128). There was a significant difference in the pause-to-speech ratio for remote memory questions for HC vs AD (p=0.0008) and MCI (p=0.0049) but not FCD (p=0.0613). There was no significant difference between FCD vs AD or FCD vs MCI. The MATTR and CPIDR were similar across all groups but highest in HC and FCD. Conclusions: This study rejects both hypotheses. However, the data support the application of linguistic measures to recent and remote memory questions in distinguishing those with MCI and AD from HCs. Further work will investigate the utility of incorporating additional measures of lexical and grammatical complexity (word frequency, sentence structure). Longitudinal study will provide insights into which features may predict stability in FCD and HCs and progression from MCI to AD, supporting the system's promise as a monitoring tool.
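
Of the three measures, MATTR has a particularly simple definition: the type/token ratio averaged over a sliding window, which makes it robust to transcript length. A standard implementation (window size illustrative):

```python
def mattr(tokens, window=50):
    """Moving-average type-token ratio over a sliding window."""
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    ttrs = [len(set(tokens[i:i + window])) / window
            for i in range(len(tokens) - window + 1)]
    return sum(ttrs) / len(ttrs)

print(mattr("the cat sat on the mat and the dog sat too".split(), window=5))
```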
12

O’Shaughnessy, Douglas. "Detecting filled pauses in spontaneous speech." Journal of the Acoustical Society of America 106, no. 4 (October 1999): 2181–82. http://dx.doi.org/10.1121/1.427284.

13

Santamaría, Jesús, and Lourdes Araujo. "Pattern-based unsupervised parsing method." Natural Language Engineering 22, no. 3 (June 4, 2014): 397–422. http://dx.doi.org/10.1017/s1351324914000072.

Abstract:
We have developed a heuristic method for unsupervised parsing of unrestricted text. Our method relies on detecting certain patterns of part-of-speech tag sequences of words in sentences. This detection is based on statistical data obtained from the corpus and allows us to classify part-of-speech tags into classes that play specific roles in the parse trees. These classes are then used to construct the parse tree of new sentences via a set of deterministic rules. Aiming to assess the viability of the method on different languages, we have tested it on English, Spanish, Italian, Hebrew, German, and Chinese. We have obtained a significant improvement over other unsupervised approaches for some languages, including English, and provided, as far as we know, the first results of this kind for others.
14

Verkhodanova, Vasilisa, and Vladimir Shapranov. "Experiments on Detection of Voiced Hesitations in Russian Spontaneous Speech." Journal of Electrical and Computer Engineering 2016 (2016): 1–8. http://dx.doi.org/10.1155/2016/2013658.

Abstract:
The development and popularity of voice-user interfaces have made spontaneous speech processing an important research field. One of the main focus areas in this field is automatic speech recognition (ASR), which enables computers to recognize and transcribe spoken language into text. However, ASR systems often work less efficiently for spontaneous than for read speech, since the former differs from any other type of speech in many ways, and the presence of speech disfluencies is a prominent characteristic. These phenomena are an important feature of human-human communication and, at the same time, a challenging obstacle for speech processing tasks. In this paper we address the detection of voiced hesitations (filled pauses and sound lengthenings) in Russian spontaneous speech using different machine learning techniques, from grid search and gradient descent in rule-based approaches to data-driven ones such as ELM and SVM based on automatically extracted acoustic features. Experimental results on a mixed, quality-diverse corpus of spontaneous Russian speech indicate the efficiency of these techniques for the task in question, with SVM outperforming the other methods.
15

Honnibal, Matthew, and Mark Johnson. "Joint Incremental Disfluency Detection and Dependency Parsing." Transactions of the Association for Computational Linguistics 2 (December 2014): 131–42. http://dx.doi.org/10.1162/tacl_a_00171.

Abstract:
We present an incremental dependency parsing model that jointly performs disfluency detection. The model handles speech repairs using a novel non-monotonic transition system, and includes several novel classes of features. For comparison, we evaluated two pipeline systems, using state-of-the-art disfluency detectors. The joint model performed better on both tasks, with a parse accuracy of 90.5% and 84.0% accuracy at disfluency detection. The model runs in expected linear time, and processes over 550 tokens a second.
16

Bondarenko, Ivan Yuriiovych, and Olha Mykolaivna Ladoshko. "Neural network algorithm for detection of tonal, noise, and pause parts of continuous speech." Electronics and Communications 17, no. 6 (February 28, 2013): 19–25. http://dx.doi.org/10.20535/2312-1807.2012.17.6.11392.

17

Igras, Magdalena, and Bartosz Ziółko. "Detection of Sentence Boundaries in Polish Based on Acoustic Cues." Archives of Acoustics 41, no. 2 (June 1, 2016): 233–43. http://dx.doi.org/10.1515/aoa-2016-0023.

Abstract:
In this article the authors investigate and present experiments on sentence boundary annotation for Polish speech using acoustic cues as a source of information. The main result of the investigation is an algorithm for detecting the syntactic boundaries that appear at the positions of punctuation marks. In the first stage, the algorithm detects pauses and divides the speech signal into segments. In the second stage, it verifies the configuration of acoustic features and generates hypotheses about the positions of punctuation marks. Classification is performed with parameters describing phone duration and energy, speaking rate, fundamental frequency contours and frequency bands. The best results were achieved with a Naive Bayes classifier. The efficiency of the algorithm is 52% precision and 98% recall. Another significant outcome of the research is statistical models of the acoustic cues correlated with punctuation in spoken Polish.
18

De Looze, Celine, Finnian Kelly, Lisa Crosby, Aisling Vourdanou, Robert F. Coen, Cathal Walsh, Brian A. Lawlor, and Richard B. Reilly. "Changes in Speech Chunking in Reading Aloud is a Marker of Mild Cognitive Impairment and Mild-to-Moderate Alzheimer’s Disease." Current Alzheimer Research 15, no. 9 (July 11, 2018): 828–47. http://dx.doi.org/10.2174/1567205015666180404165017.

Abstract:
Background: Speech and language impairments, generally attributed to lexico-semantic deficits, have been documented in Mild Cognitive Impairment (MCI) and Alzheimer's disease (AD). This study investigates the temporal organisation of speech (reflective of speech production planning) in reading aloud in relation to cognitive impairment, particularly working memory and attention deficits in MCI and AD. The discriminative ability of temporal features extracted from a newly designed read speech task is also evaluated for the detection of MCI and AD. Method: Sixteen patients with MCI, eighteen patients with mild-to-moderate AD and thirty-six healthy controls (HC) underwent a battery of neuropsychological tests and read a set of sentences varying in cognitive load, probed by manipulating sentence length and syntactic complexity. Results: Our results show that mild-to-moderate AD is associated with a general slowness of speech, attributed to a higher number of speech chunks, silent pauses and dysfluencies, and slower speech and articulation rates. Speech chunking in the context of high cognitive-linguistic demand appears to be an informative marker of MCI, specifically related to early deficits in working memory and attention. In addition, Linear Discriminant Analysis shows the ROC AUCs (Areas Under the Receiver Operating Characteristic Curves) for identifying MCI vs. HC, MCI vs. AD and AD vs. HC using these speech characteristics are 0.75, 0.90 and 0.94 respectively. Conclusion: The implementation of connected speech-based technologies in clinical and community settings may provide additional information for the early detection of MCI and AD.
19

Espinoza-Cuadros, Fernando, Rubén Fernández-Pozo, Doroteo T. Toledano, José D. Alcázar-Ramírez, Eduardo López-Gonzalo, and Luis A. Hernández-Gómez. "Speech Signal and Facial Image Processing for Obstructive Sleep Apnea Assessment." Computational and Mathematical Methods in Medicine 2015 (2015): 1–13. http://dx.doi.org/10.1155/2015/489761.

Abstract:
Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). OSA is generally diagnosed through a costly procedure requiring an overnight stay of the patient at the hospital. This has led to proposing less costly procedures based on the analysis of patients’ facial images and voice recordings to help in OSA detection and severity assessment. In this paper we investigate the use of both image and speech processing to estimate the apnea-hypopnea index, AHI (which describes the severity of the condition), over a population of 285 male Spanish subjects suspected to suffer from OSA and referred to a Sleep Disorders Unit. Photographs and voice recordings were collected in a supervised but not highly controlled way trying to test a scenario close to an OSA assessment application running on a mobile device (i.e., smartphones or tablets). Spectral information in speech utterances is modeled by a state-of-the-art low-dimensional acoustic representation, called i-vector. A set of local craniofacial features related to OSA are extracted from images after detecting facial landmarks using Active Appearance Models (AAMs). Support vector regression (SVR) is applied on facial features and i-vectors to estimate the AHI.
20

Kinsner, Witold, and Warren Grieder. "Amplification of Signal Features Using Variance Fractal Dimension Trajectory." International Journal of Cognitive Informatics and Natural Intelligence 4, no. 4 (October 2010): 1–17. http://dx.doi.org/10.4018/jcini.2010100101.

Abstract:
This paper describes how the selection of parameters for the variance fractal dimension (VFD) multiscale time-domain algorithm can create an amplification of the fractal dimension trajectory that is obtained for a natural-speech waveform in the presence of ambient noise. The technique is based on the variance fractal dimension trajectory (VFDT) algorithm that is used not only to detect the external boundaries of an utterance, but also its internal pauses representing the unvoiced speech. The VFDT algorithm can also amplify internal features of phonemes. This fractal feature amplification is accomplished when the time increments are selected in a dyadic manner rather than selecting the increments in a unit distance sequence. These amplified trajectories for different phonemes are more distinct, thus providing a better characterization of the individual segments in the speech signal. This approach is superior to other energy-based boundary-detection techniques. Observations are based on extensive experimental results on speech utterances digitized at 44.1 kilosamples per second, with 16 bits in each sample.
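
The variance fractal dimension underlying the trajectory can be estimated from the scaling of lag-difference variances, with the paper's dyadic increments appearing as powers of two. A hedged sketch of that estimate (window sizes and lag range illustrative):

```python
import numpy as np

def variance_fractal_dimension(x, k_max=6):
    """VFD via the scaling law Var[x(t+dt) - x(t)] ~ dt**(2H), D = 2 - H,
    with dyadic lags dt = 2, 4, ..., 2**k_max."""
    lags = [2 ** k for k in range(1, k_max + 1)]
    log_var = [np.log(np.var(x[dt:] - x[:-dt]) + 1e-12) for dt in lags]
    slope = np.polyfit(np.log(lags), log_var, 1)[0]
    return 2.0 - slope / 2.0

def vfd_trajectory(x, win=2048, hop=512):
    """Sliding-window VFD trajectory, of the kind used to locate
    utterance boundaries and internal pauses."""
    return np.array([variance_fractal_dimension(x[i:i + win])
                     for i in range(0, len(x) - win, hop)])

x = np.cumsum(np.random.default_rng(0).normal(size=10000))
print(variance_fractal_dimension(x))   # ~1.5 for a random-walk signal
```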
21

Cameron, Sharon, Nicky Chong-White, Kiri Mealings, Tim Beechey, Harvey Dillon, and Taegan Young. "The Parsing Syllable Envelopes Test for Assessment of Amplitude Modulation Discrimination Skills in Children: Development, Normative Data, and Test–Retest Reliability Studies." Journal of the American Academy of Audiology 29, no. 02 (February 2018): 151–63. http://dx.doi.org/10.3766/jaaa.16146.

Abstract:
Intensity peaks and valleys in the acoustic signal are salient cues to syllable structure, which is accepted to be a crucial early step in phonological processing. As such, the ability to detect low-rate (envelope) modulations in signal amplitude is essential to parse an incoming speech signal into smaller phonological units. The Parsing Syllable Envelopes (ParSE) test was developed to quantify the ability of children to recognize syllable boundaries using an amplitude modulation detection paradigm. The envelope of a 750-msec steady-state /a/ vowel is modulated into two or three pseudo-syllables using notches with modulation depths varying between 0% and 100% along an 11-step continuum. In an adaptive three-alternative forced-choice procedure, the participant identified whether one, two, or three pseudo-syllables were heard. The study covers the development of the ParSE stimuli and test protocols, and the collection of normative and test–retest reliability data. Participants were eleven adults (aged 23 yr 10 mo to 50 yr 9 mo, mean 32 yr 10 mo) and 134 typically developing, primary-school children (aged 6 yr 0 mo to 12 yr 4 mo, mean 9 yr 3 mo). There were 73 males and 72 females. Data were collected using a touchscreen computer. Psychometric functions (PFs) were automatically fit to individual data by the ParSE software. Performance was related to the modulation depth at which syllables can be detected with 88% accuracy (referred to as the upper boundary of the uncertainty region [UBUR]). A shallower PF slope reflected a greater level of uncertainty. Age effects were determined based on raw scores. z scores were calculated to account for the effect of age on performance. Outliers, and individual data for which the confidence interval of the UBUR exceeded a maximum allowable value, were removed. Nonparametric tests were used as the data were skewed toward negative performance. Across participants, the performance criterion (UBUR) was met with a median modulation depth of 42%. The effect of age on the UBUR was significant (p < 0.00001). The UBUR ranged from 50% modulation depth for 6-yr-olds to 25% for adults. Children aged 6–10 had significantly higher uncertainty region boundaries than adults. A skewed distribution toward negative performance occurred (p = 0.00007). There was no significant difference in performance on the ParSE between males and females (p = 0.60). Test–retest z scores were strongly correlated (r = 0.68, p < 0.0000001). The ParSE normative data show that the ability to identify syllable boundaries based on changes in amplitude modulation improves with age, and that some children in the general population have performance much worse than their age peers. The test is suitable for use in planned studies in a clinical population.
22

Nagumo, Ryosuke, Yaming Zhang, Yuki Ogawa, Mitsuharu Hosokawa, Kengo Abe, Takaaki Ukeda, Sadayuki Sumi, et al. "Automatic Detection of Cognitive Impairments through Acoustic Analysis of Speech." Current Alzheimer Research 17, no. 1 (March 20, 2020): 60–68. http://dx.doi.org/10.2174/1567205017666200213094513.

Abstract:
Background: Early detection of mild cognitive impairment is crucial in the prevention of Alzheimer's disease. The aim of the present study was to identify whether acoustic features can help differentiate older, independent community-dwelling individuals with cognitive impairment from healthy controls. Methods: A total of 8779 participants (mean age 74.2 ± 5.7 in the range of 65-96, 3907 males and 4872 females) with different cognitive profiles, namely healthy controls, mild cognitive impairment, global cognitive impairment (defined as a Mini Mental State Examination score of 20-23), and mild cognitive impairment with global cognitive impairment (a combined status of mild cognitive impairment and global cognitive impairment), were evaluated in short-sentence reading tasks, and their acoustic features, including temporal features (such as duration of utterance, number and length of pauses) and spectral features (F0, F1, and F2), were used to build a machine learning model to predict their cognitive impairments. Results: Classification performance against the healthy controls was evaluated through the area under the receiver operating characteristic curve and was found to be 0.61, 0.67, and 0.77 for mild cognitive impairment, global cognitive impairment, and mild cognitive impairment with global cognitive impairment, respectively. Conclusion: Our machine learning model revealed that individuals' acoustic features can be employed to discriminate between healthy controls and those with mild cognitive impairment with global cognitive impairment, which is a more severe form of cognitive impairment compared with mild cognitive impairment or global cognitive impairment alone. It is suggested that language impairment increases in severity with cognitive impairment.
23

Mattys, S. L., C. W. Pleydell-Pearce, J. F. Melhorn, and S. E. Whitecross. "Detecting Silent Pauses in Speech: A New Tool for Measuring On-Line Lexical and Semantic Processing." Psychological Science 16, no. 12 (December 1, 2005): 958–64. http://dx.doi.org/10.1111/j.1467-9280.2005.01644.x.

24

Arciuli, Joanne, David Mallard, and Gina Villar. "“Um, I can tell you're lying”: Linguistic markers of deception versus truth-telling in speech." Applied Psycholinguistics 31, no. 3 (June 4, 2010): 397–411. http://dx.doi.org/10.1017/s0142716410000044.

Abstract:
Lying is a deliberate attempt to transmit messages that mislead others. Analysis of language behaviors holds great promise as an objective method of detecting deception. The current study reports on the frequency of use and acoustic nature of "um" and "like" during laboratory-elicited lying versus truth-telling. Results obtained using a within-participants false opinion paradigm showed that instances of "um" occur less frequently and are of shorter duration during lying compared to truth-telling. There were no significant differences in relation to "like." These findings contribute to our understanding of the linguistic markers of deception behavior. They also assist in our understanding of the role of "um" in communication more generally. Our results suggest that "um" may not be accurately conceptualized as a filled pause/hesitation or speech disfluency/error whose increased usage coincides with increased cognitive load or increased arousal during lying. It may instead carry a lexical status similar to interjections and form an important part of authentic, effortless communication, which is somewhat lacking during lying.
25

Bogach, Natalia, Elena Boitsova, Sergey Chernonog, Anton Lamtev, Maria Lesnichaya, Iurii Lezhenin, Andrey Novopashenny, et al. "Speech Processing for Language Learning: A Practical Approach to Computer-Assisted Pronunciation Teaching." Electronics 10, no. 3 (January 20, 2021): 235. http://dx.doi.org/10.3390/electronics10030235.

Abstract:
This article contributes to the discourse on how contemporary computer and information technology may help in improving foreign language learning, not only by supporting a better and more flexible workflow and digitizing study materials, but also by creating completely new use cases made possible by technological improvements in signal processing algorithms. We discuss an approach and propose a holistic solution to teaching the phonological phenomena which are crucial for correct pronunciation, such as the phonemes; the energy and duration of syllables and pauses, which construct the phrasal rhythm; and the tone movement within an utterance, i.e., the phrasal intonation. The working prototype of the StudyIntonation Computer-Assisted Pronunciation Training (CAPT) system is a tool for mobile devices which offers a set of tasks based on a "listen and repeat" approach and gives audio-visual feedback in real time. The present work summarizes the efforts taken to enrich the current version of this CAPT tool with two new functions: the phonetic transcription and rhythmic patterns of model and learner speech. Both are designed on the basis of a third-party automatic speech recognition (ASR) library, Kaldi, which was incorporated inside the StudyIntonation signal processing software core. We also examine the scope of automatic speech recognition applicability within the CAPT system workflow and evaluate the Levenshtein distance between the transcription made by human experts and that obtained automatically in our code. We developed an algorithm of rhythm reconstruction using acoustic and language ASR models. It is also shown that even with sufficiently correct production of phonemes, learners do not produce correct phrasal rhythm and intonation; therefore, the joint training of sounds, rhythm and intonation within a single learning environment is beneficial. To mitigate recording imperfections, voice activity detection (VAD) is applied to all the speech recordings processed. The try-outs showed that StudyIntonation can create transcriptions and process rhythmic patterns, but some specific problems with connected speech transcription were detected. The learner feedback for pronunciation assessment was also updated, and a conventional mechanism based on dynamic time warping (DTW) was combined with a cross-recurrence quantification analysis (CRQA) approach, which resulted in better discriminating ability. The CRQA metrics combined with those of DTW were shown to add to the accuracy of learner performance estimation. The major implications for computer-assisted English pronunciation teaching are discussed.
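
The transcription comparison mentioned here uses the Levenshtein (edit) distance, which has a short standard dynamic-programming form at the token level (this is the textbook algorithm, not the authors' code):

```python
def levenshtein(a, b):
    """Edit distance between token sequences (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

ref = "the quick brown fox".split()
hyp = "the quick brawn fox jumps".split()
print(levenshtein(ref, hyp))   # 2: one substitution, one insertion
```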
26

Zasjad’ko, K. I., A. V. Bogomolov, S. K. Soldatov, A. P. Vonarshenko, A. F. Borejchuk, and M. N. Jazljuk. "Changes in indicators of intonation structure of speech in occupational activity of air traffic control operators." Occupational Health and Industrial Ecology, no. 1 (March 14, 2019): 31–37. http://dx.doi.org/10.31089/1026-9428-2019-1-31-37.

Abstract:
Introduction. The study aims to determine the possible use of vocal signal analysis for the diagnosis of functional states in air traffic control operators, with justification of the selection of informative parameters of the intonation structure of speech. Materials and methods. Experiments on a semi-natural simulator complex, with the participation of 16 male air traffic dispatchers, modelled the occupational activity of an air traffic dispatcher under moderate (6 aircraft controlled) and intense (7–12 aircraft controlled) workload. The duration of the simulated working shift was 6 hours. Registration covered the characteristics of the main vocal tone of the examinees, with calculation of 8 jitter-factors portraying micro-changes of the main vocal tone curve and 2 tremor indices disclosing periodic waves of 4–16 Hz in the main vocal tone curve. The functional state of the dispatchers was assessed via cardiac rhythm parameters. Reliability and work capacity of the dispatchers corresponded to correct radio traffic, changes in the threshold of reception and transfer of aircraft, the number of allowable dangerous approaches of aircraft, and the time to detection of an input aircraft's deviation from the preset flight line. Results and discussion. According to the analysis of the experimental results, some parameters of the main vocal tone carried significant changes in both the first (simulated moderate workload) and second (simulated intense workload) experimental series. The data obtained prove a lower level of psychic regulation of the dispatchers' occupational activity during the 3rd to 5th hours of the "working shift" in the first experimental series and from the 2nd to 4th hours of the second experimental series, due to decreased psychophysiologic resources and developing fatigue. Conclusion. Studies of changes in indicators of the intonation structure of speech in the occupational activity of air traffic dispatchers demonstrated that using such indicators provides adequate diagnosis of the functional state. The most informative indicators are the average value, histogram asymmetry and excess (kurtosis) of the main vocal tone frequency, the duration of pauses between words of the dispatchers' commands, and the fifth jitter-factor.
27

Popovic, Mirjana, Vladimir Kostic, Eleonora Dzoljic, and Marko Ercegovac. "A rapid method of detecting motor blocks in patients with Parkinson's disease during volitional hand movements." Srpski arhiv za celokupno lekarstvo 130, no. 11-12 (2002): 376–81. http://dx.doi.org/10.2298/sarh0212376p.

Abstract:
INTRODUCTION: An algorithm has been developed to study hand movements in patients with Parkinson's disease (PD) who experience a temporary, involuntary inability to move a hand. In the literature, this rather enigmatic phenomenon has been described in gait, speech, handwriting and tapping, and noted as motor blocks (MB) or freezing episodes. Freezing refers to transient periods in which the voluntary motor activity being attempted by an individual is paused. It is a sudden, unplanned state of immobility that appears to arise from deficits in initiating or simultaneously and sequentially executing movements, in correcting inappropriate movements or in planning movements. The clinical evaluation of motor blocks is difficult because of variability both within and between individuals and the relationship of blocks to the time of drug ingestion. In the literature the terms freezing, motor block and motor freezing are used in parallel. AIM: In clinical settings the classical manifestations of Parkinson's disease (akinesia, bradykinesia, rigidity, tremor, axial motor performance and postural instability) are typically evaluated. Recently, new computerized methods have been suggested in the literature for their objective assessment. We propose that the monitoring of motor blocks during hand movements be integrated as well. For this purpose we have developed a simple method comprising a PC, a digitizing board and custom-made software. Movement analysis is performed off line, and the result is data describing the number, duration and onset of motor blocks. METHOD: Hand trajectories are assessed during simple volitional, self-paced, point-to-point planar hand movements made with a cordless magnetic mouse on a digitizing board (Drawing Board III, 305 x 457 mm, GTCO CalComp Inc.) (Fig. 1). Testing included 8 Parkinsonian patients and 8 age-matched normal healthy controls with no known neurologic motor or sensory disorders (Table 1). Three kinematic indicators of motor blocks allow their identification and quantification: 1) duration (MBT); 2) onset (t%); and 3) number (N) of MB episodes. The duration of a motor block (MBT) is defined as the time sequence during which the (x, y) coordinates do not change their values, expressed as a percentage of the whole movement duration: MBT% = MBT/T (%). If more than one motor block occurs during a movement (N > 1), the movement is decomposed. The whole-movement motor block (mbt) is the sum of all motor blocks MBT_i during the same movement, expressed as a percentage of the whole movement duration: mbt% = mbt/T (%). The onset of a motor block (t) is determined by the beginning of the motor block and expressed as a percentage of the whole movement duration: t% = t/T (%). After the kinematic indicators of motor blocks (MBT, N, t) are determined for healthy controls, their mean values are calculated. ANOVA is applied to determine the statistical significance of the difference between PD patients and the mean values of the age-matched healthy control group. PD patients are then classified into two groups: one consisting of PD patients with motor blocks and the other without motor blocks, similar to healthy controls. RESULTS: Acquired movements are processed and analyzed. Fig. 2 shows an example of hand trajectories. The time course of the (x, y) coordinates indicates the appearance of motor blocks (Fig. 3). A detailed presentation of the kinematic indicators of motor blocks (MBT, N, t) is given in Fig. 4. Intra-subject variability of these parameters is presented in Figs 5, 6 and 7 for patient #3.
The results for N show that 45% of all of patient #3's movements had no motor blocks (N = 0); 20% had N = 1; 15% had N = 2; 11.5% had N = 3; 5.7% had N = 4; 0.3% had N = 5; 0.7% had N = 6; 0.3% had N = 7 and 1% had N = 8 motor blocks. The results for t% show that 3% of all of patient #3's blocks started in the first quarter, 17% started in the second, 36% in the third, and 44% in the last quarter of the movement. The results for MBT% show that 14.5% of all movements had MBT% in the range 0-5%; 56% had MBT% 5-10%; 22% had MBT% 10-15%; 5.5% had MBT% 15-20% and 2% had MBT% 20-25%. No block lasted more than 25% of the whole movement duration. Table 2 summarizes the mean variability of the kinematic indicators of motor blocks (N, mbt%, t%) and of the movement duration T during 7 days of testing of patient #3. The analysis of the calculated data for the eight tested PD patients revealed a significant difference (p < 0.01) between healthy controls and three PD patients; data on five PD patients were not significantly different (ns). This method clustered 3 PD patients in the group that experiences motor blocks, while the rest were in the group without their significant occurrence. DISCUSSION: This algorithm is an additional instrument in the classical evaluation of PD patients during their clinical assessment and treatment. It provides the clinician with rapid feedback on changes in voluntary hand movements during the everyday progress of the illness. Furthermore, this method could assist in developing strategies to overcome motor blocks in arm movements at their onset, as well as in providing feedback on the success of drug therapy.
28

"Significance of Intelligent Pause Detection Protocol (IPDP) Over Other Protocols used for Speech Processing." International Journal of Innovative Technology and Exploring Engineering 8, no. 12 (October 10, 2019): 4332–36. http://dx.doi.org/10.35940/ijitee.l2734.1081219.

Abstract:
Analysis of the human voice based on its pitch can be used to detect pauses. The available algorithms for pause detection succeed only to some extent, and considerable scope for better performance still exists. The proposed intelligent pause detection protocol (IPDP) is the convergence of (i) calculating the mean/RMS peak value from the pitch of the human voice, (ii) estimating the pause using an MLE algorithm, and (iii) optimizing the bandwidth utilization of vocoders using a DTX algorithm. The work reports better pause removal than the existing standard methods.
29

Yuan, Jiahong, Xingyu Cai, Yuchen Bian, Zheng Ye, and Kenneth Church. "Pauses for Detection of Alzheimer’s Disease." Frontiers in Computer Science 2 (January 29, 2021). http://dx.doi.org/10.3389/fcomp.2020.624488.

Abstract:
Pauses, disfluencies and language problems in Alzheimer's disease can be naturally modeled by fine-tuning Transformer-based pre-trained language models such as BERT and ERNIE. Using this method with pause-encoded transcripts, we achieved 89.6% accuracy on the test set of the ADReSS (Alzheimer's Dementia Recognition through Spontaneous Speech) Challenge. The best accuracy was obtained with ERNIE, plus an encoding of pauses. Robustness is a challenge for large models and small training sets. Ensembling over many runs of BERT/ERNIE fine-tuning reduced variance and improved accuracy. We found that um was used much less frequently in Alzheimer's speech, compared to uh. We discuss this interesting finding from linguistic and cognitive perspectives.
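
Pause encoding here means rewriting the transcript so that silence becomes visible to a text-only model. One plausible, hypothetical encoding from word timings (thresholds and pause markers are invented for illustration; the paper's exact scheme may differ):

```python
def encode_pauses(words, short=0.5, long=2.0):
    """Insert pause markers between words based on inter-word gaps;
    `words` is a list of (word, start_s, end_s) triples."""
    out = []
    for i, (w, _, end) in enumerate(words):
        out.append(w)
        if i + 1 < len(words):
            gap = words[i + 1][1] - end
            if gap >= long:
                out.append("...")   # long pause marker
            elif gap >= short:
                out.append(",")     # short pause marker
    return " ".join(out)

print(encode_pauses([("the", 0.0, 0.2), ("cookie", 1.1, 1.6), ("jar", 1.7, 2.1)]))
# -> "the , cookie jar"; strings like this are then fed to the fine-tuning step
```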
30

Sharma, Shilpa, Punam Rattan, Anurag Sharma, and Mohammad Shabaz. "Voice activity detection using optimal window overlapping especially over health-care infrastructure." World Journal of Engineering ahead-of-print, ahead-of-print (August 9, 2021). http://dx.doi.org/10.1108/wje-02-2021-0112.

Abstract:
Purpose: This paper introduces a recent unsupervised algorithm for voice activity detection based on maximum-margin data clustering, i.e. the support vector machine. The K-means clustering algorithm previously applied to speech activity detection did not permit reliable identification of voice, which is critical for speech recognition applications. Design/methodology/approach: The authors build a voice activity detector based on the output of a K-means algorithm that permits sliding-window detection of voice and noise, although it first requires an initial detected pause. A machine initialized by the algorithm can work on health-care infrastructure and provides a platform for health-care professionals to detect the clear voice of patients. Findings: Evaluation on many recordings of the NOISEX-92 database reveals average non-speech and signal-to-noise ratios at levels higher than modern voice activity detection. Originality/value: The research work is original.
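
Stripped to its core, clustering-based VAD groups frame features into two clusters and labels the higher-energy cluster as speech. A minimal K-means sketch of that idea (log-energy as the only feature; the paper's maximum-margin clustering is more elaborate):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_vad(signal, sr, frame_ms=20):
    """Unsupervised VAD: cluster frame log-energies into two groups and
    call the higher-energy cluster 'speech'."""
    hop = int(sr * frame_ms / 1000)
    n = len(signal) // hop
    frames = signal[:n * hop].reshape(n, hop)
    feats = 10 * np.log10(np.mean(frames ** 2, axis=1, keepdims=True) + 1e-12)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(feats)
    speech_cluster = np.argmax(km.cluster_centers_.ravel())
    return km.labels_ == speech_cluster   # True where a frame is speech

sr = 8000
rng = np.random.default_rng(0)
sig = np.concatenate([rng.normal(scale=1.0, size=sr),     # 1 s of "speech"
                      rng.normal(scale=0.01, size=sr)])   # 1 s of near-silence
print(kmeans_vad(sig, sr).mean())   # ~0.5: half the frames labelled speech
```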
31

Brodersen, Michael, Achim Volmer, and Gerhard Schmidt. "Signal enhancement for communication systems used by fire fighters." EURASIP Journal on Audio, Speech, and Music Processing 2019, no. 1 (December 2019). http://dx.doi.org/10.1186/s13636-019-0165-9.

Abstract:
So-called full-face masks are essential for fire fighters to ensure respiratory protection in smoke diving incidents. While such masks are absolutely necessary for protection purposes on the one hand, they drastically impair the voice communication of fire fighters on the other. For this reason communication systems should be used to amplify the speech and thereby improve communication quality. This paper gives an overview of communication enhancement techniques for masks based on digital signal processing. The presented communication system picks up the speech signal with a microphone in the mask, enhances it, and plays back the amplified signal through loudspeakers located on the outside of the mask. Since breathing noise is also picked up by the microphone, it is advantageous to recognize and suppress it, especially since breathing noise is very loud (usually much louder than the recorded voice). A voice activity detection distinguishes between side talkers, pause, breathing out, breathing in, and speech. It ensures that only speech components are played back. Because the microphone is located close to the loudspeakers, the output signals couple back into the microphone and feedback may occur even at moderate gains. This can be reduced by feedback reduction (consisting of cancellation and suppression approaches). To enhance the functionality of the canceler, a decorrelation stage can be applied to the enhanced signal before loudspeaker playback. As a consequence of all processing stages, the communication can be improved significantly, as the results of measurements of real-time mask systems show.
32

Balagopalan, Aparna, Benjamin Eyre, Jessica Robin, Frank Rudzicz, and Jekaterina Novikova. "Comparing Pre-trained and Feature-Based Models for Prediction of Alzheimer's Disease Based on Speech." Frontiers in Aging Neuroscience 13 (April 27, 2021). http://dx.doi.org/10.3389/fnagi.2021.635945.

Abstract:
Introduction: Research related to the automatic detection of Alzheimer's disease (AD) is important, given the high prevalence of AD and the high cost of traditional diagnostic methods. Since AD significantly affects the content and acoustics of spontaneous speech, natural language processing and machine learning provide promising techniques for reliably detecting AD. There has been a recent proliferation of classification models for AD, but these vary in the datasets used, model types and training and testing paradigms. In this study, we compare and contrast the performance of two common approaches for automatic AD detection from speech on the same, well-matched dataset, to determine the advantages of using domain knowledge vs. pre-trained transfer models. Methods: Audio recordings and corresponding manually-transcribed speech transcripts of a picture description task administered to 156 demographically matched older adults, 78 with Alzheimer's Disease (AD) and 78 cognitively intact (healthy), were classified using machine learning and natural language processing as "AD" or "non-AD." The audio was acoustically enhanced and post-processed to improve the quality of the speech recording as well as to control for variation caused by recording conditions. Two approaches were used for classification of these speech samples: (1) using domain knowledge: extracting an extensive set of clinically relevant linguistic and acoustic features derived from speech and transcripts based on prior literature, and (2) using transfer learning and leveraging large pre-trained machine learning models: using transcript representations that are automatically derived from state-of-the-art pre-trained language models, by fine-tuning Bidirectional Encoder Representations from Transformers (BERT)-based sequence classification models. Results: We compared the utility of speech transcript representations obtained from recent natural language processing models (i.e., BERT) to more clinically interpretable language feature-based methods. Both the feature-based approaches and fine-tuned BERT models significantly outperformed the baseline linguistic model using a small set of linguistic features, demonstrating the importance of extensive linguistic information for detecting cognitive impairments relating to AD. We observed that fine-tuned BERT models numerically outperformed feature-based approaches on the AD detection task, but the difference was not statistically significant. Our main contribution is the observation that, when tested on the same, demographically balanced dataset and on independent, unseen data, both domain knowledge and pre-trained linguistic models have good predictive performance for detecting AD based on speech. It is notable that linguistic information alone is capable of achieving comparable, and even numerically better, performance than models including both acoustic and linguistic features here.
We also try to shed light on the inner workings of the more black-box natural language processing model by performing an interpretability analysis, and find that attention weights reveal interesting patterns such as higher attribution to more important information content units in the picture description task, as well as to pauses and filler words. Conclusion: This approach supports the value of well-performing machine learning and linguistically-focussed processing techniques to detect AD from speech, and highlights the need to compare model performance on carefully balanced datasets, using the same training parameters and independent test datasets, in order to determine the best performing predictive model.
33

"Automatic Speech Recognition with Stuttering Speech Removal using Long Short-Term Memory (LSTM)." International Journal of Recent Technology and Engineering 8, no. 5 (January 30, 2020): 1677–81. http://dx.doi.org/10.35940/ijrte.e6230.018520.

Abstract:
Stuttering or stammering is a speech disorder in which sounds, syllables, or words are repeated or prolonged, disrupting the normal flow of speech. Stuttering can make it hard to communicate with other people, which often affects a person's quality of life. An Automatic Speech Recognition (ASR) system is a technology that converts an audio speech signal into the corresponding text. Presently, ASR systems play a major role in controlling and providing inputs to various applications. Such ASR systems and machine translation applications suffer considerably from stuttering (speech dysfluency). Dysfluencies affect the word recognition accuracy of an ASR by increasing word insertion, substitution and deletion rates. In this work we focus on detecting and removing prolongations, silent pauses and repetitions to generate a proper text sequence for a given stuttered speech signal. The stuttered speech recognition system consists of two stages, namely classification using LSTM and testing in ASR. The major phases of the classification system are re-sampling, segmentation, pre-emphasis, epoch extraction and classification. The current work is carried out on the UCLASS stuttering dataset using MATLAB, with a 4% to 6% increase in accuracy when compared with ANN and SVM.
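
The LSTM classification stage can be pictured as a per-frame sequence labeller over acoustic features. A toy PyTorch sketch under that assumption (layer sizes, feature choice and class set are illustrative, not the paper's configuration):

```python
import torch
import torch.nn as nn

class DysfluencyClassifier(nn.Module):
    """Toy LSTM labelling each frame of an MFCC sequence as fluent speech
    or a dysfluency type (repetition / prolongation / silent pause)."""
    def __init__(self, n_mfcc=13, hidden=64, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (batch, time, n_mfcc)
        out, _ = self.lstm(x)
        return self.head(out)        # per-frame class logits

logits = DysfluencyClassifier()(torch.randn(2, 100, 13))
print(logits.shape)                  # torch.Size([2, 100, 4])
```

Frames labelled as dysfluent can then be cut from the signal before it is passed to the ASR stage.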
APA, Harvard, Vancouver, ISO, and other styles
34

Bartošek, J. "A Pitch Detection Algorithm for Continuous Speech Signals Using Viterbi Traceback with Temporal Forgetting." Acta Polytechnica 51, no. 5 (January 5, 2011). http://dx.doi.org/10.14311/1422.

Full text
Abstract:
This paper presents a pitch-detection algorithm (PDA) for application to signals containing continuous speech. The core of the method is based on merged normalized forward-backward correlation (MNFBC) working in the time domain, with the ability to make basic voicing decisions. In addition, a Viterbi traceback procedure is used for post-processing the MNFBC output, considering the three best fundamental frequency (F0) candidates in each step. This should make the final pitch contour smoother and should also prevent octave errors. In computing the transition probabilities between F0 candidates, two major improvements were made over existing post-processing methods. Firstly, we compare pitch distance in musical cent units. Secondly, temporal forgetting is applied in order to avoid penalizing pitch jumps after prosodic pauses of one speaker, or changes in pitch connected with turn-taking in dialogs. Results computed on a pitch-reference database clearly show the benefit of the first improvement, but they have not yet proved any benefit of the temporal modification. We assume this is due to the nature of the reference corpus, which had a small amount of suprasegmental content.
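The two post-processing ideas, cent-based pitch distance and temporal forgetting in the Viterbi traceback, can be sketched as follows, assuming three F0 candidates per frame. The MNFBC candidate generator is not reproduced, and the weighting constants and forgetting schedule are illustrative rather than the paper's actual values:

# Sketch: Viterbi smoothing of per-frame F0 candidates, with pitch
# distance measured in musical cents and a "temporal forgetting"
# factor that down-weights transition penalties across long pauses.
import math

def cents(f1, f2):
    """Pitch distance in cents (100 cents = 1 semitone)."""
    return abs(1200.0 * math.log2(f1 / f2))

def viterbi_f0(frames, gap_frames, tau=20.0, w=0.05):
    """frames: list of [(f0, salience), ...], up to 3 candidates/frame.
    gap_frames[t]: unvoiced frames between voiced frames t-1 and t.
    tau controls how quickly transition penalties are forgotten."""
    n = len(frames)
    cost = [[-s for (_, s) in frames[0]]]       # emission: high salience = low cost
    back = []
    for t in range(1, n):
        forget = math.exp(-gap_frames[t] / tau)  # approaches 0 after long pauses
        row, ptr = [], []
        for f, s in frames[t]:
            trans = [cost[-1][j] + w * forget * cents(f, fj)
                     for j, (fj, _) in enumerate(frames[t - 1])]
            j_best = min(range(len(trans)), key=trans.__getitem__)
            row.append(trans[j_best] - s)
            ptr.append(j_best)
        cost.append(row)
        back.append(ptr)
    # trace back the cheapest path through the candidate lattice
    k = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return [frames[t][k][0] for t, k in enumerate(path)]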
APA, Harvard, Vancouver, ISO, and other styles
35

Gurumdimma, N. Y., D. B. Bisandu, and E. Ojedayo. "Event extraction from textual data." Journal of Computer Science and Its Application 26, no. 1 (February 9, 2020). http://dx.doi.org/10.4314/jcsia.v26i1.4.

Full text
Abstract:
Many text mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively extract and use attributes from unstructured data is still an open research issue. Event attribute extraction is a challenging research area with broad applications in data mining and related fields, because of the importance of decision making based on the hidden knowledge/patterns discovered in textual data; for example, in crime detection, events are extracted from an eyewitness report to concisely identify what happened during a crime. In this work, we present our approach to extracting these events based on the dependency parse tree relations of the text and its part of speech (POS). The proposed method uses a machine learning algorithm to predict events from a text. The preliminary result of the experiment, run with the WEKA tool, shows that more than 90% of events can be predicted based on the POS and the dependency relations (DepR) of a sentence. Keywords: Events; Part of Speech; Classification; Data; Prediction
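The paper's experiments were run in WEKA; an analogous sketch in Python, assuming spaCy for POS tags and dependency relations (with the en_core_web_sm model installed) and scikit-learn for the classifier, might look like the following. The tiny training set and label scheme are illustrative only:

# Sketch: predicting whether a sentence reports an event, using
# part-of-speech tags and dependency relations as features. The
# paper used WEKA; this analogous pipeline uses spaCy + scikit-learn.
import spacy
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

nlp = spacy.load("en_core_web_sm")  # assumes the model is installed

def pos_dep_features(text):
    """Represent a sentence as POS:dependency tokens, e.g. 'VERB:ROOT'."""
    return " ".join(f"{tok.pos_}:{tok.dep_}" for tok in nlp(text))

sentences = [
    "The suspect fled the scene at midnight.",     # event
    "A man stabbed the victim near the market.",   # event
    "The market is on the north side of town.",    # no event
    "He is a quiet neighbour.",                    # no event
]
labels = [1, 1, 0, 0]

# token_pattern keeps each POS:dep pair intact as a single feature
clf = make_pipeline(CountVectorizer(token_pattern=r"\S+"),
                    LogisticRegression())
clf.fit([pos_dep_features(s) for s in sentences], labels)

print(clf.predict([pos_dep_features("Thieves broke into the shop.")]))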
APA, Harvard, Vancouver, ISO, and other styles
36

Smith, Naomi. "Between, Behind, and Out of Sight." M/C Journal 24, no. 2 (April 26, 2021). http://dx.doi.org/10.5204/mcj.2764.

Full text
Abstract:
Introduction
I am on the phone with a journalist discussing my research into anti-vaccination. As the conversation winds up, they ask a question I have come to expect: "how big do you think this is?" My answer is usually some version of the following: that we have no way of knowing. I and my fellow researchers can only see the information that is public or in the sunlight. How anti-vaccination information spreads through private networks is dark to us. It is private and necessarily so. This means that we cannot track how these conversations spread in the private or parochial spaces of Facebook, nor can we consider how they might extend into other modes of mediated communication. Modern communication is a complex and multiplatform accomplishment. Consider this: I am texting with my friend, I send her a selfie, in the same moment I hear a notification, she has DMed me a relevant Instagram post via that app. I move to Instagram and share another post in response; we continue our text message conversation there. Later in the day, I message her on Facebook Messenger while participating in a mutual WhatsApp group chat. The next day we Skype, and while we talk, we send links back and forth, which in hindsight are as clear as hieroglyphics before the Rosetta stone. I comment on her Twitter post, and we publicly converse back and forth briefly while other people like our posts. None of these instances are discrete conversational events, even though they occur on different platforms. They are iterations on the same themes, and the archival properties of social media and private messaging apps mean that neither of us forgets where we left off. The conversation slides not only between platforms and contexts but in and out of visibility. Digitally mediated conversation hums in the background of daily life (boring meetings, long commutes, and bad dates) and expands our understanding of the temporal and sequential limits of conversation. In this article, I will explore digitally-mediated cross-platform conversation as a problem in two parts, and how we can understand it as part of the 'dark social'. Specifically, I want to draw attention to how 'dark' online spaces are part of our everyday communicative practices and are not necessarily synonymous with the illicit, illegal, or deviant. I argue that the private conversations we have online are also part of the dark social web, insofar as they are hidden from the public eye. When I think of dark social spaces, I think of what lies beneath the surface of murky waters, what hides in backstage areas, and the moments between platforms. In contrast, 'light' (or public) social spaces are often perceived as siloed. The boundaries between these platforms are artificially clean and do not appear to leak into other spaces. This article explores the dark and shadowed spaces of online conversation and considers how we might approach them as researchers. First, conversations occur in the backchannels of social media platforms, in private messaging functions that are necessarily invisible to the researcher's gaze. These spaces are distinct from the social media activity analysed by Marwick and boyd. Their research examining teens' privacy strategies on social media highlights how social media posts that multiple audiences may view often hold encoded meanings. Social media posts are a distinct and separate category of activity from mediated conversations that occur one to one, or in smaller group chat settings. Second is the disjunction between social media platforms.
Users spread their activity across any number of social media platforms, according to social and personal logics. However, these movements are difficult to capture; it is difficult to see in the dark. Platforms are not hermetically sealed off from each other, or from the broader web. I argue that understanding how conversation moves between platforms, and in the backstage spaces of platforms, are two parts of the same dark social puzzle.
Conversation Online
Digital media have changed how we maintain our social connections across time and space. Social media environments offer new possibilities for communication and engagement as well as new avenues for control. Calls and texts can be ignored, and our phones are often used as shields. Busying ourselves with them can help us avoid unwanted face-to-face conversations. There are a number of critiques regarding the pressure of always-on contact, and a growing body of research that examines how users negotiate these demands. By examining group messaging, Mannell highlights how the boundaries of these chats are porous and flexible, and mark a distinct communicative break from previous forms of mobile messaging, which were largely dyadic. The advent of group chats has also led to an increasing complication of conversation boundaries. One group chat may have several strands of conversation sporadically re-engaged with over time. Mannell's examination of group chats empirically illustrates the complexity of digitally-mediated conversations as they move across private, parochial, and public spaces in a way that is not necessarily temporally linear. Further research highlights the networked nature of digitally mediated interpersonal communication and how conversations sprawl across multiple platforms (Burchell). Couldry (16, 17) describes this complex web as the media manifold. This concept encompasses the networked platforms that comprise it and refers to its embeddedness in daily life. As we no longer "log on" to the internet to send and receive email, the manifold is both everywhere and nowhere; so too are our conversations. Gershon has described the ways we navigate the communicative affordances of these platforms as "media ideologies", which are the "beliefs, attitudes, and strategies about the media they [individuals] use" (391). Media ideologies also contain implicit assumptions about which platforms are best for delivering which kinds of messages. Similarly, Burchell argues that the relational ordering of available media technologies is "highly idiosyncratic" (418). Burchell contends that this idiosyncratic ordering is interdependent and relational, and that norms about what to do when are both assumed by individuals and learnt in their engagement with others (418). The influence of others allows us to adjust our practices, or as Burchell argues, "to attune and regulate one's own conduct … and facilitate engagement despite the diverse media practices of others" (418). In this model, individuals are constantly learning and renegotiating norms of conversation on a case-by-case, platform-by-platform basis. However, I argue that it is more illuminating to consider how we have collectively developed an implicit and unconscious set of norms and signals that govern our (collective) conduct, as digitally mediated conversation has become embedded in our daily lives.
This is not to say that everyone has the same conversational skill level, but rather that we have developed a common toolbox for understanding the ebb and flow of digitally mediated conversations across platforms. However, these norms are implicit, and we only have a partial understanding of how they are socially achieved in digitally-mediated conversation.
What Lies Beneath
Most of what we do online is assumed not to be publicly visible. While companies like Facebook trace us across the web and peer into every nook and cranny of our private use patterns, researchers have remained focussed on what lies above in the light, not below, in the dark. This has meant an overwhelming focus on single-platform studies that rely on the massification of data as a default measure for analysing sentiment and behaviour online. Sociologically, we know that what occurs in dark social spaces, or backstage, is just as important to social life as what happens in front of an audience (Goffman). Goffman's research uses the metaphor of the theatre to analyse how social life is accomplished as a performance. He highlights that (darkened) backstage spaces are those where we can relax, drop our front, and reveal parts of our (social) self that may be unpalatable to a broader audience. Simply put, the public data accessible to researchers on social media are "trace data", or "trace conversation", from the places where conversations briefly leave (public) footprints and can be tracked and traced before vanishing again. Alternatively, we can visualise internet researchers as swabbing door handles for trace evidence, attempting to assemble a narrative out of a left-behind thread or a stray fingerprint. These public utterances, often scraped through API access, are only small parts of the richness of online conversation. Conversations weave across multiple platforms, yet single platforms are focussed on, bracketing off their leaky edges in favour of certainty. We know the social rules of platforms, but less about the rules between platforms, and in their darker spaces. Conversations briefly emerge into the light, only to disappear again. Without understanding how conversation is achieved, and how it expands and contracts and weaves in and out of the present, we are only ever guessing about the social dynamics of mediated conversation as they shift between light, dark, and shadow spaces. Small things can cast large shadows; something that looms large may be deceptively small. Online, these could be sociality distorted by disinformation campaigns or swarms of social bots.
Capturing the Unseen: An Ethnomethodological Approach
Not all data are measurable, computable, and controllable. There is uncertainty beyond what computational logics can achieve. Nooks and crannies of sociality exist beyond the purview of computable data. This suggests that we can apply pre-digital social research methods to capture these "below the surface" conversations and understand their logics. Sociologists have long understood that conversation is a social accomplishment. In the 1960s, sociologist Harvey Sacks developed conversation analysis as an ethnomethodological technique that seeks to understand how social life is accomplished in day-to-day conversation and micro-interactions.
Conversation analysis is a detailed and systematic account of how naturally-occurring talk is socially ordered, and has been applied across a number of social contexts, including news interviews, judicial settings, suicide prevention hotlines, therapy sessions, as well as regular phone conversations (Kitzinger and Frith). Conversation analysis focusses on fine-grained detail, all of the little patterns of speech that make up a conversation; for example, the pauses, interruptions, self-corrections, false starts, and over-speaking. Often these too are hidden features of conversation, understood implicitly, but hovering on the edges of our social knowledge. One of the most interesting uses of conversational analysis is to understand refusal, that is, how we say 'no' as a social action. This body of research turns common-sense social knowledge – that saying no is socially difficult – into a systemic schema of social action. For instance, acceptance is easy to achieve; saying yes typically happens quickly and without hesitation. Acceptances are not qualified; a straightforward 'yes' is sufficient (Kitzinger and Frith). However, refusals are much more socially complex. Refusal is usually accomplished by apologies, compliments, and other palliative strategies that seek to cushion the blow of refusals. They are delayed and indirect conversational routes, indicating their status as a dispreferred social action, necessitating their accompaniment by excuses or explanations (Kitzinger and Frith). Research by Kitzinger and Frith, examining how women refuse sexual advances, illustrates that we all have a stock of common-sense knowledge about how refusals are typically achieved, which persists across various social contexts, including in our intimate relationships. Conversation analysis shows us how conversation is achieved and how we understand each other. To date, conversation analysis techniques have been applied to spoken conversation but not yet extended into text-based mediated conversation. I argue that we could apply insights from conversation analysis to understand the rules that govern digitally mediated conversation, how conversation moves in the spaces between platforms, and the rules that govern its emergence into public visibility. What rules shape the success of mediated communication? How can we understand it as a social achievement? When conversation analysis walks into the dark room it can be like turning on the light. How can we apply conversation analysis, usually concerned with the hidden aspects of plainly visible talk, to conversation in dark social spaces, across platforms and in private back channels? There is evidence that the norms of refusal, as highlighted by conversation analysis, are persistent across platforms, including in people's private digitally-mediated conversations. One of the ways in which we can identify these norms in action is by examining technology resistance. Relational communication via mobile device is pervasive (Hall and Baym). The concentration of digitally-mediated communication into smartphones means that conversational norms are constantly renegotiated, alongside expectations of relationship maintenance in voluntary social relationships like friendship (Hall and Baym). Mannell also explains that technology resistance can include lying by text message when explaining non-availability. 
These small, habitual, and often automatic lies are categorised as "butler lies" and are a polite way of achieving refusal in digitally mediated conversations, analogous to how refusal is accomplished in face-to-face conversation. Refusals, rejections, and, by extension, unavailability appear to be accompanied by the palliative actions that help us achieve refusal in face-to-face conversation. Mannell identifies strategies such as "feeling ill" to explain non-availability without hurting others' feelings. Insights from conversation analysis suggest that, on balance, it is likely that all parties involved in both the furnishing and acceptance of a butler lie understand that these are polite fabrications, much like the refusals in verbal conversation. Because of their invisibility, it is easy to assume that conversations in the dark social are chaotic and disorganised. However, there are tantalising hints that the reverse is true. Instead of arguing that individuals construct conversational norms on a case-by-case, platform-by-platform basis, I suggest that we now have a stock of common-sense social knowledge that we also apply to cross-platform mediated communication. Where gaps in this knowledge exist, Szabla and Blommaert argue that actors use existing norms of interaction and can navigate a range of interaction events, even in online environments where we would expect to see a degree of context collapse and interactional disorganisation.
Techniques of Detection
How do we see in the dark? Some nascent research suggests a way forward that will help us understand the rhythms of cross-platform mediated conversation. Apps have been used to track participants' messaging and calling activities (Birnholtz, Davison, and Li). This research found a number of patterns that signal a user's attention or inattention, including response times and linguistic clues. Similarly, not-for-profit newsroom The Markup built a Facebook inspector called the Citizen Browser, a "standalone desktop application that was distributed to a panel of more than 1000 paid participants" (Mattu et al.). The application works by being connected to a participant's Facebook account and periodically capturing data from their Facebook feeds. The data is automatically de-identified but is still linked to the demographic information that participants provide about themselves, such as gender, race, location, and age. Applications like these point to how researchers might reliably collect interaction data from Facebook to glimpse into the hidden networks and interactions that drive conversation. User-focussed data collection methods also help us, as researchers, to sever our reliance on API access. API-reliant research is dependent on the largesse of social media companies for continued access, and encourages research on the macro at the expense of the micro. After all, social media and other digital platforms are partly constituted by the social acts of their users. Without the speech acts that constitute mediated conversation (liking, sharing GIFs and links, as well as the gaps and silences), digital platforms cease to exist. Digital platforms are not just archives of "big data"; rather, they are collections of speech and records of how our common-sense knowledge about how to communicate has stretched and expanded beyond face-to-face contexts.
A Problem of Bots
Ethnomethodological approaches have been critiqued as focussing too much on nit-picking the small details of conversation and, thus, as unable to comment on macro social issues of oppression and inequality (Kitzinger and Frith 311). However, understanding digitally-mediated conversation through the lens of talk-as-human-interaction may help us untangle our most pressing social problems across digital platforms. Extensive research examines platforms such as Twitter for "inauthentic" behaviour, primarily identifying which accounts are bots. Bot accounts are programmed Twitter accounts (for example) that automatically tweet information on political or contentious issues, while mimicking genuine engagement. Bots can reply to direct messages too; they converse with us, as they are programmed to act as "humanly" as possible. Despite this, there are patterns of behaviour and engagement that distinguish programmed bot accounts, and a number of platforms are dedicated to their detection. However, bots are becoming increasingly sophisticated and better able to mimic "real" human engagement online. Yet there is as yet no systematic framework regarding what "real" digitally mediated conversation looks like. An ethnomethodological approach to understanding this would better equip platforms to identify inauthentic activity. As Yang and colleagues succinctly state, "a supervised machine learning tool is only as good as the data used for its training … even the most advanced [bot detection] algorithms will fail with outdated training datasets" (8). On the flipside, organisations are using chat bots to deliver cognitive behavioural therapy and assist people in moments of psychological distress. But the bots do not feel human; they reply instantly to any message sent. Some require responses in the form of emojis. The basis of therapy is talk. Understanding more accurately how naturally-occurring talk functions in online spaces could create more sensitive and genuinely therapeutic tools.
Conclusion
It is easy to forget that social media have largely mainstreamed only over the last decade; in this decade, crucial social norms about how we converse online have developed. These norms allow us to navigate our conversations, with intimate friends and strangers alike, across platforms, both in and out of public view, in ways that are often temporally non-sequential. Dark social spaces are a matter of intense marketing interest. The advertising firm Disruptive Advertising identified the very spaces that are the focus of this article as "dark social": messaging apps, direct messaging, and native mobile apps facilitate user activity that is "not as easily controlled nor tracked". Dark social traffic continues to grow, yet our understanding of why, how, and for whom trails behind. To make sense of our social world, which is increasingly indistinguishable from online activity, we need to examine the spaces between and behind platforms, and how they co-mingle. Where are the spaces where the affordances of multiple platforms and technologies scrape against each other in uncomfortable ways? How do users achieve intelligible conversation not just because of affordances, but despite them? Focussing on micro-sociological encounters and conversations may also help us understand what could build a healthy online ecosystem. How are consensus and agreement achieved online? What are the persistent speech acts (or text acts) that signal when consensus is achieved?
To begin where I started, to understand the scope and power of anti-vaccination sentiment, we need to understand how it is shared and discussed in dark social spaces, in messaging applications, and other backchannel spaces. Taking an ethnomethodological approach to these conversational interactions could also help us determine how misinformation is refused, accepted, and negotiated in mediated conversation. Focussing on "dark conversation" will help us more richly understand our social world and add much needed insight into some of our pressing social problems.
References
Burchell, Kenzie. "Everyday Communication Management and Perceptions of Use: How Media Users Limit and Shape Their Social World." Convergence 23.4 (2017): 409–24.
Couldry, Nick. Media, Society, World: Social Theory and Digital Media Practice. Polity, 2012.
Gershon, Ilana. The Breakup 2.0: Disconnecting over New Media. Cornell University Press, 2010.
Goffman, Erving. The Presentation of Self in Everyday Life. Penguin, 1990.
Hall, Jeffrey A., and Nancy K. Baym. "Calling and Texting (Too Much): Mobile Maintenance Expectations, (Over)dependence, Entrapment, and Friendship Satisfaction." New Media & Society 14.2 (2012): 316–31.
Hall, Margaret, et al. "Editorial of the Special Issue on Following User Pathways: Key Contributions and Future Directions in Cross-Platform Social Media Research." International Journal of Human–Computer Interaction 34.10 (2018): 895–912.
Kitzinger, Celia, and Hannah Frith. "Just Say No? The Use of Conversation Analysis in Developing a Feminist Perspective on Sexual Refusal." Discourse & Society 10.3 (1999): 293–316.
Ling, Rich. "Soft Coercion: Reciprocal Expectations of Availability in the Use of Mobile Communication." First Monday, 2016.
Mannell, Kate. "A Typology of Mobile Messaging's Disconnective Affordances." Mobile Media & Communication 7.1 (2019): 76–93.
———. "Plural and Porous: Reconceptualising the Boundaries of Mobile Messaging Group Chats." Journal of Computer-Mediated Communication 25.4 (2020): 274–90.
Marwick, Alice E., and danah boyd. "Networked Privacy: How Teenagers Negotiate Context in Social Media." New Media & Society 16.7 (2014): 1051–67.
Mattu, Surya, Leon Yin, Angie Waller, and Jon Keegan. "How We Built a Facebook Inspector." The Markup 5 Jan. 2021. 9 Mar. 2021 <https://themarkup.org/citizen-browser/2021/01/05/how-we-built-a-facebook-inspector>.
Sacks, Harvey. Lectures on Conversation: Volumes I and II. Ed. Gail Jefferson. Blackwell, 1995.
Szabla, Malgorzata, and Jan Blommaert. "Does Context Really Collapse in Social Media Interaction?" Applied Linguistics Review 11.2 (2020): 251–79.
APA, Harvard, Vancouver, ISO, and other styles
