Journal articles: 'Diacriticism'

1

Bourque, K. "DIACRITICISMS!" GLQ: A Journal of Lesbian and Gay Studies 16, no. 1-2 (2010): 309–11. http://dx.doi.org/10.1215/10642684-2009-025.

2

Rhoades, Gale. "Diacritics for indexers." Indexer: The International Journal of Indexing 26, no. 4 (2008): 146–47. http://dx.doi.org/10.3828/indexer.2008.45.

3

Toth, Štefan, Emanuel Zaymus, Michal Ďuračík, Patrik Hrkút, and Matej Meško. "Diacritics restoration based on word n-grams for Slovak texts." Open Computer Science 11, no. 1 (2021): 180–89. http://dx.doi.org/10.1515/comp-2020-0143.

Abstract:

Despite the modern boom in technology, we are still faced with the fact that people write texts without diacritics. There are two main reasons for this. The first, historical reason stems from the past, when using diacritics was troublesome and people wrote text without them. The second is speed: typing without diacritics is usually faster. Text without diacritics is easy for people to understand, but for some types of documents, missing diacritics can cause problems. Missing diacritics are also an issue when computers process such text. In this paper, we propose an algorithm based on word n-grams (contiguous sequences of n words) that can restore the diacritics of text written in Slovak. We also compare and evaluate our results against other algorithms developed for Slovak text.
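
The word n-gram idea can be illustrated with a minimal sketch, assuming a tiny hypothetical Slovak training corpus and a greedy left-to-right decoder that scores each candidate first by its bigram count with the previously restored word and then by its unigram count. This is not the authors' algorithm: the abstract does not state their n-gram order, smoothing, or decoding, and the names `strip_diacritics`, `train`, and `restore` are illustrative only.

```python
import unicodedata
from collections import Counter, defaultdict

# Hypothetical toy corpus of diacritized Slovak sentences.
CORPUS = [
    "Najvyšší súd rozhodol",
    "Súd rozhodol včera",
    "Drevený sud je plný",
]

def strip_diacritics(text):
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def train(sentences):
    candidates = defaultdict(Counter)  # stripped word -> diacritized forms with counts
    bigrams = Counter()                # (previous word, word) counts, diacritized
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, word in zip(["<s>"] + words, words):
            candidates[strip_diacritics(word)][word] += 1
            bigrams[(prev, word)] += 1
    return candidates, bigrams

def restore(text, candidates, bigrams):
    """Greedy choice per word: bigram count first, unigram count as tie-break."""
    restored, prev = [], "<s>"
    for token in text.lower().split():
        options = candidates.get(strip_diacritics(token))
        if not options:
            best = token  # unseen word: leave it unchanged
        else:
            best = max(options, key=lambda w: (bigrams[(prev, w)], options[w]))
        restored.append(best)
        prev = best
    return " ".join(restored)

candidates, bigrams = train(CORPUS)
print(restore("najvyssi sud rozhodol", candidates, bigrams))  # najvyšší súd rozhodol
print(restore("dreveny sud je plny", candidates, bigrams))    # drevený sud je plný
```

In the toy corpus, "súd" (court) and "sud" (barrel) share the stripped form "sud", so the preceding word decides which form is restored.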

4

Hermena, Ehab W., Sana Bouamama, Simon P. Liversedge, and Denis Drieghe. "Does diacritics‐based lexical disambiguation modulate word frequency, length, and predictability effects? An eye‐movements investigation of processing Arabic diacritics." PLOS ONE 16, no. 11 (2021): e0259987. http://dx.doi.org/10.1371/journal.pone.0259987.

Abstract:

In Arabic, a predominantly consonantal script that features a high incidence of lexical ambiguity (heterophonic homographs), glyph-like marks called diacritics supply vowel information that clarifies how each consonant should be pronounced, and thereby disambiguate the pronunciation of consonantal strings. Diacritics are typically omitted from print except in situations where a particular homograph is not sufficiently disambiguated by the surrounding context. In three experiments we investigated whether the presence of disambiguating diacritics on target homographs modulates word frequency, length, and predictability effects during reading. In all experiments, the subordinate representation of the target homographs was instantiated by the diacritics (in the diacritized conditions), and by the context subsequent to the target homographs. The results replicated the effects of word frequency (Experiment 1), word length (Experiment 2), and predictability (Experiment 3). However, there was no evidence that diacritics-based disambiguation modulated these effects in the current study. Rather, diacritized targets in all experiments attracted longer first pass and later (go past and/or total fixation count) processing. These costs are suggested to be a manifestation of the subordinate bias effect. Furthermore, in all experiments, the diacritics-based disambiguation facilitated later sentence processing, relative to when the diacritics were absent. The reported findings expand existing knowledge about processing of diacritics, their contribution towards lexical ambiguity resolution, and sentence processing.

5

Rhoades, Gale. "Diacritics for indexers revisited." Indexer: The International Journal of Indexing 34, no. 4 (2016): 177–79. http://dx.doi.org/10.3828/indexer.2016.55.

6

Sheikh, Ahmed Abdalla, Mohd Sanusi Azmi, Maslita Abd Aziz, Mohammed Nasser Al-Mhiqani, and Salem Saleh Bafjaish. "Framework of diacritic segmentation for Arabic handwritten document." Indonesian Journal of Electrical Engineering and Computer Science 24, no. 2 (2021): 1001–8. https://doi.org/10.11591/ijeecs.v24.i2.pp1001-1008.

Abstract:

In recent standard Arabic and dialectal Arabic texts, diacritics and short vowels are absent. Exceptions are made for scripts for beginner learners of Arabic, religious texts, and some significant political texts. Text without diacritics is considered ambiguous because numerous words that differ only in their diacritic marks look identical. In this paper, we present a framework for segmenting diacritics from Arabic handwritten documents using a region-based segmentation technique. Because Arabic handwriting and the Mushaf Al-Quran contain many diacritical marks, the diacritics must be properly extracted from Arabic handwritten documents to avoid losing useful features. The proposed framework is devised specifically to segment diacritics from Arabic handwritten images, so it includes no feature extraction, feature selection, or classification processes. We also present the methodology used to fulfil the objectives of this paper. The preprocessing phases are explained, in particular the segmentation phase for diacritics, which is the focus of this article. Lastly, we identify the proposed region-based segmentation technique to facilitate our development throughout the experimental process.

7

Shiekh, Ahmed Abdalla, Mohd Sanusi Azmi, Maslita Abd Aziz, Mohammed Nasser Al-Mhiqani, and Salem Saleh Bafjaish. "Framework of diacritic segmentation for Arabic handwritten document." Indonesian Journal of Electrical Engineering and Computer Science 24, no. 2 (2021): 1001–8. http://dx.doi.org/10.11591/ijeecs.v24.i2.pp1001-1008.

8

Shiekh, Ahmed Abdalla, Mohd Sanusi Azmi, Maslita Abd Aziz, Mohammed Nasser Al-Mhiqani, and Salem Saleh Bafjaish. "Diacritic segmentation technique for Arabic handwritten using region-based." Indonesian Journal of Electrical Engineering and Computer Science 18, no. 1 (2020): 478–84. http://dx.doi.org/10.11591/ijeecs.v18.i1.pp478-484.

Abstract:

Arabic is a widely used alphabetic writing system worldwide, with 28 basic letters. The alphabet was first used to write texts in Arabic, most prominently the Qur'an, the holy book of Islam. The diacritics on Arabic words and letters are not an optional extra but an essential part of the language: changing a diacritic can change both the syntax and the semantics of a word, turning it into another word. However, current research addresses the foreground image and treats diacritics as noise or secondary images, which is not suitable for Arabic handwriting, because removing the diacritics from the image loses useful features. To extract the diacritics, a region-based segmentation technique is used. Each image is measured by its region properties: first the connected components of the binary image are found, and then the best area-range measurement for that region is determined for each image. The proposed region-based technique was tested on nine images with different handwriting styles and successfully extracted the secondary foreground images (diacritics) from each image.
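
The connected-component step with an area-range filter can be sketched as follows, assuming a binary image given as a list of rows (1 = ink), 8-connectivity, and an area range supplied by hand. The paper determines the best range per image; the thresholds and the toy image here are hypothetical, and this pure-Python version is only an illustration of the technique, not the authors' implementation.

```python
from collections import deque

def connected_components(image):
    """Label 8-connected foreground (1) pixels; return one pixel list per component."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def extract_diacritics(image, min_area, max_area):
    """Keep components whose pixel area lies in [min_area, max_area]."""
    return [comp for comp in connected_components(image)
            if min_area <= len(comp) <= max_area]

# Hypothetical toy image: two marks above a long stroke, one mark below it.
IMAGE = [
    [0, 1, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
]
print(sorted(len(c) for c in extract_diacritics(IMAGE, 1, 4)))  # [1, 1, 4]
```

The 8-pixel stroke falls outside the assumed area range and is treated as part of the main body, while the three small components are kept as diacritic candidates.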

9

Ahmed, Abdalla Sheikh, Sanusi Azmi Mohd, Abd Aziz Maslita, Nasser Al-Mhiqani Mohammed, and Saleh Bafjaish Salem. "Diacritic segmentation technique for Arabic handwritten using region-based." Indonesian Journal of Electrical Engineering and Computer Science (IJEECS) 18, no. 1 (2020): 478–84. https://doi.org/10.11591/ijeecs.v18.i1.pp478-484.

10

Stankevičius, Lukas, Mantas Lukoševičius, Jurgita Kapočiūtė-Dzikienė, Monika Briedienė, and Tomas Krilavičius. "Correcting Diacritics and Typos with a ByT5 Transformer Model." Applied Sciences 12, no. 5 (2022): 2636. http://dx.doi.org/10.3390/app12052636.

Abstract:

Due to the fast pace of life and online communication, and the prevalence of English and the QWERTY keyboard, people tend to forgo diacritics and make typographical errors (typos) when typing in other languages. Restoring diacritics and correcting spelling are important for proper language use and for disambiguating texts for both humans and downstream algorithms. However, the two problems are typically addressed separately: state-of-the-art diacritics restoration methods do not tolerate other typos, and classical spellcheckers cannot deal adequately with text in which all diacritics are missing. In this work, we tackle both problems at once by employing the newly developed universal ByT5 byte-level seq2seq transformer model, which requires no language-specific model structures. For comparison, we perform diacritics restoration on benchmark datasets of 12 languages, with the addition of Lithuanian. The experimental investigation shows that our approach achieves results (>98%) comparable to the previous state of the art, despite being trained less and on less data. Our approach also restores diacritics in words not seen during training with >76% accuracy. Our simultaneous diacritics restoration and typo correction approach reaches >94% alpha-word accuracy on the 13 languages. It has no direct competitors and strongly outperforms classical spell-checking or dictionary-based approaches. We also show that all accuracies improve further with more training. Taken together, this shows the great real-world application potential of our suggested methods and their potential to extend to more data, languages, and error classes.

11

Frändén, Märit. "Krumelurer med svag ställning" [Squiggles in a weak position]. Nordic Journal of Socio-Onomastics 4, no. 2 (2024): 47–80. http://dx.doi.org/10.59589/noso.42024.15961.

Abstract:

This paper presents a quantitative study of the registration of diacritics in Hungarian, Spanish, Vietnamese, and Turkish family names in the official Swedish population records. Of the examined diacritics, vowels with accents and the Turkish letter <ç> are registered in up to 21 % of cases, whereas the Turkish letter <ü> is registered in over 90 % of occurrences. Parts of a larger interview study are also presented, in which informants share their views on diacritics in the Swedish population records.

12

Darwish, Kareem, Ahmed Abdelali, Hamdy Mubarak, and Mohamed Eldesouki. "Arabic Diacritic Recovery Using a Feature-rich biLSTM Model." ACM Transactions on Asian and Low-Resource Language Information Processing 20, no. 2 (2021): 1–18. http://dx.doi.org/10.1145/3434235.

Abstract:

Diacritics (short vowels) are typically omitted when writing Arabic text, and readers have to reintroduce them to pronounce words correctly. There are two types of Arabic diacritics: the first are core-word diacritics (CW), which specify the lexical selection, and the second are case endings (CE), which typically appear at the end of word stems and generally specify their syntactic roles. Recovering CEs is harder than recovering core-word diacritics because of inter-word dependencies, which are often distant. In this article, we use a feature-rich recurrent neural network model that uses a variety of linguistic and surface-level features to recover both core-word diacritics and case endings. Our model surpasses all previous state-of-the-art systems with a CW error rate (CWER) of 2.9% and a CE error rate (CEER) of 3.7% for Modern Standard Arabic (MSA), and a CWER of 2.2% and a CEER of 2.5% for Classical Arabic (CA). When diacritized word cores are combined with case endings, the resulting word error rates are 6.0% and 4.3% for MSA and CA, respectively. This highlights the effectiveness of feature engineering for such deep neural models.
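
The separate and combined error rates reported above can be sketched as word-level counts. This minimal version assumes (hypothetically) that each word is given as a list of per-letter diacritic labels and that the case ending is simply the label on the final letter; the abstract places case endings at the end of word stems, which this simplification ignores, and the label set is invented for the example.

```python
def error_rates(gold_words, predicted_words):
    """Word-level core-word (CW), case-ending (CE), and combined error rates.

    Simplification (an assumption): the label on a word's final letter is its
    case ending and all earlier labels form the core word.
    """
    if len(gold_words) != len(predicted_words):
        raise ValueError("gold and predicted must contain the same number of words")
    cw_errors = ce_errors = word_errors = 0
    for gold, pred in zip(gold_words, predicted_words):
        core_wrong = gold[:-1] != pred[:-1]
        ending_wrong = gold[-1] != pred[-1]
        cw_errors += core_wrong
        ce_errors += ending_wrong
        word_errors += core_wrong or ending_wrong  # a word counts once
    n = len(gold_words)
    return {"CWER": cw_errors / n, "CEER": ce_errors / n, "WER": word_errors / n}

# Hypothetical labels: a = fatha, i = kasra, u = damma, o = sukun.
GOLD = [["a", "a", "a"], ["i", "u"], ["a", "o", "u"], ["u", "a"]]
PRED = [["a", "a", "a"], ["i", "a"], ["u", "o", "u"], ["a", "i"]]
print(error_rates(GOLD, PRED))  # {'CWER': 0.5, 'CEER': 0.5, 'WER': 0.75}
```

Because a word with both kinds of error is counted once, the combined rate is at most CWER + CEER, which agrees with the reported 6.0% for MSA against 2.9% + 3.7%.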

13

Asahiah, Franklin Ọládiípọ̀, Ọdẹ́túnjí Àjàdí Ọdẹ́jọbí, and Emmanuel Rótìmí Adágúnodò. "A survey of diacritic restoration in abjad and alphabet writing systems." Natural Language Engineering 24, no. 1 (2017): 123–54. http://dx.doi.org/10.1017/s1351324917000407.

Abstract:

A diacritic is a mark placed near or through a character to alter its original phonetic or orthographic value. Many languages around the world use diacritics in their orthography, whatever writing system the orthography is based on. In many languages, diacritics are omitted either by convention or for convenience. For users who are not familiar with the text domain, the absence of diacritics is known to cause mild to serious readability and comprehension problems. For natural language processing systems, however, it causes near-intractable problems. This situation has led to extensive research on diacritization. Several techniques have been applied to diacritic restoration (or diacritization), but existing surveys of these techniques have been restricted to a few languages and have left gaps for practitioners to fill. Our survey examines diacritization from the angle of the resources deployed and the various formulations employed. It concludes by recommending (a) that any proposed diacritization technique should consider the features of the language and the purpose served by its diacritics, and (b) that evaluation metrics be defined more rigorously to allow easy comparison of model performance.

14

Kurzon, Dennis. "A brief note on diacritics." Written Language and Literacy 11, no. 1 (2008): 90–94. http://dx.doi.org/10.1075/wll.11.1.07kur.

Abstract:

This short note relates to remarks made by Peter Daniels in his 2006 article ‘On beyond alphabets’ on diacritics which he defines as markers that have a consistent phonological function in the particular script. From examples taken from French, Czech and other languages using the roman script, it is shown that the principal function of diacritics is to set up contrasts among various graphemes. This is extrapolated to the Arabic and Perso-Arabic scripts, where it is shown that the dots above and below graphemes are in effect diacritics.

15

Wells, J. C. "Orthographic diacritics and multilingual computing." Language Problems and Language Planning 24, no. 3 (2000): 249–72. http://dx.doi.org/10.1075/lplp.24.3.04wel.

Abstract:

Diacritics — marks above, through, or below letters — are used in many orthographies to remedy the shortcomings of the ordinary Latin alphabet. The author catalogues the various diacritics that are in use for spelling different languages, describing what they look like and what they are used for. He also analyses the problems of using accented letters in a multilingual computing environment, and discusses the extent to which these problems have been resolved, with particular reference to Unicode.
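
One concrete instance of the multilingual-computing problems mentioned above is that Unicode can encode the same accented letter either as a single precomposed code point or as a base letter plus a combining mark. The sketch below, using Python's standard `unicodedata` module, shows that the two spellings compare unequal until they are normalized; the helper name `same_text` is hypothetical, and the abstract does not say whether the article itself discusses normalization.

```python
import unicodedata

def same_text(a, b):
    """Compare strings after NFC normalization, so equivalent spellings match."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "\u00e9"  # é as one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)           # False: different code point sequences
print(same_text(precomposed, decomposed))  # True: canonically equivalent
print(len(precomposed), len(decomposed))   # 1 2
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 2 3
```

Normalizing text to one form (commonly NFC) before storing, searching, or sorting avoids treating visually identical accented words as different strings.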

16

Bates, Michael L. "Arabic Diacritics without Special Software." Middle East Studies Association Bulletin 22, no. 1 (1988): 34–36. http://dx.doi.org/10.1017/s0026318400019519.

17

Henton, Caroline G. "5. Individual symbols and diacritics." Journal of the International Phonetic Association 18, no. 2 (1988): 85–94. http://dx.doi.org/10.1017/s0025100300003686.

Abstract:

In preparation for the IPA Kiel Convention in 1989, this report summarizes preliminary discussions of possible improvements and changes to individual symbols and diacritics in the International Phonetic Alphabet. It details responses to a questionnaire sent out on 22 October 1988. Specific questions were suggested by a number of sources: my own paper (Henton 1987), reactions to that paper, and discussions with other phoneticians. The questionnaire was written with a view to eliciting simple and democratic input, rather than to promoting certain symbols or suggestions over others. It was sent to 38 members of the Association who rated the section on ‘Individual symbols and diacritics’ 1–5 (on a scale of 17, with 1 being highest preference) in their responses to the invitation to revise the International Phonetic Association's alphabet. Eighty members originally replied to the call for input to the Kiel meeting, so the 38 who placed this section high in their preferences are a good portion (47%) of members actively concerned with this Convention.

18

Ball, Martin J. "On the status of diacritics." Journal of the International Phonetic Association 31, no. 2 (2001): 259–64. http://dx.doi.org/10.1017/s0025100301002067.

Abstract:

In this article we note that diacritics, both in terms of their definition by the IPA, and in studies of transcriber reliability, are treated as a single group. Further, they are usually treated as being used purely to refine the meaning of a sound and, as such, as having less status phonetically than full symbols. It is argued here that diacritics should be classified into at least two major categories, and it is shown how one of these categories is the equivalent of a ‘full’ symbol. Apart from the implications this has for reliability measures, it is argued in conclusion that a more neutral definition of diacritic by the IPA is required.

19

Alloa, Emmanuel. "Abstract: Flesh as embodied Diacritics." Chiasmi International 11 (2009): 262. http://dx.doi.org/10.5840/chiasmi20091144.

20

Hifny, Yasser. "Open Vocabulary Arabic Diacritics Restoration." IEEE Signal Processing Letters 26, no. 10 (2019): 1421–25. http://dx.doi.org/10.1109/lsp.2019.2933721.

21

Azmi, Aqil M., Rehab M. Alnefaie, and Hatim A. Aboalsamh. "Light Diacritic Restoration to Disambiguate Homographs in Modern Arabic Texts." ACM Transactions on Asian and Low-Resource Language Information Processing 21, no. 3 (2022): 1–14. http://dx.doi.org/10.1145/3486675.

Abstract:

Diacritic restoration (also known as diacritization or vowelization) is the process of inserting the correct diacritical marks into a text. Modern Arabic, for example in newspapers, is typically written without diacritics. This lack of diacritical marks often causes ambiguity, and although native readers are adept at resolving it, they sometimes fail. Diacritic restoration is a classical problem in computer science. Whereas most work tackles the full (heavy) diacritization of text, we are interested in diacritizing text with fewer diacritics. Studies have shown that fully diacritized text is visually displeasing and slows down reading. This article proposes a system that diacritizes homographs using the least number of diacritics, hence the name “light.” A large class of words falls under the homograph category; we deal with the class of words that share their spelling but not their meaning. With fewer diacritics, we expect no effect on reading speed, while eye strain is reduced. The system combines a morphological analyzer with context similarity: the morphological analyzer generates all diacritized candidates for a word, and then a statistical approach and context similarity resolve the homographs. Experimentally, the system shows very promising results, with a best accuracy of 85.6%.

22

van Compernolle, Rémi A. "Use and variation of French diacritics on an Internet dating site." Journal of French Language Studies 21, no. 2 (2010): 131–48. http://dx.doi.org/10.1017/s0959269510000293.

Abstract:

This article explores sociolinguistic variation in the use of French accents and diacritics (where they would be expected in the formal written language) in electronic personals posted on an Internet dating site. Results of a series of VARBRUL analyses show both age (i.e., 18–25 years vs. 36–45 years) and gender to be significant social variables in accent/diacritic variation. Three potential explanations of the variation are addressed: (i) the eventual loss of accents and diacritics in computer-mediated French, (ii) stable variation with differences between ‘digital natives’ and ‘digital immigrants’, and (iii) a change in progress whereby accents and diacritics are becoming ‘prestige variants’ in computer-mediated contexts.

23

Nazih, Waleed, and Yasser Hifny. "Arabic Syntactic Diacritics Restoration Using BERT Models." Computational Intelligence and Neuroscience 2022 (October 30, 2022): 1–8. http://dx.doi.org/10.1155/2022/3214255.

Abstract:

The Arabic syntactic diacritics restoration problem is often solved using long short-term memory (LSTM) networks. Handcrafted features are used to augment these LSTM networks or taggers to improve performance. A transformer-based machine learning technique known as bidirectional encoder representations from transformers (BERT) has become the state-of-the-art method for natural language understanding in recent years. In this paper, we present a novel tagger based on BERT models to restore Arabic syntactic diacritics. We formulate syntactic diacritics restoration as a token sequence classification task similar to named-entity recognition (NER). Using the Arabic TreeBank (ATB) corpus, the developed BERT tagger achieves an absolute improvement of 1.36% in case-ending error rate (CEER) over other systems.

24

Alsayadi, Hamzah A., Abdelaziz A. Abdelhamid, Islam Hegazy, and Zaki T. Fayed. "Non-diacritized Arabic speech recognition based on CNN-LSTM and attention-based models." Journal of Intelligent & Fuzzy Systems 41, no. 6 (2021): 6207–19. http://dx.doi.org/10.3233/jifs-202841.

Abstract:

Arabic has a set of sound letters called diacritics, and these diacritics play an essential role in the meaning and articulation of words. Changing some diacritics changes the meaning of the sentence. However, the presence of these marks in the corpus transcription affects the accuracy of speech recognition. In this paper, we investigate the effect of diacritics on Arabic speech recognition based on end-to-end deep learning. The applied end-to-end approach combines CNN-LSTM and attention-based techniques in the state-of-the-art framework Espresso, which uses PyTorch. To the best of our knowledge, CNN-LSTM with attention has not previously been used for Arabic automatic speech recognition (ASR). To fill this gap, this paper proposes a new approach based on CNN-LSTM with attention for Arabic ASR. The language model in this approach is trained using RNN-LM and LSTM-LM on the non-diacritized transcription of the speech corpus. The Standard Arabic Single Speaker Corpus (SASSC), with its diacritics omitted, is used to train and test the deep learning model. Experimental results show that removing diacritics decreased the out-of-vocabulary rate and the perplexity of the language model. In addition, the word error rate (WER) improves significantly compared with diacritized data. The achieved average reduction in WER is 13.52%.

25

Kurzon, Dennis. "Diacritics and the Perso-Arabic script." Writing Systems Research 5, no. 2 (2013): 234–43. http://dx.doi.org/10.1080/17586801.2013.799451.

26

Abed, Sa’ed, Mohammad Alshayeji, and Sari Sultan. "Diacritics Effect on Arabic Speech Recognition." Arabian Journal for Science and Engineering 44, no. 11 (2019): 9043–56. http://dx.doi.org/10.1007/s13369-019-04024-0.

27

Masmoudi, Abir, Salima Mdhaffar, Rahma Sellami, and Lamia Hadrich Belguith. "Automatic Diacritics Restoration for Tunisian Dialect." ACM Transactions on Asian and Low-Resource Language Information Processing 18, no. 3 (2019): 1–18. http://dx.doi.org/10.1145/3297278.

28

Bunčić, Daniel. "On the etymology of diacritics in general and the origin of the Czech diacritics in particular." Slavia 92, no. 4 (2023): 385–424. http://dx.doi.org/10.58377/slav.2023.4.01.

29

Abbad, Hamza, and Shengwu Xiong. "Simple Extensible Deep Learning Model for Automatic Arabic Diacritization." ACM Transactions on Asian and Low-Resource Language Information Processing 21, no. 2 (2022): 1–16. http://dx.doi.org/10.1145/3480938.

Abstract:

Automatic diacritization is an Arabic natural language processing task framed as sequence labeling, where the letters are the sequence elements and the diacritics are the labels. A letter can carry from zero up to two diacritics. The dataset used was a subset of the preprocessed version of the Tashkeela corpus. We developed a deep learning model composed of a stack of four bidirectional long short-term memory hidden layers of the same size, with an output layer at every level. The levels correspond to the groups into which we classified the diacritics (short vowels, double case-endings, Shadda, and Sukoon). Before training, the data were divided into input vectors containing letter indexes and output vectors containing the indexes of the diacritics by group. The input and output vectors are concatenated, and then an overlapping sliding-window operation is performed to generate continuous, fixed-size data, which is used for both training and evaluation. Finally, we run tests using the standard metrics with all of their variations and compare our results with two recent state-of-the-art works. Our model achieved a 3% diacritization error rate and an 8.99% word error rate when including all letters. We also generated the confusion matrix to show the performance per output and analyzed the mismatches in the first 500 lines to classify the model errors by their linguistic nature.
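
The data preparation described above (letters as the sequence, diacritics as per-letter labels split into groups, then overlapping fixed-size windows) can be sketched as follows. This assumes the Unicode harakat U+064B to U+0652 as the mark set and a hypothetical window size and stride; it is not the authors' code, and the BiLSTM model itself is not shown.

```python
# Assumed mark set: Arabic harakat U+064B..U+0652, mapped to four groups.
# Other marks (e.g. superscript alef U+0670) are treated as letters here.
GROUPS = {
    "\u064E": "short_vowel", "\u064F": "short_vowel", "\u0650": "short_vowel",  # fatha, damma, kasra
    "\u064B": "tanween", "\u064C": "tanween", "\u064D": "tanween",              # double case endings
    "\u0651": "shadda",
    "\u0652": "sukun",
}

def to_letters_and_labels(text):
    """Split diacritized text into base letters and the marks on each letter."""
    letters, labels = [], []
    for ch in text:
        if ch in GROUPS:
            if letters:  # attach the mark to the preceding letter
                labels[-1] += ch
        else:
            letters.append(ch)
            labels.append("")  # a letter may end up with zero, one, or two marks
    return letters, labels

def split_into_groups(label):
    """One output per diacritic group, as in a multi-output tagger."""
    out = {"short_vowel": "", "tanween": "", "shadda": "", "sukun": ""}
    for mark in label:
        out[GROUPS[mark]] = mark
    return out

def sliding_windows(seq, size, stride):
    """Fixed-size windows; stride < size makes consecutive windows overlap."""
    if len(seq) <= size:
        return [list(seq)]
    starts = list(range(0, len(seq) - size + 1, stride))
    if starts[-1] + size < len(seq):
        starts.append(len(seq) - size)  # make sure the tail is covered
    return [list(seq[s:s + size]) for s in starts]

# "muhammadun": mim+damma, ha+fatha, mim+shadda+fatha, dal+dammatan
word = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F\u064C"
letters, labels = to_letters_and_labels(word)
print(len(letters))                                             # 4
print(split_into_groups(labels[2])["shadda"] == "\u0651")       # True
print(len(sliding_windows(list(range(11)), size=4, stride=3)))  # 4
```

The third letter shows the two-mark case: its label holds both a shadda and a fatha, which the grouping step routes to two separate outputs.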

30

Alshammari, Hamed, and Khaled Elleithy. "Toward Robust Arabic AI-Generated Text Detection: Tackling Diacritics Challenges." Information 15, no. 7 (2024): 419. http://dx.doi.org/10.3390/info15070419.

Abstract:

Current AI detection systems often struggle to distinguish between Arabic human-written text (HWT) and AI-generated text (AIGT) due to the small marks present above and below the Arabic text called diacritics. This study introduces robust Arabic text detection models using Transformer-based pre-trained models, specifically AraELECTRA, AraBERT, XLM-R, and mBERT. Our primary goal is to detect AIGTs in essays and overcome the challenges posed by the diacritics that usually appear in Arabic religious texts. We created several novel datasets with diacritized and non-diacritized texts comprising up to 9666 HWT and AIGT training examples. We aimed to assess the robustness and effectiveness of the detection models on out-of-domain (OOD) datasets to assess their generalizability. Our detection models trained on diacritized examples achieved up to 98.4% accuracy compared to GPTZero’s 62.7% on the AIRABIC benchmark dataset. Our experiments reveal that, while including diacritics in training enhances the recognition of the diacritized HWTs, duplicating examples with and without diacritics is inefficient despite the high accuracy achieved. Applying a dediacritization filter during evaluation significantly improved model performance, achieving optimal performance compared to both GPTZero and the detection models trained on diacritized examples but evaluated without dediacritization. Although our focus was on Arabic due to its writing challenges, our detector architecture is adaptable to any language.
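
A dediacritization filter of the kind applied during evaluation can be as simple as deleting Arabic diacritic code points before text reaches the classifier. The abstract does not say which marks the authors remove, so the set below (harakat U+064B to U+0652 plus superscript alef U+0670) is an assumption, and `dediacritize` is a hypothetical name.

```python
import re

# Assumed mark set: harakat U+064B..U+0652 plus superscript alef U+0670.
ARABIC_MARKS = re.compile("[\u064B-\u0652\u0670]")

def dediacritize(text):
    """Remove Arabic diacritics, leaving base letters and other text intact."""
    return ARABIC_MARKS.sub("", text)

# kataba (with three fathas) becomes the bare consonant skeleton ktb.
print(dediacritize("\u0643\u064E\u062A\u064E\u0628\u064E") == "\u0643\u062A\u0628")  # True
```

Applying the same filter to every evaluation input means diacritized and undiacritized versions of a text reach the detector in the same form.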

31

de Voogt, Alex. "A Paleographic Analysis of Swahili-Arabic Script through Thirteen Poems." Journal of Islamic Manuscripts 16, no. 1 (2025): 97–113. https://doi.org/10.1163/1878464x-01601001.

Abstract:

A paleographic analysis of a set of thirteen poems composed by Muyaka bin Haji (1776–1840) but written in Swahili-Arabic script by Mwalimu Sikujua in the 1890s reveals a consistent preference in the placement of script-specific diacritics and in the use of certain letterform combinations, but variation in the choice of graphic forms, particularly those for kāf. Sikujua places miniature consonant signs between a consonant letterform and a potential vowel diacritic. Since the miniature consonants are rendered in red, the scribe needs to alternate pens while writing the text or leave sufficient space if they are inserted later. Letterform combinations appear limited to those with mīm, jīm, or yāʾ. This characteristic may distinguish Sikujua’s handwriting from that of others. The paleographic consequences of these combinations mainly concern the placement of vowel diacritics, which are placed in relation to the consonant sign rather than the baseline of the overall text. Whereas miniature consonants and letterform combinations show a consistent placement of the Swahili-Arabic diacritics in this corpus, a third kāf graphic form adds variation. The kāf graphic forms are in free variation, and the accompanying vowel diacritics have varying placements. The third kāf graphic form is possibly unique to Sikujua, who has also shown creative variations in spelling when using Arabic or Swahili-Arabic script for the Swahili language.

32

Zayyan, Ayman A., Mohamed Elmahdy, Husniza binti Husni, and Jihad Al Ja’am. "Automatic Diacritics Restoration for Dialectal Arabic Text." International Journal of Computing and Information Sciences 12, no. 2 (2016): 159–65. http://dx.doi.org/10.21700/ijcis.2016.119.

33

Nazir, S., and A. Javed. "Diacritics Recognition Based Urdu Nastalique OCR System." Nucleus 51, no. 3 (2014): 361–67. https://doi.org/10.71330/thenucleus.2014.691.

Abstract:

Improvements and new developments in artificial intelligence have opened new horizons for advancing machines whose intelligence was originally limited. Compared with the human brain, machines already have better computational speed and storage, but there is still much room to improve their ability to acquire and process data and draw conclusions from it on their own. Optical character recognition (OCR) deals with printed and handwritten text. Much progress has been made in OCR for Latin, Asian, Arabic, and Western scripts, but work on Urdu is almost non-existent by comparison. One of the main reasons is the extremely complex characters of the Nastalique style used for Urdu. This research presents a methodology for recognizing and processing the diacritics of Nastalique script. The proposed technique is effective in recognizing cursive text with an invariant font size of 48. Testing on a dataset of 6728 main Urdu Nastalique ligatures shows that the technique recognizes Nastalique ligatures with an accuracy of 97.40%. The work also aims to improve the existing base mark association process of the Urdu OCR system.
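
Base mark association means attaching each detached diacritic to the ligature body it belongs to. The abstract does not give the paper's procedure, so the following Python sketch uses an assumed geometric heuristic over connected-component bounding boxes: the largest horizontal overlap wins, with the horizontally nearest centre as tie-breaker. `Box`, `horizontal_overlap`, and `associate_marks` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of a connected component, in pixel coordinates."""
    left: int
    right: int
    top: int
    bottom: int

    @property
    def center_x(self) -> float:
        return (self.left + self.right) / 2


def horizontal_overlap(a: Box, b: Box) -> int:
    """Width of the horizontal span shared by two boxes (0 if disjoint)."""
    return max(0, min(a.right, b.right) - max(a.left, b.left))


def associate_marks(bases: list[Box], marks: list[Box]) -> list[int]:
    """Assign each diacritic (mark) to the index of a main body (base).

    Assumed heuristic: largest horizontal overlap wins; ties, including the
    no-overlap case, go to the base whose centre is horizontally closest.
    """
    if not bases:
        raise ValueError("need at least one base component")
    return [
        max(
            range(len(bases)),
            key=lambda i: (
                horizontal_overlap(mark, bases[i]),
                -abs(mark.center_x - bases[i].center_x),
            ),
        )
        for mark in marks
    ]


bases = [Box(0, 50, 20, 60), Box(60, 120, 20, 60)]
marks = [Box(10, 20, 5, 12), Box(90, 100, 65, 70)]
print(associate_marks(bases, marks))  # [0, 1]
```

Nastalique ligatures slope diagonally, so marks often overlap several bodies or none; a production system would likely add vertical distance and ligature context to this heuristic.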

34

Lutf, Mohammed, Xinge You, Yiu-ming Cheung, and C. L. Philip Chen. "Arabic font recognition based on diacritics features." Pattern Recognition 47, no. 2 (2014): 672–84. http://dx.doi.org/10.1016/j.patcog.2013.07.015.

35

Kinoshita, Sachiko, Annabel Amos, and Dennis Norris. "Diacritic priming in novice readers of diacritics." Journal of Experimental Psychology: Human Perception and Performance 49, no. 3 (2023): 370–83. http://dx.doi.org/10.1037/xhp0001084.

36

Braun, Almut. "IPAtranscriptor: A Python program for narrow phonetic transcription for blind and sighted linguists." Journal of the International Phonetic Association 50, no. 2 (2018): 193–98. http://dx.doi.org/10.1017/s0025100318000233.

Abstract:

IPAtranscriptor is a tool for creating narrow phonetic transcriptions. As it connects to the computer's default text-to-speech engine on demand, the program can be used not only by sighted but also by partially sighted and blind individuals. Sighted users can choose whether they prefer the mouse or the keyboard as their input device. In contrast to other programs, the full set of symbols and diacritics of the International Phonetic Alphabet (IPA) is implemented and users can produce very narrow phonetic transcriptions as they can insert up to three diacritics above and three diacritics below each IPA symbol to modify it. Furthermore, the program can facilitate the collaboration between blind and sighted phoneticians (or students of linguistics in general) since they can easily exchange their phonetic transcriptions. A conversion of the transcriptions is not necessary as all transcribers can use the same system regardless of their visual abilities. IPAtranscriptor is freely available online and is believed to be the first audio-based program for narrow phonetic transcription that can be used by blind and sighted phoneticians.
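
In Unicode, IPA diacritics are combining characters that follow the base symbol, and each has a canonical combining class. The abstract does not describe how IPAtranscriptor stores transcriptions; as an illustration only, this Python sketch counts marks above (class 230) and below (class 220) to check the three-above, three-below limit. The function names are hypothetical, and marks with other classes (for example overlays, or U+031A with class 232) are not counted.

```python
import unicodedata

# Canonical combining classes (Unicode Character Database):
# 230 = mark placed above the base, 220 = mark placed below it.
ABOVE, BELOW = 230, 220


def count_marks(cluster: str) -> tuple[int, int]:
    """Count combining marks above and below the base symbol of one cluster."""
    classes = [unicodedata.combining(ch) for ch in cluster]
    return classes.count(ABOVE), classes.count(BELOW)


def within_limit(cluster: str, limit: int = 3) -> bool:
    """True if the cluster has at most `limit` marks above and `limit` below."""
    above, below = count_marks(cluster)
    return above <= limit and below <= limit


# Usage: voiceless [n] (ring below, U+0325) plus dental (bridge below, U+032A).
print(count_marks("n\u0325\u032a"))  # (0, 2)
print(within_limit("e\u0303\u0308\u0304\u0306"))  # False: four marks above
```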

37

Фёдорова, Людмила Львовна. "Диакритики в графической системе языка: нужно ли проставлять точки в ё?" [Diacritics in the graphic system of a language: should the dots be written over ё?]. Język i Metoda 7 (2021): 301–11. http://dx.doi.org/10.4467/23919981jm.21.028.14260.

Abstract:

The paper sketches a typological classification of diacritics according to their role in the graphic system and their position in writing, and considers the diacritics of Russian writing. The research focuses on the letter ё, highlighting the traditions and practices of its use. A survey reveals current trends in the use of ё, based on young people's attitudes toward it, and the factors that influence whether ё is written in different styles (formal or informal) and techniques (handwritten, or printed on a computer or other device). The ideological potential of the letter ё is also noted.
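
The dots of ё are themselves a diacritic: Unicode defines ё (U+0451) as canonically equivalent to е (U+0435) followed by COMBINING DIAERESIS (U+0308). A short Python sketch of writing without the dots, built on that decomposition; `drop_yo_dots` is a hypothetical name, and the sketch takes no position on the usage question the paper surveys:

```python
import unicodedata

DIAERESIS = "\u0308"                # COMBINING DIAERESIS
CYRILLIC_IE = {"\u0435", "\u0415"}  # е, Е


def drop_yo_dots(text: str) -> str:
    """Write ё/Ё as е/Е, including ё typed as е + combining diaeresis."""
    # NFD splits ё (U+0451) into е (U+0435) + U+0308, and Ё likewise.
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch == DIAERESIS and out and out[-1] in CYRILLIC_IE:
            continue  # drop only the dots that sit on Cyrillic е/Е
        out.append(ch)
    # NFC recomposes everything else, e.g. й (и + breve) and Latin ï.
    return unicodedata.normalize("NFC", "".join(out))


print(unicodedata.normalize("NFD", "\u0451") == "\u0435\u0308")  # True
print(drop_yo_dots("ёлка"))  # елка
```

Working on the NFD form also catches ё typed as е plus a separate combining diaeresis, which a plain `str.replace("ё", "е")` would miss.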

38

Tayyeh, Huda Kadhim, Mohammed Salih Mahdi, and Ahmed Sabah Ahmed AL-Jumaili. "Novel steganography scheme using Arabic text features in Holy Quran." International Journal of Electrical and Computer Engineering (IJECE) 9, no. 3 (2019): 1910–18. https://doi.org/10.11591/ijece.v9i3.pp1910-1918.

Abstract:

With the rapid growth of the Internet and mobile devices, the need for hidden communication has increased significantly. Steganography is a technique for establishing hidden communication. Most steganography techniques have been applied to audio, images, videos, and text. Many researchers have applied steganography to Arabic texts by adding, editing, or changing letters or diacritics, but these changes produce noticeable, suspicious text. In this paper, we propose two novel steganography algorithms for Arabic text that use the Holy Quran as cover text. Because it is forbidden to add, edit, or change any letter or diacritic in the Holy Quran, it serves as a robust cover in which tampering is difficult. The algorithms hide secret message elements within Arabic letters by exploiting the existence of sun letters (Arabic: ḥurūf shamsīyah) and moon letters (ḥurūf qamarīyah). We also take into account the small vowel letters of Arabic (Arabic diacritics). Our experiments show that the two proposed algorithms achieve high capacity for text files. The algorithms are robust against attack because the changes to the cover text are imperceptible, so our contribution offers a more secure algorithm with good capacity.
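
The algorithms rely on the sun/moon letter distinction: after the definite article ال, the lam assimilates to a following sun letter (الشمس, ash-shams) but is pronounced before a moon letter (القمر, al-qamar). A minimal Python sketch of that classification, assuming diacritics are stripped first and hamza-bearing alefs count as moon letters; `article_class` is a hypothetical name, and this is not the authors' embedding algorithm:

```python
import re
from typing import Optional

DIACRITICS = re.compile("[\u064B-\u0652\u0670]")  # assumed harakat range
ARTICLE = "\u0627\u0644"  # ال

# The 14 sun letters: ت ث د ذ ر ز س ش ص ض ط ظ ل ن
SUN_LETTERS = set("\u062a\u062b\u062f\u0630\u0631\u0632\u0633\u0634"
                  "\u0635\u0636\u0637\u0638\u0644\u0646")
# The 14 moon letters: ا (with آ أ إ) ب ج ح خ ع غ ف ق ك م ه و ي
MOON_LETTERS = set("\u0627\u0622\u0623\u0625\u0628\u062c\u062d\u062e"
                   "\u0639\u063a\u0641\u0642\u0643\u0645\u0647\u0648\u064a")


def article_class(word: str) -> Optional[str]:
    """Return "sun" or "moon" for a word starting with ال, else None."""
    bare = DIACRITICS.sub("", word)  # e.g. drop the sukun in الْقَمَر
    if not bare.startswith(ARTICLE) or len(bare) < 3:
        return None
    third = bare[2]
    if third in SUN_LETTERS:
        return "sun"
    if third in MOON_LETTERS:
        return "moon"
    return None


print(article_class("\u0627\u0644\u0634\u0645\u0633"))  # sun (الشمس)
print(article_class("\u0627\u0644\u0642\u0645\u0631"))  # moon (القمر)
```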

39

Hssini, Mohamed. "Problem of Multiple Diacritics Design for Arabic Script." IOSR Journal of Engineering 02, no. 12 (2012): 48–53. http://dx.doi.org/10.9790/3021-021234853.

40

Woodard, Marion. "The use of diacritics for visual articulatory behaviours." International Journal of Language & Communication Disorders 26, no. 1 (1991): 125–28. http://dx.doi.org/10.3109/13682829109011996.

41

Chetail, Fabienne, and Emeline Boursain. "Shared or separated representations for letters with diacritics?" Psychonomic Bulletin & Review 26, no. 1 (2018): 347–52. http://dx.doi.org/10.3758/s13423-018-1503-0.

42

Protopapas, Athanassios, and Svetlana Gerakaki. "Development of Processing Stress Diacritics in Reading Greek." Scientific Studies of Reading 13, no. 6 (2009): 453–83. http://dx.doi.org/10.1080/10888430903034788.

43

Ali, Rawaa Hamza, Ban Nadeem Dhannoon, and Mohamed Iedan Hamel. "Arabic text steganography using lunar and solar diacritics." Indonesian Journal of Electrical Engineering and Computer Science 31, no. 3 (2023): 1559–67. http://dx.doi.org/10.11591/ijeecs.v31.i3.pp1559-1567.

Abstract:

The need to hide essential information has grown rapidly along with mobile devices and the internet. Steganography is a method for establishing hidden communication, and methods have recently been developed to hide important information using text steganography. This study exploits the possibility of concealing data in the diacritics that follow the two letters (ال) in the cover text. We propose a new steganography algorithm that uses Arabic text as the cover text. After pre-processing the cover text, the algorithm hides the elements of the secret message inside Arabic letters by adding appropriate diacritics (such as Hamzah Al-Wasl) to the extracted words beginning with (ال), according to the type of their third letter (solar or lunar). The algorithm determines the length of the secret message so that the intended recipient can extract the hidden message accurately. The proposed algorithm is robust against attack because the change to the cover text is small and imperceptible. On the other hand, because Arabic text is used as the cover, the embedding capacity depends on the number of words beginning with the definite article (ال).
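
The abstract states that embedding capacity depends on how many words in the cover text begin with (ال). The Python sketch below gives a rough estimate; the rate of one bit per carrier word and the 16-bit length header are assumptions for illustration (the abstract says the message length is determined for the recipient but gives no format), and the helper names are hypothetical.

```python
import re

DIACRITICS = re.compile("[\u064B-\u0652\u0670]")  # assumed harakat range
ARTICLE = "\u0627\u0644"  # ال


def carrier_words(cover_text: str) -> list[str]:
    """Whitespace tokens that start with ال and have at least one more letter."""
    carriers = []
    for token in cover_text.split():
        bare = DIACRITICS.sub("", token)
        if bare.startswith(ARTICLE) and len(bare) > len(ARTICLE):
            carriers.append(token)
    return carriers


def max_payload_bits(cover_text: str, bits_per_word: int = 1,
                     header_bits: int = 16) -> int:
    """Bits left for the message after a length header (both rates assumed)."""
    capacity = len(carrier_words(cover_text)) * bits_per_word
    return max(0, capacity - header_bits)


# ذهب الولد إلى المدرسة ("the boy went to school")
text = ("\u0630\u0647\u0628 \u0627\u0644\u0648\u0644\u062f "
        "\u0625\u0644\u0649 \u0627\u0644\u0645\u062f\u0631\u0633\u0629")
print(len(carrier_words(text)))  # 2
```

Whitespace tokenization skips words where ال follows an attached prefix such as و (والكتاب), so such words are not counted here.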

44

Ali, Rawaa Hamza, Ban Nadeem Dhannoon, and Mohamed Iedan Hamel. "Arabic text steganography using lunar and solar diacritics." Indonesian Journal of Electrical Engineering and Computer Science 31, no. 3 (2023): 1559–67. https://doi.org/10.11591/ijeecs.v31.i3.pp1559-1567.

Abstract:

The need to hide essential information has grown rapidly along with mobile devices and the internet. Steganography is a method for establishing hidden communication, and methods have recently been developed to hide important information using text steganography. This study exploits the possibility of concealing data in the diacritics that follow the two letters (ال) in the cover text. We propose a new steganography algorithm that uses Arabic text as the cover text. After pre-processing the cover text, the algorithm hides the elements of the secret message inside Arabic letters by adding appropriate diacritics (such as Hamzah Al-Wasl) to the extracted words beginning with (ال), according to the type of their third letter (solar or lunar). The algorithm determines the length of the secret message so that the intended recipient can extract the hidden message accurately. The proposed algorithm is robust against attack because the change to the cover text is small and imperceptible. On the other hand, because Arabic text is used as the cover, the embedding capacity depends on the number of words beginning with the definite article (ال).

45

Mischler, Ælfwine. "Using diacritics in SKY Index Professional version 8." Indexer 43, no. 2 (2025): 159–70. https://doi.org/10.3828/index.2025.17.

Abstract:

This technical article explains how to use diacritics (modifying marks or signs added to characters) in version 8 of SKY Index Professional, one of the popular programs used by book indexers worldwide. It covers the Character Map and how to set up the Translation Manager to change combinations of characters and enter characters that are not on the keyboard. This article may be particularly useful for indexers working with non-English-language material.

46

Bucholtz, Mary. "White affects and sociolinguistic activism." Language in Society 47, no. 3 (2018): 350–54. http://dx.doi.org/10.1017/s0047404518000271.

Abstract:

This year, undergraduates in my class ‘Language, race, and ethnicity’ carried out collaborative sociolinguistic activism projects addressing a range of issues in our community, such as racist street signs and California's ban on diacritics in personal names on official documents. Despite my and my teaching assistants’ explicit instructions that the projects should aim to effect some tangible change—the replacement of the street signs, the legalization of diacritics—many students focused instead on the more amorphous goal of ‘raising awareness’ of these issues on our campus and in the local community. As we explained, while raising the public profile of a social injustice is a necessary step toward changing it, this act alone cannot bring about change.

47

Mielke, Jeff. "Visualizing phonetic segment frequencies with density-equalizing maps." Journal of the International Phonetic Association 48, no. 2 (2017): 129–54. http://dx.doi.org/10.1017/s0025100317000123.

Abstract:

A method is demonstrated for creating density-equalizing maps of IPA consonant and vowel charts, where the size of a cell in the chart reflects information such as the crosslinguistic frequency of the consonant or vowel. Transforming the IPA charts in such a way allows the visualization of interactions between phonetic features. Density-equalizing maps are used to illustrate a range of facts about consonant and vowel inventories, including the frequency of consonants and vowels and the frequency of common diacritics, and to illustrate the frequency of deletion and epenthesis involving particular consonants and vowels. Solutions are proposed for issues involving genealogical sampling, counting pairs of very similar phones, and counting diacritics in relation to basic symbols.

48

Kaur, Maninder, and Manjeet Kaur. "An Efficient OCR System based on the Regional Feature using the ASVM as Classifier." International Journal of Trend in Scientific Research and Development 1, no. 5 (2017): 1226–32. https://doi.org/10.31142/ijtsrd2425.

Abstract:

In image processing, poor handwriting sometimes leaves a gap between a diacritic and its character, or between a diacritic and the header line. These gaps create small text blocks that cause improper text-line segmentation and overlapping, which lead to wrong results and degrade the accuracy of the algorithm. The proposed work uses an adaptive SVM (ASVM) classifier to improve the accuracy of the system.

49

Tayyeh, Huda Kadhim, Mohammed Salih Mahdi, and Ahmed Sabah Ahmed AL-Jumaili. "Novel steganography scheme using Arabic text features in Holy Quran." International Journal of Electrical and Computer Engineering (IJECE) 9, no. 3 (2019): 1910. http://dx.doi.org/10.11591/ijece.v9i3.pp1910-1918.

Abstract:

With the rapid growth of the Internet and mobile devices, the need for hidden communication has increased significantly. Steganography is a technique for establishing hidden communication. Most steganography techniques have been applied to audio, images, videos, and text. Many researchers have applied steganography to Arabic texts by adding, editing, or changing letters or diacritics, but these changes produce noticeable, suspicious text. In this paper, we propose two novel steganography algorithms for Arabic text that use the Holy Quran as cover text. Because it is forbidden to add, edit, or change any letter or diacritic in the Holy Quran, it serves as a robust cover in which tampering is difficult. The algorithms hide secret message elements within Arabic letters by exploiting the existence of sun letters (Arabic: ḥurūf shamsīyah) and moon letters (ḥurūf qamarīyah). We also take into account the small vowel letters of Arabic (Arabic diacritics). Our experiments show that the two proposed algorithms achieve high capacity for text files. The algorithms are robust against attack because the changes to the cover text are imperceptible, so our contribution offers a more secure algorithm with good capacity.

50

Malik, Abdul Sattar. "The Justification of Urdu Letters with Similar Sounds and Diacritics." Noor e Tahqeeq 8, no. 1 (2024): 109–18. http://dx.doi.org/10.54692/nooretahqeeq.2024.08012155.

Abstract:

Urdu script is derived from the Arabic script, but Urdu differs from Arabic in nature: unlike in Arabic, some Urdu letters are pronounced alike. For this reason, some experts object to the Urdu script and suggest removing the letters with similar sounds. But removing these letters would create complexities and difficulties that cannot be resolved. These eight specific sounds of Urdu are common to Pakistani languages such as Punjabi, Sindhi, Pashto, Balochi, Kashmiri, and other local languages. How could these words be excluded from all these languages? These sounds are a valuable heritage shared with world languages such as Persian and Arabic, and through this connection Urdu emerges as a great global and scientific language. Removing these letters would cause Urdu a great loss of knowledge that could not be compensated. In short, this change is not feasible given Urdu's historical background and its present linguistic and geographical situation. The Urdu script is non-diacritical by nature, and diacritics in Urdu are not arranged as they are in Arabic; however, the essential diacritics need to be organized in basic textbooks. This article analyses in detail the justification for Urdu letters with similar sounds and for diacritics.
