Journal articles on the topic 'POS tagger'

Consult the top 50 journal articles for your research on the topic 'POS tagger.'

1

Elhadi, Mohamed Taybe, and Ramadan Sayad Alfared. "A Grammatically and Structurally Based Part of Speech (POS) Tagger for Arabic Language." International Journal on Natural Language Computing 11, no. 5 (2022): 17–29. http://dx.doi.org/10.5121/ijnlc.2022.11502.

Abstract:
In this paper we report on an experimental, syntactically and morphologically driven, rule-based Arabic tagger. The tagger is developed using the grammatical rules of the Arabic language. It requires no pre-tagged text and is built on a primitive set of lexicon items along with extensive grammatical and structural rules. It is tested and compared to the Stanford tagger in terms of both accuracy and performance (speed). The results obtained are quite comparable to the Stanford tagger's performance, with a marginal difference favoring the developed tagger in accuracy with huge difference in ter
2

Mohamed, Taybe Elhadi, and Sayad Alfared Ramadan. "A Grammatically and Structurally Based Part of Speech (POS) Tagger for Arabic Language." International Journal on Natural Language Computing (IJNLC) 11, no. 5 (2022): 13. https://doi.org/10.5281/zenodo.7332965.

Abstract:
In this paper we report on an experimental, syntactically and morphologically driven, rule-based Arabic tagger. The tagger is developed using the grammatical rules of the Arabic language. It requires no pre-tagged text and is built on a primitive set of lexicon items along with extensive grammatical and structural rules. It is tested and compared to the Stanford tagger in terms of both accuracy and performance (speed). The results obtained are quite comparable to the Stanford tagger's performance, with a marginal difference favoring the developed tagger in accuracy with huge difference in ter
3

Goh, Thing Thing, Nor Azliana Akmal Jamaludin, Hassan Mohamed, Mohd Nazri Ismail, and Huang Shen Chua. "A Comparative Study on Part-of-Speech Taggers’ Performance on Examination Questions Classification According to Bloom’s Taxonomy." Journal of Physics: Conference Series 2224, no. 1 (2022): 012001. http://dx.doi.org/10.1088/1742-6596/2224/1/012001.

Abstract:
Examination question classification according to Bloom’s Taxonomy uses a Natural Language Processing (NLP) approach, a series of text-processing steps that can generally be divided into a keyword identification stage and a stage that classifies the identified keywords into Bloom’s Taxonomy levels. Since this NLP approach is a pipeline, the accuracy of the keyword identification stage affects the subsequent keyword classification stage and thus limits the final accuracy of the question classification. The keywor
4

Pradiptha, I. Gde Made Hendra, and Ngurah Agus Sanjaya ER. "Building Balinese Part-of-Speech Tagger Using Hidden Markov Model (HMM)." JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) 9, no. 2 (2020): 303. http://dx.doi.org/10.24843/jlk.2020.v09.i02.p18.

Abstract:
Part-of-speech (POS) tagging, or word class labeling, is the process of labeling each word in a sentence with its word class. Previous research on POS taggers, especially for Indonesian, has used various approaches and obtained high accuracy. However, few researchers have built a POS tagger for Balinese. In this article, we build a POS tagger for Balinese using a probabilistic approach, specifically the Hidden Markov Model (HMM). HMM is selected to deal with ambiguity since it gives higher accuracy and fast processing time. We used k-fold cross-validation (with k = 10)
5

Putranti, Noviah Dwi, and Edi Winarko. "Analisis Sentimen Twitter untuk Teks Berbahasa Indonesia dengan Maximum Entropy dan Support Vector Machine." IJCCS (Indonesian Journal of Computing and Cybernetics Systems) 8, no. 1 (2014): 91. http://dx.doi.org/10.22146/ijccs.3499.

Abstract:
Sentiment analysis in this study is the process of classifying textual documents into two classes, positive and negative sentiment. Opinion data were obtained from the social network Twitter based on queries in Indonesian. The study aims to determine public sentiment toward particular objects as expressed on Twitter in Indonesian, thereby supporting market research on public opinion. The collected data undergo preprocessing and POS tagging to produce a classification model through a training process. The collection techni
6

Adinarayanan, Sharada, Naren J, Sriranjanie P, and Vithya G. "Rule based POS Tagger for Sanskrit." International Journal of Psychosocial Rehabilitation 23, no. 1 (2019): 336–45. http://dx.doi.org/10.37200/ijpr/v23i1/pr190243.

7

Warjri, Sunita, Partha Pakray, Saralin A. Lyngdoh, and Arnab Kumar Maji. "Part-of-Speech (POS) Tagging Using Deep Learning-Based Approaches on the Designed Khasi POS Corpus." ACM Transactions on Asian and Low-Resource Language Information Processing 21, no. 3 (2022): 1–24. http://dx.doi.org/10.1145/3488381.

Abstract:
Part-of-speech (POS) tagging is one of the challenging research fields in natural language processing (NLP). It requires good knowledge of a particular language and large amounts of data or corpora for feature engineering, which can lead to good tagger performance. Our main contribution in this research work is the designed Khasi POS corpus. To date, no Khasi corpus of any kind has been formally developed. In the designed Khasi POS corpus, each word is tagged manually using the designed tagset. Methods of deep learning have been used to
8

Nikam, Bhushan Ashokrao. "Survey of part-of-speech tagger for mixed-code Indian and foreign language used in social media." International Journal of Advances in Applied Sciences 8, no. 4 (2019): 264. http://dx.doi.org/10.11591/ijaas.v8.i4.pp264-268.

Abstract:
A Part-Of-Speech Tagger (POS Tagger) is a tool that scans text in a specific language and assigns parts of speech, such as verb, adjective, or noun, to each word (and other tokens); computational applications often use more fine-grained POS tags such as 'noun-plural'. The goal of a POS tagger is to assign linguistic (mostly grammatical) information to sub-sentential units called tokens, which include words and symbols (e.g. punctuation). This paper presents a survey of POS taggers used for code-mixed Indian and foreign languages. Various methods, procedures, a
9

Bhushan, Nikam. "Survey of part-of-speech tagger for mixed-code Indian and foreign language used in social media." International Journal of Advances in Applied Sciences (IJAAS) 8, no. 4 (2019): 264–68. https://doi.org/10.11591/ijaas.v8.i4.pp264-268.

Abstract:
A Part-Of-Speech Tagger (POS Tagger) is a tool that scans text in a specific language and assigns parts of speech, such as verb, adjective, or noun, to each word (and other tokens); computational applications often use more fine-grained POS tags such as 'noun-plural'. The goal of a POS tagger is to assign linguistic (mostly grammatical) information to sub-sentential units called tokens, which include words and symbols (e.g. punctuation). This paper presents a survey of POS taggers used for code-mixed Indian and foreign languages. Various methods, procedures,
10

Dibitso, Mary, Pius A. Owolawi, and Sunday O. Ojo. "An Hybrid Part of Speech Tagger for Setswana Language using a Voting Method." International Conference on Intelligent and Innovative Computing Applications 2022 (December 31, 2022): 245–53. http://dx.doi.org/10.59200/iconic.2022.027.

Abstract:
Part-of-Speech (PoS) tagging is a corpus linguistics task that assigns an appropriate lexical category to each word in a sentence. To address the challenges associated with PoS tagging, several Natural Language Processing (NLP) modelling techniques have been employed in diverse languages, including Conditional Random Fields (CRF), Support Vector Machines (SVM), and Decision Trees. These PoS taggers associate the correct PoS (noun, verb, adjective, adverb, etc.) with each word in a sentence. However, creating language resources is an expensive
11

Cing, Dim Lam, and Khin Mar Soe. "Improving accuracy of Part-of-Speech (POS) tagging using hidden markov model and morphological analysis for Myanmar Language." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 2 (2020): 2023. http://dx.doi.org/10.11591/ijece.v10i2.pp2023-2030.

Abstract:
In Natural Language Processing (NLP), word segmentation and Part-of-Speech (POS) tagging are fundamental tasks. POS information is also necessary in NLP preprocessing for applications such as machine translation (MT) and information retrieval (IR). Currently, many research efforts develop word segmentation and POS tagging separately, with different methods, to achieve high performance and accuracy. For the Myanmar language, there are also separate word segmenters and POS taggers based on statistical approaches such as Neural Networks (NN) and Hidden Markov Models (HMMs). But, as t
12

Dim, Lam Cing, and Mar Soe Khin. "Improving accuracy of part-of-speech (POS) tagging using hidden markov model and morphological analysis for Myanmar Language." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 2 (2020): 2023–30. https://doi.org/10.11591/ijece.v10i2.pp2023-2030.

Abstract:
In Natural Language Processing (NLP), word segmentation and Part-of-Speech (POS) tagging are fundamental tasks. POS information is also necessary in NLP preprocessing for applications such as machine translation (MT) and information retrieval (IR). Currently, many research efforts develop word segmentation and POS tagging separately, with different methods, to achieve high performance and accuracy. For the Myanmar language, there are also separate word segmenters and POS taggers based on statistical approaches such as Neural Networks (NN) and Hidden Markov Models (HMMs). But,
13

Bonchanoski, Martin, and Katerina Zdravkova. "Learning syntactic tagging of Macedonian language." Computer Science and Information Systems 15, no. 3 (2018): 799–820. http://dx.doi.org/10.2298/csis180310027b.

Abstract:
This paper presents the creation of machine-learning-based systems for part-of-speech tagging of the Macedonian language. Four well-known PoS tagger systems implemented for English and Slavic languages were trained: TnT, cyclic dependency network, guided learning framework for bidirectional sequence classification, and dynamic features induction. Orwell's novel '1984' was manually tagged by the authors and split into training and test sets. After training, the models were compared. In the end, a POS tagger with an accuracy that reaches 97.5% was achie
14

Pattnaik, Sagarika, and Ajit Kumar Nayak. "A Modified Markov-Based Maximum-Entropy Model for POS Tagging of Odia Text." International Journal of Decision Support System Technology 14, no. 1 (2022): 1–24. http://dx.doi.org/10.4018/ijdsst.286690.

Abstract:
POS (part-of-speech) tagging, a vital step in diverse Natural Language Processing (NLP) tasks, has not drawn much attention in the case of Odia, a computationally under-developed language. The proposed hybrid method suggests a robust POS tagger for Odia. Given the rich morphology of the language and the lack of sufficient annotated text corpora, a combination of machine learning and linguistic rules is adopted in building the tagger. The tagger is trained on a tagged text corpus from the tourism domain and is capable of obtaining a perceptible improvement in the result. Also an appr
15

Shirko, Birhanesh Fikre. "Application of Hybrid Approach for Wolaita Language Part of Speech Tagging." Journal of Research in Engineering and Applied Sciences 9, no. 2 (2024): 719–32. http://dx.doi.org/10.46565/jreas.202492719-732.

Abstract:
The main purpose of this study is to develop a part-of-speech tagger for the Wolaita language using a hybrid approach. Part-of-speech tagging is an NLP subtask that is important for other Natural Language Processing (NLP) applications, such as parsers, machine translators, speech recognizers, and search engines. Part-of-speech tagging (PoST) is the process of assigning each word a part-of-speech tag that defines how the word is used in a sentence. PoST for the Wolaita language is not yet adequate to serve as a vital module in other natural language processing applications. In this study, the
16

Baig, Amber, Mutee U. Rahman, Hameedullah Kazi, and Ahsanullah Baloch. "Developing a POS Tagged Corpus of Urdu Tweets." Computers 9, no. 4 (2020): 90. http://dx.doi.org/10.3390/computers9040090.

Abstract:
Processing of social media text like tweets is challenging for traditional Natural Language Processing (NLP) tools developed for well-edited text due to the noisy nature of such text. However, demand for tools and resources to correctly process such noisy text has increased in recent years due to the usefulness of such text in various applications. Literature reports various efforts made to develop tools and resources to process such noisy text for various languages, notably, part-of-speech (POS) tagging, an NLP task having a direct effect on the performance of other successive text processing
17

Ramadhanti, Febyana, Yudi Wibisono, and Rosa Ariani Sukamto. "Analisis Morfologi untuk Menangani Out-of-Vocabulary Words pada Part-of-Speech Tagger Bahasa Indonesia Menggunakan Hidden Markov Model." Jurnal Linguistik Komputasional (JLK) 2, no. 1 (2019): 6. http://dx.doi.org/10.26418/jlk.v2i1.13.

Abstract:
Part-of-speech (PoS) tagging is a task in natural language processing (NLP): the process of marking the word category (part of speech) of every word in an input sentence. The hidden Markov model (HMM) is a probabilistic PoS tagging algorithm and therefore depends heavily on the training corpus. The limited content of the training corpus and the breadth of the Indonesian vocabulary give rise to the problem of out-of-vocabulary (OOV) words. This study compares a PoS tagger that uses HMM+AM (AM for "analisis morfologi", morphological analysis) with an HMM PoS tagger without AM, usin
18

Tammekänd, Liina, and Reeli Torn-Leesik. "Automatinis kalbos dalių žymėjimas (POS) Tartu estų anglų kalbos mokinių tekstyne: mokinių klaidų poveikis CLAWS7 įrankio tikslumui." Taikomoji kalbotyra 20 (December 28, 2023): 121–35. http://dx.doi.org/10.15388/taikalbot.2023.20.9.

Abstract:
The present paper, which is a continuation of Tammekänd and Torn-Leesik’s (2022) study, aims to examine how learner errors affect the CLAWS7 tagger’s automated assignment of part-of-speech (POS) tags to a sample of 24,812 words of the Tartu Corpus of Estonian Learner English (TCELE). Learner errors causing tagging errors in the sample were identified, based on which a working error taxonomy was created. The POS-tagged and error-tagged samples were collated and compared to map correlations between learner and tagging errors. Error groups that correlated with significantly increased rates of tag
19

Koleva, Mariya, Melissa Farasyn, Bart Desmet, Anne Breitbarth, and Véronique Hoste. "An automatic part-of-speech tagger for Middle Low German." International Journal of Corpus Linguistics 22, no. 1 (2017): 107–40. http://dx.doi.org/10.1075/ijcl.22.1.05kol.

Abstract:
Syntactically annotated corpora are highly important for enabling large-scale diachronic and diatopic language research. Such corpora have recently been developed for a variety of historical languages, or are still under development. One of those under development is the fully tagged and parsed Corpus of Historical Low German (CHLG), which is aimed at facilitating research into the highly under-researched diachronic syntax of Low German. The present paper reports on a crucial step in creating the corpus, viz. the creation of a part-of-speech tagger for Middle Low German (MLG). Having
20

Sumoko, Ade, Arif Bijaksana Putra Negara, and Helen Sasty Pratiwi. "Perbandingan Tipe Metode PoS Tagger Terhadap Nilai Akurasi Untuk Bahasa Melayu Pontianak." Jurnal Sistem dan Teknologi Informasi (Justin) 9, no. 3 (2021): 342. http://dx.doi.org/10.26418/justin.v9i3.44116.

Abstract:
A PoS tagger method is a method used as a reference for performing POS tagging directly on test data and training data; it is tested to determine how accurately it tags, by checking its output against manually tagged data. The corpus consists of 1,500 sentences in Pontianak Malay, which serve in this study as both training and test sentences. This study carries out the develop
21

Kamayani, Mia. "Perkembangan Part-of-Speech Tagger Bahasa Indonesia." Jurnal Linguistik Komputasional (JLK) 2, no. 2 (2019): 34. http://dx.doi.org/10.26418/jlk.v2i2.20.

Abstract:
The aim of this article is to review the literature on part-of-speech labeling methods (POS taggers) for Indonesian developed over the past 11 years (since 2008). The article can serve as a roadmap for Indonesian POS taggers and as a basis for future development, so that standard datasets and tagsets are used to benchmark methods. Fifteen publications are discussed, covering the datasets, tagsets, and methods used for Indonesian POS tagging. The most widely used dataset, and the one most likely to become the stan
22

Alqrainy, Shihadeh, and Muhammed Alawairdhi. "Towards Developing a Comprehensive Tag Set for the Arabic Language." Journal of Intelligent Systems 30, no. 1 (2020): 287–96. http://dx.doi.org/10.1515/jisys-2019-0256.

Abstract:
This paper presents a comprehensive tag set as a fundamental component for developing an automated word class/Part-of-Speech (PoS) tagging system for the Arabic language. The aim is to develop a standard and comprehensive PoS tag set, based upon PoS classes and Arabic inflectional morphology, that helps linguists and Natural Language Processing (NLP) developers extract more linguistic information. The tag names in the developed tag set use terminology from traditional Arabic grammar rather than English grammar. The usability of the presented tag set has been tested in m
23

AbuZeina, Dia, and Taqieddin Mostafa Abdalbaset. "Exploring the Performance of Tagging for the Classical and the Modern Standard Arabic." Advances in Fuzzy Systems 2019 (January 23, 2019): 1–10. http://dx.doi.org/10.1155/2019/6254649.

Abstract:
Part-of-speech (PoS) tagging is a core component in many natural language processing (NLP) applications. PoS taggers serve as a preprocessing step in various NLP tasks, such as syntactic parsing, information extraction, machine translation, and speech synthesis. In this paper, we examine the performance of a modern standard Arabic (MSA) based tagger on classical (i.e., traditional or historical) Arabic. We employed the Stanford Arabic model tagger to evaluate the imperative verbs in the Holy Quran. The Stanford tagger contains 29 tags; however,
24

Alluhaibi, Reyadh, Tareq Alfraidi, Mohammad A. R. Abdeen, and Ahmed Yatimi. "A Comparative Study of Arabic Part of Speech Taggers Using Literary Text Samples from Saudi Novels." Information 12, no. 12 (2021): 523. http://dx.doi.org/10.3390/info12120523.

Abstract:
Part of Speech (POS) tagging is one of the most common techniques used in natural language processing (NLP) applications and corpus linguistics. Various POS tagging tools have been developed for Arabic. These taggers differ in several aspects, such as in their modeling techniques, tag sets and training and testing data. In this paper we conduct a comparative study of five Arabic POS taggers, namely Stanford Arabic, CAMeL Tools, Farasa, MADAMIRA and Arabic Linguistic Pipeline (ALP), examining their performance using text samples from Saudi novels. The testing data has been extracted from di
25

Geyken, Alexander, and Jordan Boyd-Graber. "Automatic classification of multi-word expressions in print dictionaries." Lingvisticæ Investigationes. International Journal of Linguistics and Language Resources 26, no. 2 (2004): 187–202. http://dx.doi.org/10.1075/li.26.2.03gey.

Abstract:
This work demonstrates the assignment of multi-word expressions in print dictionaries to POS classes with minimal linguistic resources. In this application, 32,000 entries from the Wörterbuch der deutschen Idiomatik (H. Schemann 1993) were classified using an inductive description of POS sequences in conjunction with a Brill Tagger trained on manually tagged idiomatic entries. This process assigned categories to 86% of entries with 88% accuracy. This classification supplies a meaningful preprocessing step for further applications: the resulting POS-sequences for all idiomatic entries m
26

Zupan, Katja, Nikola Ljubešić, and Tomaž Erjavec. "How to tag non-standard language: Normalisation versus domain adaptation for Slovene historical and user-generated texts." Natural Language Engineering 25, no. 5 (2019): 651–74. http://dx.doi.org/10.1017/s1351324919000366.

Abstract:
Part-of-speech (PoS) tagging of non-standard language with models developed for standard language is known to suffer from a significant decrease in accuracy. Two methods are typically used to improve it: word normalisation, which decreases the out-of-vocabulary rate of the PoS tagger, and domain adaptation where the tagger is made aware of the non-standard language variation, either through supervision via non-standard data being added to the tagger’s training set, or via distributional information calculated from raw texts. This paper investigates the two approaches, normalisation and
27

Singh, Umrinderpal, and Vishal Goyal. "Punjabi Pos Tagger: Rule Based and HMM." International Journal of Advanced Research in Computer Science and Software Engineering 7, no. 7 (2017): 193. http://dx.doi.org/10.23956/ijarcsse/v7i7/0106.

Abstract:
A part-of-speech tagger assigns a tag to every input word in a given sentence. The tags may include the different parts of speech of a particular language, such as noun, pronoun, verb, adjective, and conjunction, and may have subcategories of all these tags. Part-of-speech tagging is a basic preprocessing task for most Natural Language Processing (NLP) applications, such as information retrieval, machine translation, and grammar checking. The task belongs to a larger set of problems, namely sequence labeling problems. Part of Speech tagging for Punjabi is not wid
28

Alsharif, Hashem. "A Template-Based Approach for Tagging Non-Vocalized Arabic Nouns." Academic Journal of Research and Scientific Publishing 3, no. 32 (2021): 05–35. http://dx.doi.org/10.52132/ajrsp.e.2021.32.1.

Abstract:
There exist no corpora of Arabic nouns. Furthermore, in any Arabic text, nouns can be found in different forms. By tagging the nouns in an Arabic text, one can determine whether each sentence begins with a noun or a verb. Part-of-speech (POS) tagging is the task of labeling each word in a sentence with its appropriate category, called a tag (Noun, Verb, or Article). In this thesis, we attempt to tag non-vocalized Arabic text. The proposed POS tagger for Arabic text is based on searching for each word of the text in our lists of verbs and articles. Nouns are found
29

Setyono, Winkie, Donni Richasdy, and Mahendra Dwifebri Purbolaksono. "POS Tagger Improvisation with the Addition of Foreign Word Labels on Telkom University News." Building of Informatics, Technology and Science (BITS) 4, no. 2 (2022): 588–94. http://dx.doi.org/10.47065/bits.v4i2.1983.

Abstract:
News is a medium of daily information for the public. A news item contains a lot of information and is composed of sentence structures. Each language is unique, with its own sentence structure, as is the case for Indonesian and foreign languages. Nowadays, however, many media mix Indonesian with foreign languages, making the sentence structure different from that of Bahasa Indonesia. To classify these words, part-of-speech tagging is needed to determine the class of each word in a sentence by learning from the corpus of each language. With the new sentence structure, POS Tagger requires a lar
30

Mohamed, Hassan, Nazlia Omar, and Mohd Juzaiddin Ab Aziz. "The Effectiveness of Using Malay Affixes for Handling Unknown Words In Unsupervised HMM POS Tagger." International Journal of Engineering & Technology 7, no. 4.29 (2018): 9–12. http://dx.doi.org/10.14419/ijet.v7i4.29.21834.

Abstract:
The challenge in unsupervised Hidden Markov Model (HMM) training for a POS tagger is that the training depends on an untagged corpus; the only supervised data limiting possible tagging of words is a dictionary. A morpheme-based POS guessing algorithm has been introduced to assign unknown words’ probable tags based on linguistically meaningful affixes. Therefore, the exact morphemes of prefixes, suffixes and circumfixes in the agglutinative Malay language are examined before giving tags to unknown words. The algorithm has been integrated into an HMM tagger which uses HMM trained parameters for ta
31

Jarrar, Mustafa, Diyam Akra, and Tymaa Hammouda. "Alma: Fast Lemmatizer and POS Tagger for Arabic." Procedia Computer Science 244 (2024): 378–87. http://dx.doi.org/10.1016/j.procs.2024.10.212.

32

Xian, Benjamin Chu Min, Mohamed Lubani, Liew Kwei Ping, Khalil Bouzekri, Rohana Mahmud, and Dickson Lukose. "Benchmarking Mi-POS: Malay Part-of-Speech Tagger." International Journal of Knowledge Engineering 2, no. 3 (2016): 115–21. http://dx.doi.org/10.18178/ijke.2016.2.3.064.

33

Ali, Arshad, Athar Rashid, and Ameer Sultan. "Sketching Victorian Society: A Corpus Assisted Study of Social Class in Dickens' Great Expectations." Global Language Review V, no. II (2020): 54–61. http://dx.doi.org/10.31703/glr.2020(v-ii).06.

Abstract:
The present work deals with the adjectives used by Charles Dickens to portray the social class in the novel Great Expectations. The study used a corpus linguistics methodology for data preparation, corpus development, and data analysis. The text of the novel was collected from online sources and used in the compilation of the corpus. The corpus was filtered of additional information and tagged using a part-of-speech tagger (POS tagger). The tagged data was analyzed using AntConc software. The findings of the study suggest that the use of adjectives plays a substantial role in the portrayal of
34

Bani, Rkia, Samir Amri, Lahbib Zenkouar, and Zouhair Guennoun. "Toward accurate Amazigh part-of-speech tagging." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 1 (2024): 572. http://dx.doi.org/10.11591/ijai.v13.i1.pp572-580.

Abstract:
Part-of-speech (POS) tagging is the process of assigning to each word in a text its corresponding grammatical information (its POS). It is an important pre-processing step for other natural language processing (NLP) tasks, hence the objective of finding the most accurate tagger. Previous approaches were based on traditional machine learning algorithms; later, with the development of deep learning, more POS taggers were adopted. While the accuracy of POS tagging reaches 97%, even with traditional machine learning, for high-resource languages like English and French, this is far from the ca
35

Bani, Rkia, Samir Amri, Lahbib Zenkouar, and Zouhair Guennoun. "Toward accurate Amazigh part-of-speech tagging." IAES International Journal of Artificial Intelligence (IJ-AI) 13, no. 1 (2024): 572–80. https://doi.org/10.11591/ijai.v13.i1.pp572-580.

Abstract:
Part-of-speech (POS) tagging is the process of assigning to each word in a text its corresponding grammatical information (its POS). It is an important pre-processing step for other natural language processing (NLP) tasks, hence the objective of finding the most accurate tagger. Previous approaches were based on traditional machine learning algorithms; later, with the development of deep learning, more POS taggers were adopted. While the accuracy of POS tagging reaches 97%, even with traditional machine learning, for high-resource languages like English and French, this is far from the case in low resource
36

Ali, Arshad, Athar Rashid, and Ameer Sultan. "Exploring Personal Deixis in Western Music: A Corpus-Based Study." Global Regional Review V, no. IV (2020): 106–17. http://dx.doi.org/10.31703/grr.2020(v-iv).11.

Abstract:
Pragmatics informs us about the relationship between the use of language and its context. This relationship is identified through person deixis. This research interprets the reference meaning of personal deixis and looks at the most frequent personal deixis used in the lyrics of male and female English singers. This research uses a corpus method for the analysis. The data was collected from online sources to compile corpora of songs sung by male and female singers. The research has adopted both qualitative and quantitative approaches for the analysis of corpora. The corpus was tagged using par
37

Dewi, Nindian Puspa, and Ubaidi Ubaidi. "POS Tagging Bahasa Madura dengan Menggunakan Algoritma Brill Tagger." Jurnal Teknologi Informasi dan Ilmu Komputer 7, no. 6 (2020): 1121. http://dx.doi.org/10.25126/jtiik.2020722449.

Abstract:
Madurese is a regional language that, besides being spoken on Madura Island, is also used in other areas such as the cities of Jember, Pasuruan, and Probolinggo. As a regional language, Madurese is increasingly being abandoned, especially among young people. Among the causes are a sense of prestige and the difficulty of learning Madurese, which has a variety of dialects and speech levels. The declining use of Madurese could lead to its extinction as one of the regional languages of Indonesia. Therefore, efforts are needed
38

Bar-Haim, Roy, Khalil Sima'an, and Yoad Winter. "Part-of-speech tagging of Modern Hebrew text." Natural Language Engineering 14, no. 2 (2008): 223–51. http://dx.doi.org/10.1017/s135132490700455x.

Abstract:
Words in Semitic texts often consist of a concatenation of word segments, each corresponding to a part-of-speech (POS) category. Semitic words may be ambiguous with regard to their segmentation as well as to the POS tags assigned to each segment. When designing POS taggers for Semitic languages, a major architectural decision concerns the choice of the atomic input tokens (terminal symbols). If the tokenization is at the word level, the output tags must be complex, and represent both the segmentation of the word and the POS tag assigned to each word segment. If the tokenization is at th
39

Alayiaboozar, Elham. "Studying the possibility of improving the function of a POS tagger system." Comparative Linguistic Research 10, no. 19 (2020): 95–110. https://doi.org/10.22084/RJHLL.2019.16614.1834.

Abstract:
The aim of the present study is to check the possibility of improving the function of a POS tagger system via POS tag disambiguation of some Persian noun and adjective homographs ending in <-ی>. The case study in the present research is HAZM. The POS tag disambiguation program is based on some context-sensitive rules. These rules were extracted from the Bijan Khan corpus, on which HAZM was also trained. General evaluation of the mentioned POS disambiguation program indicates that if some of the context-sensitive rules which play a role in better POS tagging are added to HAZM, th
40

Daimary, Surjya Kanta, Vishal Goyal, Madhumita Barbora, and Umrinderpal Singh. "Development of Part of Speech Tagger for Assamese Using HMM." International Journal of Synthetic Emotions 9, no. 1 (2018): 23–32. http://dx.doi.org/10.4018/ijse.2018010102.

Abstract:
This article presents the work on the Part-of-Speech Tagger for Assamese based on Hidden Markov Model (HMM). Over the years, a lot of language processing tasks have been done for Western and South-Asian languages. However, very little work is done for Assamese language. So, with this point of view, the POS Tagger for Assamese using Stochastic Approach is being developed. Assamese is a free word-order, highly agglutinate and morphological rich language, thus developing POS Tagger with good accuracy will help in development of other NLP task for Assamese. For this work, an annotated corpus of 27
41

Lyashevskaya, Olga, and Ilia Afanasev. "An HMM-Based PoS Tagger for Old Church Slavonic." Journal of Linguistics/Jazykovedný casopis 72, no. 2 (2021): 556–67. http://dx.doi.org/10.2478/jazcas-2021-0051.

Abstract:
We present a hybrid HMM-based PoS tagger for Old Church Slavonic. The training corpus is a portion of one text, Codex Marianus (40k) annotated with the Universal Dependencies UPOS tags in the UD-PROIEL treebank. We perform a number of experiments in within-domain and out-of-domain settings, in which the remaining part of Codex Marianus serves as a within-domain test set, and Kiev Folia is used as an out-of-domain test set. Analysing by-PoS-class precision and sensitivity in each run, we combine a simple context-free n-gram-based approach and Hidden Markov method (HMM), and added lingu
42

Passban, Peyman, Qun Liu, and Andy Way. "Boosting Neural POS Tagger for Farsi Using Morphological Information." ACM Transactions on Asian and Low-Resource Language Information Processing 16, no. 1 (2016): 1–15. http://dx.doi.org/10.1145/2934676.

43

Ali Abumalloh, Rabab, Hasan Muaidi Al-Serhan, Othman Bin Ibrahim, and Waheeb Abu-Ulbeh. "Arabic Part-of-Speech Tagger, an Approach Based on Neural Network Modelling." International Journal of Engineering & Technology 7, no. 2.29 (2018): 742. http://dx.doi.org/10.14419/ijet.v7i2.29.14009.

Abstract:
POS-tagging gained the interest of researchers in computational linguistics sciences in the recent years. Part-of-speech tagging systems assign the proper grammatical tag or morpho-syntactical category labels automatically to every word in the corpus per its appearance on the text. POS-tagging serves as a fundamental and preliminary step in linguistic analysis which can help in developing many natural language processing applications such as: word processing systems, spell checking systems, building dictionaries and in parsing systems. Arabic language gained the interest of researchers which l
44

Amri, S., R. Bani, L. Zenkouar, and Z. Guennoun. "Improving Amazigh POS tagging using machine learning." Mathematical Modeling and Computing 11, no. 3 (2024): 741–51. http://dx.doi.org/10.23939/mmc2024.03.741.

Abstract:
Tamazight, Berber, and Amazigh are the multiple names for the same language. It covers a great geographical area including the north of Africa, Sahara Sahel. It is spread principally in Morocco, Algeria, Tunisia, and Mali. In terms of natural language processing, it is considered a low-resource language. This paper presents multiple applications of different machine learning algorithms for part-of-speech tagging Amazigh for the first time. Those algorithms include trigrams 'n' tags (TnT), Brill tagging, hidden Markov model (HMM), Unigram, Bigram, Unigram + Bigram, and conditional random fields
45

Memon, Adnan Ali, Saman Hina, Abdul Karim Kazi, and Saad Ahmed. "Parts-of-speech tagger for Sindhi language using deep neural network architecture." Mehran University Research Journal of Engineering and Technology 43, no. 3 (2024): 47. http://dx.doi.org/10.22581/muet1982.2768.

Abstract:
Language is a fundamental medium for human communication, encompassing spoken and written forms, each governed by grammatical rules. Sindhi, one of the oldest languages, is characterized by its rich morphology and grammatical structure. Part-of-speech (POS) tagging, a crucial process in natural language processing, involves assigning grammatical tags to words. This research presents a novel approach to POS tagging for Sindhi text using deep learning techniques. We developed a POS tagger employing Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models, with LSTM demonstrating super
46

Wubetu, Barud Demilie. "Analysis of implemented part of speech tagger approaches: The case of Ethiopian languages." Indian Journal of Science and Technology 13, no. 48 (2020): 4661–71. https://doi.org/10.17485/IJST/v13i48.1876.

Abstract:
Objective: To review Part of Speech (POS) tagging works that have been done for the Ethiopian languages. Methods: All methods that have been implemented to develop POS tagging for the Ethiopian languages have been mentioned. Findings: Since all implemented POS tagging methods have been mentioned in this work, the result will be used for future natural language processing researchers to select the best methodology. Novelty: The work includes all implemented POS tagging research works
47

Kumar, S., M. Anand Kumar, and K. P. Soman. "Deep Learning Based Part-of-Speech Tagging for Malayalam Twitter Data (Special Issue: Deep Learning Techniques for Natural Language Processing)." Journal of Intelligent Systems 28, no. 3 (2019): 423–35. http://dx.doi.org/10.1515/jisys-2017-0520.

Abstract:
The paper addresses the problem of part-of-speech (POS) tagging for Malayalam tweets. The conversational style of posts/tweets/text in social media data poses a challenge in using general POS tagset for tagging the text. For the current work, a tagset was designed that contains 17 coarse tags and 9915 tweets were tagged manually for experiment and evaluation. The tagged data were evaluated using sequential deep learning methods like recurrent neural network (RNN), gated recurrent units (GRU), long short-term memory (LSTM), and bidirectional LSTM (BLSTM). The training of the model was
48

Jabar, H. Yousif. "Neural Computing based Part of Speech Tagger for Arabic Language: A review study." International Journal of Computation and Applied Sciences IJOCAAS 1, no. 5 (2020): 361–65. https://doi.org/10.5281/zenodo.4002418.

Abstract:
This paper aims to explore the implementation of a part-of-speech (POS) tagger for the Arabic language using neural computing. The Arabic language is one of the most important languages in the world. More than 422 million people use the Arabic language as the primary medium for writing and speaking. Part-of-speech tagging is a crucial stage for most natural language processing. Many factors affect the performance of POS tagging, including the type of language, the corpus size, the tag-set, and the computation model. The artificial neural network (ANN) is a modern paradigm that simulates human behavior to learn,
49

Kamau, Gabriel. "Data-Driven Part-of-Speech Tagging for the Gikuyu Language: Development, Challenges, and Prospects." International Journal on Natural Language Computing 13, no. 5/6 (2024): 15–26. https://doi.org/10.5121/ijnlc.2024.13602.

Abstract:
This paper presents the development of a data-driven Part-of-Speech (POS) tagger for Gikuyu, a Bantu language spoken in Kenya. Gikuyu, like many indigenous African languages, is under-resourced, with limited computational tools for linguistic processing. By employing a corpus sourced primarily from the Gikuyu Bible and leveraging a Memory-Based Tagging (MBT) approach, this study demonstrates the feasibility of creating a robust POS tagging system. The tagger achieved a precision of 90.44%, a recall of 88.34%, and an F-score of 91.35%. These results underscore its potential for applications in
50

Baniata, Laith H., Seyoung Park, and Seong-Bae Park. "A Multitask-Based Neural Machine Translation Model with Part-of-Speech Tags Integration for Arabic Dialects." Applied Sciences 8, no. 12 (2018): 2502. http://dx.doi.org/10.3390/app8122502.

Abstract:
The statistical machine translation for the Arabic language integrates external linguistic resources such as part-of-speech tags. The current research presents a Bidirectional Long Short-Term Memory (Bi-LSTM) - Conditional Random Fields (CRF) segment-level Arabic Dialect POS tagger model, which will be integrated into the Multitask Neural Machine Translation (NMT) model. The proposed solution for NMT is based on the recurrent neural network encoder-decoder NMT model that has been introduced recently. The study has proposed and developed a unified Multitask NMT model that shares an encoder betw