Journal articles on the topic 'Analysis of item difficulty'

Consult the top 50 journal articles for your research on the topic 'Analysis of item difficulty.'


1

Cuhadar, Ismail, Yanyun Yang, and Insu Paek. "Consequences of Ignoring Guessing Effects on Measurement Invariance Analysis." Applied Psychological Measurement 45, no. 4 (May 17, 2021): 283–96. http://dx.doi.org/10.1177/01466216211013915.

Abstract:
Pseudo-guessing parameters are present in item response theory applications for many educational assessments. When sample size is not sufficiently large, the guessing parameters may be ignored from the analysis. This study examines the impact of ignoring pseudo-guessing parameters on measurement invariance analysis, specifically, on item difficulty, item discrimination, and mean and variance of ability distribution. Results show that when non-zero guessing parameters are ignored from the measurement invariance analysis, item discrimination estimates tend to decrease particularly for more difficult items, and item difficulty estimates decrease unless the items are highly discriminating and difficult. As the guessing parameter increases, the size of the decrease in item discrimination and difficulty tends to increase, and the estimated mean and variance of ability distribution tend to be inaccurate. When two groups have heterogeneous ability distributions, ignoring the guessing parameter affects the reference group and the focal group differently. Implications of result findings are discussed.
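
For readers unfamiliar with the pseudo-guessing parameter discussed above, a short numerical sketch may help. Under the three-parameter logistic (3PL) model, the probability of a correct response is c + (1 - c)/(1 + exp(-a(theta - b))); ignoring guessing amounts to forcing c = 0 (the 2PL model). The Python snippet below uses hypothetical item parameters, not values from the study, to show how large the gap between the two models becomes for low-ability examinees on a difficult, guessable item.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response: c + (1 - c) * logistic(a * (theta - b))."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical item: discriminating (a = 1.5), difficult (b = 1.0), guessable (c = 0.2).
theta = np.linspace(-3, 3, 7)
with_guessing = p_3pl(theta, a=1.5, b=1.0, c=0.2)
ignoring_guessing = p_3pl(theta, a=1.5, b=1.0, c=0.0)  # 2PL: guessing forced to zero

for t, p1, p0 in zip(theta, with_guessing, ignoring_guessing):
    print(f"theta={t:+.1f}  3PL={p1:.3f}  2PL={p0:.3f}  gap={p1 - p0:+.3f}")
```
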
2

Alpusari, Mahmud. "ANALISIS BUTIR SOAL KONSEP DASAR IPA 1 MELALUI PENGGUNAAN PROGRAM KOMPUTER ANATES VERSI 4.0 FOR WINDOWS." Primary: Jurnal Pendidikan Guru Sekolah Dasar 3, no. 2 (January 8, 2015): 106. http://dx.doi.org/10.33578/jpfkip.v3i2.2501.

Abstract:
This research was qualitative research with a descriptive method. The subjects were student teachers who took Fundamental Science 1. Based on the item validity analysis, there were 16 valid items at the 1% significance level, 26 valid items at the 5% significance level, and 14 invalid items. In the analysis of distinguishing (discriminating) items, item number 20 was the worst, 15 items were poor, another 15 items were fair, and the remaining items were good. In the analysis of difficulty level, 17 items were very easy, 9 items were easy, 11 items were moderate, one item was difficult, and the others were very difficult. Taking all items together, only 21 items were ready to be used, 5 items needed to be revised, and the others could not be used in a test. Key words: Konsep Dasar IPA 1, validity, distinguishing items, level of difficulty
3

Habib, Md Ahsan, Humayun Kabir Talukder, Md Mahbubur Rahman, and Shahnila Ferdousi. "Post-application Quality Analysis of MCQs of Preclinical Examination Using Item Analysis." Bangladesh Journal of Medical Education 7, no. 1 (April 18, 2017): 2–7. http://dx.doi.org/10.3329/bjme.v7i1.32220.

Abstract:
Multiple choice questions (MCQs) have a considerable role in preclinical medical assessment, both formative and summative. This cross-sectional descriptive study was conducted to observe the quality of MC items (completion type) of anatomy, biochemistry and physiology used in preclinical undergraduate medical examinations of 2012 and 2013 of a public university of Bangladesh. Each MC item had a stem and 5 options, and 1200 options were analyzed for difficulty and discrimination indices. In total, 556 options were false statements (distracters) and were analyzed to observe their effectiveness as distracters. The study revealed that 18.67% of options had appropriate difficulty (0.66–0.80). The highest frequency (43.5%) of difficulty indices was in the easy class interval (0.91–1). Overall, the frequencies of items of the three subjects in ascending order were difficult, appropriate, marginal and easy as per their difficulty indices. Satisfactory or better discrimination indices (≥0.20) were observed in 29.33% of options. The mean difficulty and discrimination indices observed were respectively 0.82±0.18 (95% confidence interval [CI] 0.81 to 0.83) and 0.13±0.14 (95% CI 0.122 to 0.138). Of the options, 6.75% had negative discrimination indices. Items with a difficulty index around 0.60 had maximum discriminatory power (up to 0.68), and both more difficult and easier items had less discriminatory ability. Of the distracters, 83.45% were observed to be effective, and the mean effectiveness was 22.3±18.7% (95% CI 20.75% to 23.85%). The study recommended using the method and findings to improve the quality of the items, leading to the development of a standard question bank. (Bangladesh Journal of Medical Education Vol. 7(1) 2016: 2–7)
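
Several of the abstracts in this list report difficulty and discrimination indices computed in the classical way: difficulty as the proportion of examinees answering an item correctly, and discrimination as the difference in that proportion between upper and lower scoring groups. The sketch below illustrates that computation on a made-up 0/1 response matrix; the 27% or 33% group split is a common convention and an assumption here, not a detail taken from the paper above.

```python
import numpy as np

def item_indices(scores, group_frac=0.27):
    """Classical item analysis on a 0/1 scored matrix (rows = examinees, cols = items).

    Difficulty index  = proportion of all examinees answering the item correctly.
    Discrimination    = p(upper group) - p(lower group), groups formed from total scores.
    The 27% split is a common convention; it is an assumption here, not taken from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    totals = scores.sum(axis=1)
    order = np.argsort(totals)
    n_group = max(1, int(round(group_frac * len(totals))))
    lower, upper = order[:n_group], order[-n_group:]
    difficulty = scores.mean(axis=0)
    discrimination = scores[upper].mean(axis=0) - scores[lower].mean(axis=0)
    return difficulty, discrimination

# Tiny made-up example: 6 examinees x 4 items.
X = [[1, 1, 0, 1],
     [1, 0, 0, 1],
     [1, 1, 1, 1],
     [0, 0, 0, 1],
     [1, 1, 0, 0],
     [0, 0, 0, 0]]
p, d = item_indices(X, group_frac=0.33)
print("difficulty:", np.round(p, 2), "discrimination:", np.round(d, 2))
```
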
4

Suruchi, Suruchi, and Surender Singh Rana. "Test Item Analysis and Relationship Between Difficulty Level and Discrimination Index of Test Items in an Achievement Test in Biology." Paripex - Indian Journal Of Research 3, no. 6 (January 15, 2012): 56–58. http://dx.doi.org/10.15373/22501991/june2014/18.

5

Pradanti, Septi Ika, Martono Martono, and Teguh Sarosa. "An Item Analysis of English Summative Test for The First Semester of The Third Grade Junior High School Students in Surakarta." English Education 6, no. 3 (May 29, 2018): 312. http://dx.doi.org/10.20961/eed.v6i3.35891.

Abstract:
The aims of this research are: (1) to find out whether the multiple-choice items of the English summative test for third grade junior high school students in Surakarta have fulfilled the criteria of a good test or not; and (2) to describe whether the multiple-choice items of the English summative test for third grade junior high school students in Surakarta have fulfilled the criteria of good test items viewed from difficulty level, discrimination power, and distractor evaluation or not. The data were taken from 100 students' answer sheets in five schools in Surakarta. The test items were analyzed using item analysis technique based on the scores and language content analysis. The analysis considered three aspects, i.e. index of discriminating power, level of difficulty, and distractor evaluation. The research findings show that (1) the difficulty level shows 13% for very difficult items, 22% for very easy items, and 64% for satisfactory items; (2) only 35% of all test items have an appropriate index of discriminating power; and (3) only 49% of the distractors are effective since they were selected by the students. Based on the three criteria of a good test item above, only 46% of the multiple choice items fulfil the criteria of a good test item.
6

Khilnani, Ajeet Kumar, Rekha Thaddanee, and Gurudas Khilnani. "Development of multiple choice question bank in otorhinolaryngology by item analysis: a cross-sectional study." International Journal of Otorhinolaryngology and Head and Neck Surgery 5, no. 2 (February 23, 2019): 449. http://dx.doi.org/10.18203/issn.2454-5929.ijohns20190779.

Abstract:
Background: Multiple choice questions (MCQs) are routinely used for formative and summative assessment in medical education. Item analysis is a process of post validation of MCQ tests, whereby items are analyzed for difficulty index, discrimination index and distractor efficiency, to obtain a range of items of varying difficulty and discrimination indices. This study was done to understand the process of item analysis and analyze an MCQ test so that a valid and reliable MCQ bank in otorhinolaryngology is developed. Methods: 158 students of the 7th semester were given an 8 item MCQ test. Based on the marks achieved, the high achievers (top 33%, 52 students) and low achievers (bottom 33%, 52 students) were included in the study. The responses were tabulated in a Microsoft Excel sheet and analyzed for difficulty index, discrimination index and distractor efficiency. Results: The mean (SD) difficulty index (Diff-I) of the 8 item test was 61.41% (11.81%). 5 items had a very good difficulty index (41% to 60%), while 3 items were easy (Diff-I >60%). There was no item with Diff-I <30%, i.e. a difficult item, in this test. The mean (SD) discrimination index (DI) of the test was 0.48 (0.15), and all items had very good discrimination indices of more than 0.25. Out of 24 distractors, 6 (25%) were non-functional distractors (NFDs). The mean (SD) distractor efficiency (DE) of the test was 74.62% (23.79%). Conclusions: Item analysis should be an integral and regular activity in each department so that a valid and reliable MCQ question bank is developed.
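
The abstract above treats a non-functional distractor (NFD) as an option chosen by fewer than 5% of examinees and reports distractor efficiency (DE) as the share of distractors that do function. A minimal sketch of that bookkeeping, using hypothetical option-level responses for a single item, might look as follows.

```python
from collections import Counter

def distractor_efficiency(responses, key, options=("A", "B", "C", "D"), threshold=0.05):
    """Count non-functional distractors (chosen by < threshold of examinees) for one item.

    responses: list of selected options for a single item; key: the correct option.
    Distractor efficiency = functional distractors / total distractors.
    """
    n = len(responses)
    counts = Counter(responses)
    distractors = [o for o in options if o != key]
    nfd = [o for o in distractors if counts.get(o, 0) / n < threshold]
    de = (len(distractors) - len(nfd)) / len(distractors)
    return nfd, de

# Hypothetical item answered by 20 examinees, correct answer "B";
# "D" is never chosen, "C" is chosen once (exactly 5%), "A" twice.
answers = list("BBBBBABBCBBBBABBBBBB")
nfd, de = distractor_efficiency(answers, key="B")
print("non-functional distractors:", nfd, "distractor efficiency:", round(de, 2))
```
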
7

Park, Eun-Young, and Soojung Chae. "Rasch Analysis of the Korean Parenting Stress Index Short Form (K-PSI-SF) in Mothers of Children with Cerebral Palsy." International Journal of Environmental Research and Public Health 17, no. 19 (September 25, 2020): 7010. http://dx.doi.org/10.3390/ijerph17197010.

Abstract:
The purpose of this study was to investigate the psychometric characteristics of the Korean Parenting Stress Index Short Form (K-PSI-SF) for mothers of children with cerebral palsy (CP) by using a Rasch analysis. The participants were 114 mothers of children with CP whose ages ranged from 2.79 to 11.90 years. The K-PSI-SF consists of 36 items, with a 5-point Likert scale grading along three subscales (Parent Distress, Parent–Child Dysfunctional Interaction, and Difficult Child). The response data were analyzed, and we determined the item fitness and item difficulty, rating scale fit, and separation index. The results show that two items did not have the required fitness. After these two items were deleted, the means of the 34 items in two of the subscales were statistically different from those of the original 36 items. Our analysis of the item difficulty identified the need to add easier question items. The 5-point Likert scale used in the questionnaire was found to be appropriate. The significance of this study is that it suggested the need to modify item fitness and difficulty level, as it identified the psychometric characteristics of the K-PSI-SF through a Rasch analysis based on the item response theory.
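
Because the K-PSI-SF items use a 5-point rating scale, the Rasch analysis referred to above is a polytomous (rating scale) model rather than the dichotomous one. As a hedged illustration only, the snippet below computes category probabilities under the Andrich rating scale model for a hypothetical person measure, item difficulty, and a common set of thresholds; none of these values are estimates from the study.

```python
import numpy as np

def rating_scale_probs(theta, delta, taus):
    """Andrich rating scale model: probability of each category 0..m.

    theta: person measure, delta: item difficulty, taus: m threshold parameters (logits).
    P(x) is proportional to exp(sum_{k<=x} (theta - delta - tau_k)), with the empty sum = 0.
    """
    taus = np.asarray(taus, dtype=float)
    cumulative = np.concatenate(([0.0], np.cumsum(theta - delta - taus)))
    probs = np.exp(cumulative - cumulative.max())  # subtract max for numerical stability
    return probs / probs.sum()

# Hypothetical 5-point item: difficulty 0.3 logits, four step thresholds from easy to hard.
taus = [-1.5, -0.5, 0.5, 1.5]
for theta in (-1.0, 0.0, 1.0):
    print(theta, np.round(rating_scale_probs(theta, delta=0.3, taus=taus), 3))
```
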
8

Nair, Manju K., and Dawnji S. R. "Quality of multiple choice questions in undergraduate pharmacology assessments in a teaching hospital of Kerala, India: an item analysis." International Journal of Basic & Clinical Pharmacology 6, no. 6 (May 23, 2017): 1265. http://dx.doi.org/10.18203/2319-2003.ijbcp20172001.

Abstract:
Background: Carefully constructed, high quality multiple choice questions can serve as effective tools to improve the standard of teaching. This item analysis was performed to find the difficulty index, discrimination index and number of non-functional distractors in single best response type questions. Methods: 40 single best response type questions with four options, each carrying one mark for the correct response, were taken for item analysis. There was no negative marking. The maximum mark was 40. Based on the scores, the evaluated answer scripts were arranged with the highest score on top and the lowest score at the bottom. Only the upper third and lower third were included. The response to each item was entered in Microsoft Excel 2010. The difficulty index, discrimination index and number of non-functional distractors per item were calculated. Results: 40 multiple choice questions and 120 distractors were analysed in this study. 72.5% of items were good, with a difficulty index between 30% and 70%; 25% of items were difficult and 2.5% were easy. 27.5% of items showed excellent discrimination between high scoring and low scoring students. One item had a negative discrimination index (-0.1). There were 9 items with non-functional distractors. Conclusions: This study emphasises the need for improving the quality of multiple choice questions. Hence repeated evaluation by item analysis and modification of non-functional distractors may be performed to enhance the standard of teaching in Pharmacology.
9

Shanmugam, S. Kanageswari Suppiah, Vincent Wong, and Murugan Rajoo. "EXAMINING THE QUALITY OF ENGLISH TEST ITEMS USING PSYCHOMETRIC AND LINGUISTIC CHARACTERISTICS AMONG GRADE SIX PUPILS." Malaysian Journal of Learning and Instruction 17, Number 2 (July 31, 2020): 63–101. http://dx.doi.org/10.32890/mjli2020.17.2.3.

Abstract:
Purpose - This study examined the quality of English test items using psychometric and linguistic characteristics among Grade Six pupils. Method - Contrary to the conventional approach of relying only on statistics when investigating item quality, this study adopted a mixed-method approach by employing psychometric analysis and cognitive interviews. The former was conducted on 30 Grade Six pupils, with each item representing a different construct commonly found in English test papers. Qualitative input was obtained through cognitive interviews with five Grade Six pupils and expert judgements from three teachers. Findings - None of the items were found to be too easy or difficult, and all items had positive discrimination indices. The item on idioms was most ideal in terms of difficulty and discrimination. Difficult items were found to be vocabulary-based. Surprisingly, the higher-order-thinking subjective items proved to be excellent in difficulty, although improvements could be made on their ability to discriminate. The qualitative expert judgements agreed with the quantitative psychometric analysis. Certain results from the item analysis, however, contradicted past findings that items with an ideal item difficulty value between 0.4 and 0.6 would have an equally ideal item discrimination index. Significance - The findings of the study can serve as a reminder of the significance of using Classical Test Theory, a non-complex psychometric approach, in assisting classroom teacher practitioners during the meticulous process of test design and ensuring test item quality.
10

Rusch, Reuben R., Cynthia L. Trigg, Ray Brogan, and Scott Petriquin. "Item Difficulty and Item Validity for the Children's Group Embedded Figures Test." Perceptual and Motor Skills 78, no. 1 (February 1994): 75–79. http://dx.doi.org/10.2466/pms.1994.78.1.75.

Abstract:
The validity and reliability of the Children's Group Embedded Figures Test was reported for students in Grade 2 by Cromack and Stone in 1980; however, a search of the literature indicates no evidence for internal consistency or item analysis. Hence the purpose of this study was to examine the item difficulty and item validity of the test with children in Grades 1 and 2. Confusion in the literature over development and use of this test was seemingly resolved through analysis of these descriptions and through an interview with the test developer. One early-appearing item was unreasonably difficult. Two or three other items were quite difficult and made little contribution to the total score. Caution is recommended, however, in any reordering or elimination of items based on these findings, given the limited number of subjects (n = 84).
11

Danuwijaya, Ari Arifin. "ITEM ANALYSIS OF READING COMPREHENSION TEST FOR POST-GRADUATE STUDENTS." English Review: Journal of English Education 7, no. 1 (December 9, 2018): 29. http://dx.doi.org/10.25134/erjee.v7i1.1493.

Abstract:
Developing a test is a complex and reiterative process which is subject to revision even if the items were developed by skilful item writers. Many commercial test publishers need to conduct test analysis, rather than trusting the item writers' judgement and skills, to improve the quality of items, which needs to be proven statistically after a try-out is performed. This study is part of a test development process which aims to analyse reading comprehension test items. One hundred multiple choice questions were pilot tested on 50 postgraduate students in one university. The pilot testing was aimed at investigating item quality so that the items can be developed further. The responses were then analysed using Classical Test Theory with psychometric software called Lertap. The results showed that the item difficulty level was mostly average. In terms of item discrimination, more than half of the total items were categorized as marginal, which required further modification. This study offers some recommendations that can be useful to improve the quality of the developed items. Keywords: reading comprehension; item analysis; classical test theory; item difficulty; test development.
12

Lailiyah, Lailiyah, Yetti Supriyati, and Komarudin Komarudin. "ANALYSIS OF MEASURES ITEMS IN DEVELOPMENT OF INSTRUMENTS SELF-ASSESSMENT (RASCH MODELING APPLICATION)." JISAE: JOURNAL OF INDONESIAN STUDENT ASSESMENT AND EVALUATION 4, no. 1 (February 21, 2018): 1–9. http://dx.doi.org/10.21009/jisae.041.01.

Abstract:
This analysis aims to determine the quality of the instrument items that were developed in the first empirical test phase. Tests were carried out on 46 items with 219 respondents at SMA Ksatrya Jakarta. Item quality is judged from whether an item fits or does not fit, and from the difficulty level of the developed item. The fit or misfit criteria are based on INFIT and OUTFIT, both MNSQ and ZSTD, and Pt-Measure Correlation values. The difficulty level of each item is read from the entry number column, indicated by the magnitude of the logit value and sorted from hardest to easiest. Based on the results of the analysis with the Winsteps software, 39 statement items fit the model with 194 respondents, and the three criteria above (MNSQ, ZSTD, and Pt-Measure Correlation) were met. This means that 39 items are valid. The analysis also shows that the most difficult item is item 5 with a logit value of 63.32, and the easiest item is item 44 with a logit value of 36.13. The resulting fit instrument must have gone through several stages of analysis: when items do not fit, they are removed, as are misfitting respondents. This yields a set of measuring instruments that are valid (fit with the model) and can be used for assessment purposes. Keywords: self-assessment, infit, outfit, ZSTD, and Rasch Model.
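
The INFIT and OUTFIT MNSQ statistics mentioned above are residual-based fit summaries under the Rasch model. The sketch below shows how they can be computed for dichotomous items once person measures and item difficulties are available; it is not the Winsteps estimation procedure itself (which also estimates those measures), and all numbers are hypothetical.

```python
import numpy as np

def rasch_p(theta, b):
    """Dichotomous Rasch probability of success for persons theta on items of difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def infit_outfit(X, theta, b):
    """Item INFIT/OUTFIT mean squares from a 0/1 response matrix X (persons x items).

    OUTFIT = mean of squared standardized residuals; INFIT = information-weighted version.
    """
    P = rasch_p(theta, b)
    W = P * (1.0 - P)                      # binomial variance of each response
    Z2 = (X - P) ** 2 / W                  # squared standardized residuals
    outfit = Z2.mean(axis=0)
    infit = ((X - P) ** 2).sum(axis=0) / W.sum(axis=0)
    return infit, outfit

# Hypothetical calibrated measures (logits) and a small response matrix.
theta = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])
b = np.array([-0.8, 0.0, 0.9])
X = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 0],
              [1, 1, 1],
              [1, 1, 1]])
infit, outfit = infit_outfit(X, theta, b)
print("INFIT MNSQ:", np.round(infit, 2), "OUTFIT MNSQ:", np.round(outfit, 2))
```
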
13

Hartati, Neti, and Hendro Pratama Supra Yogi. "Item Analysis for a Better Quality Test." English Language in Focus (ELIF) 2, no. 1 (September 26, 2019): 59. http://dx.doi.org/10.24853/elif.2.1.59-70.

Abstract:
This is a small-scale study of item analysis of a teacher's self-made summative test. It examines the quality of multiple-choice items in terms of difficulty level, discriminating power, and the effectiveness of distractors. The study employed a qualitative approach, complemented by a simple quantitative analysis, to analyze the quality of the test items through document analysis of the teacher's English summative test and the students' answer sheets. The results show that the summative test has more easy items than difficult items, with a ratio of 19:25:6 where it should be 1:2:1 for easy, medium, and difficult. In terms of discriminating power, there are 3, 13, and 16 items at the excellent, good, and satisfactory levels, but 17 and 2 items at the poor and bad levels. There are 43 distractors (21.5% of all distractors) which are dysfunctional, which in turn makes the items too easy and makes them fail to discriminate the upper-group students from the lower ones. Therefore, the 43 dysfunctional distractors should be revised to alter the difficulty level and improve the discriminating power. This research is expected to serve as a reflective means for teachers to examine their self-made tests to ensure the quality of their test items.
14

Palimbong, Jefri, Mujasam Mujasam, and Alberto Yonathan Tangke Allo. "Item Analysis Using Rasch Model in Semester Final Exam Evaluation Study Subject in Physics Class X TKJ SMK Negeri 2 Manokwari." Kasuari: Physics Education Journal (KPEJ) 1, no. 1 (June 9, 2019): 43–51. http://dx.doi.org/10.37891/kpej.v1i1.40.

Abstract:
The purpose of this research is to analyze the items of a semester final exam evaluation using the Rasch model in terms of validity, reliability, and difficulty level for the physics subject in class X TKJ of SMK Negeri 2 Manokwari. This research was a qualitative study, namely an evaluation of learning outcomes using a quantitative descriptive method, with documentation as the data collection technique. The data came from 44 student respondents answering 30 items; regarding the validity of item fit, 26 questions were fit and 4 questions were not fit. The student (person) reliability of 0.37 is weak, the item reliability of 0.83 is good, and the reliability of the interaction between students and items of 0.42 is bad. The difficulty level of the items is generally in the medium category, which is very good because the items are neither too difficult nor too easy. The conclusion is that, based on item analysis using the Rasch model, the semester final exam items were valid and reliable, and the item difficulty level is very good.
15

Trace, Jonathan, James Dean Brown, Gerriet Janssen, and Liudmila Kozhevnikova. "Determining cloze item difficulty from item and passage characteristics across different learner backgrounds." Language Testing 34, no. 2 (August 3, 2016): 151–74. http://dx.doi.org/10.1177/0265532215623581.

Abstract:
Cloze tests have been the subject of numerous studies regarding their function and use in both first language and second language contexts (e.g., Jonz & Oller, 1994; Watanabe & Koyama, 2008). From a validity standpoint, one area of investigation has been the extent to which cloze tests measure reading ability beyond the sentence level. Using test data from 50 30-item cloze passages administered to 2,298 Japanese and 5,170 Russian EFL students, this study examined the degree to which linguistic features of cloze passages and items influenced item difficulty. Using a common set of 10 anchor items, all 50 tests were modeled in terms of person ability and item difficulty onto a single scale using many-faceted Rasch measurement (k = 1314). Principal components analysis was then used to categorize 25 linguistic item- and passage-level variables for the 50 cloze tests and their respective items, from which three components each for the passage- and item-level variables were identified. These six factors along with item difficulty were then entered into both a hierarchical structural equation model and a linear multiple regression to determine the degree to which difficulty in cloze tests could be explained separately by passage and item features. Comparisons were further made by looking at differences in models by nationality and by proficiency level (e.g., high and low). The analyses revealed noteworthy differences in mean item difficulties and in the variance structures between passage- and item-level features, as well as between different examinee proficiency groups.
16

Maharani, Amalia Vidya, and Nur Hidayanto Pancoro Setyo Putro. "Item Analysis of English Final Semester Test." Indonesian Journal of EFL and Linguistics 5, no. 2 (December 1, 2020): 491. http://dx.doi.org/10.21462/ijefl.v5i2.302.

Abstract:
Numerous studies have been conducted on item analysis of English tests. However, investigation of the characteristics of a good English final semester test is still rare in several districts in East Java. This research sought to examine the quality of the English final semester test in the academic year 2018/2019 in Ponorogo. A total of 151 samples in the form of students' answers to the test were analysed for item difficulty, item discrimination, and distractor effectiveness using the Quest program. This descriptive quantitative research revealed that the test does not have a good proportion of easy, medium, and difficult items. For item discrimination, the test had 39 excellent items (97.5%), which meant that the test could discriminate between high and low achievers. In addition, the distractors could distract students, since 32 items (80%) had effective distractors. The findings of this research provide the insight that item analysis is an important process in constructing a test, as it reveals the quality of the test, which directly affects the accuracy of students' scores.
17

Shamsuddin, Hasni, Nordin Abdul Razak, and Ahmad Zamri Khairani. "Calibrating Students’ Performance in Mathematics: A Rasch Model Analysis." International Journal of Engineering & Technology 7, no. 3.20 (September 1, 2018): 109. http://dx.doi.org/10.14419/ijet.v7i3.20.18991.

Abstract:
Rasch model analysis is an important tool for analysing students' performance at the item level. As such, the purpose of this study is to calibrate 14-year-old students' performance on a mathematics test based on the item difficulty parameter. 307 Form 2 students provided responses for this study. A 40-item multiple choice test was developed to gauge the responses. Results show that two of the items needed to be dropped since they did not meet the Rasch model's expectations. Analysis of the remaining items showed that the students were most competent on items related to Directed Numbers (mean = -1.445 logits), while they were least competent in the topic of Circles (mean = 1.065 logits). We also provide a calibration of performance at the item level. In addition, we discuss how the findings might be helpful for teachers in addressing students' difficulties in these topics.
18

Burud, Ismail, Kavitha Nagandla, and Puneet Agarwal. "Impact of distractors in item analysis of multiple choice questions." International Journal of Research in Medical Sciences 7, no. 4 (March 27, 2019): 1136. http://dx.doi.org/10.18203/2320-6012.ijrms20191313.

Abstract:
Background: Item analysis is a quality assurance process of examining the performance of individual test items that measures the validity and reliability of exams. This study was performed to evaluate the quality of test items with respect to their performance on the difficulty index (DFI) and discriminatory index (DI), and an assessment of functional and non-functional distractors (FDs and NFDs). Methods: This study was performed on a summative examination undertaken by 113 students. The analyses included 120 one best answer items (OBAs) and 360 distractors. Results: Out of the 360 distractors, 85 were chosen by less than 5% of students, with a distractor efficiency of 23.6%. About 47 (13%) items had no NFDs, while 51 (14%), 30 (8.3%), and 4 (1.1%) items contained 1, 2, and 3 NFDs respectively. The majority of the items showed an excellent difficulty index (50.4%, n=42) and fair discrimination (37%, n=33). The questions with excellent difficulty and discriminatory indices showed a statistically significant association with 1 NFD and 2 NFDs (p=0.03). Conclusions: Post-examination evaluation of item performance is one of the quality assurance methods for identifying the best performing items for a quality question bank. The distractor efficiency gives information on the overall quality of an item.
19

Park, Eun-Young, Yoo Im Choi, and Jung-Hee Kim. "Psychometric Properties of the Korean Dispositional Hope Scale Using the Rasch Analysis in Stroke Patients." Occupational Therapy International 2019 (November 11, 2019): 1–6. http://dx.doi.org/10.1155/2019/7058415.

Abstract:
Background. It is reported that hopeful thinking plays a positive role in encouraging patients to achieve functional goals during the rehabilitation process. Hope is a key concept in evaluating stroke outcomes in research and rehabilitation practice. Aims. The purpose of this study was to investigate the psychometric properties of the Korean Dispositional Hope Scale (K-DHS) using the Rasch analysis in patients with hemiplegic stroke. Methods. The K-DHS was completed by 166 community-dwelling hemiplegic stroke patients in Korea. Data were analyzed according to item fit, item difficulty, and the appropriateness of the rating scale using the Rasch analysis. Results. Item fit analysis showed that 8 items of the K-DHS are appropriate because the infit MNSQ was between 0.7 and 1.3. Item difficulty results revealed that there is a difference in distribution between personal attributes and item difficulty. It shows that the item fit statistics of the 4-point Likert scale of the K-DHS are all good. The person separation index demonstrated that the K-DHS could differentiate two or three hope status strata in stroke patients. The item separation index indicated that the items were useful with high reliability. Conclusion. The K-DHS comprises appropriate items for measuring the hope of stroke patients living in the community, and the rating scale of the K-DHS is also appropriate. This study is the first to conduct an analysis of the rating scale and its appropriateness, as well as the difficulty of items based on item response theory, and offers new insights for enhancing hope and improving well-being following stroke.
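
The person and item separation indices referred to in this and several other abstracts are derived from the observed variance of the measures and their average error variance: reliability is the ratio of true to observed variance, and separation is sqrt(R / (1 - R)). A small sketch with hypothetical logit measures and standard errors follows.

```python
import numpy as np

def separation_stats(measures, standard_errors):
    """Separation index and separation reliability as used in Rasch reporting.

    observed variance = variance of the measures; error variance = mean of squared SEs;
    reliability = true variance / observed variance; separation = sqrt(R / (1 - R)).
    """
    measures = np.asarray(measures, dtype=float)
    se = np.asarray(standard_errors, dtype=float)
    obs_var = measures.var(ddof=1)
    err_var = np.mean(se ** 2)
    true_var = max(obs_var - err_var, 0.0)
    reliability = true_var / obs_var
    separation = np.sqrt(reliability / (1.0 - reliability)) if reliability < 1 else np.inf
    strata = (4.0 * separation + 1.0) / 3.0  # number of statistically distinct levels
    return reliability, separation, strata

# Hypothetical person measures (logits) and their standard errors.
measures = [-1.2, -0.4, 0.1, 0.6, 1.3, 1.9]
ses = [0.45, 0.40, 0.38, 0.39, 0.42, 0.50]
print([round(x, 2) for x in separation_stats(measures, ses)])
```
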
20

Charter, Richard A. "Item Difficulty Analysis of the Tactual Performance Test Trials." Perceptual and Motor Skills 91, no. 3 (December 2000): 903–9. http://dx.doi.org/10.2466/pms.2000.91.3.903.

21

Laela, Madiana, Dewi Rochsantiningsih, and Martono Martono. "Item Analysis of Preparation Test for English National Examination." English Education 6, no. 1 (September 29, 2017): 36. http://dx.doi.org/10.20961/eed.v6i1.35897.

Abstract:
This research aims to reveal the quality of an English national examination preparation test in terms of qualitative and quantitative aspects. The qualitative aspect includes content validity, technical item quality and cognitive domain learning outcome, while the quantitative aspect includes reliability, difficulty level, item discrimination, and distractor effectiveness. The sample was taken from 3 out of 10 schools in Pati district using simple random sampling. This research employs both qualitative and quantitative analysis, in which expert judgement is used to analyze content validity and technical item quality while ITEMAN is used for the quantitative analysis. The results showed that the test has good content validity (99.06% of items appropriate to the competence being measured) and good technical item quality, and most items (81.13%) are categorized as cognitive domain learning outcome C2 (Understand). Moreover, the test has a high reliability index (> 0.8), fair difficulty, and good discrimination. However, 35.85% of items have ineffective distractors.
22

Nelson, Gena, and Sarah R. Powell. "Computation Error Analysis: Students With Mathematics Difficulty Compared To Typically Achieving Students." Assessment for Effective Intervention 43, no. 3 (December 15, 2017): 144–56. http://dx.doi.org/10.1177/1534508417745627.

Abstract:
Though proficiency with computation is highly emphasized in national mathematics standards, students with mathematics difficulty (MD) continue to struggle with computation. To learn more about the differences in computation error patterns between typically achieving students and students with MD, we assessed 478 third-grade students on a measure of mathematics computation. Results indicated that using the wrong operation was the most common identifiable error for all students. Students with MD had similar accuracy rates for item categories (e.g., addition items) compared to typically achieving students, but students with MD consistently had more variability in incorrect item responses. This study has implications for efficacious computation instruction for students in the elementary grades.
23

Darmana, Ayi, Ani Sutiani, Haqqi Annazili Nasution, Ismanisa Ismanisa*, and Nurhaswinda Nurhaswinda. "Analysis of Rasch Model for the Validation of Chemistry National Exam Instruments." Jurnal Pendidikan Sains Indonesia 9, no. 3 (July 12, 2021): 329–45. http://dx.doi.org/10.24815/jpsi.v9i3.19618.

Abstract:
Information about the score obtained from a test is often interpreted as an indicator of the student's ability level. This is one of the weaknesses of classical analysis, which is unable to provide meaningful and fair information: the same score, if it comes from test items with different levels of difficulty, must reflect different abilities. Rasch model analysis overcomes this weakness. The purpose of this study was to analyze the quality of the items by validating the national chemistry exam instrument using the Rasch model. The research sample was 212 new students of the Department of Chemistry at the State University of Medan. The data collected were the respondents' answers to the 2013 chemistry national exam (UN) questions, comprising 40 multiple choice items, gathered using the documentation method. The data analysis technique used the Rasch model with the Ministep software. The results of the analysis show that the quality of the Chemistry National Exam (UN) questions is categorized as very good based on the following aspects: unidimensionality, item fit, person-item map, item difficulty level, and person and item reliability. One item was found to show gender bias, in which men benefit more than women. The average chemistry ability of the respondents is above the average difficulty level of the test items.
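
The gender bias noted above is an example of differential item functioning (DIF), which Rasch software typically assesses by contrasting item difficulties estimated separately for the two groups. The snippet below sketches that idea with a crude log-odds (PROX-style) difficulty approximation on made-up data; it is only an illustration, not the Ministep DIF procedure.

```python
import numpy as np

def logit_difficulty(scores):
    """Rough item difficulty (logits) as the centered log-odds of an incorrect answer.

    This is only a PROX-like approximation, not the joint maximum-likelihood estimate
    that Rasch software such as Ministep/Winsteps produces.
    """
    p = np.clip(np.asarray(scores, dtype=float).mean(axis=0), 0.01, 0.99)
    d = np.log((1.0 - p) / p)
    return d - d.mean()          # center so the mean item difficulty is 0 logits

def dif_contrast(scores, group):
    """Difficulty contrast (group 1 minus group 0) per item; large gaps suggest DIF."""
    scores = np.asarray(scores, dtype=float)
    group = np.asarray(group)
    return logit_difficulty(scores[group == 1]) - logit_difficulty(scores[group == 0])

# Hypothetical 0/1 responses for 8 examinees on 3 items; group = 1 for men, 0 for women.
X = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 0, 1],
     [1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
g = [1, 1, 1, 1, 0, 0, 0, 0]
print(np.round(dif_contrast(X, g), 2))
```
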
24

Campfield, Dorota E. "Lexical difficulty – using elicited imitation to study child L2." Language Testing 34, no. 2 (August 2, 2016): 197–221. http://dx.doi.org/10.1177/0265532215623580.

Abstract:
This paper reports a post-hoc analysis of the influence of lexical difficulty of cue sentences on performance in an elicited imitation (EI) task to assess oral production skills for 645 child L2 English learners in instructional settings. This formed part of a large-scale investigation into effectiveness of foreign language teaching in Polish primary schools. EI item design and scoring, IRT and post-hoc lexical analysis of items is described in detail. The research aim was to resolve how much the lexical complexity of items (lexical density, morphological complexity, function word density, and sentence length) contributed to item difficulty and scores. Sentence length, as number of words, predicted better than number of syllables. Function words also contributed, and their importance to EI item construction is discussed. It is suggested that future research should examine phonological aspects of cue sentences to explain potential sources for variability. EI is shown to be a reliable and robust method for young L2 learners with potential for classroom assessment by teachers for emergent oral production skills.
25

Hauenstein, Clifford E., and Susan E. Embretson. "Modeling Item Difficulty in a Dynamic Test." Journal of Cognitive Education and Psychology 19, no. 2 (October 1, 2020): 93–106. http://dx.doi.org/10.1891/jcep-d-19-00023.

Abstract:
The Concept Formation subtest of the Woodcock Johnson Tests of Cognitive Abilities represents a dynamic test due to continual provision of feedback from examiner to examinee. Yet, the original scoring protocol for the test largely ignores this dynamic structure. The current analysis applies a dynamic adaptation of an explanatory item response theory model to evaluate the impact of feedback on item difficulty. Additionally, several item features (rule type, number of target shapes) are considered in the item difficulty model. Results demonstrated that all forms of feedback significantly reduced item difficulty, with the exception of corrective feedback that could not be directly applied to the next item in the series. More complex and compound rule types also significantly predicted item difficulty, as did increasing the number of shapes, thereby supporting the response process aspect of validity. Implications for continued use of the Concept Formation subtest for educational programming decisions are discussed.
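
Modeling item difficulty from item features, as in the explanatory analysis above, can be approximated in a simple two-step fashion: first estimate item difficulties, then regress them on a design matrix of features (an LLTM-style decomposition). The sketch below uses hypothetical features and difficulties; a true explanatory item response model would estimate the feature weights directly from the response data.

```python
import numpy as np

# Hypothetical design matrix: columns = [intercept, compound rule, n_target_shapes, feedback_applicable].
Q = np.array([
    [1, 0, 1, 1],
    [1, 0, 2, 1],
    [1, 1, 2, 1],
    [1, 1, 3, 0],
    [1, 1, 3, 1],
    [1, 0, 1, 0],
], dtype=float)

# Hypothetical item difficulties (logits) assumed to come from a prior IRT calibration.
b = np.array([-1.1, -0.4, 0.3, 1.2, 0.6, -0.6])

# Least-squares weights: an LLTM-style decomposition b ~ Q @ eta (a rough two-step
# approximation; the explanatory model in the paper is fit to the responses themselves).
eta, *_ = np.linalg.lstsq(Q, b, rcond=None)
print("feature weights:", np.round(eta, 2))
print("predicted difficulties:", np.round(Q @ eta, 2))
```
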
26

Enright, Mary K., and Isaac I. Bejar. "AN ANALYSIS OF TEST WRITERS' EXPERTISE: MODELING ANALOGY ITEM DIFFICULTY." ETS Research Report Series 1989, no. 2 (December 1989): i—26. http://dx.doi.org/10.1002/j.2330-8516.1989.tb00149.x.

27

Komalasari. "Evaluating Instrument Quality: Rasch Model – Analyses of Post Test of Curriculum 2013 Training." Jurnal Ilmiah Kanderang Tingang 9, no. 1 (June 30, 2018): 67–86. http://dx.doi.org/10.37304/jikt.v9i1.7.

Abstract:
The main purpose of this study was to evaluate the quality of the post test utilized by LPMP Central Kalimantan, Indonesia, in curriculum 2013 training for grade X teachers. It uses Rasch analysis to explore item fit, reliability (item and person), item difficulty, and the Wright map of the post test. This study also applies Classical Test Theory (CTT) to determine item discrimination and distracters. Following a series of iterative Rasch analyses that adopted the "data should fit the model" approach, the 30-item post test of the curriculum 2013 training was analyzed using ACER ConQuest 4, software based on the Rasch measurement model. All items of the post test of curriculum 2013 training show sufficient fit to the Rasch model. The difficulty levels (i.e. item measures) for the 30 items range from –1.746 logits to +1.861 logits. The item separation reliability is acceptable at 0.990 and the person separation reliability is low at 0.485. The Wright map indicates that the test is difficult for the teachers, or that the teachers have low ability in knowledge of curriculum 2013. The post test items cannot cover the full range of the teachers' ability levels. Item discrimination of the post test of curriculum 2013 training is grouped into fair discrimination (items 2, 4, 5, 8, 11, 18) and poor discrimination (items 1, 3, 6, 7, 9, 10, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30). Some distracters from items 1, 2, 6, 7, 8, 9, 11, 13, 16, 17, 18, 19, 20, 22, 24, 25, 27, 28, 29 and 30 are problematic. These distracters require further investigation or revision. Key words: Rasch analysis, training, curriculum 2013, post test
28

Sener, Nilay, and Erol Tas. "Developing Achievement Test: A Research for Assessment of 5th Grade Biology Subject." Journal of Education and Learning 6, no. 2 (February 16, 2017): 254. http://dx.doi.org/10.5539/jel.v6n2p254.

Abstract:
The purpose of this study is to prepare a multiple-choice achievement test with high reliability and validity for the "Let's Solve the Puzzle of Our Body" unit. For this purpose, a multiple choice achievement test consisting of 46 items was applied to 178 fifth grade students in total. As a result of the test and item analysis performed during the test development process, the difficulty, distinctiveness, and item-total correlation coefficients of the items were calculated. For the validity study, a table of specifications was prepared and the Content Validity Index (CVI) was found to be 0.95 based on expert opinion. As a result of the analysis, 8 items were removed from the test and the KR-20 reliability coefficient of the final test consisting of 38 items was calculated as 0.87. As a result of the item analyses, item difficulty indices ranged between 0.30 and 0.74, while item distinctiveness indices ranged between 0.31 and 0.71. The average difficulty of the test was calculated as moderate (0.56) and its distinctiveness was calculated as very good (0.49).
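
The KR-20 reliability coefficient and the item-total correlations reported above are straightforward to compute from a 0/1 scored response matrix. A minimal sketch on made-up data follows; the figures are illustrative, not the study's.

```python
import numpy as np

def kr20(X):
    """Kuder-Richardson 20 reliability for a 0/1 scored matrix (persons x items)."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    p = X.mean(axis=0)
    total_var = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1.0)) * (1.0 - (p * (1.0 - p)).sum() / total_var)

def item_total_correlations(X):
    """Corrected item-total (point-biserial) correlation: each item vs. the sum of the other items."""
    X = np.asarray(X, dtype=float)
    rest = X.sum(axis=1, keepdims=True) - X
    return np.array([np.corrcoef(X[:, j], rest[:, j])[0, 1] for j in range(X.shape[1])])

# Tiny made-up data set: 8 examinees x 5 items.
X = [[1, 1, 0, 1, 1], [1, 0, 0, 1, 0], [1, 1, 1, 1, 1], [0, 0, 0, 1, 0],
     [1, 1, 0, 0, 1], [0, 0, 0, 0, 0], [1, 1, 1, 1, 0], [1, 0, 1, 0, 0]]
print("KR-20:", round(kr20(X), 3))
print("item-total r:", np.round(item_total_correlations(X), 2))
```
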
29

Kibble, Jonathan D., and Teresa Johnson. "Are faculty predictions or item taxonomies useful for estimating the outcome of multiple-choice examinations?" Advances in Physiology Education 35, no. 4 (December 2011): 396–401. http://dx.doi.org/10.1152/advan.00062.2011.

Abstract:
The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The faculty members annotated questions before exams with the descriptors “easy,” “moderate,” or “hard” and classified them according to whether they tested knowledge, comprehension, or application. Overall analysis showed a statistically significant, but relatively low, correlation between the intended item difficulty and actual student scores (ρ = −0.19, P < 0.01), indicating that, as intended item difficulty increased, the resulting student scores on items tended to decrease. Although this expected inverse relationship was detected, faculty members were correct only 48% of the time when estimating difficulty. There was also significant individual variation among faculty members in the ability to predict item difficulty (χ2 = 16.84, P = 0.02). With regard to the cognitive level of items, no significant correlation was found between the item cognitive level and either actual student scores (ρ = −0.09, P = 0.14) or item discrimination (ρ = 0.05, P = 0.42). Despite the inability of faculty members to accurately predict item difficulty, the examinations were of high quality, as evidenced by reliability coefficients (Cronbach's α) of 0.70–0.92, the rejection of only 4 of 300 items in the postexamination review, and a mean item discrimination (point biserial) of 0.37. In conclusion, the effort of assigning annotations describing intended difficulty and cognitive levels to multiple-choice items is of doubtful value in terms of controlling examination difficulty. However, we also report that the process of annotating questions may enhance examination validity and can reveal aspects of the hidden curriculum.
30

Violato, Claudio. "Item Difficulty and Discrimination as a Function of Stem Completeness." Psychological Reports 69, no. 3 (December 1991): 739–43. http://dx.doi.org/10.2466/pr0.1991.69.3.739.

Abstract:
The effects on item difficulty and discrimination of stem completeness (complete stem or incomplete stem) for multiple-choice items were studied experimentally. Subjects (166 junior education students) were classified into three achievement groups (low, medium, high) and one of two forms of a multiple-choice test was randomly assigned to each subject. A two-way factorial design (completeness × achievement) was used as the experimental model. Analysis indicated that stem completeness had no effect on either item discrimination or difficulty and there was no interaction effect with achievement. It was concluded that multiple-choice items may be very robust in measuring knowledge in a subject area irrespective of variations in stem construction.
31

Caissie, Andre F., Francois Vigneau, and Douglas A. Bors. "What does the Mental Rotation Test Measure? An Analysis of Item Difficulty and Item Characteristics." Open Psychology Journal 2, no. 1 (December 15, 2009): 94–102. http://dx.doi.org/10.2174/1874350100902010094.

32

Schweizer, Karl, and Stefan Troche. "Is the Factor Observed in Investigations on the Item-Position Effect Actually the Difficulty Factor?" Educational and Psychological Measurement 78, no. 1 (October 6, 2016): 46–69. http://dx.doi.org/10.1177/0013164416670711.

Abstract:
In confirmatory factor analysis quite similar models of measurement serve the detection of the difficulty factor and the factor due to the item-position effect. The item-position effect refers to the increasing dependency among the responses to successively presented items of a test whereas the difficulty factor is ascribed to the wide range of item difficulties. The similarity of the models of measurement hampers the dissociation of these factors. Since the item-position effect should theoretically be independent of the item difficulties, the statistical ex post manipulation of the difficulties should enable the discrimination of the two types of factors. This method was investigated in two studies. In the first study, Advanced Progressive Matrices (APM) data of 300 participants were investigated. As expected, the factor thought to be due to the item-position effect was observed. In the second study, using data simulated to show the major characteristics of the APM data, the wide range of items with various difficulties was set to zero to reduce the likelihood of detecting the difficulty factor. Despite this reduction, however, the factor now identified as item-position factor, was observed in virtually all simulated datasets.
33

Fatimah, Siti, Achmad Bernhardo Elzamzami, and Joko Slamet. "Item Analysis of Final Test for the 9th Grade Students of SMPN 44 Surabaya in the Academic Year of 2019/2020." JournEEL (Journal of English Education and Literature) 2, no. 1 (June 1, 2020): 34–46. http://dx.doi.org/10.51836/journeel.v2i1.81.

Abstract:
This research focused on formulated questions regarding test score validity, reliability and item analysis involving discrimination power and difficulty index, in order to provide detailed information leading to the improvement of test item construction. The quality of each particular item was analyzed in terms of item difficulty, item discrimination and distractor analysis. Statistical tests were used to compute the reliability of the test by applying the Kuder-Richardson formula (KR-20). The analysis of 50 test items was computed using Microsoft Office Excel. A descriptive method was applied to describe and examine the data. The research findings showed the test fulfilled the criterion of content validity, although it was categorized as low validity. Meanwhile, the reliability of the test scores was 0.521010831 (0.52), categorized as low reliability and indicating that the test needs revision. Of the 50 items examined, 21 items were in need of improvement, being classified as "easy" for the difficulty index and "poor" for discriminability, out of a total of 26 flagged items (52%). This means more than 50% of the test items need to be revised, as they do not meet the criteria. It is suggested that, in order to measure students' performance effectively, essential improvements need to be made, whereby items with a "poor" discrimination index should be reviewed.
34

Domple, Vijay Kishanrao, J. V. Dixit, and Vishal S. Dhande. "Comparative study of simplified new method of item analysis with conventional method." International Journal Of Community Medicine And Public Health 5, no. 1 (December 23, 2017): 254. http://dx.doi.org/10.18203/2394-6040.ijcmph20175792.

Abstract:
Background: Most of the entrance examinations in the world use multiple choice questions for assessment. We tried a new method in which the middle one-third of test papers were also included in the final item analysis when calculating the difficulty and discrimination indices. Methods: This cross-sectional study was conducted in July 2017 among 62 third year undergraduate medical students. A total of 50 MCQs were framed, each with four options and only one correct answer. The assessed papers were arranged in descending order, and the two question papers with the highest and lowest marks were excluded to make two or three equal groups. The item analysis was done by two methods based on the difficulty index and discrimination index. Data were analyzed for the standard error of the difference between two proportions using Microsoft Excel. Results: In all, 60 students participated in the part-completion examination. The highest score was 37, whereas 15 was the lowest. Out of 50 items (MCQs), 7 items were removed before the final item analysis because of negative values of the proportions. Out of the remaining 43 items, 25 showed a statistically significant difference in the discrimination index (P<0.05) between the new method and the conventional method. For the other 18 items, there was no significant difference (P>0.05) in the discrimination index. The Z test for two proportions showed no significant difference in the difficulty index between the new and conventional methods. Conclusions: There is no difference with respect to the difficulty index between the two methods, but regarding the discrimination index, the study failed to show conclusive results.
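
The comparison above rests on the standard error of the difference between two proportions. A hedged sketch of the corresponding two-proportion z test (pooled standard error) is shown below with hypothetical counts; it is the standard formula the abstract describes, not the authors' spreadsheet.

```python
import math

def two_proportion_z(successes1, n1, successes2, n2):
    """Z statistic for the difference between two independent proportions (pooled SE)."""
    p1, p2 = successes1 / n1, successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se

# Hypothetical: an item answered correctly by 34/60 examinees under one scoring method
# and 28/60 under the other; |z| > 1.96 would indicate a difference at P < 0.05.
z = two_proportion_z(34, 60, 28, 60)
print(round(z, 3))
```
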
35

Park, In Sook, Yeon Ok Suh, Hae Sook Park, So Young Kang, Kwang Sung Kim, Gyung Hee Kim, Yeon-Hee Choi, and Hyun-Ju Kim. "Item development process and analysis of 50 case-based items for implementation on the Korean Nursing Licensing Examination." Journal of Educational Evaluation for Health Professions 14 (September 11, 2017): 20. http://dx.doi.org/10.3352/jeehp.2017.14.20.

Abstract:
Purpose: The purpose of this study was to improve the quality of items on the Korean Nursing Licensing Examination by developing and evaluating case-based items that reflect integrated nursing knowledge. Methods: We conducted a cross-sectional observational study to develop new case-based items. The methods for developing test items included expert workshops, brainstorming, and verification of content validity. After a mock examination of undergraduate nursing students using the newly developed case-based items, we evaluated the appropriateness of the items through classical test theory and item response theory. Results: A total of 50 case-based items were developed for the mock examination, and content validity was evaluated. The question items integrated 34 discrete elements of integrated nursing knowledge. The mock examination was taken by 741 baccalaureate students in their fourth year of study at 13 universities. Their average score on the mock examination was 57.4, and the examination showed a reliability of 0.40. According to classical test theory, the average level of item difficulty of the items was 57.4% (80%–100% for 12 items; 60%–80% for 13 items; and less than 60% for 25 items). The mean discrimination index was 0.19, and was above 0.30 for 11 items and 0.20 to 0.29 for 15 items. According to item response theory, the item discrimination parameter (in the logistic model) was none for 10 items (0.00), very low for 20 items (0.01 to 0.34), low for 12 items (0.35 to 0.64), moderate for 6 items (0.65 to 1.34), high for 1 item (1.35 to 1.69), and very high for 1 item (above 1.70). The item difficulty was very easy for 24 items (below −2.0), easy for 8 items (−2.0 to −0.5), medium for 6 items (−0.5 to 0.5), hard for 3 items (0.5 to 2.0), and very hard for 9 items (2.0 or above). The goodness-of-fit test in terms of the 2-parameter item response model between the range of 2.0 to 0.5 revealed that 12 items had an ideal correct answer rate. Conclusion: We surmised that the low reliability of the mock examination was influenced by the timing of the test for the examinees and the inappropriate difficulty of the items. Our study suggested a methodology for the development of future case-based items for the Korean Nursing Licensing Examination.
36

Maulina, Novi, and Rima Novirianthy. "ITEM ANALYSIS AND PEER-REVIEW EVALUATION OF SPECIFIC HEALTH PROBLEMS AND APPLIED RESEARCH BLOCK EXAMINATION." Jurnal Pendidikan Kedokteran Indonesia: The Indonesian Journal of Medical Education 9, no. 2 (July 28, 2020): 131. http://dx.doi.org/10.22146/jpki.49006.

Abstract:
Background: Assessment and evaluation of students is an essential component of the teaching and learning process. Item analysis is the technique of collecting, summarizing, and using students' response data to assess the quality of a Multiple Choice Question (MCQ) test by measuring indices of difficulty and discrimination, as well as distracter efficiency. Peer review practices improve the validity of assessments in evaluating student performance. Method: We analyzed 150 students' responses to 100 MCQs in a block examination for difficulty index (p), discrimination index (D) and distractor efficiency (DE) using Microsoft Excel formulas. The correlation of p and D was analyzed using the Spearman correlation test in SPSS 23.0. The results were analyzed to evaluate the peer-review strategy. Results: The median difficulty index (p) was 54%, within the excellent range (p 40-60%), and the mean discrimination index (D) was 0.24, which is reasonably good. There were 7 items with excellent p (40–60%) and excellent D (≥0.4). Nineteen items had an excellent discrimination index (D≥0.4). However, there were 9 items with a negative discrimination index and 30 items with a poor discrimination index, which should be fully revised. Forty-two items had 4 functioning distracters (DE 0%), which suggested that the teachers be more precise and careful in creating the distracters. Conclusion: Based on the item analysis, there were items to be fully revised. For better test quality, feedback and suggestions for the item writers should also be provided as part of the peer-review process on the basis of item analysis.
37

Andriani, Feni, Meti Indrowati, and Bowo Sugiharto. "Analysis items of the four-tier immune system multiple choice test instrument using rasch model." Biosfer 14, no. 1 (April 30, 2021): 99–119. http://dx.doi.org/10.21009/biosferjpb.18020.

Abstract:
The purpose of this study was to analyze the feasibility of the items of the four-tier multiple-choice immune system test instrument that had been developed. The instrument was developed using the Treagust (1988) model, namely defining the content, collecting information on student misconceptions, and developing a diagnostic test. A total of 25 items were developed. The developed instrument was tested on 142 grade XI students from several high schools in Surakarta who were selected by simple random sampling. Data analysis was performed using Rasch analysis in the Winsteps application. The construct validity test showed that items number 5, 7, and 9 did not fit the validity standards. The reliability test shows that the Cronbach's alpha reliability is bad (0.51), the item reliability is special (0.97), the person reliability is sufficient (0.68), the person separation is weak (1.44), and the item separation is special (5.38). The person discrimination test showed that student 056P31 has the highest ability and student 098P51 has the lowest ability. The item discrimination test shows that item number 1 is the best item and the worst item is number 14. The item difficulty analysis showed poor proportionality because there were too many items in the easy and difficult categories. An expansion of the sample is needed to see a more comprehensive and diverse range of responses to the instrument.
APA, Harvard, Vancouver, ISO, and other styles
38

Nahadi, Mr, Mrs Wiwi Siswaningsih, and Mr Ana Rofiati. "PENGEMBANGAN DAN ANALISIS SOAL ULANGAN KENAIKAN KELAS KIMIA SMA KELAS X BERDASARKAN CLASSICAL TEST THEORY DAN ITEM RESPONSE THEORY." Jurnal Pengajaran Matematika dan Ilmu Pengetahuan Alam 16, no. 2 (October 1, 2011): 109. http://dx.doi.org/10.18269/jpmipa.v16i2.234.

Full text
Abstract:
This research is titled "Test Development and Analysis of First Grade Senior High School Final Examination in Chemistry Based on Classical Test Theory and Item Response Theory". It was conducted to develop a standard test instrument for the first-grade senior high school final examination, using analyses based on classical test theory and item response theory. The test is a multiple-choice test consisting of 75 items, each with five options. The research and development method was used to produce test items that satisfy item criteria such as validity, reliability, item discrimination, item difficulty, and distractor quality under classical test theory, and validity, reliability, item discrimination, item difficulty, and pseudo-guessing under item response theory. The three-parameter item response theory model was used in this research. Development was carried through a preliminary field test with 102 first-grade senior high school students. Based on the results, the test fulfills the criteria of a good instrument under both classical test theory and item response theory. The final examination items vary in quality, so some of them need revision of either the stem or the options. Of the 75 test items, 21 were rejected and 54 were accepted.
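
The three-parameter logistic (3PL) model named in the abstract has a standard item response function; the short Python sketch below illustrates it with hypothetical parameter values (a = discrimination, b = difficulty, c = pseudo-guessing).

import numpy as np

def p_3pl(theta, a, b, c):
    # Probability of a correct response for ability theta under the 3PL model.
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Example: average ability, moderate discrimination, average difficulty, and a
# guessing floor of 0.2 (the chance level for a five-option MCQ).
print(p_3pl(theta=0.0, a=1.0, b=0.0, c=0.2))  # -> 0.6
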
APA, Harvard, Vancouver, ISO, and other styles
39

Vincent, Wong, and S. Kanageswari Suppiah Shanmugam. "The Role of Classical Test Theory to Determine the Quality of Classroom Teaching Test Items." Pedagogia : Jurnal Pendidikan 9, no. 1 (February 25, 2020): 5–34. http://dx.doi.org/10.21070/pedagogia.v9i1.123.

Full text
Abstract:
The purpose of this study is to describe the use of Classical Test Theory (CTT) to investigate the quality of test items in measuring students' English competence. The study adopts a mixed-methods approach. The results show that most items fall within the acceptable range of both indices, with the exception of the synonym items. Items that focus on vocabulary are more challenging. Surprisingly, the short-answer items have an excellent item difficulty level and item discrimination index. The overall item analysis also supports the hypothesis that items with an ideal item difficulty value between 0.4 and 0.6 will also have an ideal item discrimination value. This paper reports part of a larger study on the quality of individual test items and overall tests.
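
As an illustration of the CTT indices discussed here, the Python sketch below computes a corrected item-total (point-biserial) discrimination index from a hypothetical 0/1 response matrix; whether the study used this variant or the upper/lower-group index is an assumption.

import numpy as np

def point_biserial(scores):
    # Correlation of each 0/1 item with the total score on the remaining items.
    n_items = scores.shape[1]
    r = np.empty(n_items)
    for i in range(n_items):
        rest = scores.sum(axis=1) - scores[:, i]   # corrected item-total score
        r[i] = np.corrcoef(scores[:, i], rest)[0, 1]
    return r

demo = (np.random.default_rng(4).random((80, 20)) < 0.6).astype(int)  # placeholder
print(np.round(point_biserial(demo)[:5], 2))
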
APA, Harvard, Vancouver, ISO, and other styles
40

Elston, Beth, Marc Goldstein, and Kepher H. Makambi. "Item Response Theory Analysis of the Outpatient Physical Therapy Improvement in Movement Assessment Log (OPTIMAL)." Physical Therapy 93, no. 5 (May 1, 2013): 661–71. http://dx.doi.org/10.2522/ptj.20120120.

Full text
Abstract:
Background: The Outpatient Physical Therapy Improvement in Movement Assessment Log (OPTIMAL) instrument was created to assess the perceived ability of patients receiving physical therapy in adult outpatient settings to perform actions or movements. Its properties must be studied to determine whether it accomplishes this goal. Objective: The objective of this study was to investigate the item properties of OPTIMAL with item response theory. Design: This investigation was a retrospective cross-sectional item calibration study. Methods: Data were obtained from the American Physical Therapy Association, which collected information from outpatient physical therapy clinics through electronic charting databases that included OPTIMAL responses. Item response theory analyses were performed on the trunk, lower-extremity, and upper-extremity subscales of the Difficulty Scale of OPTIMAL. Results: In total, 3,138 patients completed the Difficulty Scale of OPTIMAL at the baseline assessment. The subscale analyses met all item response theory assumptions. The items in each subscale showed fair discrimination. In all analyses, the subscales measured a narrow range of ability levels at the low end of the physical functioning spectrum. Limitations: OPTIMAL was originally intended to be administered as a whole. In the present study, each subscale was analyzed separately, indicating how the subscales perform individually but not as a whole. Another limitation is that only the Difficulty Scale of OPTIMAL was analyzed, without consideration of the Confidence Scale. Conclusions: OPTIMAL best measures low physical functioning at the baseline assessment in adult outpatient physical therapy settings. The addition of categories to each item and the addition of more challenging items are recommended to allow measurements for a broader range of patients.
APA, Harvard, Vancouver, ISO, and other styles
41

Angadi, Netravathi B., Amitha Nagabhushana, and Nayana K. Hashilkar. "Item analysis of multiple choice questions of undergraduate pharmacology examinations in a medical college in Belagavi, Karnataka, India." International Journal of Basic & Clinical Pharmacology 7, no. 10 (September 24, 2018): 1917. http://dx.doi.org/10.18203/2319-2003.ijbcp20183923.

Full text
Abstract:
Background: Multiple choice questions (MCQs) are a common method of assessment of medical students. The quality of MCQs is determined by three parameters: difficulty index (DIF I), discrimination index (DI), and distractor efficiency (DE). Item analysis is a valuable yet relatively simple procedure, performed after the examination, that provides information regarding the reliability and validity of a test item. The objective of this study was to perform an item analysis of MCQs to test their validity parameters. Methods: 50 items consisting of 150 distractors were selected from the formative exams. A correct response to an item was awarded one mark, with no negative marking for an incorrect response. Each item was analysed for three parameters: DIF I, DI, and DE. Results: A total of 50 items consisting of 150 distractors were analysed. The DIF I of 31 (62%) items was in the acceptable range (DIF I = 30–70%), and 30 items had 'good to excellent' discrimination (DI > 0.25). Ten (20%) items were too easy and 9 (18%) items were too difficult (DIF I < 30%). There were 4 items with 6 non-functional distractors (NFDs) between them, while the remaining 46 items did not have any NFDs. Conclusions: Item analysis is a valuable tool, as it helps us to retain valuable MCQs and discard or modify the items which are not useful. It also helps in increasing our skills in test construction and identifies the specific areas of course content which need greater emphasis or clarity.
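
The distractor statistics above rest on counting non-functional distractors (NFDs); a minimal Python sketch, assuming the common rule that a distractor chosen by fewer than 5% of examinees is non-functional, is given below with hypothetical option data.

import numpy as np

def distractor_analysis(choices, key, options=("A", "B", "C", "D")):
    # Count non-functional distractors (<5% selection) and report the share
    # of functioning distractors as a percentage.
    distractors = [o for o in options if o != key]
    share = {o: np.mean(choices == o) for o in distractors}
    nfd = sum(1 for o in distractors if share[o] < 0.05)
    de = 100.0 * (len(distractors) - nfd) / len(distractors)
    return nfd, de

choices = np.array(list("AABCD" * 20))        # placeholder responses for one item
print(distractor_analysis(choices, key="A"))  # -> (0, 100.0)
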
APA, Harvard, Vancouver, ISO, and other styles
42

Harrell, Murphy, Melissa Myers, Nanako Hawley, Jasmin Pizer, and Benjamin Hill. "A-185 An Analysis of Rarely Missed Items on the TOMM." Archives of Clinical Neuropsychology 36, no. 6 (August 30, 2021): 1240. http://dx.doi.org/10.1093/arclin/acab062.203.

Full text
Abstract:
Objective: This study examined item performance on Trial 1 of the Test of Memory Malingering (TOMM). We also identified items that were most often missed by individuals giving genuine effort. Method: Participants were 106 adults seen for disability claims (87.7% male; 70.5% Caucasian, 26.7% Black; age range 22–84 years, Mage = 44.42 years, SD = 13.07; Meducation = 13.58, SD = 2.05) who completed and passed the TOMM as part of a larger battery. The mean score on Trial 1 was 43.08 (SD = 5.49), and the mean score on Trial 2 was 48.98 (SD = 1.54). Results: Frequency analysis indicated that >95% of the sample correctly identified six items on Trial 1: item 1, spinning wheel (97.2%); item 8, musical notes (99.1%); item 38, ice cream (98.1%); item 41, life preserver (95.3%); item 45, iron (95.3%); and item 47, dart (98.1%). Nine items were correctly identified on Trial 1 by <80% of the sample: item 2, tissue box (77.4%); item 6, suitcase (77.4%); item 20, motorcycle (77.4%); item 22, jack-in-the-box (71.7%); item 26, light bulb (75.5%); item 27, maple leaf (72.6%); item 32, racket (79.2%); item 36, birdhouse (79.2%); and item 44, pail & shovel (66.0%). Conclusions: These findings suggest that items on Trial 1 of the TOMM differ in difficulty in a disability-claims sample who performed genuinely on the TOMM. Items 1, 8, 38, 41, 45, and 47 are good candidates for a rarely missed index, where failure of these items would be probabilistically unlikely. Future research should evaluate whether these items are failed at higher rates in cases of borderline TOMM performance to improve sensitivity to feigning.
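
The frequency analysis described here reduces to computing per-item percent-correct and applying the 95% and 80% cut-offs; a toy Python sketch with placeholder data follows.

import numpy as np

# Placeholder 0/1 matrix: 106 examinees by 50 TOMM Trial 1 items.
trial1 = (np.random.default_rng(1).random((106, 50)) < 0.86).astype(int)
pct_correct = trial1.mean(axis=0) * 100

rarely_missed = np.where(pct_correct > 95)[0] + 1   # 1-based item numbers
harder_items = np.where(pct_correct < 80)[0] + 1
print("rarely missed:", rarely_missed)
print("relatively difficult:", harder_items)
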
APA, Harvard, Vancouver, ISO, and other styles
43

Guedes, Erika de Souza, Luiz Carlos Orozco-Vargas, Ruth Natália Teresa Turrini, Regina Márcia Cardoso de Sousa, Mariana Alvina dos Santos, and Diná de Almeida Lopes Monteiro da Cruz. "Rasch Analysis of the Power as Knowing Participation in Change Tool - the Brazilian version." Revista Latino-Americana de Enfermagem 21, spe (February 2013): 163–71. http://dx.doi.org/10.1590/s0104-11692013000700021.

Full text
Abstract:
OBJECTIVES: The objective of this study was to evaluate the items of the Brazilian version of the Power as Knowing Participation in Change Tool (PKPCT). METHOD: The psychometric properties of the questionnaire were investigated through Rasch analysis. RESULTS: Data from 952 nursing assistants and 627 baccalaureate nurses were analyzed (average age 44.1, SD = 9.5; 13.0% men). The subscales Choices, Awareness, Freedom, and Involvement were tested separately and showed unidimensionality; the response categories of the items were collapsed from 7 to 3 levels, and the items fit the model well, except for the following/leading item, for which the infit and outfit values were above 1.4; this item also presented Differential Item Functioning (DIF) according to the participant's role. The reliability of the items was 0.99, and the reliability of the participants ranged from 0.80 to 0.84 across the subscales. Items with extremely high levels of difficulty were not identified. CONCLUSIONS: The PKPCT should not be viewed as unidimensional, items with extremely high levels of difficulty need to be created for the scale, and the differential functioning of some items has to be further investigated.
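
Collapsing response categories, as reported above, is a simple recoding step before the Rasch model is re-estimated; the Python sketch below uses a purely hypothetical 7-to-3 mapping, not the authors' scheme.

# Hypothetical mapping of a 7-point response scale onto 3 collapsed levels.
collapse = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3}

def recode(responses):
    # Recode a sequence of 1-7 ratings into the 3 collapsed categories.
    return [collapse[r] for r in responses]

print(recode([1, 4, 7, 5, 2]))  # -> [1, 2, 3, 2, 1]
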
APA, Harvard, Vancouver, ISO, and other styles
44

Kheyami, Deena, Ahmed Jaradat, Tareq Al-Shibani, and Fuad A. Ali. "Item Analysis of Multiple Choice Questions at the Department of Paediatrics, Arabian Gulf University, Manama, Bahrain." Sultan Qaboos University Medical Journal [SQUMJ] 18, no. 1 (April 4, 2018): 68. http://dx.doi.org/10.18295/squmj.2018.18.01.011.

Full text
Abstract:
Objectives: The current study aimed to carry out a post-validation item analysis of multiple choice questions (MCQs) in medical examinations in order to evaluate correlations between item difficulty, item discrimination, and distractor effectiveness, so as to determine whether questions should be included, modified, or discarded. In addition, the optimal number of options per MCQ was analysed. Methods: This cross-sectional study was performed in the Department of Paediatrics, Arabian Gulf University, Manama, Bahrain. A total of 800 MCQs and 4,000 distractors were analysed between November 2013 and June 2016. Results: The mean difficulty index ranged from 36.70% to 73.14%. The mean discrimination index ranged from 0.20 to 0.34. The mean distractor efficiency ranged from 66.50% to 90.00%. Of the items, 48.4%, 35.3%, 11.4%, 3.9%, and 1.1% had zero, one, two, three, and four nonfunctional distractors (NFDs), respectively. Using three or four rather than five options in each MCQ resulted in 95% or 83.6% of items having zero NFDs, respectively. Distractor efficiency was 91.87%, 85.83%, and 64.13% for difficult, acceptable, and easy items, respectively (P < 0.005), and 83.33%, 83.24%, and 77.56% for items with excellent, acceptable, and poor discrimination, respectively (P < 0.005). The average Kuder-Richardson formula 20 reliability coefficient was 0.76. Conclusion: A considerable number of the MCQ items were within acceptable ranges; however, some items needed to be discarded or revised. Using three or four rather than five options in MCQs is recommended to reduce the number of NFDs and improve the overall quality of the examination.
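
The Kuder-Richardson formula 20 (KR-20) coefficient reported above has a closed form; a minimal Python sketch for a 0/1-scored response matrix is given below (placeholder data, illustrative only).

import numpy as np

def kr20(scores):
    # KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores).
    k = scores.shape[1]
    p = scores.mean(axis=0)
    q = 1.0 - p
    var_total = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - (p * q).sum() / var_total)

# Placeholder data: with independent random items the value will be near zero;
# real response data yields a meaningful reliability estimate.
demo = (np.random.default_rng(5).random((200, 40)) < 0.65).astype(int)
print(round(kr20(demo), 2))
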
APA, Harvard, Vancouver, ISO, and other styles
45

Ingale, Abhijeet S., Purushottam A. Giri, and Mohan K. Doibale. "Study on item and test analysis of multiple choice questions amongst undergraduate medical students." International Journal Of Community Medicine And Public Health 4, no. 5 (April 24, 2017): 1562. http://dx.doi.org/10.18203/2394-6040.ijcmph20171764.

Full text
Abstract:
Background: Item analysis is the process of collecting, summarizing, and using information from students' responses to assess the quality of test items. However, it is said that MCQs emphasize recall of factual information rather than conceptual understanding and interpretation of concepts, and there is more to writing good MCQs than writing good questions. The objectives of the study were to assess the item and test quality of multiple choice questions, to address students' learning difficulties, and to identify the low achievers in the test. Methods: One hundred MBBS students from a government medical college were examined. A test comprising thirty MCQs was administered. All items were analysed for difficulty index, discrimination index, and distractor efficiency. Data were entered in MS Excel 2007 and analysed in SPSS 21 with statistical tests of significance. Results: The difficulty index of the majority (80%) of items was within the acceptable range. 63% of items showed an excellent discrimination index. Distractor efficiency was satisfactory overall. Conclusions: Multiple choice questions of average difficulty, with high discriminating power and good distractor efficiency, should be incorporated into students' examinations.
APA, Harvard, Vancouver, ISO, and other styles
46

Yuhanna, W. L., M. H. I. Al Muhdhar, A. Gofur, and Z. Hassan. "Self-Reflection Assessment in Vertebrate Zoology (SRAVZ) Using Rasch Analysis." Jurnal Pendidikan IPA Indonesia 10, no. 1 (March 31, 2021): 35–47. http://dx.doi.org/10.15294/jpii.v10i1.25603.

Full text
Abstract:
Instruments that are valid, reliable, and highly consistent are needed to measure students' self-reflection. The Self-Reflection Assessment in Vertebrate Zoology (SRAVZ) was developed to explore students' self-reflection and abilities in the vertebrate zoology course. It is essential to test the instrument's validity before measuring students' abilities so that data bias does not occur. This study aims to determine the validity, the fit or misfit of the items, and the difficulty level of the SRAVZ items. The SRAVZ was developed with the ADDIE model (Analysis, Design, Development, Implementation, Evaluation) and consists of 24 items tested on 135 students who had taken the vertebrate zoology course. Rasch analysis was carried out in Winsteps version 4.5.2. The Rasch model shows an item reliability of 0.97 and a Cronbach alpha of 0.94, with positive PTMEA correlations and 48.1% of variance explained (unidimensionality). The item separation index of 5.6 means that the items are grouped very well. The infit mean square for the SRAVZ was 0.59–1.96, and the outfit mean square was 0.59–2.16. The analysis shows that, of the 24 SRAVZ items, 22 items fit and two items misfit (S3 and S5); items S3 and S5 must be excluded from the SRAVZ construction, leaving 22 items to measure students' self-reflection in the vertebrate zoology course. The most difficult item is S3, and the easiest is S6. Thus, the data indicate that the SRAVZ is valid, reliable, and highly consistent.
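
The infit and outfit mean-square ranges reported above are standard Rasch fit statistics; given person abilities and item difficulties that have already been estimated (simulated here), they can be computed as in the Python sketch below, which is an illustration rather than a reproduction of Winsteps output.

import numpy as np

def infit_outfit(X, theta, b):
    # Expected scores and model variances under the dichotomous Rasch model.
    P = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    W = P * (1.0 - P)
    R2 = (X - P) ** 2                       # squared score residuals
    outfit = (R2 / W).mean(axis=0)          # unweighted mean square per item
    infit = R2.sum(axis=0) / W.sum(axis=0)  # information-weighted mean square
    return infit, outfit

rng = np.random.default_rng(6)
theta = rng.normal(size=135)               # placeholder person abilities
b = rng.normal(size=24)                    # placeholder item difficulties
P = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
X = (rng.random(P.shape) < P).astype(int)  # responses simulated from the model
infit, outfit = infit_outfit(X, theta, b)
print(np.round(infit[:5], 2), np.round(outfit[:5], 2))  # values near 1 indicate fit
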
APA, Harvard, Vancouver, ISO, and other styles
47

McQueen, Joy. "Rasch scaling." Australian Review of Applied Linguistics. Series S 13 (January 1, 1996): 137–87. http://dx.doi.org/10.1075/aralss.13.07mcq.

Full text
Abstract:
This paper describes a methodology for exploring the validity of the Rasch scaling procedure in relation to a multiple-choice test of reading Chinese as a foreign language. The validation procedure involved a post hoc content analysis of test items to identify factors which the research literature on early reading indicated were likely to be related to item difficulty. A comparison was then made between these difficulty factors and the item-difficulty ranking produced by a Rasch analysis of test items. The analysis revealed that the dimension mapped out by the Rasch scaling (and to some extent the corresponding wording of the reporting descriptors) does indeed reflect certain elements thought to be related to early reading ability.
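
The comparison described above amounts to a rank correlation between a content-based prediction of item difficulty and the Rasch difficulty estimates; a small Python sketch with hypothetical numbers follows.

from scipy.stats import spearmanr

predicted_rank = [1, 2, 3, 4, 5, 6, 7, 8]                         # from the content analysis
rasch_difficulty = [-1.8, -1.1, -0.9, -0.2, 0.1, 0.7, 1.2, 1.6]   # logits (hypothetical)
rho, pval = spearmanr(predicted_rank, rasch_difficulty)
print(f"Spearman rho = {rho:.2f}")  # 1.00 here, since the two orderings agree perfectly
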
APA, Harvard, Vancouver, ISO, and other styles
48

Hontangas, Pedro, Vicente Ponsoda, Julio Olea, and Steven L. Wise. "The Choice of Item Difficulty in Self-Adapted Testing." European Journal of Psychological Assessment 16, no. 1 (January 2000): 3–12. http://dx.doi.org/10.1027//1015-5759.16.1.3.

Full text
Abstract:
Summary: The difficulty-level choices made by examinees during a self-adapted test were studied. A positive correlation between estimated ability and difficulty choice was found. The mean difficulty level selected by the examinees increased nonlinearly as the testing session progressed. Regression analyses showed that the best predictors of difficulty choice were examinee ability, the difficulty of the previous item, and the score on the previous item. Four strategies for selecting difficulty levels were examined, and examinees were classified into subgroups based on the best-fitting strategy. The subgroups differed with regard to ability, pretest anxiety, number of items passed, and mean difficulty level chosen. The self-adapted test was found to reduce state anxiety for only some of the strategy groups.
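
The regression analyses summarized above can be illustrated with an ordinary least-squares fit on simulated data; every variable name and coefficient below is hypothetical.

import numpy as np

rng = np.random.default_rng(2)
n = 200
ability = rng.normal(size=n)
prev_difficulty = rng.integers(1, 6, size=n).astype(float)   # previous item's level (1-5)
prev_correct = rng.integers(0, 2, size=n).astype(float)      # score on the previous item
choice = (2 + 0.5 * ability + 0.3 * prev_difficulty + 0.4 * prev_correct
          + rng.normal(scale=0.5, size=n))                   # chosen difficulty level

X = np.column_stack([np.ones(n), ability, prev_difficulty, prev_correct])
coef, *_ = np.linalg.lstsq(X, choice, rcond=None)
print("intercept and slopes:", np.round(coef, 2))
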
APA, Harvard, Vancouver, ISO, and other styles
49

Karim, Sayit Abdul, Suryo Sudiro, and Syarifah Sakinah. "Utilizing test items analysis to examine the level of difficulty and discriminating power in a teacher-made test." EduLite: Journal of English Education, Literature and Culture 6, no. 2 (August 31, 2021): 256. http://dx.doi.org/10.30659/e.6.2.256-269.

Full text
Abstract:
Apart from teaching, English language teachers need to assess their students by giving tests to determine their achievement. In general, teachers rarely conduct item analysis on their tests; as a result, they have little idea of the quality of the tests distributed to their students. The present study uses test item analysis to determine the level of difficulty (LD) and the discriminating power (DP) of the multiple-choice (MC) items constructed by an English teacher for a reading comprehension test. The study employs a qualitative approach. For this purpose, responses to a 50-item MC reading comprehension test were obtained from the students' test results. Thirty-five grade-eight students took part in the MC test try-out, both male (15) and female (20), from junior high school 2 Kempo in West Nusa Tenggara Province. The findings revealed that 16 of the 50 test items were rejected due to poor or very poor difficulty and discrimination indices. Meanwhile, 12 items need to be reviewed due to their mediocre quality, and 11 items are of good quality. In addition, 11 of the 50 items were considered excellent, as their DP scores reached approximately 0.44 to 0.78. The implications of the present study shed light on the quality of teacher-made test items, especially MC tests.
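
Decisions to accept, revise, or reject teacher-made items are usually driven by cut-offs on LD and DP; the Python sketch below encodes one common set of thresholds as an illustration, not the cut-offs used in this study.

def classify_item(ld, dp):
    # dp thresholds follow widely used guidelines; ld is the proportion correct.
    if dp >= 0.40:
        quality = "excellent"
    elif dp >= 0.30:
        quality = "good"
    elif dp >= 0.20:
        quality = "mediocre (review)"
    else:
        quality = "poor (reject)"
    if ld < 0.30 or ld > 0.70:
        quality += ", difficulty outside the preferred range"
    return quality

print(classify_item(ld=0.55, dp=0.45))  # -> excellent
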
APA, Harvard, Vancouver, ISO, and other styles
50

Date, Amit P., Archana S. Borkar, Rupesh T. Badwaik, Riaz A. Siddiqui, Tanaji R. Shende, and Amruta V. Dashputra. "Item analysis as tool to validate multiple choice question bank in pharmacology." International Journal of Basic & Clinical Pharmacology 8, no. 9 (August 28, 2019): 1999. http://dx.doi.org/10.18203/2319-2003.ijbcp20194106.

Full text
Abstract:
Background: Multiple choice questions (MCQs) are a common method for formative and summative assessment of medical students. Item analysis enables the identification of good MCQs based on the difficulty index (DIF I), discrimination index (DI), and distractor efficiency (DE). The objective of this study was to assess the quality of MCQs currently in use in pharmacology by item analysis and to develop an MCQ bank with quality items. Methods: This cross-sectional study was conducted with 148 second-year MBBS students at NKP Salve Institute of Medical Sciences from January 2018 to August 2018. Forty MCQs, twenty from each of the two term examinations in pharmacology, were taken for item analysis. A correct response to an item was awarded one mark and each incorrect response was awarded zero. Each item was analyzed in a Microsoft Excel sheet for three parameters: DIF I, DI, and DE. Results: In the present study, the mean and standard deviation (SD) of the difficulty index (%), discrimination index, and distractor efficiency (%) were 64.54±19.63, 0.26±0.16, and 66.54±34.59, respectively. Of the 40 items, a large proportion of MCQs had an acceptable level of difficulty (70%) and discriminated well between higher- and lower-ability students (DI, 77.5%). The proportion of items with zero or one non-functional distractor (NFD) was 80%. Conclusions: The study showed that item analysis is a valid tool to identify quality items which, when regularly incorporated, can help develop a very useful, valid, and reliable question bank.
APA, Harvard, Vancouver, ISO, and other styles