Dissertations / Theses on the topic 'Tests and Measurements'

1

Turlapati, Radhika. "Leveraging test measurements into proposing additional domain tests." [Johnson City, Tenn. : East Tennessee State University], 2001. http://etd-submit.etsu.edu/etd/theses/available/etd-0404101-011957/unrestricted/TurlapatiR0430.pdf.

2

Williams, A. Lynn. "Tests and Measurements in Speech-Language Pathology." Digital Commons @ East Tennessee State University, 2001. https://www.amzn.com/0750670037.

Abstract:
Book Summary: This clinical reference provides an in-depth look at the tests and measurements used by speech-language pathologists for patient assessment. Rather than being merely a compendium of common tests, this text includes the theoretical framework behind each type of assessment as well as procedural and referential information. Topics covered include differential diagnosis of communication disorders, scoring conventions of different test instruments, and language assessment instruments for both children and adults.
3

Haick, Angela. "Testing irregularities : are we getting accurate scores? /." La Verne, Calif. : University of La Verne, 2003. http://0-wwwlib.umi.com.garfield.ulv.edu/dissertations/fullcit/3076863.

4

Gao, Hua. "The effect of different anchor tests on the accuracy of test equating for test adaptation." Ohio : Ohio University, 2004. http://www.ohiolink.edu/etd/view.cgi?ohiou1089917802.

5

Rowan, Barbara Ellen. "Comparability of paper-and-pencil and computer-based cognitive and non-cognitive measures in a low-stakes testing environment /." Full-text of dissertation on the Internet (776.77 KB), 2010. http://www.lib.jmu.edu/general/etd/2010/doctorate/rowanbe/rowanbe_doctorate_04-02-2010.pdf.

6

Lai, Chan-pong. "Item bias in the 2nd IEA mathematics study." Click to view the E-thesis via HKUTO, 1986. http://sunzi.lib.hku.hk/HKUTO/record/B38626445.

7

Parr, Anita M. "TEACHER MADE TEST RELIABILITY: A COMPARISON OF TEST SCORES AND STUDENT STUDY HABITS FROM FRIDAY TO MONDAY IN A HIGH SCHOOL BIOLOGY CLASS IN MONROE COUNTY OHIO." Marietta College / OhioLINK, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=marietta1142864088.

8

Borkan, Bengu. "Effectiveness of mixed-mode survey designs for teachers using mail and web-based surveys." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1158597296.

9

Chan, Wai-fat. "An investigation into the effects of diagnostic assessment on students' learning : a case study of the effects of diagnostic assessment on secondary 4 students' learning of chemistry /." Hong Kong : University of Hong Kong, 1996. http://sunzi.lib.hku.hk/hkuto/record.jsp?B17601150.

10

Whitworth, Clifford K. "Equivalency of paper-pencil tests and computer-administered tests." Thesis, University of North Texas, 2001. https://digital.library.unt.edu/ark:/67531/metadc2741/.

Abstract:
Are computer-administered versions of a multiple choice paper-pencil test equivalent? This study determined whether there were any significant differences between taking a traditional pencil-paper test and taking the same test using a computer. The literature has shown that there are intervening variables that have caused differences when not controlled. To prove equivalency between test modes, scores have to have similar means, dispersions, and shapes; the rank order of the scores must also be similar. Four tests were given over the course of a 16-week semester. The sample was divided, half taking paper-pencil tests and half taking the same test administered by a computer. The mode of administration was switched with each test administration. The analysis showed that, when the intervening variables were controlled, the two modes of administration were equivalent. The analysis used a 2x4 ANOVA, which showed no difference between test modes, but showed that each test administration was significantly different. The Levene statistic was used to test whether dispersions were equivalent, and confidence intervals were established to test the kurtosis and skewness statistics. Finally, each of the test scores was transformed into its Normal Curve Equivalent so that Pearson's coefficient could be used to determine the equivalency of the rank orders.
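The last step of the abstract above (Normal Curve Equivalents followed by Pearson's coefficient) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the NCEs were computed, so the sketch assumes the common definition NCE = 50 + 21.06 z, where z is the normal deviate of a mid-rank percentile. The function names and the sample scores are hypothetical.

```python
from statistics import NormalDist, mean


def percentile_ranks(scores):
    """Mid-rank percentile of each score, as a proportion strictly between 0 and 1."""
    n = len(scores)
    ranks = []
    for x in scores:
        below = sum(1 for s in scores if s < x)
        ties = sum(1 for s in scores if s == x)  # includes the score itself
        ranks.append((below + 0.5 * ties) / n)
    return ranks


def normal_curve_equivalents(scores):
    """Assumed NCE definition: 50 + 21.06 * z, with z taken from the percentile rank."""
    std_normal = NormalDist()
    return [50 + 21.06 * std_normal.inv_cdf(p) for p in percentile_ranks(scores)]


def pearson_r(x, y):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical scores for the same five examinees under the two test modes.
paper = [10, 20, 30, 40, 50]
computer = [12, 25, 31, 47, 90]
r = pearson_r(normal_curve_equivalents(paper), normal_curve_equivalents(computer))
```

Because the NCEs depend only on rank, two score sets that order the examinees identically yield a correlation of 1 regardless of the raw score scales.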
11

Jackson, David R. "Tests of schemes to infer stratospheric temperature from satellite measurements." Thesis, University of Edinburgh, 1990. http://hdl.handle.net/1842/15089.

Abstract:
In this thesis we test a retrieval/analysis scheme for inferring stratospheric temperature from satellite observations of radiance. The scheme is similar to that used by the UK Meteorological Office. The retrievals are made by using a multiple linear regression model which regresses radiances against Planck function, whilst the analyses are made using a linear time/space interpolation method. In addition, we compare analyses made using time/space interpolation with analyses made using another analysis scheme which sequentially estimates Fourier coefficients at fixed latitudes using a version of the Kalman Filter. Because of the lack of 'ground truth' observations in the stratosphere, the schemes are tested in simulation experiments. Preliminary tests of the time/space interpolation and sequential estimation analysis schemes are made using idealised radiance fields which resemble observations made by a satellite radiometer in the northern hemisphere winter stratosphere. The regression retrieval scheme and the two analysis schemes are also tested in a more sophisticated experiment in which the 'true' atmosphere is represented by an atmosphere simulated by a numerical model. Simulated observations are calculated by computing the radiance that would be observed from the 'true' atmosphere by a satellite instrument. The radiances are then retrieved and analysed and the resultant analyses compared with the corresponding 'true' fields. Tests are made using output from a day when a sudden warming was present. The retrieval scheme is seen to perform less well within the area of the sudden warming than outside it. However, this may be expected as the vertical structure within the sudden warming is generally too small to be resolved by a satellite instrument. The analysis scheme analyses the stratospheric field well, even in the area of a sudden warming.
These results, and results from preliminary tests made using idealised radiance fields, suggest that the analysis is generally of better quality when the distance radius used to select observations for the scheme is small. Results of tests of the sequential estimation scheme reveal that this method also produces satisfactory analyses of idealised radiance and model fields. Constraints of time prevented more rigorous testing of the scheme, but suggestions for further research are given.
12

Hart, Raymond C. "A framework for psychometric analysis of student performance across time an illustration with National Educational Longitudinal Study data /." [Kent, Ohio] : Kent State University, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=kent1177960052.

Abstract:
Thesis (Ph.D.)--Kent State University, 2007.
Title from PDF t.p. (viewed June 1, 2007). Advisors: Dimiter Dimitrov, Shawn Fitzgerald. Keywords: Item response theory, true score theory, reliability, measurement of change, NELS:88. Includes bibliographical references (p. 58-62).
13

Januário, Francisco Maria. "Investigating and improving assessment practices in Physics in secondary schools in Mozambique." Pretoria : [s.n.], 2008. http://upetd.up.ac.za/thesis/available/etd-09252008-161339/.

14

Simpson, Scott. "A study of the relationship between test-taking skills, time used on tests, and test scores." Theological Research Exchange Network (TREN), access this title online, 2006. http://dx.doi.org/10.2986/tren.088-0145.

15

O'Loughlin, Kieran John. "The comparability of direct and semi-direct speaking tests : a case study /." Connect to thesis, 1997. http://eprints.unimelb.edu.au/archive/00000378.

16

Fortier, Hélène. "AFM Indentation Measurements and Viability Tests on Drug Treated Leukemia Cells." Thesis, Université d'Ottawa / University of Ottawa, 2016. http://hdl.handle.net/10393/34345.

Abstract:
A significant body of literature has reported strategies and techniques to assess the mechanical properties of biological samples such as proteins, cellular and tissue systems. Atomic force microscopy has been used to detect elasticity changes of cancer cells. However, only a few studies have provided a detailed and complete protocol of the experimental procedures and data analysis methods for non-adherent blood cancer cells. In this work, the elasticity of NB4 cells derived from acute promyelocytic leukemia (APL) was probed by AFM indentation measurements to investigate the effects of the disease on cellular biomechanics. Understanding how leukemia influences the nanomechanical properties of cells is expected to provide a better understanding of the cellular mechanisms associated with cancer, and promises to become a valuable new tool for cancer detection and staging. In this context, the quantification of the mechanical properties of APL cells requires a systematic and optimized approach for data collection and analysis, in order to generate reproducible and comparative data. This Thesis elucidates the automated data analysis process that integrates programming, force curve collection and analysis optimization to assess variations of cell elasticity in response to processing criteria. A processing algorithm was developed by using the IGOR Pro software to automatically analyze large numbers of AFM data sets in an efficient and accurate manner. In fact, since the analysis involves multiple steps that must be repeated for many individual cells, an automated and unbiased processing approach is essential to precisely determine cell elasticity. Different fitting models for extracting the Young's modulus have been systematically applied to validate the process, and the best fitting criteria, such as the contact point location and indentation length, have been determined in order to obtain consistent results.
The designed automated processing code described in this Thesis was used to correlate alterations in cellular biomechanics of cancer cells as they undergo drug treatments. In order to fully assess drug effects on NB4 cells, viability assays were first performed using Trypan Blue staining for primary insights before initiating thorough microplate fluorescence intensity readings using a LIVE/DEAD viability kit involving ethidium and calcein AM labelling components. From 0 to 24 h after treatment using 30 µM arsenic trioxide, relative live cell populations increased until 36 h. From 0 to 12 h post-treatment, relative populations of dead cells increased until 24 h post-treatment. Furthermore, a drastic drop in dead cell count has been observed between 12 and 24 h. With respect to cell mechanics, trapping of the non-adherent NB4 cells within fabricated SU8-10 microwell arrays allowed consistent AFM indentation measurements up to 48 h after treatment. Results revealed an increase in cell elasticity up to 12 h post-treatment and a drastic decrease between 12 and 24 h. Furthermore, arsenic trioxide drug-induced alterations in elasticity of NB4 cells can be correlated to the cell viability tests. In addition to these indentation and viability testing approaches, morphological appearances were monitored, in order to track the apoptosis process of the affected cells. Relationships found between viability and elasticity assays in conjunction with morphology alterations revealed distinct stages of apoptosis throughout treatment. 24 h after initial treatment, most cells were observed to have burst or displayed obvious blebbing. These relations between different measurement methods may reveal a potential drug screening approach for understanding the specific physical and biological effects of drugs on the cancer cells.
17

Emery, Kristine Louise. "Testing in the schools individualized intelligence tests and curriculum based measurements /." Online version, 2003. http://www.uwstout.edu/lib/thesis/2003/2003emeryk.pdf.

18

Fullilove, John Pope III. "Examining oral English proficiency some factors affecting rater reliability in the use of English oral examination /." Click to view the E-thesis via HKUTO, 1992. http://sunzi.lib.hku.hk/hkuto/record/B4389334X.

19

Tang, Kim-chow Catherine. "Effects of different assessment procedures on tertiary students' approaches to studying /." [Hong Kong : University of Hong Kong], 1991. http://sunzi.lib.hku.hk/hkuto/record.jsp?B1300945X.

20

Pour, Robert L. "Race, gender and omissions on standard achievement tests." Diss., Virginia Tech, 1991. http://hdl.handle.net/10919/39871.

21

Clifton, Karen S. "The testing effect using retrival [sic] practice in the classroom /." Huntington, WV : [Marshall University Libraries], 2005. http://www.marshall.edu/etd/descript.asp?ref=561.

22

Noonan, Brian W. "The effect of test length, IRT model, type of aberrance, and level of aberrance on the distribution and effectiveness of three appropriateness indices." Thesis, University of Ottawa (Canada), 1990. http://hdl.handle.net/10393/5594.

Abstract:
There were two basic purposes for this study. The first purpose was to investigate the characteristics of the distributions of Lz, ECIZ4, and W3 for non-aberrant response patterns in combinations of test lengths (40 items and 80 items) and IRT model (the 2PLM and the 3PLM). The second purpose was to investigate the effectiveness of the three indices in twenty-four combinations of two test lengths, two IRT models, two types of aberrance, and three levels of aberrance. In order to investigate the distributions of appropriateness indices in non-aberrant response patterns, data were generated by computer to simulate various measurement conditions. Item parameters were generated within specified ranges to produce similar tests for the two test lengths and two IRT models. Simulated examinees were generated from the normal (0,1) distribution. Two thousand non-aberrant response vectors were generated for each of four conditions, test length by IRT model. The three appropriateness indices, Lz, ECIZ4, and W3, were calculated for each examinee. This procedure was replicated fifty times for each of the four combinations of test length and IRT model. Of the three indices, ECIZ4 produced the most stable distributions over replications. To examine the effect of test length and IRT model on characteristics of the distributions of the indices, the mean, standard deviation, skewness, and kurtosis were computed for each index in each of the combinations of test length and IRT model over fifty replications. There were no significant effects for either test length or IRT model on the means of the three indices. Based on skewness and kurtosis, the distributions of ECIZ4 most closely approximated normality, while the distribution of W3 was least normal. To establish false positive rates, the tails of the distributions of each index were then examined at P$\sb $, P$\sb $, P$\sb $, and P$\sb{25}$ for each of the four conditions.
Of the three indices ECIZ4 seemed least affected and W3 most affected by test length, IRT model, and the interaction of test length and IRT model. To investigate the effectiveness of the indices, aberrant response patterns were generated for the twenty-four combinations of the four variables (2 test lengths x 2 models x 2 types of aberrance x 3 levels of aberrance). Four thousand simulated examinees were generated for each of the twenty-four combinations and each index was computed for each examinee for each of the twenty-four combinations. The detection rates of the indices were then computed and compared for each index for each of the twenty-four conditions. Overall, the 80 item test produced somewhat better detection rates than the 40 item test and the 2PLM better rates than the 3PLM. Spuriously low scores tended to produce slightly higher detection rates than spuriously high scores under most conditions. Higher levels of aberrance tended to produce higher detection rates although for some conditions there was little difference between 15% and 30% aberrance. Lz and ECIZ4 tended to produce better detection rates than W3; however, no detection rates seemed to be as high as those reported in previous research. (Abstract shortened by UMI.)
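For readers unfamiliar with the Lz index discussed above, the following is a minimal sketch of the standardized log-likelihood person-fit statistic as it is commonly defined, computed under a two-parameter logistic model. It is not the thesis's own code. The item parameters, the ability value `theta`, and the response patterns are hypothetical, and the scaling constant D = 1.7 sometimes used in the 2PL is omitted by assumption.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model (no D = 1.7 scaling)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lz_index(responses, theta, items):
    """Standardized log-likelihood of a 0/1 response vector.

    responses: list of 0/1 item scores
    items: list of (a, b) discrimination/difficulty pairs, one per item
    """
    log_lik = 0.0   # observed log-likelihood l0
    expected = 0.0  # E[l0] given theta
    variance = 0.0  # Var[l0] given theta
    for u, (a, b) in zip(responses, items):
        p = p_2pl(theta, a, b)
        q = 1.0 - p
        log_lik += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    return (log_lik - expected) / math.sqrt(variance)


# Hypothetical five-item test, ordered from easy to hard, examinee at theta = 0.
items = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
consistent = lz_index([1, 1, 1, 0, 0], 0.0, items)  # easy items right, hard items wrong
aberrant = lz_index([0, 0, 1, 1, 1], 0.0, items)    # easy items wrong, hard items right
```

Large negative values flag response vectors that are unlikely given the examinee's ability, which is why detection studies such as the one above compare the index against a lower-tail cutoff.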
23

Boulet, John R. "A Monte Carlo comparison of the Type I error rates of the likelihood ratio chi-square test statistic and Hotelling's two-sample T2 on testing the differences between group means." Thesis, University of Ottawa (Canada), 1990. http://hdl.handle.net/10393/5708.

Abstract:
The present paper demonstrates how Structural Equation Modelling (SEM) can be used to formulate a test of the difference in means between groups on a number of dependent variables. A Monte Carlo study compared the Type I error rates of the Likelihood Ratio (LR) chi-square ($\chi^2$) statistic (SEM test criterion) and Hotelling's two-sample $T^2$ statistic (MANOVA test criterion) in detecting differences in means between two independent samples. Seventy-two conditions pertaining to average sample size ($(n_1 + n_2)/2$), extent of inequality of sample sizes ($n_1 : n_2$), number of variables ($p$), and degree of inequality of variance-covariance matrices ($\Sigma_1 : \Sigma_2$) were modelled. Empirical sampling distributions of the LR $\chi^2$ statistic and Hotelling's $T^2$ statistic consisted of 2000 samples drawn from multivariate normal parent populations. The actual proportions of values that exceeded the nominal levels are presented. The results indicated that, in terms of maintaining Type I error rates that were close to the nominal levels, the LR $\chi^2$ statistic and Hotelling's $T^2$ statistic were comparable when $\Sigma_1 = \Sigma_2$ and $(n_1 + n_2)/2 : p$ was relatively large (i.e., 30:1). However, when $\Sigma_1 = \Sigma_2$ and $(n_1 + n_2)/2 : p$ was small (i.e., 10:1), Hotelling's $T^2$ statistic was preferred. When $\Sigma_1 \neq \Sigma_2$, the LR $\chi^2$ statistic provided more appropriate Type I error rates under all of the simulated conditions. The results are related to earlier findings, and implications for the appropriate use of the SEM method of testing for group mean differences are noted.
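For reference, the standard textbook form of Hotelling's two-sample statistic named in the abstract above is given here. It is not taken from the thesis. The symbols $\bar{x}_1$, $\bar{x}_2$ (sample mean vectors), $S_1$, $S_2$ (sample covariance matrices), and $S_p$ (pooled covariance) are notation introduced for this note.

```latex
S_p = \frac{(n_1 - 1)\,S_1 + (n_2 - 1)\,S_2}{n_1 + n_2 - 2},
\qquad
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,
      (\bar{x}_1 - \bar{x}_2)^{\top} S_p^{-1} (\bar{x}_1 - \bar{x}_2),
\qquad
F = \frac{n_1 + n_2 - p - 1}{p\,(n_1 + n_2 - 2)}\, T^2 \sim F_{p,\; n_1 + n_2 - p - 1}.
```

The $F$ reference distribution holds under multivariate normality and $\Sigma_1 = \Sigma_2$. The pooling step is where the equal-covariance assumption enters, so it is the $\Sigma_1 \neq \Sigma_2$ conditions in the study that put this assumption under stress.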
24

Blais, Christine Lorraine. "Problem-solving characteristics of relative novices and experts within an intermediate range of expertise in linear kinematics." Thesis, University of Ottawa (Canada), 1990. http://hdl.handle.net/10393/5756.

Abstract:
Within the context of this study, expertise is used to describe the range of skills (a continuum) which lies between those of novice and expert. Although some of these expert-novice differences have been identified, what is less understood is how an individual becomes an expert: the transition from novice to expert. As the study tests a specific hypothesis and seeks information related to a specific objective, it has both confirmatory and exploratory components. The independent variables were context, level of expertise, and Problem Type, and the dependent variables were solution time and solution patterns. There were two categories of context (familiar and unfamiliar), two levels of expertise (novice and expert), and two Problem Types (simultaneous and successive movement). Solution time was analyzed within a confirmatory framework and solution patterns within an exploratory framework. An information-processing approach to problem-solving was used. From 108 university students an inventory of contexts was compiled to produce familiar and unfamiliar isomorphic problems. The level of expertise of a second group of 57 subjects was based on educational background and a produced Concept Map. From this process, two intermediate groups of subjects were identified as relative experts or novices. Each subject was presented with eight isomorphic problems, four in familiar and four in unfamiliar contexts. The subjects were presented with one of two Problem Types reflecting Simultaneous or Successive movements as defined by Piaget. The problem solutions were recorded using the technique developed by Ericsson and Simon (1984), were divided into 5-second intervals, and then evaluated using a Coding Grid developed for this study. Thus, the data submitted for analysis were based on a total of 224 problems.
While the subjects in this study did represent two distinct levels of expertise, they did not evidence those characteristics associated with the extremes of the expert-novice continuum. There were no significant differences between experts and novices in their problem solution times, but the relative expert subjects did demonstrate some of the 'traditional' expert traits. In particular, experts evidenced an improved ability to recognize key information and, thereby, improve the accuracy of their performance. The more expert problem solver also used more conceptual, as distinct from computational, types of strategy. Overall, while there were no significant differences in solution time, many expert-novice distinctions arose when examining the processes whereby these solutions were achieved. In particular, while the experts tended to show an analytical approach, the novices were more speculative.
25

Malara, Eric. "Étude des commentaires rétroactifs par les pairs dans le contexte de l'évaluation formative." Thesis, University of Ottawa (Canada), 2002. http://hdl.handle.net/10393/6384.

Abstract:
Current trends in education increasingly rely on teaching approaches that give students greater responsibility for managing their own learning. Peer assessment is one such approach because, among its many advantages, it encourages a genuine exchange of ideas. Numerous studies have validated peer assessment using numerical or letter grades. In their teaching careers, however, student teachers will give not only grades but also comments. The mechanisms by which this form of assessment is applied, and its nature, remain to be explored. This research aims to examine in greater depth the specific characteristics of the feedback comments provided by peers in the context of a formative assessment of the teaching of a simulated lesson. More specifically, the research question addressed in this study is the following: in a context of formative assessment carried out by peers, what will be the nature of the student teachers' comments in terms of relevance, precision, direction, and number? (Abstract shortened by UMI.)
26

Cole, Gary. "A Monte Carlo study of the effects of four factors on the effectiveness of the LZ and ECIZ4 appropriateness indices." Thesis, University of Ottawa (Canada), 1994. http://hdl.handle.net/10393/6538.

Abstract:
While a test score may be valid for a group, there may sometimes be reason to suspect its validity for an individual. Unusual examinee response patterns may indicate that the test may be invalid for the individual, and quantitative measures called appropriateness indices have been developed to detect these unusual patterns. For a number of reasons, Lz and ECIZ4 have so far proven to be two of the most useful of these indices. There were three purposes for this study. The first purpose was to investigate the effects of four variables on the cutoff values of the indices: the range of the distribution of the b parameter (Diff), the level of the a parameter (Disc), IRT model (Model), and sample size used to estimate item parameters (Sampsiz). The second purpose was to investigate the effects of these same variables on the detection rates for response vectors that were made spuriously high (i.e., high aberrance) and for response vectors that were made spuriously low (i.e., low aberrance). The third purpose was to determine the extent to which detection rates obtained by using cutoff values from the standard normal distribution were similar to those obtained by using cutoff values obtained by simulating non-aberrant response vectors. Two levels were set for each of the four variables. For Diff, a broad and a narrow range of the b parameter were used. For Disc, a high and a low level for the a parameter of the test items were used. For Model, the 2PL and 3PL models were used. For Sampsiz, sample sizes of 1000 and 2500 were used to estimate item parameters. For each of the 16 combinations of these variables, non-aberrant as well as aberrant response vectors were simulated for a 60-item test. For the aberrant response vectors, both high and low aberrance were created by modifying ten of the test items.
Detection rates were obtained at the .01, .05, and .10 false-positive rates using cutoff values based on the distribution of the non-aberrant response vectors and using cutoff values based on a standard normal distribution. The simulation of each combination of conditions was replicated 90 times. (Abstract shortened by UMI.)
27

Jennings, Martha. "The robustness of validity and efficiency of the one-sample t test in the presence of normal contamination." Thesis, University of Ottawa (Canada), 1994. http://hdl.handle.net/10393/6629.

Abstract:
The performance of parametric tests given data which are essentially normal but contain outliers is largely unknown. In this Monte Carlo study the robustness of validity and efficiency for the one-sample location problem are investigated. The Type I error rate and power of the one-sample t test given a normal underlying population are compared with the performance of this test given a systematic range of outlier contamination in the underlying population. Sample sizes of 8, 16, 32, 64, and 128 are included in the design. The robustness of validity results are explored using three sets of regression models. The first set of models is constructed using the parameters of the contamination model and is intended to inform the social science methodologist. The second set of models is constructed using skewness and kurtosis values. A third set of models is developed using an index of contamination proposed by Zumbo (1993). This set of models has practical relevance to the data analyst confronted with outlier contaminated data. Robustness of efficiency results are expressed using both power curves and a proposed fairly stringent criterion for power. In general, the results indicate that the one-sample t test demonstrates fairly stringent robustness of validity for all the symmetric contamination explored. When contamination is asymmetric the Type I error rate becomes inflated as the proportion of contamination increases. If robustness of validity is intact, power is not greatly affected when medium or large effect sizes are examined. This is not necessarily true for small effect sizes and the problems are further exacerbated when sample sizes are also small.
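The kind of Monte Carlo design described above can be sketched in a few lines. This is an illustration only, not the study's procedure: the abstract does not give the contamination model's parameters, so the mixing proportion `eps` and the outlier distribution N(`shift`, `scale`) are hypothetical, the sample size is fixed at 16 (one of the sizes listed), and the two-sided .05 critical value for 15 degrees of freedom is hard-coded.

```python
import random
from statistics import mean, stdev

N = 16                # one of the sample sizes listed in the abstract
T_CRIT_DF15 = 2.131   # two-sided .05 critical value of Student's t, 15 df


def draw(rng, eps, shift, scale):
    """Contaminated normal: N(shift, scale) with probability eps, else N(0, 1)."""
    if rng.random() < eps:
        return rng.gauss(shift, scale)
    return rng.gauss(0.0, 1.0)


def type_i_error_rate(eps, shift, scale, reps=4000, seed=1):
    """Share of samples in which a true null hypothesis is rejected at the .05 level."""
    rng = random.Random(seed)
    true_mean = eps * shift  # mean of the mixture, used as the null value
    rejections = 0
    for _ in range(reps):
        sample = [draw(rng, eps, shift, scale) for _ in range(N)]
        t = (mean(sample) - true_mean) / (stdev(sample) / N ** 0.5)
        if abs(t) > T_CRIT_DF15:
            rejections += 1
    return rejections / reps


clean_rate = type_i_error_rate(eps=0.0, shift=0.0, scale=1.0)         # no contamination
symmetric_rate = type_i_error_rate(eps=0.10, shift=0.0, scale=5.0)    # symmetric outliers
asymmetric_rate = type_i_error_rate(eps=0.10, shift=5.0, scale=1.0)   # one-sided outliers
```

Comparing the three rates against the nominal .05 is the robustness-of-validity question; a power study would repeat the loop with a null value that differs from the true mean.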
28

Audette, Sylvain. "Étude des qualités métrologiques d'un instrument d'observation des comportements déviants en classe d'éducation physique." Thesis, University of Ottawa (Canada), 1994. http://hdl.handle.net/10393/6699.

Abstract:
The aims of the study are to propose a conceptual approach suited to measuring the level of discipline in the classroom and to verify the measurement qualities of an instrument for observing deviant behaviours in physical education classes. The term "deviant behaviour" denotes a violation of a classroom rule. The instrument developed by Kennedy (1980), which uses direct observation of deviant behaviours followed by a teacher reaction, is the subject of the present study. The instrument is first translated and adapted into French and then subjected to reliability and validity analyses. After an intensive training period for the two coders, 20 class groups took part in the study. Analysis of the results shows good reliability of the instrument for the 22 behaviour categories, the five subcategories, and the session total. However, reliability computed as percentage agreement is low when errors in the number of students involved are taken into account. The concurrent validity index (0.86), calculated between the frequency of deviant behaviours and a per-session teacher rating, proves to be very good. (Abstract shortened by UMI.)
29

Kennedy, Michael. "The influence of sample size, effect size, and percentage of DIF items on the performance of the Mantel-Haenszel and logistic regression DIF identification procedures." Thesis, University of Ottawa (Canada), 1994. http://hdl.handle.net/10393/6884.

Abstract:
The frequent use of standardized tests for admission, advancement, and accreditation has increased public awareness of measurement issues, in particular, test and item bias. The logistic regression (LR) and Mantel-Haenszel (MH) procedures are relatively new methods of detecting item bias or differential item functioning (DIF) in tests. In only a few studies has the performance of these two procedures been compared. In the present study, sample size, effect size, and percentage of DIF items in the test were manipulated in order to compare detection rates of uniform DIF by the LR and MH procedures. Simulated data, with known amounts of DIF, were used to evaluate the effects of these variables on DIF detection rates. In detecting uniform DIF, the LR procedure had a slight advantage over the MH procedure at the cost of increased false positive rates. P-value difference was definitely a more accurate measure of the amount of DIF than b value difference. (Abstract shortened by UMI.)
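As a rough illustration of the Mantel-Haenszel procedure compared above, the sketch below computes the MH common odds ratio across matched score levels and converts it to the ETS delta scale (MH D-DIF = -2.35 ln alpha). It is not the study's implementation; the function name and the 2x2 counts are hypothetical, and the associated chi-square significance test is left out.

```python
import math


def mantel_haenszel_dif(tables):
    """Return (alpha_MH, MH D-DIF) for one studied item.

    tables: one (A, B, C, D) tuple per matched total-score level, where
      A = reference group correct, B = reference group incorrect,
      C = focal group correct,     D = focal group incorrect.
    Assumes at least one level has both B > 0 and C > 0, so the
    denominator is nonzero.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        numerator += a * d / n
        denominator += b * c / n
    alpha = numerator / denominator
    return alpha, -2.35 * math.log(alpha)


# Hypothetical counts at two score levels.
no_dif = [(30, 10, 30, 10), (20, 20, 20, 20)]      # groups perform identically
uniform_dif = [(30, 10, 20, 20), (36, 4, 30, 10)]  # item favours the reference group

alpha_null, delta_null = mantel_haenszel_dif(no_dif)
alpha_dif, delta_dif = mantel_haenszel_dif(uniform_dif)
```

An alpha of 1 (delta of 0) indicates no uniform DIF, while alpha above 1 gives a negative delta, meaning the item is harder for the focal group than for matched members of the reference group.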
30

Ibrahim, Abdul K. "Distribution and power of selected item bias indices: A Monte Carlo study." Thesis, University of Ottawa (Canada), 1992. http://hdl.handle.net/10393/7831.

Abstract:
This study examines the following DIF procedures: Transformed Item Difficulty (TID), Full Chi-Square, Mantel-Haenszel chi-square, Mantel-Haenszel delta, Logistic Regression, SOS2, SOS4, and Lord's chi-square, under three sample sizes, two test lengths, four cases of item discrimination arrangement, and three item difficulty levels. The study is in two parts. The first part examines the distributions of the indices under null (no bias) conditions. The second part deals with the power of the procedures to detect known bias in simulated test data. Agreements among procedures are also addressed. Lord's chi-square certainly appears to perform very well. Its detection rates were very good, and its percentiles were not affected by discrimination level or test length. In retrospect, one would like to know how well it might do at smaller sample sizes. When the tabled values were used, it performed equally well in detecting bias and improved in reducing false positive rates. Of the other indices, the Mantel-Haenszel and the logistic regression indices seemed the best. Camilli chi-square had a number of problems. Its tabled values were not at all useful for detection of bias. The TID was somewhat better but does not have a significance test associated with it. One would need to rely on baseline studies if one were to use it. For uniform bias, either Mantel-Haenszel chi-square or logistic regression would be recommended, while for nonuniform bias logistic regression would be appropriate. It is interesting to note that Lord's chi-square was effective for detecting either kind of bias. We have been told that sample size is related to chi-square values. For each of the chi-square indices the observed values were considerably lower than tabled values. Of course, these were conditions where no bias was present except that which might be randomly induced in data generation.
Perhaps it is in those instances where bias is truly present that larger sample sizes allow us to identify biased items more easily. Certainly the proportion of biased items detected was greater for large sample sizes for Camilli chi-square, Mantel-Haenszel chi-square, and logistic regression chi-squares.
31

Fournier, Charles. "Étude corrélationnelle des liens entre l'anticipation, la préparation et l'autoévaluation, et le résultat à un examen de rendement scolaire." Thesis, University of Ottawa (Canada), 1991. http://hdl.handle.net/10393/7843.

32

De Champlain, André F. "Assessing test dimensionality using two approximate chi-square statistics." Thesis, University of Ottawa (Canada), 1992. http://hdl.handle.net/10393/7848.

33

Brown, Paulette C. "An empirical study of the consistency of differential item functioning detection." Thesis, University of Ottawa (Canada), 1992. http://hdl.handle.net/10393/7928.

Abstract:
Total test scores of examinees on any given standardized test are used to provide reliable and objective information regarding the overall performance of the test takers. When the probability of successfully responding to a test item is not the same for examinees at the same ability levels, but from different groups, the item functions differentially in favour of one group over the other group. This type of problem, defined as differential item functioning (DIF), creates a disadvantage for members of certain subgroups of test takers. Test items need to be accurate and valid measures for all groups because test results may be used to make significant decisions which may have an impact on the future opportunities available to test takers. Thus, DIF is an issue of concern in the field of educational measurement. The purpose of this study was to investigate how well the Mantel-Haenszel (MH) and logistic regression (LR) procedures perform in the identification of items that function differentially across gender groups and regional groups. Research questions to be answered by this study were concerned with three issues: (1) the detection rates for DIF items and items which did not exhibit DIF, (2) the agreement for the MH and LR methods in the detection of DIF items, and (3) the effectiveness of these indices across sample size and over replications. (Abstract shortened by UMI.)
34

Hurley, Noel P. "Resource allocation and student achievement: A microlevel impact study of differential resource inputs on student achievement outcomes." Thesis, University of Ottawa (Canada), 1995. http://hdl.handle.net/10393/9724.

Abstract:
This study examined the relationships between resource allocation and student achievement using a modified version of a conceptual model designed by Bulcock (1989) within a general model proposed by Guthrie (1988). Five research questions were developed from a review of literature to investigate the relationship between microlevel student input variables and student output variables--both cognitive and affective. The mediating effects of the student perceptions of the quality of school life on student achievement outcomes were also examined. Multiple regression analyses were utilized and data were analyzed at both the individual and school levels. Models were used to investigate the indirect effects of the quality of school life on student achievement outcomes. Substantively meaningful relationships were identified between linguistic resources, language usage and reading outcomes; socioeconomic level, gender, linguistic resources, language usage, and mathematics achievement; gender, student attitudes, and student well-being. All grade eight Newfoundland students (10,146) were the subjects of the study. Participants in the study completed the Canadian Test of Basic Skills (CTBS) and the Bulcock Attitudinal Inventory (BAI). Females scored higher than males on every test of the CTBS and also had more favourable attitudes towards school as measured using the BAI. Urban students outperformed rural students by the equivalent of nearly one year on the CTBS scores. A variable was constructed to test Bernstein's (1961) theory of language discontinuity. Bernstein contended that the further an individual's language code departed from the standard language code in use in that society, the greater the difficulty that person would have in learning. The language code variable was constructed using the language usage score from the CTBS to create a continuous variable. 
This language code variable proved to be highly explanatory in that it explained a large percentage of the variance in reading achievement outcomes and in mathematics achievement outcomes. The measure for students' perceptions toward their schooling experiences explained a large percentage of the variance of student well-being. Two other noteworthy findings in the present study arose from relationships identified between mathematics achievement and independent variables. A strong relationship was identified between mathematics achievement and socioeconomic level. In general, the higher one's socioeconomic level the greater were the outcome measures in mathematics achievement. Indirect effects analyses produced a significant relationship between gender and mathematics achievement that favoured girls. The construction of the educational production function in the present study proved to be an accurate model. The present study contributed to research in several ways. This is one of the first studies that has employed Quality of School Life indicators as developed in the BAI in an educational production function model. A second contribution was the inclusion of microlevel student linguistic resources as predictors of cognitive achievement outcomes. The third contribution of the present study was the high percentage of variance of cognitive achievement outcomes explained by the modified Bulcock model.
35

Boulet, John R. "The effect of nonnormal ability distributions on IRT parameter estimation using full-information and limited-information methods." Thesis, University of Ottawa (Canada), 1996. http://hdl.handle.net/10393/9725.

Abstract:
The relationship between nonlinear factor analysis (FA) models and Item Response Theory (IRT) models has been well established. Furthermore, in terms of modern measurement theory, the use of nonlinear FA models to describe item-trait relationships is currently becoming more popular and may offer some statistical and/or computational advantages in the analysis of item response data. Both limited-information (LI) and full-information (FI) nonlinear FA models can be used to derive the familiar IRT parameter estimates. In general, the two approaches (LI and FI) are distinguished simply by the extent to which they use information in the data matrix of examinee (subject) responses. The focus of this study was to compare the accuracy and efficiency of IRT parameter estimates (i.e., item difficulty, item discrimination) using both LI and FI nonlinear FA models. A Monte Carlo study was employed to investigate the precision and stability of parameter estimates in situations where (a) the manifest variables (test items) are binary and there is a single underlying normally distributed latent variable and (b) the manifest variables are binary and there is a single underlying latent variable that is not normally distributed. In addition, parameter recovery was explored under various simulated test lengths (number of items) and sample sizes (number of examinees). The results of the study suggest that, for conditions involving a normally distributed latent variable, the limited-information approach incorporated in the NOHARM computer program generally provides more accurate and stable parameter estimates than the theoretically preferred FI estimator incorporated in the TESTFACT computer program. For situations involving a nonnormal distribution of the latent trait, or ability, FI estimation provided a marginally better calibration of the 2-parameter logistic response model. 
Both estimators were, however, prone to producing item values that were outside of feasible ranges, resulting in poor goodness-of-fit of the estimates. Furthermore, based on the conditions modelled in the study, neither the sample size, the test length, nor the sample size/test length ratio was important in explaining between-program differences in the recovery of the item parameters.
36

Brez, Sharon. "Adult learners' perspectives on screening reading ability for patient teaching." Thesis, University of Ottawa (Canada), 1995. http://hdl.handle.net/10393/9879.

Abstract:
The expectation of greater individual responsibility for health promotion practices and decision making in hospitals is dependent upon knowledgeable consumers. The heavy reliance on printed material for both gathering and disseminating information in hospitals has led to recommendations that literacy screening tests be considered to enhance the efficacy of patient teaching interventions for the significant number of adults with low literacy skills. A qualitative case study design was used to investigate the response of adults with low literacy skills to literacy screening. Data were collected through in-depth interviews, including an experience using the Rapid Estimate of Adult Literacy in Medicine (REALM) word recognition tool. Analysis was achieved using a constant comparison technique. A conceptual model of response to screening was developed and compared to the Health Belief Model and Knox's Proficiency Theory of adult learning. While all participants supported the principle of screening in the context of the hospital, response to the REALM experience was variable. Factors found to influence responses to screening included perceived risks of illiteracy exposure, perceived risks of non-disclosure during hospitalization, and the attribution of characteristics to the hospital leading to its designation as a "special" place. Specific responses to the REALM were found to be further influenced by a set of individual historic factors. The results have led to several recommendations for health care professionals considering utilization of literacy screening instruments.
37

Patsula, Liane. "A comparison of item parameter estimates and ICCs produced with TESTGRAF and BILOG under different test lengths and sample sizes." Thesis, University of Ottawa (Canada), 1995. http://hdl.handle.net/10393/9889.

Abstract:
There are many procedures used to estimate IRT parameters; however, among the most popular techniques are those used in the LOGIST and BILOG computer programs. LOGIST requires large numbers of examinees and items (in the order of 1000 or more examinees and 40 or more items) for stable 3PL model parameter estimates. BILOG is a more recent estimation program and, in general, requires smaller numbers of examinees and items than LOGIST for stable 3PL model parameter estimates. It also has been found that, regardless of sample size and test length, BILOG estimates tend to be uniformly more accurate than, or at least as accurate as, LOGIST estimates. For this reason, BILOG is now used as the standard to which new estimation programs are compared. The purpose of this study was to examine the effects of varying sample size (N = 100, 250, 500, and 1000) and test length (20- and 40-item tests) on the accuracy and consistency of 3PL model item parameter estimates and ICCs obtained from TESTGRAF and BILOG. Overall, TESTGRAF seemed to perform better than or just as well as BILOG. Where large bias effect sizes existed, in all but one case, TESTGRAF was more accurate than BILOG. TESTGRAF was slightly less accurate than BILOG in estimating the $P(\theta)$ values at high ability levels. Where large efficiency effect sizes existed, in all but two cases, TESTGRAF was more consistent than BILOG. TESTGRAF was slightly less consistent than BILOG in estimating the a parameter with a sample size of 1000 and in estimating the c parameter at all sample sizes. (Abstract shortened by UMI.)
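For readers unfamiliar with the 3PL model named in this abstract, the item characteristic curve (ICC) can be sketched as below. This is a minimal illustration of the standard three-parameter logistic form, not code from the thesis or from TESTGRAF or BILOG. The function name `icc_3pl` and the example item parameters are hypothetical, and the optional scaling constant D (often set to 1.7 in the literature) is assumed to be 1 here.

```python
import math

def icc_3pl(theta, a, b, c, D=1.0):
    """Probability of a correct response under the 3PL model.

    P(theta) = c + (1 - c) / (1 + exp(-D * a * (theta - b)))
    a: discrimination, b: difficulty, c: pseudo-guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Hypothetical item, chosen only for illustration.
for theta in (-3.0, 0.5, 3.0):
    print(theta, round(icc_3pl(theta, a=1.2, b=0.5, c=0.2), 3))
```

At theta equal to b the curve sits halfway between c and 1, which is a quick sanity check on any implementation.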
38

Mâsse, Louise C. "A presentation and comparison of some new statistical techniques in the analysis of polytomous differential item functioning: A Monte Carlo investigation." Thesis, University of Ottawa (Canada), 1994. http://hdl.handle.net/10393/9904.

Abstract:
There is a need to develop and investigate methods which can assess the Item Response Differences (IRD) found in all the options of an item. In this study, such an investigation was referred to as Polytomous Differential Item Functioning (PDIF). The purpose of this study was to present and investigate the performance of four new approaches in the assessment of PDIF. The four approaches are a MANOVA (MCO) and a MANCOVA (MCA) approach applied to categorical dependent variables, a Polytomous Logistic Regression (PLR) approach, and an ANOVA analysis based on the item responses quantified by Dual Scaling (DS). In this study the effectiveness of these approaches (MCA, MCO, PLR, and DS) as well as the Log-Linear (LOG) approach of Mellenbergh (1982) was assessed under various conditions of test length, sample size, item difficulty, and the amount and location of PDIF. A two-parameter polytomous logistic regression model was used to generate the data. In this study, only uniform PDIF was introduced in the alternatives of the item. The type of PDIF simulated (i.e., uniform) in this study did not allow for a direct comparison of the nonuniform test of hypothesis between the Logistic (LOG and PLR) approaches and the MAN(C)OVA (MCA and MCO) approaches because the Logistic approaches test for a difference in logits while the MAN(C)OVA approaches test for a difference in proportions. It was shown in this study that varying the probability of choosing the alternatives resulted in uniform logit differences which translated not only into uniform differences in proportions but also into nonuniform differences in proportions. These differences affected the interpretation of the PDIF results because the test of nonuniform PDIF for the Logistic procedures corresponded to a valid test of the null hypothesis while the MAN(C)OVA results for nonuniform PDIF had to be adjusted in order to yield a test which approximated a true test of the null hypothesis.
The results of this study lend some optimism to the employment of the MCA and PLR approaches. (Abstract shortened by UMI.)
39

Chirchir, Andrew K. "The relationship between teacher training in measurement and classroom assessment procedures in Kenya's secondary schools." Thesis, University of Ottawa (Canada), 1995. http://hdl.handle.net/10393/9930.

Abstract:
The purpose of this study was to determine teacher use of measurement principles and the factors influencing this use in the assessment of student achievement in the Rift Valley Province (Kenya). Given that most of the assessment in the classroom consists of instruments developed by teachers, a first step in exploring the utility of measurement principles is to investigate the use of these principles in specific assessment areas. This could lead to the determination and the improvement of the fit between measurement training and teacher classroom assessment practices. The study was designed to provide information on teacher use of measurement principles by considering whether teachers had received training in educational measurement principles, how important they perceived these principles to be, and how often they used the principles in the assessment of student achievement. The study was also designed to determine factors influencing the use of measurement principles in schools. The results show that teachers have been trained in the principles of educational measurement. However, there is some indication that measurement training did not effectively address the assessment concerns of many classroom teachers. Teachers do not feel adequately prepared in test construction, marking, and the reporting of student assessment results. The results on the importance of measurement principles provide some clear indication that teachers attach much importance to the principles for test construction, test administration, marking, and the reporting of student assessment results. Teacher interviews revealed that teachers are overwhelmed by the demands associated with Kenya's 8-4-4 system of education. On the basis of the study findings, suggestions were made for improving teacher training in measurement and for further research. (Abstract shortened by UMI.)
40

Hadley, Patrick. "The performance of the Mantel-Haenszel and logistic regression dif detection procedures across sample size and effect size: A Monte Carlo study." Thesis, University of Ottawa (Canada), 1995. http://hdl.handle.net/10393/10019.

Abstract:
In recent years, public attention has become focused on the issue of test and item bias in standardized tests. Since the 1980s, the Mantel-Haenszel (Holland & Thayer, 1986) and Logistic Regression procedures (Swaminathan & Rogers, 1990) have been developed to detect item bias, or differential item functioning (DIF). In this study the effectiveness of the MH and LR procedures was compared under a variety of conditions, using simulated data. The ability of the MH and LR to detect DIF was tested at sample sizes of 100/100, 200/200, 400/400, 600/600, and 800/800. The simulated test had 66 items, the first 33 items with item discrimination ("a") set at 0.80, the second 33 items with "a" set at 1.20. The pseudo-guessing parameter ("c") was 0.15 for all items. The item difficulty ("b") parameter ranged from -2.00 to 2.00 in increments of 0.125 for the first 33 items, and again for the second 33 items. Both the MH and LRU detected DIF with a high degree of success whenever sample size was large (600 or more), especially when effect size, no matter how measured, was also large. The LRU outperformed the MH marginally under almost every condition of the study. However, the LRU also had a higher false-positive rate than the MH, a finding consistent with previous studies (Pang et al., 1994; Tian et al., 1994a, 1994b). Since the "a" and "b" parameters which underlie the computation of the three measures of effect size used in the study are not always determinable in data derived from real world test administrations, it may be that the $\Delta_{\mathrm{MH}}$ is the best available measure of effect size in real world test items. (Abstract shortened by UMI.)
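The $\Delta_{\mathrm{MH}}$ effect size mentioned in this abstract can be illustrated with a short sketch. It assumes the usual formulation: examinees are matched on total score, each score level contributes a 2x2 table of group by item response, the common odds ratio is pooled across levels, and the result is rescaled by -2.35 times its natural logarithm. The function name `mantel_haenszel` and the counts are hypothetical and are not taken from the thesis.

```python
import math

def mantel_haenszel(strata):
    """Return (alpha_MH, delta_MH) for one studied item.

    strata: iterable of (A, B, C, D) counts, one tuple per matched score level.
      A = reference group correct,  B = reference group incorrect,
      C = focal group correct,      D = focal group incorrect.
    """
    num = 0.0
    den = 0.0
    for A, B, C, D in strata:
        n = A + B + C + D
        if n == 0:
            continue  # an empty score level carries no information
        num += A * D / n
        den += B * C / n
    alpha = num / den                 # pooled (common) odds ratio
    delta = -2.35 * math.log(alpha)   # rescaled to the delta metric
    return alpha, delta

# Hypothetical counts at three score levels, for illustration only.
alpha, delta = mantel_haenszel([(30, 20, 20, 30), (40, 10, 30, 20), (45, 5, 40, 10)])
print(alpha, delta)
```

An odds ratio above 1 (a negative delta) indicates that the item favours the reference group among examinees of comparable total score. The sketch does not guard against a zero denominator, which can occur with very sparse tables.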
41

Tian, Fang. "The performance of the Mantel-Haenszel and logistic regression DIF identification procedures with real data." Thesis, University of Ottawa (Canada), 1994. http://hdl.handle.net/10393/10028.

Abstract:
Numerous statistical methods have been proposed for detecting differential item functioning (DIF). Among them, methods based on item response theory (IRT) are theoretically preferred but very complicated and expensive to implement. As an alternative, the Mantel-Haenszel (MH) procedure has emerged as one of the most popular procedures because of its ease of implementation, relatively small sample size requirement, and associated test of significance. In addition, it provides a measure of the amount and direction of DIF. However, the MH procedure is not designed for and therefore not very effective in detecting nonuniform DIF. As an extension of the MH procedure, a more general DIF detection method, a logistic regression procedure (LR) has been shown to be powerful in detecting both uniform and nonuniform DIF. The purpose of this study is to examine the consistency of the MH and LR procedures and their agreement in the identification of DIF across sample size and criterion when using real examinee data. (Abstract shortened by UMI.)
42

Charland, Julie Marie Lise. "Étude des critères d'évaluation d'un stage d'enseignement." Thesis, University of Ottawa (Canada), 1996. http://hdl.handle.net/10393/10030.

Abstract:
This thesis deals with the evaluation criteria for teaching practicums in French-speaking Ontario. The literature review covers research on certain aspects of this evaluation as well as analyses of practicum evaluation forms used by various universities. The literature review shows that there is no consensus on the criteria for a successful teaching practicum. Consequently, the present study poses the following research questions: (1) Which evaluation criteria does the panel of experts consider essential to the success of a teaching practicum? (2) Which evaluation criteria does the panel of experts consider complementary to the success of a teaching practicum? To answer them, a Delphi technique was used. The objective was to reach a consensus among experts from four groups: associate teachers, student teachers, university supervisors, and administrators. The first two rounds allowed the participants in each group to suggest criteria and to judge their importance within their own group, while in the third and final round the criteria suggested by all groups were presented to all participants. At the end of the exercise, there was consensus on 32 criteria judged essential and four judged complementary. These criteria fall under four themes: teaching approach, classroom management, professional responsibilities, and interpersonal relations. Fourteen of the criteria identified by the experts had not been noted in previous research or in the forms consulted.
43

Bonadie, Jenelle N. "Evaluation of a mental skills training program implemented by an elementary classroom teacher." Thesis, University of Ottawa (Canada), 1995. http://hdl.handle.net/10393/10088.

Abstract:
The purpose of this study was to implement Orlick's (1993) mental skills/life skills training program and assess the extent to which children (1) learned to relax themselves at will, (2) successfully implemented stress control strategies, and (3) improved the frequency of their highlights (any simple pleasure, joy, or other positive experience that improves the quality of one's day). Two intact classes of grade 2 children took part in the study. One class served as the experimental group, while the other class served as the control group. The usual classroom teacher delivered the program 4 to 5 times per week, for 9 consecutive weeks. Each intervention session was 10 to 15 min in length. Significant positive effects were found in the experimental group with respect to the children's abilities to lower their heart rates at will and successfully implement relaxation and stress control strategies in their daily lives. Children in the experimental group also experienced a significant increase in the frequency of their highlights over the course of the intervention period. The results suggested that children in grade 2 can (1) learn to relax themselves at will as measured by heart rate, (2) successfully implement stress control strategies, and (3) improve the frequency of their highlight experiences when the usual classroom teacher delivers Orlick's (1993) mental skills training program.
44

Labrecque, Monique. "Adaptation et validation d'un modèle d'évaluation des besoins auprès des détenteurs d'enjeux concernés par un programme universitaire de formation initiale en sciences infirmières." Thesis, University of Ottawa (Canada), 1996. http://hdl.handle.net/10393/10104.

Abstract:
The aims of this research were, first, to adapt and validate a needs assessment model and, second, to identify the needs of the stakeholders concerned with a university program of initial training in nursing. The literature review showed that no author had attempted to identify needs on the basis of Guba and Lincoln's (1982) definition of the concept of need, which rests on the notions of discrepancy, significant benefit, and unsatisfactory state, or to verify its validity by ensuring that the presence of the conditions defined at the level of the target state is significantly beneficial to the subject and that their absence harms, indisposes, or constrains the subject. Moreover, most of the needs assessments carried out did not take into account the local context or the values held by the people concerned. The adaptation of Guba and Lincoln's (1982) needs assessment process presented in this research takes into consideration the values of the people concerned with a university program of initial training in nursing, while taking into account the local context in which the needs assessment took place. The present model comprises four main stages: (1) identification of the target domain or domains by the participants; (2) identification of potential needs by the participants; (3) verification of the authenticity of the potential needs by the participants; and (4) prioritization of the authentic needs by the participants.
The second stage, that is, the identification of potential needs, comprises four phases: (1) definition of the "minimum" level by the participants according to the idiographic and nomothetic perspectives; (2) determination of the size of the discrepancy between the target state and the current state that the participants consider significant; (3) identification of the target states in terms of performance and means; and (4) identification of the discrepancies. The triangulation used to validate the needs assessment model comprises four data collection methods: group interviews (focus groups), questionnaires, the hermeneutic approach, and magnitude estimation during the prioritization of needs. The participants came from five different groups: administrators, professors, representatives of hospital and community settings, and graduates of the program. The results obtained made it possible to retain most of the stages of the model. The author considers, however, that it is not necessary to ask the participants to define the "minimum" level when dealing with means, because these are in fact interventions that will have to be carried out to facilitate the acquisition and mastery of cognitive, motor, and affective skills by graduates in the future. The inclusion of means in the discrepancy identification phase is unnecessary for the same reasons. The phase of determining a significant size must be kept in the order presented in the model. Assigning this value after the discrepancy identification phase could allow future needs to be included in the category of basic needs and thereby avoid numerous modifications to the program as well as the addition of new resources.
The inability to distinguish individually the authentic needs from the potential needs may be attributable, in part, to the methodology used or to the respondents. A further validation of the model with a larger number of respondents, taking into account the suggestions concerning methodology, is therefore recommended. (Abstract shortened by UMI.)
45

Mycio-Mommers, Luba. "An investigation of the robustness of the Type I error rate of the t, M-W-W, Welch and Welch on ranks tests applied to reaction time populations with unequal variances." Thesis, University of Ottawa (Canada), 1995. http://hdl.handle.net/10393/10419.

Abstract:
This thesis investigates the robustness of the Type I error rate of the t test, the t test on ranks (Mann-Whitney-Wilcoxon), the Welch test and the Welch test on ranks applied to reaction time populations with unequal variances. Reaction time is often encountered as a dependent variable in educational and behavioural research. Reaction time data are typically skewed and are commonly modelled on a family of distributions known as the ex-Gaussian. A Monte Carlo study compared the robustness of Type I error rates of the four tests under study under 36 conditions wherein four factors were observed: total sample size ($N = 24$ and 72); ratio of sample sizes ($n_1:n_2$ = 1:1, 1:2, and 1:3); ratio of population variances (var1:var2 = 1:1, 1:2, 1:4, and 1:9); and negative and positive conditions. In each condition, 5,000 scores were generated from Miller's (1988) most skewed distribution that represented a boundary condition of reaction time data. The results indicated that the t test was the preferred option under all simulated conditions, except the negative condition. Furthermore, under the negative condition, all four tests produced liberal Type I errors.
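The ex-Gaussian family mentioned in this abstract is the distribution of the sum of an independent normal and an exponential variate, which is why it is right-skewed. The sketch below draws such scores with Python's standard library. The helper name `ex_gaussian_sample`, the seed, and the parameter values (mu, sigma, tau) are hypothetical and chosen only for illustration; they are not the parameters of Miller's (1988) distribution used in the thesis.

```python
import random
import statistics

def ex_gaussian_sample(n, mu, sigma, tau, rng):
    """Draw n ex-Gaussian scores: Normal(mu, sigma) + Exponential(mean=tau)."""
    # random.expovariate takes the rate, i.e. 1 / mean.
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau) for _ in range(n)]

rng = random.Random(12345)  # fixed seed so the sketch is reproducible
rts = ex_gaussian_sample(20000, mu=300.0, sigma=50.0, tau=300.0, rng=rng)

# Theoretical mean is mu + tau; theoretical variance is sigma**2 + tau**2.
print(statistics.mean(rts), statistics.pvariance(rts), statistics.median(rts))
```

In a Monte Carlo study of the kind described, two such samples with different variances would be drawn repeatedly and each test applied, with the proportion of rejections under a true null giving the empirical Type I error rate.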
46

Korir, Daniel K. "The effects of item difficulty and examinee ability on the distribution and effectiveness of LZ and ECIZ4 appropriateness indices." Thesis, University of Ottawa (Canada), 1992. http://hdl.handle.net/10393/10488.

Abstract:
Test scores are intended to provide an estimate of an examinee's ability. High ability examinees are expected to get few easy items wrong and low ability examinees are expected to get few difficult items right. But there are occasions when the test-taking behavior of some atypical examinees may be so unusual that their test scores cannot be regarded as an appropriate measure of ability. An atypical examinee can have a spuriously low or a spuriously high score. However, appropriateness indices can be used to identify examinees with potentially inaccurate total scores. Appropriateness indices provide quantitative measures of response pattern atypicality. These indices fall into two major categories: (a) IRT-based and (b) non-IRT-based indices. The dependency of non-IRT-based indices on the item difficulty order of a particular group has rendered them inadequate for detecting aberrant response patterns. IRT-based indices are group invariant. Researchers have investigated the effectiveness and the distributions of these indices under varying conditions of testing. However, some test situations might require efficient and accurate indices of appropriateness measurement for restricted samples. It might be helpful, for example, to accurately identify examinees with potential spuriously low scores falling just below the criterion of a minimum competency test; on a certification test, it might be helpful to concentrate on identifying examinees with spuriously high scores. Therefore, the effects of item difficulty and examinee ability distributions on the effectiveness and the distributional characteristics of LZ and ECIZ4 (IRT-based) appropriateness indices were investigated in this study.
To examine the effects of item difficulty and ability distributions on the distributional characteristics of LZ and ECIZ4, data were generated in nine combinations of item difficulty and ability distributions to simulate the responses of 2000 examinees to 60 test items according to the three-parameter model. Three uniform distributions of item difficulty were used. Items typical of diagnostic tests were generated in the interval -3.0 to +1.2; items typical of power tests were generated in the interval of -3.0 to +3.0; and items typical of certification and licencing tests were generated in the interval of -1.2 to +3.0. Three distributions of ability were used. Thetas typical of low, medium, and high ability examinees were generated to have normal distributions with the means of -1.2, 0.0, and +1.2 respectively and each with a standard deviation of 0.6. The mean, standard deviation, skewness, kurtosis, and the percentile estimates of LZ and ECIZ4 were significantly affected by the variations of item difficulty and ability distributions. The distributions of the two indices approximated a normal distribution when the ability estimates matched the item difficulty. Overall, the distributions of LZ approximated a normal distribution better than the distribution of ECIZ4. To examine the effectiveness of LZ and ECIZ4 in detecting aberrant response patterns, two samples, each consisting of 500 response patterns (for spuriously low and spuriously high) were generated for each of the nine combinations of item difficulty and ability distribution and subjected to spurious treatments. Twenty percent and 10% spuriously high scores were created by randomly selecting 20% or 10% of the original responses and changing incorrect answers to correct. Twenty percent and 10% spuriously low scores were created by randomly selecting 20% or 10% of the original responses and changing correct answers to incorrect. 
The percentile estimates obtained were used as cutoff points to classify response patterns as aberrant or non-aberrant. Spuriously low aberrant response patterns were easier to detect by the two indices under the low item difficulty and spuriously high aberrant response patterns were easier to detect under high item difficulty. At low (0.01 and 0.05) false positive rates, LZ had higher detection rates of spuriously high and spuriously low aberrant response patterns than ECIZ4 under the high item difficulty; and ECIZ4 had higher detection rates than LZ under the medium and under the low item difficulty. Twenty percent treatment samples were easier to detect by the two indices than the 10% treatment samples.
47

Ochieng, Charles M. O. "Examination of the distribution of the logistic regression and the Mantel-Haenszel statistics under different conditions of the null hypothesis: A Monte Carlo study." Thesis, University of Ottawa (Canada), 1992. http://hdl.handle.net/10393/10899.

Abstract:
Educators and practitioners have been striving for bias-free tests for the last few decades. As a result, several indices for bias detection have been developed, among which are the logistic regression and Mantel-Haenszel procedures. However, the effects of variables other than DIF on the performance of the logistic regression and Mantel-Haenszel indices have yet to be researched. The present study examines the effects of sample size, item difficulty, item discrimination, and ability distribution on the distributions and percentiles (P90 and P95) of logistic regression and Mantel-Haenszel statistics under the null hypothesis. Simulated data were used to evaluate the effects of the stated variables on the distributions of the logistic regression indices of uniform (LU) and nonuniform (LN) differential item functioning (DIF), or item bias. The same simulated data were used to evaluate the effects of the variables on the Mantel-Haenszel procedure (MH-delta and MH-CHISQ). Results of this study show that the logistic regression procedure has advantages over the MH procedures, taking into account the effects of the independent variables studied. This is evident from the fact that the distributions of the LN and LU indices are known and that the four independent variables had no significant effect on the LU index. However, the observed values were notably larger than expected values. Further research should be done to evaluate the effects of the stated variables and others, such as test length, and using data with a known amount of DIF. The generalizability of this study should be confirmed by replication of its findings. Evidently, variables other than DIF significantly influence the two procedures. (Abstract shortened by UMI.)
48

Halsall, Nancy Diane. "An investigation of the effectiveness of three judgmental techniques under two degrees of domain elaboration in establishing the item-domain congruence of criterion-referenced test items." Thesis, University of Ottawa (Canada), 1989. http://hdl.handle.net/10393/21094.

49

Boulé, Serge. "Utilisation du degré de certitude dans un contexte d'évaluation diagnostique critériée." Thesis, University of Ottawa (Canada), 2007. http://hdl.handle.net/10393/27814.

Abstract:
Research on the degree of certainty (DC) and its use originally developed in the context of norm-referenced certification assessments. In the context of assessment for learning, indicators and realism profiles are developed to determine how far a student can benefit from such feedback. In a diagnostic context, the degree of certainty serves to identify the items on which the student did not answer at random and unknowingly overestimated his or her abilities. These "critical events" call for remediation. This research studies the degree of realism while taking into account individual differences, the metric characteristics of the items, and the cognitive demands the items impose. The research questions are as follows: How, and to what extent, do the expression of the degree of certainty in the chosen answer, and consequently students' realism, vary with the taxonomic level of the items and their metric properties? How, and to what extent, are students' sex and performance level related to the expression of the degree of certainty in the chosen answer, and consequently to the student's realism? The target sample consists of 152 participants aged 18 and over. The test is a multiple-choice questionnaire comprising 40 items with 4 response options. In addition to answering the items, the student must express his or her DC in those answers. Our analysis leads to the conclusion that the better students perform, the more realistic they are about the quality of their answers. There do not appear to be marked differences between men and women. We nevertheless observed inequalities in the linearity of the distributions. The easier the item, the more realistic the students are. There is a probable relationship between an item's discrimination coefficient and realism.
The link between the taxonomic levels of the items and students' realism is unclear and deserves further study using a design built specifically for that purpose. Keywords: degree of certainty, regulation of learning, criterion-referenced test, diagnostic test, metacognition, partial knowledge.
50

Rizzo, Michelin Linda L. "Concept mapping in evaluation practice and theory, a synthesis of current empirical research." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape15/PQDD_0003/MQ36724.pdf.
