10/29/2009

Automated Essay Scoring Systems for Writers

As mentioned earlier, this study was designed to evaluate, within the constraints of the methods presented, the validity of computer-generated essay scores as substitutes for scores assigned by raters, given the score interpretation and use specified by users of such scores.

The principal objective of the study was to establish whether evidence exists supporting a parallelism between the automated scoring system's computational processes and raters' cognitive processes.
However, while a claim of construct equivalence of the two scoring processes hinges on this evidence, five other sources of evidence bear on the overall validity claim. Ultimately, supportive evidence is required jointly from all six sources to substantiate a validity claim for the specified interpretations and uses of the scores.

First, an overview of the e-rater automated essay scoring system, the GRE Writing Assessment, and the study sample is provided. Then, the procedures followed are presented in six phases, paralleling the six aspects of Messick's (1995) unified construct validity concept.

Briefly, content relevance and representativeness of the e-rater models were gauged by the comprehensiveness with which the factor structure identified for each e-rater model appeared to represent the constructs of writing measured by the GRE Writing Assessment.

The factors guided the construction of factor-specific scoring rubrics and corresponding factor-specific e-rater submodels. Reflectivity of the task and domain structures was appraised from expert reviews of the rubrics and submodels. Of particular importance were experts' judgments of the likelihood that the rubrics would prompt rater engagement in desired cognitive processes, not merely in counting, and the adequacy with which the e-rater submodels reflected the factors they subsumed.
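
The study itself reports no analysis code, but the factor-identification step it describes can be illustrated. The following Python sketch, with entirely hypothetical feature data standing in for e-rater's automated counts, fits an exploratory factor analysis of the kind that could surface such a structure:

```python
# Hypothetical sketch: exploratory factor analysis of e-rater-style
# essay features. Feature data are simulated for illustration only.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# Rows = essays, columns = automated features (e.g., counts of
# discourse cues, subordinate clauses, word-frequency indices).
features = rng.normal(size=(500, 8))

fa = FactorAnalysis(n_components=3, random_state=0)
fa.fit(features)

# Loadings show how strongly each feature aligns with each factor;
# in the study's terms, the factors would then be matched against
# the writing constructs the GRE Writing Assessment claims to measure.
print(np.round(fa.components_, 2))  # shape: (3 factors, 8 features)
```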

The degree of rater engagement in substantive theories and process models was evidenced from the contents of "think-aloud" protocols transcribed from verbalized mock essay scoring sessions and from the strength of correlations of factor-specific with holistic scores of essays scored by raters and by e-rater.
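
The correlational half of this evidence reduces to simple bivariate correlations. A minimal sketch, assuming fabricated factor-specific and holistic score arrays:

```python
# Minimal sketch: correlate factor-specific scores with holistic
# scores. Both score arrays are fabricated for illustration only.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
holistic = rng.integers(1, 7, size=200).astype(float)  # 6-point scale
# One factor-specific score per essay, loosely tied to the holistic score.
factor_specific = holistic + rng.normal(scale=1.0, size=200)

r, p = pearsonr(factor_specific, holistic)
print(f"r = {r:.2f}, p = {p:.3g}")
```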

The degree of e-rater score convergent and discriminant correlations with external variables was evidenced from the magnitudes of correlations of e-rater scores generated across tasks within the GRE Writing Assessment program and of e-rater scores generated for essays written for a different essay test program.
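
As a hedged illustration of such a convergent/discriminant check, the sketch below simulates score columns for two GRE writing tasks and an external essay program (all labels and data are invented) and inspects the resulting correlation matrix:

```python
# Illustrative sketch: convergent vs. discriminant correlations.
# Columns simulate e-rater scores on two GRE writing tasks and on
# essays from a different testing program; all data are synthetic.
import numpy as np

rng = np.random.default_rng(2)
n = 300
ability = rng.normal(size=n)                   # shared writing ability
gre_task1 = ability + rng.normal(scale=0.5, size=n)
gre_task2 = ability + rng.normal(scale=0.5, size=n)
other_prog = 0.6 * ability + rng.normal(scale=0.8, size=n)

scores = np.column_stack([gre_task1, gre_task2, other_prog])
labels = ["GRE task 1", "GRE task 2", "other program"]
corr = np.corrcoef(scores, rowvar=False)

# In a multitrait-style reading, the pattern of these correlations
# would be compared against expectations about shared and distinct
# constructs across the tasks and programs.
for i, a in enumerate(labels):
    for j in range(i + 1, len(labels)):
        print(f"{a} vs {labels[j]}: r = {corr[i, j]:.2f}")
```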

Similarly, evidence of the generalizability and boundaries of score meaning was manifested from the magnitudes of correlations of generic-model with prompt-specific model e-rater scores and with rater-assigned scores across all six prompts used in the study, as well as from the consistency e-rater scores exhibited when a key distributional assumption underlying the scoring models was changed.
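
A per-prompt comparison of generic-model with prompt-specific model scores might look like the following sketch; the six prompt labels and all scores are simulated:

```python
# Sketch: per-prompt agreement between generic-model and
# prompt-specific-model e-rater scores. All scores are simulated;
# the study used six GRE prompts.
import numpy as np

rng = np.random.default_rng(3)
prompts = [f"prompt_{k}" for k in range(1, 7)]

for prompt in prompts:
    true_quality = rng.normal(size=150)
    generic = true_quality + rng.normal(scale=0.4, size=150)
    specific = true_quality + rng.normal(scale=0.3, size=150)
    r = np.corrcoef(generic, specific)[0, 1]
    print(f"{prompt}: generic vs prompt-specific r = {r:.2f}")
```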

Finally, consequences as validity evidence were elucidated by a stratified random survey of graduate program admissions decision-makers that identified actual and potential interpretations and uses of partially computer-generated essay scores.
