A Study of the Scoring Validity of Essay Rating in PETS Level 3

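Author: Zhao Haiyong (赵海永)
Supervisor: Xiu Xudong (修旭东)
University: Ludong University (鲁东大学)
Major: English Language and Literature
Keywords: writing assessment; scoring validity; Public English Test System (Level 3)
Classification number: H319
Type: Master's thesis
Year: 2007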

Abstract

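Writing assessment is an indispensable part of language testing. Many large-scale English tests, such as CET, TEM, and TOEFL, treat writing as an important component, and PETS is no exception. Because writing assessment is affected by many factors, however, its validity is difficult to guarantee, and scoring is one of the most important of these factors. As a large-scale test, PETS has received, and will continue to receive, growing attention, yet no study of the scoring validity of PETS writing has been reported so far.

The research question of this thesis is: to what extent is essay scoring in PETS Level 3 valid? To address it more fully, the thesis answers two specific questions: (1) To what extent do the raters who assigned the original scores agree with the raters who assigned the reference scores (the thesis's "standardized ratings") on the essays in PETS Level 3 in March 2004 and in March 2005, respectively? (2) To what extent are the original essay scores in PETS Level 3 in March 2004 and in March 2005 correlated?

The study draws on the PETS corpus. Essay samples were selected by systematic sampling and re-rated by several raters to obtain reference scores, which were compared with the original scores marked on the test papers. The resulting data were analyzed with correlation analysis and t-tests.

The results are as follows.

1) For the March 2004 administration, the original scores and the reference scores are significantly correlated (r = 0.869, P = 0.000 < 0.01), and their means do not differ significantly, although the mean of the original scores is higher than the mean of the reference scores. The two groups of raters therefore judged the essays consistently overall, but the original raters were somewhat more lenient.

2) For the March 2005 administration, the original scores and the reference scores are significantly correlated (r = 0.798, P = 0.000 < 0.01), but their means differ significantly at the 0.01 level. The two groups of raters therefore ranked the essays consistently overall, yet awarded significantly different average scores. The mean of the original scores is again higher than the mean of the reference scores, which indicates that the original raters were more lenient overall.

3) The mean and standard deviation of the original scores in March 2004 are lower than those in March 2005. The two sets of scores are not correlated, and their means differ significantly. This suggests that essay scoring in PETS Level 3 may not be equivalent across years.

The results also show that the writing task in March 2005 was not specific: it did not explain the writing procedure or the scoring method to candidates in clear detail, so some candidates could not accurately grasp what they were expected to write. In addition, the raters were unsure how to weight the components of the score.

The findings imply that, to keep scoring stable and reliable, the PETS test center needs not only a stable item bank but also a qualified, stable, and impartial team of raters. Drawing on research on CET and TEM, the test center should also strengthen item development so that test items are more scientific and more standardized.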
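The analysis pipeline described in the abstract (systematic sampling, re-rating by several raters, correlation, and a t-test) can be sketched roughly as below. This is a minimal illustration, not the thesis's actual procedure: the function names, the sampling interval, and every score in the example are hypothetical, and the abstract does not say which t-test variant was used, so a paired t statistic on the same scripts is assumed here.

```python
import math
import statistics


def systematic_sample(items, k, start=0):
    """Take every k-th item beginning at index `start` (systematic sampling)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return items[start::k]


def reference_scores(ratings_per_rater):
    """Average several raters' scores script by script.

    Averaging is an assumption; the abstract only says the reference
    scores come from re-rating by multiple raters.
    """
    return [statistics.mean(scores) for scores in zip(*ratings_per_rater)]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """t statistic for the mean of paired differences (df = n - 1)."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


# Hypothetical data: six sampled scripts, one original score each,
# plus scores from three re-raters (all values invented for illustration).
original = [12, 9, 16, 6, 11, 15]
rerated = [
    [11, 9, 14, 6, 10, 13],
    [10, 8, 13, 6, 11, 12],
    [12, 7, 15, 6, 9, 14],
]
reference = reference_scores(rerated)

r = pearson_r(original, reference)
t = paired_t(original, reference)
print(f"r = {r:.3f}, paired t = {t:.3f}, df = {len(original) - 1}")
```

A high r with a positive t, as in this toy data, mirrors the pattern the abstract reports: the two groups of raters rank the essays similarly, while the original scores run higher on average. Judging significance would additionally require comparing t against the t distribution with n - 1 degrees of freedom.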

Table of Contents

Acknowledgements
Abstract (English)
Abstract (Chinese)
List of Tables and Figures
List of Abbreviations
Introduction
Chapter One Literature Review
  1.1 Importance and Development of Language Testing
    1.1.1 The pre-scientific phase
    1.1.2 The psychometric-structuralist phase
    1.1.3 The psycholinguistic-sociolinguistic phase
  1.2 Validity and Validation
  1.3 Aspects of Validity Evidence
    1.3.1 Theory-based validity
    1.3.2 Context validity
    1.3.3 Scoring validity
    1.3.4 Criterion-related validity
    1.3.5 Consequential validity
  1.4 The Parameters of Scoring Validity
    1.4.1 Rating scale
    1.4.2 Rating procedure
    1.4.3 Raters
    1.4.4 Grading and awarding
  1.5 A Brief History of Writing Assessment
  1.6 Nature of Writing Assessment
  1.7 An Introduction to PETS3
  1.8 Research Questions
  1.9 Summary
Chapter Two Methodology
  2.1 Participants
  2.2 Instruments
  2.3 Procedures of Data Collection
    2.3.1 Selecting raters
    2.3.2 Rater training
    2.3.3 Rating
Chapter Three Results and Discussion
  3.1 Discussion on the Rating Reliability of Three Raters
  3.2 Discussion on the Relationship Between Original Ratings and Standardized Ratings on Essays in PETS3 in March 2004
    3.2.1 Discussion on the descriptive statistics of original ratings and standardized ratings on essays in PETS3 in March 2004
    3.2.2 Discussion on the correlation between original ratings and standardized ratings on essays in PETS3 in March 2004
    3.2.3 Discussion on the mean comparison between original ratings and standardized ratings on essays in PETS3 in March 2004
  3.3 Discussion on the Relationship Between Original Ratings and Standardized Ratings on Essays in PETS3 in March 2005
    3.3.1 Discussion on the descriptive statistics of original ratings and standardized ratings on essays in PETS3 in March 2005
    3.3.2 Discussion on the correlation between original ratings and standardized ratings on essays in PETS3 in March 2005
    3.3.3 Discussion on the mean comparison between original ratings and standardized ratings on essays in PETS3 in March 2005
    3.3.4 Discussion on mean comparison of essays in higher score groups
    3.3.5 Discussion on mean comparison of essays in lower score groups
    3.3.6 Discussion on mean comparison of essays in medium score groups
  3.4 Discussion on the Relationship Between Original Ratings on Essays in PETS3 in March 2004 and Those in March 2005
    3.4.1 Discussion on the descriptive statistics of ratings in March 2004 and those in March 2005
    3.4.2 Discussion on the correlation between ratings in March 2004 and those in March 2005
    3.4.3 Discussion on the mean comparison between ratings in March 2004 and those in March 2005
  3.5 Discussion on Results of Qualitative Analysis of the Writing Task
    3.5.1 Discussion on the writing task in PETS3 in March 2005
    3.5.2 Comparison of raters on poor essay: Analysis of script 262
    3.5.3 Comparison of raters on good essay: Analysis of script 112
    3.5.4 Divergence between original and standardized raters: Analysis of script 187 and script 397
  3.6 Summary
Chapter Four Conclusion
  4.1 Findings and Answers to the Research Questions
  4.2 Implications for Writing Assessment in PETS and High-stake Tests
  4.3 Limitations of the Study and Recommendations for Future Research
Bibliography
Appendix 1 Rating Principles for Essays in PETS3
Appendix 2 Scoring Criteria for Essays in PETS3
Appendix 3 Training Procedure (Ⅰ)
Appendix 4 Training Procedure (Ⅱ)
Appendix 5 Publications
