Is there room for subjectivity in USE assessment? Using Russian essay scoring as an example

The interviewee is a teacher of Russian language and literature in Khimki, Moscow Region, and an expert examiner for the Unified State Exam (USE) in Russian. His professional skills and deep knowledge of the subject help students prepare for exams and achieve high results. Through an individual approach to each student, he fosters not only knowledge but also a love of literature and language. His lessons combine analysis of literary works, literacy practice, and test preparation, and he draws on modern methods and resources to adapt the learning process to each student's needs.

To check the Unified State Exam, teachers take the exam themselves

Is your participation in the review committee the result of a personal choice or is it an offer that is hard to refuse?

I have been marking the Basic State Exam (OGE) since its introduction, and I joined the Unified State Exam (USE) review committee only two years ago. It was a conscious decision: you can become an expert only if you want to. An application must be submitted at the beginning of the school year through the school's vice-principal. To become a USE expert, a teacher must hold the first or the highest qualification category. The selection also takes into account teaching experience and experience with graduating classes. After that, candidates complete training, which runs from January to March, and pass a final exam.

There are significantly fewer USE experts than OGE experts. For the OGE, most teachers in a school can take part in marking. This is because there are fewer eleventh-graders than ninth-graders, so there are fewer papers to mark. In addition, the USE demands a more rigorous approach to both preparation and assessment, since the results directly affect university admission and the competition among applicants.

Do teachers who want to become USE experts take this training once, or do they have to repeat it every year?

Each year, the expert training begins anew, taking into account all updates and changes in the examination system. We attend both in-person and remote classes, where we complete numerous independent assignments and practice assessments. These tasks help us gauge our readiness for the final test. At the end of each academic year, a final assessment is held, and its results determine whether a teacher can serve as a USE expert for the following year.

I began assessing the USE because I believe this process is important for ensuring fairness and objectivity in education. My motivation lies in the desire to contribute to the quality of education and help students achieve the grades they deserve. Although this is an additional burden, I see it as an opportunity to improve my professional skills and participate in an important social process. Assessing the USE is not only a responsibility but also a chance to develop, which makes this work meaningful and in-demand.

First, for a long time I prepared children for exams simply by studying all the necessary materials, and that was enough to understand what was required. However, the essay requirements became more complex, which prompted me to sign up as an examiner. I wanted to learn how the process works from the inside and which nuances matter to the person marking the paper. Without this insider perspective, it was hard to know what exactly to focus on when teaching children.

Second, it is simply interesting. For a practicing teacher, this is a chance to evaluate the work of other graduates as well as their own students and to gauge their level of thinking and reasoning. You may notice that one year's graduating class is intellectually strong, while another produces weak essays. This gives a better understanding of how the educational process is changing and helps identify learning trends.

Do teachers receive additional pay for this extra workload, or any other benefits?

Payment for marking is based on the number of papers assessed. Time spent on the training courses is unpaid. The school does not track this activity, and it is left to the individual's discretion. There is a positive side, however: those who complete the USE expert training courses receive certificates of advanced training, which count toward teacher certification.

I am a Russian language and literature teacher in Khimki, Moscow Region, and a Unified State Exam (USE) expert in Russian. My goal is to help students master Russian language and literature, prepare for exams, and develop critical thinking. I take an individual approach to each student, taking into account their characteristics and needs. I work on improving students' language proficiency, reading, and literary analysis skills, which helps them pass the USE and continue their education.

Under the cameras, but without the stress: how the marking is conducted

How is the marking process organized, both technically and organizationally?

The day after the Unified State Exam, the experts gather at the marking center, which is chosen each year according to the building's capacity. Marking takes three to seven days. Three days is the minimum, and five is the most common. Each paper is normally assessed by two experts; if their scores diverge and a third expert has to be brought in, marking can take up to a week.

The experts sit in classrooms, one per desk. Last year, because of the pandemic, the number of people per classroom was significantly reduced, and we all wore masks and strictly followed the sanitary rules. This kept everyone safe and minimized the risk of spreading the virus.

Is the atmosphere at the marking center stressful for the experts? Do they have to hand over phones and personal belongings, as USE participants do? Can they leave the classroom at any time, or only during a scheduled lunch break?

The atmosphere is stress-free. We work under camera surveillance, but it does not cause much discomfort. Phones are not handed over, but they must stay in our bags, which are kept in a separate cabinet while we work. We bring water, black gel pens, pencils, and erasers: we mark printed scans rather than originals, so errors can be marked right on the page. If necessary, we can step out for a snack, a phone call, or a short walk, but communicating with other experts is prohibited. Taking papers out of the classroom is also strictly prohibited: if I leave the room, the papers stay on my desk.

Everyone decides for themselves when to take a lunch break. The experts travel in from different places, so their needs differ. Someone who got up very early may be hungry an hour after starting, while someone else can work without a break until they finish marking. This lets everyone work at their own pace and keeps the environment comfortable.

You mentioned that exam papers are provided for grading as a paper scanned copy. I wonder if this refers to the full version of the completed USE form, including the graduate's last name, first name, and patronymic, as well as the computer-processed test questions, or only the essay portion?

We only receive copies of the essays; we do not see the test portion of the work. We also do not have access to graduates' personal data, as each work has a unique identification code by which it is recognized.

The full name of the expert who will mark it is printed on each paper in advance, and the allocation is random. If an expert cannot attend because of illness, their papers are reprinted with another expert's name. An expert cannot simply mark papers on behalf of another teacher, because the record must show exactly who did the marking. Experts are therefore personally responsible for the quality of their marking, and the record also matters for calculating payment.

Each expert receives a set of essays, which are accompanied by a scoring sheet. This helps organize the evaluation process and ensures transparency in grading. Each expert must carefully review the evaluation criteria to evaluate the submitted work as objectively as possible.

How many papers do you mark in a single day?

Each expert is initially given approximately 20-30 papers to check. If the expert is ready and able to evaluate a larger number, they may be given additional materials. I am capable of checking 40 to 60 papers per day. The speed of checking depends on each individual expert: some complete the process in three hours, while others require more time.

Doesn't your concentration start to fade toward the middle or the end of the session because of fatigue?

Checking the first paper is the most challenging. Immersing yourself in the text, understanding the problem, and the essence of the essay topic requires time and effort. However, subsequent checks become easier, as the expert is usually given one version of the papers, meaning that all essays are written on the same topic. This simplifies the process considerably. After checking 10-15 papers, it is necessary to take a break – drink some water or go outside to avoid eye strain and maintain concentration.

The subjective factor: how to evaluate if in doubt?

Each paper is checked by two experts, who evaluate it independently. This ensures the objectivity and accuracy of the evaluation results, as well as minimizes the influence of personal opinions on the final grade. This approach contributes to a fairer and more comprehensive analysis of each paper.

Each expert evaluates the work independently of the others, without knowing who else is checking the same work. This ensures the most objective assessment. An essay is a personal expression of the graduate, but the opinion of the expert also matters. One expert may find the work insignificant, while another will rate it as outstanding. If the experts agree in their assessments, this is good. In the event of a disagreement, it is important to pay attention to the degree of discrepancy in scores, which is determined by the computer system. The difference in scores should not exceed eight points. If the discrepancy exceeds this limit, a third evaluator is called in to re-evaluate the work and determine whose score is more reasonable. Ultimately, the third evaluator's score is accepted.
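The discrepancy rule described above can be sketched in a few lines of Python. This is only an illustration, not the actual marking software: the function name `needs_third_expert` and the example totals are hypothetical, and the sketch assumes that a gap of exactly eight points is still acceptable, since the interview says the difference "should not exceed" eight.

```python
# Hypothetical sketch of the third-expert rule described in the interview.
# Assumption: a gap of exactly 8 points does NOT trigger a third check.
MAX_ALLOWED_GAP = 8


def needs_third_expert(total_a: int, total_b: int,
                       max_gap: int = MAX_ALLOWED_GAP) -> bool:
    """Return True when the two experts' totals differ by more than max_gap."""
    return abs(total_a - total_b) > max_gap


if __name__ == "__main__":
    # Illustrative totals only; they are not taken from real papers.
    for a, b in [(20, 10), (18, 10), (12, 14)]:
        verdict = "third expert" if needs_third_expert(a, b) else "no third expert"
        print(f"expert A = {a}, expert B = {b}: {verdict}")
```

When the function returns True, the interview states that the third expert's score is the one that is accepted.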

Could significant discrepancies in scores mean that some evaluators are too strict, perhaps believing that only classics such as Pushkin and Tolstoy deserve the maximum score for an essay?

As a rule, evaluators try not to be strict but, on the contrary, to mark as objectively and fairly as possible. They understand that a child's future and chances of entering university depend on these scores. This matters all the more because exams cause children significant stress and anxiety. It is important that assessors show understanding and a degree of leniency in these circumstances.

Assessment criteria require that the assessor evaluate the essay for the presence or absence of necessary elements in accordance with established requirements. Typically, most criteria are worth only one or two points, allowing the assessor to quickly determine whether the text contains the necessary components for which a mark is awarded. This simplifies the assessment process and makes it more objective, as the emphasis is placed on specific aspects of the work.

There is a criterion that is assessed with a maximum of six points. Assessment for this criterion is based on a clear algorithm that determines what points are awarded and what points are not awarded. However, in practice, situations arise when applying this criterion causes difficulties, and the algorithm requires refinement to improve its effectiveness.

If the difference between the two experts' scores is less than eight points, is the arithmetic mean of the two taken?

When comparing expert assessments, the highest scores for each criterion are taken into account and then summed. For example, if one expert assessed the fifth criterion as one point, and the other assigned zero, then the final score for this criterion will be one point. This approach allows for a more objective assessment of essays, taking into account the opinions of different experts.
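As a rough sketch of the rule just described, the snippet below combines two experts' score sheets by taking the higher score on each criterion and summing the results. The function name `combine_scores`, the dictionary layout, and most of the sample scores are assumptions made for illustration; only the "fifth criterion, one point versus zero" case comes from the interview.

```python
# Hypothetical sketch: combine two score sheets by taking the higher
# score on each criterion, then summing across criteria.
def combine_scores(expert_a: dict[int, int], expert_b: dict[int, int]) -> int:
    """Return the final total, using the higher of the two scores per criterion."""
    if expert_a.keys() != expert_b.keys():
        raise ValueError("Both experts must score the same set of criteria")
    return sum(max(expert_a[c], expert_b[c]) for c in expert_a)


if __name__ == "__main__":
    # Keys are criterion numbers; values are points awarded (illustrative only).
    sheet_a = {4: 1, 5: 1, 6: 2}
    sheet_b = {4: 1, 5: 0, 6: 1}
    # Criterion 5 is scored 1 vs 0, so it contributes 1 point to the total.
    print(combine_scores(sheet_a, sheet_b))
```

Under this rule the final total can never be lower than either expert's own total, which matches the interviewee's point that the outcome favors the graduate.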

What do you do if you have doubts about the score for a particular criterion?

I've learned an important rule: if I am unsure whether a criterion deserves one point or two, I award two. Doubt means there is a reason to hesitate, whereas clear-cut cases raise no questions. I have heard this from many experts and decided to follow the same principle. That said, I rarely have doubts, because we practice extensively during training, which lets us see clearly whether an essay meets a criterion and how many points it deserves.

If you have serious doubts about how to score a particular paper on a specific criterion, are you allowed to consult anyone? You mentioned that the experts cannot communicate with each other.

Each room where the assessment is carried out has a designated person – a senior assessor. You can contact them for assistance if you have questions or problems. If the senior assessor is unable to provide the necessary assistance, they will call a more experienced specialist for advice.

Evaluating a text against the literacy criteria is fairly straightforward, but it is much harder to evaluate the commentary on the problem raised in the source text. This is the second criterion, which is worth up to six points. Checking for ethical errors, covered by the 11th criterion, is no less demanding. Wouldn't you agree that such assessments can be highly subjective and depend on the assessor's individual perception?

I believe the assessment criteria are objective, and graduates are strongly advised to familiarize themselves with them during the preparation process. Double-checking helps minimize the risk of subjectivity in assessment. This is an important step to achieving high results.

The overall score is built up from several criteria. If a student misunderstands the essay topic and fails to address it, they may receive zero points for the first four criteria. However, if the text is literate, logical, and well-written, they can still earn points for the criteria that cover those aspects. One of my students, who had always shown excellent results, found themselves in exactly this situation.

If a graduate has understood the essay topic correctly and expressed their thoughts clearly but interprets a situation from classical literature in an unconventional way, this may affect their score. For example, if a student justifies Raskolnikov's actions and supports his motives, their argumentation must be logical and persuasive. A critical interpretation of a classic work can be valuable if it is supported by close analysis and understanding of the text. However, if the student's opinion is unfounded or contradicts the main ideas of the work, this may lower their score. Students need not only to express their thoughts but also to substantiate them with reference to the text.

Every student is entitled to their own opinion, but there are generally accepted moral standards that are taken into account when essays are evaluated. One of the criteria covers compliance with ethical standards: if an essay contains no ethical errors, it receives one point for this criterion; if it does, no point is awarded. A student can therefore lose a valuable point by violating ethical standards.

While working on a text, a logical problem may arise if the conclusions presented in a literary example contradict what the child has presented earlier. Logical inconsistencies can lead to a loss of points. Therefore, it is important to monitor the consistency of thoughts and arguments in the text.

As a teacher, are you completely satisfied with the current assessment criteria, or are there aspects that, in your opinion, require change?

In my opinion, the second assessment criterion, for which the maximum score is six, has become too complex. This makes things harder for both students and assessors. To get the maximum score, a student has to meet many requirements that are not always clear. For example, a child may find examples, explain why they chose them, and point out the connection with the main topic of the essay, and the assessor may still fail to notice it. In the end, the question arises: was there really no connection, or did the assessor simply not notice it? A more transparent grading system is needed to avoid misunderstandings and improve objectivity.

Unfortunately, the fourth criterion (arguments to the problem) is now worth only one point instead of three. Previously, graduates were required to recall several literary works and provide two examples to receive the maximum score for this criterion, resorting to personal experience only in extreme cases. Now, it is sufficient to simply justify your opinion, whether based on personal experience or a literary example. If justification is present, the point for this criterion will be counted. Thus, the incentive to review literary works has been removed, as one can earn points by relying on one's own experience. Personally, I liked the essays that were assessed according to the old criteria better.