Should I use ready-made tests from the internet?

Methodologists and teachers often need to independently develop tests to assess knowledge of the curriculum. However, many do not know how to properly compile such tests. Sometimes it is easier to use a ready-made test from the Internet, but the question arises: how much can you trust its quality? We asked an expert to explain what to look for when choosing tests and how to distinguish high-quality tests from low-quality ones in accessible language.

Photo: Irina Brun's personal archive

We thank everyone, who supported us and helped us understand complex issues. Their knowledge and experience were invaluable in our journey to understanding. Thanks to their assistance, we were able to overcome difficulties and achieve our goals. We appreciate the contribution of everyone who was a part of this process.

A psychometric specialist, lecturer in the Master's program "Teaching and Assessment as a Science" at the Institute of Education of the Higher School of Economics and Director of the R&D department of the consulting company Formatta. He has deep knowledge of psychometric analysis and the assessment of educational outcomes. He participates in the development of innovative teaching and assessment methods aimed at improving the quality of education.

A high-quality test is always accompanied by a detailed description, which includes a test passport or specification. This description clearly defines what exactly is being measured. It describes in detail the property being measured, its components, the relationships between them, as well as their manifestations in behavior. The specification also specifies the tasks used for measurement, the scoring methodology, as well as information about who conducts the testing and what conclusions can be drawn based on the results. This approach ensures testing transparency and helps achieve objectivity in assessment.

To assess critical thinking, it is necessary to identify its key components, such as searching for and analyzing information, and forming arguments. Each of these components is assessed through five tasks of a specific format, with each task assigned a specific number of points. This approach allows for a more accurate measurement of critical thinking and the identification of strengths and weaknesses in test takers.

A poor test is characterized by providing only tasks, without additional information. If there is no data on what exactly is being measured, how the results should be interpreted, and under what conditions the test can be used, then such a test cannot be trusted. A high-quality test is always accompanied by clear explanations and recommendations, which enable the results to be accurately assessed and used in relevant situations.

A high-quality test always has a detailed description, known as a test specification. This description clearly defines what is being measured and how. It details the property being measured, its components, the relationships between them, and their behavioral manifestations. It also specifies the types of tasks used for assessment, scoring rules, as well as information about who is conducting the test and what conclusions can be drawn based on the results. This approach ensures transparency and reliability of testing, which is especially important for data analysis and interpretation.

To assess critical thinking, it is necessary to identify its key components: information search and analysis, and argument formation. Each of these components can be assessed using five tasks of a specific format, with an established scoring system. This approach allows for objective results and a detailed analysis of the level of critical thinking development in test takers.

An ineffective test is characterized by offering only a list of tasks without additional information. If there is no data on what exactly is being measured, how to interpret the results, and in what situations the test can be used, then such tests should not be trusted. Quality testing should provide complete information, allowing users to understand how to use the results to make informed decisions.

There are many rules regarding test development. In particular, the 2013 test development guidelines contain 23 rules that must be taken into account for multiple-choice items. For other types of items, the number of rules is even greater. However, the main rule of a quality test is the need to create a large amount of documentation during the development process. This includes evidence and proof of the validity of the test results obtained. This approach ensures the reliability and validity of the results, which is critical for assessing the knowledge and skills of participants.

There are many rules regarding test development. The 2013 test design guide presents 23 rules that must be considered for multiple-choice items alone. For other types of items, the number of rules is even greater. However, the key rule to remember when developing a good test is the need to create extensive documentation during the process. This includes evidence and proof of the validity of the test results obtained. Proper documentation is essential for ensuring test quality and confirming its reliability and validity.

The process can be divided into three key stages, each with its own characteristics and rules. Each stage requires a careful approach and adherence to certain principles to achieve the best results.

Specification development is a key stage in the measurement process. It is important to break down what we intend to measure into the smallest possible elements. Each of these elements must be assigned a specific task, including its number, which will be used to measure them. This approach ensures precision and clarity during the development process and avoids misunderstandings associated with the interpretation of requirements.

Task creation is an important process that requires a careful approach and attention to detail. Correct tasks help effectively teach and develop skills, as well as assess student knowledge. To successfully create tasks, it is necessary to consider the training level and interests of the target audience.

It is important that tasks are clearly formulated and consistent with the stated educational goals. Using a variety of formats, such as tests, essays, practical assignments, or project work, makes the learning process more engaging and productive.

Furthermore, it is important to provide feedback so that students can understand their strengths and weaknesses. Designing assignments also includes regularly analyzing their effectiveness and adjusting them based on the results. This not only improves the quality of education but also makes the learning process more adaptive and personalized.

Therefore, competently designing assignments is a key element in the educational process, contributing to the achievement of high results.

If possible, it is better to avoid negatives in the formulation of questions - otherwise, a person may answer incorrectly because they did not read the question carefully, and not because they do not know the answer. For example, the option "In such-and-such a situation, will you turn right?" is better than "In such-and-such a situation, will you not turn right?"
One assignment is one idea that we want to test. You should not combine two or more ideas in one assignment. Because if a person makes a mistake in such a task, it will be impossible to determine what exactly he does not know or what skill he is lacking.
If the test has answer options, they should be made as similar as possible so that the correct one does not stand out visually.

Still: the film "Mean Girls" / Paramount Pictures

Validation studies are designed to confirm that your test actually measures the stated characteristics. These studies depend on the purpose for which the test is being developed. However, the main principle of high-quality testing is to conduct two pilot studies. This will ensure the accuracy and reliability of the results obtained, as well as the test's compliance with the stated criteria. Proper test validation is a key aspect in its development and use, as it ensures confidence in the data obtained and their interpretation.

qualitative piloting - interviews to check how respondents understand the wording of the tasks;
quantitative - statistical indicators of test performance are tested on 30-100 respondents, such as reliability, difficulty, and others.

The process can be roughly divided into three key stages, each with its own unique rules and features. These stages help structure the work and ensure its effectiveness. Correct adherence to established rules at each stage contributes to the successful completion of the entire process.

Developing a specification is a key step in the measurement process. It is important to break down what we plan to measure into as many components as possible. Each of these components must be associated with a specific task, indicating its number. This will ensure clarity and accuracy in the measurement process, as well as simplify the analysis and interpretation of the obtained data. Careful development of the specification contributes to more effective project management and helps to avoid errors in the future.

Developing tasks is an important aspect of the educational process. Correctly formulated tasks help students better absorb the material and develop critical thinking. It is important to consider the level of knowledge of students, their interests, and learning goals.

To effectively develop tasks, it is necessary to determine what skills and knowledge need to be developed. Tasks can range from simple questions to complex projects, which allows you to cover various aspects of the learning process.

In addition, it is important that the tasks are clearly structured and understandable. Using clear language and specific wording helps students better understand requirements and expectations.

Also, it is necessary to consider a variety of assignment formats, such as written assignments, presentations, or group projects, which promotes deeper student engagement.

Designing assignments requires a creative approach and constant analysis of their effectiveness, which ultimately contributes to improving the quality of education.

Whenever possible, it's best to avoid negatives when formulating questions—otherwise, a person might answer incorrectly because they didn't read the question carefully, not because they don't know the answer. For example, "In such-and-such a situation, would you turn right?" is better than "In such-and-such a situation, would you not turn right?"
One task is one idea we want to test. Don't combine two or more ideas in one task. Because if a person makes a mistake in such a task, it will be impossible to determine what exactly he does not know or what skill he is lacking.
If the test has answer options, they should be made as similar as possible so that the correct one does not stand out visually.

Still: the film "Mean Girls" / Paramount Pictures

Validation studies aim to confirm that your test actually measures the indicators it claims to measure. These studies depend on the purpose for which the test is being developed. As a general rule, two validation studies are recommended for quality testing. This will ensure the reliability and accuracy of the results obtained and will increase the credibility of the test among users and stakeholders. Proper test validation is a key step in the development and implementation process, ensuring compliance with standards and requirements.

qualitative testing—interviews to verify respondents' understanding of the task wording;
quantitative testing—test statistical performance indicators, such as reliability, difficulty, and others, are tested on 30–100 respondents.

The books "Theory and Practice of Designing Pedagogical Tests" by Marina Chelyshkova, "The Form of Test Items" by Vadim Avanesov, and "Practical Testology" by Alexander Shmelev remain important sources of knowledge in the field of pedagogical testing. Although some recommendations have lost their relevance due to changes in psychometric science and technological advances, many of them remain relevant and can be useful for specialists in this field. These works provide valuable ideas and methodological approaches to test development, making them useful for educators and researchers. Downloading and studying the Russian Personnel Testing Standard before developing a test will be extremely helpful. This standard, although intended for personnel assessment, offers universal principles of test development and psychometrics that are applicable to various fields, such as online education, psychology, and HR. Familiarization with these principles will help create more effective and reliable tests in any field. The books "Theory and Practice of Pedagogical Test Design" by Marina Chelyshkova, "Test Item Form" by Vadim Avanesov, and "Practical Testology" by Alexander Shmelev remain important sources for testing specialists. Although some recommendations have become outdated due to developments in psychometrics and technology, the basic principles and approaches presented in these publications remain relevant. They offer valuable advice on developing effective pedagogical tests and constructing test items, making them useful for educators and researchers seeking to improve the quality of knowledge assessment.

Downloading and studying the Russian Personnel Testing Standard before developing a test can be extremely helpful. Although this standard is designed for personnel assessment, its test development and psychometric principles are universal and applicable to various fields, such as online education, psychology, and HR. Using these principles will allow us to create higher quality and more effective tests that meet modern requirements.

Still: the film "Suits" / Netflix

The test adequacy checklist is a set of criteria for assessing the quality and reliability of testing. Key elements of such a checklist include:

— Clarity of test objectives: the test should have clear and measurable objectives that define its purpose.
— Content appropriateness: test questions and tasks should be relevant to the objectives and content of the material being studied.
— Variety of formats: the test should include a variety of task types to accommodate different learning styles and skills.
— Level of difficulty: tasks should be adapted to the target audience’s level of knowledge and experience to ensure fair assessment.
— Clear instructions: test takers should receive clear directions on how to complete the tasks.
— Objectivity of assessment: assessment criteria should be transparent and consistent for all participants.
— Validity Test: The test should be periodically reviewed and updated to ensure its relevance and compliance with modern standards. By following this checklist, you can create tests that will accurately assess participants' knowledge and skills, ensuring a high degree of validity of results. Is the test's data sheet or specifications publicly available? If not, request this information from the copyright holder. If the copyright holder says there are only tasks, run away from that test and never look back. Such a test is useless at best, and harmful at worst. Instructions and questions should be clearly written. This is a recent version of the test, created taking into account the latest scientific data. For example, the Wechsler Intelligence Test, the seventh version, was released in 2020. The Russian-language test, which is publicly available, is a translation of the third version and is already outdated.

For tests of psychological characteristics, a good sign is a description in the results of the differences between people who receive different scores. This means not only stating whether the measured trait is expressed strongly or weakly, but also describing how this manifests itself in the cognitive, behavioral, and emotional spheres.

The basic information about the test indicates how the validation was conducted, that is, proof of the validity of the conclusions. Non-specialists are unlikely to understand this, but if such data is provided, it is already good.

Unfortunately, in the Russian-language segment, there is a shortage of high-quality instruments for measuring psychological characteristics. Many cognitive and clinical psychologists continue to use outdated and unreliable tests created in the 1970s and 1980s. As an example of high-quality validation studies, we can consider the work of specialists from the Center for Psychometrics and Measurement in Education at the HSE Institute of Education. Their articles provide detailed theoretical frameworks, the structure of the social-emotional skills questionnaires, and a reliability analysis of the data, confirming the high reliability of their findings.

Still: the film "Suits" / Netflix

The test adequacy checklist includes several key aspects that help evaluate its quality and effectiveness. It is important to ensure that the test meets the stated goals and objectives. It must be understandable and accessible to the target audience. It is necessary to check that the test is structured, with a logical, sequential transition from one question to the next. It is also important that the test tasks are varied and reflect all the necessary aspects of the material being studied. Pay attention to the clear wording of the questions and the absence of ambiguity. Check that the test provides an adequate assessment of knowledge and skills, and meets the established criteria. Finally, the test should be tested on a small group to identify possible shortcomings and improve its quality before widespread use.

Is the test datasheet or specifications publicly available? If not, request this information from the copyright holder. If the copyright holder says there are only tasks, run away from this test in the opposite direction and never look back. Such a test is useless at best, and harmful at worst.
Instructions and questions must be written clearly.
This is a recent version of the test, created taking into account the latest scientific data. For example, the Wechsler Intelligence Test (WIT) released its seventh version in 2020. The Russian-language version, which is publicly available, is a translation of the third version and is already outdated.
For psychological trait tests, a good sign is a description in the results of how people with different scores differ. This means not only stating whether the measured trait is expressed strongly or weakly, but also describing how this manifests itself in the cognitive, behavioral, and emotional spheres.
The basic information about the test indicates how validation was conducted, i.e., proof of the validity of the conclusions. Non-specialists are unlikely to understand this, but if such data is provided, it is already good.

There is a shortage of high-quality instruments for measuring psychological traits in Russian. Many clinical and cognitive psychologists continue to use outdated tests developed in the 1970s and 1980s, which leads to unreliable results. However, it's worth noting the examples of successful validation studies conducted by colleagues from the Center for Psychometrics and Measurement in Education at the HSE Institute of Education. Their articles present theoretical frameworks, detail the scales that comprise the questionnaires, and analyze the reliability of the data obtained. These studies are an important contribution to the development of psychometric tools and can serve as an example for other specialists seeking to improve the quality of their research.

Incorrect management decisions can significantly impact the success of an online course. To improve it, it is important to identify problem areas. An effective test will help identify the topics in which students most frequently make mistakes, as well as identify parts of the course that do not promote the development of essential skills. Analysis of such data will allow us to make targeted changes, which will ultimately improve the quality of instruction and increase student satisfaction.

If our tests fail to identify problems or point them out in the wrong places, this may result in us leaving the weakest lectures and assignments in place when updating the course. As a result, the course will be formally updated and resources will be expended, but the quality of the educational product will not improve. Students may continue to drop out or leave negative reviews. This emphasizes the importance of thorough diagnostics and analysis of results to truly improve educational quality and student satisfaction.

Rework the text, staying on topic. Optimize it for SEO and add additional details if necessary. Avoid using emojis and redundant characters. Do not include sections with numbers or bullet points.

Should I use ready-made tests from the internet?

Contents: