VALIDITY:
It is the degree to which a test
measures what it is supposed to measure. (L. R. Gay)
- Test validity refers to the degree to
which the test actually measures what it claims to measure.
- Test validity is also the extent to
which inferences, conclusions, and decisions made on the basis of test
scores are appropriate and meaningful.
- Validity is the strength of our
conclusions, inferences or propositions.
- Validity refers to the accuracy of an
assessment -- whether or not it measures what it is supposed to measure.
- If a test is valid, it is almost always
reliable.
Measurement
of validity:
There are three ways in
which validity can be measured.
| Type of Validity | Definition | Example/Non-Example |
| --- | --- | --- |
| Content | The extent to which the content of the test matches the instructional objectives. | A semester or quarter exam that only includes content covered during the last six weeks is not a valid measure of the course's overall objectives -- it has very low content validity. |
| Criterion | The extent to which scores on the test are in agreement with (concurrent validity) or predict (predictive validity) an external criterion. | If the end-of-year math tests in 4th grade correlate highly with the statewide math tests, they would have high concurrent validity. |
| Construct | The extent to which an assessment corresponds to other variables, as predicted by some rationale or theory. | If you can correctly hypothesize that ESOL students will perform differently on a reading test than English-speaking students (because of theory), the assessment may have construct validity. |
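As an illustration of the criterion row above, here is a minimal sketch of how a concurrent validity coefficient might be computed: correlate students' scores on the classroom test with their scores on the external criterion (here, a statewide test). All scores and variable names in the example are hypothetical, invented only to show the calculation.

```python
# Sketch: concurrent (criterion) validity estimated as the Pearson correlation
# between a classroom test and an external criterion. Data are hypothetical.
from statistics import correlation  # available in Python 3.10+

classroom_math = [72, 85, 90, 64, 78, 88, 95, 70]  # end-of-year 4th grade test
statewide_math = [70, 82, 91, 60, 75, 90, 93, 68]  # external criterion measure

validity_coefficient = correlation(classroom_math, statewide_math)
print(f"Concurrent validity (r): {validity_coefficient:.2f}")
```

A coefficient near 1.00 would suggest the classroom test agrees closely with the criterion; values near .00 would suggest little agreement.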
In order to have confidence that a test is
valid (and therefore the inferences we make based on the test scores are
valid), all three kinds of validity evidence should be considered.
So, does all this talk about validity and reliability mean
you need to conduct statistical analyses on your classroom quizzes? No, it
doesn’t. (Although you may, on occasion, want to ask one of your peers to
verify the content validity of your major assessments.) However, you
should be aware of the basic tenets of validity and reliability as you
construct your classroom assessments, and you should be able to help parents
interpret scores for the standardized exams.
Types of Validity:
There
are four types of validity commonly examined in social research.
- Conclusion
validity asks: is there a relationship between
the program and the observed outcome? Or, in our example, is there a
connection between the attendance policy and the increased participation
we saw?
- Internal
validity asks: if there is a relationship between
the program and the outcome we saw, is it a causal relationship? For
example, did the attendance policy cause class participation to increase?
- Construct
validity is the hardest to understand, in my
opinion. It asks whether there is a relationship between how I
operationalized my concepts in this study and the actual causal
relationship I'm trying to study. Or, in our example, did our treatment
(attendance policy) reflect the construct of attendance, and did our
measured outcome - increased class participation - reflect the construct
of participation? Overall, we are trying to generalize our conceptualized
treatment and outcomes to broader constructs of the same concepts.
- External
validity refers to our ability to generalize the
results of our study to other settings. In our example, could we
generalize our results to other classrooms?
Characteristics of Validity:
The following characteristics of validity are commonly examined in social research.
- Content Validity:
How well the sample of test items represents the content the test is
designed to measure.
- Predictive validity:
How well predictions made by a test are confirmed by later behavior of
subjects.
- Concurrent validity:
Similar to predictive validity, but behavior is measured at same time as
test.
- Construct validity:
How well a particular test can be shown to measure a particular construct
(a theoretical construction about the nature of human behavior, such as
intelligence, anxiety, or creativity).
- Face validity:
How closely the test appears to measure what it's supposed to measure.
RELIABILITY:
Reliability
is the degree to which a test consistently measures whatever it measures.
(L. R. Gay)
Test reliability refers to
the degree to which a test is consistent and stable in measuring what it is
intended to measure. Most simply put, a test is reliable if it is consistent
within itself and across time. Reliability is the consistency of your
measurement, or the degree to which an instrument measures the same way each
time it is used under the same condition with the same subjects. In short, it
is the repeatability of your measurement. It is the level of internal
consistency or stability of the test over time, or the ability of the test to
obtain the same score from the same student at different administrations (given
the same conditions). Reliability is usually expressed as some sort of
correlation coefficient. Values may range from .00 (low reliability) to 1.00
(perfect reliability). Reliability refers to the extent to which assessments
are consistent. Just as we enjoy having reliable cars (cars that start
every time we need them), we strive to have reliable, consistent instruments to
measure student achievement. Another way to think of reliability is to imagine
a kitchen scale. If you weigh five pounds of potatoes in the morning, and the
scale is reliable, the same scale should register five pounds for the potatoes
an hour later (unless, of course, you peeled and cooked them). Likewise,
instruments such as classroom tests and national standardized exams should be
reliable – it should not make any difference whether a student takes the
assessment in the morning or afternoon; one day or the next. Another measure of
reliability is the internal consistency of the items. For example, if you
create a quiz to measure students’ ability to solve quadratic equations, you
should be able to assume that if a student gets an item correct, he or she will
also get other, similar items correct. The following table outlines three
common reliability measures.
| Type of Reliability | How to Measure |
| --- | --- |
| Stability or Test-Retest | Give the same assessment twice, separated by days, weeks, or months. Reliability is stated as the correlation between scores at Time 1 and Time 2. |
| Alternate Form | Create two forms of the same test (vary the items slightly). Reliability is stated as the correlation between scores of Test 1 and Test 2. |
| Internal Consistency | Compare one half of the test to the other half, or use methods such as Kuder-Richardson Formula 20 (KR20) or Cronbach's Alpha. |
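To make the internal consistency row concrete, here is a minimal sketch of Cronbach's alpha computed by hand from a small, hypothetical data set (five students answering four quiz items); the scores are invented purely for illustration.

```python
# Sketch: Cronbach's alpha from hypothetical item scores.
# alpha = (k / (k - 1)) * (1 - sum(item variances) / variance of total scores)
from statistics import variance

# rows = students, columns = items on the same quiz (hypothetical data)
scores = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
]

k = len(scores[0])                                     # number of items
item_vars = [variance(item) for item in zip(*scores)]  # variance of each item
total_var = variance([sum(row) for row in scores])     # variance of total scores

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values closer to 1.00 indicate that the items hang together consistently; values near .00 indicate little internal consistency.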
Estimation
of Reliability:
There are two ways in which
reliability is usually estimated.
- Test/Retest: Test/retest
is the more conservative method to estimate reliability. The idea is that
you should get the same score on test 1 as you do on test 2. The
three main components of this method are as follows:
  o Implement your measurement instrument at two
separate times for each subject;
  o Compute the correlation between the two
separate measurements;
  o Assume there is no change in the underlying
condition (or trait you are trying to measure) between test 1 and test 2.
- Internal Consistency: Internal
consistency estimates reliability by grouping questions in a questionnaire
that measure the same concept. For example, you could write two sets of
three questions that measure the same concept (say class participation)
and after collecting the responses, run a correlation between those two
groups of three questions to determine if your instrument is reliably
measuring that concept.
The primary difference
between test/retest and internal consistency estimates of reliability is that test/retest involves two
administrations of the measurement instrument, whereas the internal consistency
method involves only one administration of that instrument.
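Both estimation methods lend themselves to a simple correlation. The sketch below, using entirely hypothetical scores, computes a test/retest coefficient from two administrations of the same instrument and an internal consistency coefficient by correlating two groups of questions written to measure the same concept.

```python
# Sketch with hypothetical data: two common ways of estimating reliability.
from statistics import correlation  # Pearson r, available in Python 3.10+

# Test/retest: the same instrument given twice to the same eight students.
time1_scores = [55, 67, 72, 80, 48, 90, 63, 75]  # first administration
time2_scores = [58, 65, 70, 82, 50, 88, 60, 77]  # second administration, weeks later
test_retest_r = correlation(time1_scores, time2_scores)

# Internal consistency: totals on two sets of three questions, each set
# written to measure the same concept (say, class participation).
group_a_totals = [12, 9, 14, 7, 11, 13, 8, 10]   # questions 1-3, summed per student
group_b_totals = [11, 10, 15, 6, 12, 12, 9, 10]  # questions 4-6, summed per student
internal_consistency_r = correlation(group_a_totals, group_b_totals)

print(f"Test/retest reliability: {test_retest_r:.2f}")
print(f"Internal consistency:    {internal_consistency_r:.2f}")
```

In both cases a coefficient close to 1.00 indicates consistent measurement, while a coefficient near .00 indicates that the instrument is not measuring reliably.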
The Relationship of
Reliability and Validity
In order for assessments to
be sound, they must be free of bias and distortion. Reliability and validity
are two concepts that are important for defining and measuring bias and
distortion. Test reliability is requisite to test
validity. If a test is not reliable, then validity is moot. In other
words, if a test is not reliable there is no point in discussing validity
because consistent measurement is required before validity can be considered in any
meaningful way. A test may be reliable without being valid, but if a test is not reliable it cannot be valid.
OBJECTIVITY:
Objectivity is the extent to
which the instrument is free from personal error (personal bias), that is,
subjectivity on the part of the scorer. (C.V Good)
The objectivity of a test
refers to the degree to which equally competent scorers obtain the same
results. (Norman E. Gronlund)
ACCURACY:
A term used to describe the
size of the relative error. (C.V Good)
ADEQUACY:
A characteristic evidenced
by its sufficient length to sample widely the behaviour it is designed to
measure. (C.V Good)
Methods of Data Recording
The assessment
techniques in this category may be used with any of the ongoing student
activities as well as with the quizzes and tests. The appropriateness of the
technique for the purpose intended should act as a guide.
Anecdotal records refer to
written descriptions of student progress that a teacher keeps on a day-to-day
basis.
A teacher may decide to keep
anecdotal records on students' ability to manipulate materials at assessment
stations, to work in a group, to work in a test-taking situation, or to
complete a project or a written report. There are situations where a teacher
will keep anecdotal comments on the development of specific skills related to
instructional objectives, on the behavior of a student, or on the attitude
expressed or demonstrated by a student. Anecdotal records are as flexible as a
teacher wishes to make them.
Observation checklists are
lists of criteria a teacher determines are important to observe in students at
a particular time. Beside each of the criteria, a notation is made as to
whether that particular criterion was observed.
Checklists can be used to
record the presence or the absence of knowledge, particular skills, learning
processes, or attitudes. They may be used to record such information in
relation to written assignments, presentations, classroom performance,
test-taking behaviors, individual or group work, fulfillment of the
requirements of a contract, self- and peer-assessment of work, or completion of
an assessment station. How a teacher wishes to use an observation checklist
depends upon the type of student progress information required.
Rating scales have the same
usage as observation checklists. The essential difference lies in what is
indicated. Observation checklists record the presence or absence of a
particular knowledge item, skill, or process. Rating scales record the degree
to which they are found or the quality of the performance.
Anecdotal
Records Description
An anecdotal
record is a written description of the observations made on students. These
records are usually collected in a specific book or folder.
Evaluation Context
The very act of recording
observations may serve to alert you to some aspect of a student's learning or
attitude that may need immediate attention; for example, an outburst caused by
frustration.
Since the anecdotal record
concentrates on describing incidents of student performance over a period of
time, the sequence of anecdotes can serve as a record of the student's
development towards long term goals such as lifelong learning, healthy
self-concept, cooperative learning, skill development, work/study habits,
knowledge attainment, and interest/attitude.
Through the regular
spotlighting of a student's performance, areas needing special attention may
emerge. Examples include communication skills and personal development. Your
anecdotal records may start to show that Billy is consistently having trouble
in expressing coherent thoughts. As a consequence, you may decide to
investigate the causes of this behavior more thoroughly.
Using Technique
to Best Advantage
Entries must be
made with appropriate frequency. They should eventually encompass all the
students, although some students may warrant more entries than others.
Anecdotal records offer you a way of recording aspects of your students'
learning that might not be identified by other techniques.
Guidelines for use
First, you write a
description of the incident in an objective way by describing what actually
happened. Then make further notes on your analysis of the situation, any
comments you want to make, and any questions you pose to yourself that may
guide further observations.
For many teachers, the time
when students are engaged in writing offers an opportunity to demonstrate that
teachers are writing, too. You can use a portion of your writing time for
recording your anecdotes. Teachers who do not have these opportunities may use
times when students are engaged in independent work. In program areas such as
physical education and home economics, there are parts of the period when
students change clothes or tidy up equipment. You might be able to use these
times for recording entries. Whichever scheme is chosen, it should offer regular
opportunities for entering observations.
Various formats have been
developed. A notebook with each entry dated offers a powerful chronological record,
although it is sometimes difficult to locate a particular student. Alphabetized
notebooks, looking like large address books, are available and they permit easy
reference by student name. Alternatively, a loose leaf format may be used so
that the entries may be entered chronologically, and at the end of the year may
be reformatted by student name. One further idea: modern technology has
provided us with conveniences for recording and storing student progress data
that range from electronic student data files available on various software
programs to removable self-stick notes that can be used to record the anecdote
and then be affixed to the student record.
Example: No
example is required for the open-ended, unstructured anecdotal record. The
examples that follow are formats for anecdotal records designed to give you
ideas as to how to set up this type of data recording method. Keep in mind
these are only examples.
Using the Information for Student
Evaluation
While the entries themselves are usually not
shown to the student or the parents/guardians, they can form a valuable basis
for communication. They allow you to flesh out your year-end reports on the
more holistic dimensions of student growth.
Observation
Checklists Description
The observation
checklist is a listing of specific concepts, skills, processes, or attitudes,
the presence or absence of which you wish to record. If the observation
checklist is used relatively frequently and over time, a longitudinal profile
of a student is assembled and ultimately evaluated.
Evaluation Context
The observation checklist is most
appropriately used in situations where you wish to assess your students'
abilities, attitudes, or performance in process areas. For example, it can
assess communication skills, cooperative learning skills, extent of
participation, interest in the topic, and psychomotor skills.
Using Technique to Best Advantage
Used on a single occasion, the observation
checklist can provide formative evaluation information for the situation in
which it is used. For example, to learn how effective students are when working
in groups, a checklist to observe them in a single group session can be used.
This will provide information to guide future instruction.
Observation checklists are most useful when
collected over time and used summatively or diagnostically. Once you decide to
use observation checklists in your evaluation plan, you must use them
systematically. They are misleading when used sporadically.
Guidelines for Use
Usually the observation checklist is used
during class time. Therefore, it must be simple. The most efficient way to
collect data is to record learning progress on four or five students at the
same time. If you choose to observe four students per lesson and you have 28
students, you will cover the class once every seven lessons. At the end of the
term or unit, you will have several observations on every student. If your
class is working in groups, do one group every day. If not, use your seating
plan to identify groups of students sitting in the same area. If you choose
students alphabetically, you may find that your eyes have to cover too much of
the room in order to encompass the selected students.
- Before
the unit or course begins, develop an estimate
of what would constitute appropriate learning outcomes for your students.
If you intend to use the information for making criterion-referenced
judgments, decide on what your criteria will be. You may wish to develop
minimum criteria (e.g., "six of the eight behaviors must be observed
over the course of the unit"), or you may wish to develop different
criteria levels for what would constitute excellent, satisfactory, or
unsatisfactory work. Decisions on criteria should be made before the
observation sequence begins.
- Before
every class, enter the names of the students, the
date, and the activity. During class, pay special attention to the
selected group so that you build an impression of their level of
competence or execution of the skills, processes, or attitudes you wish to
record.
Recording options: You may simply
mark an entry on the item's first appearance and leave it at that, or you may
record an item's every appearance. If you develop
some measure of degree to describe the item (e.g., !, ?, or X), you have
transformed your observation checklist into a rating scale. This is a
characteristic of rating scales and checklists that gives you more flexibility.
Make sure you record the date and the class on every observation checklist you
use.
- After
class, annotate the checklist sheet with any appropriate
thoughts. For example, "Fire drill interrupted the group activity -
recorded instances are therefore lower than I anticipated."
- File
the checklist sheet with the others so that the class set is available for
evaluation at the end of the course or unit. Large envelopes are useful
here.
Example: The example checklists are designed to give you ideas as to how to
set up this type of data recording technique. Keep in mind these are only
examples.
Using the
Information for Student Evaluation
Arrange the sheets into piles according to
the student groups. Read them all over once or twice to develop a feeling for
the overall class picture. For criterion-referenced judgments, refer to the
criterion levels you made initially. For norm-referenced judgments, estimate
where each student lies relative to the others in the class and make your
judgment. If you have looked for very general or broad items, be careful not to
over interpret your data - for example, "On these aspects of the course
Kim seems to be performing a little bit more consistently than most of the
students." This may be about the level of sophistication that is possible,
depending on how you constructed the instrument. For self-referenced judgments,
all the checklists on one particular student can be studied, providing a
measure of progress over the span of the unit or course. This is one of the
most powerful uses of the checklist.
- Where
you can, start with an existing checklist and modify it according to your
needs.
- Choose
items that relate to the intended learning outcomes of the unit. If you
wish to use checklists in several courses and they have many overlapping
items, develop a master list and eliminate those items that are
inappropriate for the specific unit or course.
- Choose
items that you can observe or reasonably infer. If an item is too vague
(e.g., interest in the subject), you may not be consistent throughout the
term in your estimation and recording of it.
- Keep
the list of items manageable. Twelve is about the maximum.
- Keep
the language of the items simple and jargon-free. In that way you can use
the checklists at parent-teacher or student-teacher interviews.
Variants
Develop
checklists that detail one particular series of components. For example, a
checklist on the correct operation of a microscope may be useful in minimum
competency situations where something just has to be done correctly.
As previously
mentioned, the observation checklist shares many characteristics with the
rating scale. This is an advantage that can be a time-saver for you.
Rating Scales Description
Rating scales are measuring
instruments that allow representation of the extent to which specific concepts,
skills, processes, or attitudes exist in students and their work.
Evaluation Context
Rating scales enable the teacher to record
student performance on a wide range of skills and attitudes. They are
particularly useful in situations where the student performance can be
described along a continuum, such as participation in a debate or skill in
preparing a microscope slide.
Guidelines for Use
As the rating scale is usually used during
class time, it must be simple to use.
- Developing
the rating scale
Once you decide upon the
activity you wish to rate, break it up into its constituent parts. Make the
parts as specific as possible so as to increase the scale's reliability. For
example, instead of globally rating "performance in debates," decide
on what performance criteria you wish to observe in the student. Perhaps
"states argument," "demonstrates background preparation,"
"responds to opposition arguments relevantly" might together give a
less inferential picture of the student's performance than the rating on the
global behavior alone.
The next task is to develop
the scale points. You might use the old stand-by: "very good/good/
average/poor/very poor," or you can develop more descriptive scale points.
For the criterion mentioned above, "states argument," you could
choose to use points based upon how forceful the student was: "very
forceful/forceful/average/ diffident/very diffident."
- Before
the unit or course begins
If you intend to use the
information for making criterion-referenced judgments, decide on what your
criteria will be. You may wish to develop minimum criteria such as, "six
of the eight behaviors must be rated at the satisfactory level or higher over
the course of the unit." Or you may wish to develop different criteria
levels for what would constitute excellent, satisfactory, or unsatisfactory
work.
Enter the names of the
students, the date, and the activity. This will usually be governed by the activity
being rated. If Peter and Petra
are facing off in today's debate, then theirs are the names entered.
As you form an impression of
student behavior on each criterion, mark the point on the continuum.
Examine the individual criteria
and decide on an overall rating for each student on the total behavior being
rated. File the rating sheet with the others so that the class set is available
as a record. Large envelopes are useful here.
Example: In the first example provided, the full sheet on 'Performance in
Debates' is developed. The other examples that follow are designed to give you
ideas as to how to set up this type of data recording method. Keep in mind
these are only examples.
Two Variants
Rating scales have many variants and any book
on measurement will offer examples. Two variants are described here.
Rating scales are very useful in allowing
students to perform self-evaluation on their own work. Present the student with
a rating scale that covers the aspects of the unit or project which you wish
him or her to self-evaluate. Examples may be the amount of effort expended in
research, the amount of effort expended on initial organization, the extent to
which the student reflected on the initial organization, the amount of
reorganization, or the effort spent on writing. The student's ratings on the
five-point scale can form a useful starting-point for teacher-student
dialogue.
The number line is a variant that is
particularly useful with pre-reading students. On a long piece of paper, draw a
horizontal line and mark off five to ten intervals. On the extreme left-hand
mark, draw a sad face, at the mid-point draw a neutral face, and at the
right-hand mark, draw a happy face. Mount the number line on the wall at a
suitable height. The student then places the left palm on the sad face and, in
response to a question (such as "How much did you like that story?"),
positions the right palm accordingly. If the story was not a success, then both
hands overlap on the unhappy face. By training the students to pass by the
number line fairly quickly, you can obtain rapid feedback on the question you
pose. With experience, more sophisticated questions can be asked. Here are
examples from a unit on estimation. "When you guessed the number of peas
in the pea pod that I showed you, how sure were you of your answer?"
"Now, when you guessed the number of Smarties in the bottle, how sure were
you?"