
Monday, November 28, 2011

Tools And Techniques Of Measurement And Evaluation


VALIDITY:
It is the degree to which a test measures what it is supposed to measure. (L. R. Gay)
  • Test validity refers to the degree to which the test actually measures what it claims to measure.
  • Test validity is also the extent to which inferences, conclusions, and decisions made on the basis of test scores are appropriate and meaningful.
  • Validity is the strength of our conclusions, inferences or propositions.
  • Validity refers to the accuracy of an assessment -- whether or not it measures what it is supposed to measure.
  • If a test is valid, it is almost always reliable.
Measurement of validity:
There are three ways in which validity can be measured.
  • Content validity
Definition: The extent to which the content of the test matches the instructional objectives.
Example/Non-example: A semester or quarter exam that only includes content covered during the last six weeks is not a valid measure of the course's overall objectives -- it has very low content validity.
  • Criterion validity
Definition: The extent to which scores on the test are in agreement with (concurrent validity) or predict (predictive validity) an external criterion.
Example/Non-example: If the end-of-year math tests in 4th grade correlate highly with the statewide math tests, they would have high concurrent validity.
  • Construct validity
Definition: The extent to which an assessment corresponds to other variables, as predicted by some rationale or theory.
Example/Non-example: If you can correctly hypothesize that ESOL students will perform differently on a reading test than English-speaking students (because of theory), the assessment may have construct validity.
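The criterion entry above describes validity as agreement with an external measure. As a rough sketch (not from the original text), assuming two hypothetical lists of scores for the 4th-grade classroom test and the statewide test, the concurrent validity coefficient can be computed as a Pearson correlation:

```python
# A minimal sketch of estimating concurrent validity, assuming hypothetical
# score lists for a classroom math test and the statewide math test.
import numpy as np

classroom_scores = np.array([72, 85, 64, 90, 78, 55, 88, 69])  # hypothetical
statewide_scores = np.array([70, 88, 60, 93, 75, 58, 85, 72])  # hypothetical

# Pearson correlation between the two score sets; values near 1.0 would
# suggest high concurrent validity for the classroom test.
r = np.corrcoef(classroom_scores, statewide_scores)[0, 1]
print(f"concurrent validity coefficient r = {r:.2f}")
```

A near-zero coefficient would be evidence against concurrent validity, since scores on the classroom test would then tell us little about standing on the external criterion.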
In order to have confidence that a test is valid (and therefore the inferences we make based on the test scores are valid), all three kinds of validity evidence should be considered. So, does all this talk about validity and reliability mean you need to conduct statistical analyses on your classroom quizzes? No, it doesn’t. (Although you may, on occasion, want to ask one of your peers to verify the content validity of your major assessments.) However, you should be aware of the basic tenets of validity and reliability as you construct your classroom assessments, and you should be able to help parents interpret scores for the standardized exams.
Types of Validity:
There are four types of validity commonly examined in social research.
  1. Conclusion validity asks: is there a relationship between the program and the observed outcome? Or, in our example, is there a connection between the attendance policy and the increased participation we saw?
  2. Internal validity asks: if there is a relationship between the program and the outcome we saw, is it a causal relationship? For example, did the attendance policy cause class participation to increase?
  3. Construct validity is the hardest to understand, in my opinion. It asks: is there a relationship between how I operationalized my concepts in this study and the actual causal relationship I am trying to study? Or, in our example, did our treatment (attendance policy) reflect the construct of attendance, and did our measured outcome -- increased class participation -- reflect the construct of participation? Overall, we are trying to generalize our conceptualized treatment and outcomes to broader constructs of the same concepts.
  4. External validity refers to our ability to generalize the results of our study to other settings. In our example, could we generalize our results to other classrooms?
Characteristics of Validity:
The following characteristics of validity are commonly examined in social research.
  1. Content Validity: How well the sample of test items represents the content the test is designed to measure.
  2. Predictive validity: How well predictions made by a test are confirmed by later behavior of subjects.
  3. Concurrent validity: Similar to predictive validity, but behavior is measured at same time as test.
  4. Construct validity: How well a particular test can be shown to measure a particular construct (a theoretical construction about the nature of human behavior, such as intelligence, anxiety, or creativity).
  5. Face validity: How closely the test appears to measure what it's supposed to measure.

RELIABILITY:
Reliability is the degree to which a test consistently measures whatever it measures. (L. R. Gay)
Test reliability refers to the degree to which a test is consistent and stable in measuring what it is intended to measure. Most simply put, a test is reliable if it is consistent within itself and across time. Reliability is the consistency of your measurement, or the degree to which an instrument measures the same way each time it is used under the same conditions with the same subjects -- in short, the repeatability of your measurement. It is the level of internal consistency or stability of the test over time, or the ability of the test to obtain the same score from the same student at different administrations (given the same conditions). Reliability is usually expressed as a correlation coefficient, with values ranging from .00 (low reliability) to 1.00 (perfect reliability).

Just as we enjoy having reliable cars (cars that start every time we need them), we strive to have reliable, consistent instruments to measure student achievement. Another way to think of reliability is to imagine a kitchen scale. If you weigh five pounds of potatoes in the morning, and the scale is reliable, the same scale should register five pounds for the potatoes an hour later (unless, of course, you peeled and cooked them). Likewise, instruments such as classroom tests and national standardized exams should be reliable -- it should not make any difference whether a student takes the assessment in the morning or afternoon, one day or the next.

Another measure of reliability is the internal consistency of the items. For example, if you create a quiz to measure students' ability to solve quadratic equations, you should be able to assume that if a student gets an item correct, he or she will also get other, similar items correct. The following table outlines three common reliability measures.
  • Stability or Test-Retest
How to measure: Give the same assessment twice, separated by days, weeks, or months. Reliability is stated as the correlation between scores at Time 1 and Time 2.
  • Alternate Form
How to measure: Create two forms of the same test (vary the items slightly). Reliability is stated as the correlation between scores on Form 1 and Form 2.
  • Internal Consistency
How to measure: Compare one half of the test to the other half, or use methods such as the Kuder-Richardson Formula 20 (KR20) or Cronbach's alpha.
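For the internal consistency row, the two named methods can be sketched directly. The following is a minimal illustration (not from the original text), assuming a small hypothetical matrix of right/wrong item scores:

```python
# A minimal sketch of Cronbach's alpha and KR-20, assuming a hypothetical
# matrix of dichotomous item scores (rows = students, columns = items;
# 1 = correct, 0 = incorrect).
import numpy as np

scores = np.array([
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1],
])
k = scores.shape[1]          # number of items
totals = scores.sum(axis=1)  # each student's total score

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)
item_vars = scores.var(axis=0, ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / totals.var(ddof=1))

# KR-20 substitutes p*(1-p) for the item variance of 0/1 items; textbooks
# differ on sample vs. population variance conventions, so the two values
# can differ slightly here.
p = scores.mean(axis=0)      # proportion answering each item correctly
kr20 = (k / (k - 1)) * (1 - (p * (1 - p)).sum() / totals.var(ddof=1))

print(f"Cronbach's alpha = {alpha:.2f}, KR-20 = {kr20:.2f}")
```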

Estimation of Reliability:
There are two ways in which reliability is usually estimated:
  1. Test/Retest: Test/retest is the more conservative method to estimate reliability. The idea is that you should get the same score on test 1 as you do on test 2.
The three main components to this method are as follows:
o    Implement your measurement instrument at two separate times for each subject;
o    Compute the correlation between the two separate measurements; and
o    Assume there is no change in the underlying condition (or trait you are trying to measure) between test 1 and test 2.
  2. Internal Consistency: Internal consistency estimates reliability by grouping questions in a questionnaire that measure the same concept. For example, you could write two sets of three questions that measure the same concept (say, class participation) and, after collecting the responses, run a correlation between those two groups of three questions to determine if your instrument is reliably measuring that concept (a sketch of both estimates follows below).
The primary difference between test/retest and internal consistency estimates of reliability is that test/retest involves two administrations of the measurement instrument, whereas the internal consistency method involves only one administration of that instrument.
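Both estimates reduce to simple correlations, as the following sketch (not from the original text) shows with hypothetical data:

```python
# A minimal sketch of the two estimation methods described above.
import numpy as np

# Test/retest: correlate scores from two administrations of the same test,
# assuming the underlying trait did not change in between.
time1 = np.array([78, 85, 62, 90, 71, 55, 88])  # hypothetical first administration
time2 = np.array([75, 88, 60, 92, 74, 58, 85])  # hypothetical second administration
test_retest_r = np.corrcoef(time1, time2)[0, 1]

# Internal consistency: two hypothetical sets of three questions that measure
# the same concept (say, class participation), each answered on a 1-5 scale.
# Rows = respondents, columns = questions.
set_a = np.array([[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 1], [3, 3, 4]])
set_b = np.array([[5, 4, 4], [2, 2, 3], [4, 5, 5], [1, 1, 2], [4, 3, 3]])
internal_r = np.corrcoef(set_a.sum(axis=1), set_b.sum(axis=1))[0, 1]

print(f"test/retest r = {test_retest_r:.2f}, internal consistency r = {internal_r:.2f}")
```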
The Relationship of Reliability and Validity
In order for assessments to be sound, they must be free of bias and distortion. Reliability and validity are two concepts that are important for defining and measuring bias and distortion. Reliability is a necessary but not sufficient condition for validity: if a test is not reliable, it cannot be valid, because a test must first measure consistently before it can be said to measure what it is intended to measure. The reverse does not hold -- a test can be perfectly reliable (a scale that always reads two pounds heavy is very consistent) and still not be valid.
OBJECTIVITY:
Objectivity is the extent to which the instrument is free from personal error (personal bias), that is, subjectivity on the part of the scorer. (C. V. Good)
The objectivity of a test refers to the degree to which equally competent scorers obtain the same results. (Norman E. Gronlund)
ACCURACY:
A term used to describe the size of the relative error. (C. V. Good)
ADEQUACY:
A characteristic evidenced by sufficient test length to sample widely the behavior the test is designed to measure. (C. V. Good)


Methods of Data Recording
The assessment techniques in this category may be used with any of the ongoing student activities as well as with the quizzes and tests. The appropriateness of the technique for the purpose intended should act as a guide.
Anecdotal records refer to written descriptions of student progress that a teacher keeps on a day-to-day basis.
A teacher may decide to keep anecdotal records on students' ability to manipulate materials at assessment stations, to work in a group, to work in a test-taking situation, or to complete a project or a written report. There are situations where a teacher will keep anecdotal comments on the development of specific skills related to instructional objectives, on the behavior of a student, or on the attitude expressed or demonstrated by a student. Anecdotal records are as flexible as a teacher wishes to make them.
Observation checklists are lists of criteria a teacher determines are important to observe in students at a particular time. Beside each of the criteria, a notation is made as to whether that particular criterion was observed.
Checklists can be used to record the presence or the absence of knowledge, particular skills, learning processes, or attitudes. They may be used to record such information in relation to written assignments, presentations, classroom performance, test-taking behaviors, individual or group work, fulfillment of the requirements of a contract, self- and peer-assessment of work, or completion of an assessment station. How a teacher wishes to use an observation checklist depends upon the type of student progress information required.
  • Rating Scales
Rating scales have the same usage as observation checklists. The essential difference lies in what is indicated. Observation checklists record the presence or absence of a particular knowledge item, skill, or process. Rating scales record the degree to which they are found or the quality of the performance.
Anecdotal Records Description
An anecdotal record is a written description of the observations made on students. These records are usually collected in a specific book or folder.


Evaluation Context
  • Formative
The very act of recording observations may serve to alert you to some aspect of a student's learning or attitude that may need immediate attention; for example, an outburst caused by frustration.
  • Summative
Since the anecdotal record concentrates on describing incidents of student performance over a period of time, the sequence of anecdotes can serve as a record of the student's development towards long term goals such as lifelong learning, healthy self-concept, cooperative learning, skill development, work/study habits, knowledge attainment, and interest/attitude.
  • Diagnostic
Through the regular spotlighting of a student's performance, areas needing special attention may emerge. Examples include communication skills and personal development. Your anecdotal records may start to show that Billy is consistently having trouble in expressing coherent thoughts. As a consequence, you may decide to investigate the causes of this behavior more thoroughly.
Using Technique to Best Advantage

Entries must be made with appropriate frequency. They should eventually encompass all the students, although some students may warrant more entries than others. Anecdotal records offer you a way of recording aspects of your students' learning that might not be identified by other techniques.
Guidelines for use
  • What to write
First, you write a description of the incident in an objective way by describing what actually happened. Then make further notes on your analysis of the situation, any comments you want to make, and any questions you pose to yourself that may guide further observations.
  • When to use
For many teachers, the time when students are engaged in writing offers an opportunity to demonstrate that teachers are writing, too. You can use a portion of your writing time for recording your anecdotes. Teachers who do not have these opportunities may use times when students are engaged in independent work. In program areas such as physical education and home economics, there are parts of the period when students change clothes or tidy up equipment. You might be able to use these times for recording entries. Whichever scheme is chosen, it should offer regular opportunities for entering observations.
  • How to record
Various formats have been developed. A notebook with each entry dated offers a powerful chronological record, although it is sometimes difficult to locate a particular student. Alphabetized notebooks, looking like large address books, are available, and they permit easy reference by student name. Alternatively, a loose-leaf format may be used so that the entries may be entered chronologically and, at the end of the year, reformatted by student name. One further idea: modern technology has provided us with conveniences for recording and storing student progress data that range from electronic student data files available on various software programs to removable self-stick notes that can be used to record the anecdote and then be affixed to the student record.
Example: No example is required for the open-ended, unstructured anecdotal record. The examples that follow are formats for anecdotal records designed to give you ideas as to how to set up this type of data recording method. Keep in mind these are only examples.
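In the same spirit, here is a minimal sketch (not from the original text) of how the electronic option mentioned above might structure an entry, following the "what to write" guidance: an objective description first, then analysis, comments, and guiding questions. All names and fields are illustrative.

```python
# A hypothetical structure for an electronic anecdotal-record entry.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class AnecdotalEntry:
    student: str
    entry_date: date
    description: str                  # what actually happened, stated objectively
    analysis: str = ""                # your interpretation of the incident
    comments: str = ""                # any further remarks
    questions: list[str] = field(default_factory=list)  # to guide later observation

# Entries kept chronologically can later be regrouped by student name,
# mirroring the notebook and loose-leaf formats described above.
records = [
    AnecdotalEntry("Billy", date(2011, 11, 28),
                   description="Struggled to state the main idea of the story aloud.",
                   analysis="Possible difficulty expressing coherent thoughts.",
                   questions=["Does this recur during written work?"]),
]
by_student = sorted(records, key=lambda e: e.student)
```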
Using the Information for Student Evaluation
While the entries themselves are usually not shown to the student or the parents/guardians, they can form a valuable basis for communication. They allow you to flesh out your year-end reports on the more holistic dimensions of student growth.
Observation Checklists Description
The observation checklist is a listing of specific concepts, skills, processes, or attitudes, the presence or absence of which you wish to record. If the observation checklist is used relatively frequently and over time, a longitudinal profile of a student is assembled and ultimately evaluated.
Evaluation Context
The observation checklist is most appropriately used in situations where you wish to assess your students' abilities, attitudes, or performance in process areas. For example, it can assess communication skills, cooperative learning skills, extent of participation, interest in the topic, and psychomotor skills.
Using Technique to Best Advantage
Used on a single occasion, the observation checklist can provide formative evaluation information for the situation in which it is used. For example, to learn how effective students are when working in groups, a checklist to observe them in a single group session can be used. This will provide information to guide future instruction.
Observation checklists are most useful when collected over time and used summatively or diagnostically. Once you decide to use observation checklists in your evaluation plan, you must use them systematically. They are misleading when used sporadically.
Guidelines for Use
Usually the observation checklist is used during class time. Therefore, it must be simple. The most efficient way to collect data is to record learning progress on four or five students at the same time. If you choose to observe four students per lesson and you have 28 students, you will cover the class once every seven lessons. At the end of the term or unit, you will have several observations on every student. If your class is working in groups, do one group every day. If not, use your seating plan to identify groups of students sitting in the same area. If you choose students alphabetically, you may find that your eyes have to cover too much of the room in order to encompass the selected students.
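The rotation just described is easy to plan in advance. As a minimal sketch (with placeholder student names), chunking the class list into fixed groups yields the seven-lesson cycle:

```python
# A minimal sketch of the observation rotation described above: four students
# per lesson, so a class of 28 is covered once every seven lessons.
students = [f"Student {i + 1}" for i in range(28)]  # placeholder names
group_size = 4

# Chunk the class (e.g., by seating plan) into fixed observation groups.
groups = [students[i:i + group_size] for i in range(0, len(students), group_size)]

for lesson, group in enumerate(groups, start=1):
    print(f"Lesson {lesson}: observe {', '.join(group)}")
# The cycle then repeats, giving several observations per student over a term.
```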
  • Before the unit or course begins, develop an estimate of what would constitute appropriate learning outcomes for your students. If you intend to use the information for making criterion-referenced judgments, decide on what your criteria will be. You may wish to develop minimum criteria (e.g., "six of the eight behaviors must be observed over the course of the unit"), or you may wish to develop different criteria levels for what would constitute excellent, satisfactory, or unsatisfactory work. Decisions on criteria should be made before the observation sequence begins.
  • Before every class, enter the names of the students, the date, and the activity. During class, pay special attention to the selected group so that you build an impression of their level of competence or execution of the skills, processes, or attitudes you wish to record.
Recording options: You may simply mark an entry on the item's first appearance and leave it at that, or you may record an item's every appearance (e.g., with tally marks). If you develop some measure of degree to describe the item (e.g., !, ?, or X), you have transformed your observation checklist into a rating scale. This is a characteristic of rating scales and checklists that gives you more flexibility. Make sure you record the date and the class on every observation checklist you use.
  • After class, annotate the checklist sheet with any appropriate thoughts. For example, "Fire drill interrupted the group activity - recorded instances are therefore lower than I anticipated." File the checklist sheet with the others so that the class set is available for evaluation at the end of the course or unit. Large envelopes are useful here.
Example: The example checklists are designed to give you ideas as to how to set up this type of data recording technique. Keep in mind these are only examples.
Using the Information for Student Evaluation
Arrange the sheets into piles according to the student groups. Read them all over once or twice to develop a feeling for the overall class picture. For criterion-referenced judgments, refer to the criterion levels you set initially. For norm-referenced judgments, estimate where each student lies relative to the others in the class and make your judgment. If you have looked for very general or broad items, be careful not to over-interpret your data - for example, "On these aspects of the course Kim seems to be performing a little bit more consistently than most of the students." This may be about the level of sophistication that is possible, depending on how you constructed the instrument. For self-referenced judgments, all the checklists on one particular student can be studied, providing a measure of progress over the span of the unit or course. This is one of the most powerful uses of the checklist. When constructing your own checklists, keep the following guidelines in mind:
  • Where you can, start with an existing checklist and modify it according to your needs.
  • Choose items that relate to the intended learning outcomes of the unit. If you wish to use checklists in several courses and they have many overlapping items, develop a master list and eliminate those items that are inappropriate for the specific unit or course.
  • Choose items that you can observe or reasonably infer. If an item is too vague (e.g., interest in the subject), you may not be consistent throughout the term in your estimation and recording of it.
  • Keep the list of items manageable. Twelve is about the maximum.
  • Keep the language of the items simple and jargon-free. In that way you can use the checklists at parent-teacher or student-teacher interviews.
Variants

Develop checklists that detail one particular series of components. For example, a checklist on the correct operation of a microscope may be useful in minimum competency situations where something just has to be done correctly.
As previously mentioned, the observation checklist shares many characteristics with the rating scale. This is an advantage that can be a time-saver for you.

Rating Scales Description
Rating scales are measuring instruments that allow representation of the extent to which specific concepts, skills, processes, or attitudes exist in students and their work.
Evaluation Context
Rating scales enable the teacher to record student performance on a wide range of skills and attitudes. They are particularly useful in situations where the student performance can be described along a continuum, such as participation in a debate or skill in preparing a microscope slide.
Guidelines for Use
As the rating scale is usually used during class time, it must be simple to use.
  • Developing the rating scale
Once you decide upon the activity you wish to rate, break it up into its constituent parts. Make the parts as specific as possible so as to increase the scale's reliability. For example, instead of globally rating "performance in debates," decide on what performance criteria you wish to observe in the student. Perhaps "states argument," "demonstrates background preparation," "responds to opposition arguments relevantly" might together give a less inferential picture of the student's performance than the rating on the global behavior alone.
The next task is to develop the scale points. You might use the old stand-by, "very good/good/average/poor/very poor," or you can develop more descriptive scale points. For the criterion mentioned above, "states argument," you could choose to use points based upon how forceful the student was: "very forceful/forceful/average/diffident/very diffident." A sketch of such a scale follows below.
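This minimal sketch (not from the original text) represents the debate example, with criteria and scale labels purely illustrative:

```python
# A hypothetical rating scale: the global behavior "performance in debates"
# broken into specific criteria, each rated on a five-point continuum.
SCALE_POINTS = ["very poor", "poor", "average", "good", "very good"]

debate_scale = {
    "states argument": SCALE_POINTS,
    "demonstrates background preparation": SCALE_POINTS,
    "responds to opposition arguments relevantly": SCALE_POINTS,
}

def rate(criterion: str, point: str) -> int:
    """Record a rating as its position (1-5) on the criterion's continuum."""
    return debate_scale[criterion].index(point) + 1

# Example: mark where a student falls on one criterion.
print(f"states argument: {rate('states argument', 'good')}/5")
```

Breaking the global behavior into specific criteria in this way is what the text means by increasing the scale's reliability: several narrow, observable judgments replace one broad inference.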
  • Before the unit or course begins
If you intend to use the information for making criterion-referenced judgments, decide on what your criteria will be. You may wish to develop minimum criteria such as, "six of the eight behaviors must be rated at the satisfactory level or higher over the course of the unit." Or you may wish to develop different criteria levels for what would constitute excellent, satisfactory, or unsatisfactory work.
  • Before every class
Enter the names of the students, the date, and the activity. This will usually be governed by the activity being rated. If Peter and Petra are facing off in today's debate, then theirs are the names entered.
  • Recording
As you form an impression of student behavior on each criterion, mark the point on the continuum.
  • After class
Examine the individual criteria and decide on an overall rating for each student on the total behavior being rated. File the rating sheet with the others so that the class set is available as a record. Large envelopes are useful here.

Example: In the first example provided, the full sheet on 'Performance in Debates' is developed. The other examples that follow are designed to give you ideas as to how to set up this type of data recording method. Keep in mind these are only examples.
Two Variants
Rating scales have many variants and any book on measurement will offer examples. Two variants are described here.
  • Self-evaluation
Rating scales are very useful in allowing students to perform self-evaluation on their own work. Present the student with a rating scale that covers the aspects of the unit or project which you wish him or her to self-evaluate. Examples may be the amount of effort expended in research, the amount of effort expended on initial organization, the extent to which the student reflected on the initial organization, the amount of reorganization, or the effort spent on writing. The student's ratings on the five-point scale can form a useful starting-point for teacher-student dialogue.
  • Number line
The number line is a variant that is particularly useful with pre-reading students. On a long piece of paper, draw a horizontal line and mark off five to ten intervals. On the extreme left-hand mark, draw a sad face; at the mid-point, draw a neutral face; and at the right-hand mark, draw a happy face. Mount the number line on the wall at a suitable height. The student then places the left palm on the sad face and, in response to a question (such as "How much did you like that story?"), positions the right palm accordingly. If the story was not a success, then both hands overlap on the sad face. By training the students to pass by the number line fairly quickly, you can obtain rapid feedback on the question you pose. With experience, more sophisticated questions can be asked. Here are examples from a unit on estimation. "When you guessed the number of peas in the pea pod that I showed you, how sure were you of your answer?" "Now, when you guessed the number of Smarties in the bottle, how sure were you?"
