PROGRAM EVALUATION, PARTICULARLY RESPONSIVE EVALUATION [1]
Robert E. Stake
Center for Instructional Research and Curriculum Evaluation
University of Illinois
I am pleased to have this opportunity to talk about some recent developments in the
methodology of program evaluation, and about what I call "responsive evaluation".
I feel fortunate to have, not only these two days, but some seven months to think about
these things. My hosts here at the Gothenburg Institute of Educational Research have been most
hospitable, but generous also in hearing me out, pointing my head in still another way, weighing
the merit of our several notions, and offering occasionally the luxury of a passionate argument.
When Erik or Hans or Sverker or Ulf and I agree, we are struck by the fact that the world is
but one world and the problems of education are universal. When we disagree, they are quick
to suggest that the peculiar conditions of education in America have caused me to make
peculiar assumptions, and perhaps even warped my powers of reasoning. I am sure that some
of you here today will share those findings. What I have to say is not only that we in
educational research need to be doing some things we have not been doing, but that in doing
what we have been doing we are in fact part of the problem.
Our main attention will be on program evaluation. A program may be strictly or loosely
defined. It might be as large as all the teacher-training in the United States or it might be as
small as a field-trip for the pupils of one classroom. The evaluation circumstances will be these:
that someone is commissioned in some way to evaluate a program, probably an on-going
program; that he has some clients or audiences to be of assistance to—usually including the
educators responsible for the program; and that he has the responsibility for preparing
communications with these audiences.
In 1965 Lee Cronbach, then President of the American Educational Research Association,
asked me to chair a committee to prepare a set of standards for evaluation studies, perhaps
like the Standards for Educational and Psychological Tests and Manuals, compiled by John
French and Bill Michael and published in 1966 by the American Psychological Association. Lee
Cronbach, Bob Heath, Tom Hastings, Hulda Grobman, and other educational researchers had worked with many of the U.S. curriculum reform projects in the 1950's and early 1960's, and had recognized the difficulty of evaluating curricula and the great need for guidance on the design of evaluation studies.
Our committee reported that it was too early to decide upon a particular method or set of
criteria for evaluating educational programs, that what educational researchers needed was a
period of field work and discussion to gain more experience in how evaluative studies could be
done. Ben Bloom, successor to Lee Cronbach in the Presidency of AERA, got the AERA to
sponsor a Monograph Series on Curriculum Evaluation for the purpose we recommended. The
seven volumes completed under AERA sponsorship are shown on the handout sheet. The series
in effect will continue under sponsorship of the UCLA Center for the Study of Evaluation,
whose director, Marv Alkin, was a guest professor here at this Institute for Educational
Research two years ago. I think this Monograph Series can take a good share of the credit, or
blame, for the fact that, by my count, over 200 sessions in the 1973 AERA Annual Meeting program were directly related to the methods and results of program evaluation studies.

[1] Keynote presentation at a conference on "New Trends in Evaluation" in October 1973, at the Institute of Education at Göteborg University.

There were two primary models for program evaluation in 1965, and there are two today.
One is the informal study, perhaps a self-study, usually using information already available,
relying on the insights of professional persons and respected authorities. It is the approach of
regional accrediting associations for secondary schools and colleges in the United States, and is
exemplified by the Flexner report (1910) on medical education in the USA and by the Coleman report (1966) on equality of educational opportunity. On the sheet you received with your background reading materials, one entitled Prototypes of Curriculum Evaluation, I have ever-so-briefly described this and other models: this one is referred to there as the School Accreditation Model. Most educators are partial to this evaluation model, more so if they can specify who
the panel members or examiners are. Researchers do not like it because it relies so much on
second-hand information. But there is much good about the model.
Most researchers have preferred the other model, the pretest/post-test model, what I have
referred to on the prototype sheet as Ralph Tyler's model. It often uses prespecified statements
of behavioral objectives—such as are available from Jim Popham's Instructional Objectives
Exchange—and is nicely represented by Tyler's Eight-Year Study, Husén's International
Education Study and the National Assessment of Educational Progress. The focus of attention
with this model is primarily on student performance.
Several of us have proposed other models. In a 1963 article Cronbach expressed his preference to have evaluation studies considered applied research on instruction, to learn what could be learned in general about curriculum development, as was done on Hilda Taba's Social Studies
Curriculum Project. Mike Scriven strongly criticized Cronbach's choice in AERA Monograph No. 1, stating that it was time to give consumers (purchasing agents, taxpayers, and parents) information on how good each existing curriculum is. To this end, Kenneth Komoski established in New York City an Educational Products Information Exchange, which has reviewed equipment, books, and teaching aids but has to this day still not caught the buyer's eye.
Dan Stufflebeam was one who recognized that the designs preferred by researchers did not
focus on the variables that educational administrators have control over. With support from
Egon Guba, Dave Clark, Bill Gephart and others, he proposed a model for evaluation that
emphasized the particular decisions that a program manager will face. Data-gathering would
include data on Context, Input, Process and Product; but analysis would relate those things to
the immediate management of the program. Though Mike Scriven criticized this design too,
saying that it had too much bias toward the concerns and the values of the education
establishment, this Stufflebeam CIPP model was popular in the U.S. Office of Education for
several years. Gradually it fell into disfavor because it was not generating the information—or
the protection—that program sponsors and directors needed. But that occurred, I think, not
because it was a bad model, but partly because managers were unable or unwilling to examine
their own operations as part of the evaluation. Actually no evaluation model could have
succeeded. A major obstacle was a federal directive which said that no federal office could
spend its funds to evaluate its own work, that that could only be done by an office higher up.
Perhaps the best examples of evaluation reports following this approach are those done in the
Pittsburgh schools by Mal Provus and Esther Kresh. Before I describe the approach that I have
been working on—which I hope will someday challenge the two major models—I will mention
several relatively recent developments in the evaluation business.
It is recognized, particularly by Mike Scriven and Ernie House, that co-option is a problem,
that the rewards to an evaluator for producing a favorable evaluation report often greatly
outweigh the rewards for producing an unfavorable report. I do not know of any evaluators
who falsify their reports, but I do know many who consciously or unconsciously choose to emphasize the objectives of the program staff and to concentrate on the issues and variables
most likely to show where the program is successful. I often do this myself. Thus the matter of
"meta-evaluation", providing a quality-control for the evaluation activities, has become an
increasing concern.
Early in his first term of office President Nixon created a modest Experimental Schools Program, a program of five-year funding for three carefully selected high schools (from all those in the whole country) and the elementary schools that feed students into them. Three more have
been chosen each year, according to their proposal to take advantage of a broad array of
knowledge and technical developments, and to show how good a good school can be. The
evaluation responsibility was designed to be allocated at three separate levels, one internal at
the local school level; one external at the local school level (i.e., in the community, attending to
the working of the local school, but not controlled by it) and a third at the national level,
synthesizing results from the local projects and evaluating the organization and effects of the
Experimental Schools Program as a whole. Many obstacles and hostilities hampered the work
of the first two evaluation teams. And work at the third level—according to Egon Guba who
did a feasibility study—was seen to be so likely to fail that it probably should be carried no
further.
Mike Scriven has made several suggestions for meta-evaluation, the most widely circulated of which is based on abstinence and is called "Goal-free evaluation." Sixten Marklund has jokingly called it "Aimless evaluation." But it is a serious notion: not to ignore all ideas of goals, but to refrain completely from any personal discussion of goals with the program sponsors or staff. The evaluator, perhaps with the help of colleagues and consultants, is then expected to recognize manifest goals and accomplishments of the program as he works in the field. Again, with the concern for the consumer of education, Scriven has argued that what is intended is not important, that the program is a failure if its results are so subtle that they do not penetrate the awareness of an alert evaluator. Personally I fault Scriven for expecting us evaluators to be as
sensitive, rational, and alert as his designs for evaluation require. I sometimes think that Mike
Scriven designs evaluation studies that perhaps only Mike Scriven is capable of carrying out.
Another interesting development is the use of adversarial procedures in obtaining evidence
of program quality and especially in presenting it to decision makers. Tom Owens, Murray
Levine, and Marilyn Kourilsky have taken the initiative here. They have drawn upon the work
of legal theorists who claim that truth emerges when opposing forces submit their evidence to
cross-examination directly before the eyes of judges and juries. Craig Gjerde, Terry Denny and I
tried something like this in our TCITY REPORT. You have a copy of it in the conference reading
materials you received several weeks ago. If you have that orange-colored document with you
you might turn to the very last pages, pages 26 and 27. On page 26 you find a summary of the
most positive claims that might reasonably be made for the Institute we were evaluating. On
page 27 is a summary of the most damaging charges that might reasonably be made. It was
important to us to leave the issue unresolved, to let the reader decide which claim to accept, if
any. But we would have served the reader better if we had each written a follow-up statement
to challenge the other's claims. At any rate, this is an example of using an adversary technique
in an evaluation study.
Now in the next 45 minutes or so I want to concentrate on the approach for evaluating
educational programs presently advocated by Malcolm Parlett of the University of Edinburgh,
Barry MacDonald of the University of East Anglia, Lou Smith of Washington University in St.
Louis, Bob Rippey of the University of Connecticut, and myself. You have had an opportunity
to read an excellent statement by Malcolm Parlett and David Hamilton. As they did, I want
to emphasize the settings where learning occurs, teaching transactions, judgment data, holistic
reporting, and giving assistance to educators. I should not suggest that they endorse all I will
say today, but their writings for the most part are harmonious with mine.

Let me start with a basic definition, one that I got from Mike Scriven. Evaluation is an
OBSERVED VALUE compared to some STANDARD. It is a simple ratio, but its numerator is not simple. In program evaluation it pertains to the whole constellation of values held for the
program. And the denominator is not simple, for it pertains to the complex of expectations and
criteria that different people have for such a program.
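Written out, only as a sketch and in notation of my own added for this written version (it is not part of Scriven's definition), the comparison looks like this:

    \[
      \text{evaluation} \;=\; \frac{\text{observed value}}{\text{standard}}
      \;\longrightarrow\;
      \frac{\{V_1, V_2, \ldots\}\ \text{(the constellation of values held for the program)}}
           {\{S_1, S_2, \ldots\}\ \text{(the expectations and criteria different people hold for it)}}
    \]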
The basic task for an evaluator is made barely tolerable by the fact that he does not have to
solve this equation in some numerical way or to obtain a descriptive summary grade, but needs
merely to make a comprehensive statement of what the program is observed to be, with useful
references to the satisfaction and dissatisfaction that appropriately selected people feel toward
it. Any particular client may want more than this; but this satisfies the minimum concept, I
think, of an evaluation study.
If you look carefully at the TCITY REPORT you will find no direct expression of this
formula, but it is in fact the initial idea that guided us. The form of presentation we used was
chosen to convey a message about the Twin City Institute to our readers in Minneapolis and St.
Paul, rather than to be a literal manifestation of our theory of evaluation.
Our theory of evaluation emphasizes the distinction between a preordinate approach and a
responsive approach. In the recent past the major distinction made by methodologists has been that between what Scriven called formative and summative evaluation. He gave attention to the difference between developing and already-developed programs, and implicitly to evaluation for a local audience of a program in a specific setting as contrasted to evaluation for
many audiences of a potentially generalizable program. These are important distinctions, but I
find it even more important to distinguish between preordinate evaluation studies and
responsive evaluation studies.
I have made the point that there are many different ways to evaluate educational programs.
No one way is the right way. Some highly recommended evaluation procedures do not yield a
full description, nor a view of the merit and shortcoming of the program being evaluated. Some
procedures ignore pervasive questions that should be raised whenever educational programs are
evaluated:
- Do all students benefit or only a special few?
- Does the program adapt to instructors with unusual qualifications?
- Are opportunities for aesthetic experience realized?
Some evaluation procedures are insensitive to the uniqueness of the local conditions. Some are
insensitive to the quality of the learning climate provided. Each way of evaluating leaves some
things de-emphasized.
I prefer to work with evaluation designs that perform a service. I expect the evaluation
study to be useful to specific persons. An evaluation probably will not be useful if the
evaluator does not know the interests and language of his audiences. During an evaluation
study, a substantial amount of time may be spent learning about the information needs of the
persons for whom the evaluation is being done. The evaluator should have a good sense of
whom he is working for and their concerns.
Responsive Evaluation
To be of service and to emphasize evaluation issues that are important for each particular
program, I recommend the responsive evaluation approach. It is an approach that sacrifices some precision in measurement, hopefully to increase the usefulness of the findings to persons in
and around the program. Many evaluation plans are more "preordinate," emphasizing
(1) statement of goals, (2) use of objective tests, (3) standards held by program personnel, and
(4) research-type reports. Responsive evaluation is less reliant on formal communication, more
reliant on natural communication.
Responsive evaluation is an alternative, an old alternative. It is evaluation based on what
people do naturally to evaluate things: they observe and react. The approach is not new. But
it has been avoided in planning documents and institutional regulations because, I believe, it is
subjective, poorly suited to formal contracts, and a little too likely to raise the more
embarrassing questions. I think we can overcome the worst aspects of subjectivity, at least.
Subjectivity can be reduced by replication and operational definition of ambiguous terms, even
while we are relying heavily on the insights of personal observation.
An educational evaluation is responsive evaluation (1) if it orients more directly to program
activities than to program intents, (2) if it responds to audience requirements for information,
and (3) if the different value-perspectives of the people at hand are referred to in reporting the
success and failure of the program. In these three separate ways an evaluation plan can be
responsive.
To do a responsive evaluation, the evaluator of course does many things. He makes a plan
of observations and negotiations. He arranges for various persons to observe the program.
With their help he prepares brief narratives, portrayals, product displays, graphs, etc. He finds
out what is of value to his audiences. He gathers expressions of worth from various individuals
whose points of view differ. Of course, he checks the quality of his records. He gets program
personnel to react to the accuracy of his portrayals. He gets authority figures to react to the
importance of various findings. He gets audience members to react to the relevance of his
findings. He does much of this informally, iterating, and keeping a record of action and
reaction. He chooses media accessible to his audiences to increase the likelihood and fidelity of
communication. He might prepare a final written report; he might not—depending on what he
and his clients have agreed on.
Purposes and Criteria
Many of you will agree that the book edited by E. F. Lindquist, Educational Measurement,
has been the bible for us who have specialized in educational measurement. Published in 1951
it contained no materials on program evaluation. The second edition, edited by Bob Thorndike,
has a chapter on program evaluation. Unfortunately, the authors of this chapter, Alex Astin
and Bob Panos, chose to emphasize but one of the many purposes of evaluation studies. They
said:
"The principal purpose of evaluation is to produce information that can guide
decisions concerning the adoption or modification of an educational program."
People expect evaluation to accomplish many different purposes:
- to document events
- to record student change
- to detect institutional vitality
- to place the blame for trouble
- to aid administrative decision making
- to facilitate corrective action
- to increase our understanding of teaching and learning.

Each of these purposes is related directly or indirectly to the values of a program, and may be a
legitimate purpose for a particular evaluation study. It is very important to realize that each purpose needs separate data; all the purposes cannot be served with a single collection of data.
Only a few questions can be given prime attention. We should not let Astin and Panos decide
what questions to attend to, or Tyler, or Stake. Each evaluator, in each situation, has to decide
what to attend to. The evaluator has to decide.
On what basis will he choose the prime questions? Will he rely on his preconceptions? Or
on the formal plans and objectives of the program? Or on actual program activities? Or on the
reactions of participants? It is at this choosing that an evaluator himself is tested.
Most evaluators can be faulted for over-reliance on preconceived notions of success. I
advise the evaluator to give careful attention to the reasons the evaluation was commissioned,
then to pay attention to what is happening in the program, then to choose the value questions
and criteria. He should not fail to discover the best and worst of program happenings. He
should not let a list of objectives or an early choice of data-gathering instruments draw
attention away from the things that most concern the people involved.
Many of my fellow evaluators are committed to the idea that good education results in
measurable outcomes: student performance, mastery, ability, attitude. But I believe it is not
always best to think of the instrumental value of education as a basis for evaluating it. The
"payoff" may be diffuse, long delayed; or it may be ever beyond the scrutiny of evaluators. In
art education, for example, it is sometimes the purpose of the program staff or parent to
provide artistic experiences—and training—for the intrinsic value alone. "We do these things
because they are good things to do" says a ballet teacher. Some science professors speak
similarly about the experiential value of reconstructing certain classical experiments. The
evaluator or his observers should note whether or not those learning experiences were well
arranged. They should find out what appropriately selected people think are the "costs" and
"benefits" of these experiences in the dance studio or biology laboratory. The evaluator should
not presume that only measurable outcomes testify to the worth of the program.
Sometimes it will be important for the evaluator to do his best to measure student outcomes,
other times not. I believe that there are few "critical" data in any study, just as there are few
"critical" components in any learning experience. The learner is capable of using many
pathways, many tasks, to gain his measure of skill and aesthetic "benefit". The evaluator can
take different pathways to reveal program benefit. Tests and other data-gathering should not
be seen as essential; neither should they be automatically ruled out. The choice of these
instruments in responsive evaluation should be made as a result of observing the program in
action and of discovering the purposes important to the various groups having an interest in the
program.
Responsive evaluations require planning and structure; but they rely little on formal
statements and abstract representations, e.g., flow charts, test scores. Statements of objectives,
hypotheses, test batteries, teaching syllabi are, of course, given primary attention if they are
primary components of the instructional program. Then they are treated not as the basis for the
evaluation plan but as components of the instructional plan. These components are to be
evaluated just as other components are. The proper amount of structure for responsive
evaluation depends on the program and persons involved.
Substantive Structure
Instead of objectives or hypotheses as "advance organizers" for an evaluation study, I prefer issues. I think the word "issues" better reflects a sense of complexity, immediacy, and
valuing. After getting acquainted with a program, partly by talking with students, parents, taxpayers, program sponsors, and program staff, the evaluator acknowledges certain issues or
problems or potential problems. These issues are a structure for continuing discussions with
clients, staff, and audiences. These issues are a structure for the data-gathering plan. The
systematic observations to be made, the interviews and tests to be given, if any, should be those
that contribute to understanding or resolving the issues identified.
In evaluating TCITY, Craig Gjerde and I became aware of such issue-questions as:
- Is the admissions policy satisfactory?
- Are some teachers too "permissive"?
- Why do so few students stay for the afternoon?
- Is opportunity for training younger teachers well used?
- Is this Institute a "lighthouse" for regular school curriculum innovation?
The importance of such questions varies during the evaluation period. Issues which are
identified early as being important tend to be given too much attention in a preordinate data
plan, and issues identified toward the end are likely to be ignored. Responsive-evaluation
procedures allow the evaluator to respond to emerging issues as well as to preconceived issues.
The evaluator usually needs more structure than a set of questions to help him decide "what
data to gather". To help the evaluator conceptualize his "shopping list", I once wrote a paper
entitled "The countenance of Educational Evaluation". It contained the matrix, the 13
information categories, shown in this presentation on the screen. You may notice that my
categories are not very different from those called for in the models of Dan Stufflebeam and Mal
Provus.
For different evaluation purposes there will be different emphases on one side of the matrix
or the other: descriptive data and judgemental data. And, similarly, there will be different
emphases on antecedent, transaction, and outcome information. The "countenance" article also
emphasized the use of multiple and even contradicting sources of information.
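Since the matrix itself appears only on the screen, a rough sketch of its categories as a data-gathering checklist may help readers of this written version. The cell names below are taken from the surrounding discussion (intents and observations on the descriptive side, standards and judgments on the judgmental side, each crossed with antecedents, transactions, and outcomes, plus the program rationale); the code layout itself is only an illustration, not part of the "countenance" paper.

    # A sketch of the "countenance" categories as a data-gathering checklist.
    # Category names follow the discussion above; this layout is illustrative only.
    ROWS = ("antecedents", "transactions", "outcomes")
    DESCRIPTIVE = ("intents", "observations")   # descriptive side of the matrix
    JUDGMENTAL = ("standards", "judgments")     # judgmental side of the matrix

    def empty_countenance_matrix():
        """Return the 12 data cells plus the program rationale (13 categories in all)."""
        cells = {row: {col: [] for col in DESCRIPTIVE + JUDGMENTAL} for row in ROWS}
        return {"rationale": [], "cells": cells}

    # Example: an evaluator filing one observation and one judgment (invented entries).
    matrix = empty_countenance_matrix()
    matrix["cells"]["transactions"]["observations"].append(
        "Fewer than half the students stayed for the afternoon sessions.")
    matrix["cells"]["outcomes"]["judgments"].append(
        "Parents judged the gains in student confidence to be worth the cost.")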

It also pointed out the often ignored question about the match-up between intended
instruction and observed instruction; and the even more elusive question about the strength of
the contingency of observed outcomes upon observed transactions, under the particular
conditions observed. I think these "countenance" ideas continue to be good ones for planning
the content of the evaluation study.
I like to think of all of these data as observations: intents, standards, judgements, and
statements of rationale are observed data too. Maybe it was a mistake to label just the second
column "Observations". Thoreau said:
Could a greater miracle take place than for us to look through each other's eyes
for an instant?
Human observers are the best instruments we have for many evaluation issues. Performance
data and preference data can be psychometrically scaled when objectively quantified data are
called for. The important matter for the evaluator is to get his information in sufficient amount
from numerous independent and credible sources so that it effectively represents the perceived
status of the program, however complex.
Functional Structure
"Which data" is one thing but "how to do the evaluation" is another. My responsive-
evaluation plan allocates a large expenditure of evaluation resources to observing the program.
The plan is not divided into phases because observation and feedback continue to be the
important functions from the first week through the last. I have identified twelve recurring
events. On the screen here I show them as if on the face of a clock. I know some of you would
remind me that a clock moves clockwise, so I hurry to say that this clock moves clockwise and
counter-clockwise and cross-clockwise. In other words, any event can follow any event.
Furthermore, many events occur simultaneously, and the evaluator returns to each event many
times before the evaluation ends.
For example, take twelve o'clock. The evaluator will discuss many things on many
occasions with the program staff and with people who are representative of his audiences. He
will want to check his ideas of program scope, activities, purposes, and issues against theirs.
He will want to show them his representations (e.g., sketches, displays, portrayals,
photographs, tapes) of value questions, activities, curricular content, and student products.
Reactions to these representations will help him learn how to communicate in this setting. He
should provide useful information. He should not pander to desires for only favorable (or only
unfavorable) information, nor should he suppose that only the concerns of evaluators and
external authorities are worthy of discussion. (Of course, these admonitions are appropriate
for responsive evaluation and preordinate evaluation alike.)
This behavior of the responsive evaluator is very different from the behavior of the
preordinate evaluator. Here on the screen now is my estimate as to how the two evaluators
would typically spend their time.
                                 Preordinate    Responsive
Identifying issues, goals            10%            10%
Preparing instruments                30%            15%
Observing the program                 5%            30%
Administering tests, etc.            10%            ----
Gathering judgments                  ----           15%
Learning client needs, etc.          ----            5%
Processing formal data               25%             5%
Preparing informal reports           ----           10%
Preparing formal reports             20%            10%
I believe the preordinate evaluator conceptualizes himself as a stimulus, seldom as a
response. He does his best to generate standardized stimuli, such as behavioral objective
statements, test items, or questionnaire items. The responses that he evokes are what he
collects as the substance of his evaluation report.
The responsive evaluator considers the principal stimuli to be those naturally occurring in
the program, including responses of students and the subsequent dialogues. At first his job is to
record these, learning both of happenings and values. For additional information he assumes a
more interventionist role. And with his clients and audience he assumes a still more active role,
stimulating their thought (we hope) and adding to their experience with his reports.
Philosopher David Hawkins responded to the idea of reversing S - R roles in this way:
...I like the observation that one is reversing the S and R of it. In an experiment
one puts the system in a prepared state, and then observes the behavior of it.
Preparation is what psychologists call "stimulus", ...In naturalistic investigation
one does not prepare the system, but looks for patterns, structures, significant
events, as they appear under conditions not controlled or modified by the
investigator, who is himself now a system of interest. He is a resonator, a
respondent. He must be in such an initial state that (a) his responses contain
important information about the complex of stimuli he is responding to, and
(b) they must be maximally decodable by his intended audience.
In the next section of this paper I will talk about maximally decodable reports. Let me
conclude these two sections on structure by saying that the evaluator should not rely only on his
own powers of observation, judgment and responding. He should enlist a platoon of students,
teachers, community leaders, curriculum specialists, etc.—his choice depending on the issues to
be studied and the audiences to be served. The importance of their information, and the
reliability of it, will increase as the number and variety of observers increase.
Portrayal and Holistic Communication
Maximally decodable reports require a technology of reporting that we educational
measurement people have lacked. We have tried to be impersonal, theoretical, generalizable.
We have sought the parsimonious explanation. We have not accepted the responsibility for
writing in a way that is maximally comprehensible to practicing educators and others concerned
about education. According to R. F. Rhyne:
There is a great and growing need for the kind of powers of communication that
help a person gain, vicariously, a feeling for the natures of fields too extensive
and diverse to be directly experienced.
Prose and its archetype, the mathematical equation, do not suffice. They offer
more specificity within a sharply limited region of discourse than is safe, since
the clearly explicit can be so easily mistaken for truth, and the difference can be
large when context is slighted.

We need this power of communication, this opportunity for vicarious experience, in our
attempts to solve educational problems.
One of the principal reasons for backing away from the preordinate approach to evaluation
is to improve communication with audiences. The conventional style of research-reporting is a
"clearly explicit" way of communication. In a typical research project the report is limited by
the project design. A small number of variables are identified and relationships among them are
sought. Individuals are observed, found to differ, and distributions of scores are displayed.
Covariations of various kinds are analyzed and interpreted. From a report of such analytic
inquiry it is very hard, often impossible, for a reader to know "what the program was like". If he
is supposed to learn "what the program was like", the evaluation report should be different
from the conventional research report.
As a part of my advocacy of the responsive approach I have urged my fellow evaluators to
respond to what I believe are the natural ways in which people assimilate information and
arrive at understanding. Direct personal experience is an efficient, comprehensive, and
satisfying way of creating understanding, but is a way not usually available to our evaluation-
report audiences. The best substitute for direct experience probably is vicarious
experience—increasingly better when the evaluator uses "attending" and "conceptualizing" styles
similar to those which members of the audience use. Such styles are not likely to be those of the
specialist in measurement or theoretically minded social scientist. Vicarious experience often
will be conceptualized in terms of persons, places, and events.
We need a reporting procedure for facilitating vicarious experience. And it is available.
Among the better evangelists, anthropologists, and dramatists are those who have developed
the art of story-telling. We need to portray complexity. We need to convey holistic impression,
the mood, even the mystery of the experience. The program staff or people in the community
may be "uncertain". The audiences should feel that uncertainty. More ambiguity rather than
less may be needed in our reports. Oversimplification obfuscates. Ionesco said,
As our knowledge becomes separated from life, our culture no longer contains
ourselves (or only an insignificant part of ourselves), for it forms a "social"
context into which we are not integrated.
So the problem becomes that of bringing our life back into contact with our
culture, making it a living culture once again. To achieve this, we shall first have
to kill "the respect for what is written down in black and white..." to break up
our language so that it can be put together again in order to re-establish contact
with "the absolute", or as I should prefer to say, with "multiple reality"; it is
imperative to "push human beings again towards seeing themselves as they really
are". (p. 298).
Some evaluation reports should reveal the "multiple reality" of an educational experience.
The responsive evaluator will often use portrayals. Some will be short, featuring perhaps a
five-minute "script", a log, or scrapbook. A longer portrayal may require several media:
Narratives, maps and graphs, exhibits, taped conversations, photographs, even audience role-
playing. Which ingredients best convey the sense of the program to a particular audience? The
ingredients are determined by the structure chosen by the evaluator.
Suppose that a junior-high-school art program is to be evaluated. For portrayal
of at least one issue, "how the program affects every student", the students might be thought of as being in two groups: those taking at least one fine-arts course
and those taking none. (The purpose here is description, not comparison).
A random sample of ten students from each group might be selected and twenty
small case studies developed. The prose description of what each does in
classes of various kinds (including any involvement with the arts in school) might
be supplemented with such things as (1) excerpts from taped interviews with the
youngster, his friends, his teachers, and his parents; (2) art products (or
photographs, news clippings, etc., of same) made by him in or out of class;
(3) charts of his use of leisure time; and (4) test scores of his attitudes toward
the arts. A display (for each student) might be set up in the gymnasium which
could be examined reasonably thoroughly in 10-20 minutes.
Other materials, including the plan, program, and staffing for the school, could
be provided. Careful attention would be directed toward finding out how the
description of these individual youngsters reveals what the school and other
sources of art experience are providing in the way of art education.
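If it is useful to see the selection step spelled out, here is a minimal sketch of drawing the twenty case-study students. The two groups and the sample of ten per group come from the example above; the roster and its field names are invented for illustration.

    # A sketch of the sampling step in the art-program example above.
    # Groups and sample size come from the text; the roster fields are invented.
    import random

    def select_case_study_students(roster, per_group=10, seed=None):
        """Split the roster by fine-arts enrollment and draw a random sample from each group."""
        rng = random.Random(seed)
        taking_art = [s for s in roster if s["fine_arts_courses"] >= 1]
        taking_none = [s for s in roster if s["fine_arts_courses"] == 0]
        return {
            "taking at least one fine-arts course": rng.sample(taking_art, per_group),
            "taking none": rng.sample(taking_none, per_group),
        }

    # Usage with a hypothetical roster: twenty small case studies, ten from each group.
    roster = [{"name": f"student {i}", "fine_arts_courses": i % 3} for i in range(300)]
    case_study_students = select_case_study_students(roster, per_group=10, seed=1973)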
It will sometimes be the case that reporting on the quality of education will require a "two-
stage" communication. Some audiences will not be able to take part in such a vicarious
experience as that arranged in the example above. A surrogate audience may be selected. The
evaluator will present his portrayals to them; then he will question them about the apparent
activity, accomplishments, issues, strengths and shortcomings of the program. He will report
their reactions, along with a more conventional description of the program, to the true
audiences.

These twenty displays could be examined by people specially invited to review
and respond to them. The reviewers might be students, teachers, art
curriculum specialists, and patrons of the arts. They might also visit regular
school activities, but most attention would be to the displays. These reviewers
should be asked to answer such questions as "Based on these case studies, is the
school doing its share of providing good quality art experience for all the young
people?" and "Is there too much emphasis on disciplined creative performance
and not enough on sharing the arts in ways that suit each student's own tastes?"
Their response to these portrayals and questions would be a major part of the
evaluation report.
The portrayal will usually feature descriptions of persons. The evaluator will find that case
studies of several students may more interestingly and faithfully represent the educational
program than a few measurements on all of the students.
The promise of gain is two-fold:
the readers will comprehend the total program, and some of the important complexity of the
program will be preserved. The several students usually cannot be considered a satisfactory
representation of the many—a sampling error is present. The protests about the sampling error
will be loud; but the size of the error may be small, and it will often be a satisfactory price to
pay for the improvement in communication.
There will continue to be many research inquiries needing social survey technology and exact
specification of objectives. The work of John Tukey, Torsten Husén, Ralph Tyler, Ben Bloom,
and James Popham will continue to serve as a model for such studies.
Often the best strategy will be to select achievement tests, performance tests, or observation
checklists to provide evidence that prespecified goals were or were not achieved. The
investigator should remember that such a preordinate approach depends on a capability to
state the important purposes of education and a capability to discern the accomplishment of
those purposes, and those capabilities sometimes are not at our command. The preordinate
approach usually is not sensitive to ongoing changes in program purpose, nor to unique ways in
which students benefit from contact with teachers and other learners, nor to dissimilar
viewpoints that people have as to what is good and bad.
Elliot Eisner nicely summarized these insensitivities in AERA Monograph 3. He advocated
consideration of expressive objectives—toward outcomes that are idiosyncratic for each learner
and that are conceptualized and evaluated after the instructional experience; after a product,
an awareness, or a feeling has become manifest, at a time when the teacher and learner can
reflect upon what has occurred. Eisner implied that sometimes it would be preferable to
evaluate the quality of the opportunity to learn—the "intrinsic" merit of the experience rather
than the more elusive "payoff," to use Scriven's terms.
In my own writing on evaluation I have been influenced by Eisner and Scriven and others
who have been dissatisfied with contemporary testing. We see too little good measurement of
complex achievements, development of personal styles and sensitivities. I have argued that
few, if any, specific learning steps are truly essential for subsequent success in any of life's
endeavors. I have argued that students, teachers, and other purposively selected observers
exercise the most relevant critical judgements, whether or not their criteria are in any way
explicit. I have argued also that the alleviation of instructional problems is most likely to be
accomplished by the people most directly experiencing the problem, with aid and comfort
perhaps (but not with specific solutions or replacement programs) from consultants or external
authorities. I use these arguments as assumptions for what I call the responsive evaluation
approach.

Utility and Legitimacy
The task of evaluating an educational program might be said to be impossible if it were
necessary to express verbally its purposes or accomplishments. Fortunately, it is not necessary
to be explicit about aim, scope, or probable cause in order to indicate worth. Explication will
usually make the evaluation more useful; but it also increases the danger of misstatement of
aim, scope, and probable cause.
To layman and professional alike, evaluation means that someone will report on the
program's merits and shortcomings. The evaluator reports that a program is "coherent",
"stimulating", "parochial", and "costly". These descriptive terms are also value-judgement terms.
An evaluation has occurred. The validity of these judgements may be strong or weak; their
utility may be great or little. But the evaluation was not at all dependent on a careful
specification of the program's goals, activities or accomplishments. In planning and carrying out
an evaluation study, the evaluator must decide how far to go beyond the bare bones ingredients:
values and standards. Many times he will want to examine goals. Many times he will want to
provide a portrayal from which audiences may form their own value judgements.
The purposes of the audiences are all important. What would they like to be able to do
with the evaluation of the program? Chances are they do not have any plans for using it. They
may doubt that the evaluation study will be of use to them. But charts and products and
narratives and portrayals do affect people. With these devices persons become better aware of
the program, develop a feeling for its vital forces, a sense of its disappointments and potential
troubles. They may be better prepared to act on issues such as a change of enrollment or a
reallocation of resources. They may be better able to protect the program.
Different styles of evaluation will serve different purposes. A highly subjective evaluation
may be useful but not be seen as legitimate. Highly specific language, behavioral tasks, and
performance scores are considered by some to be more legitimate. In America, however, there is
seldom a greater legitimacy than the endorsement of large numbers of audience-significant
people. The evaluator may need to discover what legitimacies his audiences (and their
audiences) honor. Responsive evaluation includes such inquiry.
Responsive evaluation will be particularly useful during formative evaluation when the staff
needs help in monitoring the program, when no one is sure what problems will arise. It will be
particularly useful in summative evaluation when audiences want an understanding of a
program's activities, its strengths and shortcomings, and when the evaluator feels that it is his
responsibility to provide a vicarious experience.
Preordinate evaluation should be preferred to responsive evaluation when it is important to
know if certain goals have been reached, if certain promises have been kept, and when
predetermined hypotheses or issues are to be investigated. With greater focus and opportunity
for preparation, preordinate measurements can be expected to be more objective and
reliable.
It is wrong to suppose that either a strict preordinate design or responsive design can be
fixed upon an educational program to evaluate it. As the program moves in unique and
unexpected ways, the evaluation efforts should be adapted to them, drawing from stability and
prior experience where possible, stretching to new issues and challenges as needed.