“SGPs Are Not Test Scores” And Other Tales From Trenton

Last week, I got to attend a talk by a high-level representative of the New Jersey Department of Education, who explained where we are going with the Partnership for Assessment of Readiness for College and Careers (PARCC) assessments administered in the spring. Little was said that was especially new or interesting. We heard an enthusiastic appraisal of the computer interface and of the “success” of the computer-administered exams. Next steps include deciding how the state will disseminate and interpret the data when it eventually comes back, in the hope that everyone will find it very useful and very granular. One talking point expressly did not rule out using PARCC results for grade-level promotion or graduation decisions in the future, although this was not emphasized. Time was also spent lamenting what teachers have been saying about the PARCC, as if they were simply misinformed about how good the examinations are and how useful the data will be.

And at one point, the DOE representative said, in response to a question, that “SGPs (student growth percentiles) are not test scores.”

Let that sink in for a minute.  “SGPs are not test scores.”

This is one of those incredible moments when an actually true statement is, in fact, entirely misleading. It is absolutely true that SGPs are not raw test scores, and it would be incorrect to say simply that New Jersey teachers are evaluated using test scores. A Student Growth Percentile is a computation that compares a student with other students who had similar scores in previous years and expresses how much that student “grew” on an annual standardized test as a percentile rank among those peers. When used in teacher evaluation, the growth of a teacher’s students, whether above or below that of their peers, is attributed to the teacher. Proponents of manipulating test data this way believe that these measures are more “objective” than standard administrator observations of teachers because they are tied to students’ actual performance on a measure of their learning.
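To make the mechanics concrete, here is a deliberately simplified sketch in Python. It is an approximation under stated assumptions, not New Jersey’s actual model: operational SGPs are estimated with quantile regression over several prior years of scores, whereas this sketch simply groups students who had an identical prior-year score and ranks each one’s current score within that group. Summarizing a teacher with the median of their students’ SGPs is likewise assumed here, and the function names and scores are hypothetical. Notice what the sketch never looks at: anything about a student other than two test scores.

```python
from collections import defaultdict
from statistics import median


def growth_percentiles(prior, current):
    """Simplified SGP: rank each student's current score among "academic
    peers" who had exactly the same prior-year score.

    prior, current: dicts mapping a hypothetical student ID to a scale score.
    Operational SGP models use quantile regression over several prior years;
    grouping by an identical prior score is an illustrative stand-in.
    """
    # Group students by prior-year score: these are each student's "peers".
    peers = defaultdict(list)
    for student, score in prior.items():
        peers[score].append(student)

    sgp = {}
    for group in peers.values():
        scores = [current[s] for s in group]
        others = len(scores) - 1
        for s in group:
            if others == 0:
                sgp[s] = 50  # no peers to compare against; assume the middle
                continue
            below = sum(1 for x in scores if x < current[s])
            ties = sum(1 for x in scores if x == current[s]) - 1  # exclude self
            pct = round(100 * (below + 0.5 * ties) / others)
            sgp[s] = max(1, min(99, pct))  # reported SGPs run from 1 to 99
    return sgp


def median_growth_percentile(sgp, roster):
    """Summarize a teacher's class as the median of its students' SGPs."""
    return median(sgp[s] for s in roster)


# Hypothetical scores for five students in two peer groups.
prior = {"a": 200, "b": 200, "c": 200, "d": 210, "e": 210}
current = {"a": 205, "b": 215, "c": 225, "d": 212, "e": 230}
sgp = growth_percentiles(prior, current)
print(sgp)  # {'a': 1, 'b': 50, 'c': 99, 'd': 1, 'e': 99}
print(median_growth_percentile(sgp, ["a", "b", "d"]))  # 1
```

Two students with the same prior score are treated as interchangeable peers regardless of their circumstances, so anything else that differs between classrooms flows straight into the teacher’s median. That is the omitted-variables problem Baker and Oluwole describe below.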

So, it is technically true that “SGPs are not test scores,” in much the same way that houses are not trees. However, if you want to build a house and have no idea where you will get the lumber, you won’t get very far. In the same vein, without standardized tests to feed into their calculations, SGPs and the related growth scores used to evaluate teachers would not exist.

Of course, making your SGPs out of test scores the way New Jersey has done might very well be a wasted exercise. Bruce Baker of Rutgers University and Joseph Oluwole of Montclair State University discussed the many problems underlying New Jersey’s Student Growth Percentiles in this 2013 NJ Education Policy Forum piece:

…since student growth percentiles make no attempt (by design) to consider other factors that contribute to student achievement growth, the measures have significant potential for omitted variables bias. SGPs leave the interpreter of the data to naively infer (by omission) that all growth among students in the classroom of a given teacher must be associated with that teacher. Research on VAMs indicates that even subtle changes to explanatory variables in value-added models change substantively the ratings of individual teachers. Omitting key variables can lead to bias and including them can reduce that bias. Excluding all potential explanatory variables, as do SGPs, takes this problem to the extreme by simply ignoring the possibility of omitted variables bias while omitting a plethora of widely used explanatory variables.

The authors explain why the state’s claim that using the same starting points for students “fully accounts” for variables such as poverty is supported by neither research nor methodology. Further, there are multiple potential reasons why schools’ average proficiency scores correlate with their growth percentiles, but the SGP model makes it impossible to say which is correct.

Dr. Baker revisited this topic a year later on his personal blog. With an additional year of data, he noted that SGPs were almost as closely correlated with the poverty characteristics of a school as they were with themselves from one year to the next, and were also as closely related to prior performance as they were to themselves. So while the SGPs were relatively “reliable,” meaning that they produced consistent results over time, there is no reason to believe that they are valid, meaning that they actually measure what they are said to measure. Taking the growth percentiles as a valid measure of teaching would have you believe that ineffective teachers in New Jersey just happen to be concentrated in schools with high percentages of students in poverty and low overall proficiency levels on standardized tests. You would have to believe this even though SGPs were never designed to statistically isolate teachers’ contributions to student test scores.

So, yes — “SGPs are not test scores.”  They are just a lousy thing to do WITH test scores and to put into teachers’ evaluations and tenure decisions.

Perhaps the most frustrating aspect of this is not even the sleight-of-hand explanation of SGPs and their relationship with test scores. It is the wasted time and opportunity that could have been spent developing and implementing teacher evaluations aimed at support and improvement rather than at ranking and removal. Linda Darling-Hammond, writing for the Stanford Center for Opportunity Policy in Education, proposed a comprehensive system of teacher evaluation that incorporates genuinely thoughtful, research-supported policies. Her proposal begins with standards and locally designed, standards-based evaluation; incorporates genuine performance assessments; builds the capacity and structures needed to support fair standards-based evaluation; and provides ongoing, meaningful learning opportunities for all teachers. Most importantly, Dr. Darling-Hammond states that evaluation should include evidence of student learning drawn from sources other than standardized tests, and she rejects growth measures such as SGPs and Value-Added Models because of the ever-increasing body of research showing that they are unreliable and create poor incentives in education. Dedicated teachers know that they are constantly generating evidence of student learning, but to date, policymakers have shown interest only in the most broadly implemented and facile demonstrations of it.

Taking Darling-Hammond’s vision seriously would mean admitting failure in New Jersey and going all the way back to the drawing board. Trenton would need to admit that Student Growth Percentiles cannot fairly be attributed to teacher input when they were never designed to measure it in the first place, and that the problems with Value-Added Models in other states mean growth measures in general should be rejected. Further, if the state were to become serious about teachers demonstrating student learning in meaningful ways, the DOE would need to reject the “Student Growth Objective” (SGO) process that it has established as the second leg of the evaluation process. While the concept of the SGO sounded promising when first proposed, the state guidebook makes it mostly an exercise in accounting. Teachers are instructed to select only objectives that are measured by data. They are told to choose a level of performance demonstrating “considerable learning,” with no guidance on how to make that determination from data. They are required to determine how many students could meet that level, with no explanation of how to project that from existing data. Then they are told to set an entirely arbitrary range of 10 to 15 percent below that for partial attainment of the objective.

From page 16 of the SGO manual:

[Image: excerpt from page 16 of the SGO manual]

These are not instructions to help teachers conduct meaningful self-study of their teaching effectiveness. These are instructions designed to create easy-to-read tables.
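The tiered rule described above fits in a few lines of code, which is rather the point. The sketch below is a hypothetical illustration of that procedure, not the state’s official scoring table: a teacher picks a score that counts as “considerable learning” and a share of students expected to reach it, and partial attainment is granted within an arbitrary band of percentage points below that share. The function name, the tier labels, the 10-point default band, and the class scores are all assumptions made for illustration.

```python
def sgo_attainment(scores, target_score, target_share, partial_band=10):
    """Classify an SGO result under a hypothetical tiered rule.

    scores: end-of-year scores for the teacher's students
    target_score: the score the teacher chose as "considerable learning"
    target_share: percent of students expected to reach target_score
    partial_band: percentage points below target_share that still count
                  as partial attainment (assumed; the text cites 10 to 15)
    """
    met = sum(1 for s in scores if s >= target_score)
    share = 100 * met / len(scores)
    if share >= target_share:
        return "full attainment"
    if share >= target_share - partial_band:
        return "partial attainment"
    return "insufficient attainment"


# Hypothetical class of ten students; target: 75% of them score 80 or higher.
scores = [85, 92, 78, 81, 88, 70, 95, 83, 79, 90]
print(sgo_attainment(scores, target_score=80, target_share=75))  # partial attainment
```

Nothing in the rule asks whether the target was meaningful: a low target is rewarded just as readily as an ambitious one.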

Teaching, evaluating teachers, and providing meaningful support for teachers to grow in an environment that is both supportive and focused on student learning together make up a serious endeavor. It requires a systemic approach, real capacity, and tools that are sensitive and responsive to context. It cannot be forced by incentives that distract from the most important work teachers do with students: fostering genuine curiosity and a love of learning around rich content and meaningful tasks built on that content.

It certainly cannot be made out of standardized test scores.


Filed under Common Core, Data, PARCC, Testing, VAMs

6 responses to ““SGPs Are Not Test Scores” And Other Tales From Trenton”

  1. “Oh dear, last year too high a percentage of the teachers reached the ‘improvable, with help’ level. We need to adjust the teacher evaluation system cut scores.”

  2. One extra point, Dan: while Bruce’s work shows, IN THE AGGREGATE, that SGPs have appreciable reliability, there is still way more than enough noise in the measures so that they are inappropriate for high-stakes decisions based on cut scores.

    These are statistically dirty measures that, at best, should INFORM decisions, not COMPEL them.

    • Important reminder — I do have trouble imagining the current SGP/VAM proponents agreeing to downgrade their precious instruments to mere informational tools rather than to evaluative ones.

      • specialeducator

        Dan, great article you wrote. I, too, can’t imagine any of them giving up the SGP/VAM. They are way too useful. For example, this past year I was labeled a “highly effective” teacher here in New York State. My secret sauce? I worked with students in special education. In September we pre-tested all the students in the grade and my students in particular had the lowest scores. So I set each of their student learning outcomes (SLO) on the spring state test at a Level 1, the lowest possible level. Lo and behold, all my students scored a 1. I met my goal with 100% efficiency and received a highly effective ranking. Pretty cool, eh? Of course, I’m being snarky here, but that is exactly what happened. What a load of bureaucratic crap all this testing is.
