Evolution of the 6.0 Judging System

 

Introduction

When the ISU first introduced its proposed new judging system it did so because, it said, the current 6.0 system (Ordinal in the US, OBO in the ISU) could not be "fixed."  Since then the ISU has consistently misrepresented its proposed system to make it appear better than it really is, and the 6.0 system to make it appear worse than it really is.  In addition, it appears no serious effort was made by the ISU to determine how the 6.0 system might be revised to meet the goals of an improved scoring system and an improved method of evaluating skating.  This article describes several steps that could be taken to revise the 6.0 system to meet the goals many have expressed over the past year as being desirable in a new scoring system.

Goals

Before discussing the details of a potential new 6.0 system, let us first consider what goals a new system might  have.  Then we will consider the ways a new 6.0 system might be constructed to meet these goals.  The new system should do the following:

  1. Reduce the impact that potential bias, misconduct or block judging can have on the results.
  2. Have improved statistical accuracy.
  3. Be sufficiently simple, open and transparent in its use so that skaters and the public can easily understand the results and have confidence in the reliability and integrity of the results.
  4. Include mechanisms that can be used to enhance judges' accountability and education, without intimidating judges into marking by second-guessing the panel.
  5. Provide detailed feedback to the skaters and the public to explain why each skater placed as they did, and to give skaters the detailed information they need to improve their performances.
  6. Decrease the amount of subjectivity in the process and increase the objectivity.
  7. Mark on an absolute scale to the greatest extent humanly possible, to allow reflective judging without the need to "save" marks.
  8. Eliminate place-switching in the intermediate results.
  9. Use a marking scale that is consistent and directly comparable from event to event and season to season.
  10. Improve the criteria by which Ice Dancing is judged so results are less stagnant, and to provide a clear understanding of what skating skills are to be rewarded and their relative difficulty. 
  11. Include sufficient statistical information to establish meaningful records, and rigorous comparisons of performances.
  12. Incorporate sufficient technology to meet the other goals, but not to the extent that the system is priced out of use by smaller federations or in smaller competitions.  To this end, a viable manual method of using the system should be available so that all competitions at all skill levels can be judged according to the same criteria.

This list is a lot to ask of a scoring system, and one might well wonder if the rickety old 6.0 system can evolve to achieve these goals.  The conclusion here is that it can.  We begin with a minimum set of simple innovations that can meet the above stated goals in whole or in part.  Following those we will discuss additional options that the more intrepid might be willing to adopt.

Proposals for a New 6.0 Evaluation System

Starting with the current 6.0 system, the following are proposed revisions to that system.  These proposals roughly follow in the order of the above goals.  We summarize them first in the following list and then describe them in more detail below.

  1. Increase the panel size and limit representation on a geographic basis. 
  2. Base the order of finish on the actual judges' marks.
  3. Use all the judges' marks to calculate the results.  Display/publish the judges' marks without concealing their identities and eliminate the process of reading the judges' marks.
  4. Award marks on a 6.0 scale to the nearest 0.05 points.
  5. Retain the program requirements for the short programs, compulsory dances, and original dance.  Revise the well balanced program requirements for the free skating and free dance to consist of a specific number of elements of each type appropriate for each event.
  6. Rigorously specify the relative values (credit given) for the elements of skating  and provide the judges specific guidelines for what range of scores should be given for various program contents and quality of execution.
  7. Assign marks on an absolute scale, according to standardized program content guidelines.
  8. For championship events and other large events which can support the costs, annotate performances using a computer touch-screen display.  Following an event, provide the skaters all the annotated information and include it in the official protocol.
  9. For those events where an automated annotation system is not economically feasible, use an improved manual approach to annotating the programs.  Provide the skaters copies of the manual record sheets.
  10. Determine the order of finish in each part of an event using the total mark each skater received.  Determine the order of finish for an entire event using the weighted sum of the total marks from the individual parts of the event.

Proposal 1:  Judging panels would consist of a minimum of 9 and a maximum of 15 judges.  The ISU member federations would be divided into several geographical regions and representation on a panel of judges would be distributed as evenly as possible among the regions.  No more than two judges from any one region would be allowed on a panel.

Reason:  Increasing the size of the panel increases the statistical accuracy and reliability of the results.  It reduces the impact one biased judge can have on the results.  Control of block representation on the panels prevents one block of judges from controlling the results.

Proposal 2:  Results would be determined using the actual judges' marks, awarded on an absolute scale, without resorting to ordinals or win-loss comparisons.  The marks for the panel as a whole would be combined by taking the median of the judges' marks.

Reason:  Use of the actual judges' marks on an absolute scale provides marks that are easy to understand, directly comparable from skater to skater and competition to competition, and free from place switching in intermediate results.  In an absolute point system the median is the most effective method of filtering bias, misconduct and error out of the marks.  Other methods of combining the judges' marks could be considered (such as trimmed means), but are not recommended as they are mathematically inferior in their ability to filter bias from the computation process.
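
To make the combining rule concrete, the following sketch shows how a panel's marks for one skater might be reduced to a single median mark.  The specific marks are invented for the example; only the use of the median comes from the proposal.

```python
from statistics import median

def panel_mark(marks):
    """Combine one panel's marks for a skater by taking the median.

    A single out-of-line mark cannot pull the median outside the range of
    the rest of the panel, which is the filtering property described above.
    """
    if not marks:
        raise ValueError("no marks entered")
    return median(marks)

# Nine judges; one mark (3.90) is far out of line with the rest of the panel.
marks = [5.55, 5.60, 5.50, 5.65, 5.55, 5.60, 5.50, 5.55, 3.90]
print(panel_mark(marks))   # 5.55 -- the outlier has no effect on the result
```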

Proposal 3:  Marks from all the judges on a panel would be used to calculate the results; i.e., random selection of a subset of the panel would not be used.  All the marks would be displayed/published and the judges' identities would not be concealed.  The process of reading the judges' marks would be eliminated to reduce the time delay between performances.

Reason:  The greatest statistical accuracy and reliability is obtained by using as many judges as possible, thus the entire panel should be used.  Further, random selection of judges leads to random determination of the places a significant fraction of the time.  Providing all the marks and the judges' identities is needed for maximum credibility with the skaters and the public that nothing is being swept under the rug.  It is sufficient to simply post the marks on the scoreboard; the spectators can read them for themselves.  Reading the marks aloud between performances is unfair to the skaters yet to skate, who have to wait for unnecessarily long periods of time; boring to the public, who spend up to 2/3 of their time watching nothing happening on the ice; and an unnecessary burden on the judges, who must stay focused on the performances far longer than is needed or advisable for consistent judgment throughout an event.

Proposal 4:  Marks would be awarded on a 6.0 scale to the nearest 0.05 points.

Reason:  One reason the judges have to save marks is that there aren't enough of them in a large event with up to 30 competitors.  At a World Championships with 30 competitors, for example, where presentation marks might rightly be earned between a score of 3.0 and 6.0, only 30 marks are available in the current 6.0 system, one for each skater.  This leads to the saving of marks, and to the judges sometimes running out of unique marks to give, preventing truly reflective judging.  Doubling the number of available marks for both the Technical Merit and Presentation Marks would eliminate the need to save marks, eliminate the problem of running out of marks, and allow the judges to use completely reflective scoring; i.e., it allows the judges to mark "in the moment" if they wish to.  A perfect score of 6.0 would be retained for the Presentation Mark.
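
The arithmetic behind this can be sketched as follows; the code simply counts the distinct marks above 3.0 up to and including 6.0 at the two increments (an illustration only, not part of the proposal).

```python
def marks_available(low, high, step):
    """Count the distinct marks above `low` up to and including `high`."""
    return round((high - low) / step)

print(marks_available(3.0, 6.0, 0.10))   # 30 marks at 0.1 increments
print(marks_available(3.0, 6.0, 0.05))   # 60 marks at 0.05 increments
```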

Proposal 5:  The program requirements for the short programs, compulsory dances, and original dance would not change.  The well balanced program requirements for the free skating and free dance would be revised to consist of a specific number of elements of each type appropriate for each event. These requirements would be chosen to correspond to the typical programs skaters currently perform in competition.

Reason:  This insures each competitor has the same opportunity to obtain their Technical Merit Mark on a level playing field, and that the focus is not just on quantity, but also on quality.  It also allows for easier, more consistent and more reliable evaluation of the programs by the judges.  How this Would Work

Proposal 6:  The relative values (credit given) for the elements of skating would be rigorously specified.  Judges would be given specific guidelines for what range of scores should be given for various program contents and quality of execution.  This schedule of values would be structured to span the entire range of skating skill from No-Test through the Senior level.  The maximum Technical Merit Mark of 6.0 would be retained.  The element values would be specified, however, such that a Technical Merit mark of 6.0 would lie at the maximum limit of human capability and never be reached.  For example, in singles, this would consist of the most difficult footwork, most difficult spins, and 11 quad and triple jumps (including all quads through quad Axel) all perfectly executed.

Reason:  This eliminates as much subjectivity in the Technical Merit mark as does the proposed ISU system.  It makes it clear to the skaters what scores they can expect for a given program content and what increased score they will receive if they increase the difficulty of their content.  It would make it clear to the skaters and the public why a skater received the marks they did.  Combined with  marking to a 0.05 value, it further eliminates the need to save marks and allows the judges to mark the Technical Merit Mark in the moment, on an absolute scale.  It makes the Technical Merit Marks quantitatively comparable from one skater to the next and one competition to the next. It allows for meaningful records and statistics since all competitions would now be judged on the same absolute scale.  For Ice Dancing it would provide a degree of objectivity and rigor that is needed to produce greater movement in dance events, and would make it clear why one dance is scored higher than another.  How this Would Work
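
As a sketch of how such a schedule of values might be applied, the following uses a few invented element values (the real schedule would be set by the technical committees) and an invented execution-quality factor per element; only the idea of summing specified values toward a 6.0 ceiling comes from the proposal.

```python
# Invented element values for illustration only.
ELEMENT_VALUES = {
    "3Lz": 0.55, "3T": 0.40, "4T": 0.70,
    "flying camel": 0.25, "combination spin": 0.30,
    "straight-line footwork": 0.35,
}

def technical_merit(elements, quality=None):
    """Sum the scheduled values of the elements performed, scaled by an
    execution-quality factor between 0 and 1 (1.0 = perfectly executed).

    The schedule is meant to be constructed so that no humanly possible
    program reaches 6.0; the cap is only a safeguard.
    """
    quality = quality or {}
    total = sum(ELEMENT_VALUES[e] * quality.get(e, 1.0) for e in elements)
    return min(round(total, 2), 6.0)

print(technical_merit(["3Lz", "4T", "combination spin"], {"4T": 0.8}))   # 1.41
```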

Proposal 7:  The judges would be re-educated to assign each mark on an absolute scale, according to standardized guidelines.  Judges, however, would still be allowed (encouraged) to consider all the marks they gave during an event to insure consistency in their marking scale.

Reason:  The goal of a new evaluation system is to mark on an absolute, consistent point scale using reflective judging.  This will require a change in mindset for most judges and will require a re-education effort (not unlike the effort that occurred when moves in the field were introduced).  Judges would no longer be required to think in terms of determining a specific placement, and would be permitted to judge completely "in the moment" if they prefer, so long as their marks remained in accordance with the absolute scale specified by standardized program guidelines.  Nevertheless, in order to insure the marking scale is indeed absolute, self-consistent and accurate, there is great value in using previously assigned scores as "markers".  Humans can make absolute judgments to no better than about 15% in the absence of anchoring markers.  Reference to the scores given throughout an event provides the markers needed to increase the accuracy of the marking system to the level needed for the results to be meaningful.

Proposal 8:  For championship events and other large events which can support the costs, the judges would annotate performances using a computer touch-screen display that incorporates element replay.  This display would be pre-programmed with the planned element content.  Judges would enter any changes to the element content, and their quality assessments.  Judges would have access to their individual previous assessments and their marks through this system.  Following an event, all of the annotated information (content and assessments) from all the judges would be provided to the skaters and be included in the official protocol.

Reason:  This will provide the skaters and the public the information they need to understand why each skater received the marks they did from each judge.  Skaters can use this information to guide them in making revisions to their programs and in making decisions about their training.  The public can use this information to understand exactly why the skaters were placed the way they were, and to draw their own conclusions on the reasonableness of the results.  This information would also be used to assess the performance of the judges in the spirit of improving their knowledge and skill, and also to help guard against misconduct by identifying the most egregious anomalies.  This technology would only be used as an annotation system, not as a method of calculating scores.
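
A minimal sketch of what one judge's annotation record might look like in such a system is shown below; every field name here is an assumption made for illustration, not a specification.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ElementAnnotation:
    """One judge's annotation of one element (illustrative fields only)."""
    planned: str                    # pre-programmed content, e.g. "3Lz+3T"
    performed: str                  # what the judge records as actually done
    quality_notes: List[str] = field(default_factory=list)  # e.g. ["two-footed landing"]
    deduction: float = 0.0          # any deduction the judge takes

# After the event, each judge's list of ElementAnnotation records would be
# handed to the skater and printed in the protocol.
annotation = ElementAnnotation("3Lz+3T", "3Lz+2T", ["second jump doubled"], 0.2)
```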

Proposal 9:  For those events where an automated annotation system is not economically feasible, an improved manual method of annotating the programs would be used.  Judges would be provided a worksheet for each skater with the planned content listed.  Judges would note any changes to the planned content and record their quality assessment.  Copies of these manual record sheets would be provided to the skaters.  In order to insure consistency and readability a standardized set of abbreviations and notations would be used.

Reason:  In the absence of an automated system, the skaters still need the feedback from the judges' assessments, and they need it at all levels of competition.  This approach, while it involves more paperwork than before, should prove no more distracting than note taking on the current worksheets.  Since the judges will be required to mark on an absolute scale, excessive shuffling of papers will not be required if the process is implemented in a prudent way.  While many judges use unique note taking methods and may be resistant to being required to use a standardized notation system, it is in the best interest of the skaters and the sport that they do so.  Use of a manual annotation system that corresponds one-to-one with the automated system insures that all events at all skill levels are judged in the same way, regardless of the technology employed.  In lower level competitions where asking the skaters/coaches to provide program content (and requiring the accountants to prepare all the paperwork) might prove a burden, the worksheet for each skater would have the content left blank and the judges would enter the content on each worksheet.  It is also anticipated that even when an automated annotation system is used at major competitions, the trial judges would use manual forms.  Consistency between the automated process and the manual process would enhance the educational value of the trial judging experience if properly used.

Proposal 10:  The order of finish in each part of an event would be determined by the total mark each skater received (the sum of the marks for Technical Merit and Presentation).  The order of finish for an entire event would be based on the weighted sum of the total marks from the individual parts of the event.  The weights would remain the same as they are today.

Reason:  When marking on an absolute scale, the best indicator of placement is the total mark.  Converting to ordinals throws away the direct indication of margin of victory.  If the judges mark on a consistent and absolute scale, ordinalization of the marks is unnecessary.  Other proposals presented here work to insure the judges will mark on as absolute a scale as needed (a perfectly absolute scale is probably not achievable) for the results to be consistent and meaningful.  By marking both the short program and long program on the same marking scale and then applying the weighting factor, both the total marks for the two programs and the mark differences from place to place keep the same relative weights.  Skaters have the chance to dig themselves out of a hole in the short program, and at the same time a huge victory in the short program does not necessarily guarantee overall victory.  Under the proposed ISU system, if I beat you by 5 points in the short program and you beat me by 4.99 points in the long program, I win even though the short program is considered only half as important as the long program.  In this proposal, my margin of victory in the short program is reduced to 2.5 points by the 0.5 weighting factor, reflecting the lesser value of the short program.  Your 4.99 point victory in the long program now takes the day, due to the long program being valued twice as much as the short program.  To use an American football analogy, one point in the short program is a field goal and one point in the long program is a touchdown.  The proposed ISU system counts the total number of field goals and touchdowns to determine the winner.  In this proposal we retain the fact that a field goal is worth half as much as a touchdown.  How this Would Work
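
The arithmetic of that example can be written out directly.  The sketch below uses the current 0.5 weighting for the short program and 1.0 for the free skate; the specific totals are invented only to reproduce the 5.00 / 4.99 scenario from the text.

```python
SP_WEIGHT, FS_WEIGHT = 0.5, 1.0   # same relative weights as today's factored placements

def event_total(short_total, free_total):
    """Weighted sum of the total marks (Technical Merit + Presentation) of each part."""
    return SP_WEIGHT * short_total + FS_WEIGHT * free_total

# I win the short program by 5.00; you win the free skate by 4.99.
mine  = event_total(11.00, 10.00)   # 15.50
yours = event_total( 6.00, 14.99)   # 17.99 -- the free-skate margin decides the event
print(mine, yours)
```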

From the discussion of these proposals it should be clear that a revised 6.0 system of evaluation can meet all of the proposed goals, in whole or in part.  Most of these proposals require only minor rule changes, and some do not require any rule changes at all and can be implemented directly by the technical committees.

More Adventurous Proposals

The above proposals are a minimum set of proposals that achieves nearly all aspects of the initially stated goals, without totally redefining skating as we know it, without adding excessive unnecessary complexity and without introducing a new set of problems as does the proposed ISU system.  For the more adventurous, though, we might push the envelope a little farther with the following additional proposals.

   11.  Construct the Technical Merit mark from 3 sub-marks that specify the contribution from each type of element in a program.
   12.  Limit Short Program deductions to true defects in elements actually completed; i.e., elements that are not omitted or completely failed.
   13.  Apply Short Program deductions based on a consensus of the panel.
   14.  Construct the Presentation Mark from 2 sub-marks that specify the contribution from the two main contributors to the presentation mark.
   15.  Revise the specification of the Presentation sub-marks to make the Presentation Mark more objective.
   16.  Apply a weighting factor of 0.9 to the Presentation Mark to balance that mark with the revised standard for the Technical Merit Mark.

Proposal 11:  Construct the Technical Merit mark from three sub-marks.  These sub-marks would be specified as follows:

   Singles:      Jumps 0 - 3.0;  Spins 0 - 1.5;  Sequences 0 - 1.5
   Pairs:        Lifts, Throws and Jumps 0 - 3.0;  Spins and Death Spirals 0 - 1.5;  Sequences 0 - 1.5
   Dance (CD):   Sequences 0 - 6.0
   Dance (OD):   Lifts 0 - 2.0;  Spins 0 - 1.0;  Sequences 0 - 3.0
   Dance (FD):   Lifts 0 - 2.4;  Spins and Twizzles 0 - 1.2;  Sequences 0 - 2.4

The mark for Sequences includes the spiral/step sequence elements, basic skating skills and connecting elements.  In Compulsory Dance, each dance would be divided into 6-8 sequences (as in CoP), with each sequence being marked separately and the contribution of each sequence to the Technical Merit mark specified by the technical committee.

Reason:  Although a single Technical Merit mark can be used successfully to specify the value of the technical content of a program, breaking the mark down into component parts insures that the contribution of each element type to the Technical Merit mark is clear and unambiguous.  It would insure all judges are giving identical weight to each type of element, and that the skaters and public would have a clear understanding of what that weight is. It would provide the skaters and public a better understanding of why skaters place where they do in competition.  In CoP, the judges assign up to 19 marks in evaluating a performance.  This is unnecessarily complex and confusing, and impractically large for use in a manual system.  The small increase in  the number of marks proposed here has equivalent practical value, but is not so large as to preclude use in a manual system.
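
A sketch of the sub-mark arithmetic, using the singles caps from the table above (the function and variable names are illustrative):

```python
# Maximum values of the singles Technical Merit sub-marks from the table above.
SUBMARK_MAX = {"jumps": 3.0, "spins": 1.5, "sequences": 1.5}

def technical_merit_singles(jumps, spins, sequences):
    """Technical Merit as the sum of three sub-marks, each capped at its maximum."""
    submarks = {"jumps": jumps, "spins": spins, "sequences": sequences}
    return round(sum(min(value, SUBMARK_MAX[name]) for name, value in submarks.items()), 2)

print(technical_merit_singles(2.45, 1.10, 1.20))   # 4.75
```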

Proposal 12:  Limit Short Program deductions to true defects in elements actually completed; i.e., elements that are not omitted or completely failed.  If a required element is omitted or completely failed, reduce the maximum possible value of the related sub-mark in proportion to the number of required elements omitted or completely failed compared to the number of required elements that make up the sub-mark.  For elements that are completed, only take deductions for errors that are true defects in execution.

Reduction in maximum possible sub-mark for each omitted or completely failed element

   Singles SP:   Jumps (3 elements) 1.0;  Spins (3 elements) 0.5;  Sequences (2 elements) 0.5
   Pairs SP:     Lifts, Throws and Jumps (4 elements) 0.75;  Solo Spin, Pair Spin, Death Spiral (3 elements) 0.5;  Sequence (1 element) 0.5
   Dance OD:     Lifts (2 elements) 1.0;  Spin (1 element) 1.0;  Sequences (2 elements) 1.0

Reason:  Determining a base mark for an element not seen, and then applying a deduction, is more black magic than a quantitative way of evaluating a performance.  Deductions for omissions and complete failures would be eliminated, and would be handled instead by a reduction in the maximum possible sub-mark for the required elements, based on the number actually completed compared to the number required.  The potential credit lost for an omission of a required element must be consistent with the maximum potential credit available for completing all the elements; note, however, that if the required sequences were omitted, the score for sequences would not be zero, since that sub-mark also includes aspects of the program content other than the required sequences.  An element that exceeds specified requirements would be considered an omission and an extra element.  Deductions for errors in elements not omitted or completely failed would remain in the range 0.05 through 0.4, and would continue to be specified in ISU Communications, but would be limited to errors that are true defects in execution, as compared to errors that result in an element of lesser difficulty (and thus lesser base mark) being executed.  How this Would Work
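
The reduction rule can be illustrated with a small sketch consistent with the table above (the names are invented for the example):

```python
def reduced_submark_max(normal_max, reduction_per_element, num_missing):
    """Maximum possible sub-mark after required elements are omitted or completely failed."""
    return max(normal_max - reduction_per_element * num_missing, 0.0)

# Singles SP: omitting two of the three required jump elements reduces the
# maximum possible jump sub-mark from 3.0 to 1.0.
print(reduced_submark_max(3.0, 1.0, 2))   # 1.0
```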

Proposal 13:  Apply deductions based on a consensus of the panel by entering the deductions separately from the base mark, determining the deduction that is the consensus of the panel, and applying that consensus deduction to all the judges' base marks.

Reason:  A large part of the variation in marks a skater receives in the Short Program is due to differences in the deductions taken among the judges.  This source of inconsistency can be eliminated by determining the deductions based on a consensus of the panel.  It is unfair to the competitors to have one skater "punished" because one judge took a deduction no one else did, or another skater "rewarded" because one judge did not take a deduction the rest of the panel did.  Applying the deductions uniformly would produce a more accurate and reliable result in the Short Program.  With the use of replay systems, the unique perspective of one judge is more likely to be wrong than right, and that source of uncertainty should be removed from the scoring system.  How this Would Work
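
The proposal does not spell out how the consensus is computed; the sketch below shows one possible reading, in which a deduction is applied only when a majority of the panel takes it, at the median of the values entered.

```python
from statistics import median

def consensus_deduction(deductions_entered, panel_size):
    """Apply a deduction only if a majority of the panel entered one,
    at the median of the values entered (one possible consensus rule)."""
    if 2 * len(deductions_entered) <= panel_size:
        return 0.0
    return median(deductions_entered)

# Seven of nine judges take a deduction for a step-out; the consensus value
# (0.2) is then subtracted from every judge's base mark.
deduction = consensus_deduction([0.2, 0.2, 0.2, 0.3, 0.2, 0.2, 0.1], panel_size=9)
base_marks = [4.80, 4.90, 4.85, 4.75, 4.90, 4.80, 4.85, 4.80, 4.95]
final_marks = [round(mark - deduction, 2) for mark in base_marks]
```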

Proposal 14:  Construct the Presentation Mark from two sub-marks that specify the two main aspects of the presentation mark.  The first of these sub-marks would include expression of the music, timing and unison (for pairs). The second of these sub-marks would include carriage & style, variation in speed, transitions between elements and use of the ice.

Reason:  Although a single Presentation mark can successfully be used to specify the value of the presentation aspects of a program, this mark can be difficult to apply and understand since it encompasses many aspects of a performance that are fundamentally different from each other.  Breaking the mark into two groups of more closely related characteristics would make presentation easier to judge, and make the meaning of the mark clearer to the skaters and public.  The focus of the first sub-mark is the timing of the movements in relation to themselves, the music, and to a partner (for pairs and dance).  The focus of the second sub-mark is the quality and difficulty of body position and the construction of the program as presented on the ice as a seamless whole.  Omitted from the presentation criteria in this proposal are originality and composition.  These two performance characteristics are more artistic than athletic in nature, are earned more by the choreographer than the skater, and once a program is set do not change significantly from one competition to the next.  The purpose of the five program component scores in CoP is similar to this proposal; however, the use of five scores is overkill, and experience at the recent Grand Prix competitions shows that the judges are marking these five scores more or less identically, so having five only adds complexity with no practical purpose.

Proposal 15:  Revise the specifications for the Presentation sub-marks to make the Presentation Mark more objective.

Reason:  Rules for marking the current Presentation Mark (and the CoP program component scores) are purely descriptive, making it difficult to assign these marks in a quantitative way.  Similar to the establishment of standardized program content guidelines, standardized presentation guidelines would be created to specify the score appropriate for specific actions on the ice.  For example, in use of the ice, instead of subjectively saying "good use of the ice deserves more credit compared to mediocre use of the ice" an objective requirement would be established that states "these specific actions on the ice, ... constitute good use of the ice and will receive a score of ..."  As in the case of the schedule of deductions and standardized program guidelines, these presentation guidelines would be provided in a communication, while only the description of the presentation sub-marks need be specified in the rules.

Proposal 16:  Apply a weighting factor of 0.9 to the Presentation Mark to balance that mark with the revised standard for the Technical Merit Mark.

Reason:  In the proposed program content standards for the Technical Merit Mark, a contemporary Men's program with up to three quad jumps would receive a Technical Merit score of 5.4.  To prevent the Presentation Mark from overwhelming the Technical Merit Mark, the weight of the Presentation Mark would be reduced.  A Presentation Mark of 6.0 could still be given, but its contribution to the total mark would be limited to 5.4, to be in balance with the Technical Merit Mark.
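
The weighting is a single multiplication; a minimal sketch:

```python
PRESENTATION_WEIGHT = 0.9

def total_mark(technical_merit, presentation):
    """Total mark for a program, with the Presentation Mark weighted by 0.9."""
    return round(technical_merit + PRESENTATION_WEIGHT * presentation, 2)

print(total_mark(5.4, 6.0))   # 10.8 -- a perfect 6.0 presentation contributes 5.4
```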

Summary

The claim made by the ISU that effective scoring system reform cannot be achieved through evolution of the 6.0 system is clearly without merit.  Several simple revisions to the 6.0 system could be made to improve the method of evaluating skating, while retaining skating's essential character as a sport.  This set of proposals provides an example of how that could be done.  Variations on these proposals, no doubt, could also prove effective in achieving the goals of scoring system reform.

Due to the many mathematical flaws and arbitrary assumptions inherent in the proposed ISU system, the detailed placements under that system are nearly meaningless.  These flaws and assumptions are not present in the proposals discussed here, and thus the placements determined using a revised 6.0 system will be more believable and meaningful than those produced by the proposed ISU system.


Copyright 2003 by George S. Rossano