Ethical Principles In Forensic Assessment by Ira Gorman
This outstanding article, written by Ira Gorman, Ph.D., discusses, among other things, the use and misuse of many of the 'personality' or psychological tests frequently administered in the course of evaluations.
If the tests aren't administered properly, the results are questionable at best. This fact can be used to question the credibility and competency of an evaluator who has done a biased, incomplete, or otherwise seriously deficient evaluation. If the evaluator can be shown to have performed an improper evaluation, his or her report should be thrown out by the court.
Ethical Principles In Forensic Assessment
The new Ethical Principles of Psychologists and Code of Conduct (APA, 1992) provides guidelines for those who perform evaluations (Ethical Standards 2.01-2.10) and specifically for psychologists who do forensic assessments (Ethical Standards 7.01-7.06). As I am frequently asked to review other psychologists' reports and testing procedures to assist attorneys with cross-examination, I have had the opportunity to closely examine my colleagues' work.
This strategic position has allowed me to discover problems in assessment which can have serious effects on the results of an evaluation and can distort the report's conclusions. Since the judiciary relies on our work, it is crucial that we provide the highest level of clinical excellence.
This article will address some of the problems that I have observed. I will provide examples of questionable practices in the choice of tests and lack of attention to norms. In addition, I will discuss the importance of 'clarifiers' when using tests impressionistically, when assistants participate in the evaluation, and when there are reasons to question the validity of the assessment.
Ethical Principles: Our code of conduct stresses the importance of truthfulness and candor. This is denoted in our Ethical Principles: 7.02 (b) “Whenever necessary, to avoid misleading, psychologists acknowledge the limits of their data or conclusions,” and 2.02 “Psychologists who develop, administer, score, interpret or use psychological assessment techniques, interviews, tests, or instruments do so in a manner and for purposes that are appropriate in light of the research on or evidence of the usefulness and proper application of the techniques.”
Test Selection: Ethical standard 2.02 tells us it is unethical to use tests that are inappropriate in light of research findings. Yet I frequently read reports in which psychologists use tests which have popularity but are lacking in validity and reliability.
Tests based on figure drawings such as the Draw a Person (Goodenough, 1926), The Kinetic Family Drawings (Burns and Kaufman, 1970) and the House, Tree, Person Test (Buck, 1948), rank high in popularity and continually find their way into forensic assessments. Despite our pleasure in sounding erudite when making interpretations, such instruments have consistently been shown to be lacking in reliability and in validity (Anastasi, 1988). To make matters worse, rather than using the tests as authors intended, we seem to ignore standardized testing procedures and administer the test in various ways.
An example might be illuminating. Think of the last time that you recall anyone administering the House, Tree, Person Test according to the instructions developed by Buck (reviewed by Hammer in Rabin, 1981):
“A number two pencil with eraser is employed on a few page form sheets of white paper, each page 7 x 8 1/2 inches in size. Only one surface is exposed at a time to the subject. The drawing of a house is requested with the longer axis of the sheet placed horizontally before the subject, and his drawings of a tree and persons of each sex in turn are then obtained in separate sides with the longer axis the vertical way. The subject is asked to draw as good a house (and later tree and person) as possible. The subject is instructed to draw any house, erase as much as necessary, and take as much time as needed. The pencil and the pencil drawings are then taken away, crayons are substituted and a chromatic set of drawings of a house, tree, and person of each sex is obtained. The subject is allowed to use the crayons in any way, as few or as many as desired, to shade in or draw only the outline as preferred, and all questions are handled non-directively.”
I do not know of even one psychologist who uses these instructions. In my experience, the psychologist only asks the test taker to draw one picture either with a pencil, crayon or marking pen on whatever paper is available. This is only the beginning of the problem with this testing instrument. After the single drawing is obtained, the psychologist then makes overreaching statements based on a 'sign' approach to interpretation.
This is true not only of the HTP but of all drawing tests. I have seen psychologists talk about 'bonding' based on the proximity of the figures to each other in a Kinetic Family Drawing Test. This interpretation is not supported by any well-designed scientific research that I have read. It is a classic example of 'illusory correlation' (Chapman and Chapman, 1971): since we associate 'bonding' with closeness of figures, if the child draws family members close together, this must by inference mean that this scientific instrument is measuring bonding. (Editor’s note: I would certainly challenge this statement. Figure closeness does, in my opinion, reveal important information, BUT ONLY IF THE PSYCHOLOGICAL CONTEXT IS FACTORED IN. This can make figure placement seem like an unreliable measure, but much of the difficulty disappears when one takes account of the exact instructions given to the child. In other words, “bonding” varies with the psychological context in which a child finds him- or herself.)
Ziskin and Faust (1988), in their review of figure drawings, quote Maloney and Glasser, who discuss the DAP Test: “Despite its general popularity, the DAP has been the subject of constant controversy. Reports of experimental research have yielded no evidence that the DAP has any predictive validity whatsoever (p. 183).” If one reads the reviews from the Mental Measurements Yearbook and follows its guidelines, these tests are best left for hypothesis-generating and not for forensic assessment.
Lack of Attention to Norms: Perhaps the awareness that “no one test is best for all” leads the psychologist to embrace the corollary, that “all tests are good for all”. This seems to be the most omnipresent problem in assessment. Examiners are not considering norms for gender, age and ethnicity.
Gender Norms: The Parenting Stress Index (PSI) (Abidin, 1986) and the Personality Inventory for Children (PIC) (Lachar, 1982) are two examples of tests which have been normed on mothers with only limited sampling of fathers. Even though there have been some studies with fathers, parents' scores on these instruments are not equivalent. Different results are obtained on the subtests for fathers and mothers; therefore, mothers' and fathers' scores cannot be compared directly. Yet in child custody evaluations, I consistently see evaluators using these tests with fathers, and then using mothers' norms and comparing mothers and fathers. (A new set of norms for the PSI which includes fathers has been published.)
The importance of restricting the administration of this test to mothers of young children is underlined by the fact that many of the items apply only to mothers of infants and toddlers and are inappropriate to use with mothers of adolescents. I would question how a mother of an adolescent girl (and worse yet a father) would respond to question #8, which asks about how the child reacts to the parent's dressing and bathing her.
Since the test sounds face-valid to evaluate parenting stress, it has found its way into acceptance (for parents of both sexes and children of all ages) because of nicely-placed advertising. Despite Gresham's scathing critique (Gresham in Buros, 1989), the test seems to be increasing in its popularity.
Gresham writes: “The PSI is a poorly standardized, unreliable, and an invalid measure of stress. Users of the PSI should be aware that whatever it is that is being measured is being measured with a great deal of error. Moreover, profile interpretations of the PSI suggested in the manual, represent the worst form of 'subtest scatter' analysis and yield data that are virtually meaningless given the unreliability of the subscales. The use of the PSI in clinical settings to make important decisions for families cannot be recommended.”
The Personality Inventory for Children is similar in format to the MMPI and is also used frequently in custody studies. It is an objective personality assessment tool that was developed for use with children and adolescents ages 6-16, to provide a comprehensive and clinically relevant description of a child's behavior, affect, and cognitive status, as well as family characteristics. Since its publication ten years ago, it has received generally favorable reviews (Anastasi, 1988), although Dr. Knoff (Knoff in Buros, 1985) criticizes the test for being normed only on Minnesota children rather than nationally normed. Anastasi states “...what has been accomplished in the short period since the publication of the PIC is impressive in extent and quality, and the results are promising (p. 538).” I have found the PIC is useful when one wishes to evaluate how a mother views her child, but it is inappropriate to use on fathers. The norms are based on the responses of mothers. This is expressed by Knoff, who states, “profiles of PICs completed by anyone other than a child's mother may have some degree of variability or error.”
Even Dr. Lachar recognizes the test's limitations with fathers, and recommends that fathers should be tested on this instrument only if mothers are unavailable. There is a significant lack of correspondence between parents on several of the scales, with fathers generally underreporting their children's symptomatology.
Age Norms: Another common error of evaluators is their failure to consider normal changes in test responses that occur with age. They interpret pathology when advancing age would be the more accurate explanation.
The MMPI scale analysis provides a good example of this point. Scales 1, 2, and 3 elevate with age. A psychologist who only looks at the elevations of the psychogram and not the age norms may interpret a code type with high 123 as someone who has depressive or somatic features, and at the worst may overinterpret suicide potential, and yet the norms show that elevations even ten T points above the mean may only be a sign of normal aging.
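The arithmetic behind this point can be sketched in a few lines. The example below assumes a simple linear T-score (mean 50, standard deviation 10 in the reference group), and every number in it is hypothetical, chosen only for illustration; none is taken from actual MMPI norm tables. It shows how the same raw score can look elevated against general norms and unremarkable against norms for an older reference group.

```python
def t_score(raw, norm_mean, norm_sd):
    """Linear T-score: 50 is the reference-group mean, 10 points is one SD."""
    return 50 + 10 * (raw - norm_mean) / norm_sd

# Hypothetical norm values for illustration only (NOT actual MMPI norms).
GENERAL_NORMS = {"mean": 20.0, "sd": 5.0}
OLDER_ADULT_NORMS = {"mean": 25.0, "sd": 5.0}

raw = 27  # hypothetical raw score on one scale

t_general = t_score(raw, GENERAL_NORMS["mean"], GENERAL_NORMS["sd"])
t_older = t_score(raw, OLDER_ADULT_NORMS["mean"], OLDER_ADULT_NORMS["sd"])

print(t_general)  # 64.0 against the general norms
print(t_older)    # 54.0 against the older-adult norms
```

In this made-up case the ten-point gap between the two T-scores comes entirely from the choice of reference group, not from anything about the individual tested.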
In a forensic report, the reader, without the benefit of all of the MMPI norms, may accept the psychologist's misdiagnosis. To accept and report only the cook-book interpretation of this 123 code type, and not also report the norms for older individuals, may be considered unethical. In custody cases involving grandparents, or in guardianship matters, this comment has particular relevance. In a recent case of age discrimination against the police department, I was asked to review another psychologist's report. This aging applicant had been disqualified from being hired by the police department, in part, because of his (misapplied) MMPI findings.
Ethnic Norms: In Orange County, we are faced with ethnic diversity, yet I have seen psychologists test individuals indiscriminately without concern for whether the individual can clearly understand the items or if the test norms are appropriate.
This is most pronounced in the MMPI, which depends on a clear understanding of American idioms and grammatical structure. Compensating for this by using translations can help, but does not always solve the problem. In addition, translations are not always accurate. I recently attempted to obtain a Korean version of the MMPI. After I found a psychologist who had a translation, he explained the problem of the Korean translation to me. Koreans' understanding of the double negatives results in reversing the scoring of some items. They answer yes, where a native-born individual may answer in the negative, and yet both mean the same (Lee, 1993). Because of this problem in translating the MMPI into Korean, I chose not to administer the MMPI to the individual, and was forced to use other instruments which are relatively 'culture free'.
Sometimes different interpretations of scale scores are dependent on cultural meanings. An example of this might be in the interpretation of scale 5 in an Asian person. A female who answers items on Scale 5 in a masculine manner, or a male who responds in a passive way, might be considered out of harmony with his or her culture. They may base their answers to MMPI questions on their understanding of the cultural ideology, which may be contradictory with Caucasian norms. One common correlate for a high 5 male is someone who has artistic and aesthetic interests. Yet an Asian who may be artistic and cultured and fits that description may interpret and respond to the items to show publicly his masculine side in accordance with the philosophy of Confucius.
Differences between blacks and whites have been studied extensively as have comparisons between Hispanics and Caucasians (Greene, 1991), and yet I rarely see any mention of differential findings of these groups in reports.
Although studies conflict, some studies (Gynther, Fowler, and Erdberg, 1971) have found blacks scoring higher on F, K, 8, and 9 on the MMPI. Hispanics also were found to score higher on F and other clinical scales (Diaz et al., 1984).
The importance of these ethnic differences is that the evaluator needs to be sensitive to the possibility that different scores might be a function of characteristics of an ethnic group rather than an indication of psychopathology.
Importance of 'Clarifiers': Using Psychological Tools vs Psychological Tests: Sometimes psychologists use tests despite their limitations or decide that our understanding of the individual can be best obtained by using instruments which would never pass a 'psychometric screening test'. The prototypical case in point is the Thematic Apperception Test (TAT) (Murray, 1943). I have never seen a clinician score the TAT according to Murray's need-press system, and now even if it were done, this probably would be unethical according to principle 2.07--using obsolete measures.
The same is true for sentence completion instruments. The Rotter Incomplete Sentence Test (IST), and the Washington Sentence Completion Test (WSCT) have objective scoring systems (Rabin & Zlotogorski, 1981). Despite this, sentence completion tests in forensic studies have generally been used clinically.
Testing for overall adjustment on the Rotter IST or levels of ego development with the WSCT (Rabin & Zlotogorski, 1981) may not be relevant to the question which is being asked of the evaluator. Yet by clinically interpreting sentence completion items, a skilled psychologist may obtain valuable data.
Some clinicians use the Roberts Apperception Test For Children (McArthur & Roberts, 1982) impressionistically for children below age six, the lower limit for the formal scoring system. They rely on their clinical skills. I have no argument with this approach when a scoring system is less than adequate or fails to meet the needs of the evaluator. Since I have been critical of impressionistic interpretation, the reader may question how I can now accept this non-actuarial departure. To be clear, a nomothetic approach is preferable when it is available and/or meets the needs of the evaluator. The clinical approach is acceptable when alternatives are less suitable. Sometimes the clinical/forensic psychologist must adjust his testing to answer the referral question, but when the examiner deviates from the orthodox and standardized format, this must be explained in his/her report. A simple statement in a footnote of the report would clarify this deviation. An example with the TAT is illustrative:
“The TAT has been used impressionistically, rather than with a standardized objective scoring system. Though research attempts have been made to standardize the TAT, lower levels of reliability and validity have been reported than with more objective tests such as the MMPI-2. However, this evaluator has administered the test approximately 2,500 times. By training and 25 years of experience, this evaluator has found this instrument useful when combined with other tests in a battery.”
In this way, the reader of the report will know the limitations of the instrument and can assign proper weight to the expert's interpretations. Rather than judge the test on its own merit, the evaluator's competence in interpretation will be judged.
Earlier, when I discussed using other tests with my Korean client since the MMPI was not a viable choice, I was faced with the dilemma of not testing at all. I chose a compromise and administered the TAT and the Rorschach. Since the Rorschach stimuli are relatively free of cultural bias, I was left with using American norms but not knowing how a Korean subject would respond compared to the norms. It seemed Exner only minimally addressed the cultural issues when he developed norms for the Comprehensive Scoring System. He reported one study comparing non-white subjects with whites, and found that non-whites produced larger numbers of chromatic responses, thereby affecting weighted C and the EB ratio (Exner, 1986).
To meet my ethical obligation, I added a footnote to my report stating the limitations and indicating that the results of the Rorschach may not be valid.
Proper Test Procedure: A 'clarifier' is also necessary when the results of a test may be questioned because of the test administration. A therapy client of mine who is being treated for depression stemming from stress at her work was referred by her employer to a psychiatrist for an evaluation. He administered the MMPI under atypical conditions--part before, and the remainder following, a very upsetting interview. The client told me that she stopped reading questions after the interview to hastily complete the test and escape from his office. The psychiatrist hopefully will add a 'clarifier' to his report to explain the atypical testing format, and the effect this might have upon the results. Pope and Vasquez (1991) list some of these circumstances which could affect test results: dim lighting, frequent interruptions, noisy environment, medication, and taking a test at home. To this list I would add the lack of privacy when an individual takes the test in a waiting room, and certainly when an upsetting intervention occurs in the middle of a test.
Use of an Assistant: Another application of 'clarifiers' is when the evaluator uses an assistant for part of the assessment. Each individual involved in the evaluation should be named in the report and his/her qualifications delineated. Just as is required in Workers' Compensation reports (Herlick, 1990), individuals contributing to the assessment should sign the report. In this way, attorneys will know whom to subpoena, and the Court will also know how much to rely on these assistants.
The supervisor is responsible for reviewing the work of those who perform any part of the assessment. I recently reviewed the entire file of a psychologist who did an Evidence Code section 730 evaluation (as the court's expert). The '730' evaluator relied on the testing of an assistant. When I reviewed the Rorschach, I was shocked. The Rorschach had only ten responses. This would of course invalidate the test and it should not be scored at all (Exner, 1989), yet the supervisor did not seem to notice and reported the results of the test.
The importance of supervision of subordinates is described in Ethics Code 1.22(b):
“Psychologists provide proper training and supervision to their employees or supervisees and take reasonable steps to see that such persons perform services responsibly, and ethically.”
The care a psychologist puts into compiling the components of a report can be undone by failing to review an assistant's work. A significant problem such as faulty testing may call into question the entire evaluation procedure.
Conclusions: In this commentary, I have discussed some of the problems which I have observed in my role as a rebuttal expert in forensic assessments. Unfortunately, this list has not been exhausted in this article.
A few years ago, the California Psychological Association convention theme was Excellence in Practice, and the new APA Code of Ethics is our guideline to perform excellently in our individual practices.
The preamble and the general principles are our aspirational goals to guide us to the highest ideals of psychology. Perhaps if we will apply ourselves to these goals, excellence in psychology will be the norm and not just an aspiration.
Author: Ira Gorman, Ph.D., ABPP, RCE-AD. 930 West 17th Street, Suite D, Santa Ana, CA 92706 714-542-4144 (Fax) 714-542-3858
REFERENCES
Abidin, R.A. (1986). Parenting Stress Index, Second Edition. Charlottesville, VA: Pediatric Psychology Press.
American Psychological Association (1992). Ethical Principles of Psychologists & Code of Conduct. American Psych.
Anastasi, Anne (1988). Psychological Testing, 6th Edition. New York: Macmillan.
Buck, R.C. (1948). The HTP Technique: A Quantitative and Qualitative Scoring Manual. Clinical Psychology Monographs, 5, 1-20.
Burns, R.C. and Kaufman, S.H. (1970). Kinetic Family Drawing (KFD): An Introduction to Understanding Children Through Kinetic Drawings. New York: Brunner-Mazel.
Chapman, N.J. and Chapman, J.P. (1971). Psychology Today.
Diaz, J.O.P., Nogueras, J.A. and Draguns, J. (1984). MMPI (Spanish Translation) Puerto Rican Adolescents: Preliminary Data on Reliability and Validity. Hispanic Journal of Behavior Science, 6(2), 179-189.
Exner, J.E., Jr. (1986). Rorschach: A Comprehensive System, Vol I, Second Edition. New York: Wiley & Son.
Exner, J.E., Jr. (1989). A Rorschach Workbook for the Comprehensive System, Third Ed. Rorschach Research Foundation.
Goodenough, F. (1926). Measurement of Intelligence by Drawings. New York: World Book Company.
Greene, R.L. (1991). The MMPI-2/MMPI, An Interpretive Manual. Boston: Allyn and Bacon.
Gresham, F. (1989). Review of the Parenting Stress Index. In Buros (Ed.), The Tenth Mental Measurements Yearbook. Highland Park, NJ: Gryphon Press.
Gynther, M.D., Fowler, R.D., and Erdberg, P. (1971). False Positives Galore: The Application of Standard MMPI Criteria to a Rural Isolated Negro Sample. Journal of Clinical Psychology, 27, 234-237.
Hammer, E. (1981). Projective Drawings: In A.I. Rabin (Ed.), Assessment with Projective Techniques, New York: Springer Publishing Co.
Herlick, S.D. (1990). California Workers' Compensation Handbook, 10th Edition, Carlsbad, CA: Parker and Son Publication.
Knoff, H.M. (1985). Review of the Personality Inventory for Children. In Buros (Ed.), The Ninth Mental Measurements Yearbook. Highland Park, NJ: Gryphon Press.
Lachar, David (1982). Personality Inventory for Children (PIC) - A Revised Manual. Los Angeles: Western Psychological Services.
Lee, Scott (1994). Personal Communication.
McArthur, D. and Roberts, Glen (1982). Roberts Apperception Test for Children Manual. Western Publishing Services.
Murray, H. (1943). Thematic Apperception Test Manual. Harvard University Cambridge, Mass.
Pope, K.S. and Vasquez, M.J.T. (1991). Ethics In Psychotherapy and Counseling: A Practical Guide for Psychologists. San Francisco: Jossey Bass Publishers.
Rabin, A.I. & Zlotogorski, Z. (1981). Completion Methods: Word Association, Sentence, and Story Completion. In Rabin, A.I. (Ed.), Assessment with Projective Techniques. NY: Springer Publishing Company.
Ziskin, J. and Faust, D. (1988). Coping with Psychiatric and Psychological Testimony, 4th Edition. CA: Law and Psychology Press.