This project began as an attempt to develop systematic, measurable indicators of bias in written forensic mental health evaluations of insanity. Although the forensic clinicians observed in this study varied systematically in their report-writing behaviors on several of the indicators of interest, the data are most useful in demonstrating how and why bias is hard to ferret out. The project used naturalistic data (i.e., 122 real forensic insanity reports), which is in some ways a strength. However, given the nature of bias and the problem of inferring whether a particular judgment is biased, the naturalistic data also made it difficult to draw conclusions about bias. This paper describes the nature of bias, including why it is a special problem in insanity evaluations, and why it is hard to study and document. It details the effort to find systematic indicators of potential bias, showing where this effort succeeded and how and why it otherwise failed. We then describe the lessons these efforts yield for future research. We close with a discussion of the limitations of this study and future directions for work in this area.