IJT
Electronic Books
Friedemann Pfäfflin, Astrid Junge
Sex Reassignment. Thirty Years of International Follow-up Studies After Sex Reassignment Surgery: A Comprehensive Review, 1961-1991 (Translated from German into American English by Roberta B. Jacobson and Alf B. Meier)
Content
Introduction

Methods
Follow-up Studies
(1961-1991)
Reviews
Table of Overview
Results and Discussion
References


Published by
Symposion Publishing


Chapter 6: Results and Discussion

Research Methods

Independence of the researcher
In the majority of the studies, at least one of the authors participated in the treatment as well as in the follow-up study. Even though many works do not say so explicitly, this holds for almost all works up to the mid-1970s. The claim by Abramowitz (1986) that only Benjamin was an independent follow-up examiner, because he alone was not part of the surgical team, is incorrect, or at least inadequately reasoned. Quite apart from the fact that many authors besides Benjamin did not belong to surgical teams, and that the indication for surgery was made by a psychiatrist (for example, Vogt, 1968), membership in the surgical team is irrelevant to the question of independence. Delegating the decision on the indication is a formal division of labor that may be therapeutically sensible in individual cases, but it does not establish the independence required from a methodological point of view. The same holds, conversely, for all surgeons who relied on an indication made by a psychiatrist but themselves carried out the follow-up studies of the patients they had operated on (e.g., Eicher, 1984; Eicher et al., 1991).

The fact that the follow-up examiner participated in the treatment frequently provokes the objection that the results could be distorted by an uncontrollable desire to justify the treatment. Some authors tried to solve this problem by judging their findings especially critically. They believed that they could thereby avoid distortions or, where distortions were unavoidable, err in the pessimistic direction (e.g., Hastings, 1974). Other working groups formed mixed teams of members who had participated in the treatment and members who had not (e.g., Money & Brennan, 1968; Money & Primrose, 1968; Hunt & Hampson, 1980 b; McCauley & Ehrhardt, 1984; Fahrner et al., 1987; Kockott & Fahrner, 1988; Ross & Need, 1989; Pfäfflin & Junge, 1990). In principle, the dissertations should be counted in this group (Wyler, 1978; Simona-Politta, 1983; Wiegand, 1984; Junge, 1987; Dudle, 1989), as should a series of other publications that seem to stem from dissertations, because the doctoral candidates, perhaps with the exception of Kando (1973), as a rule worked closely with those who had carried out the treatment. Relative independence was achieved in several ways: those who had carried out the treatment did not take part in the follow-up study (e.g., Fahrner, 1987; Junge, 1987); tape-recorded protocols of the follow-up interviews were judged independently, the results compared and the concordances analyzed statistically (e.g., Hunt & Hampson, 1980 b; Pfäfflin & Junge, 1990); or team members who had and had not participated in the treatment analyzed partial samples and then discussed the conclusions reached (e.g., Ross & Need, 1989). In addition, there is a series of publications that, by their authors' own declaration, were carried out entirely independently of the treatment and in which trained assessors gave their judgments independently (e.g., Kuiper, 1985; Lindemalm et al., 1986; Kuiper & Cohen-Kettenis, 1988).
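The statistical analysis of concordance between independent judges mentioned above can be illustrated with a small sketch. The source does not say which statistic the cited studies used; Cohen's kappa is chosen here only as one common measure of chance-corrected agreement between two raters, and the function name and rating labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same cases.

    Illustrative only: the reviewed studies do not state which
    concordance statistic they used.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters gave the same judgment.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of four taped follow-up interviews by two judges.
treating_judge = ["improved", "improved", "unchanged", "worse"]
independent_judge = ["improved", "improved", "unchanged", "worse"]
kappa = cohens_kappa(treating_judge, independent_judge)
```

A value near 1 indicates that a judge who took part in the treatment and one who did not reach essentially the same ratings; a value near 0 indicates agreement no better than chance.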

Meeting the formal requirement of independence is no guarantee of objectivity. The work that invoked this criterion most explicitly (Lindemalm et al., 1986) seems to us just as prejudiced in its selection and evaluation of the referenced literature, its presentation of the follow-up histories and its evaluation of the results. Given that treatment usually lasts for years and is complicated, it seems to us very difficult to obtain exclusively independent evaluations at different points in time. In the course of their treatment the patients pass through many stations at which, for example in the context of a gender identity clinic, the same questions are asked again and again. The patients gain nothing from these additional expert opinions and evaluations, so their willingness to cooperate in further scientific research is hardly encouraged. The follow-up studies by Wyler (1978) and Wiegand (1984) exemplify the low participation rates. The works of Täschner & Wiesbeck (1988) and Wiesbeck & Täschner (1989), which consider the treatment only separately from the evaluation made in accordance with the transsexualism laws, exemplify how superficial the results gained by such a follow-up study are. In our experience (Junge, 1987; Pfäfflin & Junge, 1990), the better the personal relationship with the caregiver had been, the greater the readiness to participate in a follow-up study. The more the patients had experienced during treatment that they were taken seriously, the more they hoped that, by participating, possibly unsatisfactory treatment results could be corrected. The happier the patients are and the better they have been getting along since the end of the treatment, the less they take the trouble to participate in a follow-up study, unless they come out of attachment to those who helped them earlier.
Because of this, a follow-up study by a researcher who is totally independent of the treatment could bias the sample from the outset toward a less favorable result.

Information Sources and Evaluation
The most obvious source of information is the patient, who is questioned orally or in writing and, depending on the scope of the study, undergoes additional examinations. In addition, a series of other sources can be useful. Depending on the methodological approach, the scope and the accessibility of the patients, some studies used different sources for the total sample or for partial samples.

Verbal Questioning: Although one would expect direct questioning of the patients to be a matter of course, this procedure was not used in all follow-up studies, and in some studies not for all patients followed up. Some authors based their evaluations exclusively on hearsay and other indirect communication, either for the entire sample (e.g., Tsoi et al., 1978) or only for the partial sample of patients who could not be reached (e.g., Benjamin, 1964 a, b). Patients living far away were sometimes interviewed only by phone or by correspondence, or friends and relatives (e.g., Benjamin, 1964 a, b; Walser, 1968; Stürup, 1976; Wiegand, 1984) and also fellow patients (e.g., Lothstein, 1980) were questioned unsystematically.

The form the questioning took differs remarkably among the studies. It ranges from telephone interviews, conducted once or spread systematically over a period of a year (e.g., Hunt & Hampson, 1980 b), to visits at the hospital bedside to record immediate post-surgical impressions (e.g., Hastings, 1974); and from psychiatric interviews combined with surgical follow-up examinations (e.g., Stone, 1977) and single (e.g., Stürup, 1976) or repeated unstructured interviews (e.g., Hertz et al., 1961; Benjamin, 1964 a, b; Vogt, 1968; Randell, 1969; Ball, 1981), through partially structured interviews (e.g., Spengler, 1980; Kröhn et al., 1981; Lundström, 1981; Simona-Politta, 1983; Lindemalm et al., 1986; Junge, 1987), to fully structured interviews (e.g., Sörensen, 1981 a, b; Blanchard et al., 1987; Fahrner et al., 1987; Kuiper & Cohen-Kettenis, 1988; Mate-Kole et al., 1990). The time spent varied accordingly: where values were given, it lay between half an hour (Spengler, 1980) and a maximum of nine hours (Simona-Politta, 1983).

One group of authors carried out their studies during an inpatient or partial inpatient admission to a psychiatric clinic (Hoenig et al., 1970 a, b, 1971). Otherwise the follow-up studies were conducted in the respective treatment facilities and, under special circumstances, for example when a patient lived very far away, also in the patients' homes (e.g., Meyer & Reter, 1979; Junge, 1987). Only the works of Kuiper (1985) and Kuiper & Cohen-Kettenis (1988) note that most of the interviews were conducted in the patients' homes and only exceptionally in treatment facilities. Kando (1973) visited some of the women in his sample in their family circle or accompanied them in their subculture.

Written Interviews: Written data collection with a standardized follow-up questionnaire was first described by Kando (1973). In the following years this method was used more frequently (e.g., Wyler, 1968; Lothstein, 1980; Wiegand, 1984; Blanchard et al., 1985; McEwan, 1986; Fahrner et al., 1987; Pfäfflin & Junge, 1990; Eicher et al., 1991).

Standardized Testing Methods: Four groups of psychodiagnostic methods were used in the follow-up studies: (1) intelligence tests (Money & Brennan, 1968; Hoenig et al., 1970 a, b; Täschner & Wiesbeck, 1988 a, b); (2) projective testing methods (Kando, 1973; Wiegand, 1984); (3) various personality inventories, most frequently the MMPI (Hoenig et al., 1970 a, b; Kando, 1973; Sadoughi et al., 1978; Hunt & Hampson, 1980 b; Lothstein, 1980; Kröhn et al., 1981; Fahrner et al., 1987; Junge, 1987; Kuiper & Cohen-Kettenis, 1988; Mate-Kole et al., 1990); and (4) specific tests of gender role stereotypes (Money & Brennan, 1968; Junge, 1987; Mate-Kole et al., 1990). Some works mention that certain, mostly unspecified, test methods were used (e.g., Kröhn et al., 1981) but do not report the test results. Turner et al. (1978), for example, state only in general terms, without naming the testing method used, that it had confirmed the subjective happiness expressed by the patient. Hunt & Hampson (1988) used the MMPI to validate their rating scales. With some other authors, one gains the impression that the selection and application of testing methods were not guided by theory but by what happened to be available in the treatment facility. Presumably one of the reasons why psychological test results are treated so marginally in follow-up studies is that such results are normally published separately (e.g., Mate-Kole et al., 1988; cf. the overview by Lothstein, 1984).

Physical Examinations: Depending on the scope of the study and the attitude of the examiner, physical examinations of varying extent were carried out. Publications from the surgical field as a rule include a physical examination, even if the results are not always presented in detail. Some authors speak generally of physical examinations (e.g., Benjamin, 1964 a; Vogt, 1968; König et al., 1971; Eicher et al., 1991), others of regular endocrinological and surgical check-ups (e.g., Zingg et al., 1980), recording of gynecological and urological local findings (e.g., Kröhn et al., 1981), physical inspections (e.g., Randell, 1969) or photographic documentation of the surgical results (e.g., Junge, 1987). Apart from some reports from the surgical field, detailed somatic findings on the surgical results can be found in the works of McEwan et al. (1986); Blanchard et al. (1987); Ross & Need (1989), who in line with their specific scope also performed urographies; and Pfäfflin & Junge (1990). In one study tissue samples were taken from the vaginas for histological analysis (Kröhn et al., 1981).

Interviewing of Other Persons: As already mentioned, where patients were not available, relatives, friends or fellow patients were interviewed. Some working groups included partners, friends and relatives as a matter of principle (e.g., Benjamin, 1966; Hoenig et al., 1970 a, b; 1971; Hunt & Hampson, 1980 b) or included them when a more or less chance occasion arose (e.g., Stone, 1977; Junge, 1987). Others interviewed, unsystematically, physicians or psychotherapists who had previously treated the patients (e.g., Benjamin, 1966; Lothstein, 1980).

Written Sources: The most frequently used written sources were treatment documents (treatment files, expert opinions, reports on inpatient psychiatric admissions, etc.). As a rule, the files of the first examination in the respective treatment facility served as the basis for the retrospective evaluation of the first pole of the before-and-after comparison (e.g., Hore et al., 1975; Lothstein, 1980; Spengler, 1980; Pfäfflin & Junge, 1990). Some works relied exclusively on the treatment files to evaluate both poles of the comparison (e.g., Wyler, 1978; Dudle, 1989); others used additional methods for this comparison (e.g., Walser, 1968; Lothstein, 1980). To the latter group belong all publications based on longitudinal case records (Wålinder, 1967; Money & Brennan, 1968; König et al., 1970 a, b; 1971; Money & Ehrhardt, 1970; Lundström, 1981; McCauley & Ehrhardt, 1984; Eicher et al., 1991). The most explicit use of secondary sources can be found in the works of Wålinder & Thuwe (1975) and Lundström (1981). They compiled extensive data on all illness records, diagnoses, alcohol dependency, medical retirements, receipt of social welfare, marital status, police records, etc., which in other countries are not registered in the same way or are not accessible because of privacy laws. The evaluations of the socio-economic situation and health condition in these two follow-up studies are therefore based on much more objective data than in other follow-up studies.

Evaluation: The follow-up studies differ in whether the patients evaluated their situation themselves or whether the evaluation was made externally (researcher, relatives, friends, file information, etc.). Most works took both self-evaluations and external evaluations into account but did not clearly distinguish them in the presentation of the results (e.g., Hoenig et al., 1971). There are, however, other publications in which similarly detailed rating scales were developed for both and the results were presented separately (McEwan et al., 1986; Fahrner et al., 1987). In the studies that made global evaluations, the patients' evaluation did not differ substantially from that of the researcher. On some items the researcher was more critical, on others the patient (e.g., Randell, 1969; Kröhn et al., 1981). In a statistical comparison of subjective and external evaluations, Pfäfflin & Junge (1990) found no significant differences in their sample.

Because both self-evaluation and external evaluation can be distorted, intentionally or unintentionally, many studies tried to collect objective data in addition to the subjective evaluations. In contrast to Walser, who emphasized that he had used a total of "86 objective sources" in his sample (Walser, 1968, p. 420), Stürup (1976) expressly dispensed with an elaborate methodological apparatus; he primarily believed the patient and trusted his own judgment as an experienced psychiatrist as much as questionnaires or other differentiated evaluation instruments. He also dispensed with physical examinations because he considered them unsuitable in the context of psychiatric follow-up studies. We recall his position here not because we think it superfluous to develop and use differentiated measuring instruments, but because it expresses an attitude that shows respect for the patient and keeps in view what a follow-up study is meant to achieve and for whom it is good.

Kuiper (1985) as well as Kuiper and Cohen-Kettenis (1988), who for theoretical reasons focused their studies mainly on the subjective evaluation of the patients, compared these data with other "more objective" variables (e.g., place of work, partnership) and were able to demonstrate that the subjective judgments were valid.

Measuring Times: Most follow-up studies are "ex-post-facto" studies, as stated explicitly in the title of the work of Kuiper (1985); that is, there was only one measuring time, and the anamnesis and the evaluation for the start of treatment were obtained retrospectively or taken from earlier treatment documents. This can be methodologically problematic if categories developed for the follow-up evaluation are applied retrospectively to data that were collected unsystematically at the beginning of treatment. On the other hand, the way many authors who have accompanied patients over years and report on their long-term courses formulate their evaluation criteria indicates that they pay attention to particular characteristics when describing and evaluating their patients. A large discrepancy between their evaluation standards at the different evaluation times is therefore not to be expected; rather, it can be assumed that they judged their treatment files retrospectively from viewpoints similar to those applied in the follow-up studies. This is probably true for a series of well-documented longitudinal studies (e.g., Walser, 1968; Vogt, 1968; Randell, 1969; Hoenig et al., 1970 a, b; 1971; Money & Ehrhardt, 1970; Hunt & Hampson, 1980 b; Sörensen, 1981 a, b; McCauley & Ehrhardt, 1984; Pfäfflin & Junge, 1990; Stein et al., 1990) and especially for the study by Wålinder (1967) and Wålinder & Thuwe (1975), which was planned as prospective research from the very beginning. As the studies of Kuiper and Cohen-Kettenis (1988) and Fahrner et al. (1987) demonstrated, the "ex-post-facto" method allows differentiation for more than two points in time, provided the course is sufficiently well documented. Pomeroy (1967); Money & Brennan (1968); Sadoughi et al. (1978); McCauley & Ehrhardt (1984); Lothstein (1980) and Mate-Kole et al. (1990) worked with two or more measuring times at which the same method was used.

Evaluation Criteria: In the strict sense, an external evaluation can rate a follow-up situation as "better" or "worse" relative to the starting situation for many items only if the same instrument and the same criteria were used at both measuring times, which, as we have shown, happened in very few follow-up studies. But there are also many items that, especially with good documentation, can be reliably quantified retrospectively as "more" or "less" (for example, the number of inpatient psychiatric admissions, suicide attempts, length of partnerships, length of unemployment, income, etc.). If absolute evaluations such as "good" or "bad" are used, ceiling and floor effects must be taken into account. That is, an item already rated as good cannot improve substantially, and an item already rated as bad cannot worsen substantially (cf. Simona-Politta, 1983).
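The limit that an absolute rating places on measurable change (commonly called a ceiling or floor effect) can be made concrete with a minimal sketch. The five-point scale, its bounds and the function name are assumptions chosen for illustration; none of the reviewed studies is claimed to have used this scale.

```python
def room_for_change(score, scale_min=1, scale_max=5):
    """Return (max_worsening, max_improvement) for a rating on a bounded scale.

    Hypothetical scale: 1 = "bad" ... 5 = "good". The bounds are
    illustrative assumptions, not taken from any study.
    """
    if not scale_min <= score <= scale_max:
        raise ValueError("score lies outside the rating scale")
    # Distance to the bottom of the scale limits measurable worsening;
    # distance to the top limits measurable improvement.
    return (score - scale_min, scale_max - score)


# An item already rated "good" at the start has no measurable room to improve.
baseline_good = room_for_change(5)
# An item already rated "bad" at the start has no measurable room to worsen.
baseline_bad = room_for_change(1)
```

Under these assumptions, a before-and-after comparison on such an item can register change in one direction only, whatever actually happened to the patient.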

To standardize the evaluation criteria, a few authors developed scaling systems, which we describe in detail under the respective follow-up studies in section 3 (Hastings, 1974; Laub & Fisk, 1974; Wålinder & Thuwe, 1975; Meyer & Reter, 1979; Hunt & Hampson, 1980 b; Lundström, 1981; Lindemalm et al., 1984; Fahrner et al., 1987; Dudle, 1989; Pfäfflin & Junge, 1990). The first scaling system, by Hastings (1974), singled out economic, sexual, social and mental well-being as the four most important areas for evaluation; its predecessors were the evaluation criteria of Benjamin (1964 a), Randell (1969) and Money & Ehrhardt (1970). The system used by Meyer & Reter (1979) is hardly convincing. The point-based evaluation system developed by Hastings (1974) was used by Steiner (1976) and, in altered form, by Laub & Fisk (1974); Junge (1987); Pfäfflin & Junge (1990) and others. The scale by Hunt & Hampson (1988), which proved statistically reliable and, measured against the MMPI, valid, was used frequently; many later follow-up studies adopted it with or without variations or additions.