HOW SHOULD WE RATE RESEARCH UNIVERSITIES?
by Nancy Diamond and Hugh Davis Graham
Planning for the National Research Council's (NRC) next study of research-doctorate
programs in the United States, with publication expected in 2004, has highlighted
disagreements over how quality should be measured. One side in the debate supports
continued reliance on reputational surveys as the primary measure of quality.
On the other side, advocates call for more objective measures of research performance,
as demonstrated in publications, awards, prizes, and other indicators of scientific
and scholarly achievement. Elite institutions favored by the traditional reputational
method generally resist the use of more quantitative per capita measures that
may favor newer, aspiring programs and universities. Like Republicans and Democrats
arguing over which methods to use in conducting the Census, the champions of
subjective and more objective methods know that the choice or mix of methods
will significantly determine who benefits -- and who loses -- from the findings.

The commercial success of college and university rankings published annually by U.S. News & World Report and the 1995 publication of the NRC report, Research-Doctorate Programs in the United States (hereafter, Report), have intensified this debate. (1) The Report contained a wealth of program data, including quantitative indicators of research output. At the same time, however, the NRC ranked faculty and programs exclusively by their reputational rating. This produced top-quartile lists and top-twenty bragging rights that necessarily disappointed many of the 274 institutions whose programs were included in the study. In the competitive academic marketplace, the stakes of this ratings game are high. Top-ranked research-doctorate programs, or those seen to be within striking distance of the top tier, may win increased funding, recruit nationally recognized faculty and talented students, and place their graduates in the academic job market. Conversely, a low ranking can produce program decline and even termination.

The prospect of another national NRC study, the first of the 21st century, has heightened interest in the planning process. Ambitious universities not previously accorded top-tier status are especially open to alternative methods that offer institutional challengers an opportunity, one less influenced by inherited hierarchies of status and prestige, to demonstrate their research achievement. In this article, reference to "rising" or "challenging" institutions denotes universities that were not ranked among the top 25 according to any of the four major national surveys since 1960.

In The Rise of American Research Universities (Johns Hopkins, 1997), we emphasized the importance and value of quantitative per capita measures of scholarly research over reputational surveys.
(2) Because that book charts the research development of more than 200 universities since World War II, we aggregated data at the institutional level at several points over time, rather than at the program level, where national studies sponsored by the American Council on Education (ACE) and the NRC have concentrated their analysis. (3) In this article, we apply the per capita method to program-level data and compare the results with the NRC's reputational ratings of the research quality of program faculty. Our purpose is to test, at the program or department level, our book's dual finding: first, that quantitative per capita assessments confirmed the research excellence of most of the elite universities customarily found among the top 20 when judged according to reputation; second, that per capita measures also demonstrated the superior performance of "rising" institutions, whose achievements often have been masked by the national surveys that ranked campuses according to reputation. On the basis of these comparisons, we offer specific recommendations for how -- and how not -- to rate research universities in the next NRC study.

The Strengths and Weaknesses of Reputational Ratings

Reputational surveys have dominated 20th-century assessments of American faculty and graduate education. Developed during the 1920s and 1930s through the pioneering work of Raymond Hughes, and advanced by Hayward Keniston in the late 1950s, reputational surveys won credibility for three reasons. (4)

First, these evaluations rested on the peer-review principle that scientific, scholarly, and artistic quality is best assessed by recognized experts in the field. Peer review thus represented a qualitative, holistic judgment that also could reflect quantitative measures of research performance.
Since World War II especially, peer review has enjoyed wide respect among academics, as well as government, business, and foundation officials, as the most appropriate method for awarding appointments, promotions, tenure, research grants and contracts, and prizes.

Second, the crucial assumptions underpinning peer review -- that raters are experts who know the body of work or the persons being assessed -- were reasonably met during the early and middle decades of the 20th century, when reputational ratings became the primary evaluation method of the major national studies. Doctoral education prior to World War II was dominated by the prestigious members of the Association of American Universities (AAU), a group of 14 founding campuses whose ranks increased to only 30 institutions in 1940. Even in 1960, the Council of Graduate Schools (CGS), representing institutions that granted 95 per cent of all Ph.D.s, had only 100 member universities. In this still relatively small world of graduate study, the teaching function of doctoral education largely coincided with its research function. Doctoral programs were housed in traditional academic departments, where the faculty generally knew the work of their disciplinary colleagues on other American campuses.

Third, in the absence of alternative, more objective methods of measurement, this legacy of rater familiarity with the research of faculty in their disciplines lent credibility to subjective ratings. Not until the late 1960s and early 1970s did the reporting of federal research funding and developments in electronic data processing, most notably in citation indexing, offer opportunities to measure individual and institutional research output directly, rather than indirectly through the filter of reputation.
(5) At the same time, however, the development of quantitative measures, together with American higher education's dramatic expansion in the 1960s and the larger revolution in communications and research networks, rapidly undermined the institutional arrangements that had earned early respect for reputational ratings. The resulting criticism of reputational assessments generally rests on two grounds. One is based on research in the psychology of human perception, while the other, accelerating in its impact, is based on the rapidly changing research environment of the post-Sputnik era.

The first body of criticism, duly noted in the NRC Report, emerged from the development of survey research in the 1950s and 1960s. It demonstrates that reputational surveys are biased by a halo effect that lifts the reputations of departments and programs with academic stars, and of those located on prestigious campuses. (6) Additionally, reputational ratings are biased in favor of large programs. Raters who recognize three published scholars in a department of forty faculty tend to rate it higher than a department of twenty where only two are recognized. (7)

A second line of criticism, less recognized though more damaging to the validity of reputational ratings, is based on changes that have undermined the very premise that legitimated reputational surveys in the first place. Driven by the defense research imperatives of the Cold War, the unprecedented growth of the American economy, the demographics of the baby boom, and technical advances in communications, the revolution in knowledge creation has radically rearranged our research environment. We have witnessed this great transformation in our lifetimes, and our careers have been enriched by it. Yet we are so intimately caught up in its processes that we need to step back and consider the impact of these changes on the assessment of research achievement. What are the chief attributes of this transformation?
Perhaps most important, research became increasingly specialized, widening the spectrum of inquiry and deepening its penetration. Knowledge creation also grew increasingly interdisciplinary, with a resulting fragmentation of our disciplinary communities. By the 1980s and 1990s, as American universities conferred between 30,000 and 40,000 new Ph.D.s annually, the number of qualified researchers exploded, and quality research spread to second- and third-tier institutions. Research institutes proliferated, as did new scientific and scholarly associations and journals. The entire apparatus of research communications and infrastructure was internationalized. Interdisciplinary research was furthered as publication and research collaboration on the internet and electronic mail communication became instantaneous.

The Peer Review Disconnection

These changes have produced important consequences for the evaluation of research-doctorate programs. Most significant has been a profound split between the university's discipline-based organization for graduate training on the one hand, and the interdisciplinary research networks on the other. As the American research university enters the 21st century, its department-based teaching is still grounded in a horizontal structure that is resistant to change. Departments hire faculty to cover the main subfields of the disciplinary terrain and attend to important organizational routines -- such as promotion and tenure decisions and graduate and undergraduate teaching obligations -- requirements that fix faculty firmly within these traditional arrangements. Our large, discipline-based professional associations continue to publish directories that list faculty rosters by department, and the reputational surveys reflect such arrangements. In 1993, for example, the NRC used a disciplinary focus, asking more than 16,000 respondents to rate the scholarly quality of the faculty in some fifty departments in their fields.
(8) At the same time, faculty research networks that have become increasingly vertical no longer correspond to this horizontal departmental organization. In this constantly changing research environment, specialized, interdisciplinary networks typically connect researchers to only one or two members of their discipline who share their research interests. These networks then branch outward and, with increasing regularity, reach across the globe. Rather than reflecting department directories, faculty research networks more closely resemble our own e-mail address lists.

The result of this growing disconnection between faculty research networks and discipline-based doctoral programs is the loss of expertise from the peer-review equation in reputational surveys. Faculty raters, who know a great deal about the quality of scholarship in their research areas, are asked instead to assess the work of entire faculties and graduate programs in scores of other departments. It is probable that the distortions of the halo effect, always problematical, have been magnified during recent decades as raters have faced departments filled with specialists whose work was unfamiliar to them. Under these circumstances, scholarship was far less important in determining prestige ratings than either the past reputations of departments or affiliated universities.

The most troublesome consequence of continued reliance on reputational surveys is the harm this subjective method inflicts, however inadvertently, on aspiring departments, programs, and institutions. The prestige of established elites appears to act as a filter, screening from view the research achievements of the challengers and depriving them of recognition for accomplishments they have earned. This baneful process in fact may be two-directional, also screening our most prestigious universities from the bracing effects of vigorous competition by challenging institutions.
Comparing Reputational Ratings and Quantitative Measures by Academic Discipline

The argument outlined above, that reputational ratings have grown obsolete and harmful, is plausible but unproven. Indeed, the history of reputational surveys as the mainstay of national university comparisons since the 1920s shows remarkably little research validating their utility as an accurate measure of research quality. The major national studies instead presumed the primacy of reputational surveys as a measure of research quality. This presumption was defensible through the 1960s and early 1970s, when alternative measures of assessment were underdeveloped and there was a loose academic consensus -- one that still exists -- that rankings based on reputational ratings were more or less correct, especially at the top of the research hierarchy. However, to perpetuate this untested assumption in the face of the extraordinary changes that were undermining its premise represented a disappointing standard of scientific rigor.

In the absence of a systematic validation of the most promising subjective and quantitative measures of university research quality against a benchmark standard of excellence, what evidence is available to test the proposition that reputational ratings fail to recognize the research achievements of rising programs and institutions? First, studies that documented research achievement in individual disciplines, especially sociology and political science, have provided a more finely grained analysis of research performance. Such studies produced rankings based on the number of publications, citations, grants, patents, and other research indicators, and compared rankings based on these measures with reputational ratings. (9) In several of these single-discipline studies, especially those that relied on per capita measures, researchers have found a discrepancy between reputational ratings and the levels of research achievement shown by rising departments and programs.
(10) Second, in The Rise of American Research Universities, we demonstrated the same phenomenon at the institutional level. In the public sector we identified 21 rising universities, including the University of California (UC), Santa Barbara and the State University of New York (SUNY) at Stony Brook. In the private sector there were 11 such campuses, including Brandeis and Rochester, institutions whose achievements were underrecognized by the major reputational surveys. The institutional-level focus we employed, designed for a different purpose, does not yield the level of precision available through program-level analysis.

In this article, to extend our analysis, we compare the NRC's reputational ratings with per capita measures of citation and award density. The tables below reflect these comparisons in both individual disciplines and broad fields. The left-hand columns document the NRC reputational rankings of the scholarly quality of program or department faculty, while the right-hand columns reflect rankings based on per capita citation density (or award density for humanities fields). Citation density and award measures were provided in the NRC Report, but were not used by the NRC for ranking purposes. (11) Finally, we compare David Webster's and Tad Skinner's institutional aggregation of the NRC reputational rankings (Change, 1996) with our own grand ranking based on quantitative per capita data.

The Strengths and Weaknesses of Citation Measures

Before discussing the tables, it is important to note the strengths and weaknesses of the per capita citation and award measures. These indicators refer to the number of citations or awards for a given department or program divided by the number of program faculty. Such indicators thereby avoid the problem, common in press-release competition among universities, of conflating quantity with quality by comparing total output data (for annual publications, citations, awards, research dollars, etc.) irrespective of program or institutional size. Per capita indicators offer instead a unit of research productivity that can be compared across programs at institutions of different sizes and types. (12)

The value of citation analysis as an indicator of research quality has been widely acknowledged. (13) Published scholarship varies widely in quality -- roughly half of all scholarly and scientific publications, bibliometricians report, are never cited at all. Ranking university doctoral programs by the frequency with which the published scholarship of their faculty is cited by others thus provides a valuable benchmark of research quality, arguably the best single measure available.

On the other hand, despite its superior value as an indicator of research importance, citation analysis has inherent limitations. It is but a single indicator, and no single indicator, however excellent, is sufficient for measuring the complexity and quality of institutional knowledge creation. There are other drawbacks. The NRC's funding and deadline pressures in the early 1990s, combined with the limited capacities of the Institute for Scientific Information (ISI), publisher of the Citation Index series, produced a database of objective indicators with a level of reliability substantially below that which can be achieved today. (14) In the last NRC study, errors were introduced through misreporting by campus-based Institutional Coordinators, who were assigned the task of providing the number of campus faculty (the denominator in per capita citation density measures). Still other errors involved output data (publications, citations) caused by mistakes in recording names and institutions, in matching zip codes, and in data entry. However, such flaws tend to be randomly distributed, and they produce little significant distortion when aggregated at the level of academic field or institution.
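The per capita logic described above is simple enough to sketch in a few lines of code. The program names and figures below are invented for illustration (they are not NRC data); the point is that dividing total citations by faculty count lets a small, productive program outrank a larger one whose raw totals are higher.

```python
# Per capita citation density: total citations attributed to a program's
# faculty divided by the number of program faculty. All numbers are
# hypothetical, chosen only to illustrate the size adjustment.
programs = {
    "Program A": (4200, 60),  # (total citations, faculty count): big program
    "Program B": (2400, 20),  # smaller program, higher per capita output
    "Program C": (900, 15),
}

density = {name: cites / faculty for name, (cites, faculty) in programs.items()}

# Ranked by density, the smaller Program B leads; ranked by raw totals,
# Program A would finish first.
ranking = sorted(density, key=density.get, reverse=True)
for rank, name in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {density[name]:.0f} citations per faculty member")
```

Ranked by total citations, Program A (4200) would lead; ranked per capita, Program B (120 per faculty member) comes first, which is precisely the distinction the density measure is designed to capture.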
Moreover, the number of arts and humanities awards, collected by the NRC staff, avoided most electronic data processing errors. Although caution is required when comparing programs or departments on the basis of the NRC Report's citation density scores, careful comparisons demonstrate persistent discrepancies between subjective and objective measures of research achievement. In all of the comparisons that follow, the patterns of research performance that emerge are consistent with our research findings that reputational rankings tend to mask the demonstrable research achievements of challenging institutions.

Comparing Reputational and Quantitative Measures of Research Achievement by Discipline

Tables 1 through 5 compare reputational and citation density (or award density for the humanities) rankings for individual disciplines. Table 1 shows rankings for the top 25 programs in astrophysics and astronomy, as representative of fields in math and the physical sciences. We selected astronomy as an illustrative discipline for several reasons. Because only 33 doctoral programs in astronomy and astrophysics were rated by the NRC (as compared, for example, with 179 programs in cell and developmental biology), we reasoned that astronomers are more likely to know one another's work. By implication, members of small research communities should be less vulnerable to the halo-effect distortions of institutional prestige. Thus, the appearance of significant differences between reputation and citation rankings in astronomy reinforces the argument that institutional prestige often distorts collegial perception of research performance.

Table 1 reflects three patterns. The first supports a finding demonstrated in The Rise of American Research Universities: the nation's elite universities that have won the top reputational rankings have earned their enviable status through superior research achievement.
Familiar institutional elites -- Caltech, Princeton, UC Berkeley, Harvard, MIT -- dominate the top ten ranks in both reputation and per capita citation density. However, according to citation density scores (displayed in the right-hand column), certain challenging institutions, either absent from or not highly ranked in the reputational column (displayed at left), rise toward the top of the list. These campuses include Massachusetts-Amherst, UC Santa Cruz, SUNY-Stony Brook, and Colorado.

This dual pattern is repeated throughout the tables that follow. On the one hand, established elite institutions, such as the Ivy League campuses and the great state flagships, are often top-ranked on both reputational and objective measures. At the same time, challenging universities, often younger and smaller institutions such as SUNY-Stony Brook, Brandeis, or the newer UC campuses, break into the upper ranks when measured by their research achievements rather than by a perceived level of prestige.

A third pattern found in Table 1 seems distinctive to the fields of astronomy and astrophysics. Certain universities (Arizona, Hawaii-Manoa) appear to benefit from the prominence of their astronomical observatories, scoring higher on reputation but lower when ranked by citation density.

Tables 2 through 5, comparing disciplines representing the biological sciences, engineering, social and behavioral sciences, and arts and humanities, show similar patterns: high rank in both reputation and per capita measures by traditionally prestigious institutions, high rank by rising institutions on quantitative measures, and certain patterns distinctive to specific disciplines. In cell and developmental biology (Table 2), for example, traditional elites -- MIT, Caltech, and Harvard -- rank high according to both measures.
(15) At the same time, several challenging institutions -- Case Western Reserve, Vanderbilt, Brandeis, and Cincinnati (all of which save Brandeis have a campus medical school) -- break into the top 25 when measured by citation density. Finally, cell biology programs based in medical schools are strongly represented in both rankings. The programs at the Stanford and Colorado medical schools, for example, are ranked higher on both reputational and quantitative measures than their counterparts in the arts and sciences.

Similarly, in the field of electrical engineering (Table 3), all three patterns hold. Proven elites -- Caltech, Princeton, Stanford, MIT -- rank high according to both measures. They are joined in the top 10 quantitative rankings (right-hand column) by rising challengers UC Santa Barbara and SUNY-Buffalo. Third, when ranked according to citation density, the rising research universities include non-flagship land-grant universities -- for example, North Carolina State (which also appears among the reputational top 25) and Colorado State -- a group not strongly represented among the top ranks in other fields.

In social and behavioral sciences disciplines such as history, where publication typically takes the form of books rather than journal articles, citation density is a less reliable indicator. However, economics (Table 4) provides a more typical example. The reputational ranking for economics holds few surprises. It is worth noting, though, that at Caltech, where faculty divisions are not organized according to the NRC's disciplinary taxonomy, a "virtual" economics program assembled by the Institutional Coordinator ranked 19th (of 107 programs) in the NRC's reputational survey. In the eyes of faculty raters, Caltech's exceptional "coattail effect" boosted the reputation of even a program that did not formally exist.
In economics, the top 10 citation density ranking was led by a number of the same elite institutions -- Chicago, Harvard, MIT, Stanford -- found in the reputational top 10, with the striking exception of Maryland-College Park, which jumped to first place in citation density from 20th rank in reputation. Maryland's high per capita ranking demonstrates in part the power of academic stars, in this case College Park economist Mancur Olson, whose widely cited 1971 book, The Logic of Collective Action, created a new analytical paradigm. (16) Aside from Maryland's striking rise (and Caltech's highly regarded "virtual" program), the economics comparison demonstrates a similar pattern of challenging institutions -- Boston University, Rochester, Vanderbilt -- rising in the per capita category.

The final discipline-based comparison ranks programs in philosophy (Table 5) as representative of the arts and humanities. Because no accurate method for measuring book publication was available in the early 1990s, the NRC staff independently compiled a data file of honors and awards received by humanities program faculty. Unfortunately, the Report provides awards data for only a small number of programs, especially when compared with the high volumes of article and citation data that were documented. As a consequence, a small difference in awards per program faculty produced a large difference in ordinal ranking, and also a large number of ranking ties. In our own research for The Rise of American Research Universities, to account for the fact that book publication was not represented, we constructed a similar index for measuring the research productivity of arts and humanities faculty. In a pilot study that documented the relationship between per capita awards and book publication in three humanities disciplines, we found a positive correlation of 0.73. This correlation suggests that the documentation of awards can provide a practical substitute for book publication.
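The kind of correlation check described above is straightforward to reproduce. The sketch below uses invented per-faculty figures (not the authors' three-discipline data) to show how a Pearson coefficient between award density and book-publication density would be computed.

```python
from math import sqrt

# Hypothetical per-faculty figures for six programs -- invented for
# illustration, not the authors' pilot-study data.
awards_per_faculty = [0.10, 0.25, 0.40, 0.15, 0.30, 0.55]
books_per_faculty = [0.30, 0.50, 0.80, 0.45, 0.60, 0.90]

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(awards_per_faculty, books_per_faculty)
print(f"r = {r:.2f}")  # strongly positive for this invented sample
```

A coefficient near the authors' reported 0.73 would indicate, as the article argues, that awards density tracks book productivity closely enough to stand in for it.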
The awards density measure thus tends to be a high-quality, low-quantity indicator, the opposite of such measures as total publications or total research grant dollars. The philosophy comparisons show patterns similar to those in other disciplines, with programs at prestigious universities -- Princeton, Harvard, UC Berkeley, Stanford, Michigan, Cornell, MIT, Chicago, and Brown -- ranked among the top 15 in both reputation and awards density. Yet, consistent with the other disciplinary comparisons (Tables 1-4), challenging institutions -- Illinois-Chicago (ranked 8th), Massachusetts-Amherst, Emory, Notre Dame, and Syracuse -- break into the top 25 quantitative ranking. A distinctive element is represented by Pittsburgh, which placed two top-ranked programs (Philosophy, and History and Philosophy of Science) in both the reputational and award density categories.

Comparing Reputational and Quantitative Measures for Academic Fields

Tables 6 through 10 compare top-20 reputational rankings (provided by Webster and Skinner) and per capita institutional rankings for the five broad fields of study represented by the single disciplines above (Tables 1-5). When performance measures for doctoral programs are aggregated at the level of field rather than discipline, the top 10 ranks in citation density are typically dominated by established elites, with challengers breaking into the second ten ranks. Thus, in the physical sciences and mathematics (Table 6), the challenging universities in the citation density category include Arizona, UC Santa Barbara, Colorado, New York University, and Pittsburgh. In the biological sciences (Table 7), challengers -- UC Irvine, Iowa, and Colorado -- break into the quantitative top 20. In engineering (Table 8), UC Santa Barbara, tied for 16th in the reputational category, soars into third place in the citation density ranking. Syracuse, SUNY-Buffalo, and Rochester are ranked in the second ten.
On the other hand, Purdue, Carnegie Mellon, Georgia Tech, and Penn State, ranked in the reputational top 20, do not appear in the per capita citation top 20. In the social and behavioral sciences (Table 9), established leaders dominate the first 10 places in the citation density column, and challengers, led by 9th-ranked SUNY-Stony Brook, dominate the second 10 ranks. It is striking how many universities highly ranked by reputation -- UC Berkeley, Princeton, Minnesota, Cornell, North Carolina-Chapel Hill, and Illinois-Urbana -- are not included in the top 20 citation density ranking. In the arts and humanities (Table 10), where high rankings are dominated by private institutions, prestigious universities continue to lead the top ranks on both reputational and quantitative measures. Challengers -- UC Davis, Rice, and UC Irvine -- follow in the second 10 of the awards density category.

Institutional Grand Ranking

The final comparison (Table 11) shows the top 50 institutions ranked according to mean reputational score in the left column and by citation and award density in the right column. (17) Not surprisingly, at this level of competition it is more difficult for challengers to break into the top ranks in either category. The established leaders who dominate the reputational rankings tend to be strong across the academic spectrum. This is especially true of the well-endowed private universities, which claim eight of the top 10 positions in both rankings. (The other two universities are the UC campuses at Berkeley and San Diego.) Challenging institutions, in contrast, typically have concentrated their resources on their strongest programs, building what Stanford provost Frederick E. Terman called "steeples of excellence." For such rising institutions (which included Stanford in the 1950s), this strategy seems well designed for breaking into the top ranks -- Terman emphasized that "the steeples be high for all to see."
(18) Thus, the 18 highest-ranked universities in both the reputational and quantitative rankings are all institutions rated in the top 20 in the previous major reputational surveys. Judged according to the more objective citation or award density measure, the challengers appear beginning with UC Santa Barbara, ranked 19th among all universities and 6th among public universities. Successful rising challengers -- Colorado, Washington University, Rochester, UC Irvine, and SUNY-Stony Brook -- rank 21st through 25th respectively. A second block of rising universities is led by Rice and Brandeis, ranked 31st and 32nd respectively in the quantitative per capita density categories.

How Should the Next NRC Study Rate Research-Doctorate Programs?

As the NRC gears up for the next study, measuring the research quality of program faculty is only one item on the Council's planning agenda. Economist Charlotte Kuh, the project director, has been meeting with various administrative and faculty groups to hear their opinions, and to let them know that the next study must prove more useful than its predecessors to nonacademic constituencies, chiefly government policymakers, private foundations, and, perhaps most challenging to reach, business leaders. The new study thus will include more interpretation of data and trends that address the interests and needs of both academic and nonacademic constituencies. Raising funds to support the next study will be difficult, partly because its predecessors are seen as chiefly of interest to academics concerned about the pecking order of institutional and program prestige. In addition, vexing problems of program taxonomy, alluded to in our earlier discussion of the growing mismatch between traditional departmental structures and the increasing interdisciplinary fluidity of the research enterprise, confront the study's planners. The graduate student constituency also needs more useful measures of program effectiveness.
There is little evidence, for example, that prospective graduate students have found the findings from previous national surveys useful beyond the problematical reputational rankings.

Nonetheless, the heart of the next NRC study should remain a comparative assessment of the quality of research in the nation's research-doctorate programs. Leading the world in knowledge production, American universities are crucial to economic growth and competitive success in the global market of the 21st century. We hear this rhetoric all around us and read it in boilerplate promotional literature from campus and corporation alike; yet it is profoundly true. It is therefore important that the first national study of the 21st century succeed where the previous national studies fell short -- by producing a report that documents not only the sustained and merited reputation of traditional elites, but also the new research leadership of rising institutional challengers. Leaders in government, business, and industry, comfortable in their long association with elite programs and campuses, could learn from a new study that the pool of institutional talent is much deeper than it has appeared.

What research design should guide the next NRC assessment of research-doctorate programs? At a June 1999 NRC project planning conference in Washington, D.C., a group of faculty and administrators drawn from the nation's campuses reached a consensus that the new study's design should be guided by the results of a pilot project. This pilot study would intensively examine a sample of representative programs and institutions, measuring multiple indicators of performance (publications, citations, patents, research funding, awards and fellowships, etc.) to establish benchmarks of research quality and program effectiveness. Then, against these standards, a variety of indicators designed to measure the full program universe would be tested to compare their ability to predict the standard.
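One way to read that validation design: each candidate indicator would be scored by how closely its ordering of the pilot sample matches the benchmark ordering. The following is a minimal sketch with entirely hypothetical scores, using Spearman rank correlation as the agreement measure (a plausible choice; the article does not prescribe a particular statistic).

```python
def ranks(values):
    # Rank high-to-low, 1 = best; assumes no ties for simplicity.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

def spearman(xs, ys):
    # Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Benchmark quality scores for five pilot programs, and two candidate
# indicators to be tested against them -- all numbers invented.
benchmark = [9.1, 7.4, 8.2, 5.0, 6.3]
indicators = {
    "citations_per_faculty": [880, 640, 720, 310, 500],
    "total_grant_dollars": [40, 90, 55, 20, 35],  # size-sensitive measure
}

for name, values in indicators.items():
    print(f"{name}: rho = {spearman(benchmark, values):.2f}")
```

In this invented sample the per capita indicator reproduces the benchmark ordering exactly, while the size-sensitive total reproduces it only partially; run across many candidate indicators, comparisons of this kind are what would tell the study's planners which measures best predict the benchmark.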
The methods tested would include measures used in previous studies (reputational survey questions, article publication, citations, humanities awards, etc.) and potential new indicators (publications and citations in leading journals, patents, book publication, and measures of graduate training effectiveness such as job placement).

Should the next NRC study include a reputational survey of faculty? In our Chronicle of Higher Education (June 1999) opinion essay, we answered no -- the reputational survey should be given an honorable burial in the century that gave it birth, that benefited from its maturity, and that witnessed its subsequent decay under the relentless pressures of the knowledge revolution. (19) But a decision should depend substantially on the results of the NRC's pilot study. It is striking that throughout the last century the reputational survey, though employed as the lead rating measure in all the major national studies, has never been systematically validated as a measure of research quality. The NRC's decision on whether to use the reputational survey, moreover, may be determined in part by other factors. These surveys are defended as a social science method providing uniquely holistic, peer-review assessments that reflect the strengths of large programs, and that also provide continuity over time for longitudinal studies. At the same time, there are political factors to be considered -- legitimate concerns when dealing with a quasi-official body performing such a high-stakes service. Universities that traditionally have dominated reputational surveys constitute a powerful lobby for their continued use. The political need of organizations to avoid the controversy inherent in ranking their own members or constituent groups is also a relevant concern.
Assessments of reputation enable organizations such as the NRC and the ACE to claim that the quality ratings were determined by expert members of the constituency, not by the sponsoring research organization.

Whether or not a reputational survey is included in the next study, the NRC must above all avoid repeating the major mistake of the 1995 project -- the listing of all programs as ranked by reputational survey score. This identified the Council, the research arm of the National Academy of Sciences, as an official arbiter of rank in the great academic ratings game. Even were the reputational evaluation not so vulnerable to challenge, the decision to rank programs exclusively by subjective data, rather than to list both subjective and objective program data alphabetically by program (a presentation used in the NRC's 1982 study), stamped a particular pecking order with the NRC's powerful imprimatur.

However cloudy the future of subjective rankings may remain, the promise of effective objective measures of research quantity and quality appears bright. The ISI reports significant advances since the early 1990s in the comprehensiveness and reliability of its data files, and in its ability to match authors, publications, citations, and programs on a large scale. (20) The NRC pilot project will provide an opportunity to develop and test new indicators of scholarly research performance, including measures of publication and citation in leading journals. (21) An indicator documenting book publication would be appropriate for assessing humanities faculty, who have limited engagement in journal publication. (22) The awards indicator for humanities programs, high in promise as a qualitative measure but weakened by low award totals in the 1995 study, can be greatly strengthened by including competitive awards and prizes conferred by academic associations.
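The per capita density measures advocated here (and described in note 11) reduce to a simple computation: divide a program's total citations or awards by its faculty count, then standardize the densities as z-scores so that programs can be compared on a common scale. A minimal sketch in Python, using invented illustrative figures rather than NRC data:

```python
from statistics import mean, pstdev

# Hypothetical programs: (name, total citations, faculty count).
# Values are illustrative only -- not drawn from the NRC Report.
programs = [
    ("Program A", 3200, 40),
    ("Program B", 1500, 25),
    ("Program C", 900, 30),
]

# Per capita density: citations (or awards) divided by faculty count.
densities = {name: total / faculty for name, total, faculty in programs}

# Standardize to z-scores: (density - mean) / standard deviation.
mu = mean(densities.values())
sigma = pstdev(densities.values())
z_scores = {name: (d - mu) / sigma for name, d in densities.items()}

# Rank programs by z-score, highest first.
ranking = sorted(z_scores.items(), key=lambda kv: kv[1], reverse=True)
for rank, (name, z) in enumerate(ranking, start=1):
    print(rank, name, round(z, 3))
```

The per capita step is what distinguishes this approach from raw counts: a large faculty no longer wins simply by aggregate volume.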
Moreover, in assessing graduate program effectiveness, the pilot program may develop and test such revealing measures as time to degree, job placement, and postdoctoral fellowship awards. (23) A study of research-doctorate programs appropriate for the 21st century could be published on the web in a format convenient to consumers. Users might download the data and calculate their own rankings, possibly by using software accompanying the report that allows users to construct composite scoring schemes, similar to those used by the U.S. News rankings, that assign varying weights to selected measures of program performance.

The planning for the next NRC national assessment is facing intense scrutiny because the stakes are unusually high. As evidence accumulates that the tradition of focusing primarily on prestige ratings has masked a successful surge by challenging programs and institutions, the risk remains that the next study could repeat the old pattern. Much of the blame rests with the academic audience itself, which has rushed to embrace or criticize the prestige ratings, even as the sponsoring organizations, the NRC and the ACE, have tried to emphasize the variety of program measures and to resist aggregating program data into grand institutional rankings.

Clark Kerr, taking a longer view, observed (Change, 1991) that the timing of reputational change in American higher education history has coincided with periods of great transformation. The first occurred, Kerr claimed, after the Civil War, when the great private and state research universities were built; the second occurred with the expansion of research activity inspired by federal funding in the wake of Sputnik. In Kerr's view, the period from 1990 to 2010, during which "[a]t least three fourths of the faculties will turn over, and there will be some net additions . . . as enrollments rise," may be another period of significant change in the leadership configuration of America's research universities.
(24) If Kerr is correct, the NRC has a unique opportunity to describe the new alignment and set the standard of evaluation.

Revised June 2000

NOTES

1. National Research Council, Research-Doctorate Programs in the United States: Continuity and Change (Washington, D.C.: National Academy Press, 1995).
2. Hugh Davis Graham and Nancy Diamond, The Rise of American Research Universities: Elites and Challengers in the Postwar Era (Baltimore: Johns Hopkins University Press, 1997).
3. The four major national post-World War II reputational studies are: Allan M. Cartter, An Assessment of Quality in Graduate Education (Washington, D.C.: American Council on Education, 1966); Kenneth D. Roose and Charles Andersen, A Rating of Graduate Programs (Washington, D.C.: American Council on Education, 1970); Lyle V. Jones et al., An Assessment of Research-Doctorate Programs in the United States, 5 vols. (Washington, D.C.: National Academy Press, 1982); and the National Research Council 1995 study cited above.
4. Raymond M. Hughes, A Study of the Graduate Schools of America (Oxford, Ohio: Miami University Press, 1925); Hughes, "Report of the Committee on Graduate Instruction," Educational Record 15: 192-234; Hayward Keniston, Graduate Study and Research in the Arts and Sciences at the University of Pennsylvania (Philadelphia, Pa.: University of Pennsylvania Press, 1959).
5. In the NRC-sponsored studies of 1982 and 1995, which expanded the use of quantitative measures, reputational ratings showed a strong positive correlation with the more objective research indicators. Such high correlations are an expected result when comparing large numbers of research-doctorate programs.
6. James Fairweather, "Reputational Quality of Academic Programs: The Institutional Halo Effect," Review of Higher Education 28, 4 (1988): 345-56; Robert K. Toutkoushian, Halil Dundar, and William E. Becker, "The National Research Council Graduate Program Ratings: What Are They Measuring?"
Review of Higher Education 21, 4 (1998): 315-42.
7. For a review of reputational surveys, see David S. Webster, "Reputational Rankings of Colleges, Universities, and Individual Disciplines and Fields of Study from their Beginnings to the Present," in Higher Education: A Handbook of Theory and Research, vol. 8, ed. John C. Smart, 234-304 (New York: Agathon Press, 1992). See also David L. Tan, "The Assessment of Quality in Higher Education: A Critical Review of the Literature and Research," Research in Higher Education 24, 3 (1986): 223-65, and Clifton F. Conrad and Robert T. Blackburn, "Program Quality in Higher Education: A Review and Critique of the Literature and Research," in Higher Education: Handbook of Theory and Research, vol. 1, ed. John C. Smart (New York: Agathon Press, 1986).
8. The NRC in 1993 sent questionnaires to 16,700 of the 65,470 faculty in the 274 institutions in the study; roughly half (7,900) returned usable questionnaires. The survey's most important rating indicator was "93Q," on which respondents rated the scholarly quality of the program faculty on a scale of 0 to 5, with 0 denoting "Not sufficient for doctoral education" and 5 denoting "Distinguished."
9. These studies began to appear in the late 1960s. See, for example, Lionel S. Lewis, "On Subjective and Objective Rankings of Sociology Departments," American Sociologist 3 (1968): 129-31; W. Miles Cox and Viola Catt, "Productivity Ratings of Graduate Programs in Psychology Based on Publication in the Journals of the American Psychological Association," American Psychologist (October 1973): 793-809. For a more recent view, see James C. Garand and Kristy L. Graddy, "Ranking Political Science Departments: Do Publications Matter?" PS 32, 1 (March 1999): 113-16.
10. See, for example, Richard C. Anderson, Francis Narin, and Paul McAlister, "Publication Rating versus Peer Rating of Universities," Journal of the American Society for Information Science (March 1978): 91-103.
11.
The NRC Report, Appendix P, contains reputational and citation data for all programs, and per capita density measures for citations. The number of awards won by arts and humanities program faculty was also provided in Appendix J. We calculated per capita award density measures for these fields (Tables 5 and 10) by dividing the total number of awards for each university by the number of department or program faculty. The citation and award density scores were then converted into z-scores to standardize the quantitative rankings. In Tables 6 through 11, the reputational scores are drawn from David S. Webster and Tad Skinner, "Rating PhD Programs: What the NRC Report Says ... and Doesn't Say," Change (May/June 1996): 24-44.
12. For a discussion of per capita measures, see Graham and Diamond, Rise of American Research Universities, 55-63.
13. See, for example, Jonathan R. Cole and Stephen Cole, Social Stratification in Science (Chicago: University of Chicago Press, 1973).
14. Research-Doctorate Programs in the United States (1995), Appendix G, 143-46; Brendan A. Maher, "The NRC's Report on Research-Doctorate Programs: Its Uses and Misuses," Change (November/December 1996): 54-59.
15. Rockefeller University and UC San Francisco, ranked second and third, respectively, by reputation, were not included in the citation density ranking for this study. Both institutions had fewer than 11 programs rated in the 1995 NRC study, the minimum number selected for inclusion in our comparison.
16. See Mancur Olson, The Logic of Collective Action (Cambridge: Harvard University Press, 1971). The NRC Report recorded a citation density of 32.3 for the economics program faculty at Maryland, ranked 20th by reputation for scholarly quality, compared with an average citation density of 15.9 for the top 10 economics programs ranked by reputation.
The Report provided a Gini coefficient of 74.7 for the Maryland department, indicating an unusually high concentration of citations on a small number of the program faculty. (The mean Gini coefficient for the top 10 economics programs ranked by reputation is 10.0.) Olson accounted for roughly one fifth of the citations attributed to Maryland's 47 economics faculty members. For a discussion of the Gini coefficient, see Ronald G. Ehrenberg and Peter J. Hurst, "The 1995 NRC Rankings of Doctoral Programs: A Hedonic Model," Change (May/June 1996): 46-50.
17. The reputational rankings are from Webster and Skinner, who ranked 104 institutions with 15 or more programs included in the 1995 NRC study. In calculating the z-scores for citation and award density, we included 110 institutions. To Webster and Skinner's 104 campuses, we added six institutions with fewer than 15 NRC-rated programs: Alabama-Birmingham (13 programs), Brandeis (14), Dartmouth (11), Delaware (13), Georgetown (14), and Tufts (11).
18. Stanford provost Frederick E. Terman, quoted in Roger L. Geiger, Research and Relevant Knowledge: American Research Universities Since World War II (New York: Oxford University Press, 1993), 125.
19. Hugh Davis Graham and Nancy Diamond, "Academic Departments and the Ratings Game," Chronicle of Higher Education, 18 June 1999.
20. Henry Small, "Relational Bibliometrics," in Proceedings of the Fifth Biennial Conference of the International Society for Scientometrics and Informetrics, ed. M.E.D. Koenig and A. Bookstein (Medford, N.J.: Learned Information, 1995), 525-32.
21. In The Rise of American Research Universities, top-journal analysis of publications provided high-quality indicators of research achievement in science, engineering, and the social and behavioral sciences. Analysis of citations in such leading journals offers even greater promise as an indicator of research quality.
The leading journals in a program field may be identified using objective criteria, such as the ISI's list of "Journals Ranked by Times Cited." The problem with top-journal analysis, from the perspective of a quasi-official sponsoring organization such as the NRC, may be less technical than political. Identifying top journals in an NRC study may provoke resentment among academic organizations, subscribers, and researchers associated with excluded journals, who may object that NRC selection amounts to a form of endorsement, providing researchers, especially untenured scientists and scholars, with prestige guideposts on where and where not to seek publication.
22. The inability to link book authors to their academic programs and institutions on other than a hand-count basis has meant that books (other than anthologies, which are included in ISI data) have been excluded from all the major studies. The exclusion of book publication from studies of academic research achievement has been a glaring weakness of the large-scale studies, including, to our regret, The Rise of American Research Universities. There is reason to believe, however, that technical methods are now available to link book authors to their institutions. The pilot study should provide the NRC, and perhaps the ISI, with an opportunity to develop and test such a measure, especially in arts and humanities programs, where book publication is the norm of scholarly output.
23. See Maresi Nerad and Joseph Cerny, "From Rumors to Facts: Career Outcomes of English Ph.D.s: Results from the Ph.D.s-Ten Years Later Study," CGS Communicator 32, 7 (Special Issue, Fall 1999): 1-12.
24. Clark Kerr, "The New Race to be Harvard or Berkeley
or Stanford," Change (May/June 1991): 1.
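The consumer-driven composite scoring scheme proposed above for a web edition of the study -- letting users assign their own weights to selected program measures and compute their own rankings -- can be sketched in a few lines. The measure names, weights, and scores below are hypothetical, chosen only to illustrate the mechanics:

```python
# Hypothetical per-program scores on several measures, normalized to 0-1.
# Names and values are illustrative, not taken from any NRC study.
programs = {
    "Program A": {"reputation": 0.95, "citation_density": 0.60, "placement": 0.70},
    "Program B": {"reputation": 0.70, "citation_density": 0.90, "placement": 0.85},
    "Program C": {"reputation": 0.50, "citation_density": 0.75, "placement": 0.60},
}

def composite_ranking(programs, weights):
    """Rank programs by a weighted sum of their measure scores."""
    total = sum(weights.values())
    scores = {
        name: sum(weights[m] * vals[m] for m in weights) / total
        for name, vals in programs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# A user who trusts objective measures over reputation:
objective_first = {"reputation": 0.2, "citation_density": 0.5, "placement": 0.3}
print(composite_ranking(programs, objective_first))

# A user who weights reputation heavily gets a different order:
reputation_first = {"reputation": 0.7, "citation_density": 0.2, "placement": 0.1}
print(composite_ranking(programs, reputation_first))
```

Different weight vectors produce different winners, which is precisely why the choice of measures and weights in any published ranking is politically charged.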
Table 1
Top 25 Research-Doctorate Programs in Astrophysics and Astronomy
Ranked by Mean Score of Reputation Rating and Citation Density
Reputation | Citations/Faculty
Rank | Campus | 93Q Score | Rank | Campus | Z-Score |
1 | Caltech | 4.91 | 1 | Caltech | 3.486 |
2 | Princeton | 4.79 | 2 | UC-Berkeley | 1.707 |
3 | UC-Berkeley | 4.65 | 3 | UMass-Amherst | 1.622 |
4 | Harvard | 4.49 | 4 | UC-Santa Cruz | 1.134 |
5 | Chicago | 4.36 | 5 | Harvard | 0.724 |
6 | UC-Santa Cruz | 4.31 | 6 | Princeton | 0.716 |
7 | Arizona | 4.10 | 7 | MIT | 0.410 |
8 | MIT | 4.00 | 8 | SUNY-Stony Brook | 0.338 |
9 | Cornell | 3.98 | 9 | Colorado | 0.292 |
10 | Texas-Austin | 3.65 | 9 | Yale | 0.292 |
11 | Hawaii-Manoa | 3.60 | 11 | Minnesota | 0.221 |
12 | Colorado | 3.54 | 12 | Chicago | 0.039 |
13 | Illinois-Urbana | 3.53 | 12 | Cornell | 0.039 |
14 | Wisconsin-Madison | 3.46 | 14 | UCLA | 0.007 |
15 | Yale | 3.31 | 15 | Maryland-College Park | -0.152 |
16 | UCLA | 3.27 | 16 | Arizona | -0.214 |
17 | Virginia | 3.23 | 17 | Texas-Austin | -0.224 |
18 | Columbia | 3.20 | 18 | Stanford | -0.299 |
19 | Maryland-College Park | 3.07 | 19 | Columbia | -0.307 |
20 | UMass-Amherst | 3.04 | 20 | Wisconsin-Madison | -0.469 |
21 | Penn State | 3.00 | 21 | Illinois-Urbana | -0.501 |
22 | Stanford | 2.96 | 22 | Indiana | -0.595 |
23 | Ohio State | 2.91 | 22 | Ohio State | -0.595 |
24 | Minnesota | 2.59 | 24 | Hawaii-Manoa | -0.629 |
25 | Michigan | 2.65 | 25 | Michigan | -0.696 |
Source: National Research Council, Report, 1995, Appendix table L-1.
Note: The 93Q score refers to the NRC reputation rating of scholarly quality
of program faculty on a scale of 0 to 5,
with 0 denoting "not sufficient for
doctoral education," and 5 denoting "distinguished."
Table 2
Top 25 Research-Doctorate Programs in Cell and Developmental Biology
Ranked by Mean Score of Reputation Rating and Citation Density
Reputation | Citations/Faculty
Rank | Campus | 93Q Score | Rank | Campus | Z-Score |
1 | MIT | 4.86 | 1 | MIT | 4.649 |
2 | Rockefeller U | 4.77 | 2 | Stanford Medical | 3.547 |
3 | UC-San Francisco | 4.76 | 3 | UC-San Diego | 3.219 |
4 | Caltech | 4.73 | 4 | Colorado Medical | 2.661 |
5 | Harvard | 4.70 | 5 | Harvard | 2.176 |
6 | Stanford Medical | 4.55 | 6 | Caltech | 2.029 |
7 | UC-San Diego | 4.50 | 7 | Yale | 1.969 |
8 | U of Washington | 4.49 | 8 | Duke | 1.420 |
9 | Washington U | 4.48 | 9 | Princeton | 1.680 |
10 | Yale | 4.37 | 10 | U of Washington | 1.515 |
11 | Princeton | 4.36 | 11 | Washington U | 1.436 |
11 | Stanford (A&S) | 4.36 | 12 | Case-Western Reserve | 1.088 |
13 | UC-Berkeley | 4.15 | 13 | UCLA | 1.045 |
14 | Duke | 4.11 | 14 | UNC-Chapel Hill | 1.032 |
15 | Chicago | 4.10 | 15 | Columbia | 0.880 |
16 | Wisconsin-Madison | 4.05 | 16 | Penn | 0.848 |
17 | UCLA | 3.99 | 17 | Chicago | 0.751 |
18 | Texas-SW Medical | 3.98 | 18 | Vanderbilt | 0.759 |
19 | Columbia | 3.94 | 19 | Johns Hopkins | 0.532 |
20 | Johns Hopkins | 3.91 | 20 | New York U | 0.517 |
21 | New York U | 3.88 | 21 | UC-Berkeley | 0.456 |
22 | Colorado Medical | 3.85 | 22 | Brandeis | 0.386 |
23 | Pennsylvania | 3.81 | 23 | Minnesota Medical | 0.380 |
24 | Baylor Medical | 3.80 | 24 | Cincinnati | 0.374 |
25 | UNC-Chapel Hill | 3.79 | 25 | Illinois-Chicago | 0.355 |
Source: National Research Council, Report, 1995, Appendix table P-7.
Table 3
Top 25 Research-Doctorate Programs in Electrical Engineering
Ranked by Mean Score of Reputation Rating and Citation Density
Reputation | Citations/Faculty
Rank | Campus | 93Q Score | Rank | Campus | Z-Score |
1 | Stanford | 4.83 | 1 | Caltech | 3.861 |
2 | MIT | 4.79 | 2 | Princeton | 3.738 |
3 | Illinois-Urbana | 4.70 | 3 | Stanford | 3.178 |
4 | UC-Berkeley | 4.59 | 4 | UC-Santa Barbara | 2.685 |
5 | Caltech | 4.46 | 5 | MIT | 2.034 |
6 | Michigan | 4.38 | 6 | Columbia | 1.952 |
7 | Cornell | 4.35 | 7 | SUNY-Buffalo | 1.677 |
8 | Purdue | 4.02 | 8 | UC-Berkeley | 1.591 |
9 | Princeton | 4.01 | 9 | Illinois-Urbana | 1.362 |
10 | Southern California | 4.00 | 10 | UC-San Diego | 0.960 |
10 | UCLA | 4.00 | 11 | Pennsylvania | 0.939 |
12 | Carnegie-Mellon | 3.94 | 12 | CUNY | 0.853 |
13 | Georgia Tech | 3.93 | 13 | Northwestern | 0.843 |
14 | Texas-Austin | 3.88 | 14 | Cornell | 0.802 |
15 | Columbia | 3.79 | 15 | Michigan | 0.604 |
16 | Wisconsin-Madison | 3.75 | 16 | North Carolina State | 0.558 |
17 | MD-College Park | 3.75 | 17 | Purdue | 0.527 |
18 | Minnesota | 3.73 | 18 | Brown | 0.456 |
19 | UC-Santa Barbara | 3.71 | 19 | Rochester | 0.375 |
20 | UC-San Diego | 3.57 | 20 | UCLA | 0.359 |
21 | North Carolina State | 3.54 | 21 | Maryland-College Park | 0.288 |
22 | Ohio State | 3.53 | 21 | Texas-Austin | 0.288 |
23 | Rensselaer | 3.44 | 23 | Colorado State | 0.258 |
24 | Polytechnic | 3.42 | 24 | Rice | 0.232 |
24 | U of Washington | 3.42 | 25 | Yale | 0.141 |
Source: National Research Council, Report, 1995, Appendix table P-16.
Table 4
Top 26 Research-Doctorate Programs in Economics
Ranked by Mean Score of Reputation Rating and Citation Density
Reputation | Citations/Faculty
Rank | Campus | 93Q Score | Rank | Campus | Z-Score |
1 | Chicago | 4.95 | 1 | Maryland-College Park | 4.199 |
1 | Harvard | 4.95 | 2 | Chicago | 2.823 |
3 | MIT | 4.93 | 3 | Harvard | 2.743 |
4 | Stanford | 4.92 | 4 | MIT | 2.631 |
5 | Princeton | 4.84 | 5 | UC-San Diego | 2.071 |
6 | Yale | 4.70 | 6 | Boston U | 1.799 |
7 | UC-Berkeley | 4.55 | 7 | Stanford | 1.607 |
8 | Penn | 4.43 | 8 | Rochester | 1.527 |
9 | Northwestern | 4.39 | 9 | UC-Berkeley | 1.495 |
10 | Minnesota | 4.22 | 10 | U of Washington | 1.383 |
11 | UCLA | 4.12 | 11 | Vanderbilt | 1.367 |
12 | Columbia | 4.07 | 12 | Northwestern | 1.303 |
13 | Michigan | 4.03 | 13 | Yale | 1.159 |
14 | Rochester | 4.01 | 14 | Pennsylvania | 1.111 |
15 | Wisconsin-Madison | 3.93 | 15 | Princeton | 1.095 |
16 | UC-San Diego | 3.80 | 16 | Michigan | 0.999 |
17 | New York U. | 3.62 | 17 | Michigan State | 0.983 |
18 | Cornell | 3.56 | 18 | Rice | 0.711 |
19 | Caltech | 3.54 | 19 | UCLA | 0.583 |
20 | Maryland-College Park | 3.80 | 20 | Southern California | 0.519 |
21 | Boston U | 3.39 | 21 | Duke | 0.279 |
22 | Duke | 3.36 | 21 | Wisconsin-Madison | 0.279 |
23 | Brown | 3.34 | 23 | New York U | 0.231 |
24 | Virginia | 3.20 | 24 | Iowa | 0.135 |
25 | UNC-Chapel Hill | 3.16 | 25 | Cornell | 0.087 |
| | | 25 | Kentucky | 0.087 |
Table 5
Top 25 Research-Doctorate Programs in Philosophy
Ranked by Mean Score of Reputation Rating and Award Density
Reputation | Awards/Faculty
Rank | Campus | 93Q Score | Rank | Campus | Z-Score |
1 | Princeton | 4.93 | 1 | Harvard | 3.471 |
2 | Pittsburgh | 4.73 | 2 | Cornell | 2.471 |
3 | Harvard | 4.69 | 3 | Brown | 1.647 |
4 | UC-Berkeley | 4.66 | 3 | MIT | 1.647 |
5 | Pittsburgh* | 4.47 | 5 | Chicago | 1.588 |
6 | UCLA | 4.42 | 6 | Princeton | 1.529 |
7 | Stanford | 4.20 | 6 | UC-Berkeley | 1.529 |
8 | Michigan | 4.15 | 8 | Illinois-Chicago | 1.353 |
9 | Cornell | 4.11 | 9 | Northwestern | 1.176 |
10 | MIT | 4.01 | 10 | Michigan | 1.118 |
11 | Arizona | 3.98 | 11 | Pittsburgh | 1.000 |
12 | Chicago | 3.88 | 12 | UMass-Amherst | 0.882 |
13 | Rutgers | 3.82 | 13 | Pittsburgh* | 0.706 |
13 | Brown | 3.82 | 14 | Columbia | 0.647 |
15 | UC-San Diego | 3.79 | 14 | Indiana | 0.647 |
16 | Notre Dame | 3.69 | 14 | Emory | 0.647 |
17 | UNC-Chapel Hill | 3.67 | 17 | Penn | 0.529 |
18 | Illinois-Chicago | 3.51 | 17 | Duke | 0.529 |
19 | CUNY Graduate School | 3.45 | 17 | Notre Dame | 0.529 |
20 | UMass-Amherst | 3.44 | 20 | Syracuse | 0.412 |
21 | UC-Irvine | 3.30 | 21 | UC-San Diego | 0.235 |
22 | Wisconsin-Madison | 3.28 | 22 | Washington U | 0.176 |
23 | Syracuse | 3.28 | 22 | Penn State | 0.176 |
24 | Ohio State | 3.21 | 24 | UCLA | 0.118 |
25 | Northwestern | 3.18 | 25 | Iowa | 0.000 |
Source: National Research Council, Report, 1995, Appendix table J-9.
*Program in History and Philosophy of Science
Table 6
Top 20 Institutions in Physical Sciences and Mathematics
(of Those That Had at Least Four of the Eight Such Programs Ranked)
Ranked by Mean Score of Reputation Rating and Citation Density
Reputation | Citations/Faculty
Rank | Campus | Mean Score | Rank | Campus | Z-Score |
1 | UC-Berkeley | 4.74 | 1 | Harvard | 17.307 |
2 | MIT | 4.69 | 2 | Caltech | 12.372 |
3 | Caltech | 4.61 | 3 | MIT | 10.595 |
4 | Harvard | 4.50 | 4 | U Washington | 10.263 |
5 | Princeton | 4.48 | 5 | UC-Berkeley | 9.456 |
6 | Cornell | 4.36 | 6 | Princeton | 9.356 |
7 | Chicago | 4.30 | 7 | Stanford | 9.050 |
8 | Stanford | 4.22 | 8 | Columbia | 8.257 |
9 | UC-San Diego | 4.07 | 9 | Arizona | 6.127 |
10 | Texas-Austin | 4.04 | 10 | Johns Hopkins | 6.038 |
12 | UCLA | 3.97 | 11 | UCLA | 5.872 |
12 | Columbia | 3.97 | 12 | Northwestern | 5.735 |
12 | Yale | 3.97 | 13 | UC-San Diego | 5.558 |
14 | U Washington | 3.91 | 14 | UC Santa Barbara | 4.791 |
15 | Illinois-Urbana | 3.89 | 15 | Colorado | 4.162 |
16 | Wisconsin-Madison | 3.81 | 16 | New York U | 3.761 |
17 | Brown | 3.73 | 17 | Yale | 3.687 |
18 | Carnegie Mellon | 3.66 | 18 | Pittsburgh | 3.552 |
19 | Purdue | 3.58 | 19 | Penn | 3.092 |
20 | Rice | 3.56 | 20 | Cornell | 3.081 |
Table 7
Top 20 Institutions in Biological Sciences
(of Those That Had at Least Four of the Eight Such Programs Rated)
Ranked by Mean Score of Reputation Rating and Citation Density
Reputation | Citations/Faculty
Rank | Campus | Mean Score | Rank | Campus | Z-Score |
1 | UC-San Francisco | 4.6 | 1 | Stanford | 21.727 |
2 | MIT | 4.54 | 2 | Harvard | 17.197 |
3 | Harvard | 4.43 | 3 | UC-San Diego | 16.985 |
4 | UC-San Diego | 4.42 | 4 | Caltech | 14.029 |
4 | Stanford | 4.42 | 5 | Yale | 14.015 |
6 | Yale | 4.40 | 6 | MIT | 13.244 |
7 | UC-Berkeley | 4.36 | 7 | Columbia | 9.547 |
8 | Rockefeller U | 4.31 | 8 | U of Washington | 9.205 |
9 | Washington U | 4.19 | 9 | Johns Hopkins | 8.772 |
10 | U of Washington | 4.18 | 10 | UC-Berkeley | 6.092 |
11 | Columbia | 4.15 | 11 | Pennsylvania | 6.018 |
12 | Caltech | 4.07 | 12 | Duke | 5.553 |
12 | Duke | 4.07 | 13 | Michigan | 5.363 |
14 | Wisconsin-Madison | 4.04 | 14 | UCLA | 4.866 |
15 | Pennsylvania | 4.03 | 15 | Washington U | 4.770 |
16 | Chicago | 3.99 | 16 | UC-Irvine | 4.404 |
16 | Johns Hopkins | 3.99 | 17 | Iowa | 3.424 |
18 | Texas-SW Medical | 3.94 | 18 | Colorado | 3.415 |
19 | UCLA | 3.93 | 19 | Chicago | 3.291 |
20 | Baylor Medicine | 3.87 | 20 | UNC-Chapel Hill | 3.043 |
Table 8
Top 20 Institutions in Engineering
(of Those That Had at Least Four of Eight Programs Rated)
Ranked by Mean Score of Reputation Rating and Citation Density
Reputation | Citations/Faculty
Rank | Campus | Mean Score | Rank | Campus | Z-Score |
1 | MIT | 4.65 | 1 | Stanford | 16.553 |
2 | UC-Berkeley | 4.47 | 2 | Caltech | 13.280 |
3 | Stanford | 4.33 | 3 | UC-Santa Barbara | 10.954 |
4 | Caltech | 4.31 | 4 | UC-Berkeley | 9.610 |
5 | Cornell | 4.16 | 5 | Minnesota | 9.126 |
6 | Princeton | 4.13 | 6 | MIT | 8.615 |
7 | Illinois-Urbana | 4.05 | 7 | Princeton | 7.749 |
8 | Michigan | 4.00 | 8 | Northwestern | 7.657 |
9 | UC-San Diego | 3.92 | 9 | Cornell | 7.421 |
10 | Minnesota | 3.15 | 10 | Texas-Austin | 5.536 |
11 | Northwestern | 3.84 | 11 | UCLA | 5.355 |
12 | Purdue | 3.83 | 12 | Johns Hopkins | 5.062 |
13 | Texas-Austin | 3.82 | 13 | Illinois-Urbana | 5.038 |
14 | Carnegie-Mellon | 3.80 | 14 | Syracuse | 2.986 |
15 | Pennsylvania | 3.71 | 15 | Pennsylvania | 2.925 |
16 | UC-Santa Barbara | 3.70 | 16 | SUNY-Buffalo | 2.733 |
16 | Wisconsin-Madison | 3.70 | 17 | Michigan | 2.710 |
18 | Georgia Tech | 3.60 | 18 | Wisconsin-Madison | 1.967 |
19 | UCLA | 3.50 | 19 | UC-San Diego | 1.649 |
20 | Penn State | 3.44 | 20 | Rochester | 1.634 |
Source: National Research Council, Report, 1995; Webster and Skinner, Table 4.
Table 9
Top 20 Institutions in Social and Behavioral Sciences
(of Those That Had at Least Three of the Seven Such Programs Rated)
Ranked by Mean Score of Reputation Rating and Citation Density
Reputation | Citations/Faculty
Rank | Campus | Mean Score | Rank | Campus | Z-Score |
1 | Harvard | 4.61 | 1 | Stanford | 12.135 |
2 | Chicago | 4.56 | 2 | Harvard | 11.776 |
3 | UC-Berkeley | 4.48 | 3 | Chicago | 10.412 |
4 | Michigan | 4.45 | 4 | Duke | 9.885 |
5 | Stanford | 4.43 | 5 | Yale | 9.621 |
6 | Yale | 4.33 | 6 | UCLA | 6.850 |
7 | UCLA | 4.22 | 7 | UC-San Diego | 6.798 |
7 | Princeton | 4.22 | 8 | Michigan | 5.924 |
9 | Wisconsin-Madison | 4.15 | 9 | SUNY-Stony Brook | 5.592 |
10 | Columbia | 3.97 | 10 | U of Washington | 5.047 |
11 | Pennsylvania | 3.94 | 11 | Washington U | 4.652 |
12 | UC-San Diego | 3.78 | 12 | Rochester | 4.509 |
12 | Northwestern | 3.78 | 13 | Johns Hopkins | 3.920 |
14 | Minnesota | 3.76 | 14 | Pennsylvania | 3.713 |
15 | Cornell | 3.67 | 15 | Maryland-College Park | 3.384 |
16 | Duke | 3.63 | 16 | Boston U | 3.366 |
17 | U of Washington | 3.57 | 17 | Northwestern | 3.284 |
18 | UNC-Chapel Hill | 3.55 | 18 | UC-Santa Barbara | 3.112 |
19 | Texas-Austin | 3.53 | 19 | UC-Irvine | 3.007 |
20 | Illinois-Urbana | 3.50 | 20 | Ohio State | 2.199 |
Source: National Research Council, Report, 1995; Webster and Skinner, Table 2.
Table 10
Top 20 Institutions in Arts and Humanities
(of Those That Had at Least Five of the Eleven Such Programs Rated)
Ranked by Mean Score of Reputation Rating and Awards Density
Reputation | Awards/Faculty
Rank | Campus | Mean Score | Rank | Campus | Z-Score |
1 | UC-Berkeley | 4.36 | 1 | Harvard | 19.536 |
2 | Princeton | 4.28 | 2 | Princeton | 12.713 |
3 | Harvard | 4.20 | 3 | Chicago | 11.799 |
4 | Columbia | 4.12 | 4 | Stanford | 11.672 |
5 | Yale | 3.95 | 5 | Johns Hopkins | 10.973 |
6 | Cornell | 3.93 | 6 | Penn | 10.050 |
7 | Penn | 3.88 | 7 | UC-Berkeley | 8.118 |
9 | Chicago | 3.85 | 8 | Northwestern | 6.947 |
9 | Duke | 3.85 | 9 | Columbia | 5.604 |
9 | Stanford | 3.85 | 10 | Cornell | 5.515 |
11 | UCLA | 3.67 | 11 | Brown | 5.021 |
12 | Michigan | 3.66 | 12 | Duke | 4.001 |
13 | UC-Irvine | 3.63 | 13 | UC-Davis | 1.972 |
14 | Johns Hopkins | 3.55 | 14 | Rice | 1.987 |
15 | Virginia | 3.54 | 15 | Michigan | 1.025 |
16 | CUNY Grad School | 3.45 | 16 | UNC-Chapel Hill | 0.598 |
17 | Brown | 3.42 | 17 | UC-San Diego | 0.539 |
18 | Texas-Austin | 3.40 | 18 | Washington U | 0.293 |
19 | UC-San Diego | 3.37 | 19 | UC-Irvine | 0.206 |
20 | Northwestern | 3.23 | 20 | Virginia | 0.146 |
Source: National Research Council, Report, 1995; Webster and Skinner, Table 2.
Table 11
Top 50 Institutions Ranked by Mean Score of Reputation Rating
and Citations or Awards Density of All Programs
Reputation | Citations or Awards/Faculty
Rank | Institution | Mean Score | Rank | Institution | Z-Score |
1 | MIT | 4.60 | 1 | Stanford | 71.137 |
2 | UC-Berkeley | 4.49 | 2 | Harvard | 65.816 |
3 | Harvard | 4.40 | 3 | Caltech | 39.752 |
4 | Caltech | 4.29 | 4 | MIT | 37.386 |
4 | Princeton | 4.29 | 5 | UC-Berkeley | 35.330 |
6 | Stanford | 4.21 | 6 | Johns Hopkins | 34.765 |
7 | Chicago | 4.13 | 7 | Princeton | 32.192 |
8 | Yale | 4.08 | 8 | UC-San Diego | 31.529 |
9 | Cornell | 3.95 | 9 | Chicago | 28.563 |
10 | UC-San Diego | 3.93 | 10 | Yale | 27.931 |
11 | Columbia | 3.92 | 11 | Pennsylvania | 25.798 |
12 | UCLA | 3.85 | 12 | U of Washington | 24.404 |
12 | Michigan | 3.85 | 13 | Columbia | 24.305 |
14 | Pennsylvania | 3.79 | 14 | Northwestern | 23.712 |
15 | Wisconsin-Madison | 3.70 | 15 | UCLA | 21.505 |
16 | Texas-Austin | 3.63 | 16 | Duke | 20.365 |
17 | U of Washington | 3.60 | 17 | Cornell | 17.528 |
18 | Northwestern | 3.58 | 18 | Michigan | 15.813 |
20 | Carnegie Mellon | 3.56 | 19 | UC-Santa Barbara | 12.865 |
20 | Duke | 3.56 | 20 | Brown | 8.173 |
20 | Illinois-Urbana | 3.56 | 21 | Colorado | 8.107 |
20 | Johns Hopkins | 3.56 | 22 | Washington U | 8.007 |
23 | Minnesota | 3.45 | 23 | Rochester | 6.451 |
24 | UNC-Chapel Hill | 3.44 | 24 | UC-Irvine | 5.994 |
25 | Brown | 3.40 | 25 | SUNY-Stony Brook | 5.960 |
26 | New York U | 3.37 | 26 | Minnesota | 4.553 |
27 | UC-Irvine | 3.35 | 27 | Wisconsin-Madison | 3.656 |
28 | Virginia | 3.34 | 28 | New York U | 3.506 |
29 | Purdue | 3.31 | 29 | UNC-Chapel Hill | 3.238 |
30 | Arizona | 3.25 | 30 | Illinois-Urbana | 2.788 |
31 | Rochester | 3.24 | 31 | Rice | 2.585 |
32 | Emory | 3.23 | 32 | Brandeis | 2.120 |
32 | Rutgers | 3.23 | 33 | Utah | 0.922 |
34 | Washington U | 3.22 | 34 | Southern California | 0.894 |
35 | UC-Davis | 3.18 | 35 | Tufts | 0.166 |
35 | Penn State | 3.18 | 36 | Emory | -0.038 |
37 | Ohio State | 3.16 | 37 | Boston U | -0.045 |
38 | Indiana | 3.15 | 38 | Georgetown | -0.047 |
39 | SUNY-Stony Brook | 3.13 | 39 | Iowa | -0.063 |
40 | Rice | 3.11 | 40 | UC-Santa Cruz | -0.431 |
41 | UC-Santa Barbara | 3.08 | 41 | Virginia | -0.803 |
42 | Colorado | 3.05 | 42 | Delaware | -1.290 |
42 | CUNY Graduate School | 3.05 | 43 | Vanderbilt | -1.521 |
44 | Maryland-College Park | 3.04 | 44 | Arizona | -1.865 |
44 | Southern California | 3.04 | 45 | Carnegie Mellon | -1.899 |
46 | North Carolina State | 3.03 | 46 | Case-Western Reserve | -2.488 |
47 | Texas A&M | 3.00 | 47 | Texas-Austin | -2.624 |
48 | Vanderbilt | 2.99 | 48 | UC-Davis | -2.999 |
49 | UMass-Amherst | 2.98 | 49 | UMass-Amherst | -3.270 |
50 | Iowa | 2.97 | 50 | Maryland-College Park | -4.143 |
Source: National Research Council, Report, 1995, Appendix tables P 1-41; Webster and Skinner, Table 1.