
The Interaction of Theory and Data

Summary and Keywords

Theory shapes how data is collected and analyzed in at least three ways. Theoretical concepts inform how we collect data because data attempt to capture and reflect those concepts. Theory provides testable hypotheses that direct our research. Theory also helps us draw conclusions from the results of empirical research. Meanwhile, research using quantitative methods seeks to be rigorous and reproducible. Mathematical models develop the logic of a theory carefully, while statistical methods help us judge whether the evidence matches the expectations of our theories. Quantitative scholars tend to specialize in one approach or the other. The interaction of theory and data for them thus concerns how models and statistical analysis draw on and respond to one another. In the abstract, they work together seamlessly to advance scientific understanding. In practice, however, there are many places and ways this abstract process can stumble. These difficulties are not unique to rigorous methods; they confront any attempt to reconcile causal arguments with reality. Rigorous methods help by making the issues clear and forcing us to confront them. Furthermore, these methods do not ensure arguments or empirical judgments are correct; they only make it easier for us to agree among ourselves when they do.

Keywords: theory, data, empirical research, quantitative methods, statistical methods, interaction of theory and data


In an article on whether life exists on Mars, The Economist described the mutual reliance of theory and data as follows:

Observations, which to an outsider might sound like simple things, are often remarkably difficult, and depend on complex models to make any sense at all. Thomas Huxley, Darwin's ally in the fight to get evolution accepted, spoke warmly of the facility with which ugly facts can kill beautiful theories. But that fatal ability should not hide the fact that well-applied theories, beautiful and otherwise, can play a crucial role in deciding which observations get treated as facts in the first place. (The Economist, January 1, 2011, 72)

This chapter considers how theory and data depend on one another in research conducted in the tradition of scientific studies of international processes.

Research using quantitative methods seeks to be rigorous and reproducible. Mathematical models develop the logic of a theory carefully, while statistical methods help us judge whether the evidence matches the expectations of our theories. Quantitative scholars tend to specialize in one approach or the other. The interaction of theory and data for them concerns how models and statistical analysis draw on and respond to one another. How do they work together to advance scientific understanding?

In the abstract, they work together seamlessly. Modelers write down and solve a particular model of an international process. Testable hypotheses are derived from the solution. These hypotheses feed into the empirical work which assesses whether the evidence supports them, directing both the collection of the data and its analysis. When the test supports the hypotheses, the credibility of the model increases, and modelers can explore further development and extensions of that model. When the test contradicts the hypotheses, the modeler goes back and rethinks the model. The empirical researcher holds the modelers to account for reality as reflected in the data. Each step involves substantial creativity, more than this rote description suggests.

In practice, there are many places and ways this abstract process can stumble. This chapter explains those hurdles and illustrates them with examples from the literature. The examples draw on well-known and well-thought-of research to show that the issues here are not the result of shoddy work. The object is not to denigrate any research in the field but rather to lay out some of the many difficulties of making progress in the accumulation of scientific knowledge. Awareness of the difficulties of our research helps to cultivate a healthy skepticism about what we know, a skepticism that improves our understanding by leading us to challenge what seems to be commonly known or established.

Nor should the reader conclude that these issues are unique to quantitative methods. They are general issues in the generation and testing of theory, no matter how either part of that process is conducted. The rigorous approach has the advantages of making both the logic of theoretical arguments and the standards of empirical tests more explicit and transparent to those trained in those methods. The issues here are easier to see in formal models and statistical tests. The imprecision of less formal approaches to theory and tests can obscure these issues but not eliminate them. In this sense, the issues here should be of interest even to those who do not use quantitative methods.

How Does Theory Shape Data?

Theory shapes how we collect and analyze data in at least three ways. Theoretical concepts inform how we collect data because data attempt to capture and reflect those concepts. Theory provides testable hypotheses that direct our research. Theory also helps us draw conclusions from the results of empirical research. I take each in turn.

Incommensurability of Data

Thomas Kuhn (1970) famously argued that all observations are laden with theory. Because different theories conceive of the world differently, how we see phenomena changes with the theory we use to understand the world. As Kuhn put it:

The operations and measurements that a scientist undertakes in the laboratory are not “the given” of experience but rather “the collected with difficulty.” … Far more clearly than the immediate experience from which they in part derive, operations and measurements are paradigm-determined.

(Kuhn 1970, 126)

While not accepting the most extreme interpretations of this position – those that lead to postmodernism – and recognizing that the philosophy of science literature qualified Kuhn's view of theory-laden observation, I still begin with examples and implications of the theory-laden nature of observation for data-based work.

Many of our concepts are drawn from common ideas about world politics, posing the problem of how general understandings of what happens in world politics are sharpened into concepts and measures that can support scientific research. Start with a simple question: what is a "war"? War involves violent conflict, but which conflicts are wars? How much violence is necessary? Some types of violence, such as riots, do not count as wars, but where precisely is the line drawn? Different types of actors, not just legally recognized states, use violence; which of them can fight wars? The answers to these questions lie in the theory that seeks to explain the use of violence in international politics.

J. David Singer, the founder and prime mover of the Correlates of War Project (COW), the most used and influential data collection project in our field, believed that uniform definitions of variables would aid cumulation in scientific research (Singer 1970, 530–3, 537–9).1 Tests of various hypotheses conducted on a common database would allow for comparability of results and judgments about which factors were most strongly interrelated. The compilation of a list of wars was the first aim of COW, although other tasks, such as the definition of the state system and its members, had priority because they were needed to create that list.2 War was defined as sustained combat causing substantial casualties. Combat required that both sides commit organized forces, with each warring party either committing 1,000 soldiers to combat or suffering at least 100 battle deaths. One thousand battle deaths for all sides is the threshold of substantial casualties. This threshold was chosen to ensure that only sustained combat would count as war; violent incidents that killed fewer would not qualify. Wars were categorized by their participants: interstate wars required states, as identified by COW, as warring parties on both sides; extrasystemic wars (now called extrastate wars) involved a state fighting against political entities not recognized as states; and civil wars (now called intrastate wars or nonstate wars) occurred within states.3 One-sided massacres were excluded by the requirement that both sides field organized forces. Singer recruited Melvin Small, a historian, to the project to ensure that the list of wars generated by these rules corresponded to the general consensus of historians about which conflicts were wars. These definitions and their descendants over time are the most used war data in the field.
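To make these coding rules concrete, here is a minimal sketch in Python of the thresholds just described. The field names and the simplified logic are my own illustration of the rules as summarized above, not the official COW codebook implementation.

```python
from dataclasses import dataclass

@dataclass
class Conflict:
    battle_deaths_total: int      # battle deaths, all sides combined
    side_a_troops: int            # troops side A committed to combat
    side_b_troops: int
    side_a_battle_deaths: int
    side_b_battle_deaths: int
    side_a_is_state: bool         # COW system membership
    side_b_is_state: bool

def fields_forces(troops: int, deaths: int) -> bool:
    """A side counts as a warring party if it commits 1,000+ soldiers
    to combat or suffers at least 100 battle deaths."""
    return troops >= 1000 or deaths >= 100

def is_war(c: Conflict) -> bool:
    """Sustained combat with substantial casualties: 1,000+ total
    battle deaths, with both sides fielding organized forces."""
    return (c.battle_deaths_total >= 1000
            and fields_forces(c.side_a_troops, c.side_a_battle_deaths)
            and fields_forces(c.side_b_troops, c.side_b_battle_deaths))

def war_type(c: Conflict) -> str:
    """Categorize a war by its participants, as described above."""
    if not is_war(c):
        return "not a war"
    if c.side_a_is_state and c.side_b_is_state:
        return "interstate"
    if c.side_a_is_state or c.side_b_is_state:
        return "extrastate"       # formerly "extrasystemic"
    return "intrastate/nonstate"
```

Note how the one-sided massacre exclusion falls out of the rules: a party that fields no organized forces fails the `fields_forces` test, so the episode is never coded as a war.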

Underlying these definitions and the data collection was theory that treated war as the result of system structure, in part a reflection of a trend toward general systems theory as an interdisciplinary approach to the social sciences. The initial efforts to explain war, such as Singer et al. (1972), sought to explain how the concentration of power in the system correlated with the amount of war. This view of war as a systemic phenomenon influenced the definition of war in the data collected earlier. A high threshold of battle deaths ensured that small conflicts falling below it would not swamp the major conflicts that shaped the system.4 The amount of war in the system was measured by magnitude (the nation-months of war), severity (the number of battle deaths from all wars), and intensity (the ratio of battle deaths to either nation-months of war or to total population) during a given period, typically five years. If some smaller wars failed to reach the coding thresholds, their omission was unlikely to affect these measures greatly. The systemic focus also allowed for including the extrasystemic wars as part of the amount of war underway.
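The snippet below illustrates these systemic measures for an invented set of wars in one observation window; every figure here is made up, and the point is only how the aggregates are built.

```python
# All war records are invented; a record is (nation_months, battle_deaths).
wars = [(24.0, 15_000), (6.5, 2_100), (60.0, 230_000)]
system_population = 1_500_000_000          # hypothetical system population

magnitude = sum(m for m, _ in wars)        # nation-months of war
severity = sum(d for _, d in wars)         # battle deaths across all wars
intensity_month = severity / magnitude     # deaths per nation-month
intensity_capita = severity / system_population

print(magnitude, severity, round(intensity_month, 1), intensity_capita)
# A 900-death conflict below the coding threshold would change severity
# by under 0.4 percent here: negligible for these systemic measures.
```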

This orientation towards systemic explanations of war should strike many readers as odd because the field has turned away from it in favor of dyadic explanations. War is thought of as the result of how pairs of countries interact, so we should examine conflict and what produces it in the dyad. On the purely empirical side, Bremer (1992) marks the shift towards examining dyads as the primary unit of analysis. The democratic peace also aided the shift to dyadic explanations as it is commonly thought of as a dyadic phenomenon. Finally, the application of game theory to crisis bargaining (cf. Morrow 1989; Fearon 1995) provided modeling tools for examining the logic of interaction in the dyad closely. Even before these developments in method and model, the event data tradition (cf. McClelland and Hoggard 1969; Azar 1980) sought to collect an extensive record of lesser events beyond wars.

The original COW data can be used for dyadic studies even though it was collected to test systemic theories. The multilateral wars can be broken down into dyadic wars, as was done by Stam (1996). Even before the shift of focus away from the system and toward the dyad, the COW project began collecting the Militarized Interstate Dispute data (Gochman and Maoz 1984; Ghosn et al. 2004) that has become a central focus of dyadic studies of dispute onset and escalation. The shift from a research program focused on system structure to one built around dyadic interaction required adjusting the existing war data and collecting additional data; the theoretical focus called for different data.

This adjustment goes beyond just reinterpreting and recoding existing data. The definition of war in the systemic approach centered on the amount of war generated by the system. The dyadic approach conceives of war as violent conflict chosen by both parties. Unlike the systemic approach, where ignoring small wars is inconsequential, the dyadic approach suggests that we should consider any violent conflict where the central authorities of both sides commit forces to combat as a war. For example, the invasion of Panama by the United States in 1989 led to a few days of combat between US and Panamanian forces. The number of battle deaths did not come close to the threshold of 1,000, yet the invasion fits the dyadic view of a war. These less costly wars are particularly important if one wishes to study uses of force where one side has overwhelming power, making it willing to use force because it anticipates low costs (as in Bueno de Mesquita et al.'s 2003 study of foreign imposed regime change).
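A hedged illustration of the contrast between the two coding rules, using a rough placeholder casualty figure for Panama in 1989 rather than an authoritative count:

```python
# Placeholder figures for illustration only, not authoritative counts.
panama_1989 = {
    "battle_deaths_total": 600,             # placeholder: below 1,000
    "both_sides_committed_forces": True,    # US and Panamanian forces fought
}

cow_interstate_war = panama_1989["battle_deaths_total"] >= 1000
dyadic_view_war = panama_1989["both_sides_committed_forces"]
print(cow_interstate_war, dyadic_view_war)   # False True
```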

Although the dyadic approach may have supplanted the systemic one in both models and methods, it imposes its own blinders in how it thinks of the data and how it should be analyzed. Dyadic approaches force us to consider the effects of outside parties as reduced contributions entered into a dyadic calculation. Other recent approaches (e.g., Maoz et al. 2007; Ward et al. 2007) consider a fuller view of multilateral relationships and how they might affect conflict and cooperation in the dyad. These studies use dyads across the entire system to examine how relations outside a dyad might influence those within it.

This brief exploration of how different approaches to understanding war shape the way we collect data illustrates that the problem of incommensurability can be surmounted. The original COW war data can be, and are, used to test the dyadic approach they were not designed to address. Bridging the gap to adjust data to fit a different theoretical approach requires awareness of how the data were collected and of what one wants from the data used in the test.

Source of Testable Hypotheses

No empirical researcher proceeds entirely without preconceptions of what she might find in the data; otherwise, how would we know what data to collect or what patterns to search for? Ideally, hypotheses should be drawn from a model of the process. The model to be tested is solved, producing predicted behavior. The conclusions of the model make predictions about what would happen if the model were true, which we then compare with the patterns of evidence we have. Multiple testable hypotheses are superior to a single hypothesis because of the logic of falsification: successful empirical tests fail to falsify rather than demonstrate truth, and given the indirect nature of such tests, multiple failures to falsify are more convincing than a single, isolated one.

Practically, the process of using models to produce testable hypotheses is more complicated than this simple picture. Models by their very nature are simplified abstractions of a complicated reality. By reducing a common situation to its essence, a model can make clear the logical connections in a causal process and allow us to see how and why the different factors in the model produce the predicted behavior (cf. Powell 1999, Ch. 1 on the use of formal models). Models allow us to consider unobservable variables and analyze how they should affect those we can observe. For example, all rational choice models depend on the actors' preferences, but preferences are inherently unobservable. We assume preferences, commonly allowing for some variation in them, variation we cannot observe. In a crisis bargaining model, for instance, actors prefer winning to a negotiated settlement to losing. The exact attractiveness of a negotiated settlement is assumed and variable, as we would like our models to allow for a variety of actors. We can then see how behavior, which we can observe, varies with observable features of a crisis, such as relative power, and unobservable ones, such as preferences. The model helps us see connections among the things we can observe, and so can frame testable hypotheses about them, while allowing for those we cannot observe.

At the same time, the simplification in models forces us to exclude important elements of the situation or reduce complex reality to forms which can be analyzed. Consider how crisis bargaining models (e.g., Morrow 1989; Fearon 1994) treat the outcomes of a crisis. Most of these games end with a clear outcome of one side or the other winning the crisis, either through a settlement or fighting. If one side concedes, that concession is clear and unqualified. But many disputes end without a clear winner or negotiated settlement. Some threats lapse with time, ending the dispute without a clear outcome. In the MID data, about two-thirds of disputes end without a clear resolution or settlement.5 There are many reasons why disputes might end without a clear concession by either side. If crises create audience costs (Fearon 1994), leaders might choose to avoid clear resolutions that would trigger their audience costs. Targets of threats might ignore them and hope that the threatening state will not escalate further. The threatening leader might be content with issuing the threat even if it is ignored. All of these possibilities lie outside the common structure of formal models. Models simplify away from the many possible outcomes to produce a structure which can be analyzed completely.
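The "about two-thirds" figure can be checked directly against the MID 3.0 counts reported in note 5:

```python
total = 2332                 # MID 3.0 disputes (note 5)
stalemate_or_unclear = 1596
no_settlement = 1734

print(f"no clear resolution: {stalemate_or_unclear / total:.1%}")  # 68.4%
print(f"no settlement:       {no_settlement / total:.1%}")         # 74.4%
```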

The simplification of models means that they do not provide a full account of all the control variables one might include in a statistical model. Consider the audience cost model again (Fearon 1994). Empirical tests of it (e.g., Eyerman and Hart 1996; Partell and Palmer 1999) commonly assume that democracies generate higher audience costs and then show a pattern of evidence consistent with that assumption; they do not provide direct tests of audience costs or of how they vary with regime type or with actions taken in a crisis.6 These tests extract the audience cost hypothesis from the model without referencing what the model predicts. Generically in equilibrium, a crisis in the Fearon model begins with some types of the side with the lower audience costs dropping out immediately, meaning that no crisis is observed. During the crisis, some types of both players end the crisis by surrendering the stakes to the other. These types have the lowest values for war; those with higher values for war never yield. The side with the lower audience costs quits at a higher rate than the side with the higher audience costs.7 Eventually the horizon is reached, where the only types of both players remaining value fighting over backing down, so no remaining type will back down and war is inevitable. The prediction is probabilistic; the side with the higher audience costs may back down before the side with the lower costs. Each side's value for war matters, as the types with higher values for war do not back down. But these values for war are private information and need not track any measurable indicator commonly used to assess which side is likely to win in statistical studies of crisis escalation, such as relative capabilities. The model provides little guidance on what a fully specified empirical test would look like. Further, the model assumes away critical parts of the process: audience costs accumulate automatically with the passage of time during the crisis, so no state leader ever has to "draw a line in the sand" to create them. These observations do not imply that such models are untestable, only that testing them is a difficult process requiring clever work of its own.
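The escalation dynamic can be sketched numerically. The toy simulation below is not Fearon's equilibrium, only a hedged illustration of its comparative static: following note 7, each side's quit hazard is set to offset the opponent's per-period audience cost (h_i = a_j / W, with W the value of winning), so the low-audience-cost side quits more often, while the high-cost side still sometimes backs down first. All parameter values are invented.

```python
import random

rng = random.Random(1)

def simulate_crisis(a1, a2, prize=10.0, horizon=30):
    # Note 7's indifference condition: each side's quit hazard offsets
    # the opponent's per-period audience cost, so h_i = a_j / prize.
    h1, h2 = a2 / prize, a1 / prize
    for _ in range(horizon):
        if rng.random() < h1:
            return "side 1 backs down"
        if rng.random() < h2:
            return "side 2 backs down"
    return "war"   # the horizon is reached and no remaining type yields

# Side 1 has the lower audience costs (a1 < a2) and quits more often,
# but the prediction is probabilistic, and war occurs at the horizon.
outcomes = [simulate_crisis(a1=0.2, a2=0.6) for _ in range(10_000)]
for o in ("side 1 backs down", "side 2 backs down", "war"):
    print(o, round(outcomes.count(o) / len(outcomes), 3))
```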

Data, like models, simplify a complex reality to make analysis possible. Coding rules reduce an individual case to the categories of the data. The issue is not whether we simplify; it is whether we understand how we simplify reality in data and models and how those simplifications shape what we can say about international processes. Both models and data analysis rely on explicit rules, which help us understand how they simplify reality and so which patterns in the data are consistent, and which inconsistent, with formal models.

Does Theory Respond to Data?

Theory developed through formal models is often elegant, at other times obscure and even obtuse, but does it explain what we see in the world? And when it does not, do models change in response to those failures? The answer here, as in the section on how theory shapes data, is "with difficulty at best." Why is it so hard for ugly facts to kill beautiful theories?

When data speak with a clear voice, they can provide the impetus for models. Once the democratic peace was established as a robust empirical pattern, modelers turned to formalizing arguments that could explain it. Some of these models (e.g., Schultz 2001; Bueno de Mesquita et al. 2003) elaborated earlier, informal arguments in novel directions, producing new testable hypotheses. Signal events also redirect the attention of the field, modelers included, to new problems, as when academic interest in terrorism grew after the 9/11 attacks.

Unfortunately, data rarely speak with one clear voice. Most important questions produce many studies, not a single, decisive demonstration of a pattern, and those studies rarely produce the same pattern uniformly. Sorting through the contradictions and variations in results across these divergent studies poses a thorny problem, particularly because most participants have prior beliefs about what patterns should be found in the data. Consider the question of whether disputes and wars raise the risk of removal for the leaders of the countries involved. The first studies (Bueno de Mesquita et al. 1992; Bueno de Mesquita and Siverson 1995) found that defeat raised the risk that the leader of the losing country would be removed, by means fair or foul. Subsequent studies differed on how regime type affected these chances. Bueno de Mesquita et al. (2003) found that the consequences of disputes and wars were substantially larger for democratic leaders than for autocrats. Others (Chiozza and Goemans 2004; Debs and Goemans 2010) found the opposite: the results of conflict had larger effects on the tenure of autocrats than of democratic leaders. There are many differences between these two camps in the specification of the model of leader removal which could explain their divergent results. A deeper problem, however, faces any study that seeks to gauge the effect of international conflict on leader tenure. The effect is the difference between the risk of removal after conflict and that in the absence of conflict, but leaders choose when to engage in conflict; conflicts are not randomly assigned experimental treatments. In particular, democracies win an overwhelming share of the wars they fight, making it difficult to judge the effects of a lost war on the tenure of democratic leaders. Goemans and his co-authors give priority to the comparison with the removal process in the absence of conflict, while Bueno de Mesquita et al. focus on how the selection of crises might influence how we judge the consequences of conflict for leader tenure by examining just the period after a conflict. An ideal statistical analysis would address both issues by controlling for selection into international conflict in the comparison with the general process of leader tenure. The two approaches do agree that winning is good for leaders and losing bad; they disagree on the size of those consequences across political systems. At the time of writing, this question has not been resolved, and so there are no ugly facts.
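The selection problem can be made concrete with a toy simulation. This is a hedged sketch under invented parameters, not a model of any of the studies above: leaders select into wars they expect to win, so the observed post-conflict removal rate understates what a randomly assigned conflict would do to tenure.

```python
import random

rng = random.Random(42)
BASE_RISK, LOSS_PENALTY = 0.10, 0.30   # removal prob.; added if a war is lost

def leader_observation():
    p_win = rng.random()        # leader's privately known chance of winning
    fights = p_win > 0.7        # leaders select into wars they expect to win
    lost = fights and rng.random() > p_win
    removed = rng.random() < BASE_RISK + (LOSS_PENALTY if lost else 0)
    return fights, removed

sample = [leader_observation() for _ in range(100_000)]

def removal_rate(rows):
    return sum(removed for _, removed in rows) / len(rows)

fought = [s for s in sample if s[0]]
peace = [s for s in sample if not s[0]]
print("removal rate after conflict:  ", round(removal_rate(fought), 3))
print("removal rate without conflict:", round(removal_rate(peace), 3))
# The observed gap is modest because selected wars are mostly won,
# not because losing a war is cheap for leaders.
```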

Data can also fail to speak with one voice because of differences in the measures of concepts or the structure of tests. A classic empirical question is the effect of the dyadic balance of capabilities on the likelihood of conflict: are states more likely to fight when they are roughly equal or when one is clearly superior? Early tests (e.g., Garnham 1976; Weede 1976) found that large differences in relative power pacified dyads; others (e.g., Siverson and Tennefoss 1984) found balance more conducive to peace. More recent work (Reed 2003) finds rough equality more dangerous, but only weakly. Yet the measurement of the dyadic balance is not straightforward. Should the possible contributions of third parties be considered, and if so, how do we discount for the chance they may remain neutral? The precise calculation of relative capabilities also matters; two common ways of calculating it, dividing the capabilities of one side by those of the other versus taking the stronger side's fraction of their total capabilities, produce very different measures. When one side is very strong, the former measure grows without bound, while the latter approaches 1. How does distance affect the calculation of relative power, given that power declines over the distance it must be projected? Which conflicts should be counted? These difficulties obscure any clear judgment about the effect of the dyadic balance on war and peace. The same lack of clarity faces the theoretical expectation that there is no relationship between the dyadic balance and the likelihood of conflict (Powell 1999); what matters instead is the disparity between the dyadic balance and the status quo. It is more difficult to demonstrate the absence of a pattern than the presence of one.
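The divergence of the two measures is easy to see in a few lines; the capability figures below are invented, and only the contrast between the measures matters.

```python
dyads = [(50, 50), (75, 25), (90, 10), (99, 1)]
for strong, weak in dyads:
    ratio = strong / weak               # grows without bound
    share = strong / (strong + weak)    # bounded above by 1
    print(f"{strong}:{weak:>2}  ratio = {ratio:6.1f}  share = {share:.2f}")
```

At 99:1, the ratio measure reports 99.0 while the share measure reports 0.99; statistical results can hinge on which scale the researcher chooses.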

Arguments develop through chains of models, in which each subsequent model varies and generalizes some aspect of an earlier one. A line of argument is not dropped so long as further development could address an anomaly. Continuing the example of how the dyadic balance of capabilities affects the likelihood of conflict, the argument for the lack of a relationship hinges on the degree of uncertainty about that balance. A side can make credible threats to use force when it may see conflict as preferable to the status quo; that is, when its perceived chance of winning minus the costs of war is better for it than living with the status quo. Powell (1999) calls such a side potentially dissatisfied, and only one side can be potentially dissatisfied in his model. The status quo is stable when neither side is potentially dissatisfied. The spread of perceptions of the balance matches the uncertainty of the situation. Reed (2003) argues that the degree of uncertainty about the balance is greater when the sides are roughly equal in capabilities than when one side is dominant. This increase makes it more likely that one side is potentially dissatisfied, and so conflict is more likely under rough equality than under preponderance. The argument about how the dyadic balance contributes to conflict has thus been developed to explain a weak tendency of balance to produce conflict. These chains of models elaborating an argument are both a primary tool of progress and a way to frustrate attempts to end an argument by showing it does not explain the data. Because arguments can and must develop to account for anomalies, it is difficult to produce an empirical result that cannot be explained and so forces us to look for new arguments instead of extending the old ones. As some in software development say, "It's not a bug; it's a feature."
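A minimal sketch of this logic, with invented parameter values: a side is potentially dissatisfied when some perception of its chance of winning p makes war (p minus costs c) better than its status quo share q, and Reed's point enters as a wider spread of perceptions near parity.

```python
def potentially_dissatisfied(p_high: float, cost: float, q: float) -> bool:
    """True if the most optimistic perception of the chance of winning,
    p_high, makes war (p - cost) better than the status quo share q."""
    return p_high - cost > q

# Near parity, perceptions spread widely around 0.5 ...
print(potentially_dissatisfied(p_high=0.7, cost=0.1, q=0.5))    # True
# ... under preponderance, the weaker side's perceptions cluster low.
print(potentially_dissatisfied(p_high=0.15, cost=0.1, q=0.1))   # False
```

The wider spread near parity makes it easier for some perception to clear the threshold, which is why the extended argument predicts a (weak) association between rough equality and conflict.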

Unmeasurable concepts complicate the problem of falsifying arguments further. Some of the concepts common to our theory are unobservable by their nature. An actor's preferences are one such concept, as are the types of a player in game theory. Many models use these unobservable concepts as critical building blocks of the argument and model. One strength of formal models is that they allow us to examine rigorously how unobservables could influence behavior. Although we may not be able to verify an actor's preferences completely through observation, we can show how its behavior varies with them. This unobservable variation can undermine the ability to falsify arguments by providing a way to explain away discrepant evidence. It is also difficult to infer the effects of unobservables from data because we cannot directly investigate how they are correlated with the things we can observe.

Making theory respond to data is, as we have seen, a messy business. Data rarely speak with one clear voice, models can and should be altered to account for anomalies, and while models allow us to examine how unobservables could affect behavior, they are still, well, unobservable. These difficulties, however, do not absolve us of the obligation to try to compare the conclusions of our models with the real world, no matter how imperfectly we observe it.


Conclusion

One might conclude from this chapter that getting the two tribes of the scientific study of international processes, the formal modelers and the statistical mavens, to confront one another and so improve both their research is a hopeless exercise, and that all these tools fail to bridge the gap. These difficulties are not unique to rigorous methods; they confront any attempt to reconcile causal arguments with reality. Rigorous methods help by making the issues clear and forcing us to confront them. Informal arguments and casual use of cases face the same difficulties but can obscure their presence.8 Rigorous methods do not ensure that arguments or empirical judgments are correct; they only make it easier for us to agree among ourselves when they do.

The movement for the empirical investigation of theoretical models, known commonly as EITM, seeks to bring models and empirics into closer correspondence. Two early examples approach this fusion by focusing first either on the model or on the structure of the data. Signorino (1999) derives a statistical distribution of outcomes from the structure of a simple extensive form game, and so has the model drive the statistical analysis. With this technique, we might worry that our results depend on the specific model assumed, since we rarely think that any one model is the true model of the process. Signorino and Yilmaz (2003) show that strategic interaction creates misspecification if we use statistical models that do not correct for it. Smith (1999) derives what we can conclude about a case from the structure of the data. For example, disputes in which only one side threatens to use force allow us to conclude that the side that did not threaten must place a low value on the use of force, while it is difficult to know how willing the first side was to use force. These inferences also allow us to place restrictions on the likelihood function to be estimated. Both examples bring model and method closer together by considering explicitly how the two should fit together, a goal we can all endorse.
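A minimal sketch in the spirit of Signorino (1999): outcome probabilities built up from a two-move extensive-form game with logit ("agent error") choice probabilities. The payoffs and game here are invented for illustration; this is the construction his estimator builds on, not the estimator itself.

```python
import math

def logit(u_first, u_second):
    """Probability of the first action under logit choice."""
    e1, e2 = math.exp(u_first), math.exp(u_second)
    return e1 / (e1 + e2)

# Game: player 1 chooses Challenge or Status Quo; if challenged,
# player 2 chooses Resist (war) or Yield (concession).
u2_resist, u2_yield = -1.0, -0.5          # player 2's payoffs (invented)
u1_war, u1_win, u1_sq = -2.0, 1.0, 0.0    # player 1's payoffs (invented)

p_resist = logit(u2_resist, u2_yield)     # 2's choice probability
# Player 1 anticipates 2's probabilistic response when choosing.
u1_challenge = p_resist * u1_war + (1 - p_resist) * u1_win
p_challenge = logit(u1_challenge, u1_sq)

print({"Status Quo": 1 - p_challenge,
       "War": p_challenge * p_resist,
       "Concession": p_challenge * (1 - p_resist)})
```

Because the outcome probabilities inherit the game's structure, a regression that ignores that structure is misspecified, which is the point Signorino and Yilmaz (2003) develop.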


References

Azar, E.E. 1980. The Conflict and Peace Data Bank (COPDAB) Project. Journal of Conflict Resolution 24(1), 143–52.

Bremer, S.A. 1992. Dangerous Dyads: Conditions Affecting the Likelihood of Interstate War, 1816–1965. Journal of Conflict Resolution 36(2), 309–41.

Bueno de Mesquita, B., and Siverson, R.M. 1995. War and the Survival of Political Leaders: A Comparative Study of Regime Types and Political Accountability. American Political Science Review 89(4), 841–55.

Bueno de Mesquita, B., Siverson, R.M., and Woller, G. 1992. War and the Fate of Regimes: A Comparative Analysis. American Political Science Review 86(3), 638–46.

Bueno de Mesquita, B., Smith, A., Siverson, R.M., and Morrow, J.D. 2003. The Logic of Political Survival. Cambridge, MA: MIT Press.

Chiozza, G., and Goemans, H.E. 2004. International Conflict and the Tenure of Leaders: Is War Still "Ex Post" Inefficient? American Journal of Political Science 48(3), 604–19.

Debs, A., and Goemans, H.E. 2010. Regime Type, the Fate of Leaders, and War. American Political Science Review 104(3), 430–45.

Eyerman, J., and Hart, R.A. 1996. An Empirical Test of the Audience Cost Proposition. Journal of Conflict Resolution 40(4), 597–616.

Fearon, J.D. 1994. Domestic Political Audiences and the Escalation of International Disputes. American Political Science Review 88(3), 577–92.

Fearon, J.D. 1995. Rationalist Explanations for War. International Organization 49(3), 379–414.

Garnham, D. 1976. Power Parity and Lethal International Violence, 1969–1973. Journal of Conflict Resolution 20(3), 379–94.

Ghosn, F., Palmer, G., and Bremer, S. 2004. The MID3 Data Set, 1993–2001: Procedures, Coding Rules, and Description. Conflict Management and Peace Science 21, 133–54.

Gochman, C.S., and Maoz, Z. 1984. Militarized Interstate Disputes, 1816–1976: Procedures, Patterns, and Insights. Journal of Conflict Resolution 28(4), 585–616.

Kuhn, T.S. 1970. The Structure of Scientific Revolutions, 2nd edn. Chicago: University of Chicago Press.

Maoz, Z., Terris, L.G., Kuperman, R.D., and Talmud, I. 2007. What Is the Enemy of My Enemy? Causes and Consequences of Imbalanced International Relations, 1816–2001. Journal of Politics 69(1), 100–15.

McClelland, C.A., and Hoggard, G.D. 1969. Conflict Patterns in the Interactions among Nations. In J.N. Rosenau (ed.) International Politics and Foreign Policy. New York: The Free Press, pp. 711–24.

Morrow, J.D. 1989. Capabilities, Uncertainty, and Resolve: A Limited Information Model of Crisis Bargaining. American Journal of Political Science 33(4), 941–72.

Organski, A.F.K. 1968. World Politics, 2nd edn. New York: Knopf.

Partell, P.J., and Palmer, G. 1999. Audience Costs and Interstate Crises: An Empirical Assessment of Fearon's Model of Dispute Outcomes. International Studies Quarterly 43(2), 389–405.

Powell, R. 1999. In the Shadow of Power. Princeton, NJ: Princeton University Press.

Reed, W. 2003. Information, Power, and War. American Political Science Review 97(4), 633–41.

Sarkees, M.R. n.d. The COW Typology of War: Defining and Categorizing Wars (Version 4 of the Data). At, accessed September 2011.

Sarkees, M.R., and Wayman, F. 2010. Resort to War: 1816–2007. Washington, DC: CQ Press.

Schultz, K.A. 2001. Democracy and Coercive Diplomacy. New York: Cambridge University Press.

Signorino, C.S. 1999. Strategic Interaction and the Statistical Analysis of International Conflict. American Political Science Review 93(2), 279–97.

Signorino, C.S., and Yilmaz, K. 2003. Strategic Misspecification in Regression Models. American Journal of Political Science 47(3), 551–66.

Singer, J.D. 1970. From a Study of War to Peace Research: Some Criteria and Strategies. Journal of Conflict Resolution 14(4), 527–42.

Singer, J.D. 1975. Cumulativeness in the Social Sciences: Some Counter-Prescriptions. PS: Political Science and Politics 8(1), 19–21.

Singer, J.D., and Small, M. 1972. The Wages of War: A Statistical Handbook. New York: Wiley.

Singer, J.D., Bremer, S., and Stuckey, J. 1972. Capability Distribution, Uncertainty, and Major Power War, 1820–1965. In B. Russett (ed.) Peace, War, and Numbers. Beverly Hills, CA: Sage, pp. 19–48.

Siverson, R.M., and Tennefoss, M.R. 1984. Power, Alliance, and the Escalation of International Conflict, 1815–1965. American Political Science Review 78(4), 1057–69.

Smith, A. 1999. Testing Theories of Strategic Choice: The Example of Crisis Escalation. American Journal of Political Science 43(4), 1254–83.

Snyder, J., and Borghard, E.D. 2011. The Cost of Empty Threats: A Penny, Not a Pound. American Political Science Review 105(3), 437–56.

Stam, A.C. 1996. Win, Lose, or Draw. Ann Arbor: University of Michigan Press.

Vasquez, J.A. 1997. The Realist Paradigm and Degenerative versus Progressive Research Programs: An Appraisal of Neotraditional Research on Waltz's Balancing Proposition. American Political Science Review 91(4), 899–912.

Waltz, K.N. 1979. Theory of International Politics. New York: Random House.

Ward, M.D., Siverson, R.M., and Cao, X. 2007. Disputes, Democracies, and Dependencies: A Reexamination of the Kantian Peace. American Journal of Political Science 51(3), 583–601.

Weede, E. 1976. Overwhelming Preponderance as a Pacifying Condition among Contiguous Asian Dyads, 1950–1969. Journal of Conflict Resolution 20(3), 395–411.


Notes

(1.) See Singer (1975) for a sharp-tongued and often humorous response to those who criticized his view of the value of rigorous data collection.

(2.) Singer and Small (1972) was the first full presentation of the war data along with complete description of the definition of war and statistical patterns in the data. Sarkees and Wayman (2010) is the most recent complete treatment of the war data.

(3.) See Sarkees (n.d.) for a careful summary of how COW categories of wars have changed over time.

(4.) I note that neorealism (Waltz 1979) and power transition theory (Organski 1968), two other systemic theories of that time, also gave precedence to major, systemic conflicts over smaller wars among minor powers.

(5.) From the MID 3.0 data, 1,596 of 2,332 disputes end in either a stalemate or unclear outcome and 1,734 of those 2,332 end with no settlement.

(6.) See Snyder and Borghard (2011) for a recent critique of audience cost arguments from cases.

(7.) The quit rate of a player has to balance the added audience cost of its opponent in order for the opponent to be indifferent between continuing and quitting at any instant during the crisis.

(8.) For example, see Vasquez (1997) for a criticism of neorealism and the many responses to his critique.