Confirmation, Heuristics, and Explanatory Reasoning

 


Thus natural science appears completely to lose from sight the large and general questions; but all the more splendid is the success when, groping in the thicket of special questions, we suddenly find a small opening that allows a hitherto undreamt of outlook on the whole.

 

-- Ludwig Boltzmann, Theoretical Physics and Philosophical Problems


 

ABSTRACT

 

Recent work on inference to the best explanation has come to an impasse regarding the proper way to coordinate the theoretical virtues in explanatory inference with probabilistic confirmation theory, and in particular with aspects of Bayes's Theorem. I argue that the theoretical virtues are best conceived heuristically and that such a conception gives us the resources to explicate the virtues in terms of ceteris paribus theorems. Contrary to some Bayesians, this is not equivalent to identifying the virtues with likelihoods or priors per se; the virtues may be more accessible epistemically than likelihoods or priors. I then prove a ceteris paribus theorem regarding theoretical consilience, use it to correct a recent application of Reichenbach's common cause principle, and apply it to a test case of scientific reasoning.

 

1 Explanation and confirmation

2 The heuristic conception of theoretical virtues

3 Abduction and the accessibility of explanatory power

4 Evidential and theoretical consilience

5 A test case: gravitational lensing

6 Conclusion

 

1 Explanation and Confirmation

Explanatory reasoning, construed with sufficient latitude, seems to be a pervasive feature of both scientific practice and mundane life. Doctors infer that their patients have pneumonia from an examination of their symptoms and mothers infer that their children have been stamping in mud puddles from an examination of their shoes: what both inferences seem to have in common is that the conclusion explains the observed facts and does so better than any other available explanation. Between the ubiquity of explanatory arguments and the difficulties besetting other accounts of uncertain reasoning, some enthusiastic philosophers have been moved to claim that the only primitive rational form of non‑deductive inference is inference to the best explanation (Harman [1973]; Foster [1985], p. 227).

But when explanationists try to spell out inference to the best explanation (IBE) in a more precise form and show how it provides a distinctive form of non-deductive inference, they find themselves hard pressed. The standard model cites various 'theoretical virtues' such as simplicity, consilience and precision that make some hypotheses more 'lovely' than others; ostensibly, the presence of such virtues enables us to select among rival hypotheses (Thagard [1978]; Lipton [1991]). But attempts to bring such theoretical virtues into clear focus without explicitly invoking the probability calculus have not advanced the discussion beyond a clash of intuitions regarding their epistemic relevance, leaving critics with the impression that 'inference to the best explanation' remains a slogan rather than an accurate characterization of any form of non-demonstrative inference (Salmon [2001a], p. 60).

Recent work on explanatory inference has increasingly focused on coordinating the theoretical virtues with the prior probabilities and likelihoods required for Bayesian inference. The leading idea here is that explanatory considerations may in some fashion be a guide to high posterior probability -- that 'loveliness' can be a guide to 'likeliness', and that the Bayesian and the explanationist should be friends (Lipton [1991], pp. 61-4, [2001], p. 94). This strategy is a promising way to address the relevance problem. But in many cases it is difficult to determine in a principled way whether a given virtue makes its contribution to the prior probability of a hypothesis or to its likelihood; there is no straightforward sense in which those categories match the virtues. And the strategy risks trivializing IBE by reducing it to unvarnished probabilistic confirmation theory. As Salmon notes in a piece of friendly criticism, 'it seems that the Bayesian has more to offer to the explanationist than vice-versa' (Salmon [2001b], p. 130).

I shall argue here that we can make progress by reconstructing explanatory reasoning in heuristic rather than inferential terms and by a more nuanced probabilistic representation of theoretical virtues than previous writers have offered. The resulting picture of explanatory reasoning is less sweeping than some enthusiastic partisans of IBE have in mind. Many of the virtues are local, in the sense that they apply to the relation between hypotheses and particular pieces of evidence in situ rather than to those hypotheses considered in vacuo; and although they are typically found in explanatory contexts, they may also apply in cases of reasoning where we are disinclined to say that we are explaining the data. But by taking a more modest approach that deploys explanatory reasoning 'in the thicket of special questions', we can meet the challenge of relevance, demonstrate how explanatory virtues may provide genuine guidance, and account for the utility of explanatory reasoning even in contexts where there is insufficient information to calculate a posterior probability. In the course of this reconstruction we shall also discover that the explanationists can learn something about theoretical virtues from the Bayesians, and the Bayesians something about the use of probabilities from the explanationists.

2 The Heuristic Conception of Theoretical Virtues

When it is phrased incautiously, the prescription that we should always infer to the best explanation is open to an obvious objection: frequently the evidence is not strong enough to warrant an inference to any of the available explanations. If IBE is understood as a rule that enjoins belief in the best available explanation regardless of the paucity of evidence in that explanation's favor, then common sense will often prevent us from applying it (van Fraassen [1989], pp. 149-50). Departures from common sense here can give rise to the egregious fallacy of inferring the truth of an explanation merely because it is the only one on offer (Salmon [2001a], pp. 83-4).

Mindful of this difficulty, Peter Lipton concedes that IBE only sanctions inference when the 'best' is good enough; sometimes, the correct response to the data is agnosticism (Lipton [2001], p. 104). As a response to the paucity objection this seems wholly reasonable. But if the value of the theoretical virtues is bound up with their capacity to license inferences, then Lipton's concession has unexpected dimensions. It is tantamount to saying that when the data are sparse, the virtues are worthless: they can do no epistemic work until there are enough of them in place to justify a definite conclusion. And this in turn undermines the explanationist programme of showing how the explanatory virtues may be a guide in the process of inquiry rather than merely a means of describing its results.

A parallel objection can be raised against a certain conception of Bayesian inference. It is a commonplace, for example, that a high likelihood P(E/H) alone does not warrant the ascription of a high posterior probability P(H/E). But no one complains that this makes high likelihood useless. Instead, Bayesians make a tacit distinction between probabilistic inference -- usually modeled in terms of conditionalization -- and probabilistic reasoning, which may take place in the absence of crucial parameters necessary for conditionalization.[1] Acknowledging that a high likelihood is insufficient grounds for embracing H, they nevertheless persist in taking high likelihood seriously, and with good reason. For ceteris paribus, likelihood and posterior probability go hand in hand: the higher the one, the higher the other. Even in the absence of precise priors and expectedness, high likelihood is one of the features by which we separate promising hypotheses from those not worth pursuing.

This suggests a way for the explanationist to retrench. Theoretical virtues may help us to sift through the logical space of hypotheses even when they do not pick out a clear favorite. At this level explanatory considerations will have a heuristic role, and the charge that they do not always suffice to warrant an inference will be no more damaging to explanationists than the parallel charge regarding high likelihood is to the Bayesians.[2]

If this model of IBE is to offer any guidance in actual inquiry, loveliness must not simply be identified with a high posterior probability; for that would trivialize the project.[3] But it should also move us beyond the vague suggestion that considerations of simplicity, consilience and the like 'contribute' to priors and likelihoods.[4] So long as the connection between loveliness and likeliness is left inexplicit we shall lack both a rationale for claims about those contributions and a way of assessing the tradeoffs involved when, as not infrequently happens, different virtues point in different directions.

To meet the guiding challenge, explanationists must offer a rational reconstruction of the virtues that meets two conditions: they must show that in some sense the virtues are more accessible, logically or phenomenologically, than priors and likelihoods (Lipton [2001], p. 111), and they must demonstrate that the virtues thus reconstructed nevertheless have positive bearing on the posterior probability of a hypothesis. Our discussion of the distinction between inference and reasoning suggests that this latter end can be accomplished by ceteris paribus theorems, proofs that the features in question are relevant to high posterior probability in a way that parallels the Bayesian rationale for taking likelihood itself seriously. Whether the virtues thus reconstructed can fulfill the promise of greater accessibility can only be determined on a case by case basis.

The task is not made easier by the fact that the wide category of explanatory considerations is home to several quite disparate sorts of virtues. Inquiry, as Peirce seems to have seen clearly (Peirce [1935], sections 5.598-600), has both evidential and economic aspects, where the latter may involve considerations such as the cost of experimentation and the expected utility of information. To pretend that such economic considerations do not exist would leave us with a flattened image of the process of inquiry. But to conflate the epistemic and the economic issues would render the distinctively epistemological problem of reconstructing explanatory inference intractable. We must therefore disentangle the evidential from economic aspects of explanatory reasoning, and we must also be prepared to distinguish diverse evidential virtues that have been conflated in our existing terminology.[5] If we can do this successfully, the result will be a conception of the theoretical virtues that exhibits a closer likeness to our pre-theoretic categories than can be effected by a mere correlation with priors and likelihoods and a corresponding set of theorems that display the sense in which those virtues thus reconstructed are accessible and demonstrate their objective epistemic merits.

3 Abduction and the Accessibility of Explanatory Power

In a classic formulation, Peirce suggested that much of our ampliative reasoning is initiated by a mental act of 'abduction' that conforms to the following schema (Peirce [1935], section 5.189):

The surprising fact E is observed;

But if H were true, E would be a matter of course,

Hence, there is reason to suspect that H is true.[6]

Here E is the unexpected bit of evidence that initiates the train of reasoning, and H can very naturally be construed as an explanatory hypothesis that would, if true, remove our surprise at E. In numerous places Peirce calls this an 'inferential step', but the conclusion ‑‑ that there is reason to suspect that H is true ‑‑ is a good deal weaker than the outright conclusion that H. As he says elsewhere, abduction merely suggests that something may be true (Peirce [1935], section 5.171).

The suggestion that we look favorably on hypotheses that remove our surprise seems borne out in many examples of explanatory inference. Sometimes we are in genuine doubt regarding the explanation of some puzzling phenomenon. At other times the explanation may suggest itself to us at virtually the same instant that we become aware of the phenomenon it explains. In either case, it seems to be the gap between our surprise and the 'matter of course' -- the explanatory power, as I shall call it, of that supposition with respect to E -- that commends H to us.

A probabilistic analysis focuses this intuition. In Bayesian terms, we may represent the 'surprisingness of E' in Peirce's schema as the multiplicative inverse of its expectedness; the more surprising E is, the lower P(E).[7] If E is in this sense very surprising, then

P(E) ≈ 0.

The second line of Peirce's schema indicates that on the assumption of H the probability of E is very high, which comes out as a high likelihood:

P(E/H) ≈ 1.

These two conditions do not, on a Bayesian analysis, suffice to guarantee a high posterior probability for H, since a sufficiently low prior P(H) might swamp the ratio of likelihood over expectedness. So Peirce's assessment that abduction merely suggests that H may be true is mirrored in this reconstruction. In fact, Peirce's own formulation is stronger than it needs to be. The essential point is simply that

P(E/H) > P(E),

since this is sufficient to guarantee that taking E into consideration increases the probability of H. Thus in reconstructing Peirce's notion of abduction we arrive at the most widespread current definition of the term 'evidence', for it is under just this condition that E is taken, on a Bayesian account, to be evidence for H.
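The step from this inequality to confirmation can be written out in one line. The following is a sketch in LaTeX notation, writing P(H \mid E) for the P(H/E) used in the text and assuming that P(H) and P(E) are both greater than zero:

```latex
\[
\frac{P(H \mid E)}{P(H)} \;=\; \frac{P(E \mid H)}{P(E)}
\qquad\text{so that}\qquad
P(H \mid E) > P(H) \;\iff\; P(E \mid H) > P(E).
\]
```

The left-hand identity is simply Bayes's Theorem with P(H) moved across; nothing beyond the probability calculus is assumed.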

Explanatory power is a theoretical virtue that cannot be pigeonholed as 'contributing' to high likelihood or a high prior on the Bayesian account. When represented as a ratio,[8]

 

P(E/H) / P(E)

it may be arbitrarily high regardless of how low the likelihood in its numerator, provided that neither is zero, and in some cases of explanatory reasoning both do seem to be low (Collins [1966], pp. 135-6; Rosenkrantz [1977], pp. 166-70). Nevertheless it is the very paradigm of a virtue in the sense we have in view, for it follows immediately from this ratio and Bayes's Theorem that of two hypotheses with equal priors, the one with greater explanatory power will have the greater posterior probability.

All of this may look highly suspicious to the Bayesian eye, not through any fault in the mathematics but because it smacks of outright conceptual theft. Though Bayesians have available the resources to accommodate this sort of reasoning, it is a curious fact that they are often focused so completely on priors and likelihoods that they overlook the epistemic significance of the relevance quotient in its own right. Elliott Sober ([2002]) illustrates this approach particularly clearly in his discussion of the relevance of simplicity or parsimony to the plausibility of a hypothesis -- in particular, the idea that the simpler a hypothesis is, the greater its plausibility. Sober lays out an elementary consequence of Bayes's Theorem,

P(H1/E) > P(H2/E) if and only if P(E/H1) P(H1) > P(E/H2) P(H2),

and then glosses the Bayesian assessment of the theoretical virtue of simplicity exclusively in terms of priors and likelihoods:

 

If 'more plausible' is interpreted to mean higher posterior probability, then there are just two ingredients that Bayesianism gets to use in explaining what makes one hypothesis more plausible than another. This means that if simplicity does influence plausibility, it must do so via the prior probabilities or via the likelihoods. If the relevance of simplicity cannot be accommodated in one of these two ways, then either simplicity is epistemically irrelevant or (strong) Bayesianism is mistaken.[9]

 

And again: 'Bayesianism has just two resources for explaining the epistemic relevance of simplicity -- priors and likelihoods' (Sober [2002], p. 10). This approach is thoroughly typical.[10]

Sober's analysis seems plausible because of the particular way that he has displayed the probabilities, but in fact there are multiple mathematically acceptable ways to compare probabilities using Bayes's Theorem. We could as easily say, for example,

P(H1/E) > P(H2/E) if and only if [P(E/H1)/P(E)] / [P(E/H2)/P(E)] > P(H2)/P(H1)

This, too, is a direct consequence of Bayes's Theorem; but now the bracketed terms are the respective relevance quotients -- measures of the explanatory power of the two hypotheses -- rather than likelihoods.
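That the two formulations always deliver the same verdict can be checked mechanically. The sketch below is illustrative only: the function names and the numerical values are assumptions introduced here, not taken from Sober or from the text, and exact fractions are used so that no rounding intrudes.

```python
from fractions import Fraction


def prefers_h1_by_products(lik1, prior1, lik2, prior2):
    """Comparison via likelihood times prior: P(E/H1) P(H1) > P(E/H2) P(H2)."""
    return lik1 * prior1 > lik2 * prior2


def prefers_h1_by_quotients(quotient1, quotient2, prior1, prior2):
    """Comparison via relevance quotients: q1 / q2 > P(H2) / P(H1)."""
    return quotient1 / quotient2 > prior2 / prior1


# Hypothetical values chosen only for illustration.
expectedness = Fraction(1, 400)                       # P(E)
lik1, prior1 = Fraction(1, 1000), Fraction(3, 10)     # P(E/H1), P(H1)
lik2, prior2 = Fraction(1, 200), Fraction(1, 20)      # P(E/H2), P(H2)

# Relevance quotients P(E/Hi) / P(E).
q1 = lik1 / expectedness
q2 = lik2 / expectedness

by_products = prefers_h1_by_products(lik1, prior1, lik2, prior2)
by_quotients = prefers_h1_by_quotients(q1, q2, prior1, prior2)
print(by_products, by_quotients)  # True True
```

In this example H1 has the lower explanatory power (2/5 against 2) but the higher prior, and both formulations agree that it comes out ahead; the second simply makes the tradeoff between explanatory power and prior explicit.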

This point would not be significant if we always needed to know the likelihood of an hypothesis with respect to the evidence in question, at least to an order of magnitude approximation, in order to know whether it has any explanatory power with respect to that evidence. And at first blush it seems we do. For if we measure explanatory power of H with respect to E by the relevance quotient, then we are contemplating P(E/H) / P(E); and the most obvious way to determine the value of this ratio is to obtain separate values for P(E/H) and for P(E) and then divide the former by the latter.

The most obvious -- but not the only way. In fact, there are numerous situations in which we have evidence that pertains to the relative values of P(E/H) and P(E) rather than to their absolute values. An example makes this plain. At a carnival poker booth I espy a genial looking fellow willing to play all comers at small stakes. The first hand he deals gives him four aces and a king, the second a royal flush, and indeed he never seems to come up with less than a full house any time the cards are in his hands. Half an hour older and forty dollars wiser, I strongly suspect that I have encountered a card sharp. I have made no attempt to compute the odds against his obtaining those particular hands on chance; I may not even know how to do the relevant calculation. Nor do I have any clear sense of the probability of his getting just those hands given that he is a sharp. For neither P(E/H) nor P(E) am I in a position to estimate a value within, say, three orders of magnitude; the best I can say in non-comparative terms is that each of them is rather low. But I know past reasonable doubt that the explanatory power of my hypothesis is very great.

My evidence in this case is testimonial; I have been told that card sharps are good enough at manipulating a deck to make their own hands come out favorable at a much higher rate than normal. And as is typical in cases where we rely on testimony, I have not done the kind of research that would give me the individual odds directly; someone else may know them, but not I. Thus, although the explanatory power of H with respect to E may be analytically equivalent to the ratio of likelihood and expectedness, it does not follow that either of those two numbers must be epistemically accessible in non-comparative terms in order for the explanatory power itself to be epistemically accessible. It is not hard to generalize this to cases where I may be quite sure more or less exactly what the explanatory power of H is with respect to E though I have no further information bearing on either, as when I am told that a certain baseball player has a batting average that is twice the average in his league.
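The point about accessibility can be put in computational terms: an update needs only the prior and the ratio, not the two quantities whose ratio it is. The following sketch assumes a hypothetical helper, posterior_from_quotient, and made-up numbers; it is meant only to show that very different absolute values of P(E/H) and P(E) yield the same posterior when their ratio is the same.

```python
def posterior_from_quotient(prior: float, quotient: float) -> float:
    """Bayes's Theorem rewritten as P(H/E) = P(H) * [P(E/H) / P(E)]."""
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    if quotient < 0.0:
        raise ValueError("a relevance quotient cannot be negative")
    posterior = prior * quotient
    if posterior > 1.0:
        raise ValueError("this prior and quotient are jointly incoherent")
    return posterior


# Hypothetical case: all that is known is that E is twice as probable on H
# as it is unconditionally, together with a prior of 0.05 for H.
doubled = posterior_from_quotient(0.05, 2.0)
print(doubled)  # 0.1

# The same quotient arising from very different absolute values.
from_tiny_values = posterior_from_quotient(0.05, 2e-9 / 1e-9)
from_large_values = posterior_from_quotient(0.05, 0.5 / 0.25)
```

The function never asks for P(E/H) or P(E) separately, which is the sense in which explanatory power may be accessible when its components are not.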

4 Evidential and Theoretical Consilience

In The Origin of Species, Darwin draws special attention to the fact that his theory explains 'several classes of facts' ranging from homology to the 'atrophied' organs of animals (Darwin [1872], p. 436). The pattern of reasoning is widespread: a hypothesis gains in credibility to the extent that the several pieces of evidence in its favor are unrelated. This pattern has led some advocates of IBE to cite consilience -- the capacity to explain diverse independent classes of facts -- as a paradigmatic theoretical virtue.

This intuition can be cast in probabilistic terms. Assume for the sake of argument that H gives equivalent likelihood to the conjunction of evidence E1 through En, on the one hand, and data D1 through Dn, on the other, i.e.,

P(E1 & ... & En/H) = P(D1 & ... & Dn/H),

and that the expectedness of the individual pieces of E and of D is the same, so that for all k,

P(Ek) = P(Dk).

Under these conditions, a set that exhibits independence will confer greater probability on H than one where the members of the set are positively relevant to each other. For if

P(E1 & ... & En) = P(E1) x ... x P(En)

but

P(D1 & ... & Dn) > P(D1) x ... x P(Dn),

then the explanatory power of H is greater with respect to E than to D:

 

P(E1 & ... & En/H) / P(E1 & ... & En) > P(D1 & ... & Dn/H) / P(D1 & ... & Dn).

The reasoning is easily generalized to a result regarding the extent of the independence exhibited by the data.
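A small numerical illustration may help fix the idea. The values below are invented for the purpose and are not drawn from Darwin's case; the only constraints imposed are the ones stated above, namely equal joint likelihoods on H and equal individual expectedness for the two sets.

```python
from math import prod


def explanatory_power(likelihood: float, expectedness: float) -> float:
    """Relevance quotient P(E/H) / P(E)."""
    return likelihood / expectedness


# Illustrative values only.
marginals = [0.2, 0.2, 0.2]   # P(Ek) = P(Dk) for k = 1, 2, 3
joint_likelihood = 0.1        # P(E1 & E2 & E3/H) = P(D1 & D2 & D3/H)

# The E's are independent, so their joint probability is the product.
p_e = prod(marginals)
# The D's are positively relevant to one another, so theirs exceeds the product.
p_d = 0.02

power_e = explanatory_power(joint_likelihood, p_e)
power_d = explanatory_power(joint_likelihood, p_d)
print(power_e > power_d)  # True
```

With these numbers the quotient is roughly 12.5 for the independent set and 5 for the dependent one, so the same hypothesis is better confirmed by the evidence that would otherwise have been the greater coincidence.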

Although this reconstruction of the consilience exhibited in Darwin's theory shows it to be confirmatory ceteris paribus, it has the unexpected consequence that consilience of this sort turns out to be a function of the evidence rather than of the hypothesis. But the analysis of such 'evidential consilience', as we may call it, suggests that there may be a cluster of virtues surrounding the concepts of dependence and independence.[11] If we change our focus from the independence of the evidence apart from the theory to its dependence in light of the theory, we discover a lovely theorem: The degree of confirmation a hypothesis receives from the conjunction of independent pieces of evidence is a monotonic function of the extent to which those pieces of evidence can be seen to be positively relevant to each other in the light of that hypothesis.

The proof is straightforward.[12] Assume for the sake of the argument that P(H1) = P(H2), that for all k, P(Ek/H1) = P(Ek/H2), and that the various Ek are positively relevant to each other conditional on H1 but independent of each other conditional on H2, i.e.,

P(E1 & ... & En/H1) > P(E1/H1) x ... x P(En/H1)

but

P(E1 & ... & En/H2) = P(E1/H2) x ... x P(En/H2).

Then by Bayes's Theorem and a bit of trivial algebra,

P(H1/E1 & ... & En) > P(H2/E1 & ... & En)

Thus H1 emerges as clearly superior to H2, in straightforward confirmational terms, despite the fact that on a case by case basis it has no predictive advantage over H2. The result is easily generalized to yield the theorem in question. This provides a convincing demonstration of the confirmational relevance of what we will call 'theoretical consilience' -- the consilience that obtains when an hypothesis or theory reduces independence among the data.
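For readers who want the 'trivial algebra' spelled out, one way of filling it in is sketched below in LaTeX notation, with P(\cdot \mid \cdot) for the text's P(·/·). It assumes, as the statement of the proof does, equal priors and equal individual likelihoods, and additionally that the priors and P(E_1 \& \dots \& E_n) are nonzero.

```latex
\begin{align*}
P(E_1 \,\&\, \dots \,\&\, E_n \mid H_1)
  &> \prod_{k=1}^{n} P(E_k \mid H_1)
   = \prod_{k=1}^{n} P(E_k \mid H_2)
   = P(E_1 \,\&\, \dots \,\&\, E_n \mid H_2), \\[4pt]
P(H_1 \mid E_1 \,\&\, \dots \,\&\, E_n)
  &= \frac{P(E_1 \,\&\, \dots \,\&\, E_n \mid H_1)\, P(H_1)}{P(E_1 \,\&\, \dots \,\&\, E_n)}
   > \frac{P(E_1 \,\&\, \dots \,\&\, E_n \mid H_2)\, P(H_2)}{P(E_1 \,\&\, \dots \,\&\, E_n)}
   = P(H_2 \mid E_1 \,\&\, \dots \,\&\, E_n).
\end{align*}
```

The first line uses the positive relevance of the evidence on H1, the equality of the individual likelihoods, and independence on H2, in that order; the second applies Bayes's Theorem to each hypothesis with the common prior and common denominator.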

Note that theoretical consilience is both local and comparative. The superiority of H1 to H2 with respect to E1, ... En does not entail a comparable superiority with respect to some other data D1, ... Dm. So one lesson we learn directly from the theorem is that theoretical consilience is a three-termed relation: H1 is more consilient than H2 with respect to a set of data E to the extent that

P(E1 &...& En/H1) / [P(E1/H1) x ... x P(En/H1)] > P(E1 &...& En/H2) / [P(E1/H2) x ... x P(En/H2)].

This theorem sheds light on Reichenbach's discussion of a conjunctive fork (Reichenbach [1971]). Reichenbach conceives of a situation where a probabilistic dependence

P(E1 & E2) > P(E1) P(E2)

serves as a sign of a common cause for both E1 and E2, and he offers a formal proof that under certain conditions, among them the Reichenbach Condition

RC: P(E1 & E2/C) = P(E1/C) P(E2/C),

the conjunction raises the probability of C even though, in accordance with RC, the dependence has vanished modulo the assumption of C.[13] Strictly speaking Reichenbach's proof is correct. But consideration of our theorem shows that RC is not only not required for making a hypothesis probable, it is positively detrimental. A simple substitution of C for H2 in the proof above with n=2 demonstrates that a hypothesis that possesses theoretical consilience with respect to the data in question will ceteris paribus emerge from Bayesian conditionalization with a higher posterior probability than a hypothesis that obeys RC.
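The comparison can be made concrete for n = 2. In the sketch below the priors and likelihoods are invented, and posterior_odds is a hypothetical helper; because both hypotheses are conditioned on the same evidence, P(E1 & E2) cancels and only the products of prior and joint likelihood are needed.

```python
def posterior_odds(prior_1: float, joint_likelihood_1: float,
                   prior_2: float, joint_likelihood_2: float) -> float:
    """Posterior odds of hypothesis 1 over hypothesis 2 on the same evidence."""
    return (prior_1 * joint_likelihood_1) / (prior_2 * joint_likelihood_2)


# Illustrative values only.
prior = 0.1     # P(H1) = P(C)
p_each = 0.5    # P(E1/H1) = P(E2/H1) = P(E1/C) = P(E2/C)

# C obeys RC: conditional on C the joint likelihood is just the product.
joint_given_c = p_each * p_each
# H1 is theoretically consilient: conditional on H1 the joint exceeds the product.
joint_given_h1 = 0.45

odds = posterior_odds(prior, joint_given_h1, prior, joint_given_c)
print(odds > 1.0)  # True
```

Here the odds come out at about 1.8 to 1 in favor of the consilient hypothesis, even though it matches C exactly on the prior and on each piece of evidence taken singly.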

5 A Test Case: Gravitational Lensing

The example of gravitational lensing offers a good testing ground for this analysis. In 1979 two quasar images only five arcseconds apart, QSO 0957+561, were found to have identical spectral characteristics. Data on the spectra of known quasars indicated that there was only a remote probability of such a coincidence on chance; an explanation seemed called for. By far the most attractive hypothesis proposed was that the phenomenon consisted of a double image produced when radiation streaming from a single quasar was bent by the gravitational field of some massive object located between us and the quasar -- a gravitational lens. Pursuing this hypothesis, astronomers subsequently discovered a cluster of galaxies in the proper place to do the relativistic bending (Salmon [1984], pp. 159, 210; [1998], pp. 290-2; [2001a], pp. 72-3).

Why is the lensing explanation (L) so attractive? In large measure, it is because it eliminates a brute coincidence by providing a context in which the spectrum of the first quasar (S1) is relevant to the spectrum of the second (S2). By itself, L does nothing to raise the likelihood of either spectrum:

P(S1/L) = P(S1)

and mutatis mutandis for S2. But the coordination effected by L makes the dependence between the two almost complete, for if the two images have been formed by lensing from a single quasar then the spectrum of the one image is virtually guaranteed to match the spectrum of the other, i.e.,

P(S1 & S2/L) ≈ P(S1/L) >> P(S1/L) P(S2/L).

Thus the reconstruction of theoretical consilience appears to capture the feature that makes this explanation so lovely.
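To see the size of the effect, the consilience ratio from section 4 (joint likelihood divided by the product of the individual likelihoods) can be computed for lensing and for mere coincidence. The figure of 1e-4 for the probability of a given spectrum is a placeholder chosen for illustration, not an astronomical estimate, and the function name is likewise an assumption.

```python
from typing import Sequence


def consilience(joint_likelihood: float,
                individual_likelihoods: Sequence[float]) -> float:
    """P(E1 & ... & En/H) divided by P(E1/H) x ... x P(En/H)."""
    product = 1.0
    for p in individual_likelihoods:
        product *= p
    return joint_likelihood / product


# Hypothetical probability that a quasar exhibits this particular spectrum.
p_spectrum = 1e-4

# On lensing, each individual likelihood is unchanged, but the joint
# likelihood is roughly that of a single spectrum.
lensing = consilience(p_spectrum, [p_spectrum, p_spectrum])

# On coincidence, the spectra are independent, so the joint is the product.
chance = consilience(p_spectrum * p_spectrum, [p_spectrum, p_spectrum])

print(chance)            # 1.0
print(lensing > chance)  # True
```

On these assumed numbers the ratio for L is on the order of ten thousand, while for the coincidence hypothesis it is exactly one; the rarer the shared spectrum, the larger the advantage.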

The lensing case also provides us with a compelling illustration of the manner in which the explanatory virtues do heuristic work. Lovely as it was, L did not rise to the level of an established fact until subsequent observations from Mauna Kea and Palomar confirmed the presence of sufficient mass in the right place. But the loveliness of the explanation almost certainly motivated the search for that mass. From a heuristic perspective this is perfectly rational: with the high likelihood in place thanks to the consilience of L over randomness with respect to S1 & S2, all that remains for an inference is to establish by independent means the plausibility of L itself.[14] Failure to find the mass would have set astronomers to work looking for alternative explanations, much as the failure to find the hypothetical planet Vulcan paved the way for Einstein's explanation of the anomalous precession of the perihelion of Mercury (Salmon [2001a], p. 84). Either way, the heuristic model gives a definite epistemic underpinning to a policy of inquiry that takes theoretical virtues seriously.

In his discussion of this example, Wesley Salmon suggests that the attraction of the lensing hypothesis over its rivals is a simple function of Bayesian considerations and that explanatory beauty does not enter into our evaluation (Salmon [2001a], pp. 72-3). But Salmon's objection here is wide of the mark for two reasons, one phenomenological and one technical. Phenomenologically, the attractiveness of a unifying hypothesis appears to track its theoretical consilience ceteris paribus. One's instinctive assessment of a hypothesis that leaves the conjunction of S1 and S2 a mystery -- no more probable than the product of their individual probabilities -- is that it is not nearly so attractive because it fails to eliminate unexplained coincidence and that absent some other countervailing consideration it ought to be dismissed in favor of one that effects the relevant unification.

The technical point is that Salmon himself seems to have misanalyzed the structure of the reasoning in the lensing example, for he assimilates it to the pattern of inference to a common cause via a conjunctive fork described by Reichenbach.[15] The condition RC is built into that analysis and is explicitly cited by Salmon, so that on Salmon's analysis

P(S1 & S2/L) = P(S1/L) P(S2/L).

But as we have seen, in the case in hand this is plainly false. The moral is that the phenomenology should not be wholly despised. Attention to our pre-theoretical notions of loveliness may at times be a surer guide to a theory's probabilistic merits and the structure of our reasoning than purely algebraic manipulations even in the hands of an acknowledged master.

6 Conclusion

This approach may frustrate members of the explanationist camp who feel that there is nothing distinctively explanatory left by the time we have flattened out the virtues on a probabilistic plane. In their very nature, probabilistic reconstructions make no mention of causal connections, much less 'thick' connections; nor could they do so in any sense that is irreducible to the probability calculus and still serve to meet the relevance problem in Bayesian terms. Moreover, nothing in the foregoing analysis guarantees that an hypothesis with great explanatory power or theoretical consilience must be an explanation of the evidence, a tendentious choice of terminology notwithstanding. The explanationist may feel that his birthright is being sold for a thin mess of probabilistic pottage.

But I think it would be premature to abandon exploration of the reconstructive project on these grounds alone. The conceptual project of analyzing the notion of explanation is a long way from being completed (Salmon [1989]). As for the worry that the virtues might crop up outside of explanatory contexts, I think the explanationist should abandon the imperialist ambition to bring every aspect of non-deductive inference under the heading of IBE. It is sufficient that the virtues are frequently, perhaps overwhelmingly frequently, exhibited in inferences of an overtly explanatory sort and that our attention is drawn to them precisely because of the role they play in explanatory reasoning. And in view of the present impasse in developing non-probabilistic accounts of explanatory inference, explanationists are well advised not to turn back before they have explored the probabilistic path at least somewhat further, lest they should miss a hitherto undreamt of outlook on the whole.

 

Department of Philosophy

Western Michigan University

mcgrew@wmich.edu

 

Acknowledgments

I am grateful to two anonymous referees who provided constructive feedback that strengthened the paper, to Peter Lipton for providing pre-publication copies of his exchange with Salmon, to Samir Okasha, Jim Franklin, and Peter Lipton for helpful criticism and correspondence regarding earlier versions, to Elliott Sober for help tracking down references to his work, and to Lydia McGrew for spotting errors and muddles in the penultimate draft.


References

 

Barnes, Eric [1995]: 'Inference to the Loveliest Explanation', Synthese 103, pp. 251-77.

 

Boltzmann, Ludwig [1974]: Theoretical Physics and Philosophical Problems, Boston: D. Reidel.

 

Collins, Arthur [1966]: 'The Use of Statistics in Explanation', British Journal for the Philosophy of Science 17, pp. 127-40.

 

Day, Timothy and Kincaid, Harold [1994]: 'Putting Inference to the Best Explanation in its Place', Synthese 98, pp. 271-95.

 

Darwin, Charles [1872]: The Origin of Species, sixth edition, New York: E. P. Dutton & Co. Inc.

 

Foster, John [1985]: A. J. Ayer, New York: Routledge & Kegan Paul.

 

Harman, Gilbert [1973]: Thought, Princeton: Princeton University Press.

 

Hon, Giora and Rakover, Sam S. (eds.) [2001]: Explanation: Theoretical Approaches and Applications, Dordrecht: Kluwer.

 

Horwich, Paul [1982]: Probability and Evidence, Cambridge: Cambridge University Press.

 

Jeffrey, Richard [1992]: Probability and the Art of Judgment, Cambridge: Cambridge University Press.

 

Kitcher, Philip and Salmon, Wesley (eds.) [1989]: Scientific Explanation, Minnesota Studies in the Philosophy of Science, vol. 13, Minneapolis: University of Minnesota Press.

 

Knowles, Dudley (ed.) [1990]: Explanation and its Limits, Cambridge: Cambridge University Press.

 

Lipton, Peter [1991]: Inference to the Best Explanation, New York: Routledge.

 

__________ [2001]: 'Is Explanation a Guide to Inference? A Reply to Wesley Salmon', In Hon and Rakover (eds.) [2001].

 

Niiniluoto, Ilkka [1999]: 'Defending Abduction', Philosophy of Science 66 (Proceedings), pp. S436-S451.

 

Peirce, Charles S. [1935]: Collected Papers Vol. 5, ed. by C. Hartshorne and P. Weiss, Cambridge, MA: Harvard University Press.

 

Reichenbach, Hans [1971]: The Direction of Time, Berkeley: University of California Press.

 

Rosenkrantz, Roger [1977]: Inference, Method and Decision: Towards a Bayesian Philosophy of Science, Dordrecht: D. Reidel.

 

Salmon, Wesley [1970]: 'Bayes's Theorem and the History of Science', In Stuewer (ed.) [1970].

 

__________ [1984]: Scientific Explanation and the Causal Structure of the World, Princeton: Princeton University Press.

 

__________ [1989]: 'Four Decades of Scientific Explanation', In Kitcher and Salmon (eds.) [1989].

 

__________ [1998]: Causality and Explanation, Oxford: Oxford University Press.

 

__________ [2001a]: 'Explanation and Confirmation: A Bayesian Critique of Inference to the Best Explanation', In Hon and Rakover (eds.) [2001].

 

__________ [2001b]: 'Reflections of a Bashful Bayesian: A Reply to Peter Lipton', In Hon and Rakover (eds.) [2001].

 

Sober, Elliott [1988]: Reconstructing the Past: Parsimony, Evolution and Inference, Cambridge, MA: MIT Press.

 

__________ [1990]: 'Let's Razor Ockham's Razor', In Knowles (ed.) [1990].

 

__________ [2002]: 'Bayesianism Its Scope and Limits', In Swinburne (ed.) [2002]. This paper is available in pdf form from Sober's website, philosophy.wisc.edu/sober/.

 

Stuewer, Roger (ed.) [1970]: Historical and Philosophical Perspectives of Science, Minnesota Studies in the Philosophy of Science, vol. 5. Minneapolis: University of Minnesota Press.

 

Swinburne, Richard (ed.) [2002]: Bayes's Theorem, Cambridge: Cambridge University Press.

 

Thagard, Paul [1978]: 'The Best Explanation: Criteria for Theory Choice', Journal of Philosophy 75, pp. 76-92.

 

van Fraassen, Bas [1989]: Laws and Symmetry, Oxford: Oxford University Press.


Endnotes



[1] I propose to set aside verbal wrangling over whether a redistribution of probabilities by conditionalization counts as inference where the latter term is strictly construed. For examples of Bayesian reasoning in the absence of precise values for all of the traditional parameters, see Salmon ([1970]) and Shimony ([1993]), pp. 274-300.

[2] This approach disarms the piecemeal criticism of particular theoretical virtues by counterexample (Barnes [1995], p. 261).

[3] This is actually sometimes suggested (Niiniluoto [1999], p. S448). Perhaps the best construction to be put on Niiniluoto's brief remark is that the explanatory success of H contributes to its posterior probability in a manner yet to be analyzed (cf. Lipton [2001], pp. 110-1); this is plausible enough but is just a promissory note. Lipton explicitly rejects the bald identification of loveliness with high posterior probability as trivializing (Lipton [2001], p. 105).

[4] Existing attempts to do this (Salmon [1970], p. 80; Day and Kincaid [1994], pp. 285-6; Niiniluoto [1999], p. S448; Lipton [2001], pp. 110-1) are more suggestive than detailed.

[5] The virtue of simplicity is notorious in this respect (Sober [1988]).

[6] I have silently substituted the suggestive E (evidence) and H (hypothesis) for Peirce's original C and A, respectively.

[7] As a full-scale Bayesian analysis of surprisingness, this would be inadequate since in general surprise is also a function of the availability of an alternative explanation with higher likelihood (Horwich [1982], pp. 100-4; see also pp. 14-5). But in the context the question is how to represent surprisingness as Peirce is using it; and as he has separated the idea of the availability of hypotheses on which E is 'a matter of course' and presented it in a separate premise of his schema, the low probability of E does seem to be a reasonable way to represent what he has in mind. Here and in the next section I have suppressed background to the right of the solidus for visual clarity, but it should be taken as implicit: there is here no tacit endorsement of radical personalism.

[8] Sometimes called the 'relevance quotient' in the Bayesian literature (e.g. Jeffrey [1992], p. 109).

[9] I have silently adjusted Sober's notation to match that used above and below. Sober does not himself endorse strong Bayesianism; his stated point is that Bayesianism cannot be the whole story about scientific inference (Sober [2002], p. 3). But it is his conception of the resources available to Bayesians that is of central interest here.

[10] Sober ([1990]) contains the same sort of remarks regarding likelihoods and priors. Day and Kincaid ([1994]) write that 'IBE can be embedded in determining the priors and likelihoods that make up Bayesian calculation', and they suggest some ways that explanatory factors are relevant to these components separately. Both Lipton ([2001]) and Salmon ([2001a], p. 79; [2001b], p. 121) discuss the determination of priors and the determination of likelihood as two possible points at which explanatory considerations might be applied to Bayesian analyses. Salmon ([2001b], p. 125) offers an equation, substantially similar to Sober's formula, which in our notation comes to P(H1/E)/P(H2/E) = [P(H1)P(E/H1)] / [P(H2)P(E/H2)]. He applauds the elimination of catchall factors from this equation but stresses that the evaluation of the relative merits of competing hypotheses 'still requires both the prior probability and the likelihood of each of the competing hypotheses'. None of these authors mentions the relevance quotient.

[11] Part of the reason for this is that high theoretical consilience, like high likelihood, can be manufactured by 'packing' the hypothesis so that it entails all of the evidence in question. It seems plausible that the gains acquired through packing functions will be offset by a loss of simplicity reflected in a low prior probability, a phenomenon that is arguably at work in examples ranging from Cartesian deceiver scenarios to Ptolemaic astronomy. But in view of the notorious difficulties surrounding the notion of simplicity, this requires separate analysis.

[12] Obvious restrictions apply: for j ≠ k, P(Ej & Ek) < P(Ej), similarly for larger conjunctions, and the various terms all take values intermediate between 0 and 1.

[13] Reichenbach ([1971], pp. 158-61). I have silently adjusted Reichenbach's notation.

[14] This is not exactly the same thing as evaluating a Bayesian prior P(L/B); the objective rather is to determine by observation whether the right mass can be found in the right place (M), since adding M to the background B would yield an agreeably high value of P(L/M & B) without reducing the consilience of L with respect to S1 & S2.

[15] Salmon employs Reichenbach's analysis here for inferential purposes; he does not endorse Reichenbach's analysis as an explication of the notion of causation (Salmon [1998], pp. 214ff).