**Confirmation, Heuristics, and Explanatory Reasoning**

Thus natural science appears completely to lose from
sight the large and general questions; but all the more splendid is the success
when, groping in the thicket of special questions, we suddenly find a small
opening that allows a hitherto undreamt of outlook on the whole.

-- Ludwig Boltzmann, *Theoretical Physics and
Philosophical Problems*

ABSTRACT

Recent work on inference to the best explanation has
come to an impasse regarding the proper way to coordinate the theoretical
virtues in explanatory inference with probabilistic confirmation theory, and in
particular with aspects of Bayes's Theorem. I argue that the theoretical
virtues are best conceived heuristically and that such a conception gives us
the resources to explicate the virtues in terms of *ceteris paribus*
theorems. Contrary to some Bayesians, this is not equivalent to identifying the
virtues with likelihoods or priors *per se*; the virtues may be more
accessible epistemically than likelihoods or priors. I then prove a *ceteris
paribus* theorem regarding theoretical consilience, use it to correct a
recent application of Reichenbach's common cause principle, and apply it to a
test case of scientific reasoning.

1 Explanation and confirmation

2 The heuristic conception of theoretical virtues

3 Abduction and the accessibility of explanatory power

4 Evidential and theoretical consilience

5 A test case: gravitational lensing

6 Conclusion

1 Explanation and Confirmation

Explanatory reasoning, construed with sufficient
latitude, seems to be a pervasive feature of both scientific practice and
mundane life. Doctors infer that their patients have pneumonia from an examination
of their symptoms and mothers infer that their children have been stamping in
mud puddles from an examination of their shoes: what both inferences seem to
have in common is that the conclusion explains the observed facts and does so
better than any other available explanation. Between the ubiquity of
explanatory arguments and the difficulties besetting other accounts of
uncertain reasoning, some enthusiastic philosophers have been moved to claim
that the only primitive rational form of non‑deductive inference is
inference to the best explanation (Harman [1973]; Foster [1985], p. 227).

But when
explanationists try to spell out inference to the best explanation (IBE) in a
more precise form and show how it provides a distinctive form of non-deductive
inference, they find themselves hard pressed. The standard model cites various
'theoretical virtues' such as simplicity, consilience and precision that make
some hypotheses more 'lovely' than others; ostensibly, the presence of such
virtues enables us to select among rival hypotheses (Thagard [1978]; Lipton
[1991]). But attempts to bring such theoretical virtues into clear focus
without explicitly invoking the probability calculus have not advanced the
discussion beyond a clash of intuitions regarding their epistemic relevance,
leaving critics with the impression that 'inference to the best explanation'
remains a slogan rather than an accurate characterization of any form of
non-demonstrative inference (Salmon [2001a], p. 60).

Recent
work on explanatory inference has increasingly focused on coordinating the
theoretical virtues with the prior probabilities and likelihoods required for
Bayesian inference. The leading idea here is that explanatory considerations
may in some fashion be a guide to high posterior probability -- that
'loveliness' can be a guide to 'likeliness', and that the Bayesian and the
explanationist should be friends (Lipton [1991], pp. 61-4, [2001], p. 94). This
strategy is a promising way to address the relevance problem. But in many cases
it is difficult to determine in a principled way whether a given virtue makes
its contribution to the prior probability of a hypothesis or to its likelihood;
there is no straightforward sense in which those categories match the virtues.
And the strategy risks trivializing IBE by reducing it to unvarnished
probabilistic confirmation theory. As Salmon notes in a piece of friendly
criticism, 'it seems that the Bayesian has more to offer to the explanationist
than vice-versa' (Salmon [2001b], p. 130).

I shall
argue here that we can make progress by reconstructing explanatory reasoning in
heuristic rather than inferential terms and by a more nuanced probabilistic
representation of theoretical virtues than previous writers have offered. The
resulting picture of explanatory reasoning is less sweeping than some
enthusiastic partisans of IBE have in mind. Many of the virtues are local, in
the sense that they apply to the relation between hypotheses and particular
pieces of evidence *in situ* rather than to those hypotheses considered *in
vacuo*; and although they are typically found in explanatory contexts, they
may also apply in cases of reasoning where we are disinclined to say that we
are *explaining* the data. But by taking a more modest approach that
deploys explanatory reasoning 'in the thicket of special questions', we can
meet the challenge of relevance, demonstrate how explanatory virtues may
provide genuine guidance, and account for the utility of explanatory reasoning
even in contexts where there is insufficient information to calculate a
posterior probability. In the course of
this reconstruction we shall also discover that the explanationists can learn
something about theoretical virtues from the Bayesians, and the Bayesians
something about the use of probabilities from the explanationists.

2 The Heuristic Conception of Theoretical Virtues

When it is phrased incautiously, the prescription that
we should always infer to the best explanation is open to an obvious objection:
frequently the evidence is not strong enough to warrant an inference to any of
the available explanations. If IBE is understood as a rule that enjoins belief
in the best available explanation regardless of the paucity of evidence in that
explanation's favor, then common sense will often prevent us from applying it
(van Fraassen [1989], pp. 149-50). Departures from common sense here can give
rise to the egregious fallacy of inferring the truth of an explanation merely
because it is the only one on offer (Salmon [2001a], pp. 83-4).

Mindful of
this difficulty, Peter Lipton concedes that IBE only sanctions inference when
the 'best' is good enough; sometimes, the correct response to the data is
agnosticism (Lipton [2001], p. 104). As a response to the paucity objection
this seems wholly reasonable. But if the value of the theoretical virtues is
bound up with their capacity to license inferences, then Lipton's concession
has unexpected dimensions. It is tantamount to saying that when the data are
sparse, the virtues are worthless: they can do no epistemic work until there
are enough of them in place to justify a definite conclusion. And this in turn
undermines the explanationist programme of showing how the explanatory virtues
may be a guide in the process of inquiry rather than merely a means of describing
its results.

A parallel
objection can be raised against a certain conception of Bayesian inference. It
is a commonplace, for example, that a high likelihood P(E/H) alone does not
warrant the ascription of a high posterior probability P(H/E). But no one
complains that this makes high likelihood useless. Instead, Bayesians make a
tacit distinction between probabilistic *inference* -- usually modeled in
terms of conditionalization -- and probabilistic *reasoning*, which may
take place in the absence of crucial parameters necessary for
conditionalization.^{[1]}
Acknowledging that a high likelihood is insufficient grounds for embracing H,
they nevertheless persist in taking high likelihood seriously, and with good
reason. For *ceteris paribus*, likelihood and posterior probability go
hand in hand: the higher the one, the higher the other. Even in the absence of
precise priors and expectedness, high likelihood is one of the features by
which we separate promising hypotheses from those not worth pursuing.

This
suggests a way for the explanationist to retrench. Theoretical virtues may help
us to sift through the logical space of hypotheses even when they do not pick
out a clear favorite. At this level explanatory considerations will have a
heuristic role, and the charge that they do not always suffice to warrant an
inference will be no more damaging to explanationists than the parallel charge
regarding high likelihood is to the Bayesians.^{[2]}

If this
model of IBE is to offer any guidance in actual inquiry, loveliness must not
simply be identified with a high posterior probability; for that would
trivialize the project.^{[3]}
But it should also move us beyond the vague suggestion that considerations of
simplicity, consilience and the like 'contribute' to priors and likelihoods.^{[4]}
So long as the connection between loveliness and likeliness is left inexplicit
we shall lack both a rationale for claims about those contributions and a way
of assessing the tradeoffs involved when, as not infrequently happens,
different virtues point in different directions.

To meet
the guiding challenge, explanationists must offer a rational reconstruction of
the virtues that meets two conditions: they must show that in some sense the
virtues are more accessible, logically or phenomenologically, than priors and
likelihoods (Lipton [2001], p. 111), and they must demonstrate that the virtues
thus reconstructed nevertheless have positive bearing on the posterior
probability of a hypothesis. Our discussion of the distinction between
inference and reasoning suggests that this latter end can be accomplished by *ceteris
paribus* theorems, proofs that the features in question are relevant to high
posterior probability in a way that parallels the Bayesian rationale for taking
likelihood itself seriously. Whether the virtues thus reconstructed can fulfill
the promise of greater accessibility can only be determined on a case by case
basis.

The task
is not made easier by the fact that the wide category of explanatory
considerations is home to several quite disparate sorts of virtues. Inquiry, as
Peirce seems to have seen clearly
(Peirce [1935], sections 5.598-600), has both evidential and economic
aspects, where the latter may involve considerations such as the cost of
experimentation and the expected utility of information. To pretend that such
economic considerations do not exist would leave us with a flattened image of
the process of inquiry. But to conflate the epistemic and the economic issues
would render the distinctively epistemological problem of reconstructing
explanatory inference intractable. We must therefore disentangle the evidential
from economic aspects of explanatory reasoning, and we must also be prepared to
distinguish diverse evidential virtues that have been conflated in our existing
terminology.^{[5]}
If we can do this successfully, the result will be a conception of the
theoretical virtues that exhibits a closer likeness to our pre-theoretic
categories than can be effected by a mere correlation with priors and
likelihoods and a corresponding set of theorems that display the sense in which
those virtues thus reconstructed are accessible and demonstrate their objective
epistemic merits.

3 Abduction and the Accessibility of Explanatory Power

In a classic formulation, Peirce suggested that much
of our ampliative reasoning is initiated by a mental act of 'abduction' that
conforms to the following schema (Peirce [1935], section 5.189):

The
surprising fact E is observed;

But if H
were true, E would be a matter of course,

Hence,
there is reason to suspect that H is true.^{[6]}

Here E is the unexpected bit of evidence that
initiates the train of reasoning, and H can very naturally be construed as an
explanatory hypothesis that would, if true, remove our surprise at E. In
numerous places Peirce calls this an 'inferential step', but the conclusion --
that there is reason to suspect that H is true -- is a good deal
weaker than the outright conclusion that H. As he says elsewhere, abduction
merely suggests that something *may be* true (Peirce [1935], section
5.171).

The
suggestion that we look favorably on hypotheses that remove our surprise seems
borne out in many examples of explanatory inference. Sometimes we are in
genuine doubt regarding the explanation of some puzzling phenomenon. At other
times the explanation may suggest itself to us at virtually the same instant
that we become aware of the phenomenon it explains. In either case, it seems to
be the gap between our surprise and the 'matter of course' -- the explanatory
power, as I shall call it, of that supposition with respect to E -- that
commends H to us.

A
probabilistic analysis focuses this intuition. In Bayesian terms, we may
represent the 'surprisingness of E' in Peirce's schema as the multiplicative
inverse of its expectedness; the more surprising E is, the lower P(E).^{[7]}
If E is in this sense very surprising, then

P(E) ≈ 0.

The second
line of Peirce's schema indicates that on the assumption of H the probability
of E is very high, which comes out as a high likelihood:

P(E/H) ≈ 1.

These two
conditions do not, on a Bayesian analysis, suffice to guarantee a high
posterior probability for H, since a sufficiently low prior P(H) might swamp
the ratio of likelihood over expectedness. So Peirce's assessment that
abduction merely suggests that H may be true is mirrored in this
reconstruction. In fact, Peirce's own formulation is stronger than it needs to
be. The essential point is simply that

P(E/H) > P(E),

since this is sufficient to guarantee that taking E
into consideration increases the probability of H. Thus in reconstructing
Peirce's notion of abduction we arrive at the most widespread current
definition of the term 'evidence', for it is under just this condition that E
is taken, on a Bayesian account, to be evidence for H.
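
The point can be illustrated with a toy Bayesian calculation. All numbers below are hypothetical; the only structural assumption carried over from the text is the abductive condition P(E/H) > P(E), which suffices for E to raise the probability of H.

```python
# Toy Bayesian calculation with hypothetical values; the only structural
# assumption is the abductive condition P(E|H) > P(E).

def posterior(prior, likelihood, expectedness):
    """Bayes's Theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / expectedness

p_H = 0.2           # prior P(H), chosen for illustration
p_E_given_H = 0.9   # likelihood P(E|H)
p_E = 0.3           # expectedness P(E); note P(E|H) > P(E)

p_H_given_E = posterior(p_H, p_E_given_H, p_E)
print(p_H_given_E > p_H)  # E confirms H: the posterior exceeds the prior
```

Here the posterior comes out at 0.6, but the qualitative lesson survives any coherent choice of values: so long as the likelihood exceeds the expectedness, conditionalizing on E raises the probability of H.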

Explanatory
power is a theoretical virtue that cannot be pigeonholed as 'contributing' to
high likelihood or a high prior on the Bayesian account. When represented as a
ratio,^{[8]}

P(E/H) / P(E)

it may be arbitrarily high regardless of how low the
likelihood in its numerator, provided that neither numerator nor denominator is zero, and in some cases
of explanatory reasoning both do seem to be low (Collins [1966], pp. 135-6;
Rosenkrantz [1977], pp. 166-70). Nevertheless it is the very paradigm of a
virtue in the sense we have in view, for it follows immediately from this ratio
and Bayes's Theorem that of two hypotheses with equal priors, the one with
greater explanatory power will have the greater posterior probability.
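
A hypothetical numerical sketch makes the point vivid: the relevance quotient can be enormous even when the likelihood is tiny in absolute terms, and it is the quotient, not the raw likelihood, that scales the prior under Bayes's Theorem.

```python
# Hypothetical magnitudes: a low likelihood paired with a far lower
# expectedness yields great explanatory power and strong confirmation.
p_E_given_H = 1e-3   # P(E|H): low in absolute terms
p_E = 1e-6           # P(E): lower still
power = p_E_given_H / p_E    # relevance quotient: roughly 1000
p_H = 1e-4                   # a small prior, assumed for illustration
p_H_given_E = power * p_H    # Bayes: P(H|E) = [P(E|H)/P(E)] * P(H)
print(p_H_given_E)           # approximately 0.1: a thousandfold boost
```

The values are chosen to be probabilistically coherent (P(E) exceeds P(E/H)P(H)), and they show a hypothesis whose likelihood is a mere one in a thousand nonetheless receiving a thousandfold boost in probability.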

All of
this may look highly suspicious to the Bayesian eye, not through any fault in
the mathematics but because it smacks of outright conceptual theft. Though
Bayesians have available the resources to accommodate this sort of reasoning,
it is a curious fact that they are often focused so completely on priors and
likelihoods that they overlook the epistemic significance of the relevance
quotient in its own right. Elliott Sober ([2002]) illustrates this approach
particularly clearly in his discussion of the relevance of simplicity or
parsimony to the plausibility of a hypothesis -- in particular, the idea that
the simpler a hypothesis is, the greater its plausibility. Sober lays out an
elementary consequence of Bayes's Theorem,

P(H_{1}/E) > P(H_{2}/E) if and only if P(E/H_{1}) P(H_{1}) > P(E/H_{2}) P(H_{2}),

and then glosses the Bayesian assessment of the
theoretical virtue of simplicity exclusively in terms of priors and
likelihoods:

If 'more plausible' is interpreted to mean *higher
posterior probability*, then there are just two ingredients that Bayesianism
gets to use in explaining what makes one hypothesis more plausible than
another. This means that if simplicity *does* influence plausibility, it
must do so via the prior probabilities or via the likelihoods. If the relevance
of simplicity cannot be accommodated in one of these two ways, then either
simplicity is epistemically irrelevant or (strong) Bayesianism is mistaken.^{[9]}

And again: 'Bayesianism has just two resources for
explaining the epistemic relevance of simplicity – priors and likelihoods'
(Sober [2002], p. 10). This approach is thoroughly typical.^{[10]}

Sober's
analysis seems plausible because of the particular way that he has displayed
the probabilities, but in fact there are multiple mathematically acceptable
ways to compare probabilities using Bayes's Theorem. We could as easily say,
for example,

P(H_{1}/E) > P(H_{2}/E) if and only if [P(E/H_{1})/P(E)] / [P(E/H_{2})/P(E)] > P(H_{2})/P(H_{1}).

This, too, is a direct consequence of Bayes's Theorem;
but now the bracketed terms are the respective relevance quotients -- measures
of the explanatory power of the two hypotheses -- rather than likelihoods.

This point
would not be significant if we always needed to know the likelihood of an
hypothesis with respect to the evidence in question, at least to an order of
magnitude approximation, in order to know whether it has any explanatory power
with respect to that evidence. And at first blush it seems we do. For if we
measure explanatory power of H with respect to E by the relevance quotient,
then we are contemplating P(E/H) / P(E); and the most obvious way to determine
the value of this ratio is to obtain separate values for P(E/H) and for P(E)
and then divide the former by the latter.

The most
obvious -- but not the only way. In fact, there are numerous situations in
which we have evidence that pertains to the *relative* values of P(E/H)
and P(E) rather than to their absolute values. An example makes this plain. At
a carnival poker booth I espy a genial looking fellow willing to play all comers
at small stakes. The first hand he deals gives him four aces and a king, the
second a royal flush, and indeed he never seems to come up with less than a
full house any time the cards are in his hands. Half an hour older and forty
dollars wiser, I strongly suspect that I have encountered a card sharp. I have
made no attempt to compute the odds against his obtaining those particular
hands on chance; I may not even know how to do the relevant calculation. Nor do
I have any clear sense of the probability of his getting just *those*
hands given that he is a sharp. For neither P(E/H) nor P(E) am I in a position
to estimate a value within, say, three orders of magnitude; the best I can say
in non-comparative terms is that each of them is rather low. But I know past
reasonable doubt that the explanatory power of my hypothesis is very great.

My
evidence in this case is testimonial; I have been told that card sharps are
good enough at manipulating a deck to make their own hands come out favorable
at a much higher rate than normal. And as is typical in cases where we rely on
testimony, I have not done the kind of research that would give me the
individual odds directly; someone *else* may know them, but not I. Thus,
although the explanatory power of H with respect to E may be analytically
equivalent to the ratio of likelihood and expectedness, it does not follow that
either of those two numbers must be epistemically accessible in non-comparative
terms in order for the explanatory power itself to be epistemically accessible.
It is not hard to generalize this to cases where I may be quite sure more or
less *exactly* what the explanatory power of H is with respect to E though
I have no further information bearing on either, as when I am told that a
certain baseball player has a batting average that is twice the average in his
league.

4 Evidential and Theoretical Consilience

In *The Origin of Species*, Darwin draws special
attention to the fact that his theory explains 'several classes of facts'
ranging from homology to the 'atrophied' organs of animals (Darwin [1872], p.
436). The pattern of reasoning is widespread: a hypothesis gains in credibility
to the extent that the several pieces of evidence in its favor are unrelated.
This pattern has led some advocates of IBE to cite consilience -- the
capacity to explain diverse independent classes of facts -- as a paradigmatic
theoretical virtue.

This
intuition can be cast in probabilistic terms. Assume for the sake of argument
that H gives equivalent likelihood to the conjunction of evidence E_{1}
through E_{n}, on the one hand, and data D_{1} through D_{n},
on the other, e.g.

P(E_{1} & ... & E_{n}/H) = P(D_{1} & ... & D_{n}/H),

and that the expectedness of the individual pieces of
E and D are equivalent, so that for all k,

P(E_{k}) = P(D_{k}).

Under these conditions, a set that exhibits
independence will confer greater probability on H than one where the members of
the set are positively relevant to each other. For if

P(E_{1} & ... & E_{n}) = P(E_{1}) x ... x P(E_{n})

but

P(D_{1} & ... & D_{n}) > P(D_{1}) x ... x P(D_{n}),

then the explanatory power of H is greater with
respect to E than to D:

P(E_{1} & ... & E_{n}/H) / P(E_{1} & ... & E_{n}) > P(D_{1} & ... & D_{n}/H) / P(D_{1} & ... & D_{n}).

The reasoning is easily generalized to a result
regarding the extent of the independence exhibited by the data.

Although
this reconstruction of the consilience exhibited in Darwin's theory shows it to
be confirmatory *ceteris paribus*, it has the unexpected consequence that
consilience of this sort turns out to be a function of the evidence rather than
of the hypothesis. But the analysis of such 'evidential consilience', as we may
call it, suggests that there may be a cluster of virtues surrounding the
concepts of dependence and independence.^{[11]} If we change
our focus from the independence of the evidence apart from the theory to its
dependence in light of the theory, we discover a lovely theorem: The degree of
confirmation a hypothesis receives from the conjunction of independent pieces
of evidence is a monotonic function of the extent to which those pieces of
evidence can be seen to be positively relevant to each other in the light of
that hypothesis.

The proof
is straightforward.^{[12]}
Assume for the sake of the argument that P(H_{1}) = P(H_{2}),
that for all n, P(E_{n}/H_{1}) = P(E_{n}/H_{2}),
and that the various E_{n} are positively relevant to each other
conditional on H_{1} but independent of each other conditional on H_{2},
i.e.,

P(E_{1} & ... & E_{n}/H_{1}) > P(E_{1}/H_{1}) x ... x P(E_{n}/H_{1})

but

P(E_{1} & ... & E_{n}/H_{2}) = P(E_{1}/H_{2}) x ... x P(E_{n}/H_{2}).

Then by Bayes's Theorem and a bit of trivial algebra,

P(H_{1}/E_{1} & ... & E_{n}) > P(H_{2}/E_{1} & ... & E_{n}).

Thus H_{1} emerges as clearly superior to H_{2},
in straightforward confirmational terms, despite the fact that on a case by
case basis it has no predictive advantage over H_{2}. The result is
easily generalized to yield the theorem in question. This provides a convincing
demonstration of the confirmational relevance of what we will call 'theoretical
consilience' -- the consilience that obtains when an hypothesis or theory
reduces independence among the data.
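
The theorem can be sketched numerically with illustrative values: two exclusive and exhaustive hypotheses with equal priors and equal piecewise likelihoods, where the two pieces of evidence are positively relevant to each other conditional on H1 but independent conditional on H2.

```python
# Illustrative values only; the structure mirrors the assumptions of the
# ceteris paribus theorem on theoretical consilience.
p_H1 = p_H2 = 0.5     # equal priors (H1 and H2 exclusive and exhaustive)
p_E1_given_H = 0.5    # P(E1|H1) = P(E1|H2), per the theorem's setup
p_E2_given_H = 0.5    # P(E2|H1) = P(E2|H2)

joint_H1 = 0.4                           # P(E1 & E2|H1) > 0.25: positive relevance
joint_H2 = p_E1_given_H * p_E2_given_H   # P(E1 & E2|H2) = 0.25: independence

p_E = joint_H1 * p_H1 + joint_H2 * p_H2  # total probability of E1 & E2
post_H1 = joint_H1 * p_H1 / p_E
post_H2 = joint_H2 * p_H2 / p_E
print(post_H1 > post_H2)  # the consilient hypothesis wins
```

With these numbers H1 ends up at roughly 0.62 and H2 at roughly 0.38, despite the two hypotheses being indistinguishable piece by piece: the whole advantage flows from the dependence H1 induces among the data.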

Note that
theoretical consilience is both local and comparative. The superiority of H_{1}
to H_{2} with respect to E_{1}, ... E_{n} does not
entail a comparable superiority with respect to some other data D_{1},
... D_{m}. So one lesson we learn directly from the theorem is that
theoretical consilience is a three-termed relation: H_{1} is more
consilient than H_{2} with respect to a set of data E to the extent
that

P(E_{1} & ... & E_{n}/H_{1}) / [P(E_{1}/H_{1}) x ... x P(E_{n}/H_{1})] > P(E_{1} & ... & E_{n}/H_{2}) / [P(E_{1}/H_{2}) x ... x P(E_{n}/H_{2})].

This
theorem sheds light on Reichenbach's discussion of a conjunctive fork
(Reichenbach [1971]). Reichenbach conceives of a situation where a
probabilistic dependence

P(E_{1} & E_{2}) > P(E_{1}) P(E_{2})

serves as a sign of a common cause for both E_{1}
and E_{2}, and he offers a formal proof that under certain conditions,
among them the Reichenbach Condition

RC: P(E_{1} & E_{2}/C) = P(E_{1}/C) P(E_{2}/C),

the conjunction raises the probability of C *even
though, in accordance with RC, the dependence has vanished modulo the assumption
of C*.^{[13]}
Strictly speaking Reichenbach's proof is correct. But consideration of our
theorem shows that RC is not only not required for making a hypothesis
probable, it is positively detrimental. A simple substitution of C for H_{2}
in the proof above with n=2 demonstrates that a hypothesis that possesses
theoretical consilience with respect to the data in question will *ceteris
paribus* emerge from Bayesian conditionalization with a higher posterior
probability than a hypothesis that obeys RC.

5 A Test Case: Gravitational Lensing

The example of gravitational lensing offers a good
testing ground for this analysis. In 1979 two quasar images only five
arcseconds apart, QSO 0957+561, were found to have identical spectral
characteristics. Data on the spectra of known quasars indicated that there was
only a remote probability of such a coincidence on chance; an explanation
seemed called for. By far the most attractive hypothesis proposed was that the
phenomenon consisted of a double image produced when radiation streaming from a
single quasar was bent by the gravitational field of some massive object
located between us and the quasar -- a gravitational lens. Pursuing this
hypothesis, astronomers subsequently discovered a cluster of galaxies in the
proper place to do the relativistic bending (Salmon [1984], pp. 159, 210;
[1997], pp. 290-2; [2001a], pp. 72-3).

Why is the
lensing explanation (L) so attractive? In large measure, it is because it
eliminates a brute coincidence by providing a context in which the spectrum of
the first quasar (S_{1}) is relevant to the spectrum of the second (S_{2}).
By itself, L does nothing to raise the likelihood of either spectrum:

P(S_{1}/L) = P(S_{1})

and *mutatis mutandis* for S_{2}. But the
coordination effected by L makes the dependence between the two almost
complete, for if the two images have been formed by lensing from a single
quasar then the spectrum of the one image is virtually guaranteed to match the
spectrum of the other, i.e.,

P(S_{1} & S_{2}/L) ≈ P(S_{1}/L) >> P(S_{1}/L) P(S_{2}/L).

Thus the reconstruction of theoretical consilience
appears to capture the feature that makes this explanation so lovely.
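
Plugging in hypothetical magnitudes makes the contrast vivid. The numbers below are not drawn from the astronomical data; they merely encode the structural claim that L leaves each spectrum's individual probability unchanged while making a match between them nearly certain.

```python
# Hypothetical magnitudes for the lensing case (not the actual
# astronomical figures): each spectrum is individually rare, and L
# leaves those individual probabilities untouched.
p_S1 = 1e-4              # P(S1) = P(S1|L), assumed for illustration
match_given_L = 0.99     # lensing virtually guarantees matching spectra
joint_given_L = p_S1 * match_given_L   # P(S1 & S2|L), nearly P(S1|L)
product = p_S1 * p_S1                  # P(S1|L) * P(S2|L)
print(joint_given_L / product > 1000)  # enormous theoretical consilience
```

On these figures the joint probability under L exceeds the product of the individual probabilities by nearly four orders of magnitude, which is the sense in which L eliminates the brute coincidence.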

The
lensing case also provides us with a compelling illustration of the manner in
which the explanatory virtues do heuristic work. Lovely as it was, L did not
rise to the level of an established fact until subsequent observations from
Mauna Kea and Palomar confirmed the presence of sufficient mass in the right
place. But the loveliness of the explanation almost certainly motivated the
search for that mass. From a heuristic perspective this is perfectly rational:
with the high likelihood in place thanks to the consilience of L over
randomness with respect to S_{1} & S_{2}, all that remains
for an inference is to establish by independent means the plausibility of L
itself.^{[14]}
Failure to find the mass would have set astronomers to work looking for
alternative explanations, much as the failure to find the hypothetical planet
Vulcan paved the way for Einstein's explanation of the anomalous precession of
the perihelion of Mercury (Salmon [2001a], p. 84). Either way, the heuristic
model gives a definite epistemic underpinning to a policy of inquiry that takes
theoretical virtues seriously.

In his
discussion of this example, Wesley Salmon suggests that the attraction of the
lensing hypothesis over its rivals is a simple function of Bayesian
considerations and that explanatory beauty does not enter into our evaluation
(Salmon [2001a], pp. 72-3). But Salmon's objection here is wide of the mark for
two reasons, one phenomenological and one technical. Phenomenologically, the
attractiveness of a unifying hypothesis appears to track its theoretical
consilience *ceteris paribus*. One's instinctive assessment of a
hypothesis that leaves the conjunction of S_{1} and S_{2} a
mystery -- no more probable than the product of their individual probabilities
-- is that it is not nearly so attractive because it fails to eliminate
unexplained coincidence and that absent some other countervailing consideration
it ought to be dismissed in favor of one that effects the relevant unification.

The
technical point is that Salmon himself seems to have misanalyzed the structure
of the reasoning in the lensing example, for he assimilates it to the pattern
of inference to a common cause via a conjunctive fork described by Reichenbach.^{[15]}
The condition RC is built into that analysis and is explicitly cited by Salmon,
so that on Salmon's analysis

P(S_{1} & S_{2}/L) = P(S_{1}/L) P(S_{2}/L).

But as we have seen, in the case in hand this is
plainly false. The moral is that the phenomenology should not be wholly
despised. Attention to our pre-theoretical notions of loveliness may at times
be a surer guide to a theory's probabilistic merits and the structure of our
reasoning than purely algebraic manipulations even in the hands of an
acknowledged master.

6 Conclusion

This approach may frustrate members of the
explanationist camp who feel that there is nothing distinctively explanatory
left by the time we have flattened out the virtues on a probabilistic plane. In
their very nature, probabilistic reconstructions make no mention of causal connections,
much less 'thick' connections; nor could they do so in any sense that is
irreducible to the probability calculus and still serve to meet the relevance
problem in Bayesian terms. Moreover, nothing in the foregoing analysis
guarantees that an hypothesis with great explanatory power or theoretical
consilience must be an *explanation* of the evidence, a tendentious choice
of terminology notwithstanding. The explanationist may feel that his birthright
is being sold for a thin mess of probabilistic pottage.

But I
think it would be premature to abandon exploration of the reconstructive
project on these grounds alone. The conceptual project of analyzing the notion
of explanation is a long way from being completed (Salmon [1989]). As for the
worry that the virtues might crop up outside of explanatory contexts, I think
the explanationist should abandon the imperialist ambition to bring every
aspect of non-deductive inference under the heading of IBE. It is sufficient
that the virtues are frequently, perhaps overwhelmingly frequently, exhibited
in inferences of an overtly explanatory sort and that our attention is drawn to
them precisely because of the role they play in explanatory reasoning. And in
view of the present impasse in developing non-probabilistic accounts of
explanatory inference, explanationists are well advised not to turn back before
they have explored the probabilistic path at least somewhat further, lest they
should miss a hitherto undreamt of outlook on the whole.

Department of Philosophy

Western Michigan University

mcgrew@wmich.edu

Acknowledgments

I am grateful to two anonymous referees who provided
constructive feedback that strengthened the paper, to Peter Lipton for
providing pre-publication copies of his exchange with Salmon, to Samir Okasha, Jim
Franklin, and Peter Lipton for helpful criticism and correspondence regarding
earlier versions, to Elliott Sober for help tracking down references to his
work, and to Lydia McGrew for spotting errors and muddles in the penultimate
draft.

**References**

Barnes, Eric [1995]: 'Inference to the Loveliest
Explanation', *Synthese* **103**, pp. 251-77.

Boltzmann, Ludwig [1974]: *Theoretical Physics and
Philosophical Problems*, Boston: D. Reidel.

Collins, Arthur [1966]: 'The Use of Statistics in
Explanation', *British Journal for the Philosophy of Science* **17**,
pp. 127-40.

Darwin, Charles [1872]: *The Origin of Species*,
sixth edition, New York: E. P. Dutton & Co. Inc.

Day, Timothy and Kincaid, Harold [1994]: 'Putting
Inference to the Best Explanation in its Place', *Synthese* **98**, pp.
271-95.

Foster, John [1985]: *A. J. Ayer*, New York:
Routledge & Kegan Paul.

Harman, Gilbert [1973]: *Thought*, Princeton:
Princeton University Press.

Hon, Giora and Rakover, Sam S. (*eds*.) [2001]: *Explanation:
Theoretical Approaches and Applications*, Dordrecht: Kluwer.

Horwich, Paul [1982]: *Probability and Evidence*,
Cambridge: Cambridge University Press.

Jeffrey, Richard [1992]: *Probability and the Art of
Judgment*, Cambridge: Cambridge University Press.

Kitcher, Philip and Salmon, Wesley (*eds*.)
[1989]: *Scientific Explanation*, Minnesota Studies in the Philosophy of
Science, vol. 13, Minneapolis: University of Minnesota Press.

Knowles, Dudley (*ed*.) [1990]: *Explanation and its
Limits*, Cambridge: Cambridge University Press.

Lipton, Peter [1991]: *Inference to the Best
Explanation*, New York: Routledge.

__________ [2001]: 'Is Explanation a Guide to
Inference? A Reply to Wesley Salmon', In Hon and Rakover (*eds*.) [2001].

Niiniluoto, Ilkka [1999]: 'Defending Abduction', *Philosophy
of Science* **66** (Proceedings), pp. S436-S451.

Peirce, Charles S. [1935]: *Collected Papers*
Vol. 5, ed. by C. Hartshorne and P. Weiss, Cambridge, MA: Harvard University
Press.

Reichenbach, Hans [1971]: *The Direction of Time*,
Berkeley: University of California Press.

Rosenkrantz, Roger [1977]: *Inference, Method and
Decision: Towards a Bayesian Philosophy of Science*, Dordrecht: D. Reidel.

Salmon, Wesley [1970]: 'Bayes's Theorem and the
History of Science', In Stuewer (*ed*.) [1970].

__________ [1984]: *Scientific Explanation and the
Causal Structure of the World*, Princeton: Princeton University Press.

__________ [1989]: 'Four Decades of Scientific
Explanation', In Kitcher and Salmon (*eds*.) [1989].

__________ [1998]: *Causality and Explanation*,
Oxford: Oxford University Press.

__________ [2001a]: 'Explanation and Confirmation: A
Bayesian Critique of Inference to the Best Explanation', In Hon and Rakover (*eds*.)
[2001].

__________ [2001b]: 'Reflections of a Bashful
Bayesian: A Reply to Peter Lipton', In Hon and Rakover (*eds*.) [2001].

Sober, Elliott [1988]: *Reconstructing the Past:
Parsimony, Evolution and Inference*, Cambridge, MA: MIT Press.

__________ [1990]: 'Let's Razor Ockham's Razor', In
Knowles (*ed*.) [1990].

__________ [2002]: 'Bayesianism – Its Scope and
Limits', In Swinburne (*ed*.) [2002]. This paper is available in pdf form
from Sober's website, philosophy.wisc.edu/sober/.

Stuewer, Roger (*ed*.) [1970]: *Historical and
Philosophical Perspectives of Science*, Minnesota Studies in the Philosophy
of Science, vol. 5, Minneapolis: University of Minnesota Press.

Swinburne, Richard (*ed*.) 2002: *Bayes's
Theorem*, Cambridge: Cambridge University Press.

Thagard, Paul [1978]: 'The Best Explanation: Criteria
for Theory Choice', *Journal of Philosophy* **75**, pp. 76-92.

van Fraassen, Bas [1989]: *Laws and Symmetry*,
Oxford: Oxford University Press.

**Endnotes**

^{[1]} I propose to set aside verbal wrangling over
whether a redistribution of probabilities by conditionalization counts as
inference where the latter term is strictly construed. For examples of Bayesian
reasoning in the absence of precise values for all of the traditional
parameters, see Salmon ([1970]) and Shimony ([1993]), pp. 274-300.

^{[2]} This approach disarms the piecemeal
criticism of particular theoretical virtues by counterexample (Barnes [1995],
p. 261).

^{[3]} This is actually sometimes suggested
(Niiniluoto [1999], p. S448). Perhaps the best construction to be put on
Niiniluoto's brief remark is that the explanatory success of H contributes to
its posterior probability in a manner yet to be analyzed (cf. Lipton [2001], pp.
110-1); this is plausible enough but is just a promissory note. Lipton
explicitly rejects the bald identification of loveliness with high posterior
probability as trivializing (Lipton [2001], p. 105).

^{[4]} Existing attempts to do this (Salmon [1970],
p. 80; Day and Kincaid [1994], pp. 285-6; Niiniluoto [1999], p. S448; Lipton
[2001], pp. 110-1) are more suggestive than detailed.

^{[5]} The virtue of simplicity is notorious in
this respect (Sober [1988]).

^{[6]} I have silently substituted the suggestive E
(evidence) and H (hypothesis) for Peirce's original C and A, respectively.

^{[7]} As a full-scale Bayesian analysis of
surprisingness, this would be inadequate since in general surprise is also a
function of the availability of an alternative explanation with higher
likelihood (Horwich [1982], pp. 100-4; see also pp. 14-5). But in the context
the question is how to represent surprisingness *as Peirce is using it*;
and as he has separated the idea of the availability of hypotheses on which E
is 'a matter of course' and presented it in a separate premise of his schema,
the low probability of E does seem to be a reasonable way to represent what he
has in mind. Here and in the next section I have suppressed background to the right
of the solidus for visual clarity, but it should be taken as implicit: there is
here no tacit endorsement of radical personalism.

^{[8]} Sometimes called the 'relevance quotient' in
the Bayesian literature (e.g. Jeffrey [1992], p. 109).

^{[9]} I have silently adjusted Sober's notation to
match that used above and below. Sober does not himself endorse strong
Bayesianism; his stated point is that Bayesianism cannot be the whole story
about scientific inference (Sober [2002], p. 3). But it is his conception of
the resources available to Bayesians that is of central interest here.

^{[10]} Sober ([1990]) contains the same sort of
remarks regarding likelihoods and priors. Day and Kincaid ([1994]) write that
'IBE can be embedded in determining the priors and likelihoods that make up
Bayesian calculation', and they suggest some ways that explanatory factors are
relevant to these components separately. Both Lipton ([2001]) and Salmon
([2001a], p. 79; [2001b], p. 121) discuss the determination of priors and the determination
of likelihood as two possible points at which explanatory considerations might
be applied to Bayesian analyses. Salmon ([2001b], p. 125) offers an equation,
substantially similar to Sober's formula, which in our notation comes to
P(H_{1}/E)/P(H_{2}/E) = [P(H_{1})P(E/H_{1})] / [P(H_{2})P(E/H_{2})]. He
applauds the elimination of catchall factors from this equation but stresses
that the evaluation of the relative merits of competing hypotheses 'still
requires both the prior probability and the likelihood of each of the competing
hypotheses'. None of these authors mentions the relevance quotient.

^{[11]} Part of the reason for this is that high
theoretical consilience, like high likelihood, can be manufactured by 'packing'
the hypothesis so that it entails all of the evidence in question. It seems
plausible that the gains acquired through packing functions will be offset by a
loss of simplicity reflected in a low prior probability, a phenomenon that is
arguably at work in examples ranging from Cartesian deceiver scenarios to
Ptolemaic astronomy. But in view of the notorious difficulties surrounding the
notion of simplicity, this requires separate analysis.

^{[12]} Obvious restrictions apply: for j ≠ k, P(E_{j} & E_{k}) <
P(E_{j}), similarly for larger conjunctions, and the various terms all
take values strictly between 0 and 1.

^{[13]} Reichenbach ([1971], pp. 158-61). I have
silently adjusted Reichenbach's notation.

^{[14]} This is not exactly the same thing as
evaluating a Bayesian prior P(L/B); the objective rather is to determine by
observation whether the right mass can be found in the right place (M), since
adding M to the background B would yield an agreeably high value of P(L/M &
B) without reducing the consilience of L with respect to S_{1} & S_{2}.

^{[15]} Salmon employs Reichenbach's analysis here
for inferential purposes; he does not endorse Reichenbach's analysis as an
explication of the notion of causation (Salmon [1998], pp. 214ff).