The empirico-inductive concept
On induction and Bayesian inference, John Maynard Keynes wrote:
"To take an example, Pure Induction can be usefully employed to strengthen an argument if, after a certain number of instances have been examined, we have, from some other source, a finite probability in favour of the generalisation, and, assuming the generalisation is false, a finite uncertainty as to its conclusion being satisfied by the next hitherto unexamined instance which satisfies its premise."
He goes on to say that pure induction "can be used to support the generalisation that the sun will rise every morning for the next million years, provided that with the experience we have actually had there are finite probabilities, however small, derived from some other source, first, in favour of the generalisation, and, second, in favour of the sun's not rising to-morrow assuming the generalisation to be false," adding: "Given these finite probabilities, obtained otherwise, however small, then the probability can be strengthened and can tend to increase towards certainty by the mere multiplication of instances provided that these instances are so far distinct that they are not inferable one from another" (35).
Keynes's book is highly critical of a theory of scientific induction presented by a fellow economist, William Stanley Jevons, that uses Laplace's rule of succession as its basis. Keynes argues here that it is legitimate to update probabilities as new information arrives, but that -- paraphrasing him in our terms -- the propensity information and its attendant initial probabilities cannot be zero. In the last sentence of the quotation above, Keynes is giving the condition of independence, a condition that is often taken for granted, but that rests on assumptions about randomness that we will take a further look at as we proceed. Related to that is the assumption that two events have enough in common to be considered identical. However, this assumption must either be accepted as a primitive, or must be based on concepts used by physicists. We will question this viewpoint at a later point in our discussion. Other Bayesian probabilists, such as Harold Jeffreys, agree with Keynes that induction gains no purchase unless the initial probabilities are nonzero.
One might say that the empirico-inductive approach, at its most unvarnished, assumes a zero or near-zero information value for the system's propensity. But such an approach yields experimental information that can then be used in probability calculations. A Bayesian algorithm for updating probabilities based on new information -- such as Laplace's rule of succession -- might or might not be gingerly accepted for a specific case, such as one of those posed by J. Richard Gott III.
Article on Gott
http://en.wikipedia.org/wiki/J._Richard_Gott
It depends on whether one accepts near-zero information for the propensity. How does one infer a degree of dependence without any knowledge of the propensity? If it is done with the rule of succession, we should be skeptical. In the case of sun risings, for all we know, the solar system -- remember, we are assuming essentially no propensity information -- is chaotic and only happens to be going through an interval of periodicity or quasi-periodicity (when trajectories cycle toward limit points -- as in attractors -- ever more closely). Maybe tomorrow a total eclipse will occur as the moon shifts relative position, or some large body arrives between the earth and sun. This seems preposterous, but only because in fact we are aware that the propensity information is non-zero; that is, we know something about gravity and have observed quite a lot about solar system dynamics.
But Gott argues that we are entitled to employ the Copernican principle in some low-information scenarios. The "information" in this principle says that human observers occupy no privileged position in space or time. With that in mind, the "doomsday argument" follows, whereby the human race is "most likely" to be about halfway through its course of existence, taking into account the current world population. We note that the doomsday argument has commonalities with Pascal's wager on the existence of God. That is, Pascal assigned a probability of 1/2 each to existence and non-existence based on the presumption of utter ignorance. Yet, how is this probability arrived at? There are no known frequencies available. Even if we use a uniform continuous distribution from 0 to 1, the prior isn't necessarily found there, system information being ruled out and expert opinion unwelcome. That is, how do we know that an initial probability exists at all?
The doomsday argument
http://en.wikipedia.org/wiki/Doomsday_argument
With respect to the doomsday scenario, there are frequencies, in terms of average lifespan of an arbitrary species (about a million years), but these are not taken into account. The fact is that the doomsday scenario's system "information" that we occupy no privileged spacetime position is, being a principle, an assumption taken as an article of faith. If we accept that article, then we may say that we are less likely to be living near the beginning or end of the timeline of our species, just as one who arrives at a bus stop without a schedule expects that the wait will probably be neither close to the maximum possible interval between buses nor close to zero. Similarly, we hazard this guess based on the idea that there is a low level of interaction between one's brain and the bus's appearance, or based on the idea that some greater power isn't "behind the scenes" controlling the appearance of buses. If one says such ideas are patently absurd, read on, especially the sections on noumena (Part VI, see sidebar).
Another point: an implication of the next-bus scenario is that we could actually conduct an experiment. Suppose Bus 8102 is scheduled to arrive at Main and 7th streets at 10 minutes after the hour, and a set of random integers in [0,60] is printed out; the experimenter takes each randomly selected number of minutes past the hour as his "arrival time"; these arrival times are matched against when the 8102 actually arrives, the results compiled and the arithmetic mean of the waits taken. Over a sufficient number of tests, the law of large numbers suggests that the average waiting time is a half hour. Because we know already that we can anticipate that result, we don't find it necessary to actually run the trials. So, is that Bayesianism or idealistic frequentism, using imaginary trials?
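As a minimal sketch of that imaginary experiment (the function name, the seed and the use of uniformly random real-valued arrival minutes rather than printed integers are my own choices), the trials can be simulated rather than run:

```python
import random

def simulate_mean_wait(trials=100_000, bus_minute=10, seed=1):
    """Arrive at a uniformly random minute in [0, 60) and wait for a bus
    that comes at `bus_minute` past each hour."""
    random.seed(seed)
    total_wait = 0.0
    for _ in range(trials):
        arrival = random.uniform(0, 60)             # the randomly selected arrival minute
        total_wait += (bus_minute - arrival) % 60   # minutes until the next bus
    return total_wait / trials

print(simulate_mean_wait())   # tends toward 30 minutes, as the law of large numbers suggests
```

Running it merely confirms what the argument already tells us, which is the point: the imaginary trials add nothing to the analysis.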
In the case of the plausibility that we are about halfway through the lifespan of our species, it is hard to imagine even a fictional frequency scenario. Suppose we have somehow managed to obtain the entire population of Homo sapiens sapiens who have ever lived or who ever will live. From that finite set, a person is chosen randomly, or as close to randomly as we can get. What is the probability our choice will come from an early period in species history? There is no difference between that probability and the probability our choice came from about the halfway mark. Of course, we hasten to concede that such reasoning doesn't yield much actionable knowledge. What can anyone do with a probability-based assessment that the human species will be extinct in X number of years, if X exceeds one's anticipated lifetime?
Concerning imaginary trials, E.T. Jaynes (36), a physicist and Bayesian crusader, chided standard statistics practitioners for extolling objectivity while using fictional frequencies and trials. A case in point, I suggest, is the probability in a coin toss experiment that the first head will show up on an odd-numbered flip. The probability is obtained by summing all possibilities to infinity, giving an infinite series limit of 2/3. That is, Σ (1/2)^(2k+1), summed over k = 0, 1, 2, ..., converges to 2/3. That probability isn't reached after an infinitude of tosses, however. It applies immediately. And one would expect that a series of experiments would tend toward the 2/3 limit. However, such a set of experiments is rarely done. The summation is legitimate because the possibilities are mutually exclusive -- the plus sign stands for the logical relation "exclusive or." The idea is that experiments with coins have been done, and so independence has been well enough established to permit us to make such a calculation without actually doing experiments to see whether the law of large numbers will validate the 2/3 result. That is, we say that the 2/3 result logically follows if the basic concept of independence has been established for coin tossing in general.
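A minimal sketch of both routes to the 2/3 figure -- the geometric series and the rarely-run "imaginary trials" -- assuming only a fair coin and independent flips:

```python
import random

# Analytic route: P(first head on an odd-numbered flip)
#   = sum over k >= 0 of (1/2)^(2k+1) = (1/2) / (1 - 1/4) = 2/3.
analytic = sum(0.5 ** (2 * k + 1) for k in range(50))   # partial sum; converges quickly

# "Imaginary trials" route: actually run the experiment many times.
random.seed(1)
def first_head_is_odd():
    flip = 1
    while random.random() >= 0.5:   # tails: keep flipping
        flip += 1
    return flip % 2 == 1

trials = 100_000
empirical = sum(first_head_is_odd() for _ in range(trials)) / trials
print(round(analytic, 6), round(empirical, 4))   # both near 0.6667
```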
Concerning the doomsday argument, we note that in the last few hundred years the human population has been increasing exponentially. Prior to that, however, its numbers went up and down in accord with Malthusian population dynamics as indicated by the logistic differential equation, which sets a threshold below which a population almost certainly goes extinct. That harsh Malthusian reality becomes ever more likely as the global population pushes the limits of sustainability. So this tends to rule out some normally distributed population over time -- whereby the current population is near the middle and low population tails are found in past and future -- simply because the population at some point may well return from its current exponential growth curve to a jagged course reminiscent of stock market charts.
The bus stop scenario can serve to illustrate our expectation that random events tend to cancel each other out on either side of a mean. That is, we expect the randomly chosen numbers below the mean of 30 to show low correlation with the randomly chosen numbers above the mean, yet we still think that their average will be 30. This is tantamount to shifting the interval [0,60] to [-30,30], with 0 standing in for the old midpoint of 30; we suspect that if we add up all the recentered numbers, the sum will be close to zero. Why do "randoms" tend to cancel? Elsewhere, we look into "hidden variables" views.
Of course, our various conceptions of randomness and our belief in use of such randomness as a basis for predictions would be greatly undermined by even one strong counterexample, such as an individual with a strong "gift" of telekinesis who is able to influence the computer's number selection. At this point, I am definitely not proposing that such a person exists. However, if one did come to the attention of those whose mooring posts are the tenets of probability theory and inferential statistics, that person (or the researcher reporting on the matter) would come under withering fire because so many people have a strong vested emotional interest in the assumptions of probability theory. They would worry that the researcher is a dupe of the Creationists and Intelligent Design people. However, let us not tarry too long here.
More on induction
Jevons saw the scientific method as based on induction and used Laplace's rule of succession as a "proof" of scientific induction. Yet neither he, nor Pearson, who also cited it favorably, included a mathematical explanation of Laplace's rule, unlike Keynes, who analyzed Laplace's rule and also offered an amended form of it. Jevons, Pearson and Keynes all favored forms of Bayesian reasoning, often called "the inverse method."
Jevons and probability
http://plato.stanford.edu/entries/william-jevons/
On causality and probability, Jevons wrote: "If an event can be produced by any one of a certain number of different causes, the probabilities of the existence of these causes as inferred from the event, are proportional to the probabilities of the event as derived from these causes," adding: "In other words, the most probable cause of an event which has happened is that which would most probably lead to the event supposing the cause to exist; but all other possible causes are also taken into account with probabilities proportional to the probability that the event would have happened if the cause existed" (37).
Jevons then uses standard ball and urn conditional probability examples.
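The flavor of those examples can be put in a short sketch. The particular urn compositions below are my own illustrative assumptions, not Jevons's figures: urn A holds three white balls and one black, urn B one white and three black, an urn is chosen by fair coin flip, and a white ball is drawn.

```python
# Inverse probability ("which cause produced the event?") for a toy urn problem.
prior_A, prior_B = 0.5, 0.5      # urn chosen by a fair coin flip
p_white_given_A = 3 / 4          # urn A: 3 white, 1 black (assumed composition)
p_white_given_B = 1 / 4          # urn B: 1 white, 3 black (assumed composition)

# Bayes's theorem: P(A | white) is proportional to P(white | A) * P(A).
evidence = p_white_given_A * prior_A + p_white_given_B * prior_B
posterior_A = p_white_given_A * prior_A / evidence

print(posterior_A)   # 0.75: urn A is the "most probable cause" of the white draw
```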
A point of dispute here is the word "cause." In fact, in the urn and ball case, we might consider it loose usage to say that it is most probable that urn A is the cause of the outcome.
Still, it is fair to say that urn A's internal composition is the most probable relevant predecessor to the outcome. "Causes" are the hidden sub-vectors of the net force vector, which reaches the quantum level, where causation is a problematic idea.
The problem of causation deeply occupied both Pearson and Fisher, who were more favorably disposed to the concept of correlation as opposed to causation. We can see here that their areas of expertise would tend to promote such a view; that is to say that they, not being physicists, would tend to favor positing low propensity information. Philosophically, they were closer to the urn of nature concept of probability than they might have cared to admit.
And that brings us back to the point that a probability method is a tool for guidance in decision-making or possibly in apprehending truth, though this second item is where much ambiguity arises.
One must fill in the blanks for a particular situation. One must use logical reasoning, and perhaps statistical methods, to go from mere correlation to causation, with the understanding that the problem of cause and effect is a notorious philosophical conundrum.
Cause-effect is in many respects a perceptual affair. If one steps "outside" the spacetime block (see section on spacetime), where is cause-effect?
Also, consider the driver who operates his vehicle while under the influence of alcohol and becomes involved in an auto accident. He is held to be negligent as if by some act of will, whereby his decision to drink is said to have "caused" the accident. First, if a person's free-will is illusory, as seems to be at least partly true if not altogether true, then how do we say his decision caused anything? Second, some might term the decision to drink and drive the "proximate" cause of the accident. However, there are many other influences (causes?) that sum to the larger "cause." The interlinked rows of dominos started falling sometime long ago -- if one thinks in a purely mechanistic computer-like model. How does one separate out causes? We address this issue from various perspectives as we go along.
Brain studies tend to confirm the point that much that passes for free will is illusory. And yet, this very fact seems to argue in favor of a need for a core "animating spirit," or amaterial entity: something that is deeper than the world of phenomena that includes somatic functions; such a liberated pilot spirit would, it seems to me, require a higher order spirit to bring about such a liberation. I use the word "spirit" in the sense of amaterial unknown entity and do not propose a mystical or religious definition; however, the fact that the concept has held for centuries suggests that many have come intuitively to the conclusion that "something is in there."
I realize that here I have indulged in "non-scientific speculation" but I argue that "computer logic" leads us to this means of answering paradoxes. That is, we have a Goedelian argument that points to a "higher frame." But in computer logic, the frames "go all the way up" to infinity. We need something outside, or that is greater than and fundamentally different from, the spacetime block with which to have a bond.
Pearson in The Grammar of Science (38) makes the point that a randomized sequence means that we cannot infer anything from the pattern. But, if we detect a pattern, we can then write an algorithm for its continuation. So we can think of the program as the cause, and it may or may not assign probability 1 to a particular number's appearance at future step n.
I extend Pearson's idea here; he says the analogy should not be pressed too far but I think it makes a very strong point; and we can see that once we have an algorithm, we have basic system information. The longer the recognizable sequence of numbers, the higher the probability we assign it for non-randomness; see my discussion on that:
A note on periodicity and probability
http://kryptograff5.blogspot.com/2013/08/draft-1-please-let-me-know-of-errors.html
Now when we have what we suspect is a pattern, but have no certain algorithm, then we may find ways to assign probabilities to various conjectures as to the underlying algorithm.
A scientific theory serves as a provisional algorithm: plug in the input values and obtain predictable results (within tolerances).
If we see someone write the series
1,2,4,8,16,32
we infer that he is doubling the previous integer each time. There is no reason to say that this is definitely the case (for the moment disregarding what we know of human learning and psychology), but with a "high degree of probability," we expect the next number to be 64.
How does one calculate this probability?
The fact that the series climbs monotonically would seem to provide a floor probability at any rate, so that a nonparametric test would give us a useful value (see the sketch below). Even so, what we have is a continuum. Correlation corresponds to a moderate probability that B will follow A, causation to a high probability of the same. In modern times, we generally expect something like 99.99% probability to permit us to use the term "cause." But even here, we must be ready to scrap our assumption of direct causation if a better theory strongly suggests that "A causes B" is too much of a simplification.
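One way to put a "floor" under such a judgment, using only the monotone climb and no hypothesis about doubling: if pure chance made all orderings of six distinct values equally likely, a strictly increasing arrangement would have probability 1/6! = 1/720, roughly 0.14 percent. A minimal sketch, with a brute-force check of that count:

```python
import itertools, math

values = [1, 2, 4, 8, 16, 32]

# Under the chance hypothesis, every ordering of the six distinct values is equally likely.
orderings = math.factorial(len(values))     # 720
p_increasing = 1 / orderings                # probability of a strictly rising order by chance

# Brute-force confirmation: exactly one permutation is strictly increasing.
count = sum(1 for perm in itertools.permutations(values)
            if all(a < b for a, b in zip(perm, perm[1:])))

print(p_increasing, count / orderings)      # both 1/720, about 0.00139
```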
To return to causation: a prosecutor may have an apparently air-tight case against a husband in a wife's murder, but one can't completely rule out a scenario whereby CIA assassins arrived by black helicopter and did the woman in for some obscure reason. One may say that the most probable explanation is that the husband did it, but full certainty is rare, if it exists at all, in the world of material phenomena.
And of course the issue of causation is complicated by issues in general relativity -- though some argue that these can be adequately addressed -- and quantum mechanics, where the problem of causation becomes enigmatic.
Popper argued that in "physics the use of the expression 'causal explanation' is restricted as a rule to the special case in which universal laws have the form of laws of 'action by contact'; or more precisely, of 'action at a vanishing distance' expressed by differential equations" (39) [Popper's emphasis].
The "principle of causality," he says, is the assertion that any event that is amenable to explanation can be deductively explained. He says that such a principle, in the "synthetic" sense, is not falsifiable. So he takes a neutral attitude with respect to this point. This relates to our assertion that theoretic systems have mathematical relations that can be viewed as cause and effect relations.
Popper sees causality in terms of universal individual concepts, an echo of what I mean by sets of primitives. Taking up Popper's discussion of "dogginess," I would say that one approach is to consider the ideal dog as an abstraction of many dogs that have been identified, whereby that ideal can be represented by a matrix with n unique entries. Whether a particular object or property qualifies as being associated with a dogginess matrix depends on whether that object's or property's matrix is sufficiently close to the agreed universal dogginess matrix. In fact, I posit that perception of "reality" works in part according to such a system, which has something in common with the neural networks of computing fame.
(These days, of course, the ideal dog matrix can be made to correspond to the DNA sequences common to all canines.)
But, in the case of direct perception, how does the mind/brain know what the template, or matrix ideal, of a dog is? Clearly, the dogginess matrix is compiled from experience, with new instances of dogs checked against the previous matrix, which may well then be updated.
A person's "ideal dog matrix" is built up over time, of course, as the brain integrates various percepts. Once such a matrix has become "hardened," a person may find it virtually impossible to ignore that matrix and discover a new pattern. We see this tendency especially with respect to cultural stereotypes.
Still, in the learning process, a new encounter with a dog or representation of a dog may yield only a provisional change in the dogginess matrix. Even if we take into account all the subtle clues of doggy behavior, we nevertheless may be relating to something vital, in the sense of nonphysical or immaterial, that conveys something about dogginess that cannot be measured. On the other hand, if one looks at a still or video photograph of a dog, nearly everyone other than perhaps an autistic person or primitive tribesman unaccustomed to photos, agrees that he has seen a dog. And photos are these days nothing but digital representations of binary strings that the brain interprets in a digital manner, just as though it is using a neural matrix template.
Nevertheless, that last point does not rule out the possibility that, when a live dog is present, we relate to a "something" within, or behind, consciousness that is nonphysical (in the usual sense). That is, the argument that consciousness is an epiphenomenon of the phenomenal world cannot rule out something "deeper" associated with a noumenal world.
The concept of intuition must also be considered when talking of the empirico-inductive method. (See discussion on types of intuition in the "Noumenal world" section of Part VI; link in sidebar.)
More on causality
David Hume's argument that one cannot prove an airtight relation between cause and effect in the natural world is to me self-evident. In his words:
"Matters of fact, which are the second objects of human reason, are not ascertained in the same manner [as are mathematical proofs]; nor is our evidence of their truth, however great, of a like nature with the foregoing. The contrary of every matter of fact is still possible, because it can never imply a contradiction, and is conceived by the mind with the same facility and distinctness, as if ever so conformable to reality. That the sun will not rise tomorrow is no less intelligible a proposition, and implies no more contradiction, than the affirmation, that it will rise. We should in vain, therefore, attempt to demonstrate its falsehood. Were it demonstratively false, it would imply a contradiction, and could never be distinctly conceived by the mind..."
To summarize Hume:
I see the sun rise and form the habit of expecting the sun to rise every morning. I refine this expectation into the judgment that "the sun rises every morning."
This judgment cannot be a truth of logic because it is conceivable that the sun might not rise. This judgment cannot be conclusively established empirically because one cannot observe future risings or not-risings of the sun.
Hence, I have no rational grounds for my belief, but custom tells me that its truthfulness is probable. Custom is the great guide of life.
We see immediately that the scientific use of the inductive method itself rests on the use of frequency ratios, which themselves rest on unprovable assumptions. Hence a cloud is cast over the whole notion of causality.
This point is made by Volodya Vovk: "... any attempt to base probability theory on frequency immediately encounters the usual vicious cycle. For example, the frequentist interpretation of an assertion such as Pr(E) = 0.6 is something like: we can be practically (say, 99.9%) certain that in 100,000 trials the relative frequency of success will be within 0.02 of 0.6. But how do we interpret 99.9%? If again using frequency interpretation, we have infinite regress: probability is interpreted in terms of frequency and probability, the latter probability is interpreted in terms of frequency and probability, etc" (40).
Infinite regress and truth
We have arrived at a statement of a probability assignment, as in: Statement S: "The probability of Proposition X being true is y."
We then have:
Statement T: "The probability that Statement Q is true is z."
Now what is the probability of Q being true? And we can keep doing this ad infinitum?
Is this in fact conditional probability? Not in the standard sense, though I suppose we could argue for that also.
Presumably, Statement S is arrived at in a less reliable manner than Statement T, which is perhaps what justifies introducing such a secondary statement.
This shows that, at some point, we simply have to take some ancillary statement on faith.
Pascal's wager and truth
"From nothing, nothing" is what one critic has to say about assigning equal probabilities in the face of complete ignorance.
Let's take the case of an observer who claims that she has no idea of the truthfulness or falseness of the statement "God exists."
As far as she is concerned, the statement and its negation are equally likely. Yes, it may be academic to assign a probability of 1/2 to each statement. And many will object that there are no relevant frequencies; there is no way to check numerous universes to see how many have a supreme deity.
And yet, we do have a population (or sample space), that being the set of two statements {p, ~p}. Absent any other knowledge, it may seem pointless to talk of a probability. Yet, if one is convinced that one is utterly ignorant, one can still take actions:
1. Flip a coin and, depending on the result, act as though God exists, or act as though God does not exist.
2. Decide that a consequence of being wrong about existence of a Supreme Being is so great that there is nothing to lose and a lot to gain to act as though God exists (Pascal's solution).
3. Seek more evidence, so as to bring one closer to certainty as to whether p or ~p holds.
In fact, the whole point of truth estimates is to empower individuals to make profitable decisions. So when we have a set of equiprobable outcomes, this measures our maximum ignorance. It is not always relevant whether a large number of trials has established this set and its associated ratios.
That is, one can agree that the use of mathematical quantifications in a situation such as Pascal's wager is pointless, and yields no real knowledge. But that fact doesn't mean one cannot use a form of "probabilistic reasoning" to help with a decision. Whether such reasoning is fundamentally wise is another question altogether, as will become apparent in the sections on noumena (Part VI, see sidebar).
There have been attempts to cope with Hume's "problem of induction" and other challenges to doctrines of science. For example, Laplace addressed Hume's sun-rising conundrum with the "rule of succession," which is based on Bayes's theorem. Laplace's attempt, along with such scenarios as "the doomsday argument," may have merit as thought experiments, but cannot answer Hume's basic point: We gain notions of reality or realities by repetition of "similar" experiences; if we wish, we could use frequency ratios in this respect. But there is no formal way to test the truthfulness of a statement or representation of reality.
Is this knowledge?
"Science cannot demonstrate that a cataclysm will not engulf the universe tomorrow, but it can prove that past experience, so far from providing a shred of evidence in favour of any such occurrence, does, even in the light our ignorance of any necessity in the sequence of our perceptions, give an overwhelming probability against such a cataclysm." -- Karl Pearson (41)
I would argue that no real knowledge is gained from such an assessment. Everyone, unless believing some occasional prophet, will perforce act as though such an assessment is correct. What else is there to do? Decision-making is not enhanced. However, we suggest that, even in the 21st century, physics is not so deeply understood as to preclude such an event. Who knows, for example, how dark matter and dark energy really behave at fundamental levels? So there isn't enough known to have effective predictive algorithms on that scale. The scope is too great for the analytical tool. The truth is that the propensities of the system at that level are unknown.
This is essentially the same thought experiment posed by Hume, and "answered" by Laplace. Laplace's rule is derived via the application of the continuous form of Bayes's theorem, based on the assumption of a uniform probability distribution. That is, all events are construed to have equal probability, based on the idea that there is virtually no system information (propensity), so that all we have to go on is equal ignorance. In effect, one is finding the probability of a probability with the idea that the possible events are contained in nature's urn. With the urn picture in mind, one then is trying to obtain the probability of a specific proportion. (More on this later.)
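A minimal sketch of the rule itself (the resulting formula, not Laplace's derivation): with s successes observed in n trials and a uniform prior on the unknown proportion, the probability that the next trial succeeds works out to (s + 1)/(n + 2). The figures that appear later in this discussion -- 10/11 for the Bode question, which corresponds to nine conforming instances, and 21/22 after twenty white draws -- are instances of it:

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Laplace's rule: P(next trial succeeds) = (s + 1) / (n + 2),
    the result of a uniform "know nothing" prior on the unknown proportion."""
    return Fraction(successes + 1, trials + 2)

print(rule_of_succession(20, 20))          # 21/22 -- twenty white balls in a row
print(rule_of_succession(9, 9))            # 10/11 -- the figure cited for Bode's law
print(float(rule_of_succession(9, 9)))     # about 0.909, i.e. the 90.9 percent cited below
```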
S.L. Zabell on the rule of succession
http://www.ece.uvic.ca/~bctill/papers/mocap/Zabell_1989.pdf
To recapitulate, in Pearson's day, as in ours, there is insufficient information to plug in initial values to warrant assigning a probability -- though we can grant that Laplace's rule of succession may have some merit as an inductive process, in which that process is meant to arrive at the "most realistic" initial system information (translated as an a priori propensity).
Randomness versus causality
Consider, says Jevons, a deck of cards dealt out in numerical order (with suits ranked numerically). We immediately suspect nonrandomness. I note that we should suspect nonrandomness for any specific order posited in advance. However, in the case of progressive arithmetic order, we realize that this is a common choice of order among humans. If the deck proceeded to produce four 13-card straight flushes in a row, one would surely suspect design.
But why? This follows from the fact that the number of possible orderings is far higher than the number of "interesting" orderings.
Here again we have an example of the psychological basis of probability theory. If we took any order of cards and asked the question, what is the probability it will be dealt that way, we get the same result: (52!)^(-1), which is roughly (8 x 10^67)^(-1).
Now suppose we didn't "call" that outcome in advance, but just happened upon a deck that had been recently dealt out after good shuffling. What is our basis for suspecting that the result implies a nonrandom activity, such as a card sharp's maneuvering, is at work? In this case a nonparametric test, such as a runs test or sign test is strongly indicative.
Jevons had no nonparametric test at hand (unless one considers Laplace's rule to be such) but even so argued that if one sees a deck dealt out in arithmetical order, then one is entitled to reason that chance did not produce it. This is a simple example of the inverse method.
Jevons points out that, whether math is used or not, scientists tend to reason by inverse probability. He cites the example of the then recently noted flint flakes, many found with evidence of more than one stroke of one stone against another. Without resort to statistical analysis, one can see why scientists would conclude that the flakes were the product of human endeavor.
In fact, we might note that the usual corpus of standard statistical models does indeed aim to sift out an inverse probability of sorts in a great many cases, notwithstanding the dispute with the Bayesian front.
The following examples of the inverse method given by Jevons (42) are of the sort that Keynes disdained:
1. All larger planets travel in the same direction around the sun; what is the probability that, if a new planet exterior to Neptune's orbit is discovered, it will follow suit? In fact, Pluto, discovered after Jevons's book was published (and since demoted from planetary status), also travels in the same direction around the sun as the major planets.
2. All known gaseous elements, excepting chlorine, are colorless; what is the probability that, if some new gaseous element is discovered, it will be colorless? And here we see the relevance of a system's initial information. Jevons wrote well before the electronic theory of chemistry was worked out. (And obviously, we have run the gamut of stable elements, so the question is irrelevant from a practical standpoint.)
3. Bode's law of distance from the sun to each planet, except Neptune, showed in Jevons's day close agreement with distances calculated using a specific mathematical expression (sketched below), if the asteroid belt was also included. So, Jevons reasoned that the probability that the next planet beyond Neptune would conform to this law is -- using Laplace's rule -- 10/11. As it happens, Pluto was found to have fairly good agreement with Bode's law. Some experts believe that gravitational pumping from the larger planets has swept out certain orbital zones, leaving harmonic distances favorable to long-lasting orbiting of matter.
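The expression in question is the Titius-Bode relation, in its usual statement a predicted distance of 0.4 + 0.3 x 2^k astronomical units for the k-th slot (Mercury, at about 0.39 AU, is conventionally assigned the limiting value 0.4). A rough sketch against approximate modern semi-major axes, with Neptune the conspicuous misfit and Pluto falling near the slot Neptune "should" have occupied:

```python
# Titius-Bode relation: predicted distance in astronomical units for slot k.
def bode_distance(k):
    return 0.4 + 0.3 * 2 ** k

# Approximate actual semi-major axes in AU (rounded), with conventional slot assignments.
bodies = [("Venus", 0, 0.72), ("Earth", 1, 1.00), ("Mars", 2, 1.52),
          ("Ceres (asteroid belt)", 3, 2.77), ("Jupiter", 4, 5.20),
          ("Saturn", 5, 9.54), ("Uranus", 6, 19.2),
          ("Neptune", 7, 30.1),    # the misfit: the formula predicts 38.8
          ("Pluto", 7, 39.5)]      # lands near the slot Neptune "should" occupy

for name, k, actual in bodies:
    print(f"{name:22s} predicted {bode_distance(k):5.1f}   actual {actual:5.1f}")
```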
The fact that Laplace's rule "worked" in terms of giving a "high" ranking
to the plausibility of a Bode distance for the next massive body to be
found may be happenstance. That is, it is plausible that a physical resonance effect
is at work, which for some reason was violated in Neptune's case. It was
then reasonable to conjecture that these resonances are not usually
violated and that one may, in this case, assign an expert opinion
probability of maybe 80 percent that the next "planet" would have
a Bode distance. It then turns out that Laplace's rule also gives
a high value: 90.9 percent. But in the first case, the expert is using
her knowledge of physics for a "rough" ranking. In the second case,
no knowledge of physics is assumed, but a definitive number is given,
as if it is somehow implying something more than what the expert has
suggested, when in fact it is an even rougher means of ranking than
is the expert's.
A focus on the exceptions that prove the rule
Now one may argue that probability questions should not be posed in such cases as Jevons mentions. Yet if we remember to regard answers as tools for guidance, they could possibly bear some fruit. However, an astronomer might well scorn such questions because he has a fund of possibly relevant knowledge. And yet, though it is hard to imagine, suppose life or death depends on a correct prediction. Then, if one takes no guidance from a supernatural higher power, one might wish to use a "rationally" objective probability quantity.
Induction, the inverse method and the "urn of nature" may sound old-fashioned, but though the names may change, much of scientific activity proceeds from these concepts.
How safe is the urn of nature model?
Suppose there is an urn holding an unknown but large number of white and black balls in unknown proportion. If one were to draw 20 white balls in a row, "common sense" would tell us that the ratio of black balls to white ones is low (and Laplace's rule would give us a 21/22 probability that the next ball is white). I note that "common sense" here serves as an empirical method. But we have two issues: the assumption is that the balls are well mixed, which is to say the urn composition is presumed homogeneous; if the number of balls is high, homogeneity needn't preclude a cluster of balls that yields 20 white draws.
We need here a quantitative way to measure homogeneity; and this is where modern statistical methods might help, given enough input information. In our scenario, however, the input information is insufficient to justify the assumption of a 0.5 ratio. Still, a simple sign-test calculation -- a run of 20 whites has probability 2^(-20), about one in a million, under a hypothesized 0.5 ratio with independent draws -- is suggestive of a non-0.5 ratio.
Another issue with respect to induction is an inability to speak with certainty about the future (as in, will the ball drop if I let go of it?). This in fact is known as "the problem of induction," first raised by Hume.
To summarize some points previously made, induction is a process of generalizing from the observed regularities. Now these regularities may be gauged by applying simple frequencies or by rule of succession reasoning, in which we have only inference or a propensity of the system, or by deductive reasoning, whereby we set up an algorithm that, when applied, accounts for a range of phenomena. Here the propensity is given a nonzero information value. Still, as said, such deductive mechanisms are dependent on some other information -- "primitive" or perhaps axiomatic propensities -- as in the regularities of gravity. Newton and Einstein provide nonzero framework information (propensities) leading to deductions about gravity in specific cases. However, the deductive system of Newton's gravitational equation depends on the axiomatic probability 1 that gravity varies by inverse square of the distance from a ponderable object's center of mass, as has, for non-relativistic magnitudes, been verified extensively with very occasional anomalies ignored as measurement outliers.
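As a small worked instance of the inverse-square deduction just described (standard approximate constants; a sketch of the law's deductive use, not a test of it):

```python
# Newton's inverse-square law: acceleration toward a body of mass M at distance r
# from its center of mass is g = G * M / r**2.
G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2 (approximate)
M_EARTH = 5.972e24     # mass of the earth, kg (approximate)
R_EARTH = 6.371e6      # mean radius of the earth, m (approximate)

def gravitational_acceleration(mass, radius):
    return G * mass / radius ** 2

print(round(gravitational_acceleration(M_EARTH, R_EARTH), 2))      # about 9.82 m/s^2
print(round(gravitational_acceleration(M_EARTH, 2 * R_EARTH), 2))  # doubling r quarters it: about 2.45
```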
Popper in Schism takes note of the typical scientist's "metaphysical faith of the existence of regularities in the world (a faith which I share and without which practical action is hardly conceivable)" (43).
Measures of future uncertainty, such as the Gaussian distribution, "satisfy our ingrained desire to 'simplify' by squeezing into one single number matters that are too rich to be described by it. In addition, they cater to psychological biases and our tendency to understate uncertainty in order to provide an illusion of understanding the world," observed Benoit Mandelbrot and Nassim Taleb.
http://www.ft.com/intl/cms/s/2/5372968a-ba82-11da-980d-0000779e2340.html
Some outliers are just too big to handle with a normal curve. For example, if a 300-pound man's weight is added to the weights of 100 other persons, he isn't likely to have a substantial effect on the mean. But if Bill Gates's net income is added to the incomes of 100 other persons, the mean will be mean-ingless. Similarly, the Fukushima event, using Gaussian methods, was extraordinarily improbable. But the world isn't necessarily as Gaussian as we would like to believe. As said previously, one way to approach the idea of regularity is via pattern recognition matrices. If a sufficient number of entries in two matrices are identical, the two "events" so represented are construed as identical or similar to varying degrees between 1 and 0. But of course, we are immediately brought to the concept of perception, and so we may say that Popper has a metaphysical faith in the reality-sensing and construction process of most people's minds, not including some severe schizophrenics. (See Toward.)
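To make the outlier contrast concrete, a minimal sketch with made-up figures (the weights, incomes and the ten-billion-dollar stand-in for the Gates fortune are illustrative assumptions, not data):

```python
import random, statistics

random.seed(1)

# 100 ordinary weights (pounds) plus one 300-pound man: the mean barely moves.
weights = [random.gauss(170, 25) for _ in range(100)]
print(statistics.mean(weights), statistics.mean(weights + [300]))

# 100 ordinary incomes plus one Gates-sized fortune: the mean is swamped.
incomes = [random.gauss(60_000, 15_000) for _ in range(100)]
print(statistics.mean(incomes), statistics.mean(incomes + [10_000_000_000]))
print(statistics.median(incomes + [10_000_000_000]))   # the median remains informative
```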
Imagine a situation in which regularities are disrupted by sudden jumps in the mind's reality constructor. Life under such conditions might be unbearable and require neurological attention. On the other hand, sometimes "miracles" or "works of wonder" are attested to, implying that some perceive occasional violation of the humdrum of regularities, whether this is a result of a psychosomatic process, wishful/magical thinking, or some sort of intersection with a noumenal world (Part VI, see sidebar).
The theme of "regularities" coincides with what Gott calls the Copernican principle, which I interpret as meaning a metaphysical faith that the rules of nature are everywhere the same (except perhaps in parallel universes).
Gott on the 'Copernican principle'
http://www-psych.stanford.edu/~jbt/224/Gott_93.pdf
It is important to face Hume's point that scientific ideologies of various sorts rest upon unprovable assumptions. For example, the Copernican principle, which Gott interprets as meaning that a human observer occupies no special time or place in the cosmos, is a generalization of the Copernican-Galilean model of the solar system. Interestingly, by the way, the Copernican principle contrasts with the anthropic cosmological principle (discussed later).
Einstein's belief in what must be considered a form of Laplacian realism is put in sharp relief with this assertion:
“The only justification for our concepts and system of concepts is that they serve to represent the complex of our experiences; beyond this they have no legitimacy. I am convinced that the philosophers have had a harmful effect upon the progress of scientific thinking in removing certain fundamental concepts from the domain of empiricism, where they are under our control, to the intangible heights of the a priori. For even if it should appear that the universe of ideas cannot be deduced from experience by logical means, but is, in a sense, a creation of the human mind, without which no science is possible, nevertheless this universe of ideas is just as little independent of the nature of our experiences as clothes are of the form of the human body” (44).
The problem of induction is an obstacle for Einstein. Scientific inquiry requires that it be ignored. However, one might say that this irrational rationalism led to a quagmire that he was unable to see his way past, despite being a friend and colleague of the physicist-turned-logician Kurt Goedel, who had strong reservations about what is sometimes termed Einstein's naive realism.
Another take on this subject is to make a formally valid statement: A implies B, which is to say, "If A holds, then so does B." So if one encounters A as being true or as "the case," then he can be sure that B is also the case. But, at some point in his chain of reasoning, there is no predecessor to A. So then A must be established by induction or accepted as axiomatic (often both) and not by deduction. A is not subject to proof within the system. Of course this is an elementary observation, but those who "believe in Science" need to be reminded that scientific method is subject to certain weaknesses inherent in our plane of existence.
So we tend to say that though theories cannot be proved true, there is a level of confidence that comes with how many sorts of phenomena and how many special cases are successfully predicted by a theory (essentially, in the "hard" sciences, via an algorithm or set of algorithms).
But the fact that some theories are quite successful over a range of phenomena does not mean that they have probabilistically ruled out a noumenal world. It does not follow that successful theories of phenomena (and they are not fully successful) demonstrate that a noumenal world is highly improbable. In fact, David Bohm's struggles with quantum theory led him to argue that the world of phenomena must be supplemented by a noumenal world (he did not use that term) that permits bilocality via an "implicate," or unseen, order.
The noumenal world is reflected by our attempts to contend with randomness. The concept of pure randomness is, I would say, an ideal derived from our formalizations of probability reasoning. Consider a notionally constructible binary string, in which each unit is selected by a quantum detector. For example, suppose the algorithm's clock is set to one-second intervals: a detection of a cosmic ray during an interval is recorded as a 1, and no detection during the interval as a 0.
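A minimal sketch of such a string (the 0.3 detection probability per interval, the target substring and the block argument are my assumptions, chosen only for illustration):

```python
import random

random.seed(1)
P_DETECT = 0.3    # assumed chance of a cosmic-ray detection in any one-second interval

# A finite chunk of the notionally infinite binary string.
bits = ''.join('1' if random.random() < P_DETECT else '0' for _ in range(100_000))

target = '10110'                 # an arbitrary finite substring
print(target in bits)            # almost certainly True for a string this long

# The limiting argument: the chance that the pattern misses n independent 5-bit blocks
# is (1 - p)**n, which tends to 0, so the chance of at least one appearance tends to 1.
p_block = P_DETECT ** 3 * (1 - P_DETECT) ** 2   # probability a given block equals '10110'
for n in (10, 100, 1000):
    print(n, 1 - (1 - p_block) ** n)
```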
If this algorithm runs to infinity, we have a probability 1 of every possible finite substring appearing an infinity of times (ignoring the presumed change in cosmic ray distribution tens of billions of years hence). This follows from the fact that the probability of at least one appearance in n independent blocks, 1 - (1 - p)^n, goes to 1 as n goes infinite. So, for example, Fred Hoyle noticed that if the cosmos is infinite in size, we would expect an infinity of Fred Hoyles spread across the cosmos.
But, unless you are quite unusual, such a possibility doesn't accord with your concept of reality, does it? You have an inner sense here that we are "playing with numbers." And yet, in the Many Worlds interpretation of quantum physics, there is either an infinite or some monstrously large set of cosmoses in which versions of Fred Hoyle are found many times over -- and remarkably, this multiplicity scenario was developed in order to affirm causality and be rid of the notion of intrinsic randomness.
The Many Worlds "multiverse" is one explanation of what I term the noumenal world. But this interpretation has its problems, as I discuss in Toward.
Yet it is hard not to make common cause with the Many Worlds defenders, arguing that randomness run amok does not seem an appropriate representation of the universe.
LINK TO PART IV IN SIDEBAR
35. A Treatise on Probability by J.M. Keynes (Macmillan, 1921).
36. E.T. Jaynes: Papers on probability, statistics and statistical physics, R.D. Rosenkrantz editor (D. Reidel 1983).
37. The Principles of Science (Vol I) by William Stanley Jevons (Routledge/Thoemmes Press, 1996 reprint of 1874 ms).
38. The Grammar of Science by Karl Pearson (Meridian 1957 reprint of 1911 revised edition).
39. The Logic of Scientific Discovery by Karl Popper. Published as Logik der Forschung in 1935; English version published by Hutchinson in 1959.
40. Vovk's paper, "Kolmogorov complexity conception of probability," appears in Probability Theory, Philosophy, Recent History and Relations to Science edited by Vincent F. Hendricks, Stig Andur Pedersen, Klaus Frovin Jorgensen
41. Grammar, Pearson.
42. Principles of Science, Jevons.
43. Quantum Theory and the Schism in Physics (Postscript Volume III) by Karl Popper (Routledge, 1989. Hutchinson, 1982).
44. The Meaning of Relativity (fifth edition) by Albert Einstein (Princeton, 1956).