Why should scientists prefer theories that are simple and not complex? There is a very simple answer to this complex problem: Leibniz points out in section VI of his Discourse on Metaphysics that a theory ought to be simpler than the data it sets out to explain, otherwise it does not explain anything. A theory becomes vacuous if an arbitrarily complex mathematical statement is permitted to count as a theory, for one can always construct a theory to fit the data, even if the data is random.
Today, our notions of complexity and simplicity are far more rigorous than they were in Leibniz’s day: we talk about information, a concept used constantly when discussing computers. So how does this idea relate to scientific theories? The insight is an interpretation of scientific theories that treats them like software: the theory, coupled with some background assumptions and initial conditions, predicts observations in much the same way as a program is executed on a computer to produce an output.
A guiding rule of behavior that is not necessarily true, but useful in solving a problem, is referred to as a ‘heuristic.’
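The theory-as-program idea can be made concrete with a toy sketch (my illustration, not part of the original argument): a general-purpose compressor stands in, crudely, for "the shortest program that reproduces the data." Regular data admits a theory far smaller than itself; random data does not.

```python
import random
import zlib

# Patterned data: 1000 bytes embodying a simple regularity
# ("it alternates") -- the kind of data a short theory can explain.
patterned = b"01" * 500

# Random data: 1000 bytes with no regularity to exploit.
random.seed(0)
noise = bytes(random.getrandbits(8) for _ in range(1000))

# The compressed size is a rough proxy for the size of the shortest
# "theory" of the data: the regularity shrinks to a handful of bytes,
# while the noise stays roughly as large as it began -- any "theory"
# of it merely restates it, and so, per Leibniz, explains nothing.
print(len(zlib.compress(patterned)))  # a few dozen bytes at most
print(len(zlib.compress(noise)))      # close to (or above) 1000 bytes
```

The compressor is only a stand-in: the true shortest program for a string (its Kolmogorov complexity) is uncomputable, but the contrast between the two cases is the point.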
For instance, when attempting to catch a baseball, the outfielder runs toward the ball while keeping his eye trained on it. Since his neck bends in relation to the position of the ball, he is far more interested in the angle at which his neck is bent, detected by his inner ear, than in Newton’s laws of motion.
When his neck begins to bend back, the outfielder’s inner ear tells him that he is running away from the ball, and he reverses direction; when, running toward the ball, his inner ear tells him the angle is approaching level, he knows that he and the ball are closing on each other quickly.
Take the two following heuristics:
- Occam’s razor: Given two theories that each explain the data equally well at time t, one ought to prefer the simpler theory until the outcome of a test, should it obtain, shows that one of the two theories is false (a ‘crucial experiment’);
- Leibniz’s lather: If a theory is the same size in bits as, or larger than, the data it sets out to explain, then it is worthless, for any random string of data has a theory of that size.
These two heuristics provide the following methodological rule: seek out theories that can be expressed in a (comparatively) small number of bits. A strictly universal statement (“all x are y”) can be expressed in a small number of bits compared to a finite list of existential statements (“this x1 is a y, that x2 is a y … that xn is a y”), and scientists are interested in theories that predict a great deal; therefore, when explaining a set of data, the scientist ought to prefer strictly universal statements that predict a great deal over a finite list of existential statements that predict very little.
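The contrast between the two kinds of statement can be sketched in a few lines (a toy illustration; the rule and the enumeration are my own examples): a universal statement is a constant-size rule, while the list of existentials grows with the data and says nothing beyond it.

```python
# A strictly universal statement as a constant-size rule:
# "all multiples of 2 are even" -- a few bytes covering unboundedly
# many cases.
def universal(x: int) -> bool:
    return x % 2 == 0

# A finite list of existential statements: one entry per observation
# ("this 0 is even, this 2 is even, ..."), growing linearly with the data.
observed = {x: True for x in range(0, 1000, 2)}

# The universal rule predicts unseen cases; the enumeration is silent.
assert universal(10**9)        # a prediction far beyond the data
assert 10**9 not in observed   # the list says nothing about this case
```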
I have engaged with this bit of pedantry in order to make the case clear that scientists are interested in strictly universal statements. Now that this is tentatively settled, I will get to the heart of the problem.
Since a strictly universal statement predicts a great deal (i.e., it is interesting), it prohibits a large number of states of affairs, meaning that its probability approaches zero. In brief, if a theory T1 predicts “It will rain on Monday”, then T2, “It will rain on Monday and it will rain on Wednesday”, must necessarily be at most as probable as T1.
Continue adding predictions to arrive at Tn, an infinite conjunction of predictions, whose probability therefore approaches zero. In other words, the probability of a statement has an inverse relationship with its content. Furthermore, the more interesting you make a theory (the more it predicts), the easier it is to test, for there are more possible situations in which to test it.
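The rain example can be checked directly on a toy sample space (my construction: two days, all outcomes equally likely); the conjunction can never come out more probable than either conjunct.

```python
from itertools import product

# Toy sample space: rain or dry on Monday and on Wednesday,
# all four outcomes equally likely.
outcomes = list(product(["rain", "dry"], repeat=2))

# T1: "It will rain on Monday."
p_t1 = sum(1 for mon, wed in outcomes if mon == "rain") / len(outcomes)

# T2: "It will rain on Monday and it will rain on Wednesday."
p_t2 = sum(1 for mon, wed in outcomes
           if mon == "rain" and wed == "rain") / len(outcomes)

assert p_t2 <= p_t1  # a conjunction is never more probable than a conjunct
print(p_t1, p_t2)    # 0.5 0.25
```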
The problem: There are as many different theories that satisfy Tn as there are types of Bayesianism!
Question: Which theory that satisfies Tn should the scientist prefer?
Answer: A theory expressible in a form that satisfies Leibniz’s lather!
In other words, out of all the possible theories that satisfy Tn, we want to adopt a strictly universal statement: a theory that is simple, interesting and improbable. Is there anything objectionable (counter-productive, imprudent) about adopting this answer? I ask this question because its consequences are currently controversial in the philosophy of science.
Although the acceptance of this methodological rule is permitted, and often recommended, it is never demanded. Appealing to this rule only minimizes the number of uninteresting, highly probable theories that are difficult to test. The reason is as follows: we have a very limited amount of time at hand, and as Keynes remarked, in the long run we are all dead. Thus, we want interesting and highly improbable theories that are comparatively easy to test.
Now, the great reveal: what I have just done is to reverse-engineer what Popper’s Critical Rationalism (CR) advocates (but does not demand), for no amount of corroborating evidence can show that a theory is true. ‘Testability’ is then equivalent to ‘falsifiability.’ The scientist is interested in the origination of bold conjectures that are easily falsified if they are false!
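The link between boldness and falsifiability can be counted out on a toy model (my construction, not Popper's): a theory's potential falsifiers are the states of affairs it prohibits, and the bolder theory prohibits more of them.

```python
from itertools import product

# Three days; each possible world records whether it rained on each day.
days = ("Mon", "Tue", "Wed")
worlds = list(product([True, False], repeat=3))  # 8 possible worlds

def falsifiers(rain_days):
    """Worlds that the theory 'it rains on each day in rain_days'
    prohibits -- its potential falsifiers."""
    idx = [days.index(d) for d in rain_days]
    return [w for w in worlds if any(not w[i] for i in idx)]

bold = falsifiers(("Mon", "Tue", "Wed"))  # predicts rain every day
timid = falsifiers(("Mon",))              # predicts rain on Monday only

assert len(bold) > len(timid)  # the bolder theory has more ways to fail
print(len(bold), len(timid))   # 7 4
```

The bolder theory is ruled out by seven of the eight possible worlds, the timid one by only four: more content, more exposure to refutation.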
We’ve moved from something uncontroversial to something radical. The argument goes as follows:
- Scientists prefer theories that are simple and interesting. (I., II.)
- If it is simple and interesting, then it is a strictly universal statement. (II.)
- Strictly universal statements are highly improbable. (III.)
- The more improbable a theory, the more testable a theory. (III.)
- Testability is equivalent to falsifiability. (V.)
- Scientists prefer theories that are highly falsifiable. (V.)
C: Critical Rationalism best describes scientific practice.
An aside, by way of caveat: I do not mean to imply that a scientist is prohibited from formulating a theory that presently violates Occam’s razor as traditionally understood, by positing presently unobserved regularities. I mean only that theories expressible in a small number of bits are to be preferred over others.
 I made the name up.
This is a guiding principle, not a rule set in stone, for scientists who understand a problem in great detail often arrive at new theories through a flash of insight. I do not mean to say that there is a specific method of theory-formation, only that whatever theory a scientist posits is recommended to conform to these restrictions. Once the theory has been formed, it ought to receive — ahem! — a clean shave.