Friday, June 7, 2013

SIGNIFICANCE


At tea parties, if they still have them, but most especially at cocktail parties, which I know are the present craze, you will hear the word, “significance.”

A guy dressed in faded jeans and a cotton shirt is poised in deep thought, his chin carefully but gently cradled between forefinger and thumb as the forefinger caresses the side of his face. He mutters to the throng of well-dressed beautiful people, each carrying their share of the bubbly or the sanguine color of thought in their glasses, as they hang on his every syllable. “Significant… I would say.” And the crowd utters a sigh of relief. You can feel them shiver disquietingly, as if the sage might have gone the other way. What then of their future fortunes? What would have happened to their parties? What about their stock holdings, their mansions on the Greek isles or the French Riviera? Why, that would never have worked. We would have to find a better statistician then, wouldn’t we? the pretty lady in the red gown wonders.



And so in the dark art of probability, there resides the goose that periodically lays the golden egg of significance. But no one has cracked open the egg yet to see what is inside, and that is where we lay our scene…

What does significance mean? This is significant compared to that? Or this has more significance than that? The veritable definition of statistical significance is “the extent to which a result is unlikely to be due to chance alone.” Pay careful attention to the word “extent” here. After all, the pride and joy of every scientist-author is to determine that his or her hypothesis-generated experimental model has relevance, that the information from the experiment will yield benefits to the consumer, be it in engineering, mathematics, economics, the social sciences, and especially in medicine. We will restrict ourselves to medicine, because that is where everyday lives are being helped or hurt by such reliance.



The concept of significance is numerically advanced by the p-value. And the p-value, the fire that heats the water and makes the steam that turns the engine, is defined as the probability that data at least as extreme as those observed would occur by chance alone under a given null hypothesis; used as a cutoff, it becomes the risk of incorrectly rejecting that null hypothesis in favor of the alternate hypothesis. In an experiment, the lower the p-value (conventionally below 0.05), the better the chance that the result is not a fluke. And the commonly quoted levels of significance are 10% (0.1), 5% (0.05), 1% (0.01), 0.5% (0.005), and 0.1% (0.001). What? What did I just say? “FLUKE?” Well, actually, to be quite truthful, the “p” in “p-value” stands for none other than “probability!” Yes, you read that right, “PROBABILITY.” So a p-value of 0.05 means there is a 5% chance that a difference of the expressed size would appear even if there were no real difference at all, in other words, a “FLUKE.” For a simple two-sided test on a z statistic, the “p” is determined by the formula p = 2 × (1 − Φ(|z|)), where Φ is the standard normal cumulative distribution function and z is the test statistic computed from the sample.
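For the curious, here is a minimal sketch of that arithmetic in Python, assuming a simple two-sided z-test; the numbers are invented purely for illustration, not taken from any study.

```python
# Two-sided p-value from a z statistic: p = 2 * (1 - Phi(|z|)),
# where Phi is the standard normal cumulative distribution function.
# Hypothetical numbers, purely for illustration.
from scipy.stats import norm

observed_difference = 2.1   # hypothetical difference between two groups
standard_error = 1.0        # hypothetical standard error of that difference

z = observed_difference / standard_error
p_value = 2 * (1 - norm.cdf(abs(z)))

print(f"z = {z:.2f}, p = {p_value:.3f}")   # about 0.036, under the 5% level
```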



In a coin-toss experiment, each toss yields “heads” or “tails.” The probability of two successive “heads” in two flips is (1/2)^2 = 0.25, and that certainly happens all the time: two “heads” or two “tails” in two flips. Doing ten flips in a row, the probability of all “heads” becomes much smaller, (1/2)^10 ≈ 0.001, which means about a 0.1% chance. That is how probabilities are ascertained through straightforward arithmetic.
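The same coin-toss arithmetic can be checked in a couple of lines; this is a minimal sketch, nothing more.

```python
# Probability of getting all heads in a run of fair coin flips.
p_two_heads = 0.5 ** 2    # two heads in two flips
p_ten_heads = 0.5 ** 10   # ten heads in ten flips

print(p_two_heads)   # 0.25 -> happens all the time
print(p_ten_heads)   # ~0.001 -> roughly a 0.1% chance
```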



Fluke, in this argument about evaluating a new drug against a standard drug, is rendered as such: if the experimental drug shows benefit over the standard drug by a margin whose probability lies outside the realm of chance, then the experimental drug is considered significant in its benefit compared with the standard drug. And that is why, after the clinical trials are over and a lot of money has been spent on the “new great miracle drug,” the reality in the doctor's clinic is, shall we say, different!
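To make the comparison concrete, here is a hedged sketch of how such a two-drug comparison is typically run as a two-sample t-test; the outcome scores below are entirely made up for illustration and are not from any trial.

```python
# Hypothetical comparison of a new drug against a standard drug.
# The outcome scores are invented solely to show the mechanics of the test.
from scipy import stats

standard_drug = [5.1, 6.0, 5.6, 5.8, 6.2, 5.4, 5.9, 6.1]   # e.g., improvement scores
new_drug      = [6.4, 6.9, 6.1, 7.0, 6.6, 6.8, 6.3, 7.2]

t_stat, p_value = stats.ttest_ind(new_drug, standard_drug)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("'Significant' at the 5% level, which still leaves room for a fluke.")
```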



But, and here is the devil in this minor detail: in most medical studies the experts use the cautionary brackets of the Confidence Interval (CI). The Confidence Interval is none other than the limits around the estimate, based on the sample size (n), that express fairly confidently, within about two standard deviations, that the observed benefit of the experimental drug lies within the confines of 95% probability. In other words, the experimental drug's benefits are taken as true, given the probable accuracy of the experiment, 95% of the time, and by extension that possibility fails to hold in the other 5% of the time. Roughly speaking, Confidence = (Signal / Noise) × √(sample size): the stronger the signal, the quieter the noise, and the larger the sample, the tighter the interval. And this is all predicated on the assumption that the model follows the normal distribution, also called the “Bell Curve” or Gaussian curve.
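As a rough sketch of that idea, assuming a roughly normal sampling distribution and using purely hypothetical numbers, a 95% confidence interval for a drug's benefit can be computed like this:

```python
# 95% confidence interval for a difference between two group means,
# assuming the sampling distribution is roughly normal ("Bell Curve").
# All figures are hypothetical.
import math
from scipy.stats import norm

difference = 1.2   # observed benefit of the experimental drug (the signal)
std_dev = 3.0      # spread of the individual measurements (the noise)
n = 100            # patients per group

standard_error = std_dev * math.sqrt(2.0 / n)   # SE of the difference in means
z_crit = norm.ppf(0.975)                        # about 1.96 for 95% confidence

ci_low = difference - z_crit * standard_error
ci_high = difference + z_crit * standard_error
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")

# The Signal/Noise x sqrt(sample size) idea: a bigger signal, quieter noise,
# or more patients all shrink the interval and raise the "confidence."
print((difference / std_dev) * math.sqrt(n))
```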



You might ask, if you are inquisitive, what happens in the 5%? Well, those are the outliers. And these outliers are considered so far outside the dome of certainty that they need not be counted. What?

Yes! It’s true!

The outliers occupy 2.5% on either side of the normal curve (that is, if the curve is not skewed to one side or the other), and the grand old dame called the “Bell Curve” rarely considers those little tails. For you see, if we were to start counting those outliers in our studies, we would have to greatly increase the number of people taking the experimental drug and compare them to an equally enlarged number taking the standard drug. The little (n) would keep swelling toward the big (N), as if we were to test the drug in all of the 7 billion people on the planet.
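A back-of-the-envelope sketch of why that is, using the standard normal-approximation formula for sample size; the effect sizes below are illustrative values only, not from any trial.

```python
# Rough sample-size estimate per group for a two-arm trial:
# n ~ 2 * ((z_alpha/2 + z_beta) / d)^2, where d is the true effect size
# in standard-deviation units. Values chosen only for illustration.
from scipy.stats import norm

alpha, power = 0.05, 0.80
z_alpha = norm.ppf(1 - alpha / 2)   # about 1.96
z_beta = norm.ppf(power)            # about 0.84

for d in (0.8, 0.5, 0.2, 0.05):     # large, medium, small, tiny effects
    n_per_group = 2 * ((z_alpha + z_beta) / d) ** 2
    print(f"effect size {d}: about {round(n_per_group)} patients per group")
```

The smaller the true effect, the faster the little (n) balloons toward the big (N).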



A word on skewness: the two 2.5% tails can be minimal on one side of the bell or the other, or fat on both sides if the outliers amount to more than 5%. These so-called fat tails can out-maneuver and out-distance any fast-running p-value.
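A small sketch of what a fat tail does to the familiar 5%: compare the probability mass beyond ±1.96 for the normal curve and for a heavy-tailed Student's t distribution (3 degrees of freedom, chosen purely for illustration).

```python
# Tail mass beyond +/-1.96 for the normal curve versus a fat-tailed t(3) curve.
from scipy.stats import norm, t

cutoff = 1.96
normal_tails = 2 * (1 - norm.cdf(cutoff))      # the familiar ~5%
fat_tails = 2 * (1 - t.cdf(cutoff, df=3))      # noticeably more than 5%

print(f"normal tails: {normal_tails:.3f}")
print(f"t(3) tails:   {fat_tails:.3f}")
```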

Ultimately, rationalizing ourselves into this form of accuracy is the best defense against any argument. Incidentally, most of the time a smaller “n” shows a flattened curve, or platykurtosis, while a bigger “N” gives us a more tented curve, called leptokurtosis. The difference is that the standard deviations become smaller with a leptokurtic curve, and the tails get thinner and smaller with a bigger “n.”
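A quick simulated sketch of that effect: as the sample size grows, the spread of the sample means shrinks and the curve of the means tightens. Simulated data only, with a made-up underlying population.

```python
# The standard deviation of sample means shrinks like 1/sqrt(n),
# which is why the curve looks "tighter" and the tails thinner with a bigger n.
import numpy as np

rng = np.random.default_rng(0)

for n in (10, 100, 1000):
    means = rng.normal(loc=0.0, scale=1.0, size=(5000, n)).mean(axis=1)
    print(f"n = {n:4d}: spread of sample means ~ {means.std():.3f} "
          f"(theory: {1 / np.sqrt(n):.3f})")
```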



And that, dear reader, is the problem. Since the significant p-value rests, most of the time, on 95% probability, there remains the possibility that the entire experiment is a failure. With the outliers left alone, wearing the dunce hat in the corner, the remaining 5%, or 4%, or 3%, or 2%, or even 1% never disappears; no matter how significant the value, it will never be 100% accurate.
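To see that 5% doing its quiet mischief, here is a simulated sketch in which the “new” drug is, in truth, no better than the standard one; roughly one experiment in twenty still comes out “significant.” Simulated data, for illustration only.

```python
# When the null hypothesis is actually true (no real benefit), roughly 5%
# of experiments will still cross the p < 0.05 line by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
trials = 2000
false_positives = 0

for _ in range(trials):
    standard = rng.normal(0, 1, 50)   # standard drug
    new = rng.normal(0, 1, 50)        # "new" drug, identical in truth
    if stats.ttest_ind(new, standard).pvalue < 0.05:
        false_positives += 1

print(f"false positive rate ~ {false_positives / trials:.3f}")   # close to 0.05
```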

In the end, significance is a probability expression with the caveat that there is a likelihood that the whole experiment is a failure, the drug is a failure, and the author’s scientific treatise is as useless as the 1s and 0s it has accumulated. I am harsh, am I? Not when some experts sit on their elbows, peering over their coke-bottle-lens eyeglasses, hands clasped, spines hunched in ominous fashion like a bird of prey, pronouncing with exactitude the certainty of their findings. And in the future, when proven wrong, they search for other targets within the failure to prove their point. For failure is not an option! The parties must go on, and the Rodin stance must not be trifled with, for therein, in that perfect, thoughtful, intellectually nuanced moment, lies the mystery of what hides behind the real scientist’s eyeglasses.

(Image: art5play)


The ideas that prevail in science are unique, and some experiments do end up earning future love and fanfare by delivering what they promise. But in these heady days, when publish-or-perish has taken on a whole new meaning of selling nonsense as sensible science, of plotting forests of data and curves dressed up with famous names, tables that defy logic, and graphs that would make Excel blush, life is being lived in the fast and furious lane.



So next time, when you say this is “significant,” or you exercise the intellect of the p-value because there are a couple of zeros after the decimal, please understand: it is tailored to the eye of the beholder.
