Wednesday, October 16, 2013

Data Pet Peeve & Words To Live By

One of FLG's biggest frustrations with the world is when people with high-level decision-making authority (think corporate executives or government policy makers) believe that any data, even bad data, is better than no data. He is doubly frustrated, and to be completely honest becomes damn near apoplectic, when they get blinded by statistical techniques. Look, if you run a regression against a data set that either 1) doesn't actually measure what you care about or 2) is just a generally shitty data set all around, then the results of that regression are at least equally suspect. Let's try not to lock onto those parameter estimates as precise and accurate when making decisions. Garbage in, garbage out, as they say.
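A small simulation, which is not from the original post, can make the "precisely wrong" point concrete. It uses a hypothetical setup: the quantity we actually care about, `x`, drives an outcome `y` with a true slope of 2. The decision maker never sees `x`, only a cheap proxy equal to `x` plus measurement noise. Regressing `y` on the proxy gives a slope pulled toward zero (classical attenuation bias), yet its standard error is about as small as the one from the good data, so the wrong number looks just as precise. All names and parameter values here are illustrative assumptions.

```python
import random
import statistics


def ols_slope(x, y):
    """Ordinary least squares slope of y on x, plus its standard error."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = (resid_ss / (n - 2) / sxx) ** 0.5
    return slope, se


def simulate(n=10_000, true_slope=2.0, proxy_noise=1.0, seed=42):
    """Hypothetical example: compare regressing on the real measure vs. a noisy proxy."""
    rng = random.Random(seed)
    # The thing we actually care about (assumed standard normal).
    x = [rng.gauss(0, 1) for _ in range(n)]
    # Outcome driven by x, plus ordinary noise.
    y = [true_slope * xi + rng.gauss(0, 1) for xi in x]
    # The cheap, easy-to-gather proxy: x measured with error.
    proxy = [xi + rng.gauss(0, proxy_noise) for xi in x]
    return ols_slope(x, y), ols_slope(proxy, y)


(good, good_se), (bad, bad_se) = simulate()
print(f"true measure: slope {good:.2f} (SE {good_se:.3f})")
print(f"cheap proxy:  slope {bad:.2f} (SE {bad_se:.3f})")
```

With `proxy_noise=1.0`, the noise has the same variance as `x`, so the proxy slope should come out near half the true value, around 1.0 instead of 2.0. Its standard error stays around a hundredth. Taken at face value, the tight standard error invites confidence in an estimate that is off by a factor of two.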

FLG also has a long-standing rule of thumb that the more meaningful the data, the more costly it is to gather. Consequently, people often end up using less meaningful but easier-to-gather data for the sake of using data, because any data is better than no data. To be fair, even bad or imperfect data can tell you something. But there is something about people, especially decision makers, who are otherwise intelligent, that makes them completely lose sight of the inherent weakness of the underlying data and assume a false precision in the resulting analysis, as if the particular number were somehow magic.

Anyway, thinking of this rule of thumb led FLG to start listing some broad guidelines he tries to live by, many of them cribbed from famous quotes. He has even added a couple from The Black Swan, which express rules he's held for a while, but more elegantly:
Better to be broadly right than precisely wrong.
Never run for trains.
Always wear a pocket square.
Always be polite to waiters / waitresses.
Always say 'Please' and 'Thank You.'
Hold the door for people.
If you can't spot the sucker in the first half hour at the table, then you ARE the sucker.
Better to keep your mouth shut and be thought a fool than to open it and remove all doubt.
Try never to hurt anyone's feelings unintentionally.
If you don't make yourself laugh, then who will?

He's sure there are more, but that's it for now...


Andrew Stevens said...

FLG also has a long-standing rule of thumb that the more meaningful the data the more costly it is to gather.

This seems false to me. A lot of valuable data is easy to collect, and it's not hard to think of useless data that would be very costly to gather. I think you're thinking of specific situations where you know you could get better data by spending more money, but have to decide whether to do so or go with the less good but cheaper data. Certainly this situation comes up (often), but there are also plenty of times when the best data you can get is very cheap. There are also plenty of times when we could, if we wished, spend a lot of money to gather useless data. We generally don't, since that decision would be obviously irrational, but those situations exist as well.

FLG said...

I dismissed the costly useless data as irrational.

I think that if there is a systematic explanation, though I'd have to think about how to define it more precisely, it's that there is a declining marginal utility/meaning of additional values of the same metric.

This work is licensed under a Creative Commons Attribution-No Derivative Works 3.0 United States License.