
September 16, 2011


Adam Bell

'Consultation' covers so many sins that equating it with groups of people trying to work out who produced a particular painting is somewhat meaningless. To take an example dear to my heart, consultations on planning applications involve asking people for information they have near-direct access to; i.e. the expected impact of the given development on their residential amenity. The information that goes into informing that expectation can be 'led' in the manner that you describe, but given that most consultation responses are via the post it's not clear that that's really a factor.


Of course, "consultation" means many things. The point of this experiment is to show that there are circumstances when it doesn't work - such as eliciting opinion rather than genuine knowledge.


Experimental design fail, as the youth are fond of saying these days.

Firstly, the paintings are clearly in different styles. I could recognise each artist's style without recognising the specific works. And in fact, this is a "demonstrably" correct judgement - the very property the authors were trying to get away from.

That, along with the implication that one work is going to be by Klee and one by Kandinsky, should pull answers from the "one right" category to the "both wrong" and "both right" categories, even for groups with zero exposure to either artist.

It also blows all their p values out of the water, since they are no longer dealing with statistically independent outcomes.

A moment's intelligent thought would have at least produced a design with an entirely different choice for a second work.
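The independence problem can be seen with a quick simulation (a sketch using only the Python standard library; the "one of each" guessing strategy is my stylised assumption about how subjects might reason, not something taken from the paper). If subjects assume exactly one painting per artist, the "exactly one right" outcome vanishes, so a binomial model of two independent 50/50 answers is simply the wrong null distribution:

```python
import random

random.seed(0)

def guess_independently():
    # Two unrelated coin flips, as an independent-trials p-value assumes.
    return (random.random() < 0.5, random.random() < 0.5)

def guess_assuming_one_each():
    # A subject who believes exactly one painting is by each artist
    # picks an assignment at random: both answers right, or both wrong.
    both_right = random.random() < 0.5
    return (both_right, both_right)

def tally(strategy, n=100_000):
    # Empirical distribution of the number of correct answers (0, 1 or 2).
    counts = {0: 0, 1: 0, 2: 0}
    for _ in range(n):
        a, b = strategy()
        counts[a + b] += 1
    return {k: round(v / n, 2) for k, v in counts.items()}

print(tally(guess_independently))      # roughly {0: 0.25, 1: 0.5, 2: 0.25}
print(tally(guess_assuming_one_each))  # roughly {0: 0.5, 1: 0.0, 2: 0.5}
```

Under the dependent strategy the counts pile up at "both wrong" and "both right", exactly the shift predicted above.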


Second major fail is that, as they discuss, the answers of people in the "team" treatments were not statistically independent of each other. This does not matter for assessing the p values of the team treatment influencing accuracy (ignoring my first point). However, it remains to be shown (and I can't be bothered to calculate) whether the tendency to wrong answers was statistically significant when analysed in team units. (There was only 1/6 the "n" for teams as for the individuals within them.)

In other words, although the team-treated individuals had a statistically significant tendency to underperformance relative to the individually treated, all it shows is that they had a significant tendency to align with team performance.

Whether team performance itself was sub-par with statistical significance has not been shown.
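A rough simulation of this clustering problem (team size, team count and repetition count are made-up numbers for illustration, not the paper's): suppose the null hypothesis is true, each team's single answer is a fair coin flip, and every member simply copies their team's answer. A naive binomial test on the individual-level counts then rejects the null far more often than its nominal 5%, while the honest team-level test does not:

```python
import math
import random

random.seed(1)

def binom_two_sided_p(k, n, p=0.5):
    # Exact two-sided binomial p-value: sum the probabilities of all
    # outcomes no more likely than the observed one.
    probs = [math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    return sum(q for q in probs if q <= probs[k] + 1e-12)

def rejection_rates(n_teams=20, team_size=3, alpha=0.05, reps=2000):
    naive = team_level = 0
    for _ in range(reps):
        k_teams = sum(random.random() < 0.5 for _ in range(n_teams))
        k_indiv = k_teams * team_size  # every member copies the team answer
        if binom_two_sided_p(k_indiv, n_teams * team_size) < alpha:
            naive += 1       # treats the copies as independent answers
        if binom_two_sided_p(k_teams, n_teams) < alpha:
            team_level += 1  # one observation per team
    return naive / reps, team_level / reps

naive_rate, team_rate = rejection_rates()
print(naive_rate, team_rate)  # naive rate well above 0.05; team-level rate near 0.05
```

The false-positive rate of the naive test is inflated several-fold, which is the sense in which individual-level p values here are not to be trusted.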

Made-up analogy for this point: England supporters tended to bet the wrong way in the crucial match with Argentina, compared to indifferent bettors. Therefore team supporters perform worse at match prediction (p<0.00000001). But no. Although the individual performances of this group were of course not statistically independent (look at my p value!), the performance of the group may have been random, or even good over the long run. Too few trials (only one for the whole group) to tell.

They should address this before speculating about overconfident ignoramuses (who, after all, should perform randomly, not worse than randomly).

(Reminds me of the tale told to me by a psychology professor, who had a brilliant medical student who hated the psychology course and protested by scoring negative 100% on the true/false multiple-choice exam.)
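The protest only works because being reliably wrong requires perfect knowledge: to score 0% on a true/false exam you must know every answer and invert it, whereas genuine ignorance looks random, scoring about 50%. A toy illustration (hypothetical answer key, standard library only):

```python
import random

random.seed(2)

def score(answers, truth):
    # Fraction of questions answered correctly.
    return sum(a == t for a, t in zip(answers, truth)) / len(truth)

truth = [random.random() < 0.5 for _ in range(10_000)]  # hypothetical answer key
protester = [not t for t in truth]                      # knows every answer, inverts it
guesser = [random.random() < 0.5 for _ in truth]        # genuine ignorance

print(score(protester, truth))          # 0.0: being reliably wrong takes full knowledge
print(round(score(guesser, truth), 2))  # about 0.5: ignorance performs randomly
```

This is why a finding that some group does significantly *worse* than chance should raise eyebrows rather than confirm a story about ignorance.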

To conclude, sorry about the snarky tone, but I am vaguely irritated by a poorly thought-out experiment. It's not even my job to know about this. It's their job, though, isn't it? They actually get paid for it. They've wasted money, and at least 342 people's time.


Chris, you have obviously never sat on a jury; I have, several times. The idea that you can lead 12 people off the street in a direction they do not wish (in a criminal trial) to follow is mere ignorance. When the liberty of the subject is at issue, my experience is that jurors take their duty very seriously indeed. John Lilburne was essentially correct: better a jury than a judge. I would not try to be overconfident or bossy in the jury room; if you ever serve, they will mince you.


Of course, one of the reasons to consult isn't to make a better decision in some Platonic sense. It's so that the opinions of other people are represented.

It may well be the case that one less ratgasm overall is worth some subgroup not getting screwed.


I have let this stew for a week now and it remains a crap experiment, not a "neat" one.

You can't have two trials that are statistically dependent on one another and then go around analysing results as if they were two coin tosses.

Is this one Coke or Pepsi? How about the other one, is that Coke or Pepsi? Weird! I would expect half of you to only get one right!

I am alleging fundamental professional incompetence here. I would welcome someone to show me how I'm wrong.

