Andrew Bernardin at 7:33 am under science

Groan. I must be intellectually fatigued this morning. So I’ll turn the critical thinking over to you. Anything about this announcement of a new science finding strike you as . . . semi-lame or worse?

Title -

First Concrete Evidence That Women Are Better Multitaskers Than Men

Lead paragraph -

Professor Keith Laws at the University’s School of Psychology looked at multitasking in 50 male and 50 female undergraduates and found that although the sexes performed equally when they multitasked on simple maths and map reading tasks, women far excelled men when it came to planning how to search for a lost key, with 70 per cent of women performing better than their average male counterparts.

There you go. Have at it.

[Update/analysis below the fold.]

Okay.  I’m feeling more energized now.  Seeing I’ve got at least one commenter interested in my analysis, I will share it.

There are two reasons why I find the claim “First Concrete Evidence That Women Are Better Multitaskers Than Men” to be semi-lame (meaning it lacks solid legs to stand on):

1. Shot-gun variables.

This was no rifled bulls-eye.  As far as I can tell there were at least 4 variables measured: simple math problems, a map reading task, answering telephone calls about general knowledge, and strategizing to search for a lost key.   As the lead paragraph mentioned, somewhat in passing, women surpassed men in one of the tasks: “planning how to search for a lost key.”  The claim, however, is a general one.  Women are better than men at multi-tasking.  Yet when multi-tasking they did not out-perform the men in the math and map-reading tasks.  The news-release doesn’t mention whether or not they outperformed the men on the telephone task.  My guess is they didn’t, seeing it wasn’t included.

That strikes me as one hit out of a possible four.  The one hit may be something, but it falls short of justifying a global claim.  In my opinion.

2. Statistical cross-comparison.

The lead paragraph states that 70% of women performed “better than their average male counterparts.”  While that appears impressive on first blush, it actually isn’t.  First, because the sample size was 50 females, “70% of females” means something like 35 out of 50.  Wouldn’t 25 be chance (roughly)?  So that’s 10 subjects better than chance.  Hmm.

Another reason why the 70% figure doesn’t impress me is that it is possible that 70% of the males also outperformed the average female!  How is it possible?  Say 30% of both males and females scored really low.  Then it would be possible for 70% of females to score higher than the average male, while 70% of males scored higher than the average female.  Distribution curves come in many shapes.  And statistics can confound as much as they elucidate.

Finally, I’d like to know how the average female score for all tasks compared to the average male score.  That would be more telling.  And, if telling, would better support the headline claim.

18 Comments to “Spot the Flaw: When a Bull’s-Eye Isn’t”

  1. Dear Andrew
    rather than groan and complain that you are too tired – think of something useful and/or critical to say about the article. If it’s ‘semi-lame or worse’, then you shouldn’t find it too difficult …should you?

  2. Andrew Bernardin
    July 23rd, 2010 at 8:38 am

    Keith -
    I’m guessing this is your first visit to my blog. Are you the Keith of the study?
    In my “Spot the Flaw” series of posts I intentionally invite readers to comment. How inviting would it be if I did the thinking for them?
    If none rise to the occasion, and highlight weaknesses to some bit of research and/or the announcement of it, I certainly will.

  3. Dear Andrew
    Yes. It is fine to invite people …and I have accepted the invitation. I therefore eagerly await your scientifically rigorous justification for the strong negative tone of this blog….(to which I will, of course, try to reply)
    regards
    Keith

  4. Andrew Bernardin
    July 23rd, 2010 at 12:07 pm

    Keith-
    I’ve updated the post to include my analysis. If I’ve gotten anything wrong, please inform me.
    Andrew

  5. 1. This was no rifled bulls-eye.

    Obviously several tasks are required, but only a sufficient number to make any sense. I think the point here is a spurious argument!
    For example a) if we had had 2 tasks and revealed a difference on one task, then you would have said 50% hit (not great etc)
    b) if we had had 10 tasks and revealed differences on one you would say – ah only one in ten! It sounds as though – for you – being ‘convinced’ equates to always finding an advantage for women i.e. hitting every target. This is an absurd position.
    Moreover, it is not how science works – science is exploratory and evolves i.e. hypothesis testing. Because no work had been previously conducted in this area (as stated in the article), there was no existing starting point for the research (i.e. no useful empirical knowledge base). Nonetheless, given the intuitions of the general public concerning male and female abilities, it seems logical to start with maths, map reading, key finding and answering a telephone – the first two or three might even be expected to bias in favour of men (especially given spatial skills argument etc) – so, even if only one in four (or 3 – see below) turns out, it is interesting.

    Re telephone task – it is really one in three hits anyway (because the telephone was an optional task – used to examine whether men and women differed in willingness to take an additional task) i.e. not everyone took the phone call (there were no differences in numbers who did or how well they performed on the telephone questions answered)

    2. Statistical cross-comparison.

    a) main point – Given the area in which you write, I am surprised to see that your understanding of probability is somewhat flawed here.
    first, there are three possible outcomes (male advantage, female advantage and no difference). The standard statistical probability level of .05 was employed; this suggests 5 in 100 for a male advantage and 5 in 100 for a female advantage – 90 not different. So, getting 70% of women (significantly) outperforming the men is impressive i.e. 35 women! By chance, it would be estimated at 2.5 women

    b) minor points
    Another reason why the 70% figure doesn’t impress me is because it is possible that 70% of the males also outperformed the average female! …
    I simply don’t follow your reasoning here – you would need to clarify what you are saying – I would say, how is it possible for 70% of males to outperform the average female, when the reverse has been shown to be true – the numbers just do not make sense in your argument

    Finally, I’d like to know how the average female score for all tasks compared to the average male score. That would be more telling. And, if telling, would better support the headline claim.
    If you were to make an aggregate score, then the females would do better – but this aggregate is meaningless – it would simply reflect the difference on the one task anyway (i.e. added to the non differences on the others) – how would that be more impressive or meaningful?

  6. p.s. I would also like it if you could clarify what you mean by ‘semi-lame’ and especially what you are implying with the phrase “…or worse” – given that your website hosts references to ‘libel laws’ in science, I would really appreciate it if you would clarify what you mean by these terms (in print please)

  7. Andrew Bernardin
    July 23rd, 2010 at 9:41 pm

    Keith,
    Consider this hypothetical: You give 10 males and 10 females an exam. The resulting scores are -
    FEMALES
    3 score a 65
    4 score a 90
    3 score a perfect 100
    Average female score = 85.5
    MALES
    3 score a 70
    4 score a 90
    3 score a perfect 100
    Average male score = 87.0
    In the above case, 7 of 10 females (70%) scored better than the “average male.” And guess what, 7 out of 10 of the males (70%) scored better than the average female. And even though 70% of the females scored higher than the average male, the average male scored higher than the average female. (Double-check the numbers if you desire. And perhaps take a moment to wipe the egg off your face.) So yes, I would have liked to see the combined (total for all 4 measures), average scores compared.
    Yes, I understand science and probability. That is why 10 extra, above-average females out of 50 doesn’t impress me much. It’s the 50 that hurts the degree of confidence.
    Also, your mathematical breakdown strikes me as simplistic. It seems to me your calculation should have additionally included the size of the difference besides the number of individuals showing the difference.
    Over a decade ago there was a study that found an association between breast cancer and Paxil use. However, the number of subjects in the study was low: only 900 women, with a mere 10 taking the medication. A follow-up study of nearly 10,000 women, hundreds of them taking Paxil, showed no such link.
    When the number of subjects (data points) is low, confidence, too, should be low(er).
    As for wondering what I meant by “semi-lame” it is this: a claim not strongly supported by the evidence/data and/or one that relies upon poor quality evidence/data. “Or worse” means “really lame” and/or perhaps outright bogus.
    My major qualm with the study is that the claim was very broad: Women are better multi-taskers than men. Period. Not “Women may be better multi-taskers than men.” Not “Women are better at strategizing when in a multi-tasking situation than men.” Both of those claims would have escaped my criticism. Good science relies upon precision. In the conducting of it; in the communication of it.
    Again, women performed better than men in only one of the tasks, with a limited number of subjects (to be expected in this type of study, but still to be considered in terms of confidence in results), thus a more cautious and/or specific claim should have been made. In my opinion.
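
The hypothetical exam in this comment is easy to double-check. The sketch below uses only the made-up scores listed above (they are not data from the study) and counts how many members of each group beat the other group's mean; the variable names are illustrative.

```python
from statistics import mean

# Hypothetical exam scores from the comment above (not data from the study).
females = [65] * 3 + [90] * 4 + [100] * 3
males = [70] * 3 + [90] * 4 + [100] * 3

female_mean = mean(females)
male_mean = mean(males)

# Share of each group scoring strictly above the other group's mean.
females_above_male_mean = sum(s > male_mean for s in females) / len(females)
males_above_female_mean = sum(s > female_mean for s in males) / len(males)

print(f"female mean = {female_mean}, male mean = {male_mean}")
print(f"females above male mean: {females_above_male_mean:.0%}")
print(f"males above female mean: {males_above_female_mean:.0%}")
```

With skewed scores like these, both shares come out at 70% even though the male mean is the higher of the two, which is the point the hypothetical was built to make.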

  8. at least one flaw in the study i see is that (according to the article) there is no comparison with how the participants do at these tasks alone. if, for example, women are better at finding lost keys than men, i would expect them to also be better at multitasking when one of the tasks is something they’re better at doing than men.

    there could be several reasons why women are better at finding their lost keys than men. For example, they lose their keys more often than men. i’m just conjecturing, but this may be due to the fact that men (conjecturally) put their keys in their pockets, and women (conjecturally) put their keys in their purses. from my experience, it’s easier to find things in a small pocket than a medium to large purse. this line of reasoning can continue. however, since there is no comparison, it’s just a thought experiment based on nothing but biases. plus, i can’t find the journal article, and so, i can’t see whether this is a valid argument based on the article or complete crap. it would be nice to see how multitasking competence corresponds to the competence of the individual skills.

  9. Andrew –
    1) 70% of women perform better than the average man. The example you give is incorrect because the overall means you present would not differ appropriately i.e. 85.5 vs 87. Perhaps you would understand it better in the following terms – the average of one group (males) would be at the 50 percentile and the average of the other group (females) would be at the 70 percentile
    2) “Also, your mathematical breakdown strikes me as simplistic…” Not sure I understand what you are saying here
    3) the lame or worse comment is disingenuous, without basis and insulting – can you please say if you are proposing one or both of the statements i.e. be quite clear and say if the work
    “relies upon poor quality evidence/data. “Or worse” means “really lame” and/or perhaps outright bogus”
    - please also expand for me on what precisely you mean by bogus and if it applies in this case – just so we can see exactly what your claim is here
    4) the statement ‘Women are better multi-taskers than men’ is correct – there is not one single study that shows men are better multitaskers than women – all of the evidence shows the reverse to be true – period! (as you say)
    5) it is not 10 above average females (as you say) – I thought I had clarified that – probability suggests 2.5 women will outperform the men (i.e. significantly by chance) – the study reports 35/50 women did so.
    6) the number of 50 male and 50 female participants – again, suggests a misunderstanding of statistics and more importantly, statistical power – the number is irrelevant here. It does not matter if one tests 50 or 1000. If a finding has substantial ‘statistical power’ (as here) to separate the groups, then it is large and robust – even in relatively small samples – this effect would have been visible and significant with even fewer than 50 and 50. In statistical power terms, the sample size of 50 and 50 simply testifies to the power of the effect reported. Conversely, if the effect was only visible in extremely large samples e.g. 1000s, then it would be less impressive.

    Adam -
    your point is the first relevant one I have seen.
    i.e. there is no comparison with how the participants do at these tasks alone.
    Nonetheless, all of the tests used are standardised tests with (large) normative databases and reveal no sex differences (and furthermore, no study has ever reported sex differences on these tests) – they were chosen for that very reason i.e. they reveal no sex difference under normal circumstances.
    Again, it is reasonable to speculate that if women were better at the key finding task under normal circumstances, then they would be better under multitasking – but this is not the case!
    It is also necessary to be careful because all other outcomes are possible e.g. healthy women tend to be better at language and semantic tasks than men; however women with Alzheimer’s disease perform worse than men
    The reasons why people lose their keys are not really relevant – the task was hypothetical and it could have been losing anything in the field.

  10. p.s. on reflection, the following may be a better way to think about the 70% statement – an ‘average’ man would have to move from the 50th percentile to the 70th percentile just to reach performance of the ‘average’ female
    Bear in mind that these are means
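
One way to put the percentile framing on a common scale is to convert it to a standardized difference between the group means. The sketch below assumes, purely for illustration, that key-search scores in both groups are normally distributed with the same standard deviation; nothing in the thread states this, and the hypothetical exam earlier in the comments shows how skewed scores can break it.

```python
from statistics import NormalDist

# Assumption (not stated in the thread): scores in both groups are normal
# with equal standard deviations.
share_of_women_above_male_mean = 0.70

# Under that assumption, the gap between the group means, measured in
# standard deviations, is the 70th-percentile z-score of a standard normal.
d = NormalDist().inv_cdf(share_of_women_above_male_mean)

print(f"implied standardized difference: {d:.2f} SD")
```

Under those assumptions the 70% figure corresponds to a gap of roughly half a standard deviation between the means.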

  11. Andrew Bernardin
    July 24th, 2010 at 11:37 am

    Adam -
    Good questions. And skepticism is all about asking questions.
    One of the questions I have about the general topic is whether there are occupations that heavily depend upon multi-tasking abilities. Short-order cook and flight traffic controller come to mind. I wonder if there is any measurable difference in how men and women perform in these fields. Of course, we would likely find bias in who gets hired/promoted, and there may even be a gender difference in interest in these occupations in the first place. But something to think about.

  12. Andrew Bernardin
    July 24th, 2010 at 12:35 pm

    Keith:

    Three general points before I get more specific:

    1) You are aware what a blog is, right? Blogs and blog posts are generally much more informal than other media. Thus the use of informal language – “lame” (meaning weak), “bogus” (meaning plain bad), etc.

    2) Is this your first published study? You may want to work on developing some thicker skin. Critics and skeptics play a crucial role in scientific discourse and intellectual progress. Sure, we can be a pain in the backside of those making claims, but we serve an important function. As the saying goes, you can learn more from your enemies than you can your friends. Not to say skeptics are enemies. Consider them adversaries.

    3) Have you published a paper on this research? I’d love to read the actual paper. It would help me to better understand and evaluate the actual findings.

    As for specifics:

    1) In my initial evaluation I wrote, “That strikes me as one hit out of a possible four. The one hit may be something, but it falls short of justifying a global claim. In my opinion.”
    Consider this analogy. “Study finds that acupuncture works.” But you learn below that individuals were tested for many health complaints: bad back, allergies, headache, and insomnia. Only on the “bad back” measure did acupuncture show a benefit. Would you say that the general claim was justified? I wouldn’t. Again, it is more scientific to be specific and precise in methods and language.

    2) You wrote: “the statement ‘Women are better multi-taskers than men’ is correct – there is not one single study that shows men are better multitaskers than women – all of the evidence shows the reverse to be true – period! (as you say)”
    Did you even read what I wrote? I did not say men are better multi-taskers. I have no idea whether males or females are better. Or whether there is no significant difference.
    You added, “there is not one single study that shows men are better multi-taskers than women – all of the evidence shows the reverse to be true – period!.”
    Um, have you been drinking? I didn’t say men were better, and you yourself claimed that yours was a first-of-a-kind study. In fact, in a previous comment you wrote, “Because no work had been previously conducted in this area (as stated in the article), there was no existing starting point for the research (i.e. no useful empirical knowledge base).” These statements seem contradictory. Are they not?

    3) You wrote, “It is not 10 above average females (as you say) – I thought I had clarified that – probability suggests 2.5 women will outperform the men (i.e. significantly by chance) – the study reports 35/50 women did so.”
    What I referred to was the ten women who scored above what chance could explain. 35 minus 25 equals 10.

    4) Saying that “probability suggests 2.5 women will outperform the men” is a simplistic and more error-prone approach to evaluating statistical significance. Including degree of difference/outperformance would improve the evaluation.
    A number of years ago I conducted research into religious involvement and crime rates, using national and international demographic information. Besides computing the R-squared, I had to take into account the number of data points (countries or states) to evaluate the statistical significance of my regression analysis. Quantity of data counts. Which leads me to . . .

    5) You also wrote, “It does not matter if one tests 50 or 1000. If a finding has substantial ‘statistical power’ (as here) to separate the groups, then it is large and robust.”
    Are you flipping kidding me? It absolutely matters if one tests 50 or 1000. Why would meta-analyses be so important/widespread if greater numbers didn’t lead to stronger findings, findings we could be more confident in?
    Tests of statistical significance are a guide to whether or not the results are valid, but these tests aren’t foolproof.
    Consider these elements from the Wikipedia entry (which is not a final authority but represents general knowledge) on statistical significance:
    * “One of the more common problems in significance testing is the tendency for multiple comparisons to yield spurious significant differences even where the null hypothesis is true. For instance, in a study of twenty comparisons, using an α-level of 5%, one comparison will likely yield a significant result despite the null hypothesis being true. In these cases p-values are adjusted in order to control either the familywise error rate or the false discovery rate.”
    No you didn’t use twenty comparisons, but the use of four weakens your results relative to a single measure.
    * “Statistical significance can be considered to be the confidence one has in a given result. In a comparison study, it is dependent on the relative difference between the groups compared, the amount of measurement and the noise associated with the measurement.”
    Yes, relative difference, amount of measurement (number of subjects) and noise (scattered results) all play a role in confidence.
    Is all this foreign to you?
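
The multiple-comparisons point in the first quoted bullet can be made concrete. The sketch below computes the chance of at least one false positive across several tests at the .05 level, assuming the tests are independent (an assumption made here for simplicity; the four tasks in the study need not be). The function name is illustrative.

```python
def familywise_error_rate(alpha: float, comparisons: int) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** comparisons


# One test, the four tasks discussed in the thread, and the twenty
# comparisons used in the quoted example.
for m in (1, 4, 20):
    rate = familywise_error_rate(0.05, m)
    print(f"{m:>2} comparisons: {rate:.3f}")
```

With four comparisons the chance of at least one spurious hit is a little under 19%, against 5% for a single test.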

  13. Specifics
    1) 1 in 4 – of course we had to have multiple tasks for it to be multitasking.
    Your example of acupuncture “for bad back, allergies, headache, and insomnia. Only on the “bad back” measure did acupuncture show a benefit” is not really comparable – it would be more comparable if a study of acupuncture for bad back revealed an effect for 1 in 4 outcome measures of back pain.
    But even that would not be comparable because examining multitasking (by the nature of the name) we are obliged to have multiple tasks, while they are not – we are not loading the dice in our favour as the back pain research would be i.e. with 4 measures of back pain. It is now up to others to see if they can replicate or expand (should they so wish). If they fail to replicate then it deserves to go the way of all unsupported hypotheses

    2. No contradiction – ours is the only study – so all of the evidence (i.e. ours) suggests that women are better –

    3. “What I referred to was the ten women who scored above what chance could explain. 35 minus 25 equals 10”
    It is not 10 women – 2.5 by chance – 35 score above chance so it must be 35-2.5 = 32.5/50. I don’t know where you are getting 10 from here?

    4. it is standard practice – what do you mean by “degree of difference/outperformance”? If you specify what you want to know, then I will tell you (if I can)

    5. You are simply wrong about numbers and power and any decent statistician (or psych undergrad) will tell you so –
    Of course, studies may be underpowered because of small sample size (but then they return null findings let’s say with 50 – and obtain significant effects only with let’s say 1000 subjects)
    Meta analyses are important for compounding numbers across studies largely because effect sizes are small in the areas examined. So, an effect which is not present in any single study may produce a significant effect size across all of the studies added together – if you want examples I can point you to my own published meta analyses in many areas where this is the case

    The final remarks about significance are true – but matter not a jot here. We could adjust the alpha level by applying an (extremely conservative) Bonferroni correction, the effect would and does remain significant – so it is not type 1 error

    finally
    Re thick skin etc, I do not like the unpleasant implications you made at the start of the blog – it is irritating, unsupported, and rude; and whatever you say about blogging – it is not where real science occurs. It is a place for opinion and that is fine.
    I am happy to discuss this with you and anyone else. In fact, I have now spent considerable time trying to clarify things for you and anyone else reading.
    If you still think the same, then I suspect there is nothing I can do to change your mind.

  14. Keith,

    i agree that why someone has lost their keys is irrelevant. the point i was trying to make is that if one has more practice at a particular task, one may become more proficient at that task than someone who has no practice at that task. so, for example, if women lose their keys more than men, they would have more experience looking for their keys than men. theoretically, i could say that women do lose their keys more often, but it would be based on nothing but my personal experiences and biases.

    i also find it difficult to believe that there is a standardized test for looking for lost keys administered by schools or governments or companies. the standardized tests that i am familiar with are those that test reading, writing, and arithmetic. of course, these tests can change depending on the administering country. so, this may change from country to country, business to business, and school to school. could you direct me to a link, article, or journal that specifies otherwise?

  15. Adam
    I understand the point, but even if we assume that women lose their keys more often than men (for which we have no evidence of course), it doesn’t necessarily follow that women would be better at finding them – just more practised at losing them.
    Anyway…there are many hundreds of standardised (neurocognitive) tests beyond the reading, writing tests – as a neuropsychologist, I use dozens testing everything from IQ to memory, executive function, object recognition etc. Most tests do not actually change much from country to country – typically, they are devised and normed in the USA or Europe (mostly UK) – so they are devised in the English language and then translated wherever needed and normed in other cultures if thought to be required e.g. if education is longer in the USA than in, for example, Spain.

  16. Andrew Bernardin
    July 25th, 2010 at 11:39 am

    Keith –

    Could you please answer the two following questions? Your answers could certainly convince me that your research claim wasn’t “semi-lame.”

    1. Did you designate the key task as “the measure” of multi-tasking ability ahead of time? If so, bravo, I stand corrected.

    Why is this important? As I have tried to express, the more variables measured, the less impressive the finding when a “hit” results. Consider this coin-flip illustration of the concept: the chance of coming up with “heads” (symbolizing a significant outcome – though, yes, the probabilities are WAY different) is 50%. With the one coin. But flip two coins and the chances of a heads outcome grow to 75%. With three coins – there is an 87.5% chance that, with a single toss of all three, there will be at least one heads in the group!

    Okay, the numbers are different with research, but the concept is the same. And valid. As this quote from statsoft.com relates (it’s a statistical software site arrived at after a quick Google search to refresh my understanding):

    “Needless to say, the more analyses you perform on a data set, the more results will meet “by chance” the conventional significance level. For example, if you calculate correlations between ten variables (i.e., 45 different correlation coefficients), then you should expect to find by chance that about two (i.e., one in every 20) correlation coefficients are significant at the p ≤ .05 level, even if the values of the variables were totally random and those variables do not correlate in the population. Some statistical methods that involve many comparisons and, thus, a good chance for such errors include some “correction” or adjustment for the total number of comparisons.”

    Was the number of variables/measures included in your computation of statistical significance?

    On a similar note, I wonder: If you had included a measure of . . . say, “computer solitaire score” in the group, and this variable “hit,” whether your conclusion would have been the same. What if men scored better at the solitaire task, and women better at another task? Then what? My concern is that your results may be task-dependent. That women are better multi-taskers, but only when one of the tasks involves X.

    Do you see my point?

    2. In the spirit of comparing apples to apples, I would like to know what the average female score for the key task was, as well as the average male score. I think this is important information. It helps put the finding into a clearer numerical context. As I pointed out before, the “70% of females outperformed the male average” is a bit questionable.

    It seems to me that you believe, “If my numbers reached x threshold, then it definitely was a valid finding.” My skeptical attitude is, “If your numbers reached x threshold, it could be a valid finding, but a better determination would include a number of other factors.”

    Another relevant quote from statsoft.com:

    “There is no way to avoid arbitrariness in the final decision as to what level of significance will be treated as really “significant.” That is, the selection of some level of significance, up to which the results will be rejected as invalid, is arbitrary. In practice, the final decision usually depends on whether the outcome was predicted a priori or only found post hoc in the course of many analyses and comparisons performed on the data set.”

    As for my “rudeness” (I tend to see myself as blunt and sometimes barbed): That’s your opinion and you are free to have it. As I am free to have the opinion that the general claim of your research strikes me as semi-lame. And it should be remembered that the validity of an argument relies not on the tone of its delivery.
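
The arithmetic behind the coin-flip illustration and the first statsoft.com quote in this comment can be checked in a few lines. This is a minimal sketch with illustrative names: fair coins are assumed, and the expected count of chance "hits" is simply the number of pairwise correlations times the .05 level.

```python
from math import comb


def p_at_least_one_heads(coins: int) -> float:
    """Chance that a single toss of `coins` fair coins shows at least one heads."""
    return 1 - 0.5 ** coins


for n in (1, 2, 3):
    print(f"{n} coin(s): {p_at_least_one_heads(n):.3f}")

# Ten variables give C(10, 2) distinct pairwise correlations.
pairs = comb(10, 2)
expected_chance_hits = pairs * 0.05

print(f"{pairs} correlations, about {expected_chance_hits:.2f} expected by chance")
```

For three coins the chance of at least one heads comes to 87.5%, and ten variables yield 45 correlations with a little over two expected to cross the .05 line by chance, matching the "about two" in the quote.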

  17. Your statements about the work being ‘semi-lame or worse’ have far less concern for me now than before you started asking your questions – and I have no interest in persuading you.
    I can see that your view reflects at best, misunderstanding or at worst, a lack of basic statistical/methodological knowledge (not that unusual for an ex-psychology lecturer). Please don’t take this as a direct response to your impoliteness…it is not, but I do think you would do better to shy away from blogging on anything to do with stats and research methods (unless you sharpen up your act – not via Wikipedia etc). I think I have spent enough time on this blog now – thanks for your questions – my final answers are below

    Anyway…regarding your points:
    1) as I have already indicated in my earlier response to Adam, we had no reason to predict a difference on any task (because sex differences do not occur on these tasks under normal conditions) – they were chosen for that reason; however, we suspected that the likely place for a multitasking effect would be on a task that requires the frontal cortex i.e. in this case Brodmann’s area 10 – for planning and task switching – hence, the key search task was included because: a) it is known to be sensitive to frontal function, b) it shows no sex difference normally and c) it has relevance to everyday issues i.e. losing keys, searching for lost items (when stressed etc)

    2) your coin example is simply not comparable – you have not grasped the difference between a) males scoring higher ; females scoring higher (these are trivial) and b) males scoring ‘significantly’ higher, females scoring ‘significantly’ higher and finally, the most important – no difference at all (the latter has a 95% chance as outcome) – either of the previous two i.e. significant male or significant female has a 5% outcome

    If we used your probabilities, then we would have to say that ‘all differences’ are important (i.e. because they achieve a 50:50 outcome – most tests would reveal an advantage for males or females) – obviously there are differences on all 4 of our tasks, but only one is significant! The others are random occurrences
    - of course, it just does not make any sense i.e. in your example, ‘any’ advantage for a sex would become important – when they obviously are not – as I have said several times, differences need to be considered to have occurred beyond the accepted standard level of chance i.e. the alpha level of .05

    3) The number of tasks equals the number of comparisons between two groups i.e. 4 – one was significant – as I have already said, if you apply a Bonferroni correction, then the probability becomes .0125 (which is extremely stringent and I guarantee would not be applied by anyone) – however, if we apply it, the significant difference is unaltered (as the actual observed p value was less than .01). But to reiterate, it is again, a non-point – it would be valid in fMRI where comparisons are being made between over 100,000 voxels (we are using 4! Not 100,000) – and even then they do not use this (stringent) correction in fMRI studies – so, I don’t know what more I can say – it is a misconception in this case and makes no difference even if we apply it!

    4) the point about computer solitaire (or whatever) is valid but, again trivial – other researchers could come along and try different tests and find something completely different – as a scientist I have no emotional or intellectual investment in the specific outcome being one way or the other. It is up to others (or me) to now examine the factors that ‘moderate’ the effect – that is how science works – as I said hypothesis testing and rejection a la Popper (within a Kuhnian/Lakatos context) – I don’t care if someone finds something different or the same/similar – it will help us understand and elaborate the context for the effect – science proceeds in that fashion.

    In a related vein, your point about the finding being ‘task dependent’ – again it is a non-point – this is exactly what we have shown and what we have claimed!! What else could we say if no difference emerged on the other tasks?
    As the original article in the UK Telegraph states quite clearly in the subheading “Psychologists have proven that men really are worse at multitasking than women, although it does depend on the task” – a little search would have revealed this fact! http://www.telegraph.co.uk/sci.....n-men.html

    5) simply knowing the means of anything will reveal very little – an understanding of basic statistics would tell you that. Possibly the effect size might be useful

    6) regarding the quote
    “There is no way to avoid arbitrariness in the final decision as to what level of significance will be treated as really “significant.” That is, the selection of some level of significance, up to which the results will be rejected as invalid, is arbitrary. In practice, the final decision usually depends on whether the outcome was predicted a priori or only found post hoc in the course of many analyses and comparisons performed on the data set.”

    This is year 1 undergraduate knowledge – the level of significance is – of course – arbitrary (how could it be otherwise?)
    As it states correctly, ‘In practice the final decision usually depends on whether the outcome was predicted a priori or only found post hoc in the course of many analyses and comparisons performed on the data set’ – the former simply refers to 1 and 2 tailed hypotheses; the latter to multiple comparisons – what is the point you are alluding to exactly? Everyone employs an arbitrary p<.05 level (if you have a two tailed hypothesis then the probability is twice as hard to achieve and correspondingly adjusted; if you make multiple comparisons (note it does not say how many – and most people would certainly not adjust for 3 or 4), then also adjust) – This ‘arbitrariness’ is as true of physics as it is of psychology.

  18. Keith –

    Oops, my bad. I was unfamiliar with the term “Bonferroni correction,” so was unaware that you had taken into consideration the multiple measures drawback when it comes to statistical significance, as you mentioned in an earlier comment. That’s why I asked you about it. Also, due to this correspondence you have persuaded me to better educate myself about statistical terms. And I thank you for that.

    As for one of your errors, you wrote . . . “Your statements about the work being ‘semi-lame or worse’” . . . is incorrect. That’s neither what I wrote nor what I was criticizing. Rather, I was criticizing the claim based upon your work. Those two can be distinctly different things.

    In fact, in my original post I wrote, “Anything about this announcement of a new science finding strike you as . . . semi-lame or worse?”

    Yes, “this announcement.”

    Further down in the post I wrote, “There are two reasons why I find the claim ‘First Concrete Evidence That Women Are Better Multitaskers Than Men’ to be semi-lame (meaning it lacks solid legs to stand on)…”

    Again, “the claim,” not “the science” or “the work.”

    This blog is about encouraging skepticism and critical thinking. It’s not about proving anything right or wrong, but about evaluating claims. I still find the claim that there is now concrete evidence that females are better multi-taskers than men to be semi-lame. In your most recent comment you included this: “As the original article in the UK Telegraph states quite clearly in the subheading ‘Psychologists have proven that men really are worse at multitasking than women, although it does depend on the task’ – a little search would have revealed this fact! http://www.telegraph.co.uk/sci.....n.html.”

    Um, I wasn’t critiquing that “announcement,” but the one found at ScienceDaily. And had the one at ScienceDaily contained the sub-head — “although it does depend on the task” — that would have been a horse of a different color, so to speak.

    Also, you should understand that my blog comments were not written exclusively for you, but for other readers as well. Which is why I included simplistic mathematical illustrations in making some points. So others could grasp the general concept involved.

    I could go on and on refuting points you made in your comments, but they would likely fall upon ears that seem semi-deaf. Yours. So I will put my time to better use.

    Although I certainly reserve the right to take up the topic again in another blog post.
