 to justify knowledge claims in general. By 'far greater lengths', we should add, we are talking especially about issues such as extent of evidence provided in support of a knowledge claim, clarity about methodological issues and evidence of peer review. None of these things can reasonably be expected of articles in online encyclopaedias in sufficient measure. It is important here to focus on the qualities that can reasonably be expected of such sources of knowledge, in order to see whether – on the basis of this quite small sample at least – we were able to collect evidence which, if collected on a far larger scale, would provide definitive judgments about the quality of Wikipedia in its own terms, which is to say, as a leading online encyclopaedia.

Within this small sample, Wikipedia scored well in many key respects, as we have indicated above, and these positive scores were reflected when the findings were considered from the specific perspectives of articles in different languages and in different disciplines. Indeed, as the quantitative results clearly show, it was only with respect to articles in the Arabic encyclopaedias that Wikipedia did not earn markedly higher scores. Compared with those two encyclopaedias, Mawsoah and Arab Encyclopaedia, Wikipedia came out lower on style and more or less the same on the other key criteria of accuracy, references, overall judgment and overall quality score. In all other comparisons, Wikipedia fared somewhat better on references. It also fared better on accuracy, style/readability, overall judgment and overall quality score, with the exception of articles in the Humanities and MPLS (Mathematics, Physics and Life Sciences), where Wikipedia scored no better on these criteria. This was more or less the case with articles in the Social Sciences, with the difference that Wikipedia scored relatively poorly there on style/readability. In Medical Sciences, though, Wikipedia scored well on accuracy, references and overall judgment.

6.2.2 Qualitative Findings

In terms of qualitative analysis, the picture is less easy to summarise. It is, in theory, possible to total the number of positive and negative comments and the overall number of preferences expressed across the full spread of articles in the sample. However, reviewers were generally quite measured in their comments. They sometimes expressed no distinct preference, or highlighted strengths and weaknesses in both articles while marginally preferring one. For some articles the overall preference is too close to call, and for others, where a preference is expressed, it is not a strong one. Additionally, some subjects had four reviewers whereas others had only two, so any overall count of preferences will necessarily be skewed. If a particularly well received article from one publication happened to have more reviewers than a less well received article from the same publication, that publication would make an unrepresentatively strong showing in any rough total of reviewer preferences.

It would, at any rate, be pernicious to attempt to quantify qualitative judgments too precisely. Above all, the analysis of qualitative data aims to capture things that are hard to quantify precisely: feelings, attitudes and opinions of reviewers that are important and illuminating but are often also imprecise and hard to compare.

In comparing 'the accuracy, quality, style, references and judgment of Wikipedia entries as rated by experts to analogous entries from popular online alternative encyclopaedias' through the medium of the qualitative data, we were able to identify a number of issues