
Wiki Workshop’20, April 2020, Taipei, Taiwan

the proportion of sentences needing citations that already have a citation in the text.

$$Q = \frac{1}{p} \sum_{i \in P} c_i$$

where 𝑝 is the number of sentences needing citations in a given article, i.e. sentences with label 𝑦 = 1; 𝑃 is the set of these 𝑝 sentences according to the Citation Need model; and 𝑐𝑖 indicates whether sentence 𝑖 has an inline citation in the original text (𝑐𝑖 = 1) or not (𝑐𝑖 = 0). When 𝑄 = 0, citation quality is very low: none of the sentences that the model classifies as needing citations actually has a citation in the original text. When 𝑄 = 1, every such sentence is already cited.
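The score can be sketched in a few lines of Python. This is a minimal illustration, assuming each article is given as two parallel lists: the Citation Need label 𝑦 for every sentence and the indicator 𝑐 of an existing inline citation. The function name `citation_quality` and its input layout are hypothetical, not taken from the Citation Detective code.

```python
from typing import Optional, Sequence


def citation_quality(needs_citation: Sequence[int], has_citation: Sequence[int]) -> Optional[float]:
    """Compute Q = (1/p) * sum_{i in P} c_i for one article.

    needs_citation[i] is the model label y for sentence i (1 = needs a citation).
    has_citation[i] is c_i (1 = sentence i has an inline citation in the original text).
    Returns None when no sentence needs a citation (p = 0), since Q is undefined then.
    """
    if len(needs_citation) != len(has_citation):
        raise ValueError("both sequences must have one entry per sentence")
    # P: indices of sentences the model classifies as needing citations (y = 1)
    needing = [i for i, y in enumerate(needs_citation) if y == 1]
    p = len(needing)
    if p == 0:
        return None
    # Sum c_i over P and normalize by p
    return sum(has_citation[i] for i in needing) / p


# Four sentences need citations (y = 1); three of them already have one.
labels = [1, 0, 1, 1, 0, 1]
cites = [1, 1, 0, 1, 0, 1]
print(citation_quality(labels, cites))  # 0.75
```

Note that sentence 1 in the example has a citation but is not counted, because the model does not flag it as needing one: 𝑄 only measures coverage of the flagged sentences.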

4.2 Results
Here we report summary results of the citation quality analysis, broken down by article characteristics.

4.2.1 Citation Quality Score VS Manual Reference Quality Annotations.
To validate the accuracy of the citation quality score, we compute the average citation quality for articles that editors have tagged as "Missing Sources" (our ground truth) and compare it with the average 𝑄 for all other articles. The average citation quality score across all articles is 0.66: on average, 66% of the sentences in an article that the model classifies as needing citations already have an inline citation in the original text. This percentage drops for articles tagged as "Missing Sources", whose average 𝑄 is 0.49, showing that the citation quality score can correctly expose articles that require more attention because of low-quality references.
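The validation step amounts to comparing mean scores across groups of articles. A minimal sketch, assuming each article is represented as a hypothetical `(q, missing_sources)` pair, where `missing_sources` records whether editors tagged the article as "Missing Sources" and articles with undefined 𝑄 have already been dropped. The numbers below are made-up toy values for illustration, not the paper's data.

```python
from statistics import mean
from typing import Dict, Iterable, Tuple


def compare_missing_sources(articles: Iterable[Tuple[float, bool]]) -> Dict[str, float]:
    """Average Q overall, for articles tagged "Missing Sources", and for the rest.

    Each item is (q, missing_sources). statistics.mean raises StatisticsError
    if a group turns out to be empty.
    """
    rows = list(articles)
    flagged = [q for q, missing in rows if missing]
    others = [q for q, missing in rows if not missing]
    return {
        "all": mean(q for q, _ in rows),
        "missing_sources": mean(flagged),
        "others": mean(others),
    }


# Toy values only; a lower "missing_sources" mean is what the validation looks for.
toy = [(0.9, False), (0.6, False), (0.8, False), (0.3, True), (0.5, True)]
print(compare_missing_sources(toy))
```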

4.2.2 Citation Quality Score VS Article Quality and Popularity.
To further investigate the accuracy of the citation quality score, we correlate, for each article, the citation quality score 𝑄 with the article quality score previously computed through ORES. We observe a positive and statistically significant Pearson correlation between these two quantities (𝜌 = 0.34, p-value < 0.05). We also compute the correlation between citation quality and article popularity, finding a weaker but still significant correlation of 𝜌 = 0.09. This positive correlation is likely due to the fact that popular articles also tend to be of high quality (article quality and popularity are themselves significantly correlated, with 𝜌 = 0.14).
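For reference, Pearson's 𝜌 between per-article values such as 𝑄 and the ORES quality score can be computed directly from its definition, 𝜌 = cov(𝑥, 𝑦) / (σ𝑥 σ𝑦). This is a minimal, dependency-free sketch with hypothetical variable names and toy inputs; it returns only the coefficient. The significance test behind the reported p-value needs an additional step, for example `scipy.stats.pearsonr`, which returns both the coefficient and a two-sided p-value.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    # Covariance and variances without the 1/(n-1) factor, which cancels in the ratio
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-article citation quality scores and ORES quality scores
q_scores = [0.2, 0.5, 0.4, 0.9, 0.7]
ores_quality = [1.0, 2.5, 3.0, 4.5, 3.5]
print(round(pearson(q_scores, ores_quality), 3))
```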

4.2.3 Breakdown of Citation Quality by Topic.
Finally, we break down citation quality by article topic. We compute the average citation quality for all articles belonging to a given topic and report the results in Figure 1. The best-sourced articles (𝑄 > 0.85) belong to the Medicine and Biology topics. "Language and Literature", the topic category hosting most biographies, also ranks among the best-sourced topics. Articles in Mathematics and Physics, by contrast, tend to have low citation quality scores. This is probably because these articles contain few inline citations, as the support for their scientific claims lies in the formulas and equations that follow, and they tend to cite few references overall.

'''Figure 1: Average article citation quality score by article topic. The x-axis corresponds to the average 𝑄 for all articles in a given topic, and the y-axis to the number of articles in that topic in the sample drawn for the analysis.'''
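The per-topic aggregation plotted in Figure 1 (average 𝑄 and article count per topic) can be sketched as follows. This assumes, as an illustration only, that each article carries a hypothetical list of topic labels and may belong to several topics at once; the function name `topic_breakdown` and the toy scores are not from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def topic_breakdown(articles: Iterable[Tuple[float, List[str]]]) -> Dict[str, Tuple[float, int]]:
    """Map each topic to (average Q, number of articles).

    Each article is (q, topics). Under the multi-topic assumption an article
    counts toward every topic it is assigned to, so the per-topic counts can
    sum to more than the sample size.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for q, topics in articles:
        for topic in set(topics):  # ignore duplicate labels on one article
            totals[topic] += q
            counts[topic] += 1
    return {t: (totals[t] / counts[t], counts[t]) for t in counts}


sample = [
    (0.9, ["Medicine", "Biology"]),
    (0.8, ["Medicine"]),
    (0.3, ["Mathematics"]),
    (0.5, ["Physics", "Mathematics"]),
]
# Sort topics from best to worst sourced, as in a ranked chart
for topic, (avg_q, n) in sorted(topic_breakdown(sample).items(), key=lambda kv: -kv[1][0]):
    print(f"{topic:12s} avg Q = {avg_q:.2f}  (n = {n})")
```

If each article instead had exactly one topic, the same function works with single-element topic lists, and the counts would then sum to the sample size.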

5 CONCLUSIONS
We presented a framework to analyze, monitor and improve citation quality at scale. We designed Citation Detective, a system that applies Citation Need models to a large number of articles in English Wikipedia and periodically releases data dumps exposing unsourced sentences in Wikipedia. To illustrate potential applications of the Citation Detective data, we provided a large-scale analysis of citation quality in Wikipedia, showing that citation quality is positively correlated with article quality and that articles in Medicine and Biology are the best sourced in English Wikipedia.

This analysis provides an initial overview of the potential applications of Citation Detective and offers only a limited view of the overall picture, both within English Wikipedia and across the other (nearly 300) language editions of Wikipedia. Future work could broaden this dataset to include a higher percentage (or even all) of English Wikipedia's content. We may also consider selecting articles non-randomly, for example ensuring that the dataset contains all highly viewed or high-quality articles. Additionally, the Citation Need model is capable of analyzing other language projects, for which additional datasets could be made available.

Due to technical limitations, the data is presently available only within the Toolforge environment. In future work we aim to make the database more accessible, for example through the Quarry database querying service.