Indeed, data uniformity and coherence have emerged as one of the big challenges that Wikidata has yet to address. By selecting a flexible, statement-centric data model (inspired by SMW, and in turn by RDF), Wikidata does not enforce a fixed schema on groups of concepts. This is a major departure from the historic Wikidata plan (Section 3), and even from the more flexible (but still template-based) Freebase. Such flexibility has advantages (Freebase, for example, struggled with evolving schemas and unexpected needs), but it also reduces coherence and uniformity across groups of similar concepts, which is an obstacle to re-use.
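To make the trade-off concrete, the following sketch models two items as plain Python structures. The layout is illustrative only (the actual Wikibase JSON format differs), and the entity and property identifiers, while taken from Wikidata, are used here merely as examples. Each statement carries its own property, value, qualifiers, and references; nothing forces two items to share a schema.

```python
# Illustrative sketch of a statement-centric data model (not the actual
# Wikibase JSON format). Identifiers such as Q42 or P69 are examples.

douglas_adams = {
    "id": "Q42",
    "labels": {"en": "Douglas Adams"},
    "statements": [
        {"property": "P31", "value": "Q5",  # instance of: human
         "qualifiers": {}, "references": []},
        {"property": "P69", "value": "Q691283",  # educated at
         "qualifiers": {"P582": "+1974-00-00"},  # qualifier: end time
         "references": []},
    ],
}

# A different item may use an entirely different set of properties;
# no shared schema is imposed on "similar" concepts.
proton = {
    "id": "Q2294",
    "labels": {"en": "proton"},
    "statements": [
        {"property": "P2067", "value": "1.67262e-27 kg",  # mass
         "qualifiers": {}, "references": []},
    ],
}
```

The flexibility is visible in the fact that the two items have no properties in common, which is exactly what makes uniform re-use across similar items hard.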

Ongoing and future developments may help to address this, while also unlocking new uses of the data. Two notable Wikimedia projects currently under development are Wikifunctions and Abstract Wikipedia [69], both led by Vrandečić. These closely related projects have a number of goals. Most prominently, Abstract Wikipedia is working towards extending knowledge representation beyond Wikidata, such that one can abstractly capture the contents and structure of Wikipedia articles. Building on the data and lexicographic knowledge in Wikidata, these abstract representations will then be used to generate encyclopedic content in many more languages, providing a baseline of knowledge in the hundreds of languages of Wikipedia. This will also require significantly more lexicographic knowledge in Wikidata than is currently available (about 1,000,000 Lexemes in 1,000 languages as of February 2023).

Wikifunctions, in turn, is envisioned as a wiki-based repository of executable functions, described in community-curated source code. These functions will in particular be used to access and transform data in Wikidata, in order to generate views on the data. These views (tables, graphs, text) can then be integrated into Wikipedia. This is a return to the goals of the original Phase 3, and it would increase both the incentives to make the data more coherent and the visibility and reach of the data as such. This may in turn improve the correctness and completeness of the data, since only data that is used becomes data that is good (a corollary of Linus's law that "given enough eyeballs, all bugs are shallow" [54]).
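As a rough illustration of such a data-transforming function, the sketch below renders a list of (property, value) pairs as a MediaWiki table. The function name and input format are assumptions for illustration; actual Wikifunctions functions will be defined and curated on the wiki itself.

```python
# Hypothetical sketch of a Wikifunctions-style view function: it turns
# Wikidata-like (property, value) pairs into wikitext that a Wikipedia
# article could embed. Name and data format are illustrative assumptions.

def statements_to_wikitable(label, statements):
    """Render (property, value) pairs as a MediaWiki wikitable."""
    rows = [f"|-\n| {p} || {v}" for p, v in statements]
    return "\n".join(['{| class="wikitable"', f"|+ {label}"] + rows + ["|}"])

table = statements_to_wikitable(
    "Douglas Adams",
    [("occupation", "writer"), ("date of birth", "1952-03-11")],
)
```

A view generated this way stays in sync with the underlying data: when a statement changes in Wikidata, re-running the function updates every article that embeds the view.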

Returning to Wikidata itself, there are also many important tasks and developments still ahead. Editors need continued support to maintain data quality and increase coherence, an everlasting challenge in an open and dynamic system. Together with easier access methods, this should enable more applications, services, and research on top of the data and increase Wikidata’s impact further. In addition, Wikidata still has a long way to go to fully realize its potential as a support system for the Wikimedia projects.

Another aspect of Wikidata that we think needs further development is how to more effectively share semantics—within Wikidata itself, with other Wikimedia projects, and with the world in general. Wikidata is not based on a standard semantics such as OWL [22], although community modeling is strongly inspired by some of the expressive features developed for ontologies. The intended modeling of data is communicated through documentation on wikidata.org, shared SPARQL query patterns, and Entity Schemas in ShEx [52]. Nevertheless, the intention of modeling patterns and individual statements often remains informal, vague, and ambiguous. As Krötzsch argued in his ISWC 2022 keynote [32], a single, fixed semantic model cannot be enough for all uses and perspectives required for Wikidata (or the Web as a whole), yet some sufficiently formal, unambiguous, and declarative way of sharing intended interpretations is still needed. A variety of powerful knowledge representation languages could be used for this purpose, but we still lack both infrastructure and best practices to use them effectively in such complex applications.

The above are mainly the wishes and predictions of the authors. The beauty of Wikidata is, however, that many people have used the system and data in ways we never have imagined, and we hope and expect that the future will continue to surprise us.

ACKNOWLEDGMENTS

Many people have played important parts in this short history of Wikidata. Thanks are due to all who have contributed their skills, ideas, and significant amounts of their own time, often as volunteers. We thank all developers of Semantic MediaWiki, especially the early supporters S Page, Yaron Koren, MW James, Siebrand Mazeland of translatewiki.net, and the long-term contributors and current maintainers Jeroen De Dauw and Karsten Hofmeyer.

We further thank all who have contributed to the initial technical development of Wikidata and the underlying software Wikibase, notably John Blad, Jeroen De Dauw, Katie Filbert, Tobias Gritschacher, Daniel Kinzler, Silke Meyer, Jens Ohlig, Henning Snater, Abraham Taherivand, and Daniel Werner, as well as anyone who followed in their footsteps.

Our special thanks are due to Rudi Studer, who has shaped much of the stimulating academic environment in which our own ideas could initially grow. Further thanks are due to Yolanda Gil, John Giannandrea, Ian Horrocks, Erik Möller and Pavel Richter, and their institutions, who supported part of the work.

Making Wikidata a reality also relied on the financial support of a variety of organizations. The research leading to SMW and Wikidata has received funding from the European Union's Sixth Framework Programme (FP6/2002-2006) under grant agreement no. 506826 (SEKT), and Seventh Framework Programme (FP7/2007-2013) under grant agreements no. 257790 (RENDER) and no. 215040 (ACTIVE), and from Vulcan Inc. under Project Halo. Wikidata development has been supported with donations by the Allen Institute for Artificial Intelligence, Google, the Gordon and Betty Moore Foundation, Yandex, and IBM. Wikimedia relies on small donations from millions of people to keep its services (including Wikidata) up and running, and we specifically want to thank all individuals who have directly contributed in this way.