
Wikidata’s data is also used by Open Source and Open Culture projects (e.g., Wikipedia, MusicBrainz [39], OpenStreetMap [45], OpenArtBrowser, KDE [25], Wikitrivia, Scribe [17]) as well as civil society projects (e.g., OCCRP [65], Peppercat [11], OpenSanctions [48], GovDirectory [2], DataStory [36], EveryPolitician [12]).

Wikidata’s data is used for a variety of tasks, including accessing basic information about a concept, machine learning, data cleaning and reconciliation, data exploration and visualization, tagging and entity recognition as well as internationalization of content. In addition, Wikidata is a hub in the Linked Data Web and beyond by connecting to over 7,500 other websites, catalogs, and other databases.

3 SEMANTIC WIKIPEDIA

Wikidata launched as a public web site in October 2012, but its true beginnings are much earlier, in May 2005. During the seven years between inception and launch, the design of Wikidata went through significant conceptual changes. This evolution was, however, driven not so much by deliberate strategic planning as by close interactions with many people and communities as part of continuous efforts to make Wikidata a reality.

The first idea of what was to become Wikidata was born in early May 2005. Google was already strong, Skype had revolutionized (still voice-only) Internet telephony, Facebook was not fully public, and Twitter did not exist yet. Wikipedia, launched in 2001, was still not widely known, but its phase of explosive growth had started. These were also formative years for the Wikimedia Movement, and the first ever Wikimania conference was to be held in August 2005 in Frankfurt am Main, Germany.

Just one train-hour away, in Karlsruhe, a group of young PhD students were taking note. Markus Krötzsch, Max Völkel, and Denny Vrandečić had each recently joined the research group of Rudi Studer at the University of Karlsruhe (now KIT), a leading location of Semantic Web research. Fascinated by the Wikipedia concept, and being early contributors themselves, they found it natural to ask how the Semantic Web ideas of explicit specification and machine-readable processing could make a contribution. Vrandečić proposed to annotate links on Wikipedia pages, inspired by the notion of typed links, a well-known concept in hypertext that was also endorsed by Berners-Lee [6].

The result was the early concept of Semantic Wikipedia, a proposal to use annotations in wikitext markup for embedding structured data into Wikipedia articles [28]. Such integration of text and data was a popular concept in the early Semantic Web, and Wikipedia was perceived here as a miniature Web within which to realize these ideas. However, tying data to texts also enshrines mono-linguality, restricts machine-writability (since all data must also appear in text), and hinders verifiability (since it is hard to link data and references). None of these issues were perceived as very problematic at the time, whereas the seamless and gradual introduction of structured data management into existing workflows was considered essential.
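To make the idea of typed links concrete, the following minimal sketch extracts subject-property-value statements from annotated wikitext. It assumes a `[[property::target]]` notation modeled on the syntax later associated with Semantic MediaWiki; the exact markup of the 2005 proposal may have differed, and the pattern `TYPED_LINK`, the helper functions, and the sample sentence are hypothetical names and data chosen purely for illustration.

```python
import re

# Assumed notation: [[property::target]] or [[property::target|label]].
# This pattern is an illustrative assumption, not the parser of any real wiki software.
TYPED_LINK = re.compile(r"\[\[([^\[\]|:]+)::([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def extract_statements(page_title, wikitext):
    """Return (subject, property, value) triples, using the page title as subject."""
    return [
        (page_title, m.group(1).strip(), m.group(2).strip())
        for m in TYPED_LINK.finditer(wikitext)
    ]


def render_text(wikitext):
    """Replace each typed link by its label (or its target) to get the reader-facing text."""
    return TYPED_LINK.sub(lambda m: (m.group(3) or m.group(2)).strip(), wikitext)


# Hypothetical sample article text for a page titled "Berlin".
sample = (
    "Berlin is the capital of [[capital of::Germany]] "
    "and lies on the [[located on::Spree|river Spree]]."
)

statements = extract_statements("Berlin", sample)
text = render_text(sample)

for statement in statements:
    print(statement)
print(text)
```

The sketch also illustrates the limitations noted above: every statement lives inside a sentence of one particular language edition, and a program that wants to add a statement has to produce or edit prose to do so.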



Krötzsch and Vrandečić presented the idea at Wikimania on August 5th, 2005 (see Fig. 2). Certain of the convincing benefits of their vision, they called for volunteers to implement it – a task that Vrandečić, when asked, estimated to take about two weeks’ effort. The German company DocCheck stepped up to donate this effort, leading to the first implementation of the software Semantic MediaWiki (SMW) [26, 29].

Looking back, the most striking aspect of this early history is how quickly the idea of a “Semantic Wikipedia” caught on and gathered support. In the 48 hours after their talk, Krötzsch and Vrandečić created a related community portal with details on project goals, implementation plan, and envisioned applications (including “question answering based on Wikipedia (e.g. integrated in major web searching engines)”). Within a month, the idea had gathered vocal supporters in the Web community, such as Tim Finin, Danny Ayers, and Mike Linksvayer. The SMW software saw its first release 0.1 on September 29th, 2005, with new mailing lists connecting to a growing user community. A first WWW paper was presented at WWW2006 in Edinburgh, Scotland [27].

This sudden success also reflects that Semantic Wikipedia resonated strongly with popular ideas of the time. Indeed, several semantic wiki systems (not related to Wikipedia) had been proposed around that time [13], and there was even a concurrent, completely independent (but not completely dissimilar) proposal for a “Semantic MediaWiki” by Hideaki Takeda and his research group, first published in October 2005 [41, 42]. The vision of a machine-readable Wikipedia also inspired researchers, which would later lead to DBpedia [4] and Yago [64] (both 2007).

Conversely, structured data had been gaining popularity within the Wikimedia Movement, e.g., in the German Wikipedia’s “Personendaten” initiative. In a peculiar historical coincidence, Erik Möller had recently proposed the idea of a Wikimedia project called Wikidata, conceived as a wiki-like database for several concrete application areas. In the following years, Möller, Gerard Meijssen, and others pursued the OmegaWiki project (first named WiktionaryZ), which had an alternative approach to the data model [40,