Page:Wikidata making of.pdf/4

 43, 67], but was much more focused on multi-linguality from the beginning. It would still take years for these concepts to converge.

4 MOVING SIDEWAYS (2005–2010)

In the following years, the initial success of Semantic Wikipedia (the grand vision) gradually turned into a success of Semantic MediaWiki (the software). Fueled by the initiators’ intensive development activities and community management, a growing user base was running their own Semantic MediaWiki sites in many application domains. The built-in query answering functionality turned out to be especially useful for many community projects outside of Wikipedia. Further developers joined, and the first SMW User Meeting in 2008 in Boston became the starting point of the regular SMWCon conference series, which is still ongoing today.



Practical experiences and user feedback from SMW also revealed aspects that the original concept of Semantic Wikipedia had overlooked or misjudged. Facilitated by new software extensions in the prospering SMW ecosystem, form-based input methods soon dominated over in-text annotations – a major deviation from the unity of data and text that was central to the Semantic Wikipedia concept. Moreover, it soon emerged that the provision of RDF-encoded data (fully conforming to Linked Data recommendations [5]) did not lead to relevant applications. Instead, the ability to embed query results into wiki pages was what motivated users to add structured data. Most development efforts between 2005 and 2010 were directed towards improving these inline queries in terms of power, performance, and presentation. Semantic wikis thus prospered, but without inspiring the data re-use beyond individual sites that had been expected for Wikipedia. SMW gradually diverged from its original goal.

Indeed, the praise that SMW won from researchers and practitioners had little impact on Wikipedia. SMW was presented at Wikimania each year (e.g., [30]), often gathering significant audiences and positive feedback, yet the route into Wikipedia remained unclear. For several years, editors and operators were occupied with running Wikipedia in a time of unprecedented growth. Potentially disruptive software changes were out of the question, and technical work focused on core functionality (UI, account management, discussion pages, backend performance). Even those more meager innovations were not always welcomed by the growing community, resulting in conflicts between the Wikimedia Foundation and contributors. Adding data management to Wikipedia’s core tasks seemed a huge risk, especially as some communities became more conservative and less open to such changes. Even after many years, the only bits of structured data in Wikipedia came from a few uses of Microformats [24] – sparse records that would never form a knowledge graph.

SMW meanwhile was gathering interest elsewhere [26, 29, 31]. The large wiki host Wikia had made it a standard offer for its customers. Smaller IT companies offered services and extensions to turn it into a tool of corporate knowledge management. One of them was Ontoprise, a Karlsruhe-based SME subcontracted by Paul Allen’s Vulcan Inc. under the leadership of Mark Greaves to adapt SMW for knowledge acquisition in the ambitious Halo project [18]. To support development and community outreach, Ontoprise hired a local computer science student, Lydia Pintscher.

5 EVOLUTION OF AN IDEA

While SMW was moving along its own trajectory, the greater goal was not abandoned. Between 2005 and 2012, through interactions with many people, the original Semantic Wikipedia evolved into the first accurate concept of Wikidata.

Erik Möller, by then Deputy Director of the Wikimedia Foundation, was the driving force behind a major change: Vrandečić was still arguing to turn the individual Wikipedias semantic in 2009 (in particular to compare the graphs from the different language editions [58, 59, 71]), whereas Möller favored a single Wikidata for all languages. Already in Möller’s original Wikidata proposal in 2004, he had envisioned a solution “to centrally store and manage data from all Wikimedia projects.” The resulting design combined this idea with the more fluid, graph-based data model of Semantic Wikipedia. Möller had also secured the domain for Wikidata, which was a major factor in eventually selecting this name.

Another important realization was that verifiability would have to play a central role. Vrandečić, Elena Simperl (then KIT), and Mathias Schindler (Wikimedia Deutschland) initiated research on the topic of knowledge diversity, which led up to the EU research project RENDER (2010–2013). The project developed ideas for handling contradicting and incomplete knowledge, and analyzed Wikipedia to understand the necessity for such approaches [63].

Also in 2010, Krötzsch and Vrandečić had finished their Ph.D.s, with Krötzsch joining Ian Horrocks’s group at the University of Oxford and Vrandečić following an invitation of Yolanda Gil to spend a 6-month sabbatical at ISI, University of Southern California. The first prototype for a verifiability-enabled semantic wiki platform, named Shortipedia, emerged from the collaboration of Vrandečić, Gil, Varun Ratnakar (ISI), and Krötzsch [70]. The prototype also