
…a sphere of concern—distributing the analysis and processing across many sets of users, in small slices.

Cross-maintenance across different data sets—rebuilding aggregated updates—becomes more important. To keep edges cleanly defined, something like a "knowledge API", or many such APIs, is envisaged. Each domain has a set of small, concrete, common information models. To distribute a data package is to distribute a reusable information model with it—to offer as much automated assistance as possible in reusing and recombining information.
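The idea of a data package travelling together with its information model can be sketched concretely. The descriptor below is illustrative only—the field names are loosely modelled on the later datapackage.json convention, not on anything specified in this chapter—but it shows how a schema shipped alongside the data lets a consumer read the information model back out automatically:

```python
# A minimal sketch of a data package descriptor: the data travels
# together with a small, concrete information model (a schema) so
# that consumers can recombine it with automated assistance.
# All names here are hypothetical, loosely modelled on the
# datapackage.json convention; this is an assumption, not a spec.

package = {
    "name": "uk-rainfall-monthly",      # hypothetical dataset name
    "licence": "ODC-PDDL",              # licensing clarity, stated up front
    "resources": [
        {
            "path": "rainfall.csv",
            "schema": {                  # the reusable information model
                "fields": [
                    {"name": "month", "type": "date"},
                    {"name": "station", "type": "string"},
                    {"name": "rainfall_mm", "type": "number"},
                ]
            },
        }
    ],
}

def field_names(pkg, resource=0):
    """Read the information model back out of the package descriptor."""
    schema = pkg["resources"][resource]["schema"]
    return [f["name"] for f in schema["fields"]]

print(field_names(package))  # → ['month', 'station', 'rainfall_mm']
```

Because the schema is machine-readable, two packages describing the same domain can be checked for compatible fields before their data is recombined—which is the kind of automated assistance the text envisages.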

Licensing clarity is important because without it one is not allowed to recombine data sources (though there is still a large gap between being allowed and being able). Code has come a long way on the legal front: the differently flavoured Free Software Definitions have reached a broad consensus. The state of open data is more uncertain, especially given the different ways of asserting the right to access and to reuse data across legislative regions. Open data practice should demonstrate value and utility, so that openness becomes a natural choice rather than an imposition. The Open Knowledge Definition is an effort to describe the properties of truly open data.


4. Knowledge and data APIs

Open knowledge research projects are carried out in an atmosphere of "fierce collaborative competition". The Human Genome Analysis project was a shining example: slices of source data were partitioned out across a network of institutions. Near-realtime information about the analysis results led to resources and support being redirected to centres that were performing better. In the context of open media, people are also "competing to aggregate"—to compile not mere volume but more cross-connectedness into indexes and repositories of common knowledge.

Progress on the parts is easier to perceive than progress on the whole. In the parts, the provenance is clear: who updated the data, when and why, and how it was improved. The touchstones are improving the reusability, accuracy and currency of data. Working with subsets of data sets, and in the absence of significant hardware or bandwidth barriers, anyone can start to carry out and contribute analysis from home. Knowledge is given back into a publicly available research space, making it easier to build on the work