Page:The digital public domain.pdf/161

an individual, on paper, rather than making the information readable by machines.

Science Commons has identified four key problems. First, there is the issue of cognitive overload, especially as information is translated to a digital form or created that way. We are beginning to know too much for our brains to process and take care of, and in this way face a data deluge. Secondly, most of what we know is poorly fitted for use and reuse (a design problem), making it impossible to, say, text mine the information. Even the simple act of publishing a document as a PDF adds a barrier to fully utilising the information in the form provided. Documents are poorly linked or annotated, making it increasingly difficult to connect information. Thirdly, there is a licensing problem, where knowledge is licensed in such a way that it is not legally available (this is an issue routinely faced in data integration or text mining). Lastly, the physical materials, the non-digital objects on which this knowledge is based (for example, lab mice, DNA, gene snippets and plasmids), are not always freely available in reality.

The first three points (cognitive overload, design, and licensing) all describe problems of the regular Internet, but in order to have “open science” or a “research Web”, one must include in this discussion an additional dimension: access to the physical materials.

Current ways of conducting this research are imperfect. Take, for example, the following research question, which could be asked of a “research Web”: based on what has been published in journals and databases, what signal transduction genes may be active in pyramidal neurons? Answering this question would offer a lead toward drug targets for Alzheimer’s disease, since signal transduction genes tend to make good drug targets and pyramidal neurons are implicated in the disease. A simple Google search returns approximately 189,000 results. Conducting this search in other information warehouses, such as the US National Institutes of Health’s PubMed or PubMed Central, yields an enormous number of articles, references, and citations. Sorting through all of this knowledge would take far longer than the grant period of any normal researcher; it is an example of the aforementioned data deluge and cognitive overload problem. What you should be able to obtain using the power of the Internet is a list of genes that meet the conditions specified in the original research question.

It is currently very difficult to use the network to build on and validate research. There is no technical barrier to doing this, no creative breakthrough nor “eureka moment” needed. It is a matter of reformatting