
 Feature Article

Science Forum Wikidata as a knowledge graph for the life sciences

[Figure 1: diagram of biomedical entity types (boxes) in Wikidata and the statements linking them (edges). Each box lists the entity type, its number of Wikidata items, and a partial listing of attributes with item counts.]

taxon 2,600,217. Global Biodiversity Information Facility ID: 2,058,609; Encyclopedia of Life ID: 1,354,013; IRMNG ID: 1,214,539; iNaturalist taxon ID: 569,998; ITIS TSN: 533,003; IPNI plant ID: 488,933; NCBI Taxonomy ID: 471,220; ...

gene 1,176,028. Entrez Gene ID: 737,302; RefSeq RNA ID: 561,824; NCBI Locus tag: 502,347; Ensembl Transcript ID: 401,691; Ensembl Gene ID: 122,639; MGI Gene Symbol: 71,959; Mouse Genome Informatics ID: 65,989; ...

protein 961,210. RefSeq Protein ID: 750,780; UniProt protein ID: 646,506; Ensembl Protein ID: 251,125; PDB structure ID: 44,732

chemical compound 163,252. InChIKey: 156,336; InChI: 153,826; PubChem CID: 150,018; ChemSpider ID: 124,461; ChEBI ID: 84,459; CAS Registry Number: 71,467; UNII: 58,419; ...

anatomical structure 120,184. Freebase ID: 1,462; TA98 Latin term: 1,363; Terminologia Anatomica 98 ID: 1,353; UBERON ID: 1,187; Encyclopædia Britannica Online ID: 743; MeSH descriptor ID: 693; UMLS CUI: 616; ...

protein family 27,431. InterPro ID: 22,025

disease 17,080. MonDO ID: 11,914; UMLS CUI: 11,441; Disease Ontology ID: 9,509; ICD-10-CM: 6,805; Orphanet ID: 6,745; MeSH descriptor ID: 6,019; OMIM ID: 5,975; ...

medication 3,869. CAS Registry Number: 2,775; UNII: 2,664; PubChem CID: 2,579; InChIKey: 2,535; ChemSpider ID: 2,503; InChI: 2,469; ChEMBL ID: 2,468; ...

biological pathway 2,994. Reactome ID: 2,250

pharmaceutical product 2,731. RxNorm CUI: 2,046; European Medicines Agency product number: 1,068

sequence variant 1,502. CIViC variant ID: 1,398; HGVS nomenclature: 820

pharmacologic action 1,332

symptom 1,089

therapeutic use 803

mechanism of action 182. MeSH Code: 288; MeSH ID: 168

active site 132. InterPro ID: 132

binding site 77. InterPro ID: 76

Figure 1. A simplified class-level diagram of the Wikidata knowledge graph for biomedical entities. Each box represents one type of biomedical entity. The header displays the name of that entity type (e.g., pharmaceutical product) and the number of Wikidata items for that entity type. The lower portion of each box displays a partial listing of attributes about each entity type and the number of Wikidata items for each attribute. Edges between boxes represent the number of Wikidata statements corresponding to each combination of subject type, predicate, and object type. For example, there are 1,505 statements with 'pharmaceutical product' as the subject type, 'therapeutic area' as the predicate, and 'disease' as the object type. For clarity, edges for reciprocal relationships (e.g., 'has part' and 'part of') are combined into a single edge, and scientific articles (which are widely cited in statement references) have been omitted. All counts of Wikidata items are current as of September 2019. The most common data sources cited as references are available in Figure 1—source data 1. Data are generated using the code in https://github.com/SuLab/genewikiworld (archived at Mayers et al., 2020). A more complete version of this graph diagram can be found at https://commons.wikimedia.org/wiki/File:Biomedical_Knowledge_Graph_in_Wikidata.svg.

The online version of this article includes the following source data and figure supplement(s) for figure 1:
Source data 1. Most frequent data sources cited as references for the biomedical subset of the Wikidata knowledge graph shown in Figure 1.
Figure supplement 1. Trends in Wikidata edits.
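The edge counts described in the caption can be illustrated with a minimal sketch. This is not the code from the genewikiworld repository: the toy items and statements, the helper name `count_edges`, and the `RECIPROCAL` mapping are all hypothetical, and summing the two directions of a reciprocal relationship is an assumption about how such edges might be combined.

```python
from collections import Counter

# Hypothetical toy data: item -> entity type (not real Wikidata content).
ITEM_TYPE = {
    "drug_product_A": "pharmaceutical product",
    "drug_product_B": "pharmaceutical product",
    "disease_X": "disease",
    "pathway_P": "biological pathway",
    "gene_G": "gene",
}

# Hypothetical statements as (subject, predicate, object) triples.
STATEMENTS = [
    ("drug_product_A", "therapeutic area", "disease_X"),
    ("drug_product_B", "therapeutic area", "disease_X"),
    ("pathway_P", "has part", "gene_G"),
    ("gene_G", "part of", "pathway_P"),
]

# Assumption: an inverse predicate is folded into its counterpart.
RECIPROCAL = {"part of": "has part"}


def count_edges(statements, item_type, reciprocal):
    """Count statements per (subject type, predicate, object type)."""
    counts = Counter()
    for subj, pred, obj in statements:
        s_type, o_type = item_type[subj], item_type[obj]
        if pred in reciprocal:
            # Rename the inverse predicate and flip the direction so that
            # both halves of a reciprocal pair land on the same edge.
            pred = reciprocal[pred]
            s_type, o_type = o_type, s_type
        counts[(s_type, pred, o_type)] += 1
    return counts


edge_counts = count_edges(STATEMENTS, ITEM_TYPE, RECIPROCAL)
for (s_type, pred, o_type), n in sorted(edge_counts.items()):
    print(f"{s_type} --[{pred} ({n})]--> {o_type}")
```

In this toy data the two 'therapeutic area' statements yield one edge with a count of 2, in the same way that the figure reports 1,505 statements for the real pharmaceutical product, therapeutic area, disease combination.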

focused on those with a clear clinical or therapeutic relevance.

Chemical compounds including drugs: Wikidata has items for over 150,000 chemical compounds, including over 3,500 items that are specifically designated as medications. Compound attributes are drawn from a diverse set of databases, including PubChem (Wang et al., 2009), RxNorm (Nelson et al., 2011), the IUPHAR Guide to Pharmacology (Harding et al., 2018; Pawson et al., 2014; Southan et al., 2016), NDF-RT (National Drug File – Reference Terminology), and LIPID MAPS (Sud et al., 2007). These items typically contain statements describing chemical structure and key physicochemical properties, and links to

Waagmeester et al. eLife 2020;9:e52614. DOI: https://doi.org/10.7554/eLife.52614