################################################################################
#
# Title: DisGeNET-RDF data dump
#
# DisGeNET Version: 5.0
#
# RDF Version: 5.0.0
#
# Date: 19/11/2017
#
################################################################################

DisGeNET-RDF v5.0.0 (2017) - Release notes and bugs
--------------------------------------------------------------------------------

The DisGeNET-RDF dataset is the formal semantic representation of the DisGeNET database, which integrates gene-disease associations from several public sources and the literature.

The DisGeNET-RDF v5.0.0 dataset is the first release of the RDF distribution of DisGeNET v5.0, and it has been available for download since November 19, 2017.

The RDF distribution of DisGeNET includes all new DisGeNET v5.0 content, as well as new annotations and new linksets:

- All linksets updated, i.e. all ontologies updated.
- Disease-phenotype annotation data have been integrated from 3 different sources:
    - Annotations from the Human Phenotype Ontology.
    - Text-mined annotations from "The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease", Groza et al., 2015.
    - Text-mined annotations from "Analysis of the human diseasome using phenotype similarity between common, genetic, and infectious diseases", Hoehndorf et al., 2015.
- RDF enhancement and data model changes.

The full description of DisGeNET-RDF v5.0.0, including release statistics, is available for download in RDF as the 'void.ttl' file.

Additional information regarding DisGeNET is available on the DisGeNET homepage at http://www.disgenet.org/.

--------------------------------------------------------------------------------
DisGeNET-RDF v5.0.0 (2017) - data dump
--------------------------------------------------------------------------------

The DisGeNET-RDF dataset v5.0.0 and its metadata description are distributed across several files:

- void.ttl.tar.gz: metadata description triples of the DisGeNET-RDF dataset, which is W3C HCLS compliant.
- vda_score.ttl.tar.gz: all VDA score triples.
- vda.ttl.tar.gz: all variant-disease association triples and related annotated objects.
- variant.ttl.tar.gz: all variant triples and related annotated objects.
- umlsSTY.ttl.tar.gz: all UMLS semantic type triples.
- pubmed.ttl.tar.gz: all PubMed publication triples.
- protein.ttl.tar.gz: all protein triples.
- phenotype.ttl.tar.gz: all phenotype triples.
- pda.ttl.tar.gz: all phenotype-disease annotation triples and related annotated objects.
- pantherClass.ttl.tar.gz: all protein classes according to the Panther classification.
- meshClass.ttl.tar.gz: all MeSH classes.
- hpoClass.ttl.tar.gz: all HPO class triples.
- geneSymbol.ttl.tar.gz: all gene symbol triples.
- gene.ttl.tar.gz: all gene triples.
- gda.ttl.tar.gz: all gene-disease associations.
- gda_score.ttl.tar.gz: all GDA score triples.
- doClass.ttl.tar.gz: all DO class triples.
- disease.ttl.tar.gz: all disease triples and related annotated objects.

Alternatively, the entire dump can be downloaded at once:

- disgenetv5.0-rdf-v5.0.0-dump.tgz

The dump dataset is serialized in RDF/Turtle format. This dataset is the RDF representation of DisGeNET version 5.0.

The linksets are embedded in these files and are also provided as independent files named 'ls-id1-id2-rdflink.ttl', located in the 'linksets' folder.

Note that all gene-disease associations can also be downloaded as a set of smaller files, located in the 'gda-batch' folder.

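As an illustration of working with the per-file archives listed above, the following is a minimal Python sketch that lists and unpacks the Turtle files from one '.ttl.tar.gz' archive using only the standard library. It is not part of the DisGeNET distribution. The function names 'list_turtle_members' and 'extract_turtle' are hypothetical, and the sketch assumes that each archive is a gzip-compressed tar file containing one or more members whose names end in '.ttl'; the internal layout of the archives is not described in this README.

```python
import shutil
import tarfile
from pathlib import Path


def list_turtle_members(archive_path):
    """Return the names of the .ttl members inside one .ttl.tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return [
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".ttl")
        ]


def extract_turtle(archive_path, out_dir):
    """Copy every .ttl member of the archive into out_dir; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            # Assumption: only regular files ending in .ttl are of interest.
            if not (member.isfile() and member.name.endswith(".ttl")):
                continue
            # Write under the base name only, so a member path can never
            # point outside out_dir.
            target = out_dir / Path(member.name).name
            source = archive.extractfile(member)
            with source, open(target, "wb") as sink:
                shutil.copyfileobj(source, sink)
            written.append(target)
    return written
```

For example, after downloading 'gene.ttl.tar.gz' into the current directory, calling extract_turtle("gene.ttl.tar.gz", "disgenet-rdf") would place the extracted Turtle file(s) in a local 'disgenet-rdf' folder (a hypothetical folder name), where any Turtle-aware RDF tool can load them.
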
Other files related to DisGeNET-RDF v5.0 are also available for download, such as the OWL DisGeNET ontology, the data model graphic, and DisGeNET-RDF-example.txt, which contains a sample of the RDF description of each concept in the dataset.

For more information about the RDF dataset, please visit the Web site at: http://rdf.disgenet.org/

--------------------------------------------------------------------------------
DisGeNET Nanopublications v5.0.0.0 (2017) - data dump
--------------------------------------------------------------------------------

The DisGeNET Nanopublications linked dataset is the nanopublication representation of the DisGeNET database. The DisGeNET Nanopublications dataset v5.0.0.0 is a distribution of the DisGeNET v4.0. It is downloadable as a single file:

- nanopublications_v5.0.0.0.trig.gz: all nanopublications.

The dump dataset is serialized in RDF/TriG format. This is a trusty nanopublication dataset (see http://trustyuri.net/).

For more information about the nanopublication dataset, please visit the Web site at: http://rdf.disgenet.org/

--------------------------------------------------------------------------------
Attribution
--------------------------------------------------------------------------------

If you use DisGeNET, you are requested to cite the source articles:

Núria Queralt-Rosinach, Janet Piñero, Àlex Bravo, Ferran Sanz, Laura I Furlong. DisGeNET-RDF: Harnessing the Innovative Power of the Semantic Web to Explore the Genetic Basis of Diseases. Bioinformatics (2016) doi: 10.1093/bioinformatics/btw214

N. Queralt-Rosinach, T. Kuhn, C. Chichester, M. Dumontier, F. Sanz, and L.I. Furlong, "Publishing DisGeNET as Nanopublications", Semantic Web Journal, vol. Preprint, no. Preprint, pp. 1-10, 2015. DOI: 10.3233/SW-150189

Janet Piñero, Núria Queralt-Rosinach, Àlex Bravo, Jordi Deu-Pons, Anna Bauer-Mehren, Martin Baron, Ferran Sanz, Laura I Furlong. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database (2015) Vol. 2015: article ID bav028; doi:10.1093/database/bav028

Anna Bauer-Mehren, Markus Bundschus, Michael Rautschka, Miguel A. Mayer, Ferran Sanz, Laura I. Furlong. Gene-disease network analysis reveals functional modules in mendelian, complex and environmental diseases. PLoS ONE 2011 6(6): e20284. doi:10.1371/journal.pone.0020284.

Bauer-Mehren A, Rautschka M, Sanz F, Furlong LI. DisGeNET: a Cytoscape plugin to visualize, integrate, search and analyze gene-disease networks. Bioinformatics. 2010 Nov 15;26(22):2924-6. Epub 2010 Sep 21.

To cite specific data:

Gene-disease association data were retrieved from the DisGeNET Database, GRIB/IMIM/UPF Integrative Biomedical Informatics Group, Barcelona. (http://www.disgenet.org/). [Month, year of data retrieval].

--------------------------------------------------------------------------------
License information
--------------------------------------------------------------------------------

The DisGeNET database is made available under the Attribution-NonCommercial-ShareAlike 4.0 International License https://creativecommons.org/licenses/by-nc-sa/4.0/.

Any rights in individual contents of the database are licensed under the Attribution-NonCommercial-ShareAlike 4.0 International License https://creativecommons.org/licenses/by-nc-sa/4.0/.

If DisGeNET is incorporated into other works, we ask that the DisGeNET IDs are preserved, and that the release number of DisGeNET is clearly displayed.

For more information on legal notices, please see http://www.disgenet.org/ds/DisGeNET/html/legal.html

--------------------------------------------------------------------------------
Contact us
--------------------------------------------------------------------------------

Integrative Biomedical Informatics Group
Research Unit on Biomedical Informatics - GRIB
Barcelona Biomedical Research Park - PRBB
Dr. Aiguader 88
08003 Barcelona

email: lfurlong(at)imim(dot)es
phone: +34 93 316 0521
fax: +34 93 316 0550
web: http://ibi.imim.es/

--------------------------------------------------------------------------------
Help and troubleshooting
--------------------------------------------------------------------------------

If you have any suggestions, questions or comments about the DisGeNET-RDF datasets, please do not hesitate to contact us:

Support team
Email: support(at)disgenet(dot)org

© 2010-2017, Integrative Biomedical Informatics Group