################################################################################
#
# Title: DisGeNET-RDF data dump
# DisGeNET Version: 4.0
# RDF Version: 4.0.0
# Date: 05/11/2016
#
################################################################################

DisGeNET-RDF v4.0.0 (2016) - Release notes and bugs
--------------------------------------------------------------------------------

The DisGeNET-RDF dataset is the formal semantic representation of the DisGeNET
database, a database that integrates gene-disease associations from several
public sources and the literature.

The DisGeNET-RDF v4.0.0 dataset is the first release of the RDF distribution of
DisGeNET v4.0, and it is available for download as of May 11, 2016.

DisGeNET-RDF v4.0.0 has new annotations and new linksets:

- More GDAs comprising 17,381 genes and 15,093 diseases as linked data in the
  Semantic Web.
- New text-mined gene-disease associations from MEDLINE (BeFree source v4.0).
- New sources: ORPHANET and NHGRI-EBI GWAS CATALOG (curated).
- All linksets updated, i.e. all ontologies updated.
- New disease annotation: all diseases annotated to the original term(s) of
  provenance.
- Phenotype annotation: disease-phenotype associations from the Human Phenotype
  Ontology project (HPO).
- New disease linksets:
    * EFO
- RDF enhanced:
    * Updated RDF Schema to encompass new annotations.
    * 303 new URIs for DisGeNET Disease Specificity and Disease Pleiotropy
      Index entities.
    * Changed formal description of the property linking diseases to their
      phenotypic profile: sio:'is manifested as' (sio:SIO_000341) replaced by
      sio:'has phenotype' (sio:SIO_001279).
- RDF bugs fixed:
    * Fixed formal description of Literal resources by adding the language
      datatype.
- Interlinking: More mappings to the Linked Open Data cloud.
- Metadata description of the dataset, compliant with the W3C HCLS and Open
  PHACTS specifications.

The full description of DisGeNET-RDF v4.0.0 is available in RDF for download as
the 'void.ttl' file, which contains release statistics.

Additional information regarding DisGeNET is available on the DisGeNET homepage
at http://www.disgenet.org/.

--------------------------------------------------------------------------------
DisGeNET-RDF v4.0.0 (2016) - data dump
--------------------------------------------------------------------------------

The DisGeNET-RDF dataset v4.0.0 and its metadata description are distributed in
sixteen files:

- geneDiseaseAssociation.ttl.tgz: all gene-disease associations triples and
  related annotated objects.
- disease.ttl.gz: all diseases triples and related annotated objects.
- gene.ttl.gz: all genes triples and related annotated objects.
- meshClass.ttl.gz: all MeSH disease classes triples.
- doClass.ttl.gz: all DO classes triples.
- hpoClass.ttl.gz: all HPO classes triples.
- umlsSTY.ttl.gz: all UMLS Semantic Types categories triples.
- hpoAnnotation.ttl.gz: HPO disease-phenotype annotation triples.
- geneSymbol.ttl.gz: all HGNC symbols triples.
- protein.ttl.gz: all protein triples.
- pantherClass.ttl.gz: all PANTHER classes triples.
- pathway.ttl.gz: all pathways triples.
- phenotype.ttl.gz: all phenotypes triples.
- pubmed.ttl.gz: all PubMed articles triples.
- snp.ttl.gz: all SNPs triples.
- void.ttl.gz: metadata description triples of the DisGeNET-RDF dataset, which
  is W3C HCLS compliant.

Alternatively, the entire dump can be downloaded at once:

- disgenetv4.0-rdf-v4.0.0-dump.tgz

The dump dataset is serialized in RDF/Turtle format. The linksets are embedded
in these files and also extracted into independent files named
'ls-id1-id2-rdflink.ttl.gz'. They are located in the 'linksets' folder.
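
As an illustration of the file naming and serialization described above, the
following is a minimal Python sketch, not part of the DisGeNET distribution.
It assumes that 'id1' and 'id2' in a linkset file name contain no hyphens, and
that prefixes are declared one per line with '@prefix'. The function names
parse_linkset_name and read_prefixes are hypothetical helpers introduced here
for illustration only.

```python
import gzip
import re

# Assumed pattern, derived from the 'ls-id1-id2-rdflink.ttl.gz' naming above.
# Assumption: neither identifier contains a hyphen.
LINKSET_NAME = re.compile(r"^ls-([^-]+)-([^-]+)-rdflink\.ttl\.gz$")

# Matches a one-line Turtle prefix declaration such as:
#   @prefix sio: <http://semanticscience.org/resource/> .
PREFIX_LINE = re.compile(r"^\s*@prefix\s+([^\s:]*):\s*<([^>]*)>\s*\.")


def parse_linkset_name(filename):
    """Return (id1, id2) for a linkset file name, or None if it does not match."""
    match = LINKSET_NAME.match(filename)
    return match.groups() if match else None


def read_prefixes(path):
    """Collect the @prefix declarations of a gzip-compressed Turtle file."""
    prefixes = {}
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            match = PREFIX_LINE.match(line)
            if match:
                prefixes[match.group(1)] = match.group(2)
    return prefixes
```

This sketch only inspects file names and prefix declarations; loading the
triples themselves would require a full Turtle parser. The '.tgz' files are
tar archives and would have to be unpacked before read_prefixes could be
applied to their contents.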
Note that all gene-disease associations can also be downloaded as smaller files
located in the 'gda-batch' folder.

This dataset is the RDF representation of DisGeNET version 4.0. Other files
related to DisGeNET-RDF v4.0 are also available for download, such as the
DisGeNET OWL ontology, the data model graphic, and DisGeNET-RDF-example.txt,
which contains a sample of the RDF description of each concept in the dataset.

For more information about the RDF dataset, please visit the Web site at:
http://rdf.disgenet.org/

--------------------------------------------------------------------------------
DisGeNET Nanopublications v4.0.0.0 (2016) - data dump
--------------------------------------------------------------------------------

The DisGeNET Nanopublications linked dataset is the nanopublication
representation of the DisGeNET database. The DisGeNET Nanopublications dataset
v4.0.0.0 is a distribution of DisGeNET v4.0. It is downloadable as a single
file:

- nanopublications_v4.0.0.0.trig.gz: all nanopublications.

The dump dataset is serialized in RDF/TriG format. This is a trusty
nanopublication dataset (see http://trustyuri.net/).

For more information about the nanopublication dataset, please visit the Web
site at: http://rdf.disgenet.org/

--------------------------------------------------------------------------------
Attribution
--------------------------------------------------------------------------------

If you use DisGeNET, you are requested to cite the source articles:

Núria Queralt-Rosinach, Janet Piñero, Àlex Bravo, Ferran Sanz, Laura I Furlong.
DisGeNET-RDF: Harnessing the Innovative Power of the Semantic Web to Explore
the Genetic Basis of Diseases. Bioinformatics (2016)
doi: 10.1093/bioinformatics/btw214

N. Queralt-Rosinach, T. Kuhn, C. Chichester, M. Dumontier, F. Sanz, and
L.I. Furlong, "Publishing DisGeNET as Nanopublications", Semantic Web Journal,
vol. Preprint, no. Preprint, pp. 1-10, 2015.
DOI: 10.3233/SW-150189

Janet Piñero, Núria Queralt-Rosinach, Àlex Bravo, Jordi Deu-Pons,
Anna Bauer-Mehren, Martin Baron, Ferran Sanz, Laura I Furlong. DisGeNET: a
discovery platform for the dynamical exploration of human diseases and their
genes. Database (2015) Vol. 2015: article ID bav028;
doi:10.1093/database/bav028

Anna Bauer-Mehren, Markus Bundschus, Michael Rautschka, Miguel A. Mayer,
Ferran Sanz, Laura I. Furlong. Gene-disease network analysis reveals functional
modules in mendelian, complex and environmental diseases. PLoS ONE 2011 6(6):
e20284. doi:10.1371/journal.pone.0020284.

Bauer-Mehren A, Rautschka M, Sanz F, Furlong LI. DisGeNET: a Cytoscape plugin
to visualize, integrate, search and analyze gene-disease networks.
Bioinformatics. 2010 Nov 15;26(22):2924-6. Epub 2010 Sep 21.

To cite specific data:

Gene-disease association data were retrieved from the DisGeNET Database,
GRIB/IMIM/UPF Integrative Biomedical Informatics Group, Barcelona.
(http://www.disgenet.org/). [Month, year of data retrieval].

--------------------------------------------------------------------------------
License information
--------------------------------------------------------------------------------

The DisGeNET database is made available under the Open Database License, whose
full text can be found at http://opendatacommons.org/licenses/odbl/1.0/.
Any rights in individual contents of the database are licensed under the
Database Contents License, whose text can be found at
http://opendatacommons.org/licenses/dbcl/1.0/.

If DisGeNET is incorporated into other works, we ask that the DisGeNET IDs be
preserved and that the release number of DisGeNET be clearly displayed.
For more information on legal notices, please see
http://www.disgenet.org/ds/DisGeNET/html/legal.html

--------------------------------------------------------------------------------
Contact us
--------------------------------------------------------------------------------

Integrative Biomedical Informatics Group
Research Unit on Biomedical Informatics - GRIB
Barcelona Biomedical Research Park - PRBB
Dr. Aiguader 88
08003 Barcelona

email: lfurlong(at)imim(dot)es
phone: +34 93 316 0521
fax: +34 93 316 0550
web: http://ibi.imim.es/

--------------------------------------------------------------------------------
Help and troubleshooting
--------------------------------------------------------------------------------

If you have any suggestions, questions or comments about the DisGeNET-RDF
datasets, please do not hesitate to contact us:

Support team
Email: support(at)disgenet(dot)org

© 2010-2016, Integrative Biomedical Informatics Group