Welcome to the homepage of the Integrative Data Science Lab (IDSL) in the School of Informatics, Computing and Engineering (SICE) at Indiana University in Bloomington, Indiana. Integrative Data Science is about bringing together complex data sources, novel analysis methods that cross these networked data silos, and diverse skill sets to solve real-world problems. IDSL's research areas include digital health, molecular therapeutics, and crisis & emergency response.
Dr David Wild, Director
Samuel Bentum, Ph.D. Student
Natalie Franklin, Ph.D. Student
Stefan Furrer, Ph.D. Student
Chris Gessner, Ph.D. Student
Logan Paul, Ph.D. Student
Jeremy Yang, Ph.D. Student
Dr Xiao Dong (PhD), UIC
Dr Huijun Wang (PhD), Pfizer Inc
Dr Pulan Yu (PhD), Dow Agrochemical
Dr Abhik Seal (PhD)
Dr Jae Hong Shin (PhD)
Dr Anurag Passi (PhD Fulbright Fellow)
Breaking down data silos. Includes the landmark Chem2Bio2RDF project, which uses semantic technologies to integrate networks of chemistry, biology and biomedical data.
Association finding across data silos using highly novel path-based prediction tools. Includes the SLAP algorithm and recent random walk methods.
Integrative data mining in healthcare data including biomedical, adverse event, and electronic medical records.
Big data mining for Automated Chemical Synthesis. Cheminformatics big data approaches to the next generation of chemical synthesis.
Integrative data science for disaster risk, resilience and expenditure, helping local communities and federal agencies plan better for climate change and make better use of disaster recovery funds to increase resilience.
Integrative data mining for emergency response. Mining critical information for emergency responders.
NEWS & RECENT PRESENTATIONS
David Wild presents on Applying semantic and network methods in AOP knowledge discovery at the NIH AOP Workshop in Bethesda, MD, September 2014
David Wild presents on Opportunities in Toxicology for Large Scale Semantic Linked Data and Prediction, at the Society of Toxicology Meeting in Phoenix, AZ, March 2014
Congratulations to Dr Jae Hong Shin and Dr Hari Machina, who recently defended their PhDs successfully!
David Wild co-organizes Semantic Technologies in Translational Medicine and Drug Discovery session at Indianapolis ACS, Sep 2013
Social media in cheminformatics education. American Chemical Society National Meeting, Indianapolis, Sep 2013
David Wild co-organizes the Exploiting Big Data Semantics for Translational Medicine workshop at IU, March 2013
Large scale cross dataset mining of chemical and biological datasets for drug discovery. American Chemical Society CINF Webinar, May 2013 (Link to CINF site with recording)
New opportunities for biomedical science and drug discovery using semantic technologies. Exploiting Big Data Semantics for Translational Medicine, Indiana University, Bloomington, March 2013.
Semantic integration, search and prediction on drug discovery data at Indiana University. Leiden University Medical Center, July 2012.
Assessing drug target association using semantic linked data. American Chemical Society National Meeting, San Diego, March 2012.
EDUCATIONAL PROJECTS & RESOURCES
IU Data Science Program
MS and Certificate Programs in Data Science at Indiana University SOIC
Data Science in Drug Discovery, Health and Translational Medicine.
An online IU course with freely accessible online resources
An NSF project to develop a radical new hybrid online/local approach for teaching cheminformatics to undergraduate chemistry students
Free and low cost resources for learning cheminformatics, including an eBook and course materials
Informatics in Disasters and Emergency Response
An online IU course with freely accessible online resources
Gao, Z., Fu, G., Ouyang, C., Tsutsui, S., Liu, X., Yang, J., Gessner, C., Foote, B., Wild, D.J., Yu, Q., and Ding, Y. Edge2vec: Representation learning using edge semantics for biomedical knowledge discovery. BMC Bioinformatics. 2019, 20:306.
Meng, G., Huang, Y., Yu, Q., Ding, Y., Wild, D., Zhao, Y., Liu, X., Min, S. Adopting Literature-based Discovery on Rehabilitation Therapy Repositioning for Stroke. Frontiers in Neuroscience. March, 2019.
Seal, A., Wild, D.J. Netpredictor: R and Shiny package to perform drug-target network analysis and prediction of missing links. BMC Bioinformatics, 2018, 19(1), 265.
Passi, A., Rajput, N.K., Wild, D.J., Bhardwaj, A. RepTB: a gene ontology based drug repurposing approach for tuberculosis. Journal of Cheminformatics, 2018, 10:24.
Correia, R.B., de Araújo, L.P., Mattos, M.M., Wild, D., Rocha, L.M. City-wide Analysis of Electronic Health Records Reveals Gender and Age Biases in the Administration of Known Drug-Drug Interactions. arXiv:1803.03571, 2018/03.
Djokic-Petrovic, M., Cvjetkovic, V., Yang, J., Zivanovic, M., Wild, D.J. PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets. Journal of Biomedical Semantics, 2017, 8(1) 42.
Kulkarni, V., Wild, D.J. An activity canyon characterization of the pharmacological topography. Journal of Cheminformatics 2016, 8 (1), 41.
Fox, G., Maini, S., Rosenbaum, H., Wild, D. Data Science and Online Education. 2015 IEEE 7th International Conference on Cloud Computing Technology and Science.
Seal, A., Ahn, Y.Y., Wild, D.J. Optimizing drug-target interaction prediction based on random walk on heterogeneous networks. Journal of Cheminformatics. 2015, 7 (1), 1.
Lee, J.A., et al. Novel phenotypic outcomes identified for a public collection of approved drugs from a publicly accessible panel of assays. PLoS One, 2015, 10(7), e0130796.
Chen, B., Wang, H., Ding, Y., Wild, D. Semantic Breakthrough in Drug Discovery. Synthesis Lectures on the Semantic Web: Theory and Technology, Morgan & Claypool, 2014, 4(2) p1-142.
Henschel, R., et al. Applications of the YarcData Urika in Drug Discovery and Healthcare.
Joshi, H., Parihar, A., Jiao, D., Murali, S., Wild, D.J. A possible gut microbiota basis for weight gain side effects of antipsychotic drugs. Eprint arXiv:1401.2389, 2014/01.
Chen, B., Wild, D.J. Practice and challenges of building a semantic framework for chemogenomics research. Molecular Informatics, 2013, 32:11/12 pp1000-1008
Machina, H., Wild, D.J., Dey, P., Merchant, M. Effective integration of informatics tools to enhance the drug discovery process. Industrial & Engineering Chemistry Research, 2013, 52(47), pp16547-16554
Wild, D.J. Cheminformatics for the masses: a chance to increase educational opportunities for the next generation of cheminformaticians. Journal of Cheminformatics, 2013, 5:32
Willighagen E., Waagmeester, A., Spjuth, O., Ansell, P., Williams, A.J., Tkachenko, V., Hastings, J., Chen, B. and Wild, D.J. The ChEMBL database as linked open data. Journal of Cheminformatics, 2013, 5:23
Machina, H.K. and Wild, D.J. Electronic laboratory notebooks: progress and challenges in implementation. Journal of Laboratory Automation, 2013, in press.
Seal, A., Yogeeswari, P., Sriram, D., Wild, D.J. Enhanced ranking of PknB inhibitors using data fusion methods. Journal of Cheminformatics, 2013, 5:3.
Machina, H.K. and Wild, D.J. Laboratory informatics tools integration strategies for drug discovery, Journal of Laboratory Automation, 2013, 18(2), 126-136.
R package for prediction of missing links in any given unipartite or bipartite network using Random Walk with Restart and Network inference algorithm.
NetPredictor is described in Seal, A. et al., BMC Bioinformatics, 2018, 19(1), 265
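To give a feel for the Random Walk with Restart idea mentioned above, here is a minimal sketch in plain Python. It is not the NetPredictor API (NetPredictor is an R package); the function name, the restart probability of 0.3, and the four-node toy drug-target network are all assumptions made for illustration. A walker starts at a seed node, moves along edges, and jumps back to the seed with a fixed probability; the resulting steady-state scores rank nodes by proximity to the seed, so an unlinked node with a non-zero score is a candidate missing link.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk restarting at `seed`.

    `adjacency` is a square list-of-lists (symmetric for an undirected network).
    The name and default parameter values are illustrative assumptions.
    """
    n = len(adjacency)
    # Column-normalise the adjacency matrix so each column sums to 1.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    start = [0.0] * n
    start[seed] = 1.0
    current = start[:]
    for _ in range(max_iter):
        # One step: follow an edge with prob. (1 - restart), else jump to the seed.
        updated = [
            (1 - restart) * sum(transition[i][j] * current[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(updated, current))
        current = updated
        if change < tol:
            break
    return current


# Hypothetical toy bipartite network: nodes 0-1 are drugs, nodes 2-3 are targets.
# Known links: drug0-target2, drug1-target2, drug1-target3.
names = ["drug0", "drug1", "target2", "target3"]
adjacency = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
scores = random_walk_with_restart(adjacency, seed=0)
for name, score in zip(names, scores):
    print(f"{name}: {score:.4f}")
```

In this toy network, target3 has no known link to drug0 but still receives a non-zero score through the shared neighbour drug1, which is the sense in which the walk suggests a possible missing link.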
A missing-link prediction tool derived from social networking that uses the Chem2Bio2RDF network to predict association between drugs and gene targets.
SLAP is described in Chen, B. et al., PLoS Computational Biology, 2012, 8(7), e1002574
A searchable semantic network of public drug discovery data linking chemical compounds with genes, diseases, targets, pathways and adverse effects that allows cross-dataset querying using SPARQL.
Chem2Bio2RDF is described in Chen, B. et al., BMC Bioinformatics 2010, 11, 255. The associated Chem2BioOWL ontology is described in Chen, B., et al., Journal of Cheminformatics 2012, 4:6.
Drug Repurposing Explorer - A prototype tool for ranking known drugs against queries using fused similarity of chemical structure fingerprints, side effects, biological targets, shape, and disease association.
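As a rough illustration of what "fused similarity" ranking can look like, the sketch below scores each library drug against a query on several feature types and averages the per-type scores. The fusion rule used by the Drug Repurposing Explorer is not stated here, so Tanimoto similarity on sets, mean fusion, the function names, and the toy data are all assumptions; shape and disease association are omitted for brevity.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of features."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fused_similarity(query, candidate, feature_types):
    """Mean of per-feature-type similarities (mean fusion is an assumption)."""
    similarities = [tanimoto(query[f], candidate[f]) for f in feature_types]
    return sum(similarities) / len(similarities)


def rank_drugs(query, library, feature_types):
    """Return (drug name, fused score) pairs, best match first."""
    scored = [
        (name, fused_similarity(query, features, feature_types))
        for name, features in library.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: fingerprint bit positions, side effects, and targets.
feature_types = ["fingerprint", "side_effects", "targets"]
query = {
    "fingerprint": {1, 2, 3, 4},
    "side_effects": {"nausea", "headache"},
    "targets": {"T1", "T2"},
}
library = {
    "drug_x": {
        "fingerprint": {1, 2, 3, 5},
        "side_effects": {"nausea"},
        "targets": {"T1"},
    },
    "drug_y": {
        "fingerprint": {7, 8},
        "side_effects": {"headache", "rash"},
        "targets": {"T3"},
    },
}
ranking = rank_drugs(query, library, feature_types)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Averaging treats every feature type equally; weighted or rank-based fusion would be equally plausible choices and would change the ordering when the individual similarities disagree.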
Bioterm Literature Association Score Calculator (BLASC). Predicts association between genes, drugs and diseases from data mining of recent PubMed scholarly journal articles using a BioLDA Topic Model. You can specify a start node (e.g. a gene), and intermediate and end node types, and the tool will produce a list of the most strongly associated end node types (e.g. drugs). The BioLDA algorithm is described in our recent paper Wang, H. et al., PLoS One, 2011, 6(3), e17243.
Drugbank Semantic Faceted Browser - A prototype semantic browsing tool that demonstrates how semantic annotation can be used for faceted browsing of drug data. Currently works on a small subset of Drugbank as a proof-of-concept only.
WENDI - a tool for finding non-obvious relationships between chemical compounds and biology that aggregates information from databases, the literature, and computational predictions. Funded by Eli Lilly. Extended to use an RDF inference engine to make predictions of compound-disease relationships using a rule base. For more information, see our recent papers Zhu et al., Journal of Cheminformatics, 2010, 2:6 and Zhu et al., BMC Bioinformatics, 2011, 12, 256.
PARTNERS & RECENT SUPPORT
IDSL is affiliated with the IU Network Science Institute (IUNI)
IDSL works in close collaboration with IU's Web Science Lab, led by Dr Ying Ding
We are an affiliated partner of the EU OpenPHACTS project
Recent support given by