Welcome to the homepage of the Integrative Data Science Lab (IDSL) in the School of Informatics, Computing and Engineering (SICE) at Indiana University in Bloomington, Indiana. Integrative Data Science is about bringing together complex data sources, novel analysis methods that cross these networked data silos, and diverse skill sets to solve real-world problems. IDSL's research areas include digital health, molecular therapeutics, and crisis & emergency response.
PEOPLE
David Wild, PhD, Professor and Director
Abhik Seal, PhD, Adjunct Professor
Jeremy Yang, PhD, Adjunct Professor
Logan Paul, PhD Student
Former members (and current affiliations):
Samuel Bentum, PhD
Chris Gessner, PhD
Rajarshi Guha, PhD (Vertex)
Qian Zhu, PhD (Mayo Clinic)
Varsha Kulkarni, PhD (Harvard)
Xiao Dong, PhD (UIC)
Huijun Wang, PhD (Pfizer)
Pulan Yu, PhD (Dow Agrochemical)
Bin Chen, PhD (Michigan State)
Hari Machina, PhD (Amgen)
Jae Hong Shin, PhD (NetTargets)
Anurag Passi, PhD (UCSD)
Dazhi Jiao (Amazon)
Alex Christou
Stefan Furrer (Givaudan)
Natalie Franklin (Lilly)
PROJECTS
- Breaking down data silos. Includes the landmark Chem2Bio2RDF project, which uses semantic technologies to integrate networks of chemistry, biology and biomedical data.
- Association finding across data silos using highly novel path-based prediction tools. Includes the SLAP algorithm and recent random walk methods.
- Integrative data mining in healthcare data, including biomedical, adverse event, and electronic medical records.
- Big data mining for Automated Chemical Synthesis. Cheminformatics big data approaches to the next generation of chemical synthesis.
- Integrative data science for disaster risk, resilience and expenditure. Helping local communities and federal agencies plan better for climate change and make better use of disaster recovery funds to increase resilience.
- Integrative data mining for emergency response. Mining critical information for emergency responders.
NEWS & EVENTS
- Christopher Gessner successfully defends his PhD dissertation, "An Integrative in silico Approach to Preclinical Drug Discovery", on May 26, 2023. Congratulations Dr. Gessner!
- Samuel Bentum successfully defends his PhD dissertation, "Digital Transformation Strategies for Applied Science Domains", on April 11, 2023. Congratulations Dr. Bentum!
- Jeremy Yang, PhD, appointed Adjunct Professor in the Department of Informatics and Computing, September 2022.
- Abhik Seal, PhD, appointed Adjunct Professor in the Department of Informatics and Computing, September 2022.
- Jeremy Yang successfully defends his PhD dissertation, "Evidence evaluation in biomedical knowledge graphs for pharmaceutical discovery", on March 13, 2022. Congratulations Dr. Yang!
LATEST & SELECTED PUBLICATIONS
- Jeremy J. Yang, Christopher R. Gessner, Joel L. Duerksen, Daniel Biber, Jessica L. Binder, Murat Ozturk, Brian Foote, Robin McEntire, Kyle Stirling, Ying Ding & David J. Wild. Knowledge graph analytics platform with LINCS and IDG for Parkinson's disease target illumination. BMC Bioinformatics, 2022.
- Gao, Z., Fu, G., Ouyang, C., Tsutsui, S., Liu, X., Yang, J., Gessner, C., Foote, B., Wild, D.J., Yu, Q., and Ding, Y. Edge2vec: Representation learning using edge semantics for biomedical knowledge discovery. BMC Bioinformatics, 2019, 20:306.
- Meng, G., Huang, Y., Yu, Q., Ding, Y., Wild, D., Zhao, Y., Liu, X., Min, S. Adopting Literature-based Discovery on Rehabilitation Therapy Repositioning for Stroke. Frontiers in Neuroinformatics, March 2019. https://doi.org/10.3389/fninf.2019.00017
- Seal, A., Wild, D.J. Netpredictor: R and Shiny package to perform drug-target network analysis and prediction of missing links. BMC Bioinformatics, 2018, 19(1), 265.
- Passi, A., Rajput, N.K., Wild, D.J., Bhardwaj, A. RepTB: a gene ontology based drug repurposing approach for tuberculosis. Journal of Cheminformatics, 2018, 10:24.
- Correia, R.B., de Araújo, L.P., Mattos, M.M., Wild, D., Rocha, L.M. City-wide Analysis of Electronic Health Records Reveals Gender and Age Biases in the Administration of Known Drug-Drug Interactions. arXiv:1803.03571, March 2018.
- Djokic-Petrovic, M., Cvjetkovic, V., Yang, J., Zivanovic, M., Wild, D.J. PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets. Journal of Biomedical Semantics, 2017, 8(1), 42.
- Kulkarni, V., Wild, D.J. An activity canyon characterization of the pharmacological topography. Journal of Cheminformatics, 2016, 8(1), 41.
- Fox, G., Maini, S., Rosenbaum, H., Wild, D. Data Science and Online Education. 2015 IEEE 7th International Conference on Cloud Computing Technology and Science.
- Seal, A., Ahn, Y.Y., Wild, D.J. Optimizing drug-target interaction prediction based on random walk on heterogeneous networks. Journal of Cheminformatics, 2015, 7(1), 1.
- Lee, J.A., et al. Novel phenotypic outcomes identified for a public collection of approved drugs from a publicly accessible panel of assays. PLoS One, 2015, 10(7), e0130796.
EDUCATIONAL PROJECTS & RESOURCES
IU Data Science Program
MS and Certificate Programs in Data Science at Indiana University SOIC
Data Science in Drug Discovery, Health and Translational Medicine.
An online IU course with freely accessible online resources
Cheminformatics OLCC
An NSF project to develop a radical new hybrid online/local approach for teaching cheminformatics to undergraduate chemistry students
LearnCheminformatics.com
Free and low cost resources for learning cheminformatics, including an eBook and course materials
IDSL TOOLS
NetPredictor
R package for prediction of missing links in any given unipartite or bipartite network using Random Walk with Restart and Network inference algorithm.
NetPredictor is described in Seal, A. et al., BMC Bioinformatics, 2018, 19(1), 265.
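For readers unfamiliar with Random Walk with Restart, the idea behind this kind of missing-link prediction can be sketched in a few lines of Python. This is an illustrative sketch only, not the NetPredictor implementation (which is an R package): the function name `random_walk_with_restart`, the `restart_prob` value, and the four-node drug-target network are all assumptions made up for the example.

```python
import numpy as np

def random_walk_with_restart(adjacency, seed_index, restart_prob=0.3,
                             tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker that starts at
    seed_index and jumps back to it with probability restart_prob each step.
    (Hypothetical helper for illustration; not part of NetPredictor.)"""
    adjacency = np.asarray(adjacency, dtype=float)
    # Column-normalise so each column holds the outgoing transition probabilities.
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # leave isolated nodes as all-zero columns
    transition = adjacency / col_sums

    restart = np.zeros(adjacency.shape[0])
    restart[seed_index] = 1.0
    p = restart.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart_prob) * (transition @ p) + restart_prob * restart
        if np.abs(p_next - p).sum() < tol:  # L1 change below tolerance: converged
            return p_next
        p = p_next
    return p

# Toy bipartite network (assumed data): nodes 0-1 are drugs, nodes 2-3 are targets.
# Known links: drug0-target2, drug1-target2, drug1-target3.
adjacency = np.array([
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
])

scores = random_walk_with_restart(adjacency, seed_index=0)
# Candidate new targets for drug0 are targets it is not already linked to,
# ranked by steady-state score.
candidates = [t for t in (2, 3) if adjacency[0, t] == 0]
ranked = sorted(candidates, key=lambda t: scores[t], reverse=True)
print(scores, ranked)
```

Unlinked nodes that the walker visits often, because they are reachable from the seed through many short paths, are the ones proposed as likely missing links.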
SLAP
A missing-link prediction tool derived from social networking that uses the Chem2Bio2RDF network to predict associations between drugs and gene targets.
SLAP is described in Chen, B. et al., PLoS Computational Biology, 2012, 8(7), e1002574
Chem2Bio2RDF
A searchable semantic network of public drug discovery data linking chemical compounds with genes, diseases, targets, pathways and adverse effects that allows cross-dataset querying using SPARQL.
Chem2Bio2RDF is described in Chen, B. et al., BMC Bioinformatics 2010, 11, 255. The associated Chem2BioOWL ontology is described in Chen, B., et al., Journal of Cheminformatics 2012, 4:6.
Drug Repurposing Explorer - A prototype tool for ranking known drugs to queries using fused similarity of chemical structure fingerprints, side effects, biological targets, shape, and disease association
Bioterm Literature Association Score Calculator (BLASC). Predicts association between genes, drugs and diseases from data mining of recent PubMed scholarly journal articles using a BioLDA Topic Model. You can specify a start node (e.g. a gene), and intermediate and end node types, and the tool will produce a list of the most strongly associated end node types (e.g. drugs). The BioLDA algorithm is described in our recent paper Wang, H. et al., PLoS One, 2011, 6(3), e17243.
Drugbank Semantic Faceted Browser - A prototype semantic browsing tool that demonstrates how semantic annotation can be used for faceted browsing of drug data. Currently works on a small subset of Drugbank as a proof-of-concept only.
WENDI - a tool for finding non-obvious relationships between chemical compounds and biology that aggregates information from databases, extracted from the literature, and computational predictions. Funded by Eli Lilly. Extended to use an RDF inference engine to make predictions of compound-disease relationships using a rule-base. For more information, see our recent papers Zhu et al., Journal of Cheminformatics, 2010, 2:6 and Zhu et al., BMC Bioinformatics, 2011, 12, 256.
PARTNERS & RECENT SUPPORT
IDSL is affiliated with the IU Network Science Institute (IUNI)
Informatics in Disasters and Emergency Response
An online IU course with freely accessible online resources
IDSL works in close collaboration with IU's Web Science Lab, led by Dr Ying Ding
We are an affiliated partner of the EU OpenPHACTS project