Data Science and Informatics Core for Cancer Research

Resources


Data

DSICCR and SBMI have established a Data Service Coordination Office that will host, manage, and provide access to a range of data for cancer research, from electronic health records to -omics data. The SBMI Data Service Coordination Office is directed by DSICCR co-investigator Dr. Hua Xu. For more information or to access data, please visit the office's website at: https://sbmi.uth.edu/sbmi-data-service/



Hardware Infrastructure

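DSICCR has built a robust computing infrastructure for cancer research. The infrastructure includes the following state-of-the-art computing hardware for advanced data science research.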

  1. An EXXACT TS4 deep learning certified system with 24 CPU cores, 9 Nvidia RTX 2080 Ti GPUs (43,520 cores), 376 GB memory, and 12 TB internal storage
  2. The latest NVIDIA DGX-A100 GPU servers, each with 5 petaFLOPS AI / 10 petaOPS INT8 performance, 8 of the latest NVIDIA A100 GPUs, 12 NVLinks per GPU (600 GB/s GPU-to-GPU bandwidth), 320 GB total GPU memory, NVIDIA CUDA cores: 65,536, NVIDIA Tensor cores: 4,096, 1 TB DDR4 memory, and 30 TB SSD storage
  3. One SuperMicro 8-way system with 448 CPU cores, 1 Xilinx Alveo U250 FPGA, 6 TB system memory, and 60 TB of local SSD drives
  4. 1 PB of storage provided by NFS servers and multiple direct-attached devices
  5. One 36-compute-node Dell EMC Hadoop cluster with a total of 864 compute cores, 12.8 TB of combined memory, and 1.5 PB (raw) of storage

These advanced systems are connected by 10 Gbps and 25 Gbps high-speed networks that allow fast transfer of terabyte-scale data files, supporting advanced computational needs with intensive memory and extreme parallelization requirements.

In addition, the Texas Advanced Computing Center (TACC) is accessible to SBMI researchers through high-speed network (Internet II) connections. TACC is equipped with many robust high-performance computing systems, including Frontera, the fifth most powerful supercomputer in the world (2019 ranking). TACC's science environment includes high-performance computing, visualization, data analysis, storage systems, software, and portal interfaces that enable researchers to answer questions more efficiently and effectively using advanced computing resources. TACC provides systems and software to researchers and has worked on over 3000 projects by more than 1000 researchers at over 350 institutions nationally and worldwide that address scientific concepts to improve the quality of life.

Advanced hardware infrastructure for AI and big data



Software/Algorithms by DSICCR Faculty

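DSICCR faculty conduct cutting-edge data science and informatics research and have developed many software tools and algorithms for biomedical data analysis. These tools are listed below and are available to cancer researchers.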

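Name Description
MSEA: Mutation Set Enrichment Analysis MSEA identifies cancer driver genes by detecting somatic mutation hotspots.
  1. Faculty/PI Name: Peilin Jia
  2. Publications: Jia P, Wang Q, Chen Q, Hutchinson K, Pao W, Zhao Z (2014) MSEA: detection and quantification of mutation hotspots through mutation set enrichment analysis. Genome Biology 15(10): 489
  3. Website Link: https://github.com/bsml320/MSEA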

VarWalker VarWalker performs mutation network analysis of putative cancer genes from next-generation sequencing data.
  1. Faculty/PI Name: Peilin Jia
  2. Publications: Jia P, Zhao Z (2014) VarWalker: Personalized Mutation Network Analysis of Putative Cancer Genes from Next-generation Sequencing Data. PLoS Computational Biology 10(2): e1003460
  3. Website Link: https://bioinfo.uth.edu/VarWalker.html

GATHER GATHER is a Gene Annotation Tool to Help Explain Relationships.
  1. Faculty/PI Name: Jeffrey Chang
  2. Publications:
    Chang, J. and Nevins, J. (2006). GATHER: a systems approach to interpreting genomic signatures. Bioinformatics, 22(23), pp.2926-2933.
  3. Website Link: http://changlab.uth.tmc.edu/gather/

SIGNATURE It is a web-based resource that simplifies gene expression signature analysis by providing software, data, and protocols to perform the analysis successfully.
  1. Faculty/PI Name: Jeffrey Chang
  2. Publications:
    Chang JT, Gatza ML, Lucas JE, Barry W, Vaughn P, and Nevins JR. "SIGNATURE: A Workbench for Gene Expression Signature Analysis." BMC Bioinformatics 12(443), 2011
  3. Website Link: https://uth.tmc.edu
TREC Precision Medicine An information retrieval (IR) tool for finding relevant precision medicine scientific literature and clinical trials for specific cancer patients.
  1. Faculty/PI Name: Kirk Roberts
  2. Publications:
    • Roberts K, Demner-Fushman D, Voorhees E, Hersh WR, Bedrick S, Lazar A, Pant S. (2017). Overview of the TREC 2017 Precision Medicine Track. Proceedings of the Text Retrieval Conference.
    • Roberts K. Automatic Identification of Cancer Precision Medicine Literature Articles. In Submission.
Cancer FrameNet A natural language processing (NLP) information extraction tool for frame-based cancer information written in clinical text.
  1. Faculty/PI Name: Kirk Roberts
  2. Publications:
    • Roberts K, Si Y, Gandhi A, Bernstam EV. (2018). A FrameNet for Cancer Information in Clinical Narratives: Schema and Annotation. Proceedings of the Language Resources and Evaluation Conference.
    • Si Y, Roberts K. A Frame-Based NLP System for Cancer-Related Information Extraction. In Submission.
EpiphaNet EpiphaNet is an interactive knowledge discovery system that enables researchers to visually explore sets of relations extracted from MEDLINE using a combination of language processing techniques.
  1. Faculty/PI Name: Trevor Cohen
  2. Publications:
    • Cohen, T., Whitfield, G. K., Schvaneveldt, R. W., Mukund, K., & Rindflesch, T. (2010). EpiphaNet: An Interactive Tool to Support Biomedical Discoveries. Journal of Biomedical Discovery and Collaboration, 5, 21–49.
CLAMP: Clinical Language Annotation, Modeling, and Processing Toolkit CLAMP is a comprehensive clinical Natural Language Processing (NLP) software package that enables recognition and automatic encoding of clinical information in narrative patient reports.
  1. Faculty/PI Name: Hua Xu
  2. Publications:
    • Ergin Soysal, Jingqi Wang, Min Jiang, Yonghui Wu, Serguei Pakhomov, Hongfang Liu, Hua Xu. CLAMP – a toolkit for efficiently building customized clinical natural language processing pipelines. JAMIA, Doi: 10.1093/jamia/
    • Jun Xu*, Hee-Jin Lee*, Zongcheng Ji*, Jingqi Wang, Qiang Wei, and Hua Xu. UTH_CCB System for Adverse Drug Reaction Extraction from Drug Labels at TAC-ADR 2017. Proceedings of TAC, 2017. (* denotes equal contribution)
  3. Website Link: https://clamp.uth.edu/
SBMI Data Service SBMI Data Service provides qualified clients with technical assistance and consulting services for the health datasets available at the school.
R, SAS We have developed computer code in R and SAS for various projects in cancer research and other fields.
  1. Faculty/PI Name: Dejian Lai
  2. Publications:

A. Recent Projects in Cancer Research:

  1. Early Life Exposures to Air Toxics and Risk of Early Childhood Leukemia (2018)
  2. SPATIAL ANALYSIS OF AMBIENT BENZENE AND CANCER INCIDENCE RATES IN TEXAS (student thesis, graduated in 2017).
  3. Longitudinal Study of Melatonin, Cortisol and risk of Colorectal Cancer (proposal submitted in 2017)
  4. Hazardous Air Pollutants and Lymphohematopoietic Cancer Incidence in Houston (NIH funded project).
  5. Examining Lung Cancer Incidence in Texas Using Spatial Linear Models with SAR and CAR Structures (student thesis, graduated in 2017)

B. Selected Cancer-Related Publications from the Past Five Years:

  1. Symanski E, Lewis PGT, Chen TY, Chan W, Lai D, Ma XM: Air Toxics and Early Childhood Acute Lymphocytic Leukemia in Texas, A Population Based Case Control Study. Environmental Health: A Global Access Science Source. 2016, Vol. 15, No. 70
  2. Tong L, Ahn C, Symanski E, Lai D, Du XL. Relative Impact of Earlier Diagnosis and Improved Treatment on Survival for Colorectal Cancer: A US database Study among Elderly Patients. Cancer Epidemiology. 2014, Vol. 38, 733-740.
  3. Tong LY, Ahn C, Symanski E, Lai D, Du XL: Temporal Trends in the Leading Causes of Death among a Large National Cohort of Patients with Colorectal Cancer from 1975 to 2009 in the United States. Annals of Epidemiology. 2014, Vol. 24, 411-417
  4. Tong LY, Ahn C, Symanski E, Lai D, Du XL: Effects of Newly Developed Chemotherapy Regimens, Comorbidities, Chemotherapy-related Toxicity on the Changing Patterns of the Leading Causes of Death in Elderly Patients with Colorectal Cancer. Annals of Oncology. 2014, Vol. 25, 1234-1242.
  5. Wang GD, Lai D, Burau K, Du XL: Potential Gains in Life Expectancy from Reducing Heart Disease, Cancer, Alzheimer's Disease, Kidney Disease or HIV/AIDS as Major Causes of Death in the USA. Public Health. 2013, Vol. 127, 348-356.
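Machine Learning tools for Longitudinal Brain Connectivity Methods for identifying neuroplasticity patterns in the brain are essential for understanding and potentially treating disease. Diffusion tensor imaging (DTI) allows in vivo estimation of the brain's structural connectome and can quantify degenerative processes before clinical symptoms appear. We have developed novel machine learning-based strategies for computing longitudinal structural connectomes in order to discover and quantify longitudinal patterns.
  1. Faculty/PI Name: Luca Giancardo
  2. Publications:
    • Giancardo, L.*, Ellmore, T. M., Suescun, J., Ocasio, L., Kamali, A., Riascos-Castaneda, R. & Schiess, M. C. Longitudinal connectome-based predictive modeling for REM sleep behavior disorder from structural brain connectivity. Proc. SPIE Med. Imaging (2018).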

Machine learning-based image (and video) segmentation We develop and adapt pipelines for image and video segmentation. These pipelines can be adapted to multiple use cases by leveraging machine learning techniques that learn from examples. These tools allow high-throughput quantitative analysis of large datasets. We have experience with optical images, MRI, and videos.
  1. Faculty/PI Name: Luca Giancardo
  2. Publications:

Here are some examples of new image segmentation pipelines developed


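Image/signal-based computational biomarker development Using modern machine learning methods, we can discover patterns in unstructured image or temporal signal data and develop novel computational biomarkers for generating hypotheses or predicting outcomes.
  1. Faculty/PI Name: Luca Giancardo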
  2. Publications:
    • L Giancardo*, K Roberts and Z Zhao, "Representation Learning for Retinal Vasculature Embeddings". Fetal, Infant and Ophthalmic Medical Image Analysis. FIFI 2017, OMIA 2017. Lecture Notes in Computer Science, vol. 10554. Springer, Cham, 2017.
    • T Arroyo-Gallego, M Ledesma-Carbayo, A Sanchez-Ferro, I Butterworth, C Mendoza, M Matarazzo, P Montero, R Lopez-Blanco, V Puertas-Martin, R Trincado and L Giancardo*, "Detection of Motor Impairment in Parkinson's Disease via Mobile Touchscreen Typing", IEEE Transactions on Biomedical Engineering, 64, 1994–2002, 2017.
    • L Giancardo*, A Sanchez-Ferro, T. Arroyo-Gallego, I. Butterworth, C.S. Mendoza, P. Montero, M. Matarazzo, A. Obeso, M. L. Gray and San José Estepar, “Computer keyboard interaction as an indicator of early Parkinson's disease”, Scientific Reports, 6(34468), 2016.
    • L Giancardo*, A Sanchez-Ferro, I Butterworth, C Sanchez-Mendoza and J M Hooker, “Psychomotor Impairment Detection via Finger Interactions with a Computer Keyboard”, Scientific Reports, 5(9678), 2015.
Genome3D Project Genome3D is a model-view framework for displaying genomic and epigenomic data within a three-dimensional physical model of the human genome.
  1. Faculty/PI Name: Jim Zheng
  2. Publications:
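    • Asbury, T., Mitman, M., Tang, J., & Zheng, W. (2010). Genome3D: A viewer-model framework for integrating and visualizing multi-scale epigenomic information within a three-dimensional genome. BMC Bioinformatics, 11(1), 444. http://dx.doi.org/10.1186/1471-2105-11-444
  3. Website Link: http://www.genome3d.org/
Ontology Fingerprint The ontology fingerprint of a gene or disease is the set of Gene Ontology terms found in the PubMed abstracts associated with that gene or disease, together with their corresponding enrichment p-values.
  1. Faculty/PI Name: Jim Zheng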
  2. Publications:
    • Tsoi, L., Boehnke, M., Klein, R., & Zheng, W. (2009). Evaluation of genome-wide association study results through development of ontology fingerprints. Bioinformatics, 25(10), 1314-1320. http://dx.doi.org/10.1093/bioinformatics/btp158
    • Qin, T., Matmati, N., Tsoi, L., Mohanty, B., Gao, N., & Tang, J. et al. (2014). Finding pathway-modulating genes from a novel Ontology Fingerprint-derived gene network. Nucleic Acids Research, 42(18), e138-e138. http://dx.doi.org/10.1093/nar/gku678
  3. Website Link: http://www.ontologyfingerprint.org/
RaPID RaPID is an ultra-fast tool for the identification of identity-by-descent segments among genotyped individuals.
  1. Faculty/PI Name: Degui Zhi
  2. Publications:
    • Naseri, A., Liu, X., Zhang, S., & Zhi, D. (2017). Ultra-fast Identity by Descent Detection in Biobank-Scale Cohorts using Positional Burrows-Wheeler Transform. http://dx.doi.org/10.1101/103325
  3. Website Link: https://github.com/ZhiGroup/RaPID
hapseq2 hapseq2 is a program for genotype calling and haplotype phasing from next-generation sequencing data using haplotype information from jumping reads.
  1. Faculty/PI Name: Degui Zhi
  2. Publications:
  3. Website Link: https://github.com/zhigroup/hapseq2