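2023-06-05 Washington State University (WSU)
◆In wheat research in particular, handling multiple genome sequences and variation in gene positions had been a challenge, and BRIDGEcereal addresses that problem. The web app can also be applied to other cereal crops and could bring innovation to crop improvement.
<Related information>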
- https://news.wsu.edu/news/2023/06/05/self-teaching-web-app-improves-speed-accuracy-of-classifying-cereal-variety-dna-variations/
- https://www.cell.com/molecular-plant/fulltext/S1674-2052(23)00139-9
Streamline unsupervised machine learning to survey and graph indel-based haplotypes from pan-genomes
Bosen Zhang, Haiyan Huang, Laura E. Tibbs-Cortes, Adam Vanous, Zhiwu Zhang, Karen Sanguinet, Kimberly A. Garland-Campbell, Jianming Yu, Xianran Li
Molecular Plant. Published: May 17, 2023
DOI: https://doi.org/10.1016/j.molp.2023.05.005
Dear Editor,
Pan-genomes with high-quality de novo assemblies are shifting the paradigm of biology research in genome evolution, speciation, and function annotation (Shi et al., 2023). An all-vs.-all comparison across assemblies potentially overcomes the limitation of mapping short reads to a single assembly in cataloging polymorphisms, especially large insertions and deletions (indels) that contribute to phenotypic variation by altering gene structure or expression (Chen et al., 2021). However, for specific genes, surveying and graphing large indels across assemblies are challenging and painstaking tasks (Mahmoud et al., 2019). Here, we constructed an interactive web app, BRIDGEcereal (https://bridgecereal.scinet.usda.gov/), to expedite this process by streamlining unsupervised learning.
A large indel is flanked by two high-scoring segment pairs (HSPs). We devised two unsupervised machine learning algorithms to identify large indels (Figure 1A). The first algorithm, clustering HSPs for ortholog identification via coordinates and equivalence (CHOICE; Figure 1B), identifies and extracts the segment harboring the ortholog from each assembly. The segments are then subjected to an all-vs.-all comparison to survey potential large indels. The second algorithm, clustering via large-indel permuted slopes (CLIPS; Figure 1C and Supplemental Figure 1), groups segments sharing the same set of indels to graph a concise haplotype depiction for visualizing potential large indels, their impacts on the gene, and relationships among haplotypes. For indels outside of genes, because their sizes and locations are unknown, multiple iterations may be needed to obtain the optimal haplotype graph by probing different upstream and downstream search boundaries and the order of haplotypes (Supplemental Figure 2). Through the interactive graphical user interface of BRIDGEcereal, these parameters can be instantly adjusted based on visual inspection.
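The idea that a large indel sits between two flanking HSPs, and that segments sharing the same indel set form one haplotype, can be illustrated with a minimal Python sketch. This is not the authors' CHOICE or CLIPS implementation: the `HSP` record, the function names `large_indels` and `group_by_indels`, the 50 bp `min_size` threshold, and the exact-match grouping rule are all assumptions made for illustration. The sketch also assumes same-strand, collinear HSPs with 1-based inclusive coordinates.

```python
from collections import defaultdict
from typing import NamedTuple


class HSP(NamedTuple):
    """One high-scoring segment pair (hypothetical record, 1-based inclusive)."""
    q_start: int  # start on the query (reference) segment
    q_end: int    # end on the query segment
    s_start: int  # start on the subject (other assembly) segment
    s_end: int    # end on the subject segment


def large_indels(hsps, min_size=50):
    """Return (query_position, signed_size) for each large gap between
    consecutive HSPs. A positive size means extra sequence in the subject
    (insertion); a negative size means sequence missing from the subject
    (deletion)."""
    ordered = sorted(hsps, key=lambda h: h.q_start)
    indels = []
    for left, right in zip(ordered, ordered[1:]):
        q_gap = right.q_start - left.q_end - 1  # unaligned bases on the query
        s_gap = right.s_start - left.s_end - 1  # unaligned bases on the subject
        size = s_gap - q_gap
        if abs(size) >= min_size:
            indels.append((left.q_end, size))
    return indels


def group_by_indels(hsps_by_assembly, min_size=50):
    """Group assemblies whose segments carry exactly the same indel set."""
    groups = defaultdict(list)
    for name, hsps in hsps_by_assembly.items():
        groups[tuple(large_indels(hsps, min_size))].append(name)
    return dict(groups)


# Toy input: four hypothetical assemblies aligned to one reference segment.
example = {
    "A": [HSP(1, 1000, 1, 1000), HSP(1001, 2000, 1001, 2000)],  # no indel
    "B": [HSP(1, 1000, 1, 1000), HSP(1001, 2000, 1501, 2500)],  # 500 bp insertion
    "C": [HSP(1, 1000, 1, 1000), HSP(1001, 2000, 1501, 2500)],  # same as B
    "D": [HSP(1, 800, 1, 800), HSP(1201, 2000, 801, 1600)],     # 400 bp deletion
}
haplotypes = group_by_indels(example)
```

In this toy input, assemblies B and C fall into one group because they share the same insertion. Real alignments would need a tolerance on breakpoint positions and sizes rather than exact matching, which is one reason a clustering approach is used in practice.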