Chloroplast genome sequencing and features of Cyclamen species
Nine complete chloroplast genome sequences representing five species were deposited in GenBank under the accession numbers ON480518, ON480519, ON480520, ON480521, ON480522, ON480523, OP957067, OP957068 and OP957069. The assembly results were uniform, and validation of the assemblies by mapping reads to the assembled sequences is shown in the supplementary materials (Supplementary Fig. S1). The total chloroplast genome size ranged from 151,626 bp (C. coum) to 153,058 bp (C. hederifolium with white green septal striped leaves) (Fig. 1). The Cyclamen chloroplast genome has a typical quadripartite structure comprising a pair of inverted repeat (IR) regions (25,321–25,480 bp each), a large single-copy (LSC) region (82,653–83,976 bp), and a small single-copy (SSC) region (18,182–18,465 bp) (Fig. 2).
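As a quick consistency check on the quadripartite structure, the total genome length should equal LSC + SSC + 2 × IR. The sketch below uses only the region ranges quoted above; the function name is illustrative, and because the per-sample region lengths are not given in the text, it can only bound the total size rather than reproduce each genome exactly.

```python
# Region-length ranges (bp) as reported in the text: (minimum, maximum).
IR = (25_321, 25_480)
LSC = (82_653, 83_976)
SSC = (18_182, 18_465)

def genome_size(lsc: int, ssc: int, ir: int) -> int:
    """Length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir

# Smallest and largest totals that the reported region ranges allow.
lower = genome_size(LSC[0], SSC[0], IR[0])
upper = genome_size(LSC[1], SSC[1], IR[1])

# The reported genome sizes should fall inside these bounds.
reported = (151_626, 153_058)
consistent = lower <= reported[0] <= reported[1] <= upper
print(lower, upper, consistent)
```

The reported size range falls within the bounds implied by the region ranges, as expected if each sample combines region lengths somewhere between the extremes.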
Long repeat and SSR analysis
Repeat sequences in the Cyclamen cp genomes were detected with REPuter (https://bibiserv.cebitec.uni-bielefeld.de/reputer), with a minimum repeat size of 30 bp. In C. coum, 38 long repeats were detected, consisting of 16 forward repeats and 22 palindromic repeats, whereas in C. hederifolium 27 long repeats were detected, consisting of 11 forward repeats, 15 palindromic repeats, and 1 reverse repeat (Fig. 3). The five C. hederifolium samples shared the same SSR features, but SSR number and type varied among the five Cyclamen species (Table 1).
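To illustrate what an SSR count by number and type involves, the following is a minimal sketch of perfect-SSR detection with regular expressions. The text does not state which tool or thresholds were used for the SSR analysis, so the minimum repeat counts in `MIN_REPEATS` and the function name `find_ssrs` are assumptions chosen for illustration only, and the input is a toy sequence.

```python
import re

# Hypothetical minimum repeat counts per motif length (1-6 bp). The text does
# not give the thresholds actually used, so these values are illustrative.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR (0-based start)."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so each tract is reported once under its shortest motif.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits

# Toy sequence: a 12-bp poly-A tract and an (AT)6 tract.
toy = "GC" + "A" * 12 + "GCG" + "AT" * 6 + "CG"
ssrs = find_ssrs(toy)
print(ssrs)
```

Tallying the returned motifs by length (mono-, di-, trinucleotide and so on) would give per-sample counts comparable in form to those summarized in Table 1.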
The chloroplast genome of Cyclamen has 128 genes, including 84 protein-coding genes, 36 transfer RNA genes, and eight ribosomal RNA genes. Six protein-coding genes (rps7, rps12, rpl2, rpl23, ndhB, and ycf2), seven tRNA genes (trnA-UGC, trnI-GAU, trnI-CAU, trnL-CAA, trnN-GUU, trnR-ACG, and trnV-GAC) and all four rRNA genes are duplicated in the IR regions. Fifteen genes (trnA-UGC, trnI-GAU, trnK-UUU, trnL-UAA, trnV-UAC, rps12, rps16, rpl2, rpl16, rpoC1, petB, petD, atpF, ndhA, and ndhB) contain a single intron, and two genes (clpP and ycf3) have two introns (Table 2).
Phylogenetic analysis reveals inter- and intraspecific variation
A total of 28 Myrsinoideae and 9 related cp genomes were included in the phylogenetic analysis. The molecular phylogenetic tree showed that Cyclamen is monophyletic, is placed within Myrsinoideae, and is closely related to the genera Lysimachia and Glaux (Fig. 4). The ten genera in Myrsinoideae formed four clades. The first includes Aegiceras and Myrsine. The second consists of Embelia, Ardisia, Tapeinosperma, and Elingamita. Cyclamen forms the third clade, and the fourth clade includes Lysimachia and Glaux. Notably, Lysimachia and Glaux are not reciprocally monophyletic: Glaux is embedded within Lysimachia, which indicates that accurate classification of these two genera requires further study.
We calculated nucleotide diversity (Pi) values to measure the divergence levels of protein-coding genes and intergenic regions among the five Cyclamen species. Protein-coding genes were more conserved than intergenic regions: Pi ranged from 0 to 0.02222 in protein-coding genes and from 0 to 0.10925 in intergenic regions (Fig. 5). Three genes and four intergenic regions were selected for interspecific relationship analysis because of their relatively high Pi values and their suitability for PCR primer design (Table 3).
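For readers unfamiliar with Pi, it is the average number of nucleotide differences per site between all pairs of sequences in an alignment. The sketch below computes it for a toy alignment; the function name and the handling of gaps (columns with any gap or ambiguous base are dropped) are assumptions of this example, not a description of the software or settings used in the study.

```python
from itertools import combinations

def nucleotide_diversity(aligned):
    """Average pairwise differences per site (Pi) for aligned sequences.

    Assumption of this sketch: any column containing a gap or an ambiguous
    base is excluded before counting (complete deletion).
    """
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    seqs = [s.upper() for s in aligned]
    # Keep only columns in which every sequence has an unambiguous base.
    cols = [c for c in zip(*seqs) if all(b in "ACGT" for b in c)]
    if not cols:
        return 0.0
    pairs = list(combinations(range(len(seqs)), 2))
    diffs = sum(col[i] != col[j] for col in cols for i, j in pairs)
    return diffs / (len(pairs) * len(cols))

# Toy alignment of three 8-bp sequences (not real Cyclamen data).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGTTCGA"]
pi = nucleotide_diversity(toy_alignment)
print(round(pi, 5))
```

Applying such a function to each gene or intergenic alignment in turn would yield one Pi value per region, which is the form of the comparison in Fig. 5.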
A phylogenetic tree of 19 samples representing 14 species was constructed from three genes (ycf1 was divided into three fragments because it is too long to amplify and sequence as a single piece) and four intergenic regions. Different samples of the same species, such as C. hederifolium, C. cyprium, C. coum and C. rohlfsianum, formed separate branches, indicating that these markers are effective for phylogenetic reconstruction. The 14 Cyclamen species formed five clades. The first includes C. hederifolium, C. colchicum and C. purpurascens. The second comprises C. cyprium, C. pseudibericum, C. coum, C. intaminatum, C. alpinum and C. mirabile. The third consists of C. creticum and C. balearicum. The fourth includes C. rohlfsianum and C. persicum. C. graecum alone forms the fifth clade (Fig. 6).
The gene rpl22 shows high interspecific variation and low intraspecific variation in Cyclamen. The amplified fragment of this gene is 448 bp long and contains 38 variable nucleotide positions within the genus. A single sequence of this gene can therefore be used to identify species, which makes identification simple, efficient, and low-cost (Supplementary Fig. S2).
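The idea of a single-locus barcode can be sketched as two checks on an alignment: how many columns vary, and whether any identical sequence is shared by different species. The code below is a minimal illustration on made-up sequences; the function names, species labels and data are hypothetical and do not reproduce the rpl22 alignment.

```python
def variable_sites(aligned):
    """Return 0-based indices of alignment columns with more than one base."""
    return [i for i, col in enumerate(zip(*aligned)) if len(set(col)) > 1]

def discriminates(samples):
    """True if no identical sequence is shared by two different species.

    `samples` is a list of (species, aligned_sequence) pairs.
    """
    owner = {}
    for species, seq in samples:
        # Record the first species seen for each sequence; a later mismatch
        # means two species share that sequence.
        if owner.setdefault(seq, species) != species:
            return False
    return True

# Toy data: two samples of one species plus two other species.
toy_samples = [
    ("sp_a", "ACGTAC"),
    ("sp_a", "ACGTAC"),
    ("sp_b", "ACCTAC"),
    ("sp_c", "ACGTTC"),
]
sites = variable_sites([seq for _, seq in toy_samples])
ok = discriminates(toy_samples)
print(sites, ok)
```

A marker with many variable sites between species and none within species would pass the second check for every sampled species, which is the property described for rpl22 above.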