In total, the small RNA-seq yielded 552.5 million raw reads for the 228 sequenced samples, ranging from 1.1 to 6.6 million per sample. Preprocessing decreased the amount of reads to 451.9 million and the per-sample range to 0.84–4.2 million. A total of 391.0 million reads aligned to the human genome, with the per-sample range from 0.73 to 3.9 million. Finally, a total of 13.3 million reads counted to all 5331 sncRNAs, or 6.6 million to those 4799 short RNAs that did not overlap with an exon of a non-short RNA, of which 4.72, 1.67, and 0.068 million represented the main sncRNA classes (gene_types), snoRNA, miscRNA, and snRNA, respectively. Vst-normalized expression for those 1949 genes that had any reads in any sample and key clinical parameters for all 228 samples are available as Supplementary Table S2. Since the sequencing was performed as 40nt reads, the reads represent fragments of sncRNAs instead of the whole-length RNAs. The RNA-seq measurements for select top DE small RNAs, detailed in subsequent sections, were validated using comparison to independent RNA-seq data and/or by RT-qPCR. As shown in the violin plot in Supplementary Fig. S1A for 6 small RNAs, both the means and variances of expression in invasive BC tumors corresponded very well between the KBCP (n = 186) and the independent ILRS (n = 40) materials. Furthermore, the expression measured by RT-qPCR of four of the above genes, SNORA80E (ENSG00000207475.1), SNORD103B (NCBI: 692235), SNORD59A (ENSG00000207031.1), and SNORD104 (ENSG00000199753.1) correlated statistically significantly with the respective small RNA‐seq measurements (Supplementary Fig. S1B).
The preliminary view using the UMAP clustering of all 228 samples suggests that differentially expressed sncRNAs might be present for most clinical variables, but not all (Supplementary Fig. S2). For example, samples with the same clinical characteristic appear to group together for the main and more fine-grained sample subtype, tumor grade, and ER and PR status. However, such grouping appears to be absent in the case of nodal status.
sncRNAs distinguish ilBC from benign breast tissue and from metastasized BC
Altogether 63 sncRNAs were significantly (Padj < 0.01) upregulated and 33 sncRNAs downregulated in ilBC compared to benign breast tissue (Fig. 1, Supplementary Table S3).
Ninety-six sncRNAs were found to significantly (Padj < 0.01) distinguish invasive local BC from benign breast tissue. The hierarchical clustering of 186 invasive local BC and 22 benign breast tissue samples (columns) and differentially expressed sncRNAs (rows) using Pearson metrics. Log2 fold change is marked by the color scale (from blue to red).
A significant differential expression (Padj < 0.01) was also observed between metastasized BC and ilBC for five sncRNAs; two sncRNAs were upregulated and three sncRNAs downregulated in metastasized BC (M1 at diagnosis) when compared to ilBC (Supplementary Table S4).
The hormone receptor status of the tumors associates with sncRNA expression
Altogether, 47 sncRNAs were significantly DE (Padj < 0.01) in the comparisons with the hormone receptor status of the tumors.
More specifically, 26 sncRNAs were upregulated in ER negative tumors (ilBC), whereas 18 sncRNAs were upregulated in ER positive tumors (ilBC) (Supplementary Table S5, Supplementary Fig. S3A).
Additionally, 11 sncRNAs were upregulated in PR negative tumors (ilBC) and three in PR positive tumors (Supplementary Table S6, Supplementary Fig. S3A). Except for two snoRNAs (SNORA2C and AL732366.1), all the sncRNAs that associated with the PR status of the tumors associated also with the ER status of the tumors (Supplementary Table S5, Supplementary Fig. S3A).
Only one sncRNA (SNORD124) associated with HER2 positive tumors in the HER2 negative versus HER2 positive tumors comparison (P = 2.77e−06, Padj = 2.95e−03, Log2FC = − 1.020).
Thirteen sncRNAs show prominent association with the tumor grade independently of the tumors’ ER status
Thirty-nine sncRNAs in total associated (Padj < 0.01) with the tumor grade in ilBC cases (with or without adjusting for tumor ER status); 24 sncRNAs were upregulated and 15 sncRNAs were downregulated in grade III tumors compared to grade I tumors and/or grade II tumors (Supplementary Table S7, Fig. 2A,B). Of these 39 associations, 17 were shared in both comparisons (gr. III vs. gr. II, and gr. III vs. gr. I) when not adjusted for ER status (Fig. 2C). Thirteen sncRNAs associated with the tumor grade independently of the tumors’ ER status (Supplementary Table S7).
Thirty-nine sncRNAs in total associated (Padj < 0.01) with the tumor grade in ilBC cases; (A) and (B) twenty-four sncRNAs were upregulated, and 15 sncRNAs were downregulated in grade III tumors compared to grade I tumors and/or grade II tumors regardless of tumor ER status, (C) seventeen were shared in both gr. III versus gr. II, and gr. III versus gr. I when not adjusted for ER status, and (D) thirteen sncRNAs were significantly associated (Padj < 0.01) with the tumor grade independently of the ER status. (E) The hierarchical clustering of 186 invasive local BC samples (grade I n = 31, grade II n = 89, grade III n = 66) (columns) and differentially expressed sncRNAs (rows) using Euclidean metrics. Log2 fold change is marked by the color scale (from blue to red).
To summarize, the tumor grade-associated sncRNAs included five sncRNAs that were downregulated and eight sncRNAs that were upregulated in the advanced grade III tumors independently of the tumors’ ER status (Fig. 2D,E).
SNORD99 and VTRNA1-1 associate with ductal tumor histology
SNORD99 and VTRNA1-1 showed association (Padj < 0.01) with ductal carcinoma in the comparison of lobular carcinoma versus ductal carcinoma (ilBC) (P = 1.72e − 04, Padj = 9.72e − 03, Log2FC = − 0.561 and P = 1.50e − 05, Padj = 1.69e − 03, Log2FC = − 0.745, respectively). The associations were observed also in the analysis adjusted by ER status, even though they reached the statistical significance only at the level of Padj < 0.05 (P = 3.51e − 04, Padj = 2.73e − 02, Log2FC = − 0.520 and P = 7.12e − 04, Padj = 2.73e − 02, Log2FC = − 0.552, respectively).
A profile of sncRNAs define the tumors of luminal and TNBC subtype
Altogether 23 sncRNAs were significantly (Padj < 0.01) upregulated in TNBC compared to luminal BC (ilBC), whereas 26 sncRNAs were significantly upregulated in luminal BC compared to TNBC (ilBC) (Fig. 3, Supplementary Table S8, Supplementary Fig. S3B).
Forty-nine sncRNAs were found as characteristic (Padj < 0.01) to TNBC or luminal BC. The hierarchical clustering of 33 TNBC and 133 luminal BC samples (invasive local disease) (columns) and differentially expressed sncRNAs (rows) using Pearson metrics. Log2 fold change is marked by the color scale (from blue to red).
Nine of the 23 TNBC-associated sncRNAs [SNORD111, SNORD92, SNORD72, RNU2-36P, RNU5D-1, RNU5F-1, RNY4, VTRNA1-1, and ENSG00000201548.1 (Y_RNA)] were also upregulated in ER negative and in PR negative tumors compared to ER positive and PR positive tumors, respectively (Table 1, Supplementary Figs. S3B, S4–S12). All but RNY4 significantly associated also with higher grade tumors, of which RNU2-36P and RNU5D-1 independently of the ER status (Supplementary Figs. S4–S9, S11, S12). Notably, the expression levels of SNORD111, RNU5D-1, RNU5F-1, and ENSG00000201548.1 were low (Supplementary Table S2).
Additionally, the luminal BC-associated SNORA11 and SNORD104 were also upregulated in ER positive and in PR positive tumors when compared to ER negative and PR negative tumors, respectively (Table 1, Supplementary Figs. S3B, S13 and S14). Both of them significantly associated also with lower grade but only SNORA11 independently of the ER status (Supplementary Figs. S13, S14).
sncRNAs identified as candidate prognostic markers for BC; a group of them specifically for ER positive BC
Altogether 42 sncRNAs were identified as possible prognostic markers for ilBC. This group includes sncRNAs that associated with patient outcome (Overall P < 0.05 and Q4 P < 0.05) only in the analyses that were not restricted to any specific therapy groups, i.e. in analyses including all invasive cases, ER positive cases or ER negative cases separately, and/or in the cases who had received only surgery (all invasive cases who received only surgery, and ER positive cases with only surgery). Also, sncRNAs that associated with patient outcome only in the forementioned groups and in the cases who had received RT were included in these 42 sncRNAs (Supplementary Tables S9–S16, Supplementary Fig. S15A).
Among the 42 sncRNAs, 23 associated with BC prognosis independently of the ER status of the tumors; the higher level of eight, four, and seven sncRNAs associated with poorer RFS, BCSS, and OS, respectively, whereas better RFS, BCSS, and OS were associated with the higher level of seven, six, and three sncRNAs, respectively. In many of the multivariate survival analyses also other (established) prognostic factors, including, typically, the nodal status, age at diagnosis, and tumor size, were significant. However, in some of the analyses the sncRNA was more significant than e.g. the forementioned. For example, longer RFS was observed with the patients with increased SNORD6 (ENSG00000202314.1) expression in all cases with ilBC in the multivariate analysis, while nodal status was a more significant factor than SNORD6 in the analysis (Fig. 4a, univariate analysis in Fig. 4b). In all ER positive ilBC cases SNORD6 was more significant than nodal status and age at diagnosis (RFS) in the multivariate analysis (Fig. 4c, univariate analysis in Fig. 4d), and in ER positive ilBC cases with only surgery SNORD6 alone significantly associated with RFS (Supplementary Fig. S15B).
SNORD6 was identified as a prognostic marker in invasive local BC. (a) The highest quartile (Q4) of SNORD6 significantly associated with better RFS (Overall Padj = 0.1010, Overall P = 0.0486, for Q4 P = 0.0189, HR [CI 95%] = 0.43 [0.22–0.87]), when compared to the lowest quartile (Q1) in all cases with invasive local BC (n = 174) in the Cox multivariate survival analysis including the covariates tumor grade, tumor histology, tumor size, nodal status, ER status, PR status, HER2 status, age at diagnosis, and the treatment parameters radiotherapy (RT) (yes/no), adjuvant chemotherapy (CT) (yes/no), and adjuvant endocrine therapy (ET) (yes/no). Of the covariates, nodal status significantly associated with RFS in the multivariate analysis (Overall P = 5.405e − 05, for node positivity P = 0.0009, HR [CI 95%] = 2.17 [1.37–3.43). (b) Kaplan–Meier plot showing the association of SNORD6 with RFS in the univariate analysis (Overall Log Rank P = 0.0105) in all cases with invasive local BC (n = 186). (c) The highest quartile (Q4) of SNORD6 significantly associated with better RFS (Overall Padj = 0.0182, Overall P = 0.0009, for Q4 P = 0.0125, HR [CI 95%] = 0.24 [0.08–0.74]) also in ER positive cases (invasive local, n = 123) in the Cox multivariate analysis including covariates tumor grade, histology and size, nodal status, PR status, HER2 status, age at diagnosis, and the treatment parameters RT (yes/no), CT (yes/no), and adjuvant ET (yes/no). Of the covariates nodal status (Overall P = 0.0108, for node positivity P = 0.0027, HR [CI 95%] = 2.47 [1.37–4.46]), and age at diagnosis (Overall P = 0.0254, for age class ≤ 39 P = 0.0015, HR [CI 95%] = 4.51 [1.78–11.42]) significantly associated with RFS in the multivariate analysis. (d) Kaplan–Meier plot showing the association of SNORD6 with RFS in the univariate analysis (Overall Log Rank P = 0.0007) in the cases with invasive local, ER positive BC (n = 133). In (a) and (c), the fitted Ns were extrapolated from the multivariate-fitted survival probabilities.
Of the 42 sncRNAs, 18 sncRNAs were found as candidate prognostic markers for ER positive ilBC as they associated with patient outcome only in the forementioned groups restricted to cases with ER positive ilBC (all ER positive cases, ER positive cases with only surgery, and/or RT-treated ER positive cases) (Supplementary Tables S10, S13, S15, Supplementary Fig. S15C). The higher level of seven sncRNAs associated with poorer RFS, and five with poorer BCSS. For example, SCARNA5 associated with poorer RFS only in all ER positive cases (i.e. when the analyses were not restricted to a specific treatment group), being a more significant factor in the Cox analysis (Overall P) than nodal status or age at diagnosis (Fig. 5a, univariate analysis in Fig. 5b). Similar association remained also when the SCARNA5 expression level was divided into two groups by median (Fig. 5c,d). The higher level of another seven, two, and one sncRNAs in turn associated with better RFS, BCSS, and OS.
SCARNA5 showed prognostic potential in ER positive invasive local BC. (a) The highest quartile (Q4) of SCARNA5 significantly associated with poorer RFS (Overall Padj = 0.0161, Overall P = 0.0006, for Q4 P = 0.0462, HR [CI 95%] = 3.04 [1.02–9.08]), when compared to the lowest quartile (Q1) in ER positive cases (invasive local, n = 123) in the Cox multivariate analysis including the covariates tumor grade, histology and size, nodal status, PR status, HER2 status, age at diagnosis, and the treatment parameters RT (yes/no), CT (yes/no), and adjuvant ET (yes/no). Of the covariates nodal status (Overall P = 0.0096, for node positivity P = 0.0063, HR [CI 95%] = 2.35 [1.27–4.33]), and age at diagnosis (Overall P = 0.015, for age class ≤ 39 P = 0.0013, HR [CI 95%] = 4.68 [1.85–12.02]) significantly associated with RFS in the multivariate analysis. (b) Kaplan–Meier plot showing the association of SCARNA5 with RFS in the univariate analysis (Overall Log Rank P = 0.0071) in the invasive local, ER positive cases (n = 133). (c) The higher level of SCARNA5 associated with poorer RFS (Overall Padj = 0.0051, Overall P = 3.86e − 05, for higher half P = 0.0027, HR [CI 95%] = 2.79 [1.43–5.44]) in the cases with invasive local, ER positive BC also when the expression level was divided into two groups according to median in the Cox multivariate analysis including the covariates tumor grade, histology and size, nodal status, PR status, HER2 status, age at diagnosis, and the treatment parameters RT (yes/no), CT (yes/no), and adjuvant ET (yes/no). Also nodal status (Overall P = 0.0086, for node positivity P = 0.0061, HR [CI 95%] = 2.33 [1.27–4.28]), and age at diagnosis (Overall P = 0.014, for age class ≤ 39 P = 0.0010, HR [CI 95%] = 4.64 [1.86–11.61]) remained significant. (d) Kaplan–Meier plot showing the association of SCARNA5 with RFS according to median in the univariate analysis (Overall Log Rank P = 0.0010) in the cases with ER positive, invasive local BC (n = 133). In (a) and (c), the fitted Ns were extrapolated from the multivariate-fitted survival probabilities.
Only one sncRNA (Y_RNA, ENSG00000199801.1) associated with patient outcome (poorer BCSS) only in the ER negative cases (Supplementary Table S11), the highest expression quartal having a stronger predictive effect in the Cox model (Overall P = 0.0130, for Q4 P = 0.0004, HR [CI 95%] = 8.12 [2.56–25.78]) than nodal status (Overall P = 0.0050, for node positivity P = 0.0053, HR [CI 95%] = 4.35 [1.55–12.26]). Tumor histology also remained statistically significant in the Cox analysis (Overall P = 0.0190, for lobular histology P = 0.0010, HR [CI 95%] = 51.04 [4.89–532.38], for ductal histology P = 0.0059, HR [CI 95%] = 12.70 [2.08–77.53]).
Several sncRNAs predict response to RT
Altogether 20 sncRNAs were identified to be predictive for RT response as they associated with patient outcome (Overall P < 0.05 and Q4 P < 0.05) only in the analyses including cases who had received RT (all, ER positive, or ER negative) or in addition in the analyses that were not restricted to any specific treatment group (all, ER positive, ER negative) (Supplementary Tables S14–S16, S9–S11, Supplementary Fig. S16). The higher level of six sncRNAs associated with poorer RFS, six with poorer BCSS, and seven with poorer OS. The higher level of four sncRNAs in turn associated with better RFS, six with better BCSS and five with better OS. For example, poorer OS was observed with the RT-treated cases with increased SNORD60 expression (Fig. 6a for multivariate and Fig. 6b for univariate analysis), whereas the increased SNORD67 expression associated with better BCSS in the ER positive RT-treated cases (Fig. 6c for multivariate and Fig. 6d for univariate analysis).
SNORD60 and SNORD67 were identified as candidates for predictive markers for RT in invasive local BC. (a) The higher quartiles (Q3 and Q4) of SNORD60 significantly associated with poorer OS in the Cox multivariate analysis including the covariates tumor grade, histology and size, nodal status, ER status, PR status, HER2 status, age at diagnosis, and the treatment parameters CT (yes/no), and adjuvant ET (yes/no) (Overall Padj = 0.1042, Overall P = 0.0289, for Q3 P = 0.0008, HR [CI 95%] = 3.90 [1.76–8.63], and for Q4 P = 0.0206, HR [CI 95%] = 2.48 [1.15–5.37]) in all cases with invasive local BC who had received RT (n = 91). (b) Kaplan–Meier plot showing the association of SNORD60 with OS in the univariate analysis (Overall Log Rank P = 0.0313) in all cases with invasive local BC who had received RT (n = 97). (c) The highest quartile (Q4) of SNORD67 significantly associated with better BCSS in the cases with ER positive BC (invasive local, n = 58) who had received RT (Overall Padj = 0.0941, Overall P = 0.0485, for Q4 P = 0.0132, HR [CI 95%] = 0.25 [0.08–0.75]; Cox multivariate analysis including the covariates tumor grade, histology and size, nodal status, PR status, HER2 status, age at diagnosis, and the treatment parameters CT (yes/no), and adjuvant ET (yes/no). (d) Kaplan–Meier plot showing the association of SNORD67 with BCSS in the univariate analysis (Overall Log Rank P = 0.0397) in cases with invasive local, ER positive BC who had received RT (n = 64). In (a) and (c), the fitted Ns were extrapolated from the multivariate-fitted survival probabilities.
Additionally, the higher level of SNORD109B associated with better BCSS in the ER positive cases who had received RT and with better OS in the tamoxifen-treated ER positive cases indicating that SNORD109B might influence both RT and tamoxifen response in ER positive ilBC cases (Supplementary Tables S15, S17).
A few sncRNAs show potential as predictive markers for tamoxifen response
Altogether eight sncRNAs associated with patient outcome (Overall P < 0.05 and Q4 P < 0.05) only in the ER positive cases who had received tamoxifen therapy suggesting they may influence tamoxifen response (Supplementary Table S17). The higher level of one sncRNA associated with poorer RFS, one with poorer OS, two with better RFS, and four with better OS. For example, the higher level of SNORA11 associated with better RFS (Fig. 7a for multivariate and Fig. 7b for univariate analysis), whereas the higher level of SCARNA11 associated with better OS (Fig. 7c for multivariate and Fig. 7d for univariate analysis). It should be taken into consideration that some of the tamoxifen response-associated sncRNAs were expressed at relatively low levels (Supplementary Table S2).
Examples of candidate sncRNAs predictive for tamoxifen response in invasive local BC. (a) The higher SNORA11 quartiles (Q3 and Q4) significantly associated with better RFS in the Cox multivariate analysis including the covariates tumor grade, histology and size, nodal status, PR status, HER2 status, age at diagnosis, and RT (yes/no) (Overall Padj = 0.0393, Overall P = 0.0104, for Q3 P = 0.0002, HR [CI 95%] = 0.05 [0.01–0.24], and for Q4 P = 1.8e − 08, HR [CI 95%] = 0.02 [0.01–0.08]), when compared to the lowest quartile (Q1) in the cases with ER positive BC who had received tamoxifen (invasive local, n = 31). (b) Kaplan–Meier plot showing the association of SNORA11 with RFS in the univariate analysis (Overall Log Rank P = 0.0755) in the tamoxifen-treated cases with invasive local, ER positive BC (n = 35). Note the low number of events in the Q1 group. (c) The higher SCARNA11 quartiles (Q3 and Q4) significantly associated with better OS in the Cox multivariate analysis including the covariates tumor grade, histology and size, nodal status, PR status, HER2 status, age at diagnosis, and RT (yes/no) (Overall Padj = 0.0471, Overall P = 0.0184, for Q3 P = 0.0021, HR [CI 95%] = 0.13 [0.04–0.48], and for Q4 P = 0.0023, HR [CI 95%] = 0.12 [0.03–0.47]), when compared to the lowest quartile (Q1) in the cases with ER positive BC who had received tamoxifen (invasive local, n = 31). (d) Kaplan–Meier plot showing the association of SCARNA11 with OS in the univariate analysis (Overall Log Rank P = 0.0163) in the tamoxifen-treated cases with local invasive, ER positive BC (n = 35). In (a) and (c), the fitted Ns were extrapolated from the multivariate-fitted survival probabilities. Note that the last time point in (a) is at 19.74 and in (c) 19.43 years.