In total, the small RNA-seq yielded 552.5 million raw reads for the 228 sequenced samples, ranging from 1.1 to 6.6 million per sample. Preprocessing decreased the amount of reads to 451.9 million and the per-sample range to 0.84–4.2 million. A total of 391.0 million reads aligned to the human genome, with the per-sample range from 0.73 to 3.9 million. Finally, a total of 13.3 million reads counted to all 5331 sncRNAs, or 6.6 million to those 4799 short RNAs that did not overlap with an exon of a non-short RNA, of which 4.72, 1.67, and 0.068 million represented the main sncRNA classes (gene_types), snoRNA, miscRNA, and snRNA, respectively. Vst-normalized expression for those 1949 genes that had any reads in any sample and key clinical parameters for all 228 samples are available as Supplementary Table S2. Since the sequencing was performed as 40nt reads, the reads represent fragments of sncRNAs instead of the whole-length RNAs. The RNA-seq measurements for select top DE small RNAs, detailed in subsequent sections, were validated using comparison to independent RNA-seq data and/or by RT-qPCR. As shown in the violin plot in Supplementary Fig. S1A for 6 small RNAs, both the means and variances of expression in invasive BC tumors corresponded very well between the KBCP (n = 186) and the independent ILRS (n = 40) materials. Furthermore, the expression measured by RT-qPCR of four of the above genes, SNORA80E (ENSG00000207475.1), SNORD103B (NCBI: 692235), SNORD59A (ENSG00000207031.1), and SNORD104 (ENSG00000199753.1) correlated statistically significantly with the respective small RNA‐seq measurements (Supplementary Fig. S1B).
The preliminary view using the UMAP clustering of all 228 samples suggests that differentially expressed sncRNAs might be present for most clinical variables, but not all (Supplementary Fig. S2). For example, samples with the same clinical characteristic appear to group together for the main and more fine-grained sample subtype, tumor grade, and ER and PR status. However, such grouping appears to be absent in the case of nodal status.
sncRNAs distinguish ilBC from benign breast tissue and from metastasized BC
A significant differential expression (Padj < 0.01) was also observed between metastasized BC and ilBC for five sncRNAs; two sncRNAs were upregulated and three sncRNAs downregulated in metastasized BC (M1 at diagnosis) when compared to ilBC (Supplementary Table S4).
The hormone receptor status of the tumors associates with sncRNA expression
Altogether, 47 sncRNAs were significantly DE (Padj < 0.01) in the comparisons with the hormone receptor status of the tumors.
Additionally, 11 sncRNAs were upregulated in PR negative tumors (ilBC) and three in PR positive tumors (Supplementary Table S6, Supplementary Fig. S3A). Except for two snoRNAs (SNORA2C and AL732366.1), all the sncRNAs that associated with the PR status of the tumors associated also with the ER status of the tumors (Supplementary Table S5, Supplementary Fig. S3A).
Only one sncRNA (SNORD124) associated with HER2 positive tumors in the HER2 negative versus HER2 positive tumors comparison (P = 2.77e−06, Padj = 2.95e−03, Log2FC = − 1.020).
Thirteen sncRNAs show prominent association with the tumor grade independently of the tumors’ ER status
Thirty-nine sncRNAs in total associated (Padj < 0.01) with the tumor grade in ilBC cases (with or without adjusting for tumor ER status); 24 sncRNAs were upregulated and 15 sncRNAs were downregulated in grade III tumors compared to grade I tumors and/or grade II tumors (Supplementary Table S7, Fig. 2A,B). Of these 39 associations, 17 were shared in both comparisons (gr. III vs. gr. II, and gr. III vs. gr. I) when not adjusted for ER status (Fig. 2C). Thirteen sncRNAs associated with the tumor grade independently of the tumors’ ER status (Supplementary Table S7).
To summarize, the tumor grade-associated sncRNAs included five sncRNAs that were downregulated and eight sncRNAs that were upregulated in the advanced grade III tumors independently of the tumors’ ER status (Fig. 2D,E).
SNORD99 and VTRNA1-1 associate with ductal tumor histology
SNORD99 and VTRNA1-1 showed association (Padj < 0.01) with ductal carcinoma in the comparison of lobular carcinoma versus ductal carcinoma (ilBC) (P = 1.72e − 04, Padj = 9.72e − 03, Log2FC = − 0.561 and P = 1.50e − 05, Padj = 1.69e − 03, Log2FC = − 0.745, respectively). The associations were observed also in the analysis adjusted by ER status, even though they reached the statistical significance only at the level of Padj < 0.05 (P = 3.51e − 04, Padj = 2.73e − 02, Log2FC = − 0.520 and P = 7.12e − 04, Padj = 2.73e − 02, Log2FC = − 0.552, respectively).
A profile of sncRNAs define the tumors of luminal and TNBC subtype
Altogether 23 sncRNAs were significantly (Padj < 0.01) upregulated in TNBC compared to luminal BC (ilBC), whereas 26 sncRNAs were significantly upregulated in luminal BC compared to TNBC (ilBC) (Fig. 3, Supplementary Table S8, Supplementary Fig. S3B).
Nine of the 23 TNBC-associated sncRNAs [SNORD111, SNORD92, SNORD72, RNU2-36P, RNU5D-1, RNU5F-1, RNY4, VTRNA1-1, and ENSG00000201548.1 (Y_RNA)] were also upregulated in ER negative and in PR negative tumors compared to ER positive and PR positive tumors, respectively (Table 1, Supplementary Figs. S3B, S4–S12). All but RNY4 significantly associated also with higher grade tumors, of which RNU2-36P and RNU5D-1 independently of the ER status (Supplementary Figs. S4–S9, S11, S12). Notably, the expression levels of SNORD111, RNU5D-1, RNU5F-1, and ENSG00000201548.1 were low (Supplementary Table S2).
Additionally, the luminal BC-associated SNORA11 and SNORD104 were also upregulated in ER positive and in PR positive tumors when compared to ER negative and PR negative tumors, respectively (Table 1, Supplementary Figs. S3B, S13 and S14). Both of them significantly associated also with lower grade but only SNORA11 independently of the ER status (Supplementary Figs. S13, S14).
sncRNAs identified as candidate prognostic markers for BC; a group of them specifically for ER positive BC
Altogether 42 sncRNAs were identified as possible prognostic markers for ilBC. This group includes sncRNAs that associated with patient outcome (Overall P < 0.05 and Q4 P < 0.05) only in the analyses that were not restricted to any specific therapy groups, i.e. in analyses including all invasive cases, ER positive cases or ER negative cases separately, and/or in the cases who had received only surgery (all invasive cases who received only surgery, and ER positive cases with only surgery). Also, sncRNAs that associated with patient outcome only in the forementioned groups and in the cases who had received RT were included in these 42 sncRNAs (Supplementary Tables S9–S16, Supplementary Fig. S15A).
Among the 42 sncRNAs, 23 associated with BC prognosis independently of the ER status of the tumors; the higher level of eight, four, and seven sncRNAs associated with poorer RFS, BCSS, and OS, respectively, whereas better RFS, BCSS, and OS were associated with the higher level of seven, six, and three sncRNAs, respectively. In many of the multivariate survival analyses also other (established) prognostic factors, including, typically, the nodal status, age at diagnosis, and tumor size, were significant. However, in some of the analyses the sncRNA was more significant than e.g. the forementioned. For example, longer RFS was observed with the patients with increased SNORD6 (ENSG00000202314.1) expression in all cases with ilBC in the multivariate analysis, while nodal status was a more significant factor than SNORD6 in the analysis (Fig. 4a, univariate analysis in Fig. 4b). In all ER positive ilBC cases SNORD6 was more significant than nodal status and age at diagnosis (RFS) in the multivariate analysis (Fig. 4c, univariate analysis in Fig. 4d), and in ER positive ilBC cases with only surgery SNORD6 alone significantly associated with RFS (Supplementary Fig. S15B).
Of the 42 sncRNAs, 18 sncRNAs were found as candidate prognostic markers for ER positive ilBC as they associated with patient outcome only in the forementioned groups restricted to cases with ER positive ilBC (all ER positive cases, ER positive cases with only surgery, and/or RT-treated ER positive cases) (Supplementary Tables S10, S13, S15, Supplementary Fig. S15C). The higher level of seven sncRNAs associated with poorer RFS, and five with poorer BCSS. For example, SCARNA5 associated with poorer RFS only in all ER positive cases (i.e. when the analyses were not restricted to a specific treatment group), being a more significant factor in the Cox analysis (Overall P) than nodal status or age at diagnosis (Fig. 5a, univariate analysis in Fig. 5b). Similar association remained also when the SCARNA5 expression level was divided into two groups by median (Fig. 5c,d). The higher level of another seven, two, and one sncRNAs in turn associated with better RFS, BCSS, and OS.
Only one sncRNA (Y_RNA, ENSG00000199801.1) associated with patient outcome (poorer BCSS) only in the ER negative cases (Supplementary Table S11), the highest expression quartal having a stronger predictive effect in the Cox model (Overall P = 0.0130, for Q4 P = 0.0004, HR [CI 95%] = 8.12 [2.56–25.78]) than nodal status (Overall P = 0.0050, for node positivity P = 0.0053, HR [CI 95%] = 4.35 [1.55–12.26]). Tumor histology also remained statistically significant in the Cox analysis (Overall P = 0.0190, for lobular histology P = 0.0010, HR [CI 95%] = 51.04 [4.89–532.38], for ductal histology P = 0.0059, HR [CI 95%] = 12.70 [2.08–77.53]).
Several sncRNAs predict response to RT
Altogether 20 sncRNAs were identified to be predictive for RT response as they associated with patient outcome (Overall P < 0.05 and Q4 P < 0.05) only in the analyses including cases who had received RT (all, ER positive, or ER negative) or in addition in the analyses that were not restricted to any specific treatment group (all, ER positive, ER negative) (Supplementary Tables S14–S16, S9–S11, Supplementary Fig. S16). The higher level of six sncRNAs associated with poorer RFS, six with poorer BCSS, and seven with poorer OS. The higher level of four sncRNAs in turn associated with better RFS, six with better BCSS and five with better OS. For example, poorer OS was observed with the RT-treated cases with increased SNORD60 expression (Fig. 6a for multivariate and Fig. 6b for univariate analysis), whereas the increased SNORD67 expression associated with better BCSS in the ER positive RT-treated cases (Fig. 6c for multivariate and Fig. 6d for univariate analysis).
Additionally, the higher level of SNORD109B associated with better BCSS in the ER positive cases who had received RT and with better OS in the tamoxifen-treated ER positive cases indicating that SNORD109B might influence both RT and tamoxifen response in ER positive ilBC cases (Supplementary Tables S15, S17).
A few sncRNAs show potential as predictive markers for tamoxifen response
Altogether eight sncRNAs associated with patient outcome (Overall P < 0.05 and Q4 P < 0.05) only in the ER positive cases who had received tamoxifen therapy suggesting they may influence tamoxifen response (Supplementary Table S17). The higher level of one sncRNA associated with poorer RFS, one with poorer OS, two with better RFS, and four with better OS. For example, the higher level of SNORA11 associated with better RFS (Fig. 7a for multivariate and Fig. 7b for univariate analysis), whereas the higher level of SCARNA11 associated with better OS (Fig. 7c for multivariate and Fig. 7d for univariate analysis). It should be taken into consideration that some of the tamoxifen response-associated sncRNAs were expressed at relatively low levels (Supplementary Table S2).