In the projects I audited over the past couple of years, weak content quality was the main reason pages were not indexed in the vast majority of cases. By weak quality, I mean not only non-unique or templated content, but also content that was fairly well written yet still unoriginal, and that Google therefore treated as low-value.
As a result, such sites lost indexation for many of their pages, and some were even hit by algorithmic filters (including the Helpful Content Update filter). The second most common reason was technical issues. These ranged from simple accidental blocks, such as restrictions in robots.txt, to page duplication: both literal duplication (for example, in WordPress) and intent-based duplication, where a website had more than one relevant page targeting the same search query. In such cases, Google might consider only one page worthy of indexation and mark the others as "Crawled - currently not indexed".
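Accidental robots.txt blocks are among the easier problems to check for programmatically. Below is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt rules, the URLs, and the `blocked_urls` helper are hypothetical and only illustrate the idea. Note that the standard-library parser does not replicate every detail of Google's own matching (wildcard patterns, for instance), so treat the result as a first-pass signal rather than a verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in a real audit you would load the
# file actually served by the site you are checking.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /blog/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical list of pages that are expected to be indexable.
urls = [
    "https://example.com/",
    "https://example.com/blog/post-1",
    "https://example.com/products/widget",
]

print(blocked_urls(ROBOTS_TXT, urls))
# → ['https://example.com/blog/post-1']
```

Running a check like this against the list of pages you expect to be indexed can surface a stray `Disallow` rule before it costs you indexation.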
There were also rare, unusual situations where previously indexed pages suddenly dropped out of the index. After a deeper review, we found that those pages had simply been stolen by other websites: the content was copied and distributed across other platforms, which over time caused Google to view the original pages as lower quality, less fresh, and no longer unique. As a result, they were eventually removed from the index.
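If you suspect a page has been copied, one rough way to confirm it is to compare your text with the suspected copy. The sketch below uses word shingles and Jaccard similarity; the function names, the shingle size of three words, and the sample texts are all assumptions made for illustration, and this is not a description of how Google itself evaluates duplication.

```python
import re


def shingles(text, size=3):
    """Split text into a set of overlapping word sequences of length `size`."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a, b, size=3):
    """Share of shingles that two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical texts: the original page, a verbatim copy, and an unrelated page.
original = "our guide explains how to fix indexing issues step by step"
copy = "our guide explains how to fix indexing issues step by step"
unrelated = "completely different words appear in this other sentence here"

print(jaccard_similarity(original, copy))       # → 1.0
print(jaccard_similarity(original, unrelated))  # → 0.0
```

A score close to 1.0 suggests a near-verbatim copy. Where to draw the line for "copied" is a judgment call and depends on how much boilerplate the compared pages share.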