A simulation study of sample size demonstrated the importance of the number of events per variable to develop prediction models in clustered data
Objective
This study investigates how three factors influence the performance of prediction models: the amount of clustering [intraclass correlation (ICC) = 0%, 5%, or 20%], the number of events per variable (EPV), that is, per candidate predictor (EPV = 5, 10, 20, or 50), and backward variable selection.
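To make the ICC levels concrete, the sketch below converts an ICC into the variance of a center-level random intercept. It assumes the latent-variable definition that is common for random-intercept logistic models, ICC = σ²ᵤ / (σ²ᵤ + π²/3), where π²/3 is the variance of the standard logistic distribution. The abstract does not say which ICC definition the study used, and the function names are hypothetical.

```python
import math

# Variance of the standard logistic distribution: the level-1 residual
# variance on the latent scale of a logistic model (assumed ICC definition).
LOGISTIC_RESIDUAL_VAR = math.pi ** 2 / 3


def icc_to_random_intercept_variance(icc):
    """Hypothetical helper: sigma_u^2 such that icc = sigma_u^2 / (sigma_u^2 + pi^2/3)."""
    if not 0.0 <= icc < 1.0:
        raise ValueError("ICC must lie in [0, 1)")
    return icc * LOGISTIC_RESIDUAL_VAR / (1.0 - icc)


def random_intercept_variance_to_icc(sigma2_u):
    """Hypothetical helper: inverse of icc_to_random_intercept_variance."""
    return sigma2_u / (sigma2_u + LOGISTIC_RESIDUAL_VAR)


for icc in (0.0, 0.05, 0.20):
    sigma2_u = icc_to_random_intercept_variance(icc)
    print(f"ICC = {icc:.0%}: sigma_u^2 = {sigma2_u:.3f}")
# ICC = 0%: sigma_u^2 = 0.000
# ICC = 5%: sigma_u^2 = 0.173
# ICC = 20%: sigma_u^2 = 0.822
```

Under this definition an ICC of 0% means no between-center variation, and an ICC of 20% corresponds to a random-intercept variance of roughly 0.82 on the logit scale.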
Study design and setting
Researchers frequently combine data from several centers to develop clinical prediction models. In our simulation study, we developed models from clustered training data using multilevel logistic regression and validated them in external data.
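A minimal sketch of how clustered training data of this kind can be generated, assuming a random-intercept logistic model, logit P(y = 1) = β₀ + uⱼ + Σₖ βₖxₖ with uⱼ ~ N(0, σ²ᵤ) per center and standard-normal predictors. The data-generating settings here (number of centers, patients per center, coefficients) are illustrative assumptions, not the study's design. EPV is computed as the count of the less frequent outcome class divided by the number of candidate predictors; all names are hypothetical.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_clustered(n_centers, n_per_center, betas, intercept, sigma2_u, seed=0):
    """Hypothetical helper: draw (center, predictors, outcome) rows from a
    random-intercept logistic model with standard-normal predictors."""
    rng = random.Random(seed)
    rows = []
    for center in range(n_centers):
        # Center-specific random intercept shared by all its patients
        u = rng.gauss(0.0, math.sqrt(sigma2_u))
        for _ in range(n_per_center):
            x = [rng.gauss(0.0, 1.0) for _ in betas]
            eta = intercept + u + sum(b * xi for b, xi in zip(betas, x))
            y = 1 if rng.random() < expit(eta) else 0
            rows.append((center, x, y))
    return rows


def events_per_variable(rows, n_candidate_predictors):
    """Less frequent outcome count divided by the number of candidate predictors."""
    events = sum(y for _, _, y in rows)
    non_events = len(rows) - events
    return min(events, non_events) / n_candidate_predictors


# Example: 20 centers of 50 patients, 8 candidate predictors,
# sigma2_u = 0.82 (about an ICC of 20% on the latent scale)
data = simulate_clustered(20, 50, betas=[0.5] * 8, intercept=-1.5, sigma2_u=0.82)
print(len(data), events_per_variable(data, 8))
```

A multilevel logistic regression would then be fitted to such data and evaluated on an independently simulated external data set; that fitting step is not shown here.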
Conclusion
We recommend at least 10 EPV to fit prediction models in clustered data using logistic regression. Up to 50 EPV may be needed when variable selection is performed.
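The recommendation translates into a minimum number of events by simple arithmetic: required events = EPV × number of candidate predictors. The sketch below also estimates the total sample size for an assumed outcome prevalence, treating the outcome as the less frequent class. The predictor count and prevalence in the example are illustrative assumptions, and the function names are hypothetical.

```python
import math


def minimum_events(epv, n_candidate_predictors):
    """Events of the less frequent outcome class needed for a target EPV."""
    return epv * n_candidate_predictors


def minimum_sample_size(epv, n_candidate_predictors, outcome_prevalence):
    """Total patients needed, assuming the outcome is the less frequent class."""
    events = minimum_events(epv, n_candidate_predictors)
    # round() strips floating-point noise (e.g. 100 / 0.2) before rounding up
    return math.ceil(round(events / outcome_prevalence, 6))


# Example: 10 candidate predictors, assumed outcome prevalence of 20%
for epv in (10, 50):
    print(epv, minimum_events(epv, 10), minimum_sample_size(epv, 10, 0.2))
# 10 100 500
# 50 500 2500
```

With 10 candidate predictors, moving from 10 EPV to 50 EPV when variable selection is planned raises the requirement from 100 to 500 events.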
Authors
L Wynants, W Bouwmeester, K G Moons, M Moerbeek, D Timmerman, S Van Huffel, B Van Calster, Y Vergouwe
Journal
Journal of Clinical Epidemiology
Therapeutic Area
Other
Center of Excellence
Real-world Evidence & Data Analytics
Year
2015