Objective: To directly compare the performance of, and externally validate, the three most studied prediction tools for osteoporotic fractures (QFracture, FRAX, and Garvan) using data from electronic health records.

Design: Retrospective cohort study.

Setting: Payer provider healthcare organisation in Israel.

Participants: 1 054 815 members aged 50 to 90 years for the comparison between tools; cohorts of different age ranges, corresponding to those in each tool's development study, for tool specific external validation.

Main outcome measure: First diagnosis of a major osteoporotic fracture (for the QFracture and FRAX tools) and of a hip fracture (for all three tools) recorded in electronic health records from 2010 to 2014. Observed fracture rates were compared with probabilities predicted retrospectively as of 2010.

Results: The observed five year hip fracture rate was 2.7% and the rate for major osteoporotic fractures was 7.7%. The areas under the receiver operating characteristic curve (AUC) for hip fracture prediction were 82.7% for QFracture, 81.5% for FRAX, and 77.8% for Garvan. For major osteoporotic fractures, AUCs were 71.2% for QFracture and 71.4% for FRAX. All the tools underestimated the fracture risk, but the average observed to predicted ratios and the calibration slopes of FRAX were closest to 1. Tool specific validation analyses yielded hip fracture prediction AUCs of 88.0% for QFracture (among those aged 30-100 years), 81.5% for FRAX (50-90 years), and 71.2% for Garvan (60-95 years).

Conclusions: Both QFracture and FRAX had high discriminatory power for hip fracture prediction, with QFracture performing slightly better. This performance gap was more pronounced in previous studies, likely because of broader age inclusion criteria for QFracture validations. The simpler FRAX performed almost as well as QFracture for hip fracture prediction, and may have advantages if some of the input data required for QFracture are not available. However, both tools require calibration before implementation.
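The three performance measures reported above (AUC, observed to predicted ratio, and calibration slope) can be illustrated with a minimal sketch. This is not the study's code: the function names, the `toy_cohort` helper, and the 30 person synthetic cohort are all hypothetical, and the calibration slope is assumed here to be the coefficient from a logistic regression of the outcome on the logit of the predicted probability, which is one common definition.

```python
import math

def auc(y, p):
    """Chance that a random case gets a higher prediction than a random non-case (ties count 1/2)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def observed_to_predicted(y, p):
    """Observed events divided by expected events; values above 1 mean the tool underestimates risk."""
    return sum(y) / sum(p)

def calibration_slope(y, p, iters=50, tol=1e-10):
    """Fit y ~ intercept + slope * logit(p) by Newton-Raphson; returns (intercept, slope).

    Assumes every p is strictly between 0 and 1 and that p takes at least two distinct values.
    """
    x = [math.log(pi / (1 - pi)) for pi in p]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            g0 += yi - mu           # gradient with respect to the intercept
            g1 += xi * (yi - mu)    # gradient with respect to the slope
            h00 += w                # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

def toy_cohort(scale=1.0):
    """Hypothetical data: three risk groups of 10 people with 2, 5, and 8 events."""
    y, p = [], []
    for risk, events in ((0.2, 2), (0.5, 5), (0.8, 8)):
        y += [1] * events + [0] * (10 - events)
        p += [risk * scale] * 10  # scale < 1 mimics a tool that underestimates risk
    return y, p

for scale in (1.0, 0.5):
    y, p = toy_cohort(scale)
    print(scale, auc(y, p), observed_to_predicted(y, p), calibration_slope(y, p))
```

In this toy setting, halving every predicted probability leaves the AUC unchanged, because the ranking of individuals is the same, while the observed to predicted ratio rises above 1. That is the pattern the abstract describes: a tool can discriminate well and still need recalibration before use.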