Abstract: The vast majority of Philadelphia-negative myeloproliferative neoplasms (MPNs) share a narrow set of phenotypic driver mutations affecting the erythropoietin/thrombopoietin signalling pathways. Despite this, there is significant heterogeneity in disease phenotype at diagnosis, as well as in patient outcome with respect to thrombosis, disease progression and survival. Current risk stratification models are useful for predicting outcome and guiding treatment in patients with myelofibrosis (MF). However, significant heterogeneity remains within risk subgroups, and no models are available for identifying poor-risk patients with chronic phase disease. We sequenced the full coding regions of 68 genes and genome-wide single nucleotide polymorphisms in 2041 patients (1326 with essential thrombocythemia (ET), 355 with polycythemia vera (PV), 311 with primary/post-ET/post-PV MF and 49 with other MPN diagnoses) to characterize the associations between somatic mutations, copy number variant profiles, germline predisposition, order of mutation acquisition, clinical phenotype and patient outcome. Mutations in established myeloid driver genes other than JAK2, CALR or MPL were identified in 827 patients (41%). The presence and number of additional mutations correlated with both MPN phenotype and age at diagnosis. Non-canonical JAK2 and MPL mutations were found in 51 patients, of whom 17 had “triple-negative” disease. Novel protein-truncating mutations in PPM1D and MLL3 were identified in 54 (2.6%) and 25 (1.2%) patients, respectively. Chromosomal events, predominantly uniparental disomy of chromosome 9p (9pUPD), were seen in only 8% of those with ET, compared to 45% and 55% of those with MF and PV, respectively. The JAK2 46/1 haplotype correlated with the presence of JAK2V617F, 9pUPD, increased JAK2 clone size and a PV phenotype. In addition, a range of other genetic and non-genetic factors correlated significantly with phenotype at presentation.
Mutation timing was assessed to characterize the patterns of tumor evolution. Mutations in many genes were acquired specifically either early or late in disease. The sequence of mutation acquisition was also linked to phenotype. In JAK2-mutated patients, the JAK2 mutation was the earliest detected event (and/or was present in the dominant clone) in 80% of cases of PV and MF, but was preceded by other mutations in the majority of patients with ET. DNMT3A and SF3B1 mutations preceding JAK2 mutations were seen almost exclusively in ET, while EZH2 and ASXL1 mutations acquired after JAK2 were a common feature of MF. We observed 422 different combinations of mutational/chromosomal events in this study, only 37 of which recurred in at least 5 cases. Bayesian network analysis and clustering using Bayesian Dirichlet processes were used to identify distinct patterns and genetic groups within MPNs. Two groups in particular were enriched in patients with MF (as well as myelodysplastic syndrome, MDS) and were associated with adverse outcomes. Mutations in TP53 in association with chromosome 17p aberrations and/or 5q- formed a distinct group associated with an increased risk of transformation to acute myeloid leukemia (AML) in both chronic phase and MF patients. We then developed a unifying predictive model for all MPN patients. To account for the striking heterogeneity in genetic events, clinical characteristics and potential clinical outcomes, this took the form of a multi-state random-effects Cox proportional hazards model. It integrated a total of 63 clinical and genomic variables to generate individualised patient predictions for survival and disease transformation. The model generated accurate predictions on the training cohort, and performed well on internal cross-validation and on application to an external validation cohort. In patients with MF, the model was more accurate for predictions of event-free survival than DIPSS or IPSS (concordance 81% vs 69% vs 77% for the model, DIPSS and IPSS, respectively).
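For readers unfamiliar with the concordance statistic quoted above, the following is a minimal sketch of one common definition, Harrell's C-index for right-censored survival data. The abstract does not state which concordance estimator was used, so this choice is an assumption; the function name `concordance_index` and the toy data are hypothetical and are not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs ranked correctly by risk.

    times       -- follow-up time per patient
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- predicted risk (higher means earlier event expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Simplification (assumption): pairs with tied times are skipped.
        if times[i] == times[j]:
            continue
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # The pair is comparable only if the shorter time is an observed event.
        if not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0      # higher risk failed first: correct ranking
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5      # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy illustration only (invented values, not study data).
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.6, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

On this definition, a value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the three scores above are compared.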
We have devised an online calculator that generates personalised outcome predictions for individual patients, imputing any information that is unavailable. This could be used to guide the management of chronic phase and MF patients and improve stratification within clinical trials. Together, our results demonstrate the utility of combining genomic data with clinical parameters to refine disease classification and improve prognostication.