Genomics of the origin and evolution of Citrus
The genus Citrus, comprising some of the most widely cultivated fruit crops worldwide, includes an uncertain number of species. Here we describe ten natural citrus species, using genomic, phylogenetic and biogeographic analyses of 60 accessions representing diverse citrus germplasm, and propose that citrus diversified during the late Miocene epoch through a rapid southeast Asian radiation that correlates with a marked weakening of the monsoons. A second radiation, enabled by migration across the Wallace line, gave rise to the Australian limes in the early Pliocene epoch. Further identification and analyses of hybrids and admixed genomes provide insights into the genealogy of major commercial cultivars of citrus. Among mandarins and sweet orange, we find an extensive network of relatedness that illuminates the domestication of these groups. Widespread pummelo admixture among these mandarins and its correlation with fruit size and acidity suggest a plausible role of pummelo introgression in the selection of palatable mandarins. This work provides a new evolutionary framework for the genus Citrus.
The genus Citrus and related genera (Fortunella, Poncirus, Eremocitrus and Microcitrus) belong to the angiosperm subfamily Aurantioideae of the Rutaceae family, which is widely distributed across the monsoon region from west Pakistan to north-central China and south through the East Indian Archipelago to New Guinea and the Bismarck Archipelago, northeastern Australia, New Caledonia, Melanesia and the western Polynesian islands1. Native habitats of citrus and related genera roughly extend throughout this broad area (Extended Data Fig. 1a and Supplementary Table 1), although the geographical origin, timing and dispersal of citrus species across southeast Asia remain unclear. A major obstacle to resolving these uncertainties is our poor understanding of the genealogy of complex admixture in cultivated citrus, as recently shown2. Some citrus types propagate clonally by apomixis3 through nucellar embryony, that is, the development of asexual embryos from the maternal nucellar tissue of the ovule. This natural process may have been co-opted during domestication, whereas grafting is a relatively recent practice4. Both modes of clonal propagation have led to the domestication of fixed (desirable) genotypes, including interspecific hybrids, such as oranges, limes, lemons, grapefruits and other types.
Given this history, it is not surprising that the current chaotic citrus taxonomy, based on long-standing and conflicting proposals5,6, requires a thorough reformulation consistent with a full understanding of the hybrid and/or admixed nature of cultivated citrus species. Here we analyse genome sequences of diverse citrus to characterize the diversity and evolution of citrus at the species level and identify citrus admixtures and interspecific hybrids. We further examine the network of relatedness among mandarins and sweet orange, as well as the pattern of pummelo introgression among mandarins, for clues to the early stages of citrus domestication.
Diversity and evolution of the genus Citrus
To investigate the genetic diversity and evolutionary history of citrus, we analysed the genomes of 58 citrus accessions and two outgroup genera (Poncirus and Severinia) that were sequenced to high coverage, including recently published sequences2,3,7 as well as 30 new genome sequences described here. For our purposes, we excluded accessions that differ from one another only by somatic mutations. These sequences represent a diverse sampling of citrus species, their admixtures and hybrids (Supplementary Tables 2, 3 and Supplementary Notes 1, 2). Our collection includes accessions from eight previously unsequenced and/or unexamined citrus species, such as pure mandarins (Citrus reticulata), citron (Citrus medica), Citrus micrantha (a wild species from within the subgenus Papeda), Nagami kumquat (Fortunella margarita, also known as Citrus japonica var. margarita), and Citrus ichangensis (also known as Citrus cavaleriei; this species is also considered a Papeda), as well as three Australian citrus species (Supplementary Notes 3, 4). For each species, we have sequenced one or more pure accessions without interspecific admixture.
Local segmental ancestry of each accession can be delineated for both admixed and hybrid genotypes, based on genome-wide ancestry-informative single-nucleotide polymorphisms (Supplementary Note 5). Comparative genome analysis further identified shared haplotypes among the accessions (Supplementary Notes 6, 7). In particular, we demonstrate the F1 interspecific hybrid nature of Rangpur lime and red rough lemon (two different mandarin–citron hybrids), Mexican lime (a micrantha–citron hybrid) and calamondin (a kumquat–mandarin hybrid), and confirm, using whole-genome sequence data, the origins of grapefruit (a pummelo–sweet orange hybrid), lemon (a sour orange–citron hybrid) and eremorange (a sweet orange and Eremocitrus glauca (also known as Citrus glauca) hybrid). We also verified the parentage of Cocktail grapefruit, with low-acid pummelo as the seed parent and King and Dancy mandarins as the two grandparents on the paternal side. The origin of the Ambersweet orange is similarly confirmed to be a mandarin–sweet orange hybrid with Clementine as a grandparent. We have previously shown that sour orange (cv. Seville) (Citrus aurantium) is a pummelo–mandarin hybrid, and have analysed the more complex origin of sweet orange (Citrus sinensis)2. Re-analysing sequences from ten cultivars of sweet orange3 shows that they are all derived from the same genome by somatic mutations, and were thus not included in our study.
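To make the idea of local segmental ancestry concrete, the following Python sketch calls ancestry in fixed windows of consecutive ancestry-informative SNPs and merges adjacent windows with the same call into segments. It assumes that, for a two-species contrast (for example mandarin versus pummelo), each SNP in an accession has already been coded as the number (0, 1 or 2) of alleles diagnostic for the second species. The function names, window size and rounding rule are illustrative assumptions, not the procedure of Supplementary Note 5.

```python
from itertools import groupby


def call_windows(dosages, window=50):
    """Call ancestry in non-overlapping windows of consecutive informative SNPs.

    dosages: per-SNP counts (0, 1 or 2) of the allele diagnostic for species B,
    in genomic order along one chromosome (hypothetical input format).
    Returns one state per window: 'AA', 'AB' or 'BB'.
    """
    states = []
    for start in range(0, len(dosages), window):
        chunk = dosages[start:start + window]
        mean = sum(chunk) / len(chunk)
        # Round the mean dosage to the nearest genotype class.
        if mean < 0.5:
            states.append("AA")
        elif mean < 1.5:
            states.append("AB")
        else:
            states.append("BB")
    return states


def merge_segments(states):
    """Collapse runs of identical window states into (state, first, last) tuples."""
    segments = []
    index = 0
    for state, run in groupby(states):
        length = len(list(run))
        segments.append((state, index, index + length - 1))
        index += length
    return segments


# Example: mandarin background (AA) carrying one heterozygous pummelo segment (AB).
dosages = [0] * 100 + [1] * 100 + [0] * 50
print(merge_segments(call_windows(dosages)))
# [('AA', 0, 1), ('AB', 2, 3), ('AA', 4, 4)]
```

Real genotype data are noisy, so a probabilistic model over windows (for example a hidden Markov model) would normally replace the simple rounding used here.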
We identified ten progenitor citrus species (Supplementary Note 4.1) by combining diversity analysis (Extended Data Table 1), multidimensional scaling and chloroplast genome phylogeny (Extended Data Fig. 1b). The first two principal coordinates in the multidimensional scaling (Fig. 1a) separate the three ancestral (sometimes called ‘fundamental’) Citrus species associated with commercially important types8,9, namely citrons (C. medica), mandarins (C. reticulata) and pummelos (Citrus maxima), and place lemons, limes, oranges and grapefruits as hybrids involving these three species. The nucleotide diversity distributions (Fig. 1b) show distinct scales for interspecific divergence and intraspecific variation, and reflect the genetic origin of each accession. Hybrid accessions (sour orange, calamondin, lemon and non-Australian limes) with ancestry from two or more citrus species are readily identified by their higher segmental heterozygosity (1.5–2.4%) relative to intraspecific diversity (0.1–0.6%). Other citrus accessions (sweet orange, grapefruits and some highly heterozygous mandarins) show bimodal heterozygosity distributions due to interspecific admixture, a process that generally involves complex backcrosses. Among the pure genotypes without interspecific admixture, citrons show significantly lower intraspecific diversity (around 0.1%) than the other species (0.3–0.6%). The reduced heterozygosity of citrons, a mono-embryonic species, is probably due to the cleistogamy of their flowers10, a mechanism that promotes self-pollination and self-fertilization in unopened flower buds and thus reduces heterozygosity.
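The contrast between intraspecific (0.1–0.6%) and interspecific (1.5–2.4%) heterozygosity suggests a simple screen, sketched below in Python. Heterozygosity is computed in non-overlapping windows of callable sites, each window is flagged as interspecific when it exceeds a cut-off placed between the two ranges, and an accession is labelled by the fraction of flagged windows. The 1% cut-off, the window size and the fractions separating hybrid, admixed and pure genotypes are hypothetical choices for illustration, not values used in the study.

```python
def window_heterozygosity(genotypes, window=1000):
    """Fraction of heterozygous sites in each non-overlapping window.

    genotypes: list of (allele1, allele2) tuples at callable sites, in genomic order.
    """
    values = []
    for start in range(0, len(genotypes), window):
        chunk = genotypes[start:start + window]
        n_het = sum(1 for a, b in chunk if a != b)
        values.append(n_het / len(chunk))
    return values


def classify_accession(window_het, cutoff=0.01, hybrid_min=0.9, admixed_min=0.05):
    """Label an accession from its windowed heterozygosity (all thresholds hypothetical).

    Windows above `cutoff` are treated as carrying haplotypes from two species.
    """
    high = sum(1 for h in window_het if h > cutoff) / len(window_het)
    if high >= hybrid_min:
        return "hybrid"    # elevated heterozygosity nearly genome-wide, as in an F1
    if high >= admixed_min:
        return "admixed"   # bimodal: only introgressed segments are elevated
    return "pure"          # intraspecific levels throughout
```

An F1 interspecific hybrid carries one haplotype from each parental species everywhere, so almost every window is flagged; an admixed accession is flagged only where introgressed segments lie, which produces the bimodal distributions described above.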
Figure 1: Genetic structure, heterozygosity and phylogeny of Citrus species.
a, Principal coordinate analysis of 58 citrus accessions based on pairwise nuclear genome distances and metric multidimensional scaling. The first two axes separate the three main citrus groups (citrons, pummelos and mandarins) with interspecific hybrids (oranges, grapefruit, lemon and limes) situated at intermediate positions relative to their parental genotypes. b, Violin plots of the heterozygosity distribution in 58 citrus accessions, representing 10 taxonomic groups as well as 2 related genera, Poncirus (Poncirus trifoliata, also known as Citrus trifoliata) and Chinese box orange (Severinia). White dot, median; bar limits, upper and lower quartiles; whiskers, 1.5× interquartile range. The bimodal separation of intraspecies (light blue) and interspecies (light pink) genetic diversity is manifested among the admixed mandarins and across different genotypes including interspecific hybrids. Three-letter codes are listed in parentheses with additional descriptions in Supplementary Table 2. c, Chronogram of citrus speciation. Two distinct and temporally well-separated phases of species radiation are apparent, with the southeast Asian citrus radiation followed by the Australian citrus diversification. Age calibration is based on the citrus fossil C. linczangensis16 from the Late Miocene (denoted by a filled red circle). The 95% confidence intervals are derived from 200 bootstraps. Bayesian posterior probability is 1.0 for all nodes. d, Proposed origin of citrus and ancient dispersal routes. Arrows suggest plausible migration directions of the ancestral citrus species from the centre of origin, the triangle formed by northeastern India, northern Myanmar and northwestern Yunnan. The proposal is compatible with citrus biogeography, phylogenetic relationships, the inferred timing of diversification and the paleogeography of the region, especially the geological history of Wallacea and Japan. The red star marks the fossil location of C. linczangensis.
Citrus fruit images in c and d are not drawn to scale.
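The principal coordinate analysis in Fig. 1a is a metric (classical) multidimensional scaling of pairwise nuclear genome distances. A dependency-free Python sketch of classical MDS follows: it double-centres the squared distance matrix and extracts the leading eigenvectors by power iteration with deflation. The input is assumed to be a symmetric distance matrix with a zero diagonal whose leading eigenvalues are positive and well separated; for real data a library eigensolver would be the better choice, and this is not the authors' code.

```python
import math
import random


def double_centre(d):
    """Gower double-centring: B = -1/2 * J * D^2 * J for a symmetric distance matrix."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_means = [sum(row) / n for row in sq]  # equal to column means by symmetry
    grand_mean = sum(row_means) / n
    return [[-0.5 * (sq[i][j] - row_means[i] - row_means[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(b, iters=1000, seed=0):
    """Dominant eigenvalue and unit eigenvector of symmetric b by power iteration."""
    n = len(b)
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * bv[i] for i in range(n)), v  # Rayleigh quotient


def pcoa(d, k=2):
    """Coordinates of each accession on the first k principal coordinates."""
    b = double_centre(d)
    n = len(b)
    coords = [[] for _ in range(n)]
    for _ in range(k):
        lam, v = top_eigenpair(b)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i].append(v[i] * scale)
        # Deflate so that the next power iteration finds the next eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

With distances computed from points in a plane, the two recovered axes reproduce the original pairwise distances up to rotation or reflection, which is a convenient correctness check. Genetic distances are generally not exactly Euclidean, so in practice only the leading axes are interpreted, as in Fig. 1a.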
The identification of a set of pure citrus species provides new insights into the phylogeny, origins, evolution and dispersal of citrus. Citrus phylogeny is controversial1,5,6,11,12, in part because substantial interspecific hybridization, which has produced several clonally propagated cultivated accessions, makes it difficult to identify pure or wild progenitor species. Some authors assign separate binomial species designations to clonally propagated genotypes1,6. Our nuclear genome-based phylogeny, which is derived from 362,748 single-nucleotide polymorphisms in non-genic and non-pericentromeric genomic regions, reveals that citrus species are a monophyletic group and establishes well-defined relationships among its lineages (Fig. 1c and Supplementary Note 8). Notably, the nuclear genome-derived phylogeny differs in detail from the chloroplast-derived phylogeny (Extended Data Fig. 1). This is not unexpected, as chloroplast DNA is a single, non-recombining unit and is unlikely to show perfect lineage sorting during rapid radiation (Supplementary Note 8.3).
The origin of citrus has generally been considered to be in southeast Asia1, a biodiversity hotspot13 with a climate that has been influenced by both east and south Asian monsoons14 (Supplementary Note 9). Specific regions include the Yunnan province of southwest China15, Myanmar and northeastern India in the Himalayan foothills1. A fossil specimen from the late Miocene epoch of Lincang in Yunnan, Citrus linczangensis16, has traits that are characteristic of current major citrus groups, and provides definite evidence for the existence of a common Citrus ancestor within the Yunnan province approximately 8 million years ago (Ma).
Our analysis establishes a relatively rapid Asian radiation of citrus species in the late Miocene (6–8 Ma; Fig. 1c, d), a period coincident with an extensive weakening of monsoons and a pronounced climate transition from wet to drier conditions17. In southeast Asia, this marked climate alteration caused major changes in biota, including the migration of mammals18 and rapid radiation of various plant lineages19,20. Australian citrus species form a distinct clade that was proposed to be nested with citrons12, although Swingle assigned them distinct generic names (Eremocitrus and Microcitrus) in his botanical classifications1,5. Neither molecular dating analysis21 nor our whole-genome phylogenetic analysis supports an Australian origin for citrus22. Rather, citrus species spread from southeast Asia to Australasia, probably via transoceanic dispersals. Our genomic analysis indicates that the Australian radiation occurred during the early Pliocene epoch, around 4 Ma. This is contemporaneous with other west-to-east angiosperm migrations from southeast Asia23,24, which presumably took advantage of the elevation of Malesia and Wallacea in the late Miocene and Pliocene25,26 (Supplementary Note 9).
The nuclear and chloroplast genome phylogenies indicate that there are three Australian species in our collection. One of the two Australian finger limes shows clear signs of admixture with round limes (Supplementary Note 5.4). The closest relative of Australian citrus is Fortunella (kumquat), which has been reported to grow in the wild in southern China27. Australian citrus species are diverse and, depending on the species, are native to both dry and rainforest environments in northeast Australia28. Our phylogeny suggests that the progenitor citrus probably migrated across the Wallace line, a natural barrier to species dispersal from southeast Asia to Australasia, and later adapted to these diverse climates.
The results also show that the Tachibana mandarin, naturally found in Taiwan, the Ryukyu archipelago and Japan29, split from mainland Asian mandarins (Fig. 1c, d) during the early Pleistocene (around 2 Ma), a geological epoch with strong glacial maxima30. Like other flora and fauna in the region, Tachibana very probably arrived in these islands from the adjacent mainland31 when the sea level of the South China Sea dropped and land bridges emerged32,33, a process driven by the expansion of ice sheets that recurred during glacial maxima (Supplementary Note 9).
Although Tachibana5,6 has been assigned its own species (Citrus tachibana), sequence analysis reveals that it has a close affinity to C. reticulata34,35 and does not support its taxonomic position as a separate species (Supplementary Note 4.1). However, both chloroplast genome phylogeny (Extended Data Fig. 1b) and nuclear genome clustering (Fig. 1a) clearly distinguish Tachibana from the mainland Asian mandarins. This suggests that Tachibana should be designated a subspecies of C. reticulata. By contrast, the wild Mangshan ‘mandarin’ (Citrus mangshanensis)7 represents a distinct species, with comparable distances to C. reticulata, pummelo and citron2 (Extended Data Table 1).
Pattern of pummelo admixture in the mandarins
Using 588,583 ancestry-informative single-nucleotide polymorphisms derived from three species, C. medica, C. maxima and C. reticulata, we delineate the segmental ancestry of 46 citrus accessions (Extended Data Fig. 2 and Supplementary Note 5). Pummelo admixture is found in all but 5 of the 28 sequenced mandarins, and the amount and pattern of pummelo admixture, as identified by phased pummelo haplotypes (Fig. 2a and Supplementary Note 6), suggest a classification of the mandarins into three types.
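One way to picture an ancestry-informative SNP is a site at which one reference species is fixed for an allele that is absent from the other reference species. The Python sketch below applies that rule to genotypes from pure accessions of C. medica, C. maxima and C. reticulata at a single site. The data layout and the strict fixed-and-private criterion are assumptions for illustration; the filters actually applied are described in Supplementary Note 5.

```python
def diagnostic_alleles(site):
    """Alleles that are fixed in one reference species and absent from all others.

    site: mapping species -> list of diploid genotypes (allele1, allele2) observed
    in that species' pure accessions at one SNP (hypothetical data layout).
    Returns a mapping species -> diagnostic allele; an empty dict means the site
    is not ancestry-informative under this strict rule.
    """
    allele_sets = {sp: {a for g in gts for a in g} for sp, gts in site.items()}
    result = {}
    for sp, alleles in allele_sets.items():
        if len(alleles) != 1:
            continue  # polymorphic within the species: not diagnostic for it
        (allele,) = alleles
        others = set().union(*(s for other, s in allele_sets.items() if other != sp))
        if allele not in others:
            result[sp] = allele
    return result


site = {"C. medica": [("A", "A")],
        "C. maxima": [("G", "G"), ("G", "G")],
        "C. reticulata": [("G", "G")]}
print(diagnostic_alleles(site))  # {'C. medica': 'A'}
```

At this example site only the citron allele is diagnostic, because pummelo and mandarin share the G allele; such a site can reveal citron ancestry but cannot distinguish pummelo from mandarin segments.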
Figure 2: Admixture proportion and citrus genealogy.
a, Allelic proportion of five progenitor citrus species in 50 accessions. CI, C. medica; FO, Fortunella; MA, C. reticulata; MC, C. micrantha; PU, C. maxima; UNK, unknown. The pummelos and citrons represent pure citrus species, whereas in the heterogeneous set of mandarins, the degree of pummelo introgression subdivides the group into pure (type-1) and admixed (type-2 and -3) mandarins. Three-letter code as in Fig. 1, see Supplementary Table 2 for details. b, Genealogy of major citrus genotypes. The five progenitor species are shown at the top. Blue lines represent simple crosses between two parental genotypes, whereas red lines represent more complex processes involving multiple individuals, generations and/or backcrosses. Whereas type-1 mandarins represent the pure species, type-2 (early-admixture) mandarins contain a small amount of pummelo admixture that can be traced back to a common pummelo ancestor (with P1 or P2 haplotypes). Later, additional pummelo introgressions into type-2 mandarins gave rise to both type-3 (late-admixture) mandarins and sweet orange. Further breeding between sweet orange and mandarins or within late-admixture mandarins produced additional modern mandarins. Fruit images are not to scale and represent the most popular citrus types. See Supplementary Note 1.1 for nomenclature usage.
Type-1 mandarins represent pure C. reticulata with no evidence of interspecific admixture and include Tachibana, three unnamed Chinese mandarins (M01, M02, M04)3 and the ancient Chinese cultivar Sun Chu Sha Kat reported here, a small tart mandarin commonly grown in China and Japan, and also found in Assam. This cultivar is probably the one described in Han Yen-Chih’s AD 1178 monograph ‘Chü Lu’36, which includes references to citrus cultivated during the reign of Emperor Ta Yu (2205–2197 BC). Sixteen of the twenty-eight mandarins are type-2 mandarins, which have a small amount of pummelo admixture (1–10% of the length of the genetic map; Fig. 2a), usually in the form of a few short segments distributed across the genome. Although the lengths and locations of these admixed segments may differ among mandarins, they share one or two common pummelo haplotypes (designated P1 and P2) (Extended Data Fig. 3). By contrast, the seven remaining mandarins (type-3) contain higher proportions of pummelo alleles (12–38%; Fig. 2a) in longer segments. Although the P1 and P2 pummelo haplotypes are also detectable among type-3 mandarins, other more extensive pummelo haplotypes dominate the pummelo admixture in type-3 mandarins (Fig. 2b and Extended Data Table 2).
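The three mandarin types are distinguished above by the extent of pummelo admixture (none, 1–10% and 12–38% of the genetic map) together with the sharing of the P1 and P2 haplotypes. The Python sketch below covers only the quantitative part: it measures the fraction of the genetic map covered by segments carrying pummelo ancestry and applies a cut-off. The input format, the function names and the 11% cut-off (placed between the two reported ranges) are hypothetical, and haplotype sharing is not modelled.

```python
def pummelo_map_fraction(segments, map_length_cm):
    """Fraction of the genetic map covered by segments carrying pummelo ancestry.

    segments: (start_cm, end_cm, n_pummelo_haplotypes) tuples that tile the map,
    with n_pummelo_haplotypes in {0, 1, 2} (hypothetical input format).
    """
    covered = sum(end - start for start, end, n in segments if n > 0)
    return covered / map_length_cm


def mandarin_type(fraction, cutoff=0.11):
    """Assign type-1/2/3 from the pummelo fraction alone (cut-off is hypothetical)."""
    if fraction == 0:
        return "type-1"   # pure C. reticulata
    if fraction <= cutoff:
        return "type-2"   # early admixture: a few short pummelo segments
    return "type-3"       # late admixture: more and longer pummelo segments


# A hypothetical 200-cM map with one 10-cM pummelo segment.
print(mandarin_type(pummelo_map_fraction([(0, 10, 1), (10, 200, 0)], 200)))  # type-2
```

A fraction-only rule cannot tell whether a mandarin's pummelo segments derive from the shared P1 and P2 haplotypes or from later introgressions, which is why the classification in the text also relies on phased haplotypes.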
These observations suggest that the initial pummelo introgression into the mandarin gene pool may have involved as few as one pummelo tree (carrying both P1 and P2 haplotypes), the contribution of which was diluted by repeated backcrosses with mandarins (Supplementary Note 6.3). The introgressed pummelo haplotypes became widespread and gave rise to type-2 (early-admixture) mandarins (Fig. 2b). We propose that later, additional pummelo introgressions gave rise to type-3 (late-admixture) mandarins and sweet orange, and that some modern type-3 mandarins were derived from hybridizations among existing mandarins and sweet orange. This late-admixture model for type-3 mandarins is consistent with the historical records for Clementine and Kiyomi (both mandarin–sweet orange hybrids), and for W. Murcott, Wilking and Fallglo (hybrids involving other type-3 mandarins), whereas definitive records for the remaining two late-admixture mandarins (King and Satsuma) are not available.
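The dilution of the pummelo contribution by repeated backcrosses can be quantified with a standard expectation: without selection, each backcross to a mandarin halves the expected pummelo share, so after an F1 cross and k backcrosses the expected pummelo fraction is (1/2)^(k+1). The Python sketch below computes this expectation and the smallest number of backcrosses that brings it to a target; the function names are illustrative. Under these idealized assumptions, three to five backcrosses would place the expected pummelo share within the 1–10% range observed in type-2 mandarins, but real segments vary stochastically and selection may retain particular pummelo haplotypes, so this is an illustration rather than an inference from the data.

```python
def expected_donor_fraction(backcrosses):
    """Expected genome fraction from the donor (pummelo) parent after an F1 cross
    followed by `backcrosses` backcrosses to the recurrent (mandarin) parent,
    assuming no selection."""
    return 0.5 ** (backcrosses + 1)


def backcrosses_needed(target_fraction):
    """Smallest number of backcrosses for the expected donor fraction to reach the target."""
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    k = 0
    while expected_donor_fraction(k) > target_fraction:
        k += 1
    return k


for k in range(6):
    print(k, expected_donor_fraction(k))
# 0 0.5
# 1 0.25
# 2 0.125
# 3 0.0625
# 4 0.03125
# 5 0.015625
```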