Genes Under Study
The original list of genes targeted for analysis by our project contained 4,455 nuclear genes identified as likely to encode plastid-targeted proteins. We adopted an inclusive approach to predicting plastid targeting in order to minimize false negatives, accepting that this increases the likelihood of false positives. The 28,952 sequences in the TIGR A. thaliana proteome (release 5) were processed by TargetP, a sensitive and specific prediction algorithm (Emanuelsson et al., 2000; Richly and Leister, 2004). TargetP identified 3,996 protein-coding genes as chloroplast-targeted and a further 273 that were ambiguously predicted as either mitochondrial or chloroplastic. Because such ambiguity is a common failing of prediction algorithms, these 273 sequences were also included in our gene list. An additional 186 sequences that have been reliably validated as chloroplast-targeted, but that are missed by TargetP, complete the list (3,996 + 273 + 186 = 4,455).
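The three sources add up exactly to the size of the original list, so they can be treated as non-overlapping sets whose union is the inclusive target list. The sketch below illustrates that assembly under stated assumptions: each source is taken to be a collection of locus identifiers, and the helper `assemble_target_list` and the toy identifiers are hypothetical, not part of the project's actual pipeline (in practice the sets would come from parsed TargetP output and curated validation lists).

```python
from collections.abc import Iterable

# Counts reported for the original (TIGR release 5) target list.
TARGETP_CHLOROPLAST = 3996  # predicted chloroplast-targeted by TargetP
TARGETP_AMBIGUOUS = 273     # predicted as either mitochondrial or chloroplastic
VALIDATED_MISSED = 186      # validated chloroplast proteins missed by TargetP

ORIGINAL_TOTAL = TARGETP_CHLOROPLAST + TARGETP_AMBIGUOUS + VALIDATED_MISSED


def assemble_target_list(chloroplast: Iterable[str],
                         ambiguous: Iterable[str],
                         validated: Iterable[str]) -> set[str]:
    """Hypothetical helper: inclusive union of the three sources.

    Ambiguous predictions are kept rather than discarded, trading extra
    false positives for fewer false negatives. Duplicate identifiers
    collapse because the result is a set.
    """
    return set(chloroplast) | set(ambiguous) | set(validated)


if __name__ == "__main__":
    print(ORIGINAL_TOTAL)  # 4455
    # Placeholder identifiers for illustration only; not real TargetP results.
    genes = assemble_target_list(["gene_c1", "gene_c2"],
                                 ["gene_amb1"],
                                 ["gene_val1", "gene_c1"])
    print(sorted(genes))
```

Using a set union makes the inclusive policy explicit: a gene needs to appear in only one source to be retained, and a gene reported by more than one source is counted once.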
In the spring of 2007 we added 426 new genes to the target list. These genes came from stromal proteomics data provided by Klaas van Wijk at Cornell University and from annotations made available through the Arabidopsis Subcellular Database (SUBA) at the ARC Centre of Excellence in Plant Energy Biology in Australia (Heazlewood et al., 2005, 2007).
In the second half of 2007 we produced a modified gene list to incorporate changes to the Arabidopsis genome annotation resulting from the release of TAIR7. This version was also updated by removing genes annotated as giving an embryo-lethal phenotype when mutated.
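The TAIR7 update and the removal of embryo-lethal genes amount to two operations on the list: mapping each identifier to its counterpart(s) in the new annotation, then subtracting a set of genes. The sketch below assumes a hypothetical mapping table from old to new locus identifiers (unchanged loci omitted, retired loci mapped to an empty list, split loci mapped to several identifiers) and a set of embryo-lethal loci; neither the function `update_gene_list` nor this table format is taken from the project's actual workflow.

```python
from collections.abc import Iterable, Mapping


def update_gene_list(genes: Iterable[str],
                     id_map: Mapping[str, list[str]],
                     embryo_lethal: set[str]) -> set[str]:
    """Hypothetical helper: carry a gene list forward to a new annotation
    release, then drop genes with an embryo-lethal mutant phenotype.

    id_map is an assumed old-ID -> new-IDs table. A locus absent from the
    map is treated as unchanged; an empty list means the locus was retired.
    """
    updated: set[str] = set()
    for gene in genes:
        # Split loci expand to several IDs; merged loci collapse in the set.
        updated.update(id_map.get(gene, [gene]))
    # Remove genes annotated as embryo lethal when mutated.
    return updated - embryo_lethal


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    genes = ["locus1", "locus2", "locus3", "locus4"]
    id_map = {"locus2": ["locus2a", "locus2b"], "locus3": []}
    result = update_gene_list(genes, id_map, {"locus4"})
    print(sorted(result))  # ['locus1', 'locus2a', 'locus2b']
```

Remapping before filtering matters: the embryo-lethal annotations would presumably be keyed to the new release's identifiers, so subtracting them from the old identifiers could miss genes whose loci were renamed or split.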
Starting in 2011, the project expanded in two ways. First, we sought to identify genes encoding plastid proteins that are essential for seed or seedling development. This work led to the identification of 520 new homozygous viable T-DNA mutants of Arabidopsis (now available from the Arabidopsis Biological Resource Center) and several dozen lethal T-DNA mutants (see Savage et al., 2013). Second, we began to identify genes that are co-regulated with enzymes of aspartate-derived amino acid biosynthesis and branched-chain amino acid degradation.
Our 2013 gene list included 200 new genes with potential roles in photosynthesis, amino acid metabolism, and other closely related processes (e.g., photorespiration).
Most plastid outer envelope membrane proteins do not carry predictable plastid-targeting sequences. Therefore, this project will not attempt a comprehensive characterization of genes encoding proteins targeted to the outer envelope membrane. We have, however, included selected proteins that have already been demonstrated to reside in the plastid outer envelope membrane and that are directly related to ongoing work in our labs.
Heazlewood JL, Verboom RE, Tonti-Filippini J, Small I, Millar AH. SUBA: the Arabidopsis Subcellular Database. Nucleic Acids Res. (2007) Jan;35(Database issue):D213-8. doi: 10.1093/nar/gkl863
Heazlewood JL, Tonti-Filippini J, Verboom RE, Millar AH. Combining experimental and predicted datasets for determination of the subcellular location of proteins in Arabidopsis. Plant Physiol. (2005) Oct;139(2):598-609. doi: 10.1104/pp.105.065532
Emanuelsson O, Nielsen H, Brunak S, von Heijne G. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol. (2000) Jul 21;300(4):1005-16. doi: 10.1006/jmbi.2000.3903
Richly E, Leister D. An improved prediction of chloroplast proteins reveals diversities and commonalities in the chloroplast proteomes of Arabidopsis and rice. Gene. (2004) Mar 31;329:11-6. doi: 10.1016/j.gene.2004.01.008