This review uses a data-driven, quantitative method to summarize the published, peer-reviewed literature about the impact of genetically modified (GM) plants on arthropod natural enemies in laboratory experiments. The method is similar to meta-analysis, and, in contrast to a simple author-vote counting method used by several earlier reviews, gives an objective, data-driven summary of existing knowledge about these effects. Significantly more non-neutral responses were observed than expected at random in 75% of the comparisons of natural enemy groups and response classes. These observations indicate that Cry toxins and proteinase inhibitors often have non-neutral effects on natural enemies. This synthesis identifies a continued bias toward studies on a few predator species, especially the green lacewing, Chrysoperla carnea Stephens, which may be more sensitive to GM insecticidal plants (16.8% of the quantified parameter responses were significantly negative) than predators in general (10.9% significantly negative effects without C. carnea). Parasitoids were more susceptible than predators to the effects of both Cry toxins and proteinase inhibitors, with fewer positive effects (18.0%, significant and nonsignificant positive effects combined) than negative ones (66.1%, significant and nonsignificant negative effects combined). GM plants can have a positive effect on natural enemies (4.8% of responses were significantly positive), although significant negative (21.2%) effects were more common. Although there are data on 48 natural enemy species, the database is still far from adequate to predict the effect of a Bt toxin or proteinase inhibitor on natural enemies.
The effect of insecticidal transgenic or genetically modified (GM) crops on natural enemies remains a controversial topic (Andow et al. 2006, Romeis et al. 2006, Marvier et al. 2007). Two major kinds of insecticidal transgenic crops have been in or are nearing possible commercial use. The several Bt crops express one or more transgenes originating from the bacterium Bacillus thuringiensis Berliner. These include the crystalline protein toxins, Cry1Ab, Cry1Ac, Cry1Fa, Cry2A, Cry3A, Cry3Bb, Cry9C, and Cry34A/Cry35 and the vegetative insecticidal protein Vip3A. The second group of insecticidal transgenes are proteinase inhibitors and include bovine trypsin inhibitor (BPTI or aprotinin; Christeller et al. 2002), snowdrop (Galanthus nivalis L.) agglutinin (GNA; Hilder et al. 1995), cowpea trypsin inhibitor (CpTI; Felton and Gatehouse 1996), and rice cystatin I (oryzacystatin I; Benchekroun et al. 1995).
Before release, transgenic plants need to undergo a risk assessment process to avoid harm to the environment (EC 2001, National Research Council 2001, EFSA 2006). During this process, beneficial organisms—such as natural enemies of insect pests—are often considered, and the initial tests typically involve laboratory experiments (Andow et al. 2006). Two recent reviews provide contrasting summaries of the published literature of laboratory studies on natural enemies and GM crops (Lövei and Arpaia 2005, Romeis et al. 2006). The reviews are not directly comparable because Lövei and Arpaia (2005) did not separate Bt crops from proteinase inhibitors, and Romeis et al. (2006) did not consider proteinase inhibitors.
More significantly, the two reviews summarized the literature in very different ways. Romeis et al. (2006) based theirs on the conclusions of the authors and used a vote counting method to compile their summary. Lövei and Arpaia (2005) based theirs on the published data tables and figures and used a weighted vote counting method to compile their summary. The weighting of the results was according to the reported SE of the control means to create multiple response categories: statistically significant differences (higher or lower response in Bt crop), and "possible" differences (see Materials and Methods section). This weighting method is similar to the meta-analysis statistic Hedges' g. As indicated by Marvier et al. (2007), simple vote counting of author conclusions is inadequate, and arguments about the safety of GM crops will remain unsatisfying without a quantitative analysis of the numerous experiments.
In this paper, we show how the method of summarizing the published results can influence the conclusions of a literature review, provide a detailed summary of the effect of insecticidal crops on natural enemies in laboratory studies, compare these results with those of Romeis et al. (2006) and Lövei and Arpaia (2005), and discuss some of the limitations of the literature.
Materials and Methods
To determine which papers would be included in this review, we maintained the criterion used by Lövei and Arpaia (2005) that only studies published in the peer-reviewed scientific literature containing original data from a laboratory or greenhouse study were admitted. In addition, an admissible study had to have a no-toxin or nontransgenic plant control and test either purified toxins in artificial diets, the transgenic plant or its parts, or extracts of the transgenic plant. To identify relevant papers, searches on databases (Web of Science, WebSPIRS) were conducted using suitable keywords, and the reference lists of the published reviews were searched (Lövei and Arpaia 2005, Romeis et al. 2006, Marvier et al. 2007). We summarized the results reported in 80 laboratory studies published through mid-2007, including the 45 reviewed earlier (Lövei and Arpaia 2005). These included 55 studies on Cry toxins and 27 studies on proteinase inhibitors (PIs); 2 covered both. In comparison, Romeis et al. (2006) reviewed 29 laboratory studies on Cry toxins and Marvier et al. (2007) reviewed 45 studies on Cry toxins. In addition to the PI studies, the main differences between our data set and the Marvier data set were that we excluded five studies that were not peer reviewed and one study lacking a proper control, included five studies that they overlooked (Kalushkov and Hodek 2005; Liu et al. 2005b,c; Zhang et al. 2006a; Duan et al. 2006), four publications that tested purified Cry toxin in artificial diet (Hilbeck et al. 1998b,1999; Romeis et al. 2004; Rodrigo-Simon et al. 2006), and six more recent publications up to mid-2007. The Marvier et al. (2007) database recorded 372 responses of natural enemies in laboratory studies. We combed through all of the publications thoroughly and recorded 1,065 responses to Cry toxins and 583 responses to PIs in our data set.
Furthermore, to refine the approach by Lövei and Arpaia (2005), we considered the different transgene product classes separately. Cry toxins were separated into three classes: Cry1Ab/Cry1Ac/Cry2A, Cry3A/Cry3Bb, and Cry9A/Cry9C. In addition, the stacked Cry1A + CpTI (cowpea trypsin inhibitor) was also separated. All of the PIs were combined and included aprotinin, jackbean lectin (concanavalin A), CpTI, GNA, the barley cystatin (HvCPI), and oryzacystatin I. The majority of the PI responses were to GNA. To account for the possible over-representation of studies on a single natural enemy, natural enemies with >100 responses to a single transgene class were also evaluated separately. This included Chrysoperla carnea Stephens (185 responses to Cry1/Cry2), Propylea japonica (Thunberg) (102 responses to Cry1/Cry2), Coleomegilla maculata De Geer (101 responses to Cry3), Campoletis chloridae Uchida (107 responses to Cry1/Cry2), and Eulophus pennicornis (Nees) (145 responses to PIs).
Responses were classified into the following categories: behavior, development, growth, survival/mortality, reproduction, sex ratio, and enzyme activity. Behavior included choice (e.g., response to plant volatiles), feeding preference, flip time (time until an upturned adult coccinellid righted itself), landing frequency, prey consumption, proportion of prey eaten, reaction time to prey, spider web morphology, and walking speed. Development was typically a period of time related to immature development, but also included time to first oviposition and other developmental parameters associated with reproduction. Occasionally, development was expressed as the proportion of individuals reaching a certain developmental stage by some time period (e.g., the percent of larvae reaching the third instar by day 10). Growth was a mass measurement and could be either dry body mass or fresh body mass. Survival or mortality was typically provided as percent surviving or percent mortality, but it was also expressed as longevity, especially for adult stages. Reproduction typically was a measure of fecundity but also included clutch size, egg load, mating frequency, and measures of reproduction per host or prey, such as parasitism rate, offspring per primary host, and offspring per host plant. Sex ratio was expressed as percent females or female:male ratio.
For all data, the experimental treatment was compared with the control, and the measure of the effect size was the experimental mean - control mean. A positive value indicated that the magnitude of the experimental mean was larger than the magnitude of the control mean. For behavioral responses such as flip time, landing, consumption, reaction time, web morphology, and walking speed, faster, more, and larger were considered "better" than slower, less, and smaller, so the effect size measure indicated how much better (for positive effect sizes) or worse (for negative effect sizes) the experimental response was. Other behavioral responses, such as choice and preference, could not be readily classified as better or worse, and these were reported as effect sizes. For development, slower development is usually worse than faster development, so the effect size was reversed (control - experimental), so that positive values still indicated that the response to the experimental treatment was better than to the control. Growth and reproduction were considered to be better when larger and producing more offspring, so a positive effect size indicated that the experimental treatment was better than the control. For mortality and survival, higher survival and longevity were better. However, lower mortality was considered better, so here the effect size was reversed (control - experimental). It was not clear whether a female- or male-biased sex ratio is better. This is likely to depend on the species and test environment, so the effect size was reported with no judgment about which was better. Higher enzyme activity was considered better than lower activity.
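The sign conventions above can be illustrated with a minimal Python sketch. The function name, the response-class labels, and the grouping into sets are hypothetical and chosen only for illustration; they are not taken from the original analysis.

```python
# Hypothetical labels for response classes where a lower raw value is "better",
# so the effect size is reversed (control - experimental).
LOWER_IS_BETTER = {"development_time", "mortality"}

# Hypothetical labels for classes reported without a better/worse judgment.
NO_DIRECTION = {"choice", "preference", "sex_ratio"}


def effect_size(experimental_mean, control_mean, response):
    """Return (effect, has_direction) for one quantified response.

    A positive effect means the experimental treatment did "better" than
    the control whenever has_direction is True.
    """
    diff = experimental_mean - control_mean
    if response in LOWER_IS_BETTER:
        return -diff, True  # reversed: control - experimental
    return diff, response not in NO_DIRECTION


# Development took 12 d on the treatment and 10 d on the control: worse.
print(effect_size(12.0, 10.0, "development_time"))
```

Keeping the direction flag separate from the numeric effect makes it explicit that choice, preference, and sex ratio differences are recorded but not judged.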
We kept the effect size measure originally applied in Lövei and Arpaia (2005) to evaluate these data. We sorted the quantified responses into five effect size classes: significant negative, nonsignificant negative, neutral, nonsignificant positive, or significant positive. The level of significance was set at P < 0.05. A difference was classified as nonsignificantly different if it was larger than the pooled SE of the control and treatment. For normally distributed errors and sufficiently large sample sizes, this is approximately equivalent to P < 0.30, which allows us to statistically test whether there are significantly more non-neutral effects than expected (that is, whether the effects depart from a random distribution around no effect). This criterion enabled us to score 87.3% of the responses, whereas 209 responses (12.7%) could not be scored. Many of the responses that could not be scored were categorical data and were presented as contingency tables in the original publication (e.g., survival rates). We extracted the raw binomial responses (e.g., insects alive or dead) from the information presented in the publications and performed Pearson's χ2 analysis of contingency tables, including Yates correction for cells with observed frequencies <5. We classified responses as negative significant or positive significant when the experimental treatment was significantly worse (or better) than the control based on Pearson's χ2 with P < 0.05; negative not significant or positive not significant if 0.05 < P < 0.3; and neutral otherwise. This allowed us to score a further 162 of the 209 unscored responses (77.5%). The remaining 47 responses (2.9%, 20 data values from Cry toxin experiments and 27 involving PIs) did not report treatment and control means (21 responses from the publications by Birch et al. 1999, Bell et al. 2003, Romeis et al. 2003, Pruetz and Dettner 2004, Pruetz et al. 2004, Sanders et al. 2007) or did not report measures of variance of the means (26 responses: Bell et al. 2001b, 2003; Baur and Boethel 2003; Down et al. 2003; Romeis et al. 2003; Schuler et al. 2004; Sharma et al. 2007). It is unlikely that these missing data could strongly affect the results of this analysis.
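The two scoring rules described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used: the function names are hypothetical, the "pooled SE" is assumed to be the SE of the difference between two means (the square root of the summed squared SEs), and the Yates correction is applied whenever any cell count is below 5.

```python
import math


def classify_continuous(effect, se_control, se_treatment, p_value=None):
    """Sort a direction-adjusted effect into one of five effect size classes."""
    if effect == 0:
        return "neutral"
    sign = "positive" if effect > 0 else "negative"
    # Assumption: pooled SE taken as the SE of the difference of two means.
    pooled_se = math.hypot(se_control, se_treatment)
    if p_value is not None and p_value < 0.05:
        return f"significant {sign}"
    if abs(effect) > pooled_se:
        return f"nonsignificant {sign}"
    return "neutral"


def chi2_2x2(a, b, c, d):
    """Pearson chi-square and P value (df = 1) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty margin carries no information
    diff = abs(a * d - b * c)
    if min(a, b, c, d) < 5:
        diff = max(0.0, diff - n / 2)  # Yates continuity correction
    chi2 = n * diff ** 2 / denom
    # For df = 1 the upper tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def classify_binomial(alive_t, dead_t, alive_c, dead_c):
    """Score a survival contingency table (treatment row, control row)."""
    _, p = chi2_2x2(alive_t, dead_t, alive_c, dead_c)
    if p >= 0.3:
        return "neutral"
    better = alive_t / (alive_t + dead_t) > alive_c / (alive_c + dead_c)
    sign = "positive" if better else "negative"
    return f"significant {sign}" if p < 0.05 else f"nonsignificant {sign}"


# 30 of 50 survive on the treatment versus 45 of 50 on the control.
print(classify_binomial(30, 20, 45, 5))
```

The closed-form tail probability for one degree of freedom keeps the sketch free of external dependencies; a statistics library would normally be used instead.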
To arrive at a synthesis, the number of responses in each of the above five effect classes was counted. We counted all quantified responses in each paper. For example, if mortality was estimated instar by instar, we considered each of these responses separately (but did not include total larval mortality). This is different from the approach by Lövei and Arpaia (2005) where, in such cases, only the summary mortality was used, leading to one value versus the current two to five responses (depending on the life history; for example, number of larval stages of the species concerned or the duration of the study). Individual instar responses may be more informative than the summary statistics, especially if the test population is heterogeneous for tolerance to a toxin (Vaupel and Yashin 1985). If there is any heterogeneity for tolerance, the less tolerant individuals will die earlier than the more tolerant ones, creating complex instar-specific mortality schedules and patterns of development times. We also evaluated how well the instar-specific responses matched the total summary response. We collected all of the cases in the literature with instar-specific responses and the summary responses for development time and survival (or mortality). There were 89 such cases in 25 studies on development time and 38 cases in 16 studies on survival. We considered several match criteria: (1) the match between the reported or calculated statistical significance; (2) the match between the sign of the responses (zero matched both signs); and (3) the match between the sign and relative magnitude of the effect, using the five-category classification system described above. A few papers reported cumulative effects over increasingly longer periods of time (Zwahlen et al. 2000, Duan et al. 2002). When evaluating these studies, matches between adjacent time periods were considered. 
We counted the number of matches between instar-specific responses and the total response within studies and pooled these counts across studies.
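One possible reading of the three match criteria is sketched below in Python. The integer coding of the five effect classes (-2 = significant negative, -1 = nonsignificant negative, 0 = neutral, 1 = nonsignificant positive, 2 = significant positive) and the function names are assumptions made for illustration, as is the interpretation of criterion (1) as "both significant or both not significant".

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def match_counts(instar_classes, total_class):
    """Count matches between instar-specific classes and the total response.

    Classes are coded -2..2 (hypothetical coding, see the text above).
    """
    total_sig = abs(total_class) == 2
    significance = sum((abs(c) == 2) == total_sig for c in instar_classes)
    # Criterion (2): a zero (neutral) response matches either sign.
    same_sign = sum(
        c == 0 or total_class == 0 or sign(c) == sign(total_class)
        for c in instar_classes
    )
    # Criterion (3): sign and relative magnitude, i.e. the same class.
    same_class = sum(c == total_class for c in instar_classes)
    return {
        "n": len(instar_classes),
        "significance": significance,
        "sign": same_sign,
        "class": same_class,
    }


# Four instar-specific responses compared with a nonsignificant negative total.
print(match_counts([-2, -1, 0, 1], -1))
```

Pooling across studies then amounts to summing these per-study counts for each criterion.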
By including all responses in a published paper, papers that report more responses may have a greater effect on our interpretation of the data. The mean number was 20.6 responses per published study (range, 1-68). Although this is high variation, the study with 68 responses comprised only 4.1% of all reported responses. Consequently, although these papers have a larger effect on our summary than others, they do not dominate the data overall or in any one response category.
This detailed, data-driven reading ("micro-reading"; Tufte 1997) of the quantitative data (rather than the summary evaluation by the authors) resulted in more data points and provided a more accurate picture of the literature than the summary method used by many others (e.g., O'Callaghan et al. 2005 and Romeis et al. 2006). To take one example, Romeis et al. (2006) reported "no effect" for the response of a natural enemy, a coccinellid beetle, in one Chinese study, involving bitrophic exposure to Bt pollen (Bai et al. 2005). Bai et al. (2005), however, measured and compared 18 predator response parameters each on two different Bt varieties, two of which were significantly negative with respect to the control, 10 nonsignificantly negative, 10 nonsignificantly positive, 2 significantly positive, and the rest (12) neutral (Table 1).
Data from Bai et al. (2005).
The results were evaluated by comparing the distribution of responses to a null hypothesis that the effect sizes were randomly distributed around "no effect." Effect size divided by its SE is normally distributed, so the significant positive and significant negative effects each have an expected frequency of 0.025, and the "nonsignificant positive" and "nonsignificant negative" effects have an expected frequency of 0.1337 (1 SE). If the number of responses was <10, the response was not analyzed, because the numbers were too small. If the number of responses was >80, the test was for all five effect sizes (df = 4). Otherwise, the two positive effects and the two negative effects were combined, and the test was on three effect sizes (df = 2). In addition, skewness in the effect (are there more negative effects than positive ones) was tested by comparing the number of responses in the two positive effect classes with the number in the two negative effect classes (df = 1). All tests were conducted using log-linear contingency table analysis.
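The expected frequencies and the tests described above can be reproduced with a short Python sketch. It uses a G (log-likelihood ratio) goodness-of-fit statistic as a stand-in for the log-linear contingency table analysis, which is an assumption about how that analysis might be approximated; the function names and the order of the counts are likewise hypothetical.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


SIG = 1.0 - phi(1.96)           # about 0.025 in each tail
NONSIG = phi(1.96) - phi(1.0)   # about 0.1337 on each side (beyond 1 SE)
NEUTRAL = 2.0 * phi(1.0) - 1.0  # about 0.6827 within 1 SE


def g_statistic(observed, expected_props):
    """G = 2 * sum(O * ln(O / E)); cells with O = 0 contribute nothing."""
    n = sum(observed)
    g = 2.0 * sum(
        o * math.log(o / (n * e)) for o, e in zip(observed, expected_props) if o > 0
    )
    return max(0.0, g)


def randomness_test(counts):
    """counts = [sig neg, nonsig neg, neutral, nonsig pos, sig pos].

    Returns (G, df, P), or None when there are fewer than 10 responses.
    """
    n = sum(counts)
    if n < 10:
        return None
    if n > 80:
        g = g_statistic(counts, [SIG, NONSIG, NEUTRAL, NONSIG, SIG])
        return g, 4, math.exp(-g / 2) * (1 + g / 2)  # chi-square tail, df = 4
    merged = [counts[0] + counts[1], counts[2], counts[3] + counts[4]]
    g = g_statistic(merged, [SIG + NONSIG, NEUTRAL, SIG + NONSIG])
    return g, 2, math.exp(-g / 2)  # chi-square tail, df = 2


def skewness_test(counts):
    """Compare negative with positive responses against a 1:1 split (df = 1)."""
    neg, pos = counts[0] + counts[1], counts[3] + counts[4]
    g = g_statistic([neg, pos], [0.5, 0.5])
    return g, math.erfc(math.sqrt(g / 2))


print(round(SIG, 4), round(NONSIG, 4), round(NEUTRAL, 4))
```

The chi-square tail probabilities are written in closed form because they are exact for 1, 2, and 4 degrees of freedom, which are the only cases needed here.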
Comparison of Instar-specific Responses with Total Summary Responses.
Matching between instar-specific responses and the total summary response was 50-84%, depending on the matching criterion and the response (Table 2, development time and survival). If the instar-specific responses were completely redundant to the total response, there should be perfect matching except a few nonmatches arising at random. If instar-specific responses provided some additional information not contained in the total response, there should be a degree of nonmatching beyond what might be expected from random nonmatches. Allowing the normal type I error rate of 0.05, we calculated the probability that mismatches are random (i.e., that instar-specific responses provide no additional information beyond that in the total response) and found that, for all match criteria and both responses, this probability was sufficiently low to allow us to claim that the instar-specific responses contained information not contained in the total response. As might be expected, the instar-specific responses had less matching (more information) for the criterion including both sign and magnitude of the effect, but even when limited to the sign of the effect or a match of statistical significance, the instar-specific responses provided additional information. These results justify our use of instar-specific response values.
Studies were Ahmad et al. (2006); Ashouri et al. (2001); Bai et al. (2006); Bell et al. (1999, 2001b); Down et al. (2000); Duan et al. (2006); Dutton et al. (2002); Ferry et al. (2003); Gonzalez-Zamora et al. (2007); Hilbeck et al. (1998a, b, 1999); Lundgren and Wiedenmann (2002); Pilcher et al. (1997); Ramirez-Romero et al. (2007); Rodrigo-Simon et al. (2006); Schuler et al. (2004); Setamou et al. (2002b); Sharma et al. (2007); Tomov and Bernal (2003); Vojtech et al. (2005); Zhang et al. (2006b); Zwahlen et al. (2000).
Studies were Ashouri et al. (2001); Bernal et al. (2002); Down et al. (2000, 2003); Duan et al. (2006); Dutton et al. (2002); Hilbeck et al. (1998a, b, 1999); Liu et al. (2005b); Lundgren and Wiedenmann (2004); Pilcher et al. (1997); Rodrigo-Simon et al. (2006); Schuler et al. (2004); Tomov and Bernal (2003); Zwahlen et al. (2000).
Scope and Quality of the Published Data.
The number of species studied has increased slightly compared with the Lövei and Arpaia (2005) review. A total of 27 species of predators and 21 species of parasitoids have been studied in at least one laboratory experiment. However, significant imbalances remain. Most of the predator studies have focused on one species of Chrysopidae, C. carnea, and two species of Coccinellidae, C. maculata and P. japonica. Most of the parasitoid studies have been conducted on 12 species of Ichneumonoidea (536 responses, 65.2% of all parasitoid responses), and an additional 17.6% of the responses were on one species of Eulophidae. No predaceous Diptera, Orthoptera s.l., Plecoptera, or Odonata have been studied, and only one study each has been conducted on spiders and predaceous mites (Table 3). There have been no studies on any species in the Hymenoptera superfamilies Bethyloidea, Ceraphronoidea, Evanioidea, Platygastroidea, and Proctotrupoidea and no studies on parasitic Diptera.
The geographic distribution of these studies has also widened since 2005 (Table 3). However, nearly all of the measured responses are from China, the United States, and western Europe. Although Bt crops are commercially used in Argentina, Brazil, Australia, and South Africa, we found no published, peer-reviewed laboratory studies on natural enemies from these countries. The selection of natural enemy groups to be involved in laboratory studies in different geographical locations showed no apparent biological criteria.
A total of 1,648 responses have been published in the peer-reviewed literature (Table 3). Excluding 47 responses lacking reported means or variances, we could analyze 1,601 responses, including 812 on predators and 789 on parasitoids. There is considerable variation in the quality of the studies, whether related to sample size, statistics, or the accuracy of the measurements. For example, the number of individuals tested in a treatment varied from 20 to >200, with the majority of studies using 30-80 individuals per treatment. Needless to say, larger numbers provide more reliable results. Only five studies had ≥150 individuals per treatment (Hilbeck et al. 1998a,b; Birch et al. 1999; Vojtech et al. 2005; Ramirez-Romero et al. 2007). Studies that replicated the entire experiment more than once are stronger, because they eliminate potential correlations among replicate individuals related to the time the experiment was conducted. However, replication over time was not common. Johnson et al. (1997) and Hilbeck et al. (1998b) had the greatest number of replicate experiments (seven and five replicate experiments, respectively). Statistical reporting was highly variable. The vast majority of studies did not report all significance (P) values, several did not report measures of variance for all of the observed sample means, and a few did not report all sample means. Some studies did not report actual sample sizes when sample size varied among treatment groups (Romeis et al. 2003, Hogervorst et al. 2006). Measurement accuracy also varied somewhat among studies. For example, Bai et al. (2005) examined their developing larvae once every 3 h for the duration of development to adult, enabling very accurate measures of development time and time of mortality. Most studies record these data only once a day.
Responses of Natural Enemies.
Effects were not randomly distributed for any of the natural enemy groupings (Table 4, test for random effects). For predators, this was caused by fewer neutral responses and more positive and negative effects than expected. Overall, there were similar numbers of positive and negative responses for predators (Table 4, test for skewness). Most observations were on Cry1A/Cry2A (51.0%), with similar numbers on Cry3 and PIs. For parasitoids, there were both fewer neutral responses than expected and significantly more negative responses than positive ones (Table 4). The parasitoids that have been studied were more sensitive to Cry toxins and PIs than the predators. Most observations involved Cry1A/Cry2A (46.5%) and PIs (39.3%; Table 4).
Responses of species with disproportionally numerous studies are presented separately.
For predators, when exposed to Cry1/Cry2 toxin, 34.8% of the quantified responses fell into the neutral category, and without the data on C. carnea and P. japonica, 36.2% of the responses were neutral (Table 4). All were significantly less than predicted. C. carnea, however, seemed to be more sensitive than other predators to Cry1/Cry2 toxin: 2.7% of the responses showed a significant positive effect, and 16.8% were significantly negative (Table 4). For other predators, negative effects still prevailed (38.2% for P. japonica; 35.4% for all others) but a greater number of positive effects were recorded (31.4% for P. japonica; 28.4% for all others; Table 4). The beetle-specific Cry3A/Bb caused fewer effects in either direction (42.5% neutral), and this was similar for C. maculata (41.6% neutral) and all other predators (44.4% neutral) (Table 4). The PIs had fewer neutral (27.2%) and more statistically significant effects (24.4% negative, 13.0% positive; Table 4).
The bias toward a few predator species is evident: 47.8% of all predator responses were measured on only three species, C. carnea, P. japonica, and C. maculata. Observations on C. carnea and P. japonica comprised 69.3% of the responses to Cry1A/Cry2A and observations on C. maculata comprised 69.2% of the responses to Cry3.
Parasitoids in general were more susceptible to the effects of both Cry toxins and proteinase inhibitors, with fewer positive effects (always <26%, significant and nonsignificant combined) and more negative ones (between 42.1 and 75.0% significant and nonsignificant combined). There was no marked difference in the effect of Cry toxins versus PIs, although PIs were more likely to have positive effects and less likely to have negative ones (Table 4). Two species, C. chloridae and E. pennicornis, accounted for 30.3% of all of the observations, including 45.2% of all observation on proteinase inhibitors. C. chloridae may be more sensitive to Cry1A/Cry2A than the other parasitoids studied, and E. pennicornis may be less sensitive to proteinase inhibitors than the other parasitoids studied (Table 4).
Considering the sensitivity of the response classes studied (Tables 5 and 6), for predators, 22 of 35 comparisons were significantly nonrandom (P < 0.05; Table 5) with fewer neutral responses than expected. The number of observations in the 12 nonsignificant classes averaged only 16.8 (range, 11-34) compared with 51.0 (range, 20-156) for the significantly nonrandom classes, which suggests that some of the nonsignificant classes are type II errors (nonsignificant only because of a small sample size). When comparing classes with similar sample sizes, none of the classes appeared to be more sensitive at detecting non-neutral responses. For 31 of the 35 classes, positive responses were similar to negative responses. Only Cry1A/Cry2A survival had more negative responses than positive ones, and this was because of the response of C. carnea, which accounted for most of the negative responses by predators to Cry1A/Cry2A (Table 5). Survival and reproduction responses to PIs also had more negative responses than positive ones (Table 5), suggesting that PIs have more negative effects on predators than Cry toxins.
Responses of species with disproportionally numerous studies are presented separately.
Extracted from studies by Ahmad et al. (2006); Al Deeb et al. (2001); Alvarez-Alfageme et al. (2007); Armer et al. (2000); Ashouri et al. (1998); Bai et al. (2005, 2006); Bell et al. (2003); Bernal et al. (2002); Birch et al. (1999); Bouchard et al. (2003a, b); Burgess et al. (2002); Davidson et al. (2006); Dogan et al. (1996); Down et al. (2000, 2003); Duan et al. (2002, 2006); Dutton et al. (2002); Ferry et al. (2003); Gonzalez-Zamora et al. (2007); Harwood et al. (2006); Hilbeck et al. (1998a, b, 1999); Hogervorst et al. (2006); Jørgensen and Lövei (1999); Kalushkov and Hodek (2006); Kalushkov and Nedved (2005); Lozzia et al. (1998); Ludy and Lang (2006); Lundgren and Wiedenmann (2002, 2004, 2005); Meier and Hilbeck (2001); Meissle et al. (2005); Mullin et al. (2005); Pilcher et al. (1997); Ponsard et al. (2002); Riddick and Barbosa (1998, 2000); Rovenska et al. (2005); Rodrigo-Simon et al. (2006); Romeis et al. (2004); Schuler et al. (2005); Zhang et al. (2006a, b); and Zwahlen et al. (2000).
Significance level was set at P < 0.05.
Responses of species with disproportionally numerous studies are presented separately.
Extracted from studies by Ashouri et al. (2001); Baur and Boethel (2003); Bell et al. (1999, 2001a, b, 2004); Bernal et al. (2002); Couty and Poppy (2001); Couty et al. (2001a, b, c); Davidson et al. (2006); Geng et al. (2006); Johnson et al. (1997); Liu et al. (2005a, b, c); Pruetz and Dettner (2004); Pruetz et al. (2004); Ramirez-Romero et al. (2007); Romeis et al. (2003); Sanders et al. (2007); Schuler et al. (2001, 2003, 2004); Setamou et al. (2002a, b); Sharma et al. (2007); Tomov and Bernal (2003); Tomov et al. (2003); Turlings et al. (2005); Vojtech et al. (2005); and Wang et al. (2007).
Significance level was set at P < 0.05.
For parasitoids (Table 6), 25 of 31 class responses were significantly nonrandom (P < 0.05), with fewer neutral responses than expected. The number of observations in the six nonsignificant classes averaged 16.7 (range, 12-19) compared with 52.0 (range, 18-136) for the significantly nonrandom classes, which suggests that some of the nonsignificant classes are type II errors. For 12 of the 31 response classes, there were more negative responses than positive ones (Table 6). This was particularly true for parasitoids tested with Cry1A/Cry2A (including Cry1A + CpTI), for which 12 of 15 class responses were significantly more negative than positive (Table 6). Growth seemed to be a more sensitive response than development. A greater proportion of the responses for growth were negative than for development within toxins and parasitoid species (Table 6).
The existing data on the effects of transgenic insecticidal proteins on natural enemies are still incomplete, with respect to the toxins studied, the species of natural enemies evaluated, and the geographic distribution of the studies (Lövei and Arpaia 2005), although we now have data from 48 natural enemy species. Cry1Ab and Cry1Ac are generally considered Lepidopteran-specific toxins, but both are toxic to some Dipteran species (Haider et al. 1986, Omolo et al. 1997), and toxicity to a Dipteran natural enemy still has not been evaluated in the laboratory. Representatives of other groups that are important in pest control (e.g., non-Ichneumonoid parasitoids) have been less studied. C. carnea is over-represented in the literature: 22.8% of all predator responses have been measured on C. carnea and Lepidopteran-active Cry1Ab or Cry1Ac, often in maize or cotton.
One can only speculate why this single species features so frequently among studies done on natural enemies. Lövei and Arpaia (2005) pointed out that the process of selecting species seemed ad hoc or governed by availability and familiarity. C. carnea is one of the standard organisms in pesticide side effect studies and several authors involved in such studies have also authored papers in the GM nontarget fields (Hilbeck et al. 1998, Sterk et al. 1999). However, selection of this species to assess risks may result in false conclusions, because taxonomy is not a suitable guide to assess the reaction of predators to tri-trophic impacts (Malcolm 1992).
Sensitivity of C. carnea is slightly greater than for the remaining predators (Table 4), but the greater difference is that significant negative effects of Cry1A/Cry2A on C. carnea were 6.2 (31/5) times more likely to occur than significant positive ones, whereas negative effects on the remaining predators were half (7/14) as likely as positive ones. In addition, survival of C. carnea had many significant negative effects, whereas only development had several significant negative responses for C. carnea or the other predators. These deficiencies in our present understanding underline the importance of using a systematic, transparent, and multi-criteria procedure to select relevant natural enemy species for biosafety tests (Andow et al. 2006), which can lead to quick and efficient assessment of risk hypotheses associated with species that are ecologically important.
In addition, our review highlights another important gap. There are several recent cry transgenes for which published data barely exist, including Cry1Fa, Cry34/35, and Vip3A. We have more familiarity with Cry1Ab/Cry1Ac and Cry2A, which are common components of insecticidal formulations that have been used for decades in agriculture, than with these new Cry toxins. Both the Cry1Ab and Cry1Ac toxins are members of the large family of three-domain Cry toxins, meaning that they share homologous amino acid sequences in three regions, which are implicated in receptor-specific binding and toxin specificity. Four distinct classes of receptors have been identified: cadherin-like proteins, aminopeptidases, alkaline phosphatases and certain glycolipids (Griffiths et al. 2005), and it is clear that the understanding of receptor and toxin specificity is far from complete. Even well-studied Cry toxins have an incompletely determined range of toxicity (van Frankenhuyzen and Nystrom 2002). Although it is clear that Cry1Ab and Cry1Ac are toxic mainly to Lepidopteran species, it is not yet possible to infer toxin specificity from toxin structure, and thus toxin specificity of a Cry toxin is a scientific hypothesis, not a scientific fact. Moreover, truncation and mutagenesis of synthetic toxins might alter their range of toxicity compared with the native toxins. It is premature to suggest that the results from a few studies on the three-domain Cry1Ab, Cry1Ac, Cry2A, Cry3Aa, Cry3Bb, and Cry9C will generalize to all Bt toxins, including the nonspecific Vip and Cyt toxins, the Bin-like Cry35Aa (Crickmore et al. 2007), the three-domain Cry31Aa (Mizuki et al. 2000), or even the 19 kinds of Cry1Ab and 20 kinds of Cry1Ac (Crickmore et al. 2007).
When all effects of transgenic crops and transgene products on natural enemies are examined (Table 4, excluding C. carnea), 10.8% of the predator responses were significantly negative, which is significantly more than the expected 2.5%. In addition, 26.5% of the predator responses were negative, but not significant, which exceeds the expected 13.4%. Similarly, positive effects on predators exceeded expectation (4.9% were significantly positively affected; Table 4). Laboratory studies on parasitoids also indicate that Cry toxins and proteinase inhibitors have non-neutral effects on parasitoids. Negative effects were significantly more frequent than expected (56.6%), whereas positive effects were similar to expectation (observed 18.0%, expected 14.9%). Together, these observations indicate that Cry toxins and PIs often have effects on natural enemies that are not neutral. The results clearly indicate that there are potential adverse effects that should be assessed, even for the more widespread Bt crops. We do not imply that any of these adverse effects will result in an adverse effect on the environment, but we do suggest that there is a continued need for case-specific environmental risk assessment for Bt and PI crops.
Our review also suggests that new transgenes, such as PIs (neutral = 27.2%), may be more likely to have effects on predators than Cry toxins (neutral = 37.1%; Table 4). PIs and Cry toxins may be equally likely to have effects on parasitoids (PI neutral = 29.0%; Cry neutral = 23.8%). Proteinase inhibitors may reduce the efficiency of converting ingested food in a broader range of predatory arthropods than the Cry toxins do.
Mortality has been one of the most frequently used response parameters in laboratory experiments with natural enemies. For both predators and parasitoids, mortality is not a more sensitive response parameter than any of the other response classes. There is no empirical reason to focus on mortality as the main response of natural enemies in laboratory (tier 1) experiments.
Although it is likely that some of the instar-specific responses are positively correlated, we did not attempt to estimate the degree of correlation or to control for this possibility. If we were to remove positively correlated responses, we would reduce the number of similar responses, which would increase the variation in the responses. Consequently, the instar-specific analysis in this paper can be considered an underestimate of the degree of variation in responses to Cry toxins and proteinase inhibitors. Additional examination of these instar-specific responses would be useful because it is not entirely clear how these data are best weighted.
Vote-counting of summary evaluations is a common method used in reviews of the effects of transgenes on natural enemies (Romeis et al. 2006). It is a misleading review technique for two reasons. First, it ignores the variation in observed responses and accepts a single summary conclusion. Although individual authors undoubtedly use their best scientific judgment to draw conclusions based on their own data, these conclusions rely on judgment, and we believe that considering the original quantitative data when synthesizing the published literature gives more objective results and interpretations. As a hypothetical example, if each of several studies found one statistically significant effect and nine nonsignificant effects of a transgene product on a natural enemy, each author would probably conclude that the transgene product had no apparent effect on the natural enemy, and a review based on these summary evaluations would conclude that there were no effects. In reality, however, the transgene product had a significant effect in 10% of the tests, which, when enough studies are pooled, is significantly greater than the 5% expected at random. This is the most serious deficiency in using summary evaluations to review the effects of transgenic insecticidal toxins on natural enemies.
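The hypothetical example in the preceding paragraph can be checked with a short calculation. The sketch below assumes, purely for illustration, that 10 studies each report 10 tests (the text specifies only "several" studies) and that 5% of tests would be significant by chance at the P < 0.05 level. The function name and these numbers are illustrative assumptions, not values taken from the reviewed literature. It computes the exact binomial probability of observing at least the pooled number of significant results by chance alone.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, chance: float) -> float:
    """Exact P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1.0 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumed illustration: every study has 1 significant and 9 nonsignificant tests.
n_studies = 10         # hypothetical; the text says only "several"
tests_per_study = 10   # 1 significant + 9 nonsignificant, as in the example
alpha = 0.05           # nominal chance rate of a significant result

n_tests = n_studies * tests_per_study
n_significant = n_studies * 1

pooled_p = binomial_upper_tail(n_significant, n_tests, alpha)
print(f"{n_significant}/{n_tests} significant; "
      f"P(at least this many by chance) = {pooled_p:.4f}")
```

Under these assumptions the pooled probability falls below 0.05, although no single study would have reported an effect. With only a few studies the same 10% rate would not be distinguishable from chance, so the strength of the conclusion depends on how many responses are pooled.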
Second, vote counting often considers only effects that satisfy the arbitrary P < 0.05 level of significance, ignoring the observed magnitude of the effect. What if the 1,648 observed responses in this review each had a negative effect with a significance level of P = 0.06? Clearly this is unlikely, but a review based on summaries would conclude that there were no cases of a negative effect of a transgenic insecticidal toxin on a natural enemy, whereas a summary following our method would have concluded that all effects were negative. A central conceptual flaw with summary reviews is that they fail to consider the direction and size of the effect observed in each study.
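The importance of direction can be illustrated with a one-sided sign test, which asks how likely it is that so many responses share a sign if negative and positive deviations were equally probable. This is a minimal sketch of one simple approach, not necessarily the procedure used in this review; the function name is hypothetical, and the only number taken from the text is the 1,648 responses of the thought experiment. The result is reported as a base-10 logarithm because the probability itself is too small to represent as a floating-point number.

```python
from math import comb, log10


def sign_test_log10_p(n_negative: int, n_total: int) -> float:
    """log10 of the one-sided sign-test P value.

    P = P(X >= n_negative) for X ~ Binomial(n_total, 0.5), i.e. the null
    hypothesis that a response is as likely to be negative as positive.
    """
    # Count the outcomes at least as extreme, using exact integer arithmetic.
    tail_count = sum(comb(n_total, k) for k in range(n_negative, n_total + 1))
    # P = tail_count / 2**n_total; take logarithms to avoid float underflow.
    return log10(tail_count) - n_total * log10(2)


# Thought experiment from the text: all 1,648 responses are negative,
# yet each one individually misses the P < 0.05 threshold.
log_p = sign_test_log10_p(1648, 1648)
print(f"log10(P) = {log_p:.1f}")
```

A vote count based on the P < 0.05 threshold would record zero effects in this scenario, whereas the direction of the responses alone would be overwhelming evidence against neutrality.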
In summary, a detailed review of the existing quantitative data, in which the different classes of transgene products were considered separately, supports and elaborates the conclusions reached earlier (Lövei and Arpaia 2005). Based on our review of the literature, it is clear that conclusions that Bt and PI transgene products cause "no harm" to natural enemies are currently overgeneralized and premature. Additionally, there is continued overrepresentation of one species (C. carnea), which seems to be more sensitive to Bt transgenic plants than other predators. Our results also suggest that an even more detailed analysis of the published literature may reveal useful generalizations.
These findings suggest that it is essential to retain flexibility in environmental risk assessment methodologies. We believe that understanding and retaining risk assessment alternatives will have considerably greater ramifications for the development of plant biotechnology than quantifications of risk for specific transgenic crop products. Many transgenic crops have the potential to reduce the use of harmful pesticides, and there is a desperate need to improve environmental risk assessment so that the likelihood of realizing environmental benefits is increased and the likelihood of environmental harm is reduced. We are optimistic that the rapidly accumulating base of empirical knowledge, if the current imbalances are purposefully eliminated by new, targeted studies, will soon make this possible.
We thank our colleagues in the GMO-ERA and BiosafeTrain Projects for discussions and Editor A. Cameron and three anonymous reviewers for thoughtful comments on the manuscript.