Abstract

Background

Occupational colour vision testing is required in several transport industries, and a number of different tests are considered acceptable by the various industry regulatory bodies.

Aims

To review the occupational colour vision tests currently in use nationally and internationally and determine whether they give consistent results.

Methods

A systematic review of the evidence was carried out according to standard methods. The Ovid Medline database was searched from 1946 to March 2013 using a broad and inclusive strategy.

Results

A total of 8951 citations were identified, from which 20 papers were selected for data analysis. Of these, 13 assessed test sensitivity and specificity, and 11 measured the number, type and severity of colour vision deficiency of subjects passing the tests. Three studies also measured test repeatability. The quality of the included studies was generally good. Sensitivity and specificity ranged from 64% to 100% and 88% to 100%, respectively. The studies evaluating the newer screen-based tests reported the highest sensitivity and specificity. The marked variability reported between tests and within tests can be attributed to many factors including test protocol, sample selection, test distance and time for dark adaptation.

Conclusions

There was low consistency between the colour vision tests examined. Lantern tests cannot be used to identify type or severity of colour vision deficit and, when used as a screening test for ‘colour safe’ status, give variable results. These results highlight the need for standardization across the transport industries.

Introduction

It has long been recognized that people with colour vision deficiency (CVD) have difficulty in recognizing the colours of signal lights, and that this can present safety risks in a number of industries [ 1 ]. CVD can be broadly classified into monochromacy, dichromacy and anomalous trichromacy. Monochromacy normally results in poor visual acuity (VA) and does not present an occupational problem. Dichromacy and anomalous trichromacy, which do not normally affect VA, can be further categorized into protan, deutan and tritan subsets (see Table 1 ).

Table 1.

Description of CVD types

| Classification | Prevalence (a) | Mechanism | Characteristics |
|---|---|---|---|
| Monochromacy | | | |
| Typical monochromacy | Rare | No normal photopigments. | Colour blind. Colours distinguished by brightness differences only. Very insensitive to red light. Nystagmus. Low visual acuity 6/36 to 6/60. |
| Blue cone monochromacy | Rare | S (blue) cone pigment only. No L (red) or M (green) cone pigment. | Colour blind. Colours distinguished by brightness differences only. Insensitive to red light. Nystagmus. Low visual acuity 6/12 to 6/24. |
| Atypical monochromacy | Very rare | Mechanism unknown. | Colour blind. Colours distinguished by brightness differences only. Insensitive to red light. No nystagmus. Normal visual acuity. |
| Dichromacy | | | |
| Protanopia | 1% of men and 0.01% of women | Absence of L (red) cone pigment. | Very reduced ability to identify colours. Confuse red, yellow and green, white and green, black and red. Reduced sensitivity to red light. |
| Deuteranopia | 1% of men and 0.01% of women | Absence of M (green) cone pigment. | Very reduced ability to identify colours. Confuse red, yellow and green, and white and green. |
| Tritanopia | 1 in 13000, both men and women | Absence of S (blue) cone pigment. | Very reduced ability to identify colours. Confuse blue, blue–green and green, and white and yellow. |
| Anomalous trichromacy | | | |
| Protanomaly | 1% of men and 0.03% of women | L (red) cone pigment maximum absorption shifted to shorter wavelengths of light. | May confuse white with green and confuse reds, yellows and greens but loss of colour discrimination varies greatly between individuals. Reduced sensitivity to red light. Make abnormal colour matches, for example, will add excess red in the colour match R+G=Y. |
| Deuteranomaly | 5% of men and 0.03% of women | M (green) cone pigment absorption shifted to longer wavelengths of light. | May confuse white with green and confuse reds, yellows and greens but loss of colour discrimination varies greatly between individuals. Make abnormal colour matches, for example, will add excess green in the colour match R+G=Y. |
| Tritanomaly | Rare | Partial loss of S cone pigment. | Loss of colour discrimination for blues, blue–greens, and greens. |

a The prevalences quoted are for Caucasians. Prevalences are generally lower for other races.

From: International Recommendations for Colour Vision Requirements for Transport. CIE 143–2001.

Although there is broad international agreement on the pattern of colour vision (CV) deficiencies, there is considerable variation in the way in which the assessment criteria are implemented, with a variety of different tests being accepted by different regulatory bodies [ 2 , 3 ]. Most testing regimes recommend the use of a simple screening test, such as the Ishihara pseudo-isochromatic plate (PIP) test, which detects deficiencies but misclassifies some of those with normal CV, supplemented where required by more specialized secondary tests. These secondary tests are often lantern tests, and the particular types approved for use vary for different colour critical tasks and from country to country. Lantern tests have a long history as occupational CV tests and are accepted by many regulatory bodies. There are eight different lantern tests currently in use: the Holmes–Wright (HW) type A and B lanterns, the Farnsworth Lantern (FALANT), the Optec 900 and the Aviation Lights Test (ALT) (both updated variants of the FALANT), the Fletcher Clinical Aviation and Maritime (FCAM), the Beyne and the Spectrolux lanterns. All usually present pairs of lights each of which may be red, green or white, and require the subject to name the colours. They vary widely in the level of difficulty they present and many are no longer made. These lantern tests do not test for blue–yellow (tritan) defects, cannot be used to identify the type of CVD or quantify severity, and there is limited evidence on their diagnostic accuracy. They are slowly being superseded by computerized colour vision tests such as the Colour Assessment and Diagnosis (CAD) test and the Cone Contrast Test (CCT). While these tests offer greater utility than lantern tests, data on their performance are limited predominantly to studies conducted by the test developers.

This wide range of secondary tests has resulted in a lack of standardization across industries, whereby different national regulatory bodies for the same type of industry administer different tests with different pass criteria. The tests themselves can have very different physical and colorimetric characteristics which produce inconsistent results [ 4 ]. Some may display actual signal colours, whereas others use a selection of less occupationally relevant isochromatic colours [ 5 ]. An example of how these characteristics may produce different results when testing to the same standard can be seen by comparing the Joint Aviation Authorities (JAA) approved tests. The HW(B) is the most stringent; it has a point brilliance close to the chromatic threshold for reliable recognition of coloured signal lights and therefore fails some subjects with normal colour discrimination but higher than average chromatic thresholds [ 6 ]. The HW(A), Beyne and Spectrolux lanterns will fail protanopes, while passing some anomalous trichromats (protan and deutan). The Beyne and Spectrolux lanterns differ from the HW(A), however, in that they will also fail significant numbers of colour vision normal (CVN) subjects. It is evident from this that a candidate for a JAA licence may pass or fail depending on the type of test attempted.

Lantern tests may also demonstrate poor test reliability as they employ colour naming as the examination method, giving red–green colour deficient test subjects a 50% chance of correctly guessing the colour of the light presented to them. They also allow subjects to use other cues such as brightness [ 7 ]. This is particularly the case with protans who see reds as darker than other colours and so tend to call dark colours red.

The PC-based tests differ from the lanterns in that they present coloured stimuli on a screen, do not employ colour naming, test for both red–green and yellow–blue CVD and can characterize type and severity. The CAD test threshold can be adjusted to allow pass/fail limits as required for particular occupations (CV task analyses performed for the UK Civil Aviation Authority and Transport for London have resulted in different CAD thresholds being used, reflecting the different CV tasks required of pilots and tube train drivers). The CCT presents a random series of coloured letters visible to a single cone type in decreasing steps of cone contrast to determine the threshold for letter recognition, and can calculate type and severity of CVD. It is observed, however, that sensitivity reduces with more severe CVD as the test is set to be isoluminant for the standard normal observer (as defined by the International Commission on Illumination), and therefore luminance contrast increases for observers with spectral sensitivities that deviate from this average (J. Barbur, personal communication; 2013).

This review presents data on the variability in methods and outcomes of secondary CV tests, specifically lantern tests and computer (PC)-based CV assessment, and critically appraises the published data evaluating these tests. The review is purposefully restricted to the test types that are accepted by a number of regulatory bodies and have subjectively good face validity, insomuch as they appear to mimic the occupational tasks, and to some of the newer computer-based alternatives. Other established CV tests including the Farnsworth D15, City University test, Medmont C100 and spectral anomaloscope have been excluded. The Farnsworth D15 test was originally designed for vocational guidance to select recruits with adequate discrimination for work in the electronics industry with the aim of separating those with moderate and severe CVD from those with minimal CVD and normal CV. This test is used by the Australian rail industry as a secondary test as it is known to pass mild to moderate anomalous trichromats, although comparisons with lantern tests suggest that this categorization is not predictive of ability to identify signal lights. The City University test is based on the D15 panel and is also designed for identifying subjects with significant CVD; this test will pass 20% of protans and 50% of deutans and is therefore not useful as a secondary test to determine ‘colour safe’ status. The Medmont C100 can be used to differentiate protans from deutans, but is less able to separate these groups from normal trichromats. The spectral anomaloscope has also been excluded from this review as, although it accurately classifies type and severity of CVD, it requires a highly skilled examiner and is not appropriate for routine occupational medical assessments.

Methods

Studies on adult subjects undergoing the occupational CV tests that are used to supplement screening with isochromatic plate tests were included, with analysis restricted to lantern tests and PC-based tests of CV. Outcomes of interest were the identification of congenital CVD, sensitivity, specificity and validity of the test. As the study is secondary research, no ethical approval was sought. Studies were identified by searching the Ovid Medline electronic database (1946–present), searching research websites, scanning reference lists and consulting experts in occupational CV testing. Studies were limited to those written in English. Search terms are given in Appendix 1 (available as Supplementary data at Occupational Medicine Online).

The primary outcome measures were sensitivity and specificity for identifying CVD. It is acknowledged that different studies used varying methods to determine normal colour vision and that there was variation between papers in what was accepted as normal; however, for the purposes of the systematic review, the criteria were taken as defined by each study. Given the heterogeneity of the included studies, no meta-analysis was done. Analysis of data was restricted to qualitative methods.
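
As an illustration of the primary outcome measures, the following minimal Python sketch computes sensitivity and specificity from pass/fail counts, treating a failed test as a positive result for CVD. The function name and the counts are hypothetical and are not taken from any of the included studies, each of which applied its own definition of normal colour vision.

```python
def sensitivity_specificity(cvd_failed, cvd_passed, cvn_passed, cvn_failed):
    """Return (sensitivity, specificity) as percentages.

    A failed test is treated as a positive result for CVD, so:
      sensitivity = CVD subjects who failed / all CVD subjects
      specificity = CVN subjects who passed / all CVN subjects
    """
    sensitivity = 100.0 * cvd_failed / (cvd_failed + cvd_passed)
    specificity = 100.0 * cvn_passed / (cvn_passed + cvn_failed)
    return sensitivity, specificity


# Hypothetical counts for illustration only (not data from the review).
sens, spec = sensitivity_specificity(cvd_failed=45, cvd_passed=5,
                                     cvn_passed=98, cvn_failed=2)
print(f"sensitivity = {sens:.1f}%, specificity = {spec:.1f}%")
```

Under this convention, a test that passes many mild anomalous trichromats has a lower sensitivity, while a test that fails CVN subjects has a lower specificity.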

Results

The search of Medline combined with supplementary approaches provided a total of 8973 citations, with no duplicates. Of these, 8839 were discarded as on reading the titles it was clear that they did not meet the inclusion criteria. The abstracts of the remaining 134 studies were screened and 107 were rejected as not meeting the inclusion criteria. A total of 27 studies were selected for detailed review of the full text. Of these, one did not meet the inclusion criteria as described, two were discarded because full text of the studies was not available and four were excluded as they did not identify data for individual CV test performances. A total of 20 studies were identified for inclusion. See flow diagram at Appendix 2 (available as Supplementary data at Occupational Medicine Online).
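
The selection flow described above can be checked with simple arithmetic. The short Python sketch below uses only the counts reported in this paragraph; the variable names are illustrative.

```python
# Counts as reported in the Results text.
identified = 8973
discarded_on_title = 8839
rejected_on_abstract = 107
excluded_on_full_text = {
    "did not meet inclusion criteria": 1,
    "full text not available": 2,
    "no data for individual CV tests": 4,
}

abstracts_screened = identified - discarded_on_title
full_texts_reviewed = abstracts_screened - rejected_on_abstract
included = full_texts_reviewed - sum(excluded_on_full_text.values())

print(abstracts_screened, full_texts_reviewed, included)
```

The three printed values correspond to the 134 abstracts screened, the 27 full texts reviewed and the 20 studies included.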

All 20 papers finally selected for the review were comparison studies published in English between 1983 and 2011. The included studies involved a total of 5424 participants with sample sizes ranging from 24 to 1571 (mean 271, median 138), and an age range where specified between 8 and 71 years. Gender was specified in nine studies, with a total of 2209 male and 160 female participants identified. Subjects were recruited from various sources including CV clinic attendees (six studies) [ 7–12 ], volunteers recruited by advertisement (three studies) [ 13–15 ], applicants for pilot training (two studies) [ 4 , 16 ], rail employees (one study) [ 17 ], the US Navy (one study) [ 18 ] and not specified (seven studies) [ 6 , 19–24 ]. Trials were conducted in the UK (seven studies), Australia (five studies), the USA (five studies), Canada (two studies) and Hong Kong (one study). The CV tests investigated were the HW lantern (types A and B), Beyne, Spectrolux and Optec 900 lantern, FALANT, ALT, FCAM, CAD and CCT.

All participants were pre-screened and diagnosed colour vision normal (CVN) or CVD by one or more of a battery of clinical CV tests. The number and types of clinical tests included varied between studies (see Table S1 , available as Supplementary data at Occupational Medicine Online). Exclusion criteria were predominantly based upon VA levels, ranging from corrected binocular VA no worse than 6/9 to VA 6/7.5 in best eye. Other exclusion criteria included ocular pathology and use of tinted lenses.

Outcomes of interest were sensitivity and specificity, and the number, types and severity of CV deficient subjects that each test passed as ‘colour safe’. A total of 13 of the 20 papers assessed sensitivity and specificity of the tests, and 11 measured the number, type and severity of CVD subjects passing the tests (see Table 2 ). Three studies also measured repeatability of the HW(A). The risk of bias was assessed using a standard approach.

Table 2.

Summary of test methods investigated

| Test | Sensitivity % | Specificity % | CVD classification | Pass % | Summary of studies |
|---|---|---|---|---|---|
| HW(A) | 83, 86, 86, 88.5, 89, 92, 96, 97, 100 (90.8) | 96, 100, 100, 100, 100, 100 (99.3) | Dichromat | 0–2 (0.40), 0–20 (4.4) (a) | Two studies compared HW(A) with established CV test battery: one using manufacturer testing and scoring procedure; one testing in photopic conditions only. Two studies compared HW(A) with other secondary CV tests, using manufacturer testing and scoring procedure. One study compared HW(A) with simulated signal light test, using manufacturer testing and scoring procedure. Two studies compared pass/fail rates using different testing criteria: one comparing results with seven different scoring criteria; one comparing the manufacturer testing criteria with that recommended by CIE. |
| | | | Trichromat | 8–76.9 (27.7, 18.5, 28.9, 19.7) (a) | |
| HW(B) | 98 | 92 | Dichromat | | One study compared HW(B) with two other lantern tests, using the Australian Department of Transport testing criteria. |
| | | | Trichromat | | |
| Beyne | 85 | 50 | Dichromat | | One study compared Beyne with two other lantern types and an anomaloscope using the JAR testing criteria. |
| | | | Trichromat | 22 | |
| Spectrolux | 95 | 88 | Dichromat | | One study compared Spectrolux with two other lantern types and an anomaloscope using the JAR testing criteria. |
| | | | Trichromat | | |
| Optec 900 | 81, 87 (84.0) | 100 | Dichromat | | Two studies compared Optec 900 with the FALANT: one using the standard Farnsworth testing criteria; one using an amended Farnsworth testing criteria (test administered in a group setting). |
| | | | Trichromat | 19 | |
| FALANT | 66, 69, 76, 81, 83, 85, 86, 88 (79.3) | 99, 100, 100 (99.7) | Dichromat | 0–6 (1.8) | Two studies compared FALANT with established CV test battery, both using the standard Farnsworth testing criteria. Four studies compared FALANT with other lanterns: three using the standard Farnsworth testing criteria; one using an amended Farnsworth testing criteria (test administered in a group setting). One study compared FALANT with a simulated signal light test, using the standard Farnsworth testing criteria. One study compared FALANT with a trade test of colour vision, using an amended Farnsworth testing criteria (fail criteria taken as a single error on any run). |
| | | | Trichromat | 25–45 (35.2) | |
| ALT | 85, 87 (86.0) | 100 | Dichromat | | One study compared ALT with established CV test battery, using an amended Farnsworth testing criteria (fail criteria taken as two or more errors on three runs, test conducted in mesopic conditions). One study compared ALT with a signal light simulator, using a scoring criteria of 0–1 errors over three runs. |
| | | | Trichromat | 18 | |
| FCAM | 100 | 100 | – | – | One study compared FCAM with established CV test battery, using unspecified test criteria. |
| CAD | 64, 70 (based on RG CAD threshold limits for aviation), 93.3 (web-based CAD) (75.8) | 100 | – | – | One study evaluated the web-based CAD, comparing with established CV test battery, using the manufacturer’s testing criteria. One study compared CAD with established CV test battery, a lantern test, and a signal light simulator, using experimentally derived pass/fail limits. |
| CCT | 98.8–100 (99.3) | 100 | – | – | One study compared CCT with spectral anomaloscope, using the manufacturer’s testing criteria. One paper does not specify what CCT is compared with, and provides no information on testing criteria. |

a Depending on scoring/pass criteria
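
The bracketed figures in the sensitivity column of Table 2 appear to be unweighted arithmetic means of the per-study values; this is an assumption, as the table does not state it. A minimal Python sketch that recomputes them from the listed values (the dictionary layout is illustrative):

```python
from statistics import mean

# Sensitivity values (%) as listed in Table 2 for tests with several studies.
sensitivity = {
    "HW(A)": [83, 86, 86, 88.5, 89, 92, 96, 97, 100],
    "Optec 900": [81, 87],
    "FALANT": [66, 69, 76, 81, 83, 85, 86, 88],
    "ALT": [85, 87],
    "CAD": [64, 70, 93.3],
}

# Unweighted mean per test (assumed interpretation of the bracketed figures).
mean_sensitivity = {test: mean(values) for test, values in sensitivity.items()}

for test, values in sensitivity.items():
    print(f"{test}: mean {mean_sensitivity[test]:.2f}% "
          f"(range {min(values)} to {max(values)})")
```

Because the means are unweighted, a small study counts as much as a large one, so they summarize the spread between studies rather than estimate a pooled sensitivity.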

A total of 19 of the 20 studies had well-defined questions and outcome measures, and adequately identified and justified methodology. One paper provided some detail on test methodology but did not specify test protocol or scoring criteria [ 23 ]. All of the papers were identifiable as observational studies, and all were equivalence trials, comparing the test in question against one or more of a battery of CV tests. Criteria for clear identification of population included age range/mean age, gender mix, population from which subjects were recruited and classification of CVD types in sample (i.e. protanope, deuteranope, protanomalous and deuteranomalous). Nine papers included all of these data, eight papers did not specify gender, six had no information on age, five did not identify the population recruited from and two did not specify CVD types. Determination of whether the sample was reflective of the general population was made on ratio of CVN to (red–green) CVD subjects, ratio of CVD types within the sample and age range/mean age of samples. A total of three of the 20 papers had samples reflective of the general population. Twelve studies were not considered to be reflective (three due to CVN:CVD ratio and nine due to the ratio of CVD types within the sample—although two studies corrected their results to account for this). In five studies, there was insufficient data to make a decision. In one study, the gender mix was appropriate (approximate ratio of 1:1 male to female overall, with approximate ratio of 20:1, M:F CVD), in nine studies males were over-represented, and in 10 studies it was unknown due to lack of data. All 20 studies used valid and reliable data collection techniques and 17 studies presented results in an appropriate and clear manner. In one paper, it was unclear which subjects had passed and which had failed and one paper presented all results on a single graph with no detail on performance of each CVD type on the test. 
Seventeen papers provided comprehensive discussions and conclusions. One paper included a two-paragraph discussion/conclusion which did not indicate how the test differed from other CV tests, or how it could be used in practice. Another paper provided a suitable conclusion but no discussion, and one paper presented results with no discussion or conclusion. Seventeen studies had results that were generalizable; of the three that were not, one had a sample with dichromats accounting for 46% of the CVD population, which therefore overestimated test sensitivity, and two papers did not have results clearly presented.

In summary, the studies generally described their methodology well. Most had a low risk of bias in defining the question and outcome measures, in identifying and justifying the methodology, and in data collection. No studies explicitly identified the study design, although in all but one paper this was clearly described. Few papers recruited samples reflective of the general population, and only one paper identified an appropriate gender mix [ 21 ]. One paper was particularly prone to bias, having no well-defined question or outcome measures, and no clearly identified study design or sample demographics [ 23 ]. The results were presented in a non-systematic manner, with no clear indication of the pass/fail rates for the test. Thirteen studies measured test performance with CVN and CVD subjects, allowing sensitivity and specificity to be calculated. Eleven studies provided data on pass rates for each CVD classification.

Of the lantern tests reviewed, the HW(B) was the most difficult to pass, failing a proportion of CVN subjects.

Discussion

The results of this study demonstrate that there are marked differences between these CV tests ( Figure S1 , available as Supplementary data at Occupational Medicine Online). It is apparent that all the tests reviewed are used in a variety of ways by different testing authorities. Results are inconsistent and depend upon the test and testing criteria used, and cannot be relied upon to separate colour safe from non-colour safe subjects.

This is the first study to report on a systematic review of studies assessing the different secondary CV test types. The major strengths are that it used broad and inclusive criteria to encompass all published papers on colour vision testing since 1946, and that it was designed against the PRISMA checklist (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) to ensure transparent and complete reporting. The limitations are that the abstracts were screened by only one author (K.B.), and that the literature search may have omitted a number of studies assessing test performance that are unpublished but have been undertaken for internal organizational purposes.

The tests reviewed are used interchangeably despite having very different characteristics (see Table 3 ). Some lanterns employ lights that comply with maritime and aviation colour limits, replicating real-life signal light values. Others display selected isochromatic colours well above the chromatic threshold and provide no information on the ability to detect distant signals near the chromatic threshold.

Table 3.

Summary of test characteristics

| Test | Characteristics | Passes all CVN (red/green) | Fails all protans | Passes mild deuteranomals | Employs colour naming | Tests for tritan CVD |
|---|---|---|---|---|---|---|
| HW(A) | Uses signal light chromaticities | No | Yes | Yes | Yes | No |
| HW(B) | Uses signal light chromaticities | No | Yes | Yes | Yes | No |
| Beyne | Uses signal light chromaticities | No | Yes | Yes | Yes | No |
| Spectrolux | Uses signal light chromaticities | No | Yes | Yes | Yes | No |
| Optec 900 | Uses isochromatic colours | Yes | No | Yes | Yes | No |
| FALANT | Uses isochromatic colours | No | No | Yes | Yes | No |
| ALT | Uses signal light chromaticities | Yes | Yes | Yes | Yes | No |
| FCAM | Uses signal light chromaticities | Data unavailable | Data unavailable | Data unavailable | Yes | No |
| CAD | Assesses chromatic discrimination thresholds against the ‘standard normal observer’ | Yes | Yes | Yes | No | Yes |
| CCT | Uses cone-specific contrast sensitivity | Yes | Yes | No | No | Yes |

There is also a wide variety of testing and scoring criteria used with lantern tests, and the number of runs per test and the number of errors accepted for a pass clearly alter test results [ 10 , 13 , 19 ]. Other factors such as test distance, time given for response, presence of other colours within the field of view, ambient illumination, time for dark adaptation, uncorrected refractive error and night myopia, prior test experience and equipment calibration may also affect these results. It is not known how great an effect differences in these variables may have, and this is an area for future research.

Test–retest reliability is also an area where there are limited data, possibly because most subjects are only required to take the test once in their careers. Reliability is, however, an essential attribute for a valid test; in the three studies included in this review that assessed it, reliability ranged from 60% to 98% [ 6 , 15 , 21 ].

This study has assessed a number of long-established occupational CV tests, as well as some of the newer methods for diagnosing and quantifying CVD. Comparison of these tests demonstrates marked inconsistency and variability of results, both between tests, and within variations of the same test. Modern transport environments continue to demand accurate recognition of signal lights, albeit with the increased redundancy of these connotative signals afforded by radar, satellite navigation and other hazard warning alerts. Denotative colour codes are increasingly used in complex visual display screens, and there is a need to maintain CV standards to ensure that this technology continues to enhance performance and accuracy in demanding operational environments. A major limitation of both lantern tests and Ishihara PIP is that they cannot be used to accurately quantify type or severity of CVD. Many regulatory bodies must therefore resort to a requirement for either normal trichromacy or ‘colour safe’ status, with no evidence base to indicate that their recommended test methods are valid or reliably determine that safety-critical colour-related tasks can be performed without any unjustified exclusion from employment. Some of the new screen-based tests do have the ability to quantify the level and nature of impairment. However, they may not have the same face validity as the lantern tests that mimic coloured signal lights. They do, however, have the potential to be used to replace these, and with a far greater potential for equitable decision taking in relation to task demands.

Where CV standards are deemed necessary, they must be defined in such a way as not to discriminate unfairly against those with CVD who can in fact safely recognize the colours used. In order to do this, the CV tasks for the job must be identified and quantified, and standards set accordingly. A validated test must then be applied to ensure that these standards are met. In the UK, the organizations that have done this are not only able to pass more CVD applicants as fit for employment, but also have a robust justification for failing those applicants who cannot meet their standards [ 24 , 27 ]. The consensus from a recent expert workshop on maritime CV testing was that this approach is the only valid long-term solution [ 28 ]. It is likely that the UK military, surface rail and other relevant transport regulatory bodies will need to adopt this approach as their current CV testing regimens become increasingly obsolete. In the meantime, organizations using secondary CV tests for safety-critical employment decisions must be cognizant of the differing characteristics of these tests and consider whether the test used is appropriate.

Key points
  • Previous studies have compared lantern tests, but this is the first systematic review of all test outcomes.

  • Lantern tests are not interchangeable because their characteristics differ and they do not classify the nature and severity of deficient colour perception consistently.

  • Colour vision standards should be task based, and assessed with validated tests, to ensure that selection is both safe and equitable.

Conflicts of interest

None declared.

References

1. Health and Safety Executive. Colour Vision Examination: A Guide for Employers. Information Sheet WEB03 HSE 2005. www.hse.gov.uk/pubns/WEB03.pdf (20 August 2015, date last accessed).
2. Birch J. Performance of colour-deficient people on the Holmes-Wright lantern (type A): consistency of occupational colour vision standards in aviation. Ophthalmic Physiol Opt 2008; 28: 253–258.
3. International Commission on Illumination. Colour Vision Standards for Transport. [143–2001]. Vienna, Austria: International Commission on Illumination, 2000.
4. Squire TJ, Rodriguez-Carmona M, Evans AD, Barbur JL. Color vision tests for aviation: comparison of the anomaloscope and three lantern types. Aviat Space Environ Med 2005; 76: 421–429.
5. Birch J, Dain SJ. Performance of red-green color deficient subjects on the Farnsworth Lantern (FALANT). Aviat Space Environ Med 1999; 70: 62–67.
6. Vingrys AJ, Cole BL. Validation of the Holmes-Wright Lanterns for testing colour vision. Ophthalmic Physiol Opt 1983; 3: 137–152.
7. Cole BL, Maddocks JD. Can clinical colour vision tests be used to predict the results of the Farnsworth lantern test? Vision Res 1998; 38: 3483–3485.
8. Cole BL, Lian KY, Lakkis C. Color vision assessment: fail rates of two versions of the Farnsworth lantern test. Aviat Space Environ Med 2006; 77: 624–630.
9. Cole BL, Maddocks JD. Color vision testing by Farnsworth lantern and ability to identify approach-path signal colors. Aviat Space Environ Med 2008; 79: 585–590.
10. Birch J, Dain SJ. Performance of red-green color deficient subjects on the Farnsworth Lantern (FALANT). Aviat Space Environ Med 1999; 70: 62–67.
11. Siu AW, Yap MK. The performance of color deficient individuals on airfield color tasks. Aviat Space Environ Med 2003; 74: 546–550.
12. Birch J. Performance of colour-deficient people on the Holmes-Wright lantern (type A): consistency of occupational colour vision standards in aviation. Ophthalmic Physiol Opt 2008; 28: 253–258.
13. Birch J. Performance of red-green color deficient subjects on the Holmes-Wright lantern (Type A) in photopic viewing. Aviat Space Environ Med 1999; 70: 897–901.
14. Hovis JK, Oliphant D. Validity of the Holmes-Wright lantern as a color vision test for the rail industry. Vision Res 1998; 38: 3487–3491.
15. Mertens HW, Milburn NJ, Collins WE. Practical color vision tests for air traffic control applicants: en route center and terminal facilities. Aviat Space Environ Med 2000; 71: 1210–1217.
16. Rabin J, Gooch J, Ivan D. Rapid quantification of color vision: the cone contrast test. Invest Ophthalmol Vis Sci 2011; 52: 816–820.
17. Casolin A, Katalinic PL, Yuen GY, Dain SJ. The RailCorp Lantern test. Occup Med (Lond) 2011; 61: 171–177.
18. Laxar KV, Wagner SL, Cotton TC. Evaluation of the Stereo Optical Co. Farnsworth Lantern (FALANT) Color Perception Test: A Specification and Performance Comparison with the Original FALANT. Groton, CT: Naval Submarine Medical Research Laboratory, 1998.
19. Hovis JK. Repeatability of the Holmes-Wright type A lantern color vision test. Aviat Space Environ Med 2008; 79: 1028–1033.
20. Seshadri J, Christensen J, Lakshminarayanan V, Bassi CJ. Evaluation of the new web-based ‘Colour Assessment and Diagnosis’ test. Optom Vis Sci 2005; 82: 882–885.
21. Rabin J, Gooch J, Ivan D, Harvey R, Aaron M. Beyond 20/20: new clinical methods to quantify vision performance. Mil Med 2011; 176: 324–326.
22. Fletcher R. The Fletcher CAM lantern colour vision test. Optom Today Jul 2005; 29: 24–26.
23. Birch J, Roden M. Colour vision deficiencies XI. In: Drum B, ed. The Clinical Use of the Holmes-Wright Lantern. Dordrecht, the Netherlands: Springer, 1993; 97–103.
24. Barbur J, Rodriguez-Carmona M, Evans S, Milburn N. Minimum Color Vision Requirements for Professional Flight Crew, Part 3: Recommendations for New Color Vision Standards. DTIC Document, Washington, DC: Federal Aviation Administration, 2009.
25. Birch J. Diagnosis of Defective Colour Vision. 2nd edn. Boston: Butterworth-Heinemann, 2001.
26. Vingrys AJ, Cole BL. Origins of colour vision standards within the transport industry. Ophthalmic Physiol Opt 1986; 6: 369–375.
27. Internal Report Commissioned by TfL. Minimum Colour Vision Requirements for London Underground Train Operators. London: Transport for London, 2008.
28. International Maritime Health Association. Test Methods for Color Vision in Seafarers with Navigational look-out duties. Expert Workshop, Kobe, 20–21 January 2014. http://beta.imha.net/images/2014-01_ST-KOBE_Colour_Vision_Workshop_report_final.pdf (6 July 2015, date last accessed).