Abstract

The authors surveyed existing standard codes for units of measure, such as ISO 2955, ANSI X3.50, and Health Level 7's ISO+. Because these standards specify only the character representation of units, the authors developed a semantic model for units based on dimensional analysis. Through this model, conversion between units and calculations with dimensioned quantities become as simple as calculating with numbers. All atomic symbols for prefixes and units are defined in one small table. Huge permuted conversion tables are not required. This method is also simple enough to be widely implementable in today's information systems. To promote its application, the authors provide an open-source implementation of the method in JAVA. All existing code standards for units, however, are incomplete for practical use and require substantial changes to correct their many ambiguities. The authors therefore developed a code for units that is much more complete and free from ambiguities.

A report of a quantitative measurement is not meaningful without its units. Saying an infant's weight is “five” raises the question: 5 pounds or 5 kilograms? We violate this precept in many medical contexts, leaving out the units in chart notes about laboratory tests and blood pressure and failing to standardize them in computer systems.

Although some of us were early participants in developing standard units codes for Health Level 7 (HL7)1,(pp7-36ff) and ASTM 1238,2 a review of the units tables at our Wishard Hospital laboratory showed an embarrassing lack of standardization. Indeed, we found “G,” “GM,” and “GS” representing gram; “LITERS” and “L” for liter; “ML” and “CC” for milliliter; and “MOL,” “MOLE,” and “MOLES” for mole. Some of our unit names, such as “/VOL” or “+,” are not interpretable as units at all.

Within a single institution where most measurements tend to be reported consistently in the same units, users correctly infer the units when they are omitted. Nonetheless, such omissions could lead to mistakes when patients move between care facilities. For instance, creatine kinase can be reported in units of catalytic concentration and as mass concentrations with very different magnitudes that may be misinterpreted without units. Correctly labeled units have even greater importance in medication orders. Confusion over body mass, measurable in kilogram or pound, could have disastrous consequences for body-mass-based medication dosing. Confusing microgram with milligram in a 100-μg dose of thyroxin could kill.

We often have to convert units from one form to another, especially in the pharmacy where prescribing units (e.g., 15 mL) and patient instruction units (e.g., 1 tablespoon) may be different. We often have to calculate a derived quantity based on quite complex formulas, such as the cardiac output from oxygen intake, oxygen saturations, and hemoglobin concentration (Fick's principle). Automatic conversion and computer-assisted calculations become easier with the proper use of units since we do not need to memorize conversion factors. In addition, we can better trust the correctness of our calculations when the result comes up with the expected unit.

Why is this everyday need to convert units and to use units in complex calculations so weakly supported by existing computer systems? Computers require an unambiguous coding standard for units, of course. But computers also require a semantic model of units that lets them understand and calculate with the meaning of units. A useful semantic model of units is not commonly deployed, although the theory is available. Such a useful semantic model should represent the most important information within units and yet be simple enough to allow implementation in every computer system.

Unfortunately, most introductions to the matter of measurements, units, and dimensional analysis suggest an algebraic approach and symbolic processing. If deployment of the theory required symbolic algebra processors, as implied by the “ontologic” approach of Gruber and Olsen [3,4], the necessary software development work would inhibit widespread implementation.

This article provides a representation of units expressible as vectors of numbers. Numbers are much easier for computers to process than symbols. With the theory provided here and with the sample implementations that we have made freely available, support for units can be easily incorporated into current clinical information systems. Users will be able to easily convert between units and do calculations with dimensioned quantities, and system administrators will no longer have to maintain huge unit conversion tables.

Syntax of the Units of Measure

Four standards exist for the notation of units: ISO 2955 [5]; ANSI X3.50 [6]; their extension by HL7 [1] and ASTM 1238 [2], called “ISO+”; and the European standard ENV 12435 [7].

ISO 2955 [5] is a standard notation for units designed for the limited character set available on many computers. It focuses on the international system of units (called “SI,” from French Système International) [8–11]; however, the general construction rules of ISO 2955 could, in principle, be applied to other systems of units as well. It gives both a case-sensitive and a case-insensitive notation that does not require Greek letters or superscripts. It is therefore useful for communication among computers and is still easily readable.

ANSI X3.50 [6] refers to ISO 2955 and adds to it a table of customary units (such as foot and pound) that are not covered by ISO but are used widely in the United States. Since a code for data communication should accommodate the real world, any code used for units should accommodate SI units as well as U.S. customary and other non-ISO units, such as Torricelli's unit of pressure (1 mm Hg). However, ANSI X3.50 has many ambiguities and is incomplete. For instance, it does not define a symbol for Fahrenheit degrees and does not distinguish between the avoirdupois and apothecaries' pound (the former weighing 21 percent more than the latter).

The HL7 ISO+ extensions add most of the units required in health care, including 1 mm Hg, 1 cm H2O, and 1°F missing from ISO. Unfortunately ISO+ inherits many of the ambiguities of ISO 2955 and ANSI X3.50, although we expect these deficiencies to be corrected in one of the next minor releases of HL7.

The European standard ENV 12435,7 which was released last year, declares ISO 2955 obsolete for the display and printing of units of measure. It claims that today's computer systems are capable of presenting all special characters properly. While this may be true for modern information systems with graphical user interfaces, many existing systems do not meet these requirements, especially laboratory automata that still come with simple dot-matrix printers. Furthermore, ENV 12435 applies mainly to printed reports and is silent on which codes to use for units in messages, which is our major concern.

Ambiguous Unit Symbols

In the metric system a simple unit consists of an optional prefix symbol and a terminal unit symbol, which we call the unit atom. Prefix and unit atom are written side by side, e.g., thousand gram is written as “1 kg.” The ENV 12435, ISO 2955, and ANSI X3.50 standards all follow this common practice. Because the prefix is not delimited from the unit atom, the computer must analyze a simple unit lexically, i.e., by finding a match among all possible combinations of prefixes and atoms. This approach is prone to ambiguities.

A given symbol, such as “PEV” in ISO, ANSI, and HL7, is lexically ambiguous because it can be generated with the prefix “PE” (peta) and the unit atom “V” (volt) and with the prefix “P” (pico) and the unit atom “EV” (electron volt).

The most important source of ambiguities is that existing standards do not distinguish metric units (those that are normally scaled in powers of ten) from nonmetric units. Customary units are nonmetric, but ISO units are not necessarily metric. Examples of nonmetric ISO units are day, hour, minute (of time and angle), and the various degrees. A new code system for units should classify the units as metric or nonmetric and forbid prefixes for nonmetric units.

We examined all three standards in order to find all their ambiguities and to derive a new, unambiguous code system. Because the set of prefixes and unit atoms is small, we can find the conflicts simply by examining all combinations exhaustively. We further classified all conflicts into categories that differ in severity and helped us address the more severe conflicts first.

  • Type I (simple atom clash). The same unit atom has two meanings, such as “a” for the year (from Latin annum) and for the are (= 100 m²) in the case-sensitive ISO 2955. Such direct name conflicts are the most severe errors.

  • Type II (metric-metric). Two different valid prefix-unit combinations generate the same symbol, such as the “PEV” of ISO and ANSI described above. ANSI has re-introduced the conflict between “PA” for pascal (1 Pa) and for picoampere (1 pA), which ISO had eliminated by renaming the pascal “PAL.” Type II conflicts are severe errors in those code systems.

  • Type III (nonmetric-nonmetric). Two nonmetric units, such as ANSI's nautical mile (“NMI”) and nanomile (“N-MI”), collide with each other because of the combination of a nonmetric unit with a prefix. These conflicts must be resolved by forbidding prefixes with nonmetric units.

  • Type IVa (metric-nonmetric). A metric unit and a nonmetric unit collide because of a prefix on the nonmetric unit. For instance, all existing standards, including ENV 12435, contain “cd” for candela and “c-d” for centi-day. Again, this conflict can be resolved if we forbid prefixes with the nonmetric unit day.

  • Type IVb (nonmetric-metric). A nonmetric unit atom (e.g., “FT” for foot in ANSI) collides with a metric prefix-atom combination (e.g., “F-T” for femtotesla). There is no way to resolve these conflicts without changing the code.

  • Type V (nonmetric, other). A combination of a nonmetric atom with a prefix collides with a metric prefix-atom combination. Forbidding prefixes with nonmetric units can prevent those potential conflicts, too.
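The exhaustive search for conflicts described above can be sketched in a few lines of Python. The tables below are a small illustrative subset of case-insensitive symbols chosen to reproduce the examples in the text, not the complete tables of any standard, and the names `PREFIXES`, `ATOMS`, `readings`, and `conflicts` are assumptions made for this sketch.

```python
from collections import defaultdict
from itertools import product

# Illustrative subset only: prefix symbol -> prefix name.
PREFIXES = {"PE": "peta", "P": "pico", "F": "femto", "N": "nano", "C": "centi"}

# Illustrative subset only: atom symbol -> (unit name, is_metric).
ATOMS = {
    "V": ("volt", True),
    "EV": ("electronvolt", True),
    "T": ("tesla", True),
    "A": ("ampere", True),
    "PA": ("pascal", True),
    "CD": ("candela", True),
    "FT": ("foot", False),
    "MI": ("mile", False),
    "NMI": ("nautical mile", False),
    "D": ("day", False),
}


def readings(allow_prefix_on_nonmetric):
    """Map every producible symbol to the set of ways it can be read."""
    table = defaultdict(set)
    # A bare atom is always a valid reading of its own symbol.
    for atom, (name, _metric) in ATOMS.items():
        table[atom].add(name)
    # Every prefix-atom concatenation is a candidate reading as well.
    for (p, pname), (atom, (name, metric)) in product(PREFIXES.items(), ATOMS.items()):
        if metric or allow_prefix_on_nonmetric:
            table[p + atom].add(pname + name)
    return table


def conflicts(allow_prefix_on_nonmetric):
    """Return only the symbols that have more than one reading."""
    table = readings(allow_prefix_on_nonmetric)
    return {symbol: names for symbol, names in table.items() if len(names) > 1}


for symbol, names in sorted(conflicts(allow_prefix_on_nonmetric=True).items()):
    print(symbol, sorted(names))
```

With this subset, forbidding prefixes on nonmetric atoms removes the “NMI” and “CD” collisions, while “PEV,” “PA,” and “FT” remain and can only be resolved by changing the symbols themselves.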

Algebraic Combinations of Units

More complex units can be derived from simple units through operators for multiplication (“.”) and division (“/”). In human writing and print, the multiplication operator is often left out. ISO 2955 uses the period (“.”) as the multiplication operator rather than the asterisk (“*”) used in most computer programming languages. So one must be aware that the multiplication operator can collide with the decimal point when unit terms are to contain numerals. ISO 2955 does not allow the multiplication operator to be omitted, for this would result in an even more complex lexical analysis and more ambiguities, as “PAL” for the pascal in case-insensitive ISO 2955 would now be indistinguishable from picoampere liter (1 pAL).

ISO 2955 and ANSI X3.50 do not allow numeric factors in unit terms. They do allow the raising of units to positive and negative powers, signified by an integer number written directly after the unit symbol. Thus, a square meter (1 m²) is written as “M2” and the newton (SI unit of force) is written as “N,” “KG.M/S2,” or “KG.M.S-2.” Fractional exponents, such as 1 m^(1/2), are very rare and of doubtful meaning. Roots of units should be eliminated through appropriate exponentiation.

The operators of multiplication and division have equal precedence. In human practice, however, there is a tendency to confuse the division operator with the fraction bar and assign to it a lower precedence.* However, in ISO 2955 the expression  

$$\frac{a}{b \cdot c}$$
must be transcribed as either “a/b/c” or “a.c-1.b-1” but not as “a/b.c.” Parentheses to group and nest terms are not required or defined by ISO 2955.

HL7 and ASTM 1238, however, use parentheses in five different ways. Parentheses can be used to write “a/(b.c)” circumventing the normal operator precedence; to include numeric factors within unit terms, e.g., “ML/(8.H)” for “milliliter per 8 hours”; to write fractional exponents, such as “M(1/2)” for 1 m^(1/2); to modify the meaning of a unit, e.g., “MM (HG)” for 1 mm Hg; and to prevent nonstandard medical units and pseudo-units (e.g., “(PH)” for the pH value) from conflicting with standard units (e.g., “PH” for picohenry).

Computers could interpret the many different meanings of parentheses and digits differently by context; however, an extremely complex lexical analysis would be required. To avoid those complexities, our proposed code system allows parentheses only to override normal operator precedence. It uses square brackets “[]” for meaningful suffixes (e.g., “cm[H2O]” for 1 cm H2O) and to disambiguate special units (e.g., “[pH]”). Everything within a pair of square brackets is considered a verbatim part of the unit atom; digits or operators are not interpreted inside square brackets.

Given our proposed simplifications, the parsing of the algebraic unit terms is simple. The full Backus-Naur-Form grammar is shown in Figure 1. Calculating a result from a combination of numbers and operators can be done by a freshman in computer science: A simple result variable is updated for every number and operator read from the input string. We need a concise semantic representation of units that fits even the most complex term of units into one uniform variable, so that we can calculate with units as we usually calculate with numbers.

Figure 1

The grammar of unit expressions in the Backus-Naur-Form.
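As a rough illustration of how simple such parsing can be, the following Python sketch reads a unit term strictly left to right, with “.” and “/” at equal precedence, parentheses only for grouping, integer exponents written directly after a symbol, and square brackets treated as a verbatim part of the atom. It is a simplification made for illustration and is not the grammar of Figure 1: prefixes are not separated from atoms, numeric factors are not handled, and the names `TOKEN` and `parse` are assumptions of this sketch.

```python
import re

# A symbol is a run of letters with an optional [verbatim] suffix, or a bare
# [verbatim] atom; it may be followed directly by a signed integer exponent.
TOKEN = re.compile(
    r"(?P<sym>[A-Za-z]+(?:\[[^\]]*\])?|\[[^\]]*\])(?P<exp>[+-]?\d+)?"
    r"|(?P<op>[./()])"
)


def parse(text):
    """Return a dict mapping each symbol in a unit term to its net exponent."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match)
        pos = match.end()
    result, used = _term(tokens, 0)
    if used != len(tokens):
        raise ValueError("unbalanced parenthesis")
    return {sym: exp for sym, exp in result.items() if exp != 0}


def _term(tokens, i):
    """term := factor (('.' | '/') factor)*, evaluated left to right."""
    acc, i = _factor(tokens, i)
    while i < len(tokens) and tokens[i].group("op") in (".", "/"):
        sign = 1 if tokens[i].group("op") == "." else -1
        rhs, i = _factor(tokens, i + 1)
        for sym, exp in rhs.items():
            acc[sym] = acc.get(sym, 0) + sign * exp
    return acc, i


def _factor(tokens, i):
    """factor := symbol [exponent] | '(' term ')'."""
    if i >= len(tokens):
        raise ValueError("unexpected end of unit term")
    tok = tokens[i]
    if tok.group("op") == "(":
        inner, i = _term(tokens, i + 1)
        if i >= len(tokens) or tokens[i].group("op") != ")":
            raise ValueError("missing ')'")
        return inner, i + 1
    if tok.group("sym") is None:
        raise ValueError(f"unexpected operator {tok.group('op')!r}")
    return {tok.group("sym"): int(tok.group("exp") or 1)}, i + 1
```

Under these assumptions `parse("a/(b.c)")` and `parse("a/b/c")` give the same result, whereas `parse("a/b.c")` leaves `c` in the numerator, which mirrors the equal-precedence rule discussed above.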


Semantics of the Units of Measure

The syntax of units described above allows us to build arbitrary unit expressions. These expressions are, however, mere strings of characters. A semantic approach to the units of measure must do more than verify that a given string of characters is a legal unit term. It should allow us to find equivalence between apparently different expressions, such as 1 N, 1 kg·m/s², and 1 Pa·m². In this section we derive concise implementable semantics of units from a theory of measurement and units. This semantic representation will allow us to find equivalences, convert between units, and calculate with units. We present the theory in steps going from simple to more complex cases.

We do not claim to present a brand new theory. Ours is based on prior work [13–18], although most of this prior work aims at mathematical correctness and generality rather than practical implementation. Only Thun [13] suggests storing dimensional information on punched cards, but he does not deal with conversions between units or with the problems of units in biomedical sciences. Our purpose is to recast existing theory into a form that can be deployed easily by current medical information systems.

Measurement, Quantity, Unit

A measurement is a comparison of an unknown quantity with a standard object (e.g., a meter stick) or standard process (e.g., a clock), which is the unit u. The comparison is done according to a precept of measurement (e.g., comparing with the meter stick or counting the ticks of the clock). The measurement is further based on a set of postulates about the nature of the observed objects or processes [18] (e.g., the stick does not change its length and the clock always ticks at the same rate). We can express the comparison between object and unit through the simple equation 

$$\frac{Q}{u} = \mu \tag{1}$$
where μ is the rational number that tells us “how many units” the quantity Q represents. Hence, any measurement is the product of a measured number μ and its unit u: Q = μ·u (e.g., distance D = 3.6·m.)

Commensurability, Conversion, and the Arbitrariness of Units

The same quantity Q can be expressed in different units u and u′ (e.g., meter and yard), where 

$$Q = \mu \cdot u = \mu' \cdot u' \tag{2}$$
(e.g., D = 3.6 m = 3.937 yd). The units u and u′ that both measure the same kind of quantity are called commensurable. With two commensurable units one unit can be used to measure the magnitude of another unit (e.g., the meter stick can be used to measure the length of the yardstick). Thus, the following equation holds: 
$$u' = \nu \cdot u \tag{3}$$
where ν is the magnitude of the unit u′ as a quantity measured by the unit u (e.g., yd = 3600/3937·m). It follows that any fixed quantity can be used as a unit to measure all other quantities of the same kind; hence, the selection of one particular quantity for a unit is completely arbitrary.

From equations 2 and 3 we can derive a formula for unit conversion: 

$$\mu' = \frac{\mu}{\nu} \tag{4}$$

Thus, the measurement value μ′ that expresses a quantity in unit u′ (e.g., yard) can be calculated from the value μ expressing the same quantity in unit u (e.g., meter) with the “conversion factor” 1/ν (e.g., 3937/3600).
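The meter-to-yard example can be checked with exact rational arithmetic. The following is a minimal sketch; the function name `convert` and the constant `YARD_IN_METERS` are illustrative choices, and the value 3600/3937 is the one quoted in the text.

```python
from fractions import Fraction

# nu: magnitude of the target unit u' (yard) measured in the source unit u (meter).
YARD_IN_METERS = Fraction(3600, 3937)


def convert(mu, nu):
    """Return mu' such that mu * u == mu' * u', given that u' = nu * u."""
    return mu / nu


distance_m = Fraction(36, 10)                      # D = 3.6 m
distance_yd = convert(distance_m, YARD_IN_METERS)  # multiply by 1/nu = 3937/3600
print(float(distance_yd))                          # 3.937
```

Converting back uses the reciprocal factor, so `convert(distance_yd, 1 / YARD_IN_METERS)` returns the original 3.6 exactly.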

Derived Measurements

A precept of measurement Q can demand that two or more other quantities Q1, Q2,…, Qn be measured at one object or process in order to combine these measurements to yield the derived quantity Q = f(Q1, Q2,…, Qn). For example, velocity V is measured through measuring the displacement D of a moving body in a certain period of time T and dividing the displacement by the time: V = D/T.

Any two quantities can be multiplied and divided with each other or with a scalar number, and any quantity can be raised to a power. In contrast, addition and subtraction are defined only for commensurable quantities. Given the quantities Q1, Q2,…, Qn any quantity Q can be derived as 

$$Q = Q_1^{q_1} \cdot Q_2^{q_2} \cdots Q_n^{q_n} \tag{5}$$
(e.g., V = D¹·T⁻¹). Thus, the new derived unit u of Q is 
$$u = u_1^{q_1} \cdot u_2^{q_2} \cdots u_n^{q_n} \tag{6}$$
(e.g., v = m¹·s⁻¹). For the vast majority of quantities the exponents q1, q2,…, qn turn out to be integers between -4 and +4 [13,19].§

Systems of Units and Base Units

Because quantities can be derived from other quantities we can organize units into a system B consisting of a limited number of base units b1, b2,…, bn, from which all other units u are derived through 

$$u = b_1^{u_1} \cdot b_2^{u_2} \cdots b_n^{u_n} \tag{7}$$

Such a system, where no proportionality factors (of the form $\beta_i b_i^{u_i}$) are used, is said to be coherent [20]. In a coherent system only one unit exists for each kind of quantity. Hence, every unit u of the system B can be mapped to a vector $\mathbf{u}_B = (u_1, u_2, \ldots, u_n)$. Every component of the vector $\mathbf{u}_B$ represents one base kind of quantity and gives the exponent of the respective base unit in the term. The simple vectors 

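$$\mathbf{b}_1 = (1, 0, \ldots, 0), \quad \mathbf{b}_2 = (0, 1, \ldots, 0), \quad \ldots, \quad \mathbf{b}_n = (0, 0, \ldots, 1)$$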
represent the base units themselves (see Tables 1 and 2 for examples). The set of base units can now be interpreted as the basis of an n-dimensional vector space, where every unit is represented by a linear combination of the base units.

Table 1

Proposed Base System, Tuned Toward Communication of Units and Computation with Units

| Kind of Quantity | Variable | Unit | Vector b_i |
|---|---|---|---|
| Length | s | 1 m | (1, 0, 0, 0, 0, 0, 0) |
| Time | t | 1 s | (0, 1, 0, 0, 0, 0, 0) |
| Mass | m | 1 g | (0, 0, 1, 0, 0, 0, 0) |
| Charge | Q | 1 C | (0, 0, 0, 1, 0, 0, 0) |
| Temperature | T | 1 K | (0, 0, 0, 0, 1, 0, 0) |
| Luminous intensity | Iv | 1 cd | (0, 0, 0, 0, 0, 1, 0) |
| Angle | ϕ | 1 rad | (0, 0, 0, 0, 0, 0, 1) |

Note: This system is compatible but not isomorphic with the SI.

Table 2

Some Derived Units and Their Internal Representation Within Our Base System

| Kind of Quantity | Definition* | Unit u | Factor ν | Vector u |
|---|---|---|---|---|
| The unity | | | | (0, 0, 0, 0, 0, 0, 0) |
| Area | A = s1 · s2 | 1 m² | | (2, 0, 0, 0, 0, 0, 0) |
| Volume | V = A · s | 1 L (liter) | 10⁻³ | (3, 0, 0, 0, 0, 0, 0) |
| Velocity | v = s/t | 1 m/s | | (1, -1, 0, 0, 0, 0, 0) |
| Angular velocity | ω = ϕ/t | 1 rad/s | | (0, -1, 0, 0, 0, 0, 1) |
| Volume current | V̇ = V/t | 1 L/min | (1/6) × 10⁻⁴ | (3, -1, 0, 0, 0, 0, 0) |
| Acceleration | a = v/t | 1 m/s² | | (1, -2, 0, 0, 0, 0, 0) |
| Force | F = m · a | 1 N (newton) | 10³ | (1, -2, 1, 0, 0, 0, 0) |
| Work | W = F · s | 1 J (joule) | 10³ | (2, -2, 1, 0, 0, 0, 0) |
| Moment of force | M = F · s | 1 Nm | 10³ | (2, -2, 1, 0, 0, 0, 0) |
| Power | P = W/t | 1 W (watt) | 10³ | (2, -3, 1, 0, 0, 0, 0) |
| Electric current | I = Q/t | 1 A (ampere) | | (0, -1, 0, 1, 0, 0, 0) |
| Electric potential | U = W/Q | 1 V (volt) | 10³ | (2, -2, 1, -1, 0, 0, 0) |

* We give only “naive” definitions, i.e., we pay no attention to the fact that many quantities are vectors and that many are properly defined using differential quotients or integrals.

Our theory thus matches the term “dimension” as used in metrology with the dimensions of a space in a mathematical sense. The space of dimensions can be analyzed with the well-known concepts and methods of linear algebra. For instance, we know that the members of any basis must be linearly independent. We also know how to use matrices to carry out linear transformations between different base systems, and we can find systems to be isomorphic if there is a one-to-one transformation between two base systems of units.
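Because every coherent unit is just a vector of exponents, deriving a unit amounts to adding, subtracting, or scaling integer vectors. The following is a minimal sketch of that idea, assuming the component order of Table 1; the helper names `base`, `times`, `per`, and `power` are illustrative and not taken from the authors' software.

```python
# Component order assumed from Table 1.
DIMENSIONS = ("length", "time", "mass", "charge",
              "temperature", "luminous intensity", "angle")


def base(name):
    """Exponent vector of the base unit for the named kind of quantity."""
    if name not in DIMENSIONS:
        raise ValueError(f"unknown kind of quantity: {name}")
    return tuple(1 if d == name else 0 for d in DIMENSIONS)


def times(u, v):
    """Multiplying units adds their exponent vectors."""
    return tuple(a + b for a, b in zip(u, v))


def per(u, v):
    """Dividing units subtracts their exponent vectors."""
    return tuple(a - b for a, b in zip(u, v))


def power(u, k):
    """Raising a unit to an integer power scales its exponent vector."""
    return tuple(a * k for a in u)


length, time, mass, charge = (base(n) for n in ("length", "time", "mass", "charge"))

velocity = per(length, time)
acceleration = per(velocity, time)
force = times(mass, acceleration)
work = times(force, length)
potential = per(work, charge)

print(force)      # (1, -2, 1, 0, 0, 0, 0)
print(potential)  # (2, -2, 1, -1, 0, 0, 0)
```

The vectors obtained this way carry only the dimension; the scale factors of units such as the newton are a separate matter.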

Liberation from Coherence

The SI defines units according to equation 7. Thus, SI is a coherent system of units. Although SI units may be scaled through a prefix (milli-, centi-, kilo-, etc.), derived SI units are always based on units without prefixes. Requiring coherence is a burdensome constraint, however, because it generates units that do not fit the usual size of measured values. For example, the SI-coherent unit katal (1 kat = 1 mol/s) is seven orders of magnitude greater than biologic enzymatic rates.

Consequently, most laboratories still use 1 U = 1 μmol/min instead. For the same reason, the Comité International des Poids et Mesures permits the liter alongside the cubic meter as the volume for expressing concentrations [9].

We shall now extend the theory explained so far, to overcome the limitation to coherent systems. Given the solid notion of units defined in a coherent system (equation 7) and the notion of commensurability (equation 3), we can define a unit u′ as an ordered pair, 

$$u' = \langle \nu, \mathbf{u} \rangle \tag{8}$$
where ν is the magnitude of u′ measured in the B-coherent unit u. Thus, we can permit the combination of arbitrarily scaled units to derived units, including all kinds of non-SI units, such as foot, pound, minute, even tablespoon (2 tbs = 1 oz fl) or drops (12 drp = 1 mL). For example, “drops per minute” or “tablespoon per day” can be employed instead of the corresponding SI-coherent unit “cubic meter per second.”

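We consider two units $u'_1 = \langle \nu_1, \mathbf{u}_1 \rangle$ and $u'_2 = \langle \nu_2, \mathbf{u}_2 \rangle$ as commensurable if the equation 

$$\mathbf{u}_1 = \mathbf{u}_2 \tag{9}$$
holds. Multiplication, division, and exponentiation are defined as follows: 
$$u'_1 \cdot u'_2 = \langle \nu_1 \cdot \nu_2,\ \mathbf{u}_1 + \mathbf{u}_2 \rangle \tag{10}$$

$$\frac{u'_1}{u'_2} = \left\langle \frac{\nu_1}{\nu_2},\ \mathbf{u}_1 - \mathbf{u}_2 \right\rangle \tag{11}$$

$$(u')^{a} = \langle \nu^{a},\ a \cdot \mathbf{u} \rangle \tag{12}$$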

Addition (and subtraction) of commensurable quantities can be defined as 

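$$\mu_1 \cdot u'_1 + \mu_2 \cdot u'_2 = \langle \mu_1 \nu_1 + \mu_2 \nu_2,\ \mathbf{u} \rangle \tag{13}$$
for $\mathbf{u}_1 = \mathbf{u}_2 = \mathbf{u}$.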

A measurement value μ in unit u, finally, is converted into a μ′ in a commensurable unit u′ through equations 4 and 11 reducing to 

$$\mu' = \mu \cdot \frac{\nu}{\nu'} \tag{14}$$
because $\mathbf{u} = \mathbf{u}'$ must be true for conversions.
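The ordered pair of a scale factor and an exponent vector translates almost directly into code. The sketch below is one possible rendering in Python and is not the authors' implementation; the class name `Unit`, the helpers `base`, `scaled`, and `convert`, and the component order (m, s, g, C, K, cd, rad) are assumptions made for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Unit:
    nu: Fraction   # magnitude relative to the coherent unit of the same dimension
    dim: tuple     # exponents of (m, s, g, C, K, cd, rad)

    def __mul__(self, other):
        return Unit(self.nu * other.nu,
                    tuple(a + b for a, b in zip(self.dim, other.dim)))

    def __truediv__(self, other):
        return Unit(self.nu / other.nu,
                    tuple(a - b for a, b in zip(self.dim, other.dim)))

    def __pow__(self, k):
        return Unit(self.nu ** k, tuple(a * k for a in self.dim))

    def commensurable(self, other):
        return self.dim == other.dim


def base(i):
    """Coherent base unit number i (0 = meter, 1 = second, 2 = gram, ...)."""
    return Unit(Fraction(1), tuple(int(j == i) for j in range(7)))


def scaled(factor, unit):
    """A unit that is `factor` times as large as `unit`."""
    return Unit(Fraction(factor) * unit.nu, unit.dim)


def convert(mu, src, dst):
    """Express the value mu, given in unit src, in the commensurable unit dst."""
    if not src.commensurable(dst):
        raise ValueError("units are not commensurable")
    return mu * src.nu / dst.nu


METER, SECOND, GRAM = base(0), base(1), base(2)
LITER = scaled(Fraction(1, 1000), METER ** 3)
MILLILITER = scaled(Fraction(1, 1000), LITER)
MINUTE = scaled(60, SECOND)

# 3 L/min expressed in mL/s
print(convert(Fraction(3), LITER / MINUTE, MILLILITER / SECOND))  # 50
```

Exact fractions are used here only to keep the arithmetic transparent; a floating-point scale factor would work the same way at the cost of rounding.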

Transformation Between Unit Systems

The ordered pair of a scalar and a vector <ν, (u1, u2,…, un)> can be rewritten as the simple vector (log₁₀ ν, u1, u2,…, un). By regarding the number 10 as a base unit, our model allows transformations between different mutually incoherent base systems. For instance, we could define the transformation between the MKS system, using meter, kilogram, second, and ampere as its base units, and another system that uses centimeter, gram, second (CGS), and coulomb as: 

(15)
graphic

Because this transformation is one-to-one, we know that both systems are dimensionally isomorphic, i.e., that they can both describe the same physical phenomena with no fundamental difference. Likewise, there is an isomorphism between SI units and a system that uses the customary units foot for length and ounce for mass.#
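To make the idea of such a transformation concrete, the sketch below applies a linear map to vectors of the form (log₁₀ ν, length, mass, time, current). The component order, the matrix layout, and the names `MKSA_TO_CGSC`, `CGSC_TO_MKSA`, and `transform` are assumptions of this sketch and are not taken from equation 15; the entries follow only from 1 m = 10² cm, 1 kg = 10³ g, and 1 A = 1 C/s.

```python
# Each row gives one target component as a combination of the source
# components (log10 nu, length, mass, time, current).
MKSA_TO_CGSC = [
    [1, 2, 3, 0, 0],   # log10 nu gains 2 per meter and 3 per kilogram exponent
    [0, 1, 0, 0, 0],   # length exponent is unchanged (now counted in cm)
    [0, 0, 1, 0, 0],   # mass exponent is unchanged (now counted in g)
    [0, 0, 0, 1, -1],  # each ampere contributes one inverse second
    [0, 0, 0, 0, 1],   # each ampere contributes one coulomb
]

# Inverse map, from (log10 nu, cm, g, s, C) back to (log10 nu, m, kg, s, A).
CGSC_TO_MKSA = [
    [1, -2, -3, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
]


def transform(matrix, vector):
    """Multiply a square matrix with a vector, both holding integers."""
    return tuple(sum(c * x for c, x in zip(row, vector)) for row in matrix)


newton = (0, 1, 1, -2, 0)   # 1 kg.m/s2
volt = (0, 2, 1, -3, -1)    # 1 kg.m2/s3/A

print(transform(MKSA_TO_CGSC, newton))  # (5, 1, 1, -2, 0), i.e. 10^5 g.cm/s2
print(transform(MKSA_TO_CGSC, volt))    # (7, 2, 1, -2, -1), i.e. 10^7 g.cm2/s2/C
```

Applying `CGSC_TO_MKSA` to either result returns the original vector, which is what makes the two systems interchangeable in this representation.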

A Base System for Practical Use

Like the SI system of units, our unit system is based on seven dimensions, with the units shown in Table 1. However, some of our base units differ from the SI base units, because we focus on the everyday need to communicate units and to calculate with units. In contrast, the SI system is concerned with metrology, i.e., with specifying devices to reliably reproduce units with high accuracy.

We chose the gram as the base unit for mass instead of the SI kilogram in order to avoid prefixes in base units. We need a meaningful unit of mass before we can modify it by any prefix.

We use charge rather than electric current as a base kind of quantity for electromagnetic phenomena simply because electrons, and their elementary charge, are the first cause of electric phenomena, including current. As explained above, with this change our system is still isomorphic with the SI.

Although SI adopted the substance amount in 1971 as a base kind of quantity [9], the mole is in fact only an arbitrarily large number of particles [21, p 39-10]. The base unit 1 mol defined in SI can be simply expressed in terms of Avogadro's number, NA = 6.022137 × 10²³. Because the mole is dimensionless in our unit system, our system is no longer isomorphic with SI. However, the only information that the mole conveys beyond Avogadro's number is that some substance was measured; it does not tell what kind. Thus, our change does not result in important information loss.

The ISO 1000 standard [9] defines the units radian and steradian of plane and solid angle as “supplementary units” that can be used both as base or derived units. However, in 1995 the 20th Conférence Générale des Poids et Mesures eliminated the class of supplementary units entirely and regards radian and steradian as dimensionless derived units. In the latter sense, the “radian” (1 rad) is defined on a circle as the angle that encloses an arc of length equal to the radius. Because angles measured this way are ratios of two lengths, the units cancel out. Consequently, the SI loses important information. For example, SI cannot distinguish angular velocity from rotational frequency, or the radian from the steradian by means of dimensional analysis. We therefore include the radian in our system as a distinct base unit. Hence, the steradian is now also a proper derived unit defined as 1 sr = 1 rad².

We retained the luminous intensity only for reasons of compatibility with the SI. The SI now defines the candela as wavelength-dependent radiant intensities, according to the human eye's response to light of the respective wavelength.9,22 This is similar to the audiometric correction according to the A scale. However, it is unclear why photometry has its own place among SI base units, while other psychophysical measures, such as the intensity of perception for sound, heat, pressure, and vibration do not. Adopting any measure of human perception into base systems causes the base unit vectors to become dependent (e.g., luminous on radiant intensities) and thus disqualifies them as a basis of a vector space.

Examples of derived units in this proposed base system are shown in Table 2.

Units of Nonratio Scales

The temperature scales Fahrenheit, Celsius, and Kelvin provide examples of units that are clearly comparable and convertible, but not in the narrow sense of equations 9 and 14. The values for temperature expressed in two of those units are not directly proportional, particularly 0°C ≠ 0 K ≠ 0°F.

Celsius and Fahrenheit are interval scales,23 on which measurement values can be converted using a simple linear equation μ′ = μ·ν/ν′ + o with the intercept o adjusting the zero points of the scales. However, this simple equation is not suitable for logarithmic or exponential scales.

We now extend our theory to deal with nonratio scales very generally: We define each special unit by a proper unit and a pair of real functions, f and f⁻¹, used to convert measurement values between the proper unit and the special unit. Based on equation 14, we define the conversion from the proper unit u to the special unit u′ as  

$$\mu' = f\!\left(\mu \cdot \frac{\nu}{\nu'}\right) \tag{16}$$

The reverse conversion from u′ to u uses the inverse function f⁻¹: 

$$\mu = f^{-1}(\mu') \cdot \frac{\nu'}{\nu} \tag{17}$$

Table 3 exemplifies how we define special units. This shows that interval scales are but a special case of this general nonproportional conversion. In medicine, logarithmic scales are especially important, such as the H+ ion concentration measured as “the pH value.” In our theory, pH can safely be regarded as a normal unit and conversions to, say, 1 nmol/L can be carried out easily.**

Table 3

Definition of Special Units on Nonratio Scales

| u′ | u | ν | f(x) | f⁻¹(x) |
|---|---|---|---|---|
| 1°C | 1 K | | x - 273.15 | x + 273.15 |
| 1°F | 1 K | 5/9 | x - 459.67 | x + 459.67 |
| pH | 1 mol/L | | -log₁₀ x | 10^(-x) |
| 1 Np | | | ln x | e^x |
| 1 bel | | | log₁₀ x | 10^x |
| 1 db(SPL) | 1 Pa | 2 × 10⁻⁵ | 20 · log₁₀ x | 10^(x/20) |
| 1 db(mV) | 1 mV | | 20 · log₁₀ x | 10^(x/20) |
| 1 db(W) | 1 W | | 10 · log₁₀ x | 10^(x/10) |
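The first three rows of Table 3 can be exercised with a short Python sketch. It assumes that the proper unit itself has a scale factor of 1 (kelvin, or mol/L) and that ν in the table is the size of one step of the special unit measured in the proper unit; the class name `SpecialUnit` and its methods are illustrative, not part of the authors' software.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SpecialUnit:
    nu: float                         # one step of the special unit, in proper units
    f: Callable[[float], float]       # scaled proper value -> special value
    f_inv: Callable[[float], float]   # special value -> scaled proper value

    def from_proper(self, mu):
        """Convert a value in the proper unit into the special unit."""
        return self.f(mu / self.nu)

    def to_proper(self, mu_special):
        """Convert a value in the special unit back into the proper unit."""
        return self.f_inv(mu_special) * self.nu


CELSIUS = SpecialUnit(1.0, lambda x: x - 273.15, lambda x: x + 273.15)
FAHRENHEIT = SpecialUnit(5 / 9, lambda x: x - 459.67, lambda x: x + 459.67)
PH = SpecialUnit(1.0, lambda x: -math.log10(x), lambda x: 10 ** -x)

body_temp_k = FAHRENHEIT.to_proper(98.6)          # kelvin
body_temp_c = CELSIUS.from_proper(body_temp_k)    # degrees Celsius
h_conc_nmol_l = PH.to_proper(7.4) * 1e9           # mol/L scaled to nmol/L

print(round(body_temp_c, 2), round(h_conc_nmol_l, 1))
```

Interval scales such as Celsius and logarithmic scales such as pH go through the same two methods; only the function pair differs.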

Discussion

The semantics of units we present in this article resolves the symbolic expressions of units completely into numbers. It compresses such complex units as 1 dyn·s/cm⁵ into one real number and a vector of seven integers: <10⁸, (-4, -1, 1, 0, 0, 0, 0)>. This representation of units remains simple, regardless of how complex a unit expression looks externally. Operations to calculate with units are well defined on these representations. Computer implementation is therefore easy. By this internal representation, the computer can easily tell that 1 dyn·s/cm⁵ is the same as 10⁵ Pa·s·m⁻³ and, to within rounding, as 0.75 mm Hg/(L/s).

Previously we have implemented this method in C++ as part of an HL7 communications project.24 To better demonstrate the propositions of this article, we prepared another implementation in JAVA. We describe the design of the JAVA software in an appendix to this article. We made all source code of the software freely available,†† because we hope that our method and its practical implementation will be picked up by medical information system designers, to make the semantics of measurements and units an integral part of their software.

Any method that reduces complexity into simple and uniform representations risks losing important information through this reduction. Our method is rooted in a general theory of measurement, which ensures that the concepts of physics are preserved. To fully account for the physics of units, we have to discuss the role of precepts and postulates that were part of our definition of a measurement. Units are, however, not just abstract concepts of physics but part of human language. Thus, in the remainder of this section, we look not only at what units are but also how they are used.

Precepts of Measurements

Work W and the moment of force M both have the unit 1 Nm. Thus, $\mathbf{u}_W = \mathbf{u}_M = (2, -2, 1, 0, 0, 0, 0)$. Both kinds of quantities differ only in the precept of measurement and the set of postulates. Force F is a vector kind of quantity and the moment of force is the vector product M = F × s of a radius vector s with the force acting orthogonal to the radius. In contrast, the work W is defined as the scalar product of the displacement vector s of a moving body and the force F_s acting on the body parallel to its displacement.

These distinctions are not covered in the semantics of units given here. Although the problem is well known,13 it is usually not given much attention in dimensional analysis.‡‡ Extending the method to cover these distinctions would introduce overwhelming complexity, and we would lose the ability to reduce the semantics to simple vectors.

Dimensionless Quantities

Sometimes all units may cancel out in quantities derived through equation 5. For example, in a body-mass-dependent dosage for medications such as 2.5 mg/kg, the mass units cancel out, resulting in 2.5 × 10⁻⁶ or 2.5 × 10⁻⁴ percent. We frequently see notations such as “1 mg/kg (body weight)” to keep the “history” of the measurement from being canceled.
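The cancellation can be traced in a few lines of Java. This is only a sketch: the class `MassUnit` tracks a single mass exponent and a magnitude relative to the gram, both of which are simplifying assumptions made here for illustration.

```java
/** Hypothetical unit reduced to a magnitude and a single mass exponent. */
final class MassUnit {
    final double magnitude;  // factor relative to the gram (assumed reference unit)
    final int massExponent;  // exponent of the mass dimension

    MassUnit(double magnitude, int massExponent) {
        this.magnitude = magnitude;
        this.massExponent = massExponent;
    }

    /** Division divides magnitudes and subtracts exponents. */
    MassUnit div(MassUnit other) {
        return new MassUnit(magnitude / other.magnitude, massExponent - other.massExponent);
    }
}

class DoseRatioDemo {
    static final MassUnit MG = new MassUnit(1e-3, 1);
    static final MassUnit KG = new MassUnit(1e3, 1);

    /** Reduces a value to a pure number; fails if the unit is not dimensionless. */
    static double asPureNumber(double value, MassUnit unit) {
        if (unit.massExponent != 0) {
            throw new IllegalArgumentException("unit is not dimensionless");
        }
        return value * unit.magnitude;
    }

    public static void main(String[] args) {
        double ratio = asPureNumber(2.5, MG.div(KG)); // 2.5 mg/kg
        System.out.println(ratio);                    // close to 2.5e-6
        System.out.println(ratio * 100 + " percent"); // close to 2.5e-4 percent
    }
}
```

Once the exponent reaches zero, nothing in the result records that the number began as a mass ratio, which is the loss of “history” described above.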

The most important of such cases are concentrations reported as “percentages.” Traditionally, we measure concentrations of fluids and gases as a fraction of volumes VB/V1 reported as 1 %vol. Concentrations of dissolved substances are traditionally measured as fractions of masses mB/m1 reported as 1 g% or just as 1%. Because water is the predominant dissolvent in biology, and 1 mL of water has the mass 1 g, 1 g% is regarded as equal to 10⁻² g/mL or 1 g/dL. On this basis, the unit 1 mg% emerged, which was set equal to 10⁻² mg/g ≡ 1 mg/dL.

The ISO, IUPAC, and CEN standards assert that those annotations on the percent sign (e.g., %vol and g%) are meaningless in principle and therefore deprecated. Thus, we will not extend our theory to give meaning to those annotations. Instead, concentrations in chemistry and biology should best be reported in 1 mol/L.7,8,25 Correct interpretation of percentages requires the knowledge of what has been measured. Failure to supply this information, such as on drug labels for lidocaine or adrenaline, leads to severe misinterpretations and causes over- or under-dosage. In a survey by Scrimshire,26 only 45 of 100 doctors knew how much 5 mL of 1% lidocaine is. This suggests that the traditional percentages may not be the most user-friendly measures for concentrations.

The EUCLIDES project27–29 assumes that any unit consists of a numerator and a denominator that are not reduced. Indeed, many clinically relevant units do have the form of such simple ratios. The EUCLIDES method, however, is challenged by units such as 1 mL/kg/h. The problem is how to canonically distribute the parts of such complex terms into just one numerator-denominator pair.

Dependence on the Analyte

One of the most frequent conversions in medicine is between units of substance concentration and mass concentration. These are incommensurable units in our theory, although they are clearly convertible. In medicine we would like those measurements to be convertible with little effort.

In practice these conversions can be done at a system's interface by checking, for each observation, the reported unit u_r and the standard unit of the receiving system u_s for commensurability according to equation 9. If u_r = u_s holds, the conversion is straightforward (equation 14). Otherwise, a material constant c = μ_c · u_c, such as the molar mass, is looked up for the analyte. With this constant, either u_r + u_c = u_s or u_r − u_c = u_s must be true, and the conversion is done by multiplication or division, respectively, of the reported quantity with the constant.

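For example, hemoglobin is usually reported in mass concentrations such as c_m = 15 g/dL. With the molar mass of hemoglobin M_Hb = 64.5 kg/mol the computer can calculate the equivalent substance concentration c_n as

$$c_n = \frac{c_m}{M_{\mathrm{Hb}}} = \frac{15\ \mathrm{g/dL}}{64.5\ \mathrm{kg/mol}} = \frac{150\ \mathrm{g/L}}{64\,500\ \mathrm{g/mol}} \approx 2.3\ \mathrm{mmol/L}$$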

If the computer had multiplied the mass concentration with the molar mass, the result would have had the wrong unit: 9.7 kg²·L⁻¹·mol⁻¹. Thus, through our semantics based on dimensional analysis the computer can find the correct conversion rule without any further knowledge of biochemistry.
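The interface check described in this section can be sketched in Java as follows. The names `Qty` and `AnalyteConversion` are hypothetical. The sketch assumes that magnitudes are stored relative to meter, second, and gram, and that the mole is treated as a pure number (as the amount of substance is dimensionless in this theory); Avogadro's number is left out because it cancels in this example.

```java
import java.util.Arrays;

/** Hypothetical quantity: a magnitude plus exponents for length, time, mass. */
final class Qty {
    final double value; // expressed in m, s, g; the mole is treated as a pure number
    final int[] dim;

    Qty(double value, int... dim) {
        this.value = value;
        this.dim = Arrays.copyOf(dim, 3);
    }

    Qty mul(Qty o) { return combine(o, 1); }
    Qty div(Qty o) { return combine(o, -1); }

    private Qty combine(Qty o, int sign) {
        int[] d = new int[3];
        for (int i = 0; i < 3; i++) {
            d[i] = dim[i] + sign * o.dim[i];
        }
        return new Qty(sign > 0 ? value * o.value : value / o.value, d);
    }

    boolean commensurableWith(Qty o) { return Arrays.equals(dim, o.dim); }
}

class AnalyteConversion {
    /** Expresses the reported quantity in multiples of target, using the constant if needed. */
    static double convert(Qty reported, Qty target, Qty constant) {
        Qty q = reported;
        if (!q.commensurableWith(target)) {
            Qty product = reported.mul(constant);
            Qty quotient = reported.div(constant);
            if (product.commensurableWith(target)) {
                q = product;
            } else if (quotient.commensurableWith(target)) {
                q = quotient;
            } else {
                throw new IllegalArgumentException("constant does not bridge the units");
            }
        }
        return q.value / target.value;
    }

    public static void main(String[] args) {
        Qty massConc = new Qty(15 / 1e-4, -3, 0, 1);   // 15 g/dL, with 1 dL = 1e-4 m3
        Qty molarMass = new Qty(64.5e3, 0, 0, 1);      // 64.5 kg/mol, in g
        Qty mmolPerL = new Qty(1e-3 / 1e-3, -3, 0, 0); // 1 mmol/L
        System.out.println(convert(massConc, mmolPerL, molarMass)); // roughly 2.33
    }
}
```

The choice between multiplication and division is made purely by comparing exponent vectors. No rule about hemoglobin is coded anywhere.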

Conclusion

Our method shares with dimensional analysis the ease with which calculations can be done with units. But it also shares the weaknesses of dimensional analysis in that the commensurability criterion (equation 9) is too wide and too narrow at the same time. The criterion is too wide because it neglects the various differences of precepts of measurements and thus falsely finds work and moment of force to be commensurable. The criterion is too narrow because it finds substance concentration and mass concentrations to be incommensurable, although by and large we can regard them as equivalent. Dimensional analysis is particularly weak regarding the many dimensionless quantities that are clearly incommensurable but dimensionally indistinguishable.

An essential part of a report of a measurement is the measurement name, which says what has been measured before giving a value and its unit.7,25 The measurement name and its connotations should reflect those distinctions that dimensional analysis cannot cover. The problem of naming measurements has been discussed elsewhere. The Logical Observation Names and Codes (LOINC) terminology is a useful code for naming measurements in medicine.30,31

Through the measurement name we can distinguish measurements with different kinds of quantities but the same dimension. LOINC, for instance, explicitly names the kind of quantity for every measurement name and thus distinguishes among mass concentration ratio (MCRTO), mass fraction (MFR), volume fraction and volume ratio (VFR and VRTO, respectively), and number fraction, all of which would be reported as percentages.

Because the measurement name mentions at least the analyte, we can look up properly dimensioned constants in order to find a conversion between incommensurable but otherwise equivalent kinds of quantities, such as mass concentration and substance concentration. The LOINC database already contains a field for the molar mass, although this field is not yet fully populated.

In this article, however, units have been our main concern. Our theory of units is powerful yet easily implemented in computer systems. Only one compact table defines all symbols for prefixes and simple units without prefix (unit atoms). Because each unit atom occurs only once at the left side of the definitions, the table is small and requires little maintenance. Customary units and special biomedical units, even those with interval scales or logarithmic scales, can be handled as well as SI units. The existing standards ISO 2955, ANSI X3.50, and HL7's ISO+ are not adequate for encoding units because of their many lexical ambiguities and their incompleteness for practical purposes.

One sound standard code for units is needed in order to support automated communication, seamless conversion, and intelligent processing of dimensioned quantities. Because exhaustive search finds every lexical ambiguity, we used it both to remove all existing ambiguities from current unit codes and to ensure that our more complete code introduced no new ones. We will soon suggest our revised code system for units to be used with HL7.§§

Appendix

Implementation

The class diagram in Figure A outlines the data structures and their methods. Each major concept of our theory is embodied in one class. In C++, operator overloading allows us to integrate the algebraic operations naturally into the programming language. Since JAVA lacks operator overloading we have to use regular methods named “add,” “sub,” “mul,” and “div” instead. The following classes are defined:

Figure A

Class diagram for an implementation of units: A Measurement has one Unit and a Unit has one Dimension. A UnitAtom is a special Unit that is defined in a table. The database of UnitAtoms and Prefixes is used by the static method Unit::parse to create a new Unit object from a String representation of the unit. A Unit may have a FunctionPair that is used to convert the nonratio unit to and from its proper unit.

class Dimension: implements the vector u of exponents and its operators for addition “+,” subtraction “-,” and multiplication with a scalar “*”.

class Unit: embodies the unit <v,u>. It has a parser that generates the internal representation from the character string expression of a unit. It also provides all the operators to calculate with units, multiplication “*,” division “/,” and raising to a power “pow (int)”.

class Measurement: implements a quantity Q = <μ, u>. It provides the full set of operators including addition “+” and subtraction “-” of equally dimensioned quantities. It also has a method “convert to (Unit u′)” to convert a measurement value μ for another commensurable unit u′.

class UnitAtom: is a specialization of class Unit. Every instance of this class is stored in a static table or database. Definition and retrieval of unit atoms are performed by static member functions. This class is rarely used directly by the programmer but is essential for the unit parser.

class Prefix: implements the table of multiplier prefixes. This class is normally used only by the unit parser to resolve prefix names to their values.

class FunctionPair: is used only for special nonratio units, to convert measurement values to and from the corresponding proper unit.
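The relationship among the first three classes can be sketched as follows. This is not the authors' source code: the class names follow the list above, but the field layout, the method name `convertTo`, the helper `sameAs`, the gram as base unit of mass, and the ordering of exponents are all assumptions made for illustration. UnitAtom, Prefix, and FunctionPair are left out.

```java
import java.util.Arrays;

/** Vector of exponents; assumed order: length, time, mass, then four more. */
final class Dimension {
    private final int[] exp;

    Dimension(int... exp) { this.exp = Arrays.copyOf(exp, 7); }

    Dimension add(Dimension o) { return combine(o, 1); }
    Dimension sub(Dimension o) { return combine(o, -1); }

    /** Multiplication with a scalar, used when a unit is raised to a power. */
    Dimension mul(int k) {
        int[] r = new int[7];
        for (int i = 0; i < 7; i++) r[i] = exp[i] * k;
        return new Dimension(r);
    }

    private Dimension combine(Dimension o, int sign) {
        int[] r = new int[7];
        for (int i = 0; i < 7; i++) r[i] = exp[i] + sign * o.exp[i];
        return new Dimension(r);
    }

    boolean sameAs(Dimension o) { return Arrays.equals(exp, o.exp); }
}

/** A unit as a pair of magnitude and dimension. */
final class Unit {
    final double magnitude; // size relative to the coherent base unit
    final Dimension dim;

    Unit(double magnitude, Dimension dim) {
        this.magnitude = magnitude;
        this.dim = dim;
    }

    Unit mul(Unit o) { return new Unit(magnitude * o.magnitude, dim.add(o.dim)); }
    Unit div(Unit o) { return new Unit(magnitude / o.magnitude, dim.sub(o.dim)); }
    Unit pow(int k) { return new Unit(Math.pow(magnitude, k), dim.mul(k)); }
}

/** A quantity as a pair of value and unit. */
final class Measurement {
    final double value;
    final Unit unit;

    Measurement(double value, Unit unit) {
        this.value = value;
        this.unit = unit;
    }

    /** Converts to a commensurable unit; rejects units of another dimension. */
    Measurement convertTo(Unit target) {
        if (!unit.dim.sameAs(target.dim)) {
            throw new IllegalArgumentException("incommensurable units");
        }
        return new Measurement(value * unit.magnitude / target.magnitude, target);
    }

    // Addition and subtraction convert the right operand to this unit first.
    Measurement add(Measurement o) { return new Measurement(value + o.convertTo(unit).value, unit); }
    Measurement sub(Measurement o) { return new Measurement(value - o.convertTo(unit).value, unit); }
    Measurement mul(Measurement o) { return new Measurement(value * o.value, unit.mul(o.unit)); }
    Measurement div(Measurement o) { return new Measurement(value / o.value, unit.div(o.unit)); }
}

class UnitsApiDemo {
    public static void main(String[] args) {
        Dimension mass = new Dimension(0, 0, 1);
        Unit kg = new Unit(1000.0, mass);    // gram assumed as the base unit of mass
        Unit lb = new Unit(453.59237, mass); // avoirdupois pound, in grams
        Measurement weight = new Measurement(5, lb).convertTo(kg);
        System.out.println(weight.value);    // about 2.268
    }
}
```

Because JAVA has no operator overloading, every algebraic operation appears as a regular method, in line with the “add,” “sub,” “mul,” and “div” naming mentioned above.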

After the defined prefixes and unit atoms are read from a table at the startup of a program that uses the units API, the parser can read strings of characters formatted according to the grammar and translate them into the internal format, in which both simple and complex units are represented as pairs u=<v,u>. The parser therefore need not build an abstract syntax tree. It simply scans through the string linearly, reading unit atoms and accumulating them into one result unit variable according to the operators found.
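A linear scan of this kind can be sketched as shown below. The grammar handled here is a deliberately reduced assumption (terms separated by “.” or “/”, an optional one-letter prefix, and an optional integer exponent written directly after the symbol), and the two small tables are illustrative stand-ins for the real table of prefixes and unit atoms. The names `ParsedUnit` and `LinearUnitParser` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical parse result: magnitude plus exponents for length, time, mass. */
final class ParsedUnit {
    double magnitude = 1.0;
    final int[] exp = new int[3];
}

final class LinearUnitParser {
    // Illustrative tables only; a real system would load them at startup.
    private static final Map<String, ParsedUnit> ATOMS = new HashMap<>();
    private static final Map<Character, Double> PREFIXES = new HashMap<>();

    private static void atom(String symbol, double magnitude, int len, int time, int mass) {
        ParsedUnit u = new ParsedUnit();
        u.magnitude = magnitude;
        u.exp[0] = len;
        u.exp[1] = time;
        u.exp[2] = mass;
        ATOMS.put(symbol, u);
    }

    static {
        atom("m", 1, 1, 0, 0);
        atom("s", 1, 0, 1, 0);
        atom("g", 1, 0, 0, 1);
        atom("L", 1e-3, 3, 0, 0);
        atom("min", 60, 0, 1, 0);
        PREFIXES.put('k', 1e3);
        PREFIXES.put('d', 1e-1);
        PREFIXES.put('c', 1e-2);
        PREFIXES.put('m', 1e-3);
    }

    /** Scans the string once, left to right, accumulating into one result. */
    static ParsedUnit parse(String s) {
        ParsedUnit result = new ParsedUnit();
        int sign = 1; // +1 after ".", -1 after "/"; applies to the next term only
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '.' || c == '/') {
                sign = (c == '/') ? -1 : 1;
                i++;
                continue;
            }
            int j = i; // end of the symbol
            while (j < s.length() && Character.isLetter(s.charAt(j))) j++;
            int k = j; // end of the optional exponent
            if (k < s.length() && s.charAt(k) == '-') k++;
            while (k < s.length() && Character.isDigit(s.charAt(k))) k++;
            int power = sign * (k > j ? Integer.parseInt(s.substring(j, k)) : 1);
            accumulate(result, s.substring(i, j), power);
            i = k;
        }
        return result;
    }

    /** Resolves a symbol as an atom, or as prefix plus atom, and folds it into the result. */
    private static void accumulate(ParsedUnit result, String symbol, int power) {
        double prefix = 1.0;
        ParsedUnit a = ATOMS.get(symbol); // whole symbol first, so "min" is not milli-"in"
        if (a == null && symbol.length() > 1 && PREFIXES.containsKey(symbol.charAt(0))) {
            prefix = PREFIXES.get(symbol.charAt(0));
            a = ATOMS.get(symbol.substring(1));
        }
        if (a == null) {
            throw new IllegalArgumentException("unknown unit: " + symbol);
        }
        result.magnitude *= Math.pow(prefix * a.magnitude, power);
        for (int d = 0; d < 3; d++) {
            result.exp[d] += a.exp[d] * power;
        }
    }
}
```

For instance, `LinearUnitParser.parse("kg.m2/s2")` yields a magnitude of 1000 relative to the gram-based coherent unit and the exponents (2, -2, 1). No tree is built at any point. The only state is the running result and the sign of the pending operator.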

We have not implemented a general builder that constructs a unit string from the internal representation. A simple builder would just print out a product of powers of base units. This may result in terms that are dimensionally correct but hard for humans to understand. For example, “kPa/s” (kilo-pascal per second, to measure the pressure increment in the left ventricle) would come up as “kg.m-1.s-3”. We would rather keep the external representations as strings in all objects of class Unit. In calculations, we update this string accordingly (e.g., “kPa” divided by “s” is “kPa/s”.) Because the string is never reduced, after heavy calculations it may become long and even less readable for humans (e.g., “kPa/s” times “L” times “min” is “kPa/s.L.min,” where it could have been reduced to 60 “J”.)
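The bookkeeping described in this paragraph might look like the following sketch. The class `LabeledUnit` is hypothetical. It tracks only the external string and a magnitude relative to coherent SI units, and it omits the dimension vector for brevity.

```java
/** Hypothetical unit that carries its unreduced external string. */
final class LabeledUnit {
    final String label;     // external representation, never reduced
    final double magnitude; // factor relative to coherent SI units (assumed)

    LabeledUnit(String label, double magnitude) {
        this.label = label;
        this.magnitude = magnitude;
    }

    /** Multiplication appends the right operand's label after a period. */
    LabeledUnit mul(LabeledUnit o) {
        return new LabeledUnit(label + "." + o.label, magnitude * o.magnitude);
    }

    /** Division appends the right operand's label after a solidus. */
    LabeledUnit div(LabeledUnit o) {
        return new LabeledUnit(label + "/" + o.label, magnitude / o.magnitude);
    }
}

class LabelDemo {
    public static void main(String[] args) {
        LabeledUnit kPa = new LabeledUnit("kPa", 1e3);
        LabeledUnit s = new LabeledUnit("s", 1.0);
        LabeledUnit litre = new LabeledUnit("L", 1e-3);
        LabeledUnit min = new LabeledUnit("min", 60.0);

        LabeledUnit u = kPa.div(s).mul(litre).mul(min);
        System.out.println(u.label);     // prints "kPa/s.L.min"
        System.out.println(u.magnitude); // close to 60, i.e., 60 J
    }
}
```

The sketch handles only simple right operands. Dividing by a compound unit would require parentheses in the label, which is one more reason such strings grow less readable with use.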

References

1. Health Level Seven, version 2.3. Ann Arbor, Mich.: Health Level Seven, 1997.
2. ASTM E31-11, ASTM 1238: Standard Specification for Transferring Clinical Laboratory Data Messages Between Independent Computer Systems. West Conshohocken, Pa.: American Society for Testing and Materials, Sep 1991.
3. Gruber TR, Olsen GR. An ontology for engineering mathematics. In: Doyle J, Sandewall E, Torasso P (eds). Principles of Knowledge Representation and Reasoning. Proceedings of the Fourth International KR '94 Conference; Bonn, Germany; May 24–27, 1994. San Francisco, Calif.: Morgan Kaufmann, 1994. Also available at: http://ksl-web.stanford.edu/knowledge-sharing/papers/engmath.html.
4. Gruber TR. A translation approach to portable ontology specifications. Knowledge Acquisition. 1993;5(2):199–220. Available at: http://ksl-web.stanford.edu/knowledge-sharing/papers/#ontolingua-intro.
5. ISO 2955: Information Processing: Representation of SI and Other Units in Systems with Limited Character Sets. Geneva, Switzerland: International Organization for Standardization, 1983.
6. ANSI X3.50: Representation for United States Customary, SI and Other Units to Be Used in Systems with Limited Character Sets. New York: American National Standards Institute, 1986.
7. ENV 12435: Medical Informatics – Expression of the Results of Measurement in Health Sciences. Brussels, Belgium: Comité Européen de Normalisation, May 11, 1998.
8. ISO 31: Quantities and Units, part 0: General Principles. Geneva, Switzerland: International Organization for Standardization, 1992.
9. ISO 1000: SI Units and Recommendations for the Use of Their Multiples and of Certain Other Units. Geneva, Switzerland: International Organization for Standardization, 1981.
10. Taylor BN. The International System of Units (SI). Gaithersburg, Md.: National Institute of Standards and Technology (NIST), 1991. Also available at: http://physics.nist.gov/Document/sp330.pdf.
11. Taylor BN. Guide for the Use of the International System of Units (SI). Gaithersburg, Md.: National Institute of Standards and Technology, 1995. Also available at: http://physics.nist.gov/Document/sp811.pdf.
12. Harken AH, Moore EE. Abernathy's Surgical Secrets. 3rd ed. Philadelphia, Pa.: Hanley & Belfus, 1996.
13. Thun RE. On dimensional analysis. IBM J Res Dev. Jul 1960;4:349–56.
14. Drobot S. On the foundation of dimensional analysis. Studia Mathematica. 1954;14:84–99.
15. Kurth R. A note on dimensional analysis. Am Math Monthly. 1965;72(9):965–9.
16. Whitney H. The mathematics of physical quantities, part I: mathematical models for measurement. Am Math Monthly. 1968;75:115–38.
17. Hart G. The theory of dimensioned matrices. In: Proceedings of the 5th Society for Industrial and Applied Mathematics; Jun 15–18, 1994; Snowbird, Utah. Philadelphia, Pa.: SIAM, 1994:186–90.
18. Fasching G. Die empirisch-wissenschaftliche Sicht. Vienna, Austria: Springer-Verlag, 1989.
19. Krantz DH, Luce RD, Suppes P, Tversky A. Foundations of Measurement, Vol. 1: Additive and Polynomial Representations. New York: Academic Press, 1971.
20. ISO. International Vocabulary of Basic and General Terms in Metrology. Geneva, Switzerland: International Organization for Standardization, 1993.
21. Feynman RP, Leighton RB, Sands M. The Feynman Lectures on Physics. Reading, Mass.: Addison-Wesley, 1963.
22. Ryer AD. Light Measurement Handbook. Newburyport, Mass.: International Light, 1997. Available at: http://www.intl-light.com/handbook/.
23. Stevens SS. Measurement and Man. Science. Feb 21, 1958;127(3295):383–9.
24. Schadow G, Föhring U, Tolxdorff T. Implementing HL7: from the standard's specification to production applications. Methods Inform Med. 1998;37(1):119–23.
25. Rigg JC, Brown SS, Dybkaer R, Olesen H. Compendium of Terminology and Nomenclature of Properties in Clinical Laboratory Science. Oxford, England: IUPAC/Blackwell Science, 1995.
26. Scrimshire JA. Safe use of lignocaine. Br Med J. 1989;298:1494.
27. De Moor GJ. Towards a standard for electronic data interchange in laboratory medicine [doctoral dissertation]. Ghent, Belgium: University of Ghent, 1994.
28. EUCLIDES/OpenLabs coding scheme, English version 4.0. Zaventem, Belgium: EUCLIDES Foundation International, Aug 16, 1994.
29. EUCLIDES conversion utility (E.C.U.) user manual. Zaventem, Belgium: EUCLIDES Foundation International, Oct 1991.
30. Forrey AW, McDonald CJ, DeMoor G, et al. Logical observation identifier names and codes (LOINC) database: a public use set of codes and names for electronic reporting of clinical laboratory test results. Clin Chem. 1996;42(1):81–90.
31. Regenstrief Institute for Health Care. LOINC: Logical Observation Identifier Names and Codes. Indianapolis: The Institute, 1995. Also available at: http://www.mcis.duke.edu/standards/termcode/loinc.htm.

*
For instance, Abernathy's Surgical Secrets12(p30) gives the formula [equation not reproduced] for the calculation of mixed venous oxygen saturation. Of course, it is assumed here that the cardiac output (CO) and the hemoglobin concentration (Hb) belong in the denominator as well.
†
Although the “venous” counterpart of “MM (HG)” is inconsistently written as “CM H2O.”
‡
Conventional notation uses [Q] for the unit and {Q} for the measurement value of the quantity Q. Those brackets and curly braces are distracting, however, and do not contribute any essential information.
§
Although, in the unit 1 dyn·s·cm⁻⁵ for vascular resistance we encounter an exponent -5.
∥
For linear independence, α1b1 + α2b2 + … + αnbn may be zero only if α1, α2, …, αn are all zero.
¶
It must be noted that the sum of two quantities is meaningful by itself only in so-called “extensive” measures. The sum of conjoint derived measures, such as two densities (ρ = m/V), is by itself meaningless. However, mathematically it is useful to define the sum in general; otherwise one would lose the distributive property of physical quantities that allows one to write m = V·ρ1 + V·ρ2 = V·(ρ1 + ρ2).
#
In our model nothing favors one particular base system over another set of isomorphic systems. Thus, SI should be used not because it is inherently better in the sense of this theory, but because it is an international standard. However, our method applies as well to any other system of units.
**
At the same time this illustrates the virtue of defining the amount of substance as a dimensionless kind of quantity, representing a number of particles: After conversion of, say, pH 9 to an H+ ion concentration of 1 nmol/L we can directly proceed to the number of H+ ions per volume: 602.204/pL. Indeed, pH, nmol/L, and 1/pL all measure the same kind of quantity, which is obscured by the conventional notion of a dimensionless pH and of the mole as a unit on its own dimension.
††
Available at: http://aurora.rg.iupui.edu/units. There is also a JAVA applet that demonstrates the conversion facility.
‡‡
Krantz et al.19(p474ff) give a solution for the vector problem by splitting length into three base dimensions, one for each axis of the Euclidean space. This solves the vector problem completely, since length is the only vector base kind of quantity. However, because there is no fixed coordinate system of space, mapping conventional units onto such a system is not one-to-one.
§§
The Unified Code for Units of Measures is available at: http://aurora.rg.iupui.edu/UCUM.
This work was supported in part by grant HS08750 from the Agency for Health Care Policy and Research and by contract N01-LM-63546 from the National Library of Medicine.
