VAS achieves high response rates and high levels of completion. VAS methods tend to be less expensive to administer than TTO or SG methods due to their relative simplicity and ease of completion. There is also a significant amount of empirical evidence to demonstrate the reliability of VAS methods in terms of inter-rater reliability and test–retest reliability. However, the lack of choice and direct nature of the VAS tasks have given rise to concerns over the ability of this technique to reflect preferences on an interval scale.
There is also a concern that VAS methods are susceptible to response spreading, whereby respondents use all areas of the valuation scale when responding, especially where multiple health states are valued on the same scale. Response spreading can lead to health states that are very much alike being placed at some distance from one another on the valuation scale, and to health states that are vastly different being placed very close to one another, as the respondent seeks to place responses across the whole (or a specific portion) of the available scale. If response spreading does occur, this implies that VAS techniques do not generate an interval scale and that the numbers obtained may not be meaningful in cardinal terms.
More generally, VAS is prone to context effects, in which the average rating for items is influenced by the level of the other items being valued, and to endpoint bias, whereby health states at the top and bottom of the scale are placed further apart on the scale than would be suggested by a direct comparison of differences.
In summary, VAS techniques appear to measure aspects of health status changes rather than the satisfaction or benefit conveyed by such changes. Qualitative evidence of respondents seeing VAS methods as an expression of numbers in terms of ‘percentages of the best imaginable state,’ or a ‘percentage of functioning scale’ rather than eliciting information about their preferences for health states provides support for this hypothesis. There is a large body of evidence to suggest that unadjusted VAS scores do not provide a valid measure of the strength of preference that can be used in economic evaluation.
Given the evidence that VAS may not produce health state utilities that can be used directly in the calculation of QALYs, there has been interest in mapping VAS values to SG or TTO utility values. This has the advantage of combining the ease of use of VAS with the theoretical advantages of a choice-based measure of health. However, the extent to which a stable mapping function can be found between VAS and SG or TTO has been disputed (Stevens et al., 2006).
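To make the idea of a mapping function concrete, the sketch below applies a power curve, one functional form that has been explored for this purpose. Both the form u = 1 - (1 - v)^b and the exponent used here are assumptions chosen for illustration; they are not taken from Stevens et al. (2006) or from any fitted model.

```python
def vas_to_utility(vas: float, exponent: float) -> float:
    """Map a VAS rating on [0, 1] to a utility-like value.

    Assumed functional form (illustrative only): u = 1 - (1 - v) ** exponent.
    """
    if not 0.0 <= vas <= 1.0:
        raise ValueError("vas must lie in [0, 1]")
    if exponent <= 0:
        raise ValueError("exponent must be positive")
    return 1.0 - (1.0 - vas) ** exponent


# Hypothetical exponent, not an estimate from any study.
HYPOTHETICAL_EXPONENT = 2.0

for rating in (0.0, 0.25, 0.5, 0.75, 1.0):
    mapped = vas_to_utility(rating, HYPOTHETICAL_EXPONENT)
    print(f"VAS {rating:.2f} -> mapped value {mapped:.4f}")
```

With an exponent above 1 the curve lifts intermediate ratings while leaving the endpoints at 0 and 1. Whether any single exponent holds across samples and health states is exactly the stability question raised above.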
Standard Gamble
Many SG studies, across different respondent groups, have reported completion rates in excess of 80%, with some studies reporting completion rates as high as 95–100%, indicating that the SG appears to be acceptable in terms of its practicality. The SG has also been found to be feasible and acceptable among varied types of patient groups and clinical areas including cancer, transplantation, vascular surgery, and spinal problems.
SG is rooted in expected utility theory (EUT). EUT has been the dominant theory of decision making under uncertainty for over half a century. EUT postulates that individuals choose between prospects (such as different ways of managing a medical condition) in such a way as to maximize their ‘expected’ utility. According to this theory, for a given prospect such as having a surgical operation, a utility value is estimated for each possible outcome, good or bad. Each value is multiplied by its probability of occurring and the results are summed to calculate the expected utility of the prospect. This procedure is undertaken for each prospect being considered. The key assumption made by EUT over and above conventional consumer theory is independence, which means that the value of a given outcome is independent of how it was arrived at or its context. In decision tree analysis this is the equivalent of saying that the value of one branch of the tree is unaffected by the other branches.
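The expected utility calculation described above can be sketched in a few lines. The prospects, probabilities, and utilities below are hypothetical, and the function names are illustrative. The second function shows why, under EUT, the probability at which a respondent is indifferent in an SG task is read directly as the utility of the certain state, assuming full health is scored 1 and death 0.

```python
from typing import Sequence, Tuple

Outcome = Tuple[float, float]  # (probability, utility)


def expected_utility(prospect: Sequence[Outcome]) -> float:
    """Sum of probability * utility over all outcomes of one prospect."""
    total_probability = sum(p for p, _ in prospect)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * u for p, u in prospect)


def sg_value(indifference_probability: float) -> float:
    """Utility of a certain health state implied by an SG response.

    At indifference, the certain state has the same expected utility as a
    gamble giving full health (utility 1) with probability p and death
    (utility 0) with probability 1 - p, so the value equals p.
    """
    p = indifference_probability
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    gamble = [(p, 1.0), (1.0 - p, 0.0)]
    return expected_utility(gamble)


# Hypothetical prospects for managing a condition.
surgery = [(0.85, 0.95), (0.10, 0.60), (0.05, 0.0)]
medication = [(1.0, 0.80)]

prospects = {"surgery": surgery, "medication": medication}
best = max(prospects, key=lambda name: expected_utility(prospects[name]))
print("Preferred prospect under EUT:", best)
print("SG value at p = 0.7:", sg_value(0.7))
```

Each prospect is evaluated separately, which mirrors the independence assumption: the value attached to one branch does not depend on the other branches.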
Due to its theoretical basis, the SG is often portrayed as the classical method of decision making under uncertainty, and due to the uncertain nature of medical decision making the SG is often classified as the gold standard. As medical decisions usually involve uncertainty the use of the SG method would seem to have great appeal. However, the type of uncertain prospect embodied in the SG may bear little resemblance to the uncertainties in various medical decisions, so this feature may be less relevant than others have suggested.
The status of SG as the gold standard has been criticized given the existence of ample evidence that the axioms of EUT are violated in practice. One response in health economics (as elsewhere) has been that EUT should be seen as a normative rather than a descriptive theory, that is, it suggests how decisions should be made under conditions of uncertainty. However, this still does not alter the concern that the values generated by SG do not necessarily represent people’s valuation of a given health state, but incorporate other factors, such as risk attitude, gambling effects, and loss aversion.
Time Trade-Off
The TTO technique is a practical, reliable, and acceptable method of health state valuation, as evidenced by the wide variety of empirical studies that have applied this method (Brazier et al., 2007). The TTO has been mainly interviewer-administered, although it has also been used in self-administered and computer-based applications.
The applicability of the TTO in medical decision making may be questioned because the technique asks respondents to make a choice between two certain outcomes, when health care is characterized by conditions of uncertainty. It is potentially possible to adjust TTO values to incorporate individuals’ attitudes to risk and uncertainty, though this is rarely done. Furthermore, adjusting for risk attitude is difficult when there are strong theoretical and empirical grounds for arguing there is not a constant attitude to risk.
An underlying assumption of the TTO method is that individuals are prepared to trade off a constant proportion of their remaining life years to improve their health status, irrespective of the number of years that remain. This is a very strong assumption and it seems reasonable to expect that the valuation of a health state may be influenced by a duration effect relating to the time an individual spends in that state. There may be a ‘maximal endurable time’ for some severe health states beyond which they yield negative utility. Furthermore, for short survival periods, individuals may not be willing to trade survival time (measured in life years) for an improvement in quality of life, implying that individuals’ preferences are lexicographic for short time durations. If individuals do not trade off a constant proportion of their remaining life expectancy in the valuation of health states, then values elicited using specific time durations (e.g., 10 years) cannot be assumed to hold for states lasting for different time periods.
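The constant proportional trade-off assumption can be checked directly when the same state is valued at more than one duration. The sketch below assumes the conventional scoring for states regarded as better than dead, where the value is x/t for x years in full health judged equivalent to t years in the state. The responses and the tolerance are hypothetical.

```python
from typing import Dict


def tto_value(years_full_health: float, years_in_state: float) -> float:
    """TTO value x / t for a state regarded as better than dead."""
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0.0 <= years_full_health <= years_in_state:
        raise ValueError("years_full_health must lie in [0, years_in_state]")
    return years_full_health / years_in_state


def is_constant_proportional(
    responses: Dict[float, float], tolerance: float = 0.05
) -> bool:
    """Check whether TTO values agree across durations.

    `responses` maps each duration t (years in the state) to the
    indifference point x (years in full health). The tolerance is an
    arbitrary illustrative threshold.
    """
    values = [tto_value(x, t) for t, x in responses.items()]
    return max(values) - min(values) <= tolerance


# Hypothetical respondent who trades the same proportion at each duration.
proportional = {10: 7.0, 20: 14.0}
# Hypothetical respondent who refuses to trade at 1 year and trades
# proportionally more at 20 years (a duration effect).
duration_effect = {1: 1.0, 10: 7.0, 20: 10.0}

print(is_constant_proportional(proportional))
print(is_constant_proportional(duration_effect))
```

For the second respondent, a value elicited at 10 years would overstate the value at 20 years and understate it at 1 year, which is the practical consequence described above.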
The impact of ‘time preference’ on valuations is another issue that causes theoretical concerns with the TTO. If individuals have a positive rate of time preference, they will give greater value to years of life in the near future than to those in the distant future. Alternatively, respondents may prefer to experience an episode of ill health immediately to eliminate ‘dread’ and move on. For instance, this hypothesis may explain why some women with a family history of breast cancer opt for mastectomy before any breast cancer is detected. In practice, the majority of individuals exhibit positive time preferences for health, although empirically the validity of the traditional (constant) discounting model in health has been challenged in favor of a model that allows for decreasing time aversion (implying that the longer the period of delay for the onset of ill health, the lower the discount rate). TTO values are rarely corrected for time preference.
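One way to see what a correction for time preference would involve is to replace raw years with discounted years on both sides of the trade-off. The sketch below assumes the traditional constant-rate model with continuous discounting, which is the model the text notes has been challenged. The rate and the response are hypothetical.

```python
import math


def discounted_years(years: float, rate: float) -> float:
    """Present value of a stream of `years` life years at a constant rate.

    Assumes continuous discounting: (1 - exp(-r * T)) / r, which reduces
    to T when the rate is zero.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    if rate == 0:
        return years
    return (1.0 - math.exp(-rate * years)) / rate


def tto_value_with_discounting(
    years_full_health: float, years_in_state: float, rate: float
) -> float:
    """TTO value when both sides of the trade-off are discounted."""
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0.0 <= years_full_health <= years_in_state:
        raise ValueError("years_full_health must lie in [0, years_in_state]")
    return discounted_years(years_full_health, rate) / discounted_years(
        years_in_state, rate
    )


# Hypothetical response: 7 years in full health equals 10 years in the state.
unadjusted = tto_value_with_discounting(7.0, 10.0, rate=0.0)
adjusted = tto_value_with_discounting(7.0, 10.0, rate=0.035)  # hypothetical rate
print(f"unadjusted: {unadjusted:.3f}, adjusted: {adjusted:.3f}")
```

Under this assumed model a positive rate raises the value relative to the simple ratio, because the years given up lie furthest in the future and so carry the least weight. A model with a declining discount rate would need a different discounting function.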
Which Valuation Technique Should Be Used?
Health economists have tended to favor the choice-based scaling methods of SG and TTO in the context of cost per QALY analysis and a choice-based method is also recommended by NICE. Each of the SG and TTO methods starts with the premise that health is an important argument in an individual’s utility function. The welfare change associated with a change in health status can then be determined by the compensating change required in one of the remaining arguments in the individual’s utility function that leaves overall utility unchanged. In the SG, the compensating change is valued in terms of the risk of immediate death. In the TTO, the compensating change is valued in terms of the amount of life expectancy an individual is prepared to sacrifice.
SG has the most rigorous theoretical foundation, in the form of the EUT account of decision making under uncertainty. However, there are theoretical arguments against the use of SG in health state valuation and there is little empirical support for EUT. There are also concerns about the empirical basis of the TTO technique, and about the impact that duration effects and time preference effects can have on the elicitation of TTO values.
In summary, there are theoretical concerns with all three valuation techniques. We argue that unadjusted VAS values do not provide a valid basis for estimating preferences over health states and satisfactory adjustments remain elusive. For trade-off-based valuations from an individual perspective, the current choice is between SG and TTO, but for the reasons outlined above the values they generate are distorted by factors apart from preferences over health states, and currently there is no compelling basis on which to select one or the other. This is one reason why researchers in the field have begun to examine the potential role of ordinal techniques such as ranking and discrete choice experiments (DCEs) in health state valuation.
The Use Of Ordinal Techniques
Ordinal methods simply ask respondents to say whether they prefer one state to another but not by how much. Two well-known ordinal methods are ranking and pairwise comparisons. Typically, with ranking, respondents are asked to rank a set of health states from best to worst. The pairwise comparison limits the comparison to two states and has been used widely in DCEs in health economics, though not usually to value health per se. To use them to value health states it is necessary to include full health and death in the comparisons or to introduce valuations for these states from other sources (e.g., Ratcliffe et al., 2006).
Until recently the use of ordinal data in health state valuation has largely been ignored. Ranking exercises have traditionally been included in health state valuation studies as a warm-up procedure prior to the main cardinal method to familiarize the respondent with the set of health states to be valued and with the task of preference elicitation between health states. Often these data may not be used at all in data analysis, or they may be used to check consistency between the ordinal ranking of health states and the ranking of health states according to their actual values obtained using a standard elicitation technique (e.g., TTO or SG). Thurstone’s law of comparative judgment offers a potential theoretical basis for deriving cardinal values from rank preference data. Thurstone’s method considers the proportion of times that one health state (A) is considered worse than another health state (B). The preferences over the health states represent a latent cardinal utility function and the likelihood of health state A being ranked above health state B when health state B is actually preferred to health state A is a function of how close to each other the states lie on this latent utility function.
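The sketch below illustrates how cardinal values might be recovered from such proportions. It assumes the simplest variant of Thurstone's model (often called Case V: normally distributed, equal-variance, uncorrelated errors), under which the distance between two states on the latent scale is the standard normal quantile of the proportion preferring one to the other. The proportions are hypothetical, and the clipping of extreme proportions is a pragmatic assumption.

```python
from statistics import NormalDist
from typing import Dict, Sequence, Tuple


def thurstone_case_v(
    states: Sequence[str], prefer: Dict[Tuple[str, str], float]
) -> Dict[str, float]:
    """Latent scale values from pairwise preference proportions.

    `prefer[(i, j)]` is the proportion of respondents preferring state i
    to state j; each unordered pair should appear once. Each state's value
    is the mean of its z-scores against all states (zero against itself).
    """
    std_normal = NormalDist()
    totals = {state: 0.0 for state in states}
    for (i, j), proportion in prefer.items():
        # Clip to avoid infinite z-scores when a preference is unanimous.
        clipped = min(max(proportion, 0.01), 0.99)
        z = std_normal.inv_cdf(clipped)
        totals[i] += z
        totals[j] -= z
    return {state: totals[state] / len(states) for state in states}


# Hypothetical proportions for three health states.
states = ["A", "B", "C"]
prefer = {("A", "B"): 0.80, ("A", "C"): 0.95, ("B", "C"): 0.70}

values = thurstone_case_v(states, prefer)
for state in states:
    print(state, round(values[state], 3))
```

The resulting values have an arbitrary origin and unit, so they would still need anchoring (for example to full health and death) before they could be used as health state utilities.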
Salomon (2003) used conditional logistic regression to model rank data from the UK measurement and valuation of health (MVH) valuation of the EQ-5D. He was able to estimate a model equivalent to the original TTO model by rescaling the worst state using the observed TTO value. Other methods of rescaling were also considered, including normalization to produce a utility of 0 for death, but these were found not to provide the best-fitting predictions.
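A sketch of the two steps involved, modelling ranks and then rescaling, is given below. It assumes the rank-ordered ("exploded") form of the conditional logit, in which a ranking is treated as a sequence of best-choices from the states that remain. This is an assumption about the specification and is not taken from Salomon (2003). The latent values and the anchor value for the worst state are hypothetical, not the MVH estimates.

```python
import math
from itertools import permutations
from typing import Dict, Sequence


def ranking_probability(latent: Dict[str, float], ranking: Sequence[str]) -> float:
    """Probability of a full ranking (best first) under a rank-ordered logit."""
    probability = 1.0
    for k in range(len(ranking) - 1):
        remaining = ranking[k:]
        denominator = sum(math.exp(latent[s]) for s in remaining)
        probability *= math.exp(latent[ranking[k]]) / denominator
    return probability


def rescale_to_anchor(
    latent: Dict[str, float], full: str, worst: str, worst_value: float
) -> Dict[str, float]:
    """Linearly rescale so that `full` maps to 1 and `worst` to `worst_value`."""
    span = latent[full] - latent[worst]
    if span <= 0:
        raise ValueError("full health must have a higher latent value than worst")
    return {
        state: 1.0 - (1.0 - worst_value) * (latent[full] - value) / span
        for state, value in latent.items()
    }


# Hypothetical latent values, as might come from a fitted model.
latent = {"full": 2.0, "mild": 1.2, "moderate": 0.3, "worst": -1.0}

p_expected_order = ranking_probability(latent, ["full", "mild", "moderate", "worst"])
total = sum(ranking_probability(latent, order) for order in permutations(latent))
print(f"P(expected order) = {p_expected_order:.3f}; sum over all orders = {total:.3f}")

# Hypothetical externally observed value for the worst state.
utilities = rescale_to_anchor(latent, "full", "worst", worst_value=-0.5)
print({state: round(u, 3) for state, u in utilities.items()})
```

Because the rescaling is linear, it preserves the spacing implied by the rank model and only fixes the origin and unit. Normalizing on death instead would require death to be included among the ranked states.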
DCEs have their theoretical basis in random utility theory. Although DCEs have become a very popular tool for eliciting preferences in health care, the vast majority of published studies using DCE methodology have tended to focus on the possibility that individuals derive benefit from nonhealth outcomes and process attributes in addition to health outcomes. A limited number of studies have used DCEs to estimate values for different health state profiles, and few have linked these values to the full health–dead scale required for the calculation of QALYs. Ratcliffe et al. (2006) used an external valuation, obtained by TTO, of the worst state of health defined by the classification (i.e., the PITS state) to recalibrate the results of a DCE onto the conventional 0 to 1 scale. Finally, Brazier et al. (2007) used DCE data on their own by setting death to 0 and including death in some of the comparisons. The use of DCE data, and to a lesser extent ranking data, for this purpose is at an early stage of development; however, it offers promise as an alternative to cardinal methods.
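As an illustration of the random utility idea, the sketch below assumes that the utility of each state is a systematic component plus an error term whose distribution yields the familiar binary logit for a pairwise choice. It then anchors the latent values so that death is 0 and full health is 1, in the spirit of including death in the comparisons. The logit assumption, the state names, and all numbers are hypothetical and are not drawn from the studies cited above.

```python
import math
from typing import Dict


def choice_probability(value_a: float, value_b: float) -> float:
    """Probability that state A is chosen over state B under a binary logit."""
    return 1.0 / (1.0 + math.exp(-(value_a - value_b)))


def anchor_on_death(
    latent: Dict[str, float], full: str = "full_health", dead: str = "dead"
) -> Dict[str, float]:
    """Rescale latent values so that `dead` maps to 0 and `full` maps to 1."""
    span = latent[full] - latent[dead]
    if span <= 0:
        raise ValueError("full health must have a higher latent value than dead")
    return {state: (value - latent[dead]) / span for state, value in latent.items()}


# Hypothetical latent values, as might come from a DCE that includes death.
latent = {"full_health": 3.0, "moderate": 1.8, "dead": 0.5, "severe": 0.1}

print(f"P(moderate chosen over severe) = {choice_probability(1.8, 0.1):.3f}")

anchored = anchor_on_death(latent)
for state, value in anchored.items():
    print(state, round(value, 3))
```

A state with a latent value below that of death receives a negative anchored value, corresponding to a state regarded as worse than dead.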