Assessing the Effectiveness of Anti-smoking Media Campaigns by Recall and Rating Scores: A Pattern-Mixture GEE Model Approach

Abstract: Anti-smoking media campaigns are an effective tobacco control strategy. Identifying which types of advertising messages are effective is important for making the best use of the limited funding available for such campaigns. In this paper, we propose a statistical modeling approach for systematically assessing the effectiveness of anti-smoking media campaigns based on ad recall rates and rating scores. This research is motivated by the need to evaluate youth responses to the Massachusetts Tobacco Control Program (MTCP) media campaign. Pattern-mixture GEE models are proposed to evaluate the impact of viewer and ad characteristics on ad recall rates and rating scores while controlling for missing values, confounding and correlations in the data. A key difficulty for pattern-mixture modeling is that there are too many distinct missing data patterns, which causes convergence problems for model fitting based on limited data. A heuristic argument based on collapsing missing data patterns is used to test the missing completely at random (MCAR) assumption in pattern-mixture GEE models. The proposed modeling approach and the recall-rating study design provide a complete system for identifying the most effective types of advertising messages.


Anti-smoking media campaigns
Anti-smoking mass media campaigns have been shown to significantly reduce the progression to regular smoking among both adults and adolescents; see Flynn et al. (1992, 1994, 1995), Hu et al. (1995), Popham et al. (1995), Siegel et al. (1998, 2000) and Worden et al. (1996). However, the development and evaluation of anti-smoking media campaigns are difficult for several reasons, such as insufficient public or private funding, the existence of multiple intervention programs and the lack of an evaluation methodology for design and analysis (Worden et al., 1999). Many previous studies emphasize the behavior-changing effect of media campaigns on smoking initiation or cessation. Another type of evaluation study focuses on advertising effectiveness by assessing the relationship between ad characteristics and responses from the target audience. Studies of this type address research questions such as "What types of ads are more likely to be remembered and are perceived to be more effective?" This information is valuable for anti-smoking advertising agencies that operate under limited budgets.
There are relatively few studies of the second type in the literature. Goldman and Glantz (1998) studied 118 anti-tobacco advertisements and grouped them into thematic categories. They found that messages on secondhand smoke and industry manipulation were effective in reaching all audiences, compared with messages on youth access to tobacco, short-term effects (such as yellow teeth), long-term health effects and romantic rejection. Pechmann and Goldberg (1998) used a message-based typology of a large number of anti-smoking ads for youth to classify them into three types: fear appeals, peer norms and tobacco marketing manipulation. A comparison of the reactions to a set of ads of a particular type and to a set of placebo (non-smoking-related) ads showed that three subtypes ("smoking endangers the family unit", "young smokers have taken the wrong path" and "most kids don't smoke") resulted in lower intentions to smoke. Biener, Keeler and Nyman (2000), Biener (2000) and Biener, Ming, Gilpin, and Alber (2004) presented data from the Massachusetts Tobacco Control Program (MTCP) on adult and youth responses to eight different TV ads, which were preclassified into three categories: sad, normative and funny. Their analysis concluded that the sad ads, which evoke strong negative emotions, had the highest recall rates and rating scores. This paper was motivated by the MTCP data analysis and develops a general statistical modeling approach for assessing anti-smoking media campaigns using recall and rating scores.

The MTCP program
The evaluation of the Massachusetts Tobacco Control Program (MTCP) began in 1993 with a monthly random-digit-dialed telephone survey containing questions on exposure to the anti-smoking mass media campaign and the recall of its tag line ("It's time we made smoking history"). In 1997, the MTCP conducted a longitudinal follow-up survey of the teens interviewed at the 1993 baseline survey to evaluate reactions to the mass media campaign. Altogether, 618 teens were reinterviewed in 1997. Below we describe the study design and the measures used. The statistical methods are deferred to the METHODS section.
Respondents to the survey were asked if they could recall eight anti-smoking television ads and, for each ad recalled, to rate it on a scale from 0 (the ad was not effective at all) to 10 (the ad was very effective). This design was based on respondents' affective response to advertising, which has been studied in marketing and consumer behavior research. The aim of the MTCP study was to assess two important dimensions of the impact of anti-smoking media campaigns: penetration and audience receptivity. These two dimensions reflect how successful the ads were in reaching the target audience and how effective the ads were perceived to be. Quantitatively, these dimensions were measured by the recall rate and the rating score for each ad queried in the survey. These two measures may be affected by the characteristics of both the ads and the viewers. Viewer characteristics included demographic variables, TV viewing habits, household education level, ownership of tobacco promotional items and other factors. There were eight ads in the survey: Camel, Model, Janet Sackman, Happy Birthday, Pam, Cowboy, Lung and Monica. The characteristics of the eight ads were first screened and rated by an independent panel of judges (104 young people), and the ads were then categorized into three groups: (1) sad/frightening ads, (2) normative ads and (3) funny ads. Table 1 contains the description and classification of the 8 ads surveyed in the MTCP. For more background information about this study, please refer to Biener (2000), which also contains an analysis of the youth responses to the eight TV ads. The main finding in Biener (2000) was that sad ads, which provoke strong negative emotions, were remembered the most and rated the highest in effectiveness.

Organization of the paper
In this paper, we present a statistical methodology for assessing the effectiveness of anti-smoking media campaigns based on the MTCP study. The structure of this paper is as follows. In the METHODS section, we discuss three intrinsic statistical issues related to the MTCP data and the rationale for using GEE pattern-mixture models for the analysis. In the DATA ANALYSIS section, we discuss the implementation of our method and present the analysis results and their interpretation. In the Conclusion and Discussion section, we discuss both the strengths and the limitations of our analysis and point out some possible directions for future research.

Statistical issues in data analysis
In this section, we discuss several challenging issues intrinsic to the analysis of the MTCP data. First, there was a significant amount of missing values in the rating scores, since few respondents had seen or remembered seeing all eight ads. Second, the rating scores are cognitive measurements from the same person and are therefore likely to be correlated. Third, both viewer and ad characteristics may confound ad recall and rating. Below, we discuss these three issues in detail and outline ways to handle them.
Missing Data
Let $y_{ij}$ be the rating score given by the $i$-th subject to the $j$-th ad. The raw rating scores from all 618 teens on the 8 ads in the study then form a $618 \times 8$ data matrix $Y = (y_{ij})$, where $i = 1, \ldots, 618$ and $j = 1, \ldots, 8$. In the data matrix $Y$, 3051 scores were observed out of $618 \times 8 = 4944$ possible scores. The percentage of missing values in the rating scores is therefore high, at about 38%. This is because few respondents had seen or remembered all eight ads.
Let $m_{ij} = 0$ if $y_{ij}$ is missing and $m_{ij} = 1$ otherwise. Then, corresponding to the data matrix $Y$, we have a $618 \times 8$ matrix $M = (m_{ij})$ whose entries are either 0 or 1. $M$ is usually called the missing data matrix in the statistical literature.
The rows of $Y$ and $M$ contain the rating scores and the recall indicators, respectively. It is easy to compute the mean rating score or the mean recall rate for each ad by averaging the columns of $Y$ (over the observed scores) and $M$. However, it is well known that this approach may lead to biased estimates when the missing data are not missing completely at random (MCAR); see Little and Rubin (1987).
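As a small illustration of this notation, the sketch below builds the indicator matrix $M$ from a toy rating matrix and computes the raw per-ad recall rates and the mean ratings over observed scores. The toy values, the use of `None` for a missing score and the function names are assumptions made for illustration; they are not MTCP data. The last line only re-derives the missing fraction from the counts reported above.

```python
from typing import List, Optional

Rating = Optional[float]  # None marks a missing rating (ad not recalled)


def missing_indicator(Y: List[List[Rating]]) -> List[List[int]]:
    """Return M with m_ij = 1 if y_ij is observed and 0 if it is missing."""
    return [[0 if y is None else 1 for y in row] for row in Y]


def recall_rates(M: List[List[int]]) -> List[float]:
    """Column means of M: the raw recall rate of each ad."""
    n = len(M)
    return [sum(row[j] for row in M) / n for j in range(len(M[0]))]


def observed_mean_ratings(Y: List[List[Rating]]) -> List[Optional[float]]:
    """Column means of Y over observed scores only.

    These raw means may be biased unless the data are MCAR.
    """
    means: List[Optional[float]] = []
    for j in range(len(Y[0])):
        observed = [row[j] for row in Y if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return means


# Toy example with 4 respondents and 3 ads (hypothetical values).
Y_toy = [
    [8.0, None, 5.0],
    [None, None, 6.0],
    [9.0, 4.0, None],
    [7.0, None, 7.0],
]
M_toy = missing_indicator(Y_toy)
rates = recall_rates(M_toy)
means = observed_mean_ratings(Y_toy)

# Missing fraction implied by the text: 3051 observed out of 618 * 8 scores.
missing_fraction = 1 - 3051 / (618 * 8)
```

These raw column summaries are the unadjusted quantities that the regression models in the following sections are meant to improve on.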

Confounding Factors
The relationship between the rating score and potentially confounding factors is fairly complex. For example, the intensity of advertising, or the frequency with which each ad was aired, may affect recall: the higher the intensity, the higher the recall rate might be. Smoking status may affect ratings because established smokers may have a negative attitude toward anti-smoking ads and rate them lower. In the MTCP study, a variety of potentially confounding factors were measured on demographics, TV viewing habits and ad characteristics. In order to obtain accurate estimates of the rating scores and recall rates, adjustment for these potential confounding factors is necessary.

Correlated Data
The third issue is the correlation among the outcome measures. The rating scores $y_{ij}$ and the recall indicators $m_{ij}$, for fixed $i$, are repeated measures from the same person and are therefore correlated. For example, a person's perceptions of the various sad ads may be similar to one another but very different from his or her perceptions of normative ads. As a result, this person might rate all sad ads higher than all normative ads. Although this correlation exists, it is not our primary concern. Our goal is to obtain an accurate estimate of the relationship of ad and viewer characteristics to the recall and rating scores. By taking into account the correlations among the outcomes, we can obtain more efficient estimates for recall and rating scores.

Statistical modeling approach
Pattern-Mixture Models with Covariates
Let $x$ be a vector of covariates that summarize the viewer and ad characteristics. Let $y$ and $m$ denote the generic random vectors that generate the rows of $Y$ and $M$, respectively. Then the goal of controlling for confounding and missing data can be achieved by estimating the regression of $(y, m)$ on $x$. The conditional distribution of $y$ and $m$ given $x$ can be decomposed as follows:

$$f(y, m \mid x) = f(y \mid m, x)\, f(m \mid x). \tag{2.1}$$

This decomposition is named the pattern-mixture model with covariates by Little and Wang (1996). It was proposed as an alternative to the selection model with covariates for the missing data problem. The selection model approach has the following decomposition:

$$f(y, m \mid x) = f(m \mid y, x)\, f(y \mid x).$$

Note that in (2.1), $f(y \mid m, x)$ is specified separately for each distinct missing data pattern, so that the distribution of $y$ given $x$ is a mixture over the patterns. The idea of the pattern-mixture modeling approach to missing data is to stratify the study sample based on missing data patterns, without directly modeling the missing data mechanism as in the selection model approach. More discussion of the pattern-mixture model approach to missing data problems can be found in Little and Wang (1996), Little (1993) and Little (1994).
The pattern-mixture model with covariates decomposition (2.1) is a complete solution for our purpose of modeling the recall rates and the rating scores as functions of viewer and ad characteristics. Note that $f(m \mid x)$ describes how the viewer and ad characteristics affect recall. It is exactly the model for the recall of the anti-smoking advertising. It can answer questions such as: "What types of ads are more likely to be remembered?" On the other hand, modeling $f(y \mid x)$ addresses the perceived effectiveness of the ads. It can answer questions such as: "What types of ads are rated higher (i.e., perceived to be more effective)?" Although we have $f(y \mid m, x)$ in (2.1), we will show that our data support the hypothesis $H_0: f(y \mid m, x) = f(y \mid x)$ using a test developed by Park and Lee (1997) for testing MCAR in repeated measures data. The meaning of $H_0$ is that the regression of $y$ on $x$ is homogeneous across different missing data patterns. It is equivalent to $y \perp m \mid x$, where $\perp$ means "independent of", which is MCAR in the traditional Little and Rubin (1987) sense.
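To make the role of $H_0$ concrete, the short derivation below uses only the law of total probability and the factorization of the joint density into $f(y \mid m, x)\, f(m \mid x)$. It is a standard identity rather than a result specific to the MTCP data.

```latex
\begin{align*}
f(y \mid x) &= \sum_{m} f(y \mid m, x)\, f(m \mid x)
  && \text{(mixture over missing data patterns)} \\
f(y, m \mid x) &= f(y \mid m, x)\, f(m \mid x)
  = f(y \mid x)\, f(m \mid x)
  && \text{(under } H_0\text{)}
\end{align*}
```

Under $H_0$ the joint model therefore separates into one model for the recall indicators $m$ and one for the rating scores $y$, which is why the two outcomes can be analyzed with two separate regressions.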

Generalized Estimating Equations
Since the recalls and rating scores on different ads from the same person are correlated, statistical models for repeated measures that take this correlation into account will provide more efficient estimates than fitting one model for each ad. There is an extensive statistical literature on modeling repeated measures data, as described in Diggle, Zeger and Liang (1994). In particular, generalized estimating equations (GEE) are a class of repeated measures models. Several fundamental papers on GEE are Liang and Zeger (1987), Zeger and Liang (1986), Zeger and Liang (1988) and Prentice (1988). An excellent review of GEE theory can be found in Ziegler, Kastner, Gromping, and Blettner (1996). GEE is implemented in many statistical packages, for example in PROC GENMOD in SAS. A review of GEE software is given in Horton and Lipsitz (1999).
To be more specific, denote the repeated measures by $y_{ij}$, the $j$-th measurement for the $i$-th subject. Let $y_i = (y_{i1}, \ldots, y_{it_i})^T$ be the $t_i \times 1$ ($t_i \le t$) vector of responses and $x_i = (x_{i1}^T, \ldots, x_{it_i}^T)^T$ the $t_i \times p$ matrix of covariate values, where $x_{ij}$ is a $1 \times p$ vector of covariates for the $j$-th measurement on the $i$-th subject. Let us assume the marginal distribution of $y_{ij}$ is in the exponential family

$$f(y_{ij}) = \exp\left\{ \frac{y_{ij}\theta_{ij} - b(\theta_{ij})}{a(\phi)} + c(y_{ij}, \phi) \right\},$$

where $a$, $b$ and $c$ are specific functions, with canonical parameters $\theta_{ij}$ and scale parameter $\phi$ (Zeger and Liang, 1988). The mean and variance are assumed to be

$$E(y_{ij}) = \mu_{ij} = b'(\theta_{ij}), \qquad \mathrm{Var}(y_{ij}) = b''(\theta_{ij})\, a(\phi),$$

respectively. The relationship between the response and the covariates is described by the following regression model:

$$g(\mu_{ij}) = \eta_{ij}, \tag{2.4}$$

where $\eta_{ij} = x_{ij}\beta$. Here $\beta = (\beta_1, \ldots, \beta_p)^T$ is the vector of unknown regression parameters to be estimated and $g$ is known as the link function. Liang and Zeger showed that the GEE estimate of $\beta$ is consistent and asymptotically normal if the mean structure (2.4) is correctly specified, even if the covariance structure $\mathrm{Var}(y_i)$ is misspecified (Liang and Zeger, 1987; Zeger and Liang, 1988). This property makes GEE very suitable for our analysis, because our goal is to estimate the mean regression of the rating scores on the confounding variables, while accurate information about the correlation structure is not a priority. Note that for GEE models, any arrangement of the second index $j$ represents a fixed order of the repeated measures. In our data, the rating scores are arranged according to their natural order in the survey. Ideally, the order should be randomized, because recall and rating could be affected by the order of the ads. The effect of the ordering in the current MTCP data cannot be evaluated because the ordering was not randomized. This is a limitation of this study, and further discussion of this issue is in the Conclusion and Discussion section.
Testing MCAR
As we pointed out in previous sections, a prerequisite for applying the GEE pattern-mixture model to the rating scores is to show that the missing values are MCAR, as formalized in $H_0: f(y \mid m, x) = f(y \mid x)$. In this subsection, we describe a method from Park and Lee (1997) to test the hypothesis $H_0$. Their idea is to derive an extended GEE that includes an additional term corresponding to the missing data patterns and then to test whether the regression parameters for this term are significantly different from 0. The following description of their approach is adapted from Park and Lee (1997). In the presence of missing data, let us assume that there are $K$ distinct missing data patterns. Let $S_k$ denote the set of observations with missing data pattern $k$ ($k = 1, 2, \ldots, K$). Define indicator variables $I_{ik}$ so that $I_{ik} = 1$ if the $i$-th observation is in $S_k$ and $I_{ik} = 0$ otherwise. Using these definitions, Park and Lee (1997) showed that the MCAR hypothesis $H_0: f(y \mid m, x) = f(y \mid x)$ can be tested by fitting the following extended GEE model:

$$g(\mu_i) = \eta_i = x_i\beta + 1_i I_i \alpha, \tag{2.7}$$

where $\mu_i = E(y_i)$, $g$ is applied elementwise, $1_i$ is a $t_i \times 1$ vector of 1's, $I_i = (I_{i1}, \ldots, I_{iK})$ and $\alpha = (\alpha_1, \ldots, \alpha_K)^T$.
The hypothesis $H_0$ is equivalent to

$$H_0^*: \alpha_1 = \cdots = \alpha_K = 0.$$

If $H_0^*$ does not hold, then the $\eta_{ij}$'s, which determine the overall mean values of the $y_{ij}$'s, differ across the $S_k$'s. This means that the missing data pattern affects the regression of the response $y$ on the covariates $x$.
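As a worked instance, suppose subject $i$ belongs to pattern $S_k$, so that $I_{ik} = 1$ and all other indicators are zero. Assuming the extended linear predictor adds the term $1_i I_i \alpha$ to $x_i\beta$, as in the description of $1_i$, $I_i$ and $\alpha$ above, every measurement of that subject receives the same pattern-specific shift:

```latex
\begin{align*}
I_i \alpha &= \sum_{l=1}^{K} I_{il}\, \alpha_l = \alpha_k, \\
\eta_{ij} &= x_{ij}\beta + \alpha_k, \qquad j = 1, \ldots, t_i .
\end{align*}
```

Testing whether the pattern term matters thus amounts to asking whether these shifts are all zero.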
Let $(\hat\alpha, \hat\beta)$ denote the GEE estimates for (2.7) and $V_{\alpha,\beta}$ the asymptotic variance-covariance matrix. Let $V_\alpha$ be the submatrix of $V_{\alpha,\beta}$ corresponding to $\alpha$. The test statistic is

$$T_w = n\, \hat\alpha^T \hat V_\alpha^{-1} \hat\alpha,$$

where $\hat V_\alpha$ is the "robust" estimator of the variance-covariance matrix associated with $\hat\alpha$.
$T_w$ has an approximate chi-square distribution with $K$ degrees of freedom under $H_0^*$. In fact, $T_w$ is exactly the generalized Wald test statistic in Rotnitzky and Jewell (1990). The generalized Wald test statistic is available in SAS PROC GENMOD by specifying the options TYPE3 and WALD in the MODEL statement. Another related test statistic in GEE is the "score" test statistic, which is the default of the TYPE3 option in PROC GENMOD. Both can be used to test the significance of an effect in a GEE model. Therefore, the Park and Lee test can be implemented easily using SAS PROC GENMOD.
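The Wald-type computation can be sketched in a few lines. The code below is a minimal illustration, not the SAS implementation: it assumes `cov_alpha` is the estimated covariance matrix of the estimate of $\alpha$ itself (so any factor $n$ is already absorbed), the numerical values are hypothetical, and the chi-square tail probability uses a closed form that is valid only for even degrees of freedom.

```python
import math
from typing import List


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def wald_statistic(alpha_hat: List[float], cov_alpha: List[List[float]]) -> float:
    """Quadratic form alpha_hat' cov_alpha^{-1} alpha_hat."""
    z = solve_linear(cov_alpha, alpha_hat)
    return sum(a * zi for a, zi in zip(alpha_hat, z))


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Hypothetical estimates for K = 2 collapsed patterns (not MTCP results).
alpha_hat = [0.3, -0.2]
cov_alpha = [[0.04, 0.01], [0.01, 0.09]]
T_w = wald_statistic(alpha_hat, cov_alpha)
p_value = chi2_sf_even_df(T_w, df=2)
```

A large p-value in such a computation would be read as no evidence against the hypothesis that all pattern coefficients are zero; for odd degrees of freedom a general chi-square routine from a statistics library would be needed instead.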
A potential problem with this approach occurs when the number of distinct missing data patterns $K$ is large. This is because pattern-mixture modeling essentially requires stratification of the sample according to distinct missing data patterns. When $K$ increases, the number of strata increases and the number of unknown parameters in the GEE model (2.7) also increases rapidly. Unless the sample size is sufficiently large, some strata will have sparse data, which causes convergence problems in fitting the GEE model. This is a major difficulty in applying pattern-mixture models to missing data problems. To our knowledge, there is currently no sound statistical methodology to handle this problem. In this paper, we propose a heuristic approach based on collapsing the distinct missing data patterns in order to apply Park and Lee's method. Our idea is to collapse the missing data patterns into a smaller number of patterns, run the Park and Lee test, and then repeat this procedure many times with different ways of collapsing the missing data patterns. If MCAR is consistently rejected or consistently accepted across the different collapsing scenarios, then it is plausible to reject or accept MCAR accordingly. More details related to this issue are in the DATA ANALYSIS section.
To summarize the discussion of this section, we will fit two GEE models, for $f(m \mid x)$ and $f(y \mid x)$, to assess the recall and rating of the ads in our study, after the Park and Lee test confirms that the hypothesis $H_0: f(y \mid m, x) = f(y \mid x)$ is plausible.

Implementation
We implemented the GEE analysis proposed in the previous section using PROC GENMOD in SAS 8.1. PROC GENMOD contains several useful functionalities that suit our purpose. For example, besides the generalized TYPE 3 Wald statistic, the CONTRAST and LSMEANS statements in PROC GENMOD allow us to compare different levels of categorical variables, especially for interaction terms, as shown in our subsequent analysis.

Dependent and Independent Variables
The dependent variables are the recall indicators and the rating scores, respectively. The independent variables used are listed in Table 2. They describe the characteristics of the viewers and the ads.

Collapsing Missing Data Patterns
As pointed out previously, a large number of missing data patterns can cause convergence problems for pattern-mixture modeling. Since there are eight ads, there are $2^8 - 1 = 255$ possible missing data patterns. For example, let "M" represent a missing observation and "O" an observed one; then OOOOOOOO represents the case in which all eight ads were recalled and rated, and OOOOOOOM represents the case in which the first seven ads were rated but the rating for the last ad was missing. The 255 possible missing data patterns would create many strata with sparse data because there were only 618 study participants. As mentioned before, we will use different schemes to collapse the missing data patterns.
For example, we can use the number of ads recalled to represent the missing data patterns; see the first case in Table 3. There are nine such patterns, but the pattern in which all of the rating scores are missing can be excluded, since there is no response and the individual's contribution to the estimating equation cannot be constructed. Thus, the missing data pattern $S_k$ is defined as the subset of respondents who have $k$ observed rating scores, where $k = 1, \ldots, 8$.
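The count-based collapsing can be sketched as follows. This is an illustrative enumeration only: the "O"/"M" string encoding follows the text, while the function names are assumptions. It lists every O/M string of length eight (256 strings, 255 once the all-missing string is set aside) and shows how many distinct patterns fall into each collapsed stratum.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

N_ADS = 8


def all_patterns(n_ads: int = N_ADS) -> List[str]:
    """Every string of 'O' (observed) and 'M' (missing) of length n_ads."""
    return ["".join(p) for p in product("OM", repeat=n_ads)]


def collapse_by_count(pattern: str) -> int:
    """Collapsed stratum label k = number of observed ratings in the pattern."""
    return pattern.count("O")


patterns = all_patterns()
# Drop the all-missing pattern: it contributes no rating score to the model.
usable = [p for p in patterns if collapse_by_count(p) > 0]
# Number of distinct fine-grained patterns merged into each stratum k = 1..8.
strata: Dict[int, int] = dict(Counter(collapse_by_count(p) for p in usable))
```

For instance, the stratum with six observed ratings merges 28 distinct fine-grained patterns, which is why patterns within a collapsed stratum can still differ from one another.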
Table 3 (excerpt): $S_1$ = {missing one of the first four rating scores}, $S_2$ = {missing one of the last four rating scores}, $S_3$ = {others}; p-value 0.56.
Collapsing missing data patterns has another problem. For the count-based $S_k$'s specified above, for example, $S_6$ may include the missing data pattern MMOOOOOO or OOOOOOMM, among others. So after the collapsing, there are still different missing data patterns within each $S_k$. From GEE theory, even with missing values in the response variable, GEE may still lead to consistent estimates of the regression parameters, but only when the missing values are MCAR (Liang and Zeger, 1987). Therefore, the formulation of the $S_k$'s based on the number of missing values requires the assumption that, within each $S_k$, the missing data are MCAR. Under this assumption, the GEE model will lead to consistent estimates of $(\alpha_k, \beta)$ in each $S_k$ for testing $H_0^*$. This is a strong assumption that we have to make for our current analysis.
A remedy for this strong assumption on collapsing missing data patterns is to exploit the flexibility in formulating the $S_k$'s in Park and Lee's approach. For example, we can define $S_k$ as the group of all subjects who are "drop-outs" from the $k$-th position, where $k = 0, 1, \ldots, 8$. Then we will have ten strata, namely $S_0, S_1, \ldots, S_8$ and $S_9$, where $S_9$ consists of all the observations that do not have a "monotone" drop-out pattern. If we experiment with different ways of collapsing the missing data patterns, that is, with different ways of formulating the $S_k$'s for smaller $K$ ($< 256$), and $H_0^*$ always holds, then it is more defensible that $H_0^*$ holds for $K = 256$. We have listed several different ways of collapsing the missing data patterns in Table 3. The p-values from the GEE score test for $H_0^*$ will be used to test the significance of the missing data pattern. We will conclude that it is reasonable to assume MCAR for the data if the missing data patterns are not significant no matter how we collapse them. This argument is a heuristic solution to the difficult problem of having too many distinct missing data patterns in pattern-mixture modeling. A more rigorous methodology for treating this problem still needs to be developed.
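The idea of trying several collapsing schemes can be sketched as a set of labeling functions applied to the same recall patterns. The reading of "drop-out from the $k$-th position" used here (the first $k$ ratings observed and all later ones missing) is an assumption, as are the function names and the toy patterns. Each resulting label vector would play the role of the pattern factor in the extended GEE model; the model fitting itself is not shown.

```python
from typing import Callable, Dict, List


def count_stratum(pattern: str) -> int:
    """Scheme 1: stratum = number of observed ('O') ratings."""
    return pattern.count("O")


def dropout_stratum(pattern: str) -> int:
    """Scheme 2: k if the first k ratings are observed and the rest missing.

    Non-monotone patterns are sent to one extra stratum, len(pattern) + 1,
    which is 9 for eight ads.
    """
    k = len(pattern) - len(pattern.lstrip("O"))  # number of leading 'O'
    if "O" in pattern[k:]:
        return len(pattern) + 1
    return k


SCHEMES: Dict[str, Callable[[str], int]] = {
    "count": count_stratum,
    "dropout": dropout_stratum,
}


def label_subjects(patterns: List[str], scheme: Callable[[str], int]) -> List[int]:
    """Stratum label of every subject under one collapsing scheme."""
    return [scheme(p) for p in patterns]


# Hypothetical recall patterns for four respondents.
toy_patterns = ["OOOOOOOO", "OOOMMMMM", "OMOOOOOO", "MMOOOOOO"]
labels = {name: label_subjects(toy_patterns, f) for name, f in SCHEMES.items()}
```

Adding a further scheme only requires one more function in `SCHEMES`, which mirrors the strategy of repeating the test under different ways of formulating the strata.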

Correlation Structure
To fit a GEE model, we must also specify a correlation structure for the repeated measurements, in our case the correlation among the rating scores. Although we may hypothesize that similar ads may be rated similarly, we do not have a precise understanding of this correlation. This discourages the use of a particular correlation structure such as AR(1). To be as general as possible, we used the unstructured correlation matrix in our GEE analysis. Fortunately, even with this general correlation structure, the GEE models converged quickly. This is also a benefit of reducing the number of missing data patterns.
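To illustrate the trade-off between these two working correlation structures, the sketch below counts the free parameters of an unstructured correlation matrix and builds an AR(1) matrix, whose entry for measurements $j$ and $k$ is $\rho^{|j-k|}$. The function names and the value of $\rho$ are illustrative assumptions.

```python
from typing import List


def n_unstructured_params(t: int) -> int:
    """Number of free correlations in an unstructured t x t matrix."""
    return t * (t - 1) // 2


def ar1_correlation(rho: float, t: int) -> List[List[float]]:
    """AR(1) working correlation: corr(y_j, y_k) = rho ** |j - k|."""
    return [[rho ** abs(j - k) for k in range(t)] for j in range(t)]


# Eight ads: unstructured needs 28 correlations, AR(1) a single rho.
n_params = n_unstructured_params(8)
R_small = ar1_correlation(0.5, 3)
```

AR(1) ties the correlation to the distance between positions in the survey order, which has no obvious meaning for ads, whereas the unstructured form leaves all 28 pairwise correlations free.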

Results and interpretation
Model Fitting
For both recall rates and rating scores, we fit several GEE models. The first GEE model was fitted using all the explanatory variables in Table 2 and two interaction terms, GENDER * ADTYPE and AGEGRP * ADTYPE. In order to form the interaction term, we dichotomized AGE into AGEGRP, defining AGEGRP to be 0 if AGE is between 12 and 13 and 1 if AGE is between 14 and 15. For the rating scores, an additional term, MISSPATN, representing the missing data pattern was included. Next, a second GEE model used the variables that were significant or marginally significant in the first model, based on the p-values from the generalized Wald statistics of the TYPE 3 analysis (p < 0.15). The final GEE model included only significant predictors. Three GEE models were fitted for the recall rates and two for the rating scores.
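The staged selection described above can be sketched as follows. This is a simplified illustration under stated assumptions: the p-values are hypothetical and are not taken from Tables 4 and 5, the 0.05 cutoff for the final stage is an assumption (the text only says "significant"), non-integer ages are assigned by half-open intervals, and in practice the p-values would be recomputed after each refit rather than reused.

```python
from typing import Dict, List


def age_group(age: float) -> int:
    """AGEGRP: 0 for ages 12-13, 1 for ages 14-15 (half-open intervals assumed)."""
    if 12 <= age < 14:
        return 0
    if 14 <= age < 16:
        return 1
    raise ValueError("age outside the 12-15 range used in the text")


def screen(p_values: Dict[str, float], threshold: float) -> List[str]:
    """Keep the terms whose Type 3 p-value is below the threshold."""
    return [name for name, p in p_values.items() if p < threshold]


# Hypothetical Type 3 p-values from a first, full model (NOT study results).
toy_p = {"GENDER": 0.01, "ADTYPE": 0.001, "SMK": 0.12, "B10": 0.60}

# Stage 2: keep significant or marginally significant terms (p < 0.15).
second_model_terms = screen(toy_p, 0.15)

# Stage 3: keep only significant terms. A real analysis would refit the
# stage-2 model first; the old p-values are reused here only for brevity.
final_model_terms = screen({k: toy_p[k] for k in second_model_terms}, 0.05)
```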

Significant Variables
The resulting estimates from the final GEE models are listed in Tables 4 and 5 for the recall rates and the rating scores, respectively. In Table 4, the p-values for the parameter estimates show that only the INTERCEPT, AGE, GENDER, GRP (advertising intensity) and ADTYPE (type of ad) remained significant in the final GEE model. Sad ads were more likely to be recalled. Females remembered more ads. The longer the time since an ad last aired, the less likely it was to be remembered. Younger adolescents recalled more ads. The hours of TV watching (B10, B15), smoking status (SMK), social influences (HHED, HHSMK, FRNDSMK) and ownership of tobacco promotional items (OWN) did not significantly affect recall rates.
For Table 5, we started with the missing data patterns defined by the number of ads recalled. The p-value associated with MISSPATN from the TYPE 3 generalized Wald statistic was 0.63. We then tried different sets of strata $S_k$, as specified in Table 3. We observed that $H_0^*$ held across the different scenarios: all the p-values were larger than 0.05 for the different ways of collapsing the missing data patterns shown in Table 3. This suggests that our data are consistent with $H_0^*$. Therefore, the hypothesis $H_0: f(y \mid x, m) = f(y \mid x)$ is a plausible assumption in the application of the pattern-mixture model to this data set. The final GEE model for the rating scores in Table 5 shows that the only significant effects on the rating scores were the INTERCEPT, RACE, GENDER, OWN, GRP and ADTYPE. Sad ads were perceived to be the most effective. Female adolescents rated the ads higher than their male counterparts. Minority teens rated ads higher than white teens. Owning tobacco promotional items was negatively associated with the rating scores.

Conclusion and Discussion
We have presented a statistical modeling approach for evaluating the effectiveness of anti-smoking media campaigns in terms of the market penetration and perceived effectiveness of the ads, as measured by ad recall and rating. The fitted models provide more efficient estimates of recall rates and rating scores, controlling for missing values, confounding variables and correlations among these outcome measures. The modeling approach offers a flexible way to identify the effects of viewer characteristics, ad type and execution elements on the effectiveness of ads.
Our approach also has several limitations. The first is a design issue: the order of the ads was not randomized, and whether or how this influences our results is unclear. The order in which the ads were asked about may have an effect on recall or rating scores. For example, repeatedly asking about several sad ads together, instead of mixing them with other types of ads, could strengthen a nonsmoker's memory of the negative emotions associated with sad ads and lead to higher recall or rating. A smoker, however, may object to being repeatedly reminded of sad anti-smoking messages and give lower ratings. The ads surveyed in the MTCP were arranged in a fixed order. Therefore, the potential bias caused by this fixed order cannot be evaluated. Randomizing the ad order should be considered in future designs. The second limitation is that we used the ADTYPE variable for all the ads in the same category. This is a simplification, which implies homogeneity of recall and rating scores across different ads within the same category. However, we can solve this problem by dropping the ADTYPE variable and using the ad index, namely $j$ ($= 1, 2, 3, \ldots, 8$), in the GEE models. This will lead to ad-specific recall rates and rating scores for each individual ad. The third limitation is the heuristic approach to the large number of missing data patterns. More rigorous methods are needed for this difficult issue in applying pattern-mixture modeling.
Despite these limitations, our modeling approach, combined with a design based on recall and rating, provides a systematic way of assessing the effectiveness of advertising messages in general. For example, the collection of recall and rating data can be conducted in a more controlled laboratory setting. We can show different types of advertising messages to study subjects and then ask them to recall and rate each message after a short time (such as the next day). The rating scores from the same person on multiple ads can be analyzed using the GEE modeling approach, and the amount of missing data will be much reduced because of the short follow-up time. Therefore, we can evaluate the effectiveness of different advertising messages in a laboratory environment before putting them in the mass media.

Table 1: The eight ads in the Massachusetts Tobacco Control Program
Sad 6 Cowboy Ad Shows a man who is the brother of the actor who played the Marlboro Man in cigarette ads. He talks about how tobacco companies use ads to make smokers look strong and independent. Then he talks about how his brother got very sick and died from smoking. Sad 7 Lung Ad Shows a teenage boy having dinner at his girlfriend's home for the first time. The boy is a smoker. He is nervous and begins coughing. At the end of the commercial, he coughs up one of his lungs onto the dinner table. Funny 8 Monica Ad Shows a picture of a young woman getting sprayed with hair spray and splashed with mouthwash as she tries to get rid of a cigarette smell.

Table 3: p-values for different missing data patterns

Table 4: GEE model fitting for recall rates. The p-values are from the Type 3 Wald test statistic, except the p-value for the intercept, which is from a two-sided t-test.

Table 5: GEE model fitting for rating scores. All the p-values are from the Type 3 Wald test statistic, except the p-value for the intercept, which is from a two-sided t-test.