Factor Timing with Portfolio Characteristics∗

In a factor timing context, academic research has focused on identifying a set of predictors that can explain the dynamics of factor portfolios. We propose an alternative approach for timing factor portfolio returns by exploiting the information from their portfolio characteristics. Different combinations of dimension reduction techniques are employed to independently reduce both the number of predictors and portfolios to predict. Characteristic-based models outperform existing methods in terms of exact predictability, as well as investment performance.


Introduction
The asset pricing literature has long been shaped by the idea that observable firm characteristics convey information about the cross-section of expected stock returns. A common practice in the literature is to extract the risk premium associated with these characteristics by constructing long-short (LS) factor portfolios (Fama and French, 1993). Such zero-investment, market-neutral portfolios have given rise to what is known as factor investing.
Yet, there are benefits over and above static factor investing. Studies such as Stambaugh et al. (2012), Jacobs (2015), Akbas et al. (2016) and Keloharju et al. (2016) show that the performance of LS portfolios, and therefore the benefits from factor investing, vary significantly over time. More importantly, this time variation in performance is not synchronized across portfolios, allowing for substantial investment gains from timing factor portfolio returns.¹ As such, from an investor's perspective, timing is important, and active factor allocation is needed to capitalize on fluctuations in LS portfolio returns.
In a factor timing context, several studies have used a variety of predictive signals to improve upon static factor investing. Valuation ratios, investor sentiment, the issuer-repurchaser spread and technical indicators such as factor momentum are among the most prominent examples. In this paper, we construct an optimal factor timing strategy that goes beyond existing methods for predicting factor portfolio returns. In doing so, we extend the predictability of stock returns from observable firm characteristics to the portfolio level and predict factor portfolio returns using a collection of portfolio characteristics. Specifically, the characteristics used to sort stocks into portfolios are aggregated into portfolio characteristics and used as predictive variables to forecast future factor portfolio returns. Using multiple characteristics to predict individual factor portfolio returns is motivated by the fact that many stocks sit in the legs of several factor portfolios simultaneously.² Hence, it is sensible to assess the joint predictability that arises from characteristics at the portfolio level and to examine whether factor portfolios are predictable by characteristics other than their own.
A comparison of the collective characteristic-based predictability against alternative sets of predictors documented in the literature highlights the joint importance of characteristics in explaining the dynamics of factor portfolios.
A key aspect of our methodology is the use of different dimension reduction techniques to reduce the dimensions of both sides of the predictability problem. In line with , we begin by reducing the number of forecasting targets, recognizing the underlying factor structure in factor portfolio returns. Instead of predicting individual anomalies independently, we focus on the main sources of return variation by isolating the first five Principal Components (PCs). These PCs capture around 67% of the variation in factor portfolio returns (see Figure IA.1 in the Internet Appendix), allowing us to greatly reduce the dimensions of the problem while forgoing little return variation. Since the dominant PCs capture common variation in the underlying risk premia, accurately predicting their performance leads to the detection of robust predictive patterns across individual anomalies.³ In addition, PCs are not just statistical factors; they also have an investable interpretation. As each PC is a linear combination of the underlying variables, PC portfolios are portfolios of factor portfolios, so their returns and characteristics can be calculated. To construct the PC portfolios, we use conventional PCA as well as the Risk Premium PCA (RPPCA) proposed by Lettau and Pelger (2020a).⁴ Unlike PCA, RPPCA utilizes information on the mean returns of the factor portfolios in addition to their variances, and it extracts factors that may explain a smaller part of the time-series variation but are important in pricing the cross-section. The resulting PCs have higher Sharpe ratios and, in our context, help steer the forecasting exercise towards factor portfolios with higher average returns.

2 For example, the stocks with the highest asset growth are also the ones with the lowest book-to-market ratio, the highest return on assets and the highest accruals (Cooper et al., 2008).

3 Applying Principal Component Analysis (PCA) to a set of factor portfolio returns in order to achieve dimension reduction has recently gained a lot of attention in asset pricing. For example,  form PC portfolios by running PCA on a set of 50 anomalies and use their own book-to-market ratio to predict their performance.

4 Henceforth, "PC portfolios" refers to the estimation of the Principal Component portfolios using either PCA or RPPCA.
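The PC portfolio construction can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' code. `pc_portfolio_weights` is a hypothetical helper, and the returns are simulated. The RPPCA branch follows our reading of Lettau and Pelger (2020a): it applies PCA to $\frac{1}{T}R^{\top}R + \gamma\,\bar{R}\bar{R}^{\top}$, where $\gamma = -1$ recovers the covariance matrix. The value $\gamma = 10$ below is arbitrary.

```python
import numpy as np

def pc_portfolio_weights(R, k=5, gamma=None):
    """Top-k eigenvector weights for PC portfolios of factor returns.

    R     : (T x N) array of factor portfolio returns.
    gamma : None -> conventional PCA on the sample covariance matrix;
            float -> RPPCA-style matrix (1/T) R'R + gamma * mean mean'
            (gamma = -1 gives back the covariance matrix).
    Returns W (N x k) and the share of the eigenvalue sum in the top k.
    """
    T = R.shape[0]
    mu = R.mean(axis=0)
    if gamma is None:
        S = np.cov(R, rowvar=False, bias=True)   # demeaned second moments / T
    else:
        S = R.T @ R / T + gamma * np.outer(mu, mu)
    eigval, eigvec = np.linalg.eigh(S)           # eigh returns ascending order
    order = np.argsort(eigval)[::-1]             # largest eigenvalues first
    eigval, eigvec = eigval[order], eigvec[:, order]
    return eigvec[:, :k], eigval[:k].sum() / eigval.sum()

# Simulated returns for illustration only (not the paper's data)
rng = np.random.default_rng(0)
R = 0.02 * rng.standard_normal((600, 72))
W, share = pc_portfolio_weights(R, k=5)               # PCA
W_rp, _ = pc_portfolio_weights(R, k=5, gamma=10.0)    # RPPCA, arbitrary gamma
Z = R @ W                                             # (T x 5) PC portfolio returns
```

Eigenvector signs are arbitrary. A sign convention therefore has to be fixed before weights or PC portfolio returns are compared across estimation dates.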
We then compress the predictive information in the characteristics of the PC portfolios. To achieve this, we rely not only on PCA but also on methods that account for the covariance between predictors and forecasting targets, such as Partial Least Squares (PLS) (Wold et al., 1984). Conventional PCA focuses on the variance within the predictors and can lead to components that mix return-relevant and irrelevant variation. By using PLS, we aim to capture only the variation in the characteristics that is relevant for predicting returns, potentially resulting in sparser and more accurate models.
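The difference between the two rotations already shows up in their first direction. The sketch below uses hypothetical helper names and simulated data. It contrasts the leading PCA direction, which depends only on the predictors, with the first PLS weight vector. For a single target, that vector is proportional to the covariance $X^{\top}y$ of the centered data. In the example, a high-variance but irrelevant predictor dominates PCA, while PLS loads on the low-variance predictor that actually drives the target.

```python
import numpy as np

def first_pca_direction(X):
    """Leading principal direction of the predictors; ignores the target."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

def first_pls_direction(X, y):
    """First PLS weight vector for one target: proportional to Xc' yc."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)

# Simulated example: column 0 is loud noise, column 1 is a quiet signal
rng = np.random.default_rng(1)
T = 2000
noisy = 3.0 * rng.standard_normal(T)
signal = 0.5 * rng.standard_normal(T)
X = np.column_stack([noisy, signal])
y = 2.0 * signal + 0.1 * rng.standard_normal(T)
pca_w = first_pca_direction(X)      # loads almost entirely on column 0
pls_w = first_pls_direction(X, y)   # loads mostly on column 1
```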
After rotating the characteristics using PCA or PLS, we either use the first characteristic component in standard predictive regressions or apply LASSO to the whole set of characteristic components to identify the relevant subset of features.⁵ The first case investigates predictability in the simplest setting of a single predictive factor. LASSO, by contrast, allows successive components to enter the surviving subset of predictors, with the importance of each characteristic component assessed by its contribution to minimizing the forecasting error rather than by the magnitude of its eigenvalue. Our procedure is implemented recursively, and the optimal degree of coefficient shrinkage is identified separately for each PC portfolio in a cross-validation step. This approach has two important implications. First, the number of factors can differ across PC portfolios, allowing different sources of variation in factor portfolio returns to be approximated by models of different complexity. For instance, many characteristic components may be required to predict the first PC portfolio but only a few for the second. Second, letting the level of coefficient shrinkage vary over time allows us to examine time variation in the overall strength of the characteristic signal.
In our empirical analysis, we use a collection of 72 anomalies spanning the period from 1970 to 2019 and find that characteristics are particularly useful for factor timing purposes. We distinguish factor portfolio predictability in terms of exact predictive accuracy (comparing predicted with subsequently realized returns) and the ability to predict the cross-sectional dispersion in returns (differentiating winners from losers). The characteristic-based models that incorporate LASSO are the most successful and consistently outperform existing methods on both counts, delivering smaller forecasting errors and higher cross-sectional correlations between forecasted and realized returns. They also deliver average monthly returns of up to 1.47% and annualized Sharpe ratios of up to 0.73, while the best benchmark model delivers 1.06% and 0.55, respectively. Importantly, our factor timing strategies show no decay in return performance over time, although many individual anomalies have been found empirically to decay (McLean and Pontiff, 2016).
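The two notions of predictability can be made concrete as follows. This is a sketch under stated assumptions. Exact accuracy is measured by the mean squared forecast error, pooled over months and portfolios. Ranking ability is measured by the time-series average of the Pearson correlation between forecasted and realized returns across portfolios. The paper's exact statistics (for example, a rank correlation instead of Pearson) may differ, and the function names are hypothetical.

```python
import numpy as np

def forecast_mse(pred, real):
    """Exact accuracy: mean squared error over all months and portfolios."""
    return np.mean((pred - real) ** 2)

def mean_cross_sectional_corr(pred, real):
    """Ranking ability: average over months (rows) of the correlation
    between forecasted and realized returns across portfolios (columns)."""
    corrs = [np.corrcoef(p, r)[0, 1] for p, r in zip(pred, real)]
    return float(np.mean(corrs))

rng = np.random.default_rng(9)
real = rng.standard_normal((24, 10))               # 24 months x 10 portfolios
pred = real + 0.5 * rng.standard_normal((24, 10))  # noisy forecasts
mse = forecast_mse(pred, real)
ic = mean_cross_sectional_corr(pred, real)
```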
In terms of the different methods used, the choice between PCA and RPPCA for reducing the number of portfolios to predict has minimal implications. Yet, when it comes to reducing the number of predictors to a single predictive factor, the dimension reduction technique matters. In particular, PCA delivers slightly better exact predictability but severely underperforms PLS in ranking the anomaly portfolios. Essentially, when a single-factor model is used, it is better to condense the information in the predictors using a tool that is specifically designed for forecasting. Nonetheless, the difference between PCA and PLS disappears when multiple characteristic components are considered in conjunction with LASSO, suggesting that the exact rotation method of the predictors is less important once we account for the whole information set. After employing LASSO, results improve uniformly across models, reflecting the importance of accounting for further components and the benefits of regularization in dealing with overfitting. Furthermore, the cross-validation step reveals that the required number of features varies significantly over time for all PC portfolios. This implies that characteristics work better in predicting returns in some periods than in others, which is expected given the time variation in factor risk premia. Our LASSO-based factor timing strategies are flexible enough to downgrade (upgrade) the information in the characteristics when their informativeness is low (high).
The rest of the paper is structured as follows: Section 2 provides a discussion of the relevant literature. Section 3 describes the general framework and our estimation approach and Section 4 provides an assessment of the various models in terms of forecasting ability and investment performance. Section 5 modifies the baseline models and examines alternative model specifications. Finally, Section 6 concludes.

Literature review
Our paper is related to several strands of the literature. Without attempting a full-scale review, we discuss briefly how we contribute to two main categories, namely studies that utilize dimension reduction techniques in the context of asset pricing and studies that explore factor portfolio predictability.

Dimension reduction in asset pricing
Machine learning has surfaced in recent years in various asset pricing applications due to the limitations of standard methodologies in high-dimensional settings. Gu et al. (2020) compare various machine learning techniques in their effort to forecast stock returns using a large collection of stock characteristics. Similarly, numerous studies attempt to identify the extent to which characteristics are associated with expected returns by regularizing the cross-sectional regressions or the characteristic-based portfolio sorts used in the estimation of risk premia. For instance, DeMiguel et al. (2020), Freyberger et al. (2020) and Feng et al. (2020) employ LASSO regularization to create a stochastic discount factor (SDF) with sparse characteristic exposure. However, imposing sparsity in the number of return predictors under a LASSO approach may not be a realistic assumption after all due to the diverse characteristic space . Nevertheless, sparse models allow for a parsimonious representation of the cross-section of expected stock returns and an easier interpretation and link to economic theories. In our empirical application, we apply LASSO to a set of characteristic PCs instead of raw characteristics. Hence, our approach still encourages a sparse factor structure, while allowing multiple characteristics to affect expected factor portfolio returns through their exposure to the characteristic PCs.
Another strand of the literature applies PCA to a set of stock or portfolio returns to reduce their dimensions. Examples of PCA applications in asset pricing include Connor and Korajczyk (1988), who apply Asymptotic PCA to asset returns to extract latent factors, and Kozak et al. (2018), who form a low-dimensional SDF using the first few PCs of anomaly returns.  also find that a low-dimensional specification in terms of PC portfolios is feasible due to the high degree of common variation in factor portfolio returns. In general, the use of PCA in this context is both economically and empirically motivated. Economically, the presence of arbitrageurs in the economy implies that near-arbitrage opportunities, meaning extremely high Sharpe ratios, are implausible. Hence, high Sharpe ratios associated with low-eigenvalue PCs should make no contribution to explaining returns (Kozak et al., 2018).⁶ Empirically, returns possess a spiked covariance structure: the variance-covariance matrix is dominated by a small number of large eigenvalues that are separated from the rest. Combining these facts implies that asset returns should be adequately explained by a small number of dominant PCs. We contribute to this literature by constructing PC portfolios of LS portfolios and examining their predictability.

6 Still, this argument does not explicate whether high eigenvalue PCs reflect risk or mispricing.
Several recent studies also focus on modifying conventional PCA to make it more suitable for asset pricing applications. Kelly et al. (2019) propose a new method of Instrumental Principal Components, allowing latent factor loadings to be time-varying and partially dependent on firm characteristics.⁷ They find that only a small number of characteristic-based factors are important for identifying a successful latent factor model. Lettau and Pelger (2020a) augment standard PCA with a cross-sectional pricing error in order to extract factors that can simultaneously explain the time-series variation and the cross-section of asset returns. Lettau and Pelger (2020b) demonstrate the superiority of this estimator over standard PCA on a set of 37 factor portfolios. Finally, Giglio and Xiu (2021) account for omitted factors in the estimation of risk premia by combining PCA with two-pass cross-sectional regressions.
We exploit the recent advancements in the literature by also using the RPPCA of Lettau and Pelger (2020a) to extract factors from LS portfolio returns.

Factor portfolio predictability
In a factor timing context, factor momentum has emerged as a mechanism to time factor portfolio returns. Early contributors to this literature include Grundy and Martin (2001), who document a momentum effect in the factor component of stock returns. The momentum effect in factor portfolio returns is strong and has its own distinctive behaviour, different from that of stock momentum. For example, Arnott et al. (2021) and Gupta and Kelly (2019) find that the effect is strongest at the 1-month horizon, even though stocks exhibit reversals over such short intervals. Nonetheless, factor momentum captures the effect in its purest form, as it subsumes stock and industry momentum as well as momentum found in other well-diversified portfolios (Arnott et al., 2021). Furthermore, factor momentum is concentrated in the highest-eigenvalue PCs of factor portfolio returns, which implies that momentum is intertwined with the covariance structure of factor portfolios (Ehsani and Linnainmaa, 2022). Whether applied to PC portfolios or to individual factors, factor momentum can accommodate factor timing simply by buying (selling) portfolios that have performed well (poorly) in the recent past or relative to their peers. Such strategies deliver strong return performance and, unlike stock momentum, are not susceptible to crashes (Gupta and Kelly, 2019). Nevertheless, using exactly the same investment rule, we show that characteristic-based forecasts provide superior information and result in more profitable investment strategies than factor momentum.
Outside factor momentum, numerous studies attempt to predict the performance of individual factor portfolios using a collection of potential predictors. Daniel and Moskowitz (2016) forecast stock momentum using market indicators and volatility proxies in an effort to explain momentum crashes. Similarly, Huang (2022) finds that the return spread between winners and losers negatively predicts stock momentum returns. Baba-Yara et al. (2021) analyse the ability of the value spread to forecast the returns of the value-minus-growth portfolio across asset classes. They find that the first principal component of the value spread captures most of the variation in expected value returns. In a similar manner, we use the first principal component of multiple characteristics to predict PC portfolio returns, while also examining whether further characteristic components are required. In contrast to previous studies targeting only specific anomalies, we examine factor portfolio predictability across a large set of factor portfolios.
Other studies also examine the predictability of multiple portfolios at once, using either a single predictor or multiple predictors. Asness et al. (2017) use the value spread to construct timing strategies for value, momentum and betting-against-beta portfolios, though they observe little improvement upon a constant multi-style strategy. Greenwood and Hanson (2012) show that corporate share issuance can be used to forecast the performance of factor portfolios related to size and value. Stambaugh et al. (2012) find that LS strategies appear to be stronger following periods of high investor sentiment. They find the sentiment effect to be concentrated in the short leg of anomalies, which they attribute to short-sale impediments that result in relatively more overpricing than underpricing. On a much larger scale, Jacobs (2015) confirms the findings of Stambaugh et al. (2012) by examining the role of sentiment in a large set of 100 anomalies. Kelly and Pruitt (2013) forecast four sets of characteristic-sorted portfolios using the cross-section of book-to-market ratios and observe higher predictability at lower frequencies. Dichtl et al. (2019) attempt to predict 20 equity factors using fundamental and technical indicators. They distinguish between cross-sectional and time-series predictability, which lead to factor-tilting and factor timing portfolio allocations, respectively.  construct PC portfolios by running PCA on the time-series of 50 anomalies and find that the largest eigenvalue PCs are the most predictable by their own book-to-market ratio.
We extend  framework by incorporating information across a large set of observable characteristics to predict a large set of factor portfolio returns. Furthermore, we allow the effect of characteristics to be independently identified for every PC portfolio, examining the possibility that different characteristics affect different sources of variation in factor portfolio returns.

Methodology
This section begins by setting out the general framework, followed by our forecasting procedure and the benchmark models employed. Section IA.2 of the Internet Appendix introduces the statistical methods used in this study and provides a comprehensive overview of their functional form and statistical properties.

General framework
The main objective is to predict a large set of factor portfolio returns using a large set of portfolio characteristics. Let $R$ be a $(T \times N)$ matrix of $N$ factor portfolio returns for $T$ periods. Equivalently, let $R_{t,\cdot} = (R_{t,1}, \ldots, R_{t,N})$ be a $(1 \times N)$ vector of portfolio returns at time $t$, and let $C_t$ be an $(N \times M)$ matrix of $M$ characteristics for $N$ factor portfolios at time $t$. The base case arises from a conditional version of Cochrane's (2011) framework for modeling returns as a function of characteristics:

$$R_{t+1,n} = a_{t,n} + \sum_{m=1}^{M} b^{m}_{t,n}\, C^{m}_{t,n} + \varepsilon_{t+1,n}, \tag{1}$$

where $C^{m}_{t,n}$ is the $m$th characteristic of factor portfolio $n$ at time $t$, $a_{t,n}$ and $b^{m}_{t,n}$ denote the conditional alpha and beta at time $t$, and $\varepsilon_{t+1,n}$ is the pricing error at time $t+1$. Entertaining time variation in $b^{m}_{t,n}$ and $a_{t,n}$ due to changes in portfolio attributes is the essence of factor timing.⁸ By combining different dimension reduction techniques, we essentially investigate whether the conditional alphas and betas are a function of the covariance of returns, the covariance of the characteristics, or even the covariance of returns with the characteristics. The covariance of returns comes into play by focusing on the dominant components of factor portfolio returns instead of predicting each factor portfolio separately. More concretely, assuming a linear latent factor specification, excess asset returns can be expressed as:

$$R_{t+1,\cdot} = Z_{t+1,\cdot}\, W_t^{\top} + e_{t+1,\cdot}, \tag{2}$$

where $Z_{t+1,\cdot} = (z_{t+1,1}, z_{t+1,2}, \ldots, z_{t+1,K})$ is a $(1 \times K)$ vector of factor returns with $K \ll N$, $W_t = (w_{t,1}, \ldots, w_{t,K})$ is the $(N \times K)$ matrix of eigenvectors, and $e_{t+1,\cdot}$ is a $(1 \times N)$ vector of idiosyncratic errors. The time dimension in this context arises from the recursive estimation of eigenvectors and principal components. The first term on the right-hand side reflects compensation for exposure to systematic risk factors, while the second term reflects asset-specific risk. Under the assumption that the factors and the errors are uncorrelated, the variance-covariance matrix of asset returns can be decomposed into a systematic and an idiosyncratic part. A common practice is to estimate $Z_{t+1,\cdot}$ and $W_t$ directly by applying PCA to the variance-covariance matrix of $R$ and retaining the dominant components (e.g., Connor and Korajczyk (1986) and Kozak et al. (2018)). Provided that time variation in asset risk premia is driven by exposure to time-varying aggregate risk, accurately predicting the dominant components $Z_{t+1,\cdot}$ allows us to form forecasts for individual anomalies through $W_t$. By focusing only on $Z_{t+1,\cdot}$, we isolate common sources of predictability across factor portfolios and ignore spurious predictability associated with smaller PCs.

8 Cochrane (2011) uses the formulation in Equation (1) to model the returns of an individual stock in excess of the risk-free rate. In our setting, we model factor returns, i.e., the returns of a long portfolio in excess of the returns of a short portfolio. Getting from individual stocks to factor portfolios is straightforward, and hence we focus directly on the latter to simplify the exposition of our framework.
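Forecasts for individual anomalies are formed from PC portfolio forecasts through the eigenvector matrix $W_t$. The minimal sketch below assumes $W_t$ has orthonormal columns, and its helper name is hypothetical. With all $N$ components, the mapping reproduces returns exactly. Keeping only $K < N$ components retains the systematic part and drops the idiosyncratic remainder.

```python
import numpy as np

def factor_forecasts_from_pcs(z_hat, W):
    """Map K PC portfolio forecasts to N factor portfolio forecasts.

    z_hat : (K,) forecasts of next-period PC portfolio returns.
    W     : (N x K) eigenvector matrix estimated with data up to t.
    Returns the (N,) vector W @ z_hat, i.e. the row vector z_hat' W'.
    """
    return W @ z_hat

# With all N components the mapping reproduces the returns exactly
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))   # orthonormal stand-in for W_t
r = rng.standard_normal(6)
z = Q.T @ r                                        # PC portfolio returns
r_back = factor_forecasts_from_pcs(z, Q)           # equals r up to rounding
r_k2 = factor_forecasts_from_pcs(z[:2], Q[:, :2])  # K = 2: systematic part only
```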
In order to forecast $Z_{t+1,\cdot}$, we model PC portfolio returns as a function of observable characteristics. Specifically, lagged characteristics are used to predict next-period PC portfolio returns. The characteristics of the PC portfolios are computed by combining factor portfolio characteristics according to the weights given by the $i$th eigenvector $w_{t,i}$.
The cross-section of characteristics for the $i$th PC portfolio, $i = 1, \ldots, K$, is calculated as

$$H_{t,i} = w_{t,i}^{\top} C_t, \tag{3}$$

which is a $(1 \times M)$ vector. Repeating the process for every $t$ and every $i$ results in a $(T \times M)$ matrix $H_i$ of characteristics for each PC portfolio.
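Computing $H_i$ for all PC portfolios at once is a single tensor contraction. The sketch below is illustrative. The array layout is our assumption: characteristics are stored as $T \times N \times M$, and eigenvectors as a $T \times N \times K$ stack so that each date can carry its own weights. The helper name is also ours.

```python
import numpy as np

def pc_portfolio_characteristics(C, W):
    """Characteristics of the K PC portfolios.

    C : (T x N x M) array, C[t] is the characteristic matrix C_t.
    W : (T x N x K) array, W[t, :, i] is the eigenvector w_{t,i}.
    Returns H with shape (K x T x M), where H[i, t] = w_{t,i}' C_t,
    so H[i] is the (T x M) matrix H_i.
    """
    # Sum over the portfolio index n for every date t, component i, characteristic m
    return np.einsum("tni,tnm->itm", W, C)

rng = np.random.default_rng(3)
C = rng.standard_normal((4, 5, 3))   # T=4 dates, N=5 portfolios, M=3 characteristics
W = rng.standard_normal((4, 5, 2))   # K=2 weight vectors per date (not orthonormalized)
H = pc_portfolio_characteristics(C, W)
```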
However, using raw characteristics as inputs in standard predictive regressions would be suboptimal due to high correlations and lack of predictive information for some of them.
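Therefore, we transform the characteristics of PC portfolios into scores using PCA and PLS. This is achieved by multiplying the matrix of characteristics $H_i$ by a matrix of eigenvectors:

$$X_i = H_i V_i,$$

where $V_i$ is the $(M \times M)$ matrix of weight vectors obtained from PCA or PLS, and $X_i$ is the $(T \times M)$ matrix of characteristic components of the $i$th PC portfolio. Next, we model PC portfolio returns using the characteristic components:

$$z_{t+1,i} = \beta^{0}_{t,i} + \sum_{m=1}^{M} \beta^{m}_{t,i}\, X^{m}_{t,i} + u_{t+1,i}, \tag{4}$$

where $X^{m}_{t,i}$ is the $m$th characteristic component of the $i$th PC portfolio at time $t$, $z_{t+1,i}$ is the one-month-ahead return of the same portfolio, and $u_{t+1,i}$ is the forecast error. Equations (1) to (4) lead to:

$$R_{t+1,\cdot} = \sum_{i=1}^{K} \Big( \beta^{0}_{t,i} + \sum_{m=1}^{M} \beta^{m}_{t,i}\, X^{m}_{t,i} \Big)\, w_{t,i}^{\top} + \eta_{t+1,\cdot}, \tag{5}$$

where $\eta_{t+1,\cdot}$ is a $(1 \times N)$ vector of composite errors. It captures both the return variation left unexplained by the characteristics and the variation from potentially omitting higher-order PC components. Equation (5) shows that $a_{t,n}$ and $b^{m}_{t,n}$ from Equation (1)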
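The PCA rotation of one PC portfolio's characteristics can be obtained as below. This sketch covers only the PCA case. It assumes that the columns of $H_i$ are demeaned over the estimation sample before the eigendecomposition. The PLS rotation and the recursive re-estimation are not shown, and the helper name and the variable `V` are illustrative.

```python
import numpy as np

def characteristic_components(H):
    """PCA rotation of one PC portfolio's (T x M) characteristic matrix.

    Returns X (T x M), the demeaned characteristics times V, ordered by
    decreasing variance, and the (M x M) orthonormal rotation V.
    """
    Hc = H - H.mean(axis=0)
    eigval, V = np.linalg.eigh(np.cov(Hc, rowvar=False))
    V = V[:, np.argsort(eigval)[::-1]]   # largest-variance direction first
    return Hc @ V, V

rng = np.random.default_rng(4)
H = rng.standard_normal((200, 4)) @ rng.standard_normal((4, 4))  # correlated columns
X, V = characteristic_components(H)
first_component = X[:, 0]   # the single predictor used in the first-component case
```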

Forecasting procedure
We use at least 20 years (240 months) of information to estimate the PC portfolios and their characteristics to then make return predictions at t + 1. Our forecasts employ an expanding estimation window, with the estimation sample always starting at the beginning of the sample period and incorporating additional observations as they become available.
PC portfolios are recursively re-estimated at each point in time, using an updated $w_{t,i}$, $i = 1, \ldots, K$, based on the in-sample variance-covariance matrix of factor portfolio returns.⁹ Notice that the PC portfolio characteristics $H_i$ change not only because the underlying factor portfolio characteristics $C_t$ change, but also because the weighting vectors $w_{t,i}$ change. Overall, our approach is flexible enough to account for a potentially unstable correlation structure in factor portfolio returns.
In a similar fashion, the matrix of characteristic components is obtained as follows. For PCA, which uses only the information contained in the characteristics to extract the latent factors, characteristics up to $t$ are used to estimate $X_i$. For PLS, which uses information in both characteristics and returns, characteristics up to $t-1$ and PC portfolio returns up to $t$ are used to estimate $X_i$. The $\beta$s in Equation (4) are always estimated using returns up to $t$ and values of $X_i$ up to $t-1$. Values of $X_i$ at $t$ are then plugged into Equation (4) to obtain forecasts for each PC portfolio return at $t+1$. Hence, our forecasts are fully out of sample and do not suffer from look-ahead bias.
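The timing convention for the single-component case can be sketched as an expanding-window loop. We assume that `x` holds the first characteristic component of one PC portfolio and `z` holds its returns. We also assume that `x[t]` has already been computed from information available at $t$; the recursive re-estimation of the component itself is omitted. The helper names are hypothetical. At origin $t$, the regression uses predictors up to $t-1$ and returns up to $t$, and the forecast plugs in `x[t]`.

```python
import numpy as np

def ols_forecast_next(x, z, t):
    """Forecast z[t+1] from x[t] using only data observed up to t.

    Regression pairs are (x[s], z[s+1]) for s = 0..t-1, so x enters up to
    t-1 and z up to t; the fitted line is then evaluated at x[t].
    """
    A = np.column_stack([np.ones(t), x[:t]])
    coef, *_ = np.linalg.lstsq(A, z[1:t + 1], rcond=None)
    return coef[0] + coef[1] * x[t]

def expanding_forecasts(x, z, start):
    """Out-of-sample forecasts of z[t+1] for t = start, ..., len(z) - 2."""
    return np.array([ols_forecast_next(x, z, t) for t in range(start, len(z) - 1)])

rng = np.random.default_rng(5)
x = rng.standard_normal(50)
z = np.zeros(50)
z[1:] = 0.1 + 0.5 * x[:-1]                  # noiseless z_{t+1} = 0.1 + 0.5 x_t
f = expanding_forecasts(x, z, start=10)     # forecasts of z[11], ..., z[49]
```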
Another subtle but important detail is the cross-sectional standardization of $C_t$ to account for differences in the scale of the characteristics. Running PCA or PLS on raw $C_t$ would tilt the components towards the characteristics with larger scales, as those have substantially higher variance. For this reason, we standardize the matrix of factor portfolio characteristics $C_t$ cross-sectionally before calculating $H_i$ and ultimately $X_i$. At each time $t$, we subtract the cross-sectional mean of each characteristic and divide by its cross-sectional standard deviation. Apart from ensuring a reasonable covariance matrix for the characteristics, this approach allows us to focus on cross-sectional differences in the data. As long as factor portfolio characteristics are aligned with factor portfolio returns in the cross-section, PC portfolio characteristics should be aligned with PC portfolio returns over time, since both are linear combinations of the same cross-section. This makes a predictive regression approach sensible.¹⁰

The first decision to be made concerns the optimal number of factors in Equation (2). Specifying the optimal number of PCs is ultimately an empirical question, as it depends on the underlying factor structure. Bai and Ng (2002), Onatski (2010) and , all develop critical value thresholds for determining the number of factors. We follow a simple approach and focus on the first five PCs, as they capture about 67% of the variation in factor portfolio returns. Selecting the first five PC portfolios is also consistent with similar studies performing PCA on a set of factor portfolios, e.g.,  and Lettau and Pelger (2020b). Hence, let $Z_{t,5} = (z_{t,1}, z_{t,2}, \ldots, z_{t,5})$ and $W_{t,5} = (w_{t,1}, w_{t,2}, \ldots, w_{t,5})$ denote the five largest PC portfolios and their eigenvectors.
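A minimal sketch of the cross-sectional standardization follows. It assumes characteristics are stored as a $T \times N \times M$ array and uses the population standard deviation (`ddof=0`); both choices, and the helper name, are ours. A characteristic that is constant across portfolios on some date would cause a division by zero and would need separate handling.

```python
import numpy as np

def standardize_cross_section(C):
    """Standardize each characteristic across the N factor portfolios at each t.

    C : (T x N x M) array. For every date t and characteristic m, subtract
    the mean over the N portfolios and divide by their standard deviation.
    """
    mean = C.mean(axis=1, keepdims=True)
    std = C.std(axis=1, keepdims=True)
    return (C - mean) / std

rng = np.random.default_rng(6)
C = rng.standard_normal((3, 8, 2)) * np.array([1.0, 1000.0])   # very different scales
S = standardize_cross_section(C)
```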
The second decision concerns how to estimate $\beta^{m}_{t,i}$ in Equation (4). Here, we examine two cases, one that imposes sparsity and one that is data-driven. In the first case, we only use the first characteristic component of each PC portfolio (i.e., the first column of $X_i$) in standard bivariate predictive regressions. Although this is the sparsest specification possible, multiple characteristics can still affect PC portfolio returns through their weights on the first characteristic PC. As an alternative, we apply LASSO to the whole set of characteristic components of each PC portfolio (i.e., the whole matrix $X_i$) to identify a subset that is useful for our forecasting objective. Hence, the $\beta$s in Equation (4) are obtained through OLS for a single predictive factor ($m = 1$) in the first case and through LASSO for $m = 1, \ldots, M$ in the second.
When performing LASSO, the optimal amount of coefficient shrinkage is selected by cross-validation on a rolling basis. In particular, before every forecasting step we split the in-sample period into a training and a validation sample. The training sample is used to estimate the PC portfolios and characteristic PCs, and the validation sample is used to identify the degree of model complexity that delivers reliable out-of-sample performance.¹¹ At the start, the training sample is used to forecast the first period in the validation sample for each value in a geometric sequence of shrinkage values.¹² The realized value of the forecasted data point is then added to the next training set to forecast the subsequent point in the validation sample. After repeating this procedure for every period in the validation sample, we pick the level of shrinkage that minimizes the mean squared error. We then re-estimate the PC portfolios and characteristic PCs using the whole in-sample period (training and validation) and apply LASSO with the fixed value of the shrinkage parameter to estimate $\beta^{m}_{t,i}$ and predict PC portfolio returns at $t+1$.
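The shrinkage selection can be sketched with a small coordinate-descent LASSO and a one-step-ahead validation loop. This is an illustration, not the authors' implementation. The objective scaling, the unpenalized intercept, the grid and the helper names are assumptions. Unlike the procedure described above, the sketch holds the predictors fixed rather than re-estimating the PC portfolios and characteristic components at each validation step. Rows of `A` are predictors dated $s$, and entries of `b` are the returns at $s+1$.

```python
import numpy as np

def lasso_fit(A, b, lam, n_iter=500, tol=1e-8):
    """LASSO by cyclic coordinate descent.

    Minimizes (1/2n)||b - c - A beta||^2 + lam * ||beta||_1 with an
    unpenalized intercept c. Returns (c, beta).
    """
    n, p = A.shape
    a_mean, b_mean = A.mean(axis=0), b.mean()
    Ac = A - a_mean
    col_sq = (Ac ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = b - b_mean                          # residual of the centered problem
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the residual that excludes j
            rho = Ac[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            resid -= Ac[:, j] * (new - beta[j])
            max_step = max(max_step, abs(new - beta[j]))
            beta[j] = new
        if max_step < tol:
            break
    return b_mean - a_mean @ beta, beta

def select_lambda(A, b, n_train, grid):
    """Pick the shrinkage level with the smallest one-step-ahead MSE.

    Rows 0..n_train-1 form the initial training sample; each later row j is
    forecast from a fit on rows 0..j-1 and then joins the training set.
    """
    mse = []
    for lam in grid:
        errs = []
        for j in range(n_train, len(b)):
            c, beta = lasso_fit(A[:j], b[:j], lam)
            errs.append(b[j] - (c + A[j] @ beta))
        mse.append(np.mean(np.square(errs)))
    return grid[int(np.argmin(mse))]

rng = np.random.default_rng(7)
A = rng.standard_normal((120, 6))                    # 6 candidate components
b = 0.8 * A[:, 0] + 0.1 * rng.standard_normal(120)   # only the first one matters
grid = np.geomspace(1e-3, 1.0, 8)                    # hypothetical geometric grid
lam = select_lambda(A, b, n_train=80, grid=grid)
c, beta = lasso_fit(A, b, lam)                       # refit on the full sample
```

A sufficiently large penalty sets every slope to zero. The forecast then collapses to the in-sample mean of the target, which is the constant-forecast case mentioned next.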
Depending on the magnitude of the shrinkage, our approach allows for the possibility that none of the characteristic components are relevant in predicting PC portfolio returns, in which case return forecasts shrink down to a constant term.
As already discussed, LASSO is applied separately on each PC portfolio, meaning that the number of features can be different across PC portfolios. Essentially, our method allows for different sources of variation in factor portfolio returns to be approximated by models of different complexity, examining the possibility that characteristic importance varies across the main sources of return variation. Furthermore, since LASSO is applied iteratively, the number of features can also vary across time for each PC portfolio depending on how strong the characteristic signal has been in the recent past. Lastly, it is important to highlight that LASSO can select low eigenvalue characteristic PCs, as long as they contribute to minimizing the forecasting error in the validation period.
To summarize, we attempt to regularize both the left-hand side (LHS) and the right-hand side (RHS) of the predictability problem by combining different dimension reduction techniques. Regularization in the number of forecasting targets is achieved with PCA or RPPCA, and regularization in the number of predictors with PCA or PLS. This results in four base models, which we define as PCA, RPPCA, PCA-PLS and RPPCA-PLS.¹³ Figure 1 provides a visual depiction of our procedure, which can be summarized in the following steps:
1. Reduce a set of factor portfolios to their first five components using PCA or RPPCA.
2. Estimate the characteristics of the PC portfolios using their loadings from the first step.
3. Rotate PC portfolio characteristics using either PCA or PLS.
4. Either select the first characteristic PC or apply LASSO on the whole set of characteristic PCs of each PC portfolio.
5. Produce separate forecasts for each PC portfolio using the selected number of features.
6. Expand these forecasts to individual factor portfolios using their loadings on each PC portfolio.

Figure 1: Visual depiction of our modeling procedure. The figure presents the process of forecasting factor portfolio returns using their portfolio characteristics. PC portfolios are calculated as linear combinations of factor portfolios. The same weighting vectors are used to decompose the three-dimensional set of characteristics into 5 independent matrices of characteristics (one for each PC portfolio). The matrices of predictors are transformed into components, and either the first component is retained or LASSO is applied to the whole set of components to pick those that are the most informative. Individual forecasts for each PC portfolio are produced, and those forecasts are aggregated into factor portfolio return forecasts using the weighting vectors that were used to aggregate factor portfolios into PC portfolios.
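The six steps can be strung together for a single forecast origin. The sketch below is a simplified illustration built on several assumptions. It uses conventional PCA for both reductions and the single-component OLS case instead of LASSO. It also applies the current eigenvectors to all past characteristics, and the function name is hypothetical. It is meant to show the data flow, not to reproduce the paper's results.

```python
import numpy as np

def forecast_factor_returns(R, C, t, k=5):
    """One-step-ahead factor forecasts at origin t (first-component OLS case).

    R : (T x N) factor returns; C : (T x N x M) factor characteristics.
    Only R[:t+1] and C[:t+1] are used. Returns an (N,) forecast of R[t+1].
    """
    Rin, Cin = R[: t + 1], C[: t + 1]
    # Cross-sectional standardization of characteristics at each date
    Cin = (Cin - Cin.mean(axis=1, keepdims=True)) / Cin.std(axis=1, keepdims=True)
    # Step 1: PCA on factor returns -> (N x k) eigenvector weights
    eigval, eigvec = np.linalg.eigh(np.cov(Rin, rowvar=False))
    W = eigvec[:, np.argsort(eigval)[::-1][:k]]
    Z = Rin @ W                                   # (t+1 x k) PC portfolio returns
    # Step 2: PC portfolio characteristics; H[i] is (t+1 x M)
    H = np.einsum("ni,snm->ism", W, Cin)
    z_hat = np.empty(k)
    for i in range(k):
        # Steps 3-4: rotate H[i] with PCA and keep the first component
        Hc = H[i] - H[i].mean(axis=0)
        _, _, vt = np.linalg.svd(Hc, full_matrices=False)
        x = Hc @ vt[0]
        # Step 5: regress z_{s+1,i} on x_s for s < t, then plug in x_t
        A = np.column_stack([np.ones(t), x[:t]])
        coef, *_ = np.linalg.lstsq(A, Z[1:, i], rcond=None)
        z_hat[i] = coef[0] + coef[1] * x[t]
    # Step 6: map PC forecasts back to the N factor portfolios
    return W @ z_hat

rng = np.random.default_rng(8)
R = 0.02 * rng.standard_normal((60, 12))      # simulated factor returns
C = rng.standard_normal((60, 12, 4))          # simulated characteristics
f = forecast_factor_returns(R, C, t=40, k=3)  # forecast of R[41]
```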

Benchmark Models
To examine whether characteristic-based models provide superior information compared to alternative approaches, we employ alternative information sets to predict factor portfolio returns. Panel B of Table IA.2 lists all the benchmark models used. In general, we form the baseline benchmark models following the methodological framework proposed by the original authors. Section 5 modifies the original models in various ways to examine the robustness of our results.

Factor Momentum
The first benchmark is the 1-month momentum strategy (1mMOM), which forms the momentum signal based on a look-back window of one month. Essentially, the return at time t becomes the prediction for the return at time t + 1. The second benchmark is the 12-month momentum strategy (12mMOM), which forms the momentum signal based on a look-back window of twelve months. In this case, the prediction for the return at time t + 1 is the average monthly return over the previous twelve months. To improve consistency across characteristic and momentum models, in Section 5 we also apply both momentum strategies to the PC portfolios and then extend the forecasts to individual anomalies as in Equation (5). Hence, we also examine the possibility of a stronger momentum effect on the main sources of variation of factor portfolio returns. 14
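Both momentum benchmarks reduce to a rolling mean of past returns. The sketch below is a minimal NumPy illustration; the function name `momentum_forecasts` is hypothetical, and it assumes the twelve-month window includes month t itself, so that the forecast made at the end of month t uses returns from t − 11 to t.

```python
import numpy as np

def momentum_forecasts(R, window):
    """Factor momentum: forecast R[t+1] as the mean of the last `window` monthly returns.

    R: (T, N) returns. Row t+1 of the output holds the forecast made at the end of month t;
    rows without enough history are NaN.
    """
    T = R.shape[0]
    F = np.full(R.shape, np.nan)
    for t in range(window - 1, T - 1):
        F[t + 1] = R[t - window + 1 : t + 1].mean(axis=0)
    return F

R = np.arange(1.0, 14.0).reshape(13, 1)   # toy returns 1, 2, ..., 13
F1 = momentum_forecasts(R, 1)             # 1mMOM: F1[t+1] = R[t]
F12 = momentum_forecasts(R, 12)           # 12mMOM: F12[12] = mean(R[0:12]) = 6.5
```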

Valuation Ratios
As a third benchmark, we use only the book-to-market ratio of factor portfolios as a return predictor. Specifically, we follow  in predicting the first five PCs by their own book-to-market ratio and then extending the forecasts to individual anomalies.
In order to keep things consistent with our framework, we estimate the PC portfolios recursively rather than using the first half of the sample. In Section 5, we simultaneously use the book-to-market ratio of all dominant PC portfolios in combination with LASSO as an alternative to the baseline model.

Issuer-Repurchaser Spread
Following Greenwood and Hanson (2012), we estimate the issuer-repurchaser spread of each portfolio and use it to predict next period factor portfolio returns. The issuer-repurchaser spread is defined as the difference between the average characteristic decile of issuers and that of repurchasers. Repurchasers are firms that have reduced their shares outstanding by more than 0.5% during the fiscal year, and issuers are firms that have increased their shares outstanding by more than 10% during the fiscal year. The metric can take values from −9 to 9, with low values implying that issuers are located in the low leg and repurchasers in the high leg of each factor portfolio (and vice versa). In Section 5, we generalize this approach by considering the issuer-repurchaser spreads of the PC portfolios.
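For a single characteristic and fiscal year, the spread can be sketched as follows. The thresholds come from the definitions above; the function name, the input layout (one decile and one fractional change in shares outstanding per stock) and the NaN convention when either group is empty are assumptions made for illustration.

```python
import numpy as np

def issuer_repurchaser_spread(decile, share_change):
    """Average characteristic decile of issuers minus that of repurchasers (range -9 to 9).

    decile: (n,) decile (1..10) of each stock on the portfolio's characteristic.
    share_change: (n,) fiscal-year change in shares outstanding as a fraction (0.10 = 10%).
    """
    issuers = share_change > 0.10           # shares outstanding up by more than 10%
    repurchasers = share_change < -0.005    # shares outstanding down by more than 0.5%
    if not issuers.any() or not repurchasers.any():
        return np.nan                       # spread undefined without both groups
    return decile[issuers].mean() - decile[repurchasers].mean()

decile = np.array([1, 1, 10, 10, 5])
share_change = np.array([0.20, 0.15, -0.01, -0.02, 0.0])
spread = issuer_repurchaser_spread(decile, share_change)   # -9.0: issuers sit in the low leg
```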

Investor Sentiment
We explore the role of investor sentiment in predicting factor portfolio returns. Stambaugh et al. (2012) and Jacobs (2015) find that anomaly performance is stronger following periods of high sentiment. To examine the effect of sentiment, we use the investor sentiment index of Baker and Wurgler (2006), which captures the common component in five sentiment proxies, with each proxy being orthogonalized with respect to six macroeconomic indicators. Specifically, next period factor portfolio returns are regressed on the lagged values of the index and forecasts for individual anomalies are formed based on a standard regression setting. In Section 5, we also form forecasts for individual PC portfolios and employ LASSO to examine potential time variability in the sentiment signal.
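The "standard regression setting" can be sketched as a one-predictor OLS forecast re-estimated every month. The expanding estimation window (consistent with the recursive scheme used elsewhere in the paper), the function name and the toy data are assumptions; the same helper would be applied separately to each factor portfolio's return series with the lagged sentiment index as `x`.

```python
import numpy as np

def expanding_ols_forecasts(y, x, min_window):
    """Forecast y[t+1] = a + b * x[t] using only the pairs (x[s], y[s+1]) with s < t."""
    T = len(y)
    f = np.full(T, np.nan)
    for t in range(min_window, T - 1):
        A = np.column_stack([np.ones(t), x[:t]])                # regressors x[0..t-1]
        coef, *_ = np.linalg.lstsq(A, y[1 : t + 1], rcond=None)  # targets y[1..t]
        f[t + 1] = coef[0] + coef[1] * x[t]                      # forecast from latest signal
    return f

# Toy check: returns that are an exact linear function of last month's index.
rng = np.random.default_rng(2)
x = rng.standard_normal(100)
y = np.empty(100)
y[0] = 0.0
y[1:] = 0.5 + 2.0 * x[:-1]
f = expanding_ols_forecasts(y, x, min_window=10)
```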

Historical Sample Mean
Finally, we use the in-sample average of factor portfolio returns as a forecast for the next period, as in Campbell and Thompson (2008). Such a simple non-parametric technique utilizes information in the returns only, allowing us to examine the incremental effect of sophisticated statistical techniques and different information sets.

Data
We replicate a large set of 72 characteristics, also considered by .
The characteristics are calculated using data from the Center for Research in Security Prices (CRSP) and Compustat. Our dataset covers the 50-year period from January 1970 to December 2019. The stock universe includes common stocks listed on NYSE, AMEX, and NASDAQ that have a record of month-end market capitalization on CRSP and a non-missing and non-negative common value of equity on Compustat. Additional information about the characteristics, including origination and characteristic description, can be found in Table IA.3 of the Internet Appendix.
For every month in our sample, stock returns at month t are matched against their respective characteristics at month t − 1. For accounting data, we allow at least six months to pass after the firm's fiscal year end before annual data become available, and at least four months for quarterly data. We also winsorize characteristics cross-sectionally at the 99% level to limit the influence of extreme outliers. Finally, to limit the influence of microcaps, we remove stocks with a price below $5 at portfolio formation and use NYSE breakpoints to split stocks into deciles, following Fama and French (2008). These adjustments make our inferences more robust, since many anomalies have been found to work better among small stocks (Fama and French, 2008).
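Cross-sectional winsorization is applied month by month. The sketch below reads the 99% level as clipping at the 1st and 99th cross-sectional percentiles; that reading, the function name and the NaN handling (missing values stay missing) are assumptions rather than details taken from the paper.

```python
import numpy as np

def winsorize_cross_section(x, lower=0.01, upper=0.99):
    """Clip one month's characteristic values at cross-sectional quantiles, ignoring NaNs."""
    lo, hi = np.nanquantile(x, [lower, upper])
    return np.clip(x, lo, hi)               # NaN entries pass through unchanged

x = np.append(np.arange(101.0), np.nan)     # toy cross-section with one missing value
w = winsorize_cross_section(x)
```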
We then move to the construction of the factor portfolios. For each anomaly, we first group stocks into value-weighted deciles based on their characteristic exposure in the previous month and then go long decile ten and short decile one, even if the characteristic is negatively related to future returns. 15 Such an approach requires no ex-ante information about the relationship between characteristics and returns and results in the highest dispersion in factor portfolio returns. Furthermore, given that factor timing strategies can take long and short positions on factors, the sign of factor portfolio returns is irrelevant. 16 Similarly to computing factor portfolio returns, the characteristics of factor portfolios can be computed by value-weighting the characteristics of stocks within each decile portfolio and then subtracting the bottom-decile value from the top-decile value. Notice that the portfolio constructed on a particular characteristic sort will, by construction, have the highest score on that characteristic among all factor portfolios. 17 Figure 2 displays the average monthly returns of the factor portfolios together with their 95% confidence intervals. Out of all the factor portfolios, 12-month momentum (mom12m) has the highest average return, followed by 6-month momentum (mom6m).
15 In the early years of the sample period, a few characteristics, such as those based on research and development expenses, do not have enough variation to form ten separate portfolios. To account for this, we allow the number of quantiles to be less than ten in months in which the required number of cut-off points is not reached. In other words, LS portfolio returns are calculated as long as there are at least two different values for the same characteristic in a particular month.
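One month of this construction can be sketched as below: NYSE-based decile breakpoints, value weights within each leg, and the same top-minus-bottom operation applied to returns and to the characteristic itself. This is a hypothetical NumPy illustration with toy inputs; it omits the $5 price filter and the reduced-quantile fallback described in footnote 15.

```python
import numpy as np

def long_short_month(char, ret, mcap, is_nyse, n_q=10):
    """One month of a top-minus-bottom quantile portfolio.

    Returns (value-weighted LS return, value-weighted LS characteristic spread),
    with quantile breakpoints computed from NYSE stocks only.
    """
    breaks = np.quantile(char[is_nyse], np.linspace(0, 1, n_q + 1)[1:-1])
    q = np.searchsorted(breaks, char, side="right")      # quantile index 0 .. n_q-1

    def vw(x, mask):                                      # value-weighted mean within a leg
        return mcap[mask] @ x[mask] / mcap[mask].sum()

    top, bottom = q == n_q - 1, q == 0
    return vw(ret, top) - vw(ret, bottom), vw(char, top) - vw(char, bottom)

char = np.arange(100.0)                  # toy characteristic
ret = 0.001 * char                       # toy next-month returns
ls_ret, ls_char = long_short_month(char, ret, np.ones(100), np.ones(100, dtype=bool))
```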
Out of the 72 portfolios, only 22 have significant average returns, confirming a high degree of redundancy among the documented factors (Hou et al., 2020). When we focus on the out-of-sample period only, this number goes down to 10, reflecting the decay in the performance of the anomalies over time (McLean and Pontiff, 2016). Further descriptive statistics for the factor portfolios can be found in Table IA.4 in the Internet Appendix.
16 Hence, strategies with a negative risk premium, such as asset growth, should on average be allocated to the short side of our factor timing portfolio.
17 For example, the momentum portfolio will always have the highest momentum score compared to all the other factor portfolios.
As already discussed, we proceed by recursively constructing five PC portfolios, i.e., linear combinations of the 72 factor portfolios, using either PCA or RPPCA. These PC portfolios are by construction affected by all factor portfolios in a time-varying fashion; as a result, at first glance they might appear to lack any economic interpretation.
To address this, we recursively regress each PC portfolio return on each of the 72 anomalies and estimate the monthly time series of R² values for each anomaly. The analysis, detailed in Section IA.6 of the Internet Appendix, shows that the constructed PC portfolios in fact have a fairly clear economic interpretation. For example, the first PC portfolio (based on PCA) loads heavily on volatility characteristics, the second loads more on value characteristics, and the third is mostly driven by momentum characteristics. Moreover, despite the recursive construction of the PC portfolios, these economic relations are very stable over time.
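For one estimation window, the R² values of these univariate regressions can be computed in a single vectorized step, since for an OLS regression with an intercept and one regressor the R² equals the squared sample correlation. The sketch below assumes NumPy and a hypothetical function name; the recursive version would simply repeat it on each expanding window.

```python
import numpy as np

def r2_on_each_anomaly(z, R):
    """R^2 of univariate OLS (with intercept) of PC return z (T,) on each anomaly R[:, n]."""
    zc = z - z.mean()
    Rc = R - R.mean(axis=0)
    corr = (Rc.T @ zc) / (np.linalg.norm(Rc, axis=0) * np.linalg.norm(zc))
    return corr ** 2                         # squared correlation = R^2 for one regressor

rng = np.random.default_rng(4)
R = rng.standard_normal((100, 3))
z = 2.0 * R[:, 0] + 1.0                      # PC return fully explained by anomaly 0
r2 = r2_on_each_anomaly(z, R)
```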

Predictive performance
We examine the out-of-sample performance of our predictive models using standard forecast evaluation measures and a monthly holding period, as in Campbell and Thompson (2008). We use an in-sample window of at least 240 months: the initial in-sample period covers 01/1970-12/1989, and forecasts are obtained out-of-sample for 01/1990-12/2019. As a first indication of the out-of-sample fit of our models, we estimate the out-of-sample R² for each individual PC portfolio $i$ as:

$$R^2_{OOS,i} = 1 - \frac{\sum_{t} \left(z_{i,t+1} - \hat{z}_{i,t+1}\right)^2}{\sum_{t} \left(z_{i,t+1} - \bar{z}_{i,t+1}\right)^2},$$

where $z_{i,t+1}$ is the realized return of PC portfolio $i$ at time $t+1$, $\hat{z}_{i,t+1}$ is its forecast, and $\bar{z}_{i,t+1}$ is the average PC portfolio return using information up to period $t$. We also estimate a Total OOS R², which pools squared errors across factor portfolios and across time:

$$R^2_{OOS} = 1 - \frac{\sum_{n}\sum_{t} \left(R_{n,t+1} - \hat{R}_{n,t+1}\right)^2}{\sum_{n}\sum_{t} \left(R_{n,t+1} - \bar{R}_{n,t+1}\right)^2},$$

where $R_{n,t+1}$ and $\hat{R}_{n,t+1}$ are the realized and forecasted returns of factor portfolio $n$, and $\bar{R}_{n,t+1}$ is its historical average using information up to period $t$. The Total OOS R² assesses the predictive ability of each model in a grand panel framework and is therefore a bulk measure of the accuracy of the model-based predictions of future factor portfolio returns. It is also important to highlight that LASSO may select characteristic components other than the first, potentially resulting in considerably different forecasts compared to the single factor case. 19 The historical sample mean is not included, as it has a zero Total OOS R² by construction.
implied by the highly negative Total OOS R². When returns are averaged over the past twelve months, results improve significantly, although the Total OOS R² remains negative. Conversely, models based on the book-to-market ratio, the issuer-repurchaser spread and investor sentiment deliver a positive Total OOS R², though they still fall behind the characteristic-based models that employ LASSO.
Ultimately, we are interested in the predictability of individual factor portfolios based on PC portfolio forecasts. As a measure of individual factor portfolio predictability, we estimate the individual OOS R² for all anomalies under the different models. We observe substantial anomaly predictability and find many predominant anomalies, such as value (bm) and the sales-to-price ratio (sp), to be highly predictable by observed characteristics. However, almost all characteristic-based models fail to predict anomalies based on a % change in accounting variables, such as % change in sales minus % change in receivables (pchsale_pchrect) and % change in the current ratio (pchcurrat), among others, located in the lower half of the heat map. These portfolios have returns statistically indistinguishable from zero and low covariance with the rest of the anomaly universe.
As a result, they do not load heavily on the first five components, and their performance is not adequately captured by PC portfolio forecasts. With regard to the benchmark models, only factor momentum results in high forecasting errors and therefore a negative OOS R² for almost all anomalies. The remaining benchmark models perform reasonably well, delivering a positive OOS R² for the majority of the anomaly universe.

Figure 3: OOS R² for individual anomalies under the characteristic-based models that employ LASSO and the benchmark models (the historical sample mean is implicit, as it serves as the reference in the R² metric). Negative values (in red) show a lack of predictive ability, while positive values (in green) show predictive ability of the underlying model for a given factor portfolio.
While the OOS R² allows a general quantitative comparison of the predictive performance of the various models, it is also important to assess the statistical significance of the differences among model forecasts. To make pairwise comparisons of out-of-sample predictive accuracy, we use the modified Diebold and Mariano (DM) test of Gu et al. (2020), which compares the cross-sectional average error differential between two models. The DM test statistic between models (1) and (2) is defined as $\mathrm{DM}_{1,2} = \bar{d}_{1,2} / \hat{\sigma}_{\bar{d}_{1,2}}$, where $\bar{d}_{1,2}$ and $\hat{\sigma}_{\bar{d}_{1,2}}$ are the mean and standard deviation of the error differential, defined as:

$$d_{1,2,t+1} = \frac{1}{N}\sum_{n=1}^{N}\left[\left(e^{(1)}_{n,t+1}\right)^2 - \left(e^{(2)}_{n,t+1}\right)^2\right],$$

where $e^{(1)}_{n,t+1}$ and $e^{(2)}_{n,t+1}$ denote the prediction errors for the return of factor portfolio $n$ at time $t+1$ under models (1) and (2), respectively, and $N$ is the number of factor portfolios. We observe that the characteristic-based models provide significantly higher predictive accuracy than the factor momentum models and the historical sample mean model, even though the results are less strong in the latter case. In contrast, the higher predictive accuracy compared to the other benchmark models does not translate into statistical significance.
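The statistic can be sketched from a panel of forecast errors. In this hypothetical NumPy version, the denominator is taken as the plain standard error of the mean differential, without the Newey-West adjustment often used in practice; that choice, the function name and the simulated errors are assumptions. A positive value indicates that model (2) has lower squared errors than model (1).

```python
import numpy as np

def dm_statistic(e1, e2):
    """Modified DM statistic for models (1) and (2) from (T, N) forecast-error panels."""
    d = (e1 ** 2 - e2 ** 2).mean(axis=1)                  # cross-sectional average per month
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))   # no HAC correction (assumption)

rng = np.random.default_rng(3)
e1 = 2.0 * rng.standard_normal((200, 10))   # noisier model (1)
e2 = rng.standard_normal((200, 10))         # more accurate model (2)
dm = dm_statistic(e1, e2)
```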
Nevertheless, predicting anomaly returns is of interest only insofar as it supports the construction of a profitable investment strategy. Specifically, in asset pricing the focus of interest is not so much on obtaining accurate predictions for individual returns, but rather on constructing portfolios with good risk-return properties (Nagel, 2021). Put differently, we are more interested in predicting cross-sectional differences in returns than in predicting individual returns in exact terms. In that sense, the Total OOS R² is just a distance measure that does not reflect whether models can distinguish strong from weak performers. Consider, for example, a stylized hypothetical scenario with three factor portfolios and a forecasting period of only one month. If the realized returns of the portfolios are 3%, 2% and 1%, the estimated historical sample means are 0%, 1% and 2%, and the model-implied predictions are 6%, 5% and 4%, respectively, then the predictive model will end up with a very negative OOS R² (-145.45%) even though it ranks the portfolios perfectly. Consequently, models that yield a higher Total OOS R² do not necessarily yield better portfolios in terms of average returns or Sharpe ratios. This argument explains, for example, why the 1-month factor momentum has been found empirically to be particularly profitable even though our results show that it has a very negative Total OOS R². The disconnect between OOS R² and investment performance is discussed in detail, both theoretically and empirically, in Kelly et al. (2022).
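The stylized example can be checked numerically. The helper below pools all squared errors, so applied to a (T, N) panel it gives the Total OOS R² and applied to a single cross-section it reproduces the figure quoted above: the model's squared errors sum to 27 (in squared percentage points) against 11 for the historical means, so the OOS R² is 1 − 27/11. Ranks are computed by a double `argsort`, which assumes no ties; the function name is illustrative.

```python
import numpy as np

def oos_r2(realized, forecast, benchmark):
    """1 - SSE(model) / SSE(historical mean), pooled over every entry of the inputs."""
    realized, forecast, benchmark = map(np.asarray, (realized, forecast, benchmark))
    return 1 - np.sum((realized - forecast) ** 2) / np.sum((realized - benchmark) ** 2)

realized = np.array([0.03, 0.02, 0.01])
hist_mean = np.array([0.00, 0.01, 0.02])
model = np.array([0.06, 0.05, 0.04])

r2 = oos_r2(realized, model, hist_mean)     # 1 - 27/11, about -1.4545 (-145.45%)
ranks = lambda x: np.argsort(np.argsort(x))  # rank transform, assuming no ties
rank_corr = np.corrcoef(ranks(realized), ranks(model))[0, 1]   # perfect ranking: 1.0
```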
Given that predictive accuracy in relative terms might be more important than predictive accuracy in exact terms, we proceed by exploring two alternative measures, namely the percentage of times that the sign of future factor portfolio returns is identified correctly and the average cross-sectional correlation between forecasted and realized returns. The former measure examines the ability of the models to predict the direction of individual factor portfolio returns, and the latter examines whether model-based forecasts capture the cross-sectional dispersion in factor portfolio returns. When considering the single factor predictive models in Panel A, we observe that the models that use PLS for the RHS are far superior to the models that use PCA for the RHS, even though Table 1 shows that they exhibit a worse OOS R². To understand this discrepancy, consider the example of the PCA-PLS model. This model has a forecasting error lower than that of the historical sample mean model 54% of the time. In those cases, it exhibits an OOS R² of 12.24% and an average cross-sectional correlation of 41%. In the remaining 46% of the cases, it exhibits an OOS R² of -17.17% and an average cross-sectional correlation of -28%. This means that, while the model's low overall OOS R² is driven by some large forecasting errors, its high overall cross-sectional correlation is due to the fact that in the majority of cases it is particularly informative about the ranking of next period portfolio returns. Importantly, Panel B reveals that accounting for further components under a LASSO approach harmonizes the performance across all four characteristic-based models. Finally, Panel C shows that the benchmark models display slightly lower proportions of correct signs and markedly lower average cross-sectional correlations compared to the models in Panel B. 20 Overall, results confirm that characteristic-based models can better distinguish anomaly performance compared to alternative approaches.
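Both measures are straightforward to compute from (T, N) arrays of realized and forecasted returns. The sketch below assumes the cross-sectional correlation is a Pearson correlation computed across factor portfolios each month and then averaged over months; that choice, the function names and the toy numbers are illustrative assumptions.

```python
import numpy as np

def sign_hit_rate(realized, forecast):
    """Share of (month, factor) pairs whose forecast has the same sign as the realised return."""
    return float(np.mean(np.sign(realized) == np.sign(forecast)))

def avg_cross_sectional_corr(realized, forecast):
    """Average over months (rows) of the Pearson correlation across factor portfolios (columns)."""
    return float(np.mean([np.corrcoef(r, f)[0, 1] for r, f in zip(realized, forecast)]))

realized = np.array([[0.01, -0.02, 0.03], [0.03, 0.02, -0.01]])
forecast = np.array([[0.02, -0.01, 0.01], [-0.01, 0.01, -0.02]])
hit = sign_hit_rate(realized, forecast)            # 5 of 6 signs match
corr = avg_cross_sectional_corr(realized, forecast)
```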
Recall from Section 3.2 that our predictive approach entails cross-sectional standardization of each characteristic in each month. Therefore, given the success of the approach, a natural question is which source of variation in the characteristics of the factor portfolios leads to predictability. This issue is discussed in detail in Section IA.3 of the Internet Appendix. We show that the main source of time variation comes from the higher moments of the cross-sectional distribution of the characteristics. This is intuitive, given that the literature on stock return predictability already establishes that the predictive power of several characteristics is closely related to their non-normal distribution. For example, Cooper et al. (2008) show that asset growth is highly positively skewed and, accordingly, its predictive power is mainly driven by high- rather than low-asset-growth stocks. Another source of variation comes from the time-varying correlations across the different characteristics. For example, in a given month the correlation between stock momentum and value may be high, so that the standardized momentum scores of the respective factor portfolios are similar, while in another month the correlation may be low, so that the momentum scores of the respective factor portfolios are completely different. In the latter case, there is additional information content that can be exploited. 21
20 It is noteworthy that, similar to the PLS single factor models, the factor momentum models perform reasonably well despite their negative OOS R² values. In fact, the 1-month factor momentum has a forecasting error lower than that of the historical sample mean model only 20% of the time. In those cases, it exhibits an OOS R² of 38.93% and an average cross-sectional correlation of 60%. In the remaining 80% of the cases, it exhibits an OOS R² of -153.32% and an average cross-sectional correlation of -7%. This means that the good overall performance of the 1-month factor momentum is driven by only a small subsample of observations during which it predicts future factor portfolio returns particularly well in both exact and relative terms.
21 Another source of variation naturally stems from the recursively estimated weighting vector $w_{i,t}$. However, we show that this vector remains relatively stable across time.
Finally, we examine the implications of applying LASSO on the sets of characteristic components in terms of model complexity. Our approach allows for the number of features to vary across factor portfolios and across time, enabling us to see when the characteristic signal is strong and when it diminishes. Figure

Investment performance
In this section, we assess the performance of each model in terms of economic rather than statistical contribution and examine how return forecasts can be translated into factor timing strategies. We construct three different strategies and assess their performance using a monthly holding period and standard portfolio evaluation measures. The first strategy is a simple long-short strategy (LSS), or an LS portfolio of factor portfolios. Factor portfolios are grouped into equally-weighted deciles based on their return forecasts and a long-short strategy is constructed that goes long the top 10% and short the bottom 10% of the anomalies. Such a strategy focuses on the extremes of the conditional returns distribution and neglects factor portfolios that lie in the middle. Hence, LSS will work well as long as the models can identify anomalies with very high or very low expected returns at each period, even if they are indecisive about anomalies with conditional returns close to zero.
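The LSS portfolio for one month can be sketched as an equal-weighted top-minus-bottom sort on the forecasts. How a "decile" is defined when the number of factor portfolios is not a multiple of ten is an assumption here: the sketch takes the floor of 10% of the factors per leg (7 of 72), with at least one factor per leg. The function name and toy inputs are illustrative.

```python
import numpy as np

def lss_return(forecast, realized, frac=0.10):
    """Equal-weighted long top-`frac`, short bottom-`frac` factor portfolios ranked by forecast."""
    k = max(1, int(np.floor(frac * len(forecast))))   # factors per leg (assumed rounding)
    order = np.argsort(forecast)
    return realized[order[-k:]].mean() - realized[order[:k]].mean()

forecast = np.arange(20.0)           # toy forecasts for 20 factor portfolios
realized = np.arange(20.0) / 100     # toy next-month returns
ret = lss_return(forecast, realized) # long factors 18, 19 and short factors 0, 1
```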
The second investment strategy is similar to the time-series factor momentum (TSFM) strategy of Gupta and Kelly (2019). TSFM scales factor portfolio returns $R_{t+1,\cdot}$ according to return forecasts $\hat{R}_{t+1,\cdot}$. The scaling vector $s_{t,n}$ is obtained by dividing return forecasts by each factor's in-sample monthly volatility $\sigma_{t,n}$ and capping the result at ±2, as shown below:

$$s_{t,n} = \min\left\{\max\left\{\frac{1}{\sigma_{t,n}}\hat{R}_{t+1,n},\; -2\right\},\; 2\right\}.$$
The strategy goes long factors with positive scores and short factors with negative scores. The scores are rescaled to form unit dollar weights for the long and the short leg. 22 Multiplying next period factor portfolio returns by their respective weights yields the return of the strategy:

$$\mathrm{TSFM}_{t+1} = \frac{\sum_n \mathbb{1}_{\{s_{t,n}>0\}}\, R_{t+1,n}\, s_{t,n}}{\sum_n \mathbb{1}_{\{s_{t,n}>0\}}\, s_{t,n}} - \frac{\sum_n \mathbb{1}_{\{s_{t,n}\le 0\}}\, R_{t+1,n}\, s_{t,n}}{\sum_n \mathbb{1}_{\{s_{t,n}\le 0\}}\, s_{t,n}}.$$
The main difference between LSS and TSFM is that, while both are technically long-short, TSFM invests in the whole universe of factor portfolios rather than only in factor portfolios with extreme return forecasts. Furthermore, the number of factor portfolios in each leg, as well as their relative weights, can vary under TSFM but remains constant under LSS. More concretely, the sign of the return forecast determines whether the anomaly is bought or sold, while the magnitude of the forecast determines its relative weight.
Hence, under TSFM the long and the short legs can have a disproportional number of constituents and in extreme cases, the strategy can converge to long or short only.
The last strategy, also from Gupta and Kelly (2019), is the cross-sectional version of TSFM (CSFM). The main difference between CSFM and TSFM is that the cross-sectional median is subtracted from the return forecasts before scaling with volatility. This strategy takes positions in factor portfolios that are expected to outperform or underperform relative to their peers. For example, if return forecasts are positive for all factor portfolios, TSFM will take a long position in all of them, while CSFM will go long only in those with above-median return forecasts and short the rest. Hence, even if the models cannot identify the sign correctly, this strategy will still be profitable as long as forecasts are consistent in relative terms, similarly to LSS:

$$s^{CS}_{t,n} = \min\left\{\max\left\{\frac{1}{\sigma_{t,n}}\left(\hat{R}_{t+1,n} - \operatorname{median}_m \hat{R}_{t+1,m}\right),\; -2\right\},\; 2\right\}.$$
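The TSFM and CSFM returns for one month can be sketched as below, following the capped volatility-scaled scores and the unit-dollar long and short legs described above. The treatment of an empty leg (it simply contributes nothing, so the strategy becomes long-only or short-only) is an assumption consistent with the remark that TSFM can converge to long or short only; function names and toy numbers are illustrative.

```python
import numpy as np

def timing_scores(forecast, sigma, cross_sectional=False):
    """Volatility-scaled scores capped at +/-2; CSFM first subtracts the median forecast."""
    x = forecast - np.median(forecast) if cross_sectional else forecast
    return np.clip(x / sigma, -2.0, 2.0)

def timing_return(scores, realized):
    """Long the positive-score leg and short the non-positive leg, each with unit dollar weight."""
    long_leg, short_leg = scores > 0, scores <= 0
    ret = 0.0
    if scores[long_leg].sum() > 0:
        ret += realized[long_leg] @ scores[long_leg] / scores[long_leg].sum()
    if scores[short_leg].sum() < 0:      # an all-zero or empty short leg is skipped
        ret -= realized[short_leg] @ scores[short_leg] / scores[short_leg].sum()
    return ret

forecast = np.array([0.02, -0.01, 0.01])
sigma = np.array([0.01, 0.01, 0.01])
realized = np.array([0.03, -0.02, 0.00])
tsfm = timing_return(timing_scores(forecast, sigma), realized)                       # about 0.04
csfm = timing_return(timing_scores(forecast, sigma, cross_sectional=True), realized)  # about 0.05
```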

unable to predict the cross-sectional dispersion of factor portfolio returns, implying again that a lot of variation in the characteristics is irrelevant in asset return prediction. As a result, strategies based on PCA and RPPCA deliver returns indistinguishable from zero, with returns for PCA even becoming negative. Conversely, when PLS is used for the RHS all strategies deliver positive and significant returns, reflecting the ability of the method to concentrate return-relevant variation into a single predictor.
Panel B of Table 4 shows that the use of further components in combination with LASSO uniformly improves investment performance across all models. This result is again consistent with the results reported in Table 3. In particular, all strategies deliver highly significant returns, surpassing the t-statistic threshold of three proposed by Harvey et al. (2016).
Turning to the specifics, the use of PCA for the LHS leads to investment strategies with higher average returns while the use of RPPCA leads to higher Sharpe ratios, irrespective of the strategy or the RHS model. Furthermore, although strategies that utilize PCA for the LHS have higher volatility, they also exhibit a higher hit-rate and a lower max drawdown, reflecting higher consistency and lower downside risk. Overall, results are now similar across models, suggesting that once further components are considered, no significant difference arises across methods.
Panel C displays the results for the benchmark models. In line with prior literature (e.g., Gupta and Kelly (2019)), factor momentum using a 1-month formation period achieves the highest return among the benchmark models for the LSS strategy, while the 12-month signal delivers higher returns for TSFM and CSFM. Using the book-to-market ratio, issuer-repurchaser spread or investor sentiment as predictors results in strategies with moderate returns and Sharpe ratios. The historical sample average strategy delivers low, albeit statistically significant, average returns. Such a strategy produces conservative return forecasts and, as a result, takes more static positions than the rest of the models. Comparing results across panels, characteristic-based models that employ LASSO outperform all benchmark models under all three strategies, demonstrating the benefits of conditioning factor portfolio returns on observable characteristics under a regularized framework.
To compare the performance of the various models across time, Figure 5 presents the cumulative return performance of the factor timing portfolios under the three investment strategies. For conciseness, we only display the performance of the characteristic-based models employing LASSO together with the benchmark models. Graphs on the left show the cumulative performance over the whole out-of-sample period, and graphs on the right focus on the last ten years. As can be seen from the graphs, the 1-month factor
Notably, the book-to-market approach works as well as the characteristic-based models in later years. Both approaches isolate the first five PCs of factor portfolio returns and use a characteristic-based measure to create forecasts. As such, the results highlight the importance of focusing on the main sources of variation and the ability of characteristics to explain the dynamics of factor portfolios. Characteristic-based models outperform the rest of the benchmarks under all three strategies, with the difference being more pronounced for the LSS strategy, as it focuses solely on the most prominent subset of factor portfolios. Overall, the profitability of the benchmark strategies erodes significantly in later years, suggesting that the informativeness of alternative predictors about future factor portfolio returns has faded.
It is also important to note that factor timing strategies based on observed characteristics yield positive returns in the most recent period, even though most factors have been found empirically to die out over time (Chordia et al., 2014; McLean and Pontiff, 2016; . Corroborating this evidence, a comparison between Table 4 and Table IA.4 reveals that characteristic-based factor timing strategies exhibit investment performance superior to that of unconditional factor portfolios. In that sense, our paper acknowledges that unconditional risk premia lack robustness and shows that focusing on the predictability of conditional risk premia can help an investor expand her investment opportunity set. In a similar vein,  find that strong factor portfolio predictability implies a stochastic discount factor that is much more volatile than previously thought.
Lastly, a natural question concerns the trading positions that our characteristic-based models take over time. The analysis presented in Section IA.7 of the Internet Appendix provides some interesting insights. First, even though prominent anomalies such as mom12m and retvol are heavily traded, the factor timing strategies rotate among multiple anomalies and do not focus on only a small subset with high unconditional returns. Second, several anomalies appear almost equally often in the long and the short legs. Finally, anomalies that have only a small impact on the PC portfolios are hardly considered by our factor timing strategies, which is expected given that their return forecasts are by construction tilted towards zero.

Alternative approaches
In this section, we examine different estimation approaches. Our method uses a large collection of characteristics and combines different dimension reduction and regularization techniques to achieve robust out-of-sample predictability. As such, it is important to examine where the predictability stems from by evaluating the incremental effect of each contributor on the out-of-sample performance. Furthermore, it is important to assess whether the benchmark models can beat our characteristic-based models once dimension reduction and regularization techniques are also used in their cases.
Starting with the characteristics, the simplest approach is to forecast each anomaly using the time series of its own characteristic spread. Alternatively, one can forecast each anomaly individually using the whole collection of characteristics and can further employ   , predicting each anomaly by its own spread is not particularly successful, as it yields a negative OOS R² and a relatively low average cross-sectional correlation. When we incorporate the full set of characteristics for each anomaly, the OOS R² worsens, possibly due to overfitting, but the average cross-sectional correlation improves in two out of the three cases (the exception being the model that uses PLS on the RHS). When we further condense the information content of the anomalies into five PC portfolios, the average cross-sectional correlation increases even more. Nevertheless, the OOS R² remains negative and the cross-sectional correlation is still around 6%, clearly lower than the 8%-9% delivered by our main models that also use dimension reduction on the RHS (Panel B of Table 3). Overall, the results of Panel A corroborate the importance of using the full set of characteristics for factor timing purposes, while further highlighting the additional benefits of incorporating dimension reduction and regularization techniques on both sides of the forecasting exercise.
Panel B shows the results for the alternative specifications of the benchmark models.
Applying the momentum signal to the PCs instead of individual anomalies has an inconsistent effect on forecasting performance, as it improves the OOS R² but reduces the cross-sectional correlation. Using the book-to-market ratios of all PC portfolios to predict each of them individually is not particularly fruitful, with both performance measures worsening compared to the main PCA-BM model. Predicting each anomaly by its own book-to-market ratio delivers a positive OOS R² and a higher cross-sectional correlation, but it still falls behind the baseline BM model. In terms of the issuer-repurchaser spread, using the spreads of all portfolios instead of the spread of each individual portfolio, as in the baseline IR Spread model, reduces the OOS R² but considerably improves the cross-sectional correlation. 23 Nevertheless, the use of LASSO does not make any substantial contribution in this case, with the OOS R² still negative and the cross-sectional correlation decreasing. Finally, using the investor sentiment index to predict the PC portfolios, or in combination with LASSO to predict individual anomalies, has little effect compared to the baseline Sentiment model. Overall, we find mixed results for the modified benchmark models, with the dimension reduction and regularization additions improving the models only occasionally. In any case, even the best modified models exhibit clearly worse performance than our main characteristic-based models.

Table 6: Portfolio evaluation measures for long-short (LSS), time-series (TSFM) and cross-sectional (CSFM) strategies under the alternative specifications for the sample period 1990-2019. Average Return: average monthly return; Standard Deviation: monthly standard deviation; Sharpe ratio: monthly Sharpe ratio; t-statistic: t-statistic on H0: Average Return = 0; Hit-Rate: percentage of months in which the strategy delivered a positive return; Max Drawdown: maximum cumulative loss.
Table 6 presents the portfolio evaluation results for the modified models. Starting with Panel A, the investment performance of the modified characteristic-based models is broadly in line with the cross-sectional correlations from Table 5. In particular, all strategies exhibit good investment performance, while the average returns and Sharpe ratios tend to improve when, on top of using the whole set of portfolio characteristics, we further incorporate PCA and/or LASSO in the forecasting exercise. Still, even the best-performing modified models fall behind the main ones presented in Panel B of Table 4. This again confirms that using multiple portfolio characteristics is indispensable for forming a successful factor timing strategy, and that the dimension reduction and regularization techniques provide additional benefits. Turning to Panel B, the PCA-based momentum models, the model that uses all issuer-repurchaser spreads, the model that uses each portfolio's book-to-market ratio, and the model that employs LASSO together with market sentiment appear to be the strongest. This is unsurprising, given that these models also deliver the highest cross-sectional correlations in Table 5. Importantly, even these alternative benchmark models exhibit weaker investment performance than our preferred characteristic-based models in Panel B of Table 4. Overall, the alternative information sets have lower factor timing ability than the set of portfolio characteristics, even when they are enhanced with dimension reduction or LASSO.
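The forecast-accuracy measures (OOS R², cross-sectional correlation) and the Table 6 portfolio measures can be computed roughly as in the sketch below. It is illustrative rather than the paper's code, and several choices are assumptions: the OOS R² uses a zero-return benchmark by default, the cross-sectional correlation is a Pearson correlation across anomalies averaged over months, the Sharpe ratio ignores the risk-free rate (natural for zero-investment positions), and the maximum drawdown is the largest peak-to-trough fall in cumulative summed returns. All function names are hypothetical.

```python
import numpy as np

def oos_r2(realized, forecast, benchmark=0.0):
    """Out-of-sample R^2 against a constant benchmark forecast (assumed zero)."""
    sse_model = np.sum((realized - forecast) ** 2)
    sse_bench = np.sum((realized - benchmark) ** 2)
    return 1.0 - sse_model / sse_bench

def avg_cross_sectional_corr(realized, forecast):
    """Mean over months of the correlation between forecasts and realized
    returns across anomalies; inputs are (T x N) with rows = months."""
    return float(np.mean([np.corrcoef(r, f)[0, 1] for r, f in zip(realized, forecast)]))

def evaluate_strategy(returns):
    """Table 6-style measures for a 1-D array of monthly strategy returns."""
    mean, std = returns.mean(), returns.std(ddof=1)
    cum = np.concatenate(([0.0], np.cumsum(returns)))  # cumulative summed return
    return {
        "average_return": mean,
        "standard_deviation": std,
        "sharpe_ratio": mean / std,                      # monthly, no risk-free rate
        "t_statistic": mean / (std / np.sqrt(returns.size)),
        "hit_rate": 100.0 * np.mean(returns > 0),        # % of months with a gain
        "max_drawdown": np.max(np.maximum.accumulate(cum) - cum),
    }

# Toy monthly returns (made-up numbers, not results from the paper)
monthly = np.array([0.02, -0.01, -0.03, 0.04])
print(evaluate_strategy(monthly)["hit_rate"])  # 50.0
```

Passing the historical mean as `benchmark` gives the other common OOS R² convention.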

Conclusion
We investigate the predictability of factor portfolios from their portfolio characteristics, going over and above existing methods for predicting factor portfolio returns and examining the possibility that factor portfolios are predictable by characteristics other than their own. Our approach offers a natural extension of the stock return predictability problem, and our findings shed light on the evolution of the underlying return drivers over time. Under our empirical framework, a large collection of stock characteristics is used to initially construct the LS portfolios and subsequently predict their performance. A key aspect of our methodology is the reduction of the dimensions of the predictability problem, which we achieve by independently shrinking the number of predictors and forecasting targets. Our approach provides a new framework for dealing with panel data, allowing each source of variation to be approximated by models of different complexity. By using a flexible model specification that combines LASSO with dimension reduction techniques, we allow the number of predictors to vary across PC portfolios and over time. We find this approach to be especially fruitful, as it considerably improves results over a static single latent factor model.
In terms of factor portfolio predictability, we observe significant benefits from timing factor portfolio returns using observed characteristics. These benefits go over and above existing methods documented in the literature, highlighting the importance of considering the information in the characteristics collectively. Specifically, the dominant PC portfolios are highly predictable by the information contained in their characteristics, and this predictability can be easily extended to individual anomalies. In that sense, dimension reduction techniques not only ensure the computational tractability of the estimation problem, but also improve forecasting and investment performance by enabling us to focus on the sources of variation that are most predictable. The performance of our factor timing strategies is superior to that of any individual anomaly and persists over the later years of the sample period, demonstrating the benefits of timing over static factor investing. Hence, in the context of anomaly return prediction it is important to (1) account for the information contained in multiple characteristics, […]

IA.1 Factor variation explained by the PC portfolios
One of the key elements of our predictive approach is condensing the information of the factor portfolios using either conventional PCA or the RPPCA of Lettau and Pelger (2020). While the estimation of the PC portfolios is straightforward, the number of PC portfolios to be retained remains an empirical question. Figure […]

Conventional PCA relies on the eigendecomposition of the variance-covariance matrix Σ of factor portfolio returns,

$$\Sigma = W \Lambda W^{\prime},$$

where W is an (N × N) matrix whose i-th column $w_i$ is the i-th eigenvector of Σ, and Λ is a diagonal matrix whose diagonal elements are the corresponding eigenvalues in decreasing order. The i-th eigenvector $w_i$ solves:

$$w_i = \arg\max_{w} \; w^{\prime} \Sigma w \quad \text{s.t.} \quad w^{\prime} w = 1, \quad w^{\prime} w_j = 0 \;\; \text{for } j = 1, \dots, i-1. \qquad (2)$$

Practically, the solution in Equation (2) is obtained via a singular value decomposition (SVD) of the (T × N) matrix of factor portfolio returns R. The (T × N) matrix of PCs is then obtained by multiplying the matrix of factor portfolio returns with the eigenvectors, Z = RW. Notice that since W is an orthogonal matrix, this is equivalent to regressing the factor portfolio returns on the eigenvectors.
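The PCA step (covariance eigenvectors via SVD, then Z = RW) can be sketched in NumPy as follows. This is a minimal illustration on simulated data, assuming that the eigenvectors are taken from the SVD of the demeaned return matrix; the helper name `pca_portfolios` is hypothetical. The last lines check the claim that, because W is orthogonal, regressing each month's cross-section of returns on the eigenvectors returns exactly the PC portfolio returns.

```python
import numpy as np

def pca_portfolios(R: np.ndarray, k: int):
    """PC portfolios from a (T x N) panel of factor portfolio returns R.

    Assumption: the covariance eigenvectors come from the SVD of the demeaned
    return matrix. Returns the (N x k) eigenvectors W_k, the (T x k) PC
    portfolio returns Z = R W_k, and the k largest eigenvalues of Sigma.
    """
    T = R.shape[0]
    Rc = R - R.mean(axis=0)                        # demean each factor portfolio
    _, s, Vt = np.linalg.svd(Rc, full_matrices=False)
    W = Vt.T                                       # columns sorted by decreasing singular value
    eigenvalues = s**2 / T                         # eigenvalues of Sigma = Rc' Rc / T
    return W[:, :k], R @ W[:, :k], eigenvalues[:k]

# Simulated returns (not the paper's data): 120 months, 6 factor portfolios
rng = np.random.default_rng(0)
R = rng.normal(size=(120, 6))
W, Z, lam = pca_portfolios(R, k=6)

# Cross-sectional regression of each month's returns on the eigenvectors:
# since W is orthogonal, the slope coefficients are W' r_t, i.e. the PC returns.
coef, *_ = np.linalg.lstsq(W, R.T, rcond=None)
print(np.allclose(coef.T, Z))  # True
```

In the paper's setting, k = 5 PC portfolios are retained; here all six are kept only so that the regression equivalence can be checked exactly.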
PCA is also used to regularize the characteristics of each PC portfolio, $H_i$. This logic is identical to principal component regression (PCR), where the predictors are transformed into their PCs and the coefficients of the low-variance PCs are set to zero.
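The PCR logic applied to the characteristics $H_i$ can be sketched as below. It is a hedged illustration, not the paper's estimation code. It assumes the characteristics at month t are aligned with the PC portfolio return at t+1 by the caller, includes an intercept, and uses the hypothetical helper `pcr_forecast`. Keeping only the first m characteristic PCs is what sets the coefficients of the low-variance PCs to zero.

```python
import numpy as np

def pcr_forecast(H, z_next, m, H_new):
    """Principal component regression (PCR) sketch for one PC portfolio.

    H:      (T x M) characteristics observed at t (rows = months).
    z_next: (T,) PC portfolio return at t+1, aligned with the rows of H.
    m:      number of characteristic PCs kept; the coefficients of the
            remaining low-variance PCs are implicitly set to zero.
    H_new:  (K x M) characteristics to forecast from.
    """
    mu = H.mean(axis=0)
    _, _, Vt = np.linalg.svd(H - mu, full_matrices=False)
    V_m = Vt[:m].T                                           # (M x m) leading loadings
    X = np.column_stack([np.ones(len(H)), (H - mu) @ V_m])  # intercept + characteristic PCs
    beta, *_ = np.linalg.lstsq(X, z_next, rcond=None)
    return beta[0] + ((H_new - mu) @ V_m) @ beta[1:]

# Simulated characteristics and target (not the paper's data)
rng = np.random.default_rng(1)
H = rng.normal(size=(200, 5))
z_next = H @ np.array([0.5, -0.2, 0.0, 0.3, 0.1]) + 0.1 * rng.normal(size=200)
H_new = rng.normal(size=(3, 5))
print(pcr_forecast(H, z_next, m=2, H_new=H_new))
```

With m equal to the number of characteristics, PCR reproduces ordinary least squares on all of them; any smaller m is a restriction.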

Risk Premium PCA (RPPCA)
In general, PCA extracts factors that best explain time-series variation in the data.
The variance-covariance matrix of factor portfolio returns can also be written as $\Sigma = \frac{1}{T} R^{\prime} R - \bar{R}\bar{R}^{\prime}$, where $\bar{R}$ is an (N × 1) vector of average portfolio returns. Since average returns are subtracted, PCA utilizes information from the second moment while it neglects information from the first moment of the data. Some factors may have weak explanatory power in terms of variance if they only affect a small proportion of assets, but may still be important in an asset pricing context. In this case, conventional PCA is unable to detect the true factors (Onatski, 2012). Under an Arbitrage Pricing Theory framework, exposure to systematic risk factors should be able to explain the cross-section of expected asset returns (Ross, 1976). As such, latent factors should be able to simultaneously capture time-series variation and explain the cross-section of average returns. Lettau and Pelger (2020) propose a new estimator by augmenting PCA with a penalty term to account for pricing errors in average returns. RPPCA is a generalization of PCA, regularized by a cross-sectional pricing error, and can be implemented by a simple eigenvalue decomposition of the variance-covariance matrix of asset returns after a simple transformation:

$$\frac{1}{T} R^{\prime} R + \gamma \bar{R}\bar{R}^{\prime},$$

where γ governs the weight placed on the pricing error; γ = −1 recovers Σ and hence conventional PCA. Essentially, the method applies PCA to the variance-covariance matrix with over-weighted means. Again, we apply SVD on $\frac{1}{T} R^{\prime} R + 10\bar{R}\bar{R}^{\prime}$ (i.e., γ = 10) and retain the first five eigenvectors to calculate the PC portfolios $Z_i \in \mathbb{R}^{T \times 1}$, i = 1, …, 5. Since the purpose of RPPCA is to detect weak factors within asset returns, and given that characteristics are standardized due to their differences in scale, it would not be sensible to apply it to $H_i \in \mathbb{R}^{T \times M}$, i = 1, …, 5.
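A minimal sketch of the RPPCA transformation follows. It builds the second-moment matrix with the over-weighted mean term and keeps its leading eigenvectors. The text applies SVD; for a symmetric positive semi-definite matrix the eigendecomposition used here (`numpy.linalg.eigh`) yields the same vectors. The helper name `rppca_weights` and the simulated data are assumptions. Setting γ = −1 recovers conventional PCA of the covariance matrix.

```python
import numpy as np

def rppca_weights(R, gamma=10.0, k=5):
    """Leading eigenvectors of (1/T) R'R + gamma * rbar rbar' (RP-PCA sketch).

    R: (T x N) factor portfolio returns. gamma = -1 gives the covariance matrix
    (conventional PCA); gamma = 10 is the value used in the text.
    Returns an (N x k) matrix of eigenvectors in decreasing eigenvalue order.
    """
    T = R.shape[0]
    r_bar = R.mean(axis=0)
    M = R.T @ R / T + gamma * np.outer(r_bar, r_bar)
    eigvals, eigvecs = np.linalg.eigh(M)        # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:k]]

# Simulated returns with a nonzero mean (not the paper's data)
rng = np.random.default_rng(3)
R = rng.normal(loc=0.1, size=(500, 10))
W_rp = rppca_weights(R)                         # (10 x 5), gamma = 10
Z_rp = R @ W_rp                                 # RP-PCA portfolio returns
```

As γ grows, the first eigenvector tilts toward the direction of the mean-return vector, which is how RPPCA picks up factors that matter for average returns but explain little variance.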
Instead, we apply SVD on each […]

Partial Least Squares (PLS)
One of the limitations of PCA is that it focuses on condensing the covariation within the predictors. However, some of the characteristics may have no predictive power, meaning that PCA-based PCs can contain information that is ultimately useless in the forecasting exercise. In contrast, PLS constructs linear combinations of the characteristics based on their relationship with future returns by directly exploiting the covariance between the two. The method can be used to rotate $H_i$ into linear combinations that best explain $Z_i$ while still being orthogonal to each other.
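The contrast between PLS and PCA can be made concrete with a textbook single-target PLS (PLS1) sketch. Each weight vector is proportional to the covariance of the (deflated) characteristics with the target, and deflation makes successive components orthogonal. This is an assumed standard variant written with a hypothetical helper name; the exact PLS formulation used in the paper is not reproduced in this section.

```python
import numpy as np

def pls1_components(H, z, n_comp):
    """PLS1 sketch: components of H built to covary with the target z.

    H: (T x M) characteristics (centered inside); z: (T,) target returns.
    Each unit-norm weight vector maximizes the covariance of H w with z on the
    deflated predictors, so the components are mutually orthogonal.
    Returns the (T x n_comp) matrix of PLS components.
    """
    X = H - H.mean(axis=0)
    y = z - z.mean()
    scores = []
    for _ in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)                   # direction of maximal covariance with y
        t = X @ w                                # PLS component
        scores.append(t)
        X = X - np.outer(t, t @ X) / (t @ t)     # deflate: remove the part of X explained by t
    return np.column_stack(scores)

# Simulated characteristics and target (not the paper's data)
rng = np.random.default_rng(2)
H = rng.normal(size=(300, 6))
z = H[:, 0] - 0.5 * H[:, 3] + rng.normal(size=300)
T_pls = pls1_components(H, z, n_comp=3)
```

By construction, the first PLS component has at least as large a covariance with the target as the first PCA component of the same characteristics, which is the sense in which PLS ignores predictor variation that is useless for forecasting.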
1 A value of γ = 10 is also consistent with what the authors identify as optimal in their empirical exercise.
The vector of weights for the i-th PC is estimated recursively by solving: […] Equation (4) highlights the distinction between PLS and PCA. Specifically, by making a comparison between Equation (2) and Equation (4) […] of PC portfolio $Z_i \in \mathbb{R}^{T \times 1}$ on a set of characteristic PCs $X_i \in \mathbb{R}^{T \times M}$ as: […] where δ is a hyperparameter that determines the degree of regularization […]. High values of δ result in solutions that set many coefficients exactly equal to zero, delivering parsimonious models. Using the coordinate descent algorithm of Friedman et al. (2010), we fit many values of δ simultaneously and pick the one that minimizes the forecasting error in the validation period.
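A minimal sketch of the LASSO step and the validation-based choice of δ is given below, written in plain NumPy rather than with glmnet. Several details are assumptions, not the paper's specification:

- the objective scaling (1/2T)‖y − Xb‖² + δ‖b‖₁ follows the glmnet convention;
- predictors and target are centered on the training window;
- the δ grid is a log-spaced path from δ_max (the smallest value giving all-zero coefficients) down to δ_max/1000, fitted with warm starts;
- a single train/validation split is used.

The helper names are hypothetical.

```python
import numpy as np

def soft_threshold(x, thr):
    """S(x, thr) = sign(x) * max(|x| - thr, 0)."""
    return np.sign(x) * max(abs(x) - thr, 0.0)

def lasso_cd(X, y, delta, beta0=None, max_sweeps=500, tol=1e-10):
    """Cyclic coordinate descent for (1/2T) ||y - X b||^2 + delta * ||b||_1.

    X (T x M) and y (T,) are assumed centered, so no intercept is fitted.
    """
    T, M = X.shape
    beta = np.zeros(M) if beta0 is None else beta0.copy()
    col_sq = (X ** 2).sum(axis=0) / T
    resid = y - X @ beta
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(M):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual, then soft-threshold
            rho = X[:, j] @ resid / T + col_sq[j] * beta[j]
            new = soft_threshold(rho, delta) / col_sq[j]
            change = new - beta[j]
            if change != 0.0:
                resid -= X[:, j] * change
                beta[j] = new
                max_change = max(max_change, abs(change))
        if max_change < tol:
            break
    return beta

def lasso_select(X_tr, y_tr, X_val, y_val, n_deltas=30):
    """Fit a decreasing path of delta values with warm starts and keep the one
    with the lowest validation MSE. Returns (mse, delta, beta, intercept)."""
    mu_x, mu_y = X_tr.mean(axis=0), y_tr.mean()
    Xc, yc = X_tr - mu_x, y_tr - mu_y
    delta_max = np.max(np.abs(Xc.T @ yc)) / len(yc)   # all coefficients are zero at this value
    best = (np.inf, None, None, None)
    beta = np.zeros(X_tr.shape[1])
    for delta in delta_max * np.logspace(0, -3, n_deltas):
        beta = lasso_cd(Xc, yc, delta, beta0=beta)
        intercept = mu_y - mu_x @ beta
        mse = np.mean((y_val - intercept - X_val @ beta) ** 2)
        if mse < best[0]:
            best = (mse, delta, beta.copy(), intercept)
    return best

# Simulated sparse problem (not the paper's data)
rng = np.random.default_rng(4)
X = rng.normal(size=(400, 8))
y = X @ np.array([1.5, 0, 0, -2.0, 0, 0, 0, 0]) + 0.5 * rng.normal(size=400)
mse, delta, beta, intercept = lasso_select(X[:300], y[:300], X[300:], y[300:])
print(np.round(beta, 2))
```

Because the selected δ can differ across PC portfolios and forecasting periods, the number of active characteristic PCs can vary over time as well.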

IA.3 Sources of variation in the PC characteristics
The characteristics of long-short factor portfolios are initially calculated by value-weighting the characteristics of stocks within each decile portfolio and then subtracting the value of the bottom decile from that of the top decile. At this stage no standardization is applied, meaning that the average characteristic across factor portfolios still preserves its time-series trend and the cross-sectional variance of any given characteristic changes over time. The factor portfolio characteristics are then standardized cross-sectionally by subtracting, each month, the cross-sectional characteristic mean and dividing by the cross-sectional characteristic standard deviation. These standardized characteristics are then transformed into PC portfolio characteristics by multiplying them with the weighting vector $w_{t,i}$, allowing us to focus on the cross-sectional dispersion in the data. Given our procedure, it is important to understand the sources of variation in the characteristics of each PC portfolio that ultimately lead to PC portfolio return predictability.
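The three-step construction (value-weighted top-minus-bottom decile characteristic, cross-sectional standardization, multiplication by the PC weights) can be sketched for a single month as follows. It is an illustration under stated assumptions: deciles are formed from sample quantiles of the sorting variable, the cross-sectional standard deviation uses the population formula, and both function names are hypothetical.

```python
import numpy as np

def ls_characteristic(char_values, weights, sort_var, n_groups=10):
    """Characteristic of one long-short factor portfolio in one month.

    char_values: (S,) stock-level values of the characteristic being aggregated.
    weights:     (S,) market capitalizations used for value-weighting.
    sort_var:    (S,) anomaly variable used to sort stocks into deciles.
    Returns the top-decile minus bottom-decile value-weighted characteristic.
    """
    edges = np.quantile(sort_var, np.linspace(0.0, 1.0, n_groups + 1))
    bottom = sort_var <= edges[1]
    top = sort_var >= edges[-2]

    def vw(mask):
        return np.average(char_values[mask], weights=weights[mask])

    return vw(top) - vw(bottom)

def pc_characteristics(C_t, w_ti):
    """C_t:  (K x J) characteristics of K factor portfolios in month t.
    w_ti: (K,) weighting vector of PC portfolio i.
    Standardizes each characteristic across the K portfolios, then aggregates."""
    Z = (C_t - C_t.mean(axis=0)) / C_t.std(axis=0)   # cross-sectional z-scores
    return w_ti @ Z                                  # (J,) characteristics of PC i

# Hypothetical example: a stock's characteristic equals its sorting variable
x = np.arange(20, dtype=float)
print(ls_characteristic(x, np.ones(20), x))
```

Because each characteristic is demeaned across factor portfolios before aggregation, a PC portfolio with equal weights on all factor portfolios would have zero characteristics; only cross-sectional dispersion survives.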
Let us start with the diagonal elements in the characteristic matrix in Table IA.1 (e.g., the momentum of the momentum portfolio). The diagonal elements will always have the highest scores across rows (1 in our hypothetical example), since factor portfolios have the highest score for their own characteristic by construction. Still, the diagonal elements would remain constant across time only if each characteristic's distribution across the portfolio cross-section remained identical in terms of skewness and kurtosis. However, we observe significant variability in those higher moments for all characteristics over time.
In particular, Figures IA.2 and IA.3 display the cross-sectional skewness and kurtosis of the 72 characteristics across all factor portfolios over the whole sample period. As can be seen, there is significant variation in the skewness and kurtosis of the characteristics over time, which suggests that the diagonal elements do change, in turn affecting the characteristic scores of the PC portfolios.
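A small numerical example shows why the diagonal elements move with the higher moments even though the own-characteristic portfolio always ranks first. The two five-portfolio cross-sections below are made up for illustration, and population standard deviations are assumed. In both, the last portfolio has the highest raw value, but its standardized score depends on the shape of the distribution.

```python
import numpy as np

def standardized_max(x):
    """Largest cross-sectional z-score (population standard deviation)."""
    z = (x - x.mean()) / x.std()
    return z.max()

def skewness(x):
    """Cross-sectional skewness: mean of cubed z-scores."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

symmetric = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
right_skewed = np.array([1.0, 1.0, 1.0, 1.0, 5.0])
for name, x in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(name, standardized_max(x), skewness(x))
```

The top portfolio's z-score rises from √2 ≈ 1.41 in the symmetric case to 2.0 in the right-skewed case (skewness 1.5), so a change in shape alone changes the diagonal score.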
With regard to the off-diagonal elements of the characteristic matrix in Table IA.1 (e.g., the momentum characteristic of the value portfolio), these can further change from month to month depending on the characteristics of the stocks within each factor portfolio. Since different factor portfolios do not contain exactly the same stocks, each characteristic can differ significantly across factor portfolios and can vary over time in non-standard ways. For example, in a given month the correlation between stock momentum and value may be high, so that the standardized momentum scores of the two factor portfolios are similar, while in another month the correlation may be low, so that the momentum scores of the two factor portfolios are completely different. […] However, we observe that in most cases the loadings (especially the most important ones) remain quite stable throughout the whole sample period.
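The momentum-value example can be simulated to show how an off-diagonal element depends on the stock-level correlation. The sketch below is illustrative only. It assumes Gaussian stock-level value and momentum scores with correlation ρ, equal weights instead of value weights, and a hypothetical function name. It returns the top-minus-bottom decile momentum of a portfolio sorted on value.

```python
import numpy as np

def momentum_of_value_portfolio(rho, n_stocks=5000, seed=0):
    """Top-minus-bottom decile average momentum of a portfolio sorted on value.

    Simulated stocks with standard-normal value and momentum scores whose
    correlation is rho; equal weights are used for simplicity (the paper
    value-weights).
    """
    rng = np.random.default_rng(seed)
    value = rng.standard_normal(n_stocks)
    mom = rho * value + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_stocks)
    lo, hi = np.quantile(value, [0.1, 0.9])
    return mom[value >= hi].mean() - mom[value <= lo].mean()

for rho in (0.8, 0.0, -0.5):
    print(rho, round(momentum_of_value_portfolio(rho), 2))
```

The spread scales roughly linearly with ρ, so month-to-month changes in the correlation between characteristics translate directly into changes in the off-diagonal elements.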
Overall, the PC characteristics change across months (within the same predictive iteration) because the characteristic distributions across portfolios exhibit time-varying skewness and kurtosis, and because the characteristics themselves have time-varying correlations. An additional but less important source of variation stems from the recursive estimation of the PC portfolios and, consequently, of the weighting vectors.

IA.4 Details of the predictive models
The main analysis of the paper focuses on four models that incorporate PCA or RPPCA for condensing the variation in factor portfolio returns and PCA or PLS for condensing the variation in the portfolio characteristics. We further consider two cases, one where we retain only a single latent factor from the characteristics and another where we retain all the characteristic factors but employ LASSO to select the important ones at each forecasting period. In addition, we select a series of alternative benchmark models that rely on predictors such as the past returns of the factor portfolios, the issuer-repurchaser spread, and market sentiment. Finally, we modify our main models in order to investigate which pillars of our predictive method matter most, and we also modify the benchmark models in order to examine whether adding further statistical features to them delivers performance similar to that provided by the portfolio characteristics.
For convenience, Table IA.2 presents the list with all the different models used in the study, together with a detailed description of how we handle the left-hand-side and the right-hand-side of the forecasting problem.

IA.5 Factor database and statistics
Our empirical analysis uses 72 factor portfolios based on stock characteristics considered by […]. We provide a detailed description of each characteristic in Table IA.3. In particular, we present the acronym of the characteristic, the paper that identifies it, and its exact definition. In Table IA.4, we further report summary statistics for the 72 factor portfolios that are based on our stock characteristics. Specifically, we report the average monthly return with the respective t-statistic, the monthly volatility, and the monthly Sharpe ratio of each portfolio. As discussed in the main paper, there is high cross-sectional variation in the performance of the portfolios, with only 22 of them having an average return significantly different from zero.

IA.6 Interpreting the PC portfolios
One disadvantage of extracting latent common factors from factor portfolio returns is that the resulting PC portfolios do not have a straightforward economic interpretation.
In order to tackle this problem, we recursively regress each PC portfolio return on each of the 72 anomalies and retain the R². Next, we estimate the average R² across months for each PC portfolio and each anomaly. As an example, Figure IA.8 presents the results for the PCs stemming from PCA. We observe that the first PC portfolio loads more on volatility variables such as beta, idiovol, maxret, retvol and std_turn. The second PC portfolio loads more on value variables such as bm, bm_ia and sp, as well as on profitability and leverage variables such as gma, cashdebt and lev. The third PC portfolio is clearly affected by the momentum characteristics, while the fourth PC portfolio is mainly driven by R&D variables. The fifth PC portfolio loads more on salerec and, to a lesser extent, on gma.
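The recursive R² computation can be sketched as below. The sketch assumes an expanding estimation window with a minimum length of 60 months (the window scheme is not specified in this section) and a univariate regression with an intercept. In that case R² equals the squared sample correlation, so no explicit OLS fit is needed. The function name is hypothetical.

```python
import numpy as np

def recursive_r2(pc, anomaly, min_window=60):
    """R^2 from regressing a PC portfolio return on one anomaly return,
    re-estimated on an expanding window ending at each month t >= min_window.

    For a univariate regression with an intercept, R^2 equals the squared
    sample correlation between the two series.
    """
    out = []
    for t in range(min_window, len(pc) + 1):
        c = np.corrcoef(pc[:t], anomaly[:t])[0, 1]
        out.append(c ** 2)
    return np.array(out)

# Simulated series (not the paper's data)
rng = np.random.default_rng(5)
pc = rng.normal(size=120)
anomaly = pc + rng.normal(size=120)
r2_path = recursive_r2(pc, anomaly)
print(r2_path.mean())   # time-series average R^2 for this PC-anomaly pair
```

The average of `r2_path` corresponds to one cell of the PC-by-anomaly summary, while the full path is the kind of series plotted over time to assess stability.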
The above results stem from a time-series aggregation of the estimated R²s. Another interesting question is whether the relation between the PCs and the underlying characteristics is stable over time. To this end, we select the most important characteristic for each PC portfolio and plot the R² of each across time in Figure IA.9. In particular, we plot the R² values of beta, bm, mom12m, rd_sale, and salerec.
Each characteristic dominates a respective PC portfolio, and it is evident from Figure IA.9 that all the loadings are remarkably stable over time. Overall, we conclude that the PC portfolios extracted with statistical techniques have a reasonable economic interpretation and that the recursive estimation does not negatively affect this interpretation.

IA.7 Factor timing strategy constituents
In this section we examine the trading positions of our factor timing portfolios. Specifically, using the forecasts from the LASSO-based models and the LSS, we investigate how often each anomaly is traded. The anomalies traded under LSS are also the ones with the highest absolute weights under TSFM and CSFM, so focusing on this case only is representative of the general investing approach. Figure IA.10 displays the percentage trading frequency of each anomaly. Blue bars indicate long and orange bars indicate short positions. Clearly, the benefits of factor timing strategies arise from rotating among multiple anomalies rather than from focusing on a handful of picks. Yet, all strategies tend to go long on anomalies with high average returns, such as mom6m and mom12m, and short anomalies with negative average returns, like beta, chmom and retvol. Furthermore, some anomalies appear almost equally often in the short and the long legs of our factor timing portfolio, implying higher volatility in their conditional returns. For the models that use PCA, these anomalies are usually related to market frictions, which also have higher volatility and thus load more heavily on the first PCs. When RPPCA is used for the left-hand side, the anomalies traded more regularly are those with higher absolute average returns. Finally, anomalies that do not load heavily on the dominant components, under either PCA or RPPCA, stay out of the investable universe, as their small loadings compress their individual return forecasts close to zero. Consequently, the choice of PCs for the left-hand side affects the factor timing portfolio formation.
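The mechanism by which small loadings keep an anomaly out of the portfolio can be sketched as follows. The sketch assumes that individual anomaly forecasts are recovered from the PC forecasts through the retained loadings. Since Z = RW with W orthogonal, R = ZW′, and truncating to k PCs gives the approximation ẑW_k′. For counting trading frequency, it assumes a hypothetical rule of going long the n highest and short the n lowest forecasts. Neither function reproduces the paper's LSS definition, and both names are hypothetical.

```python
import numpy as np

def anomaly_forecasts(W_k, z_hat):
    """Map PC portfolio forecasts back to the N anomalies.

    W_k: (N x k) loadings of the retained PC portfolios; z_hat: (T x k) PC forecasts.
    Since Z = R W and W is orthogonal, R = Z W'; keeping k PCs gives z_hat W_k'.
    """
    return z_hat @ W_k.T

def trading_frequency(forecasts, n_side=5):
    """Share of months each anomaly sits in the long / short leg under a
    hypothetical rule: long the n_side highest forecasts, short the n_side lowest."""
    ranks = np.argsort(np.argsort(forecasts, axis=1), axis=1)  # 0 = lowest forecast
    n = forecasts.shape[1]
    long_freq = (ranks >= n - n_side).mean(axis=0)
    short_freq = (ranks < n_side).mean(axis=0)
    return long_freq, short_freq

# Hypothetical loadings: the fourth anomaly does not load on either retained PC
W_k = np.array([[0.8, 0.0], [0.6, 0.0], [0.0, 1.0], [0.0, 0.0]])
z_hat = np.random.default_rng(6).normal(size=(10, 2))
F = anomaly_forecasts(W_k, z_hat)
long_f, short_f = trading_frequency(F, n_side=1)
```

The fourth anomaly's forecast is identically zero, illustrating how weak loadings on the dominant components compress individual forecasts toward zero.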