An August 9th, 2021 Wall Street Journal article by Karen Langley stated, “Shares of Colgate-Palmolive Co. fell 4.8% after the consumer-products company said it expects costs to chip away at its margins in the second half of the year.” A graph of the market’s reaction is available here and links to the filings are here and here.

Despite the most recent seasonal quarterly earnings per share increasing by 12% (from $0.74 to $0.83) and Net Sales increasing by 9.5%, the company provided the following forward guidance: “On a GAAP basis, the Company now expects a decline in gross profit margin, increased advertising investment and earnings-per-share growth at the lower end of its low to mid-single-digit rate.”

Prices then promptly dropped the 4.8% noted above. If the market was expecting much larger increases, then this earnings announcement could still be seen as disappointing. But it could also be the case that the reported results were in line with expectations, and what was deemed more important was the information about the future rather than how the last quarter shook out.

What does the academic literature say about how much the market actually cares about accounting performance measures that are almost always backward looking and historical in nature? I think that depends on which pieces of the published evidence you look at, and that’s what I’m going to discuss today.

If you grabbed a finance, accounting, or economics professor off the street, most would say that “value relevance” studies have consistently shown that the stock market reacts to earnings reports. This is true. Over the decades many different researchers in accounting, finance, and economics have found that various measures of earnings and returns have a “statistically significant” relationship.

But upon closer inspection, it’s clear that most of the papers with the large test statistics, small p-values, and statistical significance explain very, very little of the market’s reaction. Here are some examples of influential papers and how much of the market’s returns around earnings announcements their statistical models explained. Note that not all of these papers are clear about whether they are reporting R-squared or Adjusted R-squared. The papers also use various return windows (e.g., 5 day, 20 day, quarterly, annual, four-year), various models to separate “abnormal” returns from total returns, many different measures of “Street,” actual, and expected earnings and/or other financial statement measures, different time periods, different samples of firms, different industries, and models ranging from parsimonious to complex. The papers are:

Amir and Lev. Value-relevance of nonfinancial information: The wireless communications industry. Journal of Accounting and Economics. 1996.

Reported R-squared ranges from 3% to 10%.

Hammersley, Myers, and Shakespeare. Market reactions to the disclosure of internal control weaknesses and the characteristics of those weaknesses under Section 302 of the Sarbanes-Oxley Act of 2002. Review of Accounting Studies. 2008.

Reported R-squared of 3%.  

Subramanyam. The pricing of discretionary accruals. Journal of Accounting and Economics. 1996.

Reported R-squared ranges from 4% to 6%.   

Bernard and Thomas. Post-earnings-announcement drift: Delayed price response or risk premium? Journal of Accounting Research. 1989.

Reported R-squared of 6%.  

Swaminathan and Weintrop. The information content of earnings, revenues, and expenses. Journal of Accounting Research. 1991.

Reported R-squared of 3%.  

Bartov, Givoly, Hayn. The rewards to meeting or beating earnings expectations. Journal of Accounting and Economics. 2002.

Reported R-squared ranges from 6% to 8%.

Dechow. Accounting earnings and cash flows as measures of firm performance: The role of accounting accruals. Journal of Accounting and Economics. 1994.

Reported R-squared ranges from 3% to 40%.

Easton and Harris. Earnings as an explanatory variable for returns. Journal of Accounting Research. 1991.

Reported R-squared ranges from 3% to 7%.

Hayn. The information content of losses. Journal of Accounting and Economics. 1995.

Reported R-squared ranges from 5% to 17%.

Kinney, Burgstahler, and Martin. Earnings surprise “materiality” as measured by stock returns. Journal of Accounting Research. 2002.

R-squared ranges from slightly above 0% to 3%.

In PhD programs, concerns about models that explain such a small fraction of the variation in what they seek to predict are often explained away along the lines of, “Markets are extremely complicated for any model to explain well and we still found statistical significance so everything is okay. Just go use these models to publish some stuff and get tenure.” My question is this: is it really useful to find that a component of a model is statistically significant when the entire model has almost no power to explain much of anything? It’s akin to saying, “We found something that statistically significantly explains an insignificant portion of the variance in the thing we are interested in predicting or understanding.”
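To make this concrete, here is a quick simulation. It is not from any of the papers above; it just generates returns that are mostly noise plus a small “earnings news” component, and it shows that the regression still comes back “statistically significant” even though the R-squared is tiny.

```python
# A minimal simulation (mine, not from any paper above): returns are
# mostly noise plus a small earnings-news component. The slope comes
# back "significant" even though the model explains ~3% of the variance.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1_000                                        # plenty of observations

earnings_news = rng.normal(size=n)               # standardized earnings surprise
returns = 0.05 * earnings_news + rng.normal(scale=0.30, size=n)

model = sm.OLS(returns, sm.add_constant(earnings_news)).fit()
print(f"slope t-stat: {model.tvalues[1]:.1f}")   # comfortably above 2
print(f"R-squared:    {model.rsquared:.3f}")     # around 0.03
```

With enough observations, almost any nonzero slope clears the significance bar; it is the R-squared that tells you the model barely explains the returns.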

This might sound harsh, but I’d like you to take a moment and look at three graphs. When you look at them, please don’t even read the labels at first. Just look at the scatter plot of y and x and ask yourself these questions:

  1. How strong of a relationship does there appear to be between these variables?
  2. Is the functional form of the relationship linear, nonlinear, or does it look like there really isn’t much of a relationship at all?
  3. Based on what you see, how confident would you be using predictions from this model out of sample (i.e., on data the model has never seen before and was not trained on)?

Graph 1

Graph 2

Graph 3

Since you saw the titles and labels, you know that these are graphs of various returns or abnormal returns in different windows plotted against earnings. Plots like these are what the data behind a linear model with an R-squared of around 5% actually look like. And running a regression of y on x on data like this can still result in statistical significance… and therefore tenure.

Here is a graph that’s more convincing. It comes from a replication of MacKinlay. Event studies in economics and finance. Journal of Economic Literature. 1997. The dataset used to create this graph included all companies in SIC 5411 (grocery stores) and pricing data that runs through the middle of this year.
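For readers curious about the mechanics, here is a rough sketch of how a figure like MacKinlay’s gets built: average the abnormal returns per event day within each news group, then accumulate them. The column names below are my own placeholders, not the variable names from the repository.

```python
# A sketch of a MacKinlay-style event-study figure, assuming a long
# DataFrame `ar` with hypothetical columns: 'event_day' (trading days
# relative to the announcement), 'news' ('good', 'bad', or 'none'),
# and 'abnormal_return'. These names are placeholders.
import pandas as pd
import matplotlib.pyplot as plt

def plot_car_by_group(ar: pd.DataFrame) -> None:
    # Average abnormal return on each event day within each news group...
    mean_ar = (ar.groupby(["news", "event_day"])["abnormal_return"]
                 .mean()
                 .unstack(level="news"))
    # ...then accumulate across the event window to get the CAR paths.
    car = mean_ar.cumsum()
    car.plot(xlabel="Event day", ylabel="Cumulative abnormal return")
    plt.axvline(0, linestyle="--", color="grey")  # announcement day
    plt.show()
```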

My bet is that the professor you pulled off the street earlier in the post and asked about the value relevance of earnings was answering based on seeing something similar to this in a paper, plus the multitude of large test statistics and small p-values from the corpus of academic literature. But my point is that the scatter plots you see of returns and earnings are created from the same dataset used to create the plots of average returns for each trading day for each group of “good news”, “bad news”, and “no news” firms. The R-squareds from the same dataset used to construct the graph per MacKinlay are around 2%.

From one graph, the market appears to respond to earnings announcements fairly cleanly and convincingly, but the response is more nuanced than that. The other things that need to be considered are 1) the variances within the groups, 2) the explanatory power of the overall statistical model (i.e., R-squared), and 3) additional graphs that plot not only the averages of the groups but also the dependent variable (stock market returns) and the independent variable (the percentage change in seasonal earnings per share) together.

I’m not saying any author of any paper here is wrong. My replication shows the same statistical significance of earnings that many previously published papers have reported. The replication can also produce the same convincing graph from MacKinlay’s paper. All I’m trying to do is ask, “Can we look at other summary statistics, produce other data visualizations, or look at other parts of the regression model’s output to learn a little bit more about this?” I think the answer to all of those questions is “yes,” and I think the results are intriguing.

Markets are complex and forward looking. Some would say markets have already correctly anticipated, on average, what the earnings announcement will be and that “price leads earnings.” Others would say trying to forecast stock prices is a fool’s errand. There is a ton of data investors can use from any 10-Q, any 10-K, or any earnings call to make decisions; earnings per share is just one number from these filings.

Lots of other useful data exists outside of earnings calls and SEC filings, and expectations about the future may not be captured well by past earnings. Investors are a diverse group with different methodologies, opinions, biases, etc. Macroeconomic variables and expectations of the future macroeconomic environment are perhaps even more important for the future prospects of a company than anything that happened a few months ago and was calculated in accordance with Generally Accepted Accounting Principles.

There may even still be ways to make money based on the noisy relationships shown above. Maybe knowing that, on average, the market reacts positively to “good news” earnings announcements is like the house edge in blackjack: perhaps over the long run being right 51% of the time is enough.

All of the code I wrote to merge the pricing data and fundamentals data, calculate cumulative returns, calculate “expected” returns based on the constant mean return model, calculate abnormal returns, generate all of the data visualizations, and produce all of the summary statistics and regression models is available in a public GitHub repository. Detailed notes are included throughout the code for auditing and reconciling the data to external sources (e.g., Yahoo! Finance, 10-Qs on EDGAR). The datasets used were from Sharadar and were originally obtained via Quandl prior to its merger with Nasdaq.
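For reference, the constant mean return model is as simple as it sounds: a firm’s “expected” return is its own average return over a pre-event estimation window, and the abnormal return is whatever is left over. Here is a minimal sketch; the window boundaries are illustrative assumptions, not the repository’s actual settings.

```python
# A minimal sketch of the constant mean return model (MacKinlay 1997).
# Window boundaries below are illustrative, not the repo's settings.
import pandas as pd

def constant_mean_abnormal_returns(returns: pd.DataFrame,
                                   estimation_window: slice,
                                   event_window: slice) -> pd.DataFrame:
    """`returns`: daily returns indexed by event day, one column per firm."""
    mu = returns.loc[estimation_window].mean()   # each firm's mean return
    return returns.loc[event_window] - mu        # actual minus "expected"

# e.g., estimate the mean over event days -120..-21, measure days -20..+20:
# ar = constant_mean_abnormal_returns(rets, slice(-120, -21), slice(-20, 20))
# car = ar.cumsum()                              # cumulative abnormal returns
```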

Technical Details:

The average 20 day cumulative return for good news firms is 2.5%, and for bad news firms it is -2.3%. The minimums and maximums of the returns for the good news and bad news firms are -28% (minimum) and 32% (maximum) and -34% (minimum) and 23% (maximum), respectively. The standard deviations of each group are approximately 13%. The test statistic for the difference in means is 2.19 regardless of whether equal or unequal variances are assumed. Regressions of the 20 day cumulative returns on the percentage change in seasonal earnings yield an R-squared of 1.8%, a slope estimate of 0.027, and a test statistic greater than 2. These results are robust to excluding the intercept and to clustering standard errors at the firm level. They are not robust to estimating a model with seasonally differenced earnings per share in levels rather than the percentage change in seasonal earnings per share. Thus, a one percentage point change in seasonal earnings is predicted to change 20 day cumulative returns by just 0.027 percentage points. The estimated impact is statistically significant at conventional levels, but the model explains almost none of the variation in the returns.
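In case it helps, here is roughly what those two tests look like in code. The column names (car_20, eps_pct_change, news, firm_id) are hypothetical stand-ins for whatever the repository actually calls them.

```python
# A sketch of the difference-in-means test and the clustered regression
# described above. All column names are hypothetical placeholders.
import statsmodels.formula.api as smf
from scipy import stats

def run_tests(df):
    good = df.loc[df["news"] == "good", "car_20"]
    bad = df.loc[df["news"] == "bad", "car_20"]

    # Difference in means, with and without the equal-variance assumption
    print(stats.ttest_ind(good, bad, equal_var=True))
    print(stats.ttest_ind(good, bad, equal_var=False))  # Welch's t-test

    # OLS of 20 day cumulative returns on the % change in seasonal EPS,
    # clustering standard errors at the firm level
    ols = smf.ols("car_20 ~ eps_pct_change", data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["firm_id"]})
    print(ols.summary())  # slope, t-stat, and the (tiny) R-squared
```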

The average 20 day cumulative abnormal return for good news firms is 2.4%, and for bad news firms the mean return is -1.9%. The minimums and maximums of the returns for the good news and bad news firms are -37% (minimum) and 29% (maximum) and -38% (minimum) and 36% (maximum), respectively. The standard deviations of each group are approximately 14%. Regressions of the 20 day cumulative abnormal returns on the percentage change in seasonal earnings yield an R-squared of 1%, a slope estimate of 0.022, and a test statistic of approximately 1.6. The results are robust to excluding the intercept and to clustering standard errors at the firm level, but not to estimating the model with seasonal earnings differences in levels. Overall, this is a very similar pattern to what was observed for the 20 day cumulative returns, albeit with less precise parameter estimates and the resulting smaller test statistics and larger p-values.

The largest difference in market reactions between the good news and bad news firms occurs around the fifth day after the announcement. The mean 5 day cumulative abnormal return for the good news firms is 1.4%, but for bad news firms the mean return is approximately -4%. The minimums and maximums of the returns for the good news and bad news firms are -26% (minimum) and 35% (maximum) and -37% (minimum) and 13% (maximum), respectively. The standard deviations of each group are approximately 11%. The test statistics for the difference in means are close to 2.6 regardless of whether equal or unequal variances are assumed. Regressions of the 5 day cumulative abnormal returns on the percentage change in seasonal earnings yield an R-squared of 2%, a slope estimate of 0.023, and a test statistic of approximately 2.4. The results are robust to excluding the intercept, albeit with a diminished slope estimate of 0.02. The results are also quantitatively similar when clustering standard errors at the firm level, but not when estimating the model with seasonal earnings differences in levels. Overall, these results are similar to what was observed for both the 20 day cumulative return window and the 20 day cumulative abnormal return window.

To summarize, the average returns of the good news and bad news firms are in the expected direction, with good news firms experiencing a positive market reaction and bad news firms eliciting a negative market reaction. However, the variances within the groups are rather large. Some good news firms have negative market reactions, some bad news firms have positive market reactions, and neither of these “unexpected” reactions is particularly rare. This is illustrated by the scatter plots of returns against the percentage change in seasonal earnings per share and, no surprise after seeing the scatter plots, by the minuscule R-squareds from the regression models.