I built a web application using Python, Streamlit, and Heroku. The app shows chess players their five best and worst openings for each color, the highest rated player they have ever played, and the highest rated player they have beaten. It also reports the players they have beaten most often and lost to most often, and it includes graphs of the player's average annual rating along with summary statistics for each year and time-control combination.
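The opening rankings reduce to a group-by over the player's game history. Here is a minimal sketch of the idea, not the app's actual code; the DataFrame layout and the 'color', 'opening', and 'result' column names are assumptions:

```python
import pandas as pd

def top_openings(games: pd.DataFrame, color: str, n: int = 5):
    """Rank openings by win rate for one color. Expects columns
    'color', 'opening', and 'result' (1 = win, 0.5 = draw, 0 = loss)."""
    subset = games[games['color'] == color]
    by_opening = subset.groupby('opening')['result'].agg(['mean', 'count'])
    by_opening = by_opening[by_opening['count'] >= 10]  # require a minimum sample
    return by_opening.nlargest(n, 'mean'), by_opening.nsmallest(n, 'mean')
```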

During the pandemic, and between knee surgeries, I started focusing on aerobic base training runs. The guidance I followed came from Steve House's "Training for the New Alpinism." One thing I noticed was that focusing only on aerobic base training significantly improved my time on his "box step test." The test (described on pg. 179) consists of putting a 16 kg kettlebell into a backpack, wearing mountaineering boots, and doing 1,000 step-ups on a one-foot box for time.

What surprised me is that when I was primarily doing CrossFit and "fast" runs, my box step times were in the 42-44 minute range. But after a period spent only climbing and doing aerobic base training, I hit a PR of 35:38.

Read the rest of this entry »

SeekingAlpha just published a piece I wrote on Metalla Royalty & Streaming (MTA). The piece illustrates how the basic financial statements tie together and how investors can use changes in them to see how a company is sustaining itself.

Over the weekend a friend sent me a link to a Reddit post where a user concluded that following Jim Cramer's stock buying recommendations for the first few months of 2021 would have produced a 555% cumulative return. This number seemed implausible, so I looked into the data and analyses. My opinion is that once a series of individual stocks has been bought in a time period, summing up their returns is not valid. Instead, each stock's return must be weighted by its share of the combined portfolio of assets. The comment I posted on the thread is below:
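A made-up numerical example shows why the summed figure overstates performance (the four returns below are hypothetical, not Cramer's picks):

```python
# Four equal-sized positions and their returns over the period.
returns = [0.10, -0.05, 0.20, 0.15]

# Summing per-stock returns produces an inflated "cumulative" figure.
summed = sum(returns)                                     # 0.40

# An investor who split capital equally earns the weighted average.
weights = [1 / len(returns)] * len(returns)
portfolio = sum(w * r for w, r in zip(weights, returns))  # 0.10

print(f"Summed returns: {summed:.0%}")             # 40%
print(f"Equal-weight portfolio: {portfolio:.0%}")  # 10%
```

Summing treats each position as if it had been given the portfolio's entire capital, which is how a figure like 555% can emerge from a string of ordinary picks.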

Read the rest of this entry »

An August 9, 2021, Wall Street Journal article by Karen Langley stated, "Shares of Colgate-Palmolive Co. fell 4.8% after the consumer-products company said it expects costs to chip away at its margins in the second half of the year." A graph of the market's reaction is available here, and links to the filings are here and here.

Despite the most recent quarter's earnings per share increasing 12% over the seasonally lagged quarter (from $0.74 to $0.83) and Net Sales increasing 9.5%, the company provided the following forward guidance: "On a GAAP basis, the Company now expects a decline in gross profit margin, increased advertising investment and earnings-per-share growth at the lower end of its low to mid-single-digit rate."

Prices then promptly dropped the 4.8% noted above. If the market was expecting much larger increases, then this earnings announcement could still be seen as disappointing. But it could also be that the reported results were in line with expectations, and the market simply weighted the forward-looking guidance more heavily than how the last quarter shook out.

What does the academic literature say about how much the market actually cares about accounting performance measures, which are (mostly) backward-looking and historical in nature? I think that depends on which pieces of the published evidence you look at, and that's what I'm going to discuss today.

Read the rest of this entry »

NGVC announced its most recent earnings after markets closed on 5/6/2021, and the stock promptly fell by 20% the next day. I was curious about how the market typically reacts when grocers report earnings below the seasonally lagged prior quarter. Using Sharadar data obtained via Quandl, the steps to answer this question are:

Step 1: Pull the Tickers dataset conditional on ticker == 'NGVC'. This dataset has the four-digit SIC code along with other useful information.

Step 2: Filter this dataset based on SIC code == 5411 and table == 'SF1'.

Step 3: Initialize a Python list object, iterate through the tickers that meet the conditions in Step 2, and append each ticker to the list. [Note: these steps are not shown in the posted code. This post starts with Steps 6 and 7 as they are the most complicated.]
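Although these steps are not in the posted code, a minimal sketch of what Steps 1-3 might look like follows; the siccode and table column names are my recollection of the Sharadar schema, so treat them as assumptions:

```python
import quandl

quandl.ApiConfig.api_key = "YOUR_API_KEY"  # placeholder

# Step 1: pull the Tickers dataset for NGVC (carries the four-digit SIC code).
ngvc = quandl.get_table('SHARADAR/TICKERS', ticker='NGVC')

# Step 2: pull the full Tickers dataset, then keep grocers (SIC 5411)
# that are covered by the SF1 fundamentals table.
tickers = quandl.get_table('SHARADAR/TICKERS', paginate=True)
grocers = tickers[(tickers['siccode'] == 5411) & (tickers['table'] == 'SF1')]

# Step 3: initialize a list and append each qualifying ticker to it.
ticker_list = []
for t in grocers['ticker'].unique():
    ticker_list.append(t)
```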

Step 4: Pull the Sharadar SF1 data, which contains financial statement variables, by feeding the list from Step 3 into the query and conditioning on the "ARQ" time dimension.

Step 5: Pull the Sharadar daily price data from the SEP table.
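Under the same assumptions, Steps 4 and 5 reduce to two quandl pulls (ticker and dimension are filterable columns in the Sharadar tables):

```python
# Step 4: as-reported quarterly (ARQ) fundamentals for the grocer tickers.
sf1 = quandl.get_table('SHARADAR/SF1', ticker=ticker_list,
                       dimension='ARQ', paginate=True)

# Step 5: daily prices for the same tickers.
sep = quandl.get_table('SHARADAR/SEP', ticker=ticker_list, paginate=True)
```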

Step 6: Merge the earnings and accounting data in Step 4 with the stock price data in Step 5. This step requires consideration of firms reporting earnings on non-trading days (i.e., the earnings announcement date in the SF1 data does not exist in the SEP data since markets were closed on that day). My approach was to identify earnings announcement dates that were not connected to trading days, and then to "shift" these dates forward by one or two days, and also backward by one or two days, to see whether trading occurred on the shifted dates.
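A sketch of that date-shifting logic; I am assuming the announcement date lives in SF1's datekey column and that the SEP date column holds the trading days:

```python
import pandas as pd

trading_days = set(sep['date'])  # days with price data

def nearest_trading_day(announce_date, max_shift=2):
    """Shift a non-trading announcement date forward, then backward,
    by up to max_shift calendar days until it lands on a trading day."""
    if announce_date in trading_days:
        return announce_date
    for shift in range(1, max_shift + 1):
        for candidate in (announce_date + pd.Timedelta(days=shift),
                          announce_date - pd.Timedelta(days=shift)):
            if candidate in trading_days:
                return candidate
    return pd.NaT  # no nearby trading day found

sf1['merge_date'] = sf1['datekey'].apply(nearest_trading_day)
merged = sf1.merge(sep, left_on=['ticker', 'merge_date'],
                   right_on=['ticker', 'date'], how='inner')
```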

Step 7: Write a function that takes the number of trading days around the earnings announcement as an input and calculates the return over that window. Note that the day before the earnings announcement supplies the beginning price in the return calculation, which accounts for firms issuing earnings during or after a given trading day.
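One way to write the Step 7 function, continuing the sketches above (treating 'close' as the SEP closing-price column):

```python
def announcement_return(ticker, announce_date, n_days, prices):
    """Return from the close one trading day before the announcement
    to the close n_days trading days after it."""
    px = (prices[prices['ticker'] == ticker]
          .sort_values('date')
          .reset_index(drop=True))
    hits = px.index[px['date'] == announce_date]
    if len(hits) == 0:
        return None
    start, end = hits[0] - 1, hits[0] + n_days
    if start < 0 or end >= len(px):
        return None  # not enough surrounding trading days
    return px.loc[end, 'close'] / px.loc[start, 'close'] - 1

# Example: two-day reaction to a (shifted) announcement date.
# announcement_return('NGVC', pd.Timestamp('2021-05-07'), 2, sep)
```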

Read the rest of this entry »

Shaw Communications (SJR) reports segments that include Wireline customers, both business and consumer, and Wireless customers on prepaid and postpaid plans. The company has recognized revenue of around 1.3 billion (CAD) in recent quarters, of which around 300 million (CAD) is earned from the Wireless segment.

An interesting pattern emerges from the customer segments. Shaw is losing Wireline customers but gaining Wireless customers, and most of the decline in the Wireline segment is from consumers rather than business customers. What does it mean for expectations of future revenue if a company is losing customers in the segments that make up around three quarters of its revenue and gaining customers in the segments that make up around a quarter of it? To attempt to answer this question, I ended up:

Read the rest of this entry »

Part 1 and Part 2 show the history and evolution of my attempts to web scrape EDGAR files from the SEC. You can see detailed discussions of the approaches in the previous posts. The implementation issues with Part 2 caused me to rethink my approach to the problem.

I was hesitant to use XBRL tags initially because 1) this data isn't currently audited and 2) my dissertation used both XBRL data and hand-collected data, and the two differed around 30% of the time. But my opinion now is that using the XBRL tags is the only viable solution to the problem. Previously, I was able to scrape the actual titles used by the company for each financial statement line item, but the substantial variation in the titles used for the same accounts made appending this data together into one DataFrame problematic (e.g., slight differences in naming conventions spread the same underlying variable across multiple columns for different firms; this could be coded around in theory, but in practice it would be a nightmare).
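A made-up two-firm example of the fragmentation problem:

```python
import pandas as pd

# Hypothetical scraped line items from two filers that label the
# same account slightly differently.
firm_a = pd.DataFrame({'Additional paid in capital': [1000]}, index=['FirmA'])
firm_b = pd.DataFrame({'Additional paid-in capital': [2000]}, index=['FirmB'])

combined = pd.concat([firm_a, firm_b])
print(combined)
#        Additional paid in capital  Additional paid-in capital
# FirmA                      1000.0                         NaN
# FirmB                         NaN                      2000.0
```

One account becomes two columns, with a NaN for every firm under the variant it didn't use; multiply that across thousands of filers and line items and the DataFrame becomes unusable.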

Read the rest of this entry »

Yesterday, during a presentation I gave, there were a few questions about the interpretation of p-values. My opinion is that classical hypothesis testing confuses many people and that a few aspects of it are counterintuitive. After the presentation, I corresponded with one of the participants and ended up creating a Monte Carlo simulation to help illustrate hypothesis testing and p-value interpretation. It seemed like a logical thing to turn into a blog post.

Peter Kennedy's Guide to Econometrics has a great treatment of Monte Carlo methods and explains many topics in econometrics through that lens. I also had a time series professor who illustrated topics with Monte Carlos, and I found that this approach helped solidify them in my mind.

Two things follow: a link to my GitHub with output from a Monte Carlo simulation that I ran in Stata, and a write-up based on my correspondence with the participant.
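My simulation was in Stata, but the core idea fits in a few lines of Python; this is an independent sketch, not a port of the posted code:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Draw many samples under a TRUE null hypothesis (population mean = 0)
# and record the p-value of a one-sample t-test for each draw.
n_sims, n_obs = 10_000, 30
p_values = np.empty(n_sims)
for i in range(n_sims):
    sample = rng.normal(loc=0.0, scale=1.0, size=n_obs)
    p_values[i] = stats.ttest_1samp(sample, popmean=0.0).pvalue

# When the null is true, p-values are uniformly distributed, so the
# test rejects at alpha = 0.05 about 5% of the time by construction.
print(f"Rejection rate at alpha = 0.05: {(p_values < 0.05).mean():.3f}")
```

Watching the 5% rejection rate emerge from data generated under a true null is, in my experience, what makes the definition of a p-value click.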

Read the rest of this entry »

An automated approach to web scraping publicly traded companies' financial statements is something that I've been working on for a while. My first post identified the balance sheet via a firm-specific characteristic of FCCY's 12/31/2017 balance sheet, namely that it had a row titled "ASSETS." Of course, not every firm is going to have this header in all caps to identify which table in the .html file is the balance sheet. But it was a start. The next, currently unpublished, step pulled the annual report and then identified the balance sheet by counting how many accounts commonly found on balance sheets appeared in each table.
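The counting approach amounts to scoring every table in the filing. A minimal sketch, with a hypothetical account list that would need tuning against real filings:

```python
from bs4 import BeautifulSoup

COMMON_ACCOUNTS = ['total assets', 'total liabilities',
                   'cash and cash equivalents', 'retained earnings',
                   'accounts receivable']

def find_balance_sheet(html: str):
    """Score each <table> by how many common balance sheet accounts
    appear in its text, returning the highest-scoring table."""
    soup = BeautifulSoup(html, 'html.parser')
    best_table, best_score = None, 0
    for table in soup.find_all('table'):
        text = table.get_text(separator=' ').lower()
        score = sum(account in text for account in COMMON_ACCOUNTS)
        if score > best_score:
            best_table, best_score = table, score
    return best_table
```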

Some of the issues I ran into while programming this: the amount of account variation between balance sheets of companies in different industries; extra spaces or characters in the .html file that are not readily apparent to human eyes; inconsistent capitalization of account titles across firms; variation in what different companies call the same account (e.g., additional paid in capital vs. additional paid-in capital, and stockholder's equity (deficiency) vs. shareholders' equity vs. stockholders' equity); financial statements split in two across separate pages and identified in the file as two separate tables; notes at the bottom of the page also being tagged as tables by the issuer; substantial variation in the exact titles firms use for the various financial statements; and the actual layout of the tables after they have been scraped (e.g., multiple columns for a given year with data spread across the columns).

All of these are things that can be programmed around, and we will see some of them later in the post after we scrape FCCY's 12/31/2017 10-K.
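As one example of programming around them, a hypothetical title normalizer that collapses several of the variants above onto a single key:

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so that
    variants like 'Additional Paid-In Capital' and
    'additional  paid in capital' map to the same key."""
    title = title.lower().replace('-', ' ')
    title = re.sub(r'[^a-z ]', '', title)    # strip stray characters
    return re.sub(r'\s+', ' ', title).strip()

print(normalize_title('Additional  Paid-In Capital'))
# additional paid in capital
print(normalize_title("Stockholders' Equity (Deficiency)"))
# stockholders equity deficiency
```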

Read the rest of this entry »