In this post I will explore whether there is a correlation between Covid-19 deaths and elderly influenza vaccination rates across countries.

## Introduction

At least two studies have found an association between flu vaccines and susceptibility to *other non-influenza respiratory diseases*, including *other coronaviruses*:

- Cowling, 2012 – a randomised controlled trial (RCT) in which 69 children were given a flu shot and 46 a placebo. 20 vaccinated children (29%) got sick with a non-flu virus and only 3 unvaccinated children (6.5%) got sick with a non-flu virus, a statistically significant result. The most common non-flu viruses detected were rhinoviruses and coxsackie/echoviruses. For both these virus types, a significant association was found between getting a flu shot and susceptibility to the virus. The sample size was too small to identify any association between flu shots and coronaviruses.

- Wolff, 2020 – a retrospective study of the illness and vaccination records of 9469 individuals working for the Department of Defense, of whom 6541 had received the 2017/8 seasonal flu shot and 2928 had not. The flu shot was associated with a reduced risk of getting the flu but an increased risk of non-influenza illnesses, specifically including coronaviruses. 507 (7.8%) of the vaccinated and 170 (5.8%) of the unvaccinated tested positive for a coronavirus, resulting in a significant relationship with an odds ratio of 1.36 [1.14-1.63 95% CI].
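
As a rough check on the Wolff figures, the odds ratio and its interval can be recomputed from the four counts quoted above. This is a minimal sketch using the standard unadjusted Wald (log-scale) interval; whether Wolff used exactly this method is an assumption on my part, and the function name is only illustrative.

```python
import math

def odds_ratio_with_ci(exposed_cases, exposed_total, unexposed_cases, unexposed_total, z=1.96):
    """Unadjusted odds ratio and Wald (log-scale) confidence interval for a 2x2 table."""
    a = exposed_cases                      # vaccinated, coronavirus positive
    b = exposed_total - exposed_cases      # vaccinated, coronavirus negative
    c = unexposed_cases                    # unvaccinated, coronavirus positive
    d = unexposed_total - unexposed_cases  # unvaccinated, coronavirus negative
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper

# Counts as quoted in the Wolff, 2020 summary above.
wolff_or, wolff_lo, wolff_hi = odds_ratio_with_ci(507, 6541, 170, 2928)
print(f"OR = {wolff_or:.2f}, 95% CI [{wolff_lo:.2f}, {wolff_hi:.2f}]")
```

Rounded to two decimals, this reproduces the quoted odds ratio and interval.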

The above studies suggest that the flu shot may increase susceptibility to coronaviruses, possibly by a mechanism known as viral interference. It is hypothesised that the flu shot may increase susceptibility to SARS-CoV-2 via the same mechanism. Hence, there are reasons to expect a correlation between flu shot rates and Covid-19 death rates.

Correlation does not imply causation, obviously. No ecological study, no matter how strong the correlation, can ever be strong evidence of causation. Such studies can merely give us a clue about where to look and what further studies to do. Ecological studies are low on the evidence hierarchy (beneath RCTs, cohort and case-control studies) but they are quick, cheap and easy, which is why they are often the first to emerge in cases like this.

Since we have no studies of the stronger types yet, ecological studies are the best we can do for Covid-19 at the moment. One study comparing regions of Italy can be found here; a significant negative correlation was found, but no confounders were examined. Here I will present my analysis of the international data we have for flu vaccination rates and Covid-19 death rates, and then examine six possible confounders.

## Data Sources

Covid-19 death rates per million people by country are available from Our World In Data. In this study, death rates as at 31st July 2020 were used.

Influenza vaccination rates in elderly people (defined for most countries as aged 65+) are available from the OECD. However, data from within the last 5 years is only available for 31 out of the 37 OECD countries. I could find no explanation for why there is no recent data for the 6 other OECD countries (Austria, Australia, Colombia, Mexico, Poland and Switzerland). In this study, the latest available vaccination rate is used for each country (for most countries, this is the 2018/9 seasonal flu vaccine uptake rate).

## Europe

There are 23 European countries in the OECD for which flu shot data is available. I have excluded two of them – Iceland and Luxembourg – for having a population of less than a million. The remaining 21 countries are shown in the following plot:

As indicated by the slope of the red line, there is a positive correlation between these two variables. A correlation coefficient is a measure of the degree to which a pair of variables are linearly related, and ranges between -1 and +1. As shown on the chart, the correlation coefficient R is **0.67**, which is considered a moderate-to-high correlation. The p-value of <0.01 means that, if there were truly no linear relationship between these variables, a correlation at least this strong would arise by chance less than 1% of the time. The null hypothesis of no relationship is therefore rejected at the 1% significance level.
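
For readers who want to see where such a p-value comes from, here is a minimal sketch that converts a correlation coefficient and a sample size into a two-sided p-value using Student's t distribution. It uses only the rounded R of 0.67 and the 21 countries mentioned above, not the underlying data, so the result is approximate. The function names are my own, and the integration is a plain Simpson's rule rather than a statistics library routine.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def pearson_p_value(r, n, steps=2000):
    """Two-sided p-value for the null hypothesis of no linear relationship."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the area under the density on [0, t].
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # The two tails beyond -t and +t hold whatever is left of the total area 1.
    return t, max(0.0, 1 - 2 * area)

t_stat, p = pearson_p_value(r=0.67, n=21)
print(f"t = {t_stat:.2f}, p = {p:.4f}")
```

The p-value printed is well below 0.01, consistent with the value reported on the chart.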

## World

There are 8 non-European OECD countries for which data is available: Canada, USA, Chile, Turkey, Israel, Korea, Japan and New Zealand. Adding these countries to the plot, we find:

The positive correlation is still present but weaker: the correlation coefficient is now **0.49**. This is due to Korea, Japan and New Zealand all being outliers, with high vaccination rates and low Covid-19 death rates. The p-value is still <0.01, so the null hypothesis of no relationship between these variables is again rejected at the 1% significance level.
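
To get a feel for how much uncertainty sits behind the two coefficients, one option is an approximate confidence interval based on the Fisher z-transformation. This is only a sketch: it assumes sample sizes of 21 (Europe) and 29 (the 21 European plus the 8 non-European countries), works from the rounded coefficients rather than the raw data, and `fisher_ci` is a name I have made up for illustration.

```python
import math

def fisher_ci(r, n, z=1.96):
    """Approximate 95% confidence interval for a Pearson correlation."""
    centre = math.atanh(r)               # Fisher z-transform of r
    half_width = z / math.sqrt(n - 3)    # standard error of z is 1/sqrt(n - 3)
    return math.tanh(centre - half_width), math.tanh(centre + half_width)

# Sample sizes are assumptions derived from the country counts in the text.
for label, r, n in [("Europe", 0.67, 21), ("World", 0.49, 29)]:
    lo, hi = fisher_ci(r, n)
    print(f"{label}: r = {r:.2f}, approx. 95% CI [{lo:.2f}, {hi:.2f}]")
```

Under these assumptions both intervals exclude zero, but they are wide and overlap substantially, which is worth bearing in mind when comparing the two coefficients.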

## Confounders

One way to improve an ecological study beyond a single-variable analysis is to look for confounders. A confounder is a variable that influences both the dependent and the independent variable, causing a spurious association. I have looked at the following six variables, all of which have been suggested to me as possible confounders:

- Income (as GDP-per-capita)
- Population density
- Elderly as a proportion of population
- Climate (as average temperature in April)
- Health of population (as life expectancy)
- Healthcare system (as hospital beds per population)

### Income

Slight positive correlation: richer countries had more Covid-19 deaths. Not statistically significant.

### Population Density

No correlation: being more densely packed is not associated with more Covid-19 deaths.

### Elderly Population

No correlation: having more elderly people is not associated with having more Covid-19 deaths.

### Climate

No correlation: being a colder country is not associated with more Covid-19 deaths.

### Health of Population

Slight positive correlation: countries where people live longer had more Covid-19 deaths. Not statistically significant.

### Healthcare System

Here we have a weak but statistically significant correlation: countries with more capacity in the healthcare system had fewer Covid-19 deaths. The correlation coefficient is **-0.37** and the p-value is 0.049, so the result is significant at the 5% level but not at the 1% level.
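
One way to see why a coefficient of this size is only borderline is to ask how large |R| has to be before it counts as significant for a given number of countries. The sketch below assumes 29 countries (21 European plus 8 non-European), which is my reading of the data set rather than something stated for this chart, and takes the two-sided critical t values for 27 degrees of freedom from standard tables.

```python
import math

N_COUNTRIES = 29               # assumption: 21 European + 8 non-European countries
DF = N_COUNTRIES - 2           # degrees of freedom for a correlation test
# Two-sided critical t values for 27 degrees of freedom, from standard tables.
T_CRIT = {"5%": 2.052, "1%": 2.771}

def critical_r(t_crit, df):
    """Smallest |r| that reaches significance for the given critical t value."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

thresholds = {level: critical_r(t, DF) for level, t in T_CRIT.items()}
for level, r_min in thresholds.items():
    print(f"|r| must exceed {r_min:.3f} to be significant at the {level} level")
```

Under these assumptions, a coefficient of 0.37 in absolute value sits just above the 5% threshold and well below the 1% threshold, whereas 0.49 clears both.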

## Multivariate Models

### Two Variables

Out of the 7 variables tested, two showed a significant relationship with Covid-19 death rates: flu shot rates and hospital beds. If we create a model based on these two variables, we obtain the following:

The p-value for the flu shot rate is still <0.05, so it remains significant at the 5% level, but the p-value for hospital beds has risen to 0.068, so it is no longer statistically significant at that level.
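
For anyone wanting to reproduce a model of this kind, here is a minimal sketch of an ordinary least squares fit with two predictors, written in plain Python. It is not the code used to produce the results above. The numbers are toy values invented so that the fit can be checked by eye, not the real country data, and `fit_ols` is a hypothetical helper. It only recovers the coefficients; the p-values quoted above would additionally need standard errors, which a statistics package would normally supply.

```python
def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: one tuple of predictor values per country (hypothetical layout).
    y:    one outcome per country. An intercept column is added automatically.
    """
    X = [[1.0, *r] for r in rows]
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Toy numbers only, NOT the real country data: (flu shot rate %, beds per 1000).
toy_predictors = [(40.0, 3.0), (55.0, 2.5), (70.0, 2.0), (60.0, 6.0), (35.0, 8.0), (75.0, 3.5)]
# Outcomes built from known coefficients (10, 5, -20) so the fit is easy to verify.
toy_deaths = [10 + 5 * flu - 20 * beds for flu, beds in toy_predictors]

intercept, b_flu, b_beds = fit_ols(toy_predictors, toy_deaths)
print(f"intercept={intercept:.2f}, flu={b_flu:.2f}, beds={b_beds:.2f}")
```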

### Seven Variables

If we create a model based on all seven variables that we have, we obtain the following:

Here we have the opposite result to the two-variable model. The p-value for hospital beds is still <0.05, so it remains significant at the 5% level, but the p-value for the flu shot rate has risen to 0.076, so it is no longer statistically significant at that level.

## Conclusion

Does the flu shot make us more vulnerable to Covid-19? Until we have better kinds of studies, ecological studies of Covid-19 death rates against flu vaccination rates are the best way to get an idea of whether the flu shot makes people more susceptible to Covid-19 due to viral interference, as seems to occur with other coronaviruses.

This ecological analysis found a correlation coefficient of **0.67** when only European OECD countries are included. This would be classified as a moderate-to-high positive correlation. With the addition of non-European OECD countries the correlation coefficient is **0.49**, a low-to-moderate positive correlation. This contradicts the finding of the study of Italian regions, which found a negative correlation of **-0.58**, without looking for confounders. Confounders between Italian regions should be examined, and similar studies of regions within countries should be done to try and resolve this apparent contradiction.

Six possible confounders have been analysed and one (hospital beds per thousand population) was found to have a significant association with Covid-19 deaths, just like the flu shot rates. When combined into a multivariate model, these two variables seem to cancel out, to the extent that only one of them remains statistically significant, but which one depends on what other variables are included in the model. I cannot explain this behaviour – if you think you can, let me know!

## Appendix: Paul’s Chart

This picture has been shared on social media:

It comes from this blog, authored by “Paul”, who created it in response to a “whimsical suggestion” in the BMJ (here) by Dr Allan Cunningham: to correlate influenza vaccine uptake with Covid-19 death rates. My post above was inspired by Dr Cunningham’s challenge and by seeing the flaws in Paul’s chart and wanting to dig into the data myself.

In his note, Dr Cunningham provided data on influenza vaccine coverage rates in the elderly and Covid-19 death rates per million for 20 selected European countries. His source for Covid-19 deaths was Worldometers, accessed 21st May 2020. His source for flu shot rates was the OECD. There are 26 European members of the OECD. Dr Cunningham did not include Belgium, Greece or Iceland in his list of 20 countries, despite the data being available at the same source he used, and gave no explanation. As we have seen, data from Austria, Poland and Switzerland are not available from the OECD.

Here is a plot of the 20-countries data provided by Dr Cunningham:

A coefficient of determination, or R-squared, is the proportion of the variance in the dependent variable (covid deaths) that is predictable from the independent variable (flu shot rate). Using just Dr Cunningham’s 20 countries, we find an R-squared of 0.5327 under the assumption of a linear relationship, which corresponds to a correlation coefficient of **0.7299**.
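
The conversion from R-squared to R can be checked in a couple of lines. This sketch relies on the fact that, for a straight-line fit with a single predictor, R is the square root of R-squared with the sign of the slope; the positive sign is taken from the coefficient quoted above.

```python
import math

r_squared = 0.5327          # value quoted above for Dr Cunningham's 20 countries
slope_is_positive = True    # assumption: taken from the positive coefficient quoted

# For simple linear regression, |R| is the square root of R-squared.
r = math.copysign(math.sqrt(r_squared), 1 if slope_is_positive else -1)
print(round(r, 4))
```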

On Paul’s chart, this correlation coefficient is displayed prominently… but as we have seen, this value comes from Cunningham’s data as shown on the chart above. Paul’s chart is completely different, so the value of R he shows has no relation to his chart! Paul’s chart displays 27 data points (seven extra), the values plotted are different from Cunningham’s (due to a change to the data source), and the line drawn on the chart is exponential rather than linear (so it has nothing to do with the correlation coefficient of 0.7299, which assumes a linear relationship).

The extra seven countries that Paul added are curious. Of the three countries mysteriously omitted by Dr Cunningham, Paul rightly added back Belgium and Iceland, but not Greece, which would be a significant outlier, weakening the association. Paul adds Poland, which isn’t in the OECD data, and Romania and Croatia, which aren’t even in the OECD. Paul refers to the ECDC as a data source for these three and all other European countries. He may be referring to this publication, but I could not find the exact numbers he used. Adding Poland, Romania and Croatia strengthens the association because apparently they all have low vaccination rates and few Covid-19 deaths.

Strangest of all, Paul has added Canada and the USA, the only two countries on the chart outside Europe. He uses OECD data for these countries, which makes it strange that he omits other non-European OECD countries like Korea, New Zealand and Japan. These countries would all weaken the association, and their omission seems somewhat convenient for someone wishing to make the case that there is a strong correlation.

It is misleading to display on the chart an R value that has nothing to do with the data in the chart. It is misleading to cherry-pick countries and omit significant outliers without explanation. I think my charts give a more complete and honest picture than this chart by Paul.