Professor Caroline Colijn on Estimating the Impact of Physical Distancing on COVID-19 in BC
The topic of this interview is a mathematical model developed by a team of researchers, including Sean C. Anderson, Andrew M. Edwards, Madi Yerlanov, Nicola Mulberry, Jessica Stockdale, Sarafa A. Iyaniwura, Rebeca C. Falcao, Michael C. Otterstatter, Michael A. Irvine, Naveed Z. Janjua, Daniel Coombs, and Caroline Colijn. More technical details about the model are available in a pre-print.
This interview took place on June 3, 2020. Since then, there has been a sharp rise in COVID-19 cases in British Columbia. For more information, visit the British Columbia COVID-19 Dashboard.
"The public can help by avoiding indoor socializing, keeping their distance from each other, not attending parties, bars, clubs and events where they don't know others, and others don't know them - think: how many people would have to be phoned if I got symptoms tomorrow? Do I know their numbers? Would these folks know how to contact me, if they got ill?" -Professor Caroline Colijn, August 19, 2020
Stephanie Harvard (SH): At the time that BC began [its] physical distancing policy, what was the evidence that supported implementing this intervention?
Caroline Colijn (CC): Well, we knew this virus could spread globally. It had already been declared a pandemic on March 11th. We knew it could spread easily human-to-human, and we knew it could get to large outbreaks. Even in the early days, we knew that it caused very severe illness and death. We were starting to understand how much this virus can spread even from people who are not showing symptoms. If you intervene only with people who are already showing symptoms, you’ve missed part of those transmission chains. All of those things together were really strong evidence that we needed to do things like stopping mass gatherings, stopping university classes, and the kinds of things we did starting on March 12th in BC.
SH: Would you describe the evidence as a common-sense, biological rationale? Or were there instances in history in other disease contexts where these kinds of interventions had been tried and were successful?
CC: They were certainly tried 100 years ago in the 1918-19 flu pandemic. It’s not something that was a completely new idea. I would say a little bit of both of those.
SH: What’s the role of a modelling team like yours?
CC: Modellers can be involved in lots of different ways. Our team is involved in developing models and doing short-term forecasting, but also in exploration with what we would call mechanistic models. So starting to think about, “What happens if we relax measures?” “What happens if we introduce stronger measures?” “What happens if more people are involved in distancing?” “What happens if the incubation period is longer than we thought?”
One of the powers of mathematical modelling is that we can run experiments with models, where we can’t run experiments in the world. We can’t say, “In this town, we’re going to do this thing, and in that town, we’re going to do the other thing, and we’re going to introduce the same number of cases in the two towns.” Modelling helps us to understand big picture dynamics that we can’t experiment with in real populations.
SH: In British Columbia, were you involved in making recommendations to policymakers?
CC: Absolutely. I was running a hackathon here at SFU because it seemed clear there was a huge potential for a pandemic, and there were already open data and open modelling papers out there from late January. We got in touch with BCCDC [British Columbia Centre for Disease Control] at that point and we started building our team. Out of the hackathon, we identified two data sets that we could use to estimate how much pre-symptomatic transmission there is. We used the hackathon time to get those data sets in a form that we could use to do statistical estimation, and think through how long before symptoms are people transmitting, according to these data sets. From there, we started developing distancing models and other models with the BCCDC.
SH: Today’s conversation is about the physical distancing model. Have you done a similar model before?
CC: It’s the first physical distancing model we’ve built. This is part of a large class of models that are very well-known and very widely used in the field. So it’s a new one, but it’s fairly easy to make a new one by adding a compartment or adding a term or adding something in the model. This is a new one that we set up for Covid-19, with the understanding that I wanted to explore physical distancing in particular.
SH: What’s the name of the broad class of models?
CC: They’re deterministic compartmental models. These are models that use differential equations to describe how many people in the population are in different groups. So, 'Susceptible', 'Exposed but not yet infectious', 'Infectious', and 'Recovered or quarantined', for example. Those are the broad groups in this model.
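To make the idea of a deterministic compartmental model concrete, here is a minimal sketch in Python of a generic SEIR model (Susceptible, Exposed, Infectious, Recovered or quarantined) integrated with forward Euler steps. It is not the team's model: it has no distancing group and no observation model, and the rates `beta`, `sigma` and `gamma`, the population size and the initial seed are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class State:
    S: float  # susceptible
    E: float  # exposed but not yet infectious
    I: float  # infectious
    R: float  # recovered or quarantined


def seir_step(x: State, beta: float, sigma: float, gamma: float, dt: float) -> State:
    """Advance the SEIR differential equations by one forward-Euler step of dt days."""
    n = x.S + x.E + x.I + x.R
    infection = beta * x.S * x.I / n  # S -> E flow
    onset = sigma * x.E               # E -> I flow (1/sigma = mean latent period)
    removal = gamma * x.I             # I -> R flow (1/gamma = mean infectious period)
    return State(
        S=x.S - dt * infection,
        E=x.E + dt * (infection - onset),
        I=x.I + dt * (onset - removal),
        R=x.R + dt * removal,
    )


def simulate(days: int, beta: float = 0.5, sigma: float = 1 / 5, gamma: float = 1 / 5,
             dt: float = 0.1, n: float = 5_000_000, i0: float = 10) -> list[State]:
    """Return the state at the start of each day (all parameter values hypothetical)."""
    x = State(S=float(n - i0), E=0.0, I=float(i0), R=0.0)
    trajectory = [x]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            x = seir_step(x, beta, sigma, gamma, dt)
        trajectory.append(x)
    return trajectory


trajectory = simulate(160)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d].I)
```

Because the equations only move people between compartments, S + E + I + R stays equal to the population size; in this toy parameterization R0 = beta / gamma = 2.5.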
SH: I took it that your team's physical distancing model models a very specific period of time. How did you decide how long to observe the intervention before estimating its effect?
CC: Well, the intervention happened at a time, and we observe it until the present. We didn’t really make timeframe decisions.
SH: I got the impression from the pre-print that the observation period ended at April 12th. Maybe you’ve been continuing to update it?
CC: We have been updating it.
SH: Ok, got it.
CC: At the time, we estimated that the delay between getting symptoms and appearing in the public data was 8 to 10 days. When we do something like change physical distancing, what we change is the rate of new infections. If someone gets infected today, they don’t show symptoms today; they show symptoms sometime later. Then, after they have symptoms, they don’t necessarily get tested instantly, and that test result is not uploaded instantly into the public data. That is why the window between mid-March and mid-April seems long. But it is a pretty reasonable timeframe for estimating the impact of the measure. It takes 20 to 30 days before we have enough data and reported cases to estimate the effect of changes that we made. I think that’s going to be a real challenge as we move forward and start reopening.
SH: Can I ask you for a summary of how the model works?
CC: In the model, some fraction of the population are willing and able to engage in physical distancing. That was informed by a number of data sources; one was an Angus Reid survey that described Canadians and their tendency to believe that Covid-19 should be taken seriously. Also, we had some mobility data, some Translink data, that showed there are people who are distancing very strongly, and others who are not able to.
Imagine somebody’s not going out and they’re working from home all the time on their laptop and they hardly ever go to the store. If I’m out and about in the population, I’m not likely to see that person. And that person is also less likely to go out and find others. So you have a dual effect: distancers are less likely to go out and about and circulate, and if they do go out and about and circulate, they’re not likely to find other distancers, they’re likely to find those who are not able to do the distancing. We have that effect in the model and then we model the numbers of people who are susceptible and exposed, pre-symptomatic, and then infectious symptomatic, and then recovered or quarantined. Then we play that out through time.
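The dual effect can be sketched as a two-group version of a basic SEIR model. In this minimal Python sketch, a fraction `e` of the population distances and has its contact rate scaled by `f`, both when infecting others and when being exposed. The group structure follows the description above, but the parameter values, the assumption that everyone else mixes at full rate, and the simple Euler integration are illustrative assumptions rather than the pre-print's specification.

```python
def simulate_two_group(days: float, f: float, e: float = 0.8, beta: float = 0.5,
                       sigma: float = 1 / 5, gamma: float = 1 / 5, dt: float = 0.1,
                       n: float = 5_000_000, i0: float = 10):
    """Two-group SEIR: index 0 = non-distancers, index 1 = distancers (hypothetical values)."""
    S = [n * (1 - e) - i0, n * e]
    E = [0.0, 0.0]
    I = [float(i0), 0.0]  # seed the epidemic among non-distancers
    R = [0.0, 0.0]
    c = [1.0, f]          # relative contact rate of each group
    for _ in range(round(days / dt)):
        # Infectious pressure: distancers contribute fewer infectious contacts ...
        pressure = beta * (c[0] * I[0] + c[1] * I[1]) / n
        for g in (0, 1):
            # ... and are also less likely to be reached (the "dual effect").
            infection = c[g] * pressure * S[g]
            onset = sigma * E[g]
            removal = gamma * I[g]
            S[g] -= dt * infection
            E[g] += dt * (infection - onset)
            I[g] += dt * (onset - removal)
            R[g] += dt * removal
    return S, E, I, R


def ever_infected(f: float, days: float = 300, n: float = 5_000_000) -> float:
    """People who left the susceptible compartments by the end of the run."""
    S, _, _, _ = simulate_two_group(days, f, n=n)
    return n - sum(S)
```

Comparing `ever_infected(1.0)` with `ever_infected(0.2)` shows how strongly the outcome depends on `f` once most of the population is distancing.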
We fit the model to data from BC. We made sure that it was looking like it would produce realistic numbers of cases. That’s a whole other set of challenges, because the reported cases that we see are delayed, and they’re also not the full picture. When we report eight cases in a day, that doesn’t mean that there are only eight new cases in all of BC. We report only some of the cases because we don’t find and test them all. We have to take that into account.
SH: Is it fair to say that contact is defined in the model as a contact that could result in transmission?
CC: I think that’s broadly right. These models all have a transmission and contact parameter that’s sort of joined together. It’s an overall rate at which people are close to each other to the extent that there could be a transmission. That base transmission parameter expresses how infectious the virus is, and something about the contact and the health of the population. In our model, the distancers have a reduced contact. So we have a parameter f, which scales down their likelihood of going out and having potentially infectious contact with other people.
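One simplified way to write this down, as a sketch rather than the pre-print's exact equations: let β be the combined contact-and-transmission rate, N the population size, I_n and I_d the infectious non-distancers and distancers, and S_n and S_d the corresponding susceptibles (this notation is ours). Distancers' infectiousness and their exposure are both scaled by f:

$$
\begin{aligned}
\lambda(t) &= \beta \, \frac{I_n(t) + f \, I_d(t)}{N}, \\
\frac{dS_n}{dt} &= -\lambda(t) \, S_n(t), \\
\frac{dS_d}{dt} &= -f \, \lambda(t) \, S_d(t).
\end{aligned}
$$

With f = 1 the two groups behave identically; with f < 1 distancers both generate and receive fewer potentially infectious contacts.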
SH: So, the transmission parameter is effectively the probability of getting infected given the contact?
CC: Per unit time, yes.
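The distinction between a rate and a probability can be illustrated with a small sketch. Assuming the rate of infectious contact stays constant over an interval (a standard Poisson-process assumption, not something stated in the interview), the probability of at least one infection in that interval is 1 - exp(-rate × duration); for small rates this is close to rate × duration. The numbers below are hypothetical.

```python
import math


def probability_of_infection(rate_per_day: float, days: float) -> float:
    """P(at least one infection in `days`) for a constant infection rate (hazard)."""
    return 1.0 - math.exp(-rate_per_day * days)


one_day = probability_of_infection(0.1, 1.0)    # about 0.095
one_week = probability_of_infection(0.1, 7.0)   # about 0.50
```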
SH: Is that something that’s modelled as a range of values? Or is it a point estimate?
CC: We compare the model’s predictions to the data, and we use Bayesian estimation to figure out the cloud of points that parameter could be. Then we can sample from that. So we have a realistic cloud of feasible parameters for the model.
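As a rough illustration of what sampling a “cloud of feasible parameters” looks like, here is a toy random-walk Metropolis sampler in Python. It fits a single exponential growth rate `r` to synthetic daily counts with a Poisson likelihood and a flat prior on (0, 1), holding the initial count `c0` fixed. The real model has many more parameters, an ODE inside the likelihood and a more elaborate observation model; the data, prior, step size and sampler here are all hypothetical choices, not the team's method.

```python
import math
import random


def log_likelihood(r: float, counts: list[int], c0: float) -> float:
    """Poisson log-likelihood of daily counts whose expected value is c0 * exp(r * t)."""
    total = 0.0
    for t, y in enumerate(counts):
        lam = c0 * math.exp(r * t)
        total += y * math.log(lam) - lam - math.lgamma(y + 1)
    return total


def metropolis(counts: list[int], c0: float, n_iter: int = 5000, step: float = 0.005,
               r_init: float = 0.1, seed: int = 1) -> list[float]:
    """Random-walk Metropolis draws of r under a flat prior on (0, 1)."""
    rng = random.Random(seed)
    r, ll = r_init, log_likelihood(r_init, counts, c0)
    draws = []
    for _ in range(n_iter):
        proposal = r + rng.gauss(0.0, step)
        if 0.0 < proposal < 1.0:  # outside the prior's support: always reject
            ll_new = log_likelihood(proposal, counts, c0)
            # Accept with probability min(1, L(proposal) / L(current)); the flat prior cancels.
            if rng.random() < math.exp(min(0.0, ll_new - ll)):
                r, ll = proposal, ll_new
        draws.append(r)
    return draws


# Hypothetical synthetic data: 30 days of counts growing at r = 0.2 per day.
counts = [round(5 * math.exp(0.2 * t)) for t in range(30)]
draws = metropolis(counts, c0=5.0)
posterior = draws[1000:]  # discard burn-in
posterior_mean = sum(posterior) / len(posterior)
```

The retained draws are the “cloud”: their spread expresses how uncertain `r` is given the data, and each draw can be pushed through the model to give a range of projections rather than a single curve.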
When we fit the model to the data, we estimate the parameters that govern the growth of the epidemic. Around March 15th, we had started cancelling mass gatherings and cancelling services, and a whole lot of changes rolled out over the week from about the 15th to the 22nd. When we look at the mobility data that we have from Google and other sources, that’s really the week when things changed profoundly. During that week the distancing effect ramps up in the model, so that by the end of the week it has reached its full strength. That level then shapes the data for the following month, which is what lets us estimate that strength. One of the points of the pre-print was to estimate the strength of distancing.
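The ramp-up described here can be represented as a time-varying contact multiplier for distancers. The Python sketch below assumes a linear ramp from full contact (1.0) to a reduced level `f2` between day 0 (about March 15th) and day 7 (about March 22nd); the linear shape, the day indexing and the value of `f2` are hypothetical. In a fitted model, a parameter like `f2` is what would be estimated from the following weeks of data.

```python
def contact_multiplier(t: float, f2: float, ramp_start: float = 0.0, ramp_end: float = 7.0) -> float:
    """Relative contact rate of distancers on day t: 1 before the ramp, f2 after it, linear in between."""
    if t <= ramp_start:
        return 1.0
    if t >= ramp_end:
        return f2
    fraction = (t - ramp_start) / (ramp_end - ramp_start)
    return 1.0 + fraction * (f2 - 1.0)


schedule = [contact_multiplier(day, f2=0.3) for day in range(-2, 10)]
```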
SH: In your pre-print, you flag a few assumptions that other models have made that could impact their results. Can you tell me about some assumptions that your model aims to avoid making?
CC: I think one of the nice aspects about this model is this distinction between the distancing and non-distancing population. That gives us a bit more flexibility than a lot of models, to allow for essential workers and hospital and long-term care. We don’t really model long-term care facilities, but we have differences between contact in different parts of the population. We use this one simple difference, which is those engaged in distancing and those not engaged in distancing. I think this is the only model of its exact kind.
The other strength of this modelling approach was what we called the observation model. That’s like a translation between how many cases the model population thinks it has, and how many cases you’re going to see, and when are you going to see them. That allows us to build in that delay between symptom onset in the model and reported case in the data. It also allows us to build in noise or variation in the data that come from things that aren’t in the model. Like I said, we don’t have a compartment for a care facility, and another compartment for a poultry plant, and another compartment for a hospital, because that would require data that we don’t have. But we can say, “these counts are going to have high variability from processes we haven’t modelled,” and take that into account without constraining the model too much.
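A minimal sketch of such an observation model in Python: expected reported cases on day t are the symptom onsets from earlier days, spread out by an onset-to-report delay distribution and multiplied by the fraction of cases that get reported, `psi`. Extra variation from unmodelled processes is represented here with a negative binomial distribution, whose variance mu + mu²/phi exceeds its mean. The delay distribution (equal weight on 8, 9 and 10 days, echoing the 8 to 10 day delay mentioned earlier), the value of `psi` and the choice of a negative binomial are assumptions for illustration, not the pre-print's fitted components.

```python
import math


def expected_reports(onsets: list[float], delay_pmf: list[float], psi: float) -> list[float]:
    """Expected reported cases per day: onsets convolved with the delay pmf, scaled by psi."""
    expected = []
    for t in range(len(onsets)):
        mu = 0.0
        for s, p in enumerate(delay_pmf):  # s = days from symptom onset to report
            if t - s >= 0:
                mu += onsets[t - s] * p
        expected.append(psi * mu)
    return expected


def nb_log_pmf(y: int, mu: float, phi: float) -> float:
    """Negative binomial log-probability with mean mu > 0 and variance mu + mu**2 / phi."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mu)) + y * math.log(mu / (phi + mu)))


onsets = [100.0] + [0.0] * 14            # hypothetical: 100 onsets on day 0 only
delay_pmf = [0.0] * 8 + [1 / 3] * 3      # hypothetical: report 8, 9 or 10 days after onset
reports = expected_reports(onsets, delay_pmf, psi=0.5)
```

Nothing appears in `reports` until day 8, and only half of the onsets ever show up, which is why raw case counts lag and understate the underlying epidemic.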
So the first thing was having a distancing and non-distancing population, and the second thing was having an observation model that copes with delays between symptom onset and observation, and with noise in the observations. The third thing is being able to handle differences in testing over time. A lot of models struggle with that or don’t do that, and we think that’s important because one of the things that changes the number of positive tests that you see is the number of tests that you do and the population that you test. There were several times when BC widened the pool of people who could get tested. That will give you a bump in cases. If you don’t take that into account, you might falsely infer that transmission has actually gone up, when what’s really happened is you’ve just started testing a broader population. So, I would say those are three things that we liked about the model that we thought it did that lots of other models don’t do.
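The testing point can be illustrated with a short Python sketch. It assumes true symptom onsets are flat at a hypothetical 200 per day, and that the reported fraction `psi` jumps from 0.1 to 0.3 when testing is widened on day 10 (both values invented for illustration). Reported cases triple even though transmission has not changed; dividing by the known, time-varying `psi` removes the artificial jump. The piecewise-constant `psi` is a simplifying assumption, not necessarily how the pre-print handles changes in testing.

```python
from bisect import bisect_right


def ascertainment(day: int, change_days: list[int], psis: list[float]) -> float:
    """Piecewise-constant reported fraction: psis[k] applies from change_days[k - 1] onward."""
    # psis has one more entry than change_days; psis[0] applies before the first change.
    return psis[bisect_right(change_days, day)]


def expected_reported(true_onsets: list[float], change_days: list[int], psis: list[float]) -> list[float]:
    """Reported cases implied by the true onsets and the reported fraction on each day."""
    return [n * ascertainment(t, change_days, psis) for t, n in enumerate(true_onsets)]


def adjust_for_testing(reports: list[float], change_days: list[int], psis: list[float]) -> list[float]:
    """Divide reports by the reported fraction in force on each day."""
    return [r / ascertainment(t, change_days, psis) for t, r in enumerate(reports)]


true_onsets = [200.0] * 20                        # hypothetical flat epidemic
reports = expected_reported(true_onsets, [10], [0.1, 0.3])
adjusted = adjust_for_testing(reports, [10], [0.1, 0.3])
```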
SH: Were there any assumptions that you couldn’t avoid making that concerned you, and you explored the effects of them?
CC: Yes. Any model has lots of assumptions. Some of the ones we explored include the true durations of the infectious period and the incubation period. We don’t know those durations precisely, and they can change disease dynamics. If it takes a really long time to get to symptoms, then people need to be infecting more to get up to the same number of cases that we’ve seen. So, we’ve tried varying some of those things that we’re not certain about. Another one was the true testing rate. We still don’t really know if we’re getting 10 percent or 30 percent or 50 percent of the cases. We varied that in the model as well. We checked that our results were robust to different assumptions about the testing and about some of the unknown parameters.
SH: Were there any results that would be fundamentally altered if one of the assumptions was disproved?
CC: I’ll discuss the assumptions we tested, which are the ones we could test. This is all in the supplement. Suppose we wanted to estimate the true prevalence in the population, to say, “At the peak, there were 1,000 British Columbians who had Covid-19,” or, “At the peak in late March, there were 5,000 British Columbians.” We can’t estimate that from the data in this pre-print. We can get different answers for that peak prevalence depending on what we assume about the sampling. If we assume that we only measured 10 percent, then there’d be more [true prevalent cases] than if we assume we measured 50 percent. Similarly, if we want to measure the basic reproduction number, R0, we can get different answers depending on what we assume about the durations of the incubation and infectious periods. That result for R0 will change. Partly for this reason, we didn’t set out to estimate R0 with this model. What we set out to estimate was, ‘how effective is distancing?’ And the answer to that was really robust across the things that we were able to vary. Distancing was really effective.
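Two of the dependencies described here can be shown with simple arithmetic in Python. First, an estimate of true prevalence scales as reported prevalence divided by the assumed reporting fraction `psi`. Second, for a basic SEIR model with exponentially distributed latent and infectious periods, a standard result links the early growth rate r to R0 as R0 = (1 + r × latent period)(1 + r × infectious period), so the same observed growth implies different R0 values under different duration assumptions. The reported prevalence of 500, the growth rate of 0.2 per day and the durations below are hypothetical, and this relation is a textbook simplification rather than the team's estimation procedure.

```python
def true_prevalence(reported: float, psi: float) -> float:
    """Scale a reported count up by the assumed fraction of cases that are detected."""
    return reported / psi


def r0_from_growth_rate(r: float, latent_days: float, infectious_days: float) -> float:
    """SEIR relation with exponentially distributed periods: R0 = (1 + r*L)(1 + r*D)."""
    return (1 + r * latent_days) * (1 + r * infectious_days)


# Same (hypothetical) data, different assumptions about the sampling fraction:
prevalence_if_10_percent = true_prevalence(500, 0.1)   # about 5,000 true cases
prevalence_if_50_percent = true_prevalence(500, 0.5)   # about 1,000 true cases

# Same (hypothetical) growth rate, different assumptions about durations:
r0_short_latent = r0_from_growth_rate(0.2, latent_days=3, infectious_days=5)  # about 3.2
r0_long_latent = r0_from_growth_rate(0.2, latent_days=5, infectious_days=5)   # about 4.0
```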
Every model will have limitations, and there are things that we did not explore. For example, our model doesn’t have an age structure. And it doesn’t have a network structure. There’s a lot of real population structure in the world that is not in any simple model like this. We think there are good rationales for choosing a model with this low level of detail and working with it. Within BCCDC, we’ve now started with some of these other models. So, there are other models that we didn’t use; the impacts of those choices are not explored here.
SH: One of your data sources is a survey from Angus Reid. Was your team involved in the design of that survey, or was that an independent thing that you happened upon as a data source?
CC: No, we weren’t involved in that. We just read that survey.
SH: Was that part of coming up with this model? Was it an inspiration, or were you planning on doing this anyway and looking for a data source?
CC: So, I wrote the first version of this model on a Sunday, after I had seen results from really complicated models called 'agent-based' models. [Agent-based models] require a lot of data to get them set up. Of course, we don’t always have those data, so they also tend to require a lot of guesswork. And I was interested in, ‘could I make a simple model that looks the same?’ Then I saw one of the Imperial College reports. Their report had a very complicated model with 13,000 lines of code. And the predictions looked very much like my little distancing model.
SH: In terms of that survey as a data source, I had a couple of questions. That survey had four different questions, which pertained to different types of physical distancing. Which questions did you actually use to estimate the proportion of British Columbians participating in physical distancing?
CC: The survey was taken as a motivation for the model’s overall structure, in that there is some large majority of people who are going to take this seriously. Those people will do many things that will have a net effect of decreasing their contact rate. You’re going to wash your hands, you’re going to stay away from other people, you’re going to stay home. We haven’t picked one of them and said, “Ok, hand-washing, it’s all about hand-washing, that’s what we’re doing.” We’ve just said, “All of these measures will reduce the rate at which people get infectious contact with other people.”
SH: So the phenomenon of physical distancing that’s represented in the model is kind of a brute phenomenon. It’s like you’re either someone that’s reducing their contact or you’re someone that’s not reducing your contact.
CC: Right, and in a way that’s one of the limitations. But there is a reason to do that in a model. Let’s imagine we wanted to do something else. Let’s imagine we wanted to say, “Ok, some people are going to have cloth masks. Some of them are going to wash their hands, some of them are going to wash their hands and they’re going to stay back in the line-up at Safeway, and some of them are not even going to go to Safeway.” And you start enumerating all the possible sets of behaviours that people could have. You could do that. The problem is, we actually don’t have much information about what that means for transmission. We can say someone washes their hands, but how do you put that in the model? We don’t know, does that prevent every transmission event? Probably not. Does it prevent 90 percent of them? Does it prevent 20 percent of them? We just don’t have the data to get really high levels of resolution.
SH: Were there any data that you would’ve really liked to have and just didn’t exist? Not because it’s not possible to collect, but because it hasn’t been collected?
CC: Loads of things. If we knew the contact patterns, "how many people did you speak with yesterday? How close were you to them? What age were they? What age are you?", then we could work with age structured models more closely.
I think there are four reasons why we might not have data that we want. One is that the data are collected but not collated. Collating them is expensive and time consuming, public health systems have been chronically underfunded for a decade, and now is not the time to rebuild them. There are still health authorities using pieces of paper and faxing information to each other in contact tracing. So those data in principle exist, but in formats that are not easy to share. Second, there are data that are collated and could in principle be shared, but aren’t shared for privacy reasons. Some of those are probably great reasons, and others are probably a little too far on the side of protecting privacy at the cost of not making information available that the public needs. Third, there are data that aren’t gathered at all because people don’t want to gather them or because it’s hard.
The fourth thing is data that were desperately needed but are actually just in principle incredibly hard to measure. Questions about, [for example], where are the real transmissions happening? Even if you had your best-ever digital contact tracing app, it would still be hard to traipse through those data and trace down the infection events.
SH: Something that appeals to me is to communicate a basic amount of information about how a model works. We often hear just the results of things, and we don’t hear that much about how they work.
CC: I think there’s a reason that models are not usually described that way. The “how does it work?” would be so wordy and long and so fraught with all the reasons it’s wrong that you probably would never get to the results. But a broad description like, “The model works by considering the fact that physical distancing reduces contact, and then uses data to estimate how much it reduced contact in a fixed time period” can convey the main idea.
SH: Can you imagine ways that the public could be helpful to the modelling process?
CC: Absolutely. I think providing data on contact patterns would be really helpful. One of the issues with the data is this delay between symptom onset and testing. If we had a really good indication of the proportion of British Columbians who had Covid-19 symptoms on a given day, that would be great. But any such sample will be biased by who is interested in taking part. So what researchers could do is recruit, for example, 100,000 people who will say, prospectively, “Ok, the minute I get Covid-19 symptoms, I’m going to put it in this app.”
Another one is if the public could think about that boundary between data privacy and public good. For example, one of the things that is useful for us to have is time of symptom onset. When are people showing symptoms? That’s something that’s hard to get because of privacy and data stewardship. One thing the public could do is engage actively with questions about data and data gathering. Another key area is contact data; we have seen a lot of discussion around apps, and Canada has gone with a very conservative app in privacy terms. The app could help, but if the public advocated for data sharing for research, we could potentially learn a lot about where transmission happens and doesn’t happen.
SH: Something that I took away from your model is there seems to be a critical threshold beyond which we have to be reducing our contact. Is it correct to interpret that folks who are reducing their contacts to an even greater degree than that critical threshold are helping to mitigate the effect of people who are not [willing and able] to do physical distancing?
CC: I would say yes. There’s going to be a spread there, and that spread isn’t something we have in the model. All of these models will have some threshold*: below it, cases decline. Above it, cases grow exponentially, in the early stage. That’s generic to all epidemic models, this one included.
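A simplified version of this threshold can be written down for a two-group model in which a fraction `e` of people distance and their contact rate is scaled by `f` both as infectors and as contacts (proportionate mixing). Early in the epidemic, the next-generation matrix of such a model has rank one, and its dominant eigenvalue gives an effective reproduction number R_eff = R0 × ((1 - e) + e × f²); cases decline when R_eff < 1. The Python sketch below, with hypothetical values R0 = 2.5 and e = 0.8, is our own simplification, not a result from the pre-print. It also shows the trade-off raised in the question: the fewer people who distance, the further distancers must cut contact, and if non-distancers alone keep R_eff above 1, no value of f is enough.

```python
import math
from typing import Optional


def effective_r(r0: float, e: float, f: float) -> float:
    """R_eff = R0 * ((1 - e) + e * f**2) for the simplified two-group model."""
    return r0 * ((1 - e) + e * f * f)


def critical_f(r0: float, e: float) -> Optional[float]:
    """Largest distancer contact multiplier f that keeps R_eff <= 1, or None if none does."""
    if e == 0:
        return 1.0 if r0 <= 1 else None
    target = 1 / r0 - (1 - e)  # the most that e * f**2 may contribute
    if target < 0:
        return None  # non-distancers alone sustain growth
    return min(1.0, math.sqrt(target / e))


threshold_80 = critical_f(2.5, 0.8)   # about 0.5
threshold_70 = critical_f(2.5, 0.7)   # about 0.38: fewer distancers must cut back more
threshold_50 = critical_f(2.5, 0.5)   # None
```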
*Editor's Note: As of August 19, 2020, British Columbia is above the threshold for epidemic control. Visit the British Columbia COVID-19 Dashboard or get the BC COVID-19 app for more information. Contacted for an update on August 19, 2020, Professor Colijn added "the public can help by avoiding indoor socializing, keeping their distance from each other, not attending parties, bars, clubs and events where they don't know others, and others don't know them - think: how many people would have to be phoned if I got symptoms tomorrow? Do I know their numbers? Would these folks know how to contact me, if they got ill?"