In this vignette, we describe a typical example of building a model in R, which uses the demography and templates available through the Montagu API, and submited burden estimates to it. We will proceed very naively and cover all of the important functionality on the way.

Identify the Montagu Server

Begin by identifying the Montagu server to connect to. You must have an account on this server; typically this will be our live production server, and you will have received login information on joining VIMC. Identify the server as follows:-

montagu::montagu_server_global_default_set(
  montagu::montagu_server("production", "montagu.vaccineimpact.org"))

The next time you query the server, you may be asked to login with a username and password.

Basic Information

For many of the montagu api functions, you will need to provide your modelling group id, and your disease id. Your group id will have been communicated on joining VIMC; the disease ids can be looked up with a simple function call like this:-

##         id                         name
## 1     HepB                         HepB
## 2      Hib                          Hib
## 3      HPV                          HPV
## 4       JE                           JE
## 5  Measles                      Measles
## 6     MenA                         MenA
## 7      PCV                          PCV
## 8     Rota                    Rotavirus
## 9  Rubella                      Rubella
## 10      YF                 Yellow fever
## 11     DTP Diphtheria Tetanus Pertussis

Touchstones

A touchstone is an identifier that links:

  • A request from VIMC to the modelling groups to provide burden estimates
  • The input data (coverage and demography) provided to calculate those estimates
  • The estimates submitted by the groups using the input data.

Touchstone ids are referred to as a basename and a version together. Errata to the input data may be addressed with a new touchstone version. Generally, modelling groups will know which touchstone they are required to run models against. To lookup all the currently open touchstones for a group:-

montagu::montagu_touchstone_versions("IC-Garske", require_open = TRUE)
##                  id            name version                 description
## 4      201910test-1      201910test       1      201910test (version 1)
## 7      201710gavi-5      201710gavi       5      201710gavi (version 5)
## 8 201810synthetic-2 201810synthetic       2 201810synthetic (version 2)
##   status
## 4   open
## 7   open
## 8   open

If we knew the basic touchstone name (without version) that we were interested in, then we could also have asked:

montagu::montagu_touchstone_versions("IC-Garske", "201710gavi", require_open = TRUE)
##             id       name version            description status
## 1 201710gavi-5 201710gavi       5 201710gavi (version 5)   open

Scenarios

Within a touchstone, we may have various scenarios, describing different vaccination conditions. These vary across diseases due to the different types of vaccination approaches that have been employed. We look our the scenarios we are required to model for a touchstone with:

montagu::montagu_scenarios("IC-Garske", "201710gavi-5")
##          scenario_id                  description disease
## 1  yf-no-vaccination               No vaccination      YF
## 2    yf-routine-gavi               Routine, total      YF
## 3 yf-preventive-gavi Campaign (preventive), total      YF

We can then look up the status a particular scenario within this touchstone:

montagu::montagu_scenario_status("IC-Garske", "201710gavi-5", "yf-routine-gavi")
## [1] "valid"

and if that status were not valid, then we can query any problems with:

montagu::montagu_scenario_problems("IC-Garske", "201710gavi-5", "yf-routine-gavi")
## NULL

Expectations - years, ages, countries and outcomes

For a given touchstone, modelling groups are required to produce burden estimates stratified by calendar year, and by age. The expectations about time and age will be consistent across scenarios. We can look up these expectations like so:

montagu::montagu_expectations("IC-Garske", "201710gavi-5")
##   id           description min_year max_year min_age max_age
## 1 30 YF:IC-Garske:standard     2000     2100       0     100
##   min_birth_cohort max_birth_cohort disease
## 1             1900             2100      YF

Some modelling groups model more than one disease, in which case a row appears for each disease. And additionally, if there is some reason to have different expectations for different scenarios, then multiple lines for the same disease may appear, with a difference in the description column indicating the scenario.

If we already knew the expectation id, we could look up a single expectation with

montagu::montagu_expectation("IC-Garske", "201710gavi-5", 30)
## $id
## [1] 30
##
## $description
## [1] "YF:IC-Garske:standard"
##
## $min_year
## [1] 2000
##
## $max_year
## [1] 2100
##
## $min_age
## [1] 0
##
## $max_age
## [1] 100
##
## $min_birth_cohort
## [1] 1900
##
## $max_birth_cohort
## [1] 2100
##
## $disease
## NULL

The countries for which estimates are required can vary between disease, and between scenarios. To retrieve the list of countries that are required for a particular expectation:-

countries <- montagu::montagu_expectation_countries("IC-Garske", "201710gavi-5", 30)
head(countries)
##    id                     name
## 1 AGO                   Angola
## 2 BDI                  Burundi
## 3 BEN                    Benin
## 4 BFA             Burkina Faso
## 5 CAF Central African Republic
## 6 CIV            Cote d'Ivoire

To see which outcomes are required for an expectation:

montagu::montagu_expectation_outcomes("IC-Garske", "201710gavi-5", 30)
## [[1]]
## [[1]]$code
## [1] "deaths"
##
## [[1]]$name
## [1] "deaths"
##
##
## [[2]]
## [[2]]$code
## [1] "cases"
##
## [[2]]$name
## [1] "cases"
##
##
## [[3]]
## [[3]]$code
## [1] "dalys"
##
## [[3]]$name
## [1] "dalys"

And to confirm which scnearios an expectation applies to (which should align with the description text for the expectation):

montagu::montagu_expectation_applicable_scenarios("IC-Garske", "201710gavi-5", 30)
## [1] "yf-no-vaccination"  "yf-preventive-gavi" "yf-routine-gavi"

Demography

Montagu provides standardised demographic data necessary for running vaccine models. In the early stages of VIMC, we surveyed the groups to ascertain what demographic data they required, and Montagu provides the superset of all those data. If, however, your model needs demographic data not provided by Montagu, get in touch, as we want to make sure that demography is always consistent between different models.

We can list the available demographic data associated with a touchstone like this:-

##              id                                             name gendered
## 1       as_fert                Fertility: Age-specific fertility    FALSE
## 2        births                    Fertility: Births (number of)    FALSE
## 3     qq_births     Fertility: Births (number of) - quinquennial    FALSE
## 4     as_births               Fertility: Births by age of mother    FALSE
## 5           cbr                Fertility: Crude birth rate (CBR)    FALSE
## 6      birth_mf                    Fertility: Sex-ratio at birth    FALSE
## 7      fert_tot                       Fertility: Total Fertility    FALSE
## 8  net_mig_rate                    Migration: Net migration rate    FALSE
## 9     mort_rate             Mortality: Central Death Rate (ASMR)     TRUE
## 10          cdr                Mortality: Crude death rate (CDR)    FALSE
## 11     mort_age                Mortality: Deaths (number) by age     TRUE
## 12     mort_tot                 Mortality: Deaths (total number)     TRUE
## 13      life_ex      Mortality: Expected remaining years of life     TRUE
## 14          lx0              Mortality: Life expectancy at birth     TRUE
## 15 unwpp_cm_nmr  Mortality: Neonatal Mortality Rate (28-day NMR)    FALSE
## 16      p_dying           Mortality: Probability of dying by age     TRUE
## 17  n_survivors Mortality: Survivors from a birth-cohort of 100k     TRUE
## 18    unwpp_imr          Mortality: Under 1 Mortality Rate (IMR)    FALSE
## 19   unwpp_u5mr         Mortality: Under 5 Mortality Rate (U5MR)    FALSE
## 20          pgr                          Population: Growth Rate    FALSE
## 21      int_pop   Population: Interpolated (1-year time and age)     TRUE
## 22       qq_pop   Population: Quinquennial (5-year time and age)     TRUE
## 23      tot_pop                                Population: Total     TRUE
##        source
## 1  dds-201710
## 2  dds-201710
## 3  dds-201710
## 4  dds-201710
## 5  dds-201710
## 6  dds-201710
## 7  dds-201710
## 8  dds-201710
## 9  dds-201710
## 10 dds-201710
## 11 dds-201710
## 12 dds-201710
## 13 dds-201710
## 14 dds-201710
## 15 dds-201710
## 16 dds-201710
## 17 dds-201710
## 18 dds-201710
## 19 dds-201710
## 20 dds-201710
## 21 dds-201710
## 22 dds-201710
## 23 dds-201710

We can then retrieve a particular demographic statistic as follows:-

##   country_code_numeric country_code     country age_from age_to year
## 1                    4          AFG Afghanistan        0      0 1950
## 2                    4          AFG Afghanistan        0      0 1951
## 3                    4          AFG Afghanistan        0      0 1952
## 4                    4          AFG Afghanistan        0      0 1953
## 5                    4          AFG Afghanistan        0      0 1954
## 6                    4          AFG Afghanistan        0      0 1955
##   gender    value
## 1   both 0.050000
## 2   both 0.050082
## 3   both 0.050244
## 4   both 0.050399
## 5   both 0.050546
## 6   both 0.050688

Gender

The example above, the central birth rate, is not gender-specific. Other fields are gender-specific, where data can be downloaded for ‘female’, ‘male’, or the total of all people is called ‘both’. By default ‘both’ is returned, but to select a specific gender:

##   country_code_numeric country_code     country age_from age_to year
## 1                    4          AFG Afghanistan        0    120 1950
## 2                    4          AFG Afghanistan        0    120 1951
## 3                    4          AFG Afghanistan        0    120 1952
## 4                    4          AFG Afghanistan        0    120 1953
## 5                    4          AFG Afghanistan        0    120 1954
## 6                    4          AFG Afghanistan        0    120 1955
##   gender   value
## 1 female 3652874
## 2 female 3705031
## 3 female 3760979
## 4 female 3820747
## 5 female 3884348
## 6 female 3951800

Format

Data is also available in long format (the default), in which years and ages are reported in separate rows, or wide format, in which the years of time are represented as columns, making the resulting data shorter, but wider. For example:-

as_fert <- montagu::montagu_demographic_data("as_fert", "201710gavi-5")
names(as_fert)
## [1] "country_code_numeric" "country_code"         "country"
## [4] "age_from"             "age_to"               "year"
## [7] "gender"               "value"
nrow(as_fert)
## [1] 21000
##  [1] "country_code_numeric" "country_code"         "country"
##  [4] "age_from"             "age_to"               "gender"
##  [7] "X1950"                "X1955"                "X1960"
## [10] "X1965"                "X1970"                "X1975"
## [13] "X1980"                "X1985"                "X1990"
## [16] "X1995"                "X2000"                "X2005"
## [19] "X2010"                "X2015"                "X2020"
## [22] "X2025"                "X2030"                "X2035"
## [25] "X2040"                "X2045"                "X2050"
## [28] "X2055"                "X2060"                "X2065"
## [31] "X2070"                "X2075"                "X2080"
## [34] "X2085"                "X2090"                "X2095"
nrow(as_fert_wide)
## [1] 700

Templates

The above will give all the information required to build your own data frame for submitting the expected results. Alternatively, you can download burden estimate templates with the years, countries and ages already populated:-

##   disease year age country country_name cohort_size deaths cases dalys
## 1      YF 2000   0     AGO       Angola          NA     NA    NA    NA
## 2      YF 2001   0     AGO       Angola          NA     NA    NA    NA
## 3      YF 2002   0     AGO       Angola          NA     NA    NA    NA
## 4      YF 2003   0     AGO       Angola          NA     NA    NA    NA
## 5      YF 2004   0     AGO       Angola          NA     NA    NA    NA
## 6      YF 2005   0     AGO       Angola          NA     NA    NA    NA

Coverage

We now need to obtain the coverage data relevant for our disease and touchstone, for each of the scenarios we’re going to model. Let’s look at the yf-routine-gavi coverage metadata.

montagu::montagu_coverage_info("IC-Garske", "201710gavi-5", "yf-routine-gavi")
##     id touchstone_version                   name vaccine gavi_support
## 1 1182       201710gavi-5 YF: YF, none, campaign      YF   no vaccine
## 2  651       201710gavi-5  YF: YF, with, routine      YF        total
##   activity_type
## 1      campaign
## 2       routine

And now let’s retrieve the coverage data itself, and summarise it.

cov <- montagu::montagu_coverage_data("IC-Garske", "201710gavi-5", "yf-routine-gavi")
head(cov[cov$coverage > 0, ])
##          scenario               set_name vaccine gavi_support
## 1 yf-routine-gavi YF: YF, none, campaign      YF   no vaccine
## 2 yf-routine-gavi YF: YF, none, campaign      YF   no vaccine
## 3 yf-routine-gavi YF: YF, none, campaign      YF   no vaccine
## 4 yf-routine-gavi YF: YF, none, campaign      YF   no vaccine
## 5 yf-routine-gavi YF: YF, none, campaign      YF   no vaccine
## 6 yf-routine-gavi YF: YF, none, campaign      YF   no vaccine
##   activity_type country_code country year age_first age_last
## 1      campaign          AGO  Angola 1988         1      100
## 2      campaign          AGO  Angola 2016         1      100
## 3      campaign          BEN   Benin 1940         0      100
## 4      campaign          BEN   Benin 1944         0      100
## 5      campaign          BEN   Benin 1948         0      100
## 6      campaign          BEN   Benin 1952         0      100
##   age_range_verbatim           target   coverage
## 1               <NA> 2689316.38932258 0.24926759
## 2               <NA>          7067755 0.03486047
## 3               <NA>             <NA> 0.10682008
## 4               <NA>             <NA> 0.60071624
## 5               <NA>             <NA> 0.60228038
## 6               <NA>             <NA> 0.70605678
range(cov$year)
## [1] 1940 2100
length(unique(cov$country_code))
## [1] 32
nrow(cov)
## [1] 4048
ncol(cov)
## [1] 13

More information on the coverage is available in the Montagu Web Portal, but briefly:

  • Coverage is measured as a proportion of the original target population. It is possible for this to exceed 1, if more people were vaccinated than originally targeted.
  • Target is the number of individuals targeted for vaccination. If this is NA, then assume the target population matches the demographic data, and the whole population is targeted.

Long or Wide format

The data above is formatted in what we call long format, in which one year is reported per row. Alternatively, you can specify:

cov <- montagu::montagu_coverage_data("IC-Garske", "201710gavi-5", "yf-routine-gavi", format="wide")
names(cov)[1:20]
##  [1] "scenario"           "set_name"           "vaccine"
##  [4] "gavi_support"       "activity_type"      "country_code"
##  [7] "country"            "age_first"          "age_last"
## [10] "age_range_verbatim" "coverage_1988"      "coverage_2016"
## [13] "target_1988"        "target_2016"        NA
## [16] NA                   NA                   NA
## [19] NA                   NA
nrow(cov)
## [1] 71
ncol(cov)
## [1] 14

Here, we have one row per country, and separate columns for coverage_year and target_year.

Returning the full range of countries

By default, the coverage data returned is limited to those countries for which burden estimate data is required for the given scenario. HepB is a particular example, where different sets of countries are relevant in different scenarios, because the birth-dose vaccination is not applied in all countries where there is HepB vaccination. However, some groups prefer to model all of these scenarios similarly, with the same numbers of countries, hence they require the full coverage data for all countries. And for some diseases, the historic effects of vaccination in a country that is not required for estimates as present, may have relevance in their model to countries that are required.

For those cases, where all countries are required, the optional all_countries argument can be used.

Burden Estimate Sets

The burden estimate set defines the results of the models, which modelling groups submit back to Montagu. To view the burden estimate sets already uploaded, for a group, touchstone and scenario:-

head(montagu::montagu_burden_estimate_sets("IC-Garske", "201710gavi-5", "yf-no-vaccination"))
##     id              uploaded_on    uploaded_by               type
## 2  687 2018-04-06T13:08:18.883Z katy.gaythorpe   central-averaged
## 4  698 2018-05-30T09:27:32.341Z    tini.garske central-single-run
## 1 1034 2019-04-04T13:01:58.052Z katy.gaythorpe   central-averaged
## 3 1037 2019-04-05T09:04:59.238Z katy.gaythorpe   central-averaged
##                                                                details
## 2 Averaged over posterior predictive distribution for model parameters
## 4                                                                    x
## 1                      The median of posterior predictive distribution
## 3                      The median of posterior predictive distribution
##     status
## 2 complete
## 4    empty
## 1 complete
## 3 complete

If we know the burden estimate set id, we can also get a list of information about a burden estimate set with:

montagu::montagu_burden_estimate_set_info("IC-Garske", "201710gavi-5", "yf-no-vaccination", 687)
## $id
## [1] 687
##
## $uploaded_on
## [1] "2018-04-06T13:08:18.883Z"
##
## $uploaded_by
## [1] "katy.gaythorpe"
##
## $type
## [1] "central-averaged"
##
## $details
## [1] "Averaged over posterior predictive distribution for model parameters"
##
## $status
## [1] "complete"

If there are any reported problems with a burden estimate set, we can examine those with

montagu::montagu_burden_estimate_set_problems("IC-Garske", "201710gavi-5", "yf-no-vaccination", 687)
## list()

and we can retrieve the data for the burden estimate set with:

data <- montagu::montagu_burden_estimate_set_data("IC-Garske", "201710gavi-5", "yf-no-vaccination", 687)
head(data)
##   disease year age country             country_name cohort_size
## 1      YF 2000   0     AGO                   Angola      700752
## 2      YF 2000   0     BDI                  Burundi      240741
## 3      YF 2000   0     BEN                    Benin      262535
## 4      YF 2000   0     BFA             Burkina Faso      479329
## 5      YF 2000   0     CAF Central African Republic      134101
## 6      YF 2000   0     CIV            Cote d'Ivoire      621898
##         cases        dalys       deaths
## 1  318.439060   8546.53100  149.6663700
## 2    1.367271     36.17757    0.6426176
## 3  391.006130  10355.80300  183.7728900
## 4  886.209960  22402.71700  416.5186800
## 5  131.855550   3089.01600   61.9721070
## 6 4620.652300 106431.76000 2171.7068000

Creating a new burden estimate set

A new burden estimate set can be created as follows:-

The fourth parameter, the type, is either: * central-averaged - for a central estimate that is calculated by the average of a number of runs * central-single-run - for a stand-alone central estimate run.

The result of the burden estimate set creation is the id of the newly created burden estimate set, which must be given when we later upload results.

Uploading data

Suppose we have now populated the appropriate burden estimate template with results, and we want to upload it. If results is our completed template, then the simplest upload function call is:-

This will upload the completed data, and close the burden estimate set.

Plotting previous burden estimate sets

One further function call useful for plotting, allows retrieving a particular burden outcome from a burden estimate set, summed across all countries, and dis-aggregated by either age or by year. The result will be a sequence of sets of (x,y) points, suitable for straightforward plotting. For example:

##   age    x        y
## 1   0 2000 21955.19
## 2   0 2001 22429.77
## 3   0 2002 22904.21
## 4   0 2003 23388.83
## 5   0 2004 23899.00
## 6   0 2005 24444.64
plot(data$x[data$age==20], data$y[data$age==20], xlab="year", ylab="cases", main="For age 20")

Or to dis-aggregate by year:-

##       year x        y
## 20001 2000 0 21955.19
## 20002 2000 1 20183.78
## 20003 2000 2 18625.01
## 20004 2000 3 17217.85
## 20005 2000 4 15857.58
## 20006 2000 5 14775.94
plot(data$x[data$year==2040], data$y[data$year==2040], xlab="age", ylab="cases", main="For year 2040")

Stochastic Runs

In order to establish confidence intervals, modelling groups are sometimes asked to run a stochastic ensemble of models. This may be, for instance, 200 instances of a model, with different parameters for each instance. We would call the instance number, between 1 and 200, the run_id, and there would be a unique set of parameters for each.

It is important that for a given run_id, all the scenarios are run with the matching set of parameters for that run_id, so that we can calculate impacts by comparing scenarios.

It is also important that the spread of parameters chosen across the 200 runs significantly captures the range of sensible behaviour of the model, and that the central estimates submitted are somewhere near the centre of the spread of parameters.

Lastly, it is desirable to vary as few parameters as possible, so that with 200 runs, we can see how the parameter changes affect the model outcomes. If there are too many parameters, it may be difficult to ascertain which parameters are causing the changes, or whether the spread of parameters really does capture the significant behaviour of the model.

The R client cannot currently support the full upload process of stochastic runs. Because of their large volume, and the international locations of modelling groups, dropbox provides a more robust service for uploading. It is possible, however, to work with model run parameter sets, and using the Montagu client will make it easier to produce the stochastic results (perhaps directly into a local dropbox folder), in a systematic way.

Model Run Parameter Sets

The model run parameter set has a column for run_id, and another column for each parameter being varied. For example:

params <- data.frame(run_id = 1:5, param_1 = sample(5), param_2 = sample(5))
params
##   run_id param_1 param_2
## 1      1       3       2
## 2      2       4       1
## 3      3       2       3
## 4      4       5       5
## 5      5       1       4

This can then be uploaded as follows:-

The id of the newly created parameter set will be returned.

We can also list the existing model run parameter sets with:-

head(montagu::montagu_model_run_parameter_sets("IC-Garske", "201710gavi-5"))
##   id model    uploaded_by              uploaded_on disease
## 1 20  YFIC katy.gaythorpe 2018-04-06T13:10:35.645Z      YF
## 2 19  YFIC katy.gaythorpe 2018-03-27T10:00:52.710Z      YF
## 3 15  YFIC katy.gaythorpe 2018-01-23T15:52:58.353Z      YF

or for a list of information about a single parameter set:

montagu::montagu_model_run_parameter_set_info("IC-Garske", "201710gavi-5", 20)
## $id
## [1] 20
##
## $model
## [1] "YFIC"
##
## $uploaded_by
## [1] "katy.gaythorpe"
##
## $uploaded_on
## [1] "2018-04-06T13:10:35.645Z"
##
## $disease
## [1] "YF"

Finally, if we want to retrieve the parameter values for a previous set:-

##   run_id     FOI_AGO     FOI_BEN    FOI_BFA      FOI_BDI
## 1      1 0.004549685 0.015696021 0.02084831 3.071269e-05
## 2      3 0.003717290 0.011943511 0.01713641 4.950013e-05
## 3      5 0.003522454 0.009811987 0.01321275 1.058319e-04
## 4      6 0.003485912 0.012657722 0.01655512 7.327719e-05
## 5      8 0.002722810 0.007266782 0.00851512 4.569112e-05

Appendix

Models

Montagu allows querying of basic information about all models, past and present used in VIMC. These can be listed as follows:-

##                id
## 1        Yakob-YF
## 2  Winter-Rubella
## 3         PATH-JE
## 4 Harvard-Kim-HPV
## 5        Moore-JE
## 6      Perkins-YF
##                                                                       description
## 1                         generic 201810rfp model - to be updated if joining VIMC
## 2                         generic 201810rfp model - to be updated if joining VIMC
## 3 JE model used in SDF8 and SDF12. This is a static population-based cohort model
## 4                       generic 201804rfp-1 model - to be updated if joining VIMC
## 5                         generic 201810rfp model - to be updated if joining VIMC
## 6                         generic 201810rfp model - to be updated if joining VIMC
##                   citation modelling_group
## 1 <citation for the model>     LSHTM-Yakob
## 2 <citation for the model>       PU-Winter
## 3 <citation for the model>    Suraratdecha
## 4 <citation for the model>     Harvard-Kim
## 5                                UND-Moore
## 6                              UND-Perkins

and a list of information about a particular model by its id can be retrieved like this:-

## $id
## [1] "YFIC"
##
## $description
## [1] "YF model developed at Imperial College London, used in SDF8 and SDF12. This is a generalised linear regression model fitted to locations of reported yellow fever outbreaks or cases with anthropogenic and environmental covariates generating a risk map for yellow fever across Africa. It was then compared to the available serological data, primarily from central and eastern Africa, to assess the country-specific scale of under-reporting and obtain absolute estimates of transmission intensity in terms of a static force of infection or a basic reproduction number in a dynamical model, taking into account the population-level vaccination coverage through time. Burden and vaccine impact estimates were then generated by combining the transmission intensity with demographic information (population size and age distribution) and observed and hypothetical vaccination coverages in any given location and year."
##
## $citation
## [1] "10.1371/journal.pmed.1001638"
##
## $modelling_group
## [1] "IC-Garske"

Note that the model_id (id column above) is different from the modelling_group_id, and generally the modelling_group_id is the significant identifier required for other Montagu functions.

Demographic Sources

Occasionally, there may be multiple instances of the same demographic data field, but for different sources. For example, infant mortality is provided by both UNWPP, and IGME (ChildMortality.org). Neither sources are ideal, and Montagu provides a documented compromise between the two sources for infant, and especially neonatal mortality.

If an example arises where there are two instances of the same data field, then they will have a different data_source, which will need to be specified in the form:-

where the The-Source-Code is in the table returned by