vignettes/montagu_user_guide.Rmd
montagu_user_guide.Rmd
In this vignette, we describe a typical example of building a model in R, which uses the demography and templates available through the Montagu API, and submited burden estimates to it. We will proceed very naively and cover all of the important functionality on the way.
Begin by identifying the Montagu server to connect to. You must have an account on this server; typically this will be our live production server, and you will have received login information on joining VIMC. Identify the server as follows:-
montagu::montagu_server_global_default_set(
montagu::montagu_server("production", "montagu.vaccineimpact.org"))
The next time you query the server, you may be asked to login with a username and password.
For many of the montagu api functions, you will need to provide your modelling group id, and your disease id. Your group id will have been communicated on joining VIMC; the disease ids can be looked up with a simple function call like this:-
## id name
## 1 HepB HepB
## 2 Hib Hib
## 3 HPV HPV
## 4 JE JE
## 5 Measles Measles
## 6 MenA MenA
## 7 PCV PCV
## 8 Rota Rotavirus
## 9 Rubella Rubella
## 10 YF Yellow fever
## 11 DTP Diphtheria Tetanus Pertussis
A touchstone is an identifier that links:
Touchstone ids are referred to as a basename and a version together. Errata to the input data may be addressed with a new touchstone version. Generally, modelling groups will know which touchstone they are required to run models against. To lookup all the currently open touchstones for a group:-
## id name version description
## 4 201910test-1 201910test 1 201910test (version 1)
## 7 201710gavi-5 201710gavi 5 201710gavi (version 5)
## 8 201810synthetic-2 201810synthetic 2 201810synthetic (version 2)
## status
## 4 open
## 7 open
## 8 open
If we knew the basic touchstone name (without version) that we were interested in, then we could also have asked:
## id name version description status
## 1 201710gavi-5 201710gavi 5 201710gavi (version 5) open
Within a touchstone, we may have various scenarios, describing different vaccination conditions. These vary across diseases due to the different types of vaccination approaches that have been employed. We look our the scenarios we are required to model for a touchstone with:
## scenario_id description disease
## 1 yf-no-vaccination No vaccination YF
## 2 yf-routine-gavi Routine, total YF
## 3 yf-preventive-gavi Campaign (preventive), total YF
We can then look up the status a particular scenario within this touchstone:
## [1] "valid"
and if that status were not valid, then we can query any problems with:
## NULL
For a given touchstone, modelling groups are required to produce burden estimates stratified by calendar year, and by age. The expectations about time and age will be consistent across scenarios. We can look up these expectations like so:
## id description min_year max_year min_age max_age
## 1 30 YF:IC-Garske:standard 2000 2100 0 100
## min_birth_cohort max_birth_cohort disease
## 1 1900 2100 YF
Some modelling groups model more than one disease, in which case a row appears for each disease. And additionally, if there is some reason to have different expectations for different scenarios, then multiple lines for the same disease may appear, with a difference in the description column indicating the scenario.
If we already knew the expectation id, we could look up a single expectation with
## $id
## [1] 30
##
## $description
## [1] "YF:IC-Garske:standard"
##
## $min_year
## [1] 2000
##
## $max_year
## [1] 2100
##
## $min_age
## [1] 0
##
## $max_age
## [1] 100
##
## $min_birth_cohort
## [1] 1900
##
## $max_birth_cohort
## [1] 2100
##
## $disease
## NULL
The countries for which estimates are required can vary between disease, and between scenarios. To retrieve the list of countries that are required for a particular expectation:-
countries <- montagu::montagu_expectation_countries("IC-Garske", "201710gavi-5", 30)
head(countries)
## id name
## 1 AGO Angola
## 2 BDI Burundi
## 3 BEN Benin
## 4 BFA Burkina Faso
## 5 CAF Central African Republic
## 6 CIV Cote d'Ivoire
To see which outcomes are required for an expectation:
## [[1]]
## [[1]]$code
## [1] "deaths"
##
## [[1]]$name
## [1] "deaths"
##
##
## [[2]]
## [[2]]$code
## [1] "cases"
##
## [[2]]$name
## [1] "cases"
##
##
## [[3]]
## [[3]]$code
## [1] "dalys"
##
## [[3]]$name
## [1] "dalys"
And to confirm which scnearios an expectation applies to (which should align with the description text for the expectation):
## [1] "yf-no-vaccination" "yf-preventive-gavi" "yf-routine-gavi"
Montagu provides standardised demographic data necessary for running vaccine models. In the early stages of VIMC, we surveyed the groups to ascertain what demographic data they required, and Montagu provides the superset of all those data. If, however, your model needs demographic data not provided by Montagu, get in touch, as we want to make sure that demography is always consistent between different models.
We can list the available demographic data associated with a touchstone like this:-
## id name gendered
## 1 as_fert Fertility: Age-specific fertility FALSE
## 2 births Fertility: Births (number of) FALSE
## 3 qq_births Fertility: Births (number of) - quinquennial FALSE
## 4 as_births Fertility: Births by age of mother FALSE
## 5 cbr Fertility: Crude birth rate (CBR) FALSE
## 6 birth_mf Fertility: Sex-ratio at birth FALSE
## 7 fert_tot Fertility: Total Fertility FALSE
## 8 net_mig_rate Migration: Net migration rate FALSE
## 9 mort_rate Mortality: Central Death Rate (ASMR) TRUE
## 10 cdr Mortality: Crude death rate (CDR) FALSE
## 11 mort_age Mortality: Deaths (number) by age TRUE
## 12 mort_tot Mortality: Deaths (total number) TRUE
## 13 life_ex Mortality: Expected remaining years of life TRUE
## 14 lx0 Mortality: Life expectancy at birth TRUE
## 15 unwpp_cm_nmr Mortality: Neonatal Mortality Rate (28-day NMR) FALSE
## 16 p_dying Mortality: Probability of dying by age TRUE
## 17 n_survivors Mortality: Survivors from a birth-cohort of 100k TRUE
## 18 unwpp_imr Mortality: Under 1 Mortality Rate (IMR) FALSE
## 19 unwpp_u5mr Mortality: Under 5 Mortality Rate (U5MR) FALSE
## 20 pgr Population: Growth Rate FALSE
## 21 int_pop Population: Interpolated (1-year time and age) TRUE
## 22 qq_pop Population: Quinquennial (5-year time and age) TRUE
## 23 tot_pop Population: Total TRUE
## source
## 1 dds-201710
## 2 dds-201710
## 3 dds-201710
## 4 dds-201710
## 5 dds-201710
## 6 dds-201710
## 7 dds-201710
## 8 dds-201710
## 9 dds-201710
## 10 dds-201710
## 11 dds-201710
## 12 dds-201710
## 13 dds-201710
## 14 dds-201710
## 15 dds-201710
## 16 dds-201710
## 17 dds-201710
## 18 dds-201710
## 19 dds-201710
## 20 dds-201710
## 21 dds-201710
## 22 dds-201710
## 23 dds-201710
We can then retrieve a particular demographic statistic as follows:-
## country_code_numeric country_code country age_from age_to year
## 1 4 AFG Afghanistan 0 0 1950
## 2 4 AFG Afghanistan 0 0 1951
## 3 4 AFG Afghanistan 0 0 1952
## 4 4 AFG Afghanistan 0 0 1953
## 5 4 AFG Afghanistan 0 0 1954
## 6 4 AFG Afghanistan 0 0 1955
## gender value
## 1 both 0.050000
## 2 both 0.050082
## 3 both 0.050244
## 4 both 0.050399
## 5 both 0.050546
## 6 both 0.050688
The example above, the central birth rate, is not gender-specific. Other fields are gender-specific, where data can be downloaded for ‘female’, ‘male’, or the total of all people is called ‘both’. By default ‘both’ is returned, but to select a specific gender:
female_pop <- montagu::montagu_demographic_data("tot_pop", "201710gavi-5",
gender_code = "female")
head(female_pop)
## country_code_numeric country_code country age_from age_to year
## 1 4 AFG Afghanistan 0 120 1950
## 2 4 AFG Afghanistan 0 120 1951
## 3 4 AFG Afghanistan 0 120 1952
## 4 4 AFG Afghanistan 0 120 1953
## 5 4 AFG Afghanistan 0 120 1954
## 6 4 AFG Afghanistan 0 120 1955
## gender value
## 1 female 3652874
## 2 female 3705031
## 3 female 3760979
## 4 female 3820747
## 5 female 3884348
## 6 female 3951800
Data is also available in long format (the default), in which years and ages are reported in separate rows, or wide format, in which the years of time are represented as columns, making the resulting data shorter, but wider. For example:-
## [1] "country_code_numeric" "country_code" "country"
## [4] "age_from" "age_to" "year"
## [7] "gender" "value"
## [1] 21000
as_fert_wide <- montagu::montagu_demographic_data("as_fert", "201710gavi-5",
wide = TRUE)
names(as_fert_wide)
## [1] "country_code_numeric" "country_code" "country"
## [4] "age_from" "age_to" "gender"
## [7] "X1950" "X1955" "X1960"
## [10] "X1965" "X1970" "X1975"
## [13] "X1980" "X1985" "X1990"
## [16] "X1995" "X2000" "X2005"
## [19] "X2010" "X2015" "X2020"
## [22] "X2025" "X2030" "X2035"
## [25] "X2040" "X2045" "X2050"
## [28] "X2055" "X2060" "X2065"
## [31] "X2070" "X2075" "X2080"
## [34] "X2085" "X2090" "X2095"
## [1] 700
The above will give all the information required to build your own data frame for submitting the expected results. Alternatively, you can download burden estimate templates with the years, countries and ages already populated:-
## disease year age country country_name cohort_size deaths cases dalys
## 1 YF 2000 0 AGO Angola NA NA NA NA
## 2 YF 2001 0 AGO Angola NA NA NA NA
## 3 YF 2002 0 AGO Angola NA NA NA NA
## 4 YF 2003 0 AGO Angola NA NA NA NA
## 5 YF 2004 0 AGO Angola NA NA NA NA
## 6 YF 2005 0 AGO Angola NA NA NA NA
We now need to obtain the coverage data relevant for our disease and touchstone, for each of the scenarios we’re going to model. Let’s look at the yf-routine-gavi
coverage metadata.
## id touchstone_version name vaccine gavi_support
## 1 1182 201710gavi-5 YF: YF, none, campaign YF no vaccine
## 2 651 201710gavi-5 YF: YF, with, routine YF total
## activity_type
## 1 campaign
## 2 routine
And now let’s retrieve the coverage data itself, and summarise it.
cov <- montagu::montagu_coverage_data("IC-Garske", "201710gavi-5", "yf-routine-gavi")
head(cov[cov$coverage > 0, ])
## scenario set_name vaccine gavi_support
## 1 yf-routine-gavi YF: YF, none, campaign YF no vaccine
## 2 yf-routine-gavi YF: YF, none, campaign YF no vaccine
## 3 yf-routine-gavi YF: YF, none, campaign YF no vaccine
## 4 yf-routine-gavi YF: YF, none, campaign YF no vaccine
## 5 yf-routine-gavi YF: YF, none, campaign YF no vaccine
## 6 yf-routine-gavi YF: YF, none, campaign YF no vaccine
## activity_type country_code country year age_first age_last
## 1 campaign AGO Angola 1988 1 100
## 2 campaign AGO Angola 2016 1 100
## 3 campaign BEN Benin 1940 0 100
## 4 campaign BEN Benin 1944 0 100
## 5 campaign BEN Benin 1948 0 100
## 6 campaign BEN Benin 1952 0 100
## age_range_verbatim target coverage
## 1 <NA> 2689316.38932258 0.24926759
## 2 <NA> 7067755 0.03486047
## 3 <NA> <NA> 0.10682008
## 4 <NA> <NA> 0.60071624
## 5 <NA> <NA> 0.60228038
## 6 <NA> <NA> 0.70605678
## [1] 1940 2100
## [1] 32
## [1] 4048
## [1] 13
More information on the coverage is available in the Montagu Web Portal, but briefly:
NA
, then assume the target population matches the demographic data, and the whole population is targeted.The data above is formatted in what we call long format
, in which one year is reported per row. Alternatively, you can specify:
cov <- montagu::montagu_coverage_data("IC-Garske", "201710gavi-5", "yf-routine-gavi", format="wide")
names(cov)[1:20]
## [1] "scenario" "set_name" "vaccine"
## [4] "gavi_support" "activity_type" "country_code"
## [7] "country" "age_first" "age_last"
## [10] "age_range_verbatim" "coverage_1988" "coverage_2016"
## [13] "target_1988" "target_2016" NA
## [16] NA NA NA
## [19] NA NA
## [1] 71
## [1] 14
Here, we have one row per country, and separate columns for coverage_year
and target_year
.
By default, the coverage data returned is limited to those countries for which burden estimate data is required for the given scenario. HepB is a particular example, where different sets of countries are relevant in different scenarios, because the birth-dose vaccination is not applied in all countries where there is HepB vaccination. However, some groups prefer to model all of these scenarios similarly, with the same numbers of countries, hence they require the full coverage data for all countries. And for some diseases, the historic effects of vaccination in a country that is not required for estimates as present, may have relevance in their model to countries that are required.
For those cases, where all countries are required, the optional all_countries
argument can be used.
The burden estimate set defines the results of the models, which modelling groups submit back to Montagu. To view the burden estimate sets already uploaded, for a group, touchstone and scenario:-
## id uploaded_on uploaded_by type
## 2 687 2018-04-06T13:08:18.883Z katy.gaythorpe central-averaged
## 4 698 2018-05-30T09:27:32.341Z tini.garske central-single-run
## 1 1034 2019-04-04T13:01:58.052Z katy.gaythorpe central-averaged
## 3 1037 2019-04-05T09:04:59.238Z katy.gaythorpe central-averaged
## details
## 2 Averaged over posterior predictive distribution for model parameters
## 4 x
## 1 The median of posterior predictive distribution
## 3 The median of posterior predictive distribution
## status
## 2 complete
## 4 empty
## 1 complete
## 3 complete
If we know the burden estimate set id, we can also get a list of information about a burden estimate set with:
## $id
## [1] 687
##
## $uploaded_on
## [1] "2018-04-06T13:08:18.883Z"
##
## $uploaded_by
## [1] "katy.gaythorpe"
##
## $type
## [1] "central-averaged"
##
## $details
## [1] "Averaged over posterior predictive distribution for model parameters"
##
## $status
## [1] "complete"
If there are any reported problems with a burden estimate set, we can examine those with
montagu::montagu_burden_estimate_set_problems("IC-Garske", "201710gavi-5", "yf-no-vaccination", 687)
## list()
and we can retrieve the data for the burden estimate set with:
data <- montagu::montagu_burden_estimate_set_data("IC-Garske", "201710gavi-5", "yf-no-vaccination", 687)
head(data)
## disease year age country country_name cohort_size
## 1 YF 2000 0 AGO Angola 700752
## 2 YF 2000 0 BDI Burundi 240741
## 3 YF 2000 0 BEN Benin 262535
## 4 YF 2000 0 BFA Burkina Faso 479329
## 5 YF 2000 0 CAF Central African Republic 134101
## 6 YF 2000 0 CIV Cote d'Ivoire 621898
## cases dalys deaths
## 1 318.439060 8546.53100 149.6663700
## 2 1.367271 36.17757 0.6426176
## 3 391.006130 10355.80300 183.7728900
## 4 886.209960 22402.71700 416.5186800
## 5 131.855550 3089.01600 61.9721070
## 6 4620.652300 106431.76000 2171.7068000
A new burden estimate set can be created as follows:-
bsid <- montagu::montagu_burden_estimate_set_create(
"IC-Garske", "201710gavi-5", "yf-no-vaccination", "central-averaged",
details = "Details about model")
The fourth parameter, the type, is either: * central-averaged
- for a central estimate that is calculated by the average of a number of runs * central-single-run
- for a stand-alone central estimate run.
The result of the burden estimate set creation is the id of the newly created burden estimate set, which must be given when we later upload results.
Suppose we have now populated the appropriate burden estimate template with results, and we want to upload it. If results
is our completed template, then the simplest upload function call is:-
montagu::montagu_burden_estimate_set_upload(
"IC-Garske", "201710gavi-5", "yf-routine-gavi", bsid, results)
This will upload the completed data, and close the burden estimate set.
One further function call useful for plotting, allows retrieving a particular burden outcome from a burden estimate set, summed across all countries, and dis-aggregated by either age or by year. The result will be a sequence of sets of (x,y) points, suitable for straightforward plotting. For example:
data <- montagu::montagu_burden_estimate_set_outcome_data("IC-Garske", "201710gavi-5",
"yf-no-vaccination", 687, "cases", group_by = "age")
head(data)
## age x y
## 1 0 2000 21955.19
## 2 0 2001 22429.77
## 3 0 2002 22904.21
## 4 0 2003 23388.83
## 5 0 2004 23899.00
## 6 0 2005 24444.64
Or to dis-aggregate by year:-
data <- montagu::montagu_burden_estimate_set_outcome_data("IC-Garske", "201710gavi-5",
"yf-no-vaccination", 687, "cases", group_by = "year")
head(data)
## year x y
## 20001 2000 0 21955.19
## 20002 2000 1 20183.78
## 20003 2000 2 18625.01
## 20004 2000 3 17217.85
## 20005 2000 4 15857.58
## 20006 2000 5 14775.94
In order to establish confidence intervals, modelling groups are sometimes asked to run a stochastic ensemble of models. This may be, for instance, 200 instances of a model, with different parameters for each instance. We would call the instance number, between 1 and 200, the run_id
, and there would be a unique set of parameters for each.
It is important that for a given run_id
, all the scenarios are run with the matching set of parameters for that run_id
, so that we can calculate impacts by comparing scenarios.
It is also important that the spread of parameters chosen across the 200 runs significantly captures the range of sensible behaviour of the model, and that the central estimates submitted are somewhere near the centre of the spread of parameters.
Lastly, it is desirable to vary as few parameters as possible, so that with 200 runs, we can see how the parameter changes affect the model outcomes. If there are too many parameters, it may be difficult to ascertain which parameters are causing the changes, or whether the spread of parameters really does capture the significant behaviour of the model.
The R client cannot currently support the full upload process of stochastic runs. Because of their large volume, and the international locations of modelling groups, dropbox provides a more robust service for uploading. It is possible, however, to work with model run parameter sets, and using the Montagu client will make it easier to produce the stochastic results (perhaps directly into a local dropbox folder), in a systematic way.
The model run parameter set has a column for run_id
, and another column for each parameter being varied. For example:
## run_id param_1 param_2
## 1 1 3 2
## 2 2 4 1
## 3 3 2 3
## 4 4 5 5
## 5 5 1 4
This can then be uploaded as follows:-
The id of the newly created parameter set will be returned.
We can also list the existing model run parameter sets with:-
## id model uploaded_by uploaded_on disease
## 1 20 YFIC katy.gaythorpe 2018-04-06T13:10:35.645Z YF
## 2 19 YFIC katy.gaythorpe 2018-03-27T10:00:52.710Z YF
## 3 15 YFIC katy.gaythorpe 2018-01-23T15:52:58.353Z YF
or for a list of information about a single parameter set:
## $id
## [1] 20
##
## $model
## [1] "YFIC"
##
## $uploaded_by
## [1] "katy.gaythorpe"
##
## $uploaded_on
## [1] "2018-04-06T13:10:35.645Z"
##
## $disease
## [1] "YF"
Finally, if we want to retrieve the parameter values for a previous set:-
data <- montagu::montagu_model_run_parameter_set_data("IC-Garske", "201710gavi-5", 20)
data[1:5,1:5]
## run_id FOI_AGO FOI_BEN FOI_BFA FOI_BDI
## 1 1 0.004549685 0.015696021 0.02084831 3.071269e-05
## 2 3 0.003717290 0.011943511 0.01713641 4.950013e-05
## 3 5 0.003522454 0.009811987 0.01321275 1.058319e-04
## 4 6 0.003485912 0.012657722 0.01655512 7.327719e-05
## 5 8 0.002722810 0.007266782 0.00851512 4.569112e-05
Montagu allows querying of basic information about all models, past and present used in VIMC. These can be listed as follows:-
## id
## 1 Yakob-YF
## 2 Winter-Rubella
## 3 PATH-JE
## 4 Harvard-Kim-HPV
## 5 Moore-JE
## 6 Perkins-YF
## description
## 1 generic 201810rfp model - to be updated if joining VIMC
## 2 generic 201810rfp model - to be updated if joining VIMC
## 3 JE model used in SDF8 and SDF12. This is a static population-based cohort model
## 4 generic 201804rfp-1 model - to be updated if joining VIMC
## 5 generic 201810rfp model - to be updated if joining VIMC
## 6 generic 201810rfp model - to be updated if joining VIMC
## citation modelling_group
## 1 <citation for the model> LSHTM-Yakob
## 2 <citation for the model> PU-Winter
## 3 <citation for the model> Suraratdecha
## 4 <citation for the model> Harvard-Kim
## 5 UND-Moore
## 6 UND-Perkins
and a list of information about a particular model by its id can be retrieved like this:-
## $id
## [1] "YFIC"
##
## $description
## [1] "YF model developed at Imperial College London, used in SDF8 and SDF12. This is a generalised linear regression model fitted to locations of reported yellow fever outbreaks or cases with anthropogenic and environmental covariates generating a risk map for yellow fever across Africa. It was then compared to the available serological data, primarily from central and eastern Africa, to assess the country-specific scale of under-reporting and obtain absolute estimates of transmission intensity in terms of a static force of infection or a basic reproduction number in a dynamical model, taking into account the population-level vaccination coverage through time. Burden and vaccine impact estimates were then generated by combining the transmission intensity with demographic information (population size and age distribution) and observed and hypothetical vaccination coverages in any given location and year."
##
## $citation
## [1] "10.1371/journal.pmed.1001638"
##
## $modelling_group
## [1] "IC-Garske"
Note that the model_id (id column above) is different from the modelling_group_id, and generally the modelling_group_id is the significant identifier required for other Montagu functions.
Occasionally, there may be multiple instances of the same demographic data field, but for different sources. For example, infant mortality is provided by both UNWPP, and IGME (ChildMortality.org). Neither sources are ideal, and Montagu provides a documented compromise between the two sources for infant, and especially neonatal mortality.
If an example arises where there are two instances of the same data field, then they will have a different data_source, which will need to be specified in the form:-
where the The-Source-Code is in the table returned by