Title: | Parametric Mixture Models for Uncertainty Estimation of Fatalities in UCDP Conflict Data |
---|---|
Description: | Provides functions for estimating uncertainty in the number of fatalities in the Uppsala Conflict Data Program (UCDP) data. The package implements a parametric reported-value Gumbel mixture distribution that accounts for this uncertainty. The model is based on a survey of UCDP coders about how they view the uncertainty of the fatality counts of UCDP events. The package provides functions for making random draws of fatalities from the mixture distribution and for estimating percentiles, quantiles, means, and other statistics of the distribution. Full details on the survey and estimation procedure can be found in Vesco et al. (2024). |
Authors: | David Randahl [cre, aut] |
Maintainer: | David Randahl <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.5.2 |
Built: | 2025-01-29 03:22:07 UTC |
Source: | https://github.com/doktorandahl/uncertainucdp |
Mean, median, and quantiles of the parametric uncertainty distributions for UCDP events. The parametric uncertainty distributions are based on the reported-value inflated Gumbel mixture distribution. The median and quantile functions are shortcuts for the quncertainUCDP function.
mean_uncertainUCDP(fatalities, tov = c("sb", "ns", "os", "any"))
median_uncertainUCDP(fatalities, tov = c("sb", "ns", "os", "any"))
quantiles_unceartainUCDP(probs, fatalities, tov = c("sb", "ns", "os", "any"))
fatalities |
A vector of non-negative integers representing the number of fatalities of the UCDP events. Non-integer values are allowed but should be considered experimental. |
tov |
A character string representing the type of violence of the UCDP event. Must be one of "sb", "ns", "os", or "any". The options are: * "sb" for state-based violence * "ns" for non-state violence * "os" for one-sided violence * "any" for parameters estimated across all types of violence. The "any" option is somewhat experimental and should be used with caution. It may be useful when the type of violence is unknown or when the user wants to combine all types of violence into a single category. |
probs |
A numeric vector of probabilities in [0, 1] giving the quantiles to calculate. |
A numeric vector of the same length as the input vector of fatalities, containing the mean, median, or requested quantile of the parametric uncertainty distribution for each UCDP event.
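To illustrate how a median or quantile can be obtained from a quantile function, and how a mean follows from the mixture structure, here is a minimal base R sketch. It does not call the package. The helper names (`pgumbel`, `qgumbel`, `qmix`, `median_mix`, `mean_mix`) and all parameter values are hypothetical, `w` is assumed to be the weight on the point mass at the reported value, and any discretisation or truncation the package may apply is ignored.

```r
# Hypothetical parameters for illustration only; not values from the package.
reported <- 100  # reported number of fatalities (location of the point mass)
loc <- 110       # Gumbel location
scale <- 25      # Gumbel scale
w <- 0.3         # assumed weight on the point mass

# Standard Gumbel CDF and its inverse.
pgumbel <- function(q, loc, scale) exp(-exp(-(q - loc) / scale))
qgumbel <- function(p, loc, scale) loc - scale * log(-log(p))

# Quantile function of the mixture: invert the Gumbel part below and above
# the jump at `reported`, and return `reported` for probabilities that fall
# inside the jump.
qmix <- function(p, reported, loc, scale, w) {
  lower <- (1 - w) * pgumbel(reported, loc, scale)  # CDF just below the jump
  upper <- lower + w                                # CDF at the jump
  out <- rep(reported, length(p))
  below <- p < lower
  above <- p > upper
  out[below] <- qgumbel(p[below] / (1 - w), loc, scale)
  out[above] <- qgumbel((p[above] - w) / (1 - w), loc, scale)
  out
}

# The median is simply the 0.5 quantile.
median_mix <- function(reported, loc, scale, w) {
  qmix(0.5, reported, loc, scale, w)
}

# Mean of the mixture: weighted average of the point mass and the Gumbel mean.
# -digamma(1) is the Euler-Mascheroni constant.
mean_mix <- function(reported, loc, scale, w) {
  euler_gamma <- -digamma(1)
  w * reported + (1 - w) * (loc + scale * euler_gamma)
}

median_mix(reported, loc, scale, w)
qmix(c(0.1, 0.9), reported, loc, scale, w)
mean_mix(reported, loc, scale, w)
```

Probabilities that fall inside the jump of the distribution function all map to the reported value, which is why a range of quantiles can coincide with the reported number of fatalities.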
data(ucdpged)
# Calculate the mean for an arbitrary UCDP event
mean_uncertainUCDP(fatalities = 100, tov = 'sb')
# Calculate the mean for the first event in the UCDP GED sample
mean_uncertainUCDP(ucdpged$best[1], tov = ucdpged$type_of_violence[1])
# Calculate the median for an arbitrary UCDP event
median_uncertainUCDP(fatalities = 100, tov = 'sb')
# Calculate the median for the first event in the UCDP GED sample
median_uncertainUCDP(ucdpged$best[1], tov = ucdpged$type_of_violence[1])
# Calculate the 90th percentile for an arbitrary UCDP event
quantiles_unceartainUCDP(probs = 0.9, fatalities = 100, tov = 'sb')
# Calculate the 90th percentile for the first event in the UCDP GED sample
# (arguments are named because probs comes before fatalities in the signature)
quantiles_unceartainUCDP(probs = 0.9, fatalities = ucdpged$best[1], tov = ucdpged$type_of_violence[1])
Density, distribution, quantile and random number generation functions for the parametric reported-value inflated Gumbel mixture distribution for UCDP events. The functions estimate the parameters of the distribution based on the number of fatalities and the type of violence of the UCDP event.
runcertainUCDP(n, fatalities, tov = c("sb", "ns", "os", "any"))
puncertainUCDP(q, fatalities, tov = c("sb", "ns", "os", "any"))
duncertainUCDP(x, fatalities, tov = c("sb", "ns", "os", "any"))
quncertainUCDP(p, fatalities, tov = c("sb", "ns", "os", "any"))
n |
Number of random values to generate |
fatalities |
A vector of non-negative integers representing the number of fatalities of the UCDP events. Non-integer values are allowed but should be considered experimental. |
tov |
A character string representing the type of violence of the UCDP event. Must be one of "sb", "ns", "os", or "any". The options are: * "sb" for state-based violence * "ns" for non-state violence * "os" for one-sided violence * "any" for parameters estimated across all types of violence. The "any" option is somewhat experimental and should be used with caution. It may be useful when the type of violence is unknown or when the user wants to combine all types of violence into a single category. |
x, q |
Vector of quantiles |
p |
Vector of probabilities |
The reported-value inflated Gumbel mixture distribution is a parametric distribution for modeling the uncertainty in the number of fatalities of UCDP events. The distribution is a mixture of a Gumbel distribution and a point mass at the reported number of fatalities. Its location, scale, and weight parameters are estimated with a set of regression models, using the number of fatalities and the type of violence of the UCDP event as inputs.
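One way to write down the mixture described above is through its distribution function. The notation here is introduced only for illustration: y is the reported number of fatalities, μ and σ are the Gumbel location and scale, and w is assumed to be the weight on the point mass at y. The expression ignores any discretisation or truncation that the package may apply.

```latex
F(x \mid y) = w \, \mathbf{1}\{x \ge y\}
  + (1 - w) \, \exp\!\left( -\exp\!\left( -\frac{x - \mu}{\sigma} \right) \right)
```

Under these assumptions the distribution function jumps by w at x = y and otherwise follows the Gumbel component scaled by 1 - w.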
* duncertainUCDP gives the density function
* puncertainUCDP gives the distribution function
* quncertainUCDP gives the quantile function
* runcertainUCDP generates random values as a vector of length n
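As a rough illustration of how random values from a point-mass and Gumbel mixture can be generated, the following base R sketch draws from such a mixture by first choosing the component and then sampling from it. It does not call the package. The helper names `qgumbel` and `rmix` and all parameter values are hypothetical, `w` is assumed to be the weight on the reported value, and the Gumbel component is left continuous and untruncated.

```r
# Hypothetical parameters for illustration only; not values from the package.
reported <- 100  # reported number of fatalities (location of the point mass)
loc <- 110       # Gumbel location
scale <- 25      # Gumbel scale
w <- 0.3         # assumed weight on the point mass

# Inverse CDF of the Gumbel distribution, used for inverse-transform sampling.
qgumbel <- function(p, loc, scale) loc - scale * log(-log(p))

# Draw n values: with probability w return the reported value, otherwise
# return a Gumbel draw.
rmix <- function(n, reported, loc, scale, w) {
  at_reported <- runif(n) < w
  gumbel_draws <- qgumbel(runif(n), loc, scale)
  ifelse(at_reported, reported, gumbel_draws)
}

set.seed(42)
draws <- rmix(10000, reported, loc, scale, w)

length(draws)            # one value per requested draw
mean(draws == reported)  # share of draws at the reported value, close to w
```

The share of draws equal to the reported value gives a quick empirical check that the weight behaves as intended in this sketch.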
data(ucdpged)
# Generate 10 random values for an arbitrary UCDP event
runcertainUCDP(n = 10, fatalities = 100, tov = 'sb')
# Generate 10 random values for the first event in the GED sample
runcertainUCDP(n = 10, fatalities = ucdpged$best[1], tov = ucdpged$type_of_violence[1])
# Obtain the probability that an arbitrary non-state UCDP event has at most 150 fatalities
puncertainUCDP(q = 150, fatalities = 100, tov = 'ns')
# Obtain the probability that the first event in the GED sample has at most 5 fatalities
puncertainUCDP(q = 5, fatalities = ucdpged$best[1], tov = ucdpged$type_of_violence[1])
# Obtain the 90th percentile for an arbitrary UCDP event and one-sided violence
quncertainUCDP(p = 0.9, fatalities = 100, tov = 'os')
# Obtain the 90th percentile for the first event in the GED sample
quncertainUCDP(p = 0.9, fatalities = ucdpged$best[1], tov = ucdpged$type_of_violence[1])
# Obtain the density for an arbitrary UCDP event and state-based violence
duncertainUCDP(x = seq(from = 0, to = 500), fatalities = 100, tov = 'sb')
# Obtain the density for the first event in the GED sample
duncertainUCDP(x = seq(0, 50), fatalities = ucdpged$best[1], tov = ucdpged$type_of_violence[1])
A sample of the UCDP Georeferenced Event Dataset (GED) from the 2023 data release. The data contains information about the date, location, and type of conflict events. The data is a sample of the full dataset, which can be downloaded from the UCDP website <https://ucdp.uu.se/downloads/>.
ucdpged
a tibble with 1000 rows and 49 columns
<https://ucdp.uu.se/downloads/>
Extracting parameters for the reported-value inflated Gumbel mixture distribution for UCDP events. Primarily intended for internal use by the uncertainUCDP-functions, but can be used to extract parameters for the distribution manually.
uncertainUCDP_parameters(fatalities, tov)
fatalities |
A vector of non-negative integers representing the number of fatalities of the UCDP event. Non-integer values are allowed but should be considered experimental. |
tov |
A character string or integer value representing the type of violence of the UCDP event. Must be one of "sb", "ns", "os", or "any", or their numeric equivalents. The options are: * "sb" or 1 for state-based violence * "ns" or 2 for non-state violence * "os" or 3 for one-sided violence * "any" or 4 for parameters estimated across all types of violence. The "any" option is somewhat experimental and should be used with caution. It may be useful when the type of violence is unknown or when the user wants to combine all types of violence into a single category. |
A list with three elements: loc, scale, and w. loc and scale are the location and scale parameters of the Gumbel distribution, respectively. w is the weight parameter for the reported-value inflation.