Title: | Species Richness Estimation and Modeling |
---|---|
Description: | Understanding the drivers of microbial diversity is an important frontier of microbial ecology, and investigating the diversity of samples from microbial ecosystems is a common step in any microbiome analysis. 'breakaway' is the premier package for statistical analysis of microbial diversity. 'breakaway' implements the latest and greatest estimates of species richness, described in Willis and Bunge (2015) <doi:10.1111/biom.12332>, Willis et al. (2017) <doi:10.1111/rssc.12206>, and Willis (2016) <arXiv:1604.02598>, as well as the most commonly used estimates, including the objective Bayes approach described in Barger and Bunge (2010) <doi:10.1214/10-BA527>. |
Authors: | Amy D Willis [aut, cre], Bryan D Martin [aut], Pauline Trinh [aut], Sarah Teichman [aut], David Clausen [aut], Kathryn Barger [aut], John Bunge [aut] |
Maintainer: | Amy D Willis <[email protected]> |
License: | GPL-2 |
Version: | 4.8.4 |
Built: | 2025-02-23 04:19:23 UTC |
Source: | https://github.com/adw96/breakaway |
Build objects of class alpha_estimate from their components. alpha_estimate()
is a constructor method
alpha_estimate( estimate = NULL, error = NULL, estimand = NULL, name = NULL, interval = NULL, interval_type = NULL, type = NULL, model = NULL, warnings = NULL, frequentist = NULL, parametric = NULL, plot = NULL, reasonable = NULL, other = NULL, ... )
alpha_estimate( estimate = NULL, error = NULL, estimand = NULL, name = NULL, interval = NULL, interval_type = NULL, type = NULL, model = NULL, warnings = NULL, frequentist = NULL, parametric = NULL, plot = NULL, reasonable = NULL, other = NULL, ... )
estimate |
The estimate |
error |
The standard error in the estimate |
estimand |
What is the estimate trying to estimate? (richness, Shannon...) |
name |
The name of the method |
interval |
An interval estimate |
interval_type |
Type of interval estimate |
type |
TODO(Amy): Deprecate? |
model |
What model is fit |
warnings |
Any warnings? |
frequentist |
Logical. Frequentist or Bayesian? |
parametric |
Logical. Parametric or not? |
plot |
A ggplot associated with the estimate |
reasonable |
Is the estimate likely to be reasonable? |
other |
Any other relevant objects |
... |
Any other objects |
An object of class alpha_estimate
Build objects of class alpha_estimates from their components. alpha_estimates()
is a constructor method
alpha_estimates(...)
alpha_estimates(...)
... |
Objects of class alpha_estimate, or a list of objects of class alpha_estimate |
An object of class alpha_estimates
(Data) Frequency count table of soil microbes in an apples orchard.
apples
apples
A data frame with 88 rows and 2 variables:
an index variable
number of taxa that were observed with this frequency
...
Willis, A. and Bunge, J. (2015). Estimating diversity via frequency ratios. Biometrics, 71(4), 1042–1049. doi:10.1111/biom.12332
Walsh, F. et al. (2014). (2014). Restricted streptomycin use in apple orchards did not adversely alter the soil bacteria communities. Frontiers in Microbiology 4, 383.
This function tests for heterogeneity of total diversity (observed plus unobserved) across multiple sites. It can account or test for fixed effects that may explain diversity. It returns the significance of the covariates in explaining diversity and a hypothesis test for heterogeneity.
betta( chats = NULL, ses, X = NULL, initial_est = NULL, formula = NULL, data = NULL, p.digits = 3 )
betta( chats = NULL, ses, X = NULL, initial_est = NULL, formula = NULL, data = NULL, p.digits = 3 )
chats |
A vector of estimates of total diversity at different sampling locations. ‘breakaway’ estimates are suggested in the high-diversity case but not enforced. |
ses |
The standard errors in |
X |
A numeric matrix of covariates. If not supplied, an intercept-only
model will be fit. This is optional with the |
initial_est |
(Optional) A vector of length 1 + ncol(X) giving the starting values for the likelihood maximisation search. The first element is the starting estimate for sigma^2_u, and the remaining elements are the starting elements for beta. Defaults to NULL, in which case the starting values outlined in the paper are used. |
formula |
A formula object of the form |
data |
A dataframe containing the response, response standard errors, covariates,
and grouping variable. Required with the |
p.digits |
(Optional) A number that specifies the number of digits to which p-values will be rounded. The default value is 3 digits. |
table |
A coefficient table for the model parameters. The columns give the parameter estimates, standard errors, and p-values, respectively. This model is only as effective as your diversity estimation procedure; for this reason please confirm that your estimates are appropriate and that your model is not misspecified. betta_pic may be useful for this purpose. |
cov |
Estimated covariance matrix of the parameter estimates. |
ssq_u |
The estimate of the heterogeneity variance. |
homogeneity |
The test statistic and p-value for the test of homogeneity. |
global |
The test statistic and p-value for the test of model explanatory power. |
blups |
The conditional expected values of the diversity estimates (conditional on the random effects). The authors propose that if the practitioner believes that information from one diversity estimator may inform the others, then using the ‘condfits’ as estimators of total diversity rather than ‘Chats’ may reduce variance of diversity estimates by “sharing strength” across the samples. |
blupses |
The estimated standard deviation (standard errors) in the blups. |
loglikelihood |
The log likelihood of the fitted model. |
aic |
The Akaike information criterion for the fitted model. |
aicc |
The finite sample correction of the Akaike information criterion for the fitted model. |
r_squared_wls |
The weighted R^2 statistic, appropriate for heteroskedastic linear models. |
function.args |
A list containing values initially passed to betta_random. |
Ecologists who are interested in the way species richness varies with
covariate information often run a regression-type analysis on the observed
diversity using their covariate information as predictors. However, in many
settings (especially microbial), rare and unobserved taxa play a hugely
important role in explaining the subtleties of the ecosystem, however, a
regression analysis on the observed diversity level fails to account for
these unobserved taxa. By predicting the total level of diversity (for
example, via breakaway
) and estimating the standard error in
the estimate, one can take account of these unobserved, but important, taxa.
In order to account for the estimated nature of the response, a mixed model
approach is taken, whereby the varying levels of confidence in the estimates
contributes to a diagonal but heteroscedastic covariance matrix. Given
covariates constitute the fixed effects in the mixed model, and significance
of the random effect term “sigsq_u” reflects heterogeneity in the sample,
that is, variability that cannot be explained by only the covariates. The
authors believe this to be the first attempt at modelling total diversity in
a way that accounts for its estimated nature.
Amy Willis
Willis, A., Bunge, J., and Whitman, T. (2015). Inference for changes in biodiversity. arXiv preprint.
Willis, A. and Bunge, J. (2015). Estimating diversity via frequency ratios. Biometrics.
breakaway
; breakaway_nof1
;
apples
df <- data.frame(chats = c(2000, 3000, 4000, 3000), ses = c(100, 200, 150, 180), Cont_var = c(100, 150, 100, 50)) # formula notation betta(formula = chats ~ Cont_var, ses = ses, data = df) # direct input betta(c(2000, 3000, 4000, 3000), c(100, 200, 150, 180), cbind(1, c(100, 150, 100, 50))) ## handles missing data betta(c(2000, 3000, 4000, 3000), c(100, 200, 150, NA)) ## A test for heterogeneity of apples diversity estimates vs butterfly estimates betta(c(1552, 1500, 884), c(305, 675, 205), cbind(1, c(0, 0, 1)))
df <- data.frame(chats = c(2000, 3000, 4000, 3000), ses = c(100, 200, 150, 180), Cont_var = c(100, 150, 100, 50)) # formula notation betta(formula = chats ~ Cont_var, ses = ses, data = df) # direct input betta(c(2000, 3000, 4000, 3000), c(100, 200, 150, 180), cbind(1, c(100, 150, 100, 50))) ## handles missing data betta(c(2000, 3000, 4000, 3000), c(100, 200, 150, NA)) ## A test for heterogeneity of apples diversity estimates vs butterfly estimates betta(c(1552, 1500, 884), c(305, 675, 205), cbind(1, c(0, 0, 1)))
This function provides point estimates, standard errors, and equal-tailed confidence intervals for linear combinations of fixed effects estimated via betta() or betta_random(). A p-value for a Wald test of the null that the linear combination of effects is equal to zero (against a general alternative) is also returned.
betta_lincom(fitted_betta, linear_com, signif_cutoff = 0.05)
betta_lincom(fitted_betta, linear_com, signif_cutoff = 0.05)
fitted_betta |
A fitted betta object – i.e., the output of either betta() or betta_random() – containing fixed effect estimates of interest. |
linear_com |
The linear combination of fixed effects for which a point estimate, confidence interval, and hypothesis test are to be produced. |
signif_cutoff |
The type-I significance threshold for confidence intervals. Defaults to 0.05. |
table |
A table containing a point estimate, standard error, lower and upper confidence bounds, and a p-value for the linear combination of fixed effects specified in input. The p-value is generated via a two-sided Wald test of the null that the linear combination of fixed effects is equal to zero. |
David Clausen
Willis, A., Bunge, J., and Whitman, T. (2015). Inference for changes in biodiversity. arXiv preprint.
# generate example data df <- data.frame(chats = c(2000, 3000, 4000, 3000), ses = c(100, 200, 150, 180), Cont_var = c(100, 150, 100, 50)) # fit betta() example_fit <- betta(formula = chats ~ Cont_var, ses = ses, data = df) # generate point estimate and 95% CI for mean richness at Cont_var = 125 betta_lincom(fitted_betta = example_fit, linear_com = c(1, 125)) # this tells betta_lincom to estimate value of beta_0 + 125*beta_1, # where beta_0 is the intercept, and beta_1 is the (true value of the) coefficient on Cont_var
# generate example data df <- data.frame(chats = c(2000, 3000, 4000, 3000), ses = c(100, 200, 150, 180), Cont_var = c(100, 150, 100, 50)) # fit betta() example_fit <- betta(formula = chats ~ Cont_var, ses = ses, data = df) # generate point estimate and 95% CI for mean richness at Cont_var = 125 betta_lincom(fitted_betta = example_fit, linear_com = c(1, 125)) # this tells betta_lincom to estimate value of beta_0 + 125*beta_1, # where beta_0 is the intercept, and beta_1 is the (true value of the) coefficient on Cont_var
A simple plotting interface for comparing total diversity across samples or a covariate gradient.
betta_pic( y, se, x = 1:length(y), ylimu = NULL, myy = NULL, mymain = NULL, mycol = NULL, labs = NULL, mypch = NULL, myxlim = NULL )
betta_pic( y, se, x = 1:length(y), ylimu = NULL, myy = NULL, mymain = NULL, mycol = NULL, labs = NULL, mypch = NULL, myxlim = NULL )
y |
A vector of estimates of total diversity. Other parameter estimates are accessible; this method may be used for plotting any parameter estimates.. |
se |
The standard errors in ‘y’, the diversity (or other parameter's) estimates. |
x |
A vector of covariates to form the x-coordinates of the intervals. If no argument is given, defaults to the order. |
ylimu |
The upper endpoint of the y-axis. |
myy |
Deprecated, for backwards compatibility |
mymain |
Deprecated, for backwards compatibility |
mycol |
Deprecated, for backwards compatibility |
labs |
Deprecated, for backwards compatibility |
mypch |
Deprecated, for backwards compatibility |
myxlim |
Deprecated, for backwards compatibility |
A ggplot object.
Amy Willis
Willis, A., Bunge, J., and Whitman, T. (2015). Inference for changes in biodiversity. arXiv preprint.
betta_pic(c(1552, 1500, 884), c(305, 675, 205), mymain = "Example title")
betta_pic(c(1552, 1500, 884), c(305, 675, 205), mymain = "Example title")
This function extends betta() to permit random effects modelling.
betta_random( chats = NULL, ses, X = NULL, groups = NULL, formula = NULL, data = NULL, p.digits = 3 )
betta_random( chats = NULL, ses, X = NULL, groups = NULL, formula = NULL, data = NULL, p.digits = 3 )
chats |
A vector of estimates of total diversity at different sampling
locations. Required with the |
ses |
The standard errors in |
X |
A numeric matrix of covariates corresponding to fixed effects. If
not supplied, an intercept-only model will be fit. Optional with the |
groups |
A categorical variable representing the random-effects groups
that each of the estimates belong to. Required with the |
formula |
A formula object of the form |
data |
A dataframe containing the response, response standard errors, covariates,
and grouping variable. Required with the |
p.digits |
(Optional) A number that specifies the number of digits to which p-values will be rounded. The default value is 3 digits. |
table |
A coefficient table for the model parameters. The columns give the parameter estimates, standard errors, and p-values, respectively. This model is only as effective as your diversity estimation procedure; for this reason please confirm that your estimates are appropriate and that your model is not misspecified. betta_pic may be useful for this purpose. |
cov |
Estimated covariance matrix of the parameter estimates. |
ssq_u |
The estimate of the heterogeneity variance. |
ssq_g |
Estimates of within-group variance. The estimate will be zero for groups with only one observation. |
homogeneity |
The test statistic and p-value for the test of homogeneity. |
global |
The test statistic and p-value for the test of model explanatory power. |
blups |
The conditional expected values of the diversity estimates (conditional on the random effects). Estimates of variability for the random effects case are unavailable at this time; please contact the maintainer if needed. |
loglikelihood |
The log likelihood of the fitted model. |
aic |
The Akaike information criterion for the fitted model. |
aicc |
The finite sample correction of the Akaike information criterion for the fitted model. |
function.args |
A list containing values initially passed to betta_random. |
Amy Willis
Willis, A., Bunge, J., and Whitman, T. (2015). Inference for changes in biodiversity. arXiv preprint.
df <- data.frame(chats = c(2000, 3000, 4000, 3000), ses = c(100, 200, 150, 180), Cont_var = c(100, 150, 100, 50), groups = c("a", "a", "a", "b")) # formula notation betta_random(formula = chats ~ Cont_var| groups, ses = ses, data = df) # direct input betta_random(c(2000, 3000, 4000, 3000), c(100, 200, 150, 180), X = cbind(Int = 1, Cont_var = c(100, 150, 100, 50)), groups = c("a", "a", "a", "b")) ## handles missing data betta_random(c(2000, 3000, 4000, 3000), c(100, 200, 150, NA), groups = c("a", NA, "b", "b"))
df <- data.frame(chats = c(2000, 3000, 4000, 3000), ses = c(100, 200, 150, 180), Cont_var = c(100, 150, 100, 50), groups = c("a", "a", "a", "b")) # formula notation betta_random(formula = chats ~ Cont_var| groups, ses = ses, data = df) # direct input betta_random(c(2000, 3000, 4000, 3000), c(100, 200, 150, 180), X = cbind(Int = 1, Cont_var = c(100, 150, 100, 50)), groups = c("a", "a", "a", "b")) ## handles missing data betta_random(c(2000, 3000, 4000, 3000), c(100, 200, 150, NA), groups = c("a", NA, "b", "b"))
breakaway is a wrapper for modern species richness estimation for modern datasets
breakaway( input_data, cutoff = NA, output = NULL, plot = NULL, answers = NULL, print = NULL, ... )
breakaway( input_data, cutoff = NA, output = NULL, plot = NULL, answers = NULL, print = NULL, ... )
input_data |
An input type that can be processed by |
cutoff |
The maximum frequency count to use for model fitting |
output |
Deprecated; only for backwards compatibility |
plot |
Deprecated; only for backwards compatibility |
answers |
Deprecated; only for backwards compatibility |
print |
Deprecated; only for backwards compatibility |
... |
Additional arguments will be ignored; this is for backward compatibility |
An object of class alpha_estimate
‘breakaway’ presents an estimator of species richness that is
well-suited to the high-diversity/microbial setting. However, many microbial
datasets display more diversity than the Kemp-type models can permit. In
this case, the log-transformed WLRM diversity estimator of Rocchetti et. al.
(2011) is returned. The authors' experience suggests that some datasets that
require the log-transformed WLRM contain “false” diversity, that is,
diversity attributable to sequencing errors (via an inflated singleton
count). The authors encourage judicious use of diversity estimators when the
dataset may contain these errors, and recommend the use of
breakaway_nof1
as an exploratory tool in this case.
Amy Willis
Willis, A. and Bunge, J. (2015). Estimating diversity via frequency ratios. Biometrics, 71(4), 1042–1049.
breakaway(apples) breakaway(apples, plot = FALSE, output = FALSE, answers = TRUE)
breakaway(apples) breakaway(apples, plot = FALSE, output = FALSE, answers = TRUE)
This function permits estimation of total diversity based on a sample
frequency count table. Unlike breakaway
, it does not require
an input for the number of species observed once, making it an excellent
exploratory tool for microbial ecologists who believe that their sample may
contain spurious singletons. The underlying estimation procedure is similar
to that of breakaway
and is outlined in Willis & Bunge (2014).
The diversity estimate, standard error, estimated model coefficients and
plot of the fitted model are returned.
breakaway_nof1( input_data, output = NULL, plot = NULL, answers = NULL, print = NULL )
breakaway_nof1( input_data, output = NULL, plot = NULL, answers = NULL, print = NULL )
input_data |
An input type that can be processed by |
output |
Deprecated; only for backwards compatibility |
plot |
Deprecated; only for backwards compatibility |
answers |
Deprecated; only for backwards compatibility |
print |
Deprecated; only for backwards compatibility |
An object of class alpha_estimate
code |
A category representing algorithm behaviour. ‘code=1’ indicates no nonlinear models converged and the transformed WLRM diversity estimate of Rocchetti et. al. (2011) is returned. ‘code=2’ indicates that the iteratively reweighted model converged and was returned. ‘code=3’ indicates that iterative reweighting did not converge but a model based on a simplified variance structure was returned (in this case, the variance of the frequency ratios is assumed to be proportional to the denominator frequency index). Please peruse your fitted model before using your diversity estimate. |
name |
The “name” of the selected model. The first integer represents the numerator polynomial degree and the second integer represents the denominator polynomial degree. See Willis & Bunge (2014) for details. |
para |
Estimated model parameters and standard errors. |
est |
The estimate of total (observed plus unobserved) diversity. |
seest |
The standard error in the diversity estimate. |
full |
The chosen nonlinear model for frequency ratios. |
It is common for microbial ecologists to believe that their dataset contains false diversity. This often arises because sequencing errors result in classifying already observed organisms as new organisms. ‘breakaway_nof1’ was developed as an exploratory tool in this case. Practitioners can run ‘breakaway’ on their dataset including the singletons, and ‘breakaway_nof1’ on their dataset excluding the singletons, and assess if the estimated levels of diversity are very different. Great disparity may provide evidence of an inflated singleton count, or at the very least, that ‘breakaway’ is especially sensitive to the number of rare species observed. Note that ‘breakaway_nof1’ may be less stable than ‘breakaway’ due to predicting based on a reduced dataset, and have greater standard errors.
Amy Willis
Willis, A. (2015). Species richness estimation with high diversity but spurious singletons. arXiv.
Willis, A. and Bunge, J. (2015). Estimating diversity via frequency ratios. Biometrics.
breakaway_nof1(apples[-1, ]) breakaway_nof1(apples[-1, ], plot = FALSE, output = FALSE, answers = TRUE)
breakaway_nof1(apples[-1, ]) breakaway_nof1(apples[-1, ], plot = FALSE, output = FALSE, answers = TRUE)
Build frequency count tables from an OTU table
build_frequency_count_tables(the_table)
build_frequency_count_tables(the_table)
the_table |
An OTU table as a data frame or a matrix. Columns are the samples and rows give the taxa. |
A list of frequency count tables corresponding to the columns.
This function implements the species richness estimation procedure outlined in Chao & Bunge (2002).
chao_bunge(input_data, cutoff = 10, output = NULL, answers = NULL)
chao_bunge(input_data, cutoff = 10, output = NULL, answers = NULL)
input_data |
An input type that can be processed by |
cutoff |
The maximum frequency to use in fitting. |
output |
Deprecated; only for backwards compatibility |
answers |
Deprecated; only for backwards compatibility |
An object of class alpha_estimate
, or alpha_estimates
for phyloseq
objects
Amy Willis
chao_bunge(apples)
chao_bunge(apples)
The Chao-Shen estimate of Shannon diversity
chao_shen(input_data)
chao_shen(input_data)
input_data |
An input type that can be processed by |
An object of class alpha_estimate
This function implements the Chao1 richness estimate, which is often mistakenly referred to as an index.
chao1(input_data, output = NULL, answers = NULL)
chao1(input_data, output = NULL, answers = NULL)
input_data |
An input type that can be processed by |
output |
Deprecated; only for backwards compatibility |
answers |
Deprecated; only for backwards compatibility |
An object of class alpha_estimate
, or alpha_estimates
for phyloseq
objects
The authors of this package strongly discourage the use of this estimator. It is only valid when you wish to assume that every taxa has equal probability of being observed. You don't really think that's possible, do you?
Amy Willis
chao1(apples)
chao1(apples)
This function implements the bias-corrected Chao1 richness estimate.
chao1_bc(input_data, output = NULL, answers = NULL)
chao1_bc(input_data, output = NULL, answers = NULL)
input_data |
An input type that can be processed by |
output |
Deprecated; only for backwards compatibility |
answers |
Deprecated; only for backwards compatibility |
An object of class alpha_estimate
, or alpha_estimates
for phyloseq
objects
The authors of this package strongly discourage the use of this estimator. It is only valid when you wish to assume that every taxa has equal probability of being observed. You don't really think that's possible, do you?
Amy Willis
chao1_bc(apples)
chao1_bc(apples)
Run some basic checks on a possible frequency count table
check_format(output_data)
check_format(output_data)
output_data |
A matrix to test |
A checked frequency count table
Inputs slated for development include phyloseq and otu_table
convert(input_data)
convert(input_data)
input_data |
Supported types include filenames, data frames, matrices, vectors... |
Frequency count able
Find a cut-off for estimates relying on contiguous counts
cutoff_wrap(my_data, requested = NA)
cutoff_wrap(my_data, requested = NA)
my_data |
Frequency count table |
requested |
The user-requested cutoff |
Cutoff value
This function performs an F-test of a null hypothesis LB = 0 where B is a vector of p fixed effects returned by betta() or betta_random() and L is an m x p matrix with linearly independent rows.
F_test(fitted_betta, L, method = "bootstrap", nboot = 1000)
F_test(fitted_betta, L, method = "bootstrap", nboot = 1000)
fitted_betta |
A fitted betta object – i.e., the output of either betta() or betta_random() – containing fixed effect estimates of interest. |
L |
An m x p matrix defining the null LB = 0. L must have full row rank. |
method |
A character variable indicating which method should be used to estimate the distribution of the test statistic under the null. |
nboot |
Number of bootstrap samples to use if method = "bootstrap". Ignored if method = "asymptotic". |
A list containing
pval |
The p-value |
F_stat |
The calculated F statistic |
boot_F |
A vector of bootstrapped F statistics if bootstrap has been used. Otherwise NULL. |
David Clausen
Willis, A., Bunge, J., and Whitman, T. (2015). Inference for changes in biodiversity. arXiv preprint.
# generate example data df <- data.frame(chats = c(2000, 3000, 4000, 3000, 2000, 3000, 4000, 3000), ses = c(100, 200, 150, 180, 100, 200, 150, 180), Cont_var = c(100, 150, 100, 50, 100, 150, 100, 50), Cont_var_2 = c(50,200,25,125, 50,200,25,125)) # fit betta() example_fit <- betta(formula = chats ~ Cont_var + Cont_var_2, ses = ses, data = df) # construct L for hypothesis that B_cont_var = B_cont_var_2 = 0 L <- rbind(c(0,1,0), c(0,0,1)) F_test_results <- F_test(example_fit, L, nboot = 100)
# generate example data df <- data.frame(chats = c(2000, 3000, 4000, 3000, 2000, 3000, 4000, 3000), ses = c(100, 200, 150, 180, 100, 200, 150, 180), Cont_var = c(100, 150, 100, 50, 100, 150, 100, 50), Cont_var_2 = c(50,200,25,125, 50,200,25,125)) # fit betta() example_fit <- betta(formula = chats ~ Cont_var + Cont_var_2, ses = ses, data = df) # construct L for hypothesis that B_cont_var = B_cont_var_2 = 0 L <- rbind(c(0,1,0), c(0,0,1)) F_test_results <- F_test(example_fit, L, nboot = 100)
This function calculates an F statistic for a test of null hypothesis LB = 0 (against an unrestricted alternative) where B is a vector of p fixed effects returned by betta() or betta_random() and L is an m x p matrix with linearly independent rows.
get_F_stat(fitted_betta, L)
get_F_stat(fitted_betta, L)
fitted_betta |
A fitted betta object – i.e., the output of either betta() or betta_random() – containing fixed effect estimates of interest. |
L |
An m x p matrix defining the null LB = 0. L must have full row rank. |
A list containing
F_stat |
The calculated F statistic |
The Good-Turing estimate of species richness
good_turing(input_data)
good_turing(input_data)
input_data |
An input type that can be processed by |
An object of class alpha_estimate
, or alpha_estimates
for phyloseq
objects
(Data) Frequency count table of soil microbes in Hawaii.
hawaii
hawaii
A data frame with 198 rows and 2 variables:
an index variable
number of taxa that were observed with this frequency
...
Willis, A. and Bunge, J. (2015). Estimating diversity via frequency ratios. Biometrics, 71(4), 1042–1049. doi:10.1111/biom.12332
This function implements the species richness estimation procedure outlined in Willis & Bunge (2015). The diversity estimate, standard error, estimated model coefficients, model details and plot of the fitted model are returned.
kemp( input_data, cutoff = NA, output = NULL, plot = NULL, answers = NULL, print = NULL, ... )
kemp( input_data, cutoff = NA, output = NULL, plot = NULL, answers = NULL, print = NULL, ... )
input_data |
An input type that can be processed by |
cutoff |
The maximum frequency count to use for model fitting |
output |
Deprecated; only for backwards compatibility |
plot |
Deprecated; only for backwards compatibility |
answers |
Deprecated; only for backwards compatibility |
print |
Deprecated; only for backwards compatibility |
... |
Additional arguments will be ignored; this is for backward compatibility |
An object of class alpha_estimate
code |
A category representing algorithm behaviour. ‘code=1’ indicates no nonlinear models converged and the transformed WLRM diversity estimate of Rocchetti et. al. (2011) is returned. ‘code=2’ indicates that the iteratively reweighted model converged and was returned. ‘code=3’ indicates that iterative reweighting did not converge but a model based on a simplified variance structure was returned (in this case, the variance of the frequency ratios is assumed to be proportional to the denominator frequency index). Please peruse your fitted model before using your diversity estimate. |
name |
The “name” of the selected model. The first integer represents the numerator polynomial degree and the second integer represents the denominator polynomial degree of the model for the frequency ratios. See Willis & Bunge (2015) for details. |
para |
Estimated model parameters and standard errors. |
est |
The estimate of total (observed plus unobserved) diversity. |
seest |
The standard error in the diversity estimate. |
full |
The chosen nonlinear model for frequency ratios. |
ci |
An asymmetric 95% confidence interval for diversity. |
Amy Willis
Willis, A. and Bunge, J. (2015). Estimating diversity via frequency ratios. Biometrics, 71(4), 1042–1049.
Rocchetti, I., Bunge, J. and Bohning, D. (2011). Population size estimation based upon ratios of recapture probabilities. Annals of Applied Statistics, 5.
breakaway
; breakaway_nof1
; apples
kemp(apples) kemp(apples, plot = FALSE, output = FALSE, answers = TRUE)
kemp(apples) kemp(apples, plot = FALSE, output = FALSE, answers = TRUE)
Make design matrix
make_design_matrix(phyloseq_object, variables)
make_design_matrix(phyloseq_object, variables)
phyloseq_object |
A phyloseq object |
variables |
variable names |
A matrix object giving the design matrix from the desired variables.
Draw frequency count subtables from an OTU table
make_frequency_count_table(labels)
make_frequency_count_table(labels)
labels |
A vector of counts of the taxa; i.e. a vector giving the number of times each taxon was observed. |
A frequency count table.
Estimate species richness with an objective Bayes method using a geometric model
objective_bayes_geometric( data, output = TRUE, plot = TRUE, answers = FALSE, tau = 10, burn.in = 100, iterations = 2500, Metropolis.stdev.N = 75, Metropolis.start.theta = 1, Metropolis.stdev.theta = 0.3 )
objective_bayes_geometric( data, output = TRUE, plot = TRUE, answers = FALSE, tau = 10, burn.in = 100, iterations = 2500, Metropolis.stdev.N = 75, Metropolis.start.theta = 1, Metropolis.stdev.theta = 0.3 )
data |
TODO(Kathryn)(Kathryn) |
output |
TODO(Kathryn)(Kathryn) |
plot |
TODO(Kathryn)(Kathryn) |
answers |
TODO(Kathryn)(Kathryn) |
tau |
TODO(Kathryn) |
burn.in |
TODO(Kathryn) |
iterations |
TODO(Kathryn) |
Metropolis.stdev.N |
TODO(Kathryn) |
Metropolis.start.theta |
TODO(Kathryn) |
Metropolis.stdev.theta |
TODO(Kathryn) |
A list of results, including
est |
the median of estimates of N |
,
ci |
a confidence interval for N |
,
mean |
the mean of estimates of N |
,
semeanest |
the standard error of mean estimates |
,
dic |
the DIC of the model |
,
fits |
fitted values |
, and
diagnostics |
model diagonstics |
.
Objective Bayes species richness estimate with the mixed-geometric model
objective_bayes_mixedgeo( data, output = TRUE, plot = TRUE, answers = FALSE, tau = 10, burn.in = 100, iterations = 2500, Metropolis.stdev.N = 100, Metropolis.start.T1 = 1, Metropolis.stdev.T1 = 2, Metropolis.start.T2 = 3, Metropolis.stdev.T2 = 2, bars = 3 )
objective_bayes_mixedgeo( data, output = TRUE, plot = TRUE, answers = FALSE, tau = 10, burn.in = 100, iterations = 2500, Metropolis.stdev.N = 100, Metropolis.start.T1 = 1, Metropolis.stdev.T1 = 2, Metropolis.start.T2 = 3, Metropolis.stdev.T2 = 2, bars = 3 )
data |
TODO(Kathryn) |
output |
TODO(Kathryn) |
plot |
TODO(Kathryn) |
answers |
TODO(Kathryn) |
tau |
TODO(Kathryn) |
burn.in |
TODO(Kathryn) |
iterations |
TODO(Kathryn) |
Metropolis.stdev.N |
TODO(Kathryn) |
Metropolis.start.T1 |
TODO(Kathryn) |
Metropolis.stdev.T1 |
TODO(Kathryn) |
Metropolis.start.T2 |
TODO(Kathryn) |
Metropolis.stdev.T2 |
TODO(Kathryn) |
bars |
TODO(Kathryn) |
A list of results, including
est |
the median of estimates of N |
,
ci |
a confidence interval for N |
,
mean |
the mean of estimates of N |
,
semeanest |
the standard error of mean estimates |
,
dic |
the DIC of the model |
,
fits |
fitted values |
, and
diagnostics |
model diagonstics |
.
Objective Bayes species richness estimate with the Negative Binomial model
objective_bayes_negbin( data, output = TRUE, plot = TRUE, answers = FALSE, tau = 10, burn.in = 1000, iterations = 5000, Metropolis.stdev.N = 100, Metropolis.start.T1 = -0.8, Metropolis.stdev.T1 = 0.01, Metropolis.start.T2 = 0.8, Metropolis.stdev.T2 = 0.01, bars = 5 )
objective_bayes_negbin( data, output = TRUE, plot = TRUE, answers = FALSE, tau = 10, burn.in = 1000, iterations = 5000, Metropolis.stdev.N = 100, Metropolis.start.T1 = -0.8, Metropolis.stdev.T1 = 0.01, Metropolis.start.T2 = 0.8, Metropolis.stdev.T2 = 0.01, bars = 5 )
data |
TODO(Kathryn) |
output |
TODO(Kathryn) |
plot |
TODO(Kathryn) |
answers |
TODO(Kathryn) |
tau |
TODO(Kathryn) |
burn.in |
TODO(Kathryn) |
iterations |
TODO(Kathryn) |
Metropolis.stdev.N |
TODO(Kathryn) |
Metropolis.start.T1 |
TODO(Kathryn) |
Metropolis.stdev.T1 |
TODO(Kathryn) |
Metropolis.start.T2 |
TODO(Kathryn) |
Metropolis.stdev.T2 |
TODO(Kathryn) |
bars |
TODO(Kathryn) |
A list of results, including
est |
the median of estimates of N |
,
ci |
a confidence interval for N |
,
mean |
the mean of estimates of N |
,
semeanest |
the standard error of mean estimates |
,
dic |
the DIC of the model |
,
fits |
fitted values |
, and
diagnostics |
model diagonstics |
.
Objective Bayes species richness estimate with the Poisson model
objective_bayes_poisson( data, output = TRUE, plot = TRUE, answers = FALSE, tau = 10, burn.in = 100, iterations = 2500, Metropolis.stdev.N = 75, Metropolis.start.lambda = 1, Metropolis.stdev.lambda = 0.3, bars = 5 )
objective_bayes_poisson( data, output = TRUE, plot = TRUE, answers = FALSE, tau = 10, burn.in = 100, iterations = 2500, Metropolis.stdev.N = 75, Metropolis.start.lambda = 1, Metropolis.stdev.lambda = 0.3, bars = 5 )
data |
TODO(Kathryn) |
output |
TODO(Kathryn) |
plot |
TODO(Kathryn) |
answers |
TODO(Kathryn) |
tau |
TODO(Kathryn) |
burn.in |
TODO(Kathryn) |
iterations |
TODO(Kathryn) |
Metropolis.stdev.N |
TODO(Kathryn) |
Metropolis.start.lambda |
TODO(Kathryn) |
Metropolis.stdev.lambda |
TODO(Kathryn) |
bars |
TODO(Kathryn) |
A list of results, including
est |
the median of estimates of N |
,
ci |
a confidence interval for N |
,
mean |
the mean of estimates of N |
,
semeanest |
the standard error of mean estimates |
,
dic |
the DIC of the model |
,
fits |
fitted values |
, and
diagnostics |
model diagonstics |
.
(Data) Data frame of covariate information about pasolli_et_al.
pasolli_et_al
pasolli_et_al
A data frame with 4930 rows and 9 variables:
sample ID
average abundance in the sample
standard deviation
first quartile
median abundance
third quartile
minimum
maximum
number of samples
...
Wrapper for phyloseq objects
physeq_wrap(fn, physeq, ...)
physeq_wrap(fn, physeq, ...)
fn |
alpha diversity estimator function with |
physeq |
A |
... |
Additional arguments for fn |
Object of class alpha_estimates
Plot function for alpha_estimates class
## S3 method for class 'alpha_estimates' plot( x, physeq = NULL, measure = NULL, color = NULL, shape = NULL, title = NULL, trim_plot = FALSE, ... )
## S3 method for class 'alpha_estimates' plot( x, physeq = NULL, measure = NULL, color = NULL, shape = NULL, title = NULL, trim_plot = FALSE, ... )
x |
Object of class |
physeq |
(Optional). Default |
measure |
(Optional). If there are multiple richness measures included in |
color |
(Optional). Default |
shape |
(Optional). Default |
title |
(Optional). Default NULL. Character string. The main title for the graphic. |
trim_plot |
(Optional). Default |
... |
See details |
... does not currently have any implemented options. Optional arguments currently include "trim_plot", a Optional
A ggplot object.
library(phyloseq) data(GlobalPatterns) alphas <- breakaway(GlobalPatterns) plot(alphas)
library(phyloseq) data(GlobalPatterns) alphas <- breakaway(GlobalPatterns) plot(alphas)
A model to estimate the number of missing taxa under a Poisson Model
poisson_model(input_data, cutoff = 10)
poisson_model(input_data, cutoff = 10)
input_data |
A frequency count table |
cutoff |
The largest frequency to use for predicting f0. Defaults to 10. |
An object of class alpha_estimate
.
A model to estimate the number of missing taxa under a zero- and one-truncated Poisson Model
poisson_model_nof1(input_data, cutoff = 10)
poisson_model_nof1(input_data, cutoff = 10)
input_data |
A frequency count table |
cutoff |
The largest frequency to use for predicting f0 and f1. Defaults to 10. |
An object of class alpha_estimate
.
OTU table to relative abundances
proportions_instead(the_table)
proportions_instead(the_table)
the_table |
An OTU table |
A proportion table or vector.
Simulate a frequency count table based on a negative binomial model. Zero-truncated, obviously.
rnbinomtable(C, size, probability)
rnbinomtable(C, size, probability)
C |
species richness |
size |
size parameter for the negative binomial distribution |
probability |
probability parameter for the negative binomial distribution |
A simulated frequency count table.
Amy Willis
Simulate a frequency count table based on a negative binomial model. Zero-truncated, obviously.
rztnbinomtable(C, size, probability)
rztnbinomtable(C, size, probability)
C |
species richness |
size |
size parameter for the negative binomial distribution |
probability |
probability parameter for the negative binomial distribution |
A simulated frequency count table.
Amy Willis
This function implements the plug-in Inverse Simpson diversity
sample_inverse_simpson(input_data)
sample_inverse_simpson(input_data)
input_data |
An input type that can be processed by |
An object of class alpha_estimate
, or alpha_estimates
for phyloseq
objects
sample_inverse_simpson(apples)
sample_inverse_simpson(apples)
This function implements the sample richness estimate, which is the number of non-zero taxa per sample.
sample_richness(input_data)
sample_richness(input_data)
input_data |
An input type that can be processed by |
An object of class alpha_estimate
, or alpha_estimates
for phyloseq
objects
sample_richness(apples)
sample_richness(apples)
This function implements the plug-in Shannon diversity
sample_shannon(input_data)
sample_shannon(input_data)
input_data |
An input type that can be processed by |
An object of class alpha_estimate
, or alpha_estimates
for phyloseq
objects
sample_shannon(apples)
sample_shannon(apples)
This function implements the plug-in Shannon's E
sample_shannon_e(input_data)
sample_shannon_e(input_data)
input_data |
An input type that can be processed by |
An object of class alpha_estimate
, or alpha_estimates
for phyloseq
objects
sample_shannon_e(apples)
sample_shannon_e(apples)
This function implements the plug-in Simpson diversity
sample_simpson(input_data)
sample_simpson(input_data)
input_data |
An input type that can be processed by |
An object of class alpha_estimate
, or alpha_estimates
for phyloseq
objects
sample_simpson(apples)
sample_simpson(apples)
Estimate the sample size needed to do an unpaired one-way test using betta
sample_size_estimate( control_group_est, se_est, diff = 5, alpha = 0.05, prop = 0.8, samples = 20, precision = 5 )
sample_size_estimate( control_group_est, se_est, diff = 5, alpha = 0.05, prop = 0.8, samples = 20, precision = 5 )
control_group_est |
An estimate of the alpha diversity parameter for the control group |
se_est |
An estimate of the (common) standard deviation |
diff |
An estimate of the difference between the control and non-control groups |
alpha |
Minimum significance level desired |
prop |
Desired power |
samples |
Number of bootstrap resamples used to estimate the sample size. Increase for a more accurate estimate. |
precision |
How much to increment the sample size as we try to increase the power |
An estimate of the necessary sample size and some details
Plot the power obtained with sample size
sample_size_figure(control_group_est, se_est, diff = 5, samples = 20)
sample_size_figure(control_group_est, se_est, diff = 5, samples = 20)
control_group_est |
An estimate of the alpha diversity parameter for the control group |
se_est |
An estimate of the (common) standard deviation |
diff |
An estimate of the difference between the control and non-control groups |
samples |
Number of bootstrap resamples used to estimate the sample size. Increase for a more accurate estimate. |
A plot of the power with the sample size
Simulate from a fitted betta model
simulate_betta(fitted_betta, nsim)
simulate_betta(fitted_betta, nsim)
fitted_betta |
A fitted betta object |
nsim |
Number of times to simulate |
A list of length nsim, each element of which is a vector of simulated Y-values under the fitted betta model
Simulate from a fitted betta_random model
simulate_betta_random(fitted_betta, nsim)
simulate_betta_random(fitted_betta, nsim)
fitted_betta |
A fitted betta_random object |
nsim |
Number of times to simulate |
A list of length nsim, each element of which is a vector of simulated Y-values under the fitted betta model
A phyloseq object with an OTU table and sample data from a soil microbiome study.
soil_phylo
soil_phylo
A phyloseq-class experiment-level object with an OTU table and sample data.
OTU table with 7,770 taxa and 119 samples
taxonomy table
sample data with the following covariates:
Plants
, values 0
and 1
. Index for different plants
Day
, values 0
(initial sampling point), 1
(12 days after treatment additions), and 2
(82 days after treatment additions). Index for different days of measurement
Amdmt
, values 0
(no additions), 1
(biochar additions), and 2
(fresh biomass additions). Index for different soil additives.
DayAmdmt
, values 00
, 01
, 02
, 10
, 11
, 12
, 20
, 21
, and 22
. A single index for the combination of Day
and Amdmt
with Day
as the first digit and Amdmt
as the second digit.
ID
, values A
, B
, C
, D
, and F
. Index for different soil plots.
...
Whitman, T., Pepe-Ranney, C., Enders, A., Koechli, C., Campbell, A., Buckley, D. H., Lehmann, J. (2016). Dynamics of microbial community composi-tion and soil organic carbon mineralization in soil following addition of pyrogenic andfresh organic matter. The ISME journal, 10(12):2918. <doi: 10.1038/ismej.2016.68>.
This function performs an F-test of a null hypothesis LB = 0 where B is a vector of p fixed effects returned by betta() or betta_random() and L is an m x p matrix with linearly independent rows.
test_submodel( fitted_betta, submodel_formula, method = "bootstrap", nboot = 1000 )
test_submodel( fitted_betta, submodel_formula, method = "bootstrap", nboot = 1000 )
fitted_betta |
A fitted betta object – i.e., the output of either betta() or betta_random() – containing fixed effect estimates of interest. |
submodel_formula |
A formula defining which submodel to treat as the null. It is not necessary to include random effects in this formula (they will be ignored if included – the submodel will be fit with the same random effect structure as the full model regardless of input.) |
method |
A character variable indicating which method should be used to estimate the distribution of the test statistic under the null. |
nboot |
Number of bootstrap samples to use if method = "bootstrap". Ignored if method = "asymptotic". |
A list containing
pval |
The p-value |
F_stat |
The calculated F statistic |
boot_F |
A vector of bootstrapped F statistics if bootstrap has been used. Otherwise NULL. |
David Clausen
Willis, A., Bunge, J., and Whitman, T. (2015). Inference for changes in biodiversity. arXiv preprint.
# generate example data df <- data.frame(chats = c(2000, 3000, 4000, 3000, 2000, 3000, 4000, 3000), ses = c(100, 200, 150, 180, 100, 200, 150, 180), Cont_var = c(100, 150, 100, 50, 100, 150, 100, 50), Cont_var_2 = c(50,200,25,125, 50,200,25,125)) # fit betta() example_fit <- betta(formula = chats ~ Cont_var + Cont_var_2, ses = ses, data = df) # construct L for hypothesis that B_cont_var = B_cont_var_2 = 0 L <- rbind(c(0,1,0), c(0,0,1)) F_test_results <- F_test(example_fit, L, nboot = 100)
# generate example data df <- data.frame(chats = c(2000, 3000, 4000, 3000, 2000, 3000, 4000, 3000), ses = c(100, 200, 150, 180, 100, 200, 150, 180), Cont_var = c(100, 150, 100, 50, 100, 150, 100, 50), Cont_var_2 = c(50,200,25,125, 50,200,25,125)) # fit betta() example_fit <- betta(formula = chats ~ Cont_var + Cont_var_2, ses = ses, data = df) # construct L for hypothesis that B_cont_var = B_cont_var_2 = 0 L <- rbind(c(0,1,0), c(0,0,1)) F_test_results <- F_test(example_fit, L, nboot = 100)
(Data) Data frame of covariate information about toy_otu_table.
toy_metadata
toy_metadata
A data frame with 143 rows and 4 variables:
Year of sampling
Did the sample correspond to a bloom event?
What season was sampled?
Where was the sample taken from?
...
Covariate info available in 'toy_metadata'. A data frame with 448 rows and 143 columns. Rows give the abundance of each taxon; columns give the samples
toy_otu_table
toy_otu_table
An object of class data.frame
with 448 rows and 143 columns.
(Data) The taxonomy of the OTUs in 'toy_otu_table'.
toy_taxonomy
toy_taxonomy
An object of class factor
of length 448.
Calculate the true Gini-Simpson index
true_gini(input)
true_gini(input)
input |
A vector of proportions. |
The Gini-Simpson index of the population given by input.
This function is intended for population-level data. If you are dealing with a microbial sample, use DivNet instead.
Calculate the true Hill numbers
true_hill(input, q)
true_hill(input, q)
input |
A vector of proportions. |
q |
The Hill number of interest. q = 0 corresponds to species richness, q = 2 corresponds to inverse Simpson, etc. |
The Hill number of the population given by input.
Calculate the true Inverse Simpson index
true_inverse_simpson(input)
true_inverse_simpson(input)
input |
A vector of proportions. |
The inverse-Simpson index of the population given by input.
This function is intended for population-level data. If you are dealing with a microbial sample, use DivNet instead.
Calculate the true Shannon index based on proportions
true_shannon(input)
true_shannon(input)
input |
A vector of proportions. |
The Shannon index of the population given by input.
This function is intended for population-level data. If you are dealing with a microbial sample, use DivNet instead.
Calculate the true Shannon's equitability index
true_shannon_e(input)
true_shannon_e(input)
input |
A vector of proportions. |
The Shannon E's of the population given by input.
This function is intended for population-level data. If you are dealing with a microbial sample, use DivNet instead.
Calculate the true Simpson index
true_simpson(input)
true_simpson(input)
input |
A vector of proportions. |
The Simpson index of the population given by input.
This function is intended for population-level data. If you are dealing with a microbial sample, use DivNet instead.
This function implements the transformed version of the species richness estimation procedure outlined in Rocchetti, Bunge and Bohning (2011).
wlrm_transformed( input_data, cutoff = NA, print = NULL, plot = NULL, answers = NULL )
wlrm_transformed( input_data, cutoff = NA, print = NULL, plot = NULL, answers = NULL )
input_data |
An input type that can be processed by |
cutoff |
Maximum frequency count to use |
print |
Deprecated; only for backwards compatibility |
plot |
Deprecated; only for backwards compatibility |
answers |
Deprecated; only for backwards compatibility |
An object of class alpha_estimate
, or alpha_estimates
for phyloseq
objects
While robust to many different structures, model is almost always misspecified. The result is usually implausible diversity estimates with artificially low standard errors. Extreme caution is advised.
Amy Willis
Rocchetti, I., Bunge, J. and Bohning, D. (2011). Population size estimation based upon ratios of recapture probabilities. Annals of Applied Statistics, 5.
breakaway
; apples
;
wlrm_untransformed
wlrm_transformed(apples) wlrm_transformed(apples, plot = FALSE, print = FALSE, answers = TRUE)
wlrm_transformed(apples) wlrm_transformed(apples, plot = FALSE, print = FALSE, answers = TRUE)
This function implements the untransformed version of the species richness estimation procedure outlined in Rocchetti, Bunge and Bohning (2011).
wlrm_untransformed( input_data, cutoff = NA, print = NULL, plot = NULL, answers = NULL )
wlrm_untransformed( input_data, cutoff = NA, print = NULL, plot = NULL, answers = NULL )
input_data |
An input type that can be processed by |
cutoff |
Maximum frequency count to use |
print |
Deprecated; only for backwards compatibility |
plot |
Deprecated; only for backwards compatibility |
answers |
Deprecated; only for backwards compatibility |
An object of class alpha_estimate
, or alpha_estimates
for phyloseq
objects
This estimator is based on the negative binomial model and for that reason generally produces poor fits to microbial data. The result is usually artificially low standard errors. Caution is advised.
Amy Willis
Rocchetti, I., Bunge, J. and Bohning, D. (2011). Population size estimation based upon ratios of recapture probabilities. Annals of Applied Statistics, 5.
breakaway
; apples
;
wlrm_transformed
wlrm_untransformed(apples)
wlrm_untransformed(apples)