I am trying to create a two-stage bootstrap: clusters are sampled first, then households are sampled within the selected clusters, and the calculations are done on the newly sampled dataset. I want to repeat this process 1000 times. My code does the sampling, but `replicate()` does not seem to resample: I get the same result 1000 times rather than a slightly different result each time. Am I missing something about `replicate()`, or is there a more appropriate function I could use?
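The two-stage design described above (30 of 60 clusters, then 20 households per selected cluster, matching the numbers in the code) can be sketched in base R as follows. This is a minimal sketch on made-up toy data. `toy`, its `tt` outcome column and `two_stage_sample()` are hypothetical names, and base `split()`/`lapply()` stand in for the `ddply()` call. It also shows that `replicate()` re-evaluates its expression on every iteration, so the draws differ as long as the function returns what it computed from the new sample. Like the code, it samples without replacement at both stages; a classical cluster bootstrap would resample clusters with replacement instead.

```r
# Hypothetical toy data: 60 clusters x 25 households x 3 people,
# with a made-up 0/1 outcome `tt`.
set.seed(1)
toy <- expand.grid(person = 1:3, household = 1:25, cluster_cluster = 1:60)
toy$household_id <- toy$cluster_cluster * 100 + toy$household  # unique across clusters
toy$tt <- rbinom(nrow(toy), 1, 0.05)

# One two-stage draw (hypothetical helper)
two_stage_sample <- function(data, n_clusters = 30, n_households = 20) {
  # Stage 1: choose clusters without replacement
  clusters <- unique(data$cluster_cluster)
  chosen_clusters <- clusters[sample.int(length(clusters), n_clusters)]
  in_clusters <- data[data$cluster_cluster %in% chosen_clusters, ]

  # Stage 2: choose households without replacement within each chosen cluster
  hh <- unique(in_clusters[, c("cluster_cluster", "household_id")])
  ids_by_cluster <- split(hh$household_id, hh$cluster_cluster)
  chosen_hh <- unlist(lapply(ids_by_cluster, function(ids) {
    # index with sample.int() so a cluster with a single household still works
    ids[sample.int(length(ids), min(n_households, length(ids)))]
  }))

  # Keep every person in the chosen households
  in_clusters[in_clusters$household_id %in% chosen_hh, ]
}

# Each call to two_stage_sample() inside replicate() is a fresh draw
draws <- replicate(3, two_stage_sample(toy), simplify = FALSE)
sapply(draws, function(d) mean(d$tt))
```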
library(sampling)
library(data.table)
### read the datasets into R
population <- read.csv("DATA/TZA_population.csv")
df <- read.csv("DATA/TZA_Monduli_tt_true.csv")
### create a function
simulate <- function(tt_prev) {
  ### stage 1: randomly select 30 of the 60 clusters (without replacement)
  cluster <- 1:60
  selected_clusters <- sample(cluster, size = 30, replace = FALSE)
  ### keep only the rows belonging to the selected clusters
  cluster_subset <- subset(df, cluster_cluster %in% selected_clusters)
  cluster_subset <- cluster_subset[order(cluster_subset$cluster_cluster, cluster_subset$household_id), ]
  # unique (cluster, household) pairs in the selected clusters, already sorted
  hh_list <- unique(subset(cluster_subset, select = c(cluster_cluster, household_id)))
  ### stage 2: randomly select up to 20 households within each selected cluster
  library(plyr)
  household_subset <- ddply(hh_list, .(cluster_cluster),
                            function(x) x[sample(nrow(x), min(20, nrow(x))), ])
  # keep the rows of the selected households, only from the selected clusters
  clean <- subset(cluster_subset, household_id %in% household_subset$household_id)
  # calculate prevalence
  source("TT_Age_Sex_Simulation_40pls.R")
}
r <- replicate(1000, simulate(), simplify = FALSE)
data.frame(r)
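A likely cause of the identical results is `source()` rather than `replicate()`. By default `source()` evaluates the file in the global environment (its `local` argument defaults to `FALSE`), so a script run from inside `simulate()` does not see the `clean` data frame created there. It uses whatever `clean` exists in the workspace, which never changes between replicates. The sketch below reproduces this on toy data, assuming the prevalence script reads an object called `clean`; the one-line script, `toy`, `estimate_prev()` and the `sim_*` functions are hypothetical stand-ins. It compares the default call with `source(..., local = TRUE)` and with turning the script into a function that takes the sample as an argument. Only the cluster stage is resampled, to keep it short.

```r
set.seed(2)
# Hypothetical toy data: 60 clusters of 20 households, made-up 0/1 outcome
toy <- data.frame(household_id = 1:1200,
                  cluster_cluster = rep(1:60, each = 20),
                  tt = rbinom(1200, 1, 0.3))

# Hypothetical stand-in for TT_Age_Sex_Simulation_40pls.R
script <- tempfile(fileext = ".R")
writeLines("prev <- mean(clean$tt)", script)

# A leftover `clean` in the workspace, e.g. from an earlier interactive run
assign("clean", toy[toy$cluster_cluster <= 5, ], envir = globalenv())

sim_global <- function() {
  clean <- toy[toy$cluster_cluster %in% sample(60, 30), ]  # fresh local sample
  source(script)$value                 # runs in the global env: reads the OLD `clean`
}
sim_local <- function() {
  clean <- toy[toy$cluster_cluster %in% sample(60, 30), ]
  source(script, local = TRUE)$value   # runs in this function's env: reads the new `clean`
}

# Usually cleaner: make the calculation a function of the sampled data
estimate_prev <- function(clean) mean(clean$tt)
sim_fn <- function() {
  clean <- toy[toy$cluster_cluster %in% sample(60, 30), ]
  estimate_prev(clean)
}

res_global <- replicate(5, sim_global())
res_local  <- replicate(5, sim_local())
res_fn     <- replicate(5, sim_fn())
length(unique(res_global))  # 1: every replicate reads the same global `clean`
```

Note that `source()` returns a list with components `value` and `visible`, which is why `$value` is taken above and why `data.frame(r)` on a list of `source()` results is awkward. With a function that returns a single number, `replicate(1000, sim_fn())` simplifies to a numeric vector that can be summarised directly, e.g. `quantile(res_fn, c(0.025, 0.975))`. Passing the sample as an argument also means the result no longer depends on what happens to be in the workspace.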