
Fixed effects model without time?


I have a problem replicating a fixed effects estimation on data with a panel-like structure but no time index.

I've seen several good explanations of FE models and some easy applications in R. But I'm working with a paper whose data has no time index; instead it has three different indices (person, village, block), where block (an administrative unit) is the fixed effect.

Here is what the authors do (FE-estimation):

[image: the paper's estimating equation, not reproduced here]

Here are some of their results:

[image: the paper's results table, not reproduced here]

Question: I would like to replicate, say, column 3: both the coefficient and the robust SE.

(Link to the data: https://www.aeaweb.org/articles?id=10.1257/aer.20150474 )

My approach so far:

To get an idea:

# making up some data
person_id <- c(1,3,4,5,7,8)
person_id <- as.integer(person_id)  # integer
village_id <- c(1,1,1,2,2,2) 
village_id <- as.integer(village_id) # integer
block <- c("a","a","b","b","c","c") # character
block <- as.factor(block) # factor
treat <- c(0,1,1,0,1,0) # numeric
treat <- as.integer(treat) # integer
outcome <- c(13,7,8,22,91,2) # numeric

# combining data; data.frame() keeps block as a factor
# (cbind() would coerce everything to a numeric matrix first)
df <- data.frame(person_id, village_id, block, outcome, treat)

# converting to a panel data frame, not really necessary
# (pdata.frame() replaces the deprecated plm.data())
library(plm)
pdata <- pdata.frame(df, index=c("person_id", "village_id"))

# just for comparison
lm(outcome ~ treat, data=df) # no problem
lm(outcome ~ treat + block, data=df) # no problem 

# using panel data structure, error: empty model
FE <- plm(outcome ~ treat, data=pdata, model="within")
# alternative, error: empty model
FE <- plm(outcome ~ treat, data=pdata, model="within", index=c("person_id", "village_id"))

Creating panel data with three indices, as in pdata <- plm.data(df, index=c("person_id", "village_id", "block")), does not work, and I can't tell why. In any case, plm seems to interpret the second index as "time".
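One way to see where the "empty model" error comes from: a within estimator subtracts group means, and with person_id as the individual index every person appears exactly once, so demeaning leaves only zeros. The sketch below rebuilds the toy data in base R and compares that with demeaning by block, which gives the same treat coefficient as lm() with block dummies. The helper name demean is made up for illustration, and this only shows the mechanics; it is not necessarily what the paper's authors ran.

```r
# Toy data from the question, built with data.frame() so block stays a factor
df <- data.frame(
  person_id  = c(1L, 3L, 4L, 5L, 7L, 8L),
  village_id = c(1L, 1L, 1L, 2L, 2L, 2L),
  block      = factor(c("a", "a", "b", "b", "c", "c")),
  treat      = c(0, 1, 1, 0, 1, 0),
  outcome    = c(13, 7, 8, 22, 91, 2)
)

# Hypothetical helper: subtract the group mean from each observation
demean <- function(x, g) x - ave(x, g)

# Within-person transformation: one observation per person,
# so every value equals its own group mean and is demeaned to zero
treat_within_person <- demean(df$treat, df$person_id)

# Within-block transformation: treat still varies inside each block
y_b <- demean(df$outcome, df$block)
x_b <- demean(df$treat, df$block)
beta_within <- coef(lm(y_b ~ x_b - 1))[["x_b"]]

# Least-squares dummy variables: block enters as a set of dummies
beta_lsdv <- coef(lm(outcome ~ treat + block, data = df))[["treat"]]

print(treat_within_person)
print(c(within = beta_within, lsdv = beta_lsdv))
```

The two coefficients coincide, which is why a model with block dummies (lm(), or plm() with model="pooling" plus block as a regressor) can reproduce a block fixed effects coefficient.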

I managed to set up a pooling model on the real data. It reproduces the coefficient exactly, though I don't know why, and I would prefer a within model:

pooling<- plm(DV_dap ~ gotminikit + paddyarea + block, data=r_farmlevel_year2, model="pooling", index=c("farmer_id", "village_id")) # coef 393.768 fits!

and adjusted the calculation of robust SE (just trial and error):

coeftest(pooling, vcov=pvcovHC(pooling, method="arellano", cluster="time", type="HC0")) # 135.377
coeftest(pooling, vcov=pvcovHC(pooling, method="arellano", cluster="time", type="HC1")) # 135.927
coeftest(pooling, vcov=pvcovHC(pooling, method="arellano", cluster="time", type="HC2")) # 136.705 - pretty close!
coeftest(pooling, vcov=pvcovHC(pooling, method="arellano", cluster="time", type="HC3")) # 138.087

I don't have enough econometric background to choose among these SE calculations, and none of them reproduces the published value of 136.410 exactly.
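For orientation on the HC0 to HC3 labels: they all use the same sandwich formula and differ only in how the squared residuals are weighted. Below is a minimal base-R sketch on made-up toy data (the same values as the example data above, rebuilt so the block runs on its own; sandwich_vcov is a hypothetical helper name). HC1 is the variant that Stata's robust option reports for regress; whether the paper used it cannot be told from the question.

```r
# Toy data (made up), block as a factor
df <- data.frame(
  block   = factor(c("a", "a", "b", "b", "c", "c")),
  treat   = c(0, 1, 1, 0, 1, 0),
  outcome = c(13, 7, 8, 22, 91, 2)
)
fit <- lm(outcome ~ treat + block, data = df)

X <- model.matrix(fit)   # regressor matrix, including block dummies
e <- residuals(fit)      # OLS residuals
h <- hatvalues(fit)      # leverage of each observation
n <- nrow(X)
k <- ncol(X)

bread <- solve(crossprod(X))  # (X'X)^{-1}

# Hypothetical helper: (X'X)^{-1} X' diag(w) X (X'X)^{-1}
sandwich_vcov <- function(w) bread %*% crossprod(X * sqrt(w)) %*% bread

weights <- list(
  HC0 = e^2,                  # no correction
  HC1 = e^2 * n / (n - k),    # degrees-of-freedom correction
  HC2 = e^2 / (1 - h),        # leverage correction
  HC3 = e^2 / (1 - h)^2       # stronger leverage correction
)

# Robust standard error of the treat coefficient under each weighting
se <- sapply(weights, function(w) sqrt(diag(sandwich_vcov(w)))[["treat"]])
print(se)
```

HC1 is just HC0 scaled by sqrt(n/(n-k)), so with many observations and few regressors the two are nearly identical; HC2 and HC3 move further away when some observations have high leverage.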

A linear model (as suggested) gets me very close to the results, but doesn't yield a perfect match:

lmodel <- lm(DV_dap ~ gotminikit + paddyarea + block, data=r_farmlevel_year2)
coeftest(lmodel , vcov = sandwich) # coef 393.768  SE  137.775
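Because village_id is the second ("time") index in the pooling model, cluster="time" there groups observations by village, whereas vcov = sandwich on the lm() fit does no clustering at all. A comparable clustered calculation for a plain lm() fit is sandwich::vcovCL(); as far as its documentation goes, type "HC1" with the default cluster adjustment applies the small-sample correction Stata uses for clustered SEs. The sketch below runs on simulated data: all values are made up, only the column names mirror the question, and whether the paper clustered by village is an assumption.

```r
library(sandwich)  # vcovCL()
library(lmtest)    # coeftest()

set.seed(42)
n_villages <- 20

# Simulated data: 5 persons per village, 5 blocks of 4 villages each
sim <- data.frame(village_id = rep(seq_len(n_villages), each = 5))
sim$block <- factor((sim$village_id - 1) %/% 4)
sim$treat <- as.numeric(sim$village_id %% 2 == 0)  # assigned at village level

# Outcome with a block effect, a village-level shock and individual noise
village_shock <- rnorm(n_villages)[sim$village_id]
sim$outcome <- 10 + 3 * sim$treat + as.integer(sim$block) +
  village_shock + rnorm(nrow(sim))

# Block fixed effects via dummies
fit <- lm(outcome ~ treat + block, data = sim)

# Cluster-robust covariance, clustered by village
vc <- vcovCL(fit, cluster = ~ village_id, type = "HC1")

print(coeftest(fit, vcov. = vc))
```

On the real data the analogous call would be vcovCL(lmodel, cluster = ~ village_id), provided lmodel was fitted with a data argument so that village_id can be looked up.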

I would appreciate any hints :)

