version 9
capture log close
log using leed.log, replace

/* ---------------------------------------------------------------------------
Illustrative do-file for estimation of linked employer-employee models

Martyn Andrews (University of Manchester, UK)
Thorsten Schank (Universität Erlangen-Nürnberg)
Richard Upward (University of Nottingham, UK)

July 2005

Notation

i        index for individual
j        index for plant
t        index for time period (year)

x        data on individuals, time-varying
w        data on plants, time-varying
u        data on individuals, fixed over time
q        data on plants, fixed over time
y        dependent variable (e.g. log daily wages)

J        number of plants
K        number of covariates (excluding plant dummies)
G        number of "groups"
Nstar    number of individual-years (rows of data)
N        number of individuals

bar      within-i mean
sbar     within-s mean
jbar     within-j mean
tilde    time-demeaned variable (within individual)
stilde   time-demeaned variable (within spell)

B        full vector of estimated coefficients (K+J)
BETA     vector of coefficients on observable covariates,
         excluding plant dummies (K)
PSI      vector of coefficients on plant dummies (J)
V        full covariance matrix (K+J,K+J)
DELTA    matrix of (BETA,PSI) from CMD method
psi      variable containing values from PSI vector
theta    variable containing estimated individual effects

1        Rows belonging to individuals who change plant ("movers")
2        Rows belonging to non-movers

References are to Andrews, Schank & Upward (2005) unless otherwise stated
--------------------------------------------------------------------------- */

clear
set mem 64m
set matsize 1000
set more off

*---------------------------------------------------------------------------
* example.dta is a simple simulated linked employer-employee dataset
* See Andrews, Schank & Upward (2004b) for details
use example.dta
*---------------------------------------------------------------------------

*---------------------------------------------------------------------------
* Pooled OLS model (p.10)
regress y u x q w
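
* Optional variant (an illustrative sketch, not part of the original
* do-file): pooled OLS treats the Nstar individual-year rows as independent,
* although rows for the same individual i are likely to be correlated.
* Assuming that clustering on i is appropriate for the data at hand, the
* same specification can be re-run with cluster-robust standard errors.
* The coefficient estimates are unchanged; only the standard errors differ.
regress y u x q w, cluster(i)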
*---------------------------------------------------------------------------

*---------------------------------------------------------------------------
* One-way fixed-effect models (p.11)
xtreg y u x q w, fe i(i)
xtreg y u x q w, fe i(j)
*---------------------------------------------------------------------------

*---------------------------------------------------------------------------
* Spell fixed effects (Section 3.1)
egen s = group(i j)
xtreg y u x q w, fe i(s)
*---------------------------------------------------------------------------

*---------------------------------------------------------------------------
* CMD method (Section 3.3)
clear
use example.dta
sort i j
* mover = 1 if the individual is observed in more than one plant
by i: gen byte mover = j[1]!=j[_N]
* plantin = number of mover rows observed in each plant
egen plantin = sum(mover), by(j)
save cmd, replace

* Movers' regression (p.13)
keep if mover==1
quietly tabulate j, generate(F_)
local J1 = r(r)
xtreg y x w F_1-F_`J1', fe i(i)
matrix B1 = e(b)'
matrix B1 = B1["x".."F_`J1'",1]
matrix V1 = e(V)
matrix V1 = V1["x".."F_`J1'","x".."F_`J1'"]

* Non-movers' regression (p.13)
use cmd if mover==0, clear
xtreg y x w, fe i(i)
matrix BETA2 = e(b)'
matrix BETA2 = BETA2["x".."w",1]
local K = rowsof(BETA2)
matrix V2 = e(V)
matrix V2 = V2["x".."w","x".."w"]

* Pad out the non-mover matrices with zeros (pp.13-14)
matrix B2 = BETA2\J(`J1',1,0)
matrix V2inv = J(`J1'+`K',`J1'+`K',0)
matrix V2inv[1,1] = syminv(V2)

* Equation (10)
matrix DELTA = syminv(syminv(V1)+V2inv)*((syminv(V1)*B1)+(V2inv*B2))

* Equation (11)
matrix VARDELTA = syminv(syminv(V1)+V2inv)

* Display pooled estimates
local rownames: rownames B1
matrix rownames DELTA = `rownames'
matrix rownames VARDELTA = `rownames'
matrix colnames VARDELTA = `rownames'
matrix DELTA = DELTA'
ereturn post DELTA VARDELTA
ereturn display
matrix DELTA = e(b)'
matrix VARDELTA = e(V)
matrix BETA = DELTA["x".."w",1]
matrix PSI = DELTA["F_1".."F_`J1'",1]

* Chi-2 statistic (Equation 12)
* The non-mover term uses the full K x K inverse covariance matrix
* syminv(V2), i.e. the top-left block of V2inv, not a single element of it
matrix CHI2 = ((B1-DELTA)'*syminv(V1)*(B1-DELTA) ///
    + (BETA2-BETA)'*syminv(V2)*(BETA2-BETA))
display as text "Chi^2 statistic: " el(CHI2,1,1)
display as text "P-value: " chi2tail(`K',el(CHI2,1,1))

* Restore the whole sample to compute correlations
use cmd, clear

* Generate estimates of psi using Equation (13)
egen j1 = group(j) if plantin>0
generate psi=.
forvalues j=1(1)`J1' {
    quietly replace psi = PSI[`j',1] if j1==`j'
}
assert psi==. if plantin==0

* Calculate groups (page 15)
* Note: grouping is not a built-in Stata command; it must be installed
* separately and be available on the adopath
grouping g, ivar(i) jvar(j)

* Normalise psi within groups (page 16)
egen psigbar = mean(psi), by(g)
replace psi = psi-psigbar

* Calculate theta for each individual (Equation 14)
matrix x = DELTA["x".."w",1]'
matrix score xb=x
gen theta_it = y - xb - psi
egen theta = mean(theta_it), by(i)

* Equation (15)
regress theta u if t==1, robust

* Equation (16)
bysort j: gen n=_n
regress psi q if n==1, robust

* Correlations between the components
summarize theta psi
correlate theta psi
egen psibar = mean(psi), by(i)
correlate theta psibar if t==1
egen psijbar=mean(psi), by(j)
egen thetajbar=mean(theta), by(j)
correlate thetajbar psijbar if n==1

log close
exit

---------------------------------------------------------------------------
References

Abowd, J., Creecy, R. and Kramarz, F. (2002) "Computing person and firm
effects using linked longitudinal employer-employee data" US Census Bureau
Technical Paper 2002-06.

Abowd, J., Kramarz, F. and Margolis, D. (1999) "High wage workers and high
wage firms" Econometrica vol.67 pp.251--333

Andrews, M., Schank, T. and Upward, R. (2005) "Practical fixed effects
estimation methods for the three-way error components model"
http://www.nottingham.ac.uk/economics/staff/details/richard_upward.html

Andrews, M., Schank, T. and Upward, R. (2004b) "High wage workers and low
wage firms? Negative assortative matching or statistical artefact?"
http://www.nottingham.ac.uk/economics/staff/details/richard_upward.html
---------------------------------------------------------------------------