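Take the classic California tobacco control example from Abadie et al. (2010). The authors use 38 states as California's potential control group (the donor pool), estimate a synthetic California, and from it obtain the effect of California's tobacco control legislation. For the hypothesis test, each of the 38 control states is treated in turn as a pseudo-treated (placebo) state, with the remaining states as its controls, and the same synthesis is carried out. This gives an estimated "causal effect" for each placebo state. Since none of these states actually experienced the policy, the resulting effect paths show the distribution of gaps one could expect to see when there is no intervention.

Because the synthesis uses only pre-treatment data, some states are synthesized poorly, meaning that the synthetic series deviates substantially from the actual series before the intervention. The authors use the pre-treatment mean squared prediction error (MSPE) as the criterion: the smaller it is, the better the pre-treatment fit, and only with a good pre-treatment fit can we have much confidence in the post-treatment prediction. The authors therefore keep only placebo states whose MSPE does not exceed a given multiple of California's MSPE, for example 20, 10, 5, or 2 times, and plot the corresponding figures (interested readers can consult the original paper). The figure below is the one I obtained with the MSPE limited to 2 times California's. In this case only 13 control states are retained, and the pre-treatment fit is good, that is, the pre-treatment gaps are all close to zero. After the intervention, California lies at the very edge of the distribution, which indicates that California's effect is significant and not a product of chance.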
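The quantities described above can be written out as follows. This is a sketch in my own notation, not the notation of the original paper: $j$ indexes states, $T_0$ is the number of pre-treatment periods (19 for 1970 to 1988), $T$ is the total number of periods (31), $\hat{Y}_{jt}$ is the synthetic outcome for state $j$, and $m$ is the cutoff multiple set in the code below.

```latex
\begin{align*}
\text{gap}_{jt} &= Y_{jt} - \hat{Y}_{jt} \\
\text{MSPE}^{\text{pre}}_{j} &= \frac{1}{T_0}\sum_{t=1}^{T_0} \text{gap}_{jt}^{2},
\qquad
\text{MSPE}^{\text{post}}_{j} = \frac{1}{T-T_0}\sum_{t=T_0+1}^{T} \text{gap}_{jt}^{2} \\
R_j &= \frac{\text{MSPE}^{\text{post}}_{j}}{\text{MSPE}^{\text{pre}}_{j}} \\
\text{keep placebo } j &\iff \text{MSPE}^{\text{pre}}_{j} \le m \cdot \text{MSPE}^{\text{pre}}_{\text{CA}}
\end{align*}
```

With $m = 2$, a placebo state stays in the graph only if its pre-treatment MSPE is at most twice California's. The ratio $R_j$ is the statistic the code labels Abadie_R.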
*======================================
* Hypothesis test for the synthetic control method (placebo test and graphs)
*======================================
set more off
use smoking, clear
tsset state year
* Collect some parameters used later
qui tab state
local n = r(r)        // number of states
qui tab year
local n_year = r(r)   // number of years
*======================================
* Adjust these parameters to suit your own study
*======================================
local date_t = "1989"            // treatment date
local m = 2                      // keep placebos whose MSPE is at most m times the treated state's MSPE; m=0 means no limit
*local slow = "nested"           // remove the leading * to use the nested option: slower, but better fit
local id_t = 3                   // id (row number) of the treated state
local treat_name = "California"      // name of the treated unit shown in the graph
local ctrl_name = "Control States"   // name of the control group shown in the graph
local xtitle "year"              // x-axis title
local ytitle "gap in per-capita cigarette sales (in packs)"   // y-axis title
local saving "syn_plot"          // file name stem for the saved placebo graph
*======================================
use tmp`i', clear
keep _Y_treated _Y_synthetic _time
gen te = _Y_treated - _Y_synthetic
gen id = `i'
keep in 1/`n_year'   // 1970-2000 is 31 years, so keep the first 31 observations
gen te2 = te*te      // squared gap, used to compute the MSPE
local n_before = `date_t' - _time[1]   // number of pre-treatment periods
local n_after = `n_before' + 1         // position of the first treatment period
qui sum te2 in 1/`n_before'
local mspe_pre = r(mean)               // pre-treatment MSPE
qui sum te2 in `n_after'/`n_year'
local mspe_post = r(mean)              // post-treatment MSPE
local r = `mspe_post'/`mspe_pre'       // Abadie R statistic (post/pre MSPE ratio)
matrix `resmat' = nullmat(`resmat')\(`rmspe', `mspe_pre', `mspe_post', `r')   // resmat stores the fit statistics for each model
local names `"`names' `"`i'"'"'   // row names, one per unit
save tmp`i', replace
use smoking, clear
tsset state year
}
mat colnames `resmat' = "RMSPE" "MSPE_pre" "MSPE_post" "Abadie_R"
mat rownames `resmat' = `names'
matlist `resmat', row("Treated Unit")
*Placebo Graphs - Draw Figure 3
*Get the RMSPE of the treated unit
local RMSPE_t = `resmat'[`id_t',1]
use tmp1, clear   // note: unit 1 is loaded unconditionally, so the MSPE limit is not applied to it
local num = 0     // number of units appended to the graph data
forvalues i = 2/`n' {
    if `m'==0 {
        append using tmp`i'
        local num = `num' + 1
    }
    else if `resmat'[`i',1]^2 <= `m'*`RMSPE_t'^2 {   // MSPE comparison
        append using tmp`i'
        local num = `num' + 1
    }
}
*======================================
* Draw placebo graph 1
local s = ""            // string that accumulates the graph command
local controls = ""     // string that stores the ids of the control units used
local num_t = `num'+1   // plot position that identifies the treated unit
levelsof id, local(levels)
foreach l of local levels {
    if `l'!=`id_t' {
        local s = "`s'"+"(line te _time if id==`l', lc(gs13))"
        local controls = "`controls'"+" "+"`l'"
    }
}
local date_before = `date_t'-1
two `s'(line te _time if id==`id_t', lc(black)), ///
    legend(order(`num_t' "`treat_name'" `num' "`ctrl_name'") cols(1) pos(11) ring(0)) xline(`date_before', lp(dot) lc(black)) yline(0, lp(dash) lc(black)) ///
    xlabel(1970(5)2000) xtitle("`xtitle'") ytitle("`ytitle'") saving(`saving'_`m', replace)
di "# of controls after limiting to `m' times the MSPE of the treated unit: " `num'   // number of control units retained
di "ID of controls:" "`controls'"   // ids (row numbers) of the control units retained
*======================================
* Plot the distribution of the Abadie R statistic, Abadie et al. (2010)
*======================================
clear
svmat `resmat', names(col)
save tmp_R, replace   // saves the statistics to tmp_R.dta; comment this line out if you do not want the file
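The listing above stops once the statistics are saved. As a minimal sketch of how one might continue, the lines below compute a permutation-style p-value from the ratio and plot its distribution. The sketch assumes that tmp_R.dta exists as saved above, that it has one row per state in state-id order (so that row 3 is California), and that the ratio is stored in Abadie_R. The scalar names and the axis title are my own choices, not part of the original code.

```stata
* Sketch only: assumes tmp_R.dta from the step above, one row per state,
* ordered by state id, with the post/pre MSPE ratio in Abadie_R.
use tmp_R, clear
local id_t = 3                      // assumed row of the treated state (California)
scalar r_t = Abadie_R[`id_t']       // ratio of the treated state
qui count if !missing(Abadie_R)
scalar n_units = r(N)               // treated state plus all placebo states
qui count if Abadie_R >= scalar(r_t) & !missing(Abadie_R)
scalar p = r(N)/scalar(n_units)     // share of units with a ratio at least as large
di "Permutation p-value: " %6.4f scalar(p)
* Distribution of the ratio, with the treated state's value marked
local r_line = scalar(r_t)
histogram Abadie_R, frequency xline(`r_line') xtitle("post/pre MSPE ratio")
```

The p-value here is simply the share of states whose ratio is at least as large as California's. In Abadie et al. (2010), California has the largest ratio among the 39 states, which corresponds to 1/39, about 0.026.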
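For a more detailed introduction, see my textbook MUSE. I also recommend Abadie's (2021) survey article in the JEL, "Using synthetic controls: feasibility, data requirements, and methodological aspects", in which Abadie, the originator of SCM, explains how the method should be used.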
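Once you have the basics, you can move on to Imbens and Rubin (2015), Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction, and Angrist and Pischke (2009), Mostly Harmless Econometrics (MHE). MHE presupposes knowledge of statistical inference, so before reading it you should have some grounding in statistical inference or in traditional econometric theory.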
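The two classic textbooks are Wooldridge's Introductory Econometrics: A Modern Approach and Stock and Watson's Introduction to Econometrics, both titled 《计量经济学入门》 in Chinese. Both cover the basic classical linear regression model, time series models, instrumental variables, and so on. Stock and Watson also introduce experiments and natural experiments, as well as big data and machine learning, so it is somewhat more up to date. For applied researchers doing empirical work, the content of these two books is about enough, even though they are usually regarded as undergraduate textbooks.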
More advanced classic textbooks include Hayashi (2000), Econometrics; Wooldridge (2010), Econometric Analysis of Cross Section and Panel Data; and, for microeconometrics, Cameron and Trivedi (2005), Microeconometrics: Methods and Applications.