One Reason Why psmatch2 and teffects psmatch Give Different Estimates
Published 2021-01-13 23:58
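I will not go into the theory behind propensity score matching (PSM) here. This post discusses only one reason why teffects psmatch and psmatch2 can produce different estimates in Stata.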
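Stata 13 and later versions include the teffects suite, which can also be used for PSM. The community-contributed psmatch2 command is still the more common choice, but its standard errors are problematic: whenever it reports results it prints the warning Note: S.E. does not take into account that the propensity score is estimated. The standard errors from teffects psmatch, by contrast, are the Abadie-Imbens (AI) robust standard errors, which do account for the estimated propensity score, so you can rely on them.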
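The article Propensity Score Matching in Stata using teffects (link: https://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm) explains psmatch2 and teffects psmatch in detail. It also points out that, with the options set appropriately, the two commands should return the same coefficient: psmatch2 t x1 x2, out(y) logit ate and teffects psmatch (y) (t x1 x2), atet should give the same ATT. In practice, however, some researchers who run psmatch2 and teffects psmatch on the same data find that the two estimates differ, sometimes to the point of supporting opposite conclusions.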
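The reason lies in how psmatch2 handles ties in nearest-neighbor matching. When several control units are equally close to a treated unit, psmatch2 without the ties option uses only the first such control it encounters as the match. The sort order of the data therefore affects which units are matched, and hence the estimate. With the ties option, psmatch2 uses the average outcome of all equally close controls as the match for the treated unit. teffects psmatch always takes the latter approach.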
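To make the two tie rules concrete, here is a minimal sketch on hypothetical toy data (the variable names treat, ps, y and all values are made up for illustration, and the code only mimics the two rules by hand; it does not call psmatch2 or teffects psmatch). One treated unit has three controls tied at the same propensity score.

```stata
* Hypothetical toy data: one treated unit and three controls, all with the
* same propensity score, so the three controls are tied as nearest neighbors.
clear
input treat ps y
1 0.5 10
0 0.5 4
0 0.5 6
0 0.5 8
end

* Outcome of the treated unit
summarize y if treat == 1, meanonly
scalar y_treated = r(mean)

* Rule A (no tie handling): use only the first tied control in the current order
generate ctrl_rank = sum(treat == 0)
summarize y if treat == 0 & ctrl_rank == 1, meanonly
scalar att_first = y_treated - r(mean)

* Rule B (tie handling): use the average outcome of all tied controls
summarize y if treat == 0, meanonly
scalar att_ties = y_treated - r(mean)

* Re-sort the data: Rule A now picks a different control, Rule B is unchanged
gsort -y
replace ctrl_rank = sum(treat == 0)
summarize y if treat == 0 & ctrl_rank == 1, meanonly
scalar att_first_resorted = y_treated - r(mean)

display "first match, original order : " att_first
display "first match, re-sorted      : " att_first_resorted
display "average over ties           : " att_ties
```

Under the first-match rule the effect moves from 6 to 2 just because the rows were reordered, while the tie-averaged effect stays at 4 in either order.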
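So if the two commands give you different results, for example if the ATT from psmatch2 is far from the one reported by teffects psmatch (y) (treat xlist), atet, check whether your psmatch2 treat xlist, out(y) logit ate call includes the ties option. A missing ties option is one possible cause of the discrepancy.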
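A quick way to run this check is sketched below. It assumes psmatch2 is installed from SSC (ssc install psmatch2) and uses the cattaneo2 example data; the seed value and the scalar names are arbitrary choices for illustration. It stores the ATT from teffects psmatch, from psmatch2 with ties, and from psmatch2 without ties after shuffling the rows, so the three can be compared side by side.

```stata
* Sketch only: assumes psmatch2 is installed (ssc install psmatch2)
webuse cattaneo2, clear

* teffects psmatch: tied nearest neighbors are averaged
quietly teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), ///
    atet nneighbor(1)
matrix b = e(b)
scalar att_teffects = b[1,1]

* psmatch2 with ties: tied controls are averaged as well
quietly psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, ///
    outcome(bweight) logit neighbor(1) ties
scalar att_ties = r(att)

* psmatch2 without ties, after shuffling the rows (the seed is arbitrary)
set seed 20210113
generate double u = runiform()
sort u
quietly psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, ///
    outcome(bweight) logit neighbor(1)
scalar att_noties = r(att)

display "teffects psmatch ATET          : " att_teffects
display "psmatch2 ATT, ties             : " att_ties
display "psmatch2 ATT, no ties, shuffled: " att_noties
```

If the first two numbers agree and the third does not, tie handling is the likely source of the gap; rerunning with a different seed should then move only the third number.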
clear all
frames create data
frames change data
webuse cattaneo2
frames copy data frames1
frames change frames1
sum bweight mbsmoke mmarried c.mage##c.mage fbaby medu
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
bweight | 4,642 3361.68 578.8196 340 5500
mbsmoke | 4,642 .1861267 .3892508 0 1
mmarried | 4,642 .6996984 .4584385 0 1
mage | 4,642 26.50452 5.619026 13 45
|
c.mage#|
c.mage | 4,642 734.0564 305.2242 169 2025
-------------+---------------------------------------------------------
|
fbaby | 4,642 .4379578 .4961893 0 1
medu | 4,642 12.68957 2.520661 0 17
1. Logit
logit mbsmoke mmarried c.mage##c.mage fbaby medu
Iteration 0: log likelihood = -2230.7484
Iteration 1: log likelihood = -2053.769
Iteration 2: log likelihood = -2043.2897
Iteration 3: log likelihood = -2043.2504
Iteration 4: log likelihood = -2043.2504
Logistic regression Number of obs = 4,642
LR chi2(5) = 375.00
Prob > chi2 = 0.0000
Log likelihood = -2043.2504 Pseudo R2 = 0.0841
-------------------------------------------------------------------------------
mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mmarried | -1.145706 .0918962 -12.47 0.000 -1.32582 -.965593
mage | .321518 .0638472 5.04 0.000 .1963798 .4466563
|
c.mage#c.mage | -.0060368 .0011849 -5.09 0.000 -.0083592 -.0037144
|
fbaby | -.3864258 .0880445 -4.39 0.000 -.5589898 -.2138618
medu | -.1420833 .0173215 -8.20 0.000 -.1760328 -.1081338
_cons | -2.950915 .8102504 -3.64 0.000 -4.538976 -1.362853
-------------------------------------------------------------------------------
predict pr_
(option pr assumed; Pr(mbsmoke))
2. psmatch2
2.1 using pscore()
psmatch2 mbsmoke, out(bweight) pscore(pr_) ate logit
----------------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
bweight Unmatched | 3137.65972 3412.91159 -275.251871 21.4528037 -12.83
ATT | 3137.65972 3334.84259 -197.18287 55.6185293 -3.55
ATU | 3412.91159 3164.00185 -248.909741 . .
ATE | -239.281991 . .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
| psmatch2:
psmatch2: | Common
Treatment | support
assignment | On suppor | Total
-----------+-----------+----------
Untreated | 3,778 | 3,778
Treated | 864 | 864
-----------+-----------+----------
Total | 4,642 | 4,642
2.2 general
psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1)
Logistic regression Number of obs = 4,642
LR chi2(5) = 375.00
Prob > chi2 = 0.0000
Log likelihood = -2043.2504 Pseudo R2 = 0.0841
-------------------------------------------------------------------------------
mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mmarried | -1.145706 .0918962 -12.47 0.000 -1.32582 -.965593
mage | .321518 .0638472 5.04 0.000 .1963798 .4466563
|
c.mage#c.mage | -.0060368 .0011849 -5.09 0.000 -.0083592 -.0037144
|
fbaby | -.3864258 .0880445 -4.39 0.000 -.5589898 -.2138618
medu | -.1420833 .0173215 -8.20 0.000 -.1760328 -.1081338
_cons | -2.950915 .8102504 -3.64 0.000 -4.538976 -1.362853
-------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
bweight Unmatched | 3137.65972 3412.91159 -275.251871 21.4528037 -12.83
ATT | 3137.65972 3334.84259 -197.18287 55.6185293 -3.55
ATU | 3412.91159 3164.00185 -248.909741 . .
ATE | -239.281991 . .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
| psmatch2:
psmatch2: | Common
Treatment | support
assignment | On suppor | Total
-----------+-----------+----------
Untreated | 3,778 | 3,778
Treated | 864 | 864
-----------+-----------+----------
Total | 4,642 | 4,642
3. teffects psmatch
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu),atet nn(1)
Treatment-effects estimation Number of obs = 4,642
Estimator : propensity-score matching Matches: requested = 1
Outcome model : matching min = 1
Treatment model: logit max = 74
----------------------------------------------------------------------------------------
| AI Robust
bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATET |
mbsmoke |
(smoker vs nonsmoker) | -236.7848 26.57789 -8.91 0.000 -288.8765 -184.693
----------------------------------------------------------------------------------------
The two commands give different estimates?
4. Do the results always differ?
frames create frame2
frames change frame2
use http://ssc.wisc.edu/sscc/pubs/files/psm
sum
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
x1 | 1,000 -.012963 1.000053 -3.6593 3.084742
x2 | 1,000 -.0246025 1.034555 -3.363018 3.399474
t | 1,000 .333 .4715224 0 1
y | 1,000 .3474242 1.957462 -5.494524 6.873514
psmatch2 t x1 x2, out(y) logit ate
Logistic regression Number of obs = 1,000
LR chi2(2) = 222.78
Prob > chi2 = 0.0000
Log likelihood = -524.89072 Pseudo R2 = 0.1751
------------------------------------------------------------------------------
t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .9068298 .0885341 10.24 0.000 .7333062 1.080353
x2 | .8100408 .0816962 9.92 0.000 .6499192 .9701624
_cons | -.8528442 .0788823 -10.81 0.000 -1.007451 -.6982378
------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
y Unmatched | 1.8910736 -.423243358 2.31431696 .109094342 21.21
ATT | 1.8910736 .930722886 .960350715 .168252917 5.71
ATU |-.423243358 .625587554 1.04883091 . .
ATE | 1.01936701 . .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
| psmatch2:
psmatch2: | Common
Treatment | support
assignment | On suppor | Total
-----------+-----------+----------
Untreated | 667 | 667
Treated | 333 | 333
-----------+-----------+----------
Total | 1,000 | 1,000
teffects psmatch (y) (t x1 x2), atet
Treatment-effects estimation Number of obs = 1,000
Estimator : propensity-score matching Matches: requested = 1
Outcome model : matching min = 1
Treatment model: logit max = 1
------------------------------------------------------------------------------
| AI Robust
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET |
t |
(1 vs 0) | .9603507 .1204748 7.97 0.000 .7242245 1.196477
------------------------------------------------------------------------------
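This time the two commands agree?
5. The difference and its potential cause
Why do the two commands sometimes agree and sometimes disagree?
Difference:
Matches: max = 74 in the cattaneo2 example, versus max = 1 in the psm example
Possible cause:
psmatch2 matches only the first unit it finds, even when other units are at the same distance;
Conjecture:
teffects psmatch instead uses all units at the nearest distance. That is, the matching is nominally 1:1, but a given unit may be equally distant from several other units.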
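Before tracing the matches one by one, a rough way to see whether ties are even possible is to count how many control units share exactly the same estimated propensity score. The sketch below does this on the cattaneo2 data; the variable names ps and n_same are hypothetical, and exact equality of scores is only a proxy for the tie rule that teffects psmatch applies internally.

```stata
* Sketch: count control units that share an identical estimated propensity score
webuse cattaneo2, clear
quietly logit mbsmoke mmarried c.mage##c.mage fbaby medu
predict double ps, pr

* Within each treatment group, how many units have exactly this score?
bysort ps mbsmoke: generate n_same = _N

* Among controls: a maximum above 1 means tied candidate matches exist
summarize n_same if mbsmoke == 0
```

Because the covariates here are all discrete, many units have identical covariate patterns and therefore identical scores, which is consistent with the Matches: max = 74 line reported by teffects psmatch.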
6. Verification
frame copy data simu,replace
frame change simu
(note: frame simu not found)
6.1 Obtain the propensity scores and sort
logit mbsmoke mmarried c.mage##c.mage fbaby medu
Iteration 0: log likelihood = -2230.7484
Iteration 1: log likelihood = -2053.769
Iteration 2: log likelihood = -2043.2897
Iteration 3: log likelihood = -2043.2504
Iteration 4: log likelihood = -2043.2504
Logistic regression Number of obs = 4,642
LR chi2(5) = 375.00
Prob > chi2 = 0.0000
Log likelihood = -2043.2504 Pseudo R2 = 0.0841
-------------------------------------------------------------------------------
mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mmarried | -1.145706 .0918962 -12.47 0.000 -1.32582 -.965593
mage | .321518 .0638472 5.04 0.000 .1963798 .4466563
|
c.mage#c.mage | -.0060368 .0011849 -5.09 0.000 -.0083592 -.0037144
|
fbaby | -.3864258 .0880445 -4.39 0.000 -.5589898 -.2138618
medu | -.1420833 .0173215 -8.20 0.000 -.1760328 -.1081338
_cons | -2.950915 .8102504 -3.64 0.000 -4.538976 -1.362853
-------------------------------------------------------------------------------
cap drop pr_
predict pr_
(option pr assumed; Pr(mbsmoke))
sort pr_ mbsmoke
gen id = _n
6.2 ATT estimates from the two commands
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu),nn(1) atet
//teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu),nn(1) ate
Treatment-effects estimation Number of obs = 4,642
Estimator : propensity-score matching Matches: requested = 1
Outcome model : matching min = 1
Treatment model: logit max = 74
----------------------------------------------------------------------------------------
| AI Robust
bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATET |
mbsmoke |
(smoker vs nonsmoker) | -236.7848 26.57789 -8.91 0.000 -288.8765 -184.693
----------------------------------------------------------------------------------------
psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1) //ATT = -248.515046
Logistic regression Number of obs = 4,642
LR chi2(5) = 375.00
Prob > chi2 = 0.0000
Log likelihood = -2043.2504 Pseudo R2 = 0.0841
-------------------------------------------------------------------------------
mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mmarried | -1.145706 .0918962 -12.47 0.000 -1.32582 -.965593
mage | .321518 .0638472 5.04 0.000 .1963798 .4466563
|
c.mage#c.mage | -.0060368 .0011849 -5.09 0.000 -.0083592 -.0037144
|
fbaby | -.3864258 .0880445 -4.39 0.000 -.5589898 -.2138618
medu | -.1420833 .0173215 -8.20 0.000 -.1760328 -.1081338
_cons | -2.950915 .8102504 -3.64 0.000 -4.538976 -1.362853
-------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
bweight Unmatched | 3137.65972 3412.91159 -275.251871 21.4528037 -12.83
ATT | 3137.65972 3386.17477 -248.515046 54.7442913 -4.54
ATU | 3412.91159 3166.47327 -246.438327 . .
ATE | -246.82486 . .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
| psmatch2:
psmatch2: | Common
Treatment | support
assignment | On suppor | Total
-----------+-----------+----------
Untreated | 3,778 | 3,778
Treated | 864 | 864
-----------+-----------+----------
Total | 4,642 | 4,642
6.3 Retrieve the matching information from teffects psmatch
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu),nn(1) atet gen(match)
// -236.78475
Treatment-effects estimation Number of obs = 4,642
Estimator : propensity-score matching Matches: requested = 1
Outcome model : matching min = 1
Treatment model: logit max = 74
----------------------------------------------------------------------------------------
| AI Robust
bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATET |
mbsmoke |
(smoker vs nonsmoker) | -236.7848 26.57789 -8.91 0.000 -288.8765 -184.693
----------------------------------------------------------------------------------------
6.4 Keep the treated group and merge in the matched controls' data
6.4.1 Keep the treated observations
frame copy simu simu2
frame change simu2
keep id bweight mbsmoke match*
keep if mbsmoke == 1
(3,778 observations deleted)
6.4.2 Build the match lookup table
reshape long match,i(id) j(match_id)
keep if match != .
(note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
> 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 864 -> 63936
Number of variables 77 -> 5
j variable (74 values) -> match_id
xij variables:
match1 match2 ... match74 -> match
-----------------------------------------------------------------------------
6.4.3 Link the frames
frlink m:1 match,frame(simu id) gen(simu_21)
(all observations in frame simu2 matched)
6.4.4 Retrieve the matched controls' outcome variable
frget bweight ,from(simu_21) pre(_0)
(1 variable copied from linked frame)
6.5 Compute the treatment effect for each individual match
gen att = bweight - _0bweight
sum att
return list
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
att | 15,721 -238.8935 796.7148 -4082 3884
scalars:
r(N) = 15721
r(sum_w) = 15721
r(mean) = -238.8935182240315
r(Var) = 634754.4716125642
r(sd) = 796.7147994185649
r(min) = -4082
r(max) = 3884
r(sum) = -3755645
6.6 Compute the weighted sample mean
bysort id:gen num = _N
sum att [aweight = 1/num]
return list
Variable | Obs Weight Mean Std. Dev. Min Max
-------------+-----------------------------------------------------------------
att | 15,721 864 -236.7848 808.3941 -4082 3884
scalars:
r(N) = 15721
r(sum_w) = 864
r(mean) = -236.7847508257834
r(Var) = 653501.0708042918
r(sd) = 808.394130857153
r(min) = -4082
r(max) = 3884
r(sum) = -204582.0247134768
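The weighted mean above can be read as the following formula. This is a sketch in my own notation, not taken from either command's documentation: N_1 is the number of treated units (864 here), M_i is the set of tied nearest controls for treated unit i, and num in the code plays the role of |M_i|.

```latex
\widehat{\mathrm{ATT}}_{\text{ties}}
  = \frac{1}{N_1} \sum_{i:\,T_i = 1}
    \left( y_i - \frac{1}{|M_i|} \sum_{j \in M_i} y_j \right)
  = \frac{\sum_{i:\,T_i = 1} \sum_{j \in M_i} \frac{1}{|M_i|}\,(y_i - y_j)}
         {\sum_{i:\,T_i = 1} \sum_{j \in M_i} \frac{1}{|M_i|}}
```

The denominator on the right equals N_1, which is why r(sum_w) is 864 even though the long data set holds 15,721 matched pairs. The unweighted mean in 6.5 instead divides by the number of pairs, so treated units with many tied controls count more heavily there.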
frames simu : teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu),nn(1) atet
Treatment-effects estimation Number of obs = 4,642
Estimator : propensity-score matching Matches: requested = 1
Outcome model : matching min = 1
Treatment model: logit max = 74
----------------------------------------------------------------------------------------
| AI Robust
bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATET |
mbsmoke |
(smoker vs nonsmoker) | -236.7848 26.57789 -8.91 0.000 -288.8765 -184.693
----------------------------------------------------------------------------------------
frames simu : psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1) ties
Logistic regression Number of obs = 4,642
LR chi2(5) = 375.00
Prob > chi2 = 0.0000
Log likelihood = -2043.2504 Pseudo R2 = 0.0841
-------------------------------------------------------------------------------
mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mmarried | -1.145706 .0918962 -12.47 0.000 -1.32582 -.965593
mage | .321518 .0638472 5.04 0.000 .1963798 .4466563
|
c.mage#c.mage | -.0060368 .0011849 -5.09 0.000 -.0083592 -.0037144
|
fbaby | -.3864258 .0880445 -4.39 0.000 -.5589898 -.2138618
medu | -.1420833 .0173215 -8.20 0.000 -.1760328 -.1081338
_cons | -2.950915 .8102504 -3.64 0.000 -4.538976 -1.362853
-------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
bweight Unmatched | 3137.65972 3412.91159 -275.251871 21.4528037 -12.83
ATT | 3137.65972 3374.44447 -236.784751 26.0535546 -9.09
ATU | 3412.91159 3207.84728 -205.064318 . .
ATE | -210.968337 . .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
| psmatch2:
psmatch2: | Common
Treatment | support
assignment | On suppor | Total
-----------+-----------+----------
Untreated | 3,778 | 3,778
Treated | 864 | 864
-----------+-----------+----------
Total | 4,642 | 4,642
6.7 Compute the mean using only the first match
sum att if match_id == 1
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
att | 864 -248.515 817.1815 -3403 2750
return list
scalars:
r(sum) = -214717
r(max) = 2750
r(min) = -3403
r(sd) = 817.1815138978078
r(Var) = 667785.6266563131
r(mean) = -248.5150462962963
r(sum_w) = 864
r(N) = 864
frames simu : psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1)
Logistic regression Number of obs = 4,642
LR chi2(5) = 375.00
Prob > chi2 = 0.0000
Log likelihood = -2043.2504 Pseudo R2 = 0.0841
-------------------------------------------------------------------------------
mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mmarried | -1.145706 .0918962 -12.47 0.000 -1.32582 -.965593
mage | .321518 .0638472 5.04 0.000 .1963798 .4466563
|
c.mage#c.mage | -.0060368 .0011849 -5.09 0.000 -.0083592 -.0037144
|
fbaby | -.3864258 .0880445 -4.39 0.000 -.5589898 -.2138618
medu | -.1420833 .0173215 -8.20 0.000 -.1760328 -.1081338
_cons | -2.950915 .8102504 -3.64 0.000 -4.538976 -1.362853
-------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
bweight Unmatched | 3137.65972 3412.91159 -275.251871 21.4528037 -12.83
ATT | 3137.65972 3386.17477 -248.515046 54.7442913 -4.54
ATU | 3412.91159 3166.47327 -246.438327 . .
ATE | -246.82486 . .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
| psmatch2:
psmatch2: | Common
Treatment | support
assignment | On suppor | Total
-----------+-----------+----------
Untreated | 3,778 | 3,778
Treated | 864 | 864
-----------+-----------+----------
Total | 4,642 | 4,642