One Reason Why psmatch2 and teffects psmatch Give Different Estimates

Published 2021-01-13 23:58

I will not go into the theory behind PSM here. This post discusses only one reason why teffects psmatch and psmatch2 can produce different estimates in Stata.

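Stata 13 and later versions include the teffects suite, which can also be used for PSM. The user-written psmatch2 command is still the more common choice for PSM, but its standard errors are problematic: every time it reports results it prints Note: S.E. does not take into account that the propensity score is estimated. The standard errors from teffects psmatch (labelled AI Robust in its output) do take the estimated propensity score into account, so you can rely on them.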

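The article Propensity Score Matching in Stata using teffects (link: https://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm) compares psmatch2 and teffects psmatch in some detail. It also points out that, once the options are aligned, the two commands should give the same coefficient: psmatch2 t x1 x2, out(y) logit ate and teffects psmatch (y) (t x1 x2), atet should return the same ATT. In practice, however, some researchers who run psmatch2 and teffects psmatch on the same data find that the estimates differ, sometimes to the point of opposite conclusions.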

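The reason lies in how psmatch2 handles ties in nearest-neighbor matching. When several control observations are at the same minimum distance from a treated observation, psmatch2 without the ties option picks the first such control it encounters. The sort order of the data therefore affects the matched sample and, through it, the estimate. With the ties option, psmatch2 uses the average outcome of all controls at that minimum distance as the match for the treated observation. teffects psmatch always takes the latter approach.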
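To make the two tie rules concrete, here is a minimal hand-built sketch. The three observations and the variable names id, treat, pscore and y are made up for illustration; the code calls neither matching command and only mimics the arithmetic.

```stata
clear
* One treated unit (id 3) and two controls with exactly the same score
input id treat pscore y
1 0 0.5 4
2 0 0.5 8
3 1 0.5 10
end

* Rule A: use the first tied control in the current sort order (order-dependent)
sort id
scalar y_first = y[1]

* Rule B: use the mean outcome of all tied controls (order-independent)
summarize y if treat == 0 & pscore == 0.5, meanonly
scalar y_ties = r(mean)

display "first-match effect : " y[3] - y_first
display "tie-averaged effect: " y[3] - y_ties
```

With the controls in this order, rule A gives 10 - 4 = 6; if the two controls were swapped it would give 10 - 8 = 2. Rule B gives 10 - 6 = 4 either way.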

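So, when the two commands give different results, for example when the ATT from psmatch2 treat xlist, out(y) logit ate is far from the one reported by teffects psmatch (y) (treat xlist), atet, check whether the psmatch2 call includes the ties option. Its absence is one possible cause of the gap.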
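As a quick check of this advice, the sketch below runs both commands on the cattaneo2 data used in the rest of this post and stores the two ATT estimates. It assumes that psmatch2 is installed (ssc install psmatch2) and that it leaves the ATT in r(att); the scalar names att_teffects and att_psmatch2 are arbitrary.

```stata
* Assumes Stata 13+ and the user-written psmatch2 (ssc install psmatch2)
webuse cattaneo2, clear

* teffects psmatch averages over all tied nearest neighbours
teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu), atet nn(1)
matrix b = e(b)
scalar att_teffects = b[1,1]

* psmatch2 with the ties option also uses every tied control
psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) logit n(1) ties
scalar att_psmatch2 = r(att)

display "teffects ATET: " att_teffects "   psmatch2 ATT (ties): " att_psmatch2
```

In the log reproduced in section 6.6 of this post, both commands report an ATT of about -236.78 on these data once ties is added.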

clear all

frames create data
frames change data

webuse cattaneo2
frames copy data frames1
frames change frames1

sum bweight mbsmoke mmarried c.mage##c.mage fbaby medu
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     bweight |      4,642     3361.68    578.8196        340       5500
     mbsmoke |      4,642    .1861267    .3892508          0          1
    mmarried |      4,642    .6996984    .4584385          0          1
        mage |      4,642    26.50452    5.619026         13         45
             |
      c.mage#|
      c.mage |      4,642    734.0564    305.2242        169       2025
-------------+---------------------------------------------------------
             |
       fbaby |      4,642    .4379578    .4961893          0          1
        medu |      4,642    12.68957    2.520661          0         17

1.Logit

logit mbsmoke mmarried c.mage##c.mage fbaby medu
Iteration 0:   log likelihood = -2230.7484  
Iteration 1: log likelihood = -2053.769
Iteration 2: log likelihood = -2043.2897
Iteration 3: log likelihood = -2043.2504
Iteration 4: log likelihood = -2043.2504

Logistic regression Number of obs = 4,642
LR chi2(5) = 375.00
Prob > chi2 = 0.0000
Log likelihood = -2043.2504 Pseudo R2 = 0.0841

-------------------------------------------------------------------------------
mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mmarried | -1.145706 .0918962 -12.47 0.000 -1.32582 -.965593
mage | .321518 .0638472 5.04 0.000 .1963798 .4466563
|
c.mage#c.mage | -.0060368 .0011849 -5.09 0.000 -.0083592 -.0037144
|
fbaby | -.3864258 .0880445 -4.39 0.000 -.5589898 -.2138618
medu | -.1420833 .0173215 -8.20 0.000 -.1760328 -.1081338
_cons | -2.950915 .8102504 -3.64 0.000 -4.538976 -1.362853
-------------------------------------------------------------------------------
predict pr_
(option pr assumed; Pr(mbsmoke))

2.psmatch2

2.1 using pscore()

psmatch2 mbsmoke, out(bweight) pscore(pr_) ate logit
----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
         bweight  Unmatched | 3137.65972   3412.91159  -275.251871   21.4528037   -12.83
                        ATT | 3137.65972   3334.84259   -197.18287   55.6185293    -3.55
                        ATU | 3412.91159   3164.00185  -248.909741            .        .
                        ATE |                          -239.281991            .        .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

| psmatch2:
psmatch2: | Common
Treatment | support
assignment | On suppor | Total
-----------+-----------+----------
Untreated | 3,778 | 3,778
Treated | 864 | 864
-----------+-----------+----------
Total | 4,642 | 4,642

2.2 general

psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1)
Logistic regression                             Number of obs     =      4,642
LR chi2(5) = 375.00
Prob > chi2 = 0.0000
Log likelihood = -2043.2504 Pseudo R2 = 0.0841

-------------------------------------------------------------------------------
mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mmarried | -1.145706 .0918962 -12.47 0.000 -1.32582 -.965593
mage | .321518 .0638472 5.04 0.000 .1963798 .4466563
|
c.mage#c.mage | -.0060368 .0011849 -5.09 0.000 -.0083592 -.0037144
|
fbaby | -.3864258 .0880445 -4.39 0.000 -.5589898 -.2138618
medu | -.1420833 .0173215 -8.20 0.000 -.1760328 -.1081338
_cons | -2.950915 .8102504 -3.64 0.000 -4.538976 -1.362853
-------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
bweight Unmatched | 3137.65972 3412.91159 -275.251871 21.4528037 -12.83
ATT | 3137.65972 3334.84259 -197.18287 55.6185293 -3.55
ATU | 3412.91159 3164.00185 -248.909741 . .
ATE | -239.281991 . .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

| psmatch2:
psmatch2: | Common
Treatment | support
assignment | On suppor | Total
-----------+-----------+----------
Untreated | 3,778 | 3,778
Treated | 864 | 864
-----------+-----------+----------
Total | 4,642 | 4,642

3. teffects psmatch

teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu),atet nn(1)
Treatment-effects estimation                   Number of obs      =      4,642
Estimator      : propensity-score matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Treatment model: logit                                        max =         74
----------------------------------------------------------------------------------------
                       |              AI Robust
               bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATET                   |
               mbsmoke |
(smoker vs nonsmoker)  |  -236.7848   26.57789    -8.91   0.000    -288.8765    -184.693
----------------------------------------------------------------------------------------

So the two commands give different estimates?

4. Different results every time?

frames create frame2
frames change frame2
use http://ssc.wisc.edu/sscc/pubs/files/psm
sum
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
x1 | 1,000 -.012963 1.000053 -3.6593 3.084742
x2 | 1,000 -.0246025 1.034555 -3.363018 3.399474
t | 1,000 .333 .4715224 0 1
y | 1,000 .3474242 1.957462 -5.494524 6.873514
psmatch2 t x1 x2, out(y) logit ate
Logistic regression                             Number of obs     =      1,000
LR chi2(2) = 222.78
Prob > chi2 = 0.0000
Log likelihood = -524.89072 Pseudo R2 = 0.1751

------------------------------------------------------------------------------
t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .9068298 .0885341 10.24 0.000 .7333062 1.080353
x2 | .8100408 .0816962 9.92 0.000 .6499192 .9701624
_cons | -.8528442 .0788823 -10.81 0.000 -1.007451 -.6982378
------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
y Unmatched | 1.8910736 -.423243358 2.31431696 .109094342 21.21
ATT | 1.8910736 .930722886 .960350715 .168252917 5.71
ATU |-.423243358 .625587554 1.04883091 . .
ATE | 1.01936701 . .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

| psmatch2:
psmatch2: | Common
Treatment | support
assignment | On suppor | Total
-----------+-----------+----------
Untreated | 667 | 667
Treated | 333 | 333
-----------+-----------+----------
Total | 1,000 | 1,000
teffects psmatch (y) (t x1 x2), atet
Treatment-effects estimation                   Number of obs      =      1,000
Estimator : propensity-score matching Matches: requested = 1
Outcome model : matching min = 1
Treatment model: logit max = 1
------------------------------------------------------------------------------
| AI Robust
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET |
t |
(1 vs 0) | .9603507 .1204748 7.97 0.000 .7242245 1.196477
------------------------------------------------------------------------------

Now the two commands give the same estimate again?

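5. The difference and its potential cause

Why do the two commands sometimes agree and sometimes not?

Difference:

  • Matches: max = 74 in the cattaneo2 output, versus max = 1 in the psm example

Possible cause:

  • psmatch2 matches only the first control it encounters, even when other controls are at the same distance;

Conjecture:

  • teffects psmatch instead uses every control at the minimum distance. That is, 1:1 matching is requested, but a given observation can be equally close to several others.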
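Before comparing the two commands, it can help to see how common tied propensity scores are in the data. The following is a rough sketch, assuming the same logit specification as above; the variable name n_tied is hypothetical, and the check only counts controls whose score is exactly equal to a treated observation's score, which is narrower than the nearest-neighbor search either command performs.

```stata
webuse cattaneo2, clear
quietly logit mbsmoke mmarried c.mage##c.mage fbaby medu
predict double pr_, pr

* How many observations share their propensity score with at least one other?
duplicates report pr_

* For each score, count the control observations that have exactly that score
bysort pr_: egen n_tied = total(mbsmoke == 0)

* Distribution of that count across treated observations
summarize n_tied if mbsmoke == 1
```

If the covariates are all discrete, as they are here, many observations necessarily share the same score, and the choice between first-match and tie-averaging starts to matter.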

6. Verification

frame copy data simu,replace
frame change simu
(note: frame simu not found)

6.1 Estimate the propensity score and sort on it

logit mbsmoke mmarried c.mage##c.mage fbaby medu
Iteration 0:   log likelihood = -2230.7484  
Iteration 1: log likelihood = -2053.769
Iteration 2: log likelihood = -2043.2897
Iteration 3: log likelihood = -2043.2504
Iteration 4: log likelihood = -2043.2504

Logistic regression Number of obs = 4,642
LR chi2(5) = 375.00
Prob > chi2 = 0.0000
Log likelihood = -2043.2504 Pseudo R2 = 0.0841

-------------------------------------------------------------------------------
mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mmarried | -1.145706 .0918962 -12.47 0.000 -1.32582 -.965593
mage | .321518 .0638472 5.04 0.000 .1963798 .4466563
|
c.mage#c.mage | -.0060368 .0011849 -5.09 0.000 -.0083592 -.0037144
|
fbaby | -.3864258 .0880445 -4.39 0.000 -.5589898 -.2138618
medu | -.1420833 .0173215 -8.20 0.000 -.1760328 -.1081338
_cons | -2.950915 .8102504 -3.64 0.000 -4.538976 -1.362853
-------------------------------------------------------------------------------
cap drop pr_
predict pr_
(option pr assumed; Pr(mbsmoke))
sort pr_ mbsmoke
gen id = _n

6.2 ATT from the two commands

teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu),nn(1) atet
//teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu),nn(1) ate
Treatment-effects estimation                   Number of obs      =      4,642
Estimator : propensity-score matching Matches: requested = 1
Outcome model : matching min = 1
Treatment model: logit max = 74
----------------------------------------------------------------------------------------
| AI Robust
bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATET |
mbsmoke |
(smoker vs nonsmoker) | -236.7848 26.57789 -8.91 0.000 -288.8765 -184.693
----------------------------------------------------------------------------------------
psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1) //ATT = -248.515046
Logistic regression                             Number of obs     =      4,642
LR chi2(5) = 375.00
Prob > chi2 = 0.0000
Log likelihood = -2043.2504 Pseudo R2 = 0.0841

-------------------------------------------------------------------------------
mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mmarried | -1.145706 .0918962 -12.47 0.000 -1.32582 -.965593
mage | .321518 .0638472 5.04 0.000 .1963798 .4466563
|
c.mage#c.mage | -.0060368 .0011849 -5.09 0.000 -.0083592 -.0037144
|
fbaby | -.3864258 .0880445 -4.39 0.000 -.5589898 -.2138618
medu | -.1420833 .0173215 -8.20 0.000 -.1760328 -.1081338
_cons | -2.950915 .8102504 -3.64 0.000 -4.538976 -1.362853
-------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
bweight Unmatched | 3137.65972 3412.91159 -275.251871 21.4528037 -12.83
ATT | 3137.65972 3386.17477 -248.515046 54.7442913 -4.54
ATU | 3412.91159 3166.47327 -246.438327 . .
ATE | -246.82486 . .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

| psmatch2:
psmatch2: | Common
Treatment | support
assignment | On suppor | Total
-----------+-----------+----------
Untreated | 3,778 | 3,778
Treated | 864 | 864
-----------+-----------+----------
Total | 4,642 | 4,642

6.3 Retrieve the matches found by teffects psmatch

teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu),nn(1) atet gen(match)
// -236.78475
Treatment-effects estimation                   Number of obs      =      4,642
Estimator : propensity-score matching Matches: requested = 1
Outcome model : matching min = 1
Treatment model: logit max = 74
----------------------------------------------------------------------------------------
| AI Robust
bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATET |
mbsmoke |
(smoker vs nonsmoker) | -236.7848 26.57789 -8.91 0.000 -288.8765 -184.693
----------------------------------------------------------------------------------------

6.4 Keep the treated group and merge in the matched controls' data

6.4.1 Keep the treated observations

frame copy simu simu2
frame change simu2
keep id bweight mbsmoke match*
keep if mbsmoke == 1
(3,778 observations deleted)

6.4.2 Build the match lookup table

reshape long match,i(id) j(match_id)
(note: j = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 
> 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                      864   ->   63936
Number of variables                  77   ->       5
j variable (74 values)                    ->   match_id
xij variables:
              match1 match2 ... match74   ->   match
-----------------------------------------------------------------------------
keep if match != .

6.4.3 Link the frames

frlink m:1 match,frame(simu id) gen(simu_21)
  (all observations in frame simu2 matched)

6.4.4 Fetch the matched controls' outcome variable

frget bweight ,from(simu_21) pre(_0)
  (1 variable copied from linked frame)

6.5 Compute the treatment effect for each matched pair

gen att = bweight - _0bweight
sum att
return list
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
att | 15,721 -238.8935 796.7148 -4082 3884



scalars:
r(N) = 15721
r(sum_w) = 15721
r(mean) = -238.8935182240315
r(Var) = 634754.4716125642
r(sd) = 796.7147994185649
r(min) = -4082
r(max) = 3884
r(sum) = -3755645

6.6 Compute the weighted sample mean

bysort id:gen num = _N
sum att [aweight = 1/num]
return list
    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
att | 15,721 864 -236.7848 808.3941 -4082 3884



scalars:
r(N) = 15721
r(sum_w) = 864
r(mean) = -236.7847508257834
r(Var) = 653501.0708042918
r(sd) = 808.394130857153
r(min) = -4082
r(max) = 3884
r(sum) = -204582.0247134768
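The weighting used above can be written out explicitly. The notation here is not from the original post and is only a sketch: $N_1$ is the number of treated observations, $M_i$ is the set of controls tied at the minimum distance from treated observation $i$ (so num equals $|M_i|$), and each row of the long data is one pair $(i, j)$ with att $= y_i - y_j$.

```latex
\widehat{\mathrm{ATT}}_{\text{ties}}
  = \frac{1}{N_1} \sum_{i:\,T_i = 1}
      \Bigl( y_i - \frac{1}{|M_i|} \sum_{j \in M_i} y_j \Bigr)
  = \frac{\sum_{i:\,T_i = 1} \sum_{j \in M_i} \frac{1}{|M_i|}\,(y_i - y_j)}
         {\sum_{i:\,T_i = 1} \sum_{j \in M_i} \frac{1}{|M_i|}}
```

The denominator equals $N_1$ because the weights $1/|M_i|$ sum to one within each treated observation, which is consistent with r(sum_w) = 864 in the output above. The unweighted mean in section 6.5 instead gives a treated observation with many tied controls proportionally more weight, which is why it returns -238.8935 rather than -236.7848.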
frames simu : teffects psmatch (bweight) (mbsmoke mmarried c.mage##c.mage fbaby medu),nn(1) atet
Treatment-effects estimation                   Number of obs      =      4,642
Estimator : propensity-score matching Matches: requested = 1
Outcome model : matching min = 1
Treatment model: logit max = 74
----------------------------------------------------------------------------------------
| AI Robust
bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATET |
mbsmoke |
(smoker vs nonsmoker) | -236.7848 26.57789 -8.91 0.000 -288.8765 -184.693
----------------------------------------------------------------------------------------
frames simu : psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1)  ties
Logistic regression                             Number of obs     =      4,642
LR chi2(5) = 375.00
Prob > chi2 = 0.0000
Log likelihood = -2043.2504 Pseudo R2 = 0.0841

-------------------------------------------------------------------------------
mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mmarried | -1.145706 .0918962 -12.47 0.000 -1.32582 -.965593
mage | .321518 .0638472 5.04 0.000 .1963798 .4466563
|
c.mage#c.mage | -.0060368 .0011849 -5.09 0.000 -.0083592 -.0037144
|
fbaby | -.3864258 .0880445 -4.39 0.000 -.5589898 -.2138618
medu | -.1420833 .0173215 -8.20 0.000 -.1760328 -.1081338
_cons | -2.950915 .8102504 -3.64 0.000 -4.538976 -1.362853
-------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
bweight Unmatched | 3137.65972 3412.91159 -275.251871 21.4528037 -12.83
ATT | 3137.65972 3374.44447 -236.784751 26.0535546 -9.09
ATU | 3412.91159 3207.84728 -205.064318 . .
ATE | -210.968337 . .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

| psmatch2:
psmatch2: | Common
Treatment | support
assignment | On suppor | Total
-----------+-----------+----------
Untreated | 3,778 | 3,778
Treated | 864 | 864
-----------+-----------+----------
Total | 4,642 | 4,642

6.7 Compute the mean using only the first match

sum att if match_id == 1
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
att | 864 -248.515 817.1815 -3403 2750
return list
scalars:
r(sum) = -214717
r(max) = 2750
r(min) = -3403
r(sd) = 817.1815138978078
r(Var) = 667785.6266563131
r(mean) = -248.5150462962963
r(sum_w) = 864
r(N) = 864
frames simu : psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, out(bweight) ate logit n(1)
Logistic regression                             Number of obs     =      4,642
LR chi2(5) = 375.00
Prob > chi2 = 0.0000
Log likelihood = -2043.2504 Pseudo R2 = 0.0841

-------------------------------------------------------------------------------
mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mmarried | -1.145706 .0918962 -12.47 0.000 -1.32582 -.965593
mage | .321518 .0638472 5.04 0.000 .1963798 .4466563
|
c.mage#c.mage | -.0060368 .0011849 -5.09 0.000 -.0083592 -.0037144
|
fbaby | -.3864258 .0880445 -4.39 0.000 -.5589898 -.2138618
medu | -.1420833 .0173215 -8.20 0.000 -.1760328 -.1081338
_cons | -2.950915 .8102504 -3.64 0.000 -4.538976 -1.362853
-------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
bweight Unmatched | 3137.65972 3412.91159 -275.251871 21.4528037 -12.83
ATT | 3137.65972 3386.17477 -248.515046 54.7442913 -4.54
ATU | 3412.91159 3166.47327 -246.438327 . .
ATE | -246.82486 . .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

| psmatch2:
psmatch2: | Common
Treatment | support
assignment | On suppor | Total
-----------+-----------+----------
Untreated | 3,778 | 3,778
Treated | 864 | 864
-----------+-----------+----------
Total | 4,642 | 4,642
