Authors list

Danial Habibi
Mahdi Akbarzadeh
Farshad Teymoori
Sahand Tehrani Fateh
Sajedeh Masjoudi
Amir Hossein Saeidian
Farhad Hosseinpanah
Hakon Hakonarson
Fereidoun Azizi
Alireza Soleymani T
Mehdi Hedayati
Maryam Sadat Daneshpour
Marjan Mansourian

Introduction

Title: Investigating the causality between Serum 25 hydroxy vitamin D concentration on cardioembolic stroke
- Exposure:Serum 25 hydroxy vitamin D concentration, Reference paper: Revez Paper 2020 | GWAS ID:GCST90000616
- Sample size:417,580
- Outcome: cardioembolic stroke, Reference paper: Mishra Paper 2022 | GWAS ID: GCST90104541
- Sample size: 1,245,612 , Number of cases: 10,804 , Number of controls: 1,234,808
- Ancestry: European

Data Preparation

1- Number of total SNPs in exposure: 7,250,104 SNPs

2- Number of SNPs exposure with p-value < \(10^-8\) = 16,012 SNPs

3- Number of SNPs exposure after clumping = 150 SNPs

4- Number of total SNPs in outcome: 8,306,091 SNPs

5- Number of common variants between exposure and outcome: 112 SNPs

6- Number of SNPs after harmonization (action=2) = 110 SNPs

7- Number of SNPs after removing HLA region = 110 SNP

8- Number of SNPs after removing those that have MAF < 0.01 = 110 SNPs

Checking pleiotropy by Phenoscanner:

How many SNPs have been eliminated with checking the phenoscanner website: 0 SNPs

Checking weakness of the instruments:

data <- fread("data.txt")
data$F<-(data$beta.exposure/data$se.exposure)^2
summary(data$F)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   29.78   34.98   42.88  118.79   68.94 2567.54

How many SNPs have been eliminated with checking the weakness: 0 SNP

RUN an initial MR:

res<-mr(data)
res

##   id.exposure id.outcome outcome exposure                    method nsnp
## 1      w10boa     k0xTZ9 outcome exposure                  MR Egger  110
## 2      w10boa     k0xTZ9 outcome exposure           Weighted median  110
## 3      w10boa     k0xTZ9 outcome exposure Inverse variance weighted  110
## 4      w10boa     k0xTZ9 outcome exposure               Simple mode  110
## 5      w10boa     k0xTZ9 outcome exposure             Weighted mode  110
##            b         se       pval
## 1 -0.1582032 0.16218781 0.33152582
## 2 -0.1414437 0.15063818 0.34774977
## 3 -0.1221412 0.09678982 0.20697681
## 4 -0.5416048 0.30771357 0.08119709
## 5 -0.1780771 0.13741357 0.19774028

plot(data$beta.exposure,data$beta.outcome)
text(data$beta.exposure,                                
     data$beta.outcome,
     labels = data$SNP,
     pos = 4)

#scatter plot
p1 <- mr_scatter_plot(res, data)    
p1[[1]]

#Heterogeneity testing
mr_heterogeneity<- mr_heterogeneity(data); mr_heterogeneity

##   id.exposure id.outcome outcome exposure                    method        Q
## 1      w10boa     k0xTZ9 outcome exposure                  MR Egger 131.6582
## 2      w10boa     k0xTZ9 outcome exposure Inverse variance weighted 131.7523
##   Q_df     Q_pval
## 1  108 0.06058768
## 2  109 0.06816311

#pleiotropy testing
mr_pleiotropy_test<- mr_pleiotropy_test(data); mr_pleiotropy_test

##   id.exposure id.outcome outcome exposure egger_intercept         se      pval
## 1      w10boa     k0xTZ9 outcome exposure     0.001187683 0.00427599 0.7817304

#plot of single SNP MR:
res_single <- mr_singlesnp(data); p2 <- mr_forest_plot(res_single); p2[[1]]

#plot of LOO:
res_loo <- mr_leaveoneout(data); p3 <- mr_leaveoneout_plot(res_loo); p3[[1]]

#Funnel plot
p4 <- mr_funnel_plot(res_single); p4[[1]]

Testing Outlier with PRESSO test

## $`Main MR results`
##        Exposure       MR Analysis Causal Estimate         Sd    T-stat
## 1 beta.exposure               Raw      -0.1221412 0.09678982 -1.261922
## 2 beta.exposure Outlier-corrected              NA         NA        NA
##     P-value
## 1 0.2096704
## 2        NA
## 
## $`MR-PRESSO results`
## $`MR-PRESSO results`$`Global Test`
## $`MR-PRESSO results`$`Global Test`$RSSobs
## [1] 133.587
## 
## $`MR-PRESSO results`$`Global Test`$Pvalue
## [1] 0.077

##   id.exposure id.outcome outcome exposure                    method nsnp
## 1      w10boa     k0xTZ9 outcome exposure                  MR Egger  110
## 2      w10boa     k0xTZ9 outcome exposure           Weighted median  110
## 3      w10boa     k0xTZ9 outcome exposure Inverse variance weighted  110
## 4      w10boa     k0xTZ9 outcome exposure               Simple mode  110
## 5      w10boa     k0xTZ9 outcome exposure             Weighted mode  110
##            b         se       pval
## 1 -0.1582032 0.16218781 0.33152582
## 2 -0.1414437 0.13879759 0.30817241
## 3 -0.1221412 0.09678982 0.20697681
## 4 -0.5416048 0.30870173 0.08216196
## 5 -0.1780771 0.13890941 0.20257478

##   id.exposure id.outcome outcome exposure                    method        Q
## 1      w10boa     k0xTZ9 outcome exposure                  MR Egger 131.6582
## 2      w10boa     k0xTZ9 outcome exposure Inverse variance weighted 131.7523
##   Q_df     Q_pval
## 1  108 0.06058768
## 2  109 0.06816311

##   id.exposure id.outcome outcome exposure egger_intercept         se      pval
## 1      w10boa     k0xTZ9 outcome exposure     0.001187683 0.00427599 0.7817304

Radial test

## 
## Radial IVW
## 
##                    Estimate  Std.Error   t value  Pr(>|t|)
## Effect (Mod.2nd) -0.1221414 0.09678978 -1.261925 0.2069759
## Iterative        -0.1221414 0.09678978 -1.261925 0.2069759
## Exact (FE)       -0.1234126 0.08804344 -1.401724 0.1609976
## Exact (RE)       -0.1231890 0.08810024 -1.398282 0.1648669
## 
## 
## Residual standard error: 1.099 on 109 degrees of freedom
## 
## F-statistic: 1.59 on 1 and 109 DF, p-value: 0.21
## Q-Statistic for heterogeneity: 131.7325 on 109 DF , p-value: 0.06831866
## 
##  Outliers detected 
## Number of iterations = 2

##        SNP Q_statistic      p.value
## 1 rs532436    25.43846 4.567209e-07

Studentized residuals:

library(data.table)
data<-fread("data.txt")
reg_1<-lm(data$beta.outcome~data$beta.exposure-1)
data$st_1<-rstandard(reg_1)

#Histogram plot
hist(data$st_1)

How many SNPs have been eliminated with checking the weakness: 0 SNP

datacc_1 <- fread("datacc_1.txt")
res_cc_1<-mr(datacc_1)
res_cc_1

##   id.exposure id.outcome outcome exposure                    method nsnp
## 1      w10boa     k0xTZ9 outcome exposure                  MR Egger  103
## 2      w10boa     k0xTZ9 outcome exposure           Weighted median  103
## 3      w10boa     k0xTZ9 outcome exposure Inverse variance weighted  103
## 4      w10boa     k0xTZ9 outcome exposure               Simple mode  103
## 5      w10boa     k0xTZ9 outcome exposure             Weighted mode  103
##            b         se       pval
## 1 -0.1576632 0.14756508 0.28787278
## 2 -0.1448570 0.14499155 0.31775972
## 3 -0.1665251 0.08908217 0.06157511
## 4 -0.5255756 0.30254676 0.08537554
## 5 -0.1697537 0.13983600 0.22757085

mr_heterogeneity_datacc_1<- mr_heterogeneity(datacc_1); mr_heterogeneity_datacc_1

##   id.exposure id.outcome outcome exposure                    method        Q
## 1      w10boa     k0xTZ9 outcome exposure                  MR Egger 89.19564
## 2      w10boa     k0xTZ9 outcome exposure Inverse variance weighted 89.20132
##   Q_df    Q_pval
## 1  101 0.7932955
## 2  102 0.8130977

mr_pleiotropy_test_datacc_1<- mr_pleiotropy_test(datacc_1); mr_pleiotropy_test_datacc_1

##   id.exposure id.outcome outcome exposure egger_intercept         se     pval
## 1      w10boa     k0xTZ9 outcome exposure   -0.0002928949 0.00388821 0.940102

Cook distance

In statistics, Cook’s distance or Cook’s D is a commonly used estimate of the influence of a data point when performing a least-squares regression analysis.[1] In a practical ordinary least squares analysis, Cook’s distance can be used in several ways:

1- To indicate influential data points that are particularly worth checking for validity.

2- To indicate regions of the design space where it would be good to be able to obtain more data points.

It is named after the American statistician R. Dennis Cook, who introduced the concept in 1977.

Refernce

library(data.table)
library(TwoSampleMR)
datacc_1 <- fread("datacc_1.txt")

#Residuals vs fitted:
plot(lm(datacc_1$beta.outcome~datacc_1$beta.exposure-1), which=1)

#Q-Q Residuals:
plot(lm(datacc_1$beta.outcome~datacc_1$beta.exposure-1), which=2)

#Standardized Residuals:
plot(lm(datacc_1$beta.outcome~datacc_1$beta.exposure-1), which=3)

#Cook distance:
plot(lm(datacc_1$beta.outcome~datacc_1$beta.exposure-1), which=4)

MR analyses

dataRC <- fread("dataRC.txt")
res_RC<-mr(dataRC)
res_RC

##   id.exposure id.outcome outcome exposure                    method nsnp
## 1      w10boa     k0xTZ9 outcome exposure                  MR Egger   99
## 2      w10boa     k0xTZ9 outcome exposure           Weighted median   99
## 3      w10boa     k0xTZ9 outcome exposure Inverse variance weighted   99
## 4      w10boa     k0xTZ9 outcome exposure               Simple mode   99
## 5      w10boa     k0xTZ9 outcome exposure             Weighted mode   99
##            b         se       pval
## 1 -0.2334039 0.16146571 0.15153152
## 2 -0.1765377 0.14937491 0.23726791
## 3 -0.1989857 0.09549837 0.03719151
## 4 -0.5271198 0.32539786 0.10846221
## 5 -0.2393457 0.14040596 0.09142457

mr_heterogeneity_dataRC<- mr_heterogeneity(dataRC); mr_heterogeneity_dataRC

##   id.exposure id.outcome outcome exposure                    method        Q
## 1      w10boa     k0xTZ9 outcome exposure                  MR Egger 78.64052
## 2      w10boa     k0xTZ9 outcome exposure Inverse variance weighted 78.71040
##   Q_df    Q_pval
## 1   97 0.9135906
## 2   98 0.9239006

mr_pleiotropy_test_dataRC<- mr_pleiotropy_test(dataRC); mr_pleiotropy_test_dataRC

##   id.exposure id.outcome outcome exposure egger_intercept          se      pval
## 1      w10boa     k0xTZ9 outcome exposure     0.001064541 0.004026936 0.7920672

Sensitivity analyses

## 
## Inverse-variance weighted method
## (variants uncorrelated, random-effect model)
## 
## Number of Variants : 99 
## 
## ------------------------------------------------------------------
##  Method Estimate Std Error  95% CI        p-value
##     IVW   -0.199     0.095 -0.386, -0.012   0.037
## ------------------------------------------------------------------
## Residual standard error =  0.896 
## Residual standard error is set to 1 in calculation of confidence interval when its estimate is less than 1.
## Heterogeneity test statistic (Cochran's Q) = 78.7104 on 98 degrees of freedom, (p-value = 0.9239). I^2 = 0.0%.

##                     Method Estimate Std Error 95% CI         P-value
##              Simple median   -0.291     0.166  -0.616  0.034   0.079
##            Weighted median   -0.177     0.149  -0.468  0.114   0.234
##  Penalized weighted median   -0.179     0.148  -0.469  0.112   0.228
##                                                                     
##                        IVW   -0.199     0.095  -0.386 -0.012   0.037
##              Penalized IVW   -0.199     0.095  -0.386 -0.012   0.037
##                 Robust IVW   -0.218     0.088  -0.390 -0.046   0.013
##       Penalized robust IVW   -0.218     0.088  -0.390 -0.046   0.013
##                                                                     
##                   MR-Egger   -0.233     0.161  -0.550  0.083   0.148
##                (intercept)    0.001     0.004  -0.007  0.009   0.792
##         Penalized MR-Egger   -0.233     0.161  -0.550  0.083   0.148
##                (intercept)    0.001     0.004  -0.007  0.009   0.792
##            Robust MR-Egger   -0.216     0.130  -0.470  0.039   0.097
##                (intercept)    0.000     0.004  -0.008  0.008   0.988
##  Penalized robust MR-Egger   -0.216     0.130  -0.470  0.039   0.097
##                (intercept)    0.000     0.004  -0.008  0.008   0.988

## $r2_exp
## [1] 0
## 
## $r2_out
## [1] 0.25
## 
## $r2_exp_adj
## [1] 0
## 
## $r2_out_adj
## [1] 0.25
## 
## $correct_causal_direction
## [1] FALSE
## 
## $steiger_test
## [1] NA
## 
## $correct_causal_direction_adj
## [1] FALSE
## 
## $steiger_test_adj
## [1] NA
## 
## $vz
## [1] NaN
## 
## $vz0
## [1] 0
## 
## $vz1
## [1] NaN
## 
## $sensitivity_ratio
## [1] NaN
## 
## $sensitivity_plot

Working with MRraps

## $beta.hat
## [1] -0.200448
## 
## $beta.se
## [1] 0.09647138
## 
## $beta.p.value
## [1] 0.03772802
## 
## $naive.se
## [1] 0.09603428
## 
## $chi.sq.test
## [1] 78.67845

##   over.dispersion loss.function   beta.hat    beta.se
## 1           FALSE            l2 -0.2004480 0.09647138
## 2           FALSE         huber -0.2159041 0.09898388
## 3           FALSE         tukey -0.2142594 0.09898336
## 4            TRUE            l2 -0.2004483 0.09647265
## 5            TRUE         huber -0.2159041 0.09898433
## 6            TRUE         tukey -0.2142595 0.09898382

## 
## MR-Lasso method 
## 
## Number of variants : 99 
## Number of valid instruments : 99 
## Tuning parameter : 0.2464102 
## ------------------------------------------------------------------
##  Exposure Estimate Std Error  95% CI        p-value
##  exposure   -0.199     0.095 -0.386, -0.012   0.037
## ------------------------------------------------------------------

## 
## Constrained maximum likelihood method (MRcML) 
## Number of Variants:  99 
## Results for:  cML-MA-BIC 
## ------------------------------------------------------------------
##      Method Estimate    SE Pvalue          95% CI
##  cML-MA-BIC   -0.201 0.096  0.036 [-0.389,-0.013]
## ------------------------------------------------------------------

## 
## Debiased inverse-variance weighted method
## (Over.dispersion:TRUE)
## 
## Number of Variants : 99 
## ------------------------------------------------------------------
##  Method Estimate Std Error  95% CI        p-value Condition
##    dIVW   -0.201     0.096 -0.390, -0.012   0.037  1098.764
## ------------------------------------------------------------------

## 
## Mode-based method of Hartwig et al
## (weighted, delta standard errors [not assuming NOME], bandwidth factor = 1)
## 
## Number of Variants : 99 
## ------------------------------------------------------------------
##  Method Estimate Std Error  95% CI       p-value
##     MBE   -0.239     0.163 -0.560, 0.081   0.143
## ------------------------------------------------------------------

Supplementary Material of the paper entitled:

Causal effect of serum 25 hydroxyvitamin D concentration on cardioembolic stroke: evidence from Mendelian randomization

Created by Mahdi Akbarzadeh (Email: akbarzadeh.ms@gmail.com), and Danial Habibi

2023-09-23