EWAS

GLINT can perform an epigenome-wide association study (EWAS) under several different models, as described below. All of the models use the general arguments described under “General arguments”.

Note

The example commands below assume that the user generated GLINT files with a covariates file and a phenotypes file.



General arguments

--ewas

Runs an association test on each site in the data and outputs a results file.

Note

If --ewas is used without additional arguments, GLINT uses linear regression by default.

Note

GLINT can produce plots based on the results file. For more details, see the --plot argument.

Note

Use --out to change the default output name.

--pheno

Selects a phenotype to use in the association test.

For example:

python glint.py --datafile datafile.glint --ewas --linreg --pheno y1

will run EWAS using a linear regression model with the phenotype y1. The names of the phenotypes are defined by the headers in the datafile.samples.txt file associated with datafile.glint. For more details see GLINT files.

Note

Use the --phenofile argument to provide phenotypes that were not included in the datafile.glint file, or when a textual version of the data is used rather than a .glint file.

--covar

Selects covariates to use in the association test.

For example:

python glint.py --datafile datafile.glint --ewas --linreg --covar c1 c2 c3

will run EWAS using a linear regression model with the covariates c1, c2 and c3. The names of the covariates are defined by the headers in the datafile.samples.txt file associated with datafile.glint. For more details see GLINT files.

Alternatively, run:

python glint.py --datafile datafile.glint --ewas --linreg --covar

without specifying covariate names in order to include in the model all of the covariates in the GLINT file.

Note

Use the --covarfile argument to provide covariates that were not included in the datafile.glint file, or when a textual version of the data is used rather than a .glint file.

--stdth

Excludes sites with standard deviation lower than a specified value from the EWAS analysis. This argument can be used to reduce the number of hypotheses tested by excluding near-constant sites, which are not expected to be associated with phenotypes.

For example:

python glint.py --datafile datafile.glint --ewas --linreg --pheno y1 --stdth 0.01

will consider only sites with standard deviation greater than 0.01 in the EWAS analysis.
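The filtering idea can be illustrated with a minimal Python sketch. It is not GLINT's code: the function name, the toy site IDs and values, and the use of the population standard deviation are all assumptions made for illustration.

```python
import statistics


def filter_by_std(sites, threshold):
    """Keep only sites whose standard deviation across samples exceeds threshold.

    sites: dict mapping a site ID to its list of methylation levels (one per sample).
    """
    # Assumption: population standard deviation; the estimator GLINT uses
    # is not stated in this section.
    return {
        site_id: values
        for site_id, values in sites.items()
        if statistics.pstdev(values) > threshold
    }


# Toy data (hypothetical site IDs and values).
sites = {
    "cg_variable": [0.10, 0.45, 0.80, 0.30],
    "cg_constant": [0.500, 0.501, 0.499, 0.500],
}
kept = filter_by_std(sites, threshold=0.01)
print(sorted(kept))
```

Here the near-constant site is dropped, so one fewer hypothesis would be tested.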



Linear regression

--linreg

Performs EWAS on the data using a linear regression model. This is the default model for --ewas.

The output file, titled results.glint.linreg.txt, includes a list of the sites sorted by their association p-value. The output file includes the following columns:

  • ID (CpG identifier)
  • chromosome (chromosome number of the site)
  • MAPINFO (position of the site in the genome)
  • p-value
  • q-value
  • intercept
  • V1 (coefficient of the first covariate), …, Vn (coefficient of the last covariate)
  • beta (the coefficient of the site under test)
  • statistic (the test statistic)
  • UCSC_RefGene_Name (name of the gene that is closest to this site)
  • Relation_to_UCSC_CpG_Island (category)

For example:

python glint.py --datafile datafile.glint --ewas --linreg --pheno y1

will run EWAS using a linear regression model.
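To make the per-site test concrete, here is a simplified Python sketch of a site-by-site linear regression. It is an illustration, not GLINT's implementation: it assumes the phenotype is regressed on the methylation level of a single site, it omits covariates, and it reports only the intercept, beta and t statistic. All names and toy values are hypothetical.

```python
import math


def simple_linreg(x, y):
    """Ordinary least squares fit of y = intercept + beta * x.

    Returns (intercept, beta, t_statistic), where t tests beta == 0.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, y))
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    return intercept, beta, beta / se_beta


# Toy data: methylation levels per site (hypothetical) and one phenotype.
sites = {
    "cg_a": [0.1, 0.2, 0.3, 0.4, 0.5],
    "cg_b": [0.3, 0.1, 0.5, 0.2, 0.4],
}
phenotype = [1.1, 1.9, 3.2, 3.9, 5.1]

results = {site_id: simple_linreg(x, phenotype) for site_id, x in sites.items()}
# With the same degrees of freedom at every site, sorting by |t| descending
# gives the same order as sorting by p-value ascending.
ranked = sorted(results, key=lambda s: abs(results[s][2]), reverse=True)
print(ranked)
```

A full model would add the covariates as extra predictors, which is where the V1, …, Vn coefficients in the results file come from.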



Logistic regression

--logreg

Performs EWAS on the data using a logistic regression model. This option requires a binary phenotype (controls are assumed to be coded as ‘0’ and cases as ‘1’).

The output file, titled results.glint.logreg.txt, includes a list of the sites sorted by their association p-value. The output file format is identical to the one described under --linreg.

For example:

python glint.py --datafile datafile.glint --ewas --logreg --pheno y1

will run EWAS using a logistic regression model.



Wilcoxon rank-sum test

--wilc

Performs EWAS on the data using the non-parametric Wilcoxon rank-sum test. This option requires a binary phenotype (controls are assumed to be coded as ‘0’ and cases as ‘1’).

The output file, titled results.glint.wilc.txt, includes a list of the sites sorted by their association p-value. The output file includes the following columns:

  • ID (CpG identifier)
  • chromosome (chromosome number of the site)
  • MAPINFO (position of the site in the genome)
  • p-value
  • q-value
  • statistic (the test statistic)
  • UCSC_RefGene_Name (name of the gene that is closest to this site)
  • Relation_to_UCSC_CpG_Island (category)

For example:

python glint.py --datafile datafile.glint --ewas --wilc --pheno y1

will run EWAS using the Wilcoxon rank-sum test.
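For readers unfamiliar with the test, the following Python sketch shows a rank-sum computation for a single site with a 0/1 phenotype. It is a textbook normal approximation without tie or continuity corrections, not necessarily the computation GLINT performs. The function name and toy values are hypothetical.

```python
import math


def rank_sum_test(values, labels):
    """Wilcoxon rank-sum test via the normal approximation.

    values: methylation levels at one site; labels: 0 for controls, 1 for cases.
    Returns (z_statistic, two_sided_p_value).
    """
    # Rank all observations together, giving tied values their average rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    rank_sum_cases = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    expected = n_cases * (n_cases + n_controls + 1) / 2
    variance = n_cases * n_controls * (n_cases + n_controls + 1) / 12
    z = (rank_sum_cases - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Toy site: cases (label 1) have higher methylation than controls (label 0).
values = [0.20, 0.30, 0.25, 0.80, 0.90, 0.85]
labels = [0, 0, 0, 1, 1, 1]
z, p = rank_sum_test(values, labels)
print(round(z, 3), round(p, 3))
```

Because the test uses only ranks, it makes no assumption about the distribution of methylation levels within each group.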



Linear mixed model (LMM)

--lmm

Performs EWAS on the data using a linear mixed model (LMM). This is an implementation of the FaST-LMM algorithm by Lippert et al. [1].

The output file, named results.glint.lmm.txt, includes a list of the sites sorted by their association p-value. The output file includes the following columns:

  • ID (CpG identifier)
  • chromosome (chromosome number of the site)
  • MAPINFO (position of the site in the genome)
  • p-value
  • q-value
  • intercept
  • V1 (coefficient of the first covariate), …, Vn (coefficient of the last covariate)
  • beta (the coefficient of the site under test)
  • statistic (the test statistic)
  • sigma-e (an estimate of sigma_e)
  • sigma-g (an estimate of sigma_g)
  • UCSC_RefGene_Name (name of the gene that is closest to this site)
  • Relation_to_UCSC_CpG_Island (category)

--kinship

Specifies the kinship matrix, which models the inter-individual similarity in the data and is required for the LMM. GLINT allows two options:

  • User-supplied kinship - users can supply a text file with a samples-by-samples kinship matrix (tab-delimited, with no row or column headers).
  • refactor - the ReFACTor algorithm can be used for constructing the kinship matrix. If this option is used, ReFACTor is executed to select the top informative sites in the data. The kinship matrix is then constructed by calculating the empirical covariance matrix of the samples based on the selected sites.

For example:

python glint.py --datafile datafile.glint --ewas --lmm --pheno y1 --kinship kinship.txt

will run EWAS using LMM with the kinship matrix specified in the kinship.txt file. Alternatively:

python glint.py --datafile datafile.glint --ewas --lmm --pheno y1 --kinship refactor --k 6

will use the ReFACTor algorithm for constructing the kinship matrix (where 6 is the number of assumed cell types; see the --k argument for more details).

Note

If the refactor option is used then all of the arguments available with the --refactor argument are also available here.
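As an illustration of the kinship options described above, the Python sketch below computes an empirical covariance matrix between samples from a small set of sites and renders it in the tab-delimited, header-free layout expected for a user-supplied kinship file. It is a sketch under assumptions, not GLINT's code: the ReFACTor site-selection step is not shown, the centering convention (each sample centered by its mean over the selected sites) is an assumption, and the function names and toy values are hypothetical.

```python
def sample_covariance(data):
    """Empirical covariance between samples.

    data: list of rows, one per selected site, each holding one value per sample.
    Returns a samples-by-samples matrix as a list of lists.
    """
    n_sites = len(data)
    n_samples = len(data[0])
    # Assumption: each sample is centered by its own mean over the selected sites.
    means = [sum(row[j] for row in data) / n_sites for j in range(n_samples)]
    centered = [[row[j] - means[j] for j in range(n_samples)] for row in data]
    return [
        [
            sum(row[a] * row[b] for row in centered) / (n_sites - 1)
            for b in range(n_samples)
        ]
        for a in range(n_samples)
    ]


def format_kinship(matrix):
    """Render the matrix as tab-delimited text with no row or column headers."""
    return "\n".join("\t".join(f"{v:.6f}" for v in row) for row in matrix)


# Toy data: 4 selected sites (rows) by 3 samples (columns), hypothetical values.
data = [
    [0.1, 0.2, 0.9],
    [0.3, 0.4, 0.7],
    [0.5, 0.6, 0.5],
    [0.7, 0.8, 0.3],
]
kinship = sample_covariance(data)
text = format_kinship(kinship)
print(text)
```

Saving the printed text to a file such as kinship.txt would give a matrix in the layout that the user-supplied option expects.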

--ml

Indicates whether restricted maximum likelihood estimation (REML) or maximum likelihood estimation (ML) should be used. If this flag is supplied then ML is used; otherwise REML is used (default).

For example:

python glint.py --datafile datafile.glint --ewas --lmm --pheno y1 --kinship kinship.txt --ml

will perform EWAS on the data using LMM with ML estimation.

--norm

Normalizes the covariates (if supplied) before fitting the LMM.

For example:

python glint.py --datafile datafile.glint --ewas --lmm --pheno y1 --covar c1 c2 c3 --norm

will perform EWAS on the data using LMM after normalizing the covariates c1, c2 and c3.

--oneld

Fits the log delta parameter in the FaST-LMM model only once, under the null model (instead of for each site separately).

For example:

python glint.py --datafile datafile.glint --ewas --lmm --pheno y1 --oneld

will perform EWAS on the data using LMM with a single value of log delta.

[1] Lippert, Christoph, Jennifer Listgarten, Ying Liu, Carl M. Kadie, Robert I. Davidson, and David Heckerman. “FaST linear mixed models for genome-wide association studies.” Nature Methods 8, no. 10 (2011): 833-835.