Bias-measures Monte Carlo-Wilcoxon (bMCW) Test

The bMCW test is a statistical tool to assess whether a set of bias measures for a quantitative trait between two conditions, or a subset of those bias measures, is itself significantly biased in the same direction. For instance, bMCW tests can be used to analyze bias indexes obtained from other MCW tests, or fold changes in transcript abundance, either across the entire transcriptome or only for genes located in specific genomic regions, in two sets of mice exposed to different conditions.

Usage

bMCWtest(path, max_rearrangements)

Format

When executing the bMCWtest function, users must provide the path to a local CSV file named X_bMCWtest_data.csv, where X serves as a user-defined identifier. X_bMCWtest_data.csv should include the following columns:

Column bias_value contains the value of the bias measure under analysis.
Columns subset_x, where x identifies the type of subset in each column, such as "chr" for chromosomes or "GO" for Gene Ontology terms. These columns are required if users intend to assess whether bias measures for certain subsets of elements in the dataset are significantly biased in the same direction. Columns subset_x can indicate whether an element belongs to a subset using either "YES" and "NO", or specific subset names like "chr1" or "chrX", or a combination of both, such as "chr1", "chrX" and "NO". The bMCWtest function transforms the dataset to analyze independently each subset of elements marked as "YES" or with a specific subset name in each subset_x column.
As many informative columns as needed by users to contextualize the results of each test. The names of these columns should not contain the terms bias_value or subset. While these columns are optional when running a single test, at least one column is required when running multiple tests simultaneously. All rows for each individual test must contain the same information in these columns.
Users can add columns with relevant information about each element or row using the column name structure element_x, where x identifies the information in each column (see example). However, element_x columns are not essential for bMCW testing and will not be included in the results file.
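As an illustration of the layout described above, the following sketch builds a small entry dataset and saves it under a file name of the form X_bMCWtest_data.csv. All values, the identifier "demo", and the column names contrast, element_gene, subset_chr and subset_GO are made up for this example; only bias_value and the subset_x and element_x naming patterns come from the description above.

```r
# Hypothetical entry dataset; every value below is invented for illustration.
example_data <- data.frame(
  contrast     = "I",                   # informative column, identical in all rows of a test
  bias_measure = "uMCW_BI",             # informative column
  element_gene = paste0("gene_", 1:6),  # optional element_x column, not carried into results
  bias_value   = c(0.42, -0.10, 0.75, NA, -0.33, 0.05),
  subset_chr   = c("chr1", "chr1", "chrX", "chrX", "NO", "NO"),  # named subsets plus "NO"
  subset_GO    = c("YES", "NO", "YES", "NO", "NO", "YES"),       # YES/NO membership
  stringsAsFactors = FALSE
)

# "demo" plays the role of the user-defined identifier X.
csv_path <- file.path(tempdir(), "demo_bMCWtest_data.csv")
write.csv(example_data, csv_path, row.names = FALSE)
```

The resulting csv_path is the kind of value that would be passed as the path argument.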

Arguments

path: Path for the local CSV file containing the entry dataset formatted for bMCW tests.
max_rearrangements: User-defined maximum number of rearrangements of the dataset used by the function bMCWtest to generate a collection of expected-by-chance bMCW_wBIs and bMCW_sBIs and estimate the statistical significance of observed bMCW_BIs. If the number of distinct dataset rearrangements is less than max_rearrangements, bMCWtest calculates bMCW_wBIs and bMCW_sBIs for all possible data rearrangements. If the number of distinct dataset rearrangements is greater than max_rearrangements, bMCWtest will perform N = max_rearrangements random measure rearrangements to calculate the collection of expected-by-chance bMCW_wBIs and bMCW_sBIs.
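The effect of max_rearrangements can be sketched with a small helper. This is only an illustration under stated assumptions: choose_test_type is a hypothetical function, not part of MCWtests; the number of distinct rearrangements is assumed to be 2^N for whole-set tests (every possible sign assignment) and choose(N, n) for subset tests; and the boundary case where both numbers are equal, which the description above does not cover, is treated here as approximated.

```r
# Hypothetical helper, not part of MCWtests.
choose_test_type <- function(n_distinct, max_rearrangements) {
  # Exact testing only when every distinct rearrangement fits under the cap.
  if (n_distinct < max_rearrangements) "exact" else "approximated"
}

N <- 10  # elements in the whole set
n <- 3   # elements in the subset under analysis

wholeset_count <- 2^N          # assumed count of sign rearrangements: 1024
subset_count   <- choose(N, n) # assumed count of subset rearrangements: 120

choose_test_type(wholeset_count, 10)    # "approximated"
choose_test_type(subset_count, 1000)    # "exact"
```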

Value

The bMCWtest function reports to the console the total number of tests it will execute, broken down into exact and approximated tests for whole-set and subset analyses. It also creates a CSV file named X_bMCWtest_results.csv, where X is the user-defined identifier of the entry dataset CSV file. The X_bMCWtest_results.csv file contains one row for each whole-set bMCW test and as many rows as necessary for the results of subset bMCW tests. Rows for whole-set analyses appear at the top of the file. The X_bMCWtest_results.csv file includes the following columns:

User-provided informative columns to contextualize the results of each test.
Column subset_type indicates whether the results in each row correspond to whole-set tests or specific subset tests, such as "chr" for chromosomes or "GO" for Gene Ontology terms.
Column tested_subset indicates the name of the subset under analysis. For whole-set tests, the tested_subset column indicates "none". For subset tests, the tested_subset column indicates "YES" or the specific name of the subset under analysis, such as "chr1" or "chrX".
Columns N and n indicate the total number of elements in the whole set and those associated with the subset under analysis, respectively, after removing missing values (NAs). For whole-set tests, columns N and n have the same value.
Column test_type distinguishes between exact and approximated tests.
Column BI_type indicates whether results correspond to whole-set tests (bMCW_wBI) or to subset tests (bMCW_sBI).
Column observed_BI contains the value of bMCW_BIs obtained from analyzing the user-provided dataset.
Column expected_by_chance_BI_N indicates the number of data rearrangements used to calculate the expected-by-chance bMCW_wBIs and bMCW_sBIs. This value is the lower of the number of all possible measure rearrangements and the parameter max_rearrangements.
Columns pupper and plower represent the P~upper~ and P~lower~ values, respectively. They denote the fraction of expected-by-chance bMCW_wBIs or bMCW_sBIs with values greater than or equal to, and less than or equal to, the observed bMCW_wBIs or bMCW_sBIs, respectively.

Details

The function bMCWtest eliminates missing values (NAs) from the dataset before proceeding with the following steps.

To estimate the bias for all bias measures in the entire dataset or a subset of them, the function bMCWtest performs the following tasks:
- It ranks all bias measures with non-zero values from lowest to highest absolute value. Bias measures with a value of 0 are assigned a rank of 0. If multiple bias measures have the same absolute value, all tied bias measures are assigned the lowest rank possible.
- It assigns each rank a sign based on the sign of its corresponding bias measure.
- It calculates a whole-set bias index (bMCW_wBI) by summing the signed ranks for all elements in the dataset and dividing it by the maximum number that sum could have if all bias measures were positive. Consequently, bMCW_wBI ranges between 1 when all bias measures are positive, and -1 when all bias measures are negative.
- It calculates a subset bias index (bMCW_sBI) for each subset of elements under analysis by summing the signed ranks for the elements in the subset and dividing it by the maximum number that sum could have if the elements in the subset had the highest possible positive bias measures. Consequently, bMCW_sBI ranges between 1 when the bias measures for the subset in question have the highest positive bias measures in the entire dataset, and -1 when the bias measures for the subset in question have the lowest negative bias measures in the entire dataset.
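The steps above can be sketched in a few lines of R. This is a minimal illustration, not the MCWtests implementation: the helper names signed_ranks, bmcw_wBI and bmcw_sBI are hypothetical, the input values are invented, and the sketch assumes that ranking is done on absolute values and that ranks of non-zero measures start at 1.

```r
# Hypothetical helpers; a sketch of the calculation, not the package's code.
signed_ranks <- function(bias_value) {
  ranks <- numeric(length(bias_value))   # zeros keep a rank of 0
  nonzero <- bias_value != 0
  # Rank non-zero measures by absolute value; ties share the lowest rank.
  ranks[nonzero] <- rank(abs(bias_value[nonzero]), ties.method = "min")
  sign(bias_value) * ranks               # attach the sign of each measure
}

bmcw_wBI <- function(sr) {
  # Denominator: the sum obtained if all bias measures were positive.
  sum(sr) / sum(abs(sr))
}

bmcw_sBI <- function(sr, in_subset) {
  n <- sum(in_subset)
  # Denominator: the sum of the n highest ranks in the whole dataset.
  max_sum <- sum(sort(abs(sr), decreasing = TRUE)[seq_len(n)])
  sum(sr[in_subset]) / max_sum
}

bias_value <- c(0.5, -0.2, 0.9, -0.1, 0.3)       # invented, NA-free measures
in_subset  <- c(TRUE, FALSE, TRUE, FALSE, FALSE) # invented subset membership

sr  <- signed_ranks(bias_value)   # 4 -2 5 -1 3
wBI <- bmcw_wBI(sr)               # 9 / 15 = 0.6
sBI <- bmcw_sBI(sr, in_subset)    # (4 + 5) / (5 + 4) = 1
```

In this toy case the subset holds the two highest positive measures in the dataset, so its subset bias index reaches the upper bound of 1.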

To assess the significance of the bMCW_wBIs and bMCW_sBIs obtained from the user-provided dataset (observed bMCW_wBIs and bMCW_sBIs), the function bMCWtest performs the following tasks:
- It generates a collection of expected-by-chance bMCW_wBIs by rearranging the signs of all signed ranks multiple times. The function bMCWtest also generates a collection of expected-by-chance bMCW_sBIs by rearranging the subset of elements multiple times. The user-provided parameter max_rearrangements determines which of two paths the function bMCWtest follows to generate the collection of expected-by-chance bMCW_wBIs and bMCW_sBIs:
  - bMCW exact testing: If the number of distinct bias measure rearrangements that can alter their initial sign distribution or subset distribution is less than max_rearrangements, the function bMCWtest calculates bMCW_wBIs or bMCW_sBIs for all possible data rearrangements.
  - bMCW approximated testing: If the number of distinct bias measure rearrangements that can alter their initial sign distribution or subset distribution is greater than max_rearrangements, the function bMCWtest performs N = max_rearrangements random measure rearrangements to calculate the collection of expected-by-chance bMCW_wBIs or bMCW_sBIs.
- It calculates P~upper~ and P~lower~ values as the fraction of expected-by-chance bMCW_wBIs and bMCW_sBIs that are greater than or equal to, and less than or equal to, the observed bMCW_wBIs and bMCW_sBIs, respectively.
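The approximated path described above can be illustrated with a short Monte Carlo sketch. It is not the package's code: the signed ranks and subset membership are invented, the variable names are hypothetical, and random sign flips and random shuffling of subset membership are assumed to be reasonable stand-ins for the rearrangements that bMCWtest performs.

```r
set.seed(1)  # only to make this sketch reproducible

sr        <- c(4, -2, 5, -1, 3)                  # invented signed ranks
in_subset <- c(TRUE, FALSE, TRUE, FALSE, FALSE)  # invented subset membership
max_rearrangements <- 1000

# Observed indexes.
max_whole    <- sum(abs(sr))
max_subset   <- sum(sort(abs(sr), decreasing = TRUE)[seq_len(sum(in_subset))])
observed_wBI <- sum(sr) / max_whole
observed_sBI <- sum(sr[in_subset]) / max_subset

# Expected-by-chance whole-set indexes: rearrange the signs of the ranks.
expected_wBI <- replicate(max_rearrangements, {
  random_signs <- sample(c(-1, 1), length(sr), replace = TRUE)
  sum(random_signs * abs(sr)) / max_whole
})

# Expected-by-chance subset indexes: rearrange which elements form the subset.
expected_sBI <- replicate(max_rearrangements, {
  shuffled <- sample(in_subset)
  sum(sr[shuffled]) / max_subset
})

# Fractions of expected-by-chance values at or beyond the observed value.
pupper_w <- mean(expected_wBI >= observed_wBI)
plower_w <- mean(expected_wBI <= observed_wBI)
pupper_s <- mean(expected_sBI >= observed_sBI)
plower_s <- mean(expected_sBI <= observed_sBI)
```

Because expected values equal to the observed one count toward both fractions, the upper and lower values of a test can add up to more than 1.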

Examples

# copying the example dataset shipped with MCWtests to a temporary directory
test_temp <- tempdir()
extdata <- system.file("extdata", "example_bMCWtest_data.csv", package = "MCWtests")
file.copy(extdata, test_temp)
#> [1] TRUE
# running bMCWtest with an ideal entry dataset
path <- file.path(test_temp, "example_bMCWtest_data.csv")
bMCWtest_results <- bMCWtest(path, 10)
#> total number of tests: 15
#> number of wholeset exact tests: 0
#> number of subset exact tests: 0
#> number of wholeset approximated tests: 3
#> number of subset approximated tests: 12
#> running approximated tests: 
print(bMCWtest_results)
#>     contrast contrast_trait bias_measure condition_contrast subset_type
#>       <char>         <char>       <char>             <char>      <fctr>
#>  1:        I        trait_a      uMCW_BI          AAAA-BBBB    wholeset
#>  2:       II        trait_b      uMCW_BI          AAAA-BBBB    wholeset
#>  3:      III        trait_c      uMCW_BI          AAAA-BBBB    wholeset
#>  4:        I        trait_a      uMCW_BI          AAAA-BBBB  subset_chr
#>  5:        I        trait_a      uMCW_BI          AAAA-BBBB  subset_chr
#>  6:        I        trait_a      uMCW_BI          AAAA-BBBB  subset_chr
#>  7:        I        trait_a      uMCW_BI          AAAA-BBBB  subset_chr
#>  8:       II        trait_b      uMCW_BI          AAAA-BBBB  subset_chr
#>  9:       II        trait_b      uMCW_BI          AAAA-BBBB  subset_chr
#> 10:       II        trait_b      uMCW_BI          AAAA-BBBB  subset_chr
#> 11:       II        trait_b      uMCW_BI          AAAA-BBBB  subset_chr
#> 12:      III        trait_c      uMCW_BI          AAAA-BBBB  subset_chr
#> 13:      III        trait_c      uMCW_BI          AAAA-BBBB  subset_chr
#> 14:      III        trait_c      uMCW_BI          AAAA-BBBB  subset_chr
#> 15:      III        trait_c      uMCW_BI          AAAA-BBBB  subset_chr
#>     tested_subset     N     n    test_type  BI_type observed_BI
#>            <char> <int> <int>       <char>   <char>       <num>
#>  1:          none    10    10 approximated bMCW_wBI  -0.1636364
#>  2:          none    10    10 approximated bMCW_wBI   0.8545455
#>  3:          none    10    10 approximated bMCW_wBI   0.2363636
#>  4:          chr1    10     3 approximated bMCW_sBI   0.4444444
#>  5:          chr2    10     3 approximated bMCW_sBI  -0.2592593
#>  6:          chrX    10     3 approximated bMCW_sBI  -0.5555556
#>  7:          chrY    10     1 approximated bMCW_sBI   0.1000000
#>  8:          chr1    10     3 approximated bMCW_sBI   0.4444444
#>  9:          chr2    10     3 approximated bMCW_sBI   0.7777778
#> 10:          chrX    10     3 approximated bMCW_sBI   0.6296296
#> 11:          chrY    10     1 approximated bMCW_sBI  -0.3000000
#> 12:          chr1    10     3 approximated bMCW_sBI   1.0000000
#> 13:          chr2    10     3 approximated bMCW_sBI   0.1481481
#> 14:          chrX    10     3 approximated bMCW_sBI  -0.4814815
#> 15:          chrY    10     1 approximated bMCW_sBI  -0.5000000
#>     expected_by_chance_BI_N pupper plower
#>                       <int>  <num>  <num>
#>  1:                      10    0.4    0.6
#>  2:                      10    0.0    1.0
#>  3:                      10    0.0    1.0
#>  4:                      10    0.1    0.9
#>  5:                      10    0.4    0.6
#>  6:                      10    1.0    0.0
#>  7:                      10    0.4    0.6
#>  8:                      10    0.4    0.6
#>  9:                      10    0.3    0.8
#> 10:                      10    0.5    0.5
#> 11:                      10    1.0    0.0
#> 12:                      10    0.0    1.0
#> 13:                      10    0.2    0.8
#> 14:                      10    1.0    0.0
#> 15:                      10    0.9    0.1
rm(test_temp)