Matched-Measures Univariate Monte Carlo-Wilcoxon (muMCW) Test

The muMCW test is a statistical tool to assess whether a set of inherently matched-pair measures is significantly biased in one direction. For instance, muMCW tests can be used to analyze body weights or transcript abundances measured at two different timepoints for the same set of mice.

Usage

muMCWtest(path, max_rearrangements)

Format

When executing the muMCWtest function, users must provide the path to a local CSV file named X_muMCWtest_data.csv, where X serves as a user-defined identifier. X_muMCWtest_data.csv can be structured in two distinct formats:

Vertical layout: This format allows appending datasets with different structures, such as a different number of matched pairs of measures for each appended test. Vertical entry datasets should include the following columns:
- Columns condition_a and condition_b uniquely identify the two conditions under which matched-paired measures were collected.
- Columns value_a and value_b contain the actual measures under analysis.
- As many informative columns as users need to contextualize the results of each test. The names of these columns should not contain the terms condition or value. While these columns are optional when running a single test, at least one column is required when running multiple tests simultaneously. All rows for each individual test must contain the same information in these columns.

Horizontal layout: This format allows appending datasets with similar structures, such as the same number of matched-paired measures for each appended test. Horizontal entry datasets should include the following columns:
- Columns condition_a and condition_b uniquely identify the two conditions under which matched-paired measures were collected.
- Columns a.i and b.i, where i is an integer that identifies each matched pair of measures, contain the actual measures under analysis.
- As many informative columns as needed by users to contextualize the results of each test. The names of these columns should not contain the term condition or have the same structure as the a.i and b.i columns. While these columns are optional when running a single test, at least one column is required when running multiple tests simultaneously.
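
The two layouts can be illustrated with a small sketch in base R. It is not taken from the package: the values, the informative column name contrast, the condition labels and the identifiers toyv and toyh are all made up for illustration, and the vertical example assumes that each row holds one matched pair.

```r
# Hypothetical toy data: one test with four matched pairs (all values are made up).
# Vertical layout, assuming one row per matched pair.
vertical <- data.frame(
  contrast    = "I",        # informative column (hypothetical name)
  condition_a = "before",   # hypothetical condition labels
  condition_b = "after",
  value_a     = c(21.3, 19.8, 22.1, 20.4),
  value_b     = c(22.0, 19.5, 23.4, 21.1)
)

# The same test in horizontal layout: one row per test,
# one a.i / b.i column pair per matched pair i.
horizontal <- data.frame(
  contrast    = "I",
  condition_a = "before",
  condition_b = "after",
  a.1 = 21.3, a.2 = 19.8, a.3 = 22.1, a.4 = 20.4,
  b.1 = 22.0, b.2 = 19.5, b.3 = 23.4, b.4 = 21.1
)

# File names follow the X_muMCWtest_data.csv pattern;
# "toyv" and "toyh" are hypothetical identifiers X.
out_dir <- tempdir()
path_vertical   <- file.path(out_dir, "toyv_muMCWtest_data.csv")
path_horizontal <- file.path(out_dir, "toyh_muMCWtest_data.csv")
write.csv(vertical,   path_vertical,   row.names = FALSE)
write.csv(horizontal, path_horizontal, row.names = FALSE)
```

Either path could then be passed as the path argument of muMCWtest.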

Arguments

path: Path for the local CSV file containing the entry dataset formatted for muMCW tests.
max_rearrangements: User-defined maximum number of rearrangements of the dataset used by the function muMCWtest to generate a collection of expected-by-chance muMCW_BIs and estimate the statistical significance of observed muMCW_BIs. If the number of distinct dataset rearrangements is less than max_rearrangements, muMCWtest calculates muMCW_BIs for all possible data rearrangements. If the number of distinct dataset rearrangements is greater than max_rearrangements, muMCWtest will perform N = max_rearrangements random measure rearrangements to calculate the collection of expected-by-chance muMCW_BIs.

Value

The muMCWtest function reports to the console the total number of tests it will execute and how many of them are exact and approximated tests. It also creates a CSV file named X_muMCWtest_results.csv, where X is the user-defined identifier of the entry dataset CSV file. The X_muMCWtest_results.csv file contains two rows for each muMCW test, with muMCW_BIs calculated for each possible condition contrast (e.g., a-b and b-a). The X_muMCWtest_results.csv file includes the following columns:

- User-provided informative columns to contextualize the results of each test.
- Columns condition_a and condition_b indicate the two conditions for which matched-paired measures were provided.
- Column N indicates the total number of matched pairs of measures after removing pairs with missing values (NAs).
- Column test_type distinguishes between exact and approximated tests.
- Column BI_type names the index reported, muMCW_BI.
- Column condition_contrast indicates the condition contrast for each row of results.
- Column observed_BI contains the muMCW_BI values obtained from analyzing the user-provided dataset.
- Column expected_by_chance_BI_N indicates the number of data rearrangements used to calculate the expected-by-chance muMCW_BIs. This value is the smaller of the number of all possible measure rearrangements and the parameter max_rearrangements.
- Columns pupper and plower contain the P~upper~ and P~lower~ values, respectively. They are the fractions of expected-by-chance muMCW_BIs that are greater than or equal to, and less than or equal to, the observed muMCW_BI.
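
As a minimal sketch of how these columns might be used, the snippet below selects the contrasts with small pupper or plower values. The data frame reproduces a subset of the columns from the vertical example output shown in the Examples section. In practice the table would be read from X_muMCWtest_results.csv, for instance with read.csv. The threshold alpha is a hypothetical choice and not part of the package.

```r
# Subset of the vertical example results (values copied from the Examples section).
results <- data.frame(
  contrast           = c("I", "I", "II", "II"),
  condition_contrast = c("AAAA-BBBB", "BBBB-AAAA", "AAAA-BBBB", "BBBB-AAAA"),
  observed_BI        = c(0.2181818, -0.2181818, -0.9454545, 0.9454545),
  pupper             = c(0.2, 0.8, 1.0, 0.0),
  plower             = c(0.8, 0.2, 0.0, 1.0)
)

alpha <- 0.05  # hypothetical significance threshold chosen for illustration

# Contrasts whose observed index is unusually high relative to chance.
biased_up <- results[results$pupper <= alpha, ]

# Contrasts whose observed index is unusually low relative to chance.
biased_down <- results[results$plower <= alpha, ]

print(biased_up)
print(biased_down)
```

Because these example results rest on only 10 rearrangements, pupper and plower can only change in steps of 0.1. A larger max_rearrangements gives a finer resolution.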

Details

The function muMCWtest removes any matched pair of measures in which at least one value is missing (NA) before proceeding with the following steps.

To estimate the bias for all matched-paired measures in the dataset, the function muMCWtest performs the following tasks:
- For each matched-pair of measures, it subtracts values for the two possible condition contrasts (e.g., a-b and b-a).
- For each condition contrast, it ranks the absolute values of non-zero differences from lowest to highest. Measure pair differences with a value of 0 are assigned a 0 rank. If multiple measure pair differences have the same absolute value, all tied measure pair differences are assigned the lowest rank possible.
- It assigns each measure pair rank a sign based on the sign of its corresponding measure pair difference.
- It sums the signed ranks for each condition contrast.
- It calculates muMCW_BI by dividing each sum of signed ranks by the maximum value that sum could reach, that is, the sum obtained if every measure pair had a positive sign and the highest possible ranks. Consequently, muMCW_BI ranges between 1 when all measures corresponding to the first condition are higher than all measures corresponding to the second condition, and -1 when all measures corresponding to the first condition are lower than all measures corresponding to the second condition.
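
The steps above can be sketched in base R as follows. This is an illustrative reimplementation, not the package's code. The function name mu_mcw_bi is hypothetical, and the denominator N(N + 1)/2 is an assumption, although it agrees with the example output in the Examples section (for N = 10, 0.2181818 = 12/55).

```r
# Sketch of the index for one condition contrast (a - b).
# mu_mcw_bi is a hypothetical name, not a function exported by the package.
mu_mcw_bi <- function(a, b) {
  keep <- !(is.na(a) | is.na(b))   # drop pairs with at least one missing value
  d <- a[keep] - b[keep]           # per-pair differences for the a-b contrast
  n <- length(d)

  ranks <- numeric(n)              # zero differences keep rank 0
  nonzero <- d != 0
  # Rank absolute non-zero differences; ties share the lowest possible rank.
  ranks[nonzero] <- rank(abs(d[nonzero]), ties.method = "min")

  signed_sum <- sum(sign(d) * ranks)

  # Assumed maximum: the ranks 1..n all carrying a positive sign.
  signed_sum / (n * (n + 1) / 2)
}

a <- c(10, 12, 9, 15)
b <- c(8, 13, 5, 9)
mu_mcw_bi(a, b)  # a-b contrast
mu_mcw_bi(b, a)  # b-a contrast, same magnitude with the opposite sign
```

In this toy example the differences are 2, -1, 4 and 6, so the signed ranks are 2, -1, 3 and 4. Their sum is 8, and dividing by 4 * 5 / 2 = 10 gives 0.8 for a-b and -0.8 for b-a.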

To assess the significance of the muMCW_BIs obtained from the user-provided dataset (observed muMCW_BIs), the function muMCWtest performs the following tasks:
- It generates a collection of expected-by-chance muMCW_BIs. These expected values are obtained by rearranging the measures between and within the two conditions multiple times. The user-provided parameter max_rearrangements determines the two paths that the function muMCWtest can follow to generate the collection of expected-by-chance muMCW_BIs:
  - muMCW exact testing: If the number of distinct rearrangements of the measures across pairs and conditions is less than max_rearrangements, the function muMCWtest calculates muMCW_BIs for all possible data rearrangements.
  - muMCW approximated testing: If the number of distinct rearrangements of the measures across pairs and conditions is greater than max_rearrangements, the function muMCWtest performs N = max_rearrangements random measure rearrangements to calculate the collection of expected-by-chance muMCW_BIs.
- It calculates the P~upper~ and P~lower~ values as the fractions of expected-by-chance muMCW_BIs that are greater than or equal to, and less than or equal to, the observed muMCW_BI, respectively.
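
The approximated path can be sketched as a small Monte Carlo loop. This is a rough illustration under stated assumptions, not the package's implementation. The names bias_index and approx_p_values and the seed argument are hypothetical. The sketch also assumes that a rearrangement is a random shuffle of all pooled measures, which are then split again into two conditions of equal size. Exact testing, which enumerates every distinct rearrangement, is not covered.

```r
# Compact helper for the index (hypothetical name). It assumes no missing values
# and a denominator of n(n + 1)/2.
bias_index <- function(a, b) {
  d <- a - b
  r <- numeric(length(d))
  r[d != 0] <- rank(abs(d[d != 0]), ties.method = "min")
  sum(sign(d) * r) / (length(d) * (length(d) + 1) / 2)
}

# Approximated test: compare the observed index with indices from random rearrangements.
approx_p_values <- function(a, b, max_rearrangements, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # optional, for reproducible sketches
  n <- length(a)
  observed <- bias_index(a, b)

  expected <- replicate(max_rearrangements, {
    pooled <- sample(c(a, b))          # assumed scheme: shuffle all 2n measures
    bias_index(pooled[seq_len(n)], pooled[n + seq_len(n)])
  })

  c(observed_BI = observed,
    pupper = mean(expected >= observed),  # fraction at least as high as observed
    plower = mean(expected <= observed))  # fraction at least as low as observed
}

res <- approx_p_values(c(10, 12, 9, 15), c(8, 13, 5, 9),
                       max_rearrangements = 200, seed = 1)
print(res)
```

Because rearrangements that tie with the observed index count toward both fractions, pupper and plower can add up to more than 1.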

Examples

test_temp <- tempdir()
extdata_v <- system.file("extdata", "example_vertical_muMCWtest_data.csv", package = "MCWtests")
file.copy(extdata_v, test_temp)
#> [1] TRUE
extdata_h <- system.file("extdata", "example_horizontal_muMCWtest_data.csv", package = "MCWtests")
file.copy(extdata_h, test_temp)
#> [1] TRUE
# running muMCWtest with an ideal vertical entry dataset
path_v <- file.path(test_temp, "example_vertical_muMCWtest_data.csv")
muMCWtest_vertical_results <- muMCWtest(path_v, 10)
#> total number of tests: 2
#> number of exact tests: 0
#> number of approximated tests: 2
#> running approximated tests: 
print(muMCWtest_vertical_results)
#> Key: <contrast, condition_a, condition_b>
#>    contrast condition_a condition_b     N    test_type  BI_type
#>      <char>      <char>      <char> <int>       <char>   <char>
#> 1:        I        AAAA        BBBB    10 approximated muMCW_BI
#> 2:        I        AAAA        BBBB    10 approximated muMCW_BI
#> 3:       II        AAAA        BBBB    10 approximated muMCW_BI
#> 4:       II        AAAA        BBBB    10 approximated muMCW_BI
#>    condition_contrast observed_BI expected_by_chance_BI_N pupper plower
#>                <char>       <num>                   <int>  <num>  <num>
#> 1:          AAAA-BBBB   0.2181818                      10    0.2    0.8
#> 2:          BBBB-AAAA  -0.2181818                      10    0.8    0.2
#> 3:          AAAA-BBBB  -0.9454545                      10    1.0    0.0
#> 4:          BBBB-AAAA   0.9454545                      10    0.0    1.0
# running muMCWtest with an ideal horizontal entry dataset
path_h <- file.path(test_temp, "example_horizontal_muMCWtest_data.csv")
muMCWtest_horizontal_results <- muMCWtest(path_h, 10)
#> total number of tests: 6
#> number of exact tests: 0
#> number of approximated tests: 6
#> running approximated tests: 
print(muMCWtest_horizontal_results)
#> Key: <contrast, contrast_trait, element_ID, element_chr, element_start, element_end, condition_a, condition_b>
#>     contrast contrast_trait element_ID element_chr element_start element_end
#>       <char>         <char>     <char>       <int>         <int>       <int>
#>  1:        I        trait_a         x1           1          1000        2000
#>  2:        I        trait_a         x1           1          1000        2000
#>  3:        I        trait_a         x2           1          5000        5500
#>  4:        I        trait_a         x2           1          5000        5500
#>  5:        I        trait_a         x3           1         90000      100000
#>  6:        I        trait_a         x3           1         90000      100000
#>  7:       II        trait_b         x1           1          1000        2000
#>  8:       II        trait_b         x1           1          1000        2000
#>  9:       II        trait_b         x2           1          5000        5500
#> 10:       II        trait_b         x2           1          5000        5500
#> 11:       II        trait_b         x3           1         90000      100000
#> 12:       II        trait_b         x3           1         90000      100000
#>     condition_a condition_b     N    test_type  BI_type condition_contrast
#>          <char>      <char> <int>       <char>   <char>             <char>
#>  1:        AAAA        BBBB     4 approximated muMCW_BI          AAAA-BBBB
#>  2:        AAAA        BBBB     4 approximated muMCW_BI          BBBB-AAAA
#>  3:        AAAA        BBBB     5 approximated muMCW_BI          AAAA-BBBB
#>  4:        AAAA        BBBB     5 approximated muMCW_BI          BBBB-AAAA
#>  5:        AAAA        BBBB     5 approximated muMCW_BI          AAAA-BBBB
#>  6:        AAAA        BBBB     5 approximated muMCW_BI          BBBB-AAAA
#>  7:        AAAA        BBBB     5 approximated muMCW_BI          AAAA-BBBB
#>  8:        AAAA        BBBB     5 approximated muMCW_BI          BBBB-AAAA
#>  9:        AAAA        BBBB     5 approximated muMCW_BI          AAAA-BBBB
#> 10:        AAAA        BBBB     5 approximated muMCW_BI          BBBB-AAAA
#> 11:        AAAA        BBBB     5 approximated muMCW_BI          AAAA-BBBB
#> 12:        AAAA        BBBB     5 approximated muMCW_BI          BBBB-AAAA
#>     observed_BI expected_by_chance_BI_N pupper plower
#>           <num>                   <int>  <num>  <num>
#>  1:  0.40000000                      10    0.5    0.6
#>  2: -0.40000000                      10    0.6    0.5
#>  3:  0.06666667                      10    0.7    0.6
#>  4: -0.06666667                      10    0.6    0.7
#>  5:  0.33333333                      10    0.4    0.7
#>  6: -0.33333333                      10    0.7    0.4
#>  7: -1.00000000                      10    1.0    0.1
#>  8:  1.00000000                      10    0.1    1.0
#>  9: -1.00000000                      10    1.0    0.1
#> 10:  1.00000000                      10    0.1    1.0
#> 11: -1.00000000                      10    1.0    0.0
#> 12:  1.00000000                      10    0.0    1.0

rm(test_temp)