Unmatched-Measures Monte Carlo-Wilcoxon (uMCW) Test

The uMCW test is a statistical tool that assesses whether two sets of unmatched measures, and the heterogeneity of those measures, are significantly biased in the same direction. Significantly different data heterogeneities between two conditions could indicate that the measure under analysis is more constrained or more relaxed in one of the conditions, potentially providing insights into the mechanisms underlying the variation of that measure. For instance, uMCW tests can be used to analyze body weights or transcript abundances determined for two sets of mice that have been maintained in different conditions.

Usage

uMCWtest(path, max_rearrangements)

Format

When executing the uMCWtest function, users must provide the path to a local CSV file named X_uMCWtest_data.csv, where X serves as a user-defined identifier. X_uMCWtest_data.csv can be structured in two distinct formats:

Vertical layout: This format allows appending datasets with varying structures, such as different numbers of measures per set or between each appended test. Vertical entry datasets should include the following columns:
- The condition column uniquely identifies each of the two measure sets under analysis.
- The value column contains the actual measures under analysis.
- As many informative columns as needed by users to contextualize the results of each test. The names of these columns should not include the terms condition or value. While these columns are optional when running a single test, at least one column is required when running multiple tests simultaneously. All rows for each individual test must contain the same information in these columns.
Horizontal layout: This format allows appending datasets with similar structures, such as the same number of measures collected for each of the two conditions. Horizontal entry datasets should include the following columns:
- Columns condition_a and condition_b uniquely identify the two measure sets under analysis.
- Columns a.i and b.j, where i and j represent integers to differentiate specific measures within each set, contain the actual measures under analysis.
- As many informative columns as needed by users to contextualize the results of each test. The names of these columns should not contain the term condition or have the same structure as the a.i and b.j columns. While these columns are optional when running a single test, at least one column is required when running multiple tests simultaneously.
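
As an illustration of the two layouts, the following minimal sketch writes one small entry file of each kind to a temporary directory. The column names condition, value, condition_a, condition_b, a.i and b.j come from the description above. The identifiers demo_vertical and demo_horizontal, the informative column contrast, and all numeric values are invented for this illustration.

```r
# Hypothetical entry datasets; every value below is made up for illustration.
out_dir <- tempdir()

# Vertical layout: one row per measure, and the two sets may differ in size.
vertical <- data.frame(
  contrast  = "I",                                       # informative column
  condition = rep(c("AAAA", "BBBB"), times = c(4, 3)),   # set labels
  value     = c(21.3, 19.8, 22.1, 20.5, 24.0, 25.2, 23.7)
)
vertical_path <- file.path(out_dir, "demo_vertical_uMCWtest_data.csv")
write.csv(vertical, vertical_path, row.names = FALSE)

# Horizontal layout: one row per test, with one column per measure.
horizontal <- data.frame(
  contrast    = c("I", "II"),                            # informative column
  condition_a = "AAAA",
  condition_b = "BBBB",
  a.1 = c(21.3, 5.1), a.2 = c(19.8, 4.7), a.3 = c(22.1, 5.6),
  b.1 = c(24.0, 5.0), b.2 = c(25.2, 4.9), b.3 = c(23.7, 5.3)
)
horizontal_path <- file.path(out_dir, "demo_horizontal_uMCWtest_data.csv")
write.csv(horizontal, horizontal_path, row.names = FALSE)
```

Either path could then be passed as the path argument of uMCWtest.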

Arguments

path: Path for the local CSV file containing the entry dataset formatted for uMCW tests.
max_rearrangements: User-defined maximum number of rearrangements of the dataset used by the function uMCWtest to generate a collection of expected-by-chance uMCW_BIs and uMCW_HBIs and estimate the statistical significance of observed uMCW_BIs and uMCW_HBIs. If the number of distinct dataset rearrangements is less than max_rearrangements, uMCWtest calculates uMCW_BIs and uMCW_HBIs for all possible data rearrangements. If the number of distinct dataset rearrangements is greater than max_rearrangements, uMCWtest will perform N = max_rearrangements random measure rearrangements to calculate the collection of expected-by-chance uMCW_BIs and uMCW_HBIs.

Value

The uMCWtest function reports to the console the total number of tests it will execute, and how many of them are exact and how many are approximated. It also creates a CSV file named X_uMCWtest_results.csv, where X is the user-defined identifier of the entry dataset CSV file. The X_uMCWtest_results.csv file contains four rows for each uMCW test: two for the uMCW_BIs calculated for each condition contrast (e.g., a-b and b-a), and two for the uMCW_HBIs calculated for each condition contrast. The X_uMCWtest_results.csv file includes the following columns:

User-provided informative columns to contextualize the results of each test.
Columns condition_a and condition_b indicate the two measure sets under analysis.
Columns N, n_a and n_b indicate the total number of measures and the number of measures belonging to each set after removing missing values (NAs).
Column test_type distinguishes between exact and approximated tests.
Column BI_type indicates the bias index type (uMCW_BI and uMCW_HBI) for each row of results.
Column condition_contrast indicates the set contrast (e.g., a-b or b-a) for each row of results.
Column observed_BI contains the values of uMCW_BIs and uMCW_HBIs obtained from analyzing the user-provided dataset.
Column expected_by_chance_BI_N indicates the number of data rearrangements used to calculate the expected-by-chance uMCW_BIs and uMCW_HBIs. This value corresponds to the smaller of the number of all possible measure rearrangements and the parameter max_rearrangements.
Columns pupper and plower contain the P~upper~ and P~lower~ values, respectively. P~upper~ is the fraction of expected-by-chance uMCW_BIs or uMCW_HBIs that are greater than or equal to the observed value, and P~lower~ is the fraction that are less than or equal to it.
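
To make the row structure concrete, the sketch below rebuilds the four result rows reported for contrast II in the vertical example of the Examples section and selects one of them. It uses base R only. The object name results is an assumption of this sketch, and in practice the table would be read from the X_uMCWtest_results.csv file.

```r
# Four rows of one test, copied from the vertical example output (contrast II).
results <- data.frame(
  contrast           = "II",
  condition_a        = "AAAA",
  condition_b        = "BBBB",
  N = 10L, n_a = 5L, n_b = 5L,
  test_type          = "approximated",
  BI_type            = c("uMCW_BI", "uMCW_BI", "uMCW_HBI", "uMCW_HBI"),
  condition_contrast = c("AAAA-BBBB", "BBBB-AAAA", "AAAA-BBBB", "BBBB-AAAA"),
  observed_BI        = c(-0.69794872, 0.69794872, -0.07766990, 0.07766990),
  expected_by_chance_BI_N = 10L,
  pupper             = c(1.0, 0.0, 0.7, 0.3),
  plower             = c(0.0, 1.0, 0.3, 0.7)
)

# Keep the bias index for a single contrast direction.
bi_ab <- subset(results, BI_type == "uMCW_BI" & condition_contrast == "AAAA-BBBB")

# The reverse contrast mirrors it: opposite sign, pupper and plower swapped.
bi_ba <- subset(results, BI_type == "uMCW_BI" & condition_contrast == "BBBB-AAAA")
```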

Details

The function uMCWtest removes missing values (NAs) from the dataset before carrying out the steps described below.

To estimate the bias between the two sets of measures (e.g., a and b), the function uMCWtest performs these tasks:
- It generates all possible disjoint data pairs using measures from both sets.
- For each measure pair, it subtracts the second measure in the pair from the first measure in the pair.
- It ranks the absolute values of all non-zero measure pair differences from lowest to highest. Measure pair differences with a value of 0 are assigned a 0 rank. If multiple measure pair differences have the same absolute value, all tied measure pair differences are assigned the lowest rank possible.
- It assigns each measure pair rank a sign based on the sign of its corresponding measure pair difference.
- It sums the signed ranks for measure pairs formed with measures from the two different sets (e.g., a-b and b-a).
- For each type of disjoint set measure pairs (e.g., a-b and b-a), it calculates uMCW_BI by dividing the sum of signed ranks by the maximum value this sum could reach if the corresponding measure pairs had the highest possible positive ranks. Consequently, uMCW_BI ranges from 1, when all the values for measures in the first set are higher than all the values for measures in the second set, to -1, when all the values for measures in the first set are lower than all the values for measures in the second set.
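
The steps above can be sketched in base R as follows. This is one reading of the description, not the package's implementation. The function name umcw_bi_sketch is hypothetical. It assumes that pairs are formed in both orders among all pooled measures, and that the normalizing maximum is the sum of the n_a * n_b highest ranks among all N * (N - 1) ordered pairs. With n_a = n_b = 5 that maximum is 66 + ... + 90 = 1950, which appears consistent with the values in the Examples output (for instance, 0.03948718 is 77/1950).

```r
# Hypothetical sketch of the bias index steps; not the package's own code.
umcw_bi_sketch <- function(a, b) {
  a <- a[!is.na(a)]
  b <- b[!is.na(b)]
  values <- c(a, b)
  set <- rep(c("a", "b"), times = c(length(a), length(b)))

  # All ordered pairs of distinct measures (assumption: both orders are kept).
  idx <- expand.grid(first = seq_along(values), second = seq_along(values))
  idx <- idx[idx$first != idx$second, ]

  # Difference of each pair: first measure minus second measure.
  d <- values[idx$first] - values[idx$second]

  # Rank absolute non-zero differences; ties get the lowest rank, zeros get 0.
  r <- numeric(length(d))
  nz <- d != 0
  r[nz] <- rank(abs(d[nz]), ties.method = "min")

  # Sign each rank by the sign of its difference.
  signed <- sign(d) * r

  # Assumed maximum: the n_a * n_b highest ranks among all ordered pairs.
  n_pairs <- length(d)
  n_ab <- length(a) * length(b)
  max_sum <- sum(seq(n_pairs - n_ab + 1, n_pairs))

  ab <- set[idx$first] == "a" & set[idx$second] == "b"
  ba <- set[idx$first] == "b" & set[idx$second] == "a"
  c(a_b = sum(signed[ab]) / max_sum, b_a = sum(signed[ba]) / max_sum)
}

umcw_bi_sketch(c(5, 6, 7), c(1, 2, 3))
```

In this sketch the mirrored pairs x-y and y-x always tie, so even fully separated sets score below 1. The package may handle this differently.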
To estimate the bias between the heterogeneity of two sets of measures, the function uMCWtest performs these tasks:
- It generates all possible disjoint data pairs within each set, disregarding the order of the paired measures. For instance, the measure pair a.1-a.2 is considered equivalent to the measure pair a.2-a.1, and only the former is retained for the subsequent calculations.
- For each measure pair, it subtracts the second measure from the first measure.
- It ranks the absolute values of all non-zero measure pair differences from lowest to highest. Measure pair differences with a value of 0 are assigned a 0 rank. If multiple measure pair differences have the same absolute value, uMCWtest assigns all tied measure pair differences the lowest rank possible.
- It sums ranks for measure pairs formed with measures from the same set (e.g., a-a and b-b).
- For each type of same-set measure pairs (e.g., a-a and b-b), it divides each sum of ranks by the maximum value this sum could reach if the corresponding measure pairs had the highest possible ranks.
- It calculates two heterogeneity bias indexes (uMCW_HBIs) by subtracting one normalized sum of ranks from the other in the two possible directions (e.g., a-b and b-a). Consequently, uMCW_HBI ranges from 1, when at least two measures in the first set have distinct values and all measures in the second set have the same value, to -1, when all measures in the first set have the same value and at least two measures in the second set have distinct values.
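
A minimal base R sketch of the heterogeneity steps follows. The function name umcw_hbi_sketch is hypothetical, and this is not the package's implementation. The normalization is described only loosely above, so the sketch makes an explicit assumption: each sum of ranks is divided by the combined total of both sums. That choice reproduces the limits of 1 and -1 stated above, but it may differ from what uMCWtest does.

```r
# Hypothetical sketch of the heterogeneity bias index; not the package's code.
umcw_hbi_sketch <- function(a, b) {
  a <- a[!is.na(a)]
  b <- b[!is.na(b)]
  stopifnot(length(a) >= 2, length(b) >= 2)

  # Unordered within-set pairs: combn() keeps a.1-a.2 but not a.2-a.1.
  d_a <- combn(a, 2, FUN = function(p) p[1] - p[2])
  d_b <- combn(b, 2, FUN = function(p) p[1] - p[2])
  d <- c(d_a, d_b)
  from_a <- rep(c(TRUE, FALSE), times = c(length(d_a), length(d_b)))

  # Rank absolute non-zero differences; ties get the lowest rank, zeros get 0.
  r <- numeric(length(d))
  nz <- d != 0
  r[nz] <- rank(abs(d[nz]), ties.method = "min")

  # Sum the ranks of a-a pairs and of b-b pairs.
  s_a <- sum(r[from_a])
  s_b <- sum(r[!from_a])

  # Assumed normalization: divide by the combined total of both sums.
  total <- s_a + s_b
  if (total == 0) return(c(a_b = NA_real_, b_a = NA_real_))
  c(a_b = (s_a - s_b) / total, b_a = (s_b - s_a) / total)
}

umcw_hbi_sketch(c(1, 2, 3), c(4, 4, 4))
```

With a heterogeneous first set and a constant second set, this sketch returns 1 for a-b and -1 for b-a, in line with the limits described above.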
To assess the significance of the uMCW_BIs and uMCW_HBIs obtained with the user-provided data (observed uMCW_BIs and uMCW_HBIs), the function uMCWtest performs these tasks:
- It generates a collection of expected-by-chance uMCW_BIs and uMCW_HBIs by repeatedly rearranging the measures between the two sets. The user-provided parameter max_rearrangements determines which of two paths the function uMCWtest follows to generate this collection:
  - uMCW exact testing: If the number of distinct measure rearrangements that can alter their initial set distribution is less than max_rearrangements, the function uMCWtest calculates uMCW_BIs and uMCW_HBIs for all possible data rearrangements.
  - uMCW approximated testing: If the number of distinct measure rearrangements that can alter their initial set distribution is greater than max_rearrangements, the function uMCWtest will perform N = max_rearrangements random measure rearrangements to calculate the collection of expected-by-chance uMCW_BIs and uMCW_HBIs.
- It calculates P~upper~ and P~lower~ values as the fractions of expected-by-chance uMCW_BIs and uMCW_HBIs that are greater than or equal to, and less than or equal to, the observed uMCW_BIs and uMCW_HBIs, respectively.
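
The rearrangement logic can be sketched generically in base R. The function name rearrangement_p_values and its stat argument are hypothetical, and the difference in means used below is only a placeholder for a bias index. The sketch also makes three assumptions that the text does not confirm: the number of distinct rearrangements is choose(N, n_a), the case where that number equals max_rearrangements is treated as exact, and exact enumeration includes the original arrangement.

```r
# Hypothetical sketch of exact versus approximated testing; not package code.
rearrangement_p_values <- function(a, b, max_rearrangements, stat) {
  pooled <- c(a, b)
  n_a <- length(a)
  n_total <- length(pooled)
  observed <- stat(a, b)

  # Assumption: distinct rearrangements = ways to choose which measures form set a.
  n_distinct <- choose(n_total, n_a)

  if (n_distinct <= max_rearrangements) {
    # Exact testing: evaluate every possible split of the pooled measures.
    test_type <- "exact"
    picks <- combn(n_total, n_a)
    expected <- apply(picks, 2, function(i) stat(pooled[i], pooled[-i]))
  } else {
    # Approximated testing: max_rearrangements random splits.
    test_type <- "approximated"
    expected <- replicate(max_rearrangements, {
      i <- sample.int(n_total, n_a)
      stat(pooled[i], pooled[-i])
    })
  }

  list(
    test_type = test_type,
    observed = observed,
    expected_by_chance_N = length(expected),
    pupper = mean(expected >= observed),  # fraction at or above the observed value
    plower = mean(expected <= observed)   # fraction at or below the observed value
  )
}

# Placeholder statistic standing in for a bias index.
mean_diff <- function(x, y) mean(x) - mean(y)

exact_res  <- rearrangement_p_values(c(5, 6, 7), c(1, 2, 3), 1000, mean_diff)
approx_res <- rearrangement_p_values(1:5, 6:10, 10, mean_diff)
```

Under these assumptions, two sets of five measures have choose(10, 5) = 252 distinct rearrangements, so a max_rearrangements of 10 leads to approximated testing, as in the Examples output.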

Examples

test_temp <- tempdir()
extdata_v <- system.file("extdata", "example_vertical_uMCWtest_data.csv", package = "MCWtests")
file.copy(extdata_v, test_temp)
#> [1] TRUE
extdata_h <- system.file("extdata", "example_horizontal_uMCWtest_data.csv", package = "MCWtests")
file.copy(extdata_h, test_temp)
#> [1] TRUE
# running uMCWtest with an ideal vertical entry dataset
path_v <- file.path(test_temp, "example_vertical_uMCWtest_data.csv")
uMCWtest_vertical_results <- uMCWtest(path_v, 10)
#> total number of tests: 3
#> number of exact tests: 0
#> number of approximated tests: 3
#> running approximated tests: 
print(uMCWtest_vertical_results)
#> Key: <contrast, condition_a, condition_b>
#>     contrast condition_a condition_b     N   n_a   n_b    test_type  BI_type
#>       <char>      <char>      <char> <int> <int> <int>       <char>   <char>
#>  1:        I        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#>  2:        I        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#>  3:        I        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#>  4:        I        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#>  5:       II        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#>  6:       II        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#>  7:       II        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#>  8:       II        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#>  9:      III        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 10:      III        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 11:      III        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#> 12:      III        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#>     condition_contrast observed_BI expected_by_chance_BI_N pupper plower
#>                 <char>       <num>                   <int>  <num>  <num>
#>  1:          AAAA-BBBB  0.03948718                      10    0.1    0.9
#>  2:          BBBB-AAAA -0.03948718                      10    0.9    0.1
#>  3:          AAAA-BBBB -0.02857143                      10    0.2    0.8
#>  4:          BBBB-AAAA  0.02857143                      10    0.8    0.2
#>  5:          AAAA-BBBB -0.69794872                      10    1.0    0.0
#>  6:          BBBB-AAAA  0.69794872                      10    0.0    1.0
#>  7:          AAAA-BBBB -0.07766990                      10    0.7    0.3
#>  8:          BBBB-AAAA  0.07766990                      10    0.3    0.7
#>  9:          AAAA-BBBB -0.68358974                      10    1.0    0.0
#> 10:          BBBB-AAAA  0.68358974                      10    0.0    1.0
#> 11:          AAAA-BBBB  0.39130435                      10    0.0    1.0
#> 12:          BBBB-AAAA -0.39130435                      10    1.0    0.0
# running uMCWtest with an ideal horizontal entry dataset
path_h <- file.path(test_temp, "example_horizontal_uMCWtest_data.csv")
uMCWtest_horizontal_results <- uMCWtest(path_h, 10)
#> total number of tests: 9
#> number of exact tests: 0
#> number of approximated tests: 9
#> running approximated tests: 
print(uMCWtest_horizontal_results)
#> Key: <contrast, contrast_trait, element_ID, element_chr, element_start, element_end, condition_a, condition_b>
#>     contrast contrast_trait element_ID element_chr element_start element_end
#>       <char>         <char>     <char>       <int>         <int>       <int>
#>  1:        I        trait_a         x1           1          1000        2000
#>  2:        I        trait_a         x1           1          1000        2000
#>  3:        I        trait_a         x1           1          1000        2000
#>  4:        I        trait_a         x1           1          1000        2000
#>  5:        I        trait_a         x2           1          5000        5500
#>  6:        I        trait_a         x2           1          5000        5500
#>  7:        I        trait_a         x2           1          5000        5500
#>  8:        I        trait_a         x2           1          5000        5500
#>  9:        I        trait_a         x3           1         90000      100000
#> 10:        I        trait_a         x3           1         90000      100000
#> 11:        I        trait_a         x3           1         90000      100000
#> 12:        I        trait_a         x3           1         90000      100000
#> 13:       II        trait_b         x1           1          1000        2000
#> 14:       II        trait_b         x1           1          1000        2000
#> 15:       II        trait_b         x1           1          1000        2000
#> 16:       II        trait_b         x1           1          1000        2000
#> 17:       II        trait_b         x2           1          5000        5500
#> 18:       II        trait_b         x2           1          5000        5500
#> 19:       II        trait_b         x2           1          5000        5500
#> 20:       II        trait_b         x2           1          5000        5500
#> 21:       II        trait_b         x3           1         90000      100000
#> 22:       II        trait_b         x3           1         90000      100000
#> 23:       II        trait_b         x3           1         90000      100000
#> 24:       II        trait_b         x3           1         90000      100000
#> 25:      III        trait_b         x1           1          1000        2000
#> 26:      III        trait_b         x1           1          1000        2000
#> 27:      III        trait_b         x1           1          1000        2000
#> 28:      III        trait_b         x1           1          1000        2000
#> 29:      III        trait_b         x2           1          5000        5500
#> 30:      III        trait_b         x2           1          5000        5500
#> 31:      III        trait_b         x2           1          5000        5500
#> 32:      III        trait_b         x2           1          5000        5500
#> 33:      III        trait_b         x3           1         90000      100000
#> 34:      III        trait_b         x3           1         90000      100000
#> 35:      III        trait_b         x3           1         90000      100000
#> 36:      III        trait_b         x3           1         90000      100000
#>     contrast contrast_trait element_ID element_chr element_start element_end
#>     condition_a condition_b     N   n_a   n_b    test_type  BI_type
#>          <char>      <char> <int> <int> <int>       <char>   <char>
#>  1:        AAAA        BBBB     9     5     4 approximated  uMCW_BI
#>  2:        AAAA        BBBB     9     5     4 approximated  uMCW_BI
#>  3:        AAAA        BBBB     9     5     4 approximated uMCW_HBI
#>  4:        AAAA        BBBB     9     5     4 approximated uMCW_HBI
#>  5:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#>  6:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#>  7:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#>  8:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#>  9:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 10:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 11:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#> 12:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#> 13:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 14:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 15:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#> 16:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#> 17:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 18:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 19:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#> 20:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#> 21:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 22:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 23:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#> 24:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#> 25:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 26:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 27:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#> 28:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#> 29:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 30:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 31:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#> 32:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#> 33:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 34:        AAAA        BBBB    10     5     5 approximated  uMCW_BI
#> 35:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#> 36:        AAAA        BBBB    10     5     5 approximated uMCW_HBI
#>     condition_a condition_b     N   n_a   n_b    test_type  BI_type
#>     condition_contrast  observed_BI expected_by_chance_BI_N pupper plower
#>                 <char>        <num>                   <int>  <num>  <num>
#>  1:          AAAA-BBBB  0.500800000                      10    0.2    0.8
#>  2:          BBBB-AAAA -0.500800000                      10    0.8    0.2
#>  3:          AAAA-BBBB  0.402985075                      10    0.1    0.9
#>  4:          BBBB-AAAA -0.402985075                      10    0.9    0.1
#>  5:          AAAA-BBBB -0.066153846                      10    0.4    0.6
#>  6:          BBBB-AAAA  0.066153846                      10    0.6    0.4
#>  7:          AAAA-BBBB  0.064039409                      10    0.2    0.8
#>  8:          BBBB-AAAA -0.064039409                      10    0.8    0.2
#>  9:          AAAA-BBBB  0.161538462                      10    0.2    0.9
#> 10:          BBBB-AAAA -0.161538462                      10    0.9    0.2
#> 11:          AAAA-BBBB -0.043062201                      10    0.6    0.5
#> 12:          BBBB-AAAA  0.043062201                      10    0.5    0.6
#> 13:          AAAA-BBBB -0.649743590                      10    0.9    0.1
#> 14:          BBBB-AAAA  0.649743590                      10    0.1    0.9
#> 15:          AAAA-BBBB  0.024390244                      10    0.2    0.8
#> 16:          BBBB-AAAA -0.024390244                      10    0.8    0.2
#> 17:          AAAA-BBBB -0.778974359                      10    1.0    0.0
#> 18:          BBBB-AAAA  0.778974359                      10    0.0    1.0
#> 19:          AAAA-BBBB -0.123809524                      10    1.0    0.0
#> 20:          BBBB-AAAA  0.123809524                      10    0.0    1.0
#> 21:          AAAA-BBBB -0.801538462                      10    1.0    0.0
#> 22:          BBBB-AAAA  0.801538462                      10    0.0    1.0
#> 23:          AAAA-BBBB  0.009803922                      10    0.3    0.7
#> 24:          BBBB-AAAA -0.009803922                      10    0.7    0.3
#> 25:          AAAA-BBBB -0.807692308                      10    1.0    0.0
#> 26:          BBBB-AAAA  0.807692308                      10    0.0    1.0
#> 27:          AAAA-BBBB  0.303921569                      10    0.0    1.0
#> 28:          BBBB-AAAA -0.303921569                      10    1.0    0.0
#> 29:          AAAA-BBBB -0.558461538                      10    1.0    0.0
#> 30:          BBBB-AAAA  0.558461538                      10    0.0    1.0
#> 31:          AAAA-BBBB  0.288888889                      10    0.1    0.9
#> 32:          BBBB-AAAA -0.288888889                      10    0.9    0.1
#> 33:          AAAA-BBBB -0.576923077                      10    1.0    0.0
#> 34:          BBBB-AAAA  0.576923077                      10    0.0    1.0
#> 35:          AAAA-BBBB  0.527777778                      10    0.1    0.9
#> 36:          BBBB-AAAA -0.527777778                      10    0.9    0.1
#>     condition_contrast  observed_BI expected_by_chance_BI_N pupper plower

rm(test_temp)