The uMCW test is a statistical tool to assess whether two sets of unmatched measures, and their heterogeneity, are significantly biased in the same direction. Significantly different data heterogeneities between two conditions could indicate that the measure under analysis is more constrained or more relaxed in one of the conditions, potentially providing insights into the mechanisms underlying the variation of that measure. For instance, uMCW tests can be used to analyze body weights or transcript abundances determined for two sets of mice that have been maintained in different conditions.
Format
When executing the uMCWtest function, users must provide the path to a local CSV file named X_uMCWtest_data.csv, where X serves as a user-defined identifier. X_uMCWtest_data.csv can be structured in two distinct formats:
Vertical layout: This format allows appending datasets with varying structures, such as different numbers of measures in each set or in each appended test. Vertical entry datasets should include the following columns:
The condition column uniquely identifies each of the two measure sets under analysis.
The value column contains the actual measures under analysis.
As many informative columns as needed by users to contextualize the results of each test. The names of these columns should not include the terms condition or value. While these columns are optional when running a single test, at least one column is required when running multiple tests simultaneously. All rows for each individual test must contain the same information in these columns.
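As an illustration of the vertical layout, the following sketch writes a small entry file that appends two tests. The file name prefix (demo_vertical), the informative column name (contrast), the condition labels, and all values are made up for this example and are not part of the package.

```r
# Hypothetical vertical-layout dataset with two appended tests ("I" and "II").
# "contrast" is an informative column: every row of a test carries the same value.
vertical_data <- data.frame(
  contrast  = rep(c("I", "II"), each = 6),
  condition = rep(rep(c("AAAA", "BBBB"), each = 3), times = 2),
  value     = c(10.1, 11.4, 9.8, 12.3, 13.0, 12.7,   # test I
                5.2, 4.9, 5.5, 5.1, 5.0, 5.3)        # test II
)

# The file name must follow the X_uMCWtest_data.csv pattern (here X = demo_vertical).
vertical_path <- file.path(tempdir(), "demo_vertical_uMCWtest_data.csv")
write.csv(vertical_data, vertical_path, row.names = FALSE)
```

Each row holds one measure, so tests with different numbers of measures can be stacked in the same file.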
Horizontal layout: This format allows appending datasets with similar structures, such as the same number of measures collected for each of the two conditions. Horizontal entry datasets should include the following columns:
Columns condition_a and condition_b uniquely identify the two measure sets under analysis.
Columns a.i and b.j, where i and j represent integers to differentiate specific measures within each set, contain the actual measures under analysis.
As many informative columns as needed by users to contextualize the results of each test. The names of these columns should not contain the term condition or have the same structure as the a.i and b.j columns. While these columns are optional when running a single test, at least one column is required when running multiple tests simultaneously.
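The horizontal layout can be illustrated in the same way. In this sketch each row is one test; the file name prefix (demo_horizontal), the informative column name (contrast), the condition labels, and all values are again made up for the example.

```r
# Hypothetical horizontal-layout dataset: one row per test,
# three measures per condition stored in columns a.1..a.3 and b.1..b.3.
horizontal_data <- data.frame(
  contrast    = c("I", "II"),
  condition_a = c("AAAA", "AAAA"),
  condition_b = c("BBBB", "BBBB"),
  a.1 = c(10.1, 5.2), a.2 = c(11.4, 4.9), a.3 = c(9.8, 5.5),
  b.1 = c(12.3, 5.1), b.2 = c(13.0, 5.0), b.3 = c(12.7, 5.3)
)

# The file name must follow the X_uMCWtest_data.csv pattern (here X = demo_horizontal).
horizontal_path <- file.path(tempdir(), "demo_horizontal_uMCWtest_data.csv")
write.csv(horizontal_data, horizontal_path, row.names = FALSE)
```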
Arguments
- path
Path for the local CSV file containing the entry dataset formatted for uMCW tests.
- max_rearrangements
User-defined maximum number of rearrangements of the dataset used by the function uMCWtest to generate a collection of expected-by-chance uMCW_BIs and uMCW_HBIs and estimate the statistical significance of observed uMCW_BIs and uMCW_HBIs. If the number of distinct dataset rearrangements is less than max_rearrangements, uMCWtest calculates uMCW_BIs and uMCW_HBIs for all possible data rearrangements. If the number of distinct dataset rearrangements is greater than max_rearrangements, uMCWtest will perform N = max_rearrangements random measure rearrangements to calculate the collection of expected-by-chance uMCW_BIs and uMCW_HBIs.
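The rule above can be sketched as follows. This is only an illustration, not the package's code: the helper name test_type_sketch is hypothetical, and the sketch assumes that the number of distinct rearrangements equals the number of ways to split the N pooled measures into sets of sizes n_a and n_b, which the text above does not state. The description also does not say what happens when the two numbers are equal, so the sketch returns NA in that case.

```r
# Hypothetical helper: which path would be taken for a given max_rearrangements?
test_type_sketch <- function(n_a, n_b, max_rearrangements) {
  # ASSUMPTION: distinct rearrangements = ways to choose which n_a of the
  # n_a + n_b pooled measures form the first set.
  n_distinct <- choose(n_a + n_b, n_a)
  if (n_distinct < max_rearrangements) {
    "exact"
  } else if (n_distinct > max_rearrangements) {
    "approximated"
  } else {
    NA_character_  # equality is not covered by the description
  }
}

test_type_sketch(5, 5, 10)    # few rearrangements allowed
test_type_sketch(5, 5, 1000)  # many rearrangements allowed
```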
Value
The uMCWtest function reports to the console the total number of tests it will execute and how many of them are exact and approximated. It also creates a CSV file named X_uMCWtest_results.csv, where X is the user-defined identifier of the entry dataset CSV file. The X_uMCWtest_results.csv file contains four rows for each uMCW test: two for the uMCW_BIs calculated for each condition contrast (e.g., a-b and b-a), and two for the uMCW_HBIs calculated for each condition contrast. The X_uMCWtest_results.csv file includes the following columns:
User-provided informative columns to contextualize the results of each test.
Columns condition_a and condition_b indicate the two measure sets under analysis.
Columns N, n_a and n_b indicate the total number of measures and the number of measures belonging to each set after removing missing values (NAs).
Column test_type distinguishes between exact and approximated tests.
Column BI_type indicates the bias index type (uMCW_BI and uMCW_HBI) for each row of results.
Column condition_contrast indicates the set contrast (e.g., a-b or b-a) for each row of results.
Column observed_BI contains the values of uMCW_BIs and uMCW_HBIs obtained from analyzing the user-provided dataset.
Column expected_by_chance_BI_N indicates the number of data rearrangements used to calculate the expected-by-chance uMCW_BIs and uMCW_HBIs. This value is the smaller of the number of all possible measure rearrangements and the parameter max_rearrangements.
Columns pupper and plower contain the P_upper and P_lower values, respectively. They denote the fractions of expected-by-chance uMCW_BIs or uMCW_HBIs with values higher than or equal to, and lower than or equal to, the observed uMCW_BIs or uMCW_HBIs, respectively.
Details
The function uMCWtest eliminates missing values (NAs) from the dataset before proceeding with the steps described below.
To estimate the bias between the two sets of measures (e.g., a and b), the function uMCWtest performs these tasks:
It generates all possible disjoint data pairs using measures from both sets.
For each measure pair, it subtracts the second measure in the pair from the first measure in the pair.
It ranks the absolute values of all non-zero measure pair differences from lowest to highest. Measure pair differences with a value of 0 are assigned a 0 rank. If multiple measure pair differences have the same absolute value, all tied measure pair differences are assigned the lowest rank possible.
It assigns each measure pair rank a sign based on the sign of its corresponding measure pair difference.
It sums the signed ranks for measure pairs formed with measures from the two different sets (e.g., a-b and b-a).
For each type of disjoint set measure pairs (e.g., a-b and b-a), it calculates uMCW_BI by dividing the sum of signed ranks by the maximum number this sum could have if the corresponding measure pairs had the highest possible positive ranks. Consequently, uMCW_BI ranges between 1 when all the values for measures in the first set are higher than all the values from measures in the second set, and -1 when all the values for measures in the first set are lower than all the values from measures in the second set.
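The ranking and signing steps in the list above can be sketched in isolation. The helper name signed_min_ranks and the example differences are hypothetical; the sketch covers only how a vector of measure pair differences is turned into signed ranks (zero differences get rank 0, ties share the lowest rank), not how uMCWtest builds the pairs or normalizes the sums.

```r
# Hypothetical helper: signed ranks for a vector of measure pair differences.
signed_min_ranks <- function(differences) {
  ranks <- numeric(length(differences))          # zero differences keep rank 0
  nonzero <- differences != 0
  # Rank absolute values of non-zero differences; ties get the lowest rank.
  ranks[nonzero] <- rank(abs(differences[nonzero]), ties.method = "min")
  sign(differences) * ranks                      # restore the sign of each difference
}

# Made-up differences: two tied at |1|, two tied at |3|, one zero.
signed_min_ranks(c(3, -1, 0, 1, -3, 2))
#> [1]  4 -1  0  1 -4  3
```

The two differences with absolute value 1 share rank 1, the next rank used is 3, and the two differences with absolute value 3 share rank 4.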
To estimate the bias between the heterogeneity of two sets of measures, the function uMCWtest performs these tasks:
It generates all possible disjoint data pairs within each set, disregarding the order of the paired measures. For instance, the measure pair a.1-a.2 is considered equivalent to the measure pair a.2-a.1, and only the former is retained for the subsequent calculations.
For each measure pair, it subtracts the second measure from the first measure.
It ranks the absolute values of all non-zero measure pair differences from lowest to highest. Measure pair differences with a value of 0 are assigned a 0 rank. If multiple measure pair differences have the same absolute value, uMCWtest assigns all tied measure pair differences the lowest rank possible.
It sums ranks for measure pairs formed with measures from the same set (e.g., a-a and b-b).
For each type of same-set measure pairs (e.g., a-a and b-b), it divides each sum of ranks by the maximum value this sum could have if the corresponding measure pairs had the highest possible ranks.
It calculates two heterogeneity bias indexes (uMCW_HBIs) by subtracting the normalized sums of ranks from the previous step in the two possible directions (e.g., a-b and b-a). Consequently, uMCW_HBI ranges between 1 when at least two measures in the first set have distinct values and all measures in the second set have the same value, and -1 when all measures in the first set have the same value and at least two measures in the second set have distinct values.
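The first two steps of this list, forming unordered within-set pairs and taking their differences, can be sketched with base R. The helper name within_set_differences and the example values are hypothetical, and the sketch stops before the ranking and normalization steps.

```r
# Hypothetical helper: differences for all unordered pairs within one set.
within_set_differences <- function(x) {
  x <- x[!is.na(x)]                        # drop missing values first
  stopifnot(length(x) >= 2)                # at least one pair is needed
  pairs <- combn(seq_along(x), 2)          # each column: indices of one unordered pair
  diffs <- x[pairs[1, ]] - x[pairs[2, ]]   # first measure minus second measure
  names(diffs) <- paste0(pairs[1, ], "-", pairs[2, ])
  diffs
}

# Made-up set with three measures: pairs 1-2, 1-3 and 2-3 are kept,
# while their mirror images (2-1, 3-1, 3-2) are not generated.
within_set_differences(c(4, 7, 5))
#> 1-2 1-3 2-3 
#>  -3  -1   2
```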
To assess the significance of the uMCW_BIs and uMCW_HBIs obtained with the user-provided data (observed uMCW_BIs and uMCW_HBIs), the function uMCWtest performs these tasks:
It generates a collection of expected-by-chance uMCW_BIs and uMCW_HBIs. These expected values are obtained by rearranging the measures between the two sets multiple times. The user-provided parameter max_rearrangements determines which of two paths the function uMCWtest follows to generate the collection of expected-by-chance uMCW_BIs and uMCW_HBIs:
uMCW exact testing: If the number of distinct measure rearrangements that can alter their initial set distribution is less than max_rearrangements, the function uMCWtest calculates uMCW_BIs and uMCW_HBIs for all possible data rearrangements.
uMCW approximated testing: If the number of distinct measure rearrangements that can alter their initial set distribution is greater than max_rearrangements, the function uMCWtest will perform N = max_rearrangements random measure rearrangements to calculate the collection of expected-by-chance uMCW_BIs and uMCW_HBIs.
It calculates P_upper and P_lower values as the fractions of expected-by-chance uMCW_BIs and uMCW_HBIs that are higher than or equal to, and lower than or equal to, the observed uMCW_BIs and uMCW_HBIs, respectively.
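The last step can be sketched directly from its definition. The helper name p_values_sketch and the vector of expected-by-chance values are hypothetical; in uMCWtest the expected values come from the data rearrangements described above.

```r
# Hypothetical helper: fractions reported in the pupper and plower columns.
p_values_sketch <- function(observed, expected) {
  c(
    pupper = mean(expected >= observed),  # fraction higher than or equal to observed
    plower = mean(expected <= observed)   # fraction lower than or equal to observed
  )
}

# Made-up collection of ten expected-by-chance bias indexes.
expected <- c(-0.4, -0.1, 0, 0.2, 0.2, 0.5, 0.6, -0.3, 0.1, 0.3)
p_values_sketch(observed = 0.2, expected = expected)
#> pupper plower 
#>    0.5    0.7
```

Expected values equal to the observed index count toward both fractions, so pupper and plower can add up to more than 1.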
Examples
test_temp <- tempdir()
extdata_v <- system.file("extdata", "example_vertical_uMCWtest_data.csv", package = "MCWtests")
file.copy(extdata_v, test_temp)
#> [1] TRUE
extdata_h <- system.file("extdata", "example_horizontal_uMCWtest_data.csv", package = "MCWtests")
file.copy(extdata_h, test_temp)
#> [1] TRUE
# running uMCWtest with an ideal vertical entry dataset
path_v <- file.path(test_temp, "example_vertical_uMCWtest_data.csv")
uMCWtest_vertical_results <- uMCWtest(path_v, 10)
#> total number of tests: 3
#> number of exact tests: 0
#> number of approximated tests: 3
#> running approximated tests:
print(uMCWtest_vertical_results)
#> Key: <contrast, condition_a, condition_b>
#> contrast condition_a condition_b N n_a n_b test_type BI_type
#> <char> <char> <char> <int> <int> <int> <char> <char>
#> 1: I AAAA BBBB 10 5 5 approximated uMCW_BI
#> 2: I AAAA BBBB 10 5 5 approximated uMCW_BI
#> 3: I AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 4: I AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 5: II AAAA BBBB 10 5 5 approximated uMCW_BI
#> 6: II AAAA BBBB 10 5 5 approximated uMCW_BI
#> 7: II AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 8: II AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 9: III AAAA BBBB 10 5 5 approximated uMCW_BI
#> 10: III AAAA BBBB 10 5 5 approximated uMCW_BI
#> 11: III AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 12: III AAAA BBBB 10 5 5 approximated uMCW_HBI
#> condition_contrast observed_BI expected_by_chance_BI_N pupper plower
#> <char> <num> <int> <num> <num>
#> 1: AAAA-BBBB 0.03948718 10 0.1 0.9
#> 2: BBBB-AAAA -0.03948718 10 0.9 0.1
#> 3: AAAA-BBBB -0.02857143 10 0.2 0.8
#> 4: BBBB-AAAA 0.02857143 10 0.8 0.2
#> 5: AAAA-BBBB -0.69794872 10 1.0 0.0
#> 6: BBBB-AAAA 0.69794872 10 0.0 1.0
#> 7: AAAA-BBBB -0.07766990 10 0.7 0.3
#> 8: BBBB-AAAA 0.07766990 10 0.3 0.7
#> 9: AAAA-BBBB -0.68358974 10 1.0 0.0
#> 10: BBBB-AAAA 0.68358974 10 0.0 1.0
#> 11: AAAA-BBBB 0.39130435 10 0.0 1.0
#> 12: BBBB-AAAA -0.39130435 10 1.0 0.0
# running uMCWtest with an ideal horizontal entry dataset
path_h <- file.path(test_temp, "example_horizontal_uMCWtest_data.csv")
uMCWtest_horizontal_results <- uMCWtest(path_h, 10)
#> total number of tests: 9
#> number of exact tests: 0
#> number of approximated tests: 9
#> running approximated tests:
print(uMCWtest_horizontal_results)
#> Key: <contrast, contrast_trait, element_ID, element_chr, element_start, element_end, condition_a, condition_b>
#> contrast contrast_trait element_ID element_chr element_start element_end
#> <char> <char> <char> <int> <int> <int>
#> 1: I trait_a x1 1 1000 2000
#> 2: I trait_a x1 1 1000 2000
#> 3: I trait_a x1 1 1000 2000
#> 4: I trait_a x1 1 1000 2000
#> 5: I trait_a x2 1 5000 5500
#> 6: I trait_a x2 1 5000 5500
#> 7: I trait_a x2 1 5000 5500
#> 8: I trait_a x2 1 5000 5500
#> 9: I trait_a x3 1 90000 100000
#> 10: I trait_a x3 1 90000 100000
#> 11: I trait_a x3 1 90000 100000
#> 12: I trait_a x3 1 90000 100000
#> 13: II trait_b x1 1 1000 2000
#> 14: II trait_b x1 1 1000 2000
#> 15: II trait_b x1 1 1000 2000
#> 16: II trait_b x1 1 1000 2000
#> 17: II trait_b x2 1 5000 5500
#> 18: II trait_b x2 1 5000 5500
#> 19: II trait_b x2 1 5000 5500
#> 20: II trait_b x2 1 5000 5500
#> 21: II trait_b x3 1 90000 100000
#> 22: II trait_b x3 1 90000 100000
#> 23: II trait_b x3 1 90000 100000
#> 24: II trait_b x3 1 90000 100000
#> 25: III trait_b x1 1 1000 2000
#> 26: III trait_b x1 1 1000 2000
#> 27: III trait_b x1 1 1000 2000
#> 28: III trait_b x1 1 1000 2000
#> 29: III trait_b x2 1 5000 5500
#> 30: III trait_b x2 1 5000 5500
#> 31: III trait_b x2 1 5000 5500
#> 32: III trait_b x2 1 5000 5500
#> 33: III trait_b x3 1 90000 100000
#> 34: III trait_b x3 1 90000 100000
#> 35: III trait_b x3 1 90000 100000
#> 36: III trait_b x3 1 90000 100000
#> contrast contrast_trait element_ID element_chr element_start element_end
#> condition_a condition_b N n_a n_b test_type BI_type
#> <char> <char> <int> <int> <int> <char> <char>
#> 1: AAAA BBBB 9 5 4 approximated uMCW_BI
#> 2: AAAA BBBB 9 5 4 approximated uMCW_BI
#> 3: AAAA BBBB 9 5 4 approximated uMCW_HBI
#> 4: AAAA BBBB 9 5 4 approximated uMCW_HBI
#> 5: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 6: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 7: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 8: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 9: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 10: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 11: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 12: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 13: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 14: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 15: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 16: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 17: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 18: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 19: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 20: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 21: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 22: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 23: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 24: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 25: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 26: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 27: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 28: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 29: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 30: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 31: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 32: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 33: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 34: AAAA BBBB 10 5 5 approximated uMCW_BI
#> 35: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> 36: AAAA BBBB 10 5 5 approximated uMCW_HBI
#> condition_a condition_b N n_a n_b test_type BI_type
#> condition_contrast observed_BI expected_by_chance_BI_N pupper plower
#> <char> <num> <int> <num> <num>
#> 1: AAAA-BBBB 0.500800000 10 0.2 0.8
#> 2: BBBB-AAAA -0.500800000 10 0.8 0.2
#> 3: AAAA-BBBB 0.402985075 10 0.1 0.9
#> 4: BBBB-AAAA -0.402985075 10 0.9 0.1
#> 5: AAAA-BBBB -0.066153846 10 0.4 0.6
#> 6: BBBB-AAAA 0.066153846 10 0.6 0.4
#> 7: AAAA-BBBB 0.064039409 10 0.2 0.8
#> 8: BBBB-AAAA -0.064039409 10 0.8 0.2
#> 9: AAAA-BBBB 0.161538462 10 0.2 0.9
#> 10: BBBB-AAAA -0.161538462 10 0.9 0.2
#> 11: AAAA-BBBB -0.043062201 10 0.6 0.5
#> 12: BBBB-AAAA 0.043062201 10 0.5 0.6
#> 13: AAAA-BBBB -0.649743590 10 0.9 0.1
#> 14: BBBB-AAAA 0.649743590 10 0.1 0.9
#> 15: AAAA-BBBB 0.024390244 10 0.2 0.8
#> 16: BBBB-AAAA -0.024390244 10 0.8 0.2
#> 17: AAAA-BBBB -0.778974359 10 1.0 0.0
#> 18: BBBB-AAAA 0.778974359 10 0.0 1.0
#> 19: AAAA-BBBB -0.123809524 10 1.0 0.0
#> 20: BBBB-AAAA 0.123809524 10 0.0 1.0
#> 21: AAAA-BBBB -0.801538462 10 1.0 0.0
#> 22: BBBB-AAAA 0.801538462 10 0.0 1.0
#> 23: AAAA-BBBB 0.009803922 10 0.3 0.7
#> 24: BBBB-AAAA -0.009803922 10 0.7 0.3
#> 25: AAAA-BBBB -0.807692308 10 1.0 0.0
#> 26: BBBB-AAAA 0.807692308 10 0.0 1.0
#> 27: AAAA-BBBB 0.303921569 10 0.0 1.0
#> 28: BBBB-AAAA -0.303921569 10 1.0 0.0
#> 29: AAAA-BBBB -0.558461538 10 1.0 0.0
#> 30: BBBB-AAAA 0.558461538 10 0.0 1.0
#> 31: AAAA-BBBB 0.288888889 10 0.1 0.9
#> 32: BBBB-AAAA -0.288888889 10 0.9 0.1
#> 33: AAAA-BBBB -0.576923077 10 1.0 0.0
#> 34: BBBB-AAAA 0.576923077 10 0.0 1.0
#> 35: AAAA-BBBB 0.527777778 10 0.1 0.9
#> 36: BBBB-AAAA -0.527777778 10 0.9 0.1
#> condition_contrast observed_BI expected_by_chance_BI_N pupper plower
rm(test_temp)