The mbMCW test is a statistical tool to assess whether two sets of inherently matched-paired measures are significantly differentially biased in the same direction. For instance, mbMCW tests can be used to analyze bodyweights or transcript abundances determined at two different timepoints for two sets of mice that have been exposed to different conditions.
Format
When executing the mbMCWtest function, users must provide the path to a local CSV file named X_mbMCWtest_data.csv, where X serves as a user-defined identifier. X_mbMCWtest_data.csv can be structured in two distinct formats:
Vertical layout: This format allows appending datasets with varying structures, such as different numbers of matched-pairs in each set or in each appended test. Vertical entry datasets should include the following columns:
Columns matched_condition_a and matched_condition_b uniquely identify the two conditions under which matched-paired measures were collected.
Column unmatched_condition uniquely identifies the two sets of matched-paired measures under analysis.
Columns value_a and value_b contain the actual measures under analysis.
As many informative columns as needed by users to contextualize the results of each test. The names of these columns should not contain the terms condition or value. While these columns are optional when running a single test, at least one column is required when running multiple tests simultaneously. All rows for each individual test must contain the same information in these columns.
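As a rough illustration of the vertical layout, the sketch below builds a small entry dataset and writes it to a CSV file. The measures, the informative column name contrast, and the file identifier demo are made up for the example; only the required column names come from the description above.

```r
# One row per matched pair; values are invented for illustration only.
vertical <- data.frame(
  contrast            = "I",      # informative column (hypothetical name)
  matched_condition_a = "AAAA",
  matched_condition_b = "BBBB",
  unmatched_condition = rep(c("XXXX", "YYYY"), each = 3),
  value_a             = c(5.1, 7.0, 9.2, 4.4, 6.3, 8.8),
  value_b             = c(3.0, 8.1, 4.5, 4.4, 9.0, 2.7)
)

# File name follows the X_mbMCWtest_data.csv pattern with X = "demo".
csv_path <- file.path(tempdir(), "demo_mbMCWtest_data.csv")
write.csv(vertical, csv_path, row.names = FALSE)
```

Every row of this single test carries the same value in the informative column, as required when several tests are appended to one file.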
Horizontal layout: This format allows appending datasets with similar structures, such as the same number of matched-paired measures collected for two conditions. Horizontal entry datasets should include the following columns:
Columns matched_condition_a and matched_condition_b uniquely identify the two conditions under which matched-paired measures were collected.
Columns unmatched_condition_x and unmatched_condition_y uniquely identify the two different sets of matched-paired measures under analysis.
Columns x.a.i, y.a.i, x.b.i and y.b.i, where i represents integers to differentiate each specific matched-pair of measures, contain the actual measures under analysis.
As many informative columns as needed by users to contextualize the results of each test. The names of these columns should not contain the term condition or have the same structure as the x.a.i, y.a.i, x.b.i and y.b.i columns. While these columns are optional when running a single test, at least one column is required when running multiple tests simultaneously.
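For comparison, a horizontal entry dataset holds one test per row. The sketch below is illustrative only: the measures and the informative column name contrast are invented, and the order of the measure columns is an assumption, since only their naming pattern is described above.

```r
# One row per test; two matched pairs per set (i = 1, 2). Values are invented.
horizontal <- data.frame(
  contrast              = "I",     # informative column (hypothetical name)
  matched_condition_a   = "AAAA",
  matched_condition_b   = "BBBB",
  unmatched_condition_x = "XXXX",
  unmatched_condition_y = "YYYY",
  x.a.1 = 5.1, x.a.2 = 7.0,        # set x, condition a
  x.b.1 = 3.0, x.b.2 = 8.1,        # set x, condition b
  y.a.1 = 4.4, y.a.2 = 6.3,        # set y, condition a
  y.b.1 = 4.4, y.b.2 = 9.0         # set y, condition b
)

# Measure columns are the ones matching the x.a.i / y.a.i / x.b.i / y.b.i pattern.
measure_cols <- grep("^[xy]\\.[ab]\\.[0-9]+$", names(horizontal), value = TRUE)
```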
Arguments
- path
Path to the local CSV file containing the entry dataset formatted for mbMCW tests.
- max_rearrangements
User-defined maximum number of rearrangements of the dataset used by the function mbMCWtest to generate a collection of expected-by-chance mbMCW_BIs and estimate the statistical significance of observed mbMCW_BIs. If the number of distinct dataset rearrangements is less than max_rearrangements, mbMCWtest calculates mbMCW_BIs for all possible data rearrangements. If the number of distinct dataset rearrangements is greater than max_rearrangements, mbMCWtest will perform N = max_rearrangements random measure rearrangements to calculate the collection of expected-by-chance mbMCW_BIs.
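The choice between the two paths can be pictured with the short sketch below. It rests on an assumption not confirmed here: that the number of distinct rearrangements equals the number of ways to split N matched pairs into sets of size N_x and N_y, that is choose(N, N_x). The helper guess_test_type is hypothetical and is not part of the package.

```r
# Hypothetical helper, not the package's code.
# ASSUMPTION: distinct rearrangements = choose(n_x + n_y, n_x).
guess_test_type <- function(n_x, n_y, max_rearrangements) {
  n_distinct <- choose(n_x + n_y, n_x)
  list(
    # The equality case is not described above; it is treated as approximated here.
    test_type = if (n_distinct < max_rearrangements) "exact" else "approximated",
    # Number of rearrangements actually evaluated: the smaller of the two.
    n_used = min(n_distinct, max_rearrangements)
  )
}

small_cap <- guess_test_type(5, 5, 10)    # 252 possible splits, cap of 10
large_cap <- guess_test_type(5, 5, 1000)  # 252 possible splits, cap of 1000
```

Under this assumption, five pairs per set with max_rearrangements = 10 falls on the approximated path, while raising the cap to 1000 would allow an exact test over all 252 splits.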
Value
The mbMCWtest function reports to the console the total number of tests it will execute, and how many of them are exact and how many are approximated. It also creates a CSV file named X_mbMCWtest_results.csv, where X is the user-defined identifier of the entry dataset CSV file. The X_mbMCWtest_results.csv file contains four rows for each mbMCW test, with mbMCW_BIs calculated for each possible combination of matched and unmatched condition contrasts (e.g., a-b or b-a combined with x-y or y-x). The X_mbMCWtest_results.csv file includes the following columns:
User-provided informative columns to contextualize the results of each test.
Columns matched_condition_a and matched_condition_b indicate the conditions for which matched-paired measures were provided.
Columns unmatched_condition_x and unmatched_condition_y indicate the two sets of matched-paired measures.
Columns N, N_x and N_y indicate the total number of matched-paired measures, and their distribution between the two unmatched sets after removing any matched-pair with missing values (NAs).
Column test_type distinguishes between exact and approximated tests.
Column BI_type indicates mbMCW_BI.
Columns matched_condition_contrast and unmatched_condition_contrast indicate the matched and unmatched condition contrasts for each row of results.
Column observed_BI contains the value of mbMCW_BIs obtained from analyzing the user-provided dataset.
Column expected_by_chance_BI_N indicates the number of data rearrangements used to calculate the expected-by-chance mbMCW_BIs. This value corresponds to the smaller of the number of all possible measure rearrangements and the parameter max_rearrangements.
Columns pupper and plower represent the P~upper~ and P~lower~ values, respectively. They denote the fraction of expected-by-chance mbMCW_BIs with values higher or equal to and lower or equal to the observed mbMCW_BIs, respectively.
Details
The function mbMCWtest eliminates any matched-paired measures with at least one missing value (NA) before proceeding with the following steps.
To estimate the differential bias between the two sets of matched-paired measures in the dataset, the function mbMCWtest performs the following tasks:
For each matched-paired measure, it subtracts the values for the two possible matched condition contrasts (e.g., a-b and b-a).
For each matched condition contrast, it ranks the absolute values of non-zero differences from lowest to highest. Measure pair differences with a value of 0 are assigned a 0 rank. If multiple measure pair differences have the same absolute value, all tied measure pair differences are assigned the lowest rank possible.
It assigns each measure pair rank a sign based on the sign of its corresponding measure pair difference.
For each set of matched-paired measures (e.g., x and y), it sums the signed ranks for each matched condition contrast (e.g., a-b and b-a).
For each set of matched-paired measures (e.g., x and y) and each matched condition contrast (e.g., a-b and b-a), it calculates one mbMCW_BI. This value is obtained by dividing each sum of signed ranks by the maximum number this sum could have if the corresponding measure pairs had the highest possible positive ranks. Consequently, mbMCW_BI ranges between 1 when all the values for matched-pair measure differences in the set under analysis have the highest positive values, and -1 when all the values for matched-pair measure differences in the set under analysis have the lowest negative values.
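The ranking and normalization steps above can be sketched in a few lines of R. This is an illustrative reading, not the package's implementation: the helper signed_rank_bi is hypothetical, the data are invented, and the denominator is assumed to be the sum of the largest ranks that a set of that size could hold. The sketch stops at one value per set and does not attempt to reproduce how the x-y and y-x rows of the results file are derived.

```r
# Hypothetical helper following the steps described above.
signed_rank_bi <- function(value_a, value_b, set) {
  d  <- value_a - value_b                  # a-b contrast; swap inputs for b-a
  r  <- numeric(length(d))                 # zero differences keep rank 0
  nz <- d != 0
  r[nz] <- rank(abs(d[nz]), ties.method = "min")  # ties share the lowest rank
  signed <- sign(d) * r                    # rank takes the sign of its difference
  sapply(split(seq_along(d), set), function(idx) {
    # ASSUMED denominator: sum of the length(idx) highest ranks in the dataset.
    max_sum <- sum(sort(r, decreasing = TRUE)[seq_along(idx)])
    sum(signed[idx]) / max_sum
  })
}

# Invented measures: three matched pairs in set x, three in set y.
a   <- c(5, 7, 9, 4, 6, 8)
b   <- c(3, 8, 4, 4, 9, 2)
set <- rep(c("x", "y"), each = 3)

bi_ab <- signed_rank_bi(a, b, set)         # contrast a-b
bi_ba <- signed_rank_bi(b, a, set)         # contrast b-a
```

Here the differences are 2, -1, 5, 0, -3 and 6, so the signed ranks are 2, -1, 4, 0, -3 and 5. Set x sums to 5 and set y to 2, and both are divided by 5 + 4 + 3 = 12. Swapping the contrast only flips the signs.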
To assess the significance of the mbMCW_BIs obtained from the user-provided dataset (observed mbMCW_BIs), the function mbMCWtest performs the following tasks:
It generates a collection of expected-by-chance mbMCW_BIs. These expected values are obtained by rearranging the matched-pair measures between the two sets multiple times. The user-provided parameter max_rearrangements determines the two paths the function mbMCWtest can follow to generate the collection of expected-by-chance mbMCW_BIs:
mbMCW exact testing: If the number of distinct matched-paired measure rearrangements that can alter their initial set distribution is less than max_rearrangements, the function mbMCWtest calculates mbMCW_BIs for all possible data rearrangements.
mbMCW approximated testing: If the number of distinct matched-paired measure rearrangements that can alter their initial set distribution is greater than max_rearrangements, the function mbMCWtest performs N = max_rearrangements random measure rearrangements to calculate the collection of expected-by-chance mbMCW_BIs.
It calculates P~upper~ and P~lower~ values as the fraction of expected-by-chance mbMCW_BIs that are higher or equal to and lower or equal to the observed mbMCW_BIs, respectively.
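A minimal sketch of the approximated path is shown below, assuming that a rearrangement amounts to shuffling which matched pairs belong to each set while keeping the set sizes fixed. The helper set_bi, the differences and the value of max_rearrangements are all illustrative, and the package may sample or count rearrangements differently.

```r
# Hypothetical helper: bias index of one set from matched-pair differences,
# using the same assumed normalization as the ranking steps in Details.
set_bi <- function(d, in_set) {
  r  <- numeric(length(d))
  nz <- d != 0
  r[nz] <- rank(abs(d[nz]), ties.method = "min")
  max_sum <- sum(sort(r, decreasing = TRUE)[seq_len(sum(in_set))])
  sum(sign(d[in_set]) * r[in_set]) / max_sum
}

set.seed(1)                                # only to make the sketch repeatable
d    <- c(2, -1, 5, 4, -3, 6, -7, 8)       # invented a-b differences
in_x <- rep(c(TRUE, FALSE), each = 4)      # first four pairs belong to set x
max_rearrangements <- 50                   # illustrative cap

observed <- set_bi(d, in_x)

# Expected-by-chance values: shuffle set membership, keep set sizes fixed.
expected <- replicate(max_rearrangements, set_bi(d, sample(in_x)))

pupper <- mean(expected >= observed)       # fraction higher or equal
plower <- mean(expected <= observed)       # fraction lower or equal
```

Because both fractions count values equal to the observed one, pupper and plower can add up to more than 1.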
Examples
test_temp <- tempdir()
extdata_v <- system.file("extdata", "example_vertical_mbMCWtest_data.csv", package = "MCWtests")
file.copy(extdata_v, test_temp)
#> [1] TRUE
extdata_h <- system.file("extdata", "example_horizontal_mbMCWtest_data.csv", package = "MCWtests")
file.copy(extdata_h, test_temp)
#> [1] TRUE
# running mbMCWtest with an ideal vertical entry dataset
path_v <- file.path(test_temp, "example_vertical_mbMCWtest_data.csv")
mbMCWtest_vertical_results <- mbMCWtest(path_v, 10)
#> total number of tests: 2
#> number of exact tests: 0
#> number of approximated tests: 2
#> running approximated tests:
print(mbMCWtest_vertical_results)
#> Key: <contrast, matched_condition_a, matched_condition_b, unmatched_condition_x, unmatched_condition_y>
#> contrast matched_condition_a matched_condition_b unmatched_condition_x
#> <char> <char> <char> <char>
#> 1: I AAAA BBBB XXXX
#> 2: I AAAA BBBB XXXX
#> 3: I AAAA BBBB XXXX
#> 4: I AAAA BBBB XXXX
#> 5: II AAAA BBBB XXXX
#> 6: II AAAA BBBB XXXX
#> 7: II AAAA BBBB XXXX
#> 8: II AAAA BBBB XXXX
#> unmatched_condition_y N N_x N_y test_type BI_type
#> <char> <int> <int> <int> <char> <char>
#> 1: YYYY 10 5 5 approximated mbMCW_BI
#> 2: YYYY 10 5 5 approximated mbMCW_BI
#> 3: YYYY 10 5 5 approximated mbMCW_BI
#> 4: YYYY 10 5 5 approximated mbMCW_BI
#> 5: YYYY 10 5 5 approximated mbMCW_BI
#> 6: YYYY 10 5 5 approximated mbMCW_BI
#> 7: YYYY 10 5 5 approximated mbMCW_BI
#> 8: YYYY 10 5 5 approximated mbMCW_BI
#> matched_condition_contrast unmatched_condition_contrast observed_BI
#> <char> <char> <num>
#> 1: AAAA-BBBB XXXX-YYYY 0.01886792
#> 2: AAAA-BBBB YYYY-XXXX -0.01886792
#> 3: BBBB-AAAA XXXX-YYYY -0.01886792
#> 4: BBBB-AAAA YYYY-XXXX 0.01886792
#> 5: AAAA-BBBB XXXX-YYYY 0.64705882
#> 6: AAAA-BBBB YYYY-XXXX -0.64705882
#> 7: BBBB-AAAA XXXX-YYYY -0.64705882
#> 8: BBBB-AAAA YYYY-XXXX 0.64705882
#> expected_by_chance_BI_N pupper plower
#> <int> <num> <num>
#> 1: 10 0.5 0.5
#> 2: 10 0.5 0.5
#> 3: 10 0.5 0.5
#> 4: 10 0.5 0.5
#> 5: 10 0.0 1.0
#> 6: 10 1.0 0.0
#> 7: 10 1.0 0.0
#> 8: 10 0.0 1.0
# running mbMCWtest with an ideal horizontal entry dataset
path_h <- file.path(test_temp, "example_horizontal_mbMCWtest_data.csv")
mbMCWtest_horizontal_results <- mbMCWtest(path_h, 10)
#> total number of tests: 6
#> number of exact tests: 0
#> number of approximated tests: 6
#> running approximated tests:
print(mbMCWtest_horizontal_results)
#> Key: <contrast, contrast_trait, element_ID, element_chr, element_start, element_end, matched_condition_a, matched_condition_b, unmatched_condition_x, unmatched_condition_y>
#> contrast contrast_trait element_ID element_chr element_start element_end
#> <char> <char> <char> <int> <int> <int>
#> 1: I trait_a x1 1 1000 2000
#> 2: I trait_a x1 1 1000 2000
#> 3: I trait_a x1 1 1000 2000
#> 4: I trait_a x1 1 1000 2000
#> 5: I trait_a x2 1 5000 5500
#> 6: I trait_a x2 1 5000 5500
#> 7: I trait_a x2 1 5000 5500
#> 8: I trait_a x2 1 5000 5500
#> 9: I trait_a x3 1 90000 100000
#> 10: I trait_a x3 1 90000 100000
#> 11: I trait_a x3 1 90000 100000
#> 12: I trait_a x3 1 90000 100000
#> 13: II trait_b x1 1 1000 2000
#> 14: II trait_b x1 1 1000 2000
#> 15: II trait_b x1 1 1000 2000
#> 16: II trait_b x1 1 1000 2000
#> 17: II trait_b x2 1 5000 5500
#> 18: II trait_b x2 1 5000 5500
#> 19: II trait_b x2 1 5000 5500
#> 20: II trait_b x2 1 5000 5500
#> 21: II trait_b x3 1 90000 100000
#> 22: II trait_b x3 1 90000 100000
#> 23: II trait_b x3 1 90000 100000
#> 24: II trait_b x3 1 90000 100000
#> contrast contrast_trait element_ID element_chr element_start element_end
#> matched_condition_a matched_condition_b unmatched_condition_x
#> <char> <char> <char>
#> 1: AAAA BBBB XXXX
#> 2: AAAA BBBB XXXX
#> 3: AAAA BBBB XXXX
#> 4: AAAA BBBB XXXX
#> 5: AAAA BBBB XXXX
#> 6: AAAA BBBB XXXX
#> 7: AAAA BBBB XXXX
#> 8: AAAA BBBB XXXX
#> 9: AAAA BBBB XXXX
#> 10: AAAA BBBB XXXX
#> 11: AAAA BBBB XXXX
#> 12: AAAA BBBB XXXX
#> 13: AAAA BBBB XXXX
#> 14: AAAA BBBB XXXX
#> 15: AAAA BBBB XXXX
#> 16: AAAA BBBB XXXX
#> 17: AAAA BBBB XXXX
#> 18: AAAA BBBB XXXX
#> 19: AAAA BBBB XXXX
#> 20: AAAA BBBB XXXX
#> 21: AAAA BBBB XXXX
#> 22: AAAA BBBB XXXX
#> 23: AAAA BBBB XXXX
#> 24: AAAA BBBB XXXX
#> matched_condition_a matched_condition_b unmatched_condition_x
#> unmatched_condition_y N N_x N_y test_type BI_type
#> <char> <int> <int> <int> <char> <char>
#> 1: YYYY 10 5 5 approximated mbMCW_BI
#> 2: YYYY 10 5 5 approximated mbMCW_BI
#> 3: YYYY 10 5 5 approximated mbMCW_BI
#> 4: YYYY 10 5 5 approximated mbMCW_BI
#> 5: YYYY 10 5 5 approximated mbMCW_BI
#> 6: YYYY 10 5 5 approximated mbMCW_BI
#> 7: YYYY 10 5 5 approximated mbMCW_BI
#> 8: YYYY 10 5 5 approximated mbMCW_BI
#> 9: YYYY 10 5 5 approximated mbMCW_BI
#> 10: YYYY 10 5 5 approximated mbMCW_BI
#> 11: YYYY 10 5 5 approximated mbMCW_BI
#> 12: YYYY 10 5 5 approximated mbMCW_BI
#> 13: YYYY 10 5 5 approximated mbMCW_BI
#> 14: YYYY 10 5 5 approximated mbMCW_BI
#> 15: YYYY 10 5 5 approximated mbMCW_BI
#> 16: YYYY 10 5 5 approximated mbMCW_BI
#> 17: YYYY 10 5 5 approximated mbMCW_BI
#> 18: YYYY 10 5 5 approximated mbMCW_BI
#> 19: YYYY 10 5 5 approximated mbMCW_BI
#> 20: YYYY 10 5 5 approximated mbMCW_BI
#> 21: YYYY 10 5 5 approximated mbMCW_BI
#> 22: YYYY 10 5 5 approximated mbMCW_BI
#> 23: YYYY 10 5 5 approximated mbMCW_BI
#> 24: YYYY 10 5 5 approximated mbMCW_BI
#> unmatched_condition_y N N_x N_y test_type BI_type
#> matched_condition_contrast unmatched_condition_contrast observed_BI
#> <char> <char> <num>
#> 1: AAAA-BBBB XXXX-YYYY -0.18181818
#> 2: AAAA-BBBB YYYY-XXXX 0.18181818
#> 3: BBBB-AAAA XXXX-YYYY 0.18181818
#> 4: BBBB-AAAA YYYY-XXXX -0.18181818
#> 5: AAAA-BBBB XXXX-YYYY -0.18181818
#> 6: AAAA-BBBB YYYY-XXXX 0.18181818
#> 7: BBBB-AAAA XXXX-YYYY 0.18181818
#> 8: BBBB-AAAA YYYY-XXXX -0.18181818
#> 9: AAAA-BBBB XXXX-YYYY -0.09803922
#> 10: AAAA-BBBB YYYY-XXXX 0.09803922
#> 11: BBBB-AAAA XXXX-YYYY 0.09803922
#> 12: BBBB-AAAA YYYY-XXXX -0.09803922
#> 13: AAAA-BBBB XXXX-YYYY 0.88888889
#> 14: AAAA-BBBB YYYY-XXXX -0.88888889
#> 15: BBBB-AAAA XXXX-YYYY -0.88888889
#> 16: BBBB-AAAA YYYY-XXXX 0.88888889
#> 17: AAAA-BBBB XXXX-YYYY 0.41818182
#> 18: AAAA-BBBB YYYY-XXXX -0.41818182
#> 19: BBBB-AAAA XXXX-YYYY -0.41818182
#> 20: BBBB-AAAA YYYY-XXXX 0.41818182
#> 21: AAAA-BBBB XXXX-YYYY 0.69811321
#> 22: AAAA-BBBB YYYY-XXXX -0.69811321
#> 23: BBBB-AAAA XXXX-YYYY -0.69811321
#> 24: BBBB-AAAA YYYY-XXXX 0.69811321
#> matched_condition_contrast unmatched_condition_contrast observed_BI
#> expected_by_chance_BI_N pupper plower
#> <int> <num> <num>
#> 1: 10 0.7 0.3
#> 2: 10 0.3 0.7
#> 3: 10 0.3 0.7
#> 4: 10 0.7 0.3
#> 5: 10 0.8 0.2
#> 6: 10 0.2 0.8
#> 7: 10 0.2 0.8
#> 8: 10 0.8 0.2
#> 9: 10 0.7 0.3
#> 10: 10 0.3 0.7
#> 11: 10 0.3 0.7
#> 12: 10 0.7 0.3
#> 13: 10 0.0 1.0
#> 14: 10 1.0 0.0
#> 15: 10 1.0 0.0
#> 16: 10 0.0 1.0
#> 17: 10 0.0 1.0
#> 18: 10 1.0 0.0
#> 19: 10 1.0 0.0
#> 20: 10 0.0 1.0
#> 21: 10 0.0 1.0
#> 22: 10 1.0 0.0
#> 23: 10 1.0 0.0
#> 24: 10 0.0 1.0
#> expected_by_chance_BI_N pupper plower
rm(test_temp)