Computes a pairwise similarity or distance matrix between samples (columns) using one of several metrics, with support for parallel computation. Suitable for large-scale genomic data.

Usage

correlation(
  matrix,
  method = c("pearson", "spearman", "cosin", "euclidean"),
  cpu_num = 8
)

Arguments

matrix

Numeric matrix where columns represent samples and rows represent features

method

Similarity/distance measure: "pearson", "spearman", "cosin", or "euclidean" (default: "pearson")

cpu_num

Number of CPU cores to use for parallel computation (default: 8)

Value

A symmetric similarity/distance matrix with dimensions ncol(matrix) x ncol(matrix), where row and column names match the input matrix column names.

Details

Available similarity/distance measures:

  • Pearson correlation (linear relationship)

  • Spearman correlation (rank-based relationship)

  • Cosine similarity (angle between vectors)

  • Euclidean distance (geometric distance)
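For orientation, the four measures can be reproduced on a toy matrix with base R alone. This is a minimal sketch of the standard definitions, not the package's internal code; the matrix `m` and the object names are illustrative assumptions.

```r
# Toy input: 5 features (rows) x 3 samples (columns); values are arbitrary
set.seed(1)
m <- matrix(rnorm(15), nrow = 5,
            dimnames = list(NULL, c("s1", "s2", "s3")))

# Pearson and Spearman: stats::cor() correlates columns with each other
pearson  <- cor(m, method = "pearson")
spearman <- cor(m, method = "spearman")

# Cosine similarity: column dot products divided by the product of column norms
norms  <- sqrt(colSums(m^2))
cosine <- crossprod(m) / outer(norms, norms)

# Euclidean distance: dist() works on rows, so transpose to compare columns
euclid <- as.matrix(dist(t(m), method = "euclidean"))
```

Each result is a symmetric 3 x 3 matrix whose row and column names are the sample names. The three similarity measures have 1 on the diagonal, while the Euclidean distance matrix has 0.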

The function uses the parallel package to accelerate calculations on large matrices.
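One way such a computation can be spread over workers is sketched below, using only functions from the parallel package. How correlation() actually divides the work is not documented here, so the per-column split, the worker count of 2, and all object names are assumptions made for illustration.

```r
library(parallel)

# Toy input: 10 features (rows) x 4 samples (columns); values are arbitrary
set.seed(1)
m <- matrix(rnorm(40), nrow = 10,
            dimnames = list(NULL, paste0("s", 1:4)))

# Start a small local cluster (2 workers chosen only for the example)
cl <- makeCluster(2)

# Each task correlates one sample against all samples;
# the matrix is passed to the workers as an extra argument
cols <- parLapply(cl, seq_len(ncol(m)), function(j, mat) {
  stats::cor(mat, mat[, j], method = "pearson")[, 1]
}, mat = m)

# Always release the workers when done
stopCluster(cl)

# Reassemble the per-column results into a samples x samples matrix
par_cor <- do.call(cbind, cols)
dimnames(par_cor) <- list(colnames(m), colnames(m))
```

The result agrees with `cor(m)` up to floating-point tolerance. On a shared machine it is worth checking `parallel::detectCores()` before requesting as many workers as the default `cpu_num` of 8.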

See also

cor for correlation calculations, makeCluster for parallel computation setup

Author

Bin Duan (binduan@sjtu.edu.cn)

Examples

if (FALSE) { # \dontrun{
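# Using example data from scLearn package
library(scLearn)
# logcounts() comes from SingleCellExperiment; this assumes QueryCellData
# is a SingleCellExperiment object
library(SingleCellExperiment)
data(QueryCellData)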

# Get normalized expression matrix
norm_expr <- logcounts(QueryCellData)

# Calculate Pearson correlation
cor_mat <- correlation(
  matrix = norm_expr,
  method = "pearson",
  cpu_num = 4
)

# Calculate cosine similarity
cos_mat <- correlation(
  matrix = norm_expr,
  method = "cosin",
  cpu_num = 2
)

# Visualize results
heatmap(cor_mat, symm = TRUE)
} # }