Implements a metric learning algorithm based on Multi-Similarity Loss (MSL) as an alternative to the triplet loss used in Large Margin Nearest Neighbor (LMNN). It improves learning from a limited number of sample pairs by considering multiple similarity cues, and it penalizes overfitting through L2 regularization. Designed for single-cell expression data.

Usage

runMSL(
  high_varGenes,
  expression_profile,
  sample_information,
  alpha = 2,
  beta = 50,
  margin = 0.1,
  lambda = 1e-04,
  learn_rate = 1e-05,
  max_iter = 200,
  seed = 1,
  verbose = TRUE
)
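
As a rough illustration of the input shapes described under Arguments, the sketch below builds small synthetic inputs and calls runMSL only if it is available in the session. All values are made up, and the use of row names as gene identifiers is an assumption rather than something this page states.

```r
# Toy inputs with the shapes described on this page (all values are synthetic).
set.seed(1)
n_genes <- 20
n_cells <- 12

# genes x cells matrix; row names as gene identifiers are an assumption
expression_profile <- matrix(
  as.numeric(rpois(n_genes * n_cells, lambda = 5)),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# named vector of labels; names match the column names of expression_profile
sample_information <- setNames(
  rep(c("typeA", "typeB", "typeC"), each = 4),
  colnames(expression_profile)
)

# a subset of row names standing in for highly variable genes
high_varGenes <- rownames(expression_profile)[1:10]

# Call the function only if the package providing it has been loaded.
if (exists("runMSL")) {
  res <- runMSL(high_varGenes, expression_profile, sample_information,
                max_iter = 50, verbose = FALSE)
}
```

The remaining arguments are left at the defaults shown in the signature above.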

Arguments

high_varGenes

Character vector of highly variable gene names.

expression_profile

A numeric matrix of gene expression values (genes x cells).

sample_information

A named vector of cell type labels (names match column names of expression_profile).

alpha

Scaling parameter for positive similarities (default: 2.0).

beta

Scaling parameter for negative similarities (default: 50.0).

margin

Margin threshold to filter informative pairs (default: 0.1).

lambda

L2 regularization strength (default: 1e-4).

learn_rate

Learning rate for gradient descent (default: 1e-5).

max_iter

Maximum number of optimization iterations (default: 200).

seed

Random seed (default: 1).

verbose

Logical; whether to print progress information (default: TRUE).
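
To make the roles of alpha, beta, margin and lambda more concrete, here is a minimal sketch of a Multi-Similarity-style objective for a linear map. It is not the package's implementation. The function name ms_loss_sketch, the use of cosine similarity, and the thresh offset are all assumptions introduced for illustration; thresh in particular is not an argument of runMSL.

```r
# Sketch of a Multi-Similarity-style loss for a linear map L (illustrative only).
# X: genes x cells matrix, labels: one label per column of X, L: d x genes matrix.
# `thresh` is a hypothetical similarity offset; it is not an argument of runMSL.
ms_loss_sketch <- function(L, X, labels, alpha = 2, beta = 50,
                           margin = 0.1, lambda = 1e-4, thresh = 0.5) {
  Z <- L %*% X                                        # embed cells (columns)
  Z <- sweep(Z, 2, sqrt(colSums(Z^2)) + 1e-12, "/")   # scale columns to unit length
  S <- crossprod(Z)                                   # cosine similarity, cells x cells
  n <- ncol(S)
  total <- 0
  for (i in seq_len(n)) {
    pos <- setdiff(which(labels == labels[i]), i)     # same label, excluding self
    neg <- which(labels != labels[i])                 # different label
    if (length(pos) == 0 || length(neg) == 0) next
    # Pair mining: keep negatives that come within `margin` of the weakest
    # positive, and positives that come within `margin` of the strongest negative.
    neg_keep <- neg[S[i, neg] + margin > min(S[i, pos])]
    pos_keep <- pos[S[i, pos] - margin < max(S[i, neg])]
    if (length(pos_keep) == 0 || length(neg_keep) == 0) next
    # alpha scales the positive-pair term, beta the negative-pair term.
    pos_term <- log1p(sum(exp(-alpha * (S[i, pos_keep] - thresh)))) / alpha
    neg_term <- log1p(sum(exp(beta * (S[i, neg_keep] - thresh)))) / beta
    total <- total + pos_term + neg_term
  }
  total / n + lambda * sum(L^2)                       # add the L2 (Frobenius) penalty
}

# Small synthetic example: 5 genes, 8 cells, two labels, identity map.
set.seed(1)
X <- matrix(rnorm(5 * 8), nrow = 5)
labels <- rep(c("a", "b"), each = 4)
L <- diag(5)
loss_value <- ms_loss_sketch(L, X, labels)
```

Under these assumptions, a gradient-descent optimizer would repeatedly update L by a step of size learn_rate against the gradient of this objective, for at most max_iter iterations.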

Value

A list containing:

  • expression_profile_trans: Transformed expression matrix (genes x cells).

  • expression_profile_origin: Original filtered expression matrix (genes x cells).

  • trans_matrix: Learned transformation matrix (L).

  • sample_information: Sorted cell type labels.
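
One possible way to use the returned list downstream is sketched below. Because the function may not be available here, res is a stand-in list with random values and only two of the documented fields. The sketch also assumes that the returned labels keep cell names matching the matrix columns, which this page does not confirm.

```r
# Stand-in for the list returned by runMSL (random values, same field names).
set.seed(1)
genes <- paste0("gene", 1:6)
cells <- paste0("cell", 1:8)
res <- list(
  expression_profile_trans = matrix(rnorm(48), nrow = 6,
                                    dimnames = list(genes, cells)),
  sample_information = setNames(rep(c("typeA", "typeB"), each = 4), cells)
)

# Cells are columns, so transpose before computing cell-to-cell distances.
cell_dist <- dist(t(res$expression_profile_trans))
clusters <- cutree(hclust(cell_dist, method = "average"), k = 2)

# Cross-tabulate clusters against the labels, matching cells by name.
comparison <- table(cluster = clusters,
                    label = res$sample_information[names(clusters)])
print(comparison)
```

Running the same steps on expression_profile_origin gives a baseline for judging whether the learned transformation separates cell types more clearly.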