Skip to content

How to use omp in Rcpp to speed up changes to the value of big.matrix? #118

@Sherry520

Description

@Sherry520

I am trying to compute the Hadamard product of all column combinations of the two matrices (epi1,epi2) to form a new matrix that far exceeds the maximum number of columns supported by R and takes up a lot of memory. So I decided to use big.memory to store this matrix on disk (I have not yet tested whether big.matrix supports matrices with more than 2^32-1 columns). However, this calculation process is time-consuming if the for loop is used, and foreach in R and dopar cannot operate on big.matrix, so I decided to use Rcpp interface, use C++ language for calculation, and use openmpi to speed up the calculation process. I'm really not familiar with C++, and I use AI tools to help me program, but it doesn't seem believable.

Problem I encountered:

  • [1]. How to assign the final calculated variable result to big.matrix?
  • [2]. Whether return epi_data is required here, or whether it will automatically modify the external big.matrix?

R code:

epi_data <- filebacked.big.matrix(
  nrow = nrow(epi1),
  ncol = ncol(combos),
  type = "double", # char是c++的单个字符
  backingfile = "epi_data.bin", 
  backingpath = dirname("epi_datai"), 
  descriptorfile = "epi_data.des",
  dimnames = c(NULL, NULL)
)

R code:(This is the algorithm I'm going to implement with C++)

epi_data <- foreach(x=iter(combos, by='col'),.combine = "cbind",.inorder = TRUE) %dopar% {
  TRAN %*% (epi1[, x[1]] * epi2[, x[2]])
}

C++ code:

// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(bigmemory)]]

#include <RcppArmadillo.h>
#include <bigmemory/MatrixAccessor.hpp>
#include <omp.h>


using namespace std;
using namespace Rcpp;

// epi_data: big.matrix to save result
// combos: all combn of epi1 and epi2

// [[Rcpp::export]]
Rcpp::XPtr<BigMatrix> ca_epi_data(Rcpp::XPtr<BigMatrix> epi_data,
                                  Rcpp::XPtr<BigMatrix> combos,
                                  arma::mat epi1,
                                  arma::mat epi2,
                                  arma::mat TRAN) {
  omp_set_num_threads(4);
  
  typedef double T;
  MatrixAccessor<T> epi_col = MatrixAccessor<T>(*epi_data);
  MatrixAccessor<T> combos_col = MatrixAccessor<T>(*combos);
  
  int m = combos->nrow();
  
  #pragma omp parallel for
  for(int i=0; i < m;i++) {
    arma::colvec row_epi1 = epi1.row(i);
    arma::colvec row_epi2 = epi2.row(i);
      
      // calculate hadamard product
      arma::colvec hadamard_product = row_epi1 % row_epi2;
      
      // matrix *
      arma::colvec result = TRAN * hadamard_product;
      
      // apply the result to big.matrix
      epi_col[i] = result;
    }
  }
  return epi_data;
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions