Problem with duplicates = "closest" in closest #72

@jorainer

The closest function returns an unexpected result when a query m/z matches more than one target m/z:

What works fine:

mzs <- c(175.119091510163, 349.231397186144)
mz_ref <- c(rep(175.119, 2), rep(349.2306, 2))
mz_ref
[1] 175.1190 175.1190 349.2306 349.2306
MsCoreUtils::closest(mzs, mz_ref, duplicates = "closest", tolerance = 0, ppm = 40)
[1] 1 3

But with 3 matches for the first m/z, the second is no longer found:

mz_ref <- c(rep(175.119, 3), rep(349.2306, 2))
mz_ref
[1] 175.1190 175.1190 175.1190 349.2306 349.2306
MsCoreUtils::closest(mzs, mz_ref, duplicates = "closest", tolerance = 0, ppm = 40)
[1]  1 NA

So, for some reason, the match for the second m/z is dropped, even though mz_ref still contains 349.2306 at positions 4 and 5. Note that without duplicates = "closest" it still works.
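For comparison, here is a minimal brute-force sketch of the result one would expect. It uses base R only and is not the MsCoreUtils implementation. The helper name nearest_within_ppm is hypothetical, and it assumes the ppm window is computed relative to the query m/z. It also does not resolve the case where two queries compete for the same target, which does not occur in this example.

```r
# Hypothetical helper (not part of MsCoreUtils): for each query m/z return the
# index of the nearest target m/z, or NA if it lies outside the ppm window.
# Assumption: the ppm window is computed relative to the query m/z.
nearest_within_ppm <- function(x, table, ppm = 40) {
    vapply(x, function(q) {
        d <- abs(table - q)
        i <- which.min(d)  # first index among tied (duplicated) targets
        if (d[i] <= q * ppm * 1e-6) i else NA_integer_
    }, integer(1))
}

mzs <- c(175.119091510163, 349.231397186144)
mz_ref <- c(rep(175.119, 3), rep(349.2306, 2))
nearest_within_ppm(mzs, mz_ref)
# [1] 1 4
```

Both queries are within a few ppm of a target, so a match for the second m/z (index 4, the first of the two 349.2306 entries) would be expected rather than NA.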

My session info:

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] MsCoreUtils_1.1.8   BiocManager_1.30.10

loaded via a namespace (and not attached):
[1] MASS_7.3-53         compiler_4.0.2      parallel_4.0.2     
[4] S4Vectors_0.27.14   BiocGenerics_0.35.4 stats4_4.0.2       

Labels: bug, wontfix