Skip to content

Adding tests and data, new vignette, biocparallel, and more to remove BiocCheck errors/warnings#23

Merged
amcim merged 6 commits into
mainfrom
bioconductor-submission
Jun 1, 2026
Merged

Adding tests and data, new vignette, biocparallel, and more to remove BiocCheck errors/warnings#23
amcim merged 6 commits into
mainfrom
bioconductor-submission

Conversation

@amcim
Copy link
Copy Markdown
Contributor

@amcim amcim commented May 27, 2026

Summary

A pile of changes to clear R CMD check, BiocCheck, and a lot of Bioconductor submission rules. The most substantial additions are the new vignette and tests for the exported functions.

Code changes

  • **Parallelism: ** R/run_ML.RrunMLmodels and runMDRmodels now use bpapply and SnowParam. This is in line with what was done in amRdata. Dropped future and future.apply from Imports:.

  • No set.seed() inside package code. splitMLInputTibble (R/core_ml.R) and shuffleLabels (R/prep_ml.R) used to call set.seed() directly, which Bioconductor does not allow. Replaced with withr::with_seed wrappers. This scopes the seed change and is bioconductor friendly while also allowing us to keep determinism for testing. Added withr to Imports:.

  • Removed <<- from .parquet2LOOMatrix. The leave-one-out function in R/generate_matrices_ml.R accumulated output filenames via walk() + <<- (global assignment into the enclosing scope). Replaced with map() returning the filenames + compact()/unlist() to collect them. The code gives the same outputs.

  • is() for class checks. Bioc disallows class(x)[1] == "foo" patterns. Switched R/arg_check_ml.R and R/core_ml.R to methods::is(x, "ClassName") and added methods to Imports:.

Documentation

  • New vignette. vignettes/intro.Rmd replaces the previous vignette as it was too out of date (was calling functions that don't exist in the repo for example). Uses BiocStyle::html_document, and the bundled Sfl_parquet.duckdb fixture

  • Runnable examples on >= 80% of exported functions.

  • biocViews cleaned up. The original terms (ML, AMR, Pathogen, MicrobialGenomics) were mostly invalid. I replaced with valid ones (Software, Classification, Regression, StatisticalMethod, FeatureExtraction, MultipleComparison, FunctionalGenomics, Genetics, Visualization). Other terms can be searched for here https://bioconductor.org/packages/release/BiocViews.html

Demo data for examples

  • New demo_ml_tibble and demo_fit in data/. I created some small fixtures which were subsetted from the Sfl data.(60 rows × 80 features, plus a tuned LR fit on them). This data is used for some examples to run them quickly. For example generateMLInputs() + tuneGrid() chain now load these via data(demo_ml_tibble) / data(demo_fit).

  • inst/scripts/make_demo_data.R generated this data

  • R/data.R documents both objects

How to test

Running BiocCheck should give 1 remaining error and 14 notes. The notes will be dealt with in another commit. The remaining error must be resolved by the maintainer. It involves setting this package to be watched on their account. ERROR: Add package to Watched Tags in your Support Site profile; visit
https://support.bioconductor.org/accounts/edit/profile

Future work

Most notes are solvable, miscellaneous changes. I will split that into a separate PR though as this is getting long and there is plenty to review here.

One important thing for Bioconductor is that currently the documentation stats its possible to run RF/BT, but this isn't actually implemented. We need to decide if these features need to be implemented before Bioconductor submission, and if not this documentation will need to be updated

A BiocCheck CI will be added but because of the error involving the maintainer, it would for sure just fail right now.

The vignette will be updated to either load from experimenthubs and/or data files rather than inst/extdata.

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have searched for existing content to ensure this is not a duplicate.
  • I have performed a self-review of these additions (including spelling, grammar, and related).
  • I have added comments to my code to help provide understanding.
  • I have added a test which covers the code changes found within this PR.
  • I have deleted all non-relevant text in this pull request template.
  • Reviewer assignment: Tag a relevant team member to review and approve the changes.

@amcim amcim self-assigned this May 27, 2026
Copy link
Copy Markdown
Contributor

@eboyer221 eboyer221 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran into two reproducibility issues to fix before merging (see comments on core_ml.R and run_ML.R). The set.seed() to withr::with_seed() swap means the seed now only covers the data split instead of the whole pipeline, so downstream steps no longer inherit it. I tested with a small toy dataset and the same seed on both branches and this one picks a different best model than main.
Everything else (BiocCheck cleanup, runnable examples, new vignette, biocViews) looks good.

Comment thread R/core_ml.R Outdated
Comment thread R/run_ML.R Outdated
Copy link
Copy Markdown
Contributor

@eboyer221 eboyer221 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested with new changes and the results match main exactly now (fit_penalty 0.1, fit_mixture 1, nmcc 0.85, all the same on both branches). local_seed is a lot cleaner too. Approving on my end.

@amcim amcim merged commit 3827bdc into main Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants