Native Getting Started with cpmr
Overview
cpmr’s primary workflow is a native, matrix-first API
built around cpm_spec() together with fit()
and fit_resamples().
This vignette walks through:
- how to shape CPM inputs;
- how to fit a single native CPM model;
- how to run leakage-safe resampling;
- how to inspect predictions and selected edges;
- how custom resamples and missing-data handling work.
Prepare Inputs
CPM expects one row per subject and one column per edge.
- conmat: an n x p connectivity matrix;
- behav: a length-n numeric outcome vector;
- covariates: an optional n x q nuisance matrix.
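The code in this vignette refers to conmat, behav, covariates, and n without showing how they were created. The following is a minimal sketch of inputs with the right shapes, assuming simulated Gaussian data. The seed, the number and names of the covariates, and the way signal is injected are illustrative assumptions, not the data behind the printed results.

```r
# Simulated inputs with the shapes CPM expects (illustrative only).
set.seed(123)                      # arbitrary seed, for reproducibility
n <- 80                            # subjects (rows)
p <- 200                           # edges (columns)
q <- 2                             # nuisance covariates (assumed count)

# n x p connectivity matrix: one row per subject, one column per edge.
conmat <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Length-n numeric outcome; the first 10 edges carry some signal.
behav <- as.numeric(scale(rowMeans(conmat[, 1:10]) + rnorm(n, sd = 0.5)))

# Optional n x q nuisance matrix (hypothetical age and motion covariates).
covariates <- cbind(age = rnorm(n), motion = rnorm(n))

# Basic shape checks before fitting.
stopifnot(
  is.matrix(conmat), nrow(conmat) == n, ncol(conmat) == p,
  is.numeric(behav), length(behav) == n,
  is.matrix(covariates), nrow(covariates) == n, ncol(covariates) == q
)
```

Checking shapes up front makes row-count mismatches between the three inputs visible before any model is fitted.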
Single Fit with fit()
Use fit() on a cpm_spec() object when you
want a native single fit with explicit CPM parameters.
fit_obj <- fit(
cpm_spec(thresh_method = "alpha", thresh_level = 0.05),
conmat = conmat,
behav = behav,
covariates = covariates
)
fit_obj
#> CPM fit:
#> Call: fit(object = cpm_spec(thresh_method = "alpha", thresh_level = 0.05),
#> conmat = conmat, behav = behav, covariates = covariates)
#> Number of observations: 80
#> Complete cases: 80
#> Candidate edges: 200
#> Parameters:
#> Covariates: included
#> Threshold method: alpha
#> Threshold level: 0.05
#> Bias correction: yes
summary(fit_obj)
#> CPM summary:
#> Performance (Pearson):
#> Combined: 0.706
#> Positive: 0.620
#> Negative: 0.458
#> Selected edges:
#> Positive: 4.00%
#> Negative: 2.50%
dim(fit_obj$edges)
#> [1] 200 2
For a single fit, fit() stores a p x 2
edge-selection mask with pos and neg columns
by default. This is useful for inspecting the selected network, but the
performance reported by summary() is still in-sample.
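The selected-edge percentages can be related to the edge mask directly. The sketch below uses a hypothetical logical p x 2 mask in place of fit_obj$edges. Only the p x 2 shape and the pos and neg columns come from the text above; the logical storage type and the particular edges marked as selected are assumptions.

```r
# Hypothetical stand-in for fit_obj$edges: a p x 2 logical mask.
p <- 200
edge_mask <- matrix(
  FALSE, nrow = p, ncol = 2,
  dimnames = list(NULL, c("pos", "neg"))
)
edge_mask[1:8, "pos"] <- TRUE      # 8 of 200 edges
edge_mask[101:105, "neg"] <- TRUE  # 5 of 200 edges

# Share of candidate edges selected in each network, in percent.
selected_pct <- colMeans(edge_mask) * 100

# Column indices (into conmat) of the selected edges, per network.
selected_idx <- lapply(
  c(pos = "pos", neg = "neg"),
  function(network) which(edge_mask[, network])
)
```

With 8 positive and 5 negative edges out of 200, selected_pct is 4 and 2.5, matching the scale of the percentages that summary() prints.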
Cross-Validated Resampling with fit_resamples()
Use fit_resamples() on the same cpm_spec()
object for an out-of-sample estimate of predictive performance.
resample_obj <- fit_resamples(
cpm_spec(),
conmat = conmat,
behav = behav,
covariates = covariates,
kfolds = 5
)
resample_obj
#> CPM resamples:
#> Call: fit_resamples(object = cpm_spec(), conmat = conmat, behav = behav,
#> covariates = covariates, kfolds = 5)
#> Number of folds: 5
#> Number of observations: 80
#> Complete cases: 80
#> Edge storage: not stored
#> Use summary() for aggregate metrics.
summary(resample_obj)
#> CPM resample summary:
#> Number of folds: 5
#> Prediction error:
#> RMSE:
#> Combined: 0.708
#> Positive: 0.645
#> Negative: 0.704
#> MAE:
#> Combined: 0.574
#> Positive: 0.516
#> Negative: 0.559
#> Pooled correlations (Pearson):
#> Combined: -0.093
#> Positive: 0.059
#> Negative: -0.185
#> Fold-wise correlations (Pearson):
#> Combined: -0.075 (SE 0.046)
#> Positive: 0.029 (SE 0.054)
#> Negative: -0.181 (SE 0.099)
fit_resamples() keeps CPM-specific steps such as covariate handling, edge selection, and model training inside each resample fold, so held-out subjects never influence the model that predicts them.
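To make "inside each resample fold" concrete, here is a generic base-R sketch of fold-wise CPM for the positive network only. It is not cpmr's implementation. The simulated data, the 0.05 p-value threshold on Pearson correlations, and the helper name cv_cpm_positive are illustrative assumptions, and covariate handling and bias correction are omitted.

```r
# Generic fold-wise CPM sketch (positive network only); NOT cpmr internals.
cv_cpm_positive <- function(conmat, behav, kfolds = 5, alpha = 0.05) {
  n <- nrow(conmat)
  fold_id <- sample(rep_len(seq_len(kfolds), n))   # random fold assignment
  pred <- rep(NA_real_, n)

  for (k in seq_len(kfolds)) {
    train <- fold_id != k
    test <- !train

    # Edge selection uses training rows only, so held-out subjects
    # never influence which edges are kept.
    r <- as.vector(cor(conmat[train, , drop = FALSE], behav[train]))
    df <- sum(train) - 2
    t_stat <- r * sqrt(df / (1 - r^2))
    p_val <- 2 * pt(-abs(t_stat), df)
    pos_edges <- which(r > 0 & p_val < alpha)

    if (length(pos_edges) == 0) {
      # No edge survived the threshold: fall back to the training mean.
      pred[test] <- mean(behav[train])
      next
    }

    # Network strength = sum of the selected edges for each subject.
    strength <- rowSums(conmat[, pos_edges, drop = FALSE])
    model <- lm(y ~ x, data = data.frame(y = behav[train], x = strength[train]))
    pred[test] <- predict(model, newdata = data.frame(x = strength[test]))
  }
  data.frame(row = seq_len(n), fold = fold_id, real = behav, pos = pred)
}

set.seed(1)
sim_conmat <- matrix(rnorm(80 * 200), nrow = 80)
sim_behav <- rnorm(80)
cv_pred <- cv_cpm_positive(sim_conmat, sim_behav)
```

The point of the sketch is the placement of the selection step: if pos_edges were computed once on all rows before the loop, every fold's model would already have seen its own test subjects.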
Inspect Predictions and Edges
Native resampling results always keep raw observation-level
predictions on the object. summary() gives an aggregate
out-of-fold report in which error metrics (RMSE and MAE) are the
primary output and correlations are supplementary.
predictions <- resample_obj$predictions
resample_summary <- summary(resample_obj)
resample_summary
#> CPM resample summary:
#> Number of folds: 5
#> Prediction error:
#> RMSE:
#> Combined: 0.708
#> Positive: 0.645
#> Negative: 0.704
#> MAE:
#> Combined: 0.574
#> Positive: 0.516
#> Negative: 0.559
#> Pooled correlations (Pearson):
#> Combined: -0.093
#> Positive: 0.059
#> Negative: -0.185
#> Fold-wise correlations (Pearson):
#> Combined: -0.075 (SE 0.046)
#> Positive: 0.029 (SE 0.054)
#> Negative: -0.181 (SE 0.099)
head(resample_summary[["metrics"]])
#> level metric prediction estimate std_error method
#> 1 pooled rmse both 0.7081353 NA <NA>
#> 2 pooled rmse pos 0.6448140 NA <NA>
#> 3 pooled rmse neg 0.7036247 NA <NA>
#> 4 pooled mae both 0.5743804 NA <NA>
#> 5 pooled mae pos 0.5162523 NA <NA>
#> 6 pooled mae neg 0.5592357 NA <NA>
head(predictions)
#> row fold real both pos neg
#> 1 1 5 -0.6339456 0.342534590 0.342534590 1.994932e-17
#> 2 2 1 0.4116546 -0.478738601 0.028245399 -5.179443e-01
#> 3 3 1 0.2898583 -0.032769603 0.073529511 -1.070810e-01
#> 4 4 3 0.7870086 -0.001288122 -0.001288122 2.428613e-17
#> 5 5 5 0.1169538 -0.023682986 -0.023682986 1.994932e-17
#> 6 6 3 0.7875511 0.255961736 0.255961736 2.428613e-17
head(resample_metrics(resample_obj))
#> fold n_assess metric prediction estimate
#> 1 1 16 rmse both 0.6175234
#> 2 1 16 rmse pos 0.5564055
#> 3 1 16 rmse neg 0.5860971
#> 4 2 16 rmse both 0.8263091
#> 5 2 16 rmse pos 0.6408348
#> 6 2 16 rmse neg 0.8491404
These three outputs have slightly different jobs:
- summary(resample_obj) prints the default aggregate report;
- summary(resample_obj)[["metrics"]] exposes the compact summary-level metric table stored on the summary object;
- resample_metrics(resample_obj) returns the raw pooled or fold-wise metric tables for further filtering, plotting, or export.
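Because the observation-level predictions stay on the object, pooled and fold-wise metrics can also be recomputed by hand. The sketch below uses a small hypothetical data frame with the same column names as resample_obj$predictions. The values are made up, and the fold-wise SE is computed as the standard error of the mean across folds, which is an assumption about how the printed SE is defined.

```r
# Hypothetical predictions table with the same columns as resample_obj$predictions.
predictions <- data.frame(
  row  = 1:6,
  fold = c(1, 1, 2, 2, 3, 3),
  real = c(0.5, -0.5, 1.0, 0.0, -1.0, 0.5),
  both = c(0.0,  0.0, 0.5, 0.5, -0.5, 0.0)
)

rmse <- function(real, pred) sqrt(mean((real - pred)^2, na.rm = TRUE))
mae  <- function(real, pred) mean(abs(real - pred), na.rm = TRUE)

# Pooled: treat all out-of-fold predictions as one vector.
pooled <- c(
  rmse = rmse(predictions$real, predictions$both),
  mae  = mae(predictions$real, predictions$both)
)

# Fold-wise: compute the metric within each fold, then summarise across folds.
fold_rmse <- vapply(
  split(predictions, predictions$fold),
  function(d) rmse(d$real, d$both),
  numeric(1)
)
fold_summary <- c(
  mean = mean(fold_rmse),
  se   = sd(fold_rmse) / sqrt(length(fold_rmse))
)
```

Pooled and fold-wise values generally differ on real data, because pooling weights every observation equally while the fold-wise mean weights every fold equally.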
By default, fit_resamples() uses
return_edges = "none" and skips edge storage. This keeps
the resampling object light when you only need predictive
performance.
If you also want fold-aggregated edge selection rates, request them explicitly:
edge_resample_obj <- fit_resamples(
cpm_spec(),
conmat = conmat,
behav = behav,
covariates = covariates,
kfolds = 5,
return_edges = "sum"
)
dim(edge_resample_obj$edges)
#> [1] 200 2
predictions returns one row per original observation. If
na_action = "exclude" removed subjects before fitting,
those rows are still present and their fold value is
NA.
When return_edges = "sum", cpmr stores
fold-summed edge counts for each edge. If memory matters, keep the
default return_edges = "none" or use
return_edges = "all" only when fold-wise edge arrays are
truly needed.
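Fold-summed counts convert to selection rates by dividing by the number of folds. The sketch below uses a hypothetical count matrix standing in for edge_resample_obj$edges. The values and the "selected in every fold" stability rule are illustrative assumptions.

```r
# Hypothetical fold-summed counts for 5 edges over kfolds = 5.
kfolds <- 5
edge_counts <- cbind(
  pos = c(5, 3, 0, 0, 1),
  neg = c(0, 0, 5, 4, 0)
)

# Proportion of folds in which each edge was selected.
edge_rates <- edge_counts / kfolds

# Edges selected in every fold, per network.
stable_pos <- which(edge_counts[, "pos"] == kfolds)
stable_neg <- which(edge_counts[, "neg"] == kfolds)
```

A stricter or looser stability rule is a matter of replacing the equality with a threshold on edge_rates.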
Custom Resamples
If you already have a partition scheme, pass it through
resamples.
custom_resamples <- split(
seq_len(n),
cut(seq_len(n), breaks = 4, labels = FALSE)
)
custom_obj <- fit_resamples(
cpm_spec(),
conmat = conmat,
behav = behav,
resamples = custom_resamples,
return_edges = "none"
)
summary(custom_obj)
#> CPM resample summary:
#> Number of folds: 4
#> Prediction error:
#> RMSE:
#> Combined: 0.936
#> Positive: 0.905
#> Negative: 0.944
#> MAE:
#> Combined: 0.679
#> Positive: 0.642
#> Negative: 0.736
#> Pooled correlations (Pearson):
#> Combined: 0.142
#> Positive: 0.173
#> Negative: -0.008
#> Fold-wise correlations (Pearson):
#> Combined: 0.158 (SE 0.121)
#> Positive: 0.247 (SE 0.129)
#> Negative: 0.026 (SE 0.113)
Custom resamples must:
- be supplied as assessment-row indices;
- cover every complete-case subject exactly once;
- not overlap across folds.
When na_action = "exclude", those indices still refer to
original row numbers, not to positions within the complete-case
subset, and they should cover only the complete-case subjects kept in
the analysis.
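The requirements above can be checked before calling fit_resamples(). In the sketch below, check_resamples is a hypothetical helper, not part of cpmr, and complete_rows holds the original row numbers of the complete-case subjects. Whether cpmr itself rejects indices that point at excluded rows is an assumption; the helper simply flags them.

```r
# Hypothetical helper: validate a list of assessment-row indices.
check_resamples <- function(resamples, complete_rows) {
  all_idx <- unlist(resamples, use.names = FALSE)
  problems <- character(0)
  if (anyDuplicated(all_idx) > 0) {
    problems <- c(problems, "folds overlap (an index appears more than once)")
  }
  if (length(setdiff(complete_rows, all_idx)) > 0) {
    problems <- c(problems, "some complete-case rows are not assessed")
  }
  if (length(setdiff(all_idx, complete_rows)) > 0) {
    problems <- c(problems, "some indices are not complete-case rows")
  }
  problems   # character(0) means the scheme passes all checks
}

n <- 80
behav_with_na <- rnorm(n)
behav_with_na[c(3, 11)] <- NA_real_
complete_rows <- which(!is.na(behav_with_na))   # original row numbers

# Four contiguous folds built over the complete cases only.
ok_resamples <- unname(split(
  complete_rows,
  cut(seq_along(complete_rows), breaks = 4, labels = FALSE)
))
check_ok <- check_resamples(ok_resamples, complete_rows)

# A scheme that still includes the excluded rows 3 and 11 is flagged.
bad_resamples <- unname(split(
  seq_len(n),
  cut(seq_len(n), breaks = 4, labels = FALSE)
))
check_bad <- check_resamples(bad_resamples, complete_rows)
```

Building the folds from complete_rows rather than seq_len(n) keeps the indices in original row numbering while leaving out subjects that na_action = "exclude" would drop.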
Missing-Data Handling
Set na_action = "exclude" if you want cpmr
to fit on complete cases while preserving original row positions in the
outputs.
behav_with_na <- behav
behav_with_na[c(3, 11)] <- NA_real_
na_obj <- fit_resamples(
cpm_spec(),
conmat = conmat,
behav = behav_with_na,
kfolds = 5,
na_action = "exclude"
)
na_predictions <- na_obj$predictions
na_predictions[na_predictions$row %in% c(3, 11), ]
#> row fold real both pos neg
#> 3 3 NA NA NA NA NA
#> 11 11 NA NA NA NA NA
This behavior matters when you need to merge predictions back to subject-level metadata after resampling.
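Because predictions keep one row per original observation, a join on the row index lines them up with subject metadata. The sketch below uses a hypothetical metadata table and a hypothetical predictions table. Only the row, fold, real, and both column names come from the output above.

```r
# Hypothetical subject-level metadata, in the same row order as conmat.
metadata <- data.frame(
  row = 1:4,
  subject_id = c("sub-01", "sub-02", "sub-03", "sub-04"),
  site = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical predictions; row 3 was excluded for a missing outcome.
na_predictions <- data.frame(
  row  = 1:4,
  fold = c(1, 2, NA, 1),
  real = c(0.2, -0.4, NA, 0.9),
  both = c(0.1, -0.1, NA, 0.5)
)

# Join on the original row index; excluded subjects stay in the table.
merged <- merge(metadata, na_predictions, by = "row", all.x = TRUE)

# Subjects that were dropped from the analysis have a missing fold.
excluded_ids <- merged$subject_id[is.na(merged$fold)]
```

Since excluded subjects keep their rows, the merged table has the same length as the metadata, and a missing fold value identifies who was left out.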
Next Steps
After you are comfortable with the native path:
- read the workflow-selection article for guidance on how to use the native cpm_spec() workflow in different settings;
- read the leakage-focused article if you need a more detailed covariate handling example;
- keep a cpm_spec() object around when you want an explicit reusable parameter object.