Hosting internal R packages on r-universe

This post replicates a stock analysis with the coreStatsNMR package, available via the NMR Group r-universe. Specifically, it uses the statsTable function outlined in this post.

First, install the package, using the repos argument to point to the r-universe alongside a CRAN mirror:

install.packages("coreStatsNMR", repos = c("https://nmrgroup.r-universe.dev", "https://cloud.r-project.org"))
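If the package will be installed more than once, one option is to register the r-universe as a repository for the whole session (or in an .Rprofile) rather than passing repos on every call. This is a minimal sketch; the repository labels nmrgroup and CRAN are just names chosen for illustration.

```r
# Register the r-universe alongside a CRAN mirror for this session.
# The element names ("nmrgroup", "CRAN") are illustrative labels only.
options(repos = c(
  nmrgroup = "https://nmrgroup.r-universe.dev",
  CRAN     = "https://cloud.r-project.org"
))

# Later installs can then omit the repos argument:
# install.packages("coreStatsNMR")

# Inspect what is currently registered
getOption("repos")
```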

library(coreStatsNMR)


Attaching package: 'coreStatsNMR'

The following object is masked from 'package:base':

    mode

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

statsTable(data = iris,
           summVar = "Sepal.Length",
           groupVar = "Species",
           stats = c("n", "min", "max", "mean", "weighted.mean", "median", "sd", "iqr", "cv"),
           accuracy = 0.01,
           drop0trailing = TRUE) %>% 
  knitr::kable()

Warning in statsTable.data.frame(data = iris, summVar = "Sepal.Length", : Using placeholder weights of 1 for all data

stat	setosa	versicolor	virginica	Total
n	50	50	50	150
min	4.30	4.90	4.90	4.30
max	5.80	7.00	7.90	7.90
mean	5.01	5.94	6.59	5.84
weighted.mean	5.01	5.94	6.59	5.84
median	5.00	5.90	6.50	5.80
sd	0.35	0.52	0.64	0.83
iqr	0.40	0.70	0.67	1.30
cv	0.07	0.09	0.10	0.14
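
The warning about placeholder weights explains why the weighted.mean row matches the mean row: with every weight equal to 1, a weighted mean reduces to the ordinary mean. A quick base R check, independent of coreStatsNMR:

```r
x <- iris$Sepal.Length

# Placeholder weights of 1 for every observation, as in the warning above
placeholder_weights <- rep(1, length(x))

# With equal weights, the weighted mean equals the plain mean
same <- all.equal(weighted.mean(x, w = placeholder_weights), mean(x))
same
```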

So what?

Yay! We can run summary statistics on stock R data with our own package. Why do this? We can already write expressive pipelines with various packages: dplyr, data.table, collapse, or polars. The added value of a DIY function is not apparent, especially if it’s using those packages underneath.
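
For comparison, here is roughly what the per-species columns of the table look like as a plain dplyr pipeline. This is a sketch, not how statsTable works internally: it assumes cv means sd divided by mean (which is consistent with the values above), and it omits the Total column and the weighted mean.

```r
library(dplyr)

# Per-species summary of Sepal.Length, written directly with dplyr
by_species <- iris %>%
  group_by(Species) %>%
  summarise(
    n      = n(),
    min    = min(Sepal.Length),
    max    = max(Sepal.Length),
    mean   = mean(Sepal.Length),
    median = median(Sepal.Length),
    sd     = sd(Sepal.Length),
    iqr    = IQR(Sepal.Length),
    # Assumption: coefficient of variation taken as sd / mean
    cv     = sd(Sepal.Length) / mean(Sepal.Length)
  )

by_species
```

The pipeline is short, but every project would have to repeat it, along with its rounding and layout choices.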

However, for a consulting firm, such as my previous employer, there is value in creating wrapped versions of the stock coreStats functions that incorporate project or client constraints and documentation. That way, each core function can focus on doing one narrow job very well, while wrappers combine or extend them for projects and specific, repetitive applications. This assumes time is invested in designing the functions to play nicely with one another, and in maintaining that compatibility as the codebase evolves. Keeping shareable “core” functions separate from “project code” also gives the firm additional marketing value, i.e. participating more visibly in open-source software (OSS) development.
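
The core-plus-wrapper split might look something like the sketch below. Both functions are hypothetical stand-ins written in base R so the example runs on its own; core_summary is not a coreStatsNMR function, and the "project rules" (fixed statistics, two-decimal rounding, dropping missing values with a warning) are invented for illustration.

```r
# Hypothetical "core" function: narrow scope, no project assumptions.
core_summary <- function(x, stats = c("n", "mean", "sd")) {
  funs <- list(n = length, mean = mean, sd = sd)
  unknown <- setdiff(stats, names(funs))
  if (length(unknown) > 0) {
    stop("Unknown stats: ", paste(unknown, collapse = ", "))
  }
  # Apply each requested statistic and return a named numeric vector
  vapply(funs[stats], function(f) as.numeric(f(x)), numeric(1))
}

# Hypothetical project wrapper: bakes in one project's conventions.
project_summary <- function(data, value_col, digits = 2) {
  stopifnot(is.data.frame(data), value_col %in% names(data))
  x <- data[[value_col]]
  if (anyNA(x)) {
    warning("Dropping ", sum(is.na(x)), " missing values in '", value_col, "'")
    x <- x[!is.na(x)]
  }
  round(core_summary(x, stats = c("n", "mean", "sd")), digits)
}

res <- project_summary(iris, "Sepal.Length")
res
```

The core function stays small and reusable, while the wrapper is the only place that knows about the project's reporting rules.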

Internal and/or public packages are also a way to embed invaluable organizational knowledge, e.g. in a package’s testing suite, warnings, errors, and documentation. Of course, embedding this knowledge requires caution so that only the sources, methods, etc. appropriate for public use are exposed in public repos like the r-universe.
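
As a rough illustration of knowledge living in warnings, errors, and tests, the sketch below uses a hypothetical helper, resolve_weights, that is not part of coreStatsNMR. It borrows the wording of the placeholder-weights warning shown earlier; the validation rules and the checks that follow are assumptions made for the example.

```r
# Hypothetical helper, not part of coreStatsNMR.
resolve_weights <- function(x, weights = NULL) {
  if (is.null(weights)) {
    # The warning documents a default the analyst should know about
    warning("Using placeholder weights of 1 for all data")
    return(rep(1, length(x)))
  }
  if (length(weights) != length(x)) {
    stop("`weights` must have one value per observation")
  }
  if (anyNA(weights) || any(weights < 0)) {
    stop("`weights` must be non-missing and non-negative")
  }
  weights
}

# Lightweight checks that record the expected behaviour
w <- suppressWarnings(resolve_weights(iris$Sepal.Length))
stopifnot(length(w) == nrow(iris), all(w == 1))

caught <- tryCatch(
  resolve_weights(1:3, weights = c(1, -1, 1)),
  error = function(e) conditionMessage(e)
)
stopifnot(grepl("non-negative", caught))
```

In a real package these checks would more likely sit in a dedicated test suite, but the idea is the same: the rule is written down once and enforced everywhere.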
