Hosting internal R packages on r-universe

Categories: news, code, pkgs, ropensci

Author: Julian Ricardo

Published: August 15, 2024

This post attempts to replicate a stock analysis with the coreStatsNMR package, available via the NMR Group r-universe. Specifically, it uses the statsTable function outlined in this post.

First, install the package, using the repos argument to point to the r-universe (with CRAN as a fallback for dependencies):

install.packages("coreStatsNMR", repos = c("https://nmrgroup.r-universe.dev", "https://cloud.r-project.org"))
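If the r-universe is used often, the repositories can instead be registered once per session through the standard repos option, so that a plain install.packages() call searches both. This is a minimal sketch: the labels "nmrgroup" and "CRAN" are arbitrary names chosen here, and the install call is guarded so that it only runs in an interactive session.

```r
# Register the r-universe alongside CRAN for the current R session.
# The vector names are illustrative labels, not required values.
options(repos = c(
  nmrgroup = "https://nmrgroup.r-universe.dev",
  CRAN     = "https://cloud.r-project.org"
))

# With the option set, no repos argument is needed.
# Guarded so the snippet does not touch the network when sourced in a script.
if (interactive() && !requireNamespace("coreStatsNMR", quietly = TRUE)) {
  install.packages("coreStatsNMR")
}
```

Placing the options() call in a project-level .Rprofile would make the setting persistent for that project.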

library(coreStatsNMR)

Attaching package: 'coreStatsNMR'
The following object is masked from 'package:base':

    mode

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

statsTable(data = iris,
           summVar = "Sepal.Length",
           groupVar = "Species",
           stats = c("n", "min", "max", "mean", "weighted.mean", "median", "sd", "iqr", "cv"),
           accuracy = 0.01,
           drop0trailing = TRUE) %>%
  knitr::kable()

Warning in statsTable.data.frame(data = iris, summVar = "Sepal.Length", : Using placeholder weights of 1 for all data

| stat          | setosa | versicolor | virginica | Total |
|---------------|--------|------------|-----------|-------|
| n             | 50     | 50         | 50        | 150   |
| min           | 4.30   | 4.90       | 4.90      | 4.30  |
| max           | 5.80   | 7.00       | 7.90      | 7.90  |
| mean          | 5.01   | 5.94       | 6.59      | 5.84  |
| weighted.mean | 5.01   | 5.94       | 6.59      | 5.84  |
| median        | 5.00   | 5.90       | 6.50      | 5.80  |
| sd            | 0.35   | 0.52       | 0.64      | 0.83  |
| iqr           | 0.40   | 0.70       | 0.67      | 1.30  |
| cv            | 0.07   | 0.09       | 0.10      | 0.14  |

So what?

Yay! We can run summary statistics on stock R data with our own package. Why do this? We can already write expressive pipelines with packages such as dplyr, data.table, collapse, or polars. The added value of a DIY function is not apparent, especially if it’s using those packages underneath.
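For comparison, here is a rough sketch of how a similar per-species summary might be written directly in dplyr. It is not how statsTable is implemented. In particular, treating cv as sd divided by mean is an assumption made here, and the result is in long-by-group form without the Total column or the rounding that statsTable applies.

```r
library(dplyr)

# One row per species, one column per statistic.
by_species <- iris %>%
  group_by(Species) %>%
  summarise(
    n      = n(),
    min    = min(Sepal.Length),
    max    = max(Sepal.Length),
    mean   = mean(Sepal.Length),
    median = median(Sepal.Length),
    sd     = sd(Sepal.Length),
    iqr    = IQR(Sepal.Length),
    # Assumed definition: coefficient of variation = sd / mean
    cv     = sd(Sepal.Length) / mean(Sepal.Length)
  )

by_species
```

The pipeline is readable, but every project that needs it has to repeat the same choices about which statistics to compute and how to present them.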

However, for a consulting firm such as my previous employer, there is value in creating wrapped versions of the stock coreStats functions that incorporate project or client constraints and documentation. That way, each core function can focus on doing one narrow job very well, while wrappers combine or extend them for projects and for specific, repetitive applications. This does assume that time is invested in designing the functions to play nicely with one another, and in maintaining that compatibility as the codebase evolves. Keeping shareable “core” functions separate from “project code” also lets the firm tap into additional marketing value, i.e. participating more visibly in open-source software (OSS) development.
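As an illustration of the wrapper idea, the sketch below defines a hypothetical projectStatsTable function. The name, the default statistics, and the column check are all invented for this example. It only passes arguments that appear in the statsTable call earlier in this post, and it assumes statsTable accepts a subset of those stats.

```r
# Hypothetical project-level wrapper; not part of coreStatsNMR.
projectStatsTable <- function(data, summVar, groupVar,
                              stats = c("n", "mean", "median", "sd"),
                              accuracy = 0.01) {
  # Illustrative project constraint: fail early, with a clear message,
  # when a requested column is not in the data.
  missing_cols <- setdiff(c(summVar, groupVar), names(data))
  if (length(missing_cols) > 0) {
    stop("Columns not found in data: ", paste(missing_cols, collapse = ", "))
  }

  # Delegate the actual computation to the core function.
  coreStatsNMR::statsTable(
    data = data,
    summVar = summVar,
    groupVar = groupVar,
    stats = stats,
    accuracy = accuracy,
    drop0trailing = TRUE
  )
}

# Only run the example call when the core package is installed.
if (requireNamespace("coreStatsNMR", quietly = TRUE)) {
  projectStatsTable(iris, summVar = "Sepal.Length", groupVar = "Species")
}
```

The project-specific defaults and checks live in the wrapper, so the core function stays general and can be shared publicly.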

Internal and/or public packages are also ways to embed invaluable organizational knowledge, e.g. in a package’s testing suite, warnings, errors, and documentation. Of course, embedding this knowledge requires caution so that only the sources, methods, etc. appropriate for public use are exposed in public repos like the r-universe.
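The placeholder-weights warning in the output above is one small example of knowledge carried by a package. The base R sketch below shows the general pattern with a hypothetical helper, weightedMeanWithDefault, which is not part of coreStatsNMR and is not meant to reflect how that package handles weights internally.

```r
# Hypothetical helper showing how warnings and errors can document
# a team's conventions for handling weights.
weightedMeanWithDefault <- function(x, weights = NULL) {
  if (is.null(weights)) {
    # The warning states the fallback explicitly, so users know it happened.
    warning("No weights supplied; using placeholder weights of 1 for all data")
    weights <- rep(1, length(x))
  }
  if (length(weights) != length(x)) {
    stop("`weights` must have the same length as `x`")
  }
  weighted.mean(x, weights)
}

# A lightweight check recording the expected behaviour: with placeholder
# weights, the weighted mean equals the plain mean.
result <- suppressWarnings(weightedMeanWithDefault(iris$Sepal.Length))
stopifnot(isTRUE(all.equal(result, mean(iris$Sepal.Length))))
```

In a real package, a check like the last two lines would normally live in the test suite rather than next to the function.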
