For the purpose of this tutorial, we are going to use the toy module
utils/seq, which is implemented in the file
utils/seq.r. The module implements some very basic
mechanisms to deal with DNA sequences (character strings consisting
entirely of the letters A, C, G and T).
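The definition of a DNA sequence used here can be written as a one-line check. The sketch below is illustrative only. `is_dna` is a hypothetical helper and is not the module's own `valid_seq`. It also assumes that upper-case letters are required and that an empty string does not count as a sequence.

```r
# Hypothetical helper, not part of utils/seq: TRUE for strings that consist
# entirely of the letters A, C, G and T (and are non-empty).
is_dna = function (x) grepl('^[ACGT]+$', x)

is_dna(c('GATTACA', 'GATTXCA', 'gattaca', ''))
```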
First, we load the module.
seq = import('utils/seq')
ls()
## [1] "seq"
utils serves as a supermodule here, which groups several submodules
(but for now, seq is the only one).
To see which functions a module exports, use ls:
ls(seq)
## [1] "print.seq" "revcomp" "seq"
## [4] "table" "valid_seq" "valid_seq.default"
## [7] "valid_seq.seq"
And we can display interactive help for individual functions:
?seq$seq
This function creates a biological sequence. We can use it:
s = seq$seq(c(foo = 'GATTACAGATCAGCTCAGCACCTAGCACTATCAGCAAC',
bar = 'CATAGCAACTGACATCACAGCG'))
s
## >foo
## GATTACAGATCAGCTCAGCACCTAGCACTATCAGCAAC
## >bar
## CATAGCAACTGACATCACAGCG
Notice how we get pretty-printed,
FASTA-like output. This is because
utils/seq defines a print method for the seq class:
seq$print.seq
## function (seq, columns = 60)
## {
## lines = strsplit(seq, sprintf("(?<=.{%s})", columns), perl = TRUE)
## print_single = function(seq, name) {
## if (!is.null(name))
## cat(sprintf(">%s\n", name))
## cat(seq, sep = "\n")
## }
## names = if (is.null(names(seq)))
## list(NULL)
## else names(seq)
## Map(print_single, lines, names)
## invisible(seq)
## }
## <environment: 0x7ff018b01988>
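The pretty-printing works through ordinary S3 dispatch. A vector whose class attribute is `'seq'` makes `print()` look up a function named `print.seq`. The stand-alone sketch below uses base R only and mirrors the method shown above. `make_seq` is a hypothetical constructor, not the module's `seq` function. A small `columns` value is passed so that the line wrapping becomes visible.

```r
# Hypothetical constructor: a character vector tagged with the S3 class 'seq'.
make_seq = function (...) structure(c(...), class = 'seq')

# S3 print method for class 'seq', modelled on the print.seq shown above.
print.seq = function (x, columns = 60, ...) {
    # Split each sequence into chunks of at most `columns` characters.
    chunks = strsplit(unclass(x), sprintf('(?<=.{%s})', columns), perl = TRUE)
    seq_names = names(x)
    for (i in seq_along(chunks)) {
        # FASTA-style header line, only if the sequences are named
        if (! is.null(seq_names)) cat(sprintf('>%s\n', seq_names[i]))
        cat(sprintf('%s\n', chunks[[i]]), sep = '')
    }
    invisible(x)
}

s = make_seq(foo = 'GATTACAGATCAGCT', bar = 'CATAG')
print(s, columns = 10)
```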
That’s it for basic usage. In order to understand more about the module mechanism, let’s look at an alternative usage:
# We can unload loaded modules that we assigned to an identifier:
unload(seq)
options(import.path = 'utils')
import('seq', attach = TRUE)
After unloading the already loaded module, the options function call
sets the module search path: this is where import searches for
modules. If more than one path is given, import searches them all
until a module of matching name is found.
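The search-path lookup just described can be sketched as a loop over directories that stops at the first match. This is a simplified, hypothetical resolver and not the modules package's actual code. `find_module` is an illustrative name. It assumes that a module is either a file `name.r` or a directory containing `__init__.r`, the latter convention being described later in this tutorial.

```r
# Hypothetical, simplified resolver (not the modules package's actual code):
# return the first file that matches `name` along the search path.
find_module = function (name, search_path) {
    for (directory in search_path) {
        # A module is either a file `name.r` or a directory with `__init__.r`.
        candidates = c(file.path(directory, paste0(name, '.r')),
                       file.path(directory, name, '__init__.r'))
        found = candidates[file.exists(candidates)]
        if (length(found) > 0) return(found[1])
    }
    NULL  # no directory on the search path contains the module
}

# Usage: build a throwaway search path with two directories.
root = tempfile('modules-')
dir.create(file.path(root, 'a', 'utils'), recursive = TRUE)
dir.create(file.path(root, 'b'))
invisible(file.create(file.path(root, 'a', 'utils', '__init__.r'),
                      file.path(root, 'b', 'seq.r')))
search_path = file.path(root, c('a', 'b'))

find_module('seq', search_path)    # found in the second directory
find_module('utils', search_path)  # directory module in the first
```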
The import statement can now simply specify seq instead of
utils/seq as the module name. We also specify attach=TRUE. This has
an effect similar to package loading (or attaching an environment):
all the module’s names are now available for direct use without
the seq$ qualifier.
However, unlike the attach function, module attachment happens in
local scope only. Since the code above was executed in global scope,
local and global scope coincide here, so the module shows up on the
global search path:
search()
## [1] ".GlobalEnv" "module:seq" "devtools_shims"
## [4] "package:modules" "package:testthat" "package:stats"
## [7] "package:graphics" "package:grDevices" "package:utils"
## [10] "package:datasets" "rprofile" "package:methods"
## [13] "Autoloads" "package:base"
Notice the second position, which reads “module:seq”. But now let’s undo that, and attach (and use) the module locally instead.
detach('module:seq') # Name is optional
local({
import('seq', attach = TRUE)
table('GATTACA')
})
## [[1]]
##
## A C G T
## 3 1 1 2
Note that this uses seq’s table function, rather than base::table
(which would have a different output). Furthermore, note that outside
the local scope, the module is not attached:
search()
## [1] ".GlobalEnv" "devtools_shims" "package:modules"
## [4] "package:testthat" "package:stats" "package:graphics"
## [7] "package:grDevices" "package:utils" "package:datasets"
## [10] "rprofile" "package:methods" "Autoloads"
## [13] "package:base"
table('GATTACA')
##
## GATTACA
## 1
This is very powerful, as it isolates separate scopes more effectively
than the attach function. What is more, modules which are imported and
attached inside another module remain inside that module and are not
visible outside the module by default.
Nevertheless, the normal, recommended usage of a module is with
attach=FALSE (the default), as this makes it clearer which names we
are referring to.
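The scoping behaviour of local attachment can be approximated in base R as a rough mental model. The modules package's real mechanism may differ. The idea is that the local scope gets an enclosing environment holding the module's names, so lookups there find the module's `table` before `base::table`. In the sketch, `mod` and `scope` are hypothetical names.

```r
# Hypothetical stand-in for an attached module: an environment holding its
# own `table` function, which counts the nucleotides in a single sequence.
mod = new.env(parent = globalenv())
mod$table = function (seq) base::table(strsplit(seq, '')[[1]])

# A fresh local scope whose enclosing environment is `mod`: name lookup in
# this scope tries `mod` before the global environment and the search path.
scope = new.env(parent = mod)
eval(quote(table('GATTACA')), scope)  # uses mod$table: counts A, C, G, T

# Outside that scope, nothing has changed and base::table is used.
table('GATTACA')
```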
Modules can also be nested in hierarchies. In fact, here is the
implementation of utils, found in utils/__init__.r (since utils is a
directory rather than a file, the module implementation resides in the
nested file __init__.r):
seq = import('./seq')
The submodule is specified as './seq' rather than 'seq': the
explicitly provided relative path prevents lookup in the import search
path (that we set via options(import.path=…) earlier); instead, only
the current directory is considered.
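The choice between the two lookup modes reduces to a simple rule. The hypothetical helper below covers only the './' case described above. Real lookup may handle more cases, such as other relative forms. `dirs_to_search` and its parameters are illustrative names.

```r
# Hypothetical helper (not the modules package's code): which directories
# should an import of `name` consult?
dirs_to_search = function (name, search_path, importing_dir) {
    if (startsWith(name, './')) {
        # Explicit relative path: only the importing module's directory counts
        importing_dir
    } else {
        # Plain name: walk the configured search path
        search_path
    }
}

dirs_to_search('./seq', search_path = c('lib', 'utils'), importing_dir = 'utils')
dirs_to_search('seq', search_path = c('lib', 'utils'), importing_dir = 'utils')
```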
We can now use the utils module:
options(import.path = NULL) # Reset search path
utils = import('utils')
ls(utils)
## [1] "seq"
ls(utils$seq)
## [1] "print.seq" "revcomp" "seq"
## [4] "table" "valid_seq" "valid_seq.default"
## [7] "valid_seq.seq"
utils$seq$revcomp('CAT')
## ATG
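revcomp returns the reverse complement of a sequence. Each base is replaced by its partner (A and T, C and G), and the result is read backwards, so 'CAT' becomes 'ATG'. The base-R sketch below is hypothetical. `revcomp_sketch` is an illustrative name, and the actual implementation in utils/seq.r may differ.

```r
# Hypothetical re-implementation for illustration; the actual revcomp in
# utils/seq.r may differ. Complement each base, then reverse the string.
revcomp_sketch = function (x) {
    complement = chartr('ACGT', 'TGCA', x)
    vapply(strsplit(complement, ''),
           function (bases) paste(rev(bases), collapse = ''),
           character(1))
}

revcomp_sketch(c('CAT', 'GATTACA'))
```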
We could also have implemented utils as follows:
export_submodule('./seq')
This would have made all of seq’s definitions immediately available in
utils. This is sometimes useful, but should be employed with care.
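The effect of re-exporting can be pictured with plain environments. This is only an analogy, and we do not claim it is how export_submodule is implemented. `seq_env` and `utils_env` are hypothetical stand-ins for the two modules, and the function bodies are placeholders. The sketch also shows why care is needed. Copying names wholesale silently overwrites any existing binding with the same name in the parent.

```r
# Hypothetical stand-ins: `seq_env` plays the submodule, `utils_env` the
# parent module. Function bodies are placeholders.
seq_env = new.env()
seq_env$revcomp = function (x) NULL
seq_env$valid_seq = function (x) NULL

utils_env = new.env()
utils_env$seq = seq_env  # nested access, as in utils$seq$revcomp

# Re-export: copy every binding of the submodule into the parent module,
# overwriting any existing binding that has the same name.
invisible(list2env(as.list(seq_env, all.names = TRUE), envir = utils_env))

ls(utils_env)
```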
utils/seq.r is, by and large, a normal R source file. In fact, there
are only two things worth mentioning:
- Documentation. Each function in the module file is documented using the roxygen2 syntax, which works the same as for packages. The modules package parses the documentation and makes it available via module_help and ?.
- The module exports S3 functions. The modules package registers such functions automatically, but this only works for user generics that are defined inside the same module. When overriding “known generics” (such as print), we need to register these manually via register_S3_method (this is necessary since these functions are inherently ambiguous and there is no automatic way of finding them).
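Manual registration is needed because a method that lives inside a module is not visible by name to S3 dispatch. The sketch below shows the same situation in base R. It uses base R's own `registerS3method` rather than the modules package's register_S3_method, so that it runs without the package. The class name `dna_demo` and the environment `private` are hypothetical.

```r
# A method that lives in a private environment, invisible to normal name
# lookup, much like a function defined inside a module.
private = new.env()
private$print_dna = function (x, ...) {
    cat(sprintf('DNA: %s\n', unclass(x)))
    invisible(x)
}

x = structure('GATTACA', class = 'dna_demo')

# Without registration, print(x) would fall back to print.default.
# Registering the method makes S3 dispatch find it for class 'dna_demo'.
registerS3method('print', 'dna_demo', private$print_dna)
print(x)
```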
Module files can contain arbitrary code. It is executed when the module
is loaded for the first time: subsequent imports in the same session,
regardless of whether they occur in a different scope, refer to the
loaded, cached module and do not execute its code again.
We can illustrate this by loading a module, 'info', which has side
effects. Here is its source code:
message('Loading module "', module_name(), '"')
message('Module path: "', basename(module_file()), '"')
Let’s load it:
info = import('info')
## Loading module "info"
## Module path: "vignettes"
We have imported the module, and get the diagnostic messages. Let’s re-import the module:
import('info')
This time, no messages are displayed. However, we can explicitly reload a module. This clears the cache and loads the module again:
reload(info)
## Loading module "info"
## Module path: "vignettes"
And this displays the messages again. The reload function is a
shortcut for unload followed by import (using the exact same
arguments as used on the original import call).
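The load-once caching and the reload shortcut described above can be modelled with a cache keyed by file path. This is a hypothetical sketch and not the modules package's implementation. `load_module`, `reload_module` and `.module_cache` are illustrative names. The throwaway module file counts its own executions in a global variable `loads`, which stands in for the diagnostic messages of 'info'.

```r
# Hypothetical cache of loaded modules, keyed by normalized file path.
.module_cache = new.env()

load_module = function (path) {
    key = normalizePath(path)
    if (! exists(key, envir = .module_cache, inherits = FALSE)) {
        env = new.env(parent = globalenv())
        sys.source(path, envir = env)  # runs the file's top-level code
        assign(key, env, envir = .module_cache)
    }
    get(key, envir = .module_cache, inherits = FALSE)
}

# "Unload" (drop the cache entry), then load again.
reload_module = function (path) {
    key = normalizePath(path)
    if (exists(key, envir = .module_cache, inherits = FALSE))
        rm(list = key, envir = .module_cache)
    load_module(path)
}

# Usage: a throwaway module whose only side effect is counting its loads.
loads = 0
path = tempfile(fileext = '.r')
writeLines('loads <<- loads + 1', path)
invisible(load_module(path))
invisible(load_module(path))    # cached: the file is not executed again
invisible(reload_module(path))  # executed a second time
loads
```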
The info module also showcases two important helper functions:
- module_name returns the name of the module with which it was loaded. This is especially handy because outside of a module, module_name returns NULL. We can harness this in a similar way to Python’s __name__ mechanism.
- module_file works equivalently to system.file: it returns the full path to any file within a module. This is helpful when distributing data files with modules, which are loaded from within the module. When invoked without arguments, module_file returns the full path to the directory containing the module source file.
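The `__name__`-style idiom mentioned above could look like the sketch below. Script-only code is guarded so that it runs when the file is executed directly but not when it is imported. So that the sketch runs without the modules package, `module_name` is replaced by a stub that always returns NULL, which is its documented behaviour outside a module. `greet` is a hypothetical exported function.

```r
# Stand-in for modules::module_name() so that this sketch runs without the
# package. Per the text above, the real function returns NULL outside a module.
module_name = function () NULL

# Hypothetical function that the module would export.
greet = function (who) sprintf('Hello, %s!', who)

# Script-only code, analogous to Python's `if __name__ == '__main__':`.
# It runs when the file is executed directly, but would be skipped when the
# file is imported as a module, where module_name() is not NULL.
if (is.null(module_name())) {
    message(greet('world'))
}
```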