TAF complete analyses are characterized by the following criteria:
1. The main analysis folder must contain a `boot` folder. For older analyses, this folder may be named `bootstrap`.
2. The boot folder must contain a `DATA.bib` file, declaring the initial data used as a starting point in the analysis. The boot folder may also contain a `SOFTWARE.bib` file.
3. After running the TAF function `taf.boot()`, data files must be present in the `boot/data` folder, corresponding to the declarations in the `DATA.bib` file. Similarly, after running `taf.boot()`, software files must be present in the `boot/software` folder if a `SOFTWARE.bib` file exists.
4. If files exist both in `boot/initial/data` and the `boot/data` folder, then the file contents should be identical.
5. Files and folders inside the `boot/data` folder must be declared in the `DATA.bib` file. Similarly, files and folders inside the `boot/software` folder must be declared in the `SOFTWARE.bib` file.
6. The main analysis folder should contain TAF scripts named `data.R`, `model.R`, `output.R`, and `report.R`. These scripts may call other R scripts and/or dynamic reports with file extensions such as `*.qmd`, `*.Rmd`, or `*.rmd`. For some analyses, the model script may be named `method.R`.
7. R code should use relative paths rather than absolute paths.
8. After running the TAF function `source.all()`, data files should be present in the `data` folder and output files should be present in the `output` folder. The `model` folder may contain intermediate results, and the `report` folder may contain final formatted results. For some analyses, the model folder may be named `method`.
9. To make it easy for the scientific community to browse and review the main data and results from the analysis, the file formats CSV (`*.csv` for tables), PNG (`*.png` for raster images), and PDF (`*.pdf` for vector images) should be used. These file formats are easy to open on any computer and are also easy to view on GitHub.

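
Some of the criteria above can be checked with a few lines of base R, without running the analysis. The sketch below is only an illustration: `check_static()` is a hypothetical helper and not part of the TAF package, it covers criteria 1, 2, 6, and 7 only, and the absolute-path test is a rough heuristic that may miss cases or flag harmless strings.

```r
# Hypothetical helper (not part of TAF): static checks on an analysis folder.
check_static <- function(dir = ".") {
  # Criterion 1: boot folder, or bootstrap for older analyses
  boot <- file.path(dir, c("boot", "bootstrap"))
  boot <- boot[dir.exists(boot)][1]  # NA if neither folder exists
  has_boot <- !is.na(boot)

  # Criterion 2: DATA.bib inside the boot folder
  has_bib <- has_boot && file.exists(file.path(boot, "DATA.bib"))

  # Criterion 6: TAF scripts, allowing method.R in place of model.R
  has <- function(f) file.exists(file.path(dir, f))
  has_scripts <- has("data.R") && (has("model.R") || has("method.R")) &&
    has("output.R") && has("report.R")

  # Criterion 7 (heuristic): look for quoted strings that start like an
  # absolute path, e.g. "/home/...", "~/..." or "C:/..."
  scripts <- list.files(dir, pattern = "\\.R$", full.names = TRUE)
  code <- unlist(lapply(scripts, readLines, warn = FALSE))
  absolute <- grepl("[\"'](/[A-Za-z]|~/|[A-Za-z]:[/\\\\])", code)

  c(boot = has_boot, data_bib = has_bib, scripts = has_scripts,
    relative_paths = !any(absolute))
}
```

Calling `check_static()` from the main analysis folder returns a named logical vector, with `FALSE` marking a criterion that deserves a closer look by hand.
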
The above criteria can be evaluated by examining files and folders without running or modifying any part of the analysis. In contrast, the following criteria (marked with *) are best evaluated by running the full analysis, which can take a long time, may require special software or user authorization, and in some edge cases may make irreversible changes to files.

10.* Regardless of which data and result files are stored online on GitHub, it should be possible to clone the analysis to a local computer and perform a full clean before rerunning the analysis successfully. A full clean consists of running the TAF functions `clean.boot(force=TRUE)` for the boot folder and `clean()` for the folders produced by the TAF scripts. A successful rerun of the analysis consists of running `taf.boot()` and `source.all()` without errors, producing the same or similar results as the original analysis.

11.* The TAF scripts (`data.R`, `model.R`, `output.R`, `report.R`) should run sequentially, with each script starting by reading files from a previous step and ending by writing out files.

12.* The `data.R` script should create the `data` folder and write files into that folder. Likewise, the `model.R`, `output.R`, and `report.R` scripts should create and write into the corresponding folders.
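
The full clean and rerun described in criterion 10 could be scripted roughly as follows. This is a sketch under assumptions: `rerun()` and `folders_present()` are hypothetical wrappers, not TAF functions, and only the four TAF calls named above are taken from the criteria. Because a rerun may change files, it is safest to try this on a fresh clone.

```r
# Hypothetical helper: which of the folders from criterion 12 exist?
# Pass model = "method" for analyses that use the older naming.
folders_present <- function(dir = ".", model = "model") {
  folders <- c("data", model, "output", "report")
  present <- dir.exists(file.path(dir, folders))
  names(present) <- folders
  present
}

# Hypothetical wrapper: full clean followed by a rerun (criterion 10).
# Requires the TAF package; the calls are the ones named in the criteria.
rerun <- function(dir = ".", model = "model") {
  owd <- setwd(dir)              # TAF functions work in the analysis folder
  on.exit(setwd(owd))
  TAF::clean.boot(force = TRUE)  # full clean of the boot folder
  TAF::clean()                   # remove folders produced by the TAF scripts
  TAF::taf.boot()                # repopulate boot/data and boot/software
  TAF::source.all()              # run the TAF scripts in sequence
  folders_present(".", model)    # report which folders were created
}
```

Whether the rerun produced the same or similar results as the original analysis still has to be judged separately, for example by comparing the CSV files in the `output` folder against the versions stored on GitHub.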