This project focuses on obtaining data, using R to clean the data, and finally creating a text file of the tidy data.
The x variable entries in the train and test data sets are measurements of human activity (such as walking_upstairs, walking_downstairs, sitting, etc.). These measurements were recorded from 30 subjects using a waist-mounted smartphone. The y variable entries in the train and test data sets are the activities the subjects performed.
To obtain the desired output, download the following R project and run the following R script:
- Coursera - R Training.Rproj
- run_analysis.R
The following data files are required to run the script:
- X_test.txt
- Y_test.txt
- subject_test.txt
- X_train.txt
- Y_train.txt
- subject_train.txt
Note: these files should be saved in the "raw" directory within the "getting_and_cleaning_data" directory.
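Before running the script, it can help to confirm that all six input files are in place. The following is a minimal sketch, not part of run_analysis.R: the helper name find_missing_files is hypothetical, the path to the "raw" directory is assumed to be relative to the working directory, and the search is recursive and case-insensitive because the exact nesting and capitalisation of the extracted files are not stated here.

```r
# Input files listed above.
required_files <- c("X_test.txt", "Y_test.txt", "subject_test.txt",
                    "X_train.txt", "Y_train.txt", "subject_train.txt")

# Hypothetical helper: return the required files that cannot be found
# anywhere under `raw_dir` (searched recursively, ignoring case).
find_missing_files <- function(raw_dir, files = required_files) {
  found <- basename(list.files(raw_dir, recursive = TRUE))
  files[!(tolower(files) %in% tolower(found))]
}

# Assumed location, relative to the current working directory.
raw_dir <- file.path("getting_and_cleaning_data", "raw")

missing_files <- find_missing_files(raw_dir)
if (length(missing_files) > 0) {
  message("Missing input files: ", paste(missing_files, collapse = ", "))
} else {
  message("All input files found.")
}
```

If any file is reported missing, check that the extracted data directory was placed inside "raw" rather than next to it.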
The following .txt file is created from running the code:
tidydata_full.txt
Software required to generate the necessary outputs for the DSD deliverable:
- R v 3.5+
- R Packages
  - dplyr
  - tidyr
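One possible way to check these requirements from an R session is sketched below. The helper name missing_packages is hypothetical, and the install step is left commented out so that nothing is downloaded without intent.

```r
# Packages named in the list above.
required_packages <- c("dplyr", "tidyr")

# Hypothetical helper: return the packages from `packages` that are not
# installed in the current library paths.
missing_packages <- function(packages) {
  installed <- rownames(installed.packages())
  setdiff(packages, installed)
}

# Warn if the running R version is older than 3.5.
if (getRversion() < "3.5.0") {
  warning("R 3.5 or newer is expected; found ", as.character(getRversion()))
}

to_install <- missing_packages(required_packages)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```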
- Download the Coursera - R Training.zip file and unzip it to any directory on your system.
- Download the data from the "Getting and Cleaning Data" project summary.
- Unzip the data, placing the extracted file directory into the "raw" directory within the "getting_and_cleaning_data" directory.
- Open the Coursera - R Training.Rproj project and the run_analysis.R file.
- Ensure the required packages above are installed prior to running the script.
- Run the script.
Once the data gets read into the environment, the script does the following:
- Merges the train and test data for each variable (x and y) using the rbind command.
- Subsets the data by extracting only the measurements whose names contain "mean" or "std", using the grepl command.
- Renames the columns of the data tables using appropriate labels.
- Creates a tidy data set and writes it to a .txt file in the "finished" directory.
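The steps above can be sketched on toy data as follows. This is an illustration under stated assumptions, not the contents of run_analysis.R: the small data frames stand in for the real train and test tables, the feature names and the "activity" label are illustrative, only base R is used, and the output goes to a temporary file rather than the "finished" directory. Any further reshaping the script performs to make the data tidy is not reproduced here.

```r
# Toy stand-ins for the train and test tables (the real ones are read from the .txt files).
x_train <- data.frame(V1 = c(0.28, 0.27), V2 = c(-0.98, -0.97), V3 = c(0.11, 0.12))
x_test  <- data.frame(V1 = 0.25, V2 = -0.99, V3 = 0.10)
y_train <- data.frame(V1 = c(5, 4))
y_test  <- data.frame(V1 = 5)

# Illustrative feature names, one per column of x (assumed for this sketch).
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X")

# Step 1: stack train on top of test, separately for x and y.
x_all <- rbind(x_train, x_test)
y_all <- rbind(y_train, y_test)

# Step 2: keep only the columns whose feature name contains "mean" or "std".
keep  <- grepl("mean|std", feature_names)
x_sub <- x_all[, keep, drop = FALSE]

# Step 3: replace the default column names with descriptive labels.
names(x_sub) <- feature_names[keep]
names(y_all) <- "activity"

# Step 4: combine the labels with the measurements and write a .txt file.
result   <- cbind(y_all, x_sub)
out_file <- file.path(tempdir(), "tidydata_sketch.txt")  # assumed output path
write.table(result, out_file, row.names = FALSE)
```

Here the third toy column is dropped because its name contains neither "mean" nor "std", so the written table has one label column and two measurement columns.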