This is a repository for the Course Project of the "Getting and Cleaning Data" Coursera course. The course is available at https://class.coursera.org/getdata-034 and is part of the Data Science Specialization.
The repo contains a script, "run_analysis.R", which carries out the tasks of the course project.
The script unzips the Samsung data set from the working directory and loads its subsets (test, train and support sets) into R objects. The script also contains commented-out code that downloads the Samsung data set from the link given in the assignment instructions. It then carries out the five tasks of the assignment:
- Merges the training and the test sets to create one data set.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Adds descriptive activity names to name the activities in the data set.
- Appropriately labels the data set with descriptive variable names.
- From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
The resulting sets are:
- For tasks 1 to 4 - resultSet
- For task 5 - averageSet
At the very end the script writes the averageSet to an averageSet.txt file.
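To check the output, the file can be read back into R. The snippet below is a minimal sketch, not the script's own code: it assumes the file is written with write.table() with a header row and without row names, and it uses a hypothetical three-column stand-in for averageSet and a temporary file path so that it runs on its own.

```r
# Hypothetical stand-in for the averageSet produced by the script
averageSet <- data.frame(activity = c("SITTING", "WALKING"),
                         subject = c(1, 1),
                         meanX = c(0.3, 0.2),
                         stringsAsFactors = FALSE)

# Assumption: the file has a header row and no row names
outFile <- file.path(tempdir(), "averageSet.txt")
write.table(averageSet, outFile, row.names = FALSE)

# Read it back; header = TRUE restores the variable names
tidy <- read.table(outFile, header = TRUE, stringsAsFactors = FALSE)
str(tidy)
```

With the real output, replacing outFile with "averageSet.txt" should be enough, provided the assumptions above hold.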
The script uses the dplyr package (https://cran.r-project.org/web/packages/dplyr/index.html) to build the resulting data set. In detail, it works as follows:
- First, after unzipping the data set, the script reads Features (used to name the variables) and Activity Labels (used to label the activities) into R objects with the read.table() function.
- Then it prepares the Test and Training sets: it reads the measurements, activities and subjects from their separate files with read.table() and combines them with cbind().
- Combines the Test and Training sets into one set named fullSet with rbind().
- Assigns descriptive names to the variables of fullSet.
- Keeps only the mean and standard deviation variables, that is, those whose names contain "mean()" or "std()".
- Adds descriptive activity names to the data set by merging it with the Activity Labels set on the activity ids.
- Makes a new data set with the average of each variable for each activity and each subject by piping group_by() into summarise_each() from the dplyr package.
- At the very end the script writes the averageSet to an averageSet.txt file.
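The steps above can be sketched on toy data. This is an illustration under stated assumptions, not the code of run_analysis.R: the small data frames stand in for what read.table() would return, the column names subject, activityId and activity are hypothetical, and summarise(across(...)) is used in place of summarise_each(), which is deprecated in current dplyr (across() needs dplyr 1.0.0 or later).

```r
library(dplyr)

# Toy stand-ins for the Features and Activity Labels files
features <- data.frame(id = 1:3,
                       name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                                "tBodyAcc-max()-X"),
                       stringsAsFactors = FALSE)
activityLabels <- data.frame(activityId = 1:2,
                             activity = c("WALKING", "SITTING"),
                             stringsAsFactors = FALSE)

# Each subset: subjects, activities and measurements combined with cbind()
testSet <- cbind(data.frame(subject = c(1, 1)),
                 data.frame(activityId = c(1, 2)),
                 data.frame(V1 = c(0.1, 0.3), V2 = c(1, 3), V3 = c(9, 9)))
trainSet <- cbind(data.frame(subject = c(1, 2)),
                  data.frame(activityId = c(1, 2)),
                  data.frame(V1 = c(0.3, 0.5), V2 = c(3, 5), V3 = c(9, 9)))

# Stack the two subsets and name the measurement columns after the features
fullSet <- rbind(testSet, trainSet)
names(fullSet) <- c("subject", "activityId", features$name)

# Keep only variables whose names contain the literal "mean()" or "std()"
keep <- grepl("mean()", names(fullSet), fixed = TRUE) |
  grepl("std()", names(fullSet), fixed = TRUE)
resultSet <- fullSet[, c("subject", "activityId", names(fullSet)[keep])]

# Attach the descriptive activity names by activity id
resultSet <- merge(resultSet, activityLabels, by = "activityId")

# Average every remaining variable for each activity and each subject
averageSet <- resultSet %>%
  select(-activityId) %>%
  group_by(activity, subject) %>%
  summarise(across(everything(), mean), .groups = "drop")

print(averageSet)
```

Passing fixed = TRUE to grepl() makes the parentheses match literally; without it "mean()" would be treated as a regular expression and would also match names such as meanFreq().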