Getting and Cleaning Data Course Project Repository

This is a repository for the Course Project of the "Getting and Cleaning Data" course on Coursera. The course is available at https://class.coursera.org/getdata-034 and is part of the Data Science Specialization.

The script's purpose

The repo contains a script, "run_analysis.R", which carries out the course project task for the "Getting and Cleaning Data" course on Coursera.

The script's structure

In the first part:

The script unzips the Samsung data set found in the working directory and loads its subsets (test, training and supporting sets) into R objects. The script also contains commented-out code that downloads the Samsung data set from the link given in the assignment instructions.

In the second part it performs the following five steps:

1. Merges the training and the test sets to create one data set.
2. Extracts only the measurements on the mean and standard deviation for each measurement.
3. Adds descriptive activity names to name the activities in the data set.
4. Appropriately labels the data set with descriptive variable names.
5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

The script's results

The resulting sets are:

For steps 1 to 4 - resultSet
For step 5 - averageSet

At the very end the script writes the averageSet to an averageSet.txt file.

Script details

Packages used

The script uses the dplyr package (https://cran.r-project.org/web/packages/dplyr/index.html) to build the resulting data set.

Part 1

First, after unzipping the data set, the script reads Features (used to name the variables) and Activity Labels (used to label the activities) into R objects with the read.table function.
Then it prepares the Test and Training sets by reading the measurements, activities and subjects from separate files with read.table and combining them with cbind().
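
The loading step described above can be sketched roughly as follows. This is a minimal illustration and not the code from run_analysis.R: the file names, the column names (featureId, activityId, subjectId and so on) and the tiny made-up values are all assumptions, written to a temporary directory so the example runs on its own.

```r
# Toy stand-ins for the unzipped data set. File names, column names and
# values are hypothetical; this README does not list the real ones.
toyDir <- tempfile("toy_dataset_")
dir.create(toyDir)
writeLines(c("1 tBodyAcc-mean()-X", "2 tBodyAcc-std()-X", "3 tBodyAcc-energy()-X"),
           file.path(toyDir, "features.txt"))
writeLines(c("1 WALKING", "2 SITTING"),
           file.path(toyDir, "activity_labels.txt"))
writeLines(c("0.1 0.2 0.3", "0.4 0.5 0.6"), file.path(toyDir, "X_test.txt"))
writeLines(c("1", "2"), file.path(toyDir, "y_test.txt"))
writeLines(c("7", "9"), file.path(toyDir, "subject_test.txt"))

# Supporting sets: feature names and activity labels.
features <- read.table(file.path(toyDir, "features.txt"),
                       col.names = c("featureId", "featureName"),
                       stringsAsFactors = FALSE)
activityLabels <- read.table(file.path(toyDir, "activity_labels.txt"),
                             col.names = c("activityId", "activityName"),
                             stringsAsFactors = FALSE)

# One subset (test): measurements, activities and subjects come from
# separate files and are combined column-wise.
testMeasures   <- read.table(file.path(toyDir, "X_test.txt"))
testActivities <- read.table(file.path(toyDir, "y_test.txt"),
                             col.names = "activityId")
testSubjects   <- read.table(file.path(toyDir, "subject_test.txt"),
                             col.names = "subjectId")
testSet <- cbind(testSubjects, testActivities, testMeasures)
```

The training set would be prepared the same way from its own three files, so that both sets end up with identical column layouts before they are stacked.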

Part 2

The script combines the Test and Training sets into one set named fullSet with rbind().
Assigns descriptive names to the variables of fullSet.
Extracts only the mean and standard deviation variables by keeping only those variables whose names contain "mean()" or "std()".
Adds descriptive activity names to the data set by merging it with the Activity Labels set on the activity ids.
Makes a new data set with the average of each variable for each activity and each subject by piping group_by() into summarise_each() from the dplyr package.
At the very end the script writes the averageSet to an averageSet.txt file.
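
The steps above can be sketched on toy data as shown below. This is an illustration under stated assumptions, not the script itself: the column names (subjectId, activityId, activityName), the feature names and all values are made up, and the output goes to a temporary directory. One deliberate substitution: summarise_each() is deprecated in current dplyr releases, so the sketch uses summarise(across(...)) (dplyr 1.0.0 or later) to compute the same per-group means.

```r
library(dplyr)

# Toy inputs; every name and value here is hypothetical.
featureNames <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-energy()-X")
activityLabels <- data.frame(activityId = 1:2,
                             activityName = c("WALKING", "SITTING"),
                             stringsAsFactors = FALSE)
testSet  <- data.frame(subjectId = c(1, 1), activityId = c(1, 2),
                       V1 = c(0.1, 0.3), V2 = c(1, 2), V3 = c(9, 9))
trainSet <- data.frame(subjectId = c(1, 2), activityId = c(1, 2),
                       V1 = c(0.3, 0.5), V2 = c(3, 4), V3 = c(9, 9))

# Combine the test and training rows into one set.
fullSet <- rbind(testSet, trainSet)

# Assign descriptive variable names taken from the feature list.
names(fullSet) <- c("subjectId", "activityId", featureNames)

# Keep the id columns plus variables whose names contain "mean()" or "std()".
# fixed = TRUE matches the parentheses literally instead of as a regex.
isMeanOrStd <- grepl("mean()", names(fullSet), fixed = TRUE) |
  grepl("std()", names(fullSet), fixed = TRUE)
resultSet <- fullSet[, c("subjectId", "activityId", names(fullSet)[isMeanOrStd])]

# Attach descriptive activity names by merging on the activity id.
resultSet <- merge(resultSet, activityLabels, by = "activityId")

# Average every remaining variable for each activity and each subject.
averageSet <- resultSet %>%
  select(-activityId) %>%
  group_by(activityName, subjectId) %>%
  summarise(across(everything(), mean), .groups = "drop")

# Write the averages to a text file (temporary location in this sketch).
outFile <- file.path(tempdir(), "averageSet.txt")
write.table(averageSet, outFile, row.names = FALSE)
```

Note that merge() may reorder rows, which does not matter here because the grouping step that follows does not depend on row order.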
