7/24/2015 - 10:18 AM

R Cheatsheet.

# by(data, factorlist, function)
by(pf$friend_count, pf$gender, summary)
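Since pf is not available here, this sketch runs by() on the built-in iris data set instead. It splits Sepal.Length by Species and applies summary() to each group. The tapply() call is shown only as a base R alternative that also groups by a factor.

```r
# by(data, INDICES, FUN): split `data` by the factor INDICES and apply FUN to each piece
sepal_by_species <- by(iris$Sepal.Length, iris$Species, summary)
print(sepal_by_species)   # one summary block per species

# Base R alternative: tapply() also applies a function per group
sepal_tapply <- tapply(iris$Sepal.Length, iris$Species, summary)
```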

# Getting logical
pf$mobile_check_in <- ifelse(pf$mobile_likes > 0, 1, 0)  # 1 if the user has any mobile likes, else 0
percent_mobile <- sum(pf$mobile_check_in) / length(pf$mobile_check_in) * 100
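The same percentage can be computed without the 1/0 recoding, because R treats TRUE as 1 and FALSE as 0. This sketch uses the built-in mtcars data (share of cars with a manual transmission) as a stand-in for pf and checks that both approaches agree. If the flagged column can contain NA, pass na.rm = TRUE to sum() or mean().

```r
# Flag manual-transmission cars (am == 1) in the built-in mtcars data
is_manual <- mtcars$am == 1                 # logical vector: TRUE / FALSE

# Approach from the cheatsheet: recode to 1/0, then sum and divide
pct_ifelse <- sum(ifelse(is_manual, 1, 0)) / length(is_manual) * 100

# Shorter equivalent: mean() of a logical vector is the proportion of TRUE values
pct_mean <- mean(is_manual) * 100

pct_mean  # 40.625 (13 of 32 cars)
```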

# Take a random sample of households and analyze it
set.seed(4231)
sample.ids <- sample(levels(yo$id), 16)
# Pick 16 household IDs at random (yo$id must be a factor so that levels() returns the IDs)

ggplot(aes(x = time, y = price),
       data = subset(yo, id %in% sample.ids)) +
  facet_wrap(~id) +
  geom_line() +
  geom_point(aes(size = all.purchases), pch = 1)
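The sampling step above can be tried without the yo data. This sketch uses the built-in ChickWeight data, whose Chick column is already a factor, to draw 16 IDs, keep only their rows with %in%, and confirm that re-seeding reproduces the draw. The exact IDs drawn depend on the R version's RNG settings (R 3.6.0 changed the default sampling algorithm). If an ID column is numeric, convert it with factor() first, since levels() returns NULL for non-factors.

```r
# Built-in data; as.data.frame() drops the groupedData subclass
chicks <- as.data.frame(ChickWeight)

set.seed(4231)
sample.ids <- sample(levels(chicks$Chick), 16)   # 16 distinct chick IDs

# Keep only the rows whose ID was drawn; %in% tests membership element-wise
chick_sample <- subset(chicks, Chick %in% sample.ids)

# Re-running with the same seed draws the same IDs
set.seed(4231)
same.ids <- sample(levels(chicks$Chick), 16)
identical(sample.ids, same.ids)  # TRUE
```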
  
# Scatterplot Matrix
install.packages('GGally')
library(GGally)

set.seed(1836) # Fix the seed so the 1000-row sample below is reproducible
pf_subset <- pf[ , c(2:15)]
names(pf_subset)
ggpairs(pf_subset[sample.int(nrow(pf_subset), 1000), ])
ggpairs(pf_subset[sample.int(nrow(pf_subset), 1000), ], axisLabels = 'internal')
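Sampling rows before drawing a scatterplot matrix keeps the plot fast on large data. As a sketch that needs no extra packages, base R's pairs() can do the same job. Here iris stands in for pf_subset, and 50 rows are drawn because iris only has 150.

```r
# Scatterplot matrix from a random subset of rows (base R only)
set.seed(1836)
row_ids <- sample.int(nrow(iris), 50)      # 50 distinct row indices
iris_sample <- iris[row_ids, 1:4]          # keep the four numeric columns

pairs(iris_sample, main = "iris: 50 sampled rows")
```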

libraries.md


Useful R libraries

ggplot2 Visualization library
magrittr Library for using the pipe operator %>% (RStudio shortcut: Cmd+Shift+M)
tidyr & dplyr Data wrangling with R
pander Render R objects into Pandoc's markdown
ggthemes Themes for ggplot2 library
gridExtra Arrange several plots in one figure with grid.arrange(p1, p2, ..., ncol = 1)
scales Implement scales in a way that is graphics system agnostic

To install a new package and use it:

install.packages('name_of_the_package', dependencies = TRUE)
library(name_of_the_package)
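To avoid reinstalling a package every time a script runs, a common pattern is to install only when the package is missing. use_package() below is a hypothetical helper name, not part of any library; it relies on requireNamespace() and library(character.only = TRUE), both from base R.

```r
# Hypothetical helper: install `pkg` only if it is missing, then attach it
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
  }
  library(pkg, character.only = TRUE)   # character.only: treat `pkg` as a string
}

use_package("stats")   # ships with R, so nothing is downloaded
```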

R.md


R Cheatsheet

General

getwd() Get Working Directory
setwd('~/Downloads') Set Working Directory
ls() List variables in the Environment
dir() List files (and folders) in the Working Directory; an alias of list.files()
list.files() List files in the Working Directory
rm('variable1') Remove variable1 from the Environment
rm(list = ls()) Remove all variables from the Environment
identical(data1, data2) Test whether two objects are exactly equal
colnames(data) Get column names (also names(df) on data frames)
rownames(data) Get row names
data(name_dataset) Load a built-in data set into the Environment
Execute script from terminal: Rscript my_script.R
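A short sketch of the Environment helpers and of how strict identical() is. The variable names are arbitrary. identical() also compares storage type, so an integer vector and a double vector with the same values are not identical, while all.equal() compares values with a numeric tolerance.

```r
var1 <- 1:3              # integer vector
var2 <- c(1L, 2L, 3L)    # also integer
var3 <- c(1, 2, 3)       # double vector with the same values

identical(var1, var2)            # TRUE: same type and values
identical(var1, var3)            # FALSE: integer vs double
isTRUE(all.equal(var1, var3))    # TRUE: values match within tolerance

"var3" %in% ls()         # TRUE
rm('var3')
exists('var3')           # FALSE
```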

Load Data

read.csv('file.csv') Read from CSV to data.frame
read.csv('file.tsv', sep = '\t') Read from TSV to data.frame
alumni <- read.csv(path_alumni, na.strings = c('-'), colClasses = c('character', 'character', 'numeric', 'numeric'))
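To see what na.strings and colClasses do, this sketch writes a small made-up CSV to a temporary file (the names and values are illustrative only) and reads it back. The '-' placeholder becomes NA in a numeric column. For tab-separated files, read.delim() is the counterpart of read.csv() with sep = '\t' as its default.

```r
# Made-up sample file, written to a temp path so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,city,year,score",
             "Ana,Lima,2014,3.5",
             "Bruno,Quito,-,4.0"), csv_path)

alumni <- read.csv(csv_path,
                   na.strings = c('-'),   # treat '-' as a missing value
                   colClasses = c('character', 'character', 'numeric', 'numeric'))
str(alumni)   # Bruno's year is NA

# Round-trip through a TSV file and read it with read.delim()
tsv_path <- tempfile(fileext = ".tsv")
write.table(alumni, tsv_path, sep = '\t', row.names = FALSE)
alumni_tsv <- read.delim(tsv_path)
```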

Data Frames

subset(df, <condition>) Example: subset(statesInfo, state.region == 1)
df[ROWS, COLUMNS]
- Example: statesInfo[statesInfo$state.region == 1, ]
- Example: statesInfo[statesInfo$state.region == 1 & statesInfo$population > 3000, ]
nrow(df) Number of rows
ncol(df) Number of columns
by(data, factorlist, function) Example: by(pf$friend_count, pf$gender, summary)
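statesInfo is not bundled with R, so this sketch builds a similar data frame from the built-in state.x77 matrix, where Population is in thousands. It stores region as the integer code of state.region, whose first level (code 1) is "Northeast", and shows that subset() and bracket indexing select the same rows.

```r
# Stand-in for statesInfo built from R's state data sets
states <- data.frame(state.x77, region = as.integer(state.region))

# Two equivalent ways to keep Northeast states with Population above 3000 (thousands)
ne_subset  <- subset(states, region == 1 & Population > 3000)
ne_bracket <- states[states$region == 1 & states$Population > 3000, ]

rownames(ne_subset)
identical(rownames(ne_subset), rownames(ne_bracket))  # TRUE
c(nrow(ne_subset), ncol(ne_subset))
```

One difference worth knowing: subset() drops rows where the condition is NA, whereas df[condition, ] returns a row of NAs for each NA in the condition.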

Data Overview

str(data) Structure of the data
summary(data) Summary of the data
head(data) First 6 rows
tail(data) Last 6 rows
For factor variables (categoricals)
- table(variable)
- levels(variable)
- reddit$age.range <- ordered(reddit$age.range, levels = c('Under 18', '18-24', '25-34', '35-44', '45-54', '55-64', '65 or Above'))
- reddit$income.range <- factor(reddit$income.range, levels = c("Under $20,000", "$20,000 - $29,999", "$30,000 - $39,999", "$40,000 - $49,999", "$50,000 - $69,999", "$70,000 - $99,999", "$100,000 - $149,999", "$150,000 or more"), ordered = T)
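Why set the levels explicitly: by default factor() sorts levels alphabetically, which scrambles categories like age or income ranges. This small sketch with made-up size labels shows that ordered() is shorthand for factor(..., ordered = TRUE), that table() follows the level order, and that ordered factors support comparisons.

```r
sizes <- c('Medium', 'Small', 'Large', 'Small')

levels(factor(sizes))    # "Large" "Medium" "Small" (alphabetical by default)

# Fix the level order explicitly
size_f <- ordered(sizes, levels = c('Small', 'Medium', 'Large'))
identical(size_f, factor(sizes, levels = c('Small', 'Medium', 'Large'), ordered = TRUE))  # TRUE

table(size_f)            # counts in level order: Small 2, Medium 1, Large 1
size_f[1] < size_f[3]    # TRUE: "Medium" < "Large"
```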

Update packages

update.packages(ask=FALSE, checkBuilt = TRUE)

Load R script from GitHub gists

library(devtools)
source_gist("524eade46135f6348140", filename = "ggplot_smooth_func.R")
