STATA Vs R

STATA

Working Directory

cd

Stata reads files from, and saves files to, its current working directory unless you give a full path. You can display the working directory with pwd and change it with cd "path".

cd "C:\Users\Name\documents"

setwd

R reads files from, and saves files to, its current working directory unless you give a full path. You can display the working directory with getwd() and change it with setwd("path").

setwd("C:/Users/Name/documents")  # R treats \ in strings as an escape, so use / (or \\) in Windows paths
Installing Packages/user-written programs

ssc install

ssc install installs the user-written program xyz from the Statistical Software Components (SSC) archive.

ssc install xyz

install.packages()

install.packages() installs the package xyz from CRAN. If no CRAN mirror has been set, R may open a window asking you to choose one: select a site close to you and click OK.

install.packages("xyz")
Getting help

help

help displays the help file for a command, e.g. help tabulate.

help tabulate

? or help()

? opens the help page for a function or other object, here the plot() function. Typing help(plot) does the same.

?plot
help(plot)
Loading Data

use / import

use loads a Stata dataset (.dta file). import loads other file formats into Stata: import delimited reads CSV and other delimited text files, and import excel reads Excel workbooks.

use "./dataset.dta"
import delimited "./dataset.csv"

read_dta() / read_csv()

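R can hold any number of data sets in memory at once, so each one must be assigned to a name. For Stata files, read_dta() from the haven package (installed with the tidyverse, but loaded separately) is recommended because it preserves Stata variable and value labels. read_csv() comes from the readr package.

library(haven)  # provides read_dta()
library(readr)  # provides read_csv()
mydata <- read_dta("dataset.dta")
mydata2 <- read_csv("dataset.csv")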
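A minimal base-R sketch of holding several data sets in memory at once. So that it runs without any files on disk, it reads two tiny made-up CSVs from strings through the text argument of read.csv(); the object and column names are hypothetical.

```r
# Two small made-up CSV contents, supplied as strings instead of files
scores_csv <- "id,score\n1,10\n2,20\n3,30"
ages_csv   <- "id,age\n1,25\n2,31"

# Each data set is assigned its own name; both stay in memory together
scores <- read.csv(text = scores_csv)
ages   <- read.csv(text = ages_csv)

# Refer to each data set by its name
nrow(scores)  # 3
nrow(ages)    # 2
```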
Exploring Data

describe

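describe summarizes the structure of the dataset in memory: the number of observations and variables, and each variable's name, storage type, and label.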

describe

str(mydata)

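str() displays the structure of an object. For a data frame, it shows the number of observations and variables, and each column's type and first few values.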

str(mydata)

edit / browse

browse opens the Data Editor in browse (read-only) mode so you can look at the dataset in memory; edit opens it in edit mode. browse is a convenient alternative to list.

browse

View()

There are two ways to look at a data set (or another object, such as a vector) in RStudio. The first is to click on it in the Environment pane. That can be tedious if many objects are stored, so the second option is the View() function. Its syntax is simple:

View(mydata)

list

list displays the values of variables. If no varlist is specified, the values of all the variables are displayed. To show first 6 rows of the data, use list in 1/6

list
list in 1/6

head() and tail()

head() displays the first rows of a data frame and tail() the last rows. Both show 6 rows by default; the n argument changes that.

head(mydata, n = 6)
tail(mydata, n = 6)
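The Stata command list in 1/6 corresponds roughly to indexing rows 1 to 6 in R. A small sketch on a made-up 10-row data frame (the columns id and score are hypothetical):

```r
# Hypothetical toy data set with 10 rows
mydata <- data.frame(id = 1:10, score = seq(5, 50, by = 5))

head(mydata)         # first 6 rows (n = 6 is the default)
tail(mydata, n = 3)  # last 3 rows

# Closest analogue of Stata's `list in 1/6`: rows 1 to 6, all columns
first_six <- mydata[1:6, ]
first_six
```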

summarize

summarize calculates and displays a variety of univariate summary statistics. If no varlist is specified, summary statistics are calculated for all the variables in the dataset.

summarize

summary(mydata)

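summary() provides basic descriptive statistics for each variable: minimum, quartiles, mean, and maximum for numeric variables, and frequencies (counts) for factors.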

summary(mydata)
Renaming Variables

rename

rename changes the name of an existing variable old_varname to new_varname; the contents of the variable are unchanged.

rename old_name new_name
rename lastname last

rename()

rename() from the dplyr package changes the names of individual variables using new_name = old_name syntax; rename_with() renames columns using a function. Load dplyr with library(dplyr) before using rename().

library(dplyr)
mydata <- rename(mydata, new_name = old_name)
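If you prefer not to load a package, base R can rename a column by assigning into names(). A minimal sketch mirroring rename lastname last, on a hypothetical two-column data frame:

```r
# Hypothetical toy data frame
mydata <- data.frame(firstname = c("Ana", "Ben"), lastname = c("Diaz", "Ng"))

# Base-R equivalent of Stata's `rename lastname last`:
# locate the old name and overwrite it; the column contents are unchanged
names(mydata)[names(mydata) == "lastname"] <- "last"

names(mydata)  # "firstname" "last"
```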
Value Labels

label define, label values

label define defines a value label, and label values attaches it to one or more variables.

label define gender 1 "male" 2 "female"
label values sex gender

factor() or ordered()

Use factor() for nominal data and ordered() for ordinal data.

mydata$sex <- factor(mydata$sex, levels = c(1,2), labels = c("male", "female"))

mydata$var1 <- ordered(mydata$var1, levels = c(1,2,3,4), labels = c("Strongly agree", "Somewhat agree", "Somewhat disagree", "Strongly disagree"))
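A small sketch of what factor() and ordered() change, using made-up numeric codes instead of columns of mydata (sex_codes and agree_codes are hypothetical). Codes that are not listed in levels become NA.

```r
# Hypothetical numeric codes, as they might arrive from a CSV file
sex_codes   <- c(1, 2, 2, 1, 2)
agree_codes <- c(1, 3, 4, 2, 1)

# Nominal variable: attach labels to the codes 1 and 2
sex <- factor(sex_codes, levels = c(1, 2), labels = c("male", "female"))

# Ordinal variable: ordered() also records the ranking of the categories
agree <- ordered(agree_codes, levels = c(1, 2, 3, 4),
                 labels = c("Strongly agree", "Somewhat agree",
                            "Somewhat disagree", "Strongly disagree"))

table(sex)           # counts displayed with labels, like tab after label values
agree[1] < agree[2]  # TRUE: comparisons follow the level order
as.integer(sex)      # level positions 1 2 2 1 2, which match the original codes here
```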
Create a Variable

generate

In Stata, new variables are created with generate. The values of the new variable are specified by =exp.

generate x = 1

Creating a variable with a sequence of numbers from 1 to n (where ‘n’ is the total number of observations)

generate id = _n

Creating a variable with the total number of observations

generate total = _N

mydata$new_var <-

In R, a new variable is created by assigning values to a new column of the data frame with $. In base R:

mydata$x <- 1

Creating a variable with a sequence of numbers from 1 to n (where ‘n’ is the total number of observations)

mydata$id <- seq_len(nrow(mydata))

Creating a variable with the total number of observations

mydata$total <- nrow(mydata)
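The Stata and R versions side by side on a hypothetical four-observation data set, including a variable built from an expression (the name double_score is made up for illustration):

```r
# Hypothetical toy data set with 4 observations
mydata <- data.frame(score = c(12, 7, 15, 9))

mydata$x <- 1                            # generate x = 1 (the scalar is recycled)
mydata$double_score <- mydata$score * 2  # generate double_score = score * 2
mydata$id <- seq_len(nrow(mydata))       # generate id = _n
mydata$total <- nrow(mydata)             # generate total = _N

mydata
```

seq_len(nrow(mydata)) is used rather than seq(nrow(mydata)) because seq(0) returns c(1, 0), which would fail on an empty data frame.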
Drop and Keep Variables

drop

drop removes one or more variables from the dataset. If we have variables var1, var2, and var3 and want to remove var2, the command is:

drop var2

To remove both var2 and var3:

drop var2 var3

NULL/subset()

In R, there are several ways to remove variables (columns). If a data frame called mydata contains var1, var2, and var3 and we want to remove var2, we can set it to NULL:

mydata$var2 <- NULL

subset() with the select argument can also remove variables. To remove var2:

mydata <- subset(mydata, select = -var2)

To remove multiple variables, e.g. var2 and var3:

mydata <- subset(mydata, select = -c(var2, var3))

keep

keep retains only the listed variables and removes all others. To keep only var1 and var3:

keep var1 var3

subset()

subset() with select can also keep variables. To keep only var1 and var3:

mydata <- subset(mydata, select = c(var1, var3))
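A sketch comparing subset() with plain bracket indexing on a made-up three-column data frame. When only one column is kept, drop = FALSE keeps the result a data frame instead of a bare vector.

```r
# Hypothetical toy data frame with three variables
mydata <- data.frame(var1 = 1:3,
                     var2 = c("a", "b", "c"),
                     var3 = c(2.5, 3.5, 4.5))

# drop var2: subset() and bracket indexing keep the same columns
drop_a <- subset(mydata, select = -var2)
drop_b <- mydata[, setdiff(names(mydata), "var2")]

# keep var1 var3
keep_a <- subset(mydata, select = c(var1, var3))
keep_b <- mydata[, c("var1", "var3")]

# keep a single variable: without drop = FALSE, R returns a plain vector
keep_one <- mydata[, "var1", drop = FALSE]

names(keep_b)  # "var1" "var3"
```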
Tabulate or Crosstab frequencies

tabulate/tab

One-way frequency tables

tab gender
tab read

Two-way frequency tables. The row and col options add row and column percentages.

tab gender read, col row

table()

One-way frequency tables

table(mydata$gender)
table(mydata$read)

Two-way frequency tables. In R, we store the cross-tabulation as an object and pass it to prop.table() to get row, column, or total proportions.

readgender <- table(mydata$read, mydata$gender)
prop.table(readgender, 1) # Row
prop.table(readgender, 2) # Column
prop.table(readgender) # Total
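Stata's tab reports percentages and totals. One way to get a similar display in R, sketched on made-up data for six respondents: multiply the proportions by 100 and add totals with addmargins().

```r
# Hypothetical toy data: 6 respondents
mydata <- data.frame(
  gender = c("male", "female", "female", "male", "female", "male"),
  read   = c("low", "high", "high", "high", "low", "low")
)

readgender <- table(mydata$read, mydata$gender)

# Percentages rounded to one decimal, similar to tab ..., row col
row_pct <- round(100 * prop.table(readgender, 1), 1)  # each row sums to 100
col_pct <- round(100 * prop.table(readgender, 2), 1)  # each column sums to 100
row_pct
col_pct

# addmargins() appends a "Sum" row and column, like Stata's Total
addmargins(readgender)
```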
