Stata vs R


    Stata
    R
  1. Working Directory
    cd

    Stata reads and writes files relative to its current working directory. Change the working directory with cd "path".

    cd "C:\Users\Name\documents"

    setwd

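    R reads and writes files relative to the current working directory. Change the working directory with setwd("path"). On Windows, write the path with forward slashes (or doubled backslashes), because a single backslash starts an escape sequence in an R string.

    setwd("C:/Users/Name/documents")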
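    As a minimal sketch of checking and switching the working directory in R: getwd() plays the role of Stata's pwd. The temporary directory is only a stand-in for a real project folder, and old_dir and new_dir are illustrative names.

```r
# Remember the current working directory so it can be restored later
old_dir <- getwd()

# Switch to a directory that is guaranteed to exist
# (a stand-in for a real project folder)
setwd(tempdir())
new_dir <- getwd()
print(new_dir)

# Restore the original working directory
setwd(old_dir)
```

    Restoring the original directory at the end keeps later relative paths in the same session from silently pointing somewhere else.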

  2. Installing Packages/user-written programs
    ssc install

    ssc install installs the user-written program xyz from the SSC archive.

    ssc install xyz

    install.packages()

    This installs the package xyz. If a window pops up, select a mirror site to download from (the one closest to you) and click OK. Once installed, load the package in each session with library(xyz).

    install.packages("xyz")

  3. Getting help
    help

    help displays the documentation for a command, e.g. help tabulate.

    help tabulate

    ? or help()

    ? or help() displays the documentation for an object, in this case the plot function. The two forms are equivalent.

    ?plot
    help(plot)

  4. Loading Data
    use / import

    use loads a Stata data file (.dta). import loads other formats, such as Excel (import excel) or CSV (import delimited).

    use "./dataset.dta"
    import delimited "./dataset.csv"

    read_dta() / read_csv()

    R can hold any number of datasets in memory at once, so each one must be assigned a name. read_dta() from the haven package (part of the tidyverse) is recommended for Stata files because it preserves Stata labels. read.csv() is base R; read_csv() from the readr package is the tidyverse equivalent.

    library(haven)
    mydata <- read_dta("dataset.dta")
    mydata2 <- read.csv("dataset.csv")
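    To try the loading step without an existing file, the sketch below writes a small data frame to a temporary CSV and reads it back with base R. The toy values and the names toy and csv_path are made up for illustration.

```r
# Toy data frame standing in for a real dataset (hypothetical values)
toy <- data.frame(id = 1:3, score = c(10.5, 8.0, 9.25))

# Write it to a temporary CSV file, then read it back under a new name
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)
mydata2 <- read.csv(csv_path)

# Inspect what was loaded
str(mydata2)
```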

  5. Exploring Data
    describe

    Provides the structure of the dataset

    describe

    str(mydata)

    Provides the structure of the dataset

    str(mydata)

    edit / browse

    browse opens the Data Editor in read-only mode to inspect the dataset in memory; edit opens it with editing enabled. browse is a convenient alternative to list.

    browse

    View()

    There are two ways to inspect a dataset (or other objects, such as vectors) in RStudio. The first is to click on the object in the Environment tab. That becomes tedious when many datasets are stored, so the second option is the View() function (note the capital V). Its syntax is simple:

    View(mydata)

    list

    list displays the values of variables. If no varlist is specified, the values of all the variables are displayed. To show the first 6 rows of the data, use list in 1/6.

    list
    list in 1/6

    head() and tail()

    head(mydata) displays the first 6 rows and tail(mydata) the last 6. Six is the default; a second argument changes the number of rows.

    head(mydata, 6)
    tail(mydata, 6)

    summarize

    summarize calculates and displays a variety of univariate summary statistics. If no varlist is specified, summary statistics are calculated for all the variables in the dataset.

    summarize

    summary(mydata)

    Provides basic descriptive statistics and frequencies

    summary(mydata)
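    The exploration commands above can be tried on any data frame. The sketch below uses mtcars, a dataset that ships with R, as a stand-in for mydata.

```r
# mtcars ships with R (datasets package); it stands in for mydata here
mydata <- mtcars

str(mydata)            # structure: variable names, types, first values
first_rows <- head(mydata)      # first 6 rows (the default)
last_rows  <- tail(mydata, 3)   # last 3 rows
print(first_rows)
print(last_rows)

summary(mydata$mpg)    # summary statistics for a single variable
```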

  6. Renaming Variables
    rename

    rename changes the name of an existing variable old_varname to new_varname; the contents of the variable are unchanged.

    rename old_name new_name
    rename lastname last

    rename()

    rename() changes the names of individual variables using new_name = old_name syntax; rename_with() renames columns using a function. Both are in the dplyr package, so load it first.

    library(dplyr)
    mydata <- rename(mydata, new_name = old_name)
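    If no extra package is available, a column can also be renamed in base R by assigning to names(). This is an alternative sketch, not the approach described above; the toy data frame is hypothetical and mirrors the Stata example rename lastname last.

```r
# Toy data frame (hypothetical values)
mydata <- data.frame(lastname = c("Smith", "Jones"), age = c(34, 29))

# Replace the name of the column currently called "lastname"
names(mydata)[names(mydata) == "lastname"] <- "last"

names(mydata)
```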

  7. Value Labels
    label define, label values

    label define defines a value label, and label values assigns it to a variable.

    label define gender 1 "male" 2 "female"
    label values sex gender

    factor() or ordered()

    Use factor() for nominal data and ordered() for ordinal data.

    mydata$sex <- factor(mydata$sex, levels = c(1,2), labels = c("male", "female"))

    mydata$var1 <- ordered(mydata$var1, levels = c(1,2,3,4), labels = c("Strongly agree", "Somewhat agree", "Somewhat disagree", "Strongly disagree"))
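    As a small illustration of the difference between the two, the sketch below labels a toy data frame (hypothetical values) and shows that an ordered factor supports comparisons between levels, which a plain factor does not.

```r
# Toy data with numeric codes (hypothetical values)
mydata <- data.frame(sex = c(1, 2, 2, 1), var1 = c(4, 1, 3, 2))

# Nominal variable: attach labels to the codes
mydata$sex <- factor(mydata$sex, levels = c(1, 2),
                     labels = c("male", "female"))

# Ordinal variable: labels plus an ordering of the levels
mydata$var1 <- ordered(mydata$var1, levels = c(1, 2, 3, 4),
                       labels = c("Strongly agree", "Somewhat agree",
                                  "Somewhat disagree", "Strongly disagree"))

table(mydata$sex)

# Ordered factors can be compared against a level
below <- mydata$var1 < "Somewhat disagree"
print(below)
```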

  8. Create a Variable
    generate

    In Stata, generate creates a new variable. The values of the variable are specified by =exp.

    generate x = 1

    Creating a variable with a sequence of numbers from 1 to n (where ‘n’ is the total number of observations)

    generate id = _n

    Creating a variable with the total number of observations

    generate total = _N

    dataset$new_var

    In R, we assign variables inside the data we want to use. We can do this with base R:

    mydata$x <- 1

    Creating a variable with a sequence of numbers from 1 to n (where ‘n’ is the total number of observations)

    mydata$id <- seq_len(nrow(mydata))

    Creating a variable with the total number of observations

    mydata$total <- nrow(mydata)
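    Put together on a toy data frame (hypothetical values), the three assignments look like the sketch below. It uses nrow() and seq_len(), the base R helpers for the row count and for a 1..n sequence.

```r
# Toy data frame with three observations (hypothetical values)
mydata <- data.frame(score = c(90, 75, 82))

mydata$x     <- 1                        # constant, like generate x = 1
mydata$id    <- seq_len(nrow(mydata))    # 1..n, like generate id = _n
mydata$total <- nrow(mydata)             # n in every row, like generate total = _N

print(mydata)
```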

  9. Drop and Keep Variables
    drop

    drop removes variable(s) from the dataset. If the dataset contains var1, var2, and var3, and we want to remove var2, the command is:

    drop var2

    Or, to remove both var2 and var3:

    drop var2 var3

    NULL/subset()

    R offers several ways to remove variables (columns). If the data frame mydata contains var1, var2, and var3, and we want to remove var2, we can set it to NULL.

    mydata$var2 <- NULL

    subset() with a negated select argument also removes variables. To remove var2:

    mydata <- subset(mydata, select = -var2)

    To remove multiple variables, e.g. var2 and var3:

    mydata <- subset(mydata, select = -c(var2, var3))

    keep

    keep retains only the listed variables and removes all others. To keep only var1 and var3:

    keep var1 var3

    subset()

    The select argument of subset() can also keep variables. To keep only var1 and var3:

    mydata <- subset(mydata, select = c(var1, var3))
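    The drop and keep idioms can be compared side by side on a toy data frame (hypothetical values). The last line shows a name-based alternative that is convenient when the column names are stored in a character vector; the result names dropped, kept, and dropped_by_name are illustrative.

```r
# Toy data frame with three columns (hypothetical values)
mydata <- data.frame(var1 = 1:3, var2 = 4:6, var3 = 7:9)

# Drop var2 (like Stata's: drop var2)
dropped <- subset(mydata, select = -var2)

# Keep var1 and var3 (like Stata's: keep var1 var3)
kept <- subset(mydata, select = c(var1, var3))

# Drop by name using a character vector of column names
to_drop <- c("var2", "var3")
dropped_by_name <- mydata[, !(names(mydata) %in% to_drop), drop = FALSE]

names(dropped)
names(kept)
names(dropped_by_name)
```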

  10. Tabulate or Crosstab frequencies
    tabulate/tab

    One-way tables of frequencies

    tab gender
    tab read

    Two-way tables of frequencies

    tab gender read, col row

    table()

    One-way tables of frequencies

    table(mydata$gender)
    table(mydata$read)

    Two-way tables of frequencies. In R, we store the two-way table in an object and then pass it to prop.table() to see row, column, or overall proportions.

    readgender <- table(mydata$read, mydata$gender)
    prop.table(readgender, 1) # Row
    prop.table(readgender, 2) # Column
    prop.table(readgender) # Total
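    As a worked sketch on toy data (hypothetical values), the code below builds the same two-way table and checks how the margin argument behaves: with margin 1 each row sums to 1, and with margin 2 each column does. addmargins() appends the row and column totals that Stata prints by default.

```r
# Toy data (hypothetical values)
mydata <- data.frame(
  gender = c("female", "male", "female", "male", "female", "male"),
  read   = c("high", "low", "high", "high", "low", "low")
)

# Rows are read, columns are gender
readgender <- table(mydata$read, mydata$gender)
print(readgender)

row_props <- prop.table(readgender, 1)   # each row sums to 1
col_props <- prop.table(readgender, 2)   # each column sums to 1
print(row_props)
print(col_props)

# Counts with row and column totals added
addmargins(readgender)
```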