Be sure to save all your files (especially the input data) in the folder in which you started your project, or in one of its subfolders. To let R open files, use the forward slash (/) in file paths, not the backslash (\). Example:
a=read.csv2("data/econometrics.csv") # reads a csv file saved in the data folder. The file is named econometrics.csv
# As explained before, everything after a # (hash sign) is not run by R.
# Comments starting with # are very useful for documenting and making sense of your code.
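The path rule above can be tried out without the course data. The following is a minimal sketch: the file name demo.csv and its two rows are made up for illustration, and the file is written to a temporary folder rather than to your project folder.

```r
# Hypothetical example file (not part of the course data)
demo_dir <- file.path(tempdir(), "data")   # file.path() joins folder names with "/"
dir.create(demo_dir, showWarnings = FALSE)
demo_file <- file.path(demo_dir, "demo.csv")

# read.csv2() expects ";" as column separator and "," as decimal mark
writeLines(c("id;wage", "1;10,5", "2;12,0"), demo_file)

file.exists(demo_file)   # check that the path is correct before reading
## [1] TRUE
demo <- read.csv2(demo_file)
demo
##   id wage
## 1  1 10.5
## 2  2 12.0
```

If file.exists() returns FALSE, the path (or the working directory) is usually the problem, not the file itself.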
We will often want to store results of calculations to reuse them later. For this, we can work with basic objects. An object has a name and a content. We can freely choose the name of an object given certain rules: it has to start with a letter and may include only letters, numbers and some special characters (“.” and “_”). A hyphen (“-”) is not allowed, because R reads it as a minus sign. R is case sensitive, so “x” and “X” are different object names. The content of an object is assigned using “<-” or “=”.
In order to assign the value 5 to the object x, type
x = 5
# or
x <- 5
If an object named x already exists and you create it again, the old version will be overwritten. Now you can use “x” in your calculations.
Example:
b= 5*x
b
## [1] 25
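The naming rules, case sensitivity and overwriting can be checked with a short sketch. The object names wage, Wage and wage.total_1 are hypothetical and only serve as illustration.

```r
wage <- 5
Wage <- 10                 # a different object, because R is case sensitive
wage + Wage
## [1] 15
wage <- 7                  # overwrites the old content of wage
wage + Wage
## [1] 17
wage.total_1 <- wage * 2   # "." and "_" are allowed inside a name
wage.total_1
## [1] 14
```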
A list of currently defined object names can be obtained using ls()
In RStudio, all the object names are also shown in the “Environment” window (called “Workspace” in older versions) on the top right side.
# Delete objects:
rm(x) # Deletes an object
rm(list=ls()) # all objects are removed
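To see what rm() does, we can ask R whether an object exists before and after deleting it. This is a small sketch using the base function exists(); the object names tmp_one and tmp_two are made up.

```r
tmp_one <- 1
tmp_two <- 2
exists("tmp_one")
## [1] TRUE
rm(tmp_one)          # deletes only tmp_one
exists("tmp_one")
## [1] FALSE
exists("tmp_two")    # tmp_two is still there
## [1] TRUE
```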
For statistical calculations, we obviously need to work with data sets including many numbers instead of scalars. The simplest way we can collect many numbers (or other types of information) is called a vector in R terminology (you have already been familiarized with vectors on the page before).
To define a vector, we can collect different values using c(value1, value2,...).
# Examples
a=c(1,2,3,4)
b=a+1
b
## [1] 2 3 4 5
c=sqrt(b+a)
c
## [1] 1.732051 2.236068 2.645751 3.000000
# Important R functions for vectors:
# Define vector
(a <- c(7,2,6,9,4,1,3))
## [1] 7 2 6 9 4 1 3
# Basic functions:
sort(a)
## [1] 1 2 3 4 6 7 9
length(a)
## [1] 7
min(a)
## [1] 1
max(a)
## [1] 9
sum(a)
## [1] 32
prod(a)
## [1] 9072
# Creating special vectors:
numeric(20)
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
rep(1,20)
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
seq(50)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## [47] 47 48 49 50
5:15
## [1] 5 6 7 8 9 10 11 12 13 14 15
seq(4,20,2)
## [1] 4 6 8 10 12 14 16 18 20
# Comparison and logical operators:
x==y # x is equal to y
x>y # x is bigger than y
x<=y # x is smaller than or equal to y
x!=y # x is NOT equal to y
!b # NOT b (i.e. TRUE if b is FALSE)
a|b # Either a or b (or both) is TRUE
a&b # Both a and b are TRUE
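The operators listed above can be tried with concrete values. In this sketch the values of x and y and the helper objects p and q are chosen only for illustration.

```r
x <- 3
y <- 5
x == y
## [1] FALSE
x != y
## [1] TRUE
x <= y
## [1] TRUE
p <- x > 2     # TRUE
q <- y > 10    # FALSE
!q
## [1] TRUE
p | q
## [1] TRUE
p & q
## [1] FALSE
# Applied to a vector, the comparison is made element by element:
c(1, 5, 3) > 2
## [1] FALSE TRUE TRUE
```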
The contents of R vectors do not need to be numeric. A simple example of a different type is the character vector. To create one, the contents simply need to be enclosed in quotation marks:
cities = c("Friedrichshafen", "Paris", "Tokio", "Tettnang", "Mailand")
cities
## [1] "Friedrichshafen" "Paris" "Tokio" "Tettnang"
## [5] "Mailand"
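A few base functions show how R treats such a vector. The sketch below repeats the cities vector so that it runs on its own; the last line illustrates that a single vector cannot mix types, so numbers are converted to text.

```r
cities <- c("Friedrichshafen", "Paris", "Tokio", "Tettnang", "Mailand")
class(cities)
## [1] "character"
nchar(cities)        # number of characters of each element
## [1] 15 5 5 8 7
# Mixing numbers and text in one vector turns everything into text:
c(1, "a", 2.5)
## [1] "1" "a" "2.5"
```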
Another useful type is the logical vector. Each element can only take one of two values: “TRUE” or “FALSE”. Internally, “FALSE” corresponds to 0 and “TRUE” to 1.
a <- c(7,2,6,9,4,1,3)
b <- a<3 | a>=6
b
## [1] TRUE TRUE TRUE TRUE FALSE TRUE FALSE
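Because TRUE counts as 1 and FALSE as 0, we can do arithmetic with logical vectors. A short sketch using the vectors a and b from above:

```r
a <- c(7,2,6,9,4,1,3)
b <- a<3 | a>=6
as.numeric(b)   # the internal 0/1 coding
## [1] 1 1 1 1 0 1 0
sum(b)          # number of elements for which the condition is TRUE
## [1] 5
mean(b)         # share of elements for which the condition is TRUE
## [1] 0.7142857
```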
As we have seen in Econometrics, many variables take only a binary outcome, i.e. they are dummy variables (for example gender).
If we want to store qualitative information with more levels, we can use so-called factors.
# Customer Ratings
x <- c(3,2,2,3,1,2,3,2,1,2)
xf <- factor(x, labels=c("bad","okay","good"))
x
## [1] 3 2 2 3 1 2 3 2 1 2
xf
## [1] good okay okay good bad okay good okay bad okay
## Levels: bad okay good
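Once the ratings are stored as a factor, we can inspect the levels and count how often each one occurs. The following sketch repeats the definition of xf so that it runs on its own.

```r
x <- c(3,2,2,3,1,2,3,2,1,2)
xf <- factor(x, labels=c("bad","okay","good"))
levels(xf)       # the possible categories
## [1] "bad" "okay" "good"
table(xf)        # frequency of each category
## xf
## bad okay good
## 2 5 3
as.integer(xf)   # the underlying integer codes
## [1] 3 2 2 3 1 2 3 2 1 2
```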
The elements of a vector can be named, which can increase the readability of the output. Given a vector vec and a string vector namevec of the same length, the names are attached to the vector elements using names(vec) = namevec.
If we want to access a single element or a subset from a vector, we can work with indices. They are written in square brackets next to the vector name. For example, myvector[4] returns the 4th element of myvector, and myvector[6] = 8 changes the 6th element to take the value of 8. If the vector elements have names, we can also use those as indices, as in myvector["elementname"].
# Create a vector "avgs":
avgs <- c(.366, .358, .356, .349, .346)
# Create a string vector of names:
players <- c("Cobb","Hornsby","Jackson","O'Doul","Delahanty")
# Assign names to vector and display vector:
names(avgs) <- players
avgs
## Cobb Hornsby Jackson O'Doul Delahanty
## 0.366 0.358 0.356 0.349 0.346
# Indices by number:
avgs[2]
## Hornsby
## 0.358
avgs[1:4]
## Cobb Hornsby Jackson O'Doul
## 0.366 0.358 0.356 0.349
# Indices by name:
avgs["Jackson"]
## Jackson
## 0.356
# Logical indices:
avgs[ avgs>=0.35 ]
## Cobb Hornsby Jackson
## 0.366 0.358 0.356
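Indices can also be used to change elements or to leave elements out. A minimal sketch with a made-up vector myvector:

```r
myvector <- c(10, 20, 30, 40, 50, 60)
myvector[4]          # 4th element
## [1] 40
myvector[6] <- 8     # change the 6th element
myvector
## [1] 10 20 30 40 50 8
myvector[-1]         # a negative index drops that element
## [1] 20 30 40 50 8
myvector[c(1, 3)]    # several elements at once
## [1] 10 30
```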
Matrices are important tools for econometric analyses (think back to the first tutorial). R has a powerful matrix algebra system. Most often in applied econometrics, matrices will be generated from an existing data set. But you can also build them from scratch with matrix(vec, nrow=m), which takes the numbers stored in vector vec and puts them into a matrix with m rows. Other options include rbind(r1,r2) and cbind(c1,c2), which bind several vectors (which obviously need to have the same length) by row or by column.
# Generating matrix A from one vector with all values:
v <- c(2,-4,-1,5,7,0)
A <- matrix(v,nrow=2)
A
## [,1] [,2] [,3]
## [1,] 2 -1 7
## [2,] -4 5 0
# Generating matrix A from two vectors corresponding to rows:
row1 <- c(2,-1,7); row2 <- c(-4,5,0)
A <- rbind(row1, row2)
A
## [,1] [,2] [,3]
## row1 2 -1 7
## row2 -4 5 0
# Generating matrix A from three vectors corresponding to columns:
col1 <- c(2,-4); col2 <- c(-1,5); col3 <- c(7,0)
A <- cbind(col1, col2, col3)
# Giving names to rows and columns:
colnames(A) <- c("Alpha","Beta","Gamma")
rownames(A) <- c("Aleph","Bet")
A
## Alpha Beta Gamma
## Aleph 2 -1 7
## Bet -4 5 0
# Indexing for extracting elements (still using A from above):
A[2,1]
## [1] -4
A[,2]
## Aleph Bet
## -1 5
A[,c(1,3)]
## Alpha Gamma
## Aleph 2 7
## Bet -4 0
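Note that matrix() fills the matrix column by column. If the values in the vector are ordered by row instead, the argument byrow=TRUE can be used. The sketch below builds the same matrix as above from a row-ordered vector; the name A_byrow is chosen only for illustration.

```r
v <- c(2,-1,7,-4,5,0)                      # values ordered row by row
A_byrow <- matrix(v, nrow=2, byrow=TRUE)
A_byrow
## [,1] [,2] [,3]
## [1,] 2 -1 7
## [2,] -4 5 0
dim(A_byrow)                               # number of rows and columns
## [1] 2 3
A_byrow[1, ]                               # first row
## [1] 2 -1 7
```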
# Element-wise multiplication (not matrix multiplication: elements in the same position are multiplied)
A <- matrix( c(2,-4,-1,5,7,0), nrow=2)
B <- matrix( c(2,1,0,3,-1,5), nrow=2)
A*B
## [,1] [,2] [,3]
## [1,] 4 0 -7
## [2,] -4 15 0
# Transpose:
(C <- t(B) )
## [,1] [,2]
## [1,] 2 1
## [2,] 0 3
## [3,] -1 5
# Matrix multiplication:
(D <- A %*% C )
## [,1] [,2]
## [1,] -3 34
## [2,] -8 11
# Inverse:
solve(D)
## [,1] [,2]
## [1,] 0.0460251 -0.1422594
## [2,] 0.0334728 -0.0125523
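We can check the inverse by multiplying it with the original matrix, which should give the identity matrix (up to rounding error). The sketch below rebuilds D so that it runs on its own; the vector rhs is a made-up right-hand side used to show that solve() can also solve a linear system directly.

```r
A <- matrix( c(2,-4,-1,5,7,0), nrow=2)
B <- matrix( c(2,1,0,3,-1,5), nrow=2)
D <- A %*% t(B)
D_inv <- solve(D)
round(D %*% D_inv, 10)   # identity matrix
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
# solve(D, rhs) solves the system D %*% z = rhs for z:
rhs <- c(31, 3)
solve(D, rhs)
## [1] 1 1
```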
A list is a generic collection of objects. Unlike vectors, components can be of different types.
# Generate a list object:
mylist <- list( A=seq(8,36,4), this="that", idm = diag(3))
# Print whole list:
mylist
## $A
## [1] 8 12 16 20 24 28 32 36
##
## $this
## [1] "that"
##
## $idm
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
# Vector of names:
names(mylist)
## [1] "A" "this" "idm"
# Print component "A":
mylist$A
## [1] 8 12 16 20 24 28 32 36
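Components can also be accessed with double square brackets, and the result can be indexed further like any other vector or matrix. A short sketch using the same list; the added component flag is made up for illustration.

```r
mylist <- list( A=seq(8,36,4), this="that", idm = diag(3))
mylist[["this"]]     # same as mylist$this
## [1] "that"
mylist$A[2]          # 2nd element of component A
## [1] 12
mylist$idm[2,2]      # element of the matrix stored in the list
## [1] 1
mylist$flag <- TRUE  # add a new component
names(mylist)
## [1] "A" "this" "idm" "flag"
```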
A data frame is an object that collects several variables and can be thought of as a rectangular table, with the rows representing the observational units and the columns representing the variables. As such, it is similar to a matrix. For us, the most important difference to a matrix is that a data frame can contain variables of different types (like numerical, logical, string and factor), whereas all elements of a matrix must be of the same type (typically numerical).
Unlike a matrix, the columns always have names, which represent the variables. We can define a data frame from scratch by using the command data.frame or convert an existing object (such as a matrix) with as.data.frame.
# Define one x vector for all:
year <- c(2008,2009,2010,2011,2012,2013)
# Define a matrix of y values:
product1<-c(0,3,6,9,7,8); product2<-c(1,2,3,5,9,6); product3<-c(2,4,4,2,3,2)
sales_mat <- cbind(product1,product2,product3)
rownames(sales_mat) <- year
# The matrix looks like this:
sales_mat
## product1 product2 product3
## 2008 0 1 2
## 2009 3 2 4
## 2010 6 3 4
## 2011 9 5 2
## 2012 7 9 3
## 2013 8 6 2
# Create a data frame and display it:
sales <- as.data.frame(sales_mat)
sales
## product1 product2 product3
## 2008 0 1 2
## 2009 3 2 4
## 2010 6 3 4
## 2011 9 5 2
## 2012 7 9 3
## 2013 8 6 2
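The example above converts a matrix with as.data.frame. A data frame can also be built from scratch with data.frame, and then the columns may have different types. The following is a minimal sketch; the data frame shops and all its values are made up for illustration.

```r
# Hypothetical data: one character, one numeric and one logical variable
shops <- data.frame(
  city = c("Friedrichshafen", "Paris", "Tokio"),
  revenue = c(12.5, 30.1, 27.8),
  open_sunday = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE    # keep text as character in all R versions
)
shops
is.data.frame(shops)
## [1] TRUE
sapply(shops, class)          # type of each column
## city revenue open_sunday
## "character" "numeric" "logical"
```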
The outputs of the matrix sales_mat and the data frame sales look exactly the same, but they behave differently. In RStudio, the difference can be seen in the Workspace window: sales is described as 6 obs. of 3 variables.
We can address a single variable var of a data frame df using the matrix-like syntax df[, "var"] or by stating df$var. This can be used for extracting the values of a variable but also for creating new variables. Sometimes, it is convenient not to have to type the name of the data frame several times within a command. The function with(df, some expression using vars of df) can help.
# Accessing a single variable:
sales$product2
## [1] 1 2 3 5 9 6
# Generating a new variable in the data frame:
sales$totalv1 <- sales$product1 + sales$product2 + sales$product3
sales
## product1 product2 product3 totalv1
## 2008 0 1 2 3
## 2009 3 2 4 9
## 2010 6 3 4 13
## 2011 9 5 2 16
## 2012 7 9 3 19
## 2013 8 6 2 16
# The same but using "with":
sales$totalv2 <- with(sales, product1+product2+product3)
sales
## product1 product2 product3 totalv1 totalv2
## 2008 0 1 2 3 3
## 2009 3 2 4 9 9
## 2010 6 3 4 13 13
## 2011 9 5 2 16 16
## 2012 7 9 3 19 19
## 2013 8 6 2 16 16
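The matrix-like syntax df[, "var"] mentioned above gives the same result as df$var, and with() works for any expression, not only for creating new variables. A short sketch that rebuilds the sales data so that it runs on its own:

```r
sales <- data.frame(product1 = c(0,3,6,9,7,8),
                    product2 = c(1,2,3,5,9,6),
                    product3 = c(2,4,4,2,3,2),
                    row.names = 2008:2013)
sales[, "product2"]
## [1] 1 2 3 5 9 6
identical(sales[, "product2"], sales$product2)
## [1] TRUE
with(sales, mean(product1))   # average sales of product 1
## [1] 5.5
```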
Sometimes, we do not want to work with a whole data set but only with a subset. This can be easily achieved with the command subset(df, criterion), where criterion is a logical expression which evaluates to TRUE for the rows which are to be selected.
# Subset: all years in which sales of product 3 were >=3
subset(sales, product3>=3)
## product1 product2 product3 totalv1 totalv2
## 2009 3 2 4 9 9
## 2010 6 3 4 13 13
## 2012 7 9 3 19 19
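The same selection can be written with logical indices, and subset() also accepts several conditions and a select argument for choosing columns. The sketch below rebuilds the original three sales columns so that it runs on its own.

```r
sales <- data.frame(product1 = c(0,3,6,9,7,8),
                    product2 = c(1,2,3,5,9,6),
                    product3 = c(2,4,4,2,3,2),
                    row.names = 2008:2013)
# Logical indexing: same rows as subset(sales, product3>=3)
sales[sales$product3 >= 3, ]
## product1 product2 product3
## 2009 3 2 4
## 2010 6 3 4
## 2012 7 9 3
# Two conditions combined with &, keeping only two columns:
subset(sales, product3 >= 3 & product1 > 5, select = c(product1, product3))
## product1 product3
## 2010 6 4
## 2012 7 3
```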