R is a free statistical programming language. It has been widely used for statistics, data analysis and data visualization. One of the fields where R is widely used is genomics. Many R packages have been developed for population genetics data analyses and visualization.
Download and install R and RStudio. if you do not have them already. Rstudio is optional but it is a great tool if you are just starting to learn R. Once you have download and installed R and Rstudio, we ll start working by creating project.
Now open Rstudio. We ll learn couple of basic things.
We ll talk about these 4 distinct panes.
Once you are inside Rstudio you can click on your top right where it says project to create a new project. By doing this you ll have all your data and script in one location.
Now we can start working in Rstudio but before we go ahead and start writing scripts we can even work in our console.
In the console type getwd() and see what happens.
# Add
5+10
## [1] 15
# Substract
10-5
## [1] 5
# Multiply
5 * 10
## [1] 50
# Divide
100 / 5
## [1] 20
A vector can contain either number, strings, or logical values but not a mixture.
x <- 100
x
## [1] 100
x <- c(1,3,2,10,5)
str(x)
## num [1:5] 1 3 2 10 5
class(x)
## [1] "numeric"
# Selecting elements from vector
x[3]
## [1] 2
# lets plot something
plot(x)
mean(x)
## [1] 4.2
median(x)
## [1] 3
sd(x)
## [1] 3.563706
var(x)
## [1] 12.7
t.test(x)
##
## One Sample t-test
##
## data: x
## t = 2.6353, df = 4, p-value = 0.05786
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.2249254 8.6249254
## sample estimates:
## mean of x
## 4.2
## See what summary command does
summary(x)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 2.0 3.0 4.2 5.0 10.0
List can hold different types of data structures.
mylist <- list(name="Fred",
mynumbers=c(1,2,3),
mymatrix=matrix(1:4,ncol=2),
age=5.3)
mylist
## $name
## [1] "Fred"
##
## $mynumbers
## [1] 1 2 3
##
## $mymatrix
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
##
## $age
## [1] 5.3
## access using $ sign.
mylist$name
## [1] "Fred"
Matrix is vector but with dimension.
my_matrix <- matrix(data = 1:10,nrow = 5, ncol = 2)
dim(my_matrix)
## [1] 5 2
rownames(my_matrix) <- c("Btl", "Ktm", "Bhw", "Bkt", "Pak")
my_matrix
## [,1] [,2]
## Btl 1 6
## Ktm 2 7
## Bhw 3 8
## Bkt 4 9
## Pak 5 10
colnames(my_matrix) <- c("ItemA", "ItemB")
my_matrix
## ItemA ItemB
## Btl 1 6
## Ktm 2 7
## Bhw 3 8
## Bkt 4 9
## Pak 5 10
x <- c(1,2,3,4)
y <- c(4,5,6,7)
matrix <- cbind(x,y)
chr <- c("chr1", "chr1", "chr2", "chr2")
strand <- c("-","-","+","+")
start <- c(200,4000,100,400)
end <- c(250,410,200,450)
mydata <- data.frame(chr,start,end,strand)
head(mydata)
## chr start end strand
## 1 chr1 200 250 -
## 2 chr1 4000 410 -
## 3 chr2 100 200 +
## 4 chr2 400 450 +
#change column names
names(mydata) <- c("Chromosome","Start_position","End_position","Strand")
mydata
## Chromosome Start_position End_position Strand
## 1 chr1 200 250 -
## 2 chr1 4000 410 -
## 3 chr2 100 200 +
## 4 chr2 400 450 +
head(mydata)
## Chromosome Start_position End_position Strand
## 1 chr1 200 250 -
## 2 chr1 4000 410 -
## 3 chr2 100 200 +
## 4 chr2 400 450 +
tail(mydata)
## Chromosome Start_position End_position Strand
## 1 chr1 200 250 -
## 2 chr1 4000 410 -
## 3 chr2 100 200 +
## 4 chr2 400 450 +
nrow(mydata)
## [1] 4
ncol(mydata)
## [1] 4
dim(mydata)
## [1] 4 4
## Save csv file
#write.csv(x = mydata, file = "mydata.csv")
Create a dataframe with following columns of your group member (First_name, Last_name, home_district, Approx_height and one random number between 0 to 100).
Use functions to find how many rows and columns are there in your dataframe
Try combining first_name and last_name column and rename it to Name
Find mean of a column which has that random number.
For this part, we ll use the workshop material developed by Dr. Sydney E. Everhart’s lab at University of Nebraska, Lincoln.