Analytics requires both software and data-handling skills. People without programming skills usually rely on Microsoft Excel, Power BI, or Tableau. Those who have, or want to develop, programming skills can use programming and analysis tools; R and Python are two popular open-source options. People from non-technical backgrounds often find programming hard, so handling data with software becomes a real challenge for them. In addition, individuals with limited resources may not be able to afford licenses for commercial data analysis software.
Hence, this write-up is a modest attempt to help you learn basic skills in the freely available software ‘R’.
What is R?
R is a language and environment for data handling, storage, and analysis, and it can interface with other languages (e.g. Python, C/C++).
The functions used for data handling tasks are organized into packages. About 25 packages are supplied with a standard R installation, and they support powerful analysis and handling of a variety of objects.
Installation of R
You can download the R installer for your operating system from https://cloud.r-project.org
Once started, R shows a ‘>’ prompt in its console (terminal), where you can type your commands.
Understanding simple input
Typing A<-1 at the > prompt and pressing Enter assigns the value 1 to A. Here, A is the name of a storage location (called an object) that holds the numeric value 1. Everything in R is an object. The assignment operator (<-) is written with the ‘<’ and ‘-’ signs together, without a space between them. Writing A<-1 is the same as writing 1->A, so you can choose your own style. You can also use ‘=’ in place of ‘<-’. I prefer the assignment operator because the equal sign also appears in comparisons (as ‘==’) and when passing arguments to functions, so keeping the two distinct avoids confusion.
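The assignment styles described above can be compared directly. This is a minimal sketch; the object names x, y, and z are chosen only for illustration and do not appear elsewhere in this write-up.

```r
# Three equivalent ways to assign the value 1 (object names are illustrative)
x <- 1   # leftward assignment, the most common style
1 -> y   # rightward assignment
z = 1    # equal sign, also accepted at the prompt

# All three objects hold the same value
identical(x, y)
identical(y, z)

# A double equal sign does not assign; it compares two values
x == 1
```

Each of the last three lines should print TRUE, which suggests that the choice between these styles is a matter of readability rather than behaviour.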
Let's check how A appears in R. Type A and press Enter to get the following.
A <- 1
A
## [1] 1
This tells us that A holds one element with the value 1; the [1] in front is the position of the first element shown on that line. Likewise, you can assign more values to an object in R.
Let us do simple calculations and understand the process slowly.
A <- 1
A
## [1] 1
B <- 2
B
## [1] 2
A + B
## [1] 3
C <- A + B
C
## [1] 3
Let's try storing words.
# See if A<-hi works?
# Try this
A <- "hi"
A
## [1] "hi"
B <- "bye"
B
## [1] "bye"
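The comment in the example above asks whether A<-hi works without the quotation marks. The sketch below tries it safely by catching the error with tryCatch. It assumes that no object named hi already exists in your session; the names msg and greeting are illustrative only.

```r
# Assumption: no object called 'hi' exists in the current session
msg <- tryCatch(
  {
    greeting <- hi   # unquoted: R looks for an object called hi
    "no error"
  },
  error = function(e) conditionMessage(e)  # keep the error text instead of stopping
)
msg                  # the error message reported by R

greeting <- "hi"     # quoted: the text itself is stored
greeting
class(greeting)
```

Without quotation marks R treats hi as the name of an object, so the first attempt fails; with quotation marks the word is stored as a character value.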
Do you want your storage and commands side by side? Not enjoying the non-GUI interface with its limited options?
That is why RStudio is popular among people working with R.
Installing RStudio
The following link guides you through installing R and an up-to-date version of RStudio Desktop:
https://www.rstudio.com/products/rstudio/download/#download
Customize your view
You can choose how your RStudio interface looks by exploring this page: https://support.rstudio.com/hc/en-us/articles/200549016-Customizing-the-RStudio-IDE
Things to know before coding
R is a case-sensitive expression language. It normally accepts all alphanumeric symbols in names, and the exact set depends on the locale (country and language settings) in use. There are some essentials that everyone should know.
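Case sensitivity is easy to check for yourself. In this small sketch the object names score and Score are made up for illustration; getwd is a real R function, while Getwd is assumed not to exist in your session.

```r
# 'score' and 'Score' are two different objects because R is case sensitive
score <- 10
Score <- 20
score == Score    # FALSE: the two names hold different values

# Function names are case sensitive too
exists("getwd")   # TRUE: this function is part of R
exists("Getwd")   # FALSE, assuming nothing with this name has been defined
```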
Where am I working?
getwd() # know your current working directory
## [1] "/Users/muditsingh/Desktop/Class/learnR"
setwd("/Users/muditsingh/Desktop/Class/learnR") # set your working directory
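The path above exists only on the author's computer. To try getwd() and setwd() on any machine, the sketch below switches to R's temporary folder and then returns to where it started. The name old_dir is illustrative; tempdir() is a base R function that returns a folder created for the current session.

```r
old_dir <- getwd()    # remember the current working directory
setwd(tempdir())      # move to the session's temporary folder
getwd()               # confirm that the working directory changed
setwd(old_dir)        # go back to the original folder
identical(getwd(), old_dir)
```

Saving the old directory before changing it is a simple habit that makes it easy to return to your project folder.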
Objects
The entities that R understands and uses to perform its operations are called objects. An object can be a letter or a combination of letters, a stored value, a matrix, a file, a list of files, numbers, and so on. R performs its analysis based on the object type, so it is important to understand how R classifies objects. Every object has two intrinsic properties: mode and length.
Classes of objects
- Character: anything inside double inverted commas (" ").
A <- "hi!"
A
## [1] "hi!"
mode(A)
## [1] "character"
class(A)
## [1] "character"
length(A)
## [1] 1
How do mode and class differ?
Mode is intrinsic to an object and describes its basic type, whereas class is a label that tells R how to treat the object. For simplicity, let's treat class and mode as properties that describe a specific object.
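For the character example above, mode and class give the same answer, but that is not always the case. The sketch below shows objects where the two differ; the object names num, int, and fac are illustrative only.

```r
num <- 5                          # an ordinary number
int <- 5L                         # the L suffix stores the value as an integer
fac <- factor(c("low", "high"))   # a factor (categorical values)

mode(num)    # "numeric"
class(num)   # "numeric"

mode(int)    # "numeric"
class(int)   # "integer"

mode(fac)    # "numeric": factors are stored internally as integer codes
class(fac)   # "factor"
```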
- Numeric: all numbers, from negative to positive.
A <- 5
A
## [1] 5
B <- -4
B
## [1] -4
- Factor: these are similar to categorical variables. E.g. Gender with two categories, Male and Female, will be a factor variable with two levels having “Male” and “Female” labels.
A <- c("Name", "Gender", "Place")
class(A)
## [1] "character"
B <- factor(A) # converting to factor with levels
B
## [1] Name Gender Place
## Levels: Gender Name Place
class(B)
## [1] "factor"
length(B)
## [1] 3
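The Gender example mentioned in the text can be written out as follows. This is a small sketch with made-up values; the object name gender is illustrative.

```r
# Four illustrative observations with two categories
gender <- factor(c("Male", "Female", "Female", "Male"))
gender

levels(gender)       # "Female" "Male": levels are sorted alphabetically by default
nlevels(gender)      # 2
length(gender)       # 4: the number of observations, not the number of levels
table(gender)        # how many observations fall in each level
as.integer(gender)   # 2 1 1 2: the integer codes stored behind the labels
```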
- Logical: the result of comparing values, as in the examples below.
x > y # x is greater than y
y < x # y is less than x
x != y # x is not equal to y
A <- 1
B <- 2
A < B # compare A and B
## [1] TRUE
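The comparisons listed above can be run as they are once x and y have values. The sketch below uses illustrative values and stores one result in an object named result to show that its class is logical.

```r
x <- 1
y <- 2

result <- x < y
result           # TRUE
class(result)    # "logical"

x > y            # FALSE
x == y           # FALSE: a double equal sign tests for equality
x != y           # TRUE

# Comparisons work element by element on longer objects
c(1, 5, 3) > 2   # FALSE TRUE TRUE
```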
Why the # sign? In R, we use # to insert comments/notes in a line of code.
Besides these four classes, there are other types such as list, matrix, and complex. For now, we begin with these four for simplicity.
Some operations for practice
A <- c(10, 12)
A
## [1] 10 12
The object ‘A’ looks like [1] 10 12
Now it tells us that there are two elements. Let's understand the storage pattern of the object A by assigning more elements to it.
A <- c(1, 2, 3, 5, 7, 9)
A
## [1] 1 2 3 5 7 9
A[2] gives the second element in A.
A[1:5] lists the first to fifth elements stored in A.
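The square-bracket notation described above can be tried on the same object. The last two lines go slightly beyond the text and are included only as an illustration of other common forms of indexing.

```r
A <- c(1, 2, 3, 5, 7, 9)

A[2]         # 2: the second element
A[1:5]       # 1 2 3 5 7: the first to fifth elements
length(A)    # 6: the total number of elements

A[c(1, 6)]   # 1 9: the first and sixth elements
A[-1]        # 2 3 5 7 9: everything except the first element
```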
Similarly, let's try storing words.
Boy <- "hi!"
Boy
## [1] "hi!"
Girl <- "I want to learn R"
Girl
## [1] "I want to learn R"
We can practice assigning multiple numeric values and character values (e.g. names) to different objects and check the outputs.
Let's practice some simple calculations.
A <- 2
B <- 4
C <- A + B # addition
D <- A / B # division
E <- A - B # subtraction
F <- A * B # multiplication
G <- c(C, D, E, F)
G
## [1] 6.0 0.5 -2.0 8.0
# Let's try with two elements in each
A <- c(2, 4)
B <- c(4, 6)
C <- A + B # addition
D <- A / B # division
E <- A - B # subtraction
F <- A * B # multiplication
H <- c(C, D, E, F)
H
## [1] 6.0000000 10.0000000 0.5000000 0.6666667 -2.0000000 -2.0000000 8.0000000
## [8] 24.0000000
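The eight numbers in H come from applying each operation element by element: the first element of A is paired with the first element of B, and the second with the second. The sketch below prints each result separately so the pairing is easier to see.

```r
A <- c(2, 4)
B <- c(4, 6)

A + B   # 6 10               (2 + 4, 4 + 6)
A / B   # 0.5 0.6666667      (2 / 4, 4 / 6)
A - B   # -2 -2              (2 - 4, 4 - 6)
A * B   # 8 24               (2 * 4, 4 * 6)

# Combining the four results gives an object with eight elements
H <- c(A + B, A / B, A - B, A * B)
length(H)   # 8
```

When the output wraps onto a second line, the marker at the start of that line (here [8]) gives the position of the first element printed on it.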
Using functions
prod(1, 3, 5) # product (multiplication)
## [1] 15
sum(1, 2) # addition
## [1] 3
Can we join the two objects ‘A’ and ‘B’ in a meaningful way?
Let's try combining the Boy and Girl statements from the example above.
statement <- paste(Boy, Girl)
statement
## [1] "hi! I want to learn R"
What is ‘paste’? It is a function that joins objects together as text. What are functions? A function is a set of code that performs a pre-defined operation on an object: one function for one task.
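To make the idea of "one function for one task" concrete, here is a minimal sketch of a function written by hand. The name add_two is hypothetical and chosen only for illustration; built-in functions such as sum and paste are used in the same way, by giving them inputs inside the brackets.

```r
# Hypothetical function for illustration: it adds two values
add_two <- function(x, y) {
  x + y
}

add_two(2, 4)   # 6: calling our own function
sum(2, 4)       # 6: the built-in function doing the same task
```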
Let's try the following example.
first10 <- paste(1:10)
nth <- paste(1:7, c("st", "nd", "rd", rep("th", 4)))
paste(nth, collapse = ",")
## [1] "1 st,2 nd,3 rd,4 th,5 th,6 th,7 th"
# Doesn't look as expected? Try paste0
first10 <- paste0(1:10)
nth <- paste0(1:7, c("st", "nd", "rd", rep("th", 4)))
paste(nth, collapse = ",")
## [1] "1st,2nd,3rd,4th,5th,6th,7th"
So, how do paste and paste0 differ?
Both ‘paste’ and ‘paste0’ convert their inputs to ‘character’, even when we supply numerical data. The difference lies in the separator: ‘paste’ puts a space between the pieces by default (sep = " "), which is why we got the unexpected result, whereas ‘paste0’ joins the pieces with no separator. Let's see another example.
week <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
paste(week, nth, sep = ": ", collapse = "; ")
## [1] "Mon: 1st; Tue: 2nd; Wed: 3rd; Thu: 4th; Fri: 5th; Sat: 6th; Sun: 7th"
# Using 'sep'
paste("1st", "2nd", "3rd", sep = ", ")
## [1] "1st, 2nd, 3rd"
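The difference between paste and paste0 can be checked with a few short calls. This sketch uses only the two functions and the sep argument already seen above.

```r
paste(1, "st")             # "1 st": a space is inserted by default
paste0(1, "st")            # "1st": nothing is inserted
paste(1, "st", sep = "")   # "1st": paste with an empty separator behaves like paste0

# Both functions return character values, even for numeric input
class(paste(1, 2))         # "character"
class(paste0(1, 2))        # "character"
```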