WHAT IS R PROGRAMMING?
R is a language and environment for statistical computing and graphics. It provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. It is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes:
an effective data handling and storage facility,
a suite of operators for calculations on arrays, in particular matrices,
a large, coherent, integrated collection of intermediate tools for data analysis,
graphical facilities for data analysis and display either on-screen or on hardcopy, and
a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
DIFFERENCE BETWEEN VECTOR, LIST, MATRIX AND DATAFRAME.
A vector is a series of data elements of the same basic type. Each member of a vector is known as a component.
A list is an R object that can contain elements of different types, such as numbers, strings, vectors, or even another list.
A matrix is a two-dimensional data structure formed by binding vectors of the same length. All elements of a matrix must be of the same type.
A data frame is a generalized form of a matrix: it is a list of equal-length columns laid out in rows and columns like a matrix. Different columns of a data frame can contain different data types.
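A minimal sketch that builds one object of each kind and inspects it. The object names and values are illustrative only; the "character" column type assumes R 4.0 or later, where strings are no longer converted to factors by default.

```r
# Vector: every element has the same basic type
v <- c(4, 5, 8, 14)

# List: elements may differ in type and may themselves be lists
l <- list(id = 1L, name = "claim", amounts = c(100, 250), meta = list(open = TRUE))

# Matrix: two-dimensional, one element type; built here by binding equal-length vectors
m <- cbind(c(1, 2, 3), c(4, 5, 6))

# Data frame: equal-length columns that may have different types
df <- data.frame(id = 1:3, region = c("N", "S", "E"), loss = c(1200, 560, 3400))

class(v)           # "numeric"
class(l)           # "list"
dim(m)             # 3 2
class(df)          # "data.frame"
sapply(df, class)  # column types differ: integer, character (R >= 4.0), numeric
```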
GIVE ANY 5 FEATURES OF R.
5 features of R are:
a) It is a simple and effective programming language.
b) It is data analysis software.
c) It provides effective data handling and storage facilities.
d) It provides highly extensible graphical techniques.
e) It is an interpreted language.
WHAT ARE THE ADVANTAGES AND DISADVANTAGES OF R?
Advantages of R are:
a) Open Source
b) Data Wrangling
c) Array of Packages
d) Platform Independent
e) Machine Learning Operations
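Disadvantages of R are:
a) Weak origin (it descends from the much older S language)
b) Data handling (objects are held in memory, which limits the size of data it can process)
c) Basic security
d) Complicated language
e) Lower speed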
WHAT ARE THE STEPS TO BUILD AND EVALUATE A LINEAR REGRESSION MODEL IN R?
When creating a linear regression model, the following successive actions must be taken:
First, divide the data into training and test sets, so that the model can be developed on the training set and its performance assessed on the test set.
The sample.split() function from the “caTools” package can be used for this. It offers a SplitRatio option that you can customise based on your requirements.
Once the data has been divided, build the model on the training set.
A model is constructed using the “lm()” function.
Next, predict the values on the test set using the “predict()” function.
The final step is to compute the root mean squared error (RMSE); the lower the RMSE value, the better the prediction.
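The steps above can be sketched end to end as follows. This assumes the built-in mtcars data, an arbitrary 70/30 split, and base R's sample() in place of caTools::sample.split() so that no extra package is needed; the predictors wt and hp are chosen only for illustration.

```r
set.seed(42)
n <- nrow(mtcars)

# 1. Split into training and test sets (70/30 is an assumed ratio).
#    With caTools installed, sample.split(mtcars$mpg, SplitRatio = 0.7) returns a logical mask instead.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# 2. Build the model on the training set: mpg explained by weight and horsepower
model <- lm(mpg ~ wt + hp, data = train)

# 3. Predict on the held-out test set
pred <- predict(model, newdata = test)

# 4. RMSE: square root of the mean squared prediction error
rmse <- sqrt(mean((test$mpg - pred)^2))
rmse
```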
WHAT IS THE CONFUSION MATRIX IN R?
A confusion matrix is used to assess the accuracy of a classification model. It is a cross-tabulation of observed and predicted classes. The “confusionMatrix()” function from the “caret” package can be used to compute it, and base R’s table() function produces the raw cross-tabulation.
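A minimal base-R sketch of the cross-tabulation and the accuracy it yields. The eight observed and predicted labels are hypothetical; the caret call is shown only as a comment because caret may not be installed.

```r
# Hypothetical observed and predicted class labels for 8 cases
observed  <- factor(c("yes", "no", "yes", "yes", "no", "no", "yes", "no"))
predicted <- factor(c("yes", "no", "no",  "yes", "no", "yes", "yes", "no"))

# Cross-tabulate predictions against observations
cm <- table(Predicted = predicted, Observed = observed)
cm

# Accuracy: correctly classified cases (the diagonal) over all cases
accuracy <- sum(diag(cm)) / sum(cm)
accuracy   # 0.75

# If caret is installed: caret::confusionMatrix(predicted, observed, positive = "yes")
```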
HOW WOULD YOU WRITE A CUSTOM FUNCTION IN R? GIVE AN EXAMPLE.
This is the syntax to write a custom function in R:
<object-name> <- function(x){
  # function body
}
Let’s look at an example of a custom function in R:
fun1 <- function(x){ ifelse(x > 5, 100, 0) }
z <- c(1,2,3,4,5,6,7,8,9,10)
z <- fun1(z)  # z is now 0 0 0 0 0 100 100 100 100 100
WHAT PACKAGES ARE USED FOR DATA MINING IN R?
Some packages used for data mining in R:
data.table - provides fast reading of large files
rpart and caret - for machine learning models
ggplot2 - provides various data visualisation plots
tm - to perform text mining
forecast - provides functions for time series analysis
HOW WOULD YOU MAKE MULTIPLE PLOTS ONTO A SINGLE PAGE IN R?
Plotting multiple plots onto a single page using base graphics is quite easy.
For example, if you want to plot 4 graphs onto the same page in a 2 x 2 grid, you can use the command below:
par(mfrow=c(2,2))
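A short sketch of the 2 x 2 layout in use, with four illustrative plots of the built-in mtcars data. Saving and restoring the old par() settings is a common convention so that later plots go back to one per page.

```r
op <- par(mfrow = c(2, 2))   # 2 rows x 2 columns; keep the old settings in op

hist(mtcars$mpg, main = "Histogram of mpg", xlab = "mpg")
plot(mtcars$wt, mtcars$mpg, main = "mpg vs weight", xlab = "wt", ylab = "mpg")
boxplot(mpg ~ cyl, data = mtcars, main = "mpg by cylinders")
plot(density(mtcars$hp), main = "Density of hp")

par(op)                      # restore the previous one-plot layout
```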
GIVEN A VECTOR OF VALUES, HOW WOULD YOU CONVERT IT INTO A TIME SERIES OBJECT?
Let’s say this is our vector:
a<-c(1,3,5,7,9)
To convert this into a time series object:
as.ts(a)->a
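as.ts() gives a series that starts at time 1 with frequency 1. When the observations have calendar meaning, ts() lets you attach a start date and frequency; the monthly data starting in January 2023 below is an assumption for illustration.

```r
a <- c(1, 3, 5, 7, 9)

a_ts <- as.ts(a)                                      # time index 1, 2, ..., 5
a_monthly <- ts(a, start = c(2023, 1), frequency = 12)  # assumed: monthly from Jan 2023

a_monthly
time(a_monthly)   # decimal-year time index of each observation
```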
WHAT IS A WHITE NOISE MODEL AND HOW CAN YOU SIMULATE IT USING R?
The white noise (WN) model is a fundamental time series model. It is the simplest example of a stationary process.
A white noise model includes:
a) a constant fixed mean
b) a constant fixed variance
c) No pattern across time
Simulating a white noise model in R:
arima.sim(model=list(order=c(0,0,0)),n=50)->wn
ts.plot(wn)
WHAT IS A RANDOM WALK MODEL AND HOW CAN YOU SIMULATE IT USING R?
A random walk is a simple example of a non-stationary process.
A random walk has:
a) No specified mean or variance
b) Strong dependence over time
c) Its changes, or increments, are white noise
Simulating a random walk in R:
arima.sim(model=list(order=c(0,1,0)),n=50)->rw
ts.plot(rw)
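The link between the two models can also be shown directly: a random walk is the running sum of white noise, and differencing it recovers the white noise increments. A minimal sketch with standard normal increments (an assumption):

```r
set.seed(1)
eps <- rnorm(50)          # white noise increments
rw2 <- ts(cumsum(eps))    # random walk: running sum of the increments

# Differencing the random walk gives back the increments (from the second one on)
all.equal(as.numeric(diff(rw2)), eps[-1])   # TRUE

ts.plot(rw2)
```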
GIVE THE COMMAND TO CREATE A HISTOGRAM AND TO REMOVE A VECTOR FROM THE R WORKSPACE.
hist() is the command to create a histogram, where you can specify the details by typing hist(v,main,xlab,xlim,ylim,breaks,col,border).
– v is a vector containing numeric values used in histogram.
– main indicates the title of the chart.
– col is used to set the color of the bars.
– border is used to set the border color of each bar.
– xlab is used to give a description of x-axis.
– xlim is used to specify the range of values on the x-axis.
– ylim is used to specify the range of values on the y-axis.
– breaks is used to control the bars, either as a suggested number of bars or as a vector of break points.
rm() is used to remove a vector from the R workspace, for example rm(v).
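A sketch that uses the hist() arguments listed above and then removes the vector. The data are simulated normal values, and the colours, limits and number of breaks are arbitrary choices.

```r
set.seed(123)
v <- rnorm(200, mean = 50, sd = 10)   # illustrative numeric data

h <- hist(v,
          main = "Histogram of v", xlab = "Value",
          xlim = c(0, 100), breaks = 20,          # breaks: suggested number of bars
          col = "lightblue", border = "darkblue")

rm(v)                            # remove v from the workspace
exists("v", inherits = FALSE)    # FALSE
```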
WHY DO WE USE APPLY() FUNCTION IN R?
apply() applies the same function over the margins (rows, columns, or other dimensions) of a matrix or array. For example, apply(m, 1, mean) finds the mean of every row of m.
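A small sketch of the MARGIN argument on a 2 x 3 matrix; MARGIN = 1 works row by row and MARGIN = 2 column by column.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled by column: rows are 1 3 5 and 2 4 6

apply(m, 1, mean)   # row means: 3 4
apply(m, 2, sum)    # column sums: 3 7 11
```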
HOW DO YOU CREATE A VECTOR IN R?
To create a vector in R, combine the values with the c() function and use the <- symbol to assign the result to a name. For example, if you want to store the values 4, 5, 8 and 14 as a vector in x, type the command: x<-c(4,5,8,14)
EXPLAIN THE DIFFERENT FUNCTIONS THAT CAN BE APPLIED FOR NORMAL DISTRIBUTION IN R.
The different functions that can be applied for normal distribution in R are as follows:
a) dnorm(x, mean, sd) gives the density.
b) pnorm(x, mean, sd) gives the cumulative distribution function, P(X ≤ x).
c) qnorm(p, mean, sd) gives the quantile function, the inverse of pnorm.
d) rnorm(n, mean, sd) generates random numbers from the distribution.
The parameters used in the above functions are:
a) x is a vector of numbers.
b) p is a vector of probabilities.
c) n is the number of observations (sample size).
d) mean is the mean of the distribution. Its default value is 0.
e) sd is the standard deviation. Its default value is 1.
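A quick sketch of the four functions with the default mean = 0 and sd = 1, plus one non-default case showing that qnorm inverts pnorm. The specific inputs are arbitrary.

```r
dnorm(0)                  # density at 0 for N(0, 1): about 0.3989
pnorm(1.96)               # P(X <= 1.96): about 0.975
qnorm(0.975)              # 97.5th percentile: about 1.96

set.seed(1)
rnorm(3, mean = 10, sd = 2)   # three random draws from N(10, 2^2)

# qnorm and pnorm are inverses for any mean and sd
pnorm(qnorm(0.3, mean = 5, sd = 2), mean = 5, sd = 2)   # 0.3
```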
EXPLAIN THE DIFFERENT FUNCTIONS THAT CAN BE APPLIED FOR BINOMIAL DISTRIBUTION IN R.
The different functions that can be applied for Binomial distribution in R are as follows:
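a) dbinom(x, size, prob) gives the probability of exactly x successes.
b) pbinom(x, size, prob) gives the cumulative probability, P(X ≤ x).
c) qbinom(p, size, prob) gives the quantile function, the inverse of pbinom.
d) rbinom(n, size, prob) generates random numbers from the distribution.
The parameters used in the above functions are:
a) x is a vector of numbers.
b) p is a vector of probabilities.
c) n is the number of observations.
d) size is the number of trials.
e) prob is the probability of success of each trial.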
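A worked sketch for 5 fair-coin trials (size = 5, prob = 0.5), where the probabilities can be checked by hand: there are C(5, 3) = 10 ways to get 3 successes out of 2^5 = 32 outcomes.

```r
dbinom(3, size = 5, prob = 0.5)    # P(X = 3)  = 10/32 = 0.3125
pbinom(3, size = 5, prob = 0.5)    # P(X <= 3) = (1 + 5 + 10 + 10)/32 = 0.8125
qbinom(0.5, size = 5, prob = 0.5)  # smallest x with P(X <= x) >= 0.5, i.e. 2

set.seed(1)
rbinom(4, size = 5, prob = 0.5)    # four random counts between 0 and 5
```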
WHAT IS THE MAIN DIFFERENCE BETWEEN AN ARRAY AND A MATRIX?
A matrix is always two-dimensional, as it has only rows and columns. An array can have any number of dimensions; a three-dimensional array can be viewed as a stack of matrices. For example, a 3×3×2 array represents 2 matrices, each of dimension 3×3.
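A sketch of the 3×3×2 example: slicing on the third dimension returns one of the two 3×3 matrices.

```r
arr <- array(1:18, dim = c(3, 3, 2))   # a 3 x 3 x 2 array

dim(arr)          # 3 3 2
arr[, , 1]        # first 3 x 3 matrix (values 1 to 9)
arr[, , 2]        # second 3 x 3 matrix (values 10 to 18)
is.matrix(arr)    # FALSE: a matrix has exactly two dimensions
```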
HOW CAN YOU LOAD AND USE A CSV FILE IN R?
A CSV file can be loaded using the read.csv() function, which returns the file’s contents as a data frame.
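A self-contained sketch: it first writes a small CSV to a temporary file (the column names and values are made up) so that read.csv() has something to load.

```r
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, loss = c(120, 45, 300)), tmp, row.names = FALSE)

claims <- read.csv(tmp)   # returns a data frame
str(claims)               # 3 obs. of 2 variables: id (int), loss (num)
head(claims)
```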
HOW DO YOU GET THE NAME OF THE CURRENT WORKING DIRECTORY IN R?
The command getwd() gives the name of the current working directory in R.
HOW DO YOU INSTALL A PACKAGE IN R?
To install a package in R, use the following command:
install.packages("package name")
WHAT IS THE OUTPUT OF RUNIF(6)?
runif(6) generates 6 random numbers from a uniform distribution between 0 and 1.
GIVE THE R COMMAND TO GET THE PROBABILITY OF GETTING 26 OR LESS HEADS FROM 51 TOSSES OF A COIN USING PBINOM.
The R command to get the probability of getting 26 or fewer heads from 51 tosses of a coin using pbinom is:
x<-pbinom(26,51,0.5)
print(x)
The first command obtains the required probability and stores the value in x. The second command, i.e., print(x), prints the value of x.
GIVE THE COMMANDS TO OBTAIN THE MEAN, MEDIAN AND MODE OF A DATASET.
The command for obtaining the mean of a dataset is: mean(…)
The command for obtaining the median of a dataset is: median(…)
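R has no built-in function for the statistical mode of a dataset: mode(…) returns the storage mode of an object (for example "numeric"). The most frequent value can instead be found from a frequency table, for example names(which.max(table(x))).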
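A sketch contrasting mean(), median() and mode() on a small made-up vector, with a hypothetical helper, stat_mode(), that returns the most frequent value (the first one encountered wins on ties) as a number rather than a name.

```r
x <- c(2, 3, 3, 5, 7, 3, 2)

mean(x)     # 25/7, about 3.571
median(x)   # 3
mode(x)     # "numeric": the storage mode, NOT the statistical mode

# Hypothetical helper: most frequent value of a vector
stat_mode <- function(v) {
  ux <- unique(v)                         # distinct values in order of appearance
  ux[which.max(tabulate(match(v, ux)))]   # count each one and keep the most frequent
}
stat_mode(x)   # 3
```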
HOW ARE COMMENTS WRITTEN IN R?
Comments are written by starting them with #; everything after the # on that line is ignored by R, for example: # division
WHAT IS T.TEST() IN R?
The t.test() function performs t-tests. It is commonly used to determine whether the means of two groups are equal.
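A sketch on the built-in mtcars data, testing whether mean mpg differs between automatic (am = 0) and manual (am = 1) cars. By default t.test() runs the Welch test, which does not assume equal variances.

```r
res <- t.test(mpg ~ am, data = mtcars)   # Welch two-sample t-test (var.equal = FALSE)

res$estimate   # mean mpg in each group
res$p.value    # small p-value: evidence that the group means differ
```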
WHAT IS THE USE OF SUBSET() AND SAMPLE() FUNCTIONS IN R?
subset() is used to select the observations (rows) that meet a condition and the variables (columns) to keep, while sample() is used to draw a random sample of size n from a vector or from the row indices of a dataset.
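A sketch on the built-in mtcars data; the weight cut-off and sample size are arbitrary.

```r
# Rows with weight above 4 (thousand lb), keeping three columns
heavy <- subset(mtcars, wt > 4, select = c(mpg, wt, cyl))
heavy

# Random sample of 5 rows, drawn without replacement
set.seed(10)
idx <- sample(nrow(mtcars), 5)
mtcars[idx, ]
```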
HOW CAN YOU PRODUCE CORRELATIONS AND COVARIANCES?
Correlations are produced by the cor() function and covariances by the cov() function.
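A sketch on mtcars showing both functions, a correlation matrix, and the identity that a Pearson correlation is the covariance divided by the product of the standard deviations.

```r
x <- mtcars$mpg
y <- mtcars$wt

cor(x, y)                              # Pearson correlation: strongly negative
cov(x, y)                              # covariance
cov(x, y) / (sd(x) * sd(y))            # equals cor(x, y)
cor(mtcars[, c("mpg", "wt", "hp")])    # correlation matrix
cor(x, y, method = "spearman")         # rank-based alternative
```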
WHAT IS THE WORKSPACE IN R?
Workspace is the current R working environment which includes any user defined objects like vectors, lists etc.
WHAT IS THE FITDISTR() FUNCTION?
It performs maximum-likelihood fitting of univariate distributions. It is provided by the MASS package.
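A sketch fitting a lognormal distribution to simulated claim sizes; the true parameters (meanlog = 7, sdlog = 1) are assumptions used to generate the data. MASS is a recommended package that ships with standard R installations. For the lognormal, the estimates equal the mean and the (n-denominator) standard deviation of the log data.

```r
library(MASS)

set.seed(2024)
losses <- rlnorm(500, meanlog = 7, sdlog = 1)   # simulated claim sizes

fit <- fitdistr(losses, "lognormal")
fit$estimate   # meanlog and sdlog, close to 7 and 1
fit$sd         # standard errors of the estimates
```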
WHY IS THE LIBRARY() FUNCTION USED?
library(pkg) loads and attaches an installed package so that its functions can be used. Called with no arguments, library() shows the packages that are installed.
ON WHICH TYPES OF DATA DO BINARY OPERATORS WORK?
Binary operators work on matrices, vectors and scalars.
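A sketch of arithmetic operators on scalars, vectors and matrices. Operators such as + and * work element by element (recycling a scalar), while %*% is true matrix multiplication.

```r
s <- 2
v <- c(1, 2, 3)
m <- matrix(1:4, nrow = 2)   # rows: 1 3 and 2 4

v + s     # scalar recycled: 3 4 5
v * v     # element-wise: 1 4 9
m * 2     # element-wise on a matrix
m %*% m   # matrix product: rows 7 15 and 10 22
```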
WHICH FUNCTION IS USED TO CREATE A FREQUENCY TABLE?
A frequency table is created with the table() function.
HOW CAN YOU IDENTIFY THE DATA TYPE OF AN OBJECT?
Using the functions class() or typeof(), you can identify the data type of an object in R. The class() function returns the object's high-level class (for example "factor" or "data.frame"), whereas typeof() returns its internal storage type (for example "integer" or "list").
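A sketch showing where class() and typeof() disagree: factors are stored as integer codes and data frames are stored as lists.

```r
f <- factor(c("a", "b", "a"))
class(f);      typeof(f)        # "factor"     "integer"
class(mtcars); typeof(mtcars)   # "data.frame" "list"
class(1L);     typeof(1L)       # "integer"    "integer"
class(2.5);    typeof(2.5)      # "numeric"    "double"
```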
HOW ARE EXPLORATORY PLOTS SUCH AS HISTOGRAMS AND DENSITY PLOTS USED IN R TO UNDERSTAND CLAIM DISTRIBUTIONS?
Histograms and density plots help understand the shape, spread, and skewness of claim data. They indicate whether losses are symmetric, right-skewed, or heavy-tailed, which guides the choice of appropriate actuarial distributions.
HOW DOES R HELP COMPARE EMPIRICAL CLAIM DATA WITH THEORETICAL DISTRIBUTIONS USING PLOTS?
R allows overlaying theoretical distribution curves on empirical plots. Visual comparison highlights how well the theoretical distribution captures the central behavior and tail characteristics of the data.
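A sketch of both ideas on simulated claims: a density-scaled histogram, a kernel density estimate, and a fitted lognormal curve on top. The data are generated from a lognormal with assumed parameters, and the lognormal fit uses its closed-form maximum-likelihood estimates on the log scale.

```r
set.seed(7)
claims <- rlnorm(1000, meanlog = 8, sdlog = 0.9)   # simulated claim sizes (assumption)

# Lognormal maximum-likelihood estimates have a closed form on the log scale
ml <- mean(log(claims))
sl <- sqrt(mean((log(claims) - ml)^2))

hist(claims, breaks = 50, freq = FALSE, main = "Claim sizes", xlab = "Claim amount")
lines(density(claims), lwd = 2)                                           # empirical shape
curve(dlnorm(x, meanlog = ml, sdlog = sl), add = TRUE, lty = 2, lwd = 2)  # theoretical fit
legend("topright", legend = c("Kernel density", "Fitted lognormal"), lty = c(1, 2), lwd = 2)
```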
HOW ARE QQ-PLOTS IN R INTERPRETED WHEN CHECKING DISTRIBUTIONAL ASSUMPTIONS FOR INSURANCE LOSSES?
QQ-plots compare empirical quantiles with theoretical quantiles. Deviations from the reference line indicate poor fit, skewness, or heavy tails, which are common in insurance loss data.
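A sketch using base R's qqnorm() and qqline() on simulated lognormal claims (an assumption). On the raw scale the points bend away from the line because of right skew; after a log transform they lie close to it, which is what a lognormal assumption predicts.

```r
set.seed(7)
claims <- rlnorm(1000, meanlog = 8, sdlog = 0.9)   # simulated claims (assumption)

op <- par(mfrow = c(1, 2))
qqnorm(claims, main = "Normal QQ: raw claims")
qqline(claims)                                  # strong curvature: right skew
qq <- qqnorm(log(claims), main = "Normal QQ: log claims")
qqline(log(claims))                             # points close to the line
par(op)
```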
HOW DOES R SUPPORT VISUALIZATION OF SKEWED AND HEAVY-TAILED LOSS DISTRIBUTIONS?
R supports transformations, density scaling, and log-scale plots. These techniques make skewed distributions easier to interpret and help assess tail risk.
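A sketch of two common views of a heavy-tailed sample (simulated here with assumed lognormal parameters): a histogram of log10 losses, and the empirical survival function P(X ≥ x) on log-log axes, where a roughly straight tail is a sign of Pareto-like behaviour.

```r
set.seed(11)
losses <- rlnorm(2000, meanlog = 7, sdlog = 1.5)   # heavily skewed simulated losses

x <- sort(losses)
surv <- 1 - (seq_along(x) - 1) / length(x)   # empirical P(X >= x) at each sorted loss

op <- par(mfrow = c(1, 2))
hist(log10(losses), breaks = 40, main = "log10(loss)", xlab = "log10 of loss")
plot(x, surv, log = "xy", type = "s", xlab = "Loss", ylab = "P(X >= x)",
     main = "Empirical survival, log-log")
par(op)
```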
HOW ARE BOXPLOTS IN R USED TO IDENTIFY OUTLIERS IN ACTUARIAL DATASETS?
Boxplots highlight extreme observations beyond the whiskers, which by default extend up to 1.5 times the interquartile range from the box. In actuarial work, these points often represent large losses and must be investigated rather than removed automatically.
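A sketch that flags, rather than deletes, the points beyond the whiskers. The claim sizes are simulated with two large losses appended by hand for illustration.

```r
set.seed(3)
losses <- c(rlnorm(200, meanlog = 7, sdlog = 0.5), 25000, 40000)

b <- boxplot(losses, main = "Claim sizes", ylab = "Loss")  # draws and returns the statistics
b$out                                     # flagged points beyond the whiskers
boxplot.stats(losses, coef = 1.5)$out     # same 1.5 x IQR rule without drawing
```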
HOW ARE DIAGNOSTIC PLOTS IN R USED TO ASSESS THE ADEQUACY OF A REGRESSION MODEL?
Diagnostic plots assess linearity, variance stability, and residual behavior. They help identify model misspecification, influential observations, or unmet assumptions.
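A sketch using the standard diagnostic plots for an lm fit on mtcars; the model formula is only illustrative. Cook's distance gives a numeric companion for spotting influential observations.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)

op <- par(mfrow = c(2, 2))
plot(model)   # residuals vs fitted, normal QQ, scale-location, residuals vs leverage
par(op)

head(sort(cooks.distance(model), decreasing = TRUE), 3)   # most influential cars
```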
HOW DOES R VISUALIZE THE RELATIONSHIP BETWEEN PREDICTORS AND RESPONSE VARIABLES IN PRICING MODELS?
Scatter plots and smooth trend lines reveal relationships between risk factors and expected losses, supporting intuitive understanding before formal modeling.
HOW ARE RESIDUAL PLOTS FROM GLMS INTERPRETED IN AN ACTUARIAL CONTEXT?
Residual plots indicate whether the model captures systematic patterns. Random scatter suggests good fit, while structure or clustering suggests missing predictors or incorrect distributional assumptions.
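A sketch of a Poisson claim-frequency GLM with an exposure offset, followed by a deviance-residual plot. The portfolio, age bands and true rates are all simulated assumptions. With low claim counts, the residuals form bands, one per observed count; this is expected and is not in itself a sign of poor fit.

```r
set.seed(5)
n <- 1000
age_band <- factor(sample(c("young", "middle", "senior"), n, replace = TRUE),
                   levels = c("middle", "young", "senior"))    # "middle" as base level
exposure <- runif(n, 0.5, 1)                                    # policy-years
true_rate <- c(middle = 0.10, young = 0.25, senior = 0.15)[as.character(age_band)]
claims <- rpois(n, lambda = exposure * true_rate)               # simulated claim counts

fit <- glm(claims ~ age_band + offset(log(exposure)), family = poisson)
res <- residuals(fit, type = "deviance")

plot(fitted(fit), res, xlab = "Fitted claim count", ylab = "Deviance residual",
     main = "Deviance residuals vs fitted")
abline(h = 0, lty = 2)
```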
HOW DOES R HELP VISUALIZE THE EFFECT OF RATING FACTORS IN A GLM?
R can plot the relative differences (relativities) between the levels of each rating factor and a base level, helping explain how rating variables influence expected claims in pricing models.
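In a log-link GLM, exponentiated coefficients are multiplicative relativities against the base level. A sketch with simulated data, assuming a "region" factor whose true relativities are 1, 1.5 and 2:

```r
set.seed(5)
n <- 10000
region <- factor(sample(c("A", "B", "C"), n, replace = TRUE))   # "A" is the base level
exposure <- runif(n, 0.5, 1)
true_rate <- c(A = 0.10, B = 0.15, C = 0.20)[as.character(region)]
claims <- rpois(n, lambda = exposure * true_rate)

fit <- glm(claims ~ region + offset(log(exposure)), family = poisson)

rel <- c(1, exp(coef(fit))[-1])   # base level fixed at 1
names(rel) <- levels(region)
rel                               # roughly 1, 1.5, 2

barplot(rel, ylab = "Relativity to region A", main = "Rating factor: region")
abline(h = 1, lty = 2)
```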
HOW ARE FITTED VERSUS OBSERVED PLOTS USED IN R TO VALIDATE ACTUARIAL MODELS?
These plots compare predicted values with actual outcomes. Close alignment indicates good predictive performance, while systematic deviation signals bias.
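A sketch of a fitted-versus-observed plot for an illustrative lm model on mtcars. Points scattered evenly around the 45-degree line indicate no systematic bias.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)
obs <- mtcars$mpg
fit <- fitted(model)

plot(fit, obs, xlab = "Fitted mpg", ylab = "Observed mpg", main = "Fitted vs observed")
abline(0, 1, lty = 2)   # perfect-prediction line

mean(obs - fit)         # least squares with an intercept: in-sample mean residual is ~0
```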
HOW ARE MARKOV CHAIN RESULTS VISUALIZED IN R TO EXPLAIN STATE TRANSITIONS?
Results are visualized through state probability plots or transition diagrams, making movement between states intuitive for stakeholders.
HOW DOES R HELP PLOT THE EVOLUTION OF STATE PROBABILITIES OVER TIME IN A MARKOV MODEL?
R enables time-series style plots that show how state occupancy probabilities change across projection periods, supporting long-term actuarial analysis.
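A sketch projecting state probabilities with the recursion p(t+1) = p(t) P. The three states and the annual transition matrix are hypothetical, and everyone is assumed to start in the Healthy state.

```r
# Hypothetical annual transition matrix (rows: from, columns: to); rows sum to 1
states <- c("Healthy", "Sick", "Dead")
P <- matrix(c(0.90, 0.08, 0.02,
              0.30, 0.60, 0.10,
              0.00, 0.00, 1.00),
            nrow = 3, byrow = TRUE, dimnames = list(states, states))

years <- 30
probs <- matrix(NA_real_, nrow = years + 1, ncol = 3,
                dimnames = list(as.character(0:years), states))
probs[1, ] <- c(1, 0, 0)                                  # everyone starts healthy
for (t in 1:years) probs[t + 1, ] <- probs[t, ] %*% P     # p(t+1) = p(t) P

matplot(0:years, probs, type = "l", lty = 1:3, col = 1:3, xlab = "Year",
        ylab = "State probability", main = "State occupancy over time")
legend("right", legend = states, lty = 1:3, col = 1:3)
```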
HOW ARE TRANSITION PROBABILITIES INTERPRETED WHEN VISUALIZED USING R OUTPUTS?
Transition probabilities quantify the likelihood of moving between states. Visual summaries help assess stability, persistence, and exit behavior.
HOW DOES R SUPPORT COMPARISON BETWEEN DETERMINISTIC AND STOCHASTIC PROJECTIONS?
R allows both single-path projections and distribution-based outcomes to be plotted together, highlighting uncertainty and variability around expected results.
HOW ARE POISSON PROCESS SIMULATIONS VISUALIZED IN R TO EXPLAIN CLAIM ARRIVAL PATTERNS?
Event counts over time are plotted to show randomness and clustering, helping explain variability in claim frequency.
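A sketch of a homogeneous Poisson process, built from exponential inter-arrival times. The rate (2 claims per month) and 24-month horizon are assumptions. The step function shows the cumulative claim count, and the dashed line shows its expectation, λt.

```r
set.seed(21)
lambda  <- 2    # assumed claim arrival rate per month
horizon <- 24   # months

# Inter-arrival times are exponential(lambda); arrival times are their running sums
arrivals <- cumsum(rexp(200, rate = lambda))
arrivals <- arrivals[arrivals <= horizon]

plot(stepfun(arrivals, 0:length(arrivals)), do.points = FALSE,
     xlim = c(0, horizon), xlab = "Month", ylab = "Cumulative claims",
     main = "Simulated Poisson claim arrivals")
abline(0, lambda, lty = 2)   # expected count lambda * t
```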
HOW ARE SURVIVAL CURVES PLOTTED IN R AND INTERPRETED FOR ACTUARIAL APPLICATIONS?
Survival curves show the probability of remaining in force over time. They are used to analyze mortality, lapse, or disability experience.
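A sketch of Kaplan-Meier survival curves using the survival package, which ships with standard R installations as a recommended package. The lapse data are simulated: the two sales channels, the lapse rates and the censoring pattern are all assumptions.

```r
library(survival)

set.seed(8)
n <- 300
channel <- factor(sample(c("agent", "online"), n, replace = TRUE))
lapse_time  <- rexp(n, rate = ifelse(channel == "online", 0.30, 0.15))  # assumed lapse rates
censor_time <- runif(n, 0, 8)                                           # end of observation
time   <- pmin(lapse_time, censor_time)
status <- as.integer(lapse_time <= censor_time)   # 1 = lapsed, 0 = still in force (censored)

km <- survfit(Surv(time, status) ~ channel)       # Kaplan-Meier estimate per channel
plot(km, lty = 1:2, xlab = "Years since issue", ylab = "Proportion still in force")
legend("topright", legend = levels(channel), lty = 1:2)

summary(km, times = c(1, 3))
```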
HOW DOES R VISUALIZE HAZARD RATES ACROSS DIFFERENT RISK GROUPS?
Hazard rate plots compare instantaneous risk across groups, supporting relative risk assessment rather than absolute predictions.
HOW ARE MORTALITY CURVES PLOTTED AND COMPARED ACROSS TIME PERIODS IN R?
Mortality rates by age are plotted for different years to assess improvements or deterioration, which informs longevity risk analysis.
HOW DOES R HELP VISUALIZE EXTREME LOSSES WHEN APPLYING EXTREME VALUE THEORY?
Tail-focused plots highlight large losses beyond a threshold, helping assess catastrophe exposure and capital adequacy.
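A sketch of two standard threshold tools: the mean excess plot, e(u) = E[X − u | X > u], which is often used to choose a threshold, and the extraction of exceedances over a chosen threshold, which would feed a generalized Pareto fit. The losses are simulated and the 95th-percentile threshold is an assumption.

```r
set.seed(13)
losses <- rlnorm(5000, meanlog = 9, sdlog = 1.2)   # simulated losses (assumption)

# Empirical mean excess over a grid of candidate thresholds
thresholds  <- quantile(losses, probs = seq(0.50, 0.98, by = 0.01))
mean_excess <- sapply(thresholds, function(u) mean(losses[losses > u] - u))
plot(thresholds, mean_excess, type = "b", xlab = "Threshold u",
     ylab = "Mean excess e(u)", main = "Mean excess plot")

u <- quantile(losses, 0.95)             # assumed threshold: 95th percentile
exceedances <- losses[losses > u] - u   # peaks over threshold
length(exceedances)                     # 250 of 5000
```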
HOW ARE REINSURANCE IMPACTS VISUALIZED IN R USING LOSS DISTRIBUTION PLOTS?
Gross and net loss distributions are plotted together, showing how reinsurance reduces volatility and tail risk.
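A sketch of an excess-of-loss treaty applied per loss: the insurer keeps each loss up to a retention and the reinsurer pays the rest. The loss distribution and the retention of 60,000 are hypothetical. Comparing upper quantiles and standard deviations quantifies the reduction in tail risk and volatility.

```r
set.seed(17)
gross <- rlnorm(10000, meanlog = 10, sdlog = 1)   # simulated gross losses (assumption)
retention <- 60000                                # hypothetical XL retention
net   <- pmin(gross, retention)                   # retained by the insurer
ceded <- gross - net                              # paid by the reinsurer

plot(density(log(gross)), main = "Gross vs net loss (log scale)", xlab = "log(loss)")
lines(density(log(net)), lty = 2)
legend("topleft", legend = c("Gross", "Net of XL"), lty = 1:2)

rbind(gross = quantile(gross, c(0.5, 0.99, 0.995)),
      net   = quantile(net,   c(0.5, 0.99, 0.995)))
c(sd_gross = sd(gross), sd_net = sd(net))
```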
HOW IS R DIFFERENT FROM OTHER PROGRAMMING LANGUAGES USED IN DATA ANALYSIS?
R is designed specifically for statistical analysis and modeling, with extensive built-in support for probability, inference, and visualization.
HOW DOES R HANDLE DATA TYPES SUCH AS VECTORS, FACTORS, AND DATA FRAMES?
R treats vectors as fundamental objects, uses factors for categorical data, and data frames for structured datasets, which aligns well with statistical modeling needs.
HOW DOES R MANAGE MISSING VALUES DURING ANALYSIS?
R explicitly represents missing values and provides tools to detect, exclude, or account for them during analysis.
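A sketch of R's NA handling: detecting missing values, seeing them propagate through a summary, excluding them with na.rm, and dropping incomplete rows. The values are made up.

```r
x <- c(12, NA, 7, 3, NA)
is.na(x)                 # FALSE TRUE FALSE FALSE TRUE
sum(is.na(x))            # 2 missing values
mean(x)                  # NA: missing values propagate by default
mean(x, na.rm = TRUE)    # 22/3, about 7.33

df <- data.frame(id = 1:4, loss = c(100, NA, 250, 80), region = c("N", "S", NA, "E"))
complete.cases(df)       # TRUE FALSE FALSE TRUE
na.omit(df)              # keeps rows 1 and 4
```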
HOW DOES R SUPPORT REPRODUCIBLE ANALYSIS?
R promotes reproducibility through scripts, structured workflows, and report-generation tools that combine analysis with documentation.
HOW DOES R DIFFERENTIATE BETWEEN BASE PLOTTING AND GGPLOT-BASED PLOTTING?
Base plotting follows an imperative style, while ggplot2 follows a layered, declarative approach based on the principles of the grammar of graphics.
HOW IS THE STRUCTURE OF A GGPLOT EXPLAINED CONCEPTUALLY?
A ggplot is built by mapping data to visual elements and layering geometries, scales, and annotations.
HOW DOES R LAYER INFORMATION WHEN CREATING COMPLEX PLOTS?
Layers are added incrementally, allowing multiple data representations such as points, lines, and summaries in a single plot.
HOW ARE LEGENDS, AXES, AND LABELS HANDLED IN R PLOTS?
R provides flexible control over plot annotations, ensuring clarity and interpretability for business communication.
HOW DOES R HANDLE LARGE DATASETS EFFICIENTLY?
R handles large datasets most efficiently through vectorized operations and optimized packages such as data.table. Because R holds data in memory, very large datasets may require chunked processing or a database back end.
HOW IS R COMMONLY USED IN PRODUCTION OR BUSINESS REPORTING ENVIRONMENTS?
R is used for model development, automated reporting, dashboards, and analytical pipelines that support actuarial and business decisions.