How to create a normal distribution in R with examples

Distributions.

A distribution describes how the values of a variable are spread over their range and how often each value occurs. The shape of a distribution can be described as symmetrical, skewed, uniform, or normal, among others.

A distribution is symmetrical if its left half is a mirror image of its right half about the center. The most common symmetrical distribution is the normal distribution. A distribution is skewed if one tail is longer than the other. A familiar example of a skewed distribution is the exponential distribution. A distribution is uniform if every value in its range is equally likely. The term random describes the variable rather than a shape: individual values cannot be predicted in advance, even though their overall distribution can be described.
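The difference between these shapes can be checked with a small simulation. The sketch below is illustrative only: the seed, the sample size, and the variable names are arbitrary choices, and it relies on the base R generators rnorm(), rexp(), and runif().

```r
# Sketch: compare the shape of three simulated samples (seed and size are arbitrary).
set.seed(123)
n <- 10000
samples <- list(
  normal      = rnorm(n),  # symmetric, bell-shaped
  exponential = rexp(n),   # right-skewed
  uniform     = runif(n)   # flat between 0 and 1
)

# In a symmetric distribution the mean and median nearly coincide;
# in a right-skewed one the mean is pulled above the median.
shape_summary <- sapply(samples, function(s) c(mean = mean(s), median = median(s)))
print(round(shape_summary, 3))
```

For the exponential sample, the mean should sit clearly above the median, while the two stay close together for the normal and uniform samples.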

There are many ways to visualize a distribution in R. One way is to use a histogram, which you can create with the hist() function. This function takes a vector of values, groups them into intervals (bins), and draws one bar per bin whose height is the number of values that fall in that bin.

You can also use the qplot() function to create several types of plots, including histograms, boxplots, and scatterplots. This function is part of the ggplot2 package, which must be installed and loaded before calling qplot(). Note that recent ggplot2 releases mark qplot() as deprecated in favor of ggplot().

x <- c(1,3,5,6,3,4,2,9,8,5,3)
x
hist(x)

Normal Distribution

A normal distribution is a probability distribution that is symmetric about the mean, so values near the mean occur more often than values far from it. Because the two sides of the mean mirror each other, the mean, median, and mode are all equal. The normal distribution is also known as the Gaussian distribution.

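# Grid of x values from -5 to 5 in steps of 0.1
x <- seq(-5, 5, 0.1)
# Plot the standard normal density as a line, without a box around the plot
plot(x = x, y = dnorm(x), type = "l", bty = "n")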
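The properties described above (symmetry, and a mode and median that coincide with the mean) can also be checked numerically. This is a minimal sketch for the standard normal distribution; the names grid, dens, mode_x, and median_x are illustrative.

```r
# Evaluate the standard normal density on a grid that is symmetric around 0.
grid <- seq(-5, 5, by = 0.1)
dens <- dnorm(grid)

# Symmetry: the density read left-to-right matches the density read right-to-left.
is_symmetric <- isTRUE(all.equal(dens, rev(dens)))

# Mode: the grid point where the density peaks.
mode_x <- grid[which.max(dens)]

# Median: the 50% quantile of the distribution.
median_x <- qnorm(0.5)

cat("symmetric:", is_symmetric, "\n")
cat("mode:", round(mode_x, 6), " median:", median_x, "\n")
```

Both the mode and the median should land at the mean, which is 0 for the standard normal distribution.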

R provides four basic functions for working with the normal distribution.

  • rnorm(n, mean, sd) generates n random values from a normal distribution with mean μ and standard deviation σ. If the parameters are omitted, R defaults to μ = 0 and σ = 1.
library(ggplot2)  # qplot() comes from the ggplot2 package
qplot(rnorm(1000))
  • pnorm(x, mean, sd) is the cumulative distribution function (cdf) of the normal distribution with mean μ and standard deviation σ.
pnorm(10:14, 12, 0.8)

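The code above returns the cumulative probability P(X ≤ x) for x = 10, 11, 12, 13, and 14 when X is normally distributed with mean 12 and standard deviation 0.8. It is not the probability of observing exactly those values, which is zero for a continuous distribution.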

Output

[1] 0.006209665 0.105649774 0.500000000 0.894350226 0.993790335
  • qnorm(p, mean, sd) is the inverse of the cdf of the normal distribution with mean μ and standard deviation σ. It returns the value x such that pnorm(x, mean, sd) = p.
qnorm(c(.25,.50,.75), 12, .8)

Output

[1] 11.46041 12.00000 12.53959
  • dnorm(x, mean, sd) is the probability density function of the normal distribution with mean μ and standard deviation σ.
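The following sketch shows how pnorm(), qnorm(), and dnorm() relate to one another, reusing the mean of 12 and standard deviation of 0.8 from the examples above. The variable names and the integration limits (10 standard deviations either side of the mean) are illustrative choices.

```r
mu <- 12
sigma <- 0.8

# Probability that X falls between 11 and 13: a difference of two cdf values.
p_between <- pnorm(13, mu, sigma) - pnorm(11, mu, sigma)   # about 0.7887

# qnorm() undoes pnorm(): probability -> quantile -> same probability.
p <- c(0.25, 0.50, 0.75)
round_trip <- pnorm(qnorm(p, mu, sigma), mu, sigma)

# dnorm() is a density, not a probability. Its area over mu +/- 10 sigma
# (which carries essentially all of the mass) is 1, and its area up to a
# point reproduces pnorm() at that point.
lo <- mu - 10 * sigma
hi <- mu + 10 * sigma
total_area <- integrate(dnorm, lo, hi, mean = mu, sd = sigma)$value
area_to_11 <- integrate(dnorm, lo, 11, mean = mu, sd = sigma)$value

print(p_between)
print(round_trip)
print(c(total_area = total_area, area_to_11 = area_to_11, pnorm_11 = pnorm(11, mu, sigma)))
```

The interval probability follows directly from the pnorm() output shown earlier: 0.894350226 - 0.105649774.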

Example: Generate a normal distribution in R.

#make this example reproducible
set.seed(1)

#generate sample of 6000 observations that follows normal dist. with mean=57 and sd=5
x <- rnorm(6000, mean=57, sd=5)

#view first 6 observations in sample
head(x)

[1] 53.86773 57.91822 52.82186 64.97640 58.64754 52.89766

Let’s use a histogram to display the distribution of our random values x.

hist(x, col='steelblue')
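To see how closely the sample follows the distribution it was drawn from, the theoretical density can be drawn on top of the histogram. This is a sketch under the same assumptions as the example (seed 1, mean 57, standard deviation 5); the name sim and the plot labels are illustrative.

```r
set.seed(1)
sim <- rnorm(6000, mean = 57, sd = 5)

# freq = FALSE rescales the bar heights to densities, so the bars and the
# density curve share the same y-axis.
h <- hist(sim, freq = FALSE, col = "steelblue",
          main = "Sample vs. normal density", xlab = "value")

# Overlay the theoretical density; inside curve(), x is the plotting variable.
curve(dnorm(x, mean = 57, sd = 5), add = TRUE, lwd = 2)
```

With freq = FALSE the total area of the bars is 1, which is what makes the comparison with dnorm() meaningful.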
Figure: histogram of the simulated normal sample.

Example 2

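Simulate n = 6000 observations of X ~ N(57, 5²), that is, a normal distribution with mean 57 and standard deviation 5. Obtain the upper and lower limits (the boxplot whiskers) and the number of outliers.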

#Make this example reproducible
set.seed(1)
#Generate sample of 6000 observations that follows normal dist. with mean=57 and sd=5
x <- rnorm(6000, mean=57,sd=5)
#View first 6 observations in sample
head(x)
[1] 53.86773 57.91822 52.82186 64.97640 58.64754 52.89766
#Find mean of sample
mean(x)
[1] 56.97696
#Find standard deviation of sample
sd(x)
[1] 5.094073
#Visualizing using a histogram
hist(x, col='steelblue') 
#Plotting a boxplot to find upper and lower limit
bx <- boxplot(x)
#Getting the limits of the boxplots
bx$stats
[1,] 43.52348 #Lower whisker
[2,] 53.65919 #Q1
[3,] 56.89197 #Q2 (median)
[4,] 60.44743 #Q3
[5,] 70.57166 #Upper whisker
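The whisker values come from the 1.5 × IQR rule that boxplot() applies by default. The sketch below reconstructs them by hand from the hinges reported by boxplot.stats(). The names lower_fence, upper_fence, whiskers, and flagged are illustrative, and the sample is regenerated so the block runs on its own.

```r
set.seed(1)
x <- rnorm(6000, mean = 57, sd = 5)

s <- boxplot.stats(x)   # the same statistics that boxplot() draws
q1 <- s$stats[2]        # lower hinge (Q1)
q3 <- s$stats[4]        # upper hinge (Q3)
iqr <- q3 - q1

# Fences: anything beyond them is treated as an outlier.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Whiskers end at the most extreme observations still inside the fences.
whiskers <- range(x[x >= lower_fence & x <= upper_fence])

# Observations outside the fences.
flagged <- x[x < lower_fence | x > upper_fence]

# For an exact normal distribution, the share of values beyond the fences
# is roughly 0.7%, whatever the mean and standard deviation are.
expected_share <- 2 * pnorm(qnorm(0.25) - 1.5 * (qnorm(0.75) - qnorm(0.25)))

print(whiskers)
print(length(flagged))
print(expected_share * length(x))
```

The last line gives the number of outliers one would expect on average in a sample of this size, which puts the observed count into context.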

#Plotting a scatter plot
plot(x)
#Marking the upper and lower limits with horizontal lines
abline(h=43.52348)
abline(h=70.57166)

#Printing the outliers beyond the upper and lower limits
outliers <- boxplot.stats(x)$out
outliers
[1] 42.55540 41.95976 76.05138 42.30113 72.27871 42.01525 40.93406 75.19787 40.73390 43.02019
[11] 42.13871 72.76986 71.59614 71.20093 70.90694 41.88848 42.02449 39.30207 43.06993 71.31071
[21] 43.11204 42.57491 72.32262 42.09573 40.95972 40.98945 41.64378 42.13852 43.29674 38.64350
[31] 70.68485 43.18712 42.43587 71.30796 41.40441 41.99593 40.83695
#Finding the number of outliers
length(outliers)
[1] 37
#Outliers above the upper limit
for (i in outliers){
  if (i > 70.57166)
    print(i)
}
[1] 76.05138
[1] 72.27871
[1] 75.19787
[1] 72.76986
[1] 71.59614
[1] 71.20093
[1] 70.90694
[1] 71.31071
[1] 72.32262
[1] 70.68485
[1] 71.30796
#Outliers below the lower limit
for (j in outliers){
  if (j < 43.52348)
    print(j)
}
[1] 42.5554
[1] 41.95976
[1] 42.30113
[1] 42.01525
[1] 40.93406
[1] 40.7339
[1] 43.02019
[1] 42.13871
[1] 41.88848
[1] 42.02449
[1] 39.30207
[1] 43.06993
[1] 43.11204
[1] 42.57491
[1] 42.09573
[1] 40.95972
[1] 40.98945
[1] 41.64378
[1] 42.13852
[1] 43.29674
[1] 38.6435
[1] 43.18712
[1] 42.43587
[1] 41.40441
[1] 41.99593
[1] 40.83695
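The two loops above can also be written with logical indexing, which avoids hard-coding the whisker values. This is a sketch of an alternative, not a change to the example; the names upper_out and lower_out are illustrative, and the sample is regenerated so the block runs on its own.

```r
set.seed(1)
x <- rnorm(6000, mean = 57, sd = 5)

s <- boxplot.stats(x)
outliers <- s$out

# s$stats[5] is the upper whisker and s$stats[1] the lower whisker.
upper_out <- outliers[outliers > s$stats[5]]
lower_out <- outliers[outliers < s$stats[1]]

# Count the outliers on each side in one step.
counts <- c(upper = length(upper_out), lower = length(lower_out), total = length(outliers))
print(counts)
```

Assuming the same seed, the counts should agree with the values printed by the loops: 11 above the upper limit and 26 below the lower limit.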

By Benard Mbithi

A statistics graduate with a knack for crafting data-powered business solutions. I assist businesses in overcoming challenges and achieving their goals through strategic data analysis and problem-solving expertise.