Elasticity and scalability through dynamic provisioning of servers and other resources are among the tenets of cloud computing. In practice, automatically growing the resources available to an application according to the level of demand is made easier by building blocks like WASABi, the Autoscaling Application Block for Azure.

But even if an application can grow automatically, one question remains unanswered: what is the minimum number of servers that should be reserved? It is a complex question, because it requires trading off the losses from being unable to service requests properly while new instances take some minutes to boot against the actual cost of instances that may sit idle with no load.

Let’s consider a service with a low margin (10%) and really intensive computation (about 8 requests per second per Azure XL server, rounded to 30,000 requests per hour in the code), where the probability that a customer does not enter the service due to congestion is 1%, and the probability that a customer leaves the service after a request is a much higher 2.5%. To model the whole system, an M/M/s queue is considered: s servers with a Poisson arrival process, exponentially distributed service times, and an unlimited waiting queue. The following R code solves the proposed queue model and plots the result:

# Minimum number of roles to calculate
minRoles <- 4
# Maximum number of roles to calculate
maxRoles <- 12
# Probability of not entering service due to high-load
probNotJoin <- 0.01
lossNotJoin <- 0
# Probability of leaving the service due to high-load
probLeaving <- 0.025
lossLeaving <- 0
# Revenue per request
customerRevenue <- 1
# Profit per request
customerProfit <- 0.1
# Azure XL web/worker role ($0.96/hour)
costServer <- 0.96
# New requests/hour
arrivalRate <- 90000
# Requests/hour serviced per role
serviceRate <- 30000
serviceCost <- 0
totalCost <- 0

library(pdq)
for (r in minRoles:maxRoles) {
  # Build and solve an open queueing model with r servers
  Init("")
  CreateMultiNode(r, "server", CEN, FCFS)
  CreateOpen("requests", arrivalRate)
  SetDemand("server", "requests", 1/serviceRate)
  Solve(CANON)

  # Requests waiting: number in the system minus number in service
  queueLength <- GetQueueLength("server", "requests", TRANS) - arrivalRate/serviceRate
  # Mean waiting time in minutes (Little's law)
  waitingTime <- 60 * queueLength/arrivalRate

  # Customers lost per hour and the resulting costs
  numNotJoin <- queueLength * probNotJoin * arrivalRate
  numLeaving <- waitingTime * probLeaving * arrivalRate
  serviceCost[r] <- r * costServer
  lossNotJoin[r] <- numNotJoin * customerRevenue * customerProfit
  lossLeaving[r] <- numLeaving * customerRevenue * customerProfit
  totalCost[r] <- serviceCost[r] + lossNotJoin[r] + lossLeaving[r]
}

# Total cost (black) with its three components as dashed curves
plot(totalCost, xlim=c(minRoles,maxRoles), type="b", col="black", lwd=2,
     main="Optimal Elasticity of Demand", xlab="Servers (s)", ylab="Cost ($)")
lines(serviceCost, type="b", col="brown", lty="dashed")
lines(lossNotJoin, type="b", col="green", lty="dashed")
lines(lossLeaving, type="b", col="orange", lty="dashed")
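
For reference, the waiting-queue length that the loop derives from PDQ has a standard closed form for an M/M/s queue (the Erlang C model). The block below is only a sketch in notation introduced here, not taken from the code: λ stands for arrivalRate, μ for serviceRate, s for the number of servers r, L_q for queueLength after the arrivalRate/serviceRate term is subtracted, and W_q for the mean wait in hours (the code multiplies it by 60 to get minutes).

```latex
a = \frac{\lambda}{\mu}, \qquad \rho = \frac{\lambda}{s\mu} < 1

P_0 = \left[ \sum_{k=0}^{s-1} \frac{a^k}{k!} + \frac{a^s}{s!\,(1-\rho)} \right]^{-1}

L_q = \frac{P_0 \, a^s \, \rho}{s!\,(1-\rho)^2}, \qquad W_q = \frac{L_q}{\lambda}
```

With the values used here, a = 3, so the queue is only stable for s ≥ 4, which is consistent with minRoles being set to 4.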

The black curve represents the total cost: the sum of the server costs (brown curve) and the losses due to customers not entering the service (green curve) and leaving the service (orange curve). The optimal reservation of servers is found at s=8, the minimum of the total cost curve.
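
As a cross-check that does not depend on the pdq package, the same cost curve can be sketched in base R using the closed-form Erlang C expressions for an M/M/s queue. The helper names erlangLq and totalCostFor are hypothetical, introduced only for this illustration; the inputs and the cost formulas mirror those in the listing above. Under these assumptions it should reproduce the minimum at 8 servers.

```r
# Closed-form M/M/s (Erlang C) cross-check in base R; no pdq required.
# Inputs mirror the values used in the post.
arrivalRate     <- 90000   # requests/hour
serviceRate     <- 30000   # requests/hour per server
costServer      <- 0.96    # $/hour per server
probNotJoin     <- 0.01
probLeaving     <- 0.025
customerRevenue <- 1
customerProfit  <- 0.1

# Mean number of requests waiting (Lq) in an M/M/s queue.
# Hypothetical helper, not part of any package.
erlangLq <- function(s, lambda, mu) {
  a   <- lambda / mu                 # offered load
  rho <- a / s                       # utilisation, must be below 1
  stopifnot(rho < 1)
  blocking <- 1                      # Erlang B, computed recursively
  for (n in seq_len(s)) blocking <- a * blocking / (n + a * blocking)
  probWait <- blocking / (1 - rho * (1 - blocking))  # Erlang C
  probWait * rho / (1 - rho)
}

# Total hourly cost for s servers, using the same formulas as the listing.
totalCostFor <- function(s) {
  lq          <- erlangLq(s, arrivalRate, serviceRate)
  waitingTime <- 60 * lq / arrivalRate               # minutes (Little's law)
  numNotJoin  <- lq * probNotJoin * arrivalRate
  numLeaving  <- waitingTime * probLeaving * arrivalRate
  s * costServer +
    (numNotJoin + numLeaving) * customerRevenue * customerProfit
}

servers     <- 4:12
costs       <- sapply(servers, totalCostFor)
bestServers <- servers[which.min(costs)]

print(data.frame(servers, cost = round(costs, 2)))
cat("Cheapest reservation:", bestServers, "servers\n")
```

The Erlang B recursion is used instead of evaluating factorials directly because it stays numerically stable for larger server counts.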

 

As a follow-up to my previous post about basic computer engineering laws, here is a recent chart depicting Koomey’s Law, which teaches us that, for a fixed amount of computation, the battery capacity needed falls by half every 1.6 years or, the other way round, that the energy efficiency of computers doubles roughly every 18 months: a real breath of fresh air for the current trend to conserve energy in computing systems.
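
To get a feel for what that halving rate implies, here is a small worked example. The symbols are introduced here for illustration only: E(t) is the energy needed for a fixed computation after t years, E_0 is its value today, and T is the 1.6-year halving period quoted above.

```latex
E(t) = E_0 \cdot 2^{-t/T}, \qquad T = 1.6 \text{ years}

\frac{E(10)}{E_0} = 2^{-10/1.6} = 2^{-6.25} \approx \frac{1}{76}
```

In other words, if the trend held for a decade, the same computation would need roughly 76 times less energy.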

 