Revenue Prediction by Monte Carlo and Triangular Distribution

Revenue prediction and forecasts are typically done to be too right. They create a false sense of precision that ultimately falls short and does a disservice to managers who need accurate forecasts for planning. Hey, even meteorologists equipped with voluminous historical data and supercomputers cannot get it right all the time!

Why Monte Carlo Simulation?

Monte Carlo simulations can build valid probability models that give us much more accurate view into the future. There could be thousands of potential variables involved in monthly sales, and we cannot manage them all. MC simulation can help emulate real world combinations of few variables. Remember that MC simulation is not a magic bullet to create forecasts. For products that have seasonality and promotions, I typically use ARIMA, Stepwise Regression and Time Series techniques.

Revenue Prediction by Monte Carlo and  Triangular Distribution

This simple example is driven by stochastic inputs. The stochastic input includes the units sold and sales price for a product without history. Since these values are non-deterministic, the user is asked to guess the best, worst, and most likely (mode) cases. A triangular random variable traverses the values that may occur between the best case and worst case values. For this revenue prediction application, we want to generate a random number from the triangular distribution for a given probability of this input. Hence, the triangular distribution is modeled with the following parameters:

a (worst case value)

b (best case value)

m (most likely value) or the mode; (a m b)

Both a and b serve as the boundary parameters (a < b)

Probability Density Function PDF can be defined as,

traingle-distribution-monte-carloThe inverse triangular distribution function is defined as,

traingle-inverse-distribution-monte-carlo

We basically use this PDF to create values for the stochastic input such as units sold during the simulation runs. In every single run, these values are updated and recorded on a simulation calculation.

The spreadsheet in this post uses the above standard inverse cumulative method for generating a random number. Note that the Excel Rnd function is a pseudo random number generator and is not robust. You can use alternative uniform random number generator from tools such as Matlab or R.

The application is simple to use. Although few features require manual setup. You can begin by keying in best, worst and most likely (mode) of units sold and sale prices in columns C5 through E6. Enter total trials or simulation runs in column C10. I’ve simulated the revenue model using 5000 runs.

traingle-inverse-input-parameters

 

Next, start the simulation by clicking on the button. Higher number of runs makes your data set more realistic but it comes at the expense of computing speed and processor limitations. Cells B13 through E5012 are output of the simulation. The TraingleDist() takes as input the best case, worst case, and most likely values from the above input and outputs a random number that is found from this triangular distribution for a given probability.

traingle-dist-code-snippetThe results are summary of basic stats from the simulation runs – Mean, Min, Max, and Sample Std. Deviation. They are in Cells K4 through K7.

traingle-inverse-resultsFew Manual things to do:

So far, the VBA did the simulation from triangular distribution automatically. The histogram part requires few simple manual steps. If you plan on running the simulation for 5000 trials, you can ignore this step. There are 16 bins that represent the revenue data intervals. For number of trials other than 5000, simply change the following formula in cell K13.

 =COUNTIFS($E$13:$E$5012,”>=”&J13,$E$13:$E$5012,”<“&J14)

Say you ran the simulation for 2000 runs, change the formula to

=COUNTIFS($E$13:$E$2012,”>=”&J13,$E$13:$E$2012,”<“&J14)

and drag it through K28.

 

traingle-dist-histogram

 

The histogram plot is driven by the frequency distribution table above.

 

traingle-inverse-graphAs you can notice that the results cluster around the mean of 24K. 5000 simulations may be unrealistic or not enough to get a good sample set. The limitations are mainly due to memory, and processor speed. Careful planning must be take into account for the long tail events. Questions to be addressed are – how to deal with just 16K revenue or how can a business support unexpected growth of 36K in revenue?

Download Revenue Prediction by Monte Carlo and Triangular Distribution spreadsheet

#excel-vba, #monte-carlo-simulation, #revenue-prediction

Optimize Warehouse Location in Excel

There are many factors to consider when building a new warehouse facility. Making the right decisions with regards to both warehousing and distribution methods could make all the difference for your firm. This can range from storage requirements, the level of automation in material handling, and workforce availability.  The most important among all is the Physical Location.

Physical Location

When zeroing on which warehouse to use, choosing the one with the best physical location is important. The first question to ask yourself is, which region are you looking to serve? Ensuring your product is stored in a region near your customers is important for prompt deliveries. This also factors into considering cost. Calculating landed transportation costs to facility from manufacturing, and expected transportation costs from facility to end customer, will help decide where you can afford to keep your product.

The location’s proximity to carrier facilities should also be taken into consideration. Look for an all-encompassing solution that offers both warehousing and transportation to get the most value out of your investment, or ensure your storage facility is as close to your carrier as possible.

The general objective of transport optimization is minimizing transport cost subjected to the customer service policy. Normally, the transport solution affects inventory carrying and warehousing cost considerably.

Consider a simple scenario – A company stores all of its products in a single warehouse. All of its customers are located in the eastern sea board of the US. The management wishes to determine the best location for their warehouse in order to minimize total transportation costs. The number of shipments made to each of its customers for a period is given. The main objective is to determine the warehouse location based on its latitude and longitude coordinates.

Optimize Warehouse Location in Excel

The input for this problem is the location of each address identified by its lat-long. The user is asked to fill in the shipping addresses of the customer and the number of shipments in a given period, and the spreadsheet works tirelessly behind the scenes to determine the optimal lat-long coordinates of the warehouse to minimize transportation costs. Once you have filled out the shipping point addresses, simply click on “Geocode” command button. Columns C, D, and E are computed using Google Geo web service.

warehouse-transport-optim-data-latlongThe distance formula is based on spherical law of cosines and bears more accuracy than the straightforward orthogonal distance. R is 69 and is based on earth’s curvature and is used in calculating latitude and longitude distances for cities.

d = acos(sin θ1 * sin θ2 + cos θ1 * cos θ2 * cos Δλ ) * R

Geo-coding and distance computation are done behind the scenes by a free web service from Google. A Caveat – Google is nice enough to geo-code for free but they won’t put up with thousands of requests or attempts per hour from the same IP. After about 10 seconds or so, they will stop sending coordinates for a brief period. You may occasionally get “Not Found” message. The solution is to wait for few seconds and try again. The underlying code has an intentional delay of 2 seconds for every address that needs to be geo-coded. If you persistently get the “Not Found” message check to make sure that your address is as complete as it can be. Adding things like the province and the country can help Google narrow its search and provide a better result.

Setting up Solver

The decision variables are the latitude and longitude values of the warehouse location in columns H3 and I3 respectively.

Latitude and Longitude coordinates are the only constraint to this problem and they must be between 0 and 120 degrees. We only need to add the constraint that they be less than or equal to 120 and they must be non-negative.

warehouse-transport-optim-solver-setupThe objective function to minimize the total distance between the warehouse and the shipment addresses is set to cell J12. It is the sum of the array product between the column of distances and the column of shipments made to each shipping addresses from the input table. The column range for shipments is “WhouseShipments” and distance as “WHouseDistance”. Thus, the formula for the objective function becomes:

=SUMPRODUCT(WhouseShipments,WHouseDistance)

The decision variables, cells H3 and I3 are to be determined by the solver. Cells H3 and I3 must be non-negative. Choose “Evolutionary” as the solving method.

Now click on Options to view the Evolutionary Solver options. Check Assume Non-Negative and set the population size and mutation rate. Initial population size in this problem is 100 and mutation rate as 25%. warehouse-transport-optim-populationreportwarehouse-transport-optim-constraints-varOnce the solver finishes its operation, you can view the number of iterations of the Genetic Algorithm in the Population and Results report. The outcome of the solution is shown below in the Population report.

warehouse-transport-optim-solver-answerBased on those 24 customer shipping points in this example, the solver by evolutionary algorithm has found the coordinates of (41.74, 76.25) as optimal.

Download Optimize Warehouse Location Spreadsheet

#excel-google-geo-xml, #linear-programming, #lp-simplex, #transport-optimization

A Simple Adword Spend Optimization

How much should my AdWords budget be?

This is a very common and an important question for advertisers. It is also one that doesn’t have a straightforward answer. If AdWords is profitable for you, then by all means your budget should be as HIGH as possible. If AdWords is not profitable for you, then your budget should be ZERO.

For most advertisers, the real answer lies somewhere in between. Some products and services can be profitable for you to advertise, while others will not be. Some keywords, geographic locations, mobile vs traditional platforms, and other targeting options will drive Adwords profitability.

If you expect AdWords to make profit for you from day one, you are setting yourself up for unpleasant surprises. There is often an initial investment required until you can ascertain what is working and what isn’t.

Generating optimal number of qualified leads within Adword Budget Constraints

An advertiser wants to allocate a pay-per-click marketing  budget among five internet marketing vehicles – Google Display, Google Adwords, Facebook, LinkedIn and Twitter. Every single one of the pay-per-click vehicles has a different pricing and effectiveness per click. The objective is to achieve the highest number of qualified leads by dividing the advertising budget optimally among these 5 pay-per-click vehicles.  The advertiser is constrained by budget limits and each pay-per-click vehicle has an advertising spending limit as well.

A Simple Adword Budget Optimization

An analyst has come up with the following details about the cost and effectiveness of each of the 5 pay-per-click advertising vehicles for a consumer electronics manufacturer:

Average
Cost Per Click
Expected
No. of
Clicks required
To Get
1 Inquiry
Expected
No. of
Inquiries required
To Get
1 Qualified
Lead
Google Display $0.88 7 12
Google Adwords $0.79 9 10
Facebook $0.81 5 11
 LinkedIn $0.94 8 21
 Twitter $0.90 7 10

“One size does not fit all” – The above inquiry and lead conversion figures are from  hardware equipment/electronics sector and are pretty meaningless for a single manufacturer. Way too many variables come into picture to have a “universal lead conversion rate”. The notable ones are:

  • Landing Page Layout
  • Traffic Source
  • Definition of Conversion (different for B2B/B2C, non-profit, educational or full blown E-Commerce sites)
  • Industry
  • Price of Product
  • Ease of Checkout

They have just been used in this example to simplify the optimization problem so as the lead conversion numbers can be indexed to something realistic. Refer to MarketingSherpa’s 2012 PPC Edition chart of their Search Marketing Benchmark Reports and the conversion rates by industry.

Again, the purpose of this problem is to achieve the highest number of qualified leads by allocating the ad budget optimally among these 5 pay-per-click vehicles.

Solution Steps:

Step 1 – Setup the objective

The main task is to maximize the total number of qualified leads. Number of qualified leads depend on Ad Spend$ (column C), Average Cost per Click (Column D), Expected Clicks Required to get 1 inquiry (Column F), Expected number of Active Inquiries (Column G), Expected Inquiries required to get 1 Qualified Lead (Column H).

adword-optim-formulation-varsStep 2 – Identify the decision Variables

The key to find out the decision variables is to determine the amounts to spend on each pay-per-click vehicle to maximize the number of leads. Columns C3 through C7 serve as the decision variables or the Ad Spend $. The initial amount was set based on the constraints in the next step. You need to ensure that the sum of all Ad Spend (columns C3 through C7) must add up to $50K which is our allowed budget. The solver will optimize the spend allocation based on the objective and constraints.

Step 3 – List all constraints

As seen from below screen grab from excel, the problem is bounded by:

  • Total allowed budget of $50K. Total Ad spend cannot exceed this amount
  • Display Network has an upper limit of $12K
  • Ad Word Budget has an upper limit of $15K
  • Facebook Budget has an upper limit of $6K
  • LinkedIn Budget has an upper limit of $12K
  • Twitter Budget has an upper limit of $12K
  • All decision variables or Ad Spend are integer
  • Unconstrained variables are all non-negative

Integer constraints ensure that Ad Spend are kept in whole dollar amounts.

adword-optim-formulation-constraintsStep 4 – Setup Excel Solver

The objective cell is S9 which is maximization of “total expected number of qualified leads”. Since the objective function is linear first order, Simplex LP is the chosen solving method.

Note: You must have Excel Solver installed before you do this exercise. In Excel 2010, Solver can be found under Data menu. If you are unsure about it, the installation steps can be found here.

adword-optim-solver-setupStep 5 – Interpret the Answer Report

Solver has finished its job and has optimized Ad Spend allocation among 5 pay-per-click vehicles (Column M3 through M7). The amounts are highlighted in yellow.

adword-optim-resultsAs you can see, LinkedIn constraint did not hit the target limit and $7000 slack is available. After running this optimization model, the objective function value (expected number of qualified leads) improved from original 495 to 528, which remained our main pursuit.

adword-optim-results1Download Ad Spend Optimization Spreadsheet

#linear-programming, #lp-simplex, #marketing-budget-optimization

Adword Spend Optimization using goal programming

Adword-Budget-Spend

The difficulty of figuring out an optimal solution of a multiple objective engineering or a business problem lies in the competing conflicts that may exist between the optimal solutions for the different objectives. The best solution for a given objective might be undesirable for another.

multi-objective-pareto

 

 

 

 

 

 

Both functions 1 and 2 are maximized at two different point of feasible region. A feasible solution is considered efficient or Pareto optimal if there are no other feasible solution with all the objective functions.

Adword Spend Optimization using goal programming

A common inbound marketing budget requirement is to generate maximum number of qualified leads as cheaply as possible. Excel Solver and Matlab can be used as effectively to allocate a marketing budget among various internet marketing vehicles. A Marketing Research Analyst collected the following Ad-word conversion rate across different social media and search engine vehicles for a client in financial industry.

Internet Mktg. Vehicle Min. Lead Conversion Rate Max. Lead Conversion Rate Expected Lead Conversion Rate
Google Search 3.50% 8.00% 7.32%
Google Display 1.65% 5.82% 4.97%
Bing Search 2.10% 6.32% 5.25%
Bing Display 1.52% 6.95% 5.21%
Yahoo Search 1.89% 5.82% 4.97%
Yahoo Display 1.45% 5.92% 5.10%
Facebook 4.50% 8.50% 6.25%
LinkedIn 3.89% 7.20% 6.30%
Twitter 3.23% 4.80% 4.15%
    The client has stipulated these conditions,
  • All $1M is to be invested
  • At most $600,000 of ad-word spend can be in Search and Display Network
  • At least $650,000 of the total ad spend should have the maximum potential of qualified lead conversion rate to be 4% or greater

The client requirement stipulates  to set up a goal based programming where the minimum lead conversion rate is assumed to be to twice as important as the expected lead conversion rate and three times as important as Ad-words on Display network. Particularly, the following goals are set on each objective:

1. The minimum overall lead conversion rate must be at least 3.5%

2. The expected lead conversion rate must be at least 5.5%

3. At most, $200,000 can be allocated in Display networks (Bing, Google, and Yahoo)

4. At minimum, $250,000 must be allocated towards Social Media networks (Facebook, LinkedIn, and Twitter)

Mathematical Background:

For a set of objective functions, we need to repose multi-objective problem as a single objective, fp(x) by placing limits on remaining objectives, and keep hard constraints.

multi-func1

 

 

 

subject to

gj(x) ≤ 0   (inequality constraints)

hk(x) = 0  (Hessian Matrix)

xiL ≤ xi ≤ xiU

The above problem can be formulated as

minimize fp(x)

subject to fl(x) ≤ εl      l = 1,2,…,nobj (l ≠ p)

gj(x) ≤ 0

hk(x) = 0

xiL ≤ xi ≤ xi

Problem Formulation

In this goal attainment problem, the following variables need to be defined for under and overachieving goals.

U1 – Under achievement of goal 1 which is the minimum lead conversion rate

O1 – Over achievement of goal 1

U2 – Under achievement of goal 2 which is expected lead conversion rate

O2 – Over achievement of goal 2

U3 – Under achievement of goal 3

O3 – The planned budget $200,000 allocated towards Display Network advertising should not be exceeded. O3 needs to be minimized.

U4 – At least $250,00 must be allocated towards Social Media advertising. Under achieving U4 is undesirable and thus needs to be minimized

O4 – Over achievement of goal 4

The objective function must be constructed in a way that the sum of undesirable deviations from the predetermined goals is minimized.

Goal 1 – is to have 3.5% minimum lead conversion over the number of clicks. Underachievement of the goal of 3.5% is not desirable. Hence, the variable to minimize is therefore U1.

Goal 2 – is to have 5.5% expected lead conversion over the number of clicks. Under achievement of 5.5% goal is undesirable. U2 therefore needs to be minimized.

Goal 3 – Budget allocation towards online display network should be restricted within $200,000. Over achievement of this goal is undesirable. This variable has to be minimized.

The objective function f(x), can be written as

f(x) = 3U1 + 1.5U2 + 1O3

Since, minimum lead conversion rate is assumed to be to twice as important as the expected lead conversion rate and three times as important as Ad-words on Display network.

Decision Variables – Let GS represent the Google Search Network, GD represent Google display network, similar abbreviations for other search engines, FB represents FaceBook, TW represents the Twitter, and LI for LinkedIn.

There are few soft and hard constraints in this problem. Soft constraints give more scope for relaxing the restrictions depending on the preference of the constraint being satisfied in the objective.

SEM-Pareto-Solver

SEM-Problem-Setup-ParetoSolver results interpretation:

1. Goal 1 is over achieved since O1 = 40.  The client wanted to limit the minimum overall lead conversion to $35,000. Anyone would love higher returns than expected as long as other hard constraints have been satisfied !

2. Goal 2 is also over achieved since the slack variable O2 = $9,700. This actually a good news since the expected lead conversion rate exceeds the target $55,000 (5%). The actual conversion is $55,000 + $9,700 = $64,700 or 6.47% !

3. Goal 3 is fully achieved as both U3 and O3 are zero.

4. Goal 4 is over achieved.

The client’s ad budget allocation should be as:

$400,00 towards Google Search Ad-words

$0 for Google Display Ads

$0 for Bing Search Ad-words

$200,000 towards Bing Display Ads

$0 for both Yahoo Search and Display

$400,000 towards Facebook

$0 for both Twitter and LinkedIn

to achieve the ad-spend allocation optimally.

Note that the goal setting approach to multi-objective optimization can bring inefficient solutions. Although, they can satisfy all the goals, but cannot be considered as optimal.

Download Adword Spend Optimization Spreadsheet

#efficient-frontier, #linear-goal-programming, #lp-simplex, #multi-objective-optimization, #pareto-optimization

Estimate System Reliability From Monte Carlo Simulation

Your material handling system serves as oxygen of your business, so you want to do everything in your disposal to minimize downtime. The best way to increase system uptime is with a good preventative maintenance plan. As an engineer, you may have come across maintenance/performing repairs on Material Handling Systems such as Autonomous Vehicles, Industrial Robots, Conveyor Belts, or Machine Cells. Additionally, you probably have randomly generated demand, created inter-arrival rate of shipments, or simulated processing time in different machines etc. In this simulation study, I am going to estimate reliability of a material handling module of a warehouse over a period of time relevant to the system. The Weibull distribution is most commonly employed to determine reliability function.

Estimate System Reliability From Monte Carlo Simulation

A continuous distribution to plot a model fit from the failure rate of the system is created from a Monte Carlo simulation of 7000 trials. The idea behind Monte Carlo simulation is the generation of random events in a computer model, and this generation is repeated many times and counted for occurrence of a specific condition. The Monte Carlo simulation allows us to consider various aspects of system characteristics which cannot be easily captured by analytical methods such as K-out-of-N systems, redundancies, fault trees, repair and maintenance for components.

Furthermore, the reliability analysis from the Weibull distribution provides the information needed for identifying failures, troubleshooting, scheduling periodic maintenance and inspections. The restrictive modeling assumptions that had to be introduced to fit the models to available solutions, is thus avoided.

A series-parallel system of machines is considered here, each working in parallel and connected serially. To determine failure or Mean Time To Failure (MTTF) to be precise, consider each machine type to have k out of n system. Given that there are n machines in a machine type, the entire system fails when k of those n machines fail at any time. The simulation demonstrated here has 2 Warehouse Management Servers (k1), 3 Autonomous Guided Vehicles (k2), and 2 Robotic Palletizers.

 

Reliability Block Diagram

reliability-whouse-block

WMS  – 2 Warehouse Management Servers in parallel

AGV – 3 Autonomous Guided Vehicles in parallel

RP – 2 Robotic Palletizers in parallel

MC – Machining Cell

As you can see in the block diagram, all individual have some sort of redundancy. The failure rate of the AGV depends on the failure of WMS components, similar dependency applies to two Palletizers and Machine Cell.

The failure rate of the above components was formulated as:

weibull-failure-rate

  • manufacturer provided reliability data. As evident, the failure rate almost doubles every 180 days from WMS.

The failure test runs for each day for 600 days. Each sub-system (components) should operate until it fails. Once it fails, it remains in failed status. The time that each component fails is recorded in the spreadsheet.

The cut sets are as follows:

  • {WMS1, WMS2}
  • {AGV1, AGV2, AGV3}
  • {RP1, RP2}
  • {MC}

Monte Carlo simulation of the system was conducted using Excel and the number of failures during 7000 trials was 990. The mean time to failure or MTTF of the system was 563 days. Look at the chart below, the probability of failure of the system converged to 14.15% after 7000 trials and it can be reasonably assumed that it will remain around 14.15% over a long period of time. The probability of failure is 0.1415 and was accurate within 3% after 1600 trials.

weibull-failure-graph1

 Download the Reliability Simulation Spreadsheet here

#monte-carlo-simulation, #system-reliability, #weibull-distribution