by Salil Mehta, Statistical Ideas
Probability simulations help show distributions for both parametric and non-parametric analysis. We will show this below, starting with a multi-dimensional random walk. Then we will apply this technique and integrate Bayesian likelihood analysis, connected to a spherical map.
So first with the random walk, we follow a binomial process. This applies independently to the horizontal axis, as well as the vertical axis. With each trial, the random walk proceeds ½ a step along each axis. In our case below, each trial results, with equal probability, in ½ steps in one of these four directions: (A) right and up, (B) left and up, (C) right and down, or (D) left and down. For more on random walks, one can enter this into the search bar of the blog, and read this note for a special geometric case of it.
Each simulation tracks the result of 1000 trials, which is large enough for the normal approximation used in this analysis to hold well. We also show below the results of 1000 of these 1000-trial simulations. We note that the underlying parametric model (one that can be generated from a probability model) is a binomial, where n equals 1000 trials, each with a “success” probability p equaling 50%. We can think of “success” here as a movement in one direction along the axis, and “failure” as a movement in the opposite direction on the same axis.
The variance of a binomial distribution is n*p*(1-p). Or in this case, 1000*(50%)^2 = 250. So the standard deviation (σ), per direction, would therefore be √250 ≈ 16. One can get a sense of this by examining the chart above, again.
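As a rough check on the σ of about 16, here is a minimal sketch of the simulation described above, using only the Python standard library. The function names, the seed, and the trick of counting random bits to draw a fair binomial are our own illustrative choices, not necessarily how the original chart was produced.

```python
import random
import statistics

def walk_endpoint(n_trials, rng):
    """Final position on one axis after n_trials moves of +1/2 or -1/2."""
    # Each random bit is one fair trial, so the count of 1-bits
    # is a Binomial(n_trials, 0.5) number of "successes".
    successes = bin(rng.getrandbits(n_trials)).count("1")
    failures = n_trials - successes
    return 0.5 * successes - 0.5 * failures

def simulate(n_walks=1000, n_trials=1000, seed=0):
    """Endpoints (x, y) of n_walks independent two-dimensional walks."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    return [(walk_endpoint(n_trials, rng), walk_endpoint(n_trials, rng))
            for _ in range(n_walks)]

endpoints = simulate()
xs = [x for x, _ in endpoints]
ys = [y for _, y in endpoints]

# Binomial standard deviation: sqrt(n * p * (1 - p))
theoretical_sd = (1000 * 0.5 * 0.5) ** 0.5

print("theoretical sigma:", round(theoretical_sd, 2))
print("empirical sigma, horizontal:", round(statistics.pstdev(xs), 2))
print("empirical sigma, vertical:", round(statistics.pstdev(ys), 2))
```

The empirical values should land close to the theoretical one, though they will wobble a little from seed to seed.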
A two-dimensional confidence interval (CI) could be applied as a square range to fit the bivariate normal approximation in both the horizontal and the vertical dimensions (also reinforced by the fact that the results on both axes must be independent of one another). Doing this for 1 σ on both axes would carve out roughly the middle 47% of the illustrated sample (or 68.3% per axis, squared).
And for 1.5 and 2 σ, we get 75% and 91%, respectively. These again are the squares of the middle sections carved out by 1.5 and 2 σ of a normal distribution (or 87%^2 and 95%^2, respectively).
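The square-region percentages can be reproduced with a short calculation. This is a minimal sketch under the normal approximation, with helper names of our own choosing: the per-axis coverage comes from the error function, and independence of the two axes lets us square it.

```python
import math

def central_coverage(k):
    """P(|Z| <= k) for a standard normal Z, i.e. the middle section at k sigma."""
    return math.erf(k / math.sqrt(2))

def square_coverage(k):
    """Coverage of a square of half-width k sigma on two independent axes."""
    return central_coverage(k) ** 2

for k in (1, 1.5, 2):
    print(f"{k} sigma: per axis {central_coverage(k):.1%}, "
          f"square {square_coverage(k):.1%}")
```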
Mathematically, a circular CI could approximate this square CI as we increase the number of σ. This is because the probability weights associated with the corners of the square region become relatively insignificant, as the square region casts a larger shadow over the mound-shaped, 2-dimensional distribution. Here is a neat trick below that we’ve developed to quickly gauge the accuracy of the circular CI, through 2 σ. We may write more on the mathematical theory behind this accuracy gauge in a future note.
•••••••••••••••••••••••••••••••••••••••••••••••••••••••••
accuracy = (confidence range)^[1/(3 - number of σ)]
Which for our example here comes to:
68%^(1/2) = 82% accuracy at 1 standard deviation
87%^(1/1.5) = 91% accuracy at 1.5 standard deviations
95%^(1/1) = 95% accuracy at 2 standard deviations
•••••••••••••••••••••••••••••••••••••••••••••••••••••••••
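For readers who want to try other values, the gauge above can be written as a small function. This is only a sketch of the rule of thumb as stated; the function name and the input check are our own additions, and the gauge is offered here only through 2 σ.

```python
def circular_ci_gauge(per_axis_coverage, n_sigma):
    """Rule-of-thumb accuracy: coverage ** (1 / (3 - n_sigma))."""
    # The exponent blows up at 3 sigma, and the note only proposes
    # the gauge through 2 sigma, so we reject anything outside (0, 3).
    if not 0 < n_sigma < 3:
        raise ValueError("n_sigma must be strictly between 0 and 3")
    return per_axis_coverage ** (1 / (3 - n_sigma))

# Per-axis coverages are the rounded figures used in the text.
for coverage, n_sigma in ((0.68, 1), (0.87, 1.5), (0.95, 2)):
    gauge = circular_ci_gauge(coverage, n_sigma)
    print(f"{n_sigma} sigma: {gauge:.0%}")
```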
Now let’s continue this random walk approach, but instead look at a non-parametric model (one that cannot be generated from a probability model). We apply this to Malaysia Airlines Flight 370, which has been reported missing for more than two weeks. In the media we often see just a very broad circular region in the southern section (also less reliable) of a two-dimensional Mercator map. The media, and perhaps national strategists in the region, scour the potential area that the plane could have reached on six hours of fuel, flying radially over the globe, without any polar complications.
However, isn’t this a fairly weak prior as a starting point for discussion? What are the methods to mathematically narrow the probability region, and provide Bayesian weights for where the plane is likely located? We know some basic discrete information that must serve to narrow the geography and the discussion, based on mathematical likelihood.
For example, the plane is less likely to be found at the very location where it was last known, per civilian radar contact prior to the Inmarsat satellite data, to have made a sharp turn just south of Vietnam. This area was already searched more thoroughly by international search teams. Also, not every direction is equally likely. The aircraft is less likely to have gone six hours undetected (by humans or local radar) over a highly-populated India, where by that time it would have been about sunrise. Even the meteorite (less than 5% of the plane’s size), which 13 months ago hit Earth in the lightly-populated Chelyabinsk (Russia), was witnessed. And despite traveling about a hundred times faster than flight MH370, the meteorite was even recorded on phone videos. Other clues must be considered, as they lend general Bayesian smoothing data to a probabilist. Of course the plane did not land near its Beijing (China) destination, so that too should carry less likelihood in the broadly eclipsing “circular range”. And a number of factors could have caused the plane to impact Earth earlier than six additional hours of flight (one can explore an orbital simulation template here).
We therefore have a non-parametric model, to which we applied a sophisticated enhancement to show, in proportional depth, which areas of the circular range are more or less likely. See this in the illustration below, where there is a high (though not full) likelihood over the Indian Ocean versus the other areas shown. We also know from the above error estimation that this basic probability framework, which simply updates our knowledge from the facts of the plane and other circumstances, has an accuracy in the range of 90% in providing the basic directional biases (the 10% inaccuracy is heavily concentrated only on the edges).
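To make the idea of Bayesian weights concrete, here is a toy sketch of a discrete update over a few coarse regions. Everything numeric in it is hypothetical: the four region labels, the flat prior, and the likelihood values are placeholders we made up to show the mechanics, and they are not the weights behind the illustration.

```python
def bayes_update(prior, likelihood):
    """Posterior over regions: prior times likelihood, renormalized to sum to 1."""
    unnormalized = {region: prior[region] * likelihood[region] for region in prior}
    total = sum(unnormalized.values())
    return {region: weight / total for region, weight in unnormalized.items()}

# Hypothetical coarse regions inside the circular range, with a flat prior.
regions = ["near last radar contact", "over India", "near Beijing", "Indian Ocean"]
prior = {region: 1 / len(regions) for region in regions}

# Hypothetical likelihoods P(clue | plane is in region); illustrative values only.
searched_without_a_find = {
    "near last radar contact": 0.2,  # already searched more thoroughly
    "over India": 1.0,
    "near Beijing": 1.0,
    "Indian Ocean": 1.0,
}
no_sighting_reported = {
    "near last radar contact": 0.5,
    "over India": 0.1,    # populated, and near sunrise
    "near Beijing": 0.1,  # the destination, where an arrival would be noticed
    "Indian Ocean": 0.9,
}

# Apply the clues one at a time; each posterior becomes the next prior.
posterior = prior
for clue in (searched_without_a_find, no_sighting_reported):
    posterior = bayes_update(posterior, clue)

for region, weight in sorted(posterior.items(), key=lambda item: -item[1]):
    print(f"{region}: {weight:.1%}")
```

Changing the placeholder likelihoods changes the posterior, which is the point: each clue reshapes the flat circular range into something uneven.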
Now this fractal estimation differs from the ethnic population target of the eastern Damascus attacks, which follows a support vector machine analysis. Here in this note we start an iterative process, using Bayesian models. This is very similar to the techniques that efficiently narrowed the search for John F. Kennedy Jr.’s aircraft, which crashed in 1999 over the Atlantic Ocean, near Martha’s Vineyard. In neither case do advanced statistical techniques need to understand the rationale for the aviation mishap. They simply better educate others on what the region housing the missing plane looks like, and hopefully all of those who were on board are found soon.