The German Tank Problem

Introduction

The German tank problem originates from the Second World War, when it was crucial for the Allies to obtain information on the actual number of tanks the German army had. A more detailed historical background is given below.

The basic idea is the following: suppose that an unknown number $N$ of items have been produced, each given a serial number from $1$ to $N$. A random sample of $k$ serial numbers is observed, and the task is to estimate $N$ from it. The goal of this project is to see how well various approaches work.
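To experiment with this setup, it helps to be able to generate observed serial numbers. The following minimal Python sketch draws $k$ serial numbers from $1, \ldots, N$; the function name `draw_serials` and its `replace` flag are hypothetical illustrations, not part of the original text. With `replace=False` each tank is observed at most once, while `replace=True` produces $k$ independent draws, a distinction that matters for the third exercise below.

```python
import random
from typing import List, Optional


def draw_serials(N: int, k: int, replace: bool = False,
                 seed: Optional[int] = None) -> List[int]:
    """Return k observed serial numbers out of 1..N, sorted.

    replace=False: k distinct serials (each tank observed at most once).
    replace=True:  k independent uniform draws, so repeats are possible.
    """
    if N < 1 or k < 1 or (not replace and k > N):
        raise ValueError("need N >= 1, k >= 1, and k <= N when replace=False")
    rng = random.Random(seed)          # seeded generator for reproducibility
    population = range(1, N + 1)       # serial numbers 1, 2, ..., N
    if replace:
        draws = rng.choices(population, k=k)  # i.i.d. discrete uniform draws
    else:
        draws = rng.sample(population, k)     # distinct draws, no replacement
    return sorted(draws)


if __name__ == "__main__":
    # Hypothetical setting: N = 300 tanks exist, 5 serial numbers are observed.
    print(draw_serials(300, 5, seed=1))
```

For example, `draw_serials(300, 5, seed=1)` returns five distinct, sorted serial numbers from a hypothetical fleet of 300 tanks.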

Discrete Uniform Distribution

Let $X$ be a random variable that takes each of the values $1, 2, \ldots, N$ with equal probability $1/N$. We say that $X$ follows a discrete uniform distribution. It is easy to show that $$\text{E}[X]=\frac{N+1}{2}$$ and $$\text{Var}[X]=\frac{N^2-1}{12}.$$ In the context of the German Tank Problem, we suppose that each tank available for inspection is chosen at random, so that its serial number follows this discrete uniform distribution.
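The two formulas can be checked exactly for particular values of $N$. This short Python sketch (the helper `uniform_moments` is a hypothetical name) computes the mean and variance directly from their definitions using rational arithmetic, so the comparison with $(N+1)/2$ and $(N^2-1)/12$ involves no rounding.

```python
from fractions import Fraction
from typing import Tuple


def uniform_moments(N: int) -> Tuple[Fraction, Fraction]:
    """Exact E[X] and Var[X] for X uniform on {1, ..., N}."""
    values = range(1, N + 1)
    p = Fraction(1, N)                                 # each value has probability 1/N
    mean = sum(p * x for x in values)                  # E[X] = sum of x * P(X = x)
    var = sum(p * (x - mean) ** 2 for x in values)     # E[(X - E[X])^2]
    return mean, var


# Compare with the closed forms for a few values of N.
for N in (1, 2, 10, 276):
    mean, var = uniform_moments(N)
    assert mean == Fraction(N + 1, 2)
    assert var == Fraction(N * N - 1, 12)
```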

Under this assumption, try answering the following questions.

Some Exercises

  1. Let $X_1,\ldots,X_k$ be a random sample of size $k$ from $X$, whose parameter $N>k$ is unknown. Find the maximum-likelihood estimator for $N$. Does this estimator seem useful? Explain!
  2. Let $X_1,\ldots,X_k$ be a random sample of size $k$ from $X$, whose parameter $N>k$ is unknown. Find the method-of-moments estimator for $N$. Give an example showing that this estimator can give a nonsensical result, namely an estimate $\widehat{N}$ that is smaller than the largest observed value.
  3. Strictly speaking, is it appropriate to model this problem as a random sample, i.e., a collection of $k$ i.i.d. copies of $X$? Explain!
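Once you have derived the estimators asked for in the first two exercises, a simulation is a quick way to see how well they work. The sketch below is an assumed harness, not taken from the source: the hypothetical function `evaluate` repeatedly samples from $1, \ldots, N$, applies an estimator you pass in, and reports the average estimate, the bias, and the root-mean-square error. Its `replace` flag switches between i.i.d. draws (the random-sample model of the exercises) and distinct serial numbers, which is relevant to the third exercise.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# An estimator maps an observed sample of serial numbers to an estimate of N.
Estimator = Callable[[Sequence[int]], float]


def evaluate(estimator: Estimator, N: int, k: int, trials: int = 10_000,
             replace: bool = False, seed: int = 0) -> Tuple[float, float, float]:
    """Simulate `trials` samples of size k from 1..N and summarize an estimator.

    Returns (mean estimate, bias, root-mean-square error).
    replace=True draws i.i.d. serials; replace=False draws distinct serials.
    """
    if not replace and k > N:
        raise ValueError("cannot draw more distinct serials than N")
    rng = random.Random(seed)
    population = range(1, N + 1)
    estimates: List[float] = []
    for _ in range(trials):
        if replace:
            sample = rng.choices(population, k=k)   # i.i.d. discrete uniform
        else:
            sample = rng.sample(population, k)      # without replacement
        estimates.append(float(estimator(sample)))
    mean_est = statistics.fmean(estimates)
    rmse = statistics.fmean((e - N) ** 2 for e in estimates) ** 0.5
    return mean_est, mean_est - N, rmse


if __name__ == "__main__":
    # Placeholder input only: substitute the estimators from the exercises.
    def largest_observed(sample: Sequence[int]) -> float:
        return max(sample)

    print(evaluate(largest_observed, N=276, k=10, replace=True))
```

The largest observed serial number appears only as a placeholder; substitute your own estimators and compare, for instance, small and large values of $k$, or `replace=True` against `replace=False`.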

A better approach is described in the following article, which also gives more historical context:

G. Clark, A. Gonye and S. J. Miller, Lessons from the German Tank Problem, arXiv:2101.08162 [stat.OT] (January 2021)

The article may be downloaded at https://arxiv.org/abs/2101.08162. The corresponding author (S. J. Miller) also presents the paper's results in a video.

Some More Exercises

After reading the article and/or watching the video, try answering the following questions:

  1. In the discussion presented by Clark, Gonye and Miller, is the sample of serial numbers treated as a sequence of i.i.d. random variables? Explain!
  2. The Method of Moments is based on replacing the expectation $\text{E}[X]$ with the sample mean $\overline{X}$, an unbiased estimator of it. In what way is the approach taken by Clark, Gonye and Miller analogous?
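When the $k$ serial numbers are distinct (sampling without replacement), a standard result is that the largest observed serial number $M$ satisfies $\text{E}[M] = k(N+1)/(k+1)$, and the frequently quoted estimator $\widehat{N} = m(1 + 1/k) - 1$, with $m$ the observed maximum, is unbiased for $N$. Readers can compare this with the formula derived in the article. The Python sketch below checks both claims by brute-force enumeration of every $k$-subset for small $N$; the helper names are hypothetical, and exhaustive enumeration is only a verification device, not the article's method.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def expected_max(N: int, k: int) -> Fraction:
    """Exact E[M], where M is the largest of k distinct serials drawn from 1..N."""
    total = sum(max(s) for s in combinations(range(1, N + 1), k))
    return Fraction(total, comb(N, k))   # every k-subset is equally likely


def max_based_estimate(m: int, k: int) -> Fraction:
    """N-hat = m(1 + 1/k) - 1, where m is the largest observed serial number."""
    return m * (1 + Fraction(1, k)) - 1


# Brute-force check over all small cases.
for N in range(1, 13):
    for k in range(1, N + 1):
        # E[M] = k(N + 1)/(k + 1)
        assert expected_max(N, k) == Fraction(k * (N + 1), k + 1)
        # Averaging N-hat over every possible sample gives back N exactly.
        avg = sum(max_based_estimate(max(s), k)
                  for s in combinations(range(1, N + 1), k)) / comb(N, k)
        assert avg == N
```

For instance, with $k = 4$ observed tanks and largest serial number $m = 60$, the estimate is $60 \cdot \tfrac{5}{4} - 1 = 74$.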



Historical Background

The following text is excerpted from the Wikipedia article about the German Tank Problem:

During the course of the Second World War, the Western Allies made sustained efforts to determine the extent of German production and approached this in two major ways: conventional intelligence gathering and statistical estimation. In many cases, statistical analysis substantially improved on conventional intelligence. In some cases, conventional intelligence was used in conjunction with statistical methods, as was the case in estimation of Panther tank production just prior to D-Day.

The allied command structure had thought the Panzer V (Panther) tanks seen in Italy, with their high velocity, long-barreled 75 mm/L70 guns, were unusual heavy tanks and would only be seen in northern France in small numbers, much the same way as the Tiger I was seen in Tunisia. The US Army was confident that the Sherman tank would continue to perform well, as it had versus the Panzer III and Panzer IV tanks in North Africa and Sicily. Shortly before D-Day, rumors indicated that large numbers of Panzer V tanks were being used.

To determine whether this was true, the Allies attempted to estimate the number of tanks being produced. To do this, they used the serial numbers on captured or destroyed tanks. The principal numbers used were gearbox numbers, as these fell in two unbroken sequences. Chassis and engine numbers were also used, though their use was more complicated. Various other components were used to cross-check the analysis. Similar analyses were done on wheels, which were observed to be sequentially numbered (i.e., $1, 2, 3, \ldots, N$).

The analysis of tank wheels yielded an estimate for the number of wheel molds that were in use. A discussion with British road wheel makers then estimated the number of wheels that could be produced from this many molds, which yielded the number of tanks that were being produced each month. Analysis of wheels from two tanks (32 road wheels each, 64 road wheels total) yielded an estimate of 270 tanks produced in February 1944, substantially more than had previously been suspected.

German records after the war showed production for the month of February 1944 was 276. The statistical approach proved to be far more accurate than conventional intelligence methods, and the phrase "German tank problem" became accepted as a descriptor for this type of statistical analysis.

Estimating production was not the only use of this serial-number analysis. It was also used to understand German production more generally, including number of factories, relative importance of factories, length of supply chain (based on lag between production and use), changes in production, and use of resources such as rubber.


Wikipedia contributors, "German tank problem," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=German_tank_problem&oldid=1066366829 (accessed February 20, 2022).