FDIC Banking Review

Statistical Sampling as a Management Tool in Banking

by Charles D. Cowan*

The purpose of this article is to discuss potential uses of statistical sampling in a financial institution environment. Banks and other financial firms are faced with a number of managerial challenges where the use of sampling can provide information at a reasonable cost. Given today’s competitive environment and the move toward consolidation in the banking industry, it is imperative for financial managers to be able to value assets of target institutions quickly and efficiently.  Similarly, as customers are faced with an ever increasing array of financial-service providers, the quality of service provided to customers  becomes increasingly important for maintaining market share. Manufacturing firms, hotel  chains, and other businesses have long used sampling as a means to assure that customers are receiving quality goods and service.  Financial institutions similarly can use sampling techniques to monitor the quality of customer service and interactions.  The need for this clearly increases as back office processing operations and customer service hot lines become remote from the branch originating the business.   The Federal Deposit Insurance Corporation (FDIC) also performs functions that are similar, or identical, to those in banks and thrifts, including the management of loans acquired from failed banks and thrifts and the resultant need to estimate both the financial risk and the value of these loan portfolios.  This article describes the use of sampling in a financial setting and focuses, as an illustration, on some of the methods used by the FDIC to value assets in liquidation as part of its preparation of financial statements.  By using sampling, as opposed to valuing each asset, significant cost savings are achieved while ensuring the accuracy and quality of the results.

In fulfilling its mission as deposit insurer and receiver of failed banks, the FDIC has acquired hundreds of thousands of assets, including loans and real-estate properties, that it manages until disposition.  These range across the broad spectrum  of asset types and collateral.  Some of the loans will be held by the FDIC until they are paid off in full, some will be resolved through settlement negotiations with the borrower, some will be sold as individual assets, some will be sold as part of a bulk sale, and some will be written off.  Similar to the portfolios of banks, the FDIC portfolio is turning over continuously as new loans are acquired, old assets exit the population of assets, and some loans are converted to real and personal property.  In banking a similar process occurs.  New loans are underwritten with different amounts and types of collateral; loans are paid off, sold, written off, or converted to property owned by the bank through foreclosure.

In order for an institution’s management to assess its financial condition, not to mention that of a potential merger target, it needs to value the asset portfolio. The same is true for the FDIC. The FDIC is required annually to prepare financial statements for each of the insurance funds under its management, in accordance with Generally Accepted Accounting Principles (GAAP), and these statements are audited by the U.S. General Accounting Office. As part of the preparation of its financial statements, the FDIC must determine the value of its claim against the various receiverships for failed institutions by determining the value of the underlying assets of the receiverships. Moreover, intermediate valuations are prepared to assist in monitoring the financial position of the funds and in the liquidation planning process. The use of sampling allows for more frequent valuations at relatively low cost.

Given the existence of a large number of loans and other assets, and changing conditions in the economy and financial markets, there is no way to value simultaneously all of the loans held in a portfolio.  Nor would one want  to do so, given the expense and time constraints that surround the preparation of financial statements. Instead, a sample of assets is valued, and the sample is extrapolated to the full population of loans.  This process can be repeated multiple times during the year and can be used to track the ongoing value of the portfolio.  The same can be done in any bank, using the records and loan systems of the bank and a few formulas that are presented below.

Sampling has become a standard part of audit methodology and is fairly well known and accepted in that profession as a useful tool.  This article will, therefore, not attempt to introduce the reader to the rudiments of sampling, but rather will focus on the introduction of some basic assumptions to standard audit sampling methodology.  By introducing these assumptions one can decrease the required sample size and make the estimation process more efficient.  While the article focuses on the valuation of assets, the same methodologies and techniques can be used to develop a sample of customers to survey regarding their satisfaction with the services provided by the bank.

Basics — The Tools We Use

Suppose one has to value a single pool of homogeneous assets, such as performing commercial loans. The discussion that follows describes how to set a sample size for the valuation of the pool, and how to extrapolate an estimate from the sample to the full population of loans. This methodology can easily be extended to multiple asset types by treating each asset type as a stratum, making a separate estimate for each stratum, and adding across strata to derive an estimate for the full population.

There are two basic functions that can be used to estimate the market value of the pool.   We assume that for each asset there is a current book value that is known and easily retrievable.  We also assume that we are going to select a simple random sample without replacement.  The question then becomes:  How many assets need to be selected?  Alternatively,  how large a sample is needed to obtain a good estimate of market value for the entire loan portfolio?  The answer is driven by the underlying distribution of the values we are trying to estimate and the “estimator” we plan to use with the sample.  An estimator is the mathematical formula that summarizes the data and extrapolates the sample back to the population.  One estimator we could use is a direct sample projection, hereafter called the “direct” estimator.  Another is the “ratio” estimator that uses the additional information we can get from the book value of the loans.  The formulas for each are:

The direct estimate is simply the average market value of the assets in the sample, multiplied by the number of assets in the population.  If we value 100 assets in a sample taken from 1,000 assets in the population, we calculate the average market value for the 100 assets (a market value per asset) and multiply it times 1,000.  The ratio estimate works in a similar fashion, but weights the market values obtained from the sample by their book values, rather than by counting each with equal weight, which is what the direct estimate does.  The ratio estimate calculates the average recovery rate and multiplies this rate times the total dollar book value in the pool.  If the market value is well-correlated with the book value, for example, the larger the book value, the larger the market value, and they track well, then the ratio estimate will be more accurate than the direct estimate.  That is, the ratio estimate will, in general, be closer to the true population value than the direct estimate.  The regression estimate is similar to the ratio estimate.
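The two estimators described above can be sketched in a few lines of Python. This is only an illustration of the verbal description: the function names and the four-loan sample are hypothetical, and the ratio estimate is computed as total sampled market value over total sampled book value, which is the book-value-weighted recovery rate.

```python
from typing import Sequence


def direct_estimate(sample_market: Sequence[float], population_count: int) -> float:
    """Average sampled market value times the number of assets in the pool."""
    mean_market = sum(sample_market) / len(sample_market)
    return population_count * mean_market


def ratio_estimate(
    sample_market: Sequence[float],
    sample_book: Sequence[float],
    total_book: float,
) -> float:
    """Sample recovery rate (market / book) times the pool's total book value."""
    recovery_rate = sum(sample_market) / sum(sample_book)
    return recovery_rate * total_book


# Hypothetical sample of 4 loans from a pool of 1,000 loans whose
# total book value is assumed to be $50 million.
book = [10_000.0, 40_000.0, 75_000.0, 125_000.0]
market = [8_000.0, 30_000.0, 60_000.0, 102_000.0]

print("direct:", direct_estimate(market, population_count=1_000))
print("ratio: ", ratio_estimate(market, book, total_book=50_000_000.0))
```

In this made-up sample the sampled loans happen to be larger than the pool average, so the direct estimate is pulled upward, while the ratio estimate corrects for this by using the known total book value.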

If one is to rely on estimates derived from samples, a rule for determining the accuracy of the estimates is needed. This rule usually is determined by the size of the confidence interval around the estimated market value. Suppose we want a 95-percent level of confidence that the true population value will be within plus or minus 5 percent of our estimate. To obtain this, we simply set the ratio of the half-width of the confidence interval to the estimate equal to 5 percent.1
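As a rough sketch of this accuracy rule, the check below divides the half-width of the confidence interval by the estimate. It assumes the usual normal approximation, so the 95-percent half-width is taken as 1.96 standard errors. The function names and the example figures are hypothetical.

```python
import math


def relative_half_width(estimate: float, sampling_variance: float, z: float = 1.96) -> float:
    """Half-width of the confidence interval as a fraction of the estimate.

    z = 1.96 is the normal-approximation multiplier for 95 percent confidence
    (an assumption of this sketch).
    """
    return z * math.sqrt(sampling_variance) / estimate


def meets_precision_target(
    estimate: float, sampling_variance: float, target: float = 0.05, z: float = 1.96
) -> bool:
    """True when the interval is within plus or minus `target` of the estimate."""
    return relative_half_width(estimate, sampling_variance, z) <= target


# Hypothetical figures: a $40 million estimate with a $1 million standard error.
estimate = 40_000_000.0
variance = 1_000_000.0 ** 2
print(relative_half_width(estimate, variance))
print(meets_precision_target(estimate, variance))
```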

The market value estimate is obtained from one of the three formulas given above, and the variance is the sampling variance estimated from the sample collected. Sampling variance is the variability due to the fact that each sample yields a slightly different estimate, which on average will be equal to the value we want to estimate.  In order to estimate the market value of a portfolio and determine the sample size, one must have some knowledge of the variance of the market value estimate as well as the estimate itself.  Obviously this is problematic because we do not know either the market value or its variance.

Knowing this in advance of doing the survey is the key to determining the sample size, but this appears somewhat  backward because this is what we are trying to estimate.  However, what we can know in advance is derived by having a good intuitive feel for the data, some expectation about the results one might obtain,  possibly from previous analyses, and a little help from probability theory.  The next section reviews some assumptions and how they work to solve the above equations to derive the sample size.

Before turning to the use of external knowledge in the sample selection process, it is worth  reviewing the definitions for sample variance for both the direct and ratio estimators for market value.   The following are the mathematical expressions for the “population recovery rate,” that is, the ratio of the population’s market value to book value, and the variance  of the book and market values for the population as a whole and the variance terms for the estimates of market value obtained by using the direct estimator and the ratio estimator:

The direct estimator for total market value has sampling variance $V_D$, which is the variance that arises in the estimate because we used a sample:

N is the number of assets in the population,
n is the number of assets in the sample, which is to be determined.

The ratio estimator for total market value has sampling variance $V_R$:

where ρ is the correlation between the book value x and the market value y.
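For reference, the standard textbook expressions for these quantities under simple random sampling without replacement are sketched below. They are the usual forms found in sampling texts and are offered as an assumption about what the definitions look like, not as the article's own typeset equations. The symbols $Y$ and $X$ (population totals of market and book value) and $\bar{Y}$, $\bar{X}$ (population means) are notation introduced here.

```latex
\begin{align*}
R &= \frac{Y}{X} = \frac{\sum_{i=1}^{N} y_i}{\sum_{i=1}^{N} x_i}, \\
S_x^2 &= \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{X})^2, \qquad
S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{Y})^2, \\
V_D &= N^2 \left(\frac{1}{n} - \frac{1}{N}\right) S_y^2, \\
V_R &\approx N^2 \left(\frac{1}{n} - \frac{1}{N}\right)
      \left(S_y^2 - 2 R \rho S_x S_y + R^2 S_x^2\right).
\end{align*}
```

The ratio-estimator variance is a large-sample approximation. It falls below $V_D$ when the correlation $\rho$ between book and market values is strong enough, which is the sense in which the ratio estimate benefits from the book-value information.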

Incorporating Knowledge in the Planning Process

Knowledge about the values to be estimated can be incorporated in the estimation process in advance, and the use of this information can reduce the sample size needed and improve the accuracy of the estimates.  The more information that is available, the more cost-efficient the process of estimating a total will become.

There are many sets of assumptions that can be used to determine in advance the sample size needed to achieve a fixed confidence interval around estimates.  This discussion focuses on four  that are easy to implement and that highlight the different assumptions made about the data and the estimator to be used.  Two of the scenarios are commonplace, while the other two, though less well known, lead to a better understanding of the sample sizes needed for estimation.  Each of the four scenarios makes assumptions about what we know about the market values relative to the book values.  There is no way to determine sample sizes for the sampling process without making some assumptions, no matter how simplistic.  

Scenario 1:

Traditional Sampling Theory With No Connection to the Valuations

The most common procedure in basic survey sampling is to turn any problem into one that assumes a binomial distribution.  This approach provides an easy solution to the sample size problem and requires no explicit assumptions about the data.  Instead, the approach uses an implicit assumption of an upper bound on the variance.   If one is attempting to use sampling to estimate an error rate in financial records, that is, the proportion of records containing an error, the largest variance will be found when the error rate is 0.5.  Based on this knowledge, one can work backwards to derive the sample size necessary to satisfy the criterion involving a fixed proportional confidence limit band.  This rationale, however, is inadequate for valuation measures because one is measuring a dollar value, rather than a proportion or number of errors.  Hence, while this scenario is commonly used as a fallback procedure for estimating sample size, it is inappropriate for many financial applications because of the need to estimate items calibrated in dollar values.
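A minimal sketch of this worst-case calculation follows. It assumes the margin is stated in absolute terms (for example, plus or minus 5 percentage points on an error rate) and uses the normal approximation. The function name and the optional finite-population adjustment are illustrative choices, not taken from the article.

```python
import math
from typing import Optional


def worst_case_proportion_sample_size(
    margin: float,
    z: float = 1.96,
    p: float = 0.5,
    population_size: Optional[int] = None,
) -> int:
    """Sample size for estimating a proportion to within +/- `margin`.

    p = 0.5 maximizes the binomial variance p * (1 - p), so it gives the
    most conservative (largest) sample size.
    """
    n0 = z ** 2 * p * (1.0 - p) / margin ** 2
    if population_size is not None:
        # Standard finite-population adjustment for sampling without replacement.
        n0 = n0 / (1.0 + (n0 - 1.0) / population_size)
    return math.ceil(n0)


print(worst_case_proportion_sample_size(0.05))
print(worst_case_proportion_sample_size(0.05, population_size=1_000))
```

Nothing about dollar values enters this calculation, which is why the scenario is a poor fit for valuation work.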

Scenario 2:

Traditional Sampling Theory With No Probability Assumptions Placed on the Valuations

The simplest assumption is that almost nothing is known about the market value of the assets, but that the book values of the assets in the pool are known. This is a very conservative assumption. By using the book value of all the assets in the population, one can calculate the population variance of the book values ($S_x^2$) and the population average book value ($\bar{X}$). Both of these are known with certainty and can be calculated easily from the data in most record systems. If one assumes that the relative variability of the market value is equal to the relative variability of the book value, then one can substitute the relative variance of book values for the relative variance of market values. This is not the same as assuming that the book value and the market value move in the same direction or that they are correlated in any way.

One can derive the necessary sample size by assuming that the relative variances of book values and market values are equal. By plugging the variance and the average of the book values into the formula presented above for the confidence interval, one can solve for “n,” the sample size needed for the valuation of a portfolio.2 We do not need to know anything more. Everything is based on one simple assumption and two numbers easily calculated from the population, and reliance is placed on the direct estimator, not the ratio estimator. However, the simple assumption ignores anything else known about the relationship between the book values and the market values. This may be an overly conservative assumption for many situations.
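The calculation can be sketched as follows. It assumes the standard variance of the direct estimator under simple random sampling without replacement, a normal-approximation multiplier of 1.96, and the substitution of the book values' relative variance for that of the market values. The function name and the toy portfolio are hypothetical.

```python
import math
import statistics
from typing import Sequence


def scenario2_sample_size(
    book_values: Sequence[float], rel_precision: float = 0.05, z: float = 1.96
) -> int:
    """Sample size for the direct estimator using only book-value information.

    Solves  z^2 * (1/n - 1/N) * S_x^2 / Xbar^2 = rel_precision^2  for n.
    """
    population_count = len(book_values)
    mean_book = statistics.fmean(book_values)
    var_book = statistics.variance(book_values)  # divisor N - 1
    relative_variance = var_book / mean_book ** 2
    inverse_n = rel_precision ** 2 / (z ** 2 * relative_variance) + 1.0 / population_count
    return math.ceil(1.0 / inverse_n)


# Hypothetical pool of 1,000 loans: half with book value 50, half with 150.
portfolio = [50.0, 150.0] * 500
print(scenario2_sample_size(portfolio))
```

The `1.0 / population_count` term is the finite-population piece, and it matters less and less as the pool grows.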

Scenario 3:

Linear Estimators in Sampling Theory Combined With Assumptions That the Values to Be Estimated Have an Underlying Normal Distribution

To use supplemental information for the calculation of the sample size, we need to make some assumptions regarding the joint distribution of the book value and the market value. A simple assumption is that the book value and the market value are related. A commonly made distributional assumption is that the variables are jointly normally distributed. This allows us to incorporate easily a straight-line relationship and a parameter for the correlation between the variables. The normal distribution is not required; we can actually use any joint distribution that does not constrain the values of either variable but does allow the two variables to be correlated. However, even a moderate distributional assumption like this can lead to some very severe complications, as will be seen as we develop this method.

The use of the assumption of a bivariate normal distribution requires that one know something about five parameters: the two population means, the two population variances, and the correlation between market value and book value. Typically, information is readily available only on two of the five parameters: the mean and variance of the book values (see the previous section). Estimates of the population mean and variance of the market values and the correlation between the book value and the market value are needed. The latter can be estimated by using simple regression analysis. In the case where one assumes a straight line relationship between book value and market value, one can assume to know something about the relative return on the assets, namely the rate of recovery, R. This value will be approximately what one would expect to see as the measure of return in a linear equation:

Market Value = a + b (Book Value)

The value “b” estimated in this relationship is the same as the coefficient “b” estimated with regression analysis. However, one can assume that “b” is approximately equal to R, and from past experience an estimate of R may be obtainable.3 Specifically, data on past sales, audits, auction information, etc., may be available. This assumption will be formalized in the next scenario.

The use of regression analysis allows for the estimation of the correlation between book value and market value. However, this still requires some knowledge about the variation of the market values (the y’s) separate from the variation of the book values (the x’s). Because market values and book values are supposed to be closely related, one can define two (extreme) assumptions: (1) the market values are as variable as the book values, or (2) the market values are only a portion of the book values and therefore vary proportionally less.

Finally, an assumption about the expected market value total is needed: specifically, that the expected value of the market value estimate is equal to the recovery rate, times our known total book value.

Using the regression relationship and the first variance assumption ($S_y^2 = S_x^2$), a simplified variance formula that can be solved for “n” is derived, using the regression estimator given above.4 This formula looks remarkably like the formula obtained in scenario 2, but it is multiplied by a factor that incorporates information about the expected recovery rate. This factor will play an important role in determining the sample size needed.

If the second variance assumption ($S_y^2 = R^2 S_x^2$) is used instead, one gets a very strange result: the sampling variance is always equal to zero! This occurs because the assumption leads to a situation in which there are too many restrictions and the variability of the data is assumed away under a normal distribution. This makes no sense, because although it may be reasonable to expect the market value to be less than the book value, and the variability of the market value to be less than the variability of the book value, there still will be variability. Under these assumptions, the second assumption would be a lower bound for the variability of the market value. And logically, if the market value is only a proportion of the book value, one would expect that the variability of the market value would not be greater than the variability of the book value, which makes the first assumption an upper bound. The first assumption is used because it is conservative.
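The two variance assumptions can be traced through the standard large-sample variance of a regression estimator. The steps below are a reconstruction under stated assumptions (the regression slope satisfies $b = \rho S_y / S_x$, and $b \approx R$ as discussed above). They are not the article's printed formula.

```latex
\begin{align*}
V_{\mathrm{reg}} &\approx N^2 \left(\frac{1}{n} - \frac{1}{N}\right) S_y^2 \left(1 - \rho^2\right),
\qquad \rho = b\,\frac{S_x}{S_y} \approx R\,\frac{S_x}{S_y}. \\[4pt]
\text{(1)}\quad S_y^2 = S_x^2 &\;\Rightarrow\; \rho \approx R
\;\Rightarrow\; V_{\mathrm{reg}} \approx N^2 \left(\frac{1}{n} - \frac{1}{N}\right) S_x^2 \left(1 - R^2\right). \\[4pt]
\text{(2)}\quad S_y^2 = R^2 S_x^2 &\;\Rightarrow\; \rho \approx 1
\;\Rightarrow\; V_{\mathrm{reg}} \approx 0.
\end{align*}
```

Case (1) has the scenario 2 form multiplied by a factor in $R$, and case (2) forces a perfect correlation, which is why the variability disappears.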

In this scenario the assumptions placed no limitations on either the book value or the market value of the assets. Both the book value and the market value can take on any value between −∞ and +∞, though very large values of either are highly unlikely, and negative values also are unlikely. In practice one is more likely to find that assets carried with a positive value have a market value that is positive, and even highly distressed assets typically will have a sales value slightly greater than zero. There may be exceptions, however, such as in the case of environmentally contaminated property where the costs of remediation (and potential legal liability) are greater than the value of the property cleaned up, thus yielding a negative market value. The normal joint distribution also allows the market value to be greater than the book value, which would happen if market interest rates are below the rates on financial assets or when property has appreciated in value.

Scenario 4:

Ratio Estimators in Sampling Theory Combined With Assumptions That the Values to Be Estimated Have an Underlying Conditional Gamma Distribution

An alternative approach that incorporates more information about the data clearly is more useful in making the sampling process more efficient and cost-effective. For example, when using the assumption that the data were normally distributed (scenario 3), book and market values could conceivably range from −∞ to +∞. Assuming instead that the book values and market values in the portfolio are both gamma distributed, one can incorporate assumptions about the portfolio as well as previously known information. The gamma distribution is a probability distribution that is skewed to the right, implying that the portfolio will have many assets with relatively small book values and only a few with very large book values. In addition, it is assumed that the market value is less than, or equal to, the book value.

Three parameters must be dealt with by using this approach, although there is an additional assumption that the market value is bounded above and below by the book value and zero, respectively. This assumption is consistent with floating-rate assets and those with short maturities. However, it will prove problematic when valuing fixed-rate loans with above-market rates or for fixed assets where depreciation may have reduced the book value below the market value.

Under these assumptions, and using this distribution, one can determine the population variances for both the book value (x) and the market value (y), and also the correlation between the book value and the market value. Substituting these values into the variance equation given above for a ratio estimator, we get:5

As was the case in scenario 3, there is a fortuitous relationship between the known values and the parameters that causes unknown values to cancel out and thus the equation can be solved directly. This is much more appealing because we can get three parameter values and the interrelationships between market value and book value directly from the three values we can observe or know through other sources, namely, total book value of the portfolio, the variability of the book values, and the anticipated recovery rate.6

This results in the equation above, and solving for n again we get:

The sampling variance equation is much more intuitive because it says that, as R approaches 1.0, which can only happen when all the assets have a market value equal to the book value, the sampling variance declines to zero. The same is true as R approaches 0.0, meaning all assets have no value, and so again there is no sampling variability. By placing bounds on the results when one estimates the market value, and by allowing the market value to be a random value conditional on the book value, one obtains a more sensible solution in terms of prior expectations, and at the same time a more sensible solution in terms of the effort required to conduct the valuation.

Solving for the Sample Size

If the population size N is large, then the term 1 over N in the denominator of each of the solutions disappears, and we get equations for the sample size that are easier to use and interpret. Each of the following equations for the sample size is subscripted to correspond to the assumptions in each of the scenarios discussed above.

Note that each of the sample sizes can be expressed as a function of the sample size n2, the sample size required when we make no distributional assumptions. For the sample size required when we assume that the book value and the market value are normally distributed and that the variance of the book value is equal to the variance of the market value, we get much larger required sample sizes until the recovery rate exceeds 0.7. Note also that the sample size required for the joint gamma distribution assumption is less than the sample size required for the no assumption scenario when the recovery rate exceeds 0.5. More importantly, the joint gamma assumption always yields a smaller required sample size than the normal assumption. This is especially gratifying, because scenario 4 required fewer assumptions and placed some reasonable bounds on the data to be observed.
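The comparison can be made concrete with a short sketch. The large-population factors used here are inferred, not copied from the article's equations: the scenario 3 factor (1 − R²)/R² follows from the regression variance with equal book and market variances, and the scenario 4 factor (1 − R)/R is simply one form consistent with the behavior described in the text (it equals 1 at R = 0.5, and the implied variance vanishes as R approaches 0 or 1). All function names are hypothetical.

```python
def n2_no_distribution(cv_book: float, rel_precision: float = 0.05, z: float = 1.96) -> float:
    """Large-N sample size for scenario 2; cv_book is S_x divided by the mean book value."""
    return (z * cv_book / rel_precision) ** 2


def n3_normal(n2: float, recovery_rate: float) -> float:
    """Scenario 3 (assumed factor): n2 * (1 - R^2) / R^2."""
    return n2 * (1.0 - recovery_rate ** 2) / recovery_rate ** 2


def n4_gamma(n2: float, recovery_rate: float) -> float:
    """Scenario 4 (inferred factor, an assumption of this sketch): n2 * (1 - R) / R."""
    return n2 * (1.0 - recovery_rate) / recovery_rate


n2 = n2_no_distribution(cv_book=1.2)  # hypothetical coefficient of variation
print(f"{'R':>4} {'n2':>8} {'n3':>8} {'n4':>8}")
for r in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"{r:>4.1f} {n2:>8.0f} {n3_normal(n2, r):>8.0f} {n4_gamma(n2, r):>8.0f}")
```

Under these assumed factors, the scenario 3 size drops below the scenario 2 size only once R exceeds about 0.707 (the square root of one-half), and the scenario 4 size is smaller than the scenario 3 size at every recovery rate.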

How to Choose

Based on the preceding discussion, the choices may appear to be exceptionally confusing. How does one use this information and draw a reliable sample? Fortunately, there is good news regarding both the variance estimate and the assumption choices.

First, the variance estimates. The assumptions reviewed above are offered as alternative ways to think about the data before assets are valued in order to avoid the selection of a larger sample than is necessary. However, once the sample has been selected and the valuation completed, the assumptions no longer come into play. The estimation of the market value and the confidence interval around that estimate is strictly a function of the actual data. The assumptions are not used in the estimation process, but only in the equations used to derive sample size. With the equations presented above, one can use either the direct method or the ratio method, and compare the results to see which makes more sense and provides the better confidence interval. The only requirement with respect to their use is that the sample selected is a simple random sample. Thus, even if the assumptions were flawed, the estimates are not a function of the assumptions but of the actual data sampled. However, bad assumptions may lead to the choice of a sample that is either too large or too small, yielding results that may not measure up to expectations.

The other piece of good news is that one can test some of the assumptions before choosing which scenario to use in deriving sample sizes. This is done easily by simply charting the book values as a histogram and determining whether the distribution appears to be uniform, normal, or gamma. (See the appendix for a discussion on how to create the histogram.) This allows a reasoned choice of scenario and distribution assumption to be used in the calculation of sample size. In addition, if one believes that the market values will fall between zero and book value, the choice of scenario 4 is clearly supported.

Hedging Our Sampling Bets

The last point to be made is that this process requires caution. If one uses the assumptions listed above to determine how large a sample to select, significant cost and time savings can be achieved. However, excessive optimism regarding the market value of the assets to be valued can lead to a sample size that is too small for the task. For example, if one expects that the market value is 90 percent of the book value of the loans being valued, then one might be able to use the assumptions of scenario 4 to reduce the sample size. However, if the completed valuation estimates the market value of the loans in the portfolio at 80 percent of book value, and the sample was sized on the assumption of 90 percent, then the sample was too small to achieve the confidence level used in the sample construction. In order to obtain the desired level of confidence, the sample size will have to be increased. The impact on the final confidence interval will differ according to the distributional assumption that is used. Specifically, in the situation described for the normal distribution the standard error (half the confidence interval) will increase by 133 percent. For the gamma distribution the standard error of the estimate will increase by approximately 137 percent. In either case, this is a rather large increase in the confidence interval and results from overly optimistic assumptions. This problem can be minimized by being conservative when making a priori assumptions about the value of the portfolio, while at the same time recognizing the cost tradeoffs associated with these decisions.


Raj, Des. Sampling Theory. New York: McGraw-Hill Book Company, 1968.

Mardia, K.V. Families of Bivariate Distributions. London: Charles Griffin & Company Limited, 1970.


Appendix: Creation of the Histogram

One can easily create a histogram to assess which assumption concerning the distribution of the assets in a portfolio is the appropriate one to use. This process requires that a tabulation of book values of assets be obtained from the general ledger and that 20 segments be created. These segments for tabulation can be created by taking the maximum book value minus the minimum book value and dividing by 20. We call this value w:

w = (maximum book value − minimum book value) / 20

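The twenty categories appear as follows:

Category 1: minimum to minimum + w
Category 2: minimum + w to minimum + 2w
...
Category 20: minimum + 19w to maximum
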
Count the number of assets with book values in each of these ranges and tabulate the counts as a bar chart. If the distribution looks flat, then the portfolio has a uniform distribution and the best plan is to use scenario 2. If the distribution looks bell shaped and symmetric, then the best option is scenario 3, because the data appear to be normally distributed (no unusually large high or low values). If the distribution looks like a mound with a long tail to the right (most values are smaller, but there are a significant number of large values), then the data are good candidates for scenario 4. The case for scenario 4 is strengthened if you believe your market values fall between worthless (zero) and the book value.
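The tabulation described in this appendix can be sketched in Python as follows. The function names and the text bar chart are illustrative choices. A book value that falls exactly on a boundary is counted in the higher category here, with the maximum placed in the last category, which is a convention assumed for this sketch.

```python
from typing import List, Sequence


def book_value_histogram(book_values: Sequence[float], bins: int = 20) -> List[int]:
    """Count assets in `bins` equal-width categories between the min and max book value."""
    lowest, highest = min(book_values), max(book_values)
    w = (highest - lowest) / bins  # category width
    counts = [0] * bins
    for value in book_values:
        if w == 0:
            index = 0  # every asset has the same book value
        else:
            # The maximum would land in category `bins`; fold it into the last one.
            index = min(int((value - lowest) / w), bins - 1)
        counts[index] += 1
    return counts


def print_bar_chart(counts: Sequence[int], max_width: int = 50) -> None:
    """Print one row of '#' marks per category, scaled to the largest count."""
    largest = max(counts)
    for number, count in enumerate(counts, start=1):
        bar = "#" * round(max_width * count / largest) if largest else ""
        print(f"{number:>2} | {bar} {count}")


# Hypothetical book values skewed to the right (many small, few large).
example = [float(k * k) for k in range(1, 201)]
print_bar_chart(book_value_histogram(example))
```

A chart whose tallest bars sit in the first few categories and taper off toward category 20, as in this made-up example, is the right-skewed shape associated with scenario 4.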

Last Updated 8/2/1999