## Abstract

Using a high-resolution chronology of graptoloid first and last appearances, we apply mathematical models that allow the simultaneous inference of the probability distribution of species durations and the effective sampling rate relating those durations to the observed stratigraphic ranges. This approach allows the completeness of the documented palaeontological record of graptoloids to be assessed. We estimate that *c.* 75% of species in the geographical regions that contribute to the stratigraphic data have been sampled and that, of those species known from more than a single stratigraphic horizon, *c.* 85% of their original durations, on average, are represented by their global composite stratigraphic ranges. As expected in light of their biostratigraphic importance, graptoloids have one of the most completely documented records among groups of fossil organisms. We expect that application of the methods used herein will show comparably complete records for other biostratigraphically relevant groups.

**Supplementary material:** The R code and data are available at https://doi.org/10.6084/m9.figshare.c.4547174

In light of their demonstrated utility in high-resolution biostratigraphy and global correlation (Rickards 1976; Hughes 1994; Sadler *et al.* 2009; Sadler *et al.* 2011; Cooper *et al.* 2012; Melchin *et al.* 2012), graptoloids would seem *prima facie* to have a relatively complete documented palaeontological record. But what do we mean by a ‘relatively complete’ record? Questions about the completeness of the known stratigraphic record of graptoloids include taxonomic, geographical and temporal aspects. We pursue two of the many possibilities. First, at a large geographical scale, of all the graptoloid taxa that once lived at any time in locations from which fossil graptoloids are known, what fraction has been described? The answer depends on the efficiency of data mining, both in the rocks and in the library. It may be approached by examining the diminishing returns of the mining effort (Sadler *et al.* 2011) or examined, by inversion, from the distribution of the observed stratigraphic range lengths (this paper). Second, at a small taxonomic scale, what fraction of the real durations of individual species do the observed ranges represent? The answer depends not only on the mining effort, but also on how we have constructed global composite ranges from local records. These two aspects of palaeontological completeness are distinct from the concept of stratigraphic completeness, i.e. the proportion of geological time represented by the preserved stratigraphic record (Sadler 1981; Dingus & Sadler 1982; Schindel 1982). The most meaningful answers to the second question require correlation and seriation methods that identify the shortest composite taxon ranges that are consistent with the local stratigraphic ranges in the known rock record. This fact determined our choice of global composite ranges to invert for answers to both questions. These composite ranges are derived from observed first and last appearances in local stratigraphic sections via the method of constrained optimization (CONOP), which seeks to minimize inconsistencies between local ranges and the global composite (Kemple *et al.* 1995; Sadler 2009, 2013*a*, *b*; Sadler *et al.* 2011).

Figure 1 illustrates the relationships among the true duration of a species, its true global stratigraphic range, its local stratigraphic ranges and its scaled global composite range estimated via CONOP. Note that the CONOP composite range approaches the global range as the sampling of sections increases; it is not intended to estimate the true duration, which is not knowable (see also Table 1). Given the CONOP ranges, we modify previous models relating the probability distribution of species durations and sampling rate, on the one hand, to the distribution of stratigraphic ranges on the other. Using the method of maximum likelihood, we find the parameters of the species duration distribution and the sampling rate that best account for the stratigraphic ranges. From these parameters we derive two completeness measures: (1) the proportion of species sampled at least once; and (2) the median proportion of the original durations represented by the scaled global composite ranges. Palaeontological studies, especially of biodiversity, have mainly focused on the proportion of species sampled (e.g. Paul 1982; Foote & Sepkoski 1999; Alroy 2008). However, the proportion of the original duration actually preserved is potentially more relevant for the analysis of speciation and extinction rates, where the limits of stratigraphic ranges rather than numbers of species *per se* are the primary currency (Cooper *et al.* 2014; Crampton *et al.* 2016, 2018; Foote *et al.* 2018). The inverse procedure is not meant to estimate the true time of speciation and extinction of each individual species (cf. Strauss & Sadler 1989), but rather the entire probability distribution of species durations.

We then use the simulation of local stratigraphic ranges to demonstrate simple scaling relationships between factors that contribute to completeness – sampling within local sections, the number of sections and the time spanned by sections – and their expression in global stratigraphic ranges. We also process a subset of simulations through the CONOP algorithm to explore a possible bias in completeness estimates and to suggest a simple correction for this bias. We then address possible biases in our simplifying assumptions regarding the uniformity of sampling among species and over time.

## Data

The raw data consist of the stratigraphic ranges of >2000 graptoloid species from *c.* 500 local sections. Our analyses rest on a processed data product: the temporally scaled, global composite sequence of graptoloid stratigraphic ranges that form the basis of the international Ordovician and Silurian timescales (Sadler *et al.* 2009, 2011; Cooper *et al.* 2012, 2014; Melchin *et al.* 2012; Crampton *et al.* 2016, 2018). The age-scaled global composite sequence that we chose (Crampton *et al.* 2016, 2018) was constructed in three stages. Here, we summarize only those aspects crucial to the resulting composite ranges. The first stage built a global sequence of range-end events on the strong assumption that locally preserved taxon ranges tend to underestimate the true global durations (Fig. 1). It follows that local sections can preserve contradictory event sequences, but the local observation that the first occurrence of one taxon is older than the last occurrence of another is a constraint that must apply to the composite sequence; in the absence of significant mixing, a first-before-last observation cannot be an artefact of under-represented local ranges (Alroy 1994*b*). A global sequence of species first and last occurrences was sought that satisfied this constraint without making the composite taxon ranges any longer than necessary. A sequence was chosen to which all local outcrop and drill core sequences could be fit with a minimum net extension of locally observed ranges. The CONOP was performed with an objective function that minimizes range extensions as measured by their span of event levels in the local sections. It favours the sequences observed in richer (i.e. more fossiliferous) local sections, rather than thicker sections. Let us call this the ‘Level' function, after its name in the CONOP configuration file (‘PENALTY = ‘LEVEL’’) (Sadler 2013*a*, *b*). It does not penalize for range extensions beyond the limits of the local section. For this purpose, a minor secondary measure of misfit was added for the number of coexisting taxon pairs implied by the composite sequence, but not observed in any local section. Both components of the objective function serve to keep the composite ranges as short as possible, consistent with all the local observations. Shorter ranges tend to produce lower completeness estimates and therefore this minimization of ranges leads to conservative estimates of completeness.

The output of the first stage is a parsimonious and purely ordinal sequence of events with no ties. The second stage scales the intervals between events in this sequence, using weak assumptions that associate greater biotic change and thicker rock intervals with longer time spans. Sequencing is completed before scaling so that the weak scaling assumptions do not compromise the ordinal composite sequence. Edwards (1978) introduced this strategy of ‘No-space graphs' to strengthen traditional graphic correlation. Events for which the evidence of sequence is equivocal are assigned zero separation in this second stage; in other words, two or more events may be collapsed to a single event level. The third stage generates an age-scaled composite using the interval composite section for interpolation between radioisotopically dated events. As the number of radioisotopic control points increases, the weak assumptions of stage two become less influential.

As in previous studies (e.g. Crampton *et al.* 2016, 2018; Foote *et al.* 2018), we have omitted from analysis the relatively sparse data near the beginning and end of the evolutionary history of graptoloids. Because the mean stratigraphic range drops abruptly in the late Ordovician (Cooper *et al.* 2014; Crampton *et al.* 2016), we divide the data into two parts: the first consisting of 1001 species with first appearances between 481 and 447 Ma; and the second consisting of 936 species that first appear between 447 and 419 Ma. We refer to these subsets of data as Ordovician and Silurian, respectively. Figure 2 shows the frequency distribution of stratigraphic ranges and Table 2 gives the summary statistics, along with standard errors based on the bootstrap resampling of ranges (Efron & Tibshirani 1993). The distribution is strongly skewed towards very short ranges (Crampton *et al.* 2016), and *c.* 12% of species have a composite stratigraphic range of zero, i.e. their first and last appearances fall at a single resolved event level.

## Methods

### Modelling stratigraphic ranges

Our analysis is based on mathematical modelling that relates a probability distribution of true durations to the distribution of stratigraphic ranges via a rate of sampling (Foote & Raup 1996; Foote 1997; Solow & Smith 1997). This approach is dictated by the nature of the data, which consist of temporally scaled, global first and last appearances, but not the full inventory of intermediate occurrences that is used by other estimation methods (Paul 1982; Nichols & Pollock 1983; Solow & Smith 1997; Alroy 2008). Because the distribution of durations and the sampling rate fully determine the probability distribution of ranges, they can be inferred from the ranges via standard inverse methods. Given an assumed canonical distribution of durations, we simultaneously fit the parameters of that distribution and the sampling rate via maximum likelihood. The approach assumes that all species are characterized by the same sampling rate, which itself is constant over time. We later explore violations of these assumptions. We emphasize again that our completeness estimates are based on scaled global composite ranges rather than local ranges, although the approach presented here could as well be applied to ranges within single sections.

Let *Θ* denote the parameter(s) of the duration distribution. Then *g*_{T}(*T*|*Θ*) is the density function giving the relative probability that the duration takes on some value *T*. For the simple exponential distribution, corresponding to a constant extinction risk within the lifetime of each species (Van Valen 1973), the single parameter, *μ*, is the extinction rate per lineage-million-years (Lmy), and the density is given by *g*_{T}(*T*|*μ*) = *μ*e^{−μT}.

To go from durations to ranges, we introduce a sampling rate, *s*, per lineage-million-years. This rate subsumes all of the processes that determine whether or not we know of the existence of a species at a given time (e.g. fossil preservation, preservation of the stratigraphic record, sampling of the record and publication). The probability that a species with duration *T* will be sampled at least once is equal to 1 − e* ^{−sT}*. Therefore the overall proportion of species sampled at least once is given by

In the exponential case, this expression reduces to a simple form:

Let *x* denote the observed stratigraphic range, which is necessarily less than or equal to the true duration. For a given duration, the density function relating the range to the duration and sampling rate is equal to

Therefore the overall probability density for a stratigraphic range *x* is given by

The division by *P*_{s} (equation 1) conditions on the species being sampled at least once, and the lower limit of integration reflects the fact that the duration must be at least as long as the range. The case *x* = 0 means that the species is known from a single resolvable stratigraphic horizon. For *x* = 0 we have a discrete probability: for *x* > 0 *f* (*x*|*Θ*,*s*) is a continuous probability density function. For the exponential distribution of durations, *f*(0|*μ*,*s*) = *μ*/(*μ* + *s*) and *f* (*x*|*μ*,*s*) = s*μ*e* ^{−μx}*/(

*μ*+

*s*), if

*x*> 0. For other distributions, we evaluate the integral numerically.

Note that for all duration distributions we consider, *f*(0|*Θ*,*s*) increases as the sampling rate decreases. This fact stands to reason: if sampling is sufficiently poor, most species will never be sampled or will be sampled at exactly one level.

Given the observed stratigraphic ranges, *x*_{1}, …, *x _{n}*, the log-likelihood (support) for a candidate set of parameters (

*Θ*and

*s*) is equal to

We use numerical optimization to find the values of (*Θ*, *s*) that maximize this expression. These are the maximum likelihood estimators, *f*_{0} is the observed proportion of species with a range equal to zero and *Θ*, *s*) we obtain the maximum likelihood estimate of the proportion of species sampled,

We use a Monte Carlo procedure to simulate stratigraphic ranges to determine the proportion of the original duration preserved, i.e. the ratio of range to duration. We draw a large number (10^{6}) of durations at random from the appropriate distribution. Within each duration, we drop occurrences at random at a rate

We derive the standard error of maximum likelihood parameter estimates using the normal approximation. If *S* evaluated at **Σ**, of the model parameters (Edwards 1992, p. 104–109). Because the values *Θ* and *s* are drawn at random from a multivariate normal distribution with mean vector **Σ**. These values are then used to calculate *P*_{s} with equation (1) and *M*_{R} with the Monte Carlo procedure described earlier. The standard deviation over the realizations is the estimated standard error of the derived statistic.

On average, the species that are sampled at least once come from the longer-lived part of the duration distribution. Using Bayes’ theorem, we can calculate the probability that a species had a particular original duration given that it was sampled at least once. The conditional density is equal to*T* and is sampled at least once, and the denominator is the unconditional probability of being sampled (equation 1). From this expression, we obtain the mean duration of species sampled at least once:

### Duration distributions

The exponential distribution corresponds to the case where the risk of extinction is constant during the lifetime of a species (Van Valen 1973). Our previous analysis (Crampton *et al.* 2016) suggests that an age-dependent extinction model is more appropriate, in particular a Weibull model in which young species are most at risk at any time and the extinction risk decreases with species age. However, that analysis tacitly assumed that the distribution of stratigraphic ranges is a reasonable proxy for the distribution of original durations. Here we relax that assumption by fitting both exponential and Weibull distributions that explicitly incorporate incomplete sampling. In addition, we consider a log-normal distribution of durations. With this distribution, the extinction risk does not increase or decrease monotonically with species age. Instead, species duration represents the product of numerous independent contributing factors – for example, geographical range, population size and environmental tolerance. In contrast with the exponential distribution and the Weibull distribution with an extinction risk that decreases with species age (Crampton *et al.* 2016), the peak density of the log-normal distribution corresponds to a duration greater than zero.

### Simulations relating local sampling to inferred global sampling

Our empirical estimates of completeness are based on global composite stratigraphic ranges, and the parameter *s* represents the overall global sampling rate. Sampling, however, takes place in local sections. We therefore simulated the stratigraphic ranges to explore the scaling relationships between the determinants of sampling (i.e. the local sampling rate, the number of sections and the section length) and global sampling estimates. These simulations are not part of the empirical estimates, but they help inform what those estimates, especially the sampling rate *s*, actually mean.

The simulations consist of four components: (1) the generation of a temporal sequence of species with true times of origination and extinction and corresponding true durations; (2) selection of the temporal limits of local stratigraphic sections and the number of sections; (3) sampling within sections to produce local stratigraphic ranges; and (4) aggregation of local first and last appearances into a temporally scaled global sequence. Parameters governing (2) and (3) were varied over a wide range of values to yield effective global sampling rates spanning more than three orders of magnitude (see Fig. 8).

#### Species durations

To emphasize the effects of sampling rather than variation in the diversification process, we simulated the history of speciation and extinction with a Moran process (Moran 1958: pp. 61–62; Nee 2006: p. 9), which yields constant diversity with stochastic variation in the exact times of speciation and extinction. (Alternative simulations that allow stochastic variation in diversity yield similar results; see later.) Each realization starts with *n*_{0} species at time *t* = 0. For each of these species, a duration is sampled at random, and that duration specifies its time of extinction. A new species originates precisely when a previously existing species becomes extinct, and the new species is assigned a duration at random. This procedure is iterated until a specified upper time limit, *t*_{max}, is reached. For simplicity, we use an exponential distribution of durations. We scale the simulations to make them roughly comparable with graptoloid evolutionary history, with an extinction rate *μ* = 0.6 per Lmy (i.e. a mean duration equal to 1.67 myr), *n*_{0} = 100 and *t*_{max} = 50 myr. Thus, on average, 3000 species are produced in each realization. The number of these species actually sampled depends on three factors: the section length, *L*; the number of sections, *N*; and the local sampling rate, *r*.

#### Stratigraphic sections

We drew section lengths at random from a distribution scaled to have about the same shape as the empirical distribution of section lengths in the graptoloid data. The aim is not to reproduce the durations of sections, which are known only inexactly, but rather to obtain a reasonable proportional scaling. The length of each empirical section was set equal to the time elapsed between the oldest first appearance and youngest last appearance, according to the global composite, of all species found in that section. The empirical values of the relative section length, *L*_{R}, are simply these section lengths divided by the total span of time encompassed by the global composite. For each realization, the simulated distribution of sections was formed by pulling *N* values of *L*_{R} at random from the empirical distribution, then multiplying the *L*_{R} values by *t*_{max} and an additional scaling factor *L*_{S} to yield absolute section lengths, *L*. We then dropped the scaled sections at random so that they all fell entirely between *t* = 0 and *t* = *t*_{max}*.* For these simulations, *N* was varied between 10 and 100, and the scaling factor *L*_{S} was varied between 0.1 and 1.0. With *t*_{max} = 50, the corresponding mean absolute section length varied between 0.85 and 8.5 myr.

The simulations assume that all species are available for sampling in all sections that overlap their lifetimes. In other words, there is no provinciality. The introduction of provinciality would turn the problem into one of simple mixing. For example, consider a fauna generated with *n*_{0} = 100 and *N* = 100 and consisting of two provinces that share no species in common and that have 50 sections each. On average, such a simulation would yield the same results, in terms of numbers of species sampled and their stratigraphic ranges, as the combination of two independent simulations, each with *n*_{0} = 50 and *N* = 50.

#### Local sampling

For the part of each species’ duration that falls within a local section, occurrences were randomly dropped at a rate *r* per Lmy, and the oldest and youngest occurrences in each section were recorded. The local sampling rate was varied between 0.05 and 1.0 per Lmy. Note that our protocol makes the unrealistic assumption that species are sampled independently rather than as collections (cf. Alroy 1994*b*; Sheets *et al.* 2012; Melchin *et al.* 2017). We relax this assumption in the following.

#### Global aggregate ranges

For the simulated data, we define the global aggregate range as the difference between the oldest and youngest of all sampled local first and last appearances (Fig. 1). In other words, we assume complete knowledge of the ages of all local events in the sampled sections, with infinite precision. We do not suggest that such knowledge is realistic, but the assumption is appropriate for our goal of determining scaling relationships between local and global sampling.

### Applying the CONOP algorithms to the simulated range data

As described earlier, CONOP starts with local first and last appearance events, expressed in thickness, in those sections that have been sampled. It is potentially of interest to compare completeness estimates obtained with CONOP's composite stratigraphic ranges with those obtained with a perfect knowledge of the ages of all local first and last appearances in the sampled sections. For an additional set of simulations using the protocol just described, we determined the global aggregate ranges as before and we also recorded the height of all local first and last appearances, using the simplifying assumption of a constant accumulation rate within each section, to yield the relative spacing of events within each section. These local event sequences were then input to the CONOP program to generate a scaled global composite sequence using the scaling approach described by Sadler *et al.* (2009). The time units of this sequence are arbitrary and have no bearing on the two estimates of completeness that we derive, *P*_{s} and *M*_{R}. We generated ten realizations of each of 12 combinations of *r, L*_{S} and *N*, which were chosen to yield a wide range of completeness values. In contrast with the previous set of simulations, we set *n*_{0} so that *c.* 500 species would be sampled for each realization, a pragmatic choice that was necessary to enable this many solutions to be obtained in a reasonable amount of time (*c*. 2 × 10^{3} CPU hours). For these sampled species, we compared completeness estimates based on the global aggregate ranges with those derived from the CONOP ranges. We used the following CONOP settings:

PENALTY=’SEQUEL’

FORCECOEX=’SS’

FORCEFb4L=’ON’

SOLVER=’ANNEAL’

STEPS=20 000

TRIALS=2000

STARTEMP=200

RATIO=0.9995

HOODSIZE=’BIG’

In particular, note that the PENALTY setting specifies the objective function to be minimized. The most computationally intensive aspect of the ‘Level' objective function, which was used to develop the graptoloid global composite, is the need to repeatedly tally thousands of implied shortfalls in local ranges. A faster alternative turns the sequence constraint into the objective function and abandons measurement of the implied local range extensions. In CONOP terms, this is best achieved by the ‘Sequel' objective function, which minimizes the number of first occurrence before last occurrence pairs that are implied by the global composite, without support from a real section – that is, it considers local evidence for the sequence of range-end event pairs, but not the number of other events that separate them. Such an objective function uses less of the stratigraphic information, but becomes generally more acceptable as the number of local sections increases and with it the number of taxon pairs seen in the same section (Sadler 2009). The optimization is further accelerated because this measure of misfit does not need a secondary penalty for ranges that extend beyond the limits of local sections. It becomes fast enough to compare numerous runs of model data. Consider also the treatment of unconstrained sequence elements: two graptoloid assemblages only ever found in separate sections (Sadler & Sabado 2009). One assemblage may be much younger than the other, or they may belong to separate coeval provinces. The ‘Level' function can interleave the two assemblages arbitrarily, as if they were from separate provinces. The ‘Sequel' objective function will tend to separate them in the composite sequence because the penalty for implying unobserved coexistences is much higher. This is appropriate for our model data, which do not include separate provinces. If the lack of shared occurrences is an artefact of sparse sampling, then the tendency of the ‘Sequel’ function is to err in favour of shorter ranges that imply fewer coexisting pairs of taxa. It remains an appropriately conservative approach for our purposes.

### Simulations exploring the sensitivity of our estimation procedure to violations of model assumptions

Our approach to estimating completeness assumes that the sampling rate is constant with time and is uniform among species, a notion that is patently false (Patzkowsky & Holland 2012). To test the sensitivity of our results to violations of these assumptions, we simulated stratigraphic ranges with sampling rates that vary in a known way and then analysed the simulated ranges with the naïve assumption that they do not vary. As before, all the simulations assume a constant extinction risk of *μ* = 0.6 per Lmy. We generated and averaged 100 realizations for each set of prescribed sampling conditions. Each realization consisted of 1000 simulated stratigraphic ranges, randomly drawn from the larger set generated. This figure is comparable with the number of graptoloid species analysed in each stratigraphic subset of data.

First we explored a cyclical variation in sampling rate, modelled for simplicity as a sinusoid varying around a long-term mean rate, ^{−}^{1}) with six values of cycle length, in terms of multiples of mean species duration (0.1, 0.2, 0.5, 1.0, 1.5 and 2.0).

Sampling may also vary systematically within the lifetime of a species (Nicol 1954; Raia *et al.* 2006; Foote *et al.* 2007; Liow & Stenseth 2007). We therefore carried out a second simulation in which we varied the sampling rate as a half sine wave from *z* from 0.1 to 0.5 and varied the mean sampling rate,

In addition to temporal variations in sampling, we need to consider variation among species. Following Wagner & Marcot (2013), we modelled among-species variation with a log-normal distribution. For each value of the mean sampling rate, *μ*_{log} and *σ*_{log}, so that the arithmetic mean was conserved while the ratio of the standard deviation to the mean took on specified values ranging from 0.1 to 1.0. Each stratigraphic range depends on a randomly drawn duration from the exponential distribution and a randomly and independently chosen sampling rate from the log-normal distribution.

Except for the CONOP algorithms and a few functions translated to C for speed, computation was carried out in the R environment (R Core Team 2016). The key functions are included in the Supplementary materials.

## Results

### Empirical completeness estimates

Figure 3 shows completeness estimates for the three models of species duration we consider (see also Tables 3–5). Comparing among models, the two measures of completeness are inversely correlated – that is, when the proportion of species sampled is higher, then the proportion of their duration preserved as the global composite range is generally lower. This result stands to reason. The probability of being sampled at least once increases monotonically with duration. Therefore duration distributions that are more strongly skewed towards shorter durations (Fig. 4a) yield a lower proportion of species sampled. The mean length of time missing from the start of any species’ duration is simply the expected waiting time until the first instance of sampling, which is equal to 1/*s*, and similarly for the average time missing from the end of the species’ duration. The two bits that are snipped off represent an ever smaller proportion of the duration as the duration increases. For any distribution, the sampled species are biased towards the longer-lived end of the distribution (compare distribution shapes in Fig. 4a, b) and this bias is generally larger for distributions that are skewed towards shorter-lived species. For example, if we consider the distributions fitted to Ordovician species ranges (Fig. 2a; solid squares in Fig. 3), then the ratio of the mean duration of sampled species (equation 6) to the mean duration of all species is 1.12 for the exponential model, 1.17 for the log-normal model and 1.22 for the Weibull model.

When jointly fitting the sampling rate and the duration distribution to a set of stratigraphic ranges, an important observation is the proportion of species, *f*_{0}, with a range of zero – that is, those species confined to a single stratigraphic level. In general, the lower this proportion, then the higher the estimate of the sampling rate. As a sensitivity test, we took all minimal but non-zero stratigraphic ranges – namely, those where the first and last appearance in the scaled global composite are separated by a single event level – and collapsed them to zero. This protocol must, of course, increase *f*_{0} (Table 2) and decrease the estimate of sampling rate. The point of the test is to determine the magnitude of these effects (Fig. 3).

Considering the spectrum of duration distributions and the alternative treatments of extremely short stratigraphic ranges, we obtain a wide range of completeness estimates, with the proportion of species sampled varying from *c.* 70% to nearly 90% and the median proportion of duration preserved varying from *c.* 65 to 95%. Two factors allow us to narrow the range a bit. First, the Weibull distribution, with its preponderance of short-lived species, fits considerably better than the exponential and log-normal models (Table 6), even though its best-fit parameters under-predict *f*_{0} at *c*. 0.09 rather than *c*. 0.12. Second, we present simulation results suggesting that collapsing stratigraphic ranges is perhaps too conservative and yields systematic underestimates of completeness. We therefore suggest nominal estimates that *c*. 75% of species are sampled and *c*. 85% of their original durations are represented by their global composite ranges, values that represent the approximate mid-point between the two sets of Weibull-based estimates in Figure 3.

### Scaling simulations

The empirical distribution of section lengths, scaled to the total elapsed time in the graptoloid composite, is shown in Figure 5. Figure 6 gives an example of simulated section limits generated with *L*_{S} = 0.5 and *N* = 50. Each tick mark in Figure 6 depicts a local first or last appearance (or both) for a simulation in which *n*_{0} = 20 and *r *= 0.5. (Note that this value of *n*_{0} was chosen to keep the figure legible; in fact, we used *n*_{0} = 100 for the simulations presented here.) The total number of species produced in this simulation was 621, of which 417 were sampled at least once, with a total of 1718 unique local sampling levels. The local first and last appearance of a species are identical in 797 instances, i.e. the species is confined to a single level in the given section.

The estimated global sampling rate increases roughly linearly with each of the three contributing factors – the local sampling rate, the number of sections and the mean section length – when the other two factors are held fixed (Fig. 7). This result suggests that the global sampling rate, one of the parameters that we estimate from the empirical stratigraphic range data, has a fairly straightforward, concrete interpretation. If we construct a multiplicative regression model in which the estimated global sampling rate is equal to the product of the contributing factors, each raised to a power, then the match between the predicted and simulated sampling rates is excellent (*r*^{2} = 0.96) (Fig. 8).

We conducted alternative simulations in which speciation can occur at any point during the duration of a species, not just at its time of extinction, and in which a species can have multiple direct descendants. We modelled our approach on the continuous-time algorithm of Przeworski & Wall (1998), which is computationally simpler and more exact than discrete-time approximations (e.g. Raup *et al.* 1973). As for our constant diversity simulations, *μ* = 0.6 per Lmy, *n*_{0} = 100 and *t*_{max} = 50 myr. We set the speciation rate, *λ*, equal to *μ*. Thus diversity is expected to be constant on average, but it follows a random walk during each realization of the process. The results of these simulations are very similar to those of the constant diversity simulations. In particular, compared with Figure 8, we find *β*_{r} = 0.90, *β*_{N} = 0.70, *β*_{L} = 0.73 and *r*^{2} = 0.97.

As expected in light of equation (1) and the nearly linear relationship of Figure 7a, the relationship between the local sampling rate and the proportion of species sampled is non-linear, showing diminishing returns (Fig. 9). The same is true for the proportion of duration preserved and for the relationships between these quantities and other factors that contribute to the global sampling rate. Consistent with these simulated results, Sadler *et al.* (2011, fig. 5) previously showed a non-linear increase in the number of graptoloid species sampled with an increase in the number of local sections studied.

As mentioned earlier, these simulations assume that species within local sections are sampled independently. We modified the simulation protocol so that collections or horizons occur randomly within sections at a given rate, *h*, per section-million-years. Any species extant at the time of the sampled horizon has a probability, *P*, of being sampled at that horizon. As with the less realistic, but computationally simpler, model of independent occurrences, the model of clustered occurrences yields nearly linear relationships between each contributing factor and the estimated global sampling rate (Fig. 10), and the product of these factors predicts that rate rather well (Fig. 11). In other words, whether we assume that species are sampled independently or together in horizons, we find simple scaling relationships between the estimated global sampling rate and the local factors that determine it.

### CONOP simulations

For biostratigraphic or macroevolutionary purposes, it generally matters little whether the stratigraphic range of a species is confined to a single resolvable level or two very closely spaced levels. To estimate completeness from stratigraphic ranges alone, however, the role of single-horizon species implies that the distinction between a range of zero and a short non-zero range can be significant. If the CONOP algorithm tended to separate the first and last appearances of species in which these events were not in fact resolvable, then the frequency of single-horizon species could be underestimated and the sampling rates concomitantly overestimated. We agree that the truth of the global sequence of first and last appearances is not in principle knowable; that the best available approach is therefore to determine the parsimonious global sequence that implies the fewest contradictions with local sequences of events; and that simulated data do not provide a thorough test of sequencing algorithms because the realism of the simulation conditions cannot be verified (Sadler 2013*b*, p. 31). We nonetheless see value in using the relationship between a simulated ‘truth' and the corresponding CONOP solution to explore the sensitivity of completeness estimates to the best-fit solution.

Figure 12 compares completeness estimates based on two sets of stratigraphic ranges: the perfectly known, simulated global aggregate ranges; and the scaled global composite ranges from the CONOP solution to the simulated local data. The most striking feature of Figure 12 is that the two completeness estimates are well correlated. However, the scatter around the 1:1 line is not random. Instead, the CONOP stratigraphic ranges consistently overestimate completeness for some combinations of contributing factors, whereas for other combinations they systematically underestimate completeness. Inspection of the position of the points in Figure 12 with respect to the values of the sampling parameters shows that, for each parameter, the lower values correspond with a greater tendency to lie above the 1:1 line and that this tendency is more pronounced for the number of sections and for section length than it is for the local sampling rate (Table 7). To the extent that the simulated results reflect the real world, they suggest that improved sampling, coupled with CONOP analysis, may lead to conservative estimates of completeness.

For conditions in which the CONOP ranges overestimate completeness, the bias results in part from separating the first and last appearances of single-horizon species. If we take species whose first and last appearances are separated by a single event level and collapse their ranges to zero, then the bias vanishes (Fig. 13). In fact, the protocol of collapsing ranges overcompensates and yields completeness estimates that are systematically lower than those based on the global aggregate range. To the extent that the simulated local ranges are relevant to real world data, these results imply that completeness estimates derived from a CONOP seriation and accompanied by collapsed ranges may be conservative. Referring back to Figure 3, we therefore suggest that empirical completeness measures for graptoloids probably lie between the two alternative sets of values.

### Sensitivity tests

The results of simulated variable sampling are presented in Figures 14–16. In all cases, the abscissa shows the actual measures of completeness in the simulated data and the ordinate shows the estimates obtained by naïvely analysing these data as if the sampling were constant.

With sampling that varies cyclically and independently of the times of speciation and extinction, the simulated and estimated completeness measures agree reasonably well (Fig. 14). Species durations are just as likely to intersect times of rising and falling sampling and the effects of variation nearly cancel out. At longer cycle lengths, there is a slight upward bias, but it is within the analytical uncertainty of our empirical completeness estimates (Fig. 3). The maximal proportional deviations between the true and estimated values are 2.4% (Fig. 14a) and 4.8% (Fig. 14b).

When sampling rises and falls symmetrically within the history of each species, the estimated proportion of species sampled accurately captures the true value; in other words, the mean sampling rate in the simulated scenario determines whether a species is sampled (Fig. 15a). Especially at larger amplitudes of variation, however, the naïve estimates of the proportion of duration preserved are biased upwards, with a maximum proportional deviation of 17.8% (Fig. 15b). Because sampling is systematically lower early in the history of a species, the waiting time between speciation and the first instance of sampling is extended compared with a constant sampling model (so more of its true duration goes unsampled) and likewise for the later part of its history.

The two aspects of completeness respond differently to among-species variance in sampling (Fig. 16). With the mean fixed, a higher variance in a log-normal distribution corresponds to more species with very low sampling rates. These species tend not to be sampled at all, so that the actual proportion sampled is lower than what we would estimate from the ranges of the species that are sampled (Fig. 16a); the maximum proportional deviation between the true and estimated values is 20.5%. At the same time, those species that are sampled at least once tend to come from the better-sampled part of the distribution; they have a mean sampling rate higher than the mean of the distribution as a whole. This bias has two effects. First, as would be expected, species with higher sampling rates tend to have longer stratigraphic ranges for a given true duration. Second, they show a slight tendency to have shorter true durations because a short-lived species has a higher probability of being sampled if its sampling rate is higher. These two effects combine to yield a higher median ratio of range to duration when the sampling rate varies (Fig. 16b, abscissa). With a lower variance: mean ratio, the naïve estimate (Fig. 16b, ordinate) is close to what the true value would be if the sampling rate was constant. With a more variable distribution, the proportional bias of the naïve estimate is up to 9.3% too low.

## Discussion

Our starting premise is that the global stratigraphic record of graptoloids is adequate to make them effective in international zonation and correlation. Our goal therefore has not been to ask whether the record is good enough, but rather to quantify how good is good enough. In rough terms, we estimate that 75% of Ordovician–Silurian graptoloid species have been sampled and that, of those known from more than one resolvable stratigraphic level, *c.* 85% of their original durations are represented by their composite stratigraphic ranges. It is important to emphasize that the relevant pool of species consists of those that lived in the regions represented by the preserved sections actually sampled. No doubt there were many species living only in areas that have no known record or a record that has not been included in forming the scaled global composite. So, when we estimate that 75% of species have been sampled, we mean 75% of those species that had the potential to be represented in light of where sediment was accumulating in the past, has survived to the present day and has been studied. Moreover, our conclusions regarding the integrated global record of graptoloids are not meant to imply that local and regional records approach the global record in terms of completeness.

Using a slightly different version of the global graptoloid data, Sadler *et al.* (2011) documented *c.* 2200 species and, based on the pattern of the increase in the number of sampled species with sampled sections, projected that the ultimate number of species could be ‘2800 or more’. A figure of 2800 would imply that 79% of species have been sampled at least once, an estimate close to those derived herein in a completely different way.

To put the graptoloid figures in context, we consider several completeness estimates that have been developed for marine animals with data that are meant to be representative of the palaeontological record as a whole. Applying a form of mark-recapture analysis (Nichols & Pollock 1983) to *c*. 31 000 genus-level first and last appearances from the compendium of Sepkoski (2002), Foote (2003, fig. 5) estimated sampling probabilities that varied widely over time, but were in the neighbourhood of 0.5 per genus per stage (with stages averaging *c.* 7 myr). The estimates of Connolly & Miller (2001, fig. 2) for selected Ordovician groups were a little higher, between 0.6 and 0.8 per genus per 8-myr-long interval. Using a discrete-time variant of the method of our paper with Sepkoski's data, Foote & Sepkoski (1999) estimated sampling probabilities of *c.* 0.5 on average per genus per 5-myr-long interval. Crampton *et al.* (2006, table 2) analysed data on Cenozoic molluscs of New Zealand, using variants on mark-recapture analysis and the methods of this paper, to estimate that *c.* 32% of species were sampled at least once. With data from the Paleobiology Database (https://paleobiodb.org), which include multiple occurrences and therefore allow gaps in the record to be analysed, we find that, in aggregate, 45% of genera known to be present in a stage (because they are found both before and after it) are, in fact, sampled in that stage (see Foote *et al.* 2016 for description of data). Alroy (2008, 2014) developed an alternative approach to estimating sampling probabilities, which is meant, in part, to reduce the effects of spurious gaps in the record caused, for example, by taxonomic error. Applying the ‘three-timer' method of Alroy (2008, 2014) to the same data, we estimate an average sampling probability of 78% per genus per stage.

The foregoing estimates of the proportion of taxa sampled per time interval are likely to be inflated, insofar as they involve genera with multiple species and they integrate over several million years, longer than the characteristic lifetime of a graptoloid species. Despite this upwards bias, the completeness of this haphazard sample of fossil taxa is generally lower than that of the biostratigraphically important graptoloids.

Consider another group of organisms that has also been central in biostratigraphy and in macroevolutionary studies: Cenozoic mammals. Alroy (1992, 1994*a*, *b*, 1996, 2000; Alroy *et al.* 2000) used an algorithm, analogous to CONOP, to develop a time-scaled ordination of first and last appearances of North American mammal species. Applying the same methods that we applied to graptoloid ranges, and bearing in mind that completeness estimates pertain only to those species that lived in areas represented in the data of Alroy, we estimate that 88% of species have been sampled at least once, and that the median proportion of the duration preserved, of species with a non-zero range, is 77% (Table 8). These estimates are comparable with those we find for graptoloids. By contrast, estimates of extinction and sampling rates for Cretaceous mammals imply on the order of 10% of species sampled (Foote *et al.* 1999). The difference is not surprising in light of, for example, the larger ecological footprint of Cenozoic mammal species and the general loss of terrestrial records with increasing geological age. Moreover, the data of Alroy benefit from the aggregation of thousands of faunal lists into a single composite sequence, and, consistent with the goal of developing a resolved chronology useful for macroevolutionary analysis, they are also high-graded – for example, by eliminating taxa known from just a single faunal list.

Our approach has been to fit a set of parameters, describing the probability distribution of durations and the sampling rate, to an observed distribution of stratigraphic ranges. However, it is the derived quantities – the proportion of species sampled and the proportion of duration preserved – rather than the fitted parameters that are of primary interest. Consider the exponential model of durations (Fig. 17). The parameters (the extinction rate and the sampling rate) have uncertainties that co-vary, as shown by the orientation of the log-likelihood contours in Figure 17a. This covariance is intuitively reasonable. For example, a slight overestimate of extinction rate – that is, an underestimate of mean durations – is in effect compensated for by a corresponding overestimate of sampling rate. Given the relationship that

Our empirical estimates of completeness assume that the sampling rate is constant over time and among species. This assumption is unrealistic for graptoloids, as for any group. Boyle *et al.* (2017) documented the among-species variation in sampling proxies within graptoloids, and, if we take the number of local sections that contribute to the global composite as a sampling proxy, then we see that sampling also varies over time (Sadler *et al.* 2011, fig. 7), although major gaps are unlikely to be common given the nature of the graptolitic shale facies (Sadler *et al.* 2011).

The results of simulating temporal variation in sampling rate suggest a slight upwards bias in completeness estimates if the sampling rate varies cyclically over time, but in a way that is not synchronized with the origins and extinctions of species (Fig. 14). By contrast, a systematic variation in sampling among species and within the lifetimes of species may lead to larger biases (Figs 15 and 16). The biases from these last two sources work in opposite directions. Determining their relative importance empirically requires more detailed data on species occurrences that transcend the first and last appearance data at hand. Some previous studies suggest that the occurrence rate within the lifetimes of species generally deviates from the long-term average by <50% (Raia *et al.* 2006, fig. 4; Foote *et al.* 2007, fig. 2; Liow & Stenseth 2007, fig. 1), meaning that the largest bias shown in Figure 15 may be unrealistically high. Turning to among-species variation, if we tabulate the total number of sections in which each graptoloid species is found, this count has a mean of 4.9 and a standard deviation of 7.0. Thus the ratio of the standard deviation to the mean is a little higher than the largest value shown in Figure 16, but this ratio may well be an overestimate because the count of sections is integrated over the entire species duration. The very rough empirical scaling just laid out suggests a possible, fortuitous cancellation of biases, but it would be premature to draw firm conclusions.

Rather than simulate the magnitude of bias caused by fitting an overly simplistic model, it is also possible to fit a more complex, potentially more realistic, model. To begin to explore this possibility, we modified our maximum likelihood method to allow a log-normal variation in sampling rate among species (Wagner & Marcot 2013), while still keeping sampling constant within the history of each species. Admittedly, we hesitated before going down this rabbit hole, because we suspected that the data at hand, which consist only of stratigraphic ranges and include no information on multiple occurrences within individual species, would not provide adequate constraints for more parameter-rich models than those we have emphasized so far. Indeed, we find a very broad range of solutions that fit the data nearly equally well. Figure 18 illustrates the nature of the problem, with the exponential model of durations applied to Ordovician species. The log-likelihood is sensitive to the extinction rate (*μ*) and the location parameter (*μ*_{log}) of the log-normal distribution of sampling rates, but it is nearly insensitive to the shape parameter (*σ*_{log}) of this distribution. This is not to say that the among-species variance in sampling is, in fact, irrelevant (see Fig. 16), simply that we have insufficient data to constrain it. Turning to our derived measures of completeness, their maximum likelihood estimates (Fig. 18) are nearly indistinguishable from those that we obtain with the assumption of uniform sampling (Fig. 17).

Our sensitivity tests explore simple violations of model assumptions – in particular, the independent sampling of species and uniform sampling among species and over time – and the effects of these violations on completeness estimates. Further complications could also be explored, many of which are covered under the umbrella of ecology. Species may expand and contract in geographical range over their lifetimes, leading to true diachroneity of first and last appearances that transcends the sampling-based, false diachroneity in our models (Foote *et al.* 2018). Species also vary in their preferred habitats and the breadth of habitats they can tolerate (Cooper *et al.* 1991; Underwood 1993; Sheets *et al.* 2016). This aspect of ecology leads to a greater clustering of species occurrences than in our horizon-based sampling model, and to the non-random location of horizons within sections and shifting centres of occurrence as facies migrate (Holland 1995). Indeed, an alternative approach to estimating palaeontological completeness is to add empirically calibrated ecological tolerances and geologically controlled facies variation to simulated speciation and extinction histories such as those used herein (Holland & Patzkowsky 2002). This approach allows estimates of local as well as global completeness.

In conclusion, by estimating sampling rates of Ordovician–Silurian graptoloids, we find them to have one of the most complete, or most completely documented, records among groups of fossil organisms. We predict that other groups central to biostratigraphy – including conodonts, calcareous nannoplankton, diatoms, radiolarians, planktonic forams and ammonoids – will be found to have similarly complete global records if the methods developed herein are applied to them.

## Acknowledgements

J. Alroy kindly provided data on stratigraphic ranges of North American mammals. We thank the contributors of the Paleobiology Database, including: M. Aberhan, J. Alroy, D. Bottjer, M. Clapham, F. Fürsich, N. Heim, A. Hendy, S. Holland, L. Ivany, W. Kiessling, B. Kröger, A. McGowan, A. Miller, P. Novack-Gottshall, T. Olszewski, M. Patzkowsky, M. Uhen, L. Villier, and P. Wagner. For discussion and advice, we thank F. Ciesla, G. Hunt, D. Jablonski, S.A. Kurtz, S.M. Kidwell, C. Parins-Fukuchi, M. Przeworski, G.J. Slater, P.D. Smits, P.J. Wagner, J.D. Wall, S.C. Wang and M. Webster. We thank S.M. Holland and an anonymous reviewer for thoughtful comments on this paper. Computational resources were provided by the University of Chicago.

## Funding

This work was funded by the University of Chicago.

## Author contributions

MF: Conceptualization (lead), formal analysis (supporting), investigation (supporting), methodology (supporting), software (supporting), writing – original draft (lead) and writing – review and editing (lead). PMS: Data curation (equal), formal analysis (equal), investigation (equal), methodology (equal), software (lead), writing – original draft (equal) and writing – review and editing (equal). RAC: Data curation (equal), formal analysis (equal), investigation (equal), methodology (equal), writing – original draft (equal) and writing – review and editing (equal). JSC: Data curation (equal), investigation (equal), methodology (equal), writing – original draft (equal), writing – review and editing (equal).

*Scientific editing by Philip Donoghue*

- © 2019 The Author(s). Published by The Geological Society of London. All rights reserved