Abstract--A new method of finding the optimal group membership and number of groupings to partition population genetic distance data is presented. The software program Partitioning Optimization with Restricted Growth Strings (PORGS) visits all possible set partitions and deems acceptable those partitions that reduce mean intracluster distance. The optimal number of groups is determined with the gap statistic, which compares PORGS results with a reference distribution. The PORGS method was validated with a simulated data set with a known distribution. For efficiency, where values of n were larger, restricted growth strings (RGS) were used to bipartition populations during a nested search (bi-PORGS). Bi-PORGS was applied to a set of genetic data from 18 Chinook salmon (Oncorhynchus tshawytscha) populations from the west coast of Vancouver Island. The optimal grouping of these populations corresponded to four geographic locations: 1) Quatsino Sound, 2) Nootka Sound, 3) Clayoquot + Barkley sounds, and 4) southwest Vancouver Island. However, assignment of populations to groups did not strictly reflect the geographical divisions: fish of Barkley Sound origin had strayed into the Gold River, and transferred populations remained genetically close to their donor populations, so some groupings crossed geographic boundaries. Overall, stock structure determined by this partitioning method was similar to that determined by the unweighted pair-group method with arithmetic averages (UPGMA), an agglomerative clustering algorithm.
**********
Genetic diversity in salmon species is thought to be maintained through high homing fidelity, which limits gene flow between spawning sites (Ricker, 1972; Quinn and Dittman, 1990). As a general rule, populations that are geographically close tend to be genetically similar, creating natural clusters of similar populations. Identification of genetically similar salmonid populations is important for fisheries management initiatives directed at conserving genetic diversity (Riddell, 1993; Waples et al., 2001). Consequently, managers are faced with the challenge of defining the number and size of these genetic groups. Furthermore, determining valid groupings of populations at a fine scale allows managers to make informed decisions regarding harvest levels and population-enhancement strategies. For British Columbia Chinook salmon (Oncorhynchus tshawytscha) populations, genetic markers have been used to determine genetic distance between populations and to provide considerable power for defining regional stock structure (Teel et al., 2000; Beacham et al., 2006a).
Clustering, or grouping, of data is useful in many disciplines; as a result, there is a wide assortment of methods available for representing data, measuring proximity between data elements, and grouping elements (e.g., Jain et al., 1999). For Pacific salmon, population-specific allelic frequencies are ascertained from spawning ground samples by using genetic markers at a number of loci. From these allelic frequencies, a metric of overall genetic difference between populations is used to estimate pair-wise genetic distances. Three commonly used distance measures are Nei's distance, $D_s$ (Nei, 1987); Nei's modified Cavalli-Sforza chord distance, $D_A$ (Cavalli-Sforza and Edwards, 1967; Nei et al., 1983); and Weir and Cockerham's (1984) estimator of $F_{st}$, the coancestry coefficient $\theta$. Once a distance measure is selected, a proximity matrix is created that shows the genetic distance between each pair of populations.
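As an illustration of how a proximity matrix can be built from allelic frequencies, the following minimal sketch computes the chord distance of Nei et al. (1983) as it is usually written, $D_A = 1 - \frac{1}{L}\sum_{l=1}^{L}\sum_{i}\sqrt{x_{il}\,y_{il}}$, where $x_{il}$ and $y_{il}$ are the frequencies of allele $i$ at locus $l$ in the two populations and $L$ is the number of loci. The function names, the input layout, and the three toy populations are assumptions made for this example; they are not taken from the study's data or software.

```python
from math import sqrt


def nei_da(freqs_x, freqs_y):
    """Chord distance D_A between two populations.

    Each argument is a list with one entry per locus; each entry is a list
    of allele frequencies, with alleles in the same order for both
    populations (an assumed input layout).
    """
    n_loci = len(freqs_x)
    total = 0.0
    for locus_x, locus_y in zip(freqs_x, freqs_y):
        total += sum(sqrt(x * y) for x, y in zip(locus_x, locus_y))
    return 1.0 - total / n_loci


def proximity_matrix(populations):
    """Return (names, matrix) with the pairwise D_A for every population pair."""
    names = sorted(populations)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = nei_da(populations[names[i]], populations[names[j]])
            matrix[i][j] = matrix[j][i] = d  # distances are symmetric
    return names, matrix


# Hypothetical frequencies: two loci, two alleles each (illustration only).
populations = {
    "pop_a": [[0.5, 0.5], [0.2, 0.8]],
    "pop_b": [[0.5, 0.5], [0.2, 0.8]],
    "pop_c": [[1.0, 0.0], [0.0, 1.0]],
}
names, matrix = proximity_matrix(populations)
```

In this toy input, pop_a and pop_b have identical frequencies, so their distance is zero up to rounding, whereas pop_c, which is fixed for one allele at each locus, lies at a positive distance from both.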
Clustering is often used to group populations, either by merging small clusters into larger ones (agglomerative) or by splitting larger clusters into smaller ones (divisive). A number of algorithms are available to decide which small clusters are merged or which larger clusters are split (e.g., Swofford et al., 1996; Jain et al., 1999). Groupings can be depicted as a branching tree or dendrogram where branch length is scaled to represent genetic distance. A drawback with the hierarchical approach is that the result is sensitive to initial groupings, which are not permitted to change once an assignment has been made. Furthermore, arbitrary tie-breaking actions, either in the original proximity data or during agglomeration, can cause instability in the tree structure (van der Kloot et al., 2005). Consensus from multiple tree constructions by bootstrapping across loci provides a measure of robustness of the apparent dominant tree structure (Felsenstein, 1985). A majority-rule consensus tree can provide a phylogeny with groups that occur in a majority of the bootstrap samples. However, the incorporation of variation from consensus trees appears to have limited quantitative application, and the optimum cluster number is not obvious.
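The agglomerative approach described above can be sketched with UPGMA, in which the distance between two clusters is the arithmetic mean of all pairwise distances between their members. The code below is a minimal pure-Python illustration under stated assumptions: the function name, the return format, and the four-population distance matrix are hypothetical. Ties are broken by dictionary order, which is exactly the kind of arbitrary choice that can make tree structure unstable.

```python
def upgma(names, dist):
    """Return the merge history as (cluster_a, cluster_b, average_distance)."""
    clusters = {i: (name,) for i, name in enumerate(names)}
    # d[(i, j)] with i < j holds the current average distance between clusters.
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair; ties fall to whichever pair comes first.
        (a, b), height = min(d.items(), key=lambda item: item[1])
        size_a, size_b = len(clusters[a]), len(clusters[b])
        merges.append((clusters[a], clusters[b], height))
        # Size-weighted average keeps each original population equally weighted.
        new_d = {}
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop every distance that involved the two merged clusters.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        merged = clusters.pop(a) + clusters.pop(b)
        clusters[next_id] = merged
        next_id += 1
    return merges


# Hypothetical symmetric distance matrix for four populations.
names = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 1.0],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 1.0, 0.2, 0.0],
]
merges = upgma(names, dist)
```

Once A and B are joined in the first step, that assignment is never revisited, which illustrates the sensitivity to initial groupings noted above.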
[FIGURE 1 OMITTED]
This article provides a new method for partitioning genetic distance data by finding the optimal group membership and number of groupings. We validate the method using simulated data. To demonstrate the utility of this partition method, we applied it to genetic distance data calculated from samples taken from 18 Chinook salmon populations along the west coast of Vancouver Island, British Columbia (Fig. 1). The groupings determined by this method were evaluated with respect to known transfers of broodstock and histories of stock enhancement. Furthermore, results from both the simulated and Chinook salmon data sets were compared to results from a commonly used clustering method for genetic data.
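To make the idea of visiting every set partition concrete, the sketch below enumerates restricted growth strings and scores each partition by a mean intracluster distance. A restricted growth string assigns element 0 to group 0 and lets each later element take a label no larger than one more than the largest label used so far, so each set partition is generated exactly once. This is not the PORGS program. The scoring rule (the mean of all within-group pairwise distances, pooled over groups, with singletons contributing no pairs) is an assumption made for this example, and the gap statistic is not implemented. All function names and the toy matrix are hypothetical.

```python
def restricted_growth_strings(n):
    """Yield every restricted growth string of length n as a tuple."""
    def extend(prefix, current_max):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        # The next label may reuse an existing group or open one new group.
        for label in range(current_max + 2):
            prefix.append(label)
            yield from extend(prefix, max(current_max, label))
            prefix.pop()

    if n > 0:
        yield from extend([0], 0)


def mean_intracluster_distance(rgs, dist):
    """Mean distance over all pairs sharing a group label (assumed definition)."""
    total, pairs = 0.0, 0
    for i in range(len(rgs)):
        for j in range(i + 1, len(rgs)):
            if rgs[i] == rgs[j]:
                total += dist[i][j]
                pairs += 1
    return total / pairs if pairs else 0.0


def best_partition_per_k(dist):
    """For each number of groups k, keep the partition with the lowest score."""
    best = {}
    for rgs in restricted_growth_strings(len(dist)):
        k = max(rgs) + 1
        score = mean_intracluster_distance(rgs, dist)
        if k not in best or score < best[k][0]:
            best[k] = (score, rgs)
    return best


# Hypothetical symmetric distance matrix for four populations.
dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 1.0],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 1.0, 0.2, 0.0],
]
n_partitions = sum(1 for _ in restricted_growth_strings(len(dist)))
best = best_partition_per_k(dist)
```

For four elements the enumeration visits 15 partitions (the Bell number for n = 4). The number of partitions grows very quickly with n, which is consistent with the abstract's use of a nested bipartitioning search for larger n. Under the assumed scoring rule, the best score also falls steadily as the number of groups rises, reaching zero when every population is its own group, so a separate criterion such as the gap statistic is needed to choose the number of groups.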