Objective
In this lab, we will use a variety of approaches to visualize and analyze patterns of species composition and diversity between sites. These approaches include ordination, rank abundance diagrams, and diversity indices.
Methods
First, you will need to load and inspect the data file from the combined Tuesday and Wednesday labs.
data <- read.csv("insect.data.csv")
head(data)
Next, load the vegan package, which provides the community-ecology functions used in the analyses below. If it is not yet installed, run install.packages("vegan") first.
library(vegan)
Rank-abundance diagrams
Rank-abundance curves give a visual picture of diversity by plotting each taxon's abundance on the y-axis against its rank (from most to least abundant) on the x-axis. Your text has a discussion of rank-abundance diagrams that you can consult for further details.
In order to make a rank-abundance diagram (which we will do at the family level), we first need to determine the total number of individuals sampled for each family. We do this by cross tabulation using the function table() in R:
## Create a "crosstabulation table" of Site and Family
families.site <- table(data$Site,data$Family)
## Extract Family labels from two sites for plotting later
HMF.labels <- names(sort(families.site["HMF",], decreasing=TRUE))
MasonHill.labels <- names(sort(families.site["MasonHill",], decreasing=TRUE))
Be sure to enter families.site to visualize the summarized data.
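To see what table() is doing, here is a minimal sketch on a made-up data frame. The site and family values are invented for illustration and are not taken from insect.data.csv; only the column names Site and Family follow the lab data.

```r
# Toy data: one row per individual insect (values are invented for illustration)
toy <- data.frame(
  Site   = c("A", "A", "A", "B", "B", "B", "B"),
  Family = c("Baetidae", "Baetidae", "Perlidae",
             "Baetidae", "Perlidae", "Perlidae", "Perlidae")
)

# Cross tabulation: rows are sites, columns are families, cells are counts
toy.tab <- table(toy$Site, toy$Family)
toy.tab

# Counts for one site, sorted from most to least abundant
sort(toy.tab["A", ], decreasing = TRUE)
```

Each cell counts the rows of the raw data that share a given site and family, which is why the raw file needs one row per individual for this approach to work.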
Next, we can use the function rad.lognormal() to generate the rank-abundance data as well as a best-fit curve based on a lognormal distribution.
## Rank abundance diagram (rad) analysis
result.rad <- apply(families.site,1,rad.lognormal)
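The ranking step itself can be sketched in base R without vegan. The counts below are invented for illustration, and this sketch only orders and plots the data; it does not fit the lognormal curve that rad.lognormal() adds.

```r
# Invented counts for one site (one value per family)
counts <- c(Baetidae = 40, Perlidae = 12, Elmidae = 5,
            Tipulidae = 0, Simuliidae = 2)

# Drop families that were not observed (zeros cannot be shown on a log axis),
# then order from most to least abundant
observed <- counts[counts > 0]
ranked   <- sort(observed, decreasing = TRUE)
ranks    <- seq_along(ranked)   # rank 1 = most abundant family

# Rank-abundance diagrams usually show abundance on a log scale
plot(ranks, ranked, log = "y", type = "b", pch = 16,
     xlab = "Rank", ylab = "Abundance")
```

A steep drop from rank 1 to the later ranks indicates a community dominated by a few families; a shallow slope indicates a more even community.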
Finally we can create our rank-abundance diagrams.
# RAD for Mason Hill
par(mai=c(1.75,1,.25,.25)) # Increase margin size for labels
plot(result.rad$MasonHill, xaxt="n", xlab="")
axis(side=1, at=c(1:length(MasonHill.labels)), labels=MasonHill.labels, las=3) # Add labels
# RAD for HMF
par(mai=c(1.75,1,.25,.25)) # Increase margin size for labels
plot(result.rad$HMF, xaxt="n", xlab="")
axis(side=1, at=c(1:length(HMF.labels)), labels=HMF.labels, las=3) # Add labels
# RAD plot for both sites combined
par(mai=c(1,1,.25,.25)) # Return to a smaller bottom margin
plot(result.rad$MasonHill, pch=16, cex.lab=1.5) # MasonHill site
points(result.rad$HMF, pch=16, col="blue") # Add HMF data
lines(result.rad$HMF, col="blue") # Add HMF prediction
Ordination
We can also use ordination to link differences in community composition to differences in environmental variables.
Some definitions of ordination (from: http://ordination.okstate.edu/glossary.htm)
- "Ordination is the collective term for multivariate techniques that arrange sites along axes on the basis of data on species composition" (ter Braak 1987)
- "The term 'ordination' derives from early attempts to order a group of objects, for example in time or along an environmental gradient. Nowadays the term is used more generally and refers to an 'ordering' in any number of dimensions (preferably few) that approximates some pattern of response of the set of objects. The usual objective of ordination is to help generate hypotheses about the relationship between the species composition at a site and the underlying environmental gradients" (Digby and Kempton 1987)
The technique that we will use, canonical correspondence analysis (CCA), is a constrained ordination method widely used by community ecologists. For this analysis, we will work at the level of taxonomic order. As above, we use cross tabulation to summarize the raw data prior to analysis:
## Create a "cross tabulation table" of Site, Substrate, Habitat, and Order
orders <- table(paste(data$Site,data$Substrate,data$Habitat),data$Order)
You can enter orders to visualize these summarized data.
Finally we do the analysis:
## Extract variables from above
vars <- matrix(unlist(strsplit(row.names(orders), split=" ")),
               ncol=3, byrow=TRUE)
## Define substrate and habitat variables
substrate <- factor(vars[,2])
habitat <- factor(vars[,3])
## Plot of Canonical Correspondence Analysis
plot(cca(orders~substrate+habitat))
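The variable-extraction step can be checked on its own with a small sketch. The row names below are invented, written in the same "Site Substrate Habitat" pattern that paste() produces; the substrate and habitat values are assumptions for illustration only.

```r
# Invented row names in the "Site Substrate Habitat" format produced by paste()
toy.names <- c("HMF cobble riffle", "HMF sand pool", "MasonHill cobble riffle")

# strsplit() returns one character vector per row name; unlist() flattens them,
# and byrow=TRUE refills the matrix so each row name occupies one matrix row
toy.vars <- matrix(unlist(strsplit(toy.names, split = " ")),
                   ncol = 3, byrow = TRUE)
toy.vars

# Columns 2 and 3 become the explanatory factors
toy.substrate <- factor(toy.vars[, 2])
toy.habitat   <- factor(toy.vars[, 3])
levels(toy.substrate)
```

Note that this splitting only works if no site, substrate, or habitat value contains a space; a value such as "Mason Hill" would produce four pieces instead of three and misalign the matrix.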
Diversity indices
Finally, you may want to compare the overall diversity between sites. Diversity can be decomposed into richness (the number of species in a community) and evenness (how evenly individuals are distributed among those species). A number of diversity indices have been developed that integrate these two components. We will use the Shannon diversity index (also called the Shannon-Wiener or Shannon-Weaver index), calculated at the family level, to compare diversity between substrates and habitats. As above, the analysis begins with cross tabulation of the raw data:
## Create a "crosstabulation table" of Substrate, Habitat, and Family
families.env <- table(paste(data$Substrate,data$Habitat),data$Family)
## Number of families per substrate-habitat combination
Number <- specnumber(families.env)
## Shannon diversity index for each substrate-habitat combination
Diversity <- diversity(families.env, index="shannon")
## Table of results
rbind(Number, Diversity)
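To make the index less of a black box, the calculation can be sketched by hand in base R. The counts are invented, and shannon() is a hypothetical helper written for this illustration (it is not a vegan function); it uses the natural logarithm, which is also the default in diversity().

```r
# Invented family counts for two communities with the same richness (4 families)
even   <- c(25, 25, 25, 25)
uneven <- c(85, 5, 5, 5)

# Shannon index: H = -sum(p * log(p)), where p is each family's proportion
shannon <- function(counts) {
  counts <- counts[counts > 0]   # absent families contribute nothing
  p <- counts / sum(counts)
  -sum(p * log(p))               # natural log assumed here
}

shannon(even)     # equals log(4), the maximum possible for four families
shannon(uneven)   # lower, because one family dominates
```

Both communities have the same richness, so the difference in the index comes entirely from evenness.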
One issue with comparing a single index of diversity between sites is that it does not account for differences in sampling effort (or effectiveness). In particular, the number of species found tends to increase with the number of individuals sampled (this is analogous to the idea of a species-area curve that will be covered in class). To account for this sampling bias, we "rarefy" our samples to the lowest number of individuals collected at any site. In other words, if site A had 8 species among 100 individuals and site B had 4 species among 50 individuals, we generate the predicted number of species in site A for a sample of 50 individuals (with a standard error). This approach allows us to compare species diversity between sites and facilitates a statistical comparison:
## Minimum of total samples collected between sites
min.samples <- min(apply(families.env, 1, sum))
## Generate predicted diversity at common sample size between sites
rarefy.result <- rarefy(families.env, sample=min.samples, se=TRUE)
rarefy.mean <- rarefy.result[1,] # Extract mean
rarefy.se <- rarefy.result[2,] # Extract SE
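The idea behind rarefaction can be sketched in base R using the usual hypergeometric calculation: for each family, work out the chance that it is missed entirely in a random subsample, then add up the chances that each family is present. The counts are invented, and rarefy.by.hand() is a hypothetical helper for illustration only; it returns just the expected richness, without the standard error that rarefy() also reports.

```r
# Expected number of families in a random subsample of n individuals
# (rarefy.by.hand is a hypothetical helper, not part of vegan)
rarefy.by.hand <- function(counts, n) {
  counts <- counts[counts > 0]
  N <- sum(counts)
  stopifnot(n <= N)
  # Probability that family i is absent from the subsample:
  # choose(N - Ni, n) / choose(N, n), computed on the log scale
  p.absent <- exp(lchoose(N - counts, n) - lchoose(N, n))
  sum(1 - p.absent)
}

site.A <- c(40, 30, 10, 8, 5, 3, 2, 2)   # invented: 8 families, 100 individuals
rarefy.by.hand(site.A, 100)   # full sample: all 8 families
rarefy.by.hand(site.A, 50)    # fewer families expected among 50 individuals
```

Rare families are the ones most likely to be missed in the smaller subsample, which is why the expected richness drops below 8.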
Finally, we can use the function grasshopper.plot from the Grasshopper Lab to visualize the results. Note that you will need to copy and paste the function prior to using it.
grasshopper.plot(rarefy.mean,rarefy.se) # Plot results
Questions
Use the analyses above to address the following questions:
- Is there a difference among the habitats in the kind of insects found?
- What is the effect of substrate?
- What is the effect of water speed?
- What is the effect of watershed?
- Which community(ies) appear most diverse? Least diverse? Explain your answer.
- What environmental factor seems most important in supporting high diversity of aquatic insects?
| Attachment | Size |
|---|---|
| insect.data.csv | 11.92 KB |

