formatDistData.Rd
Convert individual-level distance data to the transect-level format required by distsamp or gdistsamp.
formatDistData(distData, distCol, transectNameCol, dist.breaks,
occasionCol, effortMatrix)
distData: data.frame where each row is a detected individual. Must have at least 2 columns: one for distances and one for transect names.
distCol: character, name of the column in distData that contains the distances. The distances should be numeric.
transectNameCol: character, name of the column in distData that contains the transect names. The transect column should be a factor.
dist.breaks: numeric vector of distance interval cutpoints. Length must equal J+1, where J is the number of distance intervals.
occasionCol: optional character. If transects were visited more than once, this can be used to format data for gdistsamp. It is the name of the column in distData that contains the occasion numbers. The occasion column should be a factor.
effortMatrix: optional matrix of 1s and 0s, M x T in size, that allows NAs to be inserted where the matrix equals 0, indicating that a survey was not completed. When not supplied, a matrix of all 1s is created, since it is assumed that all surveys were completed.
This function creates a site (M) by distance interval (J) response
matrix from a data.frame containing the detection distances for each
individual and the transect names. Alternatively, if each transect was
surveyed T times, the resulting matrix is M x JT, which is the format
required by gdistsamp; see unmarkedFrameGDS.
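The binning described above can be approximated in base R. The following is a minimal sketch, not the code used inside formatDistData: the data, the object names, and the assumption that the M x JT columns are grouped by occasion (all J intervals for occasion 1, then occasion 2) are illustrative only.

```r
# Hypothetical data: one row per detected individual.
dat <- data.frame(
  transect = factor(c("a", "a", "b", "b", "b"), levels = c("a", "b", "c")),
  distance = c(3, 12, 7, 9, 16),
  visit    = factor(c(1, 2, 1, 1, 2), levels = 1:2)
)
breaks <- c(0, 8, 10, 18)  # J + 1 = 4 cutpoints, so J = 3 intervals

# Single visit: bin the distances, then cross-tabulate transect by interval.
bins <- cut(dat$distance, breaks = breaks, include.lowest = TRUE)
y <- unclass(table(dat$transect, bins))  # M x J count matrix

# Repeated visits: build one M x J block per occasion and bind them
# side by side (assumed column order: occasion 1 first, then occasion 2).
yT <- do.call(cbind, lapply(levels(dat$visit), function(v) {
  keep <- dat$visit == v
  unclass(table(dat$transect[keep], bins[keep]))
}))

dim(y)   # M rows, J columns
dim(yT)  # M rows, J * T columns
```

Transect "c" has no detections but still appears as a row of zeros, because it is a level of the transect factor.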
An M x J or M x JT matrix containing the binned distance data. Transect names will become rownames and colnames will describe the distance intervals.
It is important that the factor containing transect names includes levels for all the transects surveyed, not just those with >=1 detection. Likewise, if transects were visited more than once, the factor containing the occasion numbers should include levels for all occasions. See the example for how to add levels to a factor.
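To see why the levels matter, the base R sketch below compares a factor built only from detections with one that declares every surveyed transect. The vectors are hypothetical, and passing levels to factor() is shown as an alternative to extending the levels afterwards.

```r
detections <- c("a", "a", "b", "d")     # transects with at least 1 detection
surveyed   <- c("a", "b", "c", "d", "e") # every transect that was surveyed

f_dropped <- factor(detections)                    # levels inferred from the data
f_full    <- factor(detections, levels = surveyed) # levels declared explicitly

table(f_dropped)  # transects "c" and "e" are missing entirely
table(f_full)     # "c" and "e" are present with counts of zero
```

Only the second factor carries the information that transects "c" and "e" were surveyed without any detections.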
# Create a data.frame containing distances of animals detected
# along 4 transects.
dat <- data.frame(transect=gl(4,5, labels=letters[1:4]),
distance=rpois(20, 10))
dat
#> transect distance
#> 1 a 5
#> 2 a 6
#> 3 a 9
#> 4 a 11
#> 5 a 13
#> 6 b 4
#> 7 b 7
#> 8 b 9
#> 9 b 8
#> 10 b 11
#> 11 c 16
#> 12 c 4
#> 13 c 11
#> 14 c 7
#> 15 c 9
#> 16 d 11
#> 17 d 7
#> 18 d 9
#> 19 d 11
#> 20 d 12
# Look at your transect names.
levels(dat$transect)
#> [1] "a" "b" "c" "d"
# Suppose that you also surveyed a transect named "e" where no animals were
# detected. You must add it to the levels of dat$transect
levels(dat$transect) <- c(levels(dat$transect), "e")
levels(dat$transect)
#> [1] "a" "b" "c" "d" "e"
# Distance cut points defining distance intervals
cp <- c(0, 8, 10, 12, 14, 18)
# Create formatted response matrix
yDat <- formatDistData(dat, "distance", "transect", cp)
yDat
#> [0,8] (8,10] (10,12] (12,14] (14,18]
#> a 2 1 1 1 0
#> b 3 1 1 0 0
#> c 2 1 1 0 1
#> d 1 1 3 0 0
#> e 0 0 0 0 0
# Now you could merge yDat with transect-level covariates and
# then use unmarkedFrameDS to prepare data for distsamp
## Example for data from multiple occasions
dat2 <- data.frame(distance=1:100, site=gl(5, 20),
visit=factor(rep(1:4, each=5)))
cutpt <- seq(0, 100, by=25)
y2 <- formatDistData(dat2, "distance", "site", cutpt, "visit")
umf <- unmarkedFrameGDS(y=y2, numPrimary=4, survey="point",
dist.breaks=cutpt, unitsIn="m")
## Example for data from multiple occasions with effortMatrix
dat3 <- data.frame(distance=1:100, site=gl(5, 20), visit=factor(rep(1:4, each=5)))
cutpt <- seq(0, 100, by=25)
# 5 sites x 4 occasions; a 0 marks a survey that was not completed
effortMatrix <- matrix(rbinom(20, 1, 0.8), nrow=5, ncol=4)
y3 <- formatDistData(dat3, "distance", "site", cutpt, "visit", effortMatrix)
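The effect of an effort matrix can be illustrated in base R. This is a rough sketch under stated assumptions rather than the function's internal code: the count matrix is a stand-in, the names nSites, nOcc, and nBins are hypothetical, and the columns are assumed to be grouped by occasion.

```r
nSites <- 3; nOcc <- 2; nBins <- 2

# Stand-in for a formatted M x JT count matrix (values are arbitrary).
y <- matrix(1:(nSites * nBins * nOcc), nrow = nSites)

# M x T effort matrix, filled column-wise: site 3 was missed on
# occasion 1 and site 2 was missed on occasion 2.
effort <- matrix(c(1, 1, 0,
                   1, 0, 1), nrow = nSites, ncol = nOcc)

# Repeat each occasion column once per distance interval so the mask
# has the same M x JT shape as y, then blank out the missed surveys.
mask <- effort[, rep(seq_len(nOcc), each = nBins), drop = FALSE]
y[mask == 0] <- NA
y
```

A missed survey becomes NA in every distance interval for that site and occasion, which distinguishes it from a completed survey with zero detections.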