As simple as it may seem, sample data generation is not a trivial task, especially when random landmarks are to be generated. Usually, one would use multivariate normal distribution-based generator (like mvrnorm in R`s MASS package) in order to generate correlated data. The following function was used to generate data similar to the real world datased, unfortunately based directly on it, by using the coefficients from regressions between successive columns in a data matrix, which represent landmark coordinates in XY data matrix. The following function uses this data matrix (446 individuals and 28 landmarks), and performs regressions between X-Y pairs for all coordinates. Finally it uses intercepts and slopes to infer mean and SD for rnorm function, random number generator.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
When the function finishes the output is also an XY matrix, which needs to be converted to an array in order to use it in gpagen function from the geomorph package. After that the procedure follows all the usual steps of the GM analysis, with the exception of factor levels generation in order to simulate grouping, and finally performing PCA on the Procrustes shape variables.
1 2 3 4 5 6 7 8 9 10
PCA is then done using the usual R`s prcomp function and ggplot2 for plotting the data points using fantastic ColorBrewer color schemes (which are the names of types and palettes in ggplot2 scale_color_brewer geom). In order to fine-tune the PCA figure, the ggplot2 can also use custom fonts for plot annotation. Prior to that, fonts must be imported and registered, which is greatly facilitated by using the extrafont library.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
The plot indicates very little differentiation between the populations, but I guess that`s well expected since so much randomness is at hand.