
JOHN AITCHISON

D-dimensional real space $R^D$. If we insist on a symmetric set of log ratios then we may take

$$z_i = \log\{x_i/g(x)\}, \quad i = 1, \ldots, D, \tag{10.3}$$

with inverse

$$x_i = \exp(z_i)/\{\exp(z_1) + \cdots + \exp(z_D)\}, \quad i = 1, \ldots, D, \tag{10.4}$$

where $g(x)$ is the geometric mean of the components of $x$.

This is a transformation between the unit simplex $S^d$ and the hyperplane $z_1 + \cdots + z_D = 0$ in $D$-dimensional real space $R^D$. The new constraint on the transformed composition is not a transfer of the so-called constant-sum constraint but a penalty for the insistence on a symmetric treatment of the components of the composition. It is linked to the use of the singular centered log ratio covariance matrix $\Gamma(x)$ at (6.8). In practice this singularity causes no interpretational or computational problem.
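The symmetric transformation and its inverse can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `clr` and `clr_inverse` and the example composition are illustrative assumptions, not notation from the text.

```python
import math

def clr(x):
    """Centered log ratios z_i = log(x_i / g(x)), with g(x) the geometric mean."""
    logs = [math.log(xi) for xi in x]
    mean_log = sum(logs) / len(logs)  # this is log g(x)
    return [li - mean_log for li in logs]

def clr_inverse(z):
    """Inverse map x_i = exp(z_i) / (exp(z_1) + ... + exp(z_D))."""
    exps = [math.exp(zi) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

x = [0.2, 0.3, 0.5]       # hypothetical 3-part composition summing to 1
z = clr(x)
x_back = clr_inverse(z)

print(sum(z))             # zero up to rounding: z lies on the hyperplane
print(x_back)             # recovers x up to rounding
```

The zero sum of the transformed vector is the constraint that makes the centered log ratio covariance matrix singular.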

There are essentially four steps in any log ratio analysis of compositional data.

(1) Reformulate the compositional problem in terms of log ratios of the components.

(2) Transform the compositional data set into compatible log ratio vectors.

(3) Since the log ratio vectors are in real space and free of the constant-sum constraint, simply apply the appropriate multivariate methodology associated with unconstrained vectors.

(4) Reinterpret the inference from the statistical analysis of the log ratios into terms of the compositions.
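The four steps above can be sketched on a very simple problem: finding a typical composition for a small data set. This is an illustration under stated assumptions, not a procedure taken from the text. The data are hypothetical, the symmetric (centered) log ratio is used for step (2), and the sample mean stands in for the "appropriate multivariate methodology" of step (3).

```python
import math

def clr(x):
    """Centered log ratios log(x_i / g(x))."""
    logs = [math.log(xi) for xi in x]
    mean_log = sum(logs) / len(logs)
    return [li - mean_log for li in logs]

def clr_inverse(z):
    """Map a log ratio vector back to a composition summing to 1."""
    exps = [math.exp(zi) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical data: four 3-part compositions, each row summing to 1.
data = [
    [0.10, 0.30, 0.60],
    [0.20, 0.25, 0.55],
    [0.15, 0.35, 0.50],
    [0.12, 0.28, 0.60],
]

# Steps (1)-(2): pose the question in terms of log ratios and transform
# every composition into a log ratio vector.
z_rows = [clr(row) for row in data]

# Step (3): apply an ordinary unconstrained method, here the sample mean.
n = len(z_rows)
z_mean = [sum(col) / n for col in zip(*z_rows)]

# Step (4): carry the result back to the simplex.
center = clr_inverse(z_mean)
print(center)
```

Under these assumptions the back-transformed mean coincides with the component-wise geometric means of the data, rescaled to sum to 1.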

A wide variety of compositional problems that can be studied through the above log ratio transformation techniques is described in Aitchison [A5]. These include tests of distributional form, log linear modeling to take account of experimental design and concomitant factors, tests of various forms of pseudo-independence, discriminant analysis, and log contrast principal component analysis.

Moreover, the link to the multivariate normal allows simple Bayesian analysis, including the use of predictive distributions. A question that often arises in the use of form (10.1) of the log ratio transformation is whether the inference is sensitive to the choice of divisor. Aitchison [A5] demonstrates that all these procedures are invariant under the group of permutations of the components, and so in particular under the choice of divisor.
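The insensitivity to the divisor can be illustrated numerically for one particular statistic. The sketch below rests on assumptions: form (10.1) is not reproduced in this section, so it is taken here to be the additive log ratio log(x_i / x_j) with one component x_j as divisor. The data are hypothetical, and the squared Mahalanobis distance is chosen only as a convenient example of a quantity used in multivariate procedures. This is not the demonstration given in [A5].

```python
import math

def alr(x, divisor):
    """Assumed form of (10.1): log(x_i / x_divisor) for every i except the divisor."""
    return [math.log(xi / x[divisor]) for i, xi in enumerate(x) if i != divisor]

def mahalanobis_sq(rows, target):
    """Squared Mahalanobis distance of `target` from the mean of 2-D `rows`."""
    n = len(rows)
    m = [sum(col) / n for col in zip(*rows)]
    # Entries of the 2 x 2 sample covariance matrix.
    s11 = sum((r[0] - m[0]) ** 2 for r in rows) / (n - 1)
    s22 = sum((r[1] - m[1]) ** 2 for r in rows) / (n - 1)
    s12 = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows) / (n - 1)
    det = s11 * s22 - s12 ** 2
    d0, d1 = target[0] - m[0], target[1] - m[1]
    # Quadratic form d' S^{-1} d written out for the 2 x 2 case.
    return (s22 * d0 * d0 - 2 * s12 * d0 * d1 + s11 * d1 * d1) / det

# Hypothetical data: five 3-part compositions, each row summing to 1.
data = [
    [0.10, 0.30, 0.60],
    [0.20, 0.25, 0.55],
    [0.15, 0.35, 0.50],
    [0.12, 0.28, 0.60],
    [0.25, 0.40, 0.35],
]

# Distance of the first composition from the mean, once per choice of divisor.
distances = [
    mahalanobis_sq([alr(row, j) for row in data], alr(data[0], j))
    for j in range(3)
]
print(distances)
```

The three values agree up to rounding because changing the divisor amounts to a nonsingular linear transformation of the log ratio vectors, which leaves this distance unchanged.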

Rather than reiterate these procedures we concentrate on some more recent developments.

11. Graphical display of compositional data

The biplot [G1], [G2] is a well-established graphical aid in other branches of statistical analysis. Its adaptation for compositional and probability statement data is simple and can prove a useful exploratory and expository tool. For the compositional data set (6.3) the biplot is based on a singular value decomposition of the doubly centered log ratio matrix $Z = [z_{ri}]$, where

$$z_{ri} = \log\{x_{ri}/g(x_r)\} - N^{-1} \sum_{r=1}^{N} \log\{x_{ri}/g(x_r)\}, \quad i = 1, \ldots, D, \quad r = 1, \ldots, N.$$
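The double centering and the decomposition can be sketched with NumPy. The simulated data set, its size (N = 10, D = 4), and the variable names are assumptions made for illustration; the data set (6.3) itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.lognormal(size=(10, 4))            # hypothetical N = 10, D = 4
X = raw / raw.sum(axis=1, keepdims=True)     # close each row to sum to 1

logX = np.log(X)
# Row centering: log{x_ri / g(x_r)} for each composition r.
clr_rows = logX - logX.mean(axis=1, keepdims=True)
# Column centering: subtract the mean over r for each component i.
Z = clr_rows - clr_rows.mean(axis=0, keepdims=True)

# Singular value decomposition; NumPy returns the singular values in
# descending order.
U, k, Vt = np.linalg.svd(Z, full_matrices=False)

print(np.allclose(Z.sum(axis=0), 0), np.allclose(Z.sum(axis=1), 0))
print(np.allclose(U @ np.diag(k) @ Vt, Z))
print(k)
```

Because every row of the centered matrix sums to zero, its columns are linearly dependent and the smallest of the D singular values is zero up to rounding.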

Let $Z = U \, \mathrm{diag}(k_1, \ldots, k_R) \, V^T$ be the singular value decomposition, where $R$ is the rank of $Z$, in practice usually $R = d$, and where the singular values $k_1, \ldots, k_R$ are in descending order of magnitude. The biplot