SIMILARITIES BETWEEN INFORMATION GEOMETRY AND EUCLIDEAN GEOMETRY

Dually flat spaces play a key role in the differential geometrical approach to statistics (information geometry), and many divergences have been studied as quantities that measure the discrepancy between two probability distributions. In a dually flat space, a canonical divergence is naturally introduced, and it satisfies a relational expression called the triangular relation. This can be regarded as a generalization of the law of cosines in Euclidean space. We introduce a new divergence corresponding to the squared distance in a dually flat space. For this divergence, we show that further relational equations and theorems similar to those of Euclidean space hold, and we study its relationship with other divergences.


Introduction
Given a probability distribution $p_\theta(x)$, we consider a manifold M with the parameter vector $\theta = (\theta^1, \theta^2, \dots, \theta^n) \in \mathbb{R}^n$ as a coordinate system. Manifolds whose elements are probability distributions are called statistical manifolds, and dually flat spaces are an important class of them [1]. In a dually flat space, it is possible to introduce a dual coordinate system $\eta = (\eta_1, \eta_2, \dots, \eta_n)$ and convex functions $\psi(\theta)$ and $\phi(\eta)$ called potentials.
For example, the manifold of the exponential family $p_\theta(x) = \exp\big(C(x) + \sum_{i=1}^{n} \theta^i F_i(x) - \psi(\theta)\big)$ is a dually flat space with coordinates θ and $\eta = E[F(x)]$, where x is a random variable, $F_i(x)$ and $C(x)$ are known functions, and $E[\cdot]$ denotes the expected value. Similarly, the manifold of the mixture family $p_\eta(x) = p_0(x) + \sum_i \eta_i (p_i(x) - p_0(x))$ is a dually flat space, where $p_i(x)$ $(i = 0, 1, 2, \dots, n)$ are probability distributions.
Given two probability distributions, the Kullback-Leibler divergence (KL-divergence) has long been used as a quantity that measures the discrepancy between them. In a dually flat space, we may define a quantity called the canonical divergence, and in exponential families the canonical divergence is consistent with the KL-divergence [1][2][3][4]. The canonical divergence satisfies relational expressions called triangular relations for three points P, Q, R ∈ M. These can be regarded as a generalization of the law of cosines in Euclidean space. When a curve PQ and a curve QR are "orthogonal", the triangular relation takes the same form as the Pythagorean theorem [1]. This generalized Pythagorean theorem plays an important role in the projection theorem.
The present paper aims to study properties of dually flat spaces that are similar to those of Euclidean space. We introduce a new divergence called the "affine divergence", which can be expressed as an inner product of the θ-coordinates and the dual η-coordinates. Like the canonical divergence, the affine divergence satisfies a triangular relational expression.
We study the behavior of the affine divergence on θ- and η-geodesics and show that the affine divergence satisfies the same summation formula as the squared Euclidean distance. We further show that the generalized parallelogram law and the generalized polarization identity hold, as in Euclidean space, for the sum of vectors in the θ- and η-coordinate systems.
Using the sum of vectors of the affine coordinates, we introduce new dual divergences called the ψ- and ϕ-divergences. These divergences can be represented by the potentials alone. The ϕ-divergence is consistent with the Jensen-Shannon divergence for mixture families. We derive inequalities that hold between the affine divergence and the ψ- and ϕ-divergences. Table 1 summarizes the properties of the divergences.

Canonical divergence
Let M be a dually flat space. We may define dual affine connections ∇, ∇* and a Riemannian metric $g_{ij}$ on M. There exist dual affine coordinate systems θ, η and dual convex functions ψ(θ), ϕ(η) on M. The affine coordinate system θ corresponds to the connection ∇, and η corresponds to the connection ∇*. The potentials ψ(θ) and ϕ(η) are related to each other by the Legendre transformation.
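Throughout, we use the Einstein summation convention. The relationships between the Riemannian metric and the affine coordinates are
$$g_{ij} = \frac{\partial \eta_i}{\partial \theta^j}, \qquad (4)$$
$$g^{ij} = \frac{\partial \theta^i}{\partial \eta_j}. \qquad (5)$$
For two points P, Q ∈ M, we may define a divergence D(P‖Q), called the canonical divergence, as in [1,2].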
The canonical divergence is not symmetric with respect to P and Q.
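We denote ψ(θ(P)) as ψ(P) and ϕ(η(P)) as ϕ(P). For the exponential family $p_\theta(x) = \exp(C(x) + \theta^i F_i(x) - \psi(\theta))$, the Riemannian metric can be expressed as
$$g_{ij} = E[\partial_i \ln p_\theta(x)\, \partial_j \ln p_\theta(x)], \qquad (7)$$
where $\partial_i = \partial/\partial\theta^i$. The right-hand side of (7) is the Fisher information matrix. The canonical divergence is consistent with the KL-divergence. The KL-divergence is
$$D_{KL}(p\|q) = \sum_x p(x) \ln\frac{p(x)}{q(x)}$$
for discrete distributions, and
$$D_{KL}(p\|q) = \int p(x) \ln\frac{p(x)}{q(x)}\, dx$$
for continuous distributions.
The triangular relation (11) plays an important role in this paper. When the dual geodesic connecting P and Q is orthogonal at Q to the dual geodesic connecting Q and R, the generalized Pythagorean theorem holds.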
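The triangular relation can be checked numerically. The following Python sketch is an illustration under stated assumptions, not part of the original derivation: it uses a three-outcome categorical distribution with natural coordinates θ^i = ln(p_i/p_0) and expectation coordinates η_i = p_i, and it lets the KL-divergence stand in for the canonical divergence. The orientation of the arguments on the right-hand side is an assumption, since conventions for the canonical divergence differ between references. The helper names (`kl`, `theta`, `eta`) are hypothetical.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def theta(p):
    """Natural coordinates of a categorical distribution: ln(p_i / p_0), i >= 1."""
    return [math.log(pi / p[0]) for pi in p[1:]]

def eta(p):
    """Expectation coordinates of a categorical distribution: p_i, i >= 1."""
    return list(p[1:])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

# Three arbitrary distributions on {0, 1, 2} (illustrative values).
p = [0.2, 0.5, 0.3]
q = [0.6, 0.1, 0.3]
r = [0.3, 0.3, 0.4]

# Law-of-cosines form: D(p||q) + D(q||r) - D(p||r) equals an inner product
# of coordinate differences taken at the middle point q.
lhs = kl(p, q) + kl(q, r) - kl(p, r)
rhs = dot(sub(eta(p), eta(q)), sub(theta(r), theta(q)))
print(lhs, rhs)
```

When the inner product on the right-hand side vanishes, the relation reduces to the Pythagorean form D(p‖r) = D(p‖q) + D(q‖r).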
Because the coordinates $\theta \equiv \{\theta^i\}$ and $\eta \equiv \{\eta_i\}$ are affine, a curve through a point P ∈ M of the form
$$\theta^i(t) = a^i t + \theta^i(P)$$
is a geodesic for the ∇-connection, and a curve of the form
$$\eta_i(t) = a_i t + \eta_i(P)$$
is a geodesic for the ∇*-connection, where $\{a^i\}$ and $\{a_i\}$ are constant vectors in $\mathbb{R}^n$ and t ∈ R is a parameter along the geodesic.
Property 4. For the point Q(t) on the θ-geodesic $\theta^i(Q(t)) = a^i t + \theta^i(P)$, the following equation holds.

Affine divergence
Definition of the affine divergence.
In this section we introduce a new divergence called the "affine divergence", defined as an inner product of dual affine coordinates. First, we show that the affine divergence satisfies the three distance axioms other than the triangle inequality, and that it has properties on geodesics similar to those of Euclidean space.
Second, we prove the generalized parallelogram law and the generalized polarization identity.
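Definition 2. We define the affine divergence $D_A : M \times M \to \mathbb{R}$ as follows.
$$D_A(P, Q) \equiv (\theta^i(P) - \theta^i(Q))(\eta_i(P) - \eta_i(Q)). \qquad (19)$$
The affine divergence is an inner product of dual affine coordinates. In a self-dual space ($\theta^i = \eta_i$ for all i), the affine divergence is consistent with the squared Euclidean distance $\sum_i (\theta^i(P) - \theta^i(Q))^2$. The affine divergence can be expressed as the sum of the canonical divergences:
$$D_A(P, Q) = D(P\|Q) + D(Q\|P).$$
Proof. The result follows by setting R = P in the triangular relation (11) and using (19).
For the exponential family, the affine divergence can be expressed as follows.
$$D_A(P, Q) = D_{KL}(p\|q) + D_{KL}(q\|p),$$
where p and q are the distributions corresponding to P and Q. The right-hand side of this equation is the Jeffreys divergence [5,7].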
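As a numerical illustration of this consistency, the sketch below compares the inner-product form of the affine divergence with the Jeffreys divergence. It assumes a three-outcome categorical distribution written as an exponential family with θ^i = ln(p_i/p_0) and η_i = p_i; the function names are hypothetical and the probabilities are arbitrary.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def affine_divergence(p, q):
    """(theta(P) - theta(Q)) . (eta(P) - eta(Q)) for categorical distributions.

    Assumed coordinates: theta^i = ln(p_i / p_0), eta_i = p_i for i >= 1.
    """
    total = 0.0
    for i in range(1, len(p)):
        d_theta = math.log(p[i] / p[0]) - math.log(q[i] / q[0])
        d_eta = p[i] - q[i]
        total += d_theta * d_eta
    return total

p = [0.2, 0.5, 0.3]
q = [0.6, 0.1, 0.3]

d_affine = affine_divergence(p, q)
d_jeffreys = kl(p, q) + kl(q, p)
print(d_affine, d_jeffreys)
```

Both quantities are symmetric in p and q, which is one of the distance axioms the affine divergence satisfies.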

Properties along geodesics.
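We show that the canonical divergence and the affine divergence increase monotonically along θ- and η-geodesics. For three points P, Q, R ∈ M on the same geodesic, we show that the same relational expression as in Euclidean space holds.
Proposition 3. Let $\{a^i\}$ and $\{a_i\}$ be constant vectors in $\mathbb{R}^n$ and let t ∈ R. When points P, Q ∈ M are on a θ-geodesic, $\theta^i(Q(t)) = a^i t + \theta^i(P)$, the affine divergence can be expressed as
$$D_A(P, Q(T)) = T \int_0^T g_{ij}\, a^i a^j\, dt. \qquad (25)$$
When points P, Q ∈ M are on an η-geodesic, $\eta_i(Q(t)) = a_i t + \eta_i(P)$, the affine divergence can be expressed as
$$D_A(P, Q(T)) = T \int_0^T g^{ij}\, a_i a_j\, dt. \qquad (26)$$
Because $g_{ij}$ and $g^{ij}$ are positive definite, the affine divergence increases monotonically with T.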
Proof. We prove the θ case; the η case is analogous. Combining $d\theta^i(Q(t))/dt = a^i$ with (4) yields
$$\frac{d\eta_i(Q(t))}{dt} = g_{ij}\, a^j. \qquad (27)$$
Substituting (27) into (19), we have the result. Equation (25) corresponds to Property 4 of the canonical divergence.

Corollary 1.
Let $\{a^i\}$ be a constant vector in $\mathbb{R}^n$ and let t ∈ R. When points P, Q ∈ M are on a θ-geodesic, $\theta^i(Q(t)) = a^i t + \theta^i(P)$, the canonical divergence D(P‖Q(T)) can be expressed as an integral involving $g_{ij}\, a^i a^j$. Because $g_{ij}$ is positive definite, the canonical divergence D(P‖Q(T)) increases monotonically with T.
Proof. Combining the definition of the canonical divergence (6) with (2) yields (30). Substituting (25) into the resulting equation and integrating with respect to t, we have the result.
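Lemma 1. Let t ∈ R and let P, R be points in a dually flat space M. For a point Q on the θ-geodesic or the η-geodesic through P and R with parameter t, relation (32) holds.
Proof. We prove the θ-geodesic case; the η-geodesic case is analogous. The triangular relation (11) gives (34), and the assumption on Q gives (35). Substituting (35) into (34), the result follows.
When t ∈ (0, 1), taking into account $D_A(R, Q) \ge 0$, relation (32) yields an inequality among the canonical divergences of P, Q and R.
Corollary 2. Let t ∈ R and let P, R be points in a dually flat space M. For a point Q on the same geodesic as in Lemma 1, relation (37) holds.
Proof. Exchanging P and R in (34), we can prove Corollary 2 in the same way as Lemma 1.
Theorem 1. Let t ∈ R and let P, R be points in a dually flat space M. For a point Q on the θ-geodesic or the η-geodesic through P and R with parameter t, the affine divergences among P, Q and R satisfy the same summation formula as the squared Euclidean distance.
Proof. Taking the sum of (32) and (37), the result follows.
The formula of Theorem 1 also holds for the squared Euclidean distance between points P, Q, R on the same straight line.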
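To make the summation formula concrete, the sketch below checks one form of it that follows directly from the bilinearity of the affine divergence. The parametrization θ(Q) = (1 − t)θ(P) + tθ(R) and the resulting identity (1 − t) D_A(P, Q) + t D_A(Q, R) = t(1 − t) D_A(P, R) are assumptions made for this illustration and may be arranged differently from the statement of Theorem 1. In Euclidean space the same identity holds with D_A replaced by the squared distance, since then D_A(P, Q) = t² d² and D_A(Q, R) = (1 − t)² d². The categorical family and all function names are hypothetical choices.

```python
import math

def from_theta(theta):
    """Categorical probabilities (p_0, ..., p_n) from natural coordinates."""
    z = 1.0 + sum(math.exp(x) for x in theta)
    return [1.0 / z] + [math.exp(x) / z for x in theta]

def affine_divergence(theta_p, theta_q):
    """Inner product of coordinate differences, with eta_i = p_i for i >= 1."""
    eta_p = from_theta(theta_p)[1:]
    eta_q = from_theta(theta_q)[1:]
    return sum((tp - tq) * (ep - eq)
               for tp, tq, ep, eq in zip(theta_p, theta_q, eta_p, eta_q))

theta_p = [0.4, -1.0]   # illustrative end points of the theta-geodesic
theta_r = [-0.7, 1.5]
t = 0.3

# Q lies on the theta-geodesic through P and R (assumed parametrization).
theta_q = [(1 - t) * a + t * b for a, b in zip(theta_p, theta_r)]

lhs = ((1 - t) * affine_divergence(theta_p, theta_q)
       + t * affine_divergence(theta_q, theta_r))
rhs = t * (1 - t) * affine_divergence(theta_p, theta_r)
print(lhs, rhs)
```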

Generalized expansion formula for the sum of vectors, parallelogram law, polarization identity.
For the affine divergence, we show that the generalized law of cosines holds, as it does for the canonical divergence. Considering the sum of vectors in the θ- or η-coordinates, we show that the generalized expansion formula for the sum of vectors, the parallelogram law and the polarization identity hold for the affine divergence.
Definition 3. We define a bracket ⟨ , ⟩ on pairs of points of M. The symbol {Q ↔ R} denotes the preceding terms with Q and R exchanged.
For a self-dual space ($\theta^i = \eta_i$ for all i), this is consistent with the dot product.
We now use the assumption θ(P) + θ(R) = θ(Q) + θ(S) again. In the same way, we obtain the following relational equation.
Substituting (47) and (48) into (46), and using (40), we prove that (43) holds. Because the assumption is symmetric with respect to P and R, exchanging P and R in (43) proves that (44) holds. Theorem 2 is a generalization of the expansion formula for the squared norm of the sum of two vectors x and y,
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2 + 2(x, y),$$
where ( , ) is the dot product and ‖·‖ is the Euclidean norm.
In the generalized parallelogram law (Theorem 3), the left-hand side is the sum over the four sides of the parallelogram PQRS and the right-hand side is the sum over its diagonals.
Proof. Applying Corollary 3 to the points Q, R and S, we have a relation involving 2⟨Q, S⟩.
Proof. By using (44) and (51), the result follows.
Corollary 4 is a generalization of the polarization identity.
Corollary 5 is proved by taking the sum of (53) and the equation obtained from (53) by an exchange of points. It is a generalization of the fact that the sum of adjacent interior angles of a parallelogram in Euclidean space is π.
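The parallelogram law can be illustrated numerically. The sketch below assumes the condition θ(P) + θ(R) = θ(Q) + θ(S) used above and checks that the affine divergences along the four sides PQ, QR, RS, SP add up to the affine divergences along the two diagonals PR and QS. This explicit form is derived here from the inner-product definition of the affine divergence and is an assumption about how Theorem 3 is written; the categorical family and the function names are hypothetical.

```python
import math

def from_theta(theta):
    """Categorical probabilities (p_0, ..., p_n) from natural coordinates."""
    z = 1.0 + sum(math.exp(x) for x in theta)
    return [1.0 / z] + [math.exp(x) / z for x in theta]

def affine_divergence(theta_p, theta_q):
    """Inner product of coordinate differences, with eta_i = p_i for i >= 1."""
    eta_p = from_theta(theta_p)[1:]
    eta_q = from_theta(theta_q)[1:]
    return sum((tp - tq) * (ep - eq)
               for tp, tq, ep, eq in zip(theta_p, theta_q, eta_p, eta_q))

P = [0.4, -1.0]   # illustrative theta-coordinates
Q = [1.2, 0.3]
R = [-0.7, 1.5]
# S closes the parallelogram in theta-coordinates: theta(P) + theta(R) = theta(Q) + theta(S).
S = [p + r - q for p, q, r in zip(P, Q, R)]

sides = (affine_divergence(P, Q) + affine_divergence(Q, R)
         + affine_divergence(R, S) + affine_divergence(S, P))
diagonals = affine_divergence(P, R) + affine_divergence(Q, S)
print(sides, diagonals)
```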

ψ and ϕ-divergences
In this section, we introduce new dual divergences by using the sum of vectors of affine coordinates. We call these divergences the "ψ-divergence" and the "ϕ-divergence". The ψ- and ϕ-divergences can be expressed by the potential functions. We show that the ψ- and ϕ-divergences are a kind of skew Jensen divergence [7] and that the ϕ-divergence is consistent with the (skew) Jensen-Shannon divergence (JS-divergence) for mixture families. Next, we derive inequalities that hold between the ψ- or ϕ-divergence and the affine divergence, and show that the inequality is consistent with the generalized Lin's inequality for probability distributions of a discrete random variable X over $\mathcal{X} = \{0, 1, 2, \dots, n\}$. For probability distributions p and q, Lin's inequality [6] is
$$D_{JS}(p, q) \le \frac{1}{4} D_J(p, q),$$
where $D_{JS}$ is the JS-divergence and $D_J(p, q) \equiv D_{KL}(p\|q) + D_{KL}(q\|p)$ is the Jeffreys divergence mentioned in section 2.
Proposition 4. Let M be a dually flat space and let a, b ∈ R. When points P, Q, R ∈ M satisfy θ(R) = aθ(P) + bθ(Q), the following equations hold.
Substituting a = 1 − α and b = α in Proposition 4 and using the definition of the α-skew ψ- and ϕ-divergences, the result follows.
Because ψ(θ) and ϕ(η) are convex functions, these divergences are a kind of Jensen divergence [7]. In a self-dual flat space, the potentials ψ and ϕ are equal to $\frac{1}{2}\sum_i (\theta^i)^2$, and the ψ- and ϕ-divergences are proportional to the squared Euclidean distance with a coefficient involving α(1 − α).
Proposition 5. Let $p_i(x)$ $(i = 0, 1, \dots, n)$ be probability distributions, and let the probability distribution $p_\eta(x)$ be the mixture family
$$p_\eta(x) = p_0(x) + \sum_i \eta_i (p_i(x) - p_0(x)).$$
For the dually flat space M with the above affine coordinate η, the α-skew ϕ-divergence is consistent with the α-skew JS-divergence.
Substituting this equation into (74), we have (76). On the other hand, the α-skew JS-divergence for continuous distributions is given in integral form, and the same expression holds for discrete distributions with the integral replaced by a sum. Comparing it with (76), we have the result.
Theorem 4. Let P, Q be points in a dually flat space M. For the affine divergence and the α-skew ψ- or ϕ-divergence, the following inequality holds.
Proof. Let R be a point that satisfies θ(R) = (1 − α)θ(P) + αθ(Q). From (63) and $D(P\|Q) \le D_A(P, Q)$, we obtain an upper bound in terms of affine divergences involving R. Applying Theorem 1 to this bound, the result follows. The same is true of the ϕ-divergence.
The probability mass function p(x) can be expressed as an exponential family, where $\theta^i = \ln\frac{p_i}{p_0}$ and $p_0 = 1 - \sum_{i=1}^{n} p_i$. It can also be written as a mixture of point masses, so p(x) is a mixture family, too. For the mixture family, by Proposition 5, the α-skew ϕ-divergence is consistent with the α-skew JS-divergence. Furthermore, for the exponential family, the affine divergence is consistent with the Jeffreys divergence. By combining these relations with Theorem 4, the result follows. If α = 1/2, (82) is consistent with Lin's inequality.
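The relation to Lin's inequality can be illustrated for a discrete distribution. The sketch below assumes the usual α-skew JS-divergence, (1 − α) D_KL(p‖m) + α D_KL(q‖m) with m = (1 − α)p + αq, and checks the bound α(1 − α) D_J(p, q). The explicit coefficient α(1 − α) is an assumption about the form of (82); it reduces to the factor 1/4 of Lin's inequality at α = 1/2. The function names and the probabilities are illustrative.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def skew_js(p, q, alpha):
    """Alpha-skew Jensen-Shannon divergence (assumed definition, see lead-in)."""
    m = [(1 - alpha) * pi + alpha * qi for pi, qi in zip(p, q)]
    return (1 - alpha) * kl(p, m) + alpha * kl(q, m)

def jeffreys(p, q):
    """Jeffreys divergence: symmetrized KL-divergence."""
    return kl(p, q) + kl(q, p)

p = [0.2, 0.5, 0.3]
q = [0.6, 0.1, 0.3]

# Pairs (skew JS, assumed upper bound) for several skew parameters.
results = {}
for alpha in (0.1, 0.5, 0.9):
    results[alpha] = (skew_js(p, q, alpha), alpha * (1 - alpha) * jeffreys(p, q))
    print(alpha, results[alpha])
```

At α = 1/2 the first entry is the ordinary JS-divergence and the second is one quarter of the Jeffreys divergence.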

Examples
In this section, we show concrete examples of the affine divergence.
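Normal distribution.
As a representative example of a continuous distribution, we consider the one-dimensional normal distribution
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$
The relations between θ, η and σ, µ are
$$\theta^1 = \frac{\mu}{\sigma^2}, \quad \theta^2 = -\frac{1}{2\sigma^2}, \quad \eta_1 = \mu, \quad \eta_2 = \mu^2 + \sigma^2.$$
We calculate the affine divergence of the normal distribution as follows.
$$D_A(P, Q) = \frac{\sigma_P^2 + (\mu_P - \mu_Q)^2}{2\sigma_Q^2} + \frac{\sigma_Q^2 + (\mu_P - \mu_Q)^2}{2\sigma_P^2} - 1.$$
For points P, Q, R ∈ M that satisfy θ(R) = aθ(P) + bθ(Q), the expected value and the variance are
$$\mu_R = \frac{a\sigma_Q^2 \mu_P + b\sigma_P^2 \mu_Q}{a\sigma_Q^2 + b\sigma_P^2},$$
$$\frac{1}{\sigma_R^2} = \frac{a}{\sigma_P^2} + \frac{b}{\sigma_Q^2}. \qquad (89)$$
Equation (89) is a weighted harmonic mean, and $\mu_R$ is the point that internally divides the segment $\mu_P \mu_Q$ in the ratio $b\sigma_P^2 : a\sigma_Q^2$. For points P, Q, R ∈ M that satisfy η(R) = aη(P) + bη(Q), the expected value and the variance are
$$\mu_R = a\mu_P + b\mu_Q,$$
$$\sigma_R^2 = a(\sigma_P^2 + \mu_P^2) + b(\sigma_Q^2 + \mu_Q^2) - \mu_R^2.$$
For these quantities, if a = t, b = 1 − t, t ∈ R, Theorem 1 holds. In the case of a = b = 1, Theorem 2, the generalized parallelogram law (Theorem 3) and the generalized polarization identity (Corollary 4) hold.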
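The normal-distribution example can be reproduced in a few lines. The sketch below assumes the standard exponential-family coordinates θ = (µ/σ², −1/(2σ²)) and η = (µ, µ² + σ²); it computes the affine divergence as an inner product, compares it with the Jeffreys divergence obtained from the closed-form KL-divergence between normal distributions, and checks the weighted harmonic mean of the variances for θ(R) = aθ(P) + bθ(Q). The function names and parameter values are illustrative.

```python
import math

def natural(mu, sigma):
    """theta = (mu / sigma^2, -1 / (2 sigma^2)) for N(mu, sigma^2)."""
    return (mu / sigma**2, -1.0 / (2.0 * sigma**2))

def expectation(mu, sigma):
    """eta = (E[x], E[x^2]) = (mu, mu^2 + sigma^2)."""
    return (mu, mu**2 + sigma**2)

def affine_divergence(p, q):
    """Inner product of theta- and eta-differences; p, q are (mu, sigma) pairs."""
    tp, tq = natural(*p), natural(*q)
    ep, eq = expectation(*p), expectation(*q)
    return sum((a - b) * (c - d) for a, b, c, d in zip(tp, tq, ep, eq))

def kl_normal(p, q):
    """Closed-form KL-divergence between two univariate normal distributions."""
    (mu_p, s_p), (mu_q, s_q) = p, q
    return math.log(s_q / s_p) + (s_p**2 + (mu_p - mu_q)**2) / (2 * s_q**2) - 0.5

P = (0.0, 1.0)   # (mu, sigma)
Q = (1.0, 2.0)

d_affine = affine_divergence(P, Q)
d_jeffreys = kl_normal(P, Q) + kl_normal(Q, P)

# Point R with theta(R) = a * theta(P) + b * theta(Q).
a, b = 0.3, 0.7
theta_r = [a * x + b * y for x, y in zip(natural(*P), natural(*Q))]
var_r = -1.0 / (2.0 * theta_r[1])
mu_r = theta_r[0] * var_r
harmonic = 1.0 / (a / P[1]**2 + b / Q[1]**2)
print(d_affine, d_jeffreys, var_r, harmonic, mu_r)
```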

Binomial distribution.
As a representative example of a discrete distribution, we consider the one-dimensional binomial distribution
$$p(x) = \binom{n}{x} p^x (1 - p)^{n - x},$$
where n is constant and p ∈ [0, 1]. The relations between θ, η and p are
$$\theta = \ln\frac{p}{1 - p}, \qquad (94)$$
$$\eta = np.$$
We calculate the affine divergence of the binomial distribution as follows.
$$D_A(P, Q) = n(p_P - p_Q)\left(\ln\frac{p_P}{1 - p_P} - \ln\frac{p_Q}{1 - p_Q}\right).$$
For points P, Q, R ∈ M that satisfy η(R) = aη(P) + bη(Q), the parameter p is
$$p_R = a p_P + b p_Q.$$
The above operation is defined only for a, b that satisfy $p_R \in [0, 1]$. For these quantities, if a = t, b = 1 − t, t ∈ R, Theorem 1 holds. In the case of a = b = 1, Theorem 2, the generalized parallelogram law (Theorem 3) and the generalized polarization identity (Corollary 4) hold.
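For the binomial case, the affine divergence can be compared with a brute-force Jeffreys divergence. The sketch below assumes θ = ln(p/(1 − p)) and η = np, and sums the KL-divergence directly over the probability mass function. The function names and the values of n, p and q are illustrative.

```python
import math

def binom_pmf(n, p):
    """Probability mass function of Binomial(n, p) as a list over x = 0..n."""
    return [math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def kl(u, v):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(ui * math.log(ui / vi) for ui, vi in zip(u, v))

def logit(r):
    return math.log(r / (1 - r))

def affine_divergence(n, p, q):
    """(theta(P) - theta(Q)) * (eta(P) - eta(Q)) with theta = logit(p), eta = n p."""
    return n * (p - q) * (logit(p) - logit(q))

n, p, q = 10, 0.3, 0.6
u, v = binom_pmf(n, p), binom_pmf(n, q)

d_affine = affine_divergence(n, p, q)
d_jeffreys = kl(u, v) + kl(v, u)
print(d_affine, d_jeffreys)
```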

Conclusion
We have introduced the affine divergence and studied properties of dually flat spaces that are similar to those of Euclidean space. We have shown that the affine divergence satisfies the semimetric axioms and, like the canonical divergence, triangular relations. We have also shown that both the canonical divergence and the affine divergence increase monotonically along geodesics of the affine coordinates. As a property peculiar to the affine divergence on geodesics, we have shown that the same summation formula as for the squared Euclidean distance holds for three points on the same geodesic.
In addition, for the sum of vectors in the affine coordinate systems, we have shown that the generalized expansion formula, the generalized parallelogram law and the generalized polarization identity hold.
The affine divergence can be expressed by the dual affine coordinates. By analogy, we have introduced the ψ- and ϕ-divergences, which can be expressed by the potentials. We have derived inequalities between the affine divergence and the ψ- or ϕ-divergence, and have shown how the affine divergence and the ϕ-divergence relate to the KL-divergence, the Jeffreys divergence and the Jensen-Shannon divergence.
It is expected that dually flat spaces have further structures similar to those of Euclidean space and that there are more relations between divergences.

Table 1.
Summary of the properties of divergences. In this table, bold fonts indicate the divergences that can be represented by the affine coordinates or the potentials. Taking the canonical divergence as an example, the table indicates that the canonical divergence is consistent with the KL-divergence for exponential families and satisfies the generalized law of cosines as a property similar to Euclidean geometry. Blank cells indicate that the property is unknown.