Entropic Measure on Multidimensional Spaces

Karl-Theodor Sturm

arXiv:0901.1815v1 [math.PR] 13 Jan 2009

Abstract. We construct the entropic measure $\mathbb{P}^\beta$ on compact manifolds of any dimension. It is defined as the push forward of the Dirichlet process (another random probability measure, well-known to exist on spaces of any dimension) under the conjugation map
\[ C : \mathcal{P}(M) \to \mathcal{P}(M). \]
This conjugation map is a continuous involution. It can be regarded as the canonical extension to higher dimensional spaces of a map between probability measures on 1-dimensional spaces characterized by the fact that the distribution functions of $\mu$ and $C(\mu)$ are inverse to each other.
We also present a heuristic interpretation of the entropic measure as
\[ d\mathbb{P}^\beta(\mu) = \frac{1}{Z} \exp\bigl(-\beta \cdot \mathrm{Ent}(\mu \mid m)\bigr) \cdot d\mathbb{P}^0(\mu). \]

Mathematics Subject Classification (2000). 60G57; 28C20; 49N90; 49Q20; 58J65.

Keywords. Optimal transport, entropic measure, Wasserstein space, entropy, gradient flow, Brenier map, Dirichlet distribution, random probability measure.

1. Introduction

Gradient flows of entropy-like functionals on the Wasserstein space have turned out to be a powerful tool in the study of various dissipative PDEs on Euclidean or Riemannian spaces $M$, the prominent example being the heat equation. See e.g. the monographs [Vi03, AGS05] for more examples and further references.

In [RS08], von Renesse and the author presented an approach to stochastic perturbation of the gradient flow of the entropy. It is based on the construction of a Dirichlet form
\[ \mathcal{E}(u,u) = \int_{\mathcal{P}(M)} \|\nabla u\|^2(\mu) \, d\mathbb{P}^\beta(\mu) \]
where $\|\nabla u\|$ denotes the norm of the gradient in the Wasserstein space $\mathcal{P}(M)$ as introduced by Otto [Ot01]. The fundamental new ingredient was the measure $\mathbb{P}^\beta$ on the Wasserstein space. This so-called entropic measure is an interesting and challenging object in its own right.
It is formally introduced as
\[ d\mathbb{P}^\beta(\mu) = \frac{1}{Z} \exp\bigl(-\beta \cdot \mathrm{Ent}(\mu \mid m)\bigr) \cdot d\mathbb{P}^0(\mu) \tag{1.1} \]
with some (non-existing) 'uniform distribution' $\mathbb{P}^0$ on the Wasserstein space $\mathcal{P}(M)$ and the relative entropy as a potential.

A rigorous construction was presented for 1-dimensional spaces. In the case $M = [0,1]$ it is based on the bijections
\[ \mu \;\xleftrightarrow{\; f(x) = \mu([0,x]) \;}\; f \;\xleftrightarrow{\; g = f^{(-1)} \;}\; g \;\xleftrightarrow{\; g(y) = \nu([0,y]) \;}\; \nu \]
between probability measures, distribution functions and inverse distribution functions (where $f^{(-1)}(y) = \inf\{x \ge 0 : f(x) \ge y\}$ more precisely denotes the 'right inverse' of $f$). If $C : \mathcal{P}(M) \to \mathcal{P}(M)$ denotes the map $\mu \mapsto \nu$ then the entropic measure $\mathbb{P}^\beta$ is just the push forward under $C$ of the Dirichlet-Ferguson process $\mathbb{Q}^\beta$. The latter is a random probability measure which is well-defined on every probability space.

For a long time it seemed that the previous construction is definitively limited to dimension 1 since it heavily depends on the use of distribution functions (and inverse distribution functions), objects which do not exist in higher dimensions. The crucial observation to overcome this restriction is to interpret $g$ as the unique optimal transport map which pushes forward $m$ (the normalized uniform distribution on $M$) to $\mu$:
\[ \mu = g_* m. \]
Due to Brenier [Br87] and McCann [Mc01] such a 'monotone map' exists for each probability measure $\mu$ on a Riemannian manifold of arbitrary dimension. Moreover, also in higher dimensions such a monotone map $g$ has a unique generalized inverse $f$, again being a monotone map (with generalized inverse being $g$). This observation allows us to define the conjugation map
\[ C : \mathcal{P}(M) \to \mathcal{P}(M), \quad \mu \mapsto \nu \]
for any compact manifold $M$. It is a continuous involution. By means of this map we define the entropic measure as follows:
\[ \mathbb{P}^\beta := C_* \mathbb{Q}^\beta \]
where $\mathbb{Q}^\beta$ denotes the Dirichlet-Ferguson process on $M$ with intensity measure $\beta \cdot m$. (Actually, such a random probability measure exists on every probability space.)

In order to justify our definition of the entropic measure by some heuristic argument let us assume that $\mathbb{P}^\beta$ were given as in (1.1).
The identity $\mathbb{Q}^\beta = C_* \mathbb{P}^\beta$ then defines a probability measure which satisfies
\[ d\mathbb{Q}^\beta(\nu) = \frac{1}{Z} \exp\bigl(-\beta \cdot \mathrm{Ent}(m \mid \nu)\bigr) \cdot d\mathbb{Q}^0(\nu). \tag{1.2} \]
Given a measurable partition $M = \bigcup_{i=1}^N M_i$ and approximating arbitrary probability measures $\nu$ by measures with constant density on each of the sets $M_i$ of the partition, the previous ansatz (1.2) yields, after some manipulations,
\[ \mathbb{Q}^\beta_{M_1,\dots,M_N}(dx) = \frac{\Gamma(\beta)}{\prod_{i=1}^N \Gamma(\beta\, m(M_i))} \cdot x_1^{\beta \cdot m(M_1) - 1} \cdot \ldots \cdot x_{N-1}^{\beta \cdot m(M_{N-1}) - 1} \cdot x_N^{\beta \cdot m(M_N) - 1} \times \delta_{\left(1 - \sum_{i=1}^{N-1} x_i\right)}(dx_N)\, dx_{N-1} \ldots dx_1. \]
These are, indeed, the finite dimensional distributions of the Dirichlet-Ferguson process.

2. Spaces of Convex Functions and Monotone Maps

Throughout this paper, $M$ will be a compact subset of a complete Riemannian manifold $\hat{M}$ with Riemannian distance $d$ and $m$ will denote a probability measure with support $M$, absolutely continuous with respect to the volume measure. We assume that it satisfies a Poincaré inequality: $\exists c > 0$:
\[ \int_M |\nabla u|^2 \, dm \ge c \cdot \int_M u^2 \, dm \]
for all weakly differentiable $u : M \to \mathbb{R}$ with $\int_M u \, dm = 0$.

For compact Riemannian manifolds, there is a canonical choice for $m$, namely, the normalized Riemannian volume measure. The freedom to choose $m$ arbitrarily might be of advantage in view of future extensions: For Finsler manifolds and for non-compact Riemannian manifolds there is no such canonical probability measure.

The main ingredient of our construction below will be the Brenier-McCann representation of optimal transport in terms of gradients of convex functions.

Definition 2.1. A function $\varphi : M \to \mathbb{R}$ is called $d^2/2$-convex if there exists a function $\psi : M \to \mathbb{R}$ such that
\[ \varphi(x) = -\inf_{y \in M} \left[ \frac{1}{2} d^2(x,y) + \psi(y) \right] \]
for all $x \in M$. In this case, $\varphi$ is called generalized Legendre transform of $\psi$ or conjugate of $\psi$ and denoted by $\varphi = \psi^c$.

Let us summarize some of the basic facts on $d^2/2$-convex functions. See [Ro70], [Rü96], [Mc01] and [Vi08] for details.¹

Lemma 2.2.
(i) A function $\varphi$ is $d^2/2$-convex if and only if $\varphi^{cc} = \varphi$.
(ii) Every $d^2/2$-convex function is bounded, Lipschitz continuous and differentiable almost everywhere with gradient bounded by $D = \sup_{x,y \in M} d(x,y)$.

In the sequel, $\mathcal{K} = \mathcal{K}(M)$ will denote the set of $d^2/2$-convex functions on $M$ and $\tilde{\mathcal{K}} = \tilde{\mathcal{K}}(M)$ will denote the set of equivalence classes in $\mathcal{K}$ with $\varphi_1 \sim \varphi_2$ iff $\varphi_1 - \varphi_2$ is constant. $\mathcal{K}$ will be regarded as a subset of the Sobolev space $H^1(M,m)$ with norm
\[ \|u\|_{H^1} = \left[ \int_M |\nabla u|^2 \, dm + \int_M u^2 \, dm \right]^{1/2} \]
and $\tilde{\mathcal{K}} = \mathcal{K}/\mathrm{const}$ will be regarded as a subset of the space $\tilde{H}^1 = H^1/\mathrm{const}$ with norm
\[ \|u\|_{\tilde{H}^1} = \left[ \int_M |\nabla u|^2 \, dm \right]^{1/2}. \]

Proposition 2.3. For each Borel map $g : M \to M$ the following are equivalent:
(i) $\exists \varphi \in \tilde{\mathcal{K}}$: $g = \exp(\nabla \varphi)$ a.e. on $M$;
(ii) $g$ is an optimal transport map from $m$ to $g_* m$ in the sense that it is a minimizer of $h \mapsto \int_M d^2(x, h(x)) \, m(dx)$ among all Borel maps $h : M \to M$ with $h_* m = g_* m$.
In this case, the function $\varphi \in \tilde{\mathcal{K}}$ in (i) is defined uniquely. Moreover, in (ii) the map $g$ is the unique minimizer of the given minimization problem.

A Borel map $g : M \to M$ satisfying the properties of the previous proposition will be called monotone map or optimal Lebesgue transport. The set of $m$-equivalence classes of such maps will be denoted by $\mathcal{G} = \mathcal{G}(M)$. Note that $\mathcal{G}(M)$ does not depend on the choice of $m$ (as long as $m$ is absolutely continuous with full support)! $\mathcal{G}(M)$ will be regarded as a subset of the space of maps $L^2((M,m),(M,d))$ with metric
\[ d_2(f,g) = \left[ \int_M d^2(f(x), g(x)) \, m(dx) \right]^{1/2}. \]
According to our definitions, the map $\Upsilon : \varphi \mapsto \exp(\nabla \varphi)$ defines a bijection between $\tilde{\mathcal{K}}$ and $\mathcal{G}$. Recall that $\mathcal{P} = \mathcal{P}(M)$ denotes the set of probability measures $\mu$ on $M$ (equipped with its Borel $\sigma$-field).

Proposition 2.4. The map $\chi : g \mapsto g_* m$ defines a bijection between $\mathcal{G}$ and $\mathcal{P}(M)$. That is, for each $\mu \in \mathcal{P}$ there exists a unique $g \in \mathcal{G}$, called Brenier map of $\mu$, with $\mu = g_* m$.
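Definition 2.1 and Lemma 2.2 can be checked numerically in the simplest setting. The following is a minimal sketch, not part of the construction above, under these assumptions: $M = [0,1]$ is replaced by a finite grid, the infimum by a minimum over grid points, and $\psi$ is an arbitrary random function. The helper name `conjugate`, the grid size and the seed are hypothetical choices made only for illustration.

```python
import random

def conjugate(psi, grid):
    """Discrete analogue of psi^c(x) = -inf_y [d^2(x, y)/2 + psi(y)] on a finite grid."""
    return [-min(0.5 * (x - y) ** 2 + p for y, p in zip(grid, psi)) for x in grid]

n = 101
grid = [i / (n - 1) for i in range(n)]           # sample points of M = [0, 1]
random.seed(0)
psi = [random.uniform(-1.0, 1.0) for _ in grid]  # arbitrary function, not assumed convex

phi = conjugate(psi, grid)                       # phi = psi^c, as in Definition 2.1
phi_cc = conjugate(conjugate(phi, grid), grid)   # Lemma 2.2 (i): should reproduce phi

# Largest deviation between phi and phi^cc, and largest difference quotient of phi
# (Lemma 2.2 (ii) bounds the gradient by D = diam(M) = 1).
max_gap = max(abs(a - b) for a, b in zip(phi, phi_cc))
max_slope = max(abs(phi[i + 1] - phi[i]) / (grid[i + 1] - grid[i]) for i in range(n - 1))
print(max_gap, max_slope)
```

On the grid, the identity $\varphi^{cc} = \varphi$ holds up to floating-point rounding, because conjugating three times agrees with conjugating once for any starting function; the difference quotients stay below $D = 1$ since each function $x \mapsto -\frac12 (x-y)^2$ has slope at most 1 on $[0,1]$.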
¹ A function $\varphi$ is $d^2/2$-convex in our sense if and only if the function $-\varphi$ is $c$-concave in the sense of [Ro70, Rü96, Mc01, Vi08] with cost function $c(x,y) = d^2(x,y)/2$. In our presentation, the $c$ stands for 'conjugate'. For the relation between $d^2/2$-convexity and usual convexity on Euclidean space we refer to chapter 4.

The map $\chi$ of course strongly depends on the choice of the measure $m$. (If there is any ambiguity we denote it by $\chi_m$.)

Due to the previous observations, there exist canonical bijections $\Upsilon$ and $\chi$ between the sets $\tilde{\mathcal{K}}$, $\mathcal{G}$ and $\mathcal{P}$. Actually, these bijections are even homeomorphisms with respect to the natural topologies on these spaces.

Proposition 2.5. Consider any sequence $\{\varphi_n\}_{n \in \mathbb{N}}$ in $\tilde{\mathcal{K}}$ with corresponding sequences $\{g_n\}_{n \in \mathbb{N}} = \{\Upsilon(\varphi_n)\}_{n \in \mathbb{N}}$ in $\mathcal{G}$ and $\{\mu_n\}_{n \in \mathbb{N}} = \{\chi(g_n)\}_{n \in \mathbb{N}}$ in $\mathcal{P}$ and let $\varphi \in \tilde{\mathcal{K}}$, $g = \Upsilon(\varphi) \in \mathcal{G}$, $\mu = \chi(g) \in \mathcal{P}$. Then the following are equivalent:
(i) $\varphi_n \to \varphi$ in $\tilde{H}^1$
(ii) $g_n \to g$ in $L^2((M,m),(M,d))$
(iii) $g_n \to g$ in $m$-probability on $M$
(iv) $\mu_n \to \mu$ in $L^2$-Wasserstein distance $d_W$
(v) $\mu_n \to \mu$ weakly.

Proof. (i) $\Leftrightarrow$ (ii) Compactness of $M$ and smoothness of the exponential map imply that there exists $\delta > 0$ such that $\forall x \in M$, $\forall v_1, v_2 \in T_x M$ with $|v_1|, |v_2| \le D$ and $|v_1 - v_2| < \delta$:
\[ \frac{1}{2} \le d(\exp_x v_1, \exp_x v_2) \,/\, |v_1 - v_2|_{T_x M} \le 2. \]
Hence, $\varphi_n \to \varphi$ in $\tilde{H}^1$, that is $\int_M |\nabla \varphi_n(x) - \nabla \varphi(x)|^2_{T_x M} \, m(dx) \to 0$, is equivalent to $\int_M d^2(g_n(x), g(x)) \, m(dx) \to 0$, that is, to $g_n \to g$ in $L^2((M,m),(M,d))$.
(ii) $\Leftrightarrow$ (iii) Standard fact from integration theory (taking into account that $d(g_n, g)$ is uniformly bounded due to compactness of $M$).
(ii) $\Rightarrow$ (iv) If $\mu_n = (g_n)_* m$ and $\mu = g_* m$ then $(g_n, g)_* m$ is a coupling of $\mu_n$ and $\mu$. Hence,
\[ d_W^2(\mu_n, \mu) \le \int_M d^2(g_n(x), g(x)) \, m(dx). \tag{2.1} \]
(iv) $\Leftrightarrow$ (v) Trivial.
(iv) $\Rightarrow$ (ii) [Vi08], Corollary 5.21. $\square$

Remark 2.6.
Since $M$ is compact, assertion (ii) of the previous Proposition is equivalent to
(ii') $g_n \to g$ in $L^p((M,m),(M,d))$
for any $p \in [1, \infty)$ and similarly, assertion (iv) is equivalent to
(iv') $\mu_n \to \mu$ in $L^p$-Wasserstein distance.

Remark 2.7. In dimension $n = 1$, the inequality in (2.1) is actually an equality. In other words, the map
\[ \chi : (\mathcal{G}, d_2) \to (\mathcal{P}, d_W) \]
is an isometry. This is no longer true in higher dimensions.

The well-known fact (Prohorov's theorem) that the space of probability measures on a compact space is itself compact, together with the previous continuity results, immediately implies compactness of $\tilde{\mathcal{K}}$ and $\mathcal{G}$.

Corollary 2.8. (i) $\tilde{\mathcal{K}}$ is a compact subset of $\tilde{H}^1$.
(ii) $\mathcal{G}$ is a compact subset of $L^2((M,m),(M,d))$.

3. The Conjugation Map

Let us recall the definition of the conjugation map $C_{\mathcal{K}} : \varphi \mapsto \varphi^c$ acting on functions $\varphi : M \to \mathbb{R}$ as follows
\[ \varphi^c(x) = -\inf_{y \in M} \left[ \frac{1}{2} d^2(x,y) + \varphi(y) \right]. \]
The map $C_{\mathcal{K}}$ maps $\mathcal{K}$ bijectively onto itself with $C_{\mathcal{K}}^2 = \mathrm{Id}$. For each $\lambda \in \mathbb{R}$, $C_{\mathcal{K}}(\varphi + \lambda) = C_{\mathcal{K}}(\varphi) - \lambda$. Hence, $C_{\mathcal{K}}$ extends to a bijection $C_{\tilde{\mathcal{K}}} : \tilde{\mathcal{K}} \to \tilde{\mathcal{K}}$. Composing this map with the bijections $\chi : \mathcal{G} \to \mathcal{P}$ and $\Upsilon : \tilde{\mathcal{K}} \to \mathcal{G}$ we obtain involutive bijections
\[ C_{\mathcal{G}} = \Upsilon \circ C_{\tilde{\mathcal{K}}} \circ \Upsilon^{-1} : \mathcal{G} \to \mathcal{G} \]
and
\[ C_{\mathcal{P}} = \chi \circ C_{\mathcal{G}} \circ \chi^{-1} : \mathcal{P} \to \mathcal{P}, \]
called conjugation map on $\mathcal{G}$ or on $\mathcal{P}$, respectively. Given a monotone map $g \in \mathcal{G}$, the monotone map
\[ g^c := C_{\mathcal{G}}(g) \]
will be called conjugate map or generalized inverse map; given a probability measure $\mu \in \mathcal{P}$ the probability measure
\[ \mu^c := C_{\mathcal{P}}(\mu) \]
will be called conjugate measure.

Example 3.1. (i) Let $M = S^n$ be the $n$-dimensional sphere, and $m$ be the normalized Riemannian volume measure. Put
\[ \mu = \lambda \delta_a + (1 - \lambda) m \]
for some point $a \in M$ and $\lambda \in \,]0,1[$. Then
\[ \mu^c = \frac{1}{1 - \lambda} \, 1_{M \setminus B_r(a)} \cdot m \]
where $r > 0$ is such that $m(B_r(a)) = \lambda$.
[Proof. The optimal transport map $g = \exp(\nabla \varphi)$ which pushes $m$ to $\mu$ is determined by the $d^2/2$-convex function
\[ \varphi(x) = \begin{cases} \frac{1}{2} \left[ r^2 - d^2(a,x) \right] & \text{in } B_r(a) \\ \frac{r}{2(\pi - r)} \left[ d^2(a',x) - (\pi - r)^2 \right] & \text{in } B_{\pi - r}(a') = M \setminus B_r(a). \end{cases} \]
Its conjugate is the function
\[ \varphi^c(y) = -\frac{r}{2\pi} d^2(a',y) + \frac{1}{2} r (\pi - r). ] \]

[Figure: transport by $\nabla \varphi$ from $m$ to $\mu = \lambda \delta_a + (1 - \lambda) m$, and by $\nabla \varphi^c$ from $m$ to $\mu^c$.]

(ii) Let $M = S^n$, the $n$-dimensional sphere, and $\mu = \delta_a$ for some $a \in M$. Then $\mu^c = \delta_{a'}$ with $a' \in M$ being the antipodal point of $a$.
[Proof. Limit of (i) as $\lambda \nearrow 1$. Alternatively: explicit calculations with $\varphi(x) = \frac{1}{2} [\pi^2 - d^2(a,x)]$ and
\[ \varphi^c(y) = \sup_x \left( -\frac{1}{2} d^2(x,y) + \frac{1}{2} d^2(a,x) - \frac{1}{2} \pi^2 \right) = -\frac{1}{2} d^2(a',y). ] \]

(iii) Let $M = S^n$, the $n$-dimensional sphere, and $\mu = \frac{1}{2} \delta_a + \frac{1}{2} \delta_{a'}$ with north and south pole $a, a' \in M$. Then $\mu^c$ is the uniform distribution on the equator, the $(n-1)$-dimensional set $Z$ of points of equal distance to $a, a'$.

(iv) Let $M = S^1$ be the circle of length 1, $m$ = uniform distribution and
\[ \mu = \sum_{i=1}^k \alpha_i \delta_{x_i} \]
with points $x_1 < x_2 < \ldots < x_k < x_1$ in cyclic order on $S^1$ and numbers $\alpha_i \in [0,1]$, $\sum_i \alpha_i = 1$. Then
\[ \mu^c = \sum_{i=1}^k \beta_i \delta_{y_i} \]
with $\beta_i = |x_{i+1} - x_i|$ and points $y_1 < y_2 < \ldots < y_k < y_{k+1} = y_1$ on $S^1$ satisfying
\[ |y_{i+1} - y_i| = \alpha_{i+1}. \]
[Proof. Embedding in $\mathbb{R}^1$ and explicit calculation of distribution and inverse distribution functions.]

Remark 3.2. The conjugation map
\[ C_{\mathcal{P}} : \mathcal{P} \to \mathcal{P} \]
depends on the choice of the reference measure $m$ on $M$. Actually, we can choose two different probability measures $m_1$, $m_2$ and consider $C_{\mathcal{P}} = \chi_{m_2} \circ C_{\mathcal{G}} \circ \chi_{m_1}^{-1}$.

Proposition 3.3. Let $\mu = g_* m \in \mathcal{P}$ be absolutely continuous with density $\eta = \frac{d\mu}{dm}$. Put $f = g^c$ and $\nu = f_* m = \mu^c$.
(i) If $\eta > 0$ a.e. then the measure $\nu$ is absolutely continuous with density $\rho = \frac{d\nu}{dm} > 0$ satisfying
\[ \eta(x) \cdot \rho(f(x)) = \rho(x) \cdot \eta(g(x)) = 1 \quad \text{for a.e. } x \in M. \]
(ii) If $\nu$ is absolutely continuous then $f(g(x)) = g(f(x)) = x$ for a.e. $x \in M$.
(iii) Under the previous assumption the Jacobians $\det Df(x)$ and $\det Dg(x)$ exist for almost every $x \in M$ and satisfy
\[ \det Df(g(x)) \cdot \det Dg(x) = \det Df(x) \cdot \det Dg(f(x)) = 1, \]
\[ \sigma(x) \cdot \eta(x) = \sigma(f(x)) \cdot \det Df(x), \qquad \sigma(x) \cdot \rho(x) = \sigma(g(x)) \cdot \det Dg(x) \]
for almost every $x \in M$ where $\sigma = \frac{dm}{d\mathrm{vol}}$ denotes the density of the reference measure $m$ with respect to the Riemannian volume measure vol.
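Parts (i) and (ii) of Proposition 3.3 can be illustrated in dimension one, where (as recalled in the Introduction) the Brenier map of $\mu$ is its inverse distribution function and the conjugate map is the distribution function itself. The sketch below is only an illustration under stated assumptions: $M = [0,1]$, $m$ is Lebesgue measure, and $\mu$ has the hypothetical density $\eta(x) = \frac{1}{2} + x$; the bisection tolerance and the finite-difference step are likewise arbitrary choices.

```python
def eta(x):
    """Density of mu w.r.t. Lebesgue measure on [0, 1] (illustrative choice)."""
    return 0.5 + x

def f(x):
    """Distribution function f(x) = mu([0, x]); in 1D this plays the role of g^c."""
    return 0.5 * (x + x * x)

def g(y, tol=1e-14):
    """Brenier map of mu: the inverse of f, found by bisection (f is increasing)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rho(y, h=1e-5):
    """Density of nu = mu^c: central difference of its distribution function g."""
    return (g(y + h) - g(y - h)) / (2.0 * h)

points = [0.1 * k for k in range(1, 10)]
# Part (i): eta(x) * rho(f(x)) and rho(x) * eta(g(x)) should both be close to 1.
err_i = max(max(abs(eta(x) * rho(f(x)) - 1.0), abs(rho(x) * eta(g(x)) - 1.0)) for x in points)
# Part (ii): f and g should be inverse to each other.
err_ii = max(abs(f(g(x)) - x) + abs(g(f(x)) - x) for x in points)
print(err_i, err_ii)
```

For this particular $\eta$ the inverse is also available in closed form, $g(y) = \frac{1}{2}\bigl(\sqrt{1 + 8y} - 1\bigr)$, so that $\rho(y) = 2/\sqrt{1 + 8y}$ and both products in (i) equal 1 exactly; the numerical version is kept because it works unchanged for any strictly positive density whose distribution function can be evaluated.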
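Proof. (i) For each Borel function $v : M \to \mathbb{R}_+$
\[ \int_M v \, d\nu = \int_M v \circ f \, dm = \int_M v \circ f \cdot \frac{1}{\eta} \, d\mu = \int_M v \circ f \cdot \frac{1}{\eta(g \circ f)} \, d\mu = \int_M v \cdot \frac{1}{\eta \circ g} \, dm. \]
Hence, $\nu$ is absolutely continuous with respect to $m$ with density $\frac{1}{\eta \circ g}$. Interchanging the roles of $\mu$ and $\nu$ (as well as $f$ and $g$) yields the second claim.
(ii), (iii) Part of the Brenier-McCann representation result of optimal transports. $\square$

Corollary 3.4. Under the assumption $\eta > 0$ of the previous Proposition:
\[ \mathrm{Ent}(\mu^c \mid m) = \mathrm{Ent}(m \mid \mu). \]

Proof. With notations from above
\[ \mathrm{Ent}(\mu^c \mid m) = \int \rho \log \rho \, dm = \int \frac{1}{\eta \circ g} \log \frac{1}{\eta \circ g} \, dm = \int \frac{1}{\eta} \log \frac{1}{\eta} \, d\mu = \mathrm{Ent}(m \mid \mu). \]
$\square$

Lemma 3.5. The conjugation map
\[ C_{\mathcal{K}} : \mathcal{K} \to \mathcal{K} \]
is continuous.

Proof. To simplify notation denote $C_{\mathcal{K}}$ by $C$. Choose a countable dense set $\{y_i\}_{i \in \mathbb{N}}$ in $M$ and for $k \in \mathbb{N}$ define $C_k : \varphi \mapsto \varphi_k^c$ on $\mathcal{K}$ by
\[ \varphi_k^c(x) = -\inf_{i=1,\dots,k} \left[ \frac{1}{2} d^2(x, y_i) + \varphi(y_i) \right]. \]
Then as $k \to \infty$
\[ \varphi_k^c \nearrow \varphi^c \quad \text{pointwise on } M. \]
Recall that each $\varphi \in \mathcal{K}$ is Lipschitz continuous with Lipschitz constant $D$.
For each $\varepsilon > 0$ choose $k = k(\varepsilon) \in \mathbb{N}$ such that the set $\{y_i\}_{i=1,\dots,k(\varepsilon)}$ is an $\varepsilon$-covering of the compact space $M$. Then
\[ |C_k(\varphi)(x) - C(\varphi)(x)| \le \sup_{y \in M} \inf_{i=1,\dots,k} \left| \frac{1}{2} d^2(x,y) - \frac{1}{2} d^2(x,y_i) + \varphi(y) - \varphi(y_i) \right| \le \sup_{y \in M} \inf_{i=1,\dots,k} 2D \cdot d(y, y_i) \le 2D\varepsilon \]
uniformly in $x \in M$ and $\varphi \in \mathcal{K}$.
Now let us consider a sequence $(\varphi_l)_{l \in \mathbb{N}}$ in $\mathcal{K}$ with $\varphi_l \to \varphi$ in $H^1(M)$. Then for each $k \in \mathbb{N}$ as $l \to \infty$
\[ C_k(\varphi_l) \to C_k(\varphi) \]
pointwise on $M$ and thus also in $L^2(M)$. Together with the previous uniform convergence of $C_k \to C$ it implies
\[ C(\varphi_l) \to C(\varphi) \]
in $L^2(M)$ as $l \to \infty$. Moreover, we know that $\{C(\varphi_l)\}_{l \in \mathbb{N}}$ is bounded in $H^1(M)$ (since all gradients are bounded by $D$). Therefore, finally
\[ C(\varphi_l) \to C(\varphi) \]
in $H^1(M)$ as $l \to \infty$. This proves the continuity of $C : \mathcal{K} \to \mathcal{K}$ with respect to the $H^1$-norm. $\square$

Theorem 3.6. The conjugation map
\[ C_{\mathcal{P}} : \mathcal{P} \to \mathcal{P} \]
is continuous (with respect to the weak topology).

Proof. Let us first prove continuity of the conjugation map $C_{\tilde{\mathcal{K}}} : \tilde{\mathcal{K}} \to \tilde{\mathcal{K}}$ (with respect to the $\tilde{H}^1$-norm on $\tilde{\mathcal{K}}$).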
Indeed, this follows from the previous continuity result together with the facts that the embedding $H^1 \to \tilde{H}^1$, $\varphi \mapsto \tilde{\varphi} = \{\varphi + c : c \in \mathbb{R}\}$ is continuous (trivial fact) and that the map $\tilde{H}^1 \to H^1$, $\tilde{\varphi} = \{\varphi + c : c \in \mathbb{R}\} \mapsto \varphi - \int_M \varphi \, dm$ is continuous (consequence of the Poincaré inequality).
This in turn implies, due to Proposition 2.5, that the conjugation map
\[ C_{\mathcal{G}} : \mathcal{G} \to \mathcal{G} \]
is continuous (with respect to the $L^2$-metric on $\mathcal{G}$). Moreover, due to the same Proposition it therefore also implies that the conjugation map
\[ C_{\mathcal{P}} : \mathcal{P} \to \mathcal{P} \]
is continuous (with respect to the weak topology). $\square$

Remark 3.7. In dimension $n = 1$, the conjugation map $C_{\mathcal{G}} : \mathcal{G} \to \mathcal{G}$ is even an isometry from $\mathcal{G}$, equipped with the $L^1$-metric, into itself.

4. Example: The Conjugation Map on $M \subset \mathbb{R}^n$

In this chapter, we will study in detail the Euclidean case. We assume that $M$ is a compact convex subset of $\mathbb{R}^n$. (The convexity assumption is to simplify notations and results.) The probability measure $m$ is assumed to be absolutely continuous with full support on $M$.

A function $\varphi : M \to \mathbb{R}$ is $d^2/2$-convex if and only if the function $\varphi_1(x) = \varphi(x) + |x|^2/2$ is convex in the usual sense:
\[ \varphi_1(\lambda x + (1 - \lambda) y) \le \lambda \varphi_1(x) + (1 - \lambda) \varphi_1(y) \]
(for all $x, y \in M$ and $\lambda \in [0,1]$) and if its subdifferential lies in $M$:
\[ \partial \varphi_1(x) \subset M \]
for all $x \in M$.

A function $\psi$ is the conjugate of $\varphi$ if and only if the function $\psi_1(y) = \psi(y) + |y|^2/2$ is the Legendre-Fenchel transform of $\varphi_1$:
\[ \psi_1(y) = \sup_{x \in M} \left[ \langle x, y \rangle - \varphi_1(x) \right]. \]

A Borel map $g : M \to M$ is monotone if and only if
\[ \langle g(x) - g(y), x - y \rangle \ge 0 \]
for a.e. $x, y \in M$. Equivalently, $g$ is monotone if and only if $g = \nabla \varphi_1$ for some convex $\varphi_1 : M \to \mathbb{R}$.

Lemma 4.1. (i) If $\mu = \lambda \delta_z + (1 - \lambda) \nu$ then there exists an open convex set $U \subset M$ with $m(U) = \lambda$ such that the optimal transport map $g$ with $g_* m = \mu$ satisfies $g \equiv z$ a.e. on $U$.
(ii) The conjugate measure $\mu^c$ does not charge $U$: $\mu^c(U) = 0$.

Proof. (i) Linearity of the problem allows us to assume that $z = 0$.
Let $g = \nabla \varphi_1$ denote the optimal transport map with $\varphi_1$ being an appropriate convex function. Let $V$ be the subset of points in $M$ in which $\varphi_1$ is weakly differentiable with vanishing gradient. By the push forward property it follows that $m(V) = \lambda$. Firstly, then convexity of $\varphi_1$ implies that $\varphi_1$ has to be constant on $V$, say $\varphi_1 \equiv \alpha$ on $V$. Secondly, the latter implies that $\varphi_1 \equiv \alpha$ on the convex hull $W$ of $V$. The interior $U$ of this convex set $W$ has volume $m(U) = m(W) \ge m(V) = \lambda$ and $\varphi_1$ is constant on $U$, hence, differentiable with vanishing gradient. Thus finally $U \subset V$ and $m(U) = \lambda$.
(ii) Let $\mu_\epsilon$, $\epsilon \in [0,1]$, denote the intermediate points on the geodesic from $\mu_0 = \mu$ to $\mu_1 = m$. Then $\mu_\epsilon = (g_\epsilon)_* m$ with $g_\epsilon = \exp((1 - \epsilon) \nabla \varphi) = \epsilon \cdot \mathrm{Id} + (1 - \epsilon) \cdot g$ and each $\mu_\epsilon$ is absolutely continuous w.r.t. $m$. Hence, $g_\epsilon^c = g_\epsilon^{-1}$ a.e. on $M$. Therefore, the conjugate measure $\mu_\epsilon^c$ satisfies
\[ \mu_\epsilon^c(U) = m\bigl((g_\epsilon^c)^{-1}(U)\bigr) = m(g_\epsilon(U)) = \epsilon^n \cdot m(U) = \epsilon^n \cdot \lambda. \]

Entropic Measure on Multidimensional Spaces Karl-Theodor Sturm 9 0 0 2 Abstract. WeconstructtheentropicmeasurePβ oncompactmanifoldsofany dimension.ItisdeﬁnedasthepushforwardoftheDirichletprocess(another n randomprobabilitymeasure,well-knowntoexistonspacesofanydimension) a under the conjugation map J 3 C:P(M)→P(M). 1 This conjugation map is a continuous involution. It can be regarded as the canonical extension to higher dimensional spaces of a map between proba- ] R bility measures on 1-dimensional spaces characterized by the fact that the distribution functions of µ and C(µ) are inverse to each other. P We also present an heuristic interpretation of the entropic measure as . h 1 t dPβ(µ)= exp(−β·Ent(µ|m))·dP0(µ). a Z m MathematicsSubjectClassiﬁcation(2000).60G57;28C20;49N90;49Q20;58J65. [ Keywords. Optimal transport, entropic measure, Wasserstein space, entropy, 1 gradient ﬂow, Brenier map, Dirichlet distribution, random probability mea- v sure. 5 1 8 1 . 1 0 9 1. Introduction 0 : Gradient ﬂows of entropy-like functionals on the Wasserstein space turned out v to be a powerful tool in the study of various dissipative PDEs on Euclidean or i X Riemannian spaces M, the prominent example being the heat equation. See e.g. r the monographs [Vi03, AGS05] for more examples and further references. a In [RS08], von Renesse and the author presented an approach to stochastic per- turbation of the gradient ﬂow of the entropy. It is based on the construction of a Dirichlet form (cid:90) (u,u) = u 2(µ) dPβ(µ) E (cid:107)∇ (cid:107) P(M) 2 Karl-Theodor Sturm where u denotes the norm of the gradient in the Wasserstein space (M) as introdu(cid:107)c∇ed(cid:107)by Otto [Ot01]. The fundamental new ingredient was the mePasure Pβ on the Wasserstein space. This so-called entropic measure is an interesting and challenging object in its own right. 
It is formally introduced as 1 dPβ(µ)= exp( β Ent(µm)) dP0(µ) (1.1) Z − · | · withsome(non-existing)‘uniformdistribution’P0 ontheWassersteinspace (M) P and the relative entropy as a potential. A rigorous construction was presented for 1-dimensional spaces. In the case M = [0,1] it is based on the bijections (x)=µ([0,x]) g=f(−1) g(y)=ν([0,y]) µ ←−−−−−−−→ f ←−−−−−−−−−→ g ←−−−−−−−−→ ν betweenprobability measures, distribution functionsandinverse distribution func- tions (where f(−1)(y) = inf x 0 : f(x) y more precisely denotes the ‘right { ≥ ≥ } inverse’ of f). If C : (M) (M) denotes the map µ ν then the entropic measure Pβ is just thPe push→foPrward under C of the Diri(cid:55)→chlet-Ferguson process Qβ. The latter is a random probability measure which is well-deﬁned on every probability space. Forlongtimeitseemedthatthepreviousconstructionisdeﬁnitivelylimitedtodi- mension1sinceitheavilydependsontheuseofdistributionfunctions(andinverse distributionfunctions),–objectswhichdonotexistinhigherdimensions.Thecru- cialobservationtoovercomethisrestrictionistointerpretg astheuniqueoptimal transport map which pushes forward m (the normalized uniform distribution on M) to µ: µ=g m. ∗ Due to Brenier [Br87] and McCann [Mc01] such a ‘monotone map’ exists for each probabilitymeasureµonaRiemannianmanifoldofarbitrarydimension.Moreover, alsoinhigherdimensionssuchamonotonemapg hasauniquegeneralizedinverse f,againbeingamonotonemap(withgeneralizedinversebeingg).Thisobservation allows to deﬁne the conjugation map C: (M) (M), µ ν P →P (cid:55)→ for any compact manifold M. It is a continuous involution. By means of this map we deﬁne the entropic measure as follows: Pβ :=C Qβ ∗ where Qβ denotes the Dirichlet-Ferguson process on M with intensity measure β m. (Actually, such a random probability measure exists on every probability · space.) In order to justify our deﬁnition of the entropic measure by some heuristic argu- ment let us assume that Pβ were given as in (1.1). 
The identity Qβ = C Pβ then ∗ Entropic Measure on Multidimensional Spaces 3 deﬁnes a probability measure which satisﬁes 1 dQβ(ν)= exp( β Ent(mν)) dQ0(ν). (1.2) Z − · | · Given a measurable partition M = (cid:83)N M and approximating arbitrary proba- i=1 i bility measures ν by measures with constant density on each of the sets M of the i partition the previous ansatz (1.2) yields – after some manipulations – Qβ (dx) M1,...,MN Γ(β) = xβ·m(M1)−1 ... xβ·m(MN−1)−1 xβ·m(MN)−1 N · 1 · · N−1 · N × (cid:81) Γ(βm(M )) i i=1 δ (dx )dx ...dx . × (1−NP−1xi) N N−1 1 i=1 These are, indeed, the ﬁnite dimensional distributions of the Dirichlet-Ferguson process. 2. Spaces of Convex Functions and Monotone Maps Throughout this paper, M will be a compact subset of a complete Riemannian manifold Mˆ with Riemannian distance d and m will denote a probability measure with support M, absolutely continuous with respect to the volume measure. We assume that it satisﬁes a Poincar´e inequality: c>0 ∃ (cid:90) (cid:90) u2dm c u2dm |∇ | ≥ · M M for all weakly diﬀerentiable u:M R with (cid:82) udm=0. → M For compact Riemannian manifolds, there is a canonical choice for m, namely, thenormalizedRiemannianvolumemeasure.Thefreedomtochoosemarbitrarily might be of advantage in view of future extensions: For Finsler manifolds and for non-compact Riemannian manifolds there is no such canonical probability mea- sure. The main ingredient of our construction below will be the Brenier-McCann repre- sentation of optimal transport in terms of gradients of convex functions. Deﬁnition 2.1. A function ϕ : M R is called d2/2-convex if there exists a function ψ :M R such that → → (cid:20) (cid:21) 1 ϕ(x)= inf d2(x,y)+ψ(y) −y∈M 2 for all x M. In this case, ϕ is called generalized Legendre transform of ψ or ∈ conjugate of ψ and denoted by ϕ=ψc. 4 Karl-Theodor Sturm Let us summarize some of the basic facts on d2/2-convex functions. See [Ro70], [Ru¨96], [Mc01] and [Vi08] for details.1 Lemma 2.2. 
(i) A function ϕ is d2/2-convex if and only if ϕcc =ϕ. (ii) Every d2/2-convex function is bounded, Lipschitz continuous and diﬀeren- tiable almost everywhere with gradient bounded by D = sup d(x,y). x,y∈M In the sequel, = (M) will denote the set of d2/2-convex functions on M and ˜ = ˜(M)wilKldenoKtethesetofequivalenceclassesin withϕ ϕ iﬀϕ ϕ 1 2 1 2 K K K ∼ − is constant. will be regarded as a subset of the Sobolev space H1(M,m) with K norm (cid:20)(cid:90) (cid:90) (cid:21)1 2 u = u 2 dm+ u2dm H1 (cid:107) (cid:107) |(cid:53) | M M and ˜ = /const will be regarded as a subset of the space H˜1 = H1/const with K K norm (cid:20)(cid:90) (cid:21)1 2 u = u 2 dm . (cid:107) (cid:107)H˜1 |∇ | M Proposition 2.3. For each Borel map g :M M the following are equivalent: → (i) ϕ ˜ :g =exp( ϕ) a.e. on M; ∃ ∈K ∇ (ii) g is an optimal transport map from m to f m in the sense that it is a min- ∗ imizer of h (cid:82) d2(x,h(x))m(dx) among all Borel maps h : M M with (cid:55)→ M → h m=g m. ∗ ∗ In this case, the function ϕ ˜ in (i) is deﬁned uniquely. Moreover, in (ii) the ∈ K map f is the unique minimizer of the given minimization problem. A Borel map g :M M satisfying the properties of the previous proposition will → be called monotone map or optimal Lebesque transport. The set of m-equivalence classes of such maps will be denoted by = (M). Note that (M) does not G G G depend onthechoiceofm(aslongasmisabsolutelycontinuouswithfullsupport)! (M) will be regarded as a subset of the space of maps L2((M,m)(M,d)) with G metric d (f,g)=(cid:2)(cid:82) d2(f(x),g(x))m(dx)(cid:3)12. 2 M Accordingtoourdeﬁnitions,themapΥ:ϕ exp( ϕ)deﬁnesabijectionbetween ˜ and . Recall that = (M) denotes th(cid:55)→e set o∇f probability measures µ on M K G P P (equipped with its Borel σ-ﬁeld). Proposition 2.4. The map χ : g g m deﬁnes a bijection between and (M). ∗ (cid:55)→ G P That is, for each µ there exists a unique g – called Brenier map of µ – ∈ P ∈ G with µ=g m. 
∗ 1Afunctionϕisd2/2-convexinoursenseifandonlyifthefunction−ϕisc-concaveinthesense of [Ro70, Ru¨96, Mc01, Vi08] with cost function c(x,y)=d2(x,y)/2. In our presentation, the c standsfor‘conjugate’.Fortherelationbetweend2/2-convexityandusualconvexityonEuclidean spacewerefertochapter4. Entropic Measure on Multidimensional Spaces 5 The map χ of course strongly depends on the choice of the measure m. (If there is any ambiguity we denote it by χ .) m Duetothepreviousobservations,thereexistcanonicalbijectionsΥandχbetween the sets ˜, and . Actually, these bijections are even homeomorphisms with K G P respect to the natural topologies on these spaces. Proposition 2.5. Consider any sequence ϕ in ˜ with corresponding se- { n}n∈N K quences g = Υ(ϕ ) in and µ = χ(g ) in and let { n}n∈N { n }n∈N G { n}n∈N { n }n∈N P ϕ ˜, g =Υ(ϕ) , µ=χ(g) . Then the following are equivalent: ∈K ∈G ∈P (i) ϕ ϕ in H˜ n 1 −→ (ii) g g in L2((M,m),(M,d)) n −→ (iii) g g in m-probability on M n −→ (iv) µ µ in L2-Wasserstein distance d n W −→ (v) µ µ weakly. n −→ Proof. (i) (ii)CompactnessofM andsmoothnessoftheexponentialmapimply ⇔ that there exists δ >0 such that x M, v ,v T M with v , v D and 1 2 x 1 2 ∀ ∈ ∀ ∈ | | | |≤ v v <δ: 1 2 | − | 1 d(exp v ,exp v )/ v v 2. 2 ≤ x 1 x 2 | 1− 2 |TxM≤ Hence, ϕ ϕ in H˜1, that is (cid:82) ϕ (x) ϕ(x) 2 m(dx) 0, is equiv- alentto(cid:82)n −d→2(g (x),g(x))m(dx)M |∇0,nthat−is,∇tog |TxMg inL2((−M→,m),(M,d)). M n −→ n −→ (ii) (iii) Standard fact from integration theory (taking into account that ⇔ d(g ,g) is uniformly bounded due to compactness of M). n (ii) (iv) If µ =(g ) m and µ =g m then (g ,g) m is a coupling of µ and n n ∗ n ∗ n ∗ n ⇔ µ. Hence, (cid:90) d2 (µ ,µ) d2(g (x),g(x))m(dx). (2.1) W n ≤ n M (iv) (v) Trivial. ⇔ (ii) (iv) [Vi08], Corollary 5.21. ⇔ (cid:3) Remark 2.6. 
Since M is compact, assertion (ii) of the previous Proposition is equivalent to (iii’) g g in Lp((M,m),(M,d)) n −→ for any p [1, ) and similarly, assertion (iv) is equivalent to ∈ ∞ (iv’) µ µ in Lp-Wasserstein distance. n −→ Remark 2.7. In n = 1, the inequality in (2.1) is actually an equality. In other words, the map χ:( ,d ) ( ,d ) 2 W G → P is an isometry. This is no longer true in higher dimensions. The well-known fact (Prohorov’s theorem) that the space of probability measures onacompactspaceisitselfcompact,togetherwiththepreviouscontinuityresults immediately implies compactness of ˜ and . K G 6 Karl-Theodor Sturm Corollary 2.8. (i) ˜ is a compact subset of H˜1. K (ii) is a compact subset of L2((M,m),(M,d)). G 3. The Conjugation Map LetusrecallthedeﬁnitionoftheconjugationmapC :ϕ ϕc actingonfunctions K ϕ:M R as follows (cid:55)→ → (cid:20) (cid:21) 1 ϕc(x)= inf d2(x,y)+ϕ(y) . −y∈M 2 The map C maps bijective onto itself with C2 =Id. For each λ R, C (ϕ+ λ) = C (ϕ)K λ. HeKnce, C extends to a bijectioKn C : ˜ ˜. Co∈mposinKg this K − K K˜ K → K map with the bijections χ: and Υ: ˜ we obtain involutive bijections G →P K→G C =Υ C Υ−1 : G ◦ K˜ ◦ G →G and C =χ C χ−1 : , P G ◦ ◦ P →P called conjugation map on or on , respectively. Given a monotone map g , G P ∈G the monotone map gc :=C (g) G will be called conjugate map or generalized inverse map; given a probability mea- sure µ the probability measure ∈P µc :=C (µ) P will be called conjugate measure. Example 3.1. (i) Let M =Sn be the n-dimensional sphere, and m be the normal- ized Riemannian volume measure. Put µ=λδ +(1 λ)m a − for some point a M and λ ]0,1[. Then ∈ ∈ 1 µc = 1 m 1 λ M\Br(a)· − where r >0 is such that m(B (a))=λ. r [ Proof. The optimal transport map g = exp(∇ϕ) which pushes m to µ is determined by the d2/2-convexfunction ϕ=( 122(ˆπrr−2r−)ˆdd22((aa,(cid:48)x,x)˜)−(π−r)2˜ iinnBBrπ(−ar)(a(cid:48))=M\Br(a) Itsconjugateisthefunction r 1 ϕc(y)=− d2(a(cid:48),y)+ r(π−r). 
] 2π 2 Entropic Measure on Multidimensional Spaces 7 λδ a (1 λ)m − m µ ϕ ∇ m µc ϕc ∇ (ii) Let M = Sn, the n-dimensional sphere, and µ = δ for some a M. Then a ∈ µc =δ with a(cid:48) M being the antipodal point of a. a(cid:48) [ Proof. Limit of (i)∈as λ(cid:37)1. Alternatively: explicit calculations with ϕ(x)= 1[π2−d2(a,x)] 2 and „ 1 1 1 « 1 ϕc(y)=sup − d2(x,y)+ d2(a,x)− π2 =− d2(a(cid:48),y). ] x 2 2 2 2 (iii) Let M = Sn, the n-dimensional sphere, and µ = 1δ + 1δ with north and 2 a 2 a(cid:48) south pole a,a(cid:48) M. Then µc is the uniform distribution on the equator, the ∈ (n 1)-dimensional set Z of points of equal distance to a,a(cid:48). − (iv) Let M =S1 be the circle of length 1, m = uniform distribution and k (cid:88) µ= α δ i xi i=1 with points x <x <...<x <x in cyclic order on S1 and numbers α [0,1], 1 2 k 1 i (cid:80) ∈ α =1. Then i k (cid:88) µc = β δ i yi i=1 with β = x x and points y <y <...<y <y =y on S1 satisfying i i+1 i 1 2 k k+1 1 | − | y y =α . i+1 i i+1 | − | [Proof.EmbeddinginR1 andexplicitcalculationofdistributionandinversedistributionfunc- tions.] Remark 3.2. The conjugation map C : P P →P 8 Karl-Theodor Sturm depends on the choice of the reference measure m on M. Actually, we can choose two diﬀerent probability measures m , m and consider C =χ C χ−1. 1 2 P m2 ◦ G ◦ m1 Proposition 3.3. Let µ = g m be absolutely continuous with density η = dµ. ∗ ∈ P dm Put f =gc and ν =f m=µc. ∗ (i)Ifη >0a.s.thenthemeasureν isabsolutelycontinuouswithdensityρ= dν > dm 0 satisfying η(x) ρ(f(x))=ρ(x) η(g(x))=1 for a.e. x M. · · ∈ (ii) If ν is absolutely continuous then f(g(x))=g(f(x))=x for a.e. x M. ∈ (iii) Under the previous assumption the Jacobian detDf(x) and detDg(x) exist for almost every x M and satisfy ∈ detDf(g(x)) detDg(x)=detDf(x) detDg(f(x))=1, · · σ(x) η(x)=σ(f(x)) detDf(x), σ(x) ρ(x)=σ(g(x)) detDg(x) · · · · for almost every x M where σ = dm denotes the density of the reference ∈ dvol measure m with respect to the Riemannian volume measure vol. 
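Proof. (i) For each Borel function $v : M \to \mathbb{R}_+$

$$\int_M v \, d\nu = \int_M v \circ f \, dm = \int_M v \circ f \cdot \frac{1}{\eta} \, d\mu = \int_M v \circ f \cdot \frac{1}{\eta(g \circ f)} \, d\mu = \int_M v \cdot \frac{1}{\eta \circ g} \, dm.$$

Hence, $\nu$ is absolutely continuous with respect to $m$ with density $\frac{1}{\eta \circ g}$. Interchanging the roles of $\mu$ and $\nu$ (as well as $f$ and $g$) yields the second claim.

(ii), (iii) Part of the Brenier-McCann representation result of optimal transports. □

Corollary 3.4. Under the assumption $\eta > 0$ of the previous Proposition:

$$\mathrm{Ent}(\mu^c \mid m) = \mathrm{Ent}(m \mid \mu).$$

Proof. With notations from above

$$\mathrm{Ent}(\mu^c \mid m) = \int \rho \log \rho \, dm = \int \frac{1}{\eta \circ g} \log \frac{1}{\eta \circ g} \, dm = \int \frac{1}{\eta} \log \frac{1}{\eta} \, d\mu = \mathrm{Ent}(m \mid \mu).$$

□

Lemma 3.5. The conjugation map

$$C_{\mathcal{K}} : \mathcal{K} \to \mathcal{K}$$

is continuous.

Proof. To simplify notation denote $C_{\mathcal{K}}$ by $C$. Choose a countable dense set $\{y_i\}_{i \in \mathbb{N}}$ in $M$ and for $k \in \mathbb{N}$ define $C_k : \varphi \mapsto \varphi^c_k$ on $\mathcal{K}$ by

$$\varphi^c_k(x) = -\inf_{i=1,\ldots,k} \left[ \frac{1}{2} d^2(x,y_i) + \varphi(y_i) \right].$$

Then as $k \to \infty$

$$\varphi^c_k \nearrow \varphi^c \quad \text{pointwise on } M.$$

Recall that each $\varphi \in \mathcal{K}$ is Lipschitz continuous with Lipschitz constant $D$.

For each $\varepsilon > 0$ choose $k = k(\varepsilon) \in \mathbb{N}$ such that the set $\{y_i\}_{i=1,\ldots,k(\varepsilon)}$ is an $\varepsilon$-covering of the compact space $M$. Then

$$|C_k(\varphi)(x) - C(\varphi)(x)| \le \sup_{y \in M} \inf_{i=1,\ldots,k} \left| \frac{1}{2} d^2(x,y) - \frac{1}{2} d^2(x,y_i) + \varphi(y) - \varphi(y_i) \right| \le \sup_{y \in M} \inf_{i=1,\ldots,k} 2D \cdot d(y,y_i) \le 2D\varepsilon$$

uniformly in $x \in M$ and $\varphi \in \mathcal{K}$.

Now let us consider a sequence $(\varphi_l)_{l \in \mathbb{N}}$ in $\mathcal{K}$ with $\varphi_l \to \varphi$ in $H^1(M)$. Then for each $k \in \mathbb{N}$ as $l \to \infty$

$$C_k(\varphi_l) \to C_k(\varphi)$$

pointwise on $M$ and thus also in $L^2(M)$. Together with the previous uniform convergence of $C_k \to C$ it implies

$$C(\varphi_l) \to C(\varphi)$$

in $L^2(M)$ as $l \to \infty$. Moreover, we know that $\{C(\varphi_l)\}_{l \in \mathbb{N}}$ is bounded in $H^1(M)$ (since all gradients are bounded by $D$). Therefore, finally

$$C(\varphi_l) \to C(\varphi)$$

in $H^1(M)$ as $l \to \infty$. This proves the continuity of $C : \mathcal{K} \to \mathcal{K}$ with respect to the $H^1$-norm. □

Theorem 3.6. The conjugation map

$$C_{\mathcal{P}} : \mathcal{P} \to \mathcal{P}$$

is continuous (with respect to the weak topology).

Proof. Let us first prove continuity of the conjugation map $C_{\tilde{\mathcal{K}}} : \tilde{\mathcal{K}} \to \tilde{\mathcal{K}}$ (with respect to the $\tilde{H}^1$-norm on $\tilde{\mathcal{K}}$).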
Indeed, this follows from the previous continuity result together with the facts that the embedding $H^1 \to \tilde{H}^1$, $\varphi \mapsto \tilde{\varphi} = \{\varphi + c : c \in \mathbb{R}\}$ is continuous (trivial fact) and that the map $\tilde{H}^1 \to H^1$, $\tilde{\varphi} = \{\varphi + c : c \in \mathbb{R}\} \mapsto \varphi - \int_M \varphi \, dm$ is continuous (consequence of the Poincaré inequality).

This in turn implies, due to Proposition 2.5, that the conjugation map

$$C_{\mathcal{G}} : \mathcal{G} \to \mathcal{G}$$

is continuous (with respect to the $L^2$-metric on $\mathcal{G}$). Moreover, due to the same Proposition it therefore also implies that the conjugation map

$$C_{\mathcal{P}} : \mathcal{P} \to \mathcal{P}$$

is continuous (with respect to the weak topology). □

Remark 3.7. In dimension $n = 1$, the conjugation map $C_{\mathcal{G}} : \mathcal{G} \to \mathcal{G}$ is even an isometry from $\mathcal{G}$, equipped with the $L^1$-metric, into itself.

4. Example: The Conjugation Map on $M \subset \mathbb{R}^n$

In this chapter, we will study in detail the Euclidean case. We assume that $M$ is a compact convex subset of $\mathbb{R}^n$. (The convexity assumption is to simplify notations and results.) The probability measure $m$ is assumed to be absolutely continuous with full support on $M$.

A function $\varphi : M \to \mathbb{R}$ is $d^2/2$-convex if and only if the function $\varphi_1(x) = \varphi(x) + |x|^2/2$ is convex in the usual sense:

$$\varphi_1(\lambda x + (1-\lambda) y) \le \lambda \varphi_1(x) + (1-\lambda) \varphi_1(y)$$

(for all $x, y \in M$ and $\lambda \in [0,1]$) and if its subdifferential lies in $M$:

$$\partial \varphi_1(x) \subset M$$

for all $x \in M$.

A function $\psi$ is the conjugate of $\varphi$ if and only if the function $\psi_1(y) = \psi(y) + |y|^2/2$ is the Legendre-Fenchel transform of $\varphi_1$:

$$\psi_1(y) = \sup_{x \in M} \left[ \langle x, y \rangle - \varphi_1(x) \right].$$

A Borel map $g : M \to M$ is monotone if and only if

$$\langle g(x) - g(y), x - y \rangle \ge 0$$

for a.e. $x, y \in M$. Equivalently, $g$ is monotone if and only if $g = \nabla \varphi_1$ for some convex $\varphi_1 : M \to \mathbb{R}$.

Lemma 4.1. (i) If $\mu = \lambda \delta_z + (1-\lambda)\nu$ then there exists an open convex set $U \subset M$ with $m(U) = \lambda$ such that the optimal transport map $g$ with $g_* m = \mu$ satisfies $g \equiv z$ a.e. on $U$.

(ii) The conjugate measure $\mu^c$ does not charge $U$: $\mu^c(U) = 0$.

Proof. (i) Linearity of the problem allows us to assume that $z = 0$.
Let $g = \nabla \varphi_1$ denote the optimal transport map with $\varphi_1$ being an appropriate convex function. Let $V$ be the subset of points in $M$ in which $\varphi_1$ is weakly differentiable with vanishing gradient. By the push forward property it follows that $m(V) = \lambda$. Firstly, then convexity of $\varphi_1$ implies that $\varphi_1$ has to be constant on $V$, say $\varphi_1 \equiv \alpha$ on $V$. Secondly, the latter implies that $\varphi_1 \equiv \alpha$ on the convex hull $W$ of $V$. The interior $U$ of this convex set $W$ has volume $m(U) = m(W) \ge m(V) = \lambda$ and $\varphi_1$ is constant on $U$, hence, differentiable with vanishing gradient. Thus finally $U \subset V$ and $m(U) = \lambda$.

(ii) Let $\mu_\epsilon$, $\epsilon \in [0,1]$, denote the intermediate points on the geodesic from $\mu_0 = \mu$ to $\mu_1 = m$. Then $\mu_\epsilon = (g_\epsilon)_* m$ with $g_\epsilon = \exp((1-\epsilon)\nabla\varphi) = \epsilon \cdot \mathrm{Id} + (1-\epsilon) \cdot g$ and each $\mu_\epsilon$ is absolutely continuous with respect to $m$. Hence, $g_\epsilon^c = g_\epsilon^{-1}$ a.e. on $M$. Therefore, the conjugate measure $\mu_\epsilon^c$ satisfies

$$\mu_\epsilon^c(U) = m\left( (g_\epsilon^c)^{-1}(U) \right) = m(g_\epsilon(U)) = \epsilon^n \cdot m(U) = \epsilon^n \cdot \lambda.$$
