
Entropic Measure on Multidimensional Spaces

Karl-Theodor Sturm

arXiv:0901.1815v1 [math.PR] 13 Jan 2009

Abstract. We construct the entropic measure $\mathbb{P}^\beta$ on compact manifolds of any dimension. It is defined as the push forward of the Dirichlet process (another random probability measure, well known to exist on spaces of any dimension) under the conjugation map
$$\mathfrak{C}: \mathcal{P}(M) \to \mathcal{P}(M).$$
This conjugation map is a continuous involution. It can be regarded as the canonical extension to higher dimensional spaces of a map between probability measures on 1-dimensional spaces characterized by the fact that the distribution functions of $\mu$ and $\mathfrak{C}(\mu)$ are inverse to each other.
We also present a heuristic interpretation of the entropic measure as
$$d\mathbb{P}^\beta(\mu) = \frac{1}{Z} \exp(-\beta \cdot \mathrm{Ent}(\mu|m)) \cdot d\mathbb{P}^0(\mu).$$

Mathematics Subject Classification (2000). 60G57; 28C20; 49N90; 49Q20; 58J65.

Keywords. Optimal transport, entropic measure, Wasserstein space, entropy, gradient flow, Brenier map, Dirichlet distribution, random probability measure.

1. Introduction

Gradient flows of entropy-like functionals on the Wasserstein space turned out to be a powerful tool in the study of various dissipative PDEs on Euclidean or Riemannian spaces $M$, the prominent example being the heat equation. See e.g. the monographs [Vi03, AGS05] for more examples and further references.

In [RS08], von Renesse and the author presented an approach to stochastic perturbation of the gradient flow of the entropy. It is based on the construction of a Dirichlet form
$$\mathcal{E}(u,u) = \int_{\mathcal{P}(M)} \|\nabla u\|^2(\mu)\, d\mathbb{P}^\beta(\mu)$$
where $\|\nabla u\|$ denotes the norm of the gradient in the Wasserstein space $\mathcal{P}(M)$ as introduced by Otto [Ot01]. The fundamental new ingredient was the measure $\mathbb{P}^\beta$ on the Wasserstein space. This so-called entropic measure is an interesting and challenging object in its own right.
It is formally introduced as
$$d\mathbb{P}^\beta(\mu) = \frac{1}{Z} \exp(-\beta \cdot \mathrm{Ent}(\mu|m)) \cdot d\mathbb{P}^0(\mu) \qquad (1.1)$$
with some (non-existing) 'uniform distribution' $\mathbb{P}^0$ on the Wasserstein space $\mathcal{P}(M)$ and the relative entropy as a potential.

A rigorous construction was presented for 1-dimensional spaces. In the case $M = [0,1]$ it is based on the bijections
$$\mu \;\xleftrightarrow{\ f(x)=\mu([0,x])\ }\; f \;\xleftrightarrow{\ g=f^{(-1)}\ }\; g \;\xleftrightarrow{\ g(y)=\nu([0,y])\ }\; \nu$$
between probability measures, distribution functions and inverse distribution functions (where $f^{(-1)}(y) = \inf\{x \ge 0 : f(x) \ge y\}$ more precisely denotes the 'right inverse' of $f$). If $\mathfrak{C}: \mathcal{P}(M) \to \mathcal{P}(M)$ denotes the map $\mu \mapsto \nu$ then the entropic measure $\mathbb{P}^\beta$ is just the push forward under $\mathfrak{C}$ of the Dirichlet-Ferguson process $\mathbb{Q}^\beta$. The latter is a random probability measure which is well-defined on every probability space.

For a long time it seemed that the previous construction is definitively limited to dimension 1 since it heavily depends on the use of distribution functions (and inverse distribution functions), objects which do not exist in higher dimensions. The crucial observation to overcome this restriction is to interpret $g$ as the unique optimal transport map which pushes forward $m$ (the normalized uniform distribution on $M$) to $\mu$:
$$\mu = g_* m.$$
Due to Brenier [Br87] and McCann [Mc01] such a 'monotone map' exists for each probability measure $\mu$ on a Riemannian manifold of arbitrary dimension. Moreover, also in higher dimensions such a monotone map $g$ has a unique generalized inverse $f$, again being a monotone map (with generalized inverse being $g$). This observation allows us to define the conjugation map
$$\mathfrak{C}: \mathcal{P}(M) \to \mathcal{P}(M), \qquad \mu \mapsto \nu$$
for any compact manifold $M$. It is a continuous involution. By means of this map we define the entropic measure as follows:
$$\mathbb{P}^\beta := \mathfrak{C}_* \mathbb{Q}^\beta$$
where $\mathbb{Q}^\beta$ denotes the Dirichlet-Ferguson process on $M$ with intensity measure $\beta \cdot m$. (Actually, such a random probability measure exists on every probability space.)

In order to justify our definition of the entropic measure by some heuristic argument let us assume that $\mathbb{P}^\beta$ were given as in (1.1).
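The one-dimensional bijections above can be made concrete for a purely atomic measure. The following is a minimal sketch, not the author's construction: it assumes $M = [0,1]$ with $m$ the uniform distribution, takes $\nu([0,y])$ to be the right-continuous version of $g$, and the function name `conjugate_1d` is hypothetical. Under these assumptions the atoms of $\nu$ sit at the values of the distribution function of $\mu$, and their masses are the gaps between consecutive atoms of $\mu$.

```python
from itertools import accumulate


def conjugate_1d(atoms, weights, tol=1e-12):
    """Conjugate of mu = sum_i weights[i] * delta_{atoms[i]} on M = [0, 1].

    Assumptions (not from the source): m is the uniform distribution on
    [0, 1], all atoms lie in [0, 1] and the weights sum to 1.
    Returns the atoms and weights of the conjugate measure nu.
    """
    pairs = sorted(zip(atoms, weights))
    xs = [x for x, _ in pairs]
    ws = [w for _, w in pairs]
    # The inverse distribution function g jumps at the values
    # 0, F_1, ..., F_k of the distribution function f of mu,
    # so these values are the atoms of nu.
    new_atoms = [0.0] + list(accumulate(ws))
    # The jump heights of g are the gaps between consecutive atoms of mu,
    # with the end points 0 and 1 of the interval added.
    grid = [0.0] + xs + [1.0]
    new_weights = [b - a for a, b in zip(grid, grid[1:])]
    # Drop atoms of (numerically) zero mass.
    kept = [(y, w) for y, w in zip(new_atoms, new_weights) if w > tol]
    return [y for y, _ in kept], [w for _, w in kept]


ys, bs = conjugate_1d([0.2, 0.7], [0.4, 0.6])
xs2, ws2 = conjugate_1d(ys, bs)  # applying the map twice
```

In this example the second application returns the original atoms and weights up to rounding, which is the involution property in the simplest setting.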
The identity $\mathbb{Q}^\beta = \mathfrak{C}_* \mathbb{P}^\beta$ then defines a probability measure which satisfies
$$d\mathbb{Q}^\beta(\nu) = \frac{1}{Z} \exp(-\beta \cdot \mathrm{Ent}(m|\nu)) \cdot d\mathbb{Q}^0(\nu). \qquad (1.2)$$
Given a measurable partition $M = \bigcup_{i=1}^N M_i$ and approximating arbitrary probability measures $\nu$ by measures with constant density on each of the sets $M_i$ of the partition the previous ansatz (1.2) yields, after some manipulations,
$$\mathbb{Q}^\beta_{M_1,\ldots,M_N}(dx) = \frac{\Gamma(\beta)}{\prod_{i=1}^N \Gamma(\beta m(M_i))} \cdot x_1^{\beta \cdot m(M_1)-1} \cdot \ldots \cdot x_{N-1}^{\beta \cdot m(M_{N-1})-1} \cdot x_N^{\beta \cdot m(M_N)-1} \times \delta_{(1-\sum_{i=1}^{N-1} x_i)}(dx_N)\, dx_{N-1} \ldots dx_1.$$
These are, indeed, the finite dimensional distributions of the Dirichlet-Ferguson process.

2. Spaces of Convex Functions and Monotone Maps

Throughout this paper, $M$ will be a compact subset of a complete Riemannian manifold $\hat{M}$ with Riemannian distance $d$ and $m$ will denote a probability measure with support $M$, absolutely continuous with respect to the volume measure. We assume that it satisfies a Poincaré inequality: $\exists c > 0$:
$$\int_M |\nabla u|^2\, dm \ge c \cdot \int_M u^2\, dm$$
for all weakly differentiable $u: M \to \mathbb{R}$ with $\int_M u\, dm = 0$.

For compact Riemannian manifolds, there is a canonical choice for $m$, namely, the normalized Riemannian volume measure. The freedom to choose $m$ arbitrarily might be of advantage in view of future extensions: For Finsler manifolds and for non-compact Riemannian manifolds there is no such canonical probability measure.

The main ingredient of our construction below will be the Brenier-McCann representation of optimal transport in terms of gradients of convex functions.

Definition 2.1. A function $\varphi: M \to \mathbb{R}$ is called $d^2/2$-convex if there exists a function $\psi: M \to \mathbb{R}$ such that
$$\varphi(x) = -\inf_{y \in M} \left[ \frac{1}{2} d^2(x,y) + \psi(y) \right]$$
for all $x \in M$. In this case, $\varphi$ is called generalized Legendre transform of $\psi$ or conjugate of $\psi$ and denoted by $\varphi = \psi^c$.

Let us summarize some of the basic facts on $d^2/2$-convex functions. See [Ro70], [Rü96], [Mc01] and [Vi08] for details (see also footnote 1).

Lemma 2.2.
(i) A function $\varphi$ is $d^2/2$-convex if and only if $\varphi^{cc} = \varphi$.
(ii) Every $d^2/2$-convex function is bounded, Lipschitz continuous and differentiable almost everywhere with gradient bounded by $D = \sup_{x,y \in M} d(x,y)$.

In the sequel, $\mathcal{K} = \mathcal{K}(M)$ will denote the set of $d^2/2$-convex functions on $M$ and $\tilde{\mathcal{K}} = \tilde{\mathcal{K}}(M)$ will denote the set of equivalence classes in $\mathcal{K}$ with $\varphi_1 \sim \varphi_2$ iff $\varphi_1 - \varphi_2$ is constant. $\mathcal{K}$ will be regarded as a subset of the Sobolev space $H^1(M,m)$ with norm
$$\|u\|_{H^1} = \left[ \int_M |\nabla u|^2\, dm + \int_M u^2\, dm \right]^{1/2}$$
and $\tilde{\mathcal{K}} = \mathcal{K}/\mathrm{const}$ will be regarded as a subset of the space $\tilde{H}^1 = H^1/\mathrm{const}$ with norm
$$\|u\|_{\tilde{H}^1} = \left[ \int_M |\nabla u|^2\, dm \right]^{1/2}.$$

Proposition 2.3. For each Borel map $g: M \to M$ the following are equivalent:
(i) $\exists \varphi \in \tilde{\mathcal{K}}$: $g = \exp(\nabla\varphi)$ a.e. on $M$;
(ii) $g$ is an optimal transport map from $m$ to $g_* m$ in the sense that it is a minimizer of $h \mapsto \int_M d^2(x,h(x))\, m(dx)$ among all Borel maps $h: M \to M$ with $h_* m = g_* m$.
In this case, the function $\varphi \in \tilde{\mathcal{K}}$ in (i) is defined uniquely. Moreover, in (ii) the map $g$ is the unique minimizer of the given minimization problem.

A Borel map $g: M \to M$ satisfying the properties of the previous proposition will be called monotone map or optimal Lebesgue transport. The set of $m$-equivalence classes of such maps will be denoted by $\mathcal{G} = \mathcal{G}(M)$. Note that $\mathcal{G}(M)$ does not depend on the choice of $m$ (as long as $m$ is absolutely continuous with full support)! $\mathcal{G}(M)$ will be regarded as a subset of the space of maps $L^2((M,m),(M,d))$ with metric
$$d_2(f,g) = \left[ \int_M d^2(f(x),g(x))\, m(dx) \right]^{1/2}.$$

According to our definitions, the map $\Upsilon: \varphi \mapsto \exp(\nabla\varphi)$ defines a bijection between $\tilde{\mathcal{K}}$ and $\mathcal{G}$. Recall that $\mathcal{P} = \mathcal{P}(M)$ denotes the set of probability measures $\mu$ on $M$ (equipped with its Borel $\sigma$-field).

Proposition 2.4. The map $\chi: g \mapsto g_* m$ defines a bijection between $\mathcal{G}$ and $\mathcal{P}$. That is, for each $\mu \in \mathcal{P}$ there exists a unique $g \in \mathcal{G}$, called Brenier map of $\mu$, with $\mu = g_* m$.
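Lemma 2.2(i) can be checked numerically on a finite set of points. The sketch below is an illustration under stated assumptions, not part of the paper: $M$ is replaced by a grid of 21 points in $[0,1]$ with $d(x,y) = |x-y|$, and the helper name `conjugate` is hypothetical. It starts from an arbitrary function $\psi$, forms $\varphi = \psi^c$ (which is $d^2/2$-convex by Definition 2.1), and compares $\varphi^{cc}$ with $\varphi$ as well as $\psi^{cc}$ with $\psi$.

```python
def conjugate(phi, points):
    """d^2/2-conjugate on a finite subset `points` of the real line:
    phi^c(x) = -min_y [ (x - y)^2 / 2 + phi(y) ]."""
    return [
        -min(0.5 * (x - y) ** 2 + p for y, p in zip(points, phi))
        for x in points
    ]


# Grid standing in for M = [0, 1] (an assumption made for illustration).
points = [i / 20 for i in range(21)]

# An arbitrary function psi; it has a sharp cusp and is not d^2/2-convex.
psi = [abs(y - 0.3) ** 0.5 for y in points]

# phi = psi^c is d^2/2-convex on the grid by Definition 2.1.
phi = conjugate(psi, points)
phi_cc = conjugate(conjugate(phi, points), points)
psi_cc = conjugate(conjugate(psi, points), points)

# Largest deviation between a function and its double conjugate.
err_phi = max(abs(a - b) for a, b in zip(phi, phi_cc))
err_psi = max(abs(a - b) for a, b in zip(psi, psi_cc))
```

Here `err_phi` vanishes up to rounding, while `err_psi` is clearly positive, in line with the characterization $\varphi^{cc} = \varphi$ of $d^2/2$-convexity.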
Footnote 1. A function $\varphi$ is $d^2/2$-convex in our sense if and only if the function $-\varphi$ is $c$-concave in the sense of [Ro70, Rü96, Mc01, Vi08] with cost function $c(x,y) = d^2(x,y)/2$. In our presentation, the $c$ stands for 'conjugate'. For the relation between $d^2/2$-convexity and usual convexity on Euclidean space we refer to chapter 4.

The map $\chi$ of course strongly depends on the choice of the measure $m$. (If there is any ambiguity we denote it by $\chi_m$.)

Due to the previous observations, there exist canonical bijections $\Upsilon$ and $\chi$ between the sets $\tilde{\mathcal{K}}$, $\mathcal{G}$ and $\mathcal{P}$. Actually, these bijections are even homeomorphisms with respect to the natural topologies on these spaces.

Proposition 2.5. Consider any sequence $\{\varphi_n\}_{n \in \mathbb{N}}$ in $\tilde{\mathcal{K}}$ with corresponding sequences $\{g_n\}_{n \in \mathbb{N}} = \{\Upsilon(\varphi_n)\}_{n \in \mathbb{N}}$ in $\mathcal{G}$ and $\{\mu_n\}_{n \in \mathbb{N}} = \{\chi(g_n)\}_{n \in \mathbb{N}}$ in $\mathcal{P}$ and let $\varphi \in \tilde{\mathcal{K}}$, $g = \Upsilon(\varphi) \in \mathcal{G}$, $\mu = \chi(g) \in \mathcal{P}$. Then the following are equivalent:
(i) $\varphi_n \to \varphi$ in $\tilde{H}^1$
(ii) $g_n \to g$ in $L^2((M,m),(M,d))$
(iii) $g_n \to g$ in $m$-probability on $M$
(iv) $\mu_n \to \mu$ in $L^2$-Wasserstein distance $d_W$
(v) $\mu_n \to \mu$ weakly.

Proof. (i) $\Leftrightarrow$ (ii) Compactness of $M$ and smoothness of the exponential map imply that there exists $\delta > 0$ such that $\forall x \in M$, $\forall v_1, v_2 \in T_x M$ with $|v_1|, |v_2| \le D$ and $|v_1 - v_2| < \delta$:
$$\frac{1}{2} \le d(\exp_x v_1, \exp_x v_2) / |v_1 - v_2|_{T_x M} \le 2.$$
Hence, $\varphi_n \to \varphi$ in $\tilde{H}^1$, that is $\int_M |\nabla\varphi_n(x) - \nabla\varphi(x)|^2_{T_x M}\, m(dx) \to 0$, is equivalent to $\int_M d^2(g_n(x),g(x))\, m(dx) \to 0$, that is, to $g_n \to g$ in $L^2((M,m),(M,d))$.
(ii) $\Leftrightarrow$ (iii) Standard fact from integration theory (taking into account that $d(g_n,g)$ is uniformly bounded due to compactness of $M$).
(ii) $\Rightarrow$ (iv) If $\mu_n = (g_n)_* m$ and $\mu = g_* m$ then $(g_n,g)_* m$ is a coupling of $\mu_n$ and $\mu$. Hence,
$$d_W^2(\mu_n,\mu) \le \int_M d^2(g_n(x),g(x))\, m(dx). \qquad (2.1)$$
(iv) $\Leftrightarrow$ (v) Trivial.
(iv) $\Rightarrow$ (ii) [Vi08], Corollary 5.21. $\square$

Remark 2.6.
Since $M$ is compact, assertion (ii) of the previous Proposition is equivalent to
(ii') $g_n \to g$ in $L^p((M,m),(M,d))$
for any $p \in [1,\infty)$ and similarly, assertion (iv) is equivalent to
(iv') $\mu_n \to \mu$ in $L^p$-Wasserstein distance.

Remark 2.7. In $n = 1$, the inequality in (2.1) is actually an equality. In other words, the map
$$\chi: (\mathcal{G}, d_2) \to (\mathcal{P}, d_W)$$
is an isometry. This is no longer true in higher dimensions.

The well-known fact (Prohorov's theorem) that the space of probability measures on a compact space is itself compact, together with the previous continuity results immediately implies compactness of $\tilde{\mathcal{K}}$ and $\mathcal{G}$.

Corollary 2.8. (i) $\tilde{\mathcal{K}}$ is a compact subset of $\tilde{H}^1$.
(ii) $\mathcal{G}$ is a compact subset of $L^2((M,m),(M,d))$.

3. The Conjugation Map

Let us recall the definition of the conjugation map $\mathfrak{C}_{\mathcal{K}}: \varphi \mapsto \varphi^c$ acting on functions $\varphi: M \to \mathbb{R}$ as follows
$$\varphi^c(x) = -\inf_{y \in M} \left[ \frac{1}{2} d^2(x,y) + \varphi(y) \right].$$
The map $\mathfrak{C}_{\mathcal{K}}$ maps $\mathcal{K}$ bijectively onto itself with $\mathfrak{C}_{\mathcal{K}}^2 = \mathrm{Id}$. For each $\lambda \in \mathbb{R}$, $\mathfrak{C}_{\mathcal{K}}(\varphi + \lambda) = \mathfrak{C}_{\mathcal{K}}(\varphi) - \lambda$. Hence, $\mathfrak{C}_{\mathcal{K}}$ extends to a bijection $\mathfrak{C}_{\tilde{\mathcal{K}}}: \tilde{\mathcal{K}} \to \tilde{\mathcal{K}}$. Composing this map with the bijections $\chi: \mathcal{G} \to \mathcal{P}$ and $\Upsilon: \tilde{\mathcal{K}} \to \mathcal{G}$ we obtain involutive bijections
$$\mathfrak{C}_{\mathcal{G}} = \Upsilon \circ \mathfrak{C}_{\tilde{\mathcal{K}}} \circ \Upsilon^{-1}: \mathcal{G} \to \mathcal{G}$$
and
$$\mathfrak{C}_{\mathcal{P}} = \chi \circ \mathfrak{C}_{\mathcal{G}} \circ \chi^{-1}: \mathcal{P} \to \mathcal{P},$$
called conjugation map on $\mathcal{G}$ or on $\mathcal{P}$, respectively. Given a monotone map $g \in \mathcal{G}$, the monotone map
$$g^c := \mathfrak{C}_{\mathcal{G}}(g)$$
will be called conjugate map or generalized inverse map; given a probability measure $\mu \in \mathcal{P}$ the probability measure
$$\mu^c := \mathfrak{C}_{\mathcal{P}}(\mu)$$
will be called conjugate measure.

Example 3.1. (i) Let $M = S^n$ be the $n$-dimensional sphere, and $m$ be the normalized Riemannian volume measure. Put
$$\mu = \lambda\delta_a + (1-\lambda)m$$
for some point $a \in M$ and $\lambda \in\, ]0,1[$. Then
$$\mu^c = \frac{1}{1-\lambda}\, 1_{M \setminus B_r(a)} \cdot m$$
where $r > 0$ is such that $m(B_r(a)) = \lambda$.
[Proof. The optimal transport map $g = \exp(\nabla\varphi)$ which pushes $m$ to $\mu$ is determined by the $d^2/2$-convex function
$$\varphi(x) = \begin{cases} \frac{1}{2}\left[ r^2 - d^2(a,x) \right] & \text{in } B_r(a), \\ \frac{r}{2(\pi-r)}\left[ d^2(a',x) - (\pi-r)^2 \right] & \text{in } B_{\pi-r}(a') = M \setminus B_r(a), \end{cases}$$
where $a'$ denotes the antipodal point of $a$. Its conjugate is the function
$$\varphi^c(y) = -\frac{r}{2\pi}\, d^2(a',y) + \frac{1}{2}\, r(\pi-r).\ ]$$

[Figure: the measure $\mu = \lambda\delta_a + (1-\lambda)m$ and the transports $m \to \mu$ by $\nabla\varphi$ and $m \to \mu^c$ by $\nabla\varphi^c$.]

(ii) Let $M = S^n$, the $n$-dimensional sphere, and $\mu = \delta_a$ for some $a \in M$. Then $\mu^c = \delta_{a'}$ with $a' \in M$ being the antipodal point of $a$.
[Proof. Limit of (i) as $\lambda \nearrow 1$. Alternatively: explicit calculations with
$$\varphi(x) = \frac{1}{2}\left[ \pi^2 - d^2(a,x) \right]$$
and
$$\varphi^c(y) = \sup_x \left( -\frac{1}{2} d^2(x,y) + \frac{1}{2} d^2(a,x) - \frac{1}{2}\pi^2 \right) = -\frac{1}{2} d^2(a',y).\ ]$$

(iii) Let $M = S^n$, the $n$-dimensional sphere, and $\mu = \frac{1}{2}\delta_a + \frac{1}{2}\delta_{a'}$ with north and south pole $a, a' \in M$. Then $\mu^c$ is the uniform distribution on the equator, the $(n-1)$-dimensional set $Z$ of points of equal distance to $a, a'$.

(iv) Let $M = S^1$ be the circle of length 1, $m$ = uniform distribution and
$$\mu = \sum_{i=1}^k \alpha_i \delta_{x_i}$$
with points $x_1 < x_2 < \ldots < x_k < x_1$ in cyclic order on $S^1$ and numbers $\alpha_i \in [0,1]$, $\sum \alpha_i = 1$. Then
$$\mu^c = \sum_{i=1}^k \beta_i \delta_{y_i}$$
with $\beta_i = |x_{i+1} - x_i|$ and points $y_1 < y_2 < \ldots < y_k < y_{k+1} = y_1$ on $S^1$ satisfying
$$|y_{i+1} - y_i| = \alpha_{i+1}.$$
[Proof. Embedding in $\mathbb{R}^1$ and explicit calculation of distribution and inverse distribution functions.]

Remark 3.2. The conjugation map
$$\mathfrak{C}_{\mathcal{P}}: \mathcal{P} \to \mathcal{P}$$
depends on the choice of the reference measure $m$ on $M$. Actually, we can choose two different probability measures $m_1$, $m_2$ and consider $\mathfrak{C}_{\mathcal{P}} = \chi_{m_2} \circ \mathfrak{C}_{\mathcal{G}} \circ \chi_{m_1}^{-1}$.

Proposition 3.3. Let $\mu = g_* m \in \mathcal{P}$ be absolutely continuous with density $\eta = \frac{d\mu}{dm}$. Put $f = g^c$ and $\nu = f_* m = \mu^c$.
(i) If $\eta > 0$ a.s. then the measure $\nu$ is absolutely continuous with density $\rho = \frac{d\nu}{dm} > 0$ satisfying
$$\eta(x) \cdot \rho(f(x)) = \rho(x) \cdot \eta(g(x)) = 1$$
for a.e. $x \in M$.
(ii) If $\nu$ is absolutely continuous then $f(g(x)) = g(f(x)) = x$ for a.e. $x \in M$.
(iii) Under the previous assumption the Jacobians $\det Df(x)$ and $\det Dg(x)$ exist for almost every $x \in M$ and satisfy
$$\det Df(g(x)) \cdot \det Dg(x) = \det Df(x) \cdot \det Dg(f(x)) = 1,$$
$$\sigma(x) \cdot \eta(x) = \sigma(f(x)) \cdot \det Df(x), \qquad \sigma(x) \cdot \rho(x) = \sigma(g(x)) \cdot \det Dg(x)$$
for almost every $x \in M$ where $\sigma = \frac{dm}{d\mathrm{vol}}$ denotes the density of the reference measure $m$ with respect to the Riemannian volume measure $\mathrm{vol}$.
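The density relation in Proposition 3.3(i) can be verified by hand in a one-dimensional case. The sketch below is an assumed example, not taken from the paper: $M = [0,1]$, $m$ uniform, and $\mu$ with density $\eta(x) = 2x$. In dimension one the Brenier map is the inverse distribution function and its conjugate is the distribution function itself, so all four functions are explicit.

```python
import math


def eta(x):
    """Assumed density of mu with respect to m (uniform on [0, 1])."""
    return 2.0 * x


def g(y):
    """Brenier map of mu: the inverse of the distribution function x^2."""
    return math.sqrt(y)


def f(x):
    """Conjugate map f = g^c: the distribution function of mu."""
    return x * x


def rho(y):
    """Density of nu = f_* m; nu([0, y]) = sqrt(y), so rho is its derivative."""
    return 0.5 / math.sqrt(y)


# Pointwise check of eta(x) * rho(f(x)) = rho(x) * eta(g(x)) = 1.
samples = [0.1, 0.25, 0.5, 0.9]
products = [(eta(x) * rho(f(x)), rho(x) * eta(g(x))) for x in samples]

# Crude check that f pushes m forward to nu: the share of midpoints x
# with f(x) <= 0.25 should equal nu([0, 0.25]) = sqrt(0.25) = 0.5.
n = 10000
share = sum(1 for j in range(n) if f((j + 0.5) / n) <= 0.25) / n
```

Both products equal 1 up to rounding at every sample point, and `share` agrees with $\sqrt{0.25}$, which is consistent with $\rho = 1/(\eta \circ g)$.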
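Proof. (i) For each Borel function $v: M \to \mathbb{R}_+$
$$\int_M v\, d\nu = \int_M v \circ f\, dm = \int_M v \circ f \cdot \frac{1}{\eta}\, d\mu = \int_M v \circ f \cdot \frac{1}{\eta(g \circ f)}\, d\mu = \int_M v \cdot \frac{1}{\eta \circ g}\, dm.$$
Hence, $\nu$ is absolutely continuous with respect to $m$ with density $\frac{1}{\eta \circ g}$. Interchanging the roles of $\mu$ and $\nu$ (as well as $f$ and $g$) yields the second claim.
(ii), (iii) Part of Brenier-McCann representation result of optimal transports. $\square$

Corollary 3.4. Under the assumption $\eta > 0$ of the previous Proposition:
$$\mathrm{Ent}(\mu^c|m) = \mathrm{Ent}(m|\mu).$$

Proof. With notations from above
$$\mathrm{Ent}(\mu^c|m) = \int \rho \log \rho\, dm = \int \frac{1}{\eta \circ g} \log \frac{1}{\eta \circ g}\, dm = \int \frac{1}{\eta} \log \frac{1}{\eta}\, d\mu = \mathrm{Ent}(m|\mu). \qquad \square$$

Lemma 3.5. The conjugation map
$$\mathfrak{C}_{\mathcal{K}}: \mathcal{K} \to \mathcal{K}$$
is continuous.

Proof. To simplify notation denote $\mathfrak{C}_{\mathcal{K}}$ by $\mathfrak{C}$. Choose a countable dense set $\{y_i\}_{i \in \mathbb{N}}$ in $M$ and for $k \in \mathbb{N}$ define $\mathfrak{C}_k: \varphi \mapsto \varphi^c_k$ on $\mathcal{K}$ by
$$\varphi^c_k(x) = -\inf_{i=1,\ldots,k} \left[ \frac{1}{2} d^2(x,y_i) + \varphi(y_i) \right].$$
Then as $k \to \infty$
$$\varphi^c_k \nearrow \varphi^c \quad \text{pointwise on } M.$$
Recall that each $\varphi \in \mathcal{K}$ is Lipschitz continuous with Lipschitz constant $D$.
For each $\varepsilon > 0$ choose $k = k(\varepsilon) \in \mathbb{N}$ such that the set $\{y_i\}_{i=1,\ldots,k(\varepsilon)}$ is an $\varepsilon$-covering of the compact space $M$. Then
$$|\mathfrak{C}_k(\varphi)(x) - \mathfrak{C}(\varphi)(x)| \le \sup_{y \in M} \inf_{i=1,\ldots,k} \left| \frac{1}{2} d^2(x,y) - \frac{1}{2} d^2(x,y_i) + \varphi(y) - \varphi(y_i) \right| \le \sup_{y \in M} \inf_{i=1,\ldots,k} 2D \cdot d(y,y_i) \le 2D\varepsilon$$
uniformly in $x \in M$ and $\varphi \in \mathcal{K}$.
Now let us consider a sequence $(\varphi_l)_{l \in \mathbb{N}}$ in $\mathcal{K}$ with $\varphi_l \to \varphi$ in $H^1(M)$. Then for each $k \in \mathbb{N}$ as $l \to \infty$
$$\mathfrak{C}_k(\varphi_l) \to \mathfrak{C}_k(\varphi)$$
pointwise on $M$ and thus also in $L^2(M)$. Together with the previous uniform convergence of $\mathfrak{C}_k \to \mathfrak{C}$ it implies
$$\mathfrak{C}(\varphi_l) \to \mathfrak{C}(\varphi)$$
in $L^2(M)$ as $l \to \infty$. Moreover, we know that $\{\mathfrak{C}(\varphi_l)\}_{l \in \mathbb{N}}$ is bounded in $H^1(M)$ (since all gradients are bounded by $D$). Therefore, finally
$$\mathfrak{C}(\varphi_l) \to \mathfrak{C}(\varphi)$$
in $H^1(M)$ as $l \to \infty$. This proves the continuity of $\mathfrak{C}: \mathcal{K} \to \mathcal{K}$ with respect to the $H^1$-norm. $\square$

Theorem 3.6. The conjugation map
$$\mathfrak{C}_{\mathcal{P}}: \mathcal{P} \to \mathcal{P}$$
is continuous (with respect to the weak topology).

Proof.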
Let us first prove continuity of the conjugation map $\mathfrak{C}_{\tilde{\mathcal{K}}}: \tilde{\mathcal{K}} \to \tilde{\mathcal{K}}$ (with respect to the $\tilde{H}^1$-norm on $\tilde{\mathcal{K}}$). Indeed, this follows from the previous continuity result together with the facts that the embedding $H^1 \to \tilde{H}^1$, $\varphi \mapsto \tilde{\varphi} = \{\varphi + c : c \in \mathbb{R}\}$ is continuous (trivial fact) and that the map $\tilde{H}^1 \to H^1$, $\tilde{\varphi} = \{\varphi + c : c \in \mathbb{R}\} \mapsto \varphi - \int_M \varphi\, dm$ is continuous (consequence of Poincaré inequality).
This in turn implies, due to Proposition 2.5, that the conjugation map
$$\mathfrak{C}_{\mathcal{G}}: \mathcal{G} \to \mathcal{G}$$
is continuous (with respect to the $L^2$-metric on $\mathcal{G}$). Moreover, due to the same Proposition it therefore also implies that the conjugation map
$$\mathfrak{C}_{\mathcal{P}}: \mathcal{P} \to \mathcal{P}$$
is continuous (with respect to the weak topology). $\square$

Remark 3.7. In dimension $n = 1$, the conjugation map $\mathfrak{C}_{\mathcal{G}}: \mathcal{G} \to \mathcal{G}$ is even an isometry from $\mathcal{G}$, equipped with the $L^1$-metric, into itself.

4. Example: The Conjugation Map on $M \subset \mathbb{R}^n$

In this chapter, we will study in detail the Euclidean case. We assume that $M$ is a compact convex subset of $\mathbb{R}^n$. (The convexity assumption is to simplify notations and results.) The probability measure $m$ is assumed to be absolutely continuous with full support on $M$.

A function $\varphi: M \to \mathbb{R}$ is $d^2/2$-convex if and only if the function $\varphi_1(x) = \varphi(x) + |x|^2/2$ is convex in the usual sense:
$$\varphi_1(\lambda x + (1-\lambda)y) \le \lambda\varphi_1(x) + (1-\lambda)\varphi_1(y)$$
(for all $x,y \in M$ and $\lambda \in [0,1]$) and if its subdifferential lies in $M$:
$$\partial\varphi_1(x) \subset M$$
for all $x \in M$.

A function $\psi$ is the conjugate of $\varphi$ if and only if the function $\psi_1(y) = \psi(y) + |y|^2/2$ is the Legendre-Fenchel transform of $\varphi_1$:
$$\psi_1(y) = \sup_{x \in M} \left[ \langle x,y \rangle - \varphi_1(x) \right].$$

A Borel map $g: M \to M$ is monotone if and only if
$$\langle g(x) - g(y), x - y \rangle \ge 0$$
for a.e. $x,y \in M$. Equivalently, $g$ is monotone if and only if $g = \nabla\varphi_1$ for some convex $\varphi_1: M \to \mathbb{R}$.

Lemma 4.1. (i) If $\mu = \lambda\delta_z + (1-\lambda)\nu$ then there exists an open convex set $U \subset M$ with $m(U) = \lambda$ such that the optimal transport map $g$ with $g_* m = \mu$ satisfies $g \equiv z$ a.e. on $U$.
(ii) The conjugate measure $\mu^c$ does not charge $U$:
$$\mu^c(U) = 0.$$

Proof.
(i) Linearity of the problem allows to assume that $z = 0$. Let $g = \nabla\varphi_1$ denote the optimal transport map with $\varphi_1$ being an appropriate convex function. Let $V$ be the subset of points in $M$ in which $\varphi_1$ is weakly differentiable with vanishing gradient. By the push forward property it follows that $m(V) = \lambda$. Firstly, then convexity of $\varphi_1$ implies that $\varphi_1$ has to be constant on $V$, say $\varphi_1 \equiv \alpha$ on $V$. Secondly, the latter implies that $\varphi_1 \equiv \alpha$ on the convex hull $W$ of $V$. The interior $U$ of this convex set $W$ has volume $m(U) = m(W) \ge m(V) = \lambda$ and $\varphi_1$ is constant on $U$, hence, differentiable with vanishing gradient. Thus finally $U \subset V$ and $m(U) = \lambda$.
(ii) Let $\mu_\epsilon$, $\epsilon \in [0,1]$, denote the intermediate points on the geodesic from $\mu_0 = \mu$ to $\mu_1 = m$. Then $\mu_\epsilon = (g_\epsilon)_* m$ with $g_\epsilon = \exp((1-\epsilon)\nabla\varphi) = \epsilon \cdot \mathrm{Id} + (1-\epsilon) \cdot g$ and each $\mu_\epsilon$ is absolutely continuous w.r. to $m$. Hence, $g_\epsilon^c = g_\epsilon^{-1}$ a.e. on $M$. Therefore, the conjugate measure $\mu_\epsilon^c$ satisfies
$$\mu_\epsilon^c(U) = m\left( (g_\epsilon^c)^{-1}(U) \right) = m(g_\epsilon(U)) = \epsilon^n \cdot m(U) = \epsilon^n \cdot \lambda.$$
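Lemma 4.1 is easy to visualize in dimension one. The following is a minimal sketch under assumptions that are not in the paper: $M = [0,1]$, $m$ is Lebesgue measure, and $\mu = \lambda\delta_z + (1-\lambda)m$; the function name `brenier_map` and the values of `z` and `lam` are chosen for illustration. It uses the one-dimensional description from the introduction, where the distribution function of the conjugate measure is the inverse distribution function $g$ of $\mu$.

```python
def brenier_map(y, z, lam):
    """Inverse distribution function of mu = lam * delta_z + (1 - lam) * Lebesgue
    on [0, 1] (an assumed example, with 0 < lam < 1 and 0 <= z <= 1)."""
    a = (1.0 - lam) * z  # left end point of U
    b = a + lam          # right end point of U
    if y < a:
        return y / (1.0 - lam)
    if y <= b:
        return z         # g is constant, equal to z, on U = (a, b)
    return (y - lam) / (1.0 - lam)


z, lam = 0.3, 0.25
a = (1.0 - lam) * z
b = a + lam

# In dimension one, y -> g(y) is the distribution function of mu^c,
# so the mass of an interval is the increment of g over it.
mass_of_U = brenier_map(b, z, lam) - brenier_map(a, z, lam)
total_mass = brenier_map(1.0, z, lam) - brenier_map(0.0, z, lam)
length_of_U = b - a
```

In this example the interval $U = (a,b)$ has Lebesgue measure $\lambda$, the map $g$ is constant on it, and the conjugate measure assigns it no mass, while the total mass remains 1.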
