Moment bounds and geometric ergodicity of diffusions with random switching and unbounded transition rates
 Xin T. Tong^{1} and
 Andrew J. Majda^{2}
https://doi.org/10.1186/s40687-016-0089-2
© The Author(s) 2016
Received: 12 April 2016
Accepted: 26 October 2016
Published: 14 November 2016
Abstract
A diffusion with random switching is a Markov process that consists of a stochastic differential equation part \(X_t\) and a continuous-time Markov jump process part \(Y_t\). Such systems have a wide range of applications in which the transition rates of \(Y_t\) may be neither bounded nor Lipschitz. A new analytical framework is developed to understand the stability and ergodicity of these processes; it allows for genuinely unbounded transition rates. Assuming the averaged dynamics is dissipative, the first part of this paper explicitly demonstrates how to construct a polynomial Lyapunov function and, from it, moment bounds. When the transition rates have multiple scales, this construction arises, interestingly, as a dual process of the averaging of fast transitions. The coefficients of the Lyapunov function can be interpreted as the potential dissipation of each regime at different scales, and a comparison principle arises naturally under this interpretation. On the basis of these results, the second part of this paper establishes geometric ergodicity for the joint processes in two scenarios. If there is a commonly accessible regime that satisfies the minorization condition, the geometric convergence to the ergodic measure takes place in the total variation distance. If there is contraction on average, the geometric convergence takes place in a proper Wasserstein distance and is proved through an application of the asymptotic coupling framework.
1 Background
 1.
For stochastic lattice models in climate science [12, 14, 15, 25, 26, 31, 33, 48, 49], \(X_t\) represents the dry atmosphere, so (1.1) is the spatial discretization of a fluid equation. Meanwhile, \(Y_t\) represents the unresolved behavior of moisture and clouds.
 2.
In material science [21–24] and molecular biology [6, 45], \(X_t\) represents some macroscopic quantities such as the transmembrane electronic potential, and \(Y_t\) stands for the behavior of some particular clusters, proteins, channels and cells.
 3.
As a simulation strategy for complex processes [8, 33], a Markov jump process can be used as a stochastic parameterization of some subgrid scale processes. It reduces the model dimension and preserves most of the statistical quantities. It can be seen as the \(Y_t\) part in our joint process.
 4.
In filtering and predictive modeling [16, 30, 32], diffusions with random switching are used as test beds to quantify the uncertainty from model errors.

When does the joint process \(Z_t=(X_t,Y_t)\) possess an invariant measure? What kinds of statistics, for example moments of \(X_t\), are integrable under the invariant measures?

Is the invariant measure unique? How does it attract other statistical states?
In the last decade, a series of works have been devoted to the questions above [1, 3–5, 7, 11, 39, 46]. In the simplest setting, the transition rates are constants, \(\lambda (x,y,y')=\lambda (y,y')\), and \(X_t\) is driven by the linear equation (1.2). Both questions above are relatively well understood in this setting, thanks to an application of the Perron–Frobenius theorem [3, 11]. These intuitive results can be extended to nonconstant transition rates through a probabilistic coupling argument [7, 46]. However, for this argument to work, the transition rates have to be globally bounded and Lipschitz. This restriction excludes many important applications [14, 36] or imposes additional nonphysical compactness requirements on the model space [6]. This paper intends to bridge this gap by developing a new analytical framework.
 1.
Inspired by the formulation in [48, 49], we assume there is an \(X_t\)-controlled multiscale structure in the transition rates \(\lambda \), and that the fast averaging procedures induce dissipation. It is important to note that the multiple scales here are not introduced by an auxiliary variable \(\epsilon \) as in other standard settings [43, 44].
 2.
There is a comparison principle in favor of dissipation.
 1.
If there is a commonly reachable regime that satisfies the minorization condition, Theorem 3.4 proves geometric ergodicity in the total variation distance.
 2.
If there is contraction on average, and the transition rates and their first derivatives are bounded by the Lyapunov function, Theorem 3.5 shows geometric ergodicity in a proper Wasserstein distance.
The remainder of this paper is arranged as follows. Section 2 discusses criteria that lead to dissipation on average, and how to construct a Lyapunov function in different scenarios. Section 3 gives the precise statements of geometric ergodicity when there is a hypoelliptic regime or there is contraction on average. Conditions leading to the second scenario are briefly discussed and compared with the results in Sect. 2. The proofs of geometric ergodicity are contained in Sect. 4, where we also discuss how to verify the accessibility of one regime. Section 5 summarizes the results and discusses some related questions.
2 Dissipation on average and Lyapunov functions
A simple way to generalize (1.2) to a nonlinear setting is to assume that a rate function \(\gamma :F\to \mathcal {R}\) measures the dissipation or inflation of each regime in F.
Assumption 2.1
Notice that \(\gamma (y)\) could be negative, which corresponds to inverse dissipation, or inflation, and makes the global dissipation problem nontrivial. Given a transition–dissipation pair \((\lambda , \gamma )\), the main objective of this section is to find intuitive criteria that lead to dissipation on average in different scenarios, and to show that there exists a polynomial-like Lyapunov function.
2.1 Constant transition rates
Theorem 2.2
Suppose that \(X_t\) follows (1.2) and \(Y_t\) is an ergodic Markov jump process with constant transition rate matrix \(\Lambda \). Suppose also that the average dissipation is positive, \(\sum \pi (y)\gamma (y)>0\), with \(\pi \) being the ergodic measure of \(Y_t\). Let \(\Gamma \) be the diagonal matrix with entry \(\gamma (y)\) in the (y, y)th position. Then there is an \(m>0\) such that the spectrum of \(-m\Gamma +\Lambda \) lies in the negative half plane, and \(V(x,y)=a(y)|x|^m\) is a Lyapunov function. Here a, as a vector, is the Perron eigenvector of \(-m\Gamma +\Lambda \).
Proof
Second, at \(m=0\), the Perron eigenvalue of \(-m\Gamma +\Lambda =\Lambda \) is 0. Through a perturbation analysis in m in the positive direction, one can show that the spectrum of \(-m\Gamma +\Lambda \) lies in the negative half plane for small enough m, since the derivative of the Perron eigenvalue at \(m=0\) equals \(-\sum \pi (y)\gamma (y)<0\). The details of these results can be found in Proposition 4.2 of [3].
Combining these two arguments, we find a strictly positive m such that the spectrum of \(-m\Gamma +\Lambda \) is in the negative half plane, and the Perron eigenvector a of \(\exp ((-m\Gamma +\Lambda ) t)\) is an eigenvector of \(-m\Gamma +\Lambda \) associated with a negative eigenvalue, while all the components of a are strictly positive.\(\square \)
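The mechanism behind Theorem 2.2 can be made concrete numerically. Assume, for illustration only, that in regime y the dynamics (1.2) reads \(\hbox {d}X_t=-\gamma (Y_t)X_t\hbox {d}t+\sigma (Y_t)\hbox {d}W_t\) with bounded \(\sigma \) (this specific form is our assumption). Itô's formula applied to \(V(x,y)=a(y)|x|^m\), with \(m\ge 2\) so that V is smooth (smaller m only requires smoothing V near the origin), gives for the generator, denoted here by \(\mathcal {A}\),
$$\begin{aligned} \mathcal {A}V(x,y)=\big [(-m\Gamma +\Lambda )a\big ](y)\,|x|^m+O(|x|^{m-2}), \end{aligned}$$
so a positive eigenvector a with eigenvalue \(-\kappa <0\) yields \(\mathcal {A}V\le -\frac{\kappa }{2}V+C\). The sketch below uses only the Python standard library; the function names `shifted`, `perron` and `find_m` are hypothetical. It computes the Perron eigenpair of \(-m\Gamma +\Lambda \) by power iteration and halves m until that eigenvalue is negative, mirroring the small-m perturbation step of the proof.

```python
"""Sketch: Perron eigenpair of -m*Gamma + Lambda for a constant-rate chain.

All names here are hypothetical illustrations, not taken from the paper.
"""


def shifted(lam, gamma, m):
    """Return the matrix -m*Gamma + Lambda as a list of rows."""
    n = len(lam)
    return [[lam[i][j] - (m * gamma[i] if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def perron(a_mat, iters=10_000, tol=1e-14):
    """Perron eigenvalue and positive eigenvector of an irreducible matrix
    with nonnegative off-diagonal entries, via power iteration on A + c*I.

    c is chosen so that A + c*I has a positive diagonal; A + c*I is then a
    primitive nonnegative matrix whose dominant eigenvalue is the Perron
    eigenvalue of A plus c.
    """
    n = len(a_mat)
    c = 1.0 - min(a_mat[i][i] for i in range(n))
    b = [[a_mat[i][j] + (c if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    v = [1.0 / n] * n              # positive start vector, entries sum to 1
    est = 0.0
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        s = sum(w)                 # tends to the dominant eigenvalue of A + c*I
        v = [x / s for x in w]
        if abs(s - est) < tol:
            break
        est = s
    return s - c, v


def find_m(lam, gamma, m0=1.0, halvings=30):
    """Halve m until the Perron eigenvalue of -m*Gamma + Lambda is negative."""
    m = m0
    for _ in range(halvings):
        val, vec = perron(shifted(lam, gamma, m))
        if val < 0:
            return m, val, vec
        m /= 2
    raise ValueError("no suitable m found; is the average dissipation positive?")


if __name__ == "__main__":
    lam = [[-1.0, 1.0], [2.0, -2.0]]   # stationary law pi = (2/3, 1/3)
    gamma = [1.0, -0.5]                # regime 1 inflates; pi . gamma = 1/2 > 0
    m, val, a = find_m(lam, gamma)
    print(f"m = {m}, Perron eigenvalue = {val:.6f}, a = {a}")
```

In this two-state example the Perron eigenvalue is negative for \(m=1\) but positive for \(m=4\), which illustrates why the theorem only guarantees some, possibly small, order m when one regime inflates.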
Remark 2.3
2.2 Multiscale transitions: one fast scale
Since Lyapunov functions concern only large |x|, see Lemma 6.2, it is intuitive that the highest order transition \(\Lambda _J\) plays a dominant role; if the average of \(\gamma \) over the invariant measure of \(\Lambda _J\) is positive, then there should be a Lyapunov function that quantifies dissipation on average. This argument can be complicated in two ways: (1) the support of each \(\Lambda _j\) may not be the whole state space F, so different subsets of F may have different transition scales; (2) on the support of each \(\Lambda _j\), \(\Lambda _j\) may not induce an irreducible Markov chain. Here \(F'\subset F\) is the support of a transition rate matrix \(\Lambda \) if \(F'\) is the minimal subset such that \(\lambda (y,y')=0\) whenever y and \(y'\) are not both in \(F'\). We leave the first issue to the next subsection and focus first on the averaging phenomenon from multiscale transitions and possible reducible structures.
 1.
There is a constant \(F'\times F'\) matrix \(\Lambda _{F'}\) such that \(\Lambda (x)_{F'}-|x|^n \Lambda _{F'}\) is of order \(|x|^{n-\delta }\) for some \(\delta >0\). Here \(\Lambda (x)_{F'}\) is the principal submatrix of \(\Lambda (x)\) with indices in \(F'\).
 2.
For any \(y,y'\in F'\), there is a path \(y=y_0,y_1,\ldots , y_m=y'\) such that for each i, either \(\lambda _{F'}(y_i, y_{i+1})>0\) or \(\lambda _{F'}(y_{i+1},y_i)>0\).
 3.
\(F'\) is maximal: no strict superset of \(F'\) also satisfies the conditions above.
Next we define the irreducible components, which are also called closed communicating classes in the literature, e.g., [42]. A subset \(G\subset F'\) is an irreducible component if (1) for all \(y,y'\in G\), there is a path \(y=y_0,\ldots , y_n=y'\) in G such that \(\lambda _{F'}(y_i,y_{i+1})>0\); (2) for all \(y\in G,y'\notin G\), such a path does not exist. We use \(G^c\) to denote the transient set, which consists of the states in \(F'\) that do not belong to any irreducible component.
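To make conditions 1–3 concrete, here is a hedged Python sketch. Its input format is our own assumption: each nonzero rate is described only by its polynomial order in |x|, and the constant matrix \(\Lambda _{F'}\) of condition 1 is not checked. The hypothetical function `highest_order_components` finds the highest order \(n_J\) and groups the states joined by rates of that order into connected pieces, ignoring the direction of each transition as in condition 2, using a small union–find structure.

```python
from collections import defaultdict


def highest_order_components(order):
    """Group states linked by rates of the highest polynomial order.

    order: dict mapping a pair (y, y2) with a nonzero rate lambda(x, y, y2) to
    the (hypothetical) integer n such that the rate grows like |x|**n.
    Returns (n_max, components): the connected pieces of the undirected graph
    whose edges are the pairs of order n_max, as in condition 2.
    """
    n_max = max(order.values())
    parent = {}

    def find(y):
        parent.setdefault(y, y)
        while parent[y] != y:
            parent[y] = parent[parent[y]]   # path halving keeps trees shallow
            y = parent[y]
        return y

    for (y, y2), n in order.items():
        if n == n_max:
            parent[find(y)] = find(y2)      # union; the rate's direction is ignored
    groups = defaultdict(set)
    for y in list(parent):
        groups[find(y)].add(y)
    return n_max, sorted(groups.values(), key=sorted)


if __name__ == "__main__":
    order = {("a", "b"): 2, ("b", "a"): 2, ("b", "c"): 1,
             ("c", "d"): 0, ("d", "c"): 0, ("d", "e"): 2}
    print(highest_order_components(order))
```

States that carry no rate of the highest order, such as c in the usage example, are simply left out; in the multiscale picture they only enter at a later, slower averaging step.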
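The irreducible components and the transient set \(G^c\) depend only on which entries of \(\Lambda _{F'}\) are positive, so they can be found by graph reachability. In the following Python sketch (function names hypothetical), a state lies in an irreducible component when every state it can reach can also reach it back, and is transient otherwise.

```python
def reachable(rates, start):
    """States reachable from `start` along paths of positive rates (incl. start)."""
    seen, stack = {start}, [start]
    while stack:
        y = stack.pop()
        for (a, b), r in rates.items():
            if a == y and r > 0 and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def irreducible_components(states, rates):
    """Split `states` into closed communicating classes and the transient set.

    rates: dict mapping (y, y2) to the constant rate lambda_{F'}(y, y2).
    A class is closed when no state outside it is reachable, as in (1)-(2).
    """
    reach = {y: reachable(rates, y) for y in states}
    classes, transient = [], set()
    for y in states:
        cls = {z for z in reach[y] if y in reach[z]}   # mutual reachability
        if reach[y] == cls:                             # nothing else reachable
            if cls not in classes:
                classes.append(cls)
        else:
            transient.add(y)
    return classes, transient


if __name__ == "__main__":
    rates = {(0, 1): 1.0, (1, 0): 2.0, (2, 0): 1.0, (2, 3): 1.0}
    print(irreducible_components([0, 1, 2, 3], rates))
```

In the usage example, state 2 can leave towards both \(\{0,1\}\) and \(\{3\}\) but is never re-entered, so it is the only transient state.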
Now we consider the simple case where the highest order component is F itself.
Theorem 2.4
Proof
Lemma 2.5
Proof
2.3 Multiscale transitions: multiple scaling structures
When F itself is not the maximal connected component, the transitions inside a maximal connected component \(F'\) of highest order are significantly faster than the transitions outside it. These fast transitions average the dissipation of each irreducible component \(G_k\) inside \(F'\), and also the manner in which \(Y_t\) leaves \(F'\). From the perspective of the states outside \(F'\), each \(G_k\) is essentially a single point whose dissipation rate is the averaged dissipation over \(\pi _k\); each transient state \(y\in G^c\) is an intermediate state that can jump to any of the \(G_k\), while the time \(Y_t\) spends there is negligible, as long as the rates toward the non-irreducible parts, \(\lambda (x,y,y')\) with \(y'\in G^c\cup F/F'\), are not too strong.
Note that in these averaging procedures, the transition rates from \(y\in G^c\) to \(y'\in G^c\cup F/F'\) are completely wiped out. So we need these rates not to be too strong; otherwise the averaged structure cannot represent this information. In particular, we have the following nondominating condition.
Assumption 2.6
For any transient state y and any \(y'\in G^c\cup F/F'\), if \(p_{F'} (y,G_k)>0\), then there is a \(y''\in G_k\) such that \(\lambda (x,y,y')\) has at most the same polynomial order in |x| as \(\lambda (x,y'',y')\).
Since there are only finitely many states, there are only finitely many, say \(m_J\), maximal connected components with the highest order \(n_J\) in (2.11). After applying an averaging step to one of these components, \(F'\), the transition rates related to \(F'\) are of order strictly less than \(n_J\), and the averaged state space \(\widetilde{F}\) has no more states than F, \(|\widetilde{F}|\le |F|\). So after \(m_J\) steps, the transition rates are of order at most \(n_{J-1}\). We can repeat this argument J times and finally end up with an averaged transition matrix \(\widetilde{\Lambda }\) that is a constant matrix. Intuitively, this constant matrix dictates whether the original system is dissipative on average.
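One averaging step can be sketched as follows, under simplifying assumptions of our own: there are no transient states, the fast block \(G_k\) is irreducible with a constant leading-order generator, and the exit rates of the collapsed state are obtained by weighting the rates out of each \(y\in G_k\) with \(\pi _k(y)\). This weighting is the standard first-order lumping rule for a fast irreducible block and is not necessarily the exact formula used in the paper. The hypothetical helpers `stationary` and `average_component` use exact rational arithmetic from Python's `fractions` module, so no rounding enters.

```python
from fractions import Fraction as Fr


def stationary(gen):
    """Solve pi @ gen = 0 with sum(pi) = 1 exactly (gen irreducible generator)."""
    n = len(gen)
    # Equations sum_j pi_j gen[j][i] = 0, the last replaced by sum(pi) = 1.
    a = [[Fr(gen[j][i]) for j in range(n)] for i in range(n)]
    a[-1] = [Fr(1)] * n
    rhs = [Fr(0)] * (n - 1) + [Fr(1)]
    for col in range(n):                      # Gauss-Jordan elimination
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * w for u, w in zip(a[r], a[col])]
                rhs[r] -= f * rhs[col]
    return [rhs[i] / a[i][i] for i in range(n)]


def average_component(fast_gen, gamma, out_rates):
    """Collapse one fast irreducible block G_k into a single state.

    fast_gen:  leading-order generator restricted to G_k (rows sum to zero)
    gamma:     dissipation rate of each state of G_k
    out_rates: dict target -> list of rates from each state of G_k to target
    Returns (pi_k, averaged dissipation, averaged outgoing rates).
    """
    pi = stationary(fast_gen)
    gamma_bar = sum(p * Fr(g) for p, g in zip(pi, gamma))
    rates_bar = {t: sum(p * Fr(r) for p, r in zip(pi, rs))
                 for t, rs in out_rates.items()}
    return pi, gamma_bar, rates_bar


if __name__ == "__main__":
    # Fast block {a, b}: rate 2 from a to b, rate 1 from b to a.
    print(average_component([[-2, 2], [1, -1]], [1, Fr(-1, 4)], {"c": [3, 0]}))
```

In the usage example \(\pi _k=(1/3,2/3)\), the averaged dissipation is \(1/3-1/6=1/6\), and a rate 3 out of a alone becomes an averaged rate 1 from the collapsed state to c.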
Theorem 2.7
Let the state space \(\widetilde{F}\), constant transition rates \(\widetilde{\Lambda }\) and dissipation rates \(\tilde{\gamma }\) be the final result of a sequence of averaging procedures. Suppose that at each averaging step, the transient transition rates satisfy the nondominating condition, Assumption 2.6. Then the original system has a polynomial-like Lyapunov function of some order \(m>0\) if \(\widetilde{F}\) consists only of irreducible components of \(\widetilde{\Lambda }\) and on each of them the average dissipation of \(\tilde{\gamma }\) is positive. If in addition \(\tilde{\gamma }(y)>0\) for all \(y\in \widetilde{F}\), m can be any positive number.
Theorem 2.4 is the special case of the theorem above with a single averaging step, and its conditions are not optimal. But we keep Theorem 2.4 for its simpler intuition.
Proof of Theorem 2.7
Lastly, we note that the detail transition of V(z) is the sum of the detail transitions of \(V_0(z)\) and Q(z). The first part is of order m by the previous discussion, and \(\lambda (x,y,y')(Q(x,y')-Q(x,y))\) is of order at most m as well. So the detail transitions of V(z) are of order at most m. \(\square \)
Remark 2.8
Once we finish the construction of \(V=\sum a_i(y)|x|^{m_i}\) and look back, we can see that \(a_i(y)\) captures the potential dissipation with transition rates of order \(|x|^{m_I-m_i}\) within a maximal connected component of that order. Moreover, for y in this maximal connected component, the \(a_j(y)\) with \(j\ge i\) take identical values. In other words, from the values of the sequence \(\{a_i(y)\}_{i\le I}\), we can tell which connected component, and of which order, y belongs to. The following subsection gives a simple and concrete example.
2.4 A multiscale transition example
With the final step of averaging, we end up with one state in (3), so the transition matrix is the zero matrix. The dissipation rate is given by \(\tilde{\pi }(ab)\tilde{\gamma }(ab)+\tilde{\pi }(d)\tilde{\gamma }(d)=\frac{1}{3}\). So the whole system is dissipative on average, Theorem 2.7 applies, and the Lyapunov function can be of any order.
2.5 Comparison principle
Another way to deal with nonconstant transition rates is through a comparison principle. To be specific, suppose \((\Lambda (x), \gamma )\) is dissipative on average, which may be established by Theorem 2.2 or Theorems 2.4 and 2.7. Suppose also that in another transition–dissipation pair \((\widetilde{\Lambda }(x), \tilde{\gamma })\), the dissipation is stronger in all regimes, while the regime transitions are more favorable for dissipation; then, intuitively, \((\widetilde{\Lambda }(x), \tilde{\gamma })\) should also be dissipative on average.
Theorem 2.9
Proof
The simplicity of this proof comes from our interpretation of dissipation on average through polynomial Lyapunov functions. Comparison principles can also be demonstrated by coupling methods when the transition rates are bounded. Cloez and Hairer [7] and Shao [46] have shown dissipation on average in the following birth–death scenario through proofs of considerable length, whereas it is only a special case of Theorem 2.9.
Corollary 2.10
Proof
3 Geometric ergodicity
In Sect. 2, Lyapunov functions are constructed for diffusions with random switching when the dynamics is dissipative on average. Because these Lyapunov functions are polynomial, \(\mathbb {E}|X_t|^m\) is uniformly bounded for a proper m. Then by the Krylov–Bogoliubov theorem [9], there is at least one invariant measure for the joint process \(Z_t=(X_t,Y_t)\). It is natural to ask whether this invariant measure is unique, and how the law of \(Z_t\) converges to the unique invariant measure \(\pi \).
 1.
in total variation distance if there is a commonly reachable minorization regime;
 2.
in a proper Wasserstein distance if there is contraction on average.
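To get a feel for the objects involved, the joint process can be simulated crudely. The Python sketch below is purely illustrative and rests on assumptions of our own: a one-dimensional linear drift \(-\gamma (y)x\) with unit noise, a dissipative regime (\(\gamma =1\)) and an inflating one (\(\gamma =-1/2\)), a hypothetical unbounded rate \(1+x^2\) out of the inflating regime, and a first-order switching rule that is only accurate when rate times step size is small. The names `simulate` and `rate` are hypothetical. An empirical second moment that stabilizes over long runs is consistent with, but does not prove, the moment bounds and the existence of an invariant measure.

```python
import math
import random


def simulate(gamma, rate, n_steps, dt=1e-3, x0=0.0, y0=0, seed=0):
    """Crude Euler-Maruyama sketch of a 1D diffusion with random switching.

    Hypothetical model: dX = -gamma[Y] * X dt + dW, and Y jumps from y to y2
    at rate rate(x, y, y2). Each step switches with probability rate * dt,
    a first-order approximation that is only sensible when rate * dt << 1.
    Returns the list of (x, y) after each step.
    """
    rng = random.Random(seed)
    x, y = x0, y0
    path = []
    for _ in range(n_steps):
        x += -gamma[y] * x * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        for y2 in range(len(gamma)):
            if y2 != y and rng.random() < rate(x, y, y2) * dt:
                y = y2
                break
        path.append((x, y))
    return path


def rate(x, y, y2):
    # Hypothetical unbounded rates: the inflating regime 1 is left faster for large |x|.
    return 1.0 + x * x if y == 1 else 1.0


if __name__ == "__main__":
    path = simulate([1.0, -0.5], rate, n_steps=200_000)
    tail = [x * x for x, _ in path[len(path) // 2:]]
    print("empirical E|X|^2 over the second half:", sum(tail) / len(tail))
```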
3.1 Convergence in total variation with a minorization regime
Theorem 3.1
When F consists of only one regime, \(X_t\) is simply the solution of an SDE on \(\mathcal {R}^d\). In this context, following the arguments in [40], the minorization conditions of Theorem 3.1 can be verified by the hypoellipticity and reachability conditions below:
Assumption 3.2
 1. Hypoellipticity condition: let \(\mathcal {L}\) be the Lie algebra generated by the vector fields
$$\begin{aligned} \{f,\sigma _1,\ldots , \sigma _m\}, \end{aligned}$$
with \(\sigma _i\) being the columns of \(\sigma \), and let \(\mathcal {L}_0\) be the ideal in \(\mathcal {L}\) generated by \(\{\sigma _1,\ldots , \sigma _m\}\). Assume that \(\mathcal {L}_0\) spans \(\mathcal {R}^d\) at all points.
 2. Reachability condition: there is a point \(x_h\in \mathcal {R}^d\) such that for any compact set C and \(\epsilon >0\), there is a \(t_0\) such that from any \(x\in C\) there is a piecewise constant process \(w_t\) for which the solution to the ODE
$$\begin{aligned} \hbox {d}x_t=[b(x_t)+\sigma (x_t)w_t] \hbox {d}t,\quad x_0=x, \end{aligned}$$
satisfies \(|x_{t_0}-x_h|\le \epsilon \).
Theorem 3.4 below indicates that for diffusions with random switching, it suffices to check the minorization condition for one particular regime, using say Assumption 3.2, and to show that this regime is commonly accessible and satisfies a mild growth condition for \(V(Z_t)\). In particular, we define
Definition 3.3
Theorem 3.4
Let \(Z_t=(X_t,Y_t)\) be a diffusion with random switching that admits a Lyapunov function V. Suppose the transition rates satisfy \(\bar{\lambda }(z)\le M V(z)\), moreover there is a regime \(y_h\in F\) such that it is commonly accessible and has polynomial growth for V. Then if the SDE given by \(\hbox {d}X'_t=b(X'_t,y_h)\hbox {d}t+\sigma (X'_t,y_h)\hbox {d}W_t\) satisfies the minorization condition in Theorem 3.1, the diffusion \(Z_t\) has an invariant measure \(\pi \) and is geometrically ergodic under the total variation distance.
The proof is located in Sect. 4, where we will also show a simple way to verify the common accessibility of one regime.
3.2 Wasserstein metric convergence with contraction on average
The total variation norm is often too stringent to capture contraction. For example, consider the trivial deterministic process \(\hbox {d}X_t=-\rho X_t \hbox {d}t\) in \(\mathcal {R}\) with \(\rho >0\). The invariant measure is obviously \(\delta _0\), a point mass at the origin, and it attracts all other points. Yet, starting from any nonzero point, the distribution of \(X_t\) is a point mass at \(e^{-\rho t}X_0\), which has total variation distance 2 from \(\delta _0\).
The distance function here can be very flexible. One remarkable discovery in [17, 19] is that by properly incorporating the Lyapunov function into d, the corresponding Wasserstein distance can characterize relatively weak convergence. This is known as the asymptotic coupling framework. As for diffusions with random switching, this framework allows us to generalize geometric ergodicity results to cases where the transition rates and their derivatives are bounded only by the Lyapunov function.
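The contrast in the example above can be checked numerically. The sketch below compares, for the point masses \(\delta _{e^{-\rho t}X_0}\) and \(\delta _0\), the total variation distance (normalized so that distinct point masses are at distance 2, as in the text) with the Wasserstein distance for the truncated metric \(d(x,x')=1\wedge \delta ^{-1}|x-x'|\). This metric is an illustrative choice of ours and does not involve the Lyapunov function; the function names are hypothetical.

```python
import math


def tv_point_masses(p, q):
    """Total variation distance between delta_p and delta_q, normalized so
    that distinct point masses are at distance 2."""
    return 0.0 if p == q else 2.0


def wasserstein_point_masses(p, q, delta=1.0):
    """Wasserstein distance between delta_p and delta_q for the truncated
    metric d(x, x') = min(1, |x - x'| / delta).

    The only coupling of two point masses is the product coupling, so the
    Wasserstein distance equals d(p, q).
    """
    return min(1.0, abs(p - q) / delta)


if __name__ == "__main__":
    rho, x0, delta = 1.0, 1.0, 0.1
    for t in (0.0, 1.0, 5.0, 10.0):
        xt = math.exp(-rho * t) * x0     # law of X_t is the point mass at xt
        print(t, tv_point_masses(xt, 0.0), wasserstein_point_masses(xt, 0.0, delta))
```

The total variation column stays at 2 for every t, while the Wasserstein column equals 1 until \(e^{-\rho t}|X_0|<\delta \) and then decays like \(e^{-\rho t}\).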
Similar to the situation with dissipation, the notion of contraction on average is essential for our discussion. The precise statement is the following:
Theorem 3.5
 1. V(x, y) has polynomial growth in x, so there are n, M such that$$\begin{aligned} V(x+u, y)\le M(V(x,y)+|u|^{n})\quad \text {and}\quad V(x,y)\ge \frac{1}{M}|x|^{\frac{1}{n}}. \end{aligned}$$
 2. The transition rates and their derivatives are bounded by MV with a constant \(M>0\):$$\begin{aligned} \bar{\lambda }(z)\le MV(z),\quad \sum _{y'}|\nabla _x \lambda (x,y,y')|\le MV(z). \end{aligned}$$
 3. Each regime admits a contraction rate \(\rho \) in the sense of (3.2), and the averaged dynamics is contractive: there are \(C_\rho ,m,\bar{\rho }>0\) such that$$\begin{aligned} \mathbb {E}^z \exp \bigg (-m\int ^t_0 \rho (Y_s){\text {d}}s\bigg )\le C_\rho \exp (-\bar{\rho } t). \end{aligned}$$
 4.
There is a commonly accessible regime \(y_c\), such that \(\rho (y_c)>0\) and V has at most polynomial growth in the sense of Definition 3.3.
3.3 Contraction on average
Given the contraction rates \(\rho \) in each regime, the contraction-on-average condition (3.3) can be verified using arguments similar to those for dissipation on average in Sect. 2. Yet there is an important difference. When we construct a Lyapunov function, it suffices to consider the transitions for large |x|, and upper bounds need only hold modulo a constant; see Lemma 6.2. This is no longer the case for contraction on average, and we need the “\(\lesssim \)” inequalities in Sect. 2 to hold with “\(\le \)”. As a consequence, the spectrum and comparison arguments still work in a modified form, but the scaling argument no longer works.
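For constant transition rates, condition (3.3) reduces to a spectral condition, which is why the spectrum argument carries over. By the Feynman–Kac formula for a finite-state chain, \(u(t,y)=\mathbb {E}^y\exp (-m\int _0^t\rho (Y_s)\hbox {d}s)\) equals \([\exp (t(\Lambda -mR))\mathbf {1}](y)\) with \(R=\hbox {diag}(\rho )\); for an irreducible chain, (3.3) therefore holds exactly when the Perron eigenvalue of \(\Lambda -mR\) is negative, and one may then take \(\bar{\rho }\) equal to minus that eigenvalue. The pure-Python sketch below (helper names hypothetical; the x-dependent rates allowed in Theorem 3.6 are not covered) evaluates this vector with a scaling-and-squaring matrix exponential.

```python
import math


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(a, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)       # infinity norm
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in a]         # norm now <= 1/2
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):                                        # undo the scaling
        result = mat_mul(result, result)
    return result


def contraction_moment(lam, rho, m, t):
    """E^y exp(-m * int_0^t rho(Y_s) ds) for every starting regime y.

    For constant rates, the Feynman-Kac formula gives the vector
    exp(t * (Lambda - m * R)) @ 1 with R = diag(rho).
    """
    n = len(lam)
    a = [[t * (lam[i][j] - (m * rho[i] if i == j else 0.0)) for j in range(n)]
         for i in range(n)]
    return [sum(row) for row in expm(a)]


if __name__ == "__main__":
    lam = [[-1.0, 1.0], [2.0, -2.0]]
    rho = [1.0, -0.5]            # regime 1 expands, yet the average contracts
    for t in (0.0, 10.0, 20.0):
        print(t, contraction_moment(lam, rho, 0.5, t))
```

For this two-state example with \(m=1/2\), the ratio \(u(20,y)/u(10,y)\) is close to \(e^{10\lambda _1}\), where \(\lambda _1=(-3.25+\sqrt{8.0625})/2\approx -0.205\) is the Perron eigenvalue of \(\Lambda -mR\).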
The following theorem is the contraction version of Theorem 2.2, while the transition rates are allowed to be nonconstant.
Theorem 3.6
Proof
Following the representation (2.10), we can see a(y) in (3.5) as the potential contraction of one regime. Therefore, the comparison principle can be formulated as follows:
Proposition 3.7
The proof is a direct verification and is omitted here. On the other hand, contraction on average requires the contraction to hold homogeneously in \(\mathcal {R}^d\), not just for large enough |x|. This is probably an intrinsic requirement, as the following example suggests:
Example 3.8
4 Geometric ergodicity through random PDMPs
4.1 Random PDMPs
On the other hand, diffusions with random switching can be viewed as random PDMPs. As noted in Sect. 3.2, the solution of the SDE in a fixed regime can be written as \(X_t=\Psi ^{y,\omega }_{s,t} X_s\), where \(\omega \) denotes the realization of the Wiener process \(W_s\), and we denote the law of \(\omega \) as \(P_W\). Therefore, if we condition on each realization \(\omega \), a diffusion with random switching is simply a PDMP. Following the nomenclature of statistical physics [47], this PDMP will be called a quenched process, as it is the conditioning of the original joint process \(Z_t\), on one realization of random outcome \(\omega \). In contrast, the original process without conditioning will be called the annealed process. We will adopt this simple terminology.
4.2 Accessibility analysis
In both the minorization and the contraction-on-average scenarios, we need a good regime to be commonly accessible. In this section we discuss the consequences of this assumption and also provide a simple way to verify it in Lemma 4.2. Most derivations here are relatively standard and may have simpler versions in [4, 5, 7] when the transition rates are bounded. We provide the complete proofs here to be self-contained.
Lemma 4.1
 1.
If \(\mathbb {P}^z(Y_t=\tilde{y})>0\), then \(\mathbb {P}^z(Y_{t+s}=\tilde{y})>0\) for any \(s\ge 0\);
 2. For each z and \(t>0\), there exists a neighborhood of x, \(O_x\subset \mathbb {R}^d\), such that$$\begin{aligned} \mathbb {P}^{x',y}(Y_t=\tilde{y})\ge \frac{1}{2}\mathbb {P}^z(Y_t=\tilde{y}),\quad \forall x'\in O_x; \end{aligned}$$
 3.If there is a \(\tilde{y}\in F\) that is commonly accessible, then for any fixed compact set C, there exists some \(t_0,m_0>0\) such that$$\begin{aligned} \mathbb {P}^z(Y_{t_0}=\tilde{y})\ge m_0,\quad \forall z\in C. \end{aligned}$$
Proof
The following Lemma provides an easy verification that a regime \(y^*\) is commonly accessible.
Lemma 4.2
Proof
4.3 Ergodicity with a minorization regime
Due to Theorem 3.1, the proof of Theorem 3.4 is a relatively standard verification of the small set condition for the full Markov semigroup \(P_t\).
Proof of Theorem 3.4
4.4 Ergodicity with contraction on average
The proof of Theorem 3.5 uses the asymptotic coupling mechanism introduced by [18, 19]. Theorem 4.8 of [19] presented below formulates our application of this mechanism.
Theorem 4.3
 1.\(P_t\) is locally contracting in d:$$\begin{aligned} d(P_t^* \delta _z, P_t^*\delta _{z'})\le \frac{1}{2}d(z,z'),\quad \forall d(z,z')<1. \end{aligned}$$
 2.
Smallness: for any two \(z,z'\) such that \(V(z),V(z')\le K\), \(d(P_t^* \delta _z, P_t^*\delta _{z'})\le 1-\epsilon \).
Condition (1) is called a local contraction because, in most applications, \(d(z,z')=1\) unless z and \(z'\) are very close. Theorem 4.3 essentially extends a local contraction to a global one.
4.4.1 Contracting distance
For the construction of a contracting distance, we have the following result. It is a variant of Lemma 4.13 in [19] that uses a Lyapunov function instead of a super Lyapunov function, and its proof is very similar.
Proposition 4.4
Before we move on to the proof of Proposition 4.4, we need two auxiliary results. The first one indicates that \(d(z,z')\approx 1_{y\ne y'}+ 1_{y=y'}\wedge \delta ^{-1}(V(z)|x-x'|)^r.\)
Lemma 4.5
Proof
The second lemma gives a bound on the perturbation of measures caused by perturbation on the initial condition:
Lemma 4.6
Proof
\(\square \)
We are finally at the position to prove Proposition 4.4:
Proof of Proposition 4.4
By the definition of a contracting metric, it suffices to show that \(d(P^*_T\delta _z,P^*_T\delta _{z'})\le \frac{1}{2}d(z,z')\) when \(d(z,z')<1\). The condition \(d(z,z')<1\) implies that \(y=y'\) and \(|x-x'|\le \frac{1}{2}\).
4.4.2 Small set verification
Verification of condition (2) of Theorem 4.3 is given by the following lemma.
Lemma 4.7
Proof
4.4.3 Proof of Theorem 3.5
With the conditions of Theorem 4.3 verified, it is rather elementary to show Theorem 3.5.
Proof
5 Conclusion and discussion
Diffusions with random switching are a general class of stochastic processes with applications in many different areas. Such a system consists of a diffusion part \(X_t\) and a Markov jump part \(Y_t\), and their dynamics can be fully coupled. The stability and ergodicity of such processes had remained open questions when the transition rates are not bounded or Lipschitz. This paper closes this gap by developing a new analytical framework.
The first part of this paper constructs polynomial-type Lyapunov functions when there is dissipation on average. These functions can be used to derive moment bounds on the diffusion part. Incorporating the potential dissipation of each regime is found to be an efficient way to capture averaged dissipation. This idea can be easily applied to the classical case where the transition rates are constants, as in Theorem 2.2. It also leads to a simple illustration of comparison principles, Theorem 2.9. Moreover, with a Fredholm alternative argument, we demonstrate how the Lyapunov function can be constructed inductively, as a dual process of the averaging procedure (Theorems 2.4 and 2.7), assuming the transition rates have a multiscale structure.
The second part of this paper is devoted to the geometric ergodicity of diffusions with random switching, assuming a Lyapunov function exists. If there is a commonly accessible regime that satisfies the minorization condition, Theorem 3.4 proves geometric ergodicity under the total variation distance. When there is contraction on average, using the asymptotic coupling framework of [17, 19], Theorem 3.5 demonstrates geometric ergodicity under a proper Wasserstein distance.
 1.
The authors conjecture that the results here hold in a similar form if the regime process \(Y_t\) is instead a continuous stochastic process. One way to see this is to take the limit of a jump process on grid points of vanishing size. But the authors suspect that an independent mechanism can be set up for these processes with little change to the proofs. Such a theory would be applicable to many nonlinear models that exhibit intermittency, for example [34].
 2.
The attraction or contraction rates used in this paper provide a uniform control over every component of \(X_t\). A more general situation is that each component of \(X_t\) has its own regime-based attraction rate; in other words, the attraction rate is given by a matrix. A simple example would be \(\hbox {d}X_t=A(Y_t)X_t \hbox {d}t\), where A is a matrix-valued function. It is known that such systems can be very sensitive to the switching even if every A(y) has all eigenvalues with negative real parts [28]. It would be interesting to generalize our results to this case.
 3.
Theorem 3.4 can probably be generalized. Bakhtin and Hurth [1] show that a PDMP is geometrically ergodic in total variation even though each regime is degenerate. The proofs there only require a Hörmander-type condition on the vector fields generated by the different regimes. The authors conjecture that, for diffusions with random regime switching, it suffices for geometric ergodicity that the Lie algebra generated by the stochastic flows of all regimes spans \(\mathcal {R}^d\). But this requires a completely different set of techniques, and Assumption 3.2 should be general enough to cover most applications.
Declarations
Author's contributions
XTT proposed the problems and drafted the proof. AJM proposed the problems and supervised the proof. Both authors read and approved the final manuscript.
Acknowledgements
The authors would like to thank R. van Handel for his suggestions on the presentation of this article. This research is supported by the MURI Award Grant N000141210912, of which A.J.M. is the principal investigator, while X.T.T. was supported as a postdoctoral fellow. This research is also partly supported by the NUS Grant R146000226133, of which X.T.T. is the principal investigator.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 Bakhtin, Y., Hurth, T.: Invariant densities for dynamical systems with random switching. Nonlinearity 25(10), 2937–2952 (2012)MathSciNetView ArticleMATHGoogle Scholar
 Bakhtin, Y., Hurth, T., Mattingly, J.C.: Regularity of invariant densities for 1D systems with random switching. Nonlinearity 28(11), 3755–3787 (2015)MathSciNetView ArticleMATHGoogle Scholar
 Bardet, J.B., Guérin, H., Malrieu, F.: Long time behavior of diffusions with Markov switching. ALEA Lat. Am. J. Probab. Math. Stat. 7, 151–170 (2010)MathSciNetMATHGoogle Scholar
 Benaïm, M., Le Borgne, S., Malrieu, F., Zitt, P.A.: Quantitative ergodicity for some switched dynamical systems. Electron. Commun. Probab. 17(56), 1–14 (2012)MathSciNetMATHGoogle Scholar
 Benaïm, M., Le Borgne, S., Malrieu, F., Zitt, P.A.: Qualitative properties of certain piewise deterministic Markov process. Ann. Inst. H. Poincaré Probab. Stat. 51(3), 1040–1075 (2015)View ArticleMATHGoogle Scholar
 Buckwar, E., Riedler, M.: An exact stochastic hybrid model of excitable membranes including spatiotemporal evolution. J. Math. Biol. 63(6), 1051–1093 (2011)MathSciNetView ArticleMATHGoogle Scholar
 Cloez, B., Hairer, M.: Exponential ergodicity for Markov processes with random switching. Bernoulli 21(1), 505–536 (2015)MathSciNetView ArticleMATHGoogle Scholar
 Crommelin, D.T., VandenEijinden, E.: Subgrid scale parameterization with conditional Markov chains. J. Atmos. Sci. 65, 2661–2675 (2008)View ArticleGoogle Scholar
 Da Prato, G., Zabczyk, J.: Ergodicity for Infinite Dimensional Systems. London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge (2006)Google Scholar
 Davis, M.H.A.: Markov Models and Optimization. Monographs on Statisitics and Applied Probability, vol. 49. Chapman & Hall, London (1993)View ArticleGoogle Scholar
 de Saporta, B., Yao, J.: Tail of a linear diffusion with Markov switching. Ann. Appl. Probab. 15(1B), 992–1018 (2005)MathSciNetView ArticleMATHGoogle Scholar
 Deng, Q., Khouider, B., Majda, A.J.: The MJO in a coarseresolution GCM with a stochastic multicloud parameterization. J. Atmos. Sci. 72(1), 55–74 (2015)View ArticleGoogle Scholar
 Dudley, R.M.: Real Analysis and Probability. Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge (2004)Google Scholar
 Frenkel, Y., Majda, A.J., Khouider, B.: Using the stochastic multicloud model to improve tropical convective parameterization: a paradigm example. J. Atmos. Sci. 69(3), 1080–1105 (2012)View ArticleGoogle Scholar
 Frenkel, Y., Majda, A.J., Khouider, B.: Stochastic and deterministic multicloud parameterizations for tropical convection. Clim. Dyn. 41(5–6), 1527–1551 (2013)View ArticleGoogle Scholar
 Gershgorin, B., Harlim, J., Majda, A.J.: Test models for improving filtering with model errors through stochastic parameter estimation. J. Comput. Phys. 229, 1–31 (2010)MathSciNetView ArticleMATHGoogle Scholar
Hairer, M., Mattingly, J.C.: Ergodicity of the 2D Navier–Stokes equations with degenerate stochastic forcing. Ann. Math. 164, 993–1032 (2006)
Hairer, M., Mattingly, J.C.: A theory of hypoellipticity and unique ergodicity for semilinear stochastic PDEs. Electron. J. Probab. 16(23), 658–738 (2011)
Hairer, M., Mattingly, J.C., Scheutzow, M.: Asymptotic coupling and a general form of Harris’ theorem with applications to stochastic delay equations. Probab. Theory Relat. Fields 149(1–2), 223–259 (2011)
Jacobsen, M.: Point Process Theory and Applications: Marked Point and Piecewise Deterministic Processes. Probability and Its Applications. Birkhäuser, Boston (2006)
Katsoulakis, M.A., Majda, A.J., Sopasakis, A.: Multiscale couplings in prototype hybrid deterministic/stochastic systems: Part I, deterministic closures. Commun. Math. Sci. 2(2), 255–294 (2004)
Katsoulakis, M.A., Majda, A.J., Sopasakis, A.: Multiscale couplings in prototype hybrid deterministic/stochastic systems: Part II, stochastic closures. Commun. Math. Sci. 3(3), 453–478 (2005)
Katsoulakis, M.A., Majda, A.J., Sopasakis, A.: Intermittency, metastability and coarse graining for coupled deterministic–stochastic lattice systems. Nonlinearity 19, 1021–1323 (2006)
Katsoulakis, M.A., Majda, A.J., Sopasakis, A.: Hybrid deterministic stochastic systems with microscopic lookahead dynamics. Commun. Math. Sci. 8(2), 409–437 (2010)
Khouider, B.: A coarse grained stochastic multitype particle interacting model for tropical convection: nearest neighbour interactions. Commun. Math. Sci. 12(8), 1379–1407 (2014)
Khouider, B., Majda, A.J., Katsoulakis, M.A.: Coarse-grained stochastic models for tropical convection and climate. Proc. Natl. Acad. Sci. 100(21), 11941–11946 (2003)
Kunita, H.: Lectures on Stochastic Flows and Applications. Springer, Berlin (1986)
Lawley, S.D., Mattingly, J.C., Reed, M.C.: Sensitivity to switching rates in stochastically switched ODEs. Commun. Math. Sci. 12(7), 1343–1352 (2014)
Lawley, S.D., Mattingly, J.C., Reed, M.C.: Stochastic switching in infinite dimensions with applications to random parabolic PDEs. SIAM J. Math. Anal. 47(4), 3035–3063 (2015)
 Lee, W., Stuart, A.M.: Derivation and analysis of simplified filters for complex dynamical systems. arXiv:1512.03647
Majda, A.J., Franzke, C., Khouider, B.: An applied mathematics perspective on stochastic modelling for climate. Philos. Trans. R. Soc. A 366(1875), 2427–2453 (2008)
Majda, A.J., Harlim, J.: Filtering Complex Turbulent Systems. Cambridge University Press, Cambridge (2012)
Majda, A.J., Khouider, B.: Stochastic and mesoscopic models for tropical convection. Proc. Natl. Acad. Sci. 99, 1123–1128 (2002)
Majda, A.J., Lee, Y.: Conceptual dynamical models for turbulence. Proc. Natl. Acad. Sci. 111(18), 6548–6553 (2014)
Majda, A.J., Tong, X.T.: Ergodicity of truncated stochastic Navier–Stokes with deterministic forcing and dispersion. J. Nonlinear Sci. 2, 1–1 (2016). https://doi.org/10.1007/s00332-016-9310-0
Majda, A.J., Tong, X.T.: Geometric ergodicity for piecewise contracting processes with applications for tropical stochastic lattice models. Commun. Pure Appl. Math. 69(6), 1110–1153 (2016)
Majda, A.J., Wang, X.: Nonlinear Dynamics and Statistical Theories for Basic Geophysical Flows. Cambridge University Press, Cambridge (2006)
 Malrieu, F.: Some simple but challenging Markov processes. arXiv:1412.7516
Mao, X., Yuan, C.: Asymptotic stability in distribution of stochastic differential equations with Markovian switching. Stoch. Process. Appl. 103, 277–291 (2003)
Mattingly, J.C., Stuart, A.M., Higham, D.J.: Ergodicity for SDEs and approximations: locally Lipschitz vector fields and degenerate noise. Stoch. Process. Appl. 101(2), 185–232 (2002)
Meyn, S., Tweedie, R.: Markov Chains and Stochastic Stability. Springer, Berlin (1993)
Norris, J.R.: Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge (1997)
Pakdaman, K., Thieullen, M., Wainrib, G.: Asymptotic expansion and central limit theorem for multiscale piecewise-deterministic Markov processes. Stoch. Process. Appl. 122, 2292–2318 (2012)
Pavliotis, G.A., Stuart, A.M.: Multiscale Methods: Averaging and Homogenization. Texts in Applied Mathematics, vol. 53. Springer, Berlin (2008)
Schütte, C., Walter, J., Hartmann, C., Huisinga, W.: An averaging principle for fast degrees of freedom exhibiting long-term correlations. Multiscale Model. Simul. 2(3), 501–526 (2004)
Shao, J.: Ergodicity of regime-switching diffusions in Wasserstein distances. Stoch. Process. Appl. 125, 739–758 (2015)
Tavaré, S., Zeitouni, O.: Lectures on Probability Theory and Statistics: École d’Été de Probabilités de Saint-Flour XXXI-2001. Lecture Notes in Mathematics, vol. 1837. Springer, Berlin (2004)
Thual, S., Majda, A.J., Stechmann, S.: Asymmetrical intraseasonal events in the stochastic skeleton MJO model with seasonal cycle. Clim. Dyn. (2014) (accepted)
Thual, S., Majda, A.J., Stechmann, S.: A stochastic skeleton model for MJO. J. Atmos. Sci. 71, 697–715 (2014)