Channels with Action Dependent States and Additional Private Messages

In channels with action-dependent states, a message is conveyed using two encoders operating sequentially, viz. an action encoder and a channel encoder. The actions drive the output of a discrete memoryless channel (DMC), which in turn forms the state process for the DMC between the channel encoder and the receiver. Assuming non-causal knowledge of the state process at the channel encoder, a single-letter characterization of the capacity is known in the discrete memoryless case. We consider the action-dependent state channel in which an additional private message must be communicated by the channel encoder. In addition, we consider a common reconstruction (CR) of the state at the channel encoder and decoder. Capacity characterizations for the discrete memoryless and Gaussian versions are presented. As a consequence, we settle the capacity characterization of the Gaussian action-dependent channel with only a common message and CR. We further show that the availability of strictly causal state feedback to the action encoder, even with randomization allowed, does not improve the capacity of the action-dependent state channel.


I. INTRODUCTION
The problem of coding for channels with state was introduced in the seminal paper by Shannon [1], wherein causal state information was assumed at the encoder. The capacity for the non-causal setting was established by Gelfand and Pinsker [2]. In these models, the state process is assumed to be given by nature. Motivated by applications involving multi-stage encoding (for instance, two-stage recording on a magnetic storage device), Weissman [3] introduced the notion of a channel with action dependent states (ADSC). In this setting, the transmitter can take actions that influence the formation of channel states in the first stage, and the encoding in the second stage is based upon the channel state sequence so generated and the message. Notice that the actions play a dual role of message communication as well as controlling the channel states. The capacity of this model was derived in [3].
Following [3], the action-dependent framework has been extended in several directions. Permuter et al. [4] studied the source coding dual, in which the decoder can take actions, based on the observed compression index, that influence the formation of side information. Asnani et al. [5] considered a setting in which the encoder as well as the decoder can take probing actions to learn the channel state, with a cost constraint associated with each. Choudhuri et al. [6] considered causal state communication over an action-dependent channel and characterized the trade-off between message communication and state estimation distortion. Ahmadi et al. [7] studied action-dependent channels with the additional constraint of common reconstructions (CR) [8] of the state at the encoder and decoder. Recently, Kittichokechai et al. [9] studied source and channel coding settings with action-dependent states and CR constraints, wherein the actions control the partial channel state information available at the encoder and decoder. The action-dependent model has been extended to multi-user channels as well; see, for instance, Steinberg et al. [10] and Steinberg [11].
Notice that single letter capacity characterizations for the discrete-memoryless action dependent state channel (DMADSC) as well as the DMADSC with CR constraints were presented in [3] and [7] respectively. However, the Gaussian counterparts are listed therein as open problems. Another interesting question listed in [3] is whether strictly causal state-information at the action encoder increases the capacity. In this work, we consider a more general model where the channel encoder has an additional private message stream along with CR constraints. Our contributions are summarized below.
• The capacity region for the ADSC with an additional private message and CR is derived for both the discrete memoryless and Gaussian versions.
• We prove the optimality of Gaussian auxiliaries for the Gaussian action-dependent model with CR constraints [7].
• We show that the capacity of an ADSC is unchanged by strictly causal state feedback to the action encoder, even if the encoders are allowed to randomize, thereby settling a question left open in [3].
We note that our problem framework without common reconstruction constraints is somewhat similar to the cooperative multiple access model studied by Zaidi et al. [12], [13]. The setting in [13] consists of a state-dependent multiple access channel with degraded message sets, wherein the encoder sending only the common message (the non-cognizant encoder) observes the state strictly causally, while the other encoder observes the state non-causally. The capacity region was derived for both the discrete memoryless and Gaussian versions, with the observation that strictly causal state information can enlarge the capacity region compared to the case of no state observation at the non-cognizant encoder.
Though the action encoder in our setting can be viewed as playing the role of the non-cognizant encoder, the fundamental difference between the two settings is that the state S^n in [13] is generated IID by nature, while in our case it is the output of a DMC fed with the message-dependent action sequence. An equivalence can be established between the two settings if and only if p(s|a) = p(s), i.e., the state sequence is independent of the action sequence. As such, our conclusion that strictly causal state feedback does not increase the capacity in the common message case is consistent with the observation that the common message capacity in [13] is independent of strictly causal state observation at the non-cognizant encoder; see Corollary 2 and Remark 5 in [13]. Moreover, the capacity region in the Gaussian case without common reconstructions is the same as that established in Theorem 4 and Corollary 3 of [13]. The novelty in our model is that we are interested in common reconstruction of the state process in the action-dependent setting with additional messages, which also settles the optimality of Gaussian auxiliaries for the model studied in [7].
Organization: We introduce the system model and main results in Section II. Section III contains the proof of our main result. The extension to the Gaussian case is given in Section IV, which also deals with the Gaussian action dependent channel with common reconstruction constraints as well as reversible input constraints. The generalization of strictly causal state information with randomization at the action encoder is discussed in Section V. Concluding remarks are given in Section VI.
II. SYSTEM MODEL

Consider the model shown in Fig. 1 (action-dependent channel with additional private message, with/without common reconstructions). There are two encoders, namely an action encoder E_act and a channel encoder E_chan. A common message W_c is observed by both E_act and E_chan. The action encoder chooses the output symbol a ∈ A, which is fed to a DMC p(s|a). The output of this DMC forms the state sequence of a channel with input x ∈ X, output y ∈ Y and transition probability p(y|x,s). The encoder E_chan has non-causal access to the state sequence S^n. E_chan also needs to convey a private message W_1. We term this model the DMADSC with additional private message. More generally, one can consider common reconstruction (CR) of the state process [8], wherein E_chan and the decoder must agree on a reconstruction of the state, viz. Ŝ^n. We call this model the DMADSC with CR. The inputs have to satisfy an average cost constraint defined by a vector function γ : A × X → [0, ∞)^2, where the cost for sequences is defined as γ(a^n, x^n) = (1/n) Σ_{i=1}^n γ(a_i, x_i). Define a single-letter distortion measure d : S × Ŝ → [0, ∞) for state reconstruction, where the distortion between sequences is defined as d(s^n, ŝ^n) = (1/n) Σ_{i=1}^n d(s_i, ŝ_i). We assume that the distortion measure is bounded and let D_max = max_{s∈S, ŝ∈Ŝ} d(s, ŝ). We also assume that all the alphabets A, S, X, Y, Ŝ are finite.
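The per-letter averaging of the vector cost and the distortion defined above can be sketched numerically as follows; the quadratic cost and squared-error distortion used here are hypothetical stand-ins, since the section leaves γ and d generic.

```python
import numpy as np

# Hypothetical per-letter cost and distortion functions, for illustration only.
def gamma(a, x):
    """Vector cost gamma : A x X -> [0, inf)^2, one coordinate per input."""
    return np.array([a**2, x**2], dtype=float)

def d(s, s_hat):
    """Squared-error distortion between a state symbol and its reconstruction."""
    return (s - s_hat) ** 2

def sequence_cost(a_seq, x_seq):
    # gamma(a^n, x^n) = (1/n) * sum_i gamma(a_i, x_i)
    return sum(gamma(a, x) for a, x in zip(a_seq, x_seq)) / len(a_seq)

def sequence_distortion(s_seq, shat_seq):
    # d(s^n, shat^n) = (1/n) * sum_i d(s_i, shat_i)
    return sum(d(s, sh) for s, sh in zip(s_seq, shat_seq)) / len(s_seq)

a_seq, x_seq = [1, -1, 1], [2, 0, -2]
print(sequence_cost(a_seq, x_seq))              # per-letter average vector cost
print(sequence_distortion([0, 1, 2], [0, 1, 1]))  # per-letter average distortion
```

A scheme is cost-admissible when each coordinate of the averaged vector stays below the corresponding coordinate of Γ.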
For the DMADSC with additional private message, an (n, R_1, R_c, Γ, ε) scheme consists of two encoder maps, an action encoder f_act : {1, …, 2^{nR_c}} → A^n and a channel encoder f_chan : S^n × {1, …, 2^{nR_c}} × {1, …, 2^{nR_1}} → X^n, together with a decoder map g : Y^n → {1, …, 2^{nR_c}} × {1, …, 2^{nR_1}}, such that for independent and uniformly distributed choices of (W_c, W_1), the probability of decoding error is at most ε and the average costs satisfy E[γ_k(A^n, X^n)] ≤ Γ_k for each coordinate k (where γ_k and Γ_k denote the kth coordinates of γ and Γ). For the DMADSC with CR, we additionally define a sender quantization map φ : S^n × {1, …, 2^{nR_c}} × {1, …, 2^{nR_1}} → Ŝ^n and a decoder reconstruction map ψ : Y^n → Ŝ^n such that the two reconstructions agree with probability at least 1 − ε and the expected distortion is at most D.
We say that a tuple (R_1, R_c, D, Γ) is achievable if an (n, R_1, R_c, D, Γ, ε) coding scheme exists for every ε > 0 and sufficiently large n. Let C_CR^dmadsc be the collection of all achievable (R_1, R_c, D, Γ) tuples. Our main result is stated next.
Theorem 2. For the DMADSC (p(s|a), p(y|x,s)) with CR, C_CR^dmadsc is the closure of the union of all tuples (R_1, R_c, D, Γ) satisfying

R_1 ≤ I(U; Y|A) − I(U; S|A),
R_1 + R_c ≤ I(U, A; Y) − I(U; S|A),

where the union is over distributions of the form p(a)p(s|a)p(u|a,s)p(x|u,s)p(y|x,s) for which there exists a map φ : U → Ŝ with E[d(S, φ(U))] ≤ D and E[γ(A, X)] ≤ Γ.

Remark 3. For the special case of no CR (i.e., D ≥ D_max) and R_c = 0 (W_c = ∅), the private message capacity is

C_1 = max_{p(a), p(u|a,s), p(x|u,s)} [I(U; Y|A) − I(U; S|A)],

which corresponds to selecting the action sequence that leads to the maximum Gelfand-Pinsker rate. On the other hand, when D ≥ D_max and R_1 = 0 (W_1 = ∅), the common message capacity is

R_c ≤ max_{p(a), p(u|a,s), p(x|u,s)} [I(U, A; Y) − I(U; S|A)],

which is precisely the characterization in [3, Theorem 1].
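The single-letter quantities of this kind can be evaluated numerically for any fixed joint pmf of the form p(a)p(s|a)p(u|a,s)p(x|u,s)p(y|x,s). The toy binary instance below is purely illustrative (all alphabets, transition values, and the degenerate choice of U are made up), and the two bounds computed follow the Gelfand-Pinsker shape R_1 ≤ I(U;Y|A) − I(U;S|A) and R_1 + R_c ≤ I(U,A;Y) − I(U;S|A).

```python
import itertools
import numpy as np

# Toy binary instance, chosen only for illustration: all alphabets are {0,1}.
p_a = np.array([0.5, 0.5])                  # p(a)
p_s_a = np.array([[0.9, 0.1], [0.1, 0.9]])  # p(s|a): the action biases the state
p_u_as = np.full((2, 2, 2), 0.5)            # p(u|a,s): U uniform, indep. (placeholder)
p_x_us = np.zeros((2, 2, 2))                # p(x|u,s): x = u (state ignored)
for u in range(2):
    p_x_us[u, :, u] = 1.0
p_y_xs = np.zeros((2, 2, 2))                # p(y|x,s): y = x flipped w.p. 0.05
for x in range(2):
    for s in range(2):
        p_y_xs[x, s, x] = 0.95
        p_y_xs[x, s, 1 - x] = 0.05

# Joint pmf p(a,s,u,x,y) = p(a) p(s|a) p(u|a,s) p(x|u,s) p(y|x,s)
joint = np.zeros((2,) * 5)
for a, s, u, x, y in itertools.product(range(2), repeat=5):
    joint[a, s, u, x, y] = (p_a[a] * p_s_a[a, s] * p_u_as[a, s, u]
                            * p_x_us[u, s, x] * p_y_xs[x, s, y])

def H(p):
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def I_cond(joint, X_axes, Y_axes, Z_axes):
    """Conditional mutual information I(X;Y|Z) from a joint pmf array."""
    all_ax = set(range(joint.ndim))
    marg = lambda keep: joint.sum(axis=tuple(all_ax - set(keep)))
    return (H(marg(X_axes + Z_axes)) + H(marg(Y_axes + Z_axes))
            - H(marg(X_axes + Y_axes + Z_axes)) - H(marg(Z_axes)))

# Axes: 0=A, 1=S, 2=U, 3=X, 4=Y
R1_bound  = I_cond(joint, [2], [4], [0]) - I_cond(joint, [2], [1], [0])
sum_bound = I_cond(joint, [0, 2], [4], []) - I_cond(joint, [2], [1], [0])
print(round(R1_bound, 4), round(sum_bound, 4))
```

With this particular choice U is independent of (A, S), so I(U;S|A) = 0 and both bounds collapse to I(U;Y), i.e., 1 − h_2(0.05); optimizing the factors p(u|a,s), p(x|u,s) is what the theorem's union performs.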
III. PROOF OF THEOREM 2

Proof of Achievability: The achievability is proven using a combination of Gelfand-Pinsker (GP) coding and superposition coding. An action codebook is built first, based on the common message. Then, for each action sequence a^n, a conditional GP codebook of U sequences is generated according to the private message. The details are given in Appendix A.

Proof of Converse: By Fano's inequality, we can write H(W_c, W_1|Y^n) ≤ nε_n with ε_n → 0 as n → ∞; we suppress the nε_n terms in the sequel. We can write the following chain of inequalities for the private rate: where (a) follows since W_1 ⊥⊥ W_c, (b) follows since Ŝ^n is a deterministic function of (W_1, W_c, S^n), (c) follows from Fano's inequality and since W_c determines A^n, (d) and (e) follow from the chain rule, (f) follows from the Csiszár sum lemma, and (g) follows with the choice of auxiliary U_i = (Ŝ^n, W_1, W_c, Y^{i−1}, S^n_{i+1}, A^n). The proof is then completed by setting (Q, U_Q) = U, Y_Q = Y, S_Q = S, X_Q = X and A_Q = A, and noting that the Markov conditions U → (X, S) → Y and A → (U, S) → X hold.

For the sum rate, consider the following chain of inequalities: where (a) follows since Ŝ^n is a deterministic function of (W_1, W_c, S^n) and since W_c determines A^n, (b) follows from Fano's inequality, (c) and (d) follow from the chain rule, (e) follows from the Csiszár sum lemma, and (f) follows with the same identification U_i = (Ŝ^n, W_1, W_c, Y^{i−1}, S^n_{i+1}, A^n). Now, by introducing a time-sharing RV Q as in (9), we obtain (11).

For the bound on input costs, we proceed as follows: where the last step follows from the fact that any successful (n, R_1, R_c, D, Γ, ε) scheme satisfies the cost constraint in (2). We next consider the bound on the distortion. Let Ŝ^n_d be the receiver's reconstruction of the state sequence.

IV. GAUSSIAN SETTING

The Gaussian action-dependent state channel (GADSC) [3] is given by

Y = X + S + Z, S = A + W,

with W ∼ N(0, σ_W²) and Z ∼ N(0, σ_Z²), W ⊥⊥ Z, and with A and X constrained in average power to P_A and P_X respectively. The state S = A + W is thus formed by the action A, with A independent of W. We have the following theorem.
Theorem 4. For the GADSC with CR constraints, the capacity region is achieved by appropriate jointly Gaussian choices of p(a), p(u, x|a, s) in Theorem 2. Specifically, the capacity region C_CR^gadsc is given by the union, over ρ_1, ρ_2 ∈ [−1, 1] satisfying ρ_1² + ρ_2² ≤ 1 and the corresponding constraint on D, of the rate pairs defined by the rate functions (R_priv, R_sum) in (19), (20), with α as in (23).
Proof. We will prove this theorem using the single-letter expression in Theorem 2. While Theorem 2 was shown for the discrete memoryless case, it holds for the Gaussian case as well. The achievability of Theorem 2 for the Gaussian case follows from an application of the discretization procedure [14, Sec. 3.4.1]. Note, however, that the converse for the discrete memoryless case assumed finite |Ŝ| (see (7), i.e., Fano's inequality for Ŝ^n). Without loss of generality, we can restrict attention to an exponential number of agreed reconstructions, i.e., |Ŝ^n| = O(2^{nc}) for a constant c, as explained next.

Remark 5.
It can be shown that if there exists a scheme, without any cardinality bounds on the reproductions, which achieves (R_1, R_c, D), then for any δ > 0, (R_1, R_c, D + δ) can be achieved using a scheme whose reproductions are confined to an alphabet whose size scales as 2^{n·c(δ)}. This can be proven using, for instance, a scalar quantizer with average distortion δ whose n-fold codebook size scales as 2^{n·c(δ)}. Thus, the converse of Theorem 2 also applies to the Gaussian case.

We now prove the rest of the converse, starting with expressions (5) and (6) from Section II. A similar chain of steps applies for the sum rate, where the variance terms in (16) and (17) are the corresponding conditional variances. Thus it can be seen, via the differential-entropy-maximizing property of Gaussian random variables for a given variance, that the optimal auxiliary U must be jointly Gaussian with (A, S, X, Y). This completes the proof of the converse.

For the achievability, we choose p(a), p(u, x|a, s) as follows. The action input is chosen as A ∼ N(0, P_A). We take the channel input to be

X = ρ_1 √(P_X/P_A) A + ρ_2 √(P_X/σ_W²) W + G, (21)

where ρ_1 and ρ_2 satisfy ρ_1² + ρ_2² ≤ 1, and G ∼ N(0, (1 − ρ_1² − ρ_2²) P_X) is independent of (A, W, Z). We choose the auxiliary random variable as a linear combination of the above variables, where δ = −1/(ρ_1 √(P_X P_A)) and the coefficient α is chosen as in (23). The common reconstruction map is chosen correspondingly. Now, on evaluating the terms in (16), (17) with the above jointly Gaussian choices, we arrive at the rate constraints in (19), (20).
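As a quick numerical check of the achievability choice, the snippet below simulates the Gaussian model (S = A + W with an additive channel Y = X + S + Z, the latter being an assumption where the displayed equations were lost in extraction) and verifies that an input of the form X = ρ_1 √(P_X/P_A) A + ρ_2 √(P_X/σ_W²) W + G meets the power constraint E[X²] = P_X whenever ρ_1² + ρ_2² ≤ 1; all numerical values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
P_A, P_X, s2_W, s2_Z = 1.0, 4.0, 0.5, 1.0
rho1, rho2 = 0.6, 0.5                 # must satisfy rho1^2 + rho2^2 <= 1
n = 200_000

A = rng.normal(0.0, np.sqrt(P_A), n)          # action sequence
W = rng.normal(0.0, np.sqrt(s2_W), n)         # nature's noise
S = A + W                                     # state: action plus noise
G = rng.normal(0.0, np.sqrt((1 - rho1**2 - rho2**2) * P_X), n)
# Channel input: scaled action, scaled state noise, fresh Gaussian component.
X = rho1 * np.sqrt(P_X / P_A) * A + rho2 * np.sqrt(P_X / s2_W) * W + G
Z = rng.normal(0.0, np.sqrt(s2_Z), n)
Y = X + S + Z                                 # assumed additive channel law

# Empirical power; analytically E[X^2] = (rho1^2 + rho2^2 + (1-rho1^2-rho2^2)) P_X = P_X.
print(np.mean(X**2))
```

The three coefficients are chosen so that the cross terms vanish (A, W, G are mutually independent), which is why the power constraint holds with equality for every admissible (ρ_1, ρ_2).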

Remark 6.
For the GADSC with only a common message and CR constraints, considered in [7], the capacity characterization is obtained by setting W_1 = ∅, i.e., R_1 = 0, in Theorem 4, and is given by (24). Note that [7] only gave an achievable region; its optimality was not established there.
Remark 7. On further setting D ≥ σ_W², i.e., the case of no CR constraints, the characterization simplifies to the unconstrained maximization over ρ_1, ρ_2, which is a complete characterization of the Gaussian action-dependent channel. This can also be inferred from the results of Zaidi et al. [12], [13, Theorem 4].

A. Action-Dependent Channels with Reversible inputs
We consider a GADSC with reversible inputs, i.e., with the additional constraint that the input to the channel must be reconstructed reliably at the receiver, in addition to the message. Kittichokechai et al. [9] derived the capacity of this model in the discrete case to be

C = max [I(A, X; Y) − I(X; S|A)], (26)

where the joint pmf is of the form p(a)p(s|a)p(x|a,s)p(y|x,s) and the maximization is over all p(a) and p(x|a,s) such that 0 ≤ I(X; Y|A) − I(X; S|A). In the following, we establish the capacity characterization in the Gaussian setting. The converse follows from the single-letter characterization of [9]: by the max-entropy principle, (A, S, X, Y) must be jointly Gaussian, which completes the proof of the converse. For the achievability, we choose A ∼ N(0, P_A) and p(x|a, s) as in (21), and compute (27), (28). The final characterization evaluates to (29), (30).
V. STRICTLY CAUSAL STATE WITH COMMON RANDOMNESS AT THE ACTION ENCODER

In [3], the question of whether strictly causal state feedback to the action encoder can increase capacity was left open. In other words, the problem is whether actions of the form A_i(M, S^{i−1}) help compared to A^n(M). We additionally allow a more general encoding mechanism in which an unlimited amount of common randomness is shared between the channel and action encoders, in the form of a real-valued RV Θ uniformly distributed on [0, 1]. We show that such state feedback does not help, and the capacity remains that of [3, Theorem 1]. In other words, the optimal strategy is one that ignores the extra information S^{i−1}. The converse is proved via a slightly different choice of auxiliary RV than in [3], as shown next.

VI. CONCLUSION
We studied a generalization of the action-dependent channel with multiple messages and CR constraints, and derived complete characterizations for both the discrete memoryless and Gaussian models. As a result, we obtained the capacity of the Gaussian action-dependent model with only a common message and CR. Furthermore, we proved that the capacity of the action-dependent model remains unchanged even if the actions are randomly drawn based on past channel states and the message.
APPENDIX A
ACHIEVABILITY PROOF FOR THEOREM 2

Codebook Generation: Fix the p.m.f. p(a)p(u|a,s)p(x|u,s). Randomly and independently generate 2^{nR_c} sequences a^n(w_c), w_c ∈ [1 : 2^{nR_c}], i.i.d. according to ∏_{i=1}^n p_A(a_i). For each sequence a^n(w_c), randomly and conditionally independently generate 2^{n(R_1+R')} sequences u^n(w_c, w_1, j), w_1 ∈ [1 : 2^{nR_1}], j ∈ [1 : 2^{nR'}], i.i.d. according to ∏_{i=1}^n p_{U|A}(u_i|a_i(w_c)). Let B_U(w_1) be the set of sequences within the bin indexed by w_1 ∈ [1 : 2^{nR_1}] in the u-codebook.

Encoding: Fix ε > 0. Given w_c ∈ [1 : 2^{nR_c}], the action encoder selects the corresponding sequence a^n(w_c) in the A-codebook. The state sequence S^n is generated in response to the action sequence via the channel p(s|a). Given w_1 ∈ [1 : 2^{nR_1}], the channel encoder picks the least index j such that (a^n(w_c), u^n(w_c, w_1, j), s^n) ∈ T_ε^n(U, A, S). An error is declared if no such index is found. The channel encoder then draws x^n i.i.d. conditionally given (u^n, s^n) according to ∏_{i=1}^n p(x_i|u_i, s_i), and sends it. The channel encoder also generates its state reconstruction Ŝ_i = φ(u_i), i ∈ [1 : n].

Decoding: Let ε' > ε. We use simultaneous decoding. The decoder declares that (ŵ_c, ŵ_1) was sent if it is the unique message pair such that (a^n(ŵ_c), u^n(ŵ_c, ŵ_1, ĵ), y^n) ∈ T_{ε'}^n(U, A, Y) for some ĵ ∈ B_U(ŵ_1). An error is declared otherwise.

Error Analysis: Assume without loss of generality that the messages W_1 = 1 and W_c = 1 were sent, and that the index of the chosen U^n sequence is J. The first encoding error event is E_1 = {(A^n(1), U^n(1, 1, j), S^n) ∉ T_ε^n for all j ∈ [1 : 2^{nR'}]}.
But if (a^n, x^n) ∈ T_ε^n, then we have γ(a^n, x^n) ≤ Γ by the typical average lemma [14].

Distortion Analysis: The desired distortion is obtained by making the estimate on a per-letter basis. Since φ(U) satisfies the distortion constraint, it follows from the random codebook construction that the distortion constraint is met as n → ∞. Also, since both the encoder and the decoder (assuming correct decoding) can generate Ŝ_i = φ(U_i), i ∈ [1 : n], the common reconstruction constraint is met.
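The binned codebook bookkeeping used in the proof, with messages indexing bins and covering indices within a bin, can be sketched with the following index arithmetic; the block length and rates are hypothetical, and a real codebook would of course store sequences rather than indices.

```python
# A minimal sketch of the binned codebook indexing (assumption: codewords are
# addressed by (w_c, w_1, j), with j ranging over a bin of size 2^{nR'}).
n, Rc, R1, Rp = 8, 0.25, 0.5, 0.25           # hypothetical rates (bits/symbol)
num_wc = 2 ** round(n * Rc)                   # number of action codewords a^n(w_c)
num_w1 = 2 ** round(n * R1)                   # number of bins (private messages)
bin_sz = 2 ** round(n * Rp)                   # codewords per bin, 2^{nR'}

def codeword_index(w1, j):
    """Flat index of u^n(w_c, w1, j) inside the conditional codebook for a fixed w_c."""
    return w1 * bin_sz + j

def bin_of(idx):
    """Recover the bin (private message) w1 carried by a flat codeword index."""
    return idx // bin_sz

# Every codeword index maps back to the bin it was drawn from.
assert all(bin_of(codeword_index(w1, j)) == w1
           for w1 in range(num_w1) for j in range(bin_sz))
print(num_wc, num_w1, bin_sz)  # prints: 4 16 4
```

The encoder's search for the least j is a search within one such bin; the decoder's check "ĵ ∈ B_U(ŵ_1)" is exactly the `bin_of` test above.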