Spectral Unmixing With Perturbed Endmembers

We consider the problem of supervised spectral unmixing with a fully-perturbed linear mixture model in which the given endmembers, as well as the observations of the spectral image, are subject to perturbation due to noise, error, or model mismatch. We calculate the Fisher information matrix and the Cramer-Rao lower bound associated with the estimation of the abundance matrix in the considered fully-perturbed linear spectral unmixing problem. We develop an algorithm for estimating the abundance matrix by minimizing a constrained and regularized maximum-log-likelihood objective function using block coordinate-descent iterations and the alternating direction method of multipliers. We analyze the convergence of the proposed algorithm theoretically and perform simulations with real hyperspectral image datasets to evaluate its performance. The simulation results corroborate the efficacy of the proposed algorithm in mitigating the adverse effects of perturbation in the endmembers.


I. INTRODUCTION
SPECTRAL unmixing is an important way of extracting knowledge from spectral image data. Each pixel of a multiband spectral image comprises the measurements related to several parts (bands) of the electromagnetic spectrum. The imaged area corresponding to each pixel often contains multiple materials that have distinct spectral signatures. The popular linear mixture model suggests that the spectrum of each pixel is a possibly noisy observation of a linear combination of the spectra of the materials present in the scene captured by the spectral image. These materials are considered to have unique spectral signatures called endmembers. The weight in the linear combination representing the contribution of an endmember to the spectrum of a pixel is called the abundance of that endmember in the pixel [1]-[7].
Since, in practice, the mixing is realized with actual photons, the abundances are expected to be nonnegative. In addition, if one assumes that the considered linear mixture model accounts for all endmembers as well as the composition of the spectra of all pixels, it is reasonable to expect the abundances of each pixel to add up to one. However, this assumption may not strictly hold when there is nonlinearity in the mixture process or variability in the endmembers [8], [9].
Linear spectral unmixing is the practice of inferring the abundance values for all pixels by adopting the linear mixture model and exploiting prior knowledge or assumptions that ensure the feasibility and enhance the accuracy of the inference. If the endmembers are not known beforehand, the process of spectral unmixing is a blind (unsupervised) source separation task and involves the inference of both endmembers and abundances. The extraction of the endmembers and the estimation of the abundances can be done simultaneously or at separate stages [10]-[20]. (R. Arablouei is with the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Pullenvale QLD 4069, Australia; email: reza.arablouei@csiro.au.)
When an over-complete set of candidate endmembers, such as a library of spectral signatures, is available, semi-supervised spectral unmixing algorithms based on sparse regression can yield both the endmembers and the abundances. Usually, the number of available signatures in spectral libraries is large but the number of endmembers present in a typical spectral image is comparatively small. Therefore, linear sparse regression techniques can efficiently determine the endmembers as an optimal subset of the given spectral library as well as estimate the abundances by employing appropriate sparsity-inducing regularizers [21]-[27].
If the endmembers are already known or identified in a previous processing stage, the spectral unmixing is an instance of supervised source separation and can be cast as a constrained linear regression problem. In [28], a constrained least-squares algorithm for supervised spectral unmixing is proposed that generalizes the active-set-based nonnegative least-squares algorithm popularized by [29] via augmenting the underlying system of linear equations to incorporate the abundance sum-to-one constraint. In [30], the spectral unmixing problem is treated in the Bayesian estimation context and the prior distribution for the abundances is specified to satisfy the abundance nonnegativity and sum-to-one constraints. Then, an algorithm based on the Markov-chain Monte Carlo method is developed to estimate the abundances by drawing samples from the associated posterior distribution. In [31], the underlying constrained least-squares problem is solved using a primal-dual interior-point method claimed to be amenable to parallel computing. In [32], Dykstra's algorithm for finding projections onto the intersection of convex sets is employed to estimate the abundances, which are required to satisfy the nonnegativity and sum-to-one constraints.
The above-mentioned supervised spectral unmixing algorithms assume that the endmembers are known exactly. However, in practice, the endmembers are extracted from real spectral images that may be contaminated by measurement noise or error. In addition, the available knowledge of the endmembers might not exactly match the true endmembers of the spectral image at hand since the spectral signature of the same material may be slightly altered in different images or distinct but confusingly similar spectral signatures may be mixed up. Unaccounted nonlinear mixing phenomena or unsuspected variability in the endmembers throughout the same image can also be sources of perturbation in the endmembers.
In this paper, we consider a supervised linear spectral unmixing problem where both the spectral image data and the available knowledge of the endmembers are subject to perturbation. We calculate the corresponding (un)constrained Fisher information matrix (FIM) and Cramer-Rao lower bound (CRLB) for the estimation of the abundance matrix based on the considered model. In order to account for the perturbations on the observations of both the spectral image and the endmembers, we formulate a constrained weighted maximum-log-likelihood objective function and regularize it using two penalty terms, i.e., the vector total-variation of the abundance matrix and the sum-squared-distance of the endmembers. We develop an algorithm for solving the formulated optimization problem by utilizing the block coordinate-descent (BCD) iterations [33]-[35] together with the alternating direction method of multipliers (ADMM) [36]-[40]. We examine the convergence properties of the proposed algorithm theoretically and evaluate its unmixing performance through simulation experiments with three real hyperspectral image datasets. The simulation results show that the proposed algorithm copes with the endmember perturbations effectively and yields a significantly more accurate estimation of the abundance matrix compared with the conventional constrained least-squares algorithms that are oblivious to any perturbation in the endmembers.

II. DATA MODEL
Let us denote a multiband spectral image that is made of N pixels and L spectral bands in matrix form as X ∈ R^{L×N}. There are K linearly-independent endmembers¹ associated with this image that are arranged as the columns of the endmember matrix E ∈ R^{L×K}. According to the linear mixture model, each column of X, representing the spectral values of the corresponding pixel, can be written as a linear (conical) combination of the endmembers. Therefore, X can be factorized as

X = EA,     (1)

where the columns of A ∈ R^{K×N} contain the fractional abundances of the endmembers for all pixels. In supervised spectral unmixing, we are primarily interested in estimating the abundance matrix A given the observations/measurements of X and the knowledge of E extracted from the available data or taken from a library of candidate endmembers based on some prior information.
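As a concrete sketch of the factorization (1), the following snippet (with purely illustrative sizes and random data, and variable names of our own choosing) builds simplex-constrained abundances and a noise-free mixture:

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, N = 50, 4, 100  # bands, endmembers, pixels (illustrative sizes)

E = rng.random((L, K))             # endmember matrix (columns = endmembers)
A = rng.random((K, N))
A /= A.sum(axis=0, keepdims=True)  # columns lie on the (K-1)-simplex

X = E @ A                          # noise-free linear mixture, X = E A

assert np.all(A >= 0) and np.allclose(A.sum(axis=0), 1.0)
```

Each column of X is then a conical combination of the columns of E with nonnegative weights summing to one.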
In practice, multiband spectral images are collected via spectral imaging devices that are prone to measurement noise or error. The linear mixture model (1) is often not perfect either. Therefore, we presume to have access to a perturbed observation of the original spectral image X that is denoted by X̃ and expressed as

X̃ = X + P_X,     (2)

where P_X ∈ R^{L×N} is the perturbation matrix factoring in measurement noise/error as well as possible shortfall of the linear mixture model. In addition, we consider that, instead of the exact values of the endmember matrix E, we have access to a perturbed version of it denoted by Ẽ and related to E as

Ẽ = E + P_E,     (3)

where P_E ∈ R^{L×K} represents the endmember perturbation. The equations (2) and (3) make up our fully-perturbed linear mixture model. We assume that the perturbation matrices P_X and P_E are statistically independent of each other and have matrix normal distributions such that

P_X ~ MN(0_{L×N}, Σ, I_N),     (4)
P_E ~ MN(0_{L×K}, Ξ, I_K),     (5)

where 0_{L×N} is an L × N matrix with all zero entries, I_N is the N × N identity matrix, and Σ ∈ R^{L×L} and Ξ ∈ R^{L×L} are diagonal matrices with positive diagonal entries denoting the respective row-covariance matrices for P_X and P_E. Note that we set the column-covariance matrices to identity to reflect the realistic assumption that the perturbations pertaining to different pixels or endmembers are independent and identically distributed. However, the diagonal row-covariance matrices imply that the perturbations are independent of each other in the spectral domain but may have different variances at different bands.

¹Technically, the linear mixture model implies that the endmembers are affinely independent. However, since the number of endmembers K is usually much smaller than the number of spectral bands L, it is often safe to assume that the endmembers are linearly independent.
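The fully-perturbed observations described by (2)-(5) can be simulated as follows; a matrix normal draw with row covariance Σ and identity column covariance is Σ^{1/2}G for a standard normal G, and the variance values below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
L, K, N = 50, 4, 100

E = rng.random((L, K))
A = rng.dirichlet(np.ones(K), size=N).T  # simplex-constrained abundances
X = E @ A

# Diagonals of the diagonal row-covariance matrices Sigma and Xi
# (band-dependent variances; values are illustrative).
sigma = 0.01 * (1.0 + rng.random(L))
xi = 0.02 * (1.0 + rng.random(L))

# P ~ MN(0, Sigma, I) is sampled as Sigma^{1/2} G with G standard normal,
# so columns (pixels/endmembers) are i.i.d. as assumed in the text.
P_X = np.sqrt(sigma)[:, None] * rng.standard_normal((L, N))
P_E = np.sqrt(xi)[:, None] * rng.standard_normal((L, K))

X_tilde = X + P_X  # perturbed spectral image, cf. (2)
E_tilde = E + P_E  # perturbed endmember matrix, cf. (3)
```

Note that both the image and the endmember matrix are observed only through their perturbed versions, which is the defining feature of the model.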
It is worth mentioning that the aforesaid fully-perturbed linear mixture model is essentially different from the ones recently proposed in the literature that have been particularly designed to cope with endmember variability or nonlinear mixing effects, such as those of [41]-[43]. The perturbed linear mixing model utilized in [41] explicitly accounts for the spatial and spectral variability of the endmembers by explaining the endmember variability in each pixel through perturbations added to the reference endmembers. In [42], an extended linear mixing model that allows for pixelwise spatially-coherent local variations of the endmembers is adopted. A special case of this model stipulates that each pixel is a linear combination of scaled versions of the reference endmembers where the scaling factors may differ among pixels. The model assumed in [43] is an augmentation of the linear mixture model with an additive residual term that is meant to incorporate the effects of endmember variability, nonlinear mixing, mismodeling, and outliers. Therefore, in general, the mixture models used in [41]-[43] and similar works assume that the effective endmembers can vary from pixel to pixel, hence the perturbations of the reference endmembers can be different at each pixel. However, in our model, i.e., (2) and (3), we assume that the mixing occurs linearly and the endmembers are the same for all pixels while we have access only to a single perturbed version of the endmember matrix.

III. PROBLEM
In view of (2)-(5), we have

X̃ = EA + P_X  and  Ẽ = E + P_E.

Thus, the probability density functions of X̃ and Ẽ parameterized over the unknown matrix parameters E and A are written as

p(X̃; E, A) ∝ exp( -(1/2) ‖Σ^{-1/2}(X̃ - EA)‖_F² ),
p(Ẽ; E) ∝ exp( -(1/2) ‖Ξ^{-1/2}(Ẽ - E)‖_F² ),

where ‖·‖_F denotes the Frobenius norm and ∝ means proportional up to a constant that does not depend on E, A, or their observations. Therefore, we define a weighted likelihood function [44] of the unknowns E and A given their observations as

p(X̃; E, A) [p(Ẽ; E)]^α,

where α is a positive weight parameter. Hence, the corresponding weighted log-likelihood function is given by

l(E, A) = -(1/2) ‖Σ^{-1/2}(X̃ - EA)‖_F² - (α/2) ‖Ξ^{-1/2}(Ẽ - E)‖_F².     (6)

Remark 1: The weighted log-likelihood function (6) is equivalent to the canonical log-likelihood function when α = 1. We consider this weighted form of the log-likelihood function since it has the advantage of being viewed as an approximation of the genuine log-likelihood function when there is correlation between P_X and P_E. In such cases, an appropriate value of α can help achieve a balance between the contributions and effects of the two summands of the log-likelihood function.
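Up to additive constants and a factor of 1/2 on each quadratic term (our assumed normalization), the weighted log-likelihood (6) can be evaluated as below; `sigma` and `xi` hold the diagonals of Σ and Ξ, and the function name is ours:

```python
import numpy as np

def weighted_log_likelihood(E, A, X_tilde, E_tilde, sigma, xi, alpha):
    # First summand: image-fit term weighted by the inverse row covariance.
    r_x = X_tilde - E @ A
    term_x = np.sum(r_x**2 / sigma[:, None])
    # Second summand: endmember-fit term, weighted by alpha.
    r_e = E_tilde - E
    term_e = np.sum(r_e**2 / xi[:, None])
    return -0.5 * (term_x + alpha * term_e)
```

Setting `alpha = 1` recovers the canonical log-likelihood noted in Remark 1; other values re-balance the two summands.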

A. Cramer-Rao Lower Bound
In order to gain insights into the theoretical performance limits given the above-mentioned data model and the weighted log-likelihood function (6), we derive the CRLB for the estimation of A while treating E as a nuisance matrix parameter.
For convenience of notation, we define e = vec{E} and a = vec{A}, where vec{·} is the vectorization operator that stacks the columns of its matrix argument on top of each other. The gradients of the function l(E, A) with respect to its arguments are calculated as

∂l/∂E = Σ^{-1}(X̃ - EA)Aᵀ + αΞ^{-1}(Ẽ - E),
∂l/∂A = EᵀΣ^{-1}(X̃ - EA).

It is easy to verify the expectations

E{∂l/∂E} = 0_{L×K},  E{∂l/∂A} = 0_{K×N},

which indicate the satisfaction of the relevant regularity conditions [45]. Hence, the corresponding Fisher information matrix (FIM) is expressed as

F = E{ [∂l/∂e; ∂l/∂a] [∂l/∂e; ∂l/∂a]ᵀ } = [F₁₁ F₁₂; F₂₁ F₂₂].     (7)

In Appendix A, we calculate the partial derivatives in (7) and express the blocks of F in closed form in terms of Kronecker products, where ⊗ denotes the Kronecker product and P_{L,K} and P_{K,L} are the vec-permutation matrices of appropriate dimensions, as explained in Appendix A.
We show in Appendix B that F is non-singular and invertible. Hence, according to the block matrix inversion formula [46], the KN × KN lower-right block of F^{-1} is the inverse of the Schur complement of F₁₁, i.e.,

(F₂₂ - F₂₁F₁₁^{-1}F₁₂)^{-1}.     (8)

Therefore, the CRLB on the variance or mean-square error (MSE) of any unbiased estimator of A, denoted by Â, is given by [45]

E{‖Â - A‖_F²} ≥ tr{(F₂₂ - F₂₁F₁₁^{-1}F₁₂)^{-1}},     (9)

where tr{·} is the matrix trace operator. We show in Appendix C that, when the row-covariance matrix of the perturbation of the endmembers is a scalar multiple of the row-covariance matrix of the perturbation of the observed spectral image, i.e., Ξ = ηΣ with η > 0, (9) takes the simpler closed form given in (10). The FIM is regular (non-singular) thanks to the additive term I_K ⊗ αΞ^{-1} with α > 0 in its upper-left block (see Appendix B), which in turn is due to the second additive term of the log-likelihood function [see (6)]. This makes the abundance matrix A identifiable even with no constraint on A or E. Indeed, the second summand on the right-hand side of (6) serves to eliminate the identifiability ambiguities inherent to the first summand, which, if considered alone, entails a weighted nonnegative matrix factorization (NMF) problem [47].
Remark 3: When perfect knowledge of the endmembers is available, the FIM reduces to F₂₂. Therefore, one can argue that the perturbation of the endmembers incurs an information loss regarding the estimation of A quantifiable by F₂₁F₁₁^{-1}F₁₂ and a consequent increase in the CRLB. When Ξ = ηΣ, as shown in Appendix C, this information loss admits a closed-form expression and, as seen from (10) and (11), the increase in the CRLB is by a factor of N + (η/α)‖A‖_F². Notice that a carefully selected non-unity value for α can help curb the information loss and the resultant increase in the CRLB incurred by the perturbation in the endmembers, although at the cost of model inaccuracy and increased estimation bias.

B. Constrained Cramer-Rao Lower Bound
It is shown in [48]-[50] that the nonnegativity (or any other inequality) constraint on the unknown parameter (in our case, A) does not affect the CRLB. However, equality constraints, such as the sum-to-one constraint on the columns of A, can affect the CRLB. In the presence of equality constraints, the full-rank FIM corresponding to the unconstrained case needs to be substituted by a reduced-rank version that is obtained by projecting the full-rank FIM onto the tangent hyperplane of the constraint set [48].
We can express the sum-to-one constraint on the columns of A as

c(A) = Aᵀ1_K - 1_N = 0_{N×1},     (12)

where 1_Q denotes a Q × 1 vector with all entries being unity. The Jacobian of the constraint function c(A) is J = 1_Kᵀ ⊗ I_N. According to [49], the CRLB of the MSE for the estimation of A under the constraint (12) is given by

E{‖Â - A‖_F²} ≥ tr{H(Hᵀ(F₂₂ - F₂₁F₁₁^{-1}F₁₂)H)^{-1}Hᵀ},     (13)

where H ∈ R^{KN×(K-1)N} is a matrix whose columns form an orthonormal basis for the nullspace of the constraint Jacobian matrix J. It is clear from (13) that the constrained CRLB is invariant to any automorphism on H. Therefore, H only requires to have linearly-independent columns that span the nullspace of J. Hence, we set H = Y ⊗ I_N where the columns of Y ∈ R^{K×(K-1)} are constructed so that Y spans the orthogonal complement of 1_K. We show in Appendix C that, when Ξ = ηΣ, (13) also takes a simpler closed form.
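Since the constrained CRLB (13) is invariant to the particular basis, any Y whose columns span the orthogonal complement of 1_K is admissible; one simple valid (non-orthonormal) choice is sketched below:

```python
import numpy as np

K = 5
# Each column of Y sums to zero, so 1_K^T Y = 0 and the K - 1 columns
# span the orthogonal complement of 1_K.
Y = np.vstack([np.eye(K - 1), -np.ones((1, K - 1))])

assert np.allclose(np.ones(K) @ Y, 0.0)       # orthogonal to 1_K
assert np.linalg.matrix_rank(Y) == K - 1      # full column rank
```

An orthonormal basis, if preferred, can be obtained by a QR factorization of this Y.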

C. Constrained Maximum-Likelihood Estimation
The constrained maximum-likelihood estimates of A and E can be found by maximizing the weighted log-likelihood function (6) with respect to its arguments while constraining the entries of A to be nonnegative or, additionally, the columns of A to add up to unity. As the knowledge of the perturbation covariance matrices Σ and Ξ is often unavailable, we drop them from the weighted log-likelihood objective function by setting them to identity and rely on the parameter α to adjust for their possible disparity. Thus, the resulting optimization problem associated with the constrained maximum-likelihood estimation of A (and E as a nuisance or an auxiliary matrix variable) is

min_{E, A∈S} ‖X̃ - EA‖_F² + α‖Ẽ - E‖_F²,     (15)

where S is the set of all K × N matrices that have entries in the range [0, 1]. If the sum-to-one constraint on the columns of A is also desired, S will be the standard (K - 1)-simplex.
The nonnegativity constraint is required to obtain physically plausible abundance values. We include the abundance sum-to-one constraint as an option and not a necessity as, in some scenarios, it may not be practically justifiable due to issues such as endmember variability [1]. We do not impose any constraint on E since, in this paper, we are only interested in the estimation of A and treat E in (15) and the subsequent related optimization problems as an auxiliary matrix variable. In addition, in our numerical experiments, we did not find any noticeable benefit in constraining the entries of E to be nonnegative, specifically, for low to moderate levels of perturbation in the endmembers.

D. Penalties
In addition to the above-mentioned constraints on the abundance matrix A, we add two penalty terms to the objective function in (15). The first term is an isotropic vector total-variation penalty [51]-[53], defined as

V_TV(A) = ‖[A D_h ; A D_v]‖_{2,1},

where ‖·‖_{2,1} is the ℓ2,1-norm operator that returns the sum of the ℓ2-norms of all the columns of its matrix argument, [· ; ·] denotes vertical stacking, and D_h and D_v are discrete differential matrix operators that apply independently on all abundance bands and, respectively, yield the horizontal and vertical first-order backward differences (gradients) of the abundance bands in the spatial domain.
Natural images are known to mostly consist of piecewise homogeneous regions with few discontinuities and abrupt changes at object boundaries or edges.The vector totalvariation regularization allows us to take advantage of this prior knowledge by promoting sparsity in the spatial gradient of the abundance bands, i.e., local differences between the values for adjacent pixels, while encouraging the local differences to be spatially aligned across different bands [53].
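A direct way to evaluate the isotropic vector total-variation penalty is to form the backward differences band by band and take the per-pixel ℓ2-norm across all bands and both directions; the sketch below (our own helper, not the paper's code) assumes periodic boundaries and a row-major H × W pixel grid:

```python
import numpy as np

def vector_total_variation(A, H, W):
    # Rows of A are abundance bands over an H x W pixel grid (N = H * W).
    K, N = A.shape
    bands = A.reshape(K, H, W)
    dh = bands - np.roll(bands, 1, axis=2)  # horizontal backward differences
    dv = bands - np.roll(bands, 1, axis=1)  # vertical backward differences
    # l2-norm per pixel across all bands and both directions, then sum
    # over pixels (the l2,1-norm of the stacked difference matrix).
    return np.sum(np.sqrt(np.sum(dh**2 + dv**2, axis=0)))
```

Because the ℓ2-norm couples all bands at each pixel, sparse gradients are encouraged to occur at the same spatial locations across bands, which is exactly the alignment property described above.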
The second penalty term that we add to the objective function is the so-called sum-squared-distance of endmembers, defined as

V_E(E) = Σ_{1≤i<j≤K} ‖e_i - e_j‖²,

where e_i denotes the ith column of E. This term is the sum of squared distances between all the endmembers, which are considered to be the vertices of the simplex that circumscribes the spectral data of all pixels in the image. It was originally proposed in [11] within the context of blind hyperspectral unmixing as a computationally-efficient surrogate for the volume of the simplex formed by the endmembers and has since been reproposed through various interpretations, e.g., in [12]. Adding this penalty term to the objective function can help the estimates of the endmember matrix stay close to the true value and hence aid in estimating a more accurate abundance matrix.

Consequently, in order to develop an effective algorithm for unmixing multiband spectral images based on the fully-perturbed linear mixture model, we propose to solve the following optimization problem that incorporates the above constraints and penalty terms:

min_{E, A∈S} ‖X̃ - EA‖_F² + α‖Ẽ - E‖_F² + βV_TV(A) + γV_E(E),     (16)

where β > 0 and γ > 0 are regularization parameters that together with α govern the trade-offs arising from assigning different weights to various terms of the objective function.
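The sum-squared-distance penalty can be computed without a double loop via the identity Σ_{i<j} ‖e_i - e_j‖² = K‖E‖_F² - ‖E 1_K‖²; a sketch with our own function name:

```python
import numpy as np

def sum_squared_distance(E):
    # Sum of squared pairwise distances between the columns of E,
    # using K * ||E||_F^2 - ||E @ 1_K||^2 instead of O(K^2) column pairs.
    K = E.shape[1]
    return K * np.sum(E**2) - np.sum(E.sum(axis=1) ** 2)
```

This O(LK) evaluation is part of what makes the penalty a computationally-efficient surrogate for the simplex volume.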

IV. ALGORITHM

A. Block Coordinate-Descent Iterations
We use the block coordinate-descent (BCD) algorithm to solve (16) with respect to E and A alternately and in an iterative fashion. Therefore, we repeat the following alternating minimizations until the iterates converge:

E^{(n)} = argmin_E f(E, A^{(n-1)}),     (17)
A^{(n)} = argmin_{A∈S} f(E^{(n)}, A),     (18)

where f denotes the objective function of (16). Here, a variable with the superscript (n) denotes the estimate of its respective parameter at the nth iteration.
The minimization with respect to E in (17) is straightforward and admits a closed-form solution in which we use the fact that the matrix W is symmetric and idempotent, i.e., WWᵀ = W.
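A plausible closed form for the E-update, assuming the objective of (16) with the sum-squared-distance written as K·tr{EWEᵀ} for the centering matrix W = I_K - (1/K)1 1ᵀ (which is symmetric and idempotent, consistent with the text), is sketched below; this is our reconstruction, not necessarily the paper's exact expression:

```python
import numpy as np

def update_E(X_tilde, E_tilde, A, alpha, gamma):
    # Minimize ||X_tilde - E A||_F^2 + alpha ||E_tilde - E||_F^2
    #          + gamma * K * tr{E W E^T} over E (assumed objective).
    K = A.shape[0]
    W = np.eye(K) - np.ones((K, K)) / K        # centering matrix, W W^T = W
    lhs = A @ A.T + alpha * np.eye(K) + gamma * K * W
    rhs = X_tilde @ A.T + alpha * E_tilde
    # Solve E @ lhs = rhs, i.e. L systems with a common K x K matrix.
    return np.linalg.solve(lhs.T, rhs.T).T
```

Setting the matrix gradient of the assumed objective to zero yields exactly the linear system solved here, so the update can be verified against the normal equations.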

B. Alternating Direction Method of Multipliers
For the minimization with respect to A in (18), we define a set of auxiliary matrix variables V₁, V₂, V₃, V₄ ∈ R^{K×N} and apply variable splitting to break the problem into smaller pieces. Therefore, we find an estimate of A^{(n)} by solving the resulting problem (19), where ı_S(·) is the indicator function that takes the value of zero when its matrix argument is in the set S and the value of infinity otherwise.
The augmented Lagrangian function for the constrained optimization problem (19) is given by (20), where G₁, G₂, G₃, G₄ ∈ R^{K×N} are the scaled Lagrange multipliers and µ ≥ 0 and λ ≥ 0 are the penalty parameters. We use the alternating direction method of multipliers (ADMM) to minimize the augmented Lagrangian function (20) in an iterative fashion. At each iteration, we alternate the minimization with respect to the main unknown variable A and the auxiliary variables; then, we update the scaled Lagrange multipliers. Hence, we compute the iterates accordingly.
Note that the above ADMM iterations are sub-iterations within the BCD iterations of (17) and (18). That is why we use m as the index for the ADMM iterations to differentiate them from the BCD iterations indexed by n. At the end of the ADMM iterations within every BCD iteration, we set the value of A^{(n)} equal to the value of A^{(n,m)} for the last ADMM iteration.
The solution of (21) has a closed form. In order to improve the estimation performance, we replace E^{(n)} in (23) with the perturbed endmember matrix Ẽ and change (23) to (24). The method of instrumental variables (IV) [54]-[57] has inspired us to make this modification and take advantage of the available, albeit perturbed, knowledge of the endmember matrix, i.e., Ẽ, in the estimation of the abundance matrix. We expect this to be particularly beneficial when the perturbation in the endmember matrix, i.e., P_E, is not too large to cause a substantial distortion. In Appendix E, we sketch the reason behind the expected benefit of using Ẽ as an IV in (24). Our simulation results presented in Section VI also attest to the performance advantages of employing this IV technique.

Remark 4: We do not apply the IV method in the orthodox manner; rather, the way we utilize it is similar to solving (21) while replacing E^{(n)} with Ẽ in (19), then using E^{(n)} as an IV and calculating A^{(n,m)} as an IV estimate rather than the ordinary least-squares solution. In that sense, E^{(n)} is a good IV since it is highly dependent on Ẽ (as Ẽ is the perturbed version of the true endmember matrix E and E^{(n)} is an estimate of E) but is not independently correlated with X̃, i.e., it is not correlated with P_X, the perturbation in X̃ [54].
The optimization in (22) can be decoupled with respect to V₁, V₂, V₃, and V₄, i.e., (22) can be written as (25). The solution of (25) can be achieved through cyclic optimizations with respect to V₁ and V₂, V₃ until the iterates converge. However, since an exact solution of (22) is not necessary at each iteration of the ADMM, we perform these optimizations only once at each ADMM iteration. Therefore, we replace (25) with (26)-(28). The solution of (27) has a closed form. In addition, the problem (28) can be decomposed with respect to the columns of V₃. Then, each subproblem can be solved by making use of the Moreau proximity operator of the ℓ2,1-norm calculated via column-wise vector soft-thresholding [58], [59], which yields the jth column of V₃^{(m)}. Finally, the solution of (26) is obtained by projecting the corresponding unconstrained minimizer onto S. If S is the set of all K × N matrices that have entries in the range [0, 1] (associated with the abundance nonnegativity constraint), the projection onto it can be realized by setting the negative entries to zero and the entries greater than one to one. On the other hand, if S is the standard (K - 1)-simplex (associated with the abundance nonnegativity and sum-to-one constraints), the projection onto it can be realized using, e.g., the algorithm of [60].
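Two of the auxiliary-variable updates admit simple closed forms: the ℓ2,1 prox is column-wise vector soft-thresholding, and the simplex projection can follow the sort-based scheme of Duchi et al., one possible realization of the algorithm cited as [60]. Both sketches below use our own function names:

```python
import numpy as np

def prox_l21(V, tau):
    # Prox of tau * ||.||_{2,1}: shrink each column's l2-norm by tau.
    norms = np.linalg.norm(V, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return V * scale

def project_simplex_columns(V):
    # Euclidean projection of each column of V onto the standard simplex
    # (nonnegative entries summing to one), via descending sort.
    K, N = V.shape
    U = -np.sort(-V, axis=0)                      # each column sorted descending
    css = np.cumsum(U, axis=0) - 1.0
    idx = np.arange(1, K + 1)[:, None]
    cond = U - css / idx > 0
    rho = K - np.argmax(cond[::-1], axis=0) - 1   # last index where cond holds
    theta = css[rho, np.arange(N)] / (rho + 1)
    return np.maximum(V - theta, 0.0)
```

The box projection mentioned first (clipping to [0, 1]) is simply `np.clip(V, 0.0, 1.0)`.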
Algorithm 1: The proposed RCTLS-IV algorithm. (Initialize; for n = 1, 2, . . ., until a stopping criterion is met, do: for m = 1, 2, . . ., until a stopping criterion is met, do the inner ADMM updates.)

We summarize the proposed algorithm, which we call regularized constrained total least-squares with instrumental variable (RCTLS-IV), in Algorithm 1. We use this name in view of the similarity of the proposed algorithm to the total least-squares methods [61]-[63] in performing estimation based on a linear errors-in-variables model. Algorithm 2 is a version of the proposed algorithm that uses only one iteration of the ADMM algorithm with warm start at each BCD iteration. In this algorithm, as a result of running a single inner ADMM iteration, the inner iteration index (m) is merged with the outer BCD iteration index (n).

C. Computational Complexity
It is clear that the most expensive steps are the calculations of E^{(n)}, A^{(n,m)}, and V₁^{(m)}. The calculation of E^{(n)} requires two matrix-matrix multiplications with complexities of O(KLN) and O(K²N) as well as the solution of L systems of linear equations with a common K × K coefficient matrix with the complexity of O(K²L). The calculation of A^{(n,m)} requires two matrix-matrix multiplications with complexities of O(K²L) and O(KLN) as well as the solution of N systems of linear equations with a common K × K coefficient matrix with the complexity of O(K²N). For the calculation of V₁^{(m)}, the matrix multiplications involving the differential matrix operators D_h and D_v and their transposes can be implemented using only subtractions. The other matrix multiplication involved in this update can also be efficiently performed using the fast Fourier transform (FFT) algorithm and the circular convolution theorem [64] with the complexity of O(KN log N), when the differential operators apply with periodic boundaries. Consequently, the proposed algorithm has a computational complexity of order O(KN max{L, log N}) + O(K² max{L, N}) per iteration, or simply O(KLN) per iteration in the likely case of having N ≥ L ≥ log N.

Algorithm 2: The proposed RCTLS-IV algorithm with a single inner ADMM iteration. (Initialize; for n = 1, 2, . . ., until a stopping criterion is met, do the merged updates.)
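With periodic boundaries, the backward-difference operator is a circular convolution and is therefore diagonalized by the DFT, which is what makes the FFT-based implementation possible; a 1-D sketch:

```python
import numpy as np

# Backward difference with periodic boundary on n samples.
n = 8
rng = np.random.default_rng(2)
x = rng.standard_normal(n)

d_direct = x - np.roll(x, 1)          # direct computation

# The same operation as a circular convolution with kernel [1, -1, 0, ...],
# evaluated via the circular convolution theorem: conv = IFFT(FFT * FFT).
kernel = np.zeros(n)
kernel[0], kernel[1] = 1.0, -1.0
d_fft = np.real(np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(x)))

assert np.allclose(d_direct, d_fft)
```

In two dimensions, the same property lets one apply (and invert) matrices built from D_h and D_v band by band with 2-D FFTs in O(N log N) per band.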

V. CONVERGENCE
The objective function of the optimization problem (16) is nonconvex and may have multiple local minima at various coordinate points of its augmented matrix argument [E, A]. Therefore, the proposed algorithm, which is based on the BCD and ADMM algorithms, is not guaranteed to find the globally optimal solution of (16). Nonetheless, as we discuss below, the proposed algorithm converges at least to a local minimum of the objective function. We have observed experimentally that this local minimum is often very close to the global optimum when the perturbations are not too large.
First, we study the convergence properties of the BCD algorithm of (17) and (18) applied to solve (16). Incorporating the constraint A ∈ S into the objective function in (16) gives an equivalent unconstrained objective function f that can be decomposed as f = f₀ + f₁, where f₀ collects the differentiable terms and f₁ the nonsmooth ones. The function f is continuous on its effective domain and has compact level sets. It is also biconvex in (E, A), i.e., it is convex over E if A is fixed and vice versa. In addition, f₀ is Gâteaux-differentiable on its domain, which is open, and f₁ is convex. Therefore, according to Lemma 3.1 and Theorem 4.1(b) of [33], the iterates generated by the BCD algorithm of (17) and (18) are defined and every cluster point is a stationary point of f. In other words, the algorithm converges to a local minimum of the objective function f.
Next, we examine the convergence properties of the ADMM algorithm that we use to solve (18) or, equivalently, (19). To this end, we rewrite (19) in the standard two-block ADMM form involving two functions g₁ and g₂ and a linear-constraint matrix B. The functions g₁ and g₂ are closed, proper, and convex, and the matrix B has full column rank. Therefore, according to [36, Theorem 1], the ADMM algorithm described in Section IV will converge to a solution of (19), if one exists, with any pair of positive penalty parameters µ and λ regardless of the initial values of the iterates.
VI. EVALUATION

To evaluate the performance of the proposed algorithm in comparison with its closest contenders, we consider the problem of linear unmixing of a hyperspectral image given its noisy observations and a perturbed version of its endmembers, as explained below.
In our simulations, we use parts of three publicly available hyperspectral image datasets, namely, Botswana, Indian Pines [65], and Washington DC, after removing the uncalibrated, high-noise, and water-absorption spectral bands. The RGB illustrations of the considered parts are shown in Fig. 1 and their spatial and spectral resolutions are given in Table I. The Botswana dataset has been collected by the Hyperion sensor aboard NASA's Earth Observing 1 (EO-1) satellite, the Indian Pines dataset by NASA's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) instrument, and the Washington DC dataset through the Hyperspectral Digital Imagery Collection Experiment (HYDICE) [66]. All three datasets cover the visible near-infrared (VNIR) and short-wavelength infrared (SWIR) ranges.
To create the ground truth for our experiments, we extract K = 10 endmembers from each dataset using the vertex component analysis (VCA) algorithm [10] and use the extracted endmembers to unmix the dataset using the SUnSAL algorithm [21]. We take the output of the VCA and SUnSAL algorithms as the ground-truth endmember and abundance matrices, respectively, and multiply them to generate the ground-truth hyperspectral images for each dataset. We then generate the perturbed hyperspectral images and the perturbed endmember matrices by contaminating them with additive zero-mean white Gaussian noise to achieve a target signal-to-noise ratio (SNR) for both.
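The contamination step can be sketched as scaling white Gaussian noise to hit a target SNR in the Frobenius-norm sense (the helper name is ours):

```python
import numpy as np

def add_noise_at_snr(M, snr_db, rng):
    # Scale zero-mean white Gaussian noise so that
    # 10 * log10(||M||_F^2 / ||noise||_F^2) equals the target SNR.
    noise = rng.standard_normal(M.shape)
    scale = np.linalg.norm(M) / (np.linalg.norm(noise) * 10 ** (snr_db / 20))
    return M + scale * noise
```

Applying this helper to both the ground-truth image and the ground-truth endmember matrix yields the perturbed observations X̃ and Ẽ used as the algorithm inputs.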
We compare the performance of the proposed RCTLS-IV algorithm with those of four other algorithms, which we call constrained least-squares (CLS), constrained least-squares with total-variation regularization (CLS-TV), constrained total least-squares (CTLS), and constrained total least-squares with instrumental variable (CTLS-IV). The CLS algorithm solves the optimization problem

min_{A∈S} ‖X̃ - ẼA‖_F²,

the CTLS algorithm solves

min_{E, A∈S} ‖X̃ - EA‖_F² + α‖Ẽ - E‖_F²,     (29)

and the CTLS-IV algorithm solves the same problem as the CTLS algorithm, i.e., (29), nonetheless using the instrumental-variable technique mentioned in Section IV. To implement these algorithms, we use the same methodology employed to develop the proposed RCTLS-IV algorithm, i.e., utilizing the BCD and ADMM algorithms, as well as identical parameter values. Note that the CLS algorithm solves the same problem as in [28] while taking Ẽ as the endmember matrix, and the CLS-TV algorithm [22], [24] extends it by incorporating a total-variation regularization. In addition, the objective function of the CTLS algorithm is very similar to that of the R-CoNMF algorithm proposed in [20]. The CTLS-IV algorithm can also be considered a version of the proposed RCTLS-IV algorithm that lacks the regularization penalties detailed in Section III-D.
We use two performance metrics, namely, the inverse normalized mean-square error (INMSE), also known as the signal-to-reconstruction-error ratio (SRE) [24], defined as

INMSE = 10 log₁₀( ‖A‖_F² / E{‖Â - A‖_F²} ),

and the inverse normalized bias (INB), defined as

INB = 10 log₁₀( ‖A‖_F² / ‖E{Â} - A‖_F² ),

where Â denotes the estimated abundance matrix. Higher values of these metrics indicate better performance. We approximate the expectations by averaging over 100 independent trials. At each trial, we run the algorithms for 200 iterations and initialize with A^{(0)} = 0_{K×N}. In the CTLS, CTLS-IV, and RCTLS-IV algorithms, at each outer BCD iteration, we implement only one inner iteration of the ADMM algorithm with warm start, as shown in Algorithm 2 for the RCTLS-IV algorithm. The data and MATLAB scripts used to produce the simulation results can be found at [67].
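Assuming the usual dB-scale definitions (signal energy over error energy, with the expectation replaced by a sample average over trials; the paper's equations may differ in normalization), the two metrics can be computed as follows:

```python
import numpy as np

def inmse(A_true, A_hats):
    # INMSE (a.k.a. SRE) in dB: signal energy over the trial-averaged
    # squared estimation error.
    mse = np.mean([np.sum((Ah - A_true) ** 2) for Ah in A_hats])
    return 10 * np.log10(np.sum(A_true**2) / mse)

def inb(A_true, A_hats):
    # INB in dB: signal energy over the squared bias of the
    # trial-averaged estimate (measures systematic error only).
    bias = np.mean(A_hats, axis=0) - A_true
    return 10 * np.log10(np.sum(A_true**2) / np.sum(bias**2))
```

Since the squared bias never exceeds the mean-square error, INB is at least as large as INMSE for the same set of trials; comparing the two separates systematic error from estimation variance.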
The proposed RCTLS-IV algorithm has five tunable parameters: the regularization parameters α, β, and γ and the ADMM penalty parameters µ and λ. Automatically tuning these parameters is an interesting and challenging subject for which there exist a few strategies, such as those proposed in [58] and [71]. The values of µ and λ impact the convergence speed of the proposed algorithm and, as long as they lie within an appropriate range, have little influence on its accuracy. Therefore, in our experiments with all considered datasets, we tune them so that the algorithms sufficiently converge within 200 iterations.
On the other hand, the values of α, β, and γ affect the performance of the proposed algorithm in subtle ways. In Fig. 2, we plot the performance metrics of the proposed algorithm for unmixing the Botswana dataset with SNR = 30 dB versus a range of values for α, β, and γ. The results indicate that the performance of the proposed algorithm is not highly sensitive to the values of these regularization parameters. In addition, choosing the proper parameter values may involve a trade-off between the highest attainable INMSE and INB. Consequently, we tune the parameters α, β, and γ only coarsely to obtain approximately the best performance for each dataset. We give these values in Table II. Note that the difference in the order of magnitude of the parameter values is due to the scaling and dimension of the signals and perturbations and does not imply any meaningful difference in their significance. Fig. 3 depicts how the INMSE of the different algorithms evolves over the iterations. Fig. 4 shows the values of the INMSE for each abundance band corresponding to an endmember, i.e., the INMSE for each row of Â. Fig. 5 shows the values of the INMSE for each pixel, i.e., the INMSE of each column of Â, sorted in ascending order. Figs. 3-5 contain the results for all considered algorithms and datasets when the SNR is approximately 30 dB. Figs. 6-8 display the abundance values for three endmembers of each considered dataset estimated through a single run of the CLS, CTLS, and RCTLS-IV algorithms together with the corresponding ground-truth values in the form of two-dimensional abundance maps (images). We have not included the results for the CLS-TV and CTLS-IV algorithms in Figs. 6-8 since their estimated abundance maps are visually very similar to those of the CLS and RCTLS-IV algorithms, respectively. The SNR in the experiments resulting in Figs. 6-8 was approximately 30 dB. In Table III, we provide the values of the performance metrics for all considered algorithms and datasets as well as the corresponding CRLBs for SNRs of 25, 30, and 35 dB. We also give the average algorithm run times (using MATLAB, a 2.9 GHz Core-i7 CPU, and 24 GB of DDR3 RAM) in Table IV.
We observe in Figs. 3-5 that, as expected, accounting for the perturbations in the endmembers (in all algorithms but the CLS and CLS-TV algorithms) significantly improves the unmixing performance, as evidenced by the reduction in both MSE and bias (increase in INMSE and INB). The IV technique employed in the CTLS-IV and RCTLS-IV algorithms and the additional regularizations used in the RCTLS-IV algorithm further enhance the performance in terms of both estimation accuracy and convergence speed, although to varying degrees depending on the considered dataset. Understandably, these benefits come at the cost of increased computational complexity and algorithm run time. Figs. 6-8 also testify to the merits of the proposed RCTLS-IV algorithm by revealing that its estimated abundance maps resemble the ground truth visibly better than those of its contenders.
The CRLB values presented in Table III provide a useful guide to the estimation performance that can be expected in each scenario. However, in some cases, they appear to be smaller than the experimentally obtained INMSE values for the CTLS-IV or RCTLS-IV algorithms. We attribute this to two factors. First, although the CTLS-IV and RCTLS-IV algorithms yield substantial reductions in the estimation bias compared with the CLS, CLS-TV, and CTLS algorithms, they are not strictly unbiased. This is a repercussion of the abundance nonnegativity constraint coupled with the assumption of an unbounded Gaussian distribution for the perturbations. Second, in the derivation of the CRLB, we dismissed the possible effects of the abundance nonnegativity constraint on the grounds that the CRLB is only a local bound and is not impacted by any strict inequality constraint.
It is argued in [48] that only equality (active) constraints on the unknown parameters affect the CRLB while inequality (inactive) constraints do not, i.e., inequality constraints neither contribute any information to the model nor impact the performance potential. This is justified by observing that, as the test points approach the true parameters in the Chapman-Robbins bound, they become interior points of the constraint set. However, this only happens when the model perturbations are sufficiently small that the estimates fluctuate around the true parameters within the feasible set, i.e., the nonnegative orthant in our case. Calculating a more accurate CRLB-like performance bound that takes into account the effects of inequality constraints, especially the nonnegativity constraint, without assuming that the estimates asymptotically fall into the feasible region is an interesting topic for future research.

VII. CONCLUSION
We studied a fully-perturbed linear mixture model in which there are perturbations on both the spectral image measurements and the available knowledge of the endmembers. This model is more realistic than the conventional linear mixture model, which assumes perturbation only on the spectral measurements. The main reason is that, in practice, the endmembers are often extracted from imperfect and noisy multiband spectral images and are hence subject to noise, error, or mismatch.
In order to gain insights into the theoretical limits of the expected performance of estimating the abundance matrix under a fully-perturbed linear mixture model, we calculated the relevant Fisher information matrix and the Cramer-Rao lower bound (CRLB) while also taking into account the abundance sum-to-one constraint. This was done under the assumption that the perturbations on the spectral measurements and the endmembers are statistically independent and have zero-mean Gaussian distributions.
We posed the problem of inferring the abundance matrix as a constrained maximum-likelihood estimation problem and regularized the pertinent objective function with two penalty terms that promote smoothness in the spatial domain and geometrical containment in the spectral domain. We solved the formulated problem using a two-block coordinate-descent scheme along with an instance of the alternating direction method of multipliers. We also modified the developed algorithm by drawing inspiration from the notion of instrumental-variable estimation. Our simulation results verified the usefulness of the proposed algorithm as well as the validity of the theoretically-obtained CRLB.
For future research, it will be interesting to characterize the estimation bias of the proposed algorithm theoretically and devise a compensation mechanism to mitigate it. Considering different distributions for the perturbations can also be of interest, particularly to account for impulsive noise or missing data. Moreover, it will be beneficial to calculate a more accurate CRLB that effectively allows for possible drifts in the estimation performance induced by the abundance nonnegativity constraint.

APPENDIX A CALCULATION OF THE FIM

Here, we derive the Jacobian matrices required for the calculation of the FIM.
The upper-left block is calculated using the property [68]

vec{UVW} = (W^T ⊗ U) vec{V},

where ⊗ denotes the Kronecker product.
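The vec-Kronecker identity above is easy to check numerically; this small sketch verifies it for random matrices, using the column-stacking convention for vec{·} (Fortran order in numpy).

```python
import numpy as np

def vec(M):
    # column-stacking vectorization of a matrix
    return M.reshape(-1, order='F')

rng = np.random.default_rng(0)
U = rng.standard_normal((3, 4))
V = rng.standard_normal((4, 2))
W = rng.standard_normal((2, 5))

# vec{U V W} = (W^T kron U) vec{V}
lhs = vec(U @ V @ W)
rhs = np.kron(W.T, U) @ vec(V)
assert np.allclose(lhs, rhs)
```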
For the upper-right block, in addition to (30), we use the property [68] for matrix functions U : S → R^(P×Q) and V : S → R^(Q×R) defined and differentiable on an open set S, with the argument W = vec^-1{w} ∈ S. Moreover, P_{K,N} is the so-called vec-permutation matrix of order (K, N), which arises from ∂vec{A}/∂a^T and satisfies P_{K,N} vec{A^T} = vec{A}. The vec-permutation matrix of order (Q, R), P_{Q,R}, is a QR × QR matrix partitioned into an R × Q array of Q × R submatrices, where the (i, j)th submatrix has a one at its (j, i)th entry and zeros elsewhere. The last line of (31) follows from the property of the vec-permutation matrices in reversing the order of Kronecker products and an additional identity. It is easy to show that the lower-left block is the transpose of the upper-right block. Finally, the lower-right block is computed analogously.
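The construction of the vec-permutation matrix described above, together with its defining property P_{K,N} vec{A^T} = vec{A} and the Kronecker-reversal property used in (31), can be verified numerically; the helper names below are ours.

```python
import numpy as np

def vec(M):
    # column-stacking vectorization
    return M.reshape(-1, order='F')

def vec_perm(K, N):
    """Vec-permutation matrix P_{K,N}: a KN-by-KN permutation matrix
    satisfying P_{K,N} vec(A^T) = vec(A) for any K-by-N matrix A."""
    P = np.zeros((K * N, K * N))
    for k in range(K):
        for n in range(N):
            # entry (k, n) of A sits at k + n*K in vec(A)
            # and at n + k*N in vec(A^T)
            P[k + n * K, n + k * N] = 1.0
    return P

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
assert np.allclose(vec_perm(3, 4) @ vec(A.T), vec(A))

# Kronecker-reversal: P_{m,p} (U kron V) = (V kron U) P_{n,q}
# for U of size m-by-n and V of size p-by-q (here m,n,p,q = 2,3,4,5).
U = rng.standard_normal((2, 3))
V = rng.standard_normal((4, 5))
lhs = vec_perm(2, 4) @ np.kron(U, V)
rhs = np.kron(V, U) @ vec_perm(3, 5)
assert np.allclose(lhs, rhs)
```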

APPENDIX B VERIFYING THE NON-SINGULARITY (INVERTIBILITY) OF THE FIM
Since α is positive, Σ and Ξ are positive-definite, and the endmembers are linearly independent, F11 and F22 are full-rank. The Schur complement of F22 is also full-rank, as it can be calculated as where (a) is due to the property of the vec-permutation matrices in reversing the order of Kronecker products. Consequently, F is non-singular, i.e., invertible [46].
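The argument rests on the standard block-matrix fact that F is non-singular whenever F22 and the Schur complement of F22 are both non-singular, since det(F) = det(F22) det(F11 - F12 F22^-1 F21). A quick numerical illustration with an arbitrary positive-definite block matrix (not the actual FIM):

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.standard_normal((6, 6))
F = G.T @ G + np.eye(6)          # symmetric positive-definite, hence full-rank
F11, F12 = F[:3, :3], F[:3, 3:]
F21, F22 = F[3:, :3], F[3:, 3:]

# Schur complement of F22 in F
S = F11 - F12 @ np.linalg.inv(F22) @ F21

# F is invertible iff F22 and its Schur complement are both invertible:
assert np.isclose(np.linalg.det(F), np.linalg.det(F22) * np.linalg.det(S))
```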
APPENDIX C DERIVATION OF (10)

When Ξ = ηΣ, we have the chain of equalities leading to (10), in which (b) is due to the property of the vec-permutation matrices in reversing the order of Kronecker products and (c) follows from the Woodbury matrix identity.

APPENDIX D DERIVATION OF (14)

Considering (32) and H = Y ⊗ I_N, we obtain (14).

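The Woodbury matrix identity invoked in step (c) of Appendix C, (A + UCV)^-1 = A^-1 - A^-1 U (C^-1 + V A^-1 U)^-1 V A^-1, can be checked numerically for random matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
A = 2.0 * np.eye(5) + 0.1 * rng.standard_normal((5, 5))  # well-conditioned
U = rng.standard_normal((5, 2))
C = np.eye(2)
V = rng.standard_normal((2, 5))

# (A + U C V)^-1 computed directly...
lhs = np.linalg.inv(A + U @ C @ V)
# ...and via the Woodbury identity, which only inverts small matrices
Ai = np.linalg.inv(A)
rhs = Ai - Ai @ U @ np.linalg.inv(np.linalg.inv(C) + V @ Ai @ U) @ V @ Ai
assert np.allclose(lhs, rhs)
```

The practical appeal of the identity is visible in the shapes: the direct route inverts a 5-by-5 matrix, whereas the Woodbury route only inverts 2-by-2 matrices once A^-1 is available.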

APPENDIX E BENEFIT OF THE INSTRUMENTAL VARIABLE TRICK
If we assume that, after a sufficient number of iterations, E^(n) is uncorrelated with the perturbations P_E and P_X, and we ignore the terms multiplied by µ and λ, then (23) and (24) can be approximated accordingly, where we use the subscript IV to differentiate the two estimates. Applying the expectation operator together with Slutsky's theorem [70] shows that the employed IV scheme drives A_IV^(n,m) towards the true A in the mean sense, regardless of the proximity of E^(n) to the true E. Therefore, it can accelerate the convergence of A_IV^(n,m) and improve its accuracy by pushing it towards A, provided that the perturbations are not so large that E^(n) fails to become progressively less correlated with P_E and P_X as the iterations evolve.
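The mechanism at work here is the classical instrumental-variable remedy for errors-in-variables bias. The following toy scalar example (ours, not the paper's scheme) shows least squares being pulled toward zero by a noisy regressor, while an instrument that is correlated with the true regressor but independent of the noises remains consistent:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
a_true = 0.7
e = rng.standard_normal(n)                        # true (unobserved) regressor
x = a_true * e + 0.1 * rng.standard_normal(n)     # noisy observation
e_noisy = e + 0.5 * rng.standard_normal(n)        # perturbed regressor (like E~)
z = e + 0.5 * rng.standard_normal(n)              # instrument: correlated with e,
                                                  # independent of both noises

a_ls = (e_noisy @ x) / (e_noisy @ e_noisy)        # least squares: biased toward 0
a_iv = (z @ x) / (z @ e_noisy)                    # instrumental variable: consistent
```

In expectation, a_ls shrinks by the factor var(e) / (var(e) + var of the regressor noise), whereas a_iv converges to a_true because the cross-moments with z cancel the regressor noise; this mirrors how correlating the updates with a less perturbation-contaminated E^(n) reduces the bias of the abundance estimate.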

Fig. 1. An RGB illustration of the hyperspectral images used in the simulations.

TABLE I: THE SPATIAL AND SPECTRAL RESOLUTIONS OF THE HYPERSPECTRAL IMAGE DATASETS USED IN THE SIMULATIONS

TABLE II: THE PARAMETER VALUES USED IN THE SIMULATIONS

TABLE III: THE VALUES OF THE PERFORMANCE METRICS FOR DIFFERENT ALGORITHMS, DATASETS, AND SNRS TOGETHER WITH THE CORRESPONDING CRLBS