Multivariate calibration with temperature interaction using two-dimensional penalized signal regression
Introduction
The typical multivariate calibration (MVC) problem is this: for a number of chemical samples, optical spectra are obtained, as well as concentrations of an analyte; from these data one wishes to derive a vector of coefficients for predicting unknown concentrations from a measured spectrum. Generally, the training data contain fewer than 100 samples, while the spectra are measured at several hundred or even more than a thousand wavelengths; thus the problem is inherently ill-posed. See Ref. [1] for an extensive presentation and Ref. [2] for a discussion from a statistical point of view.
Two general approaches have been used to make the MVC problem well-posed: (a) reduction of the regression basis, and (b) penalized estimation. The first approach includes, for example, principal component regression (PCR), partial least squares regression (PLSR), and projection onto B-splines. Penalized regression comes in many forms, two of which are (i) ridge regression, which shrinks the regression coefficients towards zero, and (ii) penalized signal regression (PSR), which forces the vector of coefficients to vary smoothly with wavelength [3].
Most applications of MVC have a static flavor: the operating conditions are assumed to be more or less constant. In practice, this might not be the case. Wülfert et al. [4] studied the effect of changing temperature on the predictive ability of an MVC model. Using the same data, Marx and Eilers [5] made a systematic comparison of PCR, PLSR and PSR. They also coined the term multivariate calibration stability for the systematic analysis of an MVC model that is trained under one condition and then monitored for prediction performance under changing operating conditions. This notion also covers calibration transfer, e.g. developing a model on one instrument (in the laboratory) and using it on another (in a production environment).
A logical further development is to extend the model by including additional covariates (like temperature or pressure) in a systematic way, thereby hoping to improve performance. In this paper, we report on a first step in that direction. Using the data of Ref. [4], we extend PSR to include temperature information. Our strategy is to introduce a coefficient surface, defined on the two-dimensional wavelength–temperature domain. At a specific temperature, one cuts through this surface to get the “classical” MVC regression coefficient vector with which to weight a spectrum. We assume the surface to be smooth (in both the wavelength and temperature directions) and estimate it with a two-dimensional extension of PSR, based on tensor products of B-splines and appropriate roughness penalties. We refer to this extension as TPSR.
The way that we estimate the surface allows complicated interactions of wavelength and temperature. The actual results indicate a less complicated structure, and we develop simpler models that implement the ideas of varying-coefficient models (VCM) [6].
Although we use an example with changing temperature, our approach also applies to changes over time, for example to correct for instrumental drift, provided that calibration samples can be analyzed at regular intervals.
In Section 2, we discuss the data structure of a mixture experiment that has motivated this research. A recap of the one-dimensional PSR approach is given in Section 3. Tensor-product B-splines in a nutshell are presented in Section 4, followed by our proposed two-dimensional PSR extension in Section 5. The results of this extension applied to the example are given in Section 6, and we close with a discussion in Section 7.
The motivating example
Wülfert et al. [4] presented an experiment that involved mixtures of ethanol, water and isopropanol prepared according to the design given in Fig. 1 and Table 1. Specific details can be found in their article, and the data are available at www-its.chem.uva.nl. Each of the 19 mixtures, as well as the three pure compounds, was measured at five temperatures, 30, 40, 50, 60, and 70 °C (±0.2 °C), yielding short-wave near-infrared spectra from 580 to 1091 nm in 1 nm steps.
Recap: P-spline signal regression (PSR)
Marx and Eilers solved the standard multivariate calibration problem with penalized signal regression (PSR), forcing the coefficients to be smooth. Consider modeling the mean response E(Y) = μ as

$$\mu = \alpha_0 + X\alpha, \qquad (1)$$

where α0 is the intercept, X is the matrix of digitized spectra, and α is the unknown coefficient vector. As mentioned, the number of regressors (p) typically far exceeds the number of observations (m), i.e. p ≫ m. What is essential to PSR is that it achieves smoothness in α, by
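The snippet breaks off before saying how PSR enforces smoothness. As we understand the formulation in Ref. [3], α is written as Bβ for a B-spline basis B along wavelength, and a difference penalty of order d is placed on β, which turns Eq. (1) into a small penalized least-squares problem in β. The following minimal NumPy sketch illustrates that idea on simulated data; the names `bspline_basis` and `psr_fit`, the 20 segments, d = 2 and λ = 1 are our own illustrative assumptions, not the authors' code or settings.

```python
import numpy as np


def bspline_basis(x, xl, xr, nseg, deg=3):
    """Equally spaced B-spline basis on [xl, xr] via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    dx = (xr - xl) / nseg
    # Knots extended deg segments beyond each end of [xl, xr].
    t = xl + dx * np.arange(-deg, nseg + deg + 1)
    # Degree 0: indicator of each knot interval.
    B = ((x[:, None] >= t[None, :-1]) & (x[:, None] < t[None, 1:])).astype(float)
    for k in range(1, deg + 1):
        # With equal knot spacing every denominator equals k * dx.
        B = ((x[:, None] - t[None, :-(k + 1)]) * B[:, :-1]
             + (t[None, k + 1:] - x[:, None]) * B[:, 1:]) / (k * dx)
    return B  # shape (len(x), nseg + deg)


def psr_fit(X, y, v, nseg=20, deg=3, d=2, lam=1.0):
    """Hypothetical PSR sketch: alpha = B @ beta with a d-th order difference penalty."""
    B = bspline_basis(v, v.min(), v.max(), nseg, deg)  # p x K basis along wavelength
    U = X @ B                                           # m x K projected spectra
    K = B.shape[1]
    D = np.diff(np.eye(K), n=d, axis=0)                 # d-th order differences
    Z = np.column_stack([np.ones(len(y)), U])           # intercept is not penalized
    P = np.zeros((K + 1, K + 1))
    P[1:, 1:] = lam * D.T @ D
    theta = np.linalg.solve(Z.T @ Z + P, Z.T @ y)
    return theta[0], B @ theta[1:]                      # alpha0, alpha at each wavelength


# Simulated example (not the mixture data): 50 spectra on 200 channels.
rng = np.random.default_rng(0)
v = np.linspace(0.0, 1.0, 200)
X = rng.normal(size=(50, 200))
alpha_true = np.sin(2 * np.pi * v)
y = 1.0 + X @ alpha_true + 0.1 * rng.normal(size=50)
alpha0, alpha = psr_fit(X, y, v)
```

Although p = 200 exceeds m = 50, only K + 1 = 24 parameters are estimated, and the penalty keeps neighbouring β values close, which is what makes the estimated α smooth.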
Tensor product B-splines in a nutshell
Eilers and Marx [10] presented a section titled “B-splines in a nutshell”. To give the background needed for this paper, we extend the nutshell to illustrate the basic simplicity of tensor product B-splines. A more complete and mathematically rigorous presentation of the subject can be found in Ref. [11] (chapters 1 and 2). Fig. 6 displays the essential building block: a bicubic basis function. In short, this figure displays the tensor product of the two univariate (cubic) B-splines, Bk and B̆l
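To make the tensor-product construction concrete, here is a minimal sketch, assuming equally spaced cubic B-splines on [0, 1] in both directions; the grids, segment counts and the indices k, l are arbitrary illustrative choices, and `tensor_rows` is a hypothetical helper. A single bicubic basis function is the outer product of Bk(v) and B̆l(t) on a grid; for paired points (vi, ti) the full basis is the row-wise Kronecker product.

```python
import numpy as np


def bspline_basis(x, xl, xr, nseg, deg=3):
    """Equally spaced B-spline basis on [xl, xr] via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    dx = (xr - xl) / nseg
    t = xl + dx * np.arange(-deg, nseg + deg + 1)
    B = ((x[:, None] >= t[None, :-1]) & (x[:, None] < t[None, 1:])).astype(float)
    for k in range(1, deg + 1):
        B = ((x[:, None] - t[None, :-(k + 1)]) * B[:, :-1]
             + (t[None, k + 1:] - x[:, None]) * B[:, 1:]) / (k * dx)
    return B  # shape (len(x), nseg + deg)


def tensor_rows(Bv_pts, Bt_pts):
    """Row-wise Kronecker product: column k * Kt + l holds B_k(v_i) * Bt_l(t_i)."""
    n = Bv_pts.shape[0]
    return (Bv_pts[:, :, None] * Bt_pts[:, None, :]).reshape(n, -1)


# Hypothetical grids on [0, 1] in both directions.
v = np.linspace(0.0, 1.0, 41)
t = np.linspace(0.0, 1.0, 21)
Bv = bspline_basis(v, 0.0, 1.0, nseg=10)  # 41 x 13, the B_k(v)
Bt = bspline_basis(t, 0.0, 1.0, nseg=5)   # 21 x 8, the B-breve_l(t)

# One bicubic basis function T_kl(v, t) = B_k(v) * B-breve_l(t) on the grid.
k, l = 5, 3
T_kl = np.outer(Bv[:, k], Bt[:, l])       # 41 x 21 surface, a single "hill"

# Summing all 13 * 8 products gives 1 at every grid point (partition of unity).
total = np.einsum("ik,jl->ij", Bv, Bt)

# Paired points: row i holds all Kv * Kt products evaluated at (v_i, t_i).
vi = np.array([0.1, 0.5, 0.9])
ti = np.array([0.2, 0.4, 1.0])
T_pairs = tensor_rows(bspline_basis(vi, 0.0, 1.0, 10), bspline_basis(ti, 0.0, 1.0, 5))
```

Each univariate cubic B-spline peaks at 2/3 on equally spaced knots, so a single bicubic hill peaks at 4/9.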
Two-dimensional Tensor Product PSR (TPSR)
Given spectra matrix X = [x_ij] (i = 1,…,m; j = 1,…,p) and coefficient surface α(v, t), let

$$\mu_i = \alpha_0 + \sum_{j=1}^{p} x_{ij}\,\alpha(v_j, t_i). \qquad (6)$$
Eq. (6) is akin to Eq. (1), but uses a slice of the coefficient surface that is specific to the value of t. Recall Fig. 5, which presented several estimated regression coefficient surfaces for the mixture experiment. To give an idea of how the surface is used, Fig. 8 displays various temperature slices of the upper right panel of Fig. 5 that can be used in Eq. (6). If the coefficient
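A minimal sketch of how Eq. (6) could be fitted, assuming the surface is expanded as α(v, t) = Σk Σl Bk(v) B̆l(t) θkl with difference penalties of order dv along v and dt along t (dv = 2 and dt = 1, matching the first surface in the results). Substituting into Eq. (6) gives μi = α0 + Σk Σl (xi B)k B̆l(ti) θkl, so each design row is the Kronecker product of the projected spectrum and the temperature basis. The name `tpsr_fit`, the segment counts, the λ values and the simulated data are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def bspline_basis(x, xl, xr, nseg, deg=3):
    """Equally spaced B-spline basis on [xl, xr] via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    dx = (xr - xl) / nseg
    t = xl + dx * np.arange(-deg, nseg + deg + 1)
    B = ((x[:, None] >= t[None, :-1]) & (x[:, None] < t[None, 1:])).astype(float)
    for k in range(1, deg + 1):
        B = ((x[:, None] - t[None, :-(k + 1)]) * B[:, :-1]
             + (t[None, k + 1:] - x[:, None]) * B[:, 1:]) / (k * dx)
    return B


def tpsr_fit(X, y, v, temp, nseg_v=10, nseg_t=2, deg=3, dv=2, dt=1,
             lam_v=1.0, lam_t=1.0):
    """Hypothetical TPSR sketch: alpha(v, t) = sum_kl B_k(v) Bt_l(t) theta_kl."""
    tlo, thi = temp.min(), temp.max()
    Bv = bspline_basis(v, v.min(), v.max(), nseg_v, deg)  # p x Kv
    Bt = bspline_basis(temp, tlo, thi, nseg_t, deg)       # m x Kt
    U = X @ Bv                                            # m x Kv
    m, Kv, Kt = len(y), Bv.shape[1], Bt.shape[1]
    # Design row i is kron(U[i], Bt[i]); column k * Kt + l multiplies theta_kl.
    Z = (U[:, :, None] * Bt[:, None, :]).reshape(m, Kv * Kt)
    Dv = np.diff(np.eye(Kv), n=dv, axis=0)
    Dt = np.diff(np.eye(Kt), n=dt, axis=0)
    # Differences along v (for each l) and along t (for each k).
    P_surf = (lam_v * np.kron(Dv.T @ Dv, np.eye(Kt))
              + lam_t * np.kron(np.eye(Kv), Dt.T @ Dt))
    Za = np.column_stack([np.ones(m), Z])                 # unpenalized intercept
    P = np.zeros((Kv * Kt + 1, Kv * Kt + 1))
    P[1:, 1:] = P_surf
    sol = np.linalg.solve(Za.T @ Za + P, Za.T @ y)
    Theta = sol[1:].reshape(Kv, Kt)

    def coef_slice(t0):
        """Cut through the surface at temperature t0 (must lie in [tlo, thi])."""
        bt0 = bspline_basis(np.array([t0]), tlo, thi, nseg_t, deg)[0]
        return Bv @ Theta @ bt0                           # length p

    return sol[0], coef_slice


# Simulated example: 20 spectra at each of five temperatures, 100 channels.
rng = np.random.default_rng(3)
v = np.linspace(0.0, 1.0, 100)
temp = np.repeat([30.0, 40.0, 50.0, 60.0, 70.0], 20)
X = rng.normal(size=(100, 100))


def alpha_true(vv, tt):
    return np.sin(2 * np.pi * vv) + 0.02 * (tt - 50.0) * np.cos(2 * np.pi * vv)


A = alpha_true(v[None, :], temp[:, None])                 # row i: slice at temp[i]
y = 1.0 + np.sum(X * A, axis=1) + 0.1 * rng.normal(size=100)
alpha0, coef_slice = tpsr_fit(X, y, v, temp)
fitted = alpha0 + np.array([X[i] @ coef_slice(temp[i]) for i in range(len(y))])
```

Returning a slicing function mirrors the idea of cutting through the surface: a new spectrum measured at temperature t0 is weighted by `coef_slice(t0)`.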
Results for the mixture experiment
We model percent ethanol using the derivative spectra (199 channels) and temperature. The previous sections motivated tensor product coefficient surface estimation (TPSR) and its application to the mixture data. We first construct a surface with penalty orders dv = 2 along wavelength and dt = 1 along temperature. The two-dimensional grid search yields a minimum leave-one-out CV error of 0.00594 for λv = 10⁻¹⁴, λt = 10⁻⁸, and λ0 = 5×10¹⁰, using 83 knots on the v axis and 8 knots on the t axis. Fig. 9
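The tuning parameters are chosen by minimizing leave-one-out cross-validation. For any penalized least-squares fit with hat matrix H = Z(ZᵀZ + P)⁻¹Zᵀ, the leave-one-out residual equals ri / (1 − hii), so a grid search needs no refitting. The sketch below assumes the reported CV value is the root mean square of these residuals (the snippet does not define it) and uses a toy problem with two hypothetical penalty matrices standing in for the wavelength and temperature penalties; `loocv_rmse` and `loocv_bruteforce` are our names.

```python
import itertools

import numpy as np


def loocv_rmse(Z, y, P):
    """LOO-CV for beta = (Z'Z + P)^{-1} Z'y via the hat-matrix shortcut."""
    H = Z @ np.linalg.solve(Z.T @ Z + P, Z.T)  # hat matrix
    r = y - H @ y                              # ordinary residuals
    h = np.diag(H)
    return np.sqrt(np.mean((r / (1.0 - h)) ** 2))


def loocv_bruteforce(Z, y, P):
    """Same quantity by refitting m times (for checking only)."""
    errs = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta = np.linalg.solve(Z[keep].T @ Z[keep] + P, Z[keep].T @ y[keep])
        errs.append(y[i] - Z[i] @ beta)
    return np.sqrt(np.mean(np.square(errs)))


# Toy problem; the two penalties and the grid values are arbitrary.
rng = np.random.default_rng(1)
Z = rng.normal(size=(30, 8))
y = Z @ np.linspace(1.0, 2.0, 8) + 0.1 * rng.normal(size=30)
D2 = np.diff(np.eye(8), n=2, axis=0)
P1, P2 = D2.T @ D2, np.eye(8)
grid = 10.0 ** np.arange(-4, 3)
scores = {(l1, l2): loocv_rmse(Z, y, l1 * P1 + l2 * P2)
          for l1, l2 in itertools.product(grid, grid)}
best = min(scores, key=scores.get)  # (lambda_1, lambda_2) with the smallest CV
```

A log-spaced grid is the natural choice here because, as the values reported above show, sensible λ's can span many orders of magnitude.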
Discussion
We have presented a modeling approach that allows the coefficient vector to vary smoothly (interact) with another variable, e.g. temperature, yielding a surface. Denote the triplet (yi, x(vji), ti) for the response, signal, and covariate, respectively, i = 1,…,m; j = 1,…,p. We moved from a smooth PSR vector α(vj) to a tensor product smooth surface TPSR α(vj, ti). We have also presented a simplified varying penalized signal regression (VPSR) surface: α(vj) + tiα*(vj). Although we did not consider it in
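The simplified VPSR surface α(vj) + tiα*(vj) keeps the model linear in two smooth coefficient vectors, so with U = XB the design is simply [1, U, t∘U]. A minimal sketch with a separate difference penalty on each vector; centering temperature at its mean is our own choice for numerical conditioning (it only reparameterizes α as the coefficient vector at the mean temperature), and the name `vpsr_fit`, the tuning values and the simulated data are hypothetical.

```python
import numpy as np


def bspline_basis(x, xl, xr, nseg, deg=3):
    """Equally spaced B-spline basis on [xl, xr] via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    dx = (xr - xl) / nseg
    t = xl + dx * np.arange(-deg, nseg + deg + 1)
    B = ((x[:, None] >= t[None, :-1]) & (x[:, None] < t[None, 1:])).astype(float)
    for k in range(1, deg + 1):
        B = ((x[:, None] - t[None, :-(k + 1)]) * B[:, :-1]
             + (t[None, k + 1:] - x[:, None]) * B[:, 1:]) / (k * dx)
    return B


def vpsr_fit(X, y, v, temp, nseg=20, deg=3, d=2, lam=1.0, lam_star=1.0):
    """Hypothetical VPSR sketch: coefficients alpha(v) + (t - tbar) * alpha_star(v)."""
    B = bspline_basis(v, v.min(), v.max(), nseg, deg)  # p x K
    U = X @ B                                          # m x K
    K = B.shape[1]
    tbar = temp.mean()
    tc = temp - tbar                                   # centered temperature
    Z = np.column_stack([np.ones(len(y)), U, tc[:, None] * U])
    D = np.diff(np.eye(K), n=d, axis=0)
    P = np.zeros((2 * K + 1, 2 * K + 1))
    P[1:K + 1, 1:K + 1] = lam * D.T @ D                # smoothness of alpha
    P[K + 1:, K + 1:] = lam_star * D.T @ D             # smoothness of alpha_star
    theta = np.linalg.solve(Z.T @ Z + P, Z.T @ y)
    return theta[0], B @ theta[1:K + 1], B @ theta[K + 1:], tbar


# Simulated example: 20 spectra at each of five temperatures, 150 channels.
rng = np.random.default_rng(2)
v = np.linspace(0.0, 1.0, 150)
temp = np.repeat([30.0, 40.0, 50.0, 60.0, 70.0], 20)
X = rng.normal(size=(100, 150))
a_true = np.sin(2 * np.pi * v)
a_star_true = 0.05 * np.cos(2 * np.pi * v)
y = 1.0 + X @ a_true + (temp - 50.0) * (X @ a_star_true) + 0.1 * rng.normal(size=100)
alpha0, alpha, alpha_star, tbar = vpsr_fit(X, y, v, temp)
# Coefficient vector for a spectrum measured at, e.g., 45 degrees C:
coef_45 = alpha + (45.0 - tbar) * alpha_star
```

Compared with the full tensor-product surface, this model has only 2K + 1 coefficients and forces the temperature effect to be linear at every wavelength.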
Acknowledgements
We thank Age Smilde for valuable discussion regarding this research. The research of Brian Marx was supported in part by NSF Grant DMS-0102131.
References (13)
- et al. Linear techniques to correct for temperature induced spectra variation in multivariate calibration. Chemometrics and Intelligent Laboratory Systems (2000).
- et al. Development of robust calibration models in near infra-red spectrometric applications. Analytica Chimica Acta (2000).
- et al. Multivariate Calibration (1989).
- et al. A statistical view of some chemometric regression tools. Technometrics (1993).
- et al. Generalized linear regression on sampled signals and curves: a P-spline approach. Technometrics (1999).
- et al. Influence of temperature on vibrational spectra and consequences for the predictive ability of multivariate models. Analytical Chemistry (1998).