Multivariate calibration with temperature interaction using two-dimensional penalized signal regression

https://doi.org/10.1016/S0169-7439(03)00029-7

Abstract

The Penalized Signal Regression (PSR) approach to multivariate calibration (MVC) assumes a smooth vector of coefficients for weighting a spectrum to predict the unknown concentration of a chemical component. B-splines and roughness penalties, based on differences, are used to estimate the coefficients. In this paper, we extend PSR to incorporate a covariate like temperature. A smooth surface on the wavelength–temperature domain is estimated, using tensor products of B-splines and penalties along the two dimensions. A slice of this surface gives the vector of weights at an arbitrary temperature. We present the theory and apply multi-dimensional PSR to a published data set, showing good performance. We also introduce and apply a simplification based on a varying-coefficient model (VCM).

Introduction

The typical multivariate calibration (MVC) problem is this: for a number of chemical samples, optical spectra are obtained, as well as concentrations of an analyte; from these data one wishes to derive a vector of coefficients for predicting an unknown concentration from a measured spectrum. Generally, the number of samples in the training data is less than 100, while the spectra are measured at several hundred, or even more than a thousand, wavelengths; thus the problem is inherently ill-posed. See Ref. [1] for an extensive presentation and Ref. [2] for a discussion from a statistical point of view.

Two general approaches have been used to make the MVC problem well posed: (a) reduction of the regression bases, and (b) penalized estimation. The first approach can use, for example: principal component regression (PCR), partial least squares regression (PLSR), or projection onto B-splines. Penalized regression comes in many forms, two of which are: (i) ridge regression, which shrinks the regression coefficients towards zero, and (ii) penalized signal regression (PSR), which forces the vector of coefficients to vary smoothly with wavelength [3].

Most applications of MVC have a static flavor: the operating conditions are assumed to be more or less constant. In practice, this might not be the case. Wülfert et al. [4] studied the effect of changing temperature on the predictive ability of an MVC model. Using the same data, Marx and Eilers [5] made a systematic comparison of PCR, PLSR and PSR. They also coined the name multivariate calibration stability for the systematic analysis of an MVC model that is trained under one condition, then monitored for prediction performance under changing operating conditions. This might also include calibration transfer: e.g. developing a model on one instrument (in the laboratory) and using it on another instrument (in a production environment).

A logical further development is to extend the model by including additional covariates (like temperature or pressure) in a systematic way, with the aim of improving performance. In this paper, we report on a first step in that direction. Using the data of Ref. [4], we extend PSR to include temperature information. Our strategy is to introduce a coefficient surface, defined on the two-dimensional wavelength–temperature domain. At a specific temperature, one cuts through this surface to get the “classical” MVC regression coefficient vector with which to weight a spectrum. We assume the surface to be smooth (in both the wavelength and temperature directions) and estimate it with a two-dimensional extension of PSR, based on tensor products of B-splines and appropriate roughness penalties. We refer to this extension as TPSR.

The way that we estimate the surface allows complicated interactions of wavelength and temperature. The actual results indicate a less complicated structure, and we develop simpler models that implement the ideas of varying-coefficient models (VCM) [6].

Although we use an example with changing temperature, our approach is also applicable to changes in time, to correct for instrumental drift, assuming that it is possible to analyze calibration samples at regular intervals.

In Section 2, we discuss the data structure of a mixture experiment that has motivated this research. A recap of the one-dimensional PSR approach is given in Section 3. Tensor-product B-splines in a nutshell are presented in Section 4, followed by our proposed two-dimensional PSR extension in Section 5. The results of this extension applied to the example are given in Section 6, and we close with a discussion in Section 7.

Section snippets

The motivating example

Wülfert et al. [4] presented an experiment that involved mixtures of ethanol, water and isopropanol prepared according to the design given in Fig. 1 and Table 1. Specific details can be found in their article, and the data are available at www-its.chem.uva.nl. Spectra of each of the 19 mixtures, as well as of the three pure compounds, were measured at several temperatures: 30, 40, 50, 60, and 70 °C (±0.2 °C). These are short-wave near-infrared spectra covering 580 to 1091 nm in steps of 1 nm.

Recap: P-spline signal regression (PSR)

Marx and Eilers solved the standard multivariate calibration problem with penalized signal regression (PSR), which forces the coefficients to be smooth. Consider modeling the mean response $E(Y)=\mu$ as

$$\mu_{m\times 1} = \alpha_0 1_m + X_{m\times p}\,\alpha_{p\times 1}, \qquad (1)$$

where $\alpha_0$ is the intercept, $X$ is the matrix of digitized spectra, and $\alpha$ is the unknown coefficient vector. As mentioned, typically the number of regressors ($p$) far exceeds the number of observations ($m$), i.e. $p \gg m$. What is essential to PSR is that it achieves smoothness in $\alpha$, by
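As an illustration of the idea described in the abstract (B-splines plus a difference penalty), the following is a minimal sketch and not the authors' implementation. It assumes NumPy, writes the coefficient vector as $\alpha = B\theta$ with an equally spaced B-spline basis $B$, and penalizes $d$-th order differences of $\theta$ with weight `lam`. The names `bbase` and `psr_fit`, the default settings, and the synthetic data are all hypothetical.

```python
import math
import numpy as np

def bbase(x, xl, xr, ndx, deg=3):
    """B-spline basis on [xl, xr] with ndx equal segments (truncated-power construction)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dx = (xr - xl) / ndx
    knots = xl + dx * np.arange(-deg, ndx + deg + 1)
    # Truncated power functions (x - knot)_+^deg
    P = np.where(x[:, None] > knots[None, :], (x[:, None] - knots[None, :]) ** deg, 0.0)
    # Scaled differences of order deg+1 turn truncated powers into B-splines
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0) / (math.factorial(deg) * dx ** deg)
    return (-1) ** (deg + 1) * P @ D.T

def psr_fit(X, y, v, ndx=20, deg=3, d=2, lam=1.0):
    """Penalized signal regression sketch: returns (intercept, smooth coefficient vector)."""
    B = bbase(v, v.min(), v.max(), ndx, deg)      # p x n basis along wavelength
    U = X @ B                                     # m x n reduced regressors
    n = B.shape[1]
    D = np.diff(np.eye(n), n=d, axis=0)           # d-th order difference matrix
    Z = np.column_stack([np.ones(len(y)), U])     # intercept column is not penalized
    P = np.zeros((n + 1, n + 1))
    P[1:, 1:] = lam * D.T @ D
    coef = np.linalg.solve(Z.T @ Z + P, Z.T @ y)
    return coef[0], B @ coef[1:]

# Synthetic demonstration (not the mixture data)
rng = np.random.default_rng(0)
v = np.linspace(0.0, 1.0, 100)                    # hypothetical wavelength axis
alpha_true = np.sin(2 * np.pi * v)
X = rng.normal(size=(40, 100))
y = 2.0 + X @ alpha_true
a0, alpha = psr_fit(X, y, v, lam=1e-6)
```

The regression is carried out on the much smaller matrix `U = X @ B`, so the size of the linear system depends on the number of basis functions rather than on the number of wavelengths.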

Tensor product B-splines in a nutshell

Eilers and Marx [10] presented a section titled “B-splines in a nutshell”. To give the background that is needed for this paper, we extend the nutshell to illustrate the basic simplicity of tensor product B-splines. A more complete and mathematically rigorous presentation of the subject can be found in Ref. [11] (chapters 1 and 2). Fig. 6 displays the essential building block: a bicubic basis function. In short, this figure displays the tensor product of the two univariate (cubic) B-splines, $B_k$ and $\breve{B}_l$
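The building block described above can be sketched numerically. This is an illustrative example under stated assumptions, not code from the paper: it assumes NumPy, equally spaced knots, and hypothetical grids on $[0, 1]$ for both axes; `bbase` is a hypothetical helper name. A single bicubic basis function is the outer product of two univariate cubic B-splines, and the full tensor-product basis on a grid is a Kronecker product of the two univariate bases.

```python
import math
import numpy as np

def bbase(x, xl, xr, ndx, deg=3):
    """B-spline basis on [xl, xr] with ndx equal segments (truncated-power construction)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dx = (xr - xl) / ndx
    knots = xl + dx * np.arange(-deg, ndx + deg + 1)
    P = np.where(x[:, None] > knots[None, :], (x[:, None] - knots[None, :]) ** deg, 0.0)
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0) / (math.factorial(deg) * dx ** deg)
    return (-1) ** (deg + 1) * P @ D.T

v = np.linspace(0.0, 1.0, 50)          # hypothetical wavelength grid
t = np.linspace(0.0, 1.0, 40)          # hypothetical temperature grid
Bv = bbase(v, 0.0, 1.0, ndx=10)        # 50 x 13 basis along v
Bt = bbase(t, 0.0, 1.0, ndx=5)         # 40 x 8 basis along t

# One bicubic basis function: a single smooth "hill" on the (v, t) grid
k, l = 6, 3
hill = np.outer(Bv[:, k], Bt[:, l])    # 50 x 40

# Full tensor-product basis: row j*40 + i is grid point (v_j, t_i),
# column k*8 + l is the product of Bv[:, k] and Bt[:, l]
T = np.kron(Bv, Bt)                    # 2000 x 104

# A surface is a weighted sum of hills; the weights form a 13 x 8 matrix
gamma = np.random.default_rng(0).normal(size=Bv.shape[1] * Bt.shape[1])
surface = (T @ gamma).reshape(len(v), len(t))
```

The same surface can be computed as `Bv @ gamma.reshape(13, 8) @ Bt.T`, which avoids forming the large Kronecker product.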

Two-dimensional Tensor Product PSR (TPSR)

Given spectra matrix $X=[x_{ij}]$ ($i=1,\dots,m$; $j=1,\dots,p$) and coefficient surface $\alpha(v, t)$, let

$$\mu_i = \alpha_0 + \sum_{j=1}^{p} x_{ij}\,\alpha(v_j, t_i). \qquad (6)$$

Eq. (6) is akin to Eq. (1), but uses a slice of the coefficient surface that is specific to the value of t. Recall Fig. 5, which presented several estimated regression coefficient surfaces for the mixture experiment. To give an idea of how the surface is used, Fig. 8 displays various temperature slices of the upper right panel of Fig. 5 that can be used in Eq. (6). If the coefficient
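To make the slice idea concrete, here is a minimal sketch of how such a surface could be fitted and sliced. It is an assumption-laden illustration, not the authors' code: it assumes NumPy, represents the surface as $\alpha(v,t)=\sum_k\sum_l B_k(v)\,\breve{B}_l(t)\,\gamma_{kl}$, and applies difference penalties to the rows and columns of $\gamma$ with weights `lam_v` and `lam_t`. The names `bbase`, `tpsr_fit` and `alpha_slice`, and the synthetic data, are hypothetical.

```python
import math
import numpy as np

def bbase(x, xl, xr, ndx, deg=3):
    """B-spline basis on [xl, xr] with ndx equal segments (truncated-power construction)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dx = (xr - xl) / ndx
    knots = xl + dx * np.arange(-deg, ndx + deg + 1)
    P = np.where(x[:, None] > knots[None, :], (x[:, None] - knots[None, :]) ** deg, 0.0)
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0) / (math.factorial(deg) * dx ** deg)
    return (-1) ** (deg + 1) * P @ D.T

def tpsr_fit(X, y, v, t, ndx_v=10, ndx_t=2, deg=3, dv=2, dt=1, lam_v=1.0, lam_t=1.0):
    """Two-dimensional penalized signal regression sketch."""
    t_range = (float(t.min()), float(t.max()))
    Bv = bbase(v, v.min(), v.max(), ndx_v, deg)          # p x K
    Bt = bbase(t, t_range[0], t_range[1], ndx_t, deg)    # m x L
    U = X @ Bv                                           # m x K
    K, L = Bv.shape[1], Bt.shape[1]
    # Row-wise Kronecker product: Q[i, k*L + l] = U[i, k] * Bt[i, l]
    Q = (U[:, :, None] * Bt[:, None, :]).reshape(len(y), K * L)
    Dv = np.diff(np.eye(K), n=dv, axis=0)
    Dt = np.diff(np.eye(L), n=dt, axis=0)
    Pv = np.kron(Dv.T @ Dv, np.eye(L))                   # roughness along wavelength
    Pt = np.kron(np.eye(K), Dt.T @ Dt)                   # roughness along temperature
    Z = np.column_stack([np.ones(len(y)), Q])            # unpenalized intercept
    P = np.zeros((K * L + 1, K * L + 1))
    P[1:, 1:] = lam_v * Pv + lam_t * Pt
    coef = np.linalg.solve(Z.T @ Z + P, Z.T @ y)
    return {"a0": coef[0], "Gamma": coef[1:].reshape(K, L), "Bv": Bv,
            "t_range": t_range, "ndx_t": ndx_t, "deg": deg}

def alpha_slice(fit, t0):
    """Coefficient vector alpha(v, t0): one temperature slice of the fitted surface."""
    bt0 = bbase(t0, fit["t_range"][0], fit["t_range"][1], fit["ndx_t"], fit["deg"])[0]
    return fit["Bv"] @ fit["Gamma"] @ bt0

# Synthetic demonstration (not the mixture data)
rng = np.random.default_rng(1)
m, p = 150, 60
v = np.linspace(0.0, 1.0, p)
t = rng.choice(np.array([30.0, 40.0, 50.0, 60.0, 70.0]), size=m)
true_surface = lambda vv, tt: np.sin(2 * np.pi * vv) * (1.0 + (tt - 30.0) / 40.0)
X = rng.normal(size=(m, p))
y = 1.0 + np.sum(X * true_surface(v[None, :], t[:, None]), axis=1)
fit = tpsr_fit(X, y, v, t, lam_v=1e-4, lam_t=1e-4)
alpha_50 = alpha_slice(fit, 50.0)      # weights to apply to a spectrum measured at 50 degrees
```

A prediction for a new spectrum `x_new` measured at temperature `t0` is then `fit["a0"] + x_new @ alpha_slice(fit, t0)`, which mirrors the sum over $j$ in the model equation.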

Results for the mixture experiment

We model percent ethanol using the derivative spectra (199 channels) and temperature. The previous sections motivated ideas of tensor product coefficient surface estimation (TPSR) and its application to the mixture data. We first construct a surface with penalty orders along wavelength and temperature of $d_v=2$ and $d_t=1$, respectively. The two-dimensional grid search yields a minimum leave-one-out CV of 0.00594 for $\lambda_v=10^{-14}$, $\lambda_t=10^{-8}$, and $\lambda_0=5\times10^{10}$, using 83 knots on the $v$ axis and 8 knots on the $t$ axis. Fig. 9
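A grid search of this kind can be sketched as follows. This is a generic illustration rather than the authors' procedure: it assumes NumPy, assumes that the CV criterion is the root mean squared leave-one-out prediction error, and uses the standard identity for penalized least squares that the leave-one-out residual equals $r_i/(1-h_{ii})$, so no refitting is needed. The names `loo_cv` and `grid_search`, the design matrix `Z`, and the penalty matrices `Pv` and `Pt` are hypothetical inputs.

```python
import itertools
import numpy as np

def loo_cv(Z, y, P):
    """Root mean squared leave-one-out error for the fit (Z'Z + P)^{-1} Z'y."""
    G = np.linalg.solve(Z.T @ Z + P, Z.T)       # (Z'Z + P)^{-1} Z'
    h = np.einsum("ij,ji->i", Z, G)             # diagonal of the hat matrix
    r = y - Z @ (G @ y)                         # ordinary residuals
    return np.sqrt(np.mean((r / (1.0 - h)) ** 2))

def grid_search(Z, y, Pv, Pt, log10_grid):
    """Search a two-dimensional log10 grid of penalty weights; returns (cv, log10 lam_v, log10 lam_t)."""
    best = None
    for lv, lt in itertools.product(log10_grid, repeat=2):
        cv = loo_cv(Z, y, 10.0 ** lv * Pv + 10.0 ** lt * Pt)
        if best is None or cv < best[0]:
            best = (cv, lv, lt)
    return best
```

Searching on a logarithmic grid is a common choice because suitable penalty weights can span many orders of magnitude, as the values reported above suggest.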

Discussion

We have presented a modeling approach that allows the coefficient vector to vary smoothly (interact) with another variable, e.g. temperature, yielding a surface. Denote the triplet $(y_i, x(v_{ji}), t_i)$ for the response, signal, and covariate, respectively, $i=1,\dots,m$; $j=1,\dots,p$. We moved from a smooth PSR vector $\alpha(v_j)$ to a tensor product smooth surface (TPSR) $\alpha(v_j, t_i)$. We have also presented a simplified varying penalized signal regression (VPSR) surface: $\alpha(v_j)+t_i\,\alpha^*(v_j)$. Although we did not consider it in
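The simplified surface $\alpha(v_j)+t_i\,\alpha^*(v_j)$ can be sketched as a regression with two smooth coefficient vectors. The following is an illustrative sketch, not the authors' implementation: it assumes NumPy, a shared B-spline basis for both vectors, separate difference penalties with weights `lam` and `lam_star`, and a temperature variable that has been centered and scaled beforehand (an assumption made here for numerical convenience). The names `bbase` and `vpsr_fit` and the synthetic data are hypothetical.

```python
import math
import numpy as np

def bbase(x, xl, xr, ndx, deg=3):
    """B-spline basis on [xl, xr] with ndx equal segments (truncated-power construction)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dx = (xr - xl) / ndx
    knots = xl + dx * np.arange(-deg, ndx + deg + 1)
    P = np.where(x[:, None] > knots[None, :], (x[:, None] - knots[None, :]) ** deg, 0.0)
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0) / (math.factorial(deg) * dx ** deg)
    return (-1) ** (deg + 1) * P @ D.T

def vpsr_fit(X, y, v, t, ndx=10, deg=3, d=2, lam=1.0, lam_star=1.0):
    """Varying-coefficient sketch: mu_i = a0 + x_i'alpha + t_i * x_i'alpha_star."""
    B = bbase(v, v.min(), v.max(), ndx, deg)
    U = X @ B
    n = B.shape[1]
    # Columns: intercept, main-effect regressors, temperature-scaled regressors
    Z = np.column_stack([np.ones(len(y)), U, t[:, None] * U])
    D = np.diff(np.eye(n), n=d, axis=0)
    P = np.zeros((2 * n + 1, 2 * n + 1))
    P[1:n + 1, 1:n + 1] = lam * D.T @ D
    P[n + 1:, n + 1:] = lam_star * D.T @ D
    coef = np.linalg.solve(Z.T @ Z + P, Z.T @ y)
    return coef[0], B @ coef[1:n + 1], B @ coef[n + 1:]

# Synthetic demonstration (not the mixture data)
rng = np.random.default_rng(2)
m, p = 120, 60
v = np.linspace(0.0, 1.0, p)
t = rng.choice(np.array([-1.0, -0.5, 0.0, 0.5, 1.0]), size=m)   # scaled temperature
alpha_true = np.sin(2 * np.pi * v)
alpha_star_true = 0.3 * np.cos(2 * np.pi * v)
X = rng.normal(size=(m, p))
y = 1.0 + X @ alpha_true + t * (X @ alpha_star_true)
a0, alpha, alpha_star = vpsr_fit(X, y, v, t, lam=1e-4, lam_star=1e-4)
```

The weight vector at a given scaled temperature `t0` is `alpha + t0 * alpha_star`, so each slice of this surface is a linear function of temperature at every wavelength.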

Acknowledgements

We thank Age Smilde for valuable discussion regarding this research. Research supported in part for Brian Marx by NSF Grant DMS-0102131.
