Improving the validation of turbulent jet breakup models

Understanding the physics of the breakup of turbulent liquid jets is important for a variety of applications including engine sprays, fire suppression systems, and water jet cutting. Models of turbulent jet breakup allow predictions of quantities of interest like the droplet size distribution and breakup length of the jet. These models are compared against experimental data in a process called validation. If the model predictions are within the experimental uncertainty, then the model is “validated” and believed to be accurate, and possibly can explain the physics. Uncertainty quantification is necessary for model validation. While unfortunately relatively few experimental studies quantify uncertainty, that is not the most pressing validation issue in turbulent jet breakup. I detail 3 additional problems that can make the apparent validation of a model actually an illusion, regardless of how well the model appears to match the data. These problems include: 1. important variables being omitted or guessed in experiments and models, 2. confounding between independent variables, that is, two variables changing simultaneously, making determining cause and effect impossible, and 3. testing only combinations of submodels and not each submodel in isolation. To avoid these problems and others, I developed validation guidelines that are detailed in this work. Following these guidelines, I compiled a large experimental database. Only 28 out of 47 experimental studies considered met my data quality guidelines. Only 18 studies had quantified uncertainty, and only 3 studies had substantial variation in the turbulence intensity.


Introduction
Liquid jet breakup has been modeled in a wide variety of ways since the theoretical study of liquid jets began in the 19th century. This paper is not concerned with the precise modeling techniques, but with how the success of models is evaluated. The comparison of model predictions against experimental data is called "validation" [1]. If the comparison is a success, then the model is deemed "validated" and believed to accurately predict cases not yet measured. Often, models appear to work well in published works, but their accuracy is still regarded with suspicion. The goal of this paper is to highlight some of the reasons why an apparently successful validation of a turbulent jet breakup model may actually be illusory.
For example, the KH-RT (Kelvin-Helmholtz/Rayleigh-Taylor) jet and droplet breakup model [2] is popular, but regarded as not fully predictive [3, p. 34L] despite the seemingly favorable fit between the model and experimental data in the original paper. The lack of predictability is demonstrated by one of the model coefficients, $B_1$, taking calibrated values ranging from 1.73 to 40 [4, p. 14]. Magnotti and Genzale attribute this to the model not taking into account all the physical mechanisms involved. This is just one of several possibilities. Even if the model considers all of the physical mechanisms involved, it's possible that a particular submodel is inaccurate for its intended purpose.
While the precise criteria that determine whether a model has been validated are not trivial [1], for the purposes of this work I'll call a model validated if its estimates are within the error bounds (say, 95%) roughly as frequently as the error bound itself. In other words, roughly 95% of the model predictions need to be within the 95% intervals of the data. These model predictions are considered the most likely cases estimated by the model. Model uncertainty and multi-modal distributions are not considered in this work. This simplified approach is sufficient for this work for three reasons, each discussed in more detail later: relatively few turbulent jet breakup experimental data sources have quantified uncertainty; many data sources which have quantified uncertainty have large uncertainties that are not difficult for models to stay within; and even if these data sources did have quantified and small uncertainties, it's still possible for a "bad" model to match the experimental data well due to validation problems unrelated to uncertainty quantification.
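The coverage criterion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure used for the data compilation: the function names, the pass/fail tolerance, and all of the numbers are hypothetical.

```python
def coverage_fraction(predictions, lower, upper):
    """Fraction of model predictions that fall inside the data intervals.

    lower[i] and upper[i] bound the (e.g., 95%) interval of the i-th
    measurement; predictions[i] is the model estimate for that case.
    """
    inside = sum(lo <= p <= hi for p, lo, hi in zip(predictions, lower, upper))
    return inside / len(predictions)


def passes_coverage_check(predictions, lower, upper, level=0.95, tolerance=0.05):
    """Hypothetical rule: pass if the coverage is at least level - tolerance."""
    return coverage_fraction(predictions, lower, upper) >= level - tolerance


# Made-up example: 20 measurements, each with an interval half-width of 5.
measured = [100.0 + 2.0 * i for i in range(20)]
lower = [m - 5.0 for m in measured]
upper = [m + 5.0 for m in measured]

# A made-up model that is off by 2 everywhere except one case (off by 10).
predicted = [m + 2.0 for m in measured]
predicted[7] = measured[7] + 10.0

print(coverage_fraction(predicted, lower, upper))   # 19 of 20 cases inside
print(passes_coverage_check(predicted, lower, upper))
```

Note that widening the intervals in this sketch makes the check easier to pass, which is the reason large experimental uncertainties weaken a validation.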
These problems largely will be resolved through better and more comprehensive data. Consequently, for validating turbulent jet breakup models, I developed a large data compilation for turbulent jet breakup. This data compilation was specifically designed to be challenging and diagnostic for turbulent jet breakup models. A summary of the problems with existing data is in table 1.
These problems are elaborated in this work. Ultimately, only 18 of the 44 (41.9%) experimental studies with the quantities of interest were used. 10 studies were neglected despite being acceptable because the data collected appeared to mainly duplicate already transcribed data. For reference, all studies considered are cited at the end of this sentence . The studies used are cited at the end of this sentence [6, 12, 13, 15, 17, 18, 22, 26, 29, 31-33, 39, 41-43, 47, 48]. The individual breakdown of studies used for each quantity of interest is discussed in Trettel [52].
Some regression analysis from this data compilation has been published previously [52], but the motivations behind the data selected and problems found with the previous literature are published here first. For additional details, see the dissertation associated with this work [53].
To keep this work simple, I focus primarily on circular non-cavitating turbulent Newtonian liquid jets injected into still gases at high density ratios $\rho_\ell/\rho_g > 500$. In this regime the main physical mechanisms responsible for breakup are turbulence and velocity profile relaxation, the latter of which can be explained through turbulence [54]. More broadly, making a model which works in this regime is necessary for the model to work in more challenging regimes involving cavitation and higher atmospheric densities.
See figure 1 for an illustration of the basic jet breakup quantities of interest (QoIs) considered in this work. The liquid jet travels to the right from the nozzle on the left, surrounded by still gas. The nozzle orifice diameter is $d_0$; the 0 subscript indicates a variable located at the nozzle. Similarly, the ensemble averaged breakup onset location $x_i$ and the spray angle $\theta_i$ use an i subscript to indicate their location as well. The initial breakup location is where jet breakup (i.e., droplet formation from the jet) is first observed to occur. This location is averaged as it varies over time. Similarly, as jet breakup depletes the liquid core, the liquid core eventually ends at a location called the "breakup length", which is also a time-varying quantity, so the average breakup length $x_b$ is measured through various means to be discussed later in this work. For brevity the average breakup length will often be called simply the breakup length. Not in the illustration are various measures of the droplet diameter. The Sauter mean diameter is a common measure of the "average" droplet diameter; this is denoted $D_{32}$. How frequently each QoI is measured in experiments can be seen in table 1.
The most common independent variables involved in this problem are the nozzle bulk (average) velocity $U_0$, the surface tension $\sigma$, the liquid viscosity $\nu_\ell$, the liquid density $\rho_\ell$, and the gas density $\rho_g$. From these the Reynolds number $\mathrm{Re}_0 \equiv U_0 d_0/\nu_\ell$, Weber number $\mathrm{We}_0 \equiv \rho_\ell U_0^2 d_0/\sigma$, and density ratio $\rho_\ell/\rho_g$ can be formed. As before, the 0 subscript indicates a quantity measured at the nozzle exit, and the $\ell$ subscript indicates a liquid quantity.
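As a quick illustration of the definitions above, the three dimensionless groups can be computed as follows. The function names are hypothetical, and the fluid properties are rough textbook values for water and air near room temperature, used here only as an example case.

```python
def reynolds_number(U0, d0, nu_l):
    """Re_0 = U_0 d_0 / nu_l, with nu_l the liquid kinematic viscosity."""
    return U0 * d0 / nu_l


def weber_number(rho_l, U0, d0, sigma):
    """We_0 = rho_l U_0^2 d_0 / sigma (liquid-density Weber number)."""
    return rho_l * U0**2 * d0 / sigma


def density_ratio(rho_l, rho_g):
    """Liquid-to-gas density ratio rho_l / rho_g."""
    return rho_l / rho_g


# Illustrative case (assumed values): a 1 mm water jet at 20 m/s in air.
U0 = 20.0        # m/s, bulk velocity at the nozzle exit
d0 = 1.0e-3      # m, nozzle orifice diameter
rho_l = 998.0    # kg/m^3, approximate density of water
nu_l = 1.0e-6    # m^2/s, approximate kinematic viscosity of water
sigma = 0.0728   # N/m, approximate surface tension of water in air
rho_g = 1.2      # kg/m^3, approximate density of air

Re0 = reynolds_number(U0, d0, nu_l)
We0 = weber_number(rho_l, U0, d0, sigma)
ratio = density_ratio(rho_l, rho_g)
print(Re0, We0, ratio)
```

For these assumed values the density ratio is above 500, so the case would fall within the regime this work focuses on.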

Disclaimer
Following Rider [55], a disclaimer is warranted. I will use certain published papers as examples of poor model validation. The issues identified in this work are not an indictment of the researchers. Instead, they show flaws in the accepted practices of the atomization community. In this work, an accepted practice is defined as a practice which appears in the recent (and up-to-date) literature repeatedly or in papers which are widely accepted at present in the community. All of the research I cite was conducted in good faith to my knowledge. The problems I discuss are not obvious. In particular, confounding can be particularly challenging to identify. This study is by no means comprehensive, and reflects my own judgment about which validation problems are more pressing based on examination of a large fraction of the published literature on turbulent jet breakup at low atmospheric densities. Other regimes and other atomization problems may not suffer from the same issues or suffer from any major issue at all.

Uncertainty quantification
A minority of the data sources (41.9%; see table 1) considered in this work quantify uncertainty (footnote 1) [6, 13, 15, 18, 19, 22, 26, 32, 37, 41-43, 45, 47, 48, 51]. When uncertainties are estimated, they are often large. For example, the percent error of the Sauter mean diameter measurements of Wu [56, p. 139] was estimated as 33%. Large uncertainties make validation easier than it should be. New experiments with small, known uncertainties are needed for rigorous validation of turbulent jet breakup models.
Footnote 1: To determine whether a study had quantified uncertainty, I used a generous definition: if it was possible to estimate the uncertainty using data in the work and some mild assumptions, I considered the work to have quantified uncertainty.
There are established procedures for uncertainty quantification for droplet size [57-60, 61, p. 128]. There also appears to be a wide spread in spray angle data due to the sensitivity of the spray angle to its definition [62, 63, p. 114, 64, p. 12, 65], a problem which can only be solved by standardization of the definition of the spray angle (footnote 2).
Aside from a brief discussion by Osta [51, p. 108], little has been written on uncertainty quantification for the breakup length, which is used for the examples in this paper. There are two main methods to measure breakup length: electrical conductivity of the jet and photographic measurement of where the jet core ends. The electrical conductivity measurements define the breakup length as the point where the jet conducts electricity through itself 50% of the time. Photographic measurement defines the breakup length as the average location of the end of the jet's core. Because the distribution of breakup location is highly symmetric, these two numbers are essentially equal [19,66], and consequently I use the notation $x_b$ for the breakup length irrespective of how it was measured. The two components of the uncertainty in this case are the measurement precision and the statistical error from taking a finite number of data points (footnote 3). The electrical conductivity case is essentially taking a very large number of measurements, making the statistical component negligible, so the main source of uncertainty is the precision of the measurements. For photographic measurements, the main source of error is typically statistical. For turbulent jets this can be estimated using the fact that the standard deviation of the jet breakup location ($\sigma_b$) is well predicted by a constant coefficient of variation .0019, as is discussed in my dissertation [53]. This along with the t-distribution can be used to estimate the uncertainty in photographic measurements of the breakup length. The statistical error for photographic measurements tends to be rather large at the sample sizes used in the previous literature, a fact which has not been appreciated to my knowledge. This can be seen in figure 2; the large errors in the measurements of Grant [67] are particularly noticeable (footnote 4). It is likely that if the uncertainty were quantified in some of the earlier photographic breakup length measurements, the researchers would have conducted more trials to reduce the statistical error in their measurements. Electrical conductivity measurements are preferred, though in principle photographic measurements can have small statistical errors at larger sample sizes.
Footnote 2: New experiments may be necessary to determine how to obtain roughly equivalent spray angles using different techniques, e.g., different thresholds for photographic or mass fraction measurements which result in approximately the same spray angles.
Footnote 3: A third component, transcription error from the conversion of data to graphical plots and back to data, is more difficult to characterize and has not been included in this work. However, I intend to examine this in future work.
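The statistical component for photographic measurements can be sketched as follows, assuming the usual t-based confidence interval on a mean: with n photographs and a coefficient of variation cv, the relative half-width is t(n-1) * cv / sqrt(n). The function names are hypothetical, the t quantile is computed by simple numerical integration to stay within the standard library, and the coefficient of variation used in the loop is a placeholder, not a value taken from this work.

```python
import math


def t_pdf(t, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))


def t_cdf(t, nu, steps=2000):
    """CDF for t >= 0 via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3


def t_quantile(p, nu):
    """Quantile for 0.5 < p < 1 by bisection; assumes the answer is below 100."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def relative_halfwidth(cv, n, confidence=0.95):
    """Half-width of the interval on the mean, as a fraction of the mean."""
    t_crit = t_quantile(0.5 + confidence / 2, n - 1)
    return t_crit * cv / math.sqrt(n)


cv_placeholder = 0.1  # hypothetical coefficient of variation, for illustration
for n in (5, 10, 30, 100):
    print(n, relative_halfwidth(cv_placeholder, n))
```

The half-width shrinks roughly as 1/sqrt(n), with an additional penalty from the t quantile at small n, which is why small photographic sample sizes lead to large statistical errors.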

Omitted and qualitative independent variables
If a model does not include an important variable, it is intuitive that the model may perform poorly. The model may match its calibration data well, but be severely inaccurate in other situations where the neglected variable differs substantially from the values it took in the calibration data.
Unfortunately, this is often the case in turbulent jet breakup for an entire class of variables: turbulence quantities. It is uncontroversial that some measure of the "strength" of turbulence is a major factor in turbulent jet breakup, with the breakup being more severe for "stronger" turbulence [68, p. 14, 69, p. 512, 70, p. 72L]. The most natural measure of the strength of turbulence is the turbulence intensity, Tu (footnote 5). To define the turbulence intensity, first the RMS velocity fluctuation $u' \equiv \sqrt{\overline{u^2}}$ must be defined. In the definition of $u'$, $u$ is the velocity fluctuation, defined as $u \equiv U - \overline{U}$, where $\overline{U}$ is the time or ensemble average of the velocity. If the reader is familiar with statistics, the turbulent RMS velocity is simply the standard deviation of a particular velocity component. The larger the RMS velocity, the larger the fluctuations. This quantity can be defined in other directions, e.g., the radial RMS velocity $v'$ is believed to be particularly important in turbulent jet breakup as the radial fluctuations can directly cause breakup. Often it is convenient to measure the strength of fluctuations in all directions. In this case one can use the turbulent kinetic energy, $k \equiv \frac{1}{2}(u'^2 + v'^2 + w'^2)$, which considers fluctuations in each direction. The turbulence intensity is a non-dimensional version of the RMS velocity: $\mathrm{Tu} \equiv u'/\overline{U}$.
Footnote 4: Neither Kusui [17] nor Arai et al. [33] report measurement precision. These were estimated as 1 cm and 0.2 cm, respectively. Also, note that the $R^2$ value in the corner of figure 2 differs from that of table 2 because the table only uses the latter 3 data sources, neglecting the noisier photographic measurements of Chen and Davis [13] and Grant [67]. This was necessary to obtain a clear turbulence intensity exponent in the regression procedure. The large error washed out any turbulence intensity effects.
In this work, I'll use the notation $\mathrm{Tu}_0$ to refer to a turbulence intensity defined using a plane-averaged turbulent kinetic energy $k_0$: $\mathrm{Tu}_0 \equiv (2k_0/3)^{1/2}/U_0$. While this choice may seem peculiar, it is chosen because it is believed to capture the first-order effects of the strength of turbulence while ignoring the effects of inhomogeneity in the radial direction and anisotropy. A more recent regression analysis by this author [52] suggested more precise sensitivities to the turbulence intensity for the breakup length, $x_b$, and spray angle, $\theta_i$. The previous study of Kusui [17] also showed a clear turbulence intensity effect on the transition to the atomization regime. Most QoIs show a turbulence intensity dependence in turbulent jet breakup to my knowledge.
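The turbulence quantities defined above can be illustrated with a short sketch. It assumes the isotropic relation $u'^2 = 2k/3$, so that a turbulence intensity based on the turbulent kinetic energy is $(2k/3)^{1/2}$ divided by the mean velocity. The function names and the velocity samples are made up for illustration.

```python
import math
import statistics


def rms_fluctuation(velocity_samples):
    """RMS of u = U - mean(U), i.e. the population standard deviation."""
    return statistics.pstdev(velocity_samples)


def turbulent_kinetic_energy(u_rms, v_rms, w_rms):
    """k = (u'^2 + v'^2 + w'^2) / 2."""
    return 0.5 * (u_rms**2 + v_rms**2 + w_rms**2)


def turbulence_intensity_from_k(k, U_mean):
    """Tu = sqrt(2k/3) / U_mean, which reduces to u'/U_mean if isotropic."""
    return math.sqrt(2.0 * k / 3.0) / U_mean


# Made-up axial velocity samples (m/s) fluctuating about a 20 m/s mean.
samples = [19.0, 21.0, 19.0, 21.0]
U_mean = statistics.fmean(samples)
u_rms = rms_fluctuation(samples)

# Assume, for this example only, equal fluctuations in all three directions.
k = turbulent_kinetic_energy(u_rms, u_rms, u_rms)
Tu = turbulence_intensity_from_k(k, U_mean)
print(u_rms, k, Tu)
```

In the isotropic case of this example the energy-based definition returns the same value as $u'$ divided by the mean velocity; for anisotropic turbulence the two would differ.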
The turbulence intensity is not typically constant. Measurements of the turbulence intensity in large-scale air models of nozzles varied between roughly 4% and 11% for nozzles similar to diesel nozzles [71, fig. 4] and roughly 4% and 13% for sudden and smooth contraction nozzles with orifice lengths of $L_0/d_0 = 4$ [72]. These measurements neglected cavitation, which presumably could increase the turbulence level further. In applications where particularly stable jets are desired, low turbulence intensities on the order of 1% are expected.
Unfortunately, the vast majority of previous liquid jet experiments did not characterize turbulence quantities. This was observed as early as the 1967 survey of Lapple et al. [73, pp. 9-10], and unfortunately the situation has not changed since then. In 2010, Osta and Sallam [74, p. 945] noted that turbulence quantities are still neglected in experiments, despite their importance. The neglect of turbulence quantities is understandable as turbulence measurements in free surface flows are difficult [75, p. 345]. With that being said, there have been several turbulent jet breakup studies which varied the turbulence intensity, often by avoiding the need for measurement of the turbulence level in a free surface flow. The 1948 study of Bogdanovich [76] used large-scale air models of nozzles to get credible estimates of the turbulence intensity at the nozzle outlet. In 1963, Skrebkov [12] used long pipes of varying roughness to control the turbulence intensity relatively precisely with a known relationship between the turbulence intensity in fully developed pipe flow and the friction factor. The first theoretical study to consider the turbulence intensity was made by Natanzon [77] in 1938. Unfortunately these studies are little known, likely because they were originally written in Russian.
Footnote 5: Many researchers believe that the Reynolds number ($\mathrm{Re}_0 \equiv U_0 d_0/\nu_\ell$) is the most natural measure of the strength of turbulence. The Reynolds number absolutely is a factor in turbulence; however, it will be shown later in this paper that in the second wind-induced regime, the breakup length is nearly insensitive to the Reynolds number.
The neglect of turbulence intensity was examined in a review of dimensional analysis of turbulent jet breakup in my dissertation [53]. Only 20% of the 45 studies considered the RMS velocity (and by extension, the turbulence intensity) in their dimensional analyses. A further 20% considered nozzle geometry as a factor, which could be considered a proxy for turbulence intensity (though not a good one; see the next section on confounding). As adding a variable is easy in dimensional analysis, this indicates that turbulence intensity effects are understudied in turbulent jet breakup in general.
The neglect of turbulence quantities presents major issues from a modeling perspective. The data cannot be compared fairly against models because the turbulence intensity is now a free parameter. Its precise value is unknown, and it is frequently set to whatever value makes the model work, regardless of whether that value is credible. One example of this problem is the breakup length model of Lafrance [78], which uses an implausibly low value for the turbulence intensity (0.8% for fully developed smooth pipe flow, vs. about 5% in reality, depending on the Reynolds number) because that's what matches the data best. If a realistic value of the turbulence intensity were used, the model would not produce a realistic breakup length. The model is miscalibrated. Another example is the breakup length model of Ervine et al. [79], which uses an arbitrarily chosen turbulence intensity value of 3%, which, of course, fits the data very nicely. There is no reason to believe that choice is appropriate, and possibly the model suffers from the same problem as Lafrance's model.
The work of Wu [56] also relies on estimates of the turbulence intensity implicitly, as the turbulence intensity was assumed roughly constant. This hides the problem in empirically determined coefficients, which ultimately are functions of the turbulence intensity. By treating these coefficients as constants, the model assumes that the turbulence intensity is constant, limiting its ability to generalize. Being based on the work of Wu [56], the recent model of Magnotti et al. [80] itself neglects the turbulence intensity, despite Magnotti and Genzale's criticism of the KH-RT model for not considering turbulence effects [3, p. 34L].
It's not even necessary to quantify a variable (even implicitly as in the coefficient case) to "prove" the validity of a model. Bergwerk [81, p. 655] rejects the idea that turbulence can cause breakup because "turbulent velocity components [...] are hardly likely to be of sufficient magnitude" to cause breakup. However, Bergwerk did not quantify the magnitude of the turbulent velocity components or the velocity magnitude needed to cause breakup, making their argument simply an assertion.
Admittedly, the inverse problem of determining a model input from the outputs can often be perfectly valid. But it relies entirely on the model being validated with known values of the inputs. If the model was not validated with known values of the inputs, avoiding the issues mentioned in the previous paragraph, then there is little reason to be confident in the inversion. I don't believe that current models have been validated due to the issues mentioned in this paper. And even if a model passed a series of good validation tests for turbulent jet breakup, I am not convinced that any present models (including the model I develop in Trettel [52]) are sufficiently trustworthy to be used for inverse modeling purposes. The parameter space explored by existing data is too small; I'd need to be confident outside of the ranges of the source data. If an inverse problem can be avoided entirely (through measurement, pre-existing data, etc.), avoiding inversion is obviously much preferred. In this work I intentionally only select data where this inverse problem can be avoided entirely.
Estimating the turbulent kinetic energy with a model seems prudent if measurement is difficult. Unfortunately, the popular nozzle turbulence model developed by Huh et al. [82] as part of a larger spray model is in severe error when compared against experimental data, as I detailed in a previous paper [83]. For typical nozzle lengths ($L_0/d_0 \approx 4$), Huh et al.'s model predicts turbulent kinetic energies which are more than an order of magnitude too high. Despite this severe error, the combined nozzle-spray model was successfully validated for predicting spray angles. This suggests either that the turbulence level of the jet is unimportant, which seems unlikely and contradicts the claims of Huh et al., or that the spray model is miscalibrated due to the poor estimates of the turbulence level. This type of problem (integration tests being insufficient) will be discussed later in this paper.
Qualitative trends are also not sufficient. There are many studies which compare jets with presumably "low" turbulence intensity produced by smooth and short nozzles against jets of presumably "high" turbulence intensity produced by longer nozzles (footnote 6). These studies can be used for only qualitative validation of models at best. For example, Reitz and Bracco [84, p. 1741L] reject the idea that turbulence alone could be responsible for jet breakup in a high density environment because the trend of a particular model coefficient as the nozzle length increases (presumably increasing the turbulence intensity) is the opposite of expectations if turbulence contributed to breakup. A more recent example is the study of Osta et al. [85], which examined injectors of nozzle aspect ratios $L_0/d_0 = 10$ and $L_0/d_0 = 40$ as a proxy for turbulence level. Osta et al. conclude that the longer nozzle has a faster rate of breakup, but different models can predict that without getting the precise sensitivity to turbulence intensity correct because the nozzle length also changes the velocity profile, as will be discussed in the next section. Because qualitative trends are so easy to match, they are not sufficient for validation.
The conclusion from the examples in this section is that if a variable is not empirically quantified, it can be used to "validate" essentially any model or "confirm" any hypothesis.
From an experimental perspective, neglecting important variables also means that an experiment could be less reproducible. A later experimenter could try to reproduce the experiment but be unable to, and have no way to verify that their setup is producing the same jets. Aside from fully developed pipe flows, the turbulence intensity at the outlet of an internal flow component is a function of the inlet turbulence intensity. Consequently, even using the same nozzles and same upstream pipework may not be sufficient for reproducibility. The inflows to the test system must also be standardized.
Given that fully developed turbulent pipe flows have a universal and well-understood state, they make an excellent basis for experimentation. "Pipe" nozzles are the de facto standard nozzle for basic turbulent jet breakup research.
The friction factor $f$ of a long pipe is strongly correlated with the averaged turbulence intensity $\mathrm{Tu}_0$ of the flow [52]; see figure 5. Consequently, the turbulence intensity of any pipe nozzle can be estimated. The data compilation I made was restricted solely to pipe nozzles for this reason. Even if the researchers did not measure the turbulence intensity, it can still be credibly estimated if they used a pipe nozzle. For smooth pipes with turbulent flows, the friction factor is a relatively weak function of the Reynolds number. The previously mentioned study by Skrebkov [12] used pipes of varying roughness to change the turbulence intensity independent of the Reynolds number. Unfortunately, as can be seen in table 1, only 3 studies I am aware of used rough pipes [12,17,31], so there is very little data with appreciable variation in turbulence intensity.
Footnote 6: An additional common problem with these studies is confounding between the velocity profile and turbulence intensity, which will be discussed in a later section of this paper.
With this being said, pipe nozzles are not a panacea; they are a poor choice for studying low turbulence intensity scenarios as a smooth pipe has a turbulence intensity of roughly 5%, while some nozzles designed to produce highly stable jets may have turbulence intensities below 1%. Care must also be taken to have a smooth outlet to separate the effects of imperfections in the orifice and turbulence intensity [86, p. 1162, 13, p. 179, 18, p. 6].
When the turbulence intensity can be taken into account, the accuracy of a turbulent jet breakup model is improved. The results of regression analysis of breakup length data under various conditions are shown in table 2. The regression equation is fitted to 145 data points from 5 different studies, as shown in figure 2. These studies all used long pipes which produce fully developed turbulent flow as their nozzles. The data has been limited to the second wind-induced regime, where a power law for breakup length has been shown to hold. The study of Kusui [17] had varying roughness, which allows the turbulence intensity to vary from about 5% for a smooth pipe to about 13% for a very rough pipe, as can be seen in figure 4 (footnote 7). This variation in turbulence intensity is much wider than is typical and provides a strong challenge to turbulent jet breakup models. The columns of table 2 indicate which of the 3 variables ($\mathrm{We}_0$, $\mathrm{Re}_0$, and $\mathrm{Tu}_0$) are considered. The left column lists the exponents of the regression equation. The bottom row is $R^2$, a simple measure of how well the model matches the data (footnote 8). Higher $R^2$ values indicate a better fit, with 1 being the maximum. Comparing the $\mathrm{We}_0$ case with the $\mathrm{We}_0$, $\mathrm{Tu}_0$ case shows that including the turbulence intensity in the model does appreciably improve the accuracy, increasing $R^2$ from 0.877 to 0.979. Adding $\mathrm{Re}_0$ only offers a marginal improvement with an $R^2$ value of 0.980, indicating that the turbulence intensity is indeed more important than the Reynolds number in the second wind-induced regime. As will be discussed in the next section, this apparent (small) Reynolds number effect may actually be a turbulence intensity effect due to confounding.
Footnote 7: Unfortunately Kusui [17] had a moderate length smooth section after their rough pipe, which complicated the estimation of the turbulence intensity. The turbulence intensity was estimated as if the length of the smooth section were zero. This selection was found to be most consistent with breakup length data from non-pipe nozzles. This issue is discussed in a previous paper [52]. New experiments without this issue are needed.
Footnote 8: In the introduction, I had recommended examining how frequently the model estimates are within the error bounds of the data instead. $R^2$ implicitly assumes that the data has no uncertainty, and is chosen here for simplicity.
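The effect of omitting a variable from a power-law regression can be illustrated with a small sketch. It fits a power law by least squares in log space and compares $R^2$ with and without the turbulence intensity. Everything here is an assumption for illustration: the data are synthetic and noise-free, the coefficient and exponents are arbitrary and are not those of table 2, the function names are hypothetical, and $R^2$ is computed in log space, which may differ from how it was computed for the table.

```python
import math
from itertools import product


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_power_law(inputs, y):
    """Fit log y = log C + sum_j a_j log x_j; return ([log C, a_1, ...], R^2)."""
    rows = [[1.0] + [math.log(col[i]) for col in inputs] for i in range(len(y))]
    log_y = [math.log(v) for v in y]
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T log_y
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * ly for r, ly in zip(rows, log_y)) for i in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * v for bj, v in zip(beta, r)) for r in rows]
    mean = sum(log_y) / len(log_y)
    ss_res = sum((ly - f) ** 2 for ly, f in zip(log_y, fitted))
    ss_tot = sum((ly - mean) ** 2 for ly in log_y)
    return beta, 1.0 - ss_res / ss_tot


# Synthetic data on a grid of Weber numbers and turbulence intensities.
grid = list(product([1e3, 1e4, 1e5], [0.05, 0.08, 0.13]))
we = [w for w, _ in grid]
tu = [t for _, t in grid]
y = [3.0 * w**0.3 * t**-0.3 for w, t in grid]  # arbitrary power law

beta_we, r2_we = fit_power_law([we], y)
beta_both, r2_both = fit_power_law([we, tu], y)
print(r2_we, r2_both)
```

Because the synthetic data depend on both variables, the fit that omits the turbulence intensity cannot reach the $R^2$ of the fit that includes it, even with no measurement noise at all.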

Confounding and spurious correlation
Confounding between variables occurs when an experimenter cannot differentiate between the effects of changing one variable and the effects of changing another. If two independent variables are changed at once, it is impossible to know the relative contributions of each independent variable to the change seen in the dependent variable. Any change seen could have been due to one variable, the other, or both. The vast majority of previous turbulent jet breakup experiments considered in this work suffered from confounding between variables, making data analysis ambiguous unless steps are taken to avoid the confounding. Unfortunately, that was done infrequently for the confounding between $\mathrm{We}_0$ and $\mathrm{Re}_0$, rarely for the confounding between $\mathrm{Re}_0$ and $\mathrm{Tu}_0$, and rarely for the confounding between the velocity profile and $\mathrm{Tu}_0$. The difficulty of distinguishing between $\mathrm{We}_0$ and $\mathrm{Re}_0$ effects in jet breakup experiments was first noted by Asset and Bales [6, p. 2] in 1951, later independently noted by Dodu [87,88] in 1959, but appears to have received little attention since.
The most obvious example of confounding in turbulent jet breakup is between the Reynolds number $\mathrm{Re}_0 \equiv U_0 d_0/\nu_\ell$ and the Weber number $\mathrm{We}_0 \equiv \rho_\ell U_0^2 d_0/\sigma$. Both contain $U_0$ and $d_0$, so changing the jet velocity in a single nozzle with a single fluid changes both numbers at once. Again, if two variables are changed simultaneously, it's impossible to attribute the effects seen to either variable unambiguously. Perhaps the jet is insensitive to changes in the Reynolds number as long as the Reynolds number is high enough to establish turbulent flow (footnote 10). Then, all of the changes seen would be due solely to the Weber number changes. However, that cannot be determined from a single nozzle with a single fluid. One must use different nozzle diameters and/or fluids to break the correlation between $\mathrm{We}_0$ and $\mathrm{Re}_0$. This is what I did by compiling data from many different diameter nozzles and fluids, as can be seen in figure 3. Breaking the confounding requires varying the nozzle diameter and the fluid (to change the viscosity and/or surface tension).
Footnote 9: The earliest example of a $\mathrm{We}_0$-$\mathrm{Re}_0$ plot that I am aware of is due to Dodu [87, p. 500, fig. 4] in 1959.
Footnote 10: For convenience, when I write "independent of $\mathrm{Re}_0$", I mean "independent of $\mathrm{Re}_0$ provided it is high enough that the hydrodynamic flow regime is turbulent".
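The confounding between the Reynolds and Weber numbers can be made concrete with a short sketch that computes the correlation between their logarithms for two hypothetical test matrices: one nozzle with one fluid, and several nozzle diameters with two fluids. The function names are hypothetical, the first fluid uses rough values for water, and the second fluid is an invented liquid whose properties were chosen only to differ from water.

```python
import math
from itertools import product


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_re_we_correlation(cases):
    """cases: list of (U0, d0, (rho_l, nu_l, sigma)) tuples in SI units."""
    log_re = [math.log(U * d / nu) for U, d, (rho, nu, sigma) in cases]
    log_we = [math.log(rho * U**2 * d / sigma) for U, d, (rho, nu, sigma) in cases]
    return pearson(log_re, log_we)


water = (998.0, 1.0e-6, 0.0728)          # approximate properties of water
other_liquid = (850.0, 5.0e-6, 0.030)    # invented second liquid
velocities = [10.0, 20.0, 30.0, 40.0]    # m/s
diameters = [0.5e-3, 1.0e-3, 4.0e-3]     # m

# One nozzle, one fluid: only the velocity varies.
single = [(U, 1.0e-3, water) for U in velocities]
# Several diameters and two fluids: the common dependence on U0 is diluted.
varied = list(product(velocities, diameters, [water, other_liquid]))

r_single = log_re_we_correlation(single)
r_varied = log_re_we_correlation(varied)
print(r_single, r_varied)
```

With a single nozzle and fluid, the logarithm of the Weber number is an exact linear function of the logarithm of the Reynolds number, so the correlation is essentially perfect. Varying the diameter and fluid lowers the correlation, which is what allows the two effects to be separated in a regression.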
Less obvious is the confounding between $\mathrm{Re}_0$ and $\mathrm{Tu}_0$. For a particular nozzle geometry and surface roughness, the Reynolds number at the nozzle outlet determines the turbulence intensity at the nozzle outlet. The relationship between the two is not universal [71], but is known for long pipe nozzles, as discussed previously. This confounding can be seen in figure 4. The confounding between the velocity profile and $\mathrm{Tu}_0$ was also discussed in Trettel [54].
Confounding is often caused by nondimensionalization, as it is in the $\mathrm{We}_0$ and $\mathrm{Re}_0$ case. Dimensionless variables frequently have common dimensional variables. Even if all of the dimensional variables were uncorrelated, there may now exist a correlation between the dimensionless variables. A subset of this issue has been discussed extensively in the dimensional analysis literature as "spurious correlation" [89]. However, spurious correlation is only a consequence of a particular type of confounding. In spurious correlation, the dependent (output) dimensionless variables contain dimensional variables in common with the independent (input, not statistically independent) dimensionless variables. In contrast, confounding is more general, and applies between independent variables.
Estimates of the correlation between variables with common terms can be computed assuming that the dimensional variables are uncorrelated [89]. The correlation between dimensionless variables often should be considered when quantifying uncertainty, as typical approaches assume that all variables are uncorrelated11.
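As a rough illustration of such an estimate, the sketch below works in logarithms, where products of dimensional variables become sums and the correlation follows directly from the variances. This is my own simplified construction and not necessarily the formula of [89]; it assumes uncorrelated, log-normally distributed U 0 , d 0 , ν, and σ with constant density, and the function names are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def analytic_log_correlation(var_u, var_d, var_nu, var_sigma):
    """Correlation of log Re_0 and log We_0 for uncorrelated inputs.

    log Re_0 = log U + log d - log nu
    log We_0 = 2 log U + log d - log sigma + const (density held fixed)
    Arguments are the variances of log U, log d, log nu, and log sigma.
    """
    cov = 2.0 * var_u + var_d
    var_re = var_u + var_d + var_nu
    var_we = 4.0 * var_u + var_d + var_sigma
    return cov / math.sqrt(var_re * var_we)

def monte_carlo_log_correlation(std, n, seed=0):
    """Sample uncorrelated log-variables and correlate log Re_0 with log We_0."""
    rng = random.Random(seed)
    log_re, log_we = [], []
    for _ in range(n):
        u, d, nu, sigma = (rng.gauss(0.0, std) for _ in range(4))
        log_re.append(u + d - nu)
        log_we.append(2.0 * u + d - sigma)
    return pearson(log_re, log_we)

STD = 0.2  # assumed standard deviation of each log-variable
r_exact = analytic_log_correlation(STD**2, STD**2, STD**2, STD**2)
r_mc = monte_carlo_log_correlation(STD, n=20000)
print(f"analytic: {r_exact:.3f}, Monte Carlo: {r_mc:.3f}")
```

With equal variances for all four log-variables, the analytic expression reduces to 3/√18 ≈ 0.71: the dimensionless variables are substantially correlated even though none of the dimensional inputs are.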
To be clear, the confounding seen in turbulent jet breakup is not necessarily caused by nondimensionalization. For example, while the dimensionless velocity profile and turbulence intensity both have the average velocity U 0 in common, confounding can still occur in cases where U 0 is held constant, as it roughly is in many experiments. The confounding is actually an artifact of how many experiments are conducted. Multiple variables change as the nozzle aspect ratio L 0 /d 0 is changed, where L 0 is the nozzle orifice length: 1. the velocity profile changes (and consequently, the boundary layer thickness increases); 2. the flow can transition from laminar to turbulent; 3. the turbulent kinetic energy typically increases (as would the magnitude of the Reynolds shear stress); 4. swirl decays; and 5. the flow could have separated at the nozzle inlet but reattach further downstream. Consequently, if one uses L 0 /d 0 as a proxy for any of the 5 mentioned effects, one can not distinguish between these effects. A similar problem causes confounding between the turbulence intensity Tu 0 and Reynolds number Re 0 , as for smooth pipe nozzles the turbulence intensity is only a function of the Reynolds number. If one variable is a function of the other only, then regardless of the composition of those variables (i.e., into dimensional terms), they will be highly correlated. The type of confounding caused specifically by nondimensionalization is, however, the cause of confounding between We 0 and Re 0 .
11 Note that even if the dimensional variable in common between two dimensionless terms were held constant experimentally, the errors would still be correlated.

Avoiding confounding
Confounding in general is best avoided by covering the relevant parameter spaces relatively completely. This could be accomplished through factorial experimental designs. Factorial experiments appear to be rare in turbulent jet breakup; I am aware of only the studies of Ruiz and Chigier [90][91][92]. The experimenter is also required to not miss any important variables. Non-experimentalists are limited by existing experiments in this regard. For turbulent jet breakup, existing data can avoid confounding only for the breakup length, because the parameter spaces are widely sampled enough in that case. To detect confounding, check parameter space plots (e.g., figures 3 and 4) for correlations between different variables. If these are seen, then cover the parameter space more comprehensively by changing the experimental conditions or looking for new data in different parts of the parameter space. As previously mentioned for the We 0 and Re 0 case, this may require changing the nozzle diameter and fluid.
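Checking parameter space plots by eye can be supplemented with a simple numerical screen. The sketch below flags pairs of (log-transformed) independent variables whose correlation exceeds a threshold. The function name `flag_confounded_pairs`, the 0.9 threshold, and the example numbers are assumptions chosen for illustration, not values from the data compilation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def flag_confounded_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    `columns` maps a variable name to its list of values, one value per
    experimental condition. The threshold is an arbitrary screening level.
    """
    names = sorted(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged

# Hypothetical velocity sweep with one nozzle and one liquid: log Re_0 and
# log We_0 both follow log U_0, while the density ratio is varied separately.
log_u = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
data = {
    "log_Re": [u + 9.0 for u in log_u],
    "log_We": [2.0 * u + 6.0 for u in log_u],
    "log_density_ratio": [1.0, 3.0, 2.0, 3.0, 1.0, 2.0],
}
flagged = flag_confounded_pairs(data)
for a, b, r in flagged:
    print(f"{a} and {b} are confounded (r = {r:.3f})")
```

A linear correlation screen like this only catches monotonic, roughly linear relationships; a plot of the parameter space remains the more general check.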

Consequences of confounding
The potential consequences of confounding can be seen in tables 2 and 3. These tables show the results of regression analyses of breakup length data under various conditions. As previously discussed, table 2 shows the effect of using different variables in a regression analysis. Table 3 shows the effect of using different variables in a regression analysis of confounded data. Due to the confounding between We 0 and Re 0 , and also Re 0 and Tu 0 , it is impossible to say whether the observed trends are due to changes in any of those three variables in the confounded case. Indeed, the R 2 values for all four conditions considered are essentially 1 in this case. Confounding between the Reynolds number and turbulence intensity occurs in most turbulent jet breakup experiments. And confounding between the Weber and Reynolds numbers is not uncommon either. As can be seen in figure 3, only 2 experimental studies (Grant and Middleman [15] and Kusui [17]) out of 5 considered avoided confounding by using different nozzle diameters and/or fluids to cover the parameter space more widely. Compiling data in this case helped avoid confounding, but just adding data is not a solution to confounding. The data must avoid a correlation between the two variables of interest to avoid confounding.
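Why confounded data gives R 2 ≈ 1 regardless of the variable chosen can be shown with synthetic data. In the sketch below, the "true" breakup length law, its coefficient and exponent, the noise level, and the fluid properties are all hypothetical; the data are generated to depend on We 0 only, yet a fit against Re 0 performs identically because only the velocity was varied.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares fit y = intercept + slope * x; returns slope, intercept, R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Assumed single nozzle and single fluid (SI units); only velocity varies.
RHO, NU, SIGMA, D0 = 998.0, 1.0e-6, 0.072, 2.0e-3
velocities = [5.0 + 2.5 * i for i in range(15)]
log_we = [math.log(RHO * U**2 * D0 / SIGMA) for U in velocities]
log_re = [math.log(U * D0 / NU) for U in velocities]

# Hypothetical law: x_b / d_0 = 10 We_0^0.3, with small measurement noise.
rng = random.Random(1)
log_xb = [math.log(10.0) + 0.3 * w + rng.gauss(0.0, 0.02) for w in log_we]

slope_we, _, r2_we = fit_line(log_we, log_xb)
slope_re, _, r2_re = fit_line(log_re, log_xb)
print(f"fit against We_0: exponent {slope_we:.3f}, R^2 = {r2_we:.4f}")
print(f"fit against Re_0: exponent {slope_re:.3f}, R^2 = {r2_re:.4f}")
```

Both fits have the same R 2 , and the Reynolds number exponent is simply twice the Weber number exponent. Nothing in the confounded data indicates that the Reynolds number had no influence in the assumed law.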
Confounding may help explain why so many regressions in turbulent jet breakup seem to be contradictory12. For the breakup length, many different functional dependencies have been used. A small sample can give the reader an idea of the variety. Miesse [93, p. 1698] proposed [equation missing], Grant and Middleman [15, p. 184] suggested [equation missing] where A ≈ 85 to 112. Finally, De Jarlais et al. [35, p. 87R] obtained the best fit equation [equation missing]. Some researchers try both Weber and Reynolds numbers, while some prefer just one of either. It is possible that each of these expressions does in fact fit the source data well, but confounding makes the precise functional dependency more difficult to identify. Considering confounding, the data most closely matches the general form first proposed by Grant and Middleman [15], albeit with a turbulence intensity modification. The data presented here does not clearly eliminate a Reynolds number dependence, rather, it merely shows that any Reynolds number dependence on the breakup length in the second wind-induced regime is weak. If I assume that the experimental data has no uncertainty and neglect the (presumably small) effects of confounding, then using the standard error for the coefficient in the case where all three variables are considered (table 2) I find that C Re 0 = 0.01377 ± 0.0003 (95% interval). This does not overlap with zero, though it might if the uncertainty in the experimental data is considered. Future work will examine if there still is some confounding between Re 0 and Tu 0 which might make C Re 0 statistically indistinguishable from zero.
12 The contradictions likely can be partly explained by differences in regimes as well.

Integration tests are not sufficient
Another common problem is the use of easily measured quantities which include droplet breakup (secondary breakup), droplet coalescence, and droplet transport to "validate" models which only predict primary13 breakup quantities like the droplet diameter at formation. For simplicity I'll call measurements which include secondary breakup, droplet coalescence, and droplet transport "far-field" measurements. Kastengren et al. [95, p. 132L] rightfully note that "simply comparing modeled to measured droplet size in the far-field is insufficient to validate the physical breakup model; data in the "near-field" are needed, since this is the region in which primary breakup actually occurs." Magnotti and Genzale [3, p. 34L] have also expressed skepticism about the usefulness of far-field measurements.
Converting between near-field and far-field quantities now requires additional droplet breakup, coalescence, and transport models. As such, this attempt at validation does not test the primary breakup model directly. Now a validation failure could mean a failure of either the droplet breakup model, the droplet coalescence model, the droplet transport model, the primary breakup model, or any combination of the four. The result is ambiguous, making the models difficult to falsify. And a validation success does not necessarily mean that the primary breakup model is correct, as the droplet breakup and coalescence models could hypothetically compensate for problems in the primary breakup model in a way which makes each model wrong when taken in isolation. This possibility becomes much more likely when one considers that these models are usually tuned to the data.
For this reason, it is strongly preferred to validate each model individually (like a "unit test") in addition to the "integration test" for all of the submodels in combination. In software testing, a unit test tests a specific part of a program. An integration test tests the combination of the parts of the program. The same terminology can be applied to testing models in isolation vs. testing the larger collection of models. Similar terminology has been adopted in model validation previously [96, pp. 37L-38R].
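The possibility of compensating errors can be made concrete with a toy example. In the sketch below, every number is hypothetical: the "measurements", the 10% uncertainty, the deliberately wrong primary breakup model, and the tuned secondary breakup factor are invented only to show how an integration test can pass while a unit test fails.

```python
def within_uncertainty(predicted, measured, relative_uncertainty):
    """True if the prediction lies inside the measurement's uncertainty band."""
    return abs(predicted - measured) <= relative_uncertainty * measured

# Hypothetical measurements of the Sauter mean diameter, in micrometres.
JUST_FORMED_D32 = 100.0   # droplets just formed from the jet
FAR_FIELD_D32 = 50.0      # after secondary breakup, coalescence, transport
REL_UNCERTAINTY = 0.10    # assumed 10% experimental uncertainty

def primary_breakup_model():
    """Toy primary breakup model, deliberately wrong by a factor of 2."""
    return 200.0

def secondary_breakup_model(d32_in, reduction=0.25):
    """Toy secondary breakup model; `reduction` was tuned to far-field data."""
    return reduction * d32_in

# Integration test: primary and secondary models together against far-field data.
integration_ok = within_uncertainty(
    secondary_breakup_model(primary_breakup_model()),
    FAR_FIELD_D32,
    REL_UNCERTAINTY,
)
# Unit test: primary model alone against just-formed droplet data.
unit_ok = within_uncertainty(
    primary_breakup_model(), JUST_FORMED_D32, REL_UNCERTAINTY
)

print(f"integration test passed: {integration_ok}")
print(f"unit test passed:        {unit_ok}")
```

Here the tuned reduction factor absorbs the error in the primary model, so the combined prediction matches the far-field measurement even though both submodels are wrong in isolation. Only the unit test exposes the problem.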
In the droplet diameter measurement case there is one additional subtlety that is often missed in the literature. It is not strictly correct to merely compare predictions of primary breakup droplet diameters to droplet measurements in the near-field, that is, droplet measurements at a particular location. One must measure only droplets which were just formed from the jet, that is, droplets formed through primary breakup without any other influences. I'll call these "just-formed" droplets for brevity. The measurements of the Faeth group (e.g., Wu et al. [42], Wu and Faeth [43], and Wu et al. [97]) are the only ones I am aware of for just-formed droplets. These measurements are more difficult and limited than either near-field or far-field measurements, as they require analyzing many photographs of the breakup process to select only just-formed droplets. Photography is unfortunately limited in the near-field due to the density of the spray in many situations. Fortunately, new DNS data analysis techniques are being developed to obtain size distributions for just-formed droplets [98]; however, this processed DNS data is not available as of this writing.
It is possible that the distribution of near-field droplets is similar to the distribution of just-formed droplets. If true, this would greatly simplify experimentation while maintaining rigor. To my knowledge, this hypothesis has yet to be validated. The easiest way to test the hypothesis would be to measure droplet diameters in the near-field in a case directly comparable to data collected by the Faeth group.
Several sources of data were neglected in my data compilation because only far-field quantities were measured. Of the 7 pipe jet studies with droplet diameter measurements, only 2 had droplets in the near-field. Both were measurements of just-formed droplets. The non-pipe data of Bogdanovich [76] and Dumouchel et al. [99] include the turbulent kinetic energy at the nozzle exit; however, the droplet diameter measurements are in the far-field, making these data sources less attractive for validation.
The insufficiency of integration tests was also mentioned previously in the case of the combined nozzle turbulence and jet breakup models of Huh et al. [82]. The nozzle turbulence model is in severe error, so one of the inputs to the spray model is far from where it should be. The spray model is therefore likely poorly calibrated for the true turbulence intensity, despite the apparent validation of the model.

Apples-to-oranges comparisons
Strictly speaking, it is incorrect to naively combine data from nozzles of different geometries, but this is still frequently done in liquid jet breakup research. If not all the important variables are quantified, to properly combine data a researcher needs to be confident that the variables which are not quantified do not vary much, and as such can not have a major influence on the results. This is difficult to do in atomization research in general. Assuming that one has the turbulence intensity, for example, that is not sufficient to fully characterize the turbulence. One would need at the very least some measure of the integral scales. However, integral scale measurements are even more rare than turbulence intensity measurements. Consequently, when compiling data, for the moment it would be useful to know that the integral scales are roughly fixed and consequently will not be affecting the results appreciably. This may be true for pipe jets and is another reason to prefer pipes as nozzles. In an earlier paper [52] I compared a regression against data from non-pipe nozzles and found deviations. The prime suspects for the error are of course variables which I have no estimates for, such as the integral scales.
To give some examples of this problem, consider the breakup length model of Gorokhovski [100] in 2001. The breakup length is plotted as a function of the square root of the density ratio ρ /ρ g in figure 6. Gorokhovski would have had to select a Weber and Reynolds number for this plot based on their model (if not other variables like turbulence intensity not considered in the model). This selection was not discussed in the paper, and it seems unlikely that the Weber and Reynolds numbers match the experimental data cited (Lee and Spencer [101] and Hoyt and Taylor [102]). Consequently, this appears to be an apples-to-oranges comparison. The problem is actually worse than it appears at first glance, as neither Lee and Spencer [101] nor Hoyt and Taylor [102] report what is most commonly known as breakup length. Their studies are photographic and do not report enough photos to allow for estimation of the breakup length. Gorokhovski's model computes the breakup length, but it is not being compared against the breakup length. And, finally, the fit between the model and data is not particularly good. Possibly Hoyt and Taylor's single data point fits poorly due to the apples-to-oranges comparison. Gorokhovski does not discuss this discrepancy, but characterizes the fit as "satisfactory", though the lack of uncertainty quantification makes it difficult to determine how adequate the fit is.
This problem was not isolated to Gorokhovski's model. Over a decade later in 2017, Movaghar et al.

Validation against only a small amount of data when more is available
The examples from Gorokhovski [100] and Movaghar et al. [103] demonstrate another common issue: Validation against a small amount of data. Gorokhovski [100] compared against a total of 5 data points. For the density ratio effect, Movaghar et al. [103] used only 4 of the previous data points. Fortunately, Movaghar et al. [103] used the data compilation of Wu and Faeth [47] for validation of their breakup length model at low atmospheric densities (high ρ /ρ g ). Aside from the Faeth group, data compilation is rare.
As previously mentioned, turbulence intensity is frequently neglected. Comparison against data where the turbulence intensity varies over the range expected in practice is necessary to validate turbulent jet breakup models. Aside from my CDRSV model [52], Skrebkov's model [12], and Bogdanovich's scaling model [76], I am not aware of any comparison of a model against data with varying turbulence intensity14.
More data, as long as it does not duplicate existing data or contribute to confounding, typically challenges a model. Jet breakup has been studied for over a century. While many of the early experimental studies are of poor quality by modern standards, studies from the 1950s through now are generally worth consideration. Validation is best done with as much data as possible to identify as many faults with a model as possible.

Not measuring the most typical quantity of interest
Considering only pipe jets, 4 studies were neglected because the QoIs were not measured at all [8,11,14,27]. These studies tended to be old and were often qualitative (e.g., Hoyt and Taylor [27]) or focused on different QoIs than I do.
Considering non-pipe jets, another problem appeared. Some researchers used an alternative QoI which is analogous to but not the same as another common QoI. As an example, the DNS study of Salvador et al. [104, fig. 7] has a plot of axial mass concentration. It is reasonable to believe this is analogous to the breakup length, but the two are not directly comparable. There is no reason why the time-averaged or 50th percentile breakup length could not have been measured from the same DNS data that produced the axial mass concentration plot.
These issues would be reduced by adoption of standard quantities of interest. In this paper I focus on the most commonly measured quantities of interest (see figure 1), which I believe are physically meaningful and useful.

Data presentation issues
Ambiguous data
2 studies were neglected due to the data being presented ambiguously [7,20]. For example, to my knowledge it is not possible to determine the Reynolds number, Weber number, etc. from the regime data presented by the study of Kusui [17]15. This is because the data was only plotted in coordinates that did not allow the Reynolds number and Weber number to be determined independently.
This problem is likely to persist due to space considerations in journals. However, it is strongly recommended that future reports and dissertations include tabulated raw data in primitive variables (e.g., U 0 , d 0 , and ν rather than Re 0 ). Posting the raw data online is also highly recommended. Doing both increases the probability that the data will remain available in the future.
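Tabulating primitive variables lets any reader recompute whichever dimensionless groups they need. The following is a minimal sketch of that workflow: the column names, units, and numerical values are hypothetical and are not taken from any study in the compilation.

```python
import csv
import io

# Hypothetical raw-data table in primitive variables (SI units).
RAW = """U0_m_per_s,d0_m,nu_m2_per_s,rho_kg_per_m3,sigma_N_per_m
20.0,0.001,1.0e-6,998.0,0.072
40.0,0.002,1.0e-6,998.0,0.072
"""

def dimensionless_groups(row):
    """Compute Re_0 and We_0 from one row of primitive variables."""
    U = float(row["U0_m_per_s"])
    d = float(row["d0_m"])
    nu = float(row["nu_m2_per_s"])
    rho = float(row["rho_kg_per_m3"])
    sigma = float(row["sigma_N_per_m"])
    return {"Re0": U * d / nu, "We0": rho * U**2 * d / sigma}

rows = list(csv.DictReader(io.StringIO(RAW)))
groups = [dimensionless_groups(row) for row in rows]
for g in groups:
    print(f"Re_0 = {g['Re0']:.0f}, We_0 = {g['We0']:.0f}")
```

The reverse is not possible in general: a table containing only Re 0 and We 0 does not determine U 0 , d 0 , or the fluid properties, which is why primitive variables are the safer choice for archiving.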
To identify whether something is ambiguous in the data, it is recommended that someone else reproduce some basic plots of the data given the tabulated or digitized version.
14 The regression model of Dumouchel et al. [99] is worth mentioning at this point, but it is not a phenomenological model like the others mentioned.
15 Note that as I was able to obtain some data from Kusui [17], this study was not included among the two neglected.
On a related note, 2 studies were neglected from the data compilation because they presented curves for experimental data rather than data points [10,36]. These studies were neglected because it is impossible to determine which precise points were tested from a curve.

Data neglected due to inconsistency with others
A few data sources were neglected due to inconsistency with other data sources deemed reliable.
Breakup onset location ( x i ) measurements from Eisenklam and Hooper [9] and Reitz [26] were neglected because they were inconsistent with other measurements16. Possibly this is due to the use of small sample sizes, making the error in the mean very large.
Breakup length measurements ( x b ) from Phinney [19, fig. 3] were found to be inconsistent with others and were neglected. However, the breakup length probability distribution from Phinney [19, p. 698, fig. 6] is consistent with other data. Only Phinney's figure 3 appears to be in error. Private communication [106] with the author, together with the fact that the author's other data is consistent with that of other researchers, indicates that this was likely a data reduction error limited solely to figure 3, and that it does not indicate unreliability of any of the author's other data. On this note, the definition of the stability parameter in Phinney [19, p. 692, fig. 2] has a typographical error; the correct definition is given on page 690 in the text, verified by computing the stability parameter for data from Chen and Davis [13] and comparing against the plot. Another potentially related problem is that the surface tension for fluid II appears to be much lower than seems possible for salt water. Fortunately fluid II was not used for fig. 6 of Phinney [19, p. 698], which may help explain why that plot appears consistent with other data but the other breakup length plot does not.

Foreign language
3 studies were neglected because they were written in a foreign language and did not appear to contain useful data [5,24,28]. While foreign language does not disqualify a study17, if a foreign language study does not appear to contain valuable data from a superficial examination, I did not deem it important enough to examine further. It is possible that these studies do contain useful data that is obscured by the language barrier.
16 Because the spray angle from Reitz [26] was used, this study does not count towards the count of inconsistent studies.
17 I published English translations of Russian papers by Lebedev [71] and Natanzon [77], so the language barrier is not impenetrable.

Conclusions and recommendations
Improving the validation of turbulent jet breakup models requires not only changes to how models are validated but also new experimental data that is more challenging for model validation. Consequently recommendations are made for both modelers and experimentalists/computationalists. These guidelines are designed to address issues specifically in turbulent jet breakup, and they complement existing validation guidelines [96].

Recommendations for model developers
1. Compile as much data as possible. Actively look for data that fills in gaps in your parameter spaces (e.g., figure 3) and avoids confounding.
2. Consider the uncertainty of the source data. If necessary, neglect data which is too uncertain.
3. Consider turbulence intensity in new models, as it is an important variable in turbulent jet breakup that is frequently neglected.
4. For primary breakup droplet diameter and velocity models, it is necessary to compare against primary breakup data, i.e., data on the diameter and velocity of droplets just-formed from the jet. Far-field measurements are discouraged. Near-field measurements may be acceptable, but they need to be compared against just-formed measurements to check that they are similar.
5. Match all variables (We 0 , Re 0 , Tu 0 , etc.) when comparing model estimates to data. And ensure that the model estimate is for the same quantity as the data. Otherwise the comparisons are invalid; they would be apples-to-oranges comparisons.

Recommendations for experimentalists and computationalists
1. Estimate uncertainty for every measurement.
2. Measure or estimate turbulence intensity at the very least, if not other turbulence quantities (integral scale, Reynolds stress).
3. If turbulence intensity will not be measured, use a standardized setup where the turbulence intensity can be credibly estimated. "Pipe" nozzles which produce fully developed turbulent flow are one way to do this. If the pipes are roughened and the friction factor of the pipe is measured, then the turbulence intensity can be credibly estimated [52].
4. Cover the We 0 -Re 0 and Re 0 -Tu 0 parameter spaces well enough to avoid confounding. Don't use nozzle orifice length as a proxy for turbulence level or the velocity profile.
5. Measure common quantities of interest: droplet diameter distribution f (D) or Sauter mean diameter D 32 , average droplet velocity at formation v d , average breakup onset location x i , average breakup length x b , and spray angle θ i . Other quantities of interest may be useful in specific applications.
6. For quantities without clear standard definitions (e.g., the spray angle), define the quantity in a precise way. Preferably this definition does not require expensive equipment. For the spray angle the standard could be a specific threshold for photographic or mass fraction measurements.
7. Distinguish between quantities of interest which depend on only primary breakup and those which include other effects. For testing primary breakup models, it is better to measure the diameters of droplets "just-formed" from the jet. If only near-field measurements are possible, replicate one of the papers with just-formed droplets and compare the near-field and just-formed measurements. If the two are close, then near-field is an acceptable proxy for just-formed measurements.
8. Release raw data with each publication.