Relevance of hospital characteristics as performance indicators for treatment of very-low-birth-weight neonates

Melanie Esser, Nicholas Lack, Christina Riedel, Ulrich Mansmann, Ruediger von Kries
DOI: http://dx.doi.org/10.1093/eurpub/ckt176. Pages 739–744. First published online: 28 November 2013


Background: Current attempts at centralization of neonatal care in Germany focus on a minimum volume of 30 very-low-birth-weight (VLBW, weighing <1250 g) neonate admissions per year. However, the evidence for a selective referral strategy based on hospital volume is unclear. Method: A total of 5575 neonates weighing <1250 g treated in 31 hospitals in Bavaria between 2000 and 2011 were analysed using population-based data. The relevance of different hospital characteristics (i.e. hospital volume, bed capacity and teaching status) for explaining individual in-hospital mortality as well as interhospital variation in mortality rates was analysed using multilevel logistic regression analysis. Results: In a risk-adjusted model, only dichotomized hospital volume (<30 admissions) was significantly associated with higher mortality in VLBW neonates (odds ratio: 1.74; 95% confidence interval: 1.02–2.99). However, the higher mortality risk only applied to neonates with higher Clinical Risk Index for Babies (CRIB) scores. There was considerable heterogeneity in mortality rates between Bavarian hospitals. The median odds ratio for mortality between two neonates treated in a randomly chosen low-performing versus high-performing hospital was 1.62 in the null model (without explanatory variables). Hospital volume only explained 15.1% of interhospital variation in mortality rates after adjustment for case-mix. Other hospital characteristics were of minor relevance. A funnel plot of the standardized mortality ratio against the number of admissions showed that 41% of small-volume hospitals performed better than expected. Conclusion: A selective referral strategy based solely on hospital volume will fall short of the task of optimal allocation of neonatal care by means of centralization.


The influence of hospital volume, defined by the number of admissions, on the mortality rates of very-low-birth-weight (VLBW) neonates <1500 g has been an increasing focus of research in recent years. Hospital volume is discussed as a proxy measure for determining the quality of care.1 Using hospital volume as a quality criterion is attractive because it is easy to measure and it seems reasonable that the number of admissions correlates with experience in treatment, and therefore results in good quality of care. The majority of studies show a positive association between a high neonatal intensive care unit (NICU) caseload and survival of VLBW neonates.2–7 As a consequence, there were attempts in Germany to increase centralization of neonatal care by raising the threshold to qualify for treatment of VLBW neonates <1250 g from 14 to 30 annual admissions. However, in December 2012, this new threshold was revised by the Social Court of the Bundesland Brandenburg because of gaps in evidence in the current literature.8

The evidence for hospital volume as a quality criterion is equivocal.1,9 The underlying mechanism is unclear, and volume-outcome studies were criticized for their methodological shortcomings.1 Other hospital characteristics such as structure, organization and personnel might allow for a better prediction of VLBW mortality. A beneficial association between highly specialized neonatal care hospitals and mortality rates has been reported.10 However, the evidence for effects of other hospital characteristics is sparse.11–13 The use of hospital characteristics in explaining the differences in mortality rates between hospitals receives even less attention.2,14 Therefore, it is unclear whether a selective referral strategy, based exclusively on hospital volume, is effective in discriminating between hospitals with respect to the quality of care.

To answer this question, a multilevel analysis of VLBW neonates <1250 g born in the German state of Bavaria between 2000 and 2011 was performed to identify sources of interhospital variation in mortality rates.


Selection of study population and preparation of data set

The study design was a secondary data analysis. Neonatal data on n = 8522 live-born neonates <1250 g treated between 2000 and 2011 were available from the Bavarian Working Group for Quality Assessment (‘Bayerische Arbeitsgemeinschaft für Qualitätssicherung’, BAQ). The data are collected routinely from Bavarian hospitals on a population basis for benchmarking against national standards of outcome and care. Hospitals transfer electronic records after removing personal identifiers.

A total of 8079 neonates with gestation between 24 and 33 weeks were selected. Our analysis was confined to these neonates because treatment strategy before 24 weeks of gestation is dependent on prognosis and parental considerations,15 and because mortality is substantially reduced after 33 weeks.16 Cases with lethal malformations, such as anencephaly or trisomy 13,17 (n = 43) or missing data on gestational age (n = 360) were excluded. The analysis was further confined to cases treated in tertiary-level perinatal centres (i.e. level 1 in Germany) because treatment of VLBW neonates outside these centres is not recommended in Germany18 (n = 7289), and only to direct transferrals from obstetric wards to neonatal units (n = 6655). Because the objective of the study was to assess the potential impact of hospital characteristics on VLBW survival, the study population was confined to neonates discharged home and deaths occurring within the initial NICU (n = 5578). Three cases were excluded because of data inconsistencies. The final study population consisted of 5575 cases treated in 31 Bavarian hospitals.

Variables used in the analysis

In-hospital mortality was chosen as the outcome variable to enable comparison with other studies.

The infant characteristics birth weight, sex, multiple birth (singleton, twins, triplets and above), temperature on admission (<34°C, 34.0°–35.9°C, 36.0°–37.9°C, >37.9°C), severe acutely life-threatening congenital malformations,17 place of birth (inborn, outborn or missing), completed weeks of gestation, early sepsis/systemic inflammatory response syndrome within 72 hours after birth, Clinical Risk Index for Babies (CRIB; 0–5, 6–10, 11–15, ≥16, and missing)19 and weight per gestational age (appropriate, small or large for gestational age) were available from the records and considered as covariates for individual risk adjustment in VLBW neonates.19–21 Gestational age was determined by the paediatrician based on signs of maturity, last menstrual period and first trimester ultrasound. Weight per gestational age percentiles were calculated from gender-specific tables of German population-based percentiles,22 with values below the 10th percentile coded as small for gestational age and values above the 90th percentile coded as large for gestational age. Early sepsis/systemic inflammatory response syndrome was defined according to the surveillance protocol Neo-Kiss by the National Reference Centre for Surveillance of Nosocomial Infections, and was taken as a surrogate for a transplacentally acquired infection.23 The CRIB score is a validated neonatal physiological score composed of gestational age, birth weight, severe congenital malformations, maximum base excess and minimum/maximum appropriate fraction of inspired oxygen, measured within the first 12 hours of life. High scores indicate high mortality risk.19 In the 15% of cases where the CRIB score was missing, the standard categories of the International Neonatal Network19 were extended to accommodate missing values. A similar method was used for the 9% of cases where the place of birth was missing. All other variables had no missing data.

Hospitals with ≥30 admissions of VLBW (<1250 g) neonates per annum in ≥6 years of the study period of 12 years were dichotomized as large hospitals (n = 8), and those with <30 admissions as small hospitals (n = 23). We adopted this approach to avoid misclassification because data were not available for each unit in every year. As an indicator of hospital capacity, the mean annual number of NICU beds available per hospital was extracted from the Bavarian hospital plan. The mean number of NICU beds was subsequently entered into the model as a linear variable. Finally, teaching status was classified as ‘university hospital’ (n = 6), ‘teaching hospital’ (n = 20) and ‘not affiliated with a university’ (n = 5).
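
The dichotomization rule described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code (the analyses were run in SAS and R): the function name `classify_hospital_volume`, the use of `None` for years without data, and the example admission counts are all hypothetical.

```python
from typing import Optional, Sequence


def classify_hospital_volume(
    annual_admissions: Sequence[Optional[int]],
    threshold: int = 30,  # VLBW (<1250 g) admissions per year
    min_years: int = 6,   # years (out of 12) in which the threshold must be met
) -> str:
    """Return 'large' if the threshold is met in at least `min_years` years.

    Years without data are passed as None and do not count towards
    `min_years` (an assumption made for this sketch).
    """
    years_meeting = sum(
        1 for n in annual_admissions if n is not None and n >= threshold
    )
    return "large" if years_meeting >= min_years else "small"


# Hypothetical 12-year admission series (not data from the study)
hospital_a = [35, 41, 29, 33, None, 38, 30, 27, 36, 40, 31, 28]
hospital_b = [12, 18, 31, 22, 25, None, None, 30, 19, 21, 17, 24]

print(classify_hospital_volume(hospital_a))  # large (threshold met in 8 years)
print(classify_hospital_volume(hospital_b))  # small (threshold met in 2 years)
```

Counting years rather than averaging admissions keeps a unit with a few missing years from being reclassified by the gaps alone, which appears to be the motivation stated in the text.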

Statistical analyses

Selection of individual-level covariates for risk adjustment related to the case-mix of the hospitals was performed using stepwise logistic regression. Temporal effects of the year of admission were tested using year dummies. To allow for possible non-linear effects, squared gestational age was also added. Interactions between CRIB score/gestational age and all individual-level covariates were tested. The inclusion threshold for variables to remain in the model was set at 5%. Birth weight and gestational age were found to be closely associated. The choice between the two covariates was assessed using the Akaike information criterion, and gestational age was selected on that basis.

Mortality was modelled using multilevel logistic regression (generalized linear mixed model with a logit link), including a random intercept for the hospitals, to correct for clustering of patients within the same hospitals.24 Multilevel analysis enables estimation of variations between hospitals separately from infant residual variation, and thereby enables quantification of the proportion of variance in outcome being explained by differences in hospital characteristics.24 Subsequently, characteristics at the hospital level can be analysed with respect to their relevance in explaining the differences between hospitals.24 The multilevel model included individual-level covariates associated with lethal outcome obtained from the stepwise logistic regression. Covariates at the hospital level included size of hospital volume, mean number of NICU beds and teaching status. Odds ratios (ORs) and the corresponding 95% confidence intervals (CIs) were estimated. The discriminatory performance of the model was assessed by the area under the curve (AUC).25 For further analysis, cross-level interactions between gestational age/CRIB score and hospital volume, as well as hospital-level interactions between all hospital covariates and hospital volume were tested in an alternative model, and the corresponding ORs were also calculated.

The relevance of the different hospital characteristics for explaining the variations between hospitals was assessed against a null model with only a random intercept and no further explanatory variables. The null model shows the basic partition of variability between the individual and the hospital level.24 The heterogeneity between hospitals was estimated by the median odds ratio (MOR), which can be calculated from the hospital-level variance (Vh): MOR = exp(0.95 × √Vh).26 This may be interpreted as the median of the ORs between two neonates from randomly chosen different hospitals, with the first having higher risk.26,27 Large values indicate a high amount of heterogeneity between hospitals.26,27 The MOR can be interpreted the same way as an ordinary OR, and therefore can be compared with the other OR estimates.26,27 CI and significance of the hospital variance were calculated based on the profile-likelihood.24 Hospital-level characteristics were separately added to the null model, and for each model hospital-level variance, proportional change in variance (PCV) with respect to the null model28 and MOR were computed. The PCV was calculated as PCV = (Vh1 − Vh2)/Vh1 × 100%, where Vh1 = hospital-level variance of the initial model and Vh2 = hospital-level variance of the model with more terms. A decrease in hospital variance indicated that the respective hospital-level covariate was relevant in explaining interhospital differences.29 All models were compared pair-wise by computing the difference in the respective log-likelihoods, doubling this value and referring this to the Chi-squared distribution with the difference in respective degrees of freedom. To analyse whether the relevance of the hospital-level covariates might be different in the presence of individual risk factors, the same procedure was carried out for a model adjusting for case-mix (reference model).
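
As a worked check of the two variance summaries, the short Python sketch below computes the MOR and the PCV from hospital-level variances. It assumes that the factor 0.95 in the MOR formula is the rounded value of √2 × Φ⁻¹(0.75), and that the PCV is the relative reduction (Vh1 − Vh2)/Vh1 expressed in percent. The function names are illustrative and not taken from the authors' SAS or R code; the input variances are the values reported for the null model (0.256) and for the reference model without and with hospital volume (0.324 and 0.275).

```python
import math
from statistics import NormalDist

# sqrt(2) times the 75th percentile of the standard normal (about 0.954),
# which the text rounds to 0.95
MOR_FACTOR = math.sqrt(2) * NormalDist().inv_cdf(0.75)


def median_odds_ratio(vh: float) -> float:
    """Median odds ratio implied by a hospital-level variance vh (logit scale)."""
    return math.exp(MOR_FACTOR * math.sqrt(vh))


def proportional_change_in_variance(vh_initial: float, vh_extended: float) -> float:
    """Percentage reduction in hospital-level variance after adding terms."""
    return (vh_initial - vh_extended) / vh_initial * 100.0


print(round(median_odds_ratio(0.256), 2))                       # 1.62
print(round(median_odds_ratio(0.324), 2))                       # 1.72
print(round(proportional_change_in_variance(0.324, 0.275), 1))  # 15.1
```

A negative PCV simply means that the estimated hospital-level variance grew slightly after the covariate was added, so the covariate explained none of the interhospital variation.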

Finally, the ratio of observed to predicted probabilities (standardized mortality ratio, SMR) in the reference model was plotted against the total number of observations during the study period. We differentiated between small- and large-volume hospitals. A funnel plot of the SMR was constructed using 95% control limits around the reference of 1 as well as local CIs based on the Wilson score method.
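
The construction of such a funnel plot can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' procedure: it assumes that the control limits are obtained by computing a Wilson score interval for the reference mortality rate at each sample size n and dividing by that rate, so that the limits are centred near an SMR of 1. The function names and the example sample sizes are hypothetical; the reference rate of 0.13 is the overall mortality reported for the study population.

```python
import math
from statistics import NormalDist


def wilson_interval(p: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a proportion p observed in n cases."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def smr(observed_deaths: int, predicted_probabilities: list[float]) -> float:
    """Observed deaths divided by the sum of model-predicted death probabilities."""
    return observed_deaths / sum(predicted_probabilities)


def funnel_limits(reference_rate: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Control limits on the SMR scale (assumed construction, see lead-in)."""
    lower, upper = wilson_interval(reference_rate, n, level)
    return lower / reference_rate, upper / reference_rate


# Limits narrow as the number of observations grows (hypothetical n values)
for n in (50, 200, 800):
    low, high = funnel_limits(0.13, n)
    print(n, round(low, 2), round(high, 2))
```

Hospitals whose SMR falls outside the limits for their own n would be flagged as differing from expectation by more than chance alone would suggest.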

A value of P < 0.05 was considered significant. Analyses were performed using SAS software (version 9.3, SAS Institute, Cary, NC, USA) and R-2.15.2. According to German regulations, no approval of an ethics committee was necessary for the analyses of anonymous data.


We found that 60.9% of neonates (table 1) were treated in low-volume hospitals (<30 VLBW cases per year). Hospitals were equipped with a median of 10.0 (interquartile range [IQR]: 9.0–10.2) NICU beds and 62.6% of the neonates were admitted to teaching hospitals. A total of 726 neonates died, giving an overall mortality rate of 13.0%. Median gestational age was 28.0 (IQR 26–30) weeks, and the majority of the neonates (54.1%) had a CRIB score between 0 and 5.

Table 1

Infant and Hospital characteristics of the study population

| Level | Variable, unit | Value |
|---|---|---|
| Infant characteristics (n = 5575 cases) | Mortality, % | 13 |
| | Birth weight, mean/median (SD/IQR), g | 939/970 (214/780–1125) |
| | CRIB score, % | |
| | Gestational age, mean/median (SD/IQR), completed weeks | 27.8/28.0 (2.3/26–30) |
| | Weight per gestational age, % | |
| |     Appropriate for gestational age | 72.4 |
| |     Small for gestational age | 26.1 |
| |     Large for gestational age | 1.5 |
| | Severe congenital malformations, % | 1.3 |
| | Multiple birth, % | |
| |     Triplets or more | 4.4 |
| | Early-onset sepsis/systemic inflammatory response syndrome, % | 17.1 |
| | Sex, % | |
| | Temperature in °C on admission, % | |
| | Place of birth, % | |
| Hospital characteristics (n = 31 hospitals) | Hospital volume, % of admissions | |
| |     Small volume (<30 admissions), n = 23 | 39.1 |
| |     Large volume (≥30 admissions), n = 8 | 60.9 |
| | Annual number of admissions, mean/median (SD/IQR) | 26.2/27.1 (9.7/17.9–35.3) |
| | Teaching status, % of admissions | |
| |     University hospital, n = 6 | 22.1 |
| |     Teaching hospital, n = 20 | 62.6 |
| |     Not affiliated with a university, n = 5 | 15.4 |
| | Annual number of NICU beds available, mean/median (SD/IQR) | 9.8/10.0 (2.4/9.0–10.2) |
  • SD = standard deviation; IQR = interquartile range; g = grams; CRIB = clinical risk index for babies; NICU = neonatal intensive care unit.

Multilevel analysis

Risk-adjusted multilevel logistic regression showed that neonates born in small hospitals had 74% higher odds of dying than those born in large hospitals (OR 1.74, 95% CI 1.02–2.99; Model 0, table 2). Neonates treated in teaching hospitals tended to have lower odds of dying than those treated in university hospitals (OR 0.58, 95% CI 0.33–1.04). NICU bed capacity showed no significant effect (OR 0.94, 95% CI 0.86–1.03).

Table 2

Results of the full multilevel model, fixed effects of hospital-level covariates

| Hospital characteristics | Mortality OR (95% CI), Model 0 | Mortality OR (95% CI), Model 1 |
|---|---|---|
| Hospital volume | | |
|     Large (≥30 admissions <1250 g) | Ref. | |
|     Small (<30 admissions <1250 g) | 1.74 (1.02–2.99) | |
|     Stratified analysis: | | |
|         CRIB score 0–5 (small versus large) | | 0.88 (0.45–1.70) |
|         CRIB score 6–10 (small versus large) | | 1.61 (0.88–2.92) |
|         CRIB score 11–15 (small versus large) | | 2.69 (1.39–5.18) |
|         CRIB score ≥16 (small versus large) | | 1.17 (0.36–3.83) |
|         Missings (small versus large) | | 2.53 (1.24–5.15) |
| Type of hospital | | |
|     Teaching hospital | 0.58 (0.33–1.04) | 0.62 (0.34–1.10) |
|     Not affiliated with a university | 0.89 (0.39–2.00) | 0.94 (0.42–2.14) |
| Mean number of NICU beds available | 0.94 (0.86–1.03) | 0.94 (0.86–1.03) |
  • OR = Odds Ratio, CI = Confidence Interval, NICU = neonatal intensive care unit, CRIB = clinical risk index for babies, Ref. = Reference

  • Model 0 adjusted for the individual level covariates gestational age, CRIB score, sex, severe malformations, multiple birth, early-onset sepsis/systemic inflammatory response syndrome, temperature on admission, place of birth, gestational age*gestational age, gestational age*severe malformations, gestational age*CRIB score, CRIB score*severe malformations.

  • Model 1 adjusted for the same covariates as Model 0 plus the cross-level interaction term hospital volume*CRIB score.

In the alternative model, the cross-level interaction between hospital volume and the CRIB score was significant. The stratified ORs showed that only neonates with CRIB scores of more than 5 had higher odds of dying if born in small rather than large hospitals, and the OR was larger in the higher CRIB category (CRIB score 6–10: OR 1.61, 95% CI 0.88–2.92; CRIB score 11–15: OR 2.69, 95% CI 1.39–5.18; Model 1, table 2). This effect could not be shown for neonates with a CRIB score ≥16 (OR 1.17, 95% CI 0.36–3.83); however, there were only 97 admissions in this category.

Both models showed a good discriminatory ability with an AUC of 0.88.

Analysis of variance

In the null model with random intercept only, the MOR for mortality between neonates treated in a randomly chosen low-performing versus high-performing hospital was 1.62 (Table 3). Hospital volume did not explain any of the interhospital variation in the hospital mortality rates nor did the mean number of NICU beds. Teaching status explained 9% of the interhospital differences, i.e. Vh was reduced by 9.0% by adding this variable to the null model.

In the reference model adjusting for case-mix, MOR increased to a value of 1.72. Hospital volume explained 15.1% of the interhospital variation, i.e. Vh was reduced by 15.1% by adding hospital volume to the reference model. Teaching status explained 3.0%, whereas mean number of NICU beds was of no relevance.

Table 3

Hospital level variances dependent on different models: Effects of adding hospital-level covariates to null model (random intercept only) and to reference model (random intercept and individual covariates)

| Model | Vh (95% CI) | PCV, % | MOR | P |
|---|---|---|---|---|
| Null model | 0.256 (0.131–0.512) | Ref. | 1.62 | |
|     Plus hospital volume | 0.263 (0.132–0.536) | −3.1 | 1.63 | 0.7006 |
|     Plus teaching status | 0.233 (0.116–0.478) | 9 | 1.58 | 0.0556 |
|     Plus mean number of NICU beds | 0.264 (0.134–0.536) | −3.4 | 1.63 | 0.7649 |
| Reference model | 0.324 (0.165–0.647) | Ref. | 1.72 | |
|     Plus hospital volume | 0.275 (0.131–0.573) | 15.1 | 1.65 | 0.0499 |
|     Plus teaching status | 0.327 (0.164–0.669) | −1.1 | 1.72 | 0.3651 |
|     Plus mean number of NICU beds | 0.314 (0.156–0.640) | 3 | 1.7 | 0.4979 |
  • Vh = hospital-level variance; CI = Confidence interval; PCV = Proportional change in variance with respect to null/reference model; MOR = Median odds ratio; P = significance of model comparison (Chi-squared statistic); NICU = neonatal intensive care unit; Ref. = Reference.

The hospital variance was significant for all models (P < 0.0001). The effect of adding hospital volume or mean number of NICU beds to the null model was not significant (table 3). Only teaching status led to improvement in fitting null models with only hospital-level covariates. Extending the reference model containing individual-level covariates was significant only for hospital volume.
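
The pairwise model comparisons reported here rest on a likelihood-ratio test: twice the difference in log-likelihoods is referred to a Chi-squared distribution with degrees of freedom equal to the number of added parameters. The sketch below illustrates the arithmetic using only the Python standard library. It covers one and two degrees of freedom, where the Chi-squared tail probability has a closed form; the function names and the log-likelihood values are hypothetical and are not taken from the study.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(X > x) for a Chi-squared variable; closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def likelihood_ratio_test(loglik_smaller: float, loglik_larger: float, df: int) -> tuple[float, float]:
    """Return (test statistic, P value) for two nested models."""
    statistic = max(2 * (loglik_larger - loglik_smaller), 0.0)
    return statistic, chi2_upper_tail(statistic, df)


# Hypothetical log-likelihoods: adding one binary hospital-level covariate
stat, p_value = likelihood_ratio_test(-1500.0, -1498.0, df=1)
print(stat, round(p_value, 4))
```

A binary covariate such as dichotomized hospital volume adds one parameter, whereas a three-category covariate such as teaching status adds two, which is why both cases are handled.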

Standardized mortality ratio

The plot of the SMR against total number of observations in the study period indicated that 75.0% (6 of 8) of the large hospitals with ≥30 admissions per year had a better-than-expected outcome (solid circles in figure 1). Of the smaller hospitals, 40.9% (9 of 22; open circles) had a better-than-expected outcome, whereas the remaining had worse-than-expected outcomes. Smaller hospitals also varied more in the outcome than large ones. None of the estimates showed evidence of a systematic difference with the exception of one outlier with an SMR of 0.

Figure 1

Funnel plot of the SMR against total number of observations in study period (12 years) for 31 hospitals with 95% control region around the reference of 1. Control limits as well as local control intervals are constructed using the Wilson score method. Units with an average total annual number of admissions (<1250 g) ≥30 neonates are shown as solid circles, and those <30 as open circles. Because of the exclusion criteria, some of the larger units (solid fill) have a smaller number of observations (n), and thus appear shifted to the left. This is mainly the consequence of high rates of transferrals for these units.


Principal findings

In accordance with previous studies,2–7 small-volume hospitals were significantly related to a higher mortality rate in VLBW neonates in a model adjusting for individual- and hospital-level risk factors. However, the higher risk only applied to neonates with a higher initial risk. As a primary finding, we showed considerable heterogeneity in mortality rates between Bavarian hospitals, accounting for an MOR of 1.62 in the null model. Importantly, only 15.1% of interhospital variation in mortality rates could be explained by the hospital volume after adjustment for case-mix. Other hospital characteristics (i.e. bed capacity and teaching status) were of minor relevance for explaining differences between hospitals. A funnel plot of the SMR against the number of admissions showed that 40.9% of the small volume hospitals had a better-than-expected outcome.

Comparison with previous studies and discussion of results

The benefit of treatment in large volume hospitals was only related to children with a higher initial risk. Bartels et al.6 also performed a stratified analysis and showed that the lower the gestational age, the higher the risk of dying for neonates treated in small versus large hospitals. Neonates of lower gestational age are more likely to have higher CRIB scores, as they are at a higher risk of death,20 thus supporting the results of our stratified analysis.

The effect of hospital variation on mortality was addressed by a Swedish study,30 which found an MOR of 1.36 in the null model for neonates with a gestational age of ≥28 weeks. In a recently published study, a hospital mortality variance of 0.16 (equivalent to an MOR of 1.46) for neonates ≥22 weeks of gestational age was observed.14 This corroborates the hypothesis that differences between hospitals are an important factor with respect to hospital mortality in VLBW neonates. In our study, interhospital differences between Bavarian hospitals were higher (MOR of 1.62), which may reflect less centralized care in Germany.31

Although hospital volume was significantly associated with mortality in a risk-adjusted model, it would not be a reliable selective referral criterion for directing patients to high-quality hospitals. Hospital volume evidently does not explain the above average performance of 40.9% of the small-volume hospitals (figure 1). This is in accordance with a study on the discriminative ability of a threshold of 30 VLBW admissions in German hospitals with a false-negative rate of 44%.7 Furthermore, hospital volume could not sufficiently explain interhospital differences in mortality rates. After adjustment for individual risk factors, hospital volume accounted for only 15.1% of interhospital differences, leaving 84.9% of unexplained interhospital variation in mortality. Rogowski et al.2 also examined the relevance of hospital volume in explaining interhospital variability in VLBW mortality. In accordance with our results, they found that the annual volume of admissions only explained a small proportion (9%) of interhospital variation.2 We defined hospital volume by a threshold of ≥30 admissions per year. Applying another threshold, hospital volume might account for a higher proportion of the variation between hospitals.

Appraisal of methods

A strength of the present study is the application of multilevel modelling, which is recommended as appropriate for analysing clustered health care data.24,32 To date, only a few studies have examined the effects of hospital volume on mortality in VLBW neonates using a multilevel approach to account for clustering.2,4 Single-level analysis, i.e. traditional logistic regression, may overestimate the effects on mortality, because intercorrelations between individuals treated in the same hospital are not adequately modelled.32,33 Another strength of the current study is the large size of the dataset (n = 5575) and the long study period (12 years), including data up to 2011. Because our study is population-based and draws on all available neonatal records in Bavaria, selection bias is practically eliminated.

An important limitation of the study is that neonates transferred to other hospitals were excluded at the cost of possible referral bias. Analysis of transferral patterns showed that small-volume hospitals transferred neonates with higher mortality risk as reflected in the higher CRIB scores; conversely, large-volume hospitals transferred neonates with lower CRIB scores (supplementary data). Therefore, performance of smaller hospitals (figure 1) may be overestimated by excluding transferrals between neonatal units, thereby accounting for potential underestimation of the variability between hospitals and the impact of hospital volume. However, another German study that included transferred neonates and used a threshold of 35 annual admissions showed a similar result for the association between hospital volume and mortality.6 Furthermore, the degree to which our results can be generalized may be limited. Our study included only Bavarian hospitals, which represent only 1/8th of all German hospitals. Bavaria shows a low degree of centralization compared with other German states.34 Therefore, the amount of variability between hospitals might be higher in Bavaria than in other German states. Finally, our study, similar to most of the other available literature on volume-related outcome, only examined mortality as an outcome. Future studies may benefit from inspection of other outcomes such as necrotizing enterocolitis or respiratory pathologies, which are known major hazards for VLBW neonates.35


In summary, a selective referral strategy based solely on hospital volume will fall short of the task of optimal allocation of neonatal care by means of centralization. Other easily available structural hospital characteristics (NICU bed capacity and teaching status) were not suitable as quality indicators. For a better distinction between performances, other causes of residual variability between hospitals must be identified. Centre rates of selected interventions (such as provision of antenatal corticosteroids, mode of delivery) were shown to explain some of the heterogeneity in mortality rates between hospitals.14 Potential other candidates are workload,11 organizational processes12 and provision of health care specialists, as reflected by the ratio of nurses with neonatal qualifications to those without.13 Alternatively, suitable perinatal centres might be identified by means of external audits.

Conflicts of interest: None declared.

Key points

  • In a risk-adjusted model, small-volume Bavarian hospitals were associated with an increased risk of dying, although this only related to children with higher initial mortality risk.

  • There were considerable differences between hospitals in mortality rates, which could not be sufficiently explained by hospital volume.

  • A selective referral strategy based exclusively on hospital volume was not effective in discriminating between hospitals with respect to quality of care. Teaching status and NICU bed capacity were found to be unsuitable as quality indicators.

  • Further research is required to identify additional hospital characteristics for improving performance assessment.

