


Case Study: Making the case for Advanced Analytical Techniques

Keywords: Six Sigma, DMAIC, generalized additive model, logistic regression

Health care today is facing serious problems: quality of care does not meet patients’ needs and costs are exploding. In a hospital's cardiology department, discharged patients are advised to participate in a rehabilitation program. However, many of the discharged patients do not join the program, and others quit before being declared cured (a so-called dropout). An improvement project was started that aims to increase revenues by either attracting more patients to the rehabilitation program or reducing the fraction of dropouts.


The measure phase starts with the definition of the internal critical-to-quality characteristics (CTQs). In this project the strategic focal point is the increase of revenue, which links directly to the following CTQs:

CTQ1: the number of patients who participate in the rehabilitation program every month

CTQ2: the number of sessions per participant

To measure the number of participants and sessions each month, one simply looks at the number of invoices. To assess whether this measurement procedure is valid, a comparison between a sample of invoices and the corresponding list of participating patients from the department was made. These matched perfectly, validating the chosen measurement procedure.

The full study can be found here.

Data on 516 cardiology patients were available. Of these patients, 49% participated in the rehabilitation program. For each patient we have the following data available:

  • distance from the patient’s home to the hospital, in miles

  • age

  • mobility; whether or not the patient has a car (yes, no, no response)

  • gender (male, female, other, no response)

  • place of residence

  • participation; whether or not the patient participates in the rehabilitation program (Binary: Attended=yes if the patient shows up at least once, else Did Not Attend).

The number of participating patients averaged 33 per month, with a standard deviation of 4.9. Based on the process capability and process knowledge, the objective of the project was to increase the average number of participants to 36. This number had been attained a number of times in the past and both cardiologists and physicians claimed that such an increase was feasible. The number of sessions a patient attended was on average 29 out of a maximum of 45. The objective here was to increase the average number of sessions to 32 for each patient. Increasing both the number of participants and the number of sessions per participant would increase the total revenue by $56,603 per year. That is, 195 additional sessions per month (36 × 32 − 33 × 29) for 12 months at an average of $22.82 per session makes an extra yearly revenue of $56,603.
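The session arithmetic behind this target can be checked with a short Python sketch (the study itself used R). It uses only the figures quoted above; the function name is illustrative. Note that multiplying out at the quoted $22.82 per session gives roughly $53,400 per year, so the $56,603 in the text presumably reflects a slightly different effective rate or rounding in the source figures; the sketch does not try to reconcile the two.

```python
# Figures quoted in the text above.
BASELINE_PARTICIPANTS = 33   # average participants per month
BASELINE_SESSIONS = 29       # average sessions per participant
TARGET_PARTICIPANTS = 36
TARGET_SESSIONS = 32
REVENUE_PER_SESSION = 22.82  # dollars, average quoted in the text


def extra_sessions_per_month(p0, s0, p1, s1):
    """Additional sessions per month when moving from (p0, s0) to (p1, s1)."""
    return p1 * s1 - p0 * s0


extra = extra_sessions_per_month(
    BASELINE_PARTICIPANTS, BASELINE_SESSIONS,
    TARGET_PARTICIPANTS, TARGET_SESSIONS,
)
yearly_revenue = extra * 12 * REVENUE_PER_SESSION

print(f"Extra sessions per month: {extra}")
print(f"Extra yearly revenue: ${yearly_revenue:,.2f}")
```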


156 patients were asked why they left the program early. Summarizing:

  • 26% of the patients were readmitted for a hospital stay,

  • 16% of the patients started working again and could not combine this with the rehabilitation activity (even though the center was open late),

  • 16% of the patients could not join the program due to other obligations (vacations, social obligations),

  • 12% of the patients dropped out for a medical reason provided by the doctor,

  • 8% of the patients had their own rehabilitation facilities.

Together, these causes account for 78% of the dropouts. However, none of them can be influenced easily. Based on surveying cardiologists, physical therapists, patients, and other interested parties, the following influence factors were explored:

  • Patients should be informed of the rehabilitation program at a much earlier stage

  • Information on the rehabilitation program should be much more precise and attractive

  • Cardiologists should stimulate patients to participate in and finish the rehabilitation program

  • Patients should train with a heart rate monitor to improve their feelings of safety

  • Patients desire a smaller exercise room and are more comfortable when not with other patients

  • Patients are not likely to show up around major holidays


The factors that seemed most important can be summarized as patient attention factors. These factors were believed to be very important in increasing the number of participants. As a consequence, the following improvement actions were proposed:

  • Writing a better brochure on the rehabilitation program

  • Writing a letter to the cardiologists to improve their attitude toward patients: to be more polite and to communicate the possibilities of the rehabilitation program at an earlier stage.



This is a typical example of jumping to conclusions, which is often experienced in practice. Below, we will explain how statistical techniques, in particular logistic regression, offer a different view on designing improvement actions. This is a good illustration of the strength of the improve phase in Six Sigma and the usefulness of logistic regression.

We now give a detailed analysis of the statistics used in the improve phase. The project supervisor convinced the project leader to complete the improve phase before proceeding with the above-mentioned actions.



The logistic regression model reveals the important influence factors. The probability of joining the program depends on whether a patient has a car at his or her disposal and the distance from a patient’s home to the hospital. As a solution, various measures to stimulate carpooling were implemented. Prior to the implementation, a cost–benefit analysis was conducted using the fitted regression model.

Analyzing Each Factor Separately

Our first step consists of studying the relation between participation (Y) and each influence factor (denoted by xi) separately. It is useful to screen the data in this way before using more advanced techniques.

  1. The first studied factor is distance. Whether distance affects a patient’s decision to join the program is normally analyzed by means of logistic regression. A first simple approach consists of making boxplots for distance vs. participation. Looking at these plots, we immediately noticed two patients with very long distances (200 km or about 124 miles) to the hospital compared to the other patients. These patients were closely related to one of the physicians and therefore had chosen the hospital considered here. For this reason, these patients were excluded from all further statistical analysis. The left-hand panel of Figure 1 contains boxplots of the data from which these two outliers were removed. This figure suggests that patients with a short distance to the hospital tend to participate more often in the program. The right-hand panel of Figure 1 shows a more informative plot. We divided the range of distance into eight approximately equally sized groups. Within each group we computed the relative frequency of patients participating. Because there are ties in the distance values, not all groups were exactly the same size. The diameter of the circle for a group is proportional to the size of that particular group. To visualize a pattern among the points, we added a smoother through these points. A smoother is a nonparametric regression fit, which can be constructed by many methods. Here, we chose Friedman’s “super smoother,” which is implemented in the statistical software package R (function “supsmu”). Details about the construction of this smoother are of minor importance at this stage, but the interested reader may consult Friedman (1984). The R code for constructing this figure can be found on Howard Seltman’s website, hseltman/files/LREDA.R. From the constructed plot we clearly see that the further a patient lives from the hospital, the lower the probability that the patient will join the rehabilitation program.

  2. The factor age can be analyzed in a similar way; see Figure 2. The plot suggests that the probability of joining the rehabilitation program decreases with age. Moreover, at approximately age 65 there seems to be a change point in the decrease of the fraction of participating patients.

  3. The bar chart for mobility (left-hand picture in Figure 3) clearly indicates that the probability of joining the program is influenced by whether the patient has access to a car. Table 1 summarizes these data. The data suggest that having a car at one’s disposal increases the probability of joining the program. There are missing values in the data set: for 71 patients, mobility was not registered.

  4. The factor gender can be analyzed in a similar way as mobility. There were 13 missing values for gender in the data set. The bar chart (right-hand picture in Figure 3) indicates that this factor has a minor influence on participation. Table 2 summarizes these data.
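The grouped relative frequencies described for distance (and reused for age) can be sketched in Python as follows; the study itself used R. This is a simplified illustration, not the code behind the figures: the data are synthetic, the function name is hypothetical, groups are formed by rank so ties are not kept together, and no smoother is fitted.

```python
def binned_participation(distances, attended, n_groups=8):
    """Split patients into roughly equal-sized groups by distance.

    Returns one (mean distance, participation fraction, group size)
    tuple per non-empty group, ordered from nearest to farthest.
    """
    pairs = sorted(zip(distances, attended))
    n = len(pairs)
    summary = []
    for g in range(n_groups):
        # Rank-based cut points; ties in distance may straddle a boundary.
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        mean_distance = sum(d for d, _ in group) / len(group)
        fraction = sum(a for _, a in group) / len(group)
        summary.append((mean_distance, fraction, len(group)))
    return summary


# Synthetic example: 16 patients, 1 = attended, 0 = did not attend.
distances = list(range(1, 17))
attended = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

summary = binned_participation(distances, attended)
for mean_distance, fraction, size in summary:
    print(f"mean distance {mean_distance:5.1f}  fraction {fraction:.2f}  n={size}")
```

Plotting the fraction against the mean distance, with marker size proportional to the group size, gives the kind of picture described for the right-hand panel of Figure 1.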

The analysis suggests that the accessibility of the hospital has to be improved, especially for those people living far away from the hospital. Hiring a taxi service would definitely improve accessibility, though it is obvious that the costs for this service exceed the revenues of one additional session. It is of major interest to find out how much money can be invested to improve accessibility of the hospital while still ensuring increased revenues. This maximal amount can be considered a break-even point.



To calculate this break-even point, we need a relation between the probability that a patient will join the program and the various influence factors as an ensemble. In the next section we will show how a logistic regression model can be used to accomplish this.


Logistic Regression Model for Modeling the Probability That a Patient Will Join the Program

The software used by the analysts in this study was R (more information about R and a comparison of statistical software packages can be found here). We model the relation between Y and all influence factors simultaneously. In a logistic regression model, we assume that the responses Yi (the response for the ith patient) are independent Bernoulli variables, where the probability pi that patient i participates depends on the influence factors through the log-odds: log(pi / (1 − pi)) = β0 + β1 xi1 + ... + βk xik.
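To make the idea of a logistic fit concrete, here is a minimal Python sketch that estimates a one-predictor model, logit(p) = b0 + b1 × distance, by Newton-Raphson. It is an illustration under stated assumptions: the data are synthetic, the function name is hypothetical, and the analysts in the study fitted their (larger) model in R rather than with this code.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # entries of the (negated) Hessian
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * step = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Synthetic data: distance to the hospital and participation (1 = attended).
distances = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
attended = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

b0, b1 = fit_logistic(distances, attended)
print(f"intercept = {b0:.3f}, distance coefficient = {b1:.3f}")
```

A negative distance coefficient means the estimated probability of participating falls as distance grows, which is the pattern the exploratory plots suggested.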

It appears that the probability of participating decreases in a slightly nonlinear way with age. Modern software for nonparametric estimation allows inclusion of a general smoothing function of age in the model. In this way, we obtain a generalized additive model, from which we can assess the linearity in a more formal way.

To allow for an interaction effect between distance and mobility, we add an interaction term for these predictors. The set of parametric coefficients can be interpreted just as for (generalized) linear models. Distance and mobility appear to be significant, whereas gender and the interaction between distance and mobility are not significant. The term s(age) represents the fitted smooth function of age. For now, it suffices to note that age is significant. We refit the model without the insignificant terms. The detailed analysis can be found here.
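What an interaction term between distance and mobility does to the predicted probability can be sketched in Python as follows. All coefficient values are hypothetical and chosen only for illustration; they are not the fitted values from the study, and the smooth age term s(age) is left out for brevity.

```python
import math

# Hypothetical coefficients on the log-odds scale (not from the study).
COEF = {
    "intercept": 0.5,
    "distance": -0.08,        # effect per mile for patients without a car
    "car": 1.0,               # shift in log-odds for patients with a car
    "distance_x_car": 0.03,   # change in the distance slope when a car is available
}


def participation_probability(distance, has_car, coef=COEF):
    """Predicted probability of joining, with has_car coded 0 or 1."""
    eta = (
        coef["intercept"]
        + coef["distance"] * distance
        + coef["car"] * has_car
        + coef["distance_x_car"] * distance * has_car
    )
    return 1.0 / (1.0 + math.exp(-eta))


for miles in (5, 10, 20):
    without_car = participation_probability(miles, 0)
    with_car = participation_probability(miles, 1)
    print(f"{miles:2d} miles: no car {without_car:.2f}, car {with_car:.2f}")
```

With the interaction included, the distance slope is -0.08 without a car and -0.08 + 0.03 = -0.05 with one. Dropping a non-significant interaction, as in the refit described above, amounts to setting that last coefficient to zero so that both groups share one slope.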

These are the resulting economic models.

IMPROVEMENT ACTIONS: Based on a Break-Even-Point Analysis


Based on this economic model, feasible improvement actions can be evaluated. Because a taxi service for single patients turned out to be too expensive, the project leader came up with a carpooling procedure to couple patients with a car to patients without a car. After a pilot phase of the carpooling procedure, several patients started using this service. Most of these patients explained that they would not have joined the program if this service were not available.
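One plausible reading of such a break-even calculation, sketched in Python, is that the most that can be spent on transport for a patient equals the expected extra revenue from that patient. This is an assumption for illustration and not the study's actual economic model. The two probabilities are hypothetical stand-ins for fitted model values, while the session count and revenue per session are the averages quoted earlier in the text.

```python
SESSIONS_PER_PARTICIPANT = 29   # average quoted in the text
REVENUE_PER_SESSION = 22.82     # dollars, average quoted in the text


def break_even_cost(p_without, p_with,
                    sessions=SESSIONS_PER_PARTICIPANT,
                    revenue=REVENUE_PER_SESSION):
    """Maximum spend per patient at which the intervention still pays off.

    p_without: probability of joining without the intervention
    p_with:    probability of joining with the intervention
    """
    return (p_with - p_without) * sessions * revenue


# Hypothetical probabilities for a patient without a car living far away.
limit = break_even_cost(p_without=0.30, p_with=0.50)
print(f"Break-even spend per patient: ${limit:.2f}")
```

Any measure whose cost per patient stays below this limit, such as pairing patients for carpooling, would be expected to raise net revenue under these assumptions; a taxi for every session would not.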


In the last phase of the Six Sigma project the CTQs were monitored. Using a dashboard, the department can see the number of participating patients and the number of sessions per patient for each month. The number of included patients increased from 33 to 45 (far above the objective in the analyze phase). The number of sessions remained constant, at 29 on average. The project was handed over to the department. Additional revenues turned out to be approximately $101,760 each year. Note that all initial improvement ideas were abandoned.


This case study describes the success of the Six Sigma methodology in a hospital for a specific project. Verifying ideas before developing improvement actions is an important aspect of this methodology. Often people’s ideas about processes are incorrect, but improvement actions based on them are implemented anyway. These actions frustrate employees, may not be cost effective, and in the end do not solve the problem. Within Six Sigma it is obligatory to verify ideas by data analysis or experiments. At first this may seem like a loss of time, energy, and resources, but it is much less than what is lost when implementing the wrong improvements.

Some people argue that statistics do not solve major problems in health care or business, the credo being that Lean principles are much more effective. This case study shows that statistical analysis can be very useful. Even when a somewhat more advanced technique like logistic regression modeling is required, exploratory graphics such as boxplots and bar charts point the way toward a valuable solution. The usefulness of logistic regression is demonstrated by the resulting economic model, which can assist in making well-considered decisions about improvement actions.
