


Case Study: Making the case for Advanced Analytical Techniques

Keywords: Six Sigma, DMAIC, generalized additive model, logistic regression

Health care today is facing serious problems: quality of care does not meet patients’ needs and costs are exploding. In a hospital's cardiology department, discharged patients are advised to participate in a rehabilitation program. However, many of the discharged patients do not join the program, and others quit before being declared cured (a so-called dropout). An improvement project was started that aims to increase revenues by either attracting more patients to the rehabilitation program or reducing the fraction of dropouts.


The measure phase starts with the definition of the internal critical-to-quality characteristics (CTQs). In this project the strategic focal point is the increase of revenue, which links directly to the following CTQs:

CTQ1: the number of patients who participate in the rehabilitation program every month

CTQ2: the number of sessions per participant

To measure the number of participants and sessions each month, one simply looks at the number of invoices. To assess whether this measurement procedure is valid, a comparison between a sample of invoices and the corresponding list of participating patients from the department was made. These matched perfectly, validating the chosen measurement procedure.
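The validation step above amounts to comparing two lists of patient identifiers. The following is a minimal sketch, assuming each source can be reduced to a collection of IDs; the function name and the sample identifiers are hypothetical and do not come from the study.

```python
def compare_sources(invoice_ids, department_ids):
    """Return the IDs that appear in only one of the two sources."""
    invoices = set(invoice_ids)
    department = set(department_ids)
    return {
        "only_in_invoices": sorted(invoices - department),
        "only_in_department": sorted(department - invoices),
    }


# Hypothetical sample: these identifiers are placeholders, not study data.
sample_invoices = ["P001", "P002", "P003"]
sample_department = ["P003", "P001", "P002"]

mismatches = compare_sources(sample_invoices, sample_department)
# The measurement procedure is supported when both mismatch lists are empty.
is_valid = not any(mismatches.values())
print(mismatches, is_valid)
```

Any ID reported in either list would point to an invoice without a matching participant, or the reverse, and would need to be resolved before trusting invoice counts as the measurement.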

The full study can be found here.

Data on 516 cardiology patients were available. Of these patients, 49% participated in the rehabilitation program. For each patient the following data were recorded:

  • distance from the patient’s home to the hospital, in miles

  • age

  • mobility; whether or not the patient has a car (yes, no, no response)

  • gender (male, female, other, no response)

  • place of residence

  • participation; whether or not the patient participates in the rehabilitation program (Binary: Attended=yes if the patient shows up at least once, else Did Not Attend).

The number of participating patients averaged 33 per month, with a standard deviation of 4.9. Based on the process capability and process knowledge, the objective of the project was to increase the average number of participants to 36 per month. This number had been attained several times in the past, and both cardiologists and physicians claimed that such an increase was feasible. The number of sessions a patient attended was on average 29 out of a maximum of 45. The objective here was to increase the average number of sessions to 32 per patient. Increasing both the number of participants and the number of sessions per participant would raise total revenue by $56,603 per year: 195 additional sessions per month (36 × 32 − 33 × 29) for 12 months at an average of $22.82 per session.
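The arithmetic behind the session target can be sketched as follows. The function names are illustrative assumptions; the inputs are the averages and the per-session price quoted in the text.

```python
def additional_sessions(base_patients, base_sessions, target_patients, target_sessions):
    """Extra sessions per month when both averages rise to their targets."""
    return target_patients * target_sessions - base_patients * base_sessions


def yearly_revenue(extra_sessions_per_month, price_per_session, months=12):
    """Extra revenue over the given number of months."""
    return extra_sessions_per_month * months * price_per_session


extra = additional_sessions(33, 29, 36, 32)
print(extra)  # 195
print(round(yearly_revenue(extra, 22.82), 2))
```

With the rounded per-session price quoted in the text, the product comes to roughly $53,400, so the $56,603 total presumably rests on an unrounded or differently converted price.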


156 patients were asked why they left the program early. Summarizing:

  • 26% of the patients were readmitted for a hospital stay,

  • 16% of the patients started working again and could not combine this with the rehabilitation activity (even though the center was open late),

  • 16% of the patients could not join the program due to other obligations (vacations, social obligations),

  • 12% of the patients dropped out for a medical reason provided by the doctor,

  • 8% of the patients had their own rehabilitation facilities.

These factors were the cause of 78% of the dropout. However, none of these causes can be influenced easily. Based on surveying cardiologists, physical therapists, patients, and other interested parties, the following influence factors were explored:

  • Patients should be informed of the rehabilitation program at a much earlier stage

  • Information on the rehabilitation program should be much more precise and attractive

  • Cardiologists should stimulate patients to participate in and finish the rehabilitation program

  • Patients should train with a heart rate monitor to improve their feelings of safety

  • Patients desire a smaller exercise room and are more comfortable when not with other patients

  • Patients are not likely to show up around major holidays


Factors that seemed to be most important can be summarized as patient attention factors. These factors were very important in increasing the number of participants. As a consequence, the following improvement actions were proposed:

  • Writing a better brochure on the rehabilitation program

  • Writing a letter to the cardiologists to improve their attitude toward patients: to be more polite and to communicate the possibilities of the rehabilitation program at an earlier stage.



This is a typical example of jumping to conclusions, which is often experienced in practice. Below, we explain how statistical techniques, in particular logistic regression, lead to a different view on designing improvement actions. This is a good illustration of the strength of the improve phase in Six Sigma and the usefulness of logistic regression.

We now give a detailed analysis of the statistics used in the improve phase. The project supervisor convinced the project leader to complete the improve phase before proceeding with the above-mentioned actions.



The logistic regression model reveals the important influence factors. The probability of joining the program depends on whether a patient has a car at his or her disposal and the distance from a patient’s home to the hospital. As a solution, various measures to stimulate carpooling were implemented. Prior to the implementation, a cost–benefit analysis was conducted using the fitted regression model.
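To illustrate how a fitted logistic regression model could feed such a cost-benefit analysis, here is a minimal sketch. It assumes a model with an intercept, a car indicator, and a linear distance term; the coefficient values are placeholders chosen for illustration and are not the fitted values from the study.

```python
import math


def participation_probability(has_car, distance, b0, b_car, b_dist):
    """Logistic model: P(join) = 1 / (1 + exp(-(b0 + b_car*car + b_dist*distance)))."""
    linear = b0 + b_car * (1 if has_car else 0) + b_dist * distance
    return 1.0 / (1.0 + math.exp(-linear))


# Placeholder coefficients for illustration only; NOT the study's fitted values.
COEF = {"b0": 0.5, "b_car": 1.0, "b_dist": -0.05}


def expected_gain(distances_without_car, coef):
    """Expected extra participants if these car-less patients gained car access."""
    return sum(
        participation_probability(True, d, **coef)
        - participation_probability(False, d, **coef)
        for d in distances_without_car
    )


# Hypothetical distances (miles) for three patients without a car.
print(expected_gain([2.0, 10.0, 25.0], COEF))
```

Multiplying the expected gain in participants by the average sessions and price per session would give a rough revenue estimate to weigh against the cost of a carpooling measure.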

Analyzing Each Factor Separately

Our first step consists of studying the relation between participation (Y) and each influence factor (denoted by x_i) separately. It is useful to screen the data in this way before using more advanced techniques.
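For a categorical factor, this kind of screening can be as simple as tabulating the participation rate per level. The sketch below assumes patient records stored as dictionaries; the field names and the records themselves are hypothetical and are not the study data.

```python
from collections import defaultdict


def participation_rate_by_level(records, factor):
    """Fraction of participating patients for each level of one categorical factor."""
    counts = defaultdict(lambda: [0, 0])  # level -> [participants, total]
    for row in records:
        level = row[factor]
        counts[level][0] += 1 if row["participates"] else 0
        counts[level][1] += 1
    return {level: joined / total for level, (joined, total) in counts.items()}


# Hypothetical records for illustration; not the study data.
records = [
    {"mobility": "yes", "gender": "male", "participates": True},
    {"mobility": "yes", "gender": "female", "participates": True},
    {"mobility": "yes", "gender": "male", "participates": False},
    {"mobility": "no", "gender": "female", "participates": False},
    {"mobility": "no", "gender": "male", "participates": True},
    {"mobility": "no", "gender": "female", "participates": False},
]

# Screen each factor on its own, one at a time.
for factor in ("mobility", "gender"):
    print(factor, participation_rate_by_level(records, factor))
```

Large differences in rate between levels flag a factor as worth carrying into the later, more formal analysis.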

  1. The first factor studied is distance. Whether distance affects a patient’s decision to join the program is normally analyzed by means of logistic regression. A first simple approach consists of making boxplots of distance vs. participation. Looking at these plots, we immediately noticed two patients who lived much farther from the hospital (200 km or about 124 miles) than the other patients. These patients were closely related to one of the physicians and had chosen this hospital for that reason. They were therefore excluded from all further statistical analysis. The left-hand panel of Figure 1 contains boxplots of the data from which these two outliers were removed. This panel suggests that patients who live a short distance from the hospital tend to participate more often in the program. The right-hand panel of Figure 1 shows a more informative plot. We divided the range of distance into eight approximately