Dec 28 2024

Using Regression to Improve Quality | Part III — Validating Models

Whether your goal is to identify substitute characteristics or solve a process problem, regression algorithms can produce coefficients for almost any data. That does not mean the resulting models are any good.

In machine learning, you divide your data into a training set on which you calculate coefficients and a testing set to check the model’s predictive ability. Testing concerns externally visible results and is not specific to regression.
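The split described above can be sketched in a few lines. This is a minimal illustration in Python rather than R, assuming a simple linear model, synthetic data, and an 80/20 split; the helper names `fit_line` and `rmse` are hypothetical and none of it comes from the post itself.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def rmse(a, b, xs, ys):
    """Root mean squared prediction error of the line on (xs, ys)."""
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Synthetic data (an assumption for illustration): y = 2 + 0.5 x + noise
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

# Shuffle the row indices, then hold out 20% as the testing set
indices = list(range(len(xs)))
rng.shuffle(indices)
cut = len(indices) * 4 // 5
train, test = indices[:cut], indices[cut:]

# Coefficients come from the training set only
a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])

# Predictive ability is checked on data the fit never saw
train_rmse = rmse(a, b, [xs[i] for i in train], [ys[i] for i in train])
test_rmse = rmse(a, b, [xs[i] for i in test], [ys[i] for i in test])
print(f"a={a:.2f} b={b:.2f} train RMSE={train_rmse:.2f} test RMSE={test_rmse:.2f}")
```

A test error much larger than the training error would suggest that the model does not generalize, whatever its coefficients look like.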

Validation, on the other hand, is focused on the training set and involves using various regression-specific tools to detect inconsistencies with assumptions. For these purposes, we review methods provided by regression software.
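As a rough sketch of what such training-set checks can look like, the Python example below fits a simple linear regression to synthetic data and computes a few standard diagnostics: residuals, R², leverages, and standardized residuals. The data and the choice of checks are assumptions made for illustration; they stand in for, and do not reproduce, the R tools the post reviews.

```python
import math
import random

# Synthetic training set (an assumption for illustration)
rng = random.Random(1)
n = 200
xs = [rng.uniform(0, 10) for _ in range(n)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

# Least-squares fit of y = a + b*x
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
b = sxy / sxx
a = mean_y - b * mean_x

# Residuals and the share of variance explained
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
sse = sum(e * e for e in residuals)
sst = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - sse / sst

# Residual standard error, with n - 2 degrees of freedom
s = math.sqrt(sse / (n - 2))

# Leverage of each point in simple linear regression
leverages = [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

# Standardized residuals; under the usual normality assumption,
# roughly 5% of them should exceed 2 in absolute value
standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverages)]
share_large = sum(abs(r) > 2 for r in standardized) / n

print(f"R^2={r_squared:.3f} s={s:.3f} share |r|>2 = {share_large:.3f}")
```

A share of large standardized residuals far above 5%, or a few points with leverage far above the average of 2/n, would be a reason to look more closely at the data before trusting the coefficients.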

In this post, we explore the meaning and logic of the tools provided for this purpose in simple and multiple linear regression in R, with the understanding that other software offers similar functions and that similar tools exist for other forms of regression.

This post is an attempt to clarify the meaning of these numbers and plots and to help readers use them. Readers will be the judges of how successful it is.

The body of the post is about the application of these tools to an example dataset available from Kaggle, with about 30,000 data points. For the curious, some mathematical background is given in the appendix.

Many of the tools are developments from the last 40 years and, therefore, are not covered in the statistics literature from earlier decades.

By Michel Baudin • Data science 0 • Tags: Linear Model, Quality, regression, Validation

Nov 6 2024

Rebuilding Manufacturing in France | Radu Demetrescoux

Radu Demetrescoux has been a manufacturing consultant for 25 years and recently authored a Lean Toolbox (in French) with actionable details on 64 tools. He has seen the French manufacturing sector losing half its factories and is working to rebuild it. This is how he explains what happened and the way forward. It includes an endorsement of our Introduction to Manufacturing as a contribution to this effort!

The Numbers

Between 1995 and 2015, France lost almost half of its factories and a third of its industrial jobs. In French economic statistics, the industry sector encompasses extraction and refining in addition to manufacturing. The share of industry in GDP has fallen from 35% in 1970 to less than 20% currently. The share of manufacturing in GDP fell from 17% in 1995 to 11% in 2017. The government's stated objective is to quickly raise the share of manufacturing in GDP to 15%.

By Michel Baudin • Personal communications, Uncategorized 0

Sep 8 2024

Using Regression to Improve Quality | Part II – Fitting Models

This is a personal guided tour of regression techniques intended for manufacturing professionals involved with quality. Starting from “historical monuments” like simple linear regression and multiple regression, it goes through “mid-century modern” developments like logistic regression. It ends with newer constructions like bootstrapping, bagging, and MARS. It is limited in scope and depth because full coverage would require a book and knowledge of many techniques I have not tried. See the references for more comprehensive coverage.

To fit a regression model to a dataset today, you don’t need to understand the logic, know any formula, or code any algorithm. Any statistical software, starting with electronic spreadsheets, will give you regression coefficients, confidence intervals for them, and, often, tools to assess the model’s fit.

However, treating regression as a black box that magically fits curves to data is risky. You won’t understand what you are looking at and will draw mistaken conclusions. You need some idea of the logic behind regression in general, or behind specific variants, to know when to use them, how to prepare data, and how to interpret the outputs.

By Michel Baudin • Data science 0 • Tags: Bagging, Bootstrapping, Kriging, Linear regression, Logistic regression, MARS, Multiple regression, Multivariate regression, Substitute characteristic, True characteristic

Sep 3 2024

Using Regression to Improve Quality | Part I – What for?

In quality, regression serves to identify substitutes for true characteristics that are hard to observe and to find the root causes of technically challenging process problems. It is a major topic in data science, but oddly, the most extensive coverage I could find in the literature on quality is in Shewhart’s first book, from 1931! Later books, including Shewhart’s second, discuss it briefly or not at all. The ASQC, forerunner of the ASQ, published an 80-page guide on how to use regression analysis in quality control in 1985, but has not updated it since.

Regression analysis has been around for almost 140 years and has grown massively in scope, capabilities, and dataset size. Perhaps it is time for professionals involved with quality to take another look at it.

By Michel Baudin • Data science, Tools 1 • Tags: Quality, regression, Statistical Process Control

Jul 21 2024

Rankings and Bump Charts

Hectar’s Audrey Bourolleau and Francis Nappez presented their findings about greenhouse gas emissions in the industrial production of bread baguettes at the 2024 Lean Summit in France. They see a major impact in (1) farming and (2) the production of fertilizer and plant protection products. Together, these categories account for 58% of total emissions but barely 6% of the costs. This suggests that improvements in these two areas could cut emissions in half with a minimal impact on bread prices.

This post is about visualizing this kind of information with bump charts, also called slopegraphs. Edward Tufte prefers the term “slopegraph,” but “bump chart” is more common.

By Michel Baudin • Data science 3 • Tags: Bump chart, Slopegraph, Visualization

Jul 18 2024

What is Quality?

Professionals working on quality don’t usually discuss what it is. Instead, they assume a shared understanding that often isn’t there. Individuals with training in different approaches generalize from different experiences and talk past each other. In meetings, these divergent views are often not aired; in the uninhibited environment of social media, on the other hand, they often degenerate into insults and personal attacks. Let’s try to address this foundational issue.

By Michel Baudin • Management 1 • Tags: Quality, Quality Control
