The official webpage of the book

Model-Based Clustering and Classification for Data Science

by Charles BOUVEYRON, Gilles CELEUX, T. Brendan MURPHY and Adrian E. RAFTERY

The book

This century will certainly be the century of the data revolution. Our digital world creates masses of data every day, and the volume of generated data doubles every two years according to the most recent estimates. This wealth of available data offers hope for uses that may lead to great advances in areas such as health, science, transportation and defense. However, manipulating, analyzing and extracting information from these data is made difficult by their volume and nature (high-dimensional data, networks, time series, ...).

Within the broad field of statistical and machine learning, model-based techniques for clustering and classification hold a central position for anyone interested in exploiting these data. This textbook focuses on recent developments in model-based clustering and classification while providing a comprehensive introduction to the field. It is aimed at advanced undergraduates, graduate students and first-year PhD students in data science, as well as researchers and practitioners.

The book covers the following topics:
1. Introduction
2. Model-based Clustering: Basic Ideas
3. Dealing with Difficulties
4. Model-based Classification
5. Semi-supervised Clustering and Classification
6. Discrete Data Clustering
7. Variable Selection
8. High-dimensional Data
9. Non-Gaussian Model-based Clustering
10. Network Data
11. Model-based Clustering with Covariates
12. Other Topics

The book is supported by extensive examples on data, with 72 code listings that draw on more than 30 software packages and can be run by the reader. All code is written in R, one of the most popular languages for data science.

The book is part of the Statistical and Probabilistic Mathematics Series of Cambridge University Press and can be bought from most specialist bookshops and e-commerce websites.

Cambridge University Press

The book can be bought on the Cambridge University Press website. Additional information, such as citation metrics, is also available on the CUP website.

See the book on the CUP website

E-commerce websites

The book can be bought on many e-commerce websites, such as Amazon.

Buy the book on Amazon

The authors

Charles Bouveyron

Charles Bouveyron is Full Professor of Statistics at Université Côte d'Azur and holds a Chair in Artificial Intelligence. He is the director of the Institut 3IA Côte d'Azur. He has published extensively on model-based clustering, particularly for networks and high-dimensional data.

Read more about this author...

Gilles Celeux

Gilles Celeux is Director of Research Emeritus at INRIA. He is one of the founding researchers in model-based clustering, having published extensively in the area for thirty-five years.

Read more about this author...


T. Brendan Murphy

T. Brendan Murphy is Full Professor in the School of Mathematics and Statistics at University College Dublin. His research interests include model-based clustering, classification, network modeling and latent variable modeling.

Read more about this author...

Adrian E. Raftery

Adrian E. Raftery is the Boeing International Professor of Statistics and Sociology at the University of Washington. He is one of the founding researchers in model-based clustering, having published in the area since 1984.

Read more about this author...

Data & Code

The MBCbook package for R

The book is supported by extensive examples on data, with 72 listings of R code that draw on more than 30 software packages. The book is accompanied by a dedicated R package (the MBCbook package) that can be installed directly from CRAN from within R. In particular, the MBCbook package provides all the datasets used in the book and some original functions.
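
As a minimal sketch of installing the package from within R, assuming it is still listed on CRAN under the name MBCbook and that the cloud.r-project.org mirror is reachable, the snippet below installs the package if it is missing and then lists the datasets it ships. The variable name `pkg` and the choice of mirror are illustrative and not taken from the book.

```r
# Name of the companion package, as given on this page
pkg <- "MBCbook"

# Install from CRAN only if the package is not already available.
# Assumption: the package is still on CRAN and this mirror is reachable.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, repos = "https://cloud.r-project.org")
}

# Load the package and list its datasets, if the installation succeeded
if (requireNamespace(pkg, quietly = TRUE)) {
  library(pkg, character.only = TRUE)
  datasets <- data(package = pkg)$results
  print(datasets[, c("Item", "Title"), drop = FALSE])
} else {
  message("Package ", pkg, " could not be installed; check CRAN availability.")
}
```

Once the package is loaded, an individual dataset can be loaded with `data()` using one of the names printed in the `Item` column.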

Download the MBCbook package

Scripts of the book

To make the code in the book easier to reproduce, the scripts for all 12 chapters can also be downloaded below:



Quotes

Bouveyron, Celeux, Murphy, and Raftery pioneered the theory, computation, and application of modern model-based clustering and discriminant analysis. Here they have produced an exhaustive yet accessible text, covering both the field's state of the art as well as its intellectual development. The authors develop a unified vision of cluster analysis, rooted in the theory and computation of mixture models. Embedded R code points the way for applied readers, while graphical displays develop intuition about both model construction and the critical but often-neglected estimation process. Building on a series of running examples, the authors gradually and methodically extend their core insights into a variety of exciting data structures, including networks and functional data. This text will serve as a backbone for graduate study as well as an important reference for applied data scientists interested in working with cutting-edge tools in semi- and unsupervised machine learning.

John S. Ahlquist, University of California, San Diego

This book, written by authoritative experts in the field, gives a comprehensive and thorough introduction to model-based clustering and classification. The authors not only explain the statistical theory and methods, but also provide hands-on applications illustrating their use with the open-source statistical software R. The book also covers recent advances made for specific data structures (e.g. network data) or modeling strategies (e.g. variable selection techniques), making it a fantastic resource as an overview of the state of the field today.

Bettina Grün, Johannes Kepler Universität Linz, Austria

Four authors with diverse strengths nicely integrate their specialties to illustrate how clustering and classification methods are implemented in a wide selection of real-world applications. Their inclusion of how to use available software is an added benefit for students. The book covers foundations, challenging aspects, and some essential details of applications of clustering and classification. It is a fun and informative read!

Naisyin Wang, University of Michigan, USA

This is a beautifully written book on a topic of fundamental importance in modern statistical science, by some of the leading researchers in the field. It is particularly effective in being an applied presentation - the reader will learn how to work with real data and at the same time clearly presenting the underlying statistical thinking. Fundamental statistical issues like model and variable selection are clearly covered as well as crucial issues in applied work such as outliers and ordinal data. The R code and graphics are particularly effective. The R code is there so you know how to do things, but it is presented in a way that does not disrupt the underlying narrative. This is not easy to do. The graphics are 'sophisticatedly simple' in that they convey complex messages without being too complex. For me, this is a 'must have' book.

Rob McCulloch, Arizona State University, USA
