The official webpage of the book
by Charles BOUVEYRON, Gilles CELEUX, T. Brendan MURPHY and Adrian E. RAFTERY
Ours will certainly be the century of the data revolution. Our digital world creates masses of data every day, and the volume of generated data doubles every two years according to the most recent estimates. This wealth of available data offers hope for great advances in areas such as health, science, transportation and defense. However, manipulating, analyzing and extracting information from these data is made difficult by their volume and nature (high-dimensional data, networks, time series, ...).
Within the broad field of statistical and machine learning, model-based techniques for clustering and classification hold a central position for anyone interested in exploiting such data. This textbook focuses on recent developments in model-based clustering and classification while providing a comprehensive introduction to the field. It is aimed at advanced undergraduates, graduate students and first-year PhD students in data science, as well as researchers and practitioners.
The book covers the following topics:
1. Introduction
2. Model-based Clustering: Basic Ideas
3. Dealing with Difficulties
4. Model-based Classification
5. Semi-supervised Clustering and Classification
6. Discrete Data Clustering
7. Variable Selection
8. High-dimensional Data
9. Non-Gaussian Model-based Clustering
10. Network Data
11. Model-based Clustering with Covariates
12. Other Topics
The book is supported by extensive examples on data, with 72 code listings that draw on more than 30 software packages and can be run by the reader. The code is written in R, one of the most popular languages for data science.
The book is part of the Statistical and Probabilistic Mathematics Series of Cambridge University Press and can be bought from most specialized bookshops and e-commerce websites.
The Cambridge University Press (CUP) website offers the book either as a whole or chapter by chapter. Additional information, such as citation metrics, is also available there. See the book on the CUP website.
The book is accompanied by a dedicated R package, MBCbook, which can be installed directly from CRAN within R. The MBCbook package provides, in particular, all datasets used in the book and some original functions.
Download the MBCbook package: https://cran.r-project.org/web/packages/MBCbook/.
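As a minimal sketch, the package can be installed and loaded from within R as follows. The CRAN mirror URL is an assumption chosen for illustration; any mirror works, and the package must still be available on CRAN at the time of installation.

```r
# Install MBCbook from CRAN only if it is not already available.
# The mirror below is an assumption; replace it with your preferred one.
if (!requireNamespace("MBCbook", quietly = TRUE)) {
  install.packages("MBCbook", repos = "https://cloud.r-project.org")
}

# Load the package and list the datasets it ships with.
library(MBCbook)
data(package = "MBCbook")
```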
To make the code in the book easier to reproduce, the scripts for Chapters 2 to 12 can be downloaded below:
- Scripts for Chapter 2: [script-chapter2.R]
- Scripts for Chapter 3: [script-chapter3.R]
- Scripts for Chapter 4: [script-chapter4.R]
- Scripts for Chapter 5: [script-chapter5.R]
- Scripts for Chapter 6: [script-chapter6.R]
- Scripts for Chapter 7: [script-chapter7.R]
- Scripts for Chapter 8: [script-chapter8.R]
- Scripts for Chapter 9: [script-chapter9.R]
- Scripts for Chapter 10: [script-chapter10.R]
- Scripts for Chapter 11: [script-chapter11.R]
- Scripts for Chapter 12: [script-chapter12.R], [CubaBeach.jpg]
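Once downloaded, a script can be run from R with `source()`. The sketch below assumes the file was saved in the current working directory; `chapter_script_path` is a hypothetical helper introduced here for illustration, not a function from the book or the MBCbook package. The scripts may also require additional packages to be installed beforehand.

```r
# Hypothetical helper: build the path to a downloaded chapter script.
chapter_script_path <- function(chapter, dir = ".") {
  stopifnot(chapter %in% 2:12)  # scripts are listed for Chapters 2 to 12
  file.path(dir, sprintf("script-chapter%d.R", chapter))
}

path <- chapter_script_path(2)
if (file.exists(path)) {
  # Run the script, echoing each command as it is evaluated.
  source(path, echo = TRUE)
} else {
  message("Download ", basename(path), " first and place it in ", getwd())
}
```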
Bouveyron, Celeux, Murphy, and Raftery pioneered the theory, computation, and application of modern model-based clustering and discriminant analysis. Here they have produced an exhaustive yet accessible text, covering both the field's state of the art as well as its intellectual development. The authors develop a unified vision of cluster analysis, rooted in the theory and computation of mixture models. Embedded R code points the way for applied readers, while graphical displays develop intuition about both model construction and the critical but often-neglected estimation process. Building on a series of running examples, the authors gradually and methodically extend their core insights into a variety of exciting data structures, including networks and functional data. This text will serve as a backbone for graduate study as well as an important reference for applied data scientists interested in working with cutting-edge tools in semi- and unsupervised machine learning.
, University of California, San Diego
This book, written by authoritative experts in the field, gives a comprehensive and thorough introduction to model-based clustering and classification. The authors not only explain the statistical theory and methods, but also provide hands-on applications illustrating their use with the open-source statistical software R. The book also covers recent advances made for specific data structures (e.g. network data) or modeling strategies (e.g. variable selection techniques), making it a fantastic resource as an overview of the state of the field today.
, Johannes Kepler Universität Linz, Austria
Four authors with diverse strengths nicely integrate their specialties to illustrate how clustering and classification methods are implemented in a wide selection of real-world applications. Their inclusion of how to use available software is an added benefit for students. The book covers foundations, challenging aspects, and some essential details of applications of clustering and classification. It is a fun and informative read!
, University of Michigan, USA
This is a beautifully written book on a topic of fundamental importance in modern statistical science, by some of the leading researchers in the field. It is particularly effective in being an applied presentation - the reader will learn how to work with real data and at the same time clearly presenting the underlying statistical thinking. Fundamental statistical issues like model and variable selection are clearly covered as well as crucial issues in applied work such as outliers and ordinal data. The R code and graphics are particularly effective. The R code is there so you know how to do things, but it is presented in a way that does not disrupt the underlying narrative. This is not easy to do. The graphics are 'sophisticatedly simple' in that they convey complex messages without being too complex. For me, this is a 'must have' book.
, Arizona State University, USA
- Nothing for now!