Theses and Dissertations

An Arithmetic-Based Deterministic Centroid Initialization Method for the k-Means Clustering Algorithm

Author

Matthew Michael Mayo

Date of Award

5-2016

Type

Thesis

Major

Master of Science

Department

TSYS School of Computer Science

Abstract

One of the greatest challenges in k-means clustering is positioning the initial cluster centers, or centroids, as close to optimal as possible, and doing so in an amount of time deemed reasonable. Traditional k-means utilizes a randomization process for initializing these centroids, and poor initialization can lead to increased numbers of required clustering iterations to reach convergence, and a greater overall runtime. This research proposes a simple, arithmetic-based deterministic centroid initialization method which is much faster than randomized initialization. Preliminary experiments suggest that this collection of methods, referred to herein as the sharding centroid initialization algorithm family, often outperforms random initialization in terms of the required number of iterations for convergence and overall time-related metrics and is competitive or better in terms of the reported mean sum of squared errors (SSE) metric. Surprisingly, the sharding algorithms often manage to report more advantageous mean SSE values in the instances where their performance is slower than random initialization.

Recommended Citation

Mayo, Matthew Michael, "An Arithmetic-Based Deterministic Centroid Initialization Method for the k-Means Clustering Algorithm" (2016). Theses and Dissertations. 241.
https://csuepress.columbusstate.edu/theses_dissertations/241

Included in

Computer Engineering Commons
