Machine Learning Algorithms: Use Cases and Compute Requirements

Machine Learning Algorithm Comparison

| Algorithm | Train or test data (channels) | Use cases | Supervised | Pipe mode? | File type | CPU or GPU |
|---|---|---|---|---|---|---|
| AutoGluon-Tabular | Training and (optionally) validation | High-accuracy tabular prediction through automated ensembling and multi-layer stacking. | Y | N | CSV | CPU or GPU (single instance only, M5) |
| BlazingText | Train | Text classification for use cases such as sentiment analysis, spam detection, and hashtag prediction. | Y | Y | Text file (one sentence per line with space-separated tokens) | CPU or GPU (single instance only, M5) |
| CatBoost | Training and (optionally) validation | Gradient boosting on decision trees. Best used when the number of data dimensions is low, or when simple linear models perform poorly. | Y | N | CSV | CPU (single instance only) |
| DeepAR Forecasting | Train and (optionally) test | Time series forecasting. Effective for cold-start problems where historical data is limited. | Y | N | JSON Lines or Parquet | CPU or GPU |
| Factorization Machines | Train and (optionally) test | Supervised algorithm for sparse datasets. Metrics include RMSE for regression and log loss for binary classification. | Y | Y | recordIO-protobuf (float32) | CPU (GPU for dense data, M5) |
| Image Classification – MXNet | Train and validation, (optionally) train_lst, validation_lst, and model | Image classification. Leverages MXNet for faster computation and better resource utilization on GPU. | Y | Y | recordIO or image files (.jpg or .png) | GPU |
| Image Classification – TensorFlow | Training and validation | Image classification. Performs better on CPU compared to some alternatives. | Y | File | Image files (.jpg, .jpeg, or .png) | CPU or GPU |
| IP Insights | Train and (optionally) validation | Flagging suspicious IP address usage. | N | File | CSV | CPU or GPU |
| K-Means | Train and (optionally) test | Clustering (unsupervised learning). | N | Y | recordIO-protobuf or CSV | CPU or GPU (single GPU device on one or more instances) |
| K-Nearest-Neighbors (k-NN) | Train and (optionally) test | Text mining and facial recognition. Suitable for small datasets; requires feature scaling. | Y | Y | recordIO-protobuf or CSV | CPU or GPU (single GPU device on one or more instances) |
| LDA | Train and (optionally) test | Topic modeling of text using statistical methods (unsupervised learning). | N | Y | recordIO-protobuf or CSV | CPU (single instance only) |
| LightGBM | Training and (optionally) validation | Gradient boosting framework. | Y | File | CSV | CPU (single instance only) |
| Linear Learner | Train and (optionally) validation, test, or both | Regression or classification tasks. | Y | Y | recordIO-protobuf or CSV | CPU or GPU |
| Neural Topic Model | Train and (optionally) validation, test, or both | Topic modeling of text using neural networks (unsupervised learning). | N | Y | recordIO-protobuf or CSV | CPU or GPU |
| Object2Vec | Train and (optionally) validation, test, or both | Analyzes images or paragraphs to provide relationships between objects. | Y | File | JSON Lines | CPU or GPU (single instance only) |
| Object Detection | Train and validation, (optionally) train_annotation, validation_annotation, and model | Locating and classifying objects within images (e.g., bounding box prediction). | Y | Y | recordIO or image files (.jpg or .png) | GPU |
| PCA | Train and (optionally) test | Dimensionality reduction (unsupervised learning). | N | Y | recordIO-protobuf or CSV | CPU or GPU |
| Random Cut Forest | Train and (optionally) test | Outlier detection and forecasting. | N | Y | recordIO-protobuf or CSV | CPU |
| Semantic Segmentation | Train and validation, train_annotation, validation_annotation, and (optionally) label_map and model | Pixel-level image classification. Common in autonomous vehicle applications. | Y | Y | Image files | GPU (single instance only) |
| Seq2Seq Modeling | Train, validation, and vocab | Solving complex language problems such as machine translation, question answering, chatbot creation, and text summarization. | Y | File | recordIO-protobuf integer tokens (not float) | GPU (single instance only) |
| XGBoost (0.90-1, 0.90-2, 1.0-1, 1.2-1, 1.2-21) | Train and (optionally) validation | Provides parallel tree boosting. A leading machine learning library for regression, classification, and ranking problems. | Y | Y | CSV, LibSVM, or Parquet | CPU (or GPU for 1.2-1) |
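
When shortlisting candidates, it can help to encode a few rows of the comparison as data and filter on the columns that matter for a given project. The sketch below is illustrative only: the `AlgorithmInfo` class, the `shortlist` function, and its parameter names are hypothetical helpers, not part of any AWS SDK, and only a handful of rows are copied over (with the parenthetical instance qualifiers dropped).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AlgorithmInfo:
    """One simplified row of the comparison table (hypothetical structure)."""
    name: str
    supervised: bool
    file_types: Tuple[str, ...]
    hardware: str  # shortened value from the "CPU or GPU" column


# A subset of rows copied from the table; qualifiers in parentheses are omitted.
ALGORITHMS = [
    AlgorithmInfo("K-Means", False, ("recordIO-protobuf", "CSV"), "CPU or GPU"),
    AlgorithmInfo("PCA", False, ("recordIO-protobuf", "CSV"), "CPU or GPU"),
    AlgorithmInfo("Random Cut Forest", False, ("recordIO-protobuf", "CSV"), "CPU"),
    AlgorithmInfo("Linear Learner", True, ("recordIO-protobuf", "CSV"), "CPU or GPU"),
    AlgorithmInfo("LightGBM", True, ("CSV",), "CPU"),
    AlgorithmInfo("Semantic Segmentation", True, ("Image files",), "GPU"),
]


def shortlist(
    supervised: Optional[bool] = None,
    file_type: Optional[str] = None,
    cpu_capable: bool = False,
) -> List[str]:
    """Return the names of algorithms that satisfy every constraint that is set."""
    matches = []
    for algo in ALGORITHMS:
        if supervised is not None and algo.supervised != supervised:
            continue
        if file_type is not None and file_type not in algo.file_types:
            continue
        # "CPU" and "CPU or GPU" both start with "CPU"; a bare "GPU" does not.
        if cpu_capable and not algo.hardware.startswith("CPU"):
            continue
        matches.append(algo.name)
    return matches


if __name__ == "__main__":
    # Unsupervised algorithms that accept CSV input.
    print(shortlist(supervised=False, file_type="CSV"))
    # Supervised algorithms that can train on CPU instances.
    print(shortlist(supervised=True, cpu_capable=True))
```

Keeping the rows as plain data makes it easy to add further columns, such as Pipe mode support, as extra fields and filters if a project needs them.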