New sampling algorithms for enhancing classifier performance on imbalanced data problems

MORAIS, Romero Fernando Almeida Barata de

Use this identifier to cite or link to this item: https://repositorio.ufpe.br/handle/123456789/31428

Title:	New sampling algorithms for enhancing classifier performance on imbalanced data problems
Author(s):	MORAIS, Romero Fernando Almeida Barata de
Keywords:	Computational intelligence; Machine learning
Issue date:	6-Feb-2018
Publisher:	Universidade Federal de Pernambuco
Abstract:	Classification problems in which the distribution of examples among the classes is imbalanced arise frequently in real-world domains. These domains commonly involve critical problems where accurate predictions for all classes are necessary, such as credit card fraud detection, churn prediction, disease diagnosis, and detection of intrusive network traffic. The problem with imbalanced data sets is that standard classifiers often have low accuracy on the underrepresented classes. Data sampling is the most popular approach for dealing with imbalanced data sets and works by either decreasing the size of the majority classes (under-sampling) or increasing the size of the minority classes (over-sampling). In this dissertation we propose two new data sampling algorithms: RRUS and k-INOS. RRUS is an under-sampling algorithm that aims to select the subset of majority-class examples that best represents the majority class by preserving its density distribution. k-INOS is a general strategy for making over-sampling algorithms more robust to noisy examples in the minority class. Both algorithms were extensively tested on 50 imbalanced data sets with 6 diverse classifiers, and performance was evaluated according to 7 metrics. In particular, RRUS was compared to 3 other under-sampling algorithms and was significantly better than KMUS and SBC most of the time, and significantly better than RUS many times, for most classifiers and performance metrics. k-INOS, as a wrapper for over-sampling algorithms, was tested on 7 over-sampling algorithms and significantly increased Accuracy, Precision, and Specificity most of the time, and F1 many times. In addition, the hyperparameters of k-INOS were studied and appropriate values for them were suggested.
Finally, rules extracted from these experiments with k-INOS revealed that the N3 complexity metric (the leave-one-out cross-validation error rate of the 1-NN classifier) is often an indicator of whether k-INOS is likely to improve performance.
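The N3 metric mentioned in the abstract can be illustrated with a short sketch. The code below is not taken from the dissertation: the function name `n3_loocv_1nn_error`, the use of Euclidean distance, and the tie-breaking rule (first nearest neighbour found) are assumptions made only for illustration. It classifies each example by the label of its nearest neighbour among all other examples and reports the fraction of mistakes.

```python
import math
from typing import Sequence


def n3_loocv_1nn_error(X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Leave-one-out error rate of a 1-NN classifier (hypothetical helper).

    Assumes Euclidean distance; the dissertation may use a different one.
    """
    n = len(X)
    if n < 2 or n != len(y):
        raise ValueError("need at least two examples and one label per example")
    errors = 0
    for i in range(n):
        # Nearest neighbour of example i among all other examples.
        # Ties are broken by taking the lowest index (an assumption).
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )
        if y[nearest] != y[i]:
            errors += 1
    return errors / n


# Two well-separated classes: every example's nearest neighbour shares its label.
separated = n3_loocv_1nn_error([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1])

# Fully interleaved classes: every nearest neighbour has the other label.
interleaved = n3_loocv_1nn_error([[0.0], [1.0], [2.0], [3.0]], [0, 1, 0, 1])
```

A value near 0 suggests that the classes are locally well separated, while a value near 1 suggests heavy overlap. This quadratic-time loop is meant for small data sets only.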
URI:	https://repositorio.ufpe.br/handle/123456789/31428
Appears in collections:	Dissertações de Mestrado - Ciência da Computação (Master's Dissertations - Computer Science)

Files associated with this item:

File	Description	Size	Format
DISSERTAÇÃO Romero Fernando Almeida Barata de Morais.pdf		13.51 MB	Adobe PDF

This file is protected by copyright

This item is licensed under a Creative Commons License