This page provides the supplementary information, source codes, and five DNA microarray gene expression datasets used in the paper "Classification of Two and Multi-Class Cancer Data Reliably Using Multi-Objective Evolutionary Algorithms".
Source Codes in C-Language
- Two-Class Classification (NSGA-II + Weighted Voting + LOOCV)
- Multi-Class Classification (NSGA-II + Weighted Voting + LOOCV + OVA-Binary)
DNA Microarray Gene Expression Datasets
- 50-Gene Leukemia Dataset
- Complete 3,859-Gene Leukemia Dataset
- Complete 4,026-Gene Lymphoma Dataset
- Complete 2,000-Gene Colon Dataset
- Complete 6,167-Gene NCI60 9-Class Tumor Dataset
Supplementary Information
- supplementary-class.ps: A comprehensive step-by-step gene expression preprocessing procedure, the weighted voting approach for class prediction, the concept of classification with confidence, and classification with multiple training datasets.
- supplementary-ea.ps: Principles of Genetic Algorithms, Multi-objective Genetic Algorithms, Multi-modal optimization, Non-dominated Sorting GA (NSGA-II), Multi-modal NSGA-II for multiple gene subsets, Evolutionary gene selection procedure. For a better description of multi-objective optimization and evolutionary algorithms including NSGA-II, please refer to the following text by the first author:
Deb, K. (2001). Multi-objective optimization using evolutionary algorithms. Chichester, UK: Wiley.
- supplementary-results.ps: Complete analysis of the 50-gene leukemia dataset, with and without prediction strength.
- leukemia-code.tgz: A gzipped tar archive containing C code for the proposed two-step optimization procedure, applied to the leukemia data samples. Codes for the other cancer datasets can be obtained from the authors (firstname.lastname@example.org). The parent directory runs the multi-modal NSGA-II, which uses a multi-objective optimization procedure to find an optimal set of classifiers with the highest accuracy. The subdirectory 'local-search' contains the codes for a focused search in which the size of the classifier is fixed to a user-defined value. For example, for the leukemia dataset, the multi-modal NSGA-II (Step 1) finds a three-gene classifier that produces 100% correct classification, so Step 2 can be run with all classifiers fixed to size three.
The source codes distributed on this page were developed at Kanpur Genetic Algorithms Laboratory (KanGAL) or customized from publicly available codes. Although they have been tested on many test problems, the developers accept no responsibility for any malfunction. The codes were tested on Mandrake Linux. Please report any bug or error to KALYANMOY DEB and/or A RAJI REDDY. Commercial use of these codes without the knowledge of the developers (KanGAL) is strictly prohibited. For academic use, they may be used or modified at will; however, an acknowledgement of the KanGAL web site in appropriate places and prior notification to KanGAL would be appreciated.
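The hand-off between the two steps described for leukemia-code.tgz can be illustrated with a minimal sketch, which is not the distributed code. It assumes two objectives to be minimized, classifier size and number of misclassified samples. The Candidate struct and the function names dominates, nondominated, and smallest_perfect_size are hypothetical. The sketch marks the non-dominated candidates in a population and then reports the smallest classifier size that achieves zero mismatches, which is the size a fixed-size second step would use.

```c
#include <stddef.h>

/* Hypothetical summary of one candidate classifier (gene subset). */
typedef struct {
    size_t size;       /* number of genes in the subset */
    size_t mismatches; /* misclassified samples, e.g. counted by LOOCV */
} Candidate;

/* Returns 1 if a dominates b: a is no worse in both objectives and
   strictly better in at least one; returns 0 otherwise. */
int dominates(const Candidate *a, const Candidate *b)
{
    int no_worse = a->size <= b->size && a->mismatches <= b->mismatches;
    int better   = a->size <  b->size || a->mismatches <  b->mismatches;
    return no_worse && better;
}

/* Sets front[i] to 1 if pop[i] is dominated by no other candidate and
   to 0 otherwise. Returns the number of non-dominated candidates. */
size_t nondominated(const Candidate *pop, size_t n, int *front)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        front[i] = 1;
        for (size_t j = 0; j < n; j++) {
            if (j != i && dominates(&pop[j], &pop[i])) {
                front[i] = 0;
                break;
            }
        }
        count += (size_t)front[i];
    }
    return count;
}

/* Smallest size among candidates with zero mismatches, or 0 if no
   candidate classifies every sample correctly. */
size_t smallest_perfect_size(const Candidate *pop, size_t n)
{
    size_t best = 0;

    for (size_t i = 0; i < n; i++)
        if (pop[i].mismatches == 0 && (best == 0 || pop[i].size < best))
            best = pop[i].size;
    return best;
}
```

With a population containing a three-gene candidate with zero mismatches and a five-gene candidate with zero mismatches, smallest_perfect_size returns 3, mirroring the leukemia example in which the second step fixes all classifiers to size three.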