Tutorial: Wine Dataset


This mini-tutorial demonstrates how to use basic features of the AutoSOME GUI.

1) If you have not already done so, launch AutoSOME.

2) Download and save the wine dataset.
(primary dataset obtained from the UCI Machine Learning Repository)

3) To load the wine file, press INPUT or select the browse button.



4) Dataset attributes are displayed in the Input Data table. This dataset has already been normalized so that every column has values within the range 0-100.
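
For readers preparing their own input files, the sketch below shows one common way to bring every column into the range 0-100, namely per-column min-max scaling. Whether the wine file was normalized exactly this way is an assumption; the function name scale_columns_0_100 and the sample table are illustrative only and are not part of AutoSOME.

```python
def scale_columns_0_100(rows):
    """Min-max scale each column of a row-major table to the range 0-100.

    Assumption: this is a generic normalization, not necessarily the one
    used to prepare the wine dataset for this tutorial.
    """
    columns = list(zip(*rows))  # transpose rows into columns
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        if span == 0:
            # Constant column: there is no spread to rescale, so use 0 throughout.
            scaled_columns.append([0.0] * len(col))
        else:
            scaled_columns.append([100.0 * (v - lo) / span for v in col])
    # Transpose back to row-major order.
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical 3 x 2 table, used only to illustrate the scaling.
table = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
scaled = scale_columns_0_100(table)
```

After scaling, the smallest value in each column is 0 and the largest is 100, so no attribute dominates simply because of its units.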


5) By default, AutoSOME is set to Normal mode, 50 ensemble iterations, a p-value threshold of 0.1, and all available CPUs. To change these settings, expand Basic Fields. Switching the AutoSOME mode from Normal to Precision makes a run take roughly 4 times longer, but may increase clustering accuracy.

6) Press RUN. Progress and elapsed time will be displayed.



7) When AutoSOME finishes, the GUI automatically switches to the output window.

8) Clusters and singletons are displayed as a tree. Click on a cluster to see its contents. Because the wine dataset is known to have 3 clusters, each data item is labeled with its benchmark cluster (i.e., 1, 2, or 3).



9) The GUI includes several display options. For example, go to View in the menu bar and select heat map>green red. Use the mouse scroll wheel to zoom the heat map. The leftmost vertical bar displays the cluster confidence for each data point (blue=highest confidence, red=lowest confidence). In the figure below, cluster 1 was selected. Notice that the Input Data table now shows the attributes of cluster 1 rather than the entire dataset.
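
To make the confidence bar easier to read, the sketch below maps a confidence value to a color running from red (lowest) to blue (highest). It assumes confidence lies between 0 and 1 and that the colors blend linearly; the color scale AutoSOME actually uses is not documented here, and confidence_to_rgb is a hypothetical helper, not an AutoSOME function.

```python
def confidence_to_rgb(confidence):
    """Map a confidence in [0, 1] to an (R, G, B) tuple.

    Assumption: a simple linear red-to-blue blend, shown for illustration only.
    """
    c = min(1.0, max(0.0, confidence))  # clamp out-of-range input
    red = round(255 * (1.0 - c))        # low confidence -> more red
    blue = round(255 * c)               # high confidence -> more blue
    return (red, 0, blue)


lowest = confidence_to_rgb(0.0)   # pure red
highest = confidence_to_rgb(1.0)  # pure blue
```

Under this reading, data points whose bar segment leans toward red are the ones whose cluster assignment deserves a closer look.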



10) You can select more than one cluster in the tree to show their heat maps simultaneously (hold shift or control while selecting with your mouse). In this case, you may want to resize the heat map to fit the display window. Either right-click the mouse when hovering over the heat map or select fit to screen from View in the menu bar. For several additional display parameters, select View>settings>image settings. In the Image Settings window, you can, for example, hide the row labels. See the manual for further details of display options.



11) Save the image by either pressing Save in the Image Settings window, or by selecting File>Export>save image from the main menu bar. You can also save tabular output for the selected cluster(s) by using File>Export>save tabular data.

12) You can also display individual clusters using the signal plot diagram. Select View>signal plot>rainbow to see a cluster represented as a line diagram of all row vectors.



13) The Output Files table located in the center-left portion of the GUI will display links to all output files after a clustering job is executed. Select an output file link with your mouse to display the actual file.



14) To revisit AutoSOME clustering results at a later time, select File>Open AutoSOME results, then find and select the Clusters text file (e.g. AutoSOME_MyFile_E50_Pval0.1_rows.txt). By default, this file is written to the same directory as the original input file. Any tabular output saved while browsing clusters (see Step 11) can also be reopened.
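
If you need to locate the Clusters file from a script, the sketch below builds the expected path from the input file name and the run settings. The naming pattern is inferred only from the example above (E50 read as 50 ensemble iterations, Pval0.1 as the p-value threshold, and MyFile as the input name without its extension); treat it as an assumption and check it against the Output Files table. The helper expected_clusters_file is hypothetical.

```python
from pathlib import Path


def expected_clusters_file(input_path, ensemble_runs=50, p_value=0.1):
    """Guess the Clusters file path for a given input file.

    Assumption: the pattern AutoSOME_<name>_E<runs>_Pval<p>_rows.txt is
    inferred from the tutorial's single example and may not cover all cases.
    """
    src = Path(input_path)
    name = f"AutoSOME_{src.stem}_E{ensemble_runs}_Pval{p_value}_rows.txt"
    # By default the file is written next to the original input file.
    return src.with_name(name)


# Hypothetical input location, used only for illustration.
clusters_file = expected_clusters_file("data/MyFile.txt")
```

With the default settings from Step 5, this yields a file named AutoSOME_MyFile_E50_Pval0.1_rows.txt in the same directory as the input.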

15) This concludes the wine mini-tutorial. The six additional benchmark datasets can be explored by following the same steps.