Classification Problems: Detecting Unexploded Mines Using Sonar
Physical Characteristics of the Rocks Versus Mines Data Set
Estimating Run Times
The second important observation regarding row and column counts is that if the data set has many more columns than rows, you may be more likely to get the best prediction with penalized linear regression.
Statistical Summaries of the Rocks Versus Mines Data Set
"""descriptive statistics for the numeric variables and a count of the unique categories in each categorical attribute
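Note to self: a minimal sketch of that kind of summary, using only the Python standard library. The rows, the `summarize` helper, and the "R"/"M" labels are made up for illustration and are not the book's data or code. Numeric columns get a mean and a sample standard deviation; any other column gets a count per category.

```python
import statistics
from collections import Counter

# Hypothetical toy rows: two numeric attributes plus a categorical label
# ("R" = rock, "M" = mine). The values are invented for illustration.
rows = [
    [0.02, 0.37, "R"],
    [0.05, 0.52, "R"],
    [0.03, 0.41, "M"],
    [0.01, 0.66, "M"],
    [0.04, 0.48, "M"],
]


def summarize(rows):
    """Return per-column summaries: mean/stdev for numeric columns,
    category counts for every other column."""
    summary = {}
    for col in range(len(rows[0])):
        values = [row[col] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            summary[col] = {
                "mean": statistics.mean(values),
                "stdev": statistics.stdev(values),  # sample standard deviation
            }
        else:
            summary[col] = {"counts": dict(Counter(values))}
    return summary


summary = summarize(rows)
for col, col_stats in summary.items():
    print(col, col_stats)
```

The type check on each column is a simplification; real files usually need the strings parsed into floats first.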
Visualization of Outliers Using Quantile‐Quantile Plot
"""... outliers ... the last quartile has a range of 4.6, which is 100 times larger than the range of the other quartiles.
import pylab
import scipy.stats as stats
stats.probplot(colData, dist="norm", plot=pylab)
pylab.show()
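Note to self: what a quantile-quantile plot compares, sketched without any plotting library. Each sorted observation is paired with the standard-normal quantile at plotting position (i + 0.5) / n. That plotting position is an assumption chosen for simplicity; `probplot` uses its own formula, so its numbers may differ slightly. The sample and the `normal_qq_points` helper are made up.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(data):
    """Pair each sorted observation with the standard-normal quantile
    at plotting position (i + 0.5) / n."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Invented sample with one large value in the upper tail.
sample = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.6, 3.0]
points = normal_qq_points(sample)

# Reference line through the sample mean and standard deviation:
# normally distributed data would fall close to it.
m, s = mean(sample), stdev(sample)
for q, x in points:
    print(f"theoretical={q:+.3f}  observed={x:.2f}  line={m + s * q:.2f}")
```

Points that sit far from the reference line at either end are the candidates for outliers.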
Statistical Characterization of Categorical Attributes
"""check how many categories they have and how many examples there are from each category.
"""The popular Random Forests package written by Breiman and Cutler (the inventors of the algorithm) has a cutoff of 32 categories. If an attribute has more than 32 categories, you’ll need to aggregate them.
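Note to self: one possible way to count categories and aggregate them when there are too many. The quote does not say how to aggregate; folding the least frequent categories into a catch-all label is an assumption, and `aggregate_rare_categories`, `max_categories`, and `"OTHER"` are hypothetical names.

```python
from collections import Counter


def aggregate_rare_categories(values, max_categories=32, other_label="OTHER"):
    """Keep the most frequent categories and map the rest to other_label
    so that at most max_categories distinct values remain."""
    counts = Counter(values)
    if len(counts) <= max_categories:
        return list(values)
    # Reserve one slot for the catch-all label.
    keep = {cat for cat, _ in counts.most_common(max_categories - 1)}
    return [v if v in keep else other_label for v in values]


# Invented attribute with 40 categories: "c0" appears 40 times, "c1" 39 times, ...
values = [f"c{i}" for i in range(40) for _ in range(40 - i)]
counts_before = Counter(values)   # examples per category before merging
merged = aggregate_rare_categories(values)
counts_after = Counter(merged)    # examples per category after merging

print(len(counts_before), len(counts_after))
```

Grouping by domain knowledge (for example, merging states into regions) may be a better choice than grouping by frequency when the categories have a natural hierarchy.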
stratified sampling
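Note to self: a minimal sketch of stratified sampling, assuming the goal is to draw the same fraction from every label group so that rare classes stay represented. The helper name, the `fraction` and `seed` parameters, and the 90/10 toy data are all made up.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, fraction, seed=0):
    """Draw roughly `fraction` of the rows from each label group,
    keeping at least one row per group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    sample = []
    for members in groups.values():
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))  # sampling without replacement
    return sample


# Invented imbalanced data: 90 examples labeled "M" and 10 labeled "R".
rows = [(i, "M") for i in range(90)] + [(i, "R") for i in range(10)]
sample = stratified_sample(rows, label_of=lambda r: r[1], fraction=0.2)

n_m = sum(1 for _, label in sample if label == "M")
n_r = sum(1 for _, label in sample if label == "R")
print(n_m, n_r)
```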
How to Use Python Pandas to Summarize the Rocks Versus Mines Data Set
"""You can think of a data frame as a table or matrix-like structure as in Table 2-1. The data frame is oriented with a row representing a single case (experiment, example, measurement) and columns representing particular attributes. The structure is matrix-like, but not a matrix because the elements in various columns may be of different types. Formally, a matrix is defined over a field (like the real numbers, binary numbers, complex numbers), and all the entries in a matrix are elements from that field.
The data frame structure enables access to individual elements through an index roughly similar to addressing an entry in a Python NumPy array or a list of lists.
Similarly, index slicing can be used to address an entire row or column from the array.
In addition, the Pandas data frame enables addressing rows and columns by means of their names.
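Note to self: the three access styles on a tiny data frame. The table and its column names (`V0`, `V1`, `label`) are made up for illustration; only the pandas calls themselves (`iloc`, `loc`, column selection by name, `describe`, `value_counts`) are real API.

```python
import pandas as pd

# Invented miniature table with mixed column types.
df = pd.DataFrame(
    {
        "V0": [0.02, 0.05, 0.03],
        "V1": [0.37, 0.52, 0.41],
        "label": ["R", "R", "M"],
    }
)

element_by_position = df.iloc[1, 0]   # row 1, column 0
row_by_position = df.iloc[1]          # whole row as a Series
column_by_position = df.iloc[:, 1]    # whole column as a Series
element_by_name = df.loc[1, "V0"]     # row label 1, column "V0"
column_by_name = df["label"]          # column selected by its name

print(df.describe())                  # summary of the numeric columns
print(df["label"].value_counts())     # number of examples per category
```

`iloc` is purely positional, while `loc` uses row and column labels; they coincide here only because the default row labels are 0, 1, 2.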