In statistics, a categorical variable is a variable that can
take on one of a limited, and usually fixed, number of possible values, thus
assigning each individual to a particular group or "category." In computer science and some branches of
mathematics, categorical variables are referred to as enumerations or
enumerated types. Commonly (though not in this article), each of the possible
values of a categorical variable is referred to as a level. The probability
distribution associated with a random categorical variable is called a
categorical distribution.
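The link between an enumerated type and a categorical distribution can be sketched in a few lines of Python. The `BloodType` enumeration, the `empirical_distribution` helper and the sample below are hypothetical, chosen only for illustration. The sketch estimates the distribution from relative frequencies and assumes a non-empty sample.

```python
from collections import Counter
from enum import Enum


class BloodType(Enum):
    """The fixed set of categories (levels) the variable can take."""
    A = "A"
    B = "B"
    AB = "AB"
    O = "O"


def empirical_distribution(observations):
    """Return the relative frequency of each category in a non-empty sample."""
    counts = Counter(observations)
    total = len(observations)
    # Include every level, even ones never observed (relative frequency 0).
    return {level: counts[level] / total for level in BloodType}


# Hypothetical sample of eight individuals (illustrative data only).
sample = [BloodType.O, BloodType.A, BloodType.O, BloodType.B,
          BloodType.A, BloodType.O, BloodType.AB, BloodType.O]
distribution = empirical_distribution(sample)
```

The resulting dictionary assigns one probability to each level, and the probabilities sum to 1, which is what characterises a categorical distribution.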
Categorical data is the statistical data type consisting of
categorical variables or of data that has been converted into that form, for
example as grouped data. More specifically, categorical data may derive from
observations of qualitative data, summarised as counts or cross tabulations,
from observations of quantitative data, or from both. In the quantitative case,
the observations might be directly observed counts of events or counts of
values that fall within given intervals. Often, purely categorical
data are summarised in the form of a contingency table. However, particularly
when considering data analysis, it is common to use the term "categorical
data" to apply to data sets that, while containing some categorical
variables, may also contain non-categorical variables.
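As a rough illustration of how two categorical variables are summarised as a contingency table, the sketch below cross-tabulates pairs of observations into counts. The `contingency_table` helper, the site names and the observations are assumptions made for this example and do not come from any particular library or data set.

```python
from collections import Counter


def contingency_table(pairs):
    """Cross-tabulate (row_category, column_category) pairs into counts."""
    counts = Counter(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    # Counter returns 0 for combinations that were never observed.
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}


# Hypothetical observations: (sampling site, rock type).
observations = [
    ("site_1", "igneous"), ("site_1", "igneous"), ("site_1", "sedimentary"),
    ("site_2", "metamorphic"), ("site_2", "sedimentary"), ("site_2", "sedimentary"),
]
table = contingency_table(observations)
```

Each cell of `table` holds the number of observations that share a given row category and column category. Combinations that never occur appear with a count of zero.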
A categorical variable that can take on exactly two values
is termed a binary variable or dichotomous variable; an important special case
is the Bernoulli variable. Categorical variables with more than two possible
values are called polytomous variables; categorical variables are often assumed to be
polytomous unless otherwise specified. Discretization is treating continuous
data as if it were categorical. Dichotomization is treating continuous data or
polytomous variables as if they were binary variables. Regression analysis
often treats category membership as a quantitative dummy variable.
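The three operations just described can be sketched as small Python functions. The function names, the age thresholds and the choice of the first level as the reference category are all assumptions made for illustration. Reference-level coding is one common convention for dummy variables, not the only one.

```python
def discretize(value, edges, labels):
    """Map a continuous value to the label of the interval containing it.

    Intervals are half-open, [edges[i], edges[i + 1]), so there must be
    one label fewer than there are edges.
    """
    for low, high, label in zip(edges, edges[1:], labels):
        if low <= value < high:
            return label
    raise ValueError(f"{value} is outside the range covered by edges")


def dichotomize(value, threshold):
    """Collapse a continuous value into a binary variable (0 or 1)."""
    return 1 if value >= threshold else 0


def dummy_code(category, levels):
    """Return one 0/1 indicator per level, omitting the first (reference) level."""
    if category not in levels:
        raise ValueError(f"unknown category: {category!r}")
    return [1 if category == level else 0 for level in levels[1:]]


# Hypothetical ages and cut points, chosen only for illustration.
ages = [12, 35, 70]
age_groups = [discretize(a, [0, 18, 65, 120], ["child", "adult", "senior"])
              for a in ages]
is_adult = [dichotomize(a, 18) for a in ages]
rock_dummies = dummy_code("sedimentary", ["igneous", "sedimentary", "metamorphic"])
```

With k levels, this coding yields k - 1 indicators, and the reference level is represented by all zeros. A regression model can then treat each indicator as an ordinary quantitative predictor.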
Examples of categorical variables
Examples of values that might be represented in a
categorical variable:
- The blood type of a person: A, B, AB or O.
- The state that a resident of the United States lives in.
- The political party that a voter in a European country might vote for: Christian Democrat, Social Democrat, Green Party, etc.
- The type of a rock: igneous, sedimentary or metamorphic.
- The identity of a particular word (e.g., in a language model): One of V possible choices, for a vocabulary of size V.
Article credit: http://en.wikipedia.org/