Simplified Machine Learning, Part 2: Concept Learning

13 Mar 2024

Concept Learning

Concept learning is the task of inferring a boolean-valued function from training examples: acquiring the definition of a general category from a sample of positive and negative training examples of that category.

For example, I want to know whether I would enjoy going out (a yes/no boolean value) if the sky is sunny, the temperature is warm, the humidity is high, and so on.
Concept Learning is a type of Supervised Learning.

So, What is Supervised Learning?

Supervised learning algorithms build a mathematical model of a dataset that contains both the inputs and the desired outputs.
  • typical supervised learning techniques: concept learning, classification, and regression
  • contrast with unsupervised learning, where only the inputs are known and the goal is to discover underlying structure or patterns

Classification

the output is a discrete category; the task is to identify which category a new observation belongs to

Regression

the output is a real number; the task is to estimate the relationships among variables
1. linear regression

  • a linear approach to modelling the relationship between a dependent variable (the scalar response) and one or more independent variables (the explanatory variables)
  • simple linear regression: only one independent variable, hence y = w0 + w1x (essentially the straight-line equation y = f(x) = a + bx from high-school algebra); a minimal fitting sketch follows this list
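
As a minimal sketch (assuming NumPy and a made-up toy dataset), the weights w0 and w1 can be fitted by ordinary least squares:

```python
import numpy as np

# Toy data, invented for illustration: y is roughly 2 + 3x plus noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 10.8, 14.1])

# Design matrix [1, x], so the intercept w0 is learned alongside the slope w1.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares: minimize ||Xw - y||^2.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
w0, w1 = w
print(f"y = {w0:.2f} + {w1:.2f}x")  # approximately y = 2 + 3x
```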

2. ordinal regression

  • a type of regression analysis for predicting an ordinal variable (one whose values fall into ordered categories); it can therefore be thought of as sitting between regression and classification
  • by assigning numbers to the ordered categories, the classification problem is transformed into a regression analysis (a general linear model)
  • popular in the social sciences (e.g. strongly agree, agree, neutral, disagree, strongly disagree)

3. logistic regression

  • uses a logistic function to model a binary dependent variable, for example, predicting the risk of developing a disease
  • the sigmoid activation used in Artificial Neural Networks (ANNs) is the same logistic function; a single sigmoid neuron is effectively a logistic regression model (see the sketch below)
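
A minimal sketch of the idea, assuming NumPy and an invented one-dimensional dataset: the logistic (sigmoid) function squashes a linear score into a probability, and a few gradient steps on the cross-entropy loss fit the weights:

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Toy data, invented for illustration: the label is 1 when x is large.
x = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 1, 1])

w, b = 0.0, 0.0
lr = 0.5
for _ in range(1000):
    p = sigmoid(w * x + b)           # predicted probability of class 1
    # Gradient of the mean cross-entropy loss with respect to w and b.
    w -= lr * np.mean((p - y) * x)
    b -= lr * np.mean(p - y)

print(sigmoid(w * 1.0 + b))  # low probability  -> class 0
print(sigmoid(w * 4.0 + b))  # high probability -> class 1
```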


How to Represent Concept Learning?

Data representation

  • using attribute-value pairs

Hypothesis Representation

  • using a conjunction of constraints on the attributes
  • ? means any value is acceptable for the attribute
  • 0 means no value is acceptable for the attribute
  • most general hypothesis: <?,?,?,?,?,?>
  • most specific hypothesis: <0,0,0,0,0,0> (a small sketch of this representation follows the list)
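
As a sketch, a hypothesis over the six attributes of the going-out example can be stored as a tuple, with '?' and '0' as the accept-anything and accept-nothing markers (the attribute values below are assumptions for illustration):

```python
# Each hypothesis is a tuple of constraints, one per attribute:
# a concrete value, '?' (any value is acceptable), or '0' (none is).
MOST_GENERAL  = ('?',) * 6
MOST_SPECIFIC = ('0',) * 6

def matches(hypothesis, instance):
    """True if the instance satisfies every attribute constraint."""
    return all(c == '?' or c == v for c, v in zip(hypothesis, instance))

h = ('Sunny', 'Warm', '?', '?', '?', '?')
x = ('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same')
print(matches(h, x))  # True: h accepts any humidity, wind, water and forecast
```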

Notation:

  • X: the set of all instances over which the concept is defined
  • c: the boolean-valued target concept we wish to learn, c: X → {0, 1} (the target function maps each instance to 0 or 1)
  • H: the set of all possible hypotheses (the hypothesis space) that the learner may consider when estimating the target concept c
  • the actual value c(x) and the predicted value h(x) may differ; the aim of the learning algorithm is to minimize this error

Inductive Learning Hypothesis:

  • Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
  • this is the fundamental assumption of inductive learning

Hypothesis Search Space

  • concept learning as search: as with other learning algorithms, concept learning can be viewed as a search through the hypothesis space for a hypothesis that fits the training examples

Calculating the Size of the Search Space: Three Counts

Assuming, as in the classic EnjoySport example, that the first attribute has three possible values and the other five have two each, three counts are useful: the number of distinct instances (3·2·2·2·2·2 = 96), the number of syntactically distinct hypotheses (5·4·4·4·4·4 = 5120, since each attribute slot can also be ? or 0), and the number of semantically distinct hypotheses (1 + 4·3·3·3·3·3 = 973, since every hypothesis containing a 0 classifies all instances as negative).
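
A quick sketch of those counts (the per-attribute value counts are an assumption borrowed from the EnjoySport dataset):

```python
from math import prod

# Number of possible values per attribute (EnjoySport-style assumption).
values = [3, 2, 2, 2, 2, 2]

instances = prod(values)                    # 96 distinct instances
syntactic = prod(v + 2 for v in values)     # each slot may also be '?' or '0' -> 5120
semantic = 1 + prod(v + 1 for v in values)  # all hypotheses with a '0' collapse into one -> 973

print(instances, syntactic, semantic)
```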

General-to-Specific Ordering

  • start from the most general hypothesis and work towards the most specific, or vice versa
  • hypotheses are arranged in a partial ordering (reflexive, antisymmetric and transitive) by the more-general-than relation; a sketch of this relation follows the list
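
A sketch of the more-general-than-or-equal relation for the tuple representation above: h1 is at least as general as h2 if each of its constraints is at least as permissive as the corresponding constraint of h2:

```python
def more_general_or_equal(h1, h2):
    """True if h1 places no stricter constraint than h2 on any attribute."""
    def covers(c1, c2):
        # '?' covers everything; equal constraints cover each other;
        # '0' in h2 rejects everything, so any constraint covers it.
        return c1 == '?' or c1 == c2 or c2 == '0'
    return all(covers(c1, c2) for c1, c2 in zip(h1, h2))

print(more_general_or_equal(('?', 'Warm'), ('Sunny', 'Warm')))  # True
print(more_general_or_equal(('Sunny', '?'), ('?', 'Warm')))     # False: incomparable
```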

For the formal definition, see "Partially ordered set" on en.wikipedia.org.

Find-S Algorithm

Properties

  • finds a maximally specific hypothesis
  • begins with the most specific possible hypothesis (h ← <0,0,0,0,0,0>) and minimally generalizes it each time it fails to cover a positive example
  • ignores every negative example (a sketch of the algorithm follows the list)
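
A minimal sketch of Find-S for the tuple representation used above (the four-attribute dataset is invented for illustration):

```python
def find_s(examples):
    """Return the maximally specific hypothesis covering all positive examples."""
    n = len(examples[0][0])
    h = ('0',) * n                       # start with the most specific hypothesis
    for instance, label in examples:
        if not label:                    # negative examples are ignored
            continue
        h = tuple(
            v if c in ('0', v) else '?'  # keep matching constraints, relax the rest
            for c, v in zip(h, instance)
        )
    return h

examples = [
    (('Sunny', 'Warm', 'Normal', 'Strong'), True),
    (('Sunny', 'Warm', 'High',   'Strong'), True),
    (('Rainy', 'Cold', 'High',   'Strong'), False),
]
print(find_s(examples))  # ('Sunny', 'Warm', '?', 'Strong')
```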

Inductive bias of Find-S

  • new instances are classified only when all members of the current version space agree on the classification; otherwise the system refuses to classify the new instance (the same bias as candidate elimination)
  • additionally, all instances are assumed to be negative unless the opposite is entailed by the learner's other knowledge

List-then-Eliminate Algorithm

Process:

  1. start with the version space containing every hypothesis in the hypothesis space
  2. for each training example, eliminate every hypothesis that is inconsistent with it
  3. output the remaining version space (a sketch follows this list)
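
A sketch of the idea, with only two attributes so that the enumeration stays tiny (the attribute values are invented for illustration):

```python
from itertools import product

def enumerate_hypotheses(domains):
    """All value-or-'?' combinations, plus the single all-'0' (empty) hypothesis;
    hypotheses with a '0' in any slot are semantically identical to it."""
    hs = [tuple(h) for h in product(*[d + ['?'] for d in domains])]
    hs.append(('0',) * len(domains))
    return hs

def consistent(h, instance, label):
    covered = all(c == '?' or c == v for c, v in zip(h, instance))
    return covered == label

def list_then_eliminate(domains, examples):
    version_space = enumerate_hypotheses(domains)
    for instance, label in examples:
        version_space = [h for h in version_space if consistent(h, instance, label)]
    return version_space

domains = [['Sunny', 'Rainy'], ['Warm', 'Cold']]
examples = [(('Sunny', 'Warm'), True), (('Rainy', 'Cold'), False)]
print(list_then_eliminate(domains, examples))
# [('Sunny', 'Warm'), ('Sunny', '?'), ('?', 'Warm')]
```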

Problem:

  • it requires exhaustively enumerating every hypothesis in the hypothesis space, which is infeasible for all but the smallest spaces

Candidate Elimination Algorithm

An improved version of the List-then-Eliminate algorithm that employs a more compact representation of the version space. It finds all describable hypotheses that are consistent with the observed training examples.

Compacted Representation of Version Space:

  • the version space is represented by its general and specific boundaries:
  • general boundary G: the set of maximally general hypotheses consistent with the training data D
  • specific boundary S: the set of minimally general (maximally specific) hypotheses consistent with the training data D

Process:

  • for each training example, update S and G until they are consistent with the example; then prune the boundaries, removing from G any hypothesis less general than another hypothesis in S, and vice versa (a sketch follows this list)
  • the two boundaries will converge to the target concept if the training examples contain no errors (and the target concept is representable in H)
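
A sketch of the full algorithm for the conjunctive tuple representation above; the minimal generalization and specialization steps follow the textbook description, and the two-attribute dataset is invented for illustration:

```python
def covers(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def mge(h1, h2):
    """more-general-or-equal on attribute constraints."""
    return all(a == '?' or a == b or b == '0' for a, b in zip(h1, h2))

def candidate_elimination(domains, examples):
    n = len(domains)
    S = {('0',) * n}   # specific boundary
    G = {('?',) * n}   # general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}              # drop inconsistent g
            new_S = set()
            for s in S:
                if covers(s, x):
                    new_S.add(s)
                else:
                    # minimal generalization: relax only the mismatching slots
                    s2 = tuple(v if c in ('0', v) else '?' for c, v in zip(s, x))
                    if any(mge(g, s2) for g in G):
                        new_S.add(s2)
            # keep only the minimally general members of S
            S = {s for s in new_S
                 if not any(s != t and mge(s, t) for t in new_S)}
        else:
            S = {s for s in S if not covers(s, x)}          # drop inconsistent s
            new_G = set()
            for g in G:
                if not covers(g, x):
                    new_G.add(g)
                else:
                    # minimal specializations: pin one '?' to a value != x[i]
                    for i, c in enumerate(g):
                        if c != '?':
                            continue
                        for v in domains[i]:
                            if v != x[i]:
                                g2 = g[:i] + (v,) + g[i + 1:]
                                if any(mge(g2, s) for s in S):
                                    new_G.add(g2)
            # keep only the maximally general members of G
            G = {g for g in new_G
                 if not any(g != h and mge(h, g) for h in new_G)}
    return S, G

domains = [['Sunny', 'Rainy'], ['Warm', 'Cold']]
examples = [(('Sunny', 'Warm'), True), (('Rainy', 'Cold'), False)]
S, G = candidate_elimination(domains, examples)
print(S)  # {('Sunny', 'Warm')}
print(G)  # {('Sunny', '?'), ('?', 'Warm')}
```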

Partially Learned Concepts

  • arises when the training examples are used up before the boundaries converge
  • a new instance that satisfies every hypothesis in S is classified as positive
  • a new instance that satisfies no hypothesis in G is classified as negative
  • otherwise, take a vote among the hypotheses in the version space (see the sketch below)
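
Those rules translate directly into a check against the two boundaries; this sketch reuses the covers test from above, with None standing in for the undecided case that would go to a vote:

```python
def covers(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def classify(S, G, x):
    """Classify a new instance with a partially learned version space."""
    if all(covers(s, x) for s in S):
        return True       # covered by every hypothesis in the version space
    if not any(covers(g, x) for g in G):
        return False      # covered by no hypothesis in the version space
    return None           # the boundaries disagree: fall back to voting

S = {('Sunny', 'Warm')}
G = {('Sunny', '?'), ('?', 'Warm')}
print(classify(S, G, ('Sunny', 'Warm')))  # True
print(classify(S, G, ('Rainy', 'Cold')))  # False
print(classify(S, G, ('Sunny', 'Cold')))  # None: hypotheses disagree
```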

Candidate Elimination vs. Find-S:

  • the Find-S algorithm only finds a maximally specific hypothesis that satisfies the training examples, hence it considers only the positive training examples
  • the Candidate Elimination algorithm finds all consistent hypotheses (the version space), hence h(x) = c(x) must hold on both positive and negative training examples

Inductive Bias

Inductive bias is the set of assumptions that, together with the training data, deductively justify the classifications assigned by the learner to future instances. Every learner has an inductive bias.

Simply put, inductive bias is the set of assumptions used to generalize beyond the observed data. An unbiased learner is completely unable to generalize beyond the examples it has already seen; it is nothing more than a database.

Inductive vs. Deductive system

  • deductive reasoning: general principle → special cases
  • inductive reasoning: special cases → general principle

Two types of Inductive Biases

  • Preference Bias: an incomplete search through a complete hypothesis space, so the inductive bias is introduced by the search strategy
  • (e.g. the ID3 decision-tree algorithm greedily follows an information-gain heuristic through the complete space of trees, preferring some trees over others)
  • Restriction Bias: a complete search through an incomplete hypothesis space, so the inductive bias is introduced by the hypothesis space itself
  • (e.g. the candidate-elimination algorithm considers only conjunctive hypotheses, a restricted subset of all possible concepts)
  • preference bias is generally more desirable, because the learner works within a complete hypothesis space that is guaranteed to contain the target concept
