Concept Learning
Borrowed from: http://www.userwebs.pomona.edu/~jbm04747/courses/spring2003/cs151/lectures/VersionSpaces/VersionSpaces.html
Examples:
- "bird"
- "car"
- "Impressionist painting"
- "catch-22 situation"
- "days that are good for enjoying my favorite water sport"
A Concept Learning Task
Represent days (i.e., instances) using a set of attributes:
- Sky: Sunny, Cloudy, Rainy
- Temp: Warm, Cold
- Humidity: Normal, High
- Wind: Strong, Weak
- Water: Warm, Cool
- Forecast: Same, Change
Represent a hypothesis as a conjunction of constraints on attribute values.
Possible attribute constraints:
- a single required value (e.g., Water = Warm)
- any value is acceptable: ?
- no value is acceptable: Ø
Example:
< Sky=Sunny, Temp=?, Humidity=High, Wind=?, Water=?, Forecast=? >
or just
< Sunny, ?, High, ?, ?, ? >
This is equivalent to a (restricted) logical expression:
Sky(Sunny) ^ Humidity(High)
Most general hypothesis: < ?, ?, ?, ?, ?, ? >
Most specific hypothesis: < Ø, Ø, Ø, Ø, Ø, Ø >
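This representation is easy to prototype. Below is a minimal sketch in Python, assuming instances and hypotheses are tuples of strings, with "?" for the any-value constraint and None standing in for Ø; the names satisfies, most_general, and most_specific are illustrative, not from the source.

def satisfies(instance, hypothesis):
    # An instance meets a hypothesis iff every constraint is "?" or equals
    # the instance's value; None (i.e., Ø) matches nothing.
    return all(c == "?" or c == v for c, v in zip(hypothesis, instance))

most_general = ("?",) * 6      # < ?, ?, ?, ?, ?, ? >
most_specific = (None,) * 6    # < Ø, Ø, Ø, Ø, Ø, Ø >

x = ("Sunny", "Warm", "High", "Strong", "Warm", "Same")
h = ("Sunny", "?", "High", "?", "?", "?")
print(satisfies(x, h))              # True
print(satisfies(x, most_specific))  # False: Ø rejects everything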
Positive and negative training examples for the concept EnjoySport:

Example | Sky   | Temp | Humidity | Wind   | Water | Forecast | EnjoySport
1       | Sunny | Warm | Normal   | Strong | Warm  | Same     | Yes
2       | Sunny | Warm | High     | Strong | Warm  | Same     | Yes
3       | Rainy | Cold | High     | Strong | Warm  | Change   | No
4       | Sunny | Warm | High     | Strong | Cool  | Change   | Yes
The task is to search the hypothesis space for a hypothesis consistent with the examples.
Inductive learning hypothesis: Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
In this case, the space is very small:
1 + (4 · 3 · 3 · 3 · 3 · 3) = 973
Ø + ({Sunny, Cloudy, Rainy, ?} · {Warm, Cold, ?} · . . .)
(Any hypothesis containing a Ø classifies every instance as negative, so all such hypotheses are semantically equivalent; they contribute the single leading 1.)
We need a way of searching very large or infinite hypothesis spaces efficiently.
General-to-Specific Ordering
Hypothesis space H has a natural structure.
h1 = < Sunny, ?, ?, Strong, ?, ? >
h2 = < Sunny, ?, ?, ?, ?, ? >
h2 ≥ h1 ("more general than or equal to")
The ≥ relation defines a partial order on H.
- h1 can be generalized to h2
- h2 can be specialized to h1
We can use this structure to search for a hypothesis consistent with the examples.
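A sketch of this ordering, reusing the tuple representation above (constraint_covers and more_general_or_equal are illustrative names): h2 ≥ h1 holds iff every constraint of h2 covers the corresponding constraint of h1.

def constraint_covers(g, s):
    # "?" covers any constraint; Ø (None) is covered by any constraint;
    # otherwise the two required values must be identical.
    return g == "?" or s is None or g == s

def more_general_or_equal(h2, h1):
    return all(constraint_covers(g, s) for g, s in zip(h2, h1))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True
print(more_general_or_equal(h1, h2))  # False: only a partial order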
FIND-S Algorithm
Finds a maximally specific hypothesis.
h <-- most specific hypothesis in H
for each positive training example x {
    for each attribute constraint a in h {
        if constraint a is not satisfied by x {
            replace a by the next more general constraint that is satisfied by x
        }
    }
}
output hypothesis h
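A runnable sketch of FIND-S under the same tuple representation (find_s is an illustrative name). Generalizing a constraint moves it from Ø to the observed value, and from a conflicting value to "?".

def find_s(examples):
    # examples: list of (instance, label) pairs; label True = positive.
    h = [None] * 6                   # the most specific hypothesis in H
    for x, positive in examples:
        if not positive:
            continue                 # FIND-S ignores negative examples
        for i, v in enumerate(x):
            if h[i] is None:
                h[i] = v             # Ø -> first observed value
            elif h[i] != "?" and h[i] != v:
                h[i] = "?"           # conflicting values -> any value
    return tuple(h)

data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
print(find_s(data))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')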
This method works, provided that:
- H contains a hypothesis that describes the true target concept
- there are no errors in the training data
- hypotheses in H are described as conjunctions of attribute constraints
Problems with FIND-S
- Cannot determine whether the final h is the only hypothesis in H consistent with the data. There may be many other consistent hypotheses.
- Why prefer the most specific hypothesis? Why not the most general consistent hypothesis (or one of intermediate generality)?
- Fails if the training data is noisy or inconsistent. Cannot detect inconsistent training data.
- Cannot handle hypothesis spaces that may have several consistent maximally-specific hypotheses.
  - Not a problem with the hypothesis representation used above, but may be if more general logical expressions (e.g., with disjunctions) are used. Must then use backtracking.
Current-Best-Hypothesis Search
- uses backtracking and a more general hypothesis representation (arbitrary logical expressions, including disjunctions)
- does not necessarily start from the most-specific-possible hypothesis
Example:

Sky    | Temp | Humidity | Wind   | Water | Forecast | EnjoySport
Sunny  | Warm | Normal   | Strong | Cool  | Change   | Yes
Cloudy | Warm | Normal   | Strong | Cool  | Change   | Yes
Rainy  | Warm | Normal   | Strong | Cool  | Change   | No
It is not possible to find a consistent hypothesis using FIND-S.
FIND-S: < ?, Warm, Normal, Strong, Cool, Change > won't work (it also matches the negative Rainy example, since Sky is unconstrained)
CBH: (Sky(Sunny) v Sky(Cloudy)) ^ Temp(Warm) ^ . . . ^ Forecast(Change)
Version Spaces
Idea: output the set of all hypotheses consistent with the training data, rather than just one (of possibly many).
There is no need to explicitly enumerate all consistent hypotheses; take advantage of the partial ordering of H.
The version space VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples D.
How to represent a version space?
One approach: just list all the members.

List-Then-Eliminate Algorithm
VS <-- a list of all hypotheses in H
for each training example <x, f(x)> {
    remove from VS any hypothesis h such that h(x) != f(x)
}
output hypotheses in VS

This only works if H is finite and small (usually an unrealistic assumption).
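For EnjoySport, H is in fact small enough to enumerate, so List-Then-Eliminate is feasible here. A sketch, reusing satisfies and the data list from the earlier sketches (DOMAINS and list_then_eliminate are illustrative names):

from itertools import product

DOMAINS = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"],
           ["Normal", "High"], ["Strong", "Weak"],
           ["Warm", "Cool"], ["Same", "Change"]]

def list_then_eliminate(hypotheses, examples):
    vs = list(hypotheses)
    for x, label in examples:
        vs = [h for h in vs if satisfies(x, h) == label]
    return vs

# The 973 semantically distinct hypotheses: all-Ø plus every ?-augmented tuple.
H = [(None,) * 6] + [tuple(h) for h in product(*[d + ["?"] for d in DOMAINS])]
print(len(H))                             # 973
print(len(list_then_eliminate(H, data)))  # 6 consistent hypotheses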
A better representation for version spaces uses two boundary sets:
- S-set: least general (i.e., most specific) members of H
- G-set: most general members of H
VS is the interval from the S-set to the G-set.
- There are a total of 6 hypotheses consistent with the EnjoySport training data.
- FIND-S returns only one of these (the most specific one).
- This follows the principle of least commitment (as in the partial-order planning algorithm).
- All consistent hypotheses can be enumerated from the S-set and the G-set.
Version-Space Learning Algorithm
G <-- set of maximally general hypotheses in H (e.g., {<?,?,?,?,?,?>})
S <-- set of maximally specific hypotheses in H (e.g., {<Ø,Ø,Ø,Ø,Ø,Ø>})
for each training example e {
    if e is a positive example {
        remove from G any hypotheses inconsistent with e
        for each hypothesis s in S not consistent with e {
            1. remove s from S
            2. add to S all minimal generalizations h of s such that h is
               consistent with e, and some member of G is more general than h
            3. remove from S any hypothesis that is more general than another
               hypothesis in S
        }
    }
    if e is a negative example {
        remove from S any hypotheses inconsistent with e
        for each hypothesis g in G not consistent with e {
            1. remove g from G
            2. add to G all minimal specializations h of g such that h is
               consistent with e, and some member of S is more specific than h
            3. remove from G any hypothesis that is less general than another
               hypothesis in G
        }
    }
}
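A compact Python sketch of this algorithm for the conjunctive representation, reusing satisfies, more_general_or_equal, DOMAINS, and data from the earlier sketches (all function names are illustrative). For this representation a hypothesis has exactly one minimal generalization covering a positive example, while minimal specializations tighten one "?" at a time.

def min_generalization(s, x):
    # Ø -> the observed value, a conflicting value -> "?", all else kept.
    return tuple(v if c is None else ("?" if c != v else c)
                 for c, v in zip(s, x))

def min_specializations(g, x):
    # Replace one "?" with any attribute value other than the one x has.
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for v in DOMAINS[i] if v != x[i]]

def version_space_learning(examples):
    G = {("?",) * 6}   # maximally general boundary
    S = {(None,) * 6}  # maximally specific boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if satisfies(x, g)}
            S = {min_generalization(s, x) for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
            S = {s for s in S   # keep only the minimal members of S
                 if not any(t != s and more_general_or_equal(s, t) for t in S)}
        else:
            S = {s for s in S if not satisfies(x, s)}
            newG = set()
            for g in G:
                if not satisfies(x, g):
                    newG.add(g)  # g already excludes the negative example
                else:
                    for h in min_specializations(g, x):
                        # keep only specializations above some member of S
                        if any(more_general_or_equal(h, s) for s in S):
                            newG.add(h)
            G = {g for g in newG  # keep only the maximal members of G
                 if not any(h != g and more_general_or_equal(h, g) for h in newG)}
    return S, G

S, G = version_space_learning(data)
print(S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)  # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}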
Example trace
Behavior of Version-Space Learning
- VS will converge to the hypothesis that correctly describes the target concept if:
  - there are no errors in the training examples
  - H contains a hypothesis that correctly describes the target concept
- VS can be monitored to determine how "close" it is to the true target concept:
  - if S and G converge to a single, identical hypothesis, then the target concept has been learned exactly
- The VSL algorithm removes from VS every hypothesis inconsistent with each example, so the first incorrect/noisy example will eliminate the target concept from VS!
- If S and G converge to the empty set, then no hypothesis in H is consistent with all training examples.
  - The VSL algorithm can thus detect inconsistencies in the training data, assuming that the target concept is representable in H.
How can an Experiment-Generator module use the current VS to suggest new problems to try?
- Choose an instance that is classified as positive by half of the VS hypotheses, and as negative by the other half (see the sketch below).
- Example: < Sunny, Warm, Normal, Weak, Warm, Same > satisfies 3 out of 6 hypotheses in the VS of the earlier example.
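A sketch of such a query chooser, reusing satisfies, DOMAINS, H, data, and list_then_eliminate from above (best_query is an illustrative name): score each candidate instance by how unevenly the current VS classifies it, and pick the most even split.

from itertools import product

def best_query(version_space):
    def imbalance(x):
        # 0 means a perfect half/half split of the version space.
        pos = sum(satisfies(x, h) for h in version_space)
        return abs(2 * pos - len(version_space))
    return min(product(*DOMAINS), key=imbalance)

vs = list_then_eliminate(H, data)  # the 6-member version space from above
print(best_query(vs))              # an instance the hypotheses split 3 / 3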
Partially-learned concepts (a VS with more than one hypothesis) can still be useful:
- if all hypotheses in VS agree on a particular novel instance, then the classification must be correct
- if an instance satisfies all hypotheses s in S, then the instance must be positive
- if an instance is rejected by all hypotheses g in G, then the instance must be negative
- the above two tests can be done efficiently (no need to test all hypotheses in VS)
- if hypotheses in VS give a mixture of positive and negative classifications, we can output a probability of correct classification or use a majority vote (see the sketch below)
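A sketch combining the efficient boundary tests with the voting fallback, reusing satisfies and the S and G sets computed by the version-space sketch above (classify and vote are illustrative names):

def classify(S, G, x):
    # Boundary-set tests: both run without enumerating the whole VS.
    if all(satisfies(x, s) for s in S):
        return "positive"  # x is covered by every hypothesis in VS
    if not any(satisfies(x, g) for g in G):
        return "negative"  # x is rejected by every hypothesis in VS
    return "unknown"       # VS is split; fall back to voting

def vote(version_space, x):
    # Fraction of VS hypotheses that classify x as positive.
    return sum(satisfies(x, h) for h in version_space) / len(version_space)

print(classify(S, G, ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change")))  # positive
print(classify(S, G, ("Rainy", "Cold", "Normal", "Weak", "Warm", "Same")))      # negative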
Example:

New instance | Sky   | Temp | Humidity | Wind   | Water | Forecast | EnjoySport
A            | Sunny | Warm | Normal   | Strong | Cool  | Change   | ?
B            | Rainy | Cold | Normal   | Weak   | Warm  | Same     | ?
C            | Sunny | Warm | Normal   | Weak   | Warm  | Same     | ?
D            | Sunny | Cold | Normal   | Strong | Warm  | Same     | ?
The VS from the earlier example says:
- all hypotheses agree that A is positive
- all hypotheses agree that B is negative
- 3 hypotheses say C is positive, 3 say negative (useful as a test to gain more information)
- 2 hypotheses say D is positive, 4 say negative (67% confidence that it is negative)