Fall 2018

16340

PH 221 MW 5:30-6:45

**Instructor**
- Dr. William Sverdlik (wsverdlik@emich.edu)

**Office**
- 512 E

**Phone**
- 487-7081

If these times don't work out for you, let me know. We'll figure out some other time. Please note that these times may change.

Software: WEKA (it's free!)

The Algorithms:

1) Majority Rule

2) One-Rule

3) ID3/C4/C5

We need the goofy table

From Temple University

From Wikipedia

4) Models of Information Retrieval

5) Clustering

I am looking for nice links for this topic. So far, the best reference I can find is the textbook.

Clustering Part 1

Clustering Part 2

6) Apriori Algorithm and Market Basket Analysis

Apriori 0

Apriori 1

Apriori 2

Sample Data from Text

Apriori Algorithm from SUNY Buffalo (pdf)

Apriori Algorithm from University of Iowa (pdf)

7) Neural Networks

Part 1

Part 2a

Part 2b

Some More

Neural Network Homework

8) Regression Models

Simple Linear Regression

Correlation Coefficient (note Wikipedia's complicated name for this)

General Linear Model (higher dimensions)

9) Probabilistic Methods

Intro Stuff (slides 1-31)

Abduction

Markov Models and Hidden Markov Models

Nice Easy Viterbi Algorithm Discussion (and the one we'll do in class)
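To give a feel for where the course starts, here is a minimal Python sketch of the first two algorithms on the list, Majority Rule and One-Rule. It assumes nominal (categorical) attributes only, and the small weather-style table is hypothetical data made up for illustration. The function names `majority_rule` and `one_rule` are our own, not WEKA's; for homework, use WEKA's own ZeroR and OneR classifiers, which handle more cases (numeric attributes, for example) than this sketch does.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (made up for illustration): nominal attributes only.
ROWS = [
    dict(zip(("outlook", "windy", "play"), values))
    for values in [
        ("sunny", "no", "no"),
        ("sunny", "yes", "no"),
        ("sunny", "no", "no"),
        ("overcast", "no", "yes"),
        ("overcast", "yes", "yes"),
        ("rainy", "no", "yes"),
        ("rainy", "yes", "no"),
        ("rainy", "no", "yes"),
        ("overcast", "no", "yes"),
    ]
]


def majority_rule(rows, target):
    """Majority Rule: ignore the attributes and always predict the most common class."""
    counts = Counter(row[target] for row in rows)
    return counts.most_common(1)[0][0]


def one_rule(rows, attributes, target):
    """One-Rule (1R): build a one-level rule per attribute and keep the best.

    Returns (training_errors, attribute, rule), where rule maps each value of
    the chosen attribute to the class it predicts.
    """
    best = None
    for attr in attributes:
        # Tally class labels separately for each value of this attribute.
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        # Each attribute value predicts its most frequent class.
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        # Errors = rows whose class differs from the class their value predicts.
        errors = sum(
            sum(counts.values()) - counts[rule[value]]
            for value, counts in by_value.items()
        )
        # Strict "<" means ties go to the attribute listed first.
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    return best


if __name__ == "__main__":
    print("Majority rule predicts:", majority_rule(ROWS, "play"))
    errors, attr, rule = one_rule(ROWS, ["outlook", "windy"], "play")
    print(f"1R picks {attr!r} with {errors} training error(s): {rule}")
```

Majority Rule is the baseline any classifier should beat. One-Rule keeps whichever single attribute makes the fewest mistakes on the training data; breaking ties in favor of the first attribute listed is an arbitrary choice in this sketch.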

We will discuss various machine learning and data mining algorithms; periodically, homework (including some computer programming) will be assigned. Students are expected to respect submission deadlines; late submissions will be penalized 25% per class period late (homeworks are due at the beginning of class). There will be two exams, homework, and a term project.

Approximate Weighting:

- Homeworks 25%

- Exam 1 25% September 24, 2018

- Exam 2 25% October 22, 2018

- Project 25%

**Cheating:**

It's not a good thing to do. It's counter-productive, you don't learn anything, and most importantly, it violates University policy. Cheating is defined as presenting any work as your own that you obtained from some other source. This includes copying programs and copying on quizzes.

If you are having problems with the class, come see me!

OK. What about books, references, etc.?

Good question! You must take good notes. In fact, you must take notes for the entire class!! Here's how it will work:

Every week, there will be a note taker and a reviewer. The note taker takes class notes for both the Monday and Wednesday classes, then passes his/her notes along to the reviewer by Wednesday night at the latest.

The reviewer reviews, edits, and corrects the notes, then emails me the final version of the week's notes no later than Saturday evening. I will post these notes on the web before class the following Monday.

| Note Taker | Reviewer |
|---|---|
| Colvin, Rayaan | Browning, Nicholas |
| Browning, Nicholas | Bouzid, Abderraouf |
| Bouzid, Abderraouf | Mylavarapu, Deepthi |
| Mylavarapu, Deepthi | Nettem, Sindhura Lakshmi |
| Nettem, Sindhura Lakshmi | Nijhawan, Dhwani Sunil |
| Nijhawan, Dhwani Sunil | Uddur Gowrishankar, Manjushri |
| Uddur Gowrishankar, Manjushri | Arif, Muhammad Sohaib |

Data Sets!!

- WEKA Datasets. Note: these are already in ARFF format.

- University of California Irvine Data Repository (KDD). Well-known collection!

- University of California Irvine Data Repository (ML)

- Federal Statistics. You will have to do some work to create the data files, but there are interesting things to find.

- Major League Baseball. You can find statistics for any major league sport! Try a Google search.

Group Presentations (find something you like or suggest something else)

Let's put a due date of Monday, October 1 on a 2-4 page proposal. What will you do, what data will be gathered, and who is involved?

Final Project Presentations

Your talk should last approximately 20 minutes, with another 5 minutes allowed for questions. Remember, you must also submit a paper summarizing your results and citing any references. It's a small class, so we will discuss team size.

Here's a partial list of talks from previous years:

- Logistic Regression and Email Spam

- Lunar Cycles and the Stock Market

- Mushroom Classification

- Social Networks (structure)

- Abandoned Objects in Video Feeds

Papers:

HOMEWORK 1 - Due Monday October 8:

Analyze some data!

HOMEWORK 2 - Due Monday October 15:

Read the following papers and submit a 1-2 page summary of each:

What's the difference between Data Mining and Statistics? Read this and find out.

One-Rule Rules! Forget these silly decision trees, one level suffices??

HOMEWORK 3a) - Due Monday October 29:

Read the following papers and submit a 1-2 page summary. Be prepared to discuss in class.

Nepotistic Links!! (A bit outdated, but interesting!!)

HOMEWORK 3b) - Due Monday October 29:

1-2 page write-up for your final project. Be prepared to present and discuss.

Summarize the New York Times article on Link Spam. Due TBA

Neural Network 1

Neural Network 2

Neural Network 3

Neural Network 4

HOMEWORK: Neural Network Program. THIS PROGRAM IS DUE ???????