
IT446 DATA MINING & DATA WAREHOUSING Assignment No. 3

Nov 3, 2025

Q1. Given a decision tree, you have the option of (a) converting the decision tree to rules and then pruning the resulting rules, or (b) pruning the decision tree and then converting the pruned tree to rules. What advantage does (a) have over (b)? (0.75 Mark)

Q2. See the following figure (summarized in the bullets below) and compute the true positive rate (TPR), false positive rate (FPR), precision, and accuracy. (1 Mark)

• There are two possible predicted classes: "yes" and "no". If we predict the presence of a disease, for example, "yes" would mean they have the disease, and "no" would mean they don't have the disease.
• The classifier made a total of 165 predictions (e.g., 165 patients were being tested for the presence of that disease).
• Out of those 165 cases, the classifier predicted "yes" 110 times, and "no" 55 times.
• In reality, 105 patients in the sample have the disease, and 60 patients do not.

Q3. Compare the advantages and disadvantages of eager classification (e.g., decision tree, Bayesian, neural network) versus lazy classification (e.g., k-nearest neighbor, case-based reasoning). ( 1 Mark)

Q4. The following decision tree has been created to predict what someone can do.

a. Convert this tree to if-then rules (0.25 Mark)

b. Using the following testing data:

i. Predict the class of each record (0.25 Mark)

#   Parents Visiting   Weather   Money   Class         Prediction
1   Yes                Sunny     Rich    Shopping
2   Yes                Windy     Poor    Cinema
3   No                 Windy     Poor    Play tennis
4   No                 Rainy     Rich    Stay in
5   Yes                Rainy     Poor    Stay in
6   No                 Windy     Rich    Cinema

ii. Calculate the accuracy of this model. (0.25 Mark)
iii. Interpret the obtained result. (0.25 Mark)
iv. How can we improve the performance of the obtained model? (0.25 Mark)

Answers:

Q1. With method (b), pruning can only remove an entire subtree at once. With method (a), pruning operates on rules, so we may remove any individual precondition of a rule. The latter is finer-grained and less restrictive: converting to rules first allows partial simplifications that tree pruning cannot express.

Q2. Let's now define the most basic terms, which are whole numbers (not rates):

true positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.

true negatives (TN): We predicted no, and they don't have the disease.

false positives (FP): We predicted yes, but they don't actually have the disease. (Also known as a "Type I error.")

false negatives (FN): We predicted no, but they actually do have the disease. (Also known as a "Type II error.")

Adding these terms to the confusion matrix, together with the row and column totals, gives:

                 Predicted: No   Predicted: Yes   Total
Actual: No       TN = 50         FP = 10          60
Actual: Yes     FN = 5          TP = 100         105
Total            55              110              165

This is a list of rates that are often computed from a confusion matrix for a binary classifier:

• Accuracy: Overall, how often is the classifier correct?

  • (TP+TN)/total = (100+50)/165 = 0.91

• Misclassification Rate: Overall, how often is it wrong?

  • (FP+FN)/total = (10+5)/165 = 0.09
  • equivalent to 1 minus Accuracy
  • also known as "Error Rate"

• True Positive Rate: When it's actually yes, how often does it predict yes?

  • TP/actual yes = 100/105 = 0.95
  • also known as "Sensitivity" or "Recall"

• False Positive Rate: When it's actually no, how often does it predict yes?

  • FP/actual no = 10/60 = 0.17

• Specificity: When it's actually no, how often does it predict no?

  • TN/actual no = 50/60 = 0.83
  • equivalent to 1 minus False Positive Rate

• Precision: When it predicts yes, how often is it correct?

  • TP/predicted yes = 100/110 = 0.91

• Prevalence: How often does the yes condition actually occur in our sample?

  • actual yes/total = 105/165 = 0.64
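As a check, all of these rates can be computed directly from the four counts. A minimal Python sketch (counts taken from the worked example above):

```python
# Confusion-matrix counts from the example: TP=100, FP=10, FN=5, TN=50.
TP, FP, FN, TN = 100, 10, 5, 50
total = TP + FP + FN + TN  # 165 predictions in all

accuracy = (TP + TN) / total        # (100+50)/165 ≈ 0.91
error_rate = (FP + FN) / total      # (10+5)/165 ≈ 0.09
tpr = TP / (TP + FN)                # sensitivity/recall: 100/105 ≈ 0.95
fpr = FP / (FP + TN)                # 10/60 ≈ 0.17
specificity = TN / (FP + TN)        # 50/60 ≈ 0.83
precision = TP / (TP + FP)          # 100/110 ≈ 0.91
prevalence = (TP + FN) / total      # 105/165 ≈ 0.64

print(f"Accuracy={accuracy:.2f} TPR={tpr:.2f} "
      f"FPR={fpr:.2f} Precision={precision:.2f}")
```

Note that specificity falls out of FPR for free (specificity = 1 − FPR), just as the misclassification rate is 1 − accuracy.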
Q3. Eager classification is faster at classification time than lazy classification because it constructs a generalization model before receiving any new tuples to classify. Weights can be assigned to attributes, which can improve classification accuracy. Its disadvantages are that it must commit to a single hypothesis covering the entire instance space, which can decrease accuracy, and that more time is needed for training.

Lazy classification uses a richer hypothesis space, which can improve classification accuracy, and it requires less training time than eager classification. One disadvantage is that all training tuples must be stored, which leads to expensive storage costs and requires efficient indexing techniques. Another is that classification itself is slower, because no classifier is built until a new tuple needs to be classified. Furthermore, all attributes are weighted equally, which can decrease accuracy (problems may arise due to irrelevant attributes in the data).
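To make the eager/lazy contrast concrete, here is a minimal 1-nearest-neighbour classifier, a lazy learner: "training" is just memorising the tuples, and all the distance work is deferred until prediction time. The toy data and the Hamming distance over categorical attributes are illustrative assumptions, not part of the assignment.

```python
def hamming(a, b):
    """Number of attribute positions where two tuples differ."""
    return sum(x != y for x, y in zip(a, b))

class OneNN:
    def fit(self, X, y):
        # Lazy step: no model is built, the data is simply stored.
        self.X, self.y = X, y
        return self

    def predict(self, query):
        # All computation happens here, at classification time.
        dists = [hamming(query, x) for x in self.X]
        return self.y[dists.index(min(dists))]

# Toy training tuples: (Parents Visiting, Weather, Money) -> class
X = [("Yes", "Sunny", "Rich"), ("No", "Windy", "Poor"), ("No", "Rainy", "Rich")]
y = ["Cinema", "Cinema", "Stay in"]

clf = OneNN().fit(X, y)
print(clf.predict(("No", "Rainy", "Rich")))  # exact match -> "Stay in"
```

An eager learner (e.g. a decision tree) would instead spend its time in `fit`, compressing the tuples into a model, and `predict` would be a cheap tree walk; this sketch shows the opposite trade-off.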

Q4. a. If-then rules:

R1 : IF(Parents Visiting = YES) then Cinema

R2 : IF(Parents Visiting = NO) AND (Weather = Sunny) then Play Tennis

R3 : IF(Parents Visiting = NO) AND (Weather = Rainy) then Stay in

R4 : IF (Parents Visiting = NO) AND (Weather = Windy) AND (Money = Rich) then Shopping

R5 : IF (Parents Visiting = NO) AND (Weather = Windy) AND (Money = Poor) then Cinema

b. Prediction:

(i)
#   Parents Visiting   Weather   Money   Class         Prediction
1   Yes                Sunny     Rich    Shopping      False
2   Yes                Windy     Poor    Cinema        True
3   No                 Windy     Poor    Play tennis   False
4   No                 Rainy     Rich    Stay in       True
5   Yes                Rainy     Poor    Stay in       False
6   No                 Windy     Rich    Cinema        False

(ii) Accuracy = number of correct predictions / total number of predictions = 2/6 = 1/3 ≈ 33.33%
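The rule set R1–R5 and the accuracy figure can be checked mechanically. The sketch below encodes the rules as a function and scores the six test records; the function name and tuple layout are my own choices for illustration:

```python
def predict(parents, weather, money):
    """Apply rules R1-R5 from part (a), in order."""
    if parents == "Yes":
        return "Cinema"                            # R1
    if weather == "Sunny":
        return "Play tennis"                       # R2
    if weather == "Rainy":
        return "Stay in"                           # R3
    if weather == "Windy" and money == "Rich":
        return "Shopping"                          # R4
    return "Cinema"                                # R5 (Windy, Poor)

# Test records: (Parents Visiting, Weather, Money, actual class)
tests = [
    ("Yes", "Sunny", "Rich", "Shopping"),
    ("Yes", "Windy", "Poor", "Cinema"),
    ("No",  "Windy", "Poor", "Play tennis"),
    ("No",  "Rainy", "Rich", "Stay in"),
    ("Yes", "Rainy", "Poor", "Stay in"),
    ("No",  "Windy", "Rich", "Cinema"),
]

correct = sum(predict(p, w, m) == c for p, w, m, c in tests)
print(f"Accuracy: {correct}/{len(tests)} = {correct / len(tests):.1%}")
# -> Accuracy: 2/6 = 33.3%
```

Only records 2 and 4 match the rule predictions, reproducing the 2/6 accuracy computed above.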

(iii) There are six predictions in total. Applying the if-then rules to each record:

• Record 1: (Parents Visiting = Yes) fires R1, so the prediction is Cinema, but the actual class is Shopping, so the prediction is False.
• Record 2: (Parents Visiting = Yes) fires R1, predicting Cinema; the actual class is also Cinema, so the prediction is True.
• Record 3: (Parents Visiting = No) AND (Weather = Windy) AND (Money = Poor) fires R5, predicting Cinema, but the actual class is Play tennis, so the prediction is False.
• Record 4: (Parents Visiting = No) AND (Weather = Rainy) fires R3, predicting Stay in; the actual class is Stay in, so the prediction is True.
• Record 5: (Parents Visiting = Yes) fires R1, predicting Cinema, but the actual class is Stay in, so the prediction is False.
• Record 6: (Parents Visiting = No) AND (Weather = Windy) AND (Money = Rich) fires R4, predicting Shopping, but the actual class is Cinema, so the prediction is False.

(iv)

There are several methods to improve the performance of such a tree:

1) Attribute Evaluator:

In this method, subsets of attributes are assessed to determine which attributes correlate highly with the class value and which correlate poorly. Tree pruning then takes place based on this ranking; a predictive algorithm can be used for the assessment.

2) Search Method:

Random search, exhaustive search, and best-first search are all options here. Exhaustive search tests every combination of attributes, while a best-first strategy navigates attribute subsets more selectively; a greedy approach can also be used to reduce training time.
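The two ideas above, scoring attribute subsets against the class and searching the subset space greedily, can be sketched as a simple forward-selection loop. The `score` function here is a stand-in for any subset evaluator (e.g. correlation with the class, or cross-validated accuracy); the toy weights are an illustrative assumption:

```python
def forward_select(attributes, score):
    """Greedy forward selection: repeatedly add the attribute with the
    largest score gain; stop when no attribute improves the score."""
    selected, best = [], score([])
    remaining = list(attributes)
    while remaining:
        # Candidate with the highest score when added to the current subset.
        cand = max(remaining, key=lambda a: score(selected + [a]))
        if score(selected + [cand]) <= best:
            break                      # no improvement possible: stop
        selected.append(cand)
        remaining.remove(cand)
        best = score(selected)
    return selected, best

# Toy evaluator: pretend 'parents' and 'weather' are informative
# and 'money' is irrelevant (hypothetical weights, for illustration).
weights = {"parents": 0.4, "weather": 0.3, "money": 0.0}
score = lambda subset: sum(weights[a] for a in subset)

subset, s = forward_select(list(weights), score)
print(subset)  # irrelevant attribute never gets selected
```

The greedy loop evaluates far fewer subsets than an exhaustive search (linear rounds instead of 2^n combinations), which is exactly the training-time trade-off mentioned above.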

Due Date: Saturday 3rd Dec, 11:59 PM. Total Marks: 4.