Wednesday, January 30, 2019

The Benefits and Drawbacks of a Binary Tree Versus a Bushier Tree

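Homework 3

4. Discuss the benefits and drawbacks of a binary tree versus a bushier tree.

The structure of a binary tree is simpler than that of a bushier tree: each parent node has only two children, which saves storage space. On the other hand, a binary tree may be deeper than a bushier tree, and the resulting classification may be less refined.

5. Construct a classification and regression tree to classify salary based on the other variables. Do as much as you can by hand, before turning to the software.

Data:

| No. | Occupation | Gender | Age | Salary | Salary level |
|---|---|---|---|---|---|
| 1 | Service | Female | 45 | $48,000 | Level 3 |
| 2 | Service | Male | 25 | $25,000 | Level 1 |
| 3 | Service | Male | 33 | $35,000 | Level 2 |
| 4 | Management | Male | 25 | $45,000 | Level 3 |
| 5 | Management | Female | 35 | $65,000 | Level 4 |
| 6 | Management | Male | 26 | $45,000 | Level 3 |
| 7 | Management | Female | 45 | $70,000 | Level 4 |
| 8 | Sales | Female | 40 | $50,000 | Level 3 |
| 9 | Sales | Male | 30 | $40,000 | Level 2 |
| 10 | Staff | Female | 50 | $40,000 | Level 2 |
| 11 | Staff | Male | 25 | $25,000 | Level 1 |

Salary is grouped into four levels, and the level is the target class: Level 1 = $25,000, Level 2 = $35,000 to $40,000, Level 3 = $45,000 to $50,000, Level 4 = $65,000 to $70,000.

Candidate splits for t = root node:

| Candidate split | Left child node, tL | Right child node, tR |
|---|---|---|
| 1 | Occupation = Service | Occupation = Management, Sales, Staff |
| 2 | Occupation = Management | Occupation = Service, Sales, Staff |
| 3 | Occupation = Sales | Occupation = Service, Management, Staff |
| 4 | Occupation = Staff | Occupation = Service, Management, Sales |
| 5 | Gender = Female | Gender = Male |
| 6 | Age ≤ 25 | Age > 25 |
| 7 | Age ≤ 26 | Age > 26 |
| 8 | Age ≤ 30 | Age > 30 |
| 9 | Age ≤ 33 | Age > 33 |
| 10 | Age ≤ 35 | Age > 35 |
| 11 | Age ≤ 40 | Age > 40 |
| 12 | Age ≤ 45 | Age > 45 |

For a candidate split s at node t, the optimality measure is

$$\Phi(s|t) = 2P_LP_R\,Q(s|t), \qquad Q(s|t) = \sum_{j=1}^{4} \left| P(j|t_L) - P(j|t_R) \right|$$

where PL and PR are the proportions of the node's records sent to the left and right child, and P(j|tL), P(j|tR) are the proportions of Level j records in each child.

Values of the components of the optimality measure Φ(s|t) for each candidate split, for the root node:

| Split | PL | PR | P(1\|tL) | P(2\|tL) | P(3\|tL) | P(4\|tL) | P(1\|tR) | P(2\|tR) | P(3\|tR) | P(4\|tR) | 2PLPR | Q(s\|t) | Φ(s\|t) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.27 | 0.73 | 0.33 | 0.33 | 0.33 | 0.00 | 0.13 | 0.25 | 0.38 | 0.25 | 0.40 | 0.58 | 0.23 |
| 2 | 0.36 | 0.64 | 0.00 | 0.00 | 0.50 | 0.50 | 0.29 | 0.43 | 0.29 | 0.00 | 0.46 | 1.43 | 0.66 |
| 3 | 0.18 | 0.82 | 0.00 | 0.50 | 0.50 | 0.00 | 0.22 | 0.22 | 0.33 | 0.22 | 0.30 | 0.89 | 0.26 |
| 4 | 0.18 | 0.82 | 0.50 | 0.50 | 0.00 | 0.00 | 0.11 | 0.22 | 0.44 | 0.22 | 0.30 | 1.33 | 0.40 |
| 5 | 0.45 | 0.55 | 0.00 | 0.20 | 0.40 | 0.40 | 0.33 | 0.33 | 0.33 | 0.00 | 0.50 | 0.93 | 0.46 |
| 6 | 0.27 | 0.73 | 0.67 | 0.00 | 0.33 | 0.00 | 0.00 | 0.38 | 0.38 | 0.25 | 0.40 | 1.33 | 0.53 |
| 7 | 0.36 | 0.64 | 0.50 | 0.00 | 0.50 | 0.00 | 0.00 | 0.43 | 0.29 | 0.29 | 0.46 | 1.43 | 0.66 |
| 8 | 0.45 | 0.55 | 0.40 | 0.20 | 0.40 | 0.00 | 0.00 | 0.33 | 0.33 | 0.33 | 0.50 | 0.93 | 0.46 |
| 9 | 0.55 | 0.45 | 0.33 | 0.33 | 0.33 | 0.00 | 0.00 | 0.20 | 0.40 | 0.40 | 0.50 | 0.93 | 0.46 |
| 10 | 0.64 | 0.36 | 0.29 | 0.29 | 0.29 | 0.14 | 0.00 | 0.25 | 0.50 | 0.25 | 0.46 | 0.64 | 0.30 |
| 11 | 0.73 | 0.27 | 0.25 | 0.25 | 0.38 | 0.13 | 0.00 | 0.33 | 0.33 | 0.33 | 0.40 | 0.58 | 0.23 |
| 12 | 0.91 | 0.09 | 0.20 | 0.20 | 0.40 | 0.20 | 0.00 | 1.00 | 0.00 | 0.00 | 0.17 | 1.60 | 0.26 |

The optimality measure is maximized at 0.66 by split 2: Occupation = Management (left branch) versus Occupation = Service, Sales or Staff (right branch). Split 7 (Age ≤ 26) reaches the same value; the occupation split is used here. After the first split, the left child has records 4, 5, 6, 7 and the right child has records 1, 2, 3, 8, 9, 10, 11.

Now we split the left child (decision node A), which has records 4, 5, 6, 7.

| Candidate split | Left child node, tL | Right child node, tR |
|---|---|---|
| 5 | Gender = Male | Gender = Female |
| 6 | Age ≤ 25 | Age > 25 |
| 7 | Age ≤ 26 | Age > 26 |
| 10 | Age ≤ 35 | Age > 35 |

Values of the components of the optimality measure Φ(s|t) for each candidate split, for decision node A:

| Split | PL | PR | P(1\|tL) | P(2\|tL) | P(3\|tL) | P(4\|tL) | P(1\|tR) | P(2\|tR) | P(3\|tR) | P(4\|tR) | 2PLPR | Q(s\|t) | Φ(s\|t) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 0.50 | 0.50 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.50 | 2.00 | 1.00 |
| 6 | 0.25 | 0.75 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.33 | 0.67 | 0.38 | 1.33 | 0.50 |
| 7 | 0.50 | 0.50 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.50 | 2.00 | 1.00 |
| 10 | 0.75 | 0.25 | 0.00 | 0.00 | 0.67 | 0.33 | 0.00 | 0.00 | 0.00 | 1.00 | 0.38 | 1.33 | 0.50 |

The optimality measure is maximized at 1.00 by Gender = Male (left branch) versus Gender = Female (right branch); split 7 (Age ≤ 26) also reaches 1.00. After this split, both the left and the right branch terminate in pure leaf nodes. The left child has records 4 and 6, with value Level 3, and the right child has records 5 and 7, with value Level 4.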
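The Φ(s|t) values in these tables can be checked mechanically. The sketch below is a hypothetical helper, not part of the original homework: it stores the 11 records with their salary levels, evaluates Φ(s|t) = 2·PL·PR·Σj |P(j|tL) − P(j|tR)| for a split given as a left-child predicate, and prints the value for each root-node candidate. The names `RECORDS`, `phi`, `occupation_is`, `age_at_most` and `ROOT_SPLITS` are all illustrative choices.

```python
# Hypothetical helper (not from the original post): recomputes the CART
# optimality measure Phi(s|t) for candidate splits on the 11 salary records.

# (occupation, gender, age, salary level), keyed by record number
RECORDS = {
    1: ("Service", "Female", 45, 3),
    2: ("Service", "Male", 25, 1),
    3: ("Service", "Male", 33, 2),
    4: ("Management", "Male", 25, 3),
    5: ("Management", "Female", 35, 4),
    6: ("Management", "Male", 26, 3),
    7: ("Management", "Female", 45, 4),
    8: ("Sales", "Female", 40, 3),
    9: ("Sales", "Male", 30, 2),
    10: ("Staff", "Female", 50, 2),
    11: ("Staff", "Male", 25, 1),
}
LEVELS = (1, 2, 3, 4)


def class_proportions(ids):
    """Return P(j|t) for every salary level j among the given record ids."""
    return {j: sum(RECORDS[i][3] == j for i in ids) / len(ids) for j in LEVELS}


def phi(node_ids, goes_left):
    """Phi(s|t) = 2 * PL * PR * sum_j |P(j|tL) - P(j|tR)|.

    goes_left(record) decides whether a record is sent to the left child.
    """
    left = [i for i in node_ids if goes_left(RECORDS[i])]
    right = [i for i in node_ids if not goes_left(RECORDS[i])]
    if not left or not right:
        return 0.0  # a split that leaves one child empty separates nothing
    p_l = len(left) / len(node_ids)
    p_r = len(right) / len(node_ids)
    prop_l, prop_r = class_proportions(left), class_proportions(right)
    q = sum(abs(prop_l[j] - prop_r[j]) for j in LEVELS)
    return 2 * p_l * p_r * q


def occupation_is(name):
    """Left-child condition: the record has the given occupation."""
    return lambda rec: rec[0] == name


def age_at_most(cut):
    """Left-child condition: the record's age is at most `cut`."""
    return lambda rec: rec[2] <= cut


# The twelve root-node candidate splits, described by their left-child condition.
ROOT_SPLITS = {
    1: occupation_is("Service"),
    2: occupation_is("Management"),
    3: occupation_is("Sales"),
    4: occupation_is("Staff"),
    5: lambda rec: rec[1] == "Female",
    6: age_at_most(25),
    7: age_at_most(26),
    8: age_at_most(30),
    9: age_at_most(33),
    10: age_at_most(35),
    11: age_at_most(40),
    12: age_at_most(45),
}

if __name__ == "__main__":
    root = list(RECORDS)
    for number, split in ROOT_SPLITS.items():
        print(f"split {number:2d}: Phi = {phi(root, split):.2f}")
```

The same `phi` call works for the lower decision nodes by passing only that node's record ids. For example, `phi([4, 5, 6, 7], lambda rec: rec[1] == "Male")` evaluates the gender split at decision node A.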
Now we split the right child of the root node (decision node B), which has records 1, 2, 3, 8, 9, 10, 11.

| Candidate split | Left child node, tL | Right child node, tR |
|---|---|---|
| 1 | Occupation = Service | Occupation = Sales, Staff |
| 3 | Occupation = Sales | Occupation = Service, Staff |
| 4 | Occupation = Staff | Occupation = Service, Sales |
| 5 | Gender = Female | Gender = Male |
| 6 | Age ≤ 25 | Age > 25 |
| 8 | Age ≤ 30 | Age > 30 |
| 9 | Age ≤ 33 | Age > 33 |
| 11 | Age ≤ 40 | Age > 40 |
| 12 | Age ≤ 45 | Age > 45 |

Values of the components of the optimality measure Φ(s|t) for each candidate split, for decision node B:

| Split | PL | PR | P(1\|tL) | P(2\|tL) | P(3\|tL) | P(4\|tL) | P(1\|tR) | P(2\|tR) | P(3\|tR) | P(4\|tR) | 2PLPR | Q(s\|t) | Φ(s\|t) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.43 | 0.57 | 0.33 | 0.33 | 0.33 | 0.00 | 0.25 | 0.50 | 0.25 | 0.00 | 0.49 | 0.33 | 0.16 |
| 3 | 0.29 | 0.71 | 0.00 | 0.50 | 0.50 | 0.00 | 0.40 | 0.40 | 0.20 | 0.00 | 0.41 | 0.80 | 0.33 |
| 4 | 0.29 | 0.71 | 0.50 | 0.50 | 0.00 | 0.00 | 0.20 | 0.40 | 0.40 | 0.00 | 0.41 | 0.80 | 0.33 |
| 5 | 0.43 | 0.57 | 0.00 | 0.33 | 0.67 | 0.00 | 0.50 | 0.50 | 0.00 | 0.00 | 0.49 | 1.33 | 0.65 |
| 6 | 0.29 | 0.71 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.60 | 0.40 | 0.00 | 0.41 | 2.00 | 0.82 |
| 8 | 0.43 | 0.57 | 0.67 | 0.33 | 0.00 | 0.00 | 0.00 | 0.50 | 0.50 | 0.00 | 0.49 | 1.33 | 0.65 |
| 9 | 0.57 | 0.43 | 0.50 | 0.50 | 0.00 | 0.00 | 0.00 | 0.33 | 0.67 | 0.00 | 0.49 | 1.33 | 0.65 |
| 11 | 0.71 | 0.29 | 0.40 | 0.40 | 0.20 | 0.00 | 0.00 | 0.50 | 0.50 | 0.00 | 0.41 | 0.80 | 0.33 |
| 12 | 0.86 | 0.14 | 0.33 | 0.33 | 0.33 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.24 | 1.33 | 0.33 |

The optimality measure is maximized at 0.82 by Age ≤ 25 (left branch) versus Age > 25 (right branch). After this split, the left branch terminates in a pure leaf node, which has records 2 and 11 and value Level 1. The right branch has records 1, 3, 8, 9, 10.

Now we split the right child (decision node C), which has records 1, 3, 8, 9, 10.

| Candidate split | Left child node, tL | Right child node, tR |
|---|---|---|
| 1 | Occupation = Service | Occupation = Sales, Staff |
| 3 | Occupation = Sales | Occupation = Service, Staff |
| 4 | Occupation = Staff | Occupation = Service, Sales |
| 5 | Gender = Female | Gender = Male |
| 8 | Age ≤ 30 | Age > 30 |
| 9 | Age ≤ 33 | Age > 33 |
| 11 | Age ≤ 40 | Age > 40 |
| 12 | Age ≤ 45 | Age > 45 |

Values of the components of the optimality measure Φ(s|t) for each candidate split, for decision node C:

| Split | PL | PR | P(1\|tL) | P(2\|tL) | P(3\|tL) | P(4\|tL) | P(1\|tR) | P(2\|tR) | P(3\|tR) | P(4\|tR) | 2PLPR | Q(s\|t) | Φ(s\|t) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.40 | 0.60 | 0.00 | 0.50 | 0.50 | 0.00 | 0.00 | 0.67 | 0.33 | 0.00 | 0.48 | 0.33 | 0.16 |
| 3 | 0.40 | 0.60 | 0.00 | 0.50 | 0.50 | 0.00 | 0.00 | 0.67 | 0.33 | 0.00 | 0.48 | 0.33 | 0.16 |
| 4 | 0.20 | 0.80 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.50 | 0.50 | 0.00 | 0.32 | 1.00 | 0.32 |
| 5 | 0.60 | 0.40 | 0.00 | 0.33 | 0.67 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.48 | 1.33 | 0.64 |
| 8 | 0.20 | 0.80 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.50 | 0.50 | 0.00 | 0.32 | 1.00 | 0.32 |
| 9 | 0.40 | 0.60 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.33 | 0.67 | 0.00 | 0.48 | 1.33 | 0.64 |
| 11 | 0.60 | 0.40 | 0.00 | 0.67 | 0.33 | 0.00 | 0.00 | 0.50 | 0.50 | 0.00 | 0.48 | 0.33 | 0.16 |
| 12 | 0.80 | 0.20 | 0.00 | 0.50 | 0.50 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.32 | 1.00 | 0.32 |

The optimality measure is maximized at 0.64 by Gender = Female (left branch) versus Gender = Male (right branch); split 9 (Age ≤ 33) reaches the same value. After this split, the right branch terminates in a pure leaf node, which has records 3 and 9 and value Level 2. The left branch has records 1, 8, 10.

Now we split the left child (decision node D), which has records 1, 8, 10.

| Candidate split | Left child node, tL | Right child node, tR |
|---|---|---|
| 1 | Occupation = Service | Occupation = Sales, Staff |
| 3 | Occupation = Sales | Occupation = Service, Staff |
| 4 | Occupation = Staff | Occupation = Service, Sales |
| 11 | Age ≤ 40 | Age > 40 |
| 12 | Age ≤ 45 | Age > 45 |

Values of the components of the optimality measure Φ(s|t) for each candidate split, for decision node D:

| Split | PL | PR | P(1\|tL) | P(2\|tL) | P(3\|tL) | P(4\|tL) | P(1\|tR) | P(2\|tR) | P(3\|tR) | P(4\|tR) | 2PLPR | Q(s\|t) | Φ(s\|t) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.33 | 0.67 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.50 | 0.50 | 0.00 | 0.44 | 1.00 | 0.44 |
| 3 | 0.33 | 0.67 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.50 | 0.50 | 0.00 | 0.44 | 1.00 | 0.44 |
| 4 | 0.33 | 0.67 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.44 | 2.00 | 0.89 |
| 11 | 0.33 | 0.67 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.50 | 0.50 | 0.00 | 0.44 | 1.00 | 0.44 |
| 12 | 0.67 | 0.33 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.44 | 2.00 | 0.89 |

The optimality measure is maximized at 0.89 by Occupation = Staff (left branch) versus Occupation = Service or Sales (right branch); split 12 (Age ≤ 45) reaches the same value. After this split, both the left and the right branch terminate in pure leaf nodes. The left branch has record 10, with value Level 2, and the right branch has records 1 and 8, with value Level 3.

In summary, we construct the CART tree below:

Root Node (all records): Occupation = Management vs. not Management
- Occupation = Management → Decision Node A (records 4, 5, 6, 7)
  - Gender = Male → Level 3 (records 4, 6)
  - Gender = Female → Level 4 (records 5, 7)
- Occupation ≠ Management → Decision Node B (records 1, 2, 3, 8, 9, 10, 11)
  - Age ≤ 25 → Level 1 (records 2, 11)
  - Age > 25 → Decision Node C (records 1, 3, 8, 9, 10)
    - Gender = Male → Level 2 (records 3, 9)
    - Gender = Female → Decision Node D (records 1, 8, 10)
      - Occupation = Staff → Level 2 (record 10)
      - Occupation = Service or Sales → Level 3 (records 1, 8)

6. Construct a C4.5 decision tree to classify salary based on the other variables. Do as much as you can by hand, before turning to the software.

Below are all candidate splits and their information gain for the root node:

| Candidate split | Child nodes | Information gain (bits) |
|---|---|---|
| 1 | Occupation = Service / Management / Sales / Staff | 0.78 |
| 2 | Gender = Female / Gender = Male | 0.38 |
| 3 | Age ≤ 25 / Age > 25 | 0.55 |
| 4 | Age ≤ 26 / Age > 26 | 0.58 |
| 5 | Age ≤ 30 / Age > 30 | 0.38 |
| 6 | Age ≤ 33 / Age > 33 | 0.38 |
| 7 | Age ≤ 35 / Age > 35 | 0.15 |
| 8 | Age ≤ 40 / Age > 40 | 0.12 |
| 9 | Age ≤ 45 / Age > 45 | 0.19 |

Candidate split 1 has the highest information gain, 0.78 bits, and is chosen for the initial split. The initial split produces four second-level decision nodes: A, B, C and D. The same process is then repeated until every leaf node contains records with a single target class value. The C4.5 decision tree is below:

Root Node (all records): Occupation = Service, Management, Sales or Staff
- Occupation = Service → Decision Node A (records 1, 2, 3)
  - Gender = Female → Level 3 (record 1)
  - Gender = Male → Decision Node E (records 2, 3)
    - Age ≤ 25 → Level 1 (record 2)
    - Age > 25 → Level 2 (record 3)
- Occupation = Management → Decision Node B (records 4, 5, 6, 7)
  - Gender = Female → Level 4 (records 5, 7)
  - Gender = Male → Level 3 (records 4, 6)
- Occupation = Sales → Decision Node C (records 8, 9)
  - Gender = Female → Level 3 (record 8)
  - Gender = Male → Level 2 (record 9)
- Occupation = Staff → Decision Node D (records 10, 11)
  - Gender = Female → Level 2 (record 10)
  - Gender = Male → Level 1 (record 11)

7. Compare the two decision trees and discuss the benefits and drawbacks of each.

In this case, the CART tree is deeper than the C4.5 tree: its deepest leaves sit below four splits, versus three in the C4.5 tree. The CART algorithm requires every node (except leaf nodes) to have exactly two children, whereas the C4.5 algorithm does not have this restriction. On the other hand, most leaf nodes of the C4.5 tree contain only one record, which may cause overfitting.

8. Generate the full set of decision rules for the CART decision tree.

| Antecedent | Consequent | Support | Confidence |
|---|---|---|---|
| If Occupation = Management and Gender = Male | then Level 3 | 2/11 | 1.0 |
| If Occupation = Management and Gender = Female | then Level 4 | 2/11 | 1.0 |
| If Occupation = Service, Sales or Staff and Age ≤ 25 | then Level 1 | 2/11 | 1.0 |
| If Occupation = Staff and Age > 25 and Gender = Female | then Level 2 | 1/11 | 1.0 |
| If Occupation = Service or Sales and Age > 25 and Gender = Female | then Level 3 | 2/11 | 1.0 |
| If Occupation = Service, Sales or Staff and Age > 25 and Gender = Male | then Level 2 | 2/11 | 1.0 |

9. Generate the full set of decision rules for the C4.5 decision tree.

| Antecedent | Consequent | Support | Confidence |
|---|---|---|---|
| If Occupation = Service and Gender = Female | then Level 3 | 1/11 | 1.0 |
| If Occupation = Service and Gender = Male and Age ≤ 25 | then Level 1 | 1/11 | 1.0 |
| If Occupation = Service and Gender = Male and Age > 25 | then Level 2 | 1/11 | 1.0 |
| If Occupation = Management and Gender = Female | then Level 4 | 2/11 | 1.0 |
| If Occupation = Management and Gender = Male | then Level 3 | 2/11 | 1.0 |
| If Occupation = Sales and Gender = Female | then Level 3 | 1/11 | 1.0 |
| If Occupation = Sales and Gender = Male | then Level 2 | 1/11 | 1.0 |
| If Occupation = Staff and Gender = Female | then Level 2 | 1/11 | 1.0 |
| If Occupation = Staff and Gender = Male | then Level 1 | 1/11 | 1.0 |

10. Compare the two sets of decision rules and discuss the benefits and drawbacks of each.

CART allows only two branches per node, so its rules have higher support than the C4.5 rules; that is to say, the result is less refined. A CART tree is also usually deeper than a multiway tree, which makes it more difficult to interpret. C4.5 can have several branches per node. The support of the C4.5 rules is lower than that of the CART rules, and the result is more refined.
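The root-node information gains listed for question 6 can be reproduced with a few lines of code. The sketch below is hypothetical and not the author's method: it computes the Shannon entropy of the salary levels and subtracts the size-weighted entropy of the child nodes for each candidate split. Like the post, it ranks splits by plain information gain; Quinlan's C4.5 normally ranks by the gain ratio, so this follows the homework rather than the full algorithm. The names `ROWS`, `entropy`, `information_gain`, `age_split` and `CANDIDATES` are illustrative.

```python
# Hypothetical helper (not from the original post): recomputes the information
# gain of each root-node candidate split used in the C4.5 exercise.
from collections import Counter
from math import log2

# (occupation, gender, age, salary level) for records 1 to 11
ROWS = [
    ("Service", "Female", 45, 3),
    ("Service", "Male", 25, 1),
    ("Service", "Male", 33, 2),
    ("Management", "Male", 25, 3),
    ("Management", "Female", 35, 4),
    ("Management", "Male", 26, 3),
    ("Management", "Female", 45, 4),
    ("Sales", "Female", 40, 3),
    ("Sales", "Male", 30, 2),
    ("Staff", "Female", 50, 2),
    ("Staff", "Male", 25, 1),
]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, branch_of):
    """H(parent) minus the size-weighted entropy of the child nodes.

    branch_of(row) returns the key of the child node a row falls into, so the
    same function handles the 4-way occupation split and the binary splits.
    """
    children = {}
    for row in rows:
        children.setdefault(branch_of(row), []).append(row[3])
    weighted = sum(len(labels) / len(rows) * entropy(labels)
                   for labels in children.values())
    return entropy([row[3] for row in rows]) - weighted


def age_split(cut):
    """Binary split: True for Age <= cut, False for Age > cut."""
    return lambda row: row[2] <= cut


CANDIDATES = {"Occupation": lambda row: row[0], "Gender": lambda row: row[1]}
for cut in (25, 26, 30, 33, 35, 40, 45):
    CANDIDATES[f"Age <= {cut}"] = age_split(cut)

if __name__ == "__main__":
    for name, split in CANDIDATES.items():
        print(f"{name:12s} gain = {information_gain(ROWS, split):.2f} bits")
```

Applying `information_gain` to the subset of rows at a lower decision node (for example, only the Service rows) repeats the same step further down the tree.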
