Tuesday, September 23, 2014

Plot C5.0 decision trees in R

Introduction

Users of the C50 R package often want to present the resulting decision tree in graphical form, but such a plotting feature is not available in the C50 package. The R code on this page works around this by automatically translating the decision tree into the GraphViz dot language, so that the tree model can then be plotted directly with GraphViz's dot command. Users without a local installation of GraphViz (open source software) may freely acquire it from this site.

The steps to plot the decision tree are the following:
  1. Generate your model using the C50 package in R, as usual.
  2. Translate the model output and save it to a file by calling the C5.0.graphviz function (given later on this page), passing your C5.0 model and the desired text output filename as parameters.
  3. In your operating system, call GraphViz's dot command with the proper parameters (example given below).

Usage

C5.0.graphviz(C5.0.model, filename, fontname = 'Arial', col.draw = 'black',
              col.font = 'blue', col.conclusion = 'lightpink', col.question = 'grey78',
              shape.conclusion = 'box3d', shape.question = 'diamond',
              bool.substitute = 'None', prefix = FALSE, vertical = TRUE)

Arguments

C5.0.model       The name of a variable which is a valid C5.0 model result.
filename         The name of a file where the output GraphViz model will be saved to.
fontname         The font that will be used for the graph.
col.draw         The color of the drawing lines.
col.font         The color of the font.
col.conclusion   The color which will be used for conclusion nodes (tree leaves).
col.question     The color which will be used for question nodes (tree inner nodes).
shape.conclusion The shape which will be used for conclusion nodes (tree leaves).
shape.question   The shape which will be used for question nodes (tree inner nodes).
bool.substitute  Optional substitution for boolean comparisons; one of 'None', 'yesno', 'truefalse' or 'TF'. The default, 'None', plots '= 0' and '= 1' on the respective decision tree branches. The option 'yesno' plots 'no' and 'yes' respectively, 'truefalse' plots 'false' and 'true', and 'TF' plots 'F' and 'T' (treating the value 0 as false and 1 as true).
prefix           When set to TRUE, the class nodes have the prefix 'Class' before the class number (useful for multiclass problems where the class is referred to by a single number).
vertical         The orientation of the decision tree. The default is vertical (top to bottom); if set to FALSE, the tree is drawn from left to right.

Details

In GraphViz, the X11 color scheme, the SVG scheme, and the Brewer scheme are supported, with X11 being the default. For an exhaustive list of candidate colors, the user can check the respective GraphViz page here.

An exhaustive list of candidate node shapes is also included in this GraphViz page.

Some information on GraphViz's fonts can be found here.

Value

If successful, the function creates a text file at the given path, containing the decision tree model described in GraphViz's dot language. The same description is also printed to the console.

Note

Regarding boolean substitution with the bool.substitute parameter: in this version the routine cannot tell whether a comparison is truly boolean, nor does it recognize boolean comparisons that use other numeric values (e.g. between '1' and '2'). The substitution is applied only to the branch labels '= 0' and '= 1' (both are required, as options of the inner node, for a successful substitution).
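The label-based nature of this substitution can be illustrated in isolation. The following is only a minimal sketch: the lookup table mirrors the one defined inside the function listed below, while the helper name substitute_label is hypothetical and not part of the original code.

```r
# Lookup table mirroring the 'substitutes' data frame used in C5.0.graphviz.
substitutes <- data.frame(None      = c('= 0', '= 1'),
                          yesno     = c('no', 'yes'),
                          truefalse = c('false', 'true'),
                          TF        = c('F', 'T'),
                          stringsAsFactors = FALSE)

# Hypothetical helper: replace a branch label only if it is exactly '= 0' or '= 1'.
# The decision is made on the label text alone, so the helper cannot know
# whether the underlying attribute is really boolean.
substitute_label <- function(label, scheme = 'None') {
  row <- match(label, c('= 0', '= 1'))  # 1 for '= 0', 2 for '= 1', NA otherwise
  if (is.na(row)) return(label)         # any other label is left untouched
  substitutes[row, scheme]
}

substitute_label('= 0', 'yesno')      # "no"
substitute_label('= 1', 'truefalse')  # "true"
substitute_label('= 2', 'yesno')      # "= 2" (not recognized as boolean)
```

Because only the text of the branch label is inspected, a comparison between '1' and '2' passes through unchanged, which is the limitation described in the note.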

Update: Version 2 extends the translation into multi-branched trees. Version 1 was able to handle only trees with binary splits. 
Version 2.2 corrects a missing initialization of the firstindent variable.

Example

We use the example from the C50 package, a data set from the MLC++ machine learning software for modeling customer churn.

library(C50)
data(churn)
treeModel <- C5.0(x = churnTrain[, -20], y = churnTrain$churn)
summary(treeModel) #to compare output
C5.0.graphviz(treeModel, 'c:\\dtreeout.txt', col.question ='cyan')

(The output generated by the C5.0.graphviz routine, contained in the dtreeout.txt file, is shown in the Appendix at the end of this page.)

Then, in the operating system, we make sure the dot command of the GraphViz package is accessible (either by having its directory in the path or by navigating to that directory) and enter the following command (here, from a Windows command prompt):

dot -Tpng c:\dtreeout.txt > c:\dtreeout.png
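The same rendering step can also be launched from within R instead of a command prompt. The sketch below is only one possible way to do it: the helper names dot_args and render_dot are hypothetical, and it assumes that the dot executable is on the system path. It uses dot's -o option to name the output file instead of shell redirection.

```r
# Hypothetical helper: build the argument vector for the dot command.
dot_args <- function(infile, outfile, format = "png") {
  c(paste0("-T", format), shQuote(infile), "-o", shQuote(outfile))
}

# Hypothetical helper: run dot if it can be found, otherwise warn and do nothing.
render_dot <- function(infile, outfile, format = "png") {
  dot_path <- unname(Sys.which("dot"))
  if (!nzchar(dot_path)) {
    warning("GraphViz 'dot' was not found on the path; nothing was rendered.")
    return(invisible(FALSE))
  }
  status <- system2(dot_path, dot_args(infile, outfile, format))
  invisible(status == 0)  # TRUE when dot exited without error
}

# Example call (paths as in the example above):
# render_dot("c:\\dtreeout.txt", "c:\\dtreeout.png")
```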

This command produces an image file named 'dtreeout.png'. The file contains the following graph, which depicts the example decision tree (not shown at actual size here):



Code

#---------------------------------------------------------#
# Function: C5.0.graphviz                                 # 
# Version: 2.2.0                                          #
# Date: 26/09/2014                                        #
# Author: Athanasios Tsakonas                             #
# This code implements C5.0.graphviz conversion routine   #
#---------------------------------------------------------#

C5.0.graphviz <- function( C5.0.model, filename, fontname ='Arial',col.draw ='black',
col.font ='blue',col.conclusion ='lightpink',col.question = 'grey78',
shape.conclusion ='box3d',shape.question ='diamond', 
bool.substitute = 'None', prefix=FALSE, vertical=TRUE ) {

library(cwhmisc)  
library(stringr) 
treeout <- C5.0.model$output
treeout<- substr(treeout, cpos(treeout, 'Decision tree:', start=1)+14,nchar(treeout))
treeout<- substr(treeout, 1,cpos(treeout, 'Evaluation on training data', start=1)-2)
variables <- data.frame(matrix(nrow=500, ncol=4)) 
names(variables) <- c('SYMBOL','TOKEN', 'TYPE' , 'QUERY') 
connectors <- data.frame(matrix(nrow=500, ncol=3)) 
names(connectors) <- c('TOKEN', 'START','END')
theStack <- data.frame(matrix(nrow=500, ncol=1)) 
names(theStack) <- c('ITEM')
theStackIndex <- 1
currentvar <- 1
currentcon <- 1
open_connection <- TRUE
previousindent <- -1
firstindent <- 4
substitutes <- data.frame(None=c('= 0','= 1'), yesno=c('no','yes'),
truefalse=c('false', 'true'),TF=c('F','T'))
dtreestring<-unlist( scan(text= treeout,   sep='\n', what =list('character')))

for (linecount in c(1:length(dtreestring))) {
lineindent<-0
shortstring <- str_trim(dtreestring[linecount], side='left')
leadingspaces <- nchar(dtreestring[linecount]) - nchar(shortstring)
lineindent <- leadingspaces/4
dtreestring[linecount]<-str_trim(dtreestring[linecount], side='left') 
while (!is.na(cpos(dtreestring[linecount], ':   ', start=1)) ) {
lineindent<-lineindent + 1 
dtreestring[linecount]<-substr(dtreestring[linecount],
ifelse(is.na(cpos(dtreestring[linecount], ':   ', start=1)), 1,
cpos(dtreestring[linecount], ':   ', start=1)+4),
nchar(dtreestring[linecount]) )
shortstring <- str_trim(dtreestring[linecount], side='left')
leadingspaces <- nchar(dtreestring[linecount]) - nchar(shortstring)
lineindent <- lineindent + leadingspaces/4
dtreestring[linecount]<-str_trim(dtreestring[linecount], side='left')
}
if (!is.na(cpos(dtreestring[linecount], ':...', start=1)))
lineindent<- lineindent +  1 
dtreestring[linecount]<-substr(dtreestring[linecount],
ifelse(is.na(cpos(dtreestring[linecount], ':...', start=1)), 1,
cpos(dtreestring[linecount], ':...', start=1)+4),
nchar(dtreestring[linecount]) )
dtreestring[linecount]<-str_trim(dtreestring[linecount])
stringlist <- strsplit(dtreestring[linecount],'\\:')
stringpart <- strsplit(unlist(stringlist)[1],'\\s')
if (open_connection==TRUE) { 
variables[currentvar,'TOKEN'] <- unlist(stringpart)[1]
variables[currentvar,'SYMBOL'] <- paste('node',as.character(currentvar), sep='')
variables[currentvar,'TYPE'] <- shape.question
variables[currentvar,'QUERY'] <- 1
   theStack[theStackIndex,'ITEM']<-variables[currentvar,'SYMBOL']
theStack[theStackIndex,'INDENT'] <-firstindent 
theStackIndex<-theStackIndex+1
currentvar <- currentvar + 1
if(currentvar>2) {
  connectors[currentcon - 1,'END'] <- variables[currentvar - 1, 'SYMBOL']
}
   }
connectors[currentcon,'TOKEN'] <- paste(unlist(stringpart)[2],unlist(stringpart)[3])
if (connectors[currentcon,'TOKEN']=='= 0') 
connectors[currentcon,'TOKEN'] <- as.character(substitutes[1,bool.substitute])
if (connectors[currentcon,'TOKEN']=='= 1') 
connectors[currentcon,'TOKEN'] <- as.character(substitutes[2,bool.substitute])
if (open_connection==TRUE) { 
if (lineindent<previousindent) {
theStackIndex <- theStackIndex-(( previousindent- lineindent)  +1 )
currentsymbol <-theStack[theStackIndex,'ITEM']
} else
currentsymbol <-variables[currentvar - 1,'SYMBOL']
} else {  
currentsymbol <-theStack[theStackIndex-((previousindent -lineindent ) +1    ),'ITEM']
theStackIndex <- theStackIndex-(( previousindent- lineindent)    )
}
connectors[currentcon, 'START'] <- currentsymbol
currentcon <- currentcon + 1
open_connection <- TRUE 
if (length(unlist(stringlist))==2) {
 stringpart2 <- strsplit(unlist(stringlist)[2],'\\s')
variables[currentvar,'TOKEN'] <- paste(ifelse((prefix==FALSE),'','Class'), unlist(stringpart2)[2]) 
variables[currentvar,'SYMBOL'] <- paste('node',as.character(currentvar), sep='')
variables[currentvar,'TYPE'] <- shape.conclusion
variables[currentvar,'QUERY'] <- 0
currentvar <- currentvar + 1
connectors[currentcon - 1,'END'] <- variables[currentvar - 1,'SYMBOL']
open_connection <- FALSE
}
previousindent<-lineindent
}
runningstring <- paste('digraph g {', 'graph ', sep='\n')
runningstring <- paste(runningstring, ' [rankdir="', sep='')
runningstring <- paste(runningstring, ifelse(vertical==TRUE,'TB','LR'), sep='' )
runningstring <- paste(runningstring, '"]', sep='')
  for (lines in c(1:(currentvar-1))) {
  runningline <- paste(variables[lines,'SYMBOL'], '[shape="')
  runningline <- paste(runningline,variables[lines,'TYPE'], sep='' )
  runningline <- paste(runningline,'" label ="', sep='' )
  runningline <- paste(runningline,variables[lines,'TOKEN'], sep='' )
  runningline <- paste(runningline,
  '" style=filled fontcolor=', sep='')
  runningline <- paste(runningline, col.font)
  runningline <- paste(runningline,' color=' )
  runningline <- paste(runningline, col.draw)
  runningline <- paste(runningline,' fontname=')
  runningline <- paste(runningline, fontname)
  runningline <- paste(runningline,' fillcolor=')
  runningline <- paste(runningline,
  ifelse(variables[lines,'QUERY']== 0 ,col.conclusion,col.question))
  runningline <- paste(runningline,'];')
  runningstring <- paste(runningstring, runningline , sep='\n')
  }
  # emit one edge statement per connector
  for (lines in c(1:(currentcon-1))) {
  runningline <- paste (connectors[lines,'START'], '->')
  runningline <- paste (runningline, connectors[lines,'END'])
  runningline <- paste (runningline,'[label="')
  runningline <- paste (runningline,connectors[lines,'TOKEN'], sep='')
  runningline <- paste (runningline,'" fontname=', sep='')
  runningline <- paste (runningline, fontname)
  runningline <- paste (runningline,'];')
  runningstring <- paste(runningstring, runningline , sep='\n')
  }
runningstring <- paste(runningstring,'}')
# print the dot description to the console, then write it to the output file
cat(runningstring)
cat(runningstring, file=filename)
}
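The way the function derives the nesting depth of each line of the C5.0 output can be illustrated separately. The following is a simplified sketch rather than the routine's own logic: it assumes that every nesting level occupies exactly four characters made up of ':', '.' and spaces, as in the summary output shown in the Appendix, and the helper name tree_depth is hypothetical.

```r
# Hypothetical helper: nesting depth of one or more lines of C5.0 tree output.
# The depth is the length of the leading run of ':', '.' and space characters
# divided by four (one four-character block per level).
tree_depth <- function(line) {
  prefix_len <- attr(regexpr("^[: .]*", line), "match.length")
  prefix_len / 4
}

tree_lines <- c("total_day_minutes > 264.4:",
                ":...voice_mail_plan = yes:",
                ":   :...international_plan = no: no (45/1)",
                ":   :   international_plan = yes: yes (8/3)")

tree_depth(tree_lines)  # 0 1 2 2
```

Lines at the same depth under the same parent are sibling branches, which is why the function above keeps a stack of open question nodes and pops it when the depth decreases.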

Appendix

The example decision tree as shown in the C50 package summary and the generated output by the C5.0.graphviz routine are shown below.

C50 summary output tree:

total_day_minutes > 264.4:
:...voice_mail_plan = yes:
:   :...international_plan = no: no (45/1)
:   :   international_plan = yes: yes (8/3)
:   voice_mail_plan = no:
:   :...total_eve_minutes > 187.7:
:       :...total_night_minutes > 126.9: yes (94/1)
:       :   total_night_minutes <= 126.9:
:       :   :...total_day_minutes <= 277: no (4)
:       :       total_day_minutes > 277: yes (3)
:       total_eve_minutes <= 187.7:
:       :...total_eve_charge <= 12.26: no (15/1)
:           total_eve_charge > 12.26:
:           :...total_day_minutes <= 277:
:               :...total_night_minutes <= 224.8: no (13)
:               :   total_night_minutes > 224.8: yes (5/1)
:               total_day_minutes > 277:
:               :...total_night_minutes > 151.9: yes (18)
:                   total_night_minutes <= 151.9:
:                   :...account_length <= 123: no (4)
:                       account_length > 123: yes (2)
total_day_minutes <= 264.4:
:...number_customer_service_calls > 3:
    :...total_day_minutes <= 160.2:
    :   :...total_eve_charge <= 19.83: yes (79/3)
    :   :   total_eve_charge > 19.83:
    :   :   :...total_day_minutes <= 120.5: yes (10)
    :   :       total_day_minutes > 120.5: no (13/3)
    :   total_day_minutes > 160.2:
    :   :...total_eve_charge > 12.05: no (130/24)
    :       total_eve_charge <= 12.05:
    :       :...total_eve_calls <= 125: yes (16/2)
    :           total_eve_calls > 125: no (3)
    number_customer_service_calls <= 3:
    :...international_plan = yes:
        :...total_intl_calls <= 2: yes (51)
        :   total_intl_calls > 2:
        :   :...total_intl_minutes <= 13.1: no (173/7)
        :       total_intl_minutes > 13.1: yes (43)
        international_plan = no:
        :...total_day_minutes <= 223.2: no (2221/60)
            total_day_minutes > 223.2:
            :...total_eve_charge <= 20.5: no (295/22)
                total_eve_charge > 20.5:
                :...voice_mail_plan = yes: no (20)
                    voice_mail_plan = no:
                    :...total_night_minutes > 174.2: yes (50/8)
                        total_night_minutes <= 174.2:
                        :...total_day_minutes <= 246.6: no (12)
                            total_day_minutes > 246.6:
                            :...total_day_charge <= 43.33: yes (4)
                                total_day_charge > 43.33: no (2)

Produced C5.0.graphviz dot description: 

digraph g {
graph [rankdir="TB"]
node1 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node2 [shape="diamond" label ="voice_mail_plan" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node3 [shape="diamond" label ="international_plan" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node4 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node5 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node6 [shape="diamond" label ="total_eve_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node7 [shape="diamond" label ="total_night_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node8 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node9 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node10 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node11 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node12 [shape="diamond" label ="total_eve_charge" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node13 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node14 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node15 [shape="diamond" label ="total_night_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node16 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node17 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node18 [shape="diamond" label ="total_night_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node19 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node20 [shape="diamond" label ="account_length" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node21 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node22 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node23 [shape="diamond" label ="number_customer_service_calls" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node24 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node25 [shape="diamond" label ="total_eve_charge" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node26 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node27 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node28 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node29 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node30 [shape="diamond" label ="total_eve_charge" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node31 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node32 [shape="diamond" label ="total_eve_calls" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node33 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node34 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node35 [shape="diamond" label ="international_plan" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node36 [shape="diamond" label ="total_intl_calls" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node37 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node38 [shape="diamond" label ="total_intl_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node39 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node40 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node41 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node42 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node43 [shape="diamond" label ="total_eve_charge" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node44 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node45 [shape="diamond" label ="voice_mail_plan" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node46 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node47 [shape="diamond" label ="total_night_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node48 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node49 [shape="diamond" label ="total_day_minutes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node50 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node51 [shape="diamond" label ="total_day_charge" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= cyan ];
node52 [shape="box3d" label =" yes" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node53 [shape="box3d" label =" no" style=filled fontcolor= blue  color= black  fontname= Arial  fillcolor= lightpink ];
node1 -> node2 [label="> 264.4" fontname= Arial ];
node2 -> node3 [label="= yes" fontname= Arial ];
node3 -> node4 [label="= no" fontname= Arial ];
node3 -> node5 [label="= yes" fontname= Arial ];
node2 -> node6 [label="= no" fontname= Arial ];
node6 -> node7 [label="> 187.7" fontname= Arial ];
node7 -> node8 [label="> 126.9" fontname= Arial ];
node7 -> node9 [label="<= 126.9" fontname= Arial ];
node9 -> node10 [label="<= 277" fontname= Arial ];
node9 -> node11 [label="> 277" fontname= Arial ];
node6 -> node12 [label="<= 187.7" fontname= Arial ];
node12 -> node13 [label="<= 12.26" fontname= Arial ];
node12 -> node14 [label="> 12.26" fontname= Arial ];
node14 -> node15 [label="<= 277" fontname= Arial ];
node15 -> node16 [label="<= 224.8" fontname= Arial ];
node15 -> node17 [label="> 224.8" fontname= Arial ];
node14 -> node18 [label="> 277" fontname= Arial ];
node18 -> node19 [label="> 151.9" fontname= Arial ];
node18 -> node20 [label="<= 151.9" fontname= Arial ];
node20 -> node21 [label="<= 123" fontname= Arial ];
node20 -> node22 [label="> 123" fontname= Arial ];
node1 -> node23 [label="<= 264.4" fontname= Arial ];
node23 -> node24 [label="> 3" fontname= Arial ];
node24 -> node25 [label="<= 160.2" fontname= Arial ];
node25 -> node26 [label="<= 19.83" fontname= Arial ];
node25 -> node27 [label="> 19.83" fontname= Arial ];
node27 -> node28 [label="<= 120.5" fontname= Arial ];
node27 -> node29 [label="> 120.5" fontname= Arial ];
node24 -> node30 [label="> 160.2" fontname= Arial ];
node30 -> node31 [label="> 12.05" fontname= Arial ];
node30 -> node32 [label="<= 12.05" fontname= Arial ];
node32 -> node33 [label="<= 125" fontname= Arial ];
node32 -> node34 [label="> 125" fontname= Arial ];
node23 -> node35 [label="<= 3" fontname= Arial ];
node35 -> node36 [label="= yes" fontname= Arial ];
node36 -> node37 [label="<= 2" fontname= Arial ];
node36 -> node38 [label="> 2" fontname= Arial ];
node38 -> node39 [label="<= 13.1" fontname= Arial ];
node38 -> node40 [label="> 13.1" fontname= Arial ];
node35 -> node41 [label="= no" fontname= Arial ];
node41 -> node42 [label="<= 223.2" fontname= Arial ];
node41 -> node43 [label="> 223.2" fontname= Arial ];
node43 -> node44 [label="<= 20.5" fontname= Arial ];
node43 -> node45 [label="> 20.5" fontname= Arial ];
node45 -> node46 [label="= yes" fontname= Arial ];
node45 -> node47 [label="= no" fontname= Arial ];
node47 -> node48 [label="> 174.2" fontname= Arial ];
node47 -> node49 [label="<= 174.2" fontname= Arial ];
node49 -> node50 [label="<= 246.6" fontname= Arial ];
node49 -> node51 [label="> 246.6" fontname= Arial ];
node51 -> node52 [label="<= 43.33" fontname= Arial ];
node51 -> node53 [label="> 43.33" fontname= Arial ]; }
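For readers who want to produce similar statements themselves, the node and edge lines above can be assembled with sprintf. This is only a sketch: the helper names dot_node and dot_edge are hypothetical, and unlike the output above the attribute values are quoted, which dot also accepts and which keeps font names containing spaces valid.

```r
# Hypothetical helper: one dot node statement.
dot_node <- function(symbol, label, shape, fillcolor,
                     fontcolor = "blue", color = "black", fontname = "Arial") {
  sprintf('%s [shape="%s" label="%s" style=filled fontcolor="%s" color="%s" fontname="%s" fillcolor="%s"];',
          symbol, shape, label, fontcolor, color, fontname, fillcolor)
}

# Hypothetical helper: one dot edge statement.
dot_edge <- function(from, to, label, fontname = "Arial") {
  sprintf('%s -> %s [label="%s" fontname="%s"];', from, to, label, fontname)
}

# A two-node fragment of the example tree, wrapped in a digraph.
statements <- c(dot_node("node1", "total_day_minutes", "diamond", "cyan"),
                dot_node("node2", "voice_mail_plan", "diamond", "cyan"),
                dot_edge("node1", "node2", "> 264.4"))
dot_text <- paste(c('digraph g {', 'graph [rankdir="TB"]', statements, '}'),
                  collapse = "\n")
cat(dot_text, "\n")
```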


22 comments:

  1. Hi,

    Thanks for great post above.
    Any chance you could let me know how hard it would be to add stats for 1/0-outcomes in the graph? (Such as in the pic here: http://exploringdatablog.blogspot.se/2013/04/classification-tree-models.html )

    NOT that I'm asking you to do it, only asking how heavy lifting you believe this would be to implement in your code above....

    Thanks again,
    Matti

  2. Hello,

    Since the statistics are already contained in the decision tree (e.g. line 3 of the decision tree above reads: : :...international_plan = no: no (45/1)) it's rather straightforward to extend the program including this value as well.
    Think of extending the code line above that reads:
    variables[currentvar,'TOKEN'] <- unlist(stringpart)[1]

    Hope it helped,
    Thanos

    Replies
    1. Thanks Thanos.
      I switched this line;
      variables[currentvar,'TOKEN'] <- paste(ifelse((prefix==FALSE),'','Class'), unlist(stringpart2)[2])

      To this;
      variables[currentvar,'TOKEN'] <- paste(ifelse((prefix==FALSE),'','Class'), unlist(stringpart2)[2], " ", unlist(stringpart2)[3], sep = "") ##Added From [2] to [3]

  6. Hi,
    I am getting following error:


    Read 342 records
    Hide Traceback

    Rerun with Debug
    Error in `[<-.data.frame`(`*tmp*`, currentcon, "START", value = c("node1", :
    replacement has 499 rows, data has 1
    4 stop(sprintf(ngettext(N, "replacement has %d row, data has %d",
    "replacement has %d rows, data has %d"), N, n), domain = NA)
    3 `[<-.data.frame`(`*tmp*`, currentcon, "START", value = c("node1",
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
    2 `[<-`(`*tmp*`, currentcon, "START", value = c("node1", NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
    1 C5.0.graphviz(decision_tree, "c:\\output.txt", col.question = "cyan")

    Replies
    1. I am having the same problem. any answer on that?

    2. Please make sure you follow all the steps as described. For example, the input to the C5.0.graphviz command should be a complete C5.0 model (not just the decision tree part of the C5.0 output).

    3. Hello, I get the same error, what could be the cause?

  7. Hi,

    I am new to R and when I am running the code
    C5.0.graphviz(model1, 'C:\\mydotfile.txt')
    I am getting the error "could not find function "C5.0.graphviz"

    I installed graphviz 2.38 msi as well

  8. Hi I am getting the below error...
    C5.0.graphviz(Model_C50, "dtreeout.txt", col.question ='cyan')
    Read 3001 records
    Error in `[<-.data.frame`(`*tmp*`, currentcon, "START", value = character(0)) :
    replacement has length zero
    Please help

    Replies
    1. Hi,
      Try to increase the model size, in the following lines, eg.:

      variables <- data.frame(matrix(nrow=2000, ncol=4))

      connectors <- data.frame(matrix(nrow=2000, ncol=3))

      theStack <- data.frame(matrix(nrow=2000, ncol=1))

      Also, ensure you enter the complete model as an argument, not only the decision tree part.

  9. I have installed graphviz but still getting this error:
    "could not find function "C5.0.graphviz"
    is there any configuration I need to do after instalation?

    Replies
    1. Hi,
      You have to follow these steps:
      1. Open R, load the C5.0.graphviz code (or type it within R environment).
      2. Run your C5.0 model.
      3. Execute the C5.0.graphviz with proper parameters.
      4. Find your generated output in your OS directory.
      5. Execute the graphviz command (as stated above).

      Note that the latest C5.0 versions in R offer a plotting function. That function, although not as versatile as a generic graphviz command, provides complete information for the C5.0 tree in the plot.

  10. I get this error:-

    Error in if (start + lsub1 > lstr) return(NA) else { :
    missing value where TRUE/FALSE needed

  11. This post is really great. If you use Mac, it is enough to use the R system command: system("dot -T png -O ~/directoryPath/c5.txt")
