\documentclass[nojss]{jss} %\VignetteIndexEntry{partykit: A Toolkit for Recursive Partytioning} %\VignetteDepends{partykit} %\VignetteKeywords{recursive partitioning, regression trees, classification trees, decision trees} %\VignettePackage{partykit} %% packages \usepackage{amstext} \usepackage{amsfonts} \usepackage{amsmath} \usepackage{thumbpdf} \usepackage{rotating} %% need no \usepackage{Sweave} %% additional commands \newcommand{\squote}[1]{`{#1}'} \newcommand{\dquote}[1]{``{#1}''} \newcommand{\fct}[1]{\texttt{#1()}} \newcommand{\class}[1]{\squote{\texttt{#1}}} %% for internal use \newcommand{\fixme}[1]{\emph{\marginpar{FIXME} (#1)}} \newcommand{\readme}[1]{\emph{\marginpar{README} (#1)}} \hyphenation{Qua-dra-tic} \title{\pkg{partykit}: A Toolkit for Recursive Partytioning} \Plaintitle{partykit: A Toolkit for Recursive Partytioning} \author{Achim Zeileis\\Universit\"at Innsbruck \And Torsten Hothorn\\Universit\"at Z\"urich} \Plainauthor{Achim Zeileis, Torsten Hothorn} \Abstract{ The \pkg{partykit} package provides a flexible toolkit with infrastructure for learning, representing, summarizing, and visualizing a wide range of tree-structured regression and classification models. The functionality encompasses: (a)~Basic infrastructure for \emph{representing} trees (inferred by any algorithm) so that unified \code{print}/\code{plot}/\code{predict} methods are available. (b)~Dedicated methods for trees with \emph{constant fits} in the leaves (or terminal nodes) along with suitable coercion functions to create such tree models (e.g., by \pkg{rpart}, \pkg{RWeka}, PMML). (c)~A reimplementation of \emph{conditional inference trees} (\code{ctree}, originally provided in the \pkg{party} package). (d)~An extended reimplementation of \emph{model-based recursive partitioning} (\code{mob}, also originally in \pkg{party}) along with dedicated methods for trees with parametric models in the leaves. This vignette gives a brief overview of the package and discusses in detail the generic infrastructure for representing trees (a). Items~(b)--(d) are discussed in the remaining vignettes in the package. } \Keywords{recursive partitioning, regression trees, classification trees, decision trees} \Address{ Achim Zeileis \\ Department of Statistics \\ Faculty of Economics and Statistics \\ Universit\"at Innsbruck \\ Universit\"atsstr.~15 \\ 6020 Innsbruck, Austria \\ E-mail: \email{Achim.Zeileis@R-project.org} \\ URL: \url{http://eeecon.uibk.ac.at/~zeileis/} \\ Torsten Hothorn\\ Institut f\"ur Epidemiologie, Biostatistik und Pr\"avention \\ Universit\"at Z\"urich \\ Hirschengraben 84\\ CH-8001 Z\"urich, Switzerland \\ E-mail: \email{Torsten.Hothorn@R-project.org}\\ URL: \url{http://user.math.uzh.ch/hothorn/} } \begin{document} \SweaveOpts{eps=FALSE, keep.source=TRUE, eval = TRUE} <>= suppressWarnings(RNGversion("3.5.2")) options(width = 70) library("partykit") set.seed(290875) @ \section{Overview} \label{sec:overview} In the more than fifty years since \cite{Morgan+Sonquist:1963} published their seminal paper on ``automatic interaction detection'', a wide range of methods has been suggested that is usually termed ``recursive partitioning'' or ``decision trees'' or ``tree(-structured) models'' etc. Particularly influential were the algorithms CART \citep[classification and regression trees,][]{Breiman+Friedman+Olshen:1984}, C4.5 \citep{Quinlan:1993}, QUEST/GUIDE \citep{Loh+Shih:1997,Loh:2002}, and CTree \citep{Hothorn+Hornik+Zeileis:2006} among many others \citep[see][for a recent overview]{Loh:2014}. 
Reflecting the heterogeneity of conceptual algorithms, a wide range of computational implementations in various software systems emerged: Typically, the original authors of an algorithm also provide accompanying software but many software systems, e.g., \pkg{Weka} \citep{Witten+Frank:2005} or \proglang{R} \citep{R}, also provide collections of various types of trees. Within \proglang{R}, the list of prominent packages includes \pkg{rpart} \citep[implementing the CART algorithm]{rpart}, \pkg{mvpart} \citep[for multivariate CART]{mvpart}, \pkg{RWeka} \citep[containing interfaces to J4.8, M5', LMT from \pkg{Weka}]{RWeka}, and \pkg{party} \citep[implementing CTree and MOB]{party} among many others. See the CRAN task view ``Machine Learning'' \citep{ctv} for an overview. All of these algorithms and software implementations have to deal with very similar challenges. However, due to the fragmentation of the communities in which the corresponding research is published -- ranging from statistics through machine learning to various applied fields -- many discussions of the algorithms do not reuse established theoretical results and terminology. Similarly, there is no common ``language'' for the software implementations and different solutions are provided by different packages (even within \proglang{R}) with relatively little reuse of code. The \pkg{partykit} package tries to address the latter point and improve the computational situation by providing a common unified infrastructure for recursive partytioning in the \proglang{R} system for statistical computing. In particular, \pkg{partykit} provides tools for representing fitted trees along with printing, plotting, and computing predictions. The design principles are: \begin{itemize} \item One `agnostic' base class (\class{party}) which can encompass an extremely wide range of different types of trees. \item Subclasses for important types of trees, e.g., trees with constant fits (\class{constparty}) or with parametric models (\class{modelparty}) in each terminal node (or leaf). \item Nodes are recursive objects, i.e., a node can contain child nodes. \item Keep the (learning) data out of the recursive node and split structure. \item Basic printing, plotting, and predicting for the raw node structure. \item Customization via suitable panel or panel-generating functions. \item Coercion from existing object classes in \proglang{R} (\code{rpart}, \code{J48}, etc.) to the new class. \item Usage of simple/fast \proglang{S}3 classes and methods. \end{itemize} In addition to all of this generic infrastructure, two specific tree algorithms are implemented in \pkg{partykit} as well: \fct{ctree} for conditional inference trees \citep{Hothorn+Hornik+Zeileis:2006} and \fct{mob} for model-based recursive partitioning \citep{Zeileis+Hothorn+Hornik:2008}. This vignette (\code{"partykit"}) introduces the basic \class{party} class and associated infrastructure while three further vignettes discuss the tools built on top of it: \code{"constparty"} covers the eponymous class for constant-fit trees along with suitable coercion functions, and \code{"ctree"} and \code{"mob"} discuss the new \fct{ctree} and \fct{mob} implementations, respectively. Each of the vignettes can be viewed within \proglang{R} via \code{vignette(}\emph{``name''}\code{, package = "partykit")}.
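For example, the present vignette and the one on constant-fit trees could be opened via the following calls (a brief sketch, not evaluated here):
%
<<view-vignettes, eval = FALSE>>=
## open the introductory vignette and the constant-fit tree vignette
vignette("partykit", package = "partykit")
vignette("constparty", package = "partykit")
@
%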
Normal users reading this vignette will typically be interested only in the motivating example in Section~\ref{sec:intro} while the remaining sections are intended for programmers who want to build infrastructure on top of \pkg{partykit}. \section{Motivating example} \label{sec:intro} \subsection{Data} To illustrate how \pkg{partykit} can be used to represent trees, we employ a simple artificial data set taken from \cite{Witten+Frank:2005}. It concerns the conditions suitable for playing some unspecified game: % <>= data("WeatherPlay", package = "partykit") WeatherPlay @ % To represent the \code{play} decision based on the corresponding weather condition variables, one could use the tree displayed in Figure~\ref{fig:weather-plot}. For now, we ignore how this tree was inferred and simply assume it to be given. \setkeys{Gin}{width=0.8\textwidth} \begin{figure}[t!] \centering <>= py <- party( partynode(1L, split = partysplit(1L, index = 1:3), kids = list( partynode(2L, split = partysplit(3L, breaks = 75), kids = list( partynode(3L, info = "yes"), partynode(4L, info = "no"))), partynode(5L, info = "yes"), partynode(6L, split = partysplit(4L, index = 1:2), kids = list( partynode(7L, info = "yes"), partynode(8L, info = "no"))))), WeatherPlay) plot(py) @ \caption{\label{fig:weather-plot} Decision tree for \code{play} decision based on weather conditions in \code{WeatherPlay} data.} \end{figure} \setkeys{Gin}{width=\textwidth} To represent this tree (or recursive partition) in \pkg{partykit}, two basic building blocks are used: splits of class \class{partysplit} and nodes of class \class{partynode}. The resulting recursive partition can then be associated with a data set in an object of class \class{party}. \subsection{Splits} First, we employ the \fct{partysplit} function to create the three splits in the ``play tree'' from Figure~\ref{fig:weather-plot}. The function takes the following arguments: \begin{Code} partysplit(varid, breaks = NULL, index = NULL, ..., info = NULL) \end{Code} where \code{varid} is an integer id (column number) of the variable used for splitting, e.g., \code{1L} for \code{outlook}, \code{3L} for \code{humidity}, or \code{4L} for \code{windy}. Then, \code{breaks} and \code{index} determine which observations are sent to which of the branches, e.g., \code{breaks = 75} for the humidity split. Apart from further arguments not shown above (and subsumed under `\code{...}'), some arbitrary information can be associated with a \class{partysplit} object by passing it to the \code{info} argument. The three splits from Figure~\ref{fig:weather-plot} can then be created via % <>= sp_o <- partysplit(1L, index = 1:3) sp_h <- partysplit(3L, breaks = 75) sp_w <- partysplit(4L, index = 1:2) @ % For the numeric \code{humidity} variable the \code{breaks} are set, while for the factor variables \code{outlook} and \code{windy} the \code{index} specifies which of the levels should be associated with which of the branches of the tree. \subsection{Nodes} Second, we use these splits in the creation of the whole decision tree. In \pkg{partykit}, a tree is represented by a \class{partynode} object, which is recursive in that it may have ``kids'' that are again \class{partynode} objects. These can be created with the function \begin{Code} partynode(id, split = NULL, kids = NULL, ..., info = NULL) \end{Code} where \code{id} is an integer node identifier, \code{split} is a \class{partysplit} object, and \code{kids} is a list of \class{partynode} objects.
Again, there are further arguments not shown (\code{...}) and arbitrary information can be supplied in \code{info}. The whole tree from Figure~\ref{fig:weather-plot} can then be created via % <>= pn <- partynode(1L, split = sp_o, kids = list( partynode(2L, split = sp_h, kids = list( partynode(3L, info = "yes"), partynode(4L, info = "no"))), partynode(5L, info = "yes"), partynode(6L, split = sp_w, kids = list( partynode(7L, info = "yes"), partynode(8L, info = "no"))))) @ % where the previously created \class{partysplit} objects are used as splits and the nodes are simply numbered (depth first) from~1 to~8. For the terminal nodes of the tree there are no \code{kids} and the corresponding \code{play} decision is stored in the \code{info} argument. Printing the \class{partynode} object reflects the recursive structure stored. % <>= pn @ % However, the displayed information is still rather raw as it has not yet been associated with the \code{WeatherPlay} data set. \subsection{Trees (or recursive partitions)} Therefore, in a third step the recursive tree structure stored in \code{pn} is coupled with the corresponding data in a \class{party} object. % <>= py <- party(pn, WeatherPlay) print(py) @ % Now, Figure~\ref{fig:weather-plot} can easily be created by % <>= plot(py) @ % In addition to \fct{print} and \fct{plot}, the \fct{predict} method can now be applied, yielding the predicted terminal node IDs. % <>= predict(py, head(WeatherPlay)) @ % In addition to the \class{partynode} and the \class{data.frame}, the function \fct{party} takes several further arguments: \begin{Code} party(node, data, fitted = NULL, terms = NULL, ..., info = NULL) \end{Code} i.e., \code{fitted} values, a \code{terms} object, arbitrary additional \code{info}, and again some further arguments collected in \code{...}. \subsection{Methods and other utilities} The main idea behind the \class{party} class is that tedious tasks such as \fct{print}, \fct{plot}, and \fct{predict} do not have to be reimplemented for every new kind of decision tree but can simply be reused. However, in addition to these three basic tasks (as already illustrated above), developers of tree model software also need further basic utilities for working with trees: e.g., functions for querying or subsetting the tree and for customizing printed/plotted output. Below, various utilities provided by the \pkg{partykit} package are introduced. For querying the dimensions of the tree, three basic functions are available: \fct{length} gives the number of kid nodes of the root node, \fct{depth} the depth of the tree, and \fct{width} the number of terminal nodes. % <>= length(py) width(py) depth(py) @ % As decision trees can grow to be rather large, it is often useful to inspect only subtrees. These can be easily extracted using the standard \code{[} or \code{[[} operators: % <>= py[6] @ % The resulting object is again a fully valid \class{party} tree and can hence be printed (as above) or plotted (via \code{plot(py[6])}, see the left panel of Figure~\ref{fig:plot-customization}). Instead of using the integer node IDs for subsetting, node labels can also be used. By default these are just (character versions of) the node IDs but other names can be easily assigned: % <>= py2 <- py names(py2) names(py2) <- LETTERS[1:8] py2 @ % The function \fct{nodeids} queries the integer node IDs belonging to a \class{party} tree. By default all IDs are returned but optionally only the terminal IDs (of the leaves) can be extracted.
% <>= nodeids(py) nodeids(py, terminal = TRUE) @ % Often functions need to be applied to certain nodes of a tree, e.g., for extracting information. This is accommodated by a new generic function \fct{nodeapply} that follows the style of other \proglang{R} functions from the \code{apply} family and has methods for \class{party} and \class{partynode} objects. Furthermore, it needs a set of node IDs (often computed via \fct{nodeids}) and a function \code{FUN} that is applied to each of the requested \class{partynode} objects, typically for extracting/formatting the \code{info} of the node. % <>= nodeapply(py, ids = c(1, 7), FUN = function(n) n$info) nodeapply(py, ids = nodeids(py, terminal = TRUE), FUN = function(n) paste("Play decision:", n$info)) @ % Similar to the functions applied in a \fct{nodeapply}, the \fct{print}, \fct{predict}, and \fct{plot} methods can be customized through panel functions that format certain parts of the tree (such as headers, footers, nodes, etc.). Hence, the same kind of panel function employed above can also be used for predictions: <>= predict(py, FUN = function(n) paste("Play decision:", n$info)) @ As a variation of this approach, an extended formatting with multiple lines can be easily accommodated by supplying a character vector for every node: % <>= print(py, terminal_panel = function(n) c(", then the play decision is:", toupper(n$info))) @ % The same type of approach can also be used in the default \fct{plot} method (with the main difference that the panel function operates on the \code{info} directly rather than on the \class{partynode}). % <>= plot(py, tp_args = list(FUN = function(i) c("Play decision:", toupper(i)))) @ % See the right panel of Figure~\ref{fig:plot-customization} for the resulting graphic. Many more elaborate panel functions are provided in \pkg{partykit}, especially for showing not only text but also statistical graphics in the visualizations. Some of these are briefly illustrated in this and the other package vignettes. Programmers who want to write their own panel functions are advised to inspect the corresponding \proglang{R} source code to see how flexible (but sometimes also complicated) these panel functions are. \setkeys{Gin}{width=0.48\textwidth} \begin{figure}[t!] \centering <>= plot(py[6]) @ <>= <> @ \caption{\label{fig:plot-customization} Visualization of subtree (left) and tree with custom text in terminal nodes (right).} \end{figure} \setkeys{Gin}{width=\textwidth} Finally, an important utility function is \fct{nodeprune}, which can be used to prune \class{party} trees. It takes a vector of node IDs and prunes away all of their kids, i.e., it turns the indicated nodes into terminal nodes. <>= nodeprune(py, 2) nodeprune(py, c(2, 6)) @ Note that for the pruned versions of this particular \class{party} tree, the new terminal nodes are displayed with a \code{*} rather than the play decision. This is because we did not store any play decisions in the \code{info} of the inner nodes of \code{py}. Of course, we could have done so initially, could do so now, or might want to do so automatically. For the latter, we would have to know how predictions should be obtained from the data, and this is briefly discussed at the end of this vignette and in more detail in \code{vignette("constparty", package = "partykit")}. \section{Technical details} \subsection{Design principles} To facilitate reading of the subsequent sections, two design principles employed in the creation of \pkg{partykit} are briefly explained.
% \begin{enumerate} \item Many helper utilities are encapsulated in functions that follow a simple naming convention. To extract/compute some information \emph{foo} from splits, nodes, or trees, \pkg{partykit} provides \emph{foo}\code{_split}, \emph{foo}\code{_node}, and \emph{foo}\code{_party} functions (that are applicable to \class{partysplit}, \class{partynode}, and \class{party} objects, respectively). Examples for the information \emph{foo} are \code{kidids} and \code{info}. Hence, in the printing example above, using \code{info_node(n)} rather than \code{n$info} for a node \code{n} would have been the preferred syntax, at least when programming new functionality on top of \pkg{partykit}. \item As already illustrated above, printing and plotting rely on \emph{panel functions} that visualize and/or format certain aspects of the resulting display, e.g., that of inner nodes, terminal nodes, headers, footers, etc. Furthermore, arguments like \code{terminal_panel} can also take \emph{panel-generating functions}, i.e., functions that produce a panel function when applied to the \class{party} object. \end{enumerate} \subsection{Splits} \label{sec:splits} \subsubsection{Overview} A split is basically a function that maps data -- or more specifically a partitioning variable -- to daughter nodes. Objects of class \class{partysplit} are designed to represent such functions and are set up by the \fct{partysplit} constructor. For example, a binary split in the numeric partitioning variable \code{humidity} (the third variable in \code{WeatherPlay}) at the breakpoint \code{75} can be created (as above) by % <>= sp_h <- partysplit(3L, breaks = 75) class(sp_h) @ % The internal structure of class \class{partysplit} contains information about the partitioning variable, the split points (or cut points or breakpoints), the handling of the split points, the treatment of observations with missing values, and the kid nodes to send observations to: % <>= unclass(sp_h) @ % Here, the splitting rule is \code{humidity} $\le 75$: % <>= character_split(sp_h, data = WeatherPlay) @ % This representation of splits is completely abstract and, most importantly, independent of any data. Now, data comes into play when we actually want to perform splits: % <>= kidids_split(sp_h, data = WeatherPlay) @ % For each observation in \code{WeatherPlay} the split is performed and the number of the kid node to send this observation to is returned. Of course, this is a very complicated way of saying % <>= as.numeric(!(WeatherPlay$humidity <= 75)) + 1 @ \subsubsection{Mathematical notation} To explain the splitting strategy more formally, we employ some mathematical notation. \pkg{partykit} considers a split to represent a function $f$ mapping an element $x = (x_1, \dots, x_p)$ of a $p$-dimensional sample space $\mathcal{X}$ into a set of $k$ daughter nodes $\mathcal{D} = \{d_1, \dots, d_k\}$. This mapping is defined as a composition $f = h \circ g$ of two functions $g: \mathcal{X} \rightarrow \mathcal{I}$ and $h: \mathcal{I} \rightarrow \mathcal{D}$ with index set $\mathcal{I} = \{1, \dots, l\}, l \ge k$. Let $\mu = (-\infty, \mu_1, \dots, \mu_{l - 1}, \infty)$ denote the split points ($(\mu_1, \dots, \mu_{l - 1})$ = \code{breaks}). We are interested in splitting according to the information contained in the $i$-th element of $x$ ($i$ = \code{varid}). For numeric $x_i$, the split points are also numeric. If $x_i$ is a factor with levels $1, \dots, K$, the default split points are $\mu = (-\infty, 1, \dots, K - 1, \infty)$.
The function $g$ essentially determines which of the intervals (defined by $\mu$) the value $x_i$ is contained in ($I$ denotes the indicator function here): \begin{eqnarray*} x \mapsto g(x) = \sum_{j = 1}^l j I_{\mathcal{A}(j)}(x_i) \end{eqnarray*} where $\mathcal{A}(j) = (\mu_{j - 1}, \mu_j]$ for \code{right = TRUE} except $\mathcal{A}(l) = (\mu_{l - 1}, \infty)$. If \code{right = FALSE}, then $\mathcal{A}(j) = [\mu_{j - 1}, \mu_j)$ except $\mathcal{A}(1) = (-\infty, \mu_1)$. Note that for a categorical variable $x_i$ and default split points, $g$ is simply the identity. Now, $h$ maps from the index set $\mathcal{I}$ into the set of daughter nodes: \begin{eqnarray*} f(x) = h(g(x)) = d_{\sigma_{g(x)}} \end{eqnarray*} where $\sigma = (\sigma_1, \dots, \sigma_l) \in \{1, \dots, k\}^l$ (\code{index}). By default, $\sigma = (1, \dots, l)$ and $k = l$. If $x_i$ is missing, then $f(x)$ is randomly drawn with $\mathbb{P}(f(x) = d_j) = \pi_j, j = 1, \dots, k$ for a discrete probability distribution $\pi = (\pi_1, \dots, \pi_k)$ over the $k$ daughter nodes (\code{prob}). In the simplest case of a binary split in a numeric variable $x_i$, there is only one split point $\mu_1$ and, with $\sigma = (1, 2)$, observations with $x_i \le \mu_1$ are sent to daughter node $d_1$ and observations with $x_i > \mu_1$ to $d_2$. However, this representation of splits is general enough to deal with more complicated setups such as surrogate splits (where typically the index needs modification, for example $\sigma = (2, 1)$), categorical splits (i.e., there is one data structure for both ordered and unordered splits), multiway splits, and functional splits. The latter can be implemented by defining a new artificial splitting variable $x_{p + 1}$ as a potentially very complex function of $x$ that is later used for splitting. \subsubsection{Further examples} Consider a split in a categorical variable with three levels where the first two levels go to the left daughter node and the third one to the right daughter node: % <>= sp_o2 <- partysplit(1L, index = c(1L, 1L, 2L)) character_split(sp_o2, data = WeatherPlay) table(kidids_split(sp_o2, data = WeatherPlay), WeatherPlay$outlook) @ % The internal structure of this object contains the \code{index} slot that maps levels to kid nodes. % <>= unclass(sp_o2) @ % This mapping is also useful with splits in ordered variables or when representing multiway splits: % <>= sp_o <- partysplit(1L, index = 1L:3L) character_split(sp_o, data = WeatherPlay) @ % For a split in a numeric variable, the mapping to daughter nodes can also be changed by modifying \code{index}: % <>= sp_t <- partysplit(2L, breaks = c(69.5, 78.8), index = c(1L, 2L, 1L)) character_split(sp_t, data = WeatherPlay) table(kidids_split(sp_t, data = WeatherPlay), cut(WeatherPlay$temperature, breaks = c(-Inf, 69.5, 78.8, Inf))) @ \subsubsection{Further comments} The additional argument \code{prob} can be used to specify a discrete probability distribution over the daughter nodes that is used to map observations with missing values to daughter nodes. Furthermore, the \code{info} argument and slot can take arbitrary objects to be stored with the split (for example split statistics). Currently, no specific structure of the \code{info} is used. Programmers who employ this functionality in their own functions/packages should access the elements of a \class{partysplit} object by the corresponding accessor functions (and not just the \code{$} operator) as the internal structure might be changed/extended in future releases.
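For illustration, the elements of the split \code{sp_h} created above could be queried through such accessors (a brief sketch, assuming the accessor functions documented alongside \fct{partysplit}; not evaluated here):
%
<<split-accessors, eval = FALSE>>=
## accessor functions instead of the $ operator (see ?partysplit)
varid_split(sp_h)   ## id of the splitting variable (here: 3L, humidity)
breaks_split(sp_h)  ## split points (here: 75)
index_split(sp_h)   ## mapping to kid nodes (here: NULL, i.e., the default)
info_split(sp_h)    ## arbitrary additional information (here: NULL)
@
%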
\subsection{Nodes} \label{sec:nodes} \subsubsection{Overview} Inner and terminal nodes are represented by objects of class \class{partynode}. Each node has a unique identifier \code{id}. A node consisting only of such an identifier (and possibly additional information in \code{info}) is a terminal node: % <>= n1 <- partynode(id = 1L) is.terminal(n1) print(n1) @ % Inner nodes must have a primary split \code{split} and at least two daughter nodes. The daughter nodes are themselves objects of class \class{partynode} and thus reflect the recursive nature of this data structure. The daughter nodes are pooled in a list \code{kids}. In addition, a list of \class{partysplit} objects offering surrogate splits can be supplied in argument \code{surrogates}. These are used in case the variable needed for the primary split has missing values in a particular data set. The IDs in a \class{partynode} should be numbered ``depth first'' (sometimes also called ``pre-order traversal''). This simply means that the root node has identifier 1; the first kid node has identifier 2, whose kid (if present) has identifier 3, and so on. If other IDs are desired, then one can simply set \fct{names} (see above) for the tree; however, internally the depth-first numbering needs to be used. Note that the \fct{partynode} constructor also allows creating \class{partynode} objects with other ID schemes, as this is necessary for growing the tree. If one wants to ensure that a given \class{partynode} object has the correct IDs, one can simply apply \fct{as.partynode} once more to establish the right order of IDs. Finally, let us emphasize that \class{partynode} objects are not directly connected to the actual data (only indirectly through the associated \class{partysplit} objects). \subsubsection{Examples} Based on the split \code{sp_o} defined in the previous section, we set up an inner node with three terminal daughter nodes and print this stump (the data is needed because neither splits nor nodes contain information about variable names or levels): % <>= n1 <- partynode(id = 1L, split = sp_o, kids = lapply(2L:4L, partynode)) print(n1, data = WeatherPlay) @ % Now that we have defined this simple tree, we want to assign observations to terminal nodes: % <>= fitted_node(n1, data = WeatherPlay) @ % Here, the \code{id}s of the terminal nodes each observation falls into are returned. Alternatively, we could compute the position of these daughter nodes in the list \code{kids}: % <>= kidids_node(n1, data = WeatherPlay) @ % Furthermore, the \code{info} argument and slot take arbitrary objects to be stored with the node (predictions, for example, but we will handle this issue later). The slots can be extracted by means of the corresponding accessor functions. \subsubsection{Methods} A number of methods are defined for \class{partynode} objects: \fct{is.partynode} checks if the argument is a valid \class{partynode} object. \fct{is.terminal} is \code{TRUE} for terminal nodes and \code{FALSE} for inner nodes. The subset method \code{[} returns the \class{partynode} object corresponding to the \code{i}-th kid. The \fct{as.partynode} and \fct{as.list} methods can be used to convert flat list structures into recursive \class{partynode} objects and vice versa. As pointed out above, \fct{as.partynode} applied to \class{partynode} objects also renumbers the recursive nodes starting with root node identifier \code{from}.
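For instance, the recursive stump \code{n1} from above could be converted to a flat list of nodes, and its IDs could be renumbered via \fct{as.partynode} (a brief sketch, not evaluated here):
%
<<node-coercion, eval = FALSE>>=
## flat list representation of the recursive node structure
as.list(n1)
## renumber the node IDs depth first, starting from root identifier 1
as.partynode(n1, from = 1L)
@
%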
Furthermore, many of the methods defined for the class \class{party} illustrated above also work for plain \class{partynode} objects. For example, \fct{length} gives the number of kid nodes of the root node, \fct{depth} the depth of the tree, and \fct{width} the number of terminal nodes. \subsection{Trees} \label{sec:trees} Although tree structures can be represented by \class{partynode} objects, a tree is more than a number of nodes and splits. More information about (parts of the) corresponding data is necessary for high-level computations on trees. \subsubsection{Trees and data} First, the raw node/split structure needs to be associated with a corresponding data set. % <>= t1 <- party(n1, data = WeatherPlay) t1 @ % Note that the \code{data} may have zero rows (i.e., only contain variable names/classes but not the actual data), and all methods that do not require the presence of any learning data still work fine: % <>= party(n1, data = WeatherPlay[0, ]) @ \subsubsection{Response variables and regression relationships} Second, for decision trees (or regression and classification trees) more information is required: namely, the response variable and its fitted values. Hence, a \class{data.frame} can be supplied in \code{fitted} that has at least one variable \code{(fitted)} containing the terminal node numbers of the data used for fitting the tree. For representing the dependence of the response on the partitioning variables, a \code{terms} object can be provided that is leveraged for appropriately preprocessing new data in predictions. Finally, any additional (currently unstructured) information can be stored in \code{info} again. % <>= t2 <- party(n1, data = WeatherPlay, fitted = data.frame( "(fitted)" = fitted_node(n1, data = WeatherPlay), "(response)" = WeatherPlay$play, check.names = FALSE), terms = terms(play ~ ., data = WeatherPlay)) @ % The information that is now contained in the tree \code{t2} is sufficient for all operations that should typically be performed on constant-fit trees. For this type of tree there is also a dedicated class \class{constparty} that provides some further convenience methods, especially for plotting and predicting. If a suitable \class{party} object like \code{t2} is already available, it just needs to be coerced: % <>= t2 <- as.constparty(t2) t2 @ % \setkeys{Gin}{width=0.6\textwidth} \begin{figure}[t!] \centering <>= plot(t2, tnex = 1.5) @ \caption{\label{fig:constparty-plot} Constant-fit tree for \code{play} decision based on weather conditions in \code{WeatherPlay} data.} \end{figure} \setkeys{Gin}{width=\textwidth} % As pointed out above, \class{constparty} objects have enhanced \fct{plot} and \fct{predict} methods. For example, \code{plot(t2)} now produces stacked bar plots in the leaves (see Figure~\ref{fig:constparty-plot}) as \code{t2} is a classification tree. For regression and survival trees, boxplots and Kaplan-Meier curves are employed automatically, respectively. As there is information about the response variable, the \fct{predict} method can now produce more than just the predicted node IDs. The default is to predict the \code{"response"}, i.e., a factor for a classification tree. In this case, class probabilities (\code{"prob"}) are also available in addition to the majority votes.
% <>= nd <- data.frame(outlook = factor(c("overcast", "sunny"), levels = levels(WeatherPlay$outlook))) predict(t2, newdata = nd, type = "response") predict(t2, newdata = nd, type = "prob") predict(t2, newdata = nd, type = "node") @ More details on how \class{constparty} objects and their methods work can be found in the corresponding \code{vignette("constparty", package = "partykit")}. \section{Summary} This vignette (\code{"partykit"}) introduces the package \pkg{partykit} that provides a toolkit for computing with recursive partytions, especially decision/regression/classification trees. In this vignette, the basic \class{party} class and associated infrastructure are discussed: splits, nodes, and trees with functions for printing, plotting, and predicting. Further vignettes in the package discuss in more detail the tools built on top of it. \bibliography{party} \end{document}