PRIM for multivariate data
PRIM for multivariate data.
Usage
prim.box(x, y, box.init=NULL, peel.alpha=0.05, paste.alpha=0.01,
mass.min=0.05, threshold, pasting=TRUE, verbose=FALSE,
threshold.type=0, y.fun=mean)
prim.hdr(prim, threshold, threshold.type, y.fun=mean)
prim.combine(prim1, prim2, y.fun=mean)
Arguments
- x
matrix of data values
- y
vector of response values
- y.fun
function applied to response y. Default is mean.
- box.init
initial covering box
- peel.alpha
peeling quantile tuning parameter
- paste.alpha
pasting quantile tuning parameter
- mass.min
minimum mass tuning parameter
- threshold
threshold tuning parameter(s)
- threshold.type
threshold direction indicator: 1 = ">= threshold", -1 = "<= threshold", 0 = ">= threshold[1] & <= threshold[2]"
- pasting
flag for pasting
- verbose
flag for printing output during execution
- prim,prim1,prim2
objects of type prim
Details
The data are \((\bold{X}_1, Y_1), \dots, (\bold{X}_n, Y_n)\) where \(\bold{X}_i\) is d-dimensional and \(Y_i\) is a scalar response. PRIM finds modal (and/or anti-modal) regions in the conditional expectation \(m(\bold{x}) = \bold{E} (Y | \bold{x}).\)
In general, \(Y_i\) can be real-valued. See vignette("prim").
Here, we focus on the special case of binary \(Y_i\). Let \(Y_i = 1\) when \(\bold{X}_i \sim F^+\), and \(Y_i = -1\) when \(\bold{X}_i \sim F^-\), where \(F^+\) and \(F^-\) are different distribution functions. In this set-up, PRIM finds the regions where \(F^+\) and \(F^-\) are most different.
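For intuition, the binary set-up above can be written out explicitly. The notation here is an assumption for illustration and does not come from the package documentation: \(\pi^+\) and \(\pi^-\) are the proportions of the two classes, and \(f^+\) and \(f^-\) are densities of \(F^+\) and \(F^-\), assumed to exist. By Bayes' rule:

```latex
\[
m(\bold{x}) = \bold{E}(Y | \bold{x})
            = P(Y = 1 | \bold{x}) - P(Y = -1 | \bold{x})
            = \frac{\pi^+ f^+(\bold{x}) - \pi^- f^-(\bold{x})}
                   {\pi^+ f^+(\bold{x}) + \pi^- f^-(\bold{x})}
\]
```

Under these assumptions \(m(\bold{x})\) lies in [-1, 1]. It is close to 1 where \(F^+\) dominates and close to -1 where \(F^-\) dominates, which is why modal and anti-modal regions of \(m\) mark where the two distributions differ most.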
The tuning parameters peel.alpha and paste.alpha control the 'patience' of PRIM: smaller values mean more patience, larger values less. The peeling steps remove data from a box until either the box mean is smaller than threshold or the box mass is less than mass.min. Pasting is optional, and is used to correct any possible over-peeling. The default values for peel.alpha, paste.alpha and mass.min are taken from Friedman & Fisher (1999).
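The peeling step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: peel_once is a hypothetical helper, it looks for a box with a high mean of y (the threshold.type=1 direction), and it leaves out the threshold and mass.min stopping rules as well as pasting.

```r
# Hypothetical helper (not part of the prim package): one peeling step.
# For each dimension, try trimming the lowest or the highest peel.alpha
# fraction of points, and keep the trim that gives the largest mean of y.
peel_once <- function(x, y, peel.alpha = 0.05) {
  best <- NULL
  for (j in seq_len(ncol(x))) {
    lo <- quantile(x[, j], peel.alpha)       # candidate new lower limit
    hi <- quantile(x[, j], 1 - peel.alpha)   # candidate new upper limit
    candidates <- list(x[, j] >= lo, x[, j] <= hi)
    for (keep in candidates) {
      # skip trims that remove nothing or remove everything
      if (all(keep) || !any(keep)) next
      m <- mean(y[keep])
      if (is.null(best) || m > best$mean) {
        best <- list(keep = keep, mean = m, dim = j)
      }
    }
  }
  best
}

# Toy data (assumed for illustration): y = 1 where the first coordinate is large
set.seed(1)
x <- matrix(runif(400), ncol = 2)
y <- ifelse(x[, 1] > 0.5, 1, -1)
step <- peel_once(x, y)
```

Repeating such a step on the retained points x[step$keep, ] and y[step$keep] gives a sequence of shrinking boxes. A smaller peel.alpha removes fewer points per step, which is the sense in which it is more 'patient'.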
The type of PRIM estimate is controlled by threshold and threshold.type:
- threshold.type=1
search for {\(m(\bold{x}) \geq\) threshold}.
- threshold.type=-1
search for {\(m(\bold{x}) \leq\) threshold}.
- threshold.type=0
search for both {\(m(\bold{x}) \geq\) threshold[1]} and {\(m(\bold{x}) \leq\) threshold[2]}.
There are two ways of using PRIM. One is prim.box with
pre-specified threshold(s). This is appropriate when the threshold(s)
are known to produce good estimates.
On the other hand, if the user doesn't provide threshold values, then prim.box computes box sequences which cover the data range. These can then be pruned at a later stage. prim.hdr allows the user to try many different threshold values efficiently, without having to recompute the entire PRIM box sequence. prim.combine can be used to join the regions computed from prim.hdr. See the examples below.
Value
– prim.box produces a PRIM estimate, an object of
type prim, which is a list with 8 fields:
- x
list of data matrices
- y
list of response variable vectors
- y.mean
list of vectors of box mean for y
- box
list of matrices of box limits (first row = minima, second row = maxima)
- mass
vector of box masses (proportion of points inside a box)
- num.class
total number of PRIM boxes
- num.hdr.class
total number of PRIM boxes which form the HDR
- ind
threshold direction indicator: 1 = ">= threshold", -1 = "<= threshold"
The above lists have num.class fields, one for each box.
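To make the box and mass fields concrete, the following base R sketch shows how a matrix of box limits in the layout described above (first row = minima, second row = maxima) relates to a box mass and a box mean. The helper in_box and the toy data are hypothetical, and treating the limits as inclusive is an assumption rather than a documented property of the package.

```r
# Hypothetical helper (not part of the prim package): flag the rows of x
# that lie inside a box given as a 2 x d matrix of limits.
in_box <- function(x, box) {
  x <- as.matrix(x)
  inside <- rep(TRUE, nrow(x))
  for (j in seq_len(ncol(x))) {
    # assumption: box limits are inclusive on both sides
    inside <- inside & x[, j] >= box[1, j] & x[, j] <= box[2, j]
  }
  inside
}

# Toy data: four points in two dimensions with a binary response
x <- cbind(c(0.1, 0.4, 0.6, 0.9), c(0.2, 0.5, 0.5, 0.8))
y <- c(-1, 1, 1, -1)
box <- rbind(c(0.3, 0.3),   # first row = minima
             c(0.7, 0.7))   # second row = maxima

inside   <- in_box(x, box)
box_mass <- mean(inside)      # proportion of points inside the box
box_mean <- mean(y[inside])   # mean of the response inside the box
```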
– prim.hdr takes a prim object and prunes it using
different threshold values. Returns another prim object. This
is much faster for experimenting with different threshold values than
calling prim.box each time.
– prim.combine combines two prim objects into a single
prim object. Usually used in conjunction with prim.hdr. See examples below.
Examples
library(prim)
data(quasiflow)
qf <- quasiflow[1:1000,1:2]
qf.label <- quasiflow[1:1000,4]
## using only one command
thr <- c(0.25, -0.3)
qf.prim1 <- prim.box(x=qf, y=qf.label, threshold=thr, threshold.type=0)
## alternative - requires more commands but allows more control
## in intermediate stages
qf.primp <- prim.box(x=qf, y=qf.label, threshold.type=1)
## default threshold too low, try higher one
qf.primp.hdr <- prim.hdr(prim=qf.primp, threshold=0.25, threshold.type=1)
qf.primn <- prim.box(x=qf, y=qf.label, threshold=-0.3, threshold.type=-1)
qf.prim2 <- prim.combine(qf.primp.hdr, qf.primn)
plot(qf.prim1, alpha=0.2) ## orange=x1>x2, blue x2<x1