Package 'CaseBasedReasoning' reference manual

Title:	Case Based Reasoning
Description:	Case-based reasoning is a problem-solving methodology that involves solving a new problem by referring to the solution of a similar problem in a large set of previously solved problems. The key aspect of Case Based Reasoning is to determine the problem that "most closely" matches the new problem at hand. This is achieved by defining a family of distance functions and using these distance functions as parameters for local averaging regression estimates of the final result. The optimal distance function is chosen based on a specific error measure used in regression estimation. This approach allows for efficient problem-solving by leveraging past experiences and adapting solutions from similar cases. The underlying concept is inspired by the work of Dippon J. et al. (2002) <doi:10.1016/S0167-9473(02)00058-0>.
Authors:	Simon Mueller [cre], PD Dr. Juergen Dippon [ctb]
Maintainer:	Simon Mueller
License:	MIT + file LICENSE
Version:	0.3
Built:	2025-03-20 20:42:00 UTC
Source:	https://github.com/sipemu/case-based-reasoning

Converts a distance vector into an object of class `dist`

Description

Converts a distance vector into an object of class dist

Usage

asDistObject(x, n, method)

Arguments

`x`	data vector
`n`	length of x
`method`	method description
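
As a rough illustration of what such a conversion involves, the base R sketch below attaches the attributes of a `dist` object to a plain vector. It is not the package's implementation; the variable names are chosen for illustration, and it assumes the vector holds the lower triangle in column-major order, as base R's `dist` does. In base R the `Size` attribute of a `dist` object is the number of observations, so a vector for n observations has n(n-1)/2 entries.

```r
# Pairwise distances between the points 0, 1, 3, 6 (lower triangle, column-major)
x <- c(1, 3, 6, 2, 5, 3)
n <- 4  # number of observations; length(x) is n * (n - 1) / 2

# Attach the attributes that base R expects on a "dist" object
d <- structure(x,
               Size = n,
               Diag = FALSE,
               Upper = FALSE,
               method = "euclidean",
               class = "dist")

# Expand to a full symmetric matrix to inspect the result
m <- as.matrix(d)
print(m)
```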

Call a function by character strings using the namespace and custom parameters.

Description

Call a function by character strings using the namespace and custom parameters.

Usage

call_function(func_list)

Arguments

`func_list`	A list with fields func, namespace, and args
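
The general idea of calling a function from its name and namespace can be sketched in base R as follows. The helper `call_by_name` is hypothetical and only mirrors the documented list fields (`func`, `namespace`, `args`); how the package resolves the function internally is an assumption here.

```r
# Hypothetical helper: look up an exported function by name and call it
call_by_name <- function(func_list) {
  f <- getExportedValue(func_list$namespace, func_list$func)
  do.call(f, func_list$args)
}

# Equivalent to base::paste("a", "b", sep = "-")
result <- call_by_name(list(
  func = "paste",
  namespace = "base",
  args = list("a", "b", sep = "-")
))
print(result)
```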

Case Based Reasoning

Description

An R package for Case Based Reasoning using statistical/ML models.

Root class for common functionality of this package

Description

Root class for common functionality of this package

Public fields

model: the statistical model
data: training data
model_fit: trained object
formula: Object of class formula or character describing the model fit
terms: terms of the formula
endPoint: Target variable
distMat: A matrix with distances
orderMat: A matrix with the order indices for similar cases search

Methods

Method `new()`

Initialize object for searching similar cases

Usage

CBRBase$new(formula, data)

Arguments

formula: Object of class formula or character describing the model fit
data: Training data of class data.frame

Method `fit()`

Fit the Model

Usage

CBRBase$fit()

Arguments

x: Training data of class data.frame

Method `calc_distance_matrix()`

Calculates the distance matrix

Usage

CBRBase$calc_distance_matrix(query = NULL)

Arguments

query: Query data of class data.frame
x: Training data of class data.frame

Method `get_similar_cases()`

Extracts similar cases

Usage

CBRBase$get_similar_cases(query, k = 1, addDistance = T, merge = F)

Arguments

query: Query data of class data.frame
k: number of similar cases
addDistance: Add distance to result data.frame
merge: Add query data to matched cases data.frame
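
To make the roles of a distance matrix, an order matrix, and `k` concrete, here is a minimal base R sketch of a k-nearest lookup. It does not use the package classes; the toy data, the absolute-difference distance, and all variable names are assumptions for illustration only.

```r
train <- data.frame(id = 1:5, x = c(0, 1, 2, 10, 11))
query <- data.frame(x = c(0.9, 10.4))

# (n x m) distance matrix: rows are training cases, columns are query cases
dist_mat <- abs(outer(train$x, query$x, "-"))

# For each query column, the training row indices sorted by increasing distance
order_mat <- apply(dist_mat, 2, order)

# Pick the k closest training cases per query and append the distance
k <- 2
similar <- do.call(rbind, lapply(seq_len(ncol(dist_mat)), function(j) {
  idx <- order_mat[seq_len(k), j]
  cbind(query_id = j, train[idx, , drop = FALSE], distance = dist_mat[idx, j])
}))
print(similar)
```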

Method `clone()`

The objects of this class are cloneable with this method.

Usage

CBRBase$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Cox-Beta Model for Case-Based-Reasoning

Description

Cox-Beta Model for Case-Based-Reasoning

Details

Regression beta coefficients obtained from a CPH regression model fitted on the training data are used to build a weighted distance measure between training and test data. These weights are then used to calculate an (n x m) distance matrix, where n is the number of observations in the training data and m is the number of observations in the test data. The user can use this distance matrix for further cluster analysis or to extract, for each test observation, the k (= 1,...,l) most similar cases from the training data. We use the rms package for model fitting, variable selection, and checking model assumptions. If the user omits the test data, an n x n distance matrix is returned.
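
The exact distance formula is not stated here. As a sketch under assumptions, the base R code below uses one plausible form, a Euclidean distance in which each squared difference is scaled by the squared coefficient. The coefficients are made-up stand-ins for fitted CPH betas, and `weighted_dist` is a hypothetical helper, not a package function.

```r
# Hypothetical coefficients standing in for fitted CPH betas
beta <- c(age = 0.04, bmi = -0.10)
train <- data.frame(age = c(50, 60, 70), bmi = c(22, 30, 26))
test <- data.frame(age = c(61, 52), bmi = c(29, 23))

# Assumed form: d(x, y) = sqrt(sum_j beta_j^2 * (x_j - y_j)^2)
weighted_dist <- function(x, y, w) {
  x <- as.matrix(x[, names(w), drop = FALSE])
  y <- as.matrix(y[, names(w), drop = FALSE])
  out <- matrix(0, nrow(x), nrow(y))
  for (j in seq_len(nrow(y))) {
    diff <- sweep(x, 2, y[j, ], "-")              # differences to test case j
    out[, j] <- sqrt(rowSums(sweep(diff^2, 2, w^2, "*")))
  }
  out
}

# n x m matrix: 3 training cases (rows) versus 2 test cases (columns)
d <- weighted_dist(train, test, beta)
print(d)
```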

Super classes

CaseBasedReasoning::CBRBase -> CaseBasedReasoning::RegressionModel -> CoxModel

Public fields

model: the statistical model
model_params: rms arguments

Methods

Public methods

CoxModel$check_ph()
CoxModel$clone()

Inherited methods

CaseBasedReasoning::CBRBase$calc_distance_matrix()
CaseBasedReasoning::CBRBase$get_similar_cases()
CaseBasedReasoning::CBRBase$initialize()
CaseBasedReasoning::RegressionModel$fit()
CaseBasedReasoning::RegressionModel$print()
CaseBasedReasoning::RegressionModel$variable_selection()

Method `check_ph()`

Check proportional hazard assumption graphically

Usage

CoxModel$check_ph()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

CoxModel$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Depth Distance

Description

For each pair of observations, this function returns the number of edges between their terminal nodes, summed over all trees in the random forest.

Usage

depth_distance(x, y = NULL, rfObject)

Arguments

`x`	A data.frame with the same columns as in the training data of the RandomForest model
`y`	A data.frame with the same columns as in the training data of the RandomForest model
`rfObject`	`ranger` object

Examples


require(ranger)
rf <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
depth_distance(x=iris[, -5], rfObject=rf)



Distance calculation based on RandomForest Proximity or Depth

Description

Distance calculation based on RandomForest Proximity or Depth

Usage

distanceRandomForest(
  x,
  y = NULL,
  rfObject,
  method = "Proximity",
  threads = NULL
)

Arguments

`x`	a data.frame
`y`	a second data.frame
`rfObject`	`ranger` object
`method`	distance calculation method, Proximity (Default) or Depth.
`threads`	number of threads to use

Value

a dist or a matrix object with pairwise distance of observations in x vs y (if not null)

Examples


library(ranger)
# proximity pairwise distances
rf.fit <- ranger(Species ~ ., data = iris, num.trees = 500, write.forest = TRUE)
distanceRandomForest(x = iris[, -5], rfObject = rf.fit, method = "Proximity", threads = 1)

# depth distance for train versus test subset
set.seed(1234L)
learn <- sample(1:150, 100)
test <- (1:150)[-learn]
rf.fit <- ranger(Species ~ ., data = iris[learn, ], num.trees = 500, write.forest = TRUE)
distanceRandomForest(x = iris[learn, -5], y = iris[test, -5], rfObject = rf.fit, method = "Depth")



Number of Edges between Terminal Nodes

Description

The first two columns are terminal node IDs. If an ID pair does not appear in a tree, -1 is inserted.

Usage

edges_between_terminal_nodes(rfObject)

Arguments

`rfObject`	`ranger` object

Value

a matrix object with pairwise terminal node edge length
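
Counting edges between two nodes of a single tree can be sketched in base R by walking both nodes up to their lowest common ancestor. The toy tree and the helpers `path_to_root` and `edges_between` are hypothetical and do not reflect how the package traverses a `ranger` forest.

```r
# Toy binary tree: parent[i] is the parent of node i; node 1 is the root
#        1
#      /   \
#     2     3
#    / \   / \
#   4   5 6   7
parent <- c(NA, 1, 1, 2, 2, 3, 3)

# Nodes visited when walking from a node up to the root
path_to_root <- function(node) {
  path <- node
  while (!is.na(parent[node])) {
    node <- parent[node]
    path <- c(path, node)
  }
  path
}

# Edges on the path between a and b via their lowest common ancestor
edges_between <- function(a, b) {
  pa <- path_to_root(a)
  pb <- path_to_root(b)
  common <- intersect(pa, pb)[1]
  (match(common, pa) - 1) + (match(common, pb) - 1)
}

print(edges_between(4, 5))  # siblings below node 2
print(edges_between(4, 6))  # leaves in different subtrees of the root
```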

Examples


require(ranger)
rf.fit <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
edges_between_terminal_nodes(rf.fit)



Generate Grid

Description

Generates a uniform grid over the distribution of the time2event variable, finds the closest grid point for each input time2event element, and returns that point. Memory consumption increases when fitting the RandomForest model with many unique time2event values. Therefore, the number of distinct time2event values can be reduced by replacing each value with its closest grid element.

Usage

generate_grid(t2e, grid_length = 250)

Arguments

`t2e`	numeric vector with time2event values
`grid_length`	number of grid elements

Value

a list with new_t2e and grid_error
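
The snapping idea can be sketched in base R as below. This is an illustration under assumptions rather than the package's code: the helper `snap_to_grid` is hypothetical, the grid is assumed to be evenly spaced between the minimum and maximum, and `grid_error` is assumed to be the mean absolute difference between original and snapped values.

```r
# Hypothetical helper: replace each value by its closest point on an even grid
snap_to_grid <- function(t2e, grid_length = 250) {
  grid <- seq(min(t2e), max(t2e), length.out = grid_length)
  idx <- vapply(t2e, function(v) which.min(abs(grid - v)), integer(1))
  new_t2e <- grid[idx]
  # Assumed error measure: mean absolute deviation from the original values
  list(new_t2e = new_t2e, grid_error = mean(abs(new_t2e - t2e)))
}

# A coarse grid of 4 points (1, 4, 7, 10) makes the effect visible
res <- snap_to_grid(c(1, 2.2, 3.9, 10), grid_length = 4)
print(res$new_t2e)
print(res$grid_error)
```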

Linear Regression Model for Case-Based-Reasoning

Description

Linear Regression Model for Case-Based-Reasoning

Super classes

CaseBasedReasoning::CBRBase -> CaseBasedReasoning::RegressionModel -> LinearModel

Public fields

model: the statistical model

Methods

Public methods

LinearModel$clone()

Inherited methods

CaseBasedReasoning::CBRBase$calc_distance_matrix()
CaseBasedReasoning::CBRBase$get_similar_cases()
CaseBasedReasoning::CBRBase$initialize()
CaseBasedReasoning::RegressionModel$fit()
CaseBasedReasoning::RegressionModel$print()
CaseBasedReasoning::RegressionModel$variable_selection()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

LinearModel$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Logistic Regression Model for Case-Based-Reasoning

Description

Logistic Regression Model for Case-Based-Reasoning

Super classes

CaseBasedReasoning::CBRBase -> CaseBasedReasoning::RegressionModel -> LogisticModel

Public fields

model: the statistical model

Methods

Public methods

LogisticModel$clone()

Inherited methods

CaseBasedReasoning::CBRBase$calc_distance_matrix()
CaseBasedReasoning::CBRBase$get_similar_cases()
CaseBasedReasoning::CBRBase$initialize()
CaseBasedReasoning::RegressionModel$fit()
CaseBasedReasoning::RegressionModel$print()
CaseBasedReasoning::RegressionModel$variable_selection()

Method `clone()`

The objects of this class are cloneable with this method.

Usage

LogisticModel$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Get proximity matrix of a ranger object

Description

Get proximity matrix of a ranger object

Usage

proximity_distance(x, y = NULL, rfObject, as_dist = TRUE)

Arguments

`x`	a new dataset
`y`	a second new dataset (Default: NULL)
`rfObject`	`ranger` object
`as_dist`	Bool, return a dist object.

Value

a dist or a matrix object with pairwise proximity of observations in x vs y (if not null)
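
Random forest proximity is commonly defined as the share of trees in which two observations fall into the same terminal node. The base R sketch below computes it from a made-up terminal node matrix; the conversion to a distance as 1 minus proximity is an assumption, since the transformation used by the package is not stated here.

```r
# Hypothetical terminal node IDs: rows = observations, columns = trees
nodes <- rbind(c(3, 5, 2),
               c(3, 5, 6),
               c(4, 5, 6))

# Proximity: fraction of trees in which two observations share a terminal node
n <- nrow(nodes)
prox <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    prox[i, j] <- mean(nodes[i, ] == nodes[j, ])
  }
}

# Assumed conversion to a distance object
d <- as.dist(1 - prox)
print(prox)
print(d)
```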

Examples


require(ranger)
rf <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
proximity_distance(x = iris[, -5], rfObject = rf)

set.seed(1234L)
learn <- sample(1:150, 100)
test <- (1:150)[-learn]
rf <- ranger(Species ~ ., data = iris[learn, ], num.trees = 500, write.forest = TRUE)
proximity_distance(x = iris[learn, -5], y = iris[test, -5], rfObject = rf)


Forest2Matrix

Description

Transform trees of a ranger-object to a matrix

Usage

ranger_forests_to_matrix(rfObject)

Arguments

`rfObject`	`ranger` object

Value

a matrix object with the following columns:
Column 1: tree ID
Column 2: node ID
Column 3: child node ID 1
Column 4: child node ID 2
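
The four-column layout can be illustrated with a toy forest in base R. The list structure below (vectors `left` and `right`, with 0 marking a terminal node) is a simplified, hypothetical stand-in and not the internal format of a `ranger` object.

```r
# Toy forest: for each tree, the left and right child ID of every node
trees <- list(
  list(left = c(2, 0, 0), right = c(3, 0, 0)),
  list(left = c(2, 4, 0, 0, 0), right = c(3, 5, 0, 0, 0))
)

# One row per node: tree ID, node ID, first child, second child
forest_matrix <- do.call(rbind, lapply(seq_along(trees), function(t) {
  tr <- trees[[t]]
  cbind(tree = t,
        node = seq_along(tr$left),
        child1 = tr$left,
        child2 = tr$right)
}))
print(forest_matrix)
```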

Examples


library(ranger)
rf.fit <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
forest_matrix <- ranger_forests_to_matrix(rf.fit)



Root class for Regression Models, e.g., CPH, logistic, and linear regression

Description

Root class for Regression Models, e.g., CPH, logistic, and linear regression

Super class

CaseBasedReasoning::CBRBase -> RegressionModel

Public fields

model_params: rms arguments
weights: Weights for distance calculation

Methods

Public methods

RegressionModel$print()
RegressionModel$variable_selection()
RegressionModel$fit()
RegressionModel$clone()

Inherited methods

CaseBasedReasoning::CBRBase$calc_distance_matrix()
CaseBasedReasoning::CBRBase$get_similar_cases()
CaseBasedReasoning::CBRBase$initialize()

Method `print()`

Prints information of the initialized object

Usage

RegressionModel$print()

Method `variable_selection()`

Fast backward variable selection with penalization

Usage

RegressionModel$variable_selection(x)

Arguments

x: Training data of class data.frame

Method `fit()`

Fit the regression model

Usage

RegressionModel$fit()

Arguments

x: Training data of class data.frame

Method `clone()`

The objects of this class are cloneable with this method.

Usage

RegressionModel$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

RandomForest Model for Searching Similar Cases

Description

RandomForest Model for Searching Similar Cases

Details

This class uses the proximity or depth matrix of the RandomForest algorithm as a similarity matrix of training and query observations. By default, all cases with at least one missing value are dropped from learning, calculating the distance matrix, and searching for similar cases.

Super class

CaseBasedReasoning::CBRBase -> RFModel

Public fields

model: the statistical model
model_params: model arguments
dist_method: Distance method

Methods

Public methods

RFModel$print()
RFModel$new()
RFModel$fit()
RFModel$set_distance_method()
RFModel$clone()

Inherited methods

CaseBasedReasoning::CBRBase$calc_distance_matrix()
CaseBasedReasoning::CBRBase$get_similar_cases()

Method `print()`

Prints information of the initialized object

Usage

RFModel$print()

Method `new()`

Initialize a RandomForest object for searching similar cases.

Usage

RFModel$new(formula, data, ...)

Arguments

formula: Object of class formula or character describing the model fit.
data: Training data of class data.frame
...: ranger RandomForest arguments

Method `fit()`

Fit the RandomForest

Usage

RFModel$fit()

Arguments

x: Training data of class data.frame

Method `set_distance_method()`

Set the distance method. Available are Proximity and Depth

Usage

RFModel$set_distance_method(method = "Depth")

Arguments

method: Distance calculation method, Proximity or Depth (the signature defaults to Depth)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

RFModel$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

Englund and Verikas. A novel approach to estimate proximity in a random forest: An exploratory study.

Get the terminal node id of a RandomForest Object

Description

Extracts the terminal node ID for each observation and each tree in the forest. Terminal node indices start at 1, e.g., the root node has ID 1.

Usage

terminalNodes(x, rfObject)

Arguments

`x`	a data.frame
`rfObject`	`ranger` object

Value

Matrix with terminal node IDs for all observations in x (rows) and trees (columns)

Examples

library(ranger)
rf.fit <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
dfNodes <- terminalNodes(iris[, -5], rf.fit)


Weighted Distance calculation

Description

Weighted Distance calculation

Usage

weightedDistance(x, y = NULL, weights = NULL)

Arguments

`x`	a new dataset
`y`	a second new dataset
`weights`	a vector of weights

Value

a dist or matrix object

Examples


require(ranger)
rf <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
terminalNodes(iris[, -5], rf)


