| Title: | Case Based Reasoning |
|---|---|
| Description: | Case-based reasoning is a problem-solving methodology that involves solving a new problem by referring to the solution of a similar problem in a large set of previously solved problems. The key aspect of Case Based Reasoning is to determine the problem that "most closely" matches the new problem at hand. This is achieved by defining a family of distance functions and using these distance functions as parameters for local averaging regression estimates of the final result. The optimal distance function is chosen based on a specific error measure used in regression estimation. This approach allows for efficient problem-solving by leveraging past experiences and adapting solutions from similar cases. The underlying concept is inspired by the work of Dippon J. et al. (2002) <doi:10.1016/S0167-9473(02)00058-0>. |
| Authors: | Simon Mueller [aut, cre], PD Dr. Juergen Dippon [ctb] |
| Maintainer: | Simon Mueller <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.4.1 |
| Built: | 2026-05-28 10:50:54 UTC |
| Source: | https://github.com/sipemu/case-based-reasoning |
Converts a distance vector into an object of class dist
asDistObject(x, n, method)
as_dist_object(x, n, method)

| Argument | Description |
|---|---|
| x | data vector |
| n | length of x |
| method | method description |

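As a rough illustration of what such a conversion involves, the sketch below builds a dist object by hand in base R. It is not the package's implementation: the helper name `vector_to_dist` is hypothetical, and it assumes that `x` holds the lower triangle of the distance matrix in column order and that `n` is the number of observations (the `Size` attribute of a dist object).

```r
# Hypothetical helper (not part of the package): wrap a vector holding the
# lower triangle of an n x n distance matrix (column order) as a "dist" object.
vector_to_dist <- function(x, n, method = "custom") {
  # A dist object for n observations stores n * (n - 1) / 2 values.
  stopifnot(length(x) == n * (n - 1) / 2)
  structure(
    x,
    Size = as.integer(n),
    Labels = as.character(seq_len(n)),
    Diag = FALSE,
    Upper = FALSE,
    method = method,
    class = "dist"
  )
}

d <- vector_to_dist(c(1, 2, 3), n = 3)
m <- as.matrix(d)  # full symmetric 3 x 3 matrix with a zero diagonal
```

Once the vector carries the dist class, base R functions such as `as.matrix()` or `hclust()` can consume it directly.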
Root class for common functionality of this package
model: the statistical model
data: training data
model_fit: trained object
formula: Object of class formula or character describing the model fit
terms: terms of the formula
endpoint: Target variable
dist_matrix: A matrix with distances
order_matrix: A matrix with the order indices for similar cases search
endPoint: deprecated, use endpoint instead.
distMat: deprecated, use dist_matrix instead.
orderMat: deprecated, use order_matrix instead.
new()
Initialize object for searching similar cases
CBRBase$new(formula, data)
formula: Object of class formula or character describing the model fit
data: Training data of class data.frame
fit()
Fit the Model
CBRBase$fit()
calc_distance_matrix()
Calculates the distance matrix
CBRBase$calc_distance_matrix(query = NULL)
query: Query data of class data.frame
get_similar_cases()
Extracts similar cases
CBRBase$get_similar_cases(query, k = 1, add_distance = TRUE, merge = FALSE)
query: Query data of class data.frame
k: number of similar cases
add_distance: Add distance to result data.frame
merge: Add query data to matched cases data.frame
clone()
The objects of this class are cloneable with this method.
CBRBase$clone(deep = FALSE)
deep: Whether to make a deep clone.
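The interplay of a distance matrix, an order matrix, and the `k` and `add_distance` arguments of get_similar_cases() can be illustrated with a small base R sketch. This is not the class's actual code: the helper `nearest_cases`, the toy data, and the absolute-difference distance are assumptions made for illustration.

```r
# Hypothetical sketch: pick the k nearest training cases for each query case
# from an (n x m) distance matrix (rows = training cases, columns = queries).
nearest_cases <- function(dist_matrix, k = 1) {
  # Order each column and keep the row indices of the k smallest distances.
  idx <- apply(dist_matrix, 2, function(col) order(col)[seq_len(k)])
  matrix(idx, nrow = k)  # keep a k x m matrix even when k = 1
}

train <- data.frame(id = 1:4, age = c(30, 45, 52, 61))
query <- data.frame(age = c(50, 33))

# Toy distance: absolute age difference between training and query cases.
dist_matrix <- abs(outer(train$age, query$age, "-"))
order_matrix <- nearest_cases(dist_matrix, k = 2)

# Similar cases for the first query case, with the distance attached
# (comparable in spirit to add_distance = TRUE).
similar <- train[order_matrix[, 1], ]
similar$distance <- dist_matrix[order_matrix[, 1], 1]
```

Keeping the order indices separate from the distances means the same distance matrix can be reused for different values of `k`.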
Cox-Beta Model for Case-Based-Reasoning
Regression beta coefficients obtained from a Cox proportional hazards (CPH) regression model fitted on the training data are used to build a weighted distance measure between training and test data. These weights are then used to calculate an (n x m) distance matrix, where n is the number of observations in the training data and m is the number of observations in the test data. The user can use this distance matrix for further cluster analysis or to extract, for each test observation, the k (= 1,...,l) most similar cases from the training data. The rms package is used for model fitting, variable selection, and checking model assumptions. If the user omits the test data, an n x n distance matrix is returned.
CaseBasedReasoning::CBRBase -> CaseBasedReasoning::RegressionModel -> CoxModel
model: the statistical model
model_params: rms arguments
check_ph()
Check proportional hazard assumption graphically
CoxModel$check_ph()
clone()
The objects of this class are cloneable with this method.
CoxModel$clone(deep = FALSE)
deep: Whether to make a deep clone.
This function returns for each observation the pairwise sum of edges between the corresponding terminal nodes over each tree in the random forest.
depth_distance(x, y = NULL, rfObject)

| Argument | Description |
|---|---|
| x | A data.frame with the same columns as in the training data of the RandomForest model |
| y | A data.frame with the same columns as in the training data of the RandomForest model |
| rfObject | a ranger object |

library(ranger)
library(CaseBasedReasoning)  # provides depth_distance()
rf <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
depth_distance(x = iris[, -5], rfObject = rf)
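To make the idea of "edges between terminal nodes" concrete, the sketch below counts the edges on the path between two nodes of a single toy tree and then sums that count over trees. It does not use ranger or the package's own code: the parent-vector encoding of the tree and the helper names `path_to_root` and `edges_between` are assumptions made for illustration.

```r
# Hypothetical sketch: number of edges between two nodes of one tree.
# The tree is given as a parent vector: parent[i] is the parent of node i,
# and the root has parent 0.
path_to_root <- function(node, parent) {
  path <- node
  while (parent[node] != 0) {
    node <- parent[node]
    path <- c(path, node)
  }
  path
}

edges_between <- function(a, b, parent) {
  pa <- path_to_root(a, parent)
  pb <- path_to_root(b, parent)
  # The first node on a's path that also lies on b's path is the
  # lowest common ancestor of a and b.
  lca <- pa[pa %in% pb][1]
  (match(lca, pa) - 1) + (match(lca, pb) - 1)
}

# Toy tree: node 1 is the root with children 2 and 3; node 2 has children 4 and 5.
parent <- c(0, 1, 1, 2, 2)

e_siblings <- edges_between(4, 5, parent)  # path 4 -> 2 -> 5
e_across   <- edges_between(4, 3, parent)  # path 4 -> 2 -> 1 -> 3

# Summing over trees: terminal nodes of two observations in each of two trees
# (the same toy tree is reused for both trees here).
nodes_obs1 <- c(4, 3)
nodes_obs2 <- c(5, 3)
depth_dist <- sum(mapply(edges_between, nodes_obs1, nodes_obs2,
                         MoreArgs = list(parent = parent)))
```

Two observations that always land in the same terminal node get a distance of 0, while observations separated near the root accumulate long paths. Whether the package additionally normalizes this sum is not stated here.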
Distance calculation based on RandomForest Proximity or Depth
distanceRandomForest(x, y = NULL, rfObject, method = "Proximity", threads = NULL)
distance_random_forest(x, y = NULL, rfObject, method = "Proximity", threads = NULL)

| Argument | Description |
|---|---|
| x | a data.frame |
| y | a second data.frame |
| rfObject | a ranger object |
| method | distance calculation method, Proximity (Default) or Depth. |
| threads | number of threads to use |

a dist or a matrix object with pairwise distance of observations in x vs y (if not null)
library(ranger)
library(CaseBasedReasoning)  # provides distance_random_forest()
# proximity pairwise distances
rf.fit <- ranger(Species ~ ., data = iris, num.trees = 500, write.forest = TRUE)
distance_random_forest(x = iris[, -5], rfObject = rf.fit, method = "Proximity", threads = 1)
# depth distance for train versus test subset
set.seed(1234L)
learn <- sample(1:150, 100)
test <- (1:150)[-learn]
rf.fit <- ranger(Species ~ ., data = iris[learn, ], num.trees = 500, write.forest = TRUE)
distance_random_forest(x = iris[learn, -5], y = iris[test, -5], rfObject = rf.fit, method = "Depth")
The first two columns are terminal node IDs. If an ID pair does not appear in a tree, -1 is inserted.
edges_between_terminal_nodes(rfObject)

| Argument | Description |
|---|---|
| rfObject | a ranger object |

a matrix object with pairwise terminal node edge length
library(ranger)
library(CaseBasedReasoning)  # provides edges_between_terminal_nodes()
rf.fit <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
edges_between_terminal_nodes(rf.fit)
Generates a uniform grid over the distribution of the time2event variable, finds the closest grid point for each input time2event element, and returns that point. Memory consumption increases when fitting the RandomForest model with many unique time2event values; this function therefore reduces the number of distinct time2event values by replacing each with its closest grid element.
generate_grid(t2e, grid_length = 250)

| Argument | Description |
|---|---|
| t2e | numeric vector with time2event values |
| grid_length | number of grid elements |

a list with new_t2e and grid_error
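The grid reduction described above can be sketched in a few lines of base R. This is an illustrative re-implementation, not the package's code: the helper name `snap_to_grid` is hypothetical, the grid is assumed to span the observed range of `t2e`, and `grid_error` is assumed to be the per-element absolute deviation from the chosen grid point.

```r
# Hypothetical sketch (not the package's code): snap time-to-event values
# to the closest point of a uniform grid.
snap_to_grid <- function(t2e, grid_length = 250) {
  # Uniform grid over the observed range of the time-to-event values.
  grid <- seq(min(t2e), max(t2e), length.out = grid_length)
  # Index of the closest grid point for every input value.
  closest <- vapply(t2e, function(t) which.min(abs(grid - t)), integer(1))
  new_t2e <- grid[closest]
  # Assumption: the error is the absolute deviation per element.
  list(new_t2e = new_t2e, grid_error = abs(t2e - new_t2e))
}

t2e <- c(0, 1.2, 4.9, 10)
res <- snap_to_grid(t2e, grid_length = 5)  # grid points: 0, 2.5, 5, 7.5, 10
```

A coarser grid lowers the number of unique time values at the cost of a larger `grid_error`, so `grid_length` trades memory against precision.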
Linear Regression Model for Case-Based-Reasoning
CaseBasedReasoning::CBRBase -> CaseBasedReasoning::RegressionModel -> LinearModel
model: the statistical model
clone()
The objects of this class are cloneable with this method.
LinearModel$clone(deep = FALSE)
deep: Whether to make a deep clone.
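The coefficient-weighted distance used by the regression models can be illustrated with a plain `lm()` fit in base R. The sketch below is not the package's implementation. It assumes the distance d(i, j) = sum_k |beta_k * (x_ik - q_jk)| over numeric predictors only, and the package may treat factors, scaling, or the intercept differently. The `mtcars` data and the chosen predictors are arbitrary.

```r
# Sketch under assumptions: coefficient-weighted absolute differences,
# d(i, j) = sum_k |beta_k * (x_ik - q_jk)|, for numeric predictors only.
train <- mtcars[1:25, ]
query <- mtcars[26:32, ]

fit <- lm(mpg ~ wt + hp, data = train)

vars <- c("wt", "hp")
beta <- unname(coef(fit)[vars])  # regression weights, intercept dropped

x <- as.matrix(train[, vars])
q <- as.matrix(query[, vars])

# (n x m) matrix: rows are training cases, columns are query cases.
dist_matrix <- matrix(0, nrow(x), nrow(q))
for (k in seq_along(beta)) {
  dist_matrix <- dist_matrix + abs(beta[k] * outer(x[, k], q[, k], "-"))
}

# Most similar training case for each query case.
best <- rownames(train)[apply(dist_matrix, 2, which.min)]
```

Weighting by the fitted coefficients means that predictors with a strong effect on the endpoint dominate the similarity, while predictors with coefficients near zero barely contribute.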
Logistic Regression Model for Case-Based-Reasoning
CaseBasedReasoning::CBRBase -> CaseBasedReasoning::RegressionModel -> LogisticModel
model: the statistical model
clone()
The objects of this class are cloneable with this method.
LogisticModel$clone(deep = FALSE)
deep: Whether to make a deep clone.
Predict method for CoxModel
## S3 method for class 'CoxModel'
predict(object, newdata, k = 1, ...)

| Argument | Description |
|---|---|
| object | A CoxModel object |
| newdata | Query data of class data.frame |
| k | Number of similar cases to return |
| ... | Additional arguments (currently unused) |

A data.frame of similar cases
Predict method for LinearModel
## S3 method for class 'LinearModel'
predict(object, newdata, k = 1, ...)

| Argument | Description |
|---|---|
| object | A LinearModel object |
| newdata | Query data of class data.frame |
| k | Number of similar cases to return |
| ... | Additional arguments (currently unused) |

A data.frame of similar cases
Predict method for LogisticModel
## S3 method for class 'LogisticModel'
predict(object, newdata, k = 1, ...)

| Argument | Description |
|---|---|
| object | A LogisticModel object |
| newdata | Query data of class data.frame |
| k | Number of similar cases to return |
| ... | Additional arguments (currently unused) |

A data.frame of similar cases
Predict method for RFModel
## S3 method for class 'RFModel'
predict(object, newdata, k = 1, ...)

| Argument | Description |
|---|---|
| object | An RFModel object |
| newdata | Query data of class data.frame |
| k | Number of similar cases to return |
| ... | Additional arguments (currently unused) |

A data.frame of similar cases
Print method for CoxModel
## S3 method for class 'CoxModel'
print(x, ...)

| Argument | Description |
|---|---|
| x | A CoxModel object |
| ... | Additional arguments (currently unused) |

Print method for LinearModel
## S3 method for class 'LinearModel'
print(x, ...)

| Argument | Description |
|---|---|
| x | A LinearModel object |
| ... | Additional arguments (currently unused) |

Print method for LogisticModel
## S3 method for class 'LogisticModel'
print(x, ...)

| Argument | Description |
|---|---|
| x | A LogisticModel object |
| ... | Additional arguments (currently unused) |

Print method for RFModel
## S3 method for class 'RFModel'
print(x, ...)

| Argument | Description |
|---|---|
| x | An RFModel object |
| ... | Additional arguments (currently unused) |

Get the proximity matrix of a ranger object
proximity_distance(x, y = NULL, rfObject, as_dist = TRUE)

| Argument | Description |
|---|---|
| x | a new dataset |
| y | a second new dataset (Default: NULL) |
| rfObject | a ranger object |
| as_dist | Bool, return a dist object. |

a dist or a matrix object with pairwise proximity of observations in x vs y (if not null)
library(ranger)
library(CaseBasedReasoning)  # provides proximity_distance()
rf <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
proximity_distance(x = iris[, -5], rfObject = rf)
set.seed(1234L)
learn <- sample(1:150, 100)
test <- (1:150)[-learn]
rf <- ranger(Species ~ ., data = iris[learn, ], num.trees = 500, write.forest = TRUE)
proximity_distance(x = iris[learn, -5], y = iris[test, -5], rfObject = rf)
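For readers unfamiliar with random forest proximity, the following base R sketch shows the usual definition: the share of trees in which two observations fall into the same terminal node. It works on a hand-made terminal-node matrix rather than a ranger object. The helper name `proximity_from_nodes` and the toy node IDs are assumptions, and the `1 - proximity` transformation is only one common way to obtain a distance, which may differ from what the package does internally.

```r
# Hypothetical sketch: proximity from a terminal-node matrix
# (rows = observations, columns = trees), without calling ranger.
proximity_from_nodes <- function(nodes_x, nodes_y = nodes_x) {
  n_trees <- ncol(nodes_x)
  prox <- matrix(0, nrow(nodes_x), nrow(nodes_y))
  for (t in seq_len(n_trees)) {
    # Adds 1 where the two observations share a terminal node in tree t.
    prox <- prox + outer(nodes_x[, t], nodes_y[, t], "==")
  }
  prox / n_trees
}

# Toy terminal-node IDs for 3 observations in 4 trees.
nodes <- rbind(
  c(2, 5, 3, 4),
  c(2, 5, 6, 4),
  c(7, 8, 6, 9)
)

prox <- proximity_from_nodes(nodes)
d <- as.dist(1 - prox)  # one common way to turn a proximity into a distance
```

Observations 1 and 2 share a terminal node in three of the four trees, so they are far closer to each other than either is to observation 3.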
Transform trees of a ranger-object to a matrix
ranger_forests_to_matrix(rfObject)

| Argument | Description |
|---|---|
| rfObject | a ranger object |

a matrix object with the following columns:
Column 1: tree ID
Column 2: node ID
Column 3: child node ID 1
Column 4: child node ID 2
library(ranger)
library(CaseBasedReasoning)  # provides ranger_forests_to_matrix()
rf.fit <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
forest_matrix <- ranger_forests_to_matrix(rf.fit)
Root class for Regression Models, e.g., CPH, logistic, and linear regression
CaseBasedReasoning::CBRBase -> RegressionModel
model_params: rms arguments
weights: Weights for distance calculation
print()
Prints information of the initialized object
RegressionModel$print()
variable_selection()
Fast backward variable selection with penalization
RegressionModel$variable_selection()
fit()
Fit the regression model
RegressionModel$fit()
clone()
The objects of this class are cloneable with this method.
RegressionModel$clone(deep = FALSE)
deep: Whether to make a deep clone.
RandomForest Model for Searching Similar Cases
This class uses the proximity or depth matrix of the RandomForest algorithm as a similarity matrix of training and query observations. By default, all cases with at least one missing value are dropped from learning, calculating the distance matrix, and searching for similar cases.
CaseBasedReasoning::CBRBase -> RFModel
model: the statistical model
model_params: model arguments
dist_method: Distance method
print()
Prints information of the initialized object
RFModel$print()
new()
Initialize a RandomForest object for searching similar cases.
RFModel$new(formula, data, ...)
formula: Object of class formula or character describing the model fit.
data: Training data of class data.frame
...: ranger RandomForest arguments
fit()
Fit the RandomForest
RFModel$fit()
set_distance_method()
Set the distance method. Available are Proximity and Depth
RFModel$set_distance_method(method = "Depth")
method: Distance calculation method (default: Proximity)
clone()
The objects of this class are cloneable with this method.
RFModel$clone(deep = FALSE)
deep: Whether to make a deep clone.
Englund and Verikas. A novel approach to estimate proximity in a random forest: An exploratory study.
Summary method for CoxModel
## S3 method for class 'CoxModel'
summary(object, ...)

| Argument | Description |
|---|---|
| object | A CoxModel object |
| ... | Additional arguments (currently unused) |

Summary method for LinearModel
## S3 method for class 'LinearModel'
summary(object, ...)

| Argument | Description |
|---|---|
| object | A LinearModel object |
| ... | Additional arguments (currently unused) |

Summary method for LogisticModel
## S3 method for class 'LogisticModel'
summary(object, ...)

| Argument | Description |
|---|---|
| object | A LogisticModel object |
| ... | Additional arguments (currently unused) |

Summary method for RFModel
## S3 method for class 'RFModel'
summary(object, ...)

| Argument | Description |
|---|---|
| object | An RFModel object |
| ... | Additional arguments (currently unused) |

Extracts, for each observation and each tree in the forest, the terminal node ID. Node IDs start at 1, i.e., the root node has ID 1.
terminalNodes(x, rfObject)
terminal_nodes(x, rfObject)

| Argument | Description |
|---|---|
| x | a data.frame |
| rfObject | a ranger object |

Matrix with terminal node IDs for all observations in x (rows) and trees (columns)
library(ranger)
library(CaseBasedReasoning)  # provides terminal_nodes()
rf.fit <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
dfNodes <- terminal_nodes(iris[, -5], rf.fit)
Weighted Distance calculation
weightedDistance(x, y = NULL, weights = NULL)
weighted_distance(x, y = NULL, weights = NULL)

| Argument | Description |
|---|---|
| x | a new dataset |
| y | a second new dataset |
| weights | a vector of weights |

a dist or matrix object
library(ranger)
rf <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
terminal_nodes(iris[, -5], rf)