Title: | Case Based Reasoning |
---|---|
Description: | Case-based reasoning is a problem-solving methodology that involves solving a new problem by referring to the solution of a similar problem in a large set of previously solved problems. The key aspect of Case Based Reasoning is to determine the problem that "most closely" matches the new problem at hand. This is achieved by defining a family of distance functions and using these distance functions as parameters for local averaging regression estimates of the final result. The optimal distance function is chosen based on a specific error measure used in regression estimation. This approach allows for efficient problem-solving by leveraging past experiences and adapting solutions from similar cases. The underlying concept is inspired by the work of Dippon J. et al. (2002) <doi:10.1016/S0167-9473(02)00058-0>. |
Authors: | Simon Mueller [cre], PD Dr. Juergen Dippon [ctb] |
Maintainer: | Simon Mueller <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3 |
Built: | 2024-11-14 06:16:59 UTC |
Source: | https://github.com/sipemu/case-based-reasoning |
Converts a distance vector into an object of class dist
asDistObject(x, n, method)
x | data vector |
---|---|
n | length of x |
method | method description |
Call a function by character strings using the namespace and custom parameters.
call_function(func_list)
func_list | A list with fields func, namespace, and args |
---|---|
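The lookup that func_list describes (a function name, a namespace, and an argument list) can be sketched with base R's getExportedValue() and do.call(). This is a minimal, hypothetical re-implementation; the name call_function_sketch is illustrative and is not the package's own code.

```r
# Hypothetical sketch of calling a function given as character strings.
call_function_sketch <- function(func_list) {
  # Look up the exported function object by name inside the given namespace
  f <- getExportedValue(func_list$namespace, func_list$func)
  # Call it with the (named) argument list
  do.call(f, func_list$args)
}

# Equivalent to stats::median(x = c(1, 3, 2))
res <- call_function_sketch(list(func = "median", namespace = "stats",
                                 args = list(x = c(1, 3, 2))))
print(res)
```

Using do.call() keeps the argument list as data, so model-fitting calls can be configured without hard-coding them.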
Root class for common functionality of this package
model
the statistical model
data
training data
model_fit
trained object
formula
Object of class formula or character describing the model fit
terms
terms of the formula
endPoint
Target variable
distMat
A matrix with distances
orderMat
A matrix with the order indices for similar cases search
new()
Initialize object for searching similar cases
CBRBase$new(formula, data)
formula
Object of class formula or character describing the model fit
data
Training data of class data.frame
fit()
Fit the Model
CBRBase$fit()
x
Training data of class data.frame
calc_distance_matrix()
Calculates the distance matrix
CBRBase$calc_distance_matrix(query = NULL)
query
Query data of class data.frame
x
Training data of class data.frame
get_similar_cases()
Extracts similar cases
CBRBase$get_similar_cases(query, k = 1, addDistance = T, merge = F)
query
Query data of class data.frame
k
number of similar cases
addDistance
Add distance to result data.frame
merge
Add query data to matched cases data.frame
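Given a distance matrix whose rows are training cases and whose columns are query cases (the n x m layout described for the Cox-Beta model), picking the k most similar cases amounts to ordering each column. The sketch below is a hypothetical helper (k_nearest_sketch is not part of the package) that mimics the idea behind the orderMat field; the package's actual get_similar_cases() also builds a data.frame with the matched cases.

```r
# Hypothetical helper: for each query (column of dist_mat), return the row
# indices of the k closest training cases, nearest first.
k_nearest_sketch <- function(dist_mat, k = 1) {
  stopifnot(k >= 1, k <= nrow(dist_mat))
  # order() sorts each column ascending; keep the first k indices
  idx <- apply(dist_mat, 2, function(d) order(d)[seq_len(k)])
  # apply() returns a plain vector when k == 1; always return a k x m matrix
  matrix(idx, nrow = k)
}

# 3 training cases (rows) x 2 query cases (columns)
d <- matrix(c(0.5, 0.1, 0.9,
              0.2, 0.8, 0.3), nrow = 3)
nn <- k_nearest_sketch(d, k = 2)
print(nn)
```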
clone()
The objects of this class are cloneable with this method.
CBRBase$clone(deep = FALSE)
deep
Whether to make a deep clone.
Cox-Beta Model for Case-Based-Reasoning
Regression beta coefficients obtained from a Cox proportional hazards (CPH) regression model fitted on the training data are used to build a weighted distance measure between training and test data. These weights are then used to calculate an (n x m) distance matrix, where n is the number of observations in the training data and m is the number of observations in the test data. The user can use this distance matrix for further cluster analysis or to extract, for each test observation, the k most similar cases from the training data. We use the rms package for model fitting, variable selection, and checking model assumptions. If the user omits the test data, this function returns an (n x n) distance matrix.
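The idea of turning regression coefficients into a distance can be illustrated with a weighted L1 distance whose weights are the absolute beta values. This is a sketch under that assumption: the exact weighting used by the package (and by Dippon et al., 2002) may differ, and weighted_l1_sketch is a hypothetical name. The betas are given directly here rather than fitted with rms.

```r
# Hypothetical sketch: weighted L1 distance with weights |beta_j|.
# Assumption: the package's actual distance formula may differ.
weighted_l1_sketch <- function(train, test = NULL, beta) {
  # Without test data, compare the training data with itself (n x n)
  if (is.null(test)) test <- train
  train <- as.matrix(train)
  test <- as.matrix(test)
  stopifnot(ncol(train) == length(beta), ncol(test) == length(beta))
  w <- abs(beta)
  # n x m matrix: rows are training cases, columns are test cases
  d <- matrix(0, nrow = nrow(train), ncol = nrow(test))
  for (i in seq_len(nrow(train))) {
    for (j in seq_len(nrow(test))) {
      d[i, j] <- sum(w * abs(train[i, ] - test[j, ]))
    }
  }
  d
}

beta <- c(0.5, -2)
train <- matrix(c(1, 2,
                  3, 0), ncol = 2, byrow = TRUE)
test <- matrix(c(1, 1), ncol = 2)
d <- weighted_l1_sketch(train, test, beta)
print(d)
```

Variables with large absolute coefficients dominate the distance, so cases are "similar" mainly in the covariates that drive the hazard.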
CaseBasedReasoning::CBRBase -> CaseBasedReasoning::RegressionModel -> CoxModel
model
the statistical model
model_params
rms arguments
check_ph()
Check proportional hazard assumption graphically
CoxModel$check_ph()
clone()
The objects of this class are cloneable with this method.
CoxModel$clone(deep = FALSE)
deep
Whether to make a deep clone.
For each pair of observations, this function returns the number of edges between their terminal nodes, summed over all trees in the random forest.
depth_distance(x, y = NULL, rfObject)
x | A data.frame with the same columns as in the training data of the RandomForest model |
---|---|
y | A data.frame with the same columns as in the training data of the RandomForest model |
rfObject | a ranger object |
require(ranger)
rf <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
depth_distance(x = iris[, -5], rfObject = rf)
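The edge count behind the depth distance can be sketched for a single tree stored as a parent vector (parent[i] is the parent of node i, and the root has parent 0): walk both nodes up to their lowest common ancestor and add the two path lengths. The tree encoding and the names edges_between_sketch and forest are assumptions for illustration; the package works on the node matrix produced from a ranger object instead.

```r
# Hypothetical sketch: number of edges between two nodes of one tree.
edges_between_sketch <- function(parent, a, b) {
  # All ancestors of a node, starting with the node itself
  path_to_root <- function(v) {
    path <- v
    while (parent[v] != 0) {
      v <- parent[v]
      path <- c(path, v)
    }
    path
  }
  pa <- path_to_root(a)
  pb <- path_to_root(b)
  lca <- pa[pa %in% pb][1]  # lowest common ancestor
  # edges from a up to the ancestor plus edges from b up to it
  (match(lca, pa) - 1) + (match(lca, pb) - 1)
}

# Tree: node 1 is the root with children 2 and 3; node 2 has children 4 and 5
parent <- c(0, 1, 1, 2, 2)
edges_between_sketch(parent, 4, 5)

# Depth distance of two observations: sum over all trees of the edges between
# their terminal nodes (node_a / node_b hold each observation's terminal node)
forest <- list(c(0, 1, 1, 2, 2), c(0, 1, 1))
node_a <- c(4, 2)
node_b <- c(5, 3)
total <- sum(mapply(edges_between_sketch, forest, node_a, node_b))
print(total)
```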
Distance calculation based on RandomForest Proximity or Depth
distanceRandomForest( x, y = NULL, rfObject, method = "Proximity", threads = NULL )
x | a data.frame |
---|---|
y | a second data.frame |
rfObject | a ranger object |
method | distance calculation method, Proximity (default) or Depth |
threads | number of threads to use |
a dist or matrix object with the pairwise distances of observations in x, or of x vs. y if y is not NULL
library(ranger)
# proximity pairwise distances
rf.fit <- ranger(Species ~ ., data = iris, num.trees = 500, write.forest = TRUE)
distanceRandomForest(x = iris[, -5], rfObject = rf.fit, method = "Proximity", threads = 1)
# depth distance for train versus test subset
set.seed(1234L)
learn <- sample(1:150, 100)
test <- (1:150)[-learn]
rf.fit <- ranger(Species ~ ., data = iris[learn, ], num.trees = 500, write.forest = TRUE)
distanceRandomForest(x = iris[learn, -5], y = iris[test, -5], rfObject = rf.fit, method = "Depth")
The first two columns contain terminal node IDs; if an ID pair does not appear in a tree, -1 is inserted.
edges_between_terminal_nodes(rfObject)
rfObject | a ranger object |
---|---|
a matrix with pairwise terminal node edge lengths
require(ranger)
rf.fit <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
edges_between_terminal_nodes(rf.fit)
Generates a uniform grid over the distribution of the time2event variable and returns, for each input time2event element, the closest grid point. Memory consumption increases when fitting the random forest model on data with many unique time2event values; replacing each value with its closest grid element therefore reduces the number of unique time2event values.
generate_grid(t2e, grid_length = 250)
t2e | numeric vector with time2event values |
---|---|
grid_length | number of grid elements |
a list with new_t2e and grid_error
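Snapping values to a grid can be sketched as follows, assuming an evenly spaced grid between min(t2e) and max(t2e) and an absolute-difference error. Both are assumptions: the package's grid construction and its definition of grid_error may differ, and generate_grid_sketch is a hypothetical name.

```r
# Hypothetical sketch: snap each time-to-event value to the nearest point of
# an evenly spaced grid between min(t2e) and max(t2e).
generate_grid_sketch <- function(t2e, grid_length = 250) {
  grid <- seq(min(t2e), max(t2e), length.out = grid_length)
  # index of the closest grid point for every input value
  idx <- vapply(t2e, function(t) which.min(abs(grid - t)), integer(1))
  new_t2e <- grid[idx]
  # assumed error measure: absolute change introduced by snapping
  list(new_t2e = new_t2e, grid_error = abs(t2e - new_t2e))
}

res <- generate_grid_sketch(c(0, 0.9, 2.2, 10), grid_length = 11)
print(res$new_t2e)
```

At most grid_length distinct time values remain after snapping, which is what bounds memory use in the random forest.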
Linear Regression Model for Case-Based-Reasoning
CaseBasedReasoning::CBRBase -> CaseBasedReasoning::RegressionModel -> LinearModel
model
the statistical model
clone()
The objects of this class are cloneable with this method.
LinearModel$clone(deep = FALSE)
deep
Whether to make a deep clone.
Logistic Regression Model for Case-Based-Reasoning
CaseBasedReasoning::CBRBase -> CaseBasedReasoning::RegressionModel -> LogisticModel
model
the statistical model
clone()
The objects of this class are cloneable with this method.
LogisticModel$clone(deep = FALSE)
deep
Whether to make a deep clone.
Get the proximity matrix of a ranger object
proximity_distance(x, y = NULL, rfObject, as_dist = TRUE)
x | a new dataset |
---|---|
y | a second new dataset (Default: NULL) |
rfObject | a ranger object |
as_dist | Logical; if TRUE, return a dist object. |
a dist or matrix object with the pairwise proximities of observations in x, or of x vs. y if y is not NULL
require(ranger)
rf <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
proximity_distance(x = iris[, -5], rfObject = rf)
set.seed(1234L)
learn <- sample(1:150, 100)
test <- (1:150)[-learn]
rf <- ranger(Species ~ ., data = iris[learn, ], num.trees = 500, write.forest = TRUE)
proximity_distance(x = iris[learn, -5], y = iris[test, -5], rfObject = rf)
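Random forest proximity is commonly defined as the share of trees in which two observations land in the same terminal node. The sketch below computes it from a matrix of terminal node IDs (rows are observations, columns are trees), the same layout terminalNodes() returns. It is a hypothetical illustration (proximity_sketch is not a package function), and whether the package reports the proximity itself or a transformed distance is not stated here; 1 - proximity is shown as one common choice.

```r
# Hypothetical sketch: proximity from terminal node IDs
# (rows = observations, columns = trees).
proximity_sketch <- function(nodes_x, nodes_y = nodes_x) {
  stopifnot(ncol(nodes_x) == ncol(nodes_y))
  prox <- matrix(0, nrow(nodes_x), nrow(nodes_y))
  for (t in seq_len(ncol(nodes_x))) {
    # TRUE where two observations share a terminal node in tree t
    prox <- prox + outer(nodes_x[, t], nodes_y[, t], "==")
  }
  # fraction of trees in which the pair shares a terminal node
  prox / ncol(nodes_x)
}

# 3 observations, 2 trees
nodes <- matrix(c(4, 4, 5,
                  2, 3, 3), nrow = 3)
p <- proximity_sketch(nodes)
print(p)
# one common way to turn proximity into a distance
d_prox <- as.dist(1 - p)
```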
Transform the trees of a ranger object into a matrix
ranger_forests_to_matrix(rfObject)
rfObject | a ranger object |
---|---|
a matrix with the following columns:
Column 1: tree ID
Column 2: node ID
Column 3: child node ID 1
Column 4: child node ID 2
library(ranger)
rf.fit <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
forest_matrix <- ranger_forests_to_matrix(rf.fit)
Root class for Regression Models, e.g., CPH, logistic, and linear regression
CaseBasedReasoning::CBRBase -> RegressionModel
model_params
rms arguments
weights
Weights for distance calculation
print()
Prints information of the initialized object
RegressionModel$print()
variable_selection()
Fast backward variable selection with penalization
RegressionModel$variable_selection(x)
x
Training data of class data.frame
fit()
Fit the regression model
RegressionModel$fit()
x
Training data of class data.frame
clone()
The objects of this class are cloneable with this method.
RegressionModel$clone(deep = FALSE)
deep
Whether to make a deep clone.
RandomForest Model for Searching Similar Cases
This class uses the proximity or depth matrix of the RandomForest algorithm as a similarity matrix between training and query observations. By default, all cases with at least one missing value are dropped from learning, from calculating the distance matrix, and from searching for similar cases.
CaseBasedReasoning::CBRBase -> RFModel
model
the statistical model
model_params
model arguments
dist_method
Distance method
print()
Prints information of the initialized object
RFModel$print()
new()
Initialize a RandomForest object for searching similar cases.
RFModel$new(formula, data, ...)
formula
Object of class formula or character describing the model fit.
data
Training data of class data.frame
...
ranger RandomForest arguments
fit()
Fit the RandomForest
RFModel$fit()
x
Training data of class data.frame
set_distance_method()
Set the distance method. Available methods are Proximity and Depth.
RFModel$set_distance_method(method = "Depth")
method
Distance calculation method (default: Proximity)
clone()
The objects of this class are cloneable with this method.
RFModel$clone(deep = FALSE)
deep
Whether to make a deep clone.
Englund and Verikas. A novel approach to estimate proximity in a random forest: An exploratory study.
Extracts the terminal node ID for each observation and each tree in the forest. Node indices start at 1, i.e., the root node has ID 1.
terminalNodes(x, rfObject)
x | a data.frame |
---|---|
rfObject | a ranger object |
Matrix with terminal node IDs for all observations in x (rows) and trees (columns)
library(ranger)
rf.fit <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
dfNodes <- terminalNodes(iris[, -5], rf.fit)
Weighted Distance calculation
weightedDistance(x, y = NULL, weights = NULL)
x | a new dataset |
---|---|
y | a second new dataset |
weights | a vector of weights |
a dist or matrix object
require(ranger)
rf <- ranger(Species ~ ., data = iris, num.trees = 5, write.forest = TRUE)
terminalNodes(iris[, -5], rf)