RBBR: an R package for Regression-Based Boolean Rule inference.

  • This repository contains code and tutorials for running RBBR.
  • Sample datasets for running RBBR are stored in the example_data directory.
  • The RBBR package supports parallel execution on multi-CPU platforms, making it practical for real-world Boolean rule inference applications.

Malekpour, S.A. and Pezeshk, H., Explainable artificial intelligence with Boolean rule-aware predictions in ridge regression models. Neurocomputing, 2025.
If you find this work useful in your research, please cite the paper above.

Step 1. RBBR installation

The RBBR code is written in R (version 4.1.3) and has been tested on both Windows and Linux.

Installation

  1. Download the package archive RBBR_0.1.0.tar.gz from this GitHub page.
  2. Install the RBBR package by running the following command in R:
install.packages("path/to/RBBR_0.1.0.tar.gz", repos = NULL, type = "source")

Dependencies

Please ensure that the following packages are installed. The glmnet package is required for fitting ridge regressions; to run RBBR with parallel computing, the doParallel, foreach, and doSNOW packages are also needed.

install.packages("glmnet")
install.packages("doParallel")  
install.packages("foreach")
install.packages("doSNOW")

Step 2. Prepare input files

Preprocessing input data

To preprocess raw data, including steps such as rescaling each input feature to the [0,1] range, you can use the rbbr_scaling() function from the RBBR package.

# Preprocessing input data
data <- rbbr_scaling(data)
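
For intuition, the rescaling step can be sketched with plain per-feature min-max scaling in base R. This is only an illustration, not the internals of rbbr_scaling(), which may handle the target column and boundary values differently (for instance, the MAGIC example below shows scaled values capped at 0.9999).

# Illustrative sketch only: per-feature min-max rescaling in base R.
# rbbr_scaling() is the supported way to do this and may differ in detail.
minmax_scale <- function(df, feature_cols) {
  df[feature_cols] <- lapply(df[feature_cols], function(x) {
    (x - min(x)) / (max(x) - min(x))
  })
  df
}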

Step 3. Train RBBR and use it for prediction on a new dataset

The RBBR package offers the rbbr_train() function for training the model on a dataset to extract Boolean rules, and the rbbr_predictor() function for using the trained model to predict target values or labels on a new dataset.

Train RBBR

To train the RBBR model on a dataset and extract Boolean rules, use the rbbr_train() function.

# For training the RBBR model
trained_model <- rbbr_train(data, max_feature = NA, mode = NA, slope = NA, penalty = NA, weight_threshold = NA, balancing = NA, num_cores = NA)
# Required input arguments
# data              The dataset with rescaled features within the [0,1] interval.
#                   Each row represents a sample and each column represents a feature. The target variable (class) should be in the last column.  

# Optional input arguments  
# max_feature       The maximum number of input features allowed in a Boolean rule.
#                   The default value is 3.
 
# mode              Choose between "1L" for fitting 1-layered models or "2L" for fitting 2-layered models.
#                   The default value is "1L".
 
# slope             The slope parameter used in the Sigmoid activation function.
#                   The default value is 10.
 
# penalty           The penalty for the number of parameters in the BIC function.
#                   The default value is 1, but it can be adjusted (e.g., to 10) to favor simpler Boolean rules with fewer features.

# weight_threshold  Conjunctions with weights above this threshold in the fitted ridge regression models will be printed as active conjunctions in the output.
#                   The default value is 0.

# balancing         Whether to adjust the class distribution within the dataset so that each class is adequately represented.
#                   The default value is "True". Set it to "False" if you do not need data balancing.

# num_cores         The number of parallel workers (adjust according to your system).
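
For reference, here is a call that spells out the documented defaults explicitly; it should be equivalent to calling rbbr_train(data) with no optional arguments (num_cores is left at its default here, since its default value is not documented above).

# Equivalent to rbbr_train(data), with the documented defaults written out
trained_model <- rbbr_train(data, max_feature = 3, mode = "1L", slope = 10,
                            penalty = 1, weight_threshold = 0, balancing = "True")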

Use RBBR for making predictions

To use the trained model to predict target values or labels on a new dataset, call the rbbr_predictor() function. For datasets with binary (0/1) target features, rbbr_predictor() produces predicted probabilities for the target labels; for a continuous target variable, its output can be regarded directly as the predicted target value.

# For making predictions
predicted_labels <- rbbr_predictor(trained_model, data_test, num_top_rules = NA, slope = NA, num_cores = NA)
# Required input arguments
# trained_model     The model that has been trained using the rbbr_train() function on a training dataset.

# data_test         The new dataset for which we want to predict the target value or label.
#                   Each sample is represented as a row, and the input features are in columns.

# Optional input arguments  
# num_top_rules     Specifies the number of Boolean rules with the best Bayesian Information Criterion (BIC) scores to be used for prediction.
#                   The default value is 1.

# slope             The slope parameter used in the Sigmoid activation function.
#                   The default value is 10.

# num_cores         The number of parallel workers (adjust according to your system).
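
Since rbbr_predictor() returns probabilities for binary targets, hard 0/1 labels can be derived with a cutoff. The 0.5 threshold below is a conventional choice, not something prescribed by the package.

# Convert predicted probabilities to hard 0/1 labels (0.5 is a conventional cutoff)
predicted_class <- ifelse(predicted_labels >= 0.5, 1, 0)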

Example usage of RBBR

I. XOR synthetic data

rm(list = ls())
library(RBBR)

# load data
data <- as.data.frame(read.csv(file = "../example_data/XOR_data.csv", header= TRUE))

print(head(data))
       feat_0     feat_1     feat_2     feat_3      feat_4    feat_5    feat_6      feat_7    feat_8     feat_9 xor
1 0.625853352 0.48361474 0.10529046 0.57628866 0.878811699 0.7895082 0.7824021 0.019096636 0.9185168 0.34709053   1
2 0.903319041 0.02643791 0.37173989 0.05941228 0.204367513 0.1348234 0.3502440 0.928994618 0.3147315 0.95440266   1
3 0.880183794 0.64759100 0.09914387 0.26202347 0.000217744 0.4028034 0.1407160 0.767496957 0.5380130 0.22020956   0
4 0.000478779 0.09698873 0.79021914 0.21921378 0.964783778 0.8605876 0.7625196 0.869578955 0.9855441 0.03724109   0
5 0.442783273 0.22467666 0.61272149 0.11801844 0.380510666 0.8145049 0.3535930 0.007680483 0.3920220 0.62243004   0
6 0.088976773 0.68076354 0.92153500 0.04653772 0.178060560 0.1599707 0.2744070 0.414334818 0.7652831 0.26153153   1

# create train and test sets
data_train   <- data[1:800, ]
data_test    <- data[801:1000, ]

# train RBBR
trained_model <- rbbr_train(data_train)
training process started with  8  computing cores
  |====================| 100%

# print the Boolean rules with best BICs
print(head(trained_model$boolean_rules_sorted))
                                                                                                         Boolean_Rule                R2       BIC Input_Size Index
1                                                                                                [XOR(feat_0,feat_1)] 0.763040725683245 -2350.970          2     1
2 [OR(AND(~feat_0,feat_1,feat_2),AND(feat_0,~feat_1,feat_2),AND(~feat_0,feat_1,~feat_2),AND(feat_0,~feat_1,~feat_2))] 0.766667886255495 -2341.053          3     1
3 [OR(AND(~feat_0,feat_1,feat_4),AND(feat_0,~feat_1,feat_4),AND(~feat_0,feat_1,~feat_4),AND(feat_0,~feat_1,~feat_4))] 0.766009041280997 -2338.679          3     3
4 [OR(AND(~feat_0,feat_1,feat_3),AND(feat_0,~feat_1,feat_3),AND(~feat_0,feat_1,~feat_3),AND(feat_0,~feat_1,~feat_3))] 0.765424468948402 -2336.578          3     2
5 [OR(AND(~feat_0,feat_1,feat_9),AND(feat_0,~feat_1,feat_9),AND(~feat_0,feat_1,~feat_9),AND(feat_0,~feat_1,~feat_9))] 0.765319468799652 -2336.201          3     8
6 [OR(AND(~feat_0,feat_1,feat_6),AND(feat_0,~feat_1,feat_6),AND(~feat_0,feat_1,~feat_6),AND(feat_0,~feat_1,~feat_6))] 0.764590481443423 -2333.590          3     5
              Features Active_Conjunctions                   Weights Layer1, Sub-Rule1
1        feat_0.feat_1                   2                       -1.08:0.99:1.09:-1.06
2 feat_0.feat_1.feat_2                   4 -0.99:1.09:1.08:-1.18:-1.08:0.96:1.17:-1.04
3 feat_0.feat_1.feat_4                   4  -1.04:0.96:1.12:-1.15:-1.02:1.05:1.1:-1.13
4 feat_0.feat_1.feat_3                   4 -1.14:0.99:1.17:-1.05:-0.96:1.02:1.06:-1.17
5 feat_0.feat_1.feat_9                   4     -1.18:1.11:1.22:-1.03:-1.18:0.9:0.97:-1
6 feat_0.feat_1.feat_6                   4 -1.13:1.05:1.26:-1.01:-1.03:1.01:1.02:-1.06

# use input features from test data for making predictions
data_test_x  <- data_test[ ,1:(ncol(data_test)-1)]

predicted_labels <- rbbr_predictor(trained_model, data_test_x, num_top_rules = 1, slope = 10, num_cores = NA)
print(head(predicted_labels))
[1] 0.0326984275 0.9896672566 0.0003574667 0.3325976343 0.0803804948 0.0043887294

# The output from rbbr_predictor() shown above is the predicted probabilities for the target labels (0/1).
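
To gauge performance on the held-out set, the probabilities can be thresholded and compared against the true labels in the xor column. This evaluation step is illustrative and not part of the package.

# Illustrative evaluation: threshold at 0.5 and compute test accuracy
predicted_class <- ifelse(predicted_labels >= 0.5, 1, 0)
accuracy <- mean(predicted_class == data_test$xor)
print(accuracy)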

II. MAGIC data

rm(list = ls())
library(RBBR)

# load data
data          <- as.data.frame(read.table("../example_data/magic04.data", sep = ",", header = FALSE))
colnames(data)<- c("fLength", "fWidth", "fSize", "fConc", "fConc1", "fAsym", "fM3Long", "fM3Trans", "fAlpha", "fDist", "class")
data$class    <- ifelse(data$class == "g", 1, 0)
print(head(data))
   fLength   fWidth  fSize  fConc fConc1    fAsym  fM3Long fM3Trans  fAlpha    fDist class
1  28.7967  16.0021 2.6449 0.3918 0.1982  27.7004  22.0110  -8.2027 40.0920  81.8828     1
2  31.6036  11.7235 2.5185 0.5303 0.3773  26.2722  23.8238  -9.9574  6.3609 205.2610     1
3 162.0520 136.0310 4.0612 0.0374 0.0187 116.7410 -64.8580 -45.2160 76.9600 256.7880     1
4  23.8172   9.5728 2.3385 0.6147 0.3922  27.2107  -6.4633  -7.1513 10.4490 116.7370     1
5  75.1362  30.9205 3.1611 0.3168 0.1832  -5.5277  28.5525  21.8393  4.6480 356.4620     1
6  51.6240  21.1502 2.9085 0.2420 0.1340  50.8761  43.1887   9.8145  3.6130 238.0980     1

# Scaling to bring each input feature within the (0,1) range
data_scaled <- rbbr_scaling(data)
print(head(data_scaled))
    fLength    fWidth     fSize      fConc     fConc1     fAsym   fM3Long  fM3Trans     fAlpha     fDist class
1 0.1427532 0.2050390 0.3490045 0.49884575 0.42928416 0.8943205 0.8223308 0.7967131 0.46874459 0.2292175     1
2 0.1590993 0.1502162 0.2863067 0.68128604 0.81778742 0.8916903 0.8265443 0.7896415 0.07436989 0.5800906     1
3 0.9187688 0.9999000 0.9999000 0.03200938 0.03991323 0.9999000 0.6204176 0.6475468 0.89979506 0.7266274     1
4 0.1137550 0.1226587 0.1970219 0.79246265 0.85010846 0.8934186 0.7561468 0.8009503 0.12216682 0.3283388     1
5 0.4126125 0.3961922 0.6050535 0.40005137 0.39674620 0.8331270 0.8375355 0.9177845 0.05434313 0.9999000     1
6 0.2756886 0.2710029 0.4797571 0.30152045 0.29002169 0.9370013 0.8715550 0.8693237 0.04224220 0.6734752     1

# Randomly select indices for the training dataset
# (call set.seed() beforehand if you need a reproducible split)
train_indices <- sample(nrow(data_scaled), floor(0.8 * nrow(data_scaled)))

# create train and test sets
data_train <- data_scaled[train_indices, ]
data_test  <- data_scaled[-train_indices, ]

# train RBBR
trained_model <- rbbr_train(data_train, max_feature = 6, mode = "1L")
training process started with  8  computing cores
  |====================| 100%

# use input features from test data for making predictions
data_test_x  <- data_test[ ,1:(ncol(data_test)-1)]

predicted_labels <- rbbr_predictor(trained_model, data_test_x, num_top_rules = 1, slope = 10, num_cores = NA)
print(head(predicted_labels))
[1] 0.008138437 0.910168341 0.992705287 0.989310570 0.145255610 0.605176927

# The output from rbbr_predictor() shown above is the predicted probabilities for the target labels (0/1).
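
A quick confusion matrix against the class column of the test set gives a fuller picture than accuracy alone; as above, this evaluation code is illustrative and not part of the package.

# Illustrative evaluation: confusion matrix at a 0.5 cutoff
predicted_class <- ifelse(predicted_labels >= 0.5, 1, 0)
print(table(predicted = predicted_class, actual = data_test$class))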

III. OR([AND(C,D)],[NOR(A,B)]) example

rm(list = ls())
library(RBBR)

# load data
data              <- as.data.frame(read.csv("../example_data/OR1000_binary.csv", header = TRUE))
print(head(data))
          C          D         A         B Y
1 0.9503997 0.81028102 0.6249662 0.1931601 1
2 0.1481077 0.50052685 0.1527322 0.8670829 0
3 0.8151770 0.65770170 0.3031337 0.6836050 1
4 0.8049381 0.07865007 0.6084361 0.5317733 0
5 0.8960193 0.27124144 0.5090235 0.4332590 0
6 0.1509037 0.79736172 0.1772568 0.5100828 0

# train RBBR
trained_model     <- rbbr_train(data, max_feature = 2, mode = "2L", slope = 6, penalty = NA, weight_threshold = NA, balancing = "False", num_cores = NA)
training process started with  8  computing cores
  |====================| 100%

# print the Boolean rules with best BICs
print(head(trained_model$boolean_rules_sorted))
                  Boolean_Rule                R2       BIC Input_Size Index Features Active_Conjunctions Weights Layer1, Sub-Rule1
1  [OR([AND(C,D)],[NOR(A,B)])] 0.708158226534958 -2545.354        2.2     5  C.D.A.B                   5     0.92:-0.35:-0.35:-0.2
2 [AND([OR(C,~A)],[OR(D,~B)])] 0.487064507428149 -1981.415        2.2     8  C.A.D.B                   7      0.11:-0.65:0.42:0.12
3 [AND([OR(C,~B)],[OR(D,~A)])] 0.480488233276991 -1968.676        2.2    10  C.B.D.A                   7      0.13:-0.65:0.37:0.17
4         [OR([D],[NOR(A,B)])] 0.456845016519437 -1935.956        1.2    12    D.A.B                   5                 0.31:-0.3
5        [OR([~A],[AND(C,D)])] 0.429102528682834 -1886.141        1.2    13    A.C.D                   5                -0.28:0.26
6        [OR([~B],[AND(C,D)])] 0.426923033229411 -1882.331        1.2    19    B.C.D                   5                -0.29:0.27
  Weights Layer1, Sub-Rule2         Weights Layer2
1    -0.17:-0.42:-0.35:0.91    0.55:0.53:0.5:-1.04
2      0.12:-0.65:0.47:0.08 1.37:-0.23:-0.25:-0.51
3      0.11:-0.65:0.51:0.05  1.37:-0.27:-0.23:-0.5
4    -0.17:-0.42:-0.35:0.91    0.6:0.53:0.27:-0.92
5     0.92:-0.35:-0.35:-0.2    0.6:0.49:0.28:-0.95
6     0.92:-0.35:-0.35:-0.2    0.56:0.52:0.3:-0.91
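
The top-ranked rule can also be pulled out of the sorted table programmatically; the column names below match the output printed above.

# Extract the Boolean rule with the best BIC (column names as printed above)
top_rule <- trained_model$boolean_rules_sorted$Boolean_Rule[1]
print(top_rule)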
