
Source code and artifacts for the ASE 2025 paper: "AndroFL: Evolutionary-Driven Fault Localization for Android Apps"


PRAISE-group/AndroFL


AndroFL: Evolutionary-Driven Fault Localization for Android Apps

AndroFL is a comprehensive framework for automated test suite generation and fault localization in Android applications. This artifact evaluation guide provides instructions for running AndroFL using our pre-configured Docker environment.

Preview

AndroFL: The command line for deployment/usage:

usage: androFL.py [-h] [--platformVersion PLATFORMVERSION] [--deviceName DEVICENAME] [--udid UDID] [--apkPath APKPATH] --appiumPort APPIUMPORT --output OUTPUT --config CONFIG

Main script to run the AndroFL Tool

optional arguments:
  -h, --help                    show this help message and exit
  --platformVersion, -v         Android platform version (e.g., 10)
  --deviceName, -d              Name of the device (e.g Emulators:"device_1" or Real Device:"Redmi 12C")
  --udid, -i                    Unique device ID (Emulators:"emulator-5554" or Real Device:"Redmi 12C"), use adb devices to get the device id
  --apkPath, -a                 Path to the APK file
  --appiumPort, -p              Port number of the Appium server (e.g., 4723)
  --output, -o                  Directory to store the results and metadata
  --config, -c                  Path to the configuration file (JSON format) which contains user parameters and algorithm settings

Implementation details

AndroFL/
├── androFL.py              # Main entry point
├── framework/              # Core framework components
│   ├── algorithms/         # Test Suite generation algorithms
│   │   └── ga/             # Genetic algorithm implementation
│   ├── device/             # Device management modules
│   ├── configurations/     # Configurations of tool
│   └── utils/              # Utility functions
├── faultlocalizer/         # Fault localization implementation
├── benchmarks/             # Test applications
├── hookApp.py              # Script to instrument methods and monitor app state
├── run.sh                  # Utility script to run Frida in parallel
├── script.js               # JavaScript that Frida injects into apps
├── scripts/                # Extra utility scripts for analysis and testing
└── configs/                # Parameter configurations used in the paper

Instructions for Artifact Evaluation

To facilitate artifact evaluation, we provide a pre-configured Docker image that encapsulates all necessary dependencies, including the Android emulator, Appium server, Frida, and required Python packages. This setup ensures a streamlined and minimal-effort execution of AndroFL.

Note: Comprehensive instructions for native installation will be made available upon the paper’s acceptance. We appreciate your understanding and patience.

Prerequisite

  1. Ensure that virtualization technology (VT-x/AMD-V) is enabled in your system BIOS. If it is not enabled, refer to this guide.
  2. Install qemu-kvm if it is not already present, using this link. After installation, verify the setup with the kvm-ok command.
  3. The system needs at least 4 CPU cores (more preferred), 16 GB of memory, and 50 GB of storage to run the experiments.
  4. Docker must be installed on your system. Installation instructions can be found here. Additionally, ensure Docker has appropriate user permissions: either add your user to the docker group for passwordless access, or prefix every Docker command with sudo (e.g., sudo docker run ...).

Docker Setup for Artifact Evaluation

We provide a pre-configured Docker image (androfl_magisk_image.tar) with all dependencies installed and an Android emulator (device_backup) ready for experiments. The container also includes a web-based GUI interface, allowing users to interact with the emulator directly through a browser (Chrome, Firefox) for convenience and ease of use.

Load the docker image into system

docker load -i ./androfl_magisk_image.tar 

On completion, run the command below to verify that the image was loaded:

docker images

It will list a Docker image named androflmagisk.

Starting the Container

docker run --device=/dev/kvm  -ti -p 6901:6901 -p 5901:5901 -e VNC_RESOLUTION=1920x1080 -v ./:/headless/androfl/ androflmagisk

The terminal will show something like this:

------------------ update chromium-browser.init ------------------

... set window size 1920 x 1080 as chrome window size!
.
.
.
------------------ VNC environment started ------------------

VNCSERVER started on DISPLAY= :1 
        => connect via VNC viewer with 172.17.0.2:5901

noVNC HTML client started:
        => connect via http://172.17.0.2:6901/?password=...
  • Your container is now running, and the current working directory on your host machine is mounted inside the Docker container. All experimental data generated will be stored directly on your system.

Note: Do not close the terminal running the container, as this will terminate the container and halt the experiment.

  • Open any web browser and navigate to the following URL:

    http://172.17.0.2:6901/?password=vncpassword

    This will launch the Docker container's graphical environment via a VNC-based web interface, allowing you to interact with the Android emulator and associated tools through your browser.


The mounted drive can be accessed via the file manager.

  • You are now ready to run our tool within the Docker-based WebView environment. Simply interact with the interface through the browser to begin your experiments.

Getting Started

Run a quick test to get familiar with AndroFL and verify that the setup is functioning correctly.

Step 1:
Open a terminal within the WebView environment and navigate to the AndroFL source code directory by executing the following command:

cd /headless/androfl/

Step 2:
Run Random Testing (Monkey) on the benchmark app 4.apk:

python androFL.py -v 11 -d device_backup -i emulator-5554 -a ./benchmarks/4.apk -p 4723 -o ./outputs_dummy/ -c ./configs/demo_config_random.json

Here:

python androFL.py: Executes the main script androFL.py, the entry point of the AndroFL framework.

-v 11 : Android platform version (in this case, 11).

-d device_backup : Emulator name, our container contains this single emulator

-i emulator-5554: The emulator ID to run the experiments on. emulator-5554 is the default ID of the first running Android emulator; subsequent emulators use the next even numbers, such as emulator-5556, emulator-5558, emulator-5560, and so on.

-a ./benchmarks/4.apk : Specifies the target APK file for testing. In this case, it points to the 4.apk inside the benchmarks folder.

-p 4723 : Port number for the Appium server. 4723 is the default Appium port. This is needed to enable UI automation.

-o ./outputs_dummy/ : Output directory: Directory where the tool should store its outputs. For this case outputs will be stored in outputs_dummy folder.

-c ./configs/demo_config_random.json :  JSON config file specifying how the tool should operate (e.g., testing strategy, SBFL formula, instrumentation details, etc.). 
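The flags above map directly onto a standard argparse interface. As a rough sketch only (not the tool's actual parser, whose defaults and help text may differ), they could be declared like this:

```python
import argparse

def build_parser():
    # Hypothetical sketch of the CLI shown above; see androFL.py for the
    # authoritative definitions.
    p = argparse.ArgumentParser(description="Main script to run the AndroFL Tool")
    p.add_argument("--platformVersion", "-v", help="Android platform version (e.g., 11)")
    p.add_argument("--deviceName", "-d", help='Name of the device (e.g., "device_backup")')
    p.add_argument("--udid", "-i", help='Unique device ID (e.g., "emulator-5554")')
    p.add_argument("--apkPath", "-a", help="Path to the APK file")
    p.add_argument("--appiumPort", "-p", required=True, help="Appium server port (e.g., 4723)")
    p.add_argument("--output", "-o", required=True, help="Directory for results and metadata")
    p.add_argument("--config", "-c", required=True, help="Path to the JSON configuration file")
    return p
```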

Expected Results

Once the emulator starts, AndroFL will automatically begin testing the 4.apk after a few seconds. The main terminal will display the current status and progress of the experiment.

Additionally, a secondary terminal window will open to monitor internal processes, such as method instrumentation, active monitoring, and lifecycle events (e.g., process started or killed).

Upon successful completion of the experiment, the main terminal running androFL will display a success message indicating that the execution has finished.

Check Output Files

cd ./outputs_dummy/ and look for Quick_Calculation_results_X; the highest X value is the latest result (for the first run it will be Quick_Calculation_results_1). Inside it you will find the files and folders shown below. The presence of the ranklist.txt file indicates the tool ran without any issues.

activation_matrix.txt  
app_info.json  
current_exp_setup.json  
dexFiles  
ranklist.txt  
results  
spectrum.txt  
textFiles
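A quick sanity check along these lines (a generic helper, not part of AndroFL, assuming the Quick_Calculation_results_X naming from the quick-test run) can confirm that ranklist.txt was produced in the latest results folder:

```python
from pathlib import Path

def latest_ranklist(output_dir):
    # Find Quick_Calculation_results_X folders and pick the highest X;
    # the run succeeded if ranklist.txt exists inside it.
    runs = [p for p in Path(output_dir).glob("Quick_Calculation_results_*") if p.is_dir()]
    if not runs:
        return None
    latest = max(runs, key=lambda p: int(p.name.rsplit("_", 1)[1]))
    ranklist = latest / "ranklist.txt"
    return ranklist if ranklist.exists() else None
```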

1. Script to Replicate Full Experiments from Scratch

Replicating the full set of experiments can be extremely time-consuming on a single device. Each benchmark typically takes between 2–10 hours per run.

On our system (Intel® Xeon® Silver 4108 CPU @ 1.80GHz, 16 cores, 128 GB RAM), the complete experimentation process required approximately 22,550 CPU hours.

To initiate the full experiment from scratch, run the following script:

./experiment_paper.sh ./outputs_new/

Results

Upon completion, the script will generate an output directory:

Within this directory, you will find two subdirectories:

  • ./outputs_androfl/
  • ./outputs_random/

Each of these folders contains results from 20 applications, with 3 independent runs per application (totaling 60 folders per method):

These outputs correspond to the full experimental results reported in the paper.

2. Script to Replicate a Subset of Benchmarks

We also provide a script to replicate the experiments on a subset of benchmarks used in the paper.
Note that this script runs a total of 2 × |Apps| × 3 experiments:

  • 2 methods: {AndroFL, Baseline}
  • |Apps|: Number of selected APKs
  • 3: Independent runs per app-method pair
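As a quick budget check before launching the script, the total run count for a given selection works out as:

```python
def total_runs(num_apps, methods=2, repeats=3):
    # 2 methods (AndroFL, Baseline) x |Apps| selected APKs x 3 independent runs
    return methods * num_apps * repeats
```

For the two-APK invocation below, this gives 2 x 2 x 3 = 12 experiments.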

Usage:

./experiment_paper_subset.sh ./outputs_subset/ ./benchmarks/4.apk ./benchmarks/liquid_buggy.apk

It will create an output directory ./outputs_subset/ containing two subdirectories, ./outputs_androfl/ and ./outputs_baseline/; each holds 3 independent runs for every selected APK.

3. Script to generate plots from the submitted artifact data (reported in the paper)

python ./scripts/genReport.py -i ./outputs/data_androfl/ -b ./outputs/data_baseline/ -g ground_truth_paper.json  -r 3 -o ./paperReport/

4. Script to generate plots from new data

This script expects a ground-truth JSON that contains only the APKs for which experiments were conducted. As a best practice, duplicate ground_truth_paper.json and remove the entries for APKs not included in the experiment.

python ./scripts/genReport.py -i [data_androfl] -b [data_baseline] -g [custom_ground_truth]  -r 3 -o ./newReport/

Deep Dive into AndroFL Usability

Basic Usage

This section provides a detailed walkthrough of AndroFL's core functionality and how to use it effectively for fault localization in Android applications. The basic usage mode enables users to run the tool on individual APKs with minimal configuration effort.

We assume the environment is already set up (either via Docker or native installation) and that the required emulator is running.

python androFL.py [options]

Required Parameters

  • -v, --platformVersion: Android version (e.g., "11")
  • -d, --deviceName: Device name (e.g., "device_1" or "Pixel_4")
  • -i, --udid: Device unique identifier (e.g., "emulator-5554")
  • -a, --apkPath: Path to the APK file
  • -p, --appiumPort: Appium server port (e.g., "4723")
  • -o, --output: Custom output directory to store results
  • -c, --config: Path to the JSON configuration file that controls how the tool runs on the given APK
python androFL.py -v [android_version] -d [device_name] -i [device_unique_id] -a [apk_path] -p [appium_port] -o [output_folder] -c [configuration]

Examples

  1. Running with Genetic Algorithm using a demo configuration (for testing only):
python androFL.py -v 11 -d device_backup -i emulator-5554 -a ./benchmarks/4.apk -p 4723 -o ./outputs_ga/ -c ./configs/demo_config_ga.json
  2. Running with Random Testing using a demo configuration (for testing only):
python androFL.py -v 11 -d device_backup -i emulator-5554 -a ./benchmarks/4.apk -p 4723 -o ./outputs_random/ -c ./configs/demo_config_random.json

For example, the ranked list of liquid_buggy.apk:

org.eukalyptus.liquidrechner.MainActivity$1.onClick:1
org.eukalyptus.liquidrechner.MainActivity.aromarechner:9
org.eukalyptus.liquidrechner.MainActivity.berechneBasis:9
org.eukalyptus.liquidrechner.MainActivity.hideSoftKeyboard:9
org.eukalyptus.liquidrechner.MainActivity.liquidsMischen:9
org.eukalyptus.liquidrechner.MainActivity.nikotinerhoehen:9
org.eukalyptus.liquidrechner.MainActivity.shakeandvape:9
org.eukalyptus.liquidrechner.MainActivity.verduennen:9
org.eukalyptus.liquidrechner.MainActivity$2.onTouch:9
org.eukalyptus.liquidrechner.ui.liquidsmischen.LiquidsMischenFragment.onCreateView:10
org.eukalyptus.liquidrechner.MainActivity.onCreateOptionsMenu:12
org.eukalyptus.liquidrechner.MainActivity.onSupportNavigateUp:12
org.eukalyptus.liquidrechner.ui.shakeandvape.ShakeandvapeFragment.onCreateView:13
org.eukalyptus.liquidrechner.ui.verduennungsrechner.VerduennungFragment.onCreateView:14
org.eukalyptus.liquidrechner.ui.about.AboutFragment.onCreateView:16
org.eukalyptus.liquidrechner.ui.basisrechner.BasisrechnerFragment.onCreateView:16
org.eukalyptus.liquidrechner.MainActivity.onCreate:20
org.eukalyptus.liquidrechner.MainActivity.setupUI:20
org.eukalyptus.liquidrechner.ui.NikotinErhoehen.NikotinErhoehenFragment.onCreateView:20
org.eukalyptus.liquidrechner.ui.aromarechner.AromarechnerFragment.onCreateView:20
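The ranked list is plain text with one `qualified.method:rank` entry per line, lower ranks being more suspicious. A small parser sketch (assuming exactly that format; not part of AndroFL) could be:

```python
def parse_ranklist(text):
    # Each line is "<qualified method name>:<rank>"; rpartition tolerates
    # '$' and '.' inside the method name. Lower rank = more suspicious.
    entries = []
    for line in text.strip().splitlines():
        name, _, rank = line.rpartition(":")
        entries.append((name, int(rank)))
    return sorted(entries, key=lambda e: e[1])
```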

Output Directory Structure

output_dir/
├── app_name_results_iter/
│   ├── dexFiles/              # Decompiled APK files
│   ├── textFiles/             # Test execution logs
│   ├── results/               # Intermediate results, test suite as a pickle file, and other metadata
│   ├── activation_matrix.txt  # Activation matrix
│   ├── spectrum.txt           # Spectrum of the generated optimized test suite
│   └── ranklist.txt           # Ranked list used by developers to locate the fault in the code

Configuration Templates

1. Genetic Algorithm Configuration using Emulator

The values shown as XXXXX are configurable parameters for the genetic algorithm. Fill in your own values, save the JSON below as user_config.json, and then call androFL.py with -c user_config.json:

python androFL.py -v [android_version] -d [device_name] -i [device_unique_id] -a [apk_path] -p [appium_port] -o [output_folder] -c user_config.json
{
  "device_type": "emulator",
  "algorithm_to_use": "ga",
  "ga": {
    "parameters": {
      "test_case_sequence_length": XXXXX,
      "population_size": XXXXX,
      "selBest": XXXXX,
      "cx_prob": XXXXX,
      "mut_prob": XXXXX,
      "ngen": XXXXX
    },
    "fitness_function": "XXXXX"
  },
  "FL_metric": "XXXXX",
  "desired_capabilities": {
    "platformName": "Android",
    "platformVersion": "XXXXX",
    "deviceName": null,
    "noReset": true,
    "udid": null,
    "apkPath": null,
    "newCommandTimeout": 3600,
    "appWaitForLaunch": true,
    "relaxedSecurity": true,
    "allowInsecure": true,
    "adbExecTimeout": 50000,
    "dex": null,
    "classes": null
  }
}
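Since the XXXXX placeholders make the template above invalid JSON as-is, one convenient way to produce a usable user_config.json is from Python. The parameter values below are purely illustrative placeholders, not recommended settings:

```python
import json

# Illustrative values only; tune the GA parameters for your own experiments.
config = {
    "device_type": "emulator",
    "algorithm_to_use": "ga",
    "ga": {
        "parameters": {
            "test_case_sequence_length": 20,
            "population_size": 10,
            "selBest": 2,
            "cx_prob": 0.8,
            "mut_prob": 0.2,
            "ngen": 5,
        },
        "fitness_function": "Ulysis",
    },
    "FL_metric": "Ochiai",
    "desired_capabilities": {
        "platformName": "Android",
        "platformVersion": "11",
        "deviceName": None,
        "noReset": True,
        "udid": None,
        "apkPath": None,
        "newCommandTimeout": 3600,
        "appWaitForLaunch": True,
        "relaxedSecurity": True,
        "allowInsecure": True,
        "adbExecTimeout": 50000,
        "dex": None,
        "classes": None,
    },
}

with open("user_config.json", "w") as f:
    json.dump(config, f, indent=2)
```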

2. Random Testing Configuration using Emulator

Similarly, the values shown as XXXXX are configurable parameters for random testing. Fill in your own values, save the JSON below as user_config.json, and then call androFL.py with -c user_config.json:

{
  "device_type": "emulator",
  "algorithm_to_use": "random",
  "random": {
    "parameters": {
      "test_case_sequence_length": XXXXX,
      "test_suite_length": XXXXX
    }
  },
  "FL_metric": XXXXX,
  "desired_capabilities": {
    "platformName": "Android",
    "platformVersion": XXXXX,
    "deviceName": null,
    "noReset": true,
    "udid": null,
    "apkPath": null,
    "newCommandTimeout": 3600,
    "appWaitForLaunch": true,
    "relaxedSecurity": true,
    "allowInsecure": true,
    "adbExecTimeout": 50000,
    "dex": null,
    "classes": null
  }
}

Extensibility

AndroFL provides a flexible interface for implementing custom fitness functions for test generation and custom metrics for fault localization.

Custom Fitness Functions

Edit framework/algorithms/ga/user_fitness_functions.py to implement your custom fitness functions:

# user_fitness_functions.py
import numpy as np
from .testsuiteFitness import register_fitness_function

@register_fitness_function("CustomFitness")
def custom_fitness(individual):
    """User-defined fitness function example."""
    matrix = np.array(individual.individual, dtype=int)

    # Custom logic: maximize the number of unique coverage patterns (columns)
    unique_patterns = len(set(tuple(col) for col in matrix.T))
    return unique_patterns / matrix.shape[1]

Available Properties in Individual Object:

  • individual.individual: Activity matrix representing test execution
  • individual.fitness: Current fitness value
  • individual.testcase: List of test cases
  • individual.fitness_valid: Boolean indicating fitness validity
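To smoke-test a fitness function outside the framework, a minimal stand-in for the Individual object is enough. The class below is hypothetical and mirrors only the properties listed above; the fitness logic is a local copy of the example, without the framework decorator:

```python
import numpy as np

class FakeIndividual:
    # Hypothetical stand-in exposing only the properties listed above.
    def __init__(self, matrix):
        self.individual = matrix   # activity matrix
        self.fitness = None
        self.testcase = []
        self.fitness_valid = False

def custom_fitness(ind):
    # Same logic as the example above: ratio of unique column patterns.
    matrix = np.array(ind.individual, dtype=int)
    unique_patterns = len(set(tuple(col) for col in matrix.T))
    return unique_patterns / matrix.shape[1]
```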

Using Custom Fitness Function:

  1. Place your implementation in framework/algorithms/ga/user_fitness_functions.py
  2. Import and register your function using the @register_fitness_function decorator
  3. Use in configuration by specifying the registered name:
{
    "algorithm_to_use": "ga",
    "ga": {
        "fitness_function": "CustomFitness",
        ...
    }
...
}

Custom Fault Localization Metrics

Edit faultlocalizer/user_metrics.py to implement your custom FL metrics:

# user_metrics.py
from faultlocalizer.metrics import register_metric

@register_metric("myCustom_metric")
def custom_metric(metrics_obj):
    """User-defined FL metric example"""
    # Custom logic using metrics_obj counts
    numerator = metrics_obj.Cf
    denominator = metrics_obj.Cf + metrics_obj.Cp + metrics_obj.Nf + metrics_obj.Np

    if denominator == 0:
        return 0
    score = numerator / denominator
    return score

Available Metric Properties:

  • metrics_obj.Cf: Count of failed tests covering the component
  • metrics_obj.Cp: Count of passed tests covering the component
  • metrics_obj.Nf: Count of failed tests not covering the component
  • metrics_obj.Np: Count of passed tests not covering the component
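As a reference point, the built-in Ochiai metric can be written over the same four counts. This is the standard SBFL formula Cf / sqrt((Cf + Nf) * (Cf + Cp)), not code copied from AndroFL's source, and the SpectrumCounts class is a hypothetical stand-in for metrics_obj:

```python
import math

class SpectrumCounts:
    # Hypothetical stand-in exposing the four counts listed above.
    def __init__(self, Cf, Cp, Nf, Np):
        self.Cf, self.Cp, self.Nf, self.Np = Cf, Cp, Nf, Np

def ochiai(m):
    # Ochiai suspiciousness: Cf / sqrt((Cf + Nf) * (Cf + Cp)); 0 when undefined.
    denom = math.sqrt((m.Cf + m.Nf) * (m.Cf + m.Cp))
    return m.Cf / denom if denom else 0.0
```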

Using Custom FL Metrics:

  1. Place your implementation in faultlocalizer/user_metrics.py
  2. Import and register your metric using the @register_metric decorator
  3. Use in configuration or command line:
{
    ...
    "parameters": {
        ...
    },
    "FL_metric": "myCustom_metric",
    ...
}

Built-in Metrics and Functions

Available Fitness Functions:

  • Ulysis: Default fitness function based on ambiguity groups
  • Coverage: Basic coverage-based fitness calculation

Available FL Metrics:

  • Ochiai: Default FL metric
  • Tarantula: Alternative suspiciousness calculation
  • DStar: The D* fault localization metric, plus more than 20 additional metrics

Contributing

Contributions are welcome! Please feel free to submit pull requests or create issues for bugs and feature requests.

License

This work is licensed under the MIT License - see the LICENSE file for details.
