
License: GPL v3


About this Study

This paper presents a systematic and rigorous study of emulation-based mobile app testing, focusing on the fidelity of its test results, motivated by its substantial advantages over, and subtle deviations from, testing apps on physical devices. Leveraging a custom-built virtualized testing infrastructure paired with its physical counterpart at scale, we identify the key sources of test result discrepancies to be specific system add-ons and corrupted regional ecosystems, rather than commonly suspected factors such as heterogeneous hardware and general customizations. These findings lead to practical solutions that improve testing fidelity by effectively managing conflicts among stakeholders at both the emulator and app levels. We hope that our infrastructure, experiences, and enhancements will foster a more viable mobile ecosystem by making app testing more accurate, affordable, and scalable.

Artifact Release

We have released the anonymized failure data associated with our paper in this GitHub repository.

The anonymized failure data were collected from our physical and virtualized device farms over a three-month period. They cover 5,918 physical devices and 5,918 virtualized devices running on ARM commodity servers.

Data Format

The data file is in .csv format. Each row represents a single failure scene, with detailed information about the scene (e.g., call stack, device information) provided in the columns described in the table below.

| Column | Description | Example |
| --- | --- | --- |
| type | A number labeling the failure type; failures of the same type share the same number. | 1 |
| error | The exception/signal triggered by the failure. | java.lang.NullPointerException |
| reason | The descriptive message printed after the error. | must not be null |
| stack_frame | The call stack of the failure. | [{'file': 'app.java', 'method': 'badMethod()', 'line_number': '10'}] |
| thread_name | The name of the thread at fault. | thread-1 |
| failure_time | The Unix timestamp at which the failure occurred, in seconds. | 1640966505.0 |
| app_id | The ID of the failing app, corresponding to Table 1 of our paper. | 1 |
| app_version | The version of the app, denoted by the date it was tested in our device farm. | 2022-01-01 |
| device_brand | The brand of the failing device; for virtualized devices, the brand of its physical device pair. | samsung |
| device_model | The device model of the failing device; 'virt' for our virtualized devices. | samsung-model-1 |
| android_version | The Android version of the device. | 10.0 |
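As a minimal sketch of how the released data might be consumed, the snippet below loads the CSV and parses the stack_frame column (which stores a Python-literal list of frame dicts, as in the example above). The filename `failures.csv` and the helper names are assumptions for illustration; the actual file name in the repository may differ.

```python
import ast
import csv

def load_failures(path):
    """Load failure scenes from the released .csv data.

    Each row becomes a dict keyed by column name; the stack_frame
    field is parsed from its literal representation into a list of
    frame dicts, and failure_time is converted to a float timestamp.
    """
    failures = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            row["stack_frame"] = ast.literal_eval(row["stack_frame"])
            row["failure_time"] = float(row["failure_time"])
            failures.append(row)
    return failures

def count_by_type(failures):
    """Count failure scenes per numeric type label."""
    counts = {}
    for failure in failures:
        counts[failure["type"]] = counts.get(failure["type"], 0) + 1
    return counts

if __name__ == "__main__":
    # "failures.csv" is a hypothetical filename for the released data file.
    failures = load_failures("failures.csv")
    print(count_by_type(failures))
```

Grouping by the type column is useful for comparing failure distributions between the physical and virtualized device populations (via the device_model column, where virtualized devices are marked 'virt').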

Failure Discrepancy Report

We have reported the root causes and solutions of the failure discrepancies mentioned in our paper to all the corresponding stakeholders, including phone vendors (e.g., Huawei, Honor, and Meizu) and hardware manufacturers (e.g., MediaTek). The complete list of our reported failures is provided below.

| Index | Stakeholder | Description | Current State |
| --- | --- | --- | --- |
| 1 | Huawei | Integer overflow during implicit conversions | Confirmed & Fixed |
| 2 | Meizu | Improper null-terminations of C/C++ strings in vendor modules | Confirmed & Fixed |
| 3 | Honor | Integer overflow during implicit conversions | Confirmed & Fixed |
| 4 | Huawei, Xiaomi | Deadlock in system server when accessing local media files | Confirmed & Fixed |
| 5 | Smartisan | Incorrect handling of page faults in the FUSE filesystem | Confirmed & Fixed |
| 6 | MediaTek | Errors in MediaTek's GPU drivers | Confirmed |
| 7 | Samsung | Array index out of bounds in vendor modules | Confirmed |
| 8 | Google | Graphics resource format inconsistency | Confirmed & Fixed |