Virtual Device Farms for Mobile App Testing at Scale: A Pursuit for Fidelity, Efficiency, and Accessibility

About this Study
Artifact Release
- Data Format
Failure Discrepancy Report

About this Study

This paper conducts a systematic and rigorous study on emulation-based mobile app testing as to the fidelity of test results, as motivated by its huge advantages and side effects compared to testing apps on physical devices. Leveraging a custom-built virtualized testing infrastructure with its physical counterpart at scale, we identify the key aspects contributing to test result discrepancies to be specific system add-ons and corrupted regional ecosystems, rather than commonly believed factors such as heterogeneous hardware and general customizations. These findings lead to practical solutions that boost the testing fidelity by effectively managing conflicts among stakeholders at both the emulator and app levels. We hope that our infrastructure, experiences, and enhancements will foster a more viable mobile ecosystem by making app testing more accurate, affordable, and scalable.

Artifact Release

We have released the anonymized failure data associated with our paper in this github repository.

The anonymized failure data are collected from our physical and device farms over a three-month period. The failure data involves 5,918 physical devices as well as 5,918 virtualized devices running on ARM commodity servers.

Data Format

The data file is organized in .csv format. Each row represents a single failure scene, and detailed information (i.e. call stacks, device information) about the scenes is provided, in the format described in the table below.

Column	Description	Example
type	A number that labels failure type. Failures that belong to the same type have the same number.	1
error	The triggered exception/signal of the failure	java.lang.NullPointerException
reason	The descriptive message printed after the error	must not be null
stack_frame	The call stack of the failure	[{‘file’: ‘app.java’, ‘method’: ‘badMethod()’, ‘line_number’: ‘10’}]
thread_name	The name of the thread at fault	thread-1
failure_time	The unix timestamp at which the failure occurs, in seconds	1640966505.0
app_id	The id of the failing app. They correspond to Table 1 of our paper.	1
app_version	The version of the app, denoted by the date they are tested in our device farm.	2022-01-01
device_brand	The brand of the failing device. For virtualized devices this is the brand of its physical device pair.	samsung
device_model	The device model of the failing device. The model for our virtualized devices is ‘virt’.	samsung-model-1
android_version	The android version of the device.	10.0

Failure Discrepancy Report

We have reported the root causes and solutions of the failure discrepancies mentioned in our paper to all the corresponding stakeholders, including phone vendors (e.g., Huawei, Honor, and Meizu) and hardware manufacturers (e.g., MediaTek). The complete list of our reported failures is provided below.

Index	Stakeholder	Description	Current State
1	Huawei	Integer overflow during implicit conversions	Confirmed & Fixed
2	Meizu	Improper null-terminations of C/C++ strings in vendor modules	Confirmed & Fixed
3	Honor	Integer overflow during implicit conversions	Confirmed & Fixed
4	Huawei, Xiaomi	Deadlock in system server when accessing local media files	Confirmed & Fixed
5	Smartisan	Incorrect handling of page faults in the FUSE filesystem	Confirmed & Fixed
6	MediaTek	Errors in MediaTek’s GPU drivers	Confirmed
7	Samsung	Array index out of bounds in vendor modules	Confirmed
8	Google	Graphics resource format inconsistency	Confirmed & Fixed

Table of Contents

About this Study

Artifact Release

Data Format

Failure Discrepancy Report