publications | Weifan Jiang

2021

HotNets’21

Towards a Traffic Map of the Internet: Connecting the Dots between Popular Services and Users

Thomas Koch, Weifan Jiang, Tao Luo, Petros Gigis, Yunfan Zhang, Kevin Vermeulen, Emile Aben, Matt Calder, Ethan Katz-Bassett, Lefteris Manassakis, Georgios Smaragdakis, and Narseo Vallina-Rodriguez

In Proceedings of the Twentieth ACM Workshop on Hot Topics in Networks 2021

The impact of Internet phenomena depends on how they impact users, but researchers lack visibility into how to translate Internet events into their impact. Distressingly, the research community seems to have lost hope of obtaining this information without relying on privileged viewpoints. We argue for optimism thanks to new network measurement methods and changes in Internet structure which make it possible to construct an "Internet traffic map". This map would identify the locations of users and major services, the paths between them, and the relative activity levels routed along these paths. We sketch our vision for the map, detail new measurement ideas for map construction, and identify key challenges that the research community should tackle. The realization of an Internet traffic map will be an Internet-scale research effort with Internet-scale impacts that reach far beyond the research community, and so we hope our fellow researchers are excited to join us in addressing this challenge.

IMC’21

Towards Identifying Networks with Internet Clients Using Public Data

Weifan Jiang, Tao Luo, Thomas Koch, Yunfan Zhang, Ethan Katz-Bassett, and Matt Calder

In Proceedings of the 21st ACM Internet Measurement Conference 2021

Does an outage impact any users? Can a geolocation database known to be good at locating users and bad at infrastructure be trusted for a particular prefix? Is a content-heavy network likely to peer with a particular network? For these questions and many more, knowing which prefixes contain Internet users aids in interpreting Internet analysis. However, existing datasets of Internet activity are out of date, unvalidated, based on privileged data, or too coarse. As a step towards identifying which IP prefixes contain users, we present multiple novel techniques to identify which IP prefixes host web clients without relying on privileged data. Our techniques identify client activity in ASes responsible for 98.8% of Microsoft CDN traffic and in prefixes responsible for 95.2% of Microsoft CDN traffic. Less than 1% of prefixes identified by our technique as active do not contact Microsoft at all. We present measurements of Internet usage worldwide and sketch future directions for extending the techniques to measure relative activity levels across prefixes.

Security’21

Cost-Aware Robust Tree Ensembles for Security Applications

Yizheng Chen, Shiqi Wang, Weifan Jiang, Asaf Cidon, and Suman Jana

In 30th USENIX Security Symposium (USENIX Security 21) Aug 2021

There are various costs for attackers to manipulate the features of security classifiers. The costs are asymmetric across features and across the directions of change, which cannot be precisely captured by existing cost models based on Lp-norm robustness. In this paper, we use such domain knowledge to increase the attack cost of evading classifiers, specifically tree ensemble models, which are widely used in security tasks. We propose a new cost modeling method to capture the feature manipulation cost as a constraint, and then we integrate the cost-driven constraint into the node construction process to train robust tree ensembles. During the training process, we use the constraint to find data points that are likely to be perturbed given the feature manipulation cost, and we use a new robust training algorithm to optimize the quality of the trees. Our cost-aware training method can be applied to different types of tree ensembles, including gradient boosted decision trees and random forest models. Using Twitter spam detection as the case study, our evaluation results show that we can increase the attack cost by 10.6× compared to the baseline. Moreover, our robust training method using the cost-driven constraint can achieve higher accuracy, a lower false positive rate, and stronger cost-aware robustness than the state-of-the-art training method using the L∞-norm cost model. Our code is available at https://github.com/surrealyz/growtrees.
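
The abstract describes using the cost constraint during node construction to find data points that are likely to be perturbed. A minimal sketch of that step follows, assuming the cost model reduces to a per-feature, per-direction perturbation budget. `CostBox`, its fields, and `split_sides` are hypothetical names for illustration, not the API of the growtrees repository. For a candidate split `x[feature] <= threshold`, points that an attacker could push across the threshold within budget are flagged as unstable.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class CostBox:
    """Hypothetical per-feature, per-direction perturbation budget."""

    max_increase: Sequence[float]  # how far an attacker can raise each feature
    max_decrease: Sequence[float]  # how far an attacker can lower each feature


def split_sides(
    X: Sequence[Sequence[float]], feature: int, threshold: float, box: CostBox
) -> Tuple[List[int], List[int], List[int]]:
    """Partition sample indices for the split `x[feature] <= threshold`.

    Returns (stable_left, stable_right, unstable), where unstable points are
    those an attacker could move to the other side within the budget.
    """
    up = box.max_increase[feature]
    down = box.max_decrease[feature]
    stable_left: List[int] = []
    stable_right: List[int] = []
    unstable: List[int] = []
    for i, x in enumerate(X):
        v = x[feature]
        if v <= threshold:
            # Left of the split: can the attacker raise it past the threshold?
            if v + up > threshold:
                unstable.append(i)
            else:
                stable_left.append(i)
        else:
            # Right of the split: can the attacker lower it to the threshold?
            if v - down <= threshold:
                unstable.append(i)
            else:
                stable_right.append(i)
    return stable_left, stable_right, unstable


X = [[0.5, 3.0], [1.8, 0.0], [2.5, 1.0], [4.0, 2.0]]
# Feature 0 can only be raised (by up to 1.0); feature 1 cannot be changed.
box = CostBox(max_increase=[1.0, 0.0], max_decrease=[0.0, 0.0])
print(split_sides(X, feature=0, threshold=2.0, box=box))  # ([0], [2, 3], [1])
```

Because feature 0 can be raised but not lowered in this toy budget, the point at 1.8 is unstable while the point at 2.5 stays on its side; a symmetric budget of 1.0 in both directions would flag both. A robust trainer would then score each candidate split under the worst-case placement of the unstable points, a step not shown here.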

2020

IMC’20

Cloud Provider Connectivity in the Flat Internet

Todd Arnold, Jia He, Weifan Jiang, Matt Calder, Italo Cunha, Vasileios Giotsas, and Ethan Katz-Bassett

In Proceedings of the ACM Internet Measurement Conference Aug 2020

The Tier-1 ISPs have been considered the Internet’s backbone since the dawn of the modern Internet 30 years ago, as they guarantee global reachability. However, their influence and importance are waning as Internet flattening decreases the demand for transit services and increases the importance of private interconnections. Conversely, major cloud providers (Amazon, Google, IBM, and Microsoft) are gaining in importance as more services are hosted on their infrastructures. They ardently support Internet flattening and are rapidly expanding their global footprints, which enables them to bypass the Tier-1 ISPs and other large transit providers to reach many destinations. In this paper we seek to quantify the extent to which the cloud providers can bypass the Tier-1 ISPs and other large transit providers. We conduct comprehensive measurements to identify the neighbor networks of the major cloud providers and combine them with AS relationship inferences to model the Internet’s AS-level topology and calculate a new metric, hierarchy-free reachability, which characterizes the reachability a network can achieve without traversing the networks of the Tier-1 and Tier-2 ISPs. We show that the cloud providers are able to reach over 76% of the Internet without traversing the Tier-1 and Tier-2 ISPs, more than virtually every other network.
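
The abstract defines hierarchy-free reachability as the reachability a network achieves without traversing Tier-1 or Tier-2 networks. A minimal sketch of that computation follows, under simplifying assumptions that are not necessarily the paper's: AS relationships are given as provider, customer, and peer sets; routes follow the standard valley-free pattern (zero or more customer-to-provider hops, at most one peer hop, then zero or more provider-to-customer hops); excluded ASes may be destinations but never transit; and reachability is counted as a fraction of ASes rather than of users or address space. The function name and the toy topology are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

UP, DOWN = 0, 1  # UP: may still climb or take one peer hop; DOWN: may only descend


def hierarchy_free_reachability(
    providers: Dict[int, Set[int]],  # AS -> its transit providers
    customers: Dict[int, Set[int]],  # AS -> its customers
    peers: Dict[int, Set[int]],  # AS -> its settlement-free peers
    source: int,
    excluded: Set[int],  # hypothetical Tier-1/Tier-2 set: never used as transit
    all_ases: Iterable[int],
) -> float:
    """Fraction of other ASes reachable from `source` over valley-free paths
    whose intermediate hops avoid `excluded`."""
    seen: Set[Tuple[int, int]] = {(source, UP)}
    reached: Set[int] = set()
    queue = deque([(source, UP)])
    while queue:
        asn, phase = queue.popleft()
        # Descending to a customer is always allowed and locks the path downhill.
        hops = [(c, DOWN) for c in customers.get(asn, set())]
        if phase == UP:
            hops += [(p, UP) for p in providers.get(asn, set())]
            hops += [(p, DOWN) for p in peers.get(asn, set())]
        for nxt, nxt_phase in hops:
            if (nxt, nxt_phase) in seen:
                continue
            seen.add((nxt, nxt_phase))
            reached.add(nxt)
            if nxt not in excluded:  # excluded ASes can be destinations, not transit
                queue.append((nxt, nxt_phase))
    targets = [a for a in all_ases if a != source]
    return sum(1 for a in targets if a in reached) / len(targets)


# Toy topology: AS 1 plays the cloud provider, AS 100 a Tier-1.
providers = {4: {2}, 5: {3}, 6: {5}, 7: {100}, 8: {2, 9}}
customers = {2: {4, 8}, 3: {5}, 5: {6}, 9: {8}, 100: {7}}
peers = {1: {2, 3, 100}, 2: {1}, 3: {1}, 100: {1}}
ases = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(hierarchy_free_reachability(providers, customers, peers, 1, {100}, ases))  # 7 of 9
```

On this toy topology the source reaches 7 of the 9 other ASes: AS 7 is reachable only through the excluded AS 100, and AS 9 would require a valley (descending to AS 8 and then climbing to its provider). Tracking the routing phase alongside each AS lets the search explore an AS in the uphill phase even if it was first reached in the downhill phase.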

N2Women’20

Poster: Footprint and Performance of Large Cloud Networks

Jia He, Weifan Jiang, Ege Gürmeriçliler, Georgia Essig, Arpit Gupta, Matt Calder, Vasileios Giotsas, Italo Cunha, Ethan Katz-Bassett, and Todd Arnold

In ACM SIGCOMM 2020 Networking Networking Women Professional Development Workshop Aug 2020

The Tier-1 and Tier-2 transit providers have historically been considered the backbone of the Internet as they guarantee global reachability. In recent years, Internet flattening has reduced the need for transit providers, a trend driven in large part by the top cloud providers, such as Google, Amazon, Microsoft, and IBM. Recently, these cloud providers started offering two performance tiers for routing traffic. One tier, which Google calls "Premium Tier" (PT), uses the cloud provider’s private network as much as possible, while the other, "Standard Tier" (ST), uses the public Internet as much as possible. Through analysis of performance and connectivity measurements, we find that the cloud provider networks’ points-of-presence (PoPs) tend to be deployed closer to population centers than the transit providers’ PoPs. We also find that the performance improvement from PT service depends on variables such as the difference in path length between the ST and PT paths. These metrics demonstrate how cloud providers connect within the Internet, and what benefits their private networks provide to users.

GHTC’20

Irrigation Detection by Car: Computer Vision and Sensing for the Detection and Geolocation of Irrigated and Non-irrigated Farmland

Weifan Jiang, Vivek Kumar, Nikhil Mehta, Jack Bott, and Vijay Modi

In 2020 IEEE Global Humanitarian Technology Conference (GHTC) Aug 2020

Irrigation can greatly increase the income of smallholder farmers in sub-Saharan Africa. By providing information about current irrigation utilization, or lack thereof, we seek to encourage investment in irrigation systems and their supporting infrastructure. In this paper, we describe the design, prototyping, and testing of a novel, cost-effective, and reliable computer vision system that is capable of locating irrigated plots at scale. The system is designed to be mounted on a vehicle and to record the depth of objects in the camera’s view while the vehicle is in motion. The GPS coordinates of objects are computed from the estimated depth together with the vehicle's coordinates and orientation, which are available from the included sensors. We tested our prototype on objects at various distances from the system and achieved feasible accuracy with acceptable error in the estimated depth. In the future, we hope to deploy the system in parts of sub-Saharan Africa to detect and geolocate irrigated agricultural plots during the dry season. We then plan to use the collected data to inform and train machine learning models that use remote sensing and satellite imagery.
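
The abstract states that object coordinates are computed from estimated depth and the vehicle's coordinates and orientation. A minimal sketch of one way to do this follows, assuming a level pinhole camera with known horizontal focal length `fx` and principal point column `cx` (in pixels), a fixed camera yaw relative to the vehicle heading, and a spherical Earth. The function and parameter names are hypothetical, and this is not the paper's implementation. It converts the depth and pixel column into a ground range and bearing, then applies the great-circle destination-point formula.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical-Earth assumption)


def object_position(
    lat_deg: float,
    lon_deg: float,
    heading_deg: float,  # vehicle heading, degrees clockwise from true north
    depth_m: float,  # estimated depth along the camera's optical axis
    pixel_u: float,  # image column of the detected object
    cx: float,  # hypothetical calibration: principal point column (pixels)
    fx: float,  # hypothetical calibration: horizontal focal length (pixels)
    mount_yaw_deg: float = 0.0,  # camera yaw relative to the vehicle heading
) -> Tuple[float, float]:
    """Return (lat, lon) in degrees of an object seen by a level pinhole camera."""
    # Back-project the pixel: lateral offset from the optical axis at this depth.
    lateral_m = (pixel_u - cx) * depth_m / fx
    ground_range_m = math.hypot(depth_m, lateral_m)
    bearing = math.radians(heading_deg + mount_yaw_deg) + math.atan2(lateral_m, depth_m)

    # Great-circle destination point from (lat, lon), bearing, and distance.
    lat1, lon1 = math.radians(lat_deg), math.radians(lon_deg)
    delta = ground_range_m / EARTH_RADIUS_M  # angular distance in radians
    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    lon2 = (lon2 + 3 * math.pi) % (2 * math.pi) - math.pi  # wrap to [-180, 180)
    return math.degrees(lat2), math.degrees(lon2)


# Hypothetical usage: vehicle at the equator heading east, object 10 m ahead, centred.
print(object_position(0.0, 0.0, heading_deg=90.0, depth_m=10.0, pixel_u=640.0, cx=640.0, fx=800.0))
```

Over the short ranges a vehicle-mounted camera sees, the spherical-Earth approximation adds negligible error compared with depth estimation; the heading estimate matters more, since each degree of heading error displaces an object 20 m away by about 0.35 m sideways.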