
Toronto Systems Workshop
October 31st, 2019
Bahen Centre for Information Technology
40 St. George Street, Suite 5205

This is a free event open to all; however, please register to help us plan.

8:30-9:00am Breakfast
9:00-10:30am
Are We Susceptible to Rowhammer? An End-to-End Methodology for Cloud Providers
Stefan Saroiu (Microsoft Research)

Cloud providers are nervous about recent research showing how Rowhammer attacks affect many types of DRAM including DDR4 and ECC-equipped DRAM. Unfortunately, cloud providers lack a systematic way to test the DRAM present in their servers for the threat of a Rowhammer attack. Building such a methodology needs to overcome two difficult challenges: (1) devising a CPU instruction sequence that maximizes the rate of DRAM row activations on a given system, and (2) determining the adjacency of rows internal to DRAM. This talk will present an end-to-end methodology that overcomes these challenges to determine if cloud servers are susceptible to Rowhammer attacks. With our methodology, a cloud provider can construct worst-case testing conditions for DRAM.

We used our methodology to create worst-case DRAM testing conditions on the hardware used by a major cloud provider for a recent generation of its servers. Our findings show that none of the instruction sequences used in prior work to mount Rowhammer attacks create worst-case DRAM testing conditions. Instead, we construct an instruction sequence that issues non-explicit load and store instructions. Our new sequence leverages microarchitectural side-effects to "hammer" DRAM at a near-optimal rate on modern Skylake platforms. We also designed a DDR4 fault injector capable of reverse engineering row adjacency inside a DRAM device. When applied to our cloud provider's DIMMs, we find that rows inside DDR4 DRAM devices do not always follow a linear map.

Joint work with Lucian Cojocar (VU Amsterdam), Jeremie Kim, Minesh Patel, Onur Mutlu (ETH Zurich), Lily Tsai (MIT), and Alec Wolman (MSR).

Stefan Saroiu is a researcher in the Mobility and Networking Research group at Microsoft Research (MSR) in Redmond. Stefan's research interests span many aspects of systems and networks although his most recent work focuses on systems security. Stefan takes his work beyond publishing results. With his colleagues at MSR, he designed and built (1) the reference implementation of a software-based Trusted Platform Module (TPM) used in millions of smartphones and tablets, and (2) Microsoft Embedded Social, a cloud service aimed at user engagement in mobile apps that has 20 million users. Before joining MSR in 2008, Stefan spent three years as an Assistant Professor at the University of Toronto, and four months at Amazon.com as a visiting researcher where he worked on the early designs of their new shopping cart system (aka Dynamo). Stefan is an ACM Distinguished Member.

BurScale: Using Burstable Instances for Cost-Effective Autoscaling in the Public Cloud
Timothy Zhu (Penn State)

Cloud providers have recently introduced burstable instances: virtual machines whose CPU capacity is rate-limited by token-bucket mechanisms. A user of a burstable instance can burst to a much higher resource capacity ("peak rate") than the instance's long-term average capacity ("sustained rate"), provided the bursts are short and infrequent. A burstable instance tends to be much cheaper than a conventional instance that is always provisioned for the peak rate. Consequently, cloud providers advertise burstable instances as cost-effective options for customers with intermittent needs and small (e.g., single-VM) clusters. By contrast, this talk presents two novel usage scenarios for burstable instances in larger clusters with sustained usage. We demonstrate (i) how burstable instances can be used alongside conventional instances to handle the transient queueing that arises from variability in traffic, and (ii) how burstable instances can mask VM startup/warmup time when autoscaling to handle flash crowds. We implement our ideas in a system called BurScale and use it to demonstrate cost-effective autoscaling for two important workloads: (i) a stateless web server cluster, and (ii) a stateful Memcached caching cluster. Results from our prototype show that, by carefully combining burstable and regular instances, BurScale delivers application performance similar to that of traditional autoscaling systems that use only regular instances, while reducing cost by up to 50%.
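To make the "peak rate" versus "sustained rate" distinction concrete, here is a minimal token-bucket sketch of a burstable instance's CPU. It is an illustration under stated assumptions, not BurScale's code and not any provider's actual CPU-credit rules: the class name `BurstableCPU` and its parameters (`sustained`, `peak`, `bucket_size`) are hypothetical, credits accrue at the sustained rate up to the bucket size, and every unit of CPU delivered spends a credit.

```python
class BurstableCPU:
    """Hypothetical token-bucket model of a burstable VM's CPU capacity."""

    def __init__(self, sustained, peak, bucket_size, tokens=0.0):
        self.sustained = sustained      # credits earned per second (cores)
        self.peak = peak                # maximum cores the VM may use at once
        self.bucket_size = bucket_size  # cap on banked credits (core-seconds)
        self.tokens = tokens            # credits currently in the bucket

    def run(self, demand, dt=1.0):
        """Serve `demand` cores for `dt` seconds; return the cores delivered."""
        # Credits accrue at the sustained rate, but the bucket never overflows.
        self.tokens = min(self.bucket_size, self.tokens + self.sustained * dt)
        # Delivery is limited by demand, the peak rate, and available credits.
        delivered = min(demand, self.peak, self.tokens / dt)
        self.tokens -= delivered * dt
        return delivered


# A VM with a full bucket serves a sustained overload at the peak rate until
# its credits run out, then falls back to the sustained rate.
cpu = BurstableCPU(sustained=0.2, peak=1.0, bucket_size=3.0, tokens=3.0)
trace = [cpu.run(1.0) for _ in range(6)]
```

In this toy model a short burst is served at full speed, which is the property the abstract relies on to absorb transient queueing and to cover VM warmup; a long overload drains the bucket and caps the instance at its sustained rate.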

Timothy Zhu is an assistant professor in computer science and engineering at Penn State. He received both his B.S. and Ph.D. in Computer Science from Carnegie Mellon University, where his research was in the broad area of performance and resource management of cloud computing systems. Prior to starting his Ph.D., Timothy worked on graphics device driver development at NVIDIA, a leading visual computing technology company. He is the recipient of a National Science Foundation Graduate Research Fellowship and a EuroSys best paper award for TetriSched. He now co-leads the Computer Systems Lab (CSL) at Penn State, where he advises students on cloud computing research.

10:30-11:00am Break
11:00am-12:30pm
Tesseract: Fast Scalable Pattern Mining on Evolving Graphs
Laurent Bindschaedler (EPFL)

Extracting insight from highly dynamic, graph-structured data has wide-ranging applications in areas such as chemistry, networking, finance, semantic web, and social networks. Graph pattern mining algorithms help discover interesting structural patterns embedded within these graphs. Mining evolving graphs is challenging today because most existing graph mining systems are designed for static graphs and recomputing patterns from scratch after a small subset of the graph has changed is prohibitively expensive. Similarly, current solutions for mining dynamic graphs are specialized for a subclass of problems in which patterns can be expressed as a fixed query.

In this talk, I will present a novel update-driven approach to building general-purpose, high-throughput, continuous graph pattern mining systems. First, I will demonstrate why supporting evolving graphs requires an approach that differs from existing static mining systems. I will describe a localized exploration algorithm that incrementally computes the pattern instances affected by each update. I will then introduce a novel canonicalization method that filters out duplicate subgraphs found during exploration in a coordination-free manner, which makes exploration easy to parallelize and distribute. Finally, I will discuss how our approach combines with a pattern pruner API that lets programmers leverage domain expertise to prune early and speed up computation.
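The idea of update-driven, localized exploration with coordination-free duplicate filtering can be illustrated on the simplest pattern, triangles. The sketch below is not Tesseract's algorithm or canonicalization method; it swaps in a simple ownership rule as an assumption: each edge insertion gets a unique timestamp, each update explores only the triangles containing its own edge, and a triangle is emitted only by the update whose edge is the newest in it. Explorers therefore never report the same triangle twice without talking to each other. All function names are hypothetical, and the sketch assumes every explorer can see the whole batch of inserted edges.

```python
from collections import defaultdict


def edge_key(a, b):
    """Represent an undirected edge as an ordered pair."""
    return (a, b) if a < b else (b, a)


def apply_updates(adj, stamp, updates, first_stamp=0):
    """Insert a batch of edges, giving each a unique, increasing timestamp."""
    for i, (a, b) in enumerate(updates):
        adj[a].add(b)
        adj[b].add(a)
        stamp[edge_key(a, b)] = first_stamp + i


def explore_update(adj, stamp, a, b):
    """Local exploration from one updated edge: find triangles containing it.

    Hypothetical duplicate filter: only the update whose edge is the newest
    edge of the triangle reports it, so no coordination is needed.
    """
    mine = stamp[edge_key(a, b)]
    found = []
    for c in adj[a] & adj[b]:  # a common neighbour closes a triangle
        edges = (edge_key(a, b), edge_key(a, c), edge_key(b, c))
        if max(stamp[e] for e in edges) == mine:
            found.append(tuple(sorted((a, b, c))))
    return found


adj, stamp = defaultdict(set), {}
updates = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
apply_updates(adj, stamp, updates)
# Each update could be explored by an independent worker.
results = [t for a, b in updates for t in explore_update(adj, stamp, a, b)]
print(results)  # [(1, 2, 3), (2, 3, 4)]
```

Because each worker's decision depends only on the timestamps of the edges it inspects, the per-update explorations can run in any order or on different machines and still produce each new triangle exactly once.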

Laurent Bindschaedler is a Ph.D. candidate in Computer Science at the Swiss Federal Institute of Technology in Lausanne (EPFL). He is part of the Operating Systems Lab (LABOS), headed by Prof. Willy Zwaenepoel. Laurent is particularly interested in designing and implementing large-scale, high-performance, resource-efficient computer systems for big data analytics and cloud computing. He built the Chaos graph processing system, which currently holds a record for the largest graph processed on a small cluster of commodity servers. As part of his research, he collaborated with companies such as Nokia and Nutanix. Before starting his Ph.D., Laurent co-founded LakeMind, a cloud service troubleshooting company, in 2012. He earned a BSc and an MSc from EPFL in 2010 and 2012, respectively.

Re-architect Distributed DNN Training with Heterogeneous GPU and CPU Resources
Chuanxiong Guo/Yibo Zhu (ByteDance)

DNN training usually relies on data parallelism for distributed training, using either all-reduce or a Parameter Server (PS). A common belief is that all-reduce performs best for synchronous training, while PS is better suited to asynchronous training. In this talk, we show that neither all-reduce nor PS is optimal in a heterogeneous GPU and CPU environment, by decomposing the training process into multiple independent, yet connected, components. This decomposition guides the design of the Summation Server (SS) abstraction and its implementation, BytePS, in which model optimization is split into data summation on CPUs and parameter updates on GPUs. BytePS is truly cross-framework because the server CPUs are left with only simple, primitive summation operations. BytePS is also generic in that it supports both synchronous and asynchronous training. Evaluation shows that SS achieves up to 149% speedup compared with all-reduce, or up to 67% speedup and 18.75x cost saving compared with PS (with state-of-the-art optimizations). We have open-sourced BytePS (https://github.com/bytedance/byteps). It has been deployed internally and is actively being tried out by many external users and developers.
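The split the abstract describes, with servers that only sum and workers that run the optimizer, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the BytePS API: `server_sum`, `sharded_sum`, and `worker_step` are hypothetical names, gradients are plain Python lists, the vector is partitioned into contiguous shards across servers, and the worker-side update is plain SGD on the mean gradient.

```python
def server_sum(pushed):
    """Summation server: the CPU side only adds the gradient vectors it receives."""
    total = [0.0] * len(pushed[0])
    for grad in pushed:
        for i, g in enumerate(grad):
            total[i] += g
    return total


def sharded_sum(pushed, n_servers):
    """Split the vector into contiguous shards; each hypothetical server sums one."""
    n = len(pushed[0])
    bounds = [n * s // n_servers for s in range(n_servers + 1)]
    summed = []
    for lo, hi in zip(bounds, bounds[1:]):
        summed += server_sum([grad[lo:hi] for grad in pushed])
    return summed


def worker_step(params, summed, n_workers, lr):
    """Optimizer step kept on the worker (the GPU side): SGD on the mean gradient."""
    return [p - lr * s / n_workers for p, s in zip(params, summed)]


# Two workers push gradients; two servers each sum one shard; workers update.
grads = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
summed = sharded_sum(grads, n_servers=2)
new_params = worker_step([1.0, 1.0, 1.0], summed, n_workers=2, lr=0.5)
```

Keeping the optimizer out of the servers is what makes the server side framework-agnostic in this sketch: the servers never need to know which optimizer, model, or framework produced the gradients.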

Yibo is a Senior Research Scientist at the AI Lab, ByteDance Inc. Before joining ByteDance, he received his PhD from UC Santa Barbara in 2016 and worked for two years as a Researcher at Microsoft Research, Redmond. Yibo's research focuses mainly on distributed machine learning systems and datacenter networks. He designs scalable software systems that leverage heterogeneous hardware such as GPUs, RDMA NICs, and programmable ASICs. Examples include BytePS (https://github.com/bytedance/byteps), a distributed DL framework; communication scheduling for DL; cluster scheduling for DL; and large-scale RDMA networks. Yibo is a recipient of the Microsoft Research Fellowship (2015) and the MSR Redmond Labs Exemplary Collaboration Award (2017), and has authored 10+ SOSP/NSDI/SIGCOMM/MobiCom papers.

12:30-2:00pm Lunch
2:00-3:30pm
Cluster Storage Systems Gotta Have HeART
Saurabh Kadekodi (CMU)

Large-scale cluster storage systems typically consist of a heterogeneous mix of storage devices with significantly varying failure rates. Despite such differences among devices, redundancy settings are generally configured in a one-scheme-for-all fashion. In this talk, we make a case for exploiting reliability heterogeneity to tailor redundancy settings to different device groups. We present HeART, an online tuning tool that guides the selection of, and transitions between, redundancy settings for long-term data reliability, based on the observed reliability properties of each disk group. By processing disk failure data over time, HeART identifies the boundaries and steady-state failure rate for each deployed disk group (e.g., by make/model). Using this information, HeART suggests the most space-efficient redundancy option that achieves the specified target data reliability, using far fewer disks than one-scheme-for-all approaches.
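The core decision, picking the most space-efficient redundancy scheme that still meets a reliability target for a disk group's observed failure rate, can be sketched as follows. HeART's actual reliability model is not described here, so this swaps in a deliberately crude one as an assumption: k-of-n erasure-coded stripes, independent disk failures, a per-repair-window failure probability obtained by spreading the annual failure rate (AFR) uniformly over a year, and data loss when more than n - k disks in a stripe fail within one window. The function names, AFR values, and candidate schemes are hypothetical.

```python
from math import comb

HOURS_PER_YEAR = 365 * 24


def loss_probability(n, k, p):
    """P(more than n - k of the n disks in a stripe fail in one repair window),
    assuming independent failures with per-window probability p."""
    tolerated = n - k
    return sum(comb(n, f) * p**f * (1 - p) ** (n - f)
               for f in range(tolerated + 1, n + 1))


def pick_scheme(afr, repair_hours, target, candidates):
    """Return the (n, k) code with the lowest storage overhead n/k whose
    loss probability meets `target`, or None if no candidate does."""
    p = afr * repair_hours / HOURS_PER_YEAR  # crude: AFR spread evenly over a year
    feasible = [(n / k, n, k) for n, k in candidates
                if loss_probability(n, k, p) <= target]
    if not feasible:
        return None
    _, n, k = min(feasible)
    return n, k


candidates = [(12, 10), (13, 10), (14, 10), (15, 10)]
reliable_group = pick_scheme(0.02, 24, 1e-12, candidates)  # low observed AFR
flaky_group = pick_scheme(0.2, 24, 1e-12, candidates)      # 10x higher AFR
```

Under this toy model, the disk group with the lower observed AFR can use a cheaper code, (13, 10), while the flakier group needs (14, 10) to meet the same target; tailoring per group in this way is what saves space relative to configuring every group for the worst one.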

Saurabh Kadekodi is a PhD student at Carnegie Mellon University affiliated with the Parallel Data Lab. He is advised by Prof. Greg Ganger and Prof. Rashmi Vinayak. His interests span both local and distributed storage systems, with a recent focus on the reliability of large-scale storage systems.

Alleviating Garbage Collection Interference through Spatial Separation in All Flash Arrays
Sam Noh (UNIST)

We are in the midst of a dramatic change in what computer systems look like. Our traditional view of a computer system composed of a CPU, main memory, and a very slow storage device has recently been challenged by the advent of SSDs. Today, with the arrival of persistent memory (PM), which has characteristics of both DRAM and storage, we may be anticipating an even more dramatic change in what future computers will look like. In this talk, I will share experiences and results from work that we, at the NECSST (Next-generation Embedded/Computer System Software Technology) lab at UNIST, have been conducting regarding these changes.

In particular, I will present SWAN, a novel All Flash Array (AFA) management scheme. Recent flash SSDs provide high I/O bandwidth (e.g., 3-10 GB/s), so aggregating just a few SSDs can easily surpass the network bandwidth. However, unlocking the full performance of SSDs is still challenging, and the main source of performance degradation is garbage collection (GC). We find that existing AFA designs are susceptible to GC at both the SSD level and the AFA software level. In designing SWAN, we aim to alleviate the performance interference caused by GC at both levels. Unlike the commonly used temporal separation approach, which performs GC at idle time, we take a spatial separation approach that partitions SSDs into front-end SSDs dedicated to serving write requests and back-end SSDs where GC is performed. Compared to temporal separation of GC and application I/O, which is hard to control from AFA software, our approach guarantees that the storage bandwidth always matches the full network performance without interference from AFA-level GC. We provide extensive evaluations showing that SWAN is effective for a variety of workloads.
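A toy model can show what spatial separation means in practice: one group of SSDs absorbs all writes while garbage collection runs only on the other groups, and the roles rotate when the front-end fills up. The sketch below is an illustration under stated assumptions, not SWAN's design: the class `SpatialSeparationArray` is hypothetical, each SSD group is modeled as an append-only log of logical block addresses (LBAs), GC compacts a back-end group in place by keeping only live blocks, and the emptiest cleaned back-end group becomes the next front-end.

```python
class SpatialSeparationArray:
    """Toy model: one front-end group takes all writes; GC runs only on back-end groups."""

    def __init__(self, n_groups, slots_per_group):
        self.slots = slots_per_group
        self.groups = [[] for _ in range(n_groups)]  # append-only logs of LBAs
        self.where = {}       # LBA -> (group, slot) of its live copy
        self.front = 0        # index of the current front-end group
        self.gc_history = []  # (collected group, front-end at the time)

    def write(self, lba):
        if len(self.groups[self.front]) == self.slots:
            self._switch_front_end()
        log = self.groups[self.front]
        self.where[lba] = (self.front, len(log))  # older copies become garbage
        log.append(lba)

    def _switch_front_end(self):
        backs = [g for g in range(len(self.groups)) if g != self.front]
        for g in backs:  # GC never touches the group serving writes
            self._collect(g)
        # Promote the emptiest cleaned back-end group to be the new front-end.
        self.front = min(backs, key=lambda g: len(self.groups[g]))
        if len(self.groups[self.front]) == self.slots:
            raise RuntimeError("array is full of live data")

    def _collect(self, g):
        self.gc_history.append((g, self.front))
        live = [lba for slot, lba in enumerate(self.groups[g])
                if self.where.get(lba) == (g, slot)]
        self.groups[g] = live
        for slot, lba in enumerate(live):
            self.where[lba] = (g, slot)


arr = SpatialSeparationArray(n_groups=3, slots_per_group=4)
for i in range(40):
    arr.write(i % 5)  # overwrite a small working set repeatedly
```

In a real array the back-end GC would run concurrently with front-end writes; the sequential model only captures the invariant that matters here, namely that the group serving writes is never the one being collected.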

Sam H. (Hyuk) Noh received the BS degree in computer engineering from Seoul National University, Seoul, Korea, in 1986, and the PhD degree from the Department of Computer Science, University of Maryland, College Park, MD, in 1993. He held a visiting faculty position at the George Washington University, Washington, DC, from 1993 to 1994 before joining Hongik University, Seoul, Korea, where he was a professor in the School of Computer and Information Engineering until the spring of 2015. In the fall of 2015 he joined UNIST (Ulsan National Institute of Science and Technology), a young science- and technology-focused national university, where he is a Professor (and former Dean, 2016-2018) of the School of Electrical and Computer Engineering. From August 2001 to August 2002, he was also a visiting associate professor with the University of Maryland Institute for Advanced Computer Studies (UMIACS), College Park, MD. He has served as General Chair, Program Chair, Steering Committee Member, and Program Committee Member for a number of technical conferences and workshops, and is currently serving as Program Co-Chair for FAST '20 and as Editor-in-Chief of ACM Transactions on Storage. His current research interests include operating system issues pertaining to embedded/computer systems, with a focus on the use of new memory technologies such as flash memory and persistent memory. He is a Distinguished Member of the ACM, a Senior Member of the IEEE, and a member of USENIX and KIISE.

3:30-3:45pm Break
3:45-5:15pm
Glass: A New Media for a New Era?
Ioan Stefanovici (Microsoft Research Cambridge)

The rapid expansion of cloud computing is driving unprecedented demand for long-term data storage. By 2020, zettabytes of data are expected to be stored in the cloud. All of the storage technologies in use today were designed before the cloud, to support smaller-scale consumer and enterprise usage scenarios, and they are now reaching the limits of what is possible. In particular, magnetic media face physical limits on capacity scaling, as well as fundamental limits on the cost of storing long-lived data. In Project Silica, we are leveraging recent discoveries in ultrafast laser optics to store data in quartz glass using femtosecond lasers, and building a completely new storage system designed from scratch around this technology. In this talk, I will describe how data is stored in glass, as well as some of the exciting opportunities we are pursuing to challenge and completely rethink traditional storage system design, in order to build a new storage technology whose hardware and software are co-designed solely for the cloud.

Ioan Stefanovici is a Senior Researcher in the Cloud Infrastructure group at Microsoft Research Cambridge. His research focuses mainly on novel storage technologies and data center systems, and recently on using machine learning to solve hard systems challenges. He received his PhD from the University of Toronto in 2016, where his research focused on improving system reliability, controllability, and programmability, as well as reducing the impact of large-scale systems on the environment.

In-situ Data Processing and Zoned Storage
George Amvrosiadis (CMU)

How do we ingest data from millions of sources, analyze it, and create denser storage devices to cram it in? It will take one paradigm shift in the way we design file systems, and another in the way we design storage devices. This talk is about those paradigm shifts: DeltaFS and Zoned Storage.

DeltaFS is a new distributed file system that unlocks orders of magnitude higher metadata performance by rethinking cross-process namespace synchronization. I will present another property of DeltaFS: enabling three orders of magnitude faster data queries by producing data indexes in-situ, i.e., while the data is written to storage.
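The principle behind in-situ indexing, building the index while data is being written so that later queries can jump straight to matching records instead of scanning everything, can be shown with a small single-process sketch. This is not DeltaFS's implementation: the class `InSituIndexedLog` is hypothetical, a `bytearray` stands in for the storage-side log, and the index is an in-memory map from key to `(offset, length)` pairs.

```python
class InSituIndexedLog:
    """Hypothetical append-only log whose index is built at write time."""

    def __init__(self):
        self.data = bytearray()  # stands in for the data written to storage
        self.index = {}          # key -> list of (offset, length) pairs

    def append(self, key, record):
        offset = len(self.data)
        self.data += record
        # Index maintenance happens on the write path, not in a later pass.
        self.index.setdefault(key, []).append((offset, len(record)))

    def query(self, key):
        """Read only the records for `key`; no full scan of the log."""
        return [bytes(self.data[o:o + n]) for o, n in self.index.get(key, [])]


log = InSituIndexedLog()
log.append("particle-1", b"aa")
log.append("particle-2", b"bbb")
log.append("particle-1", b"c")
matches = log.query("particle-1")
```

The trade-off the sketch makes visible is that a little extra work on every write buys queries whose cost depends on the number of matching records rather than on the total amount of data written.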

Zoned storage devices change the way we view hard disk and solid-state storage. They enable denser hard disks, and solid-state drives with almost no write amplification and predictable performance. But zoned storage also demands that we replace the venerable block interface with a new zoned interface. I will introduce this new interface, as well as a backend we have developed to support zoned devices without requiring modifications to existing key-value stores.
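The main difference from the block interface is that a sequential-write-required zone can only be written at its write pointer and must be reset before its space is reused. The sketch below models those semantics in plain Python to show what software on top of a zoned device has to respect. The class `Zone` and its methods are hypothetical and are not any device or library API; `append` loosely mirrors the idea of a zone append, where the device rather than the host picks the write location and reports it back.

```python
class Zone:
    """Hypothetical model of one sequential-write-required zone."""

    def __init__(self, start, capacity):
        self.start = start        # first LBA of the zone
        self.capacity = capacity  # number of writable blocks
        self.wp = start           # write pointer: next LBA that may be written

    def write(self, lba, nblocks):
        """Host-addressed write: must land exactly at the write pointer."""
        if lba != self.wp:
            raise ValueError("writes must start at the zone's write pointer")
        if self.wp + nblocks > self.start + self.capacity:
            raise ValueError("write would exceed the zone's capacity")
        self.wp += nblocks

    def append(self, nblocks):
        """Append-style write: the zone chooses the LBA and returns it."""
        lba = self.wp
        self.write(lba, nblocks)
        return lba

    def reset(self):
        """Discard the zone's contents so it can be rewritten from the start."""
        self.wp = self.start


zone = Zone(start=0, capacity=8)
zone.write(0, 4)              # OK: starts at the write pointer
placed_at = zone.append(2)    # returns 4, the LBA the data landed at
```

Because in-place overwrites are impossible, a key-value store or file system on a zoned device must write log-structured data and reclaim space by relocating live data and resetting whole zones, which is why a backend layer is needed to keep existing stores unmodified.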

George is an Assistant Research Professor of Electrical and Computer Engineering at Carnegie Mellon University, and a member of the Parallel Data Lab. His research focuses on distributed storage and data analytics, with an emphasis on high performance computing and machine learning. He co-teaches courses on cloud computing and storage systems, and holds a Ph.D. from the University of Toronto.

6:00pm Dinner