CSL seminars - Winter 2013

Location and time: BA5256, Friday 12:30-2pm





Feb 1

Madalin Mihailescu

MixApart: Decoupled Analytics for Shared Storage Systems

We present MixApart, a scalable data processing framework for shared enterprise storage systems. With MixApart, a single consolidated storage back-end manages enterprise data and services all types of workloads, thus simplifying data management and lowering hardware costs for enterprises. In addition, MixApart enables the local storage performance required by data analytics through an integrated data caching and scheduling solution.

Presenter Bio: Madalin Mihailescu has recently defended his PhD thesis in the Department of Computer Science, at UofT, under the supervision of Professor Cristiana Amza. He is currently employed with NetApp, Boston. His PhD research focused on the design of a new Hadoop-like MapReduce framework allowing the flexibility of scaling the compute tier independently from the storage tier, by placing the compute tier in a cache which could be deployed anywhere, including on the Cloud.

Mar 8

James Zhen Huang

Ocasta: Precise and Efficient Configuration Recovery for Desktop Applications

Configuration errors are a leading cause of desktop system failures. To recover from a configuration error, end-users usually choose to use snapshot recovery, or reset recovery (reset all configuration settings of the software having the error to default values). However, all these solutions cause a challenging problem: collateral damage (user losing configuration changes unrelated to the configuration error). Ocasta addresses this problem with two novel techniques: configuration setting clustering and semi-automatic search. We evaluated Ocasta with traces of real application usage collected from 24 Linux desktops and 5 Windows desktops. Ocasta successfully fixed all 18 injected configuration errors for 10 Linux and Windows desktop applications, with an average of 87% and 95% less collateral damage as compared to snapshot recovery and reset recovery, respectively.

Presenter Bio: Zhen Huang is a fourth year PhD student working with Professor David Lie. He finished his Masters degree under the supervision of Professor David Lie at University of Toronto in 2009. In his past life, he worked as a senior software developer at EMC, SAP Canada, and Bank of China. His research interests include software reliability, software security, and operating system.

Mar 22

Gennady Pekhimenko

Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches

Cache compression is a promising technique to increase on-chip cache capacity and to decrease on-chip and off-chip bandwidth usage. Unfortunately, directly applying well-known compression algorithms (usually implemented in software) leads to high hardware complexity and unacceptable decompression/compression latencies, which in turn can negatively affect performance. Hence, there is a need for a simple yet efficient compression technique that can effectively compress common in-cache data patterns, and has minimal effect on cache access latency.

In this work, we introduce a new compression algorithm called Base-Delta-Immediate (BΔI) compression, a practical technique for compressing data in on-chip caches. The key idea is that, for many cache lines, the values within the cache line have a low dynamic range Δ i.e., the differences between values stored within the cache line are small. As a result, a cache line can be represented using a base value and an array of differences whose combined size is much smaller than the original cache line (we call this the base+delta encoding). Moreover, many cache lines intersperse such base+delta values with small values - our BΔI technique efficiently incorporates such immediate values into its encoding.

Compared to prior cache compression approaches, our studies show that BΔI strikes a sweet-spot in the tradeoff between compression ratio, decompression/compression latencies, and hardware complexity. Our results show that BΔI compression improves performance for both single-core (8.1% improvement) and multi-core workloads (9.5% / 11.2% improvement for two/four cores). For many applications, BΔI provides the performance benefit of doubling the cache size of the baseline system, effectively increasing average cache capacity by 1.53X.

Presenter Bio: Gennady Pekhimenko is a third year PhD student in the Computer Science Department at Carnegie Mellon University, working with Professor Todd C. Mowry and Professor Onur Mutlu. Before that (2008-2010), he worked at IBM Toronto Lab (Compilers Group) under supervision of Dr. Yaoqing Gao. He received his MSc. in Computer Science in 2008 from the University of Toronto, where he was advised by Professor Angela Demke Brown; and his B.S. in Applied Mathematics and Computer Science from Moscow State University in 2004. His research interests are in computer architecture with emphasis on efficient memory hierarchy designs. Gennady's work is funded by Microsoft Research and NSERC CGS-D fellowships.

May 24

Thomas Sewell, University of New South Wales, Sidney Australia

Verifying an OS Kernel down to the Binary Level

Bugs in crucial software systems are a constant, unfortunate nuisance. Formal verification is an attractive theoretical solution to the problem, but one which has been applied to only a tiny number of real programs. This talk begins with a recap of one such verification: the proof of functional correctness of the seL4 microkernel. We will take a brief look at the big picture of this proof and some aspects of seL4's design that made the problem tractable.

The functional correctness proof for seL4 initially assumed the correctness of the compiler, hardware control routines and initialization code. The body of this talk will focus on recent work which addresses the compiler correctness assumption by showing that the semantics of the compiled binary matches the semantics of the C source which was assumed in the earlier proof. We target the Cambridge ARM semantics, gcc (4.5.1) at optimization level 1, and the previously-verified parts of seL4. We use a combination of proof-producing conversions in Isabelle/HOL and HOL4 and a custom correspondence finder for assembly-like programs powered by modern SMT solvers.

Presenter Bio: Thomas is a PhD student at the University of New South Wales in Sidney Australia. He is working on extensions to the verification of the seL4 microkernel. He was a core team member of the L4.verified project that proved the functional correctness of the kernel. Since the completion of that project he has extended the functional correctness proof to include the fastpath and worked on an access control model for seL4. He is now working on eliminating the compiler correctness assumption from the seL4 proof by establishing that the compilation was faithful to the assumed semantics of the C code. His upcoming PLDI paper, Translation validation for a verified OS kernel, can be found here:http://www.nicta.com.au/pub?doc=6449

May 31

Nosayba El-Sayed

Reading between the lines of failure logs: Understanding how HPC systems fail

As the component count in supercomputing installations continues to increase, system reliability is becoming one of the major issues in designing HPC systems. These issues will become more challenging in future Exascale systems, which are predicted to include millions of CPU cores. Efficiently running those systems will require a good understanding of how different factors impact system reliability. In this work we use a decade worth of field data made available by Los Alamos National Lab to study the impact of a diverse set of factors on the reliability of HPC systems. We provide insights into the nature of correlations between failures, and investigate the impact of factors, such as the power quality, temperature, fan and chiller reliability, system usage and utilization, and external factors, such as cosmic radiation, on system reliability.

Nosayba is a third year PhD student in the department of Computer Science working under the supervision of Professor Bianca Schroeder. Her research focuses on collecting and analyzing large-scale field data from production machines and applying rigorous statistical and analytical methods to explore opportunities for reliability and energy-efficiency improvements.