CSL seminars - Summer 2014

Location and time: BA5256/BA5205; see the day and time listed with each talk below






June 10

Sahil Suneja (Tuesday, 12:00pm, BA5256!)

Non-intrusive, Out-of-band and Out-of-the-box Systems Monitoring in the Cloud
The dramatic proliferation of virtual machines (VMs) in datacenters and the highly dynamic, transient nature of VM provisioning have revolutionized datacenter operations. However, these environments are still managed using re-purposed versions of traditional agents, originally developed for managing physical systems, or, more recently, via virtualization-aware alternatives that require guest cooperation and accessibility. We show that these existing approaches are a poor match for monitoring and managing (virtual) systems in the cloud due to their dependence on guest cooperation and operational health, and their growing lifecycle-management overheads.

In this work, we first present Near Field Monitoring (NFM), our non-intrusive, out-of-band cloud monitoring and analytics approach, designed around cloud operation principles to address the limitations of existing techniques. NFM decouples system execution from monitoring and analytics functions by pushing monitoring out of the target systems' scope. By leveraging and extending VM introspection techniques, our framework provides simple, standard interfaces to monitor running systems in the cloud that require no guest cooperation or modification and have minimal effect on guest execution. By decoupling monitoring and analytics from the target system's context, NFM provides "always-on" monitoring, even when the target system is unresponsive. NFM also works "out-of-the-box" for any cloud instance, as it eliminates the need to install and maintain agents or hooks in the monitored systems. We describe the end-to-end implementation of our framework with two real-system prototypes based on two virtualization platforms. We discuss the new cloud analytics opportunities enabled by our decoupled execution, monitoring, and analytics architecture. We present four applications built on top of our framework and show their use for across-time and across-system analytics.
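
To give a feel for what "out-of-band" monitoring means, here is a toy sketch (not NFM's actual code, and far simpler than real VM introspection): guest state is reconstructed purely by parsing a raw memory image, with no agent running inside the guest. The memory layout and task struct below are invented for illustration.

```python
import struct

# Toy "guest memory": a count header followed by a packed array of fake
# task structs: (pid: u32, state: u32, name: 16 bytes). Real introspection
# would walk actual kernel data structures in a live VM's memory.
TASK_FMT = "<II16s"
TASK_SIZE = struct.calcsize(TASK_FMT)

def make_guest_memory(tasks):
    """Build a fake guest memory image from (pid, state, name) tuples."""
    blob = struct.pack("<I", len(tasks))          # task count header
    for pid, state, name in tasks:
        blob += struct.pack(TASK_FMT, pid, state, name.encode().ljust(16, b"\0"))
    return blob

def introspect_process_list(memory):
    """Parse the task list purely from the raw image (out-of-band)."""
    (count,) = struct.unpack_from("<I", memory, 0)
    procs = []
    for i in range(count):
        pid, state, raw = struct.unpack_from(TASK_FMT, memory, 4 + i * TASK_SIZE)
        procs.append((pid, state, raw.rstrip(b"\0").decode()))
    return procs

mem = make_guest_memory([(1, 0, "init"), (42, 1, "sshd")])
print(introspect_process_list(mem))   # works even if the "guest" is hung
```

The point of the sketch is the decoupling: the monitor never asks the guest to cooperate, so it keeps working when the guest is unresponsive.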

Sahil is a third-year Ph.D. student working with Prof. Eyal de Lara. His research interests lie broadly in Virtualization, Cloud Computing, and Parallel & High-Performance Computing. His Ph.D. research focuses on making systems management and virtualization management techniques more scalable and efficient in virtualization-based cloud systems. His secondary interests include Wireless Networking and Mobile Computing.

June 12

Zhen (James) Huang (Thursday, 12:00pm, BA5256!)

Ocasta: Clustering Configuration Settings For Error Recovery
Effective machine-aided diagnosis and repair of configuration errors continues to elude computer systems designers. Most of the literature targets errors that can be attributed to a single erroneous configuration setting. However, a recent study found that a significant fraction of configuration errors require fixing more than one setting together. To address this limitation, Ocasta statistically clusters dependent configuration settings based on the application's accesses to its configuration settings, and uses the extracted clusters to fix configuration errors involving more than one setting. Ocasta treats applications as black boxes and relies only on the ability to observe application accesses to their configuration settings. We collected traces of real application usage from 24 Linux and 5 Windows desktop computers and found that Ocasta identifies clusters correctly with 88.6% accuracy. To demonstrate the effectiveness of Ocasta, we evaluated it on 16 real-world configuration errors of 11 Linux and Windows applications. Ocasta successfully repairs all evaluated configuration errors in 11 minutes on average, and requires the user to examine only 3 screenshots of the application's output on average to confirm that the error is repaired. A user study we conducted shows that Ocasta is easy to use for both expert and non-expert users and is more efficient than manual configuration-error troubleshooting.
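
A minimal sketch of the clustering idea (not Ocasta's actual algorithm, whose statistics are more involved): settings that the application tends to access in the same observation windows are grouped, so a repair can roll back a whole cluster rather than one setting. The trace format and the Jaccard threshold here are illustrative assumptions.

```python
from itertools import combinations

def jaccard(a, b):
    # Similarity of two sets of window indices.
    return len(a & b) / len(a | b)

def cluster_settings(windows, threshold=0.6):
    """windows: list of sets, each the settings accessed in one window.
    Returns clusters of settings that are frequently co-accessed."""
    settings = sorted(set().union(*windows))
    parent = {s: s for s in settings}        # union-find forest

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]    # path halving
            s = parent[s]
        return s

    for x, y in combinations(settings, 2):
        wx = {i for i, w in enumerate(windows) if x in w}
        wy = {i for i, w in enumerate(windows) if y in w}
        if wx and wy and jaccard(wx, wy) >= threshold:
            parent[find(x)] = find(y)        # union: often accessed together

    clusters = {}
    for s in settings:
        clusters.setdefault(find(s), set()).add(s)
    return list(clusters.values())

windows = [{"proxy.host", "proxy.port"},
           {"proxy.host", "proxy.port", "theme"},
           {"theme"},
           {"proxy.host", "proxy.port"}]
print(cluster_settings(windows))   # proxy.host/proxy.port cluster; theme alone
```

In this toy trace, the two proxy settings are almost always touched together, so a fix for a proxy misconfiguration would consider both; the unrelated theme setting stays in its own cluster.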

Zhen Huang is a fourth-year PhD student working with Professor David Lie. He finished his Master's degree under the supervision of Professor David Lie at the University of Toronto in 2009. In his past life, he worked as a senior software developer at EMC, SAP Canada, and Bank of China. His research interests include software reliability, software security, and operating systems.

June 16

Mike Qin (Monday, 12:00pm, PT266!)

Modern data centers are increasingly using shared storage solutions for ease of management. Data is cached on the client side on inexpensive and high-capacity flash devices, helping improve performance and reduce contention on the storage side. Currently, write-through caching is used because it ensures consistency and durability under client failures, but it offers poor performance for write-heavy workloads. In this work, we propose two write-back based caching policies, called write-back flush and write-back persist, that provide strong reliability guarantees, under two different client failure models. These policies rely on storage applications, such as file systems and databases, issuing write barriers to persist their data reliably on storage media. Our evaluation shows that these policies perform close to write-back caching, significantly outperforming write-through caching, for both read-heavy and write-heavy workloads.
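
The write-back flush policy can be sketched with a toy in-memory model (illustrative only, not the authors' implementation): writes are acknowledged from the client-side cache immediately, and a write barrier, as a file system or database would issue around its journal commits, pushes all dirty blocks to shared storage. The class and block granularity below are assumptions.

```python
class WriteBackFlushCache:
    """Toy model: dicts stand in for the flash cache and shared storage."""

    def __init__(self, storage):
        self.storage = storage      # shared storage (dict of block -> data)
        self.cache = {}             # client-side flash cache
        self.dirty = set()          # blocks not yet on shared storage

    def write(self, block, data):
        self.cache[block] = data    # fast path: no storage round trip
        self.dirty.add(block)

    def read(self, block):
        if block in self.cache:     # cache hit
            return self.cache[block]
        return self.storage.get(block)

    def barrier(self):
        # Durability point: everything written before the barrier is made
        # persistent on shared storage before the barrier completes.
        for block in self.dirty:
            self.storage[block] = self.cache[block]
        self.dirty.clear()

storage = {}
cache = WriteBackFlushCache(storage)
cache.write(0, b"journal record")
cache.write(1, b"data block")
cache.barrier()   # after this, a client crash loses no acknowledged barrier
```

Between barriers the cache behaves like plain write-back (fast); the barrier bounds what a client crash can lose to exactly what the application's own consistency protocol already tolerates.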

Mike is a first-year PhD student in the ECE department working with Professor Ashvin Goel and Professor Angela Demke Brown. His research focuses on storage systems in general. He is currently working on storage systems for new storage hardware.

June 19

Suprio Ray (Thursday, 12:00pm, BA5205!)

Supporting Location-Based Services in a Main-Memory Database
With the proliferation of mobile devices and the explosive growth of spatio-temporal data, Location-Based Services (LBS) have become an indispensable technology in our daily lives. The key characteristics of LBS applications include a high rate of time-stamped location updates and many concurrent historical, present, and predictive queries. Commercial LBS providers must support all three kinds of queries and sustain the high update rates. While they employ relational databases for this purpose, traditional databases are unable to cope with the growing demands of many LBS systems. Support for spatio-temporal indexes within these databases is limited to R-tree-based approaches. Although a number of advanced spatio-temporal indexes have been proposed by the research community, only a few of them support historical queries, and those that do are unable to sustain the high update and query throughput typical of LBS.

Technological trends involving increasingly large main memory and core counts offer opportunities to address some of these issues. We present several key ideas to support high-performance commercial LBS by exploiting in-memory database techniques. Taking advantage of the very large memory available in modern machines, our system maintains the location data and index for the past N days in memory; older data and indexes are kept on disk. We propose an in-memory storage organization for high insert performance. We also introduce a novel spatio-temporal index that maintains partial temporal indexes in a versioned-grid structure; the partial temporal indexes are organized as compressed bitmaps. Through extensive evaluation, we demonstrate that our system supports high insert and query throughput and outperforms the leading LBS system by a significant margin.
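
A minimal sketch of the versioned-grid idea (a simplification of the structure described above, not the actual implementation): each (grid cell, time slice) pair holds a bitmap of the object ids seen there, and a historical query ORs the bitmaps for the relevant slices. Plain Python ints stand in for compressed bitmaps; cell size and the time model are assumptions.

```python
from collections import defaultdict

class VersionedGrid:
    """Toy versioned-grid index: (cell_x, cell_y, t) -> bitmap of object ids."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.index = defaultdict(int)

    def insert(self, obj_id, x, y, t):
        cell = (int(x // self.cell_size), int(y // self.cell_size), t)
        self.index[cell] |= 1 << obj_id   # set obj_id's bit in this slice

    def query(self, x, y, t_from, t_to):
        """Ids observed in the cell containing (x, y) during [t_from, t_to]."""
        cx, cy = int(x // self.cell_size), int(y // self.cell_size)
        bitmap = 0
        for t in range(t_from, t_to + 1):
            bitmap |= self.index.get((cx, cy, t), 0)   # cheap bitmap OR
        return {i for i in range(bitmap.bit_length()) if bitmap >> i & 1}

grid = VersionedGrid(cell_size=10)
grid.insert(3, x=12, y=7, t=0)    # object 3 in cell (1, 0) at time 0
grid.insert(5, x=14, y=9, t=2)    # object 5 in the same cell at time 2
print(grid.query(13, 8, 0, 2))    # historical query over both slices
```

Inserts touch only the current slice's bitmap, which is what makes high update rates feasible, while historical queries reduce to ORs over per-slice bitmaps.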

Suprio is a fourth-year Ph.D. student working with Prof. Angela Demke Brown. He also collaborates with Prof. Ryan Johnson. His research interests include Big Data issues in database management systems, spatial and spatio-temporal databases, cloud computing, and distributed systems. He obtained his Master's degree from UBC under the supervision of Prof. Mike Feeley and Prof. Norm Hutchinson. He also worked with Prof. Alexandra Fedorova at SFU as a research student. Before joining the Ph.D. program he worked as a software engineer at Oracle, Lucent, and Webtech Wireless. This work was done during his recent internship at SAP.

June 25

Suprio Ray (Wednesday, 12:00pm, BA5205!)

Skew-Resistant Parallel In-memory Spatial Join
Spatial join is a crucial operation in many spatial analysis applications in scientific and geographical information systems. Due to the compute-intensive nature of spatial predicate evaluation, spatial join queries can be slow even with a moderate-sized dataset. Efficient parallelization of spatial join is therefore essential to achieve acceptable performance for many spatial applications. Technological trends, including rising core counts and increasingly large main memory, hold great promise in this regard. Previous parallel spatial join approaches tried to partition the dataset so that each partition held as close to an equal number of spatial objects as possible, and they focused only on the filter step. However, when the more compute-intensive refinement step is included, significant processing skew can arise from the uneven sizes of the objects. This processing skew limits the achievable parallel performance of spatial join queries, as the longest-running partition determines the overall query execution time. Our solution is SPINOJA, a skew-resistant parallel in-memory spatial join infrastructure. SPINOJA introduces MOD-Quadtree declustering, which partitions the spatial dataset so that the amount of computation demanded by each partition is equalized and processing skew is minimized. We compare three work metrics used to create the partitions and three load-balancing strategies to assign the partitions to multiple cores. SPINOJA uses an in-memory column store for the spatial tables. Our evaluation shows that SPINOJA outperforms in-memory implementations of previous spatial join approaches by a significant margin, and a recently proposed in-memory spatial join algorithm by an order of magnitude.
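
The load-balancing half of the problem can be illustrated with a classic longest-processing-time greedy (one plausible strategy for assigning partitions to cores, not necessarily the one SPINOJA uses): partitions carry an estimated refinement cost, and each is placed on the currently least-loaded core, since the busiest core determines query latency. The cost values below are made up.

```python
import heapq

def assign_partitions(costs, n_cores):
    """costs: per-partition computation estimates.
    Returns (per-core loads, partition indices assigned to each core)."""
    heap = [(0.0, core) for core in range(n_cores)]   # (load, core id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_cores)]
    # LPT heuristic: place the most expensive partitions first, each on
    # the least-loaded core so far.
    for part in sorted(range(len(costs)), key=lambda p: -costs[p]):
        load, core = heapq.heappop(heap)
        assignment[core].append(part)
        heapq.heappush(heap, (load + costs[part], core))
    loads = [sum(costs[p] for p in parts) for parts in assignment]
    return loads, assignment

loads, assignment = assign_partitions([9, 7, 6, 5, 4, 3], n_cores=2)
print(loads)   # -> [17, 17]: balanced, so no core sits idle at the end
```

The same costs split by object count alone could easily end up lopsided; equalizing estimated computation, as MOD-Quadtree declustering aims to do at partitioning time, keeps the makespan near the ideal total-work / cores bound.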

Suprio is a fourth-year Ph.D. student advised by Prof. Angela Demke Brown. He also collaborates with Prof. Ryan Johnson. His research interests include Big Data issues in database management systems, spatial and spatio-temporal databases, cloud computing, and distributed systems. He obtained a Master's degree from UBC. Before joining the Ph.D. program he worked as a software engineer at Oracle, Lucent, and Webtech Wireless.

July 31

Tianzheng Wang (Thursday, 12:00pm, BA5256!)

Scalable Logging through Emerging Non-Volatile Memory
Emerging byte-addressable, non-volatile memory (NVM) is fundamentally changing the design principles of transaction logging. It potentially eliminates the need for flush-before-commit, as log records are persistent immediately upon write. Distributed logging - once a prohibitive technique for single-node systems in the DRAM era - becomes a promising way to ease the logging bottleneck, thanks to the non-volatility and high performance of NVM.

In this paper, we advocate NVM and distributed logging on multicore and multi-socket hardware. We identify the challenges brought by distributed logging and discuss solutions. To protect committed work in NVM-based systems, we propose passive group commit, a lightweight, practical approach that leverages existing hardware and group commit. We expect that durable processor cache is the ultimate solution to protecting committed work and building reliable, scalable NVM-based systems in general. We evaluate distributed logging with logging-intensive workloads and show that distributed logging can achieve as much as ~3x speedup over centralized logging in a modern DBMS and that passive group commit only induces minuscule overhead.
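
A toy model of the durability rule behind distributed logging with batched persistence (illustrative only; the paper's passive group commit leverages real hardware behavior that this sketch does not capture): each worker appends to its own log, and a transaction is acknowledged as durable only once every log it wrote to has been persisted up to that transaction's last record. The class names and LSN scheme are assumptions.

```python
class Log:
    """One per-worker log; the durable LSN advances in batches."""

    def __init__(self):
        self.records = []
        self.durable_lsn = -1          # highest persisted record index

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1   # this record's LSN

    def persist_all(self):             # batched flush, e.g. on a timer
        self.durable_lsn = len(self.records) - 1

class Transaction:
    """Tracks, per log, the last LSN this transaction wrote."""

    def __init__(self):
        self.watermarks = {}

    def write(self, log, record):
        self.watermarks[id(log)] = (log, log.append(record))

    def is_durable(self):
        # Durable only when every touched log has persisted past our
        # watermark: the commit acknowledgement waits, not the writers.
        return all(log.durable_lsn >= lsn
                   for log, lsn in self.watermarks.values())

log_a, log_b = Log(), Log()
txn = Transaction()
txn.write(log_a, "update x")           # records land in different logs
txn.write(log_b, "update y")
assert not txn.is_durable()            # written, but not yet persisted
log_a.persist_all(); log_b.persist_all()
assert txn.is_durable()                # now safe to acknowledge the commit
```

The appeal of this structure on NVM is that the per-log flushes are cheap or unnecessary, so many transactions share each persistence step instead of each paying for its own flush-before-commit.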

Tianzheng is a first-year Ph.D. student working with Prof. Ryan Johnson. His research lies broadly in building high-performance, reliable systems with new hardware such as emerging non-volatile memories. His interests also include embedded systems, databases, and operating systems in general.