5th International Workshop on Cloud Data and Platforms (CloudDP'15)

Call for Papers

Processing very large data sets requires a unique combination of data management and distributed systems engineering knowledge. The data management challenges include, among others, the development of new approaches and algorithms that reduce the complexity of data processing and allow results to be produced incrementally, continuously, and as accurately as possible. At the same time, the sheer volume and velocity of the data require systems that can automatically and adaptively scale up and out to accommodate big data processing algorithms.

The focus of this workshop is on new cloud-based data management and processing systems which span tens of thousands of machines in order to support processing of contemporary, very large data sets. Such systems require novel architectures, programming models and designs that go beyond approaches used in fixed-sized compute clusters. The focus of such systems is to support the work of users who interactively explore and analyze large and quickly changing data sets. The right platforms and techniques can simplify and accelerate the design, implementation, and execution of new “big data” applications.

In the past, data processing in the cloud has been dominated by batch processing paradigms such as MapReduce, but users increasingly seek to consume their results in near real-time. Efficiently supporting these new types of applications requires overcoming the challenges of adaptive, near real-time processing of data in cloud environments. Ultimately, adaptive low-latency data processing across a large number of machines brings a new set of problems related to systems, distributed systems and geo-distribution, networking, fault-tolerance, and data management research.

Instead of providing a forum for merely extending existing cloud data systems and platforms, we hope to encourage the discussion of radical new alternatives. In particular, we want to foster the development of new infrastructures and platforms that rethink how data can be processed in cloud-based systems. We plan to attract research that has the potential to underpin the next generation of scalable and efficient data management applications on top of high-level, flexible platforms.

The topics of the workshop relate to various aspects of cloud-based data management platforms, and the resulting challenges for the supporting cloud infrastructures. Specifically, we invite submissions on topics including, but not limited to:

new processing paradigms
new programming models
adaptive data management

vertical and horizontal scalability in data processing
elastic storage and networking
elasticity and adaptive scheduling

resource allocation and provisioning
multi-tenancy and virtualization

dependability and fault tolerance
predictability in cloud environments

large-scale and distributed deployments
case-studies

Important Dates

Paper submission and notification dates:
Paper submission deadline (EXTENDED): February 16, 2015 (23:59, anywhere on earth). There will be no further extension.
Notification of acceptance: March 6, 2015
Camera-ready deadline: March 27, 2015

Submissions

Paper submission can be done here. Authors should follow the guidelines below when preparing their papers:
Papers should be 6 pages long.
Authors should use a 10pt font by specifying \documentclass[10pt,twocolumn]{sigplanconf}.
Papers must be formatted according to the ACM SIGPLAN style, for which templates are available for both LaTeX and Word.

Accepted papers will be published as part of the ACM Digital Library.

Program

9:15 - 10:30: Opening and Session 1 - Chair: Etienne Rivière

Introduction and welcome

Mei Li, Xu Gao, Yanjun Wu, Chen Zhao and Mingshu Li. Characterizing the Spatio-temporal Burstiness of Storage Workloads.

Domenico Cotroneo, Flavio Frattini, Roberto Pietrantuono and Stefano Russo. Robustness Testing of IaaS Cloud Platforms: A State-Based Approach.

10:30 - 11:00: Coffee Break

11:00 - 12:30: Session 2 - Chair: Peter Pietzuch

Anastassios Nanos, Stefanos Gerangelos, Ioanna Alifieraki and Nectarios Koziris. V4VSockets: low-overhead intra-node communication in Xen.

Chunliang Hao, Jie Shen, Heng Zhang, Xiao Zhang, Yanjun Wu and Mingshu Li. Sparkle: Adaptive Sample Based Scheduling for Cluster Computing.

EU project highlight: CoherentPaaS: Blending SQL, NoSQL, and CEP.
Ricardo Jimenez-Peris, CEO & Co-Founder, LeanXcale
The talk will present how enterprises today are moving towards a polyglot persistence world, in which they use multiple data management technologies to solve both the problems they already have and the new problems this world brings. They face two main pains. First, when they have to update data in multiple data stores and a failure occurs, the result is inevitably an inconsistent logical database in which some data stores have been updated and others have not. Second, whenever they need to read and correlate data across data stores, they have to do so programmatically, which is simply too hard. CoherentPaaS aims to solve these pains. First, it provides holistic, ultra-scalable transactional processing that makes it possible to transactionally update any number of data stores integrated in the framework. Second, it provides a way to read declaratively from a combination of data stores.

12:30 - 14:00: Lunch Break

14:00 - 15:30: Session 3 - Chair: Luis Veiga

Keynote: Prof. Willy Zwaenepoel, School of Computer and Communication Sciences, EPFL

"Analytics on Graphs with a Trillion Edges"

Big graphs occur naturally in many applications, most obviously in social networks, but also in many other areas such as biology and forensics. Current approaches to processing large graphs use either supercomputers or very large clusters. In both cases the entire graph must reside in memory before it can be processed. We are pursuing an alternative approach, processing graphs from secondary storage. While this comes with some performance penalty, it makes analytics on very large graphs accessible on a small number of commodity machines. It also has the pleasing property that "if you can store a graph, you can compute on it".

We have developed two systems, one for a single machine and one for a cluster of machines. X-Stream, the single machine solution, aims to make all secondary storage access sequential. It uses two techniques to achieve this goal, edge-centric processing and streaming partitions. X-Stream outperforms the state-of-the-art GraphChi system, because it achieves better sequentiality and because it requires less preprocessing. Slipstream, the cluster solution, starts from the observation that there is little benefit to locality when accessing data from secondary storage over a high-speed network. As a result, partitioning can be dynamic and can focus on achieving load balance, in combination with sequentiality of secondary storage access. The resulting system achieves good scaling behavior and outperforms the state-of-the-art out-of-core Giraph system. With Slipstream we have also been able to process a trillion-edge graph, a new milestone for graph size on a small cluster. I will describe both systems and their performance on a number of benchmarks and in comparison to state-of-the-art alternatives.

This is joint work with Laurent Bindschaedler, Jasmina Malicevic and Amitabha Roy at EPFL.

Willy Zwaenepoel received his BS/MS from the University of Gent, Belgium, and his PhD from Stanford University. He is currently a Professor of Computer Science at EPFL. Previously, he held appointments as Professor of Computer Science and Electrical Engineering at Rice University, and as Dean of the School of Computer and Communication Sciences at EPFL. His interests are in operating systems and distributed systems. He is a Fellow of the ACM and the IEEE, has received the IEEE Kanai Award and several best paper awards, and is a member of the Belgian and European Academies. He has also been involved in a number of startups, including iMimic (now part of Cisco), Midokura and Nutanix.

EU project highlight: HARNESS: bringing real hardware heterogeneity to the cloud
Guillaume Pierre, Université de Rennes 1

15:30 - 16:30: Coffee Break + Poster Session

16:30 - 17:30: Session 4 - Chair: Guillaume Pierre

Fernando Costa and Paulo Ferreira. SCOLARS-DV: Scalable Task Validation over the Internet.

EU project highlight: LeanBigData: Real-Time Big Data Analytics
Ricardo Jimenez-Peris, CEO & Co-Founder, LeanXcale
The talk will present how enterprises today are forced to use two different kinds of data management systems: one for their operational databases, providing full data consistency guarantees (i.e. ACID properties), the so-called OLTP (OnLine Transactional Processing) systems; and another for business analytics, enabling them to run heavy queries with online response times, the so-called data warehouses or OLAP systems. Because these systems are disjoint, enterprises are forced to copy data from the operational database into the OLAP data warehouse by means of a process called Extract-Transform-Load (ETL). This process is quantified at 80% of the business analytics performed, which is totally ridiculous. LeanBigData brings an ultra-scalable transactional OLTP database with an integrated OLAP engine that accesses the operational data with full transactional guarantees. The resulting system delivers real-time big data analytics.

Workshop Venue

The CloudDP 2015 workshop is co-located with the EuroSys 2015 conference and will be held in the facilities of Bordeaux INP Enseirb-Matmeca, which is located on the campus of the University of Bordeaux. Please refer to the EuroSys 2015 local information pages for details about the venue and accommodation. The exact location of the workshop will be announced in early March 2015.

Organization

Workshop Organizers and Program Chairs:

Program Chairs:
Minos Garofalakis (Technical University of Crete in Chania, Greece)
Etienne Rivière (University of Neuchâtel, Switzerland)
Luis Veiga (Técnico Lisboa - ULisboa / INESC-ID Lisboa, Portugal)
Publication Chair:
Anita Sobe (University of Neuchâtel, Switzerland)

Technical Program Committee:

Paulo Ferreira, INESC ID, Portugal
Pedro Garcia Lopez, Uni. Rovira I Virgili, Spain
Ching-Hsien Hsu, Chung Hua University, Taiwan
Jinho Hwang, IBM Research, USA
Stratos Idreos, Harvard University, USA
Zbigniew Jerzak, SAP AG, Germany
Sebastian Michel, TU Kaiserslautern, Germany
Rui Oliveira, INESC-TEC, Portugal
Odysseas Papapetrou, Technical University of Crete, Greece
Guillaume Pierre, Université de Rennes 1, France
Peter Pietzuch, Imperial College London, UK
Neoklis Polyzotis, UCSC and Google, USA
Marco Serafini, Qatar Computing Research Institute, Qatar
Marko Vukolic, Eurecom, France

Contact:

Please do not hesitate to contact the organizers if you have any questions.