25th International Conference on Scientific and Statistical Database Management
 

Sunday July 28, 2013
18:30-19:30 Tour of the Aquarium
19:30 Reception at the Aquarium
 
Monday July 29, 2013
8:00-8:30 Breakfast
8:30-8:45 Welcome by General Chairs
Alex Szalay (The Johns Hopkins University, USA), Tamas Budavari (The Johns Hopkins University, USA)
8:45-9:15 25th Anniversary of SSDBM Talk
Arie Shoshani (Lawrence Berkeley National Laboratory)
9:15-10:30 Keynote Session 1: Making Sense of Big Data with the Berkeley Data Analytics Stack
Michael J. Franklin (University of California, Berkeley)
10:30-11:00 Coffee
11:00-12:20 Research Session 1: Multidimensional Data
12:20-13:20 Lunch
13:20-15:00 Research Session 2: Spatio-Temporal Data
15:00-15:20 Coffee
15:20-17:00 Case-Studies
 
Tuesday July 30, 2013
8:00-8:30 Breakfast
8:30-9:00 Status & Awards
9:00-10:15 Keynote Session 2: Computational Challenges in Next-Generation Genomics
Steven Salzberg (The Johns Hopkins University)
10:15-11:00 Coffee & Posters
11:00-12:20 Research Session 3: Streaming and Time-series Data
12:20-13:20 Lunch
13:20-15:00 Research Session 4: Miscellaneous
15:00-15:45 Coffee & Posters
15:45-17:00 Short Papers Session 1
17:15-18:15 Short Papers Session 2
19:00-22:00 Conference banquet dinner at McCormick and Schmick’s
 
Wednesday July 31, 2013
8:00-8:30 Breakfast
8:30-9:30 Panel: Education and career paths for data scientists
Organizer: Magdalena Balazinska (University of Washington); Panelists: Susan Davidson (University of Pennsylvania), Bill Howe (University of Washington), Alexandros Labrinidis (University of Pittsburgh)
9:30-10:30 Demonstrations
10:30-11:00 Coffee
11:00-12:20 Research Session 5: Graphs and Indexes

Keynote Sessions

Keynote Session 1: Making Sense of Big Data with the Berkeley Data Analytics Stack (Monday July 29, 2013, 9:15-10:30)
Speaker: Michael J. Franklin (UC Berkeley, USA)

Session Chair: Judy Cushing
The Berkeley AMPLab was founded on the idea that the challenges of emerging Big Data applications require a new approach to analytics systems. Launched in early 2011, the project set out to rethink the traditional analytics stack, breaking down technical and intellectual barriers that had arisen during decades of evolutionary development. The vision of the lab is to seamlessly integrate the three main resources available for making sense of data at scale: Algorithms (such as machine learning and statistical techniques), Machines (in the form of scalable clusters and elastic cloud computing), and People (both individually as analysts and en masse, as with crowdsourced human computation). To pursue this goal, we assembled a research team with diverse interests across computer science, forged relationships with domain experts on campus and elsewhere, and obtained the support of leading industry partners and major government sponsors. The lab is realizing its ideas through the development of a freely available open-source software stack called BDAS: the Berkeley Data Analytics Stack. In the nearly three years the lab has been in operation, we have released major components of BDAS, several of which have gained significant traction in industry and elsewhere: the Mesos cluster resource manager, the Spark in-memory computation framework, and the Shark query processing system. In this talk, I'll describe the current state of BDAS with an emphasis on the key components released to date. I'll then discuss ongoing efforts on machine learning scalability and ease of use, including the MLbase system, as our focus moves higher up the stack. Finally, I will present our longer-term view of how all the pieces will fit together to form a system that can adaptively bring the right resources to bear on a given data-driven question to meet time, cost, and quality requirements throughout the analytics lifecycle.

Michael Franklin is the Thomas M. Siebel Professor of Computer Science at UC Berkeley, where he also serves as Director of the Algorithms, Machines and People Lab (AMPLab). The Berkeley AMPLab is a collaboration of over 60 researchers supported by Founding Sponsors Amazon Web Services, Google, and SAP, along with 17 other leading companies, the DARPA XData program, and an NSF Expeditions in Computing award. The latter was announced as part of the Obama Administration's Big Data research initiative in 2012. His research interests include large-scale data management and analytics, data integration, and hybrid human/computer data processing systems. He was founder and CTO of Truviso, a real-time data analytics company acquired by Cisco Systems in 2012. He is an ACM Fellow and a two-time winner of the ACM SIGMOD Test of Time Award (2013 and 2004). He also recently received Best Paper Awards at ICDE 2013 and NSDI 2012, a "Best of VLDB 2012" selection, Best Demo Awards at SIGMOD 2012 and VLDB 2011, and the Outstanding Advisor Award from the Computer Science Graduate Student Association at Berkeley. He serves on the committee for the U.S. National Academy of Sciences study on Analysis of Massive Data and on a Transportation Research Board committee on long-term data stewardship. Prof. Franklin received his Ph.D. in Computer Science from the University of Wisconsin-Madison in 1993.


Keynote Session 2: Computational Challenges in Next-Generation Genomics (Tuesday July 30, 2013, 9:00-10:15)
Speaker: Steven Salzberg (The Johns Hopkins University, USA)

Session Chair: Alex Szalay
Next-generation sequencing (NGS) technology allows us to peer inside the cell in exquisite detail, revealing new insights into biology, evolution, and disease that would have been impossible to find just a few years ago. The enormous volumes of data produced by NGS experiments present many computational challenges that we are working to address. In this talk, I will discuss solutions to two basic alignment problems: (1) mapping sequences onto the human genome at very high speed, and (2) mapping and assembling transcripts from RNA-seq experiments. I will also discuss some of the problems that can arise during alignment and how these can lead to mistaken conclusions about genetic variation and gene expression. My group has developed algorithms to solve each of these problems, including the widely-used Bowtie and Bowtie2 programs for fast alignment and the TopHat and Cufflinks programs for assembly and quantification of genes in transcriptome sequencing (RNA-seq) experiments. This talk describes joint work with current and former lab members including Ben Langmead, Cole Trapnell, Daehwan Kim, and Geo Pertea; and with collaborators including Mihai Pop and Lior Pachter.

Steven Salzberg is a Professor of Medicine and the Director of the Center for Computational Biology in the McKusick-Nathans Institute of Genetic Medicine at Johns Hopkins University. He holds joint appointments as Professor in the Departments of Biostatistics and Computer Science. He earned his B.A. and M.S. degrees from Yale University and his Ph.D. from Harvard University. From 1997 to 2005 he was Senior Director of Bioinformatics at The Institute for Genomic Research (TIGR) in Rockville, Maryland, and from 2005 to 2011 he was the Director of the Center for Bioinformatics and Computational Biology (CBCB) and the Horvitz Professor of Computer Science at the University of Maryland, College Park. Dr. Salzberg's interest in the Human Genome Project motivated him to develop one of the first computational gene-finding systems for the human genome in the early 1990s. His initial collaborations with TIGR at that time led to the development of a gene-finding program (Glimmer) that has been used in the analysis of thousands of bacterial, archaeal, and viral genomes, including Borrelia burgdorferi, Mycobacterium tuberculosis, Vibrio cholerae, Bacillus anthracis, and many others. He was a co-founder of the Influenza Genome Sequencing Project, the first large-scale genomics study of human and avian influenza viruses. His current work focuses on algorithms for genome assembly and alignment, with particular emphasis on next-generation sequencing data. His group has recently developed the Bowtie, TopHat, and Cufflinks software for alignment of next-gen sequences from re-sequencing and RNA-seq experiments. All of his group's software is free and open source. Dr. Salzberg has authored or co-authored two books and over 200 publications in leading scientific journals, and he is a Fellow of the American Association for the Advancement of Science and of the International Society for Computational Biology. His h-index is 102.


Panel

Panel: Education and career paths for data scientists (Wednesday July 31, 2013, 8:30-9:30)
Organizer: Magdalena Balazinska (University of Washington); Panelists: Susan Davidson (University of Pennsylvania), Bill Howe (University of Washington), Alexandros Labrinidis (University of Pittsburgh)

MOTIVATION: As industry and science become increasingly data-driven, the demand for skilled data scientists is exceeding what our universities are producing. According to a McKinsey report, "By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills." Similarly, the ability to extract knowledge from scientific data is accelerating discovery, and we need the next generation of domain scientists to be experts not only in their domain but also in data management. At the same time, however, researchers in academia who focus on building instruments or data management tools are often less recognized for their contributions than researchers who focus purely on the science itself. OVERVIEW: The goal of this panel is to discuss these challenges. We will discuss how we should educate both the emerging "data science" experts and the next generation of database and domain science experts. The panel will also discuss career paths for researchers who choose to specialize in developing new methods and tools for Big Data management in the domain sciences, with recommendations for how we can better support these less traditional career paths.

Magdalena Balazinska is an Associate Professor in the department of Computer Science and Engineering at the University of Washington. Magdalena's research interests are in the field of database management systems. Her current research focuses on big data management, sensor and scientific data management, and cloud computing. Magdalena holds a Ph.D. from the Massachusetts Institute of Technology (2006). She is a Microsoft Research New Faculty Fellow (2007), received an NSF CAREER Award (2009), a 10-year most influential paper award (2010), an HP Labs Research Innovation Award (2009 and 2010), a Rogel Faculty Support Award (2006), a Microsoft Research Graduate Fellowship (2003-2005), and multiple best-paper awards.

Panelists:

Susan B. Davidson received the B.A. degree in Mathematics from Cornell University, Ithaca, NY, in 1978, and the M.A. and Ph.D. degrees in Electrical Engineering and Computer Science from Princeton University, Princeton, NJ, in 1980 and 1982, respectively. Dr. Davidson is the Weiss Professor and Chair of Computer and Information Science at the University of Pennsylvania, where she has been on the faculty since 1982. She also served as Deputy Dean of the School of Engineering and Applied Science from 2005 to 2007.
Dr. Davidson's research interests include database and web-based systems, scientific data management, and extra-large databases. She co-developed the Kleisli data integration system (with Drs. Buneman, Tannen and Overton), featuring a complex value model of data that was amenable to optimizations, which was used for on-the-fly integration of large genomic datasets. Kleisli was subsequently commercialized by the company GeneticXChange. She has also developed techniques for provenance management in scientific workflow systems, including support for search and query, techniques for focusing user attention on “relevant” provenance information, and the marrying of database-style and workflow-style provenance management, using Pig Latin to elucidate the function of black-box modules. More recently, she has focused on privacy concerns surrounding the capture and use of provenance information in both databases and workflow systems.
Dr. Davidson was the founding co-director of the Penn Center for Bioinformatics from 1997-2003, and the founding co-director of the Greater Philadelphia Bioinformatics Alliance. She holds a secondary appointment in the Department of Genetics, is an ACM Fellow, received the Lenore Rowe Williams Award (2002), and was a Fulbright Scholar and recipient of a Hitachi Chair (2004).

Bill Howe is the Director of Research for Scalable Data Analytics at the UW eScience Institute and holds an Affiliate Assistant Professor appointment in Computer Science & Engineering, where he studies data management, analytics, and visualization systems for science applications. Howe has received two Jim Gray Seed Grant awards from Microsoft Research for work on managing environmental data, has had two papers selected for VLDB Journal's "Best of Conference" issues (2004 and 2010), and co-authored what are currently the most-cited papers from both VLDB 2010 and SIGMOD 2012. Howe serves on the program and organizing committees of a number of conferences in the area of databases and scientific data management, and serves on the Science Advisory Board of the SciDB project. He has a Ph.D. in Computer Science from Portland State University and a Bachelor's degree in Industrial & Systems Engineering from Georgia Tech.

Alexandros Labrinidis received his Ph.D. degree in Computer Science from the University of Maryland, College Park, in 2002. He is currently an associate professor in the Department of Computer Science at the University of Pittsburgh and co-director of the Advanced Data Management Technologies Lab. He is also an adjunct associate professor in the Computer Science Department at Carnegie Mellon University.
Dr. Labrinidis' research focuses on user-centric data management for network-centric applications, including web databases, data stream management systems, sensor networks, and scientific data management (with an emphasis on big data). He has published over 65 papers in peer-reviewed journals, conferences, and workshops, and he received an NSF CAREER Award in 2008. Dr. Labrinidis served as Secretary/Treasurer of ACM SIGMOD (July 2009 - June 2013) and previously served as Editor of SIGMOD Record. He has also served on numerous program committees of international conferences and workshops.


Welcome by General Chairs

25th Anniversary of SSDBM Talk

Program Details

Welcome by General Chairs (Monday July 29, 2013, 8:30-8:45)
Alex Szalay and Tamas Budavari (The Johns Hopkins University, USA)


25th Anniversary of SSDBM Talk (Monday July 29, 2013, 8:45-9:15)
Arie Shoshani (Lawrence Berkeley National Laboratory)

Session Chair: Judy Cushing

Keynote Session 1: Making Sense of Big Data with the Berkeley Data Analytics Stack (Monday July 29, 2013, 9:15-10:30)
Speaker: Michael J. Franklin (UC Berkeley, USA)

Session Chair: Judy Cushing

Research Session 1: Multidimensional Data (Monday July 29, 2013, 11:00-12:20)
Session Chair: Dan Halperin
Research Session 2: Spatio-Temporal Data (Monday July 29, 2013, 13:20-15:00)
Session Chair: Alexandros Labrinidis
Case-Studies (Monday July 29, 2013, 15:20-17:00)
Session Chair: Claudia Bauzer Medeiros
Status & Awards (Tuesday July 30, 2013, 8:30-9:00)
Keynote Session 2: Computational Challenges in Next-Generation Genomics (Tuesday July 30, 2013, 9:00-10:15)
Speaker: Steven Salzberg (The Johns Hopkins University, USA)

Session Chair: Alex Szalay

Coffee & Posters (Tuesday July 30, 2013, 10:15-11:00)
Research Session 3: Streaming and Time-series Data (Tuesday July 30, 2013, 11:00-12:20)
Session Chair: Jim French
Research Session 4: Miscellaneous (Tuesday July 30, 2013, 13:20-15:00)
Session Chair: Sylvia Spengler
Coffee & Posters (Tuesday July 30, 2013, 15:00-15:45)
Short Papers Session 1 (Tuesday July 30, 2013, 15:45-17:00)
Session Chair: Milena Ivanova
Short Papers Session 2 (Tuesday July 30, 2013, 17:15-18:15)
Session Chair: Bill Howe
Panel: Education and career paths for data scientists (Wednesday July 31, 2013, 8:30-9:30)
Organizer: Magdalena Balazinska (University of Washington); Panelists: Susan Davidson (University of Pennsylvania), Bill Howe (University of Washington), Alexandros Labrinidis (University of Pittsburgh)


Demonstrations (Wednesday July 31, 2013, 9:30-10:30)
Session Chair: Alexandra Meliou
Research Session 5: Graphs and Indexes (Wednesday July 31, 2013, 11:00-12:20)
Session Chair: Susan Davidson
