Archive

Posts Tagged ‘Visitors’

Julia Stoyanovich and Gerome Miklau are going to give a talk at Télécom ParisTech on December 5th

November 15th, 2011
Comments Off

Webdam is very happy to welcome you at Télécom ParisTech on December 5th to the talk organized by Pierre Senellart.

This will take place in “Télécom ParisTech” 46, rue Barrault – 75013 Paris in room C017 in the basement.

Planning:

Gerome Miklau talk abstract

Using Inference to Improve the Accuracy of Differentially-Private Output

Differential privacy is a rigorous privacy standard that protects against powerful adversaries, offers precise accuracy guarantees, and has been successfully applied to a range of data analysis tasks. When differential privacy is satisfied, participants in a dataset enjoy the compelling assurance that information released about the dataset is virtually indistinguishable whether or not their personal data is included.

Differential privacy is achieved by introducing randomness into query answers, and a major goal of research in this area is to devise methods that offer the best accuracy for a fixed level of privacy. The original algorithm for achieving differential privacy, commonly called the Laplace mechanism, returns the true answer after the addition of random noise drawn from a Laplace distribution. If an analyst requires only the answer to a single query about the database, then a version of the Laplace mechanism is known to offer optimal accuracy. But the Laplace mechanism can be severely suboptimal when a set of correlated queries are submitted, and despite much recent work, optimal strategies for answering a collection of correlated queries are not known.

After reviewing the basic principles of differential privacy, I will describe two examples of how query constraints and statistical inference can be used to construct more accurate differentially-private algorithms, with no privacy penalty. The first example comes from our recent work investigating the properties of a social network that can be studied without threatening the privacy of individuals and their connections. I will show that the degree distribution of a network can be estimated privately and accurately by asking a special query for which constraints are known to hold, and then exploiting the constraints to infer a more accurate final result. The second example comes from the analysis of more typical tabular data (such as census or medical data). When answering a set of predicate counting queries, I will show that correlations amongst the queries can be exploited to significantly reduce error introduced by the privacy mechanism.

Julias Stoyanovich talk abstract

Ranked Exploration of Large Structured Datasets

In online applications such as Yahoo! Personals and Trulia.com, users define structured profiles in order to find potentially interesting matches. Typically, profiles are evaluated against large datasets and produce thousands of ranked matches. Highly ranked results tend to be homogeneous, which hinders data exploration. For example, a dating website user who is looking for a partner between 20 and 40 years old, and who sorts the matches by income from higher to lower, will see a large number of matches in their late 30s who hold an MBA degree and work in the financial industry, before seeing any matches in different age groups and walks of life. An alternative to presenting results in a ranked list is to find clusters, identified by a combination of attributes that correlate with rank, and that allow for richer exploration of the result set.

In the first part of this talk I will propose a novel data exploration paradigm, termed rank-aware interval-based clustering. I will formally define the problem and, to solve it, will propose a novel measure of locality, together with a family of clustering quality measures appropriate for this application scenario. These ingredients may be used by a variety of clustering algorithms, and I will present BARAC, a particular subspace-clustering algorithm that enables rank-aware interval-based clustering in domains with heterogeneous attributes. I will present results of a large-scale user study that validates the effectiveness of this approach. I will also demonstrate scalability with an extensive performance evaluation on datasets from Yahoo! Personals, a leading online dating site, and on restaurant data from Yahoo! Local.

In the second part of this talk I will describe on-going work on data exploration for datasets in which multiple alternative rankings are defined over the items, and where each ranking orders only a subset of the items. Such datasets arise naturally in a variety of application domains, including social (e.g., restaurant and movie rating sites) and biological (e.g., analysis of genetic data). In these datasets there is often a need to aggregate multiple rankings, computing, e.g., a single ranked list of differentially expressed genes across a variety of experimental conditions, or of restaurants that are well-liked by one’s friends. I will argue that blindly aggregating multiple rankings into a single list may lead to an uninformative result, because it may not fully leverage opinions of different, possibly disagreeing, groups of judges. I will describe a framework that robustly identifies ranked agreement, i.e., it finds groups of judges whose rankings can be meaningfully aggregated. Finally, I will show how structured attributes of items and of judges can be used to guide the process of identifying ranked agreement, and to describe the resulting consensus rankings to a user.

Bio:
Julia Stoyanovich is a Visiting Scholar at the University of Pennsylvania. Julia holds M.S. and Ph.D. degrees in Computer Science from Columbia University, and a B.S. in Computer Science and in Mathematics and Statistics from the University of Massachusetts at Amherst. After receiving her B.S. Julia went on to work for two start-ups and one real company in New York City, where she interacted with, and was puzzled by, a variety of massive datasets. Julia’s research focuses on modeling and exploring large datasets in presence of rich semantic and statistical structure. She has recently worked on personalized search and ranking in social content sites, rank-aware clustering in large structured datasets that focus on dating and restaurant reviews, data exploration in repositories of biological objects as diverse as scientific publications, functional genomics experiments and scientific workflows, and representation and inference in large datasets with missing values.

Events , , , ,

Julia Stoyanovich is going to visit Webdam on December

November 8th, 2011
Comments Off

Webdam look forward to welcoming Julia Stoyanovich from December 2nd to 9th.

Julia Stoyanovich is currently a visiting scholar at the University of Pensylvania, where she works with Professor Susan Davidson and her group’s research.

Julia Stoyanovich is motivated by the data management needs of life sciences applications and of social information processing. She is working on incorporating semantic context into search, ranking, and data exploration in large complex datasets. She is also interested in managing provenance in scientific workflows, and in the related privacy and security considerations.

News

Gerome Miklau is going to visit Webdam on December

November 3rd, 2011
Comments Off

Webdam look forward to welcoming Gerome Miklau from December 1st to 7th.

Gerome Miklau is an associate professor at the University of Massachusetts, Amherst. Professor Miklau’s research interests are in the area of Database research with an emphasis on security; database theory; semi-structured data. The objective of his research is to enable secure and trustworthy data management in both conventional database systems and distributed environments like the World Wide Web. His work focuses on classical security concerns such as confidentiality, privacy, and integrity of data.

News

Hyunjung Park visiting Webdam

December 1st, 2010
Comments Off

Hyunjung Park is visiting webdam Friday 17 December. He is a Ph.D. student in the Department of Electrical Engineering at Stanford University. His advisor is Jennifer Widom. He is the winner of the SIGMOD 2010 Programming Contest, organized by Pierre Senellart and Serge Abiteboul. Teams of contestants from degree-granting institutions had to develop an efficient distributed query engine on top of an in-memory index. The competition received much attention, with 29 teams from 23 different institutions over the world. H. Park will give a talk on its experience of the SIGMOD 2010 Programming Contest at 2pm in the meeting room G008 of INRIA-Saclay .

Title: SIGMOD 2010 Programming Contest

Abstract: Student teams have competed to build a distributed query engine in the SIGMOD 2010 Programming Contest. In this talk, H. Park will describe the task briefly and present the winning system. Specifically, he will cover the design choices and implementation issues for the query planner and executor. Also, he will discuss several optimizations and their effectiveness.

News

Moshe Vardi visiting Webdam

October 13th, 2010
Comments Off

Moshe Y. Vardi is visiting Webdam Wednesday 3 November. He is Karen Ostrum George Professor in Computational Engineering and Director of the Computer and Information Technology Institute ((Rice University, Houston, Texas). His interests focus on applications of logic to computer science, including database theory, finite-model theory, knowledge in multi-agent systems, computer-aided verification and reasoning, and teaching logic across the curriculum. He will present his work on propositional satisfiability in the meeting room of ENS Cachan (Pavillon des Jardins) at 11:00am.

Title: Symbolic Techniques in Propositional Satisfiability Solving

Abstract: Search-based techniques in propositional satisfiability (SAT) solving have been enormously successful, leading to what is becoming known as the “SAT Revolution”. Essentially all state-of-the-art SAT solvers are based on the Davis-Putnam-Logemann-Loveland (DPLL) technique, augmented with backjumping and conflict learning. Much of current research in this area involves refinements and extensions of the DPLL technique. Yet, due to the impressive success of DPLL, little effort has gone into investigating alternative techniques. This work focuses on symbolic techniques for SAT solving, with the aim of stimulating a broader research agenda in this area.

Refutation proofs can be viewed as a special case of constraint propagation, which is a fundamental technique in solving constraint-satisfaction problems. The generalization lifts, in a uniform way, the concept of refutation from Boolean satisfiability problems to general constraint-satisfaction problems. On the one hand, this enables us to study and characterize basic concepts, such as refutation width, using tools from finite-model theory. On the other hand, this enables us to introduce new proof systems, based on representation classes, that have not been considered up to this point. We consider ordered binary decision diagrams (OBDDs) as a case study of a representation class for refutations, and compare their strength to well-known proof systems, such as resolution, the Gaussian calculus, cutting planes, and Frege systems of bounded alternation-depth. In particular, we show that refutations by ODBBs polynomially simulate resolution and can be exponentially stronger.

We then describe an effort to turn OBDD refutations into OBBD decision procedures. The idea of this approach, which we call “symbolic quantifier elimination”, is to view an instance of propositional satisfiability as an existentially quantified propositional formula. Satisfiability solving then amounts to quantifier elimination; once all quantifiers have been eliminated we are left with either 1 or 0. Our goal here is to study the effectiveness of symbolic quantifier elimination as an approach to satisfiability solving. To that end, we conduct a direct comparison with the DPLL-based ZChaff, as well as evaluate a variety of optimization techniques for the symbolic approach. In comparing the symbolic approach to ZChaff, we evaluate scalability across a variety of classes of formulas. We find that no approach dominates across all classes. While ZChaff dominates for many classes of formulas, the symbolic approach is superior for other classes of formulas.

Finally, we turn our attention to Quantified Boolean Formulas (QBF) solving. Much recent work has gone into adapting techniques that were originally developed for SAT solving to QBF solving. In particular, QBF solvers are often based on SAT solvers. Most competitive QBF solvers are search-based. Here we describe an alternative approach to QBF solving, based on symbolic quantifier elimination. We extend some symbolic approaches for SAT solving to symbolic QBF solving, using various decision-diagram formalisms such as OBDDs and ZDDs. In both approaches, QBF formulas are solved by eliminating all their quantifiers. Our first solver, QMRES, maintains a set of clauses represented by a ZDD and eliminates quantifiers via multi-resolution. Our second solver, QBDD, maintains a set of OBDDs, and eliminate quantifiers by applying them to the underlying OBDDs. We compare our symbolic solvers to several competitive search-based solvers. We show that QBDD is not competitive, but QMRESS compares favorably with search-based solvers on various benchmarks consisting of non-random formulas.

News

Balder ten Cate visiting Webdam

June 17th, 2010
Comments Off

Balder ten Cate is visiting Webdam from Monday 5 July to Saturday 10 july 2009 and from Saturday 17 July to Saturday 24 July. He is a postdoctoral researcher in the database theory group at UC Santa Cruz.

News

Fabian & Yannis & Summer School

May 19th, 2010
Comments Off
  1. Fabian Suchanek who recently joined Webdam received the Honorable Mention at ACM Sigmod  Dissertion Awards.
  2. Yannis Papakonstantinou will be visiting Webdam for 3 weeks in July.
  3. The BDA summer school co-organized by Webdam is happening now in Les Houches. An important part of the Webdam textbook is being tested.

News , ,

Gerome Miklau visiting Webdam

April 21st, 2010
Comments Off

Gerome Miklau is visiting Webdam Wednesday 2 June at 2pm. He is an Assistant Professor at the Computer Science Department of University of Massachusetts, Amherst. He will present his work on differential privacy in the meeting room of LSV at ENS Cachan.

Title: Optimizing Linear Counting Queries Under Differential Privacy

Abstract: Differential privacy is a rigorous privacy standard that protects against powerful adversaries, offers precise accuracy guarantees, and has been successfully applied to a range of data analysis tasks. When differential privacy is satisfied, participants in a dataset enjoy the compelling assurance that information released about the dataset is virtually indistinguishable whether or not their personal data is included.

Differential privacy is achieved by introducing randomness into query answers. The original algorithm for achieving differential privacy, commonly called the Laplace mechanism, returns the true answer after the addition of random noise drawn from a Laplace distribution. If an analyst requires only the answer to a single query about the database, then the Laplace mechanism is known to be optimal. But the Laplace mechanism can be highly suboptimal when a set of correlated queries are submitted, and despite much recent work, optimal strategies for answering a collection of correlated queries are not known in general.

In this talk I will review the basic principles of differential privacy and then describe the “matrix mechanism”, a new algorithm for answering a workload of predicate counting queries. Given a workload, the mechanism first requests answers to a different set of queries, called a query strategy, which are answered using the standard Laplace mechanism. Noisy answers to the workload queries are then derived from the noisy answers to the strategy queries.

When the strategy queries are chosen appropriately, this two stage process increases accuracy (with no cost in privacy) by answering the workload queries using a more complex, correlated noise distribution. I will show that two recently-proposed algorithms, which provide accurate answers for the set of all range queries, can be seen as instances of the matrix mechanism. I will then present results on optimally choosing the query strategy to minimize the error for any given workload.

This talk is based on forthcoming work that will appear in PODS 2010 and VLDB 2010, and is joint with Chao Li, Michael Hay, Vibhor Rastogi, Andrew McGregor, and Dan Suciu.

News

News for the beginning of the year

January 15th, 2010
Comments Off

The webdam team wish you an happy new year!

The project is now running for more than one year. We have already found some interesting results in a large spectrum of directions: task sequencing in data-driven workflow, querying probabilistic data, access control on the web and web archiving. You will find more results on the annual activity report . We are really excited by the new challenges we will meet this year.

More practicaly, Tova Milo and Balder ten Cate will both visit Webdam end of january.

News ,

Sihem Amer-Yahia visiting Webdam

December 1st, 2009
Comments Off

Sihem Amer-Yahia is visiting Webdam from Tuesday 1 December to Thursday 31 December. She is a Senior Research Scientist at Yahoo! Research New-York. She will present her work on recommendation Friday 4 December at 3pm in the meeting room G008.

Title: I’ll Have What She’s Having: Recommendations on Social Content Sites

Abstract: We examine the challenges behind recommendations in social content sites. We use collaborative tagging sites (think del.icio.us, YouTube and Yahoo!Travel) as our application and report on our experiments in harvesting the collective tagging behavior to serve relevant content (think URLs, videos, travel destinations) to users. We address well-known and lesser-known problems in recommender systems such as over-specialization and data management for the masses. We conclude with open questions.

News