A Look Inside the Think Tank...Work by Thomas Steiner
International Conference on Web Engineering (ICWE2015): Trip Report
Last week, I attended the 15th International Conference on Web Engineering in Rotterdam, the Netherlands. Google was one of the industry sponsors and Google Zurich's Enrique Alfonseca delivered one of the keynotes on news processing at Google and general advances in language understanding. As the name suggests, the focus of the conference was on Web engineering aspects; so below, in my list of personal paper highlights, I have also included a number of demo papers:
Beyond Graph Search: Exploring and Exploiting Rich Connected Data Sets: Discusses open questions and research directions for (knowledge) graph search (by a former Bing intern).
Conflict Resolution in Collaborative User Interface Mashups: Shows how to resolve conflicts in a collaborative iGoogle-like user interface using the operational transformation algorithm.
Tilt-and-Tap: Framework to Support Motion-Based Web Interaction Techniques: Nice demo of mobile device interaction patterns that might be interesting for photo gallery exploration.
SUMMA: A Common API for Linked Data Entity Summaries (best paper candidate): An API interface description for comparing entity summaries (i.e., ranked facts for an entity as presented in knowledge panels on Google, Bing, Yahoo!).
Curtains Up! Lights, Camera, Action! Documenting the Creation of Theater and Opera Productions with Linked Data and Web Technologies (disclosure: my paper): Web Components built with Polymer for the creation of hypervideos and the consumption of Linked Data Fragments.
- General theory of reactivity by Kris Kowal (Uber).
- How to reduce mobile network latency and battery drain on front-end by Dmytrii Shchadei (ex-Yandex). Some valuable hints for when to send beacons (or lazy-load content) and when not to in order to save our users' precious battery lives.
- A simple but non-trivial React Native app by Sven Anders Robbestad, if you ever wanted to get an impression of the basic mechanics of the framework.
- Concurrency in ES6+ by Ingvar Stepanyan.
- An introduction to game development with Phaser.io by Belen Albeza (Telefónica).
- WebGL slides on virtual reality by Martin Naumann, features Google Cardboard.
- Slides from my own talk on disaster monitoring with Wikipedia and Google Maps below.
World Wide Web Conference (WWW2015)—Trip Report
The week before last, I attended the 24th International World Wide Web Conference (WWW2015) in Florence, Italy. Google was a gold sponsor, and Google's Distinguished Scientist Andrei Broder delivered one of the main keynotes. The core proceedings and the companion proceedings are available online. This is my trip report with personal highlights and key take-aways.
Workshops, Day 1
I started the conference on Monday with the Workshop on Web APIs and RESTful Design (WS-REST), which I co-organized with Ruben Verborgh (Ghent University) and Carlos Pedrinaci (The Open University). We had three main themes in the workshop: testing, hypermedia and semantics, and REST in practice. The day started with a keynote delivered by Erik Wilde (ex-Siemens); one of his main points, which also emerged as a general workshop theme, was that the REST world, despite all its self-descriptiveness, still needs service descriptions and better testability. Erik shared his keynote slides on his personal website. The WS-REST proceedings can be found online. Personally, I liked Ronnie Mitra's (CA Technologies) slides and paper on his upcoming API design tool Rápido a lot.
One of the workshop attendees, Michael Petychakis, also wrote a workshop report.
On the same day, I also had an accepted paper in the Workshop Ad Targeting at Scale (TargetAd), co-organized by Googler D. Sculley. The title of my paper is AdAlyze Redux: Post-Click and Post-Conversion Text Feature Attribution for Sponsored Search Ads. In the paper, I describe a tool in use in my organization at Google to show large-scale advertisers what textual features work in their ads. The workshop triggered broad industry interest, with presenters and speakers coming from Twitter, Yahoo!, Etsy, Adobe, eBay, Facebook, and Google (D. Sculley). The TargetAd proceedings are available online.
Workshops, Day 2
I spent the first half of Tuesday morning in the Workshop Linked Data on the Web (LDOW), and the second half in the Workshop on Web and Data Science for News Publishing (NewsWWW). From LDOW, I want to highlight DBpedia Atlas, an alternative visualization of DBpedia (demo). NewsWWW had an interesting paper on gender bias in news images. In the afternoon, I attended Facebook's Antoine Bordes' and Google's Evgeniy Gabrilovich's tutorial on Constructing and Mining Web-scale Knowledge Graphs (slides from KDD 2014, but similar enough to the ones at WWW).
Main Conference, Day 1
The main conference began with a keynote by Jeanette Hofmann (Berlin University of the Arts), who raised a number of critical points that she named "dilemmas of digitalization". She especially mentioned the Right to be Forgotten and how (personal) data has become the currency we pay our free apps with.
From Wednesday morning, I want to highlight three papers: The Dynamics of Micro-Task Crowdsourcing: The Case of Amazon MTurk, on crowdsourcing with Amazon's Mechanical Turk; a Facebook study on The Lifecycles of Apps in a Social Ecosystem, which examines, among other things, app sustainability; and finally a Google paper on account recovery secret questions titled Secrets, Lies, and Account Recovery: Lessons From the Use of Personal Knowledge Questions at Google.
In the afternoon, I listened to Philipp Singer's presentation of their paper HypTrails: A Bayesian Approach for Comparing Hypotheses about Human Trails on the Web (best paper award), wherein they present "a general approach called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states". Also of interest to me was a Google paper titled Getting More for Less: Optimized Crowdsourcing with Dynamic Tasks and Goals, where the authors "optimize the crowdsourcing process by jointly maximizing the user longevity in the system and the true value that the system derives from user participation". The Yahoo! paper Evolution of Conversations in the Age of Email Overload looked at 16 billion emails between 2 million users and studied the reply times and reply lengths as indicators of how people deal with email overload. The task of benchmarking entity annotation systems reproducibly was addressed in the paper GERBIL - General Entity Annotator Benchmarking Framework.
I follow the privacy implications of Web tracking critically (probably due to my day job), so the paper Cookies That Give You Away: The Surveillance Implications of Web Tracking was of great interest to me. I generally liked the track Security and Privacy 3 – Browsers a lot. Related to my PhD research on breaking news events and their perception in online social networks, I enjoyed the paper Crowdsourcing the Annotation of Rumourous Conversations in Social Media very much.
Main Conference, Day 2
I started Thursday after the keynote with an interesting Yahoo! paper on explorative entity search titled From "Selena Gomez" to "Marlon Brando": Understanding Explorative Entity Search, which identified query patterns that lead to explorative searching. A somewhat emotional paper that certainly raises privacy warning flags was Diagnoses, Decisions, and Outcomes: Web Search as Decision Support for Cancer, which examined the search behavior of patients diagnosed with cancer. The paper Mining Missing Hyperlinks from Human Navigation Traces: A Case Study of Wikipedia looked at identifying missing hyperlinks in Wikipedia.
During the lunch break, I called an informal meeting of the W3C Media Fragments WG and interested friends to discuss extensions to Media Fragments URI that would allow for spatial fragment shapes other than rectangles and for dynamic, moving spatial fragments. The notes are on the mailing list.
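For context, the existing Media Fragments URI 1.0 recommendation only knows rectangular spatial fragments, expressed as `xywh=` in the URI fragment with an optional `pixel:` or `percent:` prefix. The following is a minimal sketch of parsing that existing syntax, not of the extensions discussed in the meeting. The names `SpatialFragment` and `parse_spatial_fragment` are my own for illustration, and the sketch skips range validation such as checking that percent values stay within 0 to 100.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import parse_qsl, urlsplit


class SpatialFragment(NamedTuple):
    """A rectangular region as described by an xywh media fragment."""
    unit: str  # "pixel" (the default) or "percent"
    x: int
    y: int
    w: int
    h: int


# xywh value: optional unit prefix, then four comma-separated integers.
_XYWH = re.compile(r"^(?:(pixel|percent):)?(\d+),(\d+),(\d+),(\d+)$")


def parse_spatial_fragment(uri: str) -> Optional[SpatialFragment]:
    """Return the last valid xywh region in the URI fragment, or None."""
    result = None
    fragment = urlsplit(uri).fragment
    # The fragment is a list of name=value pairs separated by "&".
    for name, value in parse_qsl(fragment, keep_blank_values=True):
        if name != "xywh":
            continue
        match = _XYWH.match(value)
        if match is None:
            continue  # ignore invalid occurrences
        unit = match.group(1) or "pixel"
        x, y, w, h = (int(group) for group in match.groups()[1:])
        result = SpatialFragment(unit, x, y, w, h)  # later occurrences win
    return result


if __name__ == "__main__":
    print(parse_spatial_fragment(
        "http://example.org/video.mp4#t=10,20&xywh=160,120,320,240"))
    print(parse_spatial_fragment(
        "http://example.org/image.jpg#xywh=percent:25,25,50,50"))
```

Since the syntax carries exactly four numbers, it has no room for polygons, circles, or regions that move over time, which is the limitation the proposed extensions are meant to address.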
In the afternoon, I attended the Industry Knowledge Graphs PechaKucha 20×20 and Panel, where Googler Chris Welty presented the Google Knowledge Graph, Yuqing Gao gave an overview of Microsoft's (Bing's) Satori, Paul Groth talked about Elsevier's scholarly publications graph, and Lora Aroyo presented Tagasauris' mediaGraph. This also touched on my 20% project together with Googlers Denny Vrandečić and Sebastian Schaffert around migrating Freebase to Wikidata via a crowdsourcing approach, called the primary sources tool.
From the posters and demos session in the evening, I want to highlight whoVIS: Visualizing Editor Interactions and Dynamics in Collaborative Writing Over Time, which deals with visualizing editor interactions in Wikipedia (demo).
Main Conference, Day 3
Friday began with Andrei Broder's excellent keynote How good was the crystal ball? A personal perspective and retrospective on favorite Web research topics where he first looked back at search engines and what worked and what did not work (subscribing to pages for obtaining change notifications). I especially liked the outlook he gave for semantic smart agents and how Google Now is just the beginning.
Again driven by my PhD topic, I followed the presentation of the paper Enquiring Minds: Early Detection of Rumors in Social Media from Enquiry Posts, whose authors showed how skepticism-driven follow-up queries on questionable news-spreading posts may reveal rumors early on. A fun paper was User Review Sites as a Resource for Large-Scale Sociolinguistic Studies, which, among other things, found that users older than 34 mostly use smileys with a nose ":-)", whereas those younger than 34 mostly use them without a nose ":)".
- People are starting to tire of PDF proceedings, especially since past WWW conferences explicitly required HTML submissions, as highlighted by Andrei Broder in his keynote. RASH: Research Articles in Simplified HTML is an attempt to bring the Web back to WWW.
- Larry Page and Sergey Brin were awarded the Test of Time award for their paper The Anatomy of a Large-Scale Hypertextual Web Search Engine.
- Splitting the poster and demos session into two sub-sessions is a great idea that certainly reduced my personal perceptual overload.
- Commonly frowned upon and joked about, the lack of any formal speech or program topic at all during the (not so gala) dinner felt somewhat inadequate.
- Paul Groth wrote a WWW trip report, too, as did Amy Guy with her WWW observations, eXascale with their blog post, and Daniel Garijo with his "first time at WWW" post.
Last week, I attended the 13th International Semantic Web Conference in Riva del Garda, Italy. Google was a gold sponsor, and Vice President Prabhakar Raghavan delivered one of the keynotes. This is my trip report with personal highlights and key take-aways.
I started the conference on Sunday with the Developers Workshop, where I had two papers. The workshop was the first of its kind and was put together by my good friend Ruben Verborgh. It drew more than 70 people into the room, and the workshop was prominently featured during the main conference's opening ceremony.
[…] Dandelion. Liepiņš et al. showed an ontology visualizer called OWLGrEd. Ebner et al. showed a system called LDcache that deals with caching flaky Linked Data sources. With XSPARQL, Dell'Aglio et al. presented a language and implementation combining XML, SPARQL, and SQL to query heterogeneous data sources. Matteis et al. showed how App Engine or Google Code, among others, can be used as "free" and queryable triple pattern data stores. Ceccarelli showed an entity linking framework called Dexter. My first contribution is titled Comprehensive Wikipedia Monitoring for Global and Realtime Natural Disaster Detection and focuses on natural disaster detection and monitoring with Wikipedia and online social networks. My second contribution is a paper called Self-Contained Semantic Hypervideos Using Web Components and introduces Web Components for the creation of hypervideos.
[…] Consuming Linked Data workshop. The most interesting paper for me was by Rula et al., which dealt with the recency of facts in DBpedia. In the afternoon, I switched to the NLP and DBpedia workshop, where the highlight was an amazing 300-slides-in-30-minutes keynote by Roberto Navigli on BabelNet, Babelfy, Games with a Purpose, and the Wikipedia Bitaxonomy. Also of interest was a paper by Weisenburger et al. on mining historical data for DBpedia via Wikipedia infoboxes.
Tuesday started with Prabhakar's well-received keynote, in which he provided an overview of search engine development in recent years. His book has a nice summary. I then went to the NLP & IEs track, where the best-paper-award-winning paper on the AGDISTIS framework on entity disambiguation by Usbeck et al. was presented. In the afternoon, I attended the Data Integration and Link Discovery track. I liked a paper by Erxleben et al. that described the integration of Wikidata in the Linked Data Web. From the demos in the evening, I especially want to highlight the best-demo-award-winning paper by my friends Verborgh et al. on Linked Data Fragments on a Raspberry Pi. In general, Linked Data Fragments were one of the themes at this conference, with several works citing them and also the release of the official DBpedia Linked Data Fragments interface.
[…] paper by Uchida et al., who presented a Chrome extension on browser personalization. I further liked a paper by Khamkham et al. on the CrowdTruth framework for harnessing disagreement in gathering annotated data. In the afternoon, my personal highlight was Verborgh et al.'s full paper on Linked Data Fragments.
I skipped Thursday morning and was back in the afternoon for the Linked Data track. Notable papers include Beek et al.'s LOD Laundromat, which provides a solution for streamlining access to Linked Data sources through cleansing and format conversion, and Patel-Schneider's analysis of Schema.org with some recommendations (in the author's view) on how to improve it. Meusel et al. gave an overview of the current state of the WebDataCommons project, which examines the distribution of Microdata, RDFa, and Microformats in the CommonCrawl corpus.
PhD thesis successfully defended
I have finally defended my PhD thesis. A raw, unedited recording of the defense is available on YouTube.
You can check out the slide deck I used at http://tomayac.com/phd, and the PDF of the thesis itself is available at http://tomayac.com/phd/thesis.pdf. The source code of the thesis is available in the GitHub repository https://github.com/tomayac/phd. I guess this makes me officially Dr. Thomas Steiner from now on.