SPLASH 2017 as a Start Point (1)

1. Prologue

Volunteering @ SPLASH 2017, Vancouver, Canada

It was my honor to attend the academic event SPLASH 2017 (with a travel grant from PLMW, even without a publication). Thanks to all the organizers and supporters of SPLASH 2017 and the PL Mentoring Workshop (PLMW). I received an email about this from the JHU CS Department in April 2017, but if Celeste Hollenbeck had not reminded me of it at the Programming Language Implementation Summer School (PLISS) 2017, I would not have remembered it and applied in the summer. That is the story. I then applied to be a student volunteer. I wanted to work with students who share the same academic interests, and I was also curious about how the community and the conference work. Thanks to everyone for making my first academic conference a wonderful memory. The conference became another kind of tutorial on how to do research, different both from the detailed research skills taught at PLMW and from the hands-on help of my advisor Dr. Scott Smith and some nice people in my lab. It reminds me of novelist Mo Yan's Nobel lecture, Storytellers:

I must say that in the course of creating my literary domain, Northeast Gaomi Township, I was greatly inspired by the American novelist William Faulkner and the Columbian Gabriel García Márquez. I had not read either of them extensively, but was encouraged by the bold, unrestrained way they created new territory in writing, and learned from them that a writer must have a place that belongs to him alone. -Mo Yan

2. SPLASH 2017 PL Mentoring Workshop (PLMW)

PLMW is a one-day workshop for senior undergraduates and first- and second-year graduate students. It aims to demystify the graduate school experience and to offer first-hand perspectives on graduate study, research, and careers from recent PhD graduates and from junior and senior researchers.

The workshop had seven talks (not counting the welcome and wrap-up) and two panels (slides are here).

Research is a social process / What Programming Languages Researchers Do and How by Kathryn S McKinley is an overview of the PL research area. She introduces traditional topics in PL as well as some fresh topics coming into PL. I agree with her idea that "PL is fundamental to CS"; it's one reason I enjoy being a PL person. The interesting fact I will remember for a long time is that even papers that proved to be excellent work some years later were rejected in their first few years.

Why do a Ph.D. and how to pick an area? by Yannis Smaragdakis starts from his own career path. Yannis got a home computer at the age of 13; at that age I started programming for the Olympiad in Informatics. His attitudes toward the PhD aren't new to me, since I have thought about this for a long time. In other words, his talk was reassuring. Even his point about always being surrounded by people much more outstanding than you is fair, and I have already graduated from that (suffering). I should thank Yannis Smaragdakis for his sincerity, since at the end he mentions survival bias: speakers are survivors of the academic process.

The Influence of Programming Languages on Augmenting Human Understanding by Ben Zorn also starts from his own story: the shifts in his research interests from his first year of PhD to now. Ben reviews the evolution of PL, then presents topics combining PL with other areas, e.g. verification of AI software. He also mentions that rethinking the foundations of PL and SE is important to emerging technologies in other CS areas.

What to do and who to ask? by Y. Annie Liu tries to answer these two questions. Her principle for the latter question is quite simple: ask the best people you can find, and do your homework well. She shares a lot of her research experience from the perspective of asking. Asking is a way of being mentored, and in her talk it looks natural and straightforward.

My 25 years in OO (at ECOOP) by Jan Vitek covers his experience and advice from the perspective of ECOOP. Jan mentions "The 1st workshop you attend may shape your entire research career", so I am very lucky to have had such an enjoyable PLMW and SPLASH. Jan also talks about rejection, describing it as a failure of communication. I guess rejection is the trial most young researchers face. One focus of his talk is community: how he joined, served, and improved it, and how both sides benefit.

Choosing your mentor and learning to present your ideas clearly by Ana Milanova and Navigating the process of doing a rewarding Ph.D. by Mayur Naik focus on the details of PhD life. Mayur also shares anecdotes about paper rejections. That's enough of a vaccine.

3. Conference Experience (papers and talks)

3.1 Overview

I am going to write about my experience of the whole of SPLASH. Some talks gave me great inspiration, and some gave me a first impression of areas unfamiliar to me. Since several tracks run simultaneously (parallel and distributed …), it's a pity that I had to miss some excellent talks. Every volunteer is expected to pick 5 or more shifts (each half a day). My work was mainly recording videos, so if you gave a presentation during my shift, we talked!

My schedule was:

| Date | Section | Content |
|---|---|---|
| Oct 22 Sun | Escaped | (attending and room assistant all day) |
| Oct 23 Mon | GPCE | Keynote: The Landscape of Refactoring Research in the Last Decade |
| | SLE | Parsing |
| | GPCE | Staging |
| Oct 24 Tue | PLMW | (all day) |
| Oct 25 Wed | SPLASH | Keynote: Eve: tackling a giant with a change in perspective |
| | OOPSLA | Performance |
| | OOPSLA | Gradual Types and Memory |
| | SPLASH-I | Panel: 50 Years of Language Evolution: From Simula’67 to the Future |
| Oct 26 Thu | SPLASH | Keynote: Objects in the Age of Data |
| | OOPSLA | Types and Language Design |
| | OOPSLA | Verification |
| | OOPSLA | Verification in Practice |
| Oct 27 Fri | Onward! | Keynote: How the languages we speak shape the ways we think |
| | OOPSLA | Static Analysis |
| | Onward! | New Languages |

3.2 Escaped on the first day

Unexpectedly, my first day was taken up by Escaped 2017, a "discussion-style" non-academic workshop hosted by Dennis Mancl and Steven D. Fraser. Dennis and Steven are software experts with many years of industry experience. The workshop was a brainstorming-style discussion on turning lab innovations into products. Here is the report for the workshop.

I was the room assistant for this workshop, asking for a flip chart twice and a power strip once. Otherwise I simply took part in the workshop, sharing some programming language stories, e.g. Kotlin vs. Java and the evolution of NodeJS.

3.3 GPCE Keynote

Danny Dig gave the GPCE keynote, The Landscape of Refactoring Research in the Last Decade. Danny's talk reviews how refactoring research has flourished over the last decade, using statistics on papers to show the trend. His students manually checked thousands of papers; I asked him and confirmed this during a coffee break. His research has had an impact on IDE users: he mentioned an asynchronous refactoring tool shipped with Visual Studio.

3.4 SLE - Parsing

Four talks are given in this section:

  1. Type-Safe Modular Parsing
  2. Incremental Packrat Parsing
  3. A Symbol-Based Extension of Parsing Expression Grammars and Context-Sensitive Packrat Parsing
  4. Red Shift: Procedural Shift-Reduce Parsing

Paper 1 concerns the types and semantics of modular parsing in OO. Paper 2 presents a "modification to the memoization mechanism … to support incremental parsing". Paper 3 shows an extension of PEGs that can "recognize practical context-sensitive grammars". Paper 4 is on "a new design pattern for implementing parsers" which "behaves like shift-reduce parsers but eliminate ambiguity".
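
For readers who, like me, are not parsing people: packrat parsing, the mechanism Paper 2 modifies, memoizes the result of each (rule, position) pair, so backtracking through a PEG's ordered choice never re-scans the same rule at the same position. The Python sketch below assumes a made-up two-rule grammar and a hypothetical `parse_sum` function; it shows only plain packrat memoization, not the incremental algorithm from Paper 2.

```python
def parse_sum(text: str):
    """Packrat-style recognizer for a hypothetical PEG:

        Sum <- Num '+' Sum / Num
        Num <- [0-9]+

    Returns (accepted, num_body_runs), where num_body_runs counts how often
    the Num rule actually scanned input (memo hits are not counted).
    """
    memo = {}  # (rule name, start position) -> end position, or None on failure
    runs = {"num": 0}

    def num(pos):
        key = ("num", pos)
        if key in memo:
            return memo[key]  # memo hit: no re-scan
        runs["num"] += 1
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        result = end if end > pos else None
        memo[key] = result
        return result

    def sum_(pos):
        key = ("sum", pos)
        if key in memo:
            return memo[key]
        result = None
        # Alternative 1: Num '+' Sum
        after_num = num(pos)
        if after_num is not None and after_num < len(text) and text[after_num] == "+":
            result = sum_(after_num + 1)
        # Alternative 2 (ordered choice, tried only if alternative 1 failed): Num
        if result is None:
            result = num(pos)  # always served from the memo table here
        memo[key] = result
        return result

    end = sum_(0)
    return end == len(text), runs["num"]


print(parse_sum("1+22+333"))  # (True, 3)
```

On `"1+22+333"` the first alternative fails at `333` (no `+` follows), and the retry of `Num` there is a memo hit, so `Num` scans input only three times. An incremental packrat parser additionally has to decide which memo entries survive an edit to the text.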

Personally I am not a fan of parsing, but this section showed me that the area covers both theoretical and practical interests. Last semester I did an independent study on package management advised by Dr. Scott, which shares some concerns about modularity, but from a different perspective.

3.5 GPCE - Staging

  1. Refining Semantics for Multi-stage Programming
  2. Staging for Generic Programming in Space and Time
  3. Staging with Control: Type-Safe Multi-stage Programming with Control Operators
  4. Code Staging in GNU Guix

Paper 1 talks about abstract machine semantics for multi-stage languages. Paper 2 is on applying "generic programming in metaprogramming" and "extend the scope of generic programming into the dimension of time". Paper 3 presents "a staged programming language with the expressive control operators" while "keeping the static guarantee of well typedness and well scopedness". Paper 4 focuses on the staging mechanism used in GNU Guix.

The term multi-stage was new to me, but I had thought of similar ideas. Conceptually, multi-stage programming adds more layers between the written code and the running code. I have complained about, and thought about, macros, which enhance expressiveness but also introduce more complexity. There is a gap between static and dynamic. Multi-stage programming may narrow the gap, since more layers reduce the granularity of each gap, or widen it, since there are more layers to worry about. I believe better IDE support can help exploit multi-stage programming in practice. I read a bit about GNU Guix during my independent study, and now it seems more interesting.
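
The "more layers between the written code and the running code" idea can be sketched with the textbook staging example, a power function: the first stage knows the exponent and generates specialized, loop-free code; the second stage runs that code on the base. Real multi-stage languages such as MetaOCaml use typed quotations rather than strings. This Python sketch (the name `stage_power` is hypothetical) builds source text and `exec`s it, so it shows the layering but none of the static guarantees, such as well-typedness and well-scopedness, that Paper 3 is about.

```python
def stage_power(n: int):
    """Stage 1: the exponent n is known, so emit source code specialized to it."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative exponents")
    expr = " * ".join(["x"] * n) if n > 0 else "1"
    source = f"def power_{n}(x):\n    return {expr}\n"
    namespace = {}
    exec(source, namespace)  # stage 2 begins: the generated text becomes a function
    return namespace[f"power_{n}"], source


power3, src = stage_power(3)
print(src)        # the generated, loop-free code
print(power3(2))  # 8
```

The gap between static and dynamic is visible even here: a mistake in the generated string only surfaces when stage two runs, which is exactly what typed staging aims to rule out.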

3.6 GPCE - Variability

  1. A Classification of Variation Control Systems
  2. Analyzing the Impact of Natural Language Processing over Feature Location in Models
  3. How Preprocessor Annotations (Do Not) Affect Maintainability: A Case Study on Change-Proneness

Paper 1 talks about "a classification and comparison of selected variation control systems" and discusses "their comparably low popularity". Paper 2 analyzes "the most common NLP techniques over Feature Location (FL)" in the software engineering field. Paper 3 studies the relation between preprocessor annotations (e.g. C/C++ macros) and software maintainability.

These three papers are more concerned with practical problems in software engineering. Version/variation control deals with the complexity of software from the perspective of development time. I am fond of the research style of paper 3, going from a language feature to software practice. Preprocessor annotation is yet another way to introduce indirection; remember "All problems in computer science can be solved by another level of indirection" (David Wheeler). I have worked on programs that tried to solve cross-platform problems via macros. In some scenarios I prefer not to solve problems at this phase, but in practice you may have to, since you may not control the build, module, or configuration system.

3.7 GPCE - Types

  1. Type Qualifiers as Composable Language Extensions
  2. Accurate Reification of Complete Supertype Information for Dynamic Analysis on the JVM
  3. Rewriting for Sound and Complete Union, Intersection and Negation Types

Paper 1 presents "type qualifiers as language extensions that can be automatically and reliably composed". Paper 2 is about adding "reflective supertype information" to "dynamic analysis on the JVM". Paper 3 presents "declarative rewrite rules" to automatically generate type systems "involving unions, intersections and negations".

I happened to know the term type qualifiers before SPLASH (I already knew the things type qualifiers refer to). A friend of mine was taking a course whose homework was specifying a subtyping relation over some type qualifiers. This aroused my interest, and I skimmed Jeffrey S. Foster's paper. Rewriting gives me a feeling of multi-stage programming, but I am not sure.

3.8 SPLASH keynotes

  1. Eve: tackling a giant with a change in perspective, Chris Granger, Oct 25 Wed.
  2. Objects in the Age of Data, Crista Lopes, Oct 26 Thur.

I have some stories from my pre-PL era to tell, so I decided to put the two keynotes together. Chris's talk shares thoughts and perspectives on, and beyond, his new language and IDE, Eve. Crista "revisits the history of object-oriented programming" in her talk.

My pre-PL era ended when I realized that Programming Languages is a research area and prepared to apply to graduate school in the United States. I happened to read some of their work at that time. I was on a trip to Tibet when Apple released Swift and Playgrounds in Xcode. I had learned Objective-C several months before and suffered, so I was very fond of Playgrounds. Chris Lattner mentions on his homepage that "Playgrounds were heavily influenced by Bret Victor's ideas, by Light Table and by many other interactive systems." I had watched Bret Victor's famous talk Inventing on Principle. While searching for these materials I found Light Table and wrote a blog post about it (in Chinese). As for Crista, I had read her book Exercises in Programming Style as well as its GitHub repo, which I came across through a post about it on InfoQ.

I really appreciate the historical perspective. By studying history we may better understand the present and gain a vision for the future. I have heard similar stories from friends in industry, who say they have limited tickets to run their code on (cloud) computing platforms. History seems to repeat itself here, echoing "at that time the department had only two computers, so each of us got only one hour in the lab course". Another point is that the past may hold good designs that are useful in some current scenarios.

3.9 OOPSLA - Performance

  1. A Volatile-by-Default JVM for Server Applications
  2. Static Placement of Computation on Heterogeneous Devices
  3. Skip Blocks: Reusing Execution History to Accelerate Web Scripts
  4. Virtual Machine Warmup Blows Hot and Cold

Paper 1 presents a new memory model for the JVM in which every variable is volatile by default. Paper 2 gives a program annotation tool to speed up heterogeneous execution. Paper 3 is on a language for web scripting with better performance. Paper 4 studies program performance across the initial warmup phase and the steady phase in VMs with JIT compilers.

These four papers are not difficult in concept. Currently in Java we need to use the keyword volatile explicitly. Bugs involving volatile are sometimes hard to find, since they touch one of the "only two hard problems in Computer Science: cache invalidation and naming things" (Phil Karlton). I remember that in my sophomore year, the instructor of Fudan's version of CMU's ICS course (15-213) told us his group had spent two months finding a kernel bug caused by an extra volatile. Why do I mention this? Because that teacher later left my undergraduate university for its rival, where he became the dean and one of the leaders of the institute. It's the lab where the first author of Paper 1 worked as an undergraduate. The method in Paper 2 is also very straightforward and efficient. I am not familiar with web scripting in academia, but I have read some posts on doing it in practice; the offense and defense between web scripts and web servers is very tricky. I heard a presentation on an early version of Paper 4, given by Laurence Tratt at the Programming Language Implementation Summer School (PLISS) 2017. I have a constant worry about benchmarking: we don't have a good runtime model, analysis, or tool, so we have to time things. It sometimes reminds me of the time before algorithm complexity was introduced, when people compared algorithms mainly by benchmarking them.

3.10 OOPSLA - Gradual Types and Memory

  1. Sound Gradual Typing: Only Mostly Dead
  2. Sound Gradual Typing Is Nominally Alive and Well
  3. The VM Already Knew That: Leveraging Compile-Time Knowledge to Optimize Gradual Typing
  4. Model Checking Copy Phases of Concurrent Copying Garbage Collection with Various Memory Models

Paper 1 shows a JIT compiler that can greatly reduce the overhead of sound gradual typing. Paper 2 presents a new language with a type system that tackles gradual typing. Paper 3 presents a design that turns type checks into existing shape checks, with only a little overhead. Paper 4 performs bounded model checking on the copy phases of various concurrent copying GC algorithms.

The titles of the papers in this section are hilarious. Gradual typing is widely used, but soundness is often given up; the industrial consideration is efficiency. Papers 1, 2 and 3 all care about soundness while trying not to slow down the whole system too much. Papers 1 and 3 are on existing languages (Racket/Typed Racket, TypeScript/SafeTypeScript), while paper 2 is on a new language. My point is that solving problems in an existing language is complicated by compatibility, whereas solutions in a new language can be more elegant, since people more or less know the essential part by then.
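
To make the soundness-versus-efficiency trade-off concrete: a sound gradual type system has to check values that flow from untyped code into typed code, and those checks are where the overhead comes from. The Python decorator below is a hypothetical sketch of such a boundary check (`checked_boundary` is a made-up name). It only handles plain class annotations like `int` or `str`, and it is not how Typed Racket, its JIT, or SafeTypeScript implement their checks; real systems must also wrap higher-order values such as functions and mutable objects instead of doing a one-time `isinstance`.

```python
import inspect
from functools import wraps
from typing import get_type_hints


def checked_boundary(fn):
    """Hypothetical sketch: guard a 'typed' function against untyped callers.

    Only plain class annotations (int, str, ...) are checked; anything else is
    skipped. Real sound gradual typing must also wrap higher-order values.
    """
    hints = get_type_hints(fn)
    sig = inspect.signature(fn)

    def check(label, value, expected):
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(
                f"{fn.__name__}: {label} expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )

    @wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            check(f"argument {name!r}", value, hints.get(name))  # paid on every call
        result = fn(*args, **kwargs)
        check("return value", result, hints.get("return"))
        return result

    return wrapper


@checked_boundary
def add(a: int, b: int) -> int:
    return a + b


print(add(1, 2))  # 3
try:
    add(1, "2")  # an untyped caller passes a str
except TypeError as err:
    print(err)
```

The checks run on every call across the boundary, which is the kind of repeated cost that a JIT (Paper 1) or a reuse of the VM's existing shape checks (Paper 3) tries to eliminate.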

3.11 SPLASH-I Panel: Language Evolution & Onward Keynote

  1. Panel: 50 Years of Language Evolution: From Simula’67 to the Future
  2. Onward 2017 Keynote: How the languages we speak shape the ways we think

The panel invited some big names to talk about language evolution over the past 50 years. The keynote, given by Lera Boroditsky, covered ideas and stories from linguistics and cognitive science.

I do agree that the ways we think are shaped by the languages we speak. To be specific, I mean that thinking is influenced by language, but not determined by it. Some questions from the audience suggested that programming languages could learn something from natural languages. I agree with this point to a limited degree. To put it in a metaphor: in the early days humans learned aircraft design from birds, but in modern times improvements mainly come from engines, materials, etc., and it's hard to make progress in these aspects by observing birds. Still, I have a scenario where natural languages help. Suppose you want to read a text file of "0"s and "1"s and convert it to bytes in hexadecimal. It's trickier to do this in some languages than in others, even if you can read the documentation or search Stack Overflow. Why trickier? Because on the facade of a programming language, including its keywords, statements or expressions, APIs and documentation, some programming languages are better. Natural languages can help us find out why, and how to do better.
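
For the record, here is how that scenario looks in one language. This is a Python sketch assuming the file contains only "0"/"1" characters plus whitespace, that the number of bits is a multiple of 8, and that each byte is written most-significant bit first; `bits_to_hex` and `file_bits_to_hex` are hypothetical names. The whole task hinges on knowing two entry points, `int(s, 2)` and `bytes.hex`, which is exactly the "facade" question above.

```python
from pathlib import Path


def bits_to_hex(bits: str) -> str:
    """Convert a string of '0'/'1' characters (whitespace ignored) to hex bytes."""
    cleaned = "".join(bits.split())  # drop spaces and newlines
    if set(cleaned) - {"0", "1"}:
        raise ValueError("input may only contain '0', '1' and whitespace")
    if len(cleaned) % 8 != 0:
        raise ValueError("bit count must be a multiple of 8")
    # Each 8-character chunk is parsed most-significant bit first.
    data = bytes(int(cleaned[i:i + 8], 2) for i in range(0, len(cleaned), 8))
    return data.hex(" ")  # the separator argument needs Python 3.8+


def file_bits_to_hex(path: str) -> str:
    """Read the text file at `path` and convert its contents."""
    return bits_to_hex(Path(path).read_text())


print(bits_to_hex("01001000\n01101001\n"))  # 48 69
```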

3.12 OOPSLA - Types and Language Design

  1. Familia: Unifying Interfaces, Type Classes, and Family Polymorphism
  2. Static Stages for Heterogeneous Programming
  3. Orca: GC and Type System Co-Design for Actor Languages
  4. Monadic Composition for Deterministic, Parallel Batch Processing

Paper 1 shows a single language integrating parametric polymorphism (generic types) and class inheritance. Paper 2 presents "a real-time graphics programming language for CPU-GPU systems". Paper 3 introduces "a concurrent and parallel garbage collector for actor programs" (Pony). Paper 4 presents a system that turns the deterministic execution of batch-processing programs into pure functions on files.

Paper 1 defines a new language focused on refining type-related features. It makes me think of another unifying paper, Unifying Typing and Subtyping. The conclusion of paper 1 mentions "degree of expressive" and "surface complexity". I think this is a good perspective on language design, rather than focusing only on pure math-like properties. Paper 1's authors thank Sophia Drossopoulou in the acknowledgements. She is one of the authors of paper 3, and she gave a talk at PLISS 2017 on the Pony language, which uses Orca (paper 3) in its runtime. Paper 2 addresses pain points of graphics languages such as GLSL, providing a real-time experience through multiple stages. Paper 4 is evaluated on bioinformatics data pipelines and a software build system. Graphics languages and data/build pipelines are, in some sense, not "well-defined" code. From this perspective, papers 2 and 4 tackle the same problem.

3.13 OOPSLA - Verification

  1. Seam: Provably Safe Local Edits on Graphs
  2. TiML: A Functional Language for Practical Complexity Analysis with Invariants
  3. FairSquare: Probabilistic Verification of Program Fairness
  4. Reasoning on Divergent Computations with Coaxioms

Paper 1 proposes a language for expressing "local edits to graph-like data structures", together with a verification method. Paper 2 presents a language "with time-complexity annotations in types", "supporting highly automated time-bound verification". Paper 3 shows a tool for verifying the fairness properties of programs. Paper 4 proposes an approach based on coaxioms to capture divergence by induction and coinduction.

The topics of the papers in this section are unfamiliar to me…

3.14 OOPSLA - Verification in Practice

  1. A Model for Reasoning about JavaScript Promises
  2. Robust and Compositional Verification of Object Capability Patterns
  3. A Verified Messaging System
  4. Who Guards the Guards? Formal Validation of the ARM v8-M Architecture Specification

Paper 1 presents a core calculus capturing the essence of ECMAScript 6 promises. Paper 2 develops a program logic for compositionally specifying and verifying Object Capability Patterns (OCPs). Paper 3 is actually about a multicomponent vehicle-control system. Paper 4 demonstrates a formal validation of ARM's v8-M specification.

Papers 1 and 4 face the same challenge: the specification is not formal enough. The author of paper 1 says some programmers misunderstand the documentation, resulting in erroneous code. In the Q&A session, a man asked whether the author had found any flaws in the specification and, if so, to tell him, since he is one of the specification's authors. Paper 4 is verification of the specification itself. It unavoidably deals with the natural-language parts of the specification, which makes it more practical.

3.15 OOPSLA - Static Analysis

  1. IDEal: Efficient and Precise Alias-Aware Dataflow Analysis
  2. P/Taint: Unified Points-to and Taint Analysis
  3. Data-Driven Context-Sensitivity for Points-to Analysis
  4. Automatically Generating Features for Learning Program Analysis Heuristics for C-Like Languages

Paper 1 presents "an alias-aware extension to the framework for Interprocedural Distributive Environment (IDE) problems". Paper 2 proposes "a deep unification of information-flow and points-to analysis", in which the authors observe that information flow "is indeed identical to points-to analysis in some sense". Paper 3 presents "a greedy algorithm" for learning heuristic rules that guide the analysis. Paper 4 presents a technique for generating features from training programs.

Paper 4 describes itself as "data-driven program analysis", which happens to share its abbreviation with the project in my lab, Demand-Driven Program Analysis (DDPA).

3.16 Onward! - New Languages

  1. Infra: Structure All the Way Down - Structured Data as a Visual Programming Language
  2. Selfie and the Basics
  3. Systems Level Liveness with AnonSystem

This was the last presentation section I attended at SPLASH, and the talks were super interesting. Infra is a language with a structured visual editor. In the presentation, the author showed its flexibility, for example embedding CSS visually in a document, and how the corresponding data structure changes as the document changes. Selfie is a tiny self-compiling C compiler, a MIPS emulator, and a self-hosting MIPS hypervisor, all contained in one 7k-line C file. It's a minimal craft (which I appreciate), yet it contains enough fundamental concepts (and code) of programming languages and operating systems. The author also uses selfie in his introductory computer science course. The last presentation was on Impromptu, a Lisp-dialect language for music and multimedia with a live programming environment. Section host Adrian Sampson said it was the first time he had seen a presenter bring a speaker (a BOSE SoundLink Mini II). I have seen some programming languages for artists, e.g. I used processing.js in a small project. Live coding makes Impromptu even fancier.

4. Paper Critiques (maybe)

Most papers and talks are more or less inspirational. As a newcomer, I find papers from various subareas appealing in different ways. Hybrids of PL and other areas of computer science are also interesting, e.g. the paper using Natural Language Processing (NLP) to verify the ARM specification.

However, I have my personal taste. I don't appreciate approximate approaches to problems that could have analytically exact solutions. Using fancy techniques may produce good results immediately, but it doesn't help us get closer to the putting green. To put it in a metaphor, I don't like doing calculus with numerical methods (no offense intended to numerical folks).

5. Summary and Next

I decided to end this post here, leaving the not-directly-academic part for later (but soon). Honestly, I am more eager to write about the chats, anecdotes, and people I met during coffee breaks, lunches, the poster and ice-cream receptions, and the banquet; that could be more fun and easier. I am definitely clearer and more conscious of my principles: I want my research to start in PL theory but end in impact on industry and engineering. I don't want to dig up diamonds without either selling them or using them to fortify my drill. This thought doesn't come from a single sentence, but from the conference as a whole. The conference demystified parts of the academic world just in time, playing an important role in my career and in my ongoing application to PhD programs.

I got lots of homework from this conference. I may also write about it once I finish.