Portfolio: Jobson

A webserver + UI that turns command-line applications into a job system.

Key Points

Developed to tackle real-world problems Gaia developers faced
Frontend developed with Semantic UI (styling), React + JavaScript (code), and Webpack (build).
Frontend entirely separate from server project.
Backend developed with Dropwizard (web framework), Java, and Maven (build). Uses process forking to run jobs.
Typical requests are 1-100 MiB, because some queries can be for millions of stars.
Typical result sets are 0.5-20 GiB.
Typical jobs use python scripts to run Hadoop + Spark applications
Running in production for around 1 yr (as of Jul 2018) with minimal breaking changes
Still under active development. Used by other teams.

Gaia scientists depend on up-to-date satellite data to conduct their research.
The applications that extract the data they need are complex, change frequently, and require many dependencies. Only specialist developers can realistically run them. Executing the requests is a time drain on those developers, and manual execution can be inconsistent.

Data requests can be large. For example, a typical request may include more than 5 million IDs to retrieve from (e.g.) a Hadoop cluster.
The resulting datasets can be large (1-100 GiB). The server must be able to handle this.
Because the underlying jobs are quite different (e.g. Java, Bash, Python), the system needs to be configurable.
Jobs will need to run on remote clusters (e.g. Hadoop/Spark).

Make the UI flexible enough to automatically generate inputs/outputs on-demand.
Make the server’s API configuration-driven: developers write a job spec in YAML, which the server uses to generate the API. This means no code needs to be written when a new job needs to be hosted.
Leverage the power of the operating system to do the actual execution. Jobson forks off a separate process for each job, which means jobs can be written in entirely different programming languages and work fine.
Use simple data structures for configuration and persistence. Each job is persisted as a set of simple files in jobs/, and each spec is persisted as a set of files in specs/.
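
The spec-driven, fork-per-job, files-on-disk approach described above can be sketched in a few lines. This is an illustration only, written in Python rather than Jobson's Java, and it is not Jobson's actual code or spec schema. The spec fields (id, inputs, command), the ${...} placeholder syntax, and the file names (request.json, stdout, stderr, exit_code) are all assumptions made for the example. The spec is shown as the dict a YAML parser would produce.

```python
import json
import subprocess
import sys
import tempfile
import uuid
from pathlib import Path

# Hypothetical job spec (illustrative field names, not Jobson's real schema).
ECHO_SPEC = {
    "id": "echo",
    "inputs": [{"id": "message", "type": "string"}],
    # Command to fork; "${message}" is replaced with the request's input value.
    "command": [sys.executable, "-c", "import sys; print(sys.argv[1])", "${message}"],
}


def build_argv(spec, inputs):
    """Check a request against the spec, then substitute inputs into the command."""
    expected = {entry["id"] for entry in spec["inputs"]}
    if set(inputs) != expected:
        raise ValueError(f"expected inputs {sorted(expected)}, got {sorted(inputs)}")
    argv = []
    for arg in spec["command"]:
        for key, value in inputs.items():
            arg = arg.replace("${" + key + "}", str(value))
        argv.append(arg)
    return argv


def run_job(spec, inputs, jobs_dir):
    """Run one job as a child process and persist it as plain files."""
    argv = build_argv(spec, inputs)
    job_dir = Path(jobs_dir) / uuid.uuid4().hex
    job_dir.mkdir(parents=True)
    request = {"spec": spec["id"], "inputs": inputs}
    (job_dir / "request.json").write_text(json.dumps(request))
    # The child can be written in any language; only argv and exit code matter.
    with open(job_dir / "stdout", "wb") as out, open(job_dir / "stderr", "wb") as err:
        exit_code = subprocess.run(argv, stdout=out, stderr=err).returncode
    (job_dir / "exit_code").write_text(str(exit_code))
    return job_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as jobs_dir:
        job_dir = run_job(ECHO_SPEC, {"message": "hello"}, jobs_dir)
        print(sorted(p.name for p in job_dir.iterdir()))
```

Hosting a new job in this sketch means adding another spec dict, with no change to build_argv or run_job, which mirrors the "no code needs to be written" goal. The sketch blocks until the child exits; a real job server would presumably run jobs asynchronously and stream large outputs rather than wait on them.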