Developed to tackle real-world problems Gaia developers faced
Frontend developed with TypeScript (code) and Webpack (build).
Frontend entirely separate from the server project.
Backend developed with Java (code), Dropwizard (web framework), and Maven
(build). Uses process forking to run jobs.
Typical requests are 1-100 MiB, because some queries can be for
millions of stars.
Typical result sets are 0.5-20 GiB.
Typical jobs use Python scripts to run Hadoop + Spark applications.
Running in production for around 1 year (as of Jul 2018) with minimal
issues. Still under active development. Used by other teams.
Gaia scientists depend on up-to-date satellite data to conduct their
research.
The applications that extract the data they need are complex, change
frequently, and require many dependencies. Only specialist
developers can realistically run them. Executing requests is a time
drain on those developers, and manual execution can be error-prone.
Data requests can be large. For example, a typical request may
include >5 million IDs to retrieve from (e.g.) a Hadoop cluster.
The resulting datasets can be large (1-100 GiB); the server must be
able to handle responses of that size.
Because the underlying jobs are quite different (e.g. Java, Bash,
Python), the system needs to be configurable.
Jobs will need to run on remote clusters (e.g. Hadoop/Spark).
Make the UI flexible enough to automatically generate job-submission
forms from each job's spec.
Make the server’s API configuration-driven: developers write a job
spec in YAML, which the server uses to generate the API. This means
no code needs to be written to host a new job (see the sketch below).
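As a rough illustration, a spec might look like the following. The job name, input, and script are hypothetical, and the field names are recalled from Jobson's documentation, so treat the exact keys as illustrative rather than authoritative:

```yaml
# specs/extract-sources/spec.yml (hypothetical example)
name: Extract sources
description: Fetch rows for the supplied source IDs

expectedInputs:
- id: sourceIds
  type: string
  name: Source IDs
  description: Comma-separated IDs of the sources to extract

execution:
  application: python
  arguments:
  - extract.py
  - ${inputs.sourceIds}   # inputs are templated into the command line
```

From a spec like this, the server exposes the job in its API and the UI renders an input form for it, with no extra code written.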
Leverage the power of the operating system to do the actual
execution. Jobson forks a separate process for each job, which
means jobs can be written in entirely different programming
languages and still work (see the example script below).
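Because a job is just a forked process, the `application` in the spec above could be any executable. A minimal sketch of a job script, assuming the hypothetical spec from earlier:

```python
# extract.py - hypothetical job script. Jobson runs it as an ordinary
# OS process, so it could equally be Bash, Java, or anything executable.
import json
import sys


def main() -> None:
    # Arguments arrive exactly as listed in the spec's `arguments` section.
    source_ids = sys.argv[1].split(",") if len(sys.argv) > 1 else []

    # Files written to the working directory can be collected as job
    # outputs; stdout/stderr are captured by the server.
    with open("results.json", "w") as fh:
        json.dump({"requested": source_ids, "rows": []}, fh)

    print(f"extracted {len(source_ids)} sources")


if __name__ == "__main__":
    main()
```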
Use simple data structures for configuration and persistence. Each
job is persisted as a bunch of plain files in jobs/, and each spec as
a bunch of files in specs/ (a possible layout is sketched below).
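For illustration, the on-disk layout might look roughly like this. The directory and file names here are assumptions based on that description, not an exact listing; the point is that everything is plain files, inspectable with standard tools:

```
specs/
  extract-sources/
    spec.yml        # the YAML job spec
    extract.py      # any scripts the spec references
jobs/
  5f9gkd/           # one directory per job ID (hypothetical ID)
    request.json    # the client's original request
    spec.json       # snapshot of the spec the job ran against
    stdout          # captured process output
    stderr
```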