Preparing for the Era of Data-Driven Astronomy

Mario Juric


The data rates and volumes of modern astronomical experiments are growing at an unprecedented pace. Contemporary optical surveys (e.g., Pan-STARRS) already approach a terabyte of imaging per night, a figure that will rise by another order of magnitude by the time of the LSST. Rates in radio astronomy are already well beyond that, approaching petabytes per night for the largest experiments.

In the optical, the Large Synoptic Survey Telescope (LSST) will produce an average of 15 terabytes of data per night, yielding an uncompressed data set of over 100 petabytes by the end of its 10-year mission. Dedicated HPC facilities will process the image data in near real time, with the full data set reprocessed on an annual timescale.

Our ability to analyze this and other data sets rests on having the necessary know-how, computing capacity, and software tools. In this talk, I will review what LSST will deliver once operational and discuss the implications of LSST-sized data sets for astronomy in the 2020s. I will generalize from the expectations for LSST to analysis problems in survey astronomy as a whole, and discuss what they imply for the training of future data-driven astronomers and for the wider community.