Big Data in Astronomy
eBook - ePub

Big Data in Astronomy

Scientific Data Processing for Advanced Radio Telescopes

  1. 438 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

Big Data in Radio Astronomy: Scientific Data Processing for Advanced Radio Telescopes provides the latest research developments in big data methods and techniques for radio astronomy. Drawing examples from projects such as the Square Kilometer Array (SKA), the world's largest radio telescope, which generates over an exabyte of data every day, the book offers solutions for coping with the challenges and opportunities presented by the exponential growth of astronomical data. Presenting state-of-the-art results and research, this book is a timely reference for both practitioners and researchers working in radio astronomy, as well as students looking for a basic understanding of big data in astronomy.

  • Bridges the gap between radio astronomy and computer science
  • Includes coverage of the observation lifecycle as well as data collection, processing and analysis
  • Presents state-of-the-art research and techniques in big data related to radio astronomy
  • Utilizes real-world examples, such as Square Kilometer Array (SKA) and Five-hundred-meter Aperture Spherical radio Telescope (FAST)

Big Data in Astronomy by Linghe Kong, Tian Huang, Yongxin Zhu, and Shenghua Yu is available in PDF and/or ePUB format, catalogued under Physical Sciences & Astronomy and Astrophysics.

Information

Publisher: Elsevier
Year: 2020
ISBN: 9780128190852
Part C: Computing technologies

Chapter 9: Execution framework technology

Ying Mei(a); Rodrigo Tobar(b); Chen Wu(b); Hui Deng(a); Shoulin Wei(c); Feng Wang(a,c)

(a) Center for Astrophysics, Guangzhou University, Guangzhou Higher Education Mega Center, Guangzhou, China
(b) International Center for Radio Astronomy Research (ICRAR), The University of Western Australia, Crawley, Perth, WA, Australia
(c) Kunming University of Science and Technology, Chenggong District, Kunming, China

Abstract

Astronomy has become one of the biggest consumers of computing resources in the past 10 years, and new computational solutions (both hardware and software) dedicated to the various fields of science are emerging. In radio astronomy, for example, instruments are becoming extremely large, such as the Square Kilometer Array (SKA) and its precursor telescopes. Astronomical experiments are therefore expected to grow in all dimensions: larger data collections, more accurate data analysis and processing, and more detailed results. The significant increase in these large-scale experiments, performed by instruments built around the world, requires not only huge processing power but also clever system design. Execution framework technology is thus becoming a significant component of modern astronomical data processing.

Keywords

Execution framework; Data driven; Distributed computing; High-performance computing

Acknowledgments

The work in this chapter is supported by the National Key Research and Development Program of China (2018YFA0404603) and the Joint Research Fund in Astronomy (U1831204 and U1931141), the National Natural Science Foundation of China (11903009), the Yunnan Key Research and Development Program (2018IA054), and the major scientific research project of Guangdong regular institutions of higher learning (2017KZDXM062).
Thanks to the International Center for Radio Astronomy Research (ICRAR) and for the resources provided by the Pawsey Supercomputing Center, with funding from the Australian Government and the Government of Western Australia. This work is also supported by the Astronomical Big Data Joint Research Center, cofounded by the National Astronomical Observatories, Chinese Academy of Sciences, and Alibaba Cloud.

1 Introduction

Over the most recent decade, the volume of data from astronomical observations has grown exponentially. The Sloan Digital Sky Survey (SDSS) telescope, for instance, produces roughly 200 GB of data every night, adding to a database that had reached around 50 TB by 2012 [1]. Moreover, the Large Synoptic Survey Telescope, with its three-billion-pixel digital camera, produces 5-10 TB of data every night.
The Square Kilometer Array (SKA) [2, 3] will be the largest radio telescope in the world. The first phase of the project, SKA1 [4], will consist of hundreds of dishes and hundreds of thousands of antennas, enabling surveys of the sky in unprecedented detail and at unprecedented speed; a second phase will expand these capabilities by at least an order of magnitude.
Given its enormous scale, a single SKA1 science project will produce correlated data at a rate of 466 GB/s [5] for the low-frequency telescope (SKA1-Low) and 446 GB/s [6] for the mid-frequency part (SKA1-Mid). This interferometry data will be fed into the science data processor (SDP) [7], a many-task computing [8, 9] center responsible for reducing observational data and continuously producing science-level products. The SKA1 will have constrained power allocations [10] for processing observations in real time as they are performed, which poses considerable challenges for managing, processing, and storing such large datasets.
Operational experience in building the data system [11] for the SKA-Low precursor telescope [12] shows that the overheads associated with data migration, reading/writing, release, and format conversion increasingly dominate the overall pipeline execution cost.
In addition, the complexity of data processing in radio astronomy derives not only from the individual algorithmic components (e.g., FFT, gridding, deconvolution, and so on) but also from the diverse combinations of ways in which these components access their input, output, metadata, and intermediate data. This makes it very challenging to apply a "one-size-fits-all" strategy (e.g., data reorganization, I/O overlapping, intelligent caching, etc.) and reach a global optimum across multiple pipeline stages in a distributed computing environment. Because the current state-of-the-art astronomy data processing systems are designed to handle data approximately two to three orders of magnitude smaller than the SKA1 will produce [5, 6], a new data execution framework is much needed.
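To make the "data-driven" idea concrete, the following is a minimal sketch (an illustrative assumption, not the chapter's actual framework) of a toy scheduler that fires each processing step as soon as the data items it consumes become available, rather than following a fixed script order. The stage names (grid, fft, deconvolve) and the use of strings in place of real data products are hypothetical.

```python
from collections import namedtuple

# Each task names the data items it consumes and produces; the scheduler
# fires a task as soon as all of its inputs exist, so execution is driven
# by data availability rather than by a fixed script order.
Task = namedtuple("Task", ["name", "inputs", "outputs", "func"])

def run_data_driven(tasks, data):
    """Run every task whose inputs are available; return the firing order."""
    pending, order = list(tasks), []
    while pending:
        ready = [t for t in pending if all(i in data for i in t.inputs)]
        if not ready:
            missing = [t.name for t in pending]
            raise RuntimeError(f"pipeline stalled; unmet inputs for {missing}")
        for t in ready:
            result = t.func(*(data[i] for i in t.inputs))
            data.update(zip(t.outputs, [result]))  # single-output tasks here
            order.append(t.name)
            pending.remove(t)
    return order

# Hypothetical imaging stages; strings stand in for real data products.
tasks = [
    Task("deconvolve", ["dirty_image"], ["clean_image"], lambda x: x + ":clean"),
    Task("fft", ["uv_grid"], ["dirty_image"], lambda x: x + ":fft"),
    Task("grid", ["visibilities"], ["uv_grid"], lambda x: x + ":grid"),
]
data = {"visibilities": "vis"}
order = run_data_driven(tasks, data)
print(order)  # tasks fire in dependency order regardless of listing order
```

Note that the tasks are deliberately listed out of order: the scheduler still runs grid, then fft, then deconvolve, because each step is triggered by the appearance of its input data.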
At present, the most common way to reduce radio astronomy data is to define workflow components statically in scripts. These scripts (i.e., the source code) can then either be run sequentially on a local computer or wrapped into job scripts submitted to job scheduling systems, such as PBS or SLURM, on a high-performance computing platform.

These application-driven workflow models have several drawbacks. First, most astronomical projects involve many institutes across the world, which means astronomers often have to refine or even redesign their workflow code to make it work (compiling, deploying, running, monitoring, and so on) on a single machine or on computer clusters with heterogeneous hardware or software architectures. This happens whenever the workflow, hardware, or telescope configuration is upgraded, leading to considerable cost. Furthermore, there is no real-time monitoring and control of the execution process: in many situations, users cannot easily determine the pipeline execution status (e.g., success, failure, etc.) until the entire process has completed or a significant amount of computational and storage resources has been consumed. For SKA-scale data processing, with tens of millions of concurrent tasks, this is not only infeasible; delaying fault detection and the subsequent recovery actions (e.g., reexecution) is also extremely expensive. Last, users still need to restart the entire job for reexecution even when failures or exceptions are reported at an earlier stage, because a workflow driven by "processing" rather than "data" cannot adjust task execution dynamically...

Table of contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Contributors
  6. Preface
  7. Acknowledgments
  8. Part A: Fundamentals
  9. Part B: Big data processing
  10. Part C: Computing technologies
  11. Part D: Future developments
  12. Index