1.1 Introduction
Today, high-performance computing (HPC) ecosystems have become central to bolstering research and innovation in diverse domains and to strengthening world economies in the competitive international arena. In the past decade, the rapid proliferation of processing technologies for HPC has facilitated the convergence of artificial intelligence, machine learning, data analytics and big data with HPC platforms to tackle complex, computationally intensive and data-intensive applications in various scientific and non-scientific fields. These technologies, combined with the workforce that provides the necessary computational competences, form an HPC ecosystem [1].
The complex infrastructure, comprising continually evolving and highly unpredictable heterogeneous computing systems (currently operating at petaflop capacity and planned for exaflop performance by year 2021), forms the most important and fundamental component of the HPC ecosystem [2]. The main challenge here is not only to acquire these high-end computing infrastructures, but also to retain the cutting edge by continuously updating the existing infrastructures with newer hardware and software to meet the growing need to solve complex problems in diverse disciplines. The applications, which represent simulations of complex system behavior or software enabling system operations, are another key component of the HPC ecosystem [3]. Scientists, researchers and users are interested in the scientific fidelity, insight analyses and visualizations of simulations that implement numerical models of complex phenomena in various scientific fields. Another important element of the HPC ecosystem is data. With information growth exceeding Moore’s law, traditional data processing applications and platforms are inadequate to handle the increasing amounts of generated data. Data storage, curation, sharing, analysis, visualization and privacy, along with scalability of computing performance, are some of the significant challenges of the big data era. Lastly, a workforce highly trained and experienced in HPC skills is a crucial part of the HPC ecosystem [4]. As we move toward an exascale future and beyond, the emerging superfacility frameworks that combine experimental and observational facilities with HPC centers, and the new convergent computing platforms, along with a paradigm shift in programming applications that leverage these platforms, increasingly open HPC ecosystems to a myriad of security risks [5].
This book chapter covers significant cybersecurity solutions for protecting the current and emergent HPC ecosystems comprising users, data, infrastructure and applications supporting scientific research.
1.2 The Vital Importance of Securing High-Performance Computing (HPC) Ecosystems
As high-performance computing (HPC) ecosystems have evolved to become more and more powerful, so has their potential to do harm. Couple the advancement in cyberinfrastructures with the increasing number of domains in which HPC systems handle sensitive data, and you have a recipe for disaster if one of these systems is compromised [6]. An attacker would not only be able to harness the computational power of the machines to perform malicious activities, but would also gain access to potentially confidential data. In today’s age, data mean power, so even non-confidential data could hold some value to an attacker. Researchers working on a compromised system could have their research stolen or tampered with, causing them to lose potentially years’ worth of work. It is therefore imperative that HPC systems, and the application code running on them, be built with security in mind. Security is an often overlooked component of building scientific code for a variety of reasons [7]. Many researchers are simply unaware of the potential risks of building an insecure system, or assume that the system they are using is secure enough and that they therefore do not need to worry about securing their applications. Other times, security is ignored for the sake of speed or convenience, since building security into application code introduces some amount of overhead and requires extra planning and code [8]. None of these are valid reasons in today’s world; threats are everywhere, and HPC systems are a major target of bad actors. There needs to be a continuing focus on training researchers to build security measures into their application code, rather than depending upon infrastructure security.
One thing HPC users have to be aware of when building their applications is how the application communicates within the cluster and with the outside world. Generally, users have access to unprivileged ports on the system, for example to interact with streaming data that may reside on an outside network. If an application does not ensure that these communications are secure and encrypted, it opens the door to attacks. Such attacks on HPC applications and computing systems could not only degrade system and application performance, but also damage the reputation of the resource and of the security providers or data centers, which could lead to financial and productivity losses in the long run [9]. The attacks can leak data out of an HPC system or from one user account to another, which could be devastating because these systems hold a great deal of sensitive scientific data and results. Moreover, distributed denial-of-service (DDoS) attacks [10] send out a large volume of packets, which, if successfully delivered, could make the HPC systems unavailable and impact the performance of the entire network. Such an attack could take down the system until the attack ends, disrupting all the jobs executing on the computing systems [11]. Improper access control or some other security failure may allow some users to gain unauthorized access to sensitive information or give them the ability to execute or alter someone else’s code, which could lead to loss of information or a full system shutdown. Access to sensitive data could also enable attackers to reach other systems using social engineering techniques, or lead to leakage of protected data [12]. Many mechanisms exist to avoid data leaks.
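To make the earlier point about encrypting application traffic on unprivileged ports concrete, the following is a minimal sketch in Python of a client that refuses to talk to an unverified endpoint. It is an illustration under stated assumptions, not a prescribed design: the function names, the optional CA bundle path and the choice of TLS 1.2 as the minimum version are all hypothetical, and a real deployment would follow the policy of the HPC site.

```python
import socket
import ssl


def make_client_tls_context(ca_file=None):
    """Build a TLS client context that verifies the server certificate.

    ca_file is a hypothetical path to a site-specific CA bundle; when it is
    None, the system's default trust store is used.
    """
    # create_default_context turns on certificate verification and
    # hostname checking for client-side connections.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Assumption: the site requires TLS 1.2 or newer.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def open_secure_stream(host, port, ca_file=None, timeout=10.0):
    """Connect to host:port and return a TLS-wrapped socket.

    The handshake fails with an ssl.SSLError if the certificate cannot be
    verified, so no application data is sent over an untrusted channel.
    """
    ctx = make_client_tls_context(ca_file)
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

An application that reads streaming data from an outside network could call `open_secure_stream` in place of a plain socket connection, so that the stream is both encrypted and authenticated before any data is exchanged.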
One mechanism to avoid leaks of sensitive data is data loss prevention, also called data leakage prevention (DLP), which helps check and control the flow of sensitive data and reports leakage when it is detected. In addition, more stringent access controls that use encryption and decryption for data transfer and storage can be deployed alongside other security mechanisms [13].
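The checking and reporting role of DLP described above can be illustrated with a small Python sketch. It is a deliberately simplified, pattern-based check and not how any particular DLP product works: the two patterns, the function names and the reporting callback are hypothetical, chosen only to show the idea of inspecting outbound data, blocking a transfer and reporting the match.

```python
import re

# Hypothetical examples of content a site might treat as sensitive.
SENSITIVE_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "private_key": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_outbound(text):
    """Return a list of (label, offset) pairs for every sensitive match."""
    findings = []
    for label, pattern in SENSITIVE_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.start()))
    return findings


def check_transfer(text, report=print):
    """Report each finding and return True only if the transfer is clean."""
    findings = scan_outbound(text)
    for label, offset in findings:
        report(f"DLP: transfer blocked, matched {label} at offset {offset}")
    return not findings
```

A transfer tool could call `check_transfer` on each outbound payload and proceed only when it returns `True`. Production DLP systems typically add far richer classification; this sketch covers only the check-and-report flow.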
One recent data breach affected Facebook, where the personal data of 533 million Facebook users were compromised due to a bug in Facebook systems [14,15]. Moreover, attackers have recently succeeded in attacking many supercomputing facilities, including ARCHER, TAURUS and Hawk, forcing the attacked facilities off-line [16]. One of the factors leading to the attack was compromised credentials, such as usernames and passwords for accessing these resources. Many attackers try to acquire sensitive information, such as the usernames and passwords of employees working at these facilities, through social engineering, as around two-thirds of people use the same password across multiple accounts. Another type of attack that has become more common during the COVID-19 pandemic is the ransomware attack, which is mainly carried out through phishing, in the form of an e-mail with a malicious attachment [17]. Once a user or staff member of the HPC facility clicks on the attachment, the ransomware can execute on the user’s system or network. Once the ransomware is in the network or in the system, it might attack the main database files (MDF), secondary database files (NDF), transaction log files (LDF) and the backup files (BAK and TRN). This would leave the data servers inoperable because the SQL Server service cannot open the master.mdf file.
Due to the increase in cryptocurrency prices, adversaries are attacking HPC systems and trying to compromise them in order to gain remote access and use the machines’ resources and processing power to perform cryptomining [18]. Once the attackers gain access, they install software that uses the system’s resources to mine cryptocurrency, a practice also known as cryptojacking, or they steal from crypto wallets. Many national laboratories have also been working on mechanisms to defend their HPC systems against misuse of computing cycles for cryptomining [19]. The Idaho National Laboratory has designed and implemented a machine translation-based cryptocurrency mining malware detector, which uses deep learning to accurately analyze and detect such malicious mining activities [20].
With the growing complexity of HPC ecosystems, there is a need to research, develop, analyze, adapt and integrate cutting-edge cybersecurity solutions that ensure the security, privacy and performance of applications and workflows executing in HPC ecosystems.
1.3 Security for Supercomputing Infrastructure
The HPC ecosystem is a complex network of interconnected systems. Supercomputing systems promising to deliver exascale computing performance form the central pillar of the HPC ecosystem. The HPC ecosystem comprises various supercomputers in different tiers of computing power, and each tier is designed and modified based on the complexity and type of applications that will be executed on these supercomputers. For many years, performance efficiency and effectiveness have been among the most studied aspects of supercomputers. However, with the recent increase in malicious actors, the robustness and security of supercomputers against unintended events and targeted attacks have become extremely important. Supercomputing infrastructures are considered critical infrastructures, as a compromise has a direct impact on research and an indirect impact on the economy. In May 2020, when most of the supercomputers in Europe were expected to execute HPC workloads that offered hope of finding a cure for the coronavirus disease (COVID-19), along with other important research, the computing systems were forced to shut down in order to investigate a cryptocurrency mining hack on them [16], underscoring the vital role of security in supercomputing environments. The following sections investigate some of the built-in security features provided by HPC vendors such as Cray and Intel, which help the computing systems defend themselves against security attacks.
1.3.1 Software Security
Most of the modern supercompu...