1
Introduction to Storage Networking
1.1 OVERVIEW
Storage networks provide shared access to storage by multiple computers and servers, thus increasing the efficiency of storage and the availability of stored data. Storage networks enable storage devices from different vendors, which may use different access protocols, to be logically “pooled” for access and retrieval purposes. They permit information management functions such as backup and recovery, data mirroring, disaster recovery, and data migration to be performed quickly and efficiently, with a minimum of system overhead.
With the rapid increase in data storage requirements in the last decade, efficient management of stored data has become a necessity for the enterprise. A recent industry study estimated the total size of the disk storage market to be almost 500,000 terabytes worldwide in 2002; this figure is expected to climb to 1.4 million terabytes by 2005. Many corporations now manage hundreds of terabytes of data in their information management divisions. However, the traditional “islands of storage” management approach is vastly inefficient; as much as 50% of storage capacity may be wasted or underutilized. The high cost of downtime creates a need for the increased reliability provided by distributed storage systems. Thus, the use of storage networks to manage access to data not only provides an increase in performance and survivability, but also generates real and immediate cost savings. The worldwide market for networked storage is anticipated to grow from US $2 billion in 1999 to over US $25 billion by 2004. As business-to-business and business-to-consumer e-commerce matures, even greater demands for management of stored data will arise.
Increasingly, storage networks are being distributed over wide geographical areas to ensure data survivability and provide data synchronization over large distances. This book describes the evolution of data processing from a computer-centric model to a storage-centric model, and introduces the concept of a distributed storage-centric processing model. It describes common storage network functional components, such as fabric switches, storage directors, file managers, and gateways, and their roles in a distributed storage environment. It discusses distributed storage network applications, including storage integration, remote database synchronization, and backup/recovery functions. It provides a comparative view of Storage Area Network (SAN) and Network Attached Storage (NAS) functions and capabilities, and points out the advantages of each.
One of the primary obstacles to implementing a storage network cited by enterprise IT managers is a lack of knowledge about storage networking technology and the specific issues involved in extending a SAN or NAS over the MAN or WAN. This book addresses the “terminology gap” between enterprise network planners and telecommunications engineers, who must understand the transport requirements of storage networks in order to implement distributed storage networks. The primary goal of this book is to provide IT managers, planners, and telecommunications professionals with the information they need in order to choose the technologies best suited for their particular environment.
1.1.1 Who Should Read This Book?
This book is aimed at the IT manager, the enterprise network planner, and the network design engineer, who are responsible for the planning and design of storage networks in an enterprise environment. It is also intended to enable telecommunications engineers to understand the transport requirements of storage networks. This book assumes a basic knowledge of storage networks and applications; the reader is assumed to have read and understood, for example, Barker and Massiglia, Storage Area Network Essentials. It is not intended to be a detailed implementation guide that specifies particular equipment settings or test procedures; rather, it is intended to enable high-level managers and planners to make intelligent decisions about what sort of network is best suited to their needs.
1.1.2 Overview of Contents
This book focuses on three primary areas: (1) architectures for distributed storage networks; (2) storage protocols and their inherent distance limitations; and (3) management techniques for distributed storage networks. Each is summarized below.
The architectures section provides an historical overview of the evolution of storage network architectures. It describes the evolution of storage networks from simple point-to-point topologies to switched fabrics providing complete node-to-node connectivity. It discusses redundant, multi-tier, and backbone fabric architectures, and outlines the advantages of each. Example configurations are given for each architectural variant.
The protocols section details the protocols used for distributed storage applications. Common storage protocols, including the Small Computer Systems Interface (SCSI), Enterprise Systems Connection (ESCON™), FICON™, Gigabit Ethernet, and Fibre Channel, are defined and discussed. The evolution from parallel bus-based protocols to serial fiber-optic-based protocols is presented. Distance limitations inherent in storage protocols are described, and techniques for extending storage network functions over the metropolitan area network (MAN) and wide area network (WAN) are discussed, including use of Asynchronous Transfer Mode (ATM) and wavelength division multiplexing (WDM). Emerging technologies for distributed storage networking, including InfiniBand™ and IP-based SAN solutions, are presented and described.
Storage management requirements, including security management, are analyzed in the management section. The Storage Networking Industry Association’s Common Information Model (CIM) is used as the basis for describing a management architecture. Finally, the importance of planning and integration in formulating end-to-end storage solutions for the enterprise is emphasized.
1.2 EVOLUTION OF STORAGE NETWORKING
1.2.1 Mainframe Storage Networks
The mainframe computing environment developed in the 1960s provided the first conceptual model for storage architecture and management. In the mainframe-based architecture, a host processor uses a channel subsystem to communicate with external storage devices. The channel subsystem in turn addresses a control unit for each group of storage devices; a large mainframe computing environment might have several control units managing hundreds of tape and disk storage devices. (In the mainframe world, disk devices are referred to as Direct Access Storage Devices, or DASD.) A parallel bus/tag interface was initially used to provide connectivity between channels, control units, and storage devices; this copper-based bus limited both the bandwidth and distance of the I/O devices. The introduction of optical fiber and high-speed serial bus protocols such as Enterprise Systems Connection (ESCON) in the early 1990s reduced these limitations, and made it possible to extend storage device connectivity over geographically dispersed areas, sometimes referred to as channel extension.
1.2.2 Storage for Small Computer Systems
The introduction first of minicomputers and then of personal computers in the 1970s and 1980s brought about large changes in computer system architecture; the development of the open systems UNIX and Windows operating systems required new approaches to I/O operations and storage access. As computers became smaller and storage devices increased their capacity and bit density, disk storage was increasingly integrated into the computer architecture, using the Integrated Drive Electronics/AT Attachment (IDE/ATA) bus, as is done with the ordinary home personal computer. However, applications requiring large amounts of storage, such as application servers, needed various types of external storage devices. The Small Computer Systems Interface (SCSI) bus and protocol were developed in the 1980s to meet this requirement. Like the mainframe bus/tag interface, however, the parallel copper-based SCSI bus architecture limited the distance at which external storage devices could be located. The Fibre Channel data transport protocol, developed in the early 1990s, solved the distance problem by extending the reach of storage connectivity to as much as 10 kilometers, and also provided the basis for solving another problem: the increasing complexity of managing large amounts of stored data.
Figure 1.1 Mainframe Storage Network Architecture (1960s).
1.2.3 Managing “Islands of Storage”
In a traditional enterprise computing architecture, each computer is directly connected to its own storage devices, which it also manages. This approach creates “islands of storage”, which are not accessible by other computers (see Figure 1.2). It is difficult to manage storage efficiently, since one processor may run out of storage while another processor may have unused storage space that cannot be made available to the processor that requires it. Backup storage devices must be dedicated to each processor, even though they are typically used infrequently. The “islands of storage” approach makes it difficult for applications running on separate systems (for example, mainframe applications and server-based applications) to share data. Also, adding new storage devices normally requires the computer system to be powered down, resulting in lost productivity.
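The stranded-capacity problem described above can be illustrated with a small calculation. The following Python sketch is illustrative only: the host names, capacities, and demands are hypothetical figures chosen for the example, not data from any study cited in this chapter. It compares the unmet demand when each host can use only its own storage with the unmet demand when the same total capacity is shared among all hosts.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    capacity_gb: int  # storage directly attached to this host (hypothetical)
    demand_gb: int    # storage this host's applications need (hypothetical)


def islands_shortfall(hosts):
    """Unmet demand when each host can use only its own storage."""
    return sum(max(0, h.demand_gb - h.capacity_gb) for h in hosts)


def islands_stranded(hosts):
    """Unused capacity that no other host is able to reach."""
    return sum(max(0, h.capacity_gb - h.demand_gb) for h in hosts)


def pooled_shortfall(hosts):
    """Unmet demand when all capacity is treated as one shared pool."""
    total_capacity = sum(h.capacity_gb for h in hosts)
    total_demand = sum(h.demand_gb for h in hosts)
    return max(0, total_demand - total_capacity)


# Hypothetical example: three hosts with 500 GB each.
hosts = [
    Host("mainframe", capacity_gb=500, demand_gb=650),
    Host("app_server", capacity_gb=500, demand_gb=200),
    Host("file_server", capacity_gb=500, demand_gb=400),
]

print(islands_shortfall(hosts))  # 150 GB of demand cannot be met
print(islands_stranded(hosts))   # 400 GB sits unused on other hosts
print(pooled_shortfall(hosts))   # 0 GB unmet if capacity is shared
```

In this hypothetical case, one host is 150 GB short while 400 GB lies idle elsewhere; with the same 1,500 GB treated as a single pool, all demand is met. The sketch deliberately ignores performance, access protocols, and management overhead.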
Storage networks (see Figure 1.3) solve these problems by allowing multiple computers to access a set of storage devices, which are managed as a network. Storage efficiency increases, since the total storage capacity is accessible to each computer, eliminating the possibility of a single processor exceeding its storage capacity while another processor has unused storage space. Backup storage devices are used more efficiently, since they are shared by all processors. Adding or deleting devices or units of storage capaci...