by Francis Starr, Wesleyan University
Professor Starr is a computational and theoretical physicist at Wesleyan University. In the last 10 years, he has published roughly 70 articles focusing on liquids, glasses, gels, polymers, and biologically inspired nanomaterials. Due to the computational demands of his research, Prof. Starr has been involved in developing computing infrastructure since he was a graduate student. He recently joined with several other faculty and the university ITS to provide a university-wide cluster and a companion educational center.
Originally Published December 16th, 2007
The technical nature of scientific research led to the establishment of early computing infrastructure, and today the sciences are still pushing the envelope with new developments in cyberinfrastructure. Education in the sciences poses different challenges, as faculty must develop new curricula that incorporate and educate students about the use of cyberinfrastructure resources. To be integral to both science research and education, cyberinfrastructure at liberal arts institutions needs to provide a combination of computing and human resources. Computing resources are a necessary first element, but without the organizational infrastructure to support and educate faculty and students alike, computing facilities will have only a limited impact. A complete local cyberinfrastructure picture, even at a small college, is quite large and includes resources like email, library databases and on-line information sources, to name just a few. Rather than trying to cover such a broad range, this article will focus on the specific hardware and human resources that are key to a successful cyberinfrastructure in the sciences at liberal arts institutions. I will also touch on how groups of institutions might pool resources, since the demands posed by the complete set of hardware and technical staff may be larger than a single institution alone can manage. I should point out that many of these features are applicable to both large and small universities, but I will emphasize those elements that are of particular relevance to liberal arts institutions. Most of this discussion is based on experiences at Wesleyan University over the past several years, as well as plans for the future of our current facilities.
A brief history of computing infrastructure
Computing needs in the sciences have changed dramatically over the years. When computers first became an integral element of scientific research, the hardware needed was physically very large and very expensive. This was the “mainframe” computer and, because of the cost and size, these machines were generally maintained as a central resource. Additionally, since this was a relatively new and technically demanding resource, it was used primarily for research rather than education activities.
The desktop PC revolution started with the IBM AT in 1984 and led to the presence of a computer on nearly every desk by the mid-1990s. The ubiquity of desktop computing initiated tremendous change in both the infrastructure and uses of computational resources. The affordability and relative power of new desktops made mainframe-style computing largely obsolete. A computer on every desktop turned users into amateur computer administrators. The wide availability of PCs also meant that students grew up with computers and felt comfortable using them as part of their education. As a result, college courses on programming and scientific computing, as well as general use of computers in the classroom, became far more common.
Eventually, commodity computer hardware became so cheap that scientists could afford to buy many computers to expand their research. Better yet, they found ways to link computers together to form inexpensive supercomputers, called clusters or “Beowulf” clusters, built from cheap, off-the-shelf components. Quickly, the size of these do-it-yourself clusters grew very large, and companies naturally saw an opportunity to manufacture and sell them ready-made. People no longer needed detailed technical knowledge of how to assemble these large facilities; they could simply buy them.
This widespread availability of cluster resources has brought cyberinfrastructure needs full circle. The increasing size, cooling needs, and complexity of maintaining a large computing cluster have meant that faculty now look to information technology (IT) services to house and maintain cluster facilities. Maintaining a single large cluster for university-wide usage is more cost-effective than maintaining several smaller clusters and reduces administrative overhead. Ironically, we seem to have returned to something resembling the mainframe model. At the same time, the more recently developed desktop support remains critical. As technology continues to progress, we will doubtless shift paradigms again, but the central cluster would appear to be the dominant approach for at least the next five years.
The cluster is the central piece of hardware–but what makes up the cluster? How large a cluster is needed? Before we can address the question of size, we should outline the key elements. This becomes somewhat technical, so some readers may wish to skip the next five paragraphs.
First, there is the raw computing power of the processors to consider. This part of the story has become more confusing with the recent advent of multiple core processors. In short, a single processor may have 2, 4 or, soon, 8 processing cores, each of which is effectively an independent processor. This does not necessarily mean it can do a task faster, but it can perform multiple tasks simultaneously. Today, I think of the core as the fundamental unit to count, since a single processor may have several cores, and a single “node” (physically, one computer) may have several processors. For example, at Wesleyan, we recently installed a 36-node cluster, each node having 2 processors and each processor having 4 cores. So while a 36-node cluster may not sound like much, it has packed into it 288 computing cores.
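The core count quoted for the Wesleyan cluster follows from multiplying through the hierarchy of nodes, processors, and cores. A minimal sketch in Python; the function name `total_cores` is only illustrative and not part of any real tool:

```python
def total_cores(nodes: int, processors_per_node: int, cores_per_processor: int) -> int:
    """Count cores by multiplying through the node/processor/core hierarchy."""
    return nodes * processors_per_node * cores_per_processor


# The cluster described in the text: 36 nodes, 2 processors per node,
# 4 cores per processor.
wesleyan_cores = total_cores(nodes=36, processors_per_node=2, cores_per_processor=4)
print(wesleyan_cores)  # 288
```

Counting in cores rather than nodes makes clusters with different node designs directly comparable.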
This high density of computing cores has several advantages: it decreases the footprint of the cluster, decreases cooling needs, and decreases the number of required connections. For the moment, let’s focus on connectivity. The speed of connections between computers is glacial in comparison to the speed of the processors. For example, a 2-GHz processor does one operation every 0.5 nanoseconds. To get an idea of how small an amount of time this is, consider that light travels just about 6 inches in this time. The typical latency (the time lost to initiate a transmission) of a wired ethernet connection is in the range of 0.1-1 milliseconds, or roughly 200,000 to 2 million clock cycles of the processor. Hence, if a processor is forced to wait for information coming over a network, it may spend a tremendous number of cycles twiddling its thumbs, just due to latency. Add the time for the message to transmit, and the problem becomes even worse. Multiple cores may help limit the number of nodes, and therefore reduce the number of connections, but the connectivity problem is still unavoidable. So what to do?
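The arithmetic behind these timing figures can be checked directly. The sketch below, in Python, takes the 2-GHz clock and the 0.1-1 millisecond latency range from the text; the function names are illustrative, and it assumes one operation per clock cycle, as the text does:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the meter
METERS_PER_INCH = 0.0254              # exact, by definition of the inch


def cycle_time_s(clock_hz: float) -> float:
    """Duration of one clock cycle, in seconds."""
    return 1.0 / clock_hz


def cycles_lost(latency_s: float, clock_hz: float) -> float:
    """Clock cycles that elapse while the processor waits out a network latency."""
    return latency_s * clock_hz


clock_hz = 2e9  # the 2-GHz processor from the text
cycle = cycle_time_s(clock_hz)
light_inches = SPEED_OF_LIGHT_M_PER_S * cycle / METERS_PER_INCH

print(f"cycle time: {cycle * 1e9:.1f} ns")
print(f"light travels about {light_inches:.1f} inches per cycle")

# Latency range quoted in the text for wired ethernet: 0.1 ms to 1 ms.
for latency_s in (1e-4, 1e-3):
    waited = cycles_lost(latency_s, clock_hz)
    print(f"{latency_s * 1e3:.1f} ms latency costs about {waited:,.0f} cycles")
```

At 2 GHz, a wait of 0.1 milliseconds costs about 200,000 cycles and a wait of 1 millisecond about 2 million, before any data has actually been transmitted.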
The answer depends on the intended usage of the cluster. In many cases, users want to run many independent single-process (serial) tasks. In this case, communication between the various pieces is relatively unimportant, since the vast majority of the activity is independent. Ordinary gigabit ethernet should suffice in this situation and is quite cheap. If the usage is expected to include parallel applications, where many cores work together to solve a single problem faster, it may be necessary to consider more expensive interconnects. However, given that it is easy to purchase nodes containing 8 cores in a single box, these expensive and often proprietary solutions are only needed for rather large parallel applications, of which there are relatively few.
All this processing power is useless, however, without a place to store the information. This is most commonly achieved by hard disks that are bundled together in some form, though for the sake of simplicity, they appear to the end user as a single large disk. These bundles of disks can easily achieve storage sizes of tens to hundreds of terabytes, a terabyte being 1000 gigabytes. The ability to store such large amounts of information is particularly important with the emergence in the last decade of informatics technologies, which rely on data-mining of very large data sets.
The last, and sometimes the greatest, challenge is housing and cooling the cluster. Even with the high density of computing cores, these machines can be large and require substantial cooling. A dedicated machine room with supplemental air conditioning is needed, typically maintained by an IT services organization. Fortunately, most IT organizations already have such a facility, and with the decreasing size of administrative university servers, it is likely that space can be found without major building modifications. However, do not be surprised if additional power or further boosting of cooling is needed. The involvement of the IT organization is critical to the success of the infrastructure. Accordingly, it is important that IT services and technically inclined faculty cultivate a good working relationship in order to communicate effectively about research and education needs.
OK, but how big?
Given these general physical specifications for the key piece of hardware, the question remains, how big a cluster? Obviously the answer depends on the institution, but I estimate 3 or 4 processing cores for each science faculty member. An alternate and perhaps more accurate way to estimate is to consider how many faculty members are already heavy computational users and already support their own facilities. I would budget about 50 cores for each such faculty member, though it is wise to more carefully estimate local usage. Part of the beauty of a shared facility is that unused computing time that might be lost on an individual faculty member’s facility can be shared by the community, reducing the total size of the cluster necessary to fulfill peak needs.
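The two rules of thumb above can be put side by side in a short Python sketch. The per-person figures (3 to 4 cores per science faculty member, about 50 cores per heavy user) come from the text; the function names and the example headcounts of 80 faculty and 6 heavy users are hypothetical, chosen only to illustrate the calculation:

```python
def cores_by_headcount(science_faculty: int) -> tuple[int, int]:
    """Low and high estimates at 3 and 4 cores per science faculty member."""
    return 3 * science_faculty, 4 * science_faculty


def cores_by_heavy_users(heavy_users: int, cores_per_heavy_user: int = 50) -> int:
    """Estimate based on faculty who already run their own computing facilities."""
    return heavy_users * cores_per_heavy_user


# Hypothetical institution: these headcounts are made up for illustration.
low, high = cores_by_headcount(science_faculty=80)
heavy = cores_by_heavy_users(heavy_users=6)

print(f"headcount rule: {low} to {high} cores")
print(f"heavy-user rule: {heavy} cores")
```

If the two estimates differ widely for a given institution, that is a sign that local usage should be measured more carefully before settling on a size.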
Software needs tend to be specialized according to the intended uses, but it is important to budget funds for various software needs, such as compilers and special purpose applications. The Linux operating system is commonly used on these clusters and helps to keep down software costs since it is an open source system. For many scientific computing users, Linux is also the preferred environment regardless of cost.
The cluster itself is of limited use without the human resources–that is, the technical staff–to back it up. At a minimum, a dedicated systems administrator is needed to ensure the smooth operation of the facility. Ideally, the administrator can also serve as a technical contact for researchers to assist in the optimal use of the cluster facility. However, to make the facility widely accessible and reap the full benefit for the larger university community, a more substantial technical support staff is needed.
The human element: resource accessibility
The presence of a substantial cluster is an excellent first step, but without additional outreach, the facility is unlikely to benefit anyone other than the expert users who were previously using their own local resources. Outreach is key and can take a number of forms.
First, faculty who are expert in the use of these computer facilities need to spearhead courses that introduce students to the use and benefits of a large cluster. This will help build a pool of competent users who can spread their knowledge beyond the scope of the course. This effort requires little extra initiative and is common at both liberal arts and larger universities.
Second, it is particularly important in a liberal arts environment to develop and sustain a broad effort to help non-expert faculty take advantage of this resource for both research and educational purposes. Otherwise, the use of these computers will likely remain limited to the existing expert faculty and the students whom they train.
Outreach across the sciences can also take the form of a cross-disciplinary organization. At Wesleyan, we established a Scientific Computing and Informatics Center, with the goal of both facilitating the use of high-performance computing and supporting course initiatives that use computational resources. The center is directed by a dedicated coordinator, who is not burdened with the technical duties of the systems administrator, and is assisted by trained student tutors.
The first goal of the center, facilitating cluster use, is primarily research-oriented. That is, the center serves as a resource where faculty and students can seek assistance or advice on a range of issues–from simple tasks like accessing the resources to complex problems like optimization or debugging complex codes. In addition, the center offers regular tutorials on the more common issues, making broader contact across the institution.
The second goal–educational outreach–is particularly important for liberal arts institutions. Educational outreach deals with all aspects of computational activities in the curriculum, not just cluster-based activities. For example, if a faculty member wishes to make use of computational software, the center staff will offer training to the students in the course, thereby leaving class time to focus on content. The center staff will also be available for follow-up assistance as the need arises. This eliminates the problem of trying to add or include training for computational resources in existing courses.
But efforts should not stop at this level. While we are still in the early stages of our experiment at Wesleyan, I believe that such a support organization will not have a significant impact if it simply exists as a passive resource. The center must actively seek out resistant faculty and demonstrate through both group discussions and one-on-one interactions how computational resources can enhance their teaching activities.
To maintain the long-term vitality of this kind of center, it is important to maintain a group of trained and motivated student tutors. To do this, we have chosen to offer students summer fellowships to work on computationally demanding research projects with faculty. Some of these students then serve as tutors during the academic year. Combined with this summer program are regular lecture and tutorial activities. These tutorials may also be expanded to reach beyond the bounds of the university to other institutions as workshop activities.
Sometimes, all of these goals can be met by a single institution. But even if this is possible, there are still benefits to looking outside the institution. And for smaller institutions, pooling resources may be the only way to develop an effective cyberinfrastructure.
While high-speed networks now make it technically possible to establish inter-institutional efforts across the country, it is important to be able to gather together a critical mass of core users who can easily interact with each other. In my own experience, this happens more easily when the users are relatively nearby, say no more than 100 miles apart. It means that institutions can share not only the hardware resources over the network, but also the technical support staff. Of course, day-to-day activity is limited to interaction within an institution or virtual communications between institutions, but frequent and regular person-to-person interaction can be established at modest distances.
Balancing individual institutional priorities in such a collaboration is obviously a delicate process, but I envision that the institution with the most developed IT services can house and maintain the primary shared hardware resource, thereby reducing the administrative needs across several institutions. Adequate access to facilities can be guaranteed by taking advantage of the fact that most states maintain high-speed networks dedicated for educational usage. In addition, there are many connections between these state networks, such as the New England Regional Network. Personal interactions can be facilitated by regular user group meetings where users can share their questions and concerns with an audience that extends beyond their institution. In addition, new electronic sharing tools, such as wikis and blogs, can help foster more direct virtual communications.
To have a successful cyberinfrastructure in the sciences, it is essential to develop both hardware and human resources. Personal support and outreach to faculty and students are crucial if the benefits of the infrastructure are to serve a wider clientele. For liberal arts institutions, the presence of state-of-the-art infrastructure helps them to compete with larger institutions, both in terms of research and in attracting students interested in technology. At the same time, emphasizing outreach is of special importance to achieve the educational goals that make liberal arts institutions attractive to students.
Distributed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.