ECCE Version 2.0 Release Notes - July 17, 2001
Version 2.0 Final Beta Release - June 20, 2001
Version 2.0 External Beta Release - April 25, 2001
Version 2.0 Initial Beta Release - April 3, 2001
The intent of this page is to provide information specific to version 2.0 of ECCE. Except as mentioned herein, release notes from previous versions of ECCE still apply, so please do not consider this standalone documentation.
Version 2.0 is a complete redesign of the data management component of ECCE. The previous proprietary Object Oriented Database Management System was abandoned in favor of a solution based on open standards, using less monolithic and non-proprietary technologies. This has resulted in ECCE applications being faster (especially noticeable during startup), requiring less memory and disk resources, and no longer incurring any database licensing costs for sites outside EMSL. More detailed information about the new data management architecture is provided under "ECCE v2.0 Data Management" below.
Sites with existing version 1.5 data that they wish to migrate to version 2.0 should send mail to ecce-support@emsl.pnl.gov. There are tools that have been developed for this purpose but due to the complexity of data migration they are not being distributed with version 2.0. The ECCE team will migrate and verify the integrity of your data returning it to you in a form that can be loaded on a version 2.0 ECCE data server.
The Calculation Browser is not implemented in version 2.0. The consortium of Open Source software developers for the technology used in the new data management architecture is just now reaching a consensus on the standards document for the search and query capabilities required by the Calculation Browser. The Calculation Browser will be added back in a subsequent release of ECCE when the search and query software component implementing the standard exists. This application is the only significant omission in version 2.0 from V1.5.
Note: ECCE version 2.0 requires NWChem 4.0 or newer due to changes in the NWChem input and output file formats.
If you wish to install version 2.0 outside EMSL follow the on-line instructions for installing and configuring ECCE available from the main ECCE web page.
Release Notes for Recent Previous Versions
- Version V1.5 Release Notes - December 1, 1999
- Version V1.4.2 Release Notes - June 11, 1999
- Version V1.4.1 Release Notes - January 15, 1999
- Version V1.4 Release Notes - October 8, 1998
What's new?
- Online help updated for version 2.0
- Data server user name and password dialog
- Calculation Manager user interface redesigned
- Change in order of invoking tools for calculation setup
- New calculation editors architecture
- New options for NWChem Editor theory details window
- Calculation editors drop down basis set list
- Builder support for molecular dynamics
- Builder support for mixed display styles
- Improved trajectory toolkit in Builder
- Improved Builder structure library
- Updated basis sets
- New compute server registration features
- List of valid platforms no longer used
- Job Launcher staging capability
- Allocation account maintained with machine configuration
- Configurable remote shell commands
- SGI R4000 (Indy) support
Known bugs
- Job monitoring fails if calculation is renamed during run
- Creation date changes in Calculation Manager
- Operations using shell commands within ECCE such as importing sometimes fail the first time and work when repeated
- Builder sphere radius selection does not work
- Rings are not displayed in the Builder and Calculation Viewer
- Structures added from the Builder structure library can be of the wrong display style
- Refresh the Builder geometry table in order to see the new coordinates that result from using the bond or atom manipulators
- Cannot add measures in the Calculation Viewer to a property that is animating
- Viewing molecular orbitals in the Calculation Viewer while a calculation is running results in coordinates that do not match the orbital
- Geometry trace property visualization updates are inconsistent with table updates
- Opening an enabled code or theory/runtype property causes the Calculation Viewer to crash
What's fixed?
- Sun and SGI platform incompatibilities fixed
- Machine Configuration synchronization between tools
- Job Launcher scratch directory specification for EMSL mpp1 fixed
- Calculation Viewer window clipping problem work around
What's new?
- Online help has been significantly revised to reflect changes in version 2.0. The initial release of version 2.0 still accesses online help from the public EMSL web server but we anticipate that the online help will be distributed along with the rest of the ECCE data server distribution in a patch release. This will improve performance for those sites a significant distance from EMSL or having slow internet connections. It will also make the help available to those sites blocking external internet access.
- The new ECCE data server sits on top of backend web server technologies. Most sites will administer the web server so that each user will have a user name corresponding to their normal UNIX login name and a password that is unique (not corresponding to machine login passwords). The password will be given to you by the ECCE site administrator, who will have followed the ECCE Installation and Administration procedures. The first time ECCE is run the Gateway application will post a dialog prompting for the data server user name and password after the passphrase dialog. Correctly entering the data server authentication user name and password will save them to a file in an encrypted format. ECCE will then recall them on your behalf as needed, provided you give the proper passphrase to initially start ECCE. This data server authentication dialog will subsequently be displayed only when you access data owned by another user.
- The Calculation Manager user interface has been redesigned to function much like Microsoft Windows Explorer on PCs. The two primary advantages to this approach are that most users are familiar with this design and there is more flexibility to display and manipulate calculation properties. The number of properties available for display has increased and in future releases we anticipate adding scalar output properties such as total energy to the list. These properties are displayed in a sortable table. As a result of our new data management architecture, the Calculation Manager no longer requires users to partition data into discrete databases. Instead, projects and calculations can be created in any area where the user has write permissions. In version 2.0, there are separate calculation editors for each computational code instead of one editor with the code selectable from an option menu. Thus there are separate sub-menu items under the "Calculation Editor" (under "Tools" in the menubar) menu item for NWChem and Gaussian-98 editors.
- The order in which tools are invoked from the Calculation Manager has changed slightly for version 2.0. This revised "workflow" accommodates the new separate calculation editors and better fits the actual dependencies for collecting setup information. The Basis Set Tool can no longer be invoked until after a calculation editor has been brought up and a save operation performed. This action sets the computational code that will be used and also the level of theory, both of which the Basis Set Tool requires. Note that bringing up the Basis Set Tool from within a calculation editor automatically does a save. It is still possible to bring up either the Builder or a calculation editor immediately after creating a calculation in the Calculation Manager.
- The calculation editor architecture has been redesigned in order to provide separate editors for computational codes and to allow users to register new codes including user interfaces. Further details of the new architecture are available under "ECCE v2.0 Code Registration" below. The layout of the calculation editor main window has been slightly reworked in order to depict the new workflow for selecting the theory before invoking the Basis Set Tool. For consistency the selection of run type has been moved along with the theory. The buttons for invoking the details windows for these, along with a summary of primary settings, have been left below. The details windows themselves are now tailored to individual codes. Input fields that are not relevant to the current code no longer appear at all, where they would have been grayed out in version 1.5. This significantly reduces clutter in the interfaces.
- The NWChem Editor has been updated to support NWChem 4.0. The theory details window DFT panel has been extensively modified to support additional exchange and correlation functionals and more options for choosing numerical integration grids.
- The Calculation Editor from previous versions had a hidden right mouse popup menu over the "Basis Set..." button which would bring up a "quick selection list" of basis sets to choose from rather than having to invoke the full Basis Set Tool. The new calculation editors show an arrow icon next to the "Basis Set..." button which makes it more obvious that the feature exists.
- The initial implementation of the molecular dynamics toolkit has been included in the Builder. The toolkit is activated any time data is imported from a PDB file. The present toolkit contains functionality for manipulating and editing structures from PDB files while maintaining the appropriate residue structure for the data. The MD functionality is in an early stage of development and is more prone to glitches than other ECCE components. As a foundation for implementing the MD toolkit the data structures and algorithms used in the Builder were reworked extensively. The need to interactively manipulate chemical systems with tens of thousands of atoms drove this effort and the resulting performance is a manyfold improvement over the version 1.5 Builder. More details about the new molecular dynamics functionality can be found under "ECCE v2.0 Molecular Dynamics" below.
- The Builder now includes support for mixed simultaneous display styles such as CPK and wireframe along with the capability to hide arbitrary atoms and/or bonds. These options are controlled from the "Display" menu. Selecting a set of atoms and/or bonds and then choosing a display style from the "Style (Selected)" sub-menu will cause the style of only the selected objects to be changed. The "Selected (All)" sub-menu items apply the requested style to all objects regardless of the current selection. To control the visibility of atoms and/or bonds, the "Selected atoms" menu item will cause all unselected atoms and bonds to be hidden while selecting "All atoms" will restore the visibility of all atoms.
- A trajectory toolkit has been added to the Builder. It supports both a generic XYZ trajectory format and the MD trajectory format produced by the NWChem MD module. The latter format contains complete information about PDB attributes for all atoms so it is possible to control display styles based on residue membership or other PDB attributes. It is also possible to stop a trajectory animation and export the current conformation to a PDB formatted file. The trajectory toolkit is designed to work with very long animation sequences across multiple files--potentially gigabytes of data. These visualizations can be viewed as they are generated or stored in files for replay and creating standard video format movies. Combined with multiple display styles, this toolkit can be used to analyze trajectories representing thousands of configurations and provide important insights into the evolving structure and dynamics of the chemically important regions of the system.
- The structure library in the Builder has been significantly modified to make it more useful for setting up large scale molecular dynamics simulations. Structures can now be stored as residues or as simple molecules. If a structure is stored in the structure library with nubs, it can be bonded to an existing system in the Builder by first selecting a nub on the structure in the structure library and then clicking on a nub in the Builder. The structure in the structure library will be reoriented and bonded to the existing molecule in the Builder. Structure libraries can be created by the user as before. Several new libraries have been included as part of the ECCE distribution; these include libraries of amino acids and RNA and DNA residues that can be used in the setup of large scale biomolecular simulations.
- The basis set library used by the Basis Set Tool has been updated with the most recent version of data maintained by Dr. David Feller of the PNNL Environmental Molecular Sciences Laboratory.
- Many new features have been added to streamline the process of defining compute servers that can run NWChem and Gaussian-98 jobs (referred to as machine registration within ECCE). Registering machines is no longer solely the duty of the ECCE site administrator. This lessens the burden on the site administrator and removes the delay users experienced while waiting for the site administrator to complete machine registration. Further, it allows users to "hide" their personal launch machines rather than having all machines at a site, whether shared or private compute resources, show up in the list of ECCE registered machines. As a result the list of available machines shown in the Job Launcher and Machine Browser can be much more compact. Site administrators still need to define machines that are shared between multiple users when the administrator and users agree that making a machine visible to the whole site is beneficial. However, a resource shared by only a small number of people can instead be registered independently by each user rather than by the site administrator. The only further restriction is that the site administrator must register any queued machines; to simplify the user's task, the graphical user interface for registering machines does not include the portions for queued machines. Because site usage guidelines for queued machines vary greatly, registering one normally requires customizing the job submission template files, which should be done by someone very knowledgeable about the queuing system. Users register machines by selecting the menu item labeled "Register Machines..." in the Job Launcher and Machine Browser. The menu item labeled "Configure Machine..." has been relabeled "Configure Machine Access..." because the distinction between registering a machine (defining the name, machine platform information, number of processors, remote communications shells, etc.) and configuring a machine (defining the user name, password, and calculation run directory for a user's jobs) was too subtle. Site administrators continue to access shared machine registration through the "ecce -admin" tool from the command line. The machine registration files for user machines are stored in the user's ~/.ECCE.v2 preference directory, while the shared machine registration files remain in the siteconfig directory under the top-level ECCE installation directory. The list of machines that a user sees in the Job Launcher and Machine Browser is the combination of those defined in the shared siteconfig directory and those in the user's personal preference directory.
- ECCE no longer requires that a list of known platforms (vendor, model, and processor description triples) be maintained. Platform information is entered as free format strings on the Machine Registration application main window. It is possible to leave all values empty or to fill in only those fields that are known.
- A new feature in the Job Launcher allows launches to be broken into two parts: the first part stages all the necessary files to the remote compute server and the second part submits the job and starts the monitoring process. The final step of the first staging part is to start a remote xterm window in the calculation run directory. This allows the user to modify any necessary files, including the job submission script by adding or changing directives that may be required on a certain machine but not currently supported by ECCE. This is similar to the "Final Edit" feature for calculation editors except that any file can be modified and any changes will not be stored within the ECCE data management system. The buttons to do the two parts of the job launch are under the "Job" menu labeled "Stage Job Launch" and "Finish Staged Launch". The "Launch" button on the Job Launcher main window has the same function of doing a complete job launch as before and it is anticipated that the majority of users will use the traditional launch capability.
- The allocation account field required to launch jobs to certain queued machines such as mpp1 is now stored as part of the machine configuration. The Machine Configuration dialog is available from the Job Launcher and the Machine Browser. This feature saves you from continually reentering a value that seldom changes.
- Each site can now configure remote command shells to support new shells or command line options for existing shells that are different from options with the built-in ECCE support for ssh, rsh, telnet, and Globus. For example, Kerberized rsh, krsh, or openssh could be supported through this mechanism. The file remote_shells.site under the top-level siteconfig directory of the ECCE distribution is used to configure remote shells and also contains documentation for this feature.
- All executables and shared libraries for SGI are distributed as MIPS3 instruction set format. This allows ECCE to support the obsolete R4000 processor used on platforms including the Indy. The MIPS3 instruction set lacks some floating point optimization instructions contained in the MIPS4 instruction set. Because ECCE does relatively little floating point processing this trade-off was seen as acceptable. One particular visualization library, however, is distributed as MIPS4 format because the MIPS3 version is incompatible with SGI processors supporting MIPS4. A MIPS3 version of this library is provided with ECCE and the installation instructions describe how to swap versions.
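The compute server registration item above says the machine list a user sees is the combination of the shared siteconfig registrations and the user's personal registrations. A minimal sketch of that combination follows. The dictionary layout, the machine names, and the rule that a personal entry wins on a name clash are all assumptions made for illustration; the release notes do not describe the registration file format or how duplicates are resolved.

```python
def merge_machine_lists(site_machines, user_machines):
    """Combine site-wide and per-user machine registrations into one list.

    Both arguments map a machine name to a dict of registration fields.
    Assumption: a personal entry overrides a site entry of the same name.
    """
    combined = dict(site_machines)
    combined.update(user_machines)
    # Sort by name so the combined list has a stable display order.
    return dict(sorted(combined.items()))


# Hypothetical registrations, standing in for the contents of the shared
# siteconfig directory and the user's ~/.ECCE.v2 preference directory.
site = {"shared-cluster": {"queued": True}}
user = {"mydesk": {"queued": False}}

machines = merge_machine_lists(site, user)
print(list(machines))
```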
Known bugs
- If a calculation is renamed in the Calculation Manager while the job is submitted or running, data from the job monitoring process will be lost. The work around is to apply the option "Reconnect Job Monitoring" from the "Run Mgmt" menu in the Calculation Manager. This problem also applies to renaming any projects that contain submitted or running calculations either directly or nested within other projects to any level.
- The creation date shown in the Calculation Manager currently changes when certain modifications are made to the objects. Thus it acts more like a modification date than a creation date.
- Behavior of operations that rely on command shells, either remote or local, within ECCE can be inconsistent. Occasionally they will fail when first tried but will then work correctly the second or third time when immediately repeated. If you see an error referring to a failed attempt to start a shell or perform a copy command or the like, repeat the operation. This appears to be a machine resource problem (pseudo-terminals) that shows up occasionally on just about any machine but more frequently on some than others. Freeing up allocated pseudo-terminals by closing applications using them such as unused terminal windows or logging out and back in again, can alleviate the problem.
- The sphere radius selection mechanism in the Builder, which is initiated by clicking and dragging on an atom, currently does not work.
- Display of aromatic rings in the Builder and Calculation Viewer is disabled in this release. Double and triple bond displays are still supported.
- Adding structures from the Builder structure library will occasionally cause the structures to be added in the ball and stick display style even though the Builder may be in a different display style.
- You must manually refresh the geometry table after using the bond rotator or the atom manipulator to update the coordinates.
- Do not try to select atoms to use with measures while a calculation is animating. Press stop, select the atoms you are interested in and then start the animation again.
- If you are viewing molecular orbitals as a calculation is currently running, the coordinates might not match the orbitals.
- If you are running a calculation and viewing the geometry trace property, the graph will sometimes update faster than the visualization. This causes the latest step to appear missing. To fix this, close and reopen the geometry trace property or restart the Calculation Viewer.
- Trying to open an enabled code or theory/runtype property will cause the Calculation Viewer to crash.
What's fixed?
- The long-standing problem of ECCE databases being incompatible between the Sun and SGI platforms has been eliminated since abandoning the previous Object Oriented Database Management System. It is now possible to work back and forth between platforms on the same projects and calculations with no restrictions.
- In previous versions of ECCE if you configured a new machine in either the Job Launcher or Machine Browser then the new machine information would not be available to other ECCE tools without quitting and restarting them. For instance, you could configure a machine in the Job Launcher and launch a job to that machine, but you could not go into the Calculation Manager and bring up a remote shell in the calculation run directory if the Calculation Manager was already running. With v2.0 this problem has been fixed and updated machine configurations are shared between all tools immediately.
- In ECCE V1.5 if you specified a directory on mpp1 other than /scratch for the scratch directory in the Job Launcher, the directory was not correctly created on all assigned nodes. This problem has been fixed using the rsh command. User .rhosts files are automatically fixed to support this if necessary.
- In ECCE V1.5 panels in the left pane of the Calculation Viewer sometimes were clipped off on the right margin or interfered with each other. A work around is in place that should significantly reduce the likelihood of this problem with the underlying user interface widget toolkits.
Version 2.0 Implementation Details
ECCE v2.0 Data Management
The ECCE data management system was completely overhauled for version 2.0 to accommodate an open, lightweight, and cost-effective data server solution. There were numerous reasons for taking this on, including being better positioned for future development based on industry trends such as web-based architectures with loosely coupled components. This effort also remedied limitations imposed by the previous proprietary Object Oriented Database Management System (OODBMS) such as soaring licensing costs for deployment, lack of complete portability across platforms, and an unproductive development paradigm requiring monolithic data schemas. We have benchmarked ECCE version 2.0 against version 1.5 and found that all applications require less memory and that startup times are reduced, in some cases significantly.
The proprietary OODBMS in ECCE has been replaced with an Open Source protocol-based solution named Distributed Authoring and Versioning (DAV, also known as WebDAV). The sole focus of DAV is to make the technology for remote collaboration and coordination a standard part of the web infrastructure. The DAV protocol is an extension to the Hypertext Transfer Protocol (HTTP) that supports remote, secure web content authoring. The protocol combines a coherent set of methods, headers, and Extensible Markup Language (XML) structured requests and responses that provides an integrated interface for any backend data storage system. This allows a client using generic HTTP API and XML parsing tools to interface with a DAV server. DAV is particularly intriguing as an interface to data storage systems because it provides constructs to logically organize opaque data and document the data with arbitrary metadata, thus providing a solid foundation for open, dynamic data repositories. This open architecture overcomes the barriers presented by the OODBMS technology while allowing ECCE to benefit from managing data within an object model.
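To make the "XML structured requests and responses" concrete, here is a minimal sketch of the metadata half of a DAV exchange using only generic XML tools: it builds the body of a PROPFIND request and parses a 207 Multi-Status response. It is not ECCE's client code (which is C++). The metadata namespace, the property name "theory", and the resource path are hypothetical; only the "DAV:" namespace and the element names come from the WebDAV standard.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"
# Hypothetical namespace for illustration; not ECCE's actual metadata schema.
META_NS = "http://example.org/sketch"


def build_propfind(prop_names):
    """Return the XML body of a PROPFIND request asking for named properties."""
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    for name in prop_names:
        ET.SubElement(prop, f"{{{META_NS}}}{name}")
    return ET.tostring(root, encoding="unicode")


def parse_multistatus(xml_text):
    """Map each resource href in a Multi-Status body to its found properties."""
    result = {}
    root = ET.fromstring(xml_text)
    for response in root.findall(f"{{{DAV_NS}}}response"):
        href = response.findtext(f"{{{DAV_NS}}}href")
        props = {}
        for propstat in response.findall(f"{{{DAV_NS}}}propstat"):
            status = propstat.findtext(f"{{{DAV_NS}}}status", default="")
            if " 200 " not in status:
                continue  # skip properties the server could not return
            prop = propstat.find(f"{{{DAV_NS}}}prop")
            if prop is None:
                continue
            for child in prop:
                local_name = child.tag.split("}", 1)[-1]
                props[local_name] = child.text
        result[href] = props
    return result


# Hand-written sample response so the sketch runs without a server.
SAMPLE_RESPONSE = """<D:multistatus xmlns:D="DAV:" xmlns:M="http://example.org/sketch">
  <D:response>
    <D:href>/projects/demo/calc1</D:href>
    <D:propstat>
      <D:prop><M:theory>DFT</M:theory></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""

request_body = build_propfind(["theory"])
metadata = parse_multistatus(SAMPLE_RESPONSE)
print(metadata)
```

In a real exchange the request body would be sent with the HTTP method PROPFIND and a Depth header; that transport step is omitted here.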
On the server side, data is partitioned into logical domain objects. This is similar to defining an XML document except that the elements are essentially distributed across physical collections and documents. This allows applications to access only the pieces of data they are interested in, allows metadata tags to be associated directly with the corresponding object, and improves performance by reducing the amount of data to be transmitted for many applications. The collection of domain objects and constructs is referred to as the virtual document.
On the client side, we met two goals: encapsulating the underlying protocol so that new alternatives can be put in place with minimal impact to applications, and allowing applications to continue to work at the natural abstraction level of domain objects. To accomplish these goals, we designed a multi-layer client-side architecture. At the lowest level is the protocol layer that talks directly to the data storage system. This consists of a series of C++ classes that implement the DAV protocol. HTTP persistent connections are supported by our implementation to improve overall performance. Other protocols such as LDAP or proprietary ones could also be implemented. The next layer is the data storage interface layer, which defines the key abstractions of our open, metadata-driven view of data management. This layer encapsulates the protocol layer from applications enabling the replacement of the underlying protocol and also supporting the use of multiple protocols. Finally, the layer upon which applications are actually built is the object layer. Existing ECCE objects such as Fragment, Basis Set Configuration, Project, Task, and Calculation were reworked so that they were free of connections to the OODBMS. Conceptually the objects are closely aligned with the objects used with the previous OODBMS-based data storage system making it possible to minimize changes necessary to ECCE user interfaces. With this new data management architecture we expect to be able to focus more of our efforts on chemistry domain functionality.
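The three client-side layers can be sketched as follows. ECCE's implementation is in C++; this Python outline only illustrates the layering. Every class and method name is invented for the example, and an in-memory store stands in for a real DAV connection so the sketch runs offline.

```python
from abc import ABC, abstractmethod


class Protocol(ABC):
    """Protocol layer: talks directly to one kind of data storage system."""

    @abstractmethod
    def get_property(self, path, name):
        ...

    @abstractmethod
    def set_property(self, path, name, value):
        ...


class InMemoryProtocol(Protocol):
    """Stand-in for a DAV (or LDAP, or other) implementation."""

    def __init__(self):
        self._store = {}

    def get_property(self, path, name):
        return self._store.get((path, name))

    def set_property(self, path, name, value):
        self._store[(path, name)] = value


class DataStore:
    """Data storage interface layer: hides the protocol from applications."""

    def __init__(self, protocol):
        self._protocol = protocol

    def read(self, path, name):
        return self._protocol.get_property(path, name)

    def write(self, path, name, value):
        self._protocol.set_property(path, name, value)


class Calculation:
    """Object layer: a domain object that never sees the protocol."""

    def __init__(self, store, path):
        self._store = store
        self._path = path

    @property
    def theory(self):
        return self._store.read(self._path, "theory")

    @theory.setter
    def theory(self, value):
        self._store.write(self._path, "theory", value)


calc = Calculation(DataStore(InMemoryProtocol()), "/projects/demo/calc1")
calc.theory = "DFT"
print(calc.theory)
```

Swapping in a different Protocol subclass changes nothing in Calculation, which is the encapsulation goal the paragraph above describes.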
ECCE v2.0 Code Registration
A redesign and implementation of the computational code registration facility was also undertaken as an integral part of ECCE version 2.0. We committed to do this redesign due to an increasing interest from users outside EMSL to integrate chemistry codes beyond NWChem and Gaussian-98 and the desire to extend ECCE beyond electronic structure computational chemistry. The current code registration mechanism is based on defining user interfaces within the proprietary TeleUSE interactive design tool that are compiled as C++ executables, with a limited capability to specify the behavior of the interface through a script file format we have developed and augmented in the years since ECCE was first released. The new registration mechanism allows end users to register codes without having any proprietary 3rd party products like TeleUSE.
In version 2.0, interfaces and their behavior are developed using a cross platform toolkit, Amulet, originally developed at Carnegie Mellon University and more recently maintained by an open software consortium. Amulet has an intrinsic dependency maintenance system that allows relationships between different objects of the interface to be expressed as "constraints" that are automatically evaluated whenever a dependency changes. Using this higher level of abstraction (sometimes called constraint logic programming or rules based programming) rather than the traditional "event callback structured" user interface will allow computational codes to be integrated into ECCE faster and more intuitively by domain scientists without significant user interface programming experience. We are working to design generic reusable interface objects that will serve as the basis for the computational code interfaces. Initially interfaces are being developed using the Amulet C++ toolkit, which can be compiled by the freely available GNU gcc compiler. The next step will be wrapping Amulet within the Python scripting language, which will both lessen the learning curve for developing interfaces and eliminate the need for any compiler environment.
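The idea of a constraint that is re-evaluated whenever one of its dependencies changes can be sketched in a few lines. This is not Amulet's API; the Slot class and its methods are invented for illustration, and this toy version re-evaluates dependents eagerly on every change. The example constraint mirrors the workflow rule described earlier in these notes, that the Basis Set Tool needs a theory to be chosen first.

```python
class Slot:
    """A value that is either set directly or computed from other slots."""

    def __init__(self, value=None, formula=None, inputs=()):
        self._value = value
        self._formula = formula
        self._inputs = tuple(inputs)
        self._dependents = []
        # Register with each input so changes propagate to this slot.
        for slot in self._inputs:
            slot._dependents.append(self)
        if formula is not None:
            self._recompute()

    def get(self):
        return self._value

    def set(self, value):
        if self._formula is not None:
            raise ValueError("a constrained slot cannot be set directly")
        self._value = value
        self._notify()

    def _recompute(self):
        self._value = self._formula(*(s.get() for s in self._inputs))

    def _notify(self):
        # Re-evaluate every slot that depends on this one, transitively.
        for dependent in self._dependents:
            dependent._recompute()
            dependent._notify()


# No event callback is written: the button state is declared once as a
# function of the theory selection and stays consistent automatically.
theory = Slot(None)
basis_button_enabled = Slot(formula=lambda t: t is not None, inputs=[theory])

before = basis_button_enabled.get()
theory.set("DFT")
after = basis_button_enabled.get()
print(before, after)
```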
ECCE v2.0 Molecular Dynamics
Version 2.0 introduces the setup of molecular dynamics (MD) calculations into ECCE. For most large-scale MD calculations this involves two major components: the first is the creation and modification of the appropriate structure and configuration files, and the second is the application of a force field to the system. A final step is the integration of these two components to create a topology file and initial configuration. The initial focus in ECCE was to create tools that help the user create the necessary structure and configuration files for the NWChem MD module. This required modifying the internal representation for storing molecular configurations and implementing several new functions for manipulating and assigning properties to large molecular structures. Most of these changes are focused in the Builder application. Although complete integration of the setup of molecular dynamics calculations will require extensive additional work, the current functionality is sufficient to begin considerably simplifying the setup process.
Most large-scale MD codes, including NWChem MD, are built around the concept of a residue. A new residue data structure was implemented above the atom list that currently describes all molecular structures in ECCE. To accommodate MD residue editing, two modes of operation are now supported in the Builder: either all atoms are assigned to a residue or no atoms are assigned to a residue. The first mode is implemented so users will be able to set up MD calculations while the second is available to maintain continuity with the traditional electronic structure setup capabilities of ECCE.
To support operations on large molecular structures, several of the importing functions have been re-implemented. The import (and export) PDB file methods have been completely rewritten so that information in the PDB files is used to create the appropriate internal residue structure. Information available in the PDB file is also assigned to each atom and will be exported if the structure is rewritten to a new PDB file. The bond generation algorithm, which is activated whenever a structure is imported into the Builder, has been completely rewritten so that the time required to generate bonds now scales linearly with the number of atoms instead of quadratically. This is essential for importing molecular structures with many thousands of atoms.
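The release notes do not say which algorithm the Builder uses to reach linear scaling. One common way to get it is a cell list (spatial hashing): atoms are binned into cubic cells whose edge equals the bonding cutoff, so each atom is compared only with atoms in its own and adjacent cells. The sketch below shows that technique under simplifying assumptions: a single distance cutoff for every atom pair (a real bond generator would typically use element-dependent radii) and made-up coordinates in the example.

```python
import math
from collections import defaultdict
from itertools import product


def generate_bonds(coords, cutoff):
    """Return sorted index pairs (i, j), i < j, of atoms within `cutoff`.

    Assumption: one cutoff for all pairs. For structures of bounded density
    the work per atom is constant, so total time grows linearly with size.
    """
    # Bin every atom into a cubic cell of edge length `cutoff`.
    cells = defaultdict(list)
    for index, (x, y, z) in enumerate(coords):
        key = (
            math.floor(x / cutoff),
            math.floor(y / cutoff),
            math.floor(z / cutoff),
        )
        cells[key].append(index)

    bonds = []
    for (cx, cy, cz), members in cells.items():
        # Only the cell itself and its 26 neighbours can hold bonded atoms.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbours = cells.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                for j in neighbours:
                    # i < j keeps each pair once and skips self-pairs.
                    if i < j and math.dist(coords[i], coords[j]) <= cutoff:
                        bonds.append((i, j))
    return sorted(bonds)


# Illustrative water-like geometry in angstroms: O, H, H.
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
water_bonds = generate_bonds(water, cutoff=1.2)
print(water_bonds)
```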
Operations appropriate to the molecular dynamics setup have been implemented in the MD toolkit, which can be accessed through the Builder. This includes a residue table, which lists all residues and can be used to modify residue names and attributes, as well as several functions for analyzing and modifying structures within the residue data structure. The most important of these are the "Analyze Structure", "Insert/Edit/Delete Residue", and "Assign Residue Atoms" functions. The "Analyze Structure" function scans through a structure and compares it to a database or databases of segment files specified by the user. This comparison determines whether each residue in the structure contains a complete set of atoms, contains all heavy atoms but lacks hydrogens, or is missing some heavy atoms. It will also determine if there is no segment file corresponding to the residue, indicating that it will be up to the user to create a segment file. This information is summarized in the residue table. The "Insert/Edit/Delete Residue" functions allow the user to add and delete residues for the system, as well as modify the structure of existing residues, while at the same time maintaining the correct relationship between residues and their individual atoms. The "Assign Residue Atoms" operation allows the user to automatically assign atom names from an NWChem segment file to a newly created or modified residue, provided that the corresponding segment file exists. The "Add Hydrogens" operation will add hydrogen atoms to all residues that are only missing hydrogen atoms. The geometry table feature of the Builder has been greatly expanded so that it is possible to modify almost all of the atom attributes by editing the corresponding fields in the geometry table. Finally there are "Write Segment File" and "Write Fragment File" operations that allow users to create new segment and fragment files, respectively. All of these operations are essential to completing the setup of systems that have missing or incomplete residues, or contain non-standard residues, a situation that frequently occurs in large-scale simulations of biological systems.
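The four outcomes that "Analyze Structure" reports can be sketched as a simple comparison between the atom names present in a residue and the names its template expects. This is only an illustration of the classification logic described above: the function name, the status strings, and the toy template are assumptions, and the template is a plain dictionary rather than an actual NWChem segment file.

```python
def classify_residue(present_names, template):
    """Classify one residue against the template for its residue name.

    present_names: set of atom names found in the structure.
    template: dict mapping expected atom name to element symbol, or None
        when no segment file matches the residue (hypothetical layout).
    """
    if template is None:
        return "no segment file"
    missing = [name for name in template if name not in present_names]
    if not missing:
        return "complete"
    if all(template[name] == "H" for name in missing):
        return "missing hydrogens"
    return "missing heavy atoms"


# Toy template for illustration only; not taken from a real segment file.
TOY_TEMPLATE = {"N": "N", "CA": "C", "C": "C", "O": "O", "H": "H", "HA": "H"}

statuses = [
    classify_residue({"N", "CA", "C", "O", "H", "HA"}, TOY_TEMPLATE),
    classify_residue({"N", "CA", "C", "O"}, TOY_TEMPLATE),
    classify_residue({"N", "CA", "C"}, TOY_TEMPLATE),
    classify_residue({"X1"}, None),
]
print(statuses)
```

Under this scheme, residues reported as "missing hydrogens" are the ones an "Add Hydrogens" style operation could complete automatically, while the last two cases need user intervention.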