I came to the US Naval War College Naval War Gaming System (NWGS) site as the Site Systems Analyst (SSA) during the development phase in 1980. I remember that was the year the Phillies beat the Kansas City Royals in the World Series. Here are some stories from my time there.
Hot Swapping Components to Provide Reliability
The development site was in Moorestown, New Jersey, and Computer Sciences Corporation (CSC) was the development contractor. We had a dual-CPU Level 68 configuration with dual IOMs, dual memory boxes, several MSU0500 disk packs, tape drives, two printers, and two operator consoles. Upon getting my first tour of the computer room, I asked "Why two operator consoles?" That's when I was told that the War College had a contract reliability requirement for duplicates of every piece of hardware, so that if a component failed the system could be reconfigured without it and keep running. Honeywell had promised that any component, except the IOM, could be "hot swapped" without a crash. In fact, I remember that on my first day there an engineer from CSC (who thought that a Unix system should have been awarded the contract) brought several senior management types from CSC and the Navy into the computer room and asked me to meet them there. He didn't believe we could ride out a hardware failure without a crash and demanded a demonstration. "What did you have in mind?" I asked. He went to the back of one of the disk packs and turned off the power switch. Immediately the operator console sprang to life with "syserr" messages:
Unable to swap page to disk...
I went to the downed drive, pulled the disk pack and put it in a spare drive, moved the device ID plug from the downed drive to the spare drive, and spun up the pack. Multics continued as if nothing had occurred. The senior management types were notably impressed and he was left with egg on his face. We never did get along, and much of my time at NWGS seemed somewhat adversarial.
I took his stunt as a warning and came in in the middle of the night and on weekends (when no development was going on) to try to identify single-point-of-failure targets and, with the Field Engineers' help, developed a strategy for dealing with every hardware failure scenario we could think of. The operator console was the only component that we could not replace on the fly. During several attempts to replace a failed console, the system hung and required rebooting. Replacing the console required opening up the IOM and re-cabling to the spare console. Although we had the Message Coordinator consoles, they became overwhelmed by a complete failure of the operator's console, and their software design did not allow them certain levels of functionality (see MTB-628). Additionally, the operator's console was a half-duplex device and would hang the system in ring 0 if its interrupts were lost. Thus, we had a single point of failure.
I began looking at the BOS code and the ocdcm_ DIM and decided that I could make some changes that would allow more than one operator's console in the config deck. If that console failed, ocdcm_ would search the config deck for another console and start using it for its I/O. At the time, I was too naïve to fully understand the implications of the changes I was making, but I was hired on at CISL a few years later and did a complete redesign of ocdcm_ in support of the IOM Reconfiguration project for the DND-H site. Those changes were also made as part of the "Bootload Multics" effort for the B-2 security evaluation. This was probably my major contribution to the NWGS development, although I had a couple of minor contributions as well.
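The failover idea was simple: keep a list of configured consoles and, when a write to the active one fails, fall through to the next. A minimal sketch in Python of that logic (the real ocdcm_ was PL/I running in ring 0; the class and function names here are invented for illustration):

```python
class Console:
    """A stand-in for one console card in the config deck (hypothetical)."""
    def __init__(self, name, usable=True):
        self.name = name
        self.usable = usable

    def write(self, msg):
        if not self.usable:
            raise IOError(f"console {self.name} is down")
        print(f"[{self.name}] {msg}")

def write_with_failover(config_deck, msg):
    """Try each configured console in order; the first that works
    becomes the active console, so a dead console no longer hangs I/O."""
    for console in config_deck:
        try:
            console.write(msg)
            return console
        except IOError:
            continue  # that console is down; try the next one in the deck
    raise RuntimeError("no usable operator console in the config deck")

# The primary console has failed; output falls through to the backup.
deck = [Console("opc_a", usable=False), Console("opc_b")]
active = write_with_failover(deck, "syserr: drive offline")
```

The key design point is that the search order comes from the config deck itself, so the operator controls failover preference simply by the order of the console cards.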
CSC needed to prove to the Navy that they were making progress in their development and staying on schedule. They proposed to the Navy that providing the ability to count the lines of code developed would be an acceptable metric, and the Navy agreed. Of course, the Navy could not trust CSC to develop a tool that showed CSC's contract compliance, so the Navy, at CSC's suggestion, requested that I develop the tool. This led to the analyze_source command, which would peruse a directory or subtree, find source files, and analyze their content, reporting how many lines of code were in the source, the number and types of variables, the number of comments, etc. It was initially developed to work only on a single file or directory, and was then executed using walk_subtree as a wrapper. Various control arguments allowed the user to control the elements of the source analyzed and the format of the report. Eventually, I added the functionality to walk the hierarchy internally without the need to invoke walk_subtree.
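The core of such a tool is a per-file counter wrapped in a subtree walk. A modern Python sketch of the same shape (the file suffixes and comment heuristics here are illustrative, not what analyze_source actually used):

```python
import os

def analyze_file(path):
    """Count total, blank, comment, and code lines in one source file."""
    counts = {"total": 0, "blank": 0, "comment": 0, "code": 0}
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            counts["total"] += 1
            stripped = line.strip()
            if not stripped:
                counts["blank"] += 1
            elif stripped.startswith(("/*", "#", "//")):  # crude heuristic
                counts["comment"] += 1
            else:
                counts["code"] += 1
    return counts

def analyze_subtree(root, suffixes=(".pl1", ".py", ".c")):
    """Walk the hierarchy internally (no external wrapper needed),
    analyzing every file whose name matches one of the suffixes."""
    report = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                report[path] = analyze_file(path)
    return report
```

Moving the walk inside the tool, as the last paragraph describes, mirrors replacing the walk_subtree wrapper with an internal traversal.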
Playing with the mechanisms to traverse the subtree lent itself to another Copious Spare Time (CST) project. NWGS users new to Multics complained that they were getting lost in a fairly complex development hierarchy put together by CSC to organize and compartmentalize the NWGS source. I tried to convince upper management that using two- and three-character names for directories was unnecessary in Multics and that more descriptive names would be better. That fell on deaf ears, and pathnames like >udd>NWGS>OL1>L2>R6... left users totally lost. To help, I created the map_subtree (ms) command that would provide the user a graphical representation of the hierarchy and their location in it. They could also display other attributes, such as their access, associated with each directory level. This later became a Multics command and I think it was released in MR 11.5.
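A rough sketch of what such a command does: walk the directory tree, print it indented, and mark where the user currently is. Everything about the output format below is invented for illustration; the real map_subtree also displayed attributes such as the user's access on each directory.

```python
import os

def map_subtree(root, current=None, depth=0, lines=None):
    """Return an indented-tree picture of the directories under root,
    marking the user's current directory if it appears in the subtree."""
    if lines is None:
        lines = []
    name = os.path.basename(root) or root
    here = current is not None and os.path.abspath(root) == os.path.abspath(current)
    lines.append("   " * depth + name + ("  <-- you are here" if here else ""))
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path):
            map_subtree(path, current, depth + 1, lines)
    return lines
```

For a user lost in a hierarchy of two- and three-character directory names, seeing the whole shape of the tree at once with a "you are here" marker is exactly the orientation the users were missing.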
How Much Does Cache Cost and How Do You Get Around That?
The site was configured with 16K of cache memory for each CPU and performance was dreadful. I suggested upgrading to 256K of cache, but that would reportedly cost $250,000 per CPU. The Navy said that they were not going to spend any more money and that it was CSC's responsibility to provide a "mission capable" system as promised. CSC (in particular, the same engineer who had pulled the hot-swap demo on my first day) was continually on my back about how badly Multics performed. One of the Field Engineers (I think it was Dave Pickett) told me that the 256K was already in the processor, but we could not access it without a special "edge connector". He told me all CPUs manufactured by Honeywell were shipped with the full cache in place, but it was a "performance upgrade" that the customer had to buy. Seriously? I asked him to turn it on briefly so CSC could see the performance difference. He borrowed the right edge connector, reconfigured the CPU, and the system hummed right along. Still, we could not get them to spend the money.

I went to a Honeywell Tech Symposium and presented a paper on NWGS. At the symposium there was a general session with a chance to ask senior Honeywell management questions. I got up and noted that the 256K of cache memory was already present in the processors and asked why the customer was being charged a quarter of a million dollars just to turn it on. I got a lame answer (from an SSA's point of view) about pricing strategy. I responded with "So, it's not the memory that costs a quarter million dollars, it's the edge connector?" The room of SSAs applauded and my manager pulled me out of the room. I almost lost my job right then and there. As I remember it, the required edge connectors mysteriously showed up at the site and were installed without any additional money changing hands.
I had moved to the NWGS site in Moorestown, New Jersey, from RADC in Rome, New York, a place that has horrendous winters and averaged 120 inches of snow a year. One day I got up and walked out to my car to go to work. About 4 inches of snow had fallen overnight. I brushed off my car and drove in, noting the roads to be empty of traffic. Upon reaching the office, I went in and found the night guard still at his desk. "What are you doing here?" he asked.
"Coming to work."
"There's a huge blizzard out there and all the roads are closed. We're closed. I'm still waiting to see if my relief can get here. I don't know how I'm going to get home," he opined.
"I'll be in my office," I said. "Call me when your relief gets here and I'll give you a ride home."
What's That Submarine Doing in the Pentagon Parking Lot?
The NWGS system used real-time graphic displays to display various resources (troops, planes, ships, etc.) involved in a war game. Prior to NWGS, the Naval War College played these games in auditoriums using large tables and little plastic ships on big maps. When a ship Captain would move his ship, some young sailor would use a long stick to push the little plastic ship around on the map. When the Captain engaged a hostile target in battle, the battle staff would take a break while a bunch of folks scurried around and collected data on all of the resources involved. Then they would run off to a computer room and input the data and the computer would decide the probability of victory, what damage was sustained by each of the resources involved ("You sunk my battleship!"), etc. and the battle staff would then return and be notified of the outcome. At least that's the way it was described to me.
Now, all of this would be displayed on big graphic screens in real time. Commands would be given by those playing the game to put their ship on a heading and at a particular speed, and the icon representing that resource would begin its journey across the screen. It would continue on its course until a command was given to change course or speed, just like in real life. The CSC staff was demonstrating some of their software to me and we were discussing how the resources moved in real time. The scenario that they were simulating was a Russian nuclear submarine (a boomer) being engaged by one of our submarines (a hunter/killer) off the coast near Washington DC. We were deep in conversation about design performance when I looked up and asked, "What's that sub doing in the parking lot of the Pentagon?" The problem was that this was a two-dimensional representation of the world. I hope this was solved in the production version.
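The movement model described above is simple dead reckoning: each tick, advance the icon along its ordered heading at its ordered speed, with nothing in the model to say whether the new position is water or land. A toy flat-earth illustration (all numbers and names invented; nothing here reflects the actual NWGS code):

```python
import math

def advance(lat, lon, heading_deg, speed_knots, hours):
    """Dead-reckon a resource along a heading.
    Rough flat-earth approximation: 1 degree of latitude ~ 60 nm."""
    distance_nm = speed_knots * hours
    d_north = distance_nm * math.cos(math.radians(heading_deg)) / 60.0
    d_east = (distance_nm * math.sin(math.radians(heading_deg)) / 60.0
              / max(math.cos(math.radians(lat)), 1e-9))
    return lat + d_north, lon + d_east

# A submarine ordered due west at 20 knots keeps going until a new
# command arrives -- with no terrain check, nothing stops it at the
# shoreline.
pos = (38.87, -77.02)  # roughly the Potomac near Washington DC
for _ in range(3):     # three one-hour ticks with no new orders
    pos = advance(*pos, heading_deg=270, speed_knots=20, hours=1)
```

After three unattended hours the position is well west of the river; the fix the production version needed was a land/water mask (or full terrain data) consulted on every position update.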
I presented a white paper entitled NWGS -- Multics Goes to War, which described the application, the configuration, and its reliability, at the Honeywell Tech VI Technical Symposium in Baltimore, MD, in 1981.
March 26, 2016.