## ACRST-1833 OFFICIAL TRANSCRIPT OF PROCEEDINGS

Agency:

Nuclear Regulatory Commission Advisory Committee on Reactor Safeguards

Title:

Subcommittee on Computers in Nuclear Power Plant Operations and Subcommittee on Instrumentation and Control Systems Joint Meeting

Docket No.

LOCATION:

Bethesda, Maryland

DATE:

Wednesday, February 6, 1991

PAGES:

1 - 228

## ACRS Office Copy - Retain for the Life of the Committee




1612 K St. N.W., Suite 300 Washington, D.C. 20006 (202) 293-3950

ANN RILEY & ASSOCIATES, LTD.


PUBLIC NOTICE BY THE UNITED STATES NUCLEAR REGULATORY COMMISSION'S ADVISORY COMMITTEE ON REACTOR SAFEGUARDS

DATE: February 6, 1991

The contents of this transcript of the proceedings of the United States Nuclear Regulatory Commission's Advisory Committee on Reactor Safeguards, as reported herein, are a record of the discussions recorded at the meeting held on the above date.

This transcript has not been reviewed, corrected or edited, and it may contain inaccuracies.

UNITED STATES OF AMERICA

NUCLEAR REGULATORY COMMISSION

ADVISORY COMMITTEE ON REACTOR SAFEGUARDS

SUBCOMMITTEE ON COMPUTERS IN NUCLEAR POWER PLANT OPERATIONS AND SUBCOMMITTEE ON INSTRUMENTATION AND CONTROL SYSTEMS

JOINT MEETING

Nuclear Regulatory Commission, 7920 Norfolk Avenue, Bethesda, Maryland

Wednesday, February 6, 1991

The above-entitled proceedings commenced at 8:30 o'clock a.m., pursuant to notice, Harold W. Lewis, Chairman, presiding:


| Attendee                                |
|-----------------------------------------|
| H. Lewis                                |
| W. Kerr                                 |
| I. Catton                               |
| P. Shewmon                              |
| C. Michelson                            |
| J. Carroll                              |
| E. Wilkins, Jr.                         |
| C. Wylie                                |
| P. Davis, Consultant                    |
| W. Lipinski, Consultant                 |
| T. Rotella, Cognizant ACRS Staff Member |
| M. El-Zeftawy, Federal Official         |

PARTICIPANTS:

| Participant              |
|--------------------------|
| L. Rib, AECL Technologies |
| N. Ichiyen, CANDU-3      |
| K. Scarola, ABB/CE       |
| G. Remley, Westinghouse  |
| Barry Simon, GE          |

## PROCEEDINGS

[8:30 a.m.]

MR. LEWIS: The meeting will now come to order. This is a joint Subcommittee meeting of the Advisory Committee on Reactor Safeguards, the Computers in Nuclear Power Plant Operations -- I didn't know that was the name of our Subcommittee -- and the Instrumentation and Control Systems Subcommittees.

I'm Hal Lewis, Chairman of the first named Subcommittee and Bill Kerr, to my left, is the Chairman of the Instrumentation and Control Systems Subcommittee.

The ACRS members in attendance are Jay Carroll, Ivan Catton, Carl Michelson, Paul Shewmon, Ernest Wilkins, and Charlie Wylie. Also in attendance are ACRS Consultants Pete Davis and Walt Lipinski. I don't see them. It says on my piece of paper that they're here, but they are, in fact, here in spirit and not in body.

The purpose of this meeting is to discuss computer software applications in future nuclear plants, software reliability assurance, software verification and validation, and software sabotage issues. I might just interject that that's news to me, because I thought we were going to discuss both software and hardware. But that will emerge as we go along.

Tom Rotella, to my right, is the Cognizant ACRS staff member for this meeting. Medhat El-Zeftawy is the designated Federal official, somewhere in the room. There he is, to my left.

The rules for participation in today's meeting have been announced as part of the notice of the meeting previously published in the Federal Register on January 23, 1991. Portions of this meeting will be closed due to discussions of company-proprietary information and that has been so noticed.

A transcript of the meeting is being kept and will be made available as stated in the Federal Register Notice. It is requested that each speaker first identify himself or herself and speak with sufficient clarity and volume so that he or she can be readily heard.

We have received no written comments or requests to make oral statements from members of the public. As a general pattern, this is an introductory meeting, so we will go through a number of experiences that people have had trying to cope with the change in technology that has come with the computer evolution in the nuclear business.

We're fortunate today to have some participation from our neighbors to the north, and I'm told that Lewis Rib will introduce the operation.

Do any of the other members want to say anything before we get into the meat of the operations?

MR. MICHELSON: Yes. We did have the hardware discussion yesterday, which is listed in the Subcommittee meeting notice. Today was to be the software.

MR. LEWIS: I see. Ivan?

MR. CATTON: It's my understanding that you really shouldn't separate the two when you're looking to see whether or not the system is going to work reliably. Is there any rationalization for this separation?

MR. LEWIS: None.

MR. CARROLL: That's why we had the meetings one day after the other.

MR. CATTON: So the connection is 12 hours.

MR. MICHELSON: The connection is it takes two days to cover both subjects and somehow you have to have one first and then the next one.

MR. LEWIS: But Ivan's point is well taken. There aren't two subjects. There's one subject.

MR. CATTON: That's right.

MR. LEWIS: I just confess that perhaps I haven't been reading my mail. I didn't know that the hardware was going to be covered yesterday.

MR. KERR: Why don't we decide to do better next time and go ahead with this meeting.

MR. LEWIS: I think we should, but there is an important issue here.

MR. CARROLL: I think you're going to hear a fair amount about hardware today anyway.

MR. SHEWMON: Otherwise it might be nice to have a summary of what was learned yesterday.

MR. LEWIS: Let's do that at an appropriate time and let's not interrupt the speaker. Let's proceed.

MR. RIB: I'm going to give just a very brief introduction to the speaker. My name is Lewis Rib. I am representing AECL Technologies, an American corporate entity, with a local office in Rockville, Maryland.

Among other activities, AECL Technologies is representing AECL, which stands for Atomic Energy of Canada Limited, and AECL's CANDU-3 nuclear power plant design in the United States. The ACRS invited AECL Technologies to participate in this Subcommittee meeting to describe our approach to the utilization of computers in nuclear power plant operations and instrumentation and control systems.

We welcome this opportunity as our second appearance before the ACRS. I would like to introduce Norman Ichiyen of the CANDU-3 Design Team, who will make the presentation on the CANDU computer control technology. Norman Ichiyen's background includes a Bachelor's of Engineering from McGill and a Master of Applied Science from the University of Toronto.

He started with AECL in 1973 in the safety systems concept area. He was the program manager for the computer based shutdown systems development project in 1980 through 1982. This concept was implemented at the Darlington Nuclear Generating Station. Currently, he is Manager of CANDU-3 Computers and Control Centers Branch.

MR. ICHIYEN: Good morning. As Lewis said, I've been asked to talk about the use of computers and digital systems in CANDU nuclear power plants. As the agenda says, and this is what I assumed you wanted to hear about today, was from the perspective of mainly in the software issues, future applications and plans, issues like software V&V, software reliability, and sabotage was another topic.

How I propose to address these topics is shown in this outline next.

[Slide.]

MR. ICHIYEN: This is going to be awkward. I think I'll stay on this side.

MR. WILKINS: I wonder if you could rotate this about ten degrees clockwise.

MR. ICHIYEN: Is that okay?

MR. WILKINS: Thank you.

MR. ICHIYEN: In order to talk about where we're going, that is, our future applications and plans, I felt it's important that you understand where we're coming from. In CANDU technology, we try to use an evolutionary process. So what I will talk about first and, again, very briefly is a bit of history of our use of computers in CANDU stations, and then bring you up to date to the latest station that's in service, which is the Darlington Station that just went recently into service.

I'll use that as an example of the state of the technology for our current plants. Getting to future applications, I'll have a bit of a discussion on how we have evolved in our design and concept of digital systems. For this part of the talk, I'll use the CANDU-3 project, which I am associated with, as an example of the kinds of things that we're doing for that design.

As I understand, the main kind of issues you wanted to get at were how to produce reliable software and aspects of it, like verification and validation, software reliability. So rather than talk about these pieces as parts of the puzzle in isolation, I'd rather prefer to talk about our whole software engineering process. We feel that the integrated process, which is integration between development, verification and testing, software reliability, are all the more important part.

In order to be able to discuss this in the short timeframe that I have, I'm going to concentrate on safety-critical software where we've had some experiences with the Darlington shutdown system. In this part of the talk, I'm going to talk about our experience licensing the shutdown systems on Darlington; more important, what lessons we learned from Darlington or what lessons we hope we've learned from Darlington; and, how we're applying these lessons in what we're doing in the future.

For AECL and Ontario Hydro, we feel that the important movement is in the area of standards, and I'll describe why and some of the aspects of that later. Then I'll move into some of the fundamental principles of a high level standard for safety-critical software which we have just now issued for our own internal use.

When I'm talking about these fundamental principles, they really embody the features of V&V and software reliability. So that's how I propose to get at those issues, while talking about these aspects of the standard. Next I'll talk about the overall status, where we are with this program, where we will be in the future.

As a special item, I was asked by Tom Rotella to talk about our experiences with the Bruce fueling machine incident which I assume most of you are aware of. It's an event that happened about a year ago. I'll describe it very briefly with some conclusions and lessons learned from that experience.

[Slide.]

MR. ICHIYEN: Starting off with our experience with computers in CANDU, this is just a simple slide, just meant to show the timeline over the years of what we've been doing with computers and, on the vertical axis, the degree of computerization that we've used in these plants.

From our first plant at Douglas Point back in the late 1960s, we had a fair degree of computer control on a number of the main processes. With respect to reactor and process control, this has steadily increased so that at the point where our CANDU-600 designs were in service in the early 1980s, we had pretty well reached full computerization of all systems.

Darlington is, I would say, about 99 percent there, with most systems having computerization in some aspects. Only in the late 1970s did we start in the use of computers in protection systems. Back in the late 1970s, we used computers for monitoring of important variables and shutdown system variables at the Bruce A reactor.

In the early 1980s, we used trip comparators, digital trip comparators for the shutdown systems, for the process trips. On Darlington, for the shutdown systems, we have full computerization. I'll talk about that. The reason it doesn't show 100 percent computerized is because not all of the other safety systems have full use of computers.

MR. SHEWMON: You'll explain later what you mean by full computerization?

MR. ICHIYEN: Yes.

[Slide.]

MR. ICHIYEN: In terms of historically, again, I said I would talk about Darlington. In answer to your question about computerization, I think this gets at that. Again, I'm trying to compress this into a short period of time. So it is fairly general.

We have three kinds of classes of computers on the Darlington system. In Canada, we've called the main control computers DCCs, which stands for Digital Control Computers. We've used that terminology right from our first reactors. On Darlington, all the reactor and process control is done in the DCCs. It's a dual redundant central kind of system with triplicated channels and dual redundancy on the computers.

On Darlington, an additional difference that we didn't have on the CANDU-600 design was that the device logic control was done in PLCs, with the Ontario Hydro proprietary design called the OH-180.

Computers were also used for the operator interface in the main control room, used for alarm annunciation and data logging. In CANDU, we have on-line fueling, I think as most of you are aware, and that is done on computer control. So we have separate fuel handling computers for that function.

In the safety systems on Darlington, that was the first time we have used full computerization of the shutdown systems. What I mean by full computerization is we have a computer, a set of processors that carry out the trip functions, the trip decision functions. If the heat transfer pressure is below a certain level, then it says initiate the trip signal.
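The trip decision function described here can be illustrated with a minimal Python sketch. It is not the Darlington software: the `TripParameter` type, the `trip_decision` function, and the setpoint value are hypothetical names and numbers chosen only to show the comparison of a measured variable against a setpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TripParameter:
    """One trip parameter; every field value used below is hypothetical."""
    name: str
    setpoint: float
    trip_when_below: bool  # True: trip on low value, False: trip on high value


def trip_decision(param: TripParameter, measured: float) -> bool:
    """Return True when the trip signal should be initiated for this parameter."""
    if param.trip_when_below:
        return measured < param.setpoint
    return measured > param.setpoint


# Hypothetical low-pressure trip: units and setpoint are illustrative only.
HT_PRESSURE_LOW = TripParameter(
    name="heat transfer pressure low",
    setpoint=8.0,
    trip_when_below=True,
)

if __name__ == "__main__":
    for pressure in (9.0, 7.5):
        print(pressure, trip_decision(HT_PRESSURE_LOW, pressure))
```

A real shutdown system would evaluate many such parameters in each of its triplicated channels; the sketch shows a single comparison in a single channel.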

We use it also for operator displays, the interface to the operator in the main control room, triplicated channels of information. Both in the main control room and in CANDU designs, we have a secondary control area for the seismic events, as well.

It's also used for operator aided testing. In CANDU, we've used the philosophy that we test the shutdown system right from the transducer to the final elements, and we do that through actually inputting the pressure signals and pressure transducers and checking that the channel, in fact, does trip.

MR. LEWIS: I wonder if I could interject a couple of questions. One is when you use the term computer all through this, do you mean digital computer or are you just using the term computer generically?

MR. ICHIYEN: In all these cases, they're general purpose computers, except for this one which is a PLC.

MR. LEWIS: I understand they're general purpose, but are they digital or analog?

MR. ICHIYEN: Digital, yes.

MR. LEWIS: They're all digital.

MR. ICHIYEN: Yes.

MR. LEWIS: Second question. For example, this last issue of testing of the shutdown system; when you say testing, you mean put in -- you said put in the pressure signals and temperature signal and what have you and see that the system works.

But you don't try all conceivable malfunctions within the system to test it.

MR. ICHIYEN: No. Historically, this is a periodic testing. Again, moving back a bit, in Canada, there's a requirement to show that the unavailability of the shutdown system meets ten-to-the-minus-three. In order to do that, you have to do periodic testing largely, historically, to detect hardware faults that have occurred since the last time of testing.
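The link between periodic testing and an unavailability target can be sketched with the common textbook approximation that a standby component with a constant rate of undetected failures, tested at a fixed interval, has a mean unavailability of about half the product of the failure rate and the test interval. That approximation and the failure rate used below are assumptions for illustration; the speaker does not say how the Canadian requirement is actually demonstrated.

```python
def mean_unavailability(failure_rate_per_year: float, test_interval_years: float) -> float:
    """Approximate mean unavailability of a periodically tested standby component.

    Uses the textbook approximation q = lambda * T / 2, which is reasonable
    only when lambda * T is much smaller than 1.
    """
    return failure_rate_per_year * test_interval_years / 2.0


def max_test_interval(failure_rate_per_year: float, target: float = 1e-3) -> float:
    """Longest test interval (in years) that still meets the unavailability target."""
    return 2.0 * target / failure_rate_per_year


if __name__ == "__main__":
    rate = 0.05  # hypothetical undetected-failure rate, per year
    interval = max_test_interval(rate)
    print(f"test at least every {interval * 365:.1f} days")
    print(f"resulting unavailability: {mean_unavailability(rate, interval):.1e}")
```

The design choice the sketch highlights is that, for a given hardware failure rate, the test interval is the lever available to the operator for meeting the target.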

MR. LEWIS: I'm just seizing this example to ask a deeper question. By testing, what you mean is assuring that the system will perform as required if it gets the expected malfunction signals, but not simulating failures in the computer that could generate off-line strange signals. That you don't do. That's V&V, which you will come to, presumably.

MR. ICHIYEN: That's right. We do things like self-checks and self-tests in the computers to try and get at that aspect of it. Periodic tests are -- for hardware, they're pretty thorough because you actually -- as far as the shutdown system knows, it can't distinguish whether this is an actual challenge from an event or a test.

We don't test just the processor, for example.

MR. LEWIS: I understand.

MR. ICHIYEN: In the past this was manually done and now we have it -- those controls are controlled by a separate computer called the Safety System Monitor Computer. As I mentioned, we did, earlier on Bruce, monitoring important shutdown system variables, through a separate computer again, and that's also done on Darlington.

On Darlington, another safety system that had a degree of computerization was in the emergency core cooling system where we use the OH-180s for discrete logic control. I did have one slide to show you a picture of what the Darlington control room looks like. Actually, I've got to flip backwards.

When I mentioned before the shutdown system interface, those are these 12 CRTs; one for Shutdown System 1, one for Shutdown System 2. You can see some of the others. I think that's another safety system, emergency core cooling, and that doesn't have the digital display. So it's marked contrast from what we do with the shutdown systems.

All I'm trying to point out here is that we have a fairly high degree of reliance on the computer interface through the CRTs, the annunciation and the data logging.

[Slide.]

MR. ICHIYEN: Moving on from Darlington, as Lewis said, the reactor design that I'm associated with now is the CANDU-3. It's really our next generation CANDU after the Darlington station. I've just listed some of the features here, one of which is a very short construction schedule. That aspect of it is a contributor to the directions that we're going in C&I, and I'll talk about some of those later when I talk about the features of the CANDU-3.

I'll just run through these quickly. Modular design construction techniques which most of the other competitors are using in order to meet this construction schedule; an interesting feature we call 100-year life, not that all components are going to last 100 years, but our target is to have everything replaceable.

On previous CANDUs, it was not aimed for a rapid fuel channel replacement. On CANDU-3, we are aiming at that target as part of this overall target of a 90-day outage within which we should be able to replace all equipment, including steam generators.

As I said, this is a target. Currently, the fuel channel replacement takes longer than 90 days. We're talking about four to five months is what we currently see now for a complete retubing, defueling and refueling and the whole process. That's the current state of the design and we're still trying to get that down to within that target.

[Slide.]

MR. ICHIYEN: I thought I'd talk now, since we're talking CANDU-3, how the digital systems are evolving on CANDU-3 from Darlington. So I'm using Darlington as a reference and I will describe what features on the CANDU-3 are different from it. With respect to control, as I said before, Darlington used a dual redundant hot standby kind of configuration for the control systems.

Now we're moving from this redundant central type system to what I call a true distributed control system architecture. There are two features that I feel are the true distributed control system architecture. One is geographic distribution where the processors are distributed throughout the plant in the areas where they have the applications.

The second feature is what I call closing the loop over the highway. A lot of vendors' products that are called distributed control are really distributed processing and a lot of them don't do the closing of the loop over the highway. What I mean by that is if an input signal is taken at one location but is needed for a control function in a different processor, then that information, in our concept, is done by sending it over the highway for use at the second processor.
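The idea of closing the loop over the highway can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the CANDU-3 design: `DataHighway`, `InputProcessor`, `ControlProcessor`, the tag name, and the proportional control law are all hypothetical. The point is only that the processor computing the control output never reads the sensor itself; it uses the value another processor placed on the shared highway.

```python
class DataHighway:
    """Hypothetical shared data highway holding the latest value for each tag."""

    def __init__(self) -> None:
        self._latest: dict[str, float] = {}

    def publish(self, tag: str, value: float) -> None:
        self._latest[tag] = value

    def read(self, tag: str) -> float:
        return self._latest[tag]


class InputProcessor:
    """Processor located near the sensor; it only acquires and publishes."""

    def __init__(self, highway: DataHighway, tag: str) -> None:
        self.highway = highway
        self.tag = tag

    def scan(self, raw_value: float) -> None:
        self.highway.publish(self.tag, raw_value)


class ControlProcessor:
    """Processor elsewhere in the plant that closes the loop over the highway."""

    def __init__(self, highway: DataHighway, tag: str, setpoint: float, gain: float) -> None:
        self.highway = highway
        self.tag = tag
        self.setpoint = setpoint
        self.gain = gain

    def step(self) -> float:
        measured = self.highway.read(self.tag)  # value acquired by another processor
        return self.gain * (self.setpoint - measured)  # simple proportional action


if __name__ == "__main__":
    highway = DataHighway()
    sensor_side = InputProcessor(highway, "level")
    controller = ControlProcessor(highway, "level", setpoint=10.0, gain=0.5)
    sensor_side.scan(8.0)
    print(controller.step())
```

In a real system the highway would also carry timestamps and data-quality flags so the controlling processor can tell when a value is stale; those details are omitted here.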

The other area where we're evolving from Darlington is in the operator interface, where we have the central system that did control and display. We're splitting that functionality up so that there is a separate what I'll call plant display system which is responsible for the operator interface for presentation of information in the main control room.

On safety systems where we're evolving to is a higher degree of computerization, using more systems. Emergency core cooling, as I said, Darlington used it for the device logic, but not for the displays. Here we'll use all functions using computers. The other area is software practices. We feel that we are evolving in these practices and those are the concepts we're going to use for CANDU-3.

Largely the rest of my talk deals with this aspect, the software practices that we see we are evolving to.

[Slide.]

MR. ICHIYEN: The agenda said that you wanted to talk about V&V and software reliability. The way I said I would do it is talk about the whole process in general. As I said, I will concentrate on safety-critical software in order to focus on this in the time that I do rather than talk at even higher levels of generality.

What I will describe is our overall approach for producing reliable software. We don't feel that any one component or factor is sufficient and we use an overall approach. I will discuss the part played by V&V, software reliability measures and so on.

[Slide.]

MR. ICHIYEN: Again, in order to say where we're going to, it's important to spend a little time talking about where we come from and our experiences on Darlington are very relevant in the directions that we're taking, especially with respect to safety-critical software.

I think for those of you who are unaware, we have been having a dialogue, I'll say, with our Atomic Energy Control Board from about 1985 to 1990 about the licensing of these shutdown systems and particularly the software. I could spend about a day talking about all the issues in sequence, but that wouldn't serve the purpose here.

I'd like to characterize it with a few features. One was it was an extremely drawn out licensing process. The regulatory group started off with one set of concerns and the issues kept changing over the years. So we would react and make some changes. This process was a very, very long drawn out affair.

There were a different set of issues that started in 1987 when the Control Board hired a consultant, Dr. Parnas from Queen's University. I think a lot of you are probably familiar with his name. He's quite a well known authority in the area of software and reliability of software and his name is associated with the Star Wars designs in the U.S. earlier. He is a very competent and knowledgeable person and the Control Board utilized his services.

In terms of conclusions -- I should first say what actions we took coming out of that experience on the licensing. Again, these are just some of the major highlights. There are a whole lot of actions that we ended up taking, but the characteristic ones are the ones here.

Again, back to the name Dr. Parnas, this is one of the things that he instigated. We, what I called, back-engineered a software design specification using mathematical notation, which is known as formal methods in the industry.

Previously we had used an English language specification, a functional specification. I think it's fairly well agreed that a lot of errors that do occur start with that English language specification. English is not a very precise language. It's quite ambiguous. A lot of the errors that are made in any software engineering process either are errors in the functional requirements or errors in interpretation.

Either way, use of a mathematical notation was felt to help understanding and other features, and I'll go into those later in more detail.

MR. LEWIS: Is this a matter of mapping the output to the input using the Backus-Naur notation or are we talking about something else?

MR. ICHIYEN: What we did was we took the English language specification and turned it into the mathematical notation.

MR. LEWIS: I'm talking about which mathematical notation was used.

MR. ICHIYEN: In this case, we used Dr. Parnas' particular notations.

MR. LEWIS: His own idiosyncratic, not one that is used by anyone else?

MR. ICHIYEN: Probably that's true, yes. It's not a standard notation.

MR. LEWIS: I see. Thank you.

MR. WILKINS: Is it at least published by him someplace?

MR. ICHIYEN: He's got a number of papers that are published. He's probably one of the more prolific writers in the field.

MR. LEWIS: I didn't know he used a notation different from the one other computer scientists use. That's news to me.

MR. ICHIYEN: I think in terms of notation, maybe I'm causing some confusion. He uses certain notation to describe constants, variables and what exact notation is used for that probably isn't that critical.

MR. LEWIS: The one that is used by most computer scientists is called the Backus-Naur notation. Never mind. We'll go into that later.

MR. ICHIYEN: The second action that was taken began really through Dr. Parnas' involvement. It was to establish a walk-through in order to verify that the code met the formal software design specifications. In theory this was a doable job, but neither he nor others had really worked out the practical aspects of it. So that was one of the things that took a lot of time, was working out how we do this in practice.

It involved creating new techniques that hadn't been used before, creating what we call program function tables from the code and comparing to the mathematical notations at the beginning. It involved establishing techniques and methodologies for doing that and how you compare them.
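As a rough illustration of comparing a table derived from code against a tabular specification, the Python sketch below represents each table as rows of (condition, result) and checks that both give the same result on a set of sample inputs. This is an assumption-laden toy: it does not reproduce Dr. Parnas' notation or the Darlington walk-through, which was a manual review rather than a sampling check, and the names `SPEC_TABLE`, `CODE_TABLE`, `evaluate_table`, `tables_agree`, and the setpoint are hypothetical.

```python
from typing import Callable, Sequence

Row = tuple[Callable[[float], bool], str]  # (condition on the input, resulting output)


def evaluate_table(table: Sequence[Row], x: float) -> str:
    """Return the result of the single row whose condition holds for x."""
    matches = [result for condition, result in table if condition(x)]
    if len(matches) != 1:
        # A well-formed table covers every input exactly once.
        raise ValueError(f"{len(matches)} rows match input {x!r}; expected exactly 1")
    return matches[0]


# Hypothetical specification: trip when pressure is below 8.0 (illustrative units).
SPEC_TABLE: list[Row] = [
    (lambda p: p < 8.0, "TRIP"),
    (lambda p: p >= 8.0, "NO_TRIP"),
]

# Hypothetical table a reviewer might derive by reading the code's branches.
CODE_TABLE: list[Row] = [
    (lambda p: p >= 8.0, "NO_TRIP"),
    (lambda p: not (p >= 8.0), "TRIP"),
]


def tables_agree(spec: Sequence[Row], derived: Sequence[Row], samples: Sequence[float]) -> bool:
    """Check that both tables give the same result for every sample input."""
    return all(evaluate_table(spec, x) == evaluate_table(derived, x) for x in samples)


if __name__ == "__main__":
    print(tables_agree(SPEC_TABLE, CODE_TABLE, [0.0, 7.9, 8.0, 8.1, 20.0]))
```

Sampling can only expose disagreements at the points tried, which is one reason a systematic comparison of the two tables' conditions, as in a walk-through, is the stronger form of evidence.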

1

| 2  | The third significant action is a random testing             |
|----|--------------------------------------------------------------|
| 3  | program. I won't go into more here. I'll talk about it       |
| 4  | later as we go. It's testing the processor. So it's          |
| 5  | putting inputs in, but it's not testing the whole system as  |
| 6  | an integrated system.                                        |
| 7  | MR. MICHELSON: Ivan, you need to use your                    |
| 8  | microphone.                                                  |
| 9  | MR. ICHIYEN: Sorry?                                          |
| 10 | MR. CATTON: I was being chastised for not                    |
| 11 | speaking into the microphone.                                |
| 12 | MR. LEWIS: I wonder if I could ask one question.             |
| 13 | In the specifications, do you distinguish between reliable   |
| 14 | operation and graceful failure modes when the system fails?  |
| 15 | That is do you go to the next level of assuring graceful     |
| 16 | failure when there are hardware failures?                    |
| 17 | MR. ICHIYEN: In the Darlington system, this is               |
| 18 | the shutdown system and not a control function. So what      |
| 19 | we're interested in in the safety features is that it's fail |
| 20 | safe. So wherever there is any doubt, the action is to       |
| 21 | if it's in an undefined stage or something that's not right, |
| 22 | we go to the trip state.                                     |
| 23 | MR. LEWIS: What I'm suggesting is, and that's why            |

MR. LEWIS: What I'm suggesting is, and that's why I was extremely unhappy to learn that we'd separated hardware from software in this meeting, the software has to be written in such a way as to accommodate hardware failures, in such a way that it leads to a graceful failure of the overall system.

It's the job of the software to do that. Is that within the specs that you laid down on the system?

MR. ICHIYEN: I wasn't that familiar with the details.

MR. LEWIS: All right. Thank you.

MR. ICHIYEN: I could check into it.

MR. SHEWMON: A variant to that, which I will bring up maybe twice, but at least later, has to do with the Rancho Seco failure in which there was a power supply failure, which is a variety of equipment failure that may not have been safety primarily. But how the system copes with something like that will come up also.

MR. ICHIYEN: Power supply failures?

MR. SHEWMON: Yes.

MR. ICHIYEN: To the computers?

MR. SHEWMON: Well, this was a power supply to instrumentation and control in that case and then different systems went in different directions and the operator wasn't sure where they were. So he had a lot of problems.

MR. ICHIYEN: In the CANDU designs, which is probably the same in other designs, the power supplies for the safety systems are separate from the control systems. We have a complete separation of safety and control. There is no functional or equipment connections in any sort.

We use triplicated channels in the shutdown systems and there's two or three --

MR. SHEWMON: Fine. I'll ask the question when we come to control and how it might interact with safety, then.

[Slide.]

MR. ICHIYEN: The real issue that came out of the Darlington licensing experience, in our minds, was the lack of an accepted definition of the acceptable quality that software has to have in order to be approved by our Atomic Energy Control Board. As I said before, the issues kept changing. There was a lot of subjectivity and we feel that the real cause of that was that it wasn't a real de facto standard or real standard which set the requirements for what is required.

So our objective now is to create a set of standards, procedures and guidelines for software engineering, over all categories of software. Our first task is --

MR. CATTON: How do you define acceptable quality?

MR. WILKINS: Next two lines. It has to be approved by the AECB. Of course, you can ask a different question. How do you assure that it works? That's a different matter.

MR. LEWIS: And still a third matter is how to be sure that when it doesn't work, it doesn't work in a relatively benign way, which is the third matter.

MR. ICHIYEN: As I said, our first task is the creation of this set for safety-critical software. We're starting with that one because after all is said and done and the plant is licensed, the Control Board has said the effort to license the Darlington system required a lot of expertise and individual effort and a lot of subjectivity even after having the formal specifications and the walk-throughs and so on.

So what they're setting as an objective for Ontario Hydro is to redesign the software over a period of time of five to six years so that these problems do not occur again. That's one of the main reasons for creating this set of standards, so that we don't go into these same problems again. We want to get an agreed set of requirements that we can both work towards and then use that as the reference.

MR. CARROLL: The need to create a set of standards in this area suggests that there are not adequate standards already. What is your view of the existing U.S. standards that are in the software area?

MR. ICHIYEN: I haven't read all of the U.S. standards and so on. From what they're telling me, that the standards are not aimed at the kind of applications or the techniques that we see or our requirements that we see.

There are a lot of good features in the standards. Some are more prescriptive and too low a level of detail. What we're trying to achieve is a higher level standard which, in a lot of ways, is methodology independent. We're trying to specify what the core requirements are in a way that the Atomic Energy Control Board and the utilities and ourselves can agree with, and then we work on our methodologies.

We present these methodologies to the Control Board and say does this meet the requirements as stated in the standard. I'm saying we're trying to make it methodology independent, but you really can't do that in all areas. What we're trying to do is not limit the methodologies unnecessarily.

I think maybe you'll get some of that picture when I talk about some of our main principles to see whether those are embodied in the other standards.

[Slide.]

MR. ICHIYEN: There are really four parts to our framework that we're developing. One is a categorization criteria. I think everybody would agree that software has different categories in terms of what's required and what kind of assurance requirements do you have for it, some of which has no impact on safety, some of which is very important to safety, like a shutdown system, which is the last line of defense if there is an event.

The difficulty here is how do you quantify the attributes of what constitutes a category, how many categories, what do you do with those categories. It's clear the highest level being safety critical is an easy one to define. It gets a little fuzzier as you go through the lower levels, like control software, monitoring software, non-real time. It gets well beyond the plant software. You get into analysis software and so on.

But we're starting with the safety critical one and working down from there.

MR. CARROLL: This standard would be broader than just control and protection. It would go into --

MR. ICHIYEN: Analysis software.

MR. CARROLL: Analysis kind of software.

MR. ICHIYEN: We see it as a family of standards. For the real time ones, there should be a very close relationship. So we're writing the safety critical one first which, as I said, has been just issued for our own internal use. We are now working on the other categories. Actually, I've already jumped ahead to my next slide.

[Slide.]

MR. ICHIYEN: Again, the four parts of this framework are the categorization criteria, the high level standard which I've already described being largely aimed at being methodology independent, and then what a lot of people call sub-tier standards, which are a lot of the details and how-tos and specifics and things which are methodology-specific.

Another aspect of what to do with developed software, which is software that you purchase.

MR. KERR: Can you give me an example of a categorization criterion?

MR. ICHIYEN: Pardon?

MR. KERR: Can you give me an example of a categorization criterion?

MR. ICHIYEN: Largely, the definition for safety critical, and I'll try and say this, safety critical software would be software ... a system whose failure could lead directly to a significant release of radiation to the public or to the plant operations.

MR. KERR: Thank you.

MR. LEWIS: In this effort, which is more or less a start from the bottom effort, did you bring in consultants from other industries who have computerized themselves over the years, not just Parnas who is a well known person in the business, but, for example, the 767 airplane, Boeing, is a very highly computerized airplane. I would judge that it's probably a factor of two less complicated than a nuclear power plant, but not a factor of ten.

The telephone company is probably a factor of ten or 100 more complicated. So there's plenty of industrial experience with people who are trying to make very large computer systems work reliably and in a fail safe mode. Did those people get brought into this effort?

MR. ICHIYEN: We did what we thought was a fairly extensive survey of not only the nuclear industry, but, as you say, other industries. There is one distinction, actually. With the way we do our designs, we felt that the functionality of the trip functions that are true safety critical are not very complicated and the order of magnitude is more like ten or 100 less than some of these larger applications.

The number of lines of code in a Darlington trip computer is -- some people will say it's 3,000; some people, if you take out the comments, you're down to 700. It's not a lot of lines of code. We feel that for that specific kind of application, a lot of the applications that are used for larger systems are not as precise or don't give as high degree of confidence and the reliability that we feel that the methods for smaller systems can do.

In Canada, there's a telecommunications organization, Bell-Northern, and all the people, as you said, the telephone industry are in conjunction with Dr. Parnas. So they're doing work with him in that area, as well. So they're using techniques that are similar to a large degree.

Within the nuclear industry, we did a lot of talking to the people in the U.K. Actually, not in the nuclear industry, but in the formal methods area. We've talked to other people, France, the Westinghouse people and what other vendors are doing in their software practices.

Having surveyed all of that, we feel that for our application the technology or the methodology that Dr. Parnas is refining, I'll say, is the one we feel the most comfortable with that will do the job for us. The thing about Dr. Parnas is that he hasn't defined what we think are workable methodologies.

He's been working on a lot of these things for quite a long time. Things like trace specifications he's been talking about for ten or twelve years. But it's only when you get into real situations that you have to make the methodologies work and you extend the boundaries of the knowledge of a system.

So we've been working with these systems. In fact, we haven't defined our methodology yet. Again, I'm jumping ahead, but I'll say it now. We expect by the fall or mid-year of 1991 to have completed our studies on the methodologies. AECL and Ontario Hydro are working quite closely in this respect.

MR. LEWIS: The reason I started with the 767 example is that the complexity of the 767 is really not all that far from the complexity of a nuclear power plant. Certainly both of them are well below the Star Wars system. Even in the great Star Wars controversy, Parnas was really pretty much a minority of one on that advisory committee.

So he doesn't reflect at least the majority sentiment of American -- forgive me -- of U.S. computer scientists. But please go on. I'm slowing you down and you're running behind.

[Slide.]

MR. ICHIYEN: What I thought I'd do is talk about the high level standard and its features. As I said, this will serve to highlight some of the principles that we feel we are using in the future. The high level standard defines the requirements on the software engineering process, defines the outputs of that process. It defines the requirements to be met by each output. What we try and do is to specify this as measurable as possible, but, as I said earlier, not necessarily to constrain the methodology to produce the output.

[Slide.]

MR. ICHIYEN: In terms of fundamental principles of this high level standard, I'm going to talk about five items. The first is the use of the documentation and the mathematical notation. We feel the documentation must describe the required behavior of the software using mathematical functions written in a notation that has clearly defined syntax and semantics.

We feel that using this kind of notation, you end up with more complete requirements. Since it's mathematical, you can verify that the complete domain is covered and you can check it. Using mathematical notation, as well, requirements can now be uniquely interpreted. Dr. Parnas uses this example of statements. One requirement could be that you shut off the pumps if the water in the tank is over the setpoint for four seconds. In English, that can be interpreted in any number of ways.

If you're the software developer, you have to say, well, what does he mean; is that the root mean square of the level, the average level over the four seconds, the median of the level, when the minimum level is over for four seconds. If you do it mathematically, there's no ambiguity. You know what the requirement really is.
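The ambiguity in the pump example can be made concrete with a short sketch. The four readings and the setpoint of 10 below are made-up numbers chosen only so that the interpretations disagree, and the function names are hypothetical. Each function is one precise reading of "over the setpoint for four seconds", applied to one sample per second.

```python
import math
import statistics


def over_by_mean(window, setpoint):
    """Average level over the window exceeds the setpoint."""
    return statistics.mean(window) > setpoint


def over_by_median(window, setpoint):
    """Median level over the window exceeds the setpoint."""
    return statistics.median(window) > setpoint


def over_by_rms(window, setpoint):
    """Root mean square of the level exceeds the setpoint."""
    return math.sqrt(sum(x * x for x in window) / len(window)) > setpoint


def over_by_minimum(window, setpoint):
    """Every sample in the window exceeds the setpoint."""
    return min(window) > setpoint


def interpretations(window, setpoint):
    """Evaluate all four readings of the English requirement."""
    return {
        "mean": over_by_mean(window, setpoint),
        "median": over_by_median(window, setpoint),
        "rms": over_by_rms(window, setpoint),
        "minimum": over_by_minimum(window, setpoint),
    }
```

For the window `[9, 12, 12, 11]` and a setpoint of 10, the mean, median, and root-mean-square readings all say the pumps should be shut off, while the minimum reading says they should not. A mathematical statement of the requirement picks exactly one of these.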

MR. WILKINS: Of course, you may not know that the requirement is relevant.

MR. ICHIYEN: Yes. But you could tell if it's right or wrong.

MR. WILKINS: No. Whether it's been met.

MR. ICHIYEN: Other people can review it to see whether it's also correct as stated. One of the problems in the English language is that one person interprets it saying that's right, that's what I understand it should be, but that's not what the author meant and it may not be what the software designer actually implemented.

Again, using mathematical notation facilitates use of mathematical verification techniques. That allows the design to be transformed into mathematical functions for comparison to the requirements directly. We've worked with this for quite a long time. Mainly Hydro has been doing a lot of this work.

We've got it to the point we think it's an actually doable task. We're at the point where we're about to start developing tools that will help us with this process.

MR. LEWIS: I hate to be a troublemaker, but I've been counting pages. You have about ten minutes to go in your allotted time.

MR. ICHIYEN: I may drop the Bruce fueling machine thing. It's been published and talked about, I think, to death and maybe you can ask me questions rather than my talking about.

[Slide.]

MR. ICHIYEN: The second fundamental principle is that the outputs from each development process must be reviewed to verify they comply with the requirements specified in the inputs to that process. What I mean by that is if you think of the development process in three stages; the requirements specification, the software design, and then the coding; the verification process is on each of these outputs.

You verify that the outputs comply with the requirements as specified on the inputs. Where you use mathematical functions, you can verify these against the inputs using mathematical verification techniques.

[Slide.]

MR. ICHIYEN: The third principle is on the use of information hiding. I won't go into that in a lot of detail, other than to say that the -- really what it is, in simple terms, is that we try and -- this is for the effort for maintainability. What you try and do is to assess what areas in the code or things are likely to change over the history of operation and you use this as a guide to how you do your software design into your modules and so on.

In addition, you try and design the interfaces to the modules, to reveal as little as possible about the modules' internal workings. This is, again, to help the maintenance aspect.

[Slide.]

MR. ICHIYEN: Along with the verification and the specifications, we feel that in terms of testing you need to do both systematic and random testing. By systematic testing, the normal things that are done in the industry, you characterize it as white box or black box testing. In white box, you understand, you know what the internal workings of the code are and you test based on that and try and look at discontinuities and so on.

Black box, you treat it as a black box where you don't know the workings of it. You check that the outputs match the requirements that are stated in the requirements. Thirdly, we want to add random testing. We call it statistically valid random testing, which is a contentious issue in the software area.
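A rough sketch of random testing against a requirements oracle follows. The transcript does not say how the "statistically valid" claim was established at Darlington, so the bound used here is an assumption: the standard result that, after `n` independent trials with zero failures, the per-demand failure probability is below `1 - (1 - c)**(1/n)` with confidence `c`. The oracle, the stand-in implementation, and the uniform input distribution are all hypothetical.

```python
import random


def required_trip(level, setpoint):
    """Oracle taken from the (hypothetical) requirement: trip at or above setpoint."""
    return level >= setpoint


def implementation(level, setpoint):
    """Stand-in for the code under test (hypothetical)."""
    return not (level < setpoint)


def random_test(n_trials, seed=0):
    """Draw random inputs and count disagreements with the oracle."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_trials):
        level = rng.uniform(0.0, 100.0)
        setpoint = rng.uniform(0.0, 100.0)
        if implementation(level, setpoint) != required_trip(level, setpoint):
            failures += 1
    return failures


def failure_rate_upper_bound(n_trials, confidence=0.95):
    """Upper bound on failure probability, valid only if zero failures were seen."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)
```

With 3,000 failure-free trials the bound comes out just under one failure per thousand demands at 95 percent confidence. The claim is only as good as the match between the random input distribution and the inputs the system would see in service.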

MR. CATTON: Is this a good place to ask the hardware question again?

MR. ICHIYEN: Could you state that one again.

MR. CATTON: There's a school of thought that says when you're dealing with embedded software systems, you have to test the system, which means software and hardware, if you want to come to some meaningful conclusion about its reliability.

I haven't seen you mention anything about hardware yet.

MR. ICHIYEN: At least in the past for Darlington, we do the integration testing which is how you test the system as a whole. It isn't meant to be as exhaustive as the kinds of testing we do here. In the specifications, if you do the specification right, you're specifying what the computer system must do and then you break it into what the software must do.

If you do the verification right, then the exhaustive testing is really on the software. That's where we've been doing in the past.

MR. CATTON: I know that at NASA-Dryden, they're very interested in taking one of the computers that's tied up with data evaluation and coupling the whole system together and putting -- trying to figure out what would be a nice set of signals to give it that are a little bit out of sync with what they should be to see how the whole system operates.

They actually have people trying to figure out what would be a good set of inputs to really test the system. This is not nearly as critical as your shutdown system.

MR. ICHIYEN: When you say testing the system, meaning --

MR. CATTON: They have some hardware. Actually it's from a pilot in the aircraft sending information to the ground, processed by a computer, and there's a human being in the middle. The information is sent back to the airplane and the pilot has to take an action.

They're seriously trying to figure out how to test that system in one piece.

MR. ICHIYEN: Maybe I need to describe the configuration, because it's not a complicated configuration. In fact, we are testing -- I hope I have a slide of it here.

MR. KERR: Ivan, could you restate your question? I don't understand it.

MR. CATTON: I'm not sure I do either. The question has to do with testing the system, the software drives hardware. Some people feel that if you're going to establish a reliability for this system that includes computer software, you have to look at the whole thing. They even have a name for it now. It's called embedded systems where the software --

MR. KERR: Given that that may be valid, is it impossible to establish standards for software and establish standards for the total system and test them separately?

MR. CATTON: Personally, I don't know, but some people feel that you eventually have to do the test on the whole system.

MR. KERR: I haven't heard him dispute that. But he is not --

MR. CATTON: I don't know if he's going to dispute it or not. I asked.

MR. ICHIYEN: I think you need to understand the configuration of it because it isn't a very complicated configuration. This is actually a fundamental principle we've tried to use on safety systems or safety critical systems. We make it as simple as possible.

[Slide.]

MR. ICHIYEN: These are what I call the safety critical pieces of the boxes, which are the trip decision functions. These are display computers which are used as the interface to the operator. What we do in testing is we take this as a system. We know the inputs to it, we know the outputs. We isolate that. We test that with varying degrees of testing. Software is done in smaller chunks than unit testing, so on, building up to the random testing that I talked about, for example, would test this whole unit in a random way.

Also, we do systematic tests. So I think, in essence, I am saying that we do test this aspect of it. We don't feel we need to test the whole system which is the operator interface, the displays on the monitor system, displays in the control room, to the same degree. We do test them, but to a lower level or higher level, depending on which way you're looking at it.

MR. CATTON: Thank you.

MR. ICHIYEN: I'll just move quickly to the last one. Another item that we feel is a fundamental principle in our safety critical software design is the use of hazard analysis. Again, I'm pointing this out because we feel that there's not just one technique that has to be used for safety critical software.

Hazard analysis is another tool in our tool chest that we use to have a higher assurance of the quality of the software. A simple definition of hazard analysis is that you identify failure modes that lead to an unsafe action and eliminate them or ensure the failure mode can be detected and the system put into a safe state.

Dr. Nancy Leveson, I guess, is the key proponent of this. I think she calls it software fault tree analysis and I guess we've coined the term hazard analysis, and I think she probably uses the same term. She was hired by Ontario Hydro as a consultant during the Darlington licensing period. This is one of the extra features that she brought to the design process.

Basically what she's doing is using fault tree techniques that are used in hardware, applying them to software, looking for events, failure modes, and then seeing what things in the software have to happen in order that you can mitigate that event.

In my mind, what it does is it gives you more robustness to your software design as opposed to just meeting strict functional requirements.

MR. CATTON: I, today or actually last night on the airplane, read the results of the workshop that I've handed out here. They maintain that using the techniques that you would for hardware, which usually means random failure for software, is incorrect. Really what you ought to do is go back and use the approach that there's a possible design error, which is different, because if there's something wrong with the software, it's a human error somewhere, most likely. The answers you get out of it are different depending on the approach.

MR. ICHIYEN: It's identifying what are the others, and those are the key parts of any fault tree analysis. You don't have to identify how those failure modes can occur necessarily. What you try and do is to mitigate those occurrences.

MR. KERR: Ivan, under the theory of statistically valid random testing slides that he has, there is a distinction between at least a definition of reliability for hardware and software.

MR. ICHIYEN: I skipped over those in the interest of time. I don't know if you want me to go back.

MR. CATTON: I don't know if that statement I made was right. I just read it. I was looking for your response.

MR. LEWIS: We'll take a vote later about whether you're right, Ivan. But I am going to be brutal and try to wrap us up by 9:30 so we can keep on schedule.

[Slide.]

MR. ICHIYEN: In terms of overall status, as I said, the safety critical high level standard has just been issued for use internally to AECL and Ontario Hydro. We plan to have the sub-tier standards, procedures and guidelines to be completed by the end of 1991.

The methodologies that we're using, safety critical d configurations for systems, we're planning for mid-1991 and that will probably be slipping till probably in the fall. But it's in that order of magnitude in terms of schedule. We're also working on other categories and standards for those categories with really an undefined closure date as yet.

MR. LEWIS: I'm going to thank you and assume that we can skip talking about the Bruce incident in the interest of staying on time. I know we would be very interested in it and I know we would, therefore, spend at least a half-hour on it. The best way to prevent that is at the beginning. Our purpose here today, I hope you understand from our questioning, is not in any way to provide unwelcome advice to our Canadian friends, but to learn from your experience while we try to advise our American friends.

So I thank you very much for your presentation. It was very informative. I think we should just go on. I'm told that our next speaker is Ken Scarola, is that correct?

MR. SCAROLA: That is correct.

[Slide.]

MR. SCAROLA: Good morning, gentlemen. My name is Ken Scarola. I am the Manager of Advanced Control Complex Engineering at ABB/Combustion Engineering. I'll be talking about software reliability issues for NUPLEX 80-Plus which is the advanced control complex being used by CE for the System 80-Plus ALWR.

I might add that the NUPLEX 80-Plus advanced control complex is also being used for the heavy water reactor, NPR, at this point in time. To address very briefly what I heard about the relationship of hardware and software, I would say that CE would agree 100 percent that these are not separable issues. In fact, I think most of the industry does agree.

You will see, as I present our verification and validation approach, we definitely use V&V as an integrated process on a system basis and then an entire control complex basis. So these are definitely not separable issues.

MR. KERR: Does complex go with control or with engineering in that slide?

MR. SCAROLA: Good point. Probably both. What I'd like to do is address the issues that we believe are the main contributors to software reliability. There are numerous issues. Software reliability does not stand on any one particular issue. It's a building block defense-in-depth approach and I think all things must be considered.

Certainly we talked about hardware reliability yesterday. What you're going to see in my slides is many of those same points are now repeated here. In fact, I do have some slides that I will throw in that may not be in your handouts that I used yesterday and I will be able to get you copies of them.

Basically these are the software reliability issues that I will address. What I would like to do is discuss CE's experience basically in between here before I talk about the software design process. I'm going to rearrange that from what's in your handout. That will help, I think, set the framework for the software design process because it's based on our experience. Those slides are misplaced.

[Slide.]

MR. SCAROLA: The first subject that I would like to talk about is what we call deterministic design. The most important part of any software-based system is its simplicity and your ability to prove that it works. When we talk about computer-based systems, they range all over the map. When we talk about systems for a 747 or something like that, I would have to maintain that they are significantly more complex than the types of software that we design for nuclear power plant protection systems.

To give you an idea of what we mean by deterministic designs, in our protection systems, the inputs are scanned and processed continuously. There is nothing like what you do in a complex system where you report data changes by exception and then, when the particular data changes, then you process it for that change. That's not the case.

In our protection systems, we look at the data with every cycle and, in our protection systems, we have less than a 50 millisecond cycle. That data is processed all the time. That's regardless of the state change of the data. Now, that can't be done in or it's difficult to do in very large complex computer systems because you cannot get the performance out of the system.

But if you look at a protection system for a nuclear power plant, and specifically CE's system, there are 10 inputs. There is one output, reactor trip. That's not the case for things like DNBR and local power density, but all of the other trips are, in fact, that simple. It's because of that simplicity that we can run the system on a continuous scan/continuous process basis. So there are no surprises when things change.

Similarly, the outputs are updated on a continuous cycle. The outputs are not simply updated when the process logic says they need to be. The outputs are always updated, which means that in a protection system, 99.999 percent of the time, the output is updated saying don't trip, don't trip, don't trip.

On the one cycle when it doesn't get that update, it trips. That's basically how the system works. Now, all of the programs are run on a continuous basis, meaning there is no multi-tasking as you would see in most large computer systems. The system does not run with interrupts. All the data is processed on a continuous cycle basis.
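The "don't trip, don't trip, don't trip" behavior can be sketched as an output that must be refreshed on every cycle and that latches a trip the first time a refresh is missing. This is an illustration only, assuming a single hypothetical input with a trip on high value; the class and function names are not from the CE design, and real cycle timing is not modeled.

```python
class TripOutput:
    """Output that latches a trip unless told 'don't trip' every cycle."""

    def __init__(self):
        self.tripped = False
        self._refreshed = False

    def refresh_no_trip(self):
        """Record that this cycle's logic positively said 'don't trip'."""
        self._refreshed = True

    def end_of_cycle(self):
        """Close the cycle: a missing refresh becomes a latched trip."""
        if not self._refreshed:
            self.tripped = True
        self._refreshed = False


def run_cycles(readings, setpoint):
    """Process one reading per cycle; None stands for a missing update."""
    output = TripOutput()
    for reading in readings:
        # The input is examined on every cycle, whether or not it changed.
        if reading is not None and reading < setpoint:
            output.refresh_no_trip()
        output.end_of_cycle()
    return output.tripped
```

In this arrangement a stalled program, a lost input, and a genuine trip condition all look the same to the output stage: the refresh does not arrive, so the trip occurs.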

Another important fact is where we use programmable logic controllers, which is the fundamental basic technology in our protection system, those machines run without branching, which means when they make a decision, they don't go off and do something because of that decision and then come back into the program. They make a decision, they set a flag, the program continues on, and some point later in the program on a continuous basis that flag is recognized and something gets done because of that.

That's the inherent nature of programmable logic controllers. That's not the inherent way most computers run. Most computers run with branching, with sub-routines, with calls, and you have to force them to run in a deterministic nature. In our CPCs, we do that in CE's core protection calculator because that is an inherently not deterministic computer system.

So we have to write the code in a very structured manner that forces it to run in a deterministic somewhat non-branching type of approach. For the simple part of our protection system where we look at analog variables, pressurizer pressure, steam generator levels, we run non-branching.
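The flag-setting style described above can be imitated in a few lines. This is a sketch of the coding style only: Python itself is not a programmable logic controller and does branch internally, and the variable names and setpoint values are hypothetical. Every statement in the scan executes on every cycle; decisions are recorded as flags and combined later, with no `if` statements and no early returns.

```python
def scan_once(inputs, setpoints):
    """One fixed-path scan: the same statements run in the same order each cycle."""
    # Each comparison only sets a flag; nothing is skipped based on its result.
    low_pressure = inputs["pressurizer_pressure"] < setpoints["pressurizer_pressure"]
    low_sg_level = inputs["steam_generator_level"] < setpoints["steam_generator_level"]
    # Flags are combined with a non-short-circuit operator so both are always read.
    trip = low_pressure | low_sg_level
    return {"low_pressure": low_pressure, "low_sg_level": low_sg_level, "trip": trip}
```

Because the execution path does not depend on the data, the worst-case scan time equals the typical scan time, which is what makes a fixed cycle such as the 50 milliseconds mentioned above straightforward to argue for.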

MR. LEWIS: I'm really a little bit confused by something here, a distinction you're making which appears to be important, but which I don't understand. My computer at home has a combination of software interrupts and hardware interrupts. Hardware interrupts are branching interrupts, in general, which simply tell the hardware to go off and do something else, and to do something else may involve returning to the original program or may not involve returning to the original program, depending on the character of the interrupt.

But it also has software interrupts which simply set a flag and the next scan time around to see if any interrupts have been activated. It notices whether they're there and switches off to another part of the program which may or may not involve a return to the main program, depending on what it is.

I don't see how that differs from what you're describing.

MR. SCAROLA: It does in really two senses. First of all, in your system, you're saying there are branches where you may not return to the main program.

MR. LEWIS: Sure. That's dependent on how you write the program.

MR. SCAROLA: In our safety computer systems, that is not acceptable. Where we do branches, we always return to the same point in the program. It's what we call single entry/single exit of modules. That inherently makes the software more predictable that it's going to perform the required function when you want it to.

MR. LEWIS: I don't see that because it's predictable either way. It depends on how you write the software.

MR. SCAROLA: Not necessarily. When you branch and the number of branches and the number of nests that may occur in subsequent branches, these are the reasons why software very often goes off and does unpredictable things. It does things that you are not able to anticipate, like get stuck in a loop somewhere.

MR. LEWIS: The Bruce event, which I didn't allow the previous speaker to describe, was a case in which, as I understand it, where what should have been a jump to a subroutine was instead written as an absolute jump, and that's what caused the problem. It would have been better if it had been written as a jump to a sub-routine and come back to the main program.

MR. SCAROLA: I think the point that I'm trying to make here is that the structure of the code and the methods that you use in coding are fundamental to the ability to predict the performance of the system. I agree that you can establish predictability in very complex systems. My point is it's more difficult.

MR. LEWIS: In a sense, what you are saying is you've made a decision that non-return branches are inherently safer than return branches. Is that correct?

MR. SCAROLA: No. I'm saying that --

MR. LEWIS: I'm trying to understand what you're saying.

MR. SCAROLA: I'm saying the return branch is when you branch and you return back to the same point in the code.

MR. LEWIS: Yes.

MR. SCAROLA: That is inherently more predictable performance than if you branch and subsequently branch again and subsequently branch again and may never return to the same point in the code.

MR. LEWIS: So I had it backwards. You've made a decision that only jumps to sub-routines which return to where they started are acceptable in your world.

MR. SCAROLA: Right. What I'm saying is that's the approach that we take for complex calculations, such as DNBR and local power density. For simple things, such as an analog functional trip on low steam generator level, low pressurizer pressure, the code works even more predictable than that. What I have is a slide here that's not in your package.

[Slide.]

MR. SCAROLA: This basically is a mapping of how the software executes in our programmable logic controllers independent of what the system inputs are doing, meaning the software follows this path every time. If you were to map the software execution cycle of a conventional computer, you would see that the mapping is all over the place. It zig-zags, it goes out, it comes back, it goes to many different places.

A programmable logic controller inherently runs in
a deterministic cyclical manner. It never changes its
execution.
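The deterministic, cyclical execution described here can be illustrated with a minimal Python sketch. It is not the vendor's code: the input names, units, and setpoint values are hypothetical, and a real programmable logic controller would be programmed in its own language. The point is only that every scan evaluates the same statements in the same order, whatever the inputs are doing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inputs:
    pressurizer_pressure: float  # hypothetical units: psia
    sg_level: float              # hypothetical units: percent


# Hypothetical setpoints, chosen only for illustration.
LOW_PRESSURE_SETPOINT = 1800.0
LOW_SG_LEVEL_SETPOINT = 25.0


def scan(inputs: Inputs) -> dict:
    """One scan: the same statements run in the same order every time."""
    # Both comparisons are always evaluated; no step is skipped based on data.
    low_pressure = inputs.pressurizer_pressure < LOW_PRESSURE_SETPOINT
    low_level = inputs.sg_level < LOW_SG_LEVEL_SETPOINT
    # "|" on booleans does not short-circuit, so the path never changes.
    trip = low_pressure | low_level
    return {"low_pressure": low_pressure, "low_level": low_level, "trip": trip}


def run_cycles(samples):
    """Fixed cyclic execution: read inputs, evaluate, write outputs, repeat."""
    return [scan(sample) for sample in samples]
```

Calling `run_cycles` with a list of `Inputs` samples returns one result per scan; the execution path through `scan` is identical for every sample.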

MR. LEWIS: Forgive me for being stupid, but I'm really trying to understand. It is your belief, then, that there is really no case in which it is preferable to leave the main program and never come back to it.

MR. SCAROLA: No. I can't say that, that is not true.

MR. WILKINS: Let me try something.
MR. LEWIS: Go ahead. Try to explain.
MR. WILKINS: I don't operate quite at the level
of sophistication that these guys do, but on my computer
I've got a go-sub order, and that's okay. After the go-sub
order, you return. But go-to is not okay.
MR. LEWIS: I understand that.

MR. WILKINS: That's not what he's saying?

MR. LEWIS: I think that's what he's saying. In fact, that's a common belief among computer scientists. In fact, when C was written, they originally didn't want to put the go-to into it. Now they've put it in, but they said it's strongly counter-productive.

I guess I'm asking -- there are, believe it or not, cases, using your analogy, in which go-to really is the right thing to do; that is, if you get a signal that tells you that the reactor has broken in half, you don't want to come back to the original program.

MR. SCAROLA: I'd like to move on and just point out that the simplicity of the software execution is important.

MR. LEWIS: That's certainly right. In fact --

MR. SCAROLA: We can accept that and recognizing there are many ways to make the code simple, we have selected one that we think is the simplest approach. There are others. I'd like to leave it at that.

MR. LEWIS: I couldn't agree with you more. In fact, what Ernest was saying, which was that the use of go-to is discouraged, is certainly gospel among modern computer scientists. Good programming practice does not use go-to. It uses modular systems, it uses predictable systems.

If what you're saying is that one should use good software practices, then I have no problem at all.

MR. SCAROLA: Thank you.

[Slide.]

MR. SCAROLA: The second important contributor to software reliability is the use of field-proven executive software. In NUPLEX 80-Plus, all of our software-based systems are composed of off-the-shelf commercial products with extensive field-proven industrial experience. Now, this includes programmable logic controllers, as I mentioned. We do use PC ATs. There are minicomputers, CRT workstations, etcetera.

All of these systems are bought with executive software, meaning software that has been in use in the field for handling things such as the input/output processing, the arithmetic functions, communication drivers, and failure detection inside the system itself.

This software is what you would call reusable code, a code that has extensive operating experience, thousands of applications. So we attempt to use reusable code as much as we possibly can because we believe that field experience is the best validation source.

[Slide.]

MR. SCAROLA: Now I'd like to talk about CE's experience. CE has been designing software for safety systems since the mid-1970s, basically with the core protection calculator for ANO-2. That was our first experience. Since then we have put CPCs in all of our plants.

In addition to CPCs, we have done safety systems for monitoring, accident monitoring, safety parameter display and others. The thing I'd like to point out is that for the CPC situation, we have had basically more than 800 software modifications, what we call software change requests, since the installation at ANO-2.

Ninety-nine percent of these software change requests have been functional design changes, not software errors, not software bugs. These 99 percent are the types of things that would show up in a hardware-based system as well as a software-based system because the root cause is the functional design process, not the implementation process.

We do a very good job of writing software to wrong requirements. Software runs the way the wrong requirement told it to run. The other point that I'd like to make is that in all of the operational experience that we have with the CPCs and where it did things that we didn't intend it to do functionally, none of those have resulted in failure to trip conditions. All of the software errors have been what we would call spurious trip conditions.

The point that I'd like to make on this slide is --

MR. KERR: Is that because you were clever in your design or was it just a fortuitous circumstance?

MR. SCAROLA: No. I think it's inherent in the fail safe nature of the design. What we do is we force the system to go into a trip condition in any situation that you might call a system not knowing what it should do. So we force it to trip under any failure situation.

We have never had a situation where it would not trip.
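The fail-safe principle stated here, forcing a trip whenever the system does not know what it should do, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CPC implementation: `evaluate_trip`, `compute_margin`, and `limit` are hypothetical names, and the rule "trip unless the calculation clearly shows margin above the limit" is one simple way to express the idea.

```python
import math


def evaluate_trip(compute_margin, limit: float) -> bool:
    """Return True (trip) unless the calculation clearly shows margin above the limit.

    compute_margin is a hypothetical callable standing in for a protection
    calculation; any failure inside it results in a trip.
    """
    trip = True  # the default state is "trip"
    try:
        margin = compute_margin()
        usable = isinstance(margin, (int, float)) and not math.isnan(margin)
        if usable and margin > limit:
            trip = False  # the only path that clears the trip
    except Exception:
        trip = True  # calculation failed: stay in the safe state
    return trip
```

With this structure a software fault can cause an unnecessary trip, but it cannot silently suppress one, which matches the operating experience described in the testimony.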

MR. KERR: Thank you.

MR. LEWIS: When you say failure, you mean software failure, is that right? When you said in any failure situation.

MR. SCAROLA: Yes. I mean software failures in this slide. I would have to go back and research whether or not I could say that same thing about hardware failures, but I believe it's the same with hardware failures.

MR. LEWIS: Because the question of the Rancho Seco event came up a little earlier and this was an example of a place in which a failure in a power supply resulted in not a software failure, but in incorrect inputs to the software which then did what it was supposed to do, and nearly brought on a really monumental accident.

MR. SCAROLA: I think that you're really reemphasizing my point. We're looking at software as a potential introduction of new failure modes into a system when, in fact, the hardware relationship to software is probably more dominant and the functional relationship is more dominant.

MR. LEWIS: We don't disagree about that. Paul?

MR. SHEWMON: With regard to the 99 percent functional design changes, not software, does that, again, reemphasize your point that it's the interaction of the hardware --

MR. SCAROLA: No. It's the interaction of the functional designer to the system designer. In other words, what I'm saying here is that 99 percent of these 826 changes were functional design algorithm changes where we decided that the algorithm was way too conservative and we had to relax the requirements.

We were getting spurious trips in situations when we should not be getting trips. So the root cause of these changes are functional design changes, not software/hardware coupling at all.

MR. SHEWMON: I guess that's too subtle for me to see it.

MR. WILKINS: He's saying it's a setpoint.

MR. SCAROLA: Yes. Maybe spurious is the wrong word. What I'm saying is the algorithm executes, makes a decision that says you should scram. A functional designer went back, looked at that algorithm, did some analysis and said if we're really in that situation, we don't need to scram, so let's change the algorithm.

MR. CARROLL: Unnecessary scram.

MR. SCAROLA: Unnecessary would be a better word. Spurious is not the correct word, I'm sorry. That is the background. Let me say that in addition to safety system software, we have been designing software-based control systems. Our first installation was at SONGS. We have installations at LPNL and all subsequent plants beyond that. Some of them are extremely difficult installations where we used software-based systems right next to the power supplies that run our mag jacks. So when we talk about harsh EMI environments and the effect of EMI on hardware and software-based systems, we have extensive experience there, as well.

Now I'd like to talk for a minute about the software design process and the software documentation process that we use. First of all, as emphasized on the slide before, we need an early focus on establishing what would be really correct requirements and specifications. I think this is a problem that's recognized by the industry as the biggest contributor to the bad name that software has gotten in the industry.

It's not that the people who write software do it wrong, it's that the people who establish the requirements don't do it correctly. So we put a lot of emphasis on the requirements for the system both from a hardware and software point of view and what we call functional decomposition point of view. We decompose the functions down into small units so they're very understandable and manageable on a module basis.

We use standard coding and documentation techniques, things like deterministic coding. We have software standards, guides that tell the programmers what they can do when they program and what they are not allowed to do, like going off in a branch and not returning.

MR. KERR: I guess I don't understand the relationship between the first bullet and those two things that follow it. The first bullet seems to say that correct requirements and specs weren't -- and then the second one --

MR. SCAROLA: What I'm trying to say is we establish functional requirements for a system, and I'll show it better on the next slide. Why don't I get through this slide, and then I'll show it better on the next one. Then we have a verification and validation program that I'll spend more time on, and then, lastly, extensive configuration control over the life of the product.

For example, I showed you on the CPCs, we track every CPC modification on every plant. That applies to both the purchased software -- when we buy an executive system, we know the rev of that executive system and we follow any modifications that the original designer of that software makes.

MR. KERR: That doesn't apply to your CPC software, I presume.

MR. SCAROLA: It does apply for CPC software. I'm sorry. Maybe you should ask your question again.

MR. KERR: I got the impression that you had developed your CPC software and it wasn't off-the-shelf purchased.

MR. SCAROLA: Yes. Excuse me. I answered the wrong question. CPC is not purchased software. CPC is CE custom software.

[Slide.]

MR. SCAROLA: Now maybe I can answer your question on the relationship of the requirements to hardware and software. Basically, the system development process, in a very simplistic format, looks like this, where we establish system requirements and these Vs are basically verification points.

MR. KERR: From this distance, to me, that looks very fuzzy.

MR. SCAROLA: Have you got this in your handout?

MR. CARROLL: We do.

MR. KERR: I'm looking for it.

MR. SCAROLA: What this says is system requirements and the V is a verification activity. My point is verification does not start at the software cycle of the system design. Verification starts at the requirements cycle. At this branch here is where we take the requirements of the system, we define a system description, but then we break those requirements into the allocation between hardware and software.

We basically say for this system, this is what the hardware has to do and this is what the software has to do.

Then we verify those with regard to the original requirements. In other words, are we meeting the original requirements that have been established by the designer.

The hardware and software then will eventually come together in what we call system integration performance testing. We do this test -- this is actually called validation. But then we verify the test results again. So that was the point I was trying to make in the last slide where I said hardware/software.

We establish requirements for the system, break them into the allocation between hardware and software.

MR. LEWIS: You used the words validation and verification very quickly there. You said you validate it, then you verify the results. Can you expand on what's meant by those words?

MR. SCAROLA: Sure. Verification is used to mean essentially the review process that the documentation reflects the requirements in the previous step.

MR. LEWIS: So the verification is about the documentation.

MR. SCAROLA: It's also used in the software coding process. You'll see over here that we write software code. We design hardware in parallel. But the software goes through module testing. That testing is a form of verification on a software module basis and we actually review the test results with formal verification documents.

MR. LEWIS: How do you distinguish between validation and verification?

MR. SCAROLA: Verification is part of the step-by-step process. Validation is the integrated test at the end of the process where you say the integrated system goes back to the original requirements and meets what was established as an original requirement. Now, that's a validation test.

What I said is when you do a test, you write a test report. That test report is then verified again.

MR. LEWIS: I'm just trying to make sure in my own mind. So you use the words verification and validation to mean different things from what the computer science community uses them for. There's no problem with that. I just needed to know what was going on here.

MR. SCAROLA: I didn't realize that I used them differently.

MR. LEWIS: You do. They teach courses called verification and validation in which the words mean entirely different things.

MR. SCAROLA: I'll leave it at that.

[Slide.]

MR. SCAROLA: Something that might help to understand the V&V process is this table where, on the lefthand side of the table, we identify essentially the documentation that is produced for a particular system. These are three levels of system importance in our design, from non-safety systems over to what we call safety systems.

This basically identifies who produces that document and who reviews that document. We have an established program and that's applied to all systems in the design. We often talk about the level of independence of the verifier from the design process. This particular table identifies what the minimum level of independence needs to be.

Some of the things you'll see, we review the design process by the requirements team. Those people that set the requirements get involved in the review of the system descriptions and specifications, for example. Those people that establish the requirements actually do the final testing of the system.

This is a mapping that gives our system designers guidance as to how to do verification and validation for any particular system.

MR. LEWIS: I hate to interrupt you, because I am, in the end, going to tell you we're running out of time. It will be my fault. The people who do the verification are not the people who have written the software. You are, for the most part, buying commercial software and adapting it. I'm trying to understand how you're putting this together.

MR. SCAROLA: We do both. We write custom
 software and we buy commercial software.

MR. LEWIS: The people who review the system presumably don't review the commercial software because that's written and often hidden from you, I would imagine.

MR. SCAROLA: What they review is the configuration control on that commercial software. In other words, we impose configuration control requirements on the supplier. We go back and review the traceability and the history of that software.

MR. LEWIS: That's fine. In terms of the software you write yourself, the reviewers are not the people who wrote the software.

MR. SCAROLA: That is correct.

MR. LEWIS: And you know perfectly well that there is nothing harder in life than reviewing a code that someone else has written, even with the best of software practices. There's been lots of experiments and lots of mistakes get through the second and third iteration of third parties reviewing the software.

In the term configuration controls, do you include protection against sabotage?

MR. SCAROLA: I will address sabotage. Let me move on.

MR. KERR: Our previous speaker, Mr. Ichiyen, seemed to indicate that they had difficulty arriving at appropriate standards of performance for software. That was the impression I got. Do you have standards for performance that are adequate or appropriate, in your view, and, if so, where did you get them?

MR. SCAROLA: The standards for performance are defined by the verification and validation team. They establish them on a system-by-system basis. We are not using an IEEE industry standard for software performance, for example, because we don't know when it exists.

From an industry point of view, we have the same problem. The way we handle that is internal system-by-system basis, we establish standards for performance.

MR. KERR: Thank you.

MR. CARROLL: How would I go about getting confidence that the standards that you've established internally are really the right standards or how would the staff get confidence of that?

MR. SCAROLA: The staff will review our verification and validation program. They will be able to audit the results of all the verification and validation steps. I think it's a process-related level of comfort. It's not a bottom line level of comfort that you can get.

I don't think you can establish software reliability at the bottom of the process. I think you have to establish it throughout the process.

[Slide.]

MR. SCAROLA: Another contributor to software reliability is segmentation. What we mean by segmentation is that we take a system and we break the system functions into smaller units to basically execute on smaller processors. This allows complex software to become simpler by breaking it into little pieces to execute on small machines. This adds a level of defense-in-depth, not the final level of defense-in-depth, against common mode failures because it introduces functional differences between the different processors that have to really execute the software.

It introduces software coding differences. Because the machines are running asynchronously, the execution times are different. So they're exposed to different real world conditions. Because no one machine is the same as any other machine, it does introduce hardware differences, as well. So by segmenting functions, you do get some level of protection against common mode failure.

Another thing that segmentation does is it partitions the more probable failures, which are actually hardware failures, into manageable units.

MR. WILKINS: Of course, there's a price to pay, isn't there? You have to cement the segments back together again somehow.

MR. SCAROLA: To a certain extent, yes. I think I can explain in the case of the protection system what segmentation means fairly simply.

[Slide.]

MR. SCAROLA: This table summarizes, on the lefthand side of this table are all of the events, the design basis events that our protection system is designed to protect against. Then across the top we have all the signals that the protection system monitors; basically, all of the reactor trip conditions.

The numbers are the processors that actually handle that trip function. So if we look at one of these for a simple case like a feedwater temperature decrease, we're monitoring Steam Generator 1 pressure by Processor No. 1, the second steam generator pressure by Processor No. 2. In this particular case, we will also get a trip from the CPCs on DNBR.

So for this particular event, there are three machines running asynchronously, all trying to protect the plant. That's what we call segmentation. Now, I agree that these three functions have to be cemented back together and the protection system does do that. It basically says you should get a trip on any one of these.
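How the segmented trip functions are "cemented back together" within one channel can be pictured as a simple OR across the segments. The Python sketch below is only an illustration: the segment names echo the feedwater temperature decrease example from the testimony, while the function names and the dictionary representation are hypothetical and do not describe the actual protection system logic.

```python
def channel_trip(segment_outputs: dict) -> bool:
    """A channel demands a trip if any one of its segments demands it."""
    return any(segment_outputs.values())


def tripping_segments(segment_outputs: dict) -> list:
    """List which segments are currently demanding a trip, for display."""
    return sorted(name for name, trip in segment_outputs.items() if trip)


# Hypothetical snapshot for one channel: each entry would come from a
# separate processor running asynchronously from the others.
snapshot = {
    "processor_1_sg1_pressure": False,
    "processor_2_sg2_pressure": False,
    "cpc_dnbr": True,
}
```

Here `channel_trip(snapshot)` is true because one of the three independent segments demands a trip, even though the other two do not.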

What makes it a little more complicated is we do look for like coincidence among redundant safety channels, such that the A channel can't say I have a trip on DNBR and the B channel say I have a trip on steam generator level, and then you end up with a plant trip. So we do look for like coincidence and that does force inter-channel communication.

MR. CARROLL: What is VOPT?

MR. SCAROLA: VOPT is variable over power trip.

MR. KERR: When you finish this process, are you able to predict with some degree of confidence the probability of failure per demand to trip?

MR. SCAROLA: We do put a number on it. That is part of the availability analysis for this system and then that gets factored into the PRA.

MR. KERR: I don't understand the phrase "we do put a number on it."

MR. SCAROLA: Well, what you said is to what level of confidence --

MR. KERR: No. I said can you predict to a reasonable level of confidence. Can you give me a number, some indication of --

MR. SCAROLA: Yes. By the standards that we use, we can put -- we do put numbers on these things, and the reason that I qualify that is the industry right now has difficulty putting a reliability number on certain unreliability contributors, such as human error and software reliability.

So to the extent that we can put numbers on mean time between failure, mean time to repair, and we can analyze failure modes and effects analysis, we do put numbers on these things. The other contributors are not well handled yet.
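For a single repairable unit, the standard textbook relation between these quantities is steady-state availability = MTBF / (MTBF + MTTR). The testimony does not say which formula or figures the availability analysis uses, so the Python sketch below is an assumption for illustration only, and the numbers in the usage note are hypothetical.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a repairable unit is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def unavailability(mtbf_hours: float, mttr_hours: float) -> float:
    """Complement of availability: the fraction of time the unit is down."""
    return 1.0 - steady_state_availability(mtbf_hours, mttr_hours)
```

As a hypothetical example, a unit with an MTBF of 9,990 hours and an MTTR of 10 hours has an availability of 0.999. Shortening the repair time raises availability even if the failure rate is unchanged.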

MR. CARROLL: But wouldn't that be the same number if this were an analog system?

MR. SCAROLA: No. Actually these are higher numbers because the MTBFs on these systems are much lower than analog systems and the mean time to repairs are much shorter.

MR. CARROLL: So it does include hardware other than sensors.

MR. SCAROLA: Yes. The sensors are, in fact, the same. There is no difference. I said segmentation gives a level of defense against common mode failure. I'd like to talk about another level of defense, and that's diversity.

[Slide.]

MR. SCAROLA: In operating plants today, we have significant diversity. It's not necessarily by design, but rather by the nature of the analog technology that we use in that if you have an analog circuit that has to do a particular function and then you have to define another function, it usually takes a different analog circuit. So there are many different types of analog circuits.

It also relates to the number of people, the number of subcontractors that have gotten involved in the control complexes for nuclear power plants. This excess of diversity does give you a lot of defense-in-depth against common mode failure, but it may actually detract from plant safety because we have problems training maintenance personnel.

We have difficult repair times, long repair times because of that. We, of course, have spare parts availability problems. Spare parts availability is becoming even a bigger concern now with the obsolescence of analog --

MR. KERR: I want to applaud somebody who has the courage to question the gospel of diversity.

MR. SCAROLA: I'm sorry? I didn't hear the question.

MR. KERR: I want to applaud you for having the courage to question the gospel of diversity.

MR. SCAROLA: What we do in NUPLEX 80-Plus is we maximize standardization. We use standardization to the maximum extent possible, but we do maintain a minimum level of system diversity to offer what we call the final defense against common mode failures.

We employ diversity as a minimum in all software-based components of our systems. We do also employ it in some of the hardware-based components of the system where required by rules such as the ATWS rule.

[Slide.]

MR. SCAROLA: To give you an idea of what that means in NUPLEX 80-Plus, on the lefthand side of this slide, I identify major functions. Over here, I identify Design Type 1 which is System No. 1 that accommodates that function and then Design Type 2 which is the diverse design system that can also accommodate that function.

What we do is we basically analyze that for every major function in the plant, such as reactor trip or for all of what we call critical functions, we have diverse means of accommodating that function or maintaining that critical function, whatever it might be.

We extend that as well to the information that the operator uses inside the control room. This may look somewhat complex, but, in its simplistic format, what this means is that Design Type 1 are all the safety systems in the power plant and Design Type 2 are all of the control systems in the power plant. So we force basically diversity between control and protection.

MR. KERR: Do you have different standards of reliability for the two?

MR. SCAROLA: Certainly. The protection systems have higher standards of reliability because of the redundancy and single failure criteria that they have to meet.

MR. KERR: How much less reliable do you permit the control systems to be?

MR. SCAROLA: I can't answer that question. I don't know that we have a number that's an acceptance criteria. What I can tell you --

MR. KERR: To a certain extent, it seems to me that there is a good bit of artificiality in separating control and safety systems. The safety system is simply a control system that needs to be fairly reliable. Some of the other control systems maybe don't need to be as reliable. You haven't really thought much about the required reliability of control systems.

MR. SCAROLA: It's not that. It's when you impose the requirements, irrespective of reliability, when you impose the requirements that we have to impose on protection systems to meet single failures and to have periodic testability, and when you extend that back to power supplies and HVAC and everything else which you don't do on the control systems, that's where you find that you have the major contributors to unreliability of the control systems.

MR. KERR: But, it seems to me, unreliable control systems can increase risk and can increase risk significantly. If you look at LERs and other incidents, you find case after case in which you get trips because of an unreliable control system.

MR. SCAROLA: I won't disagree, but we are making the control systems orders of magnitude more reliable than we made them in the past. We do do reliability analysis on all of our control systems.

MR. KERR: You said you were making it orders of magnitude more reliable than something, and I didn't --

MR. SCAROLA: Than what we did in the past in control systems.

MR. KERR: Thank you.

MR. SCAROLA: I do have a slide -- I probably won't have the time. When I talked yesterday, I talked about fault tolerance and what we do to have fault tolerant control systems.

MR. MICHELSON: Excuse me, before you leave that slide. Your alarm and indication uses multiplexers, I guess, to get the information to the control room. Do you use different multiplexers for Design Type 1 than Design Type 2?

MR. SCAROLA: Yes. Anything that relies on software in these systems is different. Multiplexer certainly relies on software. The data communication relies on software.

MR. MICHELSON: So Design Type 2 has a dedicated set of multiplexers to get all of its information. Is that right?

MR. SCAROLA: Yes, to a certain extent. For example, if I look at a specific parameter, and let me take an example of pressurizer pressure. We monitor pressurizer pressure with both Class 1-E sensors and non-Class 1-E sensors. The non-Class 1-E sensors come into the system that we call the process component control system.

They are multiplexed into the electronics by that system. The safety-related sensors come into the protection system. So at that level, the multiplexers are totally independent. So when we get up to the monitoring systems, we combine all the information on both sides of this line. This system shows both safety and non-safety. This system shows both safety and non-safety.

MR. MICHELSON: So it came in with dedicated
 multiplexing, but it was then combined at the display level.

19 MR. SCAROLA: Combined, but into different diverse 20 systems. In other words, both of these systems feed this 21 one and both of these feed that one independently.

22 MR. MICHELSON: But that's only in the control 23 room.

24 MR. SCAROLA: And at the remote shutdown panel. 25 Control room and remote shutdown panel. I should have

brought a block diagram. That may have helped. I don't 1 have much time, so let me talk quickly about sabotage 2 3 protection, since that was asked. 4 (Slide.) 5 MR. SCAROLA: Sabotage protection is an important issue. We handle it in several ways. First of all, we 6 7 maintain configuration control during the design, construction and the operation of the plant. That's very 8 9 important. Second, we physically separate into separate rooms 10 the four channels of our protection system, and those are 11 separate from the non-safety system. I think there's a 12 13 slide in your package on that. That's this next slide. 14 [Slide.]

MR. SCAROLA: What this basically shows is that there are four separate secured rooms, separate from a security point of view, for the four channels of the protection system. Those four are separate from the nonsafety equipment room.

We further have room access and equipment access security alarms. In other words, when you go into the room, the room is alarmed even though it is under configuration or it is under security control, as well. But then once you get inside the room, you have to get inside a cabinet. Those cabinets are locked and, when you open the doors, that is alarmed, as well.

The final protection against sabotage is that in every system we do a continuous program memory checksum. That checksum is reported continuously to the data processing system, which is our central plant computer. So if, for some reason, that memory in the machine is altered either because of an electrical fault or because of a maintenance error or even sabotage, the plant computer system will identify that there is a memory checksum error. So that will be an indication that something has gone wrong in that system.

MR. LEWIS: In cases where the change in the memory is intended, an update or something like that, there's some kind of personnel control associated with changing the checksum record.

MR. SCAROLA: Yes, in both machines.

MR. LEWIS: It's really a checksum that you use, not a CRC?

MR. SCAROLA: I say checksum in the simplistic sense; it varies system-by-system.

MR. LEWIS: I admire your speed-up. Are you almost there?

MR. SCAROLA: Let me just show you one more slide that I used yesterday because I think it's important in understanding software reliability also. That's automatic testing.
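The program memory check described above, and the checksum-versus-CRC distinction Mr. Lewis raises, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the 16-bit additive sum, and the use of CRC-32 from Python's standard `zlib` module are not taken from the systems discussed, where, as the speaker says, the method varies system by system.

```python
import zlib


def additive_checksum(image: bytes) -> int:
    """Hypothetical 16-bit additive checksum over a program memory image."""
    return sum(image) & 0xFFFF


def crc32_of(image: bytes) -> int:
    """CRC-32 of the same image, computed with the standard zlib module."""
    return zlib.crc32(image) & 0xFFFFFFFF


def memory_ok(image: bytes, reference: int, check=crc32_of) -> bool:
    """Compare the freshly computed check value with the stored reference."""
    return check(image) == reference


# Stand-in for program memory; the reference values would be recorded
# under personnel control whenever the program is intentionally changed.
program = bytes(range(256)) * 4
reference_sum = additive_checksum(program)
reference_crc = crc32_of(program)

# Simulated alteration: two bytes swapped. The byte total is unchanged,
# so the additive sum cannot see it, while the CRC value changes.
altered = bytearray(program)
altered[10], altered[20] = altered[20], altered[10]
altered = bytes(altered)
```

In this sketch a monitoring computer would call `memory_ok` on each reported value and raise a memory checksum error when it returns `False`. The swapped-byte case shows why the question of a simple sum versus a CRC matters.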


[Slide.]

MR. SCAROLA: Historically, we talk about automatic testing in the sense of what does the machine do to test itself, then we forget about software. So we look at things like are we able to read and write from memory; does the CPU run; can we do communications; but none of that tells you that when the software needs to execute, that it will, in fact, execute properly.

In our protection system, we include continuous on-line automatic functional testing, meaning that in the protection system, we force the input to go into a trip state. That propagates through the system. It executes the software algorithm as if there was a trip, but we do it in one channel at a time.

We do it very quickly so that it does not propagate in the event that you get a second channel failure. We do it in a way such that a valid trip will always get through, that the automatic test will not get blocked. That's another layer that we put into the design to enhance software reliability.
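A minimal sketch of the on-line functional test just described, under stated assumptions: four channels, a two-out-of-four coincidence vote, and a test signal that is simply ORed with the measured trip state so that a valid trip can never be blocked. The function names and the voting threshold are hypothetical; the speaker does not give the actual logic.

```python
def channel_output(measured_trip: bool, test_injected: bool) -> bool:
    """A channel trips on a real demand or on an injected test input.

    The OR means the test can force a trip state but can never mask one.
    """
    return measured_trip or test_injected


def plant_trip(channel_outputs, needed=2) -> bool:
    """Assumed coincidence vote: trip when at least `needed` channels trip."""
    return sum(channel_outputs) >= needed


def run_test_cycle(measured):
    """Inject the test into one channel at a time.

    Returns, for each step, (tested channel tripped?, plant trip issued?).
    """
    results = []
    for under_test in range(len(measured)):
        outputs = [
            channel_output(m, i == under_test) for i, m in enumerate(measured)
        ]
        results.append((outputs[under_test], plant_trip(outputs)))
    return results
```

With all four channels healthy, every step shows the tested channel tripping and no plant trip. If one other channel is already in a tripped state, this naive model does issue a plant trip during the test, which is the situation the speaker says is avoided by running the test very quickly; the sketch does not model that timing.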

With that, I thank you.

MR. LEWIS: We thank you very much. Everyone has done well in terms of staying on time. In that case, with the power vested in me by the system, I'll give us a 15-minute break. Come back in 15 minutes promptly.

[Brief recess.]


MR. LEWIS: Can we begin? I apologize that we're running even a few minutes more late. My understanding is that your talk is not proprietary and can be open, is that correct?

MR. REMLEY: That's correct.

MR. LEWIS: The two after it, both before and after lunch, do have proprietary parts. I've been asked if the people who are going to give those could consider the possibility of treating their proprietary parts separately so that the audience out here can stay for as much as possible of their talks instead of having to leave for the whole thing because there are a few proprietary parts. I'll leave that to their judgment, but that's a plea I've been asked to make and I have just made it.

You are Gil Remley?

MR. REMLEY: Yes, I am Gil Remley.

MR. LEWIS: Very good. We are yours.

MR. REMLEY: My name is Gil Remley. I'm with Westinghouse Electric Corporation in the Process Control Division. Presently I'm responsible for the design of the integrated protection system, which is Westinghouse's generic protection design, and also the design of the primary protection system for the Sizewell B plant in England.

The package I've given you contains the overheads which I was going to use, and also there are two papers attached to the back of that package. Both of these papers were given at IEEE conferences in the United States. I thought they were particularly relevant to the topic.

One is on software diagnostics, or software and hardware diagnostics, in our systems. The other one is on a protocol for interface between multi-processors within a system. I'll be talking about that in a little more detail as I go.

MR. REMLEY: What I'm going to discuss is the software design primarily associated with the reactor protection and control equipment. However, a lot of what I say will be applicable especially to the rest of the plant I&C for control and data acquisition and, to some extent, also for the equipment that's used for information display.

However, just so I can contain the topic right now, I am going to concentrate on the protection and control software designs. But if we want to, we can also ask questions in the other areas and I'll try to clarify the differences, if you want.

[Slide.]

MR. REMLEY: The basic equipment in this design is based on distributed digital processing technology, particularly microprocessor technology. We believe that there are significant benefits to be achieved by the use of this technology. This is a list of those benefits.


The design that I will discuss is characterized by the following points. It's a modular design. It uses digital technology. High performance elements are used where necessary. It is distributed processing in the sense that the processing is physically distributed, as well as functionally distributed.

It uses data highway and datalink communications. It is physically distributable. There is a hierarchical architecture for the communication and data transfer within the distributed digital processing system. Extensive use is made of fiber optic cabling. The design is characterized almost completely throughout by being fault tolerant. There is a clean separation between safety and non-safety.

We've implemented improved control and protection algorithms. Presentation of information in the main control room is done in context with navigational aids. That isn't a point I'm going to get into a lot of detail on. As I said, that's in the upper part of the design. But I can attempt to answer questions there if you're interested.

MR. KERR: Is this navigation of neutrons?

MR. REMLEY: No. It means navigation of information. It means navigation through presentation of information. It's a way of accessing data or information.

MR. KERR: Thank you.

[Slide.]


MR. REMLEY: The designs that we've been working on at Westinghouse have undergone pretty extensive licensing review to date. This chart depicts that. There are some high points on the chart. I think one is the original concepts that we developed were associated with the original design called the integrated protection system, which was a hybrid design. It used both analog technology and microprocessor technology.

In the United States recently we've had several applications of this type of technology in plants in the U.S. The South Texas plant was mostly associated with the display of safety information. The Prairie Island plant was a microprocessor-based digital feedwater control system. The Sequoyah plants are a replacement of the process protection racks.

In addition to that, we've been developing this design in conjunction with many countries around the world. In France, we had a joint program for the development of IPS and the SPIN system, which is their microprocessor-based protection system in the French plants.

In England, as I mentioned, we're applying this technology as the primary protection system on the Sizewell B plant. We are using it on the APWR plant in Japan. It was applied on the Italian reference plant in Italy.

[Slide.]


MR. REMLEY: I'd like to start off with talking about the software process that we use for development of the software in these systems. The first step in our design process is what we refer to as a requirements capture process. I agree with the previous speaker that mentioned that this is a very important step in the design process; to try to capture the requirements of your system.

However, what we do at this point with our particular requirements document is to try to consolidate and structure the various requirements that we get from numerous sources because you get requirements from many areas when you're trying to put together a design of a protection or a control system. You get functional requirements from the functional designers. You get industry standard requirements.

All this needs to be organized and sorted and defined with respect to an implementation that you would have in mind for the system. I do believe that there are significant industry standards available on which to base your requirements. I have a list of them later. I think there are many standards groups that have been working over the past 15 years in this area that have done work to establish requirements for these types of systems.

I think you have to go and interpret that within the context of what you're going to try to do. That's one of the key steps that we do in the development of our system design requirements document.

The next step is to produce a document which then defines your particular implementation given that you've set out to achieve these requirements. At this step, one of the key things to do is to modularize the design or partition the design between hardware and software. This does occur in our system design specification document.

So it's the coordination document between the hardware and the software, because there is an intimate relation between the hardware and software in these designs. One of the things it does is it establishes the architecture for the system.

MR. KERR: Is there some sort of process of performing that division of responsibilities between hardware and software or is that left to the judgment of the designer? Do you have a prescription for doing that? Have you gotten that far along?

MR. REMLEY: Well, a cookbook prescription, no. But I guess it's really based upon, in a large sense, the traditional way people approach these problems. Maybe I'll try to explain that in a second. And also the availability of certain technology at some point in time.

Traditionally, certain functions are handled in certain ways because of boundary conditions in the plant. For example, the sensors generally produce analog signals. So that ends up being a given in the system design specification. You could revisit that and say you want the sensor to produce a digital signal.


But if you take that as a boundary condition, then that starts to partition the hardware and the software in the system. The second thing is that you have to work with available technology. You're limited by available technology. So certain things that were implemented in software five years ago are now implemented in hardware. For example, cyclic redundancy check algorithms for datalinks.

They are now embedded into the chip that handles the protocol for the datalink communications. Before, that was something that would be handled in software. You do, I believe, in the practice, want to push as much of that functionality to the hardware as you can, because I think there are significant performance benefits and operational base benefits you can get from that.

So I guess it really depends on your interface boundary conditions, that establishes an awful lot, and also what technology, what available technology you have to work with at any point in time.

MR. KERR: Thank you.

MR. REMLEY: So at this specification level, you do come up with an architecture for a system and a partitioning between hardware and software.

[Slide.]


MR. REMLEY: This just happens to be a depiction of a particular implementation of our integrated protection cabinets. You see that we partitioned it into several microprocessor subsystems. Last year at this time I explained the rationale for having two reactor trip and two engineered safeguards, the trip logic computer and a nuclear instrumentation system, and some support subsystems called the communications and the automatic tester systems.

These systems in our design are multi-processor systems. That is, within each subsystem, you have several processors working together to perform the function of the subsystem. Typically we have one host computer and then several slave computers. The slave computers are mostly oriented toward handling input/output functions that have to be handled at very rapid speeds.

So what you do is you offload the host processor in the area where very rapid performance has to be achieved by using slave processors and then providing a way to exchange information between the two. This is a way to minimize the need for interrupts and multi-tasking, which we do not want to put into the design.

So what we have done is we've distributed the processing and offloaded the higher performance needs into slave processors.

[Slide.]

MR. REMLEY: In the protection and control systems, we have three slave processors; one associated with analog inputs. We use this because we want to do digital filtering to improve the accuracy and the speed of the conversion from an analog signal to a digital signal. Also, we have slave controllers for datalink and data highway interfaces.

MR. LEWIS: Why is the digital filtering called intelligent AD?

MR. REMLEY: It was just a term that the software engineers used.

MR. LEWIS: Advertising.

MR. REMLEY: It's really just a --

MR. LEWIS: Gotcha. I understand. I won't press the point.

MR. REMLEY: One of the attributes of the system specification is to do this partitioning, as I mentioned, between the software and the hardware. Another one is to capture the functional requirements or the functional design in a way that we believe it can be fed back to the functional designer in a way that he can understand what the implementation is going to be in the system so that it can be verified in a relatively straightforward way.

[Slide.]


MR. REMLEY: So as part of this document, we also have a section that's associated with defining the protection and control functions in logic diagrams. This is an example of a logic diagram. It, in fact, is sort of a hybrid data flow diagram that explains the interface between the software and the hardware.

It also then defines, in a graphical way, in a high level way, the protection or control function that is to be implemented. I think this goes a long way into trying to coordinate the interface between the software engineer who is going to program this in the system and the functional design engineer who is specifying the functions and making sure that the software engineer has a correct interpretation of what the safety function is supposed to be or the control function before he proceeds with the design.

MR. KERR: What is a low partial reactor trip?

MR. REMLEY: Excuse me? Okay. A partial reactor trip, in our terminology, is the system is basically a two-out-of-four design and each channel set can produce one-fourth of the input to the final trip gate. We refer to that as a partial trip on a particular function. Does that answer your question?


MR. KERR: Except you haven't told me what a low partial trip is.

MR. REMLEY: Low means that it is coming off of a low bistable. Low I think goes with the description of the function, not with the -- it's not an adjective for partial trip. Partial trip is something --

MR. KERR: So it's a low flow associated, is that right?

MR. REMLEY: That's right.
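The terminology in this exchange can be sketched in a few lines: a low bistable in each channel set produces a partial trip, and the final trip gate acts when two of the four partial trips are present. The names, flow values and setpoint below are illustrative assumptions, not figures from the presentation.

```python
def low_bistable(value: float, setpoint: float) -> bool:
    """A low bistable trips when the measured value falls below its setpoint."""
    return value < setpoint


def final_trip_gate(partial_trips, needed=2) -> bool:
    """Two-out-of-four gate: each channel set contributes one partial trip."""
    return sum(partial_trips) >= needed


# Hypothetical normalized flow readings from the four channel sets.
flows = [0.95, 0.88, 0.97, 0.86]
LOW_FLOW_SETPOINT = 0.90  # assumed value, for illustration only

# "Low" describes the bistable; "partial" describes one channel set's vote.
partial_trips = [low_bistable(f, LOW_FLOW_SETPOINT) for f in flows]
reactor_trip = final_trip_gate(partial_trips)
```

Here two channel sets read below the setpoint, so two partial trips are present and the gate issues a reactor trip. A single low reading would give one partial trip and no reactor trip.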

MR. MICHELSON: What is the significance of the dotted line from the I/E converter to the digital converter? Just above your total dashed line, those are all dotted lines. What does that mean?

MR. REMLEY: That is being performed in hardware, not in software.

MR. MICHELSON: Performed in hardware.

MR. REMLEY: If you look at the next two sheets, you will see a coding of all the symbols. I just attached them for information. That will explain all the symbols.

MR. MICHELSON: I don't see the dashed lines on there, but --

MR. REMLEY: Okay. I can't say that it is. I haven't studied it well enough to know that it is.

MR. MICHELSON: What's an analog line mean from an I/E converter?

MR. REMLEY: This signal here is now a voltage level signal.

MR. MICHELSON: I got it.

MR. REMLEY: The basic point I want to make reemphasizes that this is a relatively straightforward way of depicting how the function is going to perform within the equipment. So that we minimize confusion among the people doing the design and also within the verification process itself.


[Slide.]

MR. REMLEY: Another requirement that we built into the design when we move into the software itself -- once you've established the specification, then you're in a position to define the requirements that you need for both the hardware and the software. We do that independently in separate documents at that point called hardware design requirements and software design requirements.

These requirements at this point tend to be functional in nature. They tend to be requirements about the function of the software in the systems. But in addition to that, we have additional requirements that are mostly associated with the system high level requirements that are the standards in the industry that you need to meet for software for safety-critical applications.

What we've done is produced actually a document which defines the software design constraints that we use then to go ahead and program the system after we have produced this document. I'll give you some examples of some of these constraints.

MR. WILKINS: What's that 414 IPS?

MR. REMLEY: That was the design -- if I can go without putting the original slide back up -- maybe I can find it.

MR. WILKINS: That's all right.

MR. REMLEY: It'll only take me a second. A lot of the original design constraints were established in this program in the late 1970s. The system that we produced, this hybrid prototype was called the 414 Integrated Protection System.


[Slide.]


MR. REMLEY: In this document, we speak to a lot of areas in the area of software design, areas where we feel that you need to place special attention and impose constraints so that you will get a design which is highly reliable and verifiable. These are the areas where we have addressed these constraints. I can pick out a few examples. We've discussed interrupts already and the constraint associated with the use of interrupts.

We need constraints associated with multi-processing because we intend to use multi-processing in our solution. There are other constraints associated with areas like data bounding, and you need to do this to help your verification process so they know the limits on which they need to work to do the verification itself. So within the system software it bounds the data that it uses.
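One way to picture the data-bounding constraint is a small routine that forces every value into declared limits before it is used, so that a verifier only has to consider inputs inside those limits. This is a minimal sketch under that assumption; the function name, the flag it returns, and the treatment of not-a-number inputs are hypothetical and are not described in the presentation.

```python
import math


def bound(value: float, low: float, high: float):
    """Clamp `value` into [low, high].

    Returns (bounded value, True if the input was already within limits).
    """
    if low > high:
        raise ValueError("low limit exceeds high limit")
    if math.isnan(value):
        # Assumed policy for this sketch: an invalid reading is replaced
        # by the low limit and flagged as out of range.
        return low, False
    clamped = min(max(value, low), high)
    return clamped, clamped == value


# Hypothetical use: a signal expected to lie between 0 and 100 percent.
level, in_range = bound(150.0, 0.0, 100.0)
```

Downstream code in this sketch only ever sees values between the two limits, and the flag lets the out-of-range condition be reported rather than silently absorbed.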


The concept of application versus system software, we do not use commercially-available system software. We use system software that we've developed ourselves and verified ourselves, but we have a definite concept of how we want to distinguish between the system software and the application software and how those interfaces work. I will talk to that a little bit more.

This gets into also the topic of code versus data. Each one of these would be a long discussion. I'm just putting this up right now to point out that in the process, what we've done is we've established constraints in all these areas to improve the reliability and the verifiability of the software design.

MR. KERR: Do you use quantitative reliability criteria?

MR. REMLEY: The basis of the integrated protection system with respect to reactor trip is ten-to-the-minus-seven failures per demand. What we have to show, I think, is that the software is not going to degrade that. With respect to quantifying the software itself, however, we have no program to do that explicitly. We do look at the operation of the software within the environment of the protection system in our analysis.

MR. KERR: Thank you.

MR. LEWIS: When you say ten-to-the-minus-seven per demand, demand is what?

MR. REMLEY: It's a demand on the system to trip.

MR. LEWIS: A trip demand. Thank you.

MR. CARROLL: That's the whole system?

MR. REMLEY: That's right. That's the whole system.

[Slide.]


MR. REMLEY: When you apply the constraints, you can come up with I guess a lot of solutions. However, the solution that we've come up with is a fairly straightforward single loop within all the processors. This is a depiction of what happens in that loop. As you can see, quite a bit of activity goes on at what we call restart time, and that is the application of power to the system or going and manually resetting the microcomputer subsystem.

Then this process here is a loop that basically repeats forever, assuming that nothing ever fails or you never reset it again. As you can see, it's a straightforward operation. At this point, we run in either a mode that makes this loop go at a fixed frequency, and we call that synchronization and it doesn't mean synchronization of the multi-processors, it means just making the loop run at a fixed frequency, or we can choose to have it run at a non-fixed frequency.

An example of where we have the loop running at a fixed frequency is like in the reactor trip groups and the engineered safeguards groups because we need an exact time base to do the dynamic functions calculation, like lead-lag. Where we run it at a non-fixed loop frequency is in areas where we don't need to have this time base available to us for calculations. An example of that is where we do trip logic. We don't need a time base at that point.
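The single-loop structure, with the option of running it at a fixed frequency, can be sketched as follows. This is an illustration only: the real systems run on bare microprocessors with no operating system, so the use of Python's `time` module, the injectable `clock` and `sleep` parameters, and the overrun counter are all assumptions made to keep the sketch runnable.

```python
import time


def run_fixed_rate_loop(step, period_s, cycles,
                        clock=time.monotonic, sleep=time.sleep):
    """Call `step(period_s)` once per period, `cycles` times.

    After each pass the loop waits out the remainder of the period, so
    dynamic functions inside `step` can rely on a constant time base.
    Returns the number of passes that overran their period.
    """
    overruns = 0
    next_deadline = clock()
    for _ in range(cycles):
        step(period_s)               # read inputs, compute, write outputs
        next_deadline += period_s
        remaining = next_deadline - clock()
        if remaining > 0:
            sleep(remaining)         # pad the pass out to the fixed period
        else:
            overruns += 1            # the pass took longer than one period
    return overruns
```

The non-fixed mode described for trip logic would correspond to dropping the wait and simply starting the next pass as soon as the previous one finishes.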


[Slide.]

MR. REMLEY: As I said, the software is divided into basically two types; one that we refer to as application software, this is an application by a particular subsystem that I showed you on the first cabinet diagram; and, generalized types of software modules. These fall into two categories. Computer services; for example, analog input processing or a computer service type module; or protection and control algorithms are also done in a generalized way.

An example of that would be a lead-lag. This is a protection and control algorithm. The interface then between the loop that I showed you in the previous diagram, which is the implementation of the application code, is then with subroutine calls to these two types of lower level software modules.

So the application level software, the code is very straightforward and, in fact, mimics to a large extent the logic diagram that I showed you previously. So it's a straightforward process, then, of verifying this particular code against this particular diagram, which is the application-specific function.
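As an illustration of application code that is little more than calls into generalized modules, the sketch below pairs a generic lead-lag module with a two-line application function. The discretization is an assumption: it applies a backward-difference approximation to the transfer function K(1 + τ₁s)/(1 + τ₂s) at a fixed loop period T, giving y[n] = (τ₂·y[n-1] + K·((T + τ₁)·x[n] − τ₁·x[n-1])) / (T + τ₂). The class and function names, the pressure signal and the setpoint are hypothetical, and Python stands in for the PLM-86 actually used.

```python
class LeadLag:
    """Generalized lead-lag module (backward-difference form, assumed)."""

    def __init__(self, gain, lead_s, lag_s, period_s, initial_input=0.0):
        self.gain = gain
        self.lead_s = lead_s
        self.lag_s = lag_s
        self.period_s = period_s          # fixed loop period, the time base
        self.prev_x = initial_input
        self.prev_y = gain * initial_input  # start at steady state

    def update(self, x):
        t = self.period_s
        y = (self.lag_s * self.prev_y
             + self.gain * ((t + self.lead_s) * x - self.lead_s * self.prev_x)
             ) / (t + self.lag_s)
        self.prev_x, self.prev_y = x, y
        return y


def application_cycle(raw_pressure, compensator, low_setpoint):
    """Application code: one call per block on the logic diagram."""
    compensated = compensator.update(raw_pressure)
    return compensated < low_setpoint   # partial trip on low pressure
```

Because the lead time constant is larger than the lag in the usage below, a falling input makes the compensated signal fall faster than the raw one, which is the anticipatory behavior a lead-lag is normally used for.

```python
comp = LeadLag(gain=1.0, lead_s=10.0, lag_s=2.0, period_s=0.1,
               initial_input=100.0)
steady = comp.update(100.0)   # stays at 100 while the input is constant
dropped = comp.update(99.0)   # overshoots below 99 on a 1-unit drop
```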


[Slide.]

MR. REMLEY: Associated with these particular modules, the services modules and also the control protection algorithms, is data and this data is of two types. One is calibration data and the other type is configuration data. I'll try to define the difference.

Configuration data is associated with the configuration of the equipment. An example of configuration data would be how many analog inputs in the subsystem, how many datalinks in the subsystem. Calibration data, on the other hand, is associated with adjusting the function of the system or tuning the function of the system. The gain on the lead-lag or the reset on the lead-lag would be an example of calibration data.

What we've done is we've partitioned the software so that this data is in tables separate from the executable code. The reason we've done this is to improve the ongoing maintenance of the system, because we would like to verify these algorithms once very thoroughly and then use them in different applications to gain a broad base of experience.

So it allows us to use them in a broad sense and it also allows us to do a very thorough verification job, and then apply them by using different configuration tables without having to go back and recompile these modules. So the application ends up being the straightforward calls that I showed you that are associated with the logic diagram and then the generation of these tables which is associated with the calibration and the configuration of the system.

Then this application software then interfaces to the modules, which we do not have to recompile after verification.
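The separation of tables from executable code can be sketched as a generic module that takes its configuration and calibration data as arguments instead of having them written into it. The table layout, the key names and the simple gain-and-offset scaling are assumptions chosen for brevity; the speaker's own calibration example is the gain or reset on a lead-lag.

```python
# Configuration data: describes the equipment (hypothetical layout).
CONFIGURATION = {"analog_inputs": 2, "datalinks": 1}

# Calibration data: tunes the function (hypothetical values).
CALIBRATION = {"gain": [1.0, 0.5], "offset": [0.0, 10.0]}


def scale_inputs(raw, configuration, calibration):
    """Generic module: scale each analog input using the supplied tables.

    The code contains no plant-specific numbers, so it can be verified
    once and reused with different tables.
    """
    count = configuration["analog_inputs"]
    if len(raw) != count:
        raise ValueError("input count does not match configuration table")
    return [
        calibration["gain"][i] * raw[i] + calibration["offset"][i]
        for i in range(count)
    ]


scaled = scale_inputs([4.0, 20.0], CONFIGURATION, CALIBRATION)

# Retuning changes only the table; the module itself is untouched.
retuned = scale_inputs([4.0, 20.0], CONFIGURATION,
                       {"gain": [2.0, 0.5], "offset": [0.0, 10.0]})
```

In this sketch a new application would supply its own two tables and a short sequence of calls, which matches the idea of not recompiling the verified modules.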


[Slide.]

MR. REMLEY: I'd like to talk a little bit about the process for the software. As I mentioned, once we have developed the specification which partitions the hardware and software, we proceed through the step of defining the explicit requirements for the software then, writing a specification for that software and then the code that does the implementation.

We follow a process for that which involves peer team review. This is a detailed logic structure of how that process is done. You'll see that there are certain points where design reviews are conducted. This is not verification. This is design review, peer design review after the requirements stage, after the preliminary specification and then, at the end, before the release to verification.


MR. LEWIS: Just out of curiosity, do you have a preferred language for code writing?

MR. REMLEY: We have selected PLM-86 as the high level language. The reason we selected it was because of the support that came with that language associated with the microprocessor that we were using, which was the 8086 family of microprocessors. It is an acceptable language in that it's structured and it supports the constructs that we need to do the job and the way we defined the constraints.

MR. LEWIS: Does this mean in particular that you can't hire programmers off the street without then teaching them to write in that language?

MR. REMLEY: No. It's not a very difficult language to learn.

MR. LEWIS: I know that, but it's not the one that most programmers are brought up with.

MR. REMLEY: Actually, it looks a lot like Pascal.

MR. LEWIS: I know it does.

MR. REMLEY: There isn't a whole lot of difference. I don't think there's a whole lot of difficulty.

MR. LEWIS: Yes. But when you say one language looks a whole lot like another language, that's an introduction to mistakes in programming. Just curious.

MR. CATTON: That's Intel's language, isn't it?

MR. REMLEY: Yes.

MR. CATTON: It's Intel's own language, I think.

MR. REMLEY: That's right. That's correct.

MR. LEWIS: That's right. It's just that most kids who come out of school now knowing programming have different languages in their background.

MR. CATTON: This PLM-86 is not a very common language.

MR. LEWIS: That was the point I was trying to make.

MR. CATTON: And it's not as simple as you think. You probably have used it, so you think it's simple, but it's not.

MR. REMLEY: I'm not sure I said it was simple. If I did, maybe --

MR. CATTON: Even if you know Pascal, it's still tough.

MR. REMLEY: I said if you knew other languages, it was simple to pick up. What I was trying to say is if you already know Pascal, then the transition to PLM-86 is not difficult. That's what I was trying to say. I wasn't characterizing PLM-86.

MR. KERR: He didn't mean the typical ACRS member could learn it, Ivan.

MR. CATTON: I know one of them who can.

MR. CARROLL: Or even the atypical one.

MR. CATTON: What operating system do you use, the RMX?

MR. REMLEY: We do not use an operating system.

MR. CATTON: You don't.

MR. REMLEY: No. That's what I was trying to say.

[Slide.]

MR. REMLEY: This structure is the operating system, if you will. It's the way the software works. It's a simple chain sequence. So in that sense, we don't use an operating system or this is the operating system, however you want to look at it. It isn't an operating system in the sense of RMX-86. It's not a multi-tasking, interrupt-driven system.

MR. CATTON: There's got to be something between PLM-86 and chips.

MR. REMLEY: No. What there is are these generalized computer services that are modules written in PLM-86 that are called by other modules written in PLM-86. They're subroutines to the application. Now, sometimes we have to write in assembly language. Some of these are written in assembly language because of the performance requirements --

MR. LEWIS: Assembly for what machine?

MR. REMLEY: Excuse me?

MR. LEWIS: Assembly language is machine-specific.

MR. REMLEY: It's ASM-86. Assembly language for the 8086 family.

MR. LEWIS: I see.


MR. REMLEY: But we tried to stick to the high level language, unless there is reason why we can't use it and we have to justify it internally before we go to assembly language.

MR. LEWIS: But the problem of finding programmers who can write in assembly for a particular machine is worse in spades than finding people who can write in PLM-86.

MR. REMLEY: It is more difficult to write in assembly language than it is PLM-86, yes.

MR. CATTON: Lots of hackers can do it, but most of the hackers don't understand PLM-86.

MR. LEWIS: That's correct. It's just that the problem -- the reason for hammering this point is that the problem of getting independent verification becomes much more difficult if you're talking about obscure language ...

MR. REMLEY: That sort of brings me to the next topic, which is software verification.

MR. LEWIS: And at the risk of really being mean to you, you're not running out of time, but you're getting close to it.

MR. CARROLL: We started him late.

MR. LEWIS: I know that. I'm giving him a warning, not a knife in the throat.

MR. REMLEY: Thank you.

[Slide.]


MR. REMLEY: I guess to make sure we understand what goes on in context with respect to the software, we're talking about a verification and validation process in our program that's associated with all these steps here; system steps, hardware steps, and software steps. They're really treated equally by our verification program.

We bring in different specialists in different areas, but we treat them, at least in a high level sense, the same way.

[Slide.]

MR. REMLEY: This diagram here then shows the -- this is hard to read, I understand -- but it shows the appropriate verification steps between the various design activities. As I said, this includes the software, the hardware, as well as the system. So actually the software gets exercised in three different areas. It gets exercised in the software verification, it gets exercised in the hardware verification, and it gets exercised in the system verification.

1

2

3

4

5

MR. LEWIS: Again, to repeat a question that came up earlier. When you use the term verification, what do you mean by it?

MR. REMLEY: What I mean by verification is the process of assuring that the requirements of one step have been implemented by the following step. In other words, it's a review to see that -- or test, it doesn't make any difference. It can be a review or a test to see that the step going from one activity to the next activity has been implemented successfully. So it's the step from going from here to here, it's the step from going to here to here, or from here to here. It's the step from going to here to here, and here to here.

In this case, we do testing. These steps are basically done by analysis and review. These steps are done by analysis and tests, and I was going to talk about that in a little more detail in a minute. And then finally bringing the system together, you test to see that you've integrated it properly, but then the concept validation applies to the fact that you take the final product and basically in some way compare it back to the original basis, which is the requirements, either by test or by analysis.

You can make this concept smaller than just the system. You can talk about validation in terms of the software versus its requirements, if you want to.

MR. LEWIS: I understand, but you're using the terms verification and validation almost interchangeably.

MR. REMLEY: No, I'm not. I don't think so.

MR. LEWIS: You're not. But in both cases, and you'll tell me the difference in a moment, but in both cases you're speaking of the performance of the system against the specs under normal conditions. You're not talking about --

MR. REMLEY: No.

MR. LEWIS: You're not.

MR. REMLEY: Because the system requirements talk about what the system should do under abnormal conditions. It's part of the requirements of the system.

MR. LEWIS: I see.

MR. REMLEY: It's more than just normal conditions.

MR. LEWIS: But then there's a real problem in describing what you mean by abnormal conditions, because for any reasonable size computer system that have 100 inputs, you're certainly not going to explore the complete range of possible incorrect inputs for all 100 channels. I'm inventing the number 100. I don't know what it is for your system.

MR. REMLEY: In the big picture of the program, the answer is yes. But trying to do it from the outside of the system looking in, I agree that that's not the intent, but the intent is to build up to a point where you can justify the number of tests you've run on the integrated system.

You do what you're requesting, but it's done in the verification of the software modules. It's done in conjunction with the fact that we have bound the data that's in those modules. Remember I talked about data bounding. One of the reasons we do that is that the software itself bounds the data that it will use.

Therefore, we know the domain in which to run the test because it's inherent to the software module.
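The data-bounding idea described here can be illustrated with a minimal sketch. It is not the presenter's implementation: the class name `BoundedInput`, the numeric limits, and the choice to reject rather than clamp out-of-range values are all assumptions made for the example. The point it shows is that once a module fixes its own input range, the test domain follows directly from that range.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoundedInput:
    """Hypothetical input whose valid range is fixed inside the module."""
    low: float
    high: float

    def accept(self, value: float) -> float:
        """Return the value if it is inside the bounds; otherwise reject it."""
        if not (self.low <= value <= self.high):
            raise ValueError(f"{value} is outside [{self.low}, {self.high}]")
        return value

    def engineered_cases(self) -> List[float]:
        """Test points taken from the known domain: both limits and the midpoint."""
        return [self.low, (self.low + self.high) / 2, self.high]
```

A verifier working with such a module could run its tests over `engineered_cases()` and add out-of-range values to confirm they are rejected.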

MR. LEWIS: For example, the last speaker spoke of testing the system with randomly generated inputs. Do you do something like that as part of the validation and verification program?

MR. REMLEY: We tend to use engineered test cases rather than randomly generated inputs.

MR. LEWIS: So that means it's easier to overlook an unforeseen mode. That's a prejudicial comment.

MR. REMLEY: Yes.

MR. LEWIS: Please. I'm holding you up and I want you to roll.

[Slide.]

MR. REMLEY: As I mentioned, this may be a little out of order. Some of the standards that we use as our reference in basis are listed here, and our requirements. This addresses both the design and the verification activities.

MR. SHEWMON: What is IEC?

MR. REMLEY: International Electrotechnical Commission. This document has received a lot of work in the international arena for software for safety systems.

MR. LEWIS: Please continue.

MR. REMLEY: My point is there are a lot of standards out there already.

MR. CARROLL: Are they any good?

MR. REMLEY: I think they are, yes. I think what is required is, like I said, you have to look at the standards and you have to write down what you're going to do based on these standards. I think that's an important step to say this is the standard, this is what I'm going to do. So it's clear to everybody who is reviewing the program what the intent is.

MR. WILKINS: To what extent and how rapidly do these standards become obsolete? The ANSI-ANS business in 1982 which is starting to be a long time ago. You don't have any dates on the others.

MR. REMLEY: I don't think they really -- I think the technology will become obsolete faster than the standards. I believe that it would require updating from time to time. People gain more experience with the technology and I think you can go back and improve most of these standards, but I wouldn't say that there's something that should just be tossed away. Also, they have a lot of good work in them, a lot of good thoughts.

The problem comes with standards on how literal the standard is meant. I think the writers sometimes mean something as a guideline and then it may be interpreted by other people as being literal. This is where you get into some difficulty. Then the standards writers try to improve their wording. That's why I think you need a document that explains your use of the standard.

Then it becomes clear how literally you have interpreted a particular requirement or have used it as a guideline and, if you've used it as a guideline, these are the conditions under which you will comply to the standard, but at least clarify the application of the standard.

MR. KERR: This would be an analog perhaps to the Nuclear Regulatory Commission's use of regulatory guides to explain regulations.

MR. REMLEY: Yes, I think so.

[Slide.]

MR. REMLEY: When we get to the step between the code and the requirements and the specification, we use an approach for the testing which is a bottom-up verification testing approach. It's an approach which is trying to stretch the design over the possible ranges of use of the modules. So that then they can be used from the higher level with assurance that they'll operate properly.

[Slide.]

MR. REMLEY: Graphically, the way we go about this is with a set of tools and approaches that are represented in this model. The model contains two dimensions. One is a manual and automatic dimension, and the other one is a static and dynamic.

Manual and automatic means how much automation there is associated with that particular activity on the part of the verifier. Static and dynamic means whether the code is, say, just sitting on a piece of paper there or is actually executing in a computer system. So what we've done is we have sort of a multi-dimensional attack in all these areas and we do all these activities with the code.

We have also developed some tools for aiding in that process and these tools are down under the automatic end. The tools are listed. I think this is effective for the modules themselves. As I've mentioned, we do tend to want to build the system out of a lot of verified knowledge.

It tends to lose its effectiveness when you get into an integrated system because this doesn't deal with the interfaces yet. The issue in the integrated system tends to be the interfaces. For that, the system level testing is what addresses that.

[Slide.]

MR. REMLEY: The final point that I wanted to cover is the software security requirements. The first point to understand is that the embedded code in the system is all resident in PROM. So that it maintains its integrity over loss of power and can be restarted without any intervention into the system.

Then associated with that there is the periodic testing, the periodic functional surveillance testing of a safety system which assures that the software hasn't changed or at least can perform its safety function. There are built-in diagnostics, which, in our design, include software keying. We actually have an ID associated with every subsystem that's built into a hardware key that's then read by the software, and then the software also has to have the matching part of the key embedded in its PROM.

So what this assures is that you cannot locate any software subsystem in the wrong physical location in the cabinets. The actual PROMs have embedded checksums which are continually checked to see that the -- you can read them correctly and that the content hasn't changed.
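The two checks described here, a checksum over the memory contents and a key match between software and hardware location, can be sketched as follows. This is an illustration only: the transcript does not say which checksum algorithm or key format is used, so the 16-bit additive checksum and the integer keys below are assumptions, as are the function names.

```python
def additive_checksum(image: bytes) -> int:
    """16-bit additive checksum over a memory image (illustrative algorithm)."""
    return sum(image) & 0xFFFF


def subsystem_ok(image: bytes, stored_checksum: int,
                 software_key: int, hardware_key: int) -> bool:
    """True only if the image is unchanged and the software sits in its own slot."""
    content_intact = additive_checksum(image) == stored_checksum
    location_correct = software_key == hardware_key
    return content_intact and location_correct
```

In this sketch a subsystem placed in the wrong cabinet position fails the key comparison, and a changed memory image fails the checksum comparison, so either fault is caught by the same diagnostic.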

As I mentioned, we make extensive use of read-only memories. There is limited physical access to the systems. We accommodate door locks and, as a matter of fact, on the Sizewell design they have a very elaborate door-locking system called the Fortress Interlock System, which only allows access to one what we refer to as channel set or train at a time. It's actually a key interlock that only allows you access to one type of key at a time.

We've designed it so that we only need limited physical access to the equipment in any event. We don't need to be tuning the system as much as we did with analog systems. We have the integrated surveillance testing. So we've limited the need for physical access and we don't have any need for software access in the sense of reprogramming.

We do allow a few data items to be changed in situ and those are limited to an exact data item that we've predetermined that will be changeable from the point of view of system calibration in situ. But a lot of the protection setpoints are not in that area. This tends to be -- like the calorimetric scaling is one of these types of numbers.

That concludes what I had to say. Do you have any more questions?

MR. LEWIS: We've been asking questions as we went along. So we thank you very much for staying almost on schedule. My agenda says that we're now going to hear from Tim O'Neil of GE, is that correct? Is this closed?

MR. SIMON: No. I have no problem with leaving it open.

MR. LEWIS: That would be great. Thank you. In that case, we are yours.

MR. SIMON: He is giving a presentation to Dr. Murley today back in San Jose. My name is Barry Simon. I'm the Lead System Engineer for Safety System Logic and Control, which is our digital protection system. This presentation is from the standpoint of our digital safety systems design. I fully agree with the panel and with the other speakers on the fact that the hardware cannot be separated from the software.

So I will first present the general layout of our system architecture. I will try to make this faster because yesterday I made everybody go hungry by running over. So I'll try to finish earlier.

This first slide just really says that computers will do good things for you, which I think we've established already.

MR. CARROLL: Why doesn't anybody ever present a slide that describes the bad things computers will do for you?

MR. SIMON: We only talk about that privately.

MR. CARROLL: I see.

[Slide.]

MR. SIMON: If you're not careful, they will. One of the main benefits is that we're trying to reduce, as a practical matter, panel volume in the control room. So everything is being performed in microprocessor, essentially. Distributed processing we use for distributing the intelligence to various points throughout the plant from the reactor building to the control room and in separate processors.

We're using multiplexing to cut down the quantity of cable in the control room and throughout the plant, and fiber optics to reduce EMI effects in general and to have high speed data processing and much smaller diameter cable, much lighter weight. Continuous self-tests and fault localization is one of our key benefits in reducing common cause failure and in increasing availability by reducing mean time to repair.

The use of micro-electronics allows standardization, which is a key item in inventory control. Also, to reduce maintenance time, surveillance time offline, we have computerized test equipment. The added functionality refers to the improved man-machine interface and improved algorithms that we can implement and improved displays for the operator, for instance.

In general, reduce the overall burden on the operator and allow the use of touch panels, electronic switches, and advanced controls to benefit the operator.

MR. KERR: Any significant difference between this and the earlier systems in terms of sensitivity to an in-plant fire?

MR. SIMON: To fire?

MR. KERR: Fire.

MR. SIMON: The distributed processing limits the effects of fire. The fact that this is a -- we use multiple redundancy, four divisions separated system to reduce the effects of fire. It's entirely a two-out-of-four system. Each division can be bypassed individually and an entire channel can be lost.

MR. KERR: That could be said for a pre-computer channel, couldn't it? I was just curious as to whether there's some weak point that this might have that nondigital systems don't have or some strong point.

MR. SIMON: Of course, weak point could be if erroneous signals were generated because of something like overheating, something due to fire. The use of fiber optics is a great benefit in reducing and having more complete isolation.

MR. KERR: Thank you.

MR. SHEWMON: Are the fiber optics designed to take any higher temperature than the copper wire cable?

MR. SIMON: The fiber optic cable will have equivalent insulation to copper cable, and it can also be -- you might also use armored cable. Are you referring to the heat effects on the glass fiber itself?

MR. SHEWMON: Yes. I would think those would be less than on the copper, though I don't know how they deteriorate with rising temperature. Certainly burning off the insulation doesn't inherently destroy the optical fibers' capabilities.

MR. SIMON: Of course, when you deform the coherent structure of the fiber, you lose the data or corrupt the data. But there's data transmission checking to take care of that. The main benefit is that you don't have the short grounding and hot short problem.

MR. CARROLL: You did say yesterday that ultimately you also have continued with the hard wiring of a manual scram.

MR. SIMON: Yes, definitely. That's the diverse backup to this whole system. The entire digital protection system can go down and you would still have your manual scram capability. Plus, through the remote shutdown system, you have remote shutdown cooling capability because all of ECCS is there.

[Slide.]

MR. SIMON: If time permits, I'll also show you, in addition to ABWR, the SBWR which is not in your handout. I have additional material. In the SBWR, the simplified boiling water reactor, we have tried to implement simplification in the protection system design also, because most of the safety features are passive in that plant.

Safety system logic, our digital protection system integrates all the safety features, both scram and the ESF functions, ECCS and all the auxiliary support functions. It is in today's terms an embedded real time data acquisition and control system.

The fault tolerance is achieved on the highest level through the four-channel two-out-of-four voting. Then within each division, there's further redundancy, which I will show shortly. The control room logic is coupled to the reactor building multiplexers through the essential multiplexing system, which is independent within each of the protection divisions and which is also redundant.

All of our output switching is solid-state now also. We've had experience with the solid-state power switching since the Clinton solid-state design, which was not microprocessor controlled, but was all solid-state.

[Slide.]

MR. SIMON: Just as a quick overview, I presented this slide yesterday, also. This is the basic architecture of the system. The traditional sensor input are the usual analog sensors. Digitized at the remote multiplexer units, this is located in the reactor building in what we refer to as clean areas, temperature controlled and no radiation.

That data from the common network goes to the control room SSLC logic where there are three separate trains of logic. The top one, the top single train dedicated to the fail safe functions, reactor protection system and main steam isolation valve. The other two trains are engineered safety features.

We've divided those so that a loss of any one of the trains will not disable the entire decay heat removal mechanisms. These are also multiple in the four divisions. There are three divisions of ESF. Additional redundancy within each division is the dual processing at the output to prevent software or hardware failure from producing a trip, an initiation signal of ECCS, for instance.

This is two-out-of-four within each processor and there's two-out-of-two voting. So you have to have both channels in complete agreement in order to get an actual initiation signal at the output. We're dual through the entire multiplexing system. This section is out in the reactor building, also. So we have two trains just like that.

ESF functions are multiplexed out, as was mentioned before. Fail safe functions, scram and main steam isolation valve closure are hard-wired. In addition to the manual scram which simply opens the power source for the valve solenoids, there is a manual divisional trip which is separate from the microprocessor control devices. So all this could go down or be bypassed and you would still have manual trip capability.

MR. MICHELSON: Could you, for clarification, just identify again the physical location of these blocks in the plant?

MR. SIMON: Right. There are four -- around the reactor -- in the reactor building, there are four established rooms. The emergency electrical equipment rooms, which have motor control centers and switch gear, which will also contain these remote multiplexing units. So the sensors are wired from their locations near the vessel to the rooms thereby where the analog-to-digital conversion occurs.

The fiber optics, then, these dashed lines, are then run to the control room in the control building, which, in the ABWR design, is an entirely separate building from the reactor building. The control room multiplexing unit then is physically in the control room; simply is the receiving end of the data transmission path.

All of this equipment is in a single panel in each division. There are four of these four panels, one for each division, all in the control room.

MR. MICHELSON: They're all in the same room.

MR. SIMON: These are back row panels from the --

MR. MICHELSON: In the control room proper?

MR. SIMON: They're in another room from the main control console. They're separated -- they're in the same room.

MR. MICHELSON: But all of those panels are in the same physical location, no physical barriers between the panels.

MR. SIMON: There's physical -- the four panels are physically separated.

MR. MICHELSON: Four rooms?

MR. SIMON: But not in four rooms.

MR. MICHELSON: Physically separated means what, they're a few feet apart?

MR. SIMON: Several feet apart within the control room. On the output end, then, the control room multiplexer units are still in the control room, but in a second panel. There are fiber optic datalinks out back again to the reactor building. These remote multiplexer units are simply near these in the same place. Then the traditional contact closure outputs and inputs are wired to the motor control centers.

MR. MICHELSON: The RMUs are adjacent to the RMUs or in those four rooms that you talked about where the multiplexers were located?

MR. SIMON: They're in the four rooms where the multiplexers are located.

MR. MICHELSON: Not where the RMUs are on the left-hand side of the drawing.

MR. SIMON: No. These are the same areas.

MR. MICHELSON: Same areas.

MR. SIMON: Same areas out in the reactor building.

MR. MICHELSON: Same room.

MR. SIMON: Right. In the same rooms. They could even be the same units, but for reliability we're separating them. So the sensors are input to separate units from the outputs. Because of the sensor reduction in the ABWR, which has two-thirds fewer sensors than a traditional plant, almost all the inputs to the system are actually contact closures from the motor control centers, valves, pumps, the interlock signals. You have limit switches, torque switches, position switches. That's by far the greatest input to the system.

Others are thermocouples and various devices for each system. But the actual critical safety sensors are very few, probably less than a dozen.

MR. MICHELSON: How are the RMUs protected against picking up upon faults, picking up higher voltages from, say, a motor control center?

MR. SIMON: The cabinets are shielded and grounded.

MR. MICHELSON: That won't stop 120 or 240 or 440 coming in on the wire, though, to the RMU.

MR. SIMON: We have surge protection for --

MR. MICHELSON: That's in the RMU?

MR. SIMON: In the RMU, yes. Of course, the fiber optics limits further propagation.

MR. MICHELSON: What kind of circuit protection do you put in to prevent a fault coming into the RMU? Say a 440 volt fault?

MR. SIMON: There is surge protection on the input line.

MR. MICHELSON: On each of those sensing --

MR. SIMON: Yes. The RMUs actually -- it's a combination. All of the RMUs are DC powered.

MR. MICHELSON: That takes care of it.

MR. SIMON: That's one thing.

MR. CATTON: Will they stop lightning?

MR. SIMON: Will they stop lightning? Once.

MR. MICHELSON: They will not stop surges, either, by the way.

MR. CATTON: I had the usual surge protection that you buy at the local computer store and lightning burned out the back end of my computer anyway. Went right over the top of it.

MR. LEWIS: Lightning is hard to stop.

MR. SIMON: Yes. We had experience with lightning at the Grand Gulf Station in Mississippi in the summer, where they have a lot of lightning. That plant gets a lot of direct hits which has damaged electronics. But it's always been pretty localized and hasn't propagated very far. But it has to do with the plant grounding system, too. The plant happens to have a very poor ground. You have to have the proper overall lightning protection.

[Slide.]

MR. SIMON: On reliability, I will just do this very shortly in order to get on to the software reliability. This is the hardware reliability I talked about yesterday. We have the defense-in-depth with protection systems separated from control systems because for the BWR, there actually is no interface between the two.

We only send a reactor pump trip, recirc pump trip a signal from the safety system to a non-safety system. After a scram, there is a control rod run-in signal that's sent. But there are no signals that go from the control system to the protection system. And the redundancy, self-diagnostics.

The continuous operation refers to each division runs entirely independently and asynchronously.

MR. KERR: What sort of reliability standards do you use for your control system?

MR. SIMON: The reliability is essentially equal.

MR. KERR: Equal to what?

MR. SIMON: To the protection system. It's really availability that we're going for more in the redundant safety system designs.

MR. KERR: Reliability doesn't include availability?

MR. SIMON: Well, certainly, but not always. For instance, the redundancy that I showed you with the two-out-of-two decreases availability, but it's reliable because the difference in the channels will actually --

MR. KERR: It's reliable in preventing false scrams, but it isn't as reliable in producing scram. I'm sorry, it isn't. If you have to have two-out-of-two instead of -- or two-out-of-four instead of one-out-of-four, you could miss some scrams, in principle.

MR. SIMON: In principle, you could. That's true. I forgot to say we have included a bypass for that, also. So that one channel can be temporarily bypassed so that the remaining good channel could give you the correct trip while the other one is being repaired.
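The trade-off raised in this exchange, fewer false scrams at the cost of a somewhat higher chance of a missed one, can be made concrete with a small binomial calculation. The sketch below is illustrative only: it assumes identical, statistically independent channels, and the per-channel probabilities passed in are placeholders rather than figures from the presentation. Common cause failures, which the speakers discuss elsewhere, are outside this simple model.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent events occur, each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def missed_trip(k: int, n: int, p_fail: float) -> float:
    """Probability that a k-out-of-n vote fails to trip on a real demand.

    The trip is missed when fewer than k channels respond, which means
    at least n - k + 1 channels have failed.
    """
    return prob_at_least(n - k + 1, n, p_fail)


def spurious_trip(k: int, n: int, p_false: float) -> float:
    """Probability that a k-out-of-n vote trips with no real demand."""
    return prob_at_least(k, n, p_false)
```

Under these assumptions, with a per-channel probability of 0.1 used purely for arithmetic convenience, `missed_trip(1, 4, 0.1)` is about 0.0001 and `missed_trip(2, 4, 0.1)` is about 0.0037, while the spurious trip probability falls from about 0.34 for one-out-of-four to about 0.05 for two-out-of-four.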

MR. LEWIS: What does it mean to say worst case design including environmental effects? Does that mean a combined magnitude earthquake and a 300-mile-an-hour tornado and lightning at the same time?

MR. SIMON: Those global effects are considered in the common mode failure design. But the worst case design refers to the power supply range, worst case range you expect for over-voltage and under-voltage and current. It does refer to the range of environmental conditions that you would expect, including accident conditions.

MR. LEWIS: Expected range or the worst case range?

MR. SIMON: Worst case range, loss of HVAC.

MR. LEWIS: I won't ask you to define worst case.

MR. SIMON: All our protection system equipment is mild environment. They're all in controlled environments with safety-related HVAC. The rest is use high reliability parts and, of course, qualification and then the integration during the V&V that I'll talk about.

Part of the basis of software reliability for the BWR is the simple repetitive or deterministic, as the previous presentation said, nature of the functions. Essentially there are actually no calculations performed, but it's simply continuous reading of your sensor levels, determining whether they pass the trip thresholds. It is then just determined whether any two out of the four channels have passed the trip threshold.

You determine which of the sensors that involves and that causes your trip. There are no actual calculations performed.
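The threshold comparison and two-out-of-four vote described here can be sketched in a few lines. This is a simplified illustration, not the actual SSLC logic: the function name, the use of a single high-going threshold, and the example readings are assumptions, and bypass handling is left out.

```python
from typing import List, Sequence, Tuple


def two_out_of_four(readings: Sequence[float], threshold: float) -> Tuple[bool, List[int]]:
    """Compare four channel readings with a trip threshold and vote.

    Returns whether a trip is demanded and which channel indices exceeded
    the threshold. A high-going trip (reading >= threshold) is assumed.
    """
    if len(readings) != 4:
        raise ValueError("expected exactly four channel readings")
    tripped = [i for i, value in enumerate(readings) if value >= threshold]
    return len(tripped) >= 2, tripped
```

The logic consists only of comparisons and a count, which is the property the speaker points to when he says no actual calculations are performed.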

MR. LEWIS: Is that regarded as a safety asset to not do mathematics? I'm trying to understand.

MR. SIMON: I'm just saying it makes verification and validation of the software simpler potentially.

MR. LEWIS: I'm not quite sure what is meant by not math intensive. That means you can add, but not multiply or what?

MR. SIMON: That means we do just add.

MR. LEWIS: You just add.

MR. SIMON: We don't perform complex algorithms. That's what it really means.

MR. CARROLL: I think the distinction, Hal, is that PWRs in terms of some of their trip functions, like DNBR and that sort of thing, have to do some calculations to see where they are, whereas a boiler, whether it uses digital or analog systems typically are you just reach a trip point and that's it.

MR. LEWIS: I appreciate that, Jay. I'm just trying to freeze in my mind the concept that doing mathematics is unsafe.

MR. SIMON: No, of course not.

MR. CARROLL: I don't think that's what he's saying. I think he's saying it's easier to check something that's on-off than it is to check something that involves a calculation.

MR. CATTON: I think it depends who does the mathematics.

MR. LEWIS: I'm hardpressed to really agree with that, but please go on.

MR. SIMON: I'm just saying that we think it's just simply easier. Of course, you can make the -- you can do all kinds of complex calculations and still make it safe and reliable through the V&V program. It's just that we think it's -- we're doing the V&V on much simpler code. That's all I'm saying.

MR. LEWIS: You're talking to somebody who had a delegation of students showing up in his office two weeks ago to say that a problem that had been assigned which required converting from meters to feet was impossible because the book gave a table which allowed them to convert from meters to yards, but they didn't know how to convert from yards to feet.

MR. CATTON: It must have been Santa Barbara physics students.

MR. LEWIS: This was -- why don't you proceed here? Please proceed. Tummy grumblings will appear fairly soon, so you better speed it up.

[Slide.]

MR. SIMON: We also modularize the software within all those separate boxes I showed you. I should point out that the multiplexing is not intelligent. It's really a dumb multiplexer. It simply does the A-to-D conversion and transmits the message with a time-tag data to the control room.

Mainly for maintainability purposes, all the logic is performed in the control room where if something goes wrong, you simply replace a card within one of the boxes very quickly.

[Slide.]

MR. SIMON: Functional segmentation refers partly to the various trains within each division, where each of the systems, fail safe and not fail safe, and each of the different systems has its own software in separate modules. Full operating system not required means that we would use a non-formal operating system, as we mentioned previously, but just a real time -- essentially a real time for controlling the scheduling. Just putting the modules together and then performing them, performing the functions. That's all backed up by the V&V program where I --

MR. MICHELSON: Before you leave the hardware, I wonder if you could clarify something you said during our Subcommittee meeting. If it's proprietary, fine, don't answer. You indicated the type of components you were using for the hardware; namely, the specifications that you were using. Would you care to reiterate that? I guess you know what I'm talking about. The temperature sensitivity of the equipment was discussed and you indicated the temperature rating of the components and the spec to which they were bought.

MR. SIMON: You mean the Mil Standard, the Mil Spec.

MR. MICHELSON: That's right. I didn't know whether that was considered proprietary that you were using Mil Specs or not. Could you tell the Committee the temperature -- I missed it.

MR. SIMON: As I said yesterday, we do specify Mil Standard 883(c) processing on all our components.

MR. MICHELSON: What was the temperature rating?

MR. SIMON: For the hermetically sealed parts, we're using 125 C.

MR. MICHELSON: 125 Centigrade. Okay.

MR. LEWIS: I noticed earlier you had Mil Spec. Mil Spec also has minus 55 in it. Do you adhere to that?

MR. SIMON: We certainly hope so, except in Minnesota maybe.

MR. LEWIS: Is there a safety loss if you go to minus 50 instead of minus 55?

MR. SIMON: No. In fact, the components may run better when they're much colder. In fact, they're putting out small refrigeration units now for semiconductors to make them run faster.

MR. MICHELSON: I thought you also indicated that these components were adequate for inside of containment in hermetically sealed boxes without extra cooling needed. Is that correct?

MR. SIMON: Certainly.

MR. LEWIS: The reason for the Mil Spec minus 55 is for high flying aircraft. Your systems aren't going to be in high flying aircraft.

MR. SIMON: We hope not, unless the plant takes off.

MR. LEWIS: I am going to ask you to speed up, though.

[Slide.]

MR. SIMON: From the very top level through the detailed design, that V&V simply outputs at various points in the design to the V&V procedure. I will quickly present -- I divided this into non-safety and safety just to show the major difference. This relates to non-safety verification, which is simply a series of design reviews. It's essentially the same pattern that has been presented before, separate from the top spec, the hardware/software spec, and the separate hardware and software development down to final integration.

For non-safety, we have a series of formal design reviews, but not the formal sequential verification process.

[Slide.]

MR. SIMON: This is the same diagram as the previous one except for the added series of verification steps performed in each part of the design and at each software stage. We're separating V&V from actually the hardware steps which still go through the design reviews, but, for safety-related components, use the qualification process up until the integration stage.

Validation is the final step which was called the simulation test in the non-safety, but is the validation of the entire system to your system specs, your design and functional specs.

MR. CARROLL: What are the ground rules for somebody that does design review as opposed to somebody that does V&V activity? Can a person doing a design review work on the design?

MR. SIMON: No. The rules are somewhat the same, except the design review person would not necessarily have to be as knowledgeable of things like the software; ability to read the code, for instance. But the design review panels are simply independent review boards that are capable of understanding basically what the design is.

The verification people would have to go into much more detail to totally verify that the previous steps had been performed properly and that you were at a level where you could proceed to the next step.

MR. CARROLL: But both kinds of groups are independent of the people that actually did the work.

MR. SIMON: Yes. Through our internal guidelines. They are defined as being separate from the people who actually perform the implementation.

MR. CARROLL: In the QA world, we always talk about the pressures of production on the management person to whom a QA organization reports and so forth. Is there any distinction in your scheme of things in terms of who the design review people can report to versus who the verification people can report to?

MR. SIMON: No. There are some limited rules to that effect in that your reviewers are not supposed to be your management or your subordinates, but peers from other groups or management from other groups. That's the only real limitation. Unless only those other people by justification are fully -- are the only ones that can do the work, that can do the review.

[Slide.]

MR. SIMON: The review process; the V&V program I've summarized is the one we submitted to the NRC for GE's present line of safety-related controllers, the NUMAC line of controllers. This would be extended to ABWR development. The informal reviews are not documented necessarily. They're just the day-by-day design process.

Independent design verification are those steps on the previous slide during the sequence of design. Now, the baseline reviews are formal reviews for establishing performance to the plan, essentially, the software management plan and to ensure compliance to performance specs to the high level code.

This review would ensure that the detailed code conformed to the high level code. These reviews are to review the actual test methods and review methods. They're always formally documented. The other part are just the formal testing steps, down through formal testing, release and methods for changing software, controlled changes of software by repeating various parts of the validation.

MR. SHEWMON: When you change the software in a plant, how is that done? Send a disk, send it over a telephone line?

MR. SIMON: No.

MR. SHEWMON: Send PROMS?

MR. SIMON: No. We've also done the changes at the factory and then sent the PROMS to the site or installed them ourselves. But it is all in PROM.

[Slide.]

MR. SIMON: Sabotage protection, I will just summarize very quickly. We have the usual physical security through the secured control room, the locked panels. In the reactor building, that's considered a vital area that also has card reader access to all the separate rooms.

MR. MICHELSON: How do you protect the PROMS before they ever get to the plant if all the changes are made back at the factory, if I understood what you said earlier correctly? Is that true that you make any software changes back at the factory and the PROM is --

MR. SIMON: That is correct.

MR. MICHELSON: -- sent to the plant?

MR. SIMON: That's correct.

MR. MICHELSON: So how do you protect them at the factory and during shipment and so forth?

MR. SIMON: Well, the software is --

MR. MICHELSON: Somebody can -- how much equipment would it take to program a PROM in a little different way as opposed to doing it at the factory? Probably very little.

MR. SIMON: The safety software in PROM --

MR. LEWIS: They cannot be redone.

MR. SIMON: They cannot be redone.

MR. LEWIS: We programmed that --

MR. SIMON: Right. It would be difficult because of the type of checking that actually the other presenters have mentioned, the checksum, CRC, that are stored. Somebody who changed it would also have to know to change that or may not have access to being able to change the checksum. That would be discovered as soon as it was installed to a system.

MR. LEWIS: I'm suddenly confused because what Carl was asking, as I understand it, was what happens after they're originally programmed.

MR. SIMON: He's talking about physical protection, I believe.

MR. MICHELSON: At the factory and during shipment. But during shipment is a non-problem, I gather, unless you have sufficiently sophisticated equipment, but how about back at the factory.
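[Illustrative sketch, not part of the record.] The stored checksum or CRC check that Mr. Simon mentions can be pictured roughly as follows. The record does not say which algorithm or memory layout is used, so CRC-32 appended to the end of the image, and the names `seal_image` and `verify_image`, are assumptions made purely for illustration.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """CRC-32 of the data as an unsigned 32-bit integer."""
    return zlib.crc32(data) & 0xFFFFFFFF


def seal_image(payload: bytes) -> bytes:
    """Append the CRC-32 of the payload (assumed layout: last 4 bytes, big-endian)."""
    return payload + crc32_of(payload).to_bytes(4, "big")


def verify_image(image: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the stored value."""
    if len(image) < 4:
        return False
    payload, stored = image[:-4], int.from_bytes(image[-4:], "big")
    return crc32_of(payload) == stored


# Hypothetical 16-byte "program" standing in for a PROM image.
good_image = seal_image(bytes(range(16)))

# Flip one bit in the payload without updating the stored CRC.
tampered_image = bytes([good_image[0] ^ 0x01]) + good_image[1:]

print(verify_image(good_image))      # the untouched image passes
print(verify_image(tampered_image))  # the altered image is rejected
```

A CRC of this kind catches accidental corruption and careless alteration. As the discussion itself notes, someone who also knows to recompute the stored value would defeat it, so it complements physical security rather than replacing it.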

MR. LEWIS: No. It's not a matter of sufficiently sophisticated equipment. There are two kinds of PROMS. There's the kind you can reprogram after they've been programmed and the kind that you can't. I thought he said you physically cannot.

MR. SIMON: And these PROMS you could easily erase and reprogram. That would be bad, of course.

MR. MICHELSON: So shipment is a non-problem. Only back at the factory do you have a security problem.

MR. CATTON: You'd just replace it.

MR. MICHELSON: But you've got to make one.

MR. CATTON: But making PROMS is easy.

MR. MICHELSON: If it's easy --

MR. LEWIS: You can replace it.

MR. CATTON: You just swap it. That's quicker than trying to --

MR. MICHELSON: Then you worry about shipment, as well.

MR. CATTON: What kinds of checks do you go through once you get to the plant with these problems?

MR. SIMON: Right. Yes. They cannot be just immediately installed in the system, but have to be --

MR. CATTON: Do they fully check out the code that's on them and everything else?

MR. SIMON: Off-line. They have to be checked off-line in the actual piece of equipment, which all have built-in surveillance testing. You can run them.

MR. MICHELSON: But you must be able to change them because that's what you do at the factory, is you make changes.

MR. CATTON: They make a new one, Carl.

MR. LEWIS: They make a new one, that's right.

MR. MICHELSON: But that's how they make changes.

MR. CATTON: They burn it into a new one.

MR. MICHELSON: Yes, but they've got -- if you want to make changes, you've got to make a new one.

MR. SIMON: A plant under its own security system could burn its own PROMS if it had administrative controls to do that.

MR. LEWIS: Yes.

MR. SIMON: But that would be fully under the responsibility of the plant, then.

MR. MICHELSON: But when the factory sends a new one that has some changes on it, which I think it can do, then how do you know whether the changes are acceptable or not at the plant?

MR. SIMON: Of course, they are tested against a revised specification.

MR. MICHELSON: So somebody has to send them some kind of a document that says it's changed and then you recheck, and that can be sent independently of the PROMS, hopefully not together.

MR. SIMON: Right.

MR. MICHELSON: The only place then would be the factory where you'd really have to be secure.

MR. LEWIS: I doubt that anyone at the plant could do a complete check of a PROM. So you depend on it being done at the factory and that physical security of the PROM in transit.

MR. MICHELSON: That would be best.

MR. SIMON: I think the rest of it is all self-explanatory. I'm probably finished.

MR. LEWIS: In that case, I thank you for accelerating the process a little bit. Therefore, we will have our lunch break and come back at a quarter after one. The next item on our agenda is EPRI. Does EPRI have to be closed? EPRI will have to be closed. So the first object after lunch will be closed session.

[Whereupon, at 12:20 p.m., the Subcommittees were recessed for lunch, to reconvene this same day at 1:15 p.m.]

## AFTERNOON SESSION


[2:00 p.m.]

MR. LEWIS: Let's get started. We are now on the record. Our next speaker is Leo Beltracchi, is it?

MR. BELTRACCHI: Yes, my name is Leo Beltracchi. I am a member of the Research Staff, Human Factors Branch. I will be discussing the research activities, both our current and future programs on use of digital computers in nuclear power plants.

[Slide.]

The starred items here generally cover programs that we have or information that we have with regard to safety applications, not all of them but most of them are. We also have some other applications that deal with the man-machine interface. I just want to point this out, the cognitive aspects of the man-machine interface and performance measures, an example of which was published in NUREG/CR-5348, man-machine interface issues in nuclear power plants.

That covered a workshop where we pooled many experts in the area. The object of the workshop was really to propose experiments and guidelines, and we currently are in the process of pursuing some of the proposed experiments in that area. I have a few of these reports with me if you are interested. The basic talk will address the starred items, and let me address the first one on that now.

[Slide.]

We have had contact with several foreign countries. For example, we have talked with AECB in Canada on the Darlington experience.

MR. CARROLL: When you say we, will you tell me about the team and its composition with respect to nuclear engineers and computer scientists.

MR. BELTRACCHI: Okay.

MR. LEWIS: You learn fast, Jay.

MR. BELTRACCHI: Let me address that in the very first one in Darlington. We went up to see AECB, Curt Azmis specifically, about a year ago. The staff was supported by Oak Ridge, personnel from Oak Ridge National Laboratory who were experts in software engineering. We also had members of NRR come up on that visit as well.

MR. CARROLL: Was a multi-disciplinary group --

MR. BELTRACCHI: Yes, you could characterize it as a multi-disciplinary group. We discussed the Darlington experience with Curt Azmis and we also met with his consultant, David Parnas, and discussed in detail the problems that they had in the review of the code, discussed why they had to go into reverse engineering. Let me quickly define that. They had a problem in trying to understand whether the code met the specification, and they couldn't quite determine how to approach that. They eventually went to a method utilizing reverse engineering.

They used actually, function tables that were developed by Parnas to identify all the functions in the code, compare those functions to the functions that were called out in the specification and, of course, there were more functions in the code than there were in the specification. The excess functions were unintended functions, and the licensee had to address the safety impact of the unintended functions. That they actually did do, and those issues were resolved.
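[Illustrative sketch, not part of the record.] The comparison step described here, functions recovered from the code set against functions called out in the specification, reduces at its simplest to a set difference. The function names below are hypothetical, and the sketch assumes the function tables have already been boiled down to named entries; it does not reproduce the Parnas tabular notation itself.

```python
def compare_functions(spec_functions, code_functions):
    """Classify functions as matched, unintended (code only), or missing (spec only)."""
    spec, code = set(spec_functions), set(code_functions)
    return {
        "matched": sorted(spec & code),
        "unintended": sorted(code - spec),  # in the code, not in the specification
        "missing": sorted(spec - code),     # specified, but not found in the code
    }


# Hypothetical function names, for illustration only.
specified = ["trip_on_high_flux", "trip_on_low_flow", "self_test"]
recovered = ["trip_on_high_flux", "trip_on_low_flow", "self_test", "debug_dump"]

report = compare_functions(specified, recovered)
print(report["unintended"])
print(report["missing"])
```

Each entry in the unintended list is the kind of item whose safety impact the licensee would then have to address.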

MR. LEWIS: So, the real honest to golly computer scientist in this operation was Parnas?

MR. BELTRACCHI: That is correct.

MR. CARROLL: No one on your team was a computer scientist?

MR. BELTRACCHI: I can't speak to the extent that they were a recent graduate from a computer -- say from a college accredited course, but we did have computer people from Oak Ridge National Laboratory support the audit.

MR. LEWIS: You described them before as software engineers.

MR. BELTRACCHI: Yes.

MR. LEWIS: That's different from computer scientists. I think we are just trying to find out who the players are in this game.

MR. BELTRACCHI: The person from Oak Ridge was Mr. Ned Clapp, and he has been involved with software --

MR. LEWIS: That doesn't help me.

MR. BELTRACCHI: -- for a rather lengthy period of time.

MR. LEWIS: What is clear is that you, yourself, weren't all that concerned in detail about what their background in computer science was.

MR. BELTRACCHI: Yes.

MR. COFFMAN: Frank Coffman, Research Staff. This visit and the other visits that took place up there was to find out and learn from their experience. There was no audit, there was no inspection, there was no review in the typical regulatory sense. The people who went up there ranged everywhere from Commissioner Rogers to Jim Snezik, Leo Beltracchi, Jay Persinski, myself, Joe Joyce, Jim Stewart has been up there, consultants.

There was a wide range of background, none of which am I aware have a computer science degree from an accredited college. We are looking at the regulatory issues associated with the use of advanced systems and trying to let those issues drive what we need in the way of regulatory programs, and in terms of regulatory staff to support those programs.

So, the direct answer is no, I don't know of any computer science person. I'm not sure that's exactly what we need yet because we haven't clearly defined the regulatory issues or having completed our definition of regulatory issues from the research side of the house. Maybe the regulatory side would like to address it. I am just trying to give you a clear picture of where we are and what the context of the visit was.

MR. LEWIS: Thank you.

MR. KERR: What is a regulatory issue in this context?

MR. COFFMAN: One of the things that we did was to conduct a survey, and it identified something like eight regulatory issues. I think you probably only need one as an example.

MR. BELTRACCHI: It's the cost of verification and validation.

MR. COFFMAN: Right, at what point do you draw the line in expending resources on verification and validation when you end up making a judgment at the end anyway.

MR. LEWIS: That's bearing in mind that we mean different things, each of us, by verification and validation.

MR. SHEWMON: One of the regulatory issues on this that has been around for a long time is, how do you certify whatever goes into the plant is not endangering the public health and safety here.

MR. KERR: I am just trying to find out what the staff means by it.

MR. COFFMAN: They range, but certainly one example is that when you introduce the new displays as one of the pictures that was shown this morning of the Darlington displays, then you have all the human factors aspects that go along with the ability to match up the operator's understanding of the physical processes that are going on with the way the information is portrayed on the display. That's one.

MR. CARROLL: That isn't a software issue. We are sort of talking software today.

MR. KERR: No, I am just trying to find out what he means by --

MR. COFFMAN: One of the other ones is how do you rely upon automatic monitoring, surveillance and calibration in the equipment.

MR. KERR: Jay, I want to find out what he means by a regulatory issue, and I think I am finding out.

MR. CARROLL: Okay.

MR. LEWIS: It's also true that when you speak of human interactions that's really not a software issue, and I worry a little bit about looking under the lamp post; that is, see your own personal expertise in the subject. To use Paul's analogy, if you are interested in issues that involve the public health and safety you wouldn't call a computer scientist to come bang on the steel of a pressure vessel and say see, it sounds pretty good to me or anything like that. That's just as true of computer issues as anything else. You have to know what you are doing in order to know what you are looking for.

MR. KERR: I am satisfied with the information that you provided.

MR. BELTRACCHI: The software issues certainly are going to be such issues as common mode error to result in the loss of the safety function. You are looking for software engineering practices that would lead to high integrity software, and those characteristics that would support that. I don't think you can go out and say I have a yardstick and this passes and this fails, because the technology is not to that point yet.

MR. LEWIS: I have to differ with you on that. In many cases it isn't to that point and in some cases it is. There exists verification and validation tests which are pass/fail tests on computer systems.

MR. BELTRACCHI: That's right.

MR. LEWIS: So, sometimes it is true that you have a yardstick. Knowing where you do and where you don't is a non-trivial --

MR. BELTRACCHI: That's true, and I will address some of those issues in this talk.

MR. LEWIS: Please, we are holding you up.

MR. BELTRACCHI: I think that pretty much summarizes what I wanted to say about Darlington. We have also talked with AECB about their experience with Bruce, and that appeared to be a quality control issue since the problem was identified in the previous version of the code and hadn't been fixed. It did appear in an operational sense, and we are continuing our communications with AECB on this issue.

I would like to discuss some experiences in France and Germany. I was recently on a National Science Foundation survey, and this reflects some of the experiences that we gained from that survey. The N4 series of plants is the latest series of French designs. They just recently announced that they were cancelling part of their digital I&C for the control room. When you read about the details of it and also looked at the material, it appeared that the front end requirements of that design had not been thoroughly completed. It was certainly symptomatic of that problem, anyway.

We also found out recently that there also appeared to be some organizational issues that contributed to that cancellation.

MR. MICHELSON: Does that mean they went back to analog control?

MR. BELTRACCHI: They were going to revert to an earlier design --


MR. MICHELSON: Was it still digital?

MR. JOYCE: Excuse me. This is Joe Joyce with the Instrumentation branch. We have with us today John Gallagher, also a member of the Instrumentation and Control System Branch. He will be giving you a five to seven minute status on the N4 design, so let's not waste time on this.

MR. BELTRACCHI: I also want to point out that the French are in the process of designing a microprocessor-based safety system for the N4 series. They are using a CASE tool called OST that was developed at Saclay. It is also being used by their licensing people to evaluate the code.

In Germany, KWU is in the process of developing -- also in the process of developing a microprocessor-based safety system. It's a ten year program. They also use a CASE tool that they developed in-house. It's called SPACE, specification and coding environment. It aids a designer in specifying the requirements, and it also has some characteristics of automatically generating code.

The licensing in TUV Norddeutschland is one of --

MR. LEWIS: What is meant by automatically generated code?

MR. BELTRACCHI: Automatic code generator, in a sense that what they do is, they can end up with a specification that is in symbolic form and can read it optically and generate code.

MR. LEWIS: The specifications in written form, in symbolic form, are written in what language?

MR. BELTRACCHI: They developed their very own direct language.

MR. LEWIS: Okay, so --

MR. BELTRACCHI: It ended up being graphical symbols. Of course, you have to understand the --

MR. LEWIS: This isn't really automatic code generation, it is code translation.

MR. BELTRACCHI: Okay, fine.

MR. CARROLL: Did this team, we or whoever, also look at the experience with digital applications in the United States? We heard each of the vendors to one degree or another, Combustion in particular, talk about a considerable experience base. Has this group looked at what they have been doing?

MR. BELTRACCHI: No. I am reporting on my experiences with a two week survey that was sponsored by the National Science Foundation.

MR. CARROLL: All of this was that?

MR. BELTRACCHI: No. This particular area here and here reflects that and overlaps my duties at the NRC, so I felt it appropriate.

MR. SHEWMON: The purpose of the NSF group was to do what?

MR. BELTRACCHI: To evaluate the research and current nuclear I&C instrumentation activities within Europe and compare it with those in the United States. We didn't do a specific survey within the United States. Experiences of members of that team were drawn upon to --

MR. SHEWMON: I am just somewhat surprised -- this was part of the engineering division of the National Science Foundation? You don't have to answer it.

MR. BELTRACCHI: Okay.

MR. SHEWMON: It just doesn't sound too much like what they normally do.

MR. LEWIS: The Science Foundation normally doesn't have in-house expertise.

MR. SHEWMON: They normally don't worry about the engineering aspects of nuclear reactors.

MR. LEWIS: You bet you.

MR. BELTRACCHI: I guess two years back they did a survey of Japanese technology in the nuclear field, and this was sort of a follow up study.

MR. SHEWMON: Good. I have nothing against a competent group.

MR. KERR: There is a report, I take it, in preparation?

MR. BELTRACCHI: The report is in the process of being generated. There should be a report by mid-summer or early fall.

There is a licensing authority in one of the German states, TUV Norddeutschland. They are using a CASE tool called SOSAT to evaluate the code that is being developed by KWU, and I will address that later in this talk.

[Slide.]

The next program that I would like to discuss is a program that we began two or three years ago. It was review criteria for human factors aspects of advanced I&C. The contractor was Oak Ridge National Laboratory. The objectives are stated here. It is basically to develop review criteria for evaluating safety implications of human factors associated with artificial intelligence, expert systems and advanced I&C.

The initial objective of this program was to perform an industry survey to define issues. These were reported -- the results of the survey were reported in NUREG-5439, which I have a few copies of if you are interested.

MR. KERR: Who is in charge of the program at Oak Ridge?

MR. BELTRACCHI: That was in the I&C area, and that was Dwayne Fry's instrumentation and controls division.

MR. KERR: He is not responsible --

MR. BELTRACCHI: No, it was Dr. Robert Urig, was involved.

MR. KERR: Is Urig responsible for the work?

MR. BELTRACCHI: Pardon me?

MR. KERR: Urig is taking responsibility for this program?

MR. BELTRACCHI: Yes. The survey portion of the program has been completed. He is no longer -- I don't think he is any longer working on this particular project.

MR. KERR: Who is then?

MR. BELTRACCHI: I think it's Mr. Carter and Bill Kinney.

MR. KERR: Thank you.

MR. BELTRACCHI: There were both human factors issues and instrumentation controls issues identified from the survey. I would like to just discuss one or two of the I&C issues that came from the survey.

[Slide.]

We found that there was a concern with respect to the resources requirements for verification and validation of advanced I&C. This concern reflected itself in the fact that it was very costly and it was a question of a trade off, being able to deny anticipated improvements that were available from the use of digital technology, for example, your diagnostics that you can build into digital technology that are not available or readily available to an analog hard technology.

Another concern had to do with what are the configuration control requirements for digital systems backfitted in nuclear power plants. This manifested itself into the security of the software, being able to protect it against viruses, the maintenance of the software, as well as configuration control issues.

MR. CARROLL: It's really more than backfitted, it's also in an original design?

MR. BELTRACCHI: Yes. The acceptance criteria for advanced I&C was also an issue. Certainly, operators were concerned that poorly designed and ill-qualified equipment coming into plants would present them with problems. The need to establish acceptance criteria to avoid those kind of problems is certainly a factor that came from the survey.

The last bullet on here was a rather important one. In the course of our survey we found that one plant related a situation where their plant process computer was polling the protection system, and in the course of doing that there was a sneak circuit that actually tied up the protection system. When they understood the problem, the way they solved it was to make the protection system just a broadcaster of data such that there would be no two-way communication, not even a handshake. That way, the information could be acquired by the plant process computer without impacting the protection system at all.
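[Illustrative sketch, not part of the record.] The broadcast-only arrangement described above can be modeled in a few lines: the sender emits numbered frames and never reads anything back, and the listener infers lost frames from gaps in the sequence numbers instead of requesting a resend. The frame layout and the names `Frame`, `broadcast`, and `listen` are assumptions for illustration; the record does not describe the plant's actual data format.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Frame:
    seq: int      # sequence number, assigned by the sender only
    value: float  # the broadcast reading


def broadcast(readings: Iterable[float]) -> Iterator[Frame]:
    """Protection-system side: emits frames and accepts no input from any listener."""
    for seq, value in enumerate(readings):
        yield Frame(seq, value)


def listen(frames: Iterable[Frame]) -> Tuple[List[float], int]:
    """Process-computer side: reads what arrives and counts missed frames from gaps."""
    values: List[float] = []
    missed = 0
    expected = None
    for frame in frames:
        if expected is not None and frame.seq > expected:
            missed += frame.seq - expected  # a gap: no acknowledgment, no resend request
        values.append(frame.value)
        expected = frame.seq + 1
    return values, missed


sent = list(broadcast([1.0, 2.0, 3.0, 4.0]))
received = sent[:1] + sent[2:]  # simulate the listener missing the second frame
values, missed = listen(received)
print(values, missed)
```

This shows only the logical data flow. Whether the physical link is truly one-way, with no synchronization or other return signal, is a separate hardware question that the sketch does not address.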

MR. KERR: Are there acceptance criteria for non-advanced I&C?

MR. BELTRACCHI: I guess if you would consider non-advanced I&C current technology, I would have to answer that yes.

MR. KERR: Where would one find such acceptance criteria?

MR. BELTRACCHI: In the form of regulatory guides and general design criteria.

MR. KERR: They also would cover, it would seem to me, advanced I&C as well because they cover what one is willing to accept in nuclear power plants.

MR. BELTRACCHI: To some extent they do and to some extent they don't. For example this one you could look at as a form of GDC 24 if you like, separation between protection and control. However, the intent of GDC 24 when it was written was really propagation of electrical faults. This is the software version of it, if you would like. That certainly was an issue when we addressed the core protection calculator system in its review.

MR. KERR: What you are telling me is that the existing criteria are incomplete; they don't really cover it.

MR. BELTRACCHI: Or, the digital interpretation of that has to be made very clear.

MR. LEWIS: Could I pursue that one for just a moment?

MR. BELTRACCHI: Yes.

MR. LEWIS: This was a radio link or telephone link?

MR. BELTRACCHI: No, this was a digital link to the computer and actually had a hand shake in it. That is how it was described to us.

MR. LEWIS: It had a hand shake, that's what I was looking for. So, it is not true that it is a one way system.

MR. BELTRACCHI: That is correct.

MR. LEWIS: There is information going the other way?

MR. BELTRACCHI: That is correct.

MR. LEWIS: Therefore, the illusion that this is a one way system is perhaps mistaken.

MR. BELTRACCHI: This is the solution.

MR. LEWIS: Pardon?

MR. BELTRACCHI: This is the solution. I wanted to say that --

MR. LEWIS: The solution is truly one way?

MR. BELTRACCHI: That is correct.

MR. LEWIS: Of radio link?

MR. BELTRACCHI: Broadcasting out. There is no --

MR. LEWIS: By how, with a wire?

MR. BELTRACCHI: No, it just repeatedly presents the data to be read.

MR. LEWIS: I am asking whether it is transmitted by wire or wireless?

MR. BELTRACCHI: No, I believe it is transmitted by either an optical link or a wire.

MR. LEWIS: There is no hand shake?

MR. BELTRACCHI: That is correct.

MR. LEWIS: There is no signal going the other way? I don't understand how it works then.

MR. BELTRACCHI: If you were cycling through -- if you periodically put out every one second, then the listener would have to be looking for --

MR. LEWIS: It can be done. I agree that it can be done, but one has to look very carefully. One often finds that even when people tell you it is one way there are synchronization signals or other kinds of things that --

MR. BELTRACCHI: That may very well be true.

MR. LEWIS: But that compromises the illusion of isolation that you may get from this kind of wording.

MR. BELTRACCHI: That may very well be the case. What we are looking at here is really how to interpret this or what kind of requirements the NRC should be coming up with respect to say a digital interpretation, GDC 24.

MR. LEWIS: Please go on.

[Slide.]

MR. BELTRACCHI: There is another project that we have at Oak Ridge. It is entitled Computer Classification. This was to review and evaluate the adequacy of existing regulatory guidance for computer-based safety systems; and, where necessary, recommend development of new guidance. This was a two year program. It was initiated in February of 1989. We currently have a draft NUREG and it is under review, but it does need some additional work. We hope to publish that later on this spring or early this summer.

[Slide.]

The next project I would like to discuss is the expert system verification and validation guidelines. The contractor for this effort is Science Applications International Corporation. The objective here was to develop and document guidelines for verifying and validating expert systems. This is a joint project funded by NRC and EPRI. It's a two year program that was initiated in October, 1990. We are progressing on that effort at this time.

2

25

MR. KERR: Which branch of SAI is responsible for this?

MR. BELTRACCHI: They are over in Tysons Corner.

MR. CARROLL: Why would you have a project like this on expert systems and not one on control and protection software systems?

MR. BELTRACCHI: I am getting to the one on --

MR. CARROLL: You do have one on that.

MR. BELTRACCHI: Yes.

MR. LEWIS: I would have asked the converse question. I would say how can you do this on expert systems because, again, the term verify and validate would suggest a degree of formalism or formality that expert systems normally don't have.

MR. BELTRACCHI: There is one school of thought that the contractor has related to us that it would be easier to verify expert systems because of the formal methods and the tools that are available to do that.

MR. LEWIS: He must mean something different by the word verify again, because an expert system is just a collection of rules.

MR. BELTRACCHI: True.

MR. LEWIS: I don't see what you verify about a collection of rules.

MR. BELTRACCHI: You are certainly concerned about whether or not the rules are conflicting, and you obviously have the concern of the degree of completeness and boundaries of that knowledge.

MR. LEWIS: I understand, but that's not the sort of thing that lends itself to verification in the computer science sense. I guess we are using that word for different purposes today.

[Slide.]

MR. BELTRACCHI: The NRC is a member of the Halden Project, Halden Reactor Project. One of the software tools that is being developed at Halden is SOSAT. This is a set of tools for software safety assessment. It is being developed at Halden because TUV Norddeutschland had contracted with Halden to do this work.

The functions of this CASE tool are listed here. It will do metrics computations. For example, it will calculate the volume and length metric, check for illegal instructions and illegal accesses. It does static analysis of code and dynamic analysis. One of the future functions that they want to build into the system is symbolic execution. They plan to develop an analysis module to compare the program functions with specifications. You can read that as reverse engineering.

They are working on this effort, but they have not achieved that goal at this time.

MR. LEWIS: I am missing a point somewhere. You are not -- I'm on the wrong page, that's my problem.

MR. SHEWMON: He's giving the right viewgraph. Would you tell me what metric computation means there?

MR. BELTRACCHI: Yes. The Halstead metric, for example, is a measure of the complexity of the code. What they do is combine operators and operands by some logarithmic formula, and if it exceeds a certain value it is an indication that it may be fairly complex or too long.

They have been able to correlate -- it correlates weakly, if I recall correctly, with the number of errors in a program. There was no strong correlation. At least that is what they found at Halden.
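
As a rough illustration of the length and volume metrics mentioned in this exchange, the Python sketch below computes them from lists of operator and operand tokens. The formulas used (length N = N1 + N2, vocabulary n = n1 + n2, volume V = N log2 n) are the commonly published Halstead definitions and are an assumption here; the transcript does not say which formulas or thresholds SOSAT applies. The function name and the token lists are hypothetical.

```python
import math


def halstead_metrics(operators, operands):
    """Return Halstead vocabulary, length and volume for one code fragment.

    operators / operands are lists of tokens with repeats included.
    Assumes the textbook definitions; not taken from the SOSAT tool.
    """
    n1 = len(set(operators))   # distinct operators
    n2 = len(set(operands))    # distinct operands
    big_n1 = len(operators)    # total operator occurrences
    big_n2 = len(operands)     # total operand occurrences

    vocabulary = n1 + n2
    length = big_n1 + big_n2
    # Volume is the "logarithmic formula": length times log2 of vocabulary.
    volume = length * math.log2(vocabulary) if vocabulary > 0 else 0.0
    return {"vocabulary": vocabulary, "length": length, "volume": volume}


# Hypothetical tokens for the statement:  a = b + c * b
example = halstead_metrics(["=", "+", "*"], ["a", "b", "c", "b"])
print(example)
```

A tool would compare the volume against some chosen limit and flag fragments that exceed it; the limit itself is a judgment call and is not given in the discussion.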

MR. SHEWMON: I wonder if it was calculating volumes in cubic meters --

MR. BELTRACCHI: No. There are also other metrics such as the McCabe metric, and that has to do also with complexity. It can do time analyses for portions of the codes, provided you have loop criteria. It is now being used by TUV Norddeutschland, as I mentioned earlier. We have communicated with this regulatory agency, and we are looking into the possibility of using the SOSAT tool for our own NRC assessments of software.

[Slide.]

In another Halden project they had a goal of establishing increased software reliability for safety systems. This was really a program on software test and evaluation methods. They approached this by having one organization develop a safety system spec. That was the Safety and Reliability Directorate in the United Kingdom. This specification was independently coded by three teams: one in Norway, one in Finland and one in the UK.

They looked into many features. However, I am only going to focus on one aspect of the study; that is, the fault finding strategies and test data selection. Let me quickly describe the test data types that they had.

[Slide.]

There were six types. There were two sets of deterministic data. That is, systematic data, the type where you would manually produce test cases from the specification functions, like a requirements matrix that would list all your functions, and you generate a test case for each of those functions. They used plant simulation data, which would be like scenarios from training simulators.

They also had four sets of random data. One was uniform distribution with an equal probability inside the data range; that is, they had temperature from 500 to 600 degrees. They would have equal distribution of cases within that range. They had a Gaussian distribution, mean and mid-range of that data, like 550. They had uniform distribution at the boundaries, both at the high and low. That's 500 and 600, and the example that I just --

MR. LEWIS: What does that mean? I don't understand what a Gaussian distribution boundaries is.

MR. BELTRACCHI: That means it was Gaussian around the high value and the low value of the input data range.

MR. LEWIS: I'm sorry, two Gaussians, one at the high value and one at the low value --

MR. BELTRACCHI: Yes.

MR. LEWIS: -- and then uniform in between?

MR. BELTRACCHI: No.

MR. LEWIS: Just two Gaussians.

MR. BELTRACCHI: Separate. The latter.

MR. LEWIS: Two Gaussians --

MR. BELTRACCHI: These are separate cases.

MR. LEWIS: I see. That's for any of the variables, whatever they are?

MR. BELTRACCHI: Yes. And, the Gaussian distribution at the boundaries.

[Slide.]

In evaluating test data efficiency for fault detection, they took and seeded each program with 62 faults, and then they tested each program back to back against each other with the input data types. They were able to find all of the seeded faults. However, multiple data types were required.

What they found was that the most efficient means of finding these faults were through the uniform distribution inside data range and the Gaussian distribution, the boundary. The least efficient were Gaussian distribution, mean and mid-range, and systematic data. That sort of points out that systematic data is the type of data that most people use to qualify their programs. It says that it is necessary but not sufficient in terms of a test strategy.
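
The random test data types and the back-to-back comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the 500 to 600 degree range comes from the discussion, but the setpoint, the standard deviations, the two trip functions and the seeded fault are all hypothetical and are not taken from the Halden study.

```python
import random

LOW, HIGH = 500.0, 600.0   # input range used as the example in the discussion
SETPOINT = 580.0           # hypothetical trip setpoint (assumption)


def clamp(x):
    """Keep a sampled value inside the legal input range."""
    return max(LOW, min(HIGH, x))


def make_data(kind, n, rng):
    """Generate n inputs for one of the four random data types."""
    if kind == "uniform":        # equal probability inside the range
        return [rng.uniform(LOW, HIGH) for _ in range(n)]
    if kind == "gauss_mid":      # Gaussian around mid-range (550)
        return [clamp(rng.gauss((LOW + HIGH) / 2, 10.0)) for _ in range(n)]
    if kind == "gauss_low":      # Gaussian around the low boundary
        return [clamp(rng.gauss(LOW, 2.0)) for _ in range(n)]
    if kind == "gauss_high":     # Gaussian around the high boundary
        return [clamp(rng.gauss(HIGH, 2.0)) for _ in range(n)]
    raise ValueError(f"unknown data type: {kind}")


def reference_trip(temp):
    """Hypothetical correct version: trip at or above the setpoint."""
    return temp >= SETPOINT


def seeded_fault_trip(temp):
    """Hypothetical seeded fault: trip wrongly clears near the top of range."""
    return SETPOINT <= temp < 598.0


def back_to_back(version_a, version_b, inputs):
    """Count inputs on which two independently coded versions disagree."""
    return sum(1 for x in inputs if version_a(x) != version_b(x))


if __name__ == "__main__":
    rng = random.Random(1991)    # fixed seed so a run can be repeated
    for kind in ("uniform", "gauss_mid", "gauss_low", "gauss_high"):
        data = make_data(kind, 1000, rng)
        found = back_to_back(reference_trip, seeded_fault_trip, data)
        print(kind, "disagreements:", found)
```

With a fault of this kind, one would expect data concentrated at the high boundary to expose it far more often than data concentrated at mid-range, which is the sort of difference in efficiency the speaker describes. A single contrived fault proves nothing general, of course.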

MR. KERR: Do you think this single investigation is enough to demonstrate this general conclusion that you are drawing?

MR. BELTRACCHI: No, but let me finish my point. I would like to also point out that in the core protection calculator they used a test strategy that was very similar to the combination of these two. Yes, you can question --

MR. CARROLL: They, in this case, being Combustion?

MR. BELTRACCHI: That is correct. You can question the issue of yes, these are empirical data and there is not a great deal of data here to support that. But it sort of does imply that if you want to really have some technique to look for unintended functions, you better consider more than just systematic data. That's the only point that I wanted to make.

[Slide.]

We have another project that is addressed toward Class 1E digital computer systems. The contractor is through an interagency agreement with Rome Air Development Center. The work will all be done by SOHAR, Incorporated. The objective here is to conduct an industry survey and develop technical bases for regulatory guidance on design, development and test and acceptance of Class 1E computer systems. It's a one year program in response to specific user needs from NRR.

The product of this effort will be a draft regulatory guide on design and development of Class 1E computer systems. It will incorporate the survey results and research results that we know of to date. That is our goal.

MR. SHEWMON: Is this aimed at the reliability of hardware or software?

MR. BELTRACCHI: It will be principally addressed toward software and those hardware elements that impact software.

MR. KERR: This will be software that is capable of withstanding a safe shutdown earthquake?

MR. BELTRACCHI: I guess we will have to try that and see whether it works.

MR. CARROLL: Who is this contractor?

MR. BELTRACCHI: The contractor is SOHAR, Incorporated.

MR. CARROLL: I know, but tell me about them. I have never heard of them.

MR. BELTRACCHI: Let me take and ask Herb Heck from SOHAR to address that, and that will be a direct way of answering your question.

MR. HECK: SOHAR is a contraction of software and hardware reliability. We have been in business 13 years. We service, among other things, the FAA advanced automation system, service NASA and the Air Force. We also have a number of contracts with the Department of Energy in the nuclear field. In 1990 we were selected as the small business prime contractor of the year for the West Coast Region.

We have approximately 20 professional, very high level -- about one-quarter of the staff is Ph.D.

MR. CARROLL: Do any of them have their Ph.D. in computer science?

MR. HECK: All of them, except myself. Mine is in engineering, because when I got mine there wasn't much computer science.

MR. CARROLL: Do you feel possibly you don't have enough nuclear engineers to handle this job?

[Laughter.]

MR. LEWIS: Is this a group that has been dealing with RADC, or is it a spinoff of RADC personnel?

MR. HECK: No, sir. We have worked with RADC almost throughout our existence, but we are a for-profit organization. We have a task order agreement for continuing support to RADC.

MR. LEWIS: It says RADC/SOHAR on the viewgraph.

MR. HECK: It means that it goes to the task order agreement.

MR. SHEWMON: In the presentation by the gentleman from Westinghouse, there was a slide that listed various ANSI and IEEE standards and V&V guideline codes and standards. I didn't hear you mention that at all.

MR. BELTRACCHI: No, I didn't but I can. I can address that.

MR. SHEWMON: Let me finish the question, and then it might be more efficient.

MR. BELTRACCHI: I'm sorry, okay.

MR. SHEWMON: My background comes more in the pressure vessel end of things, where the NRC has a policy of trying to encourage industry standards and influencing how they get done. Is there a policy like that now formed in NRC or will there be, or do they feel there is little value?

MR. BELTRACCHI: I believe NRR is going to be addressing their activities and the standards effort; is that right, Joe?

MR. JOYCE: We will have discussion on IEEE 7-4.3.2, with its endorsements of Reg Guide 1.152. We will talk about some of the work that is going on with revising the standard. We are fortunate to have a staff member with us that is part of that team. At that time, if you can hold off maybe, he can help with the question.

MR. SHEWMON: All right. Thank you.

MR. BELTRACCHI: That concludes my portion.

MR. LEWIS: Thank you very much for your speediness. I admire your ability to do that, in the face of the harassment you got from the members. Thank you. I believe that we can have a break, which will begin now.

[Brief recess.]

MR. LEWIS: Let us reconvene the meeting. We are now going to hear what it's really all about from the NRC staff.

MR. JOYCE: Is that why they put me last? I am Joe Joyce. Good afternoon. I am with the Instrumentation and Control Systems Branch. This afternoon I will be talking about some of our early designs, lessons learned from the early designs, present activities and criteria.

Then we will have Jim Stewart, also from the Instrumentation and Control Systems Branch. Jim will talk about future applications, advanced light water reactor, our passive designs, some retrofits that are going into the operating plants. After him we will have Ray Ets, who is with Software Associates. Ray is our consultant, and has been since 1987. Ray will talk about verification and validation, and our review methodology. Yes, he has his masters in computer science.

Then we will have John Gallagher, also with --

MR. KERR: He is not a member of the NRC staff though, is he?

MR. JOYCE: I'm sorry, I could not hear you.

MR. KERR: He is not a member of the NRC staff, is he?

MR. JOYCE: John Gallagher is, at the present time --

MR. KERR: No, Ets.

MR. JOYCE: -- at NRR in the Instrumentation and Control Systems Branch.

MR. KERR: No, Ets is the one that I am talking about, the one that has --

MR. JOYCE: Ray Ets is not a member of the staff. He is with Smartware Associates.

MR. KERR: He is the one that has a masters degree in computer science, isn't he?

MR. JOYCE: That's correct.

MR. CARROLL: Does anyone in ICSB have such a degree?

MR. JOYCE: Computer science, I don't believe we do.

MR. STEWART: This is Jim Stewart. If it helps, I am currently in a master of science for computer science degree.

[Laughter.]

MR. STEWART: I would like to note that many of my professors in the masters program do not have their degrees in computer science for similar reasons to Herb Heck's, there weren't any available when they got their degrees.

MR. LEWIS: Some of us who teach for a living are delighted to meet people who are in a program on something and learning something.

MR. JOYCE: One thing I would like to add to that, because we have been asked that same question by the Commissioners. We have been trying to recruit -- we have had interviews and we have recruited people within the Instrumentation Branch. I, personally, have probably seen over 30 and I know the Branch has interviewed over 40.

We made three offers to computer science folks and they turned us down -- not enough money. We are now continuing to interview and we have not stopped. We recognize that both the ACRS and the Commissioners would like to see ICSB staff augmented with a person that understands code, ones and zeros. Even though we have been doing it for many moons and our consultants, we are still looking. It's not as if we aren't going to get one.

MR. CARROLL: How about statistics?

[Laughter.]

MR. JOYCE: That's a different branch.

MR. LEWIS: Ignore my trouble-making friends and go on.

MR. JOYCE: I am going to go pretty fast because we have a tight schedule.

[Slide.]

Our first review was the core protection calculator system, which was designed by Combustion Engineering. You heard a little bit about it this morning. That was the first computer system that was doing some complex algorithms that used six minicomputers and had 12 inputs to the reactor protection system. Two of them were digitized, the DNBR calculations and kilowatts per foot. The rest were analog, and there was a warm feeling about that.

I am not going to talk about the design or the configuration. What I want to point out is that the licensee and vendor, from the time they started this design into implementation probably spent approximately 100 man years on this task. The staff alone spent 18 man years in its review effort of this. There was a major redesign required during this process. At the end, the staff ended up developing 27 positions on the core protection calculator, many of which are still used today.

MR. KERR: That's one and one-half positions per man year.

MR. CARROLL: In what form are these positions? Where would I go to look for them?

MR. JOYCE: You can start by going to the document called Arkansas ANO-2 safety evaluation report. There are two chapters. Chapter seven is instrumentation and control system. There is an appendix that talks specifically about the review methodology and the review efforts by the core protection calculator. In there you will see the 27 positions, positions like having a watchdog timer. Your system will have a watchdog timer. Any failures that happen within the system, watchdog timer times out and you go to a failed state and things like that.
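
The watchdog timer behavior described in this answer can be illustrated with a minimal Python sketch. It assumes a software-only model with an injectable clock: the application must "kick" the timer periodically, and if it fails to do so within the timeout the timer latches a failed state. The class and method names are hypothetical, and this is not a description of the core protection calculator's actual hardware watchdog.

```python
class Watchdog:
    """Minimal software model of a watchdog timer (illustrative only)."""

    def __init__(self, timeout_s, clock):
        self.timeout_s = timeout_s      # maximum allowed gap between kicks
        self.clock = clock              # callable returning current time in seconds
        self.last_kick = clock()
        self.failed = False             # latched once a timeout is detected

    def kick(self):
        """Called by the healthy application on every processing cycle."""
        if not self.failed:             # a latched failure cannot be cleared by a kick
            self.last_kick = self.clock()

    def check(self):
        """Return True if the system must be in its failed state."""
        if self.clock() - self.last_kick > self.timeout_s:
            self.failed = True
        return self.failed


# Usage with a fake clock so the behavior can be stepped through by hand.
now = [0.0]
wd = Watchdog(timeout_s=1.0, clock=lambda: now[0])
now[0] = 0.5
wd.kick()                 # application is alive
now[0] = 2.0              # application hangs; no kick for 1.5 seconds
print("failed state:", wd.check())
```

Latching the failure, so that a late kick cannot silently restore normal operation, is a design choice made for this sketch; the transcript only says that the system goes to a failed state on timeout.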

MR. MICHELSON: Did they get into the standard review plan?

MR. JOYCE: Did the positions get into the standard review plan?

MR. MICHELSON: Yes.

MR. JOYCE: No, they did not.

MR. MICHELSON: When you say in the future that you are going to follow the standard review plan it doesn't mean that you go back to Arkansas, it means you go to the standard review plan to see what is required.

MR. JOYCE: That's true.

MR. MICHELSON: Whatever was at Arkansas and became a position somehow is also a requirement in the standard review plan then, or it is no longer needed.

MR. JOYCE: It kind of takes two paths. It takes the path called evolution, evolution and ICSB review effort. As we go off and we do the reviews as we did this review and other reviews, in the back of our mind and our hip pocket in the top right hand drawer we have the 27 positions. We know what the positions are, we documented them in previous SERs, we know what the concerns are, and we can continue to do the reviews incorporating positions and re-evaluating positions.

MR. MICHELSON: What puzzles me --

MR. JOYCE: Do they ever get to the standard review plan, no they have not.

MR. MICHELSON: How do they get into the improved evolutionary reactors? What positions do you have on those that might relate to solid state control?

MR. JOYCE: I am going to let Jim Stewart talk about those plants and those positions.

MR. MICHELSON: Okay, thank you.

MR. JOYCE: You are welcome.

MR. CARROLL: Tom, could you get the pages out of Arkansas that he is talking about?

MR. JOYCE: If you can't find them, call me and I will provide them to you or at least give you the documents.

MR. CARROLL: Probably most of the members would like to see what the 27 positions are.

MR. JOYCE: As a matter of fact when I get done here, I have the 27 positions xeroxed in case it came up. I will hand them to you at the end of the presentation.

MR. ROTELLA: Thank you.

MR. MICHELSON: It may be better to hand them during the presentation.

MR. WILKINS: No, because then he would talk about each one of them, and that would take 27 minutes.

MR. CARROLL: Or 18 man years.

[Slide.]

MR. JOYCE: The next review that we had was by Babcock and Wilcox. It was a reactor protection system II. This used four microprocessors and it had ten inputs to the reactor protection system, three of which were digitized. Once again, the DNBR, kilowatts per foot, and there was a flux offset I believe.

The results of this review: there were 14 errors found during integration testing, which indicated a lack of detailed review prior to testing.

MR. CARROLL: Found by B&W?

MR. JOYCE: Found by our audit team, B&W, their consultants and our consultants. We then had a contract with Boeing to also go off and look at some software modules. We went in and took samples of software modules, and Boeing did a sneak circuit analysis on it. As a result of that analysis there were nine software documented errors but there were no sneaks in the circuit.

MR. MICHELSON: What vintage B&W plants did this appear on?

MR. JOYCE: This is the Bellefonte.

MR. MICHELSON: That's what I thought.

MR. JOYCE: Which leads to the next bullet. Our review was terminated by the cancellation of Bellefonte. It is my understanding that Bellefonte is in talking to the staff about resubmitting their FSAR.

[Slide.]

You have also heard these words used this morning, 414 integrated protection system. This was a major change for the Westinghouse design. This was a distributed digital microprocessor-based system that encompassed reactor protection system, engineered safety features and control systems. Both the ACRS and the staff had a number of concerns about this design. We were concerned about the adverse interactions between all these systems. We had concerns about sharing of a common sensor or signal. We were also concerned -- one of the concerns was common mode failure in the redundant elements and the degradation of the defense-in-depth concept.

We ended up putting together a review group to assess the defense-in-depth and diversity of the integrated protection system. Half way through this review effort it became obvious to the staff that we were immersed in details, and we were not going to achieve the goal of the task force in finding the integrated protection system acceptable at the level of detail in which we were doing the review. We were down at the component level.

We decided that we can't do this, we will not be successful. We had to take a different approach, so we stepped backwards and decided to take what we call the simple approach and assess the system architecture only. At this time we had to develop what we call the block concept. The block concept also had guidelines associated with the block concept.

In conjunction with the block concept and the guidelines, these gave the staff tools to go in and assess the integrated protection system with respect to the acceptability of the system architecture to the guidelines. We documented that in NUREG-0493. As a result of that we ended up giving Westinghouse a PDA for RESAR-414. I am sure you already know that.

MR. CARROLL: You have a xerox copy of the nine open items over there too?

MR. JOYCE: Yes. The nine open items that we had on RESAR-414, many of them have been closed. Many of them had to do with -- in the SER we talked about ongoing design and ongoing verification and validation. Many of these nine open items ended up being put to bed through verification and validation, which leads into the next slide.

MR. MICHELSON: On the Bellefonte situation, you never got to the point of reaching positions, is that right?

MR. JOYCE: That's correct, we really didn't.

MR. CATTON: This morning we heard from GE that there were four IEEE standards and one ANSI standard and an IEC Publication 880. Does NRC require --

MR. JOYCE: Yes. We don't require -- we will talk about them. If I can hold off on that, is that okay?

MR. CATTON: Fine.

MR. JOYCE: If I forget, remind me or Jim Stewart, one or the other.

[Slide.]

What were the lessons learned from these early designs? The major lesson that was learned -- I shouldn't say that. The NRC management made a decision that the staff or the Agency itself could not afford the resources that were required to go off and do this type of review. They didn't put the burden back on the staff to go off and find other tools, mechanisms, methodologies, something else other than what we were doing on the previous design for the other designs that were coming in that were using microprocessor-based systems.

At that time we looked around at the aerospace, we looked at the military, foreign countries, and we even had some people on working groups. It was decided that the verification and validation methodology seemed like a pretty good tool to go in and assess software quality. It looked like it was a pretty good tool that could be applied to the type of designs that we have done in the past, and probably also will apply to designs coming in the future.

MR. MICHELSON: This was just for software?

MR. JOYCE: Yes.

MR. MICHELSON: How about the hardware. I thought you said you were getting bogged down on components on the hardware, but maybe I misunderstood.

MR. JOYCE: You didn't misunderstand, that's what I said. What we did was, we used a traditional review method on the integrated protection system that didn't work. The traditional method was --

MR. MICHELSON: You dislocated the flow diagram.

MR. JOYCE: Right, and started working our way down and said we can't get here. We are swamped under, we have to do something different. There are too many elements in there where we can go in and do our single failure to common mode failure --

MR. MICHELSON: Since that was kind of a unique or new approach to trying to do these reviews, was it documented in the standard review plan?

MR. JOYCE: What?

MR. MICHELSON: This new approach, this approach of going --

MR. JOYCE: No, sir. It's documented in NUREG-0493.

MR. MICHELSON: In the future if I say the standard review plan defines what needs to be done, I guess you are going to address it later -- that clearly wouldn't help too well.

MR. JOYCE: The standard review plan keeps coming up. Our goal is to update the standard review plan. I said this year -- my boss cringes when I say that -- it is needed. It is something that we have not necessarily done over how long -- 1984 is the last version we did. There is a real need for that. We have the material, we have the experience, we have some of the knowledge that should go in for the reviewers and industry to look at, to see things like NUREG-0493 that talks about diversity --

MR. MICHELSON: I am sure you are aware that a lot of people think that the standard review plan is what the Agency uses to review designs by. I think what I am hearing today is that it wouldn't work too well in this particular instance of the hardware.

MR. JOYCE: You are absolutely right. The things that we are reviewing, these designs, are not in the standard review plan.

MR. CARROLL: Our Committee letter on SP-90 said that very clearly; that they ought to get on with getting that standard review plan up to date.

MR. JOYCE: It's just not there.

MR. MICHELSON: It's just further validation of what we strongly suspected for the Westinghouse case.

MR. JOYCE: We ended up endorsing this IEEE standard with Reg Guide 1.152. The lessons learned from the early designs -- now we have something -- we have a tool and a methodology, and now we can put the burden back onto the utility and licensee and the designers of this equipment to use such a mechanism called verification and validation, so all the staff has to do is go in and do an audit at the end of the audit.

With these tools including NUREG-0493, the lessons that were learned were from the early designs.

MR. LEWIS: When you say all the staff has to do is audit, that means the staff has to look at what the licensee does in terms of verification and validation and make sure he did it right.

MR. JOYCE: Yes.

MR. LEWIS: That requires that one be able to do it better than the licensee could have done it.

MR. JOYCE: Not necessarily.

MR. LEWIS: How do you audit something without knowing how to do it better?

MR. JOYCE: We will get into that when we talk about that with Ray Ets. We have a whole dissertation on review methodology.

MR. LEWIS: You have a dissertation on it?

MR. MICHELSON: This is coming --

MR. LEWIS: The first slide I put up --

MR. LEWIS: You are going to talk about it later, is that what you are saying?

MR. JOYCE: Yes.

MR. LEWIS: Is that what you were trying to communicate to me.

MR. JOYCE: My wife accuses me of that so many times, poor communication skills. Sorry about that.

MR. MICHELSON: How much time do we have for all of this?

MR. LEWIS: You do disagree with the assertion that in order to audit something you should know more about it than the person who did it?

MR. JOYCE: No, I don't agree with that.

MR. LEWIS: Okay, fine. I thought you did.

MR. JOYCE: No.

MR. WILKINS: He said yes, he does agree that he disagrees.

MR. LEWIS: Okay, let's get the signs of the yes and no straight. You do disagree with my assertion that in order to audit something you ought to know more about it than the person who did it; you do disagree?

MR. JOYCE: Yes.

MR. LEWIS: There's nothing wrong with disagreeing. One of us will turn out to be right.

[Laughter.]

MR. CARROLL: At most.

MR. JOYCE: The reason for that is because when we do our review -- when we walk into an audit we are not fully equipped. We don't have what I will call the years and numbers of engineering effort that is put into a design. We show up on site or at the vendor's for a week and sometimes two weeks and it depends on the complexity of the system, we may end up doing four or five audits.

Historically when we do show up, we do bring a multi-discipline team of staff members and consultants to go in. We pick a few subjects, and we go in and do a thorough review in that area of which we are doing an audit. If you said let's take the same team and let's quiz them about the rest of the design, we would fall short.

MR. LEWIS: I'm not going to argue the case, but not because I agree with you but because I also have a responsibility for keeping us on schedule.

[Slide.]

MR. JOYCE: Type of upgrades. This has to do with the present systems that we are looking at today. Since this last reorganization we have been looking at a number of retrofits that are going back into the plant, retrofit being that a utility is taking out a piece of equipment that is worn out or needs to be replaced, or decides to put a microprocessor in for some other reasons.

The type of upgrades that we have been seeing is a direct replacement of a single analog function with a digital equivalent. That would be like the one that we looked at, Palisades where they put in a thermal margin monitor combining several analog process steps into a single microprocessor. That would be similar to the one we did up at Haddam Neck where they are taking AND gates and OR gates and setpoints and putting them into a single microprocessor.

Partial replacement of an analog system with digital. Partial replacement that Diablo Canyon put in, a signal median selector that was right in the middle of a system. It wasn't the whole system, it was just partial replacement. Complete replacement of an analog system with a digital system. That was Prairie Island that put in a complete digital feedwater control system. From sensor all the way down to the control we had a complete digital system.
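
For readers unfamiliar with the term, a signal median selector can be sketched in a few lines of Python. The sketch assumes three redundant sensor channels feeding one selector, which is a common arrangement but is not stated in the transcript; the function name and the sample readings are hypothetical, and nothing here describes the actual Diablo Canyon design.

```python
def median_select(channel_a, channel_b, channel_c):
    """Return the middle of three redundant readings.

    A single channel that fails high or low is never the middle value,
    so it cannot drive the downstream control or protection function.
    """
    return sorted([channel_a, channel_b, channel_c])[1]


# One channel has failed high; the selector passes a healthy reading.
print(median_select(550.2, 551.0, 9999.0))
```

The appeal of the median over an average is that one wildly wrong channel does not shift the output at all, whereas it would pull an average toward the failed value.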

MR. CARROLL: Just as an aside, how has that worked?

MR. JOYCE: In terms of our --

MR. CARROLL: Has the system been very satisfactory?

MR. JOYCE: The feedwater control -- John Gallagher can answer that.

MR. GALLAGHER: I just left Westinghouse, and I would say that to the best of our knowledge talking to the customer he has been very satisfied with its performance, both with respect to its meeting the requirements and the operators have been very satisfied with it. A lot of effort went into that to also deal with the man-machine interface. There were some small problems with the way that the AMSAC system was hooked in. I think they have been straightened out now.


MR. CARROLL: Thank you, John.

MR. JOYCE: Addition of a digital system that interfaces with the plant, that would be plant safety monitoring system. We saw that at Beaver Valley and a couple of other places that put in a plant safety monitoring system. This last one is Arkansas, where they replaced the core protection calculator. The first slide that I had which was a minicomputer, they already upgraded their microprocessor.

The next set of slides, because of time, I am not going to go into any detail. They are there for your information. What they are is, they are going to show you the present designs that we are looking at that has the plant's name, what the vendor is. For example, South Texas put in a QDPS, Qualified Display Processing System. It was built by Westinghouse. We reviewed it and wrote a safety evaluation report in May, 1987. The QDPS is a system that does a little protection, does some control, and it does some Class 1E displays.

The main thing that I want to focus on for the next three slides -- like I said, I am not going to put them up there -- they are there for your information. You can see that we go all the way up to current, it is current. I guess the last one shows that this month we issued an SER on Turkey Point where they are using Eagle 21 for upgrading their RTD bypass manifold.

[Slide.]

The main thing that I want to focus on in this slide is that the review process and the techniques, tools, and criteria that were used for evaluation for these systems to date --

MR. KERR: Excuse me. What is an RTD bypass?

MR. JOYCE: It is where they take the resistance temperature detector. It is in the manifold. They take the RTD's out of the manifold and bypass around the manifold, and they are using the Eagle 21 -- I didn't review this system --

MR. KERR: Was it in effect, a replacement of the RTD by another system?

MR. JOYCE: Yes. Like I was saying, the three things that I wanted to focus in on three slides were with respect to criteria. The criteria are basically the same criteria that I showed to you on an earlier slide called lessons learned. We also recognize that verification and validation is not the only tool that can generate quality software. This standard should be augmented and updated to reflect some of the tools and methodologies that exist today in the industry and that have been proven.

With that, last year we sent over to research -- you heard from Leo Beltracchi right before me -- we sent over to research a thing that we call a user need that had approximately 14 items on it. This is just a short list of some of the things that we asked research for their help. It certainly is not all encompassing.

[Slide.]


We said in 279 we need a digital standard similar to like our IEEE standard 279 that talks about things like data communications. Data communications came up this morning with viruses, one way communications to transmit only, no hand shaking, security, reliability, diversities. These were all the subjects that we put into this user need.

MR. MICHELSON: Excuse me. What is Firmware?

MR. JOYCE: Firmware is when it is stuck into the hardware. It is soft --

MR. MICHELSON: When you take it --

MR. JOYCE: Like concrete when it sets up.

MR. MICHELSON: No. I don't know if it is that. Are these cards that you would plug into the other pieces of equipment like in a breaker; is that what you mean by Firmware?

MR. LEWIS: That would be -- a PROM is a standard example.

MR. MICHELSON: Okay, the PROM is a Firmware.

MR. LEWIS: It can be a whole card.

MR. WILKINS: It could be a chip these days, couldn't it?

MR. MICHELSON: I don't know.

MR. WILKINS: You can do an awful lot of things with one chip.

MR. MICHELSON: Yes.

MR. LEWIS: Yes.

MR. MICHELSON: It's firm, in a sense that it is non-programmable and so forth after it is burned in.

MR. LEWIS: The fact is that programmable is irrelevant --

MR. MICHELSON: It is. Not hardware -- I was thinking of that as hardware.

MR. CARROLL: This list that you are showing us here is actually from the January 25th letter from Gillespie to Beckjord that we have in front of us?

MR. JOYCE: Yes. That is one of them. Actually, there were three that were generated. Back on April 26th we wrote the first one, which is a mirror image of that one. Then in December, Dr. Murley sent one over to Beckjord that had the same ingredients in it. Research got it three different times through different channels.

MR. CARROLL: If I just read the January 25th one I know everything that is in the other two letters?


MR. JOYCE: Yes, and then some.

MR. SHEWMON: Why was it so necessary to write three separate letters?

MR. WILKINS: You think it was because they ignored the first two letters?

MR. JOYCE: No. Gosh, I don't want to do that. We wrote the memo saying we need help. We documented it and it goes up the chain, and then there's other programs going on where they are trying to pull other things together for research and prioritize their work. I don't know how it got lost -- I don't know how to answer that other than it is there now.

MR. CARROLL: It would seem to me that an awful lot of these things were things you needed or should have anticipated needing five years ago. Why didn't you ask research --

MR. LEWIS: Only five?

MR. WILKINS: I was going to say ten.

MR. CARROLL: Why didn't you ask research for all this laundry list way back when?

MR. JOYCE: That's a fair question.
MR. CARROLL: I only ask that kind.
MR. KERR: You don't really want an answer to that. What you want him to do is do something that he can do something about, and he can't do anything about that.

MR. CARROLL: I would just like to know the history of this thing because it troubles me that the Agency, it seems to me, is behind the ball game here.

MR. KERR: It's a different agency now than it was five years ago, so you aren't going to be able to learn anything useful.

MR. CARROLL: All right.


MR. LEWIS: At the risk of --

MR. CATTON: I would still like to hear the answer, even if what Bill says is true.

MR. JOYCE: All right. It's interesting in the sense that when we got V&V there was a confidence level by the staff with respect to verification and validation. We have performed a number of audits, and the audits that we have performed that Ray Ets will talk about in the review methodology did prove -- these audits using verification and validation methodology did prove out to flush up out of the design errors and techniques -- not necessarily procedural errors -- software errors. The vendors that are here can stand up and speak to it and say that's not necessarily true.

What happened is, we got a confidence level with verification and validation and felt somewhat secure about it, just like we did with 279. You give me 279 in a plant and I feel great. Let's go and see what we can find, let's do some single failures and separation and seismic on them.

MR. CARROLL: It wasn't that easy with 279 20 years ago when it was invented.

MR. JOYCE: That's right.


MR. CARROLL: It was a nightmare to everybody.

MR. JOYCE: What we are doing is, we are breathing easy a little bit because we had such a burden prior to this with the earlier designs. You saw the man years we spent. We got V&V. We are still developing and it's an ongoing thing, we have people involved in the standards. Jim will talk about how we are going to upgrade standards.

There are lists. I can go back and pull out memos about here's another hit list that somebody ought to help us with. Why does it not get to research -- I probably shouldn't even answer that. We were thinking about it and it never got there. Like Dr. Kerr said, we are not going to gain --

MR. KERR: We have wallowed in enough nostalgia. Let's get on with it.

MR. LEWIS: Let me ask you two questions, both of them rhetorical. The first one is, are you going to issue some letter to the community telling them what V&V means, so they don't all come in here and mean different things by it?

MR. JOYCE: It's issued.

MR. LEWIS: The second question, because that one doesn't require an answer --

MR. JOYCE: Let me answer the first one.

MR. LEWIS: No. Let me go on to the second. The question is, in your first viewgraph you listed four speakers in this hour and you are the first one -- I assume that you are managing the time.

MR. JOYCE: Yes.

MR. LEWIS: Now you can answer the first one.

MR. JOYCE: Right there. Are we going to issue anything to the world --

MR. LEWIS: Does that define V&V?

MR. JOYCE: That defines this, and this defines V&V.

MR. LEWIS: That defines V&V?

MR. JOYCE: Yes.

MR. LEWIS: Why don't people read the definition and use it? I don't know what definition is in --

MR. JOYCE: V&V -- everybody has been talking about verification and validation all day today, and nobody was disagreeing with the terms. I ran off and made a slide, and that's what we mean by V&V.

[Slide.]

That V&V is consistent with 493, it's consistent with Westinghouse's definition, it is consistent with Combustion's definition that was given this morning.

MR. LEWIS: What is this page from? Is this a page from our Reg Guide 1. --

MR. JOYCE: I copied it out of 7-4.3.2, definitions. Someone can check that for me, and I will check it when I sit down.

MR. LEWIS: Okay, thank you.

MR. JOYCE: That is what we have been using, that's the statute.

MR. LEWIS: That is your definition, okay. I just wanted a definition.

MR. JOYCE: Like I said, it is consistent with everybody up here that talked about it, except maybe yours was flip flopping back and forth a little bit.

MR. LEWIS: I don't think so, but go on. I will read that more carefully.

MR. JOYCE: Does Combustion agree with this? Does Westinghouse agree with this?

MR. MICHELSON: It's hard to read it.

MR. CARROLL: Why don't we make a copy of it so that we can read it.

MR. LEWIS: Let's make a copy so we will all know what we are talking about.

MR. JOYCE: With respect to time, I showed you the research. The next person we are going to have speaking is Jim Stewart. Jim Stewart is going to get up and talk a little bit about the future designs, what he is seeing today, evolutionary plant or passive plant, and even the retrofits that I have touched on a little bit.

MR. STEWART: My name is Jim Stewart, with the I&C Branch. As a quick aside to Mr. Carroll's comment, my computer science program has a required statistics course in it.

[Laughter.]

MR. MICHELSON: That makes him a statistician.

[Slide.]

MR. STEWART: I put up this slide yesterday. This is just to show the plants that we are looking at, going from the ones we are actively and currently involved with down through very conceptual level that we don't have much detailed information on. So far, the vendors have told us that the passive plants in the I&C area will be pretty much the same philosophy of design as the plants we are currently looking at.

MR. KERR: This is future applications of --

MR. STEWART: These are plants that we have not, as of yet, --

MR. KERR: I am looking at the title. The title is future applications of something.

MR. STEWART: Future applications of our review. It's probably not a great title.

MR. CARROLL: Of our review of digital --

MR. STEWART: Of digital systems within these plants.

MR. KERR: It's future applications of digital systems.

MR. STEWART: Right. We left retrofits and upgrades on the bottom there, simply because we do expect more retrofits and upgrades similar to what Mr. Joyce talked about before.

[Slide.]

Design features. When we made this slide we didn't have the benefit of knowing what Combustion and GE and EPRI were going to say. They have addressed all of that. The only one I would address in addition is expert and AI systems down at the bottom. The reason why I want to specifically mention that is that even though none of the current plants that we are looking at are intending on using it, they are all being fairly careful to leave the option open for possible future use. I will address --

MR. CATTON: What you call the DNBR, is that --

MR. STEWART: I believe it's very, very difficult to draw the line between what we have as computers now and what you call an expert system.

MR. KERR: AI is a buzz word that people use to get research contracts, Ivan.

MR. CATTON: I understand that.

MR. CARROLL: Some plants are actually using such systems in non-safety related --

MR. STEWART: In non-safety applications, and I expect you will see more of that before we see it in safety applications.

MR. LEWIS: There have, in fact, been some spectacular successes with expert systems.

MR. STEWART: We are not ruling it out. We haven't taken a position that it is not a possible thing that can be done.

[Slide.]

There has been quite a bit of talk about what our review criteria is and what it is going to be for the ALWR's. The first thing obviously is the standard review plan. It does not specifically address digital systems, it does not have a lot of useful guidance as far as details of what to look at in digital systems. It has a lot of good things as far as the need for quality and the need to assess the quality of the systems.

MR. CATTON: Do you require that the applicant meet those standards?

MR. STEWART: The standard review plan, yes.

MR. CATTON: All the ones that you have listed there?

MR. STEWART: I will talk about that.

MR. MICHELSON: The standard review plan, I think I hear you say is admittedly inadequate.

MR. STEWART: Yes.

MR. MICHELSON: What else does the potential vendor have to do besides meet the standard review plan? Obviously, it is not adequate, or at least that's your position.

MR. STEWART: That is my position.

MR. MICHELSON: So, how does he know what he has to meet and how do we document that?

MR. STEWART: One thing that we have done is, IEEE 7-4.3.2 which is a description of verification and validation, including the definition of what those words mean.

MR. MICHELSON: That's for software?

MR. STEWART: That's for software. We endorse that with the Reg Guide, which was issued to everybody.

MR. MICHELSON: I am back to hardware --

MR. CARROLL: That Reg Guide is dated --

MR. STEWART: The Reg Guide is 1985. The standard is 1982. I currently am on the working group to upgrade 7-4.3.2, IEEE working group. We have both computer science, masters, Ph.D. people and nuclear engineers on the team. We have the vendors, we have the NRC, we have academia, a wide variety of people helping out on that.

MR. MICHELSON: What is your schedule?

MR. STEWART: We have a draft being reviewed within the review process right now.

MR. MICHELSON: What is your schedule for getting out a revised standard review plan?

MR. STEWART: Unfortunately, I don't have a lot of control over INPEC's voting time. We are hoping to put it up for a vote within the year.

MR. KERR: Mr. Stewart, are you listing things that -- I take it this is sort of the background material that a reviewer uses?

MR. STEWART: Yes.

MR. KERR: Is this for an ordinary intelligent engineer, if one could find anybody like that, or is it anticipated that the person who does the review will have to be an expert on computers in some fashion? Are you designing this for in-house use by NRC staff or for a contractor?

MR. STEWART: For all the people involved. For example, what we are putting into the upgrade are requirements that I believe any engineer could understand. We are putting in requirements that a V&V program must be in place and must be used. There was some question earlier with one of the vendors -- I forget who -- who had up there 1012 and 880 -- IEEE 1012 and IEC 880, as examples of verification and validation plans which they used to develop their own in-house standards.

We are currently trying to use 880 and 1012 in the 7-4.3.2 upgrade. The 7-4.3.2 upgrade is intended to address all computer applications in nuclear power systems, safety grade. We agree with everything that the previous speakers have said. We do not separate the hardware and the software. We think it all needs to be addressed. The revised 7-4.3.2 will be an attempt to do that.

The details of how you would take the requirements that are in the 7-4.3.2 -- assuming eventually that we get it endorsed with the Reg Guide -- the details of how to do a design, which computer language you should use, the details of the best ways to use that computer language are not part of that --

MR. KERR: I am asking for the details of the review process and, specifically, what sort of person would you expect could carry out this review process? Would he have to be a computer expert or --

MR. STEWART: The computer person within the NRC would be myself or a person like me.

MR. KERR: That's all I wanted to know.

MR. STEWART: Part of the process, and an important part of the process will be for all of the utilities and vendors to know what the guidelines are ahead of time. We do think that 1012 and 880 are good V&V plans, and that's why we are trying to endorse them. They are very prescriptive, and they don't necessarily apply to all situations. Therefore, we do believe that it is appropriate for designers to take elements from these and blend them into an in-house standard that they can specifically apply to a design.

MR. KERR: Would you anticipate that a vendor, if you go to that, would submit a topical report indicating how he developed this and you would review that? How would you use that, in other words?

MR. STEWART: They could either submit a topical report saying here's our V&V plan and we could review that and say that it's okay by reviewing topical report. Currently the way most of them are being done is, when a retrofit or design certification submittal is given to us and then we will get it.

MR. CARROLL: You are open to the idea of approving a V&V plan for a vendor that he could use time and again through a topical report review.

MR. STEWART: Sure. Currently how that is happening is, we will issue an SER for a specific application and then somebody else will come down the road and use that same equipment with the same V&V at their plant, and they will just reference that they have used it before. It simplifies our review.

MR. CARROLL: All right.

MR. STEWART: I think that's pretty much all I had to say on that. I did want to mention that in addition to my participation on 7-4.3.2 and then eventually they will pull NRC's participation when it gets to the Reg Guide, John Gallagher is the Chairman of one of the subcommittees under 880 also on our staff. We are very definitely involved with the standards. We are using them, they are not hard criteria that I can say this is an NRC regulation.

[Slide.]


These are review issues that are coming out of the ALWR reviews. Some of them we have pretty good answers for, some of them we don't. We are asking help for research. I will try to go through them as quick as I can here.

Diversity, one of the people asked how the vendors know what our positions are. It is because we go and meet with them and tell them. All of the vendors that we are currently reviewing I think have a pretty good idea of what we are looking at.

MR. MICHELSON: Are those positions relative to electronic types of controls that we are talking about now? Are those positions in the standard review plan or somewhere else that someone can read them, on the diversity?

MR. STEWART: The diversity issue I am going to talk about here is not in the standard review plan.

MR. MICHELSON: Is it anywhere else one can read to determine what your position might be?

MR. JOYCE: We could probably start with NUREG-0493, defense-in-depth and diversity of integrated protection. There we talked about definitions of diversity, functional diversity, so we do some definitions.

MR. MICHELSON: You are saying that there is nothing unique about solid state control system, digital control systems that would change this diversity argument at all?

MR. STEWART: Yes, there is some unique features on software.

MR. MICHELSON: Where do I find your modification of your position back in 0493?

MR. JOYCE: You won't find the modification with respect to software --

MR. MICHELSON: No, I'm talking about hardware.

MR. JOYCE: Hardware, you will find it back in there. What we did is, from block concept address common mode failure. That technique went off and the common mode failures of certain blocks -- you look to see what was left.

MR. MICHELSON: You are saying you are still using the requirements of 0493 --

MR. JOYCE: That is what was on --

MR. MICHELSON: -- in diversity area even today.

MR. JOYCE: That question goes out on every single application, and we have some commitments.

MR. MICHELSON: Is that in the standard review plan that tells the reviewer that's the position and to go back and use it?

MR. STEWART: The 0493 is not in the standard review plan. We agree, the standard review plan --

MR. MICHELSON: 0493 was written a long time ago.

MR. STEWART: We agree the standard review plan needs to be updated in many areas.

MR. KERR: Diversity is independent of how reliable the system is.

MR. STEWART: What we are going to tell you now is our concern in this area and what our intentions are. This is not a criteria that the NRC has endorsed with the Reg Guide. IEC 880, for example, does have some discussion of diversity, and we are trying as much as we can to stay in the same kinds of definitions. Our main concern is common mode failure of systems which use software for this particular item.

The particular concern that we have is a design error that has not been found and the possibility for that design error to take out all four channels at the same time.

MR. KERR: This is diversity in software, you said?

MR. CATTON: Hardware.

MR. STEWART: I will get to that, okay? The best answer that we have come up with to date of how to address that common mode problem because we do not think you can prove that the software is 100 percent accurate or 100 percent reliable, because we don't think that can be proven. The best answer we have come up with to date is diversity. There is very many types of diversity; diverse programmers, diverse languages, CANDU reactor, goes all the way up to systems, languages, programmers, verifiers. There is many different methods.

We don't have a set position on which one of those various methods is the answer. All the vendors that we are looking at now have a different answer. They all have some level of diversity, but they are all doing it a different way and addressing it with different criteria. Every one that we have looked so far has a different application.

I don't believe there is a consensus in the industry of the best way to do it, and we go along with that. We do have questions over at research to help us in this area. We do believe that some level of diversity is required. That's our current position.
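The common mode concern described above can be sketched in a few lines. This is a hypothetical illustration only, assuming a four-channel system with two-out-of-four trip voting; the function names, the setpoint, and the seeded design error are all invented for the example and do not represent any vendor's design.

```python
def trip_logic_a(pressure, setpoint):
    """Implementation A, with a hypothetical latent design error:
    it fails to trip when the reading exactly equals the setpoint."""
    return pressure > setpoint


def trip_logic_b(pressure, setpoint):
    """Independently developed implementation B of the same requirement
    (trip at or above the setpoint)."""
    return not (pressure < setpoint)


def two_out_of_four(votes):
    """Trip when at least two of the four channels vote to trip."""
    return sum(votes) >= 2


def protection_system(channel_logics, pressure, setpoint):
    """Run each channel's logic on the same input and vote on the results."""
    votes = [logic(pressure, setpoint) for logic in channel_logics]
    return two_out_of_four(votes)


# Four channels running identical software share the same design error.
identical = [trip_logic_a] * 4
# Two channels run a diverse implementation of the same requirement.
diverse = [trip_logic_a, trip_logic_a, trip_logic_b, trip_logic_b]

# At exactly the setpoint, the shared error defeats all four identical channels,
# while the diverse arrangement still produces two trip votes.
print(protection_system(identical, 100.0, 100.0))
print(protection_system(diverse, 100.0, 100.0))
```

Redundancy alone does not help here, because every identical channel makes the same mistake on the same input. The sketch covers only software diversity; it says nothing about the cost or the added complexity that diversity may introduce.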

MR. LEWIS: We went through a lot of this conversation long ago, and I remember the example I used at the time which was the requirement in twin engine airplanes to have a propeller engine on one wing and a jet engine on the other wing for diversity. Everyone laughs, but I really don't see that diversity in itself is a virtue.

MR. STEWART: Diversity may or may not be a virtue. I do believe it's a viable solution to common mode software problems.

MR. LEWIS: I don't agree.

MR. STEWART: We disagree.

MR. LEWIS: But this is a subject that we have discussed before, not you and I. I won't do it on my time.

MR. STEWART: We bring this out now. I know it's not a published criteria. We wanted to --

MR. KERR: You believe this in spite of having looked carefully at the other problems that diversity may introduce?

MR. STEWART: Like I said, there are many different kinds of diversity.

MR. KERR: No, I was saying the other problems that diversity may introduce.

MR. JOYCE: Like you talked about on the ATWS issues?

MR. KERR: On the ATWS issue and other issues.

MR. JOYCE: The answer is yes to that.

MR. STEWART: We are also even considering the cost to the vendors of how to address it. Diverse software introduces a tremendous cost burden. That may not be the best way to do it.

MR. KERR: I thought you were requiring it not only of software but of hardware as well. You are not requiring it of --

MR. STEWART: That's one of the possible ways of achieving diversity. TRIGA research reactors with the new General Atomics control console -- which are very simple reactors and don't have the complications of the massive decay heat removal problems, have an analog protection system, hard wired copper analog system, and they have a digital microprocessor software-based protection system.

MR. CARROLL: Propeller engine and jet engine.

MR. STEWART: They both work.

MR. CATTON: And, some of them are in the basement of a university.

MR. STEWART: Some of them are in basements of universities. I think you are about four or five miles from one right here. It can be done. There are different ways of doing it, and for the sake of time I think we should leave it there and keep going on.

EMI, we talked about yesterday. It's an ongoing developmental area. We are in the process of trying to collect which of the criteria we think should be applied the most. Expert/AI systems, right now if somebody came in and said they were putting an AI system in a safety function, we don't have review criteria. We would have heartburn on how to do it. It's a research issue right now. I will leave it with that. Nobody has said they are doing that, by the way.

Design certification level of detail, we talked about that a little bit yesterday. In the area of software, the basic question would be at what point in the process do we go and audit the software.

MR. MICHELSON: I thought the question was how much detail do you need for certification?

MR. STEWART: Yes. How much do I need before certification and how much can I do after certification.

MR. MICHELSON: Do you have any thoughts presently on that point?

MR. STEWART: I have many thoughts on the subject --

MR. JOYCE: Excuse me.

MR. MICHELSON: Any positions, maybe would be a better question.

MR. JOYCE: The Commission --

MR. STEWART: Don't worry, Joe, I am not going to get into it.

MR. MICHELSON: I know what the Commission thinks. I was just wondering if you had any thoughts. We will share them later, no doubt.

MR. STEWART: We have had many discussions --

MR. MICHELSON: We will share them later.

MR. STEWART: -- we have many other groups working on it, okay? Passive plant criteria I talked about, and we are looking at that. Most of that will come out of the systems requirements. We talked about HVAC. Commercial dedication is simply -- I believe Combustion talked about it. When you use previously existing software there is ways that you can convince yourself that it's good software without necessarily doing the V&V effort yourself.

MR. MICHELSON: It seems to me that an evolutionary plant also has HVAC questions about protecting the solid state equipment and --

MR. STEWART: Yes, the specific --

MR. MICHELSON: -- you list that. Is that because it's conventional or something, and this something more --

MR. STEWART: The HVAC at the evolutionary plants is redundant safety grade channels with safety grade backup power.

MR. MICHELSON: Yes, but it all has to be ventilated and has to be kept cool. We don't even have standard review plans for chiller systems, for instance, which is predominantly what are being used.

MR. STEWART: We do have in the standard review plan the requirement that the designer demonstrate that the equipment is qualified for its environment.

MR. MICHELSON: That doesn't quite --

MR. STEWART: We do audit that testing and we do audit the installation.

MR. MICHELSON: Generally a standard review plan has a little more guidance to the reviewer than that, but if you think that's all you need --

MR. STEWART: We gave examples yesterday of where I have gone out and done those audits and found problems with it.

MR. MICHELSON: We are talking now about the evolutionary plants in the same context as the passive plants.

MR. STEWART: The passive plant issue --

MR. MICHELSON: No, I am talking about the evolutionary HVAC.

MR. STEWART: The evolutionary HVAC --

MR. MICHELSON: You don't have a standard review plan. It isn't a plant that exists, and you can't go out and look at it. You have to look at it on paper and have to review the paper, and what kind of a review procedure do you use to look at paper on chilled water systems --

MR. STEWART: As far as design certification?

MR. MICHELSON: Sure.

MR. STEWART: They have committed to provide safety grade HVAC and keep the electronics within their design envelope.

Segmentation, I think you have heard enough discussions on what segmentation means from the vendors. We think segmentation is a good idea. I personally think segmentation is a good idea, and will go from there. Separation and independence, what I am talking about here is different from the traditional IEEE 279. What I am talking about here is the data intercommunications was brought up. We think it is important enough. In the new 7-4.3.2 we are attempting not only to put in words about one way dedication, we are trying to put in pictures of exactly what we mean which buffers can talk to which buffers so that it can be easily understood by all people.

MR. JOYCE: Jim, excuse me. You are going to have to -- because I now have the clock, you are going to have to move. We have another 20 minutes.

MR. STEWART: That hits through -- you can read the last couple of bullets there. I am pretty much at the end anyway, Jim.

[Slide.]


The last issue we had, we have talked lots about standards development. I don't think I need to add anything to that. We have been having a lot of technical exchanges with foreign countries and list a few of them here. We have gone there, they have come here. We have taken advantage of the lessons that we have learned from Canada and France. John Gallagher will talk a little bit more about the EDF.

As the research develops and additional guidance we will take advantage of it, and we will put that into the review guides. Are there any questions?

[No response.]

MR. KERR: Are there questions?

[No response.]

MR. ETS: One of the advantages of being last is that everything has been talked about, so hopefully I can go through it real quickly. My name is Ray Ets, and I am a consultant to the ISCB, the Instrumentation Branch. One of the things that we have been addressing here all day is just the software which now implements functions which previously had been done by the logic and analog devices.

Today, what I wanted to focus in on and what I have been asked to focus in on is two things; an overview of the criteria at the working level that has been used for the audits; and, secondly, some of the techniques that the NRC staff has used to implement these criteria.

MR. LEWIS: Just out of curiosity, could you tell us what Smartware Associates is?

MR. ETS: Smartware Associates is the trade name of my consulting company.

MR. LEWIS: I see, so you are Smartware Associates?

MR. ETS: Yes. As my wife calls it, smart ass, for short.

[Laughter.]

[Slide.]

MR. ETS: Basically what we have here is a key problem with regard to software that there is no consensus of what constitutes good software. For example, just today we have seen one approach using off the shelf modules and another approach using straight line code, another approach using a table driven set of modules. This is a problem in evaluating software and the review of that.

The NRC has determined or has made a decision to use verification and validation as the tool with which to evaluate the viability of software for use in safety systems. A key area here is that verification and validation, everyone has talked about it, but this is a separate and parallel process to the development of the software.

With regard to definitions, the verification is determining whether the process -- whether the requirements at one phase of the process have been completely and consistently carried over into the next level of the development process, basically providing us a good feeling or ensuring that there is functional correctness. The second part, the validation, this is a test of the integrated system. This is the hardware/software integration. To see whether it satisfies the initial specification at the highest level, the functional spec or system spec -- what the validation test does is provide a good feeling or high level -- I know it's ambiguous but a feeling of confidence that the software in combination with the hardware, the safety system, will in fact perform its safety function.
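As an illustration of the verification idea just described (checking that everything stated at one phase is carried into the next), the following is a minimal sketch only. The requirement and design identifiers, and the `untraced` helper, are hypothetical and are not drawn from 7-4.3.2, IEEE 1012, or any vendor's program.

```python
from __future__ import annotations


def untraced(previous_phase: set[str], next_phase_traces: dict[str, set[str]]) -> set[str]:
    """Return items of the previous phase that nothing in the next phase traces back to."""
    covered: set[str] = set()
    for sources in next_phase_traces.values():
        covered |= sources
    return previous_phase - covered


# Hypothetical functional requirements (earlier phase).
requirements = {"REQ-1", "REQ-2", "REQ-3"}

# Hypothetical design elements (next phase), each listing the requirements it implements.
design_traces = {
    "DES-A": {"REQ-1"},
    "DES-B": {"REQ-1", "REQ-3"},
}

# Anything left over was not carried into the design phase.
missing = untraced(requirements, design_traces)
print(sorted(missing))
```

A non-empty result is the kind of finding a verification team would formally document and feed back to the development team.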


MR. LEWIS: Does that mean that in particular -- I hate to harp on definitions, but we have to get them straight. That means that you also are not using the strictly formal definitions of verification and validation that are common in the computer science community.

MR. ETS: The definitions that we are using are based on 7-4.3.2. We are using that as a basis, and those definitions are consistent with the definitions that are presented in IEEE 1012 which is the software community standard on verification and validation.

MR. LEWIS: When we speak of the functional representation or the functions that are supposed to be represented by the software in validating them, validating a function space is a matter of many inputs to many outputs, and real validation means that you check all possible inputs against all possible outputs and make sure that the software matches what the original functional definitions -- is that overly restrictive?


You spoke in terms of having a good feeling, and I can tell you lots of ways I can get a good feeling but I can't put them on tape.

MR. ETS: That was the objective in developing a sufficient level of confidence that from the NRC point of view you can license this software for use in a safety system.

MR. LEWIS: In other words, you do not check the whole function space but just enough to feel good about it? I know what I am --

MR. ETS: Let me clarify the point. Again, the NRC as Joe alluded to, because of limited resources does not do the V&V function. What we are looking for is from the vendor's point of view, has the vendor applied the verification and validation process consistently in the development of the software. We would audit that, as I get to later, in taking a representative sample and taking a close look at that.

MR. LEWIS: I am just trying to get away from, if you will forgive me, from the excuses. I know the resources are limited. I am trying to get at the definitions. We are not talking about validating the full input space against the full output space, which is what in computer science is called verification and validation. You are doing it selectively. That's all right. There's nothing wrong with doing it selectively.

If you do it selectively, then there has to be an informed selectivity in picking that part of the space that you do validate. That's what you are going to talk about?

MR. ETS: I will be talking about how the NRC approached it, yes.

MR. LEWIS: Not quite the same thing, but I will wait to hear it.
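The distinction drawn in this exchange, between checking the full input space and checking a selected part of it, can be sketched for a toy case. This is an illustration only: `spec_trip` and `impl_trip` are hypothetical functions with three Boolean inputs, not any vendor's trip logic, and the random sample stands in for whatever informed selection a review team would actually make.

```python
import random
from itertools import product


def spec_trip(high_pressure: bool, low_flow: bool, bypass: bool) -> bool:
    """Hypothetical functional specification: trip on either condition unless bypassed."""
    return (high_pressure or low_flow) and not bypass


def impl_trip(high_pressure: bool, low_flow: bool, bypass: bool) -> bool:
    """Hypothetical implementation under test."""
    if bypass:
        return False
    return high_pressure or low_flow


def mismatches(cases):
    """Return the input combinations where implementation and specification disagree."""
    return [c for c in cases if spec_trip(*c) != impl_trip(*c)]


# Exhaustive check: every combination of three Boolean inputs (2**3 = 8 cases).
all_cases = list(product([False, True], repeat=3))
exhaustive = mismatches(all_cases)

# Selective check: a fixed-seed sample of half the cases.
rng = random.Random(0)
sample = rng.sample(all_cases, 4)
sampled = mismatches(sample)

print(len(all_cases), exhaustive, sampled)
```

With three Boolean inputs the exhaustive check is trivial. With 40 Boolean inputs the space would already hold 2**40 combinations (about 10**12), before any analog ranges or timing are considered, which is why the choice of what to sample matters.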

[Slide.]


MR. ETS: The criteria for the safety software, being a regulatory agency, we have to fall back on the criteria that allows us to say that this is required and needed. You have the ANSI 7-4.3.2 which has been endorsed by Reg Guide 1.152. That has been discussed. We also look in at, in doing our audits in a real sense, we also use the guidelines of 0493 to look at the software system susceptibility to the common mode failures. We have actually done and put together block diagrams of the proposed systems and used that as the basis of our analysis for the common mode failures.

We have not neglected the other criteria that we use to provide general guidance and guidelines in IEEE 1012 and IEC 880, the IEC being the European standard for software in nuclear systems.


MR. LEWIS: Incidentally, since Jay hasn't asked you, let me be the heavy on this one. I am taking for granted that you are a really honest to golly computer scientist. With a name like Smartware Associates you must be.

MR. ETS: I do have a degree in computer science. I worked under John Carr at the University of Pennsylvania. I did my thesis under Noah. I hope that satisfies you.

MR. LEWIS: I took it for granted, and thought I would put it on the record.

MR. ETS: When we are doing an audit on a proposed software system these are the general subject areas that we focus in on. Number one is the process for V&V. Everyone has said that while we need validation and verification -- here we are specifically looking at does a V&V plan exist and does that plan include the procedures, how are you doing the V&V. We are looking at what the vendor is proposing or has done or has accomplished with regard to V&V.

The other key element is that the V&V is independent, as I stressed before. This is a process which occurs in parallel. It was brought out in GE's slide very well, in parallel with the development effort. Here with regard to independence is the minimum criteria that the staff has looked for is that the first line supervisor for the verification team is different than the first line supervisor for the development, that they are not all under the same hat.

The third point is the application of V&V. They have a V&V plan and within the plan they have defined independence, but have they done the validation and verification throughout the software development process. A key thing there is that the V&V has to be formally documented. A very positive note today is that everyone is talking about applying V&V at the highest requirements level. In some of the audit reviews that we did in the last three years that has not always been the case from the point of view of the vendors.


[Slide.]

Looking at the requirements documentation, software -- this whole area of software is a documentation problem. Software as such doesn't exist. It exists as documentation, be it a requirements document or software design document, a flow chart, a program design language or the code itself. What is key is that all of the development process is fully documented, starting off with the requirements document.

Configuration management, although not strictly stressed, configuration management is a key element. This shows the feedback mechanism that the vendor has in place so that when he uses the classical waterfall software development paradigm that is always presented -- you have the functional requirement, design test, integrate, et cetera.


You find errors in there. How do you feed back
those errors, those lessons learned into the previous levels
and do it in a way that is manageable and coherent. This is
configuration management.

Finally, it is the developmental methodology they use. This really comes down to -- as required by 7-4.3.2 -- do they have the phases, the development phases fully defined. What is the product of each phase, how is that phase represented and through what documentation. Again, how are the errors handled. How are the errors handled when the product -- when the software or the integrated product is within the design groups somebody finds an error, are they tracked. Also, how are the errors handled when the errors are uncovered by the verification team. That is, after the product of that particular phase has been put under configuration management.

These are all the areas that we focus in on when we go to do the audit and get the information that we need to be able to find out whether the software can be licensed and detail of design.

MR. LEWIS: I couldn't help but notice that you keep using the word we, but you are a consultant to NRC. Presumably you mean they.

MR. ETS: We, in the sense that I have been part of the audit team for the past three years. I use the we, meaning myself and the staff in doing these audits. There was a discussion previously about interdisciplinary teams and how you bring them together. I am a computer scientist and have had to learn a lot about nuclear engineering in the last three years. Hopefully, I have taught them a little bit about software engineering.

MR. LEWIS: Hopefully.


[Slide.]

MR. ETS: Basically on the audit methodology, as I stressed before, we are doing the vendors -- looking at the vendor V&V program, the pragmatic of the situation that we can't do it ourselves. The first point we do is, we have a series of questions that will look at -- when we get the document, we do a series of questions, the answers to which we will look at. The standard set has been developed by dissecting 7-4.3.2 which currently is our only enforceable criteria and seeing how many of those questions are answered in the initial documentation. What is not answered there would be presented as RAI's for the next go around in order to get the response.


The questions are particularly important when we are talking about the advanced ALWR designs for which actual software or even a software design for that matter, has not been completed or defined. In the case of retrofit systems where an actual system exists and the software has been written, we use what is called a thread concept.

[Slide.]

On the thread concept what we do is, this is a schematic of a typical safety system with the sensor input. You have a conversion calculation block and a trip logic leading to trips. They trip signal out of one of four channels. What we do is, we select a thread starting from a sensor and going through these blocks all the way to the trip. The selection of this thread, perhaps in answer a little bit to Dr. Lewis' question of what inputs are we looking at, the selection of a thread is a team decision in which you have the electrical engineers who have the nuclear scientists and computer scientists -- we want a thread that will be as representative and as encompassing as possible with regard to the vendor's development of the safety system.

Each appropriate discipline then takes their extension of the thread and takes a closer look at it. In the present day most commonly the software is used in the conversion and calculation block, and that's the area where I have focused my efforts.
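The thread concept described above can be pictured as picking one path through a block diagram, from a sensor to the trip output, and handing each block on that path to the appropriate discipline. The sketch below is an illustration under assumptions: the block names, the `CHANNEL` graph, and the `REVIEWER` assignments are hypothetical, apart from the point made in the testimony that the software sits mostly in the conversion and calculation block.

```python
from __future__ import annotations

# Hypothetical one-channel block diagram: each block lists the blocks it feeds.
CHANNEL = {
    "sensor": ["conversion_calculation"],
    "conversion_calculation": ["trip_logic"],
    "trip_logic": ["trip_output"],
    "trip_output": [],
}

# Hypothetical assignment of review disciplines to blocks.
REVIEWER = {
    "sensor": "electrical engineer",
    "conversion_calculation": "computer scientist",
    "trip_logic": "nuclear engineer",
    "trip_output": "electrical engineer",
}


def thread(start: str, end: str, graph: dict[str, list[str]]) -> list[str] | None:
    """Depth-first search for one path from start to end; None if no path exists."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == end:
            return path
        for nxt in graph[node]:
            if nxt not in path:  # avoid revisiting a block already on this path
                stack.append(path + [nxt])
    return None


selected = thread("sensor", "trip_output", CHANNEL)
assignments = [(block, REVIEWER[block]) for block in selected]
for block, who in assignments:
    print(f"{block}: {who}")
```

In a real audit the choice of thread is a team judgment about representativeness, not a graph search; the search here only shows what "sensor through to trip" means structurally.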

[Slide.]


Looking at the documentation of a supposedly simplified diagram, you find that the documentation peels back the cover of a seemingly simple conversion block and exposes the underlying complexity of the software itself. Very often designs are purported to be simplified, and they may be graphically simplified true by having what was previously a logic shown in a one liner replaced by little blocks saying that this is the calculation computer or this is a bi-stable computer or whatever.

What is really happening is that previously visible complexity has in fact been pushed down into that graphically neat block, and it is something that we cannot really forget about. There is a lot of complexity hiding in the software program as well as complexity added. The focus is not on whether we do mathematical functions in the software but rather how the software gets the data in, what it uses as a criteria for its various branches because that's the essence of the logic.

We focus in and get a view of what the software is like in the block. What we do is, select one of the modules in there from the block and take a look at that module's life cycle development as represented by the life cycle document starting from, again, functional requirements, software design, looking at the code where possible, then looking at their results of verification. The document of verification that they did as the software module moved through the various phases of the validation, and also looking at the validation test of the completed subsystem.

Basically, looking at this slice will give us a good idea or impression of whether they were thorough in their design, whether the design covers the aspects, whether the design elements have been v[illegible], whether the validation included exhaustive testing or random testing, what did they use as their test implement. [Illegible], again, it reduces to selecting one representative piece and examining it thoroughly.


[Slide.]

I am going to jump to my conclusions here. Basically what we have is that V&V is in fact a proven technique. It does uncover errors that otherwise would go uncovered, just for the sake of having somebody else look at it, another team look at it. Outside of the nuclear community I think a very prominent example is the Hubble Space Telescope's mirror in which it was said that independent reviewers can't be used on this because nobody knows this subject as well as we do.

Apparently that argument was bought, but we see what the results are.


MR. LEWIS: That was a result of not being able to measure lengths to within a millimeter, and it was partly the kind of smugness you describe. It was also partly a matter of having trouble feeding earlier tests which did show the mistake up through the layers of management so somebody noticed it.

MR. ETS: The other half of it, yes, there was very precisely ground -- it was the correct answer to the wrong question, would be another way of putting it.

MR. LEWIS: Since you mentioned the Hubble Telescope I will say one thing. I just read the report on it a few weeks ago, and the report told me that they used the well known Hindle Sphere test to test the telescope. I asked all my astronomer friends who had never heard of it, I went and looked at my library and found one 1931 book which mentioned it. It's not all that well known.

MR. ETS: That happens. Basically, it's a proven technique. The other thing is, V&V is technology independent. Although in our case we are using requirements, we have the list of questions and a list of database. That is not to say that the application of V&V cannot employ computer-based tools. This would be especially true if, for example, a vendor has a well integrated CASE development facility from which it would be much more -- we provide much more insight to the NRC reviewer of which tests were done and would lead one to a much more confidence that the software as well as the system had been validated and verified.


Finally, as I said, the 7-4.3.2 is the handle that we have had to use in doing the reviews. I think that when I first saw it I personally expressed an opinion that it was kind of weak, but that's what we have to work on. We have been using throughout the reviews -- we have been using the other standards, and in particular 1012 and IEC 880 as guidelines in how better to assess and evaluate the system software of safety systems.

Basically, one of the things that I wanted to stress again that I skipped over before, standards -- we keep on coming back to that -- standards, in fact, reflect a certain body of accepted opinion within a discipline. Standards are evolutionary. The software engineering is, in fact, an immature discipline compared to some of the others. It is maturing, and I think in evidence of that is looking at the amount of new standards that have been propagated by the IEEE in the last four years.

I think this is where the NRC should look for
additional help and assistance to having their standards
evolve as the consensus of opinion within the software
community as to what constitutes good and reliable software.

Thank you.


MR. LEWIS: Thank you. I have two questions. One is, would you feel more comfortable if there were another real, live computer scientist working with you?

MR. ETS: To be honest with you, yes.

MR. LEWIS: Second question. I am just curious whether you know from your involvement in the trade whether, for example, the 767 which is a highly computerized airplane that is built by Boeing, whether they did any formal V&V during the design of the airplane. I just don't know the answer to that question.

MR. ETS: I haven't looked at it specifically, but I understand that Boeing does have a good software development program. Whether how independent the V&V was, I don't know. I can't answer that.

MR. LEWIS: I know they have very good people. I wonder if they went through formal V&V on the system.

MR. ETS: I think they had to, to get FAA certification. FAA does have a requirement on the avionic system that is similar to what the DOD requires, where in fact Boeing would have had to have an independent -- the FAA would have an independent contractor team to do the verification and validation in the ultimate sense of independence, really.

MR. LEWIS: Thank you very much.

MR. CARROLL: I have one question. We have heard this morning the Canadian experience.

MR. ETS: Yes, sir.


MR. CARROLL: I guess I came away with the impression that the licensing authority in Canada and their consultant or consultants had a real hard time accepting the software development and V&V that went into the Darlington design.

Would you have the same problems that were described this morning if you were looking at that design in Canada? I just want to get some calibration as to whether they are being more rigorous than we are.

MR. ETS: I am not familiar with the V&V that the CANDU went through, so I can't answer that.

MR. KERR: My impression is that the goal of the staff at this point is to ascertain that the licensee, applicant or whatever have a reasonable V&V program and that it has been applied in a reasonable way but not to attempt to estimate the reliability that results therefrom. Is that a reasonable conclusion?

MR. ETS: You asked about the staff, and I think perhaps the staff should answer it. What I do is, I give my own report and that's input to the staff report. They can take my impressions with a grain of salt, so --

MR. STEWART: Are you talking about placing a reliability number on the software?


MR. KERR: A number, or some other indication of reliability.

MR. STEWART: We have looked at what some of the vendors have done. I don't believe that there is a consensus in the industry of a way to measure a reliability number for the software. We have it as a research question, to see if that can be done. As of right now, we don't think that there's a way to do that.

MR. KERR: I am used to this sort of thing because in the education field what we do is, we have a process and have no idea what the results of that process ultimately are, but we spend a lot of time worrying about the process. I guess that is sort of an analog here.

You have faith that the process will produce something that is your goal but you have no way of really measuring that.

MR. STEWART: We are using verification and validation because we believe it's a proven process that gives us a high level -- granted, we can't put a number on it -- a high level of confidence that the software is of good quality and will perform its function as intended.

MR. KERR: Thank you.

MR. GALLAGHER: I would just like to say one thing before I start with respect to the international standards. I am the Chairman of what is called Subcommittee 45-A which is responsible for writing the standards on systems and equipment for the use in nuclear reactors. It is equivalent to the IEEE group that writes these standards, but it is on an international basis. The U.S. is a voting member of this international basis.


We work on a lot more than just 880 and are now in the process of revising 880. It was written in the 1980's, and we realize that it did not address the use of a lot of software tools and more modern methodologies that are now available. I would say that two-thirds of the people who do this actual writing are computer scientists. I am not sure exactly what their degrees are but that's what their work is, and they represent the software expertise in their various countries.

MR. CARROLL: John, it might be interesting for the Committee for you to tell them a little bit more about yourself. You are newly arrived at the NRC --

MR. GALLAGHER: Yes, I just joined the NRC in January the 7th. I spent my years, from 1956 until I left Westinghouse at the end of November, 1990 working in the area of the development of advanced I&C products for the Westinghouse plants, starting with the ion chamber, the protection systems that are in the operating plants, then on to digital technologies. I was the manager of the IIS project that was spoken about earlier.

More recently, I worked in the application of digital technologies to the feedwater system. Most recently looking into neuro-networks and things like that to improve the instrumentation system capability with respect to incorporating decision making processes.

MR. CARROLL: I have known John for a long time, and I can say that I have had to make systems that he has developed work in the real world.

MR. LEWIS: Have you succeeded?

MR. CARROLL: Yes, we usually figured a way around the problems that he created.

[Laughter.]

MR. LEWIS: Are there any successes that you have had in applying neuro-networks to decision making systems of the kind that you are talking about?

MR. GALLAGHER: We were just getting started into this and we found some interesting things. The people that I was working with --

MR. LEWIS: In other words, the answer is no.

MR. GALLAGHER: No, the answer is yes, because the people that I was working with were doing this for the defense business and were able to see things on the ocean floor that nobody else could see. They were building and shipping neuro-network systems for this purpose, and they worked.

MR. LEWIS: I know something about that stuff.

[Slide.]

MR. GALLAGHER: I would like to just give you a sort of overview of what the French N4 I&C system is, because I think there's a lot of confusion about what is being talked about. This is a view, starting up here at the top with the control room, you hear about Level three, two, one, zero. Level three is the control room. Below that is the processing and communications system which takes in information from the plant and processes it, sends it up to the operators and also develops signals or commands that are then sent down.

Below this is what is the level one. It basically is the control level. It is made up of the safety system which is the reactor protection system and the signals that go out to activate the engineered safety feature systems. This was developed by Framatome, and it basically -- when it is in operation it's in the 1,300 megawatt reactors and sustained fairly well.

The area that is being talked about is the P20 I&C system which covers this system. What they are talking about and the problem is how to fix this, and I will get into it a little bit later what the problem is of how to fix this and keep this, and especially keep this. It is also interesting to note that there is a mimic diagram and auxiliary panel that go directly down into the P20 system and bypass all the data.

[Slide.]

This is a view --

MR. KERR: You are going to tell us what the P20 system is supposed to do?

MR. GALLAGHER: Yes, I will. The P20 system, this is another view of the same thing. The difference on this view is that it shows that down at this level there is a large number of these controllers, the P20 type of controller. There is roughly 13 or so of them that communicate with each other as well as send information up.

[Slide.]

The control room has work stations. This is the -- it has four of these work stations, two for the operator. This is for the senior reactor operator, this is for the reactor operator, this is for the shift supervisor, this is in the technical center. This whole thing is driven by either central computers, the Gould machines, or by microprocessors including the computers that serve to operate the work stations.

The French spent in this area, the documents show that they spent somewhere on the order of 60 engineers for four years. That's 240 man-years of engineering. They would like to keep this. They spent about \$50 million on that, the problems in this area.

[Slide.]


What does the P20 look like and what does it do. It's a microprocessor based system that has a data highway. It has up at this end what they call the cluster head which serves to perform the functions and make the data highway operate. These functions are identified here. There is coupling into the plant networks which are basically ether networks. There is a coupling into the interim cluster, so the one cluster which is this group, can talk to another cluster. There is the management of the traffic, things that do calculations.

Here is the block that does the maintenance configuration and monitoring. Down at this level are the connections from the data highway out into local networks, local buses, where you pick up the measurements, the analog and digital measurements and perform the controls. This is a distributed digital processing system made up of microprocessors.

MR. KERR: I have to ask a stupid question. What is this supposed to do? What is it for?

MR. GALLAGHER: It is for the purpose of measuring process variables and performing control algorithms.

MR. KERR: So, it's a control system.

MR. GALLAGHER: That's right. It's basically a distributed digital processing control system with its own proprietary data highway and being able to hook into an open highway system.

The problem that they ran into -- let me just say one more thing about that. The problem they ran into appears to be three-fold. One of the problems is the large amount of data that is being processed, much larger than previously. If we look here there is some 65,000 digital signals that are being processed. If you look at prior experiences on the 1,300 megawatt reactor there were 5,000 digital signals. The earlier estimates on this job were something like 20,000.

They are a factor of three above their earlier estimates which they made around 1985. One of the things that we have heard is that they significantly increased the number of equipments that were being monitored. For instance, a lot of the valves that are normally manually operated and are entered into the -- if there is a computer system and they are entered manually, all that was done now by automatic monitoring.

There was a large increase there. Also, and I spoke to the man that wrote the document that put these numbers down. He is going back to check this. This now says that there is something like 200,000 points per second, so when you add this up you end up with a total digital rate of something like 265,000 points per second which is much larger than anything that I think any of us have ever seen anybody talk about. The analog is about normal.
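The figures quoted in this part of the presentation can be checked with simple arithmetic. The sketch below only restates the numbers given in the testimony; the variable names are illustrative, and the 200,000 figure is the one the speaker says is being rechecked by its author, so the total inherits that uncertainty.

```python
# Figures as quoted in the testimony.
n4_digital_signals = 65_000          # digital signals on the N4 system
prior_1300mw_signals = 5_000         # digital signals on the 1,300 megawatt reactors
estimate_1985 = 20_000               # earlier estimate for this job
reported_points_per_second = 200_000 # figure the speaker says is being checked

# "A factor of three above their earlier estimates."
growth_vs_estimate = n4_digital_signals / estimate_1985

# Growth relative to the prior generation of plants.
growth_vs_prior = n4_digital_signals / prior_1300mw_signals

# The total as the speaker adds it up.
total_as_quoted = n4_digital_signals + reported_points_per_second

# Engineering effort quoted earlier: 60 engineers for four years.
man_years = 60 * 4

print(growth_vs_estimate, growth_vs_prior, total_as_quoted, man_years)
```

The ratio to the 1985 estimate comes out at 3.25, consistent with "a factor of three", and the ratio to the 1,300 megawatt plants is 13.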

[Slide.]


Evidently, this is their problem as it shows up. At a very high data rate there is evidence that they took advantage of the advanced computational capabilities within the microprocessors to work around this high data rate, which means they got very clever with their programming. Rather than using some of the rules that you heard people talk about earlier, very strict restrictions on how you modularize, they were working on ways where you could part data, how the place you were to go was filled up and where you could go elsewhere and things like that.

There was obviously, I think, uncertainty in the specifications. The system kept growing, so that as the development of the system was going along the specifications were being changed. There has been some evidence to that fact. The equipment was new. It was new in two ways. When the job originally started out they had planned on doing some of this with a different set of equipment. Over the course of time that company was acquired by somebody else, somebody else who now makes the P20 wanted to make everything the same product line so that was changed.

They made a change to what their original plans were, and they also introduced -- as best as I can find out, there is not a lot of experience in the application of this particular equipment. When you add these three things up, you end up with complex programs and poor documentation. Then you say okay, now let's go do a V&V program. You cannot do the type of V&V program that you heard about with complex programs and poor documentation.

I think something very similar to this happened last year on the Mohasha plant in Czechoslovakia. I don't know how many of you know, but there was a plant that was using a distributed digital system there. They had engineered it. As they went through the engineering process they kept changing the specifications and they had some of this problem. When they took it to Kiev where they were supposed to do the V&V program they ended up with the very same problem here.

I think that one of the lessons here is that while V&V certainly puts rigor into this, what is equally important is a design process that realizes that at the end you have to go through a V&V program. You heard some of the people who recognized this saying that one of the chief roles of the rules that the design process has to follow is to make sure that you end up with a product that you can do verification and validation on.

They are now in the process -- they, being the French -- they are in the process of figuring out what they can do, and hopefully make a decision within six months on how they can go to something else and still be able to save most of their control room product.

Are there any questions on this?

[No response.]

MR. GALLAGHER: Okay, thank you.

MR. LEWIS: Thank you very much. That finishes our formal set of presentations. Does the Subcommittee have any further questions for any of the speakers?

[No response.]

MR. LEWIS: In that case, I think we should probably relieve the transcriber of her duties. I don't know what the rule is. We want to have a little bit of conversation around the table. I don't think we need to transcribe it.

MR. ROTELLA: It's up to you.

MR. LEWIS: You are relieved. The formal session is over.

[Whereupon, at 4:43 p.m., the meeting concluded.]

#### REPORTER'S CERTIFICATE

This is to certify that the attached proceedings before the United States Nuclear Regulatory Commission

in the matter of:

NAME OF PROCEEDING:

Joint Meeting -- Computer in Nuclear Plant/Control System

DOCKET NUMBER:

PLACE OF PROCEEDING: Bethesda, Maryland

were held as herein appears, and that this is the original transcript thereof for the file of the United States Nuclear Regulatory Commission taken by me and thereafter reduced to typewriting by me or under the direction of the court reporting company, and that the transcript is a true and accurate record of the foregoing proceedings.

Mary C. Lackin

Official Reporter Ann Riley & Associates, Ltd.







AECL CANDU EACL CANDU

USE OF COMPUTERS IN CANDU STATIONS

FOR

ACRS MEETING BETHESDA, MARYLAND 91-02-06

N. ICHIYEN (AECL-CANDU)



gimliovda/ichiyen so 91/02/01 Page 21




### BRUCE A FUELLING MACHINE INCIDENT ACTIONS TAKEN

#### HARDWARE:

A number of hardware changes are being made/considered: for example,

- ac power to bridge, carriage and trolley drives is being moved to a non-bypassable bus
- protective output is only a permissive; another command signal is required
- addition of hardwired interlock for bridge motion if head is moved forward
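The permissive and interlock ideas in the list above can be sketched in a few lines. This is only an illustration of the logic, not the Bruce A design: the function name `bridge_motion_allowed` and its three parameters are hypothetical, and a real hardwired interlock would sit outside any software.

```python
def bridge_motion_allowed(protective_permissive: bool,
                          operator_command: bool,
                          head_forward: bool) -> bool:
    """Return True only when every independent condition agrees.

    All names here are illustrative assumptions, not plant signal names.
    """
    # Interlock: no bridge motion while the fuelling machine head is forward.
    if head_forward:
        return False
    # The protective output is only a permissive. It cannot start motion
    # by itself; a separate command signal must also be present.
    return protective_permissive and operator_command
```

With this arrangement a spurious protective output alone (permissive true, command false) produces no motion, which is the property the second bullet asks for.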




### BRUCE A FUELLING MACHINE INCIDENT ACTIONS TAKEN

### SOFTWARE:

A number of software changes are being implemented/considered: for example,

- "bug" fixed
- separate bridge motor and brake software
- hazard analysis completed
- SQA review and changes



#### BRUCE G.S. "A" FUELLING SCHEME




### BRUCE A FUELLING MACHINE INCIDENT

### EVENT

- the Operator was trying to carry out a manual control function (using the South Extension controls for the South trolley while it was in the CSA) because of an equipment failure (abnormal but normally permissible). Note: use of manual control disables the AC protect circuit.
- this was not allowed by the protective computer (because the South trolley was in the CSA)
- this caused the protective computer program to enter a section of code containing a specific "bug", which made the program jump to a subroutine that ended up releasing the brakes on the Unit 4 fuelling machine (the last access of the subroutine had been from a Unit 4 operation to release the brakes)
- it so happened that the Unit 4 machine was actually latched onto a fuel channel, and this resulted in the machine coasting down and causing a leak from that channel
- manual reactor shutdown and cooldown proceeded without further incident




#### COMPUTER CONFIGURATION


### **OVERALL STATUS**

- Safety critical high level standard just issued for trial use.
- Sub-tier standards, procedures, guidelines, to be completed by end of 1991.
- Decision on safety critical methodologies, configurations by mid–1991.
- Other category standards work also now underway.








### BRUCE A FUELLING MACHINE INCIDENT

### BACKGROUND

- 4 unit station (4 x 848 MWe)
- on-power re-fuelling using computers
- 3 fuelling subsystems for the 4 units
  - each subsystem has 2 fuelling machine heads carried on a mobile trolley system
  - 2 sets of tracks for the trolley
  - once trolley is positioned at the selected unit, each fuelling machine is raised to the reactor face elevation by a carriage supported by a bridge structure




### THEORY OF STATISTICALLY VALID RANDOM TESTING

Hardware reliability is defined as the probability that a failure occurs given a demand.

Software reliability can be defined as the probability that the software will encounter a demand which causes it to fail.

- distribution of inputs presented to the software (including time histories) must duplicate the real demand distribution (operating profile)
- an appropriate number of tests must be performed to gain the required statistical confidence.








### FUNDAMENTAL PRINCIPLES OF HIGH LEVEL STANDARD FOR SAFETY CRITICAL SOFTWARE

- 5. Hazard Analysis
  - identify any failure modes that may lead to an unsafe action and thus either eliminate them or, where possible, ensure that the failure mode can be detected and the system put into a safe state.




### FUNDAMENTAL PRINCIPLES OF HIGH LEVEL STANDARD FOR SAFETY CRITICAL SOFTWARE

- 4. Both systematic and random testing must be performed.
  - white box
  - black box
  - randomly generated (statistically valid random testing)




### STATISTICALLY VALID RANDOM TESTING

- random testing is seen as complementary to systematic testing.
- provides added/independent confidence in the robustness, correctness, trustworthiness and reliability of the software.
  - improves effectiveness of testing by compensating for false assumptions and biases of the tester.
  - can be defined simply as testing using inputs selected at random from some known distribution.


### FUNDAMENTAL PRINCIPLES OF HIGH LEVEL STANDARD FOR SAFETY CRITICAL SOFTWARE

- 2. The outputs from each development process must be reviewed to verify they comply with the requirements specified in the inputs to that process.
  - those outputs using mathematical functions must be systematically verified against the inputs using mathematical verification techniques or rigorous arguments of correctness.




### FUNDAMENTAL PRINCIPLES OF HIGH LEVEL STANDARD FOR SAFETY CRITICAL SOFTWARE

- 3. Software structure must be based on "information hiding" concepts.
  - the interface to each software module is designed to reveal as little as possible about the module's internal workings.
  - as a result, if it is necessary to change the functions internal to one module, the resulting propagation of changes to other modules is minimized (easier to maintain)
  - results in loosely coupled modules and hence is easier to review as well.
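A minimal sketch of the information hiding idea follows. The class `SetpointTable` and its method names are hypothetical and are not taken from the standard; the point is only that callers depend on two operations and never on how the values are stored.

```python
class SetpointTable:
    """Hides how setpoints are stored; callers see only two operations."""

    def __init__(self) -> None:
        # Internal representation. It could be replaced by an array or a
        # file-backed store without changing any calling module.
        self._limits = {}

    def define(self, name: str, limit: float) -> None:
        """Record the limit associated with a named setpoint."""
        self._limits[name] = float(limit)

    def exceeds(self, name: str, measurement: float) -> bool:
        """Report whether a measurement is above the named setpoint."""
        return measurement > self._limits[name]
```

Because other modules call only `define` and `exceeds`, a change to `_limits` stays inside this class, which is the reduced propagation of changes that the slide describes.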


### HIGH LEVEL STANDARDS

- defines requirements on the software engineering process
- defines outputs of that process
- defines requirements to be met by each output
- specified as measurably as possible, but does not unnecessarily constrain the methodology used to produce the output.






### FUNDAMENTAL PRINCIPLES OF HIGH LEVEL STANDARD FOR SAFETY CRITICAL SOFTWARE

- 1. Documentation must describe the required behaviour of the software using mathematical functions written in a notation that has clearly defined syntax and semantics.
  - more complete requirements (domain coverage can be checked)
  - requirements can be uniquely interpreted
  - facilitates use of mathematical verification techniques that allow the design to be transformed into a mathematical function form for comparison to requirements directly.
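One way to picture a requirement written as a mathematical function with checkable domain coverage is a small condition table in which exactly one row must apply to every input. The sketch below is an assumption-laden illustration, not the notation used in the standard: `TRIP_TABLE`, `SETPOINT`, `evaluate` and `covers` are all hypothetical names, and checking a list of sample inputs is weaker than the mathematical argument the standard calls for.

```python
from typing import Callable, Iterable, List, Tuple

# One row of the table: a condition on the input and the required result.
Row = Tuple[Callable[[float], bool], str]

SETPOINT = 100.0  # hypothetical trip setpoint, for illustration only

TRIP_TABLE: List[Row] = [
    (lambda x: x < SETPOINT, "no trip"),
    (lambda x: x >= SETPOINT, "trip"),
]


def evaluate(table: List[Row], x: float) -> str:
    """Return the required result for x, insisting that exactly one row applies."""
    matches = [result for condition, result in table if condition(x)]
    if len(matches) != 1:
        raise ValueError(f"{len(matches)} rows apply to input {x}")
    return matches[0]


def covers(table: List[Row], samples: Iterable[float]) -> bool:
    """Check, on the given samples only, that the rows are complete and disjoint."""
    try:
        for x in samples:
            evaluate(table, x)
    except ValueError:
        return False
    return True
```

A table with a gap, for example one whose second row reads `x > SETPOINT`, fails the check at the setpoint itself, which is the kind of incompleteness that is hard to see in prose requirements.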


### REAL ISSUE

- lack of an accepted definition of the acceptable quality that the software had to have in order to be approved by the AECB.
- our objective is to create a set of standards, procedures and guidelines for software engineering over all categories of software.
  - our first task is the creation of this set for safety critical software.




### STANDARDS FRAMEWORK

### 4 PARTS:

- 1. Categorization criteria
- 2. High level standard
- 3. Sets of standards, procedures, guidelines
- 4. Pre-developed software qualification


### ISSUES FROM DARLINGTON A SHUTDOWN SYSTEM SOFTWARE LICENSING EXPERIENCE

- Drawn out licensing process
  - from 1985 (start of dialogue)
     to 1990 (approved for full power operation)
- Issues kept changing
- Different set of issues, from AECB-hired consultant, Dr. Parnas (1987)








## ACTIONS TAKEN

- 1. Back engineering a software design specification using mathematical notation ("formal").
- 2. Walkthrough process to verify that the code met the formal software design specifications.
  - involved creating new techniques never used before, like creating Program Function tables from the code and comparing them to the formal specifications.
- 3. Random testing program.




# CANDU 3 EVOLUTION OF DIGITAL SYSTEMS

# Control

- from redundant "central" type system to a true distributed control system architecture
  - geographic distribution
  - closing the loop over the highway



## **Operator Interface**

- separate PDS (Plant Display System) from control computers for operator interface

# Safety Systems

- higher degree of computerization (i.e. more systems)
- evolutionary software practices








# SOFTWARE ENGINEERING PROCESS

- concentrate on safety critical software category
- will describe the overall approach for producing reliable software
  - will discuss parts played by V&V, software reliability measures, etc.




# DARLINGTON A COMPUTER SYSTEMS

# DCC3

- reactor and process control (device logic control done in PLCs (OH-180s))
- operator interface (MCR)
- alarm annunciation
- data logging

# **Fuel Handling Control**

- separate computers for on-line fuelling control

## Safety Systems

- Fully computerized shutdown systems (SDS-1, and SDS-2)
  - trip functions
  - operator displays
  - operator aided testing
  - monitoring of important SDS variables
- Emergency Core Cooling System (ECCS)
  - use of PLCs (OH-180s) for discrete logic control




# CANDU 3

Next generation CANDU after Darlington


### Features

- 35 month construction schedule
- modular design/construction techniques
- 100 year life
- replaceable fuel channels
- all equipment can be replaced within a 90 day outage (including st. gen, etc.)


# OUTLINE

- 1. Background
  - historical
  - Darlington station
- 2. Future applications
  - Evolution of Digital Systems
- 3. Software Engineering Process (concentrating on safety critical software)
  - Darlington shutdown system licensing experience
  - lessons learned
  - direction for future (use of standards)
- 4. Fundamental Principles of High Level Standard for Safety Critical Software
- 5. Overall Status
- 6. Bruce Fuelling Machine Incident






NUPLEX 80+


## SOFTWARE RELIABILITY

KEN SCAROLA

MANAGER, ADVANCED CONTROL COMPLEX ENGINEERING

## NUPLEX 80+ SOFTWARE RELIABILITY

- DETERMINISTIC DESIGNS
- FIELD PROVEN EXECUTIVE SOFTWARE
- SOFTWARE DESIGN PROCESS AND DOCUMENTATION
- SEGMENTATION
- DIVERSITY
- SABOTAGE PROTECTION
- EXPERIENCE

#### DETERMINISTIC DESIGNS

- INPUTS ARE SCANNED AND PROCESSED ON A CONTINUOUS CYCLE REGARDLESS OF STATUS CHANGE.
- SIMILARLY, OUTPUTS ARE UPDATED ON A CONTINUOUS CYCLE
- PROGRAMS ARE EXECUTED ON A CONTINUOUS BASIS
  - NO MULTI-TASKING
  - NO INTERRUPTS

PROGRAMMABLE LOGIC CONTROLLERS IN SAFETY SYSTEMS EXECUTE PROGRAMS WITHOUT BRANCHING.
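The deterministic style described above can be sketched as a fixed scan loop: read every input, run the same logic, write every output, and repeat, with no interrupts or task switching. The code below is a hedged illustration only; `run_cycles`, `trip_logic`, `PRESSURE_LIMIT` and the signal names are hypothetical and do not come from the NUPLEX 80+ design.

```python
from typing import Callable, Dict

Inputs = Dict[str, float]
Outputs = Dict[str, bool]

PRESSURE_LIMIT = 150.0  # hypothetical setpoint, for illustration only


def trip_logic(inputs: Inputs) -> Outputs:
    """Straight-line logic: one boolean expression, evaluated the same
    way on every cycle, with no data-dependent branching."""
    return {"trip": inputs["pressure"] > PRESSURE_LIMIT}


def run_cycles(read_inputs: Callable[[], Inputs],
               logic: Callable[[Inputs], Outputs],
               write_outputs: Callable[[Outputs], None],
               cycles: int) -> None:
    """Execute a fixed number of identical scan cycles."""
    for _ in range(cycles):
        inputs = read_inputs()    # scan every input, changed or not
        outputs = logic(inputs)   # same program path on each cycle
        write_outputs(outputs)    # refresh every output on each cycle
```

Because each cycle does the same work in the same order, the execution time and the sequence of operations do not depend on which inputs happened to change, which is the property the slide calls deterministic.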

#### FIELD PROVEN EXECUTIVE SOFTWARE

- ALL SOFTWARE BASED SYSTEMS ARE COMPOSED OF COMMERCIALLY AVAILABLE PRODUCTS WITH PROVEN INDUSTRIAL AND UTILITY PERFORMANCE:
  - PROGRAMMABLE LOGIC CONTROLLERS
  - PC-AT COMPUTERS
  - MINI-COMPUTER
  - CRT WORKSTATIONS
  - ELECTRO-LUMINESCENT DISPLAY WORKSTATIONS
  - COPPER AND FIBER-OPTIC COMMUNICATION NETWORKS
- MOST OF THESE ARE USED IN NUCLEAR APPLICATIONS
- FIELD PROVEN EXECUTIVE SOFTWARE INCLUDES:
  - I/O HANDLING
  - ARITHMETIC FUNCTION BLOCKS
  - COMMUNICATION DRIVERS
  - FAILURE DETECTION

#### SOFTWARE DESIGN PROCESS AND DOCUMENTATION

- EARLY FOCUS ON ESTABLISHING CORRECT REQUIREMENTS AND SPECIFICATIONS:
  - HARDWARE/SOFTWARE
  - FUNCTIONAL DECOMPOSITION
- STANDARD CODING AND DOCUMENTATION TECHNIQUES
- THOROUGH VERIFICATION AND VALIDATION PROGRAM
- EXTENSIVE CONFIGURATION CONTROLS
  - PURCHASED SOFTWARE
  - CUSTOM SOFTWARE




Figure 2-0 V&V Reviewers

|                                   | Non-Safety           | Important<br>to<br>Safety or<br>Availability | Safety   |
|-----------------------------------|----------------------|----------------------------------------------|----------|
| Functional<br>Requirements        | RT/DT<br>or<br>DT/VT | RT/DT<br>or<br>RT/VT                         | RT/DT&VT |
| System/Software<br>Description    | DT/RT<br>or<br>DT/VT | DT/RT<br>or<br>DT/VT                         | DT/RT&VT |
| System/Software<br>Specifications | DT/DT                | DT/RT<br>or<br>DT/VT                         | DT/VT    |
| System/Software<br>Implementation | DT/DT                | DT/VT                                        | DT/VT    |
| Module<br>Test Proc               | DT/DT                | DT/VT                                        | DT/VT    |
| Module<br>Testing                 | DT/DT                | DT/VT                                        | DT/VT    |
| System Test<br>Procedure          | DT/VT<br>or<br>DT/RT | DT/VT<br>or<br>DT/RT                         | RT/VT    |
| System Testing                    | RT/DT<br>or<br>VT/DT | RT/DT<br>or<br>VT/DT                         | RT/VT    |

Key: XX/YY, where XX = Orig and YY = Rev

DT = Design Team, RT = Requirement Team, VT = V&V Team

Safety Sys: PPS, E-CCS, PAMI

Important Sys: DIAS, P-CCS, PCS

Non-Safety: DPS, NIMS, SOE

#### SEGMENTATION

- BREAKS SYSTEM FUNCTIONS INTO SMALLER UNITS EXECUTING ON SEPARATE PROCESSORS.
- ADDS A LEVEL OF DEFENSE AGAINST COMMON MODE FAILURES BY INTRODUCING:
  - FUNCTIONAL DIFFERENCES
  - CODING DIFFERENCES
  - EXECUTION TIME DIFFERENCES
  - HARDWARE DIFFERENCE
- PARTITIONS MORE PROBABLE FAILURES INTO MANAGEABLE UNITS

| TRANSIENTS \ TRIPS |  | SG2<br>Lo P |  |  |  |  |  |  |  |  |  | DNBR |  | VOPT | CON |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FW temp decrease | 1 1* | 2* |  |  |  |  |  |  |  |  |  | CPC* |  | 1 |  |
| FW flow increase | 1 |  |  |  |  |  | 1 1 | 1 2 |  |  |  | CPC | CPC |  |  |
| Main steam flow<br>increase |  | 2 |  |  |  |  |  |  |  |  |  | CPC |  | 1 |  |
| IOSGADV | 1 1 | 2 |  |  |  |  |  | 1 |  |  |  | CPC |  |  |  |
| SLB I/O containment | 11 | 1 2 |  |  |  |  |  |  |  |  |  | CPC |  |  |  |
| LOL | 1 |  |  |  |  |  |  |  |  | 1,2 |  |  |  |  |  |
| TTRIP |  |  |  |  |  |  |  |  |  | 1,2 |  |  |  |  |  |
| Loss of cond vacuum | 1 |  |  |  |  |  | 1 | 1 2 |  | 1,2 |  |  |  |  |  |
| MSIV closure |  |  |  |  |  |  |  |  |  | 1,2 |  |  |  |  |  |
| Loss of non-emerg<br>AC to station aux |  |  |  |  |  |  |  |  |  |  |  | CPC |  |  |  |
| Loss of norm FW flow |  |  | 1 | 2 |  |  |  |  |  | 1,2 |  |  |  |  |  |
| Loss of RC flow |  |  |  |  |  |  |  |  |  |  |  | CPC |  |  |  |
| 1 RCP seizure |  |  |  |  |  |  |  |  |  |  |  | CPC |  |  |  |
| RCP shaft break |  |  |  |  | 1 | 2 |  |  |  |  |  | 1 |  |  |  |
| Uncont CEA withdraw<br>at low pwr |  |  |  |  |  |  |  |  |  | 1,2 |  | CPC | CPC | 1 |  |
| at power |  |  |  |  |  |  |  | 1 |  |  |  | CPC |  |  |  |
| 1 f/l CEA drop |  |  |  |  |  |  |  |  |  |  |  | 1 |  |  |  |
| s/u of inactive RCP |  |  |  |  |  |  |  |  |  |  |  | 1 |  |  |  |
| Core flow rate incr |  |  |  |  |  |  |  | 1 |  |  |  | 1 |  |  |  |
| Inadvert deboration |  |  |  |  |  |  |  |  |  | 1,2 | 2 | CPC | CPC | 1 |  |
| CEA ejection |  |  |  |  |  |  |  |  |  |  |  | 1 |  | 1 1 |  |
| CVCS malfunction |  |  |  |  |  |  |  |  |  | 1,2 |  |  |  |  |  |
| SG tube rupture |  |  |  |  |  |  |  | 1 |  |  |  | CPC |  |  | 1 |
| LOCA | 1 |  |  |  |  |  |  | 1 | 2 |  | 1 | CPC | 1 | 1 |  |

1* · BISTABLE PROCESS

#### TABLE 1

System 80+ RT Function vs Trip Processor Assignment

Document No. NPX80-1C-SD560 Rev. 00

#### DIVERSITY

- OPERATING PLANTS EXHIBIT SIGNIFICANT DIVERSITY. NOT BY DESIGN, BUT RATHER BY THE RESULT OF ANALOG TECHNOLOGY AND CONTRACTING OF NUMEROUS SUPPLIERS.
- THIS EXCESS OF DIVERSITY MAY ACTUALLY DETRACT FROM PLANT SAFETY DUE TO:
  - PERSONNEL TRAINING
  - REPAIR TIMES
  - SPARE PARTS AVAILABILITY
- NUPLEX 80+ MAXIMIZES STANDARDIZATION WHILE MAINTAINING A MINIMUM LEVEL OF DIVERSITY TO OFFER THE FINAL DEFENSE AGAINST COMMON MODE FAILURES
- DIVERSITY IS EMPLOYED WHERE SOFTWARE BASED COMPONENTS ARE UTILIZED.


#### NUPLEX 80+ DIVERSITY

- NUPLEX 80+ MAXIMIZES STANDARDIZATION WHILE MAINTAINING DIVERSITY IN KEY AREAS TO ENSURE THAT THE DEFENSE IN-DEPTH CONCEPT IS NOT COMPROMISED
- NUPLEX 80+ DIVERSITY:

| FUNCTION | DESIGN TYPE 1 | DESIGN TYPE 2 |
|---|---|---|
| REACTOR TRIP | PLANT PROTECTION SYSTEM | ALTERNATE REACTOR TRIP WITHIN PROCESS-CCS |
| FLUID SYSTEM CONTROLS | EMERGENCY SUCCESS PATHS (E.G., EMERGENCY FEEDWATER) VIA ESF-CCS | NORMAL SUCCESS PATHS (E.G., MAIN FEEDWATER) VIA PROCESS-CCS |
| REACTIVITY CONTROLS | EMERGENCY BORATION VIA ESF-CCS | NORMAL CEA CONTROL - VIA POWER CONTROL SYSTEM |
| ALARM AND INDICATION | ALARM TILES AND DISCRETE INDICATORS - VIA DIAS | CRT DISPLAYS - VIA DPS |

#### SABOTAGE PROTECTION

- CONFIGURATION CONTROL DURING DESIGN, CONSTRUCTION AND OPERATION
- GEOGRAPHIC SEPARATION OF SAFETY CHANNELS
- ROOM AND EQUIPMENT ACCESS SECURITY, ALARMS
- CONTINUOUS PROGRAM MEMORY CHECKSUM REPORTING TO THE DATA PROCESSING SYSTEM
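The last item, program memory checksum reporting, can be sketched as follows. The slide does not say which checksum is used, so CRC-32 from Python's standard `zlib` module is an assumption made for this example, and the function and field names are hypothetical.

```python
import zlib


def program_checksum(image: bytes) -> int:
    """CRC-32 of a program memory image (the algorithm is assumed here;
    the source does not name one)."""
    return zlib.crc32(image) & 0xFFFFFFFF


def checksum_report(image: bytes, reference: int) -> dict:
    """Build the record a controller might send to the data processing
    system on each reporting cycle."""
    current = program_checksum(image)
    return {"checksum": current, "matches_reference": current == reference}
```

A reference value recorded when the program is installed under configuration control lets the receiving system flag any later change to program memory, whether from a fault or from tampering.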



#### NUPLEX 80+ SEPARATION AND ISOLATION


#### EXPERIENCE

- ABB/C-E HAS BEEN DESIGNING SOFTWARE FOR SAFETY SYSTEMS SINCE MID-1970'S.
- THIS INCLUDES SOFTWARE FOR:
  - CORE PROTECTION CALCULATORS
  - SUBCOOLED MARGIN MONITOR
  - QUALIFIED SAFETY PARAMETER DISPLAY SYSTEM
  - INADEQUATE CORE COOLING MONITORING SYSTEM
- FOR CPC'S THERE HAVE BEEN 826 SOFTWARE CHANGE REQUESTS SINCE INITIAL INSTALLATION AT ANO-2
- 99% ARE FUNCTIONAL DESIGN CHANGES, NOT SOFTWARE BUGS
- THERE HAVE BEEN NO FAILURE TO TRIP ERRORS



## PRESENTATION TO THE ACRS JOINT SUBCOMMITTEES

ON

## COMPUTERS IN NUCLEAR POWER PLANT OPERATIONS

AND

INSTRUMENTATION AND CONTROL SYSTEMS

FEBRUARY 6, 1991

J. B. REID





# **OVERVIEW AND OBJECTIVES**

[Figure: Instrumentation & Computer Product Evolution. Legible labels: 1960's, 1970's, 1980's, 1990's; 7100, P-50, 7300, Plant Computer Series, Eagle Family, WDPF; Analog, Digital, Microcomputer; Control, Information Processing.]

I&C ARCHITECTURE


## Westinghouse Electric Corporation

#### OBJECTIVES OF WESTINGHOUSE I&C SYSTEM DESIGNS

USE DIGITAL TECHNOLOGY TO PROVIDE IMPROVEMENTS IN:

- COST
- SCHEDULE
- CONSTRUCTABILITY
- MAINTAINABILITY
- OPERABILITY
- FLEXIBILITY
- RELIABILITY
- LICENSEABILITY

INTEGRATE AND UNIFY THE TOTAL PLANT I&C SYSTEMS



J. B. Reid

File:obj

# **I&C** Architecture Characteristics

- Modular Design
- Digital
- High Performance where necessary
- Distributed Processing
- Data Highway and Data Link Communications
- Physically Distributable
- Hierarchical Architecture for Communication and Data Transfer
- Fiber Optic Cabling
- Fault-Tolerant Design
- Clean separation within safety equipment and between safety and non-safety equipment
- Improved Control and Protection Algorithms
- Information Presentation in Context with Navigational Aids


Westinghouse Electric Corporation

# SOFTWARE DESIGN PROCESS

J. B. REID

File:ACRS08.WK1





SWB-PPS-0003 Rev. B

Westinghouse Proprietary

#### PPS SYSTEM SPEC



Figure 5-17: Integrated Protection Cabinet Front Layout

CONFIGURATION SPECIFICATION

Page 5-91 of 5-97




- In each subsystem, the processing is distributed over multiple processors, typically one host processor and several slave processors.
- The most processing intensive I/O functions have been moved to slave processors which communicate with the host via shared memory.

# Slave Processors

- Intelligent A/D (IAD): Digital filtering of analog inputs.
- Datalink Controller (DLC): Point-to-point simplex datalinks.
- Data Highway Controller (DHC): Multipoint data highway.



[Figure: IPC symbols legend sheets (Rev. B, Westinghouse Proprietary). Legible entries include: 2 out of 4 and 2 out of 3 coincidence logic; time delay on and time delay off; bistables, with hysteresis, for which the output is true when the input variable is greater than, or less than, the setpoint; trip, bypass, veto and permissive status indicators; isolated digital inputs and computer digital outputs; a retentive memory function, which retains its output state upon the occurrence of an event causing a restart (loss of power, etc.); and a second memory function, which does not retain its output state upon such an event.]

| UNCTION HATCH AND A THE RETURNING STORE, HE INDICATED RELOW<br>& SIGNE TO R<br>RECORDED STORE RECORDER STORE RECORDER                       | ACTION ANTON<br>A STORE TO PAR<br>A STORE TO PAR                                                                                                                                                      | SIGNE CONTINUE FUNCTION MICH CONDITIS                                                                                                                                   | OUTHAIT STORM.<br>DISTRIBUTED<br>R CONFUNICATION<br>SQUARE ROOT                                                                                                                                                                                                                                                      | ISOURTING FUNCTION                       |                                                                                                                                                                  | Υ.<br>Έ                                                        | 1. PPP LOGIC IIPAGONES SOB-51-714832 REV 3 29-490-1998 | SITERII B IPC<br>SPERCE<br>Bave B 2450930 Steller<br>Have R 2450930 Steller |
|---------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------|--------------------------------------------------------|-----------------------------------------------------------------------------|
| E TO I<br>a store constrictions function which<br>E/I +-28 wh store.                                                                        | E TO 0<br>In STORM CONDITIONING FUNCTION MATCH<br>E COMMOTIS AN ELECTRICH. STORM TO AN<br>E O OPTICH, STORM                                                                                           | a 10 a                                                                                                                                                                  | TEST POINT<br>TEST POINT<br>TO ALTO TESTER OR CONTINUENTED<br>TEST INVECTION<br>TEST INVECTION<br>FOR ALTO TESTER                                                                                                                                                                                                    | E a J                                    | ALTIPOS A                                                                                                                                                        | LAPP DRIVER                                                    |                                                        |                                                                             |
| HARDALIAECO INFORMATION TRANSMISSION<br>C = = = = = = ARRA.05<br>LOGIC                                                                      | NOISSIDERAMEL NOIDAMOUNT RAALLOS                                                                                                                                                                      | LOGIC<br>HATTRE<br>MATTRE AD/06 LOGICS                                                                                                                                  | HALTIPLOGID BETHEN PROS.<br>THE NUMBER INDICATES THE PROS.<br>THE NUMBER INDICATES THE PROS.<br>CONNECTION POINT ON THE PROS.                                                                                                                                                                                        | THERE I THE PART I THE STREET            | <ul> <li>FUNCTION MATCH RECEIVES</li> <li>INFORMATION MATCH RECEIVES</li> <li>INFORMATION MATCHERED</li> <li>Received Stream</li> <li>Received Stream</li> </ul> | HALL BIPPES                                                    | ×                                                      |                                                                             |
| R TO E<br>R TO E STORM CONDITIONALING FLANCTION WATCH<br>R CONNERTS A RESISTENCE TO A VOLTINGE<br>R/F<br>STORM, INCLINES SLACE FLI TERTING. | I TO E<br>PE STORE CONDITIONING FUNCTION HATCH<br>COMMERTS A 4-28 and SIGNN, TO A<br>COMMERTS A 4-28 and SIGNN, TO A<br>TO THE SIGNAL INCLURES SIGNE<br>FILTERING, A PONER SIGNLY IS RESO<br>PROVINED | I TO E<br>SIGNAL CONDITIONING FUNCTION MATCH<br>P SIGNAL CONDITIONING FUNCTION MATCH<br>I/E CONDER'S A 4-28 MA SIGNAL LO A<br>VOLTAGE SIGNAL INCLUDES SUPPE<br>TLINEING | MALOG TO DIGITAL<br>COMMITINE MALOG INMUT MARI MARLOG<br>AD<br>TO DIGITAL COMMERSION FLACTION.<br>AD<br>FILIPENKG.<br>DIGITAL TO MALOG<br>DIGITAL TO DIGITAL VALLE<br>UNITS TO DIGITAL VALLE | LECTRICH, AND ENGINEERING UNTI' COMPRICH | UNC<br>L<br>L<br>L<br>L<br>L<br>L<br>L<br>L<br>L<br>L<br>L<br>L<br>L<br>L<br>L<br>L<br>L<br>L<br>L                                                               | ALLEREARS FUNCTION<br>BETCARE<br>BETCARE<br>ALBERTARE FUNCTION |                                                        |                                                                             |

# Software Design Constraints

"Certain general constraints are imposed on the characteristics of that software which provides an essential protection or control function."

These constraints are consistent with the 414 IPS design philosophy. The standards represent a formalization of this philosophy, along with additional thinking based on new conditions and capabilities.



- INTERRUPTS
- CONCURRENT ACCESS
- MULTIPLE PROCESSORS
- RE-ENTRANCY
- MODULARITY
- PROCEDURE STRUCTURE
- DATA BOUNDING
- APPLICATION VS. SYSTEM
- CODE VS. DATA
- HIGH LEVEL LANGUAGE

File: ACRS03.WK1



**Top-Level Software Structure** 



# IPS AND ICS APPLICATIONS SPECIFIC AND GENERALIZED SOFTWARE MODULES



SUBROUTINE CALLS

Software Module Development







File: ACRS10.WK1

J. B. REID



# V&V GUIDELINES, CODES, & STANDARDS

## ANSI/IEEE-ANS-7.4.3.2-1982

Criteria for Programmable Digital Computer Systems in Safety Systems of Nuclear Power Plants

## **IEC PUBLICATION 880**

Software for Computer Systems in the Safety Systems of Nuclear Power Stations

## **IEEE STANDARD 603**

Standard Criteria for Safety Systems of Nuclear Power Generating Stations

## **IEEE STANDARD 730**

Standard for Software Quality Assurance Plans

## **IEEE STANDARD 829**

Standard for Software Test Documentation

## **IEEE STANDARD 1012**

Standard for Software Verification and Validation Plans

2/4/91

# VERIFICATION AND VALIDATION PHILOSOPHY

## BOTTOM UP VERIFICATION TESTING APPROACH

Hardware and software modules are individually tested in depth.

## STRESSING THE DESIGN

By testing each module over its possible range of use, a higher level of assurance is achieved over any testing that could be done at the integrated system level.

## ANOMALY REPORTING

Anomaly reports provide an auditable demonstration of the completeness of the verification, and the disposition of the issues raised by the verifiers.

2/4/91

vv\_2.wkt









86702-1-10





J. B. REID

File: ACRS11.WK1

# SOFTWARE SECURITY REQUIREMENTS

- PERIODIC TESTING
- BUILT-IN DIAGNOSTICS
- EMBEDDED CHECKSUMS
- READ ONLY MEMORIES
- DOOR LOCKS
- LIMITED PHYSICAL ACCESS
- LIMITED SOFTWARE ACCESS

File ACRS04 WK

J. B. REID

# DIAGNOSTIC SOFTWARE AND HARDWARE

M. D. Bowers, J. P. Arnold, and A. W. Crew

Engineering Technology Division Westinghouse Electric Corporation Research & Development Center 1310 Beulah Road Pittsburgh, PA 15235 (412) 256-2456/2601/2539

#### Abstract


Many techniques have been developed to deal with the various issues inherent in the fault-tolerant design of critical real-time systems. Central to these techniques is a defense-in-depth philosophy, in which different layers of the design address both different and overlapping fault detection and recovery issues. The addition of microprocessor-based technology offers a new opportunity to extend the defense-in-depth philosophy for critical real-time systems, particularly in the nuclear industry. Traditionally, in protection systems for commercial nuclear applications, a complete off-line functional test of the system was performed. Now, embedded self-diagnostics can provide a continuous test of the system to speed fault identification and repair. These embedded diagnostics are an additional layer of defense which was never before possible. Since the embedded self-diagnostics run while the system is performing the critical function, it is desirable to keep them as simple as possible. An attempt to detect every possible fault would increase total system complexity and decrease response time. Thus, the embedded diagnostics have been designed to detect the more probable faults quickly. Safety is never compromised since complete functional tests are still performed on a periodic basis.

This paper describes a library of diagnostic algorithms. In addition, a dedicated diagnostics board, the Multibus Diagnostic Monitor, will be described.

#### Introduction

This paper describes a diagnostic software library which contains algorithms to test read-only memory, read/write memory, address lines, the main processor instruction set, the numeric data processor instruction set, and mutual exclusion hardware. The software library also contains algorithms to respond to unexpected software interrupts. In addition to the diagnostic software library, a dedicated subsystem diagnostics board called the Multibus Diagnostic Monitor (MDM) will also be described.

R. J. Gibson and W. D. Ghrist III

Nuclear and Advanced Technology Division Westinghouse Electric Corporation P.O. Box 598 Pittsburgh, PA 15230 (412) 733-6540/6343

#### **Diagnostic Software**

A library of diagnostic algorithms for use in real-time microprocessor-based systems has been developed. The major design goals are to detect the most common failures as quickly as possible and to detect a majority of the less common failures in a timely manner. Note: Some highly unlikely failures may still go undetected by these algorithms, but the algorithms are only the lowest layer of protection. All of our critical systems have fault tolerance (usually redundancy) at higher levels in the system so that even these unlikely failures can be detected and will not compromise safety.

The algorithms are intended to be used with Intel Corporation's 8086 family of microprocessors. This includes the 8086, 8088, 80186, 80188, and 80286 (in real mode only). The algorithms are not compatible with 80286 protected-mode operation.

The diagnostic algorithms were developed to detect the failures in several hardware devices. The following is a list of these devices; general descriptions of the algorithms used to detect corresponding failures are given on the following pages:

- Read-only memories containing software, configuration or calibration data.
- Read/write memories containing program variables and data.
- Address lines addressing read/write memories.
- Main processors.
- Numeric Data Processors.
- Hardware used to mutually exclude multiple processors from the same shared-memory resource.
- Interrupt hardware.

The uniqueness of the algorithms used to test these devices lies in their structure. The structure was dictated by the system time constraints in which the software is to run. For these systems, start-up time is relatively unrestricted, while run-time processing consists of time-restricted cycles. The cycles usually contain a large amount of application-specific processing, with the remaining time being used for diagnostic testing. For this reason, the algorithms were developed such that all of a particular resource could be tested at once (such as during system start-up) or incrementally (during run-time cycles). In the case of the memory and address line diagnostic algorithms, configuration tables are used to define the locations of the regions of memory, or combinations of address lines, to be tested. Each incremental algorithm tests a subset of the regions or combinations. Similarly, for the instruction set diagnostic tests, the incremental algorithms test a subset of the instruction sets. In all cases, the size of the subset to test is an argument to the incremental algorithm; this value could be determined by the amount of time left in a current run-time cycle. The complete start-up diagnostic tests are implemented by calling the incremental algorithms repetitively until all of the corresponding resource is tested.

© The Institute of Electrical and Electronics Engineers, Inc., 1988

Unlike the incremental algorithms, the algorithms used to detect failures of mutual exclusion hardware and interrupt hardware do not test "blocks" (regions, combinations, or sets) of their corresponding resources. Instead, the mutual exclusion algorithm is designed to be implemented once a cycle for every memory resource shared by a processor. For the case of unexpected software interrupts, interrupt handlers are installed during system start-up; run-time diagnostic processing is not necessary.


For all the diagnostic algorithms, failures are reported in the same way. Upon the detection of a failure, diagnostic information describing the kind of failure and the location of the failure is immediately copied to dedicated memory locations. These locations usually exist in the shared memory of another processor; this second, non-failing processor has the responsibility of propagating the failure report outside the subsystem or saving the information in non-volatile memory for examination at a later time. In the case where the information is reported to a non-failing processor which propagates the failure report outside the subsystem, the identification of the subsystem and the failed processor is included in the report. After failure information has been reported by the failing processor, it enters its shut-down or halted state. Non-maskable interrupts can bring the processor out of this state (which is not desirable); consequently, the non-maskable interrupt must be handled by a failure reporting routine that subsequently re-enters the halted state.

#### Read-only Memory Diagnostic Algorithms

Read-only memory devices are tested by summing all the memory words (a word is two bytes or 16 bits) in a particular region; when all the words have been summed, the final sum is verified against an expected checksum previously computed by an independent system. Configuration of this algorithm is achieved through a table that describes the regions to be tested and the locations of the previously computed checksums. The incremental algorithm sums and verifies against the expected checksum when appropriate.

More exactly, the incremental algorithm sums the number of words requested by its argument and saves the intermediate sum until, after successive increments are summed, the end of a region is reached. When the end of a region is reached, the sum is verified at that point, and the intermediate sum is then cleared. The next words to sum are words in the next region specified by configuration. Figure 1 presents an example for the incremental algorithm. In this figure, Num\_Words\_to\_Sum represents the argument to the algorithm, that is, the size of the subset to test.
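The incremental summing described above can be sketched as follows. This is a minimal illustration in Python, not the library's code: the names `RomRegion`, `RomChecker`, and `startup_rom_test` are hypothetical, the ROM contents are modeled as lists of 16-bit integers, and the sum is assumed to wrap at 16 bits (the paper does not state the width of the checksum). Where the paper reports the failure and halts the processor, the sketch only records the index of the failing region.

```python
from dataclasses import dataclass, field

WORD_MASK = 0xFFFF  # assumption: the running sum wraps at 16 bits


@dataclass
class RomRegion:
    words: list        # stand-in for the ROM words of one configured region
    expected_sum: int  # checksum computed beforehand by an independent system


@dataclass
class RomChecker:
    regions: list      # the configuration table (non-empty regions assumed)
    region_index: int = 0
    word_index: int = 0
    partial_sum: int = 0
    failures: list = field(default_factory=list)

    def step(self, num_words_to_sum):
        """Sum the next num_words_to_sum words.

        Returns True if the last configured region was completed during
        this call, i.e. a full pass over the table has just finished.
        """
        finished_pass = False
        for _ in range(num_words_to_sum):
            region = self.regions[self.region_index]
            self.partial_sum = (self.partial_sum + region.words[self.word_index]) & WORD_MASK
            self.word_index += 1
            if self.word_index == len(region.words):
                # End of region: verify, then clear the intermediate sum.
                if self.partial_sum != region.expected_sum:
                    self.failures.append(self.region_index)
                self.partial_sum = 0
                self.word_index = 0
                self.region_index = (self.region_index + 1) % len(self.regions)
                if self.region_index == 0:
                    finished_pass = True
        return finished_pass


def startup_rom_test(checker, chunk=256):
    """Complete start-up test: call the incremental step until a pass ends."""
    while not checker.step(chunk):
        pass
    return not checker.failures
```

During run-time cycles, `step` would be called once per cycle with an argument chosen from the time remaining in that cycle; at start-up, `startup_rom_test` simply repeats the same call until every region has been verified.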



#### Read/Write Memory Diagnostic Algorithms

There are two basic types of read/write memory failures: 1) The failure of a memory cell, data line, or address line that is stuck in a particular state (always 1, or always 0); or 2) The failure of a memory cell, data line, or address line that is in a particular state because it is "coupled" to one or more other memory cells or data lines. Coupling faults usually occur between physically-adjacent cells or lines.

While a test for stuck memory cells or data lines needs only to write and read both a state and its complement to that cell or line, coupling faults can only be detected by writing all combinations of bit patterns to all combinations of groups of memory cells or lines and then checking all the other cells or lines for interference. One way to reduce the complexity of full coupling fault tests would be to use groups of memory cells or data lines that are physically adjacent to a cell or data line to be tested.

However, RAM chips can have physically different architectures yet have identical pin outs. In time, changes in technology mean even more different physical architectures for replacement chips. Therefore, no consistent physical architecture can be assumed throughout a typical RAM-based system. Physically adjacent or "neighboring" cells cannot be easily identified, and no assumptions can be made that might reduce the number of coupling faults to faults that only occur between neighboring cells. Therefore, tests for coupling faults in memory cell arrays must check for interference between a cell and all other cells. This kind of diagnostic function is intended for chip fabrication time diagnostics and is not suitable or practical for run-time system diagnostics.

On the other hand, data line and address line coupling faults can be detected much more practically. The number of data and address lines is small compared to the number of memory cells and the number of possible interference patterns is reduced.

The read/write memory diagnostic algorithms described in this section detect memory cells (bits) and memory data lines that are stuck in a particular state. They also detect coupling failures between memory data lines (coupling failures occur when the state of one or more memory cells or data lines affects other cells or lines). Although the read/write diagnostic algorithms modify memory locations, they are considered nondestructive tests because the algorithm returns the locations to their original values upon completion of the test. Configuration of these algorithms is achieved through use of a table that describes the regions of memory to be tested. The algorithm used to incrementally test read/write memory tests a subset of one or more of these regions.

A data line coupling fault test needs only to compare written and read values (data line patterns) at any address. This test has been combined with the test for stuck memory cells by using the data line patterns (and their complements) as bit patterns for detecting the stuck cells. (The test would therefore test all the bits in a memory word instead of only testing one bit as was needed to accomplish a test of a memory cell.)

At the heart of the read/write diagnostic algorithm is a low-level algorithm used for testing two adjacent words or two adjacent bytes at a time. For the purposes of this paper, an algorithm that tests two words at a time is described. An algorithm for testing two adjacent bytes can be derived from a simple extension of the algorithm described. The use of either one of these algorithms depends upon the width (word-wide or byte-wide) of the data bus and the memory devices.

If interrupts are enabled, they are locked out for critical portions of the test. The test cannot be run on RAM which is subject to DMA, dual-port access, or access in response to non-maskable interrupts.

Provisions are made after each memory write to prevent inaccurate test results due to storage of the test patterns on the stray capacitance of data lines. Therefore, after every write of a test pattern and before the verifying read, the complement of the pattern is written to or read from a different location. This resets the data lines so that the verifying read will not provide misleading results. The most convenient way to satisfy this requirement is to use the next word as the different location. By writing a test pattern to the first of two words to test, writing the complement of the pattern to the second word, reading the first word to verify, and finally reading the second word to verify, these requirements are satisfied. At the same time, the two adjacent words have been tested.
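The write, complement-write, read, read sequence can be illustrated with a small Python sketch. It is only a model under stated assumptions: RAM is represented by a list of 16-bit integers, `check_two_words` and `StuckBitMemory` are hypothetical names, and the second pass with the patterns swapped, as well as the save and restore of the original contents, follow the steps listed in Figure 2. Interrupt lock-out and the report-and-halt step are omitted.

```python
def check_two_words(memory, addr, pattern):
    """Test memory[addr] and memory[addr + 1]; return True if both words pass.

    memory is any list-like object of 16-bit values standing in for RAM.
    """
    complement = pattern ^ 0xFFFF
    saved = (memory[addr], memory[addr + 1])  # keep the previous contents
    ok = True
    # Pass 1 uses (pattern, complement); pass 2 swaps them, as in Figure 2.
    for first, second in ((pattern, complement), (complement, pattern)):
        memory[addr] = first
        memory[addr + 1] = second      # second write resets the data lines
        read_first = memory[addr]      # first read resets them again
        read_second = memory[addr + 1]
        if (read_first, read_second) != (first, second):
            ok = False
            break
    # Restore even after a failure, in case the words held vital data.
    memory[addr], memory[addr + 1] = saved
    return ok


class StuckBitMemory(list):
    """Hypothetical fault model: data bit 2 is stuck at one on every write."""

    def __setitem__(self, index, value):
        super().__setitem__(index, value | 0x0004)
```

With a healthy list, `check_two_words([0x1234, 0xABCD], 0, 0x5555)` passes and leaves the contents unchanged; with `StuckBitMemory([0, 0])` the complement `AAAA` hexadecimal cannot be stored and the check fails.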

In order to test for possible coupling faults, different test patterns must be used on the two memory locations. A series of test patterns that contains all 64K ($2^{16}$ for 16 data lines) different word values must be used to detect all 64K possible coupling faults. However, the test patterns that detect more probable coupling faults should be used more often than test patterns that detect less probable coupling faults. The most probable coupling faults occur between physically adjacent data lines. These lines are assumed to be logically adjacent. The most probable coupling faults are detected by using a word test pattern consisting of alternating bits (for example, "5555" hexadecimal).

Thus, the series of test patterns used by the read/write diagnostic algorithm is such that, as the memory regions specified in the configuration tables are tested over and over again and the series of patterns is repeated over and over again, every word in all the regions is tested with every word test pattern in the series. To accomplish this, the total number of words to test in all the regions should not be a multiple of the number of test patterns in the series. For this purpose, the series of test patterns consists of all 64K possible patterns with the "5555" hexadecimal pattern used every other time, with the exception of when the series repeats. After the "FFFF" hexadecimal pattern is used, a test pattern of "0000" hexadecimal is used without using "5555" in between. The test series consists of "0000", "5555", "0001", "5555", "0002", "5555", ..., "FFFF", "0000", "5555", "0001", "5555", "0002", "5555", ..., and "FFFF". This makes the total length of the series of test patterns equal to $2 \times 2^{16} - 1$, or 131,071 (a prime number). Each test pattern is used to test two words. Since the length of the series is a prime number, the probability that the total size of the memory regions to be tested is a multiple of the length of the test pattern series is small.

Before any word memory locations are tested, the previous values of those locations are saved in microprocessor registers and then restored after the test of those locations is complete. The restoration is attempted even in the event that a failure is detected, in case those locations held vital information needed to report the failure.

Figure 2 presents a sample algorithm that satisfies the requirements described above. In this example, a local variable, Test\_Pattern, is used to specify the word value to use to test the words. The sample algorithm assumes that the number of words tested is even. Also, the sample algorithm uses the following intermediate variables to maintain which test pattern is to be used next.

- Pattern\_Counter: A word value that is incremented with each use and generates all possible word test patterns. (Assume, for the example, that this variable has been initialized to zero.)
- Used\_Alternating\_Bits\_Last: A Boolean flag that indicates "true" if the alternating bit pattern was used as the last test pattern. The flag indicates "false" if the last test pattern was generated from the Pattern\_Counter. (Assume, for the example, that this variable has been initialized to false.)
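The way these two variables produce the test series can be sketched with a short Python generator. The function names are hypothetical; the selection logic mirrors the IF/ELSE at the top of Figure 2, with Pattern\_Counter assumed to wrap from "FFFF" back to "0000" as a 16-bit word.

```python
from itertools import islice

ALTERNATING_BITS = 0x5555


def pattern_series():
    """Yield test patterns indefinitely, using the Figure 2 selection rule."""
    pattern_counter = 0
    used_alternating_bits_last = False
    while True:
        if used_alternating_bits_last or pattern_counter == 0:
            yield pattern_counter
            used_alternating_bits_last = False
            # Assumption: the counter is a 16-bit word and wraps to zero.
            pattern_counter = (pattern_counter + 1) & 0xFFFF
        else:
            yield ALTERNATING_BITS
            used_alternating_bits_last = True


def first_patterns(count):
    """Return the first count patterns of the series as a list."""
    return list(islice(pattern_series(), count))
```

Taking the first seven patterns gives 0000, 5555, 0001, 5555, 0002, 5555, 0003 (hexadecimal), and the series repeats after 131,071 patterns because "5555" is skipped once, between "FFFF" and "0000".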

#### Address Line Diagnostic Algorithms

The address line diagnostic algorithms detect both failures of address lines that are stuck in a particular state and coupling faults between address lines. Coupling faults between address lines are assumed to be modeled by shorts between physical address lines. As described below, configuration for these algorithms describes the combinations of address lines that can be tested.

In order to detect a failure in an address line, two addresses must first be generated that model the failure: A "failure model" address where the suspect address line is in the state that would model the failure, and a "base line" address where all the lines are the same as in the "failure model" address except for the suspect address line; the suspect address line in the "base line" address must be in the correct state - the inverse of the modeled failure state. For example, if the function was to detect a "stuck at one" failure of the fifth address line, the binary representation of the two addresses would be

Failure Model:

(MSB) XXXX XXXX XXXX XXX1 XXXX (LSB)





IF Used\_Alternating\_Bits\_Last, OR IF Pattern\_Counter is zero, THEN

SET Test\_Pattern equal to Pattern\_Counter.

SET Used\_Alternating\_Bits\_Last to false.

INCREMENT Pattern\_Counter by one (for its next use).

ELSE

SET Test\_Pattern equal to "5555" hexadecimal.

SET Used\_Alternating\_Bits\_Last to true.

END IF

COMPUTE the addresses of the next two words to test.

Temporarily SAVE the previous values of the two words to test.

WRITE Test\_Pattern at the address of the first of the two words to test.

WRITE the complement of Test\_Pattern at the address of the second word. (The second write clears the data lines after the first write.)

READ the values in the order written. (The first read clears the data lines after the second write.)

VERIFY that the respective patterns were stored correctly.

IF a failure occurred, THEN

Immediately RESTORE the previous values of the tested words, REPORT the failure, and HALT the processor.

END IF

WRITE the complement of Test\_Pattern at the address of the first word.

WRITE Test\_Pattern at the address of the second word. (Again, the second write clears the data lines after the first write.)

READ the values in the order written. (The first read clears the data lines after the second write.)

VERIFY that the respective patterns were stored correctly.

IF a failure occurred, THEN

Immediately RESTORE the previous values of the tested words, REPORT the failure, and HALT the processor.

END IF

RESTORE the previous values of the tested words.

Figure 2: Example Lower Level Read/Write Memory Test

Base Line:

(MSB) XXXX XXXX XXXX XXX0 XXXX (LSB)

where "X" represents the states of the other address lines. The states of the other lines are inconsequential to the test for detecting address lines that are stuck, but, as described below, they can be used to also detect lines that are shorted together.

In order to detect an address line that is in a particular failure state because it is shorted to one of the other address lines that can be tested in an address region, the "failure model" address must consist of all the address lines that can be tested set to the same state - the particular failure state. As before, all the address lines in the "base line" address must be the same as in the "failure model" address except for the suspect address line; the suspect address line in the "base line" address must be in the correct state - the inverse of the particular failure state. As a second example, if the lower 14 lines can be tested and if the function was to detect whether the fifth address line was in its one state because it is shorted to one of the other testable address lines, the binary representation of the two addresses would be:

Failure Model:

(MSB) XXXX XX11 1111 1111 1111 (LSB)

Base Line:

(MSB) XXXX XX11 1111 1110 1111 (LSB)

where "X" represents the states of the address lines that cannot be tested. The untested lines select the address line region itself as opposed to any other address line region of the same size. Notice that these two generated addresses detect whether the fifth address line is "stuck at one" or shorted to one of the other testable address lines.

Similarly, if the function were to detect whether the fifth address line was "stuck at zero", or shorted to another testable address line, the binary representation of the two generated addresses would be:

Failure Model:

(MSB) XXXX XX00 0000 0000 0000 (LSB)

Base Line:

(MSB) XXXX XX00 0000 0001 0000 (LSB)

To conclude this discussion: in order to test each address line that can be tested, two sets of "failure model" and "base line" addresses are generated for each line. One set is used to detect whether the line is "stuck at one" or shorted to another testable address line, and the other set is used to detect whether the line is "stuck at zero" or shorted to another testable address line.
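The two address pairs for a given line can be computed with simple bit operations. The Python sketch below is an illustration only; `address_pairs` and its parameter names are hypothetical, lines are numbered from zero (so the "fifth address line" is line 4), and the untested "X" bits are taken from the starting address of the region.

```python
def address_pairs(region_base, num_lines, line):
    """Return ((failure_model, base_line) for stuck-at-one,
               (failure_model, base_line) for stuck-at-zero).

    region_base: first address of the region; its low num_lines bits are zero
    num_lines:   number of testable address lines (lines 0 .. num_lines - 1)
    line:        zero-based number of the suspect address line
    """
    testable_mask = (1 << num_lines) - 1
    if region_base & testable_mask:
        raise ValueError("region_base must have all testable lines clear")
    if not 0 <= line < num_lines:
        raise ValueError("line is outside the testable range")
    suspect = 1 << line
    # Stuck at one / shorted to one: every testable line high in the
    # failure model; only the suspect line low in the base line address.
    all_high = region_base | testable_mask
    stuck_at_one = (all_high, all_high & ~suspect)
    # Stuck at zero / shorted to zero: every testable line low in the
    # failure model; only the suspect line high in the base line address.
    stuck_at_zero = (region_base, region_base | suspect)
    return stuck_at_one, stuck_at_zero
```

For the examples above (14 testable lines, fifth line suspect) and an assumed region starting at 20000H, the call `address_pairs(0x20000, 14, 4)` yields the pairs (23FFFH, 23FEFH) and (20000H, 20010H).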

After the "failure model" and "base line" addresses have been generated, the failure is detected by first writing a pattern at the "base line" address and then checking to see if that pattern was written at the "failure model" address instead. In order to detect whether the test pattern was written at the "failure model" address, the previous value at the "failure model" address must be different than the pattern written at the "base line" address. This is accomplished by first writing the complement of the test pattern to the "failure model" address.

Before any test is performed, the previous values of both locations are saved in microprocessor registers and then restored after the test is complete. The restoration is attempted even in the event that a failure is detected - in case those locations held vital information needed to report the failure.
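The write-and-check step can be exercised against a simulated fault. In the Python sketch below, `FaultyBus` is a hypothetical model of a byte-wide memory in which one address line may be forced to a fixed state, and `line_pair_ok` performs the save, write, check, and restore steps for one pair of addresses, using the "55" and "AA" hexadecimal patterns of Figure 3. Reporting and halting are left out.

```python
class FaultyBus:
    """Byte memory whose address line `line` may be stuck at `state`.

    This is a hypothetical fault model for illustration; with line=None
    the bus behaves like healthy memory.
    """

    def __init__(self, size, line=None, state=0):
        self.cells = bytearray(size)
        self.line = line
        self.state = state

    def _decode(self, addr):
        if self.line is None:
            return addr
        bit = 1 << self.line
        return (addr | bit) if self.state else (addr & ~bit)

    def read(self, addr):
        return self.cells[self._decode(addr)]

    def write(self, addr, value):
        self.cells[self._decode(addr)] = value


def line_pair_ok(bus, failure_model, base_line):
    """Return True if writing at base_line did not land on failure_model."""
    saved = (bus.read(failure_model), bus.read(base_line))
    bus.write(failure_model, 0x55)   # complement goes in first
    bus.write(base_line, 0xAA)       # test pattern at the base line address
    ok = bus.read(failure_model) == 0x55
    # Restore the previous contents whether or not a failure was seen.
    bus.write(failure_model, saved[0])
    bus.write(base_line, saved[1])
    return ok
```

On a healthy bus both address pairs pass. With line 4 stuck at one, the write to the base line address 3FEFH actually lands on 3FFFH, so the "55" written at the failure model address is overwritten and the check fails.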

Figure 3 presents an example algorithm that might be used to satisfy the requirements of the address line diagnostic algorithm. In this example, a local variable Line\_To\_Test is used to specify the number of the address line to test.

Configuration for these algorithms is achieved through the use of a table that describes the allowed combinations of address lines to be tested. In order to ensure that address lines are tested along their entire physical path, address line combinations that address each physical read/write memory device should be configured.

In order to describe the set of allowed combinations of address lines that can be tested, the term "address line region" is introduced. An address line region is a region of memory that is addressed by all combinations of a particular set of address lines. The smallest address line region would be a region of memory locations that are addressed by all combinations of one address line - the least significant address line. This address line region would consist of two contiguous bytes.

| GENERATE the "failure model" and "base line" addresses needed to detect if Line\_To\_Test is stuck at one or shorted to one. |
|---|
| GENERATE the "failure model" and "base line" addresses needed to detect if Line\_To\_Test is stuck at zero or shorted to zero. |
| WITH each set of "failure model" and "base line" addresses, DO the following loop: |
| Temporarily SAVE the previous values at the two addresses. |
| WRITE "55" hexadecimal at the "failure model" address. |
| WRITE "AA" hexadecimal at the "base line" address. |
| READ the value at the "failure model" address and CHECK to ensure that "55" hexadecimal is still there. |
| IF a failure occurred, THEN |
| Immediately RESTORE the previous values at the tested addresses, REPORT the failure, and HALT the processor. |
| END IF |
| RESTORE the previous values at the tested addresses. |
| END of the DO loop. |

Figure 3: Example Address Line Test Algorithm

The Intel 8086 family of microprocessors (including the 80286 operating in real mode) can address, at most, 1M byte of memory. To address all possible locations, 20 address lines are used. Thus, the largest address line region is the total address space of the processor. If n is the number of address lines to be tested (where n is an integer greater than zero and no greater than twenty), then the address line region consists of all the memory locations addressed by line 0 through line n - 1. Lines n and above select the address line region itself as opposed to other address line regions of the same size. For example: Memory addresses 20000H through 200FFH comprise an address line region of eight address lines. Memory addresses 20100H through 201FFH comprise another, separate address line region of eight address lines. Memory addresses 20000H through 207FFH comprise an address line region of 11 address lines. This 11-line region happens to contain both of the eight-line regions as subsets. Memory addresses 20800H through 20FFFH comprise a different 11-line region, one which does not contain the previously described eight-line regions.
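The region boundaries in these examples follow directly from masking off the low n address bits. A minimal Python sketch, in which the function name `region_of` is hypothetical:

```python
def region_of(address, num_lines):
    """Return (first, last) address of the num_lines-line region holding address."""
    if not 1 <= num_lines <= 20:
        raise ValueError("the 8086 family in real mode has 20 address lines")
    size = 1 << num_lines           # 2**n locations per region
    first = address & ~(size - 1)   # lines n and above select the region
    return first, first + size - 1
```

For instance, `region_of(0x20150, 8)` gives (20100H, 201FFH), while `region_of(0x20150, 11)` gives (20000H, 207FFH), the 11-line region that contains both eight-line regions from the text.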

The configuration table for the address line diagnostic algorithm specifies a beginning address of an address line region and the number of lines that can be tested for that region (n). For each physical bank of RAM, an address line region corresponding to that bank should be specified, if possible, in the address line configuration table.<sup>1</sup> This allows the address lines to be tested along their entire routes. For example, suppose the total RAM space of a processor was 64K bytes, and suppose the 64K bytes were divided into two 32K-byte physical banks of RAM. This is illustrated in Figure 4. The memory locations in these banks are selected by address lines 0 through 14 ($2^{15}$ = 32,768 = 32K). In order to test these address lines in each of the 32K-byte banks of RAM, an address line region corresponding to each bank would have to be specified in the configuration table. Address line 15 selects one of the banks as opposed to the other. In order to test line 15, an address line region corresponding to the whole 64K-byte bank of RAM must also be specified in the address line configuration table. Thus, the configuration table would contain three entries. If the 32K-byte banks were physically separated further into smaller banks, the configuration table would contain entries that corresponded to those smaller banks.

<sup>1</sup> Some of the regions that correspond to physical banks of RAM cannot be tested in their entirety because of shared-memory or DMA communication restrictions.

If only the larger 64K-byte bank of RAM were specified in the configuration table, then tests of combinations of address lines forming low addresses (address line 15 clear) would test the lines along their routes into Bank 0. Tests of combinations of lines forming high addresses (address line 15 set) would test the lines along their routes into Bank 1. Therefore, all combinations of lines 0 through 14 with line 15 clear would not be tested along their routes into Bank 1. Similarly, all combinations of lines 0 through 14 with line 15 set would not be tested along their routes into Bank 0.
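The three-entry table for the 64K-byte example can be written out as (beginning address, number of testable lines) pairs. The Python sketch below is a hypothetical helper, not the library's table format; it assumes equally sized banks laid out contiguously from `ram_base`, which the paper does not specify.

```python
def address_line_config(ram_base, bank_lines, select_lines):
    """Build (beginning_address, num_lines) entries for banked RAM.

    bank_lines:   address lines decoded inside each bank (15 for 32K bytes)
    select_lines: address lines that choose between banks (1 for two banks)
    """
    bank_size = 1 << bank_lines
    # One region per physical bank, so the low lines are exercised along
    # their routes into every bank.
    entries = [(ram_base + i * bank_size, bank_lines)
               for i in range(1 << select_lines)]
    # One region spanning all banks, so the bank-select lines are tested too.
    entries.append((ram_base, bank_lines + select_lines))
    return entries
```

Assuming the RAM starts at address 0, `address_line_config(0x00000, 15, 1)` returns entries for Bank 0 (0000H, 15 lines), Bank 1 (8000H, 15 lines), and the whole 64K-byte space (0000H, 16 lines).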



#### Main Processor Instruction Set Diagnostic Algorithms

The main processor instruction set diagnostic algorithms detect failures in the 8086 and 8088 class of microprocessors, in addition to the 80188, 80186, and 80286 (in real mode only) processors. Instruction set failures are detected by checking for correct results after executing a general set of microprocessor commands. The general set of commands is divided into subsets of commands. The incremental main processor diagnostic algorithm tests one or more of these subsets, depending on the argument to the algorithm.

Since microprocessor addressing failures are usually independent of data failures, the testing of every command with every addressing mode is not necessary. Addressing mode failures are detected by testing the addressing modes with a subset of the general set of commands. The general set of microprocessor commands, registers, and addressing modes tested is as follows. The commands are listed merely in similar groups; the groups are not required subsets.

- microprocessor registers,
- shifts, rotates, and logical operators,
- conditional jumps,
- stack operations,
- signed and unsigned byte-wide integer multiplications,
- signed and unsigned word-wide integer multiplications,
- signed and unsigned word-by-byte and double word-by-word divisions,
- decimal and ASCII adjusts,
- conversion of bytes to words and words to double words,
- repeat operations and the byte and word string scans, loads, stores, compares, and moves,
- the no operation command,
- indirect addressing with no displacement, and with 8- and 16-bit displacement, and
- segment override addressing.

The microprocessor commands that are not tested by this library of diagnostic routines are listed below. If these commands are implemented, they must be tested separately.

- software generated interrupts,
- translation, escape, and halt commands.
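The idea of an incremental diagnostic that runs one or more command subsets per call, selected by an argument, can be sketched as below. This is only an illustration of the dispatch structure: on a real target each check would execute the actual processor instructions, whereas here Python arithmetic stands in for them. The subset names, the operands, and `run_incremental` are all hypothetical.

```python
MASK16 = 0xFFFF


def rol16(value: int, count: int) -> int:
    """Rotate a 16-bit word left (a stand-in for the processor's rotate)."""
    count %= 16
    return ((value << count) | (value >> (16 - count))) & MASK16


def check_shifts_and_rotates() -> bool:
    # Compare computed results against known-good constants.
    return rol16(0x8001, 1) == 0x0003 and (0x00F0 >> 4) == 0x000F


def check_word_multiplication() -> bool:
    # Unsigned word-wide multiply: 32-bit product split into high and low words.
    product = 0x1234 * 0x0100
    return (product >> 16, product & MASK16) == (0x0012, 0x3400)


def check_word_by_byte_division() -> bool:
    # Unsigned word-by-byte divide: 8-bit quotient and 8-bit remainder.
    return divmod(0x1234, 0x40) == (0x48, 0x34)


# Hypothetical grouping; the text notes that the groups are not required subsets.
SUBSETS = [
    check_shifts_and_rotates,
    check_word_multiplication,
    check_word_by_byte_division,
]


def run_incremental(first_subset: int, count: int = 1) -> list:
    """Run `count` subsets starting at `first_subset`; return names of failures."""
    selected = SUBSETS[first_subset:first_subset + count]
    return [check.__name__ for check in selected if not check()]


failures = run_incremental(0, count=len(SUBSETS))
print(failures or "no failures")
```

Splitting the work this way lets a background task spread the full instruction set test over many execution cycles instead of running it all at once.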

#### Numeric Data Processor Instruction Set Diagnostic Algorithms

The numeric data processor (NDP) diagnostics detect failures in the 8087 class of numeric data processors. Numeric data processor failures are detected by checking for correct results after executing a general set of numeric data co-processor functions. The general set of functions is divided into subsets of functions. The incremental numeric data co-processor diagnostic algorithm tests one or more of these subsets, depending on the argument to the algorithm.

The general set of numeric data processor functions that are tested is as follows. The functions are listed merely in similar groups; the groups are not required subsets.

- real addition and subtraction,
- real multiplication and division,
- the square root utility function,
- conversion of real numbers to integers and integers to real numbers,
- real comparisons,
- the masked response to an exception not configured to interrupt the CPU.

The general set of functions tested in the numeric data co-processor instruction set diagnostic algorithms is general for the 8087 class of co-processors. If commands outside this set are used, those commands and processors must be tested separately by the application that uses them.

#### Mutual Exclusion Hardware Diagnostic Algorithms

In multi-processor environments, information is typically exchanged between processors through regions of shared RAM. (Two or more processors may have read or write access to the same memory location.) A processor must therefore be prevented from reading a block of memory locations that is currently being modified by another processor.

A system bus lock is a hardware mechanism used on the most elementary level to allow no other processor access to a block of memory while one processor is modifying a value in that block of memory (mutual exclusion). This hardware mechanism can be used in combination with a software mechanism (such as a semaphore) where arbitration between multiple processors using the system bus is desired. System bus lock failures are detected by testing a software semaphore lock mechanism (which uses the hardware system bus lock mechanism) and its ability to successfully manage a "test" location in shared memory. The ability of the system bus lock used for all semaphores for the particular shared-memory resource is verified by testing this semaphore. Therefore, only one test semaphore is needed per shared-memory resource. However, the test semaphore does not need to be dedicated to diagnostics; instead, only one of the memory locations managed by the semaphore is dedicated to diagnostics. Furthermore, each processor that uses the system bus lock hardware mechanism as part of its software semaphore mechanism should conduct the same test of the same test semaphore and test memory location. Each processor writes a unique (for that processor) value to the test location.

A system bus lock fails in one of two ways: a processor is not allowed access in an acceptable amount of time, or more than one processor is allowed to simultaneously access a region. Semaphore lock failures most likely indicate failures in the hardware used to implement the non-interruptible read-modify-write instruction on which the lock depends.

Figure 5 presents a sample algorithm that is used to satisfy the requirements of the mutual exclusion diagnostic algorithm. The amount of time necessary to wait after writing the unique value to the test location depends on the asynchronous properties of the processors sharing the memory resource. If the processors are truly asynchronous, practice has shown that failures are detected even when this time is minimal.

    REPEAT attempting to acquire the semaphore until either an unacceptable
           amount of time has expired, or the semaphore is acquired.
    IF an unacceptable amount of time has expired, THEN
        REPORT the failure.
        EXIT the algorithm.
    END IF
    IF the semaphore became acquired, THEN
        WRITE a unique test pattern to a test location managed by the semaphore.
        WAIT some amount of time.
        READ back that value to determine if another processor was erroneously
             allowed to access the location and corrupted it with its own
             unique value.
        IF the value read was not the value written, THEN
            REPORT the failure.
            EXIT the algorithm.
        END IF
    END IF
    REPORT no failure.
    EXIT the algorithm.

Figure 5. Example Mutual Exclusion Diagnostic Algorithm
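The algorithm of Figure 5 can be approximated in a hosted environment as follows. This is a rough sketch under stated assumptions: threads stand in for processors, a `threading.Lock` stands in for the semaphore built on the system bus lock, and a dictionary stands in for shared memory. The timeout and wait values are arbitrary, and the explicit release of the semaphore at the end of the test is an assumption, since Figure 5 does not show that step.

```python
import threading
import time


def mutual_exclusion_test(lock, shared, unique_value, timeout=1.0, wait=0.01):
    """Return None when no failure is detected, else a failure description."""
    # REPEAT attempting to acquire until acquired or the timeout expires.
    if not lock.acquire(timeout=timeout):
        return "access not granted in an acceptable amount of time"
    try:
        shared["test_location"] = unique_value   # WRITE the unique pattern
        time.sleep(wait)                         # WAIT some amount of time
        if shared["test_location"] != unique_value:   # READ back and compare
            return "test location corrupted by another processor"
    finally:
        lock.release()   # assumed step: release the semaphore after the test
    return None


def run_processors(count=4):
    """Run the same test from `count` simulated processors."""
    lock = threading.Lock()
    shared = {"test_location": 0}
    results = ["not run"] * count

    def worker(index):
        # Each simulated processor writes a value unique to itself.
        results[index] = mutual_exclusion_test(lock, shared, index + 1)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


results = run_processors()
print(results)
```

With a working lock, every simulated processor reports no failure. Replacing the lock with one that never excludes other callers would let a second writer overwrite the test location during the wait, which the read-back step is intended to catch.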

#### Unexpected Software Interrupt Diagnostic Algorithms

For environments that do not allow interrupts, the occurrence of an interrupt indicates a fatal failure. For environments that do allow interrupts, the occurrence of an interrupt that is not being used also indicates a fatal failure. The functions described in this section provide a default interrupt service for all interrupts. The algorithm used during system start-up time stores the address of the default servicing function in memory locations appropriate for each interrupt vector. Since these algorithms run on the 8086 class of microprocessors (including the 80286 in real mode only), there are 256 possible different interrupts. The 4-byte interrupt vectors are stored in the first 4 x 256 = 1024 bytes of memory, with the vector for interrupt zero stored at location zero and each successive vector stored at contiguously higher locations.



If a service routine for a specific interrupt is needed, the address of that routine is copied over the address of the default service routine for that interrupt vector.

The default interrupt service routine determines which interrupt occurred, reports the interrupt failure, and halts the CPU.
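The vector table layout described above can be illustrated with a short sketch. It models the first 1024 bytes of memory as a byte array and uses the 8086 real-mode vector format (a 16-bit offset followed by a 16-bit segment, both little-endian). The handler addresses and the function names are hypothetical, and the sketch does not attempt to show how the default routine identifies which interrupt occurred.

```python
import struct

VECTOR_COUNT = 256
VECTOR_SIZE = 4   # 2-byte offset followed by 2-byte segment, little-endian


def vector_address(interrupt: int) -> int:
    """Memory location of the vector for a given interrupt number."""
    return interrupt * VECTOR_SIZE


def install_vector(memory: bytearray, interrupt: int, segment: int, offset: int) -> None:
    """Store a segment:offset handler address in the interrupt's vector."""
    struct.pack_into("<HH", memory, vector_address(interrupt), offset, segment)


def install_defaults(memory: bytearray, segment: int, offset: int) -> None:
    """Point every vector at the default service routine (start-up step)."""
    for interrupt in range(VECTOR_COUNT):
        install_vector(memory, interrupt, segment, offset)


memory = bytearray(VECTOR_COUNT * VECTOR_SIZE)   # 4 x 256 = 1024 bytes

# Hypothetical address of the default service routine.
install_defaults(memory, segment=0xF000, offset=0x0100)

# Hypothetical specific routine copied over the default for interrupt 8.
install_vector(memory, 0x08, segment=0x1000, offset=0x0020)

print(len(memory), vector_address(0x08))
```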

#### Diagnostic Support Hardware

Westinghouse uses commercially-available microcomputer boards (IEEE-796 compatible) in many of its real-time systems. Each subsystem consists of several boards in a single chassis. The chassis provides physical support, power, and a status panel. Each of the boards serves one of three functions:

- Host Processors - The host processors are the microcomputer boards which actually perform the application function. The processors may share the function or they may be arranged in a redundant configuration.
- Slave Processors - The slave processors are intelligent I/O subsystems.
- I/O Boards - These are non-intelligent interface boards.

Some of the hardware necessary to support diagnostic functions is not generally found on commercially-available boards. A general purpose support function card has been designed for use in real-time systems. This card, the Multibus Diagnostic Monitor (MDM), is an intelligent IEEE-796 slave processor. The on-board microprocessor communicates with the host processors via its shared memory.

The basic features of the MDM are as follows.

- <u>Non-Volatile Memory</u> - The MDM provides 2K bytes of IEEE-796 bus accessible non-volatile memory for the retention of diagnostic and post mortem information. This information is placed in the memory by the host processors. The information is accessible via a terminal driven by special maintenance software resident on the host processors.
- Temperature Monitoring - The MDM interfaces to as many as eight temperature sensors (Analog Devices AD590LH). Two of these sensors will be located on the card chassis itself. The other six sensors are intended to be distributed throughout the cabinet.

The software running in the MDM reads these temperature sensors, converts the readings to degrees, and places the values into shared memory for use by the host processors. The software also compares the temperatures to programmable high and low thresholds. If any threshold is exceeded, an alarm indication is placed in shared memory and the alarm indicator on the status panel is lit.

- Power Supply Monitoring - The MDM monitors the IEEE-796 power supplies and the redundant 15-Vdc power supplies (used by signal conditioning modules associated with the I/O signals). Additionally, six channels of general purpose power supply monitoring are provided.

The software running in the MDM reads these voltages, converts the readings to volts, and places the values into shared memory for use by the host processors. The software also compares the values to programmable high and low thresholds. If any threshold is exceeded, an alarm indication is placed in shared memory and the alarm indicator on the status panel is lit.

- Door Limit Switch Monitoring - The MDM interfaces to two door limit switch loops. Each loop consists of several door switches wired in series. Thus, if any door is opened the loop is broken.

The software running in the MDM reads these contacts and places the values in shared memory for use by host processors.

- Subsystem Identification - Ten bits of digital input are available for use as a subsystem identification code. The software running in the MDM reads this code and places the value in shared memory for use by host processors. The host processors compare this value to a copy burnt into their PROM memory. In this manner, a host can determine if it is in the proper subsystem.
- Redundant Subsystem Select Logic - The MDM provides circuitry to perform the selection of a subsystem in a redundant subsystem architecture. The MDM boards in the two subsystems are cross-coupled; this electrical interface is accomplished via the front edge connectors. The MDM drives two indicators on the local status panel. The "RUN" light indicates that the local subsystem is operational. The "CONTROL" light indicates that the local subsystem is actually in control of the process.
- Host Watch Dog Timer/Auto Restart - The host processors strobe individual keep-alive locations in the MDM's shared memory. If all of the host processors fail to do this, the MDM causes the redundant subsystem selection logic to pass control to the other subsystem. The MDM also asserts the IEEE-796 INIT line, holding the subsystem in reset. The MDM can optionally be programmed to release the reset line, thus restarting the subsystem. The Auto Restart option is disabled by default. Any host processor can enable this feature if desired. If the feature is enabled, the Auto Restart Enable Indicator on the status panel is lit.
- MDM Hardware Deadman - A hardware deadman is provided for the MDM processor. Once enabled, if the deadman is not serviced, the MDM processor is reset.
- I/O Module AOK Loop - Several custom signal conditioning modules (E-Series modules) are associated with each subsystem. These modules are used to interface field signals to the IEEE-796 cards. They provide all of the signal conditioning, signal conversion, isolation, buffering, termination, and testability requirements of the subsystem. Each module provides an "All is OK" (AOK) contact closure. The normally-closed AOK contacts of all the modules associated with a given subsystem are tied in series. This loop is monitored by the MDM.

The software running in the MDM reads this contact loop and places the value in shared memory for use by the host processors.

- Reset on +5 Vdc out of Specification - The chassis +5 Vdc supply is monitored. If it goes out of specification, the IEEE-796 INIT line is asserted.




- IEEE-796 Compatibility - IEEE-796 compliance is Slave D8 M20, which means that the MDM is a slave board with an eight-bit data path and a twenty-bit address bus.

Pins 1 through 40 of the P2 connector are defined as category number one signals (unconstrained use) and used for MDM specific functions. To prevent the use of other cards in slots wired for an MDM card, the P2 connector is keyed between pins 7 & 9 and pins 51 & 53.

- Status Panel Interface - The MDM provides the interface to the status panel located on the chassis. All status panel connections are made via the IEEE-796 P2 connector. Interfaces for the following switches and indicators (in addition to those mentioned elsewhere) are provided:

- Alarm Indicator
- Reset Pushbutton
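The threshold comparison that the MDM software applies to temperature and power supply readings can be sketched as below. This is an illustrative outline only: the channel names, threshold values, and the dictionary standing in for shared memory are assumptions, not details of the MDM firmware.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """A monitored quantity with programmable low and high thresholds."""
    name: str
    low: float
    high: float


def scan(channels, readings, shared_memory):
    """Publish each reading and its alarm flag; return the overall alarm state."""
    alarm = False
    for channel in channels:
        value = readings[channel.name]
        out_of_range = value < channel.low or value > channel.high
        # Values and alarm indications are placed where the hosts can read them.
        shared_memory[channel.name] = {"value": value, "alarm": out_of_range}
        alarm = alarm or out_of_range
    shared_memory["alarm_indicator"] = alarm   # drives the status panel lamp
    return alarm


# Hypothetical channels and thresholds.
channels = [
    Channel("chassis_temp_1", low=0.0, high=50.0),
    Channel("supply_5v", low=4.75, high=5.25),
]
shared_memory = {}

normal = scan(channels, {"chassis_temp_1": 35.0, "supply_5v": 5.0}, shared_memory)
faulted = scan(channels, {"chassis_temp_1": 35.0, "supply_5v": 4.5}, shared_memory)
print(normal, faulted)
```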

#### Conclusion

As microprocessor-based technology is applied to critical applications, the design of fault-tolerant systems becomes increasingly important. Low-level diagnostic software and hardware are critical components, but indiscriminate and excessive use of embedded diagnostics can diminish the overall performance of the system. A judicious combination of high-level, fault-tolerant architectures and low-level diagnostic hardware and software is a must. Westinghouse has achieved this ideal combination in the design of many critical systems.

#### Acknowledgements

The authors wish to acknowledge the following individuals for their contributions: G. W. Remley, B. M. Cook, J. A. Neuner, J. E. Hasenkopf, D. M. Rao, L. L. Santoline, and J. F. Sutherland.

#### Bibliography

- [1] W. Barraclough, A. C. L. Chiang, and W. Sohl, "Techniques for Testing the Microcomputer Family", <u>Proceedings of the IEEE</u>, Vol. 64, pp. 943-950, June 1976.
- [2] J. Knaizuk, Jr., and C. R. P. Hartmann, "An Algorithm for Testing Random Access Memories", <u>IEEE Transactions on Computers</u>, Vol. C-26, pp. 414-416, April 1977.
- [3] J. Knaizuk, Jr., and C. R. P. Hartmann, "An Optimal Algorithm for Testing Stuck-at Faults in Random Access Memories", <u>IEEE Transactions on Computers</u>, Vol. C-26, pp. 1141-1144, November 1977.
- [4] R. Nair, S. N. Thatte, and J. A. Abraham, "Efficient Algorithm for Testing Semiconductor Random Access Memories", <u>IEEE Transactions on Computers</u>, Vol. C-27, pp. 572-576, June 1978.
- [5] C. A. Papachristou and Narendar B. Sahgal, "An Improved Method for Detecting Functional Faults in Semiconductor Random Access Memories", <u>IEEE Transactions on Computers</u>, Vol. C-34, pp. 110-116, February 1985.
- [6] K. K. Saluja and K. Kinoshita, "Test Pattern Generation for API Faults in RAM", <u>IEEE Transactions on Computers</u>, Vol. C-34, pp. 284-287, March 1985.
- [7] V. P. Srini, "Fault Location in a Semiconductor Random-Access Memory Unit", <u>IEEE Transactions on Computers</u>, Vol. C-27, pp. 349-358, April 1978.

### MULTIPROCESSOR SHARED-MEMORY INFORMATION EXCHANGE

L. L. Santoline, M. D. Lowers, and A. W. Crew

Engineering Technology Division Westinghouse Electric Corporation Research & Development Center 1310 Beulah Road Pittsburgh, PA 15235 (412) 256-2537/2456/2539

C. I. Rostund and W. D. Ghrist III

Nuclear and Advanced Technology Division Instrumentation Technology and Training Center Westinghouse Electric Corporation P.O. Box 598 Pittsburgh, PA 15230 (412) 733-6780/6343

The Institute of Electrical and Electronics Engineers, Inc., 1988

#### Abstract

In distributed microprocessor-based instrumentation and control systems, the inter- and intra-subsystem communication requirements ultimately form the basis for the overall system architecture. This paper describes a software protocol which addresses the intra-subsystem communications problem. Specifically, the protocol allows for multiple processors to exchange information via a shared-memory interface. Our primary goal is to provide a reliable means for information to be exchanged between central application processor boards (masters) and dedicated function processor boards (slaves) in a single computer chassis. The resultant Multiprocessor Shared-Memory Information Exchange (MSMIE) protocol, a standard master-slave shared-memory interface suitable for use in nuclear safety systems, is designed to pass unidirectional buffers of information between the processors while providing a minimum, deterministic cycle time for this data exchange. This is achieved by providing multiple buffers for each unique block of information passed between the two processors. Another important feature of the design is that the interface between masters and slaves is identical for different types of slave processors. Thus, the amount of custom software in the final system is minimized. The use of standard system software not only eases initial software verification and validation requirements, it also simplifies long term system software maintenance.

#### Introduction

This paper describes the design of a standard shared-memory interface for intra-subsystem communications. The interface provides a method for reliable information exchange between processors in a single computer chassis which have access to a subsystem bus and shared-memory resources. This interface protocol, known as Multiprocessor Shared-Memory Information Exchange (MSMIE), is optimized for real-time critical process control and instrumentation systems such as nuclear safety systems.

A distributed processing architecture is a natural choice when designing critical real-time systems such as nuclear safety systems. By distributing the processing requirements of a time-critical function across multiple processors, the tasks to be performed by each component processor are reduced. Furthermore, the processing tasks of most subsystems within a system can be functionally viewed as a unique application function and a set of common "operating system" type functions such as I/O handling and pre-processing, external communications processing, and diagnostics, to name a few. By off-loading dedicated "operating system" type functions onto individual "slave" processors, two distinct advantages are gained. First, the hardware and software for the processors performing the common system tasks may be of a standard, configurable design. Secondly, the processing burden of the subsystem application processor, or "master" processor, is substantially reduced, both in volume and execution time. However, the use of multiple processors to implement a single subsystem creates an additional communications burden, that of communications among the processors within the subsystem. The most efficient mechanism for intra-subsystem communication is a tightly-coupled architecture in which all processors share a bus, and communicate via a bus-accessible shared memory.

A typical architecture for a functionally-distributed computer system is shown in Figure 1. The system shown contains two subsystems, each containing a "master" processor, a "slave" processor, and a shared memory. Intra-subsystem communications are accomplished via the shared memory, and inter-subsystem communications are accomplished between the two slave processors via an unspecified physical communicating channel.

In order to avoid designing unique interfaces between master processors and each different type of slave processor, a standardized shared-memory interface protocol is required. The MSMIE protocol defines such an interface. It is optimized for real-time, process control type systems, such as nuclear safety systems, where operation in a non-interrupt driven environment is highly desirable.



#### The MSMIE Interface

#### The Participants

One of the primary goals of the MSMIE design is to identify a set of standard, configurable slave processor boards, and to design the slave processor software so that each slave microprocessor board of a given type could be used interchangeably throughout the overall system. This type of design not only minimizes the number of different microprocessor board types and the amount of custom software in the overall system, it also restricts custom software to the master processor boards. In a nuclear safety system, this type of design significantly improves overall system quality and integrity by focusing the total design effort (including verification and validation) on a small number of hardware and software components which are used as "building blocks" throughout the system. An additional benefit of such a standardized design is that long term hardware and software maintenance is simplified. For all of these reasons, the MSMIE interface has been designed so that master processors have the ability to configure individual slaves and thereby tailor them to the specific requirements of the subsystem. Thus, slave processors are dependent upon the master processors for their configuration information, which is passed to the slave processors during initialization. Slave processors communicate with the master processors via the subsystem shared memory, usually resident on each slave processor board. To further isolate the functionality of the slaves, they are only permitted to communicate with a master processor via the shared-memory interface. Slave-to-slave communications within a subsystem are not permitted except through some external communications device or via the subsystem master.

For some critical subsystems, an added degree of fault tolerance implemented via redundant subsystem master processors may be necessary. For true fault tolerance, the master processors must be fully redundant and isolated so that a fault of one master processor will not cause the others to fail. To allow the MSMIE interface to function properly, only one master processor may have the power to control the shared-memory interface at any given time. This processor is denoted the "primary" master, while all other masters are called "auxiliary" masters. The primary master is responsible for initial communications establishment with the slave processors, configuration of the slave processors, and if warranted, resetting the slave processors. The auxiliary masters may only monitor the shared-memory interface until after the slaves have been configured and MSMIE communications are fully established. At that point, the auxiliary masters may participate in shared-memory message passing to and from the slave processor boards.

#### Shared-Memory Organization

Master and slave processor communication is implemented via a predefined set of shared-memory data structures, which form the basis of the MSMIE interface. Between each slave processor and the subsystem host processors, a shared-memory region exists which is organized as shared-memory configuration data structures followed by message buffers. To maintain configurability of the slave processors, the shared-memory data structures are passed from the master to the slave processors during MSMIE initialization. The data structures contain information used by the slave to define the number and operation of any physical communications channels on the slave, the number, directionality, and definition of the messages communicated over each physical channel and between the masters and the slave, and other general configuration information. In addition, the data structures contain locations for passing diagnostic and run-time status between the masters and the slave, and locations for controlling the establishment of communications and resetting the slave from the primary master.

#### Method

The MSMIE protocol is designed so that each message buffer exchanged between the slave and masters is unidirectional, with the contents of the message being a continuously-updated image. The method for one processor to communicate with another is as follows:

- The processor sending information will continually copy the newest image of a message into a shared-memory buffer.
- The processor receiving information will read from the shared-memory buffer containing the newest image.

A shared-memory buffer is either updated by a master processor, and the flow of information is from master-to-slave, or alternatively, a shared-memory buffer is updated by a slave processor, and the flow of information is from slave-to-master.

The use of shared memory as a means of exchanging information between multiple, asynchronous processors is only successful if a mechanism exists to prevent simultaneous access to a given memory resource. Without this mechanism, the possibility of "data tearing" arises. Data tearing occurs when one processor writes to a memory area while it is being read by another processor. If this situation exists, it is possible that the processor reading the memory area actually reads portions of both old and new data. This can happen whenever the memory location being accessed is of a size that requires multiple machine instructions to read or write the location. Consider the following simple example, where processor number one reads a location which is concurrently being written to by processor number two:

WORD VALUE

| High Byte | Low Byte | Processor One  | Processor Two   |
|-----------|----------|----------------|-----------------|
| 55        | 55       | READ Low = 55  |                 |
| 55        | 33       |                | WRITE Low = 33  |
| 33        | 33       |                | WRITE High = 33 |
| 33        | 33       | READ High = 33 |                 |

As indicated, the word value read by processor number one is invalid as it contains the low byte of the "old" data value, but the high byte of the "new" data value. This simple example can be extended to more sophisticated situations, where the integrity of whole blocks of data must be maintained.
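The interleaving in the table above can be replayed step by step. The sketch below is a deterministic illustration, not a concurrency test: a dictionary stands in for the shared word, and the four accesses are performed in the same order as the table rows.

```python
def torn_read() -> int:
    """Replay the table's interleaving and return the word processor one sees."""
    memory = {"high": 0x55, "low": 0x55}   # "old" word value 0x5555

    low = memory["low"]        # processor one: READ Low  = 55
    memory["low"] = 0x33       # processor two: WRITE Low  = 33
    memory["high"] = 0x33      # processor two: WRITE High = 33
    high = memory["high"]      # processor one: READ High = 33

    return (high << 8) | low   # word assembled by processor one


value = torn_read()
print(hex(value))   # 0x3355: neither the old 0x5555 nor the new 0x3333
```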

In order to prevent data tearing, mutually-exclusive access to the shared-memory area must be guaranteed. In MSMIE, this is partially accomplished with software semaphores. The semaphores allow a processor, while it has access to a particular shared-memory area, to prevent, or "lock out", other processors from accessing that same shared-memory area.

While semaphores prevent data tearing, they introduce the possibility of one processor being denied access to a shared-memory area because that area is locked by another processor. The processor which desires buffer access must wait for the other processor to release the locked shared-memory area. This is readily apparent if only a single shared-memory buffer is allocated to hold each message image. In this case, the buffer acquisition/release scheme of master and slave processors is as shown in Figure 2. As illustrated, buffer contention problems are normal with a single buffer per message allocation scheme. When messages are passed from "slave-to-master", the master processor must wait for the slave to update the data in the buffer, and then release the buffer so the master can access the newest data. While the master is reading the new data from the buffer, the slave has no free area to build a new message. For message images passed from "master-to-slave", the situation is reversed. In either case, each processor must wait for the buffer to be in the correct state (either "idle" or "newest") before it may access the buffer. There is no guarantee that the buffer will be in the correct state at any given time. If this primitive message passing interface is used for information exchange between the master and slave, only a fraction of the full power of a multiple processor architecture can be realized, since each processor wastes a portion of its execution cycle waiting for the other processor to release the shared-memory resource.

The addition of a second buffer for each message image communicated between the master and slave processors solves some buffer contention problems. While one or more processors are reading a message image from the first shared-memory buffer, another processor may be building a message update in the second shared-memory buffer. Using this method, the participating processors can simultaneously operate on (read from or write into) shared-memory buffers without interfering with one another.

However, because master and slave processors may run asynchronously, buffer contention is still possible even with dual memory buffers allocated for each message image. This is true because there are no real restrictions on the amount of time a processor can hold a buffer assigned to itself. The buffer contention problem which occurs using dual shared-memory buffers for each message image is explained in the following scenario (see Figure 3). Assume the slave processor is operating with a slower cycle time than the master processor. This implies that the slave will take longer to access and release a data buffer than the master. When the slave processor is reading an image of a message from one shared-memory buffer, the faster master processor builds a new image of the message and places it in the second shared-memory buffer. If the master processor builds another new message image before the slower slave processor has finished using its data buffer, then the master processor must either wait for the slave to release its buffer to the idle state, or overwrite the previous "newest" image with the fresher data. The first alternative is undesirable because the operation of the two processors is coupled. The second alternative is also undesirable because the newest image is destroyed. In the second case, when the slave processor finally releases its buffer, it can no longer immediately access a new image. It must instead wait for the master processor to finish updating the newest image and release the buffer to the newest state.

To eliminate the possibility of buffer contention, three shared-memory buffers can be allocated for each message image. With this scheme, at any given time, one of the buffers can hold the newest complete image, a second buffer can be assigned to a processor for reading an image, while a third buffer can be assigned to a processor for building another new image. A buffer is guaranteed to be available to each processor at the time it requests access to a buffer, and a third buffer is always available to prevent the newest complete image from being destroyed.
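The claim that three buffers remove contention can be checked with a small simulation. The sketch below is a simplified model under stated assumptions, not the MSMIE procedure itself: it uses one writer and one reader, generic state names, no semaphore (the two roles are called in turn from a single thread), and it assumes the writer returns any stale "newest" buffer to "idle" when it publishes and that the reader keeps its current buffer when nothing newer exists.

```python
IDLE, NEWEST, WRITING, READING = "idle", "newest", "writing", "reading"


def writer_publish(status: list) -> int:
    """Release the writer's buffer as newest, then take an idle buffer."""
    for index, state in enumerate(status):
        if state == NEWEST:
            status[index] = IDLE           # superseded image (assumed rule)
    status[status.index(WRITING)] = NEWEST
    next_buffer = status.index(IDLE)       # raises ValueError if none is idle
    status[next_buffer] = WRITING
    return next_buffer


def reader_acquire(status: list):
    """Switch the reader to the newest image, if a newer one exists."""
    if NEWEST not in status:
        # Nothing newer: keep the current buffer (assumed rule).
        return status.index(READING) if READING in status else None
    if READING in status:
        status[status.index(READING)] = IDLE
    newest = status.index(NEWEST)
    status[newest] = READING
    return newest


status = [WRITING, IDLE, IDLE]
history = []
for cycle in range(5):
    for _ in range(3):                     # fast writer: three updates ...
        writer_publish(status)
    reader_acquire(status)                 # ... per slow reader cycle
    history.append(list(status))

print(history[-1])
```

In this model the writer never fails to find an idle buffer, because at most one buffer is held by the reader and at most one holds the newest image at the moment the writer asks for its next buffer.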

Thus, for each data image which must be communicated between a slave and the master processors, the MSMIE protocol maintains a set of triplicated buffers. Information about the buffers and access control to the buffers is provided by "buffer descriptor tables" in shared memory. Each set of triplicated buffers has an associated buffer descriptor containing the following entries:

- A three-element Buffer\_Status array which holds the status of each of the three shared-memory buffers. Each buffer can have one of five buffer statuses: "idle", "assigned to master", "assigned to slave", "newest", and "not used".
- A semaphore location which provides exclusive access to the Buffer\_Status array for both master and slave processors. The semaphore and the Buffer\_Status array are used to control access to the triplicated shared-memory buffers.
- A Number\_of\_Readers location which maintains a count of the number of master processors currently reading the buffer whose status is "assigned to master".
- An Access\_Mode location which determines the number of read accesses permitted to "newest" data buffers.
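As a rough illustration of the four entries listed above, a buffer descriptor could be modeled as follows. This is a sketch, not the Westinghouse implementation: the Python names, the `AccessMode` encoding, and the use of a `threading.Lock` in place of a shared-memory semaphore are all assumptions made here.

```python
from dataclasses import dataclass, field
from enum import Enum
from threading import Lock


class Status(Enum):
    IDLE = "idle"
    ASSIGNED_TO_MASTER = "assigned to master"
    ASSIGNED_TO_SLAVE = "assigned to slave"
    NEWEST = "newest"
    NOT_USED = "not used"


class AccessMode(Enum):
    # Hypothetical encoding; the paper only says this entry limits read accesses.
    READ_ONCE = 1
    UNLIMITED = 2


@dataclass
class BufferDescriptor:
    # One status per shared-memory buffer in the triplicated set.
    buffer_status: list = field(default_factory=lambda: [Status.IDLE] * 3)
    # Stands in for the semaphore location guarding buffer_status.
    semaphore: Lock = field(default_factory=Lock)
    # Count of masters currently reading the "assigned to master" buffer.
    number_of_readers: int = 0
    access_mode: AccessMode = AccessMode.UNLIMITED

    def find(self, status):
        """Return the index of the first buffer with this status, or None."""
        for index, current in enumerate(self.buffer_status):
            if current is status:
                return index
        return None


desc = BufferDescriptor()
desc.buffer_status[0] = Status.ASSIGNED_TO_SLAVE
```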

The method in which the buffer descriptor parameters are used varies according to the direction of information flow and whether the processor desiring buffer access is a master or slave processor. The use of the buffer descriptor parameters will be described in the following sections.

Slave-to-Master Message Passing, Slave Processor Buffer Acquisition/Release: In slave-to-master message passing, the slave processor marks messages for use by the master processor. Typically, the slave processor has a buffer assigned to it at initialization to hold the first message. Then, when the acquire/release buffer procedure is invoked, the slave releases the buffer which was assigned to it (by updating its status to "newest"), then acquires an "idle" buffer to hold the next update of the message. The procedure which the slave processor follows to release its current buffer and acquire an "idle" buffer is described below:

1. The slave processor locks the Buffer\_Status array by acquiring the buffer descriptor semaphore. Once the slave processor has the semaphore in the locked state, the master processor is denied access to the Buffer\_Status array.
2. The Buffer\_Status array is searched for a status of "newest". If a "newest" status is found, then the data which the slave processor is updating for the master replaces this buffer, so the Buffer\_Status of "newest" is changed to "idle".
3. The Buffer\_Status array is searched for a status of "assigned to slave". If the status of "assigned to slave" is not found, then an error has occurred.
4. The buffer whose status is "assigned to slave" is changed to "newest". This completes the release of the "assigned to slave" buffer. An "idle" buffer must now be acquired.
5. The Buffer\_Status array is searched for a status of "idle". This buffer will be used to hold the next message update. If an "idle" buffer is not found, then an error has occurred.
6. The slave processor acquires the "idle" buffer by changing its status from "idle" to "assigned to slave".
7. The buffer descriptor semaphore is released.
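The seven steps above can be sketched in a few lines. The code is a minimal illustration under stated assumptions: statuses are plain strings, the Buffer\_Status array is a Python list, a `threading.Lock` stands in for the descriptor semaphore, and the function name is hypothetical.

```python
from threading import Lock

IDLE, TO_MASTER, TO_SLAVE, NEWEST = (
    "idle", "assigned to master", "assigned to slave", "newest")


def slave_release_and_acquire(status, semaphore):
    """Publish the slave's buffer as "newest", then take an "idle" buffer.

    Returns (index_published, index_acquired).
    """
    with semaphore:                                  # steps 1 and 7
        if NEWEST in status:                         # step 2: retire the older image
            status[status.index(NEWEST)] = IDLE
        if TO_SLAVE not in status:                   # step 3
            raise RuntimeError('no "assigned to slave" buffer')
        published = status.index(TO_SLAVE)
        status[published] = NEWEST                   # step 4
        if IDLE not in status:                       # step 5
            raise RuntimeError('no "idle" buffer')
        acquired = status.index(IDLE)
        status[acquired] = TO_SLAVE                  # step 6
        return published, acquired


# The slave starts with one buffer assigned to it at initialization.
status = [TO_SLAVE, IDLE, IDLE]
sem = Lock()
first = slave_release_and_acquire(status, sem)
second = slave_release_and_acquire(status, sem)
```

After the second call the slave again owns buffer 0, buffer 1 holds the newest image, and buffer 2 remains idle for a master to cycle through.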

Slave-to-Master Message Passing, Master Processor Buffer Acquisition/Release: The master processor accesses the newest messages provided by the slave processor. Two separate procedures are provided: one to acquire the "newest" buffer, and a second procedure to release the "assigned to master" buffer once the data has been used. The actions which must be taken by the master processor in order to acquire "newest" data buffers are described below:

1. The master processor locks the Buffer\_Status array by acquiring the buffer descriptor semaphore. Once the master processor has the semaphore in the locked state, the slave processor is denied access to the Buffer\_Status array.
2. The Buffer\_Status array is searched for a status of "assigned to master".
3. If an "assigned to master" buffer is found, then another master processor has acquired this buffer. (In this case, multiple master processors have access to the slave's shared memory.) The master processor presently desiring buffer access is constrained to read the data in the current "assigned to master" buffer.
4. If an "assigned to master" buffer is not found, then this master processor is free to search the Buffer\_Status array for a "newest" buffer. If a "newest" buffer is found, then it is acquired by changing the Buffer\_Status to "assigned to master". If a "newest" buffer is not found, then no message has been provided by the slave processor.
5. To mark the number of master processors reading this buffer, the Number\_of\_Readers location is incremented.
6. The buffer descriptor semaphore is released.
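A minimal sketch of the master-side acquisition just described follows. The class and function names are hypothetical, a `threading.Lock` stands in for the semaphore, and one detail is an assumption made here rather than something the paper states: the reader count is left unchanged when no message is available.

```python
from threading import Lock

IDLE, TO_MASTER, TO_SLAVE, NEWEST = (
    "idle", "assigned to master", "assigned to slave", "newest")


class Descriptor:
    """Simplified stand-in for a buffer descriptor."""

    def __init__(self, status):
        self.status = list(status)
        self.semaphore = Lock()
        self.number_of_readers = 0


def master_acquire(desc):
    """Return the index of the buffer to read, or None if no message exists."""
    with desc.semaphore:                        # lock, released on exit
        if TO_MASTER in desc.status:
            # Another master already holds a buffer: share it.
            index = desc.status.index(TO_MASTER)
        elif NEWEST in desc.status:
            index = desc.status.index(NEWEST)
            desc.status[index] = TO_MASTER
        else:
            return None                         # assumption: count not incremented
        desc.number_of_readers += 1
        return index


desc = Descriptor([NEWEST, TO_SLAVE, IDLE])
first_master = master_acquire(desc)
second_master = master_acquire(desc)
nothing_yet = master_acquire(Descriptor([IDLE, TO_SLAVE, IDLE]))
```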

At this point, the master processor uses the data from the "assigned to master" buffer. When the master no longer desires access to this buffer, it may release the "assigned to master" buffer such that the slave processor can reuse this buffer. The procedure which the master follows to release the buffer is described below:

1. The master processor locks the Buffer\_Status array by acquiring the buffer descriptor semaphore. Once the master processor has the semaphore in the locked state, the slave processor is denied access to the Buffer\_Status array.
2. The Number\_of\_Readers location in the buffer descriptor is decremented, as this master processor no longer requires access to the "assigned to master" buffer.
3. If the Number\_of\_Readers location is not equal to zero, then another processor has access to the "assigned to master" buffer. If this is the case, the "assigned to master" buffer cannot be released. The Buffer\_Status is left in the "assigned to master" state, and the buffer descriptor semaphore is released.
4. If the Number\_of\_Readers location is now equal to zero, then this master processor was the only master processor using the "assigned to master" buffer, and the buffer may be released. In order to release the buffer, the Buffer\_Status array is searched for a status of "newest". If such a buffer is found, or the Access\_Mode location in the buffer descriptor indicates that each buffer is to be accessed only once, then the status of the buffer to be released is changed to "idle". Otherwise, the buffer which is to be released still contains the newest data, and it should be released by changing its status back to "newest".
5. The buffer descriptor semaphore is then released.
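The release logic, including the reader count and the Access\_Mode decision, might look like the sketch below. The names are hypothetical and the Access\_Mode entry is reduced to a boolean `read_once` flag, which is a simplification assumed here.

```python
from threading import Lock

IDLE, TO_MASTER, TO_SLAVE, NEWEST = (
    "idle", "assigned to master", "assigned to slave", "newest")


class Descriptor:
    """Simplified stand-in for a buffer descriptor."""

    def __init__(self, status, readers, read_once=False):
        self.status = list(status)
        self.semaphore = Lock()
        self.number_of_readers = readers
        self.read_once = read_once      # simplified Access_Mode


def master_release(desc):
    """Return True if the buffer was actually released."""
    with desc.semaphore:
        desc.number_of_readers -= 1
        if desc.number_of_readers != 0:
            return False                # another master is still reading
        index = desc.status.index(TO_MASTER)
        if NEWEST in desc.status or desc.read_once:
            desc.status[index] = IDLE   # superseded, or single-use data
        else:
            desc.status[index] = NEWEST  # still the freshest image
        return True


shared = Descriptor([TO_MASTER, TO_SLAVE, IDLE], readers=2)
first_release = master_release(shared)      # one reader remains
second_release = master_release(shared)     # last reader gives it back

superseded = Descriptor([TO_MASTER, NEWEST, TO_SLAVE], readers=1)
master_release(superseded)
```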

Master-to-Slave Message Passing, Master Processor Buffer Acquisition/Release: In master-to-slave message passing, the master processor marks messages for use by the slave processor. The master processor uses two separate procedures to access the shared-memory data buffers, one to acquire an "idle" buffer, and a second procedure to release the "assigned to master" buffer once the new message has been moved into the shared-memory buffer. The actions which must be taken by the master processor to acquire an "idle" buffer are described below:

1. The master processor locks the Buffer\_Status array by acquiring the buffer descriptor semaphore.
2. The Buffer\_Status array is searched for a status of "idle". If an "idle" buffer is not found, then an error has occurred.
3. The buffer whose status is "idle" is acquired by the master processor by changing the status to "assigned to master".
4. The buffer descriptor semaphore is released.

At this point, the master processor moves the "newest" message image into the acquired shared-memory data buffer. Once this data transfer is complete, the master processor may release the "assigned to master" buffer such that the slave processor can use the "newest" data. The procedure which the master follows to release the buffer is described below:

1. The master processor locks the Buffer\_Status array by acquiring the buffer descriptor semaphore.
2. The Buffer\_Status array is searched for a status of "newest". If a "newest" status is found, then the data which is being provided by the master processor replaces this buffer, so the Buffer\_Status of "newest" is changed to "idle".
3. The Buffer\_Status array is searched for a status of "assigned to master". This is the buffer which has been filled with the "newest" data. If such a buffer is not found, an error has occurred.
4. The buffer whose status is "assigned to master" is changed to "newest".
5. The buffer descriptor semaphore is released.
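The two master-side procedures for master-to-slave traffic (acquire an "idle" buffer, then publish it as "newest") can be sketched together. This is an illustration only: the function names are hypothetical, statuses are plain strings, and a `threading.Lock` stands in for the descriptor semaphore.

```python
from threading import Lock

IDLE, TO_MASTER, TO_SLAVE, NEWEST = (
    "idle", "assigned to master", "assigned to slave", "newest")


def master_acquire_idle(status, semaphore):
    """Claim an "idle" buffer for building a new message image."""
    with semaphore:
        if IDLE not in status:
            raise RuntimeError('no "idle" buffer')
        index = status.index(IDLE)
        status[index] = TO_MASTER
        return index


def master_publish(status, semaphore):
    """Release the filled buffer so that it becomes the "newest" image."""
    with semaphore:
        if NEWEST in status:                     # older image is replaced
            status[status.index(NEWEST)] = IDLE
        if TO_MASTER not in status:
            raise RuntimeError('no "assigned to master" buffer')
        index = status.index(TO_MASTER)
        status[index] = NEWEST
        return index


status = [IDLE, IDLE, IDLE]
sem = Lock()
a = master_acquire_idle(status, sem)    # build first image in buffer a
master_publish(status, sem)
b = master_acquire_idle(status, sem)    # build second image in buffer b
master_publish(status, sem)             # first image is retired to "idle"
```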

If multiple master processors are used, the Access\_Mode must be configured to allow an unlimited number of accesses to a single "newest" buffer. This is so all master processors are guaranteed access to a "newest" message once one has been provided by the slave.

Master-to-Slave Message Passing, Slave Processor Buffer Acquisition/Release: The slave processor must access the newest message provided by the master processor. Two separate procedures are provided: one to acquire the "newest" buffer, and a second procedure to release the "assigned to slave" buffer once the newest data has been used by the slave processor. The actions which must be taken by the slave processor in order to access "newest" data buffers are described below:

1. The slave processor locks the Buffer\_Status array by acquiring the buffer descriptor semaphore.
2. The Buffer\_Status array is searched for a status of "newest".
3. If a "newest" buffer is found, it should be acquired for use by the slave processor by changing its status to "assigned to slave". If a "newest" buffer is not found, then the master processor has not yet provided any messages in shared memory.
4. The buffer descriptor semaphore is released.

At this point, the slave processor is free to use the data in the buffer assigned to it. When the slave processor no longer requires access to the "assigned to slave" buffer, it must be released so that the master processor can reuse the buffer. The procedure which the slave follows to release the buffer is described below:

1. The slave processor locks the Buffer\_Status array by acquiring the buffer descriptor semaphore.
2. The Buffer\_Status array is searched for a status of "newest". If such a buffer is found, or if the Access\_Mode location in the buffer descriptor indicates that each buffer is to be accessed only once, then the status of the buffer to be released is changed to "idle". Otherwise, the buffer which is to be released still contains the "newest" data, and it should be released by changing its status from "assigned to slave" back to "newest".
3. The buffer descriptor semaphore is then released.
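The slave-side counterpart for master-to-slave traffic is sketched below under the same assumptions as before: hypothetical function names, string statuses, a `threading.Lock` as the semaphore, and the Access\_Mode entry reduced to a `read_once` flag.

```python
from threading import Lock

IDLE, TO_MASTER, TO_SLAVE, NEWEST = (
    "idle", "assigned to master", "assigned to slave", "newest")


def slave_acquire_newest(status, semaphore):
    """Return the index of the acquired buffer, or None if no message exists."""
    with semaphore:
        if NEWEST not in status:
            return None
        index = status.index(NEWEST)
        status[index] = TO_SLAVE
        return index


def slave_release(status, semaphore, read_once=False):
    """Give the slave's buffer back as "idle" or, if still fresh, "newest"."""
    with semaphore:
        index = status.index(TO_SLAVE)
        if NEWEST in status or read_once:
            status[index] = IDLE
        else:
            status[index] = NEWEST


sem = Lock()

# No newer image arrived while the slave was reading: data stays "newest".
unchanged = [NEWEST, IDLE, IDLE]
slave_acquire_newest(unchanged, sem)
slave_release(unchanged, sem)

# The master published buffer 1 while the slave held buffer 0.
superseded = [TO_SLAVE, NEWEST, IDLE]
slave_release(superseded, sem)

empty = slave_acquire_newest([IDLE, IDLE, IDLE], sem)
```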

<u>Multiple Channel Slaves</u>: The basic method of applying triple buffering to shared-memory communications of data images has been described. This method can be extended to suit the particular needs of different types of slave processor boards. As previously mentioned, the slave processors are typically designed to offload the master processors from performing standard system tasks. One common slave processor function is simplex point-to-point (datalink) communications. This type of slave processor benefits from a variation of simple shared-memory triple buffering due to multiple communication channel considerations.

A datalink controller type of slave processor generally has more than one physical communications device. Each of the physical communications channels (datalinks) can operate as a transmitter, receiver, or bidirectional channel. It is also possible that multiple message images are to be communicated over a single physical channel.

Because datalink activity is serial, only one message at a time can be transmitted or received on any given channel. Thus, triple buffering is implemented as follows: The number of shared-memory buffers required for each datalink channel is equal to two times the number of unique messages communicated over that channel plus one additional buffer. The "extra" buffer is for the physical channel itself, i.e., the buffer into which messages are received or from which messages are transmitted. In this arrangement, the shared-memory buffers are in a free pool of buffer space, and are not associated with a particular buffer descriptor except at initialization. At initialization, two shared-memory buffers are assigned to each buffer descriptor, and are initialized to the "idle" state. The third Buffer\_Status in each buffer descriptor is initialized to the "assigned to slave" state, as the third buffer for all buffer descriptors associated with a single channel corresponds to the single "extra" buffer which is assigned to the physical channel. In this case, the "assigned to slave" Buffer\_Status can be thought of as "assigned to physical channel". Because the buffers are not rigidly allocated to a particular buffer descriptor, the size of each of the allocated buffers must be at least as large as the largest message received or transmitted over the given channel. At any given time, two buffers are associated with each particular message image, and the third buffer is always assigned to the datalink controller physical communications device. When triple buffering is implemented in this manner, the method of acquiring and releasing buffers from the master side is identical to that previously described. From the datalink controller side, buffer acquisition and release is a "swapping" process.
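The buffer-count rule in this paragraph is simple arithmetic. The sketch below compares it with plain triple buffering; the function names and the message sizes are hypothetical and are used only to illustrate the stated rule.

```python
def datalink_buffers(messages_on_channel):
    """Two buffers per unique message plus one for the physical channel."""
    return 2 * messages_on_channel + 1


def simple_triple_buffers(messages_on_channel):
    """Three dedicated buffers per unique message."""
    return 3 * messages_on_channel


def pool_buffer_size(message_sizes):
    """Pooled buffers must each hold the largest message on the channel."""
    return max(message_sizes)


sizes = [64, 128, 96, 128]          # hypothetical message sizes in bytes
pooled = datalink_buffers(len(sizes))
dedicated = simple_triple_buffers(len(sizes))
pooled_bytes = pooled * pool_buffer_size(sizes)
```

With four messages on one channel the pooled scheme needs nine buffers instead of twelve; with a single message both schemes need three.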

On datalink controller receive channels, messages are received over the datalink and must be marked for use by the master processor. The messages are received into the shared-memory buffer assigned to the physical channel. Once a new message has been received, the datalink controller must determine which buffer descriptor the message is associated with, and find the correct buffer descriptor. This buffer descriptor is where the "assigned to slave" buffer must be returned, and from where an "idle" buffer must be acquired to rearm the physical channel. Once the correct buffer descriptor is found, the procedure which the datalink controller follows to release its current buffer and acquire an "idle" buffer is identical to standard triple buffering for the slave-to-master message passing case described previously.

For transmitter channels, the master processor provides "newest" message buffers which contain data to be transmitted over the physical datalinks by the datalink controller slave. If multiple messages are transmitted on a single datalink, each message is transmitted separately and in order. In this case, buffer acquisition and release is a two-step buffer swapping process, as described below:

1. Acquiring buffers for transmission involves swapping a current "assigned to slave" buffer with the "newest" buffer from the buffer descriptor containing the least recently-transmitted message. "Newest" buffer acquisition is described below.
   - a. The semaphore of the buffer descriptor corresponding to the least recently-transmitted message is acquired.
   - b. The Buffer\_Status array is searched for a status of "newest". If a "newest" status is found, then the datalink controller must swap the current "assigned to slave" buffer with the "newest" buffer so that the datalink controller can transmit the newest data.
   - c. The Buffer\_Status of the "assigned to slave" buffer is changed to "idle".
   - d. The Buffer\_Status of the "newest" buffer is changed to "assigned to slave".
   - e. The buffer descriptor semaphore is released.
2. The transmission of the "assigned to slave" buffer is initiated. Upon the completion of transmission, the buffer must be returned so that the master processor can reuse the buffer. The buffer <u>must</u> be returned to the same buffer descriptor from which it was acquired. The procedure for returning "assigned to slave" buffers is described below:
   - a. The buffer descriptor semaphore of the most recently-transmitted buffer is acquired.
   - b. The Buffer\_Status array is searched for a status of "newest". If such a buffer is found, or if the Access\_Mode of this buffer descriptor indicates that each "newest" buffer is to be used only once, then the buffer which is presently "assigned to slave" remains in that state. Otherwise, the buffer which is "assigned to slave" still contains the "newest" data, and it should be released by changing its status from "assigned to slave" back to "newest".
   - c. If the "assigned to slave" buffer was released (its status changed to "newest"), then an "idle" buffer must be acquired for use by the datalink controller. The Buffer\_Status array is searched for a status of "idle". This buffer is acquired by changing its status to "assigned to slave".
   - d. The buffer descriptor semaphore is released.
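The two-step swap for a transmitter channel can be sketched at the level of the Buffer\_Status array alone. This is a simplified illustration, not the datalink controller's code: it tracks statuses only (not the movement of physical buffers in the free pool), the function names are hypothetical, and `read_once` again stands in for the Access\_Mode entry.

```python
from threading import Lock

IDLE, TO_MASTER, TO_SLAVE, NEWEST = (
    "idle", "assigned to master", "assigned to slave", "newest")


def acquire_for_transmit(status, semaphore):
    """Step 1: swap the channel's buffer with the "newest" one, if any."""
    with semaphore:
        if NEWEST not in status:
            return None                         # nothing new to transmit
        newest = status.index(NEWEST)
        current = status.index(TO_SLAVE)
        status[current] = IDLE                  # step 1c
        status[newest] = TO_SLAVE               # step 1d
        return newest


def return_after_transmit(status, semaphore, read_once=False):
    """Step 2: hand the transmitted buffer back to its own descriptor."""
    with semaphore:
        if NEWEST in status or read_once:
            return False                        # buffer stays with the channel
        status[status.index(TO_SLAVE)] = NEWEST  # data is still the freshest
        status[status.index(IDLE)] = TO_SLAVE    # re-arm the channel
        return True


sem = Lock()
# Third entry starts as "assigned to slave"; the master has published buffer 0.
status = [NEWEST, IDLE, TO_SLAVE]
sent = acquire_for_transmit(status, sem)
after_acquire = list(status)
released = return_after_transmit(status, sem)
```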

The triple buffering procedure followed by slave processors which are similar to datalink controllers is just an extended version of simple shared-memory triple buffering. However, this method significantly reduces the memory requirements when many messages of a similar size must be transmitted or received over a single physical channel. This method reduces to simple triple buffering when only one message is transmitted or received per physical channel.

<u>Multiple Master Processor Considerations</u>: The basic triple buffer acquire/release algorithms which have been described are applicable regardless of whether one or more than one master processor is communicating with the slave processor. When multiple masters are present, the Number\_of\_Readers location in each buffer descriptor allows each master processor to simultaneously read the same buffer of a message passed from slave-to-master. However, only a single master processor is permitted to supply each image of a message passed from master-to-slave in order to prevent master-to-master buffer contention.

Although no changes are required to the buffer acquire/release algorithms, there are various operating constraints when more than one master processor is present in a subsystem:

1. Because each of the multiple master processors may run asynchronously with respect to other master processors and the slave processor, one master could essentially prevent the others from ever reading data provided by the slave if the Access\_Mode selection on slave-to-master message passing buffers were configured for one-time buffer access. This leads to the requirement that all buffer descriptors with direction "slave-to-master" must be configured to allow an unlimited number of accesses to a single "newest" buffer to ensure that all master processors are able to access the "newest" data at least once per master processor cycle.
2. A timing constraint must be placed on the amount of time any master is allowed to access a buffer. This constraint is necessary so that a buffer which is "assigned to master" is guaranteed to be released by all masters at least once per master processing cycle. For a system with M masters, the amount of time any master processor may assign a buffer to itself must be less than 1/M of the smallest loop cycle time of any master.
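The timing constraint amounts to a simple bound. The sketch below computes it and checks a set of hold times against it; the function names and the millisecond figures are hypothetical examples, not values from the paper.

```python
def max_hold_time(master_cycle_times):
    """Upper bound on buffer hold time: (smallest cycle time) / M."""
    number_of_masters = len(master_cycle_times)
    return min(master_cycle_times) / number_of_masters


def holds_are_acceptable(hold_times, master_cycle_times):
    """True if every master holds its buffer for strictly less than the bound."""
    limit = max_hold_time(master_cycle_times)
    return all(hold < limit for hold in hold_times)


# Two masters with 50 ms and 100 ms loop cycles (illustrative numbers).
cycles_ms = [50, 100]
limit_ms = max_hold_time(cycles_ms)
ok = holds_are_acceptable([10, 20], cycles_ms)
too_slow = holds_are_acceptable([10, 30], cycles_ms)
```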

#### Reliability

The method of using triplicated buffers for message passing between master and slave processors provides for information exchange between the processors within a minimal and deterministic time frame. For use in nuclear safety systems, the information exchange must also be extremely reliable. The MSMIE protocol provides reliability in the data exchange by several mechanisms, which are summarized below.

- A field specifying the length of the message is embedded into a header which is part of every message image.
- A message serial number is embedded into the message header. The serial number is used by the receiving processor to determine if a message has been read before.
- New message buffers are timestamped when they are placed into the shared memory. The receiving processor can calculate the age of a buffer by comparing the timestamp of the buffer with a representation of "current time", maintained in shared memory by subsystem slaves.
- Source-to-destination error detection is provided by a word-wide checksum which is embedded into the message. The checksum is computed by the processor which originates a message. It is recomputed by the end processor which eventually receives the message.
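The four mechanisms listed above can be illustrated with a small message builder and checker. Everything about the layout is an assumption made for this sketch: the field widths, the little-endian byte order, the trailing position of the checksum, and the choice of a 16-bit additive sum as the "word-wide" checksum are not specified in the paper.

```python
import struct

# Hypothetical header: 16-bit length, 16-bit serial number, 32-bit timestamp.
HEADER = struct.Struct("<HHI")


def word_checksum(data):
    """16-bit additive checksum over 16-bit words (assumed algorithm)."""
    if len(data) % 2:
        data += b"\x00"
    words = struct.unpack(f"<{len(data) // 2}H", data)
    return sum(words) & 0xFFFF


def build_message(serial, timestamp, payload):
    body = HEADER.pack(len(payload), serial, timestamp) + payload
    return body + struct.pack("<H", word_checksum(body))


def check_message(message, last_serial, now, max_age):
    body = message[:-2]
    (received,) = struct.unpack("<H", message[-2:])
    if word_checksum(body) != received:
        return "checksum error"
    length, serial, timestamp = HEADER.unpack_from(body)
    if length != len(body) - HEADER.size:
        return "length error"
    if serial == last_serial:
        return "already read"
    if now - timestamp > max_age:
        return "stale"
    return "ok"


msg = build_message(serial=7, timestamp=1000, payload=b"\x01\x02\x03\x04")
fresh = check_message(msg, last_serial=6, now=1010, max_age=50)
repeat = check_message(msg, last_serial=7, now=1010, max_age=50)
stale = check_message(msg, last_serial=6, now=2000, max_age=50)
corrupted = check_message(msg[:8] + b"\xff" + msg[9:], 6, 1010, 50)
```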

#### Summary

The MSMIE protocol has several features which make it ideally suited to inter-processor communications in distributed, microprocessor-based nuclear safety systems. At this time, implementation of the MSMIE protocol is a central part of the embedded software of several large Westinghouse nuclear system designs. The MSMIE protocol maximizes overall system performance in a multiprocessor environment while guaranteeing reliable communications between processors, deterministic performance, and maximum software reusability. This protocol represents a significant development in the design of nuclear safety system software.

#### Acknowledgments

The concepts and procedures described in this paper are taken from internal Westinghouse documents authored by the following persons: Mark D. Bowers, Albert W. Crew, William D. Ghrist III, Gilbert W. Remley, Charles J. Roslund, and Linda L. Santoline.



Figure 1: Intra-Subsystem and Inter-Subsystem Communications

Figure 2: Master/Slave Buffer Acquisition and Release -- Single Buffer Per Message Case. Panel notes: under Master-to-Slave Message Passing, the slave changes buffer status to IDLE after using the data from the buffer; in the other panel, the master changes buffer status to IDLE after using the data from the buffer.

Figure 3: Master/Slave Buffer Acquisition and Release -- Dual Buffers Per Message Case (Master-to-Slave Message Passing)

# RESEARCH ACTIVITIES

CURRENT AND FUTURE PROGRAMS ON THE USE OF DIGITAL COMPUTERS IN NUCLEAR POWER PLANTS

- LESSONS LEARNED: EXPERIENCE IN OTHER COUNTRIES
- REVIEW CRITERIA - HUMAN FACTORS ASPECTS - ADVANCED I&C
- COMPUTER CLASSIFICATION
- EXPERT SYSTEM VERIFICATION AND VALIDATION
- HALDEN REACTOR PROJECT PROGRAMS
  - SOFTWARE TOOLS
  - SOFTWARE TEST AND EVALUATION
- CLASS 1E DIGITAL COMPUTER SYSTEMS

NUREG/CR-5348, MAN-MACHINE INTERFACE ISSUES IN NUCLEAR POWER PLANTS



# LESSONS LEARNED: EXPERIENCE IN OTHER COUNTRIES

CANADA

DARLINGTON: REVERSE ENGINEERING

BRUCE: QUALITY CONTROL

FRANCE

N4 SERIES: FRONT END REQUIREMENTS ANALYSIS

KWU: 10 YEAR PROGRAM, USE OF CASE TOOL

TUV NORDDEUTSCHLAND: SOSAT

# PROJECT: REVIEW CRITERIA - HUMAN FACTORS ASPECTS - ADVANCED I & C

CONTRACTOR: OAK RIDGE NATIONAL LABORATORY

OVERALL OBJECTIVE: TO DEVELOP REGULATORY REVIEW CRITERIA FOR USE IN EVALUATING THE SAFETY IMPLICATIONS OF HUMAN FACTORS ASSOCIATED WITH CURRENT PLANTS' USE OF ARTIFICIAL INTELLIGENCE AND EXPERT SYSTEMS AND WITH ADVANCED CONTROLS AND INSTRUMENTATION

INITIAL OBJECTIVE: PERFORM INDUSTRY SURVEY, DEFINE ISSUES

NUREG/CR-5439, JUNE 1990

## HIGH RATED I&C ISSUES

WILL HIGH RESOURCE REQUIREMENTS OF VERIFICATION AND VALIDATION OF ADVANCED I&C DENY ANTICIPATED IMPROVEMENTS?

WHAT ARE THE CONFIGURATION CONTROL REQUIREMENTS FOR DIGITAL SYSTEMS BACKFITTED INTO NUCLEAR POWER PLANTS?

## MEDIUM RATED I&C FACTS, ISSUES

ACCEPTANCE CRITERIA FOR ADVANCED I&C ARE NEEDED TO AVOID ITS PREMATURE USE AND POSSIBLE ERRORS

USE OF ONE-WAY (OUTWARD) COMMUNICATION WITH SAFETY SYSTEMS ENHANCES SECURITY

# PROJECT: COMPUTER CLASSIFICATION

CONTRACTOR: OAK RIDGE NATIONAL LABORATORY

OBJECTIVE: REVIEW AND EVALUATE ADEQUACY OF EXISTING REGULATORY GUIDANCE FOR COMPUTER-BASED SAFETY SYSTEMS; WHERE NECESSARY, RECOMMEND DEVELOPMENT OF NEW GUIDANCE

A TWO YEAR PROGRAM INITIATED IN FEBRUARY 1989

DRAFT NUREG/CR UNDER REVIEW

PROJECT: EXPERT SYSTEM VERIFICATION AND VALIDATION GUIDELINES

CONTRACTOR: SCIENCE APPLICATION INTERNATIONAL CORPORATION

OBJECTIVE: TO DEVELOP AND DOCUMENT GUIDELINES FOR VERIFYING AND VALIDATING EXPERT SYSTEMS

DESCRIPTION: A JOINT RESEARCH PROJECT FUNDED BY EPRI AND NRC

A TWO YEAR PROGRAM INITIATED IN OCTOBER 1990

# HALDEN PROJECT: SOFTWARE TOOLS

SOSAT - A SET OF TOOLS FOR SOFTWARE SAFETY ASSESSMENT

- DEVELOPED BY HALDEN FOR TUV NORDDEUTSCHLAND

- FUNCTIONS

METRIC COMPUTATIONS

STATIC ANALYSIS OF CODES

DYNAMIC ANALYSIS

SYMBOLIC EXECUTION

PLAN TO DEVELOP ANALYSIS MODULES TO COMPARE PROGRAM WITH SPEC

TIME ANALYSIS

- NOW IN USE BY TUV NORDDEUTSCHLAND FOR SAFETY EVALUATION OF MICRO PROCESSOR BASED SAFETY SYSTEM



# HALDEN PROJECT: SOFTWARE TEST AND EVALUATION

GOAL: INCREASED SOFTWARE RELIABILITY

ONE SAFETY SYSTEM SPEC BY SAFETY AND RELIABILITY DIRECTORATE (UK)

INDEPENDENTLY CODED BY THREE TEAMS:

- HALDEN REACTOR PROJECT
- TECHNICAL RESEARCH CENTER OF FINLAND
- CENTRAL ELECTRICITY RESEARCH CENTER (UK)

SCOPE OF RESEARCH:

- FAULT FINDING STRATEGIES
- TEST DATA SELECTION

#### HALDEN PROJECT: SOFTWARE TEST AND EVALUATION (CONT'D)

TEST DATA TYPES:

DETERMINISTIC DATA:

SYSTEMATIC DATA - MANUALLY PRODUCED TEST SPEC FUNCTIONS PLANT SIMULATION DATA

RANDOM DATA:

- UNIFORM DISTRIBUTION, EQUAL PROBABILITY INSIDE DATA RANGE
- GAUSSIAN DISTRIBUTION, MEAN IN MID-RANGE
- UNIFORM DISTRIBUTION AT BOUNDARIES
- GAUSSIAN DISTRIBUTION AT BOUNDARIES

#### HALDEN PROJECT: SOFTWARE TEST AND EVALUATION (CONT'D)

#### TEST DATA EFFICIENCY:

FAULT DETECTION:

- EACH PROGRAM SEEDED WITH 62 FAULTS
- TEST EACH PROGRAM BY INPUT DATA TYPE
- ALL FAULTS FOUND, MULTIPLE DATA TYPES REQUIRED

**RESULTS**:

MOST EFFICIENT:

- UNIFORM DISTRIBUTION, INSIDE DATA RANGE
- GAUSSIAN DISTRIBUTION, BOUNDARY

LEAST EFFICIENT:

- GAUSSIAN DISTRIBUTION, MEAN IN MID-RANGE
- SYSTEMATIC DATA

#### PROJECT: CLASS 1E DIGITAL COMPUTER SYSTEMS

CONTRACTOR: RADC/SOHAR INCORPORATED

OBJECTIVE: CONDUCT INDUSTRY SURVEY AND DEVELOP THE TECHNICAL BASIS FOR REGULATORY GUIDANCE ON THE DESIGN, DEVELOPMENT, TEST, AND ACCEPTANCE OF CLASS 1E COMPUTER SYSTEMS

A ONE YEAR PROGRAM

RESPONDS TO SPECIFIC USER REQUESTS FROM NRR

DEVELOP A REGULATORY GUIDE ON DESIGN AND DEVELOPMENT OF CLASS 1E COMPUTER SYSTEMS USING SURVEY AND RESEARCH RESULTS

## COMPUTERS IN NUCLEAR POWER PLANTS

- PRESENT ACTIVITIES JOE JOYCE NRR/ICSB
- FUTURE APPLICATIONS JIM STEWART NRR/ICSB
- SOFTWARE V&V RAY ETS SMARTWARE ASSOC.
- STATUS OF EDF P20 SYSTEM JOHN GALLAGHER NRR/ICSB

## EARLY DESIGNS (1975 - 1980) CORE PROTECTION CALCULATOR (CPC)

- FIRST NRC REVIEW OF A DIGITAL SAFETY SYSTEM THAT USES COMPLEX ALGORITHMS
- 6 MINICOMPUTERS

- 10 ANALOG & 2 DIGITAL TRIPS
- VENDOR/LICENSEE DESIGN AND INSTALLATION REQUIRED OVER 100 MAN YEARS
- NRC REVIEW EFFORT REQUIRED 18 MAN YEARS
  - MAJOR REDESIGN REQUIRED
  - STAFF DEVELOPED 27 POSITIONS

# CONTINUED

## RPS-II/B&W

- 4 MICROCOMPUTERS
- 7 ANALOG & 3 DIGITAL TRIPS
- RESULTS
  - 14 ERRORS FOUND DURING INTEGRATION TESTING, INDICATING LACK OF DETAILED REVIEW PRIOR TO TESTING
  - STAFF/BOEING PERFORMED SNEAK ANALYSIS
    - 9 SOFTWARE DOCUMENT ERRORS
    - SOFTWARE CONTAINED NO SNEAK CONDITIONS

#### STAFF REVIEW TERMINATED WHEN BELLEFONTE CANCELLED



# (CONTINUED)

## WESTINGHOUSE RESAR-414 INTEGRATED PROTECTION SYSTEM

- MAJOR CHANGE IN WESTINGHOUSE DESIGN
- MICROCOMPUTER-BASED SYSTEM ENCOMPASSING
  - RPS
  - ESFAS
  - CONTROL SYSTEMS
- REVIEW GROUP FORMED TO ASSESS DEFENSE-IN-DEPTH AND DIVERSITY OF THE IPS
- STAFF DEVELOPED SIMPLIFIED APPROACH
  - ASSESS ONLY SYSTEM ARCHITECTURE
- DEVELOPED BLOCK CONCEPT AND A SET OF GUIDELINES
- NUREG-0493, MARCH, 1979
- STAFF ISSUED PRELIMINARY DESIGN APPROVAL ON 11/78 WITH 9 OPEN ITEMS

# (CONTINUED)

### LESSONS LEARNED

- NRC SEARCHED FOR OTHER MEANS TO REDUCE REGULATORY RESOURCE EXPENDITURES
- INDUSTRY SHOULD PERFORM VERIFICATION & VALIDATION (V&V)
- Use ANSI/IEEE-ANS 7-4.3.2-1982
- ENDORSED WITH RG 1.152
- NUREG-0493 D-I-D AND DIVERSITY



## TYPES OF UPGRADES

- DIRECT REPLACEMENT OF A SINGLE ANALOG FUNCTION WITH A DIGITAL EQUIVALENT
- COMBINING SEVERAL ANALOG PROCESS STEPS INTO A SINGLE MICROPROCESSOR
- PARTIAL REPLACEMENT OF AN ANALOG SYSTEM WITH A DIGITAL SYSTEM
- COMPLETE REPLACEMENT OF AN ANALOG SYSTEM WITH A DIGITAL SYSTEM
- ADDITION OF DIGITAL SYSTEMS THAT INTERFACE WITH PLANT
- REPLACEMENT OF MINI-COMPUTERS WITH MICROCOMPUTERS

# PRESENT DESIGNS

| PLANT          | VENDOR/SYSTEM                                       | SER   |
|----------------|-----------------------------------------------------|-------|
| South Texas    | WESTINGHOUSE<br>Qualified Display Processing System | 5/87  |
| VOGTLE         | WESTINGHOUSE<br>Plant Safety Monitoring System      | 6/87  |
| PALISADES      | GAMMAMETRICS<br>THERMAL MARGIN MONITOR              | 10/88 |
| MCCLELLAN AFB  | GENERAL ATOMICS<br>TRIGA DIGITAL CONTROL CONSOLE    | 10/88 |
| SONGS 1        | WESTINGHOUSE<br>NIS                                 | 12/88 |
| BEAVER VALLEY  | Westinghouse<br>Plant Safety Monitoring System      | 4/89  |
| BIG ROCK POINT | GENERAL ELECTRIC<br>Neutron Flux System             | 4/89  |

# PRESENT DESIGNS

#### (CONTINUED)

| PLANT                      | VENDOR/SYSTEM                                                        | SER  |
|----------------------------|----------------------------------------------------------------------|------|
| WATTS BAR                  | WESTINGHOUSE<br>EAGLE 21 RTD BYPASS                                  | 5/89 |
| ARMED FORCES<br>RADBI INST | GENERAL ATOMICS<br>TRIGA DIGITAL CONTROL CONSOLE                     | 7/89 |
| GA TEST REACTOR            | GENERAL ATOMICS<br>TRIGA DIGITAL CONTROL CONSOLE                     | 8/89 |
| PRAIRIE ISLAND             | WESTINGHOUSE<br>Digital FW Control<br>Digital Median Signal Selector | 1/90 |
| ANO-2                      | Combustion Engineering<br>Upgraded CPC from Mini to Micro            | 1/90 |
| HADDAM NECK                | Foxboro<br>Phase I RPS Upgrade 2/90                                  | 3/90 |
| DIABLO CANYON              | Westinghouse<br>Digital Median Signal Selector                       | 3/90 |
| SEQUOYAH                   | Westinghouse<br>Eagle 21 Replaces RPS Except Neutron<br>Flux         | 3/90 |

# PRESENT DESIGNS

#### (CONTINUED)

| PLANT               | VENDOR/SYSTEM                                            | SER   |
|---------------------|----------------------------------------------------------|-------|
| HADDAM NECK         | Foxboro<br>Phase II Upgrade                              | 4/90  |
| HADDAM NECK         | GAMMAMETRICS<br>NIS Upgrade                              | 4/90  |
| MAINE YANKEE        | FOXBORO<br>PRIMARY INVENTORY TRACKING SYSTEM             | 1/91  |
| TOPICAL REPORT      | GENERAL ELECTRIC<br>Wide Range Neutron Monitoring System | 10/90 |
| PEACH BOTTOM        | Foxboro<br>RPS Upgrade                                   |       |
| TROJAN              | Westinghouse<br>Remote Shutdown System<br>(Non-Safety)   |       |
| TURKEY POINT<br>3&4 | WESTINGHOUSE<br>Eagle 21 TAS FOR RTD Bypass              | 2/5/1 |



## RESEARCH REQUEST (14 ITEMS)

- DEVELOP DIGITAL STANDARD (IEEE 279)
  - SOFTWARE/FIRMWARE
  - DATA COMMUNICATION
  - SECURITY
  - RELIABILITY
  - DIVERSITY
- DEVELOP ACCEPTANCE CRITERIA
  - FAULT TOLERANCE
  - FAULT AVOIDANCE
  - SLOW DEGRADATION
  - SOFTWARE TESTING
  - CONFIGURATION MANAGEMENT
- SURVEY INDUSTRY FOR CRITERIA
  - TESTING CRITICAL SOFTWARE
  - REVERSE ENGINEERING
  - FORMAL SPECIFICATION
  - SOFTWARE AUDIT TOOLS
- DEVELOP STANDARDS/CRITERIA
  - EXPERT SYSTEMS
  - ARTIFICIAL INTELLIGENCE

## **FUTURE APPLICATIONS**

- EPRI ALWR (EVOLUTIONARY)
- GENERAL ELECTRIC ABWR
- COMBUSTION ENGINEERING SYSTEM 80+
- EPRI ALWR (PASSIVE)
- WESTINGHOUSE AP600
- GENERAL ELECTRIC SBWR
- COMBUSTION ENGINEERING SIR
- ABB/CE PIUS

- MHTGR/CANDU/....
- RETROFITS AND UPGRADES

## DESIGN FEATURES

- DISTRIBUTED DIGITAL MICROPROCESSORS
- AUTOMATED OPERATIONS
- CRT AND PLASMA DISPLAYS
- FIBER OPTICS

- SELF DIAGNOSTICS
- TRIPLICATED CONTROL SYSTEMS

- DATA HIGHWAYS
- EXPERT/AI SYSTEMS

## **REVIEW CRITERIA**

#### REGULATIONS

- STANDARD REVIEW PLAN
- RG 1.152
- ANSI/IEEE 7-4.3.2 1982

#### ADDITIONAL REVIEW GUIDANCE

- PREVIOUS REVIEWS
- IEEE STANDARDS 1012/729/730
- IEC STANDARDS 880/987
- MILITARY STANDARDS 2167
- DESIGNER IN-HOUSE STANDARDS
- FIPS/NSAC/FOREIGN

## **REVIEW ISSUES**

- DIVERSITY
- ELECTROMAGNETIC INTERFERENCE
- EXPERT/AI SYSTEMS
- DESIGN CERTIFICATION LEVEL OF DETAIL
- PASSIVE PLANT CRITERIA
- PASSIVE PLANT HVAC
- COMMERCIAL DEDICATION
- SEGMENTATION
- SEPARATION/INDEPENDENCE
- DEFENSE IN DEPTH
- FAULT DETECTION/DIAGNOSTICS
- RELIABILITY

## ONGOING DEVELOPMENT

- STANDARDS DEVELOPMENT
  - ANSI
  - IEEE
  - ISA
  - IEC
  - NRC REGULATORY GUIDES AND SRP
- INTERNATIONAL TECHNICAL EXCHANGES
  - REGULATORY
  - VENDORS
  - UTILITY
  - RESEARCH
  - FRANCE / UNITED KINGDOM / CANADA / GERMANY / SWEDEN / NORWAY
- NRC RESEARCH
  - NRR USER NEEDS



# SOFTWARE VERIFICATION & VALIDATION (V&V)

### AGU R. ETS, SMARTWARE ASSOCIATES, INC.

6 FEBRUARY 1991

© Smartware Associates, Inc.

# SOFTWARE V&V

- Ensure functional correctness
- Confidence that software performs safety functions

# CRITERIA FOR SAFETY SOFTWARE

- ANSI/IEEE Std 7-4.3.2 (1982)
- RG 1.152
- NUREG 0493
- OTHER CRITERIA

# SUBJECT FOR AUDITS

- Process for V&V
- Independence of V&V
- Application of V&V
- Requirements Documentation
- Configuration Management
- Development Methodology

# Audit Methodology

- Questions
- Thread Concept


# CONCLUSION

- V&V Is Proven Technique
- V&V Is Technology Independent
- V&V Can Be Supplemented With Other Standards

## SOFTWARE VERIFICATION AND VALIDATION

VERIFICATION AND VALIDATION (V&V) IS A SYSTEMS ENGINEERING PROCESS EMPLOYING A RIGOROUS METHODOLOGY FOR EVALUATING THE CORRECTNESS AND QUALITY OF THE SOFTWARE PRODUCT THROUGHOUT THE SOFTWARE LIFE CYCLE.

#### VERIFICATION

THE STAGE-BY-STAGE COMPARISON OF THE SOFTWARE DEVELOPMENT TO DETERMINE THAT THERE IS A FAITHFUL TRANSITION OF ONE STAGE (SUCH AS THE DESIGN) INTO THE NEXT STAGE (SUCH AS THE IMPLEMENTATION).

#### VALIDATION

COMPARES THE SOFTWARE REQUIREMENTS SPECIFICATIONS WITH THE FUNCTIONS IMPLEMENTED BY THE COMPUTER PROGRAM IN THE COMPUTER HARDWARE. ALSO PROVIDES ASSURANCE THAT THE OVERALL ACCUMULATION OF UNDESIRED STAGE-TO-STAGE SIDE EFFECTS HAS BEEN CORRECTED.
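The distinction between the two definitions can be illustrated with a minimal sketch. It is not taken from the presentation: the requirement, design, and code identifiers, the trace-link layout, and the function names `verify_transition` and `validate` are all hypothetical, chosen only to show verification as a stage-to-stage check and validation as an end-to-end check against the requirements.

```python
# Hypothetical life-cycle artifacts (assumptions for illustration only).
# Each later-stage item lists the previous-stage items it carries forward.
REQUIREMENTS = {"R1": "trip on high flux", "R2": "trip on low flow"}
DESIGN = {"D1": ["R1"], "D2": ["R2"]}
IMPLEMENTATION = {"C1": ["D1"], "C2": ["D2"]}


def verify_transition(previous_stage, next_stage):
    """Verification: list previous-stage items that no next-stage item traces to."""
    covered = {src for sources in next_stage.values() for src in sources}
    return sorted(set(previous_stage) - covered)


def validate(requirements, demonstrated):
    """Validation: list requirements not demonstrated by the finished system."""
    return sorted(r for r in requirements if r not in demonstrated)


# Stage-by-stage checks: an empty list means a faithful transition.
print(verify_transition(REQUIREMENTS, DESIGN))
print(verify_transition(DESIGN, IMPLEMENTATION))

# End-to-end check: requirement IDs shown working in (hypothetical) system tests.
DEMONSTRATED = {"R1"}
print(validate(REQUIREMENTS, DEMONSTRATED))
```

In this sketch every stage transition passes verification, yet validation still reports `R2` as unmet, which is the kind of accumulated gap the validation definition is meant to catch.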

# FRENCH N4 I&C SYSTEM FOR CHOOZ-B1 NPP

#### SUSPECTED PROBLEM

J. GALLAGHER, SICB

2/6/91








FIGURE 89 - GENERAL STRUCTURE OF THE N4 CONTROL AND INSTRUMENTATION SYSTEMS




## N4 - CAPACITY AND PERFORMANCE OF SURVEILLANCE & CONTROL SYSTEMS

- SYSTEM MONITORS & CONTROLS 10,200 ITEMS
- DIGITAL CONTROLS - 8,400 (ON-OFF)
- ANALOG CONTROLS - 330 (CONTINUOUS)
- DIGITAL DATA
  - 55,000 @ 1 SEC
  - 10,500 @ 0.05 SEC
  - DIGITAL RATE = 265,000 / SEC
- ANALOG DATA
  - 100 @ 0.5 SEC
  - 900 @ 2 SEC
  - 1,450 @ 20 SEC
  - ANALOG RATE = 730 / SEC
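The quoted rates can be cross-checked with a short calculation. The sketch below assumes that "N @ T SEC" means N data points each scanned once every T seconds, which the slide does not state explicitly; the helper name `aggregate_rate` is hypothetical. Exact fractions are used so the 0.05-second period does not introduce floating-point error.

```python
from fractions import Fraction


def aggregate_rate(groups):
    """Sum count / period over (count, period_seconds) pairs, in points per second."""
    return sum(Fraction(count) / Fraction(period) for count, period in groups)


# (number of points, scan period in seconds) as listed on the slide.
# Reading each entry as "points scanned once per period" is an assumption.
DIGITAL_GROUPS = [(55000, "1"), (10500, "0.05")]
ANALOG_GROUPS = [(100, "0.5"), (900, "2"), (1450, "20")]

digital_rate = aggregate_rate(DIGITAL_GROUPS)
analog_rate = aggregate_rate(ANALOG_GROUPS)

print(f"digital: {float(digital_rate):,.1f} points/s")
print(f"analog:  {float(analog_rate):,.1f} points/s")
```

Under that reading the digital figure comes out at exactly 265,000 per second, matching the slide. The analog figure comes out at 722.5 per second, so the slide's 730 per second appears to be a rounded value.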

#### P20 SUSPECTED PROBLEM

