June 27, 2007

Ugly Babies: Another Reason Why We Need Testers

Posted by Ben Simo

... I got on the train ... And I noticed that the woman across from me in the aisle had her baby with her. Ugly baby. Ugly baby.

From the other end of the coach comes this guy and he was very drunk and he was staring at the baby. ... And the guy said, "I'm looking at that ugly baby. That's a horrible looking baby lady. Where'd you get that baby from?"

And the woman said, "I don't have to take that!" And she snatched the emergency cord and the train came to a screeching halt, and the conductor came running in.

Now this was his moment. At this moment he represented the Pennsylvania Railroad. And he said, "What's going on here?"

And the woman said, "This man just insulted me. I don't have to spend my money and ride this railroad and be insulted. I'd rather walk."

And the conductor said, "Calm down! Calm down! Madame, there's nothing, nothing that the Pennsylvania Railroad will not do to avoid having situations such as this. Perhaps it would be more to your convenience if we were to ... let you sit somewhere else in the coach. And as a small compensation from the railroad ... we are going to give you a free meal; and maybe we'll find a banana for your monkey."

- Flip Wilson

We are like parents when it comes to our creations: we can be blind to ugly.

I am a believer in developer testing. Good unit testing is essential to good software. Practices like Test Driven Development (TDD) help ensure that software works from the start. Anyone who's been in the software development business for long knows that the cost of defects grows exponentially as a project progresses. Preventing and finding defects early should be a goal in every software project. However, early testing by developers does not remove the need for testing by someone other than the coder.

There are a number of reasons for "independent" testing. These reasons often include references to requirements testing, end-to-end testing, integration testing, system testing, and independent verification. These reasons focus on testing things as complete systems or from a user's viewpoint. I have another reason: Creators can be blind to ugly.

Creators have an emotional attachment to their creation. This attachment can get in the way of objective evaluation. It is difficult to be both creator and critic. It is as if creativity and critique are opposite sides of a scale. We can be both, but increasing one can harm the other. Creators need to balance the two without allowing self-criticism to hamper creativity.

I think it's good for creators to have an emotional attachment to the product of their thoughts and labor. External critics can help a creator improve their creation without stifling creative thinking.

Creating and critiquing are separate skills. When one group can focus on one and another group can focus on the other, together they are stronger (provided the critics are more tactful than the drunk in Flip's story).

At least that's my opinion. Now I open the floor to the critics. :)


June 24, 2007

Software Development Life Cycle Explained

Posted by Ben Simo

A couple weeks back, there was some discussion on the software-testing group about the use of the term "SDLC" on resumes. (Matt Heusser posted some excerpts from this conversation here.) My warning flags go up when people claim to have a "full understanding" of the SDLC. I sometimes see this as an indication that someone may not be as experienced as they claim. The "SDLC" varies from one company to another, and even from one project to another. SDLC is a process documentation term -- and there are many differing processes used to develop software. It's not how people talk in the real world.

I recently re-stumbled upon a description of the SDLC that seems to be fairly common across many companies and projects. Perhaps this is what people are referring to when they claim full knowledge of the SDLC. :)

Software doesn’t just appear on the shelves by magic. That program shrink-wrapped inside the box along with the indecipherable manual and 12-paragraph disclaimer notice actually came to you by way of an elaborate path, through the most rigid quality control on the planet. Here, shared for the first time with the general public, are the inside details of the program development cycle.

  1. Programmer produces code he believes is bug-free.

  2. Product is tested. 20 bugs are found.

  3. Programmer fixes 10 of the bugs and explains to the testing department that the other 10 aren’t really bugs.

  4. Testing department finds that five of the fixes didn’t work and discovers 15 new bugs.

  5. See 3.

  6. See 4.

  7. See 5.

  8. See 6.

  9. See 7.

  10. See 8.

  11. Due to marketing pressure and an extremely premature product announcement based on an over-optimistic programming schedule, the product is released.

  12. Users find 137 new bugs.

  13. Original programmer, having cashed his royalty check, is nowhere to be found.

  14. Newly-assembled programming team fixes almost all of the 137 bugs, but introduces 456 new ones.

  15. Original programmer sends underpaid testing department a postcard from Fiji. Entire testing department quits.

  16. Company is bought in a hostile takeover by competitor using profits from their latest release, which had 783 bugs.

  17. New CEO is brought in by board of directors. He hires programmer to redo program from scratch.

  18. Programmer produces code he believes is bug-free.

Even the above demonstrates some naivete.

Bugs are not necessarily the fault of a developer. Many bugs are defects in the requirements and design, not in the code of any specific developer.

Developers rarely get royalty checks or bonuses.

I've never known of a "testing department" serving a single "developer". Now that's quite some tester to developer ratio.

Perhaps "SDLC" is just a term we use to model something that is really complex.


June 23, 2007

How many load generators do I need?

Posted by Ben Simo

How many load generators do I need to run a [insert number here] user load test on a web application?

I am often asked how many load generators are needed for a load test with a certain number of simulated users.

My answer: It depends.

It depends on the system under test. It depends on your test tool. It depends on your specific script. It depends on your load generation hardware.

There is no straightforward answer to this question. There is no formula that can be used to extrapolate an answer. There is no one-size-fits-all rule of thumb. Some tool vendors will attempt to provide an answer, but they are wrong. I once spent half an hour arguing with a tool vendor support representative who claimed that I could run 200, and no more than 200, simulated users per load generator regardless of what those simulated users did or what hardware I used to host them. I had successfully simulated 1200 users with this tool for one script but could not simulate 50 users with another script. The number I could run on the same piece of hardware varied depending on the script.

The real question being asked is: How much load can I put on a load generator without impacting performance?

Isn't this one of the most common questions that load testing attempts to answer? Performance testers, of all people, should understand that there is no single formula for determining how much load you can place on a load generator system.

That "system" includes computers, software on those computers, and the network between the load generation computers and the system under test. To determine the requirements for this system we need to monitor that entire system. We need to monitor CPU usage. We need to monitor memory usage. We need to monitor whatever other system resources the script and test tool may impact. We need to monitor bandwidth usage. We need to monitor at the start of a test and at the end of a test. We need to monitor and load test the load generation environment just as much as we need to monitor and test the system under test.

So, where do we start?

Test the test environment. I start by running a small number of simulated users from a single load generator system. I monitor the system resources on the load generator. I estimate the bandwidth between me and the system under test. Once I have system resource numbers for my small number of simulated users, I extrapolate how many I think can run on the system and I test it. If I find that it is too many, I decrease the number of users. If I think my environment can support more, I will test that.
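The extrapolation step can be sketched as a quick calculation. This is only a rough sketch under stated assumptions: the 80% safety ceiling, the resource names, and the linear scaling are all illustrative choices of mine, not tool guidance. As with everything else here, the extrapolated number is a hypothesis to be tested, not an answer.

```python
# Rough capacity extrapolation for a load generator, given resource
# measurements from a small pilot run. The 80% ceiling and linear
# scaling are illustrative assumptions -- re-test any number this gives.

def estimate_max_users(pilot_users, cpu_pct, mem_pct, bandwidth_pct,
                       ceiling_pct=80.0):
    """Extrapolate from a pilot run, capped by the scarcest resource."""
    worst = max(cpu_pct, mem_pct, bandwidth_pct)
    if worst <= 0:
        raise ValueError("pilot run must show measurable resource usage")
    return int(pilot_users * ceiling_pct / worst)

# Example: 50 pilot users used 12% CPU, 8% memory, and 20% of available
# bandwidth. Bandwidth is the bottleneck, so try roughly 200 users next.
print(estimate_max_users(50, 12.0, 8.0, 20.0))  # -> 200
```

Whatever number comes out, the next step is the same: run it, monitor the generator, and adjust.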

Even when I think I know the answer, I monitor the test environment during load tests. This gives me information should I ever question the environment. Sometimes I discover that something in my test environment is the bottleneck and not the system I am testing. When this happens, we need to be willing to say "oops" and try again.

My real concern.

My real concern about such questions is not the answer, but that they are being asked with an expectation of a formulaic answer; and they are being asked by people who are designing, executing, and analyzing load tests. The following may seem harsh, but I believe it to be true:

I believe anyone that is looking for a one-size-fits-all answer to the question of how many load generators are required should not be leading any load testing.

They should not be designing tests. They should not be analyzing results. They may be able to safely script what other people have designed. Even new load testers quickly learn (if they are paying attention) that even minor changes in activity within an application can impact performance and the load generated on the system under test.

There are no one-size-fits-all answers. That's why we test. Understanding this should be requirement #1 for selecting any load tester.

What is a new load tester to do?

1) Educate yourself.

Most people I talk to got into load testing without much direction. This was also the case for me. There seems to be a perception that anyone can learn a tool and be a great load tester. This tool-centric emphasis often leads people astray. Some smart people I know at a leading tool vendor even tell me that their training is just the introduction. I encourage all new load testers to seek education in addition to the formal vendor training.

Some great resources on the web are:

Get to know your tools. However, it is more important to get to know performance testing. The tools are the easy part.

2) Befriend experienced load testers, network engineers, systems engineers, systems administrators, developers, and anyone else who may have useful information.

It is disappointing that new testers are often sent out into uncharted territory all by themselves. This has happened to me. I don't like being set up to fail. I don't like seeing other people being set up to fail.

So, why didn't I sink? I had some great mentors that helped keep me afloat. These mentors were rarely assigned to a project with me, so I had to seek them out.

Learn to ask questions. Performance and load testing often requires gathering information from more sources than any other testing. Asking questions is part of the job. If you don't understand something, seek out the answer. If you're embarrassed to ask publicly within your project team, ask someone privately. Look for resources outside your project. Find others in and outside your company who are willing to help. Participate in online forums -- with the understanding that there's also lots of misinformation on the web.

The Bottom Line

There are no magic formulas. That's why we test. Educate yourself. Get to know the people in your neighborhood.


June 21, 2007

Marching Through Hell

Posted by Ben Simo

There is a Churchill quote about when you're marching through hell, just keep marching. Well, that's basically what we're doing. We need to slow this down and get it right.
- Charles Burbridge
CFO, Los Angeles Unified School District

A couple weeks ago, the Los Angeles Unified School District's new computer system incorrectly paid over 32,000 employees. That's almost a third of the district's employees. And the trouble started this past January.

I've been involved in some software projects with a "hell" phase in the SDLC. I've worked on projects that failed. I've been involved with projects that went millions over budget. However, I'm thankful that I have never worked on a project for which tens of thousands of production issues were reported in a handful of months.

Through the replacement of our aging financial, human resources, payroll and
procurement systems, the District will:
  1. Dramatically improve service delivery to schools
  2. Radically improve the efficiency of District operations and our ability to manage them
  3. Reduce/eliminate paperwork and redundant manual processes
  4. Increase accountability and transparency to the public in the use of public funds
  5. Provide better data for decision makers and stakeholders at all levels
- LAUSD ERP Project Vision Statements, March 2004

This enterprise software development project started out with good intentions and lofty goals. However, it appears that something went terribly wrong in implementation. The district's public web site does not include much information about the problems. The latest document I saw with information for employees with payroll problems has a February date. I also find it interesting that a more recent document is missing item #4 from the above list. I guess that when things go wrong, there's no need to keep the public and employees informed. Right? Wrong. I don't know the details of what went wrong, but I do know that when things go wrong people tend to communicate less. I also know that it can be difficult to admit failure. This project reminds me of the six stages of a project that I occasionally encounter in forwarded emails and on bulletin boards. It will be interesting to see what and who is eventually blamed.

The six stages of a project
  1. Enthusiasm
  2. Disillusionment
  3. Panic
  4. Search for the guilty
  5. Punishment of the innocent
  6. Praise and honor for the non participants

Enterprise software systems can be very complex. The vision statement says that they chose a commercial package and partnered with an integrator because they want to configure software and not customize software. A payroll system for over 100,000 school district employees has got to be one of the most complex "configuration" projects around. I believe there were good intentions, but this looks like a game of semantics. Enterprise systems are difficult to configure and deploy -- especially when you have numerous customized needs.

I don't know what kind of testing was performed on this system before it went to production. The original vision statements document references the need for testing support.

A complete test environment (development, QA, and production instances), including adequate infrastructure, staff (DBA), storage and network capacity, a full volume of test data.
- LAUSD ERP Project Vision Statements, March 2004

I wonder what happened in implementation. Obviously, something was missed or skipped.

I did notice that the project appears to be on schedule. When Churchill said, "If you're going through hell, keep going," the point was to find your way out of the situation -- not to ignore it.

I was once offered a "great opportunity" to work on a "high profile" project. I accepted that offer. I learned a great deal from that experience. One of the things I learned is that it may not be a good thing to be visible when there are problems that are out of your control. Yet we have to take risks to gain a reward.

Right now, I am thankful that I am not "marching through hell". And I'm thankful that my paychecks come as expected.

To those in the LAUSD, these problems will pass. Right?

We expect these problems to subside in the very near future.
- Charles Burbridge
CFO, Los Angeles Unified School District

How near is very near?

If you have any more information about the development and testing practices on this project, I'd like to see it.


June 19, 2007

Crapper Automation

Posted by Ben Simo

Mechanization best serves mediocrity.
- Frank Lloyd Wright

Automation in public restrooms is becoming commonplace. The restrooms at my workplace have recently been remodeled and automated. The lights turn on and off automatically. The toilets flush automatically. Soap dispenses automatically. Faucets turn on and off automatically. Paper towels dispense automatically.

All of the above appear to be good candidates for automation. Automation in the restroom is supposed to improve cleanliness, reduce maintenance, and cut supply costs. However, I am not certain that this is true in implementation. I have found the following bugs in our newly automated water closets.

  • I have to walk about ten feet into the restroom in the dark before the lights come on.

  • It has been reported that if one spends too much time seated on the throne, the lights will turn off leaving the occupant to finish their business in the dark.

  • The paper towel dispensers spit out towels when someone walks by the dispenser.

  • The faucets come on when I stick my hand under the neighboring automatic soap dispenser, but the water flow stops by the time I get my hand to the water. I then have to remove my hands from the sink and put them back under the faucet.

  • If I'm not quick enough, the soap dispensers drop the soap into the sink instead of on my hands.

  • The water temperature is often too hot or too cold -- and there's no way to adjust it.

  • Sometimes the soap dispenser drops soap on my clean hands as I remove them from the sink, requiring that I wash again.

  • The paper towel dispenser spits out either too much or too little towel. One sheet is not enough to dry my hands, but two is more towel than needed.

  • Using two too-large towels contributes to trash receptacle overflow.

  • It is difficult to retrieve items accidentally dropped in the sink without engaging the soap and water. I hope no one ever drops a cell phone or PDA in a sink.

None of these automated tasks are particularly sapient processes. However, design and implementation flaws in the automation have helped create new problems. Automation that was designed to save time and money can end up costing more if it is not properly implemented. A single mistake can be perpetually amplified by automation.

The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency. The second is that automation applied to an inefficient operation will magnify the inefficiency.
- Bill Gates

The privy automation reminds me of the time I installed a home automation system in a former home. I installed an automatic doorbell that would ring when people stepped on my porch. I put some lights on motion detectors so they'd automatically go on and off as people entered and exited rooms. I set other motion detectors to turn off manually-engaged lights some time after the last detected motion. Outside lights were configured to not come on between sunrise and sunset. My bedroom lights came on just before the alarm clock went off. I could turn lights on and off with a remote from anywhere in the house. I had a single button I could press when leaving the house that turned off all the lights.

My home automation required a great deal of tweaking after it was installed. I discovered that I needed motion detectors at both ends of the hallway to make the lights come on before I got halfway through the hall. I learned that turning off lights in the bathrooms and bedrooms based on motion was a bad idea. I had to reconfigure outdoor motion detectors to prevent the lights and doorbell from going on when cats walked across the front yard. It took a few months to get everything configured as I liked it. By the time I had everything tweaked right, my electric bills had dropped by 20%. I was happy with this automation and it paid for itself in a couple years -- plus I had fun playing with the gadgets.

Whether we are automating toilets, towel dispensers, household lights, or software testing tasks: good design, implementation, and testing are a necessity. In my experience, intelligent application is more important than the tools.

I'd like to hear your automation success and failure stories. Please share them here.


June 16, 2007

Installing Windows Vista

Posted by Ben Simo

I have not yet made the jump to Windows Vista on any of my personal machines.

We have installed Vista on some machines in the test lab where I work. When we started playing with a Vista beta, Vista was not allowed on our network, but our customers were using Vista and demanding that we test our products on it. There were numerous technical and political battles that had to be fought to get Vista installed in our test lab.

We quickly discovered that Microsoft's minimum system requirements are just the minimum to install the OS. I can't imagine any user being happy with Vista on a machine that just met the minimum requirements.

We encountered incompatible DVD drives from a major PC manufacturer. We followed Microsoft's instructions for copying the DVD to a hard drive and installing from the hard drive instead of the DVD, only to have the Vista installer tell us that it could not be installed as Microsoft had instructed.

We've had to call Microsoft for permission to reinstall failed installations.

I have decided that I do not need this trouble at home. I have not yet seen any feature worth the upgrade. I would rather not have to buy new hardware. For now, I am going to stick with Windows 2000, Windows XP, and Linux.

For those that really want to make the leap, I suggest you watch the following video to see what kind of machine is compatible with Vista ... or is it?

Vista install in 2 minutes

Please let me know how your install goes.


June 15, 2007

Modeling the Windows Calculator: Part 2

Posted by Ben Simo

Adding Basic Validations

In the previous post, I created a simple model for starting and stopping the Windows calculator, and for switching between standard and scientific modes. I then created the code needed to execute that test and ran a test that hit each of the defined actions once.

As the next step, I reran the test with the MBTE configured to capture GUI object information as it executes. This created a checkpoint table that I then ran through a script that removes duplicates and combines rows that are the same for multiple states. I also manually reviewed this table to verify that the reported results are as I expected. I made some tweaks to the table based on my expectations. I can then use this checkpoint table as input for the next test execution. You may view the edited checkpoint file using the link below.
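The duplicate-removal step might look something like the sketch below: rows that are identical except for the state they were captured in get merged into a single row. The `combine_checkpoints` function and the column names are my own illustrative assumptions about the checkpoint table layout; they are not the MBTE's actual format or API.

```python
# Sketch: remove duplicate checkpoint rows and combine rows that are
# identical except for the state they were captured in. Column names
# ("state", "object", "property", "expected") are assumed, not the
# MBTE's real layout.

def combine_checkpoints(rows):
    merged = {}  # (object, property, expected) -> combined row
    for row in rows:
        key = (row["object"], row["property"], row["expected"])
        if key in merged:
            # Same validation seen again: merge the state lists.
            states = merged[key]["state"].split("|")
            if row["state"] not in states:
                merged[key]["state"] += "|" + row["state"]
        else:
            merged[key] = dict(row)
    return list(merged.values())

rows = [
    {"state": "calc.standard",   "object": "Backspace", "property": "enabled", "expected": "True"},
    {"state": "calc.standard",   "object": "Backspace", "property": "enabled", "expected": "True"},
    {"state": "calc.scientific", "object": "Backspace", "property": "enabled", "expected": "True"},
]
combined = combine_checkpoints(rows)
print(combined[0]["state"])  # -> calc.standard|calc.scientific
```

The point is only that captured results can be condensed mechanically before a human reviews and tweaks them.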

The checkpoint table is one of two table formats that I use for defining test oracles. I call the other format a state table. The state tables contain one validation per row and have additional fields for creating user-friendly descriptions of the validation. The state tables can also be used to reference external code for complex results validations. The checkpoint files contain one GUI object per row and the columns define the properties to validate and the expected values. While not as user-friendly as state tables, checkpoint tables are easy to automatically generate during test execution and reuse as input for future tests.

My calculator checkpoint table currently contains only positive tests to ensure that expected objects appear as expected. It does not yet contain any validations to ensure that the unexpected does not occur. For example, it contains no check to ensure that the calculator stops when the window is closed.

I then created a state table and added two oracles stating that the calculator window should exist when running and not exist when stopped. I gave each of these a failState value of "restart" to indicate that if these checks fail, the application should be restarted to resume testing.

My model currently contains the following files:

I then ran a test with this model. The MBTE executed a test that hit each of my test set actions once without me needing to give it a sequence of test steps. The MBTE automatically generated the test steps based on the model.

The results from this test execution may be viewed here. Some features in the results require Internet Explorer and may not function in other browsers. These results are usually placed on a file server, so there may be issues I have not yet noticed when accessing them from a web server.

There are some failures reported in the results. These appear to be tool issues rather than bugs in the Windows Calculator. I will look into these failures later. Do you have any ideas about the failures?

The color-coded HTML results make it easy to tell what happened. Each row indicates what happened, where the action or validation was defined, the code executed, and other pertinent information. Please explore the results and send me any feedback.

What would you like to add to this test next? More validations? Additional actions?

Do you have any observations or questions about this automation approach? Please add them to the comments.

Modeling the Windows Calculator


June 14, 2007

Modeling The Windows Calculator: Part 1

Posted by Ben Simo

I have received a number of requests for some sample models. Based on a question I received a couple weeks ago, I'd like to create a test model for the Windows Calculator. The Windows calculator contains some things that are very simple to model as a state machine (such as switching between standard and scientific modes) and other things that do not have clear distinguishable states (such as performing the actual calculations).

I plan to model the calculator a piece at a time in a series of blog posts. I welcome your input.

I will start by modeling the obvious states that I see in the Calculator's user interface.

At the highest level, I can partition the Calculator's behavior into two states: running or not running. Next, the calculator has two major modes of operation: standard and scientific. After a little experimentation, I see that if I stop the calculator, it will return to the previous mode when it is restarted. These transitions can be modeled as follows:

calc.standard -> calc.scientific
calc.scientific -> calc.standard
calc.standard -> stopped.standard
calc.scientific -> stopped.scientific
stopped.standard -> calc.standard
stopped.scientific -> calc.scientific

One problem with implementing the above in a machine-executable form is that we don't know the state of the calculator the first time we start it. This requires that we code detection of the starting state at the start of the test. This can be done by modeling virtual states that have guarded transitions going out. For example, the following can be used to start the test. The state of "start" is my MBTE's starting state.

start -> detectMode
detectMode (if standard) -> calc.standard
detectMode (if scientific) -> calc.scientific

In addition to the built-in state of "start", my MBTE has states called "restart" and "stop" that are used to restart an application after a failure and to shut down and clean up at the end of a test. These state transitions should also be added:

restart -> detectMode
stop -> stopped

Now that I have defined the basic high-level transitions, I can put them in an action table and create the automation code needed to make these transitions happen.
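As an illustration, the mode-switching transitions above can be held in a simple table and walked until every action has been exercised at least once. This is a minimal sketch of the idea, not the MBTE's actual code; real model-based test engines choose paths far more intelligently, and `cover_all_actions` is a name I made up.

```python
# Sketch: the calculator's mode transitions as an action table, plus a
# naive walker that keeps taking transitions until each one has been
# traversed at least once. Assumes the graph is strongly connected.

transitions = [
    ("calc.standard",      "calc.scientific"),
    ("calc.scientific",    "calc.standard"),
    ("calc.standard",      "stopped.standard"),
    ("calc.scientific",    "stopped.scientific"),
    ("stopped.standard",   "calc.standard"),
    ("stopped.scientific", "calc.scientific"),
]

def cover_all_actions(transitions, start):
    visited, path, state = set(), [], start
    while len(visited) < len(transitions):
        # Prefer an untraversed transition out of the current state.
        options = [t for t in transitions if t[0] == state]
        untried = [t for t in options if t not in visited]
        step = (untried or options)[0]
        visited.add(step)
        path.append(step)
        state = step[1]
    return path

path = cover_all_actions(transitions, "calc.standard")
print(len(path), "steps to cover", len(transitions), "actions")
```

A walker like this never needs a hand-written sequence of test steps; the sequence falls out of the model, which is the point of the exercise.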

The action table may be viewed here. The MBTE generated the following image for the model. (Click the image for a larger version.)

The next step will be to add some validations for the states modeled so far.

While I was running a test on this model, my daughter noticed a potential bug in the calculator that I had not noticed before. This model does not yet contain any calculations. Any idea what the bug may be?

Please send me your questions and suggestions for what should be added to this model next.


June 12, 2007

Sapience In The Age Of Automation

Posted by Ben Simo

Society has unwittingly fallen into a machine-centered orientation to life, one that emphasizes the needs of technology over those of people, thereby forcing people into a supporting role, one for which we are most unsuited. Worse, the machine-centered viewpoint compares people to machines and finds us wanting, incapable of precise, repetitive, accurate actions. Although this is a natural comparison, and one that pervades society, it is also a most inappropriate view of people. It emphasizes tasks and activities that we should not be performing and ignores our primary skills and attributes -- activities that are done poorly, if at all, by machines. When we take the machine-centered point of view, we judge things on artificial, mechanical merits. The result is continuing estrangement between humans and machines, continuing and growing frustration with technology and with the pace and stress of a technologically centered life.

- Donald A. Norman
Things That Make Us Smart:
Defending Human Attributes in the Age of the Machine

In his book Things That Make Us Smart, Donald Norman argues that the technology that has made us smarter by allowing us to manage the artifacts of cognition needs to be made to conform to people, instead of the more common practice of people conforming to technology.

Society often heralds the benefits of machines and overlooks the wonders of the human mind. This seems to be especially true in software development and testing. Many people talk about automating all tests with the assumption that the traits of the machine are better than those of human testers.

In any activity done by a human, there is the human aspect (not practically mechanizable), a physical aspect involving translation or transformation of matter and energy (mechanizable in principle), and a problem-solving aspect (sometimes transformed by mechanization, sometimes not affected).
- James Bach, Sapient Processes

If we limit our definition and practice of testing to the strengths of the machine, we are not really testing: we are overlooking the insight, intelligence, and wisdom that sets us human beings apart from the machines we create.

sapience (noun): the ability to apply knowledge, experience, understanding, common sense, and insight

In his blog post Sapient Processes, James Bach applies the term sapience to processes that require skilled human beings. Sapient processes cannot be fully automated. Testing is a sapient process.

Automation can be a useful tool to help us manage the artifacts of cognition during testing. However, no automation on its own is sapient. Therefore, no automation on its own is testing.

Don't anthropomorphize the machines. Doing so requires stepping down to the level of machines.

The future masters of technology will have to be light-hearted and intelligent. The machine easily masters the grim and the dumb.
- Marshall McLuhan


June 10, 2007

Performance: Investigate Early, Validate Last

Posted by Ben Simo

Performance and load testing is often viewed as something that has to be done late in the development cycle with a goal of validating that performance meets predefined requirements. The problem with this is that fixing performance problems can require major changes to the architecture of a system. When we do performance testing last, it is often too late or too expensive to fix anything.

The truth is that performance testing does not need to happen last. Load test scripting is often easier if we wait until the end, but should we sacrifice quality just to make testing easier?

Scott Barber divides performance testing requirements and goals into the following three categories:

  • Speed
  • Scalability
  • Stability

Speed is where things get fuzzy. Some speed requirements are quite definable, quantifiable and technical; others are not.
- Scott Barber

Scott says that hard measurable requirements can usually be defined for scalability and stability; however, meeting technical speed requirements does not ensure happy users. I often hear (and read) it said that one must have test criteria defined before performance testing can start. I disagree. When requirements are difficult to quantify, it is often better to do some investigative testing to collect information instead of validating the system against predefined requirements.

In addition to the three requirements categories, Scott argues that there are two different classifications of performance tests.

  • Investigation -- collect information that may assist in measuring or improving the quality of a system
  • Validation -- compare a system to predefined expectations
Performance testers often focus on the validation side and overlook the value they can bring on the investigation side. Sometimes we need to take off our quality cop (enforcement) hat, put on our private investigator hat, and test to find useful information instead of enforcing the law (requirements). Testers who work primarily with scripted testing are accustomed to the validation role of functional testing and try to carry that into performance testing. The problem is that most performance testing is really investigation -- we just have trouble admitting it.

Investigate performance early

Validate performance last
Traditional performance testing is treated as a validation effort with technical requirements. It is often said that a complete working system is required before testing can begin. Extensive up-front design is common. Tests are executed just before release and problems are fixed after a release. A couple years ago, Neill McCarthy asked attendees at his STAR West presentation if these really are axioms. When we consider the potential of investigative testing, these assumptions of traditional performance testing quickly dissolve.

Agile Manifesto

  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan
Neill recommended that we apply the Agile Manifesto to early performance testing. How can we apply agile principles to investigative load testing?

Model user behavior as early as possible; and model often. A working application is not needed to model user behavior. Revise the model as the application and expected use change. Script simple tests based on the model. Be prepared to throw away scripts if the application changes.
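A behavior model can be sketched in a few lines of code long before the application is testable. The sketch below is one hypothetical way to do it; the action names and weights are invented, and a real model would be revised as expected usage changes.

```python
import random

# Hypothetical user-behavior model: actions weighted by how often we
# expect real users to perform them. Revise the weights (and the
# actions themselves) as the application and expected usage change.
USER_MODEL = {
    "browse_catalog": 50,   # most visits are just browsing
    "search":         30,
    "add_to_cart":    15,
    "checkout":        5,   # few visits end in a purchase
}

def generate_session(model, length=10, seed=None):
    """Generate one simulated user session from the weighted model."""
    rng = random.Random(seed)
    actions = list(model)
    weights = [model[a] for a in actions]
    return [rng.choices(actions, weights=weights)[0] for _ in range(length)]

if __name__ == "__main__":
    print(generate_session(USER_MODEL, length=10, seed=42))
```

Because the model is data rather than recorded script steps, throwing it away or reweighting it when the application changes costs almost nothing.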

Conduct exploratory performance tests. Apply exploratory testing techniques to performance testing: simultaneous learning, test design, and test execution. Perform "what if" tests to see what happens if users behave in a certain way. Adapt your scripts based on what you learn from each execution.

Evaluate each build on some key user scenarios. Create a baseline test that contains some key user scenarios and can be run with each build. A common baseline in the midst of exploratory and investigative tests supports comparison of builds.
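A baseline run can be reduced to a few summary numbers that are easy to compare build over build. Here is a minimal sketch of that idea; the timings and build names are invented, and the 25% regression tolerance is an arbitrary example, not a recommendation.

```python
import statistics

def summarize(timings):
    """Reduce a list of response times (seconds) to comparable numbers."""
    ordered = sorted(timings)
    return {
        "median": statistics.median(ordered),
        "p90": ordered[int(len(ordered) * 0.9) - 1],  # crude 90th percentile
        "max": ordered[-1],
    }

def compare(baseline, current, tolerance=0.25):
    """Name the metrics that regressed more than `tolerance` (25% here)."""
    return [k for k in baseline
            if current[k] > baseline[k] * (1 + tolerance)]

# Invented timings for the same key scenario on two consecutive builds
build_12 = summarize([0.8, 0.9, 1.0, 1.1, 1.2, 0.9, 1.0, 0.8, 1.1, 0.9])
build_13 = summarize([1.1, 1.3, 1.5, 1.6, 1.7, 1.4, 1.5, 1.2, 1.6, 1.3])

print(compare(build_12, build_13))  # metrics that got notably slower
```

Numbers like these don't validate anything against a requirement; they tell the team which code changes moved performance, while there is still time to investigate.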

Investigative agile performance testing can increase our confidence in the systems we test. Exploratory tests allow us to find important problems early. Testing throughout the development lifecycle makes it easier to measure the impact of code changes on performance.



June 9, 2007

Free Internet: Flushing The Web

Posted by Ben Simo

Although we understand that there's a lot of crap on the web, we also believe strongly in providing equal opportunity access to all our users.
- Google
The folks at Google are working hard to provide innovative services to the world. A couple months ago, they announced the Beta release of their free in-home wireless broadband service: Google TiSP.

Google TiSP promises free, fast, and easy to install wireless broadband service. Sorry, this service is only available in the United States and Canada.

Check it out here.

What is TiSP?

Toilet Internet Service Provider
When things go wrong with TiSP, they go very, very wrong. Let's leave it at that.

Happy Flushing!
Now where'd I put my WiiHelm?


June 8, 2007

Bad Messages

Posted by Ben Simo

I regularly spend a great deal of time tracking down the root cause of software errors -- both on and off the job. Much of the investigation effort could have been avoided if I were not presented with incomplete or incorrect error messages. The text of error messages appears to be commonly overlooked by software developers and testers.

One of my early test automation development tasks was to fix the error messages in a test tool. The tool was used to validate the structure of data exchanged between computer systems. In nearly all cases, this tool displayed an error dialog window stating that the data was not valid. This tool gave the user (a tester) no information about what was wrong with the data. The lack of detail in these messages required that the user manually examine the data -- bit by bit -- with a protocol analyzer. This turned investigation of errors reported by an automation tool into a tedious manual task. The tool encountered a problem but did not report the source of the problem. I reviewed the code and added more detail to the error messages in this test tool. I also added logging of the data exchange in a human-readable format that could be reviewed without a protocol analyzer. The improved error messages and human-readable logging greatly decreased the troubleshooting and investigation time.
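The difference between the original messages and the fixed ones boils down to context. This hypothetical sketch (the field names and records are invented, not the actual tool) contrasts the two styles:

```python
# Hypothetical contrast between an unhelpful validator and one that
# reports what failed, where, and what was expected.
def validate_vague(record):
    if not record.get("id", "").isdigit():
        return "Error: data is not valid."   # tells the tester nothing
    return "OK"

def validate_helpful(record):
    field, value = "id", record.get("id", "")
    if not value.isdigit():
        return (f"Error: field '{field}' must be numeric, "
                f"got {value!r} in record {record!r}.")
    return "OK"

bad = {"id": "A17", "name": "widget"}
print(validate_vague(bad))
print(validate_helpful(bad))
```

The helpful version costs a few extra characters of code and saves the tester a session with a protocol analyzer.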

I am amazed at the poor error messages in test automation tools. Test automation tools and frameworks are likely to encounter errors in the software under test. I believe that this requires better error reporting and handling than many other tools. I am also disappointed in how difficult some of the automation tools make it to create scripts that run unattended.

One tool I use regularly often stops executing tests and displays a message like the following.

It took a great deal of effort to figure out how to coax the tool into returning error codes that could be handled in code instead of prompting a user during an automated test execution. The tool vendor's support personnel didn't seem to understand why it was a problem that I could not code their tool to handle an error without human intervention at run time. They kept telling me to fix the application under test to make the test tool's error dialog go away.

I also recently uninstalled a different test tool that had a horrible uninstall interface. It displayed a window with an uninstall progress bar that didn't move. After several minutes, the application displayed an "Install Complete" window underneath the progress window. The progress bar began to slowly move only after I clicked the "Finish" button on the buried completion window. Then it prompted me with something like the following.

Do I want to erase all files? What's the context? Of course, I don't want to erase all my files. I wonder how much the maker of this software spends on support calls about this message.

I used to receive numerous inquiries from new computer users about the Windows "program has performed an illegal operation" error message. Many users thought this meant that they did something illegal and the police would soon be knocking on their door. People familiar with computers understand that this error message is about the program's interaction with the operating system and hardware; however, this error message is misleading. This error message is designed for developers, not users. Good error messages tell the user what they need to know about the problem and what to do about it. Good error messages explain the problem but do not overwhelm the user with information that is useless to the intended user. This means that different kinds of applications require different kinds of error messages.

If you write software, please provide your users with accurate messages tailored to the user -- not developers. If you create test tools or automation frameworks, provide testers with information that is useful in determining what happened. Please.

People who write framework software should spend more time on useful error messages that show people why the error occurred and give a clue as to how to fix it.
- Eric M. Burke

Want to practice creating better error messages? Try the Error Message Generator.


June 7, 2007

Model-Based Test Engine Benefit #4: Generate and execute new tests – and find new bugs

Posted by Ben Simo

The last -- and perhaps the best -- major benefit of implementing a Model-Based Test Engine (MBTE) is automation that is capable of generating and executing tests that have not previously been executed manually.

Traditional regression test automation simply retraces test steps that have already been performed manually. This may find new bugs that show up in a previously tested path through an application but will not find bugs off the beaten path. In his book "Software Testing Techniques", Boris Beizer compares the eradication of software bugs to the extermination of insects.
Every method you use to prevent or find bugs leaves a residue of subtler bugs against which those methods are ineffectual.
- Boris Beizer
Software Testing Techniques
When we apply any method to finding bugs, we will find bugs that are found by that method. However, other bugs will remain. Finding new bugs requires variation in testing, not repeating the same thing over and over. Repeatability is often advertised as a benefit of traditional test automation. However, complete repeatability often hurts more than it helps. It can also give a false sense of security when repetitive automated test executions do not find bugs. James Bach compares repetitive tests to retracing a cleared path through a minefield.
Highly repeatable testing can actually minimize the chance of discovering all the important problems, for the same reason that stepping in someone else’s footprints minimizes the chance of being blown up by a land mine.
-James Bach
Test Automation Snake Oil
The randomized action selection of an MBTE leads to execution of a variety of paths through an application with a variety of data. This is likely to try things that have not been executed manually. Not every randomly generated test will be of value. However, computers are able and willing to run tests all night and on weekends for a lower cost than human testers.
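Randomized action selection over a model can be sketched in a few lines. Everything below is invented for illustration: the states and actions model an imaginary login dialog, and a real MBTE would drive the application under test instead of just printing the path.

```python
import random

# Invented behavior model: state -> {action: next state}
MODEL = {
    "start":    {"open_app": "login"},
    "login":    {"type_password": "login", "submit": "home",
                 "cancel": "start"},
    "home":     {"logout": "start", "open_settings": "settings"},
    "settings": {"close": "home"},
}

def random_walk(model, start="start", steps=8, seed=None):
    """Randomly choose a legal action at each state; return the path."""
    rng = random.Random(seed)
    state, path = start, []
    for _ in range(steps):
        action = rng.choice(sorted(model[state]))
        path.append((state, action))
        state = model[state][action]
    return path

for state, action in random_walk(MODEL, steps=8, seed=7):
    print(f"{state}: {action}")
```

Each run with a different seed walks a different path, which is exactly the variation that repeating a recorded script can never give you.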

As with any automation, tests generated and executed by a MBTE are no better than the model provided by the human designer. If the test or model designer does not model something, that thing will not be tested. Good test design is essential to useful automation.
Use model-based automation as an interactive part of the overall testing process. Use automation to collect information for human testers instead of attempting to replace them.

Use your brain. Do some exploratory modeling. Model the behavior you expect from the application and use the MBTE like a probe droid to go out and confirm your expectations. Then use the results of each test execution to refine your model for the next execution. Continually update your models to improve your testing.

Go out and find new bugs.



Posted by Ben Simo

Sometimes you just gotta read the instructions.

Here's a short student film directed by a friend.


What's the message here? Which character do you relate to?

I think I relate to the person that broke the computer in the first place. I could'a done that.


June 6, 2007

Stupid Questions

Posted by Ben Simo

Why do we drive on parkways and park in driveways?

Why do noses run and feet smell?

If Jimmy cracks corn and no one cares, why is there a song about him?

Why do we call them restrooms when no one goes there to rest?

Why do you have to click the "Start" button to stop Windows?

Most of my life, I have been told that there are no such things as stupid questions. This was usually said to encourage me, and others, to not be afraid to learn. However, I am beginning to think that there is such a thing as a stupid question. I don't mean questions like the above. Coming up with the questions above requires some thought and I suspect they all have reasonable answers. The above questions are more silly than stupid.

So what do I consider to be a stupid question? A stupid question is a question that has little basis in intelligent thought. A stupid question is a question without the context required to provide an answer. A stupid question is one that the questioner would have realized has no answer had they thought about it.
(adj) stupid: lacking or marked by lack of intellectual acuity

(noun) question: a sentence of inquiry that asks for a reply
Before I continue, I admit that I have asked my share of stupid questions. I am, however, alarmed at the large number of stupid questions that software testers are asking in Internet discussion forums and newsgroups.

Here are some paraphrases of stupid questions I've recently seen posted online:
  • How can all tests be automated?
  • What are the limitations of [commercial functional test tool]?
  • What is functional testing? I don't want a definition, I want complete details.
  • What is the industry standard response time for web applications?
  • How much test case detail is required?
  • What is the best automation tool?
  • How do I test a [development platform] application?
  • What is the [one and only] definition for [fuzzy testing term]?
  • How do I do software testing?
  • What is the standard tester to developer ratio?
  • What's the best testing technique?
  • What are the CMM procedures for a test team of more than n people?
  • What is the role of the QA team?
  • How do I create test data?
  • How can I do exhaustive testing?
  • What is the best way to find bugs?
  • How many types of bug resolutions are there?
  • Who decides if a bug is resolved?
  • What's the difference between a requirement and a specification?
  • What is the formula for [magic metric that measures testing value without context]?
Most of these questions are unanswerable because they lack context or are made with the assumption that there is one right context-free answer. These questions may lead to interesting discussions but are not answerable with one-size-fits-all solutions.
Don't get stuck on stupid, reporters. We're moving forward.
... You are stuck on stupid. I'm not going to answer that question.

- Gen. Russel Honore
Many of the "senior" testers in online discussion forums answer stupid questions with the tact of General Honore. They are not trying to be rude. Most are not arrogant. They are experienced. Many have learned through their own failure that there are no magic solutions for general questions. Most of the experienced testers I've interacted with online are very willing to help. They are very willing to answer intelligent questions -- even if they disagree with a premise of the question.

Testing software is a context-sensitive intellectual task. An important aspect of testing is working through ambiguity to find and test what really matters. Testing is not a purely technical domain for which single best ways of doing things can be defined and applied regardless of context. Testers need to think and ask intelligent questions.

I asked plenty of questions when I was new to testing. I was given boundaries in which to work and was given freedom to think and learn within those boundaries. I had some great mentors that taught me a great deal about testing. The mentors provided me with good documentation, answered questions, and exemplified good testing practices. Some of the wisdom of my early mentors did not become clear to me until after I failed on my own. Experience is a great teacher. Sometimes we can learn from other people's successes and failures. Sometimes we have to learn on our own.

If you are new to testing, please ask questions. If you don't understand a term or technical detail, please ask. If a requirement is not clear, please ask. If you don't understand the context, please ask. If you need help, please ask. There are plenty of people able and willing to assist other testers. It would be foolish to pretend to know what you are doing when you do not. Asking for help or clarification is not a sign of weakness, it is a sign of intelligence.
Being ignorant is not so much a shame, as being unwilling to learn.
- Benjamin Franklin
Before asking a broad question, think about it. Ask yourself if it is answerable. Do a little research. Provide some context. Show that you care about the question and the requested answer. Realize that the specificity of your question is directly related to the specificity of the answer. General questions are unlikely to have a single answer. When you get an answer, test it. Try to think of situations in which the answer does not apply. Consider what new problems are created by any solution to an existing problem.
By three methods we may learn wisdom:
First, by reflection, which is noblest;
Second, by imitation, which is easiest;
and third by experience, which is the bitterest.
- Confucius

Now, why do we drive on parkways?


June 2, 2007


Posted by Ben Simo

Poka-Yoke is not a dance. It's not an event at a rodeo. It's not what my kids do to each other in the back seat of the car. Poka-Yoke is Japanese for "mistake-proofing". Poka-Yoke was developed by Japanese industrial engineer Shigeo Shingo. He realized that people cannot be expected to work like machines and consistently do everything the same way every time they do it. People make mistakes, and poorly designed processes can make it easier for people to err. Poka-Yoke's goal is to make it difficult for people to make mistakes through mistake prevention and detection.


Applied poka-yoke gives users warnings about incorrect behavior and directs users towards the correct behavior. Computer PS/2 keyboards and mice share the same physical connector design but the connectors are usually color-coded to indicate which device goes into which port on a computer. Some computing hardware is shipped with warning stickers on top of connectors telling users to read a manual or install software before plugging in the device.

Poka-Yoke also means stopping users from doing the wrong thing. Diesel fuel pump nozzles will not fit in a vehicle that requires gasoline. The ignition key cannot be removed from most cars with automatic transmissions if the car is not in "park". Most cars with manual transmissions cannot be started unless the clutch pedal is pressed. These safety features prevent users from making mistakes.


Some errors cannot be prevented or are too expensive to prevent. The application of poka-yoke demands that errors be detected when and where they occur so that action can be taken before mistakes become bigger problems. Modern space heaters will automatically shut off if they are kicked over. A great example of automatic error detection and correction is the SawStop table saw that automatically disengages when the blade touches something that conducts electricity -- such as fingers. (See the video below.)


Poka-Yoke Applied to Software

Poka-Yoke has existed in hardware products for decades. Poka-Yoke has improved quality and safety of many devices we use daily. While I do not like the behavior-shaping constraints of poka-yoke applied to intellectual tasks, directing and constraining user behavior is essential for good software. I do not advocate application of poka-yoke to the development process. I do advocate applying poka-yoke thinking to every stage of the software development life cycle to improve the quality of the software products we produce. Designers should think poka-yoke. Coders should think poka-yoke. Testers should think poka-yoke. Thinking about usability can lead to fewer bugs.

We are human and there will be bugs. To the extent that quality assurance fails at its primary purpose -- bug prevention -- it must achieve a secondary goal of bug detection.
- Boris Beizer
Software Testing Techniques


Keep it simple. Make it easy for users to identify the expected correct way to use the software. Warn them if they try to do something wrong. Don't overwhelm users with unnecessary options.

When the risk of users not following a warning is great, prevent users from doing bad things. Things like list boxes and radio buttons can prevent users from entering invalid data. Data input constraints keep users and the software on the expected path. The security risks in web applications increase the necessity to prevent users from doing what they are not supposed to do.


It is especially important to detect errors that get past the warnings and constraints and stop processes before errors develop into bigger problems. The earlier an error is detected the easier it is to recover. Bad data detected when it enters a system does not have a chance to cascade into the rest of the system.
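Poka-yoke thinking translates directly into code at the point where data enters a system: constrain the input to valid choices up front, then detect anything that slips past before it cascades. A hypothetical sketch (the form fields and limits are invented):

```python
# Poka-yoke in a hypothetical order form handler: constrain, then detect.
VALID_SHIPPING = {"standard", "express", "overnight"}  # a list box, not free text

def accept_order(quantity, shipping):
    errors = []
    if not (1 <= quantity <= 99):           # prevention: a bounded spinner
        errors.append(f"quantity must be 1-99, got {quantity}")
    if shipping not in VALID_SHIPPING:      # detection at the boundary
        errors.append(f"shipping must be one of {sorted(VALID_SHIPPING)}, "
                      f"got {shipping!r}")
    if errors:
        return ("rejected", errors)         # stop before bad data cascades
    return ("accepted", [])

print(accept_order(3, "express"))
print(accept_order(0, "teleport"))
```

Rejecting bad data at the door is the software equivalent of the diesel nozzle that will not fit a gasoline filler: the mistake is caught where it happens, not three systems downstream.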

Poka-yoke thinking can improve usability and prevent bugs.

Some Poka-yoke resources on the web:


June 1, 2007

Model-Based Test Engine Benefit #3: Automatic handling of application changes and bugs

Posted by Ben Simo

Automated tests based on models have one important feature that scripted tests cannot offer: automated handling of application changes and bugs. I do not mean that model-based automation can think and make decisions like a human tester does when they discover something unexpected. Instead, the automated selection of test steps supports working around the unexpected without special exception handling code for each situation.

For example: If there are two methods for logging into an application and one breaks, the test engine can try the alternate option to reach the rest of the application. If a traditional scripted automated test encounters an unexpected problem, it will not be able to complete.

The model-based test engine (MBTE) can be coded to stop trying an action after a predefined number of failures. The MBTE's selection algorithm can then seek out other options that have not yet been found to fail. This also results in the MBTE reattempting failed actions and exposing failures that only occur after specific sequences of actions.

To facilitate error detection, each action and validation should return a status to the MBTE framework. This allows error handling to be built into the framework instead of each test model or script. Standard error codes -- either your own or the tool's built-in codes -- help standardize reporting.

For example: return a zero (0) when an action successfully completes or a validation passes, return a negative number on failure, and return a positive number for inconclusive results that require manual investigation.
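That convention might look like the following in a framework's dispatch loop. This is a hypothetical sketch, not any particular tool's API; the function and action names are invented.

```python
# Hypothetical MBTE status convention: 0 = pass, negative = failure,
# positive = inconclusive (needs manual investigation).
PASS, FAIL, INCONCLUSIVE = 0, -1, 1

def handle_result(status, action_name, log):
    """Centralized handling: the framework, not each script, decides
    what a status code means and what to do next."""
    if status == 0:
        log.append(f"{action_name}: passed; run end-state validations")
        return "validate"
    if status < 0:
        log.append(f"{action_name}: failed (code {status}); recovering")
        return "recover"
    log.append(f"{action_name}: inconclusive (code {status}); flag for review")
    return "review"

log = []
print(handle_result(PASS, "submit_login", log))
print(handle_result(FAIL, "submit_login", log))
print(handle_result(INCONCLUSIVE, "submit_login", log))
```

Because every action speaks the same status language, new models get the framework's error handling for free.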

Code the test engine to detect the error status of each action and validation and take appropriate action. If an action passes, perform the validations for the action's expected end state. If an action fails, restart the application or do whatever other error recovery fits your situation.

If a validation fails, you can either proceed to the next validation or flag certain validation failures as ones that should stop further validation.

Validations can also be flagged as state-changing failures by adding a "fail state" column to the oracle/validation tables. Give this field the name of the state that the application is in if the validation fails. You can even build standard states such as "restart" into the framework to indicate that the state is unknown and the application needs to be restarted. For example, a validation that an HTTP 404 error page is not displayed could have a "fail state" of "restart" defined to indicate that the application should be restarted when this validation fails.
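A data-driven validation table with a fail-state column might be sketched like this. The validation names, page checks, and states are all invented for illustration; a real table would likely live in a spreadsheet or database rather than code.

```python
# Invented oracle/validation table: each row names a check and the state
# the application is in if that check fails. "restart" is a framework-
# standard state meaning "unknown; restart the application".
VALIDATIONS = [
    {"name": "no_http_404",   "check": lambda page: "404" not in page,
     "fail_state": "restart"},
    {"name": "title_present", "check": lambda page: "<title>" in page,
     "fail_state": None},     # non-fatal: keep validating in place
]

def run_validations(page, current_state):
    """Run table-driven checks; a failing row with a fail_state
    overrides the engine's notion of the current state."""
    failures = []
    for row in VALIDATIONS:
        if not row["check"](page):
            failures.append(row["name"])
            if row["fail_state"]:
                current_state = row["fail_state"]
    return current_state, failures

print(run_validations("<html>404 Not Found</html>", "home"))
```

The engine never needs per-model recovery code: the table says which failures change state, and the framework does the rest.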

Julian Harty has suggested that validations can be weighted and test execution be varied based on the combined score of failures.

Build error handling into the framework so that you can define the details with data instead of code.