A Client Evaluation

contents of this chapter:

introduction
early troubles
the testing phase
dealing with bugs
conclusion

introduction

This project had two clients, Anthony Kroch and Ann Taylor. The following evaluation is from Ann Taylor. Anthony Kroch is on my thesis committee, so he will be heard from at my thesis defense.

early troubles

As primary client/tester/user of CorpusSearch, I am extremely satisfied with the product, as well as with the way the development process has been going. The whole venture was something of a leap of faith on both sides, since I, as the client, had only a limited idea of what I wanted and didn't know how it should be implemented, while Beth, at the time, knew very little about linguistics and had no clear notion of what it was we were asking her to do. Indeed, our first attempts at communication were rather unsuccessful. Many concepts which were self-evident to me as a linguist meant something entirely different to Beth, with the result that our first attempts at searching produced rather mysterious results. It quickly became clear that it was crucial for Beth to really understand the primary linguistic concepts involved. Thus I worked to state more explicitly what I was looking for and to explain it more clearly, and Beth started to learn some linguistics. Things got easier as we went along, but much discussion and several attempts were (and are) still often necessary to get the program to do exactly what we want.

the testing phase

It turned out to be quite difficult for me to anticipate in the abstract what features the program would need, so although the bare bones were in place early on, it was only when we began serious testing that we started to make real progress towards the program as it stands today.

While using the program, it became clear that there were many ways in which it would be helpful to be able to control the search and then present the output so that it would be maximally useful. Most of this was accomplished through the addition of query file commands. Two very useful commands that were added in order to simplify searching are the nodes_only and remove_nodes commands. We originally started with the nodes_only command alone, but because of the problem of embedded nodes, we added the remove_nodes option. When this is activated, the contents of all irrelevant sub-sentences embedded under the boundary node are removed. In this way, you can be certain that the output file contains all and only the nodes you're interested in, and subsequent queries will be carried out only on those nodes. With these two commands you can use a series of very simple queries to progressively divide the data into more and more narrowly defined sets. Another useful command that we discovered a need for while testing the program is the print_complement command. Like the nodes_only and remove_nodes commands, print_complement makes for easier, quicker and more accurate searches. All the query file commands have defaults which, we have found through experience, work best for the most common types of searches linguists make. All the settings can be changed, however, so the user can override the defaults and retain maximum flexibility.

Most of the testing of CorpusSearch has been done on "real" searches carried out while collecting data for a paper on Middle English dialect differences. The amount and variety of data required for this study, as well as the complexity of the queries, have put CorpusSearch through a very rigorous real-world test. The remainder of the testing is being done as part of the search for examples for the manual describing the Middle English corpus. This exercise, while generally requiring less complex queries, has exposed CorpusSearch to searches for a very wide variety of structures. This testing has been useful both for turning up bugs of various types, all of which Beth has cheerfully dealt with, and for fine-tuning the program to make it maximally useful to the average linguist. I have found through this extensive testing process that CorpusSearch provides the power to search for very intricate structures, without requiring the user to write equally intricate queries.

dealing with bugs

A final, extremely helpful feature of a rather different type that Beth has added to the program checks for errors in the format of the corpus itself. During an ordinary search, CorpusSearch, when confronted with badly formed input, simply reports a format error and continues. The error is only reported, however, if it affects the search. The bug-hunting feature, on the other hand, goes through the whole corpus specifically looking for various kinds of common errors, such as unbalanced parentheses, lack of a wrapper, no ID node, and illegal labels. Thus CorpusSearch is useful in the corpus-building stage as well as for searching the finished product.
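
The kinds of checks listed above can be illustrated with a minimal sketch. This is not CorpusSearch's own code. It assumes a Penn-Treebank-style labeled bracketing in which each tree is wrapped in an outer unlabeled pair of parentheses and carries an (ID ...) node; the function name check_token and the LEGAL_LABEL pattern are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical rule for a legal label: upper-case letters, digits,
# and a few punctuation marks. A real corpus would define its own set.
LEGAL_LABEL = re.compile(r"^[A-Z0-9$*+=\-]+$")


def check_token(text):
    """Return a list of format problems found in one bracketed tree."""
    problems = []

    # Unbalanced parentheses: track nesting depth character by character.
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append("unbalanced parentheses: extra ')'")
                break
    if depth > 0:
        problems.append("unbalanced parentheses: missing ')'")

    # Wrapper (assumed convention): the tree opens with an unlabeled
    # parenthesis followed directly by the root node's parenthesis.
    if not re.match(r"^\(\s*\(", text.strip()):
        problems.append("no wrapper")

    # ID node (assumed convention): a node labeled ID somewhere in the tree.
    if not re.search(r"\(ID\s", text):
        problems.append("no ID node")

    # Labels: the text immediately following each opening parenthesis.
    for label in re.findall(r"\(([^\s()]+)", text):
        if not LEGAL_LABEL.match(label):
            problems.append(f"illegal label: {label}")

    return problems


good = "( (IP-MAT (NP-SBJ (PRO he)) (VBD came)) (ID EXAMPLE,1.1))"
bad = "(IP-MAT (np-sbj (PRO he)) (VBD came)"

print(check_token(good))
print(check_token(bad))
```

In this sketch the well-formed tree yields an empty list, while the second tree is flagged for a missing closing parenthesis, the missing wrapper, the missing ID node, and the lower-case label. Unlike an error report produced during a search, a pass like this looks at every tree regardless of whether any query touches it.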

conclusion

All in all, developing this program has been far more work than I had ever anticipated, but the results have also been far more successful. The experience itself has also been interesting and rewarding, largely, I think, because of Beth's willingness to engage in the project, to really try to understand what we as the client wanted, and to produce that result for us.