Programming Project #3 COSC 311 Distributed 2/14/2013 Due: 3/14/2013 Are Binary Search Trees O(log n) for the delete operation? This project will require you to demonstrate that binary search trees are O(log n) for deletion. This will involve collecting experimental (timing) data for trees of various sizes n (at least 5 different values for n), graphing the results on graph paper, and attempting to fit a curve c * log(n). The graphing and curve-fitting may be done in a separate step, using pen-and-paper or Excel or other tools. There is one major wrench thrown in the works. That is, you will need to demonstrate BSTs are O(log n) for deletion on both (1) array-based, *in-memory* BSTs and on (2) array-based, *on-disk* BSTs. The experimental data for each kind of implementation must be kept separate. The on-disk, external BST *must* be implemented using random access files in Java! Because internal (in-memory) implementations are so much faster than external (hard disk) implementations, the choices of n will be different for each implementation. In each case, you should have at least 5 different values for n and they should cover a reasonable range (producing significant differences in time). To simplify the data collection, a small perturbation is allowed: after you generate a random BST of size n, then delete 0.01 * n nodes (rather than just 1 node), then divide the total delete time by 0.01*n in order to get time/single-node. So, if n = 5*10^4, then you will delete 5*10^2 nodes. The average time to delete a single node from the tree is the total time / 500. Caveats: (1) Do NOT time anything except for actual deletions. If a value targeted for deletion is not in the tree, then you must not count the time required to find out that the value is not in the tree. (2) Do target values *randomly*. Do not delete nodes in the same order they were inserted or in any other ordered fashion. (3) The BSTs you initially build must be random BSTs. That is the data generated is randomly obtained. In order to reduce the probabilities of targeting a value for deletion that has not been inserted, then choose values for insertion from a fairly small subset of possibles and allow for multiple nodes with the same value in the tree. This means, a given value will appear multiple times in a single BST. (4) The type of data should not have an effect on the timing. This means, keep the data as simple as possible so that tests for <= and = are as fast as possible. This means use type int. (5) A BST stored on disk may not (for this experiment to be valid) be stored as a stream file. It must be stored in a random access file. Here is the basic experimental setup for each implementation: Repeat for several different values of n (1) Generate random BST size n. (2) Repeat k times: generate random value delete value (2*) if the value to delete is not in the tree, do not count the time to search for it in the tree. (3) Calculate average time to delete single node for size n. Turn in: (1) Hard copy of internal BST implementation (2) Hard copy of external BST implementation (3) Screen shot of one run of internal delete (4) Screen schot of one run of external delete (5) Tabulated data from each implementation (6) Graph of data with fitted curve (7) One to two paragraph conclusion: do both implementations have O(log n) behavior? Give a proposal for any oddities in the data. Grade based on: 50% for internal BST 50% for external BST Each of those are graded based on: meets specs (i.e., compiles and works properly satisfying a code walk-thru if requested elegance of programming solution readability (i.e., documentation and "look") of programming solution written conclusion