Abstract
Search algorithms or methodologies play an important role in any database, data mining, or any professional application. Given their importance, no new search algorithm was developed in the past decade. Fora search methodology to be applicable, it has to be novel and faster than existing algorithms. This paper will present a new search methodology that has been awarded a patent for novelty (United States Patent no. 9009200). This search method will be called Thabit's algorithm in this paper. Moreover, the paper will show some test results to show the superiority of this methodology to several search algorithms like binary search, hashing, AVL trees and other search algorithms. For example, when searching a list of 50 million airline PNR numbers, the search time for Microsoft Hash was 18649ms, but when the list became one billion elements the search time increased to 2070318ms with a collision of 57%. Binary search was fairly stable with a search time of 116897ms in a list of 50 million and 2771639ms in the list of one billion elements. On the other hand, searching with Thabit's algorithm took 8517ms for a list of 50 million elements and 194739ms for the list of one billion elements. The AVL tree had 249269ms and 11995298 ms respectively. The comparison was done for space requirements, where binary search took 44703.6 MB for the billion elements. Thabit's algorithm, Microsoft Hash and AVL tree required 60883.3 MB, 115578.4 MB, and 104000.3 MB of space, respectively.
To achieve these results, a graduate student was employed and a makerplace lab was created using a server and two laptops. In the makerplace lab the Thabit's algorithm was implemented. The implementation involved processing each item one character at a time or two, three, four,... etc characters at a time and checking which data structures led to better performance. In the Thabit's algorithm, three dimensional arrays ware replaced with one dimensional arrays. Furthermore, much study went into data size; checking the effect of increasing the size of the data and finding when thrashing starts. Finally, trying to run the programs in parallel, and checking speed gain and how this gain compares to Amdahl's law. Also, tests were done on searching through a large number of 16 digit debit card with great results using Thabit's algorithm.