Spiders Are Us
 
 Research Goal  

Although Web users increasingly want small-scale search engines for specific domains and languages, most existing search engine development tools do not support languages other than English, cannot be integrated with other applications, or rely on proprietary software. A tool that supports search engine creation in multiple languages is therefore highly desirable. To study the research issues involved, we designed and implemented SpidersRUs, a toolkit for multilingual search engine creation. The toolkit consists of a Spider module, an Indexer module, a Search module, a Graphical User Interface module, and an Index Structure. This study demonstrates that the proposed architecture can be used to develop search engines effectively and efficiently in different languages, such as Chinese, Spanish, Japanese, and Arabic.

 Funding  

This project has been supported in part by the following grants:

IIS-9817473, April 1999 – March 2002
NSF Digital Library Initiative-2
High-performance Digital Library Systems: From Information Retrieval to Knowledge Management

DUE-0121741, September 2001 – August 2003
NSF National SMETE Digital Library
Intelligent Collection Services for and about Educators and Students: Logging, Spidering, Analysis and Visualization

 

 Acknowledgements  

We would like to thank Chia-Jung Hsu for his contribution to this project. We would also like to thank the other members of the Artificial Intelligence Lab at the University of Arizona who tested the toolkit and shared their ideas and comments with us.

 Approach & Methodology  
In this study, we reviewed the related literature and proposed criteria for an ideal search engine development tool. We then proposed an architecture for a multilingual search engine building tool and implemented it in the Java programming language. The tool consists of a Spider module, an Indexer module, a Search module, a Graphical User Interface module, and an Index Structure. We also conducted a case study in which the tool was used to develop a medical search engine in Chinese, and demonstrated the effectiveness and efficiency of the toolkit.
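As a rough illustration of how a Spider module, an Indexer module, a Search module, and an Index Structure can fit together, the following minimal Java sketch wires up toy versions of each. It is not the SpidersRUs code: the class `ToolkitSketch`, its method names, the in-memory "web", the example URLs, and the tokenization rule (one token per Chinese or Japanese character, letter-and-digit runs for other scripts) are all assumptions made for this example. The Graphical User Interface module is omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: names are illustrative and are not the SpidersRUs API.
class ToolkitSketch {

    // Spider stand-in: "fetches" seed URLs from an in-memory map instead of the network.
    static Map<String, String> spider(Map<String, String> web, List<String> seeds) {
        Map<String, String> pages = new LinkedHashMap<>();
        for (String url : seeds) {
            String body = web.get(url);
            if (body != null) {
                pages.put(url, body);
            }
        }
        return pages;
    }

    // Simplistic tokenizer (an assumption): Han, Hiragana, and Katakana characters
    // become one token each; other letters and digits are grouped into lowercased words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            boolean unsegmented = script == Character.UnicodeScript.HAN
                    || script == Character.UnicodeScript.HIRAGANA
                    || script == Character.UnicodeScript.KATAKANA;
            if (!unsegmented && Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                if (unsegmented) {
                    tokens.add(new String(Character.toChars(cp)));
                }
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    // Indexer: builds the index structure, here an inverted index from term to page URLs.
    static Map<String, Set<String>> index(Map<String, String> pages) {
        Map<String, Set<String>> postings = new TreeMap<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            for (String term : tokenize(page.getValue())) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(page.getKey());
            }
        }
        return postings;
    }

    // Search: returns the pages that contain every term of the query (boolean AND).
    static Set<String> search(Map<String, Set<String>> postings, String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> docs = postings.getOrDefault(term, Collections.<String>emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }

    public static void main(String[] args) {
        Map<String, String> web = new LinkedHashMap<>();
        web.put("http://example.org/zh", "糖尿病的治疗方法");
        web.put("http://example.org/es", "Tratamiento de la diabetes");

        List<String> seeds = new ArrayList<>(web.keySet());
        Map<String, Set<String>> postings = index(spider(web, seeds));

        System.out.println(search(postings, "糖尿病"));
        System.out.println(search(postings, "DIABETES tratamiento"));
    }
}
```

Splitting Chinese and Japanese text into single characters is only one simple way to index languages written without spaces; a real toolkit would likely need more careful word segmentation and per-language handling of encodings, stop words, and ranking.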

 Team Members  
   Dr. Hsinchun Chen hchen@eller.arizona.edu
   Chia-Jung Hsu
   Chunju Tseng chunju@u.arizona.edu
Alumni Team Members
   Michael Chau  
   Jialun Qin  
   Wei Xi  
   Yilu Zhou  

 Publications  
  • Chau, M., Qin, J., Zhou, Y., Tseng, C., and Chen, H., "SpidersRUs: Automated Development of Vertical Search Engines in Different Domains and Languages," in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'05), Denver, Colorado, USA, June 7-11, 2005.
  • Qin, J., Zhou, Y., and Chau, M., "Building Domain-Specific Web Collections for Scientific Digital Libraries: A Meta-Search Enhanced Focused Crawling Method," in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'04), Tucson, Arizona, USA, June 7-11, 2004, pp. 135-141.
  • Chau, M., Huang, Z., and Chen, H., "Teaching Key Topics in Computer Science and Information Systems through a Search Engine Project," ACM Journal on Educational Resources in Computing (JERIC), 3(3), 1-14, 2003.
  • Chen, H., Fan, H., Chau, M., and Zeng, D., "Testing a Cancer Meta Spider," International Journal of Human-Computer Studies (IJHCS), 59(5), 755-776, 2003.
  • Chau, M. and Chen, H., "Comparison of Three Vertical Search Spiders," IEEE Computer, 36(5), 56-62, 2003.
  • Chen, H., Lally, A. M., Zhu, B., and Chau, M., "HelpfulMed: Intelligent Searching for Medical Information over the Internet," Journal of the American Society for Information Science and Technology (JASIST), 54(7), 683-694, 2003.
  • Chau, M., Zeng, D., Chen, H., Huang, M., and Hendriawan, D., "Design and Evaluation of a Multi-agent Collaborative Web Mining System," Decision Support Systems (DSS), Special Issue on Web Retrieval and Mining, 35(1), 167-183, 2003.
  • Chen, H., Chau, M., and Zeng, D., "CI Spider: A Tool for Competitive Intelligence on the Web," Decision Support Systems (DSS), 34(1), 1-17, 2002.
  • Chen, H., Fan, H., Chau, M., and Zeng, D., "MetaSpider: Meta-Searching and Categorization on the Web," Journal of the American Society for Information Science and Technology (JASIST), 52(13), 1134-1147, 2001.
  • Chau, M., "Spidering and Filtering Web Pages for Vertical Search Engines," in Proceedings of The Americas Conference on Information Systems, AMCIS 2002 Doctoral Consortium, Dallas, Texas, August 8-11, 2002.
  • Chau, M., Chen, H., Qin, J., Zhou, Y., Qin, Y., Sung, W. K., and McDonald, D., "Comparison of Two Approaches to Building a Vertical Search Tool: A Case Study in the Nanotechnology Domain," in Proceedings of The Second ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'02), Portland, Oregon, USA, July 14-18, 2002, pp. 135-144.
  • Chau, M., Zeng, D., and Chen, H., "Personalized Spiders for Web Search and Analysis," in Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'01), Roanoke, Virginia, USA, June 24-28, 2001, pp. 79-87.
  • Chen, H., Chung, Y., Ramsey, M., and Yang, C., "A Smart Itsy Bitsy Spider for the Web," Journal of the American Society for Information Science, Special Issue on AI Techniques for Emerging Information Systems Applications, Volume 49, Number 7, Pages 604-618, 1998.
  • Chen, H., Chung, Y., Ramsey, M., and Yang, C., "An Intelligent Personal Spider (Agent) for Dynamic Internet/Intranet Searching," Decision Support Systems, Volume 23, Pages 41-58, 1998.

 
 

Eller College of Management | The University of Arizona
1130 E. Helen Street | P.O. Box 210108 | Tucson, AZ 85721-0108 | 520.621.6219

© Copyright The University of Arizona. All rights reserved.