Spiders Are Us
|
| Research Goal |
|
While small-scale search engines in specific domains and languages are increasingly desired by Web users, most existing search engine development tools do not support the development of search engines in languages other than English, cannot be integrated with other applications, or rely on proprietary software. A tool that supports search engine creation in multiple languages is thus highly desired. To study the research issues involved, we designed and implemented a toolkit, called SpidersRUs, for multilingual search engine creation. The toolkit consists of a Spider module, an Indexer module, a Search module, a Graphical User Interface module, and an Index Structure. This study demonstrates that the proposed architecture can be used to develop search engines effectively and efficiently in different languages, such as Chinese, Spanish, Japanese, and Arabic. |
|
| Funding |
|
This project has been supported in part by the following grants:
| IIS-9817473 | April 1999 – March 2002 |
| NSF Digital Library Initiative-2 |
| High-performance Digital Library Systems: From Information Retrieval to Knowledge Management |
| DUE-0121741 | September 2001 – August 2003 |
| NSF National SMETE Digital Library |
| Intelligent Collection Services for and about Educators and Students: Logging, Spidering, Analysis and Visualization |
|
|
|
| Acknowledgements |
|
We would like to thank Chia-Jung Hsu for his contribution to this project. We would also like to thank other members of the Artificial Intelligence Lab at the University of Arizona who have tested the toolkit and shared with us their ideas and comments. |
|
| Approach & Methodology |
|
| In this study, we reviewed related literature and proposed criteria for an ideal search tool. We then proposed an architecture for a multilingual search engine building tool and implemented it in the Java programming language. The tool consists of a Spider module, an Indexer module, a Search module, a Graphical User Interface module, and an Index Structure. We also conducted a case study in which the tool was used to develop a medical search engine in Chinese, demonstrating the effectiveness and efficiency of the toolkit. |
|
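The way the modules fit together can be illustrated with a minimal Java sketch. This is not the SpidersRUs code: the class name `MiniSearchToolkit`, its methods, and the sample pages are hypothetical, the Spider is replaced by an in-memory map of already fetched pages, and the tokenizer (one term per ideograph, lowercase letter runs otherwise) is a simplifying assumption chosen so that the same index can hold Chinese and Spanish text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MiniSearchToolkit {
    // Index Structure: an inverted index mapping each term to the documents that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Language-neutral tokenizer (an assumption of this sketch): every ideograph
    // becomes its own term; other letters and digits are grouped into lowercase words.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isIdeographic(cp)) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, tokens); // whitespace or punctuation ends the current word
            }
        }
        flush(word, tokens);
        return tokens;
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    // Indexer: add every term of a fetched page to the inverted index.
    public void index(String docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Search: return the documents that contain all query terms (Boolean AND).
    public Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        // Stand-in for the Spider: pages that have already been fetched.
        Map<String, String> pages = new HashMap<>();
        // Chinese sample text ("treatment of diabetes"), written with Unicode escapes.
        pages.put("doc-zh", "\u7cd6\u5c3f\u75c5\u7684\u6cbb\u7597");
        pages.put("doc-es", "Tratamiento de la diabetes");

        MiniSearchToolkit toolkit = new MiniSearchToolkit();
        pages.forEach(toolkit::index);

        System.out.println(toolkit.search("\u7cd6\u5c3f\u75c5"));
        System.out.println(toolkit.search("DIABETES tratamiento"));
    }
}
```

Splitting ideographs into single-character terms avoids the need for a word segmenter, at the cost of some precision. A production indexer would more likely use language-specific analysis for each target language.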
| Team Members |
|
|
|
| Alumni Team Members |
| Michael Chau |
| Jialun Qin |
| Wei Xi |
| Yilu Zhou |
|
|
|
|
| Publications |
|
- Chau, M., Qin, J., Zhou, Y., Tseng, C., and Chen, H., "SpidersRUs: Automated Development of Vertical Search Engines in Different Domains and Languages," in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'05), Denver, Colorado, USA, June 7-11, 2005.
- Qin, J., Zhou, Y., and Chau, M., "Building Domain-Specific Web Collections for Scientific Digital Libraries: A Meta-Search Enhanced Focused Crawling Method," in Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'04), Tucson, Arizona, USA, June 7-11, 2004, pp. 135-141.
- Chau, M., Huang, Z., and Chen, H., "Teaching Key Topics in Computer Science and Information Systems through a Search Engine Project," ACM Journal on Educational Resources in Computing (JERIC), 3(3), 1-14, 2003.
- Chen, H., Fan, H., Chau, M., and Zeng, D., "Testing a Cancer Meta Spider," International Journal of Human-Computer Studies (IJHCS), 59(5), 755-776, 2003.
- Chau, M. and Chen, H., "Comparison of Three Vertical Search Spiders," IEEE Computer, 36(5), 56-62, 2003.
- Chen, H., Lally, A. M., Zhu, B., and Chau, M., "HelpfulMed: Intelligent Searching for Medical Information over the Internet," Journal of the American Society for Information Science and Technology (JASIST), 54(7), 683-694, 2003.
- Chau, M., Zeng, D., Chen, H., Huang, M., and Hendriawan, D., "Design and Evaluation of a Multi-agent Collaborative Web Mining System," Decision Support Systems (DSS), Special Issue on Web Retrieval and Mining, 35(1), 167-183, 2003.
- Chen, H., Chau, M., and Zeng, D., "CI Spider: A Tool for Competitive Intelligence on the Web," Decision Support Systems (DSS), 34(1), 1-17, 2002.
- Chen, H., Fan, H., Chau, M., and Zeng, D., "MetaSpider: Meta-Searching and Categorization on the Web," Journal of the American Society for Information Science and Technology (JASIST), 52(13), 1134-1147, 2001.
- Chau, M., "Spidering and Filtering Web Pages for Vertical Search Engines," in Proceedings of The Americas Conference on Information Systems, AMCIS 2002 Doctoral Consortium, Dallas, Texas, August 8-11, 2002.
- Chau, M., Chen, H., Qin, J., Zhou, Y., Qin, Y., Sung, W. K., and McDonald, D., "Comparison of Two Approaches to Building a Vertical Search Tool: A Case Study in the Nanotechnology Domain," in Proceedings of The Second ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'02), Portland, Oregon, USA, July 14-18, 2002, pp. 135-144.
- Chau, M., Zeng, D., and Chen, H., "Personalized Spiders for Web Search and Analysis," in Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'01), Roanoke, Virginia, USA, June 24-28, 2001, pp. 79-87.
- Chen, H., Chung, Y., Ramsey, M., and Yang, C., "A Smart Itsy Bitsy Spider for the Web," Journal of the American Society for Information Science, Special Issue on AI Techniques for Emerging Information Systems Applications, 49(7), 604-618, 1998.
- Chen, H., Chung, Y., Ramsey, M., and Yang, C., "An Intelligent Personal Spider (Agent) for Dynamic Internet/Intranet Searching," Decision Support Systems, 23, 41-58, 1998.
|