Arthurlpgc/InfoRetrievalProject

Websites included

The crawler retrieves information from the following online judges:

Running Crawler

To run the crawler, follow these steps:

First, make sure you have Python 3.6 and pip installed on your system. Then:
  1. Go to src folder: cd src
  2. Install project requirements: pip install -r requirements.txt
  3. Run the crawler: scrapy runspider crawler/questions.py

This starts a heuristic-guided breadth-first search in the spider module, which downloads all pages in the specified domain. You can watch the retrieved pages appear on the fly in the src/retrieved/documents and src/retrieved/objects folders.
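The actual spider is a Scrapy module, but the breadth-first traversal it performs can be sketched with the standard library alone. The snippet below is an illustrative sketch, not the project's code: the `LINKS` graph and function names are hypothetical stand-ins for links that would really come from parsing downloaded HTML.

```python
from collections import deque

# Hypothetical in-memory "site" standing in for real pages; in the
# actual spider these links come from parsing downloaded HTML.
LINKS = {
    "/": ["/problems", "/contests"],
    "/problems": ["/problems/1", "/problems/2"],
    "/contests": ["/problems/1"],
    "/problems/1": [],
    "/problems/2": [],
}

def bfs_crawl(start):
    """Visit pages level by level, skipping URLs already seen."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)  # here the real spider would save the page to disk
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

# bfs_crawl("/") visits "/" first, then both of its children,
# then their children: a level-by-level (breadth-first) order.
```

The `seen` set is what keeps the crawl from revisiting pages reachable through multiple links, which matters on online judges where many contest pages link to the same problems.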

Creating an Index

After running the crawler and retrieving documents, you have to set up an index manually before you can work with the data. To do this:

  1. Go to src folder: cd src
  2. Run the indexer: python3 indexer/indexer.py

It searches for documents stored at src/retrieved/objects and creates the corresponding indexes. The indexes will be available for later queries in the src/indexes folder.
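The project's indexer lives in indexer/indexer.py; as a sketch of the core structure such an indexer typically builds, here is a minimal inverted index over in-memory documents. The document contents and function name are illustrative assumptions, not taken from the repository.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it.

    `docs` is a dict of {doc_id: text}; the real indexer would read the
    crawled objects from src/retrieved/objects instead.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenizer
            index[term].add(doc_id)
    # Sorted posting lists make later query intersection straightforward.
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical crawled problem statements:
docs = {1: "two sum problem", 2: "shortest path problem", 3: "two pointers"}
index = build_inverted_index(docs)
```

A query for a term then reduces to a dictionary lookup (e.g. `index["two"]` yields the posting list `[1, 3]`), and multi-term queries intersect posting lists.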
