URL Extractor (Web Scraping)

### Requirements

  1. Write a Python program that takes an input URL from the user.
  2. Scrape all the "a href" tags from the web page.
  3. Display the results in a GUI.
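
The first two requirements could be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the function names `extract_links` and `fetch_links`, the `timeout` value, and the sample HTML are all assumptions made for the example.

```python
import requests
from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href value of every <a> tag that has one, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors such as <a name="top"> that carry no href
    return [a["href"] for a in soup.find_all("a", href=True)]


def fetch_links(url, timeout=10):
    """Download the page at `url` and return the links found in it."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return extract_links(response.text)


# Hypothetical sample page, parsed without any network access
sample = (
    '<p><a href="/about">About</a> '
    '<a name="top">no href</a> '
    '<a href="https://example.com/">Home</a></p>'
)
print(extract_links(sample))  # ['/about', 'https://example.com/']
```

Keeping the parsing step separate from the download makes it possible to try the extraction on a saved HTML string before pointing `fetch_links` at a live site.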

### Packages Used

  1. BeautifulSoup - Beautiful Soup is a Python package for parsing HTML and XML documents. It builds a parse tree for each parsed page, which can then be used to extract data from the HTML; this makes it useful for web scraping.

  2. Requests - Requests is an elegant and simple HTTP library for Python.

  3. Tkinter - Tkinter is the standard Python interface to the Tk GUI toolkit and Python's de facto standard GUI library.

Instructions

$ pip install -r requirements.txt

Run the application; it will ask you to enter a URL. Once you click the "Get URL" button, it populates the listbox with all the internal URLs of the web page.
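
How the application decides that a URL is "internal" is not stated here. One plausible reading is "on the same host as the page that was entered", and under that assumption the filtering step might look like the sketch below. The helper name `internal_links` and the example URLs are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def internal_links(page_url, hrefs):
    """Resolve each href against page_url and keep those on the same host."""
    host = urlparse(page_url).netloc
    result = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # turns relative hrefs into full URLs
        parsed = urlparse(absolute)
        # Assumption: "internal" means an http(s) URL whose host matches the page
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            result.append(absolute)
    return result


links = internal_links(
    "https://example.com/docs/",
    ["/about", "page.html", "https://other.org/x", "mailto:me@example.com"],
)
print(links)
# ['https://example.com/about', 'https://example.com/docs/page.html']
```

Resolving relative links first means that entries such as `/about` appear in the listbox as complete URLs.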

Future Enhancements

  1. Add logic to refresh the listbox
  2. Add logic to refresh the application and reset the Entry widget so that it regains focus for the next search.
  3. Improve the appearance of the GUI.
  4. Add topic-modelling features to identify the topic of the web page.
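
The first two enhancements could be approached along these lines. This is only a sketch under assumptions: the helper names `show_links`, `reset_search`, and `build_demo`, the "Reset" button, and the widget layout are invented for illustration and are not taken from the application.

```python
import tkinter as tk


def show_links(listbox, links):
    """Replace the listbox contents with `links`."""
    listbox.delete(0, tk.END)  # drop results from the previous search
    for link in links:
        listbox.insert(tk.END, link)


def reset_search(entry, listbox):
    """Clear old results and the typed URL, then return focus to the entry."""
    listbox.delete(0, tk.END)
    entry.delete(0, tk.END)
    entry.focus_set()


def build_demo():
    """Build a small window wiring the helpers to hypothetical widgets."""
    root = tk.Tk()
    entry = tk.Entry(root, width=50)
    entry.pack()
    listbox = tk.Listbox(root, width=80)
    listbox.pack()
    tk.Button(
        root, text="Reset", command=lambda: reset_search(entry, listbox)
    ).pack()
    show_links(listbox, ["https://example.com/a", "https://example.com/b"])
    return root
```

To try it on a machine with a display, call `build_demo().mainloop()`. Clearing the listbox at the start of `show_links` keeps results from consecutive searches from piling up.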

Author

Harpreet Siddhu