QUAD

🚀 Group Members QUAD

Terence Loorthanathan

RISHMA FATHIMA BINTI BASHER

CHONG KAI ZHE

NUR SYAMALIA FAIQAH BINTI MOHD KAMAL

Lxml

In this notebook, we will show you how to scrape a website using lxml. lxml is a Python library for parsing and manipulating XML and HTML documents. It provides a way to navigate, search, and modify the elements and attributes of an XML or HTML document using a simple and consistent API.

The library is built on top of the libxml2 and libxslt C libraries, which provide fast and efficient parsing and manipulation of XML and HTML documents. lxml wraps them in a Pythonic API that is intuitive for Python programmers while remaining powerful and flexible. It can also serialize documents back to XML or HTML and work with different encodings and character sets.

Because the heavy lifting is done in C, lxml can handle very large documents quickly, while its API stays simple and consistent across XML and HTML.
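
As a minimal sketch of what "navigate, search, and modify" looks like in practice, the snippet below parses a small HTML string, reads some elements, and changes an attribute and a text value. The HTML content and the variable names are made up for illustration; only the `lxml.html` calls come from the library itself.

```python
from lxml import html

# Hypothetical HTML snippet used only for illustration.
SAMPLE = """
<html><body>
  <h1 id="title">Hello</h1>
  <p class="note">First</p>
  <p class="note">Second</p>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Navigate and search: find elements by tag and read their text.
heading = tree.find(".//h1")
notes = [p.text for p in tree.findall(".//p")]

# Modify: change an attribute and the text of an element.
heading.set("id", "main-title")
heading.text = "Hello, lxml"

print(heading.get("id"), heading.text)
print(notes)
```

The same `find`, `findall`, `get`, and `set` calls work on XML documents parsed with `lxml.etree`.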

Why use lxml?
lxml is considered one of the most feature-rich and stable XML and HTML parsing libraries for Python. It is generally faster than BeautifulSoup used with its default pure-Python parser (BeautifulSoup can itself use lxml as its parser), and it is more powerful when it comes to complex XPath queries and XSLT transformations.
For more information on lxml, see https://lxml.de/
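
To give a feel for the XPath support mentioned above, here is a small sketch that runs two queries against an in-memory XML document: one with an attribute predicate and one using an XPath function. The catalog data is invented for this example and has nothing to do with the project's dataset.

```python
from lxml import etree

# Hypothetical XML document used only for illustration.
XML = b"""
<catalog>
  <book year="2019"><title>Alpha</title><price>30</price></book>
  <book year="2022"><title>Beta</title><price>45</price></book>
  <book year="2023"><title>Gamma</title><price>20</price></book>
</catalog>
"""

root = etree.fromstring(XML)

# Predicate on an attribute: titles of books published after 2020.
recent = root.xpath("//book[@year > 2020]/title/text()")

# XPath function: sum of all prices (XPath numbers come back as floats).
total = root.xpath("sum(//book/price)")

print(recent)
print(total)
```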

What website are we trying to scrape?
We are going to use Jobstreet, the most used online job search website in Malaysia. Jobstreet operates primarily in Southeast Asia, including Malaysia, Singapore, the Philippines, Indonesia, and Vietnam, and is headquartered in Malaysia.

What data are we going to scrape?
We are going to retrieve job listings for Computer/Information Technology specialists. For each listing we collect basic information: the job title, the company offering it, and the salary.
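
The extraction step could look roughly like the sketch below. It is not the notebook's actual code: the markup, the class names (`job-card`, `job-title`, `company`, `salary`), and the helper `first_text` are all hypothetical, and Jobstreet's real page structure will differ. It works on an inline string rather than a live request, and it shows one way to handle listings that do not state a salary.

```python
from lxml import html

# Hypothetical markup: these class names are invented for illustration
# and are NOT Jobstreet's actual page structure.
PAGE = """
<html><body>
  <div class="job-card">
    <h2 class="job-title">Software Engineer</h2>
    <span class="company">Example Sdn Bhd</span>
    <span class="salary">MYR 4,000 - 6,000</span>
  </div>
  <div class="job-card">
    <h2 class="job-title">Data Analyst</h2>
    <span class="company">Sample Tech</span>
  </div>
</body></html>
"""


def first_text(node, path):
    """Return the stripped text of the first XPath match, or None."""
    matches = node.xpath(path)
    return matches[0].strip() if matches else None


tree = html.fromstring(PAGE)
jobs = []
for card in tree.xpath('.//div[@class="job-card"]'):
    jobs.append({
        "title": first_text(card, './/h2[@class="job-title"]/text()'),
        "company": first_text(card, './/span[@class="company"]/text()'),
        # Some listings omit the salary, so this field may be None.
        "salary": first_text(card, './/span[@class="salary"]/text()'),
    })

for job in jobs:
    print(job)
```

Querying relative to each card (paths starting with `.//`) keeps the three fields of one listing together, which is easier to get right than collecting three separate page-wide lists and zipping them.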

For more information, see the lxml documentation, "lxml - XML and HTML with Python".
