Converting a URL to an HTML File with Python
Using Python's urllib3 and command-line arguments to build a simple HTML file downloader
Saving a web page for later processing is a common need, whether for data analysis or for offline reading. Python and its libraries make this task simple.
Today, we're going to write a Python script that takes a URL and saves the page as an HTML file for offline use. The script uses the urllib3 library to fetch the page and reads the URL and the output path from the command line.
url_to_html_file.py
import sys
import urllib3


def url_to_html_file(url, html_file):
    """
    Fetches a URL via an HTTP GET request and writes its
    content to an HTML file.

    Parameters:
        url (str): The URL to fetch.
        html_file (str): The path of the file to write
            the HTML content to.

    Returns:
        None
    """
    try:
        # urllib3's PoolManager manages and reuses connections
        # through a pooling mechanism, which can make multiple
        # HTTP requests more efficient. This script makes only
        # a single request, but using PoolManager leaves room
        # for adding more requests later.

        # Create a PoolManager instance
        http = urllib3.PoolManager()

        # Make a GET request to the provided URL
        r = http.request('GET', url)

        # Stop on any non-200 status instead of saving an error page
        if r.status != 200:
            raise RuntimeError(f"unexpected HTTP status {r.status}")

        # Write the response data to the provided file.
        # The page is assumed to be UTF-8; bytes that are not
        # valid UTF-8 are replaced rather than raising an error.
        with open(html_file, 'w', encoding='utf-8') as f:
            f.write(r.data.decode('utf-8', errors='replace'))
    except Exception as e:
        print(f"An error occurred: {e}")
        sys.exit(1)


if __name__ == '__main__':
    if len(sys.argv) < 3:
        print('Usage: python url_to_html_file.py <url> <html_file>')
        sys.exit(1)

    # Get the URL and file path from the command line arguments
    url = sys.argv[1]
    html_file = sys.argv[2]

    # Call our function with the provided arguments
    url_to_html_file(url, html_file)
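The script decodes every page as UTF-8. Here is a minimal sketch of one way to relax that, assuming the server declares a charset in its Content-Type header. The helper names pick_charset and decode_body are hypothetical and not part of the script above, and the fallback behaviour is just one reasonable choice.

```python
from email.message import Message


def pick_charset(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or default."""
    if not content_type:
        return default
    # Let the standard library parse the header parameters for us
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_body(data, content_type):
    """Decode response bytes using the declared charset, falling back to UTF-8."""
    charset = pick_charset(content_type)
    try:
        return data.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or wrong declaration: fall back to UTF-8
        # and replace any bytes that cannot be decoded
        return data.decode("utf-8", errors="replace")
```

Inside url_to_html_file, this could be used as decode_body(r.data, r.headers.get('Content-Type')) in place of the hard-coded r.data.decode('utf-8').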
Example usage:
python3 url_to_html_file.py https://www.wikipedia.org index.html
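The comment about PoolManager mentions future scalability. As a rough sketch of what that could look like, the function below saves several pages through one shared pool manager. The name save_pages and the (url, path) job format are assumptions made for illustration. Unlike the script above, it writes the raw response bytes in binary mode, so no decoding is involved.

```python
def save_pages(http, jobs):
    """
    Fetch each (url, path) pair through the same pool manager.

    http: an object with a urllib3-style request(method, url) method,
          for example a urllib3.PoolManager instance.
    jobs: an iterable of (url, path) tuples.

    Returns a list of (url, error message) tuples for the jobs that failed.
    """
    failures = []
    for url, path in jobs:
        try:
            r = http.request("GET", url)
            if r.status != 200:
                raise RuntimeError(f"HTTP status {r.status}")
            # Write the bytes exactly as received
            with open(path, "wb") as f:
                f.write(r.data)
        except Exception as e:
            # Record the failure and continue with the remaining jobs
            failures.append((url, str(e)))
    return failures
```

It could be called as save_pages(urllib3.PoolManager(), [('https://www.wikipedia.org', 'index.html')]). One failed download is reported in the returned list instead of stopping the whole run.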
Please don't hesitate to leave a comment! I'm eager to hear any constructive ideas.