Converting a URL to an HTML File with Python

Using urllib3 and command-line arguments to build a simple HTML file downloader in Python

On the internet, the need to process and save webpage data comes up all the time. Whether it's for data analysis or for offline reading, being able to convert a URL into an HTML file can be incredibly useful. Python, with its rich ecosystem of libraries, makes this task very simple.

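Today, we're going to explore how to use Python to take a URL and save its page as an HTML file for offline use. The script uses the third-party urllib3 library (install it with pip install urllib3) to fetch the web page and then writes it to an HTML file. Note that only the HTML itself is saved: images, stylesheets and scripts referenced by the page are not downloaded.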

url_to_html_file.py

import sys
import urllib3

def url_to_html_file(url, html_file):
    """
    Fetches a URL via HTTP GET request and writes its
    content to an HTML file.

    Parameters:
        url (str): The URL to fetch.
        html_file (str): The path of the file the HTML
        content is written to.

    Returns:
        None
    """

    try:
        # urllib3's PoolManager manages and reuses connections through
        # a pooling mechanism, which improves efficiency when making
        # multiple HTTP requests. This script only makes a single
        # request, but using a PoolManager prepares the code to scale
        # to multiple requests in the future.
        http = urllib3.PoolManager()

        # Make a GET request to the provided URL
        # (PoolManager follows redirects by default)
        r = http.request('GET', url)

        # Don't save an error page (404, 500, ...) as if it were the content
        if r.status >= 400:
            raise RuntimeError(f"server returned HTTP {r.status}")

        # Write the raw response bytes: this keeps the page's original
        # encoding, whereas decoding as UTF-8 fails on pages in other charsets
        with open(html_file, 'wb') as f:
            f.write(r.data)

    except Exception as e:
        # Report the error on stderr and exit with a non-zero status
        print(f"An error occurred: {e}", file=sys.stderr)
        sys.exit(1)

if __name__ == '__main__':
    if len(sys.argv) != 3:
        print('Usage: python url_to_html_file.py <url> <html_file>', file=sys.stderr)
        sys.exit(1)

    # Get the URL and file path from the command line arguments
    url = sys.argv[1]
    html_file = sys.argv[2]

    # Call our function with the provided arguments
    url_to_html_file(url, html_file)
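A note on decoding: calling r.data.decode('utf-8') assumes every page is UTF-8 and raises UnicodeDecodeError on pages served in another charset, such as ISO-8859-1. If you need the page as a string (for example, to inspect it before saving), a minimal sketch like the one below reads the charset from the response's Content-Type header instead. It assumes the server declares the charset in that header; a charset declared only in a <meta> tag is not detected. The helper names charset_from_content_type and decode_body are hypothetical, not part of urllib3.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset named in a Content-Type header value, or `default`."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the charset parameter; None if absent
    return msg.get_content_charset() or default


def decode_body(data, content_type):
    """Decode raw response bytes using the charset the server declared."""
    charset = charset_from_content_type(content_type)
    try:
        # errors="replace" keeps one invalid byte from aborting the decode
        return data.decode(charset, errors="replace")
    except LookupError:
        # The server named a charset Python doesn't know; fall back to UTF-8
        return data.decode("utf-8", errors="replace")
```

Inside url_to_html_file you could then write text = decode_body(r.data, r.headers.get('Content-Type')). urllib3's response headers are case-insensitive, so the capitalization of the header name does not matter.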

Example of usage:

python3 url_to_html_file.py https://www.wikipedia.org index.html
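The script's notes on PoolManager mention scaling to multiple requests. A minimal sketch of what that could look like: one PoolManager is created once and reused for a whole batch of downloads, so requests to the same host can reuse an already-open connection. The function name download_many and its return value (the list of URLs that failed) are illustrative assumptions, not part of the original script.

```python
import sys

import urllib3


def download_many(pairs):
    """Download each (url, html_file) pair using one shared PoolManager.

    Returns the list of URLs that failed, so one bad URL does not stop
    the rest of the batch.
    """
    # One PoolManager for the whole batch: requests to the same host can
    # reuse an open connection instead of opening a new one each time
    http = urllib3.PoolManager()
    failed = []
    for url, html_file in pairs:
        try:
            r = http.request("GET", url)
            # Treat 4xx/5xx responses as failures; don't save error pages
            if r.status >= 400:
                raise RuntimeError(f"HTTP {r.status}")
            # Save the raw bytes so each page keeps its original encoding
            with open(html_file, "wb") as f:
                f.write(r.data)
        except Exception as e:
            print(f"{url}: {e}", file=sys.stderr)
            failed.append(url)
    return failed
```

For example, download_many([('https://www.wikipedia.org', 'wikipedia.html'), ('https://www.python.org', 'python.html')]) saves both pages and returns an empty list if nothing failed.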

Please don't hesitate to leave a comment! I'm eager to hear any constructive ideas.