We maintain more than 500 client websites in our environment. A few days ago, we received a request for the list of links and images used on each home page. Collecting the links/URLs from 500+ pages by hand would be very tedious, and manual work never gives 100% accurate results.
So we decided to use the Links property returned by PowerShell's Invoke-WebRequest cmdlet to reduce the manual effort. In this post, we will walk through a simple example using a single URL. To check multiple URLs, please refer to our related article on reading the URLs from Excel and looping over them.
PowerShell's Invoke-WebRequest is a powerful cmdlet that lets you download, parse, and scrape web pages. It is commonly used to download files from the web over HTTP and HTTPS, but it can do more than that: the response object it returns also exposes parts of the page, such as its links and images, so you can analyze the page's contents.
Example: Get the list of URLs
The script below grabs the innerText of each link along with the corresponding URL (href), sorted and de-duplicated by URL:
(Invoke-WebRequest -Uri "https://dotnet-helpers.com/powershell").Links | Sort-Object href -Unique | Format-List innerText, href
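If the list has to go into a report rather than onto the screen, a small filter can be placed between the Links property and an export cmdlet. The following is only a minimal sketch: the function name Get-UniqueLink and the output file links.csv are hypothetical, and it assumes each link object has an href property. On some PowerShell versions the link objects may not expose innerText, in which case the Text column will simply be empty.

```powershell
# Hypothetical helper (not part of PowerShell): keeps the first occurrence
# of each non-empty href, preserving the order in which links appear.
function Get-UniqueLink {
    param(
        # Link objects such as those in (Invoke-WebRequest ...).Links
        [Parameter(Mandatory, ValueFromPipeline)]
        [object]$Link
    )
    begin {
        # Hashtable keys are case-insensitive by default, like Sort-Object -Unique
        $seen = @{}
    }
    process {
        $href = $Link.href
        # Skip anchors without an href and hrefs that were already emitted
        if ($href -and -not $seen.ContainsKey($href)) {
            $seen[$href] = $true
            [pscustomobject]@{ Text = $Link.innerText; Href = $href }
        }
    }
}

# Usage (requires network access; links.csv is an example file name):
# (Invoke-WebRequest -Uri "https://dotnet-helpers.com/powershell").Links |
#     Get-UniqueLink | Export-Csv -Path .\links.csv -NoTypeInformation
```

Because the function emits plain objects, the same pipeline can end in Format-List or Out-GridView instead of Export-Csv.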
Example: Get the list of URLs in a grid view
The command below sends the list of links to the grid view control. The grid view lets you filter the URLs with a keyword search, and you can copy the selected rows to the clipboard with Ctrl + C.
(Invoke-WebRequest -Uri "www.lantus.com").Links.Href | Sort-Object | Get-Unique | Out-GridView
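The grid view is an interactive window, so it is less convenient when the command runs unattended, for example in a scheduled script. In that case the same keyword filtering can be done in the pipeline. The sketch below is one possible way to do it; the function name Select-LinkByKeyword and the keyword "contact" are made up for illustration.

```powershell
# Hypothetical helper: passes through only the URLs that contain the keyword.
function Select-LinkByKeyword {
    param(
        # A URL string, e.g. one item of (Invoke-WebRequest ...).Links.Href
        [Parameter(ValueFromPipeline)]
        [string]$Href,

        # Text to look for anywhere in the URL (matched case-insensitively)
        [Parameter(Mandatory)]
        [string]$Keyword
    )
    process {
        # Ignore empty values, then do a plain substring match (no wildcards or regex)
        if ($Href -and $Href.IndexOf($Keyword, [StringComparison]::OrdinalIgnoreCase) -ge 0) {
            $Href
        }
    }
}

# Usage (requires network access):
# (Invoke-WebRequest -Uri "www.lantus.com").Links.Href |
#     Select-LinkByKeyword -Keyword "contact" | Sort-Object -Unique
```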
Example: Get the list of Image URLs
To fetch the list of image URLs from the page, you can run the command below:
(Invoke-WebRequest -Uri "https://dotnet-helpers.com").Images | Select-Object src
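The src values come back exactly as they are written in the page, so some of them may be relative paths such as /images/logo.png. If absolute URLs are needed for the report, they can be resolved against the page address with the .NET System.Uri class. This is a rough sketch, assuming PowerShell 5 or later for the ::new() syntax; the function name Resolve-ImageUrl is hypothetical.

```powershell
# Hypothetical helper: turns an image src into an absolute URL.
function Resolve-ImageUrl {
    param(
        # Address of the page the image was found on
        [Parameter(Mandatory)]
        [uri]$BaseUri,

        # The src attribute value, relative or absolute
        [Parameter(ValueFromPipeline)]
        [string]$Src
    )
    process {
        if ($Src) {
            # Uri(baseUri, relativeUri) applies standard URL resolution;
            # a src that is already absolute is returned as-is
            [uri]::new($BaseUri, $Src).AbsoluteUri
        }
    }
}

# Usage (requires network access):
# $page = "https://dotnet-helpers.com"
# (Invoke-WebRequest -Uri $page).Images.src | Resolve-ImageUrl -BaseUri $page
```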