Use this n8n workflow to automatically scrape a webpage, extract all internal links, normalize relative paths, and filter out external domains. An essential n8n template for SEO.
Managing website links is critical for SEO and user experience. This n8n workflow identifies all internal links on a given webpage. Unlike simple scrapers, it handles relative URLs (links starting with /) by prepending the base domain, so the resulting list contains only complete, absolute URLs. A final filtering step keeps only links that belong to the original domain, which makes the workflow well suited to clean data extraction and detailed site analysis, and saves significant manual effort in content auditing.
The process starts with the manual n8n trigger, 'When clicking ‘Test workflow’'. The first functional step is the 'Set Base URL' n8n node, where the target website address is defined and stored. This URL is then passed to the 'Fetch base URL' n8n node, which performs an HTTP request to download the webpage's full HTML content.
Next, the 'Extract links' n8n node uses its HTML parsing capabilities to identify all link elements (<a> tags) and pull their href attributes. The resulting array of links is broken down into individual items by the 'Split Out' n8n node.
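Outside of n8n, the extraction step can be sketched in a few lines of Python. This is only an illustration of the idea, using the standard library's html.parser. The class name LinkExtractor, the helper extract_links, and the sample HTML are assumptions made for this example and are not part of the template.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(html):
    """Return one list entry per link, similar to the items produced after 'Split Out'."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page used only for illustration.
sample_html = """
<a href="/about">About</a>
<a href="https://example.com/blog">Blog</a>
<a href="https://other.org/">Elsewhere</a>
<a name="no-href">Anchor without href</a>
"""

links = extract_links(sample_html)
```

Each entry in links corresponds to one item that the later steps would process individually.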
The 'Find relative links' n8n node (an IF condition) checks if a link is relative (e.g., starting with /). If it is, the link travels the true path to the 'Append base URL' n8n node, where the defined base URL is prepended to form a complete absolute URL. If the link is already absolute (the false path), it skips the appending step.
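The branching described above amounts to a simple check followed by string concatenation. The Python sketch below mirrors that logic under the assumption that "relative" means "starts with /", as the description states. The function names are hypothetical and the exact expression used inside the template's nodes may differ.

```python
def is_relative(link):
    # Mirrors the IF condition as described: relative means "starts with /".
    return link.startswith("/")


def append_base_url(base_url, link):
    # Strip a trailing slash from the base so the result has exactly one "/".
    return base_url.rstrip("/") + link


def normalize(base_url, links):
    # True branch: prepend the base URL. False branch: pass the link through.
    return [
        append_base_url(base_url, link) if is_relative(link) else link
        for link in links
    ]


# Hypothetical input used only for illustration.
normalized = normalize(
    "https://example.com/",
    ["/about", "https://example.com/blog", "https://other.org/"],
)
```

A check this simple does not cover relative links without a leading slash (for example page.html) and would also match protocol-relative links such as //cdn.example.com/x. In standalone Python code, urllib.parse.urljoin is one alternative that resolves those cases.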
Both paths merge at the 'Merge' n8n node, which combines the newly normalized links with the links that were already absolute. Finally, the 'Filter external links' n8n node checks every merged URL against the original base domain so that only internal links remain in the final output of this n8n workflow.
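The final filter can be illustrated in Python as well. The first function below follows the description of a "contains the base domain" check. The second is a stricter hostname comparison that is an assumption added here for contrast and is not taken from the template. All names and sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def filter_by_substring(base_url, links):
    # Loose check: keep a link if the base domain appears anywhere in it.
    domain = urlparse(base_url).netloc
    return [link for link in links if domain in link]


def filter_by_host(base_url, links):
    # Stricter variant (assumption): keep a link only if its hostname matches exactly.
    host = urlparse(base_url).hostname
    return [link for link in links if urlparse(link).hostname == host]


# Hypothetical merged links used only for illustration.
merged = [
    "https://example.com/about",
    "https://example.com/blog",
    "https://other.org/?ref=example.com",
    "https://other.org/",
]

loose = filter_by_substring("https://example.com", merged)
strict = filter_by_host("https://example.com", merged)
```

The substring check also keeps the third link, because the base domain appears in its query string, while the hostname comparison drops it. Which behavior is preferable depends on how strictly "internal" should be defined for the audit.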
To deploy this internal link extraction n8n workflow, follow these steps:
When clicking ‘Test workflow’ (Manual Trigger n8n trigger): The starting point for this n8n workflow, allowing manual execution for testing and deployment.
Set Base URL (Set n8n node): Essential for defining the target URL for the scraping operation. This value is reused later for URL normalization and filtering.
Fetch base URL (HTTP Request n8n node): Performs the initial GET request to retrieve the raw HTML content of the target URL.
Extract links (HTML n8n node): Parses the received HTML document, specifically configured to extract all href attributes from <a> elements.
Split Out (Split Out n8n node): Takes the array of links extracted by the previous n8n node and converts each link into a separate item for individual conditional processing.
Find relative links (IF n8n node): Determines whether the link requires normalization (i.e., whether it is a relative link). If true, the link is routed to the step that adds the base URL prefix.
Append base URL (Set n8n node): Used only on relative links (the true branch) to construct a complete, absolute URL by concatenating the base URL and the relative path.
Merge (Merge n8n node): Combines the absolute links and the newly normalized links back into a single stream before final filtering.
Filter external links: Keeps only the links whose href attribute contains the original base domain, thereby removing all external links.
