What is Log File Analysis
Log file analysis is the process of reviewing the log files stored on a website’s server, which record every request made to the site. It is an important part of technical SEO.
In the context of SEO, log file analysis offers valuable insights into how Googlebot and other web crawlers interact with your website. Examining these logs allows you to identify problematic pages, manage your crawl budget, and gather other key information relevant to technical SEO.
To grasp log file analysis, it’s important to first understand what log files are: records generated by the server that capture details about every request made to the site. This data includes the requesting client’s IP address, the request method, the user agent, a timestamp, the requested resource’s URL path, and the HTTP status code.
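For illustration, here is a minimal Python sketch that pulls those fields out of one hypothetical entry in the widely used Apache/Nginx “combined” log format. The sample line and the regex are assumptions; your server’s exact format may differ.

```python
import re

# A hypothetical entry in the common Apache/Nginx "combined" log format;
# real entries vary by server configuration.
sample = (
    '66.249.66.1 - - [10/May/2024:06:25:13 +0000] '
    '"GET /blog/seo-guide HTTP/1.1" 200 5123 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

# Regex for the combined format: IP, timestamp, request line,
# status code, response size, referrer and user agent.
pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

match = pattern.match(sample)
if match:
    entry = match.groupdict()
    print(entry["ip"])          # 66.249.66.1
    print(entry["timestamp"])   # 10/May/2024:06:25:13 +0000
    print(entry["path"])        # /blog/seo-guide
    print(entry["status"])      # 200
    print(entry["user_agent"])  # Googlebot user agent string
```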
Although log files provide a lot of information, they are usually retained for only a limited time; how long depends on the website’s traffic and how quickly the data accumulates.
Importance of Log File Analysis
Log file analysis is important in technical SEO as it offers valuable insights into how Google and other search engine crawlers interact with your website. Through log file analysis, you can monitor:
- How frequently Google crawls your website
- Which pages are crawled most often and which are rarely crawled, so you can confirm your key pages receive attention
- Whether irrelevant or problematic pages are wasting your crawl budget
- The HTTP status codes for each page, helping identify issues like 404 errors or server errors
- Significant changes in crawler activity, such as unexpected spikes or drops
- Unintentional orphan URLs, which are pages with no internal links, making them difficult to crawl and index
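As a rough illustration of this kind of monitoring, the sketch below (plain Python, using a few hypothetical parsed entries rather than a real log) tallies Googlebot requests per URL and per day; spotting 404s, wasted crawls, and activity spikes or drops follows the same counting pattern.

```python
from collections import Counter

# Hypothetical parsed entries (in practice, built by parsing your log files);
# each dict holds the fields relevant to crawl monitoring.
entries = [
    {"path": "/", "status": "200", "date": "2024-05-10", "user_agent": "Googlebot"},
    {"path": "/blog/seo-guide", "status": "200", "date": "2024-05-10", "user_agent": "Googlebot"},
    {"path": "/old-page", "status": "404", "date": "2024-05-11", "user_agent": "Googlebot"},
    {"path": "/", "status": "200", "date": "2024-05-11", "user_agent": "bingbot"},
]

# Keep only Googlebot requests (user agents can be spoofed, so treat this
# as a rough filter rather than verification).
googlebot = [e for e in entries if "Googlebot" in e["user_agent"]]

# How often each URL is crawled, and how activity is spread across days.
crawls_per_url = Counter(e["path"] for e in googlebot)
crawls_per_day = Counter(e["date"] for e in googlebot)

print(crawls_per_url.most_common())  # most and least crawled pages
print(crawls_per_day)                # spot spikes or drops over time
```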
How to Conduct a Log File Analysis
Here’s an overview of the steps to perform a log file analysis:
1. Access the Log Files
Log files are stored on the server, so you’ll need access to download them. The most common way to retrieve them is via FTP, using a tool like FileZilla, a free, open-source FTP client. Alternatively, you can access them through your server’s control panel file manager.
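If your host exposes the logs over plain FTP, a minimal sketch using Python’s standard-library ftplib might look like the following. The host name, credentials, and log directory are placeholders; if your provider only offers SFTP, you would need a library such as paramiko instead.

```python
from ftplib import FTP

# Hypothetical credentials and log path; substitute your hosting provider's
# actual FTP host, login details, and log directory.
HOST = "ftp.example.com"
USER = "your-username"
PASSWORD = "your-password"
LOG_DIR = "/logs"

with FTP(HOST) as ftp:
    ftp.login(user=USER, passwd=PASSWORD)
    ftp.cwd(LOG_DIR)
    # Download every access log file in the directory to the local machine.
    for filename in ftp.nlst():
        if "access" in filename:
            with open(filename, "wb") as local_file:
                ftp.retrbinary(f"RETR {filename}", local_file.write)
            print(f"Downloaded {filename}")
```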
The exact steps to access log files vary depending on your hosting provider; some common challenges you might encounter include:
- Incomplete data: Logs could be spread across multiple servers, requiring you to collect data from each one.
- Privacy concerns: Logs contain users’ IP addresses, which are considered personal data and may need to be anonymised for compliance (a simple anonymisation sketch follows this list).
- Short data retention: Some servers store log data for only a few days, making long-term trend analysis difficult.
- Unsupported formats: Log files may need to be parsed into a readable format before analysis.
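On the privacy point above, one common mitigation is to mask part of each IP address before the logs are stored or shared. A minimal sketch of that idea, zeroing the last IPv4 octet (adapt it to your own compliance requirements):

```python
import re

# Matches an IPv4 address and captures its first three octets.
IPV4 = re.compile(r"\b(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}\b")

def anonymise_line(line: str) -> str:
    """Replace the final octet of any IPv4 address in a log line with 0."""
    return IPV4.sub(r"\g<1>.0", line)

line = '203.0.113.42 - - [10/May/2024:06:25:13 +0000] "GET / HTTP/1.1" 200 1024'
print(anonymise_line(line))
# 203.0.113.0 - - [10/May/2024:06:25:13 +0000] "GET / HTTP/1.1" 200 1024
```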
2. Export and Parse the Log Files
After connecting to the server, retrieve the relevant log files – typically those covering search engine bot activity. These files may need to be parsed or converted into a readable format before you move on to the next step.
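As a rough sketch of this step in Python: the snippet below reads a downloaded access log (the combined format and the file names are assumptions), keeps only requests from a few common search engine bots, and writes the parsed fields to a CSV ready for analysis.

```python
import csv
import re

# Combined-log-format regex (same assumption as the earlier sketch).
PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Substrings used to keep only search engine bot requests (user agents can
# be spoofed, so treat this as a rough filter).
BOTS = ("Googlebot", "bingbot", "DuckDuckBot", "YandexBot")

# Hypothetical file names; point these at your downloaded log and output file.
with open("access.log", encoding="utf-8") as logfile, \
        open("bot_requests.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["timestamp", "user_agent", "method", "path", "status"])
    for line in logfile:
        match = PATTERN.match(line)
        if match and any(bot in match["user_agent"] for bot in BOTS):
            writer.writerow([match["timestamp"], match["user_agent"],
                             match["method"], match["path"], match["status"]])
```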
3. Analyse the Log Files
While you could import the log data into Google Sheets, the sheer volume of data can make this overwhelming, even when you narrow it down to Googlebot requests. A more efficient approach is to use specialised log file analysis tools that automate the process. Popular tools include:
- Logz.io
- Splunk
- Screaming Frog Log File Analyser
- ELK Stack
Key Areas to Focus On
- HTTP Errors: Identify URLs returning non-200 status codes (e.g., 404 Not Found or 410 Gone).
- Crawl Budget: Check for potential crawl budget wastage, like search engines crawling non-indexable URLs.
- Search Engine Bots: Track which bots crawl your site most frequently.
- Crawling Trends: Monitor how crawler activity changes over time and whether there are significant increases or decreases.
- Orphan Pages: Look for pages with no internal links pointing to them, which makes them difficult for crawlers to discover (a way to spot these is sketched below).
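As a starting point for these checks, here is a small Python sketch, using hypothetical data, that flags non-200 responses and candidate orphan URLs by comparing the URLs bots request against the set of URLs found by a site crawl or listed in the XML sitemap.

```python
from collections import Counter

# Hypothetical parsed bot requests (e.g., loaded from the CSV produced in
# step 2) and a set of URLs found by a site crawl or in the XML sitemap.
bot_requests = [
    {"path": "/", "status": "200"},
    {"path": "/old-page", "status": "404"},
    {"path": "/blog/seo-guide", "status": "200"},
    {"path": "/landing-2019", "status": "200"},
]
linked_urls = {"/", "/blog/seo-guide", "/about"}

# HTTP errors: URLs returning non-200 status codes.
errors = Counter(
    (r["path"], r["status"]) for r in bot_requests if r["status"] != "200"
)
print("Non-200 responses:", errors)

# Potential orphan pages: URLs bots request that have no internal links
# (they appear in the logs but not in the crawl/sitemap set).
orphans = {r["path"] for r in bot_requests} - linked_urls
print("Possible orphan URLs:", orphans)
```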