Ultimate Guide to the Use of .htaccess for SEO – Redirects, URLs, Content Crawl

Pawan Sahu 4 January, 2020

Search engine optimization (SEO) is all about making your website search-engine friendly so that its pages and posts rank well. In a competitive market, you simply cannot ignore it. .htaccess is one of the most important files on your website: it lets you control how the site behaves and how it interacts with the web. Customizing it gives you more control and better search engine optimization, because .htaccess lets you override the server configuration to suit your requirements.

It works because the server reads the file on every request to that directory and applies its directives on top of the main configuration before responding to the end user. That makes it useful for search engine optimization: it can improve both website performance and visibility to search engines.

.htaccess for SEO is the term we use for the SEO work that is done in the .htaccess file. This article covers the main aspects of .htaccess SEO so that you can customize the file for better search results. We will also share .htaccess snippets that you can adapt, and touch on regular expressions and how they are used in .htaccess. So, without any delay, let’s get started.

Ultimate Guide to .htaccess for SEO

What is .htaccess?

Before we dive into the guide, we need a better understanding of .htaccess. For starters, .htaccess is the complete file name, not just a file extension. A leading dot usually introduces an extension, but here it does not (it marks the file as hidden on Unix-like systems). So, when you look for the file, search for the full name .htaccess rather than for an extension.

Technically, .htaccess is used by web servers powered by Apache to configure individual directories. If your web host doesn’t let you edit the main server configuration, you can create a .htaccess file and override settings from there, as far as the host allows. The directives apply to the directory that holds the file and to all of its subdirectories. This means that a .htaccess file in the root of your website applies to the whole site. For finer control, create another .htaccess file in the specific directory you want to configure.
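
A minimal sketch of the directory-level override described above. The /downloads directory is hypothetical, the two fragments are shown in one block but belong in two separate files, and the example assumes the host allows the Options directive to be overridden.

```apache
# File: /.htaccess (site root), applies to the whole site:
# do not show automatic file listings for directories
Options -Indexes

# File: /downloads/.htaccess, applies only to /downloads and below:
# re-enable file listings for this one directory
Options +Indexes
```

The file closest to the requested resource wins, so /downloads shows listings while the rest of the site does not.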

Accessing the file is easy. All you need to do is look for it in the root directory of your site. Most popular content management systems give you an easy way of editing the file. You can also access it through FTP.

Directives – Giving you finer control

Directives are the commands in a configuration file, and they give you finer control over what you want to configure. .htaccess uses directives too, usually short, single-line ones. With these directives, you can protect files with passwords, control crawling, allow or ban IP addresses and much more.
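
As an illustration of such directives, the sketch below password-protects a directory and bans one IP address at the same time. It assumes Apache 2.4 syntax and a host that allows authentication settings in .htaccess; the password file path, the realm name and the IP address are placeholders, not values from this guide.

```apache
# Ask for a user name and password (basic authentication)
AuthType Basic
AuthName "Restricted area"
# Hypothetical path to a password file created with the htpasswd tool
AuthUserFile /home/example/.htpasswd

# Both conditions must hold: a valid login AND not the banned address
<RequireAll>
    Require valid-user
    Require not ip 203.0.113.42
</RequireAll>
```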

Why do we need SEO optimization in .htaccess files?

Before we dive into the actual configuration, we first need to understand how .htaccess files help SEO in the first place. Let’s list the ways below.

  • Search engine crawlers experience how your .htaccess handles redirects, errors and URLs. If it is done right, it improves your website’s credibility.
  • With .htaccess, you can generate clean URLs, which search engines love.
  • A good .htaccess for SEO also resolves 404 HTTP errors and handles 301 redirects the way they should be handled.
  • It can also be used to block unwanted visitors, such as search engine spy tools, by IP address or domain.

All these benefits are substantial and hence cannot be ignored.

Backup before making .htaccess for SEO changes

Editing .htaccess can be tricky. It can improve your site’s functionality and features, but a mistake can leave you with a non-functioning website, typically a 500 Internal Server Error. To avoid this, we recommend that you back up your .htaccess file before you make any changes. The backup keeps you safe and lets you experiment with the file freely.

Where to find the .htaccess file?

The location of the .htaccess file depends on which platform you are using. Most of the time, it is in the root directory of your site. For example, if you are using WordPress, you will find it in the WordPress installation directory. On hosting platforms that use cPanel, you can open the File Manager and go to the root directory. There you have to turn on “Show Hidden Files” to see the .htaccess file.

SEO-Friendly URLs

URLs play a crucial role when it comes to search engine ranking. According to Matt Cutts, who worked as an engineer at Google, URL structure matters for ranking, and keywords in URLs add value as a ranking factor. Another thing a blog owner needs to watch is URL length. URLs should be short so that visitors can remember them easily. Even if few visitors actually memorize them, short and clean URLs signal to the search engine that you care about your readers.

Optimizing General Website URLs

You can use your .htaccess file to deal with these issues. Instead of exposing the script name and its query string, you map a clean URL onto the script that actually serves the content. To do so, use the following code in your .htaccess file.

RewriteEngine On
RewriteRule ^topicname/([a-zA-Z0-9]+)$ index.php?topic=$1 [L]

With these two lines in place, a clean URL of the following form is served internally by index.php?topic=article:

www.yoursite.com/topicname/article
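
The rewrite rule in this section relies on a regular expression, so here is an annotated variant that spells out what each part does. It is a sketch rather than a drop-in rule: the hyphen in the character class, the optional trailing slash and the [NC] flag are additions made for illustration, while topicname and index.php are the placeholders used above.

```apache
RewriteEngine On
# ^ and $ anchor the pattern to the start and end of the requested path
# [a-z0-9-]+ matches one or more letters, digits or hyphens
# ( ... ) captures that part so it can be reused as $1 in the target
# /? allows an optional trailing slash
# [NC] ignores case, [L] stops processing further rewrite rules
RewriteRule ^topicname/([a-z0-9-]+)/?$ index.php?topic=$1 [NC,L]
```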

Optimizing Content Management System URLs

If you are using WordPress, Joomla or any other content management system, you need to customize the URLs differently. It can be done by adding code such as the following to the .htaccess file, where load_page.php and page_id stand in for your CMS’s own entry script and query parameter.

RewriteEngine On
RewriteRule ^(.*)/$ load_page.php?page_id=$1 [L]

Once the file is updated, you are all set for cleaner SEO-optimized URLs.

Removing .php and .html

You can also get rid of page extensions such as .html and .php. They add no value for the reader and only make URLs harder to remember.

To remove the extensions, all you need to do is copy the code below and put it into the .htaccess file.

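# Serve /page from page.php when that file exists
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.*)$ $1.php [L]
# Serve /page from page.html when that file exists
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.*)$ $1.html [L]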
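
Once extensionless URLs work, the old .php addresses still answer too, which leaves two URLs for the same page. One possible way to close that gap is sketched below: it sends visitors who explicitly request a .php URL to the clean version with a 301. This is an optional addition, not part of the snippet above; it should be placed before the internal rewrite rules and is limited to GET requests so that form submissions are not redirected.

```apache
RewriteEngine On
# THE_REQUEST holds the original request line, e.g. "GET /page.php HTTP/1.1",
# so this only fires when the visitor asked for the .php URL directly
RewriteCond %{REQUEST_METHOD} =GET
RewriteCond %{THE_REQUEST} \s/+(.+?)\.php[\s?] [NC]
# %1 is the path captured by the condition above, without the extension
RewriteRule ^ /%1 [R=301,L]
```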

This brings us to the end of optimizing your website’s URLs through .htaccess for SEO purposes.

Canonical robots.txt

You can improve how robots.txt is served with the help of a canonical redirect. The robots.txt file is located in the root directory, but bad robots and other malicious scripts can eat up your website’s resources by requesting robots.txt in every directory of the site. As an owner, you surely don’t want that to happen.

Canonical solution

You can use .htaccess to guide crawlers and users to the robots.txt file. This also solves the problem of continuous requests for the robots.txt file in the wrong places. By doing so, you improve your website’s SEO by giving crawlers what they want while also reducing the load on the server.

# CANONICAL SOLUTION FOR ROBOTS.TXT
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI} !^/robots\.txt$ [NC]
RewriteCond %{REQUEST_URI} robots\.txt [NC]
RewriteRule .* http://yourmotocmswebsite.com/robots.txt [R=301,L]
</IfModule>

In the above code, all you need to do is change “yourmotocmswebsite.com” to your own domain, so that the target URL points at the robots.txt file in your website’s publicly accessible root directory. The above method uses Apache’s rewrite module (mod_rewrite). If you are looking for a cleaner alternative, you can use mod_alias instead.

RedirectMatch 301 ^/(.*)/robots\.txt http://yourmotocmswebsite.com/robots.txt

Non-www redirect – Canonical Issue

One of the most basic issues a website can have is a missing non-www redirect. If you are auditing your website, make sure this is taken care of. The good news is that a simple canonical redirect can fix it. It also fixes URL duplication, where the same page is reachable with and without www, and improves the website’s overall URL structure.

# Fixing the non-www URLs and redirecting them to www
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.yourmotocmswebsite\.com [NC]
RewriteRule (.*) http://www.yourmotocmswebsite.com/$1 [R=301,L]

You can also use the code below to get the same effect.

# Fixing the non-www URLs and redirecting them to www
RewriteEngine On
RewriteCond %{HTTP_HOST} ^yourmotocmswebsite\.com [NC]
RewriteRule (.*) http://www.yourmotocmswebsite.com/$1 [R=301,L]

In both of the above snippets, you need to replace “yourmotocmswebsite” with your domain name.
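
If you would rather not hard-code the domain, a generic variant is possible. The sketch below is an alternative, not the snippet from this guide: it reuses the host name from the request, so it assumes the site has no other subdomains that should stay without the www prefix.

```apache
RewriteEngine On
# Match any host name that does not already start with "www."
RewriteCond %{HTTP_HOST} !^www\. [NC]
# Rebuild the URL on the www host, keeping the requested path
RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```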

Using canonical tags in HTTP headers and for PDFs

The rel="canonical" link can also be sent as an HTTP header to signal canonical URLs on your website. This is especially useful for files such as PDFs, which have no HTML head to put a canonical tag in. For example, you can use this technique to point a PDF at its HTML version, so that search engines show the HTML page and the user doesn’t have to download the file to inspect it. To make it work, you need both versions of the content to be available.

To set it using HTTP headers, use the code below. It points the PDF at the HTML page served at /page.html.

<Files "file.pdf">
Header add Link "<http://www.yourmotocmswebsite.com/page.html>; rel=\"canonical\""
</Files>

Doing Redirects Using .htaccess for SEO

In this section, we will look at redirects that can be set up by editing the .htaccess file. As an SEO expert, you need to make sure that the right redirects are in place to avoid 404 errors. The two main things you have to work with are 301 redirects and 404 error handling. With both in place, no user is left stranded on a broken link. Search engine crawlers are also handled better, which helps the site meet the standards expected by Google.

As an owner, you can either create a custom 404 “not found” page or simply redirect to the main page of your website. We recommend creating a custom 404 page with a search box on it. This gives the user another chance to search the website for the content they need. If done right, this will lower the bounce rate and improve the website’s SEO performance.

To serve a custom page for other errors as well, including 400 (bad request), 401 (authorization required), 403 (forbidden) and 500 (internal server error), use a line like the one below.

ErrorDocument 402 /temp/page-unavailable-temporarily402

The same pattern works for any error. All you need to do is change the error code and the page associated with it.
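
For instance, the error codes listed above could each get their own page as sketched below. The /errors/ directory and the file names are hypothetical; use the paths of pages that actually exist on your site.

```apache
# One ErrorDocument line per status code, each pointing at a local page
ErrorDocument 400 /errors/bad-request.html
ErrorDocument 401 /errors/authorization-required.html
ErrorDocument 403 /errors/forbidden.html
ErrorDocument 404 /errors/not-found.html
ErrorDocument 500 /errors/server-error.html
```

Using local paths rather than full URLs lets the server keep the original error status code while showing your custom page.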

Stopping bad bots

Not all bots are good. Some crawl your website without providing any value to it, and that’s why you need to block them. .htaccess lets you write directives that turn these bots away. To do so, use the following code, replacing the BOTNAME placeholders with the user-agent names you want to block.

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BOTNAME [OR]
RewriteCond %{HTTP_USER_AGENT} ^BOTNAME1 [OR]
RewriteCond %{HTTP_USER_AGENT} ^BOTNAME3
RewriteRule ^.* - [F,L]
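
When the list of bots grows, the conditions can be folded into one using alternation. The following is a compact sketch of the same idea, with the same placeholder bot names; the [NC] flag, which ignores case, is an addition.

```apache
RewriteEngine On
# One condition instead of several: match any of the listed user agents
RewriteCond %{HTTP_USER_AGENT} ^(BOTNAME|BOTNAME1|BOTNAME3) [NC]
# "-" means no substitution; [F] answers with 403 Forbidden
RewriteRule ^ - [F]
```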

Fixing 301 redirects and 404 errors

Our next step is to fix 301 redirects and 404 errors. It is always better to handle them up front than to fix them one by one later, because 404 not found errors will occur even if you maintain your website daily. A 301 redirect eliminates a 404 error by sending requests for an old page to its new location. This solves two problems.

  1. Other websites link to your old page. Their visitors are redirected to the new page successfully.
  2. Search engines still have the old URL. 301 redirects work for all search engines, including Google.

To do so, write a line like the one below in your .htaccess file. For this to work, you need the path of the old page and the URL of the new page.

Redirect 301 /information/old-article http://www.yourmotocmssite.com/articles/new-article
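
When a whole section has moved rather than a single page, listing every URL quickly becomes tedious. As a sketch, a pattern-based redirect can cover the rest; it assumes, purely for illustration, that every page under /information/ now lives at the same path under /articles/.

```apache
# Specific page first: mod_alias uses the first rule that matches
Redirect 301 /information/old-article http://www.yourmotocmssite.com/articles/new-article
# Everything else under /information/ moves to the same path under /articles/
RedirectMatch 301 ^/information/(.*)$ http://www.yourmotocmssite.com/articles/$1
```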

Latest Website Redirection

If you don’t know it yet, a search engine may crawl and index several versions of your home page, for example with and without www, or with a trailing /index. As an owner, you never want a visitor to land on a version other than the current one. That’s why you need to point all of these versions at the latest one with a simple 301 redirect. All you need to do is use the following code in your .htaccess file.

RewriteEngine On
RewriteCond %{HTTP_HOST} ^yourmotocmssite\.com$ [NC]
RewriteRule ^(.*)$ http://www.yourmotocmssite.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^.*/index
RewriteRule ^(.*)index$ http://www.yourmotocmssite.com/$1 [R=301,L]

Note: Don’t forget to replace “yourmotocmssite” with your domain name.

Redirecting Sitemaps

Canonical redirects are extremely handy. We discussed earlier how you can direct bots and users to the robots.txt file. Now, we are going to do the same for sitemaps. Sitemaps also suffer from bad bots, so you want your server to spend as little time as possible helping crawlers find them. This frees up system resources and saves bandwidth. To solve this problem, add the following code to your .htaccess file. We are going to use mod_alias to do the redirection.

# Solving sitemaps using canonical
<IfModule mod_alias.c>
RedirectMatch 301 ^/(.+)/sitemap\.xml$ http://yourmotocmssite.com/sitemap.xml
RedirectMatch 301 ^/(.+)/sitemap\.xml\.gz$ http://yourmotocmssite.com/sitemap.xml.gz
</IfModule>

To use the above code, edit it to match your website’s domain and the file paths you are using. Now, let’s go through it. The first RedirectMatch line redirects requests for sitemap.xml in any subdirectory to the regular, uncompressed sitemap in the root. The second line does the same for the compressed version of the sitemap. As in the robots.txt example, the pattern only matches requests below the root, so the real sitemap URL is not redirected to itself.

Improving the .htaccess for SEO With Site Speed Caching

Who doesn’t love speed? Both users and search engines do. If you want to rank higher, you need to make sure that your website loads fast. One technique for this is caching, in which website resources are stored in the browser. These resources rarely change, so they don’t need to be downloaded every time the user requests them. By enabling caching, you not only speed up your website but also save server processing time.

To enable caching, we are going to use mod_headers and mod_expires. These will give you a better way of handling the overall caching process. Let’s get started.

<IfModule mod_expires.c>
ExpiresActive On
ExpiresDefault A259200
</IfModule>

The above code sets an expiration time for your assets. The number looks large because it is in seconds: A259200 means 259,200 seconds (three days) after the browser accesses the file.
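
A single default rarely suits every kind of file. As a sketch of finer-grained caching with both modules, the block below sets lifetimes per content type with mod_expires and marks static assets as publicly cacheable with mod_headers. The lifetimes and the list of file extensions are illustrative assumptions, and the MIME types must match what your server actually sends.

```apache
<IfModule mod_expires.c>
    ExpiresActive On
    # Images rarely change, so keep them for a month
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType image/png "access plus 1 month"
    # Stylesheets and scripts change more often
    ExpiresByType text/css "access plus 1 week"
    ExpiresByType application/javascript "access plus 1 week"
    # HTML should always be fetched fresh
    ExpiresByType text/html "access plus 0 seconds"
</IfModule>

<IfModule mod_headers.c>
    # Allow shared caches (proxies, CDNs) to store static assets
    <FilesMatch "\.(jpe?g|png|gif|css|js)$">
        Header append Cache-Control "public"
    </FilesMatch>
</IfModule>
```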

Robot Directives

Your website might contain pages or posts that you don’t want search engines to show to their users. In that case, you can tell search engines not to index those files, and you can do it through .htaccess. This comes in handy because many content management systems don’t offer such a restriction.

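We are going to use the noindex value of the X-Robots-Tag header, the HTTP equivalent of the robots meta tag, to achieve the desired result. Let’s go through an example below.

header("X-Robots-Tag: noindex", true);

The line above is PHP rather than .htaccess: placed in a PHP script before any output, it keeps the page that script generates out of the index. You can also configure the web server itself by using the following code: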

<FilesMatch "robots\.txt">
Header set X-Robots-Tag "noindex"
</FilesMatch>

You can also add “nofollow” if you don’t want search engines to follow the links on those pages.

header("X-Robots-Tag: noindex, nofollow", true);
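
The same header can be applied to whole groups of files from .htaccess, which is useful for documents that cannot carry a meta tag. The sketch below assumes that mod_headers is enabled; the choice of PDF and Word files is only an example.

```apache
<IfModule mod_headers.c>
    # Keep PDF and Word documents out of the index and do not follow their links
    <FilesMatch "\.(pdf|docx?)$">
        Header set X-Robots-Tag "noindex, nofollow"
    </FilesMatch>
</IfModule>
```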

Redirecting Feeds To FeedBurner

Redirecting your feeds to FeedBurner can help boost your website’s SEO. You can automate the whole process thanks to .htaccess. We are going to use the mod_rewrite module to achieve the desired result.

# Feedburner redirection
<IfModule mod_rewrite.c>
 RewriteCond %{REQUEST_URI} ^/feed/ [NC]
 RewriteCond %{HTTP_USER_AGENT} !(FeedBurner|FeedValidator) [NC]
 RewriteRule .* http://feeds.feedburner.com/mainContentFeed [L,R=302]

 RewriteCond %{REQUEST_URI} ^/comments/feed/ [NC]
 RewriteCond %{HTTP_USER_AGENT} !(FeedBurner|FeedValidator) [NC]
 RewriteRule .* http://feeds.feedburner.com/allCommentsFeed [L,R=302]
</IfModule>

For the above code to work on your website, replace allCommentsFeed and mainContentFeed with your own FeedBurner feed names.

Improve Mobile Content Crawling Using the Vary Header

Mobile serving can also be improved using the Vary header. It simply tells Google that you serve a different page to mobile users. With the header in place, Google can identify the pages correctly and crawl them accordingly. This improves the user experience, which indirectly affects the site’s SEO rankings.

To activate it, all you need to do is copy the code below into your .htaccess file.

Header append Vary User-Agent

Preventing image leeching

You can also configure .htaccess to stop other websites from leeching images from your site. This reduces your web server’s bandwidth usage and improves its performance. To do so, use the code below.

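RewriteEngine On
# Let through requests that send no referrer at all (direct visits, most crawlers)
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://www\.yourmotocmssite\.com [NC]
RewriteCond %{HTTP_REFERER} !^http://yourmotocmssite\.com [NC]
RewriteRule [^/]+\.(gif|jpg)$ - [F]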

That’s it! You are all set!

Summing up .htaccess for SEO

.htaccess SEO can help you improve your website’s SEO in many ways. In today’s article, we went through many methods that touch on issues such as redirection, caching, canonical URLs and SEO-friendly URLs. The .htaccess file can be a game-changer if you use it correctly. This guide will not only get you started but also help you take full advantage of what .htaccess SEO has to offer.

The guide is aimed at both beginners and experienced bloggers and webmasters. By following it, you will be able to take full advantage of what SEO has to offer. The aim is to optimize your website from the start, because changes made later tend to have less impact. Solving things like image leeching, redirection and HTTP errors can have a huge impact on the user experience of your website, which is directly connected to SEO. In the long term, you are the one who benefits, and that’s why we recommend that you set up your .htaccess as soon as possible.

So, which methods are you going to use to optimize your website’s SEO? Comment below and let us know. We are listening. You can also comment below if you think we missed something or if you have any suggestions.

 

Author: Pawan Sahu
Pawan Sahu is the founder of MarkupTrend. He is a Digital Marketer and a blogger geek passionate about writing articles related to WordPress, SEO, Marketing, Web Design, and CMS etc. You can tweet Pawan @impawansahu