16

May 11

SEO for PDF files: Advanced Tricks

Optimizing PDFs - One File at a Time

Monk-eying around with a PDF file

PDF files, just like web pages, can be optimized to rank highly on Google.  Many SEOs recommend steering away from PDF files as much as possible, but they are ranking all over the place on Google, so I wouldn’t particularly avoid using them.  In fact, if you’ve gone through some effort to make a professionally-formatted PDF file, one might argue it’s likely to be higher “quality” content than the average run-of-the-mill web page.  I would not rule out Google even slightly favoring PDF files for this reason.  Tests or correlation studies don’t seem to have been done on this topic by anyone in the industry – if you know of any, please mention them in the comments below.

Here’s a list of best practices you can use to optimize your PDF files:

1. Tools to create your PDFs
Ideally you should use Adobe Acrobat, but if you’d like to do some of the things I’m suggesting here on the cheap, you can download a few free tools that can do the job.  CutePDF is a printer driver for Windows that converts anything you’re printing into a PDF, and Quick PDF Tools allows you to edit the PDF’s properties after the fact.  If you have Microsoft Word 2007, Microsoft has a free add-in download that enables you to save documents as PDFs as well.

2. Keyword Density
Just as with web pages, using the target keyword the right number of times, and peppering in some related keywords, is important in telling Google what your page is about.  Many in the industry have tried debunking keyword density as a ranking factor, but the fact is, it works.  You can use a tool like Bruce Clay’s keyword density tool or GoRank’s tool to figure out the proper keyword density by analyzing the top pages ranking for the keyword you’re targeting (I usually use the top 4 ranking pages).  Yes, it can be hard to say “lawyer in miami” 70 times, but if the pages you are competing with are doing that on average, you really must.  I wouldn’t get quite as hung up on document length – often there is one very large SERP result that skews the average – but you should try to have your document be longer, at least, than one of the top four.
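If you'd rather not rely on an online tool, the counting itself is easy to sketch.  The Python below is a rough illustration, not how Bruce Clay's or GoRank's tools work: the function names are made up, and "density" here is assumed to mean the share of words on the page that belong to an occurrence of the phrase (different tools define it differently).

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_count(text, phrase):
    """Count how many times the phrase appears as consecutive words."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return 0
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)


def keyword_density(text, phrase):
    """Percentage of words in the text that are part of the phrase (assumed definition)."""
    words = tokenize(text)
    if not words:
        return 0.0
    return 100.0 * phrase_count(text, phrase) * len(tokenize(phrase)) / len(words)


def target_count(competitor_texts, phrase):
    """Average phrase count across the competing pages, rounded to a whole number."""
    if not competitor_texts:
        raise ValueError("need at least one competitor page")
    counts = [phrase_count(text, phrase) for text in competitor_texts]
    return round(sum(counts) / len(counts))
```

Paste in the text of the top four ranking pages as `competitor_texts`, and `target_count` gives you a ballpark number of mentions to aim for in your own PDF.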

3. Avoid Duplicate Content
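Of course, if your PDF is simply an alternate, printable version of an existing web page of yours and you don’t really want it to rank, you should “noindex” it – otherwise Google may rank it rather than the web page you’d prefer to rank.  Note that robots.txt can only block crawling (with a Disallow rule); it is not the place for a noindex.  Since a PDF has nowhere to put a meta robots tag, the way to noindex it is to have your web server send an “X-Robots-Tag: noindex” HTTP header along with the file.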
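One mechanism for keeping a non-HTML file such as a PDF out of the index is the X-Robots-Tag HTTP response header.  Most people would set this in their web server's configuration; purely as an illustration, here is a minimal sketch using Python's built-in web server.  The path in `PRINT_ONLY_PDFS` and the names `seo_headers` and `PdfAwareHandler` are hypothetical.

```python
from http.server import SimpleHTTPRequestHandler

# Hypothetical list of PDFs that are just printable copies of existing web pages.
PRINT_ONLY_PDFS = {
    "/print/squeaky-floor.pdf",
}


def seo_headers(path):
    """Return the extra response headers to send for a given request path."""
    clean_path = path.split("?", 1)[0]  # ignore any query string
    if clean_path in PRINT_ONLY_PDFS:
        # Asks search engines not to index this file.
        return {"X-Robots-Tag": "noindex"}
    return {}


class PdfAwareHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds the noindex header to the listed PDFs."""

    def end_headers(self):
        for name, value in seo_headers(self.path).items():
            self.send_header(name, value)
        super().end_headers()


# To try it locally (assumed port), run something like:
#   from http.server import HTTPServer
#   HTTPServer(("localhost", 8000), PdfAwareHandler).serve_forever()
```

PDFs that are not on the list are served without the header, so the ones you do want to rank are unaffected.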

4. Make it a text-based PDF rather than an image-based one
If you’re printing from MS-Word or using CutePDF and so on, this won’t be a problem.  If you’re using image editing or some sort of page layout program, you may need to check this.  If you can view the file with Acrobat Reader and can select and copy text from it, then you’ve gotten this one correct.

5. Put your keyword in the file name
This is often ignored but is likely used by Google in its ranking algorithms – use dashes to separate words, e.g. “squeaky-floor.pdf”.
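If you generate a lot of PDFs, turning the target keyword into a dash-separated file name is easy to automate.  A small sketch in Python (the function name `pdf_filename` is my own invention):

```python
import re
import unicodedata


def pdf_filename(keyword):
    """Turn a keyword phrase into a lowercase, dash-separated PDF file name."""
    # Strip accents and drop any characters that are not plain ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", keyword)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single dash.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    if not slug:
        raise ValueError("keyword contains no usable characters")
    return f"{slug}.pdf"
```

For instance, `pdf_filename("Squeaky Floor")` gives `squeaky-floor.pdf`.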

6. Set the Title property

Obviously you want to optimize the title just as you would for a web page.  Put your keyword as far to the left as possible, and if you can get the keyword (or pieces, stems, and so on of it) in there twice, more power to you.  For instance, if you want to rank for “grow tomatoes”, you might try “Grow tomatoes – tips for tomato growing”, and so on.  Whatever you set the title property to is what Google will likely display as the title in the SERP.  Also in the document properties place the title into the “description” field.

7. Subject Property (i.e. the Meta-Description)
You should put your meta-description into the “Subject” property of your PDF file. I have found a lot of bad advice out on the web about this directing people to use other properties such as the “keywords” field, but here is absolute proof that the “Subject” property is the correct one for your meta-description: a screenshot of a SERP result (figure 1), and the properties of the source document (figure 2).

Figure 1: SERP result for a PDF file with Meta-Description.

Figure 2: Proof the “Subject” property is used for the Meta-Description.

The prosecution rests!

8. Keywords Property
Although Google is not believed to use Meta-Keywords tags from HTML pages, a slight correlation was observed in a study done by academics who worked to reverse engineer Google’s ranking algorithm.  It may not help much, but throwing in your keyword and a few variations, separated by commas, certainly won’t hurt and is probably called for.
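If you have many files to process, the Title, Subject, and Keywords properties can also be set from a script rather than by hand.  The sketch below assumes the third-party pypdf library is installed; it is not one of the tools mentioned above, and the function names `build_info` and `write_info` are hypothetical.

```python
def build_info(title, description, keywords):
    """Build the document-properties dictionary for a PDF.

    Following the advice above, the meta-description goes into /Subject.
    """
    return {
        "/Title": title,
        "/Subject": description,
        "/Keywords": ", ".join(keywords),
    }


def write_info(src_path, dst_path, info):
    """Copy the pages of src_path into dst_path with the given properties set."""
    # Imported here so build_info can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    # Note: this copies pages only; other document-level items such as
    # bookmarks may not carry over.
    for page in reader.pages:
        writer.add_page(page)
    writer.add_metadata(info)
    with open(dst_path, "wb") as handle:
        writer.write(handle)
```

You would call `build_info` with your title, meta-description, and keyword variations, then pass the result to `write_info` along with the input and output file names.  Check the result in your PDF viewer's document properties afterwards.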

9. H1 and H2 tags
I would not obsess about adding these in as they only contribute to ranking slightly, but if you want to, you have two options.   The MS-Word plug-in mentioned above allows you to save headings as bookmarks (make your H1 tag by selecting the “Heading 1” style in the document, then when you save as PDF hit the “Options” button to select this option).  The other way would be to purchase Acrobat Professional.  I do not know of any free tools that will allow you to create H1 and H2 tags, but if anyone out there does, please make a comment below.

10. Other fields
The Author, Comments, and advanced fields such as Copyright Information and so on can generally be ignored for SEO purposes.

11. Linking to individual pages in your PDF
Here’s a neat trick – you can link to a specific page of a PDF (regardless of whether it has any special tags in it and so on) simply by appending [#page=] and a page number to the URL, for example:
http://www.hq.nasa.gov/alsj/SM2A-03-BK-II-(1).pdf#page=610
This won’t necessarily help you from an SEO standpoint, but from a navigation standpoint within your site it can be extremely convenient.
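If you are generating these links in bulk (say, for a table of contents page), a tiny helper does the job.  This is just a sketch; the function name `pdf_page_link` is made up.

```python
from urllib.parse import urldefrag


def pdf_page_link(pdf_url, page):
    """Return pdf_url with a #page=N fragment pointing at the given page."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Drop any fragment already on the URL before adding the new one.
    base, _old_fragment = urldefrag(pdf_url)
    return f"{base}#page={page}"
```

Calling it with the NASA URL above and page 610 produces the same link shown in the example.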

12. Use PDF files in your internal linking strategy because they PROBABLY pass PageRank
In an interview with Stone Temple Consulting, Matt Cutts implies that links in PDF files do indeed pass PageRank.  If PDFs don’t pass PageRank, Google would lose nothing by disclosing that – but if they do, then by disclosing it Google would be creating an incentive for people to proliferate PDFs (and Google is well known to hate closed standards, particularly ones that are not its own – they probably don’t want to encourage people to embed links in QuickTime videos either 😉 ).  You could argue that since PDF files do not show up in Google Webmaster Tools as sources of links they must not count, but what GWT displays or doesn’t display is a conscious choice on Google’s part (in fact, they point out often that the backlinks you can find in GWT are not all of your backlinks).

I would not be surprised if a link from a PDF does indeed pass PageRank and is even weighed *more* heavily than the typical link, but I am unaware of anyone in the industry doing testing in this respect.

Just as with HTML files, however, it is reasonable to assume that the anchor text of links in PDF files counts toward the ranking of the document being linked to, so PDF files should be a part of your website’s internal cross-linking strategy.

Conclusion

PDF files are fine to use for SEO purposes; if you only have a few, don’t sweat the details, but if you have a lot of them, putting a standard process in place to optimize these as you create them will be well worth the effort.

17 Comments

  1. shashank says:

    Thanks Ted, I wasn’t knowing about whether PDF passes Page rank or not, you cleared that doubt. the next Question i have for you is; how to get your PDF rank high in Google for a particular keyword apart from putting meta details.

  2. harry says:

    Thanks for this advice, now I can say, I know better!

  3. Casey says:

    Thanks! I do all of my own SEO and I have noticed several pdf files showing up near the top on google search. I wanted to check this out and you have definitely helped! I do have one “dumb” question.. Where do I upload the pdf file? On a page of my website? And then do I re-submit it to google so that it can index it quickly?

  4. Ted Ives says:

    You’ll have to transfer it to a directory on your website. If you’re using cPanel or some other hosting admin software, look around for “File Transfer” in the admin area, or “FTP”.

    Once you have placed it in a directory (and the url would then be, for instance, “http://www.foo.com/directory1/filename.pdf”), then add a link to that URL on one of your existing pages, and also add the URL to your sitemap file. If you don’t have a sitemap file, google [sitemap generator], there are many websites that will automatically spider your site and create a sitemap. A sitemap acts as the centralized place that Google’s spider double-checks for files to make sure it hasn’t missed any.

  5. seo says:

    this blog is very nice ,excellent and given coconut seo information,
    ———————
    Johnson

  6. Great info Ted. One question – is there a way to embed a pdf file within a portion of a page (iframe, etc.) and still retain the ability for Google to read the document? When I’ve played around with it so far, it doesn’t seem that Google recognizes the text in a pdf if it is embedded.

    Any suggestions would be really helpful!

  7. Berenice says:

    This is very helpful but doesn’t quite answer all my questions. I was thinking of using pdf sharing sites (eg scribd) as a means of providing backlinks for my main blog. Is this worth it?

    On another website it was advised to repurpose content for this reason. ie. take an article and convert it into a pdf and upload it onto these kind of sites. Will google pick this up as duplicate content?

  8. Ted Ives says:

    Great question. Although I’ve not done this sort of “repurposing”, I’ve read a lot of the same articles (Ann Smarty has one here with some good ideas):
    http://www.seosmarty.com/how-to-re-package-your-best-content-for-more-exposure-and-links/

    My sense is, for the most part, Google is nowhere near as proficient at identifying duplicates across websites as they would like you to believe, but instead is much more focused on identifying duplication within websites (an example would be, all the people that complained after Panda that scraper sites were outranking the originators of content).

    So what you’re talking about is probably one reasonable link-building approach, but make sure you hedge your bets by having 4-5 different link building strategies you’re pursuing.

    For example, I tried some WordPress blog commenting over the summer, to little effect, only to find now that others in the industry have also noticed that WordPress comment links were highly devalued by the Panda update.

    So make sure you pursue multiple different link building strategies, to future-proof your efforts.

  9. Ted Ives says:

    James, somehow I missed your comment in August on this. Good question, I did some poking around and the best posting I’ve seen on this recommends you embed PDFs (if at all) using the tag, but definitely not iframes (which makes sense):
    http://www.cloutsmiths.com/2011/embedding-pdfs-in-a-search-engine-friendly-way/

  10. James M. says:

    Thanks for the link Ted. I’ll give it a go!

  11. AllSearch52 says:

    Thanks for the tips, I was looking around as to whether a PDF sitemap is something outside the norm, one would think that if PDFs are indexable (similar to images), you would be able to create a standalone PDF sitemap and submit it to Google.

  12. Kevin Parker says:

    Ted, you have some great tips in here. I particularly like #3 “Avoid Duplicate Content”. For the record, though, PDF has been an OPEN standard since 2008, published as ISO 32000-1. There are also variants like PDF/A (archive), which is the ideal for digital preservation of documents. PDF/A is also an open standard (ISO 19005-1) and is maintained by AIIM International.

  13. Thank you so for the informative post. I just now started to collect the number of high rank pdf sharing website links. I was wondering how pdf could effects the keyword of my webpages. It works so perfect and strong and build links too. This post in very helpful for seo beginners.

  14. GuruMoz says:

    Since this article was written a few years ago, is it still relevant in 2014? How is Google handling PDF documents on the web now?

  15. This is a simple method than going after long tail keywords. Thanks for opening an additional avenue for PDF SEO

  16. Matija says:

    Hi there. I agree indexing pdf files is good seo practice, but it’s all a matter of intent. I have sites where my intent is solely to get inquiries for products that I am selling online. So, I am not interested in my visitors reading, seeing the catalogue, or even sending me an email regarding their wishes. Or subscribing to my newsletter. Where the intention of my site is to get an inquiry for products, I don’t use seo to index pdf. Regards, Matija

  17. Robyn says:

    This is great information! We currently have our pdfs on an external URL and they are emailed to visitors after they complete a form. I know that they are not getting any SEO juice by being on another external server. I am afraid that if I put them on our website, people will find them and download them without filling out a request form. Now we have control over their distribution. Will it help to put them on our website so that people can find them directly? Because they get more CTR, I am thinking it will but don’t like losing the control we now have. Any advice?
