Google and Flash - the Good, the Bad and the Ugly.
17.11.2008
On July 1st, 2008, Google and Adobe announced that they had been working together to significantly improve the way Google indexes Flash files. “Great!” everyone cried (especially Flash developers). But this new indexing capability has also raised a few thorny questions that still lack definitive answers. So where do things stand a few months after the announcement?
The Good
Obviously, it is a great step forward for Google to be able to index Flash files properly. Plenty of content is tied up in those ubiquitous .swf files, and the better Google can read it, the more representative its search results will be of the content of the web.
Also, if Google can ‘read’ the content within Flash files, then accessibility tools such as screen readers may also stand a chance of using the same technology to make Flash content more accessible to their users. In fact, the newly released Flash CS4 has improved accessibility support as well, so the future definitely looks rosier for accessibility in Flash.
The Bad
When the announcement was first made, the Google team were upfront about the technology’s limitations at the time. Since then, they have released an update that solves some of those problems. Most notably, Flash files embedded via JavaScript using the more popular solutions (such as SWFObject and SWFObject 2) can now also be ‘read’. That initial limitation had worried a lot of people who were trying to embed their Flash in a standards-compliant way.
However, other issues linger, such as the inability to index bidirectional languages (such as Hebrew and Arabic). More seriously, content loaded from external resources is not considered part of the Flash file’s content (although those resources are indexed separately). This means that externally loaded XML, HTML or additional SWFs will not be seen as part of the same page’s content. That is a major problem if you use the common technique (especially with AS3) of a small preloader file, around 4 KB, that pulls in the main SWF and displays it once loaded: in this case, only the preloader would be indexed as part of the main page, which is obviously not ideal. Google have said that they are aware of this issue, are looking into it, and expect to address it in a future update before too long.
Another limitation of the technology (more a limitation of Flash than of Google’s indexing algorithm) is the inability to deep-link into Flash files. What does this mean? Imagine a site that consists of a single Flash file, navigated using menus inside the Flash movie. The architecture within the Flash file gives the user the impression that the site consists of multiple pages; however, as far as Google is concerned, all of that content belongs to one page, and that is the only page that will appear in SERPs (Search Engine Results Pages). So even though there may be an ‘about us’ page in the Flash file, that page will never feature in the SERPs on its own, and no one will be able to jump directly to it from a Google search (i.e. there are no ‘deep links’ into the site). Visitors will always have to enter via the homepage and find that content by navigating around the site.
Third-party solutions to the problem of deep-linking within Flash files do exist, but they are not widespread, and the indexing problem is unlikely to be solved until the Flash authoring tool offers native support for deep-linking within the files it produces.
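A common approach among these third-party solutions is to mirror the Flash movie’s current ‘page’ in the URL fragment (the part after `#`), so that a visitor can bookmark or share an address such as `index.html#/about-us` and land on the right section. The TypeScript sketch below shows only that mapping, under stated assumptions: the section names and both function names are hypothetical, and connecting them to a real Flash movie would go through whatever JavaScript/Flash bridge the chosen library provides. Note that this helps visitors rather than crawlers, since search engines generally treat everything after `#` as part of the same URL.

```typescript
// Hypothetical section identifiers for a single-file Flash site.
type SectionId = "home" | "about-us" | "products" | "contact";

const SECTIONS: readonly SectionId[] = ["home", "about-us", "products", "contact"];

// Turn a URL fragment such as "#/about-us" into a section id.
// Empty or unknown fragments fall back to the home section.
function sectionFromHash(hash: string): SectionId {
  const name = hash.replace(/^#\/?/, "").toLowerCase();
  return (SECTIONS as readonly string[]).indexOf(name) !== -1
    ? (name as SectionId)
    : "home";
}

// Build the fragment to write back into the address bar when the user
// navigates inside the Flash movie (the home section gets no fragment).
function hashForSection(section: SectionId): string {
  return section === "home" ? "" : `#/${section}`;
}
```

For example, `sectionFromHash("#/about-us")` gives `"about-us"`, an unrecognised fragment falls back to `"home"`, and `hashForSection("contact")` gives `"#/contact"`, so the two functions round-trip for every known section.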
The Ugly
One major question raised by this new Flash indexing capability has so far not been adequately addressed by Google, despite its potentially serious repercussions: duplicate content. The widely agreed ‘best practice’ for embedding Flash these days is to use JavaScript to embed the Flash file into the page, replacing a section of HTML that contains appropriate alternate content. That way, search bots and visitors without Flash still have access to the same (or similar) content, albeit in a potentially less graphically pleasing form.
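The decision at the heart of this technique can be sketched as follows: serve the Flash movie only when a sufficiently recent Flash Player is detected, and otherwise leave the alternate HTML in place. SWFObject makes a comparable version check before swapping out the alternate content, but the functions below are a hypothetical TypeScript illustration, not SWFObject’s API. Plugin detection itself is assumed and left out: the detected version is simply passed in as a dotted string, or `null` when no plugin is found.

```typescript
// A Flash Player version as [major, minor, release], e.g. [9, 0, 115].
type PlayerVersion = [number, number, number];

// Parse a dotted version string such as "9.0.115"; missing or
// non-numeric parts count as 0.
function parseVersion(v: string): PlayerVersion {
  const parts = v.split(".").map((p) => parseInt(p, 10) || 0);
  return [parts[0] || 0, parts[1] || 0, parts[2] || 0];
}

// True if `installed` is at least `required`, comparing major, then
// minor, then release.
function meetsRequirement(installed: PlayerVersion, required: PlayerVersion): boolean {
  for (let i = 0; i < 3; i++) {
    if (installed[i] !== required[i]) return installed[i] > required[i];
  }
  return true;
}

// Decide what is served: the Flash movie, or the alternate HTML that
// was already in the page.
function chooseContent(installed: string | null, required: string): "flash" | "alternate-html" {
  if (installed === null) return "alternate-html"; // no plugin detected
  return meetsRequirement(parseVersion(installed), parseVersion(required))
    ? "flash"
    : "alternate-html";
}
```

With a required version of `"9.0.0"`, a visitor on `"9.0.115"` gets the Flash movie, while one on `"8.0.24"` or with no plugin at all keeps the HTML. A crawler that does not run the plugin would take the `null` branch and see the HTML copy; now that Google can also read the SWF, it may see the same text twice.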
This technique works well. Previously, because Google could not index the Flash file itself, it saw no duplicated content. That matters because duplicating content is frowned upon: it is a shady technique often used by ‘black-hat’ search engine optimisers to ‘spam’ Google into thinking that there is more content on a page than there really is. These people often hide large blocks of keyword-heavy text on pages in order to boost the page’s ranking, something Google comes down hard on if it is detected.
However, now that Google can read the content of Flash files, the question arises of how it will view the alternate HTML content. Will websites end up being penalised for duplicate content? On the Google Webmaster Blog, they say:
“Serving the same content in Flash and an alternate HTML version could cause us to find duplicate content. This won’t cause a penalty — we don’t lower a site in ranking because of duplicate content.”
However, at an earlier SEMNE group event, Dan Crow (Head of Crawl at Google) said that embedding techniques such as SWFObject were ‘dangerous’ and that ‘he could not guarantee as being immune from being penalized’. So the situation is far from clear.
Even if Google does not penalise websites for content duplicated in this way, another point that needs clearing up is the ‘preference’ Google will give to each type of content. Will the HTML text be favoured over the copy in the Flash file, or vice versa? Either outcome has the potential to be far from ideal.
In summary…
So, whilst Google has in general taken a major step forward in providing us all with search results that truly represent the content on the web today, there are still a few outstanding issues to be addressed and a few lingering questions to be answered. Google is definitely aware of these problems, and will without doubt roll out updates to its service to fix them in the not-too-distant future. If you want to keep up with the updates, probably the best place to do so is the Google Webmaster Blog, and if anything of note bubbles up we will do our best to cover it here too!