Wednesday, 2 February 2011

What the Bing-copies-Google spat has to do with your privacy

Adapted from original image by Alan Cleaver CC-BY
It's an interesting heavyweight spat between technology giants, but behind the allegation that Bing copied search engine results from Google lies a frightening reminder of a practice that intrudes on your private web habits.

When Danny Sullivan posted on Search Engine Land that Bing had been copying search results from Google, the story went viral, and it has since appeared in the morning papers here in the UK.

The resulting traffic crippled the Search Engine Land website, and the story forced Microsoft to issue a denial: we do not copy Google's results.

I caught the news early on Twitter and, unable for a couple of hours to access Danny's original article, I puzzled over how Google could let this happen!  The technology exists to limit the ability of your rivals to take a wholesale copy of any website.

I'd bet my house on Google having the technology to prevent rivals stealing results.  To do this on a commercial scale, as required by a large rival like Bing, and not get caught, would simply be impossible.
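
As an illustration of the general kind of technology meant here, the sketch below limits how many requests a single IP address can make in a given time window. It is a minimal example under stated assumptions, not a description of what Google actually runs (that is not public); the class name, the limits and the example IP address are all hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds (illustrative only)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client IP -> timestamps of that client's recent accepted requests
        self._hits = defaultdict(deque)

    def allow(self, client_ip, now):
        hits = self._hits[client_ip]
        # Forget requests that have fallen outside the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many recent requests: refuse, or serve a CAPTCHA
        hits.append(now)
        return True


# Hypothetical limit: 3 requests per 60 seconds from one address.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
results = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3, 61)]
print(results)
```

A scraper working from a handful of addresses hits this kind of limit almost immediately, which is why copying dynamic results at scale would need a very large and widely spread pool of IP addresses.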

Which brings me on to privacy and why this story is important to us all.  A post on the official Google blog links Bing's copying of search results to the Bing browser tool bar and/or the Suggested Sites feature of Internet Explorer.  Many people use these convenient browser add-ons, but few are aware - or even care - exactly how much personal information might be sent back to the maker of the add-on.

Implicit in the allegations detailed on the official Google blog is that the browser, combined with the Bing tool bar, does in certain circumstances (such as the configuration outlined in Google's blog posting) send substantial portions, if not the whole, of the web page visited back to Microsoft.

Software engineers have known ever since web browser add-ons were invented that any add-on has the potential to snoop on your entire web browsing session.

Of course any reputable company is restricted by privacy laws* and a desire to protect its reputation and maintain public trust, but I wonder how many of us have clicked the "accept" button for an add-on without realising we had just granted permission for the maker of that tool to snoop on our web browsing? Without really appreciating that almost everything we do in our web browser could be relayed back to a third party?

Because of this I use very few browser add-ons; and, without sounding too paranoid, I would seriously urge you to review the list of add-ons installed on your browser and consider, for each one, whether (a) the tool brings you any tangible benefits and (b) you trust the supplier of the add-on to take only the information it says it wants and to process any information gathered in a secure and sensitive way.  If the answer to either question is no, then remove it without hesitation!

Privacy isn't about having something to hide; it's about taking sensible precautions to limit who knows what about you.  In practical terms this will limit the capacity of spammers and marketeers to hassle you with marketing messages, and reduce the possibility that some of your personal details may fall into the hands of criminals and be misused in an attempt to hijack your online accounts.

But in general terms privacy is about limiting the power others have over you once they know information about you.  Again we're not talking about hiding illegal or even immoral or embarrassing acts, but how many of us would be happy for our parents to know every detail of our lives that our close friends know?  It's a basic right to be able to choose what information we share and with whom.

*A quick note on privacy laws - laws are local to a particular state or country, whilst most internet services have an international reach.  I discussed the implications of this in my elastic jurisdiction post.
And also a caution on reputation.  Whilst a company may well take your privacy very seriously, there is always the risk of a data leak.  All it takes for private data to be released is one rogue employee tempted to sell, for example, account names or email addresses on the black market - and there is such a market - or one engineer making a simple error or miscalculation.

Taking privacy seriously also encompasses several basic principles, such as: collecting and storing only the minimum amount of data necessary for any given purpose; and processing the information in such a way as to remove or decouple identifying information, such as IP and email addresses, account IDs etc, from the data gathered at the earliest opportunity.
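
To make these two principles concrete, here is a minimal sketch of what they might look like when a usage record is processed. Everything in it is an assumption made for illustration: the field names, the secret key, the set of "needed" fields and the choice to keep only a network prefix of the IP address. Note that a keyed hash only decouples the identifier from the data (pseudonymisation); whoever holds the key can still link records together.

```python
import hashlib
import hmac
import ipaddress

SECRET_KEY = b"example-key-kept-separately"  # hypothetical key, stored apart from the data
NEEDED_FIELDS = {"page", "timestamp"}        # assumed minimum needed for the stated purpose


def pseudonymise(value, key=SECRET_KEY):
    """Replace an identifier with a keyed hash so it cannot be read back directly."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def truncate_ip(ip):
    """Keep only the network part of an address: /24 for IPv4, /48 for IPv6."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def minimise(record):
    """Drop unneeded fields and decouple identifiers before the record is stored."""
    out = {k: v for k, v in record.items() if k in NEEDED_FIELDS}
    out["visitor"] = pseudonymise(record["account_id"])
    out["ip_prefix"] = truncate_ip(record["ip"])
    return out


raw = {
    "account_id": "user-1234",
    "email": "someone@example.com",
    "ip": "198.51.100.73",
    "page": "/search",
    "timestamp": "2011-02-02T09:00:00Z",
}
clean = minimise(raw)
print(clean)
```

The email address never reaches storage, and the account ID and full IP address are replaced before anything is written, which is the "earliest opportunity" in practice.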

@JamesFirth

8 comments:

  1. "The technology exists to limit the ability of your rivals to take a wholesale copy of any website."

    Really? I would severely question this statement as I don't believe it's true at all!

    Nice post though and I totally agree about plugins, not just for your browser but in all the tools you use. Facebook apps are the next obvious example :-/

  2. Hi Dan,

    Okay it's not 100% reliable but I did say "limit" the ability of your rivals, and had dynamic content - such as search engine results - specifically in mind.

    If a large rival wanted to crib from Google, dynamically, on a commercial scale they'd need an impressive network of IP addresses to hide the redirected traffic from Google. I can't see anyone the size of Bing managing to hide this much traffic.

    I'd still bet my house on it being a non-starter, which is why the use of the add-on makes so much sense to me.

  3. Microsoft used their worldwide browser userbase as a bot-net (via their browser extensions) to create that impressive network of IP addresses (there are 900 million users of IE). They hid this from Google initially by distributing the computers around the world as users. Google suspected this and put out honey pots (as explained in the blog) to prove it.

  4. I'd love to know whether Microsoft stand accused of a "distributed botnet" or simply scraping the results from the browser window. There is a subtle difference - two HTTP GET requests as opposed to one. Also, Google results are personalised, so *if* there was a 2nd request, did that 2nd request carry the same personal identifier?

    To me this fine detail makes a huge difference in understanding any privacy implications...

  5. It's simple - Google was the only engine which knew that the fake key "jkhsdfjkshfjds" mapped to "picture of teddy bear URL". The browser extensions reconstructed that mapping by recording 'when user enters "jkhsdfjkshfjds" -> it goes to -> "picture of teddy bear URL"' and returned it to Bing. Bing then loaded these associations into its index. It's reverse engineering which keys Google associates with which URLs - this is a plain and simple copy of the association in Google at that point in time.

  6. I think you missed the subtle point of my question (assuming both Anon posters are the same).

    I want a definitive answer as to whether the browser extension makes the association by scanning the content of the web page in the browser, or whether it makes its own separate query to Google to form this association.

    It's an important point, because if the browser extension is reading the web page rendered in the browser then surely it's sending more than "web browsing history" back to Microsoft, as described in the IE privacy policy on Feb 1st 2010?

  7. I guess only Google know the answer to that. I think we're looking at two different issues. I'm expressing dismay at a stealthy rip-off of Google's search algorithm by recording its results and passing them off later as the results of Bing, and you're concerned with the privacy implications of how they potentially did it. Both valid but different issues.

  8. I read Google's blog again. It implies they don't (and can't) know. They speculate that Microsoft either sends it back via the toolbar and/or the browser feedback pipe, or the two things could be combined on their receiving end to match 'typed this in' from the toolbar with 'went here' from the browser feedback. Either way, I'm guessing only Microsoft knows (not Google), as it's their code.

