Wednesday, 2 November 2011

Copyright law and data analytics (data mining) discussed in depth at PICTFOR

For comments or corrections please email editorial@slightlyrightofcentre.com or call 01252 560 426

Last night’s Parliamentary ICT Forum (PICTFOR) event on copyright and data analytics (“mining” published works for trends and other nuggets of information) was one of the most enjoyable and useful events so far for the newly-merged Parliamentary Committee.

There’s currently no copyright exemption (e.g. fair dealing-type justification) for computer processing of published works without permission from the copyright owner, and this can seriously impact academic study and areas such as medical research, we heard.

It can also make it very hard, from a legal perspective, for a rival to Google to emerge in the UK; although IBM’s legal advisor was very careful not to mention the ‘G’-word, instead focussing on the legal uncertainty around processing even unprotected web-based content. “We largely advise against doing it, because of the legal risk. We might even be accused of incitement to commit copyright infringement if we provide tools to enable others to analyse online content.”

IBM have a fairly balanced view on data analysis, trying to tread a path between the right to analyse openly published online content and the right of publishers to limit access to other content.

Although much of the debate was technically involved, several audience members I spoke to afterwards found this style of debate highly useful.

The panel consisted of Cambridge professor of intellectual property law Lionel Bently; Philip Ditchfield, contacts and licensing manager at GlaxoSmithKline; Coadec’s Jeff Lynn; John McNaught from the National Centre for Text Mining; Richard Mollet, chief executive of the Publishers Association; Stephen Pinfield, CIO at the University of Nottingham; and IBM’s legal advisor on intellectual property, Peter Stretton.

There was no argument against the usefulness of data mining and “deep” semantic analysis of published works, especially academic journals - but there’s no consensus or simple categorisation of the types of works it would be useful to analyse, either now or in the future.

For example, linguistics scholars may draw useful conclusions from analysing language used in works of fiction. This got me thinking whether world events such as war, terrorism and recession might influence the mood, themes and language used in fiction published during these times.

The debate, however, focussed mainly on academic research, with a strong emphasis on science. We heard from the National Centre for Text Mining that up to 92% of the content of academic works remained largely invisible to standard academic search tools because subjects and themes were not captured in the abstract.

Even “full text search” was inadequate for many purposes because some words and scientific terms are common, yielding tens of thousands of hits.  Analysing the context in which such terms appear can narrow the search and yield useful results – techniques broadly known as semantic analysis.
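To make the difference concrete, here is a minimal sketch of the idea, not anything presented at the event. It assumes a toy corpus of three sentences and a deliberately crude notion of "context" (other words appearing within a few tokens of the search term). The function names `full_text_hits` and `context_hits`, the sample documents and the window size are all my own illustrative assumptions; real semantic analysis tools are far more sophisticated.

```python
import re


def tokenize(text):
    """Lower-case a text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def full_text_hits(docs, term):
    """Plain full-text search: indices of every document containing the term."""
    return [i for i, doc in enumerate(docs) if term in tokenize(doc)]


def context_hits(docs, term, context_terms, window=5):
    """Indices of documents where the term occurs within `window` tokens
    of at least one of the context terms (a crude stand-in for context)."""
    wanted = set(context_terms)
    hits = []
    for i, doc in enumerate(docs):
        tokens = tokenize(doc)
        for pos, tok in enumerate(tokens):
            if tok != term:
                continue
            nearby = tokens[max(0, pos - window):pos + window + 1]
            if wanted & set(nearby):
                hits.append(i)
                break  # one match is enough for this document
    return hits


# Hypothetical sample corpus: the same common word in three different senses.
docs = [
    "Gene expression in tumour cells was measured after treatment.",
    "The expression on the patient's face suggested discomfort.",
    "We simplify the algebraic expression before solving.",
]

print(full_text_hits(docs, "expression"))                   # all three documents
print(context_hits(docs, "expression", ["gene", "tumour"]))  # only the biology one
```

Note that `context_hits` has to read every word of every document to do its job, which is the crux of the licensing argument that follows.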

GlaxoSmithKline suggested that analysis of trends across hundreds or thousands of medical publications might help direct future research, or even yield a breakthrough. Smarter research would lead ultimately to better medicine.

Richard Mollet of the Publishers Association countered that publishers weren’t averse to allowing controlled access for bona fide research organisations wishing to data-mine their entire works.

Semantic analysis techniques [currently*] require the whole text to be made available so that the analyser can have full control over how the text is interpreted. This brings a risk that unscrupulous organisations might steal whole volumes of text, depriving the publishers of their prime asset.

[* This is not always true. Some databases allow a level of analysis to be performed by third party algorithms without handing over a full copy of the input text.]

“Access needs to be controlled or regulated in some way. Some people like to paint publishers as bouncers, denying access. That’s not true, we want to help. I like to think of publishers more like maître d’s, guiding clientèle to their table.”

Richard dismissed Jeff Lynn’s (Coadec) suggestion that publishers want to protect an exclusive monopoly over their back catalogue, saying that most publishers had a policy of licensing academic work, and many indeed licensed other uses.

He also noted a lack of demand, saying that only 10-15 requests (per year?) were made to license analysis of published catalogues. University of Nottingham CIO Stephen Pinfield provided some background for this, explaining that licensing was both complex and restrictive, which discouraged many researchers from embarking on projects that relied upon data mining.

Jeff Lynn added that many small businesses or enthusiasts would never get the funding or meet the criteria to access the data, yet notable digital advances have come from small businesses or individuals. Open access would allow an army of smaller developers to develop new search and cataloguing techniques or dig for interesting trends.

Some interesting legal points were raised, including a discussion around whether intellectual property (IP) rights were analogous in law to physical property rights. Essentially, yes, said Professor Bently. Intellectual property was protected as a property right under Article 1 of Protocol 1 to the European Convention on Human Rights (and other treaties), but that didn't necessarily mean there couldn't be exemptions or compulsory (statutory) licensing conditions applied.

In fact the absence of a compulsory licensing model could eventually work against the interests of publishers, as a statutory defence to copyright infringement exists when there is no lawful method of licensing content *for educational use, although this is a complex area that would need to be established in court.

* Correction added 5/11/2011

Peter Stretton made a related argument, noting that automated processes that currently crawl the web and other data sources in search of infringing material could themselves be infringing other people's copyright. Essentially, tools useful to publishers in detecting infringement could themselves be unlawful under the current copyright regime.

During questions it was pointed out that the solution to many of the issues could lie in statutory licensing: fixed-rate licence fees for access, set by a tribunal in much the same way as the fees for performing a cover version of a song.

There are strong arguments that companies wishing to benefit from others’ IP, such as GSK (as with all other corporations profiting from science), should pay something. Statutory licensing would solve many of the problems highlighted by the University of Nottingham’s CIO.

But Stephen Pinfield countered that the University already pays around £5m a year to license journals and other content, so why should it pay again just to perform a computerised analysis of content it has already licensed?

Other representatives of the Publishers Association argued that a change in the law might bring unintended consequences (I read: detrimental to the publishing industry) and that a solution could be found through “a collaborative collective approach.” By “collective” I assume they mean royalty collection societies, in a role similar to that of the Performing Right Society in music royalty collection.

But there was a mood in the audience, both during the Q&A and in private conversations afterwards, that publishers have historically been reluctant to act in this area and would not go far enough of their own volition.

