Search: Theory and Practice in Journalism Online
eBook - ePub

  1. 164 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS & Android

About This Book

Whether uncovering breaking stories, finding reliable background information, or finding the right contributors for stories and packages, there is now a wealth of information available to journalists online - but where to begin? In Search: Theory and Practice in Journalism Online, Murray Dick provides a practical and theoretical overview of the journalistic research potential of various online tools. Written by a leading expert in the field, the book offers experience-based guidance on online search for journalism.

Key features:

  - Up-to-date coverage of advanced search, the 'invisible web', social media, multimedia and the verification of online material
  - A critical overview of theory in online ethics, verification, and use of social media in journalism online
  - Original research into search theory, privacy, trust and rights issues online
  - Student-friendly pedagogy based upon professional practice and informed by experts in online research

Search: Theory and Practice in Journalism Online is essential reading for undergraduate students of digital journalism, online reporting and journalism studies.

Information

Year: 2013
ISBN: 9781350306585
1 Search in theory
From information overload to filter failure
Information overload is a modern concept, though not a new one. Long pre-dating the rise of the internet, the term was popularised by Alvin Toffler. His book, Future Shock (1984), conveyed the nausea of the information age, and the sense of dislocation individuals and groups feel as a result of excessive change in too short a period of time. The concept has been studied and analysed in a range of information professions, from accounting to marketing and consumer research. In most cases the common denominator is that individual performance (in terms of decision-making) improves with the volume of information available, up to a point. Beyond that point, further information results in rapidly declining performance (Eppler and Mengis, 2004).
As far back as 1997, information overload was found to interfere with journalists’ ability to gain traction, or ‘grip’ over the news (Nicholas and Martin, 1997). Journalists have complained of information overload caused by modern working life, and not least excessive PR communications, or ‘information subsidy’ (Curtin, 1999). More recently, scholars have predicted that information overload will continue to be a key challenge for journalist and citizen alike in today’s networked world (Servaes, 2009).
The internet makes it possible for anyone to publish information, which has led to rapid growth in the production of information online. Indeed, Google indexed its landmark trillionth web page as far back as 2008. This sea of information can be disorientating without adequate support, and the misinformation it carries poses a serious danger to journalists' professional credibility.
Information overload continues to inspire new literature, albeit at a declining rate since around 2004. Some critics have argued that the volume of information on the internet is making us more 'stupid' (Carr, 2010); others disagree (Battelle, 2008). Striking a pragmatic note, Shirky (2008) turned the concept of information overload on its head, arguing that the true malaise of the modern era is rather 'filter failure'. This is conceived as the collapse of those systems we use to help us tell good from bad (the likes of which we make use of every day in our off-line lives). While on the surface this may seem a mere semantic twist, Shirky's term moves us away from seeing the individual (not 'user') as a helpless information 'junkie'. It moves us instead towards a place where, if we can build filters online to help determine the useful from the useless more efficiently, then we may be able to plot a course through this sea of information.
This book will offer an overview of the current state of information filters available to journalists working online. It is intended to help journalists address the filter failure that plagues contemporary, networked journalism.
How journalists use the internet
Throughout the last century, most large media organisations employed teams of researchers and librarians whose job was to provide journalists with research systems (such as newspaper clippings) and a full reference library service. A combination of economic factors (including declining circulations and industry consolidation) and technological developments (including the emergence of powerful and affordable database technologies) has changed this. As with many other professions throughout the global economy, media research is increasingly becoming the domain of the all-singing, all-dancing all-rounder: the journalist.
Today many journalists do much of their own research from the comfort of a desk. But navigating the internet as a researcher or journalist is a different proposition from using it to book holidays and listen to music. One difference is that a journalist cannot just give up on a story if he or she is struggling to find the information needed; another is that the news schedule will not slow down to accommodate journalists who cannot find information in time.
Various studies have considered how journalists use the internet to help them source news and contributors. Nicholas (1996) found that Guardian journalists (who demonstrated degrees of capability in online search) were not as ignorant of technique as some librarians (and librarianship literature) assumed them to be.
The mid-1990s saw a tipping point in US journalists’ usage of the web in newsgathering. One study found that daily use of the internet had risen from 25% of respondents in 1994 to 92.4% in 1998 (Garrison, 1999). This rise in usage has fuelled concerns about the content found online, especially concerning the verifiability and reliability of this information.
By 2001 web search was out-stripping the use of commercial online research tools in US journalists' news-gathering routines (Garrison, 2001), and by 2005 it was found that almost two-thirds of journalists were using competitor news found on the web in their research and reporting (EURO RSCG Magnet & Columbia University, 2005). However, just because journalists today have access to many unofficial sources online does not mean they use them. On the contrary, long-established 'conventional' means of sourcing stories persist (Jha, 2007).
More recent studies have moved away from the basic measurement of internet use, towards trying to understand the contexts within which the internet is accessed by journalists. This includes issues arising out of professional context, such as time constraints, and means of access, in terms of the range of tools available to journalists, and those which are actually used (Hermans et al., 2009). While it is important not to confuse professional approach with medium in journalism, online journalists have been found to be more trusting of news they find online than print journalists (Cassidy, 2007), suggesting a relationship between the two.
In perhaps the most comprehensive research study to date into journalists' use of online search tools, Machill and Beiler (2009) found that although Google plays a decisive role in most German journalists' news-gathering research today, it is nevertheless one of relatively few online tools commonly used in the newsroom. They found that most journalists achieve only moderate levels of success in online search. Those who apply most thought to search problems were found to perform best overall. It was also found that journalists' concerns about the reliability and verifiability of material found on the web have encouraged an increasingly cannibalistic approach to newsgathering: journalists tend to reference their own work and other media at the expense of reflecting the whole web. Further research has shown that just over half of journalists are oblivious to blogs and social media in terms of sourcing the news (Oriella PR Network, 2011).
Search theory
Morville (2005) explains that the central issue in search (or ‘information retrieval’ as it was originally known) has traditionally been the concept of relevance. Traditionally, developers in this field have conceptualised search in terms of the binary (and inversely related) concepts of ‘precision’ and ‘recall’ as measures of relevance. These two concepts have very precise meanings in search engine development, thus:
Precision = Number relevant and retrieved / Total number retrieved
Recall = Number relevant and retrieved / Total number relevant
(Morville, 2005, p. 51)
Morville goes on to explain these concepts in the following terms: ‘precision measures how well a system retrieves only the relevant documents ... recall measures how well a system retrieves all the relevant documents’ (Morville, 2005, p. 49). How important these two concepts are (and how search strategy should be amended to accommodate them) depends on the type of search undertaken. For searches which require a certain (manageable) number of search results (a situation common to many busy, time-poor news journalists), precision is the key. But for exhaustive searches, where unearthing a fact may require hours of painstaking search (a situation most investigative journalists will be able to relate to) recall is the key.
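To make the distinction concrete, here is a minimal sketch in Python of the two measures as defined above; the function names and the example numbers are illustrative, not drawn from Morville:

# Illustrative only: precision and recall for a single search,
# following the definitions quoted above (Morville, 2005).

def precision(relevant_retrieved, total_retrieved):
    # How well the system retrieves ONLY relevant documents.
    return relevant_retrieved / total_retrieved

def recall(relevant_retrieved, total_relevant):
    # How well the system retrieves ALL the relevant documents.
    return relevant_retrieved / total_relevant

# Hypothetical case: a query returns 20 documents, 8 of which are relevant,
# out of 40 relevant documents that exist somewhere in the index.
print(precision(8, 20))  # 0.4: most of what came back is noise
print(recall(8, 40))     # 0.2: most of what exists was missed

Tuning a query to raise one of these numbers typically lowers the other, which is the inverse relationship referred to above.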
But this approach is fundamentally compromised by the imprecision, ambiguity and vagueness of language as it is used by the one variable upon which no laboratory conditions can impose order: the searcher. When relevance is defined by search engines, it is an aggregate, quantitative measure, whereas we humans think of relevance in a rather more fluid, ambiguous, qualitative way. This is why search requires additional indications of 'aboutness': additional keywords describing content, not to mention Boolean operators, field-specific functions and other advanced options.
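By way of a hypothetical illustration of those advanced options (the topic, phrasing and domain below are invented; quotation marks, OR and the site: operator are standard features of the major engines, though support varies), each refinement trades recall for precision:

# Hypothetical query refinements, from broad (high recall) to narrow (high precision).
queries = [
    'press regulation',                                  # broad keywords: many results, much noise
    '"press regulation" leveson',                        # exact phrase plus an extra keyword
    '"press regulation" (leveson OR "royal charter")',   # Boolean OR widens a single facet
    '"press regulation" site:parliament.uk',             # field-specific: restrict to one domain
]
for q in queries:
    print(q)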
In the mid-1990s, the focus of work in information retrieval moved away from the hard science of precision and recall, and towards the study of how humans interact with information (Bates, 2002). Putting the searcher at the centre of the process changes the landscape. Acknowledging that information needs evolve as searchers interact with the tools at their disposal (and revise what they are looking for) changes the game.
Search developers are aware of the importance of iteration in search: what is considered to be the right result is (often) the product of an internalised process of negotiation. Expert search is akin to 'berrypicking' in that it is discriminating (Bates, 2002). Users 'satisfice' their search needs. As such, the search options available to us can change the nature not only of what can be found, but also of what we seek (Halavais, 2009, p. 87). Post-modernity, it could be argued, came late to search, but information literacy can help journalists to clear a path through the search wilderness.
How search engines work
The term ‘search engine’ is often used to refer to two entirely different types of search resource: human-powered directories (which will be covered in more detail later in this book) and crawler-based search engines. The first form, human-powered directories such as the Open Directory Project, DMOZ (http://www.dmoz.org/), are designed to aid browsing for information within a collection of materials organised along the lines of human expertise, and presented in an intuitive way.
The second form, crawler-based search engines like Google, Bing and Yahoo, have three components. First is the ‘spider’, the algorithms which pass through the internet looking for new content and for changes to existing content. This process involves analysing all information on a page, parsing that information, and then storing it in the second component, the search engine’s ‘index’. The ‘spider’ then exits via the links found on that page, so if a web page has been spidered but not indexed, it will not be found via the search engine. The third component, the search engine, is a program which helps the searcher interact with the index, and which ranks content returned in terms of ‘relevance’.
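As a rough sketch of how these three components fit together (the pages, links and ranking heuristic below are invented and hugely simplified; production crawlers and indexes are far more elaborate):

# A toy crawler-indexer in Python: a 'spider' follows links, builds an
# inverted index, and a search function ranks pages by simple term frequency.
# All URLs and page text here are invented for illustration.

pages = {
    "a.example/home": ("search engines index the web", ["a.example/news"]),
    "a.example/news": ("journalists search the web for sources", []),
}

def spider(start, index):
    # Visit each reachable page, record its words in the index, follow its links.
    queue, seen = [start], set()
    while queue:
        url = queue.pop()
        if url in seen or url not in pages:
            continue
        seen.add(url)
        text, links = pages[url]
        for word in text.split():
            index.setdefault(word, {}).setdefault(url, 0)
            index[word][url] += 1
        queue.extend(links)

def search(index, term):
    # Rank matching pages by how often the term appears (a crude proxy for relevance).
    hits = index.get(term, {})
    return sorted(hits, key=hits.get, reverse=True)

index = {}
spider("a.example/home", index)
print(search(index, "search"))  # pages containing 'search', most frequent first

A page the spider never reaches, or reaches but does not index, simply cannot be returned by the search function, which is the point made above about spidered but unindexed pages.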
Major search engines differ only by degrees, in particular with regard to the metrics (and the weighting of these metrics) used to determine ‘relevance’.
Google is said to use over 200 ‘signals’ to help determine the relevance ranking of web pages, including its patented PageRank algorithm (which uses co-citation between web pages to inform ranking). However, apart from some general technical (and editorial) advice, the company is deliberately vague on how these signals are weighted in ranking. This absence of transparency is deliberate; it stems from not wishing to give unscrupulous web publishers ammunition with which to ‘game’ the system (Moran and Hunt, 2006). This is a fast-changing situation. Some methods which were significant in search relevance in the past, such as the use of metadata tags to add meaning to online content, have been exploited by ‘spammers’, and so their significance in ranking has been muted. For example, the ‘keywords’ HTML meta tag is no longer factored into ranking (Cutts, 2009).
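The core of the published PageRank idea (links treated as ‘votes’, discounted by a damping factor) can be sketched in a few lines of power iteration; the link graph, damping value and iteration count below are illustrative only, and Google’s production ranking combines this with the many other, undisclosed signals mentioned above:

# Minimal PageRank by power iteration over a tiny, invented link graph.
# links[p] lists the pages p links to; damping of 0.85 follows the original paper.

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
damping, n = 0.85, len(links)
rank = {p: 1.0 / n for p in links}

for _ in range(50):  # iterate until the ranks settle; 50 passes is plenty here
    new = {p: (1 - damping) / n for p in links}
    for page, outlinks in links.items():
        share = rank[page] / len(outlinks)
        for target in outlinks:
            new[target] += damping * share
    rank = new

print(rank)  # C gathers the most 'votes', A is close behind, B trails

Pages gain rank by being linked to from pages which themselves rank well, which is why inbound links from authoritative sites count for more than sheer volume of links.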
More recently, Google has been forced to acknowledge the need to move towards more qualitative notions of quality and relevance. The ‘Panda’ algorithm (Google Blog, 2011) was developed amid wide-ranging concern about the rise of ‘content farms’: websites whose content is optimised for search, but which owe their online authority more to effective exploitation of Google’s AdSense marketing platform than to a reputation for providing quality information (Roth, 2009).
Political economy of search
Jürgen Habermas conceived of the public sphere as a social arena which exists between the private sphere of enterprise and the government. It is somewhere people can get together to talk openly about the issues of the day, to debate social problems, and to organise action accordingly. This neo-Enlightenment concept, it is argued, found its apogee in the coffee houses of 17th- and 18th-century London. These were discursive spaces, places where deliberative democracy could flourish (Habermas, 1991). But do the economic structure and the technology comprising today’s search industry contribute to, or detract from, this ideal?
Monopolies and oligopolies, in effect at least, abound within the search industry. In July 2010, Google domains accounted for more than 90% of the UK search market (Experian Hitwise Data Center, 2010) – an effective monopoly unrivalled in virtually any other industry. But Google has a major influence upon the information we consume due to search behaviour too. Research on AOL server logs has found that 42% of users click on the first placed result in search engine results, while only 12% click on the second placed result and 9% on the third. The first ten results receive 90% of all click-through traffic, and the second page only just over 4% (Enge et al., 2010). People do not have the time or resources to check every result, and as the internet grows we are becoming more dependent upon Google as an arbiter of relevance (as a proxy for truth) in our lives.
Google’s corporate literature used to describe the PageRank algorithm as ‘uniquely democratic’; a claim which has led some in academe to question just what kind of democratic system the company have developed. In a theoretical consideration of search indexing method, Introna and Nissenbaum (2000) argued that far from being an egalitarian domain, a small number of elite online sources dominate on the web. Elsewhere, those concerned with deliberative democracy have argued that Google’s PageRank...

Table of contents

  1. Cover
  2. Title Page
  3. Copyright
  4. Contents
  5. Preface
  6. Acknowledgements
  7. Introduction
  8. 1. Search in theory
  9. 2. Search in practice
  10. 3. The invisible web
  11. 4. Social media theory
  12. 5. Social networks and newsgathering
  13. 6. Multimedia
  14. 7. Bringing it all together: developing an online beat
  15. 8. Verifying online sources
  16. Index