{"id":272,"date":"2008-07-11T10:49:16","date_gmt":"2008-07-11T17:49:16","guid":{"rendered":"http:\/\/www.spreadingscience.com\/?p=272"},"modified":"2008-07-11T11:03:05","modified_gmt":"2008-07-11T18:03:05","slug":"browsing-clouds-not-papers","status":"publish","type":"post","link":"https:\/\/www.spreadingscience.com\/2008\/07\/11\/browsing-clouds-not-papers\/","title":{"rendered":"Browsing clouds, not papers"},"content":{"rendered":"

Commentary: Summarizing papers as word clouds<\/a>:
\n[Via
Buried Treasure<\/a>]<\/p>\n

<\/div>\n
The web provides entirely new avenues for disseminating information and for visualizing it. It can be very time-consuming to browse through the literature, yet the most creative research often comes from the intervention of <\/em>Serendipity<\/a><\/em> (the Wikipedia article lists many examples).<\/p>\n

Lars discusses some interesting numbers and comes up with an intriguing solution.<\/em><\/div>\n

For use in presentations on literature mining, I did a back-of-the-envelope calculation of how much time I would be able to spend on each new biomedical paper that is published. Assuming that all papers were indexed in PubMed (which they are not) and that I could read papers 24 hours per day all year around (which I cannot), the result is that I could allocate approximately 50 seconds per paper. This nicely illustrates the point that no one can keep up with the complete biomedical literature.<\/p>\n

When I discovered Wordle<\/a>, which can turn any text into a beautiful word cloud, I thus wondered if this visualization method would be useful for summarizing a complete paper as a single figure. To test this, I extracted the complete text of three papers that I coauthored in the NAR database issue 2008. Submitting these to Wordle resulted in the three figures below (click for larger versions):
\n
\"\"<\/a>
\n
\"\"<\/a>
\n
\"\"<\/a><\/p><\/blockquote>\n

<\/div>\n
These sorts of rich figures could be very useful in a scientific setting, where being able to rapidly filter a large number of articles is important.<\/p>\n
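<p>Lars's 50-second figure can be run backwards to show how large that number of articles is. This is only a sketch of the arithmetic. It keeps his assumptions of a 365-day year spent reading around the clock, and the constant names are chosen just for this example:<\/p>\n
<pre><code># Assumes a 365-day year spent reading around the clock, as in the estimate.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000 seconds
SECONDS_PER_PAPER = 50                  # time available per paper
papers_per_year = SECONDS_PER_YEAR // SECONDS_PER_PAPER
print(papers_per_year)  # 630720
<\/code><\/pre>\n
<p>In other words, the estimate corresponds to roughly 630,000 new papers a year. That is the scale any filtering aid, word clouds included, has to cope with.<\/p>\n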

However, he notes that this approach may not work for every article unless changes are made, either to how articles are written or to the software that creates the visuals.<\/em><\/div>\n

…I think a large part of the problem is the splitting of multiwords; for example, “cell cycle” becomes two separate terms “cell” and “cycle”. Another problem is that words from different sections of the paper are mixed, which blurs the messages. These two issues could be solved by 1) detecting multiwords and considering them as single tokens, and 2) sorting the terms according to where in the paper they are mainly used.<\/p><\/blockquote>\n
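<p>The two fixes Lars proposes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not his method and not how Wordle works. It treats any adjacent word pair that occurs at least twice in the paper and contains no stop word as a multiword. The stop-word list is a short hypothetical one. Each term is then assigned to the section in which it occurs most often. The function names (<code>find_multiwords<\/code>, <code>merge<\/code>, <code>cloud_terms<\/code>) and the sample text are made up for this example.<\/p>\n
<pre><code>import re
from collections import Counter, defaultdict

# Hypothetical, deliberately short stop-word list.
STOP = {'the', 'of', 'and', 'a', 'in', 'to', 'is', 'for', 'we', 'that', 'are', 'with', 'by'}


def tokenize(text):
    # Lower-case alphabetic tokens; digits and punctuation are dropped.
    return re.findall(r'[a-z]+', text.lower())


def find_multiwords(sections, min_count=2):
    # An adjacent word pair counts as a multiword if it occurs at least
    # min_count times in the whole paper and contains no stop word.
    pairs = Counter()
    for text in sections.values():
        toks = tokenize(text)
        pairs.update(zip(toks, toks[1:]))
    return {p for p, n in pairs.items()
            if n >= min_count and p[0] not in STOP and p[1] not in STOP}


def merge(tokens, multiwords):
    # Join known pairs into single tokens, e.g. 'cell', 'cycle' become 'cell cycle'.
    out, skip = [], False
    for cur, nxt in zip(tokens, tokens[1:] + ['']):
        if skip:  # second half of a pair that was just merged
            skip = False
            continue
        if (cur, nxt) in multiwords:
            out.append(cur + ' ' + nxt)
            skip = True
        else:
            out.append(cur)
    return out


def cloud_terms(sections, min_count=2, top=10):
    # Count terms per section with multiwords kept intact, then file each
    # of the top terms under the section where it is used most.
    multiwords = find_multiwords(sections, min_count)
    per_section = {
        name: Counter(t for t in merge(tokenize(text), multiwords) if t not in STOP)
        for name, text in sections.items()
    }
    total = Counter()
    for counts in per_section.values():
        total.update(counts)
    grouped = defaultdict(list)
    for term, n in total.most_common(top):
        home = max(per_section, key=lambda s: per_section[s][term])
        grouped[home].append((term, n))
    return dict(grouped)


paper = {
    'abstract': 'The cell cycle is regulated by protein kinases. Cell cycle arrest follows damage.',
    'methods': 'We used mass spectrometry. Mass spectrometry data were filtered.',
}
print(cloud_terms(paper, top=2))
# {'abstract': [('cell cycle', 2)], 'methods': [('mass spectrometry', 2)]}
<\/code><\/pre>\n
<p>Counting repeated pairs is about the crudest way to detect multiwords. A real tool would more likely use a statistical collocation measure or a dictionary of known biomedical terms. Even this simple version keeps <code>cell cycle<\/code> together as one token and keeps methods vocabulary from blurring the message of the abstract.<\/p>\n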

It would be easy to adapt the visuals to scientific needs and then track whether they are actually useful in practice.<\/em><\/div>\n

<\/p>\n

Technorati Tags: Social media<\/a>, Web 2.0<\/a><\/p>\n

<\/p>\n","protected":false},"excerpt":{"rendered":"

Commentary: Summarizing papers as word clouds: [Via Buried Treasure] The web provides entirely new avenues for disseminating information and for visualizing it. It can be very time-consuming to browse through the literature, yet the most creative research often comes from the intervention of Serendipity (the Wikipedia article lists many examples). Lars discusses some … Continue reading Browsing clouds, not papers<\/span> →<\/span><\/a><\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[3,4],"tags":[],"class_list":["post-272","post","type-post","status-publish","format-standard","hentry","category-science","category-web-20"],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pe2yp-4o","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":332,"url":"https:\/\/www.spreadingscience.com\/2008\/08\/19\/an-interesting-start\/","url_meta":{"origin":272,"position":0},"title":"An interesting start","date":"August 19, 2008","format":false,"excerpt":"by mandj98 Mendeley = Mekentosj Papers + Web 2.0 ?: [Via bioCS] Via Ricardo Vidal: Mendeley seems to be a Windows (plus Mac\/Linux) equivalent of Mekentosj Papers (which is Mac OS X only, and has been described as \"iTunes for your papers\"). 
In addition to handling your PDFs, it has\u2026","rel":"","context":"In "Open Access"","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.spreadingscience.com\/wp-content\/uploads\/2008\/08\/mandaly.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":330,"url":"https:\/\/www.spreadingscience.com\/2008\/08\/19\/missing-the-point\/","url_meta":{"origin":272,"position":1},"title":"Missing the point?","date":"August 19, 2008","format":false,"excerpt":"by sylvar It has been about a month since Science published Electronic Publication and the Narrowing of Science and Scholarship by James Evans. I've waited some time to comment because the results were somewhat nonintuitive, leading to some deeper thinking. The results seem to indicate that greater access to online\u2026","rel":"","context":"In "General"","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.spreadingscience.com\/wp-content\/uploads\/2008\/08\/pendulum.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":599,"url":"https:\/\/www.spreadingscience.com\/2009\/11\/02\/short-answers-to-simple-questions\/","url_meta":{"origin":272,"position":2},"title":"Updated: Short answers to simple questions","date":"November 2, 2009","format":false,"excerpt":"by Nima BadieyNIH Funds a Social Network for Scientists \u2014 Is It Likely to Succeed? [Via The Scholarly Kitchen] The NIH spends $12.2 million funding a social network for scientists. Is this any more likely to succeed than all the other recent failures? 
[More] Fuller discussion: In order to find\u2026","rel":"","context":"In "Knowledge Creation"","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":269,"url":"https:\/\/www.spreadingscience.com\/2008\/07\/08\/now-we-have-article-20\/","url_meta":{"origin":272,"position":3},"title":"Now we have article 2.0","date":"July 8, 2008","format":false,"excerpt":"by luisvilla* I will participate in the Elsevier Article 2.0 Contest: [Via Gobbledygook] We have been talking a lot about Web 2.0 approaches for scientific papers. Now Elsevier announced an Article 2.0 Contest: Demonstrate your best ideas for how scientific research articles should be presented on the web and compete\u2026","rel":"","context":"In "Open Access"","img":{"alt_text":"","src":"https:\/\/i0.wp.com\/www.spreadingscience.com\/wp-content\/uploads\/2008\/07\/ruby0nrails.jpg?resize=350%2C200","width":350,"height":200},"classes":[]},{"id":672,"url":"https:\/\/www.spreadingscience.com\/2010\/08\/10\/all-part-of-the-great-cycle-of-knowledge\/","url_meta":{"origin":272,"position":4},"title":"All part of the great cycle of knowledge","date":"August 10, 2010","format":false,"excerpt":"by Peter KaminskiOpen access saves $1B [Via Naturally Selected] A new analysis suggests that making papers open access would pump $1 billion into the U.S. economy over the next few decades. That\u2019s about five times the amount it costs to archive the papers, according to ScienceInsider. 
The economic analysis, about\u2026","rel":"","context":"In "Open Access"","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":479,"url":"https:\/\/www.spreadingscience.com\/2009\/01\/15\/broken-filters\/","url_meta":{"origin":272,"position":5},"title":"Broken filters?","date":"January 15, 2009","format":false,"excerpt":"by mrpattersonsir Information overload is NOT filture failure: This has been bothering me for a while now, dating back to last year, when I first heard Clay Shirky's very pithy statement that information overload isn't a real problem, the real problem is a failure to build effective filters. It's a\u2026","rel":"","context":"In "Open Access"","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.spreadingscience.com\/wp-json\/wp\/v2\/posts\/272"}],"collection":[{"href":"https:\/\/www.spreadingscience.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.spreadingscience.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.spreadingscience.com\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.spreadingscience.com\/wp-json\/wp\/v2\/comments?post=272"}],"version-history":[{"count":0,"href":"https:\/\/www.spreadingscience.com\/wp-json\/wp\/v2\/posts\/272\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.spreadingscience.com\/wp-json\/wp\/v2\/media?parent=272"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.spreadingscience.com\/wp-json\/wp\/v2\/categories?post=272"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.spreadingscience.com\/wp-json\/wp\/v2\/tags?post=272"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}