CIL2008: Text Mining and Visualization of Open Sources

Patrice Slert
We’re talking about structured data from open sources (Web of Science, Dialog, Silobreaker, the Internet), not necessarily free sources. This is in contrast to intelligence data, where a lot of the technologies have applications, as well.
Visualization can mislead you in terms of cause and effect. It can also lead to false similarities (such as New England and England being presented as the same place).
Open Source Information (OSI) is growing. Intelligence community is recognizing the value of librarians in searching the open source information space.
ISI Web of Knowledge includes visualization and text mining capabilities. However, limited to databases provided through ISI. To mix and match with data available through other vendors, need to use other products, such as VantagePoint. VantagePoint allows you to create filters for importing data from various sources.
Silobreaker: a commercially available news analysis tool. It lets you mine for information via word searches, visual searches, people, organizations, and industries, and pulls together relationships among these facets. It pulls out networks of people, as reported in news reports. It provides a way to look at the news and see who is appearing in articles about the subject. You can expand your search, or refocus it, by diving deeper into related people, organizations, companies, etc.

CIL2008: Library Web Presence

Widgets at Penn State

Emily Rimland. Facebook application for the library. Led them to think about simple pages. Built “Research JumpStart” aimed at beginning users. Uses widgets: little bits of content taken from their source and dumped into another page.
Widgets provide easy access to popular, most valuable resources. Once you have widgets, you can place them in other environments (iGoogle, for example). Widgets help you compartmentalize your information and provide just what’s needed, when it’s needed.
Widgets on JumpStart page: 1) Catalog search. 2) ProQuest 3) Research guides for specific courses/subjects — just the guides that are most used by undergraduates. 4) Chat widget (they use AIM).
iGoogle widgets have proved very popular. Faculty and students have liked taking the search tools and RSS feeds and creating a personalized page.
Binky Lush. How these were developed. Uses WidgetBox, which provides widgets for all sorts of services (iGoogle, PageFlakes, social networking sites, etc.) and provides code for your own site, to include in a blog, etc. All of Penn State’s widgets are hosted by WidgetBox. This is the “get widget” chiclet that appears on the JumpStart page in each gadget. This gives you a window with options for all sorts of places the widget can be embedded, with code customized for each site, or raw HTML.

I wonder about whether this makes sense; to host this sort of content on an external site. What are advantages? There’s obvious ease of creating the widget, but shouldn’t core services be hosted locally?

WidgetBox lets you create a Facebook widget, but doesn’t fully take advantage of Facebook’s social graph, so Penn State is developing its own.

LibraryGuides at Temple

Derik Badman and Kristina DeVoe
Original subject guides were static pages, long lists of annotated links. These were based on Contribute, which was not easy to use, according to Derik. No functionality other than what was on the page.
Brought in LibGuides in spring 2007. Had a semester to migrate all 90+ guides into LibGuides. Was fairly easy to do. Creating and maintaining guides easy. Also very flexible. Content of guide can be organized by resource type (like always), but also by any other categories library wants — time period, topic, etc. Units of class, paper topics, anything that’s needed. And that librarian has time for.
Content is modular. Easy to take a content block from one guide to another. Easy to share.
Users can find guides by subject, by tags, by “featured resources”, by recently updated, by ratings. Users can comment on guides — either on guide as a whole or on a section. Allows community building to start. LibGuides also has a polls feature — about the guide, or anything else.
They’ve added widgets (chat, calendar, etc.) as well as direct search boxes so that users can search directly in featured resources without having to first go to a page and then search. Similarly, tailored federated search. Pull in RSS feeds from various sources — for example, table of contents for specific journals or news.
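For readers curious how pulling a table-of-contents feed into a page might work, here is a minimal sketch that lists the items in an RSS 2.0 feed. It is not how LibGuides does it; the sample feed, its URLs, and the `feed_items` name are all made up for illustration, and a real guide would fetch the XML from the journal's feed address.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content standing in for a journal's table-of-contents feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Journal: Table of Contents</title>
    <item><title>Article One</title><link>http://example.org/1</link></item>
    <item><title>Article Two</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def feed_items(xml_text, limit=5):
    """Return (title, link) pairs for the first `limit` items of an RSS 2.0 feed."""
    channel = ET.fromstring(xml_text).find("channel")
    items = []
    for item in channel.findall("item")[:limit]:
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


# Print each entry the way a guide page might list it.
for title, link in feed_items(SAMPLE_FEED):
    print(f"{title} <{link}>")
```

The `limit` argument keeps the list short, which matters when the feed is shown in a small box on a guide page.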
Have used for course guides — a guide not just for a subject, but for a particular course. Resources are targeted to specific classes and contain resources that are relevant at that point in the semester.
Usage… Usage has gone up significantly (static guides vs. dynamic guides).
Marketing is important. Students need to know the new guides exist, that they are better than the old.
What else can LibGuides be used for? Ideas… 1) Information literacy. For example, adding descriptions of “primary sources” to the Temple history guide. 2) Co-opt faculty; invite them to get involved and become partners in creating the resources, tailored for their needs.
Question: What are privacy implications of using a service like WidgetBox or LibGuides?
Answer: LibGuides doesn’t save any data. No user accounts are created. It is hosted at LibGuides. WidgetBox… WidgetBox is similar, but not clear how much data is stored.
Question: How easy is it to integrate guides into local web site?
Answer: We don’t know yet. Redirected old URLs to new. But since LibGuides is hosted, it’s not on the same server.
Question: Are other sites embedding PSU’s widgets in their sites?
Answer: We don’t know — don’t have that level of detail as to where it gets embedded.
Update 4:30 PM 7 April Corrected name of first PSU speaker and corrected link. Update 11:20 PM 7 April Corrected Derik’s name. Not my day for getting names right.

CIL 2008: Mobile Search

Megan Fox and Gary Price
Slides and more will be available at web.simmons.edu/~fox/mobile/

Mobile Market

3.3 billion mobile phones. 46 million wireless subscribers used mobile search (mostly through text, not web browsers, on the phone).
iPhone users responsible for 50 times the traffic in mobile search. 85% of iPhone users accessed news and information on their phone (compared to 58% of other wireless users). Most searches are simple, single words (hard to enter text on a mobile device). Gary thinks that next year voice search will be the new thing — you say your query, you get results by text or email.
Some search tools are carrier-specific; some are phone-specific.
People who search from mobile devices are generally looking for “ready reference” information (facts, figures, stock prices, weather, etc.). Rarely in-depth research. Search engines have mobile search interfaces, aimed at handheld devices. They assume that the mobile user wants facts, information. And that users don’t want to type much. Searches are aggregated across silos otherwise provided to web users (so news, images, sites, etc., are listed on one page, not on several). This trend (“one search” for Yahoo, “universal search” for Google) is on the rise in web searching, too.
How to deliver high-bandwidth content to mobile devices with different capabilities, and with providers that allow different traffic, is a challenge.
Yahoo’s mobile search has ‘snippets’ — stripped down ‘widgets’ — that give you a preview of web content you frequently access.
Google indicates pages tailored for mobile devices with a tiny green icon. There are sites that transcode — convert for mobile use — regular web pages to mobile pages. They work differently, though; some handle different kinds of content better than others.
Live Search — Live Mobile. Makes assumptions about your future searches based on past use. Also uses personal search histories; things you’ve searched before are remembered and influence future searches.
4info — lets you search by text.
Alerts — services will watch news (sports, etc.) for certain thresholds, and will send you a text or email alert when something happens (a score is close in the 7th inning, etc.)
Medio — Working on a “predictionary” — predicts the words you are going to finish typing, based on words you’ve typed in the past. Does on the mobile device what your browser does in remembering past search queries.
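The idea is easy to sketch, though I have no idea how Medio actually implements it. The toy below remembers words from past queries and suggests completions for a prefix, most frequently typed first. The `Predictionary` class and its method names are my own invention for illustration.

```python
from collections import Counter


class Predictionary:
    """Suggest completions for a prefix, ranked by how often each word was typed before."""

    def __init__(self):
        self.counts = Counter()

    def record(self, text):
        # Remember every word of a past query, ignoring case.
        self.counts.update(text.lower().split())

    def suggest(self, prefix, limit=3):
        prefix = prefix.lower()
        matches = [(word, n) for word, n in self.counts.items() if word.startswith(prefix)]
        # Most frequent first; ties broken alphabetically so results are stable.
        matches.sort(key=lambda pair: (-pair[1], pair[0]))
        return [word for word, _ in matches[:limit]]


history = Predictionary()
history.record("weather boston")
history.record("weather chicago")
history.record("web search")
print(history.suggest("we"))  # ['weather', 'web']
```

On a phone keypad the payoff is obvious: two keystrokes instead of seven.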
Lots of mobile meta-search/federated search tools. MCN, obovo.com, upsnap are up-and-coming players in this market.
Use your phone’s camera to take a picture of something and send it to omoby, mobot, or snapnow; the service sends back a search response based on the photo. Also new: 2D barcodes. Take a picture, and it prompts your phone to pull down a URL, send a text message, etc. These are much more common in Europe/Asia.
chacha — call 1-800-2chacha or text “chacha”, say your question, get an answer by text. Humans do the answering. They provide an answer and a source URL. Not clear who is doing research (probably not librarians!)
Location based search — based on where your phone says it is, gives you localized search results.
Location based search: actually, more like a directory. You say where you are, it offers you categories that you can look through. The return of Gopher!
Clusty, a search clustering engine, works well in mobile environment. Brings back results by kind (a search for “apple” offers company, fruit, etc., categories as a filter).
Behavioral targeting on mobile devices is coming. Real estate is small, importance of what gets sent there is critical. Making sure that the right content gets to the mobile device is important.
Spinvox — Listens to your phone calls, sends you information on topics you discuss. Can also update your blog from dictation.
Searchme uses a presentation of search results like “cover-view” in iTunes or iPhone. Results pages are presented in thumbnail view that you can flip through one at a time.
A directory of hundreds of search tools is available for the next two weeks: go to mlvb.net and log in with rubble888 and cil2008.

CIL2008: Going Local in the Library

Presented by Charles Lyon (SUNY Buffalo)

What is local web

The web viewed through a lens of where you are. Not just spatial, but lots of other information you need. Which stores are open now? Which are in good neighborhoods? Which can handle my particular needs? Doing local information is hard; it’s very individualized.
Google does this better than anyone. Search results are customized to where you are. But don’t include the really useful information a true local could give you.
Google spends a lot of effort on this, so libraries should, too. Google is the bellwether.
So what is the local web? Some pieces:

  1. local search engine
  2. maps
  3. local media
  4. local photos/data/video/blogs
  5. local social networks
  6. local people: this is the most important part.

The local web is social. It’s user-generated, participatory, amateur, civic, grassroots, citizen’s journalism. It’s by and from the community it serves.
It’s localized: about neighborhoods, communities, blocks, streets, buildings. Not just geographical areas, but about “imagined communities”: people who see themselves as part of a small unit.
Local web is joining the real world and the virtual world. Interconnection between the two. It brings the placeless infosphere — the cloud — down to wherever you are. It reverses the “antisocialization” that was feared in the early days of the Internet.
Local web brings a sense of place to the Internet. It’s becoming big business — lots of companies competing in this space.

What do libraries bring to local web

Information, local information (events, community directories, guides to local events and communities).
What can libraries do that extends this?
Everyday life is still local. The internet is getting more local. Web 2.0 has many local applications. Libraries are community-focused institutions. Libraries have experience with local information… There is an opportunity for libraries to become even more local-focused in the web environment.
Strategies: become expert users of local resources. Raise awareness and assist the community in using online local resources. Broaden the scope of local data collection. Become active participants in community-focused resources. And create locally-focused content.

Examples of local 2.0

Local search: Enhance their own listings in local search engines; advertise (no cost!) in the local search engine. Create your own search engine — that only searches the sites you specify. Libraries can build a search tool that only includes the stuff that you feel is relevant to your clientele.
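The "only searches the sites you specify" idea boils down to an allowlist of sites. A rough sketch of that filter follows. The site names are placeholders, the function names are mine, and hosted custom search services do this for you without any code.

```python
from urllib.parse import urlparse

# Hypothetical list of sites a library has judged relevant to its community.
ALLOWED_SITES = {"example-library.org", "example-town.gov"}


def is_allowed(url, allowed=ALLOWED_SITES):
    """True if the URL's host is an allowed site or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == site or host.endswith("." + site) for site in allowed)


def filter_results(urls, allowed=ALLOWED_SITES):
    """Keep only the results that come from allowed sites."""
    return [url for url in urls if is_allowed(url, allowed)]


results = [
    "http://www.example-library.org/events",
    "http://example.com/unrelated",
    "http://example-town.gov/permits",
]
print(filter_results(results))
```

Matching on the full host name (or a dot-prefixed suffix) avoids accidentally admitting look-alike domains that merely end with the same letters.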
Local blogs: placeblogs, metroblogs, neighblogs. Create a local blog directory. And once you’ve found them, add them to your local search engine. And libraries can blog themselves — not about the library, per se, but about the community it serves. Whether broadly or narrowly focused, you can take advantage of library’s knowledge (or librarian’s knowledge).
Local News: News refocused on local geography — the news that happens close to you. They’re blog-like: people can comment on news articles, set up profiles, learn about neighbors.
Locally-focused online communities (Skokie Talk, MyHamilton.ca). Wikis focused on local area, open to contribution by community.
Local data: HelloMetro.com and EveryBlock.com (San Francisco, Chicago, NYC only). News for your neighborhood at block level. Building permits, restaurant inspections, graffiti, all sorts of things that are important to the neighborhood. Much of this is already available — but not aggregated by address. Everyblock is grant-funded and will open-source their code at the conclusion of the project.
Local Photos: Geotagging adds geographic metadata to online information. As simple as a zip code, as complex as latitude-longitude. Geotagging makes it easy to find things. Flickr is leading the drive for this in photos. Libraries can aggregate local photos.
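To make "geotagging makes it easy to find things" concrete, here is a small sketch that picks out photos taken within a given distance of a point, using the standard haversine great-circle formula. The photo records and coordinates are invented examples, and this is not how Flickr or any particular service does it.

```python
import math


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points (haversine formula)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def photos_near(photos, lat, lon, radius_km):
    """Return the photos whose geotag lies within radius_km of (lat, lon)."""
    return [p for p in photos if distance_km(p["lat"], p["lon"], lat, lon) <= radius_km]


# Made-up photo records with approximate city-center coordinates.
photos = [
    {"title": "Downtown Buffalo", "lat": 42.8864, "lon": -78.8784},
    {"title": "Downtown Rochester", "lat": 43.1566, "lon": -77.6088},
]
for photo in photos_near(photos, 42.8864, -78.8784, radius_km=10):
    print(photo["title"])
```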
Maps: It’s easy to create a custom map.

Why are libraries primed for local?

Local is cheap. Using free services. Guidespot, ineighbors. Local sites generally don’t generate revenue — they’re labors of love. Perfect for libraries. Also, it’s not too late — there’s no winner in the local web. There are lots of kinds of local data that aren’t web accessible yet. Much of local data is not easily automated; still requires people to determine relevance to the locale. Helps build good will.
This can be applicable to academic libraries, too — local as the campus, not just the community.

CIL2008: Keynote on “Libraries Solve Problems”

I’m attending Computers in Libraries 2008 and will be blogging many of the sessions I attend… I’ll post my (mostly unedited) notes. If you’re at CIL, look me up!
Presented by Lee Rainie, Director of Pew Internet & American Life Project.
Blogging is about information and communication. This is what makes the Internet so wonderful. That’s what the era of user-generated content is all about.
Information was scarce, expensive, and institutionally oriented. Now, it’s abundant, cheap, and personally oriented.
In 2000: 46% of adults used the internet; 73% of teenagers. 5% had broadband at home. 50% owned a cell phone. Nobody connected wirelessly. Phone lines ruled.
2008: 75% of adults, 93% of teens use internet. 54% have broadband at home. 78% own cell phone. 62% connect wirelessly (42% by wireless, 59% use cell phones over data networks — overlap is 62%). Cell phone users tend to be minorities, less well educated — reverses digital divide fears. Wireless connectivity is determinant of Internet behavior. Results in resurgence of email — on a cell phone, email matters a lot. News becomes more important, too – broadly defined. Fast and mobile connections rule.
The home media ecology is immensely complex. Data moves from this to that (TiVo to computer, cell phone to cable box, etc.). Internet becomes “cloud” — it’s where important stuff is stored. The Internet is the computer and storage device. This has huge, not yet understood, implications.
Content creation — 62% young adult users have uploaded photos to the internet. 34% of all users have done this. It’s an obligation of sorts to photo-document their lives. Pictures are currency of community building and communication.
58% have created a profile on social networks (33% of adults) on MySpace, Facebook, etc. 39% of online teens (13% of online adults) share and create content online.
A quarter of online teens help others get their stuff online.
33% of online college students keep blogs. 54% of online college students read blogs. 12% of online adults have blogs; 35% read them. This gets hard to measure because blogging is baked into all sorts of tools. Reading blogs even more so; what’s a blog? What do people recognize as a blog?
19% of online young adults have created an avatar that interacts with others. 6% of online adults do this.
New research on libraries in the information ecosystem. Original question was from GPO — how do people want government documents (online, print, mail, etc.)? Survey grew to be much broader: How do people get information to help them solve problems that could have a government connection or be aided by government resources?
Asked about 10 broad areas: health, schooling, taxes, jobs, Medicare, Social Security, voter registration, local government, legal actions, immigration. About 80% of respondents had been through at least one of these problem classes and needed information. This makes about 169 million adults. Survey asked where they found information. Libraries were included in possible responses. 53% of adults had been to a local library in the past year. By age group:

  • Gen Y (18-30): 62%
  • Gen X (31-42): 59%
  • Trailing Boomers (43-52): 57%
  • Leading Boomers (53-61): 46%
  • Matures (62-71): 42%
  • After Work (72+): 32%

Youngest cohort had the highest use of libraries. Teen use of libraries: 60% of online teens use the internet at libraries, up from 36% in 2000. Youth use libraries, contrary to expectations.
Those who use libraries are more likely to come from higher-income households. More likely to be Internet users. More likely to have broadband at home. Parents with minor children at home more likely. Libraries matter more in the Internet age, not less (as previous expectations were). Internet users are more active in information gathering and usage than non-users. No real difference in patronage based on race or ethnicity.
How do people solve problems? What sources did you use when you confronted the most recent problem you faced? 58% used Internet overall. 53% turned to professionals, then other sources. Library use for problem solving varied by group: young adults (18-29) 21%; Blacks 26%; Latinos 22%. Younger people relied on libraries, as did minorities and lower income users.
Most popular problem-solving searches at libraries: schooling/education, finding ways to pay. Then jobs, serious illness, taxes, Medicare/Medicaid.
Once people are at the library… 69% got help from staff. 68% used computers (38% got technical assistance). 58% sought reference materials. People and resources matter. Libraries are a social learning experience.
Future intentions: Would you go back to the library for a future problem? Overall, about 29% were somewhat likely or more. But — less well off (40%); Gen Y (41%), less educated (41%), Latinos (42%), Blacks (48%).
Why are youth so library-centric? Lee’s hypothesis: they have the most recent experience with libraries (through school assignments). Based on recent experience, they are more aware of how libraries have changed, more than other age groups. They know libraries can help.
Takeaways and Implications
Public education efforts about what libraries do and how we have changed are likely to pay off. Focus on success stories and competence. The people who know us best are the ones who keep coming back.
Patrons are happy and zealous advocates. Encourage your patrons to evangelize on your behalf. Give them Web 2.0 tools and, if needed, training to use them. They are eager to give you feedback.
Your “un-patrons” are primed to think of libraries. Need to let them know what you offer: tools available, training, mentoring skills, comfortable environment.
This is the era of social networks. People rely more now on social networks than ever before. They are for learning, news/navigation, support and problem solving. This last point is very important. Libraries can have a huge role in this. How can the library be a node in a social network?
Virtual communities are becoming more person-centric. Not created by a “publisher”, but ad hoc built around your friends and people you trust.

Tagging and Taggers

A recent research paper, “Can Social Bookmarking Improve Web Search?” by Paul Heymann, Georgia Koutrika, and Hector Garcia-Molina, draws numerous interesting conclusions about the effect of taggers and tagging on findability. The authors used del.icio.us as the source for tags.
Several of the results they found:

  • “Tags are present in the page text of 50% of the pages they annotate and in the titles of 16% of the pages they annotate” (p. 8). It seems that taggers are not particularly original in their tagging.
  • “Pages posted to del.icio.us are often recently modified” (p. 4) and “approximately 25% of URLs posted by users [of del.icio.us] are new, unindexed pages” (p. 5). By monitoring tags of interest to you, you can find out what’s new more effectively than you can by setting up standard search queries.
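For anyone who wants to try a similar measurement on their own bookmarks, here is a rough sketch. It uses a simple case-insensitive substring check per tag, which is my simplification and not the paper's methodology, and the page records are invented.

```python
def tag_overlap(pages):
    """Return (share of tags found in page text, share of tags found in page title)."""
    total = in_text = in_title = 0
    for page in pages:
        text = page["text"].lower()
        title = page["title"].lower()
        for tag in page["tags"]:
            tag = tag.lower()
            total += 1
            in_text += tag in text    # True counts as 1
            in_title += tag in title
    if total == 0:
        return 0.0, 0.0
    return in_text / total, in_title / total


# Made-up bookmarked pages with the tags applied to them.
pages = [
    {"title": "Python tutorial", "text": "Learn Python programming step by step",
     "tags": ["python", "howto"]},
    {"title": "Home", "text": "Recipes for weeknight cooking",
     "tags": ["cooking", "food"]},
]
print(tag_overlap(pages))  # (0.5, 0.25)
```

Tags that fail both checks ("howto", "food") are the interesting ones: they add words the page itself does not contain.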

Their closing section, in which they discuss how tagging could be improved in the long run, bears quoting at length:

In terms of tags, we believe that user interface features could have a large impact on improving the quality of tags for search. For instance, interfaces that recommended tags not in the page, or not common for the given domain, might help alleviate those two problems. Another approach might be to have domain-specific sites (e.g., photography) which might have higher quality tags due to the shared context of the users.

Code4Lib Journal’s Second Issue Now Available

The second issue of the Code4Lib Journal is now available. Here is the table of contents:

Take a look — this issue is filled with excellent articles. (Disclaimer: I’m on the editorial committee for the journal.)

RSS… On Your TiVo

According to a recent press release,

“TiVo users can subscribe to and watch a broad range of video content available through Real Simple Syndication (“RSS”) feeds, including everything from network nightly newscasts and The Sesame Street Podcast to Daily Headlines from MTV News and College Humor from CHTV.”

The catch (there’s always a catch) is that you need TiVo Desktop Plus on your Windows PC (sorry, Mac users) to get the material from your PC to your TiVo. So I won’t be testing it any time soon. This marks another move forward in getting Internet podcast content onto the family room TV, though it’s not as easy as it could be.

EUFeeds — 300 European Newspapers by RSS

EUFeeds is a special-purpose RSS aggregator for European newspapers that provides access to more than 300 papers from the European Union. Provided by the European Journalism Centre in the Netherlands, this site lets you quickly browse the print media from each EU member nation.
The site defaults to UK newspapers; there is no apparent way to set a different country as your default entry page. It also does not provide an RSS feed for the aggregated content — so you cannot subscribe to the aggregated Czech Republic news, only visit it on a web page.

New Tagging Tool at University of Michigan Library

I’d like to talk about a tagging project we just launched at my workplace. MTagger is a social bookmarking tool that we’ve integrated into several University of Michigan library resources. A tag cloud now appears:

Like del.icio.us and many other social bookmarking tools available on the Internet, MTagger allows users to bookmark and tag web pages using language that makes sense to them. Anyone can see tag clouds on pages and search MTagger; only users with valid U-M network logins can apply tags. (Individuals can, of course, opt out of sharing their tags with others if they choose.)
Unlike these other tools, MTagger offers the concept of “Collections” — letting users restrict their searches for similarly tagged items to a specific collection (library catalog records, images, web pages, etc.). While tags themselves would allow people to serendipitously find items in other collections, the “Collections” metaphor will, we expect, help drive home that the library offers more than books, electronic journals, and databases.
More important than the tagging functionality itself is what MTagger will allow our faculty, staff, and students to do. MTagger brings a social component to research that we have not previously had. It will allow users to share knowledge about library resources with each other, to enable quick-and-dirty subject guides to be produced, and — we hope — to bring researchers together via their individual tag clouds. As research moves online, chance meetings in the stacks of researchers with overlapping interests become even more rare. Through tagging, we hope to be able to recreate some of those synergistic interactions as one researcher finds a tag of interest, and through that, the other researcher.
Oh, and just to keep this in the realm of libraries and RSS, anything that can be searched within MTagger can be accessed via an RSS feed.