Crowdsourcing | THATCamp College Art Association (CAA) 2014

Slides and links from the February 11th, 2014 THATCamp CAA session on applying computer vision techniques to art history research, facilitated by John Resig.

Learn More About Computer Vision

Session Slides: Applying Computer Vision to Art History
www.slideshare.net/jeresig/thatcamp-cv
Paper: Using Computer Vision to Increase the Research Potential of Photo Archives
ejohn.org/research/computer-vision-photo-archives/
Syllabus: Brown Computer Vision Course
cs.brown.edu/courses/csci1430/

Computer Vision Software and Libraries

Optical Character Recognition: Tesseract (Open Source, Unsupervised)
code.google.com/p/tesseract-ocr/
Face Matching: OpenBiometrics (Open Source, Supervised)
openbiometrics.org/
Image Similarity: imgSeek (Open Source, Unsupervised)
www.imgseek.net/
Image Similarity: TinEye’s MatchEngine (Commercial, Unsupervised)
services.tineye.com/MatchEngine
Image Categorization: Ersatz (Commercial, Minimally Supervised)
ersatz1.com/
General Computer Vision: libCCV (Open Source, Supervised, Requires Coding)
libccv.org/
General Computer Vision: OpenCV (Open Source, Supervised, Requires Coding)
opencv.org/

Ian McDermott’s original proposal:

I’d like to propose a General Discussion/Working Session hybrid about the D. James Dee Photo Archive: approximately 250,000 transparencies, slides, and negatives documenting contemporary art in NYC (particularly Soho galleries) from the late 1970s to the present. Artstor acquired the archive this summer and is in the process of figuring out how to digitize it and, more importantly, catalog it. The collection isn’t cataloged and the slides aren’t labeled, so describing it effectively will have to be a collective effort. I’m curious to hear what people think about crowdsourcing, tagging, and any other ideas. The BBC’s Your Paintings project is one example of a successful tagging project, but what about extensive crowdsourced cataloging? How much metadata is needed before images are released? Is it best to open the cataloging to everyone or to a select group?

Existing projects and resources:

Tagasauris
Metadata Games – NEH-funded project up at Dartmouth
NEH details and white paper for Metadata Games
National Archives Citizen Archivist tagging
Zooniverse – science crowdsourcing but also starting to do some humanities
New York Public Library Labs does a lot of crowdsourcing
DuoLingo – language teaching tool that is also crowdsourcing translation
Article on “rules” of crowdsourcing: t.co/FOxmZLssvC
Flickr Commons
Steve Museum – www.steve.museum/
A write-up of 36 crowd-sourcing projects: t.co/6W0ashpOe9
Dutch company, Picturae — best known for mass digitization, but also software development

A common theme as people introduce themselves is wanting *good* tags rather than just any tags, possibly by using controlled vocabularies.

Ian asks whether people know of available tools. There are problems with using vendors, and other problems with “rolling your own” platform. One participant describes Artsy’s experience using Mechanical Turk: it took a developer a couple of hours to sync the database with Amazon’s, and thereafter it cost about 1 cent per image, even with about 5 people tagging each one. There are concerns, though, about labor ethics and image rights.
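As a rough back-of-the-envelope illustration of the Mechanical Turk figure above, the sketch below applies the reported rate of about 1 cent per image to the approximate size of the Dee archive given in the proposal. Applying Artsy's rate to this archive is purely an assumption, as is reading the 1 cent as already covering all taggers for an image. The function name is hypothetical, and the estimate leaves out developer time and any platform fees.

```python
from decimal import Decimal


def estimate_tagging_cost(num_images: int, cost_per_image: Decimal) -> Decimal:
    """Total cost, assuming cost_per_image already covers every tagger for one image."""
    return num_images * cost_per_image


# Approximate archive size from the proposal (about 250,000 items).
ARCHIVE_SIZE = 250_000

# About 1 cent per image, as reported for Artsy; assumed here to carry over.
COST_PER_IMAGE = Decimal("0.01")

total = estimate_tagging_cost(ARCHIVE_SIZE, COST_PER_IMAGE)
print(f"Estimated tagging cost: ${total:,.2f}")
```

Under these assumptions the whole archive would cost on the order of a few thousand dollars to tag, which is why the ethical and rights questions, rather than price, dominated the discussion.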

Carnegie Mellon had a Teenie Harris program to get people to identify who is in the photos.

John Resig brings up a case where a lot of crowdsourced work that had happened over the course of years was replaced in an afternoon by an advanced computer vision technique that helped identify things in photos. General point: before you turn to crowdsourcing, talk to computer scientists to make sure there isn’t a computational technique that would do the job.

Participant wonders what information would be most needed: gallery, creator, year, people, and so on.

Amanda brings up LibraryThing’s Legacy Libraries and suggests having a “barn-raising”: an event to engage the community as well as to get some items tagged or cataloged. Ian agrees it can be terrific, particularly as a jumpstart. Participant raises the issue of how you reach people who “aren’t on the Internet all the time.”

John Resig also raises a concern about just expecting people to do all the work: it is important to “chunk” the work so that it’s doable. At the same time, there are many people who do care passionately about particular items or topics.

Participant raises the topic of errors in crowdsourcing. Ian mentions that many projects will only accept data once it has been verified by multiple people. Participant brings up the example of the Steve.Museum, where the curation had to happen after all the tagging.

John Resig notes that it often takes thousands of examples to train computer software, so unless your set has thousands and thousands of items, in some cases you might as well just do the work manually yourself, or crowdsource it.
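The verification rule Ian mentions, accepting data only once multiple people have supplied it, can be sketched in a few lines. This is a minimal illustration, not how any of the projects listed here actually work: the function name, the agreement threshold of 2, and the sample tags are all hypothetical.

```python
from collections import Counter


def accepted_tags(submissions, min_agreement=2):
    """Return the tags supplied by at least min_agreement different contributors."""
    counts = Counter()
    for tags in submissions:
        # Normalize case and whitespace, and count each tag once per contributor.
        normalized = {tag.strip().lower() for tag in tags if tag.strip()}
        counts.update(normalized)
    return sorted(tag for tag, n in counts.items() if n >= min_agreement)


# Hypothetical tags from three contributors for a single slide.
submissions = [
    ["sculpture", "Soho", "gallery"],
    ["Sculpture", "installation"],
    ["gallery", "sculpture ", "opening"],
]

print(accepted_tags(submissions))
```

Simple normalization like this only catches differences in case and spacing. Matching tags against a controlled vocabulary, as raised earlier in the session, would be a separate step.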

Participant brings up search by image — how does it work? John is going to talk about some of that in the next session.


See you in Chicago in spring 2014!
