Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell
(Submitted on 17 Nov 2014 (v1), last revised 31 May 2016 (this version, v4))
Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:1411.4389 [cs.CV]
(or arXiv:1411.4389v4 [cs.CV] for this version)
Submission history
From: Jeff Donahue
[v1] Mon, 17 Nov 2014 08:25:17 GMT (577kb,D)
[v2] Tue, 18 Nov 2014 07:37:44 GMT (2700kb,D)
[v3] Tue, 17 Feb 2015 23:59:08 GMT (4430kb,D)
[v4] Tue, 31 May 2016 22:57:33 GMT (2346kb,D)