Academic Year/course: 2020/21

8025 - Master in Intelligent Interactive Systems

32489 - Computational Semantics


Teaching Guide Information

Academic Course: 2020/21
Academic Center: 803 - Masters Centre of the Department of Translation and Language Sciences
Study: 8025 - Master in Intelligent Interactive Systems
Subject: 32489 - Computational Semantics
Credits: 5.0
Course: 1
Teaching languages: Theory: Group 1: Pending
Teachers: Gemma Boleda Torrent, Matthijs Westera
Teaching Period: First Quarter
Schedule:

Presentation

This course provides the basics of how natural language meaning is modeled in Computational Linguistics / Natural Language Processing. We will analyse the relevant semantic phenomena and the approaches used to tackle them, with special emphasis on data-driven methods based on distributional semantics and Machine Learning. Along the way, we will cover the basic methodology of Machine Learning and strengthen students' skills in using computational and quantitative tools (in class, we will use Python and Python-based toolkits such as NLTK).
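
To give a concrete idea of the kind of Python/NLTK work done in class, here is a minimal, purely illustrative sketch (not course material) that queries WordNet for the senses of an ambiguous word and computes a simple similarity score; it assumes NLTK is installed and downloads the WordNet data on first use.

    # Illustrative sketch only: querying WordNet senses and a path-based
    # similarity score with NLTK.
    import nltk
    nltk.download('wordnet', quiet=True)  # fetch the WordNet data if missing
    from nltk.corpus import wordnet as wn

    # List the senses (synsets) of an ambiguous word with their glosses.
    for synset in wn.synsets('bank'):
        print(synset.name(), '-', synset.definition())

    # Path-based similarity between two noun senses in the WordNet hierarchy.
    dog = wn.synset('dog.n.01')
    cat = wn.synset('cat.n.01')
    print('path_similarity(dog, cat) =', dog.path_similarity(cat))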

Associated skills

  • Analytical skills (problem solving, data analysis, reasoning about semantic data).
  • Machine Learning methodology.
  • Basic programming (Python, NLTK).
  • Quantitative thinking in the domain of language.

Learning outcomes

The student will acquire:

  • a deeper understanding of semantics and how Computational Linguistics can contribute to its study;
  • knowledge of the basic methodology of Machine Learning, and associated basic skills to carry out Machine Learning experiments;
  • basic knowledge of and skills in distributional approaches to meaning;
  • familiarity with quantitative and computational methods for semantic phenomena.

Prerequisites

Contents

  1. "You shall know a word by the company it keeps" (Firth, 1957): Distributional semantics.
    • Sparse and dense encodings.
    • Word similarity and relatedness (illustrated in the brief sketch after this list).
    • The neural network version: Word2vec.
    • Beyond words: phrase meaning.
  2. Word senses and word similarity: Thesaurus-based approaches.
    • WordNet.
  3. Machine Learning for computational semantics.
    • Basic methodology.
    • ML Methods: Naive Bayes, (simple introduction to) deep learning.
    • Evaluation and error analysis.
  4. Beyond word meaning: Coreference resolution.
    • Reference and coreference.
    • Basic methods, tasks, and datasets.

Note: The contents of the course may vary depending on the students' interests. 
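
As a brief illustration of the distributional approach in topic 1, the following sketch (invented toy counts, illustrative only) builds sparse co-occurrence vectors and compares words with cosine similarity; dense encodings such as word2vec replace the raw counts with learned low-dimensional vectors but are compared with the same measure.

    # Illustrative sketch only: toy co-occurrence counts (invented numbers) and
    # cosine similarity, the standard similarity measure in vector space models.
    import numpy as np

    # Rows: target words; columns: context words (a sparse count encoding).
    contexts = ['bark', 'purr', 'pet', 'engine']
    vectors = {
        'dog': np.array([12.0, 0.0, 9.0, 0.0]),
        'cat': np.array([0.0, 10.0, 8.0, 0.0]),
        'car': np.array([0.0, 0.0, 1.0, 15.0]),
    }

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    print('sim(dog, cat) =', round(cosine(vectors['dog'], vectors['cat']), 3))  # high
    print('sim(dog, car) =', round(cosine(vectors['dog'], vectors['car']), 3))  # low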

Teaching Methods

The class will be based on lectures, readings, practical exercises, and a project to be presented in class at the end of the course. Lectures will be primarily Q&A sessions about weekly readings; students are expected to submit 3 questions about each reading. Readings will be mostly material from the textbook, but based on student interest we can include research articles, too. Most practical exercises will be directed towards the final project that students will need to present. The project will be on a computational semantic task.
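
For orientation only, one example of a computational semantic task is word sense classification; the sketch below (invented toy data, not an actual assignment) trains a Naive Bayes classifier, one of the ML methods listed under topic 3, on bag-of-context-words features using NLTK.

    # Illustrative sketch only: Naive Bayes word sense classification with NLTK,
    # using invented toy data and bag-of-context-words features.
    from nltk.classify import NaiveBayesClassifier

    def features(context_words):
        # Which words occur around the target occurrence of "bank".
        return {word: True for word in context_words}

    train = [
        (features(['river', 'water', 'fishing']), 'bank/GROUND'),
        (features(['money', 'deposit', 'loan']),  'bank/FINANCE'),
        (features(['shore', 'river', 'mud']),     'bank/GROUND'),
        (features(['account', 'loan', 'cash']),   'bank/FINANCE'),
    ]

    classifier = NaiveBayesClassifier.train(train)
    print(classifier.classify(features(['water', 'shore'])))   # expected: bank/GROUND
    print(classifier.classify(features(['deposit', 'cash'])))  # expected: bank/FINANCE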

Evaluation

  • Practical exercises, project, presentation, essay: 90%.
  • Questions submitted and participation in class discussions (in class or online): 10%.

Bibliography and information resources

Bibliography:

  • Jurafsky, Daniel & Martin, James H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 3rd edition (online draft): https://web.stanford.edu/~jurafsky/slp3 (2nd edition: Prentice Hall, 2009).

Recommended readings:

  • Word meaning:

Kilgarriff, Adam. I don't believe in word senses. Computers and the Humanities 31.2 (1997): 91-113.

Murphy, Gregory L. (2002). The big book of concepts. Cambridge, MA: MIT Press. Note: Really great book I recommend to everybody. See especially Chapter 11.

  • Symbolic (formal semantics, DRT-based) system for the processing of free English text (not covered in the course):

Bos, Johan (2008). Wide-coverage semantic analysis with Boxer. Proceedings of the 2008 Conference on Semantics in Text Processing. Association for Computational Linguistics.

Bos, J., & Markert, K. (2005). Recognising textual entailment with logical inference. Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing (pp. 628-635). Association for Computational Linguistics.

  • Distributional semantics, general:

M. Baroni and A. Lenci. 2010. Distributional Memory: A general framework for corpus-based semantics. Computational Linguistics 36(4): 673-721.

Boleda, G. Distributional Semantics and Linguistic Theory. Annual Review of Linguistics. Accepted for publication. Note: survey article.

Katrin Erk. Vector space models of word meaning and phrase meaning: a survey. Language and Linguistics Compass 6(10), 635-653, October 2012. Note: survey article.

Stephen Clark. 2015. Vector Space Models of Lexical Meaning. Handbook of Contemporary Semantic Theory, second edition, edited by Shalom Lappin and Chris Fox. Chapter 16, pp. 493-522. Wiley-Blackwell (pre-copyediting PDF). Note: survey article.

Alessandro Lenci. 2008. Distributional semantics in linguistic and cognitive research. Italian journal of linguistics, 20 (1), pp. 1-31.

  • Multimodal distributional semantics:

Bruni, E., G. Boleda, M. Baroni, N. K. Tran. 2012. Distributional semantics in technicolor. Proceedings of ACL 2012, pp. 136-145, Jeju Island, Korea.

Silberer, C and Lapata, M. 2013. Learning Grounded Meaning Representations with Autoencoders. Proceedings of ACL 2013.

M. Baroni. 2016. Grounding distributional semantics in the visual world. Language and Linguistics Compass 10(1): 3-13. Note: survey article.

  • Composition in distributional semantics:

M. Baroni. 2013. Composition in distributional semantics. Language and Linguistics Compass 7(10): 511-522. Note: survey article.

Jeff Mitchell and Mirella Lapata. 2008. Vector-based Models of Semantic Composition. In: ACL. 2008, pp. 236–244.

E. Vecchi, M. Marelli, R. Zamparelli and M. Baroni. 2017. Spicy adjectives and nominal donkeys: Capturing semantic deviance using compositionality in distributional spaces. Cognitive Science 41(1): 102-136.

Marelli, M., & Baroni, M. (2015). Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics. Psychological Review, 122(3), 485–515. http://doi.org/10.1037/a0039267

Socher, R., Pennington, J., Huang, E.H., Ng, A.Y. and Manning, C.D. 2011. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the conference on empirical methods in natural language processing (pp. 151-161).

  • Building word vectors with neural networks:

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer. Deep contextualized word representations. Proceedings of NAACL 2018.

Jeffrey Pennington, Richard Socher, Christopher Manning. 2014. Glove: Global vectors for word representation. Proceedings of EMNLP.

Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781v3.

M. Baroni, G. Dinu and G. Kruszewski. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. Proceedings of ACL 2014 (52nd Annual Meeting of the Association for Computational Linguistics), East Stroudsburg PA: ACL, 238-247.

Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 746–751). Atlanta, Georgia: Association for Computational Linguistics.

  • Current limits of distributional semantics / neural networks:

Bernardi, R., G. Boleda, R. Fernandez, D. Paperno. 2015. Distributional semantics in use. Proceedings of EMNLP 2015 Workshop LSDSem 2015: Linking Models of Lexical, Sentential and Discourse-level Semantics, 95-101. Lisbon, Portugal, September. Association for Computational Linguistics.

Paperno, D., G. Kruszewski, A. Lazaridou, Q. Ngoc, R. Bernardi, S. Pezzelle, M. Baroni, G. Boleda, R. Fernandez. 2016. The LAMBADA dataset: Word prediction requiring a broad discourse context. Proceedings of ACL 2016 (54th Annual Meeting of the Association for Computational Linguistics), 1525-1534, Berlin, Germany, August. Association for Computational Linguistics.

Boleda, G. and A. Herbelot. 2016. Formal Distributional Semantics: Introduction to the Special Issue. Computational Linguistics 42:4, 619-635.

  • General Machine Learning:

Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433-460. http://m.mind.oxfordjournals.org/content/LIX/236/433.full.pdf, (if that fails: http://phil415.pbworks.com/f/TuringComputing.pdf)

Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78. http://doi.org/10.1145/2347736.2347755

Parloff, R. (2016). The AI Revolution: Why Deep Learning Is Suddenly Changing Your Life. Fortune Magazine. Note: well written, thorough popular science article about deep learning.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. http://doi.org/10.1038/nature14539.