When we model the English language statistically and then use our learned model to make new sentences, we get meaningless garbage. It's a little disappointing; we'd like our computers to say intelligent things to us instead.
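To see the garbage for yourself, here's a minimal sketch of the kind of statistical model I mean - a bigram model over a tiny invented corpus, nothing like a serious language model - whose samples chain together locally plausible word pairs with no global meaning:

```python
import random
from collections import defaultdict

# A tiny invented corpus standing in for real usage data.
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog . the dog chased the cat ."
).split()

# Estimate P(next word | current word) by counting bigram successors.
successors = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    successors[current].append(nxt)

# "Generate" a sentence by walking the bigram chain from "the".
random.seed(0)
word, output = "the", ["the"]
while word != "." and len(output) < 12:
    word = random.choice(successors[word])
    output.append(word)
print(" ".join(output))
# Every adjacent pair occurred in the corpus, but the whole rarely means anything.
```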
Is there a test that we can run on sentences to see if they're meaningful, so that we can save ourselves some disappointment and only look at model outputs which pass that test?
One answer is "No, we can't". There are lots of languages that could be formed from English words, and a given sentence could well have a different meaning or no meaning in each of these languages. But of course we don't care about possible languages that associate words and meanings: we care about the specific language that we already have and share, and its associations between words and components of the world.
Perhaps we feel that we should be able to algorithmically determine what a sentence is talking about just from language usage data. Having no vision and no instilled linguistic faculties, a competent newly born AI should still be able to reason about atmospheric optics from reading the Wikipedia article on Rayleigh scattering, should it not? With enough deep thinking? AIXI could do it, presumably! Why not my shitty Python program?
And indeed, amazingly, we *can* translate between languages using just language usage data - even using just monolingual, non-parallel corpora! - without modelling the associations that languages make between words and the world as it presents itself to our sensors.
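The heart of one such method, as I understand it, is embarrassingly simple: learn word embeddings separately for each language, then find a rotation that lays one embedding space on top of the other. Here's a sketch of just that alignment step, the orthogonal Procrustes solution, on invented toy data (in the fully unsupervised setting the initial matched pairs have to come from elsewhere, e.g. adversarial training, which I elide):

```python
import numpy as np

def procrustes_map(X, Y):
    """Solve min_W ||X W^T - Y||_F over orthogonal W (orthogonal Procrustes).

    X, Y: (n_pairs, dim) arrays of embeddings for word pairs currently
    believed to translate each other.
    """
    # The SVD of the cross-covariance gives the best orthogonal map.
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

# Toy demo: the "French" space is a rotation of the "English" space plus noise.
rng = np.random.default_rng(0)
en = rng.normal(size=(100, 10))
true_rotation = np.linalg.qr(rng.normal(size=(10, 10)))[0]
fr = en @ true_rotation.T + 0.01 * rng.normal(size=(100, 10))

W = procrustes_map(en, fr)
# After mapping, each English vector should land near its French counterpart.
print(np.allclose(en @ W.T, fr, atol=0.1))  # should print True: the spaces align
```

The surprising empirical fact isn't the linear algebra; it's that two independently trained monolingual embedding spaces are similar enough in shape for a single rotation to align them.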
But if our conventional statistical methods paired with merely terrestrial amounts of data and computing resources can't reliably make sentences from scratch that meaningfully refer to the world, well, that's not so unreasonable. It's a hard task. And it's not so tragic, either: all that we need in order to make language generation meaningful is to jointly model the world and language. We just have to get our programs to look at things while people talk about them, and then our programs can be "grounded", and they too can talk about things while looking at them. We know it can be done in some circumstances: AI programs right now can label images with grammatical sentences and paint new images from sentence prompts. We have not yet foreseen the date when AIs will be able to talk about more abstract concepts like justice as fluently as they can already talk about the colors of birds' wings, but we will get there or be killed trying.
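The simplest version of "looking at things while people talk about them" that I know of is cross-situational learning: count which objects tend to be in view when a word is uttered. Here's a toy sketch with invented episodes (a real system would have to get its object detections from vision, of course):

```python
from collections import defaultdict

# Each "episode" pairs an utterance with the set of objects in view.
episodes = [
    ("the red ball rolls", {"ball", "rug"}),
    ("the girl throws the ball", {"ball", "girl"}),
    ("the girl smiles", {"girl", "rug"}),
]

# Count word-object co-occurrences across all episodes.
cooccur = defaultdict(lambda: defaultdict(int))
for utterance, scene in episodes:
    for word in utterance.split():
        for obj in scene:
            cooccur[word][obj] += 1

def best_referent(word):
    """Guess a word's referent: the object it co-occurred with most."""
    return max(cooccur[word], key=cooccur[word].get)

print(best_referent("ball"))  # "ball": in view during both ball utterances
print(best_referent("girl"))  # "girl"
# Classic failure mode: "the" co-occurs with everything, so function words
# need to be handled by something smarter than raw counts.
```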
I suspect that the language grounding literature to date is all about grounding language in sensory categories, which are different from concepts. There are some fairly simple sensory categories for "mother" such as "a woman who is seen caring for children that look like her and her mate", while the concept of "mother", which is more complicated and built on top of the sensory category, can involve inferences to unseen events in the woman's history like "childbirth" or "adoption". Concepts are just one step on the path to intelligent language generation, but my hope is that the first step alone (sensory grounding of language terms) is enough to generate lots of meaningful speech, if not intelligent speech.
While present research in language acquisition almost uniformly leverages parallelism between a linguistic source and a sensory channel, maybe the task can one day be done with merely non-parallel visual and linguistic data streams - just as we are now learning that translation can be done without corpora that are parallel in meaning at the sentence level.
Below are some titles of academic papers that I'm looking forward to reading soon which I think are relevant to the project of meaningful language generation. Not all of them are about grounding language acquisition in perception: I still have some hope that language can be grounded in itself, so to speak - that usage data has a good amount of as-yet-uncaptured structure which can be used to constrain language models so that they only output sentences that are sensical (if not sensorially referential). In particular, selectional restrictions and subcategorization frames are really cool to me and I want to play with them and see if they can help me to make broad sensical grammars (there's a toy sketch of these after the list below).
* A Cognitive Constructivist Approach to Early Syntax Acquisition
* A Comparative Evaluation of Deep and Shallow Approaches to the Automatic Detection of Common Grammatical Errors
* A General-Purpose Sentence-Level Nonsense Detector
* A Neural Network Model for Low-Resource Universal Dependency Parsing
* A System for Large-Scale Acquisition of Verbal, Nominal and Adjectival Subcategorization Frames from Corpora
* Combining Language and Vision with a Multimodal Skip-Gram Model
* Detecting Grammatical Errors with Treebank-Induced, Probabilistic Parsers
* EBLA: A Perceptually Grounded Model of Language Acquisition
* Experience-Based Language Acquisition: A Computational Model of Human Language Acquisition
* Exploiting Social Information in Grounded Language Learning via Grammatical Reductions
* Grounded Language Acquisition: A Minimal Commitment Approach
* Grounded Language Learning from Video Described with Sentences
* Grounded Language Learning in a Simulated 3D World
* Integrating Type Theory and Distributional Semantics: A Case Study on Adjective–Noun Compositions
* Interactive Grounded Language Acquisition and Generalization in a 2D World
* Learning Perceptually Grounded Word Meanings from Unaligned Parallel Data
* Learning to Connect Language and Perception
* Parser Features for Sentence Grammaticality Classification
* Reassessing the Goals of Grammatical Error Correction: Fluency Instead of Grammaticality
* Selection and Information: A Class-Based Approach to Lexical Relationships
* Solving Text Imputation Using Recurrent Neural Networks
* The Generality Constraint and Categorical Restrictions
* Unsupervised Alignment of Natural Language Instructions with Video Segments
Why don't I have a bunch of references to Deep Learning papers like "Generative Adversarial Text To Image Synthesis"? Because I read what I want. Or I don't read what I want, but I make a blog post about what I kind of want to read so that I can close some of these browser tabs.
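To make the selectional-restriction idea concrete, here is the toy sketch promised above. None of this comes from the listed papers; the frames and semantic classes are hand-invented. A verb's subcategorization frame says which arguments it takes, and its selectional restrictions say which semantic classes may fill those slots:

```python
# Toy lexicon of semantic classes (hand-invented for illustration).
SEMANTIC_CLASS = {
    "girl": "animate", "dog": "animate", "scientist": "animate",
    "rock": "physical", "idea": "abstract", "sandwich": "edible",
}

# Toy subcategorization frames with selectional restrictions:
# each verb lists the classes it accepts for its subject and object slots.
FRAMES = {
    "eat":    {"subject": {"animate"}, "object": {"edible"}},
    "throw":  {"subject": {"animate"}, "object": {"physical", "edible"}},
    "ponder": {"subject": {"animate"}, "object": {"abstract"}},
}

def is_sensical(subject, verb, obj):
    """Accept a (subject, verb, object) triple only if the arguments
    satisfy the verb's selectional restrictions."""
    frame = FRAMES.get(verb)
    if frame is None:
        return False
    return (SEMANTIC_CLASS.get(subject) in frame["subject"]
            and SEMANTIC_CLASS.get(obj) in frame["object"])

print(is_sensical("girl", "eat", "sandwich"))   # True
print(is_sensical("sandwich", "ponder", "girl")) # False: both slots violated
print(is_sensical("girl", "eat", "idea"))        # False: grammatical but nonsensical
```

The hope is that restrictions like these, learned from corpora rather than written by hand, could prune a grammar down to the sensical sentences.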
But let's get back to the main question: is there a test that we can run on sentences to see if they're meaningful? Well, if you ground an agent's language faculty, then it will understand some sentences and not others, just as I understand lots of English sentences but my eyes lose focus when I hear people talk about category theory. So by grounding an agent's language usage, we can push back the question of "Is this sentence meaningful?" to the questions of "Is this sentence meaningful to the agent?" and "Is the agent conceptually fluent in this domain?". If the agent is well versed in the plumage of birds but doesn't have a good guess for the meaning of a sentence that mentions feathers, then we can suspect that the sentence is semantically ill-formed, even if a syntactic parser tells us that the sentence is grammatical. That leaves us with a problem of judging to what degree an agent is conceptually fluent in a domain and a problem of how to handle sentences in domains where our agent never becomes fluent (for example, because a conceptual faculty is required, whereas the agent only has sensory categories).
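As a crude stand-in for "does the agent have a good guess?", here is a sketch in which the agent is just a smoothed bigram model trained on some invented bird-plumage text, and its confidence in a sentence is its mean per-bigram log-probability. A real grounded agent would consult its concepts, not just its counts; this is only meant to make the shape of the test concrete:

```python
import math
from collections import Counter

# Toy "domain-fluent agent": a bigram model over invented bird-plumage text.
domain_text = (
    "<s> the jay has blue wing feathers </s> "
    "<s> the wren has brown tail feathers </s> "
    "<s> the jay preens its blue feathers </s>"
).split()
bigrams = Counter(zip(domain_text, domain_text[1:]))
unigrams = Counter(domain_text)
vocab = len(unigrams) + 1

def avg_logprob(sentence, alpha=0.1):
    """Mean per-bigram log-probability with add-alpha smoothing: a crude
    proxy for whether the agent 'has a good guess' about the sentence."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    pairs = list(zip(words, words[1:]))
    return sum(
        math.log((bigrams[p] + alpha) / (unigrams[p[0]] + alpha * vocab))
        for p in pairs
    ) / len(pairs)

# Both sentences are grammatical; the agent is only confident about the first.
print(avg_logprob("the jay has blue feathers"))     # closer to zero: familiar
print(avg_logprob("the feathers preen blue jays"))  # much lower: no good guess
```

A sentence that passes a parser but scores badly with every domain-fluent agent we have is exactly the kind we'd want to flag as semantically suspect.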
Right now I just want to read some papers and write some code and see what I can do when I commune with the spirit of the academic times. I'll let you know how it goes. Take care of yourselves.