Generating Meaningful Sentences

When we model the English language statistically and then use our learned model to make new sentences, we get meaningless garbage. It's a little disappointing; we'd like our computers to say intelligent things to us instead.

Is there a test that we can run on sentences to see if they're meaningful, so that we can save ourselves some disappointment and only look at model outputs which pass that test?

One answer is "No, we can't". There are lots of languages that could be formed from English words, and a given sentence could well have a different meaning or no meaning in each of these languages. But of course we don't care about all the possible languages that could associate words with meanings: we care about the specific language that we already have and share, and its associations between words and components of the world.

Perhaps we feel that we should be able to algorithmically determine what a sentence is talking about just from language usage data. Having no vision and no instilled linguistic faculties, a competent newly born AI should still be able to reason about atmospheric optics from reading the Wikipedia article on Rayleigh Scattering, should it not? With enough deep thinking? AIXI could do it, presumably! Why not my shitty Python program?

And indeed, amazingly, we *can* translate between languages using just language usage data (even using only monolingual, non-parallel corpora!) without modelling the associations that languages make between words and the world as it presents itself to our sensors.

But if our conventional statistical methods paired with merely terrestrial amounts of data and computing resources can't reliably make sentences from scratch that meaningfully refer to the world, well, that's not so unreasonable. It's a hard task. And it's not so tragic, either: all that we need in order to make Language Generation meaningful is to jointly model the world and language. We just have to get our programs to look at things while people talk about them and then our programs can be "grounded", and they too can talk about things while looking at them. We know it can be done in some circumstances: AI programs right now can label images with grammatical sentences and paint new images from sentence prompts. We have not yet foreseen the date when AIs will be able to talk about more abstract concepts like justice as fluently as they can already talk about the colors of birds' wings, but we will get there or be killed trying.

I suspect that the language grounding literature to date is all about grounding language in sensory categories, which are different from concepts. There are some fairly simple sensory categories for "mother" such as "a woman who is seen caring for children that look like her and her mate", while the concept of "mother", which is more complicated and built on top of the sensory category, can involve inferences to unseen events in the woman's history like "childbirth" or "adoption". Concepts are just one step on the path to intelligent language generation, but my hope is that the first step alone (sensory grounding of language terms) is enough to generate lots of meaningful speech, if not intelligent speech.

While present research in language acquisition almost uniformly leverages parallelism between a linguistic source and a sensory channel, maybe the task can one day be done with merely non-parallel visual and linguistic data streams, just as we are now learning that translation can be done without corpora that are parallel to each other in meaning at the sentence level.

Below are some titles of academic papers that I'm looking forward to reading soon which I think are relevant to the project of meaningful language generation. Not all of them are about grounding language acquisition in perception: I still have some hope that language can be grounded in itself, so to speak - that usage data has a good amount of as-yet-uncaptured structure that can be used to constrain language models so that they only output sentences which are sensical (if not sensorially referential). In particular, selectional restrictions and subcategorization frames are really cool to me, and I want to play with them and see if they can help me to make broad sensical grammars (a toy sketch of that idea follows the list).

* A Cognitive Constructivist Approach To Early Syntax Acquisition
* A Comparative Evaluation Of Deep And Shallow Approaches To The Automatic Detection Of Common Grammatical Errors
* A General-Purpose Sentence-Level Nonsense Detector
* A Neural Network Model For Low-Resource Universal Dependency Parsing
* A System For Large-Scale Acquisition Of Verbal, Nominal And Adjectival Subcategorization Frames From Corpora
* Combining Language And Vision With A Multimodal Skip-Gram Model
* Detecting Grammatical Errors With Treebank-Induced, Probabilistic Parsers
* EBLA: A Perceptually Grounded Model Of Language Acquisition
* Experience-Based Language Acquisition: A Computational Model Of Human Language Acquisition
* Exploiting Social Information In Grounded Language Learning Via Grammatical Reductions
* Grounded Language Acquisition: A Minimal Commitment Approach
* Grounded Language Learning From Video Described With Sentences
* Grounded Language Learning In A Simulated 3D World
* Integrating Type Theory And Distributional Semantics: A Case Study On Adjective–Noun Compositions
* Interactive Grounded Language Acquisition And Generalization In A 2D World
* Learning Perceptually Grounded Word Meanings From Unaligned Parallel Data
* Learning To Connect Language And Perception
* Parser Features For Sentence Grammaticality Classification
* Reassessing The Goals Of Grammatical Error Correction: Fluency Instead Of Grammaticality
* Selection And Information: A Class-Based Approach To Lexical Relationships
* Solving Text Imputation Using Recurrent Neural Networks
* The Generality Constraint And Categorical Restrictions
* Unsupervised Alignment Of Natural Language Instructions With Video Segments
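
To make that last idea a bit more concrete, here is a toy sketch, in Python, of how subcategorization frames and selectional restrictions could be used to filter candidate sentences. The mini-lexicon and the semantic classes below are entirely made up for the example and are borrowed from no particular paper above:

```python
# A toy illustration of filtering candidate sentences with subcategorization
# frames and selectional restrictions. The lexicon is invented for the example.

NOUN_CLASSES = {
    "girl": {"animate", "human"},
    "dog": {"animate"},
    "idea": {"abstract"},
    "sandwich": {"concrete", "edible"},
}

# Each verb lists its subcategorization frame (does it take a direct object?)
# and the semantic classes it selects for in each argument slot.
VERBS = {
    "sleeps": {"transitive": False, "subject": {"animate"}, "object": None},
    "eats": {"transitive": True, "subject": {"animate"}, "object": {"edible"}},
    "frightens": {"transitive": True, "subject": set(), "object": {"animate"}},
}

def is_sensical(subject, verb, obj=None):
    """Accept a (subject, verb, object) triple only if the verb's
    subcategorization frame and selectional restrictions are satisfied."""
    if verb not in VERBS or subject not in NOUN_CLASSES:
        return False
    if obj is not None and obj not in NOUN_CLASSES:
        return False
    entry = VERBS[verb]
    if entry["transitive"] != (obj is not None):
        return False  # wrong subcategorization frame
    if not entry["subject"] <= NOUN_CLASSES[subject]:
        return False  # subject violates the verb's selectional restriction
    if obj is not None and not entry["object"] <= NOUN_CLASSES[obj]:
        return False  # object violates the verb's selectional restriction
    return True

print(is_sensical("girl", "eats", "sandwich"))  # True
print(is_sensical("idea", "sleeps"))            # False: ideas aren't animate
print(is_sensical("girl", "eats", "idea"))      # False: ideas aren't edible
```

A real system would of course have to learn the frames and the noun classes from usage data rather than have them hand-coded, but the filtering step itself is this simple.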

Why don't I have a bunch of references to Deep Learning papers like "Generative Adversarial Text To Image Synthesis"? Because I read what I want. Or I don't read what I want, but I make a blog post about what I kind of want to read so that I can close some of these browser tabs.

But let's get back to the main question: is there a test that we can run on sentences to see if they're meaningful? Well, if you ground an agent's language faculty, then it will understand some sentences and not others, just as I understand lots of English sentences but my eyes lose focus when I hear people talk about category theory. So by grounding an agent's language usage, we can push back the question of "Is this sentence meaningful?" to the questions of "Is this sentence meaningful to the agent?" and "Is the agent conceptually fluent in this domain?". If the agent is well versed in the plumage of birds but doesn't have a good guess for the meaning of a sentence that mentions feathers, then we can suspect that the sentence is semantically ill-formed, even if a syntactic parser tells us that the sentence is grammatical. That leaves us with a problem of judging to what degree an agent is conceptually fluent in a domain and a problem of how to handle sentences in domains where our agent never becomes fluent (for example, because a conceptual faculty is required, whereas the agent only has sensory categories).

Right now I just want to read some papers and write some code and see what I can do when I commune with the spirit of the academic times. I'll let you know how it goes. Take care of yourselves.

Finding Short Rubik's Cube Codes

The codes you see for the 3x3x3 Rubik's cube are generally used for manipulating the top layer while keeping the lower two layers intact. I'd like to discover all of the last-layer algorithms of various short lengths.

6-Move Codes:
There aren't many at first. If we constrain ourselves to a set of 18 moves consisting of clockwise quarter turns of the six faces (U, D, L, R, F, B), counter-clockwise quarter turns (denoted by appending a typewriter apostrophe ' to the letter of the turned face), and half turns (for which we append a 2), then even though there are 18^6 distinct sequences that are six moves in length, there are, up to rotations and mirror reflections, only two last-layer algorithms among them:

<F  U  R  U' R' F'> and its inverse <F  R  U  R' U' F'>

7-Move Codes:
What about codes that are 7 moves in length?

Let's normalize our presentation of codes in the following way (a rough sketch of some of these rules in code follows the list):
  • If a code's first move is on one of the faces (L, R, or B), it should be transformed by rotation into a code that begins with a turn of the F face.
  • Any code which then begins with F' should be transformed into one that begins with F through a left-right reflection.
  • What about codes that begin with F2? A left-right reflection won't change the first move, but it will change any quarter turns that show up later in the code. If a code begins with F2 and contains a quarter turn later on, prefer the left-right variant for which the first quarter turn is clockwise rather than counter-clockwise.
  • Any code that has a turn of the U face at its beginning or end should have those moves simply cut off.
  • Any code that begins with a turn of the D face should have that turn moved to the end.
  • Any two successive turns on the same face should be combined into one turn, and they should also be combined if they're only separated by a turn of the (independent) parallel face.
  • I haven't added this into my normalization procedure yet, so the codes below might not adhere to this rule, but if a code has two successive moves on opposite faces, the two moves can be swapped without changing the result, so perhaps we should establish some ordering (such as U, D, L, R, F, B), and prefer that opposite faces occur in that order, unless the reverse order would result in a reduction according to one of the other rules. For example, "U D" would be preferred to "D U" in the interior of a code, but if "U D" appeared at the end of a code, then by switching to "D U" we could shorten the code by removing the U turn.
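
Here is a rough Python sketch of some of these rules. It handles rotating a code so its first move is on F, mirroring away a leading F', trimming U turns from the ends, and merging successive turns of the same face; the F2 rule, the D-face rule, and the parallel-face reordering are left out. It's meant to illustrate the bookkeeping, not to reproduce my actual normalization code line for line:

```python
# Partial normalization of cube codes, given as strings like "F U R U' R' F'".

# Rotating the whole cube about the U-D axis relabels the side faces.
Y_ROTATION = {"F": "L", "L": "B", "B": "R", "R": "F", "U": "U", "D": "D"}
# A left-right mirror swaps L with R and reverses every turn direction.
MIRROR_FACE = {"F": "F", "B": "B", "U": "U", "D": "D", "L": "R", "R": "L"}
MIRROR_SUFFIX = {"": "'", "'": "", "2": "2"}

def parse(code):
    return [(m[0], m[1:]) for m in code.split()]

def unparse(moves):
    return " ".join(face + suffix for face, suffix in moves)

def rotate_y(moves):
    return [(Y_ROTATION[face], suffix) for face, suffix in moves]

def mirror(moves):
    return [(MIRROR_FACE[face], MIRROR_SUFFIX[suffix]) for face, suffix in moves]

def trim_u_turns(moves):
    while moves and moves[0][0] == "U":
        moves = moves[1:]
    while moves and moves[-1][0] == "U":
        moves = moves[:-1]
    return moves

def merge_same_face(moves):
    quarters = {"": 1, "'": 3, "2": 2}
    suffix_of = {1: "", 2: "2", 3: "'"}
    out = []
    for face, suffix in moves:
        if out and out[-1][0] == face:
            total = (quarters[out[-1][1]] + quarters[suffix]) % 4
            out.pop()
            if total:
                out.append((face, suffix_of[total]))
        else:
            out.append((face, suffix))
    return out

def normalize(code):
    moves = merge_same_face(trim_u_turns(parse(code)))
    while moves and moves[0][0] in "LRB":  # rotate until the first move is on F
        moves = rotate_y(moves)
    if moves and moves[0] == ("F", "'"):   # prefer a code starting with F over F'
        moves = mirror(moves)
    return unparse(moves)

print(normalize("U F' L' U' L U F U'"))  # -> F R U R' U' F'
```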
Using those normalization rules, I find 5 distinct last-layer codes in 7 moves. These are:

<F  U' B' U  F' U' B > | The inverse of which, in normal form, is the same code.
<F  U2 F' U' F  U' F'> | <F  U  F' U  F  U2 F'>
<F  R  B' R  B  R2 F'> | <F  R2 B' R' B  R' F'>

Again, this is kind of weird: the number 18^7 is much bigger than 5. It's not as if the number of distinct last-layer configurations (about a thousand, depending on which symmetries you ignore) is the real limiting factor, because I would happily separately count two codes that produced the same configuration. There are just very few sequences that are codes. Oh well. At least it's an easy set to memorize.
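
To put numbers on "much bigger than 5", here is the back-of-the-envelope size of the raw search space at each length, next to the count you get from the one obvious prune of never turning the same face twice in a row. This is just to illustrate the scale; it is not my actual search program:

```python
# Back-of-the-envelope search-space sizes in the 18-move metric
# (clockwise, counter-clockwise, and half turns of all six faces).

for n in (6, 7, 8, 9):
    raw = 18 ** n
    # Cheap prune: never turn the same face twice in a row, so after any
    # move there are 5 other faces x 3 turn types = 15 choices left.
    pruned = 18 * 15 ** (n - 1)
    print(f"{n} moves: {raw:>15,d} raw sequences, {pruned:>15,d} with the prune")
```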

8-Move Codes:
How about codes in 8 moves? Surprisingly, there are just 19... probably. The program that I ran in order to find these had a stupid optimization which might have skipped a valid code or two. My guess, however, is that this is the full set.

<F  R  B  R' F' R  B' R'> | <F  R  F' L  F  R' F' L'>
<F  R  B  U' B' U  R' F'> | <F  R  U' B  U  B' R' F'>
<F  R  B' R' F' R  B  R'> | <F  R' F' L  F  R  F' L'>
<F  R' F  R  F2 L' U2 L > | <F  U2 F' L2 B  L  B' L>
<F  R' F' L  F2 R  F2 L'> | <F  R2 B' R2 F' R  B  R'>
<F  U  F  R' F' R  U' F'> | <F  U  R' F  R  F' U' F'>
<F  U  F' L' B' U' B  L > | <F  R  U' R' F' L' U  L>
<F  U  F' U' F' L  F  L'> | <F  R' F' R  U  R  U' R'>
<F  U2 F' U2 F' L  F  L'> | <F  R' F' R  U2 R  U2 R'>
<F  U  F2 L  F  L2 U  L > | Whose normal-form inverse is the same.


I wish I could say which of these are newly discovered and which are known. Probably all are known, given how small the set is and how much computing power has been thrown at Rubik's cubes, but it's hard to know. People online don't put their codes into normal form, so even if Google can't find one of my codes online, maybe that just means someone has posted a transformed version of it.

9-Move Codes:
How about Last Layer codes in 9 moves? I really don't have any guess as to the size of this set. The search space is huge. If I were to limit myself to codes where each face besides U had face turn conservation (that is, counting a clockwise quarter turn as +1, a counter-clockwise quarter turn as -1, and a half turn as 2, the turns on each face must sum to 0 modulo 4), then I think I could find all such Last Layer codes in a few days. Maybe I will. It would be pleasing to know that I had exhaustively found all 9-move codes, even in that restricted class.
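
That face-turn-conservation filter is only a few lines; here's a sketch of just the filter described above, not the search around it:

```python
# Face-turn conservation: counting a clockwise quarter turn as 1, a
# counter-clockwise quarter turn as 3 (i.e. -1 mod 4), and a half turn as 2,
# the turns on every face besides U must sum to 0 modulo 4.

TURN_VALUE = {"": 1, "'": 3, "2": 2}

def conserves_face_turns(code, free_faces=("U",)):
    totals = {}
    for move in code.split():
        face, suffix = move[0], move[1:]
        totals[face] = (totals.get(face, 0) + TURN_VALUE[suffix]) % 4
    return all(total == 0 for face, total in totals.items()
               if face not in free_faces)

print(conserves_face_turns("F R U' R' U2 R U R' F'"))  # True
print(conserves_face_turns("F D2 B2 L U2 B2 D2 R F"))  # False: net turns on F, L, R
```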

Here are the 9-move normal-form codes that I have so far - 76 in total, though I still need to filter out a few equivalent codes that remain only because I haven't yet normalized the order of successive moves on parallel faces.

<F  B' U  R  U' R' F' U' B > | <F  U' B' R' U' R  U  F' B >
<F  B' U' R' U  R  B  U  F'> | <F  U' B' R' U' R  U  B  F'>
<F  D  B2 D' F  D  B2 D' F2> | <F2 D  B2 D' F' D  B2 D' F'>
<F  D2 B2 L  U2 B2 D2 R  F > | <F  L  D2 B2 U2 R  B2 D2 F >
<F  D2 B2 R  B2 D2 F2 L  F > | <F  R  F2 D2 B2 L  B2 D2 F >
<F  L' U2 L  U2 L  F2 L' F > | <F  R' F2 R  U2 R  U2 R' F >
<F  R  B  U' B' U2 R' U' F'> | <F  U  R  U2 B  U  B' R' F'>
<F  R  F  U  F' R' F  U' F2> | <F2 U  F' R  F  U' F' R' F'>
<F  R  U  R' F' U  F  U2 F'> | <F  U2 F' U' F  R  U' R' F'>
<F  R  U  R2 F  R  F' U' F'> | <F  U  F  R' F' R2 U' R' F'>
<F  R  U' R' F' U' L' U2 L > | <F  U2 F' U' L' B' U' B  L >
<F  R  U' R' U  F' L' U  L > | <F  U  F' L' U  B' U' B  L >
<F  R  U' R' U  R  U  R' F'> | <F  R  U' R' U' R  U  R' F'>
<F  R  U' R' U' F' L' U  L > | <F  U  F' L' U' B' U' B  L >
<F  R  U' R' U2 F' L' U  L > | <F  U  F' L' U2 B' U' B  L > 
# Both of the above are equivalent to <U2>. That's a trivial result, but they're still very pretty codes.
<F  R  U' R' U2 R  U  R' F'> | NF-inverse is the same.
<F  R  U2 B  U  B' R' U2 F'> | <F  U2 R  B  U' B' U2 R' F'> 
# The above are equivalent to <U> and <U'> respectively.
<F  R' F  L2 F' R  F  L2 F2> | <F2 R2 F  L  F' R2 F  L' F >
<F  R' F  R  F2 U  L' U  L > | <F  U  F' U  L2 B  L  B' L >
<F  R' F  R2 B' R  B  R2 F2> | <F2 L2 B  L  B' L2 F  L' F >
<F  R' F' U' F  U  R  U' F'> | <F  U  R' U' F' U  F  R  F'>
<F  R' F2 L  F2 R  F2 L' F > | NF-inverse is the same.
<F  R2 B' D  B' D' B2 R2 F'> | <F  R2 B2 D  B  D' B  R2 F'>
<F  U  F  D  F' U' F  D' F2> | <F2 D  F' U  F  D' F' U' F'>
<F  U  F' L' U  L  F  U' F'> | <F  U  F' L' U' L  F  U' F'>
<F  U  F' R' F  U' F' U  R > | <F  U  R' U' R  F' R' U  R >
<F  U  R  U2 R' F' L' U  L > | <F  U  F' L' B' U2 B  U  L >
<F  U2 F  D  B' R2 B  D' F2> | <F2 D  B' R2 B  D' F' U2 F'>
<F  U2 F  D  F' U2 F  D' F2> | <F2 D  F' U2 F  D' F' U2 F'>
<F  U2 F' L  F' L' F2 U2 F'> | <F  U2 F2 L  F  L' F  U2 F'>
<F  U2 F2 U' F2 U' F2 U2 F > | NF-inverse is the same.
<F2 D  B' D' F' D  B  D' F'> | <F  D  B' D' F  D  B  D' F2>
<F2 L2 R2 B2 D  B2 L2 R2 F2> | NF-inverse is the same.
<F2 L2 R2 B2 D  F2 L2 R2 B2> | <F2 R2 L2 B2 D  F2 R2 L2 B2>
<F2 R2 F  L2 F' R2 F  L2 F > | <F  R2 F  L2 F' R2 F  L2 F2>
<F2 R2 L2 B2 D  B2 R2 L2 F2> | NF-inverse is the same.
<F2 U  L  R' F2 L' R  U  F2> | NF-inverse is the same.
<F2 U  R' L  F2 R  L' U  F2> | NF-inverse is the same.
<F2 L  F  L' U' F2 R' F' R > | <F  R' F' R2 U' B' R  B  R2>
# The above are both equivalent to <U'>.
<F  L  F2 L' F  U  F2 U' F2> | <F2 U  F2 U' F' L  F2 L' F'>
<F  U  F' L' U2 L  F  U' F'> | NF-inverse is the same.
<F  R' B2 R  F  R' B2 R  F2> | <F2 L  B2 L' F  L  B2 L' F >

Gosh, that was fun. If you know of someone who's already done or even started this work, please do let me know.

EDIT: While testing for remaining 9-move codes, I did some online searching and found a handy little GitHub repository from Lars Petrus containing a (large) text file with all possible Last Layer algorithms up to fifteen moves. After normalizing those, I was pleased to find that there were no codes in fewer than 9 moves that I had failed to find, and that for codes of exactly 9 moves, there were only 4 new pairs:

<F  R2 B' R' B  U  R' U' F'> | <F  U  R  U' B' R  B  R2 F'>
<F  R2 D  R  D' R  F2 U  F > | <F  U  F2 L  D' L  D  L2 F >
<F  R2 D  R  D2 F  D  F2 R > | NF-inverse is the same.
<F  U  F' L2 B  L  B' U' L > | <F  U' R' F  R  F2 L' U  L >

None of those violate face turn conservation except in the U face, so I would have found them all eventually, though it would have taken days, and in the meantime I wouldn't have had this warm satisfaction of knowing that I was done with the project. I'm free!
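
(For the record, once everything is in normal form the comparison itself is just set subtraction. In the sketch below the file names and the one-code-per-line format are made up, and normalize() is the incomplete sketch from earlier, so this shows the shape of the bookkeeping rather than the exact script I ran.)

```python
# Compare a downloaded list of Last Layer algorithms against my own codes,
# after putting both into normal form. File names and format are assumptions.

def load_normalized(path):
    with open(path) as f:
        return {normalize(line.strip()) for line in f if line.strip()}

petrus_codes = load_normalized("petrus_last_layer_algs.txt")  # hypothetical file
my_codes = load_normalized("my_codes_up_to_9_moves.txt")      # hypothetical file

# Codes of at most 9 moves that the downloaded list has and mine doesn't:
missed = {code for code in petrus_codes - my_codes if len(code.split()) <= 9}
for code in sorted(missed):
    print(code)
```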