A couple of days ago, a pretty interesting paper on Neural Databases came out of Facebook AI. Let’s try to understand its significance.

In recent years, neural nets have gotten a lot better at answering queries from natural language text. The question that arises, then, is can we relax the fundamental assumption of database management - that we have some kind of predefined schema?

This paper is an exploration of whether that could be possible. Can we store data schema-less and still expect to retrieve it reliably? NeuralDB is a database system where updates and queries are both given in natural language. This could potentially lead to more efficient storage.

There are other benefits as well - because they are using a pretrained language model, there are certain operations that could be done more efficiently. We’d be able to query with fewer joins, theoretically. (Example to follow)

First, some terminology.

Select-Project-Join queries

SQL’s SELECT command encompasses both Select and Project queries.

  1. Select is simple enough - it is used to select rows from the table that match a condition. Think of it like a filter operation on an Excel sheet. An example SQL command would be SELECT * FROM Customers WHERE Country='Mexico'; (copied straight off w3schools).
  2. Project is essentially the same idea as Select, but applied to columns instead of rows. An example command would be SELECT Country, State FROM Customers; we are reducing the complexity of our table, in some sense, by filtering out unwanted columns.
  3. Join operations are used to relate two or more independent tables. If you have a Customers table and an Orders table and you want to view all the orders of customers who live in Senegal, you’re going to have to do a Join operation on these two tables.
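To make the three operations concrete, here is a small runnable sketch using Python’s built-in sqlite3 module. The Customers/Orders tables and their contents are made up purely for illustration.

```python
import sqlite3

# Hypothetical Customers and Orders tables, kept tiny for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
cur.execute("CREATE TABLE Orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT)")
cur.executemany("INSERT INTO Customers VALUES (?, ?, ?)",
                [(1, "Ana", "Mexico"), (2, "Moussa", "Senegal")])
cur.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                [(10, 1, "lamp"), (11, 2, "desk")])

# Select: filter rows that match a condition.
select_rows = cur.execute(
    "SELECT * FROM Customers WHERE country = 'Mexico'").fetchall()

# Project: keep only the columns we care about.
project_rows = cur.execute("SELECT name, country FROM Customers").fetchall()

# Join: relate the two tables - orders placed by customers in Senegal.
join_rows = cur.execute(
    "SELECT Orders.item FROM Orders "
    "JOIN Customers ON Orders.customer_id = Customers.id "
    "WHERE Customers.country = 'Senegal'").fetchall()

print(select_rows)  # the Mexican customers
print(join_rows)    # items ordered from Senegal
conn.close()
```

Select-Project-Join (SPJ) queries are exactly the compositions of these three primitives, which is why they make a natural benchmark for what a neural database should handle.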

Back to Neural Databases

Problem

Data and queries in a NeuralDB are all just sentences. I think of each sentence as providing some piece of information; queries get answered by combining the pieces that overlap, like intersecting regions of a Venn diagram.

For instance, if our DB consisted of the sentences:

  1. Raj lives in Pasadena.
  2. Raj is a character in the Big Bang Theory.
  3. Sheldon and Raj are friends, and live in the same city.

And our query was “Which city does Sheldon live in?” - we’d know Pasadena was a city from our pretrained language model. We know Sheldon and Raj live in the same city from the third sentence, and a join operation of sorts (intersection if that helps you visualize) would tell us that Sheldon therefore must live in Pasadena as well.

The data in a NeuralDB is a set of timestamped sentences \(D = \{(u_1, t_1), \ldots, (u_k, t_k)\}\), where the \(u_i\)’s are natural-language facts and the \(t_i\)’s are timestamps.
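To make the Sheldon example concrete, here is a toy, hand-coded sketch. The structured dictionaries (lives_in, same_city) are my own illustrative scaffolding - a real NeuralDB never extracts them explicitly; the transformer performs the equivalent reasoning implicitly over the raw sentences.

```python
# The database D: (sentence, timestamp) pairs, as in the definition above.
D = [
    ("Raj lives in Pasadena.", 1),
    ("Raj is a character in the Big Bang Theory.", 2),
    ("Sheldon and Raj are friends, and live in the same city.", 3),
]

# Hand-extracted structured views of the facts (illustration only).
lives_in = {"Raj": "Pasadena"}       # from the first sentence
same_city = {("Sheldon", "Raj")}     # from the third sentence

def city_of(person):
    """Answer 'Which city does <person> live in?' via a join-like step."""
    if person in lives_in:
        return lives_in[person]
    # The "join": if X lives in the same city as Y, inherit Y's city.
    for a, b in same_city:
        if a == person and b in lives_in:
            return lives_in[b]
    return None

print(city_of("Sheldon"))  # Pasadena
```

The point of the paper is that a transformer can learn to do this kind of composition directly from the sentences, without anyone writing the intermediate structure down.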

What are Transformers?

If you aren’t sure what Transformers are, in the NLP sense, it might be a good idea to read Jay Alammar’s fantastic blog post. If you have more time, read Attention is All You Need. If you sorta understand how RNNs work, Transformers are not a huge leap from there. If you’d like to black box it, that’s fine as well.

The authors first demonstrate that transformers can answer SPJ queries.

To do so, they provided their T5 Transformer model with (D,Q,A) triples, where D is a set of facts, Q is a query, and A is the correct answer to Q given D. They used several augmentations and templates to make sure the model didn’t overfit on any of the three members of each triple. They also took care to make sure the model couldn’t “cheat” by relying on knowledge already baked into the pretrained model: since their data was derived from Wikipedia, they changed names and the like, relying on timestamps to do so (read Section 3.1 of the paper if you want to know more).

The input vector to the transformer during training was the facts vector (D) concatenated with the query vector (Q). The authors conducted various tests with this setup - by varying the composition of the facts and the queries, and seeing how the model performed.
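As a rough sketch, the concatenation might look something like this. The delimiter and formatting below are my own invention for illustration, not the paper’s actual templates.

```python
# A hypothetical serialization of one (D, Q, A) training triple for a
# seq2seq model like T5. The "[Q]" delimiter is made up for this sketch.
facts = [
    "Raj lives in Pasadena.",
    "Sheldon and Raj are friends, and live in the same city.",
]
query = "Which city does Sheldon live in?"
answer = "Pasadena"

# Input: the facts concatenated with the query; target: the answer string.
model_input = " ".join(facts) + " [Q] " + query
model_target = answer

print(model_input)
```

Varying which facts go into model_input (all of them, a retrieved subset, or exactly the right ones) is precisely what distinguishes the retrieval setups compared below.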

There were large differences observed - the Perfect IR (Information Retrieval) approach came out the winner in most regards. This approach assumes that the information retrieval portion is perfect, and always gets exactly what it needs to answer a query. A pretty generous assumption to be making, for sure.

The one area where the Perfect IR approach wasn’t great was aggregation queries. We’ll talk about this later.

Other approaches included Whole DB, which is what you’d expect: the facts vector is always just the entire database. It has the problems you’d expect, too - that’s far too much data for the Transformer to process at once.

Then there were TF-IDF IR and DPR. The former retrieves facts by lexical overlap with the query, weighting words by term frequency-inverse document frequency, so it only works when the query and the relevant facts share vocabulary. The latter uses a Dense Passage Retriever, which matches queries to facts via learned dense embeddings rather than word overlap.
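For intuition about the TF-IDF flavor of retrieval, here is a minimal pure-Python sketch (my own illustration, not the paper’s implementation): each stored fact is scored against the query by word overlap, with rarer words weighted more heavily.

```python
import math
from collections import Counter

facts = [
    "Raj lives in Pasadena.",
    "Raj is a character in the Big Bang Theory.",
    "Sheldon and Raj are friends, and live in the same city.",
]

def tokenize(text):
    return [w.strip(".,?").lower() for w in text.split()]

docs = [tokenize(f) for f in facts]
n = len(docs)

# Inverse document frequency: words that appear in fewer facts get
# higher weight (+1 keeps common words from vanishing entirely).
idf = {}
for w in {w for d in docs for w in d}:
    df = sum(1 for d in docs if w in d)
    idf[w] = math.log(n / df) + 1.0

def tfidf(tokens):
    tf = Counter(tokens)
    return {w: c * idf.get(w, 1.0) for w, c in tf.items()}

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query, k=1):
    q = tfidf(tokenize(query))
    ranked = sorted(facts, key=lambda f: -cosine(q, tfidf(tokenize(f))))
    return ranked[:k]

print(retrieve("Which city does Sheldon live in?"))
```

Note that this query only retrieves the “same city” fact because it literally shares the words “Sheldon”, “city”, and “live”; a dense retriever like DPR is meant to handle cases where the wording differs but the meaning matches.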

The paper goes on to discuss how the Whole DB approach could potentially be made to work - if you run multiple transformers concurrently. It also introduces the notion of a support set - the minimal subset of facts in the database that contains the “support” needed to generate an answer to a query. It’s interesting stuff, and we will cover it in Part 2. Coming up soon!

Caveats

A NeuralDB is “not meant to provide the same correctness guarantee as a traditional database system” - there is likely some degree of ambiguity, so it can’t be relied on for applications that demand strict correctness. The authors argue that speech detection and personal assistant software could use such databases, which is fair: coming up with a schema for storing facts about users is a little bit tricky.