Spreadsheet-style tool could democratize database design

When an organization needs a new database, it typically hires a contractor to build it or buys a heavily supported product customized to its industry sector. Usually, the organization already owns all the data it wants to put in the database. But writing complex queries in SQL or some other database scripting language to pull data from many sources, filter, sort, combine, and otherwise manipulate it, and display it in an easy-to-read format requires expertise that few organizations have in-house.

New software from researchers at MIT’s Computer Science and Artificial Intelligence Laboratory could make databases much easier for laypeople to work with. The program’s home screen looks like a spreadsheet, but it lets users build their database queries and reports by combining functions familiar to any spreadsheet user. Simple drop-down menus let the user pull data into the tool from multiple sources. The user can then sort and filter the data, recombine it using algebraic functions, and hide unneeded columns and rows, and the tool will automatically generate the corresponding database queries.

The researchers also conducted a usability study that suggests that their tool could be easier to use even in its prototype form than existing commercial database systems that represent thousands of programmer hours of work if not tens of thousands. “Organizations spend about $35 billion a year on relational databases,” says Eirik Bakke, an MIT graduate student in electrical engineering and computer science who led the development of the new tool. “They provide the software to store the data and to do efficient computation on the data, but they do not provide a user interface. So what inevitably ends up happening when you have something extremely industry-specific is to hire a programmer who spends about a year of work on building a user interface for your particular domain.“

Related Articles :

Familiar face

Bakke‘s tool, which he developed with the help of his thesis advisor, MIT Professor of Electrical Engineering David Karger, could allow organizations to get up and running with a new database without waiting for a custom interface. Bakke and Karger presented the tool at the Association for Computing Machinery’s International Conference on Management of Data.

The tool’s main drop-down menu has 17 entries, most of which — such as “hide,“ “sort,“ “filter,“ and “delete“ — will look familiar to spreadsheet users. In the conference paper, Bakke and Karger prove that those simple functions are enough to construct any database query possible in SQL-92, which is the core of the version of SQL taught in most database classes.

Some database queries are simple: A company might, for instance, want a printout of all its customers’ names and phone numbers. But it might also be like a printout of customers’ terms and phone numbers in a given zip code whose purchase totals exceeded some thresholds over a particular period. If each purchase has its record in the database, the query must include a code for summing up the purchase totals and comparing them to the threshold quantity.

Things are even more complicated because a database generally stores related data in different tables. For demonstration purposes, Bakke loaded several existing databases into his system. A database used at MIT to track research grants has 35 separate tables; another, which records all the information in a university course catalog, has 15.

Likewise, a company might store customers’ names and contact information in one table, lists their purchase orders in another, and list the items constituting each purchase order in a third. A relatively simple query that pulls up the phone numbers of everyone who bought a particular product for a specific date range could require tracking data across all three tables.

Bakke and Karger‘s tool lets the user pull in individual columns from any table — name and phone number from the first, purchase orders and dates from the second, and products from the third. (The tool will automatically group the effects associated with each purchase order together in a single spreadsheet “cell.“) A filter function like that found in most spreadsheet programs can restrict the date range and limit the results to those that include a particular product. The user can then hide any unnecessary columns and complete the report.

Hands-on approach

Previous academic projects have explored techniques for database query construction using editable flowchart diagrams or virtual buttons that can be snapped together. But Bakke and Karger’s tool enables what is known in computer science as “direct manipulation” of data. “It harkens back to our physical nature, that we’re very comfortable with the idea that if I pick something up and I twist it, then it will twist, and if I shake it, it will shake,” Karger says. “You want the same feeling when you’re manipulating information in a computer — that you’re picking up the information and pushing it this way or sliding it that way or cutting things out — instead of writing some instructions telling the computer to do something. And then the computer does it, and you say, ‘Oh, that’s not what I meant.’“

Bakke conducted two studies on the usability of his tool. In one, 14 participants were asked to construct a series of queries using the device and then rated their experience using the System Usability Scale. This standard measure allows the comparison of different types of software. The scores hovered around the 50th percentile in business software, which isn’t bad for an academic research project. But Microsoft’s Access database program scores are much worse — around the fourth percentile. “The way to describe that result is that database querying is hard, but we can make it tolerable,“ Bakke says.

At present, Bakke‘s tool enables query construction on an existing database but doesn’t allow the direct entry or modification of data. He expects to begin adding that functionality over the next six months, and his office wall is covered with a list of functions he’d like to add and bugs he needs to repair. But he hopes to release the tool in a year or so. “It’s almost ironic,“ Karger says. “Eirik‘s software is far more robust than just about everything graduate students have built. But he’s not satisfied with releasing it in its current form. He’s aiming for something of commercial quality.” “It turns out that when you’re dealing with people’s data, you need to get it right,“ Bakke says.

Spreadsheet-style tool could democratize database design

Familiar face

Hands-on approach

Titanfall 2 Single-Player Campaign Revealed, Shows New Details

OpenVLC platform for research in Visible Light Communication Systems

Latest Updates