Recently I’ve been working on a project to extract data from an older SQL database into an ElasticSearch stack, to give a vastly improved experience for users looking for data. One of the challenges in this project is that the database has gone through years of evolution and acquisitions, and is starting to show its age. Finding the data to extract becomes one of the harder things to do, with some quirky naming conventions and join structures.
I got a log way by simply querying the information schema. SQL databases have the functionality built in to query the tables and columns of a database as if they were any other data, using queries like:
WHERE COLUMN_NAME LIKE
By using good guesswork and “like” expressions I was often able to find the columns I was looking for, and then repeat the exercise searching by foreign key fields to map a route through the table joins. One of the naming conventions in the database I’m working against is that primary and keys are named very consistently, and almost always match, so name searches find these joins most of the time.
However we came up with a faster way still to find columns, using Entity Framework and ReSharper. For the benefit of those that are unfamiliar with these tools, Entity Framework is one of Microsoft’s data access libraries, originally targeting SQL only but now going beyond that. ReSharper is JetBrains’ much-loved Visual Studio extension that offers a vast suite of productivity and refactoring tools, which I’ve loved using for many years and recommend to anyone as a way to boost productivity.
Entity Framework offers not only data access libraries for building applications, but also a set of tooling. One of the first iterations of this tooling included the ability to interrogate a database and generate C# classes for each of the tables, with the column names as properties, and with joins mapped where possible. We can simply take a connection string to the existing database, point Entity Framework’s tooling at it, and it will spit out a C# class map for the tables we select, with classes for each table, properties for each column, and an attempt to identify primary and foreign keys where these are defined.
One of the ReSharper features I use most is the very powerful search capabilities (Control+T with the default keybindings). ReSharper’s search includes (among others) files, classes and properties. Given that Entity Framework has mapped the database tables into classes and properties, we can now use ReSharper’s search over the database schema. Unlike querying the information schema, ReSharper searches are on key-press, in real-time, and have fuzzy search behaviors. This makes finding a column in the database really fast! It also means that I’m able to navigate the database schema without access to the database – e.g. offline – so I’ve been able to fill time on train journeys by making a note of the upcoming work, and doing a little investigative work to pad out the expected query structure so I can try it out when back online.