Creating Dynamic Visualizations for Busy Users

The Roadmap

I spent the last month working on an application for the non-profit organization Human Rights First (HRF). HRF works to connect immigration attorneys with asylum seekers and provide attorneys with resources that will help them best represent their clients. The application I worked on is designed to assist immigration attorneys and refugee representatives in advocating for clients in asylum cases by identifying patterns in judicial decisions and predicting possible outcomes based on those patterns. I worked on the data science team, alongside a team of web developers, to add new functionality to the application: visualizing the data scraped from the case files that are uploaded to the application and saved in a database.

What’s the Problem?

The main problem I needed to solve was creating visualizations that would update as more data is uploaded to the database, while giving the user the ability to choose which key metrics to visualize and in what form. These visualizations are vital to the application’s core functionality. It’s important to keep in mind that the average user of this application is a lawyer who could be juggling over 100 asylum cases at any one time. They therefore don’t have time to dig into the details of thousands of cases. Essentially, the lawyers want to quickly assess whether there is a pattern in cases similar to their client’s, and then use that information to formulate and tailor their arguments when representing their clients. Our solution was to give the user the choice between a stacked or a grouped bar chart.
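To make that choice concrete: assuming a plotting library like Plotly (an assumption on my part) and made-up outcome counts rather than real case data, the difference between the two chart types comes down to a single layout setting.

```python
import plotly.graph_objects as go

# Made-up outcome counts for two judges -- illustrative only, not real case data.
judges = ["Judge A", "Judge B"]
fig = go.Figure(data=[
    go.Bar(name="Granted", x=judges, y=[12, 7]),
    go.Bar(name="Denied", x=judges, y=[5, 9]),
])

# The user's choice maps directly onto the layout's barmode:
# "stack" piles outcomes on top of each other, "group" places them side by side.
fig.update_layout(barmode="stack")  # or barmode="group"
fig.show()
```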

How We Moved Forward

Since this project had been running for just over six months, the codebase was well built out, and my teammates and I had to find where we could implement the data visualizations. By walking through the code, we found where the data science API was creating some visualizations, pictured below.

[FirstCode.png: the original visualization endpoint in the data science API]

What you are seeing here is that the data science API endpoint for creating visualizations for the judges is hard-coded with three features, has a variable at the end of the endpoint that shouldn’t be there, and returns three charts all at the same time. There were two key problems with this implementation. First, it is not dynamic and gives the user no control over which features are visualized. Second, returning three charts requires too much memory and will crash the application.
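For illustration only, a hypothetical endpoint with the same problems might look like the sketch below; the FastAPI-style routing, the `make_bar_chart` helper, and the feature names are my own placeholders, not the project's actual code.

```python
from fastapi import APIRouter
import plotly.express as px

router = APIRouter()

def make_bar_chart(judge_name: str, feature: str):
    """Placeholder chart builder standing in for whatever the real code did."""
    return px.bar(x=[feature], y=[1], title=f"{judge_name}: {feature}")

# Anti-pattern sketch: three hard-coded features, a stray path variable,
# and every chart built and returned on a single call.
@router.get("/vis/judges/{judge_name}/{unused}")
async def judge_visualizations(judge_name: str, unused: str):
    features = ["protected_grounds", "decision", "case_origin"]  # hard coded
    charts = [make_bar_chart(judge_name, f) for f in features]
    return {"charts": [c.to_json() for c in charts]}  # heavy payload: the memory problem
```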

We addressed both problems by making the code dynamic, removing anything hard-coded, and returning only one chart per call to fix the memory issue. The replacement code we wrote, which will be passed on to the next team for fine-tuning, is pictured below.

[SecondCode.png: the replacement code for the dynamic visualization endpoint]
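As a rough illustration of that dynamic pattern, a hypothetical endpoint might accept the user's chosen metric and chart type and return a single chart; the `/vis/cases` route, the parameter names, and the `get_case_data` helper below are my own placeholders, not the team's exact code.

```python
from fastapi import APIRouter, Query
import pandas as pd
import plotly.express as px

router = APIRouter()

def get_case_data() -> pd.DataFrame:
    """Stand-in for the query that pulls case records from the database."""
    return pd.DataFrame({
        "outcome": ["granted", "denied", "granted", "denied"],
        "protected_grounds": ["religion", "race", "religion", "political"],
    })

@router.get("/vis/cases")
async def case_visualization(
    feature: str = Query(..., description="Which key metric the user wants visualized"),
    chart_type: str = Query("stacked", description="'stacked' or 'grouped'"),
):
    df = get_case_data()
    counts = df.groupby([feature, "outcome"]).size().reset_index(name="count")
    fig = px.bar(counts, x=feature, y="count", color="outcome")
    fig.update_layout(barmode="stack" if chart_type == "stacked" else "group")
    # Only one chart is built and serialized per request, avoiding the old memory issue.
    return fig.to_json()
```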

Why Make These Changes?

Our thought process behind this implementation of the visualizations was to strike a balance between giving the user control and flexibility over what to visualize and not overwhelming them with too many decisions. We needed to be in constant communication with the frontend and backend teams so that we knew which metrics the user could select and how that selection was being passed from the frontend to the backend to the data science API.

A key challenge we faced in this process was that, as the codebase was originally set up, the backend pulled the data and sent it to the data science API to visualize. After much discussion with the team, we decided it would be significantly more efficient for the data science API to query the database directly rather than pass the data back and forth. That way, the backend only has to pass along which key metrics and what kind of chart the user wants visualized. This reduced the complexity of the code as well as the amount of memory spent sending information between different parts of the application.
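In concrete terms, that change means the data science service owns its own database connection and runs the query itself, while the request from the backend shrinks to the metric and chart type. Here is a rough sketch of the idea; the connection string, the `cases` table, and the column names are placeholders of mine rather than the project's actual schema.

```python
import os
import pandas as pd
from sqlalchemy import create_engine, text

# Placeholder connection setup -- the real service would read its own credentials.
engine = create_engine(os.environ["DATABASE_URL"])

# Columns the user is allowed to visualize; validating against a whitelist keeps the
# column interpolation below from becoming a SQL injection risk.
ALLOWED_FEATURES = {"protected_grounds", "decision", "case_origin"}

def get_judge_cases(judge_name: str, feature: str) -> pd.DataFrame:
    """Query the cases table directly instead of receiving the rows from the backend."""
    if feature not in ALLOWED_FEATURES:
        raise ValueError(f"Unknown feature: {feature}")
    query = text(f"SELECT outcome, {feature} FROM cases WHERE judge_name = :judge")
    return pd.read_sql(query, engine, params={"judge": judge_name})
```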

The Result

Even though the visualizations are not perfect and the code could be better documented and organized into more functions and files, we accomplished our main goal of creating dynamic visualizations and solved other problems that were plaguing the application. Together, the team and I were onboarded onto a project with a very mature codebase, realized through collaboration with the backend team that only one endpoint on the data science API was actually being called, created two new endpoints for visualizations (the code for the second endpoint, for individual judges, is pictured below), and solved a timeout issue with uploading case files.

[ThirdCode.png: the second visualization endpoint, for individual judges]
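As with the earlier sketches, the shape of that second endpoint might look something like the following; the route and the `get_judge_cases` helper (from the database sketch above) are hypothetical placeholders, not the code in the screenshot.

```python
from fastapi import APIRouter, Query
import plotly.express as px

router = APIRouter()

@router.get("/vis/judges/{judge_name}")
async def judge_visualization(
    judge_name: str,
    feature: str = Query(...),
    chart_type: str = Query("stacked"),
):
    # Same pattern as the all-cases endpoint, filtered to a single judge.
    df = get_judge_cases(judge_name, feature)  # hypothetical helper from the earlier sketch
    counts = df.groupby([feature, "outcome"]).size().reset_index(name="count")
    fig = px.bar(counts, x=feature, y="count", color="outcome",
                 title=f"Case outcomes for {judge_name} by {feature}")
    fig.update_layout(barmode="stack" if chart_type == "stacked" else "group")
    return fig.to_json()
```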

We did not have time for the frontend team to implement a form for the user to fill out, and therefore did not have the opportunity to test the code on a live deployed version of the application. However, we were able to test it locally and have the visualizations up and running. If I had more time on the project, that is the first thing I would prioritize, along with organizing the code to be much more readable.

One challenge I foresee down the line is the quality of the data as more and more cases are uploaded to the database. If the data is not clean, the number of bars on the graph will grow too large to convey any meaningful insight. This is likely because different courts and districts use different file formats for their cases, which could lead to messy data if left unsupervised. Data cleaning and data engineering will definitely need attention in future iterations of this project.

Time for Reflection

This last month, though stressful at times, has been an extremely valuable experience, and I feel that I have really grown as a data scientist. Before this month, I had never worked with a backend or frontend team; I had only created a Flask application or two where everything happened in one, maybe two, Python files. I now feel confident in my ability to collaborate and contribute to a team, as well as report to stakeholders on progress and ask questions about their vision. I have no problem speaking up and talking about what I’m working on, my progress, my wins, and of course my blockers. The value of teamwork has really shone through this month. One quote I like to reference, and one that could not be more true of this experience, is: “If you want to go quickly, go alone. If you want to go far, go together.”