ACR Bulletin

Covering topics relevant to the practice of radiology

Hone Your Data Science Skills

Here's how you can brush up and keep up with changes in AI in radiology, both on your own and through organized activities.

November 18, 2022

You’re a full-time radiologist working a day job. A few years ago, you heard about big data and AI and decided to learn all you could. A good thing you did, too: Radiology AI is real and here today, and your practice looks to you as the local expert.

The challenge, then, is keeping up.

Just as the performance of AI models tends to degrade over time, your data skills need upkeep. If your goals are similar to mine, these suggestions might help. They are not ways to become the world’s leading expert in AI but suggestions for brushing up.

Engage in a Data Project

While images are the most apparent source of rich data, you regularly run across a lot of text data as a radiologist. Finding a project in data analytics can take your work to the next level. Maybe the project is predicting future volume based on this year’s data to justify hiring new radiologists. Perhaps it’s predicting demand for radiologists and RTs by day and hour using a combination of exam volume and turnaround time by modality. Maybe it’s building a computer vision model for a disease entity you’ve spent your career studying.
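As one illustration of the volume-prediction idea, here is a minimal sketch that fits a straight-line trend to monthly exam counts and extends it a few months forward. The monthly numbers and the function names are made up for this example, and a straight line is only an assumed starting point: a real forecast would likely need to account for seasonality and for changes in referral patterns.

```python
from statistics import mean


def fit_linear_trend(volumes):
    """Fit a least-squares line through (month_index, volume) pairs."""
    xs = list(range(len(volumes)))
    x_bar, y_bar = mean(xs), mean(volumes)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, volumes))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = y_bar - slope * x_bar
    return slope, intercept


def forecast(volumes, months_ahead):
    """Extend the fitted line for the requested number of future months."""
    slope, intercept = fit_linear_trend(volumes)
    n = len(volumes)
    return [intercept + slope * (n + k) for k in range(months_ahead)]


# Hypothetical monthly exam counts for one year (illustrative numbers only).
monthly_volume = [4100, 4150, 4230, 4200, 4310, 4380,
                  4350, 4420, 4500, 4540, 4610, 4650]

next_quarter = forecast(monthly_volume, 3)
print([round(v) for v in next_quarter])
```

The slope is the estimated change in exams per month, which may be the more persuasive number when the goal is to justify a new hire.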

Working on a project that will impact your daily practice is probably the single best way to keep your skills current because there is a real incentive to do things well. There is the pressure of eventually displaying your work. What’s more, it’s an opportunity to improve the way your co-workers do their work. As long as you are willing to accept the challenge, both intrinsic and extrinsic rewards can be well worth the effort.

Make Your Data Better

If taking on a truly tangible project sounds too involved for your professional life right now, that’s OK! There are plenty of other ways to keep your data skills current.

The best data science projects start with high-quality data. Sometimes this means better data: A structured format for diagnostic findings, standardized recommendations, and fine-tuned, practice-level reporting templates are all meaningful engagements. These can be highly worthwhile projects to refine the input to machine learning models.

Sometimes high-quality data means better use of data. What is the volume trend in your practice? What is the average turnaround time? Make a request to your data center for a spreadsheet of last month’s radiology reports by modality, anatomy, and timestamps. In particular, timestamps are extremely helpful for calculating turnaround time and identifying outliers.
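To make the timestamp point concrete, here is a minimal sketch that computes turnaround time from two timestamps and flags unusually slow reports. The field names (`accession`, `exam_completed`, `report_finalized`), the timestamp format, and the "three times the median" cutoff are all assumptions for illustration; your export will have its own columns, and your practice may define turnaround time differently.

```python
from datetime import datetime
from statistics import median

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"  # assumed format of the exported timestamps


def turnaround_minutes(exam_completed, report_finalized):
    """Minutes between exam completion and report finalization."""
    start = datetime.strptime(exam_completed, TIMESTAMP_FORMAT)
    end = datetime.strptime(report_finalized, TIMESTAMP_FORMAT)
    return (end - start).total_seconds() / 60


def flag_outliers(rows, multiple=3.0):
    """Return the median turnaround time and the reports slower than
    `multiple` times that median (an arbitrary, assumed cutoff)."""
    timed = [
        (row, turnaround_minutes(row["exam_completed"], row["report_finalized"]))
        for row in rows
    ]
    typical = median(minutes for _, minutes in timed)
    slow = [
        (row["accession"], minutes)
        for row, minutes in timed
        if minutes > multiple * typical
    ]
    return typical, slow


# Hypothetical rows standing in for a spreadsheet export.
reports = [
    {"accession": "A1", "exam_completed": "2022-10-03 08:00", "report_finalized": "2022-10-03 08:30"},
    {"accession": "A2", "exam_completed": "2022-10-03 09:00", "report_finalized": "2022-10-03 09:40"},
    {"accession": "A3", "exam_completed": "2022-10-03 10:00", "report_finalized": "2022-10-03 10:50"},
    {"accession": "A4", "exam_completed": "2022-10-03 11:00", "report_finalized": "2022-10-03 17:40"},
]

typical, slow = flag_outliers(reports)
print(typical, slow)
```

The median is used instead of the mean so that a handful of very slow reports does not distort the picture of a typical day.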

Take a moment to learn how data is populated. How is your practice’s turnaround time calculated? Do your radiologists and ED physicians agree on the definition? Are there manual components in the time calculations? For instance, does the scanner automatically fill in exam start time, or is there another button an RT has to click? If a data field is manually populated, what are your options to improve the quality of that data?

It might be easy and tempting to do the analysis straight on a spreadsheet, but instead, try using an analytic platform. Is R your cup of tea? Do you prefer Python? The benefit of approaching analytics this way is that you can scale your analysis quickly. The right platform can make all the difference when you go from analyzing a spreadsheet with 300 rows to one with 30 million.
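One reason code scales where a spreadsheet struggles is that a script can read rows one at a time instead of loading the whole file. The sketch below averages turnaround time by modality in that streaming style using only Python's standard library. The column names `modality` and `tat_minutes` are assumptions, and the inline sample stands in for a real export.

```python
import csv
import io
from collections import defaultdict


def mean_tat_by_modality(lines):
    """Average turnaround time per modality, reading one row at a time.

    Memory use grows with the number of modalities, not the number of rows,
    so the same function works for 300 rows or many millions.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(lines):
        totals[row["modality"]] += float(row["tat_minutes"])
        counts[row["modality"]] += 1
    return {modality: totals[modality] / counts[modality] for modality in totals}


# Tiny in-memory sample; any open CSV file handle can be passed instead.
sample = io.StringIO("modality,tat_minutes\nCT,30\nCT,50\nMR,90\n")
averages = mean_tat_by_modality(sample)
print(averages)
```

For a real export, the same function can be given an open file, for example `with open("reports.csv", newline="") as f: mean_tat_by_modality(f)`, where `reports.csv` is a hypothetical file name.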

Enter Machine Learning Competitions

Many machine learning competitions allow you to solve discrete problems in “practice mode” (or in actual competition mode with penalties for wrong answers). For example, national societies such as RSNA and the Society for Imaging Informatics in Medicine (SIIM) routinely produce radiology-relevant competitions on a timely topic. ML competitions provide optimized data and encapsulate the problem. While real-life data science is messy and often involves mixed-quality data, competitions abstract out the logistics and focus on model-building. If you got into data science because you enjoyed the rush of creating something out of your own hours of effort, you might enjoy these competitions. Cash or computing resources are common prizes for top performers.

Kaggle is a website that allows users to publish anonymized data sets, build machine learning models, and host or participate in data science competitions both in and outside of healthcare. RSNA and SIIM have hosted many of their recent machine learning challenges — and winning solutions — on Kaggle. Outside of the radiology competitions, data science problems on Kaggle range from straightforward to very difficult, and there is something for everyone, from complete novices to experts. It’s never just busywork.

Learn a New API or New Language

Like any skill, every element of computer science builds upon itself. While current literature covering radiology data science emphasizes coding in Python, a radiologist with the right data and no access to full-time data scientists can use a low-code or no-code environment like PyCaret to turn ideas into a working prototype.

For those with coding experience, even within one programming language, there are many packages to consider. Python libraries in machine learning alone pose a daunting challenge: PyTorch, Keras, Caffe/Caffe2, and MXNet are just some examples of the many choices you have for computer vision. For natural language processing, popular starting points include NLTK, Gensim, spaCy, and others.

Finally, the proper integration of data models into the broader healthcare technology and workflow is critical in the real world. Pragmatic considerations often require knowledge beyond Python or data science. Java (Deeplearning4j), C++ (OpenCV, CUDA), and C# (also OpenCV through a .NET wrapper) are useful considerations for data science projects ripe for clinical translation.

One great way to keep yourself current as a data scientist is to keep learning new things because learning new things requires you to review what you already know.

Conclusion

As a radiologist, I am not (and probably never will be) as good as a full-time data scientist, so my goal is to keep abreast of the newest technologies and periodically create something that helps me solve my everyday problems at work.

How do you keep up with your data skills?

Author Po-Hao "Howard" Chen, MD, MBA, is chief imaging informatics officer, IT medical director for enterprise radiology, and staff radiologist in musculoskeletal imaging at Cleveland Clinic.