How to get familiar with a codebase

The ability to quickly get familiar with an existing codebase is a valuable skill. It helps whether you are planning to contribute to an open-source project or inheriting a codebase at your workplace.

This is one of the most underrated skills of a programmer, and it is by no means easy. Even good programmers sometimes find it difficult and become frustrated when they inherit a new codebase. Being methodical and applying a few strategies can make the process less frustrating, or even enjoyable! 🙂

Familiarize yourself with the product

Use the product as an end-user. It’s impossible to understand the code without understanding the various flows of the product.

Try as many flows as you can. Go through the product requirement documents if you have them. If the product is an internal component, e.g., an API server, or a library, go through the significant APIs it provides. Call them from cURL or test programs. Understand what these APIs do. Look for any API documentation.
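As a rough sketch of such a test program, the Python snippet below sends a GET request and reports the status, content type, and the start of the body. The function name `probe` and the `Accept` header are illustrative assumptions, not part of any particular product; adapt them to whatever the API under study expects.

```python
import urllib.request
from urllib.error import HTTPError


def probe(url, timeout=5.0):
    """Send a GET request and return (status, content_type, body_preview)."""
    # Assumption: the API speaks JSON. Change the header if it does not.
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            content_type = response.headers.get("Content-Type", "")
            body = response.read()
    except HTTPError as error:
        # Error responses are worth reading too: they show how the API
        # reports failures.
        status = error.code
        content_type = error.headers.get("Content-Type", "")
        body = error.read()
    # Only keep the first 500 bytes so large responses stay readable.
    return status, content_type, body[:500].decode("utf-8", errors="replace")
```

Calling `probe(...)` with a few of the product's URLs, including ones you expect to fail, gives a quick feel for the response shapes and error conventions before you open the code.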

The idea here is to learn what the product can do and how those capabilities are used.

Understand the structure of the codebase

Let’s move on to the code now. Start by understanding how the codebase is structured: the directory structure, build process, test suite, and deployment process. You should be able to build the product and use your own build. Read any documentation available in the wiki, the README, and code comments.
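One quick way to get a first picture of the layout is to count files by extension under each top-level directory. The following is a minimal sketch; the function names and the list of directories to skip are assumptions chosen for illustration.

```python
import os
from collections import Counter

# Assumed noise directories; adjust for the project at hand.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def summarize_tree(root):
    """Map each top-level directory under root to a Counter of file extensions."""
    summary = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        relative = os.path.relpath(dirpath, root)
        top_level = "." if relative == "." else relative.split(os.sep)[0]
        counts = summary.setdefault(top_level, Counter())
        for name in filenames:
            extension = os.path.splitext(name)[1] or "(no extension)"
            counts[extension] += 1
    return summary


def print_summary(root):
    """Print one line per top-level directory with its most common extensions."""
    for top_level, counts in sorted(summarize_tree(root).items()):
        total = sum(counts.values())
        common = ", ".join(f"{ext} x{n}" for ext, n in counts.most_common(3))
        print(f"{top_level:<20} {total:>5} files  {common}")
```

Running `print_summary(".")` from the repository root shows at a glance where most of the source, tests, and documentation live.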

Deconstruct a feature end-to-end

  1. Pick a feature or product flow whose business logic you already understand. This is why familiarizing yourself with the product is essential. Talk to people if you have questions about the product.

  2. Now, think about how you would implement it. Create a mental model of the solution. Note down the error conditions that you’d want to handle. Don’t worry about getting it right; what matters is creating your own mental model. We’ll compare this model with the actual implementation later.

  3. Next, create a branch where you can play and experiment with the code. Start by identifying the entry point in the code. Where does the flow for the feature start?

  4. Use log files to help at this step, and add log messages where needed. Once you have identified the entry point, follow the call graph from there. A debugger is useful here.

  5. Take notes on the call graph as you go. What does each function do, and how are errors handled?

  6. Compare these notes with your model solution created earlier. Reflect on the differences.
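When the existing logs don't show enough of the call graph, a temporary tracing decorator on the experiment branch can help. The sketch below logs entry, exit, and exceptions with indentation that reflects call depth. The names `traced`, `validate`, and `place_order` are hypothetical stand-ins for a real feature's call chain, and the module-level depth counter assumes single-threaded code.

```python
import functools
import logging

logger = logging.getLogger("trace")
_depth = 0  # current nesting level; single-threaded sketch only


def traced(func):
    """Log entry, exit, and exceptions of func, indented by call depth."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        global _depth
        indent = "  " * _depth
        logger.debug("%s-> %s", indent, func.__qualname__)
        _depth += 1
        try:
            result = func(*args, **kwargs)
        except Exception as error:
            logger.debug("%s!! %s raised %r", indent, func.__qualname__, error)
            raise
        else:
            logger.debug("%s<- %s", indent, func.__qualname__)
            return result
        finally:
            _depth -= 1
    return wrapper


# Hypothetical functions standing in for the feature being studied.
@traced
def validate(order):
    if not order.get("items"):
        raise ValueError("empty order")


@traced
def place_order(order):
    validate(order)
    return "accepted"
```

With debug logging enabled, for example via `logging.basicConfig(level=logging.DEBUG)`, calling `place_order({"items": [1]})` logs the nested calls in order, which makes a convenient starting point for the notes in step 5.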

What about concurrent systems?

For concurrent systems, here is what I’d do:

  1. While familiarizing yourself with the product, understand which flows run concurrently: their entry points, where they get their input, and where they send their output.

  2. Next, I’d focus on one execution path or thread and disable the other concurrent flows, either by commenting them out or by mocking their input and output.

  3. Disabling the other flows is not always entirely possible, but thinking hard about it pays off because it simplifies the later steps.

  4. The goal is to simplify as much as possible to reduce cognitive load, so that I can change the flow I’m focusing on and experiment with less interference from the other flows.

  5. Introduce more debug log messages to identify different execution paths/threads.

  6. Basically, follow all the steps described earlier in this article, with more rigor 🙂
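Steps 2 and 5 can be sketched together in Python. In this hypothetical two-flow pipeline, `ingest` normally feeds `process` through a queue. To study `process` on its own, the sketch fills the queue by hand instead of starting `ingest`, and the log format includes the thread name so each message can be attributed to a flow. All names here are assumptions for illustration.

```python
import logging
import queue
import threading

log = logging.getLogger("flows")


def configure_logging():
    """Include the thread name so every message can be tied to a flow."""
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s [%(threadName)s] %(funcName)s: %(message)s",
    )


def ingest(source, out_queue):
    """Hypothetical flow 1: read items and hand them to the next flow."""
    for item in source:
        log.debug("read %r", item)
        out_queue.put(item)
    out_queue.put(None)  # sentinel: no more input


def process(in_queue, results):
    """Hypothetical flow 2: the flow under study."""
    while True:
        item = in_queue.get()
        if item is None:
            break
        log.debug("processing %r", item)
        results.append(item * 2)


def run_process_in_isolation(items):
    """Study process alone by feeding its queue by hand; ingest never starts."""
    in_queue = queue.Queue()
    for item in items:
        in_queue.put(item)
    in_queue.put(None)
    results = []
    worker = threading.Thread(
        target=process, args=(in_queue, results), name="process"
    )
    worker.start()
    worker.join()
    return results
```

Because the input is fixed and the other flow is absent, repeated runs of `run_process_in_isolation` behave the same way, which makes experiments on the flow much easier to reason about.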

Of course, this approach is meant for understanding a new codebase. Debugging a production issue in a concurrent system is another story, though many of the tips here can help.

Next steps

Once you’ve understood the flow for a few features, try fixing a few simple bugs or implementing something simple. Use code review to ask any questions you have. Then follow the same process for as many different flows as you can.

That’s it. Diving into an unfamiliar codebase is not easy, but being pragmatic and methodical can take most of the frustration out of it! Soon you will start enjoying the process of diving into a new codebase.

Follow @sabya for more thoughts on programming and building software.