Introduction: Birth of the Platform Team
Scaling a product is a continuous process. It begins with engineers building features that can go on to earn the business millions of dollars. At a certain point, however, organizations run into issues that hinder further feature development. For example:
- App launch time increasing
- Crashes/ANRs that are hard to root-cause
- Project build times increasing
- Sudden loss of customer conversion in a user flow
Initially, these issues are assigned to a few engineers who have relevant experience or knowledge. For example, engineers familiar with the code path involved in app launch may be tasked with reducing app launch time, or those experienced in solving performance issues may address general performance issues.
While this approach may work for a few issues, it eventually begins to delay product releases, because the same engineers are pulled away from feature work. As a result, organizations recognize that solving such problems requires continuous effort and should be the responsibility of a dedicated team. This realization gives rise to a "Platform team".
The Truth About Mobile Platform Teams and Their Roles
In my experience, there are significant differences in the structure and work of platform teams across the industry, leading to myths about their roles. Some common myths include:
- Mobile Platform Teams Only Deal with Technical Issues: Many believe that platform teams only work on technical issues, optimizing metrics here and there, and some even believe that they select their own work at will. The reality is quite different; platform teams also work on non-technical issues like prioritizing what to work on and looking at different key indicators to see the business impact of what they should be working on. For example, if the conversion of customers on a core flow is decreasing due to an ANR, they might prioritize it over tackling battery drain.
- Mobile Platform Teams don't work with Product Teams: The platform team is not an isolated engineering army; it collaborates with product managers and analysts. Product teams seek the help of platform teams to monitor the health of their features and avoid roadblocks like crashes, ANRs, or performance degradation. In turn, product teams and analysts give useful input on what the platform team should prioritize.
- Mobile Platform Teams are solely responsible for fixing issues: It's a common misconception that platform teams are solely accountable for solving every issue, which is not true. For example, after improving app launch time, platform teams would also expect other teams to debug in case their features cause regressions.
- Mobile Platform Teams solve only for Performance: Although performance is one of the most challenging areas and requires more insights, platform teams work on other areas as well. These can include enabling localization, managing experiments for all apps, modularization, and more.
Roles for Platform Teams
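For every issue, there are roughly four roles that platform teams perform:
- Making issues deterministic
- Buying out time from product
- Framing OKRs for platform issues
- Translating impact to business
Now that we have seen these roles, let's discuss each one and see, with examples, how it applies to a team working on reducing cold start time.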
Role I: Making issues deterministic
Unlike feature development, the majority of issues that platform teams work on are non-deterministic: the cause, and therefore the fix, is not known up front. To make things clearer, platform engineers have to narrow down the problem incrementally, by analyzing the root cause of the impacted numbers and by creating monitoring systems. As an example, if you are working on reducing cold start time, you need a clear pathway forward, which you can achieve by:
- Monitoring impacted cold start sessions from production
- Tracking the right metrics for cold start and iterating on them when needed, for example logging the first screen drawn, the total time, and user IDs.
- Logging traces from impacted sessions, that is, sessions where users face a cold start longer than 5 seconds.
By performing these steps, the platform team ensures that action items can be clearly defined.
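To make these steps concrete, here is a minimal Python sketch of the analysis side, assuming cold start records have already been collected from production. The `ColdStartSession` type, its field names, and the sample data are hypothetical; only the 5-second threshold and the logged fields (first screen drawn, total time, user ID) come from the steps above.

```python
from collections import Counter
from dataclasses import dataclass

# Threshold from the article: a cold start slower than 5 seconds is "impacted".
IMPACTED_THRESHOLD_MS = 5_000


@dataclass(frozen=True)
class ColdStartSession:
    """One cold start record. Field names are hypothetical."""
    session_id: str
    user_id: str
    first_screen: str   # name of the first screen drawn
    total_time_ms: int  # total cold start time for the session


def impacted_sessions(sessions):
    """Keep only sessions whose cold start exceeded the threshold."""
    return [s for s in sessions if s.total_time_ms > IMPACTED_THRESHOLD_MS]


def first_screen_distribution(sessions):
    """Share of sessions per first screen, largest share first."""
    counts = Counter(s.first_screen for s in sessions)
    total = sum(counts.values())
    return [(screen, count / total) for screen, count in counts.most_common()]


# Made-up sample data, for illustration only.
sessions = [
    ColdStartSession("s1", "u1", "Deeplink", 6200),
    ColdStartSession("s2", "u2", "Home", 1800),
    ColdStartSession("s3", "u3", "Deeplink", 7400),
    ColdStartSession("s4", "u4", "Login", 5300),
    ColdStartSession("s5", "u5", "Home", 2100),
]

impacted = impacted_sessions(sessions)
for screen, share in first_screen_distribution(impacted):
    print(f"{screen}: {share:.0%} of impacted sessions")
```

Grouping impacted sessions by first screen is one simple way to turn an open-ended "cold start is slow" problem into a ranked list of places to look first.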
Role II: Buying out time from product
Since the majority of issues faced by platform teams are non-deterministic, it is challenging to give the product team a deadline. There is a risk of selecting the wrong metrics during the investigation, which leads to an incorrect path for resolving the issue. For example, if cold start is debugged only on debug builds of the app, the investigation is likely to follow an inaccurate path. Thus, transparency is vital in explaining how the issue will be addressed.
Buying time to resolve issues is only possible when there is a clear understanding of how to scope the work. Solving these issues involves roughly three stages, and it is essential to estimate the work required based on known factors and to keep a buffer for unknowns that might arise.
- Chasing known metrics for the cold start: The team starts with the initial metrics it needs to track and analyze to determine a path forward. For visibility on cold start, this includes logging impacted sessions with the distribution of the first screen drawn, the total time taken, and the session ID.
- Attempting a fix: Based on the data collected from production, the team can narrow down where to make fixes. For example, it might optimize the creation of the Deeplink screen if the majority of impacted first-screen-drawn events come from this screen. However, not all attempts to fix cold start will work, and the team may need to iterate on metrics and fixes.
- Quality gate for the future: This stage involves performing checks at different levels, such as nightly runs or PRs, to ensure that cold start does not regress and stays locked to a known baseline. This stage also aims to make cold start easy to debug for other developers, so that platform teams can focus on new work without worrying about what is already fixed.
It is crucial to keep a buffer for each stage since there may be many unknown factors, such as unsuccessful attempts at fixes or the need to expand the list of metrics to track.
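The quality gate stage can be sketched as a simple check that a nightly or PR job runs against a known baseline. The following Python sketch is only one possible shape for such a gate: the function name, the 10% tolerance, the choice of the median, and all the numbers are assumptions made for illustration, not something prescribed here.

```python
from statistics import median


def cold_start_regressed(baseline_ms, samples_ms, tolerance=0.10):
    """Return True if the median of the new samples exceeds the baseline
    by more than the allowed tolerance (10% by default, an assumption)."""
    if not samples_ms:
        raise ValueError("need at least one cold start sample")
    return median(samples_ms) > baseline_ms * (1 + tolerance)


BASELINE_MS = 2000  # hypothetical baseline locked in after the fixes

# Made-up measurements from two hypothetical runs.
nightly_run = [1950, 2100, 2050, 1980, 2020]
pr_run = [2400, 2350, 2500, 2300, 2450]

for name, samples in (("nightly", nightly_run), ("PR", pr_run)):
    status = "REGRESSED" if cold_start_regressed(BASELINE_MS, samples) else "ok"
    print(f"{name}: median {median(samples)} ms -> {status}")
```

Using the median rather than a single measurement makes the gate less sensitive to one noisy run; a real gate would also need enough samples per run for the comparison to be meaningful.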
Role III: Framing OKRs for platform issues
OKRs are a good methodology for tracking the progress and productivity of teams. One common pitfall I have experienced with platform teams is setting overly optimistic goals. For example, for the cold start issue, one could set an objective of reducing cold start time by 70%, which could be challenging to achieve when not all the underlying issues are known yet.
This could create the impression among the product team that the engineers on the platform team are not performing well when the real issue is how the OKRs are framed.
Instead of committing to an overly optimistic improvement percentage, a better approach is to track the number of successful attempts to fix the issues and avoid using too many technical terms in framing the OKRs. For example:
Objective: Fix app launch time for "x" consumer app
KR 1: Improve observability for the following metrics:
- Total launch time
- First screen names
- User IDs
KR 2: Achieve a 30% success rate across all attempts to fix app launch time
KR 3: Establish a quality gate and reach X% confidence in detecting regressions.
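As a rough illustration of how a key result like KR 2 could be tracked, the Python sketch below computes the share of fix attempts that improved launch time. The `FixAttempt` type, the attempt descriptions, and the outcomes are all hypothetical; only the 30% target comes from the example above.

```python
from dataclasses import dataclass

TARGET_SUCCESS_RATE = 0.30  # target from KR 2


@dataclass(frozen=True)
class FixAttempt:
    """One attempted fix. The type and its fields are hypothetical."""
    description: str
    improved_launch_time: bool  # as confirmed by production data


def success_rate(attempts):
    """Fraction of attempts that improved launch time (0.0 if none yet)."""
    if not attempts:
        return 0.0
    return sum(a.improved_launch_time for a in attempts) / len(attempts)


# Made-up attempts, for illustration only.
attempts = [
    FixAttempt("Optimize Deeplink screen creation", True),
    FixAttempt("Defer a non-critical SDK initialization", True),
    FixAttempt("Warm up the image cache later", False),
    FixAttempt("Move a config fetch off the main thread", False),
    FixAttempt("Reduce work done in the first screen layout", False),
]

rate = success_rate(attempts)
print(f"{rate:.0%} of attempts succeeded; KR 2 met: {rate >= TARGET_SUCCESS_RATE}")
```

Counting attempts this way keeps the failed ones visible, which supports the point that unsuccessful fixes are an expected part of platform work rather than a sign of poor performance.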
Role IV: Translate impact to business
One way to demonstrate the value of platform teams is by showing how their technical work translates to tangible business impact. This is important because their tasks often involve technical and engineering aspects that may not be immediately apparent to product teams in terms of how they affect the business.
For instance, if we propose an objective to reduce frame drops by 30%, product teams may not fully understand its business impact. However, if we associate this objective with a critical flow such as the Add to Cart flow, and demonstrate how dropped frames affect the user experience and, in turn, business metrics such as cart abandonment rates, the business implications become much clearer. By establishing this link, the platform team's contributions can be better appreciated by the product and business teams.
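One way to establish such a link is to compare a business metric between sessions with and without the technical problem. The Python sketch below compares cart abandonment rates for janky and smooth Add to Cart sessions. The `CartSession` type, the 10% dropped-frame threshold, and the data are all made up for illustration, and a gap between the two groups shows a correlation, not proof that dropped frames cause abandonment.

```python
from dataclasses import dataclass

JANK_THRESHOLD = 0.10  # hypothetical: over 10% dropped frames counts as janky


@dataclass(frozen=True)
class CartSession:
    """One Add to Cart session. The type and its fields are hypothetical."""
    dropped_frame_ratio: float  # share of frames dropped during the flow
    abandoned: bool             # True if the cart was abandoned


def abandonment_rate(sessions):
    """Fraction of sessions that ended in cart abandonment."""
    if not sessions:
        return 0.0
    return sum(s.abandoned for s in sessions) / len(sessions)


def split_by_jank(sessions, threshold=JANK_THRESHOLD):
    """Split sessions into (janky, smooth) by dropped-frame ratio."""
    janky = [s for s in sessions if s.dropped_frame_ratio > threshold]
    smooth = [s for s in sessions if s.dropped_frame_ratio <= threshold]
    return janky, smooth


# Made-up sessions, for illustration only.
sessions = [
    CartSession(0.02, False),
    CartSession(0.04, False),
    CartSession(0.03, True),
    CartSession(0.05, False),
    CartSession(0.18, True),
    CartSession(0.25, True),
    CartSession(0.15, False),
    CartSession(0.30, True),
]

janky, smooth = split_by_jank(sessions)
print(f"Abandonment in janky sessions:  {abandonment_rate(janky):.0%}")
print(f"Abandonment in smooth sessions: {abandonment_rate(smooth):.0%}")
```

A comparison like this gives product teams a number they already care about, which is usually easier to discuss than a frame-drop percentage on its own.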
In conclusion, the role of a platform team is crucial in ensuring the stability and success of a product. I hope that busting these common myths has shed some light on the importance of platform teams and their contributions toward achieving business goals.
If you enjoyed this article, follow me on Twitter for more content like this.