Optimizely Flags
Scale Experimentation & Feature Management
Design Lead · Product Design · User Experience
- The Challenge
- The Strategy
- The Process
- The Results
Optimizely had been the undisputed leader in A/B testing and experimentation for years, but had arrived at a crossroads with scalability. The barrier to entry for experimentation was high, scaling with the current tools was hampered by outdated approaches and tech debt, and the necessary features were too advanced or difficult to use for most of the users who needed them. Customers simply created too much tech debt too quickly using the products.
The design strategy focused on creating a seamless, intuitive user experience that would delight and engage users, ultimately leading to increased retention and growth. We knew from experience that scaling an experimentation program was a challenge in spite of its clear ROI, and we had to get clarity on the stumbling blocks users were facing along the way. Through this collaboration and research, we settled on four primary goals:
- Minimize time from sign up to first value (What’s the quickest way our users can see value?)
- Enable high-volume users to scale usage (How does the product scale for enterprise needs?)
- Simplify overall UI and underlying user-facing concepts (How can we simplify how tasks are accomplished and streamline user flows?)
- Enable smooth collaboration between developers and non-developers
Our secondary goals were:
- Information architecture: How do we clarify this for the user?
- Experimentation lifecycle: How is this represented in the product?
- Delight: What form does this take and how do we measure it?
Our goal was to understand user pain points, identify opportunities for improvement, and leverage innovative design solutions to address them. The strategy also included incorporating user feedback loops and data-driven insights to iteratively refine the design.
Dubbed "Project Ozone" (or O3, the third major generation of experimentation products), the design team embarked on a comprehensive research phase to understand user needs, pain points, and industry trends. They conducted interviews, surveys, and observations to gather insights directly from users and internal stakeholders. This research helped identify key areas for improvement and informed subsequent design sprints.
To focus our research efforts during the sprint, we started at the top of the funnel with three sizes of orgsonas (startup, SMB, enterprise) and four key user personas to represent the needs of our users and prospects. We began with user interviews and contextual inquiry to establish context, and also interviewed internal stakeholders for additional feedback on customer pain points.
Taking actual responses and feedback, we shaped user journeys for each of the personas. A developer evaluating Optimizely's products naturally approaches them differently than a product manager or marketing manager, and understanding those key differences formed the basis for each user journey and helped create the "steel thread" for that persona with the least amount of friction. These journeys are future-focused and represent the ideal approach to solving that user's problems regardless of constraints. Additionally, we established typical customer journeys for common orgsona use cases, such as a bank running an experiment to get its customers to open a new account.
What emerged from this process was a clear signal of shortcomings: product teams didn't collaborate as efficiently as they should, customers built up tech debt quickly, and transitioning from the Optimizely Web product (geared toward less tech-savvy users) to Optimizely Full Stack (a more powerful, developer-oriented product) was a painful step. In theory, a mashup of both products looked like the most promising option and the biggest innovation leap forward.
We conducted several sessions of divergent exploration and activities with design, engineering, and product, where no idea was too crazy to consider. This helped free up our thinking and surface solutions we hadn't previously imagined. The next phase was convergence: aligning our explorations and grounding them in the overall strategy and timeframe.
Large innovation requires a shared mindset shift, and the key to this one was thinking about the feature flag as more than a toggle for feature management: treating it as a decision point in the application. Taking that decision point and enabling other actions on it was the breakthrough. Not only could the user toggle features, they had the utmost flexibility to run A/B experiments, multivariate tests, and multi-armed bandits, all at that same decision point in their application. Set up correctly, the tech debt customers accumulated over time from running experiments could be greatly reduced because the winning variation, for example, could be turned into a simple variable or feature toggle, removing the experiment overhead.
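To make the decision-point idea concrete, here is a minimal sketch in TypeScript. The names (FlagClient, Rule, decide) are hypothetical and not Optimizely's actual SDK API; the point is that the call site stays identical whether the flag behind it is a plain toggle, a live experiment, or a concluded rollout serving the winner.

```typescript
// Illustrative only: a hypothetical flag client, not Optimizely's SDK.
type Decision = {
  enabled: boolean;       // is the feature on for this user?
  variationKey: string;   // which variation was served
};

type Rule =
  | { kind: "toggle"; enabled: boolean }          // simple feature toggle
  | { kind: "experiment"; variations: string[] }  // A/B or multivariate test
  | { kind: "rollout"; winner: string };          // concluded experiment

class FlagClient {
  constructor(private rules: Map<string, Rule>) {}

  // One decision point: the application calls decide() the same way
  // no matter what kind of rule currently sits behind the flag.
  decide(flagKey: string, userId: string): Decision {
    const rule = this.rules.get(flagKey);
    if (!rule) return { enabled: false, variationKey: "off" };

    switch (rule.kind) {
      case "toggle":
        return { enabled: rule.enabled, variationKey: rule.enabled ? "on" : "off" };
      case "experiment": {
        // Deterministic bucketing so a user always sees the same arm.
        const hash = [...userId].reduce((h, c) => (h * 31 + c.charCodeAt(0)) >>> 0, 0);
        return { enabled: true, variationKey: rule.variations[hash % rule.variations.length] };
      }
      case "rollout":
        // After a winner is declared, the same flag serves it to everyone;
        // the call site needs no cleanup, which is the tech-debt win.
        return { enabled: true, variationKey: rule.winner };
    }
  }
}

// Usage: application code is identical before, during, and after the experiment.
const rules: Map<string, Rule> = new Map([
  ["checkout_redesign", { kind: "experiment", variations: ["control", "one_page"] }],
]);
const decision = new FlagClient(rules).decide("checkout_redesign", "user-42");
if (decision.enabled && decision.variationKey === "one_page") {
  // render the one-page checkout
}
```

Promoting a winner is then just swapping the experiment rule for a rollout rule in configuration; no application code changes.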
A number of other improvements also came out of the sprint process:
- Improvements to navigation and the menu system: some of the ideas required modifications to the hierarchy to accommodate new concepts and speed up finding the most commonly accessed features.
- Improvements to the user permission model: abstracting system entities such as projects, metrics, and audiences became a necessity to enable the flexibility users needed.
- Creating a taxonomy system: one of the user pain points was a need to associate system entities regardless of the current structure of projects and teams, which meant designing a solution for scalable tag management (a rough sketch follows this list).
- Unifying help resources: another point of user frustration was how hard it was to find the right help resource for their problem; creating a help toolkit provided a few ways to onboard users and guide them to what they needed faster.
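As a rough illustration of that taxonomy idea, the sketch below shows how tags could associate flags, metrics, and audiences across project boundaries. The names (EntityRef, TagIndex) are hypothetical, not the shipped data model.

```typescript
// Illustrative only: a tag index that cuts across projects and entity types.
type EntityRef = {
  kind: "flag" | "metric" | "audience";
  projectId: string;
  id: string;
};

class TagIndex {
  private byTag = new Map<string, Set<string>>();

  private key(ref: EntityRef): string {
    return `${ref.kind}:${ref.projectId}:${ref.id}`;
  }

  // Attach a tag to any entity, regardless of which project owns it.
  tag(ref: EntityRef, label: string): void {
    if (!this.byTag.has(label)) this.byTag.set(label, new Set());
    this.byTag.get(label)!.add(this.key(ref));
  }

  // Retrieve everything sharing a tag, across projects and teams.
  find(label: string): string[] {
    return [...(this.byTag.get(label) ?? [])];
  }
}

// Usage: a "q3-checkout" tag can group a flag in one project
// with a metric in another.
const index = new TagIndex();
index.tag({ kind: "flag", projectId: "web", id: "checkout_redesign" }, "q3-checkout");
index.tag({ kind: "metric", projectId: "analytics", id: "checkout_conversion" }, "q3-checkout");
console.log(index.find("q3-checkout"));
```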
Once we started creating prototypes, we tested them in rotation with users to validate our approaches, then refined and retested until we had solid user experience flows that product, design, and engineering all aligned on building. Based on our initial strategy, we focused first on creating the new "Full Stack 2.0", followed by the other features.
Rethinking how feature flags and experimentation had typically been done for years was no small task, and the new approaches were implemented in quarterly iterations behind a feature flag. Key customers were given the opportunity to update to the new version after the second-quarter release. The feedback was overwhelmingly positive (NPS increased by almost 2 points), and the team received a lot of helpful input to streamline the features even further in the next releases.