FINRA: Implementing Self-Service Analytics & Cloud Scalability

To protect investors and market integrity, FINRA uses Dataiku to swiftly analyze vast market data, preventing misconduct and fostering innovation while saving costs.

18 TB

(terabytes) of data made easily accessible to analysts

20,000+

risk assessments facilitated by models in < 1 year

200+

web applications in operation daily


The Financial Industry Regulatory Authority (FINRA) is a not-for-profit organization authorized by the U.S. Congress to protect investors and ensure market integrity through effective and efficient regulation of broker-dealers. FINRA writes and enforces rules governing the activities of more than 3,400 broker-dealers representing more than 630,000 brokers, examines firms for compliance, fosters market transparency, and educates investors.

Every day, FINRA oversees up to 600 billion market events, including equities, options, and fixed income products in the U.S. Petabytes of historical market data need to be analyzed to uncover insider trading and other strategies used to gain an unfair advantage. This constitutes a significant volume of data that requires considerable computing power for effective analysis.

FINRA’s mission is to protect investors and promote market integrity. To achieve this, it’s important to quickly respond to market events, especially in today’s volatile market. This means analyzing hundreds of billions of market event data points across hundreds of sources efficiently. 

FINRA must leverage interactive data analytical solutions with close collaboration across examination, surveillance, enforcement, and technology functions to respond to potentially harmful market events. Their ultimate goal is to have an integrated data platform that avoids duplication, translates data into real results, and supports coordinated contributions from stakeholders seamlessly.

Enterprise-Wide Self-Service Analytics

FINRA chose Dataiku to support their data vision — enabling and empowering users to derive better insights from large volumes of data so that they can identify and stop bad actors faster. Dataiku provides FINRA with a single, shared platform for analytics. 

This strategic choice to use Dataiku’s shared hub allows for better insights from large data volumes, enabling faster, more accurate identification and prevention of market misconduct. Diverse user personas, including Excel users, SQL users, data scientists, data engineers, business users, and executives, utilize Dataiku’s capabilities within the integrated end-to-end platform.

With Spark & Dataiku, the Sky is the Limit

Thanks to Dataiku, it’s easy to analyze disparate data and datasets of various sizes. Between Dataiku’s architecture and Spark’s scalability, FINRA essentially has no data size limits. 

In fact, Dataiku enabled a new data analysis paradigm at FINRA: End User Computing Applications (EUCAs). An EUCA lets knowledge workers without technical skills access and analyze data regardless of its size (petabytes!) and the complexity of the analysis (Spark transformations, statistics, ML, etc.).

FINRA can build the EUCAs using the following three Dataiku capabilities: 

  • No-code environment and visual recipes: Non-technical users find Dataiku’s visual recipes intuitive and can pull insights from the data very quickly.
  • Web-based applications: Dataiku gives non-coders and data scientists the ability to develop web-based applications with just a few mouse clicks or a short Python program. What could take months to develop can now be done in days.
  • Custom Dash-based apps: Dataiku’s flexible WebApp features have enabled FINRA to create custom Dash-based apps that combine large datasets of pre-processed market data with user-generated inputs to assess risks at member firms. The flexibility of Dash-based WebApps has allowed small teams of data scientists to partner with business SMEs to produce prototype tools in a fraction of the time it would take on traditional development platforms.
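
As a rough illustration of the last capability, the sketch below shows the kind of computation such an app might wrap: pre-processed per-firm metrics are combined with weights supplied by an analyst to rank firms by risk. This is not FINRA's model. The names `FirmMetrics`, `risk_score`, and `rank_firms`, the metric names, and the weighted-average formula are all assumptions made for the example; in a Dash-based WebApp, a function like `rank_firms` would typically be called from a callback that reads the user's inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirmMetrics:
    """Hypothetical pre-processed metrics for one member firm, each scaled to [0, 1]."""
    firm_id: str
    metrics: dict[str, float]


def risk_score(firm: FirmMetrics, weights: dict[str, float]) -> float:
    """Weighted average of the firm's metrics, using analyst-supplied weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # A metric missing for this firm contributes 0 to the weighted sum.
    weighted = sum(firm.metrics.get(name, 0.0) * w for name, w in weights.items())
    return weighted / total_weight


def rank_firms(firms: list[FirmMetrics], weights: dict[str, float]) -> list[tuple[str, float]]:
    """Return (firm_id, score) pairs, highest risk first."""
    scored = [(f.firm_id, round(risk_score(f, weights), 3)) for f in firms]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative data only: two made-up firms and made-up weights.
firms = [
    FirmMetrics("FIRM-A", {"complaints": 0.2, "late_reports": 0.8}),
    FirmMetrics("FIRM-B", {"complaints": 0.6, "late_reports": 0.1}),
]
user_weights = {"complaints": 3.0, "late_reports": 1.0}
ranking = rank_firms(firms, user_weights)
print(ranking)  # FIRM-B ranks above FIRM-A with these weights
```

Keeping the scoring logic in plain functions, separate from the web layer, lets data scientists and business SMEs test it without starting the app.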

FINRA has found that EUCAs are the fastest and most scalable way to perform ad hoc analyses. They are also well suited to trying out new ideas and driving innovation.

Tangible Impact

Dataiku-driven self-service analytics revolutionizes decision-making at FINRA. It slashes time-to-market from months to hours, enabling rapid prototyping and testing. Analysts access over 18 terabytes of data, fostering innovation beyond Spark pipelines to user-friendly applications. Efficiency skyrockets as projects multiply, saving millions in costs. 

Plus, Dataiku’s robust features drive FINRA’s data-driven culture, empowering less technical users with the visual recipes and user-friendly interfaces mentioned above. The scalability ensures seamless analysis of large datasets amid market volatility. Insights are democratized, fostering collaboration and satisfaction. Automation streamlines workflows, saving time and reducing risk, and this dynamic transformation delivers millions in value annually, propelling FINRA’s mission forward.

There’s More: Self-Service Cloud Scalability

FINRA’s use of Dataiku goes even further. Due to FINRA’s massive data volume, a predefined compute cluster approach does not meet the full scope of capacity needs. With these limitations, jobs can become backlogged, causing considerable analysis delays, compromising swift action, and hampering user experience. In addition, users would be dependent on the platform administration team for tailored cluster configurations.

To navigate these challenges, FINRA adopted a more self-sufficient, automated, and team-managed strategy that lets users adapt computing power to the task at hand, without being limited by the computational power of their laptops or the predefined clusters offered by administrators. In this evolved system, each team is responsible for launching, maintaining, and managing the cost of its own clusters. This shared responsibility ensures bespoke cluster configurations, fostering rapid and efficient data analysis.

Dataiku + Kubernetes + AWS = A Computational Powerhouse

FINRA developed a set of macros called Node Launcher in Dataiku based on Kubernetes. This capability allows users to define the bounds and limits of computational power for each project, thereby eliminating the constraints imposed by computing guidelines.
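
The idea of per-project bounds can be sketched as follows. This is a minimal illustration, not Node Launcher's actual interface, which the source does not describe. `NodeGroupLimits`, `target_node_count`, the sizing rule, and the instance type shown are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroupLimits:
    """Hypothetical per-project bounds on a node group's size."""
    instance_type: str
    min_nodes: int
    max_nodes: int

    def __post_init__(self) -> None:
        if not 0 <= self.min_nodes <= self.max_nodes:
            raise ValueError("require 0 <= min_nodes <= max_nodes")


def target_node_count(limits: NodeGroupLimits, pending_jobs: int, jobs_per_node: int) -> int:
    """Return how many nodes to run, clamped to the project's bounds."""
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    # Ceiling division: enough nodes to host every pending job.
    wanted = -(-pending_jobs // jobs_per_node)
    return max(limits.min_nodes, min(limits.max_nodes, wanted))


# Illustrative values only.
limits = NodeGroupLimits(instance_type="r5.4xlarge", min_nodes=0, max_nodes=20)
print(target_node_count(limits, pending_jobs=45, jobs_per_node=4))   # 12
print(target_node_count(limits, pending_jobs=400, jobs_per_node=4))  # 20, capped at max_nodes
print(target_node_count(limits, pending_jobs=0, jobs_per_node=4))    # 0, scaled down to min_nodes
```

Clamping the requested size to a project-level maximum gives teams freedom to scale while keeping a hard ceiling on what any one project can consume.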

However, with the increase in user adoption, the need for implementing guardrails became evident. These include:

  • Provisioning of Clusters: To ensure proper usage, Node Launcher has an authorization system in place that allows teams to provision and use their designated clusters based on their project and group membership.
  • Self-Service Capability: Using Node Launcher, users can create node groups tailored to their needs, launch compute environments with a chosen instance type, and use multiple node groups for diverse workloads. All of these node groups scale up and down automatically, without manual intervention.
  • Performance Tuning: With numerous clusters running in the cloud, extensive scaling and tuning was needed, focused primarily on startup time, networking, and Spark configurations.
  • Security: Specific security improvements using S3 connections were implemented to ensure secure data access based on group membership.
  • Cost Optimization Measures: FINRA leverages AWS Spot Instances for non-critical workloads. Spot Instances can cost as little as 10% of the on-demand EC2 price, but AWS can reclaim them at any time. FINRA has also developed a Cost dashboard and a Usage dashboard that let users see the cost breakdown across projects, Dataiku stacks (e.g., design, production), and users. This not only helps users manage their costs but also helps development teams analyze their activity. To ensure optimal usage, cost clinics are held each month, providing diagnostics for each cost and recommendations for optimization. A checklist of five to ten recommendations is also provided to ensure that a project is optimized.
  • Resource Utilization Metrics: FINRA built tools to analyze resource utilization and performance, which assist with overall cluster management. These are crucial in ensuring that the analysis process is efficient and effective, despite the large volumes of data involved.
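
The cost breakdown described in the list above (by project, Dataiku stack, and user) can be sketched as a simple aggregation. This is only an illustration under stated assumptions: the record layout, the names `UsageRecord`, `record_cost`, and `cost_breakdown`, the hourly rates, and the flat 10% Spot price ratio are all hypothetical, and real Spot prices vary over time.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumption: Spot capacity is billed at 10% of the on-demand rate.
SPOT_PRICE_FRACTION = 0.10


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical usage row: who ran how many node-hours, and where."""
    project: str
    stack: str            # e.g. "design" or "production"
    user: str
    node_hours: float
    on_demand_rate: float  # USD per node-hour (made-up values below)
    spot: bool


def record_cost(record: UsageRecord) -> float:
    """Cost of one usage record, applying the assumed Spot discount."""
    rate = record.on_demand_rate * (SPOT_PRICE_FRACTION if record.spot else 1.0)
    return record.node_hours * rate


def cost_breakdown(records: list[UsageRecord], key: str) -> dict[str, float]:
    """Total cost grouped by one attribute: 'project', 'stack', or 'user'."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record_cost(record)
    return {name: round(total, 2) for name, total in totals.items()}


# Illustrative data only.
records = [
    UsageRecord("surveillance", "design", "alice", 100.0, 1.0, spot=True),
    UsageRecord("surveillance", "production", "bob", 50.0, 1.0, spot=False),
    UsageRecord("exams", "design", "alice", 20.0, 2.0, spot=True),
]
print(cost_breakdown(records, "project"))  # {'surveillance': 60.0, 'exams': 4.0}
print(cost_breakdown(records, "stack"))    # {'design': 14.0, 'production': 50.0}
print(cost_breakdown(records, "user"))     # {'alice': 14.0, 'bob': 50.0}
```

Grouping the same records along different dimensions is what lets one dataset serve both a team reviewing its own spend and an administrator comparing stacks.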

Value, Value, and More Value

The architecture developed by FINRA boasts remarkable scalability, handling petabytes of source data and over 18 terabytes of analyzed data, incorporating eight million data objects created via Dataiku. On any given day, FINRA maintains 500+ EC2 node clusters, executes 400+ jobs, and operates 200+ web applications. DataOps plays a pivotal role, ensuring efficiency and accuracy. 

Through Dataiku, FINRA has deployed 12 prototype risk models as web applications, engaging 180 Explorer users daily. Over the past 10 months, these models facilitated over 20,000 risk assessments, streamlining the analysis process. Additionally, the architecture fosters self-service analytics, empowering analysts to use cloud elasticity for diverse tasks, thereby enhancing accessibility and efficiency. Through custom web applications on the Dataiku platform, Explorer users engage with complex analytics seamlessly, democratizing insights and facilitating swift decision-making.
