insideBIGDATA Guide to Big Data for Finance (Part 2)


This insideBIGDATA technology guide co-sponsored by Dell Technologies and AMD, insideBIGDATA Guide to Big Data for Finance, provides direction for enterprise thought leaders on ways of leveraging big data technologies in support of analytics proficiencies designed to work more independently and effectively across a few distinct areas in today’s financial service institutions (FSI) climate.

Regulatory and Compliance

It is important for banks, investment firms, and other financial services organizations to be able to collect and analyze vast volumes of transaction and market data in order to accurately assess risk and determine market trends. This became apparent during the market downturn of 2007-2008, when banks and brokerage houses scrambled to understand the implications of massive capital leverage and their ability to model and refine liquidity management.

A single bank might capture internal transactions exceeding two billion per month, in addition to collecting public data of over a billion monthly transactions. These tremendous transaction volumes have made it nearly impossible to create models that take into account multi-year data sets using detailed data.

Regulatory Big Data. Source: Moody’s Analytics

Financial firms manage anywhere from tens to thousands of petabytes of data, yet most systems used today build models using only samples as small as 100 gigabytes. Relying on data samples requires aggregations and assumptions, resulting in inaccuracies in projections, limited visibility into actual risk exposure, instances of undetected fraud, and poorer performance in the market. As a result of more rigorous regulatory compliance laws, the financial services industry has had to store an increasing amount of historical data. New technology tools and strategies are needed to address these demands.
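The problem with small samples is easy to see for rare events such as fraud. The toy sketch below uses entirely synthetic labels (the 0.2% fraud rate and sample size are illustrative assumptions, not figures from the guide) to show how a 100-gigabyte-scale sample of a petabyte-scale history can contain only a handful of the rare cases a model most needs to see:

```python
import random

random.seed(42)

# Toy stand-in for a bank's full transaction history: 100,000 records,
# of which 200 (0.2%) are fraudulent (label 1). Purely synthetic.
full_history = [1] * 200 + [0] * (100_000 - 200)
random.shuffle(full_history)

# A model built on a 1% sample sees only a sliver of the fraud cases,
# so the patterns it can learn from are correspondingly thin.
sample = random.sample(full_history, 1_000)

fraud_in_full = sum(full_history)
fraud_in_sample = sum(sample)

print(f"fraud cases in full data: {fraud_in_full}")
print(f"fraud cases in sample:    {fraud_in_sample} "
      f"(expected ~{0.002 * len(sample):.0f})")
```

A sample that captures only a couple of fraudulent records gives a model almost nothing to generalize from, which is why full-detail, multi-year data sets matter for risk and fraud modeling.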

Hadoop represents a good path for financial sector firms to adopt big data. With Hadoop, firms have access to a powerful platform providing both highly scalable and low cost data storage tightly integrated with scalable processing. Financial firms are now able to tackle increasingly complex problems by unlocking the power of their data. The capability to understand and act upon their data opens the door to a richer and more robust financial ecosystem.

Spark is an open-source data analytics cluster computing framework that commonly runs on top of HDFS. Spark serves as evidence of the continuing evolution within the Hadoop community—away from being a batch processing framework tied to the two-stage MapReduce paradigm toward a more advanced in-memory, real-time platform. Now, FSIs can better serve their customers, understand their risk exposure and reduce incidents of fraud.


Dell Technologies has invested to create a portfolio of Ready Solutions designed to simplify the configuration, deployment and management of Hadoop clusters. These trusted designs have been optimized, tested and tuned for a variety of key Hadoop use cases. They include the servers, storage, networking, software and services that have been proven in our labs and in customer deployments to meet workload requirements and customer outcomes.

The modular solution building blocks provide a customized yet validated approach for deploying new clusters and scaling or upgrading existing environments. Ready Solutions for Hadoop have been jointly engineered to optimize investments, reduce costs and deliver outstanding performance.

Algorithmic Trading

In the digital economy, data—and the IT solutions used to harness it—are often a financial services company’s prime source of competitive advantage: the more automated the process, the faster the time to value. This is especially true for algorithmic trading, a highly automated investment process where humans train powerful software applications to select investments and implement trades automatically.

The ultimate evolution of algorithmic trading is high frequency trading, where the algorithms make split-second trading decisions designed to maximize financial returns. Automating and removing humans from trading has several advantages, such as reduced costs and greater speed and accuracy.

Developing trading algorithms requires a proprietary mix of data science, statistics, risk analysis and DevOps. The algorithm is then backtested, which involves running it against historical data and refining it until it produces the desired profits. The algorithm is then put into production, making trades in real time on behalf of the firm. The real-world returns generated by the algorithm yield even more data, which is used to continually train the algorithm in the back end and improve its performance. This training feedback loop is a data intensive process.

Source: Analytics Vidhya

More recently, developers have taken up machine learning, a subset of artificial intelligence (AI), to improve predictive capabilities, using deep neural networks to find trends that trigger buy or sell decisions. In addition to automation and intelligence, high frequency trading platforms deliver competitive advantage by placing thousands of trades before the market can react. Therefore, high frequency trading has led to competition in computational speed, automated decision making, and even connectivity to the execution venue to shave off microseconds and beat other traders to opportunities.
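To make the "trends that trigger buy or sell decisions" idea concrete, here is a toy single-neuron logistic model trained by gradient descent. Everything in it is an assumption for illustration: the features (`momentum`, order-book `imbalance`), the synthetic label rule, and the `signal` helper are invented for this sketch, and a real high frequency system would use deep networks and genuine market microstructure data:

```python
import math
import random

random.seed(0)

# Toy features per trading interval: (momentum, order-book imbalance).
# Label 1 = price rose in the next interval. Synthetic and illustrative.
def make_example():
    momentum = random.uniform(-1, 1)
    imbalance = random.uniform(-1, 1)
    label = 1 if momentum + 0.5 * imbalance + random.gauss(0, 0.2) > 0 else 0
    return (momentum, imbalance), label

train = [make_example() for _ in range(2_000)]

# Single-neuron logistic model; a deep network replaces this in practice.
w, b, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(20):                       # epochs
    for (x1, x2), y in train:
        p = 1 / (1 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))
        err = p - y                       # gradient of the log-loss
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2
        b -= lr * err

def signal(momentum, imbalance):
    p = 1 / (1 + math.exp(-(w[0] * momentum + w[1] * imbalance + b)))
    return "BUY" if p > 0.5 else "SELL"

print(signal(0.8, 0.4), signal(-0.8, -0.4))
```

Even this toy loop hints at why speed matters: in high frequency trading, both the inference call in `signal` and the retraining loop must run fast enough that the model reflects the market as it is now, not as it was microseconds ago.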

What’s more, financial trading firms are continually developing, implementing and perfecting algorithmic trading strategies to stay a step ahead of the competition. This puts significant stress on infrastructure because the algorithm must continuously adapt to new input to remain relevant. As such, the back-end infrastructure must accommodate live data feeds and rapid processing of large amounts of data. Databases must be able to feed the compute engine in real or near real time to update the algorithm.

The data intensive training requirements and the need for high speed and low latency mean that these sophisticated algorithms are typically trained and run on High-Performance Computing (HPC) systems to provide the rapidity and accuracy required to dominate the market. An HPC system that supports algorithmic trading should be able to accommodate current workloads seamlessly and provide the flexibility, performance and scaling required to continually train and update algorithms to stay ahead of the market.


Dell Technologies has the expertise and experience to design and implement HPC, data analytics and AI solutions optimized for algorithmic trading. This includes considerations for software, services and infrastructure design with complete architectural design examples, such as:

  • Data lake configurations for data ingestion using streaming tools such as Apache® Kafka® and StreamSets®, aimed at low-latency, real-time data feeds with the Ready Solution for Data Analytics Real Time Data Streaming.
  • Apache Hadoop® with Cloudera® and Greenplum® supported by Dell EMC Ready Solutions for Hadoop.
  • Dell EMC Ready Solutions for Data Analytics with Spark® on Kubernetes and Data Science and Advanced Analytics with VMware Tanzu.

Over the next few weeks we will explore these topics:

  • Introduction, Retail Banking
  • Regulatory and Compliance, Algorithmic Trading
  • Security Considerations, Conclusion, Next Steps with Dell Technologies and AMD

Download the complete insideBIGDATA Guide to Big Data for Finance courtesy of Dell Technologies and AMD. 
