Hadoop Architecture

Communication service providers (CSPs) increasingly need to shift their stack to a big data architecture in order to protect themselves from modern, sophisticated revenue threats. After extensive research into the successful approaches taken by large, data-driven companies in other industries (including Facebook and Google), we believe that a CSP’s new architecture needs to encompass five shifts:

  • A shift from rules to rules and machine learning
  • A shift from batch to real-time analytics
  • A shift from proprietary architectures to Hadoop
  • A shift from data silos to a data lake
  • A shift from flat world analytics to graph analysis

To successfully undertake this stack shift, it is critical to have a deep understanding of the following three areas:

A CSP Data Lake:
First and Foremost, It’s All About the Data

A mobile subscriber uses voice, SMS, data, and over-the-top applications – and may be on their home network or roaming on a remote network in a foreign country. They might be prepaid or postpaid, and may have been a subscriber for five years or five minutes. They could be a business user or a personal user. To protect subscribers from criminals who exploit weaknesses or loopholes in the network to commit fraud or security breaches, providers must connect all this disparate data.

To do this effectively requires integrating many types of data into a data lake.

Batch Billing Data
TD.35, CDR, TAP 3

Real-Time Call Packets

Real-Time Fixed Call Packets
ISUP, Diameter

Real-Time VoIP Call Packets
SIP, H.323, Diameter

Real-Time Data Packets
GTP, Diameter

Business Data
CRM, Billing

Real-Time Machine Learning

After connecting all relevant data in a data lake, it is then possible to use machine learning to discover anomalous behavior in real time, uncovering revenue threats in a way not previously possible. These may be fraud threats, profit threats, or SLA threats – and they may have been seen before, or they may be new threats arising from a zero-day cyber attack. Sophisticated multi-domain anomaly detection can also distinguish the actions of a loyal customer who has paid their bills on time for five years from those of a suspicious brand-new customer.

Argyle Data’s world-leading data scientists have developed unique approaches using the latest research in machine learning to identify anomalous behavior in real time.

Real-Time Feature Enrichment
As entries are stored, features are created in real time

Real-Time Anomaly Detection
As entries are stored, anomalies are detected in real time by comparing each new entry with historical patterns

Real-Time Fraud Alerts
As fraud is detected, revenue threat alerts are sent to the analyst dashboard
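
The three steps above (enrichment, detection, alerting) can be sketched in a few lines. This is a minimal illustration only, not Argyle Data's method: it substitutes a simple per-subscriber z-score for the machine learning models the text refers to, and the field names (`subscriber`, `cost`, `minutes`), the derived `cost_per_minute` feature, the `Pipeline` class, and the thresholds are all assumptions made for the example.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online mean/variance: the 'historic pattern' for one subscriber."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; 0.0 until there are two observations.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class Pipeline:
    """Hypothetical ingest path: enrich, score against history, alert."""

    def __init__(self, threshold: float = 3.0, min_history: int = 5):
        self.threshold = threshold      # z-score above which an entry is flagged
        self.min_history = min_history  # entries needed before scoring starts
        self.history = defaultdict(RunningStats)
        self.alerts = []                # stand-in for the analyst dashboard

    def enrich(self, entry: dict) -> dict:
        # Feature enrichment: derive a feature as the entry is stored.
        features = dict(entry)
        features["cost_per_minute"] = entry["cost"] / max(entry["minutes"], 1e-9)
        return features

    def ingest(self, entry: dict) -> bool:
        """Store one entry; return True if it was flagged as anomalous."""
        features = self.enrich(entry)
        stats = self.history[entry["subscriber"]]
        x = features["cost_per_minute"]
        flagged = False
        if stats.n >= self.min_history:
            std = stats.std()
            if std > 0 and abs(x - stats.mean) / std > self.threshold:
                flagged = True
                self.alerts.append({"subscriber": entry["subscriber"], "value": x})
        stats.update(x)  # the new entry becomes part of the history
        return flagged


if __name__ == "__main__":
    pipeline = Pipeline()
    for cost in [1.0, 1.1, 0.9, 1.0, 1.2, 1.0]:
        pipeline.ingest({"subscriber": "A", "cost": cost, "minutes": 10})
    print(pipeline.ingest({"subscriber": "A", "cost": 50.0, "minutes": 1}))
```

A production system would score many features across domains and keep its history in the data lake rather than in process memory; the sketch only shows where each of the three steps sits in the ingest path.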

Big Data Native Hadoop Architecture

The next generation of data-driven applications will have to be Hadoop native in order to survive against sophisticated, modern attacks. They can’t simply import from and export to Hadoop (which is often done in batch); they must be built entirely on the new platform and be unable to function without it. Data-driven applications will be written by people who are masters of writing applications in new ways using native (BigTable-type) Hadoop distributed key/value stores, native Hadoop distributed SQL, and machine learning operating against native Hadoop stores with powerful graph analytics.

Argyle Data’s approach delivers the ingestion rates of a key-value/triple store and the querying power of a distributed SQL database, natively on Hadoop.

Interactive response times at petabyte scale, in parallel across hundreds of nodes

Full Secondary Indexing
Interactive response times at petabyte scale on primary and secondary indexes
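
One common way to provide secondary indexes on a sorted key/value store is to maintain index entries keyed by (field, value, primary key), so that a lookup on a non-primary field becomes a short range scan. The sketch below illustrates that general idea with an in-memory stand-in; it is not Argyle Data's implementation, and the `IndexedStore` class, its method names, and the sample field names are hypothetical.

```python
import bisect


class IndexedStore:
    """In-memory stand-in for a sorted key/value store with secondary indexes."""

    def __init__(self, indexed_fields):
        self.indexed_fields = tuple(indexed_fields)
        self.rows = {}   # primary key -> record
        self.index = []  # sorted list of (field, value, primary_key) entries

    def _remove_index_entry(self, entry):
        i = bisect.bisect_left(self.index, entry)
        if i < len(self.index) and self.index[i] == entry:
            self.index.pop(i)

    def put(self, key, record):
        # On overwrite, drop the index entries that pointed at the old values.
        old = self.rows.get(key)
        if old is not None:
            for f in self.indexed_fields:
                if f in old:
                    self._remove_index_entry((f, old[f], key))
        self.rows[key] = dict(record)
        # Write one index entry per indexed field present in the record.
        for f in self.indexed_fields:
            if f in record:
                bisect.insort(self.index, (f, record[f], key))

    def get(self, key):
        # Primary-index lookup.
        return self.rows.get(key)

    def find(self, field, value):
        # Secondary-index lookup: range scan over entries sharing the
        # (field, value) prefix, returning primary keys in sorted order.
        start = bisect.bisect_left(self.index, (field, value))
        keys = []
        for f, v, k in self.index[start:]:
            if (f, v) != (field, value):
                break
            keys.append(k)
        return keys


if __name__ == "__main__":
    store = IndexedStore(["country"])
    store.put("cdr1", {"country": "FR", "minutes": 12})
    store.put("cdr2", {"country": "DE", "minutes": 3})
    print(store.find("country", "FR"))
```

In a distributed store the index entries would live in their own sorted table and be spread across nodes, which is what allows lookups on both primary and secondary keys to be served in parallel.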

Powerful queries and standard integrations to visualization frameworks
