28 posts tagged with "real-time data platform"

Text analytics on LinkedIn Talent Insights using Apache Pinot

· One min read
LinkedIn
LinkedIn Engineering Team

LinkedIn Talent Insights (LTI) is a platform that helps organizations understand the external labor market and their internal workforce, and enables the long-term success of their employees. Users of LTI have the flexibility to construct searches using the various facets of the LinkedIn Economic Graph (skills, titles, location, company, etc.).

Read More at https://engineering.linkedin.com/blog/2021/text-analytics-on-linkedin-talent-insights-using-apache-pinot

Introduction to Geospatial Queries in Apache Pinot

· One min read
Kenny Bastani

Geospatial data has been widely used across the industry, spanning multiple verticals such as ride-sharing and delivery, transportation infrastructure, defense and intelligence, and public health. Deriving insights from timely and accurate geospatial data could enable mission-critical use cases in organizations and fuel a vibrant marketplace across the industry. In the design document for this new Pinot feature, we discuss the challenges of analyzing geospatial data at scale and propose geospatial support in Pinot.

Read More at https://medium.com/apache-pinot-developer-blog/introduction-to-geospatial-queries-in-apache-pinot-b63e2362e2a9

Automating Merchant Live Monitoring with Real-Time Analytics - Charon

· One min read
Uber
Uber Data Team

At Uber, live monitoring and automation of Ops is critical to preserve marketplace health, maintain reliability, and gain efficiency in markets. By virtue of the word “live”, this monitoring needs to show what is happening now, with prompt access to fresh data, and the ability to recommend appropriate actions based on that data. Uber’s data platform provides the self-serve tools that empower the Ops teams to build their own live monitoring tools and support their regional teams by building rich solutions.

For this project, the requirement was to provide merchant-level monitoring and handle the edge cases that remain unaddressed by the sophisticated internal marketplace management tools. We used a variety of Uber’s real-time data platform components to build a tool called Charon to reduce the impact of poor marketplace reliability on merchants.

Read More at https://eng.uber.com/charon/

Solving for the cardinality of set intersection at scale with Pinot and Theta Sketches

· One min read
LinkedIn
LinkedIn Engineering Team

The Lambda architecture has become a popular architectural style that promises both speed and accuracy in data processing by using a hybrid approach of both batch processing and stream processing methods.

Read More at https://engineering.linkedin.com/blog/2021/pinot-and-theta-sketches

From Lambda to Lambda-less Lessons learned

Introduction to Upserts in Apache Pinot

· One min read
Kenny Bastani

Since the 0.6.0 release of Apache Pinot, a new feature has been available for stream ingestion that allows you to upsert events from an immutable log. Typically, upsert is a term used to describe inserting a record into a database if it does not already exist, or updating it if it does. In Apache Pinot’s case, upsert isn’t precisely the same concept, and I wanted to write this blog post to explain why it’s exciting and how you can start using it.
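
As a rough illustration of the general "insert if absent, update if present" meaning of upsert described above, here is a minimal Python sketch. It is not Pinot's implementation or API: the `upsert_all` helper, the `(key, record)` event shape, and the sample order events are all hypothetical, chosen only to show the idea of replaying an append-only log and keeping the latest record per key.

```python
from typing import Dict, Iterable, Tuple


def upsert_all(events: Iterable[Tuple[str, dict]]) -> Dict[str, dict]:
    """Replay an append-only log and keep the latest record for each key.

    Hypothetical helper for illustration only; not part of Apache Pinot.
    """
    table: Dict[str, dict] = {}
    for key, record in events:
        # A new key is inserted; an existing key is overwritten (updated).
        table[key] = record
    return table


# Hypothetical event log: the log itself is never modified, only appended to.
log = [
    ("order-1", {"status": "placed"}),
    ("order-2", {"status": "placed"}),
    ("order-1", {"status": "delivered"}),
]

latest = upsert_all(log)
print(latest)
```

The log keeps all three events, while the resulting view holds one record per key, with the later `order-1` event replacing the earlier one.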

Read More at https://medium.com/apache-pinot-developer-blog/introduction-to-upserts-in-apache-pinot-987c12149d93

Real-time Analytics with Presto and Apache Pinot

· One min read
PinotDev
Pinot Editorial Team

In this world, most analytics products either focus on ad-hoc analytics, which requires query flexibility without guaranteed latency, or low latency analytics with limited query capability. In this blog, we will explore how to get the best of both worlds using Apache Pinot and Presto.

Read Part 1 at https://www.startree.ai/blogs/real-time-analytics-with-presto-and-apache-pinot-part-i/

Read Part 2 at https://www.startree.ai/blogs/real-time-analytics-with-presto-and-apache-pinot-part-ii/

Change Data Analysis with Debezium and Apache Pinot

· One min read
Kenny Bastani

In this blog post, we’re going to explore an exciting new world of real-time analytics based on combining the popular CDC tool, Debezium, with the real-time OLAP datastore, Apache Pinot.

Read More at https://medium.com/apache-pinot-developer-blog/change-data-analysis-with-debezium-and-apache-pinot-b4093dc178a7

Operating Apache Pinot at Uber Scale

· One min read
Uber
Uber Data Team

Uber has a complex marketplace consisting of riders, drivers, eaters, restaurants, and so on. Operating that marketplace at a global scale requires real-time intelligence and decision making. For instance, identifying delayed Uber Eats orders or abandoned carts enables our community operations team to take corrective action. Having a real-time dashboard of different events such as consumer demand, driver availability, or trips happening in a city is crucial for day-to-day operation, incident triaging, and financial intelligence.

Read More at https://eng.uber.com/operating-apache-pinot/

Deep Analysis of Russian Twitter Trolls

· One min read
Kenny Bastani

The history behind Russian disinformation is a dense and continuously evolving subject. The world’s best research hasn’t seemed to hit the mainstream yet, which made this an excellent opportunity to see if I could use some open source tooling to surface new analytical evidence.

In this blog post, I’ll show you how to use Apache Pinot and Superset to analyze 3 million tweets by the Internet Research Agency (IRA) open-sourced by FiveThirtyEight.

Read More at https://towardsdatascience.com/a-deep-analysis-of-russian-trolls-with-apache-pinot-and-superset-590c8c4d1843

Leverage Plugins to Ingest Parquet Files from S3 in Pinot

· One min read
PinotDev
Pinot Editorial Team

One of the primary advantages of using Pinot is its pluggable architecture. The plugins make it easy to add support for any third-party system, whether an execution framework, a filesystem, or an input format.

In this tutorial, we will use three such plugins to easily ingest data and push it to our Pinot cluster. The plugins we will be using are:

  • pinot-batch-ingestion-spark
  • pinot-s3
  • pinot-parquet

Read more at https://medium.com/apache-pinot-developer-blog/leverage-plugins-to-ingest-parquet-files-from-s3-in-pinot-decb12e4d09d

Monitoring Apache Pinot with JMX, Prometheus and Grafana

· One min read
PinotDev
Pinot Editorial Team

I may be kicking open doors here, but a simple question has always helped me start from somewhere. When it comes to investigating degraded user experience caused by latency, can I observe high resource usage on all or some nodes of the system?

Read more at https://medium.com/apache-pinot-developer-blog/monitoring-apache-pinot-99034050c1a5

Achieving 99th percentile latency SLA using Apache Pinot

· One min read
PinotDev
Pinot Editorial Team

In this article, we talk about how users can build critical site-facing analytical applications that require high throughput and a strict p99 query latency SLA using Apache Pinot.

Read more at https://medium.com/apache-pinot-developer-blog/achieving-99th-percentile-latency-sla-using-apache-pinot-2ba4ce1d9eff

Utilize UDFs to Supercharge Queries in Apache Pinot

· One min read
PinotDev
Pinot Editorial Team

Apache Pinot is a real-time distributed OLAP datastore that can answer hundreds of thousands of queries with millisecond latencies. You can head over to https://pinot.apache.org/ to get started with Apache Pinot.

While using any database, we can come across a scenario where a function required for the query is not supported out of the box. In such cases, we have to resort to raising a pull request for a new function or finding a tedious workaround.

Scalar Functions allow users to write and add their own functions as a plugin.

Read more at https://medium.com/apache-pinot-developer-blog/utilize-udfs-to-supercharge-queries-in-apache-pinot-e488a0f164f1

Building a culture around metrics and anomaly detection

· One min read
Kenny Bastani

Anomaly detection is a very broad term. Usually it means that you want to see if things are running as usual. This could go from your business metrics down to the lowest level of how your systems are running. Anomaly detection is an entire process. It’s not just a tool that you get out of the box that measures time series data. Similar to DevOps, anomaly detection is a culture of different roles engaging in a process that combines tooling with human analysis.

Read More at https://medium.com/apache-pinot-developer-blog/building-a-culture-around-metrics-and-anomaly-detection-da740960fcc2

Moving developers up the stack with Apache Pinot

· One min read
Kenny Bastani

Once upon a time, an internet company named LinkedIn faced the challenge of having petabytes of connected data with no way to analyze it in real-time. As this was a problem that was the first of its kind, there was only one solution. The company put together a talented team of engineers and tasked them with building the right tool for the job. Today, that tool goes by the name of Apache Pinot.

Read More at https://medium.com/apache-pinot-developer-blog/moving-developers-up-the-stack-with-apache-pinot-29d36717a3f4

Monitoring business performance data with ThirdEye smart alerts

· One min read
LinkedIn
LinkedIn Engineering Team

This post explains how ThirdEye smart alerts and automated dashboards helped the LinkedIn Premium business operations team monitor key metrics, such as new free trial signups, for the timely detection of outliers in business performance data.

Read More at https://engineering.linkedin.com/blog/2020/monitoring-business-performance-data-with-thirdeye-smart-alerts

Using Apache Pinot and Kafka to Analyze GitHub Events

· One min read
Kenny Bastani

In this blog post, we’ll show you how Pinot and Kafka can be used together to ingest, query, and visualize event streams sourced from the public GitHub API. For the step-by-step instructions, please visit our documentation, which will guide you through the specifics of running this example in your development environment.

Read More at https://medium.com/apache-pinot-developer-blog/using-apache-pinot-and-kafka-to-analyze-github-events-93cdcb57d5f7

Engineering SQL Support on Apache Pinot at Uber

· One min read
Uber
Uber Data Team

Uber leverages real-time analytics on aggregate data to improve the user experience across our products, from fighting fraudulent behavior on Uber Eats to forecasting demand on our platform.

To resolve these issues, we built a solution that links Presto, a query engine that supports full ANSI SQL, with Pinot, a real-time OLAP (online analytical processing) datastore. This combined solution allows users to write ad-hoc SQL queries, empowering teams to unlock significant analysis capabilities.

Read More at https://eng.uber.com/engineering-sql-support-on-apache-pinot/

Auto-tuning Pinot real-time consumption

· One min read
LinkedIn
LinkedIn Engineering Team

This post focuses on auto-tuning Pinot, a scalable distributed columnar OLAP data store developed at LinkedIn that delivers real-time analytics for site-facing use cases such as LinkedIn's Who viewed my profile, Talent Insights, and more.

Read More at https://engineering.linkedin.com/blog/2020/bridging-batch-and-stream-processing

Bridging batch and stream processing for the Recruiter usage statistics dashboard

Introducing ThirdEye - LinkedIn’s Business-Wide Monitoring Platform

· One min read
LinkedIn
LinkedIn Engineering Team

ThirdEye is a comprehensive platform for real-time monitoring of metrics that covers a wide variety of use cases. LinkedIn relies on ThirdEye to monitor site performance, track member growth, understand adoption of new features, flag sustained attempts to circumvent system security, and cover many other areas.

Read More at https://engineering.linkedin.com/blog/2019/01/introducing-thirdeye--linkedins-business-wide-monitoring-platfor

Star-tree index - Powering fast aggregations on Pinot

Engineering Restaurant Manager - UberEATS Analytics Dashboard

· One min read
Uber
Uber Data Team

At Uber, we use data analytics to architect more magical user experiences across our products. Whenever possible, we harness these data engineering capabilities to empower our partners to better serve their customers. For instance, in late 2016, the UberEATS engineering team built a comprehensive analytics dashboard that provides restaurant partners with additional insights about the health of their business.

Read More at https://eng.uber.com/restaurant-manager/

A Brief History of Scaling LinkedIn

· One min read
LinkedIn
LinkedIn Engineering Team

LinkedIn started in 2003 with the goal of connecting to your network for better job opportunities. It had only 2,700 members the first week. Fast forward many years, and LinkedIn’s product portfolio, member base, and server load have grown tremendously.

Today, LinkedIn operates globally with more than 350 million members. We serve tens of thousands of web pages every second of every day. We've hit our mobile moment where mobile accounts for more than 50 percent of all global traffic. All those requests are fetching data from our backend systems, which in turn handle millions of queries per second.

Read More at https://engineering.linkedin.com/architecture/brief-history-scaling-linkedin
