Announcing Apache Pinot 0.10
We are excited to announce the release this week of Apache Pinot 0.10. Apache Pinot is a real-time distributed datastore designed to answer OLAP queries with high throughput and low latency.
This release is cut from commit fd9c58a11ed16d27109baefcee138eea30132ad3. You can find a full list of everything included in the release notes.
Let’s have a look at some of the changes, with the help of the batch QuickStart configuration.
# Query Plans
Amrish Lal implemented the `EXPLAIN PLAN` clause, which returns the execution plan that will be chosen by the Pinot Query Engine.
This lets us see what the query is likely to do without actually having to run it.

    EXPLAIN PLAN FOR
    SELECT *
    FROM baseballStats
    WHERE league = 'NL'

If we run this query, we'll see the following results:
Operator | Operator_Id | Parent_Id |
---|---|---|
BROKER_REDUCE(limit:10) | 0 | -1 |
COMBINE_SELECT | 1 | 0 |
SELECT(selectList:AtBatting, G_old, baseOnBalls, caughtStealing, doules, groundedIntoDoublePlays, hits, hitsByPitch, homeRuns, intentionalWalks, league, numberOfGames, numberOfGamesAsBatter, playerID, playerName, playerStint, runs, runsBattedIn, sacrificeFlies, sacrificeHits, stolenBases, strikeouts, teamID, tripples, yearID) | 2 | 1 |
TRANSFORM_PASSTHROUGH(AtBatting, G_old, baseOnBalls, caughtStealing, doules, groundedIntoDoublePlays, hits, hitsByPitch, homeRuns, intentionalWalks, league, numberOfGames, numberOfGamesAsBatter, playerID, playerName, playerStint, runs, runsBattedIn, sacrificeFlies, sacrificeHits, stolenBases, strikeouts, teamID, tripples, yearID) | 3 | 2 |
PROJECT(homeRuns, playerStint, groundedIntoDoublePlays, numberOfGames, AtBatting, stolenBases, tripples, hitsByPitch, teamID, numberOfGamesAsBatter, strikeouts, sacrificeFlies, caughtStealing, baseOnBalls, playerName, doules, league, yearID, hits, runsBattedIn, G_old, sacrificeHits, intentionalWalks, runs, playerID) | 4 | 3 |
FILTER_FULL_SCAN(operator:EQ,predicate:league = 'NL') | 5 | 4 |
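
The `Operator_Id` and `Parent_Id` columns describe a tree, which is easier to read when indented. The following is a minimal sketch, not part of Pinot, that rebuilds that tree from rows shaped like the table above. The `render_plan` helper is a hypothetical name, and the long column lists are shortened to `...` purely for readability.

```python
from collections import defaultdict

# (Operator, Operator_Id, Parent_Id) rows, mirroring the result table above.
# Column lists are abbreviated to "..." only to keep the example short.
PLAN_ROWS = [
    ("BROKER_REDUCE(limit:10)", 0, -1),
    ("COMBINE_SELECT", 1, 0),
    ("SELECT(selectList:...)", 2, 1),
    ("TRANSFORM_PASSTHROUGH(...)", 3, 2),
    ("PROJECT(...)", 4, 3),
    ("FILTER_FULL_SCAN(operator:EQ,predicate:league = 'NL')", 5, 4),
]


def render_plan(rows):
    """Return the plan as text, indenting each operator under its parent."""
    children = defaultdict(list)
    for operator, operator_id, parent_id in rows:
        children[parent_id].append((operator_id, operator))

    lines = []

    def walk(parent_id, depth):
        # Visit children in Operator_Id order so the output is deterministic.
        for operator_id, operator in sorted(children[parent_id]):
            lines.append("  " * depth + operator)
            walk(operator_id, depth + 1)

    walk(-1, 0)  # the root operator has Parent_Id -1
    return "\n".join(lines)


print(render_plan(PLAN_ROWS))
```

Read from the bottom up, the indented output shows the order in which data flows: the filter scans the segment, and the broker reduce at the top merges the results.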
# FILTER Clauses for Aggregates
Atri Sharma added the `FILTER` clause for aggregates. This feature makes it possible to write queries like this:

    SELECT SUM(homeRuns) FILTER(WHERE league = 'NL') AS nlHomeRuns,
           SUM(homeRuns) FILTER(WHERE league = 'AL') AS alHomeRuns
    FROM baseballStats

If we run this query, we'll see the following output:
nlHomeRuns | alHomeRuns |
---|---|
135486 | 135990 |
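
To make the semantics concrete, here is a small Python model of what a filtered aggregate computes: each aggregate only sees the rows that match its own predicate, and rows matching neither predicate contribute to neither sum. The rows and the `sum_filtered` helper are made up for illustration and are not taken from `baseballStats` or from Pinot's implementation.

```python
# Made-up rows for illustration only; these are not real baseballStats values.
rows = [
    {"league": "NL", "homeRuns": 3},
    {"league": "AL", "homeRuns": 5},
    {"league": "NL", "homeRuns": 2},
    {"league": "AL", "homeRuns": 1},
    {"league": "XX", "homeRuns": 7},  # matches neither filter
]


def sum_filtered(rows, column, predicate):
    """Model of SUM(column) FILTER(WHERE predicate)."""
    return sum(row[column] for row in rows if predicate(row))


nl_home_runs = sum_filtered(rows, "homeRuns", lambda r: r["league"] == "NL")
al_home_runs = sum_filtered(rows, "homeRuns", lambda r: r["league"] == "AL")
print(nl_home_runs, al_home_runs)  # prints: 5 6
```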
# greatest and least
Richard Startin added the `greatest` and `least` functions:

    SELECT playerID,
           least(5.0, max(homeRuns)) AS homeRuns,
           greatest(5.0, max(hits)) AS hits
    FROM baseballStats
    WHERE league = 'NL' AND teamID = 'SFN'
    GROUP BY playerID
    LIMIT 5

If we run this query, we'll see the following output:
playerID | homeRuns | hits |
---|---|---|
ramirju01 | 0 | 5 |
milneed01 | 4 | 54 |
testani01 | 0 | 5 |
shawbo01 | 0 | 8 |
vogelry01 | 0 | 12 |
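
It is easy to mix up which function does what: `least(5.0, x)` caps a value at 5, while `greatest(5.0, x)` raises anything below 5 up to 5. The sketch below models that behaviour in Python with hypothetical per-player maxima (the player names and numbers are invented, not query output).

```python
def least(*values):
    """Model of SQL least(): the smallest of its arguments."""
    return min(values)


def greatest(*values):
    """Model of SQL greatest(): the largest of its arguments."""
    return max(values)


# Hypothetical max(homeRuns) and max(hits) per player, for illustration only.
player_max = {
    "playerA": {"homeRuns": 0, "hits": 3},
    "playerB": {"homeRuns": 12, "hits": 54},
}

results = {}
for player, stats in player_max.items():
    capped_home_runs = least(5.0, stats["homeRuns"])  # never above 5
    floored_hits = greatest(5.0, stats["hits"])       # never below 5
    results[player] = (capped_home_runs, floored_hits)
    print(player, capped_home_runs, floored_hits)
```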
# DistinctCountSmartHLL
Xiaotian (Jackie) Jiang added the `DistinctCountSmartHLL` aggregation function. It starts by tracking distinct values in an exact set and automatically converts that set to a HyperLogLog sketch if it grows too big, which protects the servers from running out of memory:

    SELECT DISTINCTCOUNTSMARTHLL(homeRuns, 'hllLog2m=8;hllConversionThreshold=10')
    FROM baseballStats

If we run this query, we'll see the following output:
distinctcountsmarthll(homeRuns) |
---|
66 |
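
The idea of switching from an exact set to a sketch can be illustrated with a simplified Python model. This is not Pinot's implementation: the class name, the SHA-1 based hash, and the exact point at which conversion happens (here, once the set size exceeds the threshold) are all assumptions made for the example. It assumes `log2m` controls the number of HyperLogLog registers (2^log2m), and the bias constant used is the standard one for 128 or more registers.

```python
import hashlib
import math


class SmartDistinctCounter:
    """Exact set for small inputs, HyperLogLog registers once it grows."""

    def __init__(self, log2m=8, conversion_threshold=10):
        self.log2m = log2m
        self.m = 1 << log2m              # number of registers
        self.threshold = conversion_threshold
        self.exact = set()               # used until conversion
        self.registers = None            # used after conversion

    @staticmethod
    def _hash64(value):
        digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def _add_to_registers(self, value):
        h = self._hash64(value)
        width = 64 - self.log2m
        index = h >> width                       # top log2m bits pick a register
        rest = h & ((1 << width) - 1)            # remaining bits
        rank = width - rest.bit_length() + 1     # leading zeros in rest, plus one
        self.registers[index] = max(self.registers[index], rank)

    def add(self, value):
        if self.registers is not None:
            self._add_to_registers(value)
            return
        self.exact.add(value)
        if len(self.exact) > self.threshold:     # assumed conversion rule
            self.registers = [0] * self.m
            for seen in self.exact:
                self._add_to_registers(seen)
            self.exact = None                    # free the exact set

    def count(self):
        if self.registers is None:
            return len(self.exact)               # exact answer
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)         # valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return round(m * math.log(m / zeros))  # small-range correction
        return round(raw)


counter = SmartDistinctCounter(log2m=8, conversion_threshold=10)
for value in [1, 2, 2, 3]:
    counter.add(value)
print(counter.count())  # exact while the set stays small: prints 3
```

Below the threshold the count is exact. Above it, memory use is bounded by the register array and the result becomes an estimate.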
# UI updates
There were also several updates to the Pinot Data Explorer, by Sanket Shah and Johan Adami.
Reported size and estimated size are now displayed in a human-readable format.
Fixes for the following issues:
- Error messages weren't showing in the UI when an invalid operation was attempted.
- The query console went blank on a syntax error.
- The query console couldn't show the query result when multiple columns had the same name.
- Adding extra fields after `SELECT *` would throw a `NullPointerException`.
- Some queries were returning `--` instead of `0`.
- The Pinot Dashboard tenant view was showing the incorrect number of servers and brokers.
# RealTimeToOffline Task
Xiaotian (Jackie) Jiang made some fixes to the RealTimeToOffline job to handle time gaps and proceed to the next time window when no segment matches the current one.
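
The behaviour described above can be pictured with a simplified model: walk forward one time window at a time and stop at the first window that overlaps a completed segment. This is only a sketch of the idea, not Pinot's task code. The function name, the segment tuples, and the rule of stopping once the window starts after the latest segment end are all assumptions for illustration.

```python
def next_window_with_segments(segments, watermark_ms, bucket_ms):
    """Return (window_start, window_end, segment_names) or None.

    segments: list of (name, start_ms, end_ms) for completed segments.
    watermark_ms: start of the first window that has not been processed yet.
    """
    if not segments:
        return None
    latest_end = max(end for _, _, end in segments)
    window_start = watermark_ms
    while window_start <= latest_end:
        window_end = window_start + bucket_ms
        # A segment matches if its time range overlaps [window_start, window_end).
        matching = [
            name
            for name, start, end in segments
            if start < window_end and end >= window_start
        ]
        if matching:
            return window_start, window_end, matching
        window_start = window_end  # time gap: move on to the next window
    return None


# Hypothetical segments with a gap between 900 ms and 3100 ms.
segments = [("seg_0", 0, 900), ("seg_1", 3100, 3900)]
print(next_window_with_segments(segments, watermark_ms=1000, bucket_ms=1000))
# prints: (3000, 4000, ['seg_1'])
```

Without the skip-ahead step, a job in this model would stay stuck on the empty window starting at 1000 ms.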
# Empty QuickStart
Kenny Bastani added an empty QuickStart command, which lets you quickly spin up an empty Pinot cluster:

    docker run \
      -p 8000:8000 \
      -p 9000:9000 \
      apachepinot/pinot:0.10.0 QuickStart \
      -type empty

You can then ingest your own dataset without needing to worry about spinning up each of the Pinot components individually.
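
As a first step towards ingesting your own data, you would typically register a schema with the cluster. The sketch below builds a schema and the HTTP request to submit it, using only the Python standard library. It assumes the controller is reachable on `localhost:9000` (one of the ports mapped above) and that schemas are created with a `POST` to the controller's `/schemas` endpoint. The `events` schema and its columns are hypothetical.

```python
import json
import urllib.request

# Assumption: port 9000 from the docker command above is the Pinot controller.
CONTROLLER_URL = "http://localhost:9000"


def build_schema(name, dimensions, metrics):
    """Build a minimal schema dict from (column, dataType) pairs."""
    return {
        "schemaName": name,
        "dimensionFieldSpecs": [{"name": n, "dataType": t} for n, t in dimensions],
        "metricFieldSpecs": [{"name": n, "dataType": t} for n, t in metrics],
    }


def schema_request(schema, controller_url=CONTROLLER_URL):
    """Prepare (but do not send) the request that would create the schema."""
    body = json.dumps(schema).encode("utf-8")
    return urllib.request.Request(
        f"{controller_url}/schemas",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical schema for illustration.
schema = build_schema("events", [("userId", "STRING")], [("clicks", "LONG")])
request = schema_request(schema)
print(request.get_method(), request.full_url)

# With the cluster running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

A table config would be registered in a similar way before any data can be loaded. Check the Pinot documentation for the fields your table needs.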
# Data Ingestion
Richard Startin fixed some issues with real-time ingestion where consumption of messages would stop if a bad batch of messages was consumed from Kafka.
Mohemmad Zaid Khan added the BoundedColumnValue partition function, which partitions segments based on column values.
Xiaobing Li added the fixed name segment generator, which can be used when you want to replace a specific existing segment.
# Other changes
- Richard Startin set LZ4 compression as the default for all metrics fields.
- Mark Needham added the `ST_Within` geospatial function.
- Rong Rong fixed a bug where query stats wouldn't show if there was an error processing the query (e.g. if the query timed out).
- Prashant Pandey fixed the query engine to handle extra columns added to a `SELECT *` statement.
- Richard Startin added support for forward indexes on JSON columns.
- Rong Rong added the gRPC broker request handler so that data can be streamed back from the server to the broker when processing queries.
- deemoliu made it possible to add a default strategy when using the partial upsert feature.
- Jeff Moszuti added support for the `TIMESTAMP` data type in the configuration recommendation engine.
# Dependency updates
The following dependencies were updated:
- async-http-client because the library moved to a different organization.
- RoaringBitmap to 0.9.25
- JsonPath to 2.7.0
- Kafka to 2.8.1
- Prometheus to 0.16.1
# Resources
If you want to try out Apache Pinot, the following resources will help you get started:
- Download page: https://pinot.apache.org/download/
- Getting started: https://docs.pinot.apache.org/getting-started
- Apache Pinot Recipes: https://dev.startree.ai/docs/pinot/recipes/
- Join our Slack channel: https://communityinviter.com/apps/apache-pinot/apache-pinot
- See our upcoming events: https://www.meetup.com/apache-pinot
- Follow us on Twitter: https://twitter.com/startreedata
- Subscribe to our YouTube channel: https://www.youtube.com/c/StarTree