Falcon, a Hortonworks-driven project which started at InMobi, will operate in concert with Atlas – the two share many committers.

Drill, based on Google’s Dremel (also a basis for Google BigQuery), is MapR’s entry in the SQL-on-Hadoop contest. It provides a SQL interface that includes interactive analysis for many data sources and formats – Amazon S3, Azure Blob Storage, MapR-FS, NAS and local files, as well as other Hadoop and non-Hadoop formats including Parquet, Avro, JSON, XML, HBase and MapR-DB, HDFS, MongoDB, Google Cloud Storage, and Swift. Drill uses a “shredded, in-memory, columnar data representation,” vectorization and pipelining, and its data integration capabilities differentiate it from some of its competitors. MapR is pushing the notion of “schema on read” with Drill – some observers (this one included) prefer to think of this as “SQL on first read,” because anything useful is likely to be persisted and its schema saved. SQL is one of the major competitive battlegrounds – although MapR, for example, also supports Cloudera’s Impala, don’t expect to see any of the others support multiple SQL engines anytime soon.

Crunch, supported by Cloudera, is a framework for writing, testing, and running MapReduce pipelines – including UDFs – as an alternative to Pig, and it also supports Spark. It’s considered good for tasks such as joining and aggregation over data types that are “not very relational,” such as HBase, time series, and serialized object formats like Avro. I haven’t seen any evidence that it’s a significant differentiator driving distributor selection. But everyone has their own approach to this, and convergence (and adoption by others) does not appear to be on the horizon.

Calcite, formerly Optiq, is also driven by Hortonworks, who brought in its creator, Julian Hyde, to support his work. Calcite evaluates SQL and builds optimized, efficient plans, essentially deconstructing the traditional RDBMS model, and providing support for multiple back ends including Hive-on-Tez, Drill (both base their optimizers on it), MongoDB, Splunk, Spark, and JDBC data sources. This is a dramatic upending of current architectures, providing an implementation of relational algebra with transformation rules, a cost model, and metadata, that other projects can send work to. If it accomplishes its goals, it’s likely to be supported by several distributors.
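To make the idea of "relational algebra with transformation rules, a cost model, and metadata" a little more concrete, here is a toy sketch in Python. It is not Calcite's API: every class, function, and constant below (`Scan`, `Filter`, `Join`, `push_filter_below_join`, `JOIN_SELECTIVITY`, the row counts) is a hypothetical stand-in chosen for illustration, and the cost formula is a deliberately crude assumption.

```python
from dataclasses import dataclass
from typing import Optional, Union

JOIN_SELECTIVITY = 0.0001  # assumed fraction of row pairs that match a join


@dataclass(frozen=True)
class Scan:
    table: str
    rows: float  # metadata: estimated row count for the table


@dataclass(frozen=True)
class Filter:
    child: "Plan"
    table: str          # table whose column the predicate reads
    selectivity: float  # assumed fraction of input rows that pass


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"


Plan = Union[Scan, Filter, Join]


def tables(plan: Plan) -> set:
    """Names of all tables read beneath this node."""
    if isinstance(plan, Scan):
        return {plan.table}
    if isinstance(plan, Filter):
        return tables(plan.child)
    return tables(plan.left) | tables(plan.right)


def rows(plan: Plan) -> float:
    """Estimated number of rows the node produces."""
    if isinstance(plan, Scan):
        return plan.rows
    if isinstance(plan, Filter):
        return rows(plan.child) * plan.selectivity
    return rows(plan.left) * rows(plan.right) * JOIN_SELECTIVITY


def cost(plan: Plan) -> float:
    """Toy cost model: rows touched, with a nested-loop style join."""
    if isinstance(plan, Scan):
        return plan.rows
    if isinstance(plan, Filter):
        return cost(plan.child) + rows(plan.child)
    return cost(plan.left) + cost(plan.right) + rows(plan.left) * rows(plan.right)


def push_filter_below_join(plan: Plan) -> Optional[Plan]:
    """Transformation rule: Filter(Join(a, b)) -> Join(Filter(a), b) when legal."""
    if not (isinstance(plan, Filter) and isinstance(plan.child, Join)):
        return None
    join = plan.child
    if plan.table in tables(join.left):
        return Join(Filter(join.left, plan.table, plan.selectivity), join.right)
    if plan.table in tables(join.right):
        return Join(join.left, Filter(join.right, plan.table, plan.selectivity))
    return None


def optimize(plan: Plan) -> Plan:
    """Keep whichever equivalent plan the cost model rates cheapest."""
    candidates = [plan]
    rewritten = push_filter_below_join(plan)
    if rewritten is not None:
        candidates.append(rewritten)
    return min(candidates, key=cost)


orders = Scan("orders", 1_000_000)
customers = Scan("customers", 10_000)
original = Filter(Join(orders, customers), "customers", 0.01)
best = optimize(original)
print(best)
```

With these made-up row counts, the rewritten plan filters `customers` before the join, so far fewer row pairs are compared. A real planner applies many such rules repeatedly and draws its statistics from the back ends it targets; this sketch applies one rule once at the root.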
Atlas is a governance framework project being developed by a Hortonworks-led consortium of users and vendors – Aetna, JPMC, Merck, SAS, Schlumberger, Target and others – targeting metadata (for classification), data audit, lifecycle management, search and lineage, with a security and policy engine building on Ranger (which is supported by two distributors: Hortonworks and Pivotal). It’s designed to be installed and managed by Ambari (which is supported by three: the ODP members Hortonworks, IBM and Pivotal), but manual installation is also possible. Atlas is in a very early stage now, and will compete with Cloudera’s Navigator (first announced in 2013, and available as a priced option) when it begins to ship. Today only Hortonworks lists it as supported – though there is really not much there to use yet.

In “Now, What is Hadoop? And What’s Supported?” I list 10 projects in the open source community (though not all Apache projects) supported by only one distributor: Atlas, Calcite, Crunch, Drill, Falcon, Kite, LLAMA, Lucene, Phoenix and Presto. The challenges for the Hadoop user are twofold: trying to decide which projects might be useful in big data-related cases, and determining which are supported by commercial distributors.
The Apache Software Foundation has succeeded admirably in becoming a place where new software ideas are developed: today there are over 350 projects underway.