Friday, April 21, 2017

Securing Apache Hadoop Distributed File System (HDFS) - part III

This is the third in a series of posts on securing HDFS. The first post described how to install Apache Hadoop, and how to use POSIX permissions and ACLs to restrict access to data stored in HDFS. The second post looked at how to use Apache Ranger to authorize access to data stored in HDFS. In this post we will look at how Apache Ranger can create "tag" based authorization policies for HDFS using Apache Atlas. For information on how to create tag-based authorization policies for Apache Kafka, see a post I wrote earlier this year.

The Apache Ranger admin console allows you to create security policies for HDFS by associating a user/group with some permissions (read/write/execute) and a resource, such as a directory or file. This is called a "Resource based policy" in Apache Ranger. An alternative is to use a "Tag based policy", which instead associates the user/group + permissions with a "tag". You can create and manage tags in Apache Atlas, and Apache Ranger supports the ability to imports tags from Apache Atlas via a tagsync service, something we will cover in this post.

1) Start Apache Atlas and create entities/tags for HDFS

First let's look at setting up Apache Atlas. Download the latest released version (0.8-incubating) and extract it. Build the distribution that contains an embedded HBase and Solr instance via:
  • mvn clean package -Pdist,embedded-hbase-solr -DskipTests
The distribution will then be available in 'distro/target/apache-atlas-0.8-incubating-bin'. To launch Atlas, we need to set some variables to tell it to use the local HBase and Solr instances:
  • export MANAGE_LOCAL_HBASE=true
  • export MANAGE_LOCAL_SOLR=true
Now let's start Apache Atlas with 'bin/atlas_start.py'. Open a browser and go to 'http://localhost:21000/', logging on with credentials 'admin/admin'. Click on "TAGS" and create a new tag called "Data".  Click on "Search" and the "Create new entity" link. Select an entity type of "hdfs_path" with the following values:
  • QualifiedName: data@cl1
  • Name: Data
  • Path: /data
Once the new entity has been created, then click on "+" beside "Tags" and associate the new entity with the "Data" tag.

2) Use the Apache Ranger TagSync service to import tags from Atlas into Ranger

To create tag based policies in Apache Ranger, we have to import the entity + tag we have created in Apache Atlas into Ranger via the Ranger TagSync service. First, start the Apache Ranger admin service and rename the HDFS service we created in the previous tutorial from "HDFSTest" to "cl1_hadoop". This is because the Tagsync service will sync tags into the Ranger service that corresponds to the suffix of the qualified name of the tag with "_hadoop". Also edit 'etc/hadoop/ranger-hdfs-security.xml' in your Hadoop distribution and change the "ranger.plugin.hdfs.service.name" to "cl1_hadoop". Also change the "ranger.plugin.hdfs.policy.cache.dir" along the same lines. Finally, make sure the directory '/etc/ranger/cl1_hadoop/policycache' exists and the user you are running Hadoop as can write and read from this directory.

After building Apache Ranger then extract the file called "target/ranger-<version>-tagsync.tar.gz". Edit 'install.properties' as follows:
  • Set TAG_SOURCE_ATLAS_ENABLED to "false"
  • Set TAG_SOURCE_ATLASREST_ENABLED to  "true"
  • Set TAG_SOURCE_ATLASREST_DOWNLOAD_INTERVAL_IN_MILLIS to "60000" (just for testing purposes)
  • Specify "admin" for both TAG_SOURCE_ATLASREST_USERNAME and TAG_SOURCE_ATLASREST_PASSWORD
Save 'install.properties' and install the tagsync service via "sudo ./setup.sh". It can now be started via "sudo ranger-tagsync-services.sh start".

3) Create Tag-based authorization policies in Apache Ranger

Now let's create a tag-based authorization policy in the Apache Ranger admin UI. Click on "Access Manager" and then "Tag based policies". Create a new Tag service called "HDFSTagService". Create a new policy for this service called "DataPolicy". In the "TAG" field enter a capital "D" and the "Data" tag should pop up, meaning that it was successfully synced in from Apache Atlas. Create an "Allow" condition for the user "bob" with component permission of "HDFS" and "read" and "execute":


The last thing we need to do is to go back to the Resource based policies and edit "cl1_hadoop" and select the tag service we have created above.

4) Testing authorization in HDFS using our tag based policy

Wait until the Ranger authorization plugin syncs the new authorization policies from the Ranger Admin service and then we can test authorization. In the previous tutorial we showed that the file owner and user "alice" can read the data stored in '/data', but "bob" could not. Now we should be able to successfully read the data as "bob" due to the tag based authorization policy we have created:
  • sudo -u bob bin/hadoop fs -cat /data/LICENSE.txt

No comments:

Post a Comment