Fetching Amazon Web Services (AWS) CloudTrail Logs with Python

Posted April 19, 2015, 21:54 in Software Development, 1354 reads


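This is the age of the cloud, and many companies now run their IT infrastructure on Infrastructure as a Service (IaaS) platforms. Well-known IaaS providers include Amazon, Microsoft (Azure), and IBM, and in China there are providers such as Alibaba Cloud. Among them, Amazon is clearly the market leader.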

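AWS offers a very large number of services, well ahead of its competitors. It also exposes a rich set of REST-based APIs, so they are easy to call from many different languages and platforms.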

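In today's big-data era, the core value of big data lies in using data to make decisions. AWS provides several services for obtaining operational data; CloudTrail and CloudWatch are two of the most commonly used. CloudTrail logs the API calls made against your AWS account, and CloudWatch collects performance metrics for AWS services. (The newer Config service can be used to track changes to AWS resources.)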

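Today we will look at how to use Python with Boto, the open-source Python SDK for AWS, to configure the CloudTrail service automatically and retrieve the log contents.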

Let's first go over the concepts behind CloudTrail and the related configuration.

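  • S3 Bucket

    When you enable CloudTrail, you must specify an S3 bucket for it. S3 is Amazon's storage service; you can think of it as a cloud-based file system. CloudTrail stores its API call logs as compressed files in the bucket you specify.

  • SNS

    SNS is Amazon's notification service, built on the publish/subscribe model. When creating a trail you can optionally associate an SNS topic with it. The benefit is that you are notified as soon as CloudTrail delivers a new log file for recent API calls. Different kinds of clients can subscribe to SNS notifications, such as email, mobile push notification services, and SQS.

  • SQS

    SQS is Amazon's queue service. In this article we subscribe an SQS queue to the SNS topic, so that our Python program can read the notifications from the queue.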


Configuring CloudTrail

First we need to create the SNS topic and attach the appropriate policy. The code is as follows:

import boto.sns
import json

key_id='yourawskeyid'
secret_key='yourawssecretkey'

region_name="eu-central-1"
trail_topic_name="topicABC"
sns_policy_sid="snspolicy0001"

sns_conn = boto.sns.connect_to_region(region_name,
                                         aws_access_key_id=key_id,
                                         aws_secret_access_key=secret_key)

sns_topic = sns_conn.create_topic(trail_topic_name)

# Get ARN of SNS topic
sns_arn = sns_topic['CreateTopicResponse']['CreateTopicResult']['TopicArn']

# Add related policy
attrs = sns_conn.get_topic_attributes(sns_arn)
policy = attrs['GetTopicAttributesResponse']['GetTopicAttributesResult']['Attributes']['Policy']
policy_obj = json.loads(policy)
statements = policy_obj['Statement']

default_statement = statements[0]
new_statement = default_statement.copy()
new_statement["Sid"] = sns_policy_sid
new_statement["Action"] = "SNS:Publish"
new_statement["Principal"] = {
        "AWS": [
          "arn:aws:iam::903692715234:root",
          "arn:aws:iam::035351147821:root", 
          "arn:aws:iam::859597730677:root",
          "arn:aws:iam::814480443879:root",
          "arn:aws:iam::216624486486:root",
          "arn:aws:iam::086441151436:root",
          "arn:aws:iam::388731089494:root",
          "arn:aws:iam::284668455005:root",
          "arn:aws:iam::113285607260:root"
        ]
      }
new_statement.pop("Condition", None)
statements.append(new_statement)
new_policy = json.dumps(policy_obj)
sns_conn.set_topic_attributes(sns_arn,"Policy",new_policy)

CloudTrail is region-specific: each region has its own CloudTrail service. When creating the SNS topic, make sure it lives in the same region as the trail.

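Note that we added a new policy statement so that CloudTrail is allowed to publish messages (Action = "SNS:Publish") to the topic we created. We copy the default statement, change its Action and Sid (any name that is not already in use will do), and set Principal to a list of AWS accounts used by CloudTrail. The list is hard-coded here; AWS may change it, but at the time of writing the values are fixed. Finally, we remove the Condition element and append the new statement to the existing policy.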
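The copy-and-modify step described above can be isolated into a pure function, which makes it easy to check without touching AWS. This is only a minimal sketch: the function name `add_publish_statement` and the sample default policy are illustrative assumptions, not part of boto.

```python
import copy
import json


def add_publish_statement(policy_json, sid, principals):
    """Return a new policy JSON string with an extra SNS:Publish statement.

    The new statement is cloned from the first (default) statement so that it
    keeps the topic's Resource, then restricted to the given principals.
    """
    policy = json.loads(policy_json)
    statement = copy.deepcopy(policy["Statement"][0])
    statement["Sid"] = sid
    statement["Action"] = "SNS:Publish"
    statement["Principal"] = {"AWS": list(principals)}
    statement.pop("Condition", None)
    policy["Statement"].append(statement)
    return json.dumps(policy)


# Hypothetical default policy, shaped like the one read from the topic attributes.
sample_policy = json.dumps({
    "Version": "2008-10-17",
    "Statement": [{
        "Sid": "__default_statement_ID",
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": ["SNS:Subscribe", "SNS:Publish"],
        "Resource": "arn:aws:sns:eu-central-1:123456789012:topicABC",
        "Condition": {"StringEquals": {"AWS:SourceOwner": "123456789012"}},
    }],
})

updated = json.loads(add_publish_statement(
    sample_policy, "snspolicy0001", ["arn:aws:iam::903692715234:root"]))
```

Using a deep copy means the default statement is left untouched even if nested fields are edited later.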

Next we need to create an SQS queue and subscribe it to the SNS topic we just created. This step is relatively simple.

import boto.sqs

sqs_queue_name="sqs_queue"
sqs_conn = boto.sqs.connect_to_region(region_name,
                                         aws_access_key_id=key_id,
                                         aws_secret_access_key=secret_key)
sqs_queue = sqs_conn.create_queue(sqs_queue_name)
sns_conn.subscribe_sqs_queue(sns_arn, sqs_queue)

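Then we need to create an S3 bucket to store the log files that CloudTrail produces. Again, we must attach an appropriate policy so that CloudTrail has permission to write the log files.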

import boto

bucket_name="bucket000"
policy_sid="testpolicy000"
s3_conn = boto.connect_s3(aws_access_key_id=key_id,aws_secret_access_key=secret_key)
bucket = s3_conn.create_bucket(bucket_name)
bucket_policy = '''{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "%Sid%GetPolicy",
			"Effect": "Allow",
			"Principal": {
				"AWS": [
					"arn:aws:iam::903692715234:root",
					"arn:aws:iam::035351147821:root",
					"arn:aws:iam::859597730677:root",
					"arn:aws:iam::814480443879:root",
					"arn:aws:iam::216624486486:root",
					"arn:aws:iam::086441151436:root",
					"arn:aws:iam::388731089494:root",
					"arn:aws:iam::284668455005:root",
					"arn:aws:iam::113285607260:root"
				]
			},
			"Action": "s3:GetBucketAcl",
			"Resource": "arn:aws:s3:::%bucket_name%"
		},
		{
			"Sid": "%Sid%PutPolicy",
			"Effect": "Allow",
			"Principal": {
				"AWS": [
					"arn:aws:iam::903692715234:root",
					"arn:aws:iam::035351147821:root",
					"arn:aws:iam::859597730677:root",
					"arn:aws:iam::814480443879:root",
					"arn:aws:iam::216624486486:root",
					"arn:aws:iam::086441151436:root",
					"arn:aws:iam::388731089494:root",
					"arn:aws:iam::284668455005:root",
					"arn:aws:iam::113285607260:root"
				]
			},
			"Action": "s3:PutObject",
			"Resource": "arn:aws:s3:::%bucket_name%/*",
			"Condition": {
				"StringEquals": {
					"s3:x-amz-acl": "bucket-owner-full-control"
				}
			}
		}
	]
}'''
bucket_policy = bucket_policy.replace("%bucket_name%",bucket_name)
bucket_policy = bucket_policy.replace("%Sid%",policy_sid)
bucket.set_policy(bucket_policy)

Here we start from a default policy template and simply substitute the relevant fields.

Finally, we create the CloudTrail trail:
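As an alternative to string substitution, the same policy can be built as a Python dictionary and serialized with `json.dumps`, which avoids quoting mistakes and removes the duplicated principal list. This is a sketch only: the helper name `build_bucket_policy` is hypothetical, and the account IDs are simply copied from the template above.

```python
import json

# Account IDs copied from the policy template above; AWS may change this list.
CLOUDTRAIL_PRINCIPALS = [
    "arn:aws:iam::903692715234:root",
    "arn:aws:iam::035351147821:root",
    "arn:aws:iam::859597730677:root",
    "arn:aws:iam::814480443879:root",
    "arn:aws:iam::216624486486:root",
    "arn:aws:iam::086441151436:root",
    "arn:aws:iam::388731089494:root",
    "arn:aws:iam::284668455005:root",
    "arn:aws:iam::113285607260:root",
]


def build_bucket_policy(bucket_name, sid, principals=CLOUDTRAIL_PRINCIPALS):
    """Return the bucket policy as a JSON string."""
    principal = {"AWS": list(principals)}
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets CloudTrail check the bucket ACL before writing.
                "Sid": sid + "GetPolicy",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetBucketAcl",
                "Resource": "arn:aws:s3:::" + bucket_name,
            },
            {
                # Lets CloudTrail write log objects owned by the bucket owner.
                "Sid": sid + "PutPolicy",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:PutObject",
                "Resource": "arn:aws:s3:::" + bucket_name + "/*",
                "Condition": {
                    "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
                },
            },
        ],
    }, indent=2)


policy_doc = json.loads(build_bucket_policy("bucket000", "testpolicy000"))
```

The string returned by `build_bucket_policy` could be passed to `bucket.set_policy(...)` in place of the substituted template.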

import boto.cloudtrail

trail_name="Trailabc"
log_prefix="log"

cloudtrail_conn=boto.cloudtrail.connect_to_region(region_name,
                                         aws_access_key_id=key_id,
                                         aws_secret_access_key=secret_key)

##cloudtrail_conn.describe_trails()
cloudtrail_conn.create_trail(trail_name,bucket_name, s3_key_prefix=log_prefix,sns_topic_name=trail_topic_name)
cloudtrail_conn.start_logging(trail_name)

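CloudTrail is now configured, and the associated SNS topic is subscribed to by our SQS queue. Next we can start fetching logs.

Fetching the Log Data

CloudTrail writes log files recording API calls to the S3 bucket we created and, for each delivered file, publishes a message to our SNS topic. Because our SQS queue is subscribed to that topic, we can obtain the log data by reading messages from the queue.

First, connect to the SQS queue and read messages from it: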

import boto.sqs

sqs_queue_name="sqs_queue"
sqs_conn = boto.sqs.connect_to_region(region_name,
                                         aws_access_key_id=key_id,
                                         aws_secret_access_key=secret_key)
                                         
sqs_queue = sqs_conn.get_queue(sqs_queue_name)
notifications = sqs_queue.get_messages()

Then we extract the S3 location of each log file from the message and use it to download the file from S3:

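import gzip
import io
import json

for notification in notifications:
    # The SQS body is the SNS envelope; its 'Message' field is the CloudTrail notification.
    envelope = json.loads(notification.get_body())
    message = json.loads(envelope['Message'])
    s3_bucket = s3_conn.get_bucket(message['s3Bucket'])
    for key in message['s3ObjectKey']:
        s3_file = s3_bucket.get_key(key)
        # Each log file is a gzip-compressed JSON document.
        with io.BytesIO(s3_file.get_contents_as_string()) as bfile:
            with gzip.GzipFile(fileobj=bfile) as gz:
                logjson = json.loads(gz.read().decode('utf-8'))
    # Remove the handled notification so it is not delivered again.
    sqs_queue.delete_message(notification)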

logjson holds the contents of the log file as parsed JSON. Here is an example:

{
    "Records": [{
        "eventVersion": "1.0",
        "userIdentity": {
            "type": "IAMUser",
            "principalId": "EX_PRINCIPAL_ID",
            "arn": "arn:aws:iam::123456789012:user/Alice",
            "accessKeyId": "EXAMPLE_KEY_ID",
            "accountId": "123456789012",
            "userName": "Alice"
        },
        "eventTime": "2014-03-06T21:22:54Z",
        "eventSource": "ec2.amazonaws.com",
        "eventName": "StartInstances",
        "awsRegion": "us-west-2",
        "sourceIPAddress": "205.251.233.176",
        "userAgent": "ec2-api-tools 1.6.12.2",
        "requestParameters": {
            "instancesSet": {
                 "items": [{
                      "instanceId": "i-ebeaf9e2"
                }]
            }
        },
        "responseElements": {
            "instancesSet": {
                "items": [{
                      "instanceId": "i-ebeaf9e2",
                      "currentState": {
                          "code": 0,
                          "name": "pending"
                      },
                      "previousState": {
                          "code": 80,
                          "name": "stopped"
                      }
                    }]
            }
        }
    },
    ... additional entries ...
	]
}
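The notification parsing and decompression steps above can be exercised offline, without any AWS connection, by splitting them into two small functions. This is a sketch: the function names `parse_notification` and `decode_log_file` are hypothetical, and the sample data is fabricated purely to run the functions.

```python
import gzip
import io
import json


def parse_notification(sqs_body):
    """Return (bucket, [keys]) from an SQS body holding an SNS envelope."""
    envelope = json.loads(sqs_body)
    message = json.loads(envelope["Message"])
    return message["s3Bucket"], list(message["s3ObjectKey"])


def decode_log_file(raw_bytes):
    """Decompress a gzip-compressed CloudTrail log file and parse its JSON."""
    with gzip.GzipFile(fileobj=io.BytesIO(raw_bytes)) as gz:
        return json.loads(gz.read().decode("utf-8"))


# Fabricated sample data, used only to exercise the two functions.
sample_body = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({
        "s3Bucket": "bucket000",
        "s3ObjectKey": ["log/AWSLogs/123456789012/CloudTrail/example.json.gz"],
    }),
})
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(json.dumps({"Records": [{"eventName": "StartInstances"}]}).encode("utf-8"))

bucket, keys = parse_notification(sample_body)
log = decode_log_file(buf.getvalue())
```

Keeping the parsing separate from the S3 and SQS calls makes it possible to unit-test the log handling on saved files.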

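You can use the code above to monitor all of your CloudTrail logs. The resulting JSON records can be stored in your database (MongoDB is a good fit) and then analyzed with your BI tools.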
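Before storing records, it can help to reduce each one to the handful of fields you actually query. The sketch below does that for the fields shown in the sample log; the helper names `summarize_record` and `summarize_log` and the choice of fields are assumptions for illustration only.

```python
def summarize_record(record):
    """Pick a few top-level fields from one CloudTrail record."""
    identity = record.get("userIdentity", {})
    return {
        "time": record.get("eventTime"),
        "source": record.get("eventSource"),
        "name": record.get("eventName"),
        "region": record.get("awsRegion"),
        "user": identity.get("userName"),
        "ip": record.get("sourceIPAddress"),
    }


def summarize_log(logjson):
    """Return one flat dictionary per record in a parsed log file."""
    return [summarize_record(r) for r in logjson.get("Records", [])]


# A trimmed copy of the sample record shown earlier.
sample_log = {
    "Records": [{
        "userIdentity": {"type": "IAMUser", "userName": "Alice"},
        "eventTime": "2014-03-06T21:22:54Z",
        "eventSource": "ec2.amazonaws.com",
        "eventName": "StartInstances",
        "awsRegion": "us-west-2",
        "sourceIPAddress": "205.251.233.176",
    }]
}

rows = summarize_log(sample_log)
```

Each dictionary in `rows` is flat, so it can be handed to whatever database client you use as a single document or row.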

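Note that you can also skip SNS and SQS and scan the bucket contents directly. The advantage is simpler configuration. The drawbacks are poorer timeliness, the extra work of scanning the bucket, and the need to keep track locally of which files have already been processed, which makes the code more complex.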
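To give a feel for the bookkeeping that the scanning approach needs, here is a minimal sketch that remembers processed keys in a local JSON file. All names here (`load_state`, `save_state`, `select_new_keys`, the state file) are hypothetical; in practice the key names would come from listing the bucket, for example by iterating over boto's `bucket.list(prefix=...)`.

```python
import json
import os
import tempfile


def load_state(path):
    """Return the set of keys already processed (empty if no state file yet)."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def save_state(path, processed):
    """Persist the set of processed keys as a sorted JSON list."""
    with open(path, "w") as f:
        json.dump(sorted(processed), f)


def select_new_keys(all_keys, processed):
    """Return log file keys that have not been processed yet, sorted."""
    return sorted(k for k in all_keys
                  if k.endswith(".json.gz") and k not in processed)


# Demo with made-up key names standing in for two successive bucket scans.
state_path = os.path.join(tempfile.mkdtemp(), "processed.json")
first = select_new_keys(["log/a.json.gz", "log/b.json.gz"], load_state(state_path))
save_state(state_path, first)
second = select_new_keys(["log/a.json.gz", "log/b.json.gz", "log/c.json.gz"],
                         load_state(state_path))
```

Every scan still has to list the whole prefix, which is the extra cost mentioned above; the state file only prevents downloading the same log twice.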

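There is a lot you can do with CloudTrail logs, such as checking for unauthorized logins or measuring how often each service is used. In short, once you have enough data, you can extract real value from it.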
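Both analyses mentioned above can be sketched in a few lines over a list of records. The function names are hypothetical and the sample records are fabricated; the sketch assumes that console sign-in attempts appear as `ConsoleLogin` events whose `responseElements` carry a `ConsoleLogin` value of `Success` or `Failure`.

```python
from collections import Counter


def service_usage(records):
    """Count records per eventSource, i.e. per AWS service endpoint."""
    return Counter(r.get("eventSource") for r in records)


def failed_console_logins(records):
    """Return the ConsoleLogin records whose response reports a failure."""
    failed = []
    for r in records:
        # responseElements may be missing or null for some events.
        response = r.get("responseElements") or {}
        if (r.get("eventName") == "ConsoleLogin"
                and response.get("ConsoleLogin") == "Failure"):
            failed.append(r)
    return failed


# Fabricated records, for illustration only.
records = [
    {"eventSource": "ec2.amazonaws.com", "eventName": "StartInstances"},
    {"eventSource": "ec2.amazonaws.com", "eventName": "StopInstances"},
    {"eventSource": "signin.amazonaws.com", "eventName": "ConsoleLogin",
     "responseElements": {"ConsoleLogin": "Failure"}},
    {"eventSource": "signin.amazonaws.com", "eventName": "ConsoleLogin",
     "responseElements": {"ConsoleLogin": "Success"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "ListBuckets",
     "responseElements": None},
]

usage = service_usage(records)
failed = failed_console_logins(records)
```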


