facebook / threatexchange

Trust & Safety tools for working together to fight digital harms.

Home Page: https://developers.facebook.com/docs/threat-exchange

License: Other

Python 13.51% PHP 0.82% JavaScript 0.44% Makefile 0.39% Ruby 0.96% Batchfile 0.09% Jupyter Notebook 0.35% Go 0.14% C++ 77.92% Shell 0.29% Java 4.07% M4 0.01% C 0.58% Starlark 0.03% Dockerfile 0.03% HTML 0.07% CSS 0.05% CMake 0.03% C# 0.13% Cython 0.07%
hashing image image-hashing image-similarity ncmec perceptual-hashing threatexchange video video-hashing stopncii

threatexchange's Introduction

Projects in this Repository

This repository originally started as code to support Meta's ThreatExchange API, but over time it has grown to include a number of projects that support signal exchange and content moderation in general. Below is a list of sub-projects.

PDQ Image Hashing and Similarity Matching

PDQ is a photo hashing algorithm that turns photos into 256-bit signatures, which can then be used to match other photos.
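As a minimal sketch of how two such signatures might be compared: 256-bit hashes are commonly written as 64 hex characters and compared by Hamming distance (the number of differing bits). The helper names and the threshold of 31 are assumptions for illustration, not code from this repository.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_match(hash_a: str, hash_b: str, threshold: int = 31) -> bool:
    """Treat two hashes as matching when few enough bits differ.

    The default threshold is an assumption; tune it for your own data.
    """
    return hamming_distance(hash_a, hash_b) <= threshold


# Two made-up 256-bit signatures (64 hex characters) that differ only in
# the last hex digit.
a = "f" * 64
b = "f" * 63 + "0"
print(hamming_distance(a, b), is_match(a, b))
```

A lower threshold gives fewer false positives at the cost of missing more heavily edited copies.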

TMK+PDQF (TMK) Video Hashing and Similarity Matching

TMK+PDQF (or TMK for short) is a video hashing algorithm that can turn videos into 256KB signatures which can be used to match other videos.

Video PDQ (vPDQ) Video Hashing and Similarity Matching

Video PDQ (or vPDQ for short) is a simple video hashing algorithm that decides whether two videos match based on the number of similar frames they share. It can easily be applied to other image hashing algorithms, not just PDQ.
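The idea of matching on shared similar frames can be sketched as follows. This is an illustration under stated assumptions, not the repository's vPDQ implementation: each video is represented as a list of hex-encoded frame hashes, frames count as similar when their bit distance is at most max_distance, and the function names and both thresholds are hypothetical.

```python
def _bit_distance(hash_a: str, hash_b: str) -> int:
    """Number of differing bits between two hex-encoded frame hashes."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def shared_frame_percent(query_frames, target_frames, max_distance=31):
    """Percentage of query frames with at least one similar target frame."""
    if not query_frames:
        return 0.0
    matched = sum(
        1
        for q in query_frames
        if any(_bit_distance(q, t) <= max_distance for t in target_frames)
    )
    return 100.0 * matched / len(query_frames)


def videos_match(query_frames, target_frames, min_percent=80.0):
    """Declare a match when enough query frames are shared (assumed cutoff)."""
    return shared_frame_percent(query_frames, target_frames) >= min_percent


# Made-up frame hashes: only the first query frame appears in the target.
query = ["f" * 64, "0" * 64]
target = ["f" * 64]
print(shared_frame_percent(query, target))
```

Because the per-frame comparison is isolated in _bit_distance, swapping in a different image hashing algorithm only requires replacing that one function.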

Hasher-Matcher-Actioner (HMA) Trust & Safety Platform

HMA is a ready-to-deploy content moderation project for AWS, containing many submodules. It allows you to maintain lists of known content to scan for, which you can either curate yourself or connect to other hash exchange programs to share and receive lists. More information can be found on the wiki.

A second version of this project, called "Open Media Match", is under construction; it uses a cloud-agnostic, Docker-based deployment.

python-threatexchange

A Python library/CLI tool available on PyPI as threatexchange, which provides implementations for content scanning and signal exchange. It provides reference implementations in Python for downloading hashes from Meta's ThreatExchange API, scanning images with PDQ, and more. It can also be easily extended to work with other hash exchanges and other techniques, not all of which are written by the maintainers of this repository.

Meta's ThreatExchange API Reference Examples

The api-reference-examples folder contains example implementations in various languages for using the API. These implementations are at various stages of completeness and may not all implement every endpoint available on the ThreatExchange API. Full details on the ThreatExchange API and UI, data formats, and best practices are available in the ThreatExchange docs.

Meta's ThreatExchange API

ThreatExchange is a set of RESTful APIs on the Facebook Platform for querying, publishing, and sharing security threat information. It's a lightweight way for exchanging details on malware, phishing pages, and other threats with either specific members of the community or the ThreatExchange community at large.

Full details on ThreatExchange and best practices are available in the ThreatExchange docs.

Get All Available Data

For tag-driven workloads, supporting either bulk download or incremental updates, our currently recommended best practice is a Java reference design.

You can also explore the dataset using the hosted ThreatExchange UI.

Getting Access

To request access to ThreatExchange, please submit an application via https://developers.facebook.com/products/threat-exchange/.

Other Information about this Repository

Contributing

We welcome contributions! See CONTRIBUTING for details on how to get started, and our Code of Conduct.

License

All projects in this repository are under the BSD license - see ./LICENSE. However, there are some exceptions for files that were included for demonstration purposes, and their alternative licenses are noted at the top of the files themselves.

As of 12/9/2021, this is the complete list of exceptions:

  • pdq/cpp/CImg.h

threatexchange's People

Contributors

9b, barrettolson, benoit-yubo, dcallies, dependabot[bot], dmukhg, dougneal, eriksendc, evank28, fuzzball5000, hammem, htracey-godaddy, ianwal, inderh, jafischer, jeberl, jessek, johnkerl, juanmrad, maus-, meirwah, mengyangwang, mgoffin, mounikabodapati, omercnet, samyakk123, tiegz, tigerchensh, tsytrytskyi, wxsbsd

threatexchange's Issues

Invalid field "type" when querying ThreatDescriptors

When searching for ThreatDescriptors using the "objects" interface in pytx, it does not seem to match the documentation displayed on the Facebook developers site - https://developers.facebook.com/docs/threat-exchange/reference/apis/threat-descriptors.

from pytx import init
from pytx import ThreatDescriptor
from pytx.vocabulary import Types

init(app_id='<app-id>', app_secret='<app-secret>')
results = ThreatDescriptor.objects(
    text='37.59.224.217',
    strict_text=True,
    type=Types.IP_ADDRESS
)

When running the code above, the following comes back:

Traceback (most recent call last):
  File "app/tools/threat_exchange.py", line 35, in <module>
    type=Types.IP_ADDRESS
TypeError: objects() got an unexpected keyword argument 'type'

Control where "strict_text" is applied when searching

The current "strict_text" parameter is great for reducing fuzzy matches, but I've noticed that while it uses my exact query, it seems to apply across all fields. It would be helpful to specify the exact field I would like my "strict_text" query to run against.

Example use case:

from pytx import init
from pytx import ThreatDescriptor
from pytx.vocabulary import Types

init(app_id='<app-id>', app_secret='<app-secret>')
results = ThreatDescriptor.objects(
    text='haberko.com',
    strict_text=True,
    type_=Types.DOMAIN
)

When executing the above example, two different descriptors come back. One is an exact match (based on the indicator), while the other was returned only because the query text appears in its comment. My use case is to get information about the exact indicator I queried for and nothing else, so having to loop over the results to check where the match occurred is an extra step, and potentially a long one if the queried item is popular. Maybe allow the user to specify the field to match against, and default to _all as it does now.
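Until such a field selector exists, the extra filtering step can be done client-side. This is a minimal sketch, assuming the results are plain dicts (for example from dict_generator=True) that carry a raw_indicator key as in the API responses; exact_indicator_matches is a hypothetical helper, not part of pytx, and the sample data is made up.

```python
def exact_indicator_matches(results, query):
    """Keep only descriptors whose indicator value equals the query exactly."""
    return [r for r in results if r.get("raw_indicator") == query]


# Made-up results: the second one matched only through its description.
sample_results = [
    {"id": "1", "raw_indicator": "haberko.com", "description": "C2 domain"},
    {"id": "2", "raw_indicator": "other.example", "description": "seen with haberko.com"},
]

exact = exact_indicator_matches(sample_results, "haberko.com")
print([r["id"] for r in exact])
```

The cost is still one pass over every returned result, which is the friction the issue describes for popular indicators.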

[RFC] Removing classful POST capability from pytx

With 2.4 there is a very large disparity between how things are uploaded and how things are downloaded. Prior to 2.4 we were able to both GET and POST with the same class attributes and for the most part it worked. Now the differences are so large that representing a descriptor in a class doesn't in any way look like what is necessary for adding a new one to the system.

I'm thinking that we change the following:

  • GET requests can still be classful and use the generator to return instantiated objects.
  • classes no longer have a .save() method.
  • classes get a .new() method (where applicable) which takes either a dict of values or arguments which are acceptable POST parameters for adding a new object.
  • new ThreatDescriptor class
    • once POSTing to /threat_indicators/ is fully deprecated, we can remove the .new() from ThreatIndicators and people will move their code to use ThreatDescriptor instead.
  • all of the above still leverages the vocabulary.

Any c&c as to whether or not this would be a good evolution to pytx?

npm integration for node

It'd be useful to have the node app added to npm. I have an account set up and can push it to npm, but I didn't know whether the ThreatExchange owners wanted to own this one.

Unexpected keyword "status" when searching ThreatDescriptors

When searching for ThreatDescriptors using the "objects" interface in pytx, it does not seem to match the documentation displayed on the Facebook developers site - https://developers.facebook.com/docs/threat-exchange/reference/apis/threat-descriptors.

from pytx import init
from pytx import ThreatDescriptor
from pytx.vocabulary import Status

init(app_id='<app-id>', app_secret='<app-secret>')
results = ThreatDescriptor.objects(
    text='37.59.224.217',
    strict_text=True,
    status=Status.MALICIOUS
)

When running the code above, the following comes back:

Traceback (most recent call last):
  File "app/tools/threat_exchange.py", line 35, in <module>
    status=Status.MALICIOUS
TypeError: objects() got an unexpected keyword argument 'status'

Add method to search for exact text

There should be a way to search for an exact match. For example, when searching for evilevillabs.com, the query should match for exactly evilevillabs.com, not test.evilevillabs.com or i-know-somebody-at-evilevillabs.com.

Edits to the example code (in bold)

from pytx.access_token import access_token
from pytx import ThreatDescriptor
from pytx.vocabulary import ThreatDescriptor as td
from pytx.vocabulary import ThreatIndicator as ti
access_token('', '')

results = ThreatDescriptor.objects(text='www.facebook.com')
for result in results:
    print(result.get(td.THREAT_TYPE))

results = ThreatDescriptor.objects(type_='IP_ADDRESS',
                                   text='127.0.0.1')
for result in results:
    print(result.get(ti.INDICATOR))

pytx: Searching for a ThreatDescriptor shows 'privacy_type': None

I added a ThreatDescriptor with a privacy_type of "HAS_PRIVACY_GROUP", but searching for it in pytx is returning "None" as the privacy_type. Just to note, the indicator is not sensitive. I was just testing. Details below.

Using pytx, I added the following ThreatDescriptor:

# Let's try to submit a new Threat Descriptor
from pytx import ThreatDescriptor
from pytx.vocabulary import ThreatDescriptor as tdv
from pytx.vocabulary import Types, Precision, PrivacyType, ReviewStatus, Severity, ShareLevel, Status

params = {
    tdv.INDICATOR : 'http://212.154.211.81/giz.exe',
    tdv.TYPE : Types.URI,
    tdv.CONFIDENCE : 75,
    tdv.DESCRIPTION : 'Ransomware download URL',
    tdv.PRECISION : Precision.MEDIUM,
    tdv.PRIVACY_MEMBERS : '1125937020771155', # CatFanciers ID
    tdv.PRIVACY_TYPE : PrivacyType.HAS_PRIVACY_GROUP,
    tdv.REVIEW_STATUS : ReviewStatus.REVIEWED_AUTOMATICALLY,
    tdv.SEVERITY : Severity.SUSPICIOUS,
    tdv.SHARE_LEVEL : ShareLevel.AMBER,
    tdv.STATUS : Status.MALICIOUS,
    tdv.TAGS : 'sage,ransomware,http_request,malware',
}

result = ThreatDescriptor.new(params=params)
print(result)

The following response returned: {'id': '1447134161986113', 'success': True}

Then I took a look at the indicator directly in my browser with: https://graph.facebook.com/v2.8/1447134161986113/?access_token=[REDACTED]

It showed what I would expect:

{
   "added_on": "2017-02-23T16:38:27+0000",
   "id": "1447134161986113",
   "indicator": {
      "indicator": "http://212.154.211.81/giz.exe",
      "type": "URI",
      "id": "1447134155319447"
   },
   "owner": {
      "id": "1678314142420566",
      "email": "nlhausrath\u0040ashland.com",
      "name": "Ashland CIRT"
   },
   "type": "URI",
   "raw_indicator": "http://212.154.211.81/giz.exe",
   "description": "Ransomware download URL",
   "status": "MALICIOUS",
   "privacy_type": "HAS_PRIVACY_GROUP",
   "share_level": "AMBER"
}

However, when I did the following, privacy_type was set to 'None':

from pytx import ThreatDescriptor

results = ThreatDescriptor.objects(
    text='giz.exe',
    owner='1678314142420566', # me
)

for result in results:
    print(result.to_dict())

The following was printed:

{'privacy_members': None, 'severity': 'SUSPICIOUS', 'owner': {'id': '1678314142420566', 'name': 'Ashland CIRT', 'email': 'nlhausrath@ashland.com'}, 'privacy_type': None, 'source_uri': '', 'id': '1447134161986113', 'share_level': 'AMBER', 'expired_on': None, 'precision': 'MEDIUM', 'review_status': 'REVIEWED_AUTOMATICALLY', 'metadata': None, 'indicator': {'type': 'URI', 'id': '1447134155319447', 'indicator': 'http://212.154.211.81/giz.exe'}, 'status': 'MALICIOUS', 'my_reactions': None, 'raw_indicator': 'http://212.154.211.81/giz.exe', 'type': 'URI', 'description': 'Ransomware download URL', 'added_on': '2017-02-23T16:38:27+0000', 'last_updated': '2017-02-23T16:38:28+0000', 'tags': {'data': [{'id': '1382721905133632', 'text': 'http_request'}, {'id': '1375757795798370', 'text': 'ransomware'}, {'id': '1318516441499594', 'text': 'malware'}, {'id': '595090370615714', 'text': 'sage'}]}, 'confidence': 75}

Am I doing something wrong or is this the wrong expectation? Thanks!

RFC: Python Library for ThreatExchange

I wanted to build a Python Library that would allow developers to quickly integrate with ThreatExchange. I've started the work here:

https://github.com/mgoffin/ThreatExchange/tree/pytx
(comparison: https://github.com/facebook/ThreatExchange/compare/master...mgoffin:pytx)

The goals I have for this are:

  • Easy to install (pip install pytx).
  • Easy to work with for quick prototyping and production-quality development.
    • Not just a wrapper around the API. Provide easy-to-use methods for tasks.
    • Rich vocabulary list to make code more resilient against name-changes.
    • Thorough documentation (both in code and elsewhere) so getting up-to-speed is quick and painless.
  • Useful results that are easy to loop over, parse, and ingest.
  • Classful interfaces to the different object types and results.
  • Flexible for future API enhancements.
  • Test cases to keep code in working order and automate builds for validation.
  • Example scripts to get people started.

The code is still very much a WIP but I wanted to get it out there for visibility, comments, and direction. I think it is a decent foundation to work from, but a lot can be done to make this better. Some of the goals haven't been started because I think the foundation needs to mature a bit before they can be worked on (like Classful objects and results, example scripts, test cases, more documentation, etc.).

Ultimately I'd like to get this in a state where it's stable enough to submit a PR, get some development interest from other community members, then get it up on PyPi for quick installation.

Here's an example of using this library in its current form:

import pytx
import pytx.vocabulary as v

p = pytx.pytx('<app-id>', '<app-secret>')

# Find malware using approximate matching for "www.facebook.com"
i = p.malware_analyses(text="www.facebook.com")

# Find Indicators using strict matching for "www.facebook.com"
i = p.threat_indicators(text="www.facebook.com", strict_text=True)

# Get a list of ThreatExchange members.
i = p.threat_exchange_members()

# Get a list of malware objects associated with a specific object.
i = p.objects('<object-id>', connection=v.Connection.MALWARE_ANALYSES)

# Quickly inspect and loop over results
print i
for x in i:
    print x

Thanks!

pytx support for previous versions

When the 2.4 Graph API came out there were some significant changes to the ThreatExchange API. pytx accurately reflects those changes - but doesn't look to make an effort to enable use of the previous version.

I would like to use pytx to query the old /threat_indicators endpoint. But I don't see a parameter or method to override the API version used by Common.

Recent changes also blew away the fields that were associated with the ThreatIndicators class as well.

I understand the value of staying current. But for me to use the deprecated but still available v2.3 API Call /threat_indicators I would need to downgrade my pytx module.

Support could be maintained for previous versions - or perhaps the line should be drawn that the module only supports the latest version.

Odd response count inconsistencies

I wrote the following script:

#!/usr/bin/env python

from pytx import ThreatDescriptor
import time

def foo():
    start = time.time()
    c = 0
    for t in ThreatDescriptor.objects(text="facebook.com", dict_generator=True):
        c += 1
    end = time.time()
    print("time: ", end - start)
    print("count: ", c)

foo()

Several executions resulted in the following:

('time: ', 10.145052909851074)
('count: ', 200)

('time: ', 6.017927169799805)
('count: ', 175)

('time: ', 120.53107500076294)
('count: ', 6141)

('time: ', 61.75989103317261)
('count: ', 3275)

('time: ', 36.055991888046265)
('count: ', 2575)

('time: ', 40.71333408355713)
('count: ', 1600)

('time: ', 18.10604500770569)
('count: ', 825)

It seems very odd that the times and counts are so far off. Can anyone else reproduce the wide array of times and counts that I am seeing?

pytx support for batch queries

The Graph API allows for this and right now the only way to do so is using the raw argument to .objects() (I'm not sure if that works without also having to set full_response or dict_generator to True).

pytx should have a simple interface for making these types of requests.
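One possible shape for such an interface is sketched below. It assumes the Graph API batch format, in which a single POST carries a JSON-encoded batch array of objects with method and relative_url keys; build_batch_payload is a hypothetical helper, not an existing pytx function, and the token is a placeholder.

```python
import json
from urllib.parse import urlencode


def build_batch_payload(access_token, queries):
    """Build the form fields for one Graph API batch POST.

    `queries` is a list of (relative_path, params) tuples; each becomes
    one GET sub-request inside the batch.
    """
    batch = [
        {"method": "GET", "relative_url": f"{path}?{urlencode(params)}"}
        for path, params in queries
    ]
    return {"access_token": access_token, "batch": json.dumps(batch)}


payload = build_batch_payload(
    "APP_ID|APP_SECRET",  # placeholder token
    [
        ("threat_descriptors", {"text": "evil.com", "strict_text": "true"}),
        ("threat_descriptors", {"text": "127.0.0.1", "type": "IP_ADDRESS"}),
    ],
)
print(payload["batch"])
```

The payload would then be sent as the form body of a single POST to the Graph API host; the sending step is left out so the sketch stays runnable offline.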

Documentation for threat_descriptors submission should specify URL path correctly

https://developers.facebook.com/docs/threat-exchange/reference/submitting/v2.5

Where it says:

You may submit data to the graph via an HTTP POST request the following URL:
https://graph.facebook.com/threat_descriptors

Turns out nothing will work unless you specify the platform version you want to use:
https://graph.facebook.com/v2.4/threat_descriptors

Took me several hours figuring that one out, proxying the request, looking at the postdata... :( I finally found a URL in some code that had the platform version in the URL path and decided to try it out.
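A small helper that always puts the platform version in the path avoids this trap. This is a minimal sketch; graph_url is a hypothetical name, and the default version simply mirrors the working URL above.

```python
GRAPH_HOST = "https://graph.facebook.com"


def graph_url(endpoint: str, version: str = "v2.4") -> str:
    """Return a Graph API URL with the platform version in the path."""
    return f"{GRAPH_HOST}/{version}/{endpoint.strip('/')}"


print(graph_url("threat_descriptors"))
# Other versions are selected explicitly, e.g. graph_url("threat_indicators", "v2.3")
```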

pytx: why does a day return more ThreatDescriptors than a month?

Hi everyone,

I'm just starting to explore ThreatExchange with pytx, and I'm getting odd results.

ThreatDescriptor.objects(since="yesterday", until="now")

Returns 500+ results

ThreatDescriptor.objects(since="-1 month", until="now")

Returns 4 results

What am I missing? Shouldn't that return up to 1000 results?
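One way to rule out ambiguity in relative time strings when debugging this kind of discrepancy is to pass explicit Unix timestamps. The sketch below only builds the window; time_window is a hypothetical helper, and it does not by itself explain the differing result counts.

```python
from datetime import datetime, timedelta, timezone


def time_window(days_back, now=None):
    """Return explicit since/until Unix timestamps covering `days_back` days."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days_back)
    return {"since": int(start.timestamp()), "until": int(now.timestamp())}


# A fixed reference time keeps the example reproducible.
fixed_now = datetime(2015, 6, 1, tzinfo=timezone.utc)
day = time_window(1, now=fixed_now)
month = time_window(30, now=fixed_now)
print(day, month)
```

Comparing the results of the one-day and thirty-day windows built this way shows whether the discrepancy comes from how the relative strings are parsed or from the query itself.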

Nuke UI code?

The UI code originally written by @mgoffin was put together on a whim as a PoC and has not been touched in over a year. It is outdated and I have privately seen one bug report in it, though I am unable to replicate the bug. I think it may be time to nuke that code as it isn't being maintained. If anyone agrees I'll toss up a PR and kill it.

Copy/Paste code examples in API documentation

The value of a copy/paste example when looking at API documentation cannot be overstated. In the current API documents, endpoints and GET/POST content is shown, but it would be nice to have an example I could just copy/paste (think how Google does it) in the top 3-4 languages (Python, Java, PHP).

While it's not terribly difficult to convert the outlined requests into code, it's yet another set of steps that stop me from instantly plugging ThreatExchange into my project. Additionally, it could also be useful to include a hosted version of the JSON code, so I could load it directly from the FB developer site for testing or download it locally.

Add field to control which field is searched with the 'text' parameter

By default, we search all of the available fields using what's supplied in the 'text' parameter. Currently, the 'strict_text' parameter limits the search to the primary field (indicator or name). We should create a new field, 'text_field', which specifies which field should be searched using the 'text' parameter. For example,

/threat_descriptors?text="Jesse's best C2"&text_field=description&strict_text=1

should search the description field for exact matches to "Jesse's best C2".
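A client could build such a request as sketched below. Note that text_field is the parameter proposed in this issue, not a confirmed API feature, and descriptor_search_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def descriptor_search_url(text, text_field=None, strict_text=False):
    """Build a /threat_descriptors query string with URL-encoded parameters."""
    params = {"text": text}
    if text_field is not None:
        params["text_field"] = text_field  # proposed parameter (assumption)
    if strict_text:
        params["strict_text"] = 1
    return "/threat_descriptors?" + urlencode(params)


url = descriptor_search_url("Jesse's best C2", text_field="description", strict_text=True)
print(url)
```

Encoding the text value matters here because it contains spaces and an apostrophe.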

See the discussion in #84 for background.

Github hook for ReadTheDocs

pytx is set up on readthedocs (pytx.readthedocs.org), but in order for it to kick off a new build it needs to be notified (they don't seem to rebuild on their own).

There are ways to hook Github into readthedocs: https://docs.readthedocs.org/en/latest/webhooks.html

readthedocs gives me the following post-commit hook:

curl -X POST http://readthedocs.org/build/pytx

I am not sure how the ReadTheDocs Github app works with the above directions, but it would make life a lot easier knowing when a commit makes it to master the docs are rebuilt automagically :)

Dateutil version

Hi guys,

I ran:

python2.7 setup.py install

Which ended successfully.

Running your example as is yields the following error in my system:

> python ./get_all_data/get_threat_indicators.py --text="facebook" --days_back=10
Traceback (most recent call last):
  File "./get_all_data/get_threat_indicators.py", line 65, in <module>
    main()
  File "./get_all_data/get_threat_indicators.py", line 33, in main
    utils.get_time_params(s.end_date, day_counter, format_)
  File "build/bdist.macosx-10.11-x86_64/egg/pytx/utils.py", line 62, in get_time_params
AttributeError: 'module' object has no attribute 'parser'

To fix it, I changed utils.py as follows:

diff --git a/pytx/pytx/utils.py b/pytx/pytx/utils.py
index a42dc50..5538f3d 100644
--- a/pytx/pytx/utils.py
+++ b/pytx/pytx/utils.py
@@ -1,4 +1,4 @@
-import dateutil
+import dateutil.parser as dateutil_parser
 import datetime


@@ -59,7 +59,7 @@ def get_time_params(end_date, day_counter, format_):
     """
     # We use dateutil.parser.parse for its robustness in accepting different
     # datetime formats
-    until_param = dateutil.parser.parse(end_date) - \
+    until_param = dateutil_parser.parse(end_date) - \
         datetime.timedelta(days=day_counter)
     until_param_string = until_param.strftime(format_)

And then I ran python2.7 setup.py install again, after which the get_threat_indicators.py ran successfully.

I noticed that, in setup.py, and also in requirements.txt (why the duplication?) you are specifying the required version of dateutil as 2.5.2, but that seems to be ignored by the system. I'm not an expert in Python builds so maybe all of this is not that useful for you, but I thought I'd give you guys a heads up. Please close if irrelevant. Thanks for the very interesting work 😄

Add __version__ somewhere.

It would be sweet if I could do:

import pytx
print pytx.__version__

And have it print the current pytx version number.
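One way a package could expose this is by reading the installed distribution's metadata. This is a minimal sketch, assuming Python 3.8+ (for importlib.metadata); installed_version is a hypothetical helper, and whether pytx would adopt this approach is not stated in the issue.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed distribution's version string, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Inside a package's __init__.py this could back a __version__ attribute:
# __version__ = installed_version("pytx")
print(installed_version("pytx"))
```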

duplicated python scripts are confusing

[ivanlei@some_machine:~/source/ThreatExchange] $ find . -name 'get_*.py'
./malware/get_samples.py
./members/get_members.py
./pytx/scripts/get_compromised_credentials.py
./pytx/scripts/get_indicators.py
./pytx/scripts/get_members.py
./threat_indicators/get_compromised_credentials.py
./threat_indicators/get_indicators.py

So there's:

  • 2 copies of get_compromised_credentials.py
  • 2 copies of get_indicators.py

and there's basically 4 directories with scripts in them:

  • malware/
  • members/
  • pytx/scripts/
  • threat_indicators/

I suggest all the scripts be moved into pytx/scripts/ so they can be installed as part of the pytx Python package.

InsecurePlatformWarning from requests on OSX with default python

$ python scripts/get_compromised_credentials.py --since=2015-05-01 --until=2015-06-01
READING https://graph.facebook.com/threat_indicators/
/Users/ivanlei/virtual_envs/ThreatExchange/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning

This is easily fixed. PR incoming.

API call to identify if "x was added to ThreatExchange" already

In order to identify if some indicator/descriptor has already been added to ThreatExchange for my app, I need to run a complicated query and then parse the results in an effort to surface the potential item. If I don't do this, I run into the potential of overwriting my existing input.

As an example use case, take the following:

I want to identify if I currently have a threat descriptor for evil.com. My query would need to include the following to potentially surface any existing TDs: owner as me, indicator as evil.com, potentially type and strict_text to avoid fuzzy matching. Even with a detailed query like that, it's entirely possible that I would get more than just my evil.com threat descriptor as evil.com could lie in the description of other descriptors I have sent in. So, I would need to parse the results, looking to ensure the "indicator" field was an exact match.

The way to combat this for now is to store FBIDs locally inside of some localized database, but assuming I have already sent data in, the above process is what I need to go through just to identify if I have already pushed data for a specific indicator. There's a lot of friction there.
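The local-database workaround can be sketched with SQLite from the standard library. The table layout and the function names are assumptions for illustration; the FBID below is a made-up placeholder.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a local store mapping (indicator, type) to an FBID."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS descriptors ("
        " indicator TEXT NOT NULL, type TEXT NOT NULL, fbid TEXT NOT NULL,"
        " PRIMARY KEY (indicator, type))"
    )
    return conn


def remember(conn, indicator, type_, fbid):
    """Record the FBID returned when a descriptor was submitted."""
    conn.execute(
        "INSERT OR REPLACE INTO descriptors VALUES (?, ?, ?)",
        (indicator, type_, fbid),
    )
    conn.commit()


def lookup(conn, indicator, type_):
    """Return the stored FBID for an exact indicator/type pair, or None."""
    row = conn.execute(
        "SELECT fbid FROM descriptors WHERE indicator = ? AND type = ?",
        (indicator, type_),
    ).fetchone()
    return row[0] if row else None


store = open_store()
remember(store, "evil.com", "DOMAIN", "1234567890")
print(lookup(store, "evil.com", "DOMAIN"), lookup(store, "sub.evil.com", "DOMAIN"))
```

This only covers descriptors submitted after the store exists; data already pushed would still need the query-and-parse process described above to backfill it.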

pytx should set sane defaults for fields returned and use them by default

Each main class has an _default_fields attribute which is a list of fields that are supposed to be "default fields returned by ThreatExchange when you perform a query."

We should change this so that _default_fields is a list of "useful" fields determined by those that use the data. Once those lists are fixed up, we should change .objects() to use what is in cls._default_fields if the fields argument is still None. This will allow people to override and specify the fields they want to use, otherwise they will get what pytx considers the default ones.
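The proposed fallback can be sketched as follows. The class, the field list and build_params are hypothetical stand-ins for illustration; the real change would live in pytx's .objects().

```python
class Descriptor:
    # Hypothetical "useful" fields; the real list would be chosen by
    # those who use the data.
    _default_fields = ["id", "indicator", "type", "status"]

    @classmethod
    def build_params(cls, fields=None, **query):
        """Use the caller's fields if given, else fall back to the class defaults."""
        chosen = cls._default_fields if fields is None else fields
        params = dict(query)
        params["fields"] = ",".join(chosen)
        return params


print(Descriptor.build_params(text="evil.com"))
print(Descriptor.build_params(fields=["id"], text="evil.com"))
```

Checking for None rather than truthiness keeps an explicitly passed empty list distinct from "no preference".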

Long duration in pulling Malware Connections

Reopening the issue reported in bug #46.

I ran another experiment today. Below are the timings for pulling dropped, dropped_by and malware over a 3-minute window. Each data set was pulled with since=1433167932 and until=1433168114 (i.e. a 3-minute window). Note that malware took less than a minute, but dropped_by and dropped each took more than 5 minutes to pull 3 minutes of data. Is this expected, or a known issue that is being worked on? Thanks!

$ ./malware.py
Execution Started: 2015/08/28 20:26:43
Execution Completed: 2015/08/28 20:27:23

$ ./dropped_by.py
Execution Started: 2015/08/28 20:29:32
Execution Completed: 2015/08/28 20:34:39

$ ./dropped.py
Execution Started: 2015/08/28 20:38:52
Execution Completed: 2015/08/28 20:44:18

HTTP 500 Error - Internal Error for huge samples

When trying to request "huge" samples it will cause a HTTP 500 - Internal Error.

Here are some example requests:

FB internal Error(500): https://graph.facebook.com/1068651733168127?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&fields=password%2C+sample%2C+added_on%2C+crx%2C+xpi%2C+id%2C+status%2C+victim_count%2C+submitter_count
FB internal Error(500): https://graph.facebook.com/971900736209108?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&fields=password%2C+sample%2C+added_on%2C+crx%2C+xpi%2C+id%2C+status%2C+victim_count%2C+submitter_count
FB internal Error(500): https://graph.facebook.com/930432390327138?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&fields=password%2C+sample%2C+added_on%2C+crx%2C+xpi%2C+id%2C+status%2C+victim_count%2C+submitter_count
FB internal Error(500): https://graph.facebook.com/976251609114096?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&fields=password%2C+sample%2C+added_on%2C+crx%2C+xpi%2C+id%2C+status%2C+victim_count%2C+submitter_count
FB internal Error(500): https://graph.facebook.com/934348283307805?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&fields=password%2C+sample%2C+added_on%2C+crx%2C+xpi%2C+id%2C+status%2C+victim_count%2C+submitter_count
FB internal Error(500): https://graph.facebook.com/890422987721923?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&fields=password%2C+sample%2C+added_on%2C+crx%2C+xpi%2C+id%2C+status%2C+victim_count%2C+submitter_count
FB internal Error(500): https://graph.facebook.com/898279190257733?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&fields=password%2C+sample%2C+added_on%2C+crx%2C+xpi%2C+id%2C+status%2C+victim_count%2C+submitter_count
FB internal Error(500): https://graph.facebook.com/997472226992851?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&fields=password%2C+sample%2C+added_on%2C+crx%2C+xpi%2C+id%2C+status%2C+victim_count%2C+submitter_count
FB internal Error(500): https://graph.facebook.com/914303301973741?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&fields=password%2C+sample%2C+added_on%2C+crx%2C+xpi%2C+id%2C+status%2C+victim_count%2C+submitter_count
FB internal Error(500): https://graph.facebook.com/721750407926819?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&fields=password%2C+sample%2C+added_on%2C+crx%2C+xpi%2C+id%2C+status%2C+victim_count%2C+submitter_count
FB internal Error(500): https://graph.facebook.com/744778702289443?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&fields=password%2C+sample%2C+added_on%2C+crx%2C+xpi%2C+id%2C+status%2C+victim_count%2C+submitter_count
FB internal Error(500): https://graph.facebook.com/847333382050416?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&fields=password%2C+sample%2C+added_on%2C+crx%2C+xpi%2C+id%2C+status%2C+victim_count%2C+submitter_count
FB internal Error(500): https://graph.facebook.com/989351084444472?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&fields=password%2C+sample%2C+added_on%2C+crx%2C+xpi%2C+id%2C+status%2C+victim_count%2C+submitter_count
FB internal Error(500): https://graph.facebook.com/935900829820480?access_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&fields=password%2C+sample%2C+added_on%2C+crx%2C+xpi%2C+id%2C+status%2C+victim_count%2C+submitter_count

Name change for ruby lib.

I'd like to propose changing the Ruby lib name from ThreatExchange to threat_exchange everywhere.

Snakecase is the most common convention in Ruby (e.g. list of gems on Rubygems).

It's mostly a cosmetic change, but could help adoption of the library.

Allow custom headers for pytx

Similar to how we allow people to set a proxy or set up logging, we should allow them to set custom headers and then use those headers for all subsequent requests.

Example code:

headers = {
    'User-Agent': 'Foo'
}

response = requests.get(url, headers=headers)
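A possible shape for the feature is a module-level default that is merged into every request's headers. This is a sketch only; set_default_headers and merged_headers are hypothetical names, not existing pytx functions.

```python
_default_headers = {}


def set_default_headers(headers):
    """Replace the headers applied to every subsequent request."""
    _default_headers.clear()
    _default_headers.update(headers)


def merged_headers(extra=None):
    """Combine the defaults with per-request headers; per-request values win."""
    merged = dict(_default_headers)
    merged.update(extra or {})
    return merged


set_default_headers({"User-Agent": "Foo"})
print(merged_headers({"Accept": "application/json"}))
```

The merged dict would then be passed as the headers argument of each requests call, as in the example above.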

pytx fetch error in fetching malware_connection_families

I was able to pull the families about two weeks ago, but now the same code fails with the following error. Please advise.

Traceback (most recent call last):
  File "pull_malware_test3.py", line 93, in <module>
    for details in Malware.details(id=result['id'], connection="families", dict_generator=True):
  File "/usr/local/lib/python2.7/site-packages/pytx/request.py", line 272, in get_generator
    results = cls.get(url, params)
  File "/usr/local/lib/python2.7/site-packages/pytx/request.py", line 202, in get
    return cls.handle_results(resp)
  File "/usr/local/lib/python2.7/site-packages/pytx/request.py", line 113, in handle_results
    resp.text))
pytx.errors.pytxFetchError: Response code: 500: {"error":{"message":"An unknown error has occurred.","type":"OAuthException","code":1}}

Search does not recognize '@' symbol

The ThreatExchange search does not appear to recognize the '@' symbol in searches. Searching for @evilevil, for example, returns some results which contain "evilevil" but not the '@' symbol.

Don't remove old versions of pytx from pypi

pytx is awesome and we've started using it to talk to ThreatExchange at Bitly. We had pinned our pytx version to 0.1.0, which is no longer available on PyPI.

I propose keeping older versions on PyPI unless there's a security hole or another glaring need to remove them. Of course, the ThreatExchange API might change to such a degree that older versions of pytx are no longer useful. I'm not sure what to do in that case.

Thanks!

Updates to Threat Descriptors Fail

I am attempting to update the status of a threat descriptor, but the change doesn't stick even though the API returns success. I have tried this through pytx, and have used requests below as an example. Note that I am including the params as both data (POST body) and params (query string) just in case, but I have tried each independently with no luck.

import logging
import requests
from pytx.vocabulary import ThreatExchange as te

# Log the outgoing HTTP request so the exact URL is visible
logging.basicConfig(level=logging.DEBUG)

auth = {
    'app_token': 'APP_TOKEN',
    'app_secret': 'APP_SECRET'
}
params = {'status': 'MALICIOUS'}
# Combine the two credentials into a single access token
access_token = auth['app_token'] + '|' + auth['app_secret']
url = te.URL + '907174782669741'
url = "%s?access_token=%s" % (url, access_token)
# Send the new status both in the query string and in the POST body
response = requests.post(
    url,
    params=params,
    data=params
)
print(response.content)

Here's the debugging output I am seeing:

INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): graph.facebook.com
DEBUG:requests.packages.urllib3.connectionpool:"POST /907174782669741?access_token=<APP_TOKEN>%7C<APP_SECRET>&status=MALICIOUS HTTP/1.1" 200 16
Response: {"success":true}

object "type" field only included in metadata

I had this conversation with @jessek, @mrichard91, and @hammem, but I don't think we got anywhere.

The API allows you to query a specific object by identifier. The URL looks something like this:

https://graph.facebook.com/<object-id>

With only the identifier in the URL, the person querying the API cannot know what type of object will be in the response. One could add ?metadata=1 to the query, but that bloats the response with a ton of extra information about each field and what it means, in addition to the object's type.

From a development standpoint this becomes a bit cumbersome. Specifically in pytx it causes a bit of a problem when querying for details of an object as can be found here:

https://github.com/facebook/ThreatExchange/blob/master/pytx/pytx/common.py#L209

Users have run into situations where they felt that pytx was "stripping" fields out of the response. It looked that way because they were populating one class with the fields from another, so only the fields common to both were being applied.

There is the issue that a ThreatIndicator has a type field already, so adding the object's type field would clobber the name, but I still recommend adding a field somewhere which denotes the type of object a developer is working with. In many cases context will prevail (ex: I queried /threat_indicators so I'm getting ThreatIndicator objects back) but there are many cases where it won't and an automated system needs to determine what it's working with.
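
As a rough sketch of what an automated system has to do today, the type can be read from the metadata block and used to pick a class. The assumptions here are loud ones: that a `?metadata=1` response, once parsed, carries the object type at `response['metadata']['type']`, and that the type strings and class names below match what the API returns. They are placeholders for illustration, not confirmed values.

```python
class _Base(object):
    """Minimal stand-in for a pytx-style object; holds the raw fields."""

    def __init__(self, fields):
        self.fields = fields


class ThreatIndicator(_Base):
    pass


class Malware(_Base):
    pass


# Hypothetical mapping from the metadata type string to a class.
TYPE_TO_CLASS = {
    'threat_indicator': ThreatIndicator,
    'malware': Malware,
}


def object_type(response):
    """Return the type reported in the metadata block, or None if absent."""
    return (response.get('metadata') or {}).get('type')


def build_object(response):
    """Instantiate the class matching the reported object type."""
    cls = TYPE_TO_CLASS.get(object_type(response))
    if cls is None:
        raise ValueError('unknown or missing object type')
    # Drop the bulky metadata block; the object's own fields, including a
    # ThreatIndicator's own "type" field, are kept untouched.
    fields = {k: v for k, v in response.items() if k != 'metadata'}
    return cls(fields)


sample = {
    'id': '1',
    'type': 'IP_ADDRESS',
    'metadata': {'type': 'threat_indicator'},
}
obj = build_object(sample)
```

Because the object type lives under `metadata` rather than at the top level, it does not clobber the indicator's own `type` field, which is the naming clash described above.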
