ocsf / ocsf-schema
OCSF Schema
License: Apache License 2.0
The goal is to simplify (reduce) the number of profiles defined in the OCSF v1.0 release. The Reputation and Domain Security profiles look somewhat similar, and we might be able to merge them.
Propose to alter the cloud activity_id to:
{
"enum": {
"1": {
"caption": "Login",
"description": "The event pertains to login activity."
},
"2": {
"caption": "Management",
"description": "The event pertains to management activity (e.g. policy updates, user creations, subnet creations, etc.)."
},
"3": {
"caption": "Operational",
"description": "The event pertains to cloud resource operations activity (e.g. data downloads, launched virtual machines, etc.)."
}
}
}
Originally posted by rroupski July 19, 2022
Attributes that are either generated or derived by the collection, post-collection processing, or storage systems other than the mapping process are designated Reserved. The current list of reserved attributes is:
_logged_time
_observables
_raw_data
_unmapped
This discussion is about whether the last 3 attributes should be reserved or not.
The observables attribute should be generated based on the input data and the schema. In other words, the observables data should not be manually added by the source that generated the event.
The raw_data attribute contains the original data as generated by the source. If the event source creates events in the OCSF Schema, then raw_data should not be used.
The unmapped attribute contains the attributes that are not defined by the OCSF Schema. If the event source creates events in the OCSF Schema, then the unmapped attribute could be used to add additional attributes that are not defined by the schema.
Ranges for extensions / collaborators are required for the initial OCSF release.
First release Categories, checked if complete and merged into main:
In the container schema, the following item is defined:
"sha2": {
"description": "Commit hash of image created for docker or the SHA256 hash of the container. For example: a3bf90e006b2.",
"group": "context",
"requirement": "optional"
},
The field "sha2" is ambiguous in a few respects:
Finally, it may make sense to define this as a set of hashes. If sha256 is broken, it'll be easier to adapt to a set of hashes rather than a specifically hard coded value. One good example we can follow is from the In-Toto project where hashes are presented in a dictionary object with each hash explicitly listed, rather than attached to the root object. See https://github.com/in-toto/docs/blob/v0.9/in-toto-spec.md#421-hash-object-format for more information.
Hope this all helps! I'm happy to open these changes as a pull request if that would be useful.
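As a rough illustration of the "set of hashes" idea above, the sketch below builds a dictionary keyed by algorithm name, in the spirit of the In-Toto hash object. The `hash_set` helper, the `hashes` attribute name, and the container name are assumptions made for this example; none of them are defined by OCSF.

```python
import hashlib


def hash_set(data: bytes, algorithms=("sha256", "sha512")) -> dict:
    """Return a dictionary mapping each algorithm name to its hex digest."""
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


# Hypothetical container record: the hashes live in their own dictionary
# instead of a single hard-coded "sha2" attribute on the root object.
container = {
    "name": "example-container",
    "hashes": hash_set(b"example image bytes"),
}
```

If an algorithm is later deprecated, a producer could add or drop a key in `hashes` without renaming any attribute.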
Propose to update the activity_id for the cloud activity category. Specifically to:
{
"enum": {
"1": {
"caption": "Success",
"description": "The API call was successful."
},
"2": {
"caption": "Failed",
"description": "The API call failed."
}
}
}
We could get a bit more specific, but maybe we can leave that up to event classes that extend the cloud activity class. Below are some examples of event outcomes for various cloud logs (small samples, so they might not be comprehensive):
CloudTrail Error Codes (a small sample, but these can essentially be mapped to success and fail):
GCP Audit logs have a variety too, but they could easily be mapped to success and fail.
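A minimal sketch of the success/fail mapping proposed above, assuming a CloudTrail-style record signals an error through a non-empty `errorCode` field. The function name and the sample records are hypothetical and only illustrate the idea.

```python
SUCCESS = 1  # proposed caption: "Success"
FAILED = 2   # proposed caption: "Failed"


def cloudtrail_activity_id(record: dict) -> int:
    """Map a CloudTrail-style record to the proposed activity_id.

    Assumption: a record carrying a non-empty "errorCode" is a failed call;
    every other record is treated as successful.
    """
    return FAILED if record.get("errorCode") else SUCCESS


# Hypothetical sample records.
ok_call = {"eventName": "GetObject"}
denied_call = {"eventName": "GetObject", "errorCode": "AccessDenied"}
```

Event classes that extend the cloud activity class could still keep the original error code in a more specific attribute.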
In order to promote interoperability, OCSF must define a "schema", not just a "schema framework". The data that goes into logging information must be defined across vendors, not just "captioned".
Consider dictionary.json:
"account_type": {
"caption": "Account Type ID",
"description": "The user account type (e.g. AWS, LDAP, Windows account, etc.).",
"type": "string_t"
},
"account_type_id": {
"caption": "Account Type ID",
"description": "The user account type identifier (e.g. AWS, LDAP, Windows account, etc.).",
"enum": {
"-1": {
"caption": "Other",
"description": "The user account type is not mapped."
},
"0": {
"caption": "Unknown",
"description": "The user account type is unknown."
},
"1": {
"caption": "LDAP Account"
},
"2": {
"caption": "Windows Account"
},
"3": {
"caption": "AWS IAM Account"
},
"4": {
"caption": "GCP Account"
},
"5": {
"caption": "Azure AD Account"
}
},
"type": "integer_t"
},
This is a framework for an enumeration, but OCSF defines no value for the "account_type" "string_t". An information model (abstract schema) does define enumerations:
ID | Name | Description |
---|---|---|
-1 | ? | Other: The user account type is not mapped. |
0 | ? | Unknown: The user account type is unknown. |
1 | ? | LDAP Account: |
2 | ? | Windows Account: |
3 | ? | AWS IAM Account: |
4 | ? | GCP Account: |
5 | ? | Azure AD Account: |
The name column (the string_t account_type) is undefined. Which means that when looking at, for example, Splunk logs, OCSF provides no guidance:
<TS> phonenumber=333-444-4444, app=angrybirds, installdate=xx/xx/xx, acct=Windows Account
<TS> phonenumber=333-444-4444, app=facebook, installdate=yy/yy/yy, acct=Azure AD Account
Using captions might work for comma-separated data fields (assuming captions prohibit commas), but it definitely will not work for space-separated data:
<TS>
USER ACCT PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
Root Windows Account 41 21.9 1.7 3233968 143624 ?? Rs 7Jul11 48:09.67 /System/Library/foo
Rdas Azure AD Account 790 4.5 0.4 4924432 32324 ?? S 8Jul11 9:00.57 /System/Library/baz
Enumeration names enable interchangeable logging data:
ID | Name | Description |
---|---|---|
-1 | other | Other: The user account type is not mapped. |
0 | unknown | Unknown: The user account type is unknown. |
1 | ldap | LDAP Account: |
2 | windows | Windows Account: |
3 | aws_iam | AWS IAM Account: |
4 | gcp | GCP Account: |
5 | azure_ad | Azure AD Account: |
enables
<TS>
USER ACCT PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
Root windows 41 21.9 1.7 3233968 143624 ?? Rs 7Jul11 48:09.67 /System/Library/foo
Rdas azure_ad 790 4.5 0.4 4924432 32324 ?? S 8Jul11 9:00.57 /System/Library/baz
Defining enumerated strings is the rationale for formatting "enum" entries with both a property name and an integer id, as proposed in Issue #214:
"account_type_id": {
"caption": "Account Type ID",
"name": "AccountType",
"description": "The user account type identifier (e.g. AWS, LDAP, Windows account, etc.).",
"enum": {
"other": {
"caption": "Other",
"description": "The user account type is not mapped.",
"id": -1
},
...
An example schema containing just Enumerated data types defined in the OCSF enums folder is available here. The OCSF files could easily be updated to define both datatype names and property names.
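To make the proposal concrete, here is a small sketch of how a consumer might translate between integer ids and enumerated names once every item has both. The names come from the table above; the `ACCOUNT_TYPE_ENUM` structure and the two helper functions are assumptions for illustration, not part of the OCSF schema.

```python
# Enum keyed by property name, with the integer id as a field,
# following the shape proposed above.
ACCOUNT_TYPE_ENUM = {
    "other": {"id": -1, "caption": "Other"},
    "unknown": {"id": 0, "caption": "Unknown"},
    "ldap": {"id": 1, "caption": "LDAP Account"},
    "windows": {"id": 2, "caption": "Windows Account"},
    "aws_iam": {"id": 3, "caption": "AWS IAM Account"},
    "gcp": {"id": 4, "caption": "GCP Account"},
    "azure_ad": {"id": 5, "caption": "Azure AD Account"},
}


def name_for_id(enum: dict, value_id: int):
    """Return the enumerated name for an integer id, or None if undefined."""
    for name, item in enum.items():
        if item["id"] == value_id:
            return name
    return None


def id_for_name(enum: dict, name: str):
    """Return the integer id for an enumerated name, or None if undefined."""
    item = enum.get(name)
    return None if item is None else item["id"]
```

Because the names contain no whitespace, they can be written into space-separated log columns without the ambiguity that captions introduce.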
First release Objects, checkbox checked when merged into main.
I want to review OCSF as a consumer of the OCSF schema.
After reviewing all the available documentation, the only related reference I could find was in Contributing.MD
However, the linked repo is not accessible.
Kindly share the link to the updated repository.
Thanks,
## Using OCSF as a consumer
See [ocsf-server](https://github.com/ocsf/ocsf-server) documentation.
For the new datetime_t data type, Proposal 11 had a profile that would overlay a companion attribute everywhere there is a timestamp_t data type. Rather than use an actual profile, the schema server can have a switch or mode where this can be done globally (i.e. wherever there is a timestamp_t attribute in any event class). How that is detected via the API or schema is yet to be determined, but I like the idea. Otherwise it is the same as Proposal 11. The original timestamp_t field must still be populated.
The intention of Required attributes within event classes is that the attribute is always present in every instance of the event. To be useful, reasonable default values must be spelled out so that the event can be validated, and that the semantics of the event can be reasonably (if not completely) represented.
For Enum attributes, the default is Unknown. For process PID integer attributes, the default may be 1, the primordial OS process (e.g. init).
However, many other required attributes have no documented default value.
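A minimal sketch of how documented defaults could be applied before validation, assuming the enum default Unknown is represented by 0 and the process PID default is 1 as suggested above. The `DEFAULTS` table and the choice of attribute names are hypothetical; the schema does not define such a table today.

```python
# Hypothetical table of defaults for required attributes.
DEFAULTS = {
    "activity_id": 0,  # enum attribute: Unknown
    "pid": 1,          # process PID: the primordial OS process (e.g. init)
}


def apply_defaults(event: dict, defaults: dict = DEFAULTS) -> dict:
    """Return a copy of the event with missing required attributes filled in."""
    filled = dict(event)
    for name, default in defaults.items():
        filled.setdefault(name, default)
    return filled
```

Values supplied by the producer are never overwritten; only absent attributes receive a default.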
categories.json appears to be an enumerated list of items, but the items are called "attributes" as if they were properties of an object:
{
"caption": "Categories",
"name": "category",
"description": "Initial working list of categories (work in progress).",
"attributes": {
"system": {
"caption": "System Activity",
"description": "System Activity events.",
"uid": 1
},
"findings": {
"caption": "Findings",
"description": "Findings events report findings, detections, and possible resolutions of malware, anomalies, or other actions performed by security products.",
"uid": 2
},
...
represents a list of enumerated items:
UID | Name | Description |
---|---|---|
1 | system | System Activity: System Activity events |
2 | findings | Findings: Findings events report findings, detections, and possible resolutions of malware, anomalies, or other actions performed by security products. |
... | ... | ... |
But an example enum file is JSON data in a different format that represents the identical kind of enumerated list:
{
"enum": {
"1": {
"caption": "Login",
"description": "The event pertains to login activity."
},
"2": {
"caption": "IAM",
"description": "The event pertains to Identity and Access Management (IAM) activity (e.g. policy updates, user creations, etc.)."
},
"3": {
"caption": "Operational",
"description": "The event pertains to cloud resource operations activity (e.g. data downloads, launched virtual machines, etc.)."
}
}
}
UID | Name | Description |
---|---|---|
1 | Login | The event pertains to login activity. |
2 | IAM | The event pertains to Identity and Access Management (IAM) activity (e.g. policy updates, user creations, etc.). |
3 | Operational | The event pertains to cloud resource operations activity (e.g. data downloads, launched virtual machines, etc.). |
Use the same data structure for enumerated lists in categories.json, all files in the enums directory, and other files where enums are embedded.
Using different data to represent the same kind of information is unnecessary. Putting numeric UIDs into strings is confusing because strings are often displayed in lexical order instead of numeric order (as can be seen in URL Categories). And using caption as a name may be unsuitable when the item should have an identifier but the caption (e.g. "System Activity") is not an identifier.
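The sketch below shows one way the two layouts could be read into a single structure, and why string keys are a poor fit for numeric UIDs. The function names and the normalized record shape (`uid`, `name`, `caption`) are assumptions for this example only.

```python
def from_categories(doc: dict) -> list:
    """Normalize a categories.json-style document (name keys, numeric uid)."""
    items = [
        {"uid": item["uid"], "name": name, "caption": item["caption"]}
        for name, item in doc["attributes"].items()
    ]
    return sorted(items, key=lambda entry: entry["uid"])


def from_enum_file(doc: dict) -> list:
    """Normalize an enum-file-style document (string uid keys, no name)."""
    items = [
        {"uid": int(key), "name": None, "caption": item["caption"]}
        for key, item in doc["enum"].items()
    ]
    return sorted(items, key=lambda entry: entry["uid"])


categories = {
    "attributes": {
        "system": {"caption": "System Activity", "uid": 1},
        "findings": {"caption": "Findings", "uid": 2},
    }
}
cloud_activity = {
    "enum": {
        "1": {"caption": "Login"},
        "2": {"caption": "IAM"},
        "3": {"caption": "Operational"},
    }
}

# String keys sort lexically unless they are converted to integers first.
lexical = sorted(["1", "2", "10"])
numeric = sorted(["1", "2", "10"], key=int)
```

Note that the enum-file layout yields `name: None`, which is the gap the issue describes: the caption is available, but no identifier is.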
The schema defines a container object, which is not used in any classes or other objects. In this case, do we really need the container object?
Determine what profiles/traits we would like to add to the network event classes. Once that decision is made we need to decide whether the dns/http classes should extend the network class or the base event class.
Migrated from Slack as a requirement for MVS release
From our Systems Activity Workstreams sync on 9/7, we discussed the following path forward to address some of the gaps for Process Injection events within the System Activity category & Process Injection class:
- injection_type_id: Change the enums as follows:
  - -1: Other
  - 0: Unknown
  - 1: Remote Thread
  - 2: Load Library
- module object: Add a start_address field:
  - module.start_address: The start address of the execution.
- module.file: Change from Required to Recommended
  - For module.load_type_id = ShellCode and other cases, there is no file path.

The current CVSS Score Object is inconsistent with the CVSS specs and requires an adjustment. There are three major versions of CVSS (v1, v2 and v3), and each has a different set of associated attributes that contribute to the CVSS score calculation.
https://www.first.org/cvss/v1/guide
https://www.first.org/cvss/v2/guide
https://www.first.org/cvss/specification-document
ocsf cvss score object | proposed cvss score object | cvss v1 | cvss v2 | cvss v3 | group |
---|---|---|---|---|---|
score | n/a | ||||
depth | n/a | ||||
severity | n/a | ||||
base_score | n/a | ||||
base_vector | n/a | ||||
risk | n/a | ||||
version | version | n/a | |||
access_complexity_id | access_complexity_id | access_complexity | access_complexity | base | |
attack_vector_id | attack_vector_id | attack_vector | base | ||
authentication_id | authentication_id | authentication | authentication | base | |
availability_impact_id | availability_impact_id | availability_impact | availability_impact | availability_impact | base |
confidentiality_impact_id | confidentiality_impact_id | confidentiality_impact | confidentiality_impact | confidentiality_impact | base |
integrity_impact_id | integrity_impact_id | integrity_impact | integrity_impact | integrity_impact | base |
privileges_required_id | privileges_required_id | privileges_required | base | ||
user_interaction_id | user_interaction_id | user_interaction | base | ||
attack_complexity_id | attack_complexity_id | attack_complexity | base | ||
scope_id | scope | base | |||
impact_bias_id | impact_bias | base | |||
attack_complexity_id | attack_complexity | base | |||
exploitability_id | exploitability | exploitability | temporal | ||
exploit_code_maturity_id | exploit_code_maturity | temporal | |||
remediation_level_id | remediation_level | remediation_level | remediation_level | temporal | |
report_confidence_id | report_confidence | report_confidence | report_confidence | temporal | |
collateral_damage_potential_id | collateral_damage_potential | collateral_damage_potential | environmental | ||
target_distribution_id | target_distribution | target_distribution | environmental | ||
confidentiality_requirement_id | confidentiality_requirement | confidentiality_requirement | environmental | ||
integrity_requirement_id | integrity_requirement | integrity_requirement | environmental | ||
availability_requirement_id | availability_requirement | availability_requirement | environmental | ||
modified_base_metrics_id | modified_base_metrics | environmental |
Option 1: Keep one CVSS Score Object with all attributes from v1, v2 and v3. All attributes are optional and would be pulled based on the values of the version and depth attributes.
Option 2: Create objects CVSS_V1, CVSS_V2 and CVSS_V3, and include those objects in the CVSS Score Object as optional attributes. The appropriate object is expected based on the version attribute value.
Option 3: Since most data will include only the Score value and potentially depth and version information, keep only Score, Depth, Severity and Version in the CVSS Score Object.
Option 4: Keep CVSS Score Object as is today.
The lb_actions attribute is a plural noun, so does it hold a single action value or multiple action values?
"lb_actions": {
"description": "The actions performed by the load balancer",
"name": "Load Balancer Actions"
}
Extension documentation is required for the initial OCSF release.
The DNS Activity class has the same caption (Failed) for 2 different activity_id values:
400302 DNS Activity: Failed The DNS request has failed.
400304 DNS Activity: Failed The network connection failed. For example a connection timeout or no route to host.
The enum captions must be unique, so that users can distinguish values without looking at the enum integer value.
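A uniqueness rule like this is easy to check mechanically. The sketch below, a hypothetical helper rather than an existing OCSF tool, groups enum keys by caption and reports any caption used more than once. The sample data contains only the two values quoted above.

```python
from collections import defaultdict


def duplicate_captions(enum: dict) -> dict:
    """Return {caption: [keys]} for every caption shared by several values."""
    by_caption = defaultdict(list)
    for key, item in enum.items():
        by_caption[item["caption"]].append(key)
    return {cap: keys for cap, keys in by_caption.items() if len(keys) > 1}


# The two DNS Activity values quoted in this issue.
dns_activity = {
    "400302": {
        "caption": "Failed",
        "description": "The DNS request has failed.",
    },
    "400304": {
        "caption": "Failed",
        "description": "The network connection failed.",
    },
}

duplicates = duplicate_captions(dns_activity)
```

An empty result would mean every caption in the enum is unique.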
Currently the Virtual Machine object is used by the Cloud Virtual Machine event.
Are there any other applicable use cases? For example, virtual machines are used on a desktop or laptop computer.
Issue to track initial release requirement for no more breaking changes.
It is common to transport some data, such as payloads, in the form of a byte array without converting it. To cover those use cases, we need to add a new data type "bytestring_t" to represent an immutable sequence of bytes. We can add it as a subset of String with base64 encoding or as a base type.
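For the "subset of String with base64 encoding" option, a producer and consumer might round-trip a payload as in the sketch below. The helper names are assumptions for this example; only the standard-library `base64` module is used.

```python
import base64


def encode_bytestring(raw: bytes) -> str:
    """Encode raw bytes as a base64 string suitable for a JSON event."""
    return base64.b64encode(raw).decode("ascii")


def decode_bytestring(text: str) -> bytes:
    """Decode a base64 string back to bytes, rejecting invalid characters."""
    return base64.b64decode(text, validate=True)


payload = b"\x00\xffpayload"
encoded = encode_bytestring(payload)
decoded = decode_bytestring(encoded)
```

Python's `bytes` type is itself immutable, which matches the "immutable sequence of bytes" wording of the proposal.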
Initial release tracking for finalizing list of core objects, classes, and categories.
The _id postfix is reserved for enum attributes, thus dns_id should be renamed to one of dns_uid, dns_query_uid, dns_packet_uid, or query_identifier. Or perhaps use the general purpose uid attribute.
Create a domain object which contains information about the domain registrar, creation date, etc. Replace uses of the domain string_t type dictionary attribute with hostname. Also, create a domain profile to be used by the network class and potentially other classes.
Ported from Slack as a requirement for the MVS release.
The problem with event_uid is that it suggests the identifier uniquely identifies an event (a unique occurrence of an activity as recorded by a log, think CloudTrail event). That is what an incident responder would think, at least based on my own experience.
However, as currently defined, event_uid simply identifies a unique type of event within OCSF, which is akin to an event code/type.
First release Event Classes, checkbox checked if class complete and merged into main
Issue to track requirement for correct versioning for initial release.
To simplify the object definitions, remove the object_type and use the type only. For example:
Current definition:
{
"caption":"Metadata",
"description":"The metadata associated with the event.",
"group":"context",
"object_type":"metadata",
"requirement":"required",
"type":"object_t"
}
New definition:
{
"caption":"Metadata",
"description":"The metadata associated with the event.",
"group":"context",
"requirement":"required",
"type":"metadata"
}
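The change from the current to the new definition is mechanical, so it could be scripted. The following is a minimal sketch under the assumption that every object attribute uses `"type": "object_t"` together with an `object_type` key; the `simplify_attribute` helper is hypothetical.

```python
def simplify_attribute(attr: dict) -> dict:
    """Fold object_type into type for attributes declared as object_t."""
    result = dict(attr)
    if result.get("type") == "object_t" and "object_type" in result:
        # The object name becomes the type; the redundant key is dropped.
        result["type"] = result.pop("object_type")
    return result


current = {
    "caption": "Metadata",
    "description": "The metadata associated with the event.",
    "group": "context",
    "object_type": "metadata",
    "requirement": "required",
    "type": "object_t",
}
new = simplify_attribute(current)
```

Attributes that are not objects pass through unchanged.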
Tracking the contribution guidelines required for the initial OCSF release.
I am in the process of translating Windows Security Event 4771 (Kerberos pre-authentication failed).
Since this is a failure event, the status_id is always 2: Failure.
Failure Code field.
Failure Code to status_detail
Failure Code is a hexadecimal value that represents a string, such as 0x18 == Pre-authentication information was invalid
After discussion with the endpoint-activity team, we found it useful to add two fields for Failure Code:
failure_code
failure_code_desc
Issue to track initial release requirement for documentation.
Should the tls_server_protocol attribute have the same values as the tls_client_protocol attribute?
As far as I checked, none of the JSON files currently refers to a JSON schema. I would like to suggest defining JSON schemas to make sure that changes in these files are valid according to the rules.
I'm happy to submit a PR but I might need some help to fully understand some design decisions.
The session_uid attribute in System Activity -> Authentication/Authorization Audit is redundant, as it is already covered in two other places: src_user.session_uid and dst_user.session_uid. Resolve this redundancy by removing the top-level session_uid object from the two classes.
session_uid from the Authentication Audit and Authorization Audit classes.
session_uid attribute in the dictionary.
session_uuid object to map session guids, such as Windows Logon GUID
Determine whether we need the storage event class or any additional classes within the cloud category or whether the cloud api class suffices.
Currently, the classes under the 'Audit' category all end with 'Audit'. This is an unnecessary redundancy.
Also, the includes/authentication.json and enums/authentication.json includes are only used by the Authentication class. They should be removed as include files and consolidated into the main Authentication class.
OCSF is a framework where the base event classes should produce the same outcome for an event regardless of the vendor generating that event. To that end, "supporting" log sources means ensuring that multiple producers confirm that the schema is robust and flexible enough to accommodate the data they produce / report on.
For example, if product A and product B both send process_activity logs, the end result after conversion to OCSF should look structurally the same, with the exception of optional attributes and any producer / vendor specific fields and objects which would be supported via an extension.
This issue aims to track the requirement for v1.0 that a minimum of 2 vendors or producers have confirmed that their data / logs can be converted into OCSF format, and as a secondary objective to confirm that two producers converting the same event data to OCSF format yield the same result.
The lease_time and handshake_time attributes represent time durations, thus they should be renamed to lease_dur and handshake_dur.
This is the new list of reserved attributes that must not be used by the event producers:
_time: timestamp_t, required
The normalized event occurrence time. Normalized time means the original event time ref_time was corrected for clock skew and converted to the OCSF timestamp_t.
metadata._uid: string, required
The unique identifier of an event instance. The attribute is used in the metadata object.
_raw_data: string, optional
The event data as received from the event source. This attribute must be used when events are translated from a format other than OCSF. If the event is created using the OCSF schema, then _raw_data must not be used.
{
"category": "configuration",
"description": "Basic attributes that capture configuration state.",
"extends": "base_event",
"caption": "Base Configuration",
"name": "base_configuration",
"uid": 4,
"attributes": {
"mac": {
"description": "The device Media Access Control (MAC) address.",
"group": "primary",
"requirement": "recommended"
},
"ip": {
"description": "The device IP address, in either IPv4 or IPv6 format.",
"group": "context",
"requirement": "optional"
},
"hostname": {
"description": "The device hostname.",
"group": "context",
"requirement": "optional"
},
"name": {
"description": "The alternate device name, ordinarily as assigned by an administrator. ",
"group": "context",
"requirement": "optional"
},
"cis": {
"requirement": "optional"
},
"os": {
"requirement": "optional"
}
}
}
The customer_uid attribute is missing from the base event. We need it to handle multi-tenant systems such as Splunk on cloud that handles events from multiple customers.
origin object
origin under metadata
More context - slack convo
Original object for reference -
{
"caption": "Event Origin",
"name": "event_origin",
"description": "The event origin is where the event was created.",
"extends": "object",
"attributes": {
"cloud": {},
"container": {},
"customer_uid": {},
"device": {
"requirement": "recommended"
},
"feature": {},
"product": {
"requirement": "recommended"
}
},
"constraints": {
"at_least_one": [
"device",
"product"
]
}
}
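The at_least_one constraint in the object above could be checked as in the sketch below. This assumes the constraint means that at least one of the listed attributes must be present with a non-null value; the helper name and the sample instances are hypothetical.

```python
def satisfies_at_least_one(instance: dict, constraints: dict) -> bool:
    """Check an object instance against an "at_least_one" constraint."""
    candidates = constraints.get("at_least_one", [])
    if not candidates:
        return True  # no constraint declared
    return any(instance.get(name) is not None for name in candidates)


# Constraint copied from the event_origin object above.
origin_constraints = {"at_least_one": ["device", "product"]}

# Hypothetical instances of the object.
with_product = {"product": {"name": "example-product"}}
without_either = {"customer_uid": "tenant-1"}
```

Here `with_product` satisfies the constraint, while `without_either` does not, even though customer_uid is a defined attribute of the object.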