LAS Specification
Home Page: https://www.asprs.org/committee-general/laser-las-file-format-exchange-activities.html
The ExtraByte definitions rely on a 32-character name
field for their identification. It'd be helpful (for me at least) to clarify whether these are intended to be case-sensitive.
I'm going to propose that we make them explicitly not case-sensitive because some implementations of the future standard ExtraBytes have been all lowercase (@rapidlasso seems to follow this convention), while others have capitalized the first letters (Riegl @csevcik01 seems to follow this convention).
Making them case-insensitive makes it easier to standardize for #37 but harder to implement in some programming languages. However, I can also see some benefit to recommending that names always be lowercase because it's easier to program.
As-is, I interpret the specification to imply that it's case-sensitive. But I might be alone here.
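If the LWG settles on case-insensitive matching, a reader's comparison could look like the minimal sketch below. The lowercase normalization rule is the proposal under discussion, not current spec language, and the helper name is my own.

```python
def extrabyte_name_key(raw: bytes) -> str:
    """Normalize a 32-byte ExtraByte name field for case-insensitive matching.

    Assumes the field is ASCII padded with NUL bytes; lowercasing is the
    proposed (not yet adopted) normalization rule.
    """
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace").lower()

# Both vendor conventions would then resolve to the same key:
assert extrabyte_name_key(b"height above ground".ljust(32, b"\x00")) == \
       extrabyte_name_key(b"Height Above Ground".ljust(32, b"\x00"))
```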
Multiple warnings/errors are present in the Travis CI build logs:
Screenshots from this build: https://travis-ci.org/ASPRSorg/LAS/builds/264392699
We've discussed listing some standardized extrabytes either in the specification itself or a supplementary document. This would encourage their adoption by the community and formalize common extrabytes as a guideline to future implementations.
We need to figure out the following:
data_type
be formalized? Below is a link to what I think is a decent start to the standardized extrabytes. Once we get some agreement on a few of these, I can start building a wiki page or contribute to Martin's pull request. Which one we do depends on the answer to the fourth question.
Current ExtraByte implementation allows "tuple" and "triple" data types, although actual support for this appears to be practically nonexistent and inclusion of it in the specification complicates implementation by a significant margin.
This change would deprecate `data_type` values 11-30 (Table 24), leaving them unassigned.
Additionally, the `no_data`, `min`, `max`, `scale`, and `offset` triples should be converted to singles, leaving the space for doubles and triples as Reserved. Explicitly recommend that these values be set to zero.
Instead, add a recommendation that associated ExtraBytes be assigned an appropriate suffix, such as [1], [2], and [3]. For example, the topobathy domain profile should be altered to have three identical "sigma xyz" fields named "sigma xyz[1]", "sigma xyz[2]", and "sigma xyz[3]".
Clarification is required on the usage of the scan edge flag. Should it be set on the first return for a given pulse, the last return, all returns, etc.? I believe it should be the last return prior to changing direction/scanlines and the first return after changing direction/scanlines.
Also need clarity on rotating prism style scanners and circular scan patterns.
See attached relevant conversation.
edge flag email chain.pdf
Right now the specification states that the min and max are "anytype" when in reality they should be double-precision floats. This would align with the most popular implementations of ExtraByte definitions and with the XYZ min/max as defined in the LAS header.
This will result in a slight loss of range for 64-bit integer types, but the impact of this is likely extremely small. Consistency outweighs this loss.
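The range loss is easy to quantify: a double carries a 53-bit significand, so 64-bit integer extremes cannot round-trip exactly. A quick sanity check:

```python
# A double has a 53-bit significand, so integers above 2**53
# cannot all be represented exactly.
big = 2**63 - 1                     # max signed 64-bit integer
assert int(float(big)) != big       # rounds to 2**63, off by 1
assert int(float(2**53)) == 2**53   # exactly representable
assert int(float(2**53 + 1)) != 2**53 + 1  # first integer that is not
```

So only min/max values beyond 2^53 are affected, and even then only by rounding in the reported statistic, not in the stored point data.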
The headings and text file divisions don't make much logical sense to me, weren't properly implemented in the original spec, and seem hard to maintain in the long run. It should be pretty straightforward to redo the headings and text file divisions into something more sensible.
I'm thinking something like the attached organization should work fine, with one text file for each major section except number 2, which will have a text file for each subsection:
Can probably be implemented alongside #28. Note that the TOC only shows sections and subsections, so three-level sub-subsections such as 2.6.7 (PDRF 6) won't appear in the TOC.
las toc.docx
Harris has expressed concern that LAS doesn't adequately support Geiger-Mode (GM) lidar. Is there a way to enhance/revise LAS to support GM lidar?
LAS 1.4 allows internal and external storage of WDPs in a special EVLR. Clarify the explanation on page 10 that the "Start of Waveform Data Packet Record" field in the LAS 1.4 header should be set to zero if storing externally.
Right now the LAS Specification is published as a static PDF. However, it could just as easily be published as HTML using the reStructuredText that @hobu implemented for this interface, and might even be the more convenient format for our users. What are the arguments for and against this transition?
The current GitHub port has a few typographic errors that cause it to be unintentionally different from the original. Fix them.
At ILMF I received a request that we add a String ExtraByte as `data_type` 31 for things like source file name, descriptors, etc. This would basically be a `char` array of some length. I see three possible ways to do this:
- A length prefix `N` stored as an `unsigned short`, followed by `N` `char` values that compose the string itself.
- A fixed `length` attribute in the ExtraByte definition, such as in two of the unused bytes in the `EXTRA_BYTES` struct. Unused `char`s would be set to zero.

There's potential for this to cause LAS files of tens of millions of points to explode in size, so I'm generally against the idea. Every use case I can think of is better served using one of the `int` data types and a lookup table, but since it came up at ILMF I think it's worth discussing.
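For context, the lookup-table alternative pairs an integer ExtraByte with a per-file mapping from codes to strings, so the string is stored once rather than once per point. This sketch is purely illustrative; the names and the storage mechanism for the table (e.g. a VLR or sidecar file) are assumptions, not anything the spec defines.

```python
# Hypothetical lookup-table alternative: each point stores a small
# integer code (a standard ExtraBytes int type); the code->string
# mapping is stored once per file rather than once per point.
source_files = {0: "strip_001.las", 1: "strip_002.las"}  # stored once per file
point_codes = [0, 0, 1, 0, 1]                            # one small int per point

names = [source_files[c] for c in point_codes]
assert names[2] == "strip_002.las"
```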
Links can be added to the document using the :ref: directive. Update the documentation with internal links to other parts of the specification. Requires creating labels in front of relevant headings like so:
.. _sectionname_label:

My Section Header
-----------------
Then referencing the text following that label like so:
Blah blah :ref:`sectionname_label` links are really cool.
Which becomes:
Blah blah My Section Header links are really cool.
References:
Post deleted.
(This thread became a discussion about deprecating the legacy PDRFs. Rather than move all of that discussion to a new thread, I instead recreated the original thread about zeroing legacy point counts in a new thread here: #12)
See the first post by @HKHeidemann for this thread's initialization.
Currently the Travis-CI build is set up to have a PDF link with the commit hash built into its URL, so as far as I can tell it's impossible for GitHub to have a link to the current PDF in the README.md. This would be very useful to have for intermediate builds, although one could argue that the ONLY valid link is the current version posted on the ASPRS website.
Some ideas:
I received an email from the USGS QC team indicating that the System Identifier (SystemID) field in the LAS header quite often is something other than the sensor used to collect the data. The spec is deliberately vague on its actual implementation, and in my opinion that's a good thing.
I believe it's clear how to implement it for a file containing a single flightline, but it's unclear how this field can be best utilized if data from multiple sensors are in the LAS file. For example, 32 characters is barely enough for something simple like "Leica ALS80 & Riegl 1560i" but if serial numbers are desired it gets too long.
Does anyone have experience with a good multi-sensor implementation they're happy with that we can recommend? I'm not sure that this merits a change to the specification, but we can perhaps provide guidance to USGS for their contractors.
@jstoker74
I think it makes sense to add references to the GitHub page to the specification, specifically in the Authority section and perhaps the Domain Profiles section. Any objections?
I recently learned this historic information from Lewis:
Return 0 is intentionally in the spec. It was originally for Hardware vendors to be able to indicate that a pulse was emitted but no returns were detected (I don’t recall if we were going to do 0 of 0 or 0 of 1). This was never implemented by anyone. It would be incredibly valuable if it were!
Clarify the meaning of Return 0 in the LAS specification.
We need to expand the attribute for scan angle to accommodate Palmer/elliptical scanning configurations.
The original PDF version had the following components that are missing from the title page of the GitHub LaTeX PDF output:
I can't figure out how to modify the title page contents to add them back in, so maybe @hobu can? Most of this can probably be moved to a different page, but I think at the very least the approval dates should be on the cover.
The last page of the PDF from index.txt looks like it's unresolved code that's supposed to create indices. I'm not sure what it's supposed to do.
From Martin:
We should also fix these copy and paste errors (and whichever fixes Lewis and others have suggested since rev 13).
OGC COORDINATE SYSTEM WKT:
User ID: LASF_Projection
Record ID: 2112
This record contains the textual data representing a Coordinate System WKT as defined in
section 7 of the Coordinate Transformation Services Spec, with the following notes:
● The OGC Math Transform WKT VLR data shall be a null-terminated string.
● The OGC Math Transform WKT VLR data shall be considered UTF-8.
● The OGC Math Transform WKT VLR data shall be considered C locale-based, and no localization of the numeric strings within the WKT should be performed.
It came to my attention that there's some conflict between the geotiff specification (http://geotiff.maptools.org/spec/geotiff6.html#6.3.4)...
...and the EPSG registry (http://epsg-registry.org/export.htm?wkt=urn:ogc:def:crs:EPSG::5703)...
...on the correct way to define heights for the LAS 1.2 geotiff style of vertical coordinate systems.
Is one of those more correct than the other? Or are both equally correct?
On the one hand, the LAS spec states throughout that it's intended to follow the Geotiff specification without mentioning EPSG explicitly, but on the other hand the Geotiff specification doesn't list all possible coordinate systems and so it's been generally understood to imply that EPSG codes should be supplemented for missing CRS definitions.
See this LAStools conversation for some context: https://groups.google.com/d/msg/lastools/9fUZaLKPReg/1ekK6mNCAwAJ
In the current implementation, Adjusted Standard GPS Time is gradually losing precision. Currently it's able to store timestamps at the resolution of 0.0596 microseconds. In 3 years that goes down to 0.1192 microseconds.
Relevant discussion here: https://groups.google.com/forum/#!msg/lasroom/s3-OR4LP1IE/Br-PndbgCwAJ;context-place=forum/lasroom
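The quoted resolutions follow directly from double-precision spacing: with Adjusted Standard GPS Time (GPS seconds minus 1.0e9), current timestamps fall in the [2^28, 2^29) range, where adjacent doubles are 2^-24 s apart, and the spacing doubles once values cross 2^29. A quick check (requires Python 3.9+ for `math.ulp`; the sample values are illustrative, not tied to specific dates):

```python
import math

# Adjusted Standard GPS Time = GPS seconds - 1.0e9, stored as a double.
t_now = 3.0e8      # a value in [2**28, 2**29): spacing between doubles is 2**-24 s
t_later = 6.0e8    # a value in [2**29, 2**30): spacing doubles

assert math.ulp(t_now) == 2.0**-24      # ~0.0596 microseconds
assert math.ulp(t_later) == 2.0**-23    # ~0.1192 microseconds
```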
Howard's port from Word didn't include bold text in the headers of the tables. Fix that to make tables more readable. Also, ensure that line breaks occur properly and that tables break across pages.
Section numbers are getting assigned in the Table of Contents but not in the actual body text. For example, note how "Format Definition" was assigned section number 4 in the TOC, but the 4 is missing on Page 8.
Currently, the spec only describes the bytes size for elements in the header and the different point data records. Additionally providing byte offsets to each attribute would be a huge help to developers writing las readers.
e.g. current version
Item | Format | Size | Required |
---|---|---|---|
File Signature (“LASF”) | char[4] | 4 bytes | * |
File Source ID | unsigned short | 2 bytes | * |
Global Encoding | unsigned short | 2 bytes | * |
Project ID - GUID data 1 | unsigned long | 4 bytes | * |
Project ID - GUID data 2 | unsigned short | 2 bytes | * |
Project ID - GUID data 3 | unsigned short | 2 bytes | * |
Project ID - GUID data 4 | unsigned char[8] | 8 bytes | * |
With offsets:
Item | Format | Size | Offset | Required |
---|---|---|---|---|
File Signature (“LASF”) | char[4] | 4 bytes | 0 | * |
File Source ID | unsigned short | 2 bytes | 4 | * |
Global Encoding | unsigned short | 2 bytes | 6 | * |
Project ID - GUID data 1 | unsigned long | 4 bytes | 8 | * |
Project ID - GUID data 2 | unsigned short | 2 bytes | 12 | * |
Project ID - GUID data 3 | unsigned short | 2 bytes | 14 | * |
Project ID - GUID data 4 | unsigned char[8] | 8 bytes | 16 | * |
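Published offsets also give reader authors something concrete to test against. A minimal sketch of parsing the first 24 header bytes (little-endian, packed; the synthetic buffer is for illustration only):

```python
import struct

# First 24 bytes of a LAS header: signature, file source ID,
# global encoding, and the four GUID fields ('<' = little-endian, no padding).
HEADER_PREFIX = struct.Struct("<4sHHLHH8s")
assert HEADER_PREFIX.size == 24

raw = b"LASF" + bytes(20)  # synthetic header prefix, zeros after the signature
sig, source_id, encoding, g1, g2, g3, g4 = HEADER_PREFIX.unpack(raw)
assert sig == b"LASF" and g4 == bytes(8)

# struct.calcsize derives a field's offset from the sizes of the fields before it:
assert struct.calcsize("<4sHH") == 8   # GUID data 1 begins at byte offset 8
```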
While odd-numbered pages correctly use the most recent H1-level heading in the footer, even-numbered pages use "Contents". I can't figure out where this is set in the Alabaster template/style. Maybe you can, Howard?
The "extra space" could be used to describe two alternate units as follows. After deprecating "tuples" and "triples" we could reuse the array entries [1] and [2] of no_data, min, max, scale, and offset. Below is my "old" struct of 192 bytes that is the payload of the official "extra bytes" VLR.
struct LASattribute
{
U8 reserved[2]; // 2 bytes
U8 data_type; // 1 byte
U8 options; // 1 byte
CHAR name[32]; // 32 bytes
U8 unused[4]; // 4 bytes
F64 no_data[3]; // 24 = 3*8 bytes // last 16 bytes deprecated
F64 min[3]; // 24 = 3*8 bytes // last 16 bytes deprecated
F64 max[3]; // 24 = 3*8 bytes // last 16 bytes deprecated
F64 scale[3]; // 24 = 3*8 bytes // last 16 bytes deprecated
F64 offset[3]; // 24 = 3*8 bytes // last 16 bytes deprecated
CHAR description[32]; // 32 bytes
};
and - just to present a tangible example - we could re-use it as shown below to offer two alternate units assuming the conversion can be done by changing the scale and the offset.
struct LASattribute
{
U8 reserved[2]; // 2 bytes
U8 data_type; // 1 byte
U8 options; // 1 byte
CHAR name[32]; // 32 bytes
U8 unused[4]; // 4 bytes
F64 no_data; // 8 = 1*8 bytes
U8 unit; // 1 byte
CHAR unit_name[15]; // 15 bytes
F64 min; // 8 = 1*8 bytes
U8 alt_unit1; // 1 byte
CHAR alt_unit1_name[15]; // 15 bytes
F64 max; // 8 = 1*8 bytes
U8 alt_unit2; // 1 byte
CHAR alt_unit2_name[15]; // 15 bytes
F64 scale; // 8 = 1*8 bytes
F64 alt_unit1_scale; // 8 = 1*8 bytes
F64 alt_unit2_scale; // 8 = 1*8 bytes
F64 offset; // 8 = 1*8 bytes
F64 alt_unit1_offset; // 8 = 1*8 bytes
F64 alt_unit2_offset; // 8 = 1*8 bytes
CHAR description[32]; // 32 bytes
};
Originally posted by @rapidlasso in #37 (comment)
Currently we still exchange emails by being cc'd on personal addresses. Can we switch to a mailing list such as Google Groups or similar? I know that Jason Stoker had created some group. Has there been any activity? I have not checked it, as it seems to be behind the ASPRS paywall and thereby not providing the community transparency that I was hoping for ...
I accidentally included a change in #42 to modify "dynamic range" to "bit-depth" in the Intensity field description. While the proposed change may not be appropriate for #35, I am curious if there should be a deeper discussion on the bit-depth mapping specified. Specifically, shouldn't the scale factor for mapping bit-depths be (2^output-depth - 1) / (2^input-depth - 1)? In the example given in the Intensity section, this would make the scale factor 65535/1023 instead of 65536/1024. This change provides the correct mapping of full ranges regardless of whether bit-depth is increasing or decreasing (as may be the case for NRGB fields).
Further, there appears to be some confusion in the spec on whether the dynamic range is being mapped, or the bit-depth. The language of the spec states "dynamic range", but the math implies "bit-depth." To map dynamic range, shouldn't a histogram stretch or similar approach be applied prior to the bit-depth mapping? Otherwise, there is no guarantee that the bit values carry similar meaning across various files.
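The difference between the two factors shows up at the range extremes: only the (2^out - 1)/(2^in - 1) form maps the input maximum onto the output maximum. A quick check for the 10-bit-to-16-bit case:

```python
# Mapping 10-bit intensities into the 16-bit Intensity field.
full_range = (2**16 - 1) / (2**10 - 1)   # 65535/1023, the proposed factor
naive      =  2**16 / 2**10              # 65536/1024, as in the current example

assert round(1023 * full_range) == 65535   # 10-bit max lands on 16-bit max
assert round(1023 * naive) == 65472        # naive factor falls short of 65535
```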
I believe it would be useful to append a row to each PDRF description table that provides the total PDRF size. I do this on my printed copy anyway, and it'd be nice to have that supplied to avoid any computational errors.
The specification does NOT say that the "file source ID" and the "point source ID" MUST be related. Should it say that? My understanding so far was that - at least for airborne / mobile / terrestrial LiDAR - the "file source ID" should be non-zero for individual flightlines / trajectories / scan positions and zero for tiles (which instead use the "point source ID" to specify which flightline / trajectory / scan position a point originates from). The issue was raised here.
It's hard to know which source file to edit based on the PDF. Add section numbering to the txt filenames to make it easier.
The "Return Number" and "Number of Returns" descriptions for PDRFs 0-5 claim that only 5 returns can be stored, but the bits actually allow storage of up to 7 returns. I've seen a LOT of LAS files that go up to 7 returns. We should change the description to allow for this and add a note that returns 6-7 will not be tabulated in the header's Number of Returns tallies (or should they be tabulated as 5th returns?)
It looks like we could use the GitHub Releases API to deploy the LAS.PDF and LAS.TEX files from the TRAVIS build directly into the GitHub interface. I really like the idea of exposing the raw LaTeX that's produced to help with troubleshooting, and having direct access to the PDF. Could help with #22, too, and maybe eliminate the need to rely on @hobu 's AWS account.
Main issues I see are that it requires tagging commits in order to trigger GitHub's creation of a Release, which could get messy, and that I couldn't figure out how to embed an API key in the travis.yml file without having it purged/revoked by GitHub.
Link here: https://docs.travis-ci.com/user/deployment/releases/
Lewis formally requested that Class 19 be reserved as "Conveyor and overhead mine equipment." Please document objections here.
Every other PDRF says that Channel is required, but PDRF 9 says it's not. Make it consistent with the others.
The intended type for LAS 14 ExtraByte MinMaxNodata.pdf
The community seems to agree that when non-legacy PDRFs are used (6-10), then the legacy point count should be zeroed. Normally, a literal interpretation of the specification would cause a developer to assign the legacy point count to be identical to the extended point count, but this could deceive legacy LAS readers into attempting to read points that they don't support. Better to assign it to zero so that legacy readers correctly interpret the header to mean that there are no points that it can read.
Clarify this recommendation in the LAS header explanation. Also apply this change to the point counters by return.
Karl responded initially with "I concur".
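A writer following this recommendation would simply branch on the PDRF. A sketch (the function and field names are illustrative, not the spec's exact header field names):

```python
def legacy_counts(pdrf, point_count, counts_by_return):
    """Return the legacy header counters for a given point data record format.

    For PDRFs 6-10 the legacy fields are zeroed so pre-1.4 readers see
    "no points I can read" rather than misinterpreting extended points.
    """
    if pdrf >= 6:
        return 0, [0] * 5
    return point_count, counts_by_return[:5]

assert legacy_counts(6, 1_000_000, [600_000, 300_000, 100_000, 0, 0]) == (0, [0] * 5)
assert legacy_counts(1, 10, [6, 3, 1, 0, 0]) == (10, [6, 3, 1, 0, 0])
```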
Martin produced some helpful graphics for explaining the expectations for full waveform (FWF) data encoding. He has worked with the major sensor manufacturers on adopting these conventions, so it would be helpful to reflect that in the specification itself.
Relevant discussion here:
https://groups.google.com/forum/#!msg/lasroom/g4VsXH3CRZo/pMn_JFQD0kwJ;context-place=msg/lasroom/s3-OR4LP1IE/Br-PndbgCwAJ
The LAS specification explicitly cites OGC WKT 2001 as the standard WKT dialect, but OGC (2015) has since migrated to align with ISO (2009). Address any changes to LAS 1.4 that need to happen to modernize.
In addition, coordinate with ISO/EPSG for a standardized method to identify geoid model.
Finally, standardize language for definition of compound coordinate systems (COMPD_CS tag).
Relevant discussion here: https://groups.google.com/forum/#!msg/lastools/BqKml7QPIik/0JohI5qTDQAJ;context-place=forum/lastools
The Table of Contents correctly assigns section numbers:
But the section & subsection headings themselves don't seem to have the numbers:
I've tried a few different iterations of \setcounter{secnumdepth}
but can't figure out what, exactly, is causing LaTeX to swallow the section numbers. This is what it's supposed to do for HTML output, but as far as I can tell it's supposed to produce these numbers for PDF output.
@hobu Have any ideas?
From a private conversation I had with Karl back in 2014:
Overlap is any area that is covered by 2 or more swaths. Nice, but not the phenomenon or condition we want and need to identify. What we are concerned with is more properly called OVERAGE. Harder to define without a picture, but, imagine a project intended for single coverage at 20% sidelap, with swaths 1km wide. The intent is to use the center 800m of each swath … what I refer to as “the tenderloin”, and to edge-match all of the project’s tenderloins together for a nice seamless, gapless single coverage.
In this scenario (which was and remains a VERY common practice for years), for each swath, the 100m outboard of the tenderloin (on each side) is “excess” data. Still perfectly good data, just not needed to accomplish the single coverage. That 100m on each side of each swath is the OverAGE for that swath. And those are the points that are to be flagged with the (regrettably mis-named) “Overlap” Flag. (On the other hand, if you tagged all of the points in the true “OverLAP”, only 60% of the swath would remain, and without those points, you’d have a 200m gap between swaths).
The genesis of all this was that data producers would classify all of the overage points as Class 12, and then do their DEMs based on the tenderloin-based single coverage. Which made nice DEMs, because you eliminated the stripes of double-density (higher variability) data that were plainly visible in the resulting rasters. The PROBLEM with that is users who were interested in something OTHER than the DEM could not do anything with those points without “damaging” the dataset classification (losing the differentiation of the “overage” points). It’s all a remnant of the days when DEMs were King and nobody regarded the point cloud as anything but a means to that end. The practices became entrenched, but then we realized that Lidar was not just for DEMs anymore. Now we are struggling to break old bad habits.
In the v1.0 Spec (using LAS v1.2/1.3), we required that “overage” point be flagged using some means other than Class 12, that allowed both “normal” classification AND differentiation of the overage condition. Makes the data useful beyond just the DEM. How this was done was defined, because data producers had their own systems and processes and there could not be a one-size-fits-all solution. NOW, however, using LAS v1.4, there IS a one-size-fits-all solution: The “Overlap Flag”.
And from a followup conversation:
For “Overlap/age”, your definition is keyed on modeling, and gives the impression that these points should not be used for analysis, in a general or broad-ranging sense. I think the real key is the spatial location, relative to the swath and its overlapping swaths. I would say: Overlap: bit flag to mark points that are unnecessary to achieve a consistent “depth of coverage” (meaning single, double, whatever). No implications are made regarding point quality. To be certain, that description needs a great deal of refinement before “publication”, but it properly limits the term to spatial location/relationships. I would not want to make any implication as to whether anybody should or should not use the points however they choose. That’s their business. Additionally, SOME people may interpret your definition as extending to not “modeling” these points for classification purposes.
The specification itself never explicitly states the origin date/time for timestamps in Adjusted Standard GPS Time. I believe it's midnight of the morning of January 6th, 1980.
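Assuming that origin, converting an Adjusted Standard GPS Time stamp back to a calendar date is straightforward; the 1.0e9-second adjustment is what the spec defines, while the conversion below deliberately ignores the GPS-UTC leap second offset for simplicity:

```python
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)   # midnight, morning of January 6, 1980
ADJUSTMENT = 1.0e9                 # Adjusted Standard GPS Time offset, in seconds

def adjusted_gps_to_datetime(t: float) -> datetime:
    """Naive conversion; ignores the GPS-UTC leap second offset."""
    return GPS_EPOCH + timedelta(seconds=t + ADJUSTMENT)

# t = 0 corresponds to one billion seconds after the GPS epoch (late 2011):
assert adjusted_gps_to_datetime(0.0).year == 2011
```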
Currently the header lists statistics like pt count by return and min/max XYZ, but it'd be useful to have stats for other attributes. Examples:
I know there's existing implementations of something like this, so ideally we could adopt/extend one of those to encourage adoption.
From Lewis:
19 – Conveyor or other overhead machinery (see #26)
20 – Ignored Ground (typ. breakline proximity)
21 – Snow
22 – Temporal Exclusion (often where a tidal-coordinated or topobathy swath supersedes other data)
19 is a de facto class that has been in use for mine site mapping.
20-22 are requests from USGS
Because the (E)VLR description field is a fixed length, the `char` array doesn't have to be null-terminated as the spec claims, if all 32 characters are used in the description. All the readers I've tested support this functionality.
Don't forget to update both the VLR and EVLR descriptions.
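A reader that tolerates both forms can treat the 32-byte field as NUL-padded rather than strictly NUL-terminated. Minimal sketch (the helper name is my own):

```python
def read_description(raw: bytes) -> str:
    """Decode a 32-byte (E)VLR description that may or may not be NUL-terminated."""
    return raw[:32].split(b"\x00", 1)[0].decode("ascii", errors="replace")

assert read_description(b"hello".ljust(32, b"\x00")) == "hello"
# All 32 characters used, no terminator -- still decodes cleanly:
assert read_description(b"A" * 32) == "A" * 32
```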
There has historically been significant confusion regarding scan angles and whether or not the Scan Angle Rank should be roll-compensated, cross-track vs. along-track vs. true, etc. It might be useful to convert some of the offline and Google Group discussions into a wiki page. I've attached one such LWG discussion for the record.
Quantum Spatial Mail - Scan Angle Rank.pdf
There's extensive confusion about how to implement the 128-bit GUID in the LAS header, particularly since the standard method for UUID on wikipedia uses five elements (4-2-2-2-6 in bytes) and the LAS header uses four elements (4-2-2-8 in bytes). It's unclear how they should be split, whether UTF-8 characters are encoded vs. numbers, and whether the byte ordering should be Little-Endian (as the rest of the LAS file) or Big-Endian (like the UUID standard).
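For comparison, here is how the two layouts slice the same 16 bytes. The little-endian reading of the LAS fields shown here reflects common practice (matching the rest of the file), not a settled spec statement, which is exactly the ambiguity this issue raises:

```python
import struct

guid = bytes(range(16))  # 16 arbitrary bytes standing in for a GUID

# LAS header layout: 4-2-2-8, integers read little-endian (common practice).
d1, d2, d3, d4 = struct.unpack("<LHH8s", guid)

# RFC 4122 / Wikipedia layout: 4-2-2-2-6, integers read big-endian.
u1, u2, u3, u4, u5 = struct.unpack(">LHHH6s", guid)

# Same bytes, different grouping and byte order:
assert d4 == guid[8:] and u4 == 0x0809 and len(u5) == 6
```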
More detail in the following discussions: