Better handle trailing commas with comments#126
Better handle trailing commas with comments#126willbeason wants to merge 7 commits intogoogle:mainfrom
Conversation
|
Leaving it as open for discussion about whether/how to detect the correct comment characters. |
| 3 # three | ||
| keep-sorted-test end | ||
|
|
||
| Trailing comma except for last, last has a comment |
There was a problem hiding this comment.
Note the inconsistent spacing here. I played with an approach that made spacing consistently one space after the comma, but it was a bit more complex. I have this behavior in the PR since the implementation is simpler.
There was a problem hiding this comment.
Huh, yeah, that's weird. I'd somewhat prefer if we kept the spacing the same as what the user wrote originally. Oh I see, the .in file had two spaces to keep the comments aligned, and then we added a comma which bumped it out by one.
Maybe we should replace a single space with a comma, but if the comment only has a single leading space, preserve that space?
For lines/sequences of blocks which appear to be a comma-delimited list, do not consider comments when performing this check. Also, insert commas before comments to avoid mangling list. I kept the match/replace logic in lineGroup, as hasSuffix/trimSuffix were there. This approach does have a few weaknesses. It doesn't properly handle sorting what appear to be quoted comments, e.g.: - "a, #", - "b, //", - c It also doesn't properly handle commented commas: - a, # some comment, # more comment - b, # another comment, # a comment It is better at handling the desired cases, and the above counterexamples where it fails feel more like less-common edge cases. Lastly, it assumes "#" and "//" are always comments, and doesn't allow for alternatives. Handling the above would need quite a bit more work - likely parsing the lines themselves and adding list_delimiter and/or comment_delimiter options.
Comment spacing in lists is now handled consistently. Comments begin one space character after the comma delimiter, except for the terminal element, which has no comma and instead has two spaces before any comment.
endsInComma was unnecessary since it's handled by a lineGroup method.
Undo changes to comments made by renaming via the IDE.
Remove global regexp variables and move them into the function directly. Also change to "allMatchSuffix" to be closer to original code.
Simplify implementation to not worry about spaces consistency. Instead, just preserve spaces before comments as-is. Also add examples that show when comments still aren't handled consistently.
cf32a58 to
ab7d555
Compare
|
|
||
| if n := len(dataGroups); n > 1 && allHaveSuffix(dataGroups[0:n-1], ",") && !dataGroups[n-1].hasSuffix(",") { | ||
| dataGroups[n-1].append(",") | ||
| knownCommentMarkersList := knownCommentMarkers() |
There was a problem hiding this comment.
block.metadata.opts records which comment marker is used in the keep-sorted start directive. That's what we use for sticky_comments and that can be expanded with sticky_prefixes (and that commentMarker is automatically added to the list of sticky_prefixes assuming sticky_comments is true)
What do you think about using block.metadata.opts.StickyPrefixes for this instead of knownCommentMarkers()? That gives users a lever they can use to configure this behavior
| lineEnd := regexp.MustCompile(fmt.Sprintf("(\\s*(%s).*)?$", knownCommentMarkersStr)) | ||
|
|
||
| if n := len(dataGroups); n > 1 && allMatchSuffix(dataGroups[0:n-1], commaLineEnd) && !dataGroups[n-1].matchesSuffix(commaLineEnd) { | ||
| dataGroups[n-1].replaceSuffix(lineEnd, ",$1") |
There was a problem hiding this comment.
Do you happen to know how nested capturing groups behave in go? Is $1 referencing (\\s*(%s).*) or is it referencing just (%s)?
Either way, we only seem to be using $1, so consider using a non-capturing group for the other set of parens so that you don't need to answer this question ;)
| } | ||
| knownCommentMarkersStr := strings.Join(knownCommentMarkersList, "|") | ||
|
|
||
| commaLineEnd := regexp.MustCompile(fmt.Sprintf(",(\\s*(%s).*)?$", knownCommentMarkersStr)) |
There was a problem hiding this comment.
Use grave ticks so you don't need to escape the backslash
| for i := len(lgs) - 1; i >= 0; i-- { | ||
| if len(lgs[i].lines) > 0 { | ||
| lgs[i].trimSuffix(",") | ||
| lgs[i].replaceSuffix(commaLineEnd, "$1") |
There was a problem hiding this comment.
Is this why the spacing seems weird? Should we replace this with " $1" so that the comma is essentially replaced with a space?
| 3 # three | ||
| keep-sorted-test end | ||
|
|
||
| Trailing comma except for last, last has a comment |
There was a problem hiding this comment.
Huh, yeah, that's weird. I'd somewhat prefer if we kept the spacing the same as what the user wrote originally. Oh I see, the .in file had two spaces to keep the comments aligned, and then we added a comma which bumped it out by one.
Maybe we should replace a single space with a comma, but if the comment only has a single leading space, preserve that space?
| TODO: https://github.com/google/keep-sorted/issues/33 - Fix this | ||
| keep-sorted-test start | ||
| "c", | ||
| "b, //", |
There was a problem hiding this comment.
block=yes has some logic that looks at quotes and tries to understand whether we're in a quoted section or not:
keep-sorted/keepsorted/line_group.go
Line 354 in e2a9a7f
I wonder if we could leverage that here to exclude comment leaders that are actually part of a string constant
Per my other comment, that code should probably use StickyPrefixes instead of just commentMarker, but that's again me just thinking out loud
Issue: #33
For lines/sequences of blocks which appear to be a comma-delimited list, do not consider comments when performing this check. Also, insert commas before comments to avoid mangling list entries.
I put the match/replace logic in lineGroup, as hasSuffix/trimSuffix were there.
This approach does have a few weaknesses. It doesn't properly handle sorting entries with quoted comment characters, e.g.:
It is better at handling the desired cases, and the above counterexamples where it fails feel more like less-common edge cases. Importantly though - with the preexisting logic, the above would be handled correctly, but now would not be.
Lastly, it always tries to match all comment types. So even if "#" isn't a valid comment character in the context, it will still try to use it. I attempted an approach that used the opt.commentMarker, but that is often not set in tests. I could go and add it to existing tests where it matters, and then we could assume from how the blocks are defined which comment markers are relevant. This would get closer to handling the above well.
Fully "correctly" handling the above would need a bit more work - likely parsing the lines themselves and adding quote_delimiter and/or comment_delimiter options.