Commit f9b84d7

JSON Lines support (#14439)
- Add a JSON lines file format.
- Matches `.jsonl` extension.
- Content types: `application/x-ndjson`, `application/jsonl`.
- Follows `on_problems` when reading.
1 parent ee84679 commit f9b84d7

13 files changed, with 239 additions and 10 deletions

CHANGELOG.md

Lines changed: 4 additions & 5 deletions
@@ -88,11 +88,9 @@
 - [Implement `Text_Column.text_mid` for in-memory and database backends.][14420]
 - [Initial file writing from DuckDB.][14421]
 - [Parquet file reading and writing, DuckDB formats.][14427]
-- [Trigonometry and other maths function on Column.][14433]
-- [Implement Text_Column to_case for DB backends][14386]
-- [Implement bulk loading to DuckDB][14402]
-- [Initial file writing from DuckDB][14421]
 - [Add Text_Column.index_of][14428]
+- [Trigonometry and other maths function on Column.][14433]
+- [Support for reading JSON lines files.][14439]

 [13769]: https://github.com/enso-org/enso/pull/13769
 [14026]: https://github.com/enso-org/enso/pull/14026
@@ -120,8 +118,9 @@
 [14420]: https://github.com/enso-org/enso/pull/14420
 [14421]: https://github.com/enso-org/enso/pull/14421
 [14427]: https://github.com/enso-org/enso/pull/14427
-[14433]: https://github.com/enso-org/enso/pull/14433
 [14428]: https://github.com/enso-org/enso/pull/14428
+[14433]: https://github.com/enso-org/enso/pull/14433
+[14439]: https://github.com/enso-org/enso/pull/14439

 #### Enso Language & Runtime

distribution/lib/Standard/Base/0.0.0-dev/docs/api/System/File_Format.md

Lines changed: 15 additions & 2 deletions
@@ -39,6 +39,20 @@
 - read self file:Standard.Base.Any.Any on_problems:Standard.Base.Errors.Problem_Behavior.Problem_Behavior -> Standard.Base.Any.Any
 - read_stream self stream:Standard.Base.System.Input_Stream.Input_Stream metadata:Standard.Base.System.File_Format_Metadata.File_Format_Metadata -> Standard.Base.Any.Any
 - resolve constructor:Standard.Base.Any.Any -> Standard.Base.Any.Any
+- Standard.Base.System.File_Format.JSON_Lines_Format.for_file_write file:Standard.Base.Any.Any -> Standard.Base.Any.Any
+- Standard.Base.System.File_Format.JSON_Lines_Format.get_dropdown_options -> Standard.Base.Any.Any
+- Standard.Base.System.File_Format.JSON_Lines_Format.get_name_patterns -> (Standard.Base.Data.Vector.Vector Standard.Base.System.File_Format.File_Name_Pattern)
+- Standard.Base.System.File_Format.JSON_Lines_Format.read_stream self stream:Standard.Base.System.Input_Stream.Input_Stream metadata:Standard.Base.System.File_Format_Metadata.File_Format_Metadata -> Standard.Base.Any.Any
+- parse_boolean_with_infer field_name:Standard.Base.Data.Text.Text value:(Standard.Base.Data.Boolean.Boolean|Standard.Base.Data.Text.Text|Standard.Base.Nothing.Nothing) -> (Standard.Base.Data.Boolean.Boolean|Standard.Base.System.File_Format.Infer)
+- Standard.Base.System.File_Format.JSON_Format.from that:Standard.Base.Data.Json.JS_Object -> Standard.Base.System.File_Format.JSON_Format
+- type JSON_Lines_Format
+- for_file_write file:Standard.Base.Any.Any -> Standard.Base.Any.Any
+- get_dropdown_options -> Standard.Base.Any.Any
+- get_name_patterns -> (Standard.Base.Data.Vector.Vector Standard.Base.System.File_Format.File_Name_Pattern)
+- read_stream self stream:Standard.Base.System.Input_Stream.Input_Stream metadata:Standard.Base.System.File_Format_Metadata.File_Format_Metadata -> Standard.Base.Any.Any
+- for_read file:Standard.Base.System.File_Format_Metadata.File_Format_Metadata -> Standard.Base.Any.Any
+- read self file:Standard.Base.Any.Any on_problems:Standard.Base.Errors.Problem_Behavior.Problem_Behavior -> Standard.Base.Any.Any
+- resolve constructor:Standard.Base.Any.Any -> Standard.Base.Any.Any
 - type Plain_Text_Format
 - Plain_Text encoding:(Standard.Base.Data.Text.Encoding.Encoding|Standard.Base.System.File_Format.Infer)=
 - for_file_write file:Standard.Base.Any.Any -> Standard.Base.Any.Any
@@ -50,5 +64,4 @@
 - resolve constructor:Standard.Base.Any.Any -> Standard.Base.Any.Any
 - resolve_encoding self metadata:Standard.Base.System.File_Format_Metadata.File_Format_Metadata -> Standard.Base.Any.Any
 - get_format callback:Standard.Base.Any.Any -> Standard.Base.Any.Any
-- parse_boolean_with_infer field_name:Standard.Base.Data.Text.Text value:(Standard.Base.Data.Boolean.Boolean|Standard.Base.Data.Text.Text|Standard.Base.Nothing.Nothing) -> (Standard.Base.Data.Boolean.Boolean|Standard.Base.System.File_Format.Infer)
-- Standard.Base.System.File_Format.JSON_Format.from that:Standard.Base.Data.Json.JS_Object -> Standard.Base.System.File_Format.JSON_Format
+- Standard.Base.System.File_Format.JSON_Lines_Format.from that:Standard.Base.Data.Json.JS_Object -> Standard.Base.System.File_Format.JSON_Lines_Format

distribution/lib/Standard/Base/0.0.0-dev/src/Main.enso

Lines changed: 1 addition & 0 deletions
@@ -194,6 +194,7 @@ export project.System.File_Format.Bytes
 export project.System.File_Format.File_Format
 export project.System.File_Format.Infer
 export project.System.File_Format.JSON_Format
+export project.System.File_Format.JSON_Lines_Format
 export project.System.File_Format.Plain_Text_Format
 export project.System.Platform
 export project.System.Process

distribution/lib/Standard/Base/0.0.0-dev/src/System/File_Format.enso

Lines changed: 93 additions & 0 deletions
@@ -4,6 +4,7 @@ import project.Data.Json.Json
 import project.Data.Text.Case.Case
 import project.Data.Text.Encoding.Encoding
 import project.Data.Text.Text
+import project.Data.Vector.No_Wrap
 import project.Data.Vector.Vector
 import project.Errors.Common.Type_Error
 import project.Errors.File_Error.File_Error
@@ -18,10 +19,12 @@ import project.Metadata.Display
 import project.Metadata.Widget
 import project.Panic.Panic
 import project.Runtime
+import project.Runtime.Ref.Ref
 import project.System.File.File
 import project.System.File.Generic.Writable_File.Writable_File
 import project.System.File_Format_Metadata.File_Format_Metadata
 import project.System.Input_Stream.Input_Stream
+import project.Warning.Warning
 from project.Data.Boolean import Boolean, False, True
 from project.Data.Json import Invalid_JSON
 from project.Data.Text.Extensions import all
@@ -419,6 +422,96 @@ JSON_Format.from (that : JS_Object) =
     _ = that
     JSON_Format

+## A file format for reading and writing files as JSON Lines (i.e. a JSON object per line).
+   https://jsonlines.org/
+type JSON_Lines_Format
+    ## ---
+       private: true
+       ---
+       Resolve an unresolved constructor to the actual type.
+    resolve : Function -> JSON_Lines_Format | Nothing
+    resolve constructor =
+        _ = constructor
+        Nothing
+
+    ## ---
+       private: true
+       ---
+       If the File_Format supports reading from the file, return a configured
+       instance.
+    for_read : File_Format_Metadata -> JSON_Lines_Format | Nothing
+    for_read file:File_Format_Metadata =
+        content_type = file.interpret_content_type
+        from_content_type = content_type.if_not_nothing <|
+            case content_type.base_type of
+                "application/x-ndjson" -> JSON_Lines_Format
+                "application/jsonl" -> JSON_Lines_Format
+                _ -> Nothing
+        from_content_type.if_nothing <| case file.guess_extension of
+            ".jsonl" -> JSON_Lines_Format
+            _ -> Nothing
+
+    ## ---
+       private: true
+       ---
+       If this File_Format should be used for writing to that file, return a
+       configured instance.
+    for_file_write : Writable_File -> JSON_Lines_Format | Nothing
+    for_file_write file = JSON_Lines_Format.for_read file
+
+    ## ---
+       private: true
+       ---
+    get_dropdown_options : Vector Option
+    get_dropdown_options = [Option "JSON Lines" (JSON_Lines_Format.to Meta.Type . qualified_name)]
+
+    ## ---
+       private: true
+       ---
+    get_name_patterns -> Vector File_Name_Pattern =
+        [File_Name_Pattern.Value "JSON Lines" ["*.jsonl"]]
+
+    ## ---
+       private: true
+       ---
+       Implements the `File.read` for this `File_Format`
+    read : File -> Problem_Behavior -> Any
+    read self file on_problems:Problem_Behavior =
+        text = file.read_text
+        self.read_text text on_problems file
+
+    ## ---
+       private: true
+       ---
+       Implements decoding the format from a stream.
+    read_stream : Input_Stream -> File_Format_Metadata -> Any
+    read_stream self stream:Input_Stream (metadata : File_Format_Metadata) =
+        _ = metadata
+        text = Text.from_bytes (stream.read_all_bytes) Encoding.utf_8
+        self.read_text text Problem_Behavior.Report_Warning Nothing
+
+    private read_text text on_problems file =
+        warnings = Ref.new []
+        result = text.lines.map_with_index on_problems=No_Wrap.Value i->l->
+            parsed = l.parse_json
+            if parsed.is_error.not then parsed else case on_problems of
+                Problem_Behavior.Ignore -> Nothing
+                Problem_Behavior.Report_Warning ->
+                    warnings.put (warnings.get + [File_Error.Corrupted_Format file ("Malformed JSON line ("+i.to_text+"): "+l.to_display_text)])
+                    Nothing
+                Problem_Behavior.Report_Error ->
+                    Error.throw (File_Error.Corrupted_Format file ("Malformed JSON line ("+i.to_text+"): "+l.to_display_text))
+        result.if_not_error <|
+            Telemetry.log "File_Format.read" "Read file: format={}, output={}" ["JSON Lines", result.to Meta.Type . name]
+            warnings.get.reverse.fold result (r->w-> Warning.attach w r)
+
+## ---
+   private: true
+   ---
+JSON_Lines_Format.from (that : JS_Object) =
+    _ = that
+    JSON_Lines_Format
+
 ## A setting to infer the default behaviour of some option.
 type Infer
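The per-line handling in `read_text` can be modelled outside Enso. The following is a minimal Python sketch, not the Enso implementation: `ProblemBehavior` and `read_json_lines` are hypothetical names, warnings are returned as plain strings rather than attached to the result, and Python's `json` module stands in for `parse_json`, so edge cases may differ.

```python
import json
from enum import Enum


class ProblemBehavior(Enum):
    # Hypothetical stand-in for Enso's Problem_Behavior.
    IGNORE = "ignore"
    REPORT_WARNING = "report_warning"
    REPORT_ERROR = "report_error"


def read_json_lines(text, on_problems=ProblemBehavior.REPORT_WARNING):
    """Parse every line as JSON and return (values, warnings)."""
    values, warnings = [], []
    for index, line in enumerate(text.splitlines()):
        try:
            values.append(json.loads(line))
        except json.JSONDecodeError:
            message = f"Malformed JSON line ({index}): {line}"
            if on_problems is ProblemBehavior.REPORT_ERROR:
                # Abort the whole read on the first malformed line.
                raise ValueError(message)
            if on_problems is ProblemBehavior.REPORT_WARNING:
                warnings.append(message)
            # The malformed line keeps its position as a null value.
            values.append(None)
    return values, warnings


values, warnings = read_json_lines('{"a": 1}\n[1,2,3,n\ntrue')
print(values)    # [{'a': 1}, None, True]
print(warnings)  # ['Malformed JSON line (1): [1,2,3,n']
```

As in the diff, the index in the message is the zero-based position produced by the indexed map, and a malformed line stays in the result as a null rather than being dropped.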

distribution/lib/Standard/Base/0.0.0-dev/src/System/Internal/File_Helpers.enso

Lines changed: 3 additions & 1 deletion
@@ -8,6 +8,7 @@ import project.System.File.File
 import project.System.File_Format.Bytes
 import project.System.File_Format.File_Format_SPI
 import project.System.File_Format.JSON_Format
+import project.System.File_Format.JSON_Lines_Format
 import project.System.File_Format.Plain_Text_Format

 polyglot java import java.nio.file.attribute.FileTime
@@ -40,5 +41,6 @@ File_Format_SPI.from (that:Default_File_Formats) =
     _ = that
     bytes = File_Format_SPI.new Bytes
     json = File_Format_SPI.new JSON_Format "json"
+    jsonl = File_Format_SPI.new JSON_Lines_Format "jsonlines"
     plain = File_Format_SPI.new Plain_Text_Format
-    bytes+json+plain
+    bytes+json+jsonl+plain
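Once registered, the format is selected through `for_read`, which checks the content type first and falls back to the file extension. A rough Python sketch of that lookup order follows; `is_json_lines` is a hypothetical helper, the parameter stripping is an assumption about what `base_type` does, and case handling is not modelled.

```python
# Content types and extension are taken from the commit description.
JSON_LINES_CONTENT_TYPES = {"application/x-ndjson", "application/jsonl"}
JSON_LINES_EXTENSION = ".jsonl"


def is_json_lines(content_type=None, file_name=None):
    """Return True when the metadata points at a JSON Lines file."""
    if content_type is not None:
        # Assumption: drop parameters such as "; charset=utf-8" first.
        base_type = content_type.split(";", 1)[0].strip()
        if base_type in JSON_LINES_CONTENT_TYPES:
            return True
    # An unrecognised content type still falls through to the extension.
    return file_name is not None and file_name.endswith(JSON_LINES_EXTENSION)


print(is_json_lines("application/x-ndjson; charset=utf-8"))  # True
print(is_json_lines("text/plain", "events.jsonl"))            # True
print(is_json_lines("application/json", "events.json"))       # False
```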

distribution/lib/Standard/Table/0.0.0-dev/docs/api/Extensions/Table_Conversions.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 - append_to_json_table file:Standard.Base.System.File.File table:Standard.Base.Any.Any on_problems:Standard.Base.Errors.Problem_Behavior.Problem_Behavior -> Standard.Base.Any.Any
 - Standard.Base.Data.Time.Date_Range.Date_Range.to_table self name:Standard.Base.Data.Text.Text= -> Standard.Base.Any.Any
 - Standard.Base.System.File_Format.JSON_Format.write_table self file:Standard.Base.System.File.Generic.Writable_File.Writable_File table:Standard.Base.Any.Any on_existing_file:Standard.Base.Any.Any match_columns:Standard.Base.Any.Any on_problems:Standard.Base.Errors.Problem_Behavior.Problem_Behavior= -> Standard.Base.Any.Any
+- Standard.Base.System.File_Format.JSON_Lines_Format.write_table self file:Standard.Base.System.File.Generic.Writable_File.Writable_File table:Standard.Base.Any.Any on_existing_file:Standard.Base.Any.Any match_columns:Standard.Base.Any.Any on_problems:Standard.Base.Errors.Problem_Behavior.Problem_Behavior= -> Standard.Base.Any.Any
 - Standard.Base.Data.Json.JS_Object.to_table self fields:Standard.Base.Any.Any= -> Standard.Base.Any.Any
 - Standard.Base.Data.Range.Range.to_table self name:Standard.Base.Data.Text.Text= -> Standard.Base.Any.Any
 - Standard.Table.Table.Table.from_objects value:Standard.Base.Any.Any fields:(Standard.Base.Data.Vector.Vector|Standard.Base.Nothing.Nothing)= -> Standard.Base.Any.Any

distribution/lib/Standard/Table/0.0.0-dev/src/Extensions/Table_Conversions.enso

Lines changed: 19 additions & 0 deletions
@@ -170,6 +170,25 @@ JSON_Format.write_table self file:Writable_File table on_existing_file match_col
             file.write_requiring_local_file Existing_File_Behavior.Append local_temp_file->
                 append_to_json_table local_temp_file table on_problems
         _ -> table.to_json.write file on_existing_file=on_existing_file on_problems=on_problems
+## ---
+   private: true
+   advanced: true
+   ---
+   Implements the `Table.write` for this `JSON_Lines_Format`.
+
+   ## Arguments
+   - `file`: The file to write to.
+   - `table`: The table to write.
+   - `on_existing_file`: What to do if the file already exists.
+   - `match_columns`: How to match columns between the table and the file. Not
+     used for JSON.
+   - `on_problems`: What to do if there are problems reading the file.
+JSON_Lines_Format.write_table : Writable_File -> Table -> Existing_File_Behavior -> Match_Columns -> Problem_Behavior -> File
+JSON_Lines_Format.write_table self file:Writable_File table on_existing_file match_columns on_problems:Problem_Behavior=..Report_Warning =
+    _ = match_columns
+    table_json = table.to_js_object.map .to_json . join '\n'
+    used_json = if on_existing_file==Existing_File_Behavior.Append && file.exists then '\n'+table_json else table_json
+    used_json.write file on_existing_file=on_existing_file on_problems=on_problems

 ## ---
    private: true
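The writing logic in `JSON_Lines_Format.write_table` joins one JSON document per row with newlines and prepends a newline when appending to an existing file. Here is a small Python sketch of that idea, assuming rows are already plain dictionaries; `write_json_lines` and its `append` flag are hypothetical and only cover the overwrite and append cases.

```python
import json
import os


def write_json_lines(path, rows, append=False):
    """Write one JSON document per row, separated by newlines."""
    body = "\n".join(json.dumps(row) for row in rows)
    if append and os.path.exists(path):
        # The previous write left no trailing newline, so add a separator.
        body = "\n" + body
    mode = "a" if append else "w"
    with open(path, mode, encoding="utf-8") as handle:
        handle.write(body)
```

Because no trailing newline is written, the separator is added on append rather than on every write. That keeps the file free of a blank final line, at the cost of assuming the existing file does not already end with a newline.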

distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Widget_Helpers.enso

Lines changed: 1 addition & 1 deletion
@@ -309,7 +309,7 @@ parse_type_selector include_auto=True =
    Selector for writing a table to a file.
 write_table_selector : Widget
 write_table_selector =
-    can_write type = if type == JSON_Format then True else
+    can_write type = if type==JSON_Format || type==JSON_Lines_Format then True else
         Meta.meta type . methods . contains "write_table"
     all_types = [Auto_Detect] + (File_Format_SPI.get_types.filter can_write)
     Single_Choice display=Display.Always values=(all_types.flat_map .get_dropdown_options)
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+{"arr": [1, 2, 3],"num": 42.5,"not": null}
+[1,2,3,null,5]
+true
+null
+"Hello World!"
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+{"arr": [1, 2, 3],"num": 42.5,"not": null}
+[1,2,3,n
+True
+null
+"Hello World!"

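The second fixture differs from the first on two lines: a truncated array and a capitalised `True`. As a quick check of which lines a strict JSON parser rejects, here is a short Python sketch; `malformed_line_numbers` is a hypothetical helper, and Python's `json` module may not agree with Enso's parser in every edge case.

```python
import json

# Contents of the second fixture shown above.
FIXTURE = """{"arr": [1, 2, 3],"num": 42.5,"not": null}
[1,2,3,n
True
null
"Hello World!\""""


def malformed_line_numbers(text):
    """Return the 1-based numbers of lines that are not valid JSON."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
    return bad


print(malformed_line_numbers(FIXTURE))  # [2, 3]
```

Line 4 is a valid `null`, so in the parsed result it is presumably indistinguishable from a malformed line that was replaced by a null value. Only the attached warnings tell the two apart.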