feat(TaskProcessing): Add OCR TaskType #56908
bca2b42 to e339591
Looks good! We could add an input for the language to be extracted and have it default to automatic detection, or add it as an optional input only for providers that make use of it. Both are fine with me, wdyt @julien-nc @kyteinsky?
Not sure the OCR libraries take a "language" param to help them perform an optimal extraction. @marcelklehr Do they?
The latest models don't require a language input, but older libraries like Tesseract may require this. I think an optional input is fine.
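To make the "optional input" idea concrete, here is a minimal sketch of how a provider could declare and read such an input. It assumes the provider-level hook is `getOptionalInputShape()`; the `language` key, the `'auto'` sentinel, and the `resolveLanguage()` helper are hypothetical names that are not part of this PR. The `EShapeType` and `ShapeDescriptor` definitions are local stand-ins for the `OCP\TaskProcessing` classes so that the snippet runs on its own.

```php
<?php
declare(strict_types=1);

// Stand-ins for OCP\TaskProcessing\EShapeType and ShapeDescriptor,
// reduced to what this sketch needs.
enum EShapeType {
	case Text;
}

final class ShapeDescriptor {
	public function __construct(
		private string $name,
		private string $description,
		private EShapeType $shapeType,
	) {
	}

	public function getShapeType(): EShapeType {
		return $this->shapeType;
	}
}

// Hypothetical body for a provider's optional input shape:
// only providers that can use a language hint would declare it.
function ocrOptionalInputShape(): array {
	return [
		'language' => new ShapeDescriptor(
			'Language',
			'Language of the text in the image; leave empty for automatic detection',
			EShapeType::Text,
		),
	];
}

// Hypothetical helper: fall back to automatic detection when no language is given.
function resolveLanguage(array $input): string {
	$language = trim((string)($input['language'] ?? ''));
	return $language === '' ? 'auto' : $language;
}
```

A Tesseract-style provider could pass the resolved value on to its engine, while a model-based provider could simply not declare the input.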
e339591 to 483a4b2
483a4b2 to 42bf379
Signed-off-by: Marcel Klehr <[email protected]>
42bf379 to 3355e6a
	public function getInputShape(): array {
		return [
			'input' => new ShapeDescriptor(
				$this->l->t('Input Image'),
				$this->l->t('The image to extract text from'),
				EShapeType::Image
			),
		];
	}
It would be nice if it were a ListOfFiles so it can accept both images and PDFs, and several of them in a single task instead of just one, which also keeps the task list in the DB shorter.
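A rough sketch of what that suggestion could look like, not the change that was merged: the input shape switches from `EShapeType::Image` to `EShapeType::ListOfFiles`. The labels are illustrative, the `$this->l->t()` translation calls are left out, and `EShapeType` and `ShapeDescriptor` are local stand-ins for the `OCP\TaskProcessing` classes so that the snippet runs without a Nextcloud checkout.

```php
<?php
declare(strict_types=1);

// Stand-ins for OCP\TaskProcessing\EShapeType and ShapeDescriptor,
// reduced to what this sketch needs.
enum EShapeType {
	case Image;
	case ListOfFiles;
}

final class ShapeDescriptor {
	public function __construct(
		private string $name,
		private string $description,
		private EShapeType $shapeType,
	) {
	}

	public function getShapeType(): EShapeType {
		return $this->shapeType;
	}
}

// Hypothetical variant of getInputShape(): one task accepts several files,
// which may be images or PDFs, instead of a single image.
function ocrInputShape(): array {
	return [
		'input' => new ShapeDescriptor(
			'Input files',
			'The images or PDF documents to extract text from',
			EShapeType::ListOfFiles,
		),
	];
}
```

With a list-typed input, the output shape would presumably need to become a list as well, so that each input file maps to one extracted text.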
Summary
Adds a task processing task type for doing OCR