Skip to content
This repository was archived by the owner on Mar 31, 2026. It is now read-only.

WIP: New hashing schema#59

Open
quaxsze wants to merge 9 commits intoetalab:masterfrom
quaxsze:hash-enhancement
Open

WIP: New hashing schema#59
quaxsze wants to merge 9 commits intoetalab:masterfrom
quaxsze:hash-enhancement

Conversation

@quaxsze
Copy link
Copy Markdown
Contributor

@quaxsze quaxsze commented Jan 27, 2020

No description provided.

@quaxsze
Copy link
Copy Markdown
Contributor Author

quaxsze commented Feb 4, 2020

@abulte I would need a review here. I left several comments in the code after implementing the double cache feature.
The question comes from the upload view.
The uploaded file will not have a URL hash. Therefore the way we look for it in tableview or exportview will not work.

A work around would be to change the endpoint:
"{scheme}://{request.host}/api/{urlhash}"
to
"{scheme}://{request.host}/api/{filehash}"

This would work but would it make the URL hash useless and not relevant?
Since we would not use it for anything else than to retrieve an DB entry for the first step of cache validation (and could be done by seeking with the filehash directly?)

@quaxsze quaxsze requested a review from abulte February 4, 2020 16:26
Copy link
Copy Markdown
Contributor

@abulte abulte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About upload view: I think we can rely on file hash only in this case, and store a url hash = None.

Comment thread csvapi/utils.py
else:
raise RuntimeError('Func get_db_info need at least one not none argument')

res = c.fetchone()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code below can probably be made cleaner/shorter by getting column values in a dict instead of a list eg res[‘uuid’].

Comment thread csvapi/parseview.py
logger=app.logger,
sniff_limit=app.config.get('CSV_SNIFF_LIMIT'),
max_file_size=app.config.get('MAX_FILE_SIZE')
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation

Copy link
Copy Markdown
Contributor

@abulte abulte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add tests :-)

Comment thread csvapi/parser.py
encoding = detect_encoding(filepath) if not encoding else encoding
table = from_csv(filepath, encoding=encoding, sniff_limit=sniff_limit)
return to_sql(table, urlhash, storage)
return to_sql(table, urlhash, filehash, storage) No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing line at EOF

Comment thread csvapi/parseview.py
filehash = X.hexdigest()
logger.debug('* Downloaded %s', filehash)
if not is_hash_relevant(urlhash, filehash):
print("HASH IS NOT RELEVANT")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Comment thread csvapi/parseview.py
except Exception as e:
raise APIError('Error parsing CSV: %s' % e)
else:
print("HASH IS RELEVANT")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Comment thread csvapi/parseview.py
raise APIError('Error parsing CSV: %s' % e)
else:
app.logger.info(f"{urlhash}.db already exists, skipping parse.")
print("AGE IS OK")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants