When scraping documentation pages from the web, we should make sure that any links are converted to their fully qualified form, e.g. going from something like:
[migrate your entire database at once](/self-hosted/latest/migration/entire-database/)
to
[migrate your entire database at once](https://docs.tigerdata.com/self-hosted/latest/migration/entire-database/)
Right now the LLM likes to quote the returned markdown chunks, and relative links like the former end up showing as weird broken text, unlike fully qualified links like the latter. While we could maybe fix this via prompting as well, I think it's better to just eat the extra tokens in the embedding and make the chunks easier for the LLMs to use.
It'll probably be easier/better, though, to do this manipulation against the HTML source rather than after we convert it to markdown.
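
As a rough illustration of doing the rewrite on the HTML side, here is a minimal sketch using only the Python standard library. The function name `absolutize_links` and the `page_url` parameter are hypothetical, and it assumes the scraper knows the URL each page was fetched from. It uses a regex over quoted `href`/`src` attributes for brevity; a real HTML parser would likely be more robust.

```python
import re
from urllib.parse import urljoin

# Matches quoted href="..." / src='...' attributes.
# Simplification: unquoted attribute values are not handled.
_LINK_ATTR = re.compile(
    r"""(?<![\w-])(?P<attr>href|src)\s*=\s*(?P<quote>["'])(?P<url>.*?)(?P=quote)""",
    re.IGNORECASE,
)


def absolutize_links(html: str, page_url: str) -> str:
    """Rewrite relative href/src values in `html` as absolute URLs.

    `page_url` is assumed to be the URL the page was scraped from.
    """

    def _replace(match: re.Match) -> str:
        quote = match.group("quote")
        # urljoin leaves already-absolute URLs (https:, mailto:, ...) unchanged
        # and resolves relative ones against page_url.
        absolute = urljoin(page_url, match.group("url"))
        return f"{match.group('attr')}={quote}{absolute}{quote}"

    return _LINK_ATTR.sub(_replace, html)
```

For example, calling `absolutize_links(html, "https://docs.tigerdata.com/self-hosted/latest/migration/")` before the markdown conversion should turn `href="/self-hosted/latest/migration/entire-database/"` into the full `https://docs.tigerdata.com/...` form, so the converted markdown already carries the fully qualified link.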