18 changes: 9 additions & 9 deletions docs/helpers.md
@@ -6,7 +6,7 @@ This page is a work-in-progress.
## Getting Started
If you plan to work on MathCAT development, you need to make use of github:
1. Fork the MathCAT repo at `github.com/NSoiffer/MathCAT`
-2. Clone the the forked copy so you have a local copy to work on.
+2. Clone the forked copy so you have a local copy to work on.
3. Checkout the branch I create for your work (typically the country code for your translation) and work in that branch.

If you are unfamiliar with these steps, a simple search will turn up lots of places that describe how to do them. They are simple, so don't get put off by your unfamiliarity.
@@ -25,11 +25,11 @@ Note: The MathCAT settings dialog looks for files named `XXX_Rules.yaml` and add
These files have auto-generated initial translations. Even though they are translated, `t:` (see below) is used, not the upper case `T:`. This is because each translation should be verified to be correct and when verified, then change to the uppercase version.
See below for more comments about the auto translations.

-* In some languages it doesn't make sense to says "_the_ square root of x" (and maybe "of"). If that is the case, just change those to empty strings.
+* In some languages it doesn't make sense to say "_the_ square root of x" (and maybe "of"). If that is the case, just change those to empty strings.
* Some languages, the word order changes -- feel free to move the words around, but pay attention to the indentation.
Indentation is meaningful in YAML.
* In some languages, you may want to add words that aren't in the English version, perhaps before or after existing phrases. Feel free to add them -- they can be conditionally added using `test` if needed. Please contact @NSoiffer if you need help with this.
-* Pausing between words/phrases can greatly help make understandable. The pausing is choosen based on English. You should adjust pauses based on what sounds good in speech synthesizers for your language. It is very simple to add, remove, or change the amount of pauses. All pauses are scaled to the current speech rate.
+* Pausing between words/phrases can greatly help make speech understandable. The pausing is chosen based on English. You should adjust pauses based on what sounds good in speech synthesizers for your language. It is very simple to add, remove, or change the amount of pauses. All pauses are scaled to the current speech rate.
3. The unicode files (`unicode.yaml` and `unicode-full.yaml`). These contain characters like `<` and `∫`.
* You should start with translating `unicode.yaml`. These represent the vast majority of math symbols used. Currently the list is based on experience as to which are the most commonly used Unicode symbols, but I plan to make use of statistics from actual books to refine the list even further. There are about 270 characters to translate in `unicode.yaml`, although ~50 of them are Greek letters (which is hopefully simple).
Just like the speech rule files, these files have auto-generated initial translations and the translations should be verified and the `t:` changed to `T:`.
@@ -81,15 +81,15 @@ If SRE and MathPlayer agree, or if only one of SRE or MathPlayer has a translati
- "!": [t: "factorielle"] # 0x21 (en: 'factorial')
```

-If the MathPlayer and SRE translations disagree, then the translations that agrees with the google translation will be chosen and the other translation included in a comment. For example:
+If the MathPlayer and SRE translations disagree, then the translation that agrees with the google translation will be chosen and the other translation included in a comment. For example:
```
else: [t: "parenthèse gauche"] # (en: 'left paren', MathPlayer: 'parenthèse ouvrante')
```
-If none of the translations agree, than one of the translations is picked and the other translations are in comment. For example:
+If none of the translations agree, then one of the translations is picked and the other translations are in a comment. For example:
```
else: [t: "parenthèse gauche"] # (en: 'open paren', MathPlayer: 'parenthèse ouvrante', google: 'parenthèse ouverte')
```
-Finally, if there there is no translation, then the google translation is given and is marked with a comment "google translation". There is a significant chance that this is not a good translation so pay special attention to those. Here is an example where there is only a google translation
+Finally, if there is no translation, then the google translation is given and is marked with a comment "google translation". There is a significant chance that this is not a good translation so pay special attention to those. Here is an example where there is only a google translation
```
then: [t: "ligne verticale"] # (en: 'vertical line', google translation)
```
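
The selection rules described in this part of `docs/helpers.md` can be summarized in a short sketch. This is only an illustration of the documented behavior, not MathCAT's actual tooling: the names `Choice` and `choose_translation` are hypothetical, the "agree" and "only one source" cases are assumed to use that translation directly, and keeping the SRE text when nothing agrees is an assumption (the docs only say that one of the translations is picked).

```rust
/// Hypothetical result of picking an initial translation (illustrative names only).
#[derive(Debug, PartialEq)]
struct Choice {
    /// Text that would go into the `t:` entry.
    text: String,
    /// Alternatives that would be listed in the trailing YAML comment.
    alternatives: Vec<String>,
    /// True when only the google translation was available.
    google_only: bool,
}

impl Choice {
    fn new(text: &str, alternatives: Vec<String>, google_only: bool) -> Choice {
        Choice { text: text.to_string(), alternatives, google_only }
    }
}

/// Sketch of the documented rules for choosing an auto-generated translation.
fn choose_translation(sre: Option<&str>, mathplayer: Option<&str>, google: &str) -> Choice {
    match (sre, mathplayer) {
        // No SRE or MathPlayer translation: use google and flag it for review.
        (None, None) => Choice::new(google, vec![], true),
        // Only one source has a translation (assumed: use it as-is).
        (Some(only), None) | (None, Some(only)) => Choice::new(only, vec![], false),
        // Both sources agree (assumed: use the shared translation).
        (Some(s), Some(m)) if s == m => Choice::new(s, vec![], false),
        // They disagree: prefer the one matching google, note the other.
        (Some(s), Some(m)) if s == google => {
            Choice::new(s, vec![format!("MathPlayer: '{}'", m)], false)
        }
        (Some(s), Some(m)) if m == google => {
            Choice::new(m, vec![format!("SRE: '{}'", s)], false)
        }
        // Nothing agrees: ASSUMPTION, keep SRE and list the others.
        (Some(s), Some(m)) => Choice::new(
            s,
            vec![format!("MathPlayer: '{}'", m), format!("google: '{}'", google)],
            false,
        ),
    }
}
```

Under these assumptions, an entry flagged `google_only` corresponds to the "google translation" comment and deserves the closest review.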
@@ -104,7 +104,7 @@ Once you've done some translations and want to try them out, you can do so immed
5. If there is an error (often you won't hear speech), open NVDA's log (in NVDA's "Tools" submenu). The error should be listed there. The error messages are explained below.
6. When you make a change, MathCAT should notice the file is changed and reload it. There is currently a bug that this is not done for files that are `include`d in from a file (e.g., all those in the Shared directory). If you make a change to one of those files, either reload MathCAT (NVDA Tools:Reload Plugins) or restart NVDA.

-Translating the settings dialog: this is a separate process from translating the speech. This done by volunteers that do other addon translations also. See [this mailing list](https://groups.io/g/nvda-translations) for more info.
+Translating the settings dialog: this is a separate process from translating the speech. This is done by volunteers that do other addon translations also. See [this mailing list](https://groups.io/g/nvda-translations) for more info.

### Automatic tests for your translation
Testing is very important! MathCAT is written in Rust and has a large number of automated tests. These tests take advantage of the builtin Rust test system. Hence, to write and verify your own tests, you need to [download and install Rust](https://www.rust-lang.org/tools/install). You do not need to know Rust -- you will simply change some strings from what they are in English to what you think they should be in your language.
@@ -163,7 +163,7 @@ These tools will look for untranslated and translated text.
If you want support for a new braille language, you probably need to start from scratch unless the language is similar to an existing braille language.
You will need to create three `.yaml` files in `Rules\Braille\your-braille-language`. This should mirror the files that are in the other braille directories:
1. xxx_Rules.yaml -- where 'xxx' is the name of your new braille language. These will contain the rules that translate MathML to braille
-2. unicode.yaml -- this is a translation of the more common braille characters. Use `Nemeth\unicode.yaml` as a starting point for the the translation. Convert the `t: xxx` into what is appropriate for your language. You likely need to delete some logic or maybe add some of your own for characters that might be represented differently based on context. For example, in Nemeth, a "," is represented differently if it is part of a number.
+2. unicode.yaml -- this is a translation of the more common braille characters. Use `Nemeth\unicode.yaml` as a starting point for the translation. Convert the `t: xxx` into what is appropriate for your language. You likely need to delete some logic or maybe add some of your own for characters that might be represented differently based on context. For example, in Nemeth, a "," is represented differently if it is part of a number.
3. unicode-full.yaml -- this is the rest of the character translations.

The reason for two separate unicode files is that having a shorter file for the most common characters means startup takes less time. The goal of that file is to capture 99.99% of the characters used.
@@ -411,7 +411,7 @@ Note: all YAML files begin with "---". That indicates the beginning of a "docume

Note: for "pause", the "auto" value will calculate a pausing amount based on the complexity of the surrounding parts. The more complex they are, the longer the pause (up to a limit). The basic idea is that you want to give the listener time to digest and separate out the two parts when one or both are more complicated.

-In addition to having a named rule, the speech rule file supports including other speech rules files. This lets various speech speech rule styles share common features. Inclusion is done via an entry in place of a speech rule:
+In addition to having a named rule, the speech rule file supports including other speech rules files. This lets various speech rule styles share common features. Inclusion is done via an entry in place of a speech rule:
```
-include: file_name
```
4 changes: 2 additions & 2 deletions docs/index.md
@@ -6,7 +6,7 @@ is a library that supports conversion of MathML to:
* Braille (Nemeth, UEB Technical, and eventually other braille math codes)
* Navigation of math (in multiple ways including overviews)

-A goal of MathCAT is to be an easy to use library for screen readers and other assistive technology to use to produce high quality speech and/or braille from MathML. It is a follow-on project from MathPlayer (see below) and uses lessons learned from it to do to produce even higher quality speech, navigation, and braille. MathCAT takes advantage of some new ideas the [MathML Working Group](https://mathml-refresh.github.io/charter-drafts/math-2020.html) is developing to allow authors to express their intent when they use a notation. E.g., $(3, 6)$ could be a point in the plane or an open interval, or even a shorthand notation for the greatest common divisor. When that information is conveyed in the MathML, MathCAT will use it to generate more natural sounding speech.
+A goal of MathCAT is to be an easy to use library for screen readers and other assistive technology to use to produce high quality speech and/or braille from MathML. It is a follow-on project from MathPlayer (see below) and uses lessons learned from it to produce even higher quality speech, navigation, and braille. MathCAT takes advantage of some new ideas the [MathML Working Group](https://mathml-refresh.github.io/charter-drafts/math-2020.html) is developing to allow authors to express their intent when they use a notation. E.g., $(3, 6)$ could be a point in the plane or an open interval, or even a shorthand notation for the greatest common divisor. When that information is conveyed in the MathML, MathCAT will use it to generate more natural sounding speech.

Todo: incorporation of third party libraries to support a common subset of TeX math commands along with ASCIIMath.

@@ -214,4 +214,4 @@ For more information about what happened to MathPlayer and how MathCAT came to b

All along, I've been pushing to make math work on the web and make it accessible. While at Wolfram Research, I helped get the W3C MathML effort started and have been involved with the working group ever since. I currently chair the W3C Math Working Group. I've been a member on several other committees over the years pushing strongly to make sure they incorporated math accessibility into their standards. Some of the these groups include NIMAS, EPUB, and PDF/UA.

-I'm very honored that in 2023, the National Federation of the Blind gave me the <span>$</span>25,000 Jacob Bolotin award. I donated <span>$</span>15,000 of that to the _open collective_ to improve MathML support in browsers. [Click this link for how you can help improve MathML support in browsers](https://opencollective.com/mathml-core-support).
+I'm very honored that in 2023, the National Federation of the Blind gave me the <span>$</span>25,000 Jacob Bolotin award. I donated <span>$</span>15,000 of that to the _open collective_ to improve MathML support in browsers. [Click this link for how you can help improve MathML support in browsers](https://opencollective.com/mathml-core-support).
4 changes: 2 additions & 2 deletions docs/nav-commands.md
@@ -36,7 +36,7 @@ Note: while navigating an expression, "control+c" copies the math content of the
</td>
<td valign=top style='border:solid 1.0pt;border-left:none;
padding:0in 5.4pt 0in 5.4pt'>
-<b>+Cntrl+Shift</b>
+<b>+Ctrl+Shift</b>
</td>
</tr>
</thead>
@@ -319,7 +319,7 @@ MathCAT supports three different navigation modes: enhanced, simple, and charact
## Typical Use

Typically, you will start at the first term of an expression and move right as needed.
-You might move up and down levels if needed. This done with the arrow keys.
+You might move up and down levels if needed. This is done with the arrow keys.
`alt+ctrl+arrow` is used to move around tabular entries.

<i>Backspace</i> will take you back to where you were, which
4 changes: 2 additions & 2 deletions src/braille.rs
@@ -1618,7 +1618,7 @@ fn stands_alone(chars: &[char], i: usize) -> (bool, &[char], usize) {
}
return (is_alone, &chars[i..i+2+n_right_matched], n_letters);

-/// chars before before 'L'
+/// chars before 'L'
fn left_side_stands_alone(chars: &[char]) -> bool {
// scan backwards to skip letters and intervening chars
// once we hit an intervening char, only intervening chars are allowed if standing alone
@@ -2561,7 +2561,7 @@ impl BrailleChars {
(None, None) => (),
(Some(dot), Some(comma)) => {
if comma < dot {
-// switch dot/comma -- using "\x01" as a temp when switching the the two chars
+// switch dot/comma -- using "\x01" as a temp when switching the two chars
let switched = text.replace('.', "\x01").replace(',', ".").replace('\x01', ",");
mn_node.set_text(&switched);
}
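
The dot/comma switch in this hunk works by routing one of the two characters through a temporary placeholder, so the second replacement cannot undo the first. A standalone sketch of that technique follows; the function name `swap_dot_and_comma` is made up for illustration, and it assumes the placeholder `\x01` never occurs in the input text.

```rust
/// Swap every '.' with ',' and every ',' with '.' in `text`.
/// Assumes the control character '\x01' does not appear in `text`.
fn swap_dot_and_comma(text: &str) -> String {
    text.replace('.', "\x01") // park the dots on a placeholder
        .replace(',', ".") // commas become dots
        .replace('\x01', ",") // parked dots become commas
}
```

Replacing directly in two steps (dots to commas, then commas to dots) would turn every separator into a dot, which is why the intermediate placeholder is needed.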
15 changes: 7 additions & 8 deletions src/canonicalize.rs
@@ -403,7 +403,7 @@ pub fn get_presentation_element(element: Element) -> (usize, Element) {
/// 2. normalize the characters
/// 3. clean up "bad" MathML based on known output from some converters (TODO: still a work in progress)
/// 4. the tree is "parsed" based on the mo (priority)/mi/mn's in an mrow
-/// * this adds mrows mrows and some invisible operators (implied times, function app, ...)
+/// * this adds mrows and some invisible operators (implied times, function app, ...)
/// * extra mrows are removed
/// * implicit mrows are turned into explicit mrows (e.g, there will be a single child of 'math')
///
@@ -1127,7 +1127,7 @@ impl CanonicalizeContext {
i += 1;
}
}
-children = mathml.children(); // 'children' moved above, so need need new values
+children = mathml.children(); // 'children' moved above, so need new values
} else {
// bad mathml such as '<annotation-xml> </annotation-xml>' -- don't add to new_children
i += 1;
@@ -1266,7 +1266,7 @@ impl CanonicalizeContext {
following.len() == 1 && name(following_child) == "mn" {
return true;
}
-// only want want one "∷"
+// only want one "∷"
let is_before = is_proportional_before_colon(preceding.iter().rev());
if let Some(is_before) = is_before
&& !is_before {
@@ -1374,7 +1374,7 @@ impl CanonicalizeContext {
fn clean_chemistry_leaf(mathml: Element) -> Element {
if !(is_chemistry_off(mathml) || mathml.attribute(MAYBE_CHEMISTRY).is_some()) {
assert!(name(mathml)=="mi" || name(mathml)=="mtext");
-// this is hack -- VII is more likely to be roman numeral than the molecule V I I so prevent that from happening
+// this is a hack -- VII is more likely to be roman numeral than the molecule V I I so prevent that from happening
// FIX: come up with a less hacky way to prevent chem element misinterpretation
let text = as_text(mathml);
if text.len() > 2 && is_roman_number_match(text) {
@@ -3410,7 +3410,7 @@ impl CanonicalizeContext {
// if there isn't an obvious one, we have parsed the left, but not the right, so discount that

// Trig functions have some special syntax
-// We need to to treat '-' as prefix for things like "sin -2x"
+// We need to treat '-' as prefix for things like "sin -2x"
// Need to be careful because (sin - cos)(x) needs an infix '-'
// Return either the prefix or infix version of the operator
if next_node.is_some() &&
@@ -3712,7 +3712,7 @@ impl CanonicalizeContext {

// Names like "Tr" are likely function names, single letter names like "M" or "J" are iffy
// This needs to be after the chemical state check above to rule out Cl(g), etc
-// This would be better if if were part of 'likely_names' as "[A-Za-z]+", but reg exprs don't work in HashSets.
+// This would be better if it were part of 'likely_names' as "[A-Za-z]+", but reg exprs don't work in HashSets.
// FIX: create our own struct and write appropriate traits for it and then it could work
let mut chars = base_name.chars();
let first_char = chars.next().unwrap(); // we know there is at least one byte in it, hence one char
@@ -4309,7 +4309,7 @@ impl CanonicalizeContext {
if !ptr_eq(current_op.op, &ILLEGAL_OPERATOR_INFO) {
if current_op.op.is_left_fence() || current_op.op.is_prefix() {
if top(&parse_stack).is_operand {
-// will end up with operand operand -- need to choose operator associated with prev child
+// will end up with duplicate operands -- need to choose operator associated with prev child
Review comment (Contributor, PR author): not really sure here.

// we use the original input here because in this case, we need to look to the right of the ()s to deal with chemical states
let likely_function_name = self.is_function_name(as_element(children[i_child-1]), Some(&children[i_child..]));
let implied_operator = if likely_function_name== FunctionNameCertainty::True {
@@ -6701,4 +6701,3 @@ mod canonicalize_tests {


}

10 changes: 5 additions & 5 deletions src/definitions.rs
@@ -6,16 +6,16 @@
//! There is no escaping some implementation details.
//! Because these definitions are stored in global variables, the variables need to be protected
//! in some way so they can be written at runtime when the files are read.
-//! This is done by putting them in side of a lock (`thread_local`).
+//! This is done by putting them inside of a lock (`thread_local`).
//!
-//! Furthermore, it was necessary to use use `RefCell` and `Rc` to deal with interior mutability.
+//! Furthermore, it was necessary to use `RefCell` and `Rc` to deal with interior mutability.
//! All of this means that a lock needs to be obtained _and_ the contents borrowed to access a definition.
//!
//! To minimize the global variable footprint, all of the definitions are put inside of a single global variable [`DEFINITIONS`].
//!
-//! //! Note: some of the variable are `vec`s and some are `hashset`s.
+//! //! Note: some of the variables are `vec`s and some are `hashset`s.
//! Numbers are typically vectors so that indexing a digit is easy.
-//! Others such a `functions_names` are a hashset because you just want to know if an `mi` is a known name or not.
+//! Others such as `functions_names` are a hashset because you just want to know if an `mi` is a known name or not.
//! The functions `as_vec` and `as_hashset` should be used on the appropriate variable.
//! ## Names
//! The names of "variables" in the definition files use camel case (e.g., "FunctionNames"). In the code, to fit with rust
@@ -394,4 +394,4 @@ mod tests {
assert_eq!(names.get("xxx"), None);
});
}
-}
+}
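
The module comment in `src/definitions.rs` describes definitions held in a `thread_local` global and wrapped in `RefCell`, so a caller has to enter the thread-local and then borrow its contents before reading or writing. A minimal sketch of that access pattern follows; `EXAMPLE_DEFINITIONS`, `set_function_names`, and `is_function_name` are hypothetical stand-ins for illustration and are not the actual `DEFINITIONS` API.

```rust
use std::cell::RefCell;
use std::collections::HashSet;

thread_local! {
    // Hypothetical stand-in for a global definition table that is filled at runtime.
    static EXAMPLE_DEFINITIONS: RefCell<HashSet<String>> = RefCell::new(HashSet::new());
}

/// Replace the stored names, e.g. after (re)reading a definitions file.
fn set_function_names(names: &[&str]) {
    EXAMPLE_DEFINITIONS.with(|defs| {
        // Step 1 was entering the thread-local; step 2 is the mutable borrow.
        let mut defs = defs.borrow_mut();
        defs.clear();
        defs.extend(names.iter().map(|name| name.to_string()));
    });
}

/// Membership test: a hashset fits when only "is this a known name?" matters.
fn is_function_name(name: &str) -> bool {
    EXAMPLE_DEFINITIONS.with(|defs| defs.borrow().contains(name))
}
```

Each access goes through `with` and then `borrow` or `borrow_mut`, which mirrors the "obtain the lock and borrow the contents" sequence the comment describes.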
2 changes: 1 addition & 1 deletion src/navigate.rs
@@ -491,7 +491,7 @@ pub fn do_navigate_command_string(mathml: Element, nav_command: &'static str) ->
};
}
// we should always find the start node.
-// however, if were were navigating by character, then switched the NavMode, the intent tree might not have that node in it
+// however, if we were navigating by character, then switched the NavMode, the intent tree might not have that node in it
let start_node = match get_start_node(nav_intent, nav_state) {
Ok(node) => node,
Err(_) => {