19 changes: 18 additions & 1 deletion README.md
@@ -49,10 +49,27 @@ conf.setNaaccrVersion("220");
new MetafileTranslator().executeFullTranslation(conf);
```

## Command Line
The compiled JAR can be executed from the command line to perform a full translation.

**Command Line Options:**
- `-p`: The working folder containing the metafile and the configuration file; translation output is written to this folder
- `-c`: The file name of the configuration properties file within the working folder

**Example usage:**
```sh
$ java -jar validation-translation.jar -c config.properties -p WORKING_DIR

Member

I don't mean to be a pain about this pull request, sorry!

The CsvFieldResolver class makes sense, it looks great, we can add it.

I am still trying to figure out the rest.

You are providing an example of how to run the CLI; have you tried that example yourself? It won't work because this library depends on other libraries, and the only way to make it a standalone JAR is to make a shadow JAR (or "fat" JAR, or whatever you want to call it). Basically a JAR that contains all the classes of this library plus all the classes of the dependencies. It's doable, but you will need to provide instructions on how to build that fat JAR, which would require a special Gradle task again.

That makes the CLI pretty much unusable, unless it's embedded into another project. But if it's embedded, then you can simply copy its code into that project and not require a CLI class in this project.

So again, my question to you is: are you using the CLI class yourself, and if you are, how do you call it? From the command line? Then you need the dependencies. From another project? Then you could just have the CLI class in that project.

I guess what bothers me the most is that you are adding a dependency to a logging implementation. It might not look like much, but it's really not a great thing to do for a library that is supposed to abstract its logging via just an API and let the parent project deal with whatever implementation they want. This little CLI class makes this project "not really a library", which is fine, except that IMO the CLI class is not usable as-is. And I would prefer to not jump through hoops to make that work unless you are telling me that you need things to be this way.

My preference would be to only keep the CsvFieldResolver and nothing else; it's a great addition to the library.

If you feel strongly about keeping the CLI class, then I would like you to move it to the test package, and put back the logging implementation at the test level so it's not released. You will also need to adjust these instructions in the README file because those examples won't work as-is.

A third option is for you to move the CLI class and the examples to your other project, where you plan on moving the compilation stuff that you removed earlier.

Plenty of options, and we can certainly discuss this more if you don't agree with my point-of-view.

Author

No worries. Sorry for the delay, we are deep in 2024 changes...

I have run the CLI to translate the NCDB edits several times now, so I know it works! I originally modified the jar task to build a fat jar, but I have now split that off into a separate task, 'buildFatJar'. I will do some further work to move the CLI into the test package to prevent log4j from cluttering up the dependency tree.

Member

I agree it will work with a fat JAR. It wasn't clear to me that "validation-translation.jar" was supposed to be a fat JAR in your example:

$ java -jar validation-translation.jar -c config.properties -p WORKING_DIR

And that's kind of the whole argument for this; the full translation process (which includes creating the runtime classes) is really meant to be part of another project that takes care of compiling the runtime classes. And if this project can't do the compilation, then why add the rest (the CLI and the fat JAR)? IMO, those two things should be part of your other project. They are part of a "full translation" that requires another project setup.

It's hard to explain.

My preference would be for you to move the CLI and the fat JAR generation to your other project. The only change that really makes sense for this library is the CSV field mapper.

If you feel strongly about keeping the CLI and JAR generation in this project, that's fine, as long as log4j is not brought up into the released dependencies. There have been too many security issues with it in the past few years...

```
For an example of a translation configuration, see `examples/config.properties`.

## Pre-Compiled Groovy Edits
For faster loading of large metafiles, the optionally generated Groovy source code can be pre-compiled into bytecode. The folder `examples/compile-groovy`
contains an example of a build.gradle and settings.gradle that can be placed in the output folder to generate a JAR containing compiled bytecode.

## About SEER

This library was developed through the [SEER](http://seer.cancer.gov/) program.

The Surveillance, Epidemiology and End Results program is a premier source for cancer statistics in the United States.
The SEER program collects information on incidence, prevalence and survival from specific geographic areas representing
a large portion of the US population and reports on all these data plus cancer mortality data for the entire country.
29 changes: 27 additions & 2 deletions build.gradle
@@ -31,10 +31,12 @@ dependencies {
implementation 'net.sf.squirrel-sql.thirdparty-non-maven:java-cup:0.11a'
implementation 'org.xerial:sqlite-jdbc:3.43.2.1'
implementation 'org.apache.logging.log4j:log4j-api:2.21.1'
implementation 'commons-cli:commons-cli:1.6.0'
implementation 'org.apache.logging.log4j:log4j-core:2.21.1'

Member

I am not against this change, but I am a bit surprised it was needed. The library uses the logging interface and relies on the caller to provide a logging implementation.

I assume you added this for your new CLI class, which is fine. But that class doesn't directly log anything.

What required the addition of this library?
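A minimal illustration of the API-versus-implementation split described above, using only the JDK: `java.lang.System.Logger` stands in for the Log4j API this project actually depends on, and `LibraryComponent` is a hypothetical class. The library code only emits messages; which backend receives them is decided by the application (by default, when the `java.logging` module is present, the JDK routes them to `java.util.logging`).

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;

// Hypothetical "library" class: it codes only against the System.Logger facade
// and never chooses or configures a logging backend itself.
final class LibraryComponent {
    private static final Logger LOG = System.getLogger(LibraryComponent.class.getName());

    static int translate(String input) {
        // the library only emits the message; where it ends up is the application's decision
        LOG.log(Level.INFO, "translating " + input);
        return input.length();
    }
}
```

In this project's terms, the equivalent is keeping only `log4j-api` on the `implementation` configuration and supplying `log4j-core` from the test classpath or the consuming application, as the `testImplementation` line in this diff already does.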

@chui101 chui101 Mar 7, 2024

Author

I think this is a remnant of when I was trying to resolve a minor slf4j warning. There was an error message without it:

ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...

I will remove the dependency for now as the message does not affect the functionality.

Author

I added the log4j-core dependency back after implementing log4j2 output for the CLI.

implementation 'org.slf4j:slf4j-nop:2.0.12'

testImplementation 'junit:junit:4.13.2'
testImplementation 'com.imsweb:data-generator:1.31'
testImplementation 'org.apache.logging.log4j:log4j-core:2.21.1'
}

// enforce UTF-8, display the compilation warnings
@@ -67,7 +69,7 @@ jar {
'Built-By': System.getProperty('user.name'),
'Built-Date': new Date(),
'Built-JDK': System.getProperty('java.version'),
'Automatic-Module-Name': 'com.imsweb.validation-translation'
'Automatic-Module-Name': 'com.imsweb.validation-translation',
)
}
}
@@ -140,11 +142,34 @@ tasks.register('generateRegexParser') {
}
}

// use this task to generate a fat JAR to easily run a CLI
tasks.register('buildFatJar', Jar) {
manifest {
attributes('Implementation-Title': project.name,
'Implementation-Version': project.version,
'Implementation-Vendor': 'Information Management Services Inc.',
'Created-By': System.properties.getProperty('java.vm.version') + ' (' + System.properties.getProperty('java.vm.vendor') + ')',
'Built-By': System.getProperty('user.name'),
'Built-Date': new Date(),
'Built-JDK': System.getProperty('java.version'),
'Automatic-Module-Name': 'com.imsweb.validation-translation',
'Main-Class': 'com.imsweb.validation.translation.MetafileTranslatorCli'
)
}

from {
configurations.runtimeClasspath.collect { it.isDirectory() ? it : zipTree(it) }
}
duplicatesStrategy = DuplicatesStrategy.EXCLUDE
with jar
}
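Whether `java -jar` works on its own depends on the JAR declaring a `Main-Class` and bundling the dependency classes, which is what `buildFatJar` is meant to produce. A sketch of a check for both, assuming Java 9+; `FatJarCheck` and its methods are hypothetical helpers, not part of this project, and `buildJar` only exists so the checks can be exercised without running Gradle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper: checks whether a JAR can run standalone with "java -jar",
// i.e. it declares a Main-Class and bundles the dependency classes it needs.
final class FatJarCheck {

    /** Returns the Main-Class manifest attribute, or null if there is none. */
    static String mainClass(byte[] jarBytes) {
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            Manifest manifest = in.getManifest();
            return manifest == null ? null : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the fully qualified class names that have no matching .class entry in the JAR. */
    static List<String> missingClasses(byte[] jarBytes, String... classNames) {
        Set<String> entries = new HashSet<>();
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            for (JarEntry entry = in.getNextJarEntry(); entry != null; entry = in.getNextJarEntry()) {
                entries.add(entry.getName());
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String className : classNames) {
            // "org.apache.commons.cli.CommandLine" -> "org/apache/commons/cli/CommandLine.class"
            if (!entries.contains(className.replace('.', '/') + ".class")) {
                missing.add(className);
            }
        }
        return missing;
    }

    /** Builds a small in-memory JAR with empty entries, only to exercise the checks above. */
    static byte[] buildJar(String mainClass, String... entryNames) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream out = new JarOutputStream(bytes, manifest)) {
            for (String name : entryNames) {
                out.putNextEntry(new JarEntry(name));
                out.closeEntry();
            }
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

For a real build, pass `Files.readAllBytes(...)` of the JAR produced by `buildFatJar` and list one class from each runtime dependency, such as `org.apache.commons.cli.CommandLine` for commons-cli.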

// Nexus vulnerability scan (https://github.com/sonatype-nexus-community/scan-gradle-plugin)
ossIndexAudit {
outputFormat = 'DEPENDENCY_GRAPH'
printBanner = false


excludeVulnerabilityIds = [
'CVE-2022-42003',
'CVE-2022-42004',
16 changes: 16 additions & 0 deletions examples/config.properties
@@ -0,0 +1,16 @@
# metafile file name; the file should be located in the working directory (-p) along with this configuration file
metafile-name=testing-metafile-v2.smf
# prefix for the IDs of the generated validator rules
translation-prefix=TEST
# working directory of a previous translation run (optional); a relative path is resolved against the directory the program is executed from
previous-working-directory-path=testing-metafile-v1
# base NAACCR dictionary version used to resolve fields
naaccr-version=230
# user dictionary used to resolve fields that the base NAACCR dictionary cannot resolve (optional)
user-dictionary-file=test-user-dictionary.xml
# field mapping CSV file used to resolve fields that neither the base nor the user NAACCR dictionaries can resolve (optional)
field-mappings-file=test-field-mappings.csv
# generate Groovy source code? 1, yes, or true enables Groovy source generation
generate-groovy-src=1
# number of files to split the Groovy source into; this can help if Groovy compilation runs out of memory
groovy-src-num-files=2
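A sketch of how the keys above are interpreted, following the checks in `MetafileTranslatorCli.getConfiguration`: `metafile-name`, `translation-prefix` and `naaccr-version` are required, and `generate-groovy-src` is enabled by `1`, `yes` or `true`. `TranslationConfigSketch` is a hypothetical class; it trims surrounding whitespace from the flag value and reports a missing key with an `IllegalArgumentException` rather than the project's `TranslationException`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of how config.properties keys are read and validated.
final class TranslationConfigSketch {
    static final List<String> REQUIRED_KEYS = List.of("metafile-name", "translation-prefix", "naaccr-version");

    /** Loads properties text ("#" lines are comments) and rejects it if a required key is missing. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : REQUIRED_KEYS) {
            if (!props.containsKey(key)) {
                throw new IllegalArgumentException("Missing required key: " + key);
            }
        }
        return props;
    }

    /** "1", "yes" and "true" (the last two case-insensitive) enable Groovy source generation. */
    static boolean generateGroovySource(Properties props) {
        String value = props.getProperty("generate-groovy-src", "").trim();
        return value.equals("1") || value.equalsIgnoreCase("yes") || value.equalsIgnoreCase("true");
    }
}
```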
5 changes: 5 additions & 0 deletions examples/test-field-mappings.csv
@@ -0,0 +1,5 @@
1234,xmlFieldNameMappingForNaaccrId
9101,stateSpecificField1
9102,stateSpecificField2
10010,userDictionaryField1
10020,userDictionaryField2
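The mapping file format can be sketched in isolation: each line is `itemNumber,fieldName`, split with the same `\s*,\s*` pattern as `CsvFieldResolver`, and a later line (or a later file loaded into the same map) replaces an earlier mapping for the same item number. `FieldMappingParser` is a hypothetical name; as in `CsvFieldResolver`, a line without exactly two values (including a blank line) is rejected, and in this sketch a non-numeric item number surfaces as a `NumberFormatException`, which is a subclass of `IllegalArgumentException`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the "itemNumber,fieldName" mapping format (no RFC 4180 quoting or escaping).
final class FieldMappingParser {

    static Map<Integer, String> parse(String csv) {
        return parse(csv, new LinkedHashMap<>());
    }

    /** Adds the mappings from csv to into; later lines override earlier ones for the same item number. */
    static Map<Integer, String> parse(String csv, Map<Integer, String> into) {
        for (String line : csv.split("\\R")) {
            String[] mapping = line.trim().split("\\s*,\\s*");
            if (mapping.length != 2) {
                throw new IllegalArgumentException("Invalid line in mapping file: " + line);
            }
            // Integer.parseInt throws NumberFormatException for a non-numeric item number
            into.put(Integer.parseInt(mapping[0]), mapping[1]);
        }
        return into;
    }
}
```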
gradlew: file mode changed 100644 → 100755 (no content changes)
@@ -0,0 +1,84 @@
package com.imsweb.validation.translation;

import com.imsweb.naaccrxml.NaaccrXmlDictionaryUtils;
import com.imsweb.validation.translation.metafile.MetafileField;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

/**
* Implementation of the FieldResolver that reads a comma-delimited file containing item numbers
* and field names and uses it to supplement the NAACCR and user dictionary fields when resolving
* fields used in an EDITS metafile.
* (note: does not conform to RFC 4180! quoting and escaping are not supported)
*/
@SuppressWarnings("unused")
public class CsvFieldResolver extends FieldResolver {
private static final Logger _LOG = LogManager.getLogger(CsvFieldResolver.class);

private Map<Integer, String> _mappings;

public CsvFieldResolver() {
setMappings(new HashMap<>());
}

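/**
* Reads the comma-delimited file into memory. If mappings are already in place,
* the new mappings are loaded on top of the old ones, replacing any entry with the same
* item number. This way, multiple mapping files can be loaded.
* @param mappingsFile the mappings file; each line must have the form {@code itemNumber,fieldName}
* @throws IOException if the file is null, does not exist, cannot be read, or contains a line that does not have exactly two comma-separated values
*/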
public void loadMappingsFile(File mappingsFile) throws IOException {
if (mappingsFile == null) {
throw new IOException("Mappings file cannot be null");
}
if (! mappingsFile.exists()) {
throw new IOException("Mappings file does not exist");
}

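for (String line : Files.readAllLines(mappingsFile.toPath(), StandardCharsets.UTF_8)) {
String[] mapping = line.trim().split("\\s*,\\s*");
if (mapping.length != 2) {
throw new IOException("Invalid line in mapping file: " + line);
}
int itemNumber;
try {
itemNumber = Integer.parseInt(mapping[0]);
}
catch (NumberFormatException e) {
// report a non-numeric item number the same way as any other malformed line
throw new IOException("Invalid item number in mapping file: " + line, e);
}
getMappings().put(itemNumber, mapping[1]);
_LOG.info(" >> Loaded mapping from {}: {}->{}", mappingsFile.getName(), itemNumber, mapping[1]);
}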
}

@Override
protected String resolveFieldPostDictionary(MetafileField field, TranslationConfiguration conf) {
String propName = super.resolveFieldPostDictionary(field, conf);

// if unable to resolve by superclass implementation, try the in memory mapping
if (propName == null) {
int itemNumber = field.getNumber();
if (getMappings().containsKey(itemNumber)) {
propName = getMappings().get(itemNumber);
}
}

// if unable to resolve from the mapping file, create a (properly named) dummy item and warn the user
if (propName == null) {
propName = NaaccrXmlDictionaryUtils.createNaaccrIdFromItemName(field.getName());
_LOG.info(" >> Unsupported field: " + field.getName() + " (#" + field.getNumber() + "); deriving the ID from the name: " + propName);
}
return propName;
}

public Map<Integer, String> getMappings() {
return _mappings;
}

public void setMappings(Map<Integer, String> mappings) {
this._mappings = mappings;
}

}
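The lookup order in `resolveFieldPostDictionary` amounts to a chain of fallbacks: whatever the superclass resolves from the NAACCR and user dictionaries, then the CSV mapping, then an ID derived from the field name. A sketch assuming Java 9+ (`Optional.or`); the dictionary lookup is passed in as a stub, and `deriveIdFromName` is a hypothetical lower-camel-case stand-in, not the algorithm of `NaaccrXmlDictionaryUtils.createNaaccrIdFromItemName`.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of the resolution order: dictionaries, then CSV mapping, then a derived ID.
final class ResolutionOrderSketch {

    static String resolve(int itemNumber, String fieldName,
                          Function<Integer, Optional<String>> dictionaryLookup,
                          Map<Integer, String> csvMappings) {
        return dictionaryLookup.apply(itemNumber)                              // 1) NAACCR/user dictionaries (stub)
                .or(() -> Optional.ofNullable(csvMappings.get(itemNumber)))   // 2) CSV mapping
                .orElseGet(() -> deriveIdFromName(fieldName));               // 3) derived from the name
    }

    /** Hypothetical stand-in for the library's name-to-ID conversion: lower camel case. */
    static String deriveIdFromName(String name) {
        StringBuilder id = new StringBuilder();
        for (String word : name.trim().split("[^A-Za-z0-9]+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (id.length() == 0) {
                id.append(word.toLowerCase());
            }
            else {
                id.append(Character.toUpperCase(word.charAt(0))).append(word.substring(1).toLowerCase());
            }
        }
        return id.toString();
    }
}
```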
@@ -0,0 +1,162 @@
package com.imsweb.validation.translation;

import com.imsweb.naaccrxml.NaaccrXmlDictionaryUtils;
import com.imsweb.naaccrxml.entity.dictionary.NaaccrDictionary;
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.ParseException;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.HelpFormatter;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class MetafileTranslatorCli {
private static final Logger _LOG = LogManager.getLogger(MetafileTranslatorCli.class);

public static void main(String[] args)
throws Exception {
CommandLineParser parser = new DefaultParser();


// set up options
Options opts = new Options();
opts.addRequiredOption("p", "path", true, "Path to working directory containing metafile where output will be placed");
opts.addRequiredOption("c", "config", true, "Filename of configuration properties file within the working directory to use for this translation");
opts.addOption("h", "help", false, "Prints this help message");

try {
CommandLine cmdLine = parser.parse(opts, args);

// if help requested, print and exit. Because -p and -c are required options, "-h" on its own
// fails parsing, and the usage text is then printed by the ParseException handler below.
if (cmdLine.hasOption("h")) {
new HelpFormatter().printHelp("validation-translation", opts);
return;
}

// process options into translation configuration
TranslationConfiguration conf = getConfiguration(cmdLine.getOptionValue("p"), cmdLine.getOptionValue("c"));

// run the translation
new MetafileTranslator().executeFullTranslation(conf);
}
catch (ParseException e) {
// report what was wrong with the arguments (e.g. a missing required option) before the usage text
System.err.println(e.getMessage());
new HelpFormatter().printHelp("validation-translation", opts);
}
}

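/**
* Creates a TranslationConfiguration object from the configuration properties file
* @param path - path to the working directory containing the configuration properties file
* @param filename - name of the configuration properties file
* @return the translation configuration populated from the properties file
* @throws IOException if the working directory or a referenced file is missing, cannot be read, or is invalid
* @throws TranslationException if a required property (metafile-name, translation-prefix, naaccr-version) is missing
*/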
private static TranslationConfiguration getConfiguration(String path, String filename)
throws IOException, TranslationException {
// make sure the directory exists
File configDir = new File(path);
if (! configDir.exists() || ! configDir.isDirectory()) {
throw new IOException("Working directory path must point to an existing directory");
}

// check to see if the properties file exists
File configFile = new File(configDir, filename);
if (! configFile.exists() || ! configFile.isFile()) {
throw new IOException("Configuration file not found");
}

// load the properties file
Properties configProperties = new Properties();
try (InputStream inputStream = Files.newInputStream(configFile.toPath())) {
configProperties.load(inputStream);
}

// set up the translation configuration based on contents of the config properties file
TranslationConfiguration translationConfiguration = new TranslationConfiguration();
translationConfiguration.setWorkingDirectoryPath(path);

// load the required items
if (configProperties.containsKey("metafile-name")) {
translationConfiguration.setMetafileName(configProperties.getProperty("metafile-name"));
}
else {
throw new TranslationException("No metafile name specified in configuration");
}

if (configProperties.containsKey("translation-prefix")) {
translationConfiguration.setTranslationPrefix(configProperties.getProperty("translation-prefix"));
}
else {
throw new TranslationException("Translation prefix is not specified in configuration");
}

if (configProperties.containsKey("naaccr-version")) {
translationConfiguration.setNaaccrVersion(configProperties.getProperty("naaccr-version"));
}
else {
throw new TranslationException("No NAACCR version specified in configuration");
}

// load optional previous translation output path
if (configProperties.containsKey("previous-working-directory-path")) {
File previousWorkingDirectoryPath = new File(configProperties.getProperty("previous-working-directory-path"));
if (! previousWorkingDirectoryPath.exists() || !previousWorkingDirectoryPath.isDirectory()) {
throw new IOException("Previous working directory specified in configuration file not found");
}
translationConfiguration.setPreviousWorkingDirectoryPath(configProperties.getProperty("previous-working-directory-path"));
}

// if a field mapping csv file is specified then create a new CsvFieldResolver to handle it
if (configProperties.containsKey("field-mappings-file")) {
File fieldMappingsFile = new File(configDir, configProperties.getProperty("field-mappings-file"));
if (! fieldMappingsFile.exists() || ! fieldMappingsFile.isFile()) {
throw new IOException("Field mappings file specified in configuration file not found");
}
CsvFieldResolver csvFieldResolver = new CsvFieldResolver();
csvFieldResolver.loadMappingsFile(fieldMappingsFile);

translationConfiguration.setFieldResolver(csvFieldResolver);
}

// if a user dictionary is specified, load it
if (configProperties.containsKey("user-dictionary-file")) {
File userDictionaryFile = new File(configDir, configProperties.getProperty("user-dictionary-file"));
if (! userDictionaryFile.exists() || ! userDictionaryFile.isFile()) {
throw new IOException("User dictionary file specified in configuration file not found");
}

List<NaaccrDictionary> userDictionaries = new ArrayList<>(1);
NaaccrDictionary dict = NaaccrXmlDictionaryUtils.readDictionary(userDictionaryFile);
userDictionaries.add(dict);
_LOG.info(" >> Loaded user dictionary " + userDictionaryFile.getName());

translationConfiguration.setUserDefinedDictionaries(userDictionaries);
}

// check to see if groovy source code needs to be generated
if (configProperties.containsKey("generate-groovy-src")) {
// if it is truthy ("1", "yes" or "true", ignoring surrounding whitespace), enable source generation
String generateGroovySrc = configProperties.getProperty("generate-groovy-src").trim();
if (generateGroovySrc.equals("1")
|| generateGroovySrc.equalsIgnoreCase("yes")
|| generateGroovySrc.equalsIgnoreCase("true")) {
translationConfiguration.setGenerateGroovySourceCode(true);
}
}
// check if groovy source needs to be split
if (configProperties.containsKey("groovy-src-num-files")) {
translationConfiguration.setGroovySourceCodeNumFiles(Integer.parseInt(configProperties.getProperty("groovy-src-num-files")));
}

return translationConfiguration;
}
}
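`getConfiguration` resolves paths in two different ways: `field-mappings-file` and `user-dictionary-file` are looked up inside the working directory given with `-p` (`new File(configDir, name)`), while `previous-working-directory-path` is checked with `new File(name)`, i.e. relative to the directory the JVM was started from. A sketch of the two rules with `java.nio.file.Path` (Java 11+); `ConfigPathRules` is a hypothetical class.

```java
import java.nio.file.Path;

// Hypothetical sketch of the two path-resolution rules used by the CLI configuration loader.
final class ConfigPathRules {

    /** field-mappings-file, user-dictionary-file: resolved inside the -p working directory. */
    static Path resolveInWorkingDir(Path workingDir, String fileName) {
        return workingDir.resolve(fileName);
    }

    /** previous-working-directory-path: a relative value is resolved against the current directory. */
    static Path resolvePreviousWorkingDir(String value) {
        return Path.of(value).toAbsolutePath();
    }
}
```

So starting the CLI from a different directory changes where the previous run is looked up, but not where the mapping and dictionary files are found.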
5 changes: 5 additions & 0 deletions src/main/resources/log4j2.properties
@@ -0,0 +1,5 @@
rootLogger=INFO, STDOUT
appender.console.type = Console
appender.console.name = STDOUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = [%-5level] %d{yyyy-MM-dd HH:mm:ss.SSS} [%t] %c{1} - %msg%n