Skip to content
This repository was archived by the owner on Apr 1, 2025. It is now read-only.

Switch over to using lingo for language detection#230

Merged
tclem merged 6 commits into
masterfrom
lingo
Aug 26, 2019
Merged

Switch over to using lingo for language detection#230
tclem merged 6 commits into
masterfrom
lingo

Conversation

@tclem

@tclem tclem commented Aug 14, 2019

Copy link
Copy Markdown
Member

Right now semantic uses some very basic file extension matching for language detection. This moves us toward a slightly more sophisticated approach based on linguist's canonical list of languages.

I don't think we're ready to pull in a full dependency on linguist (it's a ruby gem), but this is a nice intermediate step. Lingo does some compile time map generation of extensions and common filenames to language, allowing fast lookups. We don't support anything where languages share a file extension (first one wins), but that's not an issue for the languages we currently target.

@tclem tclem requested a review from a team August 16, 2019 21:42
Comment thread src/Semantic/CLI.hs
@@ -180,8 +180,22 @@ filePathReader = eitherReader parseFilePath
parseFilePath arg = case splitWhen (== ':') arg of

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love that we allow this on the cli, but didn't want to make breaking changes in this PR. I think in a future iteration we should remove this ability to specify a file language now that our file detection is a bit better.

Comment thread src/Data/Language.hs
"typescript" -> Just TypeScript
"php" -> Just PHP
_ -> Nothing
pure $ textToLanguage l

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made our json parsing a little stricter here, languages are case sensitive and must be exactly how they are specified in linguist. Would like some feedback here on if that's OK.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it’s a good thing.

"TSX" -> Data.TSX
"PHP" -> Data.PHP
_ -> Data.Unknown
bridging = iso Data.textToLanguage Data.languageToText

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed these helpers and couldn't figure out how to use the APIBridge directly due to circular references with Data.Language.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine by me.

@patrickt

patrickt commented Aug 19, 2019

Copy link
Copy Markdown
Contributor

Can we publish lingo to Hackage? Adding more pinned dependencies makes it harder for us to upload semantic to Hackage someday (cf #16). (Happy to 🍐 on this if you want a hand, @tclem).

Comment thread test/Spec.hs Outdated
@tclem

tclem commented Aug 19, 2019

Copy link
Copy Markdown
Member Author

Can we publish lingo to Hackage?

Yup, Done! Mind taking another look at this @patrickt?

@patrickt patrickt left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great!

@tclem tclem merged commit c1486db into master Aug 26, 2019
@tclem tclem deleted the lingo branch August 26, 2019 18:05

@kristinoliver37 kristinoliver37 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to have changed

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants