`scraper`

A CLI application that scrapes a dummy website for product information.

Prerequisites

A Java environment, version 8 or later. Tests are run against Java 8 and 11, using both OpenJDK and Oracle JDK.

Running the application

The easiest way to run the application is with the Maven wrapper and the exec plugin:

./mvnw exec:java -q

Alternatively, use Maven to build the application and then run the jar file directly:

./mvnw install
cd target
java -jar scraper-0.0.1-SNAPSHOT.jar

Note: in a Windows environment, use the mvnw.cmd script instead of the mvnw shell script. This has not been tested.

Tests

A basic set of unit tests can be run using the Maven wrapper:

./mvnw test

Dependencies

Runtime

gson 2.8.2
jsoup 1.11.3

Plugins

spotbugs-maven-plugin 3.1.10
fmt-maven-plugin 2.4.0
exec-maven-plugin 1.6.0

Known issues and future work

The extractor logic is designed to tolerate changes in the position of DOM elements within the page (e.g. if a whole table is moved). However, it is brittle to changes in the internal structure of the DOM elements it looks for (e.g. if columns are moved around when looking for kcal_per_100g). This could be addressed with more robust CSS queries and by checking for null values to avoid NullPointerExceptions.
Add integration tests.
Add a Dockerfile so that users do not need a working Java installation.
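
One possible direction for the first issue, as a minimal sketch rather than the project's actual extractor: look a value up by its column header instead of by a fixed column position, and return an Optional so that a missing header or cell never causes a NullPointerException. The class name ColumnLookup, the method cellByHeader and the header names are hypothetical, and the table is assumed to have already been read into plain lists of strings (in the real application these would come from the parsed DOM).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Sketch: find a table cell by its header name instead of a fixed column position. */
final class ColumnLookup {

    private ColumnLookup() {}

    /**
     * Returns the cell under the given header, or Optional.empty() if the header
     * is missing, the row is too short, or the cell is null or blank.
     */
    static Optional<String> cellByHeader(List<String> headers, List<String> row, String header) {
        if (headers == null || row == null || header == null) {
            return Optional.empty();
        }
        int index = headers.indexOf(header);
        if (index < 0 || index >= row.size()) {
            return Optional.empty();
        }
        return Optional.ofNullable(row.get(index))
                .map(String::trim)
                .filter(cell -> !cell.isEmpty());
    }

    public static void main(String[] args) {
        // Hypothetical table contents; the real page may use different headers.
        List<String> headers = Arrays.asList("Nutrient", "Per 100g");
        List<String> row = Arrays.asList("Energy kcal", "33");

        // The lookup still works if the columns are reordered, because it keys on the header.
        System.out.println(cellByHeader(headers, row, "Per 100g").orElse("n/a"));
        System.out.println(cellByHeader(headers, row, "Per serving").orElse("n/a"));
    }
}
```

Callers can then decide what a missing value means (skip the field, use a default, or log a warning) instead of failing part-way through a scrape.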

License

This project is licensed under the MIT License; see the LICENSE file for details.
