echelon


C++ AMP

The third post in my C++ accelerator library series is about C++ AMP. C++ AMP is many things: a programming model, a C++ library and an associated compiler extension. It can be used to program accelerators, that is, GPUs and other massively parallel coprocessors. And it is an open specification that constitutes a novel way of describing data parallelism in C++. Very exciting.

Calculating Machine, image by Kim P

This is what C++ AMP looks like:

const int size = 5;
int aCPP[] = {1, 2, 3, 4, 5};
int bCPP[] = {6, 7, 8, 9, 10};
int sumCPP[size];

array_view<const int, 1> a(size, aCPP), b(size, bCPP);
array_view<int, 1> sum(size, sumCPP);

parallel_for_each(
	sum.extent,
	[=](index<1> idx) restrict(amp)
	{
		sum[idx] = a[idx] + b[idx];
	}
);


I believe the appeal of this programming model is immediately visible from the above code sample. Existing, contiguous blocks of data that should be computed on are made known to the runtime through the array_view concept. Parallel computation is expressed inline through a call to the parallel_for_each meta-function. The loop boundaries are specified, and an index object is created that is used in the following lambda function to describe the computation on a per-element basis. The programmer says: “do this for each element”. It is high-level, it is inline, it is terse, it is pretty.

The downsides of this model can be uncovered by asking a few key questions: When is data copied to the accelerator (if that is necessary)? When are results copied back? Can the underlying array be modified after the view is created? Is the meta-function synchronous or asynchronous? C++ AMP does a great job at behaving as one would expect without any extra effort when these questions are irrelevant. In my opinion this programming model was designed for an audience that does not ask these questions.

If these questions are important, things start to get a little messy. Data copying to the accelerator starts when parallel_for_each is invoked. Results are copied back when they are accessed from the host (through the view) or when they are explicitly synchronized with the underlying array. Whether the underlying array can be modified after the view is created depends on the platform. If it supports zero-copy, that is, host and accelerator share memory, the answer is: probably no. If the code is written with a distributed memory system in mind, the answer is: probably yes. The parallel_for_each function is asynchronous. It is queued and executed at some point in the future. It does not return a future, though. Instead, data can be synchronized explicitly, and that synchronization returns a future (source).

C++ AMP has many more features beyond what the above code example shows. There are tiles that represent the well-known work-groups (OpenCL) or blocks (CUDA). There are accelerator and accelerator_view abstractions to select and manage accelerators. There are atomics, parallel_for_each throws exceptions in case of errors for simplified error handling, and there is more.

There exists an implementation by Microsoft in their Visual Studio product line and a free implementation by AMD together with Microsoft. There is supposed to be an implementation by Intel called Shevlin Park, but I could not find anything about it beyond the much-cited presentation here. Indeed, in June of 2013 Intel proclaimed that they are looking into different solutions that ease the programming experience on their graphics processors, but none are in a state that they can share today (source). Microsoft, on the other hand, is doing a good job at marketing and documenting C++ AMP. Their Parallel Programming in Native Code blog has some helpful entries, including for example “How to measure the performance of C++ AMP algorithms?”.

For me, the central feature of C++ AMP is the elegant way kernels can be implemented inline. The concepts built around that are great but could be designed better. The problem is: if you go one level below what C++ AMP is on the outside, things get messy. But it does not have to be this way. There are simple and well-understood concepts that underlie the whole problem that is accelerator/co-processor programming. I would love to see a well thought out low-level abstraction for accelerator programming that C++ AMP builds upon and that programmers can fall back to if circumstances require it.

Move Now

Last week it was my great pleasure to attend the C++Now conference in Aspen. This is my trip report.

Due to snow the day before the conference and the remote location of Aspen, the event got off to a rough start for most attendees. But almost everybody arrived in time. The conference got going with an interesting keynote by Gabriel Dos Reis and got better from there. Michael Caisse’s tutorial about the canonical class set the stage for what would become one of the central themes of the conference. The title of this report reflects this theme as well.

Aspen, C++Now 2014, image by Chandler Carruth

Just what does it mean to move an object? In particular, what can you expect from the thing that you moved from? And what is this thing anyway? Eric Niebler had a well thought out opinion on this topic. He argued that a moved-from object should not break its invariants. This makes it a lot easier to reason about code. Not everyone agreed with this. Sean Parent, for example, argues that a programmer can only expect to destroy and reassign a moved-from object.

Following this reasoning, he presented an unsafe move operation that does not change the moved-from object. This is unsafe because bad things will happen if such an object is destroyed and the moved-to object is then used or destroyed. So the unsafe move requires the programmer to do something with the moved-from object – usually moving another object there. Sean stated that this trick can improve performance by up to 15% when, for example, reverse is called on a large container. David Sankel responded to this proposal in his talk with a simple “No!” and presented a better solution. Instead of breaking the invariants of the container, he added another container that does not have the original invariants and thus allows unsafe operations. For a list, such a container would be called ListFragments. Fragments are extracted from the list, some operations not legal on lists are performed on the fragments, and the fragments are put back into the list. This way, no invariants are broken. And Sean Parent approves of this idea.

For me, the second theme of the conference was asynchrony. Christophe Henry, the author of the Boost Meta State Machine library presented his latest creation Boost Asynchronous. In an exciting talk he convinced his audience that most approaches to asynchrony are doomed, and that he thinks he has found an elegant solution. From the discussion that followed his talk it became clear that there are some similarities between his work and that of the HPX library. Another talk on that topic was about the libcppa project and its application in a high performance network security auditing database.

Followers of the gospel of functional programming could enjoy the excellent talks by David Sankel and Bartosz Milewski. Louis Dionne presented initial results from his work on MPL11, the metaprogramming library proposal for C++11 and onwards. Ábel Sinkovics presented his interactive metaprogramming shell based on Clang. These two meta-programming wizards delighted their audience with deep insights into the language and an excellent tool to explore and exploit the language.

My own talk about accelerator programming libraries on the last day of the conference went well. However, I ended the talk with the feeling that something was missing. And indeed, I got some feedback that a performance comparison of the different libraries I reviewed would be interesting. And that is what I am working on now.

All in all, C++Now was an educating and delightful experience. I hope to go back next year.

Boost.Compute

This is the second post in my C++ accelerator library series. It is about Boost.Compute, a header-only C++ accelerator programming library developed by Kyle Lutz and others. It is based on OpenCL and is released under the Boost 1.0 license. The name of the library, the license, as well as the documentation suggest that Boost.Compute will one day apply for an official review by the Boost community to become part of the Boost C++ library suite. Acceptance into Boost can be a stepping stone to inclusion in the C++ Standard. But this is not the only reason why it is worthwhile to take a look at Boost.Compute.

Compute, stand by, operate, stand by to operate, image by Marcin Wichary

Boost.Compute manages memory through the compute::vector template, a concept similar to std::vector. Memory is transferred either synchronously through the copy function or asynchronously through the copy_async function. I consider this split an excellent design decision: copy does exactly what one would expect, as it is the same call we can find in the STL. But an asynchronous copy is also available for those programmers that need more performance and know what they are doing. An asynchronous copy returns an instance of compute::future. The future can be blocked on to ensure deterministic and correct execution of the program. Both copy function interfaces are iterator-based.

All functions resulting in commands issued to an accelerator accept an optional command_queue argument. A command_queue has a wait function to block until completion of all enqueued commands. In addition, barriers and markers can be enqueued in a command_queue for additional control over asynchronous operations. These are in my opinion all the tools necessary to express concurrency and control command-level parallelism.

Boost.Compute comes with a multitude of parallel primitives. These STL-like functions compose the core of Boost.Compute. It is an incredible amount of accelerator-enabled general algorithms, and I believe a listing of the more complex functions is in order:


/**
  Returns the result of applying function to the elements in the 
  range [first, last) and init.
 */
T accumulate(InputIterator first, InputIterator last, T init, 
  BinaryFunction function)

/**
  Returns true if value is in the sorted range [first, last).
 */
bool binary_search(InputIterator first, InputIterator last, 
  const T & value)

/**
  Performs an exclusive scan on the elements in the range 
  [first, last) and stores the results in the range beginning 
  at result.
 */
OutputIterator exclusive_scan(InputIterator first, 
  InputIterator last, OutputIterator result) 

/**
  Calls function on each element in the range [first, last).
 */
UnaryFunction for_each(InputIterator first, InputIterator last, 
  UnaryFunction function)

/**
  Copies the elements using the indices from the range [first, last) 
  to the range beginning at result using the input values from the 
  range beginning at input.
 */
void gather(MapIterator first, MapIterator last, 
  InputIterator input, OutputIterator result)

/**
  Performs an inclusive scan on the elements in the range [first, last) 
  and stores the results in the range beginning at result.
 */
OutputIterator inclusive_scan(InputIterator first, 
  InputIterator last, OutputIterator result)

/**
  Returns the inner product of the elements in the range 
  [first1, last1) with the elements in the range beginning at first2. 
 */
T inner_product(InputIterator1 first1, InputIterator1 last1, 
  InputIterator2 first2, T init, 
  BinaryAccumulateFunction accumulate_function, 
  BinaryTransformFunction transform_function)

/**
  Merges the sorted values in the range [first1, last1) with the sorted 
  values in the range [first2, last2) and stores the result in the range 
  beginning at result. Values are compared using the comp function. 
 */
OutputIterator merge(InputIterator1 first1, InputIterator1 last1, 
  InputIterator2 first2, InputIterator2 last2, 
  OutputIterator result, Compare comp)

/**
  Calculates the cumulative sum of the elements in the range 
  [first, last) and writes the resulting values to the range beginning 
  at result. 
 */
OutputIterator partial_sum(InputIterator first, 
  InputIterator last, OutputIterator result)

/**
  Randomly shuffles the elements in the range [first, last).
 */
void random_shuffle(Iterator first, Iterator last)

/**
  Writes the result of applying function to the elements in the 
  range [first, last) to result. Requires the binary operator 
  to be commutative.
 */
void reduce(InputIterator first, InputIterator last, 
  OutputIterator result, BinaryFunction function)

/**
  Performs left rotation such that element at n_first comes 
  to the beginning.
 */
void rotate(InputIterator first, InputIterator n_first, 
  InputIterator last)

/**
  Copies the elements from the range [first, last) to the range 
  beginning at result using the output indices from the range 
  beginning at map.
 */
void scatter(InputIterator first, InputIterator last, 
  MapIterator map, OutputIterator result)

/**
  Sorts the values in the range [first, last) according to compare.
 */
void sort(Iterator first, Iterator last, Compare compare)

/**
  Performs a key-value sort using the keys in the range 
  [keys_first, keys_last) on the values in the range 
  [values_first, values_first + (keys_last - keys_first)) using compare.
 */
void sort_by_key(KeyIterator keys_first, KeyIterator keys_last, 
  ValueIterator values_first, Compare compare)

/**
  Sorts the values in the range [first, last) according to compare. 
  The relative order of identical values is preserved.
 */
void stable_sort(Iterator first, Iterator last, Compare compare)

/**
  Transforms the elements in the ranges [first1, last1) and 
  beginning at first2 using the binary operator op and stores 
  the results in the range beginning at result. 
 */
OutputIterator transform(InputIterator1 first1, InputIterator1 last1, 
  InputIterator2 first2, OutputIterator result, BinaryOperator op)

/**
  Transforms each value in the range [first, last) with the unary 
  transform_function and then reduces each transformed value with 
  reduce_function.
 */
void transform_reduce(InputIterator first, InputIterator last, 
  OutputIterator result, UnaryTransformFunction transform_function, 
  BinaryReduceFunction reduce_function)


There is no special support for numerical analysis, but a number of numerical operations can be built from the aforementioned parallel primitives.

Boost.Compute provides a number of built-in functions to pass to the parallel primitives, but programmers may also specify custom functions. This can be done by creating an instance of compute::function. The shorthand macro BOOST_COMPUTE_FUNCTION() simplifies this task. Besides custom functions, programmers can also declare a BOOST_COMPUTE_CLOSURE() with the added benefit of capturing variables that can then be used within the accelerator function. As a third option, programmers can specify a lambda expression instead of a custom function object. This is accomplished with the help of Boost.Proto. Kyle Lutz talks about these features in a recent blog post.

Boost.Compute contains a well designed C++ wrapper for OpenCL. Each wrapper type has conversion operators to the underlying OpenCL object; a handy feature that enables full compatibility with the lower layer of accelerator programming. Boost.Compute is inter-operable with OpenGL, OpenCV, Eigen, Qt and VTK. It is available on github, the documentation is available here.

I would categorize Boost.Compute as a general purpose, high productivity library. Depending on the quality of implementation and specialization of the provided parallel primitives, close to peak performance should be possible with Boost.Compute. I like the combination of low level OpenCL wrappers and higher level parallel primitives as well as the possibility to fall back to OpenCL code if certain features are not yet wrapped by Boost.Compute. I think the work in Boost.Compute can be an important part of a yet to be defined standard C++ accelerator library.

VexCL

This is the first post in my C++ accelerator library series. It is about VexCL, a header-only C++ library for accelerator programming, developed by Denis Demidov and others. The library is released under the MIT license.

VexCL supports OpenCL and CUDA as accelerator backends. For the CUDA backend it is important to note that the CUDA SDK must be installed on systems running VexCL code, because kernel code is generated and compiled at runtime.

LAb[au] : Spectr|a|um, image by Marc Wathieu

The central memory abstraction in VexCL is vex::vector. The template represents contiguous data on one accelerator, but it can also act as a segmented container that manages disjoint blocks of memory on multiple accelerators. The library chooses memory segment sizes based on device bandwidth measurements or on a user-defined device weighting function. Explicit copy functions allow programmers to move data from, to and between accelerators. Iterator-based as well as range-based syntax is supported. Additional data types include the sparse matrix type vex::SpMat as well as vex::multivector, representing lists of vectors that can be processed in a single step.

Concurrent execution of multiple kernels or of copy and kernel operations is partially supported by VexCL. Copy operations are synchronous by default but can be configured to be asynchronous. Each container has associated command_queues that are used to enqueue operations. A command_queue has a finish() method that blocks until all commands in the queue have completed. This is not the most elegant way to handle concurrency, but VexCL does not abstract away the option for parallel execution of operations, which is nice.

VexCL supports numerous parallel primitives such as inclusive_scan, exclusive_scan, sort, sort_by_key and reduce_by_key. But the core functionality of VexCL is its kernel generation from vector expressions. If X, Y and Z are vex::vector instances, the expression

X = 2 * Y - sin(Z);

generates a single kernel that is automatically executed on all accelerators that the vectors occupy:

kernel void vexcl_vector_kernel(
    ulong n, global double * prm_1,
    int prm_2, global double * prm_3,
    global double * prm_4)
{
    for(size_t i = get_global_id(0); i < n; i += get_global_size(0)) {
        prm_1[i] = ( ( prm_2 * prm_3[i] ) - sin( prm_4[i] ) );
    }
}

This expression template mechanism has many features, including numerous built-ins, support for constants, access to vector indices, inclusion of user defined functions, tagging of data to avoid reading memory more than once, temporary values, random numbers, permutations, slicing, reducing, reshaping, scattered data interpolation and fast Fourier transform. A performance improvement can be expected from expression template generated kernels, since such fused kernels save on memory reads and writes over individual calls to BLAS functions.

Programmers may also generate kernels by feeding a vex::symbolic type to an algorithm. The symbol records any arithmetic operation it is subjected to, and an accelerator kernel can be generated from this recording. As an alternative, the function generator also accepts a function object that can then be used in vector expressions. And finally, a custom kernel can be specified in source code and registered with VexCL through the shorthand VEX_STRINGIZE_SOURCE macro or by creating an instance of vex::backend::kernel.

VexCL is inter-operable with ViennaCL, Boost.Compute and CLOGS. The library is available on github, the documentation is available here.

Denis Demidov mentions that performance results for VexCL are published in one of his papers and included in one of his recent talks.

I would categorize VexCL as a high productivity prototyping library that can also be used in scientific production code. It is particularly well suited to implement numerical analysis. I see minor problems when it comes to kernel caching, the fact that the CUDA SDK is required and the lack of elegant concurrency constructs. Nevertheless, the number of features in VexCL is enormous. VexCL is an excellent example of how sophisticated C++ programming techniques can lead to efficient code as well as a beautiful and concise interface.

C++ Accelerator Libraries

In preparation for my C++Now talk entitled The Future of Accelerator Programming in C++ I am currently reviewing numerous C++ libraries. I put together a catalogue of questions for these reviews. The questions are intended to gauge scope, use-cases, performance, quality and level of abstraction of each library.

Iqra: Read, image by Farrukh
  1. Is concurrency supported?
    Accelerators are massive parallel devices, but due to memory transfer overhead, concurrency is a central aspect for many efficient programs.
  2. How is memory managed?
    This is a central question since simple and efficient management of distributed memory is not trivial.
  3. What parallel primitives are provided?
    Parallel primitives are essential building blocks for many accelerator-enabled programs.
  4. How is numerical analysis supported?
    Massive parallel accelerator architectures lend themselves well to numerical analysis.
  5. How can users specify custom accelerator functions?
    A useful accelerator library should allow users to specify custom functions.
  6. What is the intended use-case for the library? Who is the target audience?
    Is the library suitable for e.g. high performance computing, prototyping or signal processing?
  7. What are noteworthy features of the library?


This is a list of all libraries that I am reviewing:

Library        CUDA   OpenCL   Other           Type1
Thrust         X               OMP, TBB        header
Bolt                  X2       TBB, C++ AMP    link
VexCL          X3     X                        header
Boost.Compute         X                        header
C++ AMP               X4       DX11            compiler
SyCL                  X5                       compiler
ViennaCL       X      X        OMP             header
SkePU          X      X        OMP, seq        header
SkelCL                X                        link
HPL                   X                        link
ArrayFire      X      X                        link
CLOGS                 X                        link
hemi           X                               header
MTL4           X                               header
Kokkos         X               OMP, PTH, seq   link
Aura6          X      X                        header

If I missed a library, please let me know. I will add it immediately. I’m going to publish selected library reviews here on my blog. I’m hoping to discuss specific reviews with the original library authors. The conclusions of these reviews will be part of my talk at C++Now.


  1. either header-only library, link-library or library that requires compiler support
  2. custom AMD OpenCL Static C++ Kernel Language extension required
  3. CUDA SDK required at runtime
  4. Prototype implementation available here
  5. only specification released so far
  6. disclaimer: library developed by the author

Migrating multiple repositories to Git

A few weeks ago I faced the challenge of migrating and merging multiple SVN and Git repositories into one single repository. The stackoverflow discussion “Merge two separate SVN repositories into a single Git repository” contains all the information required to solve this problem. This post is a concise reproduction of all the bits and pieces presented there.

Migrating Birds, image by Emilian Robert Vicol

The plan is simple:

  1. clone the involved Git repositories
  2. migrate relevant SVN repositories to Git
  3. rewrite the repositories in case of overlaps or errors
  4. create new repository and add empty commit
  5. add remotes for all repositories
  6. fetch all remotes
  7. create a list of all commits of all repositories, sort it chronologically
  8. cherry-pick each commit in the list and apply it in the new repository

And here are the commands that implement the plan above. First clone and migrate Git and SVN repositories.

mkdir ~/delme
cd ~/delme/
git clone ~/dev/repo1
git clone ~/dev/repo2
git svn clone svn://server/repo3/
git svn clone svn://server/repo4/

If the repositories contain identical file or folder names, a history rewrite is necessary. Assuming repo1 overlaps with other repositories, it is a good idea to put the contents of repo1 in a subfolder in the target repository. To accomplish this, the history of the master branch of repo1 is rewritten and all its contents are moved to the folder “subfolder”.

cd repo1
git filter-branch --tree-filter 'mkdir -p subfolder; find -mindepth 1 -maxdepth 1 -not -name subfolder -exec mv {} subfolder \;' master

In this step, it is also possible to completely remove files from a repository. The following command removes the file “invalidfile” in “subfolder” from the repository completely.

git filter-branch -f --index-filter 'git rm -r --cached --ignore-unmatch subfolder/invalidfile;' master

This can be repeated for other repositories as well if necessary or desired. In the next step, the target repository that should contain all merges is created. Remote repositories are added and fetched.

mkdir ~/newrepo
cd ~/newrepo
git init .
git commit --allow-empty -m'Initial commit (empty)'
git branch seed
git checkout seed

git remote add repo1 ~/delme/repo1
git remote add repo2 ~/delme/repo2
git remote add repo3 ~/delme/repo3
git remote add repo4 ~/delme/repo4

git fetch repo1
git fetch repo2
git fetch repo3
git fetch repo4

Finally, files containing lists of all commits from all repositories are created. The lists include the timestamp of each commit (seconds since 1/1/1970). The lists are then merged and sorted, and the final result is stored in the file “ordered_commits”. This list is iterated over, and each entry is fed to the git cherry-pick command.

git --no-pager log --format='%at %H' repo1/master > repo1_commits
git --no-pager log --format='%at %H' repo2/master > repo2_commits
git --no-pager log --format='%at %H' repo3/master > repo3_commits
git --no-pager log --format='%at %H' repo4/master > repo4_commits

cat *_commits | sort | cut -d' ' -f2 > ordered_commits

cat ordered_commits | while read commit; do git cherry-pick $commit; done

The cherry-pick command prompts git to apply each commit to the current branch. This results in a repository containing all commits from all 4 repositories in chronological order. That’s all there is to it.

Stop teaching Matlab

Many universities rely on Matlab for their mathematical and technical computation curriculum. This is because the syntax of the Matlab language is very intuitive and a perfect fit for numerical computation. Matlab also comes with a huge library of sophisticated math functions and excellent documentation. And universities are often equipped with campus-wide Matlab licenses. Professors as well as students can use Matlab for free.

Disorderly Conduct, image by Ken

Mathworks, the company behind Matlab, is pursuing an obvious plan with these generous campus licenses. Their strategy is aimed at selling software to mathematicians, engineers, physicists and computer scientists after they graduate. Since Matlab is often the only or most convenient tool these scientists get to know during their studies, Mathworks’ plan is very successful.

If for-profit companies decide to base their research and product development on Matlab I have no objections. The market will decide if it is the right decision.

But I find it appalling that a wide variety of today’s scientific advances are based on a proprietary software1 product such as Matlab. Institutes that base their research on Matlab are at the mercy of a for-profit US company for the purchase and renewal of licenses.

Scientific results based on Matlab are not free2. To reproduce, validate and build upon them, a Matlab software license is required. It is my opinion that, since science is largely paid for by the public, its results must also be available to the public. They must be free. It is thus a fatal mistake to train students and young scientists, the future creators of scientific knowledge, in using tools that restrict the freedom of their results.

There are many excellent free alternatives to Matlab. I would just like to point out two of them here: The long-established Matlab alternative is Python with the computing environment SciPy. The other alternative is new: Julia, a dynamic programming language designed to address the requirements of high-performance numerical and scientific computing. From a C++ developer’s perspective, Julia’s expressive type system and the excellent performance compared to compiled languages are very attractive.

There are numerous free software alternatives that allow researchers to do open and reproducible science. As of Spring 2014, the MIT linear algebra course suggests Julia as a Matlab alternative for solving homework problems. I hope teachers and professors will switch to free software and instruct the next generation of scientists in how to produce free results.


  1. Proprietary software is software that does not give the user freedoms to study, modify and share the software, and threatens users with legal penalties if they do not conform to the terms of restrictive software licenses (source).
  2. free as in freedom, both negative (free of oppression or coercion) and positive liberty

Cutting off Google’s Tentacles

I just realized how easy it is to cut off one of Google’s tentacles throughout the web. This is a WordPress blog and I used the Ultimate Google Analytics plugin to keep track of the number of visitors, where they come from (referer, not geo-location), keywords and so forth.

Google Analytics is just one of the many tentacles Google spreads throughout the web. There is the Google Fonts API, there are the Google Hosted Libraries, and probably numerous other things that I am not aware of. Visitors to websites that include any one of these Google services will always contact one or more Google servers, thus identifying themselves (to some degree).

So I uninstalled the Google Analytics plugin, deleted my Google Analytics account and installed the WordPress Statistics plugin instead. It works flawlessly so far. I can recommend it.

Mere users of the web can monitor and block these tentacles as they browse the web through a Firefox add-on called Disconnect. I can recommend this plugin as well.

Let’s work together to make the web a less centralized, more private place for everyone. Let’s try to exclude large corporations from our interactions between each other. It is probably none of their business and certainly should not be their business.

8 GPU GeForce Titan Tyan System

A box arrived a few days ago at work.

8 GPU Titan System box

The box contained a little supercomputer comprised of 8 GeForce Titan GPUs in a Tyan FT77A platform.

8 GPU Nvidia Titan System

The system fits nicely in our server room.

8 GPU Nvidia Titan System Installed

And the 8 GPUs light up too!

8 GPU Nvidia Titan System Glow

scikit-image – Image processing in Python

I just discovered scikit-image – an image processing toolbox for the Python programming language. The project appears to be very active and under heavy development. I have been looking for ways to replace my scripts that rely on Matlab, and this seems to do the trick. I’m especially excited about the library’s functionality to measure region properties. This will come in very handy. I have either been sleeping under a rock or this project is not very well known. Let’s change that. Go check out scikit-image!
